<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2023.1167003</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Perceived rhythmic regularity is greater for song than speech: examining acoustic correlates of rhythmic regularity in speech and song</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Chu Yi</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1780623/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cabildo</surname>
<given-names>Anne</given-names>
</name>
<xref rid="aff3" ref-type="aff"><sup>3</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/2314057/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Grahn</surname>
<given-names>Jessica A.</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/10952/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Vanden Bosch der Nederlanden</surname>
<given-names>Christina M.</given-names>
</name>
<xref rid="aff1" ref-type="aff"><sup>1</sup></xref>
<xref rid="aff2" ref-type="aff"><sup>2</sup></xref>
<xref rid="aff3" ref-type="aff"><sup>3</sup></xref>
<xref rid="c001" ref-type="corresp"><sup>&#x002A;</sup></xref>
<uri xlink:href="https://loop.frontiersin.org/people/1557196/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>The Brain and Mind Institute, Western University</institution>, <addr-line>London, ON</addr-line>, <country>Canada</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Psychology, Western University</institution>, <addr-line>London, ON</addr-line>, <country>Canada</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Psychology, University of Toronto, Mississauga</institution>, <addr-line>ON</addr-line>, <country>Canada</country></aff>
<author-notes>
<fn id="fn0001" fn-type="edited-by">
<p>Edited by: Dan Zhang, Tsinghua University, China</p>
</fn>
<fn id="fn0002" fn-type="edited-by">
<p>Reviewed by: Yue Ding, Shanghai Mental Health Center, China; Juan Huang, Johns Hopkins University, United States</p>
</fn>
<corresp id="c001">&#x002A;Correspondence: Christina Vanden Bosch der Nederlanden, <email>c.dernederlanden@utoronto.ca</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>05</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>14</volume>
<elocation-id>1167003</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>02</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>05</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Yu, Cabildo, Grahn and Vanden Bosch der Nederlanden.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Yu, Cabildo, Grahn and Vanden Bosch der Nederlanden</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Rhythm is a key feature of music and language, but the way rhythm unfolds within each domain differs. Music induces perception of a beat, a regular repeating pulse spaced by roughly equal durations, whereas speech does not have the same isochronous framework. Although rhythmic regularity is a defining feature of music and language, it is difficult to derive acoustic indices of the differences in rhythmic regularity between domains. The current study examined whether participants could provide subjective ratings of rhythmic regularity for acoustically matched (syllable-, tempo-, and contour-matched) and acoustically unmatched (varying in tempo, syllable number, semantics, and contour) exemplars of speech and song. We used subjective ratings to index the presence or absence of an underlying beat and correlated ratings with stimulus features to identify acoustic metrics of regularity. Experiment 1 highlighted that ratings based on the term &#x201C;rhythmic regularity&#x201D; did not result in consistent definitions of regularity across participants, with opposite ratings for participants who adopted a beat-based definition (song greater than speech), a normal-prosody definition (speech greater than song), or an unclear definition (no difference). Experiment 2 defined rhythmic regularity as how easy it would be to tap or clap to the utterances. Participants rated song as easier to clap or tap to than speech for both acoustically matched and unmatched datasets. Subjective regularity ratings from Experiment 2 illustrated that stimuli with longer syllable durations and with less spectral flux were rated as more rhythmically regular across domains. Our findings demonstrate that rhythmic regularity distinguishes speech from song and that several key acoustic features can be used to predict listeners&#x2019; perception of rhythmic regularity both within and across domains.</p>
</abstract>
<kwd-group>
<kwd>rhythmic regularity</kwd>
<kwd>beat</kwd>
<kwd>speech</kwd>
<kwd>song</kwd>
<kwd>music information retrieval</kwd>
<kwd>periodicity</kwd>
<kwd>rhythm</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="103"/>
<page-count count="12"/>
<word-count count="10112"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Auditory Cognitive Neuroscience</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="sec1" sec-type="intro">
<title>Introduction</title>
<p>Rhythm is crucial for the perception and production of vocal communication in both music and language. In language, syllable rhythms aid in the segmentation of speech (<xref ref-type="bibr" rid="ref15">Cutler and Butterfield, 1992</xref>; <xref ref-type="bibr" rid="ref19">Dilley and McAuley, 2008</xref>), convey the meaning of the speaker through prosodic stress (e.g., sarcasm, <xref ref-type="bibr" rid="ref12">Cheang and Pell, 2008</xref>), signal the presence of a foreign speaker&#x2019;s accent (<xref ref-type="bibr" rid="ref69">Polyanskaya et al., 2017</xref>), and support simultaneous acquisition of multiple languages in infancy (<xref ref-type="bibr" rid="ref96">Werker and Byers-Heinlein, 2008</xref>). In music, rhythm contributes to melodic identity (<xref ref-type="bibr" rid="ref44">Jones et al., 1987</xref>; <xref ref-type="bibr" rid="ref38">H&#x00E9;bert and Peretz, 1997</xref>), enables beat perception (<xref ref-type="bibr" rid="ref71">Povel and Essens, 1985</xref>; <xref ref-type="bibr" rid="ref65">Parncutt, 1994</xref>), impacts perceived groove in music (<xref ref-type="bibr" rid="ref57">Matthews et al., 2019</xref>), and provides the structure that allows synchronization with music or other people (<xref ref-type="bibr" rid="ref25">Fitch, 2016</xref>). Rhythm is clearly an important feature for both language and music, but the way that rhythm is realized in each domain&#x2014;that is, how rhythm unfolds in time&#x2014;is different.</p>
<p>Rhythm, in both music and language, can be defined as the pattern of &#x2018;events&#x2019; in time (<xref ref-type="bibr" rid="ref58">McAuley, 2010</xref>; <xref ref-type="bibr" rid="ref74">Ravignani and Madison, 2017</xref>). Events in language typically occur at the syllable level, and events in music occur at the note level. Music and language differ in how the time intervals between events are structured. In musical rhythms, events are usually structured around a beat, or an underlying pulse (<xref ref-type="bibr" rid="ref21">Drake, 1998</xref>; <xref ref-type="bibr" rid="ref58">McAuley, 2010</xref>). Even though individual events are not equally spaced, the intervals between events relate to the beat, which means that durations are most commonly related by small integer ratios like 1:2 (e.g., quarter note:half note). The beat in music leads to the perception that the intervals between beats are roughly the same duration (i.e., isochronous; <xref ref-type="bibr" rid="ref74">Ravignani and Madison, 2017</xref>; <xref ref-type="bibr" rid="ref75">Ravignani and Norton, 2017</xref>) and gives listeners the sense of periodicity, or the perception of a pattern repeating regularly at a fixed period or interval in time (<xref ref-type="bibr" rid="ref66">Patel, 2003</xref>; <xref ref-type="bibr" rid="ref68">Patel et al., 2005</xref>; <xref ref-type="bibr" rid="ref47">Kotz et al., 2018</xref>). Periodicity persists in music despite natural tempo fluctuations or expressive timing that make a strictly isochronous beat improbable in human-produced music (<xref ref-type="bibr" rid="ref27">Fraisse, 1982</xref>; <xref ref-type="bibr" rid="ref23">Epstein, 1985</xref>; <xref ref-type="bibr" rid="ref7">Bharucha and Pryor, 1986</xref>). In contrast, speech rhythms do not have a beat. It is this presence of a beat that we call rhythmic regularity.</p>
<p>Despite a long history of searching for strictly periodic intervals at the syllable or stress level in speech, no one has found regularly repeating patterns of equal duration in speech (<xref ref-type="bibr" rid="ref30">Grabe and Low, 2002</xref>; <xref ref-type="bibr" rid="ref66">Patel, 2003</xref>; <xref ref-type="bibr" rid="ref68">Patel et al., 2005</xref>; <xref ref-type="bibr" rid="ref14">Cummins, 2012</xref>; <xref ref-type="bibr" rid="ref29">Goswami and Leong, 2013</xref>; <xref ref-type="bibr" rid="ref9">Brown et al., 2017</xref>). Although speech sounds are generally considered rhythmic, those rhythms are constrained by the length of the word, linguistic stress pattern, syntactic rules, or prosodic emphasis in a sentence (<xref ref-type="bibr" rid="ref16">Cutler and Foss, 1977</xref>; <xref ref-type="bibr" rid="ref37">Hay and Diehl, 2007</xref>; <xref ref-type="bibr" rid="ref88">Turk and Shattuck-Hufnagel, 2013</xref>), which does not lend itself well to rhythmic regularity. These temporal regularities are crucial for speech intelligibility (<xref ref-type="bibr" rid="ref81">Shannon et al., 1995</xref>), even more so than the spectral characteristics of speech (<xref ref-type="bibr" rid="ref1">Albouy et al., 2020</xref>). Speakers learn the typical rhythmic patterns of their language, and it is this knowledge that gives rise to temporal predictability in speech (<xref ref-type="bibr" rid="ref77">Rosen, 1992</xref>; <xref ref-type="bibr" rid="ref36">Hawkins, 2014</xref>; <xref ref-type="bibr" rid="ref43">Jadoul et al., 2016</xref>; <xref ref-type="bibr" rid="ref73">Rathcke et al., 2021</xref>), rather than any rhythmic regularities in the speech signal (<xref ref-type="bibr" rid="ref6">Beier and Ferreira, 2018</xref>). 
The differences in regularity between music and language are especially salient in sensorimotor synchronization to speech and song: taps align much more variably with syllable events in speech (30%) than with note events in song (4%; <xref ref-type="bibr" rid="ref54">Lidji et al., 2011</xref>; <xref ref-type="bibr" rid="ref14">Cummins, 2012</xref>; <xref ref-type="bibr" rid="ref17">Dalla Bella et al., 2013</xref>).</p>
<p>In each domain, there is considerable research characterizing the degree or type of rhythmic information in the signal. These studies ask, for instance, whether language is rhythmic at all (e.g., <xref ref-type="bibr" rid="ref63">Nolan and Jeon, 2014</xref>) or what acoustic factors contribute to the strength of perceived regularity in music (e.g., <xref ref-type="bibr" rid="ref8">Bouwer et al., 2018</xref>). A range of metrics has been used to characterize rhythm and/or regularity within each domain and, in a few cases, across domains. These metrics include the calculation of inter-onset-intervals between successive notes or syllables (e.g., stressed and unstressed IOIs; <xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>), durational contrastiveness between pairs of successive notes or syllables (Pairwise Variability Index; <xref ref-type="bibr" rid="ref30">Grabe and Low, 2002</xref>; <xref ref-type="bibr" rid="ref67">Patel and Daniele, 2003</xref>; <xref ref-type="bibr" rid="ref34">Hannon, 2009</xref>; <xref ref-type="bibr" rid="ref35">Hannon et al., 2016</xref>), the proportion of vocalic intervals in an utterance (vowel reduction; <xref ref-type="bibr" rid="ref30">Grabe and Low, 2002</xref>; <xref ref-type="bibr" rid="ref98">Wiget et al., 2010</xref>; <xref ref-type="bibr" rid="ref4">Arvaniti, 2012</xref>), acoustic feature extraction using music information retrieval techniques (e.g., <xref ref-type="bibr" rid="ref50">Lartillot and Toiviainen, 2007</xref>; <xref ref-type="bibr" rid="ref48">Lartillot et al., 2008</xref>; <xref ref-type="bibr" rid="ref2">Alluri and Toiviainen, 2010</xref>; <xref ref-type="bibr" rid="ref10">Burger et al., 2013</xref>, <xref ref-type="bibr" rid="ref11">2014</xref>), autocorrelations to detect self-similarity in the envelope of a signal (<xref ref-type="bibr" rid="ref52">Leong, 2012</xref>; <xref ref-type="bibr" rid="ref84">Suppanen et al., 2019</xref>), 
clock timing evidence and counter-evidence (<xref ref-type="bibr" rid="ref71">Povel and Essens, 1985</xref>), and integer multiple relatedness (<xref ref-type="bibr" rid="ref76">Roeske et al., 2020</xref>; <xref ref-type="bibr" rid="ref18">De Gregorio et al., 2021</xref>). These metrics have been useful within their own contexts of identifying, for example, whether a composer&#x2019;s language background influenced the musical rhythms they employed (<xref ref-type="bibr" rid="ref67">Patel and Daniele, 2003</xref>; <xref ref-type="bibr" rid="ref89">Van Handel, 2006</xref>) or determining the strength of a beat in one musical rhythm compared to another (<xref ref-type="bibr" rid="ref39">Henry et al., 2017</xref>; <xref ref-type="bibr" rid="ref57">Matthews et al., 2019</xref>). However, not all speech-rhythm metrics have proven to be reliable or strong predictors of perceived speech rhythms (<xref ref-type="bibr" rid="ref97">White and Mattys, 2007</xref>; <xref ref-type="bibr" rid="ref4">Arvaniti, 2012</xref>; <xref ref-type="bibr" rid="ref43">Jadoul et al., 2016</xref>). In music, the task of beat extraction is difficult (<xref ref-type="bibr" rid="ref59">McKinney et al., 2007</xref>; <xref ref-type="bibr" rid="ref32">Grosche et al., 2010</xref>), even if humans do it spontaneously (<xref ref-type="bibr" rid="ref31">Grahn and Brett, 2007</xref>; <xref ref-type="bibr" rid="ref41">Honing, 2012</xref>). The goal of the current paper is to examine whether some of the above metrics used to characterize rhythmic regularity in music or language separately can characterize the differences in rhythmic regularity <italic>between</italic> language and music.</p>
<p>Past work has examined where in the acoustic signal the beat is located in speech and song, finding consistent tapping in speech and song at p-centers (but see conflicting takes on p-centers <xref ref-type="bibr" rid="ref61">Morton et al., 1976</xref>; <italic>cf.</italic> <xref ref-type="bibr" rid="ref56">Marcus, 1981</xref>; <xref ref-type="bibr" rid="ref94">Vos and Rasch, 1981</xref>; <xref ref-type="bibr" rid="ref70">Pompino-Marschall, 1989</xref>; <xref ref-type="bibr" rid="ref80">Scott, 1998</xref>; <xref ref-type="bibr" rid="ref93">Villing et al., 2007</xref>), vowel onsets (<xref ref-type="bibr" rid="ref73">Rathcke et al., 2021</xref>), or at peaks in the acoustic envelope (<xref ref-type="bibr" rid="ref46">Kochanski and Orphanidou, 2008</xref>; <xref ref-type="bibr" rid="ref49">Lartillot and Grandjean, 2019</xref>). Still others have used cochlear models of acoustic salience to find the beat location in vocally-produced songs (<xref ref-type="bibr" rid="ref22">Ellis, 2007</xref>; <xref ref-type="bibr" rid="ref13">Coath et al., 2009</xref>). While these approaches are germane to the current question, our goal is to determine whether acoustic features of speech and song can eventually provide evidence of rhythmic regularity&#x2014;in the form of an equally-spaced, repeating pulse&#x2014;in a range of communicative and non-communicative domains. For instance, there is increasing evidence that regularity is a salient feature in the sensory landscape (<xref ref-type="bibr" rid="ref3">Aman et al., 2021</xref>), with listeners detecting regularity within a single cycle of it emerging from a random background (<xref ref-type="bibr" rid="ref83">Southwell and Chait, 2018</xref>) or preferentially attending to a visual stream with statistical regularities despite having no conscious perception of that regularity (<xref ref-type="bibr" rid="ref99">Zhao et al., 2013</xref>). 
Stimuli in studies like these are created with careful control over what features should give rise to regularity, but a wide range of natural stimuli, including non-human animal vocalizations (<xref ref-type="bibr" rid="ref47">Kotz et al., 2018</xref>; <xref ref-type="bibr" rid="ref76">Roeske et al., 2020</xref>; <xref ref-type="bibr" rid="ref18">De Gregorio et al., 2021</xref>) and environmental sounds (e.g., <xref ref-type="bibr" rid="ref33">Gygi et al., 2004</xref>; <xref ref-type="bibr" rid="ref78">Rothenberg, 2013</xref>) also give rise to regularity in a variety of different acoustic characteristics. Our goal is to find a metric that indexes the differences in regularity between speech and song with the future goal of using this metric to detect the degree of regularity in a range of naturally occurring sounds.</p>
<p>Acoustic features that differentiate temporal regularity in speech and song will also feed into perceptual and cognitive questions related to how humans differentiate speech and song in development (<xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>). Rhythmic regularity is an important feature for speech-to-song or environmental sound-to-song transformations (<xref ref-type="bibr" rid="ref82">Simchy-Gross and Margulis, 2018</xref>; <xref ref-type="bibr" rid="ref86">Tierney et al., 2018</xref>; <xref ref-type="bibr" rid="ref79">Rowland et al., 2019</xref>), but spectral features seem to be better predictors of a listener&#x2019;s perception of an utterance as speech or song (<xref ref-type="bibr" rid="ref40">Hilton et al., 2022</xref>; <xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>; <xref ref-type="bibr" rid="ref201">Albouy et al., 2023</xref>; <xref ref-type="bibr" rid="ref208">Ozaki et al., 2023</xref>). Given the importance of rhythmic differences between and among languages for helping children acquire language (<xref ref-type="bibr" rid="ref72">Ramus et al., 1999</xref>; <xref ref-type="bibr" rid="ref62">Nazzi et al., 2000</xref>; <xref ref-type="bibr" rid="ref45">Jusczyk, 2002</xref>), and for bringing about a transformation from speech to song, a clear acoustic metric of rhythmic regularity may prove useful for understanding the development of distinct domains of communication.</p>
<p>We address the goals in the current study by first obtaining subjective ratings of the differences in rhythmic regularity between spoken and sung utterances. After establishing this subjective metric, acoustic features of spoken and sung utterances were related to subjective ratings of rhythmic regularity to examine which features are most predictive of perceived rhythmic regularity.</p>
</sec>
<sec id="sec2">
<title>Experiment 1</title>
<sec id="sec3">
<title>Participants</title>
<p>Thirty-three participants aged 18 to 24&#x2009;years (16 males) took part in the study. An additional 7 people participated but were excluded because they did not complete the study (<italic>N</italic>&#x2009;=&#x2009;5 did not provide a rating for at least 90% of the rating trials, <italic>N</italic>&#x2009;=&#x2009;2 did not pass attention checks within the survey; see Procedure). A third of participants reported taking music lessons and a third of participants self-reported being bilingual, but most participants were English monolinguals who learned English from birth (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S1</xref>). About half of participants identified as white. Participants were recruited from the University of Western Ontario undergraduate psychology participant pool and were required to speak English fluently and have no known hearing deficits. All participants were compensated with course credit and provided informed consent to participate. All materials were approved by Western University&#x2019;s Research Ethics Board (REB).</p>
</sec>
<sec id="sec4">
<title>Stimuli</title>
<p>One set of sung and spoken utterances was used for Experiment 1. We used a stimulus set generated for a different study (see <xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>). For purposes related to the previous studies&#x2019; need for acoustic control, the spoken and sung utterances were acoustically matched on several features, including the sentence texts (see <xref ref-type="supplementary-material" rid="SM2">Appendix A</xref>), speaker identity, total duration (utterance length), tempo (syllable rate), pitch contour, RMS amplitude, and number of syllables. In total, this stimulus set included 96 stimuli (48 unique texts), 48 spoken, 48 sung, with 3 male speakers (American and British English accents). The stimuli ranged from 1.62 to 3.86&#x2009;s in length with an average of approximately 2.46&#x2009;s. For details on stimulus creation please see <xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al. (2022a)</xref>.</p>
</sec>
<sec id="sec5">
<title>Procedure</title>
<p>Participants accessed the online study using <xref ref-type="bibr" rid="ref209">Qualtrics (2021)</xref> and completed a regularity rating task and a background demographics questionnaire. In the rating task, participants heard each spoken or sung sentence presented in random order in a single block. The presentation order of spoken and sung utterances was not constrained, so participants could hear multiple spoken or sung utterances in a row. On each trial, participants rated each audio clip according to how rhythmically regular it sounded (see <xref ref-type="supplementary-material" rid="SM2">Appendix B1</xref>), using a rating scale of 1 (not very regular) to 9 (very regular). Two catch trials were randomly presented to ensure participants were paying attention. The audio in these catch trials gave explicit instructions for ratings. For example, if the catch trial audio said &#x201C;This is a test trial. Please select number 3 on the slider below,&#x201D; the participant should have moved the slider to 3 before proceeding to the next trial. Immediately after the rating task, participants were asked to write out their own definition of rhythmic regularity in an open text box. Participants completed a demographic background questionnaire at the end. On average, participants completed the study in 33.61&#x2009;min.</p>
</sec>
<sec id="sec6">
<title>Results</title>
<p>Rhythmic regularity ratings were averaged separately for spoken and sung utterances. Ratings were normally distributed, with skewness and kurtosis values between &#x2212;3 and +3. Average ratings were submitted to a one-way repeated-measures Analysis of Variance (ANOVA) with Utterance (Speech, Song) as the main factor. As illustrated in <xref rid="fig1" ref-type="fig">Figure 1A</xref>, regularity ratings did not differ between speech and song, <italic>F</italic>(1, 32)&#x2009;=&#x2009;1.044, <italic>p</italic>&#x2009;=&#x2009;0.314, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.032. However, we provided no training or guidance on what rhythmic regularity was. To capture whether participants&#x2019; definition of rhythmic regularity influenced their ratings, we thematically coded each listener&#x2019;s self-reported definition of &#x201C;rhythmic regularity&#x201D; and identified three groups: beat-based, normal-prosody, and unclear definitions. Participants were grouped into beat-based definitions if they mentioned the words &#x201C;beat&#x201D; or &#x201C;meter&#x201D; and/or discussed the importance of rhythmic consistency (e.g., even spacing). Participants were grouped into normal-prosody definitions if they discussed linguistic stress, prosodic pitch, rhyme, and that regularity depended on sounding normal for conversation (e.g., normal speed/tempo/flow for speech). Finally, participants were placed in the unclear definition group if their definition was not based on acoustic factors (e.g., annoyance, familiarity), was not actually a definition (e.g., commented on the goal of the study), or could be read as either beat- or prosody-based (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S2</xref>). In the end, 12 listeners had beat-based definitions, 11 listeners had normal-prosody definitions, and 10 listeners had unclear definitions of rhythmic regularity. 
A follow-up 2 (Utterance: speech, song) by 3 (Definition: beat, prosody, unclear) ANOVA again showed no main effect of utterance type (speech vs. song), <italic>F</italic>(1,30)&#x2009;=&#x2009;1.934, <italic>p</italic>&#x2009;=&#x2009;0.175, &#x03B7;<sub>p</sub><sup>2</sup>&#x2009;=&#x2009;0.061, but there was a significant interaction with definition, <italic>F</italic>(2, 30)&#x2009;=&#x2009;6.606, <italic>p</italic>&#x2009;=&#x2009;0.004, &#x03B7;<sub>p</sub><sup>2</sup>&#x2009;=&#x2009;0.306. As illustrated in <xref rid="fig1" ref-type="fig">Figure 1B</xref>, the normal-prosody group rated speech as more rhythmically regular than song, <italic>F</italic>(1, 10)&#x2009;=&#x2009;7.085, <italic>p</italic>&#x2009;=&#x2009;0.024, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.415, while the beat-based group rated song as more rhythmically regular than speech, <italic>F</italic>(1, 11)&#x2009;=&#x2009;4.963, <italic>p</italic>&#x2009;=&#x2009;0.048, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.311, and the unclear group did not reliably differentiate regularity in speech and song, <italic>F</italic>(1, 9)&#x2009;=&#x2009;2.846, <italic>p</italic>&#x2009;=&#x2009;0.126, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.240. These results suggest that the perceived rhythmic regularity of speech and song differed based on participants&#x2019;, sometimes inaccurate, definition of rhythmic regularity.</p>
<fig position="float" id="fig1"><label>Figure 1</label>
<caption>
<p><bold>(A)</bold> Average rhythmic regularity rating of song and speech illustrating no difference in regularity ratings and <bold>(B)</bold> a significant interaction illustrating that speech and song regularity ratings were dependent on participants&#x2019; definition of rhythmic regularity. Standard Error is within-subjects error (<xref ref-type="bibr" rid="ref207">Morey, 2008</xref>).</p>
</caption>
<graphic xlink:href="fpsyg-14-1167003-g001.tif"/>
</fig>
</sec>
</sec>
<sec id="sec7">
<title>Interim discussion</title>
<p>Experiment 1 illustrated that participants had varying definitions of rhythmic regularity when we left it undefined and did not provide training examples. Initially it appeared that our acoustically matched stimuli did not differ in perceived rhythmic regularity, but after taking participants&#x2019; definitions into account (whether their definition was beat-based, normal-prosody, or unclear), regularity was greater for song than speech for beat-based definitions and greater for speech than song for normal-prosody definitions. Note that the normal-prosody definition group did not describe prosodic rhythmic regularity or a beat in speech, but rather largely based their definitions on the &#x201C;regular&#x201D; part of the term rhythmic regularity, focusing on how normal the speech sounded for everyday conversations. Although definition groupings explained a significant amount of variability in regularity ratings, it is also possible that the acoustic constraints placed on the stimuli reduced the differences in rhythmic regularity between spoken and sung exemplars. In this case, different profiles of regularity for speech and song in Experiment 1 may mean stimuli did not differ or only weakly differed in rhythmic regularity. We designed Experiment 2 to determine whether providing a clear definition of rhythmic regularity would shift participants&#x2019; ratings to align with the beat-based definition of rhythmic regularity we set out to examine, and to determine whether regularity ratings were consistent across different stimulus sets.</p>
<p>We improved on Experiment 1 in three ways: (1) We provided a concrete rhythmic regularity rating scale &#x201C;How easy would it be to tap or clap along to that clip?&#x201D; (2) We provided training examples before participants began the rating task consisting of spoken and sung clips that would be easy and not easy to tap or clap to using familiar stimuli, and (3) We added a second unmatched stimulus set of spoken and sung stimuli that were not acoustically matched to examine regularity differences between unconstrained spoken and sung exemplars.</p>
<p>A second goal of Experiment 2 was to relate participants&#x2019; regularity ratings to acoustic features of spoken and sung exemplars. To achieve this goal, speech- and music-based acoustic features were extracted from all stimuli using Praat, MIR Toolbox, and custom music-inspired scripts (see OSF). We used standard acoustic features that are known to differ between speech and song (<xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>), as well as several features described in the introduction related to temporal regularity (see <xref ref-type="supplementary-material" rid="SM2">Appendix D</xref> for full feature list).</p>
</sec>
<sec id="sec8">
<title>Experiment 2</title>
<sec id="sec9">
<title>Participants</title>
<p>Fifty-one participants (13 males) aged 17&#x2013;24&#x2009;years participated. An additional 6 individuals participated but were excluded because they did not pass all attention checks (see Procedure). Note that one included participant passed attention checks but did not respond to 2 trials in the acoustically matched stimulus set. About a quarter of the participants reported musical training (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S3</xref>). Almost a third of participants self-reported being bilingual, but most participants were English monolinguals and learned English from birth (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S3</xref>). About half of participants identified as white (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S3</xref>). Participants were recruited from the University of Western Ontario undergraduate psychology participant pool and were required to be English speakers and have no known hearing deficits. All participants were compensated with course credit and provided informed consent to participate. All materials were approved by Western University&#x2019;s Research Ethics Board (REB).</p>
</sec>
<sec id="sec10">
<title>Stimuli</title>
<p>Experiment 2 included the acoustically matched stimulus set from Experiment 1 and an unmatched stimulus set created for this study. This additional stimulus set addressed the possibility that matched spoken and sung utterances did not differ in rhythmic regularity because of the constraints placed on tempo, duration, and contour during their recording. The unmatched stimulus set consisted of short clips pulled from several free sources on the internet, including <ext-link xlink:href="http://audiobooks.org" ext-link-type="uri">audiobooks.org</ext-link> (<italic>N</italic>&#x2009;=&#x2009;15), <ext-link xlink:href="http://looperman.com" ext-link-type="uri">looperman.com</ext-link> (<italic>N</italic>&#x2009;=&#x2009;7), <ext-link xlink:href="http://ccmixter.org" ext-link-type="uri">ccmixter.org</ext-link> (<italic>N</italic>&#x2009;=&#x2009;12), <ext-link xlink:href="http://Soundcloud.com" ext-link-type="uri">Soundcloud.com</ext-link> (<italic>N</italic>&#x2009;=&#x2009;2), the SiSEC database (<italic>N</italic>&#x2009;=&#x2009;8; <xref ref-type="bibr" rid="ref55">Liutkus et al., 2017</xref>), and a previous paper examining music and language comparisons (<italic>N</italic>&#x2009;=&#x2009;1; <xref ref-type="bibr" rid="ref1">Albouy et al., 2020</xref>). Podcast recordings (<italic>N</italic>&#x2009;=&#x2009;15) were sampled from <ext-link xlink:href="http://spotify.com" ext-link-type="uri">spotify.com</ext-link> under the fair dealing and educational exceptions to copyright (Copyright Act, R.S.C., 1985). The unmatched stimuli ranged from 1.84 to 3.71&#x2009;s in duration, with an average of 2.38&#x2009;s. A total of 60 sentences (see <xref ref-type="supplementary-material" rid="SM2">Appendix C</xref>) were retrieved from the above sources, half spoken and half sung recordings of solo voices (no instruments in the sung versions). Sentence text and speaker were not matched in this unmatched set, so no sentences were repeated. 
Although these stimuli were not matched for overall duration, pitch, etc., they were equated for total RMS amplitude. The acoustic features and derived rhythm metrics are reported for each stimulus set separately in <xref rid="tab1" ref-type="table">Table 1</xref>, and the description and method for extracting each feature are reported in <xref ref-type="supplementary-material" rid="SM2">Appendix B</xref>.</p>
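<p>Equating clips for total RMS amplitude amounts to rescaling each waveform to a common RMS level. A minimal sketch; the target level and toy samples are illustrative, not the values used for the stimuli:</p>

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a mono sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def equate_rms(samples, target):
    """Scale a clip so its RMS amplitude equals `target`."""
    return [s * (target / rms(samples)) for s in samples]

clip = [0.2, -0.4, 0.3, -0.1]                # toy waveform samples
matched = equate_rms(clip, target=0.1)
print(round(rms(matched), 6))                # 0.1
```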
<table-wrap position="float" id="tab1"><label>Table 1</label>
<caption>
<p>Acoustic features extracted for all matched and unmatched stimuli, using Praat-based linguistic metrics, Music Information Retrieval metrics from MIR Toolbox, and music-inspired regularity metrics.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th/>
<th align="center" valign="top" colspan="3">Matched</th>
<th align="center" valign="top" colspan="3">Unmatched</th>
</tr>
<tr>
<th/>
<th/>
<th align="center" valign="top">Speech</th>
<th align="center" valign="top">Song</th>
<th align="center" valign="top"><italic>p</italic></th>
<th align="center" valign="top">Speech</th>
<th align="center" valign="top">Song</th>
<th align="center" valign="top"><italic>p</italic></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="middle" rowspan="12">Praat-based metrics</td>
<td align="left" valign="top">F0</td>
<td align="center" valign="top">138.45 (20.09)</td>
<td align="center" valign="top">138.15 (11.41)</td>
<td align="center" valign="top">0.930</td>
<td align="center" valign="top">158.88 (55.86)</td>
<td align="center" valign="top">277.53 (75.59)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">F0 instability</td>
<td align="center" valign="top">1.40 (0.50)</td>
<td align="center" valign="top">0.68 (0.14)</td>
<td align="center" valign="top">&#x003C;0.001</td>
<td align="center" valign="top">1.23 (0.38)</td>
<td align="center" valign="top">0.97 (0.34)</td>
<td align="center" valign="top">0.006</td>
</tr>
<tr>
<td align="left" valign="top">Total duration</td>
<td align="center" valign="top">2.43 (0.33)</td>
<td align="center" valign="top">2.49 (0.37)</td>
<td align="center" valign="top">0.381</td>
<td align="center" valign="top">2.29 (0.23)</td>
<td align="center" valign="top">2.48 (0.42)</td>
<td align="center" valign="top">0.030</td>
</tr>
<tr>
<td align="left" valign="top">Syllable duration</td>
<td align="center" valign="top">0.26 (0.04)</td>
<td align="center" valign="top">0.27 (0.04)</td>
<td align="center" valign="top">0.196</td>
<td align="center" valign="top">0.21 (0.04)</td>
<td align="center" valign="top">0.39 (0.11)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Stressed duration</td>
<td align="center" valign="top">0.37 (0.08)</td>
<td align="center" valign="top">0.37 (0.09)</td>
<td align="center" valign="top">0.771</td>
<td align="center" valign="top">0.31 (0.12)</td>
<td align="center" valign="top">0.43 (0.24)</td>
<td align="center" valign="top">0.020</td>
</tr>
<tr>
<td align="left" valign="top">Vocalic nPVI</td>
<td align="center" valign="top">53.61 (14.49)</td>
<td align="center" valign="top">54.44 (16.35)</td>
<td align="center" valign="top">0.792</td>
<td align="center" valign="top">59.66 (16.73)</td>
<td align="center" valign="top">72.02 (26.37)</td>
<td align="center" valign="top">0.035</td>
</tr>
<tr>
<td align="left" valign="top">Consonantal PVI</td>
<td align="center" valign="top">117.87 (39.83)</td>
<td align="center" valign="top">108.93 (32.83)</td>
<td align="center" valign="top">0.233</td>
<td align="center" valign="top">95.20 (51.39)</td>
<td align="center" valign="top">184.16 (71.25)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Stress syllable nPVI</td>
<td align="center" valign="top">51.07 (15.24)</td>
<td align="center" valign="top">51.88 (13.74)</td>
<td align="center" valign="top">0.784</td>
<td align="center" valign="top">51.95 (21.11)</td>
<td align="center" valign="top">67.59 (32.94)</td>
<td align="center" valign="top">0.033</td>
</tr>
<tr>
<td align="left" valign="top">Syllable nPVI</td>
<td align="center" valign="top">61.96 (15.32)</td>
<td align="center" valign="top">57.00 (15.16)</td>
<td align="center" valign="top">0.114</td>
<td align="center" valign="top">55.39 (14.83)</td>
<td align="center" valign="top">65.06 (23.23)</td>
<td align="center" valign="top">0.060</td>
</tr>
<tr>
<td align="left" valign="top">%V</td>
<td align="center" valign="top">0.49 (0.07)</td>
<td align="center" valign="top">0.55 (0.08)</td>
<td align="center" valign="top">&#x003C;0.001</td>
<td align="center" valign="top">0.48 (0.08)</td>
<td align="center" valign="top">0.66 (0.09)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">&#x0394;C</td>
<td align="center" valign="top">0.08 (0.02)</td>
<td align="center" valign="top">0.07 (0.02)</td>
<td align="center" valign="top">0.154</td>
<td align="center" valign="top">0.07 (0.03)</td>
<td align="center" valign="top">0.08 (0.04)</td>
<td align="center" valign="top">0.288</td>
</tr>
<tr>
<td align="left" valign="top">&#x0394;V</td>
<td align="center" valign="top">0.07 (0.02)</td>
<td align="center" valign="top">0.07 (0.02)</td>
<td align="center" valign="top">0.002</td>
<td align="center" valign="top">0.06 (0.02)</td>
<td align="center" valign="top">0.21 (0.10)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="14">Music information retrieval</td>
<td align="left" valign="top">Spectral flux</td>
<td align="center" valign="top">45.17 (4.42)</td>
<td align="center" valign="top">38.81 (4.12)</td>
<td align="center" valign="top">&#x003C;0.001</td>
<td align="center" valign="top">107.44 (40.93)</td>
<td align="center" valign="top">90.44 (20.45)</td>
<td align="center" valign="top">0.048</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 1</td>
<td align="center" valign="top">1.36 (1.37)</td>
<td align="center" valign="top">1.01 (0.56)</td>
<td align="center" valign="top">0.107</td>
<td align="center" valign="top">1.36 (0.67)</td>
<td align="center" valign="top">1.40 (1.01)</td>
<td align="center" valign="top">0.880</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 2</td>
<td align="center" valign="top">1.36 (1.37)</td>
<td align="center" valign="top">1.01 (0.56)</td>
<td align="center" valign="top">0.107</td>
<td align="center" valign="top">4.85 (3.89)</td>
<td align="center" valign="top">1.11 (0.37)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 3</td>
<td align="center" valign="top">7.13 (2.03)</td>
<td align="center" valign="top">6.33 (2.42)</td>
<td align="center" valign="top">0.080</td>
<td align="center" valign="top">40.58 (32.16)</td>
<td align="center" valign="top">13.74 (17.82)</td>
<td align="center" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 4</td>
<td align="center" valign="top">13.27 (3.01)</td>
<td align="center" valign="top">10.71 (2.47)</td>
<td align="center" valign="top">&#x003C;0.001</td>
<td align="center" valign="top">46.03 (21.90)</td>
<td align="center" valign="top">33.14 (16.98)</td>
<td align="center" valign="top">0.014</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 5</td>
<td align="center" valign="top">20.56 (4.44)</td>
<td align="center" valign="top">17.75 (4.21)</td>
<td align="center" valign="top">0.002</td>
<td align="center" valign="top">28.92 (8.87)</td>
<td align="center" valign="top">26.65 (12.85)</td>
<td align="center" valign="top">0.429</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 6</td>
<td align="center" valign="top">12.95 (3.63)</td>
<td align="center" valign="top">11.28 (3.58)</td>
<td align="center" valign="top">0.026</td>
<td align="center" valign="top">19.36 (7.57)</td>
<td align="center" valign="top">24.93 (15.61)</td>
<td align="center" valign="top">0.085</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 7</td>
<td align="center" valign="top">10.84 (3.59)</td>
<td align="center" valign="top">10.45 (3.92)</td>
<td align="center" valign="top">0.614</td>
<td align="center" valign="top">13.99 (7.15)</td>
<td align="center" valign="top">20.47 (9.31)</td>
<td align="center" valign="top">0.004</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 8</td>
<td align="center" valign="top">6.56 (2.37)</td>
<td align="center" valign="top">5.76 (2.25)</td>
<td align="center" valign="top">0.091</td>
<td align="center" valign="top">8.21 (4.24)</td>
<td align="center" valign="top">10.53 (6.28)</td>
<td align="center" valign="top">0.099</td>
</tr>
<tr>
<td align="left" valign="top">Sub-band flux 9</td>
<td align="center" valign="top">2.82 (1.42)</td>
<td align="center" valign="top">2.10 (0.90)</td>
<td align="center" valign="top">0.004</td>
<td align="center" valign="top">7.92 (6.73)</td>
<td align="center" valign="top">12.33 (9.52)</td>
<td align="center" valign="top">0.022</td>
</tr>
<tr>
<td align="left" valign="top">Pulse clarity (Max)</td>
<td align="center" valign="top">0.22 (0.10)</td>
<td align="center" valign="top">0.23 (0.10)</td>
<td align="center" valign="top">0.757</td>
<td align="center" valign="top">0.23 (0.08)</td>
<td align="center" valign="top">0.23 (0.09)</td>
<td align="center" valign="top">0.948</td>
</tr>
<tr>
<td align="left" valign="top">Pulse clarity (Min)</td>
<td align="center" valign="top">0.16 (0.06)</td>
<td align="center" valign="top">0.16 (0.06)</td>
<td align="center" valign="top">0.571</td>
<td align="center" valign="top">0.20 (0.06)</td>
<td align="center" valign="top">0.19 (0.05)</td>
<td align="center" valign="top">0.603</td>
</tr>
<tr>
<td align="left" valign="top">Tempo (autocorr)</td>
<td align="center" valign="top">127.87 (36.19)</td>
<td align="center" valign="top">132.70 (36.93)</td>
<td align="center" valign="top">0.524</td>
<td align="center" valign="top">116.90 (26.70)</td>
<td align="center" valign="top">110.47 (30.63)</td>
<td align="center" valign="top">0.390</td>
</tr>
<tr>
<td align="left" valign="top">Tempo (spectrum)</td>
<td align="center" valign="top">146.14 (29.10)</td>
<td align="center" valign="top">144.77 (26.68)</td>
<td align="center" valign="top">0.812</td>
<td align="center" valign="top">141.08 (28.74)</td>
<td align="center" valign="top">126.55 (27.83)</td>
<td align="center" valign="top">0.054</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="5">Music-inspired metrics</td>
<td align="left" valign="top">Integer multiple</td>
<td align="center" valign="top">0.35 (0.20)</td>
<td align="center" valign="top">0.36 (0.20)</td>
<td align="center" valign="top">0.893</td>
<td align="center" valign="top">0.37 (0.16)</td>
<td align="center" valign="top">0.36 (0.27)</td>
<td align="center" valign="top">0.925</td>
</tr>
<tr>
<td align="left" valign="top">Asynchrony</td>
<td align="center" valign="top">0.12 (0.12)</td>
<td align="center" valign="top">0.11 (0.13)</td>
<td align="center" valign="top">0.703</td>
<td align="center" valign="top">0.16 (0.12)</td>
<td align="center" valign="top">0.16 (0.13)</td>
<td align="center" valign="top">0.366</td>
</tr>
<tr>
<td align="left" valign="top">Asynchrony SD</td>
<td align="center" valign="top">0.12 (0.11)</td>
<td align="center" valign="top">0.12 (0.12)</td>
<td align="center" valign="top">0.743</td>
<td align="center" valign="top">0.14 (0.12)</td>
<td align="center" valign="top">0.12 (0.12)</td>
<td align="center" valign="top">0.489</td>
</tr>
<tr>
<td align="left" valign="top">Signed asynchrony</td>
<td align="center" valign="top">0.04 (0.14)</td>
<td align="center" valign="top">0.02 (0.15)</td>
<td align="center" valign="top">0.444</td>
<td align="center" valign="top">0.11 (0.19)</td>
<td align="center" valign="top">0.06 (0.16)</td>
<td align="center" valign="top">0.324</td>
</tr>
<tr>
<td align="left" valign="top">Signed SD</td>
<td align="center" valign="top">0.14 (0.12)</td>
<td align="center" valign="top">0.13 (0.13)</td>
<td align="center" valign="top">0.791</td>
<td align="center" valign="top">0.16 (0.12)</td>
<td align="center" valign="top">0.14 (0.13)</td>
<td align="center" valign="top">0.458</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Values are presented as average (standard deviation) for spoken and sung exemplars; the <italic>p</italic>-value (uncorrected paired-samples <italic>t</italic>-test) indicates whether the metric differed between speech and song.</p>
</table-wrap-foot>
</table-wrap>
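<p>The uncorrected paired-samples t-tests in Table 1 reduce to the mean pairwise difference divided by its standard error. A stdlib-only sketch with made-up values (not the study's measurements):</p>

```python
import math
import statistics

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom.

    t = mean(differences) / (sd(differences) / sqrt(n)), df = n - 1;
    the p-value then comes from a t distribution with df degrees of freedom.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Toy per-stimulus values for one metric, speech vs. song (illustrative only).
speech = [1.5, 1.2, 1.4, 1.1, 1.6]
song = [1.0, 0.5, 0.8, 0.7, 0.8]
t_stat, df = paired_t(speech, song)
print(round(t_stat, 3), df)                   # 8.485 4
```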
</sec>
<sec id="sec11">
<title>Procedure</title>
<p>The procedure was similar to Experiment 1, except that the stimuli from the unmatched and matched datasets were blocked and rated separately from one another. Participants were asked to wear headphones and complete the surveys in a distraction-free environment. The same order (matched stimulus set followed by the unmatched stimulus set) was used for all participants to avoid increasing variability in ratings across stimulus sets and to allow maximal comparison with Experiment 1. Prior to each rating task, participants heard a training section with four training stimuli that provided examples of spoken and sung utterances that were easy and hard to clap to. Training utterances were spoken and sung by a single male speaker using the text and melody of the familiar children&#x2019;s song &#x201C;Twinkle, Twinkle, Little Star&#x201D; (<xref ref-type="bibr" rid="ref85">Taylor and Taylor, 1806</xref>), and were labeled as &#x201C;Song&#x201D; or &#x201C;Speech&#x201D; and &#x201C;Easy to tap or clap along to&#x201D; or &#x201C;Not easy to clap or tap along to.&#x201D; Easy-to-tap/clap utterances were sung with a strict metrical pulse or spoken like a poem with a clear prosodic metrical foot alternation. The other stimuli were performed with temporal irregularities, including quickly spoken words and irregular pauses between words, to disrupt any perception of a beat. Participants could listen to these examples as many times as they wanted and had to listen to all four to move forward in the survey. For each stimulus in the rating task, participants rated &#x201C;How easy would it be to clap or tap to that clip?&#x201D; on a scale from &#x201C;1&#x2009;=&#x2009;Not Very Easy&#x201D; to &#x201C;9&#x2009;=&#x2009;Very Easy.&#x201D; As before, participants could listen to the clips as many times as they wanted but had to listen at least once to move forward. 
Participants completed an unrelated task [the SSS test reported in <xref ref-type="bibr" rid="ref202">Assaneo et al. (2019)</xref>] between the matched and unmatched ratings, but those data are beyond the scope of the current paper and are not reported here. The same two catch (&#x201C;attention check&#x201D;) trials from Experiment 1 were used and were randomly incorporated into each block (four in total). Finally, participants filled out a demographic background questionnaire.</p>
</sec>
<sec id="sec12">
<title>Results</title>
<p>Rhythmic regularity ratings were averaged separately for spoken and sung utterances in both the matched and unmatched stimulus sets and submitted to a 2 (Utterance: speech, song) by 2 (Stimulus set: matched, unmatched) repeated-measures ANOVA. Song was rated as more rhythmically regular than speech, <italic>F</italic>(1, 50)&#x2009;=&#x2009;39.490, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, &#x03B7;<sub>p</sub><sup>2</sup>&#x2009;=&#x2009;0.441, and matched stimuli had higher regularity ratings than unmatched stimuli, <italic>F</italic>(1, 50)&#x2009;=&#x2009;21.089, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, &#x03B7;<sub>p</sub><sup>2</sup>&#x2009;=&#x2009;0.297. However, a significant interaction between stimulus set and utterance, <italic>F</italic>(1, 50)&#x2009;=&#x2009;13.899, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, &#x03B7;<sub>p</sub><sup>2</sup>&#x2009;=&#x2009;0.218, indicated that the effect of utterance type was larger in the unmatched than the matched set, as illustrated in <xref rid="fig2" ref-type="fig">Figure 2</xref>. Simple effects revealed that for matched stimuli, song ratings were higher than speech ratings by 0.874&#x2009;units on the rating scale, <italic>F</italic>(1, 50)&#x2009;=&#x2009;20.863, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.294. For the unmatched stimuli, song ratings were higher than speech ratings by 1.696&#x2009;units, <italic>F</italic>(1, 50)&#x2009;=&#x2009;40.338, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, &#x03B7;<sup>2</sup>&#x2009;=&#x2009;0.447. Overall, song was consistently rated as more rhythmically regular than speech, but this difference was larger for unmatched than for matched utterances. These findings indicate that, given a clear definition of rhythmic regularity, listeners are sensitive to it as a feature distinguishing music from language. Participants were sensitive to differences in rhythmic regularity even in acoustically constrained settings, in which features typically correlated with regularity, such as tempo, were held constant across spoken and sung exemplars.</p>
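<p>The reported partial eta squared values follow directly from each F ratio and its degrees of freedom. A quick sketch, checked against the values above:</p>

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio:
    F * df_effect / (F * df_effect + df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)

# Reproduce the three effect sizes from the 2 x 2 repeated-measures ANOVA.
print(round(partial_eta_squared(39.490, 1, 50), 3))   # 0.441 (utterance)
print(round(partial_eta_squared(21.089, 1, 50), 3))   # 0.297 (stimulus set)
print(round(partial_eta_squared(13.899, 1, 50), 3))   # 0.218 (interaction)
```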
<fig position="float" id="fig2"><label>Figure 2</label>
<caption>
<p>Average rhythmic regularity ratings for song and speech grouped by matched and unmatched stimulus sets. Error bars represent within-subjects standard error (Morey, 2008).</p>
</caption>
<graphic xlink:href="fpsyg-14-1167003-g002.tif"/>
</fig>
</sec>
<sec id="sec13">
<title>Correlating rhythmic measures with subjective ratings</title>
<p>To examine which acoustic features best predicted listeners&#x2019; rhythmic regularity ratings, we included features that were correlated with regularity ratings in a linear mixed-effects model. First, we computed first-order correlations among all the extracted metrics (see Method and <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S4</xref>) despite redundancy across rhythmic measures. Unmatched spoken and sung utterances differed greatly in the number of syllables (fewer for song than speech), which affected several other metrics, including average syllable duration and metrics related to syllable or vocalic/consonant onsets. We therefore computed separate first-order correlations for the matched and unmatched stimulus sets, so that features correlated with ratings in one set but not the other because of syllable number still had the opportunity to be entered into the model (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S4</xref>). Several features were highly correlated with other predictors: F0, syllable duration, stressed interval, %V, consonantal PVI, and &#x0394;V were all correlated with one another (all <italic>rs</italic>&#x2009;&#x003E;&#x2009;0.3, see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S5</xref>). To reduce multicollinearity, only the feature most highly correlated with rhythmic regularity was entered for model testing (i.e., average syllable duration, see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S4</xref>). Spectral flux was correlated with each sub-band flux metric. Total spectral flux was chosen for model testing over any sub-band measure because overall flux correlated consistently with rhythmic regularity in each stimulus set, whereas sub-band flux correlations were present or absent depending on the stimulus set. 
The final features entered into the model were F0 instability, total duration, average syllable duration, and spectral flux (but see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S6</xref> for additional analyses using consonantal PVI and %V instead of syllable duration). All measures were mean-centered, and any measures with skewness or kurtosis beyond &#x00B1;3 were log-transformed and mean-centered before being entered into the model.</p>
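<p>The preprocessing described above can be sketched as follows (shown for skewness; the kurtosis check would be analogous, and the toy values are illustrative):</p>

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: mean cubed deviation over the cubed SD."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def preprocess(values, cutoff=3.0):
    """Log-transform if skewness is beyond +/- cutoff, then mean-center."""
    if abs(skewness(values)) > cutoff:
        values = [math.log(v) for v in values]
    m = statistics.mean(values)
    return [v - m for v in values]

durations = [0.2, 0.25, 0.22, 0.3, 0.21]      # toy syllable durations
centered = preprocess(durations)
print(abs(sum(centered)) < 1e-12)             # True: values are mean-centered
```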
<p>Participant ID and Stimulus ID were entered as random effects, with one spectral and three temporal features added as fixed effects. These fixed effects significantly improved the fit of the basic model (see <xref rid="tab2" ref-type="table">Table 2</xref>, Model 1), but duration did not uniquely contribute to the model. After removing duration, Model 2 accounted for a significant amount of variance compared to the random-effects model, and Model 1 did not account for more variance than Model 2 (<italic>p</italic>&#x2009;=&#x2009;0.743). Model 3 included syllable count to ensure that predictors were robust to the small number of syllables present in sung utterances from the unmatched condition. Syllable count did not significantly improve fit compared to Model 2 (see <xref rid="tab2" ref-type="table">Table 2</xref>, Model 3) and did not change the significance of average syllable duration. Finally, Model 4 examined whether the acoustic features from Model 2 would remain significant even after adding speech and song labels (utterance type) into the model. F0 instability was no longer significant in this final model, presumably because F0 stability was more predictive of speech&#x2013;song differences than of regularity within stimulus classes. Thus, in addition to songs having greater rhythmic regularity than speech, stimuli with longer syllable durations and less spectral flux were rated as more rhythmically regular (<xref rid="fig3" ref-type="fig">Figure 3</xref>).</p>
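<p>The nested-model comparisons above amount to likelihood-ratio tests, with AIC as a complementary criterion. A schematic sketch; the log-likelihood values below are hypothetical, not those of the fitted models:</p>

```python
def likelihood_ratio(llf_reduced, llf_full, extra_params):
    """Likelihood-ratio statistic for nested models.

    Returns (statistic, df): 2 * (llf_full - llf_reduced), referred to a
    chi-square distribution with df = number of added parameters.
    """
    return 2.0 * (llf_full - llf_reduced), extra_params

def aic(llf, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * n_params - 2 * llf

# Hypothetical fits: adding one predictor barely raises the log-likelihood,
# mirroring how Model 1 (with Duration) did not outperform Model 2.
stat, df = likelihood_ratio(llf_reduced=-16424.5, llf_full=-16424.45, extra_params=1)
print(round(stat, 3), df)                     # 0.1 1
```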
<table-wrap position="float" id="tab2"><label>Table 2</label>
<caption>
<p>LME models predicting rhythmic regularity.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="top">Model</th>
<th align="left" valign="top">Variable</th>
<th align="center" valign="top">Estimate</th>
<th align="center" valign="top"><italic>t</italic>-value</th>
<th align="center" valign="top"><italic>p</italic></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top" rowspan="4">Model 1:</td>
<td align="left" valign="bottom">Duration</td>
<td align="center" valign="bottom">0.074</td>
<td align="center" valign="bottom">0.323</td>
<td align="center" valign="bottom">0.7469</td>
</tr>
<tr>
<td align="left" valign="bottom">Syllable duration</td>
<td align="center" valign="bottom"><bold>3.270</bold></td>
<td align="center" valign="bottom"><bold>4.668</bold></td>
<td align="center" valign="bottom"><bold>&#x003C;0.0001</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">Spectral flux</td>
<td align="center" valign="bottom"><bold>&#x2212;0.008</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;3.521</bold></td>
<td align="center" valign="bottom"><bold>0.0006</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">F0 instability</td>
<td align="center" valign="bottom"><bold>&#x2212;0.467</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;2.980</bold></td>
<td align="center" valign="bottom"><bold>0.0034</bold></td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><bold>&#x03C7;<sup>2</sup>(8, <italic>N</italic>&#x2009;=&#x2009;7,954)&#x2009;=&#x2009;61.254, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, AIC&#x2009;=&#x2009;32,857</bold> (<italic>compared to random intercept model</italic>)</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="3">Model 2:</td>
<td align="left" valign="bottom">Syllable duration</td>
<td align="center" valign="bottom"><bold>3.372</bold></td>
<td align="center" valign="bottom"><bold>5.412</bold></td>
<td align="center" valign="bottom"><bold>&#x003C;0.0001</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">Spectral flux</td>
<td align="center" valign="bottom"><bold>&#x2212;0.008</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;3.625</bold></td>
<td align="center" valign="bottom"><bold>0.0004</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">F0 instability</td>
<td align="center" valign="bottom"><bold>&#x2212;0.466</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;2.986</bold></td>
<td align="center" valign="bottom"><bold>0.0033</bold></td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><bold>&#x03C7;<sup>2</sup>(7, <italic>N</italic>&#x2009;=&#x2009;7,954)&#x2009;=&#x2009;61.1468, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.001, AIC&#x2009;=&#x2009;32,855</bold> (<italic>compared to random intercept model</italic>)</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4">Model 3:</td>
<td align="left" valign="bottom">Syllable count</td>
<td align="center" valign="bottom">&#x2212;0.018</td>
<td align="center" valign="bottom">&#x2212;0.299</td>
<td align="center" valign="bottom">0.7718</td>
</tr>
<tr>
<td align="left" valign="bottom">Syllable duration</td>
<td align="center" valign="bottom"><bold>3.121</bold></td>
<td align="center" valign="bottom"><bold>2.985</bold></td>
<td align="center" valign="bottom"><bold>0.0033</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">Spectral flux</td>
<td align="center" valign="bottom"><bold>&#x2212;0.008</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;3.546</bold></td>
<td align="center" valign="bottom"><bold>0.0005</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">F0 instability</td>
<td align="center" valign="bottom"><bold>&#x2212;0.467</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;2.985</bold></td>
<td align="center" valign="bottom"><bold>0.0033</bold></td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><bold>&#x03C7;<sup>2</sup>(8, <italic>N</italic>&#x2009;=&#x2009;7,954)&#x2009;=&#x2009;0.0921, <italic>p</italic>&#x2009;=&#x2009;0.7615, AIC&#x2009;=&#x2009;32,857</bold> (<italic>compared to model 2</italic>)</td>
</tr>
<tr>
<td align="left" valign="top" rowspan="4">Model 4:</td>
<td align="left" valign="bottom">Utterance type (speech)</td>
<td align="center" valign="bottom"><bold>&#x2212;0.980</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;5.297</bold></td>
<td align="center" valign="bottom"><bold>&#x003C;0.0001</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">Syllable duration</td>
<td align="center" valign="bottom"><bold>1.483</bold></td>
<td align="center" valign="bottom"><bold>2.194</bold></td>
<td align="center" valign="bottom"><bold>0.0298</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">Spectral flux</td>
<td align="center" valign="bottom"><bold>&#x2212;0.008</bold></td>
<td align="center" valign="bottom"><bold>&#x2212;4.363</bold></td>
<td align="center" valign="bottom"><bold>&#x003C;0.0001</bold></td>
</tr>
<tr>
<td align="left" valign="bottom">F0 instability</td>
<td align="center" valign="bottom">&#x2212;0.095</td>
<td align="center" valign="bottom">&#x2212;0.531</td>
<td align="center" valign="bottom">0.5965</td>
</tr>
<tr>
<td align="left" valign="top" colspan="5"><bold>&#x03C7;<sup>2</sup>(8, <italic>N</italic>&#x2009;=&#x2009;7,954)&#x2009;=&#x2009;26.464, <italic>p</italic>&#x2009;&#x003C;&#x2009;0.0001, AIC&#x2009;=&#x2009;32,830</bold> (<italic>compared to model 2</italic>)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Model 4 is the best-fitting model, with syllable duration and spectral flux predicting rhythmic regularity even after accounting for stimulus type (speech vs. song). Models 3 and 4 illustrate that syllable duration and spectral flux are robust predictors of rhythmic regularity even after accounting for the number of syllables in an utterance and stimulus type. Bold values indicate significant predictors in the model.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="fig3"><label>Figure 3</label>
<caption>
<p>Model 4 indicates that syllable duration and spectral flux are significant predictors of perceived rhythmic regularity even after accounting for utterance type (speech vs. song), whereas F0 instability, which was significant in models without utterance type as a factor, is no longer a significant predictor of perceived regularity. Error bars for utterance type and shaded error regions represent standard error calculated using the Kenward&#x2013;Roger coefficient covariance matrix (effects package, R).</p>
</caption>
<graphic xlink:href="fpsyg-14-1167003-g003.tif"/>
</fig>
</sec>
</sec>
<sec id="sec14">
<title>Interim discussion</title>
<p>A major goal of Experiment 2 was to standardize participants&#x2019; interpretation of rhythmic regularity by providing a concrete definition centered on ease of clapping or tapping along with the stimulus. With this definition, rhythmic regularity ratings were significantly higher for sung than spoken utterances. Experiment 2 also expanded on the acoustically matched stimulus set from Experiment 1 by including an additional unmatched stimulus set more representative of speech and song in everyday settings. Participants rated song as more rhythmically regular than speech for both sets, but the difference was larger for the acoustically unmatched than the matched set. Naturally recorded utterances may emphasize the differences in regularity between song and speech compared to recordings that equate tempo, pitch contour, and average pitch between speech and song. However, regularity differences are apparent even in carefully acoustically matched stimulus sets, suggesting that regularity helps differentiate speech and song. Finally, we estimated which acoustic features across both stimulus sets were most predictive of regularity ratings. Although the type of stimulus (speech or song) was a significant predictor of regularity, longer syllable durations and less spectral flux also predicted higher rhythmic regularity ratings.</p>
</sec>
<sec id="sec15">
<title>General discussion</title>
<p>The goal of this work was to obtain a subjective metric of rhythmic regularity&#x2014;an equally-spaced, repeating pulse&#x2014;and examine acoustic features that predict participants&#x2019; ratings of regularity. Experiment 1 illustrated that the term rhythmic regularity was interpreted differently across participants, leading to different patterns of regularity across spoken and sung exemplars. Experiment 2 operationalized the definition of rhythmic regularity by asking how easy it would be to tap or clap to the stimulus. With this definition, participants rated song as more regular&#x2013;or easier to clap or tap to&#x2013;than speech in both acoustically matched and acoustically unmatched stimulus sets. Subjective regularity ratings were significantly affected by acoustic features of syllable duration and spectral flux, with longer durations and less flux related to higher regularity ratings. These results add to the literature by (1) highlighting the salience of rhythmic regularity as a differentiator of speech and song (<xref ref-type="bibr" rid="ref67">Patel and Daniele, 2003</xref>; <xref ref-type="bibr" rid="ref68">Patel et al., 2005</xref>; <xref ref-type="bibr" rid="ref92">Vanden Bosch der Nederlanden et al., 2022b</xref>) and (2) adding to the growing literature on spectral flux as a salient acoustic feature in listeners&#x2019; perceptual processing of sound (<xref ref-type="bibr" rid="ref95">Weineck et al., 2022</xref>).</p>
<p>Spectral flux is a measure of the distance between successive frames, or moments in time, in the frequency spectrum, with larger values indicating larger changes in the spectrum from moment to moment (<xref ref-type="bibr" rid="ref2">Alluri and Toiviainen, 2010</xref>). It follows that song should have less spectral flux than speech: notes are held longer (i.e., a greater proportion of the utterance is vocalic), creating fewer changes in the spectrum on a moment-to-moment basis. The metrical framework of sung utterances may also make for fewer sudden, and more evenly spaced, changes in the spectrum compared to speech. Spectral flux has also been described as an acoustic correlate of the beat in music, but with greater spectral flux indicating greater beat salience (<xref ref-type="bibr" rid="ref10">Burger et al., 2013</xref>). Those authors extracted spectral flux from low and high frequency bands in the spectrum corresponding to the kick drum, hi-hat, and cymbal, so large amounts of spectral flux in those bands acted as a proxy for rhythmic information from those instruments. These stimulus-specific differences help to explain the seeming paradox that greater spectral flux predicts greater beat salience in music, while greater spectral flux predicts less rhythmic regularity when comparing speech to song.</p>
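As a concrete illustration, spectral flux can be approximated from a short-time Fourier transform as the mean Euclidean distance between successive magnitude spectra. The sketch below is a minimal illustration of this definition, not the extraction pipeline used in the study; the frame length, hop size, and Hanning window are illustrative assumptions.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Mean Euclidean distance between successive magnitude spectra.

    Larger values indicate larger moment-to-moment spectral change
    (cf. Alluri and Toiviainen, 2010).
    """
    window = np.hanning(frame_len)
    spectra = [
        np.abs(np.fft.rfft(signal[i:i + frame_len] * window))
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
    diffs = [np.linalg.norm(b - a) for a, b in zip(spectra, spectra[1:])]
    return float(np.mean(diffs))

# A steady tone changes little from frame to frame, so its flux is far
# smaller than that of white noise of comparable amplitude -- analogous
# to held sung notes versus rapidly changing speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).standard_normal(sr)
```

This captures only the overall-spectrum variant of the measure; band-limited flux, as in Burger et al. (2013), would first filter the signal into frequency bands and apply the same computation per band.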
<p>Our results elucidate which features participants use to provide regularity ratings when comparing speech and song, but these features alone are unlikely to capture the presence of a beat, or the integer-multiple relatedness of sounds aligned with a metrical grid, across a wide range of environmental stimuli. We attempted to account for listeners&#x2019; subjective regularity ratings using several music- and language-inspired metrics of regularity. In particular, the proportion of intervals per sentence that were related by integer multiples (<xref ref-type="bibr" rid="ref76">Roeske et al., 2020</xref>) was not correlated with regularity ratings. It may be that our sentence-level approach is too coarse a metric; behavioral responses such as tapping or continuous regularity ratings could shed light on which features participants relied on at particular moments in time to feel a beat (similar to <xref ref-type="bibr" rid="ref73">Rathcke et al., 2021</xref>). The consistency with which those moments align with inter-onset intervals or stimulus features could provide a path forward for creating novel metrics to characterize regularity differences in speech and song. Another set of metrics used for this study (Asynchrony, Signed Asynchrony, and their variability) was inspired by the clock-timing work of <xref ref-type="bibr" rid="ref71">Povel and Essens (1985)</xref> (see <xref ref-type="bibr" rid="ref64">Norton and Scharff, 2016</xref>, for a similar approach to birdsong). However, these metrics also showed no relationship to subjective regularity and may require input from the p-center literature (e.g., <xref ref-type="bibr" rid="ref73">Rathcke et al., 2021</xref>) to determine the correct beat locations and onset times for developing the underlying &#x201C;clock&#x201D; for speech and song. Onset intervals related to vocalic or other salient features of the stimulus may be more fruitful than the reliance on linguistic onsets used here. Finally, music information retrieval metrics such as pulse clarity and stimulus-extracted tempo had no relationship to rhythmic regularity in speech and song, suggesting that these feature-extraction methods may be better suited to multi-instrument excerpts of musical pieces (e.g., vocals plus instrumentation) than to solo sung and spoken utterances.</p>
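To make the integer-multiple idea concrete, a sentence-level score can be computed as the proportion of adjacent inter-onset-interval pairs whose ratio falls near a small-integer relationship. The sketch below is a simplified stand-in inspired by Roeske et al. (2020), not their exact method; the tolerance and the restriction to 1:1, 1:2, and 2:1 targets are illustrative assumptions.

```python
def integer_ratio_proportion(intervals, tol=0.05):
    """Proportion of adjacent interval pairs near a small-integer ratio.

    For each adjacent pair of inter-onset intervals, the ratio
    r = d1 / (d1 + d2) is compared with the values produced by 1:1,
    1:2, and 2:1 relationships (r = 1/2, 1/3, 2/3).
    """
    targets = (1 / 2, 1 / 3, 2 / 3)
    pairs = list(zip(intervals, intervals[1:]))
    hits = sum(
        1 for a, b in pairs
        if any(abs(a / (a + b) - t) <= tol for t in targets)
    )
    return hits / len(pairs)
```

An isochronous sequence such as [0.5, 0.5, 0.5, 0.5] scores 1.0, whereas intervals that avoid integer-related ratios score lower; a null result with such a metric at the sentence level is consistent with the coarseness concern raised above.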
<p>Linguistic measures, including measures that have previously been used to relate speech and music to one another, such as nPVI, also did not explain additional variance in rhythmic regularity beyond average syllable duration (see <xref ref-type="supplementary-material" rid="SM1">Supplementary Table S6</xref>). Vocalic nPVI was originally developed to capture the vowel reduction (i.e., a change in vowel quality to a &#x201C;schwa&#x201D; and a shortened vowel duration) that occurs in many of the so-called &#x201C;stress-timed&#x201D; languages (<xref ref-type="bibr" rid="ref30">Grabe and Low, 2002</xref>; <xref ref-type="bibr" rid="ref68">Patel et al., 2005</xref>; <xref ref-type="bibr" rid="ref14">Cummins, 2012</xref>). This measure captures not overall rhythmic variability but rather durational contrast between successive pairs of syllables. Indeed, our calculations indicated that music often had more contrastiveness than speech (see <xref rid="tab1" ref-type="table">Table 1</xref>, Unmatched stimuli), likely because of large integer-related duration differences, such as quarter notes followed by half or whole notes, that speech does not employ. Comparisons across previous studies suggested that nPVIs were much higher for speech (in the 50&#x2013;70 range) than for instrumental music (in the 30&#x2013;40 range; <xref ref-type="bibr" rid="ref67">Patel and Daniele, 2003</xref>; <xref ref-type="bibr" rid="ref35">Hannon et al., 2016</xref>), but these studies used musical notation to estimate nPVI durations instead of actual recordings. Studies that have used acoustic segmentation of speech and song have found more comparable nPVI values (<xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>). Thus, it is not surprising that this metric did not uniquely predict rhythmic regularity for spoken compared to sung stimuli.</p>
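The nPVI computation referenced above can be stated compactly. The sketch below follows the standard Grabe and Low (2002) formula over a list of interval durations; whether those intervals are vocalic or syllabic is left to the analyst.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index (Grabe and Low, 2002).

    Quantifies durational contrast between successive intervals: 0 for
    a perfectly isochronous sequence, larger values when neighbouring
    intervals differ more.
    """
    pairs = list(zip(durations, durations[1:]))
    return 100 / len(pairs) * sum(
        abs(a - b) / ((a + b) / 2) for a, b in pairs
    )
```

Equal durations give 0, while a strict long-short alternation with a 2:1 ratio gives an nPVI of about 66.7, illustrating how integer-related note durations in music can yield contrastiveness scores as high as, or higher than, those of speech.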
<p>Despite the ease with which humans pick up on regularity in speech, song, and environmental sounds, easily extractable acoustic features that characterize those subjective reports remain elusive. Our study confirms that participants hear more rhythmic regularity in sung than in spoken utterances, and it provides concrete guidance on how best to obtain participants&#x2019; subjective regularity ratings. The findings also add to the literature by showing that regularity is easier to detect&#x2013;or more likely to be present&#x2013;when syllables are longer and when there is less moment-to-moment fluctuation in the spectrum. Future work should build on these results to develop more continuous and fine-grained metrics for quantifying rhythmic regularity from the acoustic signal. There is growing evidence that rhythmic regularity is an important signal for attention, perception, development, and movement (<xref ref-type="bibr" rid="ref31">Grahn and Brett, 2007</xref>; <xref ref-type="bibr" rid="ref28">Gordon et al., 2014</xref>; <xref ref-type="bibr" rid="ref5">Bedoin et al., 2016</xref>; <xref ref-type="bibr" rid="ref87">Trainor et al., 2018</xref>; <xref ref-type="bibr" rid="ref3">Aman et al., 2021</xref>; <xref ref-type="bibr" rid="ref51">Lense et al., 2021</xref>) in humans, and that regularity is present in a range of human and non-human primate communicative vocalizations (<xref ref-type="bibr" rid="ref76">Roeske et al., 2020</xref>; <xref ref-type="bibr" rid="ref18">De Gregorio et al., 2021</xref>), as well as in many environmental sounds (<xref ref-type="bibr" rid="ref33">Gygi et al., 2004</xref>). Indeed, the perception of rhythmic regularity is key to how both human and non-human animals (e.g., cockatoos, sea lions) align their movements to a beat (<xref ref-type="bibr" rid="ref24">Fitch, 2013</xref>). A greater understanding of which acoustic features humans rely on to perceive regularity and extract an underlying pulse in communicative signals like speech and song will contribute to theories of the evolutionary origins of beat processing (e.g., are the features humans use to find a beat the same as or different from those animals use?) and to theories about perceptual biases toward regularity in everyday soundscapes.</p>
<p>One potential limitation of the current study is the use of lyrics in both the music and language domains. We wanted to use speech and song because they exemplify the acoustic and structural differences between domains (<xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>), while maintaining the ability to control for timbral, semantic, and other temporal or spectral acoustic features. It will be important to characterize the role linguistic content plays in the perception of rhythmic regularity in song. For instance, is song without words perceived as more strictly regular than song with words, given that note durations are less dictated by word length or stress? If so, are instrumental melodies perceived as more rhythmically regular than songs without words? Or does linguistic or semantic content help to bolster temporal prediction of what type of note and/or word will come next? Similarly, would speech without semantic content (e.g., low-pass filtered speech) be perceived as more or less regular than semantic speech? This and future work will help shed light on the temporal features that distinguish speech and song and, more broadly, the domains of music and language.</p>
<p>The current findings add to the literature on rhythm in music and language by providing a concrete subjective metric of rhythmic regularity that reliably differs between speech and song across stimulus sets. The metric is simple to understand and can be used to characterize the perception of rhythmic regularity across developmental populations, in individuals with little or no musical training, and in a range of stimulus sets beyond music and language (e.g., bird song). Our findings are important for characterizing the inherent differences between music and language that (1) may be important for learning to differentiate musical and linguistic communication early in development (<xref ref-type="bibr" rid="ref91">Vanden Bosch der Nederlanden et al., 2022a</xref>,<xref ref-type="bibr" rid="ref92">b</xref>) and (2) underlie many of the perceptual advantages ascribed to music over language. For instance, humans cross-culturally prefer simple integer ratios in music (<xref ref-type="bibr" rid="ref204">Jacoby and McDermott, 2017</xref>) and remember such musical rhythms better than syncopated rhythms that disrupt the occurrence of events on a beat (<xref ref-type="bibr" rid="ref26">Fitch and Rosenfeld, 2007</xref>). Future work comparing the prominence of features in speech and song could address the divergence of musical and linguistic communication in humans. For instance, does the preservation of rhythmic regularity in music come at a cost to the transmission of quick messages meant to transact information? Is strict isochrony better for promoting verbatim memory of information occurring on, but not off, the beat (<xref ref-type="bibr" rid="ref205">Jones et al., 1981</xref>; <xref ref-type="bibr" rid="ref206">Large, 2008</xref>; <xref ref-type="bibr" rid="ref203">Helfrich et al., 2018</xref>), while vague periodicity without strict isochrony (as in speech) is better for encoding the gist of a message? Answering seemingly simple questions, such as how humans perceive differences in rhythmic regularity in speech and song, has the potential to address several important areas of psychology related to human communicative development, the origins of music and language, cross-species comparisons, and perceptual biases toward regularity in everyday scenes.</p>
</sec>
<sec id="sec16" sec-type="data-availability">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: <ext-link xlink:href="https://osf.io/hnw5t/" ext-link-type="uri">https://osf.io/hnw5t/</ext-link>.</p>
</sec>
<sec id="sec17">
<title>Ethics statement</title>
<p>The studies involving human participants were reviewed and approved by the University of Western Ontario Ethics Board. The patients/participants provided their written informed consent to participate in this study.</p>
</sec>
<sec id="sec18">
<title>Author contributions</title>
<p>CY, JG, and CV designed the experiments. CY and CV recruited the participants and performed the data analysis. CV and AC extracted acoustic features and manually segmented stimuli. CY wrote the first draft. CV provided subsequent drafts. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="sec19" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by NSERC RGPIN-2016-05834 awarded to JG and NSERC RGPIN-2022-04413 and DGECR-2022-00294 awarded to CV.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="sec100" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to thank Patrick Grzela and Olivia Jiaming Xu for their help setting up the online survey and training trials for Experiment 1. Thanks to Teia Tremblett for helping with stimulus organization and identifying stressed syllables. We also wish to thank Renee Ragguet for her assistance programming the music-inspired rhythmic regularity metrics used in this study.</p>
</ack>
<sec id="sec21" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary material for this article can be found online at: <ext-link xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1167003/full#supplementary-material" ext-link-type="uri">https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1167003/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_2.docx" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="ref1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Albouy</surname> <given-names>P.</given-names></name> <name><surname>Benjamin</surname> <given-names>L.</given-names></name> <name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Zatorre</surname> <given-names>R. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody</article-title>. <source>Science</source> <volume>367</volume>, <fpage>1043</fpage>&#x2013;<lpage>1047</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.aaz3468</pub-id>, PMID: <pub-id pub-id-type="pmid">32108113</pub-id></citation></ref>
<ref id="ref201">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Albouy</surname> <given-names>P.</given-names></name> <name><surname>Mehr</surname> <given-names>S. A.</given-names></name> <name><surname>Hoyer</surname> <given-names>R. S.</given-names></name> <name><surname>Ginzburg</surname> <given-names>J.</given-names></name> <name><surname>Zatorre</surname> <given-names>R. J.</given-names></name></person-group> (<year>2023</year>). <article-title>Spectro-temporal acoustical markers differentiate speech from song across cultures</article-title>. <source>bioRxiv.</source> doi: <pub-id pub-id-type="doi">10.1101/2023.01.29.526133</pub-id></citation></ref>
<ref id="ref2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alluri</surname> <given-names>V.</given-names></name> <name><surname>Toiviainen</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title>Exploring perceptual and acoustical correlates of polyphonic timbre</article-title>. <source>Music. Percept.</source> <volume>27</volume>, <fpage>223</fpage>&#x2013;<lpage>242</lpage>. doi: <pub-id pub-id-type="doi">10.1525/mp.2010.27.3.223</pub-id></citation></ref>
<ref id="ref3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aman</surname> <given-names>L.</given-names></name> <name><surname>Picken</surname> <given-names>S.</given-names></name> <name><surname>Andreou</surname> <given-names>L. V.</given-names></name> <name><surname>Chait</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Sensitivity to temporal structure facilitates perceptual analysis of complex auditory scenes</article-title>. <source>Hear. Res.</source> <volume>400</volume>:<fpage>108111</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.heares.2020.108111</pub-id>, PMID: <pub-id pub-id-type="pmid">33333425</pub-id></citation></ref>
<ref id="ref4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arvaniti</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>The usefulness of metrics in the quantification of speech rhythm</article-title>. <source>J. Phon.</source> <volume>40</volume>, <fpage>351</fpage>&#x2013;<lpage>373</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.wocn.2012.02.003</pub-id></citation></ref>
<ref id="ref202">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Assaneo</surname> <given-names>M. F.</given-names></name> <name><surname>Ripoll&#x00E9;s</surname> <given-names>P.</given-names></name> <name><surname>Orpella</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>W. M.</given-names></name> <name><surname>de Diego-Balaguer</surname> <given-names>R.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Spontaneous synchronization to speech reveals neural mechanisms facilitating language learning</article-title>  <source>Nat. Neurosci.</source> <volume>22</volume>, <fpage>627</fpage>&#x2013;<lpage>632</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41593-019-0353-z</pub-id>, PMID: <pub-id pub-id-type="pmid">20329856</pub-id></citation></ref>
<ref id="ref5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bedoin</surname> <given-names>N.</given-names></name> <name><surname>Brisseau</surname> <given-names>L.</given-names></name> <name><surname>Molinier</surname> <given-names>P.</given-names></name> <name><surname>Roch</surname> <given-names>D.</given-names></name> <name><surname>Tillmann</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Temporally regular musical primes facilitate subsequent syntax processing in children with specific language impairment</article-title>. <source>Front. Neurosci.</source> <volume>10</volume>:<fpage>e00245</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnins.2016.00245</pub-id>, PMID: <pub-id pub-id-type="pmid">27378833</pub-id></citation></ref>
<ref id="ref6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beier</surname> <given-names>E. J.</given-names></name> <name><surname>Ferreira</surname> <given-names>F.</given-names></name></person-group> (<year>2018</year>). <article-title>The temporal prediction of stress in speech and its relation to musical beat perception</article-title>. <source>Front. Psychol.</source> <volume>9</volume>:<fpage>431</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2018.00431</pub-id>, PMID: <pub-id pub-id-type="pmid">29666600</pub-id></citation></ref>
<ref id="ref7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bharucha</surname> <given-names>J. J.</given-names></name> <name><surname>Pryor</surname> <given-names>J. H.</given-names></name></person-group> (<year>1986</year>). <article-title>Disrupting the isochrony underlying rhythm: an asymmetry in discrimination</article-title>. <source>Percept. Psychophys.</source> <volume>40</volume>, <fpage>137</fpage>&#x2013;<lpage>141</lpage>. doi: <pub-id pub-id-type="doi">10.3758/bf03203008</pub-id>, PMID: <pub-id pub-id-type="pmid">3774495</pub-id></citation></ref>
<ref id="ref8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bouwer</surname> <given-names>F. L.</given-names></name> <name><surname>Burgoyne</surname> <given-names>J. A.</given-names></name> <name><surname>Odijk</surname> <given-names>D.</given-names></name> <name><surname>Honing</surname> <given-names>H.</given-names></name> <name><surname>Grahn</surname> <given-names>J. A.</given-names></name></person-group> (<year>2018</year>). <article-title>What makes a rhythm complex? The influence of musical training and accent type on beat perception</article-title>. <source>PLoS One</source> <volume>13</volume>:<fpage>e0190322</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0190322</pub-id>, PMID: <pub-id pub-id-type="pmid">29320533</pub-id></citation></ref>
<ref id="ref9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>S.</given-names></name> <name><surname>Pfordresher</surname> <given-names>P. Q.</given-names></name> <name><surname>Chow</surname> <given-names>I.</given-names></name></person-group> (<year>2017</year>). <article-title>A musical model of speech rhythm</article-title>. <source>Psychomusicology</source> <volume>27</volume>, <fpage>95</fpage>&#x2013;<lpage>112</lpage>. doi: <pub-id pub-id-type="doi">10.1037/pmu0000175</pub-id></citation></ref>
<ref id="ref10">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Burger</surname> <given-names>B.</given-names></name> <name><surname>Ahokas</surname> <given-names>J. R.</given-names></name> <name><surname>Keipi</surname> <given-names>A.</given-names></name> <name><surname>Toiviainen</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). Relationships between spectral flux, perceived rhythmic strength, and the propensity to move. In: <italic>10th Sound And Music Computing Conference</italic>. Stockholm, Sweden.</citation></ref>
<ref id="ref11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burger</surname> <given-names>B.</given-names></name> <name><surname>Thompson</surname> <given-names>M. R.</given-names></name> <name><surname>Luck</surname> <given-names>G.</given-names></name> <name><surname>Saarikallio</surname> <given-names>S. H.</given-names></name> <name><surname>Toiviainen</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>Hunting for the beat in the body: on period and phase locking in music-induced movement</article-title>. <source>Front. Hum. Neurosci.</source> <volume>8</volume>:<fpage>903</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnhum.2014.00903</pub-id></citation></ref>
<ref id="ref12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheang</surname> <given-names>H. S.</given-names></name> <name><surname>Pell</surname> <given-names>M. D.</given-names></name></person-group> (<year>2008</year>). <article-title>The sound of sarcasm</article-title>. <source>Speech Commun.</source> <volume>50</volume>, <fpage>366</fpage>&#x2013;<lpage>381</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.specom.2007.11.003</pub-id></citation></ref>
<ref id="ref13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coath</surname> <given-names>M.</given-names></name> <name><surname>Denham</surname> <given-names>S. L.</given-names></name> <name><surname>Smith</surname> <given-names>L. M.</given-names></name> <name><surname>Honing</surname> <given-names>H.</given-names></name> <name><surname>Hazan</surname> <given-names>A.</given-names></name> <name><surname>Holonowicz</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Model cortical responses for the detection of perceptual onsets and beat tracking in singing</article-title>. <source>Connect. Sci.</source> <volume>21</volume>, <fpage>193</fpage>&#x2013;<lpage>205</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09540090902733905</pub-id></citation></ref>
<ref id="ref14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cummins</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>The intersection of cognitive and sociocultural factors in the development of reading comprehension among immigrant students</article-title>. <source>Read. Writ.</source> <volume>25</volume>, <fpage>1973</fpage>&#x2013;<lpage>1990</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11145-010-9290-7</pub-id></citation></ref>
<ref id="ref15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cutler</surname> <given-names>A.</given-names></name> <name><surname>Butterfield</surname> <given-names>S.</given-names></name></person-group> (<year>1992</year>). <article-title>Rhythmic cues to speech segmentation: evidence from juncture misperception</article-title>. <source>J. Mem. Lang.</source> <volume>31</volume>, <fpage>218</fpage>&#x2013;<lpage>236</lpage>. doi: <pub-id pub-id-type="doi">10.1016/0749-596X(92)90012-</pub-id></citation></ref>
<ref id="ref16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cutler</surname> <given-names>A.</given-names></name> <name><surname>Foss</surname> <given-names>D. J.</given-names></name></person-group> (<year>1977</year>). <article-title>On the role of sentence stress in sentence processing</article-title>. <source>Lang. Speech</source> <volume>20</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi: <pub-id pub-id-type="doi">10.1177/002383097702000101</pub-id></citation></ref>
<ref id="ref17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dalla Bella</surname> <given-names>S.</given-names></name> <name><surname>Bia&#x0142;u&#x0144;ska</surname> <given-names>A.</given-names></name> <name><surname>Sowinski</surname> <given-names>J. S.</given-names></name></person-group> (<year>2013</year>). <article-title>Why movement is captured by music, but less by speech: role of temporal regularity</article-title>. <source>PLoS One</source> <volume>8</volume>:<fpage>e71945</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0071945</pub-id></citation></ref>
<ref id="ref18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Gregorio</surname> <given-names>C.</given-names></name> <name><surname>Valente</surname> <given-names>D.</given-names></name> <name><surname>Raimondi</surname> <given-names>T.</given-names></name> <name><surname>Torti</surname> <given-names>V.</given-names></name> <name><surname>Miaretsoa</surname> <given-names>L.</given-names></name> <name><surname>Friard</surname> <given-names>O.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Categorical rhythms in a singing primate</article-title>. <source>Curr. Biol.</source> <volume>31</volume>, <fpage>R1379</fpage>&#x2013;<lpage>R1380</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2021.09.032</pub-id>, PMID: <pub-id pub-id-type="pmid">34699799</pub-id></citation></ref>
<ref id="ref19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dilley</surname> <given-names>L. C.</given-names></name> <name><surname>McAuley</surname> <given-names>J. D.</given-names></name></person-group> (<year>2008</year>). <article-title>Distal prosodic context affects word segmentation and lexical processing</article-title>. <source>J. Mem. Lang.</source> <volume>59</volume>, <fpage>294</fpage>&#x2013;<lpage>311</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.jml.2008.06.006</pub-id></citation></ref>
<ref id="ref21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drake</surname> <given-names>C.</given-names></name></person-group> (<year>1998</year>). <article-title>Psychological processes involved in the temporal organization of complex auditory sequences: universal and acquired processes</article-title>. <source>Music. Percept.</source> <volume>16</volume>, <fpage>11</fpage>&#x2013;<lpage>26</lpage>. doi: <pub-id pub-id-type="doi">10.2307/40285774</pub-id></citation></ref>
<ref id="ref22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ellis</surname> <given-names>D. P. W.</given-names></name></person-group> (<year>2007</year>). <article-title>Beat tracking by dynamic programming</article-title>. <source>J. New Music Res.</source> <volume>36</volume>, <fpage>51</fpage>&#x2013;<lpage>60</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09298210701653344</pub-id></citation></ref>
<ref id="ref23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Epstein</surname> <given-names>D.</given-names></name></person-group> (<year>1985</year>). <article-title>Tempo relations: a cross-cultural study</article-title>. <source>Music Theory Spectr</source> <volume>7</volume>, <fpage>34</fpage>&#x2013;<lpage>71</lpage>. doi: <pub-id pub-id-type="doi">10.2307/745880</pub-id></citation></ref>
<ref id="ref24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fitch</surname> <given-names>W. T.</given-names></name></person-group> (<year>2013</year>). <article-title>Rhythmic cognition in humans and animals: distinguishing meter and pulse perception</article-title>. <source>Front. Syst. Neurosci.</source> <volume>7</volume>:<fpage>68</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnsys.2013.00068</pub-id>, PMID: <pub-id pub-id-type="pmid">24198765</pub-id></citation></ref>
<ref id="ref25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fitch</surname> <given-names>W. T.</given-names></name></person-group> (<year>2016</year>). <article-title>Dance, music, meter and groove: a forgotten partnership</article-title>. <source>Front. Hum. Neurosci.</source> <volume>10</volume>:<fpage>64</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnhum.2016.00064</pub-id>, PMID: <pub-id pub-id-type="pmid">26973489</pub-id></citation></ref>
<ref id="ref26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fitch</surname> <given-names>W. T.</given-names></name> <name><surname>Rosenfeld</surname> <given-names>A. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Perception and production of syncopated rhythms</article-title>. <source>Music. Percept.</source> <volume>25</volume>, <fpage>43</fpage>&#x2013;<lpage>58</lpage>. doi: <pub-id pub-id-type="doi">10.1525/mp.2007.25.1.43</pub-id></citation></ref>
<ref id="ref27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fraisse</surname> <given-names>P.</given-names></name></person-group> (<year>1982</year>). &#x201C;<article-title>Rhythm and tempo</article-title>&#x201D; in <source>Psychology of music</source>. ed. <person-group person-group-type="editor"><name><surname>Deutsch</surname> <given-names>D.</given-names></name></person-group> (<publisher-loc>New York</publisher-loc>: <publisher-name>Academic Press</publisher-name>), <fpage>149</fpage>&#x2013;<lpage>180</lpage>.</citation></ref>
<ref id="ref28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gordon</surname> <given-names>R. L.</given-names></name> <name><surname>Shivers</surname> <given-names>C. M.</given-names></name> <name><surname>Wieland</surname> <given-names>E. A.</given-names></name> <name><surname>Kotz</surname> <given-names>S. A.</given-names></name> <name><surname>Yoder</surname> <given-names>P. J.</given-names></name> <name><surname>McAuley</surname> <given-names>J. D.</given-names></name></person-group> (<year>2014</year>). <article-title>Musical rhythm discrimination explains individual differences in grammar skills in children</article-title>. <source>Dev. Sci.</source> <volume>18</volume>, <fpage>635</fpage>&#x2013;<lpage>644</lpage>. doi: <pub-id pub-id-type="doi">10.1111/desc.12230</pub-id></citation></ref>
<ref id="ref29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goswami</surname> <given-names>U.</given-names></name> <name><surname>Leong</surname> <given-names>V.</given-names></name></person-group> (<year>2013</year>). <article-title>Speech rhythm and temporal structure: converging perspectives?</article-title> <source>Lab. Phonol.</source> <volume>4</volume>, <fpage>67</fpage>&#x2013;<lpage>92</lpage>. doi: <pub-id pub-id-type="doi">10.1515/lp-2013-0004</pub-id></citation></ref>
<ref id="ref30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grabe</surname> <given-names>E.</given-names></name> <name><surname>Low</surname> <given-names>E.</given-names></name></person-group> (<year>2002</year>). <article-title>Durational variability in speech and the rhythm class hypothesis</article-title>. <source>Lab. Phonol.</source> <volume>7</volume>, <fpage>515</fpage>&#x2013;<lpage>546</lpage>. doi: <pub-id pub-id-type="doi">10.1515/9783110197105.2.515</pub-id></citation></ref>
<ref id="ref31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grahn</surname> <given-names>J. A.</given-names></name> <name><surname>Brett</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Rhythm and beat perception in motor areas of the brain</article-title>. <source>J. Cogn. Neurosci.</source> <volume>19</volume>, <fpage>893</fpage>&#x2013;<lpage>906</lpage>. doi: <pub-id pub-id-type="doi">10.1162/jocn.2007.19.5.893</pub-id></citation></ref>
<ref id="ref32">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Grosche</surname> <given-names>P.</given-names></name> <name><surname>M&#x00FC;ller</surname> <given-names>M.</given-names></name> <name><surname>Sapp</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). What makes beat tracking difficult? A case study on Chopin mazurkas. In: <italic>Proceedings of the 11th International Society for Music Information Retrieval Conference</italic>, ISMIR 2010, Utrecht, Netherlands. 649&#x2013;654.</citation></ref>
<ref id="ref33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gygi</surname> <given-names>B.</given-names></name> <name><surname>Kidd</surname> <given-names>G. R.</given-names></name> <name><surname>Watson</surname> <given-names>C. S.</given-names></name></person-group> (<year>2004</year>). <article-title>Spectral-temporal factors in the identification of environmental sounds</article-title>. <source>J. Acoust. Soc. Am.</source> <volume>115</volume>, <fpage>1252</fpage>&#x2013;<lpage>1265</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.1635840</pub-id>, PMID: <pub-id pub-id-type="pmid">15058346</pub-id></citation></ref>
<ref id="ref34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hannon</surname> <given-names>E. E.</given-names></name></person-group> (<year>2009</year>). <article-title>Perceiving speech rhythm in music: listeners classify instrumental songs according to language of origin</article-title>. <source>Cognition</source> <volume>111</volume>, <fpage>403</fpage>&#x2013;<lpage>409</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cognition.2009.03.003</pub-id>, PMID: <pub-id pub-id-type="pmid">19358985</pub-id></citation></ref>
<ref id="ref35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hannon</surname> <given-names>E. E.</given-names></name> <name><surname>L&#x00E9;v&#x00EA;que</surname> <given-names>Y.</given-names></name> <name><surname>Nave</surname> <given-names>K. M.</given-names></name> <name><surname>Trehub</surname> <given-names>S. E.</given-names></name></person-group> (<year>2016</year>). <article-title>Exaggeration of language-specific rhythms in English and French children's songs</article-title>. <source>Front. Psychol.</source> <volume>7</volume>:<fpage>939</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2016.00939</pub-id>, PMID: <pub-id pub-id-type="pmid">27445907</pub-id></citation></ref>
<ref id="ref36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hawkins</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Situational influences on rhythmicity in speech, music, and their interaction</article-title>. <source>Philos. Trans. R. Soc. B</source> <volume>369</volume>:<fpage>20130398</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.2013.0398</pub-id>, PMID: <pub-id pub-id-type="pmid">25385776</pub-id></citation></ref>
<ref id="ref37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hay</surname> <given-names>J. S. F.</given-names></name> <name><surname>Diehl</surname> <given-names>R. L.</given-names></name></person-group> (<year>2007</year>). <article-title>Perception of rhythmic grouping: testing the iambic/trochaic law</article-title>. <source>Percept. Psychophys.</source> <volume>69</volume>, <fpage>113</fpage>&#x2013;<lpage>122</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03194458</pub-id>, PMID: <pub-id pub-id-type="pmid">17515221</pub-id></citation></ref>
<ref id="ref38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>H&#x00E9;bert</surname> <given-names>S.</given-names></name> <name><surname>Peretz</surname> <given-names>I.</given-names></name></person-group> (<year>1997</year>). <article-title>Recognition of music in long-term memory: are melodic and temporal patterns equal partners?</article-title> <source>Mem. Cognit.</source> <volume>25</volume>, <fpage>518</fpage>&#x2013;<lpage>533</lpage>. doi: <pub-id pub-id-type="doi">10.3758/BF03201127</pub-id>, PMID: <pub-id pub-id-type="pmid">9259629</pub-id></citation></ref>
<ref id="ref203">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Helfrich</surname> <given-names>R. F.</given-names></name> <name><surname>Fiebelkorn</surname> <given-names>I. C.</given-names></name> <name><surname>Szczepanski</surname> <given-names>S. M.</given-names></name> <name><surname>Lin</surname> <given-names>J. J.</given-names></name> <name><surname>Parvizi</surname> <given-names>J.</given-names></name> <name><surname>Knight</surname> <given-names>R. T.</given-names></name></person-group> (<year>2018</year>). <article-title>Neural mechanisms of sustained attention are rhythmic</article-title>. <source>Neuron</source> <volume>99</volume>, <fpage>854</fpage>&#x2013;<lpage>865.e5</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuron.2018.07.032</pub-id></citation></ref>
<ref id="ref39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henry</surname> <given-names>M. J.</given-names></name> <name><surname>Herrmann</surname> <given-names>B.</given-names></name> <name><surname>Grahn</surname> <given-names>J. A.</given-names></name></person-group> (<year>2017</year>). <article-title>What can we learn about beat perception by comparing brain signals and stimulus envelopes?</article-title> <source>PLoS One</source> <volume>12</volume>:<fpage>e0172454</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0172454</pub-id>, PMID: <pub-id pub-id-type="pmid">28225796</pub-id></citation></ref>
<ref id="ref40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hilton</surname> <given-names>C. B.</given-names></name> <name><surname>Moser</surname> <given-names>C. J.</given-names></name> <name><surname>Bertolo</surname> <given-names>M.</given-names></name> <name><surname>Lee-Rubin</surname> <given-names>H.</given-names></name> <name><surname>Amir</surname> <given-names>D.</given-names></name> <name><surname>Bainbridge</surname> <given-names>C. M.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Acoustic regularities in infant-directed speech and song across cultures</article-title>. <source>Nat. Hum. Behav.</source> <volume>6</volume>, <fpage>1545</fpage>&#x2013;<lpage>1556</lpage>. doi: <pub-id pub-id-type="doi">10.1038/s41562-022-01410-x</pub-id>, PMID: <pub-id pub-id-type="pmid">35851843</pub-id></citation></ref>
<ref id="ref41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Honing</surname> <given-names>H.</given-names></name></person-group> (<year>2012</year>). <article-title>Without it no music: beat induction as a fundamental musical trait</article-title>. <source>Ann. N. Y. Acad. Sci.</source> <volume>1252</volume>, <fpage>85</fpage>&#x2013;<lpage>91</lpage>. doi: <pub-id pub-id-type="doi">10.1111/j.1749-6632.2011.06402.x</pub-id></citation></ref>
<ref id="ref204">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jacoby</surname> <given-names>N.</given-names></name> <name><surname>McDermott</surname> <given-names>J. H.</given-names></name></person-group> (<year>2017</year>). <article-title>Integer ratio priors on musical rhythm revealed cross-culturally by iterated reproduction</article-title>. <source>Curr. Biol.</source> <volume>27</volume>, <fpage>359</fpage>&#x2013;<lpage>370</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2016.12.031</pub-id></citation></ref>
<ref id="ref43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jadoul</surname> <given-names>Y.</given-names></name> <name><surname>Ravignani</surname> <given-names>A.</given-names></name> <name><surname>Thompson</surname> <given-names>B.</given-names></name> <name><surname>Filippi</surname> <given-names>P.</given-names></name> <name><surname>de Boer</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>Seeking temporal predictability in speech: comparing statistical approaches on 18 world languages</article-title>. <source>Front. Hum. Neurosci.</source> <volume>10</volume>:<fpage>e00586</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnhum.2016.00586</pub-id>, PMID: <pub-id pub-id-type="pmid">27994544</pub-id></citation></ref>
<ref id="ref205">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>M. R.</given-names></name> <name><surname>Kidd</surname> <given-names>G.</given-names></name> <name><surname>Wetzel</surname> <given-names>R.</given-names></name></person-group> (<year>1981</year>). <article-title>Evidence for rhythmic attention</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform.</source> <volume>7</volume>, <fpage>1059</fpage>&#x2013;<lpage>1073</lpage>. doi: <pub-id pub-id-type="doi">10.1037//0096-1523.7.5.1059</pub-id></citation></ref>
<ref id="ref44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>M. R.</given-names></name> <name><surname>Summerell</surname> <given-names>L.</given-names></name> <name><surname>Marshburn</surname> <given-names>E.</given-names></name></person-group> (<year>1987</year>). <article-title>Recognizing melodies: a dynamic interpretation</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>39</volume>, <fpage>89</fpage>&#x2013;<lpage>121</lpage>. doi: <pub-id pub-id-type="doi">10.1080/02724988743000051</pub-id></citation></ref>
<ref id="ref45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jusczyk</surname> <given-names>P. W.</given-names></name></person-group> (<year>2002</year>). <article-title>How infants adapt speech-processing capacities to native-language structure</article-title>. <source>Curr. Dir. Psychol. Sci.</source> <volume>11</volume>, <fpage>15</fpage>&#x2013;<lpage>18</lpage>. doi: <pub-id pub-id-type="doi">10.1111/1467-8721.00159</pub-id></citation></ref>
<ref id="ref46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kochanski</surname> <given-names>G.</given-names></name> <name><surname>Orphanidou</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>What marks the beat of speech?</article-title> <source>J. Acoust. Soc. Am.</source> <volume>123</volume>, <fpage>2780</fpage>&#x2013;<lpage>2791</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.2890742</pub-id>, PMID: <pub-id pub-id-type="pmid">18529194</pub-id></citation></ref>
<ref id="ref47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kotz</surname> <given-names>S. A.</given-names></name> <name><surname>Ravignani</surname> <given-names>A.</given-names></name> <name><surname>Fitch</surname> <given-names>W. T.</given-names></name></person-group> (<year>2018</year>). <article-title>The evolution of rhythm processing</article-title>. <source>Trends Cogn. Sci.</source> <volume>22</volume>, <fpage>896</fpage>&#x2013;<lpage>910</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tics.2018.08.002</pub-id></citation></ref>
<ref id="ref206">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Large</surname> <given-names>E. W.</given-names></name></person-group> (<year>2008</year>). &#x201C;<article-title>Resonating to musical rhythm: theory and experiment</article-title>&#x201D; in <source>Psychol. Time</source>, <fpage>189</fpage>&#x2013;<lpage>232</lpage>. doi: <pub-id pub-id-type="doi">10.1016/B978-0-08046-977-5.00006-5</pub-id></citation></ref>
<ref id="ref48">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Lartillot</surname> <given-names>O.</given-names></name> <name><surname>Eerola</surname> <given-names>T.</given-names></name> <name><surname>Toiviainen</surname> <given-names>P.</given-names></name> <name><surname>Fornari</surname> <given-names>J.</given-names></name></person-group> (<year>2008</year>). Multi-feature modeling of pulse clarity: design, validation and optimization. In: <italic>9th International Conference on Music Information Retrieval</italic>. Philadelphia, USA. 521&#x2013;526.</citation></ref>
<ref id="ref49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lartillot</surname> <given-names>O.</given-names></name> <name><surname>Grandjean</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Tempo and metrical analysis by tracking multiple metrical levels using autocorrelation</article-title>. <source>Appl. Sci.</source> <volume>9</volume>:<fpage>5121</fpage>. doi: <pub-id pub-id-type="doi">10.3390/app9235121</pub-id></citation></ref>
<ref id="ref50">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Lartillot</surname> <given-names>O.</given-names></name> <name><surname>Toiviainen</surname> <given-names>P.</given-names></name></person-group> (<year>2007</year>). MIR in Matlab (II): a toolbox for musical feature extraction from audio. In: <italic>Proceedings of the 10th International Conference on Digital Audio Effects</italic>. Bordeaux, France. 127&#x2013;130.</citation></ref>
<ref id="ref51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lense</surname> <given-names>M. D.</given-names></name> <name><surname>Lad&#x00E1;nyi</surname> <given-names>E.</given-names></name> <name><surname>Rabinowitch</surname> <given-names>T.-C.</given-names></name> <name><surname>Trainor</surname> <given-names>L. J.</given-names></name> <name><surname>Gordon</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>Rhythm and timing as vulnerabilities in neurodevelopmental disorders</article-title>. <source>Phil. Trans. R. Soc.</source> <volume>376</volume>:<fpage>20200327</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.2020.0327</pub-id>, PMID: <pub-id pub-id-type="pmid">34420385</pub-id></citation></ref>
<ref id="ref52">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Leong</surname> <given-names>V.</given-names></name></person-group> (<year>2012</year>). Prosodic rhythm in the speech amplitude envelope: Amplitude modulation phase hierarchies (AMPHs) and AMPH models. Doctoral dissertation. University of Cambridge, Cambridge.</citation></ref>
<ref id="ref54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lidji</surname> <given-names>P.</given-names></name> <name><surname>Palmer</surname> <given-names>C.</given-names></name> <name><surname>Peretz</surname> <given-names>I.</given-names></name> <name><surname>Morningstar</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Listeners feel the beat: entrainment to English and French speech rhythms</article-title>. <source>Psychon. Bull. Rev.</source> <volume>18</volume>, <fpage>1035</fpage>&#x2013;<lpage>1041</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-011-0163-0</pub-id>, PMID: <pub-id pub-id-type="pmid">21912999</pub-id></citation></ref>
<ref id="ref55">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Liutkus</surname> <given-names>A.</given-names></name> <name><surname>St&#x00F6;ter</surname> <given-names>F.-R.</given-names></name> <name><surname>Rafii</surname> <given-names>Z.</given-names></name> <name><surname>Kitamura</surname> <given-names>D.</given-names></name> <name><surname>Rivet</surname> <given-names>B.</given-names></name> <name><surname>Ito</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2017</year>). The 2016 signal separation evaluation campaign. In: <italic>13th International Conference on Latent Variable Analysis and Signal Separation</italic>. Grenoble, France.</citation></ref>
<ref id="ref56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marcus</surname> <given-names>S. M.</given-names></name></person-group> (<year>1981</year>). <article-title>Acoustic determinants of perceptual center (P-center) location</article-title>. <source>Percept. Psychophys.</source> <volume>30</volume>, <fpage>247</fpage>&#x2013;<lpage>256</lpage>. doi: <pub-id pub-id-type="doi">10.3758/bf03214280</pub-id>, PMID: <pub-id pub-id-type="pmid">7322800</pub-id></citation></ref>
<ref id="ref57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matthews</surname> <given-names>T. E.</given-names></name> <name><surname>Witek</surname> <given-names>M.</given-names></name> <name><surname>Heggli</surname> <given-names>O. A.</given-names></name> <name><surname>Penhune</surname> <given-names>V. B.</given-names></name> <name><surname>Vuust</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>The sensation of groove is affected by the interaction of rhythmic and harmonic complexity</article-title>. <source>PLoS One</source> <volume>14</volume>:<fpage>e0204539</fpage>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0204539</pub-id>, PMID: <pub-id pub-id-type="pmid">30629596</pub-id></citation></ref>
<ref id="ref58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>McAuley</surname> <given-names>J. D.</given-names></name></person-group> (<year>2010</year>). &#x201C;<article-title>Tempo and rhythm</article-title>&#x201D; in <source>Music perception</source>. eds. <person-group person-group-type="editor"><name><surname>Jones</surname> <given-names>M. R.</given-names></name> <name><surname>Fay</surname> <given-names>R. R.</given-names></name> <name><surname>Popper</surname> <given-names>A. N.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>165</fpage>&#x2013;<lpage>199</lpage>.</citation></ref>
<ref id="ref59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McKinney</surname> <given-names>M.</given-names></name> <name><surname>Moelants</surname> <given-names>D.</given-names></name> <name><surname>Davies</surname> <given-names>M.</given-names></name> <name><surname>Klapuri</surname> <given-names>A.</given-names></name></person-group> (<year>2007</year>). <article-title>Evaluation of audio beat tracking and music tempo extraction algorithms</article-title>. <source>J. New Music Res.</source> <volume>36</volume>, <fpage>1</fpage>&#x2013;<lpage>16</lpage>. doi: <pub-id pub-id-type="doi">10.1080/09298210701653252</pub-id></citation></ref>
<ref id="ref207">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morey</surname> <given-names>R. D.</given-names></name></person-group> (<year>2008</year>). <article-title>Confidence intervals from normalized data: a correction to Cousineau (2005)</article-title>. <source>Tutor. Quant. Methods Psychol.</source> <volume>4</volume>, <fpage>61</fpage>&#x2013;<lpage>64</lpage>. doi: <pub-id pub-id-type="doi">10.20982/tqmp.04.2.p061</pub-id></citation></ref>
<ref id="ref61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morton</surname> <given-names>J.</given-names></name> <name><surname>Marcus</surname> <given-names>S.</given-names></name> <name><surname>Frankish</surname> <given-names>C.</given-names></name></person-group> (<year>1976</year>). <article-title>Perceptual centers (P-centers)</article-title>. <source>Psychol. Rev.</source> <volume>83</volume>, <fpage>405</fpage>&#x2013;<lpage>408</lpage>. doi: <pub-id pub-id-type="doi">10.1037/0033-295X.83.5.405</pub-id></citation></ref>
<ref id="ref62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nazzi</surname> <given-names>T.</given-names></name> <name><surname>Jusczyk</surname> <given-names>P. W.</given-names></name> <name><surname>Johnson</surname> <given-names>E. K.</given-names></name></person-group> (<year>2000</year>). <article-title>Language discrimination by English-learning 5-month-olds: effects of rhythm and familiarity</article-title>. <source>J. Mem. Lang.</source> <volume>43</volume>, <fpage>1</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1006/jmla.2000.2698</pub-id></citation></ref>
<ref id="ref63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nolan</surname> <given-names>F.</given-names></name> <name><surname>Jeon</surname> <given-names>H. S.</given-names></name></person-group> (<year>2014</year>). <article-title>Speech rhythm: a metaphor?</article-title> <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>369</volume>:<fpage>20130396</fpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.2013.0396</pub-id>, PMID: <pub-id pub-id-type="pmid">25385774</pub-id></citation></ref>
<ref id="ref64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Norton</surname> <given-names>P.</given-names></name> <name><surname>Scharff</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>"Bird song metronomics": isochronous organization of zebra finch song rhythm</article-title>. <source>Front. Neurosci.</source> <volume>10</volume>:<fpage>309</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fnins.2016.00309</pub-id>, PMID: <pub-id pub-id-type="pmid">27458334</pub-id></citation></ref>
<ref id="ref208">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ozaki</surname> <given-names>Y.</given-names></name> <name><surname>Kloots</surname> <given-names>M. de H.</given-names></name> <name><surname>Ravignani</surname> <given-names>A.</given-names></name> <name><surname>Savage</surname> <given-names>P. E.</given-names></name></person-group> (<year>2023</year>). <article-title>Cultural evolution of music and language</article-title>. <source>PsyArXiv</source>. doi: <pub-id pub-id-type="doi">10.31234/osf.io/s7apx</pub-id></citation></ref>
<ref id="ref65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parncutt</surname> <given-names>R.</given-names></name></person-group> (<year>1994</year>). <article-title>A perceptual model of pulse salience and metrical accent in musical rhythms</article-title>. <source>Music. Percept.</source> <volume>11</volume>, <fpage>409</fpage>&#x2013;<lpage>464</lpage>. doi: <pub-id pub-id-type="doi">10.2307/40285633</pub-id></citation></ref>
<ref id="ref66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A.</given-names></name></person-group> (<year>2003</year>). <article-title>Rhythm in language and music</article-title>. <source>Ann. N. Y. Acad. Sci.</source> <volume>999</volume>, <fpage>140</fpage>&#x2013;<lpage>143</lpage>. doi: <pub-id pub-id-type="doi">10.1196/annals.1284.015</pub-id></citation></ref>
<ref id="ref67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A. D.</given-names></name> <name><surname>Daniele</surname> <given-names>J. R.</given-names></name></person-group> (<year>2003</year>). <article-title>An empirical comparison of rhythm in language and music</article-title>. <source>Cognition</source> <volume>87</volume>, <fpage>B35</fpage>&#x2013;<lpage>B45</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0010-0277(02)00187-7</pub-id>, PMID: <pub-id pub-id-type="pmid">12499110</pub-id></citation></ref>
<ref id="ref68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>A. D.</given-names></name> <name><surname>Iversen</surname> <given-names>J. R.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Repp</surname> <given-names>B. H.</given-names></name></person-group> (<year>2005</year>). <article-title>The influence of metricality and modality on synchronization with a beat</article-title>. <source>Exp. Brain Res.</source> <volume>163</volume>, <fpage>226</fpage>&#x2013;<lpage>238</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00221-004-2159-8</pub-id>, PMID: <pub-id pub-id-type="pmid">15654589</pub-id></citation></ref>
<ref id="ref69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Polyanskaya</surname> <given-names>L.</given-names></name> <name><surname>Ordin</surname> <given-names>M.</given-names></name> <name><surname>Busa</surname> <given-names>M. G.</given-names></name></person-group> (<year>2017</year>). <article-title>Relative salience of speech rhythm and speech rate on perceived foreign accent in a second language</article-title>. <source>Lang. Speech</source> <volume>60</volume>, <fpage>333</fpage>&#x2013;<lpage>355</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0023830916648720</pub-id>, PMID: <pub-id pub-id-type="pmid">28915779</pub-id></citation></ref>
<ref id="ref70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pompino-Marschall</surname> <given-names>B.</given-names></name></person-group> (<year>1989</year>). <article-title>On the psychoacoustic nature of the P-center phenomenon</article-title>. <source>J. Phon.</source> <volume>17</volume>, <fpage>175</fpage>&#x2013;<lpage>192</lpage>. doi: <pub-id pub-id-type="doi">10.1016/S0095-4470(19)30428-0</pub-id></citation></ref>
<ref id="ref71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Povel</surname> <given-names>D. J.</given-names></name> <name><surname>Essens</surname> <given-names>P.</given-names></name></person-group> (<year>1985</year>). <article-title>Perception of temporal patterns</article-title>. <source>Music. Percept.</source> <volume>2</volume>, <fpage>411</fpage>&#x2013;<lpage>440</lpage>. doi: <pub-id pub-id-type="doi">10.2307/40285311</pub-id></citation></ref>
<ref id="ref209">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Qualtrics</surname></name></person-group> (<year>2021</year>). Available at: <ext-link xlink:href="https://www.qualtrics.com" ext-link-type="uri">https://www.qualtrics.com</ext-link></citation></ref>
<ref id="ref72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramus</surname> <given-names>F.</given-names></name> <name><surname>Nespor</surname> <given-names>M.</given-names></name> <name><surname>Mehler</surname> <given-names>J.</given-names></name></person-group> (<year>1999</year>). <article-title>Correlates of linguistic rhythm in the speech signal</article-title>. <source>Cognition</source> <volume>73</volume>, <fpage>265</fpage>&#x2013;<lpage>292</lpage>. doi: <pub-id pub-id-type="doi">10.1016/s0010-0277(99)00058-x</pub-id></citation></ref>
<ref id="ref73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rathcke</surname> <given-names>T.</given-names></name> <name><surname>Lin</surname> <given-names>C.</given-names></name> <name><surname>Falk</surname> <given-names>S.</given-names></name> <name><surname>Dalla Bella</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Tapping into linguistic rhythm</article-title>. <source>Lab. Phonol.</source> <volume>12</volume>:<fpage>11</fpage>. doi: <pub-id pub-id-type="doi">10.5334/labphon.248</pub-id></citation></ref>
<ref id="ref74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ravignani</surname> <given-names>A.</given-names></name> <name><surname>Madison</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>The paradox of isochrony in the evolution of human rhythm</article-title>. <source>Front. Psychol.</source> <volume>8</volume>:<fpage>1820</fpage>. doi: <pub-id pub-id-type="doi">10.3389/fpsyg.2017.01820</pub-id>, PMID: <pub-id pub-id-type="pmid">29163252</pub-id></citation></ref>
<ref id="ref75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ravignani</surname> <given-names>A.</given-names></name> <name><surname>Norton</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>Measuring rhythmic complexity: a primer to quantify and compare temporal structure in speech, movement, and animal vocalizations</article-title>. <source>J. Lang. Evol.</source> <volume>2</volume>, <fpage>4</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1093/jole/lzx002</pub-id></citation></ref>
<ref id="ref76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roeske</surname> <given-names>T. C.</given-names></name> <name><surname>Tchernichovski</surname> <given-names>O.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name> <name><surname>Jacoby</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>Categorical rhythms are shared between songbirds and humans</article-title>. <source>Curr. Biol.</source> <volume>30</volume>, <fpage>3544</fpage>&#x2013;<lpage>3555.e6</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cub.2020.06.072</pub-id></citation></ref>
<ref id="ref77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosen</surname> <given-names>S.</given-names></name></person-group> (<year>1992</year>). <article-title>Temporal information in speech: acoustic, auditory and linguistic aspects</article-title>. <source>Philos. Trans. Biol. Sci.</source> <volume>336</volume>, <fpage>367</fpage>&#x2013;<lpage>373</lpage>. doi: <pub-id pub-id-type="doi">10.1098/rstb.1992.0070</pub-id>, PMID: <pub-id pub-id-type="pmid">1354376</pub-id></citation></ref>
<ref id="ref78">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rothenberg</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <source>Bug music: How insects gave us rhythm and noise</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>St. Martin's Press</publisher-name>.</citation></ref>
<ref id="ref79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rowland</surname> <given-names>J.</given-names></name> <name><surname>Kasdan</surname> <given-names>A.</given-names></name> <name><surname>Poeppel</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>There is music in repetition: looped segments of speech and nonspeech induce the perception of music in a time-dependent manner</article-title>. <source>Psychon. Bull. Rev.</source> <volume>26</volume>, <fpage>583</fpage>&#x2013;<lpage>590</lpage>. doi: <pub-id pub-id-type="doi">10.3758/s13423-018-1527-5</pub-id>, PMID: <pub-id pub-id-type="pmid">30238294</pub-id></citation></ref>
<ref id="ref80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>S. K.</given-names></name></person-group> (<year>1998</year>). <article-title>The point of P-centres</article-title>. <source>Psychol. Res.</source> <volume>61</volume>, <fpage>4</fpage>&#x2013;<lpage>11</lpage>. doi: <pub-id pub-id-type="doi">10.1007/PL00008162</pub-id></citation></ref>
<ref id="ref81">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shannon</surname> <given-names>R. V.</given-names></name> <name><surname>Zeng</surname> <given-names>F.-G.</given-names></name> <name><surname>Kamath</surname> <given-names>V.</given-names></name> <name><surname>Wygonski</surname> <given-names>J.</given-names></name> <name><surname>Ekelid</surname> <given-names>M.</given-names></name></person-group> (<year>1995</year>). <article-title>Speech recognition with primarily temporal cues</article-title>. <source>Science</source> <volume>270</volume>, <fpage>303</fpage>&#x2013;<lpage>304</lpage>. doi: <pub-id pub-id-type="doi">10.1126/science.270.5234.303</pub-id></citation></ref>
<ref id="ref82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simchy-Gross</surname> <given-names>R.</given-names></name> <name><surname>Margulis</surname> <given-names>E. H.</given-names></name></person-group> (<year>2018</year>). <article-title>The sound-to-music illusion: repetition can musicalize nonspeech sounds</article-title>. <source>Music Sci.</source> <volume>1</volume>:<fpage>205920431773199</fpage>. doi: <pub-id pub-id-type="doi">10.1177/2059204317731992</pub-id></citation></ref>
<ref id="ref83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Southwell</surname> <given-names>R.</given-names></name> <name><surname>Chait</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Enhanced deviant responses in patterned relative to random sound sequences</article-title>. <source>Cortex</source> <volume>109</volume>, <fpage>92</fpage>&#x2013;<lpage>103</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.cortex.2018.08.032</pub-id>, PMID: <pub-id pub-id-type="pmid">30312781</pub-id></citation></ref>
<ref id="ref84">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suppanen</surname> <given-names>E.</given-names></name> <name><surname>Huotilainen</surname> <given-names>M.</given-names></name> <name><surname>Ylinen</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Rhythmic structure facilitates learning from auditory input in newborn infants</article-title>. <source>Infant Behav. Dev.</source> <volume>57</volume>:<fpage>101346</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.infbeh.2019.101346</pub-id>, PMID: <pub-id pub-id-type="pmid">31491617</pub-id></citation></ref>
<ref id="ref85">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>J.</given-names></name> <name><surname>Taylor</surname> <given-names>A.</given-names></name></person-group> (<year>1806</year>). <source>Rhymes for the nursery</source>. <publisher-loc>London</publisher-loc>: <publisher-name>Arthur Hall, Virtue, &#x0026; Co</publisher-name>, <fpage>24</fpage>.</citation></ref>
<ref id="ref86">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tierney</surname> <given-names>A.</given-names></name> <name><surname>Patel</surname> <given-names>A. D.</given-names></name> <name><surname>Breen</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Acoustic foundations of the speech-to-song illusion</article-title>. <source>J. Exp. Psychol. Gen.</source> <volume>147</volume>, <fpage>888</fpage>&#x2013;<lpage>904</lpage>. doi: <pub-id pub-id-type="doi">10.1037/xge0000455</pub-id>, PMID: <pub-id pub-id-type="pmid">29888940</pub-id></citation></ref>
<ref id="ref87">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trainor</surname> <given-names>L. J.</given-names></name> <name><surname>Chang</surname> <given-names>A.</given-names></name> <name><surname>Cairney</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Y.-C.</given-names></name></person-group> (<year>2018</year>). <article-title>Is auditory perceptual timing a core deficit of developmental coordination disorder?</article-title> <source>Ann. N. Y. Acad. Sci.</source> <volume>1423</volume>, <fpage>30</fpage>&#x2013;<lpage>39</lpage>. doi: <pub-id pub-id-type="doi">10.1111/nyas.13701</pub-id>, PMID: <pub-id pub-id-type="pmid">29741273</pub-id></citation></ref>
<ref id="ref88">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turk</surname> <given-names>A.</given-names></name> <name><surname>Shattuck-Hufnagel</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>What is speech rhythm? A commentary on Arvaniti and Rodriquez, Krivokapi&#x0107;, and Goswami and Leong</article-title>. <source>Lab. Phonol.</source> <volume>4</volume>, <fpage>93</fpage>&#x2013;<lpage>118</lpage>. doi: <pub-id pub-id-type="doi">10.1515/lp-2013-0005</pub-id></citation></ref>
<ref id="ref89">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Van Handel</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). &#x201C;<article-title>Trends in/over time: rhythm in speech and musical melody in 19th-century art song</article-title>&#x201D; in <source>Sound and music computing, 2006</source> (<publisher-loc>Marseille, France</publisher-loc>).</citation></ref>
<ref id="ref91">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vanden Bosch Der Nederlanden</surname> <given-names>C. M.</given-names></name> <name><surname>Joanisse</surname> <given-names>M. F.</given-names></name> <name><surname>Grahn</surname> <given-names>J. A.</given-names></name> <name><surname>Snijders</surname> <given-names>T. M.</given-names></name> <name><surname>Schoffelen</surname> <given-names>J.-M.</given-names></name></person-group> (<year>2022a</year>). <article-title>Familiarity modulates neural tracking of sung and spoken utterances</article-title>. <source>Neuroimage</source> <volume>252</volume>:<fpage>119049</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.neuroimage.2022.119049</pub-id>, PMID: <pub-id pub-id-type="pmid">35248707</pub-id></citation></ref>
<ref id="ref92">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vanden Bosch Der Nederlanden</surname> <given-names>C. M.</given-names></name> <name><surname>Qi</surname> <given-names>X.</given-names></name> <name><surname>Sequeira</surname> <given-names>S.</given-names></name> <name><surname>Seth</surname> <given-names>P.</given-names></name> <name><surname>Grahn</surname> <given-names>J. A.</given-names></name> <name><surname>Joanisse</surname> <given-names>M. F.</given-names></name> <etal/></person-group>. (<year>2022b</year>). <article-title>Developmental changes in the categorization of speech and song</article-title>. <source>Dev. Sci.</source> <fpage>e13346</fpage>. doi: <pub-id pub-id-type="doi">10.1111/desc.13346</pub-id>, PMID: <pub-id pub-id-type="pmid">36419407</pub-id></citation></ref>
<ref id="ref93">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Villing</surname> <given-names>R. C.</given-names></name> <name><surname>Ward</surname> <given-names>T.</given-names></name> <name><surname>Timoney</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <source>A review of P-Centre models</source>. [Conference presentation]. Rhythm Production and Perception Workshop, <publisher-loc>Dublin, Ireland</publisher-loc>.</citation></ref>
<ref id="ref94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vos</surname> <given-names>J.</given-names></name> <name><surname>Rasch</surname> <given-names>R.</given-names></name></person-group> (<year>1981</year>). <article-title>The perceptual onset of musical tones</article-title>. <source>Percept. Psychophys.</source> <volume>29</volume>, <fpage>323</fpage>&#x2013;<lpage>335</lpage>. doi: <pub-id pub-id-type="doi">10.3758/bf03207341</pub-id></citation></ref>
<ref id="ref95">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weineck</surname> <given-names>K.</given-names></name> <name><surname>Wen</surname> <given-names>O. X.</given-names></name> <name><surname>Henry</surname> <given-names>M. J.</given-names></name></person-group> (<year>2022</year>). <article-title>Neural synchronization is strongest to the spectral flux of slow music and depends on familiarity and beat salience</article-title>. <source>eLife</source> <volume>11</volume>:<fpage>e75515</fpage>. doi: <pub-id pub-id-type="doi">10.7554/eLife.75515</pub-id>, PMID: <pub-id pub-id-type="pmid">36094165</pub-id></citation></ref>
<ref id="ref96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Werker</surname> <given-names>J. F.</given-names></name> <name><surname>Byers-Heinlein</surname> <given-names>K.</given-names></name></person-group> (<year>2008</year>). <article-title>Bilingualism in infancy: first steps in perception and comprehension</article-title>. <source>Trends Cogn. Sci.</source> <volume>12</volume>, <fpage>144</fpage>&#x2013;<lpage>151</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.tics.2008.01.008</pub-id>, PMID: <pub-id pub-id-type="pmid">18343711</pub-id></citation></ref>
<ref id="ref97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>White</surname> <given-names>L.</given-names></name> <name><surname>Mattys</surname> <given-names>S. L.</given-names></name></person-group> (<year>2007</year>). <article-title>Calibrating rhythm: first language and second language studies</article-title>. <source>J. Phon.</source> <volume>35</volume>, <fpage>501</fpage>&#x2013;<lpage>522</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.wocn.2007.02.003</pub-id></citation></ref>
<ref id="ref98">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wiget</surname> <given-names>L.</given-names></name> <name><surname>White</surname> <given-names>L.</given-names></name> <name><surname>Schuppler</surname> <given-names>B.</given-names></name> <name><surname>Grenon</surname> <given-names>I.</given-names></name> <name><surname>Rauch</surname> <given-names>O.</given-names></name> <name><surname>Mattys</surname> <given-names>S.</given-names></name></person-group> (<year>2010</year>). <article-title>How stable are acoustic metrics of contrastive speech rhythm?</article-title> <source>J. Acoust. Soc. Am.</source> <volume>127</volume>, <fpage>1559</fpage>&#x2013;<lpage>1569</lpage>. doi: <pub-id pub-id-type="doi">10.1121/1.3293004</pub-id>, PMID: <pub-id pub-id-type="pmid">20329856</pub-id></citation></ref>
<ref id="ref99">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>J.</given-names></name> <name><surname>Al-Aidroos</surname> <given-names>N.</given-names></name> <name><surname>Turk-Browne</surname> <given-names>N. B.</given-names></name></person-group> (<year>2013</year>). <article-title>Attention is spontaneously biased toward regularities</article-title>. <source>Psychol. Sci.</source> <volume>24</volume>, <fpage>667</fpage>&#x2013;<lpage>677</lpage>. doi: <pub-id pub-id-type="doi">10.1177/0956797612460407</pub-id>, PMID: <pub-id pub-id-type="pmid">23558552</pub-id></citation></ref>
</ref-list>
</back>
</article>