Differential sensitivity to speech rhythms in young and older adults

Pearson, Dylan V.; Shen, Yi; McAuley, J. Devin; Kidd, Gary R.

doi:10.3389/fpsyg.2023.1160236

ORIGINAL RESEARCH article

Front. Psychol., 12 May 2023

Sec. Auditory Cognitive Neuroscience

Volume 14 - 2023 | https://doi.org/10.3389/fpsyg.2023.1160236

Dylan V. Pearson¹

Yi Shen²

J. Devin McAuley³

Gary R. Kidd¹^*

¹Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, United States
²Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, United States
³Department of Psychology, Michigan State University, East Lansing, MI, United States

Sensitivity to the temporal properties of auditory patterns tends to be poorer in older listeners, and this has been hypothesized to be one factor contributing to their poorer speech understanding. This study examined sensitivity to speech rhythms in young and older normal-hearing subjects, using a task designed to measure the effect of speech rhythmic context on the detection of changes in the timing of word onsets in spoken sentences. A temporal-shift detection paradigm was used in which listeners were presented with an intact sentence followed by two versions of the sentence in which a portion of speech was replaced with a silent gap: one with correct gap timing (the same duration as the missing speech) and one with altered gap timing (shorter or longer than the duration of the missing speech), resulting in an early or late resumption of the sentence after the gap. The sentences were presented with either an intact rhythm or an altered rhythm preceding the silent gap. Listeners judged which sentence had the altered gap timing, and thresholds for the detection of deviations from the correct timing were calculated separately for shortened and lengthened gaps. Both young and older listeners demonstrated lower thresholds in the intact rhythm condition than in the altered rhythm conditions. However, shortened gaps led to lower thresholds than lengthened gaps for the young listeners, while older listeners were not sensitive to the direction of the change in timing. These results show that both young and older listeners rely on speech rhythms to generate temporal expectancies for upcoming speech events. However, the absence of lower thresholds for shortened gaps among the older listeners indicates a change in speech-timing expectancies with age. 
A further examination of individual differences within the older group revealed that those with better rhythm-discrimination abilities (from a separate study) tended to show the same heightened sensitivity to early events observed with the young listeners.

Introduction

Difficulty hearing speech in complex environments is one of the most common problems reported by older listeners. Although hearing loss accounts for much of this difficulty, many studies have found that audibility cannot fully account for differences in speech-in-noise performance (Gordon-Salant and Fitzgibbons, 1993; Humes et al., 1994; Pichora-Fuller et al., 1995; Frisina and Frisina, 1997; Gordon-Salant, 2005; Vermiglio et al., 2012; Carroll et al., 2016; Holmes and Griffiths, 2019). Decreases in cognitive abilities and temporal-processing abilities are two of the most often cited factors accounting for increased difficulty understanding speech in noise among older listeners (Humes et al., 2013; Füllgrabe et al., 2015; Gieseler et al., 2017; Nuesse et al., 2018). Among cognitive abilities, working memory and attentional control have often been found to be important for speech understanding, especially under difficult listening conditions (Humes, 2007; Akeroyd, 2008; Houtgast and Festen, 2008; Humes and Dubno, 2010; Humes et al., 2013; Tierney et al., 2020). Although there is good agreement that temporal processing abilities also have a significant influence on the ability to understand speech in noise, especially in the presence of modulated noise, competing speech, or other time-varying sounds (Sommers and Humes, 1993; Dubno et al., 2002; George et al., 2006, 2007; Houtgast and Festen, 2008; Humes and Dubno, 2010; Fitzgibbons and Gordon-Salant, 2010a,b), it has been difficult to establish a clear connection between specific measures of temporal processing ability and speech understanding. Among young normal-hearing (NH) listeners, differences in neither spectral nor temporal resolving power (measured with nonspeech stimuli) account for individual differences in speech understanding in Gaussian noise (Karlin, 1942; Surprenant and Watson, 2001; Watson and Kidd, 2002; Kidd et al., 2007). 
Among older listeners, the situation is less clear; age-related deficits in temporal sensitivity that may be linked to speech understanding are frequently found (e.g., Fitzgibbons and Gordon-Salant, 1994, 2004, 2011; Schneider et al., 1994; Snell and Frisina, 2000; Gordon-Salant and Fitzgibbons, 2004; Lister and Tarver, 2004; Humes et al., 2009, 2013; Gallego Hiroyasu and Yotsumoto, 2020; Humes, 2021), but the causal role of temporal-processing deficits is difficult to determine, due to correlations with other age-related factors, such as hearing loss, cognitive decline, and a general perceptual slowing (see Schneider et al., 2005; Humes and Dubno, 2010; Humes et al., 2013). Notably, temporal processing deficits among the elderly tend to be greater when measured in the context of a temporal sequence than with isolated events (e.g., Fitzgibbons and Gordon-Salant, 1995; Gordon-Salant and Fitzgibbons, 1999; Fitzgibbons and Gordon-Salant, 2001), suggesting that the primary influence of temporal processing abilities on speech understanding may be related to the ability to attentionally track temporal patterns, rather than differences in basic temporal resolving power.

Although many studies have shown a decrease in temporal processing abilities with age, relatively little is known about the effect of age on sensitivity to suprasegmental timing patterns or speech rhythm. It is clear that prosody is important for speech understanding (see Cutler and Swinney, 1987; Cutler et al., 1997; Fletcher, 2010), and that temporal aspects of prosody influence speech perception in young and older listeners (e.g., Hoyte et al., 2009). Both word segmentation and phoneme identification are affected by the manipulation of suprasegmental timing (Martin, 1972; Kidd, 1989; Cutler and Mehler, 1993; Dilley and McAuley, 2008). However, few studies have measured sensitivity to changes in the timing of speech events in running speech in both young and older adults. This is the focus of the present study.

The temporal envelope of naturally produced speech exhibits a quasi-rhythmic structure which provides sufficient predictability for listeners to create temporal expectations that influence speech perception (Kidd, 1989; Dilley and McAuley, 2008). Similarly, a more regular rhythmic structure in an auditory signal has been demonstrated to facilitate perception (Aubanel et al., 2016; Wang et al., 2018; Shen and Pearson, 2019), and degrading the natural rhythm of speech has been found to decrease speech understanding in noise and multi-talker backgrounds (McAuley et al., 2020, 2021). However, it is not clear how the ability to use the rhythmic regularity of speech to aid speech understanding may change with age, or how important this ability is for speech understanding compared to other age-related changes in temporal processing or cognitive abilities.

A potential explanation of how sensitivity to timing and rhythmic structure affects speech understanding is based on dynamic attending theory (DAT, see Jones, 1976; Jones and Boltz, 1989; Large and Jones, 1999). DAT proposes that rhythms in the environment (i.e., stimulus rhythms) serve to entrain (synchronize) natural temporal fluctuations in listeners’ attention. This stimulus-driven attentional synchronization focuses pulses of attentional energy at periodic time intervals that align with rhythmically salient points in the stimulus, thus facilitating the perception of events that occur at these time points. Support for DAT has been found in studies showing better discrimination and detection of events that occur at rhythmically expected times than those that are early or late relative to rhythmic expectations (McAuley and Kidd, 1998; Jones et al., 2002; McAuley and Jones, 2003; Jones and McAuley, 2005; Miller et al., 2013; McAuley and Fromboluti, 2014).

Speech rhythm entrainment can also help explain some effects of speech context where expectations set by the temporal context of preceding speech can influence word segmentation and lexical processing (Kidd, 1989; Dilley and McAuley, 2008; Morrill et al., 2014; Baese-Berk et al., 2019). In these studies, changes in speech rhythms or tempos established early in a spoken sentence create expectancies that influence the perception of later-occurring words or syllables, despite the absence of temporal changes in their local context. Several neurophysiological studies have also provided support for DAT by showing that synchrony between cortical oscillations and speech rhythms is important for the understanding of speech, and for the separation of a single speech stream from background sounds (e.g., Ahissar et al., 2001; Luo and Poeppel, 2007; Giraud and Poeppel, 2012; Golumbic et al., 2012; see Peelle and Davis, 2012, for a review).

The current study provides another test of DAT using a task that measures listeners’ ability to detect deviations from the natural timing of events in spoken sentences with intact or rhythmically altered timing prior to the temporal deviation. The task is to judge whether a spoken sentence that is briefly inaudible (replaced by silence) continues at the correct time when audibility returns. The onset of the sentence continuation after the silent period occurs either at the correct time (as though the sentence had continued without interruption), or it is temporally shifted, occurring slightly earlier or later than in the intact sentence. The task is performed with sentences that are either rhythmically intact or rhythmically altered prior to the silent interruption. A comparison of judgment accuracy with a rhythmically altered vs. intact early sentence provides a measure of the listener’s ability to use the speech rhythm in the earlier part of the sentence to predict the timing of later-occurring speech events. The study includes both young and older normal-hearing listeners to determine whether the ability to use speech rhythm to predict the timing of upcoming speech events changes with age.

According to DAT, the natural rhythms of speech facilitate attentional entrainment and lead to temporal expectations about the onsets of upcoming speech events. Thus, if the predictable speech rhythms are disrupted early in a spoken sentence, listeners will have difficulty anticipating the onsets of later-occurring events in the sentence. Therefore, the ability to detect a temporal shift in sentence timing after the silent interruption should be degraded when the rhythm of the preceding speech is altered. A comparison of young and older listeners’ abilities to detect temporal deviations with intact and rhythmically altered sentence contexts will help to determine the extent to which older listeners’ speech understanding problems may be due to changes in the ability to use speech rhythms to predict the onset of upcoming speech events. If older listeners are less able to use rhythmic context to guide temporal expectations, their performance should be poorer with rhythmically intact sentence contexts and they should be less affected by rhythmic alterations. Additionally, an examination of sensitivity to late onsets vs. early onsets after the brief silent period will help us evaluate the symmetry of the temporal expectancies and may reveal differences in the temporal expectancies generated by young vs. older listeners. Earlier studies with nonspeech stimuli have found an asymmetry in the perception of unexpectedly early and late events (e.g., Halpern and Darwin, 1982; McAuley and Kidd, 1998; McAuley and Fromboluti, 2014; Di Luca and Rhodes, 2016). In the present context, a comparison of detection accuracy for early and late deviations has the potential to provide insight into the nature of rhythm-based temporal expectancies in speech perception, and to show how rhythmic sensitivity may change with age.

Finally, most of the older subjects in this study participated in a large test battery as part of a separate study. The test battery was designed to examine the relation between rhythm perception and speech perception using a variety of speech and non-speech measures. Three tasks were selected from the battery based on their focus on temporal and rhythm processing and possible connection with the perception of speech rhythm. The first was a gap detection task (GAP), where listeners were given a fixed set of gap detection trials and performance was recorded as percent correct. The second was a rhythm discrimination (RD) task, where listeners were presented two rhythms and made a same/different judgment, and discrimination sensitivity was measured using d’. The third was a synchronization and continuation (S&C) tapping task where listeners tapped in synchrony with an isochronous auditory stimulus presented at different tempi and then continued tapping at the same tempo after the stimulus stopped. These tasks were used to determine whether these temporal/rhythm abilities might be associated with the older listeners’ sensitivity to sentence timing as measured in the present study. A measure of working memory from the earlier test battery was also included to evaluate a non-rhythmic cognitive ability as a predictor of performance in the current study.

Materials and methods

Participants

Twenty-one native English speakers were recruited from the Bloomington, Indiana area to participate in the experiment. The young cohort consisted of 11 participants (7 female) ranging from 18 to 26 years (mean age: 20.8) recruited from the student population at Indiana University. The older cohort consisted of 10 participants (7 female) ranging from 59 to 71 years (mean age: 63.2). All listeners had normal hearing as defined by audiometric thresholds equal to or better than 25 dB HL (ANSI, 2004) from 250 through 8,000 Hz in both ears, with the exception of one older listener who had thresholds at 8,000 Hz of 35 dB HL (right ear) and 55 dB HL (left ear). (The definition of normal hearing used here is consistent with that specified by WHO, 1991.) Older subjects showed no signs of cognitive impairment (Mini Mental Status Exam, MMSE, > 25; Folstein et al., 1975), and all subjects had English as their native language.

Listeners were compensated at an hourly rate for their participation and informed consent was obtained prior to data collection. Ethical approval (IRB#2007541750) was obtained from the institutional review board at Indiana University.

Stimuli

This study used the sentence “Ready Charlie go to white six now” from the Coordinate Response Measure (CRM) corpus (Bolia et al., 2000), presented at 68 dB SPL. The sentence was read by two different speakers (1 male, 1 female) who alternated across trials. Each trial consisted of a reference sentence (the intact original sentence), followed by two comparison sentences in which the “white six” portion of the sentence was replaced with a silent gap. One comparison sentence had a silent gap equal to the duration of the missing speech and the other comparison sentence included a slightly longer or shorter silent gap. The task was to indicate which comparison sentence had the incorrect silent gap, resulting in the word “now” occurring too late or too early (see Figure 1). The order of the two comparison sentences (correct vs. incorrect silent duration) was randomized across trials. The inter-stimulus interval between sentences randomly varied between 400 and 800 ms to prevent any potential temporal expectation of the “now” onset based on timing regularity across sentences within a trial. The unaltered silent gap, T, between “to” and “now” (with “white six” removed) was 1,661 ms for the male speaker and 1,603 ms for the female speaker. The temporally altered comparison sentence had either a shortened gap duration (T − ΔT) or a lengthened gap duration (T + ΔT), leading to the onset of “now” occurring unexpectedly early or late, respectively, relative to the timing of the reference (unaltered) sentence. The values of ΔT for the early-onset and late-onset conditions were independently varied adaptively (as a proportion of the reference duration) within a block of trials.
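As a quick illustration of the gap arithmetic described above, the following sketch computes the shortened and lengthened gap durations for both talkers. The reference durations come from the text; the deviation of 0.25 and the function name `altered_gaps` are assumptions chosen only for illustration and are not values or code from the study.

```python
# Reference gap durations T (ms) reported in the text for each talker.
REFERENCE_GAP_MS = {"male": 1661.0, "female": 1603.0}


def altered_gaps(reference_ms, deviation):
    """Return (early, late) gap durations for a proportional deviation dT/T."""
    delta = deviation * reference_ms  # dT in ms
    return reference_ms - delta, reference_ms + delta


# Hypothetical deviation of 0.25 (25% of T); not a value from the study.
for talker, t_ms in REFERENCE_GAP_MS.items():
    early, late = altered_gaps(t_ms, 0.25)
    print(f"{talker}: early-onset gap = {early:.2f} ms, late-onset gap = {late:.2f} ms")
```

A deviation of 1.0 in the early-onset direction removes the gap entirely, which is why the early-onset deviation cannot exceed 1.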

Figure 1. A diagram showing a late-onset trial (top) and an early-onset trial (bottom). An intact reference sentence is presented first, followed by two comparison sentences; one with a temporally unaltered gap and one with a temporally altered gap, presented in random order. T represents the duration of the “white six” portion of the sentence (blue boxes) and ΔT represents the change in duration (red boxes indicate durations that have been altered by ΔT).

To evaluate the influence of the speech rhythm in the early part of the sentences on the ability to detect temporal shifts in the onsets of later-occurring words, this study included a condition in which sentence rhythm was altered using a rhythm alteration that maintained intelligibility (and some degree of naturalness) while disturbing the natural speech rhythm. The rhythm alteration was applied to the early portion of the reference and comparison sentences (i.e., “Ready Charlie go to”) by first dividing that portion of the sentence into 50-ms frames. The speech in these frames was then independently compressed or expanded according to a sinusoidal modulator. The adjusted new frame duration relative to the original frame duration (50 ms) is given by:

\frac{\text{New Frame Duration}}{\text{Original Frame Duration}} = 1 + m \sin(2\pi f_m t + \phi),

where m, f_m, and ϕ are the modulation depth, modulation rate and the initial phase of the modulator, respectively.

The modulation depth dictated the degree of rhythm alteration. At a modulation depth of 100%, the new frame duration would be double the original frame duration at the peak of the sinusoidal modulator and it would be compressed to 0 ms at the trough of the modulator. At a modulation depth of 0%, the new frame duration would always be equal to the original frame duration. In the altered-rhythm condition in the current experiment, the modulation depth was set to 75%. This value was selected, based on earlier work, to introduce a salient rhythm alteration while maintaining good intelligibility (McAuley et al., 2020).

The modulation rate determined the frequency of the alternating compression and expansion within the early portion of the sentence. For this experiment, two modulation rates were used: a low rate, which consisted of one modulator cycle, and a high rate, which consisted of three modulator cycles. Thus, there was either one cycle of sinusoidal shortening and lengthening, or three cycles of shortening and lengthening in the early portion of the sentence. Figure 2 shows the effect of the high- and low-rate rhythm alteration on the waveform of the sentence. In this example, the Altered (Low-rate) sentence has the lengthening period at the start of the sentence, which can be seen in the lengthening of the word “ready” compared to the unaltered sentence. The shortening toward the end of the early sentence can be seen in the “-arlie go” portion of the sentence. The Altered (High-rate) sentence has smaller periods of duration change that are less salient visually, but the shortening and lengthening can be seen, for example, within the first word of the sentence, where the first syllable “Rea-” is shortened and the second syllable “-dy” is lengthened. This modulation process ensured that for every 50-ms frame that was lengthened, another frame was shortened by the same amount, thus keeping the overall duration of the rhythmically altered portion of the sentence identical to the unaltered version (see the red box in Figure 2) regardless of the rhythm alteration applied.

Figure 2. Example waveforms of the reference sentence in each of the three rhythm conditions. The red box highlights the portion of the sentences that are rhythmically altered. The black box indicates the unaltered portion.

The phase of the modulator determined the proportion of compression or expansion at a given point in the cycle. The initial phase of the modulator was randomly drawn from eight different values (0 to 7π/4, with a π/4 spacing) with equal probabilities.
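The frame-duration rule can be sketched in a few lines. This is only an illustration under stated assumptions: the modulator is sampled once per frame, it completes a whole number of cycles across the altered portion, and the early portion is taken to be 24 frames (1.2 s) long, a hypothetical length not given in the text. The function name is likewise hypothetical, and the resampling of the audio itself is not shown.

```python
import math

FRAME_MS = 50.0  # original frame duration from the text


def altered_frame_durations(n_frames, depth, cycles, phase):
    """New duration (ms) of each frame under the sinusoidal modulator.

    Assumption: the modulator completes `cycles` whole cycles across the
    n_frames frames and is sampled once per frame (at the frame start).
    """
    durations = []
    for k in range(n_frames):
        angle = 2.0 * math.pi * cycles * k / n_frames + phase
        durations.append(FRAME_MS * (1.0 + depth * math.sin(angle)))
    return durations


# The eight initial phases described in the text: 0 to 7*pi/4 in pi/4 steps.
phases = [i * math.pi / 4 for i in range(8)]

# Hypothetical 24-frame early portion; depth 0.75 as in the altered condition.
low_rate = altered_frame_durations(24, 0.75, 1, phases[0])   # one cycle
high_rate = altered_frame_durations(24, 0.75, 3, phases[2])  # three cycles

print(min(low_rate), max(low_rate))   # shortest and longest frame (ms)
print(sum(low_rate), sum(high_rate))  # total duration of each altered portion
```

Because a sinusoid sampled evenly over whole cycles sums to zero, the total duration stays at the original 24 × 50 ms whatever the depth, rate, or initial phase, which matches the duration-preserving property described above.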

Procedure

Listeners were seated in a sound-attenuated booth in front of a computer monitor while auditory signals were presented diotically through insert headphones. During the temporal-shift detection task, participants listened to an intact sentence followed by two presentations of the same sentence in which two consecutive late-occurring words (“white six”) were replaced with a silent period that was either equal to, shorter than, or longer than the duration of those words in the intact sentences. The task was to identify which of the two comparison sentences with gaps was the one with the temporally shifted final-word onset. The order of the three rhythm conditions (unaltered, low-rate, high-rate) was randomly drawn for each participant. Each rhythm condition consisted of three consecutive blocks of 40 trials. Each block included an equal number of early- and late-onset trials presented in a random order, preventing participants from identifying the altered sentence based on the total sentence duration. Total testing time for each subject was approximately 3 h in two sessions, with neither session exceeding 2 h. The gap deviations (ΔT/T) for the early- and late-onset trials within a block were varied using two interleaved 2-down 1-up adaptive tracks. To create an easily detected starting point, the initial ΔT/T values were 1.0 (i.e., a 100% decrease in the standard gap, resulting in no gap) for the early-onset condition and 1.5 (i.e., a 150% increase in the standard gap) for the late-onset condition. The gap deviation was bounded between 0 and 1 for the early-onset condition, and between 0 and 2 for the late-onset condition. The step size was half the gap deviation until the second reversal and ¼ of the gap deviation for the remainder of the block. When the gap deviation reached the upper or lower bound, that value was repeated until two correct responses caused a change in the opposite direction. 
In each block, reversals were tracked separately for early- and late-onset trials. A temporal-shift detection threshold was estimated for each of the early- and late-onset trials for each block by taking the average gap deviation of the last four reversals, and the mean threshold across three trial blocks was computed for each subject in each condition.
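The adaptive rule for a single track can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the step is taken as a fraction of the current deviation (0.5, then 0.25 once two reversals have been recorded), a reversal is logged as the deviation at which the track changed direction, and the function name `run_track` and the response sequence are hypothetical.

```python
def run_track(responses, start, lower, upper):
    """Simulate one 2-down 1-up track over a list of correct/incorrect responses.

    Returns (history, reversals, threshold), where history holds the deviation
    presented on each trial and threshold is the mean of the last four
    reversals (None if fewer than four occurred).
    """
    value = start
    consecutive_correct = 0
    last_direction = 0  # -1 = made harder, +1 = made easier, 0 = no move yet
    reversals = []
    history = []
    for correct in responses:
        history.append(value)
        direction = 0
        if correct:
            consecutive_correct += 1
            if consecutive_correct == 2:  # two in a row: decrease deviation
                consecutive_correct = 0
                direction = -1
        else:  # any error: increase deviation
            consecutive_correct = 0
            direction = +1
        if direction == 0:
            continue
        if last_direction != 0 and direction != last_direction:
            reversals.append(value)
        last_direction = direction
        # Assumed step rule: half the current value, then a quarter of it.
        fraction = 0.5 if len(reversals) < 2 else 0.25
        value = min(upper, max(lower, value + direction * fraction * value))
    threshold = sum(reversals[-4:]) / 4 if len(reversals) >= 4 else None
    return history, reversals, threshold


# Hypothetical early-onset track: start at 1.0, bounded between 0 and 1.
answers = [True, True, True, True, False, True, True, False, True, True]
history, reversals, threshold = run_track(answers, start=1.0, lower=0.0, upper=1.0)
print(reversals, threshold)
```

Clamping to the bounds reproduces the behaviour described above: an error at the upper bound leaves the deviation unchanged until two correct responses move it down. In the experiment, two such tracks (early and late onset) were interleaved within each block.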

Results

Figure 3 shows temporal-shift detection thresholds (ΔT/T) for early and late temporal shifts for the three rhythm modulation conditions for the young and older listener groups. A 2 (Age: Young vs. Older) × 3 (Rhythm Modulation: Unaltered, Altered Low-Rate, Altered High-Rate) × 2 (Temporal-Shift Direction: Early onset vs. Late onset) mixed-measures analysis of variance (ANOVA) on detection thresholds revealed no main effect of Age, F(1, 19) = 0.024, p = 0.88, η² = 0.001, but main effects of Rhythm Modulation, F(1.46, 27.78) = 6.14, p = 0.011, η² = 0.241, and Temporal Shift Direction, F(1, 19) = 16.22, p = 0.001, η² = 0.460, as well as an interaction between Temporal Shift Direction and Age, F(1, 19) = 4.616, p = 0.045, η² = 0.191. There were no other interactions (all p’s > 0.19).

Figure 3. Temporal-shift detection thresholds (where a lower threshold indicates more accurate shift detection) for the young (upper panels) and older (lower panels) listener groups, plotted for all three rhythm conditions. Thresholds (ΔT/T) for the early-onset conditions are shown in the panels on the left, with late-onset thresholds on the right. Mean thresholds are shown by blue dots (connected by lines) and thresholds for all individual subjects are shown by red dots. Error bars represent the standard error of the mean.

The lack of a main effect of age shows that thresholds, overall, do not reliably differ between the young adult group (M = 0.42, SD = 0.20, 95% CI = 0.29–0.54) and the older adult group (M = 0.43, SD = 0.17, 95% CI = 0.30–0.54). With respect to the main effect of Rhythm Modulation, post-hoc t-tests show that detection thresholds are lower (better) for the unaltered rhythm modulation condition (M = 0.37, SD = 0.17) than those for the altered low-rate condition [M = 0.46, SD = 0.21, t(20) = −3.05, p = 0.006, Cohen’s d = −0.66] and those for the altered high-rate condition [M = 0.44, SD = 0.22, t(20) = −3.67, p = 0.002, Cohen’s d = −0.80], but that thresholds for the two altered rhythm conditions do not significantly differ [t(20) = 0.53, p = 0.6, Cohen’s d = 0.12]. With respect to the main effect of Temporal Shift Direction, detection thresholds are reliably lower (better) for early onsets (M = 0.35, SD = 0.19) than for late onsets (M = 0.50, SD = 0.22). Post-hoc paired t-tests investigating the interaction between Direction and Age reveal that young adults show significantly lower thresholds for early onsets compared to late onsets [Early, M = 0.30, SD = 0.17; Late, M = 0.53, SD = 0.27; t(10) = −4.85, p = 0.001, Cohen’s d = −1.46], but that older adults show no difference in temporal-shift detection thresholds between the early and late onset conditions [Early, M = 0.39, SD = 0.22; Late, M = 0.46, SD = 0.16; t(9) = −1.2, p = 0.26, Cohen’s d = −0.38].
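For readers who want to relate the reported t statistics to the effect sizes, the short check below recomputes Cohen's d from each paired t value. The text does not state which formula was used; the sketch assumes the common paired-samples relation d = t/√n, with n taken as the degrees of freedom plus one. The reported values agree with that relation to within rounding.

```python
import math

# (comparison, reported t, sample size n, reported Cohen's d), all from the text.
REPORTED = [
    ("unaltered vs. low-rate", -3.05, 21, -0.66),
    ("unaltered vs. high-rate", -3.67, 21, -0.80),
    ("low-rate vs. high-rate", 0.53, 21, 0.12),
    ("young: early vs. late", -4.85, 11, -1.46),
    ("older: early vs. late", -1.2, 10, -0.38),
]


def d_from_paired_t(t_value, n):
    """Effect size for a paired design, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n)


for label, t_value, n, reported_d in REPORTED:
    computed = d_from_paired_t(t_value, n)
    print(f"{label}: computed d = {computed:.3f}, reported d = {reported_d}")
```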

Individual differences among listeners

Although this is not an individual differences study, an examination of the performance of individual participants can help with the interpretation of the results. This is especially true in the current study, due to the relatively large individual differences within each age group. Although there was no main effect of age in this study, there was a significant interaction between Age and Temporal Shift Direction: only the younger group had significantly higher thresholds for late than for early onsets.

The individual differences in performance in this study are shown in Figure 4, which presents the data in terms of a late-early difference score (mean of late-onset thresholds minus the mean of early-onset thresholds) on the abscissa and a rhythm difference score (mean altered-rhythm thresholds minus the mean unaltered-rhythm thresholds) on the ordinate. It can be seen that among the older listeners (red dots, with ages indicated), four performed more like the younger listeners (blue dots) in terms of the difference between early and late thresholds. However, the rhythm difference scores for these older subjects were lower than those for many of the younger subjects with similar late-early difference scores, indicating somewhat less sensitivity to the rhythm manipulation despite the larger late-early difference scores.

Figure 4. Late-early difference scores (mean of thresholds for late onsets minus the mean of thresholds for early onsets) and rhythm difference scores (mean of thresholds for the altered rhythm conditions minus the mean of thresholds for the unaltered rhythm condition) for each subject. Ages are shown for the older subjects.
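The two difference scores can be computed from a listener's six condition thresholds as sketched below. The threshold values are made up for illustration and are not data from the study, and the sketch assumes that each mean is taken over all conditions sharing the relevant label (for example, the altered-rhythm mean pools both modulation rates and both shift directions).

```python
from statistics import mean


def difference_scores(thresholds):
    """Return (late_minus_early, altered_minus_unaltered) for one listener.

    `thresholds` maps (rhythm, direction) to a threshold dT/T, with rhythm in
    {"unaltered", "low", "high"} and direction in {"early", "late"}.
    """
    late = mean(v for (_, d), v in thresholds.items() if d == "late")
    early = mean(v for (_, d), v in thresholds.items() if d == "early")
    altered = mean(v for (r, _), v in thresholds.items() if r != "unaltered")
    unaltered = mean(v for (r, _), v in thresholds.items() if r == "unaltered")
    return late - early, altered - unaltered


# Hypothetical listener; values are illustrative only.
listener = {
    ("unaltered", "early"): 0.25, ("unaltered", "late"): 0.45,
    ("low", "early"): 0.35, ("low", "late"): 0.55,
    ("high", "early"): 0.30, ("high", "late"): 0.50,
}
late_early, rhythm_diff = difference_scores(listener)
print(round(late_early, 3), round(rhythm_diff, 3))
```

A positive late-early score means late onsets were harder to detect than early onsets; a positive rhythm difference score means the rhythm alteration raised thresholds.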

The four subjects in the older group with greater late-early difference scores (like those of the younger subjects) included the two oldest subjects (67 and 71 years), clearly showing that age alone does not account for their performance. To further examine individual differences among the older subjects we utilized data from an earlier study in which nine of the 10 older subjects had participated. The earlier study included a large test battery focusing on rhythm and speech perception. Three temporal tasks were selected from the battery, based on their focus on temporal and rhythm processing, and a working-memory measure was included to evaluate cognitive abilities. The first temporal task was a gap detection task (GAP), where listeners were asked to detect a gap (which varied over trials) in the middle of a 750-ms Gaussian noise signal. The second was a rhythm discrimination (RD) task, where listeners made a same/different judgment about two similar rhythms formed by sequences of 6–8 tone pulses with temporal patterns defined by the sequence of tone-pulse intervals. The third was a synchronization and continuation (S&C) tapping task, where listeners tapped in synchrony with an isochronous auditory sequence, presented at different tempi, and then continued tapping at the same tempo after the stimulus stopped. Performance measures for these tasks were GAP: percent correct; RD: d-prime; S&C: the slope parameter of a linear regression that captures the central (non-motor) variability in tapped rhythm. The working-memory measure was from a working-memory test battery (Lewandowsky et al., 2010) consisting of three working memory tests: Memory updating, Sentence span, and Spatial short-term memory. The mean performance across all three working-memory (WM) tasks was used as a general measure of an important cognitive ability for the older participants. The tasks from the earlier study are described in more detail in the Supplementary Appendix.

A correlation analysis including the GAP, S&C, RD, and WM scores from the earlier study and performance measures from the current study (late-early difference score, rhythm difference score, and overall mean performance) was conducted using the threshold data for the nine older subjects who had participated in the earlier study. The correlations and significance levels are shown in Table 1.
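The kind of correlation reported here can be sketched as follows. The text does not name the coefficient, so the sketch assumes a Pearson product-moment correlation tested with the usual t statistic on n − 2 degrees of freedom; the nine score pairs are invented for illustration and are not data from either study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical scores for nine listeners (illustrative only).
rd_dprime = [0.8, 1.1, 1.3, 1.6, 1.9, 2.0, 2.4, 2.7, 3.0]
late_early = [-0.05, 0.02, 0.00, 0.10, 0.08, 0.15, 0.12, 0.22, 0.25]

r = pearson_r(rd_dprime, late_early)
print(f"r = {r:.2f}, t(7) = {t_for_r(r, len(rd_dprime)):.2f}")
```

With only nine listeners, |r| must exceed roughly 0.67 to reach p < 0.05 (two-tailed), so only fairly strong associations can be detected in an analysis of this size.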

Table 1. Correlations between measures from the temporal-shift detection task and measures of temporal abilities and working memory from an earlier study, for the older participants in this study.

As seen in Table 1, three measures from the earlier study (GAP, S&C, and WM) were highly correlated with mean thresholds across all conditions in the present study. However, the one measure that was not correlated with overall performance (RD) was the only measure that was significantly correlated with the late-early difference score. (The scatterplot in Figure 5 shows an orderly association between these two measures.) The difference in performance between altered and unaltered rhythm was not significantly correlated with any of the measures from the earlier study. These results show that the differential sensitivity to early vs. late onsets (which was the only age-related effect observed in this study) is related to rhythm-discrimination ability, but not to other factors (as measured by GAP, S&C, and WM tasks) that are related to overall performance.

Figure 5. Scatterplot showing the correlation between performance in the rhythm discrimination task and the late-early difference scores (the difference between thresholds for late and early onsets) for the older listeners.

Discussion

Temporal-shift detection among young and older listeners

The main purpose of this study was to examine young and older listeners’ reliance on speech rhythm in predicting the onset of upcoming speech events. Participants detected temporal deviations in the onset of the final word in spoken sentences with and without alterations of the natural speech rhythm in the early portion of the sentences. Both young and older listeners were better at detecting temporal deviations (early and late onsets of the final word) when the rhythm of the early portion of the sentences was intact than when it was altered, and there was no difference in their overall performance levels. However, the younger group was significantly less accurate with late onsets than with early onsets, while older listeners were unaffected by the direction of the temporal shift.

These findings are consistent with a DAT framework and an entrainment timing model in which listeners’ internal (attentional) rhythm is entrained by an external, rhythmic stimulus (Jones, 1976; Jones and Boltz, 1989; McAuley and Kidd, 1998; Large and Jones, 1999; McAuley and Jones, 2003). The alteration of the natural rhythm disrupts listener entrainment and weakens the temporal expectations needed for optimal anticipation of the onset of upcoming speech events. In this experiment, listeners’ temporal expectations for the onset of the final word (“now”) are less precise without an intact natural rhythmic structure, making it more difficult to detect the temporal shift in the altered test sentences. This result is in line with existing literature showing that listeners have more difficulty with the perception of auditory patterns, both speech and nonspeech, with altered or irregular rhythmic structures (Rimmele et al., 2012; Aubanel et al., 2016; Wang et al., 2018; Shen and Pearson, 2019; McAuley et al., 2020, 2021). The present demonstration of the effect of rhythmic context on timing judgments in speech reinforces the notion that irregular or altered rhythms have a negative impact on understanding because of their effect on temporal expectancies. That irregular rhythms affect both temporal predictions (in the current study) and speech understanding (in previous work) suggests that entrainment to speech rhythm underlies both phenomena: a consistent, natural speech rhythm promotes entrainment which guides temporal expectancies which, in turn, facilitate the perception of speech events that occur at expected times.

Despite no overall differences in thresholds between the two age groups, there were significant differences in sensitivity to temporal deviations between young and older listeners. Young listeners demonstrated a significant asymmetry in detection thresholds, with consistently worse performance for late onsets than for early onsets. Related asymmetries have been demonstrated in previous work examining responses to deviations from an established temporal pattern in an acoustic signal. This is true for both the detection of a temporal deviation and for the perception of a stimulus presented earlier or later than expected. For example, in a study of the effect of deviations from expected timing on tempo judgments with isochronous tone sequences, McAuley and Kidd (1998) found that with relatively fast tempos, subjects were better at detecting tempo increases when a sequence was presented earlier than expected, but they were better at detecting tempo decreases with late-onset sequences. McAuley and Fromboluti (2014) found that the durations of unexpectedly early events were underestimated, while the durations of late events were overestimated. Another type of early/late asymmetry has been found in studies of the detection of temporally deviant onsets in isochronous sequences of clicks or brief tones: Listeners are differentially sensitive to early and late onsets, and relative performance for early and late onsets varies with the tempo of the sequence, with a slight advantage for early events at slower tempos (see Friberg and Sundberg, 1995). In a study of temporal-order judgment with auditory–visual pairs, Di Luca and Rhodes (2016) found that earlier-than-expected events were perceptually delayed, while late events were perceptually accelerated. An early/late asymmetry has also been observed in tapping tasks, where an unexpectedly early event in a guiding sequence disrupts tapping more than a late event, particularly at short inter-onset intervals (Repp, 2011; Repp and Moseley, 2012).

In the current study, the early-late asymmetry is consistent with expectations based on speech rhythm: a slowing tempo is often expected at the end of a sentence, whereas a sudden tempo increase at the end of a sentence is less likely (Delattre, 1966; Oller, 1973; Cooper and Paccia-Cooper, 1980), thus making the timing of a late onset seem correct and an early onset surprising and more salient. This type of asymmetry is consistent with a dynamic-attending account in which expectations are linked to an attentional rhythm that is guided by external timing, but also influenced by other factors, such as temporal expectations in a given context (e.g., speech) or pattern tempo relative to a preferred tempo. In addition to their speech-rate expectations, young listeners may be less sensitive to late events because of an asymmetry in the attentional pulse that facilitates perception of events that occur at expected temporal locations (see Jones, 1976; Large and Jones, 1999). That is, early events (earlier than expected) tend to fall outside the attentional pulse, but attention may be sustained beyond the expected event onset, especially when pattern slowing is expected (as with phrase-final slowing in speech and music). Early events that do not coincide with an attentional pulse are generally less well resolved, but the “surprise” of an unexpectedly early event can attract attention and enhance the detection of early events (see Large and Jones, 1999). However, despite the enhanced detection, the lack of attentional focus generally results in poorer resolving power and poorer identification of early events. Late-occurring events that are only slightly late can fall within the attentional pulse, especially in a context in which slowing may be expected. These events lack the surprise-based salience of early events, and the timing delay may not be as noticeable, but they are often well-resolved because listeners are still attentionally prepared.
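The idea of an attentional pulse that extends further past the expected onset than before it can be illustrated with a toy function. This sketch is not the oscillator model of Large and Jones (1999) and was not fitted to any data: the Gaussian shape, the function name, and the two width parameters are all assumptions chosen only to make the asymmetry visible.

```python
from math import exp


def attentional_level(deviation, early_width=0.05, late_width=0.15):
    """Toy attentional pulse centered on the expected onset (deviation = 0).

    deviation: actual onset minus expected onset, in seconds (negative = early).
    early_width, late_width: hypothetical pulse widths (seconds) on each side.
    Returns a value in (0, 1]; 1 means the event arrives at the pulse peak.
    """
    if early_width <= 0 or late_width <= 0:
        raise ValueError("pulse widths must be positive")
    # Use a narrower width before the expected onset than after it.
    width = early_width if deviation < 0 else late_width
    return exp(-0.5 * (deviation / width) ** 2)


# Compare events that are equally early and late (hypothetical 100-ms shifts).
for shift in (-0.10, 0.0, 0.10):
    print(f"shift = {shift:+.2f} s -> attentional level = {attentional_level(shift):.2f}")
```

Under these assumed widths, an event that is 100 ms late still receives substantial attention, whereas an equally early event does not. On the account given above, this would leave slightly late events well resolved but their delay less noticeable, while early events would owe their detectability to surprise rather than to attentional focus.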

The lack of an early/late event asymmetry in the older group suggests that there may be a decline in sensitivity to speech rhythms with age. That older listeners were affected by rhythmic alteration indicates that they were still attuned to speech rhythms and could make use of them, but the lack of an early-onset advantage suggests a weakening of speech-based expectations that result in an asymmetry in the sensitivity to early and late sentence-final events.

Although the older listeners, as a group, did not show a significant difference in their detection of early and late events, some older listeners did show an early/late asymmetry like that seen with young listeners. Those older listeners who showed the asymmetry also showed better rhythm-discrimination abilities with tone sequences in an earlier study (described above and in the Supplementary Appendix). This suggests that not all older listeners experience the same decrease in sensitivity to speech rhythms. Although the basis for this performance difference within the older group cannot be determined from the current findings, the results provide some encouragement for the continuing search for factors (e.g., auditory training, musical experience, exercise) that may reduce a decline in listening abilities with age. If, as the data suggest, age-related declines in listening abilities are related to a decrease in the ability to attentionally synchronize with speech rhythms (or external rhythms in general), then more experience with active listening to rhythmic stimuli (such as speech and music) may help preserve listening abilities as people age. Further research with a wider range of speech materials and larger groups of listeners with different listening experience will provide a better understanding of how sensitivity to speech rhythms affects speech perception and how that sensitivity changes with age.

Conclusion

Both young and older listeners were shown to rely on the rhythm of a spoken sentence when judging whether a sentence-final word was presented at the correct time after a brief muting of the sentence (a silent period replacing two words prior to the final word). This extends findings from earlier work showing that altering the natural rhythm of spoken sentences adversely affects speech understanding. The same type of rhythm alterations that led to poorer speech understanding in the earlier studies resulted in poorer detection of changes in the onset of the final word in the present study. This suggests that a decreased ability to predict the onset of upcoming speech events, resulting from an alteration of the natural speech rhythm, at least partly accounts for the poorer speech recognition performance observed with altered speech rhythms. These findings are consistent with dynamic attending theory, which proposes that speech understanding depends on the entrainment of attentional rhythms to speech rhythms, resulting in a facilitation of the perception of speech events that occur at expected times.

The findings also showed that young listeners’ temporal expectancies differed from those in the older group. Despite similar overall thresholds, young listeners were significantly worse at detecting late onsets of the final word in the sentence than early onsets, while the older group showed no significant difference in thresholds for early and late onsets. This suggests a greater reliance on speech-based expectancies with the younger listeners, whose performance was more consistent with an expectation of a slowing tempo at the end of a sentence. However, that some older listeners (those who performed better on a rhythm-discrimination task in an earlier study) showed an early-late asymmetry, like that observed with the younger group, suggests that not all older listeners undergo the same age-related change in their ability to use speech rhythms to guide temporal expectancies and facilitate speech understanding.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving human participants were reviewed and approved by the Institutional Review Board, Indiana University. The patients/participants provided their written informed consent to participate in this study.

Author contributions

GK and JM conceived the study. DP wrote the first draft and performed the statistical analysis with YS. All authors contributed to the design and to the manuscript revision, and all approved the submitted version.

Funding

This research was supported by the NIH (Grant Nos. R01DC013538 to PIs: GK and JM and R01DC017988 to PI: YS).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1160236/full#supplementary-material

References

Ahissar, E., Nagarajan, S., Ahissar, M., Protopapas, A., Mahncke, H., and Merzenich, M. M. (2001). Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 98, 13367–13372. doi: 10.1073/pnas.201400998

Akeroyd, M. A. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. Int. J. Audiol. 47, S53–S71. doi: 10.1080/14992020802301142

ANSI (2004). S3.6–2004, specification for audiometers. New York, NY: ANSI.

Aubanel, V., Davis, C., and Kim, J. (2016). Exploring the role of brain oscillations in speech perception in noise: intelligibility of isochronously retimed speech. Front. Hum. Neurosci. 10:430. doi: 10.3389/fnhum.2016.00430