
ORIGINAL RESEARCH article

Front. Neurosci., 06 February 2026

Sec. Auditory Cognitive Neuroscience

Volume 20 - 2026 | https://doi.org/10.3389/fnins.2026.1751421

Neural tracking of continuous speech reveals enhanced late responses to degraded speech

  • 1Department of Biomedical Engineering, University of Ulsan, Ulsan, Republic of Korea
  • 2Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea
  • 3Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, United States

Introduction: Comprehending degraded speech demands greater cognitive effort. While previous studies have identified the neural pathways involved in processing degraded speech signals, the temporal dynamics of these neural networks remain unclear.

Methods: This study investigated the time course of neural responses to clean and degraded (i.e., noise-vocoded) speech signals using temporal response functions (TRFs).

Results: Our findings reveal that early TRF components (N1TRF and P2TRF) exhibited greater amplitude and temporal precision for clean speech. In contrast, degraded speech elicited additional cortical responses with a longer delay, designated as P400TRF. Subsequent source localization analyses showed that the P400TRF component originates from language processing areas within the temporal and frontal lobes.

Discussion: These findings highlight the role of delayed neural mechanisms in maintaining speech comprehension when intelligibility is compromised, offering novel insights that broaden our understanding of auditory cortical processing under challenging listening conditions.

1 Introduction

When speech signals become noisy, distorted, or challenging to comprehend, the human auditory system compensates by combining high-level cognitive processes and low-level auditory mechanisms. The brain engages a complex network of neural systems to process degraded speech, integrating sensory, cognitive, and linguistic brain regions in a coordinated manner (Obleser and Kotz, 2010; Karunathilake et al., 2023b). Additionally, studies have highlighted that phase-locked responses in the auditory cortex are enhanced during speech comprehension, underscoring the role of neural synchronization in processing complex auditory information (Peelle et al., 2013; Hauswald et al., 2022; Kösem et al., 2023).

In the context of degraded speech, auditory cortical areas, such as the primary auditory cortex (A1) and secondary auditory regions, are primarily responsible for processing the degraded acoustic features. These regions encode basic elements like pitch, duration, and spectral properties, even when noise or reverberation is present. However, when speech intelligibility is compromised—due to background noise or low-quality signals—the brain recruits higher-level cortical regions, particularly in the prefrontal cortex and parietal areas, which involve attentional control, working memory, and top-down modulation. These regions enable the brain to fill perceptual gaps, stabilize auditory representations, and improve the listener’s ability to extract meaningful speech information (Davis and Johnsrude, 2007; Obleser and Kotz, 2010; Karunathilake et al., 2023b). Moreover, robust cortical entrainment to the speech envelope, which relies on the spectro-temporal fine structure, plays a critical role in maintaining speech intelligibility in challenging environments (Ding et al., 2014; Hauswald et al., 2022).

The neural mechanisms extend beyond simple auditory processing by integrating semantic and linguistic expectations to enhance speech intelligibility. For instance, the left inferior frontal gyrus (IFG), associated with syntactic and semantic processing, has been shown to become more active when listeners are attempting to make sense of speech under challenging conditions (Obleser et al., 2007; Peelle et al., 2010; Peelle and Wingfield, 2022). Additionally, the cortical entrainment response is significantly influenced by prior knowledge and speech intelligibility, indicating that listeners’ expectations and contextual understanding are vital for effective neural tracking of speech (Baltzell et al., 2017; Karunathilake et al., 2023b).

While temporal response function (TRF) studies have predominantly used extended speech stimuli, recent research supports the efficacy of shorter sentences (Vanthornhout et al., 2019; Das et al., 2020; Muncke et al., 2022; Slaats et al., 2023; Pendyala et al., 2024). Short sentences have been shown to provide reliable neural tracking responses and are effective in studying speech intelligibility (Muncke et al., 2022; Slaats et al., 2023).

Nevertheless, an important aspect that remains underexplored is how the timing of neural responses contributes to the comprehension of degraded speech. Most previous studies investigating the neural processing of degraded speech have primarily focused on the spatial localization of neural activity, which identifies which brain regions are activated in response to degraded speech. However, neural timing plays a crucial role in speech perception, as it determines how efficiently the brain processes degraded signals over time. Examining the onset, duration, and temporal patterns of neural responses can offer valuable insights into how the brain adapts to distorted auditory input. Understanding these time-varying dynamics is key to revealing how auditory signals are integrated and compensated for, particularly in noisy or unclear conditions.

Important insights into the time course of speech perception have emerged from eye-tracking experiments using the Visual World Paradigm (VWP; Huettig et al., 2011). Work with the VWP shows that language is processed incrementally. In spoken words, the initial portions of a speech signal often do not provide enough information to immediately identify the word’s meaning. This initial uncertainty leads to temporary but significant ambiguity among several possible words, resulting in a process known as lexical competition. Multiple potential word candidates are activated during this process and compete for selection. Lexical competition is central to real-time language processing and to understanding how listeners decipher speech in everyday environments. Even when a word is clearly pronounced, listeners with normal hearing still experience temporary competition among these possible lexical candidates. The mechanisms through which typical listeners resolve these ambiguities have been extensively studied (e.g., McQueen et al., 1999; Dahan and Magnuson, 2006).

Immediate competition has been observed across various groups, including infants (Swingley et al., 1999; Fernald et al., 2001), adolescents (Sekerina and Brooks, 2007; Rigler et al., 2015), and postlingually deafened adults using cochlear implants (CIs) (Farris-Trimble et al., 2014). However, CI users face unique challenges due to the degraded auditory signal provided by these devices (Wilson and Dorman, 2008). These challenges highlight the differences in how CI users, especially prelingually deaf children, might process spoken language compared to typical listeners. In particular, prelingually deaf children using CIs may experience word recognition processes that differ significantly from typical listeners (McMurray et al., 2017). Despite research on lexical competition across various groups, there is a notable lack of studies investigating the neural mechanisms underlying these processes, especially in cochlear implant users. Addressing this gap is crucial, as understanding the neural basis of these strategies could lead to improvements in CI technology and rehabilitation methods. In recent years, several groups have advanced our understanding of neural responses in cochlear implant users using electrophysiological and neuroimaging approaches (Dimitrijevic et al., 2019; Langner et al., 2021; Prince et al., 2021). These studies provide important context for interpreting neural data in CI users and underscore the relevance of investigating neural processing alongside behavioral outcomes.

Recent auditory neuroscience frameworks suggest that predictive coding may provide a comprehensive explanation for how the brain processes degraded speech (Rao and Ballard, 1999; Friston, 2005). Predictive coding proposes that the brain continuously generates predictions about upcoming sensory input and minimizes the difference between expected and actual input through a process of error correction (Sohoglu and Davis, 2016; Heilbron et al., 2022). In degraded listening conditions, listeners may rely more heavily on top-down predictions to compensate for the reduced acoustic detail. Importantly, predictive coding mechanisms can operate even during passive listening, supporting the idea that linguistic and auditory processing continues automatically without requiring active attention (Todorovic et al., 2011; Auksztulewicz and Friston, 2016; Sedley et al., 2016; Sohoglu and Davis, 2016). This framework aligns with the observed neural tracking patterns in both early and late time windows under degraded speech conditions, offering a strong theoretical basis for understanding speech processing when auditory input is compromised.

Understanding the temporal dynamics of speech perception in CI users is crucial for developing more effective auditory prostheses and improving communication abilities (Sharma et al., 2002b). To address this gap, it is necessary to delve into how the brain responds to various types of speech input, particularly under conditions that replicate the auditory experiences of CI users. Investigating neural responses to different forms of lexical competition in listeners requires analyzing their reactions to continuous speech. In response to speech features, the brain demonstrates neural tracking during language processing (Lalor and Foxe, 2010). This neural tracking can provide insights into how different populations process speech, especially when the auditory signal is less than optimal.

Studies have used speech envelopes to measure neural responses to continuous speech, as these envelopes capture essential temporal dynamics of speech that correlate with EEG signals (e.g., Ding and Simon, 2012a; Di Liberto et al., 2015; O’Sullivan et al., 2015). Recent research showed that older adults exhibit stronger tracking of speech envelopes even in conditions with poorer signal-to-noise ratios (Decruy et al., 2021; Karunathilake et al., 2023a), possibly due to compensatory mechanisms to maintain comprehension (Presacco et al., 2016). These findings suggest that the brain may adapt to degraded auditory conditions by enhancing certain neural processes to maintain speech understanding. This enhanced neural tracking likely reflects the increased cognitive effort required to resolve lexical competition in challenging listening conditions. Additionally, neural tracking has been used to index how the brain processes new linguistic information, particularly in high lexical competition scenarios. Such studies often reveal specific neural signatures, such as the N400 event-related potential, which reflect the cognitive effort involved in integrating challenging or unexpected words within a sentence (Broderick et al., 2018; Koskinen et al., 2020; Weissbart et al., 2020; Toffolo et al., 2022). These late neural tracking responses are typically delayed relative to word onset and are associated with higher cognitive effort required for semantic integration, especially under conditions of high lexical competition (Lau et al., 2008; reviews on N400 responses: Kutas and Federmeier, 2011; Frank et al., 2015; Frank and Willems, 2017).

Researchers often use noise-vocoded speech to replicate the auditory experience of cochlear implant users (Friesen et al., 2001; Rosen et al., 2013). This method allows researchers to simulate the auditory challenges faced by CI users and study how these challenges affect language processing at the neural level. In this study, we employed EEG recordings and speech envelope tracking techniques while presenting participants with both clear and degraded speech sentences. We aim to understand better how degraded auditory input impacts lexical processing by investigating the neural dynamics within the predictive coding framework. We specifically test whether degraded speech primarily elicits increased reliance on top-down predictions and enhanced error correction mechanisms that facilitate speech processing in challenging listening conditions. The findings from this study could contribute to developing more effective and targeted auditory training and rehabilitation strategies for CI users.

Here, we compare the TRF differences between two speech intelligibility conditions. Using scalp EEG, within-subject comparisons between cortical neural tracking of highly intelligible natural sentences and barely intelligible vocoded sentences were performed by extracting temporal envelope-related responses. In the early time range (approximately 100 ms after sound onset), the N100 component (N1), typically associated with early auditory processing and sensory abstraction stages, is expected to show stronger responses in the natural sentences condition (Näätänen, 2001; Krumbholz et al., 2003; Obleser et al., 2006). Similarly, the P2 component (approximately 200 ms after sound onset), which follows the N1 and reflects higher-level auditory processing including attention and stimulus evaluation, is also anticipated to exhibit enhanced responses in the natural speech condition. Although traditionally linked with early sensory processing, stronger N1 and P2 responses in the natural sentence condition could indicate more efficient or immediate engagement with the speech signal, facilitating quicker lexical competition and selection (Commuri et al., 2023).

Moreover, we anticipate stronger late responses in the vocoded condition, reflecting the typical trajectory of semantic integration effort (N400, occurring around 300–500 ms after sentence onset). Recent studies employing TRF analyses have revealed not only early components such as N1 and P2, but also prominent later responses (often termed “P400” TRFs) that are thought to parallel the classic N400 ERP associated with semantic processing (e.g., Broderick et al., 2018). We hypothesized that degraded speech would elicit increased reliance on top-down predictions and enhanced error correction mechanisms, reflected in delayed and enhanced late neural responses, as the brain works to resolve increased prediction error (Auksztulewicz and Friston, 2016; Sohoglu and Davis, 2016; Heilbron and Chait, 2018).

We first tested the above hypothesis using sensor-space signals from central electrodes (i.e., FCz, Cz, and CPz) that are commonly employed to investigate auditory cortex activities related to speech perception (Phillips et al., 2000; Näätänen, 2001; Čeponien et al., 2002; Tremblay et al., 2003; Martin et al., 2008; Brandmeyer et al., 2013; Kayser et al., 2015; Jafarpisheh et al., 2016; Khalighinejad et al., 2017; Steinmetzger and Rosen, 2017; Drennan and Lalor, 2019; Etard and Reichenbach, 2019; Mai and Wang, 2019; Synigal et al., 2020). Subsequently, we further tested the same hypothesis in source space within six bilateral cortical regions of interest (ROI) that were selected based on previous studies that demonstrated the involvement of bilateral Heschl’s gyrus (HG), the planum polare (PP), the planum temporale (PT), the supramarginal gyrus (SMG), inferior frontal gyrus (IFG), and the middle temporal gyrus (MTG).

2 Materials and methods

2.1 Participants

Fifty normal-hearing subjects aged between 20 and 33 (mean, 24.1 years; standard deviation, 2.4; 25 men and 25 women) participated in this study. All study procedures were reviewed and approved by the Institutional Review Board of the University of Ulsan. All participants signed an informed consent form, and the study was carried out under approved guidelines.

2.2 Behavioral speech intelligibility (SI) test and stimuli

Two speech conditions were employed: natural speech and 4-channel noise-vocoded speech. The Korean Sentence Recognition Test for adults was conducted to obtain the behavioral SI scores prior to EEG data acquisition (Jang et al., 2008). A 4-channel vocoder degraded the sentences to provide a lower speech intelligibility condition, as outlined by Wilson et al. (1991). The vocoder implementation used a logarithmically spaced filter bank spanning 200–5,000 Hz. The envelope of each frequency band was extracted using a 200 Hz low-pass filter and used to modulate Gaussian white noise carriers, which were then recombined across bands to reconstruct the speech signal (Mehta and Oxenham, 2017). This process preserved temporal envelope cues while removing spectral fine structure, resulting in the moderate spectral degradation characteristic of 4-channel vocoded speech. The SI scores in the 4-channel vocoded condition were significantly lower than those in the natural speech condition (p < 0.05; paired t-test, Table 1). Ten continuous Korean sentences, each lasting less than 2.5 s, were selected from the Korean Standard Sentence Lists for Adults (KS-SL-A) (Jang et al., 2008). To avoid stimulus overlap between behavioral and EEG testing, the behavioral SI test employed 10 sentence sets that did not contain any of the sentences selected for the EEG experiment.
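For illustration, the vocoding procedure described above can be sketched in Python as follows. This is a minimal approximation assuming fourth-order Butterworth filters and RMS matching of the output to the input; it is not the authors’ exact implementation (which followed Wilson et al., 1991, and Mehta and Oxenham, 2017).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_bands=4, f_lo=200.0, f_hi=5000.0, env_cutoff=200.0):
    """Sketch of a 4-channel noise vocoder: log-spaced analysis bands,
    200 Hz envelope low-pass, Gaussian white-noise carriers, band-wise recombination."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    env_lp = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)                                # analysis band
        env = np.clip(sosfiltfilt(env_lp, np.abs(band)), 0.0, None)   # band envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # band-limited noise
        out += env * carrier                                          # modulate carrier
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    return out * rms(x) / (rms(out) + 1e-12)                          # level matching (assumption)
```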


Table 1. Means and standard deviations for word-recognition scores for both speech intelligibility conditions.

2.3 EEG recording and processing

Participants were seated approximately 1 m from two loudspeakers while watching a silent movie. Sentences were randomly played at 65 dB SPL through the loudspeakers in a soundproof room under two conditions: passive listening to vocoded speech (degraded condition) and natural speech (clean condition). This presentation level was selected to ensure comfortable listening while maintaining consistent audibility across all participants. To minimize training effects and ensure that participants’ performance in the degraded condition was not influenced by prior exposure to clear speech, all participants first completed the degraded condition, followed by the clean condition. Critically, the identical set of 10 sentences was used in both the vocoded and natural conditions. Each sentence was repeated 100 times with an inter-stimulus interval of 3 s in a randomized order within each condition block. This design allowed for direct comparison of neural responses to the same linguistic content under different acoustic intelligibility conditions. Brain activity during the passive listening tasks was recorded using a 64-channel EEG system (Biosemi Active 2 system, Biosemi Co., Netherlands) at a sampling rate of 2,048 Hz. The EEG data were band-pass filtered between 1 and 57 Hz using a finite impulse response filter. Signals were subsequently downsampled to 256 Hz and segmented into epochs of 3 s, starting 0.5 s before stimulus onset. The evoked response estimate was computed by averaging the 100 epochs and band-pass filtered between 1 and 15 Hz for further analysis.
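A rough MNE-Python sketch of this preprocessing chain is shown below; the file name, event code, and the choice of MNE itself are illustrative assumptions rather than the authors’ actual pipeline.

```python
import mne

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)   # hypothetical Biosemi file, 2,048 Hz
raw.filter(l_freq=1.0, h_freq=57.0, fir_design="firwin")   # 1-57 Hz FIR band-pass
raw.resample(256)                                           # downsample to 256 Hz
events = mne.find_events(raw)                               # sentence-onset triggers (assumed)
epochs = mne.Epochs(raw, events, event_id={"sentence": 1},  # hypothetical event code
                    tmin=-0.5, tmax=2.5, baseline=None, preload=True)  # 3 s epochs
evoked = epochs.average()                                   # average across the 100 repetitions
evoked.filter(l_freq=1.0, h_freq=15.0)                      # 1-15 Hz band for TRF analysis
```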

2.4 Control analysis: alpha power as a marker of alertness

To address potential confounding effects of block order on alertness and fatigue, we conducted a control analysis examining alpha power (8–12 Hz), a well-established electrophysiological marker of vigilance and cognitive engagement (Pivik and Harman, 1995; Lai et al., 2022). Alpha band activity (8–12 Hz) was extracted from the preprocessed EEG and averaged across parietal sites (Pz, POz). Mean alpha power was compared between the vocoded and natural conditions using a paired t-test (two-tailed, degrees of freedom = 49, n = 50). The absence of significant differences would indicate that condition effects are not confounded by fatigue/alertness changes.
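The control analysis can be sketched as below. The article does not state how power was estimated, so the Welch periodogram and the array layout here are assumptions.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import ttest_rel

def parietal_alpha_power(epochs, fs, ch_names, picks=("Pz", "POz")):
    """Mean 8-12 Hz power over parietal channels, averaged across epochs (sketch)."""
    idx = [ch_names.index(ch) for ch in picks]
    freqs, pxx = welch(epochs[:, idx, :], fs=fs, nperseg=int(fs))  # epochs x channels x freqs
    band = (freqs >= 8) & (freqs <= 12)
    return pxx[..., band].mean()

# alpha_nat, alpha_voc: length-50 arrays of per-subject alpha power
# t, p = ttest_rel(alpha_nat, alpha_voc)   # paired, two-tailed, df = 49
```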

2.5 TRF estimation

An auditory spectrogram was created, containing 128 spectrally resolved sub-band envelopes of the speech signals, using the NSL toolbox (Chi et al., 2005). These sub-bands are logarithmically spaced between approximately 90 and 4,000 Hz. The broadband speech envelope was obtained by averaging all sub-band envelopes across frequency bands and was then downsampled to 256 Hz to match the sampling rate of the EEG signals. Baseline normalization was performed by subtracting the baseline mean from the EEG signals and then dividing by the baseline standard deviation. The baseline period (0.5 s before stimulus onset) was pooled across epochs and channels to estimate the baseline mean and standard deviation. This common normalization ensured consistency across epochs while preserving the relative power differences across channels.
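The envelope extraction and baseline normalization can be sketched as follows. The log-spaced Butterworth filter bank is a simplified stand-in for the NSL auditory spectrogram actually used, so the band shapes are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

def broadband_envelope(x, fs, n_bands=128, f_lo=90.0, f_hi=4000.0, fs_out=256):
    """Average of 128 log-spaced sub-band envelopes, downsampled to the EEG rate (sketch)."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    envs = [np.abs(sosfiltfilt(butter(2, [lo, hi], btype="band", fs=fs, output="sos"), x))
            for lo, hi in zip(edges[:-1], edges[1:])]
    env = np.mean(envs, axis=0)                 # average across sub-bands
    return resample_poly(env, fs_out, int(fs))  # match the 256 Hz EEG sampling rate

def baseline_normalize(eeg, n_baseline):
    """z-score EEG (epochs x channels x samples) with a baseline mean/SD
    pooled across epochs and channels, as described above."""
    base = eeg[..., :n_baseline]
    return (eeg - base.mean()) / base.std()
```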

The TRFs of the natural and vocoded conditions were estimated using the multivariate temporal response function (mTRF) Toolbox (Ding and Simon, 2012b; Crosse et al., 2016). The time window ranged from −100 to 600 ms. The ridge parameter (λ), which controls for overfitting, was searched over a range from 10⁻⁶ to 10⁶. For each λ, TRF model performance was validated using a leave-one-out cross-validation (across epochs) approach, with performance quantified by the mean squared error (MSE) (averaged across channels) on unseen data (Browne, 2000). Model performance for each condition at each λ was then averaged over subjects. The λ value that yielded the lowest MSE for both conditions was 750 and was subsequently used as the optimal ridge parameter. Finally, a grand average of TRFs was obtained for each condition.
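The study used the MATLAB mTRF toolbox; as a language-neutral illustration, the same lagged ridge regression can be written directly in NumPy as below. The circular-shift design matrix and the helper names are simplifications introduced for the sketch.

```python
import numpy as np

def lagged_design(stim, fs, tmin=-0.1, tmax=0.6):
    """Time-lagged design matrix for the stimulus envelope (-100 to 600 ms lags)."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.zeros((len(stim), len(lags)))
    for j, lag in enumerate(lags):
        X[:, j] = np.roll(stim, lag)          # circular shift kept simple for the sketch
    return X

def fit_trf(stim, eeg, fs, lam):
    """Ridge-regularized forward TRF: w = (X'X + lam*I)^-1 X'y for each EEG channel."""
    X = lagged_design(stim, fs)               # eeg: samples x channels
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)    # n_lags x n_channels

# Lambda search (sketch): for each candidate in 10.0 ** np.arange(-6, 7), fit on all
# epochs but one, predict the held-out epoch, and keep the lambda with the lowest
# channel-averaged MSE; in this study that value was 750.
```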

The global field power (GFP) of TRFs was computed for each participant in both the natural and 4-channel vocoded speech conditions. For sensor-level analysis, paired t-tests (two-tailed, degrees of freedom = 49, n = 50) were performed at each time point to compare GFP between conditions. The resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini and Hochberg, 1995). Three continuous clusters of significant time periods (80–103 ms, 131–185 ms, and 376–408 ms) were identified, corresponding to the N1TRF, P2TRF, and P400TRF components, respectively, following conventional auditory evoked potential nomenclature. For each period, t-values comparing natural and vocoded conditions are shown in the figures using color coding. Additionally, GFPs for each condition were compared with their respective baselines (average amplitude from −100 ms to 0 ms) to identify significant response periods (two-tailed paired t-test, degrees of freedom = 49, n = 50).
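A sketch of the GFP comparison is given below, with the Benjamini–Hochberg step written out explicitly; the array names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ttest_rel

def gfp(trf):
    """Global field power: spatial standard deviation across channels at each lag."""
    return trf.std(axis=0)                    # trf: channels x lags

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg: reject up to the largest k with p(k) <= (k/m) * alpha."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    crit = alpha * np.arange(1, p.size + 1) / p.size
    below = p[order] <= crit
    sig = np.zeros(p.size, dtype=bool)
    if below.any():
        sig[order[:np.max(np.nonzero(below)[0]) + 1]] = True
    return sig

# gfp_nat, gfp_voc: subjects x lags matrices of per-subject GFP
# t, p = ttest_rel(gfp_nat, gfp_voc, axis=0)  # pointwise paired t-tests (df = 49)
# sig_lags = fdr_bh(p)                        # FDR-corrected mask over time lags
```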

2.6 TRF components validation: permutation testing

To validate that identified TRF components (N1, P2, and P400) represent genuine stimulus–response coupling rather than chance-level noise or artifacts, permutation testing was conducted. Stimulus envelopes were shuffled relative to EEG data to generate a null distribution of TRF amplitudes expected under the assumption of no true stimulus–response relationship. For each subject and condition (vocoded and natural), TRFs were computed using the identical analysis pipeline as the original analysis, except stimulus envelopes were shuffled relative to the EEG recordings. Valid TRFs (computed from original, non-shuffled stimulus-EEG pairings) were compared against this null distribution (shuffled TRFs) at each time point and each condition. Peak amplitudes of TRF components (N1, P2, and P400) were compared between valid TRFs and shuffled TRFs using paired t-tests (two-tailed, α = 0.05, with FDR correction for multiple comparisons). Significantly higher valid TRF amplitudes would demonstrate that components reflect genuine neural tracking of the speech signal rather than noise.
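The permutation control can be sketched as below, reusing the hypothetical fit_trf helper from the TRF sketch above; the shuffling scheme shown (a random permutation of envelope-to-epoch pairings) is one plausible reading of the procedure.

```python
import numpy as np

def shuffled_trfs(envelopes, eeg_epochs, fs, lam, seed=0):
    """Null TRFs from mismatched stimulus-EEG pairings, using the same ridge pipeline."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(eeg_epochs))          # shuffle envelope-to-epoch assignment
    return [fit_trf(envelopes[j], eeg_epochs[i], fs, lam)
            for i, j in enumerate(perm)]

# Peak N1/P2/P400 amplitudes from valid versus shuffled TRFs are then compared across
# subjects with paired t-tests (two-tailed, FDR-corrected), as described above.
```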

2.7 TRF at source level using the inverse problem

The inverse problem was solved to estimate source-level activity from the 64-channel sensor-level EEG signals using the MNE-Python toolbox. First, the linear inverse operator was assembled using the noise covariance computed from the baseline period of each epoch (−500 ms to 0 ms) and the forward operator constructed with FreeSurfer (Fischl, 2012). Then, the inverse operator was used to project EEG signals to 20,484 voxels at the source level using dSPM noise normalization (Dale et al., 2000). Only the orientation perpendicular to the cortical surface was extracted to compute the TRF at the source level.
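A condensed MNE-Python sketch of this source estimation step is shown below; the forward model on the FreeSurfer 'fsaverage' template and the lambda2 value are assumptions, since the article does not report them.

```python
import mne
from mne.minimum_norm import make_inverse_operator, apply_inverse

# fwd: forward solution built on the FreeSurfer 'fsaverage' template (assumed)
# epochs: preprocessed 64-channel epochs from the pipeline above
noise_cov = mne.compute_covariance(epochs, tmax=0.0)         # baseline covariance (-0.5 to 0 s)
inv = make_inverse_operator(epochs.info, fwd, noise_cov)
stc = apply_inverse(epochs.average(), inv, lambda2=1.0 / 9.0,
                    method="dSPM", pick_ori="normal")         # orientation normal to cortex
# stc.data: 20,484 voxels x time; source-level TRFs are then fit per voxel with the
# same ridge parameter (lambda = 750) used at the sensor level.
```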

Source-level TRFs were estimated for each voxel using the mTRF toolbox, relating the source current intensity to the speech temporal envelope over the time points corresponding to the significant periods identified at the sensor level (N1TRF, P2TRF, and P400TRF). The ridge parameter λ was set to 750, the same value used at the sensor level. For source-level analysis, paired t-tests (two-tailed, degrees of freedom = 49, n = 50) were performed at each of 20,484 voxels comparing natural and 4-channel vocoded conditions, with statistical significance set at p < 0.05 and FDR correction applied across all voxels. The t-values are denoted by colors in the figures. Regions of interest were defined using the Destrieux atlas implemented in FreeSurfer (Fischl, 2012). Six bilateral ROIs were selected based on their established roles in speech processing: HG, PP, PT, SMG, MTG, and IFG. These regions were chosen to capture both early auditory processing and higher-level language areas.

3 Results

3.1 TRF at sensor level

Figure 1A displays the grand average TRFs at central electrodes (FCz, Cz, CPz) for both conditions. Figure 1C shows the corresponding GFP, which quantifies the overall neural response strength across all electrodes. The mTRF analysis revealed distinct components: N1TRF, P2TRF, and P400TRF (Figure 1A). GFP analysis identified three time periods with statistically significant differences between conditions (FDR-corrected paired t-tests, t(49), p < 0.05): 80–103 ms (N1TRF), 131–185 ms (P2TRF), and 376–408 ms (P400TRF), respectively. Notably, significant differences were observed between the natural and vocoded conditions at the central electrodes for N1TRF (80–103 ms) and P2TRF (131–185 ms), with larger amplitudes in the natural condition (p < 0.05, paired t-test, FDR-corrected) (Figure 1B). Additionally, the P400TRF within 376–408 ms showed more extensive responses in the vocoded condition than in the natural condition (p < 0.05, paired t-test, FDR-corrected) (Figure 1B).


Figure 1. (A) Grand averaged TRF at the central electrodes (i.e., FCz, Cz, CPz). The black boxes represent significantly different periods between the natural (red line) and vocoded (blue line) conditions. The solid line denotes the grand average TRF, while the shaded area denotes the standard error of TRFs in each condition. (B) The box plot represents TRF amplitude at the N1TRF (80–103 ms), P2TRF (131–185 ms), and P400TRF (376–408 ms). (C) The global field power (GFP) of the TRF in the natural (red) and vocoded (blue) conditions. (D) Topography of TRF components in the natural (top) and vocoded (middle) conditions, and dominance maps (bottom), at the N1TRF, P2TRF, and P400TRF time lags. Topographic dominance maps show the results of paired t-tests comparing natural vs. vocoded conditions at each electrode (t49, p < 0.05, FDR-corrected). Red indicates significantly larger responses to natural speech, blue indicates significantly larger responses to vocoded speech. The t-value scale ranges from −10 to +10 to accommodate the full range of statistical differences observed across electrodes. The central area significantly differs in the N1TRF, P2TRF, and P400TRF components.

The topographic maps of grand average TRF for the natural and vocoded conditions (first and second row) and their t-value (third row) at the N1TRF, P2TRF, and P400TRF time lags are shown in Figure 1D. In the natural condition, N1TRF and P2TRF components dominated the central region, peaking at electrode FC4 [t(49) = +7.49, p < 0.05, FDR-corrected] and electrode C3 [t(49) = +12.10, p < 0.05, FDR-corrected], respectively. In contrast, the vocoded condition exhibited dominance in the P400TRF component, peaking at electrode Cz [t(49) = −7.49, p < 0.05, FDR-corrected]. Across all 64 channels, t-values ranged from −7.49 to +12.10 (Figure 1D).

3.2 Control analysis: alpha power as a marker of alertness

To address the potential confound of fixed block order, we conducted control analyses examining alpha power (8–12 Hz) across conditions (Figure 2A). Alpha power did not differ significantly between the natural (mean: 9.69 μV²/Hz, standard deviation (SD): 9.45) and vocoded (mean: 9.50 μV²/Hz, SD: 7.94) conditions [t(49) = 0.32, p = 0.75]. The absence of significant differences indicates that condition effects are not confounded by fatigue or alertness changes.


Figure 2. (A) Alpha power (8–12 Hz) across conditions. (B) Mean squared error (MSE) of TRF model prediction. (C) Pearson correlation between stimulus envelope and neural response. Box plots display median (center line), interquartile range (box), and full range (whiskers) for natural (red) and vocoded (blue) conditions. Asterisks denote statistical significance: *p < 0.05, ***p < 0.001; n.s., not significant (p > 0.05).

3.3 TRF model performance

To characterize acoustic-neural coupling, we examined two measures of TRF model prediction performance: MSE and Pearson correlation (Figures 2B,C). MSE of the TRF model differed significantly between conditions [Natural: 1.297 ± 0.121, Vocoded: 1.227 ± 0.095, t(49) = 2.51, p = 0.016]. Pearson correlation between stimulus envelope and neural response was significantly higher in the vocoded condition compared to the natural condition [Natural: 0.030 ± 0.009, Vocoded: 0.037 ± 0.011, t(49) = −6.43, p < 0.001]. Notably, across both natural and vocoded conditions, valid TRFs showed markedly better prediction performance than shuffled TRFs at the subject level for both mean squared error and Pearson correlation (all p < 0.001; Table 2), indicating that the reported model performance cannot be explained by chance-level correlations.


Table 2. TRF model prediction performance for valid and shuffled models in natural and vocoded speech conditions.

3.4 TRF components validation against chance level

Valid TRFs showed significantly higher amplitudes than shuffled TRFs at all three identified components in both conditions (all p < 0.001) (Figure 3). For the N1TRF component, valid TRF amplitudes significantly exceeded shuffled controls in both the vocoded condition [Valid: 0.29 ± 0.11, Shuffled: 0.18 ± 0.05, t(49) = 6.29, p < 0.001] and the natural condition [Valid: 0.43 ± 0.20, Shuffled: 0.18 ± 0.06, t(49) = 8.44, p < 0.001]. Similarly, P2TRF component amplitudes were significantly higher for valid TRFs compared to shuffled TRFs in the vocoded condition [Valid: 0.31 ± 0.11, Shuffled: 0.19 ± 0.05, t(49) = 7.71, p < 0.001] and the natural condition [Valid: 0.55 ± 0.21, Shuffled: 0.19 ± 0.05, t(49) = 12.01, p < 0.001]. Most notably, the P400TRF component—the late response hypothesized to reflect compensatory processing for degraded speech—showed a robust validation pattern. P400TRF amplitudes significantly exceeded shuffled controls in the vocoded condition [Valid: 0.40 ± 0.13, Shuffled: 0.17 ± 0.05, t(49) = 13.30, p < 0.001] and in the natural condition [Valid: 0.28 ± 0.10, Shuffled: 0.17 ± 0.05, t(49) = 6.92, p < 0.001]. This robust validation across all three components—particularly the pronounced P400TRF in the vocoded condition—demonstrates that observed TRF amplitudes reflect genuine stimulus–response coupling rather than noise or chance-level correlations. The significantly higher valid TRF amplitudes across all time windows provide definitive evidence that the identified neural responses represent robust, stimulus-locked neural processes underlying intelligibility-driven speech processing.


Figure 3. (A,B) Grand-averaged TRF waveforms for natural (A, red) and vocoded (B, blue) conditions, comparing actual stimulus–response pairs (colored lines with shaded 95% confidence interval) against shuffled pairs (black lines). Horizontal gray bars denote the three time windows analyzed (a: 80–103 ms, b: 131–185 ms, c: 376–408 ms). (C,D) Global field power (GFP) of valid TRFs (colored) versus shuffled TRFs (gray) for natural (C) and vocoded (D) conditions within each time window. Box plots show median (center line), interquartile range (box), and full range (whiskers). Asterisks denote statistical significance: *** p < 0.001.

3.5 TRF at source level

At the source level, the mTRF analysis was performed for the N1TRF, P2TRF, and P400TRF periods, investigating source localization related to speech intelligibility (Figure 4). The natural condition exhibited significant early dominance for N1TRF, with the peak voxel located in the left MTG [t(49) = 8.67, p < 0.05], and for P2TRF, peaking in the left SMG [t(49) = 13.67, p < 0.05]. In contrast, the vocoded condition showed dominant P400TRF responses, with the peak voxel located in the right IFG [t(49) = −7.24, p < 0.05]. Figure 4 also shows the dominance of both natural and vocoded conditions, with the color bar representing t-values from −10 to 10 (paired t-test, FDR-corrected). As observed in Figure 4, the natural condition exhibited dominance in the early components (N1TRF and P2TRF), while the vocoded condition showed dominance in the late component (P400TRF).


Figure 4. Source localization of the continuous speech-evoked potential at N1TRF (80–103 ms), P2TRF (131–185 ms), and P400TRF (376–408 ms) time lags. The dominance row denotes the t-value from paired t-tests at each of the 20,484 cortical voxels (blue: vocoded condition > natural condition, red: natural condition > vocoded condition), comparing natural vs. vocoded conditions (t49, p < 0.05, FDR-corrected across all voxels). A mask was applied to display only voxels reaching statistical significance after multiple comparison correction (p < 0.05, FDR-corrected). TRF, temporal response function; FDR, false discovery rate.

The proportion of significantly dominant voxels within each of the six ROIs was calculated for each condition (Table 3). In all ROIs, the early components (N1TRF and P2TRF) were dominated by the natural condition, while the late component (P400TRF) was dominated by the vocoded condition.


Table 3. The percentage of significantly dominant voxels in 6 regions of interest (ROIs) related to language processing for N1TRF, P2TRF, and P400TRF (paired t-test, p < 0.05, false discovery rate-corrected).

4 Discussion

The present study sought to elucidate the time course of degraded speech perception by examining the TRF in response to speech with varying intelligibility. By comparing clear and vocoded speech conditions, we identified distinct neural patterns associated with each condition, offering insights into how the brain manages lexical competition in diverse auditory environments.

Numerous studies have demonstrated that language processing can occur automatically, even when listeners are not actively attending to speech. Neural responses associated with lexical and semantic processing—such as the N400—have been observed in passive listening paradigms, with recent evidence demonstrating that semantic-level markers including N400 and semantic TRFs can be elicited in such contexts (Brodbeck et al., 2018; Yang et al., 2024). Recent work using TRF and continuous speech paradigms further shows that robust neural tracking of linguistic features is present during passive listening, suggesting that core aspects of language comprehension operate automatically and do not require focused attention (Crosse et al., 2016; Brodbeck et al., 2018; Broderick et al., 2018). Importantly, clinical and neuroimaging research confirms that passive auditory paradigms can reliably elicit language lateralization and language-related brain activity (Okahara et al., 2024). In addition, electrophysiological studies have shown that native language advantages in speech processing are evident even in passive listening, supporting the notion that language processing can proceed without active engagement (Yang et al., 2024). Collectively, these findings support the view that passive listening is sufficient to engage the neural mechanisms underlying language processing, allowing researchers to investigate speech comprehension in more naturalistic and ecologically valid settings.

Within a predictive coding framework (Friston, 2005), our findings demonstrate that accurate sensory predictions in clear speech reduce early TRF amplitudes, while increased prediction errors under degraded conditions lead to delayed, amplified late TRF components (P400TRF). Our findings revealed that early TRF components, specifically N1TRF and P2TRF, exhibited stronger responses in the clear speech condition, indicating more effective initial auditory processing when the speech signal is intelligible (Näätänen, 2001; Krumbholz et al., 2003). This aligns with the predictive coding framework, where accurate top-down predictions facilitate rapid and efficient early sensory processing, minimizing prediction error and supporting fluent speech comprehension (Friston, 2005; Auksztulewicz and Friston, 2016; Sohoglu and Davis, 2016; Heilbron et al., 2022). In contrast, the P400TRF component was delayed and more pronounced in response to degraded speech. Within the predictive coding framework, this late component is interpreted as a neural signature of increased prediction error and the recruitment of higher-order cortical regions to resolve uncertainty (Garrido et al., 2009b; Blank and Davis, 2016; Sohoglu and Davis, 2016; Heilbron et al., 2022). Although this response showed a positive deflection—unlike the canonical N400 observed in traditional ERP studies—such polarity differences are less critical in the context of TRF analyses. TRF analysis estimates the linear stimulus–response mapping (Crosse et al., 2016). As highlighted in previous TRF studies, the timing and functional context of neural responses are more informative than their polarity, which can vary based on analysis methods and referencing schemes (Di Liberto et al., 2015; Crosse et al., 2016; Brodbeck et al., 2018; Broderick et al., 2018). Thus, our key finding is the delayed temporal dynamics of the response, likely reflecting compensatory or additional semantic integration processes as the brain attempts to maintain comprehension when early acoustic features are degraded.

While our paradigm involved passive listening, converging evidence from EEG, MEG, and fMRI studies demonstrates that key markers of linguistic processing—such as the N400 and semantic TRFs—can be elicited even in the absence of explicit tasks (Wacongne et al., 2011; Brodbeck et al., 2018; Heilbron and Chait, 2018; Yang et al., 2024). The late TRF component (P400TRF) observed in our study is consistent in timing and cortical localization with these established markers of semantic and lexical processing. Although passive listening may not engage all aspects of conscious language comprehension, these findings support the interpretation that our results reflect genuine linguistic processing, not merely low-level auditory prediction.

An interesting dissociation emerged in measurements of TRF model performance between vocoded and natural conditions. MSE of the TRF model was significantly lower for vocoded conditions (vocoded: 1.227 ± 0.095 vs. natural: 1.297 ± 0.121, p = 0.016). Pearson correlation was significantly higher for vocoded speech (vocoded: 0.037 ± 0.011 vs. natural: 0.030 ± 0.009, p < 0.001). This dissociation likely reflects differences in neural processing complexity, consistent with recent findings on speech intelligibility (Karunathilake et al., 2023b). Vocoded speech with reduced acoustic variability and more consistent spectral structure constrains neural processing to envelope-driven mechanisms, resulting in more stimulus-locked and thus more predictable neural responses (lower MSE and higher correlation), while natural speech enables complex linguistic and semantic processing beyond envelope tracking, introducing variability that reduces correlation and increases MSE. Within a predictive coding framework (Sohoglu and Davis, 2016), degraded speech simplifies processing to lower-level sensory predictions relying on prediction error minimization (Blank and Davis, 2016), whereas intact speech engages richer, higher-order mechanisms that support the observed P400TRF response as a compensatory error-correction process. An important consideration is whether the reduced P400TRF in the natural condition reflects intelligibility-driven processing or instead arises from repetition suppression and stimulus familiarity effects, given that identical sentence content was presented in both vocoded and natural condition blocks. The component-specific pattern observed in our data argues against a pure familiarity account. Repetition suppression typically manifests as global neural suppression across all components; our findings instead reveal selective enhancement of early components (N1TRF, P2TRF) coupled with suppression only of the late P400TRF component (Grill-Spector et al., 2006; Garrido et al., 2009a). This selective dissociation is inconsistent with global suppression predicted by repetition suppression or familiarity-driven accounts. Furthermore, alpha power did not differ significantly between blocks [t(49) = 0.32, p = 0.75], indicating that systematic changes in neural vigilance or fatigue do not confound the observed condition differences (Engell and Mccarthy, 2014). The source localization of the P400TRF to the inferior frontal gyrus (IFG)—a region consistently associated with semantic processing, error correction, and prediction error detection during language comprehension rather than novelty detection—further supports an intelligibility-driven interpretation (Thompson-Schill et al., 1997; Hartwigsen et al., 2013; Teghipco et al., 2023). Within the predictive coding framework, the vocoded condition necessitates increased reliance on top-down predictions and error correction, manifested as enhanced P400TRF activity, while the natural condition provides sufficient acoustic cues for rapid word recognition with minimal prediction error, reducing the need for late error-correction mechanisms (Blank and Davis, 2016; Sohoglu and Davis, 2016). The observed dissociation between component enhancement (N1TRF, P2TRF) and suppression (P400TRF), combined with stable neural state and appropriate source localization, is most parsimoniously explained by intelligibility-driven neural processing rather than repetition suppression or familiarity effects.

The observed patterns at both the sensor and source levels highlight the brain’s dynamic adaptation to varying auditory environments. In clear speech conditions, TRF components likely reflect successful top-down predictions that minimize sensory surprise, supporting rapid word recognition. However, under vocoded conditions, the brain appears to compensate by prolonging processing, recruiting additional cortical resources to support comprehension. This adaptive mechanism is particularly relevant for individuals with hearing impairments, such as CI users, who often experience degraded auditory signals similar to those simulated by vocoded speech in this study (Wilson and Dorman, 2008). The neural adaptability observed here suggests that even under less-than-optimal listening conditions, the brain may reallocate resources to maintain comprehension, albeit with increased cognitive effort (Giraud and Truy, 2002; Alain et al., 2018).

The increased P400TRF in the vocoded condition may reflect a broader engagement of cognitive resources that supports semantic processing when acoustic cues are degraded. Evidence from continuous speech TRF analyses indicates that late neural components originating from prefrontal and temporal language regions encode semantic and linguistic information, with enhanced responses during high lexical competition scenarios or when acoustic input is compromised (Kegler et al., 2022). This finding is consistent with studies on aging, which show that older adults often exhibit stronger neural tracking of speech envelopes despite poorer signal-to-noise ratios, likely due to compensatory mechanisms aimed at preserving speech comprehension (Presacco et al., 2016; Duta and Plunkett, 2021; Yip et al., 2021). These results suggest that under degraded conditions, the brain increasingly depends on higher-level predictive and semantic processes to support understanding (Ding and Simon, 2013; Peelle and Wingfield, 2016).

It is noteworthy that we used a passive listening condition in which subjects watched a silent movie while auditory stimuli were presented. Such conditions are commonly used to divert participants’ attention while still enabling the study of automatic neural responses related to lexical competition and syntactic processing (Pulvermüller et al., 2008; Kong et al., 2014). These previous findings indicate that speech processing can occur without explicit attention, highlighting the robustness of automatic linguistic and lexical processing mechanisms. Our findings are consistent with this: both short- and long-latency TRF components appeared during passive listening, which may suggest that speech processing occurs even in the absence of an explicit task. This finding suggests that passive listening paradigms may be a reliable method for studying degraded speech perception, particularly in clinical populations who may have difficulty sustaining attention during active tasks.

Understanding these neural mechanisms is crucial for improving auditory prostheses and rehabilitation strategies for CI users. Enhanced early neural responses in clear conditions could inform the development of auditory training programs aimed at strengthening early prediction-based processing. Conversely, heightened late responses in degraded conditions suggest the importance of interventions that enhance error monitoring and semantic integration. This could include auditory training focused on top-down prediction, or the use of visual or contextual cues to support comprehension under challenging conditions (Sharma et al., 2002a; Giraud and Lee, 2007).

While the present study provides insights into the neural processing of degraded speech, several limitations warrant discussion. The behavioral SI test revealed near-ceiling performance in the natural speech condition (mean = 99.65%, SD = 1.01%), precluding examination of correlations between behavioral SI and neural response measures. Future studies employing populations with greater SI variability—such as older adults, individuals with hearing loss, or cochlear implant users—could reveal informative between-subject associations. Additionally, the present study employed normal-hearing young adults, limiting generalizability to other populations with degraded auditory input. Future research should investigate how these neural mechanisms evolve with experience and training, particularly in auditory-impaired populations. Longitudinal studies could provide valuable insights into the neural plasticity associated with degraded speech processing and word recognition strategies. Additionally, exploring the interplay between cognitive factors, such as attention and working memory, and neural responses could further elucidate how the brain manages comprehension in challenging listening conditions (Davis and Johnsrude, 2007; McGettigan et al., 2010). Future studies should include explicit word-onset and lexical surprisal predictors to confirm the semantic interpretation of the P400TRF component (Crosse et al., 2016; Brodbeck et al., 2018; Broderick et al., 2018).

While our findings are well explained by the predictive coding framework, an alternative interpretation is the “wait-and-see” strategy (Norris et al., 2016; McMurray et al., 2017). This approach suggests that listeners may delay lexical decisions and accumulate more acoustic information before committing to a word, especially under degraded conditions (Brodbeck et al., 2018; Broderick et al., 2018). The delayed P400TRF component observed in our study could reflect this adaptive, conservative processing style. Although our passive listening paradigm does not directly test this hypothesis, the presence of a delayed neural response in degraded conditions is consistent with this interpretation. Future studies employing active tasks or explicit decision-making paradigms could further clarify the contribution of “wait-and-see” mechanisms to speech comprehension under degraded listening conditions.

In conclusion, this study advances our understanding of the neural dynamics involved in predictive processing under varying speech intelligibility conditions. The differential engagement of early and late TRF components across clear and vocoded speech conditions underscores the brain’s flexibility in processing spoken language. These findings have significant implications for developing auditory prostheses and highlight the importance of tailored rehabilitation strategies that account for the cognitive demands of different listening environments.

Data availability statement

De-identified raw data supporting the conclusions of this article will be made available from the authors upon reasonable request, in accordance with ethics approval and participant consent.

Ethics statement

The studies involving humans were approved by the Institutional Review Board of the University of Ulsan. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YN: Formal analysis, Writing – review & editing, Data curation, Visualization, Methodology, Validation, Software, Investigation, Writing – original draft. LD: Writing – original draft, Software, Investigation, Visualization, Writing – review & editing, Methodology, Formal analysis, Data curation. HJ: Data curation, Writing – review & editing, Investigation, Writing – original draft. IC: Writing – original draft, Writing – review & editing, Validation, Supervision, Conceptualization. JW: Writing – original draft, Resources, Writing – review & editing, Project administration, Validation, Funding acquisition, Supervision, Conceptualization.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Research Foundation of Korea (NRF) grants from the Korean government (NRF-RS-2024-00461617, RS-2024-00338148). This work was supported by U.S. Department of Defense (DoD) Hearing Restoration and Rehabilitation Program grant awarded to I.C. (HT9425-23-1-0912), National Institute on Deafness and Other Communication Disorders (NIDCD) P50 (DC000242) awarded to I.C., T.D.G., B.M., and Bruce J. Gantz, and National Science Foundation Grant No. 2306331 awarded to Chipara, Adhikari, Wu, and Choi.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alain, C., Du, Y., Bernstein, L. J., Barten, T., and Banai, K. (2018). Listening under difficult conditions: an activation likelihood estimation meta-analysis. Hum. Brain Mapp. 39, 2695–2709. doi: 10.1002/hbm.24031

Auksztulewicz, R., and Friston, K. (2016). Repetition suppression and its contextual determinants in predictive coding. Cortex 80, 125–140. doi: 10.1016/j.cortex.2015.11.024

Baltzell, L. S., Srinivasan, R., and Richards, V. M. (2017). The effect of prior knowledge and intelligibility on the cortical entrainment response to speech. J. Neurophysiol. 118, 3144–3151. doi: 10.1152/jn.00023.2017

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat Methodol. 57, 289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x

Blank, H., and Davis, M. H. (2016). Prediction errors but not sharpened signals simulate multivoxel fMRI patterns during speech perception. PLoS Biol. 14:e1002577. doi: 10.1371/journal.pbio.1002577

Brandmeyer, A., Farquhar, J. D. R., McQueen, J. M., and Desain, P. W. M. (2013). Decoding speech perception by native and non-native speakers using single-trial electrophysiological data. PLoS One 8:e68261. doi: 10.1371/journal.pone.0068261

Brodbeck, C., Hong, L. E., and Simon, J. Z. (2018). Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28, 3976–3983.e5. doi: 10.1016/j.cub.2018.10.042

Broderick, M. P., Anderson, A. J., Di Liberto, G. M., Crosse, M. J., and Lalor, E. C. (2018). Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Curr. Biol. 28, 803–809.e3. doi: 10.1016/j.cub.2018.01.080

Browne, M. W. (2000). Cross-validation methods. J. Math. Psychol. 44, 108–132. doi: 10.1006/jmps.1999.1279

Čeponienė, R., Rinne, T., and Näätänen, R. (2002). Maturation of cortical sound processing as indexed by event-related potentials. Clin. Neurophysiol. 113, 870–882. doi: 10.1016/s1388-2457(02)00078-0

Chi, T., Ru, P., and Shamma, S. A. (2005). Multiresolution spectrotemporal analysis of complex sounds. J. Acoust. Soc. Am. 118, 887–906. doi: 10.1121/1.1945807

Commuri, V., Kulasingham, J. P., and Simon, J. Z. (2023). Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention. Front. Neurosci. 17:1264453. doi: 10.3389/fnins.2023.1264453

Crosse, M. J., Di Liberto, G. M., Bednar, A., and Lalor, E. C. (2016). The multivariate temporal response function (mTRF) toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli. Front. Hum. Neurosci. 10:604. doi: 10.3389/fnhum.2016.00604

Dahan, D., and Magnuson, J. S. (2006). “Spoken word recognition” in Handbook of psycholinguistics. eds. M. J. Traxler and M. A. Gernsbacher (Amsterdam: Elsevier), 249–283.

Dale, A. M., Liu, A. K., Fischl, B. R., Buckner, R. L., Belliveau, J. W., Lewine, J. D., et al. (2000). Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron 26, 55–67. doi: 10.1016/s0896-6273(00)81138-1

Das, N., Vanthornhout, J., Francart, T., and Bertrand, A. (2020). Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research. NeuroImage 204:116211. doi: 10.1016/j.neuroimage.2019.116211

Davis, M. H., and Johnsrude, I. S. (2007). Hearing speech sounds: top-down influences on the interface between audition and speech perception. Hear. Res. 229, 132–147. doi: 10.1016/j.heares.2007.01.014

Decruy, L., Vanthornhout, J., Kuchinsky, S. E., Anderson, S., Simon, J. Z., and Francart, T. (2021). “Neural tracking of continuous speech is exaggerated in healthy aging and hearing impaired adults” in Abstract book of the 2021 Midwinter Meeting of the Association for Research in Otolaryngology. Orlando, Florida, United States.

Di Liberto, G. M., O’Sullivan, J. A., and Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Curr. Biol. 25, 2457–2465. doi: 10.1016/j.cub.2015.08.030

Dimitrijevic, A., Smith, M. L., Kadis, D. S., and Moore, D. R. (2019). Neural indices of listening effort in noisy environments. Sci. Rep. 9:11278. doi: 10.1038/s41598-019-47643-1

Ding, N., Chatterjee, M., and Simon, J. Z. (2014). Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. NeuroImage 88, 41–46. doi: 10.1016/j.neuroimage.2013.10.054

Ding, N., and Simon, J. Z. (2012a). Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. USA 109, 11854–11859. doi: 10.1073/pnas.1205381109

Ding, N., and Simon, J. Z. (2012b). Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. J. Neurophysiol. 107, 78–89. doi: 10.1152/jn.00297.2011

Ding, N., and Simon, J. Z. (2013). Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J. Neurosci. 33, 5728–5735. doi: 10.1523/jneurosci.5297-12.2013

Drennan, D. P., and Lalor, E. C. (2019). Cortical tracking of complex sound envelopes: modeling the changes in response with intensity. eNeuro 6:ENEURO.0082-19.2019. doi: 10.1523/eneuro.0082-19.2019

Duta, M., and Plunkett, K. (2021). A neural network model of lexical-semantic competition during spoken word recognition. Front. Hum. Neurosci. 15:700281. doi: 10.3389/fnhum.2021.700281

Engell, A. D., and McCarthy, G. (2014). Repetition suppression of face-selective evoked and induced EEG recorded from human cortex. Hum. Brain Mapp. 35, 4155–4162. doi: 10.1002/hbm.22467

Etard, O., and Reichenbach, T. (2019). Neural speech tracking in the theta and in the delta frequency band differentially encode clarity and comprehension of speech in noise. J. Neurosci. 39, 5750–5759. doi: 10.1523/jneurosci.1828-18.2019

Farris-Trimble, A., McMurray, B., Cigrand, N., and Bruce Tomblin, J. (2014). The process of spoken word recognition in the face of signal degradation. J. Exp. Psychol. Hum. Percept. Perform. 40, 308–327. doi: 10.1037/a0034353

Fernald, A., Swingley, D., and Pinto, J. P. (2001). When half a word is enough: infants can recognize spoken words using partial phonetic information. Child Dev. 72, 1003–1015. doi: 10.1111/1467-8624.00331

Fischl, B. (2012). FreeSurfer. NeuroImage 62, 774–781. doi: 10.1016/j.neuroimage.2012.01.021

Frank, S. L., Otten, L. J., Galli, G., and Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain Lang. 140, 1–11. doi: 10.1016/j.bandl.2014.10.006

Frank, S. L., and Willems, R. M. (2017). Word predictability and semantic similarity show distinct patterns of brain activity during language comprehension. Lang. Cogn. Neurosci. 32, 1192–1203. doi: 10.1080/23273798.2017.1323109

Friesen, L. M., Shannon, R. V., Baskent, D., and Wang, X. (2001). Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants. J. Acoust. Soc. Am. 110, 1150–1163. doi: 10.1121/1.1381538

Friston, K. (2005). A theory of cortical responses. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 360, 815–836. doi: 10.1098/rstb.2005.1622

Garrido, M. I., Kilner, J. M., Kiebel, S. J., Stephan, K. E., Baldeweg, T., and Friston, K. J. (2009a). Repetition suppression and plasticity in the human brain. NeuroImage 48, 269–279. doi: 10.1016/j.neuroimage.2009.06.034

Garrido, M. I., Kilner, J. M., Stephan, K. E., and Friston, K. J. (2009b). The mismatch negativity: a review of underlying mechanisms. Clin. Neurophysiol. 120, 453–463. doi: 10.1016/j.clinph.2008.11.029

Giraud, A.-L., and Lee, H.-J. (2007). Predicting cochlear implant outcome from brain organisation in the deaf. Restor. Neurol. Neurosci. 25, 381–390. doi: 10.3233/rnn-2007-253420

Giraud, A. L., and Truy, E. (2002). The contribution of visual areas to speech comprehension: a PET study in cochlear implants patients and normal-hearing subjects. Neuropsychologia 40, 1562–1569. doi: 10.1016/s0028-3932(02)00023-4

Grill-Spector, K., Henson, R., and Martin, A. (2006). Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn. Sci. 10, 14–23. doi: 10.1016/j.tics.2005.11.006

Hartwigsen, G., Saur, D., Price, C. J., Ulmer, S., Baumgaertner, A., and Siebner, H. R. (2013). Perturbation of the left inferior frontal gyrus triggers adaptive plasticity in the right homologous area during speech production. Proc. Natl. Acad. Sci. USA 110, 16402–16407. doi: 10.1073/pnas.1310190110

Hauswald, A., Keitel, A., Chen, Y.-P., Rösch, S., and Weisz, N. (2022). Degradation levels of continuous speech affect neural speech tracking and alpha power differently. Eur. J. Neurosci. 55, 3288–3302. doi: 10.1111/ejn.14912

Heilbron, M., Armeni, K., Schoffelen, J. M., Hagoort, P., and De Lange, F. P. (2022). A hierarchy of linguistic predictions during natural language comprehension. Proc. Natl. Acad. Sci. USA 119:e2201968119. doi: 10.1073/pnas.2201968119

Heilbron, M., and Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex? Neuroscience 389, 54–73. doi: 10.1016/j.neuroscience.2017.07.061

Huettig, F., Rommers, J., and Meyer, A. S. (2011). Using the visual world paradigm to study language processing: a review and critical evaluation. Acta Psychol. 137, 151–171. doi: 10.1016/j.actpsy.2010.11.003

Jafarpisheh, A. S., Jafari, A. H., Abolhassani, M., Farhadi, M., Sadjedi, H., Pourbakht, A., et al. (2016). Nonlinear feature extraction for objective classification of complex auditory brainstem responses to diotic perceptually critical consonant-vowel syllables. Auris Nasus Larynx 43, 37–44. doi: 10.1016/j.anl.2015.06.003

Jang, H., Lee, J., Lim, D., Lee, K., Jeon, A., and Jung, E. (2008). Development of Korean standard sentence lists for sentence recognition tests. Audiol 4, 161–177. doi: 10.21848/audiol.2008.4.2.161

Karunathilake, I. M. D., Dunlap, J. L., Perera, J., Presacco, A., Decruy, L., Anderson, S., et al. (2023a). Effects of aging on cortical representations of continuous speech. J. Neurophysiol. 129, 1359–1377. doi: 10.1152/jn.00356.2022

Karunathilake, I. M. D., Kulasingham, J. P., and Simon, J. Z. (2023b). Neural tracking measures of speech intelligibility: manipulating intelligibility while keeping acoustics unchanged. Proc. Natl. Acad. Sci. 120:e2309166120. doi: 10.1073/pnas.2309166120

Kayser, S. J., Ince, R. A. A., Gross, J., and Kayser, C. (2015). Irregular speech rate dissociates auditory cortical entrainment, evoked responses, and frontal alpha. J. Neurosci. 35, 14691–14701. doi: 10.1523/jneurosci.2243-15.2015

Kegler, M., Weissbart, H., and Reichenbach, T. (2022). The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Front. Neurosci. 16:915744. doi: 10.3389/fnins.2022.915744

Khalighinejad, B., Cruzatto da Silva, G., and Mesgarani, N. (2017). Dynamic encoding of acoustic features in neural responses to continuous speech. J. Neurosci. 37, 2176–2185. doi: 10.1523/jneurosci.2383-16.2017

Kong, Y. Y., Mullangi, A., and Ding, N. (2014). Differential modulation of auditory responses to attended and unattended speech in different listening conditions. Hear. Res. 316, 73–81. doi: 10.1016/j.heares.2014.07.009

Kösem, A., Dai, B., McQueen, J. M., and Hagoort, P. (2023). Neural tracking of speech envelope does not unequivocally reflect intelligibility. NeuroImage 272:120040. doi: 10.1016/j.neuroimage.2023.120040

Koskinen, M., Kurimo, M., Gross, J., Hyvärinen, A., and Hari, R. (2020). Brain activity reflects the predictability of word sequences in listened continuous speech. NeuroImage 219:116936. doi: 10.1016/j.neuroimage.2020.116936

Krumbholz, K., Patterson, R. D., Seither-Preisler, A., Lammertmann, C., and Lütkenhöner, B. (2003). Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cereb. Cortex 13, 765–772. doi: 10.1093/cercor/13.7.765

Kutas, M., and Federmeier, K. D. (2011). Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu. Rev. Psychol. 62, 621–647. doi: 10.1146/annurev.psych.093008.131123

Lai, J., Price, C. N., and Bidelman, G. M. (2022). Brainstem speech encoding is dynamically shaped online by fluctuations in cortical α state. NeuroImage 263:119627. doi: 10.1016/j.neuroimage.2022.119627

Lalor, E. C., and Foxe, J. J. (2010). Neural responses to uninterrupted natural speech can be extracted with precise temporal resolution. Eur. J. Neurosci. 31, 189–193. doi: 10.1111/j.1460-9568.2009.07055.x

Langner, F., Arenberg, J. G., Büchner, A., and Nogueira, W. (2021). Assessing the relationship between neural health measures and speech performance with simultaneous electric stimulation in cochlear implant listeners. PLoS One 16:e0261295. doi: 10.1371/journal.pone.0261295

Lau, E. F., Phillips, C., and Poeppel, D. (2008). A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9, 920–933. doi: 10.1038/nrn2532

Mai, G., and Wang, W. S. Y. (2019). Delta and theta neural entrainment during phonological and semantic processing in speech perception. bioRxiv 556837. doi: 10.1101/556837

Martin, B. A., Tremblay, K. L., and Korczak, P. (2008). Speech evoked potentials: from the laboratory to the clinic. Ear Hear. 29, 285–313. doi: 10.1097/aud.0b013e3181662c0e

McGettigan, C., Agnew, Z. K., and Scott, S. K. (2010). Are articulatory commands automatically and involuntarily activated during speech perception? Proc. Natl. Acad. Sci. 107, E42. doi: 10.1073/pnas.1000186107

McMurray, B., Farris-Trimble, A., and Rigler, H. (2017). Waiting for lexical access: Cochlear implants or severely degraded input lead listeners to process speech less incrementally. Cognition 169, 147–164. doi: 10.1016/j.cognition.2017.08.013

McQueen, J. M., Norris, D., and Cutler, A. (1999). Lexical influence in phonetic decision making: evidence from subcategorical mismatches. J. Exp. Psychol. Hum. Percept. Perform. 25:1363. doi: 10.1037/0096-1523.25.5.1363

Mehta, A. H., and Oxenham, A. J. (2017). Vocoder simulations explain complex pitch perception limitations experienced by cochlear implant users. J. Assoc. Res. Otolaryngol. 18, 789–802. doi: 10.1007/s10162-017-0632-x

Muncke, J., Kuruvila, I., and Hoppe, U. (2022). Prediction of speech intelligibility by means of EEG responses to sentences in noise. Front. Neurosci. 16:876421. doi: 10.3389/fnins.2022.876421

Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology 38, 1–21. doi: 10.1111/1469-8986.3810001

Norris, D., McQueen, J. M., and Cutler, A. (2016). Prediction, Bayesian inference and feedback in speech recognition. Lang. Cogn. Neurosci. 31, 4–18. doi: 10.1080/23273798.2015.1081703

O’Sullivan, J. A., Power, A. J., Mesgarani, N., Rajaram, S., Foxe, J. J., Shinn-Cunningham, B. G., et al. (2015). Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cereb. Cortex 25, 1697–1706. doi: 10.1093/cercor/bht355

Obleser, J., and Kotz, S. A. (2010). Expectancy constraints in degraded speech modulate the language comprehension network. Cereb. Cortex 20, 633–640. doi: 10.1093/cercor/bhp128

Obleser, J., Scott, S. K., and Eulitz, C. (2006). Now you hear it, now you don’t: transient traces of consonants and their nonspeech analogues in the human brain. Cereb. Cortex 16, 1069–1076. doi: 10.1093/cercor/bhj047

Obleser, J., Wise, R. J. S., Dresner, M. A., and Scott, S. K. (2007). Functional integration across brain regions improves speech perception under adverse listening conditions. J. Neurosci. 27, 2283–2289. doi: 10.1523/jneurosci.4663-06.2007

Okahara, Y., Aoyagi, K., Iwasa, H., and Higuchi, Y. (2024). Language lateralization by passive auditory fMRI in presurgical assessment for temporal lobe epilepsy: a single-center retrospective study. J. Clin. Med. 13:1706. doi: 10.3390/jcm13061706

Peelle, J. E., Gross, J., and Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cereb. Cortex 23, 1378–1387. doi: 10.1093/cercor/bhs118

Peelle, J. E., Johnsrude, I. S., and Davis, M. H. (2010). Hierarchical processing for speech in human auditory cortex and beyond. Front. Hum. Neurosci. 4:51. doi: 10.3389/fnhum.2010.00051

Peelle, J. E., and Wingfield, A. (2016). The neural consequences of age-related hearing loss. Trends Neurosci. 39, 486–497. doi: 10.1016/j.tins.2016.05.001

Peelle, J., and Wingfield, A. (2022). How our brains make sense of noisy speech. Acoust. Today 18, 40–48. doi: 10.1121/at.2022.18.3.40

Pendyala, V., Sethares, W., and Easwar, V. (2024). Assessing speech audibility via syllabic-rate neural responses in adults and children with and without hearing loss. Trends Hear. 28:23312165241227815. doi: 10.1177/23312165241227815

Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., et al. (2000). Auditory cortex accesses phonological categories: an MEG mismatch study. J. Cogn. Neurosci. 12, 1038–1055. doi: 10.1162/08989290051137567

Pivik, R. T., and Harman, K. (1995). A reconceptualization of EEG alpha activity as an index of arousal during sleep: all alpha activity is not equal. J. Sleep Res. 4, 131–137. doi: 10.1111/j.1365-2869.1995.tb00161.x

Presacco, A., Simon, J. Z., and Anderson, S. (2016). Evidence of degraded representation of speech in noise, in the aging midbrain and cortex. J. Neurophysiol. 116, 2346–2355. doi: 10.1152/jn.00372.2016

Prince, P., Paul, B. T., Chen, J., Le, T., Lin, V., and Dimitrijevic, A. (2021). Neural correlates of visual stimulus encoding and verbal working memory differ between cochlear implant users and normal-hearing controls. Eur. J. Neurosci. 54, 5016–5037. doi: 10.1111/ejn.15365

Pulvermüller, F., Shtyrov, Y., Hasting, A. S., and Carlyon, R. P. (2008). Syntax as a reflex: neurophysiological evidence for early automaticity of grammatical processing. Brain Lang. 104, 244–253. doi: 10.1016/j.bandl.2007.05.002

Rao, R. P. N., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. doi: 10.1038/4580

Rigler, H., Farris-Trimble, A., Greiner, L., Walker, J., Tomblin, J. B., and McMurray, B. (2015). The slow developmental time course of real-time spoken word recognition. Dev. Psychol. 51:1690. doi: 10.1037/dev0000044

Rosen, S., Souza, P., Ekelund, C., and Majeed, A. A. (2013). Listening to speech in a background of other talkers: effects of talker number and noise vocoding. J. Acoust. Soc. Am. 133, 2431–2443. doi: 10.1121/1.4794379

Sedley, W., Gander, P. E., Kumar, S., Kovach, C. K., Oya, H., Kawasaki, H., et al. (2016). Neural signatures of perceptual inference. eLife 5:e11476. doi: 10.7554/elife.11476

Sekerina, I. A., and Brooks, P. J. (2007). Eye movements during spoken word recognition in Russian children. J. Exp. Child Psychol. 98, 20–45. doi: 10.1016/j.jecp.2007.04.005

Sharma, A., Dorman, M. F., and Spahr, A. J. (2002a). A sensitive period for the development of the central auditory system in children with cochlear implants: implications for age of implantation. Ear Hear. 23, 532–539. doi: 10.1097/00003446-200212000-00004

Sharma, A., Dorman, M. F., and Spahr, A. J. (2002b). Rapid development of cortical auditory evoked potentials after early cochlear implantation. Neuroreport 13, 1365–1368. doi: 10.1097/00001756-200207190-00030

Slaats, S., Weissbart, H., Schoffelen, J. M., Meyer, A. S., and Martin, A. E. (2023). Delta-band neural responses to individual words are modulated by sentence processing. J. Neurosci. 43, 4867–4883. doi: 10.1523/jneurosci.0964-22.2023

Sohoglu, E., and Davis, M. H. (2016). Perceptual learning of degraded speech by minimizing prediction error. Proc. Natl. Acad. Sci. USA 113, E1747–E1756. doi: 10.1073/pnas.1523266113

Steinmetzger, K., and Rosen, S. (2017). Effects of acoustic periodicity and intelligibility on the neural oscillations in response to speech. Neuropsychologia 95, 173–181. doi: 10.1016/j.neuropsychologia.2016.12.003

Swingley, D., Pinto, J. P., and Fernald, A. (1999). Continuous processing in word recognition at 24 months. Cognition 71, 73–108. doi: 10.1016/s0010-0277(99)00021-9

Synigal, S. R., Teoh, E. S., and Lalor, E. C. (2020). Including measures of high gamma power can improve the decoding of natural speech from EEG. Front. Hum. Neurosci. 14:130. doi: 10.3389/fnhum.2020.00130

Teghipco, A., Okada, K., Murphy, E., and Hickok, G. (2023). Predictive coding and internal error correction in speech production. Neurobiol. Lang. 4, 81–119. doi: 10.1162/nol_a_00088

Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., and Farah, M. J. (1997). Role of left inferior prefrontal cortex in retrieval of semantic knowledge: a reevaluation. Proc. Natl. Acad. Sci. 94, 14792–14797. doi: 10.1073/pnas.94.26.14792

Todorovic, A., van Ede, F., Maris, E., and de Lange, F. P. (2011). Prior expectation mediates neural adaptation to repeated sounds in the auditory cortex: an MEG study. J. Neurosci. 31, 9118–9123. doi: 10.1523/jneurosci.1425-11.2011

Toffolo, K. K., Freedman, E. G., and Foxe, J. J. (2022). Evoking the N400 event-related potential (ERP) component using a publicly available novel set of sentences with semantically incongruent or congruent eggplants (endings). Neuroscience 501, 143–158. doi: 10.1016/j.neuroscience.2022.07.030

Tremblay, K. L., Friesen, L., Martin, B. A., and Wright, R. (2003). Test-retest reliability of cortical evoked potentials using naturally produced speech sounds. Ear Hear. 24, 225–232. doi: 10.1097/01.aud.0000069229.84883.03

Vanthornhout, J., Decruy, L., and Francart, T. (2019). Effect of task and attention on neural tracking of speech. Front. Neurosci. 13:977. doi: 10.3389/fnins.2019.00977

Wacongne, C., Labyt, E., Van Wassenhove, V., Bekinschtein, T., Naccache, L., and Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proc. Natl. Acad. Sci. USA 108, 20754–20759. doi: 10.1073/pnas.1117807108

Weissbart, H., Kandylaki, K. D., and Reichenbach, T. (2020). Cortical tracking of surprisal during continuous speech comprehension. J. Cogn. Neurosci. 32, 155–166. doi: 10.1162/jocn_a_01467

Wilson, B. S., and Dorman, M. F. (2008). Cochlear implants: a remarkable past and a brilliant future. Hear. Res. 242, 3–21. doi: 10.1016/j.heares.2008.06.005

Wilson, B. S., Finley, C. C., Lawson, D. T., Wolford, R. D., Eddington, D. K., and Rabinowitz, W. M. (1991). Better speech recognition with cochlear implants. Nature 352, 236–238. doi: 10.1038/352236a0

Yang, T., Kurkela, J. L. O., Chen, K., Liu, Y., Shu, H., Cong, F., et al. (2024). Native language advantage in electrical brain responses to speech sound changes in passive and active listening condition. Neuropsychologia 201:108936. doi: 10.1016/j.neuropsychologia.2024.108936

Yip, M. C. W., Blumenfeld, H. K., and Cieślicka, A. B. (2021). Bilingual and multilingual spoken-word recognition: empirical and theoretical perspectives. Front. Psychol. 12:696354. doi: 10.3389/fpsyg.2021.696354

Keywords: compensatory neural processing, degraded speech, neural tracking, speech perception, temporal response functions

Citation: Na Y, Quan LDA, Joo H, Choi I and Woo J (2026) Neural tracking of continuous speech reveals enhanced late responses to degraded speech. Front. Neurosci. 20:1751421. doi: 10.3389/fnins.2026.1751421

Received: 21 November 2025; Revised: 17 January 2026; Accepted: 21 January 2026;
Published: 06 February 2026.

Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

Reviewed by:

Jesyin Lai, St. Jude Children’s Research Hospital, United States
Mareike Daeglau, University of Oldenburg, Germany

Copyright © 2026 Na, Quan, Joo, Choi and Woo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jihwan Woo, jhwoo@ulsan.ac.kr

These authors have contributed equally to this work