Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss

Introduction Understanding speech in a noisy environment, as opposed to speech in quiet, becomes increasingly more difficult with increasing age. Using the quiet-aged gerbil, we studied the effects of aging on speech-in-noise processing. Specifically, behavioral vowel discrimination and the encoding of these vowels by single auditory-nerve fibers were compared, to elucidate some of the underlying mechanisms of age-related speech-in-noise perception deficits. Methods Young-adult and quiet-aged Mongolian gerbils, of either sex, were trained to discriminate a deviant naturally-spoken vowel in a sequence of vowel standards against a speech-like background noise. In addition, we recorded responses from single auditory-nerve fibers of young-adult and quiet-aged gerbils while presenting the same speech stimuli. Results Behavioral vowel discrimination was not significantly affected by aging. For both young-adult and quiet-aged gerbils, the behavioral discrimination between /eː/ and /iː/ was more difficult to make than /eː/ vs. /aː/ or /iː/ vs. /aː/, as evidenced by longer response times and lower d’ values. In young-adults, spike timing-based vowel discrimination agreed with the behavioral vowel discrimination, while in quiet-aged gerbils it did not. Paradoxically, discrimination between vowels based on temporal responses was enhanced in aged gerbils for all vowel comparisons. Representation schemes, based on the spectrum of the inter-spike interval histogram, revealed stronger encoding of both the fundamental and the lower formant frequencies in fibers of quiet-aged gerbils, but no qualitative changes in vowel encoding. Elevated thresholds in combination with a fixed stimulus level, i.e., lower sensation levels of the stimuli for old individuals, can explain the enhanced temporal coding of the vowels in noise. 
Discussion These results suggest that the altered auditory-nerve discrimination metrics in old gerbils may mask age-related deterioration in the central (auditory) system to the extent that behavioral vowel discrimination matches that of the young adults.


Introduction
As we age, our auditory system inevitably degrades. By the age of 70, the number of sensory hair cells and auditory nerve (AN) fibers is approximately half of what it was at birth (Wu et al., 2020), and 63% of people over 70 experience some degree of hearing loss (Lin et al., 2011). One of the major consequences of age-related hearing loss is difficulty with communication, putting the elderly at risk for social isolation and depression. Even among elderly people with clinically mild hearing loss, almost 60% reported communication difficulties (Dalton et al., 2003). In particular, understanding speech in a noisy environment, as opposed to speech in quiet, is strongly degraded with increasing age (Dubno et al., 1984; Fogerty et al., 2012; Füllgrabe et al., 2015). Therefore, it is crucial to understand the underlying mechanisms for age-related deficits in speech-in-noise perception.
There are multiple levels at which communication difficulties can be studied, ranging from the inability to distinguish between different meaningless syllables (logatomes) up to increased cognitive effort while listening to speech in noise. Here, we focus on the fundamental discrimination between vowels. Vowels differ from each other at the spectral level. Specifically, different vowels have different formant frequencies (f1-f3), which are global regions of maximum energy in the vowel spectrum (Diehl, 2008). In particular, f1 and f2 have been shown to be important for the discrimination of different vowels, both in gerbils and in humans (Peterson and Barney, 1952; Jüchter et al., 2022). The spectrum of spoken vowels, as opposed to that of whispered vowels, also comprises harmonics of the fundamental frequency (f0), the pitch of the speaker's voice, which does not systematically differ between vowels (Diehl, 2008).
When vowels are presented in noise, both the f0 and the formant frequencies are encoded by the AN in the exact spike timing, provided that the signal-to-noise ratio (SNR) is sufficiently high. This can be studied in recordings from single fibers of the AN and visualized in representation schemes based on metrics of spike timing, such as the averaged localized interval rate (ALIR) and the dominant component scheme (Young and Sachs, 1979; Sachs and Young, 1980; Delgutte and Kiang, 1984a). The formant frequencies appear in these spike timing-based representation schemes not only for synthetic vowels in quiet, but also for naturally-spoken vowels presented in background noise (Heeringa and Köppl, 2022). Specifically, inter-spike interval histograms were constructed to account for small changes in the periodicity of f0 over time, which are common for naturally spoken vowels. Here, we investigate if and how these representation schemes are affected by age-related hearing loss in the gerbil.
The quiet-aged gerbil is an attractive model for studying the pathology of age-related deficits in speech-in-noise perception and has often been used to study age-related hearing loss. Therefore, much is known about its age-related cochlear and central auditory pathologies (e.g., Schmiedt, 2010; Radtke-Schuller et al., 2015; Kessler et al., 2020). Furthermore, the gerbil has good low-frequency hearing, similar to humans, which is important for translating findings on speech encoding and perception, for which spectral energy at low frequencies is fundamental (Ryan, 1976; DePaolis et al., 1996). In addition, gerbils can easily be trained to indicate their behavioral discrimination between different logatomes (Sinnott et al., 1997; Sinnott and Mosteller, 2001). Importantly, young, normal-hearing gerbils experience the same confusions between different vowels as humans, although they need higher SNRs for equivalent performance (Jüchter et al., 2022).
Our previous studies on behavioral vowel discrimination in noise by young-adult gerbils revealed which vowels were difficult and which were easy to discriminate from each other (Jüchter et al., 2022). Furthermore, neural vowel discrimination at the level of a single AN fiber agreed with behavioral discrimination when applying a spike timing-based discrimination metric (Heeringa and Köppl, 2022). Here, our aim was to study single auditory nerve fiber and behavioral vowel discrimination in quiet-aged gerbils, to elucidate some of the underlying mechanisms of the age-related deficit in speech-in-noise perception. We showed that mild age-related hearing loss provided an advantage in terms of vowel discrimination at the level of AN spiking patterns, while behavioral discrimination abilities for vowels in noise were not significantly affected in quiet-aged gerbils.

Animals
For the neurophysiological part of this study, seven young-adult (three female) and five quiet-aged (one female) Mongolian gerbils (Meriones unguiculatus) were used. Animals weighed between 63 and 120 grams, were of either sex, and were free of middle-ear infections. Young-adult gerbils were aged 3.5-6.0 months and quiet-aged gerbils were aged 36.7-41.4 months. Hearing thresholds were determined by measuring auditory brainstem responses (ABRs). Only animals with an ABR threshold to chirps <60 dB sound pressure level (SPL) were included, thereby excluding several quiet-aged gerbils with severe age-related hearing loss. This criterion was necessary to make sure all animals could process the speech stimuli, which had a fixed stimulus SPL (see below). Data of six out of seven young-adult gerbils were published in a previous report (Heeringa and Köppl, 2022).
For the behavioral experiments, a different group of nine young-adult (four female) and ten quiet-aged gerbils (three female) was used, with weights between 60 and 87 grams. The age range for the young-adult gerbils was 4-9 months and for the quiet-aged gerbils 33-43 months, at the time of data collection. Binaural ABR thresholds to clicks were <52 dB peak equivalent SPL (dB pe SPL) for the young-adult gerbils and <69 dB pe SPL for the quiet-aged gerbils, measured before the training started or during the behavioral testing. Behavioral data of four out of nine young-adult gerbils were published previously (Jüchter et al., 2022).
All animals were born at the University of Oldenburg animal house and were group housed in a controlled, quiet environment (average sound levels LAeq of 48 and 55 dB SPL outside and during working hours, respectively) to minimize noise-induced hearing loss. All experimental procedures were reviewed and approved by the ethics authorities of Lower Saxony, Germany, under permit numbers AZ 33.19-42502-04-15/1990, AZ 33.19-42502-04-21/3695, and AZ 33.19-42502-04-21/3821.

The auditory brainstem response (ABR)
For the neurophysiological experiments, ABRs were used to determine hearing thresholds before recording from single-unit AN fibers and to monitor cochlear health during the ongoing experiment. ABRs were recorded during the presentation of chirps (0.3-19 kHz, 4.2 ms duration). Stimulus levels were typically separated by 5 dB and were presented randomly (200-400 repetitions per level). Acoustic stimuli were generated and calibrated using custom-written software in MATLAB (MathWorks) and an external audio card (Hammerfall DSP Multiface II, RME Audio; 48 kHz sampling rate). After preamplification (HB7, TDT), stimuli were presented through a small speaker (IE 800, Sennheiser) that was sealed into the ear bar. After each (re)placement of the ear bar, a calibration file was obtained by measuring SPL near the eardrum with a miniature microphone (ER7-C, Etymotic Research) sealed in the same ear bar, amplified by a microphone amplifier (MA3, TDT). To record ABRs, one platinum needle electrode was placed subcutaneously ventral of the ear canal, and one was placed in the ipsilateral neck muscle. ABRs were amplified (10,000 times) and bandpass filtered (0.3-3 kHz) using an amplifier (ISO 80, World Precision Instruments), and recorded using the same Hammerfall audio card (48 kHz sampling rate) controlled by custom-written software (MATLAB). ABR thresholds were defined as the lowest sound level at which clear ABR waves were still visually distinguishable, and at which wave I had an amplitude of 4 μV or higher.
For the behavioral experiments, ABRs were measured once to make sure the animal was able to hear the stimuli needed for training. Recordings of the ABRs were obtained similarly as described above, with a few exceptions. ABRs were recorded during the presentation of clicks (0.2-15 kHz, 40 μs duration) with 10-dB level steps. No microphone amplifier was used. Furthermore, the stainless-steel needle recording and reference electrodes were placed subcutaneously at the vertex and in the neck on the midline, respectively. Finally, ABR thresholds were defined using custom MATLAB software written by one of the authors (R.B.) using the procedure described in Suthakar and Liberman (2019), which was visually cross-checked for each threshold. All animals had click ABR thresholds <70 dB pe SPL and proceeded to the behavioral training.

Surgical procedures
Gerbils were anesthetized with 135 mg/kg ketamine (Ketamidor, WDT) and 6 mg/kg xylazine (Xylazin, Serumwerk) diluted in saline (0.9% NaCl) and injected intraperitoneally. Anesthetic depth was maintained by supplementary subcutaneous ketamine/xylazine injections. One-third of the initial dose was injected hourly, and one-sixth of the initial dose was provided upon a positive hind paw reflex. Additionally, three out of five quiet-aged gerbils received a preemptive injection of the antiphlogistic agent meloxicam (Metacam, Boehringer Ingelheim; 1 mg/kg). Two out of five quiet-aged gerbils were tracheotomized but breathed unaided. Additional oxygen was provided (1.5 L/min) in front of the mouth or tracheotomy tube throughout the experiment to all animals. Heartbeat, heart rate, muscle potentials, and breathing rate were constantly monitored through electrocardiogram recordings using intramuscular needle electrodes in the front and contralateral hind leg. Body temperature was maintained at 38°C using a rectal probe and a homeothermic blanket (Harvard Apparatus).
The skull of the animal was fixed in a bite bar (Kopf Instruments), with the head mount glued to the exposed frontal skull using dental cement. The right bony ear canal was exposed by removing the pinna. The ear bar, which contained the speaker (IE 800, Sennheiser) and miniature microphone (ER7-C, Etymotic Research), was sealed to the ear canal using petroleum jelly. A small opening was made in the dorsolateral bulla to prevent negative pressure buildup in the middle ear.
The auditory nerve was visualized using a dorsal approach as follows. The bony structures overlying the right cerebellum, including parts of the occipital, parietal, and temporal bone, were removed carefully. After a duratomy, the cerebellum was partially aspirated until the brainstem overlying the AN was exposed. The AN was visualized by placing one or two small balls of paper tissue (<0.5 mm diameter), drenched in saline, between the brainstem and the temporal bone.

Single-unit recordings
Glass micropipette electrodes (BF120F-10, Science products) were pulled on a P-2000 puller (Sutter Instruments) and filled with 3 M KCl. Impedances were typically between 20 and 40 MΩ. Electrodes were mounted into an electrode holder that was attached to an inchworm motor controller (6000 ULN, Burleigh), which was held by a micromanipulator (Märzhäuser). Under visual control and using the micromanipulator, electrodes were placed just above the AN fiber bundle. Electrodes were advanced through the AN bundle in small steps (1-5 μm; 6005 ULN handset, Burleigh). Meanwhile, a broadband-noise search stimulus (50-70 dB SPL) was played through the speaker. Signals recorded from the glass electrode were amplified (WPI 767), filtered for line-frequency noise (50/60 Hz; Hum Bug, Quest Scientific), made audible through a speaker (MS2, TDT), visualized on an oscilloscope (SDS 1102CNL, SIGLENT Technologies), and digitized (RX6, TDT; 48,828 Hz sampling rate) before being displayed on a personal computer using custom-written MATLAB software.
After a single unit was isolated, a quick audiovisual estimate of the fiber's best frequency (BF) and threshold was obtained. Tone bursts at around 10 dB above the estimated threshold (50 ms ON, 190 ms OFF time, 5 ms cosine rise/fall times, 5 repetitions, 10 to 20 linear frequency steps spanning approximately 1.5 octaves) were presented to determine the unit's BF from a frequency-response curve. Subsequently, tone bursts at BF were presented at a range of levels (3 dB step size, 10 repetitions) to determine the unit's threshold from a rate-level function. After recording responses to the consonant-vowel-consonant (CVC) stimuli, as described below, 24 s of neural activity in the absence of acoustic stimuli were recorded to determine the unit's spontaneous rate (SR). When this recording was not available (n = 103 of 194 units), SR was estimated from the randomly inserted silent trials of the rate-level-function recordings (total duration of 800 ms). Furthermore, clicks were presented (97 dB pe SPL, 5 ms delay, 20 ms acquisition time, condensation click, 300 repetitions) to determine response latency. In a small subset of units (n = 4 in young-adult and n = 4 in quiet-aged gerbils), responses to amplitude-modulated tones were collected (128 Hz modulation frequency, 100% modulation depth, carrier frequency at BF or one octave above BF, 2.5 s duration, 20 ms delay, 20 ms cos² rise and fall time, levels ranging from 10-80 dB SPL, 10 repetitions per level). All stimuli, except for the clicks, were calibrated using custom-made MATLAB software.

Consonant-vowel-consonant stimuli
CVCs were derived from the Oldenburg Logatome speech corpus (OLLO) database (Meyer et al., 2010). Vowels were selected based on the outcomes of a behavioral study in young-adult gerbils of the same colony that had demonstrated common and uncommon confusions between all vowels available in the database (Jüchter et al., 2022). Three vowels were chosen from that study: two that were difficult to discriminate from each other (/eː/ vs. /iː/) and one that was easy to discriminate from the other two (/aː/). The outer consonant was fixed at /b/, resulting in the following spoken logatomes: 'behb', 'bieb', and 'bahb'. Logatomes were spoken by a native German, female speaker (code S01F in the OLLO database), who had an average f0 of 259 Hz across the three logatomes. Formant frequencies of these three vowels are listed in Table 1 and were determined as follows. Center phonemes of the logatome stimuli were cut out from the full signal at the AN response analysis window. These cutout signals were filtered with a single-pole pre-emphasis filter, which flattens the overall spectral shape so that it more closely approximates white noise. This is a prerequisite for the subsequent formant extraction step, which was performed by linear prediction (MATLAB, lpc function, 16th order). The resulting filter pole center frequencies and bandwidths were selected using a set of criteria: f2 must be greater than f1; f1 and f2 must lie within the known general limits for f1 and f2, respectively, of all vowels; and the bandwidths must be within reasonable limits. The f0 as well as the drift of f0 are also specified in Table 1. Average f0 was extracted using the maximum of the autocorrelation function of the full cutout phoneme, and instantaneous f0 was extracted using the (discrete) derivative of the instantaneous complex phase of the bandpass-filtered (50-400 Hz) signal. The phase was calculated with the help of a Hilbert transform.
Acquisition duration of one trial with a logatome was 1.15 s, logatome delay was 0.125 s, average length of the logatomes was 0.57 s, and each logatome was presented 60 times. Logatomes were presented at 65 dB SPL against a speech-shaped background noise (ICRA1) derived from the ICRA database (Dreschler et al., 2001) at 5 dB SNR. One excerpt from the ICRA1 noise (duration of 1.15 s) was used as frozen noise background for all stimulus presentations during single unit recordings. Thus, the background noise cannot be a factor explaining variability between responses. Furthermore, the putatively detrimental effects of background noise cannot be averaged out when the noise is frozen between repetitions. We found previously that having a different token of the ICRA1 background noise does not change spike timing-based vowel discrimination in AN fiber responses of young-adult gerbils (Heeringa and Köppl, 2022), suggesting that this particular frozen noise token did not strongly affect the outcomes of the analyses. Noise onset and offset at the beginning and end of the recording time were ramped (10-ms cos² ramps).

Single-unit characterization
All neural signals were bandpass filtered (300-3,000 Hz) and revisited offline for spike detection threshold on a trial-by-trial basis. To determine BF more accurately, rate-frequency curves were fitted using a smoothing spline function. BF was defined by the location of the peak of this fitted smoothing spline function along the frequency axis. Threshold was determined from the rate-level function, defined as the tone level evoking a firing rate larger than mean(SR) + 1.2 × std(SR) and at least 15 spikes/s above the mean SR. SR was determined from the 24 s long recording in silence or from the randomly presented silent trials throughout the tone presentations for the rate-level function at BF, which had a total length of 800 ms. The SR was used to separate the data into low- and high-SR fibers, with the cut-off between these populations at 18 spikes/s (Schmiedt, 1989).
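The threshold criterion and the SR split described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. The function names, the choice of the lowest qualifying level as the threshold, the use of per-trial silent rates to estimate the SR spread, and the assignment of exactly 18 spikes/s to the high-SR class are all assumptions made for illustration.

```python
from statistics import mean, stdev

SR_CUTOFF = 18.0  # spikes/s; low- vs. high-SR boundary (Schmiedt, 1989)


def rate_threshold(levels_db, rates, silent_trial_rates):
    """Return the lowest tone level meeting both rate criteria, or None.

    levels_db          -- tone levels of the rate-level function (dB SPL)
    rates              -- mean driven rate at each level (spikes/s)
    silent_trial_rates -- per-trial spontaneous rates (spikes/s); assumption:
                          mean and std of SR are estimated from these values
    """
    sr_mean = mean(silent_trial_rates)
    sr_std = stdev(silent_trial_rates)  # sample standard deviation
    for level, rate in sorted(zip(levels_db, rates)):
        exceeds_spread = rate > sr_mean + 1.2 * sr_std
        exceeds_floor = rate >= sr_mean + 15.0
        if exceeds_spread and exceeds_floor:
            return level
    return None


def sr_class(sr):
    """Classify a fiber as 'low' or 'high' SR (boundary handling is assumed)."""
    return "low" if sr < SR_CUTOFF else "high"
```

For example, with a spontaneous rate near 5 spikes/s, a tone level that evokes 14 spikes/s passes the 1.2 × std criterion but not the 15 spikes/s criterion, so the threshold is assigned to the next level that satisfies both.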
To confirm that the recorded unit was an AN fiber, and not accidentally recorded from the cochlear nucleus, several checks were carried out. Spike waveforms were checked for the absence of a prepotential, the response patterns to tones at 20-30 dB above threshold were checked for having a primary-like shape, and the shape of the rate-level function was required to fall into one of the three known shapes of AN rate-level functions (straight, sloping saturating, or flat saturating; Winter et al., 1990; Heeringa et al., 2023). When responses to clicks were obtained, click latency was determined and matched to our own click latency vs. BF distribution. When any of these checks failed, the unit was excluded from further analysis. Furthermore, inter-spike intervals (ISIs) were assessed to confirm single-unit isolation. Units with ISIs <0.6 ms, that is, the absolute refractory period of AN fibers (Heil et al., 2007), were excluded from further analysis since these spikes likely derived from more than one AN fiber.
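The ISI-based isolation check can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and it assumes that a single within-trial interval below 0.6 ms is enough to flag a unit, which the text does not state explicitly.

```python
def has_refractory_violation(spike_trains_ms, min_isi_ms=0.6):
    """Return True if any within-trial ISI is shorter than min_isi_ms.

    spike_trains_ms -- list of trials, each a list of spike times in ms
    min_isi_ms      -- absolute refractory period of AN fibers (0.6 ms)
    """
    for train in spike_trains_ms:
        ordered = sorted(train)
        # Compare each spike with its immediate successor in the same trial.
        for earlier, later in zip(ordered, ordered[1:]):
            if later - earlier < min_isi_ms:
                return True
    return False
```

Intervals are only evaluated within a trial, because spikes from different trials are not subject to the refractory period of the fiber.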

Analysis of vowel responses
We found previously that a spike timing-based discrimination metric, the Δ correlation index (ΔCI), agreed with behavioral discrimination abilities in young-adult gerbils (Heeringa and Köppl, 2022; Jüchter et al., 2022). Therefore, we also explored the effects of aging on this neural vowel discrimination metric. ΔCI was calculated as follows. First, the shuffled autocorrelogram (SAC) was calculated for each individual vowel response, according to Louage et al. (2004). The SAC differs from the autocorrelogram (or all-order inter-spike interval histogram) in that it is based on intervals between spikes across spike trains from different trials, rather than the intervals between spikes within one spike train. SACs were calculated using a bin width of 20.48 μs and were normalized for firing rate (r), number of repetitions (trials; N), bin width (Δτ), and analysis window duration (D) by dividing the resulting SAC by the following correction factor (C_SAC):

$$C_{SAC} = N(N-1)\,r^{2}\,\Delta\tau\,D$$

Only spiking activity recorded during the steady-state part of the vowel was analyzed (analysis window duration D of 233 ms). Next, the crossed autocorrelogram (XAC) was calculated for each combination of vowel responses recorded in one AN fiber, that is, /aː/ vs. /eː/, /aː/ vs. /iː/, and /eː/ vs. /iː/. The XAC can be described as the all-order interval histogram of spikes across responses to two different stimuli, rather than between responses to one stimulus as in the SAC (Louage et al., 2004). Since there are no identical repetitions, as in the SAC, and two different rates (r1 and r2, referring to the rates of vowel 1 and vowel 2), the correction factor for the XAC (C_XAC) is described as follows:

$$C_{XAC} = N^{2}\,r_{1}\,r_{2}\,\Delta\tau\,D$$

The same analysis window, start time, and stop time were applied to all three vowel responses. Since the background noise was frozen between all recordings, this also rules out background noise-based variability. The correlation index (CI), defined by the maximum peak height of the normalized SAC, was shown to be a measure of trial-by-trial temporal similarity between responses to the same stimulus (Joris et al., 2006) and has a strong correlation to the classical vector strength measure for phase locking (Kessler et al., 2021). Similarly, the CIx, defined by the maximum peak height of the normalized XAC, can be regarded as a measure of temporal similarity between spiking responses to two different stimuli, with low values meaning low similarity and thus suggesting good discrimination. To account for cases where temporal similarity of responses to the same vowel was already low, the difference between CI and CIx was taken as a spike timing-based discrimination metric, here termed ΔCI (Heeringa and Köppl, 2022). A high ΔCI value reflects temporal dissimilarity between neural responses to the two compared vowels, whereas a low ΔCI value reflects temporally similar responses to the two vowels, regardless of the overall temporal precision of spiking of a given AN fiber.

[Table 1 column headers: Neurophysiology (S01F); Behavior (frequency range of all speakers)]
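The SAC, XAC, and ΔCI computations can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation. The normalization follows Louage et al. (2004) as commonly stated, N(N-1)·r²·Δτ·D for the SAC and N1·N2·r1·r2·Δτ·D for the XAC. The function names, the ±5 ms lag range, and in particular the use of the mean of the two vowels' CIs when forming ΔCI are assumptions.

```python
def mean_rate(trials, duration):
    """Average firing rate (spikes/s) across trials of equal duration (s)."""
    return sum(len(t) for t in trials) / (len(trials) * duration)


def _interval_histogram(train_pairs, bin_width, max_lag):
    """Count all-order intervals (s) between the two trains of each pair."""
    n_side = int(round(max_lag / bin_width))
    counts = [0] * (2 * n_side + 1)
    for train_a, train_b in train_pairs:
        for t_a in train_a:
            for t_b in train_b:
                k = int(round((t_b - t_a) / bin_width))
                if -n_side <= k <= n_side:
                    counts[k + n_side] += 1
    return counts


def sac(trials, duration, bin_width=20.48e-6, max_lag=5e-3):
    """Normalized shuffled autocorrelogram: intervals across different trials."""
    n = len(trials)
    pairs = [(trials[i], trials[j])
             for i in range(n) for j in range(n) if i != j]
    rate = mean_rate(trials, duration)
    norm = n * (n - 1) * rate ** 2 * bin_width * duration
    return [c / norm for c in _interval_histogram(pairs, bin_width, max_lag)]


def xac(trials_1, trials_2, duration, bin_width=20.48e-6, max_lag=5e-3):
    """Normalized crossed autocorrelogram between responses to two stimuli."""
    pairs = [(a, b) for a in trials_1 for b in trials_2]
    r1 = mean_rate(trials_1, duration)
    r2 = mean_rate(trials_2, duration)
    norm = len(trials_1) * len(trials_2) * r1 * r2 * bin_width * duration
    return [c / norm for c in _interval_histogram(pairs, bin_width, max_lag)]


def delta_ci(trials_1, trials_2, duration, **kwargs):
    """CI minus CIx; assumption: CI is the mean of the two vowels' CIs."""
    ci_1 = max(sac(trials_1, duration, **kwargs))
    ci_2 = max(sac(trials_2, duration, **kwargs))
    ci_x = max(xac(trials_1, trials_2, duration, **kwargs))
    return 0.5 * (ci_1 + ci_2) - ci_x
```

As a sanity check of the normalization, feeding the same set of spike trains in as both "vowels" gives a CIx equal to the CI, so ΔCI is zero, which matches the interpretation of low ΔCI as temporally similar responses. The nested loops are quadratic in spike count and are meant for clarity rather than speed.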
Neural vowel representation schemes were constructed based on vowel responses from AN fibers recorded in young-adult and quiet-aged gerbils according to methods described in detail in Heeringa and Köppl (2022). Specifically, the ALIR (averaged localized interval rate) and the dominant component scheme were constructed based on the fast Fourier transform (FFT) of the all-order inter-spike interval histogram (ISIH; 31.25 ms length, 256 bins) (Sachs and Young, 1980; Delgutte and Kiang, 1984b). The all-order ISIHs were calculated over all trials with responses to one vowel during an analysis window when the vowel was present. For the ALIR, the all-order ISIHs were first converted to intervals/s before calculating the FFT. Subsequently, the peak height of the ISIH FFT at each harmonic of f0 was averaged for all units with a BF near that harmonic (Sachs and Young, 1980). The ALIR of each vowel response was plotted as a function of the f0 harmonics for young-adult and quiet-aged gerbils. By excluding the correction for interval rate in the all-order ISIH and plotting it as a probability histogram, the FFT produces a purely temporal measure (Moissl and Meyer-Base, 2000). Since the unit of a first-order ISIH FFT is a temporal coding measure ranging from 0-1 (the synchronization index), the all-order ISIH FFT is expressed as the squared synchronization index (SI²). The dominant component scheme was obtained by plotting the frequency of the highest peak at or near f0 or its harmonics in this temporal ISIH FFT as a function of the fiber's BF, following methods described by Delgutte and Kiang (1984b).
To determine total temporal power at or away from the vowel's formant frequencies, the all-order ISIHs, expressed as ISI probability, and their FFTs were considered, as described above. First, a general overview of the AN response was obtained by plotting the median all-order ISIH FFT for each vowel and each age group across all AN fibers with a low BF and across those with a high BF (cut-off at 3.5 kHz; Huet et al., 2016). Additionally, peaks in the ISIH FFT were further analyzed to determine synchronization (that is, temporal coding strength) at f0, at the harmonics closest to the vowel's formant frequencies, and at all f0 harmonics combined, according to methods described by Wong et al. (1998). At f0 and its harmonics, the highest peak in the all-order ISIH FFT, expressed as the SI² value, in an f0-wide window was determined. Synchronization at the formant frequencies was calculated by adding the SI² at the harmonics closest to f1 and f2:

$$\text{Formant synchronization} = \sum_{k \in L} SI^{2}(k f_{0})$$

where L indicates the list of harmonics which are closest to f1 and f2: for /aː/ L = [3, 4, 5], for /eː/ L = [2, 9, 10, 11], and for /iː/ L = [2, 10]. Higher formant frequencies were often buried in the noise and synchronization to these formants was typically not observed in the neural responses. Furthermore, f1 and f2 have been shown to account for most variability in discriminating between vowels (Peterson and Barney, 1952; Jüchter et al., 2022). Therefore, only f1 and f2 were considered in the analysis of the formant synchronization. The total synchronizing power of the ISIH FFT was calculated as follows, where N indicates the total number of f0 harmonics included in the analysis:

$$\text{Total synchronizing power} = \sum_{k=1}^{N} SI^{2}(k f_{0})$$

Here, we used the first 15 harmonics of f0, relating to a maximum frequency of 4.16 kHz.
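The harmonic peak-picking and summation steps can be sketched as follows, assuming that the SI² spectrum of the all-order ISIH has already been computed. The function names are hypothetical, and centering the f0-wide window on each harmonic is an assumption; the harmonic lists are the ones given in the text.

```python
# Harmonics of f0 closest to f1 and f2 for each vowel, as listed in the text.
FORMANT_HARMONICS = {"a": [3, 4, 5], "e": [2, 9, 10, 11], "i": [2, 10]}


def peak_near(freqs, si2, center, width):
    """Highest SI^2 value in a window of the given width around center (Hz)."""
    lo, hi = center - width / 2.0, center + width / 2.0
    in_window = [v for f, v in zip(freqs, si2) if lo <= f < hi]
    return max(in_window) if in_window else 0.0


def harmonic_sync(freqs, si2, f0, harmonics):
    """Sum of SI^2 peaks at the listed harmonics of f0."""
    return sum(peak_near(freqs, si2, k * f0, f0) for k in harmonics)


def formant_sync(freqs, si2, f0, vowel):
    """Synchronization summed over the harmonics closest to f1 and f2."""
    return harmonic_sync(freqs, si2, f0, FORMANT_HARMONICS[vowel])


def total_sync(freqs, si2, f0, n_harmonics=15):
    """Synchronization summed over the first n_harmonics harmonics of f0."""
    return harmonic_sync(freqs, si2, f0, range(1, n_harmonics + 1))
```

Using half-open windows keeps a spectral bin that falls exactly between two harmonics from being counted twice in the total.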

Responses to amplitude-modulated tones
Vector strength (VS) to the modulation frequency of the amplitude-modulated tones (Goldberg and Brown, 1969) was calculated in the time window between 20 ms after the start of the stimulus until the end of the stimulus. VS represents the tendency of a unit to phase lock to the period of the modulation frequency, and was calculated by

$$VS = \frac{1}{n}\left|\sum_{j=1}^{n} e^{i\varphi(j)}\right|$$

where n represents the total number of spikes, φ(j) is the phase of the j-th spike relative to the period of the modulation frequency, and i is the complex number √(−1). Significance of the VS was determined by

$$p = e^{-n \cdot VS^{2}}$$

When p < 0.001 and n > 50, VS was considered significant (Mardia and Jupp, 2000).
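A minimal Python sketch of the vector strength and its Rayleigh-type significance value is given below. The function names are hypothetical, and the spike times are assumed to be already restricted to the analysis window and expressed in seconds.

```python
import cmath
import math


def vector_strength(spike_times, mod_freq):
    """Return (VS, p) for spike times (s) relative to mod_freq (Hz)."""
    n = len(spike_times)
    if n == 0:
        raise ValueError("at least one spike is required")
    # Phase of each spike within the modulation period, in radians.
    phases = [2.0 * math.pi * mod_freq * t for t in spike_times]
    vs = abs(sum(cmath.exp(1j * phase) for phase in phases)) / n
    p_value = math.exp(-n * vs ** 2)
    return vs, p_value


def is_significant(vs, p_value, n_spikes):
    """Significance criterion from the text: p < 0.001 and n > 50."""
    return p_value < 0.001 and n_spikes > 50
```

Spikes that always fall at the same phase of the 128 Hz modulation cycle give a VS of 1, whereas spikes spread evenly over the cycle give a VS near 0.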

Behavioral paradigm
Gerbils were trained to detect a vowel change in a continuously repeating reference of CVC logatomes with a fixed outer consonant. When the vowel in the repeating reference logatome changed, the animal had to indicate the detection of the change by jumping off a pedestal. If this was correct, the animal was rewarded with a 10-mg food pellet. The order of reference and target vowels was randomized across sessions and between animals. For the behavioral part of the study, an extended set of CVCs including various central vowels and outer consonants was used, but only the behavioral responses to the CVCs used in the AN recordings are analyzed and reported here (72 trials per animal). All CVCs were derived from the OLLO speech material database and were presented against continuous ICRA1 background noise at the same SNR (+5 dB) and SPL (65 dB) as the ones used in AN recordings. CVCs used in the behavioral tests were spoken by two male and two female speakers, including the female speaker who spoke the CVCs used in the AN recordings, and included two utterances per speaker. Each consecutive CVC was randomly chosen from these speakers and utterances. Hence, only a change in vowel, not speaker identity, needed to be reported. For the discrimination of, for example, /aː/ vs. /eː/, both an /aː/ deviant in a stream of repeating /eː/ and an /eː/ deviant in a stream of repeating /aː/ were considered. Table 1 shows the frequency ranges of the fundamental and formant frequencies of the three vowels spoken by the speakers listed above. The f0 and formant frequencies were derived as described above (see Consonant-vowel-consonant stimuli), except that center phonemes of the logatome stimuli were cut out from the full signal at the phoneme label boundaries provided with the OLLO speech corpus. For a detailed description of the behavioral setup and procedure, see Jüchter et al. (2022).

Analysis of behavioral data
Response latencies and detection probabilities were measured to quantify the discriminability of the vowels. For misses and correct rejections in catch trials, a maximum response time of 1.5 s was used. Similarity matrices filled with ranks that were based on the response times of the gerbils for the discrimination between the different CVCs were entered into the multidimensional scaling (MDS) procedure PROXSCAL (Busing et al., 1997) in SPSS (IBM; version 27) for generating perceptual maps. MDS translates the differences in response time ranks to distances in a multidimensional space representing the perceived stimulus similarity by spatial proximity. Long response times are reflected by short distances in the perceptual map, and thus represent poor behavioral discriminability between two vowels. Short response times are shown as large distances in the perceptual map and thus indicate good behavioral discriminability between the two vowels. Values for the sensitivity index d-prime (d') were calculated by applying the inverse cumulative standard normal distribution function Φ⁻¹ to the hit (H) and false-alarm (FA) rates: d' = Φ⁻¹(H) − Φ⁻¹(FA) (Macmillan and Creelman, 2004). For more details about the behavioral data analyses, see Jüchter et al. (2022).
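The d' formula can be illustrated with a short Python sketch using only the standard library. The function name is hypothetical. The source does not state how hit or false-alarm rates of exactly 0 or 1 were handled, so this sketch leaves that to the caller; such values raise an error here.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(FA).

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError otherwise (no correction is applied here).
    """
    z = NormalDist().inv_cdf  # inverse cumulative standard normal
    return z(hit_rate) - z(false_alarm_rate)
```

For instance, equal hit and false-alarm rates give a d' of 0, meaning the deviant vowel was not discriminated from the standard.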

Statistics
Each data distribution that was entered into statistical analyses was tested for normality using the Shapiro-Wilk test. When composite normality was a reasonable assumption (p > 0.05 on the Shapiro-Wilk test), parametric tests were used to evaluate group differences. Specifically, to evaluate effects of age group and vowel comparison on response latencies and d' values from the behavioral experiments, two-way ANOVAs followed by post-hoc two-sample T-tests were used. For ABR thresholds, a normal distribution could also be reasonably assumed and a two-sample T-test was used to determine significant differences between age groups. For the other data distributions, normality could not be assumed, and one of the following non-parametric tests was used as appropriate: Mann-Whitney U tests, Friedman tests, or Wilcoxon Signed Rank tests. Bonferroni corrections were used to correct for multiple comparisons. Statistical analyses were carried out in SPSS (IBM; version 27) and MATLAB (MathWorks; version R2023b) using the Statistics and Machine Learning Toolbox™ (version 23.2).

Behavioral vowel discrimination did not decline in gerbils with age-related hearing loss
Figure 1 shows the perceptual maps of young-adult (Figure 1A) and quiet-aged gerbils (Figure 1B) for the vowels that were included in the neurophysiological part of this study. These maps visualize which vowel discriminations are easy and which are difficult. The proximity between the three vowels, as well as their locations on the perceptual map, is remarkably similar between the two age groups. For all animals, the vowels /eː/ and /iː/ were more difficult to discriminate (closer to each other on the perceptual map) than /iː/ vs. /aː/ or /eː/ vs. /aː/.

Neural vowel discrimination was enhanced in single auditory-nerve fibers of quiet-aged gerbils
The quiet-aged gerbils from which AN fiber responses to the vowels were recorded had a mild to moderate degree of age-related hearing loss. ABR thresholds to chirps were significantly elevated in the quiet-aged compared to the young-adult gerbils, by 20 dB on average (two-sample t-test: T(10) = 4.02, p = 2.4 × 10⁻³; Figure 2A). BFs and thresholds of all single AN units that were recorded from these animals are shown in Figure 2B. Fibers from which CVC responses were also recorded are indicated separately as filled symbols. BFs of fibers from which vowel responses were recorded ranged from 880 to 11,000 Hz for young adults and from 785 to 10,600 Hz for quiet-aged gerbils. Note that the BFs of these fibers do not extend into the range of f₁ of the vowels /eː/ (440 Hz) and /iː/ (275 Hz) and barely encompass f₁ of /aː/ (850 Hz). Thresholds of AN fibers recorded in the quiet-aged gerbils were also significantly elevated compared to those of young-adult gerbils (Mann-Whitney U test: U = 10.88, p = 1.38 × 10⁻²⁷), with a median age-related elevation of 22 dB. Furthermore, the SR of fibers recorded in quiet-aged gerbils was significantly lower than that of young adults (U = 2.50, p = 0.013), consistent with what we have observed previously in a different sample of quiet-aged gerbils (Heeringa et al., 2020, 2023). Table 2 shows, for each vowel and age group, the number of AN fibers for which responses to CVCs were recorded.

Qualitative temporal representation of formant frequencies was unaffected by aging
To further investigate the underlying mechanisms of the putatively improved vowel discrimination in the AN fibers of quiet-aged gerbils, spike timing-based representation schemes were constructed for the three separate vowels. Figures 4A-C show the ALIRs of young-adult and quiet-aged gerbils' AN fibers in response to the vowels /aː/, /eː/, and /iː/, respectively, which represent the average synchronized firing rate for fibers tuned to the f₀ harmonic nearest their own BF. ALIRs of fibers recorded in quiet-aged gerbils appeared very similar to ALIRs of young-adult gerbils, for all tested vowels. Peaks at formant frequencies for the vowels /aː/ and /eː/ were preserved with aging (Figures 4A,B), and the ALIR of the vowel /iː/ did not show peaks at formant frequencies for either age group (Figure 4C). Dominant component schemes, as described in Heeringa and Köppl (2022), were also examined and revealed the same qualitative outcomes as described above for the ALIR (see Supplementary Figure S1).
Together, these data suggest that better temporal discrimination, as expressed by higher ΔCI values, in quiet-aged gerbils (Figure 3) does not correspond to qualitatively better temporal representation, when possible, e.g., in the ALIR of the vowel /iː/. Interestingly, ΔCI values were considerably higher in quiet-aged compared to young-adult gerbils for fibers tuned to BFs above 3.5 kHz (Figures 3A-C), the upper frequency limit for phase-locking in gerbils (Versteegh et al., 2011). Such high-frequency fibers likely temporally code for the lower harmonics of the vowel's power spectrum. The following section will explore this observation in more detail.

Temporal locking to fundamental and formant frequencies was stronger in aged fibers
An indication of temporal coding strength at different frequencies in the vowel responses was obtained by plotting the median ISIH FFTs of all AN fiber recordings. Figures 5A-C show the median ISIH FFT of the low-BF fibers (BF < 3.5 kHz) and confirm stronger temporal coding in fibers of quiet-aged gerbils to the f₀ harmonic near f₁ for the vowels /aː/ and /eː/. High-BF fibers (BF > 3.5 kHz) also revealed enhanced temporal coding to the /aː/ and /eː/ vowels in quiet-aged gerbils, in addition to a peak at f₀ for vowel /aː/ which was absent in the data of the young-adult gerbils (Figures 5D-E). Median ISIH FFTs do not show clear peaks near fundamental or formant frequencies in response to the vowel /iː/ for either young-adult or quiet-aged gerbils, in either BF group (Figures 5C,F).
FIGURE 2 The effects of aging on neurophysiological thresholds and auditory-nerve spontaneous rates. (A) ABR thresholds to chirps of the young-adult (n = 7) and quiet-aged (n = 5) gerbils from which single-unit recordings were collected. (B) Single-unit AN fiber thresholds for tone bursts at BF as a function of the fiber's BF. Data of young-adult and quiet-aged gerbils are plotted as blue circles and red squares, respectively. Units in which responses to at least one vowel were also recorded are represented by the filled symbols. (C) Single-unit AN fiber SRs as a function of the fiber's BF. Symbols represent the same as in panel (B).
Enhanced temporal coding in quiet-aged gerbils could be explained by stimulating closer to threshold
CVC stimuli were presented at the same level, 65 dB SPL, for young-adult and quiet-aged gerbils. However, quiet-aged gerbils had significantly higher thresholds (Figure 2B), meaning that the stimulus was presented closer to threshold in old compared to young-adult gerbils. This is important because temporal locking to low-frequency amplitude modulations is known to vary with stimulus level in AN fibers of normal-hearing cats, with a maximum at levels just above threshold (Joris and Yin, 1992). To determine if this relation is similar in young-adult and quiet-aged gerbils, we presented sinusoidal amplitude-modulated tones at BF at a range of different levels to a subsample of fibers recorded in young-adult and quiet-aged gerbils.

[Figure caption titles: The effects of aging on ISIH FFT power. The effects of stimulus level on temporal coding.]
Vector strength, a measure of temporal locking to periodic sinusoidal stimuli, steeply increased with increasing level and then decreased again less steeply, in fibers from both age groups (Figure 7A). At 65 dB SPL, indicated by the vertical dashed line in Figure 7A, vector strength in fibers of quiet-aged gerbils was higher than in those of young-adult gerbils. This can be explained by the difference in threshold sensitivity between young-adult and quiet-aged gerbils. The peak in the vector strength vs. stimulus level curve was within −5 and +20 dB of the fibers' individual rate threshold, both for young-adult and quiet-aged gerbils (Figure 7B). Hence, the elevated thresholds in quiet-aged gerbils may partly explain enhanced temporal coding to vowels presented at the same SPL. If this is true, a 'young adult-like' vowel encoding in fibers of quiet-aged gerbils is predicted when the stimulus level of the vowel in noise is increased.
To test this hypothesis, we recorded responses to one vowel (/eː/) at two different levels from five single fibers of a quiet-aged gerbil, one at the regular 65 dB SPL and one at 80 dB SPL to compensate for the age-related threshold shift. Both levels included the background noise at 5 dB SNR relative to the SPL of the vowel. Like the ISIH FFT representation in Figure 5, responses of fibers with BF < 3.5 kHz are plotted separately from those with BF > 3.5 kHz. In the two low-BF fibers, the vowel in noise at 80 dB SPL resulted in a decrease of the peak at the first harmonic of f₀ (~550 Hz) and an emergence of a peak at the fundamental frequency (Figure 7C). This same pattern, but more pronounced, is seen in the high-BF fibers (Figure 7D). These data suggest that the enhanced temporal locking to the first harmonic in quiet-aged gerbils was due to stimulating closer to threshold. However, the stronger response to f₀ is not simply explained by a lower effective stimulation level. It appears to emerge specifically in old gerbils.
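The vector strength reported in Figures 7A,B can be illustrated with a minimal Python sketch. It assumes the standard definition of vector strength as the length of the mean resultant vector of spike phases relative to the stimulus period; the function name `vector_strength` and the synthetic spike trains are hypothetical, and the significance criterion used in the study (see Methods) is not reproduced here.

```python
import cmath
import math


def vector_strength(spike_times_s, freq_hz):
    """Vector strength of spike times (in seconds) relative to freq_hz.

    Each spike is mapped to a unit vector at its stimulus phase;
    the result is the length of the mean of those vectors (0 to 1).
    """
    if not spike_times_s:
        raise ValueError("at least one spike is required")
    resultant = sum(cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times_s)
    return abs(resultant) / len(spike_times_s)


if __name__ == "__main__":
    fm = 128.0  # modulation frequency given in the Figure 7 caption
    # Synthetic example: one spike per modulation cycle, always at the same phase.
    locked = [k / fm for k in range(50)]
    # Synthetic example: spikes spread evenly over eight phases of the cycle.
    spread = [k / (8 * fm) for k in range(8)]
    print(vector_strength(locked, fm), vector_strength(spread, fm))
```

Perfectly phase-locked spikes yield a value near 1, whereas spikes distributed uniformly over the modulation cycle yield a value near 0.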

Discussion
Using behavioral experiments, we here showed that behavioral vowel discrimination was not significantly affected by aging. Response latencies and d' sensitivity indices were similar between young-adult and old gerbils. In contrast, temporal firing patterns were clearly dissimilar between young-adult and quiet-aged gerbils, for responses to all vowels. Contrary to intuition, a general improvement of temporal vowel encoding in noise across frequency was observed in old individuals. Furthermore, in quiet-aged gerbils, neural discrimination was not significantly worse for the behaviorally difficult /eː/ vs. /iː/ comparison, compared to the two easy comparisons. Representation schemes, based on the spectrum of the inter-spike interval histogram, revealed stronger encoding of both the fundamental and the first and second formant frequencies in AN fibers of quiet-aged gerbils. Elevated thresholds in combination with a fixed stimulus level can help explain these findings.

Why do AN fibers of quiet-aged gerbils display stronger temporal responses to vowels in noise?
Previously, our lab showed that temporal coding in AN fibers of quiet-aged gerbils was not affected by aging (Heeringa et al., 2020). In that study, temporal coding was evaluated from spiking activity in response to pure tones and white noise. Results showed that encoding of both the temporal fine structure, derived from responses to tone bursts at BF and to noise, and the sound envelope, derived from noise responses, was unaffected by age-related hearing loss. Importantly, these results were obtained when comparing AN fiber responses between young-adult and old gerbils at a fixed sensation level, i.e., every unit was stimulated at 20 dB above its individual rate threshold. In the current study, the stimulus had a fixed level of 65 dB SPL for both young-adult and quiet-aged gerbils, to enable a direct translation to the behavioral study. While temporal fine-structure encoding, as determined by vector strength to tone bursts at BF, seems to be unaffected by aging for both fixed sound levels and fixed sensation levels (Heeringa et al., 2020), envelope encoding strongly varies as a function of level relative to the fiber's threshold (Figure 7B). This can explain the higher synchronization indices, especially to the lower frequencies (< 1 kHz), by fibers of quiet-aged gerbils.
Enhanced temporal coding by AN fibers has also been studied extensively following noise-induced cochlear damage. These studies revealed specifically that, even at equal sensation levels, noise-induced hearing loss causes a distorted tonotopy (e.g., Henry et al., 2016; reviewed in: Parida and Heinz, 2022a). This means that fibers respond more strongly to the low-frequency, off-BF part of stimuli, or even become hypersensitive to these tail frequencies. With noise-induced cochlear damage, frequency selectivity and tip-to-tail ratio are strongly correlated to distorted tonotopy. It is unlikely that a distorted tonotopy among AN fibers plays a major role in gerbils with age-related hearing loss, as the fiber's frequency selectivity (measured 10 dB above the most sensitive point) is not altered and tail hypersensitivity is absent in aged gerbils (Hellstrom and Schmiedt, 1996; Heeringa et al., 2020). Noise-induced deficits in frequency tuning and, consequently, distorted tonotopy are thought to be caused mainly by damage to outer hair cells (Liberman and Dodds, 1984; Parida and Heinz, 2022b), while quiet-aged gerbils, especially those with only mild-to-moderate hearing loss such as the ones involved in this study, have only minimal hair cell loss (Tarnowski et al., 1991). The main cause of age-related hearing loss in gerbils is strial dysfunction, resulting in a loss of the endocochlear potential (Schmiedt et al., 2002). Indeed, temporal coding deficits and associated tonotopic distortions were much less severe for experimentally induced metabolic hearing loss compared to a noise-induced hearing loss of the same degree (Henry et al., 2019). Nevertheless, a small degree of outer hair cell dysfunction and subsequent tonotopic distortion may have been present, as indicated by the enhanced f₀ response when presenting stimuli at the same sensation level (Figures 7C,D) as well as by a high proportion of low-frequency fine structure responses in high-BF fibers of old gerbils (see Figure 3D in Heeringa et al., 2020).

How is neural vowel encoding related to behavioral vowel discrimination?
Apart from the enhanced temporal coding, neural representation of naturally-spoken vowels in noise was qualitatively not affected by age-related hearing loss. This is similar to what has been found for vowel encoding by AN fibers following acoustic trauma (Geisler, 1989; Miller et al., 1997; Parida and Heinz, 2022b). In order to understand what is being said in adverse listening conditions, neural encoding of vowels only needs to be good enough for a perceptual discrimination from another vowel to remain possible. In our previous study, we have shown that the spike timing-based discrimination metric ΔCI agrees well with behavioral vowel discrimination abilities in young-adult gerbils (Heeringa and Köppl, 2022). Vowel combinations that were behaviorally difficult to discriminate from each other (/eː/ vs. /iː/) also showed lower values of ΔCI in responses of AN fibers. This suggests that, for young, normal-hearing gerbils, the limiting factor for behavioral discrimination of vowels may reside at or peripheral to the auditory nerve. Interestingly, quiet-aged gerbils displayed higher values of ΔCI in their AN fiber responses (Figure 3A). However, this translated neither to increased d' values nor to shorter response latencies in old individuals. Furthermore, ΔCI of the different vowel comparisons did not comply with behavioral vowel discrimination abilities in quiet-aged gerbils. While the results suggest that ΔCI limits behavioral vowel discrimination in young-adult gerbils, this is not the case in quiet-aged gerbils. Rate-place coding can also contribute to neural vowel discrimination (Conley and Keilson, 1995). However, background noise strongly deteriorates the rate-place code in the auditory nerve, making it unlikely to be an additional limiting factor with age-related hearing loss (Sachs et al., 1983; Heeringa and Köppl, 2022). In addition, the ALIR, which contains interval rate- and temporal-place coding, was not degraded with age-related hearing loss (see Figure 4). Therefore, these results suggest that additional age-related deteriorating processes in the central (auditory) system, starting at the synapse between the AN and its cochlear-nucleus targets, may affect vowel discrimination to the extent that the performance matches that of the young adults.
One such age-related deterioration of speech encoding in the central auditory system is known at the level of the auditory midbrain (inferior colliculus) (Khouri et al., 2011). Specifically, temporal coding of speech-like sounds and pulses is also enhanced in spiking activity of inferior colliculus neurons. However, temporal selectivity, that is, the modulation frequency to which the neuron responds best, as well as the heterogeneity in temporal responses between neurons, decrease, which leads to a reduced benefit of pairing neurons for speech discrimination (Khouri et al., 2011; Parthasarathy et al., 2019). In other words, there is more redundancy in responses to speech among neurons of the inferior colliculus, resulting in putatively poorer speech encoding. An age-related decline in inhibition is thought to underlie these changes (Frisina and Walton, 2006; Caspary et al., 2008; Kessler et al., 2020).

What are the implications for humans with age-related hearing loss?
Age-related cochlear damage in the quiet-aged gerbil has been well characterized and proposed as a good model to make the translation to human age-related cochlear deficits (Schmiedt, 2010; Heeringa and Köppl, 2019). Similar to humans, quiet-aged gerbils in our colony also show substantial synapse loss, especially at the basal end of the cochlea, which processes the high-frequency sounds (Wu et al., 2020; Steenken et al., 2021). Among high-BF fibers, the low-SR fibers are particularly vulnerable to this synapse loss (Schmiedt et al., 1996; Heeringa et al., 2023). However, a reduction in the endocochlear potential, due to age-related damage to the stria vascularis, also affects the SR (Schmiedt, 1996; Wu et al., 2020). This can explain the general reduction of spontaneous rate, most strongly seen for the low-BF fibers (Figure 2C; Heeringa et al., 2020, 2023). Such cochlear changes in SR distribution, especially the loss of low-SR fibers, have been hypothesized to cause speech-in-noise perceptual deficits (Bharadwaj et al., 2014). However, as of yet, no strong evidence has been shown to support this hypothesis and much controversy remains (Ripley et al., 2022). The current study does not support the hypothesis that synapse loss causes a vowel-in-noise encoding or perceptual deficit.
The absence of an aging effect on vowel discrimination in gerbils seems to be in contrast to the common complaints of elderly humans about difficulties understanding speech in noisy conditions (Dubno et al., 1984; Fogerty et al., 2012; Füllgrabe et al., 2015). However, studies in humans that compared vowel vs. consonant perception showed that age-related problems with consonant perception are more common than those with vowels and also degrade speech intelligibility to a greater extent (Fogerty et al., 2012, 2015). By comparing behavioral with neural vowel discrimination using the same species and acoustic stimuli, we found here that although neural vowel encoding was altered by age, this did not significantly affect behavioral vowel discrimination.
FIGURE 1 The effects of aging on behavioral vowel discrimination. (A) Perceptual map of young-adult gerbils (n = 9). Vowels in close proximity to each other indicate a more difficult discriminability. The perceptual map was constructed based on the three logatomes that were used in the neurophysiological study, with vowels /aː/, /eː/, and /iː/ at 5 dB SNR. Blue, red, and yellow crosses represent data from individual gerbils for the vowels /aː/, /eː/, and /iː/, respectively; black circles represent the group mean. (B) Perceptual map of quiet-aged gerbils (n = 10). (C) Boxplots of the response latencies for the three vowel discriminations of young-adult (filled box plots) and quiet-aged gerbils (open box plots). The boxes indicate the 25th and 75th percentiles and the median. The whiskers indicate the upper and lower limits of the range of data points. (D) Boxplots of the sensitivity index d' for the three vowel discriminations of young-adult (filled box plots) and quiet-aged gerbils (open box plots).
FIGURE 3 Neural spike-timing based vowel discrimination in young-adult and quiet-aged gerbils. (A-C) ΔCI for the vowel comparisons /aː/ vs. /eː/, /aː/ vs. /iː/, and /eː/ vs. /iː/ plotted as a function of the fiber's BF [panels (A-C), respectively]. Formant frequencies of the vowel /aː/ are shown in blue dashed lines, of /eː/ in red dotted-dashed lines, and of /iː/ in yellow solid lines (colors correspond to the plot titles). Significant differences between ΔCI from fibers of young-adult and quiet-aged gerbils are indicated by the horizontal bars between the boxplots shown at the right margin of each scatter plot. (D) ΔCI for the three vowel discriminations in young-adult and quiet-aged gerbils separately. Boxplots show the median, the 25th and 75th percentiles, the range of data points (without outliers), and outliers of the ΔCI values. Significant differences between vowel comparisons within each age group are indicated here. *** indicates p < 0.001.
FIGURE 7 (A) Vector strength to the modulation frequency (fₘ = 128 Hz) of AM stimuli, as a function of different stimulus levels in four fibers of a young-adult and four fibers of a quiet-aged gerbil, indicated in blue and red traces, respectively. Open symbols indicate vector strength values that were not statistically significant (see Methods). The dashed line indicates the level of the CVC stimuli (65 dB SPL). (B) The same data as in panel (A), replotted relative to the individual fiber's threshold, indicated by the dashed-dotted line. Legend of panel (A) applies. (C) Median ISIH FFT of low-BF fibers from an old gerbil responding to /eː/ in noise (5 dB SNR) at 65 dB SPL (black trace) and at 80 dB SPL (pink trace) (n = 2 AN fibers). The line spectrum of /eː/ is plotted in grey. Note that the spectrum of the background noise was not plotted to retain visibility of formant frequencies otherwise buried in the noise. f₀, f₁, and f₂ are indicated in the plot. (D) Median ISIH FFT of three high-BF fibers from the same old gerbil, in response to the same stimuli as in panel (C). Format is similar to panel (C).

TABLE 1
Fundamental and formant frequencies of the presented vowels (in Hz).

TABLE 2
Number of AN fibers recorded for each vowel and age group.