Contributions of Sensory Coding and Attentional Control to Individual Differences in Performance in Spatial Auditory Selective Attention Tasks

Dai, Lengshi; Shinn-Cunningham, Barbara G.

doi:10.3389/fnhum.2016.00530

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 20 October 2016

Sec. Sensory Neuroscience

Volume 10 - 2016 | https://doi.org/10.3389/fnhum.2016.00530

This article is part of the Research Topic: Neural Mechanisms of Auditory Selective Attention and its Deficits

Lengshi Dai

Barbara G. Shinn-Cunningham^*

Department of Biomedical Engineering, Boston University, Boston, MA, USA

Listeners with normal hearing thresholds (NHTs) differ in their ability to steer attention to whatever sound source is important. This ability depends on top-down executive control, which modulates the sensory representation of sound in the cortex. Yet, this sensory representation also depends on the coding fidelity of the peripheral auditory system. Both of these factors may thus contribute to the individual differences in performance. We designed a selective auditory attention paradigm in which we could simultaneously measure envelope following responses (EFRs, reflecting peripheral coding), onset event-related potentials (ERPs) from the scalp (reflecting cortical responses to sound) and behavioral scores. We performed two experiments that varied stimulus conditions to alter the degree to which performance might be limited due to fine stimulus details vs. due to control of attentional focus. Consistent with past work, in both experiments we find that attention strongly modulates cortical ERPs. Importantly, in Experiment I, where coding fidelity limits the task, individual behavioral performance correlates with subcortical coding strength (derived by computing how the EFR is degraded for fully masked tones compared to partially masked tones); however, in this experiment, the effects of attention on cortical ERPs were unrelated to individual subject performance. In contrast, in Experiment II, where sensory cues for segregation are robust (and thus less of a limiting factor on task performance), inter-subject behavioral differences correlate with subcortical coding strength. In addition, after factoring out the influence of subcortical coding strength, behavioral differences are also correlated with the strength of attentional modulation of ERPs. 
These results support the hypothesis that behavioral abilities amongst listeners with NHTs can arise due to both subcortical coding differences and differences in attentional control, depending on stimulus characteristics and task demands.

Introduction

A number of recent studies suggest that listeners with normal hearing thresholds (NHTs) may suffer from auditory neuropathy, or a loss of ascending auditory nerve fibers (Schaette and McAlpine, 2011; Plack et al., 2014; Bharadwaj et al., 2015). This kind of loss appears to have a particularly strong impact on how well listeners can understand speech in noise or when there are competing sources (Hind et al., 2011; Ruggles and Shinn-Cunningham, 2011; Ruggles et al., 2012). Recent work in animal models is consistent with these reports, showing that auditory neuropathy can be fairly severe without impacting the quietest sound that can be detected (e.g., see Kujawa and Liberman, 2009; Lin et al., 2011; Lobarinas et al., 2013). Evidence suggests that low-spontaneous rate auditory nerve fibers, which only become active at supra-threshold levels, are more susceptible to damage from noise exposure than high-spontaneous rate fibers, which respond at hearing threshold (Furman et al., 2013); this helps explain why supra-threshold sound perception is degraded even though detection thresholds are unaffected. Given this, auditory neuropathy, very likely driven by noise exposure and aging (Schaette and McAlpine, 2011; Bharadwaj et al., 2014, 2015; Plack et al., 2014), is a likely contributor to individual differences in the encoding of subtle spectro-temporal features of supra-threshold sound. Such features are critical for segregating sound sources; if a listener cannot segregate sources, then they will have trouble directing attention to whichever source is of interest. Auditory neuropathy may therefore explain why some NHT listeners experience communication problems in noisy environments (Shinn-Cunningham et al., in press).

Consistent with this, in one recent set of studies, NHT subjects were asked to report spoken digits from straight ahead while ignoring otherwise identical digits ±15° off center. Although all subjects had normal auditory thresholds, performance varied from below 40% to nearly 90%; moreover, almost all mistakes arose because listeners reported the content of one of the competing streams, rather than because they failed to understand the digits in the mixture (Ruggles and Shinn-Cunningham, 2011; Ruggles et al., 2011). Importantly, these difficulties in focusing on target speech amidst competing speech were correlated with the strength of the subcortical response to periodic sound, known variously as the frequency-following response (FFR) or the envelope following response (EFR; see Ruggles and Shinn-Cunningham, 2011; Ruggles et al., 2012). These results suggest that poor subcortical encoding can lead to deficits in the ability to focus selective auditory attention on a source from a particular direction.

Still, it is clear that individual differences in the ability of listeners to understand speech in noisy settings are not always due to differences in sensory coding fidelity; everything from general cognitive ability to aging affects the ability to understand speech in complex settings (e.g., see Gordon-Salant et al., 2006, 2007; Singh et al., 2008, 2013; Grose et al., 2009; Grose and Mamo, 2010, 2012; Nakamura and Gordon-Salant, 2011; Rönnberg et al., 2011; Weisz et al., 2011; Banh et al., 2012; Benichov et al., 2012; Hall et al., 2012; Noble et al., 2012; Tun et al., 2012; Anderson et al., 2013; Brungart et al., 2013; Veneman et al., 2013). Consistent with this, most of the studies demonstrating a link between sensory coding deficits and failures of selective auditory attention were designed so that the features that distinguished the target source from competing speech streams differed only modestly (e.g., only 15° of separation between competing streams; see Ruggles et al., 2011, 2012), on the edge of what even “good” listeners are able to use reliably. By design, performance in such paradigms depends on subtle differences in the robustness of temporal coding of supra-threshold, audible sound. These subtle differences are likely the primary limitation on performance in these experiments, and thus correlate with individual differences in ability, even though more central differences in processing ability may also be present.

Selective auditory attention engages multiple regions that must work together to modulate the sensory representation of sound based on task demands (e.g., see Giesbrecht et al., 2003; Fritz et al., 2007; Hill and Miller, 2010). In situations where streams are easy to segregate and have perceptually distinct features, differences in the efficacy of these cortical control networks are likely to determine individual performance and perceptual ability. Indeed, one recent study shows that there are large inter-subject differences in how well listeners can identify melody contours when there are competing melodies from widely separated directions, a task in which segregation and selection probably do not depend on individual differences in sensory coding (Choi et al., 2014). Yet, in this study, individual differences in performance were consistent across conditions, and performance correlated with how strongly cortical responses to the competing melodies were modulated by attentional focus (Choi et al., 2014). These results suggest that in addition to differences in subcortical coding fidelity, there are significant, relatively central individual differences in the ability to control selective auditory attention, and that these consistent individual differences determine behavioral ability on tasks where peripheral coding does not limit performance.

Still, the relationship between sensory coding differences and differences in cortical control of attention is not entirely clear. For instance, it is possible that differences in the strength of attentional modulation arise not from differences in central control, but are driven instead by differences in sensory coding fidelity. If coding fidelity is so poor that a listener cannot separate the target source from competing sources, it will necessarily lead to failures in suppressing neural responses to competing sound sources (Shinn-Cunningham and Best, 2008; Shinn-Cunningham and Wang, 2008). It is possible that this kind of cascade could explain the individual differences observed by Choi et al. (2014). Specifically, they did not measure subcortical coding fidelity in their subjects; it is possible that listeners who performed well and were able to modulate cortical auditory responses strongly were the listeners with the most robust peripheral encoding of supra-threshold sound. As a result, these listeners may have been best at segregating the competing melodies and suppressing the unimportant streams in the mixture.

The current study was designed to test directly whether individual differences in sensory coding and differences in the central control of attention both contribute to the ability to analyze one target sound stream when it is presented with simultaneous, competing streams. We undertook two experiments to examine the relationships between subcortical sensory coding fidelity, the strength of attentional modulation of cortical responses and behavioral performance. In both experiments, we measured all three (subcortical coding, attentional modulation and performance) in the same listeners at the same time. By varying stimulus characteristics, we expected to shift the balance in how important peripheral and central factors were in determining performance, allowing us to demonstrate that these factors interact to affect the ability to perform spatial auditory attention tasks.

Both subcortical and cortical responses can be measured using electroencephalography (EEG). However, the experimental design typically depends on which kind of response a study aims to measure; the type of stimuli, timing of the stimuli, number of stimulus repetitions, EEG sampling rate, electrode configuration and EEG data pre-processing and processing schemes (to name some of the experimental parameters) usually are set differently depending on which kind of measure is desired. Perhaps as a result, few studies have simultaneously measured subcortical and cortical responses. Still, if an experiment is designed with both subcortical and cortical response characteristics in mind, they can be measured in the same experiment, at the same time (e.g., see Hackley et al., 1990; Krishnan et al., 2012). In the current study, we measured cortical and subcortical responses during selective auditory attention tasks in order to examine individual differences in performance and how they relate to both measures. By recording subcortical and cortical data at the same time in our subjects, we guaranteed that the same physiological and psychological conditions were at play in each measurement, allowing us to compare outcomes directly.

To measure cortical responses, we considered auditory event-related potentials (ERPs), which are elicited by auditory events such as onsets of notes in a melody or syllables in an ongoing stream of speech. By comparing the magnitude of ERPs to the same mixture of auditory inputs when listeners attend to one stream vs. when they attend to a different stream, we can quantify the degree of top-down control of selective attention for individual listeners (e.g., see Choi et al., 2014).

We analyzed subcortical responses using the EFR, a measure that quantifies the degree to which the subcortical portions of the pathway phase lock to ongoing temporal periodicities in an input acoustic stimulus (Zhu et al., 2013). By focusing on relatively high-frequency modulation (above 100 Hz), the brainstem response, rather than cortical activity, dominates this measure (see Shinn-Cunningham et al., in press). In addition, a number of past studies have related EFRs to perceptual ability (Krishnan et al., 2009; Bidelman et al., 2011; Carcagno and Plack, 2011; Gockel et al., 2011).

The two different experiments were similar, but we hypothesized that they would yield different results. In both experiments, there were two potential target streams, one from the left and one from the right of the listener. From trial to trial, we randomly varied which stream was the target, using a visual cue to indicate whether the listener should direct attention to the stream on the left or the stream on the right. While the overall structure of the two experiments was grossly similar, the tasks and auditory stimuli differed in order to try to isolate different factors contributing to individual differences.

Experiment I presented listeners with two streams of repeated complex tones and asked listeners to count pitch deviants in the attended stream. Because the pitch deviations were small, we hypothesized that subject differences in the ability to report the correct number of deviants would be related to differences in subcortical temporal coding. In Experiment I, the spatial separation of the two streams was large. Therefore, we did not expect differences in subcortical temporal coding to limit how well or fully listeners could focus spatial attention on the target stream. While we expected subjects to differ from one another in the degree to which they could focus spatial attention and modulate cortical responses to the competing streams, we did not expect these subject differences in cortical control to correlate with either behavioral performance or with the subcortical coding fidelity given how clearly segregated we expected the competing streams to be.

Experiment II presented listeners with two potential target streams that each comprised simple melody contours. Listeners were asked to report the shape of the melody of the attended stream, which consisted of sequences of high and low pitches separated by a small pitch difference. Thus, as in Experiment I, the task required listeners to judge small pitch variations within an ongoing stream. In contrast to Experiment I, we made the ability to selectively focus attention on the target stream challenging by including a third, distractor stream melody from straight ahead and by reducing the spatial separation between the competing streams. As a result, the ability to selectively focus attention was more of a bottleneck in Experiment II than in Experiment I. We hypothesized that in Experiment II, performance would depend on individual differences in subcortical temporal coding, because coding fidelity would determine both how well listeners could hear the melody contour and how well they could use the modest spatial differences that differentiated the target stream from the two competing streams. We further hypothesized that individual differences in the strength of subcortical coding would partially correlate with both the degree of cortical modulation of ERPs and with performance on the selective attention task. However, we also hypothesized that even after factoring out correlations with subcortical responses, remaining differences in performance would correlate with attentional modulation strength. This final result would suggest that in Experiment II, central differences in attentional control differed across listeners and directly impacted individual differences in the ability to perform the task, even after accounting for the effects of sensory coding fidelity.

Common Methods

Subjects

All subjects were screened to confirm that they had NHTs at frequencies between 250 Hz and 8000 Hz (thresholds of at most 20 dB HL) for both ears. This study was carried out in accordance with the recommendations of the Boston University Charles River Campus Institutional Review Board (CRC IRB). All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the CRC IRB. All subjects were compensated at the rate of $25 per hour, and were paid a $0.02 bonus for each correct response to ensure that they remained attentive throughout the task.

Equipment

Subjects sat in a sound-treated booth while performing the tasks using a PC keyboard and monitor. The PC controlled the experiment using Psychtoolbox 3 (Brainard, 1997) and Matlab (Mathworks; Natick, MA, USA). The control code also generated triggers that were recorded to mark the times of key events. Auditory stimuli were presented through a TDT System Three unit (Tucker-Davis Technologies, Alachua, FL, USA) and ER-1 insert headphones (Etymotic, Elk Grove Village, IL, USA).

A BioSemi Active Two System (Amsterdam, Netherlands) recorded EEG signals using a 4.096 kHz sampling rate. Recordings were taken from 32 active scalp electrodes in the standard 10/20 configuration. Two additional electrodes were placed on the mastoids; during analysis, the EEG recordings were re-referenced to the mean of the two mastoid electrodes. Synchronized triggers from the TDT system were recorded simultaneously with the EEG data, which were stored on the controlling PC.

Stimuli

Stimuli were generated in Matlab (Mathworks; Natick, MA, USA) using a sampling rate of 48.828 kHz. Each trial consisted of a mixture of simultaneous, isochronous sequences of complex tones that had different repetition rates, so that onsets of the notes in the different sequences were resolvable in time. In both experiments, two of these streams were potential target streams (Stream A and Stream B). On each trial, we varied the perceived laterality of the streams using interaural time differences (ITDs), so that one of the potential target streams was heard from one hemifield, and the other potential target from the opposite hemifield (chosen randomly from trial to trial). Experiment I presented only two streams, while Experiment II included a third, central stream that was never the focus of attention (Stream C). Each of the complex tones in both Stream A and Stream B consisted of the first 33 harmonics of some fundamental frequency, all of equal amplitude, added in sine phase. In Experiment II, the notes in Stream C were made up of the first three harmonics of their fundamentals, all of equal amplitude, added in sine phase. All notes in both experiments were played at a level of 70 dB SPL (root-mean-squared).
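As a rough illustration of the stimulus construction described above, the following Python sketch builds one complex tone from equal-amplitude harmonics in sine phase and lateralizes it with a whole-sample ITD. Only the 48.828 kHz sampling rate comes from the text; the function names, fundamental frequency, note duration and ITD value are hypothetical placeholders, and the authors' Matlab code may well have differed (for example, by using sub-sample delays or onset/offset ramps).

```python
import math

FS = 48828  # sampling rate in Hz (48.828 kHz, as stated in the text)


def complex_tone(f0, n_harmonics, dur_s, fs=FS):
    """Sum of equal-amplitude harmonics of f0, all starting in sine phase."""
    n_samples = int(round(dur_s * fs))
    return [
        sum(math.sin(2 * math.pi * h * f0 * t / fs)
            for h in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]


def apply_itd(mono, itd_s, fs=FS):
    """Return (left, right). A positive ITD delays the left ear, so the
    right ear leads. The delay is rounded to whole samples (an assumption)."""
    shift = int(round(abs(itd_s) * fs))
    delayed = [0.0] * shift + mono   # lagging ear: zeros prepended
    leading = mono + [0.0] * shift   # leading ear: padded to equal length
    return (delayed, leading) if itd_s > 0 else (leading, delayed)


# Hypothetical values for illustration only (not taken from the paper).
note = complex_tone(f0=100.0, n_harmonics=3, dur_s=0.01)
left, right = apply_itd(note, itd_s=500e-6)
```

Rounding the ITD to whole samples keeps the sketch simple; at this sampling rate one sample is about 20 µs, which bounds the rounding error.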

The fundamental frequencies of the notes in each stream as well as the repetition rates of the notes were carefully chosen to ensure that they were not harmonically related to each other or to 60 Hz. Because of this design, when we binned responses to notes in each stream, any interference from neural responses to competing streams and any ongoing line noise was random across bins, and tended to cancel out. The temporal structure of the trials permitted us to analyze cortical EEG responses to note onsets in each stream by examining responses at the correct time points; as a consequence of this design, the number of notes in Stream A and Stream B differed (for Stream A and Stream B, respectively, the number of notes was 10 and 8 in Experiment I and 5 and 4 in Experiment II; Stream C in Experiment II had 4 notes). In order to extract the brainstem EFRs from the EEG, the stimuli in half of the trials in each experiment were presented in negative polarity (see Skoe and Kraus, 2010).

Task Design

The general task structure is shown in Figure 1. Each trial started with the presentation of a 0.4-s long fixation dot, followed by a 1-s long visual cue. The cue was an arrowhead that appeared to one side of the fixation dot and pointed either to the left or right, indicating the direction of the target stream on that trial (selected randomly for each trial, separately for each subject). After the cue ended, there was a 0.3-s pre-stimulus quiet period, then the auditory stimulus began (6.8 s in duration for Experiment I and 3.8 s in duration for Experiment II). A 0.4-s post-stimulus silent period followed the auditory stimulus on each trial, after which a circle appeared around the fixation dot to indicate the response period, which lasted 1.5 s. Listeners were instructed to maintain gaze on the fixation dot/cue, and then, during the response period, to use number keys on the computer keyboard to provide their response. The program recorded the last button push within the response period as the registered answer, so subjects could correct a mistaken button push if they changed their answer within that time (if there was no response during the response period, no response was recorded and the trial was counted as incorrect). Feedback was given after the response period ended: the fixation dot flashed for 0.3 s, either red for an incorrect response or blue for a correct response. After the end of the visual feedback, the next trial began after a random pause (0–0.1 s, randomly selected on each trial from a uniform distribution).

Figure 1. General structure of the experiments. Both experiments start with the presentation of a fixation dot on the screen, after which a visual cue appears to indicate the direction to which listeners should attend. Two potential target streams are presented symmetrically in the left and right hemifields using interaural time differences (ITDs; Experiment II also presents a distractor stream from the center, which is always ignored). Listeners have a brief response period after the conclusion of the auditory stimuli in which to respond using a computer keypad. Feedback is then provided to tell them whether or not their response was correct. After a random pause, the next trial begins.

Cortical ERP Analysis

To isolate cortical responses from the scalp-recorded EEG, signals were band-pass filtered from 2 to 25 Hz using the eegfiltfft.m function in EEGLab toolbox (Delorme and Makeig, 2004). We focused our cortical analysis of auditory ERPs on channel Cz (channel 32 in the 10/20 system), where they tend to be greatest. For each trial, we analyzed epochs of the EEG from −0.2 s (before the sound stimulus began) to the end of the stimulus. For each such epoch, we found the maximum absolute peak voltage. In order to reduce contamination from movement and other artifacts, for each subject we created a histogram of peak values across trials, and then rejected trials in the top 15% of each subject’s distribution from further analysis.
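The peak-based rejection step can be sketched as follows. This is a minimal Python illustration, assuming each epoch is a list of voltages from a single channel; the function name and the rule used to round the number of rejected trials are assumptions, not details given in the text.

```python
def reject_high_peak_trials(epochs, reject_fraction=0.15):
    """Drop the trials whose maximum absolute voltage falls in the top
    `reject_fraction` of this subject's distribution of peak values."""
    peaks = [max(abs(v) for v in epoch) for epoch in epochs]
    # Number of trials to discard (rounding rule is an assumption).
    n_reject = int(round(reject_fraction * len(epochs)))
    # Rank trials from smallest to largest peak and keep the lowest ones,
    # then restore the original trial order.
    order = sorted(range(len(epochs)), key=lambda i: peaks[i])
    keep = sorted(order[: len(epochs) - n_reject])
    return [epochs[i] for i in keep]


# Toy data: 20 single-channel epochs whose peak values are 1, 2, ..., 20.
toy_epochs = [[0.0, float(k), -0.5] for k in range(1, 21)]
clean_epochs = reject_high_peak_trials(toy_epochs)
```

Because the threshold is a percentile of each subject's own distribution, the same fraction of trials is removed for every subject regardless of overall signal amplitude.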

Using the remaining trials, we used a bootstrap procedure to compute average ERPs to the onsets of notes in Stream A and Stream B separately for when Stream A was the target and when Stream B was the target. Specifically, for each attention condition for each subject, we used a 200-draw bootstrap procedure with replacement (100 trials per draw). The N1 magnitude of each note onset ERP was taken to be the local minimum between 100 and 220 ms after the onset of the corresponding note. The P1 magnitude of each ERP was taken to be the local maximum in the period from 30 to 100 ms after the note onset. We computed the difference in these magnitudes to estimate the average peak-to-peak P1-N1 magnitude. Thus, for each subject, we estimated the P1-N1 magnitude in response to each note onset in Stream A and Stream B when Stream A was the target and separately when Stream B was the target. We denote these magnitudes as $M_{s, n}^{focus}$, where s is the stream containing the note onset being analyzed (either A or B), focus denotes whether that stream was Attended or Ignored, and n denotes the temporal position of the note in the corresponding stream.
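A minimal Python sketch of this peak-to-peak estimate is given below. It assumes each trial is a single-channel waveform whose first sample coincides with the note onset, takes the window maximum and minimum as stand-ins for the local extrema, and averages the magnitude across bootstrap draws; those choices, along with the function names, are assumptions made for illustration.

```python
import random


def p1_n1_magnitude(erp, fs, p1_win=(0.030, 0.100), n1_win=(0.100, 0.220)):
    """Peak-to-peak P1-N1 magnitude of one averaged ERP.
    `erp[0]` is assumed to be the sample at note onset."""
    def window(lo_s, hi_s):
        return erp[int(round(lo_s * fs)): int(round(hi_s * fs)) + 1]
    p1 = max(window(*p1_win))   # largest value 30-100 ms after onset
    n1 = min(window(*n1_win))   # smallest value 100-220 ms after onset
    return p1 - n1


def bootstrap_p1_n1(trials, fs, n_draws=200, trials_per_draw=100, seed=0):
    """Average P1-N1 magnitude over bootstrap draws (with replacement)."""
    rng = random.Random(seed)
    n_samples = len(trials[0])
    magnitudes = []
    for _ in range(n_draws):
        draw = [rng.choice(trials) for _ in range(trials_per_draw)]
        avg = [sum(tr[i] for tr in draw) / trials_per_draw
               for i in range(n_samples)]
        magnitudes.append(p1_n1_magnitude(avg, fs))
    return sum(magnitudes) / n_draws
```

Averaging trials within each draw before picking peaks matters: peak picking on single noisy trials would bias the magnitude upward.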

Since it takes time for an auditory stream to be perceptually segregated from a sound mixture (Cusack et al., 2004; Best et al., 2008), we quantified the strength of top-down attentional modulation for each individual note in each stream. However, because the effects of exogenous attention and attention switching could interfere with top-down modulation of responses to the first notes in each stream, we omitted these from analysis of the effects of top-down attention on the P1-N1 magnitude.

Top-down executive control is expected to modulate the sensory representation of sound in the cortex, leading to reduced responses when a stream is ignored compared to when it is attended, which may be due to both suppression of the stream when it is ignored and enhancements of the stream when it is attended (e.g., see Picton and Hillyard, 1974; Choi et al., 2014). Given this, we expected $M_{s, n}^{Attended}$ to be larger than $M_{s, n}^{Ignored}$. However, ERP magnitudes vary significantly across subjects, due to differences in brain geometry, electrode impedance, and other “nuisance” factors; these factors cause shifts in measured ERPs that are constant on a logarithmic scale. Computing differences in ERP amplitudes on a linear scale would not compensate for these changes in overall strength. Consistent with this, past experiments in our lab suggest that the percentage change in ERP amplitudes, or (equivalently in a mathematical sense) the difference of the ERP amplitudes on a logarithmic scale, is a good way to quantify individual differences in how strongly attention modulated responses, as if the effect of attention is well modeled as a multiplicative gain change in response amplitude (e.g., Choi et al., 2014). Therefore, to quantify an individual’s ability to modulate the neural representation based on top-down attention, we computed the Attentional Modulation Index (AMI) for each stream by computing the difference of the log of the magnitudes of $M_{s, n}^{Attended}$ and $M_{s, n}^{Ignored}$. Specifically, the AMI was computed for each subject as the average across note onsets (from the second to final note) of the log of the ratio of $M_{s, n}^{Attended}$ over $M_{s, n}^{Ignored}$:

$$\mathrm{AMI}_{s} = \frac{1}{N - 1} \sum_{n = 2}^{N} \log \left( \frac{M_{s, n}^{Attended}}{M_{s, n}^{Ignored}} \right) \qquad (1)$$

where N is the number of notes comprising stream s (in Experiment I, N = 10 for Stream A and N = 8 for Stream B; in Experiment II, N = 5 for Stream A and N = 4 for Stream B). Defined this way, the AMI should be zero if attention has no effect on the neural representation of the stream ( $M_{s, n}^{Attended}$ equals $M_{s, n}^{Ignored}$ ) and increases monotonically with the strength of attentional modulation of the neural responses.
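Equation (1) translates directly into a few lines of Python. The sketch below assumes the P1-N1 magnitudes for one stream are supplied as two equal-length lists ordered by note position; the function name is hypothetical, and the natural logarithm is an assumption, since the text does not state the base.

```python
import math


def attentional_modulation_index(m_attended, m_ignored):
    """Equation (1): mean over notes 2..N of log(M_attended / M_ignored).
    The first note (index 0) is excluded, as described in the text."""
    pairs = zip(m_attended[1:], m_ignored[1:])
    # Natural log is an assumption; another base only rescales the index.
    logs = [math.log(att / ign) for att, ign in pairs]
    return sum(logs) / len(logs)


# Toy magnitudes (hypothetical): attended responses twice the ignored ones.
ami = attentional_modulation_index([1.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Because the index is built from ratios, multiplying all of a subject's magnitudes by a common gain factor leaves the AMI unchanged, which is the property that motivates the logarithmic scale.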

EFR Analysis

To isolate subcortical responses from the scalp-recorded EEG, signals were high-pass filtered with a 65 Hz cutoff using the eegfiltfft.m function in EEGLab toolbox (Delorme and Makeig, 2004). We then quantified the fidelity with which the subcortical response of each subject encoded the fundamental frequency of identical complex tones in the presented streams (Bharadwaj and Shinn-Cunningham, 2014).

For each complex tone that we analyzed, we treated each identical tone repetition as an independent sample, regardless of its temporal position in a stream. For each of these repetitions, we analyzed the epoch from the note onset to the end of the note. We combined an equal number of positive polarity and negative polarity repetitions to compute the EFR (Skoe and Kraus, 2010; Shinn-Cunningham et al., in press). In order to achieve the best possible signal-to-noise ratio (SNR) in our estimates, we combined measurements across the EEG sensors using complex principal components analysis (Bharadwaj and Shinn-Cunningham, 2014). We quantified the EFR using the phase locking value (PLV; see Lachaux et al., 1999), a normalized index that ranges from 0 (no phase locking across trials) to 1 (perfect phase locking). Importantly, the number of repetitions used in this analysis determines the noise floor of the PLV, making it easy to interpret the results (Zhu et al., 2013).
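To make the PLV concrete, the following Python sketch computes it for a single channel at a single frequency: the phase of each epoch at that frequency is taken from a one-bin discrete Fourier sum, and the PLV is the length of the mean unit phasor across epochs. This is a simplified illustration only; it omits the polarity pairing and the multichannel complex principal components step described above, and the function names are hypothetical.

```python
import cmath
import math


def phase_at(signal, freq_hz, fs):
    """Phase of `signal` at `freq_hz`, from a single-frequency Fourier sum."""
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
                for n, x in enumerate(signal))
    return cmath.phase(coeff)


def phase_locking_value(epochs, freq_hz, fs):
    """Length of the mean unit phasor across epochs:
    0 means no phase consistency, 1 means identical phase in every epoch."""
    phasors = [cmath.exp(1j * phase_at(ep, freq_hz, fs)) for ep in epochs]
    return abs(sum(phasors) / len(phasors))
```

Since only the phase of each epoch enters the average, the PLV is insensitive to epoch-to-epoch amplitude differences, and its expected value for pure noise shrinks as the number of epochs grows.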

Past work shows that selective auditory attention has a negligible effect on the EFRs generated by subcortical structures (Varghese et al., 2015). In Experiment I, we tested this by comparing PLVs in response to the most commonly repeated notes in each stream when that stream was attended vs. when it was ignored. For this analysis, we used a 200-draw bootstrap procedure with replacement (400 repetitions per polarity per draw) separately when listeners attended to the stream the notes were in and when listeners ignored the stream they were in. In Experiment II, we reduced the number of notes per stream and had fewer repetitions of the same notes per trial. Because of this, there were not enough trials to allow a direct comparison of EFRs to the same notes when listeners attended to the stream they were in vs. when that stream was ignored in Experiment II.

In both experiments, we quantified the strength of the EFRs for individual subjects by combining all repetitions of the most commonly repeated notes in Stream A and in Stream B, collapsing across conditions when the stream containing the notes was attended and when it was ignored. We used a 200-draw bootstrap procedure with replacement, with 500 repetitions per polarity per draw.

Like cortical ERPs, individual differences in the absolute EFR are influenced by various nuisance factors (e.g., brain geometry, electrode impedance, overall cortical noise levels; Bharadwaj et al., 2014, 2015). Within an experimental session for a given subject, these factors should affect both Stream A and Stream B EFRs identically on a logarithmic scale. Thus, we planned to quantify individual differences in subcortical coding using “normalized” EFR measures computed as the ratio of the EFR to Stream B notes to the EFR to Stream A notes, thereby canceling out nuisance factors (see also Bharadwaj et al., 2015, which demonstrates that individual differences in subcortical coding fidelity are better described by normalized EFRs than by absolute EFR strength).

Mid- to high-frequency stimulus content is the dominant signal driving EFRs (Zhu et al., 2013). As described in detail below, in both Experiment I and Experiment II, all of the harmonics but the fundamental in Stream A overlapped with the lower half of the spectral content of the notes in Stream B. However, the upper half of the spectrum of the notes in Stream B did not overlap with any other stimulus components. Because of this design, we expected the EFR in response to the notes of Stream B to be relatively strong for all subjects; the mid- to high-frequencies in the Stream B notes were not masked and also had deep modulations to drive the EFR (see Bharadwaj et al., 2014, 2015). In contrast, because Stream A notes were spectrally masked due to the interfering spectral content of Stream B (and thus had reduced modulation depth in the mid- and high-frequency portions of the stimulus), we expected these EFRs to depend more directly on the degree of cochlear neuropathy in an individual subject (Bharadwaj et al., 2014, 2015). Therefore, the ratio of the PLV to notes in Stream B divided by the PLV to notes in Stream A should be relatively small in good listeners (strong EFR to Stream B notes divided by a relatively strong EFR to Stream A notes) and large in listeners with a reduced number of auditory nerve fibers (strong EFR to Stream B notes divided by a relatively weak EFR to Stream A notes). By this logic, we expected this ratio to be negatively correlated with differences in how well listeners could perform the behavioral task, which relied, in both tasks, on the ability to discern small pitch differences between notes in the attended stream. To quantify these individual differences, we thus computed the PLV ratio for each subject as:

$$\mathrm{PLVR}_{s} = \frac{\mathrm{PLV}_{StreamB, s}}{\mathrm{PLV}_{StreamA, s}} \qquad (2)$$

where $\mathrm{PLV}_{Stream\,x, s}$ is the PLV of the EFR to the repeated notes in Stream x for subject s.

Statistical Tests

Experimental factors were analyzed using multi-way ANOVAs based on mixed-effects models (Baayen, 2008) implemented in R (Foundation for Statistical Computing). Subject-related factors, which were not assumed to comply with homoscedasticity, were treated as random effects. All other factors and interactions were treated as fixed-effect terms (although some factors were nested, precluding inclusion of some interaction terms). To prevent over-fitting and determine the most parsimonious model, we compared models with and without each random effect term using the Akaike information criterion (AIC; Pinheiro and Bates, 2006). All data sets were checked for normality using the Kolmogorov–Smirnov test.

In addition, we examined individual differences by looking for correlations between variables. Significance was tested by computing the Pearson correlation coefficient; p-values were then computed using a two-tailed Student’s t-test.
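For reference, the standard relation between a Pearson coefficient r computed from n subjects and its t statistic is t = r·sqrt((n − 2)/(1 − r²)), with n − 2 degrees of freedom. The Python sketch below computes both quantities using only the standard library; the function names and data are hypothetical, and the final two-tailed p-value would come from the Student's t distribution via a statistics package, which is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-subject values, for illustration only.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
measure = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(scores, measure)
t = t_statistic(r, len(scores))
```

With small samples the t statistic grows slowly with r, so a moderately large correlation can still fall short of significance.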

Experiment I

Experiment I presented listeners with two ongoing tone streams with different spectral content, and asked listeners to count pitch deviants in the attended stream. We measured the strength with which the subcortical EFR phase locked to the pitch of notes making up each stream, and the strength of cortical responses to the onsets of the notes in each stream.