Neural Correlates of Auditory Pattern Learning in the Auditory Cortex

Learning new auditory stimuli often requires repeated exposure to the stimulus. Fast, implicit learning of sounds presented at random times enables efficient auditory perception. However, it is unclear how such sensory encoding is processed at the neural level. Using electrocorticography, we investigated neural responses that develop in the auditory cortex of anesthetized rats during passive, repeated exposure to a specific sound. We presented a series of random sequences that were generated afresh on each trial, except for a specific reference sequence that remained constant and re-appeared at random times across trials. We compared induced activity amplitudes between reference and fresh sequences. Neural responses from both primary and non-primary auditory cortical regions showed significantly decreased induced activity amplitudes for reference sequences compared to fresh sequences, especially in the beta band. This is the first study showing that neural correlates of auditory pattern learning can be evoked even in anesthetized, passively listening animal models.


INTRODUCTION
Sensory perception requires correctly recognizing incoming sensory stimuli by retrieving relevant information from memory. Such memory can be formed by implicit learning of sensory input through repeated exposure. Fast memory formation that captures the unique features of sensory signals is thus a key factor for efficient sensory perception, and it requires active involvement of primary sensory cortices (Harris et al., 1999; Bao et al., 2004; Gavornik and Bear, 2014; Rosenthal et al., 2016).
In hearing, a series of recent studies reported fast and robust learning of abstract sounds using a novel experimental paradigm that resembles unsupervised implicit learning of newly presented acoustic stimuli in auditory scenes (Agus et al., 2010; Luo et al., 2013; Andrillon et al., 2015). In this paradigm, participants were simply asked to detect a within-sequence repetition in random noise samples. Unbeknownst to them, one specific noise sample re-occurred occasionally across trials. Even though the participants were unaware of this, they showed fast, selective improvement in processing this frozen "reference" stimulus, which implies rapid and robust memorization of random features of complex sounds. Subsequent EEG and MEG studies in humans supported this behavioral improvement by showing increased inter-trial coherence of brain responses to the re-occurring stimulus compared to other random stimuli (Luo et al., 2013; Andrillon et al., 2015). Interestingly, increases in neural coherence were observed even when participants were in rapid eye movement (REM) or light non-REM sleep during the experiment (Andrillon et al., 2017), suggesting that a neural index of learning new sounds can be traced even after passive exposure. While these findings provided insights into the neural correlates of implicit learning of new auditory stimuli, further investigations using invasive measurements are needed to understand the underlying mechanisms. The present study aimed to investigate neural responses shaped by passively presented re-occurring sounds in the auditory cortex, using rats as an animal model.
Previous electrophysiological studies have investigated how neurons adapt to re-occurring sounds in order to understand memory and adaptation processes, using a simplified paradigm in which a series of standard sounds (usually pure tones) is occasionally interrupted by a deviant sound (Garrido et al., 2009; Malmierca et al., 2014; Nieto-Diego and Malmierca, 2016). Under such paradigms, stimulus-specific adaptation (SSA) has been widely reported by comparing habituated neural responses to a standard sound with the typically greater responses to a novel, deviant sound. SSA has been observed along the auditory pathway, first in primary fields of the auditory cortex (AC) and subsequently in non-lemniscal subdivisions of the inferior colliculus (IC) and the medial geniculate body (MGB; Ulanovsky et al., 2003; Anderson et al., 2009; Ayala and Malmierca, 2013; Parras et al., 2017). A more recent study further reported stronger SSA in non-primary than in primary AC fields (Nieto-Diego and Malmierca, 2016). Another study using more complex and realistic sounds suggested that higher-order regions of the AC, rather than primary fields, may be uniquely susceptible to adaptation to repeatedly presented realistic auditory inputs (Lu et al., 2018). That study further reported that the adaptation effect in the AC was retained after an intervening period of repeated presentation of a different sound. These results point to active involvement of the AC, as well as the prefrontal cortex (PFC; Casado-Román et al., 2020), in learning and adapting to ongoing or predictable sounds, which is thought to play a role in encoding not only stimuli but also their context (Bar-Yosef et al., 2002; Skipper, 2014; Lu et al., 2018).
However, while previous studies compared neural responses evoked by occasional deviants with those to consecutively presented standards, such constant repetition of a single sound cannot fully explain our ability to rapidly and implicitly learn newly presented sounds. Instead, recognizable re-occurring sounds typically appear only occasionally, interspersed with other random, non-repeating sounds, and yet listeners learn them without much effort.
In the present study, instead of the classical paradigm of repeated presentation of a single sound, we adapted the paradigm of Agus et al. (2010), in which frozen "reference" sequences are presented intermittently among other random sequences. The aim of the present experiment was to look for a physiological correlate of the "learning" of the frozen sequence that can occur even during passive exposure, recording with electrocorticography (ECoG) from the AC of anesthetized rats as a first step toward identifying neurophysiological markers. We focused on neural characteristics that emerged from learning re-occurring auditory patterns across primary and non-primary fields of the AC. To control for physical differences between stimuli, we compared purely induced, non-phase-locked responses, computed after time-frequency decomposition, rather than evoked responses. This minimizes the extent to which any observed effect reflects characteristics of the stimulus itself and focuses the analysis on neural modulations induced by higher-order processing (Klimesch et al., 1998; David et al., 2006). Our results show attenuated induced activity amplitudes for the re-occurring sounds compared to other sounds in both primary and non-primary fields, especially in the beta frequency band.

Animal Subjects
Six female adult Wistar rats (age = 8-21 weeks, mean = 12.5 weeks, SD = 4.42 weeks; weight = 257-315 g) were acquired from the Chinese University of Hong Kong. Experimental procedures were approved by the City University Animal Research Ethics Sub-Committee and conducted under license from the Department of Health of Hong Kong [... in DH/SHS/8/2/5 Pt.5].

Stimuli
We generated the acoustic stimulus sequences shown schematically in Figure 1A. Each sequence consisted of five segments, which were either 0.24 s long dynamic random chords (DRCs) or 0.2 s long white noise (WN) snippets. A sequence consisted either of the same segment repeated five times (repeated sequence, RS) or of five different segments (non-repeating random sequence, S). To make it easier to distinguish neural signatures of repetition detection from simple onset or offset responses, sequences were bracketed with additional "head" and "tail" segments, which were always generated afresh and were ramped on or off linearly. Segments were joined with 5 ms ramped overlaps to avoid transients. Sequences were presented in blocks with an inter-sequence interval of 0.6 s. Each block contained 100 unique RS and 100 unique S sequences, as well as one "frozen RS" and one "frozen S" sequence, each of which was presented 100 times per block. Adopting the nomenclature of Agus et al. (2010), we refer to the frozen sequences as "references", or "RefRS" and "RefS", respectively. Thus, each block consisted of 400 stimuli (100 different S sequences, 100 different RS sequences, and 100 presentations each of a single RefS and a single RefRS sequence) presented in shuffled order (see Figure 1B). For each block, sequences were generated anew with new random seeds.
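The block structure described above can be illustrated with a short Python sketch. It assumes NumPy, uses WN segments only (a DRC segment generator could be swapped in), and makes several assumptions not stated in the text: the head and tail are one segment long and ramped linearly over their full duration, and the 5 ms joins are linear overlap-add crossfades. The helper names (`wn_segment`, `crossfade_concat`, `make_sequence`, `block_trials`) are hypothetical; this is not the stimulus code used in the study.

```python
import numpy as np

FS = 48828          # stimulus sampling rate (Hz)
SEG_DUR = 0.2       # WN segment duration (s); DRC segments are 0.24 s
RAMP_DUR = 0.005    # 5 ms overlap between adjacent segments
N_SEGMENTS = 5      # core segments per sequence


def wn_segment(rng, dur=SEG_DUR):
    """One fresh Gaussian white-noise segment."""
    return rng.standard_normal(int(round(dur * FS)))


def crossfade_concat(segments, ramp_len):
    """Join segments with linear overlap-add crossfades of ramp_len samples.

    Inputs are never modified in place, so stored (frozen) segments stay intact.
    """
    fade_in = np.linspace(0.0, 1.0, ramp_len)
    out = segments[0].copy()
    for seg in segments[1:]:
        overlap = out[-ramp_len:] * fade_in[::-1] + seg[:ramp_len] * fade_in
        out = np.concatenate([out[:-ramp_len], overlap, seg[ramp_len:]])
    return out


def make_sequence(rng, repeated, frozen_core=None):
    """Fresh ramped head + five core segments + fresh ramped tail.

    frozen_core: stored list of core segments (RefRS/RefS). Otherwise a fresh
    core is drawn: one segment tiled five times (RS) or five segments (S).
    """
    if frozen_core is not None:
        core = frozen_core
    elif repeated:
        core = [wn_segment(rng)] * N_SEGMENTS
    else:
        core = [wn_segment(rng) for _ in range(N_SEGMENTS)]
    n = int(round(SEG_DUR * FS))
    head = wn_segment(rng) * np.linspace(0.0, 1.0, n)   # linear ramp on
    tail = wn_segment(rng) * np.linspace(1.0, 0.0, n)   # linear ramp off
    return crossfade_concat([head, *core, tail], int(round(RAMP_DUR * FS)))


def block_trials(seed, n_each=100):
    """Yield (label, waveform) for one block of 4 * n_each shuffled trials."""
    rng = np.random.default_rng(seed)                    # new seed for every block
    frozen = {
        "RefRS": [wn_segment(rng)] * N_SEGMENTS,
        "RefS": [wn_segment(rng) for _ in range(N_SEGMENTS)],
    }
    labels = ["S", "RS", "RefS", "RefRS"] * n_each
    for i in rng.permutation(len(labels)):
        label = labels[i]
        yield label, make_sequence(rng, repeated=label.endswith("RS"),
                                   frozen_core=frozen.get(label))
```

Because only the five core segments are stored and reused, every RefRS or RefS trial in a block shares an identical core while its head and tail remain fresh. The 0.6 s inter-sequence interval would be added at playback time.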
Each DRC segment consisted of 12 chords of superimposed 20 ms pure tones at 15 log-spaced frequencies from 500 to 20,000 Hz. The level of each tone was randomly drawn from a uniform 50-90 dB SPL range (mean 70 dB SPL), generating the random spectro-temporal pattern characteristic of each DRC. The WN sequences consisted of Gaussian noise snippets generated from different random seed values. As DRC segments contain more salient spectral contrasts than WN segments, we expected learning of the reference sequences to be easier for DRC than for WN sequences.
FIGURE 1 | (A) Sequences composed of random spectral pattern (DRC or WN) segments (marked by red vertical lines) that were either repeated 5 times in a row (RS sequence) or non-repeating (S sequence). Ramped, random "head" and "tail" segments bracketed each sequence. (B) Sequences were presented in blocks of 400 trials. Each block contained 100 each of unique RS and S sequences and of the repeated "reference" RefRS and RefS sequences, presented in random order.
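A minimal NumPy sketch of one DRC segment with the parameters given above (12 chords of 20 ms, 15 log-spaced frequencies from 500 to 20,000 Hz, levels uniform in 50-90 dB SPL). The mapping from dB SPL to waveform amplitude depends on hardware calibration; here an amplitude of 1.0 is arbitrarily assumed to correspond to 70 dB SPL (`REF_DB`), tones run with continuous phase across chords, and no per-chord onset ramps are applied. These choices, and the name `drc_segment`, are assumptions for illustration only.

```python
import numpy as np

FS = 48828                                   # stimulus sampling rate (Hz)
N_CHORDS = 12                                # chords per segment
CHORD_DUR = 0.020                            # 20 ms chords -> 0.24 s segment
FREQS = np.geomspace(500.0, 20000.0, 15)     # 15 log-spaced tone frequencies (Hz)
LEVELS_DB = (50.0, 90.0)                     # uniform level range, mean 70 dB SPL
REF_DB = 70.0                                # assumed: amplitude 1.0 <-> 70 dB SPL


def drc_segment(rng):
    """Return (waveform, levels_db) for one DRC segment.

    levels_db has shape (N_CHORDS, len(FREQS)): the segment's spectro-temporal pattern.
    """
    n_chord = int(round(CHORD_DUR * FS))
    t = np.arange(N_CHORDS * n_chord) / FS
    levels_db = rng.uniform(*LEVELS_DB, size=(N_CHORDS, FREQS.size))
    amps = 10.0 ** ((levels_db - REF_DB) / 20.0)             # dB re REF_DB -> linear
    envelope = np.repeat(amps, n_chord, axis=0)              # (n_samples, n_freqs)
    tones = np.sin(2 * np.pi * FREQS[None, :] * t[:, None])  # continuous-phase tones
    return (envelope * tones).sum(axis=1), levels_db


rng = np.random.default_rng(0)
segment, pattern = drc_segment(rng)
rs_core = [segment] * 5                      # RS: one segment repeated five times
```

An S core would instead draw five independent segments with `drc_segment`.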

Experimental Procedure
We recorded responses to five blocks of DRC sequences and five blocks of WN sequences from ECoG arrays placed onto the right AC. Anesthesia was induced using ketamine (80 mg/kg) and xylazine (12 mg/kg; intraperitoneal injection, i.p.) and maintained with urethane (20%, 7.5 µl/g, i.p.). Urethane anesthesia minimizes NMDA receptor blockade and closely resembles REM and stage II non-REM sleep-like states (Pagliardini et al., 2012). Dexamethasone (0.2 mg/kg, i.p.) was injected to prevent inflammation. Adequate anesthesia was confirmed by regularly testing for suppression of the toe-pinch withdrawal reflex. Body temperature was kept at 36 ± 1 °C with a heating pad. The rat was placed in a stereotaxic frame, and the head was fixed with hollow ear bars that allowed the delivery of auditory stimuli. We measured auditory brainstem responses (ABRs) in each ear to confirm that the rats had normal hearing sensitivity (click thresholds < 20 dB SPL). The right AC was exposed by a rectangular 5 × 4 mm craniotomy extending from 2.5 to 7.5 mm posterior to Bregma, with its medial edge 2.5 mm from the midline (Polley et al., 2007). A 61-channel ECoG array (Woods et al., 2018), connected to a Tucker-Davis Technologies (TDT) PZ5 neurodigitizer and RZ2 real-time processor, was placed on the exposed cortex. Sound stimuli were presented via a TDT RZ6 multiprocessor through the hollow ear bars at a sampling rate of 48,828 Hz, and ECoG responses were recorded at 24,414 Hz using BrainWare software.
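The per-animal doses in this protocol follow from simple arithmetic. The Python sketch below assumes that the 20% urethane solution is weight/volume (0.2 g/ml), which the text does not state explicitly; under that assumption, 7.5 µl/g corresponds to 1.5 g/kg of urethane. The function name `doses_for` is hypothetical.

```python
def doses_for(weight_g, urethane_g_per_ml=0.2):
    """Return per-animal doses for the protocol above (hypothetical helper).

    urethane_g_per_ml=0.2 assumes a 20% weight/volume urethane solution.
    """
    kg = weight_g / 1000.0
    urethane_ml = 7.5 * weight_g / 1000.0          # 7.5 ul per g body weight -> ml
    return {
        "ketamine_mg": 80.0 * kg,                  # 80 mg/kg
        "xylazine_mg": 12.0 * kg,                  # 12 mg/kg
        "dexamethasone_mg": 0.2 * kg,              # 0.2 mg/kg
        "urethane_ml": urethane_ml,
        "urethane_g": urethane_ml * urethane_g_per_ml,
        "urethane_g_per_kg": urethane_ml * urethane_g_per_ml / kg,
    }


for w in (257, 315):                               # weight range of the rats
    print(w, {k: round(v, 3) for k, v in doses_for(w).items()})
```

For a 300 g rat, this gives 24 mg ketamine, 3.6 mg xylazine, 0.06 mg dexamethasone, and 2.25 ml of urethane solution (0.45 g urethane).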
Correct placement of the ECoG array was confirmed by recording responses to 100 ms pure tones over a range of frequencies (0.5-32 kHz, 1/4-octave steps) at 70 dB SPL, to obtain frequency tuning curves of individual electrodes and a map of best frequency across the recording site. Because the ECoG electrodes are rather large and widely spaced relative to the dimensions of rat tonotopic fields reported in the literature, the resulting frequency response area (FRA) maps did not show clear tonotopic gradients. Nevertheless, they revealed physiological features of the frequency response topography of the AC that were reproducible from animal to animal. In particular, we were able to verify that tentative primary auditory (A1) areas have distinct frequency gradients from low to high frequencies, while tentative non-A1 areas (SRAF) have gradients from high to low frequencies (from caudal to rostral; Figure 2). These findings are consistent with previous studies from other laboratories (e.g., Nieto-Diego and Malmierca, 2016).
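To illustrate how best frequencies and a caudal-to-rostral gradient could be derived from such tone responses, here is a NumPy sketch. It assumes a channel-by-frequency matrix of response magnitudes (for example, peak ERP amplitude to each 70 dB SPL tone) and the caudo-rostral position of each electrode; both inputs, and the helper names, are hypothetical. The tone set of 0.5-32 kHz in 1/4-octave steps spans 6 octaves and therefore comprises 25 frequencies.

```python
import numpy as np

# 0.5-32 kHz in 1/4-octave steps: 6 octaves -> 25 tone frequencies (Hz)
TONE_FREQS = 500.0 * 2.0 ** (np.arange(25) / 4.0)


def best_frequencies(responses):
    """responses: (n_channels, n_freqs) response magnitudes to the 70 dB SPL tones.
    Returns the best frequency (Hz) of each channel."""
    return TONE_FREQS[np.argmax(responses, axis=1)]


def caudorostral_gradient(bf, position):
    """Least-squares slope of log2(BF) against caudal-to-rostral position
    (octaves per unit position). > 0: low-to-high; < 0: high-to-low."""
    slope, _intercept = np.polyfit(position, np.log2(bf), 1)
    return slope
```

A positive slope corresponds to the low-to-high (caudal-to-rostral) gradient attributed to tentative A1 above, and a negative slope to the high-to-low gradient of tentative non-A1 areas.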

Data Analyses
Acquired neural responses were pre-processed to obtain event-related potentials (ERPs) for each channel, condition, and rat. ERPs were used to look for differences across the four conditions (S, RS, RefS, RefRS) and, in the time-frequency analysis described below, to estimate evoked power. To calculate the ERPs of each channel, ECoG signals were low-pass filtered at 45 Hz (second-order zero-phase Butterworth), downsampled to 1,000 Hz, and re-referenced to the common mean. Time points at which signal values exceeded ±3 SD of the mean signal across time were identified as outliers and removed [i.e., replaced by linear interpolation from neighboring points, followed by detrending as described in de Cheveigné and Arzounian (2018)]. Signals for each trial were then epoched from −100 to 1600 ms relative to sequence onset, and epochs were averaged within each condition to compute mean ERPs for each channel. To reduce data dimensionality and minimize the effect of individual variations in electrode placement between rats, we subjected each rat's channel-by-time ERP matrix (averaged across conditions) to a principal component analysis (PCA) and ordered the components from the highest to the lowest amount of variance explained. We selected the top components that together explained at least 99% of the variance and calculated the weighted sum of their spatial components, to quantify the evoked response topography with reduced variability across rats. Visual inspection of regional response differences in each rat's topography revealed that channels with the lowest response weights lay mainly around A1, while channels with the highest response weights lay mainly around non-A1 areas. We therefore grouped the top response-weighted channels into a tentative non-A1 cluster and the bottom response-weighted channels into a tentative A1 cluster for further analysis of regional differences. Because the number of channels included in each cluster did not affect the results, we assigned the top 30 channels to the non-A1 cluster and the remaining channels to the A1 cluster.
FIGURE 2 | (A) An example of frequency tuning curves for each electrode at the recording site, in response to 100 ms pure tones at different frequencies at 70 dB SPL. (B) An example characteristic frequency gradient across the recorded AC site. Tonotopic gradients of tentative A1 (low to high characteristic frequency from caudal to rostral) and non-A1 (high to low characteristic frequency from caudal to rostral) areas were observed.
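The preprocessing and channel-grouping steps above can be sketched in Python with NumPy and SciPy. This is an approximation, not the analysis code used here: it uses `scipy.signal.butter` and `scipy.signal.filtfilt` for the zero-phase low-pass (note that forward-backward filtering doubles the effective order of the second-order design), resamples to 1 kHz by linear interpolation because 24,414 Hz is not an integer multiple of 1 kHz, and omits the robust detrending of de Cheveigné and Arzounian (2018). Because the text does not specify how the spatial components were weighted, the sketch weights each component's absolute loadings by its fraction of explained variance (the absolute value removes the arbitrary sign of PCA components). All function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def preprocess(raw, fs_in, fs_out=1000.0, cutoff=45.0, z=3.0):
    """raw: (n_channels, n_samples) ECoG. Returns (n_channels, n_out) at fs_out."""
    b, a = butter(2, cutoff, btype="low", fs=fs_in)
    x = filtfilt(b, a, raw, axis=1)                       # zero-phase low-pass
    t_in = np.arange(x.shape[1]) / fs_in
    t_out = np.arange(0.0, t_in[-1], 1.0 / fs_out)
    x = np.stack([np.interp(t_out, t_in, ch) for ch in x])   # resample to 1 kHz grid
    x = x - x.mean(axis=0, keepdims=True)                 # common-average re-reference
    idx = np.arange(x.shape[1])
    for ch in x:                                          # rows are views: edits stick
        bad = np.abs(ch - ch.mean()) > z * ch.std()       # points beyond +/- z SD
        if bad.any() and not bad.all():
            ch[bad] = np.interp(idx[bad], idx[~bad], ch[~bad])   # linear interpolation
    return x


def epoch(x, onsets_s, fs=1000.0, tmin=-0.1, tmax=1.6):
    """Cut (n_trials, n_channels, n_times) epochs from -100 to 1600 ms around onsets."""
    i0, i1 = int(round(tmin * fs)), int(round(tmax * fs))
    starts = [int(round(o * fs)) for o in onsets_s]
    return np.stack([x[:, s + i0 : s + i1 + 1] for s in starts])


def channel_weights(erp, var_explained=0.99):
    """erp: (n_channels, n_times) ERP averaged over conditions -> one weight per channel."""
    centered = erp - erp.mean(axis=1, keepdims=True)      # time points are observations
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)                       # variance explained per component
    k = min(int(np.searchsorted(np.cumsum(ratio), var_explained)) + 1, len(s))
    return (np.abs(u[:, :k]) * ratio[:k]).sum(axis=1)     # variance-weighted |loadings|


def split_clusters(weights, n_top=30):
    """Top-weighted channels -> tentative non-A1; remaining channels -> tentative A1."""
    order = np.argsort(weights)[::-1]
    return np.sort(order[:n_top]), np.sort(order[n_top:])
```

The ERPs per condition would then be `epoch(...)` averaged over the trials of that condition, and `channel_weights` would be applied to the ERP averaged across all four conditions.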
Next, to characterize differences in induced responses to reference sequences (RefRS and RefS) compared to fresh sequences (RS and S), we ran a time-frequency analysis of single-trial ECoG signals for each rat using Morlet wavelets implemented in the FieldTrip toolbox for Matlab (frequency range: 4-80 Hz in 2 Hz steps; 400 ms fixed time window; Billig et al., 2019). On a logarithmic scale, the time-frequency power spectrum of each trial was rescaled by subtracting the time-frequency spectrum of the average ERP for the same condition (i.e., the evoked power). This subtraction yielded an estimate of induced activity amplitude, for which the responses in individual trials need not be precisely time-locked to the stimulus (Hartmann et al., 2012). The induced response therefore differs from the ERP in that it reflects changes in oscillatory spectral power rather than phase-locked responses to the stimuli. The resulting log-scaled single-trial induced responses were averaged across trials. After obtaining average time-frequency power spectra for each rat, channel group, and stimulus condition, we ran two cluster-based permutation paired t-tests (as implemented in FieldTrip), RefRS versus RS and RefS versus S, treating rats as independent observations, with 1,000 iterations per test. This analysis tested whether the observed effects were largely consistent across rats.
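The induced-power computation described above can be sketched as follows for one channel and one condition. This NumPy version is not FieldTrip's `ft_freqanalysis`: it assumes complex Morlet-type wavelets whose Gaussian envelope has the same fixed 400 ms support at every frequency (σ = 400/6 ms), and it uses log10 power with a small constant `eps` to avoid taking the log of zero. Induced power is the trial-averaged difference between log single-trial power and the log power of the condition-average ERP, following the description in the text. The function names are hypothetical.

```python
import numpy as np

FS = 1000.0                       # sampling rate after downsampling (Hz)
FREQS = np.arange(4, 81, 2)       # 4-80 Hz in 2 Hz steps (39 frequencies)
WIN = 0.4                         # fixed 400 ms wavelet support (assumption)


def wavelet_bank(freqs=FREQS, fs=FS, win=WIN):
    """Complex wavelets sharing one Gaussian envelope at every frequency."""
    n = int(round(win * fs))
    t = (np.arange(n + 1) - n / 2) / fs                 # -0.2 ... +0.2 s
    envelope = np.exp(-t ** 2 / (2 * (win / 6) ** 2))
    bank = [envelope * np.exp(2j * np.pi * f * t) for f in freqs]
    return [w / np.abs(w).sum() for w in bank]          # unit L1 norm


def tf_power(trials, bank):
    """trials: (n_trials, n_times) -> power (n_trials, n_freqs, n_times)."""
    return np.array([[np.abs(np.convolve(tr, w, mode="same")) ** 2 for w in bank]
                     for tr in trials])


def induced_power(trials, bank, eps=1e-20):
    """Mean over trials of log10(single-trial power) - log10(power of the ERP)."""
    log_trials = np.log10(tf_power(trials, bank) + eps)
    erp = trials.mean(axis=0, keepdims=True)            # evoked response of this condition
    log_evoked = np.log10(tf_power(erp, bank) + eps)    # shape (1, n_freqs, n_times)
    return (log_trials - log_evoked).mean(axis=0)       # (n_freqs, n_times)
```

Because the ERP is subtracted on a log scale within each condition, activity that is perfectly phase-locked across trials yields values near zero, whereas activity whose phase varies across trials largely cancels in the ERP and therefore yields positive induced values.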

RESULTS
First, a cluster-based permutation paired t-test with 1,000 iterations revealed no significant differences in ERP amplitudes (averaged across channels) between RefRS and RS conditions or RefS and S conditions, either for DRC or for WN sequences.
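The cluster-based permutation paired t-test used here and in the Data Analyses can be illustrated with the following sketch, using NumPy and SciPy (`scipy.ndimage.label` for connected clusters, `scipy.stats.t.ppf` for the cluster-forming threshold). It is a simplified stand-in for FieldTrip's implementation, not a reproduction of it: cluster mass is the sum of t values within each supra-threshold cluster, and the null distribution is the maximum absolute cluster mass under sign flips of the per-rat differences. With six rats there are only 2^6 = 64 distinct sign-flip patterns, so the sketch enumerates all of them rather than drawing random iterations; the smallest attainable two-sided p-value in this scheme is then 2/64 ≈ 0.031. Function names are hypothetical.

```python
import itertools

import numpy as np
from scipy import ndimage, stats


def paired_t(diff):
    """One-sample t statistic over the first axis (rats) of paired differences."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(tmap, thresh):
    """List of (mask, mass) for connected supra-threshold clusters of either sign."""
    clusters = []
    for sign in (1.0, -1.0):
        labels, n_found = ndimage.label(sign * tmap > thresh)
        for k in range(1, n_found + 1):
            mask = labels == k
            clusters.append((mask, float(tmap[mask].sum())))
    return clusters


def cluster_permutation_test(cond_a, cond_b, alpha=0.05):
    """cond_a, cond_b: (n_rats, n_freqs, n_times). Returns [(mask, mass, p), ...]."""
    diff = cond_a - cond_b
    n = diff.shape[0]
    thresh = stats.t.ppf(1 - alpha / 2, df=n - 1)      # two-sided cluster-forming threshold
    observed = find_clusters(paired_t(diff), thresh)
    shape = (n,) + (1,) * (diff.ndim - 1)
    null_max = []
    for signs in itertools.product((1.0, -1.0), repeat=n):   # all 2**n sign flips
        flipped = diff * np.reshape(signs, shape)
        masses = [abs(m) for _, m in find_clusters(paired_t(flipped), thresh)]
        null_max.append(max(masses, default=0.0))
    null_max = np.asarray(null_max)
    return [(mask, mass, float(np.mean(null_max >= abs(mass)))) for mask, mass in observed]
```

Exhaustive enumeration is only practical for small samples such as this one; with many more animals, randomly sampled sign flips would be used instead.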
For both DRC and WN stimuli, channels presumed to lie in A1 showed lower evoked response weights (averaged across all trials and conditions) than channels presumed to lie in non-A1 areas, mainly around the suprarhinal auditory field (SRAF; Figure 3A). Based on the evoked response weights, we grouped channels into two clusters, A1 and non-A1, for subsequent comparisons of induced activity amplitude across conditions. Time-frequency analysis of induced activity revealed robust differences between paired Ref and non-Ref conditions in both clusters for DRC, but not for WN.
We first focused on neural activity induced by DRC stimuli in the non-A1 cluster, based on the hypothesis that perceptual learning of complex stimuli may primarily modulate activity in higher-order regions. When comparing time-frequency responses between the RefRS and RS conditions, we observed significantly decreased power for RefRS versus RS during sequence presentation, emerging from the onset of RefRS, mostly in the beta band (10-40 Hz; T min = −19.41, T max = −2.57, all cluster-based p's < 0.05). This decrease was especially pronounced during the first three segments of the sequences (Figure 3B). In the RefS versus S comparison, decreased power for RefS was also observed from RefS onset to sound offset across the theta, alpha, and beta bands (4-30 Hz; T min = −8.54, T max = −2.58, all cluster-based p's < 0.05). In the A1 cluster, a power decrease for RefRS vs. RS was observed in a frequency range similar to that of the non-A1 cluster (4-30 Hz; T min = −19.82, T max = −2.58, all cluster-based p's < 0.05), mostly in the beta band, but it persisted for longer (from sequence onset to sound offset). A power decrease for RefS vs. S was observed over a time period and frequency bands similar to those in the non-A1 cluster (4-30 Hz; T min = −9.98, T max = −2.57, all cluster-based p's < 0.05). For WN sequences, we did not observe any significant differences in time-frequency response spectra in either the RefRS versus RS or the RefS versus S comparison.
FIGURE 3 | (A) Spatial topography maps (following the methods of Nieto-Diego and Malmierca, 2016 and Polley et al., 2007) of average evoked responses collected by the 8 × 8 ECoG array for DRC (left) and WN (right) sequences. Each pixel represents an individual ECoG channel of the 8 × 8 grid placed over the AC. The color scale is fixed for both sound types. Tentative subfields of the AC are marked with the relevant labels (A1, VAF, AAF, and SRAF). The main A1 cluster (white line) shows slightly lower evoked response weights, and the putative SRAF cluster (black line) shows generally greater evoked response weights. Overall, evoked responses to DRC sequences were stronger than those to WN. In both cases, greater evoked responses were generally found in the non-A1 clusters. (B) Differences in average time-frequency induced power spectra for RefRS minus RS (left) and RefS minus S (right) pairs for DRC (top) and WN (bottom). Black solid vertical lines indicate sound onset and offset, dashed vertical lines indicate reference sequence onset and offset, and dotted vertical lines in the left panels indicate within-sequence segment boundaries. Black contours in the DRC spectra indicate time-frequency areas where a significant difference between conditions was observed for the non-A1 cluster, and white contours indicate areas with a significant difference for the A1 cluster (cluster-based p < 0.05). No significant power difference was observed for either pair for WN.

DISCUSSION
We assessed neural correlates of implicit learning through repeated passive exposure to a specific auditory sequence. We compared neural dynamics for re-occurring sequences with identical acoustic characteristics (RefRS and RefS) with those for sequences that were presented only once (RS and S), by computing the induced activity amplitude of neural signals recorded from primary (A1) and higher-order auditory cortex. We observed decreased induced activity amplitude throughout the stimulus sequence for RefRS and RefS compared to RS and S, mainly in the beta band, in both the A1 and non-A1 channel clusters, but only for DRC sequences, which contain more salient acoustic features than WN. This finding suggests active involvement of both primary and non-primary AC in the implicit learning of complex auditory patterns.
Unlike most previous studies, which used differences between evoked responses as an index of learning (Lim et al., 2016; Lu et al., 2018), we did not observe any significant difference in ERPs across conditions. This result was expected, however, because Ref sequences were presented in a passive listening setting with a complex and unpredictable experimental design. Previous neuroimaging studies in humans using similar paradigms also focused mainly on inter-trial coherence rather than on ERP differences between RefRS and RS or RefS and S, as ERPs for RefRS and RS were similar in most cases (Luo et al., 2013; Andrillon et al., 2015, 2017). We therefore compared induced response power obtained from each trial after time-frequency decomposition. For DRC stimuli, we found distinct response patterns for both types of re-occurring sequences (RefRS and RefS) from the beginning of sequence presentation when compared to fresh sequences (RS and S). Because each test block contained different, randomly generated re-occurring sequences, there was no build-up effect across successive blocks. Our finding suggests that within-sequence repetitions are not required for the learning process, as long as the sequence contains salient information to be learned. The effects in our study were observed mostly in the beta band, which has been implicated in sensory memory (Haenschel et al., 2000; Scholz et al., 2017) and sensory predictions (Pearce et al., 2010; Auksztulewicz and Friston, 2016; Auksztulewicz et al., 2017).
The attenuation of induced activity amplitude for RefRS relative to RS and for RefS relative to S was found in both the A1 and non-A1 channel clusters. The effect largely overlapped between the two clusters, especially for RefS. Interestingly, for RefRS, significant power differences in the non-A1 cluster started to diminish after the first three segment repetitions, unlike the power difference in the A1 cluster, which lasted until the end of the sequence (Figure 3B). Although further investigation is required, we hypothesize that the repeated segments within RS sequences may also have been learned, so that RefRS and RS no longer differed toward the end of the sequence in the non-A1 cluster. This could be due to stronger suppression of re-occurring segments in putative non-lemniscal (non-A1) than in lemniscal (A1) clusters (Parras et al., 2017). It further indicates that acoustic features presented in re-occurring segments as brief as 240 ms (the DRC segment duration) can be effectively learned and recognized. This pattern was observed only in the non-A1 cluster, suggesting a hierarchical organization of the AC and a role for higher-order regions in repetition suppression and prediction over shorter time frames (Auksztulewicz and Friston, 2016; Lim et al., 2016; Lu et al., 2018). The present finding provides further insight into the neural responses mediating RefRS learning within a short time frame. Whether such characteristics persist in longer-term memory remains to be studied.
One caveat of the present study is that we did not observe a gradual development of the effect across trials, which would be a key indication that the effect results from learning. Because the data were recorded with ECoG as a first step to verify whether any difference emerges in the AC, the present study focused on signals accumulated over multiple trials. Further investigation with single- and multi-unit recordings would help to characterize trial-by-trial changes in neuronal activity, and separating units that are selective to the Ref sequences from non-selective ones could increase the signal-to-noise ratio (e.g., Lu et al., 2018).
Finally, although previous studies using a similar paradigm showed neural correlates of RefRS with WN stimuli (Luo et al., 2013; Andrillon et al., 2015), we did not observe any distinctive characteristics of RefRS or RefS for WN. The repeating WN segment in the present study (200 ms) was shorter than in previous studies that used WN (500 ms; e.g., Agus et al., 2010). Factors such as segment duration or seamless presentation could have affected the saliency of the Ref sequence, depending on the type of stimulus (Agus et al., 2010; Andrillon et al., 2017; Kang et al., 2017). The different outcomes for DRC and WN could therefore reflect the greater saliency of DRC stimuli for recognition compared to WN, especially with such short repeating segments. Our results raise the intriguing possibility that more salient Ref stimuli may be those that induce more beta power, but whether this is indeed the case will need to be tested in future experiments with greater statistical power.
The present experiment was conducted under anesthesia. Depending on the type of anesthetic, anesthesia can affect certain aspects of auditory processing such as spike timing, population activity, or frequency tuning (Zurita et al., 1994; Gaese and Ostwald, 2001; Huetz et al., 2009; Noda and Takahashi, 2015). However, much previous physiological research on neural adaptation to re-occurring sounds and on information processing has been conducted in anesthetized animals (e.g., Bao et al., 2004; Anderson et al., 2009), especially under urethane (e.g., Astikainen et al., 2011, 2014; Lipponen et al., 2019). These studies, as well as our results, suggest that neural responses under anesthesia carry important information that is highly correlated with sensory perception. In the present study, urethane was chosen to minimize adverse effects of anesthesia on brain function (Capsius and Leppelsack, 1996; Hara and Harris, 2002; Curto et al., 2009). Furthermore, our finding of distinct neural traces for Ref sequences in the AC of anesthetized rats is comparable to findings from a human neuroimaging study during REM sleep (Andrillon et al., 2017). Thus, the present results provide important initial evidence on neural correlates of such passive, implicit learning in the AC. Certain discrepancies between the present findings and previous human studies (e.g., the absence of a significant effect for WN) could be further studied by conducting experiments in awake animals.
In summary, the present study showed distinctive neural signatures in the AC for re-occurring abstract auditory patterns that provide salient acoustic features (DRC). While decreased induced activity amplitudes in the beta band throughout the AC suggest that both A1 and non-A1 areas are involved in encoding information about re-occurring acoustic stimuli, such memory encoding in non-A1 areas may operate over a shorter time frame. In this study, we report, for the first time, a neural correlate of this type of memory formation in an easy-to-use, passive listening animal model, which should greatly facilitate further investigation of the underlying neural mechanisms.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in an online repository: https://auditoryneuroscience.org/PL_ECoGdata.

ETHICS STATEMENT
The animal study was reviewed and approved by the Animal Research Ethics Sub-committee, City University of Hong Kong.

AUTHOR CONTRIBUTIONS
HK, RA, NA, MS, and JS designed the study and ran pilots. HK and HA conducted the experiments. HK and RA analyzed the data. HK wrote the first draft of the manuscript. All authors revised the manuscript.