Edited by: Malcolm Slaney, Google, United States

Reviewed by: Guillaume Garreau, IBM Research Almaden, United States; Amin Saremi, University of Oldenburg, Germany

This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience

†Present Address: Chetan S. Thakur, Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore, India

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

This paper presents a digital implementation of the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model. The CAR part simulates the basilar membrane's (BM) response to sound. The FAC part models the outer hair cell (OHC), the inner hair cell (IHC), and the medial olivocochlear efferent system functions. The FAC feeds back to the CAR by moving the poles and zeros of the CAR resonators automatically. We have implemented a 70-section, 44.1 kHz sampling rate CAR-FAC system on an Altera Cyclone V Field Programmable Gate Array (FPGA) with 18% ALM utilization by using time-multiplexing and pipeline parallelizing techniques and present measurement results here. The fully digital reconfigurable CAR-FAC system is stable, scalable, easy to use, and provides an excellent input stage to more complex machine hearing tasks such as sound localization, sound segregation, speech recognition, and so on.

The human auditory system is superior to any machine-hearing system in efficiency of perceiving sound. As the input structure for the auditory pathway, the tonotopically-organized cochlea decomposes, converts and amplifies sound waves nonlinearly into electrical signals, and delivers the results to the nervous system. The cochlea is characterized by a remarkably wide dynamic range (0-120 dB SPL) (Fettiplace and Hackney,

Cochlear models can be divided into two classes: transmission-lines (TL) and auditory filterbanks (Duifhuis,

Auditory filterbank models use either parallel or cascade filters to model wave propagation on the BM. Parallel filterbank models use independent filters, such as rounded-exponential (roex) filters (Glasberg et al.,

Parallel filterbank models are mostly concerned with reproducing the observed mechanical responses and pay little attention to the biological structure of the cochlea. For example, Wang et al. implemented a parallel ultra-steep roll-off filter model on a 0.35 μm CMOS chip (Wang et al.,

Cascade filterbank models take advantage of the way sound propagates in the forward direction as traveling waves in the cochlea. In the cascade of filters, each filter stage models a segment of the nonuniform distributed wave system and its output becomes the input of the next section (Lyon,

The biological cochlea is a causal, active, and nonlinear system. Figure

The frequency response measured from a chinchilla cochlea at various input levels, given in dB sound pressure level (SPL), adapted from (Ruggero,

In auditory filterbank models, the nonlinearities can be described as linear filters with parameters depending on signal level. For example, the parallel and cascade gammachirp filter models (PrlGC and CasGC) (Irino and Patterson,

The CAR-FAC model is a digital cascade auditory filter model proposed by Richard Lyon and described in detail in (Lyon,

Saremi et al. compared seven computational cochlear models including one cascade filterbank model (CAR-FAC), one transmission-line model, one biophysical model, and four parallel filterbank models (Saremi et al.,

We target a digital ASIC implementation of the CAR-FAC model for machine hearing applications since it is small, more energy efficient and more stable than analog implementations (Sarpeshkar,

The CAR-FAC model consists of a cascade of asymmetric resonators, a digital OHC (DOHC) model, a digital IHC (DIHC) model and an AGC loop, as shown in Figure _{i} is connected to its next stage and the DIHC. It also gives an intermediate variable, velocity, to the DOHC. The DIHC feeds back to the DOHC through the AGC loop. The DOHC combines the AGC loop output and the velocity and feeds back to the resonator. The CAR-FAC output includes a multi-channel BM out _{i} and a DIHC out, which can be transformed into the neural activity patterns _{i}. The details of each model are described hereafter:

Structure of the CAR-FAC model. _{1} to _{N} are the transfer functions of the CAR part, and _{1} to _{N} represent the CAR-FAC output. The CFs of the CAR resonators decrease from left to right. The DOHC, the DIHC and the AGC loop comprise the FAC part. The neural activity pattern (NAP) rate outputs, _{1} to _{N}, are estimations of average instantaneous nerve firing rates.

In the CAR, the asymmetric resonator is a coupled form two-pole-two-zero filter, as shown in Figure

Structure of the two-pole-two-zero resonator. _{0}, _{0}, and _{0} and _{1} are the intermediate variables,

The two-pole coupled form has a pair of conjugate poles (

where θ_{R} is the pole angle in the z plane. The conjugate zeros (_{zero} and

where θ_{Z} is the zero angle in the z plane. The zero radius is the same as the pole radius, _{R}) < 0:

Coefficient

In this structure, the zeros can be moved together with the poles by changing _{0} to keep the zero frequency at half an octave above the pole frequency.
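As a rough illustration of the two-pole coupled form described above, the following Python sketch rotates a two-element state by the pole angle θ_{R} and scales it by the pole radius on every sample. The function names, the coefficient names `a0` and `c0`, the radius value, and the choice of driving only the first state with the input are assumptions made for illustration; the zero-forming output mix is omitted.

```python
import numpy as np

def coupled_form_matrix(r, theta_r):
    """State-update matrix of a two-pole coupled form (assumed rotation form)."""
    a0, c0 = np.cos(theta_r), np.sin(theta_r)  # hypothetical coefficient names
    return r * np.array([[a0, -c0],
                         [c0,  a0]])

def coupled_form_impulse_response(r, theta_r, n_samples):
    """Second state variable in response to a unit pulse at n = 0."""
    A = coupled_form_matrix(r, theta_r)
    w = np.zeros(2)
    out = np.empty(n_samples)
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        w = A @ w + np.array([x, 0.0])  # input drives the first state (assumption)
        out[n] = w[1]
    return out

fs = 44100.0
theta_r = 2 * np.pi * 1000.0 / fs  # pole angle for a 1 kHz resonance
r = 0.98                           # illustrative pole radius, not taken from the paper

# The eigenvalues of the update matrix are the conjugate pole pair r * exp(+/- j * theta_r).
poles = np.linalg.eigvals(coupled_form_matrix(r, theta_r))
h = coupled_form_impulse_response(r, theta_r, 512)
```

Under these assumptions, lowering `r` moves the poles toward the origin and increases the damping without changing the pole angle.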

Additionally, changing the poles and the zeros of the filter, via

The zeros and poles are set initially for each cascade stage. The poles of the two-pole-two-zero resonator are chosen to be equally spaced along the normalized length of the cochlea according to the Greenwood map function (Greenwood,

Here, coefficient
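A minimal sketch of how channel CFs might be placed with a Greenwood map is given below. The constants are Greenwood's commonly quoted human fit (A = 165.4, a = 2.1, k = 0.88); whether the implementation described here uses these values, and which range of normalized positions it covers, is not stated in this text, so both are assumptions.

```python
import numpy as np

def greenwood_cf(x, A=165.4, a=2.1, k=0.88):
    """Greenwood map: CF in Hz at normalized position x (0 = apex, 1 = base).

    The default constants are the widely used human fit (an assumption here).
    """
    return A * (10.0 ** (a * x) - k)

n_channels = 70
# Equal spacing along the normalized cochlear length, ordered base to apex
# so that CF decreases with channel index (endpoints are an assumption).
positions = np.linspace(1.0, 0.0, n_channels)
cfs = greenwood_cf(positions)
```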

In the CAR-FAC model, the FAC effects are achieved by moving the initial CAR poles and zeros positions by varying their radius

The DOHC models the OHCs function, actively and nonlinearly amplifying the wave propagation in the cochlea. In the CAR-FAC model, the DOHC gain control mechanism integrates a local instantaneous nonlinearity and a multi-time-scale nonlinearity, as shown in Figure _{1}. The multi-time-scale nonlinearity comes from the DIHC feedback through the AGC loop filter. Both combine to change the pole (zero) radius

where coefficient _{1} is the minimum radius, corresponding to the maximum damping of the resonator. In a digital implementation, _{1} is given by:

where the coefficient _{s} is the sampling frequency. _{1} keeps the damping away from zero, thereby keeping the system away from the Hopf bifurcation of the resonators. _{1} also makes the damping bounded. The increment of _{1} is the relative undamping. It is the product of the nonlinear function (_{1}) (Lyon,

Structure of the DOHC model. The instantaneous nonlinearity performs a nonlinear gain control (NLF) on the CAR velocity, which is calculated from the BM coefficient _{1}. The multi-time-multi-scale dynamic gain-control factor,

The

where ν is the CAR velocity,

The level dependence of the damping mechanism introduces frequency distortions. The velocity-squared function includes a double-frequency term that interacts with the CAR coefficients (_{0}_{0}_{1} and _{2} (where _{1} < _{2}), then a third tone, at the frequency (2_{1}–_{2}) will appear and propagate through the cascade of filters. The _{2}–_{1}) (Lyon,
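The appearance of a cubic distortion tone can be illustrated with a deliberately simplified, memoryless stand-in for level-dependent gain: a signal whose gain is reduced in proportion to its own square. This is not the DOHC model itself; the tone frequencies, amplitudes, and the factor 0.5 are arbitrary values chosen for the sketch.

```python
import numpy as np

fs = 44100
n = 44100                      # one second, so FFT bin k corresponds to k Hz
t = np.arange(n) / fs
f1, f2 = 1000, 1200            # two primary tones with f1 < f2 (illustrative)

x = 0.5 * (np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))

# Gain that drops with the squared signal: a toy level-dependent nonlinearity.
y = x * (1.0 - 0.5 * x ** 2)

amp_in = np.abs(np.fft.rfft(x)) / (n / 2)    # single-sided amplitude spectra
amp_out = np.abs(np.fft.rfft(y)) / (n / 2)

dp_bin = 2 * f1 - f2           # cubic distortion product at 2*f1 - f2 = 800 Hz
dp_in, dp_out = amp_in[dp_bin], amp_out[dp_bin]
```

The input has no energy at 800 Hz, while the output does; in a cascade, such a component would then propagate to the channels tuned near that frequency.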

The DIHC models the IHC function. It comprises a high-pass filter (HPF), a transduction nonlinearity unit, a transducer unit and two LPFs. The IHCs are mechano-electrical transducers that sense the BM vibration, convert the mechanical motion into electrical signals, and deliver the results to the nervous system. The DIHC model is shown in Figure

where _{hpf} is the high pass filtered CAR output,

Structure of the DIHC model. It comprises a HPF, a transduction nonlinearity unit, a transducer unit and two LPFs.

The transducer unit detects and amplifies the signal onset, then compresses and reduces its response gain quickly after the signal onset. It is implemented by:

where

The AGC loop consists of a four-stage cascade FIR LPF, with each stage coupled with its left and right neighbors to form a three-stage spatial LPF. It feeds the DIHC signal back to the DOHC at a much lower update rate than other parts of the CAR-FAC model. The AGC loop models the medial olivocochlear system's efferent feedback that exerts an AGC on the BM vibration through the OHCs. The AGC loop filter is shown in Figure _{1}, _{1}-_{2}, _{2}] apply weight _{1} to the left neighbor value, _{2} to the right neighbor value, and _{1}-_{2} to the current channel value to keep the total mixing gain equal to 1. For a 44.1 kHz signal, in the fastest and most local stage, AGC-SF4, _{1} is 0.14 and _{2} is 0.2 (Lyon,

Structure of the AGC loop. Four stages of the temporal smoothing filters (SF) (Upper). Each stage consists of a temporal LPF with a defined time constant (0.002, 0.008, 0.032, and 0.128 s) and a three-tap spatial smoothing filter. The internal structure of an AGC-SF (Lower), the input of the AGC-SF comes from the lower filter stage with the smaller time constant as well as the accumulation of the DIHC. The output goes to the next stage of the temporal filter. The spatial smoothing filter is a three-tap smoothing filter coupled with lateral channels. _{1}, _{2}, and _{1}-_{2} are the spatial filter coefficients.
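The three-tap spatial smoothing can be sketched as follows, using the weights quoted above for the fastest stage (0.14 for the left neighbor and 0.2 for the right neighbor), with the remainder on the current channel so that the weights sum to 1. The function name, the parameter names, and the edge handling (replicating the end channels) are assumptions, not details taken from the implementation.

```python
import numpy as np

def spatial_smooth(channels, c_left=0.14, c_right=0.2):
    """Three-tap smoothing across channels with unit total mixing gain."""
    x = np.asarray(channels, dtype=float)
    padded = np.pad(x, 1, mode="edge")   # end channels replicated (assumption)
    left = padded[:-2]                   # value of channel i - 1
    right = padded[2:]                   # value of channel i + 1
    return c_left * left + (1.0 - c_left - c_right) * x + c_right * right

# A constant AGC state is left unchanged, and a single active channel
# spreads to its two neighbors.
flat = spatial_smooth(np.ones(70))
pulse = np.zeros(70)
pulse[5] = 1.0
spread = spatial_smooth(pulse)
```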

The CAR-FAC system can be efficiently implemented on FPGA, and the system is configurable in filter parameters and channel numbers Figure

Architecture of the CAR-FAC FPGA system. The system consists of an audio codec, a CAR-FAC module, a controller module and an interface module. The FPGA board is hosted by a PC through a USB interface.

The CAR-FAC module implements the components described in section The CAR-FAC Model. Additionally, the CAR module can operate independently: when the FAC function is turned off, the DOHC and AGC loop function will be switched off, and all the CAR coefficients (_{0}, _{0},

The controller module controls the system data flow, including writing the initial coefficients, and/or the audio file input to the CAR-FAC module, as well as the CAR-FAC module output to the interface module. Additionally, the output of the system is selectable: we can choose either the BM output or the DIHC output as the system output.

The interface module consists of a data synchronization module, an external memory, and a USB interface. The data synchronization circuit synchronizes data between different clock domains. There exist two clock domains in the system: a system clock domain (250 MHz) and an interface clock domain (100 MHz). The system clock domain includes the controller module and the CAR-FAC module. The interface clock domain is unique to the interface module. The external memory is a 1 GB DDR3 SDRAM on the FPGA board: it stores the CAR-FAC output data. The USB interface communicates between the FPGA board and the PC, and transmits the system's initial coefficients (_{0}, _{0}, _{1},

We first simulated the CAR-FAC model in Python with floating-point numbers. Next, we verified the model using fixed-point numbers to determine the required word length for the FPGA implementation. We use 20-bit BM variables, 20-bit DOHC variables, 14-bit DIHC variables and 14-bit AGC variables to approximate the floating-point CAR-FAC performance and to cover the ranges of the input, output and internal variables needed for a 70 dB input dynamic range. We use pipelining to run the CAR module, the DOHC module, and the DIHC_AGC module in parallel, and time-multiplexing to reuse a single CAR, DOHC, and DIHC_AGC hardware module, yielding a compact reconfigurable CAR-FAC system. The system design diagram is shown in Figure
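A word-length study of this kind can be sketched by quantizing a floating-point reference signal and measuring the resulting signal-to-noise ratio. The split between integer and fractional bits (one sign bit, the rest fractional) and the test tone are assumptions; only the 20-bit and 14-bit word lengths come from the text.

```python
import numpy as np

def quantize(x, word_length, frac_bits):
    """Round to signed two's-complement fixed point and saturate."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (word_length - 1))
    hi = 2 ** (word_length - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=float) * scale), lo, hi)
    return q / scale

def snr_db(reference, approx):
    """Ratio of reference power to quantization-error power, in dB."""
    noise = reference - approx
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

t = np.arange(4410) / 44100.0
x = 0.5 * np.sin(2 * np.pi * 1000.0 * t)   # floating-point reference tone

snr20 = snr_db(x, quantize(x, 20, 19))     # assumed Q1.19 format
snr14 = snr_db(x, quantize(x, 14, 13))     # assumed Q1.13 format
```

Each additional bit buys roughly 6 dB of signal-to-quantization-noise ratio, which is why the wider word is reserved for the variables that need the larger range.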

CAR-FAC system design diagram. The CAR-FAC system is implemented with 20-bit word length for the design coefficients, BM output, and DOHC output, and 14-bit for the DIHC output and the AGC output. The controller state machine determines the cochlear channel to be processed at any particular time and controls the CAR-FAC coefficients and data for that channel. The BM_start signal controls the start of the system through the controller, and it is triggered by the Audio_in_ready signal. The ohc_sel is a selector switch for the CAR/CAR-FAC function. The agc_sel is a switch for the AGC loop function. The CAR state machine calculates the transfer function of Equation (1) and controls the DOHC and DIHC_AGC start in the system. The DOHC state machine calculates Equation (10–12) and feeds back an updated

In digital audio, 44.1 kHz is a common sampling frequency, and the digital hardware of the CAR module (the two-pole-two-zero resonator) and the FAC module (the DOHC module and the DIHC-AGC module) can operate much faster than the audio sample interval (22.68 μs). Hence, in this system, a single CAR-FAC hardware module is reused multiple times to implement the multi-channel, multi-level pipelined CAR-FAC system. At a 44.1 kHz sampling frequency, with a single CAR-FAC module, we were able to implement a real-time CAR-FAC system with up to 70 filter channels.
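The time-multiplexing budget can be checked with simple arithmetic from the figures given in this paper: a 44.1 kHz sample rate, a 250 MHz system clock, and 70 channels. The number of clock cycles the hardware actually spends per channel is not stated here, so the per-channel figure below is only the available budget under the assumption that channels share the sample interval evenly.

```python
F_SAMPLE_HZ = 44_100              # audio sampling rate
F_SYSTEM_CLOCK_HZ = 250_000_000   # system clock domain
N_CHANNELS = 70                   # time-multiplexed cochlear channels

# Time available to process all channels for one audio sample.
sample_interval_us = 1e6 / F_SAMPLE_HZ

# Whole clock cycles available per audio sample, and per channel if the
# sample interval is shared evenly between channels (assumption).
cycles_per_sample = F_SYSTEM_CLOCK_HZ // F_SAMPLE_HZ
cycles_per_channel = cycles_per_sample // N_CHANNELS

print(f"{sample_interval_us:.2f} us, {cycles_per_sample} cycles, "
      f"{cycles_per_channel} cycles per channel")
```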

For each CAR-FAC module, there exist four state machines in the system. The controller state machine determines the cochlear channel to be processed at a particular time and controls the CAR-FAC coefficients and data for that channel. The CAR state machine calculates the transfer function of Equation (1). The DOHC state machine calculates Equation (10–12), and feeds back an updated

The BM_start signal controls the start of the system through the controller and is triggered by the Audio_in_ready signal. If there exists an audio input (Audio_in) from either the PC or the audio codec, the BM_start signal will be sent to the CAR through the controller, and the CAR will start to run. The ohc_sel is a selector switch for the CAR/CAR-FAC function, and the agc_sel is a switch for the AGC loop function. When the ohc_sel is low, the DOHC function is switched off, and the CAR-FAC operates as a linear CAR system, and we can choose either the CAR or the DIHC as the output. When both the ohc_sel and the agc_sel are high, the whole CAR-FAC function is switched on. When the ohc_sel is high and the agc_sel is low, the AGC loop function is switched off, leaving only the instantaneous nonlinearity in the CAR-FAC system.

The CAR state machine controls the DOHC and DIHC_AGC start in the system. It will send a start signal to the DOHC and the DIHC-AGC module separately at a particular time to start the DOHC and the DIHC-AGC function if both the ohc_sel and the agc_sel are high. The DOHC state machine starts when the CAR module finishes updating the internal variables _{0}/_{1}. The DIHC-AGC state machine starts when the BM output calculation is finished. The pipelined CAR, DOHC, and DIHC_AGC structure is shown in Figure

The device utilization for a single CAR-FAC module is shown in Table

Device utilization summary.

Resource | Used | Available | Utilization (%) |

ALM | 5,235 | 29,080 | 18 |

Memory (bits) | 1,082,812 | 4,567,040 | 24 |

DSPs | 49 | 150 | 33 |

We have implemented a real-time digital CAR-FAC system at a 44.1 kHz sampling rate on a Cyclone V FPGA board covering an input frequency range up to 22.05 kHz. The number of channels in the system is reconfigurable, and more channels will result in more overlap among filters if the frequency range is kept the same. For machine hearing applications, about 50% overlap in terms of equivalent rectangular bandwidth (ERB) is considered to provide a well-behaved representation of a sound (Lyon,

The measured system transfer function in response to a -40 dB full scale (FS), 1 s sine tone sweep from 20 Hz to 22.05 kHz (squared-cosine rise and decay time of 0.1 s to minimize the influence of the spectral splatter) is shown in Figure

Transfer function of the 70-channel CAR-FAC system to a -40 dB FS, 1 s sine tone sweep from 20 Hz to 22.05 kHz (squared-cosine rise and decay time of 0.1 s to minimize the influence of the spectral splatter). The CAR response (Upper) when the FAC function is switched off; The CAR-FAC response (Lower).

Figure

CAR and CAR-FAC output in response to 0.5, 1, 2, and 4 kHz tones with an amplitude of -40 dB FS at the channels of CFs corresponding to the input frequencies.

Excitation patterns show the vibration amplitude across the BM to a single sound. Here, the excitation patterns were calculated as the root-mean-square (RMS) signal at the output of all the CAR-FAC channels (Ren,

Figures

Excitation patterns calculated as the RMS output signal of the 70 CAR-FAC channels in response to tones at

Additionally, we calculated the BM input/output (I/O) function to evaluate the nonlinear and compression effects of the system. The I/O function is the ratio between the RMS output at the CF channel corresponding to the stimulus frequency and the RMS of the stimulus (Saremi et al.,
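Both the excitation pattern and the I/O function reduce to RMS calculations on recorded channel outputs. The sketch below assumes the outputs are stored as an array with one row per channel; the function names and the synthetic data in the usage lines are hypothetical and stand in for measured data.

```python
import numpy as np

def rms(x, axis=None):
    """Root-mean-square value along the given axis."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2, axis=axis))

def excitation_pattern(channel_outputs):
    """RMS of every channel; rows are channels, columns are samples."""
    return rms(channel_outputs, axis=1)

def io_ratio_db(cf_channel_output, stimulus):
    """Ratio of the CF-channel RMS output to the stimulus RMS, in dB."""
    return 20.0 * np.log10(rms(cf_channel_output) / rms(stimulus))

# Synthetic usage: a -40 dB FS tone and a made-up channel that amplifies it tenfold.
t = np.arange(4410) / 44100.0
stimulus = 0.01 * np.sin(2 * np.pi * 1000.0 * t)
outputs = np.vstack([0.5 * stimulus, 10.0 * stimulus, 2.0 * stimulus])
pattern = excitation_pattern(outputs)
gain_db = io_ratio_db(outputs[1], stimulus)
```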

The CAR-FAC frequency selectivity was evaluated from the system frequency responses. The frequency response was calculated using the FFT from the system impulse responses at the channels of CFs corresponding to 0.5, 1, 2, 4, and 8 kHz.

Furthermore, in the CAR-FAC system, quality factor (_{ERB} (de Boer and Nuttall,

The ERB was evaluated from the system's impulse response power spectral density (PSD).
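One common way to obtain the ERB from an impulse response is to divide the area under its PSD by the PSD peak value; the quality factor then follows as the peak frequency divided by the ERB. The sketch below applies that definition to a generic decaying sinusoid; whether the measurements reported here used exactly this procedure is an assumption, and the test resonator is not a CAR-FAC channel.

```python
import numpy as np

def q_erb(impulse_response, fs):
    """Peak frequency divided by the ERB of the impulse-response PSD."""
    h = np.asarray(impulse_response, dtype=float)
    psd = np.abs(np.fft.rfft(h)) ** 2
    freqs = np.fft.rfftfreq(len(h), d=1.0 / fs)
    df = freqs[1] - freqs[0]
    peak = int(np.argmax(psd))
    erb_hz = psd.sum() * df / psd[peak]   # area under the PSD / peak height
    return freqs[peak] / erb_hz

def decaying_sinusoid(f0, radius, fs, n=8192):
    """Impulse response of a generic two-pole resonator (illustrative only)."""
    k = np.arange(n)
    return radius ** k * np.sin(2 * np.pi * f0 / fs * (k + 1))

fs = 44100.0
q_low_damping = q_erb(decaying_sinusoid(1000.0, 0.99, fs), fs)
q_high_damping = q_erb(decaying_sinusoid(1000.0, 0.95, fs), fs)
```

In this toy case the more heavily damped resonator gives the smaller quality factor, the same qualitative trend as reported for the measured system.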

Figures _{ERB} under different damping factors. The smaller _{ERB} corresponds to higher damping, and at higher damping (0.5 and 0.7), _{ERB} is higher at moderate CFs than lower and higher CFs.

Q_{ERB} at CFs corresponding to 0.5, 1, 2, 4, and 8 kHz, estimated from the PSD of the BM impulse response at those CFs.

The relation between dB FS and Sound Pressure Level, expressed in dB SPL, depends on the _{1} in Equation (10)]. Comparing the peak gain at moderate frequencies (1, 2, and 4 kHz) with the measured biological cochlea frequency response in Figure

We also investigated the system's impulse response characteristics in the time domain and the intensity dependence of the _{ERB} factors. Figure _{ERB} factor for clicks with intensities between -60 dB FS and -10 dB FS in steps of 10 dB FS at the CF corresponding to 1 kHz. The _{ERB} factor decreases as the stimulus intensity increases. The sharpness of the frequency response thus decreases as the stimulus intensity increases.

System impulse responses at the 1 kHz CF channel to -50 dB FS, -30 dB FS, -10 dB FS clicks. The arrows mark the amplitude of clicks. The red dashed lines mark two consecutive impulse response zero-crossings (_{ERB} factors derived from impulse responses at relative intensities from -60 dB FS and -10 dB FS in steps of 10 dB FS (

To investigate the DIHC characteristics, we measured the DIHC response to tones. To present stimuli with the same amplitude to the DIHC, we made use of the linearity of the CAR: we switched off the FAC function, leaving the CAR to amplify the input tones linearly. First, we presented 0.5, 1, and 4 kHz tones to the system and measured the CAR output at the channels with CFs corresponding to each of those tones. We adjusted each tone's amplitude so that the CAR output at the corresponding channel had the same amplitude of 2.28 dB FS. Next, we used the adjusted tones as the input to the system and measured the DIHC output in response to those tones with the same CAR output amplitude at the corresponding CFs (Gmel et al.,

Figure

DIHC output and CAR output in response to 100 ms tones of 0.5, 1, and 4 kHz at the channels of CFs corresponding to those tones.

This paper presents a fully digital implementation of the CAR-FAC cochlear model. We use time-multiplexing and pipelining techniques to implement a 70-channel real-time CAR-FAC system at 44.1 kHz on a Cyclone V FPGA board. We measured the system responses to a set of stimuli such as pure tones and condensation clicks and analyzed the CAR-FAC nonlinear growth characteristics, excitation patterns, frequency selectivity and impulse response. We investigated the CAR-FAC

Here, we compare the system with prior silicon cochleae with respect to architecture, channel number, frequency range, input range,

Comparison with prior silicon cochleae.

Architecture | Cascade | Parallel | Parallel | Active coupling | Parallel | Passive coupling |

Channel number | 70 × 3^{a} | 64 × 2 | 16 | 360 | 16 | 100 |

Frequency range | up to 22.05 kHz | 8–20 kHz | N/A | 210–14 kHz | 100–5 kHz | 200–20 kHz |

Input range (dB) | 70 | 73 (including 18 dB of the attenuator) | 92 | 52 | 75 (with AGC), 55 (without AGC) | 50 |

Power supply (V) | 1.1 | 0.5 | 1.8 | 2.5 | 2.8 | 3.3 |

Power ( | 1,260^{b} | 0.055 | 0.028 | 35.9 | 0.06 | 1.7 |

<10 (through | 1.3–39 from channel 18 | 0.83–7 | 1.16 ± 0.92 | <10 | 0.25–12 |

YX, RW, and AvS: proposed the idea and designed the FPGA system; YX: recorded the data; YX, TH, RW, and AvS: evaluated and discussed the results; YX: wrote the manuscript. All authors discussed the results, commented on the manuscript and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This work was supported by the Australian Research Council Grant DP140103001. It was inspired by a project at 2016 Telluride Neuromorphic workshop. The support by the Altera university program is gratefully acknowledged.