Measuring Distortion-Product Otoacoustic Emission With a Single Loudspeaker in the Ear: Stimulus Design and Signal Processing Techniques

The distortion-product otoacoustic emission (DPOAE) is a backward propagating wave generated inside the cochlea during the wave amplification process. The DPOAE signal can be detected rapidly under relatively noisy conditions. In recent years, the earphone industry demonstrated interest in adopting DPOAE as an add-on feature to make their product “intelligent” of inner-ear status. However, a technical challenge remains to be tackled—the loudspeaker in an earphone generates its own cubic distortion at the same frequency as DPOAE. Unfortunately, the intensity of loudspeaker distortion is typically comparable to that of the DPOAE, if not higher. In this research, we propose two strategies, namely compensation and cancellation, to enable DPOAE measurement with a single loudspeaker. The compensation strategy exploits the part of the growth function of the loudspeaker distortion which is almost linear, and thus suppresses the distortion it generates while retaining a larger portion of DPOAE in the residual signal. The cancellation strategy utilizes a one-dimensional Volterra filter to remove the cubic distortion from the loudspeaker. Testing on normal-hearing ears shows that the compensation strategy improved the DPOAE-to-interference ratio by approximately 7 dB, resulting in a cross-correlation of 0.62 between the residual DPOAE level and the true DPOAE level. Meanwhile, the cancellation strategy directly recovered both the magnitude and the phase of DPOAE, reducing the magnitude estimation error from 15.5 dB to 3.9 dB in the mean-square sense. These pilot results suggest that the cancellation strategy may be suitable for further testing with more subjects.


INTRODUCTION
Otoacoustic emissions (OAEs) are sounds generated in the cochlea that propagate backward to emit from the ear (1). OAEs can be classified into two types (2)-spontaneous OAEs (SOAEs) and evoked OAEs. SOAEs occur in the absense of external stimulus, and evoked OAEs can be regarded as acoustic responses to external stimulus. Within the family of evoked OAEs, the distortion-product OAE (DPOAE) is widely used as an objective tool for detecting hearing impairment associated with outer hair cell (OHC) dysfunctions (3,4). To measure DPOAE, a pair of primary tones at frequencies f 1 < f 2 are delivered to an earphone inserted to the ear canal. With appropriately chosen intensities and frequencies of the primary tones [e.g., f 2 /f 1 = 1.22, (5)], the most prominent distortion product would occur at 2f 1 − f 2 and it can be recorded from a microphone in the ear canal. Because the primary tones' excitation patterns mainly overlap near the f 2 characteristic place in the cochlea (6), the sound-pressure level (SPL) of DPOAE at f DP = 2f 1 − f 2 represents the cochlea's ability to process signals normally at frequency f 2 . Thus, DPOAE serves as a robust and non-invasive tool for assessing cochlear functions in a frequencyspecific manner (4). It has been applied clinically for hearing screening (7,8), and diagnosis of acute hearing loss (9) and other kinds of hearing impairment (10,11).
Typically, a clinical DPOAE probe consists of two loudspeakers and one microphone; for each ear, the primary tones at f 1 and f 2 are separately delivered to the two speakers to avoid generating intermodulation distortion (IMD) electrically (12). As a rare exception, a single-speaker configuration was adopted for measuring vibration caused by DPOAE on insect tympanal organs (13); however, it was emphasized that one should avoid over-driving the speaker and thus producing IMD artifacts (14). In the field of cochlear neurophysiology, nonetheless, a combination of 5-7 tones with carefully arranged frequencies could be delivered simultaneously to a single speaker to elicit auditory-nerve responses (15); in their study, loudspeaker IMD was not a concern because the neural response by nature contains strong quadratic-distortion components which actually facilitate efficient estimation of the cochlear tuning curve at the auditory-nerve level.
Recently, the two-loudspeaker hardware design has been adopted by a commercial headphone that promotes at-home DPOAE measurement as a means of providing personalized frequency response adjustment (16). The two-speaker design seems necessary because, even with a high-quality headphone or earphone, the total harmonic distortion (THD) can reach 3% when driven to its full dynamic range (17). This THD level is acceptable for listening to music; however, when delivering two pure tones simultaneously, we found that the distortion generated by such speakers would significantly interfere with the DPOAE from the ear since the cubic distortion of the speaker also occurs at f DP .
Nevertheless, human DPOAE and loudspeaker IMD have different generation mechanisms even though they may occur at the same frequency. For example, the DPOAE signal is comprised of a direct component plus a reflective component (18,19); the direct component travels back from the f 2 characteristic place in the cochlea, while the reflective component travels further to the f DP place and changes direction due to coherent reflection (20). The two components thus have different latency in the range of 5-20 ms, which allows them to be separated via envelope-tracking techniques (21). Also, they may superpose constructively or destructively depending on their relative phase. In comparison, the loudspeaker IMD is perhaps elicited nearly instantaneously, so we expect that its latency and rate of growth with respect to the primary tone levels L 1 and L 2 might differ from that of DPOAE. In this research, we seek to exploit these differences and develop methods for estimation of DPOAE levels using a single speaker, despite of interference from loudspeaker IMD.
In particular, we propose stimulus design and signal processing strategies that handle the interference issues due to loudspeaker IMD. The first strategy is called compensation and it involves finding a combination of L 1 and L 2 such that the IMD level grows almost linearly with respect to simultaneous increment in (L 1 , L 2 ). The second strategy is called cancellation and it utilizes 3rd-order one-dimensional Volterra filter (22) to subtract the loudspeaker IMD from the signal. The organization of the remaining part of this paper follows the standard order of Methods, Results, Discussion, and Conclusion.

METHODS
In this section, we first review the mathematics of IMD generated by two tones. Typical spectrums of DPOAE and loudspeaker IMD will be shown so we can examine the similarities and differences. Then, the compensation strategy and the cancellation strategy will be described. This section ends with brief descriptions of the recording equipment and the human subjects who participated in the testing.

IMD and Mathematical Notations
Assuming that an acoustic or electrical stimulus, called the input signal, contains two frequency components f 1 and f 2 , so the signal can be expressed as follows, where A 1 , A 2 and φ 1 , φ 2 denote the amplitude and phase for two components, respectively. Assume that the stimulus is delivered to a nonlinear system G so that the response y(t) can be denoted as y(t) = G x(t) . When G is instantaneous, it can be expanded by Taylor's series near the origin; that is, By setting η = x(t) and through simple trigonometry, one can show that where B m,n are complex-valued coefficients, and m and n sum over all integers such that mf 1 +nf 2 > 0. The components B m,0 or B 0,n are referred to as harmonics; the additional components B m,n when m and n are both nonzero are called the intermodulation products and they can be classified by their order |m| + |n|. In the context of DPOAE measurement with a single loudspeaker, y(t) thus contains not only the primary frequencies f 1 and f 2 but also higher-order components at mf 1 + nf 2 . Empirically, the 3rd-order intermodulation products B 2,−1 and B −1,2 are most prominent from a loudspeaker (see Figure 1A), and a few other components such as B 3,−2 , B −2,3 can also be identified above the noise floor. In comparison, from the DPOAE spectrum (measured by the conventional two-speaker approach), the B −1,2 component corresponding to frequency 2f 2 − f 1 usually cannot be detected (see Figure 1B). The reason is because, even though the intermodulation product at 2f 2 − f 1 is generated due to OHC nonlinearity near the f 2 place, it is prohibited from backward propagation along the basilar membrane (23). By inspecting Figure 1, note that the loudspeaker components occurring at 2f 1 − f 2 has the same frequency as DPOAE. This 3rd-order component from the loudspeaker is referred to as IMD3 hereafter, and we shall investigate how to estimate DPOAE regardless of the presence of IMD3.

The Compensation Strategy
The DPOAE level, denoted as L DP , depends systematically on parameters (L 1 , L 2 , f 1 , f 2 ). The relation L DP = L DP (L 1 , L 2 , f 1 , f 2 ) was comprehensively measured from a cohort of 20 normalhearing human subjects (24) with a purpose to recommend the optimal choice of L 1 = L opt 1 that maximizes L DP given L 2 . When L 1 increases beyond L opt 1 , L DP starts to decrease due to two-tone suppression (25). The same phenomenon has also been reproduced in silico by simulation of cochlear mechanics (23). In this section, we report on how differently the IMD3 level depends on the parameters, and hence devise a way to suppress IMD3 by considering two sets of primary-tone level (L 1 , L 2 ) jointly.

Growth Function of IMD3
In contrast to cochlear mechanics, the loudspeaker nonlinearity does not demonstrate two-tone suppression; for instance, Figure 2 shows IMD3 level as a function of (L 1 , L 2 ) with f 1 = 1, 000 Hz and several different ratios f 2 /f 1 . The results were obtained by delivering the primary tones to one of the two loudspeakers of a DPOAE probe (see section 2.4 for details) and measuring the response inside a syringe of approximately 2.0 cc. For any fixed L 2 , as L 1 increases, we do not find a clear L opt 1 beyond which L IMD3 starts to decrease, and this is quite unlike what was observed in human subjects with the most commonly used frequency ratio f 2 /f 1 ≈ 1.2 (24, Figure 1).
The contour plot of IMD3 also differs from that of human DPOAE in the rate of growth with respect to L 1 and L 2 . In particular, when L 1 and L 2 increases proportionately as they move toward the top-right corner of the plot along the straight lines L 1 = L 2 + 10 or L 1 = L 2 , the rate of growth of L IMD3 with respect to L 2 seems to be close to 1.0 dB/dB for a wide range of L 2 and across different f 2 /f 1 ratio (see the "growth functions" in Figure 3). The slope of these growth functions are shown in Figure 4, and the path L 1 = L 2 happens to have the slope that is closest to 1.0 dB/dB across different primary-frequency ratios. In comparison, the average human DPOAE growth rate when L 1 = L 2 falls in the range of 0. Figure 1). By exploiting this difference between the growth function of loudspeaker IMD3 and human DPOAE, we present a method that suppresses the IMD3 level while partially retaining the DPOAE.

Signal Acquisition Protocol
To leverage the part of IMD3 growth function that is almost linear (i.e., 1 dB/dB slope), we can devise the following performance metric, where (P 1 , P 2 ) denotes primary-tone sound pressure in Pa, β > 1 denotes a scaling factor, and G IMD3 and G DP denote the sound pressure of IMD3 and DPOAE in Pa, respectively. Conceptually, the goal of the compensation strategy is to choose (P 1 , P 2 ) and β such that J is maximized. However, since the numerator (referred to as DPOAE residual) would vary among individuals, we seek to minimize the denominator in J, referred to as the IMD3 residual. Based on the results shown in Figure 4, we selected L 1 = L 2 (i.e., P 1 = P 2 ) for the remaining parts of this paper. As implied by Figure 3, choosing L 1 anywhere between 45 dB to 65 dB SPL should work well in reducing the IMD3 residual since G IMD3 (βP 1 , βP 2 ) ≈ βG IMD3 (P 1 , P 2 ). In contrast, we expect a larger proportion of DPOAE would remain in the DPOAE residual because the rate of growth against (P 1 , P 2 ) is sub-linear (i.e., < 1 dB/dB). Based on the above-mentioned concept, we propose the following signal acquisition protocol.
• Step 1: Calibrate the stimulus levels A 1 and A 2 in Equation (1) such that P 1 = P 2 in the ear canal. • Step 2: Transform the recorded signal to the frequency domain, and calculate the magnitude at 2f 1 − f 2 , which is the vector sum of IMD3 and DPOAE. Denote the result as Y(P 1 , P 2 ).
To summarize, the goal of the signal acquisition protocol is to keep a large portion of DPOAE in the residual while maximally suppressing IMD3 at the same time.

The Cancellation Strategy
To describe the cancellation strategy, since digital adaptive filtering techniques are involved, we change the time variable from t to the integer index n (not to be confused with the index n in Equation 3). We follow a standard digital signal processing notation in defining y[n] = y(nT) (26), where y(t) is a continuous-time signal, T denotes the sampling period, and y[n] denotes the result after sampling in time.
The idea behind this strategy is to cancel IMD3 instantaneously. To achieve this goal, we utilize two techniques, namely a phase-controlled exponential swept-sine chirp and the one-dimensional Volterra filters (ODVFs), to adjust the input signal before sending it to the speaker. The workflow is shown in Figure 5 where H(ω) denotes a frequency response measured by the phase-controlled exponential swept-sine chirp (to be described in section 2.3.1), and the normalization factor ensures that the input to ODVF is limited to the range [−1, 1]. The input signal which contains two pure tones for DPOAE measurement is first transformed from time domain to frequency domain through DFT, then transformed back to time domain after multiplying with the frequency response of the single speaker H(ω). The filter coefficients of ODVF are meant to be obtained offline by adaptive LMS algorithm (to be described in section 2.3.2); when applied online, the ODVF filter coefficients are fixed. Subsequently, the multiplication by H −1 (ω) is to compensate the gain and phase change due to multiplication by H(ω). This workflow produces an adjusted input signal to be delivered to the single speaker for the purpose of measuring DPOAE.
The rationale of this design is to artificially generate intermodulation distortions by the ODVF filter but with an inverted polarity, so they can cancel the real intermodulation distortion generated by the loudspeaker. We shall see in section 2.3.2 that the ODVF is trained off-line to emulate the situation of using two separate loudspeakers.

Linear System Estimation by Phase Controlled Exponential Swept-Sine Chirp
A phase-controlled exponential swept-sine chirp (27) is used to obtain the linear coupling response from the loudspeaker to the microphone. This chirp exhibits an instantaneous frequency that increases exponentially with time as follows, where A is the amplitude of the sine wave, Q is an integer number of octaves, L is the ideal chirp length, and N is the real chirp length which is L rounded to the nearest integer. We also need an inverse chirp to convolve with, which can be expressed as The result of convolving s[n] and s −1 [n] is approximately a Dirac delta impulse, and the gain is 0 dB for all frequencies.
We set the chirp's frequency to glide from 47 to 24, 000 Hz and the chirp length was 10.5 s. Then, we delivered the chirp to

IMD3 Cancellation by One-Dimensional Volterra Filtering
Since the intermodulation is due to loudspeaker nonlinearity, the behavior of its inverse system can ideally be characterized by Volterra series expansion (28). However, the full-scale Volterra series expansion requires estimate of multi-variate kernel functions which may be computationally impractical to implement and its estimation might be slow in convergence. In this research, we adopted a simplified version called onedimensional Volterra filters (ODVF) (22) -assume that the inverse system can be modeled as follows, where x[n] and y[n] denote the input and output of the inverse system, respectively, h p [i] denotes the pth-order kernel of ODVF, and M p is the length of the pth-order kernel. Since this research focuses on canceling a cubic distortion, a partial ODVF retaining only the 1st-order and 3rd-order kernels was used; that is, The filter coefficients could be obtained by the adaptive least mean square (LMS) method (29). The LMS method involves updating the filter coefficients constantly as the time index n proceeds. First, denote the filter coefficients in the following vector form, and similarly, Note that the subscript n indicates that both vectors are updated as n increases. Then, an approximation error signal can be defined as follows, where d[n] is a desired signal to be defined shortly, and vectors x[n] and x 3 [n] are defined as follows, Finally, the update equations are given as follows, and similarly, where α = 6 × 10 −3 denotes a stepsize that was chosen empirically. Figure 6 shows how the ODVF coefficients were obtained in this research. The stimulus s[n] contained two primary tones, and we set d[n], the desired signal, to be the signal recorded by sending the two tones to separate speakers of a reliable reference probe, so d[n] was free of IMD. Meanwhile, the input x[n] to the ODVF was the signal recorded by using one single speaker while y[n] denotes the output of the ODVF. Here, we emphasize that the adaptive procedure was performed offline just for one time, and it was not necessary to repeat the procedure when measuring DPOAE from individual ears. In practice, we first recorded x[n] and d[n] separately in the same 2-cc syringe with the same stimulus s[n]. Then, the filter coefficients h 1,0 were initialized at 1 and h 3,0 were initialized at 0, and we computed the updates iteratively according to Equations (9)(10)(11). After the filter coefficients converged, we could expect that the variance of e[n] would be minimized and y[n] should approximate d[n], which is a cubic distortion-free signal, in a stochastic sense.
In practice, we found that the 1st-order kernel h 1 [n] tended to converge to a band-pass filter around the primarytone frequencies which simply rejected all the intermodulation components linearly. This caused the "training" of h 3,n to fail. Therefore, we set the 1st-order kernel length M 1 to 1 to ensure that h 3 [n] learns to cancel the cubic distortion.

Equipment
All recordings were collected using a Python script that controls the ER-10C DPOAE probe-microphone (Etymotic Research) system via a 24-bit soundcard (Fireface UFX II, RME). The sampling frequency was set to 48 kHz. All recordings were done in a sound-proof room with the noise floor of approximately 19-21 dB SPL (30).

Human Subjects
Twelve subjects between age 22 and 32 participated in the research, including 8 males and 4 females. The compensation strategy was tested on the data from 5 of the subjects, while the cancellation strategy was tested on the data from 7 subjects. All the subjects did not have ear infection or report any hearing problems at the time of experiment. The recruitment of human subjects was approved by the IRB of National Tsing Hua University (No. 10912HE101).

RESULTS
Here we report the efficacy of applying the compensation and cancellation strategies.
Here, K is just the IMD3 to IMD3 residual ratio in dB scale. The values K for different combinations of β and L 1 are listed in Table 1. Note that K is quite insensitive to change in L 1 , while β = 1.5 gives the highest K. Therefore, β = 1.5 was chosen for testing in the ear. Also, L 1 = L 2 = 60 dB SPL was selected in order to maximize DPOAE and its residual.
DPOAEs residuals were recorded from five subjects for their left and right ears using the compensation strategy. Two protocols were considered. The first one is called the "stereo" protocol which uses separate speakers to obtain the ground truth of DPOAE and the residual thereof after applying the compensation strategy. The second is called "mono" protocol and it uses a single speaker mentioned in section 2.2.2 to obtain the DPOAE residual subject to the IMD3 interference. The residual obtained with the stereo protocol should be regarded as a performance upper-bound for the compensation strategy since the IMD3 component is negligible when both speakers are used. We then study the correlation between the residual level obtained by both protocols and the true DPOAE level to evaluate the effectiveness of the compensation strategy.
The results of mono vs. stereo protocols are plotted in Figure 7. The x-axis is the true DPOAE level obtained with two speakers, and the y-axis represents the DPOAE residual levels obtained with the two protocols. The dashed lines show the results of linear regression. For the stereo protocol, the regression line is y = 0.77x − 12.46, and for the mono protocol, the regression line is y = 0.38x − 7.76. By using the stereo protocol, the correlation between DPOAE residual and true DPOAE level, both in dB SPL, equals to 0.86 with a high significance (p < 0.001). However, the DPOAE residual levels were 15 dB lower than the real DPOAE levels in average. This indicates that the DPOAE was suppressed by about 80% after applying the compensation strategy. Note that, nevertheless, the IMD3 component was suppressed by 22 dB under the same settings (see Table 1). Thus, we can say that the compensation strategy improved the DPOAE to IMD3 ratio by 7 dB in average. By using the mono protocol, the correlation between the residual and the true DPOAE level is lowered to 0.62. Nevertheless, with a p < 0.01, the correlation is deemed significant for this particular set of data, in the sense that the null hypothesis (no correlation) is rejected.

The Cancellation Strategy
We obtained different sets of ODVF coefficients at f 1 = 1,000, 1,200, 1,500, and 2,000 Hz, respectively, while f 2 /f 1 = 1.2 was fixed. The amplitude of both tones was set to 20 mPa (60 dBSPL), and the recordings ran for 10.5 s with 2.5 ms raised cosine ramp for the rising and falling edges of the stimulation. Empirically, the recording time was sufficiently long to ensure convergence of the filter coefficients. Figure 8 shows the resulting 3rd-order filter coefficients h 3 [n] for f 1 = 1,000 and 2,000 Hz, respectively. These h 3 [n] coefficients were subsequently used for validating the proposed cancellation strategy depicted in Figure 5.
Following the workflow described in Figure 5, we first tested the adjusted input signal in the 2-cc syringe. The results are shown in Figure 9. Figure 9A is the signal recorded by using a single speaker to play two primary tones. Therefore, the spectrum contains intermodulation distortion. Figure 9B is the signal recorded by using separated speakers and it therefore does not contain intermodulation distortion, and Figure 9C is the signal by using the adjusted input signal produced by the workflow of Figure 5. The black dots represent the magnitude of primary tones f 1 and f 2 , the red dot represents the magnitude of the IMD3. All the three recordings ran for 2.5 s with 2.5 ms raised cosine ramp for the rising as well as falling edges of the stimulation. The result shows that IMD3 is largely reduced to submerge below the noise floor. Note that the 2f 2 − f 1 component at 1400 Hz is also suppressed, though not as perfectly as at 2f 1 − f 2 . The fifth-order distortions at 600 and 1600 Hz remain unchanged.
Then we applied the same recording procedure in human ears. Figures 10, 11 show some typical results. Figure 10A contains DPOAE interfered with the original IMD3, Figure 10B shows the ideal signal recorded by using separate speakers for the primary tones, and the results are regarded as the true DPOAE signal to compare against. Figure 10C is the signal recorded by using the adjusted input signal described in Figure 5; the signal contains DPOAE interfered with the remaining IMD3.
We also extended the experiments to f 1 = 1, 200, 1, 500, and 2, 000 Hz while f 2 /f 1 = 1.2. The results without (Figures 12A,C) and with (Figures 12B,D) the IMD3 cancellation strategy are shown in Figure 12. In Figures 12A,B, the horizontal axis is the true DPOAE level, and the vertical axis is the magnitude at f DP . In Figures 12C,D, the horizontal axis is the true DPOAE phase, and the vertical axis is the phase measured at f DP . The linear regression lines are also shown for visualization.
Without applying IMD3 cancellation strategy, the regression lines of the magnitude and the phase are y = 0.12x + 19.87 and y = 1.03x − 0.25, respectively, and the root mean square errors (RMSE) are 15.52 dB and 1.50 rad, respectively. After applying IMD3 cancellation strategy, the regression lines of the magnitude and the phase become y = 0.76x + 2.64 and y = 1.02x − 0.06, respectively, and the RMSE are reduced to 3.88 dB and 0.76 rad, respectively.
After applying the IMD3 cancellation strategy, we calculated the correlation between the estimated and the true DPOAE sound pressure levels, and the correlation between the estimated and the true DPOAE phase, respectively. Subsequently, we ran the Wald test with t-distribution to evaluate the significance of correlation. The correlation between the estimated and true DPOAE levels equals to 0.81 with a high significance (p < 0.001); also, the correlation between the estimated and the true DPOAE phase equals to 0.93 with a high significance (p < 0.001).
In contrast, without applying the IMD3 cancellation strategy, the correlation between the measured DPOAE and the true DPOAE sound pressure levels equals to 0.25 with a low  significance (p = 0.057). Although the phase estimation error is higher without applying the cancellation strategy, the crosscorrelation between the measured and the true DPOAE phase was still high (0.80) and significant (p < 0.001).
The number of data points turns out to be 55. We recruited 7 subjects, test both ears at 4 frequencies, resulting in 56 DPOAE magnitudes and phases. However, the DPOAE level of one of the ears was below the noise floor when f 1 = 1, 000 Hz. So that single point of data was abandoned.

DISCUSSION
As true wireless, noise-cancellation earphones are gaining popularity in recent years, the ear canal also becomes an over-booked space for various body sensors to enter and make the earphone intelligent (31,32). Since active noise-cancellation earphones are indeed equipped an internal microphone 1 , there is no reason why the microphone cannot measure DPOAE. The main factor that hinders such application might be the interference due to loudspeaker IMD. As much as we are aware of, consumer earphones are not typically designed to have two speakers in one ear 2 , so sending the primary tones to separate speakers would not be a choice. In this research, we demonstrated 1 For example AirPods Pro and AirPods Max Active Noise Cancellation and Transparency mode, https://support.apple.com/en-us/HT210643. 2 For example "Qualcomm QCC3056 is an ultra-low power, single-chip solution, optimized for use in truly wireless earbuds and hearables." The chip features mono audio playback. https://www.qualcomm.com/products/qcc3056.  that "wrongly" using one single loudspeaker to play the f 1 and f 2 tones may still work as long as we can cancel the IMD3 it generates. Hence, we hope that this research can serve as a feasibility study for the earphone industry to promote DPOAE as a service to consumers of active noise cancellation earphones. We envision that making DPOAE available at home could also enrich any remote hearing care program in the future.
Among the two proposed strategies, cancellation outperforms compensation in producing a more accurate prediction of the true DPOAE level. On one hand, the cancellation strategy achieves a higher cross-correlation to the true DPOAE level (0.81) than the compensation strategy (0.62); one the other hand, it also provides a direct estimate of the DPOAE magnitude and phase, instead of just a residual. It remains to be seen in the future whether similar results would be obtained with a larger sample size.
The usage of Volterra filters also brings up many research questions. For instance, Figure 8 shows that the 3rd-order function h 3 [n] are different for different choices of (f 1 , f 2 ). Presently, we are uncertain whether (a) this is a limitation due to omitting all the off-diagonal elements of the Volterra filter to make it one-dimensional, or (b) is it possible to apply certain transformation akin to pre-whitening so the adaptive system eventually "learns" a universal ODVF for canceling all the cubic distortions given any input signal. Apparently, there is still much room to explore on this topic.
Meanwhile, as much as the compensation strategy is concerned, it is surprising that we found a large region on the (L 1 , L 2 ) plane where the IMD3 level grows quasi-linearly when L 1 and L 2 increase proportionately. In practice, it may be interesting to see if a similar property can be observed in other DPOAE probes or consumer earphones. We speculate that the quasilinear growth is an epiphenomenon because cubic distortion is by nature a third-order component. Under the light of Taylor expansion in Equation (2), we can expect IMD3 to demonstrate a 3 dB/dB growth when the stimulus is at low intensity-and so does DPOAE (23). As the intensity of η increases, higher order terms G (k) (0)η k /k! begin affecting the growth function. In particular, all the odd-order terms should jointly reduce the slope of growth of B 2,−1 and account for its saturation (it is straightforward to show that the even-order terms do not contribute to B 2,−1 ). So the fact that we observe nearly 1 dB/dB growth at the working range of (L 1 , L 2 ) may just be a coincidence.
Some other techniques might be worth consideration for estimating the DPOAE level under IMD3 interference. As mentioned in section 1, DPOAE itself has two componentsthe direct one and the reflection. Based on the difference in latency, Vetesník et al. (21) designed short pips to elicit DPOAE and applied envelope tracking techniques to separate the two components. If the loudspeaker distortion is generated within a shorter time before DPOAE emerges, one might be able to identify an early peak by tracking the instantaneous amplitude at the 2f 1 − f 2 frequency. Thus, the loudspeaker IMD3 and DPOAE might be separated in the time domain. This may require careful re-thinking of the stimulus design and is left for future exploration.

CONCLUSIONS
We proposed two strategies to estimate the DPOAE level subject to interference from the loudspeaker IMD3. The compensation strategy was designed to suppress IMD3 based on its quasilinear growth with respect to primary-tone levels, in contrast to DPOAE's sub-linear growth. Results show that, although the DPOAE level was also suppressed by 80%, the residual level correlates to the true DPOAE level. The cancellation strategy adjusted the input signal nonlinearly to emulate distortionfree stimulation. It thus recovered both the DPOAE magnitude and phase directly. Overall, this research suggests that it might be feasible to use a single-loudspeaker probe to measure DPOAE. Testing with a larger sample of human subjects as well as various types of earphones shall ensue to evaluate whether commercial noise-cancellation earphones could be utilized to allow sufficiently accurate DPOAE measurement at home.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Research Ethic Committee, National Tsing Hua University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YCC discovered the almost linear growth in IMD3, and came up with the compensation strategy. WCH tested the compensation strategy, and conceived the cancellation strategy using ODVF and then implemented and tested the cancellation strategy. YWL supervised the research and is responsible for the manuscript preparation process. All authors contributed to the article and approved the submitted version.