ORIGINAL RESEARCH article

Front. Signal Process., 10 April 2024

Sec. Audio and Acoustic Signal Processing

Volume 4 - 2024 | https://doi.org/10.3389/frsip.2024.1304069

Doppler frequency analysis for sound-field sampling with moving microphones

  • 1. Institute for Signal Processing, University of Lübeck, Lübeck, Germany

  • 2. German Research Center for Artificial Intelligence (DFKI), Lübeck, Germany

Abstract

Moving microphones allow for the fast acquisition of sound-field data that encode acoustic impulse responses in time-invariant environments. Corresponding decoding algorithms use the knowledge of instantaneous microphone positions for relating the dynamic samples to a positional context and solving the involved spatio-temporal channel estimation problem subject to the particular parameterization model. Usually, the resulting parameter estimates are supposed to remain widely unaffected by the Doppler effect despite the continuously moving sensor. However, this assumption raises issues from the physical point of view. So far, mathematical investigations into the actual meaning of the Doppler effect for such dynamic sampling procedures have been barely provided. Therefore, in this paper, we propose a new generic concept for the dynamic sampling model, introducing a channel representation that is explicitly based on the instantaneous Doppler shift according to the microphone trajectory. Within this model, it can be clearly seen that exact trajectory tracking implies exact Doppler-shift rendering and, thus, enables unbiased parameter recovery. Further, we investigate the impact of non-perfect trajectory data and the resulting Doppler-shift mismatches. Also, we derive a general analysis scheme that decomposes the microphone signal along with the encoded parameters into particular subbands of Doppler-shifted frequency components. Finally, for periodic excitation, we exactly characterize the Doppler-shift influences in the sampled signal by convolution operations in the frequency domain with trajectory-dependent filters.

1 Introduction

Dynamic measurement procedures are capable of acquiring entire sound-field information by use of only one non-stop moving microphone. Due to their low effort in hardware and calibration compared to stationary approaches, such continuous techniques are ideally suited to gather fast estimates of acoustic impulse responses (AIRs) within expanded listening areas. Prominent examples of AIRs are head-related impulse responses (HRIRs), room impulse responses (RIRs), and binaural room impulse responses (BRIRs). Adequate estimates of AIR fields are essential for various applications related to multichannel equalization and cross-talk cancellation, sound-field analysis, auralization, and audio reproduction, e.g., for virtual and augmented reality systems ().

An analytical method for the reconstruction of AIRs along linear and circular microphone trajectories has been presented in (). Here, a specially designed input signal is needed, and the speed of the microphone must be constant and is restricted to an upper limit. Quite different from usual AIR measurements, the excitation signal must not contain all audio frequencies, but only a certain subset in order to avoid overlapping frequency shifts. The omitted frequencies are essentially generated through the Doppler effect. Beyond that method, dynamic approaches are typically based on estimates from linear equations that model the convolution of involved AIRs and excitation sequences, and relate them to the samples acquired at instantaneously varying microphone positions. In one group of such methods, the spatio-temporal dependencies are simplified and impulse responses at steadily changing positions are considered as time-varying systems whose coefficients are tracked by adaptive filtering concepts (). This setup is well-known from acoustic echo cancellation (Sondhi, 1967; ), and, combined with controlled surroundings and trajectories, it is also suitable for the fast acquisition of HRIRs (; ).

More recently, a new group of dynamic techniques have emerged, where the continuously acquired sound-field samples are directly embedded into a spatio-temporal context (; , 2018, ; ; ; ; Urbanietz and Enzner, 2020). Here, the measurement model is explicitly multidimensional, with a moving microphone that collects uniform samples in the time domain and, in general, non-uniform samples at varying points in the spatial domain. Thus, such techniques inherently require positional information based either on a controlled pre-defined trajectory or a tracking of the microphone positions (). The method by Urbanietz and Enzner (2020) uses a spatial Fourier basis for the angular reconstruction of HRIRs from continuous-azimuth recordings and shows more accurate performance than an adaptive-filter-based solution. In (), perfect-sequence excitation (Stan et al., 2002) is used for the orthogonal expansion of impulse responses, in order to describe the dynamic spatio-temporal sampling by notional static sampling processes of single expansion coefficients. This method simplifies the problem to pure interpolation in space. It has been investigated for reconstructing RIRs (; ) and BRIRs () along circular trajectories. In (), a versatile framework has been presented that allows for RIR reconstruction at off-trajectory positions within cubical volumes. To achieve this, the sound field is parameterized by modeling virtual grid points in space, and dynamic samples are understood as the result of bandlimited interpolation on that grid using sampled sinc-function approximations. In practice, the corresponding inverse problem is most likely ill-posed or even underdetermined. Therefore, in (Katzberg et al., 2018; Katzberg and Mertins, 2022), a strategy has been proposed that exploits sparse Fourier representations and applies principles of compressed sensing (CS) (). Recently, the dynamic framework subject to sinc-function related parameters has been generalized to a formulation, where arbitrary spatial basis functions cover expanded target volumes (). Based on this, a physical perspective has been provided for representing dynamic sound-field samples in terms of spherical solutions to the acoustic wave equation ().

For the measurement methods where a non-stop moving microphone collects sound-field data to AIRs, the Doppler effect will always be present in the sampled signal. The dynamic approaches considering the spatio-temporal context rely on the assumption that the occurring Doppler shifts are at least partly covered by the sampling model, as long as the microphone signal at the particular receiving times is connected to the instant positions in space. However, for all the existing frameworks, the implication between positional tracking and Doppler-shift rendering is not directly apparent from the mathematical model. Therefore, in this paper, we introduce a new interpretation of the dynamic sampling procedure from the Doppler perspective.

First, in Section 2, we briefly outline the basic differences in the mathematical models between stationary and dynamic sound-field measurement strategies. Then, in Section 3, we develop an equivalent Doppler-domain concept from which one can easily see that exact tracking of the microphone trajectory inherently allows for the exact tracking of the involved and unknown Doppler shifts in the frequency domain. The proposed representation explicitly describes the instantaneous interferences of the Doppler effect during the dynamic observation process. This can be used to estimate the underlying sound-field parameters without any Doppler bias, provided that the acoustic properties of the surroundings remain constant during the measurement sessions. In Section 4, we analyze the case of non-perfect positional tracking and describe the resulting Doppler-shift mismatches in the mathematical model. We derive structured expressions for the corresponding error terms and adapt stability guarantees and error bounds for both least-squares and CS-based solutions. In Section 5, the Doppler-based channel representation is exploited to provide a spatio-temporal filtering scheme that can be employed for decomposing the Doppler-shifted microphone signal into particular subbands. This allows us to reconstruct low-frequency content in cases where the wideband recovery problem is ill-conditioned. Considering periodic excitation, Section 6 shows how the proposed concept enables us to directly relate the Doppler shifts in the measured signal to trajectory-dependent filters that spread spectral sound-field characteristics across adjoining frequency bins. Finally, in Section 7, we demonstrate the key points of the Doppler framework on experimental data.

2 Dynamic sound-field sampling procedures

Assuming a fixed environment with several reflecting surfaces and constant atmospheric conditions, the propagation of the sound signal s(t) inside the target area can be considered as a linear time-invariant (LTI) system. This leads to the observation modelwhere s(t) originates at a fixed source position xS∉Ω subject to the global time , p(x, t) is the observed sound-pressure field depending on the receiver position x = [x,y,z]T ∈ Ω in Cartesian coordinates, and h(x, τ) is the spatio-temporal AIR subject to the delay time τ. The AIR describes the particular LTI system, i.e., the sound field for a Dirac impulse excitation at τ = 0.

By coupling the spatial dimensions to the time dimension, the dynamic sound field along the trajectory , defined by the time-dependent motion function , is observed aswhere represents the spatio-temporal AIR within the trajectory space . In the following subsections, we briefly compare the general concepts of stationary and dynamic sound-field sampling strategies.

2.1 Stationary sampling schemes

Measuring the sound-field signal (1) leads to stationary sampling schemes, where R microphones provide temporal samples with high acquisition rates fs at uniform points tn = n/fs, and, in general, spatial samples at N non-uniform positions xr (r ∈ {1, , N}) inside the volume of interest Ω. Strategies for the sampling and reconstruction in space are generally based on an appropriate parametrization of the continuous-space and discrete-time AIR within bounds of Ω,where f(c, x) are sampled basis functions for the interpolation inside Ω according to a spatialization model (e.g., sampled plane waves), (l, m) are basis functions for the sampled delay time dimension m (e.g., discrete Fourier basis), and γ(c, l) are the corresponding sound-field parameters, i.e., coefficients describing the AIR field h(x, m). For h(xr, m) vanishing into the noise level at mL, the parameters are encoded in spatio-temporal samples of (1) subject towith ηr(n) comprising the measurement noise and parameterization errors. Parameter decoding may be achieved by solving the inverse problem given a controlled excitation sequence s(n) and calibrated sampling positions xr. Adequate estimates of γ(c, l) from the corresponding system of linear equations may be used for the sound-field reconstruction according to (3). The reconstruction along the delay time m is simple due to high temporal sampling rates fs ≥ 2fcut of the microphones, effortlessly achievable for practical cutoff frequencies fcut. Note that, of course, stationary microphones allow for decoupling the multidimensional sampling problem (4). So, in practice, estimates of h(xr, m) are calculated first by solving the deconvolution problem for each microphone position separately, and then the parameters γ(c, l) are recovered from the remaining linear interpolation equations of the form .

2.2 Dynamic sampling schemes

For the sampling of the dynamic sound pressure (2), measurement procedures with moving microphones can be used that generate samples at uniform points in time and, generally, at continuously varying and non-uniform positions in space. As temporal sampling implies spatial sampling in such setups, only one microphone measuring along an appropriate trajectory is sufficient for gathering the entire sound-field information. At this, the speed of the microphone is arbitrary and may vary, pausing is not required.

By employing the identical parameterization model as in (3), the spatio-temporal AIR sampled (non-uniformly) along the trajectory readsAccordingly, the particular sound-field parameters γ(c, l) are encoded in samples of (2) subject towhere encapsulates the errors due to sampling and finite-dimensional parameterization. Note, because , the sampling model (6) may use the very same parameters and basis functions as the static model (4). However, in (6), the sampling of the spatial component is coupled to the global time index n subject to , since the involved AIR is immediately varying during the dynamic measurement process. The controlling of s(n) and tracking of allow for estimates of γ(c, l) from the linear equations provided by (6) at time points and for the spatial reconstruction inside Ω according to (3). For high-speed movements, the noise level can be increased by air-flow perturbations in practice.

In fact, it turns out that all spatio-temporal sampling procedures with moving microphones as outlined in Section 1 are based on the sampling model (6), generally involving non-uniformly sampled basis functions for the spatial dimensions and uniformly sampled basis functions (l, m) for the temporal delay. Various dynamic strategies for recovering AIRs only differ in the specific choice of basis functions, i.e., the parameter representation of the target sound field inside Ω. In any case, spatial components are evaluated subject to the time-dependent microphone trajectory performed in an actually D-dimensional subspace .

In (Urbanietz and Enzner, 2020), for example, the sound-field parameters γ(c, l) are HRIRs themselves, spatially expanded by circular harmonics of maximum order Q. This is equivalent to the choices f(c, ϕ(n)) = ej(cQ)ϕ(n) and (l, m) = δ(ml), where δ is the unit impulse function and is an azimuthal trajectory, i.e., the number of considered spatial dimensions is D = 1. By contrast, the method by uses f(c, ϕ(n)) = φ(c, ϕ(n)) and (l, m) = s(lm), with φ being Lagrange basis polynomials and s being an L-shift cross-orthogonal excitation sequence that fulfills and, thus, simplifies (6) significantly. While these approaches confine the AIR target area to the one-dimensional (circular) trajectory curve itself , the methods in (; Katzberg et al., 2018; ) scale up to D = 3 using three-dimensional basis functions that cover entire listening volumes. This allows for modeling the dynamic sampling along arbitrarily shaped microphone trajectories.

In this paper, we introduce a mathematical framework exploiting a representation by (l, m) = ej2πlm/L, in order to investigate frequency shifts in the sampled signal due to the Doppler effect. The following Doppler-shift descriptions cover dynamic acquisition techniques for several AIR types that assume constant environmental conditions, i.e., AIR variations are solely due to positional variations of the microphone.

3 Doppler-shift model for dynamic channels

In dynamic approaches, the sound signal sampled by the moving microphone is always affected by the Doppler effect. This raises the question about the impact of the involved Doppler shifts on the estimates of the sound-field parameters. Numerical and real-word experiments to the dynamic approaches outlined in Section 1 indicate that the positional tracking of the microphone allows for the Doppler-shift tracking in the dynamic signal, and, in turn, for uncorrupted parameter estimates. Thus, the key of these dynamic techniques lies in the tracking or controlling of the receiver trajectory, e.g., by using a robot-guided microphone, or by performing a circular trajectory at constant speed and measuring the round-trip time of the moving microphone. However, from the mathematical perspective, such implicit Doppler-shift rendering by use of positional data may not be straightforwardly comprehensible from a general spatio-temporal signal description in line with (6). In order to close this conceptual gap, we propose a more intrinsic channel interpretation for the dynamic sampling model.

3.1 General model for Doppler-shifted sound observations

For a general analytical description of interfering Doppler shifts, we use a doubly-dispersive signal model and adopt the concept of Doppler-variant impulse responses and transfer functions (). Such Doppler-domain representations are well-known from time-varying channels in wireless systems (; ; ), e.g., for underwater acoustic communications (Li and Preisig, 2007; Zeng and Xu, 2012).

For the moment, let us disregard the spatial context in (2). Accordingly, the dynamic time signal may be described bywith h(t, τ) being the time-variant impulse response. The Fourier transforms of h(t, τ) along the particular dimensions yield, respectively, the Doppler-variant impulse responsewhich contains the variable ν denoting the Doppler shift, the time-variant transfer functionwith angular frequency , and the Doppler-variant transfer function

The time-variant impulse response in (7) may be interpreted as the time-varying contribution from a continuum of scattering paths. By using the inverse Fourier transform of , the interfering Doppler shifts in (7) are revealed asIn (11), we can see that the signal arriving at the moving receiver is composed of a weighted superposition of time-delayed and frequency-shifted replicas of the source sequence s(t). The weighting defines the contribution to from a scattering path with delay τ. The channel is called doubly-dispersive as it involves both delay and frequency spreads due to both multipath signal propagation and time-varying behavior. By using the inverse Fourier transform of , (11) can also be expressed aswith representing the time-dependent Fourier transform of the source signal s(t). For obvious reasons, and are also referred to as delay-Doppler spreading function and frequency-Doppler spreading function, respectively. The Fourier relationships between the four fundamental functions h, , , and are summarized in Figure 1. Each of such representations describes any linear time-variant system completely ().

FIGURE 1

3.2 Spatial parameterization of the time-variant transfer function

In (12), the term describes the time-varying channel as the interference of instantaneous Doppler shifts, where parameterizes the frequency-dependent Doppler characteristics. The total effect of these characteristics could be considered unconnected, and, in general, estimated by adaptive filtering techniques that track the time-varying transfer function . However, in contrast to, e.g., underwater acoustic channels where significant dynamics in the surrounding medium and scatterers are present, we assume a time-invariant configuration of scattering paths inside the target volume Ω. Accordingly, is fully defined by the receiver trajectory within the LTI acoustic field H(x, ω). The acoustic field can be described by the general modelwhere A(ρ, ω) is a continuum of sound-field parameters subject to the variable ρ that constitutes a continuum of possible spatial variations for x ∈ Ω due to multipath arrivals, and f(ρ, x) are appropriate spatial basis functions. The reverberant field H(x, ω) represents solutions to the Helmholtz equation for a Dirac impulse excitation at τ = 0. In the following descriptions, H(x, ω) is considered as the spatial transfer function from the static sound source to the receiver position x.

Doppler shifts can be represented in terms of the (unknown) acoustic field (13) with fixed initial and boundary conditions. By introducing the spatial context subject to H(x, ω), we rewrite the dynamic signal (12) aswhere is the Doppler-shifted observation of sound-field characteristics by use of the trajectory . Hence, we define as the sound field in Doppler domain for the underlying receiver trajectory . In this paper, the term Doppler domain is used to describe the sound pressure that lives along and, therefore, is subject to the Doppler shifts along . For the physical conditions regarded (e.g., constant temperature, fixed source and scatterer positions), the dynamically shaped sound-field observation along is equivalent to the time-variant transfer function,As a result, is brought into a direct spatio-temporal relationship with the LTI acoustic field by use of positional information in terms of trajectory data and by the corresponding evaluation of particular basis functions in space. The Doppler effect is completely modeled by , the sound-field parameters A(ρ, ω) remain unaffected. Finally, the time-varying AIR along the receiver trajectory is determined by the spatio-temporal connection

3.3 Examples of Doppler-aware spatialization concepts

Using plane-wave basis functions is probably the most intuitive way to represent Doppler-based sound-field observations and to calculate frequency shifts due to a moving receiver. In existing, physics-heavy literature, complex acoustics in dynamic and fluctuating environments are typically described in terms of plane-wave expressions (Morse and Ingard, 1986; Ostashev and Wilson, 2019). Corresponding simplified examples are given in Section 3.3.1 for special steady-state scenarios. Then, two specific implementations of the spatio-temporal transfer function are provided in Section 3.3.2; Section 3.3.3. These examples are based on our previous works (Katzberg et al., 2018; ; Katzberg and Mertins, 2022).

3.3.1 Plane waves

First, let us consider the Doppler effect for the simplest scenario with an ideal free-field environment having no reflective surfaces and a linearly moving microphone. For a stationary source that emits sound of the single angular frequency ω and a receiver that moves at constant speed v0 and constant angle ψ in relation to the direction of the wave front, the shifted frequency sensed over time is given by the well-known mappingwith the constant scaling factor ζ = v0 cos(ψ)/c0 and the sound velocity c0. For the special cases ψ = 0, ψ = π, and ψ = ± π/2, the receiver is moving along, opposed, and orthogonal to the wave propagation, respectively. In (17), the sensed frequency is composed of the originally emitted frequency ω and the shifting term −ζω. The shifting term models the variation in the spatial phase induced by the receiver motion, and its impact on the temporal frequency of the received time signal. This can be shown by using the spectral relationship (Pierce, 2019) for propagative sound waves (in the far field) having angular wave number , and rewriting (17) according towhere is the velocity vector of the receiver motion and is the wave vector comprising spatial frequencies along the normalized propagation direction ns, i.e., . The knowledge of both the linear receiver dynamics coded in v and the direction of wave propagation coded in k allows for the exact identification of spatio-spectral activity, and, in turn, for the reverse mapping from the observation to the initial temporal frequency ω.

For non-linear receiver trajectories, the Doppler effect can be described in terms of a time-varying velocity vector v(t) (Ostashev and Wilson, 2019). The sensed, so-called instantaneous frequency is given bywhere is the phase function of the observed wave form. Accordingly, the instantaneous phase of a received plane-wave signal can be described aswith the constants and resulting from the boundary conditions of the indefinite integral, and modeling the relative positions of the moving receiver based on its initial state . The instantaneous frequency subject to the relative trajectory isBy comparing (19) and (20) with (18), it is straightforward to see that phase modulations in terms of relative receiver locations induce frequency modulations according to the Doppler effect. Moreover, by choosing appropriate constants τb and db, the observed Doppler-shifted frequencies can be represented subject to the particular sound-field characteristics, i.e., solutions to the acoustic wave equation for the involved initial and boundary conditions. Convenient constants are τb = 0 and db = x0, with x0 being the initial position of the receiver subject to the global coordinate system. By knowing the absolute trajectory of a moving microphone in relation to the reference point x0 ∈ Ω within the acoustic field of interest, the Doppler-shifted microphone signal can be embedded into the spatio-temporal context of the (unknown) LTI environment. This is the underlying idea of the dynamic sampling model and the according sound-field estimation problem.

Based on (19), (20), dynamic free-field observations of sound from a stationary broadband source can be easily expressed as modulated plane waves. For a simplified scenario that regards the global steady-state sound field Hst(x, ω) = S(ω)H(x, ω) (and allows for dropping the delay dimension τ), a moving receiver observes the superposing, trajectory-dependent wave formswhere ejωt is the fundamental temporal solution to the homogeneous wave equation, describes the Doppler effect in terms of spatial solutions to the corresponding Helmholtz equation, and is a continuum of parameters characterizing the steady-state sound field in respect of the involved source signal from the particular direction coded in −k. For a receiver being stationary, i.e., , the signal representation in (21) simplifies to the well-known plane-wave solutions (Williams, 1999). For a receiver being dynamic, the observed signal comprises the particular plane waves whose phases are modulated by the relative motion function . This yields instantaneous Doppler shifts according to (20) in the received time signal. As apparent in (21), the frequency shifts due to the dynamic observation process are represented by means of modifications within the spatial basis functions only, the weightings Ast(ω) remain unaffected. Consequently, (21) may be interpreted as the space-varying observation of sound-field features Ast(ω) by use of the motion function . The knowledge of the absolute trajectory within the acoustic field allows for the reverse mapping from the observed Doppler-shifted signal to the particular plane-wave solutions of the sound field.

Let us finally consider a dynamic receiver that is moving arbitrarily within a reverberant LTI scene. Multiple sound scattering paths lead to multiple angles of arrival within the source-free target volume Ω. Therefore, the dynamically observed sound field comprises multiple interfering Doppler shifts. This can be modeled on the basis of (21), e.g., by extending the wave vector to the dependency on a continuum of scatterers and introducing a corresponding integral. A more elegant description is given bywhere S2 denotes the surface of the unit sphere and is a unit vector modeling any possible direction for arriving plane waves. For the stationary case, the inner integral in (22) is referred to as the Herglotz wave function which is a well-known solution to the Helmholtz equation (Vasquez and Mauck, 2018; ). The complex-valued function is the so-called Herglotz density/kernel and describes the steady-state sound field for the particular source sequence. In (22), the dynamic procedure is modeled in terms of phase modulations of the incident plane waves. Thus, the accessibility of absolute positions within the reverberant field allows for extracting intact sound data from Doppler-shifted observations .

3.3.2 Spherical harmonics

Using the Jacobi-Anger expansion (), the continuous plane-wave representation (22) can also be cast into a series of standing waves in radial direction with a set of infinite but countable coefficients (). This leads to a spherical-harmonic representation of the received signal and is basically equivalent to the dynamic model proposed in ().

In (), the multi-frequency AIR field subject to the delay-time variable τ is modeled aswhere the position vector x ∈ Ω of the receiver is given in spherical coordinates x = [r,θ,ϕ]T, with radius r ∈ [0, rmax], polar angle θ ∈ [0, π], and azimuth angle ϕ ∈ [0, 2π). The Helmholtz-based wave field representation by analogy with (13) is chosen according towith being the spherical harmonic function of order and mode , bv denoting the v-th order spherical Bessel function of the first kind, and Ψ constituting the set of index pairs where |q| ≤ v (Williams, 1999). This involves the parameters A((v, q), ω) that describe the sound field on a surface of a notional sphere around the three-dimensional listening volume for temporal frequencies ω and spatial tuples (v, q). The overall spatial basis functions are , allowing for the spherical-wave extrapolation from the virtual sphere to arbitrary radii pointing to its interior. Following our general parameterization concept (15), the spatial embedding of the time-variant transfer function in terms of spherical harmonics readsAs a result, the acoustic transfer function becomes connected with an actual spatio-temporal meaning. It is equivalent to the dynamically shaped spherical-wave field observed in the Doppler domain along the receiver trajectory for a Dirac impulse excitation at the relative time point τ = 0. By combining the transfer-function model (24) with the signal model (14), the dynamic sound-pressure sequence arriving at the moving receiver is described by

3.3.3 Spatial Fourier bases

The AIR-field reconstruction problem can also be interpreted as the classical bandlimited-interpolation task, which involves sinc-function based interpolation filters in multiple separable dimensions subject to Cartesian coordinates. For this, actually no explicit sound propagation model is needed. The assumption of a sound signal with maximum frequency ωcut allows for the sampling-theory inspired parameterization of the wave field according towhereis the continuous spectrum of the spatially sampled sound field, equidistantly provided along a three-dimensional grid in space at sampling intervals Δξ for ξ ∈ {x, y, z}, , with denoting the three-dimensional Fourier transform along the spatial variables in , and is a separable anti-imaging filter that constrains the periodic spectrum to the spatial baseband frequencies (; Katzberg et al., 2018; Katzberg and Mertins, 2022). According to the Nyquist-Shannon sampling theorem, aliasing-free reconstruction of sound waves with minimal wavelength λmin = c0/fcut = 2πc0/ωcut requires the uniform grid points to have spacings Δξπc0/ωcut.

Following the pure sampling-theory perspective, the transfer-function representation in the sense of (15) can be written aswith Γc = {k : |k| ≤ ωcut/c0} denoting the set of spatial target frequencies in the baseband, constituting the sound-field parameters, and being the spatial basis functions evaluated in terms of the receiver trajectory . In (25), the underlying interpolation kernel is assumed ideal with flat magnitude response and linear phase response for frequencies k ∈ Γc, which, in practice, is achievable by use of higher-order Lagrange polynomials (Välimäki and Laakso, 2000; Katzberg et al., 2018). By applying the transfer-function parameterization (25) to the signal model (14), the dynamically acquired sequence is revealed asNote that this uniform-grid approach, with expressions adopted from classical sampling and reconstruction ideas, is, in fact, closely connected with plane-wave representations such as the ones from Section 3.3.1.

For the physical conditions considered, the dispersion relation (Ostashev and Wilson, 2019; Pierce, 2019) obeys the spectral relationship and, thus, the spectrum of the original sound-field signal ideally lives on the three-dimensional surface of a hypercone along the temporal frequency axis ω. Active frequencies in are structured accordingly. Especially at lower temporal frequencies, the conical shape is assumed to be dominated by a sparse set of frequency combinations. In that regard, the representation (26) has been proven in (Katzberg et al., 2018; Katzberg and Mertins, 2022) to be an excellent basis for CS-based recovery approaches. Also, the grid-based parameterization provides the full scalability to recover particularly on planes (D = 2) and along lines (D = 1) by simply lowering the dimension of the uniform grid modeled in space. This inherently diminishes the spectral structure according to (planar case) and (linear case) due to the released spatial variables in the three-dimensional propagation medium.

3.4 General system of linear equations for broadband parameters

After having provided particular parameterization examples in Section 3.3, let us continue with describing the measurement equations for the more general signal formulation from Section 3.2.

The measurement along the trajectory by use of a moving microphone provides the sampled version of (14),where is the noise term, l defines the discrete frequency variable, denotes the sampled Doppler domain, andis the short-time Fourier transform of the sampled source sequence s(n). By modeling a discrete sound-field approximation of (13) for sampled frequencies ωl = 2πfsl/L, the sampled Doppler domain readsand the Doppler-shifted microphone signal iswith A(c, l) being a finite number of unknown parameters approximating the LTI environment. In practice, a higher temporal frequency leads to more detailed variations in space and, thus, to a larger number Cl of spatial parameters required for an accurate sound-field representation.

Let us highlight the direct link to the abstract formulation (6) that has been initially introduced for outlining existing dynamic sampling models. Substituting (28) into the Doppler-based measurement equation given by (27) yieldsThis emphasizes the relationship to the spatio-temporal AIR along the measurement trajectory in terms of the Fourier transform . The corresponding model for AIR spatialization in the sense of (5) readsThis yields both a delay-based representation by a(c, m) (equivalent to using (l, m) = δ(ml) in (5)) and an associated frequency-based parameterization in terms of (equivalent to using (l, m) = ej2πlm/L in (5)). According to that, any time-domain model from the existing techniques outlined in Section 1 can be simply cast into a respective Fourier formulation that explicitly reproduces the interfering Doppler shifts in the dynamically acquired signal.

In summary, it can be stated that the Doppler effect in the dynamically acquired signal can be represented by time-dependent modulations of spatial basis functions. For LTI conditions, the actual sound-field parameters remain unchanged despite the continuous movement during the measurement process. This includes that the physical impact of the measurement microphone on the sound field is considered negligible, which is a usual assumption for small-capsule condenser microphones. Consequently, by controlling the excitation s(n) and tracking the trajectory , the particular sound-field features can be estimated from a number of M > P dynamic samples by applying least-squares techniques. Following (30) and using vector-matrix notations, a structured description of the measurement equations is given bywhere comprises the microphone signal , are diagonal matrices that render the temporal evolution of discrete frequencies in s(n),the matrices carry the sampled spatial basis functions with the c-th column defined asthe vectors contain the underlying sound-field parameters for each sampled frequency ωl,and is the noise vector modeling additive errors due to sampling, limited parameterization, and measurement noise. Accordingly, the M × P sampling matrixand the vector concatenation of P parameters build up the system of linear equationsNote that the spectrum of the real-valued sound field is conjugate symmetric which can be exploited for reducing the effective number of parameters and saving computational cost. In practice, of course, there will be the usual sampling artifacts due to the measurement within finite observation windows in time and space. Also, the microphone dynamics demand a little amount of oversampling (which is usual either way), since the Doppler effect affects the highest frequencies that fall beyond the cutoff frequency of the anti-aliasing prefilter in case the microphone moves toward the source.

4 Non-ideal microphone tracking

Given perfect positional tracking of the moving microphone, the relative motion function as defined in Section 3.3.1 can be perfectly determined and, thus, currently observed shifted frequencies subject to (20) can be perfectly described in terms of phase modulations in the spatial basis functions. The involved and unknown Doppler shifts caused by multipath arrivals are inherently tracked within the dynamic sampling model due to the connection with the LTI acoustic field. Each sample of the moving microphone is linked with a particular spatio-temporal solution to the wave equation, which in turn allows for the reverse mapping of the Doppler effect. In this section, we show how the according sound-field estimates behave for non-ideal trajectory data.

4.1 Perturbation model

For non-perfect microphone tracking, the positional errors lead to inconsistencies between the real-world Doppler effect and the frequency shifts performed within the signal model. Mathematically, this erroneous mapping of Doppler shifts can be expressed by a perturbation on the sampling matrix according to , where the deviating matrix results from inaccurate evaluations of the spatial basis functions. This produces the perturbed modelwith the multiplicative noise term , i.e., the error is correlated with the targeted sound-field signal. Using the block-structure representation (32) of the sampling matrix, its perturbation can be described aswith deviations given bywhere denotes the estimated trajectory that includes positional mismatches due to inaccuracies of the applied microphone tracking system. This yields the perturbation matrixand specifies the multiplicative noise as

4.2 Sensitivity considerations for Doppler-shift mismatches

The structured perturbation matrix (35) allows us to provide stability conditions and error bounds for sound-field estimates that are based on noisy trajectory data, i.e., Doppler-shift mismatches. For this, existing perturbation theory can be adapted, which is demonstrated in the following for both the least-squares and CS cases.

First, let us consider the effects of positional perturbations in on its pseudoinverse . In general, the differences in the pseudoinverse caused by can be bounded (Wedin, 1973; Stewart, 1977) according towhere ‖ ⋅‖F denotes the Frobenius norm, and, following (35),The bound (36) does not necessarily remain finite as tends to zero. For trajectory data with slight errors approaching zero, but yielding a deviation that changes the rank of the unperturbed sampling matrix, , the error norm may become very large. However, stating the condition , the bound (36) can be strengthened towhich guarantees that is close to if is small (Wedin, 1973).

Now, we consider the solution of the linear least-squares problem to (34),Let us assume a compatible linear system with vanishing model error, , and and both having full column rank with MP. For a relative error rate ϵ fulfillingwith being the condition number, the error of the least-squares estimate is bounded (Wedin, 1973; ) byAccording to the conditions (38) attached to ϵ, the performance guarantee (39) requires an accuracy of the microphone tracking that satisfies , where σmax and σmin denote the maximum and minimum singular values, respectively. Here, the formula (37) provides a handy upper bound for the singular values of the perturbation matrix given particular trajectory offsets, since with (; ).

Regarding CS-based strategies, Doppler-shift mismatches can be seamlessly incorporated into the perturbation model from (). For simplicity, let us assume a strictly K-sparse parameter vector a having the support size ‖a0 = |{i : ai ≠ 0}| ≤ KP. Also, let us assume a sampling matrix with M < P and full row rank, and consider the solution of the basis pursuit problem to (34),where the total perturbation obeysThe worst-case relative perturbations are quantified bywhere denotes the maximum spectral norm over the collection of all K-column submatrices. For a sampling matrix with restricted isometry constant fulfillingor, vice versa, for Doppler-shift errors in satisfyingwe can set the absolute noise parameterwithand guarantee the stability of the basis pursuit solution according towhere the well-behaved factor μ ≥ 0 is defined by and (). Consequently, the stability of the solution scales linearly with the worst-case mismatch measured by . Note that (43) must be considered as the worst-case error bound for sound-field estimates based on non-ideal microphone tracking in the theoretically worst possible scenario. In practice, CS solutions are known to perform much better. The conditions (40), (41), (42) implyand, similar to the least-squares case, they ensure for any k ≤ 2K that , where and , respectively, denote the largest and smallest non-zero singular values over all k-column submatrices, and, thus, that the maximum rank of the corresponding submatrices does not change with the perturbation, .

5 Subband analysis in Doppler domain

For extensive target regions and wide-ranging bandwidths, the linear system (33) with the parameter set in a might become too large for practical applications due to limitations in computational power and memory. For example, by considering a spherical volume with radius rmax, the number of parameters required for the spatialization at ωl is (). The broadband problem formulated in (33) grows accordingly. Here, all frequencies are coupled owing to the wideband excitation with s(n) and its convolutive relationships in Doppler-shifted microphone samples . For the case in which the problem is ill-conditioned due to insufficient sampling, solving the inverse problem may lead to poor solutions over the entire bandwidth. In this section, we establish a frequency-decoupling filtering scheme that is compatible with Doppler-shifting sampling procedures. In contrast to the short-time Fourier analysis that will be presented in Section 6, this scheme is universally applicable for arbitrary excitation signals. It may be used for subband recovery strategies that allow for parallel computing and faster reconstruction, especially at low frequencies.

5.1 Spatio-temporal filtering in Doppler domain

Isolating distinct subbands of temporal frequencies by applying conventional bandpass filtering to the microphone signal is not feasible. The measured position is varying over time since the microphone is supposed to move. It is easy to see that applying a digital bandpass filter (n) along the temporal dimension of inherently involves a spatial filtering since depends on n, i.e., . Due to the frequency shifts caused by mixing up temporal and spatial frequencies, i.e., the Doppler effect, the bandpass filtered signal in the Doppler domain may not be directly linked to an accordingly bandpass filtered sound field.

Nevertheless, let us consider linear-phase FIR filters [](n) of length defining the complete signal decomposition of the broadband measurement into subband signals subject towith operator modeling the delay of samples introduced by the FIR filters, and filtered signals covering Doppler-shifted frequencies of distinct subbands with cutoff frequencies according to . Of course, it is unfeasible to link a subband of spatio-temporal frequencies on the trajectory with the corresponding subband of temporal frequencies inside Ω. However, we can interrelate the Doppler interfered subbands of both the measured sound field on the trajectory and the AIR on the trajectory, since both signals are dynamically shaped by the same motion function, and, therefore, involve shared Doppler artifacts. Thus, using (31), we may formulate the delay-compensated subband modelOur model (45) implies that the time-domain convolution between the measured signal and the filter [](n) is actually equivalent to a spatio-temporal filtering as the microphone is permanently moving. Since the accordingly varying AIR lives on the same trajectory, it is equally affected by this multidimensional filtering operation, leading to the corresponding Doppler-shifted subband version h[] and the unfiltered excitation sequence on the right-hand side of (45). These intuitive relationships can be shown mathematically. Without loss of generality, let us ignore the filter delay, consider infinite convolution sums, and rewrite (45) asUsing the substitutions , , and revealsDefining and introducing the limits yieldswhere the sum over defines the spatio-temporal filtering by and composes the subband AIR along the trajectory.

5.2 Subband decomposition in Doppler domain

Similar to the broadband case in Section 3.4, the frequency-based spatialization model (29) can be applied to the bandpass filtered signal (45), which allows for representing the isolated Doppler-shifted frequencies by their non-shifted counterparts ω in combination with modulated spatial basis functions,withrepresenting the subband AIR in the sampled Doppler domain and being the subband error term. Note that the subband coefficients A[](c, l) depend on l which is still the discrete variable of the non-shifted frequency. Here, again, the trajectory must be known, in order to allow for the reverse mapping of the involved Doppler shifts by use of modulated basis functions . Due to the bandpass filtering, we know a priori that A[](c, l) = 0 for several stopband frequencies, thus, the sum over l is reduced in (46) to the sum over sampled subband frequencies of the setwhere the band interleaving margins α and β are chosen according to the maximum expected Doppler shift, which can be approximated by knowing the microphone trajectory and using (17). The Doppler margins ensure that there is no measured signal component in that translates to original frequencies . In practice, sufficient ranges of α and β will be necessary also due to non-ideal bandpass filters [](n).

5.3 System of linear equations for subband parameters

In comparison with (33), we can now shrink the broadband parameter vector to the subband vector with parameters A[](c, l) for frequencies , and shrink the broadband sampling matrix to the corresponding subset of columns for shaping the system of linear equationsthat describes the subband case (46) with the reduced number of parameters . Generally, the decomposition of the original large problem (33) into subproblems with small Doppler margins α and β enables us to recover the sound field at low frequencies using just a few samples acquired after a short sampling time. Also, with regard to the sensitivity considerations in Section 4.2, the sampled Doppler mismatch induced by errors in the microphone tracking may become less critical due to the frequency decoupling. For trajectories which lead to ill-conditioned or underdetermined broadband problems, the subband method allows for an adaptation of the bandwidth to be recovered.

Caused by the interfering Doppler shifts and the non-ideal bandpass filters, the full information on a single discrete frequency l might be spread over neighbouring subbands (cf. Section 7.1; Figure 3). According to the complete signal decomposition (44), the entire frequency information is composed ofwith . In practice, the Doppler induced frequency shifts amongst subbands have small range. For example, using a microphone moving at maximum speed of 12 km/h, Doppler shifts may be roughly localized according to . Finally, note that this Doppler-domain subband procedure can be regarded as the generalized Fourier equivalent to the specific multigrid recovery scheme described in (; ).

5.4 III-conditioning and subband improvements

Beside the ability of the subband approach to divide a large-scale broadband problem into many small-size subtasks that can be solved in parallel, some further numerical benefits are obtained for finding suitable and robust solutions to the inverse problem. This is highlighted in the following considerations.

5.4.1 Number of unknown parameters

For larger bandwidths and larger target regions, arbitrary microphone trajectories lead to a linear system (33) that will be ill-posed or even underdetermined with high probability unless an excessive number of spatially dense samples are acquired. This is essentially due to the exploding number of spatially dense sound-field parameters required to represent the dynamically coupled samples by broadband wave fields in spacious areas. For example, the uniform-grid model from Section 3.3.3 comprises at least (Δξ = πc0/ωcut)parameters to be estimated for recovering field information at temporal frequency ωl inside a D-dimensional cubical region Ωcub of size . For the spherical-harmonics model from Section 3.3.2, the number of parameters needed to adequately reconstruct sound waves at ωl inside a spherical volume Ωsph of size is at least (v ≤ ⌈ωlrmax/c0⌉)The overall linear system (33) models the typical broadband measurement case where the moving microphone acquires sound-field samples at the rate fs for a spectrally flat excitation signal emitted by a controlled loudspeaker source. Thus, choosing the uniform-grid model for example, the spatio-temporal recovery of AIRs with duration (L − 1)fs involves unknown sound-field parameters (cf. Eqs (29), (30)). For practical applications, this results in a huge sampling matrix that makes it difficult or even impossible to solve (33) with conventional methods. Analogously, the spherical-harmonic representation involves unknowns. Here, there is also an additional influence that tends to produce ill-posed recovery tasks. In the spherical-wave extrapolation formula (23), the spherical Bessel functions for the radial part cross zero at several frequency-radius combinations . Due to that, they are typically known to yield ill-conditioned inverse problems unless taking particular care of radial sampling positions or introducing special regularization filters (Rafaely, 2005; Moreau et al., 2006; Rafaely, 2015).

In the light of the previous considerations, a decomposition of the overall linear system (33) is desirable. There are two strategies to divide the large-scale problem into smaller problems: the spectral decomposition of the global bandwidth [ω0, ωL−1] and the spatial decomposition of the measurement region Ω into multiple subareas with smaller extent. The feasibility of spatial decomposition will be highly dependent on both the microphone trajectory and the spatialization model, and, thus, is not further considered in this paper. However, the proposed subband filtering scheme provides a useful and general tool for the spectral signal and parameter decoupling in the presence of Doppler shifts. It can be used to solve smaller subproblems where only subsets of frequency parameters according to (47) are active.

5.4.2 Singular values

The subband approach solves the inverse problem by use of sampling matrices that are each constructed by a reduced set of columns from the fullband matrix with P < P. Let us introduce as the strict subset of correspondingly selected column indices. The respective bounds of non-zero singular values are defined aswith the setsand ai denoting the i-th element in a. Since VV, it followsHence, the condition number of the subband sampling matrix satisfiesCompared to the broadband case, the subband formulation typically yields a shrunken range between singular values, most likely for PP. This improves the matrix condition and, therefore, the robustness of the estimates and the theoretical error bounds in the sense of (39).

For a broadband sampling matrix with singular values b1 ≥⋯ ≥ bmin(M,P) ≥ 0 that are heavily imbalanced, the impact of general measurement errors in terms of p˜ = ptrue˜ + e on the LS estimate can be illustrated by the expressionwhere and ui, vi are the left- and right-singular vectors, respectively, characterizing the linear mapping from the signal space over to the measurement space over according to . When constructing the LS solution by the reverse mapping (51), relatively small singular values amplify the additive perturbation terms in (51), so that they may dominate other major signal components referring to large singular values. As a remedy, the subband formulation may result in more compact and, thus, more favorable singular values. Based on the interlacing properties derived by Thompson (1972), exact relations amongst the singular values of the fullband matrix and the subband matrices can be provided. With being the singular values of the particular submatrix , it holds thatwhere J = PP is the number of dropped columns due to the subband formulation. Especially for low-frequency subbands that require only a small number of spatial parameters, PP and J in (52) becomes very large, so that the subband solution by analogy with (51) is most likely more robust.

For M < P, the broadband measurement matrix by necessity has linearly dependent columns and, thus, involves underdetermined variables. However, also for decent acquisition times where M is a low multiple of P (such as the twofold or fourfold of P), we observed in our previous works that the resulting problem tends to be underdetermined due to numerical rank deficiency, even for suitable trajectory curves covering the entire target volume (). Having a tall matrix (M > P) of rank R < P, the deletion of one column results in a submatrix that is either of rank R = R or R = R − 1, depending on whether or not the deleted column was a linear combination of some other columns. Low-frequency subband matrices are obtained by deleting a mass of columns in . Therefore, compared to R/P, they yield an improved ratio R/P with very high probability. With just a few samples acquired after a short measurement time, they may even have full column rank. Apart from that, in cases of underdetermined problems, the subband approach may improve robustness guarantees for CS-based solutions by lowering the restricted isometry constants. Closely connected with the expressions from Section 4.2, the restricted isometry constant for K-sparsity can be represented aswhich states that each submatrix constructed by no more than K columns has its singular values in the interval (; ). By comparing to this broadband case and using similar definitions as in (49), (50), it can be shown that all sampling matrices produced by our subband approach have constants . Here, again, we expect noticeable improvements for lower subbands with PP (cf. Section 7.2; Figure 7).

6 Frequency analysis in Doppler domain

Having a moving microphone, the decoupling of single temporal frequencies by applying conventional Fourier analysis to the measured signal is not possible. Due to the receiver motion within the multipath environment, spectral characteristics are actually spread over multiple frequency components (frequency dispersion). The range of this spectral broadening is measured by the so-called Doppler spread, which, in our case, increases for higher velocities of the microphone. For the subband design in Section 5, the extent of frequency spreading is taken into consideration by the frequency set (47), where α and β are simple worst-case bounds for the Doppler spread according to the maximum expected microphone speed. In this section, we derive trajectory-dependent filters that exactly characterize the frequency dispersion in the sampled microphone signal. The knowledge of the receiver trajectory allows for calculating these filters and describing frequency components in the dynamically observed signal by frequency-spread versions of the particular sound-field parameters.

6.1 Spatio-spectral spreading due to dynamic observations

As demonstrated in Section 3, the dynamic signal model (2) can be seamlessly embedded into the concept of Doppler-variant impulse responses and transfer functions, which is often used in wireless-communications literature. Let us adapt the notation in (2) accordingly, i.e.,where the subscript indicates the temporal context to the trajectory. In (53), the signal multiplication subject to t translates to the convolutive frequency representationwhere denotes the inverse Fourier transform from frequency variable ν to time t, , is the trajectory-shaped Doppler-variant impulse response in the sense of (8), and is the corresponding transfer function in the sense of (10). By taking the Fourier transform of both sides of (54), the shifted frequencies observed by a moving receiver can be expressed as the spectral convolutionIn comparison to (14), where the sound-pressure observation is given in the time domain, (55) displays the Fourier correspondence of the dynamic signal subject to the global time variable t. This reveals the convolutional coupling of the frequency variables ω and ν due to the receiver movement.

6.2 Spatial parameterization of the Doppler-variant transfer function

Similar to (15), the trajectory-shaped spreading function in (55) can be represented subject to the particular spatialization model of the time-invariant sound-field characteristics, i.e.,with the trajectory-dependent filtersconstructed by the Fourier transform of the particularly evaluated basis functions. In Figure 2, we summarize the parameter-based relationships between the four fundamental system functions that describe the dynamic channel as introduced in Section 3.1 with direct connections to the receiver trajectory. Note that the parameterized domains in Figure 2 are equivalent to the four domains in Figure 1. Substituting (56) into (55) yields the representationthat describes the dynamically observed frequencies in terms of the sound-field parameters A(ρ, ω) and their trajectory-dependent smearing due to . The observation reveals a spatio-temporal spectrum that results from the combined effects of both the temporal (delay) behavior of the acoustic environment and the spatial filtering subject to the continuous movement of the receiver.

FIGURE 2

6.3 Sampled frequency spreading for periodic excitation

For obtaining a sampled analogy to (57), let us consider the periodic excitationby use of a deterministic sequence of length L. In fact, this is the actual setting of various dynamic procedures that rely on the properties of so-called perfect sequences (; ).

Based on (31), we define the short-time segmentof length L obtained from rectangular windowing of the microphone signal for window indices ∈ {0, , L − 1}. Using (58), (59), the convolution theorem leads towith the discrete Fourier transform of the sequence circularly shifted according to the excitation phase n mod L,the (spatially) windowed Doppler-variant impulse responseshaped by the particular trajectory segment, and the corresponding transfer functionmodeling the frequency spread in the sampled signal due to the moving microphone. Taking the discrete Fourier transform of both sides of (60) yieldswhich clearly describes the resulting frequency shifts in the short-time Fourier representation . The function displays the spatio-temporal spectrum measured under the positional variations along the particular trajectory segment.

6.4 Frequency-spreading filters due to moving microphones

The discrete spatialization model (29) allows for the sampled Doppler-variant representationsubject to the finite set of sound-field parameters A(c, l) and the trajectory-dependent FIR filtersfor the specific choice of spatial basis functions. Substituting (62) into (61) finally reveals the short-time Fourier representationof the dynamic sampling model (30) in terms of the frequency-spreading filters , where encapsulates the frequency-dependent errors due to sampling and parameterization. At this, the particular Doppler spread is defined as the range of frequency bins over which is essentially non-zero. Note, for a non-moving microphone at constant position, i.e., , the Doppler effect and the resulting frequency spreads vanish, which leads to inherent simplifications of (63), (64) according toandrespectively, with (65) being actually dissolved into a short-time Fourier representation of the stationary sampling concept (4) for L-periodic excitation.

Similar to the subband procedure proposed in Section 5, the short-time Fourier analysis (64) could be used to set up a recovery strategy that decomposes the dynamic broadband problem (33) into multiple subproblems of narrow frequency ranges. In fact, this is straightforward for sufficiently slow trajectories which allow for approximations , i.e., the effect of spatio-spectral smearing is negligible within a time window of size L. However, having a microphone with arbitrary velocity profile, the modeling of decoupled sampling matrices from (64) with exact frequency-spreading filters and also the merging of obtained frequency estimates are not straightforward and clearly raise the complexity compared to the simple and universal subband procedure from Section 5, which requires only rough approximations of Doppler margins α and β.

7 Experiments and results

In this section, we demonstrate the key points of the introduced Doppler framework on the basis of experimental data and specifically chosen examples of parameterization models.

7.1 Doppler-spread visualizations

Let us first consider unrealistic toy examples with extreme receiver velocities for clear visualizations of both the Doppler shifts between dynamic subbands as elaborated in Section 5 and the Doppler-spreading FIR filters derived in Section 6. Using the image source method (), the sound field inside a room of size 5.80 m × 4.15 m × 2.55 m and reverberation time T60 = 0.25 s was simulated. Instead of applying a nearest-neighbour interpolator as originally proposed by , we took a Hanning-windowed ideal low-pass filter of length 4 ms to calculate the bandlimited and sampled RIR contributions from the particular image sources (Peterson, 1986). For the excitation sequence, we generated an L-periodic MLS signal (Rife and Vanderkooy, 1989) of about 20 s duration and filtered it in order to obtain a passband of effective audio frequencies between 400 Hz and 1600 Hz. We simulated corresponding (noiseless) audio recordings from an omnidirectional microphone sampling at fs = 4000 Hz and continuously moving along Lissajous trajectories (). Lissajous curves for measurement are commonly used in magnetic particle imaging (Knopp et al., 2009). In (Katzberg et al., 2018), they have been experimentally shown to be a good choice also for sound-field sampling. The Lissajous trajectories considered here are typically slower and denser close to the volume boundaries, while the center regions are crossed fast and sampled more coarsely.

Four experiments were carried out with different trajectory mean velocities of 5 km/h, 50 km/h, 100 km/h, and 200 km/h. The spherical-harmonic parameterization from Section 3.3.2 was selected for the dynamic reconstruction. Accordingly, the spatially embedded transfer function sampled along is represented aswith being the sampled angular wave number and Ψl constituting the set of index pairs where and |q| ≤ v (Rafaely, 2015). The radius of the target region was set to rmax = 0.1 m. The trajectory-dependent sampling of spatial basis functions reads . Using the analysis scheme from Section 5, we modeled subband parameter vectors with fullband margins, i.e., and , and recovered them by solving the according overdetermined subproblems in the least-squares sense. For the chosen cutoffs , Hamming-windowed ideal band-pass filters of order 1000 were applied to the microphone signal. The frequency responses of these filters add up to unit magnitude with only minor variations in the order of 10–4.

The absolute values of the recovered subband parameters A[]((v, q), l) arranged in a are depicted in Figure 3 in reference to the four numerical experiments with various microphone velocities. The parameters are successively concatenated first along the spatial (v, q)-dimensions and then along l with increasing frequency. Dotted lines indicate the vector indices where the defined cutoffs are reached. In Figure 3, the Doppler effect can be clearly observed in terms of subband-overlapping broadening of frequency-based parameters, especially with regard to higher frequencies and faster measurement trajectories. In case of 5 km/h, the subband of Doppler interfered frequencies in from the interval map to original frequencies in a of nearly the same range when solving the modeled inverse problem. Here, the number of essential subband parameters may be vastly reduced in line with (47) using very tight Doppler margins. By contrast, considering the extremely high velocities, Doppler shifts induce a significant spreading of frequency information across multiple subbands and, thus, require large-scale margins. The parameter vector , which refers to the highest frequency band, is basically inactive for low-speed trajectories. However, for velocities higher than 5 km/h, there are more and more signal contributions shifted to the higher frequencies of the subband sequence in (1600 Hz − 2000 Hz) and the corresponding subband parameter vector becomes increasingly essential for the sound-field recovery. On the left-hand side of Figure 3, the magnitudes of the broadband result afull according to (33) without the subband decomposition are presented. They can be compared with the magnitudes of the cumulated subband vectors . Regardless of the trajectory speed, the recovered subband vectors effectively satisfy in each experiment, as expected according to (48). For each speed scenario, the deviation is only about −35 dB.

FIGURE 3

By using the Doppler analysis scheme from Section 6, Figure 4 depicts several examples of the underlying frequency-spreading filters . Here, the filter magnitudes |F1(l, (v, q), u)| and |F2(l, (v, q), u)| refer to particular trajectory segments in the 5 km/h and 50 km/h experiments, respectively, and the specific discrete frequencies l1, l2, l3, and l4 correspond to about 500 Hz, 1000 Hz, 1500 Hz, and 2000 Hz. In conformity with the entire model, the trajectory-dependent FIR filters show increased Doppler broadening for higher velocities and larger frequencies.

FIGURE 4

7.2 Doppler misalignment due to noisy trajectory data

Inaccuracies in the positional tracking of the microphone result in Doppler-shift discrepancies as introduced in Section 4. The subband decoupling scheme from Section 5 may increase the robustness against noisy trajectory data, especially regarding low-frequency parameter estimates as highlighted in Section 5.4. For demonstrating such sensitivity relationships, we carried out numerical experiments in twenty simulated room scenarios, randomly chosen according the uniform distribution of box-shaped dimensions [2 m; 10 m]3 and reverberation times T60 ∈ [0.2 s; 0.35 s]. In each environment, both the source position and the location of the considered measurement plane were randomly selected. Impulse responses were limited to length L = 4095. Recordings of an omnidirectional microphone were simulated at fs = 12 kHz along planar Lissajous trajectories involving mean velocities of about 1 km/h. For the playback signal we used sixty repetitions of an L-periodic MLS. In several experiments, measurements were corrupted by additive white Gaussian noise with a signal-to-noise ratio (SNR) of 10 dB, 20 dB, and 30 dB, respectively. The underlying trajectory data were tested for the ideal case and also for tracking errors according to , with positional noise independently drawn from the zero-mean normal distribution with standard deviation σξ for each of the effective dimensions ξ ∈ {x, y}. The mismatch level was set to σξ = 3 mm and σξ = 6 mm, respectively.

The grid approach from Section 3.3.3 was combined with aspects from theoretical acoustics and used for CS-based sound-field recovery (Katzberg et al., 2018; Katzberg and Mertins, 2022). Regarding the experiments with two-dimensional trajectories on planar target regions, the discretized sound-field model on a finite Gx × Gy grid readswhere and are the grid-related trajectory coordinates subject to the selected grid origin μξ and spacing Δξ for ξ ∈ {x, y}, κξ denotes the integer-valued frequency indices that translate to sampled spatial frequencies according to , and Γl represents the set of discrete frequency pairs (κx, κy) that approximate the conical structure around the sampled temporal frequency ωl = 2πfsl/L (Katzberg et al., 2018; Katzberg and Mertins, 2022). Complying with the proposed Doppler framework, the trajectory-dependent sampling of associated basis functions in space reads . The following recovery results were obtained with a grid design of Gx = Gy = 25 and Δx = Δy = 0.019 m.

As error measure for the overall sound-field reconstruction involving G grid AIRs, the mean normalized system misalignment is used, with containing the true AIR and being the recovered AIR at grid index . The results for the broadband model according to Section 3 and for the subband decomposition scheme according to Section 5 are comparatively depicted in Figure 5 subject to the different levels of trajectory noise. The outcomes over all experimental room scenarios are presented by use of boxplots. For the subband strategy, Hamming-windowed ideal band-pass filters of order 5000 were applied, summing up to unit magnitude response with slight errors of order 10–5. We chose , , and Doppler margins α, β that cover about twenty discrete frequency bins each. This decomposes the primary problem subject to into six decoupled subproblems subject to referring to successive 1 kHz frequency ranges. The corresponding numbers P and P of spatio-temporal parameters are listed in Table 1. Besides, there are the respective values of the perturbation measure from Section 4 for rating the induced Doppler-shift mismatches due to the tested positional noise.

FIGURE 5

TABLE 1

# parametersσξ = 3 mmσξ = 6 mm
broadband:
6.4 ⋅ 1052.905.63
subband:
2.2 ⋅ 1040.210.43
4.8 ⋅ 1040.440.84
8.3 ⋅ 1040.751.49
1.3 ⋅ 1051.162.26
1.8 ⋅ 1051.613.12
2.3 ⋅ 1052.094.02

Comparison of parameter numbers for the broadband and subband strategies, and evaluation of matrix perturbations due to positional deviations of σξ = 3 mm and σξ = 6 mm, respectively.

While the recovery performances are essentially equal for noiseless trajectory data, the subband scheme improves the robustness against the trajectory discrepancies compared to the standard broadband approach. For σξ = 6 mm, its performance gain is about 1 dB on average (Figure 5). For visualizing the effects of positional inaccuracies, the magnitudes of two particular frequency-spreading filters F1((κx, κy), u) are presented in Figure 6 given a true trajectory segment and its noisy version: for filters such as F1((10, 10), u) pointing to rather higher spatial frequencies (and, thus, also to higher temporal frequencies), the resulting Doppler mismatches are typically more considerable. To analyze frequency-based recovery performances, we apply the mean energy spectral density of the error , where (f) and are the Fourier transforms of the true and estimated grid AIRs, respectively. The frequency errors of the broadband and subband procedures for σξ = 6 mm are presented in Figure 7. In line with the considerations made in Section 5.4, the benefit of the subband strategy is especially apparent in the lower halfband up to 3 kHz, where the inverse problem is modeled by the smaller-sized sampling matrices , , and (cf. Table 1).

FIGURE 6

FIGURE 7

8 Conclusion

In this work, we formulated a Doppler-based framework that reveals the frequency-spreading effects in dynamic sound-field sampling procedures. It has been shown that the exact positional tracking of a moving microphone allows for the exact rendering of underlying Doppler shifts in the acquired signal. As it turned out, the involved frequency shifts are directly connected with the sampling of spatial basis functions subject to the microphone trajectory. For the practically relevant case of tracking inaccuracies, we described the resulting impact on the inverse problem in terms of mismatches between true and inaccurately modeled Doppler shifts. Such mismatches lead to a multiplicative perturbation model, for which we provided sensitivity considerations regarding least-squares and CS-based estimates. Also, a subband analysis scheme has been derived, which enables us to split the presented Doppler model for broadband measurements into a number of smaller subproblems that consider particular frequency bands. This allows for parallelizing the computational effort and for obtaining faster reconstructions with improved robustness against the trajectory errors, especially regarding lower frequencies. Further, we provided a reasonable concept for the (short-time) Fourier analysis of the dynamic measurement signal. Due to the continuously moving microphone, this yields an actually spatio-temporal Fourier description, dimensionally coupled by the performed trajectory. In this representation, the included Doppler spreads can be explicitly characterized by a series of trajectory-dependent FIR filters.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

FK: Writing–original draft. MM: Writing–review and editing. AM: Writing–review and editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work has been supported in part by the German Research Foundation under Grant No. ME 1170/10-2.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    AjdlerT.SbaizL.VetterliM. (2006). The plenacoustic function and its sampling. IEEE Trans. Signal Process.54, 37903804. 10.1109/tsp.2006.879280

  • 2

    AjdlerT.SbaizL.VetterliM. (2007). Dynamic measurement of room impulse responses using a moving microphone. J. Acoust. Soc. Am.122, 16361645. 10.1121/1.2766776

  • 3

    AllenJ.BerkleyD. (1979). Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am.65, 943950. 10.1121/1.382599

  • 4

    BahB.TannerJ. (2010). Improved bounds on restricted isometry constants for Gaussian matrices. SIAM J. Matrix Anal. Appl.31, 28822898. 10.1137/100788884

  • 5

    BelloP. (1963). Characterization of randomly time-variant linear channels. IEEE Trans. Commun. Syst.11, 360393. 10.1109/tcom.1963.1088793

  • 6

    BenestyJ.HuangY.ChenJ.NaylorP. A. (2006). “Adaptive algorithms for the identification of sparse impulse responses,” in Topics in acoustic echo and noise control. Editors HänslerE.SchmidtG. (Berlin: Springer), 125153.

  • 7

    BenestyJ.SondhiM.HuangY. (2008). Springer handbook of speech processing (Germany: Springer).

  • 8

    BjörckÅ. (1996). Numerical methods for least squares problems (SIAM)United States: SIAM.

  • 9

    CandèsE.RombergJ.TaoT. (2006). Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math.59, 12071223. 10.1002/cpa.20124

  • 10

    ColtonD.KressR. (2019). Inverse acoustic and electromagnetic scattering theory. 4 edn. Germany: Springer.

  • 11

    EnznerG. (2008). Analysis and optimal control of LMS-type adaptive filtering for continuous-azimuth acquisition of head related impulse responses. Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., 393396. 10.1109/ICASSP.2008.4517629

  • 12

    FaziF. M.NoisternigM.WarusfelO. (2012). Representation of sound fields for audio recording and reproduction. Proc. Acoust., 859865.

  • 13

    GolubG. H.Van LoanC. F. (2013). Matrix computations. 4 edn. United States: Johns Hopkins University Press.

  • 14

    GramiA. (2016). Introduction to digital communications. United States: Academic Press.

  • 15

    HahnN.HahneW.SporsS. (2017). Dynamic measurement of binaural room impulse responses using an optical tracking system. Proc. Int. Conf. Spat. Audio., 1621. 10.1109/WASPAA.2017.81700241

  • 16

    HahnN.SporsS. (2016). Comparison of continuous measurement techniques for spatial room impulse responses. Proc. Eur. Signal Process. Conf., 16381642. 10.1109/EUSIPCO.2016.7760526

  • 17

    HahnN.SporsS. (2017). Continuous measurement of spatial room impulse responses using a non-uniformly moving microphone. IEEE Workshop Appl. Signal Process. Audio Acoust, 205208.

  • 18

    HahnN.SporsS. (2018). Simultaneous measurement of spatial room impulse responses from multiple sound sources using a continuously moving microphone. Proc. Eur. Signal Process. Conf., 21942198. 10.23919/EUSIPCO.2018.8553532

  • 19

    HaykinS. (2001). Adaptive filter theory. USA: Prentice Hall.

  • 20

    HaykinS.LiuK. J. R. (2010). Handbook on array processing and sensor networks (New York: Wiley).

  • 21

    HeJ.RanjanR.GanW.-S.ChaudharyN. K.HaiN. D.GuptaR. (2018). Fast continuous measurement of HRTFs with unconstrained head movements for 3D audio. J. Audio Eng. Soc.66, 884900. 10.17743/jaes.2018.0050

  • 22

    HermanM.StrohmerT. (2010). General deviants: an analysis of perturbations in compressed sensing. IEEE J. Sel. Top. Signal Process.4, 342349. 10.1109/jstsp.2009.2039170

  • 23

    HighamN. J. (2002). Accuracy and stability of numerical algorithms. United States: SIAM.

  • 24

    HlawatschF.MatzG. (2011). Wireless communications over rapidly time-varying channels (USA: Academic Press).

  • 25

    KatzbergF.MaassM.MertinsA. (2021). Spherical harmonic representation for dynamic sound-field measurements. Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., 426430. 10.1109/ICASSP39728.2021.9413708

  • 26

    KatzbergF.MaassM.PallenbergR.MertinsA. (2022). Positional tracking of a moving microphone in reverberant scenes by applying perfect sequences to distributed loudspeakers. Proc. Int. Workshop Acoust. Signal Enhanc. 10.1109/IWAENC53105.2022.9914709

  • 27

    KatzbergF.MazurR.MaassM.KochP.MertinsA. (2017a). Multigrid reconstruction of sound fields using moving microphones. Proc. Workshop Hands-free Speech Commun. Microphone Arrays, 191195. 10.1109/HSCMA.2017.7895588

  • 28

    KatzbergF.MazurR.MaassM.KochP.MertinsA. (2017b). Sound-field measurement with moving microphones. J. Acoust. Soc. Am.141, 32203235. 10.1121/1.4983093

  • 29

    KatzbergF.MazurR.MaassM.KochP.MertinsA. (2018). A compressed sensing framework for dynamic sound-field measurements. IEEE/ACM Trans. Audio, Speech, Lang. Process.26, 19621975. 10.1109/taslp.2018.2851144

  • 30

    KatzbergF.MertinsA. (2022). “Sparse recovery of sound fields using measurements from moving microphones,” in Compressed sensing in information processing. Editors KutyniokG.RauhutH.KunschR. J. (Germany: Springer), 377411.

  • 31

    KnoppT.BiedererS.SattelT.WeizeneckerJ.GleichB.BorgertJ.et al (2009). Trajectory analysis for magnetic particle imaging. Phys. Med. Biol.54, 385397. 10.1088/0031-9155/54/2/014

  • 32

    LiW.PreisigJ. C. (2007). Estimation of rapidly time-varying sparse channels. IEEE J. Ocean. Eng.32, 927939. 10.1109/joe.2007.906409

  • 33

    MoreauS.DanielJ.BertetS. (2006). 3D sound field recording with higher order ambisonics - objective measurements and validation of a 4th order spherical microphone. Proc. 120th Conv. Audio Eng. Soc.(Conv. Paper 6857).

  • 34

    MorseP. M.IngardK. U. (1986). Theoretical acoustics. USA: Princeton University Press.

  • 35

    OstashevV. E.WilsonD. K. (2019). Acoustics in moving inhomogeneous media. 2 edn. United States: CRC Press.

  • 36

    PetersonP. M. (1986). Simulating the response of multiple microphones to a single acoustic source in a reverberant room. J. Acoust. Soc. Am.80, 15271529. 10.1121/1.394357

  • 37

    PierceA. D. (2019). Acoustics: an introduction to its physical principles and applications. Germany: Springer.

  • 38

    RafaelyB. (2005). Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio Process12, 135143. 10.1109/tsa.2004.839244

  • 39

    RafaelyB. (2015). Fundamentals of spherical array processing. Germany: Springer.

  • 40

    RifeD. D.VanderkooyJ. (1989). Transfer-function measurement with maximum-length sequences. J. Audio Eng. Soc.37, 419444.

  • 41

    SondhiM. (1967). An adaptive echo canceller. Bell Syst. Tech. J.46, 497511. 10.1002/j.1538-7305.1967.tb04231.x

  • 42

    StanG.-B.EmbrechtsJ.-J.ArchambeauD. (2002). Comparison of different impulse response measurement techniques. J. Audio Eng. Soc.50, 249262.

  • 43

    StewartG. W. (1977). On the perturbation of pseudo-inverses, projections and linear least squares problems. SIAM Rev.19, 634662. 10.1137/1019104

  • 44

    ThompsonR. C. (1972). Principal submatrices IX: interlacing inequalities for singular values of submatrices. Linear Algebra Appl.5, 112. 10.1016/0024-3795(72)90013-4

  • 45

    UrbanietzC.EnznerG. (2020). Direct spatial-Fourier regression of HRIRs from multi-elevation continuous-azimuth recordings. IEEE/ACM Trans. Audio, Speech, Lang. Process.28, 11291142. 10.1109/taslp.2020.2982291

  • 46

    VälimäkiV.LaaksoT. (2000). Principles of fractional delay filters. Proc. IEEE Int. Conf. Acoust. Speech, Signal Process., 38703873. 10.1109/ICASSP.2000.860248

  • 47

    VasquezF. G.MauckC. (2018). Approximation by Herglotz wave functions. SIAM J. Appl. Math.78, 12831299. 10.1137/17m1144234

  • 48

    WedinP.-Å. (1973). Perturbation theory for pseudo-inverses. BIT Numer. Math.13, 217232. 10.1007/bf01933494

  • 49

    WilliamsE. G. (1999). Fourier acoustics – sound radiation and nearfield acoustical holography. United States: Academic Press.

  • 50

    ZengW.-J.XuW. (2012). Fast estimation of sparse doubly spread acoustic channels. J. Acoust. Soc. Am.131, 303317. 10.1121/1.3665992

Summary

Keywords

sound field, moving microphone, Doppler effect, Doppler spreading function, wave equation, acoustic impulse response

Citation

Katzberg F, Maass M and Mertins A (2024) Doppler frequency analysis for sound-field sampling with moving microphones. Front. Sig. Proc. 4:1304069. doi: 10.3389/frsip.2024.1304069

Received

28 September 2023

Accepted

24 January 2024

Published

10 April 2024

Volume

4 - 2024

Edited by

Thomas Dietzen, KU Leuven, Belgium

Reviewed by

Nara Hahn, University of Southampton, United Kingdom

Albert Prinn, Fraunhofer Insititute for Integrated Circuits (IIS), Germany

Updates

Copyright

*Correspondence: Fabrice Katzberg,

† Present address: Marco Maass, German Research Center for Artificial Intelligence (DFKI), Lübeck, Germany

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics