Event Abstract

Applications of Non-linear Component Extraction to Spectrogram Representations of Auditory Data.

  • 1 Frankfurt Institute for Advanced Studies, Germany

The state-of-the-art in component extraction for many types of data is based on variants of models such as principal component analysis (PCA), independent component analysis (ICA), sparse coding (SC), factor analysis (FA), or non-negative matrix factorization (NMF). These models are linear in the sense that they assume the data to consist of linear superpositions of hidden causes, i.e., they try to explain the data as linear superpositions of generative fields. This assumption becomes obvious in the generative interpretation of these models [1]. For many types of data, the assumption of linear component superposition is a good approximation; an example is the superposition of air-pressure waveforms. In contrast, we here study auditory data represented in the frequency domain. We consider data similar to those processed by the human auditory system just after the cochlea; such data closely correspond to log-power-spectrogram representations of auditory signals. It has long been known that components in such data superpose non-linearly, and that the superposition is well approximated by the point-wise maximum of the individual log-spectrograms [2].
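The log-max approximation mentioned above can be checked numerically. The following minimal sketch (variable names and the choice of uniform synthetic spectra are our own illustration, not from the abstract) compares the exact log-power of a linear power mixture with the point-wise maximum of the individual log-powers; the identity log(a+b) = max(log a, log b) + log(1 + min(a,b)/max(a,b)) bounds the per-bin error by log 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic power spectra standing in for component spectrograms
# (hypothetical values; linear power per frequency bin).
p1 = rng.uniform(0.01, 1.0, size=64)
p2 = rng.uniform(0.01, 1.0, size=64)

# Exact log-power of the linear mixture vs. the log-max approximation.
log_mix = np.log(p1 + p2)
log_max = np.maximum(np.log(p1), np.log(p2))

# Since log(a+b) = max(log a, log b) + log(1 + min(a,b)/max(a,b)),
# the approximation error per bin lies in (0, log 2].
err = log_mix - log_max
```

The bound log 2 (about 3 dB) is the worst case, reached only when both components carry equal power in a bin; wherever one component dominates, the error is much smaller, which is why the max is a good model for speech spectrograms.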

For component extraction from auditory spectrogram data, we therefore investigate learning algorithms based on a class of generative models that assume a non-linear superposition of data components. The component extraction algorithm of Maximal Causes Analysis (MCA; [3]) assumes a maximum combination where other algorithms use a sum. Training such non-linear models is, in general, computationally expensive, but can be made feasible using approximation schemes based on Expectation Maximization (EM). Here we apply an EM approximation scheme based on pre-selecting the most probable causes for every data point. The approximation yields approximate maximum-likelihood solutions and significantly reduces the computational complexity, while at the same time allowing for an efficient, parallelized implementation running on clustered compute nodes. To evaluate the applicability of non-linear component extraction to auditory spectrogram data, we generated training data by randomly choosing and linearly mixing waveforms from a set of 10 different phonemes (sampled at 8000 Hz). We then applied an MCA algorithm based on EM and pre-selection. The algorithm was presented with only the log-spectrograms of the mixed signals. Assuming Gaussian noise, the algorithm was able to extract the log-spectrograms of the individual phonemes. We obtained similar results for different forms of phoneme mixtures, including mixtures of three randomly chosen phonemes each.
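The generative model and the pre-selection idea can be sketched as follows. This is not the authors' implementation; all sizes, the Bernoulli activation prior, the background floor, and the single-cause matching score are illustrative assumptions chosen to mirror the description in [3]: binary causes combine via a point-wise maximum of generative fields, Gaussian noise is added, and the E-step is restricted to a few candidate causes per data point:

```python
import numpy as np

rng = np.random.default_rng(1)

H, D, N = 10, 64, 500     # hidden causes, frequency bins, data points (assumed sizes)
pi, sigma = 0.3, 0.1      # Bernoulli activation prob. and noise std (assumed values)
background = -4.0         # log-power floor when no cause is active (assumption)

# Hypothetical generative fields: one log-spectrogram template per cause.
W = rng.uniform(-2.0, 1.0, size=(H, D))

# Sample binary latent activations; each datum is the point-wise maximum of
# its active causes' fields, plus Gaussian observation noise.
s = rng.random((N, H)) < pi
Y = np.full((N, D), background)
for n in range(N):
    if s[n].any():
        Y[n] = W[s[n]].max(axis=0)
Y = Y + sigma * rng.normal(size=(N, D))

# Pre-selection (E-step approximation): for each data point keep only the
# H' causes whose fields best match the datum under the Gaussian noise model;
# posterior sums are then restricted to combinations of these candidates.
H_prime = 3
score = -((Y[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # shape (N, H)
candidates = np.argsort(score, axis=1)[:, -H_prime:]
```

Restricting the E-step to combinations of H' out of H causes replaces the exponential sum over all 2^H latent states by a sum over 2^H' states per data point, which is what makes a parallelized implementation on clustered compute nodes practical.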


1. Theoretical Neuroscience, P. Dayan and L. F. Abbott (2001), MIT Press.

2. Automatic Speech Processing by Inference in Generative Models, S. T. Roweis (2004), in Speech Separation by Humans and Machines, Springer, pp. 97-134. (Roweis credits Moore, 1983, with first pointing out the log-max approximation.)

3. Maximal Causes for Non-linear Component Extraction, J. Lücke and M. Sahani (2008), JMLR 9:1227-1267.

Conference: Bernstein Conference on Computational Neuroscience, Frankfurt am Main, Germany, 30 Sep - 2 Oct, 2009.

Presentation Type: Poster Presentation

Topic: Abstracts

Citation: Bornschein J and Lücke J (2009). Applications of Non-linear Component Extraction to Spectrogram Representations of Auditory Data. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience. doi: 10.3389/conf.neuro.10.2009.14.115

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 27 Aug 2009; Published Online: 27 Aug 2009.

* Correspondence: Jorg Bornschein, Frankfurt Institute for Advanced Studies, Frankfurt, Germany, bornschein@fias.uni-frankfurt.de