Edited by: Paul Miller, Brandeis University, United States
Reviewed by: Maoz Shamir, Ben-Gurion University of the Negev, Israel; Tim Gollisch, University of Göttingen, Germany
*Correspondence: Joel T. Kaardal
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The signal transformations that take place in high-level sensory regions of the brain remain enigmatic because of the many nonlinear transformations that separate responses of these neurons from the input stimuli. One would like to have dimensionality reduction methods that can describe responses of such neurons in terms of operations on a large but still manageable set of relevant input features. A number of methods have been developed for this purpose, but often these methods rely on the expansion of the input space to capture as many relevant stimulus components as statistically possible. This expansion lowers the effective sampling density, thereby reducing the accuracy of the estimated components. Alternatively, so-called low-rank methods explicitly search for a small number of components in the hope of achieving higher estimation accuracy. Even with these methods, however, noise in the neural responses can force the models to estimate more components than necessary, again reducing the methods' accuracy. Here we describe how a flexible regularization procedure, together with an explicit rank constraint, can strongly improve the estimation accuracy compared to previous methods suitable for characterizing neural responses to natural stimuli. Applying the proposed low-rank method to responses of auditory neurons in the songbird brain, we find multiple relevant components making up the receptive field for each neuron and characterize their computations in terms of logical OR and AND computations. The results highlight potential differences in how invariances are constructed in visual and auditory systems.
Signal processing in neurobiological systems involves multiple nonlinear transformations applied to multidimensional inputs. Characterizing these transformations is difficult but essential to understanding the neural basis of perception. For example, neurons from successive stages of sensory systems represent inputs in terms of increasingly complex combinations of stimulus features (Felleman and Van Essen,
Present dimensionality reduction methods for recovering receptive fields of sensory neurons can be roughly divided into linear and quadratic methods. Linear methods attempt to reconstruct components of a neuron's receptive field by correlating the neural response with a set of features composed of stimulus components,
Methods designed to recover this relevant subspace can be susceptible to bias when the model is constructed based on incorrect assumptions. For instance, the spike-triggered covariance (STC) method (the quadratic analog of the STA method) assumes that the stimulus components are drawn from a Gaussian white noise distribution (Bialek and de Ruyter van Steveninck,
Our proposed method builds on the second-order MNE model for the probability of a binary response given a set of stimuli (Fitzgerald et al.,
where unknown weights
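In this logistic form, the response probability is a sigmoidal function of a quadratic form in the stimulus. A minimal sketch (variable names `a`, `h`, and `J` are illustrative stand-ins for the constant, linear, and quadratic weights of the second-order MNE model):

```python
import numpy as np

def mne_spike_probability(s, a, h, J):
    """Second-order MNE model: logistic function of a quadratic form in the stimulus.

    s : stimulus vector (d,), a : scalar bias, h : linear weights (d,),
    J : symmetric quadratic weights (d, d).
    """
    energy = a + h @ s + s @ J @ s
    return 1.0 / (1.0 + np.exp(energy))
```

With all weights at zero the model is maximally uncertain, returning a probability of 0.5 for every stimulus.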
For currently available datasets, the over-expansion of the stimulus space can be a severe limitation leading to overfitting of quadratic models. To resolve this issue, we designed low-rank MNE models with an explicit rank constraint where a rank
An optimal low-rank MNE model is one that minimizes the negative log-likelihood function with respect to the weights
where
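Under the low-rank parameterization, the quadratic weight matrix is factored into tall matrices so that the full d × d matrix never needs to be formed explicitly. A hedged sketch of the mean negative Bernoulli log-likelihood with J = UVᵀ (names are illustrative, not the article's notation):

```python
import numpy as np

def lowrank_mne_nll(S, y, a, h, U, V):
    """Mean negative Bernoulli log-likelihood of a rank-r MNE model with J = U @ V.T.

    S : stimuli (N, d), y : binary responses (N,), U, V : (d, r) factors.
    """
    # quadratic term s^T U V^T s computed without forming the full d x d matrix J
    quad = np.einsum('ni,ni->n', S @ U, S @ V)
    energy = a + S @ h + quad
    p = 1.0 / (1.0 + np.exp(energy))   # P(spike | s)
    eps = 1e-12                        # numerical floor for the logarithms
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```

At zero weights every stimulus receives probability 0.5, so the mean negative log-likelihood equals log 2 regardless of the responses.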
Additional structure can be imposed on the weights by adding a penalty function to the mean negative log-likelihood (Equation 2). Nuclear-norm regularization is defined as the sum of absolute values of the eigenvalue spectrum (i.e.,
and instead take the trace over
where ‖·‖2 is the ℓ2-norm and
This surrogate regularization readily works with gradient based methods for optimization, whereas diagonalization of
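The standard factored surrogate bounds the nuclear norm from above: for any factorization J = UVᵀ, ‖J‖_* ≤ (‖U‖²_F + ‖V‖²_F)/2, with equality attained by a balanced factorization built from the SVD. A numerical sanity check of this bound (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 3))
V = rng.standard_normal((20, 3))
J = U @ V.T

nuclear = np.linalg.svd(J, compute_uv=False).sum()   # exact nuclear norm of J
surrogate = 0.5 * (np.linalg.norm(U)**2 + np.linalg.norm(V)**2)
assert nuclear <= surrogate + 1e-9                   # surrogate upper-bounds ||J||_*

# a balanced factorization from the SVD attains the bound with equality
P, s, Qt = np.linalg.svd(J, full_matrices=False)
Ub = P * np.sqrt(s)
Vb = Qt.T * np.sqrt(s)
balanced = 0.5 * (np.linalg.norm(Ub)**2 + np.linalg.norm(Vb)**2)
assert abs(balanced - nuclear) < 1e-8
```

Because the surrogate is a smooth function of the factors, it avoids the repeated diagonalization that the exact nuclear norm would require inside a gradient-based optimizer.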
While the bilinear factorization of
with proof that this guarantees the rank of
From a practical point of view, the bilinear formulation of the symmetry constraints (Equation 5) is potentially problematic since its Jacobian can be rank-deficient and it introduces
where π
is the
Putting this all together, the low-rank MNE method is a nonlinear program of the form:
This problem can be transformed from a constrained to an “unconstrained” optimization via the Lagrangian method:
where
is the objective function. Alternatively, one may directly substitute
where
Unlike the full-rank MNE optimization, the low-rank MNE optimization is a nonconvex problem (see the discussion surrounding Prop 2 in the
When the rank of the ground-truth matrix
To find a feasible local minimizer of the low-rank MNE problem (Equation 8) for a given set of nuclear-norm regularization parameters, a line search interior-point method designed to find local minima of nonlinear, nonconvex programming problems based on Ch. 19 of
for the weight and Lagrange multiplier update directions,
The trouble with using this interior-point method to solve Equation (8) is that the size of matrix
The block coordinate descent algorithm cyclically solves the subproblems:
until the KKT conditions (Prop 1 in the
while holding the remaining
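The alternating structure can be illustrated on a toy coupled quadratic, where each block subproblem is solved exactly while the other block is held fixed (a sketch of the cyclic scheme only, not of the MNE subproblems themselves):

```python
def bcd_quadratic(max_iter=100, tol=1e-12):
    """Cyclic block coordinate descent on f(x, y) = x^2 + y^2 + x*y - x - y.

    Each block subproblem is solved exactly while the other block is held
    fixed, mirroring the alternating structure of the low-rank MNE updates.
    The unique minimizer of f is (1/3, 1/3).
    """
    x = y = 0.0
    for _ in range(max_iter):
        x_new = (1.0 - y) / 2.0       # argmin over x with y fixed
        y_new = (1.0 - x_new) / 2.0   # argmin over y with x fixed
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            x, y = x_new, y_new
            break
        x, y = x_new, y_new
    return x, y
```

Each full cycle contracts the error toward the joint minimizer, which is why cycling over blocks converges to a stationary point of the coupled problem.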
Now we turn to the procedure for setting the nuclear-norm regularization parameters. In the globally optimal domain (Prop 4 in the
Low-rank MNE block coordinate descent algorithm (globally optimal domain)
[Numbered pseudocode steps not recoverable from the extracted text.]
Our approach to the hyperparameter optimization in the locally optimal domain exploits the structure of the block coordinate descent subproblems (Equation 13) where the gradient and Hessian of the block
Low-rank MNE block coordinate descent algorithm (locally optimal domain)
[Numbered pseudocode steps not recoverable from the extracted text.]
We tested both of these algorithms and found the locally optimal domain to be the more appropriate for recovering receptive field components. In the applications of low-rank MNE to model neurons and avian auditory neurons (for details about the data, see the methods section), we found that the minimum amount of regularization necessary to reach the globally optimal domain was unreasonably large (ϵ
The global optimization procedure outlined in Algorithm
Depending on the application, there are several possible ways to choose the rank of
If instead one intends to find a compressed representation of
We now illustrate the proposed method by analyzing responses of model neurons. Details about the model data may be found in Section 4.4 in methods. First, we tested the method on two model neurons with different signal-to-noise ratios (SNR) (cf. Figure
From a qualitative point of view, the low-rank model performs better than the full-rank model because the matrix
We also found fits of the low-rank model to be resilient even when the rank of
We now show that the proposed low-rank MNE method offers substantial improvement in our ability to resolve multiple relevant components of sensory neurons' receptive fields by applying it to recordings from the avian auditory forebrain (Gill et al.,
Low-rank MNE, full-rank MNE, and STC models were optimized on a dataset of 50 avian auditory forebrain neurons. Logical OR and logical AND FB models were fit using linear combinations of the subspace components of each method. Logical AND FB models from two example neurons are shown
The ultimate utility of methods for receptive field reconstruction is to produce models that can inform our understanding of the transformations performed by high-level sensory neurons. Toward that goal, one can subject the components obtained from low-rank reconstructions to a functional basis (FB) transformation (Kaardal et al.,
The difference between the negative log-likelihood of the best logical AND (
By using low-rank MNE models that are resistant to both overfitting and the biases of naturalistic stimuli, we have estimated multiple components relevant to responses of neurons from the avian auditory forebrain with much greater accuracy than by prior methods. Interestingly, we found that receptive fields of neurons from field L and CM, relatively high-level regions of the avian auditory forebrain, were composed of few components (
We demonstrated that the low-rank MNE models produced better predictive models than full-rank MNE, STC, and linear MNE models and did so with fewer components than the full-rank MNE models across the population of neurons. There are several reasons why this improvement is observed. As mentioned before, MNE models in principle produce better reconstructions of the relevant components than STC for stimuli drawn from distributions other than Gaussian white noise. We saw this in practice, where the STC models performed worse than the low-rank and linear MNE models on both model neurons and recordings from auditory neurons subject to correlated stimuli. With regard to the full-rank MNE models, the low-rank MNE models have three major advantages: (i) the number of components necessary to optimize is explicitly reduced, (ii) the optimization procedure uses nuclear-norm regularization to eliminate fictitious components, and (iii) significant components are recovered via a nonlinear matrix factorization. The first reason is simply a matter of reducing overfitting since fewer weights were estimated in the low-rank MNE models than the full-rank MNE models. To the second point, both the full-rank and low-rank MNE methods used a form of regularization in an attempt to reduce this overfitting, but the early exiting procedure used by the full-rank MNE method (Fitzgerald et al.,
Ultimately, since the optimization of full-rank MNE models is convex, the weights that minimize the negative log-likelihood represent the ground truth as captured by the training data. Thus, if the full-rank MNE model's representation of
Overall, the efficiency of the low-rank optimization for the extraction of multiple input components makes it possible to begin to resolve a long-standing puzzle of how high-level auditory responses can combine selectivity to sharp transients with integration over broad temporal components. We find that it is possible to reconstruct many more components more accurately than was possible before. The results highlight an interesting potential difference between high-level visual and auditory responses. In vision, current studies (Serre et al.,
The full-rank MNE method (Fitzgerald et al.,
where
First-order MNE models solve (Equation 15) with
The STA and STC methods are standard methods for analyzing receptive fields of neurons stimulated by Gaussian white noise stimuli (Schwartz et al.,
where
one the spike-weighted mean stimulus covariance and the other the mean stimulus covariance. As with the STA, the stimuli are zero-centered. The matrix
and the relevant components are spanned by the eigenvectors corresponding to the largest variance eigenvalues. Determining where to cut-off the eigenvalue spectrum is done by randomly shuffling the responses in the training set to break correlations between the stimuli and responses (Bialek and de Ruyter van Steveninck,
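The construction above can be sketched in a few lines of numpy (array names are illustrative): the STC difference matrix is the spike-weighted stimulus covariance minus the mean stimulus covariance, and candidate relevant directions are its extremal eigenvectors.

```python
import numpy as np

def sta_stc(S, y):
    """Spike-triggered average and covariance analysis.

    S : zero-centered stimuli (N, d); y : spike counts (N,).
    Returns the STA and the eigendecomposition of the STC difference matrix.
    """
    sta = (y @ S) / y.sum()
    spike_cov = (S * y[:, None]).T @ S / y.sum()   # spike-weighted stimulus covariance
    prior_cov = S.T @ S / len(S)                   # mean stimulus covariance
    dC = spike_cov - prior_cov
    evals, evecs = np.linalg.eigh(dC)              # relevant directions: extremal eigenvectors
    return sta, evals, evecs
```

As a sanity check, a response that is constant across all stimuli carries no information, so both the STA and the STC eigenvalue spectrum vanish.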
Since STC is not equipped with a nonlinearity, we optimize a full-rank MNE model with all
To contend with the strong correlations present in the data sets, we repeated the above STC analysis with stimulus correlations removed through data whitening. Data whitening removes the mean correlations between elements of the stimulus space such that the mean covariance of the stimulus samples is the identity matrix. This can be a beneficial pre-processing step for STC models since STC models are biased when the stimulus space is not Gaussian white noise distributed. There are standard ways of whitening data known as principal component analysis (PCA) and zero-phase component analysis (ZCA) whitening (Bell and Sejnowski,
Then, for the PCA whitening transform the stimuli are transformed as
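Both transforms can be built from the eigendecomposition of the stimulus covariance; they differ only by a final rotation back to the original axes (a sketch with illustrative names):

```python
import numpy as np

def whitening_transforms(S):
    """PCA and ZCA whitening matrices from the stimulus covariance.

    S : stimuli (N, d). Returns (W_pca, W_zca) such that the centered,
    whitened stimuli S_c @ W.T have identity covariance.
    """
    Sc = S - S.mean(axis=0)
    C = Sc.T @ Sc / (len(Sc) - 1)
    evals, E = np.linalg.eigh(C)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(evals))
    W_pca = D_inv_sqrt @ E.T      # rotate to principal axes, then rescale
    W_zca = E @ W_pca             # rotate back: symmetric "zero-phase" whitening
    return W_pca, W_zca
```

ZCA whitening (E D^(-1/2) Eᵀ) keeps the whitened stimuli as close as possible to the originals, which preserves the interpretability of the recovered components in the original spectro-temporal coordinates.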
The FB method (Kaardal et al.,
and logical OR:
where
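As an illustration of the distinction (these are generic probabilistic combination rules, not the article's exact fitted parametric forms), an AND-like cell responds only when every projection onto its relevant features is activated, while an OR-like cell responds when any one of them is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logical_and_response(projections):
    """AND-like nonlinearity: all feature projections must be activated.

    Illustrative form only; the article's FB models fit parametric nonlinearities.
    """
    return np.prod(sigmoid(projections), axis=-1)

def logical_or_response(projections):
    """OR-like nonlinearity: at least one feature projection must be activated."""
    return 1.0 - np.prod(1.0 - sigmoid(projections), axis=-1)
```

Under these forms, an AND cell stays nearly silent unless every projection is strong, while an OR cell fires for any single strong projection.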
In prior applications, the FB method has produced basis vector spaces that are equal to
The low-rank MNE method was tested on synthetic data generated from model receptive fields: (i) a neuron with a high SNR and (ii) a neuron with a low SNR. In this case, high and low SNR correspond to the relative decisiveness of each model neuron's response to a stimulus. For instance, a high SNR model neuron is more likely to have a nonlinearity where
The correlated Gaussian stimuli were generated by first drawing sample vectors,
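A standard way to produce such stimuli is to color white Gaussian noise with a Cholesky factor of the desired covariance (a sketch; the article's specific covariance structure is not reproduced here):

```python
import numpy as np

def correlated_gaussian_stimuli(n_samples, cov, seed=0):
    """Draw stimuli with a prescribed covariance by coloring white Gaussian noise.

    cov : desired (d, d) covariance matrix (symmetric positive definite).
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)                       # cov = L @ L.T
    white = rng.standard_normal((n_samples, cov.shape[0]))
    return white @ L.T                                # each sample has covariance cov
```

Because each white sample w is mapped to L w, the resulting samples have covariance E[L w wᵀ Lᵀ] = L Lᵀ = cov.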
The data samples for each neuron are divided into 70% training (green), 20% cross-validation (blue), and 10% test (red) sets. The samples that appear in each set are varied across the 4 jackknives.
Data from the CRCNS database provided by the Theunissen laboratory was composed of
We processed the stimuli using MATLAB's spectrogram function on each sound clip and adjusted the frequency resolution of the spectrograms to strike a well-balanced compromise between frequency and temporal resolution. Each Hamming window of the spectrogram had a 50% overlap with its neighbors. We found that a 250 Hz frequency resolution, which is coupled with a 2 ms temporal resolution, gave a reasonable spectro-temporal resolution in the linear weights (
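An equivalent computation can be written in Python with `scipy.signal.spectrogram` (a sketch; the sampling rate and log compression here are assumptions, not taken from the article): a 4 ms Hamming window with 50% overlap yields a 2 ms hop and a 1/(4 ms) = 250 Hz frequency bin width.

```python
import numpy as np
from scipy.signal import spectrogram

def stimulus_spectrogram(sound, fs):
    """Spectrogram with ~250 Hz frequency and ~2 ms temporal resolution.

    sound : 1-D waveform; fs : sampling rate in Hz (assumed known).
    """
    nperseg = int(round(0.004 * fs))   # 4 ms Hamming window
    noverlap = nperseg // 2            # 50% overlap -> 2 ms hop
    f, t, Sxx = spectrogram(sound, fs=fs, window='hamming',
                            nperseg=nperseg, noverlap=noverlap)
    return f, t, np.log(Sxx + 1e-12)   # log power is common for STRF-style models
```

For a pure tone, the time-averaged log power should peak in the frequency bin containing the tone.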
We selected 41 of the 189 field L neurons and 9 of the 37 CM neurons for analysis. These neurons were chosen based on whether a spectro-temporal window of the stimuli could be identified that produced an estimate of a single-component receptive field with an observable structure. For each neuron, the STA (Schwartz et al.,
The number of stimulus samples ranges from 9,800 to 58,169 with a median sample size of 42,474, and the spike counts are between 276 and 29,121 with a median count of 6,120. First-order MNE, low-rank MNE, full-rank MNE, STC, and FB models were trained, cross-validated, and tested on 70%/20%/10% of the data, respectively, over four jackknives incremented in the same way as for the model neuron data (i.e., Figure
Since we already knew that
Summary of parameter values used in our application of Algorithm

| Parameter | Value | Description |
| --- | --- | --- |
| — | Varies | Maximum rank of |
| π1, … , π | Varies | Constraint signs |
| ϵmax | 0.5 | Maximum value of the regularization parameters |
| — | 501 | Number of different values the regularization parameter can assume, forming a uniform grid from 0 to ϵmax |
| — | 70% of samples | Indices of data samples that form the training set |
| — | 20% of samples | Indices of data samples that form the cross-validation set |
| — | 20 | Maximum number of iterations of the block coordinate descent algorithm |
| δp | 0 (machine precision) | Convergence precision |
| σmax | 3 | Number of allowed failures to improve cross-validation performance |
Two measurements were used to evaluate the quality of our models. The overlap metric (Fitzgerald et al.,
measures how well the receptive field is recovered as measured on an interval
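A common way to implement such a subspace comparison (a sketch; the article's exact overlap metric follows Fitzgerald et al.) uses the cosines of the principal angles between the estimated and ground-truth subspaces:

```python
import numpy as np

def subspace_overlap(A, B):
    """Overlap between the subspaces spanned by the columns of A and B.

    Returns a value in [0, 1]: 1 when the subspaces coincide, 0 when they
    are orthogonal. Computed from the singular values (cosines of the
    principal angles) of Qa^T Qb.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.mean(cosines))
```

Because the metric depends only on the spans, it is invariant to any invertible mixing of the components within each subspace, which matters when comparing models that recover the same subspace in different bases.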
Once the mean components were recovered from
In cases where model parameters are determined from multiple training and cross-validation sets, there is a risk that, due to the unique biases of each dataset's sampling of stimulus and response space, the datasets may produce weights that disagree on the value of
According to the Wigner semi-circle law, in the limit
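The practical consequence is a noise threshold: the eigenvalues of a pure-noise symmetric random matrix are (asymptotically) confined to ±2σ√n, so eigenvalues outside this band can be deemed significant. A quick numerical illustration (matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 400, 1.0
A = rng.normal(scale=sigma, size=(n, n))
W = (A + A.T) / np.sqrt(2.0)          # symmetric, off-diagonal variance sigma^2
evals = np.linalg.eigvalsh(W)

# the spectrum is (approximately) confined to [-2*sigma*sqrt(n), 2*sigma*sqrt(n)]
edge = 2.0 * sigma * np.sqrt(n)
assert np.all(np.abs(evals) < 1.1 * edge)
```

An eigenvalue of a measured matrix that lies well beyond this semicircle edge is unlikely to be explained by sampling noise alone.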
Statistical approach for choosing 〈
[Numbered pseudocode steps not recoverable from the extracted text.]
The interior-point method, block coordinate descent algorithm, and FB method were written in Python 2.7 using standard numerical packages numpy (version 1.11.1) and scipy (version 0.18.1) and the machine learning package Theano (version 0.8.2) (Al-Rfou et al.,
This study was carried out with the approval of the Animal Care and Use Committee at the University of California, Berkeley.
The author contributions are itemized below: Model development-JK; Data acquisition-FT; Data analysis and interpretation-JK, TS; Drafting of the manuscript and revising it critically for important intellectual content-JK, FT, TS.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We would like to thank the members of the Sharpee group at the Salk Institute for Biological Studies for helpful discussions and insight.
The Supplementary Material for this article can be found online at: