Channel identification machines for multidimensional receptive fields

We present algorithms for identifying multidimensional receptive fields directly from spike trains produced by biophysically-grounded neuron models. We demonstrate that only the projection of a receptive field onto the input stimulus space may be perfectly identified and derive conditions under which this identification is possible. We also provide detailed examples of identification of neural circuits incorporating spatiotemporal and spectrotemporal receptive fields.


INTRODUCTION
Recently, a novel class of system identification algorithms, called Channel Identification Machines (CIMs), has been developed for identifying dendritic processing in spiking neural circuits (Lazar andSlutskiy, 2010, 2012). In the simplest setting, CIMs allow one to identify a communication/processing channel in [Filter]-[Asynchronous Sampler] circuits, where the effect of the channel on an input signal can be described by a linear filter and the output of the channel is mapped, or encoded, into a time sequence by a non-linear asynchronous sampler. Such [Filter]-[Asynchronous Sampler] circuits are known in the literature as Time Encoding Machines (TEMs) and their specific embodiments in neuroscience include neural circuit models in which an analog dendritic stimulus processor is followed by a point neuron model encoding the aggregate dendritic current produced by the processor. A few examples of asynchronous samplers include asynchronous sigma/delta modulators (ASDMs), conductance-based point neuron models such as Hodgkin-Huxley, Morris-Lecar, Fitzhugh-Nagumo, Wang-Buzsáki, Hindmarsh-Rose, oscillators with multiplicative coupling and simpler models such as the integrate-and-fire neuron or the threshold-and-fire neuron (Brown et al., 2004;Lazar andSlutskiy, 2012, 2014a).
The above-mentioned asynchronous samplers incorporate the temporal dynamics of spike generation at the axon hillock of biological neurons and allow one to consider, in particular for neuroscience applications, more biologically-plausible non-linear spike generation (sampling) mechanisms. This is in contrast to existing methods, such as the generalized linear model (GLM), which typically assumes a simplified description of the spike generation dynamics by using a static non-linearity. The non-linear contribution of a dynamical system such as the Hodgkin-Huxley neuron model is not static, but dynamic and stimulus-driven (Lazar and Slutskiy, 2014a). It changes from one spike to the next and thus affects the estimation procedure if not properly taken into account. Furthermore, since the non-linearity does not fully capture the complex spike generation dynamics, the filters fit to that non-linearity may not provide an adequate description of the neural circuit, and in particular may confound dendritic processing and encoding (Lazar and Slutskiy, 2014a).
More recently, the CIM methodology has been extended to neural circuits in higher brain centers that receive multidimensional spike trains as input stimuli instead of continuous signals and to circuits with extensive lateral connections and feedback (Lazar and Slutskiy, 2014a). Together with TEMs and Time Decoding Machines (TDMs) for decoding stimuli from spike trains, CIMs provide a unified framework for studying encoding, decoding and identification in neural systems.
The motivation for multidimensional CIMs (Lazar and Slutskiy, 2013) is provided by the concept of a receptive field that is well established in neuroscience. Introduced in 1906 by Sherrington (Sherrington, 1906) to describe an area of the body surface capable of eliciting a reflex in response to a stimulus, the term "receptive field" has been extended to many different sensory modalities and spans many different types of neurons. For example, in the visual system, the receptive field of a photoreceptor is a 3-dimensional cone in space comprising all possible directions in which light can hit the photoreceptor. In the auditory system, receptive fields can correspond to certain spectral regions of audio stimuli. More broadly, the receptive field is that part of the the sensory space that can evoke a neuronal response (Dayan and Abbott, 2005). Spatial and spatiotemporal receptive fields have been successfully used in vision to model retinal ganglion cells in the retina as well as neurons in the lateral geniculate nucleus and the visual cortex (see DeAngelis et al., 1995 for a review). Similarly, spectrotemporal receptive fields have been used to describe responses of auditory neurons (Aertsen and Johannesma, 1981), neurons in cochlear nuclei (Clopton and Backoff, 1991) and neurons in the auditory cortex (Kowalski et al., 1996).
Circuits that process multidimensional feedforward input are often encountered in biological systems (e.g., the retina, Pillow et al., 2008). From a modeling standpoint, one can also consider spiking neural circuits in which, in addition to feedforward input, every neuron may receive lateral inputs from neurons in the same layer. Furthermore, back-propagating action potentials (Waters et al., 2005), or feedback, may affect computations performed within the dendritic tree. Until now, CIMs for circuits with lateral connectivity and feedback have only been considered in the context of scalar or vector-valued signals of one variable, e.g., functions of time u 1 (t), t ∈ R.
In this paper we discuss multidimensional channel identification machines that allow one to identify signal transformations applied to multidimensional signals u n (x 1 , . . . , x n ), n ∈ N, where x n typically designates the time variable. A few examples of multidimensional CIMs include (i) spatial CIMs, where the input signal u 2 (x, y) is a function of a two-dimensional space, describing, e.g., images; (ii) spectrotemporal CIMs, where the input signal u 2 (ν, t) is a function of spectrum and time, describing, e.g., auditory signals; (iii) spatiotemporal CIMs, where the input signal u 3 (x, y, t) is a function of space and time, describing, e.g., video signals. Signal transformations are performed by dendritic stimulus processors. They are modeled here as receptive fields whose output is encoded by spiking neurons with lateral connectivity (cross-feedback) and feedback.
The rigorous mathematical formalism employed here enables us to obtain two key results: (i) the projection of the multidimensional receptive field onto the input signal space can be perfectly identified, and (ii) the problem of identification of multidimensional receptive fields is dual to the problem of neural decoding. The duality result shows that the identification of a multidimensional receptive field is equivalent to the problem of encoding its projection with a neural circuit whose receptive field has an impulse response that is exactly the multidimensional input test signal. We provide an identification algorithm and give detailed examples arising in spiking models of audition and vision.

The space of input stimuli
A multidimensional communication/processing channel of interest is shown in Figure 1. An analog signal u n (x 1 , . . . , x n ) of n dimensions is passed through a channel with memory that describes a physical communication link or a signal processing stage. The output of the channel v is then mapped, or encoded, by an asynchronous sampler into the time sequence (t k ) k∈Z . In the example shown in Figure 1, the asynchronous sampler is a (leaky) IAF neuron. We model input signals as elements of a Reproducing Kernel Hilbert Space (RKHS) (Berlinet and Thomas-Agnan, 2004). For practical and computational reasons we choose to work with the multidimensional space of trigonometric polynomials H n defined below. However, the results are not limited to this particular RKHS.
Here i is the bandwidth in dimension x i , L i is the order, and T i = 2π L i / i is the period, for all i = 1, . . . , n, and H n is endowed with the inner product ·, · : Note that given the inner product in (1), the set of elements e l 1 ...l n (x 1 , . . . , x n ) forms an orthonormal basis in H n . Moreover, H n is an RKHS with a reproducing kernel (RK) given by  In what follows we provide two-and three-dimensional signal models that arise in audition and vision. For convenience, we use a simpler and widely-used notation.
Example 1. We model spectrotemporal stimuli u 2 (ν, t) as elements of an RKHS of trigonometric polynomials H 2 defined on with ( 1 , L 1 ) and ( 2 , L 2 ), being the (bandwidth, order) pairs in the spectral direction ν and in time t, respectively. A signal u 2 ∈ H 2 can be written as where the coefficients u l 1 l 2 ∈ C and functions form an orthonormal basis for the (2L 1 + 1)( Example 2. We model spatiotemporal stimuli u 3 (x, y, t) as elements of an RKHS of trigonometric polynomials H 3 defined FIGURE 1 | Multidimensional problem setting. A known multidimensional signal u n (x 1 , x 2 , . . . , x n ), is first passed through a communication/processing channel. A non-linear sampler then maps the output v of the channel into an observable time being the (bandwidth, order) pairs in spatial directions x and y and in time t, respectively. A visual stimulus u 3 ∈ H 3 can be written as where coefficients u l 1 l 2 l 3 ∈ C and functions form an orthonormal basis for the (2L 1 + 1)(2L 2 + 1)(2L 3 + 1)dimensional space H 3 .

Modeling multidimensional processing
In the simplest setting, the communication/processing channel is described by a receptive field with a kernel h n (x 1 , . . . , x n ). The kernel is assumed to be causal in the time variable (if any) and BIBO-stable. We also assume that the kernel has a finite support of length S i ≤ T i in each direction x i . In other words, each kernel h n belongs to the space H n .

Definition 2. The filter kernel space H n is given by
Since the length of the filter support is smaller than or equal to the period of the input signal in each dimension, we effectively require that for given S i and fixed input signal bandwidth i , the order L i of the space H n satisfies L i ≥ S i · i /(2π ) for all i = 1, . . . , n.
Definition 3. The operator P : H n → H n given (by abuse of notation) by is called the projection operator.
Since Ph n ∈ H n , we have

MULTIDIMENSIONAL [FILTER]-[IAF] TEMs AND THEIR T-TRANSFORMS
In this section we analyze the general single-input single-output (SISO) multidimensional TEM shown in Figure 1 and describe in detail its I/O behavior. We then provide two specific examples of SISO multidimensional TEMs that are often encountered in neuroscience.
Without loss of generality, we assume that memory effects in the neural circuit arise only in the temporal dimension t of the stimulus and interactions in other dimensions are multiplicative in their nature. Consequently, the output v of the multidimensional receptive field is given by a convolution in the temporal dimension and integration in all other dimensions, i.e., The temporal signal v(t) represents the total dendritic current flowing into the spike initiation zone, where it is encoded into spikes by a point neuron model, such as the (leaky) IAF neuron of Figure 1 . The mapping of the multidimensional stimulus u n into a temporal sequence (t k ) k ∈ Z is described by the set of equations also known as the t-transform (Lazar, 2004;Lazar and Tóth, 2004), where Assuming the stimulus u n (x 1 , . . . , x n − 1 , t) ∈ H n and using the kernel representation, we have where y = (y 1 , . . . , y n ) and dy = dy 1 dy 2 . . . dy n .
Let us now define the linear functional L k : By the Riesz representation theorem there is a function φ k ∈ H n such that We arrived at the following Lemma 1. The SISO multidimensional TEM with a receptive field described by a kernel h n = h n (x 1 , . . . , x n−1 , t) provides irregular samples, or quantal measurements of the projection Ph n of the kernel h n onto the input stimulus space H n . In other words, the t-transform may be rewritten as an inner product Remark 1. The result above has a simple interpretation. First, information about the receptive field is encoded in the form of quantal measurements q k . These measurements can be readily computed from the spike times (t k ) k ∈ Z . Second, the information about the receptive field is only partial and depends on the stimulus space H n used in identification. Specifically, q k 's are measurements not of the original kernel h n but of its projection Ph n onto the space H n . Figure 2.

Example 3. A SISO Spectrotemporal TEM is shown in
, is the input to a communication/processing channel with kernel h 2 (ν, t). The signal u 2 (ν, t) may represent the time-varying amplitude of a sound in a frequency band centered around ν and h 2 (ν, t) the spectrotemporal receptive field (STRF) (Kowalski et al., 1996). The output v of the kernel is encoded into a sequence of spike times (t k ) k∈Z by the leaky integrate-and-fire neuron with a threshold δ, bias b and membrane time constant RC. A spectrotemporal TEM can be used to model the processing or transmission of, e.g., auditory stimuli characterized by a frequency spectrum varying in time (Kowalski et al.,  1996). The operation of such a TEM can be fully described by the t-transform with q k given by (4) for all k ∈ Z.

Example 4. A simple SISO Spatiotemporal TEM is shown in
, appears as an input to a communication/processing channel described by a filter with a kernel h 3 (x, y, t).
The output v of the kernel is encoded into a sequence of spike times (t k ) k ∈ Z by the leaky integrate-and-fire neuron. A spatiotemporal TEM can be used to model the processing or transmission of, e.g., video stimuli characterized by a spatial component varying in time. The t-transform of such a TEM is given by with q k given by (4) for all k ∈ Z.
Assuming the video stimulus u 3 (x, y, t) ∈ H 3 , Equation (9) can be written as where L k : H 3 → R is a linear functional. By the Riesz representation theorem, there is a function φ k ∈ H 3 such that where L k : H 2 → R is a functional. As before, by the Riesz representation theorem, there is a function φ k ∈ H 2 such that L k (Ph 2 ) = Ph 2 , φ k .

IDENTIFICATION ALGORITHMS
In the previous section we demonstrated a relationship between the problem of identification of a receptive field and an irregular sampling problem. Namely, we showed that a projection Ph n of the multidimensional receptive field h n is embedded in the output spike sequence of the neuron as samples, or quantal measurements, q k of Ph n . A natural followup question to ask is how to reconstruct Ph n from these measurements. We have the following result.

Theorem 1. (SISO Multidimensional CIM)
Let {u i n | u i n ∈ H n } N i = 1 be a collection of N linearly independent stimuli at the input to a [Filter]-[Leaky IAF] circuit with a multidimensional receptive field h n ∈ H n . If the number of signals N ≥ n − 1 p = 1 (2L p + 1) and the total number of spikes produced in response to all stimuli is greater than n p = 1 (2L p + 1) + N, then the filter projection Ph n can be perfectly identified from a collection of input/output ( and, therefore, In matrix form we have q i = i h, with q i k = q i k , where the elements i kl = φ i l 1 ... l n k , with the column index l traversing all possible subscript combinations of l 1 , l 2 , . . . , l n and [h] l = h l 1 ,...,l n . Repeating for all signals i = 1, . . . , N, we obtain q = h with q = q 1 ; q 2 ; . . . ; q N and = 1 ; 2 ; . . . ; N . This system of linear equations can be solved for h, provided that the rank r( ) of the matrix satisfies r( ) = n p = 1 (2L p + 1). A necessary condition for the latter is that the total number of spikes generated by all N neurons is greater or equal to n p = 1 (2L p + 1) + N. Then h = + q, where + denotes a pseudoinverse of . To find the coefficients φ i l 1 ...l n k , we note that Finally, we note that since the dendritic current v has a maximum bandwidth of t , we only need 2L t + 1 measurements to specify it. Thus, in response to each stimulus u i n , the neuron can produce a maximum of only 2L t + 1 informative measurements, or equivalently, 2L t + 2 informative spikes on the interval [0, T t ]. It follows that if the neuron generates ν ≥ 2L t + 2 spikes for each test signal, the minimum required number of signals is N = n − 1 p = 1 (2L p + 1)(2L t + 1)/(2L t + 1) = n − 1 p = 1 (2L p + 1). Similarly, if the neuron generates ν < 2L t + 2 spikes for each signal, then the minimum required number of signals is N = n p = 1 (2L p + 1)/(ν − 1) . The block diagram of the identification procedure and algorithm are shown in Figure 4. Identification of the filter h n has been reduced to the encoding of the projection Ph n with a SIMO TEM whose receptive fields are u i n , i = 1, . . . , N.

SISO multidimensional CIM with lateral connectivity and feedback
Generalizing the ideas above, one can consider more complex spiking neural circuits, in which every neuron may receive not only feedforward inputs, but also lateral inputs from neurons in the same layer and back-propagating action potentials may contribute to computations within the dendritic tree. A two-neuron circuit incorporating these considerations is shown in Figure 5.
Each neuron j processes a visual stimulus u j 3 (x, y, t) using a distinct spatiotemporal receptive field h 1j1 3 (x, y, t), j = 1, 2. The processing of lateral inputs is described by the temporal receptive fields (cross-feedback filters) h 221 and h 212 , while various signals produced by back-propagating action potentials are modeled by the temporal receptive fields (feedback filters) h 211 and h 222 . The aggregate dendritic currents v 1 and v 2 , produced by the receptive fields and affected by back propagation and cross-feedback, are encoded by IAF neurons into spike times (t 1 k ) k∈Z , (t 2 k ) k∈Z .

Theorem 2. (SISO multidimensional CIM with lateral connectivity and feedback)
Let be a collection of N linearly independent vector stimuli at the input to two neurons with multidimensional receptive fields h 1j1 n ∈ H n , j = 1, 2, lateral receptive fields h 212 , h 221 and feedback receptive fields h 211 and h 222 . Let (t 1 k ) k∈Z and (t 2 k ) k∈Z be the sequences of spike times produced by the two neurons. If the number of test stimuli N ≥ n−1 p=1 (2L p + 1) + 2 and the total number of spikes produced by each neuron in response to all stimuli is greater than n p=1 (2L p + 1) + 2(2L n + 1) + N, then the filter projections Ph 211 , Ph 212 , Ph 221 , Ph 222 and Ph   Proof: Essentially the same as in Theorem 1, with an addition of lateral and feedback terms. Each additional temporal filter requires (2L n + 1) additional measurements, corresponding to the number of bases in the temporal variable t. Figures 6-9, and corresponding figure legends demonstrate the performance of the multidimensional Channel Identification Machine.

RESULTS
In simulations pertaining to the spectrotemporal receptive field (see also Figure 6), we used the short-time Fourier transform of an arbitrarily chosen 200 ms segment of the Drosophila courtship song as a model of the STRF. The space of spectrotemporal signals H 2 had bandwidth 1 = 2π · 80 rad/s and order L 1 = 16 in the spectral direction ν and bandwidth 2 = 2π · 120 rad/s and order L 2 = 24 in the temporal direction t. The STRF appeared in cascade with an ideal IAF neuron (see Figure 2), whose parameters were chosen so that it generated a total of more than (2L 1 + 1)(2L 2 + 1) = 33 × 49 = 1, 617 measurements in response to all test signals. We employed a total of N = 40 spectrotemporal signals (which is larger than the (2L 1 + 1) = 33 requirement of Theorem 1) in order to identify the STRF.
In simulations involving the spatiotemporal receptive field (see also Figures 7, 8 we used a spatial Gabor function that was either rotated, dilated or translated in space as a function of time. The space of spatiotemporal signals H 3 had bandwidth 1 = 2π · 12 rad/s and order L 1 = 9 in the spatial direction x, bandwidth 2 = 2π · 12 rad/s and order L 2 = 9 in the spatial direction y, and bandwidth 3 = 2π · 100 rad/s and order L 3 = 5 in the temporal direction t. The STRF appeared in cascade with an ideal IAF neuron (see Figure 2), whose parameters were chosen so that it generated a total of more than (2L 1 + 1)(2L 2 + 1)(2L 3 + 1) = 19 × 19 × 11 = 3, 971 measurements in response to all test signals. In order to identify the projection Ph 3 we employed a total of N = 400 spatiotemporal signals, a number that is larger than the (2L 1 + 1)(2L 2 + 1) = 361 requirement of Theorem 1).
In simulations involving the spatial receptive field (see also Figure 9), we used a static spatial Gabor function. The space of spatial signals H 2 had bandwidths 1 = 2 = 2π · 15 rad/s and L 1 = L 2 = 12 in spatial directions x and y. The STRF appeared in cascade with an ideal IAF neuron (see Figure 2 as a reference), whose parameters were chosen so that it generated a total of more than (2L 1 + 1)(2L 2 + 1) = 25 × 25 = 625 measurements in response to all test signals. In order to identify the projection Ph 2 we employed a total of N = 688 spatial signals (a number that is larger than the (2L 1 + 1)(2L 2 + 1) = 625 requirement of Theorem 1).
Identification results for the circuit in Figure 5 are shown in Figure 10. The spatiotemporal receptive fields used in this simulation were non-separable. The first receptive field was modeled as a single spatial Gabor function (at time t = 0) translated in space with uniform velocity as a function of time, while the second

IMPLICATIONS FOR MULTIDIMENSIONAL ENCODING AND DECODING
The duality between multidimensional channel identification and stimulus decoding problems allowed us to derive identification algorithms for estimation of receptive fields of arbitrary dimensions and precise conditions under which the identification is possible. At this point it is important to pause and analyze the relationship between the dual problems. As it often turns out, looking at a dual problem can provide a tremendous insight about the primal problem. Interestingly, previous results for video time encoding and decoding machines provided only the necessary condition of having enough spikes to decode the video Lazar and Pnevmatikakis, 2011). This condition naturally follows from having to invert a matrix in order to compute the basis coefficients of the video signal. Since the matrix needs to be full rank to provide a unique solution, and there are a total of (2L 1 + 1)(2L 2 + 1)(2L 3 + 1) coefficients involved, (2L 1 + 1)(2L 2 + 1)(2L 3 + 1) + N spikes are needed from a population of N neurons (the number of spikes is larger than the number of needed measurements by N since every measurement q k is computed using two consecutive spikes t k and t k + 1 .).
Note that a necessary condition only tells us that the number of spikes must have been greater than (2L 1 + 1)(2L 2 + 1)(2L 3 + 1) + N if we were able to recover the video signal. In order to guarantee that the video can be recovered we need a sufficient condition.
The sufficient condition can be derived by drawing comparisons between the decoding and identification problems. In identification, estimation of a receptive field from a single trial is usually not possible, even if the neuron produces a large number of spikes (Lazar and Slutskiy, 2014b). Intuitively, this is because the output of the receptive field is just a function of time. In essence, all dimensions of the stimulus are compressed into just onethe temporal dimension-and we need only (2L 3 + 1) measurements to specify a temporal function. As a result, only (2L 3 + 1) measurements are informative and we do not gain any new information if the neuron is oversampling the temporal signal. Thus, if the neuron is producing at least (2L 3 + 1) measurements per each test stimulus, we need N ≥ (2L 1 + 1)(2L 2 + 1) different trials to reconstruct a (2L 1 + 1)(2L 2 + 1)(2L 3 + 1)-dimensional receptive field. Similarly, to decode a (2L 1 + 1)(2L 2 + 1)(2L 3 + 1)-dimensional input stimulus, we need N ≥ (2L 1 + 1)(2L 2 + 1) neurons, with each neuron in the population producing at Frontiers in Computational Neuroscience www.frontiersin.org September 2014 | Volume 8 | Article 117 | 11 least (2L 3 + 1) measurements. If each neuron produces less than (2L 3 + 1) measurements, a larger population N is needed to faithfully encode the video signal. More generally, if the n-dimensional input stimulus is an element of a (2L 1 + 1)(2L 2 + 1) . . . (2L n + 1)-dimensional RKHS (where the last dimension is time), and the neuron is producing at least at least (2L n + 1) + 1 spikes per test stimulus, a minimum of (2L 1 + 1)(2L 2 + 1) . . . (2L n − 1 + 1) linearly independent stimuli, or trials with linearly independent stimuli, are needed to identify the receptive field. This condition is sufficient and by duality between channel identification and time encoding, complements the previous necessary condition derived for time decoding machines.

EXTENSIONS
In the derivations above we implicitly assumed that the I/O system was noiseless. In practice, noise is introduced either by the channel or the sampler itself . In the presence of noise it is not possible to identify the projection Ph n loss-free. However, the analysis/methodology presented above can be extended within an appropriate mathematical setting to I/O systems with noisy measurements. For example, we can still identify an optimal estimate Ph n of Ph n with respect to an appropriately defined cost function, e.g., by using the Tikhonov regularization method. The regularization methodology extensively discussed in Lazar and Slutskiy (2014a) can be adopted with minor modifications to our setting here.
Finally we note that, for convenience and simplicity of notation, the asynchronous sampler used throughout this paper was the IAF neuron. Extensions of our results to neural circuits built with other biophysically-grounded neuron models can be readily obtained by adapting the methodology developed in Lazar andSlutskiy (2012, 2014a).