Time and Category Information in Pattern-Based Codes

Sensory stimuli are usually composed of different features (the what) appearing at irregular times (the when). Neural responses often use spike patterns to represent sensory information. The what is hypothesized to be encoded in the identity of the elicited patterns (the pattern categories), and the when, in the time positions of patterns (the pattern timing). However, this standard view is oversimplified. In the real world, the what and the when might not be separable concepts, for instance, if they are correlated in the stimulus. In addition, neuronal dynamics can condition the pattern timing to be correlated with the pattern categories. Hence, timing and categories of patterns may not constitute independent channels of information. In this paper, we assess the role of spike patterns in the neural code, irrespective of the nature of the patterns. We first define information-theoretical quantities that allow us to quantify the information encoded by different aspects of the neural response. We also introduce the notion of synergy/redundancy between time positions and categories of patterns. We subsequently establish the relation between the what and the when in the stimulus with the timing and the categories of patterns. To that aim, we quantify the mutual information between different aspects of the stimulus and different aspects of the response. This formal framework allows us to determine the precise conditions under which the standard view holds, as well as the departures from this simple case. Finally, we study the capability of different response aspects to represent the what and the when in the neural response.


Introduction: Patterns in the neural response
Sensory neurons represent external stimuli.In realistic conditions, different stimulus features (for example, the presence of a predator or a prey) appear at irregular times.Therefore, an efficient sensory system should not only represent the identity of each perceived stimulus, but also, its timing.Colloquially, qualitative differences between stimulus features have been called the what in the stimulus, whereas the temporal locations of the features constitute the when.Spike trains can encode both the what and the when, for example, as a sequence of spike patterns.This idea constitutes a standard view (Theunissen and Miller, 1995;Borst and Theunissen, 1999;Krahe and Gabbiani, 2004), where the timing of patterns indicates when stimulus features occur, while the pattern identities tag what stimulus features happened (Martinez-Conde et al., 2002;Alitto et al., 2005;Oswald et al., 2007;Eyherabide et al., 2008).The information provided by the distinction between different spike patterns is here called category information.In the same manner, the information transmitted by the timing of spike patterns is here called time information.According to the standard view, the category and the time information represent the knowledge of the what and the when in the stimulus, respectively.In this work, we address the conditions under which these assumptions hold, as well as departures from the standard view.
Many studies have shown the ubiquitous presence of patterns in the neural response.The patterns can be, for instance, high-frequency burst-like discharges of varying length and latency.Examples have been found in primary auditory cortex (Nelken et al., 2005), the salamander retina (Gollisch and Meister, 2008), the mammalian early visual system (DeBusk et al., 1997;Martinez-Conde et al., 2002;Gaudry and Reinagel, 2008), and grasshopper auditory receptors (Eyherabide et al., 2009;Sabourin and Pollack, 2009).In other cases, the patterns are spike doublets of different inter-spike interval (ISI) duration.Reich et al. (2000) presented an example of this type in primate V1; and Oswald et al. (2007) found a similar code in the electrosensory lobe of the weakly electric fish.In yet other cases, patterns are more abstract spatiotemporal combinations of spikes and silences defined in single neurons (Fellous et al., 2004) and neural populations (Nádasdy, 2000;Gütig and Sompolinsky, 2006).
If different spike patterns represent different stimulus features, which aspects of the pattern are relevant to the distinction between the different features?To answer this question, previous studies have classified the response patterns into different types of categories, depending on different response aspects.The relevance of each candidate aspect was addressed using what we here define as the category information.For example, in the auditory cortex, Furukawa and Middlebrooks (2002) assessed how informative patterns were when categorised in three different ways, using the first spike latency, the total number of spikes, or the variability in the spike timing.In an even more ambitious study, Gawne et al. (1996) have not only compared the information separately transmitted by response latency and spike count, but also related these two response properties to two different stimulus features: contrast and orientation, respectively.However, these works have not addressed how the stimulus timing is represented by the response patterns.
The role of patterns in signaling the occurrence of the stimulus features can only be addressed in those experiments where the stimulus features appear at irregular times.In this context, previous approaches have estimated the time information (Gaudry and Reinagel, 2008;Eyherabide and Samengo, 2010), or have either employed other statistical measures such as reverse correlation (Martinez-Conde et al., 2000;Eyherabide et al., 2008).The time information was calculated as the one encoded by the pattern onsets alone, without distinguishing between different types of patterns.
In this paper, we analyse the role of timing and categories of patterns in the neural code.To this aim, we build different representations of the neural response preserving one of these two aspects at a time.This allows us to quantify the time and the category information separately.We determine the precise meaning of these quantities and study of their variations for different representations of the neural response.Unlike previous works (Gaudry and Reinagel, 2008;Eyherabide et al., 2009;Foffani et al., 2009), we quantify the information preserved and lost when the neural response is read out in such a way that only the categories (timing) of patterns are preserved.As a result, the relevance of each aspect of the neural response is unambiguously determined.
In principle, the timing and the categories of spike patterns may be correlated.These interactions may be due to properties of the encoding neuron (such as latency codes Furukawa and Middlebrooks, 2002;Gollisch and Meister, 2008), properties of the decoding neuron (when reading a pattern-based code Lisman, 1997;Reinagel et al., 1999), the convention used to assigned a time reference to the patterns (Nelken et al., 2005;Eyherabide et al., 2008), or the convention used to identify the patterns from the neural response (Fellous et al., 2004;Alitto et al., 2005;Gaudry and Reinagel, 2008).A statistical dependence between timing and categories of patterns may, for example, introduce redundancy between the time and category information.Thus, the same information may be contained in different aspects of the response (categorical or temporal aspects).In addition, the statistical dependence might also induce synergy, in which case extracting all the information about the what and the when requires the simultaneous read-out of both aspects.The presence of synergy and redundancy between the time and category information may affect the way each of them represents the what and the when in the stimulus.
In the present study, we provide a formal framework to gain insight of the interaction between the timing and the categories of patterns for different neural codes.We formally define the what and the when as representations of the stimulus preserving only the identities and timing of stimulus features, respectively.
We then establish the conditions under which the pattern categories encode the what in the stimulus, and the timings the when.We also study departures from this standard interpretation, in particular, when the time position of patterns depends on their internal structure.We show the impact of this dependence on both the link with the what and the when and the relative relevance of the timing and categories of patterns.Our study is therefore intended to motivate more systematic explorations of the neural code in sensory systems.

Reduced representations of the neural response
A representation is a description of the neural response.Formally, it is obtained by transforming the recorded neural activity through a deterministic mapping.Throughout this paper, the expressions "deterministic mapping" and "function" are used as synonyms.We only consider functions that transform the unprocessed neural response U into sequences of events e i = (t i , c i ), characterised by their time positions (t i ) and categories (c i ).An event is a definite response stretch.Based on their internal structure, events are classified into different categories, as explained later in this section.Individual spikes may be regarded as the simplest events.In this case, the sequence of events is called the spike representation (see Figure 1A), comprising events belonging to a single category: the category "spikes".
From the spike representation, we can define more complex events, hereafter called patterns (see bold symbols in the spike representation in Figure 1A).Patterns may be defined in terms of spikes, bursts or ISIs (Alitto et al., 2005;Luna et al., 2005;Oswald et al., 2007;Eyherabide et al., 2008).They may involve one or several neurons.Examples of population patterns are coincident firing, precise firing events and sequences, or distributed patterns (Hopfield, 1995;Abeles and Gat, 2001;Reinagel and Reid, 2002;Gütig and Sompolinsky, 2006).The sequence of patterns obtained by transforming the spike representation is called the pattern representation.Analogously, the sequence of patterns only characterised by either their time positions or their categories constitute the time representation and category representation, respectively.Details on how to build these sequences are explained below.For simplicity, these sequences are represented in Figure 1 as sequences of symbols n, indicating specific events (n > 0) and silences (n = 0).Formally, to obtain the spike representation (R), the unprocessed neural response (U) is transformed into a sequence of spikes (1) and silences (0) (Figure 1A).The time bin is taken small enough to include at most one spike.Differences in shape of action potentials are ignored, while their time positions are preserved, with temporal precision limited by the bin size.As a result, several sequences of action potentials may be represented by the same spike sequence (see Figure 1B).
In the pattern representation (B), the spike sequence is transformed into a sequence of silences (n = 0) and spike patterns (n = b > 0), distinguished solely by their category b.For example, in Figure 1, patterns are defined as response stretches containing consecutive spikes separated by at most one silence.The time positions of the pattern is defined as the first spike in each pattern stretch, whereas patterns with the same number of spikes are grouped into the same pattern category.Only information about pattern categories and time positions remains (compare the bold symbols in the spike and the pattern representation in Figure 1.A).By ignoring differences among patterns within categories, several spike sequences can be mapped into the same pattern sequence, as shown in Figure 1C.
The time position of patterns is measured with respect to a common origin, in general, the begin- ning of the experiment.It can be defined, for example, as the first (or any other) spike of the pattern or as the mean response time (Lisman, 1997;Nelken et al., 2005;Eyherabide et al., 2009).Patterns are classified into categories according to different aspects describing their internal structure, such as the latency, the number of spikes or the spike-time dispersion (Gawne et al., 1996;Theunissen and Miller, 1995;Furukawa and Middlebrooks, 2002).Notice that latencies are usually defined with respect to the stimulus onset, which is not a response property (Chase and Young, 2007;Gollisch and Meister, 2008).
Thus, latencies and timing of spike patterns are different concepts, and the latency cannot be read out from the neural response alone.However, latencies have also been defined with respect to the local field potential (Montemurro et al., 2008) or population activity (Chase and Young, 2007).These definitions can be regarded as internal aspects of spatiotemporal spike patterns (Theunissen and Miller, 1995;Nádasdy, 2000).
Categories of patterns can be built by discretizing the range of one or several internal aspects.For example, Reich et al. (2000) defined patterns as individual ISIs, and categorised them in terms of their duration.Three categories were considered, depending on whether the ISI was short, medium or large.In other cases, patterns may be sequences of spikes separated by less than a certain time interval.Categories of patterns can then be defined, depending on the number of spikes in each pattern (Reinagel and Reid, 2000;Martinez-Conde et al., 2002;Eyherabide and Samengo, 2010), as shown in Figure 1, or depending on the length of the first ISI (Oswald et al., 2007).The theory developed in this paper is valid irrespective of the way in which one chooses to define the pattern time positions and the pattern categories.
From the pattern sequence, we obtain the time representation (T) by only keeping the time positions of patterns.As a result, the neural response is transformed into a sequence of silences (0) and events (1), indicating the occurrence of a pattern in the corresponding time bin and disregarding its category.
The temporal precision of the pattern representation is preserved in the time representation.However, by ignoring differences between categories, different pattern sequences can be mapped into the same time representation, as illustrated in Figure 1D.
The category representation (C) is complementary to the time representation.It is obtained from the pattern sequence, by only keeping information about the categories of patterns while ignoring their time positions.The neural response is transformed into a sequence of integer symbols n > 0, representing the sequence of pattern categories in the response.The exact time position of patterns is lost: only their order remains.Therefore, several pattern sequences may be mapped onto the same category sequence, as indicated in Figure 1E.
The spike (R), pattern (B), time (T) and category (C) representations are derived through functions that depend only on the previous representation, as denoted by the arrows in Figure 1A, and formally expressed by the following equations: where h X→Y represents the function h that is applied to the representation X to obtain the representation Y.
These transformations progressively reduce both the variability in the neural response and the number of possible responses where H(X) means the entropy H of the set X (Cover and Thomas, 1991), and |X| indicates its cardinality, i.e. the number of elements of the set X.

Calculation of mutual information rates
The mutual information I(X; S) between two random variables X and S is defined as the reduction in the uncertainty of one of the random variables due to the knowledge of the other.It is formally expressed as a difference between two entropies where H(X) is the total entropy of X and H(X|S) represents the conditional or noise entropy of X provided that S is known (Cover and Thomas, 1991).
We estimate the mutual information between the stimulus S and a representation X of the neural response using the so-called Direct Method, introduced by Strong et al. (1998).The unprocessed neural response U is divided into time intervals U τ of length τ.Each response stretch U τ is then transformed into the discrete-time representation X τ (X τ = h U→X (U τ )), also called words.As a result This inequality is valid for every time interval of length τ (Cover and Thomas, 1991) and is not limited to the asymptotic regime for long time intervals, like in previous calculations (Gaudry and Reinagel, 2008;Eyherabide et al., 2009).The mutual information calculated with words of length τ only quantifies properly the contribution of spike patterns that are shorter than τ.In order to include the correlations between these patterns, even longer words are needed.Therefore, in this study, the maximum window length ranged between 3 and 4 times the maximum pattern duration.
The total entropy (H(X τ )) and noise entropy (H(X τ |S)) are estimated using the distributions of words X τ unconditional (P(X τ )) and conditional (P(X τ |S)) on the stimulus S, respectively.The mutual information I(S; X τ ) is computed by subtracting H(X τ |S) from H(X τ ) (Eq. 3).This calculation is repeated for increasing word lengths, and the mutual information rate I(S; X) between the stimulus S and a representation X of the neural response is estimated as This quantity represents the mutual information per unit time when the stimulus and the response are read out with very long words.In this work we always calculate mutual information rates unless it is otherwise indicated.However, for compactness, we sometimes refer to this quantity simply as "information".
The estimation of information suffers from both bias and variance (Panzeri et al., 2007).In this work, the sampling bias of the information estimation was corrected using the NSB approach for the experimental data (Nemenman et al., 2004).For the simulations, we used instead the quadratic extrapolation (Strong et al., 1998), due to its simplicity and the possibility of generating large amounts of data.The standard deviation of the information was estimated from the linear extrapolation to infinitely long words (Rice, 1995).The bias correction was always lower than 1.5 % and the standard deviation, always lower than 1 %, for all simulations and all word lengths; thus error bars are not visible in the figures.When comparisons between information estimations were needed, one-sided t-tests were performed (Rice, 1995).

Simulated data
Simulations are used to exemplify the theoretical results and to gain additional insight on how different response conditions affect information transmission in well-known neural models and neural codes.They represent highly idealised cases, with unrealistically long runs and number of trials, that allow us to readily exemplify the theoretical results and transparently obtain reliable information estimates.Firstly, we define the parameters used in the simulations and relate them to the specific aspects of the stimulus and the response.Then, we report the specific values for the parameters.

General description
In the simulations, the stimulus consists of a random sequence of instantaneous discrete events, here called stimulus features.Each stimulus feature is characterised by specific physical properties, as for example, the colour of a visual stimulus, the pitch of an auditory stimulus, the intensity of a tactile stimulus, or the odour of an olfactory stimulus (Poulos et al., 1984;Rolen and Caprio, 2007;Nelken, 2008;Mancuso et al., 2009).
In the real world, however, features are not necessarily discrete.If they are continuous, one can discretize them by dividing their domain into discrete categories (Martinez-Conde et al., 2002;Eyherabide et al., 2008;Marsat et al., 2009).The present framework sets no upper limit to the number of features, nor to the similarity between different categories.In addition, features might not be instantaneous but rather develop in extended time windows, as it happens with the chirps in the weakly electric fish (Benda et al., 2005), the oscillations in the electric field potential (Oswald et al., 2007) and the amplitude of auditory stimuli (Eyherabide et al., 2008).In order to capture the duration of real stimuli, in the simulations we define a minimum inter-feature interval λ s min , for each feature s.After the presentation of a feature s, no other feature may appear in an interval lower or equal to λ s min .
In the simulated data, each stimulus feature elicits a neural response (see Figure 2A).Since in this paper we are interested in pattern-based codes, each feature generates a pattern of spikes belonging to some pattern category.The correspondence between stimulus features and pattern categories may be noisy.We consider both categorical noise (the pattern category varies from trial to trial) and temporal noise (the timing of the pattern varies from trial to trial).In Figure 2B, we show examples of all noise conditions using burst-like response patterns.In those examples, categories were defined according to the number of spikes in each burst.
Symbolically, the stimulus S is represented as a sequence of symbols s, one per time bin ∆t.Each s is drawn randomly from the set of all possible outcomes Σ s = {0, 1, . . ., N S }.The symbol s = 0 indicates a silence (the absence of a feature), whereas s > 0 tags the presence of a given feature.Each feature s elicits a response pattern r, drawn from the set Σ r of all possible patterns, with probability P r (r|s).The response pattern r may appear with latency µ r , which might depend on the evoked pattern r.A neural response R, elicited by a sequence of stimulus features, may be composed of several response patterns (see bold symbol sequences in 2A).The stimulus is depicted as an integer sequence of silences (0) and features (s > 0), one symbol per time bin of size ∆t.After a feature arrival, the stimulus remains silent for a period λ S min .The response is represented as a binary sequence of spikes (1) and silences (0).Each stimulus feature elicits a response pattern: A burst containing n spikes.Different categories correspond to different intra-burst spike counts.(B) Examples of different response conditions.Upper panels: no temporal jitter; lower panels: the pattern, as a whole, is displaced due to temporal jitter; left panels: no categorical noise; right panels: each stimulus feature elicits pattern responses belonging to more than a single category.

Details and parameters
Simulated neural responses consisted of four different patterns, elicited by a stimulus with four different features.The response patterns were bursts of spikes, containing between 1 to 4 spikes.The intra-burst ISI was γ min = 2 ms.However, since the neural response is transformed into the pattern representation, the results are valid irrespective of the nature of the patterns (see section 2.1).The stimulus was presented 200 times, each one lasting for 2000 s.The minimum inter-feature time interval is λ min = 12 ms.In all cases, no interference between patterns was considered (see section 3.8).We used a time bin of size ∆t = 1 ms.
Simulation 1: This simulation is used to illustrate the effect of using different representations of the neural response, and to compare an ideal situation where the correspondence between features and patterns is known, with a more realistic case, where the neural code is unknown.The temporal jitter was σ = 1 ms and the latency was µ = 1 ms.Stimulus features probability p(s) were set to: p(1) = 0.06, p( 2

Electrophysiology
Experimental neural data were provided by Ariel Rokem and Andreas V. M. Herz; they performed intracellular recordings in vivo, on the auditory nerve of Locusta Migratoria (see Rokem et al., 2006, for details).
Auditory stimuli consisted of a 3 kHz carrier sine wave, amplitude modulated by a low pass filtered signal with a Gaussian distribution.The AM signal had a mean amplitude of 53.9 dB, a 6 dB standard deviation and a cut-off frequency of 25 Hz (see Figure 3A upper cell).Each stimulation lasted for 1000 ms with a pause of 700 ms between repeated presentations of the stimulus, in order to minimise the influence of slow adaptation.To eliminate fast adaptation effects, the first 200 ms of each trial were discarded.The recorded response (see Figure 3A lower panel) consisted of 479 trials, with a mean firing rate of 108 ± 6 spikes / s (mean ± standard deviation across trials).Burst activity was observed and associated with specific features in the stimulus (see Eyherabide et al., 2008, for the analysis of burst activity in the whole data set).Bursts contained up to 14 spikes; Figure 3B shows the firing probability distribution as a function of the intra-burst spike count.

Information transmitted by different representations of the neural response: spike and pattern information
In order to understand how stimuli are encoded in the neural response, the recorded neural activity U is transformed into several different representations.Each representation keeps some aspects of the original neural response while discarding others.The spike representation R is probably the most widely used (see section 2.1).We define the spike information I(S; R) as the mutual information rate between the stimulus S and the spike representation R of the neural response.
The spike sequence can be further transformed into a sequence of patterns of spikes, called the pattern representation B. To that end, all possible patterns of spikes are classified into pre-defined categories, for example, burst codes, ISI codes, etc. (see section 2.1 and references therein).We define pattern information I(S; B) as the information about the stimulus S, carried by the sequence of patterns B.
The pattern information cannot be greater than the spike information, which in turn cannot be greater than the information in the unprocessed neural response This result can be directly proved from the deterministic relation between U, R and B (Eqs. 1) and the data processing inequality (Cover and Thomas, 1991).Notwithstanding, several neuroscience papers have reported data contradicting Eq. 6 (see section 4.3).Intuitively, out of all the information carried by the unprocessed neural response, the spike information only contains the information preserved in the spike timing.Analogously, out of the information carried in the spike representation, the pattern information only preserves the information carried by both the time positions and the categories of the chosen patterns.

Choosing the pattern representation
In this paper, we quantify the amount of time and category information encoded by pattern-based codes.
This information depends critically on the choice of the pattern representation.In this subsection, we discuss how to evaluate whether a given choice is convenient or not.One can choose any set of pattern categories to define the alphabet of the pattern representation.Some choices, however, preserve more information about the stimulus than others.The comparison between the information carried by different pattern representations gives insight on how relevant to information transmission the preserved structures are (Victor, 2002;Nelken and Chechik, 2007), i.e. formally, on whether they constitute sufficient statistics (Cover and Thomas, 1991).A suitable representation should reduce the variability in the neural response due to noise, while preserving the variability associated with variations in the encoded stimulus.Thus, any representation preserving less information than the spike information is neglecting informative variability.
In addition, one may also be interested in a neural representation that can be easily or rapidly read out, or that is robust to environmental changes, etc.The chosen neural representation typically results from a trade-off between these requirements.
Here we focus on analysing whether the chosen representation alters the correspondence between the stimulus and the response.For us, a good representation is one where the informative variability is preserved, and the non-informative variability is discarded.As an example, we analyse two different situations (Figure 4).In panel A, we use simulated data, where we know exactly how the neural code is structured.We can therefore compare the performance of the spike representation, with two pattern representations: one of them intentionally tailored to capture the true neural code that generated the data, and another representation discarding some informative variability.The neural response consists of a sequence of four different patterns, associated with each of four stimulus features, in the presence of temporal jitter and categorical noise (see section 2.4.1 Simulation 1).In panel B, we study experimental data (see section 2.5), so the neural code is unknown.Therefore, in this case we compare the spike representation with two candidate pattern representations, ignoring a-priori which is the most suitable.
For both simulation and experimental data, we estimated the information conveyed by the spike representation R; a pattern representation B α , where all bursts are grouped into categories according to their intra-burst spike count; and a second pattern representation B β , with only two categories comprising isolated spikes and complex patterns.This is shown in Figure 4, where the information per unit time is plotted as a function of the window size used to read the neural response.The representations are related through functions, in such a way that B β is a transformation of B α , which is in turn a transformation of R. Therefore, I(S; B β ) ≤ I(S; B α ) ≤ I(S; R), for all finite response windows (see Eq. 6).Nevertheless, notice that B β may be a faster-to-read code than B α , since the latter requires a time window long enough to distinguish not only the differences between isolated spikes and bursts, but also the differences among bursts of different categories.In all cases, error bars < 1% (smaller than the size of the data points).
In the simulation (Figure 4A), the information carried by B α is equal to the spike information (I S im (S; R) = I S im (S; B α ) = 254.2± 0.2 bits/s, one-sided t-test, p(10) = 0.5).This is expected since, by construction, the neural code used in the simulations is, indeed, B α .Therefore, in this case, B α is a lossless representation.
The choice of an adequate representation is more difficult in the experimental example (Figure 4B), where the neural code is not known beforehand.In this case, B α preserves less information than the spike sequence (I Exp (S; R) = 133 ± 4 bits/s, I Exp (S; B α ) = 121 ± 3 bits/s, one-sided t-test, p(10) = 0.004).The information I(S; B α ) represents 91 % of the spike information.In general, whether this amount of information is acceptable or not depends on whether the loss is compensated by the advantages of attaining a reduced representation of the response (Nelken and Chechik, 2007).
Distinguishing only between isolated spikes and bursts (B

Informative aspects of the neural response
The pattern representation may preserve one or several aspects of the neural response that could, in principle, encode information about the stimulus.More specifically, if the response is analysed using windows of duration τ, there are several candidate response aspects that might be informative, namely: a-the number of patterns in the window (number of events -Figure 5A) b-the precise timing of each pattern in the window (time representation -Figure 1D) c-the pattern categories present in the window with no specification of their ordering (response set of categories -Figure 5B) d-the temporally-ordered pattern categories in the window (category representation -Figure 1E).
Figure 5: Identifying the information carriers in the neural response.(A) The representation η of the neural response is obtained by transforming the pattern sequence such that only the number of events is preserved.(B) By transforming the pattern sequence into the representation Θ, the information about the categories present in the neural response is preserved, while their order of occurrence is disregarded.
We find that these aspects are related through deterministic functions.Indeed, aspect a can be univocally determined from aspects b, c or d.Thus, the information transmitted by aspect a is also carried by any of the other aspects.In the same manner, aspect c can be determined from d.However, in Appendix B we prove that the number of patterns in the window (aspect a) makes a vanishing contribution to the information rate.
That is, although aspect a might be informative for a finite window of length τ, its contribution becomes negligible in the limit of long windows.Surprisingly, the unordered set of pattern categories (aspect c) also makes no contribution to the information rate, as shown in Appendix C.Even more, the entropy rates of both aspects tend to zero in the limit of long time windows.Therefore, their information rate with respect to any other aspect, of either the stimulus and/or the neural response, vanishes as the window size increases.
We thus do not discuss aspects a and c any further.This is not the case of response aspects b and d.In other words, they may sometimes be informative; their definitions do not constrain them to be non-informative.Therefore, in what follows, we transform the pattern representation into two other representations preserving the precise timing of each pattern (the time representation) and the temporally-ordered pattern categories (the category representation).Our goal is to determine in which way the precise timing of each pattern conveys information about the time positions of stimulus features (the when), and how the temporally ordered pattern categories provide information about the identity of the stimulus features (the what).

Time and category information
We define the time information I(S; T) as the mutual information rate between the stimulus S and the time representation T. In addition, we define the category information I(S; C) as the mutual information rate between the stimulus S and the category representation C. The category information is novel and, unlike and complementing previous studies (Gaudry and Reinagel, 2008;Eyherabide et al., 2009), allows us to address the relevance of pattern categories in the neural code (see section 3.5).Since both T and C are transformations of the pattern representation B (see Eqs. 1), the time and category information cannot be greater than the pattern information, i.e.

I(S; T) I(S;
Here, I(X; Y; Z) = I(X; Y)−I(X; Y|Z) is called triple mutual information (Cover and Thomas, 1991;Tsujishita, 1995).If ∆ S R is positive, time and category information are synergistic: more information is available when T and C are read out simultaneously.Conversely, if ∆ S R is negative, time and category information are redundant.The proof of Eq. 8 and Eq. 9 is shown in Appendix D. Previous studies have already defined the synergy/redundancy for populations of neurons (Schneidman et al., 2003).It has also been applied to single neurons, to determine how different aspects of response patterns encode the identity of single stimulus features (Furukawa and Middlebrooks, 2002;Nelken et al., 2005).Here we extend the concept to encompass also dynamic stimuli where stimulus features arrive at random times, as well as for arbitrary patterns, defined in time and/or across neurons.
As an example, consider the data presented in Figure 4, when the neural responses represented as a sequence of bursts (B α ).For the case of the simulations (Figure 4.A), the time information is I S im (S, T α ) = 180.4± 0.2 bits/s, and the category information, I S im (S, C α ) = 74.2± 0.5 bits/s.The synergy/redundancy term is slightly negative, but not significant (∆ S im S R = −0.4± 0.5, two-sided t-test, p(15) = 0.44).By construction, in the simulation the time and category information are neither redundant nor synergistic.For the experimental data (Figure 4.B), I Exp (S, T α ) = 63 ± 2 bits/s and I Exp (S, C α ) = 50.6 ± 0.6 bits/s.In this case, we don't know whether the time information and the category information are redundant or synergistic before-hand.Yet, by comparing them with the pattern information we obtain ∆ Exp S R = 7 ± 3bits/s, indicating that timings and categories of patterns are slightly synergistic (two-sided t-test, p(15) = 0.063).
The pattern, time and category information depend on the choice of the alphabet of patterns.For example, the category information may increase or decrease depending on the nature of the aspect defining the pattern categories (Furukawa and Middlebrooks, 2002;Gollisch and Meister, 2008).No general rules can be given, predicting these changes: they depend on the neural representation at hand.However, when the alternative pattern representations are linked through functions, some relations between their variations can be predicted, without numerical calculations.Compare, for instance, B α and B β as defined in section 3.1.
By grouping all bursts with more than one spike into a single category, not only As a result, neither the pattern information nor the category information can increase, whereas the time information remains constant.In addition, if T α and C α are independent and conditionally independent given the stimulus, so are T β and C β .Therefore, the difference in the category information equals the difference in the pattern information (I(S; C α ) − I(S; C β ) = I(S; B α ) − I(S; B β )).
Analogously, consider a representation B γ in which the time positions of patterns identified in B α are read out with lower precision (2 ∆t).Since B γ is a function of B α , two different responses B α i and B α j that only differ little in the pattern time positions are indistinguishable in the representation B γ (B γ i = B γ j ).In this case, the comparison between B α and B γ is analogous to the case analysed in the previous paragraph, with the role of the time and category representations interchanged.
We illustrate these results with an example.In Figure 6, the pattern, time and category information are shown for three different choices of the pattern representation.The simulated neural response is taken from Figure 4A.In the three cases, there is no synergy or redundancy between the time and the category information (∆ S R = 0).From Figure 4A, we already know that I(S; B β ) < I(S; B α ).Comparing the left and middle panels of Figure 6, we find that this reduction is due to a decrement in the category information (I(S, C α ) = 74.2± 0.5 bits/s, I(S, C β ) = 28.6 ± 0.3 bits/s, one-sided t-test, p(10) < 0.001), as expected (see section 3.1).In agreement with the theoretical prediction, the time information remains unchanged (I(S, T α ) = I(S, T β ) = 180.4± 0.2 bits/s, one-sided t-test, p(10) = 0.5).Thus, as mentioned previously, a reduction in the precision with which the patterns are read out always decreases the time information, while keeping the category information constant.
In other examples, the variations in the time and category information may not be directly accompanied by variations in the pattern information, due to the presence of synergy and redundancy.For example, Alitto et al. ( 2005) studied the encoding properties of tonic spikes, long-ISI tonic spikes (tonic spikes preceded by long ISIs) and bursts.To evaluate the relevance of distinguishing between tonic spikes and long-ISI tonic spikes, one can compare the information conveyed by two representations: B ξ , preserving the difference between tonic spikes and long-ISI tonic spikes, and B φ , grouping them into the same category (Gaudry and Reinagel, 2008).Both B ξ and B φ only differ in the category representation, like B α and B β .However, unlike those representations, ∆ ξ S R and ∆ φ S R need not be either equal or zero, and thus . Indeed, by reading simultaneously the timing and category of a pattern, the uncertainty on whether the following pattern will be a long-ISI tonic spike is reduced.Hence, this reduction is a source of redundancy in B ξ , where the long-ISI tonic spikes are explicitly identified.On the other hand, the interpattern time interval (IPI) preceding a long-ISI tonic spike may reveal the duration of the previous pattern.Any information contained in it constitutes a source of synergy in B ξ .The distinction between tonic spikes and bursts produces analogous effects on the synergy and redundancy, affecting both representations B ξ and B φ .
As shown in Cover and Thomas (1991) (see proof in Appendix E).The same ordering applies for both bounds, in such a way that, for example, if I(S; T|C) is the least upper-bound, then I(S; T) is the greatest lower bound, from the set of bounds derived in Eq. 10.These bounds are novel, tighter than the bounds previously mentioned Schneidman et al. (2003).
If the left side of Eq. 10 is zero, time and category information are non-redundant (∆ S R ≥ 0).However, they may still be synergistic (0 ≤ ∆ S R ), even in the case when they are both zero (I(S; T) = I(S; C) = 0 ⇒ ∆ S R ≥ 0).This property has often been overlooked (see, for example, Foffani et al., 2009).Time and category information are non-synergistic if and only if the right side of Eq. 10 is zero.From the definition of the synergy/redundancy ∆ S R (Eq.9), we show that where X, Y and Z represent the variables S, T and C in any order.In this case, the time and category information add up to the pattern information.This situation may occur when either I(X; Y) = I(X; Y|Z) = 0 or I(X; Y) = I(X; Y|Z) > 0 (Schneidman et al., 2003;Nirenberg and Latham, 2003).

Relevance and sufficiency of different aspects of the neural response
Previous studies have addressed the relevance of pattern timing in information transmission by quantifying the time information and comparing it with the pattern information (Denning and Reinagel, 2005;Gaudry and Reinagel, 2008;Eyherabide et al., 2009).In other words, the relevance of pattern timing is given by the amount of information carried by a representation that only preserves the time positions of patterns.We call this paradigm criterium I. Indeed, one can also address the relevance of pattern categories using criterium I.However, instead of quantifying the amount of information carried by the category representation, these previous works have determined the information loss due to ignoring the pattern categories.
Here, this point of view is called criterium II.In what follows, we prove that criterium I and criterium II take into account different information, and can thus lead to opposite results when both of them are applied to the same aspect of the response.
Formally, under criterium I, the pattern timing is relevant (or sufficient) for information transmission if Here, ∆I I th represents a previously set threshold.Although Cover and Thomas (1991) have defined sufficiency only for the case when ∆I I th = 0, in practice, some amount of information loss (∆I I th > 0) is usually accepted (Nelken and Chechik, 2007).We can also employ this criterium to address the relevance of pattern categories, comparing On the other hand, under criterium II, the pattern categories are relevant to information transmission if Therefore, pattern categories are relevant if pattern timings transmit little information, irrespective of the information carried by categories themselves.Remarkably, if ∆I I th = ∆I II th , the pattern categories are relevant (irrelevant) if and only if the pattern timings are irrelevant (relevant) (compare Eqs. 12 and 14).
From the bijectivity between B and (T; C) (see section 3.4), we find that criterium II can be written as As a result, under criterium II, the relevance of an aspect depends not only on the information conveyed by that very aspect -as in criterium I -but also on the synergy/redundancy between that aspect and the complementary ones.Both criteria coincide when ∆I I th + ∆I II th = I(S; B) + ∆ S R (compare Eqs. 13 and 15), implying that equality in the thresholds is neither necessary nor sufficient to obtain a coincidence.By using criterium I for the relevance of pattern timing and criterium II for the relevance of pattern categories, the information that is repeated in both aspects (redundant information) only contributes to the relevance of the pattern timing.However, the information that is carried in both aspects simultaneously Analogous results are obtained for the relevance of pattern timing.In addition, different thresholds are used for the relevance of each aspect (compare Eqs. 12 and 14).In the previous example, the pattern timing is relevant only if I(S; T) > 8 bits/s whereas the pattern categories are relevant only if I(S; C) > 2 bits/s, showing an unjustified asymmetry between both aspects.

Time and category entropy of the stimulus
Many studies have interpreted that pattern-based codes function as feature extractors, where the identity of each stimulus feature (the what) is represented in the pattern category C, and the timing of each stimulus feature (the when), in the pattern temporal reference T (see Introduction and references therein).To assess this standard view, we formally define the what and the when in the stimulus, and relate them with the time and category information.In the next subsection, we determine the conditions that are necessary and sufficient for the standard view to hold.Finally, we show that small category-dependent changes in the timing of patterns (such as latencies) may induce departures from the standard view (altering both the amount and the composition of the information carried by T and C).
Since the stimulus S is composed of discrete features (see Methods for a discussion on continuous stimuli), it can also be written in terms of a time (S T ) and a category (S C ) representation, such that S and the pair (S T , S C ) are related through a bijective map.We formally define the what in the stimulus as the category representation S C , and the when as the time representation S T .Indeed, S T indicates when the stimulus features occurred, whereas S C tags what features appeared.
The stimulus entropy is defined as the entropy rate H(S), while the stimulus time entropy and category entropy are the entropy rates H(S T ) and H(S C ), respectively.The time and category entropies are intimately related to when and what features happened: they are a measure of the variability in the time positions and categories of stimulus features, respectively.These quantities were previously defined for Poisson stimuli in Eyherabide and Samengo (2010), and here these definitions are generalised to encompass any stochastic stimulus.Since S and (S T , S C ) are related through a bijective function,

H(S) = H(S T ) + H(S C ) − I(S T , S C ) ;
(16) where the information rate I(S T , S C ) is a measure of the redundancy between the time and category entropies of the stimulus.Since I(S T , S C ) is always non-negative, S T and S C cannot be synergistic.
The standard view of the role of patterns formally implies that the category information I(S, C) (the time information I(S, T)) can be reduced to the mutual information I(S C , C) (I(S T , T)).Therefore, H(S C ) and H(S T ) must be upper bounds for the category and time information, respectively.However, these bounds are not guaranteed by the mere presence of patterns in the neural response.Some cases may be more complicated because, for example, S C and S T may not be independent variables (see section 2.3).A dependency between these two stimulus properties implies that the what and the when are not separable concepts.

The canonical feature extractor
In this section, we determine the conditions under which the standard interpretation holds: The category information represents the knowledge on the what in the stimulus, and the time information, the knowledge on the when.To that aim, we define a canonical feature extractor as a neuron model in which Under each of these conditions, the time and category information become A canonical feature extractor does not require T and C to be independent nor conditionally independent given the stimulus.In other words, the time and category information may or may not be synergistic or redundant, and the timing (category) of each individual pattern may or may not be correlated with other pattern time positions (pattern categories) or even with pattern categories (pattern time positions).
In addition, conditions 17 may also encompass situations in which some information about S C (S T ) is carried by T (C), but not by C (T).
In order to see how synergy and redundancy behave in a canonical feature extractor, we replace Eqs.18 in Eq. 8, and obtain We find that, for a canonical feature extractor, the synergy/redundancy ∆ S R is lower bounded by (see proof in Appendix G).In other words, the synergy/redundancy term ∆ S R cannot be smaller than the redundancy -already present in the stimulus -between the timing and categories of stimulus features.
In addition, the absence of redundancy in the stimulus (I(S T ; S C ) = 0) constrains the neural model to be non-redundant (∆ S R ≥ 0).
Consider a neural model in which T = f (S T ; ψ T ) and T = f (S C ; ψ C ), where ψ T and ψ C are independent sources of noise, such that p(ψ T , ψ C , S T , S C ) = p(ψ T ) p(ψ C ) p(S T , S C ).Thus, T and C are two channels of information under independent noise (Shannon, 1948;Cover and Thomas, 1991).This model constitutes a canonical feature extractor.Indeed, T (C) is only related to S C (S T ) through S T (S C ), thus complying with condition 17a ( 17b).In addition, if S T and S C are independent, then T and C constitute independent channels of information (Cover and Thomas, 1991;Gawne and Richmond, 1993).This model plays a prominent role in the interpretation of neurons and neural pathways as channels of information (Gawne and Richmond, 1993;Schneidman et al., 2003;Montemurro et al., 2008;Krieghoff et al., 2009), as discussed in section 4.5.
The independent channels of information may be regarded as the simplest canonical feature extractor.
Since T and C are independent and conditionally independent given S, the time and category information add up to the pattern information (∆ S R = 0).An example of this model is shown in Figure 7.In the four simulations carried out, the neural responses consist of a sequence of four different patterns, associated with four different stimulus features, under the presence or absence of temporal jitter and categorical noise (see section 2.4.1 Simulation 2 for a detailed description; Figure 2B shows examples of the different noise conditions).In Figure 7, the spike information is omitted because it coincides with the pattern information (all cases, one-sided t-test, p(10) = 0.5).Indeed, by construction, all the information is transmitted by patterns, which can be univocally identified in the response.In agreement with the theoretical results (Eq.18), the time and the category information are always upper-bounded by the stimulus time and category entropy, respectively (all cases, one-sided t-test, p(10) > 0.4).The left side of each panel shows the stimulus entropy, whereas the right side shows the pattern, time and category information.In all cases, ∆ S R = 0, so the pattern information is equal to the sum of the category and the time information.From left to right: The addition of categorical noise reduces only the category information irrespective of the amount of temporal jitter.From top to bottom: The presence of temporal jitter degrades solely the time information irrespective of the amount of categorical noise.The pattern information is upper bounded by the stimulus entropy, the time information by the stimulus time entropy, and the category information by the stimulus category entropy.In all cases, error bars < 1%.For detailed description of the simulation see section 2.4.1 Simulation 2.
Comparing upper and lower panels of Figure 7, we show that the time information is degraded by the addition of temporal jitter (both cases, one-sided t-test, p(10) < 0.001), while the category information remains constant (both cases, one-sided t-test, p(10) > 0.14).Analogously, comparing left and right panels of Figure 7, we find that the addition of categorical noise decreases the category information (both cases, one-sided t-test, p(10) < 0.001), while keeping the time information constant (panel A and B, I(S; T A ) = 223.3 ± 0.1 bits/s, I(S; T B ) = 222.8± 0.1 bits/s, one-sided t-test, p(10) = 0.08; panel C and D, one-sided t-test, p(10) = 0.5).This is expected since, by construction, the categorical noise only depends on the stimulus categories and affects solely the pattern categories, whereas the temporal jitter considered here only affects the pattern time positions, irrespective of their categories or the stimulus.

Departures from the canonical feature extractor
The example shown in Figure 7 turns out to be more complicated if the pattern timing depends on the pattern category, as occurs in latency codes (Gawne et al., 1996;Furukawa and Middlebrooks, 2002;Chase and Young, 2007;Gollisch and Meister, 2008).Indeed, in those cases, the comparison between the timing of response patterns and the timing of stimulus features carries information about the stimulus categories (I(S C ; T|S T ) > 0).As a result, Eq. 17a does not hold.Latency codes may be an intrinsic property of the encoding neuron, may result as a consequence of synaptic transmission (Lisman, 1997;Reinagel et al., 1999), or may either arise from the convention used to construct the pattern representation, for example, ascribing the timing of a pattern as the mean response time, the first or any other spike inside the pattern (Nelken et al., 2005;Eyherabide et al., 2008).In all these cases, a latency-like dependence between the time positions and categories of patterns may arise.
To assess the effect of different latencies associated with each pattern category on the neural response, consider the neural model used in Figure 7, except that now, the pattern latencies vary with the pattern category b, according to Here α µ is the latency index, representing the difference between the latencies of consecutive pattern categories.Three values of α µ were considered: 0, 2 and 4 ms.
When α µ = 0 ms, all patterns have the same latencies.This case was analysed in Figure 7.As α µ increases, so does the latency difference of different patterns.
Due to the deterministic link between the pattern latencies and pattern categories, the pattern representations (B 0 , B 2 and B 4 ), associated with the different values of α µ are related bijectively.In addition, the category representation does not depend on α µ .Only the time representation is altered by a change in the latency index, irrespective of the presence of absence of temporal jitter and categorical noise.Therefore, any change in the time information is immediately reflected in the synergy/redundancy term Here, ∆ x S R and T x represent the synergy/redundancy term and the time representation, respectively, for α µ = x ms.
The impact of different latencies is twofold.In the first place, the presence of categorical noise increments the temporal noise through the deterministic link between latencies and categories.Therefore, the time noise entropy (time information) when α µ > 0 is greater (less) than that when α µ = 0.However, this does not occur when the time and category representations are read out simultaneously.Indeed, given the category representation, any time representation for α µ = x > 0 can be univocally determined from the time representation for α µ = 0, and vice versa, counteracting the effect of the temporal noise.Therefore, the variation in the time noise entropy (H(T x |S) − H(T 0 |S) in Eq. 21) can be regarded as a source of synergy.
In the second place, the variation in the latencies modifies the inter-pattern time interval distribution, incrementing the time total entropy (and the time information) when α µ > 0 with respect to the case when α µ = 0.In addition, this variation introduces information about the pattern categories in the inter-pattern time interval, and consequently it also introduces information about the stimulus identities.For example, a short interval between two consecutive patterns indicates that the second patterns belongs to a category with a short latency.In consequence, the increment in the time total entropy (H(T x ) − H(T 0 ) in Eq. 21) can be regarded as a source of redundancy.
To illustrate these theoretical inferences, the results of the simulations are shown in Figure 8.As expected, when α µ = x > 0, the latencies alter the time information.However, they do not alter the pattern nor the category information, and thus any variation in the time information is compensated by an opposite variation in the synergy/redundancy term.Notice that the changes in the time information not only depend on the latency index, but also on the presence of temporal and categorical noise.Indeed, in the absence of categorical noise, H(T x |S) = H(T 0 |S) = 0, and thus ∆ S R ≤ 0. The effect of the temporal jitter depends on its distribution as well as the distribution of the inter-pattern time intervals, so this analysis if left for future work.In all cases, when latencies depend on the pattern category, the time information is affected while the category information remains unchanged.Furthermore, the addition of categorical noise not only affects the category information but also the time information.In general, how the addition of temporal and/or categorical noise affects the time information depends on the latency index, as well as on the noise already present in the response.For simulation details, see section 2.4.1 Simulation 2. The case where α µ = 0 ms was analysed in Figure 7 and is reproduced here for comparison.In panel A, the case where α µ = 0 ms also represents the stimulus entropies, as shown in Figure 7A.In all cases, error bars < 1%.
In these examples we see that for non-canonical feature extractors, one can no longer say that the pattern categories represent the what in the stimulus and the pattern timings represent the when, not even in the absence of synergy/redundancy.As shown in Eq.21, ∆ S R results from a complex tradeoff between the effect of categorical noise on the total and noise time response entropies.This tradeoff depends on the latency index and the amount of temporal noise in the system, as shown in Figure 8.
Latency-like effects may be involved in a translation from a pattern-duration code into an inter-spike interval code (Reich et al., 2000;Denning and Reinagel, 2005).Indeed, bursts may increase the reliability of synaptic transmission (Lisman, 1997), making it more probable to occur at the end of the burst.In that case, the duration of the burst determines the latency of the postsynaptic firing.In particular, this indicates that bursts can be simultaneously involved in noise filtering and stimulus encoding, in spite of the belief that these two functions cannot coexist (Krahe and Gabbiani, 2004).Notice that here, latency codes have been studied for well-separated stimuli.However, if patterns are elicited close enough in time, they may interfere in a diversity of manners (Fellous et al., 2004), precluding the code from being read out.Although we cannot address all these cases in all generality, the framework proposed here is valid to address each particular case.

Discussion
In this paper, we have focused on the analysis of temporal and categorical aspects, both in the stimulus and the response.Our results, however, are also applicable to other aspects.In the case of responses, these aspects can be latencies, spike counts, spike timing variability, autocorrelations, etc. Examples of stimulus aspects are colour, contrast, orientation, shape, pitch, position, etc.The only requirement is that the considered aspects be obtained as transformations of the original representation, as defined in section 2.1 (see section 4.2).The information transmitted by generic aspects can be analysed by replacing B (S) with a vector representing the selected response (stimulus) aspects.The amount of synergy/redundancy between aspects is obtained from the comparison between the simultaneous and individual readings of the aspects.
In addition, the results can be generalised for aspects defined as statistical (that is, non-deterministic) transformations of the neural response, or of the stimulus.The data processing inequality also holds in those cases (Cover and Thomas, 1991).

Meaning of time and category information and their relation with the what and the when in the stimulus
In this paper, we defined the category and the time information in terms of properties of the neural response.
The category (time) information is the mutual information between the whole stimulus S and the categories C (timing T) of response patterns (see Figure 9A).These definitions only require the neural response to be structured in patterns.No requirement is imposed on the stimulus, i.e. the stimulus need not be divided into features.Our definitions, hence, are not symmetric in the stimulus and the response.In some cases, however, the stimulus is indeed structured as a sequence of features.One may ask how the stimulus identity (the what) and timing (the when) is encoded in the neural response (see Figure 9B).To that end, we defined the what in the stimulus in terms of the category representation (S C ), and the when, in terms of the time representation (S T ).These rigorous definitions allowed us to disentangle how the what and the when in the stimulus are encoded in the category and time representations of the neural response.We calculated the mutual information rates between different aspects of the stimulus and different aspects of the neural response (see Figure 9C).
In the standard view, the pattern categories are assumed to encode the what in the stimulus, and the timing of patterns, the when (Theunissen and Miller, 1995;Borst and Theunissen, 1999;Martinez-Conde et al., 2002;Krahe and Gabbiani, 2004;Alitto et al., 2005;Oswald et al., 2007;Eyherabide et al., 2008).These assumptions have been stated in qualitative terms.There are two different ways in which the standard view can be formalized as a precise assertion.
On one hand, the standard view can be seen as the assumption that the category (time) representation only conveys information about the what (the when).Evaluating this assumption involves the comparison between the information conveyed by the category (time) representation about the whole stimulus (dotted lines in Figure 9C) with the information that this same representation conveys about the what (the when) in the stimulus (solid lines in Figure 9C).

Two different approaches to the analysis of neural codes
In order to understand a neural code, one needs to identify those aspects of the neural response that are relevant to information transmission.To that aim, two different paradigms have been used: criterium I, assessing the information that one aspect conveys about the stimulus, and criterium II, assessing the information loss due to ignoring that aspect (see section 3.5).Previous studies have used criterium I to analyse the relevance of spike counts (Furukawa and Middlebrooks, 2002;Foffani et al., 2009), spike patterns (Reinagel et al., 1999;Eyherabide et al., 2008), and pattern timing (Denning and Reinagel, 2005;Gaudry and Reinagel, 2008;Eyherabide et al., 2009).However, when assessing the relevance of the complementary aspects, such as spike timing and internal structure of patterns, these studies have used criterium II.As a result, in these studies the relevance of the tested aspect is conditioned to the irrelevance of the other aspects.
There are cases where building a representation that preserves a definite response aspect is not evident (nor perhaps possible).Such is the case, for example, when assessing the differential roles of spike timing and spike count: It is not possible to build a representation preserving the timing of the spikes without preserving the spike count (see section 3.3).It is instead possible to only preserve the spike count.Since the spike-count representation is a function of the spike-timing representation, one may argue that there is an intrinsic hierarchy between the two aspects.The same situation is encountered when evaluating the information encoded by the pattern representation, as compared to the spike representation (see section 3.1).
There, it was not possible to construct a representation only containing those aspects that had been discarded in the pattern representation.However, this is not the case when evaluating the differential role between pattern timing and pattern categories, or the relevance of a specific pattern category.
In the present study, we take advantage of both approaches.Firstly, we notice that pattern timing and pattern categories are complementary response aspects, and quantify the information preserved by each aspect (see section 3.4).Then, we determine whether there is synergy or redundancy between the time and category information, which is formally equivalent to comparing the information preserved by (criterium I) and lost due to ignoring (criterium II) each of the two aspects.As a result, we gain insight on the relevance of each aspect as well as how the aspects interact to transmit information (see section 4.4).These procedures can be extended to encompass any two different aspects of the neural response (see section 4.5).
Notice that the role of correlations, both in time and/or across neurons, has been evaluated using criterium II (Brenner et al., 2000;Dayan and Abbott, 2001;Nirenberg et al., 2001;Petersen et al., 2002;Schneidman et al., 2003;Montemurro et al., 2007).However, these authors did not build two complementary representations of the neural response ignoring and preserving the correlations, as proposed here.
Instead, they ignored correlations by constructing artificial neural responses (or artificial response probabilities) where different neurons were independent or conditionally independent.Thus, their analysis involves a comparison between the real and the artificial neural code.Our analysis, instead, is completely based on complementary reductions of the real neural response.Moreover, in previous studies, the artificial neural responses are not a transformed version of the real response in a well defined time window.Thus, in some cases, the difference between the information with and without preserving correlations is not guaranteed to be non-negative by the data processing inequality (Cover and Thomas, 1991).

Representations of the neural response and the data processing inequality
In some previous studies, the information encoded by different response aspects was assessed, as here, by transforming each neural response window (R τ ) of size τ through functions, into the pattern representation (B τ ) (Furukawa and Middlebrooks, 2002;Petersen et al., 2002;Nelken et al., 2005;Gollisch and Meister, 2008).Examples of those response aspects are the first-spike latencies, spike counts, spike-timing variabilities and first (second, third, etc.) spikes in a pattern.As a result, the information carried by the individual response aspects cannot be greater than that provided by the neural response in the same window (I(S; B τ ) ≤ I(S; R τ )), irrespective of the length τ (see section 2.2).In other studies, however, the pattern representation B was obtained by transforming the spike representation inside a sliding window of variable length: the length of the window depended on the category of the actual pattern.Then, B was read out with time windows of size τ.That is the case, for example, when addressing the information conveyed by interspike intervals of length > 38 ms using words of length τ = 14.8 ms (Reich et al., 2000) and by patterns of length > 104 ms, > 10 ms and > 56 ms, using time windows up to 64 ms, 3.2 ms and 16 ms, respectively (Reinagel and Reid, 2000;Eyherabide et al., 2008;Gaudry and Reinagel, 2008).Unlike the first approach, in this case the data processing inequality does not apply, since B τ is not a function of R τ .Therefore, I(S; B τ ) can be larger or smaller than I(S; R τ ).However, when τ → ∞, B τ = B τ , so asymptotically, both approaches coincide.

The role of synergy and redundancy in the search for relevant response aspects
One of the main goals of the analysis of the neural code is to identify the response aspects that are relevant to information transmission.In this context, two important questions arise: how relevant the chosen aspects are, and how autonomously they stand.Their relevance to information transmission is assessed with information-theoretical measures, as exemplified here with the category and time information (see section 4.2).Their autonomy refers to whether each aspect transmits information by itself or not, and whether the transmitted information is shared by other aspects or not.The degree of autonomy is assessed by quantifying the synergy/redundancy term (∆ S R ) between the different aspects.
The concept of synergy/redundancy entails the comparison between the effect of the whole and the sum of the individual effects of the constituent parts.The concept requires the constituent parts to be univocally determined by the whole, as well as the whole to be completely determined given its constituent parts.In other words, the whole and the constituent parts must be related through a bijective function.In neuroscience, the synergy/redundancy between groups of neurons has been addressed by comparing the information carried by the group of neurons (the whole) and the sum of the information of each and every neuron from the group (the constituent parts) (Brenner et al., 2000;Schneidman et al., 2003).As a result, ∆ S R can be interpreted as a trade-off between synergy and redundancy (Schneidman et al., 2003).
Intuitively, the presence of synergy (∆ S R > 0) between two aspects indicates that, for many responses, the aspects must be read out simultaneously in order to obtain information about the stimulus.For some specific responses, however, one of the aspects may be enough to identify the stimulus.But on average, aspects cooperate.On the other hand, the presence of redundancy (∆ S R < 0) indicates that, for many responses, the information conveyed by both aspects overlaps.Therefore, some of the information that can be extracted from one aspect taken alone can also be extracted from the other aspect taken alone.There might still be a few individual responses for which it is necessary to read both aspects simultaneously to obtain information about the stimulus.But on average, messages tend to be replicated in the different aspects.
In the absence of synergy/redundancy (∆ S R = 0), the aspects might or might not be independent and conditionally independent given the stimulus (Nirenberg and Latham, 2003;Schneidman et al., 2003).If they are, then both aspects are fully autonomous.However, if they are not, then synergy and redundancy coexist.Some responses might require the simultaneous read out of both aspects.However, for other responses, at least one of the individual aspects might be enough to obtain information about the stimulus.
In this case, by considering both aspects separately, one cannot recover the entire encoded information.

Applications
The main ideas in this paper can also be extended to encompass any neuron response aspects, different from pattern timing and pattern category.In particular, they allow us to analyse the information conveyed by different types of patterns and the synergy/redundancy between them, extending the formalism derived in Eyherabide et al. (2008).In addition, aspects may also be defined in continuous time since (Mackay and McCulloc, 1952).Even more, any neural population response can be represented as a single sequence of coloured spikes, each colour indicating the neuron that fired the spike (Brown et al., 2004).
Therefore, single neuron codes and population codes can be analysed under the same formalism.
For example, during the last decades, many studies have focused on assessing whether different neurons transmit information about different stimulus aspects (Gawne et al., 1996;Denning and Reinagel, 2005;Eyherabide et al., 2008).To that aim, different neurons (and different neural response aspects) have been interpreted as information channels (Dan et al., 1998;Montemurro et al., 2008;Krieghoff et al., 2009), often addressing whether they constitute independent channels of information (see section 3.7).However, these studies have focused on whether the two aspects (or neurons) are independent and conditionally independent given the stimulus (Gawne and Richmond, 1993;Schneidman et al., 2003).Indeed, in this case, the response aspects constitute independent channels of information.However, these conditions do not identify which stimulus aspects are encoded by different neurons, nor they guarantee that they are independent.
To gain insight on the relation between stimulus and response aspects, we determine whether the neuron constitutes a canonical feature extractor and/or a canonical feature interpreter.For independent channels of information, the response aspects are canonical feature extractors, canonical feature interpreters, and also independent and conditionally independent given the stimulus.However, none of these conditions can be derived from the other.In effect, a canonical feature extractor or a canonical feature interpreter may or may not exhibit synergy or redundancy between the time and category information (see section 3.7 and Appendix H).Moreover, even if T and C are independent and conditionally independent given the stimulus, T (C) may still convey information about S C (S T ) once the information about S T (S C ) has been read out.
Formally, each of the equalities defining a canonical feature extractor or a canonical feature interpreter constitutes a relation between one aspect of the stimulus and one aspect of the response.Such relations cannot be derived from the independence or conditional independence between two aspects of the response.
For the same reason, the what and the when are not guaranteed to be independent aspects.
Finally, the analysis performed in this work relies on the mutual information between the stimulus and different aspects of neural response, and thus it is related to both the encoding operation and the decoding operation (Shannon, 1948;Brown et al., 2004;Nelken and Chechik, 2007).Indeed, the mutual information is symmetric by definition (Cover and Thomas, 1991).Therefore, one can interpret the information between the stimulus and a particular aspect of the neural response from both points of view, characterizing both how the stimulus is encoded into a specific response aspect (Reich et al., 2000;Reinagel and Reid, 2000;Nelken et al., 2005;Eyherabide et al., 2008;Gollisch and Meister, 2008) and what can be inferred about the stimulus from a decoder that only decodes that specific aspect.To this aim, an explicit representation of each response aspect is needed, for example, as defined in section 2.1.Such representations are not always available, as for example, in studies assessing the role of correlations (see section 4.2).

Acknowldgements
This work was supported by the Alexander von Humboldt Foundation, the Consejo de Investigaciones Científicas y Técnicas, the Agencia de Promoción Científica y Tecnológica of Argentina, and Comisión Nacional de Energía Atómica.

A. Categorical noise
The categorical noise is characterised by the probability P b (b|s) that a stimulus feature s elicits a pattern response of category b.This probability is related to P r (r|s), the probability of inducing the response r due to the feature s, according to where the sum runs through all patterns of spikes r which category is b.

B. Event counts transmit information at a vanishing rate
Previous studies have shown that the information per unit time carried by the spike count decreases with the size of the response time window (Petersen et al., 2002;Montemurro et al., 2007).In this appendix, we formally prove this result and also that the information per unit time vanishes in the limit of long windows.We extend its validity not only for spikes, but for any response patterns, as defined in section 2.1, irrespective of the number of pattern categories.To that aim, consider a representation η that only preserves the number of patterns in each response segment R τ of length τ (Figure 5A).In this representation, two responses stretches R 1 τ and R 2 τ are different if and only if they contain a different number of patterns (η(R 1 τ ) η(R 2 τ )), otherwise they are equal.
In a real experiment, patterns (and spikes) are not instant (Mackay and McCulloc, 1952).Thus, without loss of generality, consider the time divided into time bins of size ∆t shorter than the shortest pattern.The number of events present in any response stretch R w of length w bins is bounded by 0 ≤ η w ≤ w, and therefore H(η w ) ≤ log (w + 1).Hence, the entropy rate H(η) becomes zero, since

C. The response set of event categories transmits information at a vanishing rate
In this appendix, we prove that the information per unit time transmitted by the response set of pattern categories decreases with the length of the response time window, and it vanishes in the limit of long time windows.To this aim, we consider a representation Θ in which two response segments are indistinguishable if and only if they have the same pattern categories, irrespective of their temporal ordering (see Figure 5B).
Hence, two neural responses can be different in the category representation and equal in the Θ representation.Analogously to Appendix B, we only consider that the response events are not instant.
We first prove the result for the case where the number |Σ b | of possible different pattern categories is finite; a neural response B may be composed of several response patterns.This is indeed the most frequent situation in the real neural system (Mackay and McCulloc, 1952) (Cover and Thomas, 1991;Tsujishita, 1995).

E. Relation between the upper-and lower-bounds of ∆ S R
The synergy/redundancy term (∆ S R ), defined in Eq. 9, can be written as The implications of condition 17a mentioned in section 3.7 follow directly from this equation.By interchanging S T and S C in Eqs.F-5 and F-6, analogous conclusions can be derived for the category representation.

G. Redundancy bounds for the canonical feature extractor
To prove Eq. 20, we expand This is the bound that we wanted to prove.

H. The canonical feature interpreter
We define a canonical feature interpreter as a neuron model in which

Figure 1 :
Figure 1: Representations of the neural response.(A) In the spike representation, only the timing of action potentials is described, discarding the fine structure of the voltage traces.In the pattern representation, only the timing and categories of spike patterns remain.This representation is further transformed, to obtain the time and the category representations.The time (category) representation only keeps information about the timing (categories) of the spike patterns.(B), (C), (D) and (E) Each successive transformation of the neural response through a deterministic function simultaneously reduces both the variability in the neural response and number of possible responses.

Figure
Figure 2B shows example neural codes with no noise (upper left panel), categorical noise alone (upper right), temporal noise alone (lower left), and a mixture of categorical and temporal noise (lower right).The categorical noise is defined by P b (b|s), quantifying the probability that a response category b be elicited in response to stimulus s (see Appendix A for the relation between P b (b|s) and P r (r|s)).The temporal noise is implemented as jitter in the pattern onset time.That is, temporal jitter affects the pattern as a whole, displacing all spikes in the pattern by the same amount of time.The temporal displacement is drawn from a uniform distribution in the interval (−σ b , σ b ), where the jitter σ b may depend on the pattern b.

Figure 2 :
Figure2: Simulations: Design and construction.(A) Example of a stimulus stretch and the elicited response.The stimulus is depicted as an integer sequence of silences (0) and features (s > 0), one symbol per time bin of size ∆t.After a feature arrival, the stimulus remains silent for a period λ S min .The response is represented as a binary sequence of spikes (1) and silences (0).Each stimulus feature elicits a response pattern: A burst containing n spikes.Different categories correspond to different intra-burst spike counts.(B) Examples of different response conditions.Upper panels: no temporal jitter; lower panels: the pattern, as a whole, is displaced due to temporal jitter; left panels: no categorical noise; right panels: each stimulus feature elicits pattern responses belonging to more than a single category.
) = 0.04, p(3) = 0.03, p(4) = 0.02.Categorical noise (p(b|s), b s): p(i + 1|i) = 0.1 (4 − i), 0 < i < 4; otherwise p(b|s) = 0. Simulation 2: These simulations are used to address the role of the timing and category of patterns in the neural code, and to study the relation with the what and the when in the stimulus.The latency was µ = 1 ms.When present, temporal jitter was set to σ = 1 ms and categorical noise (p(b|s), b s) was given by: p

Figure 3 :
Figure 3: Experimental data from a grasshopper auditory receptor neuron.(A) Upper panel: Sample of the amplitude modulation of the sound stimulus used in the recordings.Lower panel: Response to 30 of 479 repeated stimulus presentations showing conspicuous burst activity.Each vertical line represents a single spike.(B) Probability of firing a burst with n intra-burst spikes, in a time bin of size ∆t = 1 ms.Isolated spikes (n = 1) and burst activity (n > 1) represent 49.4% and 50.6% of the firing events, respectively.

Figure 4 :
Figure 4: Information per unit time transmitted by different choices of patterns.The spike representation (R) is transformed into a sequence of patterns grouped in categories according to the intra-pattern spike count (B α ), which is further transformed into a sequence of patterns classified as isolated spikes or complex patterns (B β ).Comparing the amount of information transmitted gives insight about the relevance of the structures preserved in the representations.(A) Simulation of a neural response with four different patterns, elicited by a stimulus with four different features, in presence of temporal jitter and categorical noise (see section 2.4.1 Simulation 1 for details).(B) Experimental data from a grasshopper auditory receptor neuron.In all cases, error bars < 1% (smaller than the size of the data points).
C) ≤ I(S; B) .(7) When T and C are read out simultaneously, the pair (T, C) carries the same information as the pattern sequence B (I(T, C; S) = I(B; S)).In fact, B and the pair (T, C) are related through a bijective function.To prove this, consider any pattern representation B i of a neural response U i .The pair (T i , C i ) associated with U i is a function of B i (see Eqs. 1).Conversely, given the pair (T i , C i ) associated with U i , all the information about the time positions and categories of patterns present in U i is available, and thus B i is univocally determined.Notice that the pairs (T, C) are a subset of the Cartesian product T × C. The time positions of patterns may depend on their categories, and vice versa.To explore this relationship, and how it affects the transmitted information, we separate the pattern information as I(B; S) = I(S; T) + I(S; C) + ∆ S R ; (8) where ∆ S R represents the synergy/redundancy between the time and the category representations, defined by ∆ S R = −I(S; T; C) .

Figure 6 :
Figure 6: Pattern, time and category information carried by different neural representations.The spike representation is transformed into a sequence of patterns: B α (left): grouped in categories according to the intra-pattern spike count; B β (middle): classified as isolated spikes or complex patterns; and B γ (right): classified as in B α , reading out the time positions with a lower precision (2 ∆t).In all cases, error bars < 1%.The simulation data is taken from Figure 4 (see section 2.4.1 Simulation 1 for details).
, I(S; T; C) is symmetric in S, T and C. Hence, ∆ S R is upper and lower bounded by − I(X; Y) ≤ ∆ S R ≤ I(X; Y|Z) ; (10) where X, Y and Z represent the variables S, T and C in such an ordering that I(X; Y) = min{I(T; C), I(S; T), I(S; C)}

(
synergistic information) only contributes to the relevance of the pattern categories.The discrepancies in this way induced are shown in the following example.Consider that I(S; R) = 10 bits/s, I(S; T) = 9 bits/s, I(S; C) = 10 bits/s and ∆I th = 2 bits/s.Under criterium II, C is irrelevant because I(S; B) − I(S; T) = 1 bit/s < ∆I th .Nevertheless, under criterium I, C is necessarily relevant, since it constitutes a sufficient statistics (I(S; B) = I(S; C)).
)Consequently, the response pattern categories represent what stimulus features are encoded, whereas the pattern time positions represent when the stimulus features occur.In particular, the time and category information are upper bounded by the stimulus time and category entropies, respectively.Condition 17a implies that all the information I(S C ; T) is already contained in the information I(S T ; T).In other words, I(S C ; T) is completely redundant with I(S T ; T), and I(S C ; T) ≤ I(S T ; T).In this sense, we say that the time information represents the when in the stimulus.Analogous implications can be obtained from condition 17b for the category representation C, by interchanging T with C, and S T with S C (see formal proof in Appendix F).Therefore, conditions 17 are necessary and sufficient to ensure that the standard view of the role of patterns in the neural code actually holds (see section 4.1).

Figure 7 :
Figure 7: Information transmitted by a canonical feature extractor under different noise conditions.The left side of each panel shows the stimulus entropy, whereas the right side shows the pattern, time and category information.In all cases, ∆ S R = 0, so the pattern information is equal to the sum of the category and the time information.From left to right: The addition of categorical noise reduces only the category information irrespective of the amount of temporal jitter.From top to bottom: The presence of temporal jitter degrades solely the time information irrespective of the amount of categorical noise.The pattern information is upper bounded by the stimulus entropy, the time information by the stimulus time entropy, and the category information by the stimulus category entropy.In all cases, error bars < 1%.For detailed description of the simulation see section 2.4.1 Simulation 2.

Figure 8 :
Figure8: Examples of departures from the behaviour of the canonical feature extractor: The effect of pattern-category dependent latencies.In all cases, when latencies depend on the pattern category, the time information is affected while the category information remains unchanged.Furthermore, the addition of categorical noise not only affects the category information but also the time information.In general, how the addition of temporal and/or categorical noise affects the time information depends on the latency index, as well as on the noise already present in the response.For simulation details, see section 2.4.1 Simulation 2. The case where α µ = 0 ms was analysed in Figure7and is reproduced here for comparison.In panel A, the case where α µ = 0 ms also represents the stimulus entropies, as shown in Figure7A.In all cases, error bars < 1%.

Figure 9 :
Figure 9: Analysis of the role of spike patterns: Relationship with the what and the when in the stimulus.(A) Categorical and temporal aspects in the neural response.Definitions of time I(S; T) and category I(S; C) information.(B) Categorical and temporal aspects in the stimulus.Information about the what I(S C ; B) and the when I(S T ; B) conveyed by the neural response B. (C) Analysis of the role of patterns in the neural response.Mutual information between different aspects of the stimulus and different aspects of the neural response.
Formally, this means to address whether I(S; C) = I(S C ; C) (whether I(S; T) = I(S T ; T)).In this sense, we say that the category (time) information only represents the what (the when) in the stimulus.A system complying with this first interpretation of the standard view was called a canonical feature extractor (see section 3.7).On the other hand, the second way to define the standard view rigorously is to assume that the what (the when) is completely encoded by the category (time) representation.Testing this second assumption involves the comparison between the information about the what (the when), conveyed by the category (time) representation (solid lines in Figure9C) and by the pattern representation of the neural response (dashed lines in Figure9C).Formally, it involves assessing whether I(S C ; B) = I(S C ; C) (whether I(S T ; B) = I(S T ; T)).In this sense, we say that all the information about the what (the when) in the stimulus is encoded in the category (time) representation of the neural response.A system for which these equalities hold is called a canonical feature interpreter.It is analogous to the canonical feature extractor, with the role of the stimulus and the response interchanged (see Appendix H).The two formalizations of the standard view are complementary.The first one assesses how different aspects of the stimulus are encoded in each aspect on the neural response.The second one focuses on how each aspect of the stimulus is encoded in different aspects of the neural response.Thus, the second approach is a symmetric version of the first one.However, a canonical feature extractor might or might not be a canonical feature interpreter, and vice versa.A perfect correspondence between the what and the when on one side, and pattern timing and categories, on the other, is found for systems that are canonical feature extractors and canonical feature interpreters, simultaneously.
, the information rate carried by η about any other random variable vanishes.In particular, I(S; η) ≤ H(η) = 0.The result is valid for response patterns of any nature (see section 2.1 for the definition and examples of patterns).
, valid for all the examples of pattern-based codes mentioned in section 2.1 and throughout this paper.Consider that the neural response is read with words of length w bins, smaller than the shortest pattern.The number of patterns is bounded by 0 ≤ η w ≤ w (see Appendix B).In addition, each response pattern may belong to one out of |Σ b | pattern categories.Thus, the number of possible different responses Θ w in the representation Θ is upper-bounded by |Θ w | ≤ (w + 1) |Σ b |(Cover and Thomas, 1991).As a result, its entropy is upper-bounded by H(Θ w ) ≤ log |Θ w |, and its entropy rate isH(Θ) = lim w→∞ H(Θ w ) w ≤ lim w→∞ |Σ Θ | log (w + 1) w = 0 .(C-1)Therefore, there is no mutual information rate between the response set of pattern categories and any other random variable.Particularly, I(S; Θ) = 0.We now generalise the result for infinite codes, under the only condition that patterns belonging to different categories have different durations.These codes can be regarded as academic examples since, in any real condition, they would be impractical due to the long time periods required to read out the codewords.Examples of such infinite codes are bursts codes with no restriction in their duration, interspike intervals or latencies divided into an infinite number of finite ranges and the number of spikes in arbitrarily long time response windows.In a neural response of size w bins, only patterns up to a length w can be read (see section 4.3 for examples).In addition, a neural response may contain several patterns.Thus, the sum of the length of the patterns cannot be greater than the length of the response containing them.Under this conditions, the number of response sets of pattern categories |Θ w | is upper-bounded by |Θ w p(k) represents the number of partitions of the integer number k.By using the Hardy-Ramanujan-Uspensky asymptotic approximation(Apostol, 1990) A and B are positive constants, k 0 represents an integer for which the approximation is valid, C = k=k 0 −1 k=0 p(k) and the right-most inequality is valid for long enough words.Therefore, the entropy rate H(Θ) entropy rate H(Θ) tends to zero, and consequently the mutual information rate that the response set of pattern categories can carry about any other random variable vanishes.D. Decomposition of the pattern informationAs mentioned previously, the pattern sequence B and the pair (T, C) carry the same information about the stimulus, since they are related through a bijective transformation.Therefore I(S; B) = I(S; T, C) (D-1a) = I(S; T) + I(S; C|T) + I(S; C) − I(S; C) (D-1b) = I(S; T) + I(S; C) − (I(S; C) − I(S; C|T)) (D-1c) = I(S; T) + I(S; C) − I(S; T; C) (D-1d) = I(S; T) + I(S; C) + ∆ S R ; (D-1e) where Eq.D-1e is obtained from Eq. D-1d by replacing ∆ S R = −I(S; T; C).Here, I(X; Y; Z) = I(X; Y) − I(X; Y|Z) represents the triple mutual information R = I(S; T|C) − I(S; T) (E-2a) = I(S; C|T) − I(S; C) (E-2b) = I(T; C|S) − I(T; C) .(E-2c)Hence, the upper-and lower-bounds of ∆ S R are these upper-and lower-bounds are related through Eqs.E-2, in such a way that min         I(T; C) I(S; T) I(S; C)          = I(X, Y) ⇔ min          I(T; C|S) I(S; T|C) I(S; C|T)          = I(X, Y|Z) ; (E-4)where X, Y and Z represent the variables S, T and C in any order.This proves the upper-and lower-bounds for the synergy/redundancy term ∆ S R of Eq. 10.I(S; X) = I(ST ; X) + I(S C ; X) C ; X) + I(S T ; X|S C ) = I(S T ; X) (F-6b) I(S C ; X) ≤ I(S T ; X) .(F-6c)Notice that this is the case of the time representation in a canonical feature extractor.Indeed, the definition of a canonical feature extractor states that I(S C ; T|S T ) = 0, and consequently ∆ T S R = −I(S C ; T) .(F-7) (T, S T ; C, S C ) = = I(S T ; S C ) + I(T; S C |S T ) =0 (Eq.17a) + I(S T ; C|S C ) =0 (Eq.17b) +I(T; C|S T , S C ) (G-1a) = I(T; C) + I(S T ; C|T) ≥0 + I(T; S C |C) ≥0 + I(S T ; S C |T, C) 17a and 17b, the second and third term of Eq.G-1a vanish respectively, and the synergy/redundancy between the time and category information (∆ S R ) is lower-bounded by − I(S T ; S C ) ≤ ∆ S R .(G-2) these conditions, the information conveyed by the neural response B about the what (I(B; S C )) and about the when (I(B; S T )) becomes I(B; S T ) = I(T; S T ) ≤ H(S T ) (H-4a) I(B; S C ) = I(C; S C ) ≤ H(S C ) .(H-4b) Consequently, the what (the when) in the stimulus is completely represented in the category (time) representation.In other words, I(S C ; T) (I(S T ; C)) is completely redundant with I(S C ; C) (I(S T ; T)), and I(S C ; T) ≤ I(S C ; C) (I(S T ; C) ≤ I(S T ; T)).The canonical feature interpreter is analogous to the canonical feature extractor.In fact, it can be obtained by interchanging the role of the stimulus and the response in section 3.7.
I Exp (S; B β ) = 91 ± 7 bits/s, representing about 68 % of the spike information.In both examples, the representation B α is "more sufficient" than B β .The difference I(S; B α ) − I(S; B β ) constitutes a quantitative measure of the role of distinguishing between bursts of 2, 3, . . ., n spikes, provided that the distinction between isolated spikes and bursts has already been made (I(S; B α |B β )).However, B α still preserves other response aspects, such as pattern timing, number of patterns, etc.In what follows, we study the role of different response aspects in information transmission.
β ) diminishes the information considerably in both examples (one-sided t-test, p(10) < 0.001, both cases).In the simulation, the information carried by B β is I S im (S; B β ) = 208.7 ± 0.6 bits/s, representing about 82.1 % of the spike information.This is expected since, by construction, different stimulus features are encoded by different patterns.For the experimental data,