Dual Roles for Spike Signaling in Cortical Neural Populations

A prominent feature of signaling in cortical neurons is that of randomness in the action potential. The output of a typical pyramidal cell can be well fit with a Poisson model, and variations in the Poisson rate repeatedly have been shown to be correlated with stimuli. However while the rate provides a very useful characterization of neural spike data, it may not be the most fundamental description of the signaling code. Recent data showing γ frequency range multi-cell action potential correlations, together with spike timing dependent plasticity, are spurring a re-examination of the classical model, since precise timing codes imply that the generation of spikes is essentially deterministic. Could the observed Poisson randomness and timing determinism reflect two separate modes of communication, or do they somehow derive from a single process? We investigate in a timing-based model whether the apparent incompatibility between these probabilistic and deterministic observations may be resolved by examining how spikes could be used in the underlying neural circuits. The crucial component of this model draws on dual roles for spike signaling. In learning receptive fields from ensembles of inputs, spikes need to behave probabilistically, whereas for fast signaling of individual stimuli, the spikes need to behave deterministically. Our simulations show that this combination is possible if deterministic signals using γ latency coding are probabilistically routed through different members of a cortical cell population at different times. This model exhibits standard features characteristic of Poisson models such as orientation tuning and exponential interval histograms. In addition, it makes testable predictions that follow from the γ latency coding.

A general way that cortical neurons can be characterized is in terms of their receptive fields. Collections of these can be interpreted mathematically as a library of functions, termed basis functions, that can encode any signal in terms of a sum of pairwise products of the basis functions and associated coefficients. Thus, a vector of sensory or motor data values distributed over a space x, I(x), may be approximated with N such functions using:

introduCtion
Although individual visual cortical neurons exhibit deterministic spiking in special instances, e.g., during sleep cycles (Drestexhe and Sejnowski, 2001), for the most part they exhibit predictably random spiking behavior that can be modeled closely as a Poisson process (Softky and Koch, 1993) with baseline rates that have been shown to correlate with experimental parameters in hundreds of experiments. Because of this extensive data set, it often is taken for granted that a neuron's basic function is to communicate a scalar parameter by the spike rate. Nonetheless there is reason to be dissatisfied with this stance. Typical firing rates of individual cells are low, in the range 10-100 Hz, whereas motor responses to visual stimuli in the form of saccades can be as fast as 100 ms (Kirchner and Thorpe, 2006). This means that the communication of rate is unrealistically slow if it depends on individual cells. For this reason the classical approach to explain how fast behavioral responses are possible has used population codes (e.g., Georgopoulos et al., 1986;Averbeck et al., 2006;Chen et al., 2006) that interpret the combined output of large numbers of cells as a faster and more accurate signal. However, using a substantial number of cells with each communicating the same unary code also seems unsatisfactory (Softky, 1990(Softky, , 1996. In the first place, population codes can be expensive in terms of the number of synapses required to support them. Even for modest precision and confidence in estimating an analog value such as rate, over a 100 synapses can be required for a single input (see A basis function specified by the vector u i (x) represents the synapses comprising the receptive field of the i-th neuron, and the scalar r i is a coefficient, or strength, that that cell should signal. The simplicity of Eq. 1 belies a fundamental implication of this model and that is that the component cells' spikes must be used in two different tasks simultaneously, each having different requirements. Spikes must firstly be used to help a cell learn its receptive field by adjusting the strength of its synapses, and secondly communicate the coefficient that codes the receptive field's portion of the stimulus. These two tasks have radically different information processing requirements. A cell's receptive field exhibits long-term plasticity and the synapses that compose it are continually being adjusted (Bonomano and Merzenich, 1998;Sanes and Donoghue, 2000). This task is slow and incremental, occurring prominently during development, but also in adulthood, and uses aggregates of inputs. In contrast, signaling via spikes occurs vary rapidly in the course of responding to a stimulus and typically uses just a few spikes over a very fast 100∼300 ms timescale. Equation 1 indirectly specifies both of these tasks. If the input is an ensemble of data samples {I}, the plasticity task is to learn a set of basis functions {u}. However, if these basis functions are fixed, as they are assumed to be on short time scales, then the task is quickly to signal a specific input via an appropriate set of coefficients {r}.
Tremendous progress has been made in developing mathematical formulations that solve both of these problems in a way that explains biological features, such as the orientation distributions of simple cells in striate cortex, by adding a term to Eq. 1 that penalizes solutions that use many non-zero coefficients (Rao and Ballard, 1996;Olshausen and Field, 1997;Lewicki and Sejnowski, 2000;Bell and Sejnowski, 2003;Rehn and Sommer, 2007). The crucial idea is that, in order to minimize some cost, such as metabolic energy, the cortex develops many more coding cells that the minimum necessary, so that for any particular stimulus, a small set of cells with basis functions specialized for that stimulus suffices. This minimal coding strategy is referred to as the sparse coding principle and having a surfeit of coding cells characterizes the population as being overcomplete. Although the main modeling demonstrations illustrating these points have been in the circuitry connecting the lateral geniculate nucleus to striate cortex, as we do here, the results are indicative of a much more general principle that may extend to all of cortex (Rao and Ballard, 1999).
The sparse coding results are important, but so far the majority of models have used abstractions of neurons with signed real numbers for outputs, which in turn assume some kind of population coding, instead of directly dealing with action potential spikes. The crucial issue is: if the cortex is to use a minimal number of spiking cells, how are they to communicate analog values? There have been several ingenious methods to do so by exploiting timing codes (Eliasmith and Anderson, 2003;Eliasmith, 2005;Joshi and Maass, 2005;Lazar, 2010), but these methods may be challenged in the cortex if several independent computations may be simultaneously active (von der Malsburg, 1999). Another possibility is to use a rank order code (Perrineta et al., 2004), or its variant, a latency code (Shaw, 2004). These methods work for feed forward circuits. However, for the cortex's ubiquitous feedback circuits, one needs a more general system that has a common timing reference (e.g., Zhang and Ballard, 2004). a g frequenCy Model One possible timing resource is the cortex's rhythmic signals in the g range (30-80 Hz). It has been suggested that the cortex might use an analog latency with respect to the phase of a g frequency oscillatory signal to do this (Buzsáki and Chrobak, 1995;Fries et al., 2007) and recent evidence is in support of this (Vinck et al., 2010). This coding strategy is very different from the standard Poisson-based population models. Latency temporal coding (Delorme et al., 2001;Gollisch and Meister, 2008), when referenced to a g timing signal, allows a single spike to communicate an analog quantity in a way that is interpretable by both feed forward and feedback circuitry. As a further consequence, small subsets of cells can represent a stimulus adequately and, crucial to our model, the overcompleteness of cortical representations means that many different subsets of cells can communicate the same stimulus. Of course, population codes also use different subsets of cells in signaling but the latency codes turn out to need orders of magnitude less cells to achieve equivalent precision and confidence levels.
Another important consequence of a latency code is that it is compatible with spike timing dependent plasticity (STDP; Bi and Poo, 1998). STDP specifies that the learning rule for a cell is a function of a timing reference. Although the original experiments used the timing of a cell's output spike as a reference, the STDP rule could in principle apply to an extracellular reference as assumed here. Since STDP effects are based on latencies, a latency code is a straightforward way of meeting this requirement.
We have proposed a specific model based on g latencies (Jehee and Ballard, 2009). This model proposes that g latencies serve as a general base representation for communicating analog values in both feed forward and feedback pathways. However, our initial studies using this model to learn and use receptive fields abstracted away important issues related to representing individual spikes, such as a detailed model of latency and the latency code spike statistics. The focus of this paper is to address these issues with an explicit spike model in a circuit of significant scale. In this paper we show, via a number of specific simulations, how the g latency code can serve as a general coding strategy and, at the same time, can exhibit statistics in individual cells that are very close to those of a Poisson model.
The key testable predictions of the model concern properties of the latency coding of spikes. If the stimulus is represented by a latency code, many repeated applications of the same stimulus, on average, should result in the reuse of the same latencies. Thus, the distribution of latencies in any neuron, measured with respect to the g timing phase, should be highly non-uniform. This prediction is in direct contrast to that of a Poisson model, which would predict a uniform distribution that is independent of any particular g reference. A subtler prediction involves the fundamental constraint between the spikes of neurons with overlapping receptive fields. If only one spike from the overlapping group is sent, then the groups' spikes should not be correlated at short (1∼3 ms) timescales. If more than one spike is sent during that period, the component latencies would have to have been adjusted accordingly. Neurons with non-overlapping receptive fields can be correlated, and in fact should be, as the representation of stimuli is still distributed across multiple cells, albeit in much smaller groups than needed by rate-code models. maximization constraint (Dempster et al., 1977) for estimating overlapping probability distributions. Its absence leads to double counting that results in erroneous receptive fields. Our claim is that the observed randomness in spike trains may be a consequence of the need to satisfy this constraint, which ensures that receptive fields are learned correctly.

gaMMa-latenCy Coding
A general way that all cells can signal coefficient information is via g latencies, as shown in Figure 2. Each spike communicates numerical information by using relative timing (Kirchner and Thorpe, 2006;Gollisch and Meister, 2008) where in a wave of spikes the earlier spikes represent higher values. This strategy can be used in general circuitry, including feedback circuitry, if such waves are referenced to the g oscillatory signal (Fries et al., 2007). Spikes coincident with zero phase in the g signal can signal large numbers and spikes lagging by a few milliseconds can signal small numbers. The particular formula we use to relate the projection r to latency l is given by where l is the latency value in milliseconds and a is a constant that has value 0.9 at 50 Hz.
The assumption of the ubiquitous use of latency coding is that it allows numerical data to be propagated throughout the relevant cortical circuitry quickly and, at the same time, owing to the sparseness of the code, makes this circuitry relatively insensitive to cross-talk from any other spike traffic. Given a set of learned receptive fields, an input stimulus initially can be represented by selecting a candidate probabilistically from amongst the set of V1 neurons that have receptive fields that are similar to the input, subtracting the candidate's receptive field from the input, and repeating this process with the resultant residual acting as a new input. The specific algorithm to do this has been described in (Jehee et al., 2006), and is based on matching pursuit (Mallat and Zhang, 1993). After k basis functions (neurons) have been selected, the input from the LGN has been reduced to

Methodology
The development of neural receptive fields in the striate cortex has received extensive study and has become a standard venue for testing different neural models. Thus, this circuitry provides an ideal demonstration site for our very different model of cortical neuron coding and signaling. The more abstract version of the algorithm has been shown to learn receptive fields in cortical areas V1 and MST (Jehee et al., 2006) as well as to account for temporal feedback effects from striate cortex to the LGN (Jehee and Ballard, 2009). The basic core of the algorithm is recapitulated in Section "The Algorithm for Learning Receptive Fields" in the Appendix. Here we describe the modifications for keeping track of individual spikes. What makes the model fundamentally different is its combination of three interlocking constraints: (1) randomized action potential selection, (2) variable g latency coding, and (3) multiplexing of several different g range frequencies.

randoMized aCtion potential SeleCtion
A central constraint on action potential generation concerns neurons with overlapping receptive fields. Two receptive fields are said to overlap when the dot product of their normalized receptive fields is significant, which we take to be greater than some scaler value m. In our simulations most neurons are nearly orthogonal, that is, the dot product is less than 0.20. Rather than selecting the most similar neuron at each instant, neurons with significantly overlapping receptive fields compete to be chosen (Jehee et al., 2006). In terms of the spike model, when two basis functions overlap significantly, they are as a consequence not orthogonal and must probabilistically compete to be the one chosen to send a spike as shown in Figure 1. The probability of being chosen is given by p = e 10r /Z where r is the projection, the empirically determined scalar 10 weights the largest responders, and Z is a normalization factor. In the figure the red circles denote the two competing responses of the left and right receptive fields for a particular input. The probabilistic protocol is dictated by the expectation Figure 1 | random action potential spike selection. On each g cycle, only one of neurons with overlapping receptive fields can signal. Whether or not a neuron sends an action potential is determined probabilistically according to the relative projections of the input onto receptive fields. For the two neuron example shown, for the particular input shown in red, the odds of the leftmost neuron being chosen over the rightmost neuron is given by the ratio of r 2 to r 1 . In the case of multiple overlapping receptive fields, the odds are distributed appropriately among them. The action potential of a neuron encodes an analog value in terms of a latency(black curve) that is scaled with respect to the phase of a particular g frequency (gray curve). The model assumes that several different subnetworks can be simultaneously active and that they are each distinguished by using a frequency in the g range.

Ballard and Jehee
Dual roles for spike signaling Frontiers in Computational Neuroscience www.frontiersin.org Consequently, we assume that the latency code can be adjusted to reflect the particular frequency used. To handle this we scale the value of a by each frequency i.e., where g o is assumed to be 50 Hz. The assumption of multiple frequency latency codes potentially allows multiple independent sets numerical data to be propagated throughout the relevant cortical circuitry quickly and, at the same time, owing to the sparseness of the code, makes this circuitry relatively insensitive to cross-talk from any other spike traffic.

reSultS
The focus of our simulations is to show that timed circuits that use the randomized spike generation protocol send information in a very different way than is currently conceived in that the group of cells sending the information varies from cycle to cycle. However, despite this difference, the spikes in this code can appear to be very similar to those generated by a Poisson processes. At the same time, the latency code introduces regularities in the spike distributions with respect to the g phase that should be detectable.

g latenCy Coding produCeS Conventional orientation tuning CurveS
Does this process generate data that describes conventional receptive fields? We tested the claim that rival neurons were competing in simulation. Neurons whose receptive fields were learned with the matching pursuit algorithm that used the conventions of our model were tested on an 8 × 8 simulation by using small Gabor image patches as input. The Gabor patches were computed using the parameters (wavelength, phase offset, bandwidth, aspect ratio) = (2, 0, 1, 1). The wavelength is measured in pixels.
Thirty-six Gabor image patches were created, one for every 10° rotation. These were then presented to the network 1200 times and fit with learned basis functions each time. To emphasize the point that the randomized neural selection process models the receptive field, we chose two basis functions that overlap and measured their receptive fields using their spike counts for the different Gabor orientations. Their histogram data are indicated in Figure 3. The spike counts, which in the model reflect the number of times they were chosen probabilistically, are representative of standard oriented receptive fields measured experimentally using repeated trials to generate a post-stimulus time histogram, even though the model's process for generating spikes is very different from any standard rate-code model.

SiMulationS Showing the randoM routing of the Signal
The Gabor image test shows that a probabilistic selection method can produce orientation tuning but does not show off two crucial features of the coding method, namely (1) the codings of an image patch vary from g cycle to g cycle and (2) they use latency coding to send a coefficient. These two features are made explicit in where l i n is the latency of neuron i n .
A key assumption in the model concerns the mechanism for realizing this equation in neural circuitry. The use of g latency coding to represent quantities suggests that the process of evaluating residuals should be done in one or two g cycles. Otherwise it would take too long to represent the input stimulus. For example, since each cycle consumes on the order of 20 ms, if it took a cycle for each of 10 coefficients, the total time budget would be unrealistic. Thus, our model posits that an essential role of lateral connections in V1 is to implement the subtraction process shown in Eq. 3 for each of the N different component basis functions within one or two cycles, and the particular simulations herein assume all the subtractions are done within a single g cycle. This is a temporarily demanding constraint, nonetheless the requisite circuitry is in place to do this (e.g., Martinez et al., 2005). There is not a huge wealth of data showing how rapidly cells can communicate in situ, but (Crapse and Sommer, 2009), show that the superior colliculus can cause spikes in the frontal eye fields within 1-2 ms. Thus one would expect that local communication within a cortical area might take considerably less time.
Another important constraint is that, in order to be latency coded without error, the residual magnitudes need to be generated in decreasing order. This is usually the case as highly overcomplete representations mean that some neuron's receptive field is close to the residual and thus has a high probability of being chosen. This probability can also be tuned by scaling b in Eq. 10.
Learning the receptive fields starting from random connections is straightforward. At each step, the winning neuron has its receptive field made a little more similar to that of the stimulus that resulted in its selection. This can be done by moving each receptive field in the direction of its residual, i.e., the k-th receptive field changes by where h is a scalar learning rate.
This equation shows that the learning of receptive fields can take place simply using STDP if both use the same latency code. Since (1) ∆u ∝ ∆I, (2) the former is signaled by latencies with respect to the g phase, and (3) both the model and experimental data use exponential decreases (Bi and Poo, 1998), the adjustment is in principle straightforward.

g frequenCy band Multiplexing
The g band is a broad range and consequently it would be unlikely to expect a single frequency to be chosen and used as a "clock." Indeed elaborate statistical tests for such a clock in local cortical field potential data over 2-4 s have been negative (Burns et al., 2010). However, it may very well be the case that the cortex can have multiple simultaneous computations that each use a separate oscillatory frequency in the g range. Indeed there are reasons to suspect that this is possible. As noted by (Ray and Maunsell, 2010), the observed g frequency scales with contrast, so at least the oscillatory frequency can be adjusted.
signaling. Since latency code is correlated with the spike rate, the rate suggests itself as the primary signaling code, whereas, under the latency hypothesis, the rate is just a byproduct of a more efficient timing-based code.
The data in Figures 4 and 5 plot only the spikes coding the stimulus. No competing stimuli are present and there are no noise effects. However, it is unreasonable to expect that a population of neurons would be isolated in this way. To that end, we conducted two additional simulations in successively more realistic environments.

poiSSon Model CoMpariSon: Modeling the baCkground aS noiSe
In the first test, we constructed a more complete spike data set by embedding those spikes in a "noise bath" of background neural spiking. The idea of noise here is as a model for any other ongoing processing. For each neuron, the probability of firing at each millisecond was set to 0.008, representing approximately 10 Hz background rate, and the model's spikes were added to this bath. For comparison, we compared these spike trains to a pure noise bath with firing probability of 0.01. The different background rates were chosen so that the number of spikes in each case was approximately the same. The results are shown in Figure 6. The black curve shows the cumulative distribution of the spike train for the spikes that use random routing plotted against the cumulative distribution for white noise spikes. By way of comparison, the red curve shows the distribution of spikes in a case where random routing is not used and one set of coefficients is simply repeated, plotted against the cumulative node distribution. It is easily seen that the random routing necessitated by the learning algorithm obscures the timing signal. With the random routing, no timing perturbation is noticeable even after 40 s of simulation. In contrast, without the random routing constraint, the effect of the g latency code clearly shows up in the plot as a step change, representing the influence of the highly detectable 50 Hz signal. The test shown in Figure 6 provides an additional indication that the g latency code can appear similar to a random spike code but at the same time there are more exacting tests than the cumulative distribution comparison. Also, the overcompleteness measure of 2.56 is not very representative of the cortex, which uses overcompleteness measures greater than 10. To attempt to approximate this larger overcompleteness ratio, we resorted to using copies of the learned basis functions in a more exacting test. Basis sets of 512 (×2) and 768 (×3) were constructed by replicating the original learned set. With a larger set, the spike rates for individual cells decrease, so that, to bring the rates up to biologically observed levels, multiple different patch codings were used. For the case of the ×2 set 12 patches were coded, and for the case of the ×3 set, 18 patches were coded. Note that this is possible owing to the enormous economies introduced by the g latency code which uses sparseness to ameliorate cross-talk. In addition, per (Diesmann et al., 1999), a baseline noise rate of 2 Hz was added to each spike train.
An issue that has been finessed up to this point is that g frequencies appear in a range of 30-80 Hz. Thus per (Ray and Maunsell, 2010), for the proposed strategy to work, the latency code has to be sensitive to the particular frequency chosen. As a consequence, one fitting iteration to the next, i.e., the circuit is assumed to be memoryless. However, by adapting the method of (Druckmann and Chklovskii, 2010), memory could be added.
The simulation uses 256 neurons and shows the spikes for each of 200 g cycles, representing 4 s of elapsed time (200 × 20 ms). Each colored dot represents a spike for a particular neuron and in each column 12 such spikes are present, 1 from each of 12 neurons selected. The spikes are color-coded to indicate their g latency values, with light yellow to white being the highest value and dark red being the lowest. The colored scale bar indicates the values of the coefficients prior to latency conversion. Neurons that are used frequently have high coefficient values.
The 10 inset images in a row at the top of this figure are reconstructed from the sum of products of the 12 coefficients at cycles {20, 40, 60, 80, 100, 120, 140, 160, 180, 200} and these images show a high degree of invariance despite the fact that the basis functions used in encoding the patch are different. For example, comparing the basis functions for cycles 60 and 120 (shown on the extreme right hand side) shows that the neurons used to represent the patch are almost entirely different. This can be checked visually by picking a receptive field for the 60 ms patch encoding and trying to find its counterpart in the 120 ms patch encoding and vice versa.
One direct consequence of the probabilistic spike selection strategy is that if the individual neurons are examined, their spike "rate" is correlated with their average response coefficients. Here we use "latency code" as a synonym for "response" since a recipient neuron can decode the latencies to recover the responses. To demonstrate this feature, Figure 5 plots, for each neuron, the sum of its responses (coded as latencies) against the number of times it was chosen. This rate-latency code correlation is one reason why it could be difficult to appreciate a latency code as the basis for neural Each histogram records, for 1200 presentations at each tested orientation, the number of times the neuron was randomly selected by the algorithm. By way of comparison, in a conventional model the y-axis would record the match between the input and the receptive field. The two peaks for each neuron result from the fact that the Gabor patch is self similar every 180°.

Ballard and Jehee
Dual roles for spike signaling Frontiers in Computational Neuroscience www.frontiersin.org it could be assumed that multiple encodings are using different frequencies. Therefore, for the different simultaneous encodings, it is assumed that they each can use a different frequency in the g range, and this is modeled by choosing random frequencies with wavelengths in the range [20 ± (no. patches)/2]. In addition an initial phase offset for each coding in the range [0, 20] is chosen. Finally, the appropriate latency for each neuron's coefficient is added as an offset to its g reference. Figure 7 shows the comparison of g latency encodings with one set of random spike trains ( Figure 7A) whose level of randomness has been adjusted to make the total spikes similar to those in the g latency spike trains. All the comparisons use 1 ms resolution. The simulations lasted for 4 s, or approximately 200 g intervals. We assume that the different patch encodings are not confused, since they use different phases and offsets and the spike trains are very sparse. For each graph, five different runs are made and the data combined in a interval histogram plot with the standard error in the mean at each sample denoted by color. Figure 7B shows that for the overcompleteness

Ballard and Jehee
Dual roles for spike signaling Frontiers in Computational Neuroscience www.frontiersin.org 1. Large scale computation It specifies a way of organizing computations that span different areas throughout the cortex that is impervious to other ongoing computations. The neurons that are tied to a specific g frequency can communicate with each other easily. Cross-talk with other action potentials is minimal owing to the sparseness of the representations and the different g frequencies. Potentially, a given neuron can participate in several different computations, each one specified by a particular g frequency. 2. Generic signal representation It is generic in that the signal representation can suffice for any analog quantity, so that different algorithms that the cortex might use can be interfaced without any difficulty. 3. STDP compatibility The use of the latency code by the learning algorithm is of significant interest since it exploits precisely the timing feature used in STDP (Bi and Poo, 1998). Found in the hippocampus, the learning time constants are about three times longer than needed to support learning in our model, where changes are communicated within 5 ms windows, but it may turn out that STDP is faster in the cortex. Certainly to take full advantage of g latency coding it needs to be. In such a coding the modulation of learning in models such as ours scales the magnitude of the effects in precisely the way signaled by the delay code. 4. Poisson-like statistics It can reproduce the Poisson-like statistics that are the hallmark of rate-code models. From this perspective, the post-stimulus histogram is useful as a correlate of the putative g latency code.
However the g latency code is not without challenges. It is significantly more technically demanding to implement than the standard rate code and the following issues need to be addressed in future work that would use a more detailed neural model.

Precision
How much precision can one expect in the action potential? Our model assumes that the action potential peak is the key indicator. Given a 5-to -10 ms latency window, four bits of precision would require the ability for the recipient neuron to discriminate peaks that are as small as 0.25-0.5 ms apart. This would seem very demanding. However, an ameliorating factor is that since just the receptive field response is to be estimated and this estimate appears as the form of a weighted average, an additional three bits of information may be possible by the action of accumulating charge at the soma.

Spike Random selection and communication
We do not specify exactly how the spikes are generated, and to do this places demands on the neuron. There has to be a mechanism that translates the differences in potential into a probability, and, at the same time, uses this information in the coding of latency. In addition, once selected, the neuron has to communicate its value rapidly to the other neurons in the pool. Our basic assumption is that the size of the pool has to be less than the axonal branching factor which is usually assumed to be on the order of 10 4 . 3. Sequentiality The associated matching pursuit learning algorithm has an inherent sequentiality that is demanding to model, however our simulations assume that this sequentiality can be integrated into the generation of the g phase codes.
ratio of 7.68, the interval histogram is very similar to that observed in purely random spike data. Similarly for the overcompleteness ratio of 5.12, the histogram is very similar to the random plot. However, when the ratio is reduced to 5.12, and the g range is also restricted to 50 ± 1 Hz, the effects of g timing clearly can be seen; however, one must appreciate that this test is severe, as the encodings are designed to be similar, 512 cells are used, and the measurement interval is 4 s. Furthermore, experimental measurements of larger numbers of cells using local field potentials regularly show a g signal. teStable hypotheSiS Figure 8 shows an essential feature that would distinguish the model from a standard Poisson model. If the same image patch is used repeatedly, then any given neuron will have, on average, a discrete set of coefficients that it signals, owing to the discrete nature of the signaling pool of cells. Thus, if a histogram is made of these coefficients for each cell, as is shown in the figure, the entries will not be uniform, as expected if the process were Poisson, but instead will exhibit the asymmetries shown, where certain latencies are used much more frequently than others.

diSCuSSion and ConCluSionS
Our primary hypothesis is that the rate code observed in so many experiments might be an abstraction of a more fundamental and efficient latency code that uses frequencies in the g range as timing references but can generate rate-like statistics when tested by conventional means. This stands in contrast to models that posit that rate codes and g frequency codes are separate and compatible (e.g., Masuda and Aihara, 2003). The particular g latency code model tested in simulation here combines both probabilistic selection and multiplexing and has several important features. In one random routing was used to select the coefficients, and in the other the same coefficients were re-sent at each coding cycle. Next the cumulative density functions (cdfs) of these two cases (vertical axis) were plotted against the cdf for a pure noise bath of the same number of spikes (horizontal axis). Note that if two cdfs were identical the result would be a straight line. The g timing signal clearly shows up using the cdf for the case where the same coefficients are used at every cycle (red curve) vs. the random cdf. In contrast, the random routing cdf vs. the random cdf tends to obscure the g signal completely (black curve).  This process is technically demanding: the largest component must remove its components so that they are not party to the choice of larger g latencies representing smaller magnitudes.
In that way, all the component neurons can be found in one g cycle. This is possible as the axonal propagation speeds in myelinated sensory neurons are on the order of 5∼25 m/s. Another option is that, as long as neurons with substantial overlap can compete, the process can be conducted in parallel. The consequence is that the result can be off by a scale factor, but there maybe ways of fixing this result. Future work will address this option. 4. Decoding Another challenge for g latency coding is that of decoding. Consider the task a recipient neuron faces in recognizing a coded image patch. The code can be realized in many different ways as shown in Figure 4. However a straightforward way of decoding would be to connect the appropriate basis functions' axons to the decoding neuron's dendritic tree. Given that the recipient neuron can have on the order of 10,000 synapses and on the order of 10 basis functions are used per code, ideally 10 3 slots would be available. To reduce cross-talk, it may be possible to organize all the connections for any given code on the same branch of the dendritic tree.
performing just one large computation at a time, so that all the participant neurons implicitly reference that computation and no further bookkeeping is necessary. Nonetheless, a way to segregate different neural computations in cortex would add enormously to its computational capabilities as different tasks could be ongoing. Such tasks, as pointed out by von der Malsburg (1999), can be quite short lived, taking or the order of half a second or less. However, if more than one independent computation is ongoing then there needs to be some way of keeping incompatible computations separate. This is not an easy task as the underlying circuitry is common and this has been characterized as the "binding problem" (von der Malsburg, 1999). If different computations could access different frequencies in the g range, then this could be one way of keeping them separate. To do this, however, requires extending the idea of a single g "clock" to small subsets of different g frequencies or "clocks" that can the thought of as representing different tasks. The huge disparity between the precision of digital computers and neurons in the brain prompted computer pioneer von Neumann (1958) to speculate that the brain might eschew numerical computation altogether, and how to represent numerical precision in cortex remains an open issue. The tremendous success of the rate-code model in interpreting cortical experimental data may have postponed the study of more detailed technical considerations as to its achievable precision. Since it is essentially a unary code, to a first approximation, a recipient neuron is faced with counting spikes to obtain a quantitative value, and since the unary code is Poisson, there is an additional overhead in estimating this value in the face of noise. When these two factors are taken into account the compression realized by the g latency code is greater than a factor of 100. While much more experimental evidence is required to substantiate the validity of the proposed g latency model, our simulations show that it is able to represent numerical data with useful levels of precision.

aCknowledgMentS
This material benefited from discussions with Fritz Sommer and Bruno Olshausen and the Redwood Center for Theoretical Neuroscience at Berkeley, as well as colleagues Ila Fiete, Jonathan Pillow, Nicholas Priebe, Dan Johnston, Bill Geisler, Larry Cormack, and Alex Huk at the the University of Texas at Austin and Jochen Triesch and Constantin Rothkopf at FIAS, Frankfurt. The research has been supported by grants MH 060624 and EY 019174. 5. Phase locking A final implementation issue for the underlying biology is to realize different frequencies in the g range. We envision that the lower levels in the cortical mantle can do this and that the result is somehow mapped onto the pyramidal circuitry that is doing the main processing and thus cells in a large network can be quickly synchronized. Other studies have suggested ways of doing this.
When phase locked, the g latencies may be difficult to detect. We envision that the temporal usefulness of a particular set of g latencies might be as little as 300 ms, the typical time used to acquire information with a saccade and commensurate with the times used in visual routines (Roelfsema et al., 2003). Furthermore it may be possible that many simultaneous such routines could coexist, each using a g signal with a different global phase. Both of these possibilities, together with the random routing constraint, combine to make the detection of the g latencies a difficult problem as noted by (Burns et al., 2010).
The increasing number of experiments that have confirmed the general presence of the g signal has led to many suggestions as to its role. The g signal has been suggested as the basis for a number of effects such as attention , feature binding (Singer and Gray, 1995), and even consciousness (Engel et al., 1999), but we suggest it is a generic method of signaling quantitative information. To make this distinction clear, consider the report that shows g band activity is correlated with change detection . This can be interpreted as a sign that neural activity was initially uncorrelated and subsequently became more synchronous. However, an alternate interpretation is that the underlying process that was dedicated to monitoring the experimental condition used a particular phase reference. In other words, zero phase has to be set to a particular time for this specific problem, so that the latencies signaling numerical quantities are coded with respect to this reference. So our alternate interpretation is that, as the network solves the experimental signal detection problem, ever more neurons are recruited to that phase reference or possibly more signaled values are large and at very small phase lags. In either case, the clocked phase was always present but in the latter case the number of basis functions is just increased, making the g signal more readily detectable.
The number of independent computations that are possible in the cortex is an open question that the g signal may also shed light on. It might still be possible that the cortex is capable of

Matching Pursuit Algorithm
To choose the N neurons that best predict a given input (i.e., neurons that are active), we use a modified version of the matching pursuit algorithm (Mallat and Zhang, 1993). Matching pursuit uses the least number of basis vectors or equivalently the least number of active V1 neurons to accurately predict its input. In a deterministic version of the algorithm, the first vector is chosen as the vector u i 1 that minimizes at the next time step, an additional vector is chosen that minimizes and so on, where the response r i k of the vectors is given by

Probabilistic Version
This deterministic version was modified in order for the learning algorithm to be optimal in terms of sparseness (Jehee et al., 2006). Thus, after learning not only do the V1 units make more accurate predictions, but also fewer of them participate in any given prediction. The modification is very simple: at each time step, a V1 unit is selected randomly from the distribution where i k is the index of the selected unit in the k-th iteration, u j is the basis vector associated with the jth unit, Θ(x) is the Heaviside function, b −1 = 1/10 is a temperature parameter and Z is a normalizing term. In the modified model, the probability with which a unit is selected increases when its receptive field structure better predicts the lower level input. To guarantee optimality, the response of a selected unit r i k should be drawn from a normal distribution N I u k i k ( , ) a s ∆ − ⋅ 1 2 with small variance (Jehee et al., 2006), but the effect of this process is negligible so that in practice the response of a neuron in the modified model is given by Eq. 7. The selected basis vector weighted by its neuronal response is then subtracted from the input.
The feedforward-feedback cycle is then repeated on the residual input so that after k iterations the residual input is given by In words, the number of active V1 neurons increases at each time step in the model, and their combined prediction is subtracted from the actual input. We assume model V1 responses to be stable and non-decaying over the considered time scales.

Learning Receptive Fields
To enhance the sparseness of the neural code and better capture the input statistics, basis vectors are updated in each feedforwardfeedback cycle. This is done by minimizing the description length of the joint distribution of inputs and neural responses (Jehee et al.,

appendix a: the CoSt of preCiSion in a rate Coding Model
Assume that a temporal interval can be adjusted so that in a Poisson model, the probabilities of having zero or one spike dominate the probability of having more than one spike. In that case, an estimate of the probability of an input rate f can be made by counting spikes. Where the input is a probabilistic variable Z that can be zero or one and has P(Z = 1) = f, the estimate, f is given by where m is the number of intervals. A straightforward application of the Hoeffding inequality shows that for a precision d f f = − ∧ | | known with confidence C, the number of inputs required -and consequently synapses -is given by To just get within 10 with 95% confidence requires about 150 synapses. So for 100 independent inputs the total synaptic budget necessary would be 150 synapses per input ×100 inputs, or 15,000 synapses total. In contrast, if the latency code has 10 discernable levels the total number of inputs required would be just 100.

b: the algorithM for learning reCeptive fieldS
The learning algorithm is a fast algorithm for fitting learned receptive fields to input data based on matching pursuit (Mallat and Zhang, 1993). Here we briefly describe model equations and parameters. The interested reader is referred to (Jehee et al., 2006;Jehee and Ballard, 2009).

Initial Filtering
The input is obtained from 768 by 768 black-and-white images of natural surroundings (Figure 1A), filtered with a zero phase whitening/lowpass filter (Atick, 1992;Olshausen and Field, 1996): where the tilde represents the Fourier transform in 2D, and f 0 = 300 cycles/image. We limit this input into the model to a 10 by 10 (100) "patch" that is randomly selected from the filtered image, and represented as a single vector.
The model, which would correspond to an orientation column in cortical area V1, is represented by 256 units. In the language of the model, the synaptic weights between its un-oriented input and oriented V1 units form basis vectors that represent the preferred stimulus of the model oriented V1 neurons. These cells predict its layer IV input I as a linear combination of N active basis vectors, where the weighting coefficient of each basis vector u i is given by the response r i of its corresponding V1 neuron: in which n is a stochastic noise process.

Latency Code
The response of a neuron encodes an analog value in terms of a latency with respect to the phase of a g signal. The particular formula used is given by where l is the latency value in milliseconds. In the learning receptive fields, the analog value is used. For the spike code analyses, such as the spike interval histograms, the analog value for the coefficient is converted to a latency in milliseconds, rounding to intervals of 1 ms. 2006), but the same learning rule can also be obtained from the gradient of the error function for the k-th feedforward-feedback cycle (Jehee et al., 2006): where h = 0.3/(1 + m) is the learning rate, in which m is initially equal to 1 and increases by 1 every 1000 image patches. V1 basis vectors are normalized, and initialized using 128 random values with zero mean: positive values are taken as the initial values of the entries coding for on-type inputs, negative values are rectified and taken as the initial values of the off-type entries of the basis vector, the remaining 128 entries are initialized with value zero. Initializing all 256 entries of the basis vector with random values gives similar results. The basis vectors are trained on 10,000 image patches extracted from 16 natural scenes, and the model was allowed to