Representation of visual scenes by local neuronal populations in layer 2/3 of mouse visual cortex

How are visual scenes encoded in local neural networks of visual cortex? In rodents, visual cortex lacks a columnar organization so that processing of diverse features from a spot in visual space could be performed locally by populations of neighboring neurons. To examine how complex visual scenes are represented by local microcircuits in mouse visual cortex we measured visually evoked responses of layer 2/3 neuronal populations using 3D two-photon calcium imaging. Both natural and artificial movie scenes (10-second duration) evoked distributed and sparsely organized responses in local populations of 70–150 neurons within the sampled volumes. About 50% of neurons showed calcium transients during visual scene presentation, of which about half displayed reliable temporal activation patterns. The majority of the reliably responding neurons were activated primarily by one of the four visual scenes applied. Consequently, single neurons performed poorly at decoding which visual scene had been presented. In contrast, high levels of decoding performance (>80%) were reached when considering population responses, requiring about 80 randomly picked cells or 20 reliable responders. Furthermore, reliably responding neurons tended to have neighbors sharing the same stimulus preference. Because of this local redundancy, it was beneficial for efficient scene decoding to read out activity from spatially distributed rather than locally clustered neurons. Our results suggest a population code in layer 2/3 of visual cortex, where the visual environment is dynamically represented in the activation of distinct functional sub-networks.


INTRODUCTION
Mouse visual cortex shares fundamental features such as retinotopy, receptive field types, orientation tuning, and ocular dominance plasticity with visual cortices of higher mammalian species (Hubener, 2003). Nonetheless, the fine-scale organization of its cortical microcircuits is clearly dissimilar. Recently, in vivo two-photon calcium imaging enabled new insights into the functional micro-architecture of mouse visual cortex by measuring neuronal response selectivity with single-cell resolution (Grewe and Helmchen, 2009; Wallace and Kerr, 2010). Receptive fields of layer 2/3 neurons were found to be relatively large with high overlap for neighboring neurons (Smith and Hausser, 2010). In addition, a salt-and-pepper organization of orientation preference exists in layer 2/3 (Ohki et al., 2005; Mrsic-Flogel et al., 2007; Sohya et al., 2007). Thus, these neurons can produce highly selective action potential output in response to drifting gratings, even though synaptic inputs onto their dendrites are more broadly tuned (Jia et al., 2010; Medini, 2011). Such selective responses of cortical neurons suggest that in spite of large receptive fields and high overlap of dendritic and axonal arbors of neighboring neurons (Hellwig, 2000) there may exist a specific micro-organization. Indeed, inter-connected sub-networks of layer 2/3 neurons sharing distinct inputs from layer 4 have been identified in brain slices (Yoshimura et al., 2005). Moreover, a recent study that combined in vivo two-photon calcium imaging with post-hoc paired whole-cell recordings in brain slices reported evidence for functional sub-networks of neurons expressing similar orientation tuning (Ko et al., 2011). To better understand local processing of the visual scenery in intermingled networks of neighboring neurons with diverse tuning properties, further characterization of such functional sub-networks is essential.
Activation of cortical neurons critically depends on the type of visual stimulation and it remains unclear how complex stimuli are encoded in mouse visual cortex. In other species, it has been shown that visual cortex is tuned to compute natural scenes with their specific spatial and temporal statistics. While dynamic natural scenes evoke sparse responses (Vinje and Gallant, 2000; Yao et al., 2007; Yen et al., 2007; Haider et al., 2010), presentations of static natural images failed to induce sparse coding (Tolhurst et al., 2009). This difference may in part arise because synaptic connections between cortical neurons are not stationary but express diverse dynamic transfer functions, even for different terminal arbors of the same axon (Markram et al., 1998). Thus, to reveal the representation of complex and dynamic visual stimuli in mouse cortex, comprehensive measurements of local population activity are needed.
Here we applied a 3D laser scanning technique for in vivo two-photon calcium imaging of neuronal populations (Göbel et al., 2007) in order to determine the local representation of dynamic visual scenes, including natural movies, in layer 2/3 of mouse visual cortex. We evaluated response selectivity and encoding capacity of individual neurons as well as of variable-sized neuronal sub-populations. In addition, we analyzed the spatial distribution of visual scene representations within the local microcircuit, revealing shared functional properties on the fine-scale of neighboring neurons.

ANIMAL PREPARATION AND FLUORESCENCE LABELING
All animal procedures were carried out according to the guidelines of the University of Zurich, and were approved by the Cantonal Veterinary Office. C57BL/6 mice (2-3 months old, of either sex) were anesthetized with either 2.7 ml/kg of a solution of one part fentanyl citrate and fluanisone (Hypnorm; Janssen-Cilag, UK) and one part midazolam (Hypnovel; Roche, Switzerland) in two parts of water or by urethane (0.5-1.0 g/kg) and chlorprothixene (0.2 mg/mouse), applied intraperitoneally. With fentanyl, anesthesia was maintained by injecting 0.4 ml Hypnorm, 1.1 ml H2O, and 0.1 ml Dormicum at 0.05 ml per 10 g body weight per hour. Atropine (0.3 mg/kg) and dexamethasone (2 mg/kg) were administered subcutaneously to reduce secretions and edema.
The primary visual cortex was identified using intrinsic imaging (Schuett et al., 2002). Briefly, we illuminated the cortical surface with 630 nm LED light, presented gratings continuously drifting in all directions for 6 seconds, and collected reflectance images through a 4× objective with a CCD camera (Toshiba TELI CS3960DCL; 12 bit; 3-pixel binning, 427 × 347 binned pixels, 8.6 µm pixel size, 25 Hz frame rate). Intrinsic signal changes were analyzed as fractional reflectance changes relative to the prestimulus average. Regions for two-photon imaging were selected within the responsive area identified with intrinsic imaging about 2 mm lateral from the midline, corresponding to the monocular region for the contralateral eye.
A craniotomy was opened, the dura removed, and the exposed cortex superfused with normal rat ringer solution (NRR) (135 mM NaCl, 5.4 mM KCl, 5 mM Hepes, 1.8 mM CaCl2, 1 mM MgCl2, pH 7.2, with NaOH). Calcium indicator loading was performed using the "multi cell bolus loading" technique (Stosiek et al., 2003). Briefly, 50 µg of the acetoxymethyl (AM) ester form of the calcium-sensitive fluorescent dye Oregon Green BAPTA-1 (OGB-1; Invitrogen, Basel, Switzerland) were dissolved in 4 µl DMSO plus 20% Pluronic F-127 (BASF, Germany) and diluted with 36 µl standard pipette solution (150 mM NaCl, 2.5 mM KCl, 10 mM Hepes, pH 7.2) yielding a final OGB-1 concentration of about 1 mM. The dye was pressure ejected under visual control through a glass pipette with broken tip inserted into layer 2/3 of visual cortex. Application of sulforhodamine 101 (SR101; Invitrogen) to the exposed neocortical surface resulted in co-labeling of the astrocytic network (Nimmerjahn et al., 2004). Following dye injection the craniotomy was filled with agarose (type III-A, Sigma; 1% in NRR) and covered with an immobilized glass cover slip.

VISUAL STIMULATION
Visual stimuli were presented on a 21-inch CRT monitor positioned 30 cm in front of the contralateral eye. The stimulus set for monocular stimulation consisted of four different 10-second movies: two different natural movies, a movie of drifting gratings, and a noise stimulus (Figure 1A). Natural movies were chosen from a published database (van Hateren and Ruderman, 1998) and normalized for mean luminance and contrast. Drifting square-wave gratings and noise stimulus had the same temporal (2 Hz) and spatial frequency (0.05 cycles per degree). Gratings drifted in eight different directions for 1.25 s each, resulting in a 10-second movie. All stimuli were presented for 10 seconds in pseudorandom order interleaved with blank periods of at least 20 seconds. Typically, 6-12 trials were collected for each visual scene.
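The pseudorandom interleaving described above can be sketched as follows. A block-shuffled design (each block containing every stimulus once) is our assumption; the text specifies only "pseudorandom order", and the function and stimulus names are ours:

```python
import random

def stimulus_order(n_trials_per_stim=8,
                   stimuli=("movie A", "movie B", "gratings", "noise"),
                   seed=0):
    """Pseudorandom presentation order: each block contains every
    stimulus exactly once in shuffled order, keeping trial counts
    balanced across the session (block shuffling is an assumption)."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_trials_per_stim):
        block = list(stimuli)
        rng.shuffle(block)
        order.extend(block)
    return order
```

In practice each entry of the returned list would be presented for 10 s, followed by a blank period of at least 20 s.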

3D TWO-PHOTON CALCIUM IMAGING
Calcium transients were acquired using a custom-built two-photon microscope equipped with a piezoelectric focusing unit (PIFOC; Physik Instrumente, Germany) and a 40× water immersion objective (LUMPlanFl/IR; 0.8 NA; Olympus). 3D laser scanning and data acquisition were performed as described (Göbel et al., 2007) using custom written software (LabView; National Instruments, USA). A spiral scan line (10,000 scan points) was adjusted to cover a scan volume of 100-200 µm side length and 60-150 µm in depth usually starting at 100 µm below the cortical surface (Figure 1B). Fluorescence data were acquired together with the position signal of the scanning mirrors and the piezo focusing unit. On average 84 ± 5% (n = 12 populations) of the manually identified neurons were hit by the 3D scan line at 10 Hz scan rate.
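A toy version of such a 3D spiral trajectory can illustrate the geometry. This is an illustrative assumption only, not the published scan pattern: the parameterization, turn count, and linear z-advance are ours:

```python
import numpy as np

def spiral_scan_line(n_points=10000, side=150.0, depth=100.0, turns=50):
    """Toy 3D spiral scan line: the beam spirals outward in x-y while
    z advances linearly, so a single pass of the scan line threads
    through a side x side x depth volume. Units are micrometers."""
    t = np.linspace(0.0, 1.0, n_points)
    theta = 2.0 * np.pi * turns * t
    r = (side / 2.0) * t                      # radius grows to side/2
    return np.column_stack([r * np.cos(theta),
                            r * np.sin(theta),
                            depth * t])
```

Recording the scanner position signal alongside the fluorescence, as described above, is what later allows each scan point to be assigned to a cell body.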

ELECTROPHYSIOLOGY
For verification of the estimated spike rates we performed simultaneous juxtacellular recordings of neuronal firing patterns during 3D population imaging. A glass pipette was filled with NRR and the red dye Alexa 594 (20 µM; Invitrogen) for visualization. The tip of the pipette was placed near a neuron filled with calcium indicator and a seal was formed to record extracellular spikes. Spikes were recorded at 5 kHz using a patch-clamp amplifier (npi, Reutlingen, Germany) and Spike2 software (CED, Cambridge, UK), threshold-detected, and binned at the same rate as the imaging sample rate (10 Hz; Figure 1C).

CALCIUM SIGNAL ANALYSIS
Data were analyzed with LabView and Matlab (Mathworks, USA). Cells were detected manually in the reference stacks and their locations superimposed with the acquired position signal of the 3D laser scan line (Figure 1B). A volume of interest was placed around the cell bodies and the enclosed pixels of the scan line were assigned to the respective cell (Göbel et al., 2007). Relative percentage changes in fluorescence (ΔF/F) were thresholded at the 95% confidence level of the baseline. To estimate the underlying spike rate we used a deconvolution method (Yaksi and Friedrich, 2006). Traces were low-pass filtered (0.4 Hz) and deconvolved with an idealized spike-evoked calcium transient (amplitude 5%, decay 1.6 seconds) (Figure 1D). Population responses are shown as intensity graphs, with time running on the horizontal axis and the estimated neuronal spike rate for all neurons depicted in the rows using a gray-scale code (Figure 1E).
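The deconvolution step can be sketched as follows. For an instantaneous-rise, single-exponential spike-evoked transient with amplitude A and decay τ, inverting the convolution reduces to r(t) = (dC/dt + C/τ)/A. This is a minimal sketch under that kernel assumption (the published method additionally low-pass filters the traces at 0.4 Hz, which is omitted here):

```python
import numpy as np

def deconvolve_dff(dff, dt=0.1, tau=1.6, amplitude=5.0):
    """Estimate spike rate from a ΔF/F trace (in percent), assuming
    each spike evokes an instantaneous-rise transient of 5% amplitude
    decaying exponentially with tau = 1.6 s, as stated above.
    Inverting C(t) = A * (r * exp(-t/tau))(t) gives
    r = (dC/dt + C/tau) / A."""
    dff = np.asarray(dff, float)
    rate = (np.gradient(dff, dt) + dff / tau) / amplitude
    return np.clip(rate, 0.0, None)   # spike rates are non-negative
```

During the exponential decay the two terms cancel, so the estimate is nonzero only where new calcium influx (i.e., spiking) occurs.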
To test the reliability of the neuronal responses to the presented visual stimuli we calculated the correlation of each trial to every other trial. This analysis was performed either for individual neurons or for the entire population, in which case all single-neuron responses (rows in the intensity graphs) were concatenated to a single long vector. The correlation of trial i and trial j of either single-neuron or network responses was calculated as the covariance of the two vectors (X_i, X_j), normalized by their respective variability (standard deviation, σ_i and σ_j): r_ij = cov(X_i, X_j) / (σ_i σ_j). Because visual scenes were presented in random order, we sorted the trials according to which visual scene had been presented and displayed the results in a correlation matrix with the correlation coefficient color-coded.
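A sketch of this trial-to-trial correlation analysis (function and variable names are ours):

```python
import numpy as np

def trial_correlation_matrix(trials):
    """Pairwise Pearson correlations between trials.  Each trial is an
    (n_neurons, n_timepoints) array of estimated spike rates; its rows
    are concatenated into one long vector before correlating, as done
    for the population-level analysis described above."""
    flat = np.array([np.asarray(t, float).ravel() for t in trials])
    return np.corrcoef(flat)   # entry (i, j) = cov(X_i, X_j) / (sigma_i * sigma_j)
```

Sorting the rows and columns of the returned matrix by presented stimulus then yields the color-coded correlation matrix described above.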

DECODING ANALYSIS
We analyzed how well visual scenes could be decoded from the temporal response pattern either of individual neurons, the entire local population, or subsets of the population. Response trials were classified as encoding for one of the four visual scenes using a nearest mean classifier (Duin, 1996;Goard and Dan, 2009).

Frontiers in Neural Circuits www.frontiersin.org
We computed the class mean for each of the four visual scenes from the training set leaving out the tested trial. The assignment of a trial to a particular class was based on the nearest class mean using a correlation-based distance metric (d ik = 1 − r ik ) where r ik is the correlation of the test trial with the mean of stimulus class k. The obtained list of assigned stimulus identities was compared to the actual order of the stimulus presentation to get the percentage of correctly classified trials for each experiment and stimulus. The same procedure was used for single-cell correlations. Reliable responders were defined as neurons whose responses could be correctly classified in >50% of the trials for at least one visual scene. The type of stimulus preference of each reliable responder (single-stimulus versus multi-stimuli preference) was given by the number of those visual scenes with >50% correctly classified response trials.
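The leave-one-out nearest-mean classification described above can be sketched like this (a minimal Python version; the original analysis was done in LabView/Matlab, and the names here are ours):

```python
import numpy as np

def classify_trials(responses, labels):
    """Leave-one-out nearest-mean classification using the
    correlation-based distance d_ik = 1 - r_ik described above.
    responses: (n_trials, n_features) array, e.g., concatenated
    single-neuron response vectors; labels: stimulus id per trial.
    Returns the fraction of correctly classified trials."""
    responses = np.asarray(responses, float)
    labels = np.asarray(labels)
    correct = 0
    for i in range(len(labels)):
        train = np.ones(len(labels), bool)
        train[i] = False                      # leave out the tested trial
        best_k, best_r = None, -np.inf
        for k in np.unique(labels):
            class_mean = responses[train & (labels == k)].mean(axis=0)
            r = np.corrcoef(responses[i], class_mean)[0, 1]
            if r > best_r:                    # max r == min distance 1 - r
                best_k, best_r = k, r
        correct += int(best_k == labels[i])
    return correct / len(labels)
```

Applied to a single neuron's response vectors instead of the concatenated population vectors, the same function yields the single-cell classification used to define reliable responders.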
To test the dependence of decoding performance on the size of the considered population we repeated the classification algorithm for variable-sized subsets of neurons randomly drawn from the total population of neurons in each experiment. This process was repeated 1000 times for each sub-network size (from one to the total number of neurons in the experiment) to calculate the mean percentage of correctly classified trials for each network size. For each experiment the maximum performance was calculated as the percentage of correctly classified trials of the complete network. The minimum network size for reaching near-optimal decoding was defined as the mean number of cells required to reach 95% of maximum performance.
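Extracting the minimum network size from such a performance-versus-size curve is then a short step (a sketch; `perf_by_size[i]` is assumed to hold the mean performance of subsets of i + 1 cells):

```python
import numpy as np

def min_network_size(perf_by_size, target=0.95):
    """Smallest subset size whose mean decoding performance reaches
    95% of the full-network (maximum) performance.  perf_by_size is
    ordered from 1 cell up to the whole population."""
    perf = np.asarray(perf_by_size, float)
    reached = perf >= target * perf[-1]
    return int(np.argmax(reached)) + 1   # +1 because index 0 means one cell
```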
To test for dependence of decoding performance on number of discriminated stimuli, the same process was also applied to trials with responses to 2, 3, or 4 randomly drawn visual scenes (i.e., all six possible combinations were considered for 2 stimuli sets and four combinations for 3 stimuli). In addition, to correct for the dependence of information content on the number of stimuli tested, we repeated the analysis using mutual information as performance measure. Mutual information (in bits) of a decoded response vector R and stimulus vector S was calculated as I(R; S) = Σ_s Σ_r p(r, s) log2 [p(r, s) / (p(r) p(s))], where p(r, s) denotes the joint probability distribution function and p(r) and p(s) the marginal probability distribution functions given the considered stimulus set M (e.g., [1,2,3,4] for discrimination of the 4 stimuli).
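A sketch of this mutual-information computation, estimating the joint distribution from the histogram of presented and decoded stimulus ids (function name is ours):

```python
import numpy as np

def mutual_information(stimuli, decoded):
    """Mutual information I(R; S) in bits between presented stimulus
    ids S and decoded ids R, computed from their joint probability
    table as sum over p(r, s) * log2(p(r, s) / (p(r) p(s)))."""
    stimuli = np.asarray(stimuli)
    decoded = np.asarray(decoded)
    s_vals, s_idx = np.unique(stimuli, return_inverse=True)
    r_vals, r_idx = np.unique(decoded, return_inverse=True)
    joint = np.zeros((len(s_vals), len(r_vals)))
    for s, r in zip(s_idx, r_idx):
        joint[s, r] += 1.0
    p = joint / joint.sum()
    ps = p.sum(axis=1, keepdims=True)      # marginal p(s)
    pr = p.sum(axis=0, keepdims=True)      # marginal p(r)
    nz = p > 0                             # skip empty cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (ps @ pr)[nz])).sum())
```

Perfect decoding of four equiprobable scenes yields 2 bits, the ceiling visible in the four-stimulus condition; a decoder at chance yields 0 bits regardless of the stimulus count, which is why this measure corrects for information content.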

NEIGHBORHOOD ANALYSIS
We analyzed the spatial organization of functional responses within the local neuronal network. Cell positions in 3D coordinates were obtained from the reference stacks and were used to calculate distances between all cells. Inter-cell distances were binned with 20 µm bin size to reduce the occurrences of empty bins. A consistent analysis of local network organization required equal numbers of cells in each network. Therefore, we defined two neighborhood groups of neurons composed of the four nearest neighbors of a center neuron ("nearest neighbors") or the next neighbors 5-8 ("second-nearest neighbors"). For these two groups of neighbors we evaluated the percentage of "functional clusters," defined as neighbor groups with at least one member sharing the same stimulus preference as the center neuron. As a control, we calculated the percentage of functional clusters for randomly shuffled cell positions (repeated 1000 times). To calculate the decoding performance of local clusters we grouped each cell with its nearest and second-nearest neighborhoods and obtained the percentage of correctly classified trials using the trial classification method as described above. Data are presented as mean ± standard error if not otherwise noted. Statistical significance was tested with a Student's t-test and significance level was 5% unless noted otherwise. Selectivity for individual stimuli was tested with one-way ANOVA.
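The neighborhood grouping can be sketched as follows (a minimal version with our own names; the shuffle control repeats the same computation on randomly permuted preferences):

```python
import numpy as np

def functional_cluster_fraction(positions, prefs, group=slice(0, 4)):
    """Fraction of cells whose neighbor group contains at least one
    cell with the same stimulus preference (a 'functional cluster').
    positions: (n_cells, 3) coordinates in micrometers; prefs: stimulus
    preference per cell.  group=slice(0, 4) selects the four nearest
    neighbors; slice(4, 8) selects the second-nearest group (5-8)."""
    positions = np.asarray(positions, float)
    prefs = np.asarray(prefs)
    hits = 0
    for i in range(len(prefs)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        order = np.argsort(d)[1:]           # exclude the cell itself
        hits += int(np.any(prefs[order[group]] == prefs[i]))
    return hits / len(prefs)
```

Comparing the returned fraction against the mean over, e.g., 1000 calls with `np.random.permutation(prefs)` gives the shuffled-position control described above.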

3D POPULATION IMAGING OF CORTICAL RESPONSES TO VISUAL SCENE PRESENTATION
To examine the representation of visual scenes in mouse visual cortex we measured visually evoked 3D population activity in cortical layer 2/3 using two-photon calcium imaging. The stimulus set consisted of four 10-second movies representing different visual scenes (Figure 1A). All scenes were presented in random order to detect stimulus-specific cortical population responses irrespective of reported influences of previous stimulus history (Nikolic et al., 2009). Somatic calcium transients in neuronal populations were measured with the calcium indicator OGB-1 using 3D laser scanning (Göbel et al., 2007) (Figure 1B; see Methods; n = 12 populations from eight mice; 70-150 neurons per population). A deconvolution-based algorithm was used to convert calcium signals into an estimated time course of neuronal spike rate (Yaksi and Friedrich, 2006). We validated this approach with simultaneous juxtacellular recording of neuronal firing patterns during 3D population imaging (Figure 1C). Although single-spike sensitivity was not reached, the spike rates estimated by deconvolving the calcium transients closely fitted the simultaneously recorded spike patterns, filtered at the same frequency as the calcium data (Figure 1D). In addition, the average deconvolved response matched closely the average instantaneous spike rate and the peri-stimulus time histogram (PSTH) across trials (Figure 1E). Finally, the deconvolved calcium transients were highly correlated with the simultaneously recorded neuronal spike rate (0.74 ± 0.1; n = 7 neurons), significantly higher than the correlation of the raw fluorescence traces with the firing rate (0.43 ± 0.06; p < 0.0001). These findings indicate that the measured calcium transients represent the underlying neuronal firing patterns and that the deconvolved calcium transients are reliable estimates of neuronal spike rates. We therefore used the estimated spike rates in the further analysis.
3D population imaging revealed highly reliable and specific stimulus-evoked activity patterns in neuronal subsets (Figure 2). For each stimulus about half of the neuronal population showed significant responses (Movie A: 55 ± 6%; Movie B: 48 ± 6%; Gratings: 55 ± 7%; Noise: 53 ± 5%; n = 1360 neurons in total). Typically, responsive neurons displayed several epochs of activation during presentation of one or multiple visual scenes, which were consistent across trials. To analyze the response specificity we calculated trial-to-trial correlations for all trial combinations with same or different stimuli, considering the neuronal responses of either the entire recorded population or individual neurons (Figures 3, 4; Methods). For entire populations, trial-to-trial correlations were computed from the respective intensity graphs and composed into a 2D matrix (Figures 3A,B). Correlations were significantly higher for same stimulus trials than for trials with different stimuli (example population in Figure 3C; pooled analysis in Figure 3D; mean correlation 0.18 ± 0.13 for n = 1891 same stimulus trial combinations, and 0.08 ± 0.09 for n = 5795 different stimulus trial combinations; ± SD; p < 0.001; t-test). Similar response specificity was observed for all visual scenes tested (mean correlation 0.21 ± 0.15, 0.14 ± 0.11, 0.21 ± 0.12, 0.16 ± 0.1 for Movie A, Movie B, Grating, and Noise, respectively). Hence, using 3D calcium imaging of layer 2/3 populations we could resolve specific neural network responses to the different presented visual scenes. Correlation analysis of single-cell response trials revealed features distinct from the population responses (Figures 4A,B). While some neurons responded primarily during only one of the visual scenes (e.g., neuron #18 and #38) other neurons showed distinct responses to two or more stimuli (e.g., neuron #65).
The diversity of responses from individual cells resulted in broader and overlapping distributions of trial-to-trial correlations for same stimulus trials and different stimulus trials (Figure 4C; mean correlation 0.14 ± 0.39 for n = 82,120 same-stimulus trial combinations and 0.04 ± 0.36 for n = 254,220 different stimulus trial combinations in 403 cells; ± SD; p < 0.001, t-test). We conclude that individual layer 2/3 neurons may be recruited only by specific visual scenes, raising the question of how well visual scenes can be decoded from temporal response patterns in single neurons versus multiple neurons within the local population.

DECODING OF VISUAL SCENES
For decoding visual scenes from the recorded responses we used a nearest mean classifier to predict from individual trials which visual stimulus had been presented (see Methods). We considered either the entire sampled network (concatenating all single-neuron response vectors) or individual neurons. For each experiment, the response vectors for all trials were assigned to the clusters representing the different visual scenes (Figure 5A). The obtained list of assigned cluster identities was compared to the actual order of the scene presentation, yielding the percentage of correctly classified trials for each experiment and stimulus. Pooled across all experiments, 84 ± 4% of the trials on average were correctly classified with similar success rates for the different visual scenes (Figure 5C; 92 ± 4%, 82 ± 7%, 86 ± 6%, and 75 ± 6% for Movie A, Movie B, Grating, and Noise, respectively; p = 0.26, ANOVA). On the individual cell level we defined "reliable responders" as cells that correctly predicted the stimulus in >50% of the trials for at least one visual scene (Figure 5B). About half of the responding cells were such reliable responders (26 ± 5% of total number of cells) with most of them preferring only one particular visual scene (75 ± 4% of reliable responders; n = 12 populations) and only a minority showing reliable responses to multiple scenes (Figure 5D; according to this definition example neurons #18 and #38 in Figure 4 had single-scene preference for Movie A and the Grating movie, respectively; neuron #50 reliably responded to Movies A and B; and neuron #65 responded to all four visual scenes).
Applying the network decoding scheme to the sub-networks of reliable responders resulted in similar decoding performance as the entire population (86 ± 3%; Figure 5C). In contrast, when we applied the decoding scheme to individual neurons, performance was dramatically decreased (Figure 5C). The average single-cell performance pooled for all cells was close to the 25% chance level (21 ± 3%) and pooling over only the reliable responders also resulted in a reduced decoding performance (35 ± 2% correctly classified trials). The likely explanation is that most reliable responders preferred only one specific visual scene and thus were "blind" to the other scenes. These results show that individual neurons can be highly tuned to specific features of the visual input. However, only by observing the cooperation of several neurons within the population, thereby increasing the likelihood of sampling from distinct functional sub-networks, is it possible to decode the cortical representation of the visual scenes.

DEPENDENCE ON NETWORK SIZE AND NUMBER OF STIMULI
To inquire how the decoding performance of the local neuronal population depends on the number of discriminated visual scenes we repeated the decoding analysis, selecting trials to include only 2, 3, or 4 different visual scenes. Moreover, in order to establish the minimum number of cells that have to be considered to reach near-optimal (95%) decoding performance, we examined how the fraction of correctly classified trials depends on the number of cells included in the analysis. Figure 6A illustrates this analysis for one example population that reached 100% performance for >100 neurons. Averaged over all populations, the maximal decoding performance was independent of whether 2, 3, or 4 different visual scenes were considered (Figure 6B; 92 ± 2%, 88 ± 3%, and 85 ± 5%, respectively; n = 12; p = 0.2; ANOVA). Similar decoding performance levels were also reached for the sub-networks of reliable responders (Figure 6B; 93 ± 2%, 89 ± 3%, and 87 ± 3% for 2, 3, or 4 different visual scenes, respectively; p = 0.3) and only 10-20 reliable responders were required to reach the 95% decoding level (Figure 6C). On the other hand, drawing cells randomly from the entire population required a larger number of cells: at least 52, 64, and 69 simultaneously recorded neurons were required to reach 95% decoding performance for 2, 3, and 4 different visual scenes, respectively (Figure 6C).
The finding that the smallest required network size depended on the number of different visual scenes to classify might simply be due to the different information content when discriminating different numbers of visual scenes. For example, chance level is 50% for 2 visual scenes whereas it is 25% for 4 different scenes. To take this difference in information content into account we also used mutual information as performance measure, a quantity that measures the reduction in uncertainty about the presented visual scene by knowledge of a single-trial neuronal response. Mutual information increased with growing population size and reached higher levels when a larger number of scenes had to be discriminated (Figures 6D,E; 0.7 ± 0.1 bits, 1.1 ± 0.1 bits, and 1.4 ± 0.1 bits for 2, 3, or 4 different visual scenes, respectively). Similar results were obtained for populations consisting either of the entire imaged population or only of the reliable responders (Figure 6E). Calculating the network size required to reach near-optimal (95%) mutual information in each experiment resulted in similar numbers of cells for different numbers of visual scenes (Figure 6F; 79 ± 10, 81 ± 10, and 80 ± 10 for 2, 3, and 4 visual scenes, respectively; p = 1.0, ANOVA). Selecting only the reliable responders reduced the minimal population size consistently to 21 ± 3, 23 ± 4, and 23 ± 4 for 2, 3, and 4 visual scenes, respectively (Figure 6F; p = 0.9). These findings indicate that observing the neuronal representations in about 80 randomly picked layer 2/3 neurons is required to discriminate low numbers of visual scenes with high fidelity, while around 20 neurons suffice if only the sub-network of reliable responders is considered. However, previous knowledge about the identity and location of these neurons is required to selectively collect information from this subset. Therefore, we further elaborated these findings by analyzing the spatial relationship of the reliably responding neurons.

FIGURE 4 | Response specificity of individual neurons. (A) Single-cell trial-to-trial correlations were obtained by correlating trials of individual cell responses to visual scenes. The example shows two trials from the same cell in response to a visual stimulus (Grating). Method to calculate trial correlation matrix is shown in Figure 3.

SPATIAL ORGANIZATION OF VISUAL SCENE REPRESENTATIONS
The 3D laser scanning technique not only acquires population responses to dynamic visual stimuli but also provides 3D spatial information about the location of the sampled neurons. Hence, we analyzed the relationship between spatial location and stimulus preference for all neurons in the imaged populations (Figure 7A). We evaluated the abundance of pairs of neurons in the data sets having same or different stimulus preferences and compared these values to those from the same data sets with shuffled neuron positions. Relative to random, we found a significantly higher probability of neuron pairs with the same stimulus preference being in close proximity (48 ± 25% and 18 ± 8% for distances of 20 and 40 µm, respectively; n = 12; p = 0.01; t-test). For pairs of neurons with different stimulus preferences the abundance was similar to the calculated random probability (Figure 7B).
Because the number of neurons differed within local volumes defined by a fixed radius, we performed a neighborhood analysis, considering a nearest neighbors group (four closest cells) and a second-nearest neighborhood (neighbors 5-8). The average distance of each cell to its fourth nearest neighbor was 21 ± 6 µm (95th percentile: 32 µm). We examined the fine-scale spatial organization of the population responses by counting for each reliable responder the number of neurons in the neighborhood that displayed the same visual scene preference as this particular neuron  and compared the results to the same data set with shuffled cell positions (Figures 7B,C). The nearest neighbors group displayed a 50% higher probability of comprising at least one neuron with the same stimulus preference as the center neuron compared to randomly shuffled networks (Figure 7C; 47 ± 7%; n = 12 experiments; p = 0.01; t-test). No significant increase was found when considering the second-nearest neighbors group (p = 0.8). This finding indicates that there exists a certain redundancy of stimulus coding in clusters of neighboring neurons, which might deteriorate decoding performance. Indeed, calculating the mutual information of each neuron's response together with its four nearest neighbors showed a significant decrease by about 6% compared to randomly picking groups of six cells (Figure 7D; 0.53 ± 0.05 bits and 0.56 ± 0.056 bits for neighboring neurons and for random groups of cells, respectively; p = 0.001; n = 12 experiments; t-test). Again, no significant difference in decoding performance was found for the second-nearest neighbors group (p = 0.98).
Local neighbors with the same stimulus preference could still exhibit quite different temporal response profiles or, alternatively, also show increased temporal correlations. To examine this question, we compared the inter-neuron correlation coefficients for neuron pairs within the nearest neighbors group, showing either same or different stimulus preference. In addition, we analyzed neuron pairs with the same stimulus preference but located either within the nearest neighbors group or further away from each other. The local neighbors with same stimulus preference displayed the highest correlation coefficients, with the mean being significantly higher compared to the two other controls (Figures 7E,F; mean correlation 0.44 ± 0.21, 0.35 ± 0.17, 0.4 ± 0.16 for local neighbors with same and different stimulus preference and far neighbors, respectively; ± SD; p < 0.001 for both comparisons with different stimulus and far neighbors controls; t-test). These results indicate that neurons within local clusters are more correlated to each other than to neurons further away even if those share the same stimulus preference. Nearby neurons in layer 2/3 of mouse visual cortex thus tend to share dynamic tuning properties.

DISCUSSION
Using 3D two-photon calcium imaging we characterized neuronal spiking activity of layer 2/3 populations in mouse visual cortex during presentations of a set of dynamic visual scenes. We found stimulus-specific response patterns in neuronal subsets with most responding neurons preferring mainly one visual scene. The presented visual scene could be well decoded from the population activity pattern, requiring only about 20 neurons when the most informative pool of reliable responders was considered. Spatial analysis furthermore suggests that within local neighborhoods there exist functional sub-networks of neurons that share similar response properties. Our findings provide novel insights into how the visual environment is dynamically represented in the local microcircuit of mouse neocortex.

3D TWO-PHOTON CALCIUM IMAGING IN VISUAL CORTEX
Several recent studies employed two-photon calcium imaging to investigate the functional microcircuit of visual cortex. Often, drifting gratings were applied as visual stimuli to map orientation tuning using standard frame imaging (Ohki et al., 2005, 2006; Mrsic-Flogel et al., 2007; Sohya et al., 2007; Li et al., 2008; Kara and Boyd, 2009; Ch'ng and Reid, 2010; Kerlin et al., 2010; Runyan et al., 2010) or imaging of small volumes (Kerlin et al., 2010). Here, we applied 3D laser spiral scanning (Göbel et al., 2007) to reveal dynamic response patterns simultaneously in rather large volumes. Despite the reduced temporal resolution and signal-to-noise ratio compared to electrophysiological techniques, we confirmed using simultaneous juxtacellular recordings that 3D calcium imaging faithfully resolves visually evoked neuronal firing patterns (Figure 1). 3D imaging is particularly beneficial for studying the dynamic 3D representation of specific stimuli as well as analyzing spatial functional relationships within local populations.
Here, it enabled us to reveal complete representations in local neighborhoods whereas microelectrodes sample only from very few neurons within a volume of 100 µm diameter (Henze et al., 2000). In addition, 3D imaging permitted us to identify the functionally relevant subset of reliable responders, which are generally dispersed throughout the volume, and examine their decoding performance.
While most of our current concepts of visual processing have come from experiments using artificial stimulus sets, it is important to use natural stimuli to work out how the brain processes the input it is usually confronted with (Olshausen and Field, 2005). Interestingly, in cats, with their highly columnar organization of visual cortex (Ohki et al., 2006), responses to natural scenes were heterogeneous in presumed nearby neurons (Yen et al., 2007). In addition, feature detection in visual scenes is increased when these features are presented as parts of natural scenes rather than as artificial stimuli. Moreover, it has been demonstrated that temporal patterns of population activity serve well to differentiate between natural stimuli, noise, and gratings (Kayser et al., 2003). These results point to a distinct visual-coding strategy that is tuned to the dynamics of natural scenes.

FUNCTIONAL MICRO-ORGANIZATION OF MOUSE VISUAL CORTEX
We found a subset of the neurons in each population to respond reliably to visual scene stimulation (Figures 4, 5). The majority of these neurons preferred only one particular scene, providing further evidence that neurons in mouse visual cortex can be highly selective to visual features (Niell and Stryker, 2008). In addition, neighboring neurons tended to share the same stimulus preference, suggesting the existence of functional sub-networks. In another recent two-photon imaging study in mouse visual cortex, receptive field sub-regions were found to highly overlap even for neurons separated by several hundred microns (Smith and Hausser, 2010). Considering an average receptive field diameter of 10-14° for pyramidal neurons in mouse visual cortex (Niell and Stryker, 2008; Gao et al., 2010; Smith and Hausser, 2010) and a cortical magnification factor of 15 µm/° (Schuett et al., 2002), receptive fields of neurons within a cortical volume of 100 µm are also expected to be highly overlapping. It is therefore unlikely that the visual scene preference observed in our study is simply explained by different spatial locations of the receptive fields. More likely, stimulation of visual field areas surrounding the neurons' receptive fields caused modulatory influences (Vinje and Gallant, 2000; Angelucci and Bullier, 2003). This can lead to the observed decorrelation of the population responses and also to an increase in the information capacity (Vinje and Gallant, 2000, 2002).
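The back-of-envelope argument for overlapping receptive fields can be made explicit. Using the literature values quoted in the text, a 10-14° receptive field maps onto a cortical footprint larger than the ~100 µm imaging volume:

```python
# Cortical footprint of a receptive field = RF diameter x magnification factor.
# Values are those cited in the text (Schuett et al., 2002; Niell and
# Stryker, 2008); this is a check of the stated arithmetic, nothing more.
magnification_um_per_deg = 15.0           # cortical magnification, um/degree
rf_diameter_deg = (10.0, 14.0)            # receptive field diameter range

footprints_um = [d * magnification_um_per_deg for d in rf_diameter_deg]
print(footprints_um)                      # [150.0, 210.0]
```

Both values exceed the ~100 µm scale of the sampled volume, so receptive fields of cells within one volume necessarily overlap.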
Despite the seemingly random ("salt-and-pepper") organization of orientation tuning in rodent visual cortex, our spatial analysis of the local representation of visual scenes indicates a certain degree of functional clustering on the scale of ∼40 µm, or the nearest five neighboring cells (Figure 7). It has been reported that cortical neurons form mini-columns with similar widths of 20-40 µm across species (Raghanti et al., 2010). Local neighbors within 50 µm also share a higher connectivity (Holmgren et al., 2003), making it likely that the observed clusters represent interconnected sub-networks, with neurons sharing common sensory inputs from layer 4 afferents (Yoshimura et al., 2005). Determining the connectivity scheme of the entire population would require a serial-sectioning electron microscopy study. While this technique has been applied to reconstruct the connectivity of a few identified neurons in mouse visual cortex (Bock et al., 2011), it is still far from reconstructing the full circuitry of a hundred neurons as in our 3D populations. However, a recent study has shown that neighboring neurons with high response correlations to natural scenes are also more likely to be connected to each other (Ko et al., 2011). Indeed, we find members of local clusters with similar stimulus preferences to be more correlated with each other in their responses to visual scenes than neurons with different stimulus preferences or neurons further apart from each other (Figures 7E,F).
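A clustering analysis of this kind can be sketched as follows. On a hypothetical toy population (preferences assigned by spatial quadrant to build clustering in by construction; none of these numbers come from the recordings), the fraction of each cell's five nearest neighbors sharing its preference is compared against a label-shuffled control:

```python
import numpy as np

rng = np.random.default_rng(1)

def neighbor_sharing(xyz, pref, k=5):
    """Mean fraction of each cell's k nearest neighbors sharing its preference."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distances
    nn = np.argsort(d, axis=1)[:, :k]           # indices of k nearest neighbors
    return float(np.mean(pref[nn] == pref[:, None]))

# Toy population: 100 cells in a 200-um cube, 4 scene preferences laid out
# by quadrant so that spatial clusters of shared preference exist.
xyz = rng.uniform(0, 200, size=(100, 3))
pref = (xyz[:, 0] > 100).astype(int) + 2 * (xyz[:, 1] > 100).astype(int)

obs = neighbor_sharing(xyz, pref)
ctrl = float(np.mean([neighbor_sharing(xyz, rng.permutation(pref))
                      for _ in range(100)]))
print(obs, ctrl)
```

Observed sharing well above the shuffled baseline is the signature of functional clustering; for a salt-and-pepper layout the two values would coincide.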
These findings are consistent with the idea that the local clusters observed in our study could represent interconnected sub-networks sharing similar sensory inputs and therefore similar tuning properties. The redundancy of encoded information in such local sub-networks reduces their discriminative power to distinguish different visual scenes. Consequently, decoding of the visual scenery improves by integration over several spatially segregated sub-networks. Such a microcircuit, where follower networks integrate inputs from several distinct sub-networks, has recently been reported and proposed for sensory feature integration (Kampa et al., 2006). Thus, 3D imaging of network responses to dynamic visual scenes suggests a population code in layer 2/3 of visual cortex, where the visual environment is represented in the spatio-temporal activation patterns of distinct neuronal sub-networks.

DECODING OF VISUAL SCENES
Our results show that cortical neurons can express diverse tuning properties in response to dynamic visual scenes. This is, to our knowledge, the first study investigating the decoding properties of complete and unbiased local populations using dynamic naturalistic visual stimuli. Even though visual scene-evoked activity was distributed and sparse, we found that population responses were specific and reliable so that in more than 80% of the trials activity patterns could be correctly assigned to one of the four presented visual scenes (Figure 5). Such high levels of decoding could be achieved with population sizes of about 50-70 neurons from the total pool and 10-20 neurons from the pool of reliably responding neurons, depending on whether 2, 3, or 4 different visual scenes were discriminated (Figure 6). Interestingly, correcting for the difference in information obtained from discrimination of different numbers of stimuli led to a similar required population size of ∼80 neurons or ∼20 reliable responders for all numbers of visual scenes (Figure 6). This implies that the information provided by each neuron is independent of the number of discriminated stimuli, which can be explained by the fact that most neurons encode only one particular visual scene. It should, however, be noted that this study is not exhaustive in the sample set of different visual scenes or in their duration. Presentations of longer movies would also increase the diversity of presented stimulus features and therefore the probability of neurons to respond. Nonetheless, similar decoding levels have been reached in primate visual cortex during a face discrimination task using comparable numbers of selected reliably responding neurons (Rolls et al., 1997). Interestingly, our observed minimal population size is also similar to reports of required network sizes for motor movement prediction (Lebedev et al., 2008).
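The dependence of decoding accuracy on population size can be illustrated with a simple template-matching (nearest-centroid) classifier. The sketch below runs on synthetic population responses (4 scenes x 10 trials, weak scene-specific tuning plus noise; all parameters are illustrative assumptions, not the paper's decoder or data) and shows accuracy rising as more cells are read out:

```python
import numpy as np

rng = np.random.default_rng(2)

def decode_accuracy(responses, labels):
    """Leave-one-out nearest-centroid decoding of trial labels."""
    correct = 0
    for i in range(len(labels)):
        train = np.delete(np.arange(len(labels)), i)
        # one mean-response template per scene, from the training trials
        templates = np.stack([responses[train][labels[train] == s].mean(0)
                              for s in np.unique(labels)])
        pred = np.argmin(np.linalg.norm(templates - responses[i], axis=1))
        correct += pred == labels[i]
    return correct / len(labels)

# Synthetic data: 100 neurons, 4 scenes, 10 trials per scene; each neuron
# carries a small scene-specific signal buried in trial-to-trial noise.
n_cells, n_scenes, n_trials = 100, 4, 10
tuning = rng.normal(0, 0.3, size=(n_scenes, n_cells))
labels = np.repeat(np.arange(n_scenes), n_trials)
responses = tuning[labels] + rng.normal(0, 1.0, size=(len(labels), n_cells))

accs = {}
for n in (5, 20, 80):
    cells = rng.choice(n_cells, n, replace=False)
    accs[n] = decode_accuracy(responses[:, cells], labels)
print(accs)
```

Small random subsets decode poorly while larger pools approach ceiling, qualitatively matching the reported need for tens of neurons to reach >80% performance.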
These results and the high success rate of single-trial classification further confirm the fidelity of our 3D imaging technique. Consequently, with 3D imaging, we have the capacity to resolve the representation of visual scenes in local microcircuits.

FUTURE DIRECTIONS
A number of opportunities exist to reveal further details of the functional organization of cortical microcircuits. For example, using a novel high-speed imaging technique, we demonstrated precise reconstruction of the sparse activation of visual cortex neurons during natural movie presentation (Grewe et al., 2010). A further extension of this method to 3D (Cheng et al., 2011; Grewe et al., 2011) should make it possible to reveal 3D activation patterns at higher temporal resolution. Second, discrimination of neuronal subclasses is desirable (Ascoli et al., 2008) and is becoming possible through post-hoc immunostaining methods. Third, genetically encoded calcium indicators have become nearly as sensitive as synthetic indicators (Lutcke et al., 2010) and enable chronic recordings from the same neuronal populations over weeks to months (Mank et al., 2008; Tian et al., 2009), allowing exploration of the effects of behavior and attention on the cortical representation of visual scenes. Together, two-photon imaging of local representations of dynamic natural scenes in mouse visual cortex is a powerful approach to study the function of visual cortex in a realistic context.