Abstract
Social visual processing in vertebrates employs sophisticated neural mechanisms ranging from categorical face cells to distributed sparse coding systems. In primates, recent evidence supports a “tuning landscape” model where neurons signal distances to prototypes in high-dimensional space rather than functioning as simple category detectors. However, social visual processing in non-mammalian animals remains poorly understood. We recorded single-unit activity from three functionally distinct pigeon brain regions—mesopallium ventrolaterale (MVL), visual Wulst, and nidopallium caudolaterale (NCL)—while birds viewed dynamic videos of conspecifics and control shapes performing courtship, eating, flying, and walking behaviors. Despite finding visually responsive neurons in all regions, we observed no categorical distinction between conspecific and control stimuli. Instead, population analyses revealed discrete temporal modulations corresponding to specific motion features—bowing, wing-flapping, head-bobbing—suggesting feature-based rather than categorical encoding of visual information. Sound-modulated visual units were significantly more prevalent in MVL than Wulst, indicating earlier multimodal integration in the tectofugal pathway than previously recognized. The absence of differential responses in NCL during passive viewing, contrasting with clear modulation in visual areas, suggests that this region is less involved in the automatic analysis of visual features. These findings suggest that avian visual structures use sparse coding principles that are similar to the visual cortex, where populations encode specific features through coordinated but brief neural responses rather than sustained categorical signals.
Introduction
Conspecific recognition represents a fundamental challenge in visual neuroscience, requiring rapid discrimination of socially relevant stimuli in the environment. Across vertebrate taxa, this capacity supports the acquisition of spatiotemporal information that is a prerequisite for the animal to understand and respond to changes in its surroundings (; ).
The neural mechanisms underlying social recognition have been extensively studied in primates, which has revealed that the processing strategies used by neurons extend beyond simple categorical coding of visual stimuli. Early investigations of the primate inferior temporal cortex identified face-selective neurons (Perrett et al., 1982; ), organized in discrete patches (Tsao et al., 2006), which lead to influential face cell models of social visual processing. However, the application of information theory has consistently demonstrated that sensory neurons in the higher visual cortex employ distributed sparse coding mechanisms rather than simple categorical responses, supporting monkeys’ exceptional capabilities in object discrimination (Rolls and Tovee, 1995).
Recent theoretical advances have unified sparse coding principles within a broader “tuning landscape” framework (Ponce et al., 2019; Wang and Ponce, 2022). Rather than functioning as categorical detectors, visual sensory neurons signal distances to “prototypes”—specific combinations of visual features within high-dimensional representational space. The model bridges visual encoding mechanisms with hippocampal cognitive mapping theories, where place cells encode locations in multidimensional task-relevant spaces (). The framework predicts that responses that appear to be categorical, a priori, actually emerge from the population-level dynamics in feature space rather than from individual neurons encoding abstract categories of visual stimuli.
Dynamic social stimuli are used to reveal additional mechanisms in population coding beyond those that are detectable using presentation of static images. demonstrated that during naturalistic movie viewing, macaque face patch neurons exhibit a remarkable diversity of selectivity, with spatially neighboring cells responding selectively to distinct aspects of social scenes. These included facial identity, proximity relationships, and body movements. Individual neurons maintained consistent responses across repeated presentations while encoding different dynamic features of social interactions, suggesting that natural social processing employs population codes that are more sophisticated than those apparent during the static presentation of images.
The avian visual system provides a compelling comparative framework for understanding the neural coding principles that are used to interpret conspecific behavior across vertebrate evolution. Pigeons possess sophisticated visual capabilities that support complex social behaviors such as social learning, individual recognition and the formation of social hierarchies (Nagy et al., 2013; ; ). Behavioral investigations demonstrate successful conspecific categorisation from photographs and videos, including discrimination between familiar and unfamiliar individuals (Shimizu, 1998; Nakamura et al., 2003; Ware et al., 2015; Wilkinson et al., 2010).
Birds have two major visual pathways like those found in mammals (; Shimizu and Bowers, 1999). The tectofugal pathway is functionally analogous to the mammalian colliculo-pulvinar-cortical pathway (; ). Previous studies using static stimuli in operant categorization tasks revealed that a population of neurons in the mesopallium ventrolaterale (MVL), a higher center of the tectofugal pathway, supports the decoding of different categories of visual stimuli with diverse features (; ,). The visual Wulst in the thalamofugal pathway is comparable with the striate visual cortex () and is believed to mainly process simple features such as orientation in pigeons’ lateral surroundings (Ng et al., 2010).
In social interactions, the richness of sensory cues at a pigeon’s disposal is much higher than what static stimuli can provide. There is a possibility that the pigeon’s ability to identify a conspecific relies on a wide array of visual features, but also on other sensory cues, such as sound. We recorded single-unit neural responses in pigeons, from three different regions: MVL, along the tectofugal pathway, the visual Wulst along the thalamofugal pathway and the nidopallium caudolaterale (NCL), the equivalent structure to the mammalian prefrontal cortex (; ; Nieder, 2017), while head-fixed and body-restrained birds watched videos of conspecifics performing four different behaviors: courtship, eating, flying, walking. The videos were corrected for low-level features to ensure that any selectivity was not due to differences in luminance or spatial frequency. By using dynamic stimuli with natural sound accompaniment, we observed that the population coding of dynamic social information during passive fixation is most consistent with a sparse code for visual feature information across the recorded visual brain regions.
Materials and methods
Subjects
The study used eleven pigeons (Columba livia) of undetermined sex, as subjects. The birds were housed in individual cages, in a colony room maintained at 20°C, with a light-dark cycle of 12 h. During data collection, the pigeons were kept at their free-feeding weight, having unrestricted access to a mixture of grit, wheat, sunflower seeds, peas and corn. Water intake was controlled such that access to water was removed for 12 h prior to an experimental session.
All experimental, animal handling and housing procedures were carried out in accordance with the University of Otago’s Code of Ethical Conduct for the Manipulation of Animals and approved by the Animal Ethics Committee of the same entity.
Experimental apparatus
The experiment was performed with head-restrained and body-restrained pigeons. A custom-made metal frame was built to accommodate the fixation of the animal and the administration of water. The apparatus contained a water receptacle that, connected to a pump, filled and emptied automatically (Figure 1C). The pump was controlled using custom code, written in MATLAB, jointly with the OTBR (Otto and Rose, 2023) and Psychophysics toolboxes ().
FIGURE 1
There were also two monitors for stimuli display: they were laterally placed at 74 cm, on either side of the pigeons’ eyes, with an azimuth that centered the images on the optic axis (Nalbach et al., 1990) of the pigeon. The monitors were gamma corrected.
Task/visual stimulation
Head-restrained and body-restrained pigeons passively viewed a prepared set of videos that displayed pigeons, or a control shape, performing the following behaviors: courtship, eating, flying and walking. The sequence of videos was pseudorandomized, guaranteeing that the same behavioral category wasn’t displayed consecutively more than three times. Each video was anteceded by a six second inter-trial interval, and was shown six times per session, three times on the left monitor and three times on the right monitor. In total, there were 432 trials, and a session took approximately 75 min to complete. A random 10% of the trials were followed by 2.4 s of access to water (Figure 1B).
Stimuli
The stimulus set of this experiment is composed of videos, recorded in the lab (GoPro Hero 4 session), with durations ranging from 2.2 to 4 s, depicting four different behaviors: courtship, eating, flying and walking (Figure 1). Within each category, four different videos were created to exemplify the same behavior, such that there is some variation in the way it is represented. Although all videos were recorded in the same aviary enclosure, which gave them a very similar background and illumination, they were recorded so that the same action was performed in different places on screen. Occasionally, different camera angles were used.
For each video, we created a corresponding control video in which the background was preserved but the motion that characterizes the targeted behavior was performed by a 2D shape, called a greeble (). The greebles were edited to be the same size as the pigeons and were given the dominant color of a background subtracted frame of one of the pigeon videos. That is, the color of the greeble was the average color of the pigeon. All control videos were edited using Adobe After Effects ().
Additionally, the stimulus set included four auxiliary videos, showing two pigeons engaging in no specific behavior. The two-pigeon stimulus set was included to disambiguate whether neuronal responses to courtship behavior represented courtship behavior or the presence of two animals on screen.
Furthermore, each video had a version where the natural noise, produced with the recorded behavior, was preserved, and a version in which the video was silent. Hence the developed stimulus set consisted of 72 videos, out of which 36 were visually unique.
Processing of stimuli
An important aspect to consider when comparing the neural responses to stimuli ascribed to different categories is whether the observed differences are driven by low-level stimuli features, such as luminance and spatial frequency (,). Therefore, we averaged the luminance values across all frames of all pigeon videos. Equalization was performed separately for the background and the subject. Subsequently, the average values were applied to the control (greeble) videos. The spatial frequency was equalized across all videos (pigeons and controls) but without changing the individual orientation (angle of the light and dark bars, obtained from the fast Fourier transform of the image) of each frame (Figure 2; Willenbockel et al., 2010). We chose this type of equalization to preserve the frame’s natural look and minimize any visual aberrations when they were parsed together to compile the video. Each frame was converted to the HSV color space and the low-level feature transformations were applied to the value layer (). The correction of luminance and spatial frequency were done using MATLAB (Mathworks Inc., Natick, MA United States) and the SHINE toolbox (Willenbockel et al., 2010, with some code adapted from the SHINE_color toolbox; ). All videos were captured and displayed at 30 frames per second (a subset of the stimuli is provided in Supplementary material).
FIGURE 2
Lastly, we quantified how visually dissimilar each video was from its control, on a frame-by-frame basis. This way, we created an expectation of which moments of the video could drive an accentuated difference in neuronal responding, purely by the difference the pixel space occupied by the pigeon and the greeble on the image (further details and results in Supplementary material: frame-by-frame correlation).
Surgery
We performed stereotactic surgery to implant one microdrive, with a probe of eight 25 μm formvar-coated nichrome wires, per animal. For NCL [anteroposterior (AP), 5.5; mediolateral (ML), (±) 7.5], two pigeons (K4, X27) were implanted on the left hemisphere and one (K6) on the right hemisphere. For MVL [AP 10.5; ML (±) 6], two pigeons (K8, JIM) were implanted on the left hemisphere and one (PAM) on the right and. For the visual Wulst [AP 11; ML (±) 3], two pigeons (C1, O4) were implanted on the left hemisphere and three (O9, O8, X15) on right hemisphere. The coordinates for implantation followed histological and inactivation studies (; ).
The birds were anesthetized with a mixture of Ketamine (30 mg/kg) and Xylazine (6 mg/kg), administered intramuscularly. A topical anesthetic (10% Xylocaine) was applied to the scalp and an incision was made to expose the skull. Subsequently, a small craniotomy was made over the target structure, and the dura matter was removed. For each bird, a microdrive was installed over one of the target regions: NCL, MVL or Wulst. The tips of the wire bundles were lowered until the beginning of the target region during surgery. To support the implant, stainless steel screws were inserted into the skull and one of them also served as a ground screw. The microdrive was secured to the skull and to the screws with dental acrylic (Jet Acrylic, Lang Dental). Around the microdrive, we added a short metal post, that was used to attach the animal to the frame of the experimental apparatus. In the days following the surgery, the animals were given analgesics and allowed to recover. On the first 3 days of recovery, for pain control, the animals received an intramuscular injection of Carprieve (4 mg/kg) and the wound margin was sprayed with 10% Xylocaine.
Neuronal recordings
We used probes of eight wires to record extracellular single-neuron activity. Before each recording session, the probe was checked for neuronal signals: pairs of two wires were tested to assess which combination of recording wire and local reference yielded the best signal isolation. If the single-to-noise ratio of the isolated unit was at least 2:1, the unit was considered for recording. The subject was not submitted to any task related visual stimulation, during this process.
Sampling at 20,000 Hz, the signal was amplified using a Grass P511K amplifier (Grass Instruments, Quincy, MA, United States) and a 5 Hz notch filter was applied. The signal was digitized using a CED (Cambridge Electronic Design, Cambridge, United Kingdom) interface. Jointly with Spike2 software, spike sorting was done using the system’s template matching capabilities.
The experimental paradigm ran on a separate computer that sent triggers of relevant events to the CED system, which allowed the alignment of the visualization paradigm with the neural data.
After each recording session, the probe was advanced 42 μm and the animal was returned to its home cage. In case it wasn’t possible to isolate a unit that satisfied the inclusion criteria, the probe was advanced 21 μm and the session didn’t take place.
Statistics and reproducibility
In this experiment, we analyzed single-cell activity in three regions of the pigeon brain (Wulst, MVL, NCL), during the passive observation of videos, depicting pigeons performing different behaviors and their respective controls. We were interested in quantifying neuronal responses, of each one of the regions, to video features shared amongst multiple stimuli. For instance, we wanted to assess neural modulation congruent with the presentation of a video where a specific behavior is performed by a pigeon. It is possible to argue that the features we expect to drive neuronal responses are nested within each video. Considering the nested structure of the stimuli and the repeated measurements of the same neuron, we used several generalized linear mixed models (GLMMs) to assess selectivity, at the single-neuron and population level. When interpreting the models, statistical significance was considered at a level of α = 0.05.
For all neurons, only the trials where the videos were shown on the screen contralateral to the pigeon’s implanted hemisphere were considered for analysis, due to the almost complete decussation of the optic nerve in this species and its impact in the ascending visual projections (). Therefore, for a given neuron, we recorded its activity to the same visual stimulus six times: three presentations of the video with sound and three without sound. Data analysis was performed in R (R Core Team, 2024) [packages: glmmTMB, DHARMa, () emmeans ()] and MATLAB (The MathWorks Inc., 2024).
Neural data analysis
For each unit, spike counts were binned in 200 ms bins. To determine how the data of a recorded region was distributed, we fitted two null models: one with the poison distribution and the log link function, and another with the negative binomial distribution and log link function (Salinas Ruíz et al., 2023).
The model fitted with the negative binomial distribution had a better fit to the data than the Poisson model (see Results). Subsequently, we used the DHARMa package in R for residual diagnostics and zero-inflation testing. The test results informed the decision to fit all the models used in the study with the negative binomial distribution and the log link function (R package glmmTMB).
Single neuron analysis
At the single-neuron level, we tested if firing rate modulations, relative to baseline, could be attributed to the main effect of “actor” (pigeon or greeble), “behavioral category” (courtship, eating, flying or walking) or their interaction. We also measured the contribution of the auditory component of the videos by including an interaction term between “actor” and “sound” (sound On or Off). The random effects component of the model had the trial number as a random intercept: it was included to account for the possible visual adaptation to the stimuli, throughout the session.
A model was fitted to the binned spike counts of each neuron. The dataset consisted of the middle two-seconds of the baseline and the entire video presentation of every trial. The baseline of the trials of the first courtship video was set as the intercept of the m1 model.
The output of the model was used for hypothesis testing: the p-value for each parameter was calculated with the Wald Z-statistic. Furthermore, comparisons between model estimates were done through estimated marginal means (R package emmeans) and the resulting p-values were adjusted with the Bonferroni-Holm method. Neurons that had a singular fit were excluded from analysis.
From the output of the model, a unit was classified as actor-selective if the comparison of firing rate between the baseline and video presentation resulted in a significant interaction effect between, exclusively, one actor and two or more behaviors or, only, in a significant main effect for one of the actors; the classification of behavior-selective was given to units that responded significantly to a specific behavior, for both pigeons and greebles; lastly, actor-and-behavior-selective neurons were the ones that only presented a significant interaction between one behavior and one actor. Furthermore, it was possible to assess whether the presence of sound contributed to the observed firing rate modulations. We counted how many neurons fitted this classification and compared their proportions between the three regions: the comparisons were performed using Chi-Squared tests, with Holm–Bonferroni correction, or the Fisher’s Exact test, with Holm–Bonferroni correction, if the expected counts in a group were below 5.
Population analysis
In the set of neurons that were tuned to stimuli, we were interested in accessing which moments of the video drove the observed neuronal modulation. For this analysis, we fitted a model that explored the influence of the actor and the moment in the video on firing rate.
We ran the model m2 on the subset of units labeled actor selective, tuned to the actor pigeon, for every region. As in the previous test, the neural activity recorded in each trial was segmented in bins of 200 ms. We tested the same neuronal subset for each of the 16 videos (four videos for each of the behaviors) and detected where the activity of the subpopulation was significantly different from baseline. Then, multiple comparisons were performed, to assess if the predicted spike counts for the temporal intervals of the pigeon videos, in which neural activity was significantly modulated from baseline, differed between pigeons and greebles. As in the previous test, the neural activity recorded in each trial was segmented in bins of 200 ms.
It is relevant to emphasize that the random effects component of the model m2 contains a random effect for neuron identity, which accounts for differences in baseline firing for each unit. Neuron identity is nested within subjects, which accounts for subject-specific effects and prevents us from drawing conclusions based only on one animal (Yu et al., 2022).
To further assess how the dynamics of the recorded population change, through the progression of each video, a population state-space analysis was conducted per region. Conceptually, a population state-space analysis starts by representing the activity of a single neuron, at every recorded moment, as a point on a plane. Every neuron generates its own plane, which maps the population activity onto a n-dimensional space, where n corresponds to the total number of recorded neurons. If there is information encoded at the population level, the spiking activity of several neurons will covary according to a smaller set of variables than the number of recorded units. Using a principal components analysis, dimensionality reduction was performed. The first three dimensions were preserved and their projection on the data was used to define trajectories for each condition (16 videos × 2 actors). The trajectories characterize the instantaneous populational firing rate, through the progression of each video.
The population state-space analysis was performed on all task-engaged neurons (MVL n = 49; Wulst n = 74; NCL n = 49) and done separately for every recorded region. For each neuron, each trial’s data was binned in 200 ms bins, with a sliding window of 40 ms. It was necessary to average the activity across trials of the same condition and temporally smooth the data, to minimize fluctuations originating in random spiking (). Therefore, the data was averaged per condition (video × actor) and further smoothed with a 200 ms Gaussian kernel. Neural activity was also z-scored: for each neuron, the neuron’s average baseline spike count (calculated over a 2 s period, in the middle of the 6 s baseline) was subtracted to the averaged trial activity. The difference was subsequently divided by the standard deviation the baseline spike count. The last data processing step was to reduce the baseline to only 600 ms, for purposes of readability of the neural trajectories.
We build an activity matrix that concatenated the time resolved activity of all neurons, in each condition: to guarantee that we compare the same principal components across all conditions, the temporally resolved response of all neurons, to all combinations of video and actor, was considered pseudo-simultaneous (; Ott and Nieder, 2016).
From the PCA, we obtained the three first principal components and calculated the trajectories for each condition. The assessment of the trajectories of each video allowed us to observe differences in population dynamics when the same video had pigeons or greebles on screen: these differences were quantified by the calculation of the Euclidean distances between the trajectories generated by the same video, when the actor was a pigeon or a greeble.
Histology and electrode track reconstruction
At the end of the experiment the pigeons were euthanized with CO2 by placing them in a 450 mm (Length) × 340 mm (Width) × 300 mm (Depth) chamber, with a CO2 delivery rate of 16 L/min, an amount that equated to displacing 35% of the chamber volume per minute. The CO2 was left on for 5 min, and for at least 2 min beyond the point where the animal stopped breathing. The animal was then immediately perfused with 10% formalin in physiological saline. The brains were removed from the skull and kept in 10% formalin for at least 5 days, followed by sucrose formalin (10% formalin, 30% sucrose), and allowed to sink twice. The brains were frozen and sectioned at 40 μm sections with every 10th section mounted and stained with thionin.
Results
Electrode positions
All electrode tracks were within the borders of the targeted NCL, MVL, and Wulst regions as defined by and . For NCL, the intended track positions were AP +5.5 and ML ± 7.5. The track positions for two NCL birds (K6, X27) were as intended. The track position for K4 was AP +6.0, ML +7.0, differing from the intended AP and ML positions by 0.5 mm.
For MVL, the intended track positions were AP +10.5 and ML ± 6.0. The track positions for two MVL birds (JIM, PAM) were as intended. The position for K8 were AP +10.5, ML ± 5.5, differing only from the intended ML position by 0.5 mm.
For Wulst the intended track positions were AP +11.0 and ML ± 3.0. The track positions for three Wulst birds (C1, O4, X15) were as intended. The position for the O8 and O9 were AP +10.75, ML ± 3.0, differing only from the intended AP positions by 0.25 mm (Figure 3).
FIGURE 3
Single-unit analysis
In this experiment, we recorded neural activity in three regions of the pigeon brain (Wulst, MVL and NCL) during the passive visualization of videos depicting conspecifics or control shapes performing the following behaviors: courtship, eating, flying and walking. Half of the video presentations, of the pigeon videos as well as the control shapes, included the sound of the pigeon videos.
We recorded and analyzed 103 units in MVL, 123 in NCL and 213 units in Wulst. Data analysis began with the assessment of the data distribution of the recorded neuronal populations. For all regions, the negative binomial distribution had a better fit to the data (lower AIC) than the Poisson distribution (log likelihood ratio test between models for the MVL data: χ2(1) = 22042,p < 0.0001; NCL: χ2(1) < 22853,p < 0.0001; Wulst: χ2(1) = 9808.9,p < 0.0001). Furthermore, the negative binomial null models were not zero-inflated (MVL: ratio = 1.0067, p = 0.79; NCL: ratio = 1.0183, p = 0.38; Wulst: ratio = 1.0014, p = 0.88) and the residuals fitted the assumptions of the distribution.
We first assessed if the spike counts of each neuron were significantly modulated, compared to baseline, for the fixed effects of behavior (courtship, eating, flying and walking, and non-interactive two pigeons), actor (the video depicted a pigeon or a greeble) and the interaction between them. Additionally, we tested the interaction between the presence of sound and the actor on screen (GLMM per neuron: m1, described in Methods).
In MVL, the activity of 65 out of 103 recorded neurons (63%) was task modulated. In this dataset, 15 units (23%) were categorically tuned to the presence of the actor: 7 (11%) units displayed the most amount of modulation when a pigeon was on screen; 5 (8%) were selective for the greebles, and 3 (5%) units had an excited response for the pigeon videos but an inhibited one for the greeble videos. Additionally, 2 of the units that coded for the pigeon were sound modulated: firing rates differed significantly between the trials in which the pigeon videos were displayed with sound on, compared to the same videos without sound, and the greeble videos with and without sound (Figure 4).
FIGURE 4
Moreover, 5 (8%) units were significantly modulated by one specific behavior, for both pigeons and greebles. 21 units (32%) were modulated by a specific behavior, performed only by one actor. We also found one unit that was selective for more than one behavior and 7 units that were predominately selective for the two pigeon controls. The remaining units were visually selective but not modulated by the content of the videos. Between the behaviorally modulated units and the exclusively visually selective units, 12 units presented a significant interaction between actor and sound and differentiated between the sound on and sound off presentations of the pigeon or greeble videos.
We applied the same categorical analysis (model m1) to the 213 units recorded in Wulst. We found 91 task modulated units. The activity observed was best explained by the presence of a pigeon on screen in 13 units (14%). Other 11 units (12%) fulfilled the criteria of having a significant interaction between the greeble and at least 2 behaviors or exclusively having a significant main effect for the actor greeble. 9 units (10%) were classified as behavior selective. Furthermore, 28 units (31%) were selective for a specific behavior performed only by one actor. We also found 7 (8%) units predominantly modulated by the two-pigeon stimulus set and 3 (3%) that responded to two behaviors. The remaining units had a generalized visual response to all stimulus categories. In this region, there were also 4 (4%) sound modulated units, with one of them being selective for the actor pigeon.
Lastly, we recorded 123 units in NCL. 60 (49%) neurons were task modulated. This dataset had 5 (8%) units in which the presence of a pigeon on screen drove the most amount of neural modulation, and other 5 (8%) units that were modulated by the greebles. We also found 9 (15%) units that encoded one specific behavior for both actors. There were also 3 (5%) units that encoded two behaviors and 3 (5%) other units that encoded two behaviors, each one performed by a different actor. 15 (25%) units encoded a specific behavior performed only by one actor. The remaining units have a significant main effect for both the pigeon and the greeble and no further differentiation amongst the behavioral conditions. That is, they are modulated by the presence of a stimulus in a way that is not specific to the content of the video. They were interpreted as visually selective. Regarding sound modulation, we found 7 (12%) units that were sound modulated, but none belonged to the ones labeled as pigeon selective.
The proportion of units (Figure 5) classified as actor selective did not differ between the three recorded regions [χ2(2) = 2.34, p = 0.3], and the same was the case for the proportion of units classified as behavior selective [χ2(2) = 1.54, p = 0.46]. However, there were differences regarding the proportion of sound modulated units: MVL had a higher proportion of units classified as sound modulated compared to Wulst [χ2(1) = 17.7, adjustedp-value = 0.0001]. No significant differences were found between NCL and MVL [χ2(1) = 4.15, adjustedp-value = 0.08] or, NCL and Wulst (Fisher′sExacttest:adjustedp-value = 0.11).
FIGURE 5
Population analysis
Although only a small percentage of neurons were considered pigeon selective, we were interested in assessing which moments in the video drove the most amount of modulation. We fitted the subpopulations of pigeon-responsive neurons, grouped by region, with model m2 (see Methods). For every video, the 200 ms bins in which the populational activity significantly differed from the populational baseline were identified. Subsequently, in each of these bins, the activity recorded during the presentation of a pigeon video and the respective greeble control was compared.
In Wulst and MVL, the populations diverged from their respective baselines, for at least a period of 200 ms, in most videos (MVL: 11 out of 16; Wulst: 12 out of 16). In the videos depicting courtship or eating behaviors, every time bin that significantly differed from baseline corresponded to frames where the animal was performing a bowing motion or walking (Figure 6). During flying videos, the significant moments depicted wing flapping and, in one instance for both MVL and Wulst, small body movements while the animal was stationary. In walking videos, all relevant bins corresponded to the animal moving forward (relative to its original trajectory).
FIGURE 6
In MVL, 2 out of 11 videos had bins where there was a significant difference of activity between the actor pigeon, and the actor greeble. Both instances depicted the animal walking. In Wulst, 9 out of 12 videos contained bins where activity significantly differed from baseline and differed between the pigeon and the greeble presentations of the video.
In NCL, only 5 out of 16 videos had bins where the population activity was significantly modulated from the populational baseline. Furthermore, the identified bins did not differ between the pigeon and the greeble presentations.
Lastly, we analyzed the populational dynamics of the representations, generated by each stimulus, with a population state-space analysis. The aim of this analysis was to identify patterns of activity that differ between the representation of the pigeons and greebles (see Methods for details). The trajectories (Figure 7a) describe instantaneous activation patterns, learned from the data (Pang et al., 2016): they represent the progression of the populational response, during 600 ms of baseline and the entire video presentation, for each condition (video x actor). The trajectories were projections of the first three principal components on the data. The first three principal components contain 17% of the populational variance in MVL, 13% in Wulst and 11% in NCL (Supplementary Figure 2).
FIGURE 7
In MVL and Wulst, it is possible to observe a destabilization of the population baseline activity, for every behavioral category (Figure 7). Within each category, the videos differ regarding how much separation there is between the trajectories of the actor pigeon and the actor greeble. We quantified the differences in trajectories by measuring the Euclidean distance between them, at every timepoint (Figure 7b). The moments in which the neuronal populations distinguish between the pigeons and the greebles occur in brief periods, not in a sustained way: the distances between trajectories return to values like the ones observed during the baseline after a visible peak.
For each video, we identified the two time points with the largest Euclidian distance between trajectories. During courtship videos, the biggest differences between activity related to pigeons and greebles were found while the pigeon performed bowing motions and walking. During eating videos, it coincided with bowing motions, pecking, mandibulation and walking. In the flying videos, the differences corresponded to two distinct types of behavior. On one hand, moments where the pigeon was flapping its wings and, therefore, creating a lot of movement on screen, but also in moments where the animal was stationary and only performing slight body shifts, such as small head movements. Lastly, in the walking videos, the differences between pigeons and greebles corresponded to the coordinated motion of stepping and head-bobbing. All behaviors were represented in the modulation of the population of every region.
Discussion
We recorded single-unit responses in three different regions of the pigeon brain (MVL, Wulst, and NCL) while head-fixed birds watched videos of conspecifics and control shapes performing four different behaviors: courtship, eating, flying, and walking. We aimed to assess if there is specific neuronal coding for conspecifics at different structures of the pigeon’s visual system when exposed to dynamic, naturalistic stimuli.
In each region, we found a small proportion of units that were modulated by the pigeon videos. However, we did not find a difference between the regions regarding the proportions of units that responded to the actors or the behaviors observable in the videos. The response patterns we observed were neither in line with primate face-cell coding in terms of selectivity to a particular body feature like the head (), nor did they display an invariant, abstract response to social content (). This lack of a simple, categorical “conspecific” code suggests a different processing strategy in the pigeon brain. Our findings are consistent with a tuning landscape model (Ponce et al., 2019; Wang and Ponce, 2022) of visual processing, like that described in the primate ventral stream. In this framework, neurons in the higher centers of the pigeon visual system are not simple category detectors, but are better viewed as signaling distances to specific, preferred combinations of visual attributes within a high-dimensional feature space.
Our population analysis supports our interpretation. We found that population activity in the visual areas (MVL and Wulst) showed modulations in discrete moments of the video, corresponding to specific motions like bowing, wing flapping, and head-bobbing. We interpret these momentary differences as the points where the pigeon video trajectory and the greeble video trajectory diverged on this shared, high-dimensional landscape. The greeble, by not emulating the species-specific motion kinematics, may have simply traced a different path through this feature space, resulting in a momentarily separable neural state. We believe that the distinctions in encoding that we observed was not a categorical social signal but rather a discrepancy of visual features over time (see Supplementary Figure 1). To verify this possibility, future work should model the kinematics of the videos specifically (e.g., head angular velocity, body displacement) to look for modulation of population activity that aligns with changes in these kinematic variables during the videos.
The absence of a clear difference in the NCL subpopulation reinforces our view that visual areas employ sparse coding of visual stimuli. NCL, the avian prefrontal equivalent, is known to encode task-relevant visual information for active categorization (; ). Its lack of any differential engagement during this passive task—while visual areas (MVL/Wulst) showed feature-driven modulation—is consistent with a functional division between visual feature processing and higher-order cognitive engagement in NCL. The present experiment did not require an explicit categorization of the stimuli during passive fixation, as is often the case in operant chamber tasks (Scarf et al., 2016; ; ,; ).
A separate finding was the presence of sound-modulated visual units, with a greater proportion in MVL compared to Wulst. This points to the tectofugal pathway (which includes MVL) as a site for audio-visual integration, building on earlier reports of auditory responses in the pigeon optic tectum (). This integration may be important for behavior, as other studies have shown that pigeons use vocalizations to aid in visual categorization (Partan et al., 2005). While the proportion of such neurons in our dataset was small, their presence in MVL suggests multimodal integration may occur early in the avian hierarchy of sensory processing. Sound modulated neurons that are primarily visually responsive are, in fact, found in the primate anterior fundus face patch (). These face selective neurons display enhanced responses to faces of conspecifics when the stimuli were accompanied by the conspecific’s vocalization. In the present experiment, although we find neurons that modulate their response to videos of pigeons with sound, their proportion was quite low as part of the population we sampled, making it difficult to know if these represent a genuine feature of the MVL.
A known problem of experiments that use videos to study the processing of social stimuli in pigeons is the color display (Ware et al., 2015). Pigeons have tetrachromatic color vision, making it likely that colored videos of conspecifics, displayed on monitors optimized for the human visual system, do not look very realistic. Nonetheless, studies that assessed pigeons’ behavioral responses towards videos of conspecifics, displayed without the UV component, report that animals maintain natural responses towards them (Shimizu, 1998; Ware et al., 2015; Ware et al., 2017). Therefore, pigeons are still able to extract relevant social information from video stimuli, displayed with suboptimal colors. At the neuronal level, sensitivity to colors in the UV range has been reported in Wulst (Nimpf et al., 2024). We expect videos of social scenes with naturalistic color information to be represented in MVL and Wulst similarly to the ones used in the present experiment, where video features are encoded at the population level. Nonetheless, we would expect a higher response range from the individual neurons in these regions, evoked by more naturalistic color information, that reflect the distance to the neuron’s preferred “prototype.” Taken together, the findings of the experiment suggest that the pigeon visual system, much like the primate visual cortex, does not employ a simple, invariant code for “conspecifics.” Instead, our findings support a high-dimensional tuning landscape model, where populations in MVL and Wulst encode specific features that change over time in dynamic videos.
Statements
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The animal study was approved by the University of Otago Animal Ethics Committee. The study was conducted in accordance with the local legislation and institutional requirements.
Author contributions
SS: Software, Methodology, Visualization, Data curation, Formal analysis, Investigation, Conceptualization, Writing – original draft. DB: Investigation, Writing – review & editing. PH: Writing – review & editing, Investigation. WC: Writing – review & editing, Methodology, Software, Conceptualization. JR: Supervision, Project administration, Writing – review & editing. MC: Methodology, Conceptualization, Supervision, Project administration, Writing – review & editing, Funding acquisition.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Royal Society of New Zealand Marsden Fund Grant to Michael Colombo (Royal Society of New Zealand Marsden grant UOO1917).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnbeh.2026.1736261/full#supplementary-material
References
1
Adobe Inc. (2023). Adobe After Effects.San Jose CA: Adobe Inc.
2
AronovD.NeversR.TankD. W. (2017). Mapping of a non-spatial dimension by the hippocampal–entorhinal circuit.Nature543719–722. 10.1038/nature21692
3
AziziA. H.PuschR.KoenenC.KlattS.BrökerF.ThieleS.et al (2019). Emerging category representation in the visual forebrain hierarchy of pigeons (Columba livia).Behav. Brain Res.356423–434. 10.1016/j.bbr.2018.05.014
4
BischofH.-J.EckmeierD.KearyN.LöwelS.MayerU.MichaelN. (2016). Multiple visual field representations in the visual wulst of a laterally eyed bird, the zebra finch (Taeniopygia guttata).PLoS One11:e0154927. 10.1371/journal.pone.0154927
5
BouchardJ.GoodyerW.LefebvreL. (2007). Social learning and innovation are positively correlated in pigeons (Columba livia).Anim. Cogn.10259–266. 10.1007/s10071-006-0064-1
6
BrainardD. H. (1997). The psychophysics toolbox.Spatial Vis.10433–436. 10.1163/156856897X00357
7
ClarkW. J.PorterB.ColomboM. (2019). Searching for face-category representation in the avian visual forebrain.Front. Physiol.10:140. 10.3389/fphys.2019.00140
8
ClarkW.ChilcottM.ColomboM. (2022b). The effect of progressive image scrambling on neuronal responses at three stations of the pigeon tectofugal pathway.Sci. Rep.12:14190. 10.1038/s41598-022-18006-0
9
ClarkW.ChilcottM.AziziA.PuschR.PerryK.ColomboM. (2022a). Neurons in the pigeon visual network discriminate between faces, scrambled faces, and sine grating images.Sci. Rep.12:589. 10.1038/s41598-021-04559-z
10
CunninghamJ. P.YuB. M. (2014). Dimensionality reduction for large-scale neural recordings.Nat. Neurosci.171500–1509. 10.1038/nn.3776
11
Dal BenR. (2023). SHINE_color: Controlling low-level properties of colorful images.MethodsX11:102377. 10.1016/j.mex.2023.102377
12
DelacouxM.ItaharaA.KanoF. (2025). Gaze following in pigeons increases with the number of demonstrators.iScience28:112857. 10.1016/j.isci.2025.112857
13
DesimoneR.AlbrightT.GrossC.BruceC. (1984). Stimulus-selective properties of inferior temporal neurons in the macaque.J. Neurosci.42051–2062. 10.1523/JNEUROSCI.04-08-02051.1984
14
DittrichW. H.LeaS. E. G.BarrettJ.GurrP. R. (1998). Categorization of natural movements by pigeons: Visual concept discrimination and biological motion.J. Exp. Anal. Behav.70281–299. 10.1901/jeab.1998.70-281
15
DitzH. M.NiederA. (2020). Format-dependent and format-independent representation of sequential and simultaneous numerosity in the crow endbrain.Nat. Commun.11:686. 10.1038/s41467-020-14519-2
16
DivacI.MogensenJ.BjörklundA. (1985). The prefrontal ‘cortex’ in the pigeon. Biochemical evidence.Brain Res.332365–368. 10.1016/0006-8993(85)90606-7
17
FrostB. J. (2010). A taxonomy of different forms of visual motion detection and their underlying neural mechanisms.Brain Behav. Evol.75218–235. 10.1159/000314284
18
GauthierI.WilliamsP.TarrM. J.TanakaJ. (1998). Training ‘greeble’ experts: A framework for studying expert object recognition processes. Vis. Res.38, 2401–2428. 10.1016/S0042-6989(97)00442-2
19
GreerD.LeiT.KryshtalA.JessenZ. F.SchwartzG. W. (2025). Visual identification of conspecifics shapes social behavior in mice.Curr. Biol.35287–299.e4. 10.1016/j.cub.2024.11.041
20
GüntürkünO. (2005). The avian ‘prefrontal cortex’ and cognition.Curr. Opin. Neurobiol.15686–693. 10.1016/j.conb.2005.10.003
21
GüntürkünO.BugnyarT. (2016). Cognition without cortex.Trends Cogn. Sci.20291–303. 10.1016/j.tics.2016.02.001
22
GüntürkünO.HahmannU. (1999). Functional subdivisions of the ascending visual pathways in the pigeon.Behav. Brain Res.98193–201. 10.1016/S0166-4328(98)00084-9
23
HartigF. (2024). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.4.7. Available online at: https://CRAN.R-project.org/package=DHARMa.
24
JohnstonM.AndersonC.ColomboM. (2017). Pigeon NCL and NFL neuronal activity represents neural correlates of the sample.Behav. Neurosci.131213–219. 10.1037/bne0000198
25
KartenH. J. (1969). The organization of the avian telencephalon and some speculations on the phylogeny of the amniote telencephalon.Ann. N. Y. Acad. Sci.167164–179. 10.1111/j.1749-6632.1969.tb20442.x
26
KartenH. J.HodosW. (1967). A Stereotaxic Atlas of the Brain of the Pigeon (Columba livia).Baltimore: Johns Hopkins Press.
27
KhandhadiaA. P.MurphyA. P.RomanskiL. M.BizleyJ. K.LeopoldD. A. (2021). Audiovisual integration in macaque face patch neurons.Curr. Biol.311826–1835.e3. 10.1016/j.cub.2021.01.102 .
28
KiyokawaY.KurodaN.TakeuchiY. (2022). The strain of unfamiliar conspecifics affects stress identification in rats.Behav. Process.201:104714. 10.1016/j.beproc.2022.104714
29
KrönerS.GüntürkünO. (1999). Afferent and efferent connections of the caudolateral neostriatum in the pigeon (Columba livia): A retro- and anterograde pathway tracing study.J. Comp. Neurol.407228–260. 10.1002/(SICI)1096-9861(19990503)407:2%253C228::AID-CNE6%253E3.0.CO;2-2
30
LenthR. (2025). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.11.2-8.Available online at: https://CRAN.R-project.org/package=emmeans (accessed June 3, 2025).
31
LewaldJ.DörrscheidtG. J. (1998). Spatial-tuning properties of auditory neurons in the optic tectum of the pigeon.Brain Res.790339–342. 10.1016/S0006-8993(98)00177-2
32
LiD. P.XiaoQ.WangS. R. (2007). Feedforward construction of the receptive field and orientation selectivity of visual neurons in the pigeon.Cereb. Cortex17885–893. 10.1093/cercor/bhk043
33
McMahonD. B. T.RussB. E.ElnaiemH. D.KurnikovaA. I.LeopoldD. A. (2015). Single-unit activity during natural vision: Diversity, consistency, and spatial sensitivity among AF Face patch neurons.J. Neurosci.355537–5548. 10.1523/JNEUROSCI.3825-14.2015
34
NagyM.VásárhelyiG.PettitB.Roberts-MarianiI.VicsekT.BiroD. (2013). Context-dependent hierarchies in pigeons.Proc. Natl. Acad. Sci. U.S.A.11013049–13054. 10.1073/pnas.1305552110
35
NakamuraT.CroftD. B.WestbrookR. F. (2003). Domestic pigeons (Columba livia) discriminate between photographs of individual pigeons.Learn. Behav.31307–317. 10.3758/BF03195993
36
NalbachH.-O.Wolf-OberhollenzerF.KirschfeldK. (1990). The pigeon’s eye viewed through an ophthalmoscopic microscope: Orientation of retinal landmarks and significance of eye movements.Vis. Res.30529–540. 10.1016/0042-6989(90)90065-S
37
NgB. S. W.Grabska-BarwińskaA.GüntürkünO.JanckeD. (2010). Dominant vertical orientation processing without clustered maps: Early visual brain dynamics imaged with voltage-sensitive dye in the pigeon visual Wulst.J. Neurosci.306713–6725. 10.1523/JNEUROSCI.4078-09.2010
38
NiederA. (2017). Inside the corvid brain—Probing the physiology of cognition in crows.Curr. Opin. Behav. Sci.168–14. 10.1016/j.cobeha.2017.02.005
39
NimpfS.KaplanH. S.NordmannG. C.CushionT.KeaysD. A. (2024). Long-term, high-resolution in vivo calcium imaging in pigeons.Cell Reports Methods4:100711. 10.1016/j.crmeth.2024.100711
40
OttT.NiederA. (2016). Dopamine D2 Receptors Enhance Population Dynamics in Primate Prefrontal Working Memory Circuits.Cereb. Cortex274423–4435. 10.1093/cercor/bhw244
41
OttoT.RoseJ. (2023). The open toolbox for behavioral research.Behav. Res. Methods564522–4529. 10.3758/s13428-023-02199-x
42
PangR.LansdellB. J.FairhallA. L. (2016). Dimensionality reduction in neuroscience.Curr. Biol.26R656–R660. 10.1016/j.cub.2016.05.029
43
PartanS.YeldaS.PriceV.ShimizuT. (2005). Female pigeons, Columba livia, respond to multisensory audio/video playbacks of male courtship behaviour.Anim. Behav.70957–966. 10.1016/j.anbehav.2005.03.002
44
PerrettD. I.RollsE. T.CaanW. (1982). Visual neurones responsive to faces in the monkey temporal cortex.Exp. Brain Res.47329–342. 10.1007/bf00239352
45
PonceC. R.XiaoW.SchadeP. F.HartmannT. S.KreimanG.LivingstoneM. S. (2019). Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences.Cell177999–1009. 10.1016/j.cell.2019.04.005
46
R Core Team. (2024). R: A Language and Environment for Statistical Computing.Vienna: R Foundation for Statistical Computing.
47
RollsE. T.ToveeM. J. (1995). Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex.J. Neurophysiol.73713–726. 10.1152/jn.1995.73.2.713
48
Salinas RuízJ.Montesinos LópezO. A.Hernández RamírezG.Crossa HiriartJ. (2023). Generalized Linear Mixed Models with Applications in Agriculture and Biology.Berlin: Springer International Publishing. 10.1007/978-3-031-32800-8
49
ScarfD.StuartM.JohnstonM.ColomboM. (2016). Visual response properties of neurons in four areas of the avian pallium.J. Comp. Physiol. A202235–245. 10.1007/s00359-016-1071-6
50
ShimizuT. (1998). Conspecific recognition in pigeons (Columba livia) using dynamic video images.Behaviour13543–53. 10.1163/156853998793066429
51
ShimizuT.BowersA. N. (1999). Visual circuits of the avian telencephalon: Evolutionary implications.Behav. Brain Res.98183–191. 10.1016/S0166-4328(98)00083-7
52
The MathWorks Inc. (2024). MATLAB.Available online at: https://www.mathworks.com (accessed September 12, 2024).
53
TsaoD. Y.FreiwaldW. A.TootellR. B.LivingstoneM. S. (2006). A cortical region consisting entirely of face-selective cells.Science311670–674. 10.1126/science.1119983
54
WangB.PonceC. R. (2022). Tuning landscapes of the ventral stream.Cell Rep.41:111595. 10.1016/j.celrep.2022.111595
55
WareE. L. R.SaundersD. R.TrojeN. F. (2017). Social interactivity in pigeon courtship behavior.Curr. Zool.6385–95. 10.1093/cz/zow066
56
WareE.SaundersD. R.TrojeN. F. (2015). The influence of motion quality on responses towards video playback stimuli.Biol. Open4803–811. 10.1242/bio.011270
57
WilkinsonA.SpechtH. L.HuberL. (2010). Pigeons can discriminate group mates from strangers using the concept of familiarity.Anim. Behav.80109–115. 10.1016/j.anbehav.2010.04.006
58
WillenbockelV.SadrJ.FisetD.HorneG. O.GosselinF.TanakaJ. W. (2010). Controlling low-level image properties: The SHINE toolbox.Behav. Res. Methods42671–684. 10.3758/BRM.42.3.671
59
YuZ.GuindaniM.GriecoS. F.ChenL.HolmesT. C.XuX. (2022). Beyond t test and ANOVA: Applications of mixed-effects models for more rigorous statistical analysis in neuroscience research.Neuron11021–35. 10.1016/j.neuron.2021.10.030
Summary
Keywords
conspecific recognition, dynamic stimuli, feature-based coding, multimodal integration, pigeon, sparse coding, tectofugal pathway, thalamofugal pathway
Citation
Santos Silva S, Bühn D, Hall P, Clark W, Rose J and Colombo M (2026) Single neuron responses in NCL, MVL, and Wulst during the observation of videos of conspecifics support population feature coding. Front. Behav. Neurosci. 20:1736261. doi: 10.3389/fnbeh.2026.1736261
Received
31 October 2025
Revised
26 January 2026
Accepted
30 January 2026
Published
26 February 2026
Volume
20 - 2026
Edited by
Mengmeng Li, Zhengzhou University, China
Reviewed by
Lukas Anneser, Friedrich Miescher Institute for Biomedical Research (FMI), Switzerland
Jiangtao Wang, Zhengzhou University, China
Updates
Copyright
© 2026 Santos Silva, Bühn, Hall, Clark, Rose and Colombo.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sara Santos Silva, sara.santossilva@ruhr-uni-bochum.de
ORCID: Sara Santos Silva, orcid.org/0009-0000-0322-3056; Jonas Rose, orcid.org/0000-0003-1745-727X; Michael Colombo, orcid.org/0000-0002-2574-4888
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.