Unmixing Binocular Signals

Incompatible images presented to the two eyes lead to perceptual oscillations in which one image at a time is visible. Early models portrayed this binocular rivalry as involving reciprocal inhibition between monocular representations of images, occurring at an early visual stage prior to binocular mixing. However, psychophysical experiments found conditions where rivalry could also occur at a higher, more abstract level of representation. In those cases, the rivalry was between image representations dissociated from eye-of-origin information, rather than between monocular representations from the two eyes. Moreover, neurophysiological recordings found the strongest rivalry correlate in inferotemporal cortex, a high-level, predominantly binocular visual area involved in object recognition, rather than in early visual structures. An unresolved issue is how the separate identities of the two images can be maintained after binocular mixing so that rivalry is possible at higher levels. Here we demonstrate that after the two images are mixed, they can be unmixed at any subsequent stage using a physiologically plausible non-linear signal-processing algorithm, non-negative matrix factorization, previously proposed for parsing object parts during object recognition. The possibility that unmixed left and right images can be regenerated at late stages within the visual system provides a mechanism for creating various binocular representations and interactions de novo in different cortical areas for different purposes, rather than inheriting them from early areas. This is a clear example of how non-linear algorithms can lead to highly non-intuitive behavior in neural information processing.

... modest rivalry effects. Weak correlates of rivalry were reported for single-cell recordings in striate cortex, and no rivalry-related activity was reported for single-cell recordings in lateral geniculate nucleus (Lehky and Maunsell, 1996; Wilke et al., 2009).
fMRI studies, on the other hand, produced somewhat different results from single-cell physiology, showing vigorous rivalry correlates in striate cortex (Polonsky et al., 2000; Tong and Engel, 2001; Lee et al., 2007) and to some extent in lateral geniculate nucleus as well (Haynes et al., 2005; Wunderlich et al., 2005).
Overall, the psychophysical, neurophysiological, and fMRI data provide evidence for rivalry occurring at a wide range of levels within the visual system. Faced with this body of results, a new class of "hierarchical" binocular rivalry models was created (Wilson, 2003; Freeman, 2005). Earlier models had postulated reciprocal inhibition between monocular representations of images tied to signals from left and right eyes. Hierarchical models augmented that with an additional stage (or stages) involving inhibition between higher-level, binocular representations of images, where eye-of-origin was lost. That allowed "eye rivalry" to occur at lower levels of the visual system and "image rivalry" to occur at higher levels.
An unresolved issue in hierarchical models is how the separate identities of the two images can be maintained after binocular mixing so that rivalry is possible at higher levels. We suggest that a way for left and right images to retain their separate identities after binocular mixing is simply to unmix them. Recently a new class of non-linear signal-processing algorithms has been developed that has the potential to do that, called blind source separation (BSS) algorithms (Choi et al., 2005; Cichocki et al., 2009; Comon and Jutten, 2010). BSS algorithms separate signal mixtures into component "sources." The algorithms are called "blind" because they are given little or no information about the nature of the underlying source signals they are trying to recover. Because they are blind, they fall into the category of unsupervised learning algorithms.
From amongst the various BSS algorithms we focus on one, non-negative matrix factorization (NMF; Lee and Seung, 1999). The non-negativity constraint in NMF is appealing for applications in neural processing, as firing rates must be non-negative. However, the ability to do binocular unmixing is not unique to NMF, and we shall also demonstrate it using a second, unrelated BSS algorithm called independent component analysis (ICA). Matlab code for NMF was obtained from Hoyer (2011) and for ICA from Hyvarinen (2011). We believe that this is the first suggestion that BSS algorithms may be dynamically operating within the brain for real-time visual processing.
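As a rough illustration of what NMF does, the following Python sketch factors a matrix of mixtures B into two non-negative matrices M and A using multiplicative updates for a divergence-based error (Lee and Seung, 2001). It is not the Matlab code used in this study. The function names, the toy one-dimensional sources, the mixing weights, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def kl_divergence(B, R, eps=1e-12):
    """Generalized KL ("entropy") divergence D(B || R) for non-negative arrays."""
    return float(np.sum(B * np.log((B + eps) / (R + eps)) - B + R))


def nmf_kl(B, rank=2, n_iter=300, seed=0, eps=1e-12):
    """Factor B (pixels x mixtures) as M @ A with M >= 0 and A >= 0.

    Uses multiplicative update rules for the KL divergence.
    Returns M (pixels x rank), A (rank x mixtures) and the divergence history.
    """
    rng = np.random.default_rng(seed)
    n_pixels, n_mixtures = B.shape
    M = rng.random((n_pixels, rank)) + 0.1    # strictly positive random start
    A = rng.random((rank, n_mixtures)) + 0.1
    history = [kl_divergence(B, M @ A)]
    for _ in range(n_iter):
        # Update mixing coefficients A, holding M fixed.
        ratio = B / (M @ A + eps)
        A *= (M.T @ ratio) / (M.sum(axis=0)[:, None] + eps)
        # Update source estimates M, holding A fixed.
        ratio = B / (M @ A + eps)
        M *= (ratio @ A.T) / (A.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(B, M @ A))
    return M, A, history


# Toy data (hypothetical): two positive 1-D "sources" mixed in five proportions.
t = np.linspace(0.0, 1.0, 400)
sources = np.column_stack([
    0.5 + 0.4 * np.sin(2 * np.pi * 3 * t),
    0.5 + 0.4 * np.cos(2 * np.pi * 7 * t),
])
weights = np.array([0.67, 0.60, 0.50, 0.40, 0.33])
mixing = np.vstack([weights, 1.0 - weights])
B = sources @ mixing

M, A, history = nmf_kl(B)
print(f"divergence: {history[0]:.3f} -> {history[-1]:.6f}")
```

Because the updates are multiplicative and start from positive values, M and A remain non-negative throughout. Different random seeds can lead to different factorizations, so a sketch like this need not recover the sources exactly.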

Results
Two pairs of images were used to test the algorithms (Figure 1), a pair of orthogonal sinusoidal gratings and a face/house pair. Both stimulus classes are widely used in binocular rivalry studies. Each pair was linearly mixed in various proportions to form five mixed images. This variable mixing in the algorithm corresponds to physiological observations that binocular neurons in striate cortex of macaque monkey occur in various ocular dominance mixtures (Hubel and Wiesel, 1968). In the words of Hubel and Wiesel (1977), "Just why the two eyes should be brought together in this elaborate but incomplete way is not yet clear. What the ocular dominance columns appear to achieve is a partial mixing of influences from the two eyes, with all shades of ocular dominance throughout the entire binocular field of vision." Whatever the reason for this variable binocular mixing, it is precisely what is needed for BSS algorithms to work. The algorithms would not work if only a single binocular mixture were available. fMRI studies also show ocular dominance columns in humans (Cheng et al., 2001; Yacoub et al., 2007), suggesting variable binocular mixing may be similar in humans and macaque monkeys.
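The role of variable mixing can be illustrated with a small Python sketch. It builds two orthogonal gratings, mixes them linearly in five proportions, and compares the rank of the five-mixture matrix with that of a single mixture. The image size, the spatial frequency, and the three intermediate mixing weights are assumptions made for illustration. The final step uses the known mixing coefficients to undo the mixing, which a BSS algorithm would instead have to estimate blindly.

```python
import numpy as np

# Hypothetical stand-ins for the grating pair: 20 x 20 pixels for brevity.
size = 20
y, x = np.mgrid[0:size, 0:size] / size
left = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * x)    # vertical grating
right = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * y)   # horizontal grating

# Unfold each image into a column of pixels: M is (pixels x 2).
M = np.column_stack([left.ravel(), right.ravel()])

# Five left/right mixtures spanning 67%/33% to 33%/67%
# (the three intermediate values are assumed).
left_weight = np.array([0.67, 0.60, 0.50, 0.40, 0.33])
A = np.vstack([left_weight, 1.0 - left_weight])   # (2 x 5) mixing matrix

B = M @ A   # (pixels x 5): each column is one binocular mixture

# Five different mixtures span a 2-D subspace; a single mixture spans only 1-D,
# so it cannot determine two separate source images.
rank_five = np.linalg.matrix_rank(B)
rank_one = np.linalg.matrix_rank(B[:, [2]])

# With A known, the Moore-Penrose pseudo-inverse undoes the mixing.
recovered = B @ np.linalg.pinv(A)
max_error = np.max(np.abs(recovered - M))

print(rank_five, rank_one, max_error)
```

The same construction applies to any image pair, since only the mixing proportions enter the matrix A.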
Variable ocular dominance also occurs in extrastriate visual cortex. Ocular dominances in extrastriate cortex are more narrowly spread than in striate cortex, as indicated by data from inferotemporal cortex (Uka et al., 2000) and area MT (Kiorpes et al., 1996). The unmixing results reported here were produced using left/right ocular dominance mixtures spread over the range from 67%/33% to 33%/67%, as shown in Figure 1. However, similar results were obtained using an even narrower spectrum of ocular dominances, going from 55%/45% to 45%/55%, so it does not take a large range to allow the BSS algorithms to work. The variability of ocular dominances in extrastriate cortex appears sufficient to support the sort of binocular unmixing being proposed here.
The NMF algorithm was implemented in terms of matrix algebra (Figure 2A). The procedure was to factorize the binocular mixture matrix B into two matrices, B = M × A, subject to the constraint that M and A be non-negative. Each column in the binocular mixture matrix B corresponded to one mixed image (there are five mixed images in this example). Each row corresponded to a different image pixel. Starting from random values of M and A, the algorithm iteratively updated their values so as to reduce error between M × A and B, following standard update rules for the algorithm using an error measure based on entropy divergence (Lee and Seung, 1999, 2001). (The error measure used is not critical for the algorithm.) Gradually the two images unmixed as M × A converged to B. The binocular mixture matrix B was now expressed in terms of the multiplication of M, a matrix containing the two unmixed monocular images, by A, a matrix containing mixing coefficients.
[Figure caption; beginning missing] ... Figure 1. Each column had 40,000 rows, corresponding to 40,000 pixels in each image (200 × 200 pixels). Thus each image is "unfolded" from a 2D array to a 1D column of pixels. The binocular matrix B was factored into two non-negative matrices M and A such that B = M × A. The factorization was done by iteratively updating M and A in accord with the NMF algorithm so as to gradually reduce error between B and M × A, with error based on entropy divergence (Lee and Seung, 1999, 2001). The matrix M had two columns, containing left and right source images, and 40,000 rows. The matrix A contained mixing coefficients, which combined the two source images in M to form different binocular mixtures. Matrix A had five columns and two rows, corresponding to five pairs of mixing coefficients to produce five different binocular mixtures.
What we really want to solve, however, is the inverse problem to that described above. Rather than find the matrix A of mixing coefficients used to combine monocular images into binocular mixtures (Figure 2Ai), we want an unmixing matrix W that can decompose the binocular mixtures into component monocular images: B × W = M (Figure 2Aii). Fortunately there is a simple relationship between the mixing and unmixing matrices: they are inverses of each other, W = A⁺. (In this case, because the mixing and unmixing matrices are not square, the Moore-Penrose generalized inverse A⁺ must be used rather than the regular matrix inverse A⁻¹.) Although we applied the algorithm directly to image pixel values, the principle remains the same whether the numbers in matrices M and B represent pixel values or neural firing rates derived by convolving receptive fields with the image.
The unmixing algorithm can be given a more physiological interpretation by formulating it in terms of a neural network rather than matrix algebra (Figure 2B). The iterative nature of the algorithm is indicated by the feedback loop originating from the outputs. The gradual unmixing of the binocular signal as it cycles through the feedback loop may have a perceptual correlate in binocular rivalry. When orthogonal gratings are briefly flashed to the two eyes for less than 150 ms they appear mixed, in a checkerboard pattern (Wolfe, 1983). It is only after longer exposure that the mixture disappears and the image from one eye or the other starts to predominate.
Feedback was mathematically implemented here as discrete time updates on a set of matrices. It could equivalently be expressed within a network as a non-linear dynamical system operating in continuous time, expressed as a set of coupled differential equations. As the dynamical system evolves to a stable point (unmixed images at the output), it is not only neural activities that must change dynamically, but also the strengths of synaptic interactions. There is indeed evidence for rapid dynamic modulation of neural connectivity in a network (Vaadia et al., 1995), and rapid synaptic plasticity as a mechanism for implementing neural computations has been reviewed by Abbott and Regehr (2004).
Unmixing produced by the NMF algorithm was not perfect. There was residual crosstalk within the two unmixed images. This was apparent when an unmixed image was subtracted from the original source image (Figure 3). The crosstalk was small enough, however, that in most trials it was not apparent upon inspection of the unmixed images. However, in some trials (around 25% for the face/house pair), the NMF algorithm converged to a situation with visible crosstalk, possibly because lack of noise in the algorithm allowed it to get stuck in a local error minimum. Details of the crosstalk pattern varied from trial to trial as the algorithm started from different random states.
In addition to NMF, we tried another BSS algorithm, ICA (Bell and Sejnowski, 1995; Hyvärinen and Oja, 2000; Stone, 2002). Instead of being constrained to finding non-negative factors of a matrix, this algorithm was constrained to find a set of unmixed images that were as statistically independent as possible from each other. FastICA (Hyvärinen and Oja, 2000) was the specific variant of the ICA algorithm used. ICA was able to unmix binocular images in a manner similar to NMF (compare Figures 3Aii,B). Unlike NMF, ICA never converged to produce visible crosstalk between unmixed images, although subliminal crosstalk remained. The ICA algorithm, on the other hand, did have the disadvantage that in 50% of unmixing trials the recovered images were contrast reversed, as ICA did not have a non-negativity constraint.
The NMF algorithm was able to unmix gratings with small orientation differences, down to the smallest difference tested of 1°. In contrast, the ICA algorithm had an increasing probability of finding an incorrect solution to the unmixing problem as the orientation difference dropped below 15°.
Although both BSS algorithms were capable of unmixing images, they differed in the details of their behavior. Presumably other BSS algorithms would each have their own mix of characteristics.

Discussion
Binocular unmixing neatly solves the problem of how two images can retain their separate identities after binocular mixing, so that rivalry can occur between high-level binocular representations of incompatible images. Although unmixed images appear virtually identical to the original monocular images (Figure 3), they are binocularly driven (Figure 2B).
The ability of two unrelated algorithms, NMF and ICA, to unmix binocular signals suggests that there is a whole class of BSS algorithms having similar capabilities. This opens the opportunity for combined theoretical and experimental investigations to uncover the particular implementation that may be occurring biologically.
The binocular unmixing model does not consider how the oscillations of rivalry themselves are produced. The actual oscillations during rivalry would require further interactions between the two images after unmixing. Mechanisms to produce oscillations have already been extensively modeled (among them Lehky, 1988; Lumer, 1998; Laing and Chow, 2002; Wilson, 2003, 2007; Freeman, 2005; Grossberg et al., 2008; Gigante et al., 2009). Binocular unmixing augments those models of oscillations by creating conditions at higher visual levels that allow them to operate. The binocular unmixing model also does not consider mechanisms of perceptual grouping that occur under some rivalry conditions (Dörrenhaus, 1975; Kovács et al., 1996; Ngo et al., 2000). Grouping mechanisms in rivalry have received less theoretical attention than oscillatory mechanisms (although see Grossberg et al., 2008). Binocular unmixing again serves to create conditions at higher visual levels that would allow grouping algorithms to operate.
As signals pass through the unmixing circuitry, eye-of-origin labeling is lost in the recovered left and right images. There is no way to tell which image originated from the left eye and which originated from the right eye. This loss of eye-of-origin information is consistent with the psychophysical data outlined earlier, and is in fact a defining characteristic of high-level "image rivalry." The situation is different for stereopsis, where the preservation of disparity sign (near/far) indicates that eye-of-origin information is implicitly retained within the population of binocular cells. That was emphasized by Assee and Qian (2007) in a model of da Vinci stereopsis that extracted eye-of-origin information for occluded monocular regions using binocular cells. While the BSS algorithms used here lose eye-of-origin information, in the future it might be possible to devise binocular unmixing models that do retain such information, for applications other than rivalry.
We found a low level of crosstalk in the unmixed left and right images (Figure 3). Binocular crosstalk has not been a prediction of previous binocular models. In experimental observations under conditions of high-level "image rivalry," we would expect a strong level of crosstalk immediately following the initial presentation of rivalrous stimuli, with the crosstalk smoothly decaying over time to some non-zero value before the oscillations started. Subliminal crosstalk would remain during the oscillatory period.
Non-negative matrix factorization was introduced as a possible mechanism for parsing objects into parts for object recognition (Lee and Seung, 1999). We see that it may also be involved in binocular rivalry. At the single neuron level, neurophysiological correlates of binocular rivalry are strongest in inferotemporal cortex (Sheinberg and Logothetis, 1997), a ventral visual area associated with object recognition, and weaker in striate cortex or in the dorsal visual pathway (Logothetis and Schall, 1989). Although as a binocular phenomenon rivalry tends to be most associated with stereopsis, we suggest at higher levels it may also have connections with mechanisms of shape representation during object recognition.
Besides binocular rivalry in inferotemporal cortex, another example that might use binocular unmixing involves area MT, a cortical area believed to represent visual motion. There is evidence that area MT can support comparisons between velocities in left and right images for computation of 3D motion (Rokers et al., 2009, 2011), despite being binocularly driven. In this case, MT appears to be performing visual processing as if it had access to the original unmixed images. Binocular unmixing thus raises the possibility that new binocular interactions between left and right images can be created in different cortical areas for different purposes, rather than being inherited from striate cortex.

Acknowledgments
I thank Saumil Patel, Anne Sereno, and Christian Wehrhahn for comments on the manuscript.