

Front. Comput. Neurosci., 24 June 2009
Volume 3 - 2009

Modeling multisensory enhancement with self-organizing maps

Department of Computer Science, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
Department of Anatomy and Neurobiology, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
Self-organization, a process by which the internal organization of a system changes without supervision, has been proposed as a possible basis for multisensory enhancement (MSE) in the superior colliculus (Anastasio and Patton, 2003 ). We simplify and extend these results by presenting a simulation using traditional self-organizing maps, intended to understand and simulate MSE as it may generally occur throughout the central nervous system. This simulation of MSE: (1) uses a standard unsupervised competitive learning algorithm, (2) learns from artificially generated activation levels corresponding to driven and spontaneous stimuli from separate and combined input channels, (3) uses a sigmoidal transfer function to generate quantifiable responses to separate inputs, (4) enhances the responses when those same inputs are combined, (5) obeys the inverse effectiveness principle of multisensory integration, and (6) can topographically congregate MSE in a manner similar to that seen in cortex. Thus, the model provides a useful method for evaluating and simulating the development of enhanced interactions between responses to different sensory modalities.

Introduction


The nervous system must deal with a complex environment containing events that are often simultaneously encoded by different sensory modalities, the detection of which can provide considerable adaptive significance to an organism. To accomplish this, inputs from different sensory systems converge onto individual neurons, whereby information from different senses is integrated into signals that differ from those produced by either of the same senses stimulated alone. One of the most evident and perhaps best studied forms of multisensory integration is response enhancement, which is a significant increase in activity in response to combined-modality stimulation (Meredith and Stein, 1983, 1986). How different sensory inputs converge to produce multisensory response enhancement is a question that has received considerable interest, and the multisensory, or deeper, layers of the superior colliculus have provided a fertile site in which to examine this phenomenon (for review, see Stein and Meredith, 1993).
Because the superior colliculus is a robust locus for multisensory integration, efforts to mathematically model response enhancement naturally have attempted to emulate the physiological features and connectional architecture of the structure (Anastasio and Patton, 2003; Anastasio et al., 2002; Colonius and Diederich, 2001, 2004; Patton and Anastasio, 2003; Patton et al., 2002; Raginsky and Anastasio, 2008). In particular, Anastasio and Patton (2003) have developed a hybrid neural computing system tuned to specifically simulate the deep layers of the superior colliculus. After two stages of training using several empirical parameters and simulated unimodal and bimodal inputs with spontaneous and driven activity, their innovative model successfully generated examples of multisensory enhancement (MSE; Anastasio and Patton, 2003).
It is well known that multisensory convergence and processing occur in numerous areas outside the superior colliculus, such as the cerebral cortex (for review, see Ghazanfar and Schroeder, 2006). Therefore, this study was directed toward constructing a simpler, more general model of multisensory integration whose features are not tied to or founded upon a particular neural region. This model is based on the following assumptions. Donald Hebb’s observations that neurons can autonomously learn to associate presynaptic and postsynaptic behavior led to the coincidence learning category of learning laws in neural computing (Haykin, 1999; Hebb, 1949). Coincidence learning has been used for simulating multisensory integration within the deep layers of the superior colliculus by establishing correlation between primary inputs and outputs (Anastasio and Patton, 2003) and for training a multinet neural computing system to associate two different modalities of input in numerosity simulations (Ahmad et al., 2002). Unsupervised classification learning is a process that clusters input data using no a priori knowledge about an input’s membership in a particular class. Rather, gradually detected characteristics and a history of training are used to assist in defining classes and possible boundaries between them (Haykin, 1999). Coincidence and competitive learning are two types of unsupervised learning paradigms. In competitive learning, the neurons of the neural network compete with each other during each episode of learning with the result that only the most active neuron (or group of neurons) is declared the winner for the given input stimulus (Hecht-Nielsen, 1990). Furthermore, only the winning neuron and neurons within a given neighborhood of the winner are allowed to change their weights. In both types of learning, the artificial neurons adjust their numerical weights according to the numerical representations of the input stimuli. However, the formulas used to adjust the weights are typically different.
The self-organizing map (SOM) is a biologically inspired competitive learning algorithm (Grossberg, 1976; Kohonen, 1993; von der Malsburg, 1973; Willshaw and von der Malsburg, 1976). The SOM model of map formation was derived in an attempt “to find the abstract self-organizing processes in which maps resembling the brain maps are formed” (Kohonen, 2001: p. 104). In his recent review of neural map formation research, Goodhill notes that models such as the SOM “give a remarkably good fit to experimental data on the geometrical properties of maps in (primary visual cortex), including subtle changes in these properties following various forms of visual deprivation” (Goodhill, 2007). In the map, neurons are usually placed at the nodes of a two-dimensional lattice and become selectively tuned to respond more to various classes of input patterns. The unsupervised tuning via training leads to the formation of a topographic map of the different features of the input, hence the synonymous name self-organizing feature map.
The present efforts, based in part on the first step of the Anastasio and Patton (2003) model, employ the principles of self-organizing feature maps and have created a simulation that learns to generate MSE from artificially generated combinations of sensory input signals. Furthermore, several characteristic properties of MSE, including the principle of inverse effectiveness, are observed in the simulations. Our unsupervised model is based on the widely used sigmoidal transfer function and the traditional SOM, and is trained via competition amongst its constituent artificial neurons (Kohonen, 1990 ).

Materials and Methods

SOMs used in this study were implemented in the Java™ programming language (freely available for downloading from the author’s website). Within the SOM, each artificial neuron in the lattice has an associated weight vector $\mathbf{w}_i$ whose components correspond to the neuron’s inputs. When an input vector $\mathbf{x}$ representing m modalities,

$$\mathbf{x} = [x_1, x_2, \ldots, x_m]^T,$$

is presented to the network, each neuron i competes on the basis of which has its weight vector $\mathbf{w}_i$ closest to $\mathbf{x}$. The proximity is measured in terms of a distance function which can be the Euclidean distance or the angle between the input vector and the weight vector. The node with the smallest Euclidean distance or angle from the input vector is then declared the winner and allowed to change its weights toward the input vector.
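The winner-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: Python is used for brevity, and the dictionary layout of the map and the names `euclidean_distance`, `find_winner`, and `toy_weights` are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def find_winner(weights, x):
    """Return the grid position whose weight vector lies closest to x.

    `weights` is a hypothetical layout: a dict mapping (row, col) grid
    positions to weight vectors.
    """
    return min(weights, key=lambda pos: euclidean_distance(weights[pos], x))


# Toy two-modality map with three nodes (illustrative values only).
toy_weights = {(0, 0): [1.0, 0.0], (0, 1): [0.7, 0.7], (1, 0): [0.0, 1.0]}
winner = find_winner(toy_weights, [0.9, 0.1])
```

In this toy map the input dominated by the first modality selects the node at (0, 0), whose weight vector points along that modality.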
The standard competitive learning rule computes the weight change from the difference between a neuron’s input and its current connection strength, or weight,

$$\Delta \mathbf{w}_i = \eta(i, k)\,(\mathbf{x} - \mathbf{w}_i),$$

where η is called the neighborhood function, which is a decreasing function of the distance between the winning neuron k and the neuron i currently being updated (Kohonen, 1990). This weight updating rule helps in damping any explosive growth of weights that may be encountered in coincidence learning (Haykin, 1999). The neighborhood function is the Gaussian

$$\eta(i, k) = \alpha \exp\left(-\frac{\mathrm{dist}(\mathbf{r}_k, \mathbf{r}_i)^2}{2\sigma^2}\right),$$

where 0 < α ≤ 1 is a monotonically decreasing learning-rate factor, σ is the width of the neighborhood function, $\mathbf{r}_k$ and $\mathbf{r}_i$ correspond to vectors containing the map coordinates of the winning neuron k and the current neuron i, and dist is a function giving the distance between two nodes.
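A minimal sketch of this update, assuming the usual Kohonen form in which the weight change is η(x − w) and η is the learning rate α multiplied by a Gaussian of the grid distance with variance σ². The function names `neighborhood` and `update_weights` are illustrative, not taken from the authors' code.

```python
import math


def neighborhood(alpha, sigma, grid_dist):
    """Gaussian neighborhood value for a neuron at grid_dist from the winner.

    Assumed form: alpha * exp(-d^2 / (2 * sigma^2)).
    """
    return alpha * math.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))


def update_weights(w, x, eta):
    """Move weight vector w toward input x by the fraction eta."""
    return [wi + eta * (xi - wi) for wi, xi in zip(w, x)]


# The winner itself (grid distance 0) moves by the full learning rate.
eta_winner = neighborhood(alpha=0.5, sigma=1.0, grid_dist=0.0)
new_w = update_weights([0.0, 1.0], [1.0, 0.0], eta_winner)
```

Because the change is proportional to (x − w), a weight vector can never overshoot the input when η ≤ 1, which is the damping property noted in the text.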
In this paper, we assigned a length of one between all neighboring connections on the grid of the SOM, including the diagonal city-block metric. Mathematically, we compute the distance between two neurons on the grid using
where rmi is the ith coordinate of the vector rm, which corresponds to the position of node m in the map. In addition, the Euclidean distance function produced maps similar to those obtained with the city-block metric. Our key results appear to remain invariant across both distance functions.
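The grid distance equation is not reproduced here, and the wording admits more than one reading. The sketch below therefore shows three candidate functions under stated assumptions: a maximum-coordinate distance (which gives every neighbor, diagonals included, a length of one), a sum-of-coordinates city-block distance, and the Euclidean distance mentioned as an alternative. All function names are hypothetical.

```python
import math


def max_coordinate_distance(r_a, r_b):
    """Every neighbor, including diagonal ones, is one step away."""
    return max(abs(a - b) for a, b in zip(r_a, r_b))


def city_block_distance(r_a, r_b):
    """Sum of per-coordinate differences; a diagonal neighbor is two steps."""
    return sum(abs(a - b) for a, b in zip(r_a, r_b))


def euclidean_grid_distance(r_a, r_b):
    """Straight-line distance between two grid positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_a, r_b)))


diagonal_neighbor = ((0, 0), (1, 1))
d_max = max_coordinate_distance(*diagonal_neighbor)
d_city = city_block_distance(*diagonal_neighbor)
```

The two discrete metrics differ only in how they treat diagonal moves, which is consistent with the report that the key results were similar across distance functions.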

System Design

Learning algorithm

The simulations in this paper were based on competitive learning with the traditional SOM paradigm, as described. Artificially generated input vectors whose components corresponded to stimulus levels of different stimulus modalities were used to train the SOMs. The 10 by 10 SOM comprised weighted connections between the simulated modality inputs and the neurons on the map. The weighted connections for each neuron on the map were initialized with weights picked uniformly at random from the range 0 to 1. Furthermore, the weights for every SOM neuron were normalized initially and after every step of training. The learning rate (α) started at 1.0 and was monotonically decreased linearly over the entire learning period (5,000 iterations) to 0.01. Unless indicated otherwise, the neighborhood width (σ) was kept constant at 1.0. The rectangular distance on the map was always used for the distance function between neurons.
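The initialization and learning-rate schedule described above could look roughly like this. It is a sketch under assumptions: the map is stored as a dictionary keyed by grid position, normalization is to Euclidean length 1 (as stated later for weight updating), and the names `normalize`, `init_map`, and `learning_rate` are hypothetical.

```python
import math
import random


def normalize(w):
    """Scale a weight vector to Euclidean length 1 (left as-is if all zero)."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else list(w)


def init_map(rows, cols, m, rng):
    """Weights drawn uniformly from [0, 1), then normalized, for each node."""
    return {
        (i, j): normalize([rng.random() for _ in range(m)])
        for i in range(rows)
        for j in range(cols)
    }


def learning_rate(t, total_steps=5000, start=1.0, end=0.01):
    """Linear decay from `start` at t = 0 to `end` at the final iteration."""
    return start + (end - start) * t / (total_steps - 1)


som = init_map(10, 10, m=3, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing maps trained from different initial weights.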

Artificial input data

Following Anastasio and Patton (2003) , the SOM was trained and tested using an m component random vector, where m represents the number of modalities. Each component of the m component vector was a randomly chosen integer between 0 and 20 (inclusive) that reflected the degree to which the corresponding simulated sensory modality input was driven or spontaneous. In each iteration of training, the driven and spontaneous characteristics of simulated activation levels for each modality were determined according to a modality combination string which was chosen randomly over all possible strings of 0’s and 1’s with length m (e.g., 100, 111 for m = 3).
Each particular modality combination string defined which modalities were driven and which were spontaneous. Spontaneous activation was denoted with a 0 and driven activation with a 1. For example, the string 111 corresponds to all driven modalities, and the string 010 corresponds to a driven second modality and spontaneous first and third modalities.
During the simulation, the probability of selecting each modality combination string was equal (see Table 1). For constructing the simulated input combination to the map, there were conceptually m groups of 20 sensory input neurons, one group for each of the m modalities. The behavior of each sensory input neuron within a modality under spontaneous and driven activations was modeled using probability thresholds $p_s$ and $p_d$, respectively, where $p_s$ is less than $p_d$. Typical values can be $p_d$ = 0.6 and $p_s$ = 0.1, which respectively correspond to a 60% or 10% chance of any one of the 20 neurons in each modality firing under a driven or spontaneous signal activation level. Once a particular modality combination of spontaneous and driven modalities was chosen, the number of spikes generated from each modality was chosen according to the probability mass function of the binomial distribution,

$$\Pr(\#\mathrm{Spikes} = r) = \binom{20}{r}\, p_\mu^{\,r}\,(1 - p_\mu)^{20 - r}.$$

The binomial distribution was sampled for each modality with the corresponding driven or spontaneous probability to generate a number that represented the number of neurons which fired within that particular modality. Here, 20 was the total number of neurons within each modality, r corresponded to the number of spikes, Pr(#Spikes = r) corresponded to the probability that r neurons out of 20 spiked given the probability $p_\mu$, and μ ∈ {d, s} represents driven or spontaneous, respectively. A sample from the binomial distribution corresponded to a random number between 0 and 20 which represented a random sum of activations from the 20 sensory input neurons in a particular modality. The random sums for each modality were then combined into an m component vector which was presented to the network for training.
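The input-generation procedure can be sketched as below. The binomial sample is built from 20 independent Bernoulli draws, which is one simple way to obtain it; the function names and the default probabilities (the typical values quoted in the text) are illustrative assumptions.

```python
import random

N_NEURONS = 20  # sensory input neurons per modality


def sample_modality_string(m, rng):
    """Pick each of the 2**m driven/spontaneous strings with equal probability."""
    return [rng.randint(0, 1) for _ in range(m)]


def sample_spike_count(p, rng, n=N_NEURONS):
    """Binomial(n, p) sample: how many of n neurons fired."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_input(m, rng, p_d=0.6, p_s=0.1):
    """Return (modality string, m-component input vector)."""
    combo = sample_modality_string(m, rng)
    x = [sample_spike_count(p_d if bit else p_s, rng) for bit in combo]
    return combo, x


rng = random.Random(1)
combo, x = sample_input(3, rng)
```

Drawing each bit independently and uniformly gives every string of length m the same probability, matching the equal selection probability stated in the text.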
The expected number of sensory input neurons active in a given modality is essentially determined by the spontaneous and driven activations. Shrinking the distance between the driven and spontaneous distributions causes the driven and spontaneous binomial samples to approach the same expected value (see Table 2 ). The expectations can be calculated by multiplying the corresponding probability by the number of neurons per modality (n = 20).
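As a worked instance of this calculation, using only the typical probabilities quoted earlier ($p_d$ = 0.6, $p_s$ = 0.1) rather than the entries of Table 2:

```latex
E[\#\mathrm{Spikes}] = n\,p_\mu, \qquad
n\,p_d = 20 \times 0.6 = 12, \qquad
n\,p_s = 20 \times 0.1 = 2 .
```

Raising $p_s$ or lowering $p_d$ moves these two expectations toward each other, which is the shrinking separation described above.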


In artificial neural networks, the output of a neuron can be computed using an activation function that simulates the threshold and saturation properties of a biological neuron (Hopfield, 1984). The activation function is a function of the weighted sum of the inputs, $\mathbf{w}_i \cdot \mathbf{x}$. Weighted summations of neuronal inputs and sigmoidal activation functions have been successfully used to model the physiological behavior of the firing rates of neurons in the mammalian brain (Poirazi et al., 2003a,b). In our simulation, the output $z_i$ of a SOM neuron i was computed using a non-linear sigmoid activation function based on the exponential function,

$$z_i = \frac{1}{1 + e^{-\gamma(\mathbf{w}_i \cdot \mathbf{x} - \phi)}}.$$

Above, γ is a sigmoid sensitivity parameter that controls the slope of the sigmoid and ϕ is a constant bias parameter which determines the amount of input $\mathbf{w}_i \cdot \mathbf{x}$ required to output the value 0.5.
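A minimal sketch of such an activation function, assuming the common logistic form 1 / (1 + exp(−γ(y − ϕ))) applied to the weighted sum y. The exact expression used by the authors is not reproduced here, and the function names are hypothetical.

```python
import math


def weighted_sum(w, x):
    """Dot product of a neuron's weight vector with the input vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid_output(w, x, gamma, phi):
    """Logistic output; gamma sets the slope, phi shifts the curve sideways."""
    return 1.0 / (1.0 + math.exp(-gamma * (weighted_sum(w, x) - phi)))


# With this form, an input whose weighted sum equals phi yields exactly 0.5.
z_mid = sigmoid_output([1.0, 0.0], [5.0, 3.0], gamma=1.0, phi=5.0)
```

Under this assumed form, increasing ϕ shifts the curve to the right so that a given input produces a lower output, and decreasing ϕ has the opposite effect.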

Parameter sensitivity analysis

One advantage to using a traditional and popular learning algorithm like the SOM is that many studies are readily available concerning the parameter sensitivity of the functions used in SOMs (Flanagan, 2001 ; Kohonen, 2001 ; Sadeghi, 1998 ). The setting of the sigmoid function is crucial to obtain meaningful results. For example, if the parameter ϕ is set too high, then the sigmoid can be shifted so far horizontally that all amounts of inputs will register a low value. Likewise, if the parameter ϕ is set too low, the sigmoid can be shifted to the left so that all inputs give a near maximum output. Therefore, as is standard in modeling papers (Hopfield, 1984 ), the sigmoid was tuned according to the input ranges.
In general, our sigmoid activation function was configured to output yes when stimulated by one modality. The parameter ϕ was chosen by computing the maximum weighted value of a unimodal stimulus with no spontaneous input from the other modalities,
where n = 20 is the number of neurons per modality and m is the number of modalities. For example, the constant bias parameter for a three modality model with each modality containing n = 20 neurons is yes, which is the maximum sigmoid output of a single driven weighted input, because the weights are kept normalized. The slope (γ) of the activation function was set to yes.
The results were analyzed using the set of sigmoid values over 1,000 presentations of a particular modality combination.

Weight updating

Instead of using the Euclidean distance or angle, we equivalently declared the SOM neuron with the highest sigmoid activation for a given input as the winner and allowed it to initiate weight changes in its neighborhood. After each iteration of training, the weights of the winner and its neighbors were aligned more toward the given input according to the standard competitive learning rule. After a neuron’s weights were updated, they were normalized to have Euclidean length 1.
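The equivalence follows because, for unit-length weight vectors, the largest weighted sum corresponds to the smallest angle to the input, and an increasing sigmoid preserves that ordering. One training iteration might then be sketched as below, assuming a logistic sigmoid and a Gaussian neighborhood. The names `train_step`, `dist_fn`, and `max_coordinate_distance`, and all numeric values, are illustrative.

```python
import math


def normalize(w):
    """Scale a weight vector to Euclidean length 1 (left as-is if all zero)."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else list(w)


def sigmoid(y, gamma, phi):
    return 1.0 / (1.0 + math.exp(-gamma * (y - phi)))


def train_step(som, x, alpha, sigma, gamma, phi, dist_fn):
    """One training iteration; `som` maps grid positions to weight vectors."""
    # Winner: the neuron with the highest sigmoid activation for input x.
    activation = {
        pos: sigmoid(sum(wi * xi for wi, xi in zip(w, x)), gamma, phi)
        for pos, w in som.items()
    }
    winner = max(activation, key=activation.get)
    for pos, w in som.items():
        # Gaussian neighborhood centered on the winner (assumed form).
        eta = alpha * math.exp(-dist_fn(pos, winner) ** 2 / (2.0 * sigma ** 2))
        moved = [wi + eta * (xi - wi) for wi, xi in zip(w, x)]
        som[pos] = normalize(moved)  # keep Euclidean length 1
    return winner


def max_coordinate_distance(a, b):
    """Hypothetical grid metric: diagonal neighbors are one step away."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


toy_som = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
winner = train_step(toy_som, [10.0, 1.0], alpha=0.5, sigma=1.0,
                    gamma=0.5, phi=5.0, dist_fn=max_coordinate_distance)
```

The distance function is passed in as a parameter so that different grid metrics can be compared without changing the update itself.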

Edge Effects

Edge effects are a SOM phenomenon that can cause a concentration of weight vectors around the edges of the SOM. One solution to the problem of edge effects is to use a modified learning rule or an edgeless topology, such as toroidal and spherical SOMs (Wu and Takatsuka, 2006). However, such modified SOMs typically do not have the wealth of existing theoretical research behind them in comparison with the traditional SOM. In order to keep the model and presentation simple, closer to Anastasio and Patton (2003), and to retain the theoretical backing of previous SOM research, we decided not to compensate for edge effects.


Sigmoid activation maps

Each SOM was trained using 5,000 randomly selected modality combinations. Once trained, an activation map plot was generated by averaging the sigmoid output of each neuron in the SOM over 1,000 random targets of a specific modality combination. The ith row and jth column of the activation map corresponded to the average sigmoid activation of the neuron at that particular location on the trained map for that specific modality combination. Areas colored red corresponded to the maximum sigmoid activation level (1.0) and those colored black corresponded to the minimum sigmoid activation level (0.0). The parameters used for training are listed in Table 3 .
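The averaging step behind an activation map might be sketched like this, assuming a logistic sigmoid and binomial inputs built from Bernoulli draws. The function name `average_activation`, the toy map, and the γ and ϕ values are hypothetical.

```python
import math
import random


def average_activation(som, combo, gamma, phi, rng,
                       n_trials=1000, n=20, p_d=0.6, p_s=0.1):
    """Mean sigmoid output per neuron for one modality combination.

    `combo` is a sequence of 0/1 flags (1 = driven, 0 = spontaneous).
    """
    totals = {pos: 0.0 for pos in som}
    for _ in range(n_trials):
        # Binomial spike count per modality, built from n Bernoulli draws.
        x = [sum(rng.random() < (p_d if bit else p_s) for _ in range(n))
             for bit in combo]
        for pos, w in som.items():
            y = sum(wi * xi for wi, xi in zip(w, x))
            totals[pos] += 1.0 / (1.0 + math.exp(-gamma * (y - phi)))
    return {pos: total / n_trials for pos, total in totals.items()}


# Toy two-node map: one node tuned to each modality (illustrative only).
toy_som = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
act_map = average_activation(toy_som, combo=[1, 0], gamma=0.5, phi=7.0,
                             rng=random.Random(0))
```

Each value in the returned dictionary lies between 0 and 1 and would correspond to one cell of the plotted activation map.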

Sigmoidal simulation of multisensory enhancement

Following Meredith and Stein (1986), the level of “MSE” of each neuron was calculated for each combination of multimodal stimuli (011, 101, 110, 111),

$$\mathrm{MSE} = \frac{\mathrm{CM} - \mathrm{SM}_{\max}}{\mathrm{SM}_{\max}} \times 100\%,$$

where CM corresponds to the average combined-modality response and $\mathrm{SM}_{\max}$ represents the maximum average response under a single-modality stimulus (001, 010, 100). The average CM and SM responses for a SOM neuron were obtained by averaging its sigmoid output over 1,000 presentations of each particular modality combination.
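The enhancement measure can be written as a short function, assuming the percentage form of Meredith and Stein (1986), that is, (CM − SMmax) / SMmax × 100. The example uses the values from the Figure 1 caption (CM = 0.9, largest unimodal response 0.3); the second unimodal value of 0.2 is a hypothetical placeholder.

```python
def mse_percent(cm, sm_responses):
    """Percentage enhancement of the combined response over the best
    single-modality response."""
    sm_max = max(sm_responses)
    return (cm - sm_max) / sm_max * 100.0


# Figure 1 example: CM = 0.9, best unimodal response = 0.3.
# The 0.2 entry is an assumed value for the weaker unimodal response.
figure1_mse = mse_percent(0.9, [0.3, 0.2])
```

Up to floating-point rounding this gives the 200% quoted in the Figure 1 caption. Because a sigmoid output is always positive, the denominator cannot be zero in this setting.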
Figure 1 shows an example of the calculation of MSE using the sigmoid function. Traditionally, MSE is calculated using an aggregate value of action potential counts derived in response to different single-modality and combined-modality stimuli over several trials. In contrast, our measurement is based on the output of a sigmoid activation function. Perhaps a better term to use in reference to the model is sigmoidal enhancement, since the model compares the sigmoid outputs of an artificial neuron and not actual spike counts. Nevertheless, artificial neurons with high activation levels can be interpreted to fire more often than those that do not receive that level of input and may conceptually correlate with increased spike activity of real neurons. Therefore, the terms sigmoidal enhancement and MSE are used synonymously here.
Figure 1. A sigmoidal enhancement example. This curve simulates the physiological relationship between level of input to an artificial neuron (dot product) and its hypothetical output (sigmoid). By selecting points along this input–output relationship, a value for sigmoid enhancement can be calculated. In this example, SM 1 and SM 2 correspond to unimodal responses, CM to the bimodal response, and the dot product corresponds to the value of the weighted sum of inputs. The sigmoidal enhancement in this example is (0.9 − 0.3)/0.3 = 200%.

Results


Activation Maps Generated by the SOM Algorithm

After training, the different unimodal input combinations (001, 010, 100) each activated a distinct (yellow) region of the trained map (Figure 2). Bimodal inputs (011, 101, 110) involving combinations of the driven stimuli also activated specific parts of the map (red-orange). However, in this case, bimodal stimuli did not simply activate the two areas represented by their constituent unimodal stimuli, but instead triggered a single focus of activity in the region between the two. This effect was observed for all bimodal stimulus combinations (see Figure 2) and was made even more evident when the levels of MSE were plotted. As seen in Figure 2 (row 3), maximal levels of MSE occurred in narrow bands restricted to the general regions between the constituent unimodal representations. Similarly, although trimodal stimuli (111) activated most of the map (see Figure 2, row 4), the highest levels of MSE for trimodal stimulation were focused central to the different unimodal representations. The completely spontaneous stimulus (000) prompted little activation over the entire map. The ontogenesis of the map in Figure 2 was portrayed in a movie that is available in the supplementary materials. The movie is an animation of the development of activation and MSE patterns on the SOM according to stimulus combination over 5,000 iterations of training.
Figure 2. Activation and multisensory enhancement (MSE) levels for a three modality model on a 10 × 10 sized SOM after 5,000 iterations of training; throughout which, the neighborhood width was kept constant at 1. Activation maps in the top row (labeled “unimodal”) show distinct activation areas (yellow) for single-modality stimulation of the different modalities (001, 010, 100). The second row shows activation levels for combined-modality stimulation (011, resulting from combined inputs from 001 and 010; 101 resulting from combining 001 and 100; and 110 resulting from combining 010 and 100). Here, the highest levels of activity (red, dark red) occurred in a single region between the representations of the constituent unimodal inputs. In row three, the level of multisensory enhancement is determined for the 011, 101, and 110 stimulus combinations (see Section “Materials and Methods”), and reveals a sharper focus of multisensory enhancement levels at the locus between the unimodal representations. The bottom row illustrates an activation map for stimulation in all three modalities (111) and the resulting levels of multisensory enhancement (MSE 111). The map in the box depicts the result of spontaneous activity (000) without driving, wherein only low levels of activity resulted across the map. Scale bars on right indicate activation and MSE levels.
As can be seen in the different bimodal activation maps (Figure 2, row 2), levels of activation and the resulting multisensory integration changed with different stimulus combinations and different locations within the map. To quantify which conditions most directly influenced multisensory integration, these features (modality combination, spatial location in map) were compared in an experiment using a two modality model (Figure 3). Row 1 of Figure 3 shows activation maps for unimodal (01, 10) and bimodal (11) conditions. The graphs in row 2 illustrate the quantified level of activity evoked in each condition for artificial neurons located on the diagonal from positions 1,1 to 10,10 of the SOM grid. The experiments in rows 1 and 2 of Figure 3 confirm that unimodal inputs preferentially activated a unique area of the map, while the combination of the same (i.e., bimodal) inputs created a higher focus of activity in the area between their constituent unimodal parts. In row 3, selected points (1,1; 5,5; 10,10) on the SOM are quantified and compared (bar graph) for each stimulus condition (spontaneous, unimodal, and bimodal). From these graphs it is clear that the top left (neuron 1,1) and bottom right (neuron 10,10) of the map have come to represent the different unimodal inputs and respectively responded the most to the 01 and 10 modality combinations. In contrast, the middle of the map (neuron 5,5) preferentially represented and exhibited a larger response for the bimodal stimuli (11) over the unimodal stimulus combinations (01 or 10). Furthermore, the bimodal increase in simulated activity was greater than that evoked by either of the component stimuli presented alone as well as greater than the sum of their activity (superadditive).
Figure 3. Quantification of multisensory enhancement for a two modality model. Activation maps in the top row show unimodal activation areas (01, 10) and the result of bimodal stimulation (11), from which the data in the middle row are derived. For a given map, representative levels of activity were measured from neurons lying on a diagonal line from the top left to bottom right and these response values are depicted by the blue, curved line (second row). Note that unimodal responses (01, 10) were distributed toward the edges of the map, while bimodal responses (11) were centered between them. In each condition, bimodal responses exceeded those produced by unimodal stimulation (histograms, bottom row), but the greatest levels of activation, representing multisensory enhancement, were achieved at the position (5,5) between the unimodal areas of the map (third row). Weight vectors for the trained SOM neurons at positions (1,1), (5,5), and (10,10) are depicted in the last row of the figure.
These findings were confirmed for 100 randomly initialized and trained SOMs, for which the minimum, maximum, mean, and standard deviation of the percentage enhancement for each multimodal combination, each averaged across the maps, are shown in Table 4. The average minima, maxima, and standard deviations indicate a wide variability in the MSE of the trained map.

Inverse Effectiveness

Not every artificial neuron within the two modality representation of the SOM shown in Figure 3 exhibited significant enhancement by default. In fact, the degree of enhancement in the model was affected by the level (strength) of the driven input stimulation. Some driven levels of stimulation produced little enhancement while others achieved far higher levels. The driven stimulation level for the trained map shown in Figure 3 was systematically altered from ∼0.4 to 1.0 (all other factors remained the same) and the resulting levels of MSE were plotted in Figure 4 according to their map position (neuron 1,1 through 10,10). Here, enhancement was most evident in the bimodal areas activated by low-level driven stimulations (e.g., ∼0.4–0.5) but was essentially lost if higher-level driven stimulations (e.g., 0.8–0.9) were used. Thus, the model indicated an inverse relationship between stimulation effectiveness and MSE.
Figure 4. A two modality SOM showing inverse effectiveness. This 3D graph plots the position of artificial neurons in the trained SOM (location 1,1-10,10; x-axis) against the driven value of the inputs (0.4–1.0; y-axis) and the level of multisensory enhancement (MSE; z-axis). Note that the same positions (e.g., neuron 6,6) on the SOM had different levels of MSE depending on the driven value of the input. If the driven value was low (0.4–0.5), the maximal level of MSE generated was high (>400%); but if the driven value was high (0.9–1.0), the maximal MSE level was low (<60%). This inverse relationship between driven value (stimulus strength or effectiveness) and MSE is similar to that observed for biological multisensory neurons.
The inverse relationship between driven value (stimulus strength or effectiveness) and MSE is similar to that observed for biological multisensory neurons. However, in our experiments with bimodal neurons (11), driven stimulation values less than ∼0.4 produced progressively lower values of MSE as driving strength approached spontaneous values (0.1). At present, it is not known whether these near-spontaneous values appropriately represent the biology of subthreshold levels of activation of bimodal neurons. However, additional experiments (not shown) that manipulated the sigmoid threshold function (for example yes) affected the results in a manner that was consistent with inverse effectiveness along the entire range of driven stimulation (from 0.9 to 0.1). This manipulation illustrates that the sigmoid function may need to be adjusted in order to accurately simulate the biology. As mentioned previously, choosing a proper threshold function is a common concern in many mathematically modeled biological systems (Hopfield, 1984).

Discussion


The artificial training regimen presented in this paper was based, in part, on the work of Anastasio and Patton (2003) . These authors used a two-stage algorithm, involving a novel hybrid of coincidence and competitive learning for simulating MSE in the deep layers of the superior colliculus. Such an approach, however, required the use of a large number of empirical parameters and structure to emulate the organization and connectivity of the superior colliculus. We have shown that a standard competitive learning algorithm is capable of producing viable results without invoking these additional parameters, stages, and connections; thereby demonstrating that MSE can be simulated with traditional SOMs without specifically modeling the superior colliculus.

Topography Preservation

Our results confirm that a traditional SOM can segregate itself into areas of activity (and inactivity) reflective of its input. Each unimodal input revealed a distinct, active area of the trained map, indicating that separate groups of neurons in the map had organized themselves to recognize certain stimuli (see Figure 2 ). However, when the trained SOM was given bimodal inputs, it generated not two separate foci of activity, but a single, new and even more vigorous level of activity at the intersection between the representations of the different inputs involved. This behavior was true for all effective bimodal stimulus combinations (Figure 2 ). Similarly, trimodal stimuli mostly activated the areas in the center of the map at the intersection of the representations of the three different unimodal stimuli involved (Figure 2 ). Spontaneous inputs without driving exhibited a small amount of activity at the central location (Figure 2 ). Spontaneous activity is an important component of successful visual map formation (Eglen, 1999 ; Raginsky and Anastasio, 2008 ). Although different topographical patterns might occur on separate runs, similar results were obtained for maps initialized with different random weights and trained with different patterns of modality combinations. Furthermore, even though different structures could be produced by varying neighborhood widths or other parameters (not shown), simulated MSE and inverse effectiveness occurred predominantly on the borders between modalities.
These bimodal (and trimodal) effects can be explained by the neighborhood preservation property of SOMs (Bauer and Pawelzik, 1992; Haykin, 1999). Because the inputs in the two modality model are close to the topological forms {(0,n), (n,n), (n,0): 0 ≤ n ≤ 20}, the weights of neurons in the SOM become organized through training to preserve this spatial arrangement. In our experiments, the topology of the input patterns was preserved after training in that regions stimulated by 11 appeared precisely between the areas activated independently by 01 and 10. This effect is due to the fact that a bimodal input (11) is a linear combination of the basis vectors of the constituent unimodal inputs (01, 10). Artificial neurons which respond to similar topological forms will organize to lie in close proximity to one another within the map. Because inputs of the form (n,n) are a combination of inputs of the form (0,n) and (n,0), they come to lie between the two on the trained map. Accordingly, bimodal responses will predominantly reside between the areas representing the constituent unimodal inputs. As such, it may be helpful to think of an SOM not so much as representing a spatial map of a neural region, but as the simplified dendrites of a single neuron. In this manner, inputs via synaptic boutons from one modality are physically segregated from those of another (as in the SOM) by distance along the dendrite, but combinations of different inputs achieve their maximal effect at an intermediary point along the same membrane (as in the SOM).

Simulated Multisensory Enhancement

When compared with the results elicited by unimodal inputs, not only did bimodal stimulation cause a spatial shift in the focus of activity within the SOM, but also the level of activity increased. Although bimodal stimulation generally produced increased activity levels across the SOM, the largest increments were always observed at the intersection between the representations of the different unimodal inputs. Quantified as a measure of MSE, these values were significantly (p < 0.02) elevated above those produced by the same artificial neurons under unimodal conditions.
Note that the SOM algorithm has (a) learned weights that could generate the MSE, and (b) organized all weights topographically on the map. It was the sigmoidal transfer function, however, that allowed certain weighted input combinations to produce MSE and inverse effectiveness. The magnitude of the simulated MSE was controlled by the slope of the sigmoid function and by the location of the steep part of the sigmoid curve relative to the magnitudes of the driven and spontaneous input combinations. Specifically, each SOM neuron’s output was computed by passing a simple weighted linear combination of inputs through a non-linear sigmoid function. Enhancement arose in certain neurons because of the way the sigmoid was constructed: the expected magnitude of a unimodal input combination generated an output corresponding to an area in the middle of the sigmoid curve (see Section “Materials and Methods”). Artificial neurons whose weights combined unimodal signals more equally were more likely to produce an enhanced response because unimodal stimulus levels could combine to traverse the steep section of their non-linear sigmoid function. We chose the sigmoid and input distribution parameters to produce enhancement values similar to those observed in laboratory settings (up to around 350%; see Wallace et al., 1992). However, these parameters can be adjusted according to empirically measured integrative levels and subsequently modeled directly within a framework such as the one we have described.
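The dependence of enhancement on weight balance can be illustrated with a minimal sketch of a single sigmoidal unit. Everything numeric here is an assumption chosen for illustration rather than a value from the simulations: the slope (0.5), the midpoint (7.5), the drive level (10) and the two weight vectors are hypothetical, as are the function names. Enhancement is computed with the conventional percent index, 100 × (bimodal − best unimodal) / best unimodal.

```python
import math

def sigmoid(x, slope=0.5, midpoint=7.5):
    """Logistic transfer function; slope and midpoint are illustrative values."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def unit_output(weights, inputs):
    """Weighted linear combination of the inputs passed through the sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)

def mse_percent(weights, drive):
    """Percent enhancement of the bimodal response over the best unimodal one."""
    best_unimodal = max(unit_output(weights, (drive, 0.0)),
                        unit_output(weights, (0.0, drive)))
    bimodal = unit_output(weights, (drive, drive))
    return 100.0 * (bimodal - best_unimodal) / best_unimodal

# Hypothetical units: one weights both modalities equally, one is dominated
# by a single modality. Both receive the same total bimodal net input.
balanced = mse_percent((0.5, 0.5), drive=10.0)
skewed = mse_percent((0.9, 0.1), drive=10.0)
print(f"balanced: {balanced:.0f}%  skewed: {skewed:.0f}%")
```

With these assumed parameters the balanced unit shows an enhancement of roughly 249%, whereas the skewed unit shows roughly 14%: only in the balanced unit does combining the two inputs carry the net input across the steep part of the curve.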

SOMs and Biological Multisensory Enhancement

In summary, the choice of the input vectors and the sigmoid activation function had an influence on the results we have obtained. First, the design of the input vectors and the SOM algorithm ensured that the bimodal (11) neurons were organized to lie on the borders between the unimodal (01) neurons and unimodal (10) neurons. Second, the choice of the sigmoid activation function helped the bimodal (11) neurons to contribute more to the simulated MSE than the unimodal (01 or 10) neurons. Together, these two choices provided quantitative predictions about the topographical organization and magnitude of MSE according to various levels of driven and spontaneous inputs.
The simulated MSE observed in bimodal and trimodal SOMs bears a striking resemblance to the phenomenon of MSE in central nervous system (CNS) neurons, such as those found in the superior colliculus as well as other brain areas (e.g., Barraclough et al., 2005; Clemo et al., 2007; King and Palmer, 1985; Meredith and Stein, 1983, 1986; Sugihara et al., 2006; Wallace et al., 1992). The SOMs, like multisensory neurons of the classic, bimodal type, are independently activated by inputs from different sensory modalities. In addition, when presented with combined-modality stimulation, both SOM and CNS bimodal neurons have the potential to respond with activity levels that exceed those generated by either of the component inputs presented alone. In fact, parts of SOMs that reveal multisensory response enhancement, when quantified [see, for example, the bar graph for the neuron at position (5,5) in Figure 3], show a striking resemblance to superadditive responses in some CNS neurons (see Figure 2, Meredith and Stein, 1986; Figure 7, Perrault et al., 2005). Also like biological neurons, SOMs presented with different levels of driven activity generated different levels of multisensory response enhancement when the stimuli were combined. Furthermore, like biological neurons, there was a specific stimulus–response relationship such that SOMs presented with low-level driven stimuli achieved proportionately higher levels of enhancement than those given more effective driven stimuli. In CNS neurons, this phenomenon has been described as “inverse effectiveness,” where higher levels of enhancement occur in response to combinations of weak stimuli than to combinations of highly effective stimuli (Ghazanfar et al., 2005; Kayser et al., 2005; Lakatos et al., 2007; Meredith and Stein, 1986). Thus, a distinctive property of multisensory neurons, that of inverse effectiveness, also emerges from trained SOMs.
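The stimulus–response relationship described above can be sketched for a single sigmoidal unit that weights two modalities equally. The slope, midpoint, weight and drive levels are illustrative assumptions, not parameters from the simulations, and the function names are hypothetical.

```python
import math

def sigmoid(x, slope=0.5, midpoint=7.5):
    """Logistic transfer function; slope and midpoint are illustrative values."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def enhancement_percent(drive, weight=0.5):
    """Percent gain of the bimodal over the unimodal response for a unit
    that weights both modalities equally (hypothetical weight)."""
    unimodal = sigmoid(weight * drive)
    bimodal = sigmoid(weight * drive + weight * drive)
    return 100.0 * (bimodal - unimodal) / unimodal

# Illustrative drive levels: weak, intermediate and strong.
drives = [10.0, 14.0, 20.0]
gains = [enhancement_percent(d) for d in drives]
for d, g in zip(drives, gains):
    print(f"drive {d:4.0f}: enhancement {g:5.1f}%")
```

Over this assumed range, proportional enhancement falls as the drive grows, because a strong unimodal input already places the unit near saturation and leaves little room for a further increase. The sketch makes no claim about drives close to the spontaneous level.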
Although inverse effectiveness has been observed biologically (and is thus expected by those familiar with the multisensory research literature), it has rarely been modeled. Our work represents a computational demonstration of multisensory inverse effectiveness and, as such, provides insights into the phenomenon that are difficult to test experimentally. Specifically, it is experimentally quite difficult to assess the lower limits of this effect, since the biological distinctions between subthreshold and spontaneous activity are quite small and labile in living neurons. Although the sigmoidal SOM model we used was very simple, it accounted for a wide range of observations and made definite predictions that can be tested, and possibly falsified, by further biological experiments.
In summary, the present study sought to identify a simple, unsupervised neural computing system that can learn to simulate multisensory processing. To this end we employed a widely used, biologically inspired unsupervised competitive learning algorithm to simulate how artificial neurons can behave and become topologically organized following training with artificial multimodal stimulation. These biologically inspired procedures and assumptions were able to simulate definitive properties of MSE, and may represent such effects wherever in the CNS they may occur.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Acknowledgments

Thanks to H.R. Clemo and L.P. Keniston for comments on this manuscript. Supported by Trinity College funds (K. Ahmad) and NIH Grant NS064675 (M.A. Meredith).

Supplementary Material

The Supplementary Material for this article can be found online at 10.3389/neuro.10/008.2009 .


References

Ahmad, K., Bale, T., and Casey, M. (2002). Connectionist simulation of quantification skills. Connect. Sci. 14, 165–201.
Anastasio, T. J., and Patton, P. E. (2003). A two-stage unsupervised learning algorithm reproduces multisensory enhancement in a neural network model of the corticotectal system. J. Neurosci. 23, 6713–6727.
Anastasio, T. J., Patton, P. E., and Belkacem-Boussaid, K. (2002). Using Bayes’ rule to model multisensory enhancement in the superior colliculus. Neural Comput. 12, 1165–1187.
Barraclough, N. E., Xiao, D., Baker, C. I., Oram, M. W., and Perrett, D. I. (2005). Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J. Cogn. Neurosci. 17, 377–391.
Bauer, H. U., and Pawelzik, K. R. (1992). Quantifying the neighborhood preservation of self-organizing feature maps. IEEE Trans. Neural Netw. 3, 570–579.
Clemo, H. R., Allman, B. L., Donlan, M. A., and Meredith, M. A. (2007). Sensory and multisensory representations within the cat rostral suprasylvian cortices. J. Comp. Neurol. 503, 110–127.
Colonius, H., and Diederich, A. (2001). A maximum-likelihood approach to modeling multisensory enhancement. In NIPS, T. G. Dietterich, S. Becker and Z. Ghahramani, eds (Cambridge, MA, MIT Press), pp. 181–187.
Colonius, H., and Diederich, A. (2004). Why aren’t all deep superior colliculus neurons multisensory? A Bayes’ ratio analysis. Cogn. Affect. Behav. Neurosci. 4, 344–353.
Eglen, S. J. (1999). The role of retinal waves and synaptic normalization in retinogeniculate development. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 354, 497–506.
Flanagan, J. A. (2001). Self-organization in the one-dimensional SOM with a decreasing neighborhood. Neural Netw. 14, 1405–1417.
Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., and Logothetis, N. K. (2005). Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J. Neurosci. 25, 5004–5012.
Ghazanfar, A. A., and Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends Cogn. Sci. 10, 278–285.
Goodhill, G. J. (2007). Contributions of theoretical modeling to the understanding of neural map development. Neuron 56, 301–311.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol. Cybern. 23, 121–134.
Haykin, S. (1999). Neural Networks, 2nd Edn. Englewood Cliffs, NJ, Prentice Hall.
Hebb, D. O. (1949). The Organization of Behavior. New York, NY, John Wiley & Sons.
Hecht-Nielsen, R. (1990). Neurocomputing. Reading, MA, Addison-Wesley.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 81, 3088–3092.
Kayser, C., Petkov, C. I., Augath, M., and Logothetis, N. K. (2005). Integration of touch and sound in auditory cortex. Neuron 48, 373–384.
King, A. J., and Palmer, A. R. (1985). Integration of visual and auditory information in bimodal neurons in the guinea-pig superior colliculus. Exp. Brain Res. 60, 492–500.
Kohonen, T. (1990). The self-organizing map. Proc. IEEE 78, 1464–1479.
Kohonen, T. (1993). Physiological interpretation of the self-organizing map algorithm. Neural Netw. 6, 895–905.
Kohonen, T. (2001). Self-Organizing Maps. Berlin, Springer.
Lakatos, P., Chen, C. M., O’Connell, M. N., Mills, A., and Schroeder, C. E. (2007). Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53, 279–292.
Meredith, M. A., and Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science 221, 389–391.
Meredith, M. A., and Stein, B. E. (1986). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J. Neurophysiol. 56, 640–662.
Patton, P. E., and Anastasio, T. J. (2003). Modeling cross-modal enhancement and modality-specific suppression in multisensory neurons. Neural Comput. 15, 783–810.
Patton, P. E., Belkacem-Boussaid, K., and Anastasio, T. J. (2002). Multimodality in the superior colliculus: an information theoretic analysis. Brain Res. Cogn. Brain Res. 14, 10–19.
Perrault, T. J. Jr., Vaughan, J. W., Stein, B. E., and Wallace, M. T. (2005). Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. J. Neurophysiol. 93, 2575–2586.
Poirazi, P., Brannon, T., and Mel, B. W. (2003a). Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron 37, 890–891.
Poirazi, P., Brannon, T., and Mel, B. W. (2003b). Pyramidal neuron as two-layer neural network. Neuron 37, 989–999.
Raginsky, M., and Anastasio, T. J. (2008). Cooperation in self-organizing map networks enhances information transmission in the presence of input background activity. Biol. Cybern. 98, 195–211.
Sadeghi, A. A. (1998). Self-organization property of Kohonen’s map with general type of stimuli distribution. Neural Netw. 11, 1637–1643.
Stein, B. E., and Meredith, M. A. (1993). The Merging of the Senses. Cambridge, MA, The MIT Press.
Sugihara, T., Diltz, M. D., Averbeck, B. B., and Romanski, L. M. (2006). Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. J. Neurosci. 26, 11138–11147.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14, 85–100.
Wallace, M. T., Meredith, M. A., and Stein, B. E. (1992). The integration of multiple sensory modalities in cat cortex. Exp. Brain Res. 91, 484–488.
Willshaw, D. J., and von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proc. R. Soc. Lond., B, Biol. Sci. 194, 431–445.
Wu, Y., and Takatsuka, M. (2006). Spherical self-organizing map using efficient indexed geodesic data structure. Neural Netw. 19, 900–910.
Keywords: multisensory integration, artificial neural networks, competitive learning, self-organization, computational modeling, superior colliculus
Citation: Martin JG, Meredith MA and Ahmad K (2009). Modeling multisensory enhancement with self-organizing maps. Front. Comput. Neurosci. 3:8. doi: 10.3389/neuro.10.008.2009
Received: 13 January 2009; Paper pending published: 08 February 2009; Accepted: 04 June 2009; Published online: 24 June 2009.

Edited by:

Israel Nelken, Hebrew University, Israel

Reviewed by:

Bruno Averbeck, Institute of Neurology, University College London, UK
Andrew King, McGill University, Canada
© 2009 Martin, Meredith and Ahmad. This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
Correspondence: Jacob G. Martin, Department of Neuroscience, Georgetown University Medical Center, WP-07 New Research Building, 3970 Reservoir Road, North West, Washington, DC 20007, USA. e-mail: