FOCUSED REVIEW article
Front. Neurosci., 15 December 2008 | https://doi.org/10.3389/neuro.01.033.2008
Department of Neurobiology and Anatomy, Wake Forest University School of Medicine, Winston-Salem, NC, USA
Animals have evolved multiple senses that transduce different forms of energy as a way of increasing their sensitivity to environmental events. Each sense provides a unique and independent perspective on the world, and very often a single event stimulates several of them. In order to make best use of the available information, the brain has also evolved the capacity to integrate information across the senses (“multisensory integration”). This facilitates the detection, localization, and identification of a given event, and has obvious survival value for the individual and the species. Multisensory responses in the superior colliculus (SC) evidence shorter latencies and are more robust at their onset. This is the phenomenon of initial response enhancement in multisensory integration, which is believed to represent a real time fusion of information across the senses. The present paper reviews two recent reports describing how the timing and robustness of sensory responses change as a consequence of multisensory integration in the model system of the SC.
Animals have evolved multiple senses that transduce very different forms of energy as a way of increasing their sensitivity to environmental events. Three of these senses that are of particular interest in the current context are vision, audition, and somatosensation. The first involves sensitivity to photons of various wavelengths that may be reflected off objects, the second to pressure waves that travel through an intervening physical medium, and the third to physical displacement of hair or skin. Each sense provides a unique and independent perspective on the world, and very often a single event stimulates several of them. For example, animals can often be seen and heard as a consequence of their movement (e.g., hooves hitting the ground). In order to make best use of the available information, the brain has also evolved the capacity to integrate information across the senses. This facilitates the detection, localization, and identification of a given event, and has obvious survival value for the individual and the species (Rowe, 1999; Stein and Meredith, 1993). Indeed, this capacity (referred to as “multisensory integration”) is likely to be present in every organism. Its functional impact has been examined in a variety of organisms, including insects (Fischer et al., 2001), fish (Allum et al., 1976), reptiles (Gaither and Stein, 1979), birds (Whitchurch and Takahashi, 2006), rodents (King and Palmer, 1985; Komura et al., 2005), carnivores (Bizley et al., 2007; Stein and Arigbede, 1972), nonhuman primates (Bell et al., 2005; Wallace et al., 1996), and humans (Ernst and Banks, 2002; Frens et al., 1995; McGurk and MacDonald, 1976).
The computational basis of multisensory integration can be understood in the context of statistical sampling. Factors affecting the reliability of one sense do not affect the others. For example, loud background noise might impair our ability to hear the speech of a friend, but not our ability to see him. Furthermore, due to their separation in the brain, random fluctuations that introduce error to the estimates of one sense are unlikely to contaminate estimates in another sense (Anastasio et al., 2000; Knill and Pouget, 2004; Rowland et al., 2007a). Thus, the estimates provided by the two senses are independent of one another, and when independent estimates that measure the same factor (e.g., an object’s location) are integrated, their combination yields a better overall estimate of that factor. Obtaining two or more independent samples of an event is intuitively a better strategy for detecting and identifying that event than obtaining only one. Consistent with this, behavioral responses to multisensory events are often faster and more effective than responses to unisensory events (Frens et al., 1995; Gielen et al., 1983; Goldring et al., 1996; Hughes et al., 1994; Perrott et al., 1990).
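The statistical argument can be illustrated with a minimal sketch. It assumes two independent, unbiased estimates of the same quantity with known variances, and combines them by inverse-variance weighting, which is one standard way to merge such estimates and is not necessarily the computation performed by any particular neuron. The function name and the numerical values are hypothetical.

```python
def combine_estimates(estimate_a, variance_a, estimate_b, variance_b):
    """Inverse-variance weighted combination of two independent estimates.

    Assumption (not from the article): both estimates are unbiased and
    independent, and their variances are known and positive.
    """
    if variance_a <= 0 or variance_b <= 0:
        raise ValueError("variances must be positive")
    weight_a = 1.0 / variance_a
    weight_b = 1.0 / variance_b
    combined = (weight_a * estimate_a + weight_b * estimate_b) / (weight_a + weight_b)
    # The combined variance is never larger than the smaller input variance.
    combined_variance = 1.0 / (weight_a + weight_b)
    return combined, combined_variance


# Hypothetical example: a noisy visual estimate and a sharper auditory estimate
# of an object's location (arbitrary units).
location, variance = combine_estimates(10.0, 4.0, 12.0, 1.0)
print(location, variance)
```

Under these assumptions the combined variance is smaller than either input variance, and the improvement shrinks as one of the inputs becomes very reliable on its own.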
Multisensory integration is used in many areas of the brain to facilitate a host of tasks, and often this is due to the convergence of unisensory afferents at many different sites rather than the routing of multisensory information from one multisensory hub to another. An excellent example is the superior colliculus (SC), a midbrain structure involved in attentive and orientation behaviors that has often served as a model for understanding multisensory integration (Jiang et al., 2001; Meredith and Stein, 1983; Perrault et al., 2003; Rowland et al., 2007c; Stanford et al., 2005; Wallace et al., 1993). Neurons in the SC receive unisensory inputs from visual, auditory, and somatosensory structures (Clemo and Stein, 1984; Stein et al., 1976; Wallace et al., 1993). These inputs are in rough topographic register so that when they converge on a single neuron, the neuron’s different sensory receptive fields are in spatial register with one another. As a result, stimuli derived from the same location in space, regardless of sensory modality, can activate the same neurons. The afferents of the SC are derived from many areas of the brain, and in particular, from relatively “low-order” sensory areas and “high-order” unisensory association areas of the cortex, such as the anterior ectosylvian sulcus (AES) in the cat (Figure 1) (Jiang et al., 2001, 2002; Wallace and Stein, 1994; Wallace et al., 1993).
Figure 1. A schematic model of the circuit believed to support multisensory integration in the superior colliculus (SC). A multisensory (visual-auditory) neuron receives input from a number of unisensory sources; in particular, inputs derived from areas of association cortex (e.g., AES) (top) and inputs from other sources (bottom). These inputs also project to an inhibitory interneuron population (I). The inhibitory connections “offset,” or balance, the excitatory inputs, but have a greater impact on non-AES than AES inputs. The balance of excitation and inhibition in this single-unit model replicates a number of physiological findings pertaining to multisensory integration in the SC; for example, its dependence on the functional integrity of cortico-collicular afferents from AES. The nature of the biophysical mechanisms underlying multisensory integration remains an active area of empirical enquiry.
Traditionally, multisensory integration in the SC has been measured by calculating the change in the magnitude of the response (number of impulses) over its entire duration. Operationally, if the multisensory response magnitude significantly exceeds the number of impulses evoked by the most effective modality-specific component stimulus, it is said to reflect “multisensory enhancement.” When the multisensory response magnitude exceeds the sum of the unisensory response magnitudes, it is described as “superadditive.” If the multisensory response is significantly smaller than the largest unisensory response, it reflects “multisensory depression” (Stein and Meredith, 1993). Multisensory enhancement and depression are two opposing forms of multisensory integration. The magnitude of these integrated responses is typically quantified as the percent deviation of the multisensory response from the largest of the unisensory responses. Individual neurons are defined as “multisensory” if they respond to more than a single sensory modality or engage in multisensory integration as operationally defined above. Much excitement has been generated by reports that multisensory integration in the SC reflects the nonlinear integration of descending inputs from association cortex (Jiang et al., 2001, 2002; Wallace and Stein, 1994). In short, multisensory responses do not reflect integration in the absence of these cortical projections.
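As a minimal sketch of these operational definitions, the percent deviation and the superadditivity comparison can be written as follows. The function names and the impulse counts are hypothetical, and the tests of statistical significance that the definitions require are deliberately omitted.

```python
def enhancement_index(multisensory, unisensory_responses):
    """Percent deviation of the multisensory response from the largest
    unisensory response (positive: enhancement, negative: depression)."""
    best_unisensory = max(unisensory_responses)
    if best_unisensory <= 0:
        raise ValueError("largest unisensory response must be positive")
    return 100.0 * (multisensory - best_unisensory) / best_unisensory


def is_superadditive(multisensory, unisensory_responses):
    """True if the multisensory response exceeds the sum of the unisensory
    responses. No significance test is applied in this sketch."""
    return multisensory > sum(unisensory_responses)


# Hypothetical mean impulse counts per trial.
visual, auditory, combined = 2.0, 1.0, 4.5
print(enhancement_index(combined, [visual, auditory]))  # 125.0
print(is_superadditive(combined, [visual, auditory]))   # True
```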
For obvious reasons, multisensory enhancement has been the focus of far more attention in the computational community than has multisensory depression, and it is the focus of the following discussion. Enhancement has typically been found to be inversely related to the magnitude of the unisensory response (“inverse effectiveness”) (Stein and Meredith, 1993). This property makes good computational sense if one assumes that there is a rough correlation between response magnitude and the information contained in the response (Anastasio et al., 2000). The more information one can obtain from a single source, the less the benefit accrued by looking at other sources. In other words, if the variance of an estimate is already very small, there is less benefit to adding another observation to the population. Physiologically, the largest enhancements are seen when unisensory responses are at their weakest, where the multisensory response can be greater than the sum of the unisensory responses (“superadditivity”) (Stanford and Stein, 2007; Stanford et al., 2005).
Response magnitude, measured in numbers of impulses, is a very useful way of characterizing neural responses. However, it also has limitations. The principal limitation is that it is measured over the entire response and thus is time-insensitive. Consequently, its use gives us no information regarding the timing of the various multisensory interactions that are merged and collectively referred to as multisensory integration. This is somewhat problematic because not all portions of a sensory response (e.g., beginning, middle, and end) are necessarily treated equally by the rest of the brain. For example, the beginning of the response may have a far greater impact on immediate behavioral reactions than the end of the response. On the other hand, the end of the response may be more important in higher-order computations. Multisensory integration might also change the onset and acceleration of the response. The possibilities, of course, are not limited to these options. Given that we now want to know when multisensory interactions are occurring in time, a single measure of response magnitude is insufficient.
To address these issues, it was necessary to employ techniques to quantify the temporal profiles of sensory responses. The goal here was to transform series of discrete impulses collected in response to multiple presentations of a stimulus into a continuous-time analog estimate of response magnitude. There are several popular methods that have been successfully used for this purpose, including instantaneous-firing rate methods which use the reciprocal of inter-spike intervals and spike density function methods in which the impulse train is convolved with some continuous kernel function (e.g., an exponential or Gaussian function) (Koch and Segev, 1999). These methods work well in many but not all circumstances. For example, both methods are difficult to use consistently when responses are weak in magnitude (i.e., near threshold). Instantaneous firing rate methods are challenged by the lack of multiple impulses on individual trials, and spike density function methods require an assumption of the shape and width of the kernel to be used in the convolution. If the kernel is too “thin,” then one will underestimate the instantaneous firing rate; if it is too “thick,” one loses precision in the timing of interactions. As noted above, however, responses near threshold are exactly the cases in which multisensory integration has its largest effects.
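To make the kernel-width problem concrete, the following is a minimal sketch of a spike density function computed with a Gaussian kernel. The function name, the 1 ms sampling step, and the example impulse times are assumptions made for illustration; the sketch is a generic textbook construction and not the specific procedure used in the studies reviewed here.

```python
import math


def spike_density(trials, sigma_ms, t_start, t_stop, step_ms=1.0):
    """Trial-averaged spike density (impulses/s) obtained by replacing each
    impulse with a unit-area Gaussian of width sigma_ms.

    trials: list of trials, each a list of impulse times in ms.
    """
    n_steps = int(round((t_stop - t_start) / step_ms)) + 1
    times = [t_start + i * step_ms for i in range(n_steps)]
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    density = []
    for t in times:
        total = 0.0
        for trial in trials:
            for spike in trial:
                total += norm * math.exp(-0.5 * ((t - spike) / sigma_ms) ** 2)
        # Convert from impulses/ms to impulses/s and average over trials.
        density.append(1000.0 * total / len(trials))
    return times, density


# Hypothetical near-threshold response: a single impulse at 50 ms on one trial.
trials = [[50.0]]
times, narrow = spike_density(trials, sigma_ms=2.0, t_start=0.0, t_stop=100.0)
_, wide = spike_density(trials, sigma_ms=20.0, t_start=0.0, t_stop=100.0)

# A wide kernel assigns substantial activity to t = 30 ms, 20 ms before the
# impulse occurred, whereas a narrow kernel assigns essentially none.
print(narrow[30], wide[30])
```

With so few impulses, the estimated rate and its apparent onset depend heavily on the chosen width, which is why a fixed, arbitrary kernel is hard to apply consistently to weak responses.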
Our solution was to employ two methods to quantify the temporal profile of multisensory and unisensory responses (Rowland and Stein, 2007; Rowland et al., 2007b). The first method computed a running tally of the mean number of stimulus-elicited impulses generated on or before each moment in time (the mean cumulative impulse count, or qsum for short). The second method gave a more instantaneous measure of the response efficacy, and was termed an event estimate. This measure was computed by first identifying a kernel function which, when convolved with the impulse train, would yield a spike density function with the greatest separation between activity in the spontaneous and response-related ranges (using mutual information as a measure). A spike density function computed using this kernel was then obtained, and each value was replaced with the probability that the value reflected response-related (i.e., not spontaneous) activity. Spontaneous levels (which were typically very low) were simply subtracted. The temporal profile of multisensory integration could be computed using these methods by subtracting the largest (or alternatively, the summed) unisensory response(s) from the multisensory response at each moment in time.
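The qsum and its multisensory-minus-unisensory difference can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the function names, the 1 ms step, the constant-rate correction for spontaneous activity, and the impulse times are all hypothetical, and "largest unisensory response" is taken here to mean the larger of the two unisensory qsum values at each moment.

```python
def qsum(trials, t_start, t_stop, step_ms=1.0, spontaneous_rate_hz=0.0):
    """Mean cumulative impulse count: the mean number of impulses per trial
    occurring on or before each moment in time.

    Assumption: spontaneous activity is removed by subtracting the count
    expected from a constant spontaneous rate.
    """
    n_steps = int(round((t_stop - t_start) / step_ms)) + 1
    times = [t_start + i * step_ms for i in range(n_steps)]
    curve = []
    for t in times:
        count = sum(1 for trial in trials for s in trial if t_start <= s <= t)
        expected_spontaneous = spontaneous_rate_hz * (t - t_start) / 1000.0
        curve.append(count / len(trials) - expected_spontaneous)
    return times, curve


def delta_qsum(va_curve, v_curve, a_curve):
    """Multisensory qsum minus the larger unisensory qsum at each moment."""
    return [va - max(v, a) for va, v, a in zip(va_curve, v_curve, a_curve)]


# Hypothetical impulse times (ms after stimulus onset), two trials per condition.
v_trials = [[60, 70], [65]]
a_trials = [[55], []]
va_trials = [[45, 50, 60], [48, 52, 66, 70]]

times, v_q = qsum(v_trials, 0, 100)
_, a_q = qsum(a_trials, 0, 100)
_, va_q = qsum(va_trials, 0, 100)
difference = delta_qsum(va_q, v_q, a_q)
print(va_q[50], difference[50], difference[100])  # 1.5 1.5 2.0
```

In this toy example the multisensory qsum departs from zero before either unisensory qsum does, so the difference is already positive at 50 ms.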
The hypothesis was that cross-modal inputs are integrated in real time, that is, as soon as they arrive at the target structure. We refer to this as a model of real-time integration (Figure 2A). The expectation in this case was that multisensory responses would evidence shorter response latencies, and that multisensory response enhancement would be evident from the very onset of the multisensory response. Another possibility was that multisensory interactions might appear only later in the response. We refer to this as a model of delayed integration (Figure 2B). This outcome might reflect higher-order computations engaged in multisensory processing (e.g., “binding”), recurrent interactions required for its expression (Knill and Pouget, 2004), and/or (in the case of the SC) a nonlinear dependence on association areas of the cortex (Jiang et al., 2001, 2002; Wallace and Stein, 1994).
Figure 2. Two different models for the temporal profile of multisensory integration. Illustrated are putative unisensory visual (V), auditory (A), and multisensory (VA) input magnitudes (top) and cumulative output magnitudes (bottom) as a function of time. If inputs are integrated in real-time (left, A), the multisensory input magnitude will cross threshold earlier than either unisensory input (upper-left). This will produce a multisensory response with a shorter response latency and higher initial firing rate. The predicted cumulative response profile (qsum) is illustrated in the lower-left corner. Alternatively, if multisensory integration takes place only later in the response (“delayed integration”, B), the multisensory and unisensory input profiles will initially appear similar, but the multisensory response will reach a higher peak (upper-right). This will yield multisensory responses that are enhanced in magnitude, but have identical physiological response latencies as the unisensory responses (bottom-right).
Evaluating the temporal profile of the response provides us with insights into other issues as well. As described above, the largest multisensory enhancements (i.e., “superadditive” enhancements) are typically observed when unisensory responses are weak (Stanford et al., 2005; Stein and Meredith, 1993). However, even the strongest responses are “weak” at their onset and offset. Thus, if multisensory integration takes place near the onset or offset of a response, signals may be integrated in a superadditive computation at those times, even if the averaged computation over the entire response is additive or even subadditive. In short, the temporal profile provides a mechanism to address the incidence of various computations engaged during multisensory integration at different points in time, giving a greater appreciation of their incidence in the overall population by avoiding the possibility that the incidence is really simply a function of which part of the response one examines.
In this research we used data collected (courtesy of Stephan Quessy) from single multisensory (visual-auditory) neurons in the deep layers of the SC of anesthetized cats. Neural responses were collected to modality-specific (visual, auditory) and cross-modal (visual-auditory) stimuli. Stimulus intensities and onset asynchronies were systematically manipulated to ensure a broad sample of responses. Analysis of the dataset was therefore restricted to circumstances in which the unisensory response onsets were approximately aligned.
Multisensory Integration Shortens Physiological Response Latencies (Rowland et al., 2007b)
Using the qsum measure, it was found that in the majority of cases (69%), the minimum multisensory response latency was shorter than the minimum unisensory response latency. In other words, the multisensory response typically began before the very earliest unisensory response was expected to begin. Although most latency shifts were quite small in magnitude (mean = 6.2 ms), they could also be quite long. The longest shifts correlated with the longest unisensory response latencies, although the incidence and magnitude of the effect appeared dependent on the relative timing of the visual and auditory responses. In every neuron studied, at least one cross-modal stimulus combination produced a latency shift. This was interpreted as resulting from the integration of subthreshold unisensory inputs during their rising phase. This interpretation was also most consistent with the magnitude and incidence of the observed latency shifts and with a model of real-time multisensory integration.
If one identifies the computational mode at each moment in time, the proportion of responses evidencing superadditive computations somewhere in their responses (typically the beginning) in this population is 88%, much higher than previous estimates that evaluated superadditive computations by first measuring response magnitude over the entire response duration. This initial response enhancement (IRE) often dominated responses (see Figure 3). This was especially true at the very onset of responses in which there was a latency shift. Obviously, any response containing a multisensory latency shift engages superadditive computations (both unisensory referents at the onset of the multisensory response are zero). However, the incidence of superadditive computations also increased steadily during the first 40 ms of the response.
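The idea of identifying the computational mode at each moment can be sketched as a pointwise comparison between the multisensory response and the sum of the unisensory responses. The fixed tolerance used below is an assumption standing in for a proper statistical criterion, and the function names and response values are hypothetical.

```python
def computational_mode(va, v, a, tolerance=0.05):
    """Label one moment in time as superadditive, additive, or subadditive
    by comparing the multisensory value with the unisensory sum.

    Assumption: a fixed tolerance replaces a statistical test.
    """
    unisensory_sum = v + a
    if va > unisensory_sum + tolerance:
        return "superadditive"
    if va < unisensory_sum - tolerance:
        return "subadditive"
    return "additive"


def mode_profile(va_curve, v_curve, a_curve, tolerance=0.05):
    """Computational mode at each sampled moment of a response."""
    return [computational_mode(va, v, a, tolerance)
            for va, v, a in zip(va_curve, v_curve, a_curve)]


def fraction_with_superadditivity(profiles):
    """Fraction of responses that are superadditive at any moment."""
    hits = sum(1 for profile in profiles if "superadditive" in profile)
    return hits / len(profiles)


# Hypothetical instantaneous response values at successive moments.
va = [0.0, 0.4, 0.9, 1.0, 0.8]
v = [0.0, 0.0, 0.3, 0.6, 0.6]
a = [0.0, 0.0, 0.2, 0.4, 0.5]
print(mode_profile(va, v, a))
```

In this example the second sample mimics a latency shift: the multisensory response is already active while both unisensory referents are zero, so that moment is necessarily labeled superadditive, even though later moments are additive or subadditive.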
Figure 3. Multisensory integration reflects an initial response enhancement (IRE). Shown is a dramatic example of initial response enhancement from a single visual-auditory SC neuron. Impulse rasters for visual (V), auditory (A), and multisensory (VA) responses are displayed on the left. The vertical line crossing each plot indicates the time of the multisensory response onset, with respect to which the unisensory responses are delayed. The differences between the multisensory and unisensory qsums (upper-right) and event estimates (lower-right) are striking. The multisensory response not only evidences a shorter latency, but is greatly enhanced in robustness from its very onset.
Multisensory Integration Reflects an Initial Response Enhancement (IRE) (Rowland and Stein, 2007)
These data indicated that multisensory responses containing an IRE also occurred earlier than expected given the unisensory responses. Here we sought a more detailed analysis of this phenomenon. The first step was to determine if the enhancements evident at the beginning of the multisensory response were retained throughout its duration, or were transient and eliminated shortly thereafter. To evaluate this we assessed if and when multisensory and unisensory responses (quantified using the qsum and event estimates) reached certain threshold criteria values (e.g., 1 impulse or an event estimate of 0.5). We found that multisensory responses were more likely to meet higher criteria than unisensory responses, and reached criteria levels earlier than the corresponding unisensory responses. This indicated that the early enhancements seen in the multisensory responses were retained throughout their duration. The results were qualitatively similar for both qsum and event estimate measures.
To evaluate relative acceleration of multisensory enhancement over the entire response duration, we extracted Δqsum and Δevent estimate difference measures by subtracting the largest unisensory response from the multisensory response at each moment in time. Data were analyzed as a function of time and percent of the total multisensory response duration. We fit a continuous function to the difference measures in order to extract their first and second derivatives. It was found that the acceleration of multisensory enhancement is greatest at the beginning of the response, typically within the first 40 ms or roughly 50% of total duration, thereby providing a “rule of thumb” for the duration of the IRE. It should be noted that there is also an increase in enhancement velocity at the end of the response, reflecting the fact that multisensory responses also evidence longer durations than do unisensory responses.
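A rough sense of what "velocity" and "acceleration" of enhancement mean can be conveyed with finite differences. Note that this swaps in a simpler technique: the analysis described above fit a continuous function before differentiating, whereas the sketch below applies central differences directly to a difference measure sampled at 1 ms. The sigmoidal Δqsum used as input is synthetic and purely illustrative.

```python
import math


def derivatives(values, step_ms=1.0):
    """Central-difference first and second derivatives of a sampled curve.

    Both returned lists correspond to the interior samples 1 .. len(values) - 2.
    Assumption: finite differences stand in for a fitted continuous function.
    """
    velocity = [(values[i + 1] - values[i - 1]) / (2.0 * step_ms)
                for i in range(1, len(values) - 1)]
    acceleration = [(values[i + 1] - 2.0 * values[i] + values[i - 1]) / step_ms ** 2
                    for i in range(1, len(values) - 1)]
    return velocity, acceleration


# Synthetic (hypothetical) difference measure that rises sigmoidally and
# then saturates, sampled every 1 ms from 0 to 100 ms.
delta = [2.0 / (1.0 + math.exp(-(t - 30) / 8.0)) for t in range(0, 101)]
velocity, acceleration = derivatives(delta)

# Add 1 to convert a list index back to the sample time in ms.
peak_velocity_ms = velocity.index(max(velocity)) + 1
peak_acceleration_ms = acceleration.index(max(acceleration)) + 1
print(peak_acceleration_ms, peak_velocity_ms)
```

For a curve of this shape, the acceleration peaks before the velocity does, which is the sense in which the acceleration of enhancement can be concentrated in the earliest part of a response.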
To further explore these issues the multisensory responses were binned into quartiles and the multisensory and summed unisensory event estimates (sampled at 1 ms resolution) were compared within each bin. In the first quartile of the response (which includes the IRE), multisensory event estimates were overwhelmingly greater than the sum of the unisensory responses. These differences decreased in each successive quartile, until the fourth quartile, in which the multisensory and summed unisensory event estimates were much closer to parity. These observations reinforced the idea that multisensory integration reflected an initial response enhancement, wherein multisensory responses were enhanced (in this dataset) from their very onset.
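The quartile comparison can be sketched as follows. The summary statistic (the mean difference between the multisensory and summed unisensory values within each quartile) is an assumption chosen for simplicity, and the event-estimate values are hypothetical.

```python
def quartile_differences(va, v, a):
    """Mean of (multisensory - summed unisensory) within each quartile of
    the multisensory response duration.

    va, v, a: equally long lists sampled over the multisensory response.
    """
    n = len(va)
    if n < 4 or len(v) != n or len(a) != n:
        raise ValueError("need three equally long curves with at least 4 samples")
    edges = [round(n * q / 4) for q in range(5)]
    means = []
    for start, stop in zip(edges, edges[1:]):
        diffs = [va[i] - (v[i] + a[i]) for i in range(start, stop)]
        means.append(sum(diffs) / len(diffs))
    return means


# Hypothetical event estimates sampled across one multisensory response.
va = [0.8, 0.9, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
v = [0.1, 0.2, 0.4, 0.4, 0.4, 0.3, 0.2, 0.2]
a = [0.0, 0.1, 0.2, 0.3, 0.2, 0.2, 0.2, 0.1]
print(quartile_differences(va, v, a))
```

In this toy example the difference is largest in the first quartile and approaches zero by the fourth, mirroring the qualitative pattern described above.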
Multisensory computations represent a special subcategory of information processing in which samples taken by the different senses in parallel are integrated (Stein and Meredith, 1993). Despite the fact that multisensory integration appears, in at least some cases, capable of engaging higher-order circuits, it takes place in real-time so that unisensory inputs are integrated as soon as they arrive at the SC neuron to form multisensory products (Rowland and Stein, 2007; Rowland et al., 2007b). Consequently, at both behavioral and physiological levels, concordant cross-modal cues are integrated to yield multisensory responses that reach higher levels of activity at faster rates than do the corresponding unisensory responses. The greatest acceleration of multisensory enhancement is evident in the initial phase of the response (when the responses themselves are accelerating), although it continues at lower levels throughout the response window. This is referred to here as the initial response enhancement or IRE.
A long-standing observation in the field of multisensory integration is that the greatest multisensory enhancements are produced when modality-specific stimuli are weakly effective (“inverse effectiveness”) (Stanford et al., 2005; Stein and Meredith, 1993). There is a sound computational basis for this principle: when a single unisensory input provides a lot of information, there is less of a “benefit” to be gained by adding information from another source. The consideration of the temporal profile of the response discussed here sheds new light on this issue, and provides an entirely new perspective on the underlying computations. Because physiological responses to sensory stimuli typically increase, plateau, and then decrease, one might expect the same principle to be evident over time in the multisensory response. Indeed, this was the finding and the basis of the IRE, an initial portion of the response wherein the acceleration of multisensory enhancement was greatest (albeit there was often a lesser acceleration at the end of the response).
The IRE is associated with multisensory integration, which in the case of the SC, is dependent on the functional integrity of inputs derived from association cortex. Whether this is a general property of all multisensory circuits, or specific to this particular circuit, remains to be determined. Furthermore, although these analyses were specifically directed to circumstances in which the sensory signals arrived at the target neuron very close in time to one another, this is undoubtedly not the only possible circumstance one might encounter. It is unknown, for example, whether the phenomenon of the IRE will generalize to circumstances in which one input might precede the other by a substantial margin. However, we predict that the IRE will generalize to these circumstances, but will be shifted in time to reflect the impact of the earlier sensory input on the later. The accuracy of this prediction and the presumptive magnitudes of these effects remain to be determined.
These results indicate that the principle of inverse effectiveness holds for individual responses, and that a response may contain superadditive, additive, or subadditive computations, regardless of what computation is estimated from the response magnitude that is averaged over the duration of the response. Furthermore, the results show that superadditivity is far more common in multisensory responses than previously thought, especially during the IRE. Perhaps more important, however, is that by virtue of the location of the IRE, and the fact that it generally engaged a superadditive computation, the IRE can skew the computation when averaged over the entire response. Thus the superadditive computations in the IRE are likely to have a far greater impact on reaction speed, event detection, and localization than previously appreciated.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors would like to thank Stephan Quessy and Terrence Stanford for their assistance in collecting the initial dataset. This research was supported by NIH grants NS36916 and EY016716.
Multisensory integration: The synthesis of information across different sensory modalities. Operationally, multisensory integration is defined as a significant difference between the response elicited by a combination of cross-modal stimuli and the largest response elicited by the component modality-specific stimuli when presented in isolation.