Perceived Duration of Visual and Tactile Stimuli Depends on Perceived Speed

It is known that the perceived duration of visual stimuli is strongly influenced by speed: faster moving stimuli appear to last longer. To test whether this is a general property of sensory systems we asked participants to reproduce the duration of visual and tactile gratings, and visuo-tactile gratings moving at a variable speed (3.5–15 cm/s) for three different durations (400, 600, and 800 ms). For both modalities, the apparent duration of the stimulus increased strongly with stimulus speed, more so for tactile than for visual stimuli. In addition, visual stimuli were perceived to last approximately 200 ms longer than tactile stimuli. The apparent duration of visuo-tactile stimuli lay between the unimodal estimates, as the Bayesian account predicts, but the bimodal precision of the reproduction did not show the theoretical improvement. A cross-modal speed-matching task revealed that visual stimuli were perceived to move faster than tactile stimuli. To test whether the large difference in the perceived duration of visual and tactile stimuli resulted from the difference in their perceived speed, we repeated the time reproduction task with visual and tactile stimuli matched in apparent speed. This reduced, but did not completely eliminate the difference in apparent duration. These results show that for both vision and touch, perceived duration depends on speed, pointing to common strategies of time perception.


INTRODUCTION
Any sensory experience, regardless of the modality of the stimulus (visual, auditory, or tactile), is defined within a temporal interval. Stimuli of different sensory modalities are all mapped along the same temporal dimension, allowing us to order events in time as well as to judge their relative duration. The most immediate and intuitive comprehension of time is therefore that of a universal dimension that transcends each specific sensory modality. The idea that our brain is endowed with a unique and centralized clock has dominated the research on time perception for many years (Treisman, 1963; Gibbon et al., 1997). Emerging evidence suggests, however, that the analysis of temporal information may have modality-specific components (e.g., Gamache and Grondin, 2010) and may be intimately embedded within local sensory processing. It has been shown that perceived time can be distorted by means of local sensory adaptation both in the visual (Johnston et al., 2006; Burr et al., 2007) and, most recently, in the tactile domain (Tomassini et al., 2010; Watanabe et al., 2010). Moreover, modality-specific temporal distortions have been documented around the time of saccadic eye movements (Morrone et al., 2005). Multiple and distributed mechanisms, though likely constrained by similar computational principles, may thus underlie timing functions within different sensory modalities. That no dedicated system exists for perceiving time, at least in the sub-second range, would also explain the ease with which many non-temporal, low-level properties of the stimuli, such as visibility (Terao et al., 2008), size (Xuan et al., 2007), temporal frequency (Kanai et al., 2006; Khoshnoodi et al., 2008), and speed (Kaneko and Murakami, 2009), can alter perceived time. The strong influence of stimulus motion on apparent duration has long been recognized (Lhamon and Goldstone, 1975; Brown, 1995), although its neural bases and functional significance remain unknown.
So far, the relationship between these two perceptual attributes has been studied only within the visual system, showing that faster moving stimuli appear to last longer.
This motion-induced temporal illusion is well suited to investigate the properties of timing mechanisms within and across sensory modalities. Recent evidence suggests that visual and tactile motion processing share much in common (e.g., Pei et al., 2011): for example, visual and tactile motion are subject to similar illusions (Harrar and Harris, 2007; Watanabe et al., 2007; Bicchi et al., 2008), show cross-modal interactions (Bensmaia et al., 2006; Craig, 2006; Konkle et al., 2009) and multisensory facilitation (Gori et al., 2011), and seem to have partially overlapping neural substrates (Hagen et al., 2002; Ricciardi et al., 2004).
One remarkable feature, which has no counterpart in the spatial domain, is that not only do different sensory modalities show different temporal resolutions (as in space) but they can also provide different estimates for the temporal properties of sensory events (Grondin, 2003; van Erp and Werkhoven, 2004). It is well known, for example, that auditory tones are perceived to last longer than visual flashes of the same physical length (Walker and Scott, 1981; Wearden et al., 1998; Harrington et al., 2011). Given the importance of accurate timing for multiple perceptual, motor, and cognitive functions, a relevant but poorly investigated question is how the brain deals with these inter-sensory discrepancies in temporal estimates and ultimately provides a combined percept of event duration.
Frontiers in Integrative Neuroscience www.frontiersin.org

THE BAYESIAN FRAMEWORK FOR MULTISENSORY INTEGRATION
In recent years the Bayesian statistical approach has been successful in providing a quantitative prediction of the effects of inter-sensory signal combination in many perceptual domains (Ernst and Banks, 2002; Ernst and Bülthoff, 2004). Optimal Bayesian integration of multiple sensory signals requires each source of information to be weighted by its relative reliability, so that the most probable, though sometimes erroneous, perceptual estimate is obtained with the least uncertainty (see Eqs 1-4 in the Materials and Methods). The so-called "ventriloquist effect" illustrates clearly how the most precise information, in this case that provided by vision, drives the final percept, with the sound being attracted toward the location of the visual stimulus (just as the ventriloquist's voice seems to come from the mouth of the puppet; Alais and Burr, 2004). By virtue of its higher spatial acuity, vision usually dominates audition in the spatial domain; the reverse is, however, true in the temporal domain. Many cases show that auditory stimuli can strongly influence the perceived timing of visual (Shams et al., 2000; Aschersleben and Bertelson, 2003; Morein-Zamir et al., 2003; Recanzone, 2003) and tactile events (Bresciani et al., 2005). Although this is in line with what optimal "Bayesian" integration would predict on the basis of the greater temporal precision of the auditory system, it is not clear whether this model provides a good quantitative description of the data. While most studies reporting auditory dominance in temporal judgments have not assessed this issue directly, two recent studies, testing audiovisual integration in a temporal bisection task (Burr et al., 2009) and audio-tactile temporal order judgments (Ley et al., 2009), provide conflicting results. Evidence as to whether the Bayesian cue-combination theory is a good explanatory framework in the temporal domain, as it is in the spatial domain, thus remains inconclusive.
In this study we test whether speed-dependency of apparent duration is a general property of sensory systems. That apparent duration depends on speed for both vision and touch would suggest that timing mechanisms share common operating principles across different modalities. Our results show that the duration of tactile events also depends on speed, pointing to a general principle. We also studied bimodal visuo-tactile gratings, to investigate how vision and touch are combined to yield an estimate of duration. The results show that the two modalities do interact with each other, but the advantage gained from the bimodal fusion is quantitatively suboptimal.

MATERIALS AND METHODS
Visual, tactile, and visuo-tactile motion stimuli were provided by physical wheels (diameter 10.5 cm; width 3 cm) etched with a corrugated grating of alternating ridges and grooves of equal width, of spatial frequency 3 c/cm (Figure 1A). The wheels were spatially aligned to give the appearance of a common object and driven at specific velocities by two independently controlled motors (Figure 1B). The velocity of the wheels was calibrated by means of a visual tracking system (NDI Optotrak Certus system), showing only minor deviations (3%) from the ideal constant velocity stimuli.
Subjects, seated at 57 cm from the stimuli, observed the front wheel through a small aperture (visual condition) and touched with their right index finger the second wheel, concealed from view (tactile condition). In the bimodal condition participants observed and touched the two wheels simultaneously (Figure 1C). The gratings were oriented horizontally (perpendicular to the long axis of the finger) and the direction of the motion could be either up-to-down (distal-to-proximal relative to the finger) or down-to-up (proximal-to-distal) depending on the trial (always coherent in the bimodal condition).
Participants were required to reproduce the duration of the moving stimuli by pressing a button on a keyboard with their left index finger after each stimulus presentation. The next stimulus started 1 s after the end of the reproduction phase.
The stimuli were presented for three different durations, 400, 600, and 800 ms, with speed varied between 3.5, 5, 7.5, 10, 12.5, and 15 cm/s. Data were collected in separate sessions of 90 trials, with different durations and speeds intermingled within each session. Although this procedure may result in what is called "regression toward the mean," reducing the real effect of speed on duration, we chose to randomize both durations and speeds so as to encourage subjects to attend to the stimuli and avoid stereotyped responses.
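The interleaved design described above can be sketched in a few lines. This is only an illustration: the text gives the session length (90 trials) and the three durations and six speeds, but not how trials were distributed, so the equal split of five repetitions per duration-speed pair and the helper name `make_session` are assumptions.

```python
import random

DURATIONS_MS = [400, 600, 800]
SPEEDS_CM_S = [3.5, 5.0, 7.5, 10.0, 12.5, 15.0]
# Assumption: 90 trials split evenly over the 18 duration-speed pairs.
REPEATS_PER_PAIR = 90 // (len(DURATIONS_MS) * len(SPEEDS_CM_S))

def make_session(seed=None):
    """Return a shuffled list of (duration_ms, speed_cm_s) trials."""
    rng = random.Random(seed)
    trials = [(d, s)
              for d in DURATIONS_MS
              for s in SPEEDS_CM_S
              for _ in range(REPEATS_PER_PAIR)]
    rng.shuffle(trials)  # durations and speeds are intermingled
    return trials

session = make_session(seed=1)
```

Shuffling the full trial list, rather than blocking by duration or speed, is what makes each upcoming stimulus unpredictable to the subject.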
No feedback was provided about the physical duration of the stimuli. Six subjects, one author and five naïve to the goals of the experiment, participated in the experiment; each subject completed a minimum of four sessions per condition (visual, tactile, and visuo-tactile). Participants did not receive any training. The average and variance of the reproduced durations across trials were calculated separately for each subject, stimulus modality, duration, and speed.
The second part of the experiment involved a cross-modal speed-matching task. The experimental apparatus and stimuli were the same as described above. Three subjects (one author and two naïve subjects from the previous group) were asked to judge the relative speed of two moving stimuli, one visual and the other tactile, presented in succession in random order for 600 ms each. The direction of the movement was randomized on a trial-by-trial basis, but was always the same for the two stimuli within each trial.
The speed of one stimulus (the probe) was varied from trial to trial by means of the QUEST algorithm (Watson and Pelli, 1983) to generate a psychometric function; the other stimulus (the standard) had fixed speed. Two different conditions were intermingled within the same experimental session, with the probe being either tactile or visual. Three separate sessions of 40 trials each (half trials with visual probes and half with tactile probes) were run for four different standard speeds, 3.5, 7.5, 10, and 15 cm/s (except for subject MG who did not complete the 10 cm/s condition), chosen among the speed values used in the time reproduction task. Data for each condition were fitted with cumulative Gaussian functions estimated by means of the maximum likelihood method; the point of subjective equality (PSE) and the differential threshold were derived from the median and SD of the psychometric function, respectively. SEs for the PSEs and SDs were estimated by bootstrap.
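As a rough illustration of the analysis described above, the sketch below fits a cumulative Gaussian by maximum likelihood and bootstraps the SE of the PSE. It is not the authors' code: the grid search stands in for whatever optimizer was actually used, the probe speeds are drawn uniformly rather than placed by QUEST, and the simulated observer (PSE 9 cm/s, SD 2 cm/s) and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def neg_log_likelihood(mu, sigma, trials):
    """Negative log-likelihood of a cumulative Gaussian psychometric function.

    trials is a list of (probe_speed, probe_judged_faster) pairs.
    """
    dist = NormalDist(mu, sigma)
    nll = 0.0
    for speed, judged_faster in trials:
        # Clamp probabilities away from 0 and 1 to keep the logarithm finite.
        p = min(max(dist.cdf(speed), 1e-6), 1.0 - 1e-6)
        nll -= math.log(p if judged_faster else 1.0 - p)
    return nll

def fit_psychometric(trials, mus, sigmas):
    """Return the (PSE, SD) pair on the grid that maximizes the likelihood."""
    return min(((m, s) for m in mus for s in sigmas),
               key=lambda ms: neg_log_likelihood(ms[0], ms[1], trials))

def bootstrap_se(trials, mus, sigmas, n_boot=100, seed=0):
    """SE of the PSE, estimated by resampling trials with replacement."""
    rng = random.Random(seed)
    pses = []
    for _ in range(n_boot):
        resample = [rng.choice(trials) for _ in trials]
        pses.append(fit_psychometric(resample, mus, sigmas)[0])
    mean = sum(pses) / n_boot
    return math.sqrt(sum((p - mean) ** 2 for p in pses) / (n_boot - 1))

# Simulated observer (hypothetical values), 60 trials for one condition.
rng = random.Random(42)
observer = NormalDist(9.0, 2.0)
trials = []
for _ in range(60):
    speed = rng.uniform(4.0, 14.0)
    trials.append((speed, rng.random() < observer.cdf(speed)))

mus = [5.0 + 0.5 * i for i in range(17)]     # candidate PSEs, 5 to 13 cm/s
sigmas = [0.5 + 0.5 * i for i in range(10)]  # candidate SDs, 0.5 to 5 cm/s
pse, sd = fit_psychometric(trials, mus, sigmas)
pse_se = bootstrap_se(trials, mus, sigmas)
```

Here `pse` is the median of the fitted function (the probe speed judged faster on half the trials) and `sd` plays the role of the differential threshold.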
The PSE indicated the speed of the visual (tactile) probe for which it was perceived as fast as the tactile (visual) standard. We thus repeated the time reproduction task (in the same three subjects) with new speed values, determined for each subject and stimulus modality according to the PSEs found in the cross-modal speed-matching task, so that the stimuli for both modalities were matched in perceived speed to those previously used. In the bimodal condition, the cross-modal speed-matching was obtained by using the standard speeds (3.5, 5, 7.5, 10, 12.5, and 15 cm/s) for the visual stimuli and appropriately changing the speeds of the tactile stimuli so as to match the visual speeds. Since PSEs were known for four of the six standard speeds, the other values were estimated by interpolation with the best-fitting linear function. The experiment required about 9 h of testing for the three subjects who completed all the conditions and 4 h for the other subjects.
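The interpolation step can be sketched as an ordinary least-squares line through the four measured (standard speed, PSE) pairs, evaluated at the two untested standards. The PSE values below are invented for illustration only, and `fit_line` is a hypothetical helper rather than anything named in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Standard speeds with measured PSEs (cm/s).
standards = [3.5, 7.5, 10.0, 15.0]
# Hypothetical tactile PSEs matching each visual standard (not real data).
tactile_pse = [5.0, 10.0, 13.0, 19.5]

slope, intercept = fit_line(standards, tactile_pse)
# Estimate matched speeds for the two standards without a measured PSE.
matched = {s: slope * s + intercept for s in (5.0, 12.5)}
```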
The results for the bimodal condition were modeled within the Bayesian framework (Ernst and Banks, 2002; Alais and Burr, 2004). According to optimal "Bayesian" integration, the perceived duration of the combined visuo-tactile stimuli results from a weighted sum of the estimates of duration provided separately by each modality. Assuming that the visual and tactile estimates are statistically independent, the combined estimate of event duration, $D_{VT}$, is given by the following equation:
$$D_{VT} = w_V D_V + w_T D_T \qquad (1)$$
where $D_V$ is the visual estimate and $D_T$ the tactile estimate, calculated as the average reproduced duration across trials, for each stimulus duration and speed. The weights, $w_V$ and $w_T$, sum to unity and are inversely related to the variances for vision ($\sigma_V^2$) and touch ($\sigma_T^2$), respectively:
$$w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_T^2} = \frac{\sigma_T^2}{\sigma_V^2 + \sigma_T^2} \qquad (2)$$
$$w_T = \frac{1/\sigma_T^2}{1/\sigma_V^2 + 1/\sigma_T^2} = \frac{\sigma_V^2}{\sigma_V^2 + \sigma_T^2} \qquad (3)$$
Since the variance of the reproduced duration did not change systematically with stimulus speed, $\sigma_V^2$ and $\sigma_T^2$ were computed by averaging variances across speeds, separately for each duration.
The model predicts that the variance for the combined estimate, $\sigma_{VT}^2$, is always less than the unimodal variances, $\sigma_V^2$ and $\sigma_T^2$:
$$\sigma_{VT}^2 = \frac{\sigma_V^2 \sigma_T^2}{\sigma_V^2 + \sigma_T^2} < \min(\sigma_V^2, \sigma_T^2) \qquad (4)$$
with the greatest improvement in precision (a factor of $\sqrt{2}$ in SD) when $\sigma_V = \sigma_T$.
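A small numerical sketch may make the weighted-sum rule concrete. It implements the standard reliability-weighted combination for two independent estimates; the function name `optimal_fusion` and the input values (800 ms and 600 ms, with SDs of 150 ms and 100 ms) are hypothetical and are not taken from the data.

```python
import math

def optimal_fusion(d_v, var_v, d_t, var_t):
    """Reliability-weighted combination of two independent estimates.

    Returns (combined estimate, combined variance, visual weight, tactile weight).
    """
    w_v = var_t / (var_v + var_t)  # weight is inversely related to own variance
    w_t = var_v / (var_v + var_t)
    d_vt = w_v * d_v + w_t * d_t
    var_vt = var_v * var_t / (var_v + var_t)
    return d_vt, var_vt, w_v, w_t

# Hypothetical unimodal results: vision 800 ms (SD 150), touch 600 ms (SD 100).
d_vt, var_vt, w_v, w_t = optimal_fusion(800.0, 150.0 ** 2, 600.0, 100.0 ** 2)
sd_vt = math.sqrt(var_vt)
```

With these inputs the more reliable tactile estimate receives about 69% of the weight, so the predicted bimodal duration (about 662 ms) lies closer to the tactile value, and the predicted SD (about 83 ms) is smaller than either unimodal SD.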

RESULTS
Figure 2 reports the individual reproduced durations as a function of speed for the 400, 600, and 800 ms visual (left column) and tactile (right column) stimuli. Duration reproduction is rather biased, so reproduced duration differs from the physical duration of the stimuli, with considerable variation between subjects. Regardless of the individual bias in the reproduction, the visual stimuli are always perceived to last longer (264 ± 40 ms on average) than the tactile stimuli. In most cases, the difference in perceived duration between visual and tactile stimuli grows with stimulus duration (Figure 3). For both modalities apparent duration increases linearly with log speed. However, the speed-dependency was stronger for the tactile than for the visual stimuli, as can be observed in Figure 4A. To better analyze the relationship between perceived duration and speed, data were normalized by dividing them by the reproduced duration obtained for the stimuli moving at 7.5 cm/s. In this way we preserved only the information regarding the relative change in apparent duration, unaffected by systematic biases in the reproduction. The slopes of the normalized reproduced duration versus speed functions (calculated by linear regression) for touch are plotted against the slopes for vision. Tactile slopes are much greater than the visual slopes, as indicated by the points lying above the equality line. A repeated-measures analysis of variance (ANOVA) with two within-subjects factors (modality and duration) was conducted on the slopes. It revealed a significant difference between visual and tactile slopes [main effect of factor modality; F(1,5) = 10.183; p = 0.024], but neither a significant effect of stimulus duration [F(2,10) = 1.06; p = 0.427] nor a significant interaction between stimulus modality and duration [F(2,10) = 0.427; p = 0.679].
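The normalization and slope estimate can be illustrated as follows. This is a sketch under assumptions: the regression is taken against log10 speed (the text reports a linear increase with log speed but does not state the regressor explicitly), and the reproduced durations are invented numbers, not the measured data.

```python
import math

def normalized_slope(speeds, reproduced, ref_speed=7.5):
    """Slope of normalized reproduced duration against log10 speed."""
    # Divide by the reproduction at the reference speed to remove overall bias.
    ref = reproduced[speeds.index(ref_speed)]
    ys = [r / ref for r in reproduced]
    xs = [math.log10(s) for s in speeds]  # assumption: log-speed regressor
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

speeds = [3.5, 5.0, 7.5, 10.0, 12.5, 15.0]
# Hypothetical mean reproductions (ms) for one subject and one duration.
visual = [760.0, 790.0, 820.0, 840.0, 855.0, 870.0]
tactile = [480.0, 530.0, 590.0, 630.0, 665.0, 690.0]

slope_visual = normalized_slope(speeds, visual)
slope_tactile = normalized_slope(speeds, tactile)
```

Because each curve is divided by its own value at 7.5 cm/s, the slopes express relative change in apparent duration and can be compared across modalities even when the absolute reproductions differ by several hundred milliseconds.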
Importantly, the slopes for the visual modality correlate positively with the slopes for the tactile modality [r = 0.584 (16) driving the time expansion of both visual and tactile stimuli. Figure 4B shows average results for each stimulus duration in the visual and tactile conditions. The increasing linear functions have very similar slopes within each modality, indicating that the same relationship between apparent duration and speed applies to all stimulus durations, but they are always steeper for the tactile than for the visual stimuli.
To evaluate the relative contributions of vision and touch to the final combined percept we took advantage of the large differences in perceived duration between visual and tactile stimuli (in some cases up to 400-800 ms), leading to two clearly distinct unimodal duration estimates. We thus employed bimodal stimuli comprising visual and tactile stimuli moving at the same physical velocity for the same duration. The results for each subject are reported in Figure 5A for the 600 ms stimuli (comparable results were obtained for the other two durations tested; individual data not shown). In all cases except one, the apparent duration  for the bimodal stimuli as predicted by Eq. 4 (see Materials and Methods). As shown in Figure 5B, the bimodal SDs for all other subjects are never better than the best unimodal case, and worse than predicted by the model. To quantify the goodness of model fit, we performed a linear regression between the bimodal data and the model predictions (see Eq. 1 in the Materials and Methods) for all subjects. We tested whether the best-fitting linear function is significantly different from the ideal fit (equality line with intercept equal to 0 and slope equal to 1) by looking at the 95% confidence intervals for the intercept and slope. The duration estimates for the bimodal stimuli do not deviate significantly from the predicted estimates. However, since the confidence intervals are quite large, the absence of a significant difference between the bimodal and predicted estimates can be affirmed only with considerable uncertainty. Figure 6 shows results averaged across subjects, separately for the three durations. As was evident in the single-subject results, the bimodal duration reproductions fall between the unimodal reproductions, close to the model predictions. The strong test of optimal integration is an improvement in thresholds (SDs).
The lower bar graphs show average normalized thresholds for all conditions: the variances for all speeds and subjects were first divided by the bimodal variance (to eliminate inter-subject variability), then summed and square-rooted to yield the thresholds of Figure 6B. The predicted SDs are significantly lower than the bimodal SDs, indicating suboptimal integration [t(35) = 6.3, p < 0.0001 for 400 ms; t(35) = 5.5, p < 0.0001 for 600 ms; t(35) = 6.5, p < 0.0001 for 800 ms; two-tailed paired t-tests].
Figure 2 shows that not only does the duration of both visual and tactile stimuli depend on speed, but visual stimuli tend to be perceived as lasting longer than tactile stimuli. As perceived duration varies strongly with stimulus speed, the difference in perceived duration of visual and tactile stimuli could arise from differences in their perceived speed. To examine this possibility, we first measured relative speed perception between vision and touch. The cross-modal speed-matching task (see Materials and Methods for details) revealed that visual stimuli appear to move faster than tactile stimuli. Visual (red symbols) and tactile (green symbols) PSEs for the three subjects who completed the task are plotted in Figure 7A as a function of tactile and visual standard speeds, respectively. The dashed line indicates equal perceived speed between vision and touch. Visual PSEs lie below the equality line, and tactile above, indicating overestimation of visual speed relative to tactile speed.
We then repeated the time reproduction task with higher speeds for touch and lower speeds for vision (specified by the PSEs), so that the new visual and tactile stimuli were matched in perceived speed to those previously used. Figure 7B reports the results for visual and tactile stimuli matched in physical (filled symbols) and perceived speed (red open symbols match with green filled symbols and green open symbols with red filled symbols). After speed-matching, the difference in apparent duration between visual and tactile stimuli is reduced but not completely eliminated. For two subjects out of three (CZ and MG) perceived speed explains exactly half of the difference in perceived duration between vision and touch, as indicated by the open symbols lying halfway between the filled symbols. Subject AS shows an asymmetrical pattern of results: speed-matching did not affect tactile apparent duration (green open symbols overlap with green filled symbols), whereas it produced a remarkable decrease (more than half of the difference between vision and touch) in visual apparent duration. These findings appear quite surprising if one considers the strong speed-dependency of tactile apparent duration shown by the same subject, unless we hypothesize that she changed her response criterion in the second part of the experiment, shortening all reproduction times (this would also be consistent with the greater reduction in perceived duration reported by AS after visual speed-matching compared with that reported by the other two subjects).
The results for the bimodal speed-matched stimuli (Figure 8) do not lead to conclusions different from those already discussed for bimodal stimuli consisting of physically identical visual and tactile stimuli. Both before and after speed-matching, the results for subject CZ indicate that vision and touch are combined in an optimal way to yield an estimate of event duration, as shown by the good fit of the model. Bimodal duration estimates and precision of the reproduction deviate little from what is predicted by optimal integration for subject AS, while they are completely inconsistent with the model for subject MG.

DISCUSSION
We used a time reproduction task to measure the apparent duration of visual, tactile, and visuo-tactile stimuli moving at various speeds. The study yielded three main results. Firstly, we show that motion induces temporal dilation in the tactile modality, as previously shown in the visual modality: faster stimuli appear to last longer. Secondly, visual stimuli appear to last longer and to move faster than tactile stimuli of the same duration and speed. Thirdly, we model the results with the Bayesian theory of optimal integration, and find an adequate fit for the duration estimates but not for their precision.
Unlike prior investigations in vision (Kanai et al., 2006; Kaneko and Murakami, 2009), our experiment was not designed to evaluate the differential role of speed, temporal frequency, and spatial frequency in time dilation; we did not manipulate the spatial frequency of the stimuli (3 c/cm for touch; 3 c/deg for vision) and consequently the temporal frequency and speed always covaried. We found that, for the tactile modality too, apparent duration increases with increasing speed (and temporal frequency, in agreement with Khoshnoodi et al., 2008). Speed-dependency for touch is stronger than for vision. This might be explained by the different spatial and temporal tuning properties of the two sensory systems, which determine a different sensitivity to stimulus motion in the range of speeds considered. The maximum speed tested (15 cm/s, 45 Hz) is actually quite low compared with the high temporal resolution (up to 400 Hz) of the tactile system, while it approaches the upper limit of sensitivity for visual motion. This may account for the more rapid saturation of the effect in the visual modality, reflected in the lower slopes of the increasing linear functions describing speed dependence. The functional architectures of early visual and tactile sensory processing show several important similarities. Although the two systems have different temporal resolutions, both are equipped with low-pass and band-pass temporal channels that yield sustained and transient neural responses. Several lines of evidence indicate that motion processing also shares similar properties and possibly common neural substrates between vision and touch (Konkle et al., 2009; Gori et al., 2011; Pei et al., 2011). Recently, compelling evidence has linked the encoding of duration in the sub-second range to the early sensory machinery for temporal analysis.
It has been shown that perceived time can be locally altered by means of visual motion (or flicker) adaptation (Johnston et al., 2006; Burr et al., 2007), and the same result has also been extended to the tactile modality (Watanabe et al., 2010). Here we report that time dilation induced by motion is a common finding across vision and touch. All this fits well with the suggestion that timing functions may be realized by multiple, modality-specific mechanisms, operating according to similar computational principles and rooted in early sensory function.
Vision and touch yield different duration reproductions for stimuli moving at the same physical speed, with tactile reproductions being in general slightly more accurate (closer to the actual physical duration) and precise (showing less inter-trial variability) than visual reproductions. As visual stimuli appear to last longer than tactile stimuli (∼200 ms), we tested whether this inter-sensory difference in perceived duration could result from differences in perceived speed. The cross-modal speed-matching task revealed that visual stimuli are perceived to move faster than the tactile stimuli, but this does not explain entirely the difference in perceived duration, which persists, although to a lesser extent, after the stimuli are matched in perceived speed. That the apparent duration may change depending on stimulus modality is not a new finding in the timing literature (Goldstone and Lhamon, 1974; Walker and Scott, 1981). Differences in apparent duration have been previously reported for auditory and visual stimuli and generally interpreted within the "internal clock theory" as modality-specific differences in the pulse rate of the pacemaker (e.g., Wearden et al., 2006). The difference that we observe between vision and touch increases proportionally with stimulus duration, ruling out explanations based on effects at onset and offset (Penney et al., 2000; Burle and Casini, 2001).
The reasons for these modality effects in the perception of duration are not clear at present, but they certainly pose the problem of how the brain handles inter-sensory conflicts when multimodal events have to be timed. We tried to tackle this issue by examining duration reproduction for bimodal visuo-tactile stimuli. The results show that both vision and touch contribute to the final duration percept, as indicated by the bimodal estimates lying between the unimodal estimates. The bimodal durations were statistically indistinguishable from the quantitative predictions of optimal fusion (weighted average of the unimodal estimates). However, the bimodal precision was far from being "optimal," not showing the theoretical improvement. One reason for the lack of improvement in thresholds may be that our experimental design involved temporal reproduction: a full model would have to consider that the reproduction task might have introduced its own noise, which would affect the predictions. In effect, as this noise occurs after the fusion of visual and tactile signals, it would add to all threshold estimates, and dilute any advantage that may have been gained from the bimodal fusion.
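The dilution argument can be made concrete with a short numerical sketch. It assumes, purely for illustration, that reproduction noise is independent of the sensory noise, adds in variance, and is identical in unimodal and bimodal conditions; the SD values and the function name `precision_gain` are hypothetical.

```python
import math

def precision_gain(sd_v, sd_t, sd_reproduction):
    """Ratio of best unimodal SD to bimodal SD when late noise is added.

    Sensory estimates are fused optimally; reproduction noise is then added
    to every condition (assumed independent and additive in variance).
    """
    var_v, var_t = sd_v ** 2, sd_t ** 2
    var_vt = var_v * var_t / (var_v + var_t)  # optimal fused sensory variance
    late = sd_reproduction ** 2
    best_unimodal = math.sqrt(min(var_v, var_t) + late)
    bimodal = math.sqrt(var_vt + late)
    return best_unimodal / bimodal

# Equal sensory SDs of 100 ms, with increasing reproduction noise.
gains = [precision_gain(100.0, 100.0, sd) for sd in (0.0, 100.0, 200.0)]
# gains is roughly [1.41, 1.15, 1.05]
```

Under these assumptions the measurable bimodal advantage shrinks from a factor of about 1.41 with no reproduction noise to about 1.05 when the reproduction SD is twice the sensory SD, which would be hard to detect in reproduction data.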
The encoding of duration cannot rely on specific sense organs, nor seems to be subserved by a specifically dedicated pathway. Our sense of time is continuously subject to numerous distortions (for a review see Eagleman, 2008), probably reflecting the fact that time analysis is interconnected with the processing of other contents of the external world, suggesting that this inherent plasticity of the system is functionally more relevant than having a stable and exact metric of time.