Perception of visual apparent motion is modulated by a gap within concurrent auditory glides, even when it is illusory

Wang, Qingcui; Guo, Lu; Bao, Ming; Chen, Lihan

doi:10.3389/fpsyg.2015.00564

ORIGINAL RESEARCH article

Front. Psychol., 19 May 2015

Sec. Perception Science

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00564


Qingcui Wang¹,²

Lu Guo²

Ming Bao²*

Lihan Chen³*

¹Hangzhou Applied Acoustics Research Institute, Key Laboratory of Science and Technology, Hangzhou, China
²Institute of Acoustics – Chinese Academy of Sciences, Beijing, China
³Department of Psychology and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China

Auditory and visual events often happen concurrently, and how they group together can have a strong effect on what is perceived. We investigated whether and how intra- or cross-modal temporal grouping influences the perceptual decision on otherwise ambiguous visual apparent motion. To this end, we juxtaposed the auditory gap transfer illusion with the visual Ternus display. The Ternus display is a multi-element stimulus that can induce either of two percepts of apparent motion: ‘element motion’ (EM) or ‘group motion’ (GM). In EM, the endmost disk is seen as moving back and forth while the middle disk at the central position remains stationary; in GM, both disks appear to move laterally as a whole. The gap transfer illusion refers to the illusory subjective transfer of a short gap (around 100 ms) from the long glide to the short continuous glide when the two glides cross at the temporal middle point. In our experiments, observers were required to discriminate the Ternus motion percept in the presence of concurrent auditory glides (with or without a gap). Results showed that a gap within a short glide had a marked effect on separating visual events and led to a dominant percept of GM. The auditory configuration with the gap transfer illusion triggered the same auditory capture effect. Further investigations showed that a visual interval that coincided with the gap interval (50–230 ms) in the long glide was perceived to be shorter than the same physical interval within both the short glide and the ‘gap-transfer’ auditory configurations. The results indicate that auditory temporal perceptual grouping takes priority over cross-modal interaction in determining the final readout of visual perception, and that selective attention to auditory events also plays a role.

Introduction

In a noisy environment, we often need to integrate various sources of information, including spatial, temporal, and semantic cues across multiple signals, to build a coherent representation of the target sensory events (Bertelson and Aschersleben, 2003; Alais and Burr, 2004; Stein and Stanford, 2008). Among all forms of cross-modal interaction, audiovisual processing remains a main vehicle for perceiving events in the world (Koelewijn et al., 2010). Owing to the inherently higher temporal precision of the auditory modality (Morein-Zamir et al., 2003; Alais and Burr, 2004), auditory events usually help to make (noisy) visual stimuli perceptually distinctive and can even make them pop out from a cluttered environment. The influence of auditory signals upon visual events has been documented in a typical multisensory illusion – temporal ventriloquism (Shimojo et al., 2001; Bertelson and Aschersleben, 2003; Morein-Zamir et al., 2003; Vroomen and de Gelder, 2004; Burr et al., 2009; Cook and Van Valkenburg, 2009), which was initially characterized as the cross-modal capture of the perceived timing of visual events in the presence of concurrent auditory events.

Recently, demonstrations of temporal ventriloquism have been extended to dynamic contexts where different perceptual groupings compete to determine the final percept of visual apparent motion (Getzmann, 2007; Bruns and Getzmann, 2008; Freeman and Driver, 2008; Shi et al., 2010). For example, in the typical bouncing illusion, two balls move across each other and elicit either a ‘streaming’ or a ‘bouncing’ percept. Presentation of a ‘collision’ sound near the crossover of the balls facilitates the perception of ‘bouncing’ rather than ‘streaming’ (Sekuler et al., 1997; Shimojo et al., 2001). A few studies have combined a single auditory event with multiple visual events to investigate whether and how this single auditory event selectively binds with only one of the multiple visual events or, alternatively, interacts with all the visual events to reach a perceptual decision (simultaneity judgment or feature discrimination) on the visual events (Van der Burg et al., 2008, 2013; Roseboom et al., 2009, 2013). The evidence so far suggests a ‘selective’ temporal ventriloquism effect, in which a subjective mapping from a single auditory event to multiple visual signals is hard to achieve.

The auditory stimuli used in previous studies were either a sequence of auditory beeps or a continuous sound lasting through the visual motion process, with the sounds being grouped in pitch or rhythm (Watanabe and Shimojo, 2001; Keetels et al., 2007; Bruns and Getzmann, 2008), obeying Gestalt principles such as proximity in frequency or time, continuous or smooth transition, onset and offset, rhythm and common spatial location (Bregman, 1990; Bregman et al., 1994; Carlyon, 2004). Nevertheless, the acoustic environment we live in usually consists of complex sound scenes. Multiple auditory inputs, if they conflict in their features, give rise to perceptual competition for human observers. Therefore, it is important to examine how we resolve the competition between perceptual organizations in cross-modal interaction. Specifically, it is ecologically relevant to examine whether and how the perceptual competition within a single modality (such as the auditory modality) is resolved first and then influences the visual percept or, alternatively, whether competing sound signals directly (and selectively) interact with their visual counterparts through cross-modal grouping to yield the final visual perceptual decision (Koelewijn et al., 2010; Talsma et al., 2010).

Here, we combined the auditory gap transfer illusion with the visual Ternus apparent motion to examine how perceptually competitive tones interact with the visual frames in resolving the ambiguous perceptual states of visual apparent motion. For the auditory stimuli, the gap transfer illusion refers to an auditory stimulus pattern in which a short descending (or rising) glide crosses a long rising (or descending) glide. Although a short gap (around 100 ms) is physically located in the long glide at the temporal middle of the crossing, the gap is perceived in the short glide instead, through the perceptual integration of the onsets and offsets of the sound segments at the crossing (Nakajima et al., 2000; Kanafuka et al., 2007). For the visual stimuli, we used a typical Ternus display composed of two successive visual frames, each containing two horizontal dots. The two frames share one common dot location when overlaid, with the other dot on the opposite side. With different inter-frame intervals (IFIs), participants can perceive two different percepts of apparent motion. When the IFI is short, ‘element motion’ (EM) is perceived, with the outer dot moving from one side to the other while the center dot remains static or flashing. However, long IFIs give rise to the perception of ‘group motion’ (GM), in which the two dots move together as a group (Ternus, 1926; Kramer and Yantis, 1997; Alais and Lorenceau, 2002; Petersik and Rice, 2008; Chen et al., 2010; Shi et al., 2010). Therefore, the perceived motion state of the Ternus display is a function of the implicitly perceived IFI between the two visual frames. As illustrated above, the auditory gap transfer illusion is composed of both physically and perceptually equal stimulus configurations. The visual Ternus display offers a relatively wide temporal range over which the two motion percepts can be categorized.
The juxtaposition of auditory glides and the visual Ternus display provides a good paradigm to examine the role of temporal perceptual grouping of auditory events in visual motion, with easy manipulation of the temporal relations between auditory and visual events. Specifically, we asked two empirical questions: (1) whether intra-modal temporal grouping takes precedence over cross-modal interaction in a complex audiovisual scene; (2) whether the final visual percept is driven by the subjectively perceived temporal interval between visual frames when they coincide with the ‘gap’ in the auditory glides.

To address these questions, four experiments were conducted. In Experiment 1A, participants made a two-alternative forced-choice classification of visual Ternus apparent motion (EM vs. GM). The auditory stimuli consisted of the gap transfer and no gap transfer patterns, as well as a single ascending and a single descending glide (with or without a gap), together with a visual-only condition (Ternus display) as a control. Notably, the direction of pitch change (rising or falling) matters significantly in the perception of dynamic loudness change (Neuhoff, 1998), and rising tones, which carry salient biological meaning (indicating the onset of a looming object), take perceptual priority over falling tones (Neuhoff et al., 1999, 2002; Hall and Moore, 2003). In Experiment 1B, we therefore reversed the direction of pitch change to exclude any potential effect of the directional information of the pitch contour. To further examine the temporal correspondence between the perceived auditory gap interval and the ultimate visual motion percept, in Experiment 2A we reduced the experimental conditions (from Experiment 1A) to the critical stimulus configurations: auditory glides with the gap transfer illusion, auditory gap within short glides, auditory gap within long glides and the visual-only condition. After completing the visual Ternus motion discrimination session, participants were required to compare visual intervals in the different stimulus configurations. Gap intervals (50–230 ms) between two visual Ternus frames were embedded in the above three auditory stimulus configurations and compared with the standard IFI (140 ms) of visual Ternus frames presented without accompanying tones.

To examine the potential mechanism of temporal proximity between auditory and visual events in modulating the perception of visual apparent motion, and to overcome the physical constraints (for cross-modal interaction) of larger temporal deviation between audiovisual events in Experiments 1 and 2, we implemented Experiments 3 and 4. Experiment 3 was implemented to address whether the temporal proximity between the visual frames and sound beeps (a gap in long glides) played an important role in modulating the percept of visual Ternus apparent motion by introducing two types of Ternus display (short frame duration vs. long frame duration). Experiment 4 was implemented to show how synchronous paired audiovisual events influenced the perception of visual Ternus motion. The results from the four experiments indicated that in a complex audiovisual interaction scenario, auditory (intra-modal) temporal grouping took priority over the audiovisual (cross-modal) temporal organization. Furthermore, the final decision for the percept of visual motion was largely determined by the perceived visual gap intervals.

Experiment 1

Method

Experiment 1 used a between-subjects design. The ascending glides were the long glides in Experiment 1A and the short glides in Experiment 1B. The descending glides were mirror-reversed between the two experiments (short glides in Experiment 1A and long glides in Experiment 1B).

Participants

Twenty-seven undergraduate and graduate students (four females, with an average age of 24.9) participated in Experiment 1A. Twenty-eight students (eight females, with an average age of 23.3) took part in Experiment 1B. All participants reported having normal hearing and normal or corrected-to-normal vision. The experiments were performed in compliance with all institutional guidelines set by the Academic Affairs Committee, Department of Psychology at Peking University. All participants provided written informed consent according to the Declaration of Helsinki.

Apparatus and Stimuli

Visual stimuli were presented on a 17-inch CRT monitor (Viewsonic), controlled by a standard PC (HP AMD Athlon 64 Dual-Core Processor) with a Radeon 1700 FSC graphics card. The vertical refresh rate was set to 100 Hz and the resolution was 1024 × 768 pixels. The auditory stimuli (65 dB) were generated by a sound card (RME Fireface UFX) and presented binaurally through headphones (Sennheiser HD 600). The computer programs controlling the experiments were developed with Matlab (Mathworks Inc.) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The test cabin was semi-anechoic and dimly lit throughout the experiment. The viewing distance was fixed at 60 cm, maintained with a chin-rest.

Visual Stimuli

Visual stimuli were composed of two sequential stimulus frames, each lasting 30 ms. Each frame contained two black horizontal dots presented on a gray background (10.6 cd/m²). The dots were 1.3° of visual angle in diameter and 0.24 cd/m² in luminance, and were separated by 2° of visual angle. The two frames shared one dot location at the center of the monitor, with the two other dots located at opposite positions relative to the center (Figure 1). The first frame ended at the beginning of the glide gap and the second frame started at the end of the gap (Figure 2), which meant that the IFIs of the visual stimulus were equal to the gap intervals of the auditory stimulus.
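The timing described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Matlab code: it converts frame durations and IFIs into refresh cycles at the reported 100 Hz refresh rate and lays out the two frames around the gap. The function names are hypothetical, and the sketch assumes that the gap is centered 750 ms after stimulus onset.

```python
REFRESH_HZ = 100          # vertical refresh rate reported for the CRT
FRAME_MS = 30             # duration of each Ternus frame
GAP_CENTER_MS = 750       # assumption: the gap is centered on the glide crossing
IFIS_MS = [50, 80, 110, 140, 170, 200, 230]


def ms_to_refreshes(duration_ms, refresh_hz=REFRESH_HZ):
    """Return the number of refresh cycles that span duration_ms."""
    cycles = duration_ms * refresh_hz / 1000
    if abs(cycles - round(cycles)) > 1e-9:
        raise ValueError(f"{duration_ms} ms is not a whole number of refreshes")
    return round(cycles)


def frame_schedule(ifi_ms):
    """Onset/offset times (ms) of both frames, relative to stimulus onset."""
    gap_start = GAP_CENTER_MS - ifi_ms / 2
    gap_end = GAP_CENTER_MS + ifi_ms / 2
    return {
        "frame1": (gap_start - FRAME_MS, gap_start),  # ends as the gap begins
        "frame2": (gap_end, gap_end + FRAME_MS),      # starts as the gap ends
    }


for ifi in IFIS_MS:
    print(ifi, ms_to_refreshes(ifi), frame_schedule(ifi))
```

Under these assumptions every frame duration and IFI used in the study is a whole number of 10 ms refresh cycles, which is what allows the visual timing to be locked to the vertical synchronization pulse.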


FIGURE 1. Two possible motion perceptions of the Ternus display. (A) ‘Element motion’: the center dot is perceived to remain at the same space, while the outer dot is perceived to move from one side to the other side; (B) ‘Group motion’: two dots are perceived to move together as a group.


FIGURE 2. Illustrations of the stimuli configurations for Experiment 1A. Stimuli 1–6 consisted of both auditory and visual stimuli. Stimuli 7 contained only visual stimuli. The presentation duration of each stimulus configuration was 1500 ms. The rectangles on the X-axis in each stimuli pattern represented the two visual frames (30 ms for each frame) as shown in Figure 1. Stimuli 1 and 2 were composed of a short descending glide crossing over a long ascending glide. There was a gap in the temporal middle of the stimuli. The gap was in the short glide in the ‘no gap transfer’ stimulus configuration (Stimuli 1) and in the long glide in the ‘gap transfer’ stimulus configuration (Stimuli 2). The auditory stimuli in Stimuli 3–6 were the components of the stimuli configuration 1 and stimuli configuration 2, consisting of only the short glide or the long glide. Stimuli used in Experiment 1B were made by mirror-reversing the stimuli in Experiment 1A in time.

Auditory Stimuli

Six auditory stimulus configurations and a control stimulus configuration (visual Ternus only) were used in Experiment 1A (Figure 2). In the first (A1-NGT: no gap transfer) and the second (A2-GT: gap transfer) stimulus pattern, the auditory stimuli consisted of a long ascending glide crossed with a short descending glide. The long ascending glide lasted 1500 ms, increasing from 660.7 to 1513.6 Hz, and the short glide lasted 500 ms, decreasing from 1148.2 to 871.0 Hz. Both glides moved linearly on the logarithmic frequency scale at a rate of 0.80 oct/s. The long and the short glide crossed each other at 1000 Hz, at the temporal middle point t = 750 ms from the beginning of the long ascending glide. At the crossing point, the “no gap transfer” stimuli (stimulus configuration 1) had a temporal gap in the short descending glide, while the “gap transfer” stimuli (stimulus configuration 2) had a gap in the long ascending glide. The rise and fall times were 150 ms at the beginning and end of the long glide, while the rise and fall times were 3 ms for the short glide (at the boundaries of the gap in either the long or the short glide).
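The glide parameters above can be checked with a short Python sketch. It computes the glide rate in octaves per second from the reported start and end frequencies and evaluates both glides at the crossing point. The function names are hypothetical, and the short-glide onset of 500 ms is an assumption that follows from centering the 500 ms glide on the 750 ms crossing.

```python
import math

LONG_START_HZ, LONG_END_HZ, LONG_DUR_S = 660.7, 1513.6, 1.5
SHORT_START_HZ, SHORT_END_HZ, SHORT_DUR_S = 1148.2, 871.0, 0.5
SHORT_ONSET_S = 0.5   # assumption: short glide is centered on the 750 ms crossing


def glide_rate(f_start, f_end, duration_s):
    """Signed glide rate in octaves per second."""
    return math.log2(f_end / f_start) / duration_s


def glide_freq(f_start, rate_oct_per_s, t_s):
    """Instantaneous frequency of a glide that is linear in log-frequency."""
    return f_start * 2 ** (rate_oct_per_s * t_s)


long_rate = glide_rate(LONG_START_HZ, LONG_END_HZ, LONG_DUR_S)
short_rate = glide_rate(SHORT_START_HZ, SHORT_END_HZ, SHORT_DUR_S)

# Frequencies of both glides at the crossing point, 750 ms after long-glide onset.
long_at_cross = glide_freq(LONG_START_HZ, long_rate, 0.75)
short_at_cross = glide_freq(SHORT_START_HZ, short_rate, 0.75 - SHORT_ONSET_S)

print(f"long rate  = {long_rate:+.3f} oct/s")
print(f"short rate = {short_rate:+.3f} oct/s")
print(f"crossing   = {long_at_cross:.1f} Hz (long), {short_at_cross:.1f} Hz (short)")
```

Both rates come out at roughly 0.80 oct/s in magnitude (with opposite signs), and both glides pass through approximately 1000 Hz at the crossing, consistent with the values reported in the text.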

In the third (A3-SG: short glide with gap) and fourth (A4-SNG: short glide with no gap) stimulus patterns, the auditory stimuli consisted of only the short glide with a gap (stimulus 3) or without a gap (stimulus 4). In the third pattern, a gap was inserted into the short glide, with the first auditory segment spanning 1148.2–1028 Hz before the gap and the second segment spanning 972.7–871.0 Hz after the gap. The fourth stimulus pattern was a continuous glide with frequencies changing from 1148.2 to 871.0 Hz. The fifth stimulus pattern (A5-LG: long glide with gap) was a long glide with a gap, one auditory segment before the gap having frequencies of 660.7–972.2 Hz, and the other segment having frequencies of 1028–1513.6 Hz after the gap. The sixth stimulus pattern (A6-LNG: long glide without gap) consisted of the long glide without a gap, with pitch rising from 660.7 to 1513.6 Hz.
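As a hedged illustration of where the segment boundaries come from, the Python sketch below computes the frequencies at which a gap would begin and end in each glide for every gap duration used. It assumes the gap is centered on the 750 ms crossing and that the short glide starts 500 ms after the long glide; the helper names are hypothetical. Under these assumptions, a 100 ms gap gives boundaries close to the listed values of about 972.7 and 1028 Hz.

```python
import math


def log_glide(f_start, f_end, duration_s):
    """Return f(t) for a glide that moves linearly on a log-frequency axis."""
    octaves = math.log2(f_end / f_start)
    return lambda t_s: f_start * 2 ** (octaves * t_s / duration_s)


def gap_edges(freq_at, glide_onset_s, gap_ms, center_s=0.75):
    """Frequencies (Hz) at which a centered gap starts and ends in one glide."""
    half = gap_ms / 2000  # half the gap, in seconds
    before = freq_at(center_s - half - glide_onset_s)
    after = freq_at(center_s + half - glide_onset_s)
    return before, after


long_glide = log_glide(660.7, 1513.6, 1.5)    # starts at t = 0
short_glide = log_glide(1148.2, 871.0, 0.5)   # assumed to start at t = 0.5 s

for gap_ms in (50, 80, 110, 140, 170, 200, 230):
    lo, hi = gap_edges(long_glide, 0.0, gap_ms)
    s_hi, s_lo = gap_edges(short_glide, 0.5, gap_ms)
    print(f"{gap_ms:3d} ms  long: {lo:7.1f} -> {hi:7.1f} Hz   "
          f"short: {s_hi:7.1f} -> {s_lo:7.1f} Hz")
```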

In Experiment 1B, the stimulus patterns were similar to those in Experiment 1A except that the auditory stimuli in the first six stimulus patterns were mirror reversals of those in Experiment 1A. As a result, the directions of pitch change for both the long and short glides were opposite to those in Experiment 1A.

To accurately render the timing of the auditory and visual stimuli, the duration of the visual stimuli and the synchronization of the auditory and visual stimuli were controlled by the monitor’s vertical synchronization pulse.

Design and Procedures

Prior to the experiment, participants were shown demonstrations of ‘EM’ and ‘GM’ of visual Ternus display. They then practiced a series of trials. All participants reported clear discriminations between ‘EM’ and ‘GM’ with a correct response rate of about 95% after 60 trials.

A 7 (stimulus configuration: A1–A6, V-only) × 7 (IFI: 50, 80, 110, 140, 170, 200, or 230 ms) factorial design was adopted in Experiment 1A. Each stimulus configuration had two blocks and each block had 42 trials. The directions of the apparent motion (left or right) were counterbalanced across the 42 trials. In total, there were 14 blocks and 588 trials. Participants were required to fixate the center of the monitor and to discriminate the perceptual state of the visual Ternus display (‘EM’ vs. ‘GM’) while ignoring the sounds if they were presented. A typical trial started with a fixation cross lasting 300 ms. The stimuli appeared after a blank interval of 500 ms. In the auditory-present trials, participants heard the glides first and then saw the first visual frame; after a given IFI (from 50 to 230 ms), the second visual frame appeared. In the visual-only trials, participants heard nothing but waited for the same time interval as in the sound-present conditions before seeing the first visual frame. In the auditory-present blocks, the IFIs of the visual frames were equal to the gap intervals. After all stimuli were presented, and following a random delay of 300–500 ms, participants were presented with a question mark prompting a two-alternative forced-choice response. They pressed the left arrow for ‘GM’ and the right arrow for ‘EM.’
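The block structure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: the text gives 42 trials per block and counterbalanced directions, so the sketch assumes 7 IFIs × 2 directions × 3 repetitions per block, and it assumes a random block order. All names are hypothetical.

```python
import random
from itertools import product

CONFIGS = ["A1-NGT", "A2-GT", "A3-SG", "A4-SNG", "A5-LG", "A6-LNG", "V-only"]
IFIS_MS = [50, 80, 110, 140, 170, 200, 230]
DIRECTIONS = ["left", "right"]
BLOCKS_PER_CONFIG = 2
REPS_PER_CELL = 3   # assumption: 7 IFIs x 2 directions x 3 repeats = 42 trials


def make_block(config, rng):
    """One block of 42 trials for a single stimulus configuration."""
    trials = [
        {"config": config, "ifi_ms": ifi, "direction": direction}
        for ifi, direction in product(IFIS_MS, DIRECTIONS)
        for _ in range(REPS_PER_CELL)
    ]
    rng.shuffle(trials)
    return trials


def make_session(seed=0):
    """All 14 blocks in random order (the block order is an assumption)."""
    rng = random.Random(seed)
    blocks = [make_block(c, rng) for c in CONFIGS for _ in range(BLOCKS_PER_CONFIG)]
    rng.shuffle(blocks)
    return blocks


session = make_session()
print(len(session), "blocks,", sum(len(b) for b in session), "trials")
```

With these settings each IFI is tested 12 times per stimulus configuration (6 trials in each of two blocks), which is the number of responses behind each point of an individual psychometric curve.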

Experiment 1B was implemented with the same temporal structure and similar stimuli configurations as those in Experiment 1A, except that the time-frequencies of the glides were mirror-reversed accordingly.

In a control experiment, we presented participants with the auditory gap transfer stimulus patterns with different gap durations (stimuli 1 and 2), and asked 24 participants (10 females, aged from 23 to 30 years, with an average age of 24.1 years) who had taken part in Experiment 1A to determine whether the short gap was in the long glide or the short glide. Nearly all participants perceived the short gap in the short glide rather than in the long glide. The average percentages of reporting the illusory gap in the short glide with gap durations of 50 ms (i.e., gap50), gap80, gap110, gap140, gap170, gap200, and gap230 were 87.5 (with an associated standard error of 3.7), 93.8 (2.4), 92.5 (2.6), 93.3 (2.7), 93.8 (2.2), 90.0 (4.0), and 83.8 (5.7). A repeated-measures analysis of variance (ANOVA; Greenhouse–Geisser corrected) showed a marginally significant difference in perceiving the ‘illusory’ gap among the seven gap durations, F(6,138) = 2.874, p = 0.063, with relatively lower percentages of reporting ‘gap transfer’ in the shortest gap (50 ms) and longest gap (230 ms) conditions. These results suggest that the stimulus patterns used in the experiments reliably produced a genuine gap transfer illusion.

Results

The point of subjective equality (PSE) and the just noticeable difference (JND) were calculated for each stimulus condition. The PSE is the transitional IFI at which ‘EM’ and ‘GM’ were perceived with equal probability; it was estimated as the 50% point of ‘GM’ reports on the fitted logistic function. The JND indexes the sensitivity of discriminating the two motion percepts; it was obtained as the IFI difference between the 50% and 75% points of GM responses on the psychometric curve (Treutwein and Strasburger, 1999). Figure 3 shows the average psychometric curves in Experiments 1A,B for all participants. Figure 4 shows the mean PSEs and JNDs (with associated standard errors) in Experiments 1A,B, which are also listed in Table 1.
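To make the two measures concrete, the Python sketch below fits a logistic psychometric function and reads off the PSE and JND as defined above. It is a simplified stand-in, not the fitting procedure used in the study: it linearizes the logistic with a logit transform and uses ordinary least squares instead of a maximum-likelihood fit, and the response proportions are hypothetical values invented for illustration.

```python
import math


def fit_logistic(ifis_ms, p_gm, eps=0.01):
    """Fit logit(p) = a + b * IFI by ordinary least squares.

    Proportions are clipped to [eps, 1 - eps] so the logit stays finite.
    Returns (pse, jnd) in ms.
    """
    logits = []
    for p in p_gm:
        p = min(max(p, eps), 1 - eps)
        logits.append(math.log(p / (1 - p)))
    n = len(ifis_ms)
    mean_x = sum(ifis_ms) / n
    mean_y = sum(logits) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ifis_ms, logits))
    sxx = sum((x - mean_x) ** 2 for x in ifis_ms)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pse = -intercept / slope          # IFI at which P(GM) = 0.5
    jnd = math.log(3) / slope         # IFI(75%) - IFI(50%), since logit(0.75) = ln 3
    return pse, jnd


# Hypothetical proportions of 'GM' reports for one observer (not real data).
ifis = [50, 80, 110, 140, 170, 200, 230]
p_gm = [0.02, 0.08, 0.30, 0.62, 0.88, 0.97, 0.99]
pse, jnd = fit_logistic(ifis, p_gm)
print(f"PSE = {pse:.1f} ms, JND = {jnd:.1f} ms")
```

A smaller PSE means that GM is reported at shorter IFIs (the curve shifts leftward), and a smaller JND means a steeper curve, that is, finer discrimination between the two percepts.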


FIGURE 3. Average psychometric curves for Experiment 1. (A) Experiment 1A, (B) Experiment 1B, (C) averaged results across all participants. Dotted line with filled circles: no gap transfer (A1-NGT); dot-dash line with stars: gap transfer (A2-GT); solid line with triangles: short glide with gap (A3-SG); dashed line with diamonds: short glide with no gap (A4-SNG); dot-dash line with triangles: long glide with gap (A5-LG); solid line with squares: long glide with no gap (A6-LNG); solid line with filled squares: V-only condition. The error bars represent the associated standard errors.


FIGURE 4. Points of subjective equality (PSEs) and JNDs for Experiments 1A and 1B. The error bars represent standard errors (*p < 0.05; **p < 0.01; ***p < 0.001).


TABLE 1. Points of subjective equality (PSEs), JNDs, and associated standard errors (ms) in Experiments 1A,B.

A repeated-measures ANOVA was conducted, with stimulus condition (A1–A6 and V-only) as the within-subjects factor and experiment (Experiment 1A or 1B) as the between-subjects factor. PSEs and JNDs were the dependent variables. For PSEs, the main effect of stimulus condition was significant, F(6,318) = 16.275, p < 0.001. However, neither the main effect of experiment, F(1,53) = 0.473, p = 0.495, nor the interaction between stimulus condition and experiment, F(6,318) = 1.004, p = 0.423, was significant. Bonferroni correction (with 95% confidence interval for difference) was used for the full set of 21 possible pairwise comparisons. The analysis revealed no difference in PSEs between A1-‘no-gap transfer’ (122.3 ms), A2-‘gap-transfer’ (122.2 ms), and A3-‘short gap’ (121.5 ms), ps > 0.05, but these PSEs were smaller (ps < 0.05) than those in A4-‘short glide with no gap’ (137.1 ms), A5-‘long glide with gap’ (130.7 ms), A6-‘long glide with no gap’ (133.8 ms) and V-only (132.8 ms). PSEs in A4, A5, A6, and V-only did not differ significantly, all ps > 0.1. The pairwise comparisons thus showed a notable effect of the short gap: the short glide with a temporal gap (physically in A3-SG, or perceptually in A1-NGT and A2-GT) facilitated the separation of the visual frames and led to more reports of GM (with reduced PSEs).
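The counts behind these statistics can be reproduced with a small Python sketch. It enumerates the pairwise comparisons among the seven conditions, derives the Bonferroni-adjusted per-test threshold (assuming a family-wise alpha of 0.05), and computes the degrees of freedom of a mixed design with one within-subjects and one between-subjects factor from the sample sizes reported for Experiments 1A and 1B. The function name is hypothetical and no data are analyzed.

```python
from itertools import combinations

CONDITIONS = ["A1", "A2", "A3", "A4", "A5", "A6", "V-only"]
N_EXP_1A, N_EXP_1B = 27, 28     # sample sizes reported for Experiments 1A and 1B
ALPHA = 0.05                    # assumed family-wise error rate


def mixed_anova_df(n_conditions, group_sizes):
    """Degrees of freedom for a one-within, one-between mixed ANOVA."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    df_within = n_conditions - 1
    df_between = n_groups - 1
    df_subjects = n_total - n_groups            # error term for the between factor
    df_within_error = df_within * df_subjects   # error term for within effects
    return {
        "condition": (df_within, df_within_error),
        "experiment": (df_between, df_subjects),
        "interaction": (df_within * df_between, df_within_error),
    }


pairs = list(combinations(CONDITIONS, 2))
alpha_per_test = ALPHA / len(pairs)   # Bonferroni-adjusted threshold

print(len(pairs), "pairwise comparisons; per-test alpha =", round(alpha_per_test, 5))
print(mixed_anova_df(len(CONDITIONS), [N_EXP_1A, N_EXP_1B]))
```

The sketch yields 21 comparisons and degrees of freedom of (6, 318) for stimulus condition and the interaction and (1, 53) for experiment, matching the reported F statistics.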

On the other hand, ANOVA of JNDs also revealed a significant main effect of auditory conditions, F(6,318) = 6.322, p < 0.001. The main effect of experiments was not significant, F(1,53) = 2.711, p = 0.106. The interaction effect between auditory condition and experiment was also not significant, F(6,318) = 1.178, p = 0.318.

Bonferroni-corrected comparisons showed that the JND in V-only (baseline, 22.7 ms) did not differ from those in A1–A6, all ps > 0.1. Nevertheless, there was a significant difference in JNDs among the auditory-present conditions (A1–A6), F(5,265) = 6.950, p < 0.001. JNDs were similar (ps > 0.05) in A1 (19.3 ms), A2 (20.3 ms) and A3 (19.4 ms), but were smaller (ps < 0.05) in A1–A3 than the JND in A4 (short glide with no gap, 25.9 ms). Therefore, in general, participants’ sensitivity for discriminating the Ternus apparent motion remained nearly the same in the auditory-present and auditory-absent conditions, although sensitivity was noticeably lower for the short glide with no gap.

Experiment 2

Experiment 1 showed that short auditory glides with a gap (either physical or perceived) had a significant influence on the perception of Ternus motion (with a dominant percept of GM). However, long auditory glides with a gap had less impact on the perceived ‘state’ of Ternus motion. Thus, although the perceptual organization was similar for the short glide and the long glide, the modulation effects differed. This was probably because the visual gap intervals were biased (compared with the gap interval in the visual Ternus display alone) to different degrees in the different auditory configurations, which in turn modulated the perceived states of visual apparent motion (Burr et al., 2009; Shi et al., 2010). In Experiment 2, we adopted three (characteristic) configurations of auditory glides from Experiment 1, i.e., the gap transfer configuration (A2), the short glide with a gap (A3) and the long glide with a gap (A5), and compared the perceived gap intervals with a standard interval (a fixed IFI of 140 ms as the gap in a Ternus display). If the perceived gap interval in A5 is not biased in the presence of long glides, while the perceived gap intervals in A2 and A3 are longer and equivalent in the magnitude of the illusory bias (Shi et al., 2010), we would observe a similar auditory capture of visual Ternus motion (dominant ‘GM’) in A2 and A3, whereas the capture effect would be less pronounced or even absent with the gap in long glides (A5), owing to the smaller or null bias in perceiving the gap interval in A5. Experiment 2 was implemented to test this hypothesis.

Participants first performed the classification task of Ternus apparent motion as in Experiment 1. They then performed a time-interval judgment task in which they discriminated the empty duration (gap) between two visual frames presented with short/long glides (A2, A3, and A5), compared with the gap between two visual Ternus frames presented without accompanying glides.