Audiovisual crossmodal cuing effects in front and rear space

Lee, Jae; Spence, Charles

doi:10.3389/fpsyg.2015.01086

ORIGINAL RESEARCH article

Front. Psychol., 30 July 2015

Sec. Cognitive Science

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.01086

This article is part of the Research Topic: Functional relevance and representation of 3D space

Jae Lee*

Charles Spence

Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, Oxford, UK

The participants in the present study had to make speeded elevation discrimination responses to visual targets presented to the left or right of central fixation following the presentation of a task-irrelevant auditory cue on either the same or opposite side. In Experiment 1, the cues were presented from in front of the participants (from the same azimuthal positions as the visual targets). A standard crossmodal exogenous spatial cuing effect was observed, with participants responding significantly faster in the elevation discrimination task to visual targets when both the auditory cues and the visual targets were presented on the same side. Experiment 2 replicated the exogenous spatial cuing effect for frontal visual targets following both front and rear auditory cues. The results of Experiment 3 demonstrated that the participants had little difficulty in correctly discriminating the location from which the sounds were presented. Thus, taken together, the results of the three experiments reported here demonstrate that the exact co-location of auditory cues and visual targets is not necessary to attract spatial attention. Implications of these results for the design of real-world warning signals are discussed.

Introduction

Our senses are constantly bombarded by information from the surroundings, and therefore it is crucial for our brains to know which stimuli should be focused on, and which can safely be ignored. Over the last few decades, there has been a plethora of research on the topic of spatial attention, spanning all the way from basic (see Spence and Driver, 2004, for a review) through to applied (see Spence and Ho, 2008, for a review). The majority of the research on this topic has been focused on exogenous (involuntary) orienting rather than endogenous (voluntary) orienting (see Spence and Driver, 2004; Wright and Ward, 2008, for reviews). In the case of endogenous spatial orienting, attention is thought to be “pushed” to the expected target location (e.g., following the presentation of an informative central arrow cue at fixation), whereas in the case of exogenous orienting, attention is “pulled” to the location of a salient peripheral cue (Spence and Driver, 1994, 2004; Wright and Ward, 2008). While early exogenous cuing studies tended to focus on spatial attention within just the visual modality (Jonides, 1981; Briand and Klein, 1987; Müller and Rabbitt, 1989; Rafal et al., 1991; Klein et al., 1992), there has been an explosion of research interest in crossmodal attention over the last couple of decades (e.g., Spence and Driver, 1994, 1997; McDonald et al., 2000; Ferlazzo et al., 2002).

The major finding to have emerged from these studies of exogenous spatial orienting is that participants typically respond more rapidly to targets when they are preceded by cues presented on the same side than when the cues and targets are presented on opposite sides. The facilitation attributable to exogenous spatial cuing typically lasts for around 300 ms from the onset of the cue¹. It is, however, not clear exactly how spatially specific exogenous spatial cuing effects are: is the exact co-location of cues and targets required, or is the relative lateral position of cues and targets all that matters? The ambiguity arises because the terms location, position, and/or side have been used interchangeably when discussing cuing effects (Posner, 1980; Posner et al., 1980; Jonides, 1981; Briand and Klein, 1987; Rizzolatti et al., 1987; Spence and Driver, 1994; Ward, 1994; Spence et al., 1998; Rorden and Driver, 1999; McDonald et al., 2000; Schmitt et al., 2000; Kennett et al., 2001, 2002; Ferlazzo et al., 2002; Batson et al., 2011).

Here, we report three experiments designed to investigate how the location of auditory cues, in terms of lateral cuing (i.e., cued if the cues and targets are on the same side and uncued if they appear on opposite sides) and depth (i.e., front vs. rear), affects the cuing effect for frontal visual targets. Experiment 1 was conducted in order to replicate the standard crossmodal exogenous auditory spatial cuing effect in front space, before going on to study what happens in rear space (Experiment 2). Both experiments adapted the orthogonal spatial cuing methodology originally introduced by Spence and Driver (1994, 2004) in which the dimension of cuing is orthogonal to that of participants’ responses. For example, if the cues happened to be presented on the z-axis (i.e., front or rear), the task in the orthogonal cuing design would be to indicate whether the targets appeared on the left or right side (x-axis), or in the upper vs. lower location (y-axis). The orthogonal cuing design allows researchers to rule out the possibility that any observed performance benefits result simply from response priming (Spence and Driver, 1994). Spatial cuing effects were evaluated by looking for any performance discrepancy in the reaction times (RTs) and error rates (ERs) of participants’ responses between cued and uncued trials. Since a task-irrelevant auditory cue varies in a spatial dimension orthogonal to that in which target discrimination judgments are made, any cuing effect will be reflected by shorter RTs at the cued as compared to the uncued locations if the cue facilitates target perception.

Experiment 1

Experiment 1 was designed to test the hypothesis that participants would respond significantly faster (and possibly also more accurately) to visual targets that had been preceded by cues from the same side of central fixation as compared to those presented on the opposite side. We also assessed any differences in the magnitude of the spatial cuing effects as a function of the type of auditory cue that was presented: white noise vs. pure tones. Given that white noise stimuli are easier to localize than pure tones, especially in terms of their elevation (e.g., Stevens and Newman, 1936; Deatherage, 1972; Spence and Driver, 1994), it seemed worthwhile to assess whether the latter might lead to a broader spread of spatial attention around the cued location.

Methods

Participants

Twenty participants (10 male and 10 female) were recruited to take part in the experiment through the Crossmodal Research Lab mailing list and Oxford Psychology Research participant recruitment scheme. The average age of the participants was 26 years, with a range from 19 to 37. All of the participants were right-handed, and had normal hearing and vision, by self-report. The experimental session lasted for approximately 30 min. The participants were paid £5 in return for taking part in the study. The experiment was approved by the Medical Sciences Interdivisional Research Ethics Committee at the University of Oxford, and was conducted in line with the guidelines provided.

Apparatus and Materials

All of the experiments reported in the present study were conducted in a darkened room (320 cm × 144 cm × 220 cm), using MATLAB R2014a with Psychtoolbox 3.0.12 on Ubuntu 14.04 LTS. The participants were seated at a desk with a backlit computer keyboard, approximately 60 cm away from a cloth screen mounted on the front wall of the room. The cloth screen hid five 12 V, 5 mm LEDs with a luminous intensity of 8000 millicandelas and two loudspeakers (M-Audio Studiophile AV 40; model 9900-65140-00). The LEDs were controlled by an Arduino Uno board (rev. 3), driven by commands sent from MATLAB. One LED was placed at the center of the screen, approximately at the eye level of the participants (111 cm from the floor) as a fixation point. Four additional LEDs were installed as visual targets in the top-left, top-right, bottom-left, and bottom-right positions, separated by 60 cm horizontally and by 40 cm vertically, with the fixation LED positioned in the center (see Figure 1).
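The layout described above can be converted into approximate visual angles. The following is a minimal sketch, not a figure reported in the study: it assumes that the four target LEDs sit at the corners of a 60 cm × 40 cm rectangle centered on fixation, that the screen is flat, and that the viewing distance is exactly 60 cm (the text gives it only as approximate). The function name `eccentricity_deg` is hypothetical.

```python
import math

# Assumed geometry (a reading of the layout described in the text, not reported values):
# targets at the corners of a 60 cm x 40 cm rectangle centered on the fixation LED,
# viewed from 60 cm on a flat screen.
VIEWING_DISTANCE_CM = 60.0
HORIZONTAL_SEPARATION_CM = 60.0
VERTICAL_SEPARATION_CM = 40.0


def eccentricity_deg(offset_cm: float, distance_cm: float) -> float:
    """Angle at the eye subtended by an offset from fixation on a flat screen."""
    return math.degrees(math.atan2(offset_cm, distance_cm))


# Each target is offset from fixation by half of the stated separation.
horizontal_deg = eccentricity_deg(HORIZONTAL_SEPARATION_CM / 2, VIEWING_DISTANCE_CM)
vertical_deg = eccentricity_deg(VERTICAL_SEPARATION_CM / 2, VIEWING_DISTANCE_CM)

print(f"horizontal: {horizontal_deg:.1f} deg, vertical: {vertical_deg:.1f} deg")
```

Under these assumptions, the targets lie roughly 27° to the left or right of fixation and roughly 18° above or below it.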

FIGURE 1. Bird’s-eye view showing the position of the loudspeakers and target LEDs in Experiment 1.

The loudspeakers were equipped with a 1-inch diameter treble tweeter and a 4-inch diameter low frequency driver, positioned 10 cm below the tweeter. The loudspeaker frequency response ranged from 85 Hz to 20 kHz. The loudspeakers were placed on their sides so that the tweeters were situated closer to the walls than the low frequency drivers. The farthest sides of the two loudspeakers were separated by a distance of 120 cm. The center of each treble tweeter and low frequency driver was placed 111 cm above the floor. The auditory cues consisted of a 2000 Hz pure tone at 75 dBA and white noise (with a frequency cutoff range between 0 and 22 kHz) presented at 68 dBA, both measured from the participant’s ear position². The sample rate for both auditory cues was 44.1 kHz. A computer monitor (Dell UltraSharp; model 1908FPb) was placed on the left side of the participant’s seat to display any instructions.

Design

There were three within-participants factors in the experiment: Cue Type (pure tone vs. white noise), Spatial Cuing (cue presented on the same vs. opposite side as the target), and stimulus onset asynchrony (SOA) between the cue and target (100, 200, or 700 ms). The crossing of these factors yielded 12 possible conditions, with each condition being presented 12 times randomly in each block of 144 trials. The participants completed a total of three blocks, and were encouraged to take a short break between blocks.
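The factorial structure of the design can be illustrated with a short sketch. This is an assumed reconstruction rather than the authors' MATLAB code: the names `build_block`, `CUE_TYPES`, and so on are hypothetical, and the balancing of target side and elevation (which the text does not describe) is left out.

```python
import itertools
import random
from collections import Counter

# Factor levels taken from the Design paragraph.
CUE_TYPES = ("pure_tone", "white_noise")
SPATIAL_CUING = ("cued", "uncued")
SOAS_MS = (100, 200, 700)
REPETITIONS_PER_BLOCK = 12
N_BLOCKS = 3


def build_block(rng: random.Random) -> list:
    """Return one randomized block: every condition repeated 12 times."""
    conditions = list(itertools.product(CUE_TYPES, SPATIAL_CUING, SOAS_MS))
    trials = conditions * REPETITIONS_PER_BLOCK
    rng.shuffle(trials)  # random order within the block
    return trials


rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
blocks = [build_block(rng) for _ in range(N_BLOCKS)]
counts = Counter(blocks[0])

print(len(counts), "conditions,", len(blocks[0]), "trials per block")
```

Crossing 2 × 2 × 3 levels gives 12 conditions, so 12 repetitions yield the 144 trials per block and 432 trials per participant across the three blocks.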

Procedure

At the start of each trial, the fixation LED was illuminated and remained on for 2 s after the onset of the visual target, or until the participant made a response. After a random delay of 400–650 ms, an auditory cue was presented from one of the two loudspeakers at a constant intensity, for 100 ms. A visual target, consisting of the illumination of one of the four LEDs for 140 ms, was presented after a further delay of 0, 100, or 600 ms from cue offset, depending on the SOA. The participants were instructed to press the up arrow key on the keyboard if an LED illuminated on either the upper-left or upper-right, and to press the down arrow key if an LED illuminated on either the lower-left or lower-right. The participants were further instructed to ignore the auditory cue, and to respond as rapidly and accurately as possible to the location of the visual target. The participants completed 10 practice trials before the experimenter stepped out of the room. If the participants failed to respond within 2 s of the onset of the visual target, the trial terminated, and the next trial began.
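The relation between the SOA and the delay that follows the cue can be made explicit with a small sketch of the trial timeline. It assumes that the SOA is measured from cue onset to target onset and that all times are expressed relative to fixation onset; the function `trial_timeline` and its field names are hypothetical.

```python
CUE_DURATION_MS = 100
TARGET_DURATION_MS = 140
RESPONSE_WINDOW_MS = 2000


def trial_timeline(soa_ms: int, pre_cue_delay_ms: int) -> dict:
    """Event times in ms relative to fixation onset (assumed reference point)."""
    cue_onset = pre_cue_delay_ms
    cue_offset = cue_onset + CUE_DURATION_MS
    target_onset = cue_onset + soa_ms  # SOA runs from cue onset to target onset
    return {
        "cue_onset": cue_onset,
        "cue_offset": cue_offset,
        "gap_after_cue": target_onset - cue_offset,
        "target_onset": target_onset,
        "target_offset": target_onset + TARGET_DURATION_MS,
        "response_deadline": target_onset + RESPONSE_WINDOW_MS,
    }


# A pre-cue delay of 500 ms is an arbitrary value inside the 400-650 ms range.
gaps = [trial_timeline(soa, 500)["gap_after_cue"] for soa in (100, 200, 700)]
print(gaps)
```

With a 100 ms cue, SOAs of 100, 200, and 700 ms correspond to silent gaps of 0, 100, and 600 ms between cue offset and target onset.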

Results

A box plot of participants’ average RTs across all conditions revealed a median of 429 ms, with the 25th (Q₁) and 75th (Q₃) percentiles at 354 and 524 ms, respectively. One participant’s average RT (M = 813 ms) exceeded the upper limit for the sample (Q₃ plus 1.5 times the interquartile range). This participant’s data were therefore identified as an outlier and removed from the analyses (see Tukey, 1977; Wesslein et al., 2014). The data from another participant were removed because he/she failed to respond on more than 10% of all trials. The following trial data were excluded from the subsequent analyses: incorrect responses, responses immediately following an incorrect response, and RTs that fell outside the range between 150 and 1,500 ms (see Spence and Driver, 1997, for similar exclusion criteria). The application of these exclusion criteria led to the removal of a total of 404 trials (5.2% of the data).
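The exclusion rules above can be checked with a short sketch. The quartiles and the outlying mean RT are the values reported in the text; the function names are hypothetical, and treating the 150 and 1,500 ms bounds as inclusive is an assumption, since the text does not say how boundary values were handled. The total trial count assumes 18 remaining participants with 432 trials each.

```python
Q1_MS, Q3_MS = 354.0, 524.0          # quartiles reported in the text
OUTLYING_MEAN_RT_MS = 813.0          # mean RT of the excluded participant
RT_MIN_MS, RT_MAX_MS = 150.0, 1500.0


def tukey_upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper fence of Tukey's box plot rule: Q3 + k * IQR."""
    return q3 + k * (q3 - q1)


def keep_trial(correct: bool, previous_correct: bool, rt_ms: float) -> bool:
    """Trial-level exclusion rules; inclusive RT bounds are an assumption."""
    return correct and previous_correct and RT_MIN_MS <= rt_ms <= RT_MAX_MS


upper_fence = tukey_upper_fence(Q1_MS, Q3_MS)
is_outlier = OUTLYING_MEAN_RT_MS > upper_fence

# Assumed total: 18 participants x 3 blocks x 144 trials.
excluded_fraction = 404 / (18 * 3 * 144)

print(upper_fence, is_outlier, round(100 * excluded_fraction, 1))
```

The fence comes out at 779 ms, so a mean RT of 813 ms falls outside it, and 404 excluded trials out of 7,776 correspond to the reported 5.2%.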

A three-way within-participants repeated measures analysis of variance (RM-ANOVA) was conducted with the factors of Cue Type, Spatial Cuing, and SOA. The analysis revealed a significant main effect of Spatial Cuing, F(1,17) = 47.489, p < 0.001, with participants responding more rapidly on the cued trials (M = 415 ms) than on the uncued trials (M = 433 ms). There was also a significant main effect of SOA, F(1.522,25.873) = 30.427, p < 0.001, with the participants responding more slowly at the 100 ms (M = 439 ms) as compared to either the 200 ms (M = 419 ms) or 700 ms (M = 414 ms) SOAs (the latter two conditions did not differ significantly, p = 0.384). This speeding-up of participants’ responses as the SOA increased presumably reflects a generalized alerting effect (see Spence and Driver, 1997). The analysis of the data also highlighted a significant two-way interaction between Spatial Cuing and SOA, F(2,34) = 5.935, p = 0.006. Paired t-tests revealed that the participants responded significantly more rapidly on the cued than on the uncued trials at all three SOAs: at the 100 ms SOA, t(17) = -6.575, p < 0.001; at the 200 ms SOA, t(17) = -6.159, p < 0.001; and at the 700 ms SOA, t(17) = -3.553, p = 0.002 (all p-values were smaller than 0.0167 based on Bonferroni correction, see Figure 2 and Table 1). Subsequent contrasts revealed that the magnitudes of the cuing effects between the 100 ms (M = 22 ms) and 200 ms (M = 20 ms) SOAs were not significantly different, F(1,17) = 0.764, p = 0.394, whereas the magnitude of the cuing effect at the 200 ms SOA was significantly larger than that at the 700 ms (M = 11 ms) SOA, F(1,17) = 4.921, p = 0.040.
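Two pieces of arithmetic behind these statistics can be sketched briefly. The Bonferroni threshold follows directly from the text. The second part is an inference: the fractional degrees of freedom for the SOA main effect suggest that a sphericity correction was applied, although the text does not name one, so the `epsilon` computed here is only a back-calculation. Representing "p < 0.001" by its upper bound is likewise an assumption.

```python
ALPHA = 0.05
N_PARTICIPANTS = 18   # implied by the reported F(1,17)
SOA_LEVELS = 3

# Bonferroni: divide the familywise alpha by the number of paired comparisons.
bonferroni_alpha = ALPHA / SOA_LEVELS

# Reported p-values per SOA; "p < 0.001" is represented by its upper bound.
reported_p = {100: 0.001, 200: 0.001, 700: 0.002}
survives = {soa: p < bonferroni_alpha for soa, p in reported_p.items()}

# A sphericity correction multiplies both degrees of freedom by one epsilon.
nominal_df_effect = SOA_LEVELS - 1
nominal_df_error = nominal_df_effect * (N_PARTICIPANTS - 1)
epsilon = 1.522 / nominal_df_effect              # back-calculated from F(1.522, 25.873)
corrected_df_error = epsilon * nominal_df_error

print(round(bonferroni_alpha, 4), survives, round(corrected_df_error, 3))
```

The back-calculated error term (about 25.87) agrees with the reported value to within rounding, which is consistent with a single correction factor of roughly 0.76 having been applied to both degrees of freedom.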

FIGURE 2. Mean reaction times (RTs; in milliseconds) and error rates (ERs; cued conditions in square brackets and uncued conditions in parentheses), as a function of cue-target stimulus onset asynchrony (SOA) in Experiment 1. The solid line represents the cued conditions and the dotted line represents the uncued conditions. Asterisks indicate that the RT differences between the cued and uncued conditions at the given SOAs were significant based on paired t-tests after Bonferroni correction (p < 0.0167).

TABLE 1. Mean reaction times (RTs; in milliseconds) from pure tone and white noise conditions, their within-participant SEs from Cousineau’s (2005) method, and error rates (ERs; in parentheses), as a function of stimulus onset asynchrony (SOA) and spatial cuing in Experiment 1.
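The within-participant SEs mentioned in the caption rest on a normalization step that can be sketched as follows. This is a minimal illustration of the general idea (remove each participant's own mean, add back the grand mean, then compute the SE per condition), not the authors' analysis script; the function name `cousineau_se` and the example values are hypothetical.

```python
import math
import statistics


def cousineau_se(data: list) -> list:
    """Within-participant SE per condition.

    `data` holds one row per participant and one mean RT per condition.
    """
    n = len(data)
    grand_mean = statistics.mean(x for row in data for x in row)
    # Remove between-participant variability: center each row on the grand mean.
    normalized = [
        [x - statistics.mean(row) + grand_mean for x in row] for row in data
    ]
    # Standard error of each condition (column) of the normalized scores.
    return [statistics.stdev(col) / math.sqrt(n) for col in zip(*normalized)]


# Hypothetical mean RTs (ms) for three participants in two conditions.
example = [[400.0, 420.0], [500.0, 530.0], [600.0, 610.0]]
se_within = cousineau_se(example)
se_between = statistics.stdev(row[0] for row in example) / math.sqrt(len(example))

print(se_within, se_between)
```

In this example the participants differ greatly in overall speed but show similar condition differences, so the normalized SEs are far smaller than the conventional between-participant SE, which is the point of the method for within-participants designs.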

A similar analysis of the error data did not reveal any significant terms.

Discussion

The results of Experiment 1 clearly demonstrate a significant exogenous crossmodal cuing effect. In particular, the participants’ elevation discrimination responses were facilitated when the presentation of the visual targets was preceded by an auditory cue on the same, rather than on the opposite, side of central fixation. These results therefore replicate those reported some years ago by Spence and Driver (1997; see Spence et al., 2004, for a review). However, another interesting result to emerge from the analysis of the data from our first experiment was that the magnitudes of the crossmodal cuing effects were similar regardless of the type of auditory cue (pure tone vs. white noise) that preceded the onset of the visual target. The latter result is interesting in that one might have expected, a priori, that more localizable auditory cues (i.e., the white noise burst) would have given rise to a more narrowly localized focusing of participants’ spatial attention around the cue location than the pure tone cues, which were presumably less localizable in the elevation dimension (cf. Spence et al., 2004).

Having replicated the basic exogenous crossmodal spatial cuing effect and having demonstrated its seeming insensitivity to the type of auditory cue that was presented (at least for the two cues presented in Experiment 1), we went on, in Experiment 2, to investigate what would happen if the auditory cues were to be presented from behind the participant’s head on either the left or right (i.e., from a very different spatial location than that occupied by the visual target). The design of Experiment 2 was identical to that of Experiment 1, with the sole exception that on half of the trials, the auditory cues were now presented from behind the participant’s head, rather than from in front, in order to investigate whether they would also influence the speed of information processing for visual targets presented from the front. It was expected that the participants would respond more rapidly to the frontally arrayed visual targets after same side front cues (cued trials) than to the targets following front cues presented on the opposite side (uncued trials; thus hopefully replicating the results of Experiment 1). More interesting, though, was what would happen following the presentation of the same auditory cues from the rear. On the one hand, one might expect to observe no spatial cuing effects at all, since the rear cues would always be presented from a different location than the front targets. On the other hand, however, it could also be argued that the very fact that the cue and target are still presented on the same vs. opposite sides might be sufficient to elicit some sort of spatial cuing effect; that is, the exact co-location of the cue and target might not matter. In fact, little is currently known about how attention is oriented exogenously following the presentation of auditory cues that fall outside of the visual field. Obtaining information on this point could be particularly interesting for those thinking about how to alert drivers, say, to stimuli presented in their blind spot (see Ho and Spence, 2008).