Efficiency of Sensory Substitution Devices Alone and in Combination With Self-Motion for Spatial Navigation in Sighted and Visually Impaired

Jicol, Crescent; Lloyd-Esenkaya, Tayfun; Proulx, Michael J.; Lange-Smith, Simon; Scheller, Meike; O'Neill, Eamonn; Petrini, Karin

doi:10.3389/fpsyg.2020.01443

ORIGINAL RESEARCH article

Front. Psychol., 10 July 2020

Sec. Perception Science

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.01443

Crescent Jicol ¹†
Tayfun Lloyd-Esenkaya ²†
Michael J. Proulx ¹
Simon Lange-Smith ³
Meike Scheller ¹
Eamonn O'Neill ²
Karin Petrini ¹*

1. Department of Psychology, University of Bath, Bath, United Kingdom
2. Department of Computer Science, University of Bath, Bath, United Kingdom
3. School of Sport and Exercise Sciences, Liverpool John Moores University, Liverpool, United Kingdom

Abstract

Human adults can optimally combine vision with self-motion to facilitate navigation. In the absence of visual input (e.g., dark environments and visual impairments), sensory substitution devices (SSDs), such as The vOICe or BrainPort, which translate visual information into auditory or tactile information, could be used to increase navigation precision when integrated together or with self-motion. In Experiment 1, we compared and assessed together The vOICe and BrainPort in an aerial map task performed by a group of sighted participants. In Experiment 2, we examined whether sighted individuals and a group of visually impaired (VI) individuals could benefit from using The vOICe, with and without self-motion, to accurately navigate a three-dimensional (3D) environment. In both studies, 3D motion tracking data were used to determine the level of precision with which participants performed two different tasks (an egocentric and an allocentric task) under three different conditions (two unisensory conditions and one multisensory condition). In Experiment 1, we found no benefit of using the devices together. In Experiment 2, the sighted participants' performance with The vOICe was almost as good as that with self-motion despite a short training period, although we found no benefit (reduction in variability) of using The vOICe and self-motion in combination compared to the two in isolation. In contrast, the group of VI participants did benefit from combining The vOICe and self-motion despite the low number of trials. Finally, while both groups became more accurate in their use of The vOICe with increased trials, only the VI group showed an increased level of accuracy in the combined condition. Our findings highlight how exploiting non-visual multisensory integration to develop new assistive technologies could be key to helping blind and VI persons, especially given their difficulty in attaining allocentric information.

Introduction

Our world, built by the sighted for the sighted, poses significant challenges for the estimated 252 million visually impaired (VI) individuals worldwide (Bourne et al., 2017). Furthermore, the prevalence of visual impairment and blindness is projected to increase drastically over the next 30 years, with approximately 4 million people expected to be living with sight loss in the United Kingdom alone (Future Sight Loss, 2009, pp. 43–44).

The eyes are our window to where we are and what is around us in the environment. Vision, with its higher spatial resolution, normally provides the most reliable information when it comes to spatial tasks in general, and to navigation specifically. Evidence that vision dominates other senses during spatial tasks comes from developmental studies. These studies show that children use visual information to calibrate (teach) other sensory cues during spatial tasks (e.g., Gori et al., 2008, 2010, 2012; Petrini et al., 2016) and also show that children have difficulties discounting or ignoring visual information even when it is irrelevant for the task (Innes-Brown et al., 2011; Downing et al., 2015; Petrini et al., 2015). In spatial navigation, vision is so relevant that it can influence even how humans find their way in the dark back to a previously seen location (Tcheang et al., 2011; Petrini et al., 2016). For example, a study using immersive virtual reality showed that, after being presented with conflicting visual information, adult sighted participants used a representation combining visual and self-motion cues to find their way back to the start in darkness (Tcheang et al., 2011).

However, when vision is absent or less reliable (e.g., in a poorly lit environment), our reliance on other sensory cues such as sound becomes essential. This holds especially true for blind individuals who need to mostly or completely rely on other sensory cues to perform daily tasks (e.g., locating a person by his/her voice). Navigation is a particularly important but demanding task for blind individuals as they not only have to find their way by using less reliable spatial information but they also have to avoid collision with a huge number and variety of obstacles in the environment (e.g., objects, people, and animals). Several studies have demonstrated how visual experience is essential for the typical development of spatial cognition and navigation abilities (Pasqualotto and Proulx, 2012). This, however, is not true for all kinds of navigation but it seems to be specific to navigation tasks that require an allocentric (i.e., a spatial representation built on the relative position of objects in the environment), rather than an egocentric (i.e., a spatial representation built on the subject’s own position in the environment), representation of space (Pasqualotto et al., 2013; Iachini et al., 2014). For example, Iachini et al. (2014) reported that congenitally blind participants, when compared to late blind and sighted participants, found it difficult to represent spatial information allocentrically, but not egocentrically, during a large-scale space navigation task. Accumulating evidence of this type has prompted the development of numerous types of technological aids aimed to help individuals with visual deficits during navigation requiring allocentric representation (i.e., in large-scale environments).

Among these technological approaches, sensory substitution devices (SSDs) have received a great deal of interest in the last few decades. SSDs are noninvasive technologies that exploit the ability of the brain to adapt and to process the lost sensory information (vision) through the other unaffected senses (e.g., “seeing through the ears”; Bach-y-Rita et al., 1969; Meijer, 1992). SSDs are not only noninvasive and much cheaper than other alternatives (e.g., sensory restoration devices) but are also better suited for use with different types of visual deficits, including congenital blindness. This is because they do not require a developed visual system and/or any previous visual knowledge (Proulx et al., 2014a).

A freely available SSD is The vOICe, which uses an image-to-sound conversion algorithm which receives input from a camera and transposes it into 1-s auditory “soundscapes” (Meijer, 1992). The vOICe algorithm transforms visual images by scanning them from left-to-right, converting them into grayscale, and subdividing them into pixels. Each pixel is then converted into sound (or “sonified”) based on its luminance, horizontal position, and vertical position. High luminance pixels sound louder than low luminance pixels, pixels on the left of the visual field are played before those on the right, and pixels at the top have a higher pitch than those at the bottom (Meijer, 1992).
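The mapping described above can be illustrated with a minimal sketch. It is not The vOICe's actual implementation: the function name `sonify_image`, the 500–5000 Hz frequency range, the exponential pitch spacing, and the event-list output are all assumptions made for illustration. Only the three rules stated in the text are taken from the source (left columns play first, higher rows get higher pitch, brighter pixels are louder).

```python
def sonify_image(image, f_low=500.0, f_high=5000.0, scan_seconds=1.0):
    """Map a grayscale image to a list of (onset_s, frequency_hz, amplitude) events.

    `image` is a list of rows (row 0 = top of the image) with luminance
    values between 0.0 (black) and 1.0 (white). The frequency range and the
    exponential spacing are illustrative assumptions, not documented settings.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    events = []
    for col in range(n_cols):  # left-to-right scan
        # Columns further left start earlier within the soundscape.
        onset = scan_seconds * col / n_cols
        for row in range(n_rows):
            luminance = image[row][col]
            if luminance <= 0.0:
                continue  # black pixels contribute no sound
            # Top rows map to the high end of the frequency range.
            frac = (n_rows - 1 - row) / (n_rows - 1) if n_rows > 1 else 0.0
            frequency = f_low * (f_high / f_low) ** frac
            # Amplitude is proportional to luminance (brighter = louder).
            events.append((onset, frequency, luminance))
    return events


# A 2 x 2 image: a bright pixel top-left and a dimmer pixel bottom-right.
events = sonify_image([[1.0, 0.0],
                       [0.0, 0.5]])
```

In this toy image the top-left pixel is heard first, at the highest pitch and full loudness, and the bottom-right pixel is heard half a second later, at the lowest pitch and half the loudness.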

The vOICe has been demonstrated to allow VI individuals to access visual information through audition, allowing object recognition and localization (Auvray et al., 2007). However, The vOICe is limited in that users find it difficult to distinguish between multiple vertically aligned objects, because it is difficult to distinguish between the pitches of sounds that are played simultaneously (Brown et al., 2015). Similarly, because of the left-to-right scanning that creates the soundscapes, it is difficult to process horizontally aligned objects simultaneously, as their respective sounds are played at different points in time. Nevertheless, the benefit of The vOICe should not be understated, as it confers spatial resolution superior to that of tactile-visual sensory substitution systems (e.g., BrainPort; for details see Bach-y-Rita and Kercel, 2003; Haigh et al., 2013; Proulx et al., 2014a).

An alternative SSD is the BrainPort, a visual-to-tactile aid. This device operates by transforming images into a pattern of electrical stimulation delivered via an electrode array that sits atop the tongue (Bach-y-Rita and Kercel, 2003). The device is used by exploring this electrode pad, thus objects can be processed theoretically in parallel (Arditi and Tian, 2013), and users might have no difficulty in distinguishing between vertically aligned objects. In addition, the BrainPort confers a superior temporal resolution to The vOICe, although its spatial resolution is inferior (Bach-y-Rita and Kercel, 2003).

That The vOICe and BrainPort each seem to have strengths where the other has weaknesses raises the question of whether an unaffected sensorimotor ability (e.g., self-motion) could be integrated with one or even both of these devices simultaneously during spatial navigation. Optimal concurrent use of two or more SSDs would rely on multisensory integration, the process by which information from different senses is combined to form a holistic percept (Stein et al., 2009). Thus, concurrent use of multiple SSDs could allow multisensory integration of incoming information, whereby the advantages of each device compensate for the respective limitations of the other (Shull and Damian, 2015). Alternatively, using these devices concurrently with another sensorimotor ability could increase precision and accuracy during spatial navigation by integrating these multiple information sources in the absence of vision.

The ability to use a multimodal representation of space in blind individuals when navigating their environment, however, has not received support in persons with restored vision through a retinal prosthesis. Garcia et al. (2015) examined the ability of a group of adult patients with ARGUS II retinal prosthesis to use the restored visual information to navigate a simple two-legged path. The patients, an age-matched control group and another younger control group, had to retrace a two-legged path (two sides of a triangle they previously experienced) in one task and go back to the start point after walking the same two-legged path in another task (i.e., they had to complete the triangle by walking as precisely as possible the remaining third side). Before reproducing the path or completing the triangle, participants could walk (by being guided) the two-legged path with either an indirect visual landmark or no visual landmark. Garcia et al. (2015) showed that, in contrast to sighted individuals, these patients did not use a combined representation of visual and self-motion cues when navigating (when reproducing the path or completing the triangle) but relied entirely on self-motion (Garcia et al., 2015). Thus, it appears that a multimodal representation of space (a single and coherent representation of space obtained by integrating the restored visual information with self-motion) was not formed in these blind individuals.

This stands in contrast to existing evidence from neuroscience, which suggests that congenitally blind individuals can recruit visual areas when recognizing sounds, shapes, and movements through SSDs (De Volder et al., 1999; Poirier et al., 2007), in addition to areas, such as parahippocampus and visual cortex, that are essential for successful spatial navigation in sighted individuals (Kupers et al., 2010). A possible explanation is that blind individuals may usually form a non-visual multimodal representation of space with the unaffected sensory information (e.g., sound and self-motion). In that case, using the restored visual information would be detrimental rather than helpful as the possible representation of space with the restored visual information (with a far lower resolution than typical vision) is poorer than a non-visual multimodal representation of space. Consequently, forming a multisensory representation of space and benefitting from it could be possible for VI and blind individuals when using non-visual information as provided by the SSDs. That blind and VI individuals may use a non-visual multisensory representation of space to increase their accuracy and precision is supported by recent findings showing that an audiotactile map (delivered through a touchpad) was more efficient than either a tactile only map or only walking during a navigation task (Papadopoulos et al., 2018).

The ability of blind/VI and sighted blindfolded individuals to use SSDs efficiently during spatial navigation, even after a short training, is well known (Chebat et al., 2011, 2015; Maidenbaum et al., 2014; Kolarik et al., 2017). For example, Chebat et al. (2011) showed that congenitally blind participants had an enhanced ability to detect and avoid obstacles compared to blindfolded sighted participants when using a tongue display unit (TDU), and Chebat et al. (2015) showed that congenitally blind, low vision, and late blind individuals could achieve the sighted (non-blindfolded) performance in a real and virtual maze after a few trials with the EyeCane (a device that uses sound and vibration to deliver information about distances). Chebat et al. (2015) also showed that participants could improve their spatial perception and form a cognitive map through the learning experience afforded by the EyeCane. However, what remains unclear is whether the formation of a cognitive map combining non-visual information can speed up learning and provide better precision and accuracy to VI and blind users. Understanding whether the integration of different non-visual cues can improve VI spatial navigation has both theoretical and applied significance. On the one hand, it has important implications for the development, training, and application of existing and new aids for the blind. On the other hand, it could bring support to a convergent model of spatial learning (Schinazi et al., 2016) in the blind and VI, by showing that even when using less effective cues for navigation, blind and VI individuals can learn to perform as well as sighted individuals by increasing their precision through non-visual multisensory integration.

Here, we examine this possibility by first testing whether combining vision-to-sound and vision-to-tactile information, as provided by two SSDs, can enhance navigation performance in a group of blindfolded sighted participants. Next, we test whether combining the information from one SSD with existing and unaffected senses (e.g., self-motion and proprioception) can improve navigation precision and accuracy in a group of blindfolded sighted participants and a group of VI individuals. To test the formation of a cognitive map, we asked participants to perform the navigation task (walking to a target location) in darkness after experiencing the environment under different conditions (e.g., with an SSD or with self-motion). To test whether there was an increase in accuracy and precision (when combining either information from different SSDs or from one device and the available self-motion information), we used a maximum likelihood estimation (MLE) framework (i.e., we compared the reduction in variability for the measured combined condition to that obtained for each sense separately and to the reduction in variability predicted by the MLE; Ernst and Banks, 2002). Under the MLE framework, we expect to see a significant reduction in performance variance (or reduced uncertainty), as predicted by the model, when the variances of the unimodal conditions (e.g., when using the two SSDs in isolation) are similar, or in other words when the reliabilities of the cues to be integrated are similar. Hence, the tasks used here were chosen to be fairly easy and straightforward to ensure that a similar level of performance with different devices could be achieved.
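The MLE prediction referred to above can be made concrete with a short sketch. It uses the standard optimal-integration result for two independent cues, in which the predicted bimodal variance is the product of the unimodal variances divided by their sum; the function names and the example variances are hypothetical and are not taken from the study.

```python
def mle_predicted_variance(var_a, var_b):
    """Predicted variance of the combined estimate for two independent cues."""
    return (var_a * var_b) / (var_a + var_b)


def mle_weights(var_a, var_b):
    """Optimal weights: each cue is weighted by its relative reliability (1/variance)."""
    w_a = var_b / (var_a + var_b)
    return w_a, 1.0 - w_a


# Hypothetical unimodal variances (arbitrary units), for illustration only.
equal_case = mle_predicted_variance(4.0, 4.0)     # equally reliable cues
unequal_case = mle_predicted_variance(1.0, 9.0)   # one cue far more reliable
```

With equal unimodal variances the predicted variance is halved (4.0 becomes 2.0), whereas with very unequal variances the prediction (0.9) is barely below the better cue alone (1.0). This is why a measurable benefit is expected mainly when the cues have similar reliability.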

In Experiment 1, we examine whether a non-visual multisensory representation of space can improve the navigation performance of a group of sighted blindfolded individuals when using an auditory or tactile SSD (i.e., The vOICe or the BrainPort) or the two together (The vOICe and BrainPort) in an egocentric and allocentric aerial map task. Aerial maps are the most common representations provided to people for building layouts and cities, and blind persons have been shown to benefit from a tactile aerial representation when navigating an unfamiliar environment (Espinosa et al., 1998), probably because it removes the lack of depth perception as a barrier for VI individuals. Furthermore, a survey representation, which encodes external and unfamiliar information about the environment (as in an aerial or map-like view), is more severely affected by lack of vision than a route-based (serial) representation (Tinti et al., 2006). Hence, we used an aerial map task to assess the efficiency of different SSDs alone or in combination. We also chose this task based on recent evidence that using audiotactile maps to build cognitive spatial representations is more efficient than using only a tactile map or walking in an unfamiliar environment (Papadopoulos et al., 2018). We hypothesized an improved performance (reduced variance) on a distance estimation-based navigation task when participants explored aerial maps using The vOICe and BrainPort together than when using either of these devices in isolation. We also hypothesized an increase in accuracy with the number of trials for all the conditions.

In Experiment 2, we examine whether a non-visual multisensory representation of space can improve the navigation performance of a group of sighted and a group of VI blindfolded individuals when using self-motion or The vOICe or the two together in an egocentric and allocentric spatial navigation task. We hypothesized an improved performance (reduced variance) on the navigation task when using The vOICe and self-motion together than when using either The vOICe or self-motion in isolation, especially for the VI group. We also hypothesized an increase in accuracy with the number of trials for all the conditions, especially for the VI group.

Experiment 1

Method

Participants

Thirty students (15 males and 15 females), aged 18–22 (M = 20.38, SD = 0.924), from the University of Bath, UK, participated in the experiment. Due to technical problems, some of the trials for three participants were not saved correctly and thus we had to exclude these participants. Hence, the data for twenty-seven participants were included in the analysis. Twenty-five were self-reportedly right-handed. All participants had normal vision and audition and were naïve to The vOICe, BrainPort, and the laboratory where the experiment took place. Participants were reimbursed £5 for their time. All participants provided informed consent and were debriefed. The experiment was approved by the University of Bath Psychology Department Ethics Committee (Ethics Code 16:180).

Apparatus

The experiment took place in an 11 m × 7 m laboratory. Two configurations of four target points (each 50 cm × 50 cm) were marked on the floor of the laboratory (see Supplementary Figures S1, S2), one for training and one for the experimental procedure. These configurations were based on studies by Garcia et al. (2015) and Petrini et al. (2016).

The laboratory was equipped to record motion tracking data, using a Vicon Bonita system consisting of eight infrared cameras (see Figure 1B), which tracked five reflectors on the motion tracking helmet, to which a blindfold was attached (see Figure 1C). The Vicon system was controlled through a Python 3.0 script using Vizard libraries. A remote for controlling the script was used to control tracking for each navigation trial (see Figure 1D).

Figure 1

The BrainPort device consists of three parts: camera glasses, the processor unit, and the Intra-Oral Device (IOD). A laptop connected the BrainPort’s software (vRemote) to the live feed from the camera glasses to display the settings and allow correct positioning of the stimuli. Auditory stimuli were played from the same laptop via Philips stereo headphones. The headphones were open, in the sense that participants could still hear sounds in the room to some extent, as well as their own footsteps. This was done to replicate as closely as possible a real environment, which contains noises (information normally used by blind and VI individuals). These noises were kept constant throughout the conditions of the study so as not to add a confounding variable. Previous literature suggests that a head-mounted camera performs better than a hand-held camera when using The vOICe for navigation purposes (Brown et al., 2011). We therefore designed a helmet to which a blindfold (Mindfold Eye Mask) and the reflectors used for motion tracking were attached. A USB camera (ELP 480P webcam with 120° view) was mounted on the middle of the blindfold (see Figure 1C). The USB webcam was connected to a mini-PC (1.3 GHz Intel Atom processor, 1 GB RAM) running Windows XP and The vOICe (Meijer, 1992). Participants used Philips SHS 5200 neckband headphones to listen to the soundscapes.

We used the default settings of The vOICe algorithm aside from changing the zoom to 2×. This enabled participants to observe the objects separately, group them two by two or explore them all at the same time. The experiment took place in the Virtual Reality (VR) Lab (11 m × 7 m). The three-dimensional (3D) objects developed for the study were a cylinder, a cube, and a four-faced pyramid of the same height (60 cm; see Figure 1A). We used different shapes intentionally as we wanted the soundscapes returned by The vOICe to be different so as to replicate more closely real environments where various objects are available. However, the three objects had similar dimensions as they had the same width and length.

Materials

Experimental Stimulus Design

Aerial perspectives of the training and experimental point configurations marked on the laboratory floor were digitally recreated to scale using AutoCad (Version 21.0, AutoDesk, Inc., Mill Valley, California, United States). These were the “aerial maps,” with each target and the start point being indicated by a white square on a black background (see Supplementary Figure S2). All stimuli were transformed into soundscapes using The vOICe’s image sonification algorithm (Meijer, 1992) at the following settings: 2-s scan rate, normal contrast, and foveal view off. A5-sized prints of all stimuli were placed in front of the BrainPort camera and were explored via the IOD at the following settings: zoom 33°, invert off, contrast high, lighting low, tilt 25°, and lock off. These settings ensured that the visual information transformed by the two devices was congruent, so that multisensory integration was not prevented (Schinazi et al., 2016).

Training Stimulus Design

The training stimuli consisted of a set of four lines and five sets of circles (all white on a black background), which occupied approximately the same visual area (see Supplementary Figure S3). The stimuli were produced in the same fashion and using the same settings as the experimental stimuli.

Conditions

The experimental procedure comprised two unimodal conditions, The vOICe only (vOICe) and BrainPort only (TDU), and one bimodal condition, The vOICe plus BrainPort (vOICeTDU). In each condition, the same aerial map was delivered, and 10 wayfinding task-pairs were completed. Thus, in total, every participant completed 60 wayfinding tasks, based on the same target configuration. The order of wayfinding tasks was counterbalanced among trials and conditions. This was done to minimize a potential confound of participants learning the configuration of target points over subsequent conditions.

Navigation Tasks

Each wayfinding task-pair comprised an egocentric task and an allocentric task. In the egocentric task, participants navigated directly to target 3 from the start point (Figure 2). In the allocentric task, participants navigated from the start point to target 1 and then to target 3 (Figure 2). Before each task, the experimenter oriented participants toward the first target they were to navigate to.

Figure 2

Participants’ motion during the wayfinding tasks was tracked, starting once they were ready to begin each task and ending once they announced that they had reached the target location. They were then returned to the start point via an indirect route to discourage them from estimating the distance between their final position and the start point from the route along which the experimenter led them rather than from the SSD(s).

Procedure

The study consisted of three phases: basic training, active training, and the experimental procedure. Prior to the study, the experimenter collected demographic information from the participants (age, handedness, and gender). They were then blindfolded to prevent viewing the interior of the laboratory.

Basic Training

Upon beginning the study, participants were trained to use the two SSDs. This procedure utilized the training stimuli. The device that participants were trained with first was counterbalanced in an ABAB fashion. First, the experimenter would explain the mechanisms of action of both SSDs. Then, for each training stimulus, the experimenter either played the auditory file for The vOICe or placed the relevant printed stimulus in front of the BrainPort camera, for 10 s. Participants were asked to use the relevant device to identify and count the lines or circles that were presented to them. The question was left open-ended, so the likelihood of participants correctly identifying the stimulus by chance was negligible. If the stimulus was identified, training would progress to the next stimulus. If not, feedback was provided, and the mechanism of action of each SSD was explained again. This process continued until participants were able to identify and count all training stimuli.

Active Training

The purpose of the active training was to give participants a sense of scale, that is, of how the distances between the target points they experienced using the SSDs equated to physical distances. The active training mirrored the three experimental conditions in terms of the exploration methods used (vOICe, TDU, or vOICeTDU) and was counterbalanced in the same way as the experimental procedure.

This procedure utilized the aerial map of the training target configuration (Supplementary Figure S2), which was delivered via the SSDs. Participants were instructed to explore the training aerial map via the SSD(s) for as long as required to identify and localize all points. Before each practice trial, participants were told that they would be taken to the starting point and oriented in the direction in which they would need to move initially (depending on whether they were doing an egocentric or an allocentric task). They were then told to walk as far as they needed and turn as much as they needed to reach the target point. During this practice phase, participants received feedback, that is, if they made a mistake in estimating distance or angle then the experimenter would correct them and tell them whether they had over/underestimated. This was done at each target location and for both distance and rotation, and thus, for the allocentric task, participants received feedback after the first (Object 1, see Figure 2 right panel) and second target (Object 3, see Figure 2 right panel), while for the egocentric task feedback was received for the only target used for the task (Object 3, see Figure 2 left panel). They would then complete two trials of the navigation task, one allocentric and one egocentric with the order counterbalanced. At the end of active training, participants were led outside the laboratory for a 5-min break.

Experimental Procedure

The experimental conditions were identical except for the SSD(s) the participants used to explore the aerial map. This procedure utilized the experimental target configuration and respective aerial map (Figure 2 and Supplementary Figure S2). Upon beginning a condition, participants used the device(s) specified by the condition to explore the aerial map for 10 s (this was an arbitrary time limit enforced to standardize stimulus exposure). That is, participants used the different devices (depending on the condition at hand) to scan the room before attempting the navigation task, while during the navigation task only self-motion was used. When using both devices together, alignment between the two signals was controlled by the participant, who activated the BrainPort as soon as The vOICe information started, so that the two devices started to deliver information at approximately the same time. The decision to let the participants control the start of the BrainPort was taken to better approximate a real condition in which the user would have control over which device to use and when. They would then complete two trials of the navigation task, one allocentric and one egocentric with the order counterbalanced, using self-motion. Upon completing both trials, participants were led back to the SSD apparatus, used the device(s) for the given condition for another 10 s, and then completed another pair of navigation trials. This process was repeated until 10 pairs of navigation trials were completed. Once a condition was completed, participants were led outside the laboratory and had another break. The process was then repeated for the remaining two conditions. Once participants had completed the navigation tasks, they were taken outside the laboratory and debriefed, gave final consent, and were paid, thus concluding the experiment.

Results

Individual Estimates

The tracked coordinates obtained through the Vicon system were processed using MATLAB (Version R2018b, The MathWorks, Inc.) and the Psychtoolbox command library (Brainard, 1997; Pelli, 1997). For each participant’s end positions (the positions at which the participant decided he/she had arrived at the target object’s position), a bivariate normal distribution was fitted (Figure 3), which enabled the estimation of x mean, y mean, x variance, and y variance. The FASTMCD algorithm (Rousseeuw and Driessen, 1999), as implemented by the MATLAB Libra toolbox (Verboven and Hubert, 2005), was used for a robust estimation of these values, with the assumption of 1% aberrant (outlier) values (i.e., a value of 0.99 for the alpha parameter). For each participant, a single variable error was computed as the sum of the variances in the x and y directions of the fitted bivariate distribution (black ellipses in Figure 3). Secondly, a measure of constant error was calculated as the distance between the center of the fitted bivariate distribution (center of the black ellipses in Figure 3) and the correct position for the target object (Object 3). Variable error is expected to decrease when participants are able to combine multiple modalities, in line with the MLE model (Ernst and Banks, 2002; Alais and Carlile, 2005; Cheng et al., 2007; Van der Burg et al., 2015; Noel et al., 2016). Constant error, on the other hand, represents a systematic navigational bias, that is, one that recurs consistently over multiple trials. Constant error is expected to decrease when less biased information is available.
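The two error measures defined above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses the plain sample mean and sample variance from Python's standard library in place of the robust FASTMCD estimation performed in MATLAB, and the function name `navigation_errors` and the example coordinates are hypothetical.

```python
import math
from statistics import mean, variance


def navigation_errors(end_points, target):
    """Return (variable_error, constant_error) for one participant and condition.

    `end_points` is a list of (x, y) stopping positions and `target` is the
    correct (x, y) position. Plain sample statistics are used here as a
    stand-in for the robust bivariate fit described in the text.
    """
    xs = [p[0] for p in end_points]
    ys = [p[1] for p in end_points]
    # Variable error: sum of the variances along x and y (spread of end points).
    variable_error = variance(xs) + variance(ys)
    # Constant error: distance from the center of the end points to the target.
    constant_error = math.hypot(mean(xs) - target[0], mean(ys) - target[1])
    return variable_error, constant_error


# Made-up end positions in meters, for illustration only.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
var_err, const_err = navigation_errors(points, target=(1.0, 4.0))
```

Here the end points are centered on (1, 1), so the constant error is the 3 m distance to the target at (1, 4), while the variable error reflects only the scatter around that center. A robust estimator would additionally down-weight outlying end points, which this sketch does not do.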

Figure 3

Group Analysis

The variable error estimates (obtained as the size of the individual ellipse for each condition, see Figure 3) and the constant error estimates (obtained as the distance of the center of each individual ellipse from the correct target position, point 0,0 in Figure 3) were tested to determine whether they were normally distributed. As the majority of conditions did not meet the assumption of normal distribution (Shapiro-Wilk, p < 0.05), we used Wilcoxon tests to examine differences between conditions (e.g., vOICeTDU vs. vOICe) within each group, and Mann-Whitney U tests to compare the two groups’ performances in each condition. We then used Pearson’s correlation analyses (as the assumption of linearity was met) to determine whether the number of trials (from 1 to 10) was associated with changes in constant error (i.e., accuracy), in other words, whether error decreased (or accuracy increased) as the number of trials increased. For directional hypotheses, the reported results are one-tailed.

Figure 4 (left panels) shows the results for the variable error in the allocentric (top panels) and egocentric (bottom panels) tasks. Wilcoxon tests were used to compare the variable error between the bimodal (vOICeTDU) and the unimodal conditions (vOICe and TDU) and between the measured bimodal (vOICeTDU) and the predicted bimodal (MLE) conditions, separately for the allocentric and egocentric tasks. The analysis showed no significant difference between vOICeTDU and the unimodal (vOICe and TDU) conditions in either task, Z ≤ −0.953, p ≥ 0.170, one-tailed. There was, however, a significant difference between vOICeTDU and MLE in both tasks (Z ≥ −2.463, p ≤ 0.014), indicating that the level of variability for the bimodal condition was not accurately predicted by the MLE model.
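For orientation, the predicted bimodal (MLE) value can be sketched with the standard optimal-integration formula (Ernst and Banks, 2002), in which the combined variance is the product of the two unimodal variances divided by their sum. This section does not spell out how the prediction was computed from the summed x and y variances, so applying the formula directly to the variable error is an assumption, and the function names and input values below are hypothetical.

```python
def mle_predicted_variance(var_a, var_b):
    """Predicted bimodal variance under optimal (MLE) integration of two cues.

    Assumes independent Gaussian noise in the two unimodal estimates.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    return (var_a * var_b) / (var_a + var_b)


def mle_weights(var_a, var_b):
    """Relative cue weights, each inversely proportional to its own variance."""
    weight_a = var_b / (var_a + var_b)
    return weight_a, 1.0 - weight_a


# Hypothetical unimodal variable errors for The vOICe and the TDU (BrainPort).
voice_variance, tdu_variance = 0.04, 0.06
predicted = mle_predicted_variance(voice_variance, tdu_variance)
weight_voice, weight_tdu = mle_weights(voice_variance, tdu_variance)
```

The predicted variance is always below the smaller of the two unimodal variances, so a measured bimodal variance that is no lower than the best unimodal condition indicates that the cues were not combined optimally.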

Figure 4

A similar analysis was performed on the constant error measures (Figure 4 middle panels). It showed no significant difference between vOICeTDU and TDU in either the egocentric or the allocentric task (Z ≤ −1.410, p ≥ 0.079, one-tailed). There was a significant difference between vOICeTDU and vOICe in the allocentric task (Z = −2.440, p = 0.007, one-tailed), indicating higher accuracy and less bias with The vOICe alone, but only a trend in the egocentric task (Z = −1.600, p = 0.055, one-tailed).

Finally, we examined whether sighted participants showed any learning effect across the 10 trials within each sensory condition (vOICe, TDU, and vOICeTDU) for the allocentric and egocentric tasks separately. Thus, Pearson correlations (given the data linearity) were used to analyze whether the average constant error decreased with an increase in the number of trials, i.e., whether participants’ accuracy increased with practice. For the allocentric task, as shown in Figure 4 top right panel, a significant association between decrease in error and increase in trial number was found for the TDU condition (r = −0.863, p < 0.001, one-tailed) but not for the vOICeTDU condition (r = −0.182, p = 0.308, one-tailed) or The vOICe condition (r = 0.424, p = 0.111, one-tailed). In addition, vOICeTDU accuracy as a function of trials did not correlate with performance in either The vOICe or the TDU condition alone (r ≤ 0.039, p ≥ 0.458, one-tailed). For the egocentric task, as shown in Figure 4 bottom right panel, a significant association between decrease in error and increase in trial number was found for the TDU condition (r = −0.795, p = 0.003, one-tailed) and for The vOICe condition (r = −0.881, p < 0.001, one-tailed), but not for the vOICeTDU condition (r = −0.499, p = 0.071, one-tailed), although the combined condition did show a trend in this direction. Finally, vOICeTDU accuracy as a function of trials correlated significantly with performance in both The vOICe and the TDU condition alone (r ≥ 0.594, p ≤ 0.017, one-tailed). This suggests that, in the egocentric task, the changes in accuracy in the bimodal condition (vOICeTDU) were driven by changes in accuracy in both The vOICe and the TDU conditions alone.
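The learning analysis can be sketched as a Pearson correlation between trial number and the group-average constant error. This is a hedged illustration rather than the authors' code: the helper names and the error values are hypothetical, and the one-tailed p-value would be obtained by looking up the t statistic in a t distribution with n − 2 degrees of freedom, a step left out here to keep the sketch within the standard library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with n >= 3")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical average constant error (metres) on trials 1 to 10.
trials = list(range(1, 11))
mean_constant_error = [0.92, 0.85, 0.88, 0.74, 0.70, 0.73, 0.61, 0.58, 0.60, 0.49]
r = pearson_r(trials, mean_constant_error)
t = t_statistic(r, len(trials))
```

A negative r indicates that error fell as trials progressed; because the hypothesis is directional (error should decrease with practice), only the lower tail of the t distribution is relevant.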