Effects of Sensory Feedback and Collider Size on Reach-to-Grasp Coordination in Haptic-Free Virtual Reality

Technological advancements and increased access have prompted the adoption of head-mounted display based virtual reality (VR) for neuroscientific research, manual skill training, and neurological rehabilitation. Applications that focus on manual interaction within the virtual environment (VE), especially haptic-free VR, critically depend on virtual hand-object collision detection. Knowledge about how multisensory integration related to hand-object collisions affects perception-action dynamics and reach-to-grasp coordination is needed to enhance the immersiveness of interactive VR. Here, we explored whether and to what extent sensory substitution for haptic feedback of hand-object collision (visual, audio, or audiovisual) and collider size (size of spherical pointers representing the fingertips) influence reach-to-grasp kinematics. In Study 1, visual, auditory, and combined feedback were compared as sensory substitutes to indicate the successful grasp of a virtual object during reach-to-grasp actions. In Study 2, participants reached to grasp virtual objects using spherical colliders of different diameters to test if virtual collider size impacts reach-to-grasp. Our data indicate that collider size but not sensory feedback modality significantly affected the kinematics of grasping. Larger colliders led to a smaller size-normalized peak aperture. We discuss this finding in the context of a possible influence of spherical collider size on the perception of the virtual object’s size and hence effects on motor planning of reach-to-grasp. Critically, reach-to-grasp spatiotemporal coordination patterns were robust to manipulations of sensory feedback modality and spherical collider size, suggesting that the nervous system adjusted the reach (transport) component commensurately to the changes in the grasp (aperture) component. These results have important implications for research, commercial, industrial, and clinical applications of VR.


INTRODUCTION
Natural hand-object interactions are critical for a fully immersive virtual reality (VR) experience. In the real world, reach-to-grasp coordination is facilitated by congruent visual and proprioceptive feedback of limb position and orientation and haptic feedback of object properties (Coats et al., 2008;Bingham and Mon-Williams, 2013;Bozzacchi et al., 2014;Whitwell et al., 2015;Bozzacchi et al., 2016;Hosang et al., 2016;Volcic and Domini, 2016;Bozzacchi et al., 2018). In virtual environments (VE), visual feedback of the avatar hand may be incongruent with proprioceptive feedback from the biological hand. This discrepancy can arise from technological limitations (e.g., latency, rendering speed, and tracking accuracy) related to how the scene is calibrated (Stanney, 2002) or how the VR task is manipulated (Groen and Werkhoven, 1998). Moreover, the virtual representation of the limb may be distorted in appearance (Argelaguet et al., 2016;Liu et al., 2019) in a similar manner to the use of a cursor to represent hand position in traditional computer displays. For example, the index finger and thumb are often visualized as simple spherical colliders to allow pincer grasping of objects in VE (Furmanek et al., 2019;van Polanen et al., 2019;Mangalam et al., 2021). The colliders' size is often arbitrarily chosen by researchers but can have profound effects on behavior, especially for dexterous and accuracy-demanding tasks. Finally, when VR is not combined with haptic devices, haptic information about whether and how a given object has been grasped is absent, creating additional uncertainty.
The lack of haptic feedback about object properties may be supplemented with terminal visual feedback (sensory substitution) in the form of the object changing its color, or as auditory feedback in the form of a sound, to signal that the virtual object has been contacted or grasped and to minimize hand-object interpenetration (Zahariev and MacKenzie, 2003;Zahariev and MacKenzie, 2007;Castiello et al., 2010;Sedda et al., 2011;Canales and Jörg, 2020).
One of the most common and well-studied forms of hand-object interactions is reaching and grasping an object. Reach-to-grasp movements involve a reach component describing the transport of the hand toward the object and a grasp component describing the preshaping of the fingers to the object. Traditionally, the end of a "reach-to-grasp" movement is defined by contact with the object. The reach component is quantified through analysis of hand transport kinematics (e.g., trajectory and velocity of the wrist motion), and the grasp component is quantified through analysis of aperture kinematics (e.g., interdigit distance in time) (Jeannerod, 1981;Jeannerod, 1984). Planning and execution of successful reach-to-grasp movements require both spatial and temporal coordination between the reach and grasp components (Rand et al., 2008;Furmanek et al., 2019;Mangalam et al., 2021). Whether the transport and aperture components represent information flow in independent neural channels remains an open and interesting question (Culham et al., 2006;Vesia and Crawford, 2012;Schettino et al., 2017); however, several kinematic features of coordination between the two components have been well described (Haggard and Wing, 1991;Paulignan et al., 1991a;Paulignan et al., 1991b;Gentilucci et al., 1992;Haggard and Wing, 1995;Dubrowski et al., 2002). For instance, peak transport velocity tends to occur at 30% of the total time to complete the movement (Jeannerod, 1984), and peak aperture (maximal hand opening) occurs at 60-70% of total movement time (Castiello, 2005). Furthermore, there is substantial evidence that the grasp and reach are strongly coordinated in the spatial domain (Haggard and Wing, 1995;Rand et al., 2008). Namely, the distance of the hand from the object when hand opening ceases and hand closing begins (closure distance, usually the point of peak aperture) can be accurately predicted from state estimates of transport velocity, transport acceleration, and aperture.
There is growing interest in contrasting performance of dexterous actions, such as reach-to-grasp, when executed in the physical environment (PE) and VE. In our previous work, we showed that temporal features of reach-to-grasp coordination and the control law governing closure (Mangalam et al., 2021) were preserved in a VE that utilized a reductionist spherical collider representation of the index and thumb and audiovisual feedback-based sensory substitution. However, we noted that movement speed and maximum grip aperture differed between the real environment and VE (Furmanek et al., 2019). These studies utilized only a single set of parameters for the presentation of feedback in the VE, and therefore, the influence of different parameters for representation of the virtual fingers and substitution of haptic feedback is unknown. The goal of this investigation was to test the extent to which the selection of feedback parameters influences behavior in the VE. In two studies, we systematically varied parameters related to the sensory modality of haptic sensory substitution (Study 1) and the size of the spherical colliders representing the index-tip and thumb-tip (Study 2) to better understand the influence of these parameters on features of reach-to-grasp performance in VR. In both studies, participants reached to grasp virtual objects at a natural pace in an immersive VE presented via a head-mounted display (HMD).
Study 1 was designed to test whether visual, auditory, or audiovisual sensory substitution for haptic feedback of the object properties significantly affects reach-to-grasp kinematics. Participants grasped virtual objects of different sizes placed at different distances, where a change in the color of the object (visual), a tone (auditory), or both (audiovisual) provided the terminal feedback that the grasp was completed successfully. A previous study using spherical colliders to reach to grasp virtual objects reported that audio and audiovisual terminal feedback of the object being grasped resulted in shorter movement times than visual or absent terminal feedback, though there was no effect of terminal feedback on peak aperture (Zahariev and MacKenzie, 2007). While this study had a similar design to our Study 1, it was conducted using stereoscopic glasses to obtain a 3D view of images presented on a 2D display, and the results may not transfer to an HMD-based presentation of VR that presents a more immersive experience and is more commonly used today. Furthermore, no analysis of temporal or spatial reach-to-grasp kinematics was provided, limiting interpretations about the effects of terminal feedback on reach-to-grasp coordination. A more recent study using a robotic-looking virtual hand avatar to reach to grasp and transport virtual objects in an HMD immersive VR setup found that movement time was shorter for visual, compared to auditory or absent, terminal feedback (Canales and Jörg, 2020). Interestingly, participants subjectively preferred audio terminal feedback to other sensory modalities despite the fact that audio feedback produced the slowest movements. The Canales and Jörg study did not measure the kinematics of the movement, and therefore interpretation about movement coordination is limited.
Based on these studies and our previous work (Furmanek et al., 2019), we expected that the modality of terminal feedback used to signal successful grasp would affect reach-to-grasp kinematics due to uncertainty of contact with an object. Specifically, we hypothesized that, with multimodal (audiovisual) feedback, participants would show (H1.1) greater scaling of aperture to object width and (H1.2) faster completion of the reach-to-grasp task, but (H1.3) the spatiotemporal coordination between the reach and the grasp components of the movement would remain preserved across terminal feedback conditions.
To date, no study has systematically examined the impact of the size of the virtual effector on reach-to-grasp kinematics. Study 2 was designed to fill this gap in the literature. Participants used spherical colliders of different diameters to reach to grasp virtual objects of different sizes placed at different distances. Ogawa and coworkers (Ogawa et al., 2018) reported that the size of a virtual avatar hand affects participants' perception of object size in an HMD-based VE, but they did not study reach-to-grasp movements or analyze movement kinematics. Extrapolating from their results, we hypothesized that the size of the spherical collider would affect maximum grip aperture, with smaller colliders predicted to result in larger maximum grip aperture (H2). We specifically used a reduced version of the avatar hand (just two dots representing the thumb and index fingertips) to reduce the number of factors that can potentially affect reach-to-grasp kinematics, such as differences in the shape, color, and texture of a more biological looking hand avatar (Lok et al., 2003;Ogawa et al., 2018). Moreover, the spherical colliders allowed for more precise localization of the fingertips in VE than is typical of anthropomorphic hand avatars (Vosinakis and Koutsabasis, 2018) and eliminated the influence of visuoproprioceptive discrepancies caused by potential tracking or joint angle calibration errors inherent in sensor gloves. Similar reductionist effectors have been successfully used in multiple previous studies for similar reasons (Zahariev and MacKenzie, 2007;Zahariev and Mackenzie, 2008;Furmanek et al., 2019;Mangalam et al., 2021). Furthermore, a recent study where only the target and the richness of hand anthropomorphism (e.g., 2-point, point-dot hand, and full hand) were visible to participants reported that kinematic performance was best when either the minimal (2-point) or enriched hand-like model (skeleton, full) was provided (Sivakumar et al., 2021). 
Therefore, in the present study, we used simple spheres representing the fingertips to systematically test the effect of collider size on reach-to-grasp behavior. Study 1 and Study 2 were designed to increase knowledge about how choices for haptic sensory substitution and collider size may affect reach-to-grasp performance in HMD-based VR. This work has the potential to directly impact the design of VR platforms used for commercial, industrial, research, and rehabilitation applications.

Participants
Ten adults [seven men and three women; M ± SD, age 21.1 ± 5.88 years; all right-handed (Oldfield et al., 1971)] with no reported muscular, orthopedic, or neurological health concerns voluntarily participated in both studies after providing informed consent approved by the Institutional Review Board (IRB) at Northeastern University. The participant pool was a convenience sample of undergraduate and graduate students. Some participants had previously participated in reach-to-grasp studies in our hf-VE; however, none of the participants reported extensive experience in VR (e.g., gaming and simulations).

Reach-to-Grasp Task, Virtual Environment, and Kinematic Measurement
Each participant reached to grasp 3D-printed physical objects in the PE and their exact virtual renderings in the haptic-free virtual environment (hf-VE) of three different sizes, small (width × height × depth 3.6 × 8 × 2.5 cm), medium (5.4 × 8 × 2.5 cm), and large (7.2 × 8 × 2.5 cm), placed at three different distances, near (24 cm), middle (30 cm), and far (36 cm) from the initial position of the fingertips. Objects were rotated along their vertical axis to 75° measured from the horizontal axis to avoid excessive wrist extension. The physical objects were 3D printed using PLA thermoplastic (mass: small: 30 g; medium: 44 g; large: 59 g) and covered with glow-in-the-dark paint.
A commercial HTC Vive Pro, comprising an HMD and an infrared laser emitter unit, was used. The virtual scene was created and rendered in Unity (ver. 5.6, 64-bit, Unity Technologies, San Francisco, CA) with C# as the programming language, running on a computer with Windows 7 Ultimate, 64-bit operating system, an Intel(R) Xeon(R) CPU E5-1630 v3 3.7 GHz, 32 GB RAM, and an NVIDIA Quadro M6000 graphics card. Given the power of the PC and simplicity of the VE, scenes were rendered in less than one frame time (see below). The interpupillary distance in the HMD was individually adjusted to each participant. Objects were displayed in stereovision giving the perception that they were 3D. Participants were asked to confirm that they perceived the object as 3D and that they could distinguish the object's edges, though we did not formally test for stereopsis. Motion tracking of the head was achieved by streaming data from an IMU and laser-based photodiodes embedded in the headset. A detailed description of the HTC Vive's head tracking system is published elsewhere (Niehorster et al., 2017). Position and orientation data provided by the Vive were acquired through Unity at ∼90 Hz, the frame rate of the HTC Vive. Prior work has reported that, for large head movements, the average error between the laser-measured position and the position reported by the Vive is less than 1 cm (Luckett, 2018). In our experiment, each participant's head remained relatively stable (the task did not involve extensive head motion); therefore, head tracking inconsistencies were negligible, and none of the participants reported any shifts or jumps in the visual display. An eight-camera motion tracking system (120 Hz, PPT Studio NTM, WorldViz Inc., Santa Barbara, CA) captured the 3D motion of IRED markers attached to the participants' wrist and fingertips.
Frontiers in Virtual Reality | www.frontiersin.org August 2021 | Volume 2 | Article 648529
The placement procedure of the IRED markers on the fingertip was as follows: an identical 3D-printed physical object was grasped at the top of its height, and markers were attached to the tops of fingertips in a way that minimized the distance between the object and marker. The centroid of the virtual sphere corresponded to the detected position of the IRED. Note that although data were collected at 120 Hz in the PPT system, acquisition of samples in Unity was limited to ∼90 Hz, the frame rate of the HTC Vive. Prior to each data collection, the 3D motion capture system was calibrated. This entailed using a standard frame to reset the origin and axes of the 3D space in PPT to match the Unity origin. According to the manufacturer and confirmed by our team when analyzing the residuals during the calibration procedure, the error of the PPT system was less than 1 mm. End-to-end latency, indicating the time between the physical movement of the motion sensor (from PPT) and movement rendered in the virtual scene, was 22 ms (upper bound on the true system latency). This latency was not associated with motion sickness (Stanney, 2002;Barrett, 2004) in a previous publication using a nearly identical system (Niehorster et al., 2017). No participants in our study anecdotally reported symptoms of motion sickness; however, no formal assessment of subjective symptoms of motion sickness was completed. The schedule of trials, virtual renderings of the target object, and timing/triggering of the perturbation were controlled using custom software developed in C#. We recently published two reports showing that spatiotemporal coordination of reach-to-grasp movements is similar in the above-described hf-VE compared to that of the real world (Furmanek et al., 2019;Mangalam et al., 2021).

Procedure and Instructions to Participants
Each participant was seated on a chair with the right arm and hand placed on a table in front of them. At the start position, the thumb and index finger straddled a 1.5 cm wide plastic peg located 12 cm in front and 24 cm to the right of the sternum, with the thumb depressing a switch. Lifting the thumb off the switch marked movement onset. Upon an auditory tone ("beep" signal), the participant reached to grasp the virtual object presented in the HMD, lifted it, held it until it disappeared (3.5 s from movement onset, i.e., the moment the switch was released), and returned their hand to the starting position. Each auditory tone was time jittered within 0.5 s standard deviation from 1 s after trial start (i.e., after the start switch was activated) to avoid participants' adaptation. A custom collision detection algorithm was used to determine when the virtual object was grasped. Each finger was represented by a sphere. When any point on the sphere made contact with any point on the object, it was considered "attached." Once both fingers were "attached" to the object, the object was considered "grasped," and translational movement from the fingers would also move the object. A 1.2 cm error margin, imposed on the distance between the spheres, was used to maintain grasp. If the distance between the spheres increased by more than 1.2 cm from its value at the time the object was "grasped" (e.g., if the fingers opened), the object was no longer considered grasped, the color changed to white, and it would drop to the table. Conversely, if the distance between the spheres decreased by more than 1.2 cm from its value at the time the object was "grasped," the object was considered "overgrasped." An "overgrasped" object would turn white and would remain frozen. 
If neither error occurred, the object was considered to be grasped successfully, and its color changed to red (visual feedback condition) or a tone sounded (audio condition); see below for details about terminal feedback conditions. The 1.2 cm error margin was chosen after extensive piloting of the experiment. In the future, we plan to systematically test the effect of the error margin on reach-to-grasp behavior.
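The grasp rules described above can be sketched as a small state tracker. This is a minimal illustration and not the custom C# algorithm used in the experiment: the class name `GraspTracker`, the axis-aligned box standing in for the object (expressed in the object's own frame), and the assumption that a sphere stays "attached" once it has touched the object are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

GRASP_MARGIN_CM = 1.2  # margin on inter-sphere distance, as reported in the text


def sphere_touches_box(center, radius, box_min, box_max):
    """Closest-point test: does a sphere overlap an axis-aligned box?"""
    closest = tuple(min(max(c, lo), hi) for c, lo, hi in zip(center, box_min, box_max))
    return math.dist(tuple(center), closest) <= radius


@dataclass
class GraspTracker:
    """Hypothetical helper that tracks grasp state for two fingertip spheres."""
    radius: float                 # collider radius in cm (half the diameter)
    box_min: Sequence[float]      # object bounds in its own frame (assumption)
    box_max: Sequence[float]
    thumb_attached: bool = False
    index_attached: bool = False
    grasp_distance: Optional[float] = None  # inter-sphere distance at grasp

    def update(self, thumb, index):
        """Return 'reaching', 'grasped', 'released', or 'overgrasped'."""
        distance = math.dist(tuple(thumb), tuple(index))
        if self.grasp_distance is None:
            # Assumption: attachment is sticky once contact has occurred.
            self.thumb_attached = self.thumb_attached or sphere_touches_box(
                thumb, self.radius, self.box_min, self.box_max)
            self.index_attached = self.index_attached or sphere_touches_box(
                index, self.radius, self.box_min, self.box_max)
            if self.thumb_attached and self.index_attached:
                self.grasp_distance = distance
                return "grasped"
            return "reaching"
        change = distance - self.grasp_distance
        if change > GRASP_MARGIN_CM:
            return "released"      # fingers opened by more than the margin
        if change < -GRASP_MARGIN_CM:
            return "overgrasped"   # fingers closed by more than the margin
        return "grasped"
```

Calling `update` once per rendered frame with the two sphere centers yields the state that would drive the color change, the tone, or the drop of the object.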
Before data collection, each participant was familiarized with the setup and procedure. Familiarization consisted of 30 trials of grasping virtual and physical objects (five trials × three objects, placed at the middle distance) first in PE and then in hf-VE. The participant was instructed to reach and grasp an object at a comfortable speed in the middle along its vertical dimension. Following familiarization, the participant began experimental trials. Further details are provided in the subsequent sections.
To wash out any effect of sensory feedback (Study 1) or collider size (Study 2) on reach-to-grasp coordination, each participant performed a block of reach-to-grasp movements in PE prior to each hf-VE block. The rendering in the virtual scene showed two spheres, representing the thumb and index fingertips, which were visible to the participant. To make the PE condition comparable with regard to what a participant saw, the room was darkened so that the participants could see only the glow-in-the-dark object and the illuminated IRED markers on their fingertips. Overhead lights were turned on and off (after every five trials) to prevent adaptation to the dark. PE trials were used strictly for washout, and although data were recorded during these trials, they were neither analyzed nor presented in this manuscript.

Study 1: Manipulations of Sensory Feedback
Each participant was tested in a single session consisting of 270 trials evenly spread across six blocks of 45 trials, alternating between PE and hf-VE with the first block performed in PE. The participant was given a 2 min break between consecutive blocks. In the three blocks for hf-VE, visual (V), auditory (A), and both visual and auditory [audiovisual (AV)] feedback were provided to indicate that the virtual object had been grasped. In the vision condition, the object turned from blue to red. In the auditory condition, the sound of a click (875 Hz, 50 ms duration) was presented. In the audiovisual condition, the object turned from blue to red in addition to the sound of a click (Figure 1, top) and remained red until the object disappeared or was released/overgrasped. The collider size remained constant (diameter 0.8 cm) in each feedback condition. The order of feedback conditions was pseudorandomized across participants. Each condition was collected in a single block that contained 45 trials (three object sizes, three object distances, and five trials per size-distance pair). Objects in each block were presented in the same order [small-near (five trials), small-middle (five trials), and small-far (five trials); medium-near (five trials), medium-middle (five trials), and medium-far (five trials); large-near (five trials), large-middle (five trials), and large-far (five trials)]. Each block of virtual grasping was preceded by an identical block of grasping physical objects to wash out possible carryover effects from the previous hf-VE block.

Study 2: Manipulations of Collider Size
Each participant was tested in a single session consisting of 450 trials evenly spread across ten blocks of 45 trials, alternating between PE and hf-VE with the first block performed in PE. The participant was given a 2 min break between consecutive blocks. In the five hf-VE blocks, we manipulated the collider size to be 0.2, 0.4, 0.8, 1.2, or 1.4 cm (Figure 1, bottom). Collider size was constant for all trials within a block. The order in which collider size blocks were presented was pseudorandomized across participants. Each block contained 45 trials (three object sizes, three object distances, and five trials per size-distance pair). Objects in each block were presented in the same order [small-near (five trials), small-middle (five trials), and small-far (five trials); medium-near (five trials), medium-middle (five trials), and medium-far (five trials); large-near (five trials), large-middle (five trials), and large-far (five trials)]. Each block of virtual grasping was preceded by an identical block of grasping physical objects to wash out possible carryover effects from the previous hf-VE block.
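The block and session structure shared by the two studies can be written out as a short sketch. The function names are hypothetical, and the order of hf-VE conditions is passed in by the caller because the pseudorandomization procedure itself is not specified in the text.

```python
from itertools import product

SIZES = ("small", "medium", "large")
DISTANCES = ("near", "middle", "far")
REPEATS = 5  # trials per size-distance pair


def block_schedule():
    """One 45-trial block in the fixed order: size outer, distance inner."""
    return [(size, distance)
            for size, distance in product(SIZES, DISTANCES)
            for _ in range(REPEATS)]


def session_schedule(vr_conditions):
    """Alternate PE washout blocks and hf-VE blocks, starting with PE."""
    session = []
    for condition in vr_conditions:
        session.append(("PE", None, block_schedule()))
        session.append(("hf-VE", condition, block_schedule()))
    return session


# Example condition orders (one possible order per study).
study1 = session_schedule(["visual", "auditory", "audiovisual"])
study2 = session_schedule([0.2, 0.4, 0.8, 1.2, 1.4])  # collider diameters in cm
```

With three feedback conditions this gives six blocks and 270 trials, and with five collider sizes it gives ten blocks and 450 trials, matching the session sizes stated for Study 1 and Study 2.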

Kinematic Processing
All kinematic data were analyzed offline using custom MATLAB routines (Mathworks Inc., Natick, MA). For each trial, time series data for the planar motion of the markers in the x- and y-coordinates were cropped from movement onset (the moment the switch was released) to movement offset (the moment the collision detection criterion was met). Transport distance (i.e., the straight-line distance of the wrist marker from the starting position in the transverse plane) and aperture (the straight-line distance between the thumb and index finger markers in the transverse plane) trajectories were computed for each trial. The first derivative of transport displacement and aperture was computed to obtain the velocity profiles for kinematic feature extraction. All time series were filtered at 6 Hz using a fourth-order low-pass Butterworth filter. In line with our past data processing protocols, trials in which participants did not move or lifted their fingers off the starting switch not in the process of making a goal-directed action toward the object were excluded from the analysis. Excluded trials comprised < 3% of trials in any given condition. Additionally, we computed the time series for size-normalized aperture. The rationale for this normalization was twofold. First, markers were attached to the dorsum of the digits (on the nail) to avoid interference with grasping. Second, in hf-VE, the collider's relative sizes and the target object might influence the grasp. For instance, a larger collider might lead to a small object being perceived disproportionately smaller than a large object. Normalizing peak aperture by object size allowed us to examine any effect of such perceptual discrepancy on the grasp.
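As an illustration of the trajectory definitions above, the following sketch computes transport distance, aperture, their first derivatives, and size-normalized aperture from planar marker positions. It is written in Python rather than the MATLAB used for the actual analysis, the function names are hypothetical, the 90 Hz rate is the approximate Unity acquisition rate, and the 6 Hz Butterworth filtering step is deliberately left out.

```python
import math

SAMPLE_RATE_HZ = 90.0  # approximate acquisition rate in Unity (assumption)


def transport_distance(wrist_xy, start_xy):
    """Straight-line distance of the wrist from its start, per sample (cm)."""
    return [math.dist(tuple(p), tuple(start_xy)) for p in wrist_xy]


def aperture(thumb_xy, index_xy):
    """Straight-line thumb-index distance, per sample (cm)."""
    return [math.dist(tuple(t), tuple(i)) for t, i in zip(thumb_xy, index_xy)]


def derivative(series, rate_hz=SAMPLE_RATE_HZ):
    """First derivative: central differences inside, one-sided at the ends."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / rate_hz
    out = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        out.append((series[hi] - series[lo]) / ((hi - lo) * dt))
    return out


def size_normalized(aperture_series, object_width_cm):
    """Aperture divided by the width of the target object (unitless)."""
    return [a / object_width_cm for a in aperture_series]
```

For example, an aperture of 7.2 cm on the small object (3.6 cm wide) corresponds to a size-normalized aperture of 2.0.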
FIGURE 1 | Schematic illustration of the experimental setup and procedure. After putting on an HTC Vive™ head-mounted display (HMD), the participants sat on a chair in front of the experimental rig, with their thumb pressing a start switch (indicated in yellow). IRED markers were attached to the participant's wrist and the tips of the thumb and index finger. An auditory cue (a beep) signaled the participant to reach to grasp the object (small: 3.6 × 2.5 × 8 cm; medium: 5.4 × 2.5 × 8 cm; large: 7.2 × 2.5 × 8 cm), placed at three different distances relative to the switch (near: 24 cm; middle: 30 cm; far: 36 cm). An inset presents the first-person scene that appeared in the HMD. Translucent panels containing text in the visual scene were only visible to the experimenter. In Study 1, participants grasped the object with 0.8 cm colliders, and visual, auditory, or audiovisual feedback was provided to signal that the object had been grasped. In Study 2, audiovisual feedback was provided to signal that the object had been grasped, and participants grasped the object with 0.2, 0.4, 0.8, 1.2, and 1.4 cm colliders. In the middle and bottom panels, the medium object is presented with the accurate scaling relationship between object dimensions and collider size.
For each trial, the following kinematic features, units in parentheses, were extracted using the filtered time series data:
• Movement time (ms): duration from movement onset to movement offset.
• Peak aperture (cm): maximum distance between the fingertip markers. Peak aperture also marked the initiation of closure or closure onset (henceforth, CO), which we refer to as aperture at CO.
• Size-normalized peak aperture: peak aperture normalized by the target object width.
• Time to peak aperture (ms): time from movement onset to peak aperture.
• Closure distance (cm): distance between the wrist's position at CO and the object's center.
• Peak transport velocity (cm/s): maximum velocity of the wrist marker.
• Time to peak transport velocity (ms): time from movement onset to maximum velocity of the wrist marker.
• Transport velocity at CO (cm/s): velocity of the wrist marker at the time of CO.
Movement time was used to examine the global effect of condition manipulations on reach-to-grasp movements. Peak aperture, time to peak aperture, and size-normalized peak aperture were used to examine the effect on the grasp component. Likewise, peak transport velocity and time to peak transport velocity were used to examine the effect on the transport component. Finally, time to peak transport velocity and time to peak aperture as well as transport velocity at CO and closure distance were used to examine the effects of task manipulations on reach-to-grasp coordination (Furmanek et al., 2019;Mangalam et al., 2021).
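The feature definitions listed above translate into a few lines of code once the cropped time series are available. The sketch below is a hypothetical Python illustration, not the authors' MATLAB routine; it assumes uniformly sampled series that start at movement onset and end at movement offset, and it takes the first maximum when a peak value is tied.

```python
import math


def extract_features(wrist_xy, aperture_cm, velocity_cm_s,
                     object_center_xy, object_width_cm, rate_hz=90.0):
    """Per-trial kinematic features from cropped, equally long time series."""
    ms_per_sample = 1000.0 / rate_hz
    # Closure onset (CO) is the sample of peak aperture.
    co = max(range(len(aperture_cm)), key=aperture_cm.__getitem__)
    # Sample of peak transport velocity.
    pv = max(range(len(velocity_cm_s)), key=velocity_cm_s.__getitem__)
    return {
        "movement_time_ms": (len(aperture_cm) - 1) * ms_per_sample,
        "peak_aperture_cm": aperture_cm[co],
        "size_normalized_peak_aperture": aperture_cm[co] / object_width_cm,
        "time_to_peak_aperture_ms": co * ms_per_sample,
        "closure_distance_cm": math.dist(tuple(wrist_xy[co]),
                                         tuple(object_center_xy)),
        "peak_transport_velocity_cm_s": velocity_cm_s[pv],
        "time_to_peak_transport_velocity_ms": pv * ms_per_sample,
        "transport_velocity_at_co_cm_s": velocity_cm_s[co],
    }
```

Returning a dictionary keyed by feature name keeps the per-trial output easy to average within participant and condition before group-level statistics.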

Statistical Analysis
All analyses were initially performed at the trial level to compute means for each subject. Subjects' means were then submitted to analysis of variance for group-level statistics. 3 × 3 × 3 repeated measures analyses of variance (rm-ANOVAs) with within-subject factors of sensory feedback (visual, auditory, and audiovisual), object size (small, medium, and large), and object distance (near, middle, and far) were used to evaluate the effects on each kinematic variable in Study 1. 5 × 3 × 3 rm-ANOVAs with within-subject factors of collider size (0.2, 0.4, 0.8, 1.2, and 1.4 cm), object size (small, medium, and large), and object distance (near, middle, and far) were used to evaluate the effects on each kinematic variable separately in Study 2. In most cases, the data met assumptions for normality, homogeneity of variance, and sphericity. When an assumption of sphericity was not met, a Greenhouse-Geisser correction was applied. All tests were performed in Statistica (ver. 13, Dell Inc.). Each test statistic was considered significant at the two-tailed alpha level of 0.05. All effect sizes are reported as partial eta-squared (η²).
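For readers who wish to relate a reported F statistic to its effect size, partial eta-squared can be recovered from F and its degrees of freedom. The identity below is the standard definition, given here as a reading aid rather than as a description of how Statistica computes the values; the symbols SS and df are introduced only for this illustration.

```latex
\eta_p^2
  = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
  = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}
% Worked example: F = 36.71 with df_effect = 2 and df_error = 18 gives
% (36.71 * 2) / (36.71 * 2 + 18) = 73.42 / 91.42, approximately 0.80.
```

Because a Greenhouse-Geisser correction scales both degrees of freedom by the same factor, the value obtained this way is unchanged by the correction.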
We used linear mixed-effects (LME) models to test the relationship between time to peak transport velocity and time to peak aperture and between transport velocity at CO and closure distance, in both Studies 1 and 2. The same LMEs also tested whether and how the respective relationship was influenced by sensory feedback in Study 1 and collider size in Study 2. In LMEs for Study 1, sensory feedback served as a categorical independent variable with three levels: visual, auditory, and audiovisual. The "visual" feedback served as the reference level. In LMEs for Study 2, collider size served as a continuous independent variable. In each model, participant identity was treated as a random effect. Both models were fit using the lmer() function in the package lme4 (Bates et al., 2014) for R (R Core Team, 2013). Approximate effect sizes for LMEs were computed using the omega_squared() function in the package effectsize (Ben-Shachar et al., 2021) for R. Coefficients were considered significant at the alpha level of 0.05.
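One plausible form of the Study 1 model for the temporal relationship is written out below. It is an assumed specification for illustration only: the text does not state which variable was the response, whether interaction terms were included, or whether the random effect was an intercept alone, so each of those choices here is a guess.

```latex
% Assumed form: trial i of participant j; A and AV are dummy variables for the
% auditory and audiovisual conditions, with visual feedback as the reference.
T^{\mathrm{PA}}_{ij}
  = \beta_0
  + \beta_1 T^{\mathrm{PTV}}_{ij}
  + \beta_2 A_{ij}
  + \beta_3 AV_{ij}
  + \beta_4 \, T^{\mathrm{PTV}}_{ij} A_{ij}
  + \beta_5 \, T^{\mathrm{PTV}}_{ij} AV_{ij}
  + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here T^PA is time to peak aperture, T^PTV is time to peak transport velocity, and u_j is a per-participant random intercept. Under this assumed form, the Study 2 analogue would replace the two dummy variables with a single continuous collider-size term, and the spatial model would use closure distance and transport velocity at CO in place of the two timing variables.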

RESULTS
Study 1: Effects of Sensory Feedback on Reach-to-Grasp Movements
Figure 2A shows the trajectories of the mean 2D position of the wrist, thumb, and index finger corresponding to each sensory feedback condition for a representative participant (averaged across all trials) for the medium object placed at the middle distance. Figure 2B shows the mean transport velocity and aperture profiles obtained from the trajectories shown in Figure 2A. Notice that, in both figures, the curves for the three feedback conditions entirely eclipse each other, indicating that sensory feedback affected neither the wrist, thumb, and index finger trajectories nor the transport velocity and aperture profiles. Figure 3 shows the phase relationship between transport velocity and size-normalized aperture (Furmanek et al., 2019). An almost invariant location of peak transport velocity and peak aperture, which mark the onset of the shaping phase and the closure phase, respectively, indicates that this phase relationship did not vary across feedback conditions.
An rm-ANOVA revealed that movement time did not differ among the three types of sensory feedback (p > 0.05; Table 1). As expected, movement time differed across objects placed at different distances [F(2,18) = 36.71, p < 0.001]. Bonferroni's post hoc tests revealed that movement time was longer for more distant objects (middle vs. near: 48 ± 10 ms, p < 0.001; far vs. near: 87 ± 10 ms, p < 0.001; far vs. middle: 38 ± 10 ms, p = 0.004; Figure 4A). Neither the main effect of object size nor any of the interaction effects of sensory feedback, object distance, and object size was significant (p > 0.05, Table 1).
Neither sensory feedback nor object distance affected either kinematic variable related to the grasp component: peak aperture and size-normalized peak aperture (p > 0.05; Table 1). In contrast, peak aperture differed across objects of different sizes (F(2, 18) = 232.39, p < 0.001), as did size-normalized peak aperture (F(1, 9.2) = 34.08, p < 0.001). Bonferroni's post hoc tests revealed that peak aperture was larger for a larger object (medium vs. small: 1.3 ± 0.1 cm, p < 0.001; large vs. small: 2.7 ± 0.1 cm, p < 0.001; large vs. medium: 1.3 ± 0.1 cm, p < 0.001; Figure 4B), confirming that the grasp was scaled to object size. However, the size-normalized peak aperture was larger for a smaller object (medium vs. small: −1.5 ± 0.3, p < 0.001; large vs. small: −2.0 ± 0.3, p < 0.001; Figure 4C), suggesting that participants had a greater aperture overshoot for smaller objects, consistent with past results (Meulenbroek et al., 2001; Furmanek et al., 2019). None of the interaction effects of sensory feedback, object distance, and object size on peak aperture or size-normalized peak aperture was significant (p > 0.05; Table 1).
Frontiers in Virtual Reality | www.frontiersin.org August 2021 | Volume 2 | Article 648529
Sensory feedback did not affect any variable related to the transport component: peak transport velocity, time to peak transport velocity, and transport velocity at CO (p > 0.05; Table 1). As expected, peak transport velocity (F(1.1, 10.1) = 239.96, p < 0.001), time to peak transport velocity (F(2, 18) = 33.00, p < 0.001), and transport velocity at CO (F(2, 18) = 5.11, p < 0.010) differed across objects placed at different distances. Bonferroni's post hoc tests revealed that the values were larger for a more distant object for peak transport velocity (middle vs. near: 12.6 ± 1.1 cm/s, p < 0.001; far vs. near: 23.6 ± 1.1 cm/s, p < 0.001; far vs. middle: 11.0 ± 1.1 cm/s, p < 0.001), time to peak transport velocity (middle vs. near: 19 ± 3 ms, p < 0.001; far vs. near: 25 ± 3 ms, p < 0.001), and transport velocity at CO (far vs. near: 4.3 ± 1.4 cm/s, p = 0.006). Furthermore, transport velocity at CO differed across objects of different sizes (F(2, 18) = 9.42, p < 0.001). Bonferroni's post hoc tests revealed that transport velocity at CO was lower for a smaller object (large vs. small: 7.8 ± 1.8 cm/s, p = 0.001). Otherwise, neither the main effect of object size nor any of the interaction effects of sensory feedback, object distance, and object size was significant for any of these variables (p > 0.05; Table 1).
To investigate whether reach-to-grasp coordination was influenced by visual, auditory, and audiovisual feedback, LMEs were performed to test the relationship between time to peak transport velocity and time to peak aperture and between transport velocity at CO and closure distance, and how each relationship was influenced by sensory feedback. Time to peak aperture increased with time to peak transport velocity (B = 1.23 ± 0.16, t = 7.95, p < 0.001; Figure 7A). The observed increase in time to peak aperture with an increase in time to peak transport velocity did not differ between the three types of sensory feedback (p > 0.05; Table 2). Likewise, closure distance increased with transport velocity at CO (B = 0.15 ± 0.0057, t = 26.50, p < 0.001; Figure 7B). The observed increase in closure distance with an increase in transport velocity at CO did not differ between the three types of sensory feedback (p > 0.05; Table 2). Together, these results indicate that sensory feedback signaling that the object had been grasped did not affect the coordination between the transport and aperture components, including the initiation of closure based on the state estimate of transport velocity.
In summary, these results confirm the known effects of object size and object distance on variables related to the aperture and transport components, respectively (Paulignan et al., 1991a; Paulignan et al., 1991b). However, the three types of sensory feedback (visual, auditory, and audiovisual) appear to support successful reach-to-grasp equally well.

Study 2: Effects of Collider Size on Reach-to-Grasp Movements
Figure 2C shows the trajectories of the mean 2D position of the wrist, thumb, and index finger corresponding to each collider size condition for a representative participant (averaged across all trials) for the medium object placed at the middle distance. Figure 2D shows the mean transport velocity and aperture profiles obtained from the trajectories shown in Figure 2C. Notice that, in both figures, the curves for the five collider sizes show noticeable differences.
Figure 5 shows the phase relationship between transport velocity and size-normalized aperture. Notice that the magnitude of size-normalized peak aperture decreases with increasing collider size, disproportionately more so for a smaller and a more distant object, but that it occurs at about the same transport velocity.
An rm-ANOVA revealed a significant main effect of collider size on movement time (F(4, 36) = 2.87, p < 0.030; Table 3). However, Bonferroni's post hoc tests failed to identify any pairwise differences between collider sizes (p > 0.05; Figure 6A). As expected, movement time differed across objects placed at different distances (F(1.1, 10) = 59.70, p < 0.001). Bonferroni's post hoc tests revealed that movement time was longer for a more distantly placed object (middle vs. near: 49 ± 9 ms, p < 0.001; far vs. near: 97 ± 9 ms, p < 0.001; far vs. middle: 48 ± 9 ms, p = 0.004). Neither the main effect of object size nor any interaction effects of collider size, object distance, and object size were significant (p > 0.05).
Neither collider size nor object distance affected peak aperture (p > 0.05; Figure 6B). As expected, peak aperture differed across objects of different sizes (F(1.1, 10.4) = 183.04, p < 0.001). Bonferroni's post hoc tests revealed that peak aperture was larger for a larger object (medium vs. small: 1.2 ± 0.1 cm, p < 0.001; large vs. small: 2.5 ± 0.1 cm, p < 0.001; large vs. medium: 1.3 ± 0.1 cm, p < 0.001), confirming that the grasp was scaled to object size. None of the interaction effects of collider size, object distance, and object size on peak aperture was significant (p > 0.05).
FIGURE 3 | Study 1. Phase plots of size-normalized aperture vs. transport velocity for each condition of sensory feedback for a representative participant. Diamonds and circles indicate size-normalized peak aperture and peak transport velocity, respectively. Black arrows indicate the progression of reach-to-grasp movement.

FIGURE 4 | Study 1. Effects of (A) object distance on movement time, (B) object size on peak aperture, and (C) object size on size-normalized peak aperture. Error bars indicate ±1 SEM (n = 10). Data calculated across all levels of sensory feedback for each participant.
TABLE 3 | Outcomes of 5 × 3 × 3 rm-ANOVAs examining the effects of collider size (0.2, 0.4, 0.8, 1.2, and 1.4), object size (small, medium, and large), and object distance (near, middle, and far) on each kinematic variable in Study 2. MT: movement time, PA: peak aperture, SN-PA: size-normalized peak aperture, PV: peak transport velocity, T-PV: time to peak transport velocity, and TV-CO: transport velocity at closure onset. NS: not significant.
FIGURE 5 | Study 2. Phase plots of size-normalized aperture vs. transport velocity for each collider size for a representative participant. Diamonds and circles indicate size-normalized peak aperture and peak transport velocity, respectively. Black arrows indicate the progression of reach-to-grasp movement.
To investigate whether reach-to-grasp coordination was influenced by collider size, LMEs were performed to test the relationship between time to peak transport velocity and time to peak aperture and between transport velocity at CO and closure distance, and how each relationship was influenced by collider size. Time to peak aperture increased with time to peak transport velocity (B = 0.83 ± 0.11, t = 7.34, p < 0.001; Figure 7C). The observed increase in time to peak aperture with an increase in time to peak transport velocity was not affected by collider size (p > 0.05; Table 4). Likewise, closure distance increased with transport velocity at CO (B = 0.17 ± 0.0061, t = 27.37, p < 0.001; Figure 7D). The observed increase in closure distance with an increase in transport velocity at CO was not affected by collider size (p > 0.05; Table 4). Together, these results indicate that collider size did not affect the coordination between the transport and aperture components, including the initiation of closure based on the state estimate of transport velocity.
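The four coordination variables entering these LMEs can be extracted from a single trial roughly as sketched below. This is an illustrative sketch, not the authors' processing pipeline. It assumes that closure onset (CO) coincides with the sample of peak aperture, that transport velocity is the wrist speed obtained by finite differences, and that closure distance is the straight-line wrist distance from CO to the final sample. The function name and the synthetic trial are hypothetical.

```python
import math


def reach_to_grasp_landmarks(t, wrist_xy, aperture):
    """Extract coordination variables from one trial.

    t: sample times (s); wrist_xy: (x, y) wrist positions (cm);
    aperture: thumb-index distance (cm) at each sample.
    """
    # Transport velocity as wrist speed (backward finite difference).
    speed = [0.0]
    for k in range(1, len(t)):
        step = math.dist(wrist_xy[k], wrist_xy[k - 1])
        speed.append(step / (t[k] - t[k - 1]))

    i_pv = max(range(len(speed)), key=lambda k: speed[k])        # peak velocity
    i_co = max(range(len(aperture)), key=lambda k: aperture[k])  # assumed CO

    return {
        "time_to_peak_velocity": t[i_pv] - t[0],
        "time_to_peak_aperture": t[i_co] - t[0],
        "velocity_at_co": speed[i_co],
        "closure_distance": math.dist(wrist_xy[i_co], wrist_xy[-1]),
    }


# Synthetic trial, for illustration only (not data from the study).
t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
wrist_xy = [(0.0, 0.0), (2.0, 0.0), (8.0, 0.0), (16.0, 0.0), (22.0, 0.0), (24.0, 0.0)]
aperture = [2.0, 4.0, 7.0, 9.0, 10.0, 6.0]
landmarks = reach_to_grasp_landmarks(t, wrist_xy, aperture)
```

In this synthetic trial, peak transport velocity precedes peak aperture, and the wrist is still moving at closure onset, which is the pattern the temporal and spatial coordination models relate across trials.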
In summary, these results further confirm the known effects of object size and object distance on variables related to the aperture and transport components, respectively (Paulignan et al., 1991a; Paulignan et al., 1991b). Most importantly, we show that collider size also affects properties of the grasp relative to the object: specifically, a larger collider prompts a proportionally smaller aperture. Nonetheless, it appears that collider size has no bearing on reach-to-grasp coordination.

DISCUSSION
We investigated the effects of sensory feedback mode (Study 1) and collider size (Study 2) on the coordination of reach-to-grasp movements in hf-VE. Contrary to our expectation (H1), we found that visual, auditory, and audiovisual feedback did not differentially impact key features of reach-to-grasp kinematics in the absence of terminal haptic feedback. In Study 2, larger colliders led to a smaller size-normalized peak aperture (H2), suggesting a possible influence of spherical collider size on the perception of virtual object size and on motor planning of reach-to-grasp. Critically, reach-to-grasp spatiotemporal coordination patterns were robust to manipulations of the sensory modality used for haptic sensory substitution and of spherical collider size.

Manipulations of Sensory Substitution
In Study 1, we did not observe any changes in the transport and aperture kinematics or in the reach-to-grasp coordination as a function of the type of sensory substitution (visual, auditory, or audiovisual) provided to indicate that the object had been grasped in the absence of haptic feedback about object properties. Our data did confirm the known effects of object size and object distance on variables related to the aperture and transport components, respectively (Paulignan et al., 1991a; Paulignan et al., 1991b), indicating that variation in reach-to-grasp patterns with respect to object properties in our hf-VE is comparable to that found in the real world, as previously indicated in Furmanek et al. (2019). While many studies have explored the role of sensory substitution of haptic feedback in VR (Sikström et al., 2016; Cooper et al., 2018), few studies have investigated the effect of sensory substitution for haptic feedback specifically in the context of reach-to-grasp movements. One study that used simple spherical colliders for grasping reported faster movement time when sensory substitution for haptic feedback was provided with audio and audiovisual cues compared to visual or absent cues that the object was grasped (Zahariev and MacKenzie, 2007). Our finding that movement kinematics did not differ across haptic sensory substitution conditions does not support these past findings, though differences in the outcomes may be explained, in part, by the VR technology utilized. For example, in Zahariev and MacKenzie (2007), participants grasped mirror reflections of computer-generated projections of objects. Such setups have lower fidelity of object rendering than is typical of HMD-VR and might result in greater salience of auditory feedback.
In a more recent study using HMD-VR, participants performed reach-to-grasp movements as part of a pick and place task in less time with visual compared to auditory sensory substitution but, interestingly, indicated a preference for auditory cues that the object was grasped (Canales and Jörg, 2020). Notably, differences between audio, visual, and audiovisual feedback were small, and since reach-to-grasp kinematics were not presented, it was not possible to interpret why the movements were slower with audio feedback. In an immersive hf-VE like ours, participants might not have had to rely on one sensory modality over the other and hence did not show differences in reach-to-grasp coordination based on visual, auditory, and audiovisual feedback. Furthermore, the fact that we did not observe differences in movement kinematics and spatiotemporal reach-to-grasp coordination (Figures 7A,B) suggests that, in a high-fidelity VR environment, the choice of modality for sensory substitution for haptic feedback may have relatively little bearing on behavior. We speculate that, with high-fidelity feedback of the hand-object interaction, visual feedback of the hand-object collision, rather than explicit feedback in the form of overt sensory substitution, may govern behavior.
The finding that visual information may be sufficient for haptic-free grasping is in agreement with a line of research using haptic-free robotic systems. For instance, Meccariello et al. (2016) showed that experienced surgeons perform conventional suturing faster and more accurately than nonexperts when only visual information is available. It has been proposed that experienced surgeons may create a perception of haptic feedback during haptic-free robotic surgery based on visual information and previously learned haptic sensations (Hagen et al., 2008). This suggests that haptic feedback may be needed during skill acquisition but may not be necessary for practiced movements.
Another parsimonious explanation for why we did not observe between-condition differences of sensory feedback type on grasp kinematics is related to the study design. As opposed to Zahariev and Mackenzie (2007) and Zahariev and Mackenzie (2008), who randomized the order of object size trials, our participants performed reach-to-grasp actions to each object in a blocked manner (i.e., all trials for each object size-distance pair were completed consecutively within each block). Thus, in our study, subjects' prior experience (specifically, the proprioceptively perceived final aperture) might have made reliance on explicit feedback of grasp less necessary. Indeed, the calibration of the current reach-to-grasp movement based on past movements is well documented (Gentilucci et al., 1995; Säfström and Edin, 2004; Säfström and Edin, 2005; Bingham et al., 2007; Mon-Williams and Bingham, 2007; Coats et al., 2008; Säfström and Edin, 2008; Foster et al., 2011). Finally, the availability of continuous online feedback of the target object and colliders might have also reduced reliance on sensory feedback (Zahariev and MacKenzie, 2007; Zahariev and Mackenzie, 2008; Volcic and Domini, 2014). The present study was not designed to test such a hypothesis, but future work can explicitly investigate whether reliance on different modalities of terminal sensory feedback may be stronger in a randomized design, when anticipation and planning are less dependable.

Manipulations of Collider Size
In Study 2, there was a significant main effect of collider size on movement time, time to peak transport velocity, and size-normalized peak aperture, indicating that collider size modified key features of the reach-to-grasp movement. It is likely that collider size altered the perception of object size (an object might be perceived to be smaller when using a larger collider) and that this altered perception affected the planning of reach-to-grasp movements. Indeed, previous studies have shown that the hand avatar may act as a metric to scale the intrinsic object properties (e.g., object size) (Linkenauger et al., 2011; Linkenauger et al., 2013; Ogawa et al., 2017; Ogawa et al., 2018; Lin et al., 2019). Interestingly, Ogawa et al. (2017) found that perception of object size was affected by the realism of the avatar, with a biological avatar showing a greater effect on object size perception than an abstract avatar such as the one used in our study. However, in that study, participants did not grasp the object; the task was simply to carry the virtual cube on an open avatar palm. It may therefore be concluded that the effect of avatar size on perception is likely mediated by the requirements of the task and that the use of avatar size as a means to scale the dimension of the intrinsic object properties is more sensitive when the avatar is used to actually grasp the object. One caveat to our finding is that a collider size by object size interaction was not observed. If collider size caused a linear scaling of the perception of object size, then a collider size by object size interaction would be expected, because the ratio of collider size to object size changes by a different amount for different object sizes.
Hand size manipulations do not affect the perceived size of objects that are too big to be grasped, suggesting that hand size may only be used as a scaling mechanism when the object affords the relevant action, in this case, grasping (Linkenauger et al., 2011), providing further evidence of nonlinearities in the use of the hand avatar as a "perceptual ruler." Therefore, our findings indicate that either the scaling of perception of object size by collider size is nonlinear or the changes we observed arise from different explicit strategies for different colliders independent of perception. Future research will test these competing hypotheses.
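The reasoning behind the expected interaction under linear scaling can be illustrated with simple arithmetic. The collider sizes below are those listed for Study 2 (the unit is not stated in the table caption). The object widths are hypothetical values in the same unit, chosen only for illustration; they are not the object sizes used in the study.

```python
collider_sizes = [0.2, 0.4, 0.8, 1.2, 1.4]  # values listed for Study 2

# Hypothetical object widths (same unit as the colliders), illustration only.
object_widths = {"small": 3.0, "medium": 5.0, "large": 7.0}

# Collider-to-object size ratio for every collider and object.
ratios = {
    name: [c / width for c in collider_sizes]
    for name, width in object_widths.items()
}

# Change in the ratio from the smallest to the largest collider, per object.
ratio_change = {name: r[-1] - r[0] for name, r in ratios.items()}
```

With these illustrative widths, the ratio changes by 0.40 for the small object but by only about 0.17 for the large one. A perceptual rescaling proportional to this ratio would therefore predict a larger collider effect for smaller objects, that is, a collider size by object size interaction.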
Assuming that collider size did in fact influence the perception of object size, it follows that the size of the colliders might have had a similar effect on the perceptual scaling of object distance. This interpretation provides a possible explanation for the significant main effect of collider size on time to peak transport velocity. However, given that the ratio of collider size to object distance was much smaller than the ratio of collider size to object size, we think that perceptual effects on distance were probably negligible, at least relative to the perceptual effects on object size. We therefore offer an alternative explanation for the scaling of time to peak transport velocity, and the associated movement time, with collider size. If collider size affected the planning of aperture overshoot, as evidenced by the main effect of collider size on size-normalized peak aperture, then we may assume that this was also incorporated into the planning of transport to maintain the spatiotemporal coordination of reach-to-grasp. Our data indicate that this may be the case, as both temporal (the relationship between time to peak transport velocity and time to peak aperture) and spatial (the relationship between transport velocity at CO and closure distance) aspects of coordination were not influenced by collider size (Figures 7C,D).
Regardless of whether the effects of the colliders on aperture profiles were perceptual or strategic, we surmise that these effects were present at the beginning of the movement to ensure that the coordination of the reach and grasp components was not disrupted. Preservation of reach-to-grasp coordination as the primary goal of reach-to-grasp movements is something we have observed in our previous work (Furmanek et al., 2019; Mangalam et al., 2021). The blocked nature of our design likely facilitated the described effect on planning; however, we do not believe that proprioceptive memory had a large influence on the effects observed in Study 2. If proprioceptive memory did influence behavior, we can assume that its influence would be equal across all collider sizes and therefore cannot explain behavioral differences across collider sizes. Future research should test whether the observations here hold if object size and distance are randomized.
Our result that larger colliders led to a smaller size-normalized peak aperture can also be framed using the equilibrium point hypothesis (EPH) (Feldman, 1986). In this framework, the peak aperture at a location near the object may be considered a key point in the referent trajectory driving the limb and finger movements (Weiss and Jeannerod, 1998). Given the evidence that the referent configuration for a reach-to-grasp action is specified depending on the object shape, localization, and orientation to define a position-dimensional variable, threshold muscle length (Yang and Feldman, 2010), it is possible that collider size may also influence the referent configuration. One possibility is that collider size may influence the perceived force needed to grasp the object (Pilon et al., 2007) despite the virtual object having no physical properties. Future studies may be specifically designed to test this hypothesis for hf-VE.

Limitations
Our studies had several limitations. Data were collected from only ten participants, limiting the generalization of our findings and potentially exposing us to type II error if the effect size for a given outcome measure is small. The sample involved only three female participants, making it difficult to understand whether there may be sex-dependent differences in reach-to-grasp performance, particularly in light of recent evidence that VR may be experienced differently by male and female participants (Munafo et al., 2017; Curry et al., 2020). We used a simple hand avatar rendering of spheres to represent only the tips of the thumb and index finger, and the results of this study may not extrapolate to more anthropomorphic avatars. Our VE was simple, comprising only the table, the object to be grasped, and the hand avatar. Use of the hand avatar as a "perceptual ruler" for objects in the scene may be different for richer environments, especially for those comprising objects with strong connotations of their size (e.g., a soda can). Finally, the degree of stereopsis, presence, and immersion and symptoms of cybersickness were not recorded, and therefore, the influence of these factors on individual participant behavior is unknown.

CONCLUDING REMARKS
The results of our studies together suggest that spatiotemporal coordination of reach-to-grasp in a high-fidelity immersive hf-VE is robust to the modality (e.g., visual or auditory) used as a sensory substitute for the absent haptic feedback and to the size of the avatar that represents the fingertips. Avatar size may modify the magnitude of peak aperture in hf-VE when using spheres to represent the fingertips, but this change did not affect spatiotemporal coordination between the reach and grasp components of the movement. We suggest that the modulation of aperture associated with avatar size may be rooted in the use of the avatar as a "perceptual ruler" for intrinsic properties of virtual objects. These results have implications for commercial and clinical use of hf-VE and should be evaluated in relation to technological limitations of the VR system (i.e., tracking accuracy, update rate, and display latency) (Stanney, 2002). Specifically, when VR is used for manual skill training or neurorehabilitation (Adamovich et al., 2005; Adamovich et al., 2009; Massetti et al., 2018), future work should consider the implications of avatar size for the transfer of learning from the VE to the real world, especially in populations with deficits in multisensory integration.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation. Please contact the corresponding author by e-mail.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB) at Northeastern University. The participants provided their written informed consent to participate in this study.