Enhancing Virtual Walking Sensation Using Self-Avatar in First-Person Perspective and Foot Vibrations

Walking is a fundamental physical activity in humans. Various virtual walking systems have been developed using treadmill or leg-support devices. Using optic flow, foot vibrations simulating footsteps, and a walking avatar, we propose a virtual walking system that does not require limb action for seated users. We aim to investigate whether a full-body or hands-and-feet-only walking avatar with either the first-person (experiment 1) or third-person (experiment 2) perspective can convey the sensation of walking in a virtual environment through optic flow and foot vibrations. The viewing direction of the virtual camera and the head of the full-body avatar were linked to the actual user's head motion. We discovered that the full-body avatar with the first-person perspective enhanced the sensations of walking, leg action, and telepresence, with either synchronous or asynchronous foot vibrations. Although the hands-and-feet-only avatar with the first-person perspective enhanced the walking sensation and telepresence, compared with the no-avatar condition, its effect was less prominent than that of the full-body avatar. However, the full-body avatar with the third-person perspective did not enhance the sensations of walking and leg action; rather, it impaired the sensations of self-motion and telepresence. Synchronous or rhythmic foot vibrations enhanced the sensations of self-motion, walking, leg action, and telepresence, irrespective of the avatar condition. These results suggest that the full-body or hands-and-feet-only avatar is effective for creating virtual walking experiences from the first-person perspective, but not the third-person perspective, and that the foot vibrations simulating footsteps are effective, regardless of the avatar condition.


INTRODUCTION
Walking is a fundamental physical activity in the daily lives of humans. Rhythmic activity is controlled semi-automatically by a spinal locomotion network (central pattern generator) (MacKay-Lyons, 2002). Walking produces various perceptions that differ from standing, although humans are not typically conscious of them because of the semi-automatic process involved. For example, vestibular sensations occur during movements, visual elements move radially (optic flow), tactile sensations arise on the feet with the foot striking on the ground, and the proprioception of limbs changes each time a person moves their hands and feet. Conversely, a walking sensation can be induced by presenting stimuli perceived while walking.
Generating the pseudo-sensation of walking is a popular topic in the virtual reality (VR) field. A common method entails having a person walk on the spot (walking-in-place; e.g., Templeman et al., 1999) or using a specific apparatus such as omnidirectional treadmills (Darken et al., 1997; Iwata, 1999), foot-supporting motion platforms (Iwata et al., 2001), movable tiles (Iwata et al., 2005), and rotating spheres (Medina et al., 2008). Freiwald et al. (2020) developed a virtual walking system by mapping the cycling biomechanics of seated users' legs to virtual walking. These methods can produce a pseudo-sensation of walking because the participants move their legs, and the motor command and proprioception of the legs are similar to actual leg movements.
The perception of self-motion, which can be induced only by visual motion in a large visual field, is known as visually induced self-motion perception or vection (for reviews, see Dichgans and Brandt, 1978; Riecke, 2011; Palmisano et al., 2015). Vection is induced when motion is presented in a large visual field (Dichgans and Brandt, 1978; Riecke and Jordan, 2015), with background, rather than foreground motion (Brandt et al., 1975; Ohmi et al., 1987), and non-attended, rather than attended motion (Kitazaki and Sato, 2003), being dominant. Vection is heightened by perspective jitter (Palmisano et al., 2000, 2003) and non-visual modality information, such as sound and touch (Riecke et al., 2009; Farkhatdinov et al., 2013). It is also facilitated by naturalistic and globally consistent stimuli (Riecke et al., 2005, 2006). Several virtual walking systems that do not necessitate active leg movements have been developed using vection (Lécuyer et al., 2006; Terziman et al., 2012; Ikei et al., 2015; Kitazaki et al., 2019). Integrating simulated camera motions (similar to perspective jitter) with the vection stimulus, such as an expanding radial flow, improves the walking sensation of seated users (Lécuyer et al., 2006). By conveying tactile sensations to the feet in addition to vection, the pseudo-sensation of virtual walking can be induced (Terziman et al., 2012; Ikei et al., 2015; Kitazaki et al., 2019). Kitazaki et al. (2019) demonstrated that foot vibrations that were synchronous with the oscillating actual optic flow induced not only the sensation of self-motion but also the sensations of walking, leg action, and telepresence. Kruijff et al. (2016) demonstrated that walking-related vibrotactile cues on the feet, simulated head bobbing as a visual cue, and footstep sounds as auditory cues enhance vection and presence.
Telepresence, one of the crucial factors in VR, refers to the perception of presence at a physically remote or simulated site (Draper et al., 1998).
However, users cannot observe their bodies in these virtual walking systems. This is inconsistent with our daily experience. Turchet et al. (2012) developed a multimodal VR system to enhance the realism of virtual walking by combining visual scenes, footstep sounds, and haptic feedback on the feet that vary according to the type of ground; they also incorporated a walking or running avatar from a third-person perspective. Although they demonstrated that consistent multimodal cues enhance the realism of walking, they did not evaluate the effect of the avatar. Humans can have illusory body ownership of an avatar when they receive synchronous tactile sensations with it, such as the rubber hand illusion (Botvinick and Cohen, 1998) and full-body illusion (Ehrsson, 2007;Lenggenhager et al., 2007;Petkova and Ehrsson, 2008), or when the avatar moves synchronously with them (Gonzalez-Franco et al., 2010;Maselli and Slater, 2013). The sense of agency is the subjective experience of controlling one's own actions and events in the virtual world (Haggard and Chambon, 2012;Haggard, 2017). Kokkinara et al. (2016) demonstrated that despite the absence of any corresponding activity, except head motions on the part of the seated users, they exhibited an illusory sense of ownership and agency of their walking avatars in a virtual environment. The illusory ownership and agency were stronger with a first-person perspective, compared to a third-person perspective, and the sway simulating head motion during walking reduced the sense of agency.
Therefore, we propose a new virtual walking system that induces the sense of ownership and agency in the users of a walking self-avatar through a combination of vection and leg action induced by optic flow and foot vibrations, respectively. The illusory ownership of the avatar's body and sense of agency could enhance the sensations of telepresence and walking (Sanchez-Vives and Slater, 2005).
The walking avatars we employed were hands-and-feet-only and full-body avatars. When the virtual hands and feet exhibit an appropriate spatial relationship and move synchronously with the user, the user will feel as if the hands-and-feet-only avatar is their own body; furthermore, there will be a sense of having an invisible trunk between the hands and feet (Kondo et al., 2018, 2020). This method of inducing illusory body ownership is easy to implement, requires low computing power, and can potentially reduce the conflict between the appearance of the actual self's body and the avatar's body. Hence, we aim to compare hands-and-feet-only and full-body avatars in experiments. We employed a first-person perspective in experiment 1 and a third-person perspective in experiment 2. The first-person perspective is more effective for inducing illusory body ownership in virtual environments (Maselli and Slater, 2013; Kokkinara et al., 2016), more immersive (Monteiro et al., 2018), and better for dynamic task performance (Bhandari and O'Neill, 2020). However, walking persons typically view a limited part of the avatar's body (such as their hands and insteps in the periphery) from the first-person perspective; therefore, a mirror is typically used to view the full body. However, the presence of a mirror in the scene or viewing their own body in the mirror might interfere with walking sensations. By contrast, a person can have illusory ownership of the avatar's entire body from behind the avatar from a third-person perspective (Lenggenhager et al., 2007; Aspell et al., 2009). It might be preferable to view the avatar's walking movements from a third-person perspective, rather than a first-person perspective. Therefore, we also investigated the effects of the third-person perspective on the illusory body ownership of the walking avatar, in addition to those of the first-person perspective.
In summary, the purpose of this study was to analyze whether an avatar representing a seated user's virtual walking can enhance the sensation of walking through optic flow and foot vibrations that are synchronous with the avatar's foot movements. The full-body and hands-and-feet-only avatars with both the first- and third-person perspectives were compared. We predicted that either the walking hands-and-feet-only avatar or the walking full-body avatar would enhance the virtual walking sensation. It must be noted that the preliminary analysis of experiment 1 has already been presented at a conference as a poster presentation (Matsuda et al., 2020).

Participants
Twenty observers (19 men) aged 21-23 (mean 21.95, SD 0.50) and 20 observers (all men) aged 19-24 (mean 21.65, SD 1.11) participated in experiments 1 and 2, respectively. Different participants took part in the two experiments. All observers had normal or corrected-to-normal vision. Informed consent was obtained from all the participants prior to the experiment. The experimental methods were approved by the Ethical Committee for Human-Subject Research at the Toyohashi University of Technology, and all methods were performed in accordance with the relevant guidelines and regulations. The sample size was determined via a priori power analysis using G*Power 3.1 (Faul et al., 2007, 2009): medium effect size f = 0.25, α = 0.05, power (1 − β) = 0.8, and repeated measures ANOVA (three avatar conditions × two vibration conditions).

Apparatus
We used a computer (Intel Core i7 6700, NVIDIA GeForce GTX 1060 6GB) with Unity Pro to create and control the stimuli. The visual stimuli were presented using a head-mounted display (HMD; HTC VIVE; 1,080 (width) × 1,200 (height) pixels; refresh rate, 90 Hz; Figure 1A). The Final IK Unity plugin was used to associate the participants' head movements with the avatar's head movement in yaw, roll, and pitch. Tactile stimuli were created using Audacity and presented via four vibro-transducers (Acouve Lab Vp408, 16 Hz-15 kHz), each placed on the left and right forefeet and heels, by inputting sound signals from a power amplifier (Behringer EPQ450, 40 W (8 Ω) × 4 ch) through a USB multichannel preamplifier (Behringer FCA1616, input 16 ch, output 16 ch) controlled by the computer. The maximum input to the vibro-transducers was 6 W, and the amplitude was fixed throughout all the experiments. The amplitude of the vibrations was strong enough for all the participants to feel vibrations while wearing socks. The foot vibration system was made of aluminum frames, vibro-transducers, acrylic plates, springs, and wood plates (Figure 1B). The relative positions of the vibro-transducers were constant for all the participants. The wood plates were firmly connected to the aluminum frame and supported the participants' midfoot. The vibro-transducers for the forefoot and heel were connected to the frame via an acrylic plate with springs to prevent transmission of vibrations (Figure 1C). To mask sounds from the vibro-transducers, white noise (70 dBA) was presented to the participants using noise-canceling headphones (SONY WH-1000XM2). The system latency was as high as 11.11 ms.

Visual Stimuli
A virtual room and an avatar were presented in a virtual environment through an HMD. As shown in Figure 2A, the room comprised a textured floor with sidewalls (10-m width × 120-m depth × 5-m height); the textured floor was made of wooden materials. In experiment 1, 18 mirrors (2 × 2 m, 1.45 m left and right alternately from the center, every 5 m in the walking direction, oriented inward by 20°) were used to view the avatar in the first-person perspective.
We used a three-dimensional human model, Toshiro (Renderpeople), as a full-body avatar for all participants. The height of the avatar was 170 cm, and the height of the eyes was 155 cm. The avatar walked forward (5.16 km/h, 2.02 steps/s) with the walking animation. Participants observed the scene from the avatar's viewpoint (first-person perspective) in experiment 1 and from the viewpoint 2 m behind the avatar's eyes (third-person perspective) in experiment 2 (Figures 2B,C, Supplementary Videos 1, 2). The participants were able to look around the virtual environment by turning their head, and the avatar's head motion was synchronized with the actual participant's head motion. The participants observed the avatar's body movements, including head motion, both in the mirrors and in direct viewing in experiment 1, and from the back of the avatar in experiment 2. For the hands-and-feet-only avatar, only the hands and feet of Toshiro were presented, similar to the avatar of Kondo et al. (2018, 2020) (Figure 3). The walking motions were identical for both the full-body and hands-and-feet-only avatars.
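The walking speed and cadence stated above jointly imply a step length and a step interval, which may help when reproducing the animation timing. The following is a minimal sketch of that arithmetic; the function name `step_metrics` is hypothetical and is not part of the original system.

```python
def step_metrics(speed_kmh: float, cadence_steps_per_s: float) -> tuple[float, float]:
    """Return (step length in metres, step interval in milliseconds)."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0          # km/h -> m/s
    step_length_m = speed_m_per_s / cadence_steps_per_s  # distance covered per step
    step_interval_ms = 1000.0 / cadence_steps_per_s      # time between foot strikes
    return step_length_m, step_interval_ms


# Values reported for the avatar: 5.16 km/h at 2.02 steps/s.
step_length_m, step_interval_ms = step_metrics(5.16, 2.02)
print(f"step length ~ {step_length_m:.2f} m, step interval ~ {step_interval_ms:.0f} ms")
```

Under these reported values, each step covers roughly 0.71 m and foot strikes are spaced roughly 495 ms apart.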

Tactile Stimuli
Foot vibrations were presented in the left and right forefeet and heels (Figure 1B), simulating the foot striking the ground. The vibration was generated from the footstep sound of walking on an asphalt-paved road (https://soundeffect-lab.info/) using Audacity software. The duration of the vibration was 300 ms, and the maximum amplitude occurred at 65 ms (Figure 4A). The same foot vibration was used for the heel and forefoot. The heel vibration was followed by the forefoot vibration. The stimulus onset asynchrony of the heel and forefoot vibrations was 105 ms.
The timings of the foot vibrations were synchronous with the avatar's foot strikes on the floor in the synchronous condition, whereas they were randomized in the asynchronous condition (Figure 4B). The total number of vibrations was 158 under both conditions in each trial.
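The two timing conditions can be illustrated with a short scheduling sketch. It uses the reported values (105-ms heel-to-forefoot onset asynchrony, 2.02 steps/s, 158 vibrations, 300-ms vibration, 40-s walking period). Several details are assumptions made only for illustration and are not stated in the source: that the first step is on the left foot, that feet alternate on every step, and that the asynchronous condition draws each onset uniformly at random over the trial. All function names are hypothetical.

```python
import random

SOA_S = 0.105        # heel -> forefoot stimulus onset asynchrony (reported)
CADENCE = 2.02       # avatar steps per second (reported)
N_VIBRATIONS = 158   # vibrations per trial (reported)
TRIAL_S = 40.0       # duration of the walking animation (reported)
VIB_S = 0.300        # duration of one vibration (reported)


def synchronous_schedule(n_vibrations=N_VIBRATIONS, cadence=CADENCE, soa=SOA_S):
    """Heel and forefoot onsets locked to each foot strike of the avatar."""
    events = []
    for step in range(n_vibrations // 2):               # two vibrations per step
        foot = "left" if step % 2 == 0 else "right"     # assumption: start on left
        t_heel = step / cadence
        events.append((t_heel, foot, "heel"))
        events.append((t_heel + soa, foot, "forefoot"))
    return events


def asynchronous_schedule(n_vibrations=N_VIBRATIONS, trial_s=TRIAL_S,
                          vib_s=VIB_S, seed=None):
    """Same number of vibrations, but onsets drawn at random (assumed uniform)."""
    rng = random.Random(seed)
    channels = [(foot, part) for _, foot, part in synchronous_schedule(n_vibrations)]
    onsets = sorted(rng.uniform(0.0, trial_s - vib_s) for _ in range(n_vibrations))
    return [(t, foot, part) for t, (foot, part) in zip(onsets, channels)]


sync_events = synchronous_schedule()
async_events = asynchronous_schedule(seed=0)
print(len(sync_events), len(async_events))
```

Both schedules contain the same number of vibrations, so the two conditions differ only in timing, which matches the description of the conditions above.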

Conditions
Three avatar conditions (full body, hands-and-feet only, and no avatar) and two foot-vibration conditions (synchronous and asynchronous timings) were used in both experiments 1 and 2. Twenty-four random trials (three avatar conditions × two foot vibration conditions × four repetitions) were performed by each participant. Hence, the avatar and vibration conditions were both within-subject factors.
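The factorial design described above can be expressed as a randomized trial list. The sketch below is illustrative only: the condition labels and the function `make_trial_order` are hypothetical names, and the source does not say how the randomization was implemented.

```python
import itertools
import random
from collections import Counter

AVATARS = ("full-body", "hands-and-feet-only", "no-avatar")
VIBRATIONS = ("synchronous", "asynchronous")


def make_trial_order(repetitions=4, seed=None):
    """Return all avatar x vibration combinations, repeated and shuffled."""
    trials = list(itertools.product(AVATARS, VIBRATIONS)) * repetitions
    random.Random(seed).shuffle(trials)   # independent random order per participant
    return trials


trials = make_trial_order(seed=1)
counts = Counter(trials)
print(len(trials), "trials;", len(counts), "distinct conditions")
```

Each of the six conditions appears exactly four times, which gives the 24 trials per participant.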

Questionnaire
Four items obtained from a previous study (Kitazaki et al., 2019) were presented to assess the participants' sensations after each stimulus (Figure 5).
1: I felt that my entire body was moving forward (self-motion).
2: I felt as if I was walking forward (walking sensation).
3: I felt as if my feet were striking the ground (leg action).
4: I felt as if I was present in the scene (telepresence).
The order of the items was constant throughout all the trials. The participants responded to these items using a visual analog scale (VAS); the leftmost side of the line implied no sensation, whereas the right side of the line implied the same sensation as in the actual walking experience. The data were digitized from 0 to 100 for analysis.
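The digitization of a VAS response can be sketched as a linear mapping from the marked position on the line to a 0-100 score. This is an assumed implementation for illustration; the source does not describe how the line coordinates were recorded, and the names `vas_score`, `x_left`, and `x_right` are hypothetical.

```python
def vas_score(x: float, x_left: float, x_right: float) -> float:
    """Map a marked position on the VAS line to a score from 0 to 100.

    x_left is the "no sensation" end and x_right is the
    "same as actual walking" end, in the same units as x.
    """
    if x_right <= x_left:
        raise ValueError("x_right must be greater than x_left")
    fraction = (x - x_left) / (x_right - x_left)
    return 100.0 * min(1.0, max(0.0, fraction))   # clamp marks outside the line


def mean_score(scores: list[float]) -> float:
    """Average the repetitions of one condition for one participant."""
    return sum(scores) / len(scores)


example = vas_score(150.0, x_left=100.0, x_right=300.0)
print(example)  # 25.0
```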

Procedure
The participants sat on a stool and wore an HMD. The foot vibrators comprised four vibro-transducers and a headphone (Figure 1A). After a 5-s blank, a fixation point appeared for 2 s, following which the participants were instructed to study the environment for 5 s to confirm the contents of the virtual environment and embody the avatar. In the full-body condition, they observed the avatar's head motion in the mirrors in experiment 1 or from the third-person perspective in experiment 2, turning their heads to embody the avatar. Subsequently, they observed the walking animation from the first- (experiment 1) or third-person (experiment 2) perspective for 40 s. Then, a questionnaire was presented to the participants. All the participants participated in 24 trials (three avatar conditions × two vibration conditions × four repetitions). The order of the trials was randomized.

Experiment 1: First-Person Perspective
We calculated the mean digitized VAS data (0-100) for each question item and tested the normality of the data (Shapiro-Wilk test, α = 0.05). The data of leg action deviated significantly from the normality, whereas the others did not violate the normality. Thus, we conducted a two-way repeated measures ANOVA with the aligned rank transformation  (ART) (ANOVA-ART) procedure (Wobbrock et al., 2011) on the non-parametric data (leg action) based on the three avatar conditions (full body, hands-and-feet only, and no avatar) and two foot-vibration conditions (synchronous and asynchronous timings). The Tukey method with Kenward-Roger degrees of freedom approximation was applied for the post-hoc multiple comparison for the non-parametric ANOVA-ART. We conducted a two-way repeated measures ANOVA on the parametric data (self-motion, walking, and telepresence). When the Mendoza's multisample sphericity test exhibited a lack of sphericity, the reported values were adjusted using the Greenhouse-Geisser correction (Geisser and Greenhouse, 1958). Shaffer's modified sequentially rejective Bonferroni procedure was applied for post-hoc multiple comparisons for the parametric ANOVA.
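The Greenhouse-Geisser correction multiplies both degrees of freedom of the F test by an estimated sphericity parameter ε, which explains the fractional degrees of freedom in the reported results, such as F(1.41, 26.71) for the avatar effect on self-motion. The sketch below shows this arithmetic for a within-subject factor with k = 3 levels and n = 20 participants. The value of ε used here is back-calculated from the reported error degrees of freedom; it is not a value stated in the source, and the function names are hypothetical.

```python
def gg_adjusted_df(epsilon: float, n_levels: int, n_subjects: int) -> tuple[float, float]:
    """Greenhouse-Geisser adjusted (effect, error) degrees of freedom."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_subjects - 1)
    return df_effect, df_error


def implied_epsilon(df_error_reported: float, n_levels: int, n_subjects: int) -> float:
    """Recover epsilon from a reported, already adjusted error df."""
    return df_error_reported / ((n_levels - 1) * (n_subjects - 1))


# Three avatar conditions, 20 participants, reported error df of 26.71.
eps = implied_epsilon(26.71, n_levels=3, n_subjects=20)
df_effect, df_error = gg_adjusted_df(eps, n_levels=3, n_subjects=20)
print(f"epsilon ~ {eps:.3f}, df = ({df_effect:.2f}, {df_error:.2f})")
```

With ε of about 0.70, the unadjusted degrees of freedom (2, 38) shrink to about (1.41, 26.71), consistent with the reported pair.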
For the self-motion sensation, the ANOVA revealed significant main effects for the avatars [F(1.41, 26.71) ...] (Figure 6A). A post-hoc analysis of the avatars did not show any significant difference across the conditions, although the scores were slightly higher for the full-body avatar, compared to the others [t(19) = 2.13, adj. p = 0.101 between the full-body and hands-and-feet-only avatar conditions; t(19) = 2.29, adj. p = 0.101 between the full-body and no-avatar conditions]. The scores were higher in the synchronous vibration condition than in the asynchronous vibration condition, irrespective of the avatar conditions.
For the walking sensation, the ANOVA revealed significant main effects for the avatar [F(1.59, 30.22) ...] (Figure 6B). The post-hoc analysis of the avatar conditions showed that the scores for the full-body avatar were significantly higher than those for the hands-and-feet avatar [t(19) = 3.40, adj. p = 0.003] and the no-avatar conditions [t(19) = 4.45, adj. p < 0.001]. Additionally, the post-hoc analysis showed that the scores for the hands-and-feet ... (Figure 6D). The post-hoc analysis of the avatar conditions showed that the scores for the full-body avatar were significantly higher than those for the hands-and-feet avatar [t(19) = 3.36, adj. p = 0.009] and no-avatar conditions [t(19) = 3.41, adj. p = 0.009]. Furthermore, the post-hoc analysis showed that the scores for the hands-and-feet avatar were significantly higher than those for the no-avatar condition [t(19) = 2.12, adj. p = 0.047]. The scores were higher in the synchronous vibration condition than in the asynchronous vibration condition, irrespective of the avatar conditions.
To summarize, the synchronous or rhythmic foot vibrations induced higher sensations of self-motion, walking, leg action, and telepresence in all the avatar conditions. The full-body avatar induced greater walking and leg action sensations and telepresence than the other conditions, irrespective of the foot vibration timing. The walking sensation and telepresence induced by the hands-and-feet-only avatar were higher than those of the no-avatar condition and lower than those of the full-body-avatar condition.

Experiment 2: Third-Person Perspective
We calculated the mean digitized VAS data (0-100) for each item and tested the normality of the data (Shapiro-Wilk test, α = 0.05). The data of self-motion and telepresence significantly deviated from normality, whereas the others did not violate normality. Thus, we conducted a two-way repeated measures ANOVA with ART on the non-parametric data (self-motion and telepresence) based on three avatar conditions (full body, hands-and-feet-only, and no avatar) and two foot-vibration conditions (synchronous and asynchronous timing). The Tukey method with Kenward-Roger degrees of freedom approximation was applied for the post-hoc multiple comparison of the non-parametric ANOVA with ART. We conducted a two-way repeated measures ANOVA on the parametric data (walking and leg action). When Mendoza's multisample sphericity test exhibited a lack of sphericity, the reported values were adjusted using the Greenhouse-Geisser correction. Shaffer's modified sequentially rejective Bonferroni procedure was applied for post-hoc multiple comparisons for the parametric ANOVA.
For self-motion, the ANOVA with ART revealed significant main effects for the avatar [F(2, 38) = 12.07, p < 0.0001, ηp² = 0.39] and foot vibrations [F(1, 19) = 45.85, p < 0.0001, ηp² = 0.71] (Figure 7A). The post-hoc analysis of the avatar conditions showed that the scores for the no-avatar condition were significantly higher than those for the full-body avatar [t(19) = 3.72, adj. p = 0.0018] and hands-and-feet-only avatar [t(19) = 4.64, adj. p = 0.0001]. The scores were higher in the synchronous vibration condition than in the asynchronous vibration condition, irrespective of the avatar conditions.
For leg action, the ANOVA revealed only a significant main effect for foot vibrations [F(1, 19) = 66.08, p < 0.0001, ηp² = 0.78] (Figure 7C). The scores were higher in the synchronous vibration condition than in the asynchronous vibration condition, irrespective of the avatar conditions.
For telepresence, the ANOVA with ART revealed significant main effects for the avatar [F(2, 38) = 8.77, p < 0.0001, ηp² = 0.32] and foot vibrations [F(1, 19) = 33.33, p < 0.0001, ηp² = 0.64], and interaction [F(2, 38) = 4.37, p = 0.0195, ηp² = 0.19] (Figure 7D). The analysis of the simple main effects showed that the scores were higher in the synchronous than asynchronous vibrations in all avatar conditions [full-body avatar: F(1, 19) ...]. The post-hoc multiple comparison analysis of the simple effects of the avatar difference showed that the scores for the no-avatar condition were significantly higher than those for the full-body avatar [t(19) = 3.78, adj. p = 0.002] and hands-and-feet avatar conditions [t(19) = 4.69, adj. p = 0.0001] in the synchronous condition, whereas the scores for the no-avatar condition were significantly higher than those for the hands-and-feet avatar condition [t(19) = 3.20, adj. p = 0.008] in the asynchronous condition.
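The partial eta squared values reported for experiment 2 can be cross-checked against the F statistics and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the reported values; it is only a consistency check on the published numbers, not a reanalysis of the data, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("self-motion, avatar",       12.07, 2, 38, 0.39),
    ("self-motion, vibration",    45.85, 1, 19, 0.71),
    ("leg action, vibration",     66.08, 1, 19, 0.78),
    ("telepresence, avatar",       8.77, 2, 38, 0.32),
    ("telepresence, vibration",   33.33, 1, 19, 0.64),
    ("telepresence, interaction",  4.37, 2, 38, 0.19),
]

recomputed = [partial_eta_squared(f, d1, d2) for _, f, d1, d2, _ in reported]
for (label, *_rest, eta_reported), eta in zip(reported, recomputed):
    print(f"{label}: recomputed {eta:.2f}, reported {eta_reported:.2f}")
```

All six recomputed values round to the reported effect sizes.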
These results indicated that the avatar did not significantly enhance any of the walking-related sensations from the third-person perspective. In contrast, both the full-body and hands-and-feet avatars impaired the self-motion sensation and telepresence, irrespective of the foot vibrations, in comparison to the no-avatar condition.

Summary of Results
We investigated whether the avatar enhanced the virtual walking sensation. Experiment 1 showed that the full-body avatar with the first-person perspective enhanced the sensations of walking, leg action, and telepresence, with either synchronous or asynchronous foot vibrations. Experiment 2 showed that the full-body avatar with the third-person perspective did not enhance the sensations of walking and leg action; rather, it impaired the sensations of self-motion and telepresence in comparison to the no-avatar condition. These results suggest that the role of the avatar in virtual walking differs, depending on whether it is viewed from the first- or third-person perspective. Furthermore, experiment 1 showed that the hands-and-feet-only avatar with the first-person perspective enhanced the walking sensation and telepresence, in comparison with the no-avatar condition; however, the effect was less pronounced than that of the full-body avatar.

Enhancing Effect of Walking Avatar From the First-Person Perspective
The effects of the avatar from the first-person perspective were significant in terms of walking sensation, leg action, and telepresence, but not significant in terms of self-motion sensation. When the participants observed the avatar walking, they felt an illusory sense of the agency of walking (Kokkinara et al., 2016). Hence, the walking sensation and leg action would be enhanced by the illusory agency of the walking action. The sense of agency and/or illusory body ownership of the avatar can improve telepresence. Furthermore, the effect was observed in the hands-and-feet-only avatar in terms of the walking sensation and telepresence from the first-person perspective. This is reasonable because the hands-and-feet-only avatar induced an illusory ownership of the full body by interpolating the hands and feet (Kondo et al., 2018, 2020).

Enhancing Effect of Foot Vibrations
Synchronous or rhythmic foot vibrations enhanced the sensations of self-motion, walking, leg action, and telepresence, irrespective of the avatar conditions, both in the first- and third-person perspectives. This is consistent with the results of a previous study (Kitazaki et al., 2019). However, the full-body and hands-and-feet-only avatars might be expected to be more effective with synchronous than with asynchronous foot vibrations because the synchronicity between the avatar's footsteps (visual stimuli) and foot vibrations (tactile stimuli) should improve the illusory body ownership of the avatar, similar to the rubber-hand illusion (Botvinick and Cohen, 1998). We speculate that the effect of rhythmic foot vibration on the spinal locomotion network (central pattern generator) was much stronger than the effect of illusory ownership of the walking avatar. Because the effect of foot vibration was more prominent in the walking and leg action sensations than in the self-motion and telepresence (see Figures 6, 7), the rhythmic foot vibrations simulating footsteps were particularly effective for the active components of walking.

Impaired Sensations by the Avatar From Third-Person Perspective
The self-motion sensation and telepresence induced by the avatar in the third-person perspective were impaired, compared to those of the no-avatar condition; furthermore, the avatar exerted no significant influence on the walking and leg action sensations, irrespective of the foot vibration conditions. This may be explained by the participants comparing the no-avatar condition, which was effectively a first-person perspective, with the explicit third-person perspective conditions where the full-body or the hands-and-feet avatar was located in front of the participants. The first-person perspective is more immersive than the third-person perspective (Monteiro et al., 2018). This might result in heightened self-motion sensation and telepresence, compared to the no-avatar condition, as well as a juxtaposition of the advantages of the avatar and the disadvantage of the third-person perspective in terms of walking and leg action sensations.
In the third-person perspective, the avatar condition exerted a greater influence on the telepresence in the synchronous condition than in the asynchronous condition: the scores for the no-avatar condition were significantly higher than those for the full-body avatar and hands-and-feet avatar conditions in the synchronous condition, but higher only than those for the hands-and-feet-only avatar condition in the asynchronous condition. Thus, the congruent foot vibration with the third-person avatar impaired telepresence, in comparison to the no-avatar condition. This might be due to a strange feeling of synchrony induced by the foot vibrations and the visual footsteps of the avatar in front of the observer. We speculate that the user's presence might be dissociated between the location of the viewpoint and that of the avatar in the third-person perspective because the sense of illusory body ownership or out-of-body experience was not sufficient in the experiment.
The scores for the no-avatar condition in experiment 2 were higher than those in experiment 1. This might be due to the exclusion of mirrors from the scene in experiment 2. We speculate that mirrors could interfere with the sensations of walking and telepresence because the visual motions in them were tiresome and confusing. As an exploratory analysis to investigate this difference between the experiments, we conducted a three-way mixed ANOVA based on three avatar conditions (full body, hands-and-feet only, and no avatar) and two foot-vibration conditions (synchronous and asynchronous timings) as within-subject variables (repeated factors) and two experiments as a between-subject variable because the participants were different. We applied the ANOVA for the walking sensation (parametric data) and the ANOVA-ART for self-motion, leg action, and telepresence (non-parametric data). We found significant interactions of the experiments and the avatar for all sensations [self-motion: F(2, 76) = 10.21, p = 0.0001, ηp² = 0.21; walking: F(2, 76) = 8.83, p = 0.0004, ηp² = 0.19; leg action: F(2, 76) = 4.40, p = 0.02, ηp² = 0.10; telepresence: F(2, 76) = 13.24, p < 0.0001, ηp² = 0.26]. The analysis of simple effects showed that the scores of experiment 2 (without mirror) in the no-avatar condition were significantly higher than those of experiment 1 (with mirror) for the sensation of walking [F(1, 38) = 7.46, p = 0.0100, ηp² = 0.16] and telepresence [F(1, 38) = 4.87, p = 0.0335, ηp² = 0.11], but not significantly different for the sensations of self-motion and leg action. For telepresence only, the three-way interaction of the experiment × avatar × foot vibration was significant [F(2, 76) = 6.61, p < 0.0001, ηp² = 0.15].
The analysis of simple effects showed that the scores for experiment 2 (without mirror) in the no-avatar condition were significantly higher, compared to experiment 1 (with mirror), only in the synchronous condition [F(1, 38) = 11.56, p = 0.0002, ηp² = 0.23]. This result indicated that the presence of mirrors impaired telepresence only when there was no avatar and the foot vibration was rhythmical. We cannot fully explain this result; however, we may conjecture that the absence of a reflection of the observer's own body in the mirrors, combined with the rhythmical foot vibration, might reduce the sense of body ownership and/or the sense of self-location and impair the telepresence in experiment 1.

Limitations
The enhancing effect of the hands-and-feet-only avatar was less than that of the full-body avatar. This might be owing to the lack of visual-motor synchronicity between the avatar and user. Without the avatar's head, the movements of the hands-and-feet-only avatar were independent of the user, whereas the full-body avatar's head was associated with the observer's head motion. We speculated that only a body image similar to the biological motion with a point-light display (Johansson, 1973; Troje, 2008; Thompson and Parasuraman, 2012) in the hands-and-feet-only avatar might induce illusory body ownership and agency. However, visual discontinuity of the body, such as an arm with a missing wrist or an arm with a thin rigid wire connecting the forearm and the hand, weakens the illusion of body ownership (Tieri et al., 2015). The weakened sense of body ownership owing to the discontinuous body stimulus might decrease the virtual walking sensations, compared to the full-body avatar. Foot vibrations synchronous with the footsteps of the hands-and-feet-only avatar might contribute to illusory body ownership; however, the effect of the asynchronous foot vibrations was significant. This issue is yet to be elucidated and must be further investigated in future studies. The hands-and-feet-only avatar required less computing power and presented less conflict arising from appearance difference between the actual body and the avatar's body; however, its effect was limited, and a tradeoff was observed. If this limitation can be overcome, then the hands-and-feet-only avatar will be useful.
Furthermore, although mirrors are effective for viewing the self-avatars in the first-person perspective, it is unnatural and tiresome to have many mirrors in the scene. In our virtual walking system, we eliminated mirrors such that the self-avatar was observed from a third-person perspective in experiment 2. However, we did not demonstrate the enhancing effect of the avatar from the third-person perspective. Some studies have revealed that the third-person perspective improves spatial awareness in non-immersive video games (Schuurink and Toet, 2010; Denisova and Cairns, 2015) and decreases motion sickness in VR games (Monteiro et al., 2018). Hence, the third-person perspective offers several advantages. A novel technique was proposed in a recent study that combined the advantages of the first- and third-person perspectives using immersive first-person perspective scenes with a third-person perspective miniature with an avatar (Evin et al., 2020). For future studies, we shall consider a method that combines the advantages of the first- and third-person perspectives in a virtual walking system without a mirror.
Although the floor of the scene was wood, we used the sound of feet treading an asphalt-paved road for foot vibrations. We chose this sound based on our preliminary observation of consistency with the scene; the footstep sound was clear and suitable for foot vibration. Inconsistency between the sound and the floor in VR could affect the sensation of walking, as revealed by Turchet et al. (2012). The effect of the consistency of the vibration and the nature of the ground should be examined in future studies.
We did not measure the illusory body ownership of the avatars in this study. Particularly from the third-person perspective, how much the participants embodied the avatar and whether self-localization was shifted toward the avatar are crucial for considering the effect of the avatar on the illusory walking sensations. Illusory ownership should be measured and compared with walking-related sensations in a future study to clarify the relationship between illusory body ownership and illusory walking sensation.
Furthermore, we did not use standard questionnaires to measure presence (Usoh et al., 2000) and simulator sickness (Kennedy et al., 1993), although these are crucial issues in virtual walking systems. We shall use these questionnaires to measure presence and motion sickness in our virtual walking system in future studies. In a preliminary observation, some participants experienced motion sickness when the optic flow included perspective jitter. Kokkinara et al. (2016) demonstrated that perspective jitter simulating head motion during walking reduces the sense of agency. Thus, we did not include perspective jitter in the present study. Some studies have shown that vection intensity correlates with the degree of motion sickness (Diels et al., 2007; Palmisano et al., 2007; Bonato et al., 2008). However, several factors that modulate the intensity of vection do not affect motion sickness (Keshavarz et al., 2015, 2019; Riecke and Jordan, 2015). This is an open question that should be investigated further to develop an effective virtual walking system without discomfort arising from motion sickness.

CONCLUSIONS
Using optic flow, foot vibrations simulating footsteps, and a walking avatar, we developed a virtual walking system for seated users that does not require limb action, and we discovered that the full-body avatar in the first-person perspective enhanced the sensations of walking, leg action, and telepresence. Although the hands-and-feet-only avatar in the first-person perspective enhanced the walking sensation and telepresence more than the no-avatar condition, its effect was less prominent than that of the full-body avatar. However, the full-body avatar in the third-person perspective did not enhance the sensation of walking. Synchronous or rhythmic foot vibrations enhanced the sensations of self-motion, walking, leg action, and telepresence, irrespective of the avatar condition. The hands-and-feet-only avatar could be useful for enhancing the virtual walking sensation because of its easy implementation, its low computational cost, and the reduced conflict between the appearance of the user's actual body and that of the self-avatar.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethical Committee for Human-Subject Research at the Toyohashi University of Technology. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YM, JN, TA, YI, and MK conceived and designed the experiments. JN collected and analyzed the data. YM, JN, and MK contributed to the manuscript preparation. All authors have reviewed the manuscript.

FUNDING
This research was supported in part by JST ERATO (JPMJER1701) to MK, JSPS KAKENHI JP18H04118 to YI, JP19K20645 to YM, and JP20H04489 to MK.

ACKNOWLEDGMENTS
We would like to thank Editage (www.editage.com) for English language editing. Preliminary analysis of Experiment 1 was presented at the IEEE VR conference (Matsuda et al., 2020) as a poster presentation.