Embodiment and Performance in the Supernumerary Hand Illusion in Augmented Reality

In teleoperations, robots are generally used because related tasks are too dangerous, uncomfortable or impossible for humans to perform. When using augmented reality to control robotic limbs in such teleoperations, it is essential to understand how these extra virtual limbs are experienced. In particular, the relationship between the embodiment experience of the user and relevant outcomes such as task performance must be examined. In this article, we study the relationship between experienced embodiment of a supernumerary virtual arm that acts alongside a user’s two real arms, and their task performance in augmented reality. Specifically, we compare how well users can trace a virtual half ring placed just outside of personal space using their virtual arm in a condition where there is expected to be low embodiment (a floating disconnected hand) and a condition where there is expected to be high embodiment (a connected arm and hand). Embodiment is measured quantitatively through skin conductance response and qualitatively through ownership, agency, and self-location questionnaires. Performance is measured in terms of tracing precision. The results show positive correlations between subjective ownership and agency, and agency and performance, but no correlation between subjective or objective ownership and performance. Also, ownership ratings were low overall, while the agency ratings were significantly higher for the disconnected hand condition than the connected arm condition, as was performance. Notably, the presence of the virtual arm evoked incorrect expectations of the movement capabilities of the arm, which may have contributed to an overall preference for the unrealistic disconnected hand over the more realistic connected arm in this particular task. 
Our results imply that methods to increase performance in various teleoperations can indeed be found in the experience of embodiment: not necessarily directly through ownership, but through ownership mediated by agency.


INTRODUCTION
The way we experience virtual avatars is a much discussed topic across academic disciplines. In a previous work we have summarized these discussions in relation to gaming and found a common advocacy for a form of embodiment similar or related to that from everyday life, specifically comprising the feelings of having, controlling, and being in a body (Rosa et al., 2018). Notably, since the dawn of the rubber hand illusion (RHI) (Botvinick and Cohen, 1998), the number of studies that apply this experimental paradigm to examine the sense of embodiment (i.e., body ownership, agency, and self-location) (Kilteni et al., 2012) has steadily grown. These studies were initially, and still are, performed in reality, but an increasing number of versions of this experiment have also been performed in virtual reality (VR) and augmented reality (AR). With the nature of these technologies, the question arises whether experiencing a sense of embodiment should be a goal of the simulation in itself, or whether it can be used as a means to achieve other goals, such as eliciting certain emotional responses (Waltemate et al., 2018) or altering motor behaviour of virtual body parts (Burin et al., 2019).
For example, a relation between embodiment and task performance has been posited. Such a relation would expand and further strengthen the application of VR and AR in domains beyond entertainment such as healthcare, education, and teleoperations. Recently, it has been postulated that embodiment of a remote manipulator can improve dexterous performance, based on evidence from VR and prosthesis use (Toet et al., 2020). A number of recent studies examine this postulate through only body ownership and have found mixed results. For example, Grechuta et al. found clear positive correlations between body ownership and performance in various tasks in VR (Grechuta et al., 2017;Grechuta et al., 2019). In contrast, Shin et al. found that greater body ownership may cause an increased risk perception, which in turn leads to degraded performance in pick-and-place tasks in VR (Shin et al., 2021).
Notably, these works only exploit the use of VR technology, but do not study the same effects in AR. With respect to recently presented AR games and demonstrations, we see an inclination towards a first-person perspective implementation where the user interacts with virtual objects using the real hands. However, once the user wants to interact with far away objects, a form of gestural interaction is often required, resulting in a divided interaction experience. Using a third virtual hand that interacts in a similar fashion as the real hand may amend this. Beyond gaming, it is clear that AR offers valuable technological advances such as, in the field of robotic teleoperation, embedding the robot video stream within the user's view rather than on a separate display, as it no longer divides the user's attention (Hedayati et al., 2018). We suggest that embedding the object for far away interaction into the user's body, namely a virtual hand, may provide similar benefits. Studies on body ownership in AR have shown that the medium still allows the experience of ownership of a virtual arm (Suzuki et al., 2013;Rosa et al., 2019), although it may weaken the experience compared to reality (Škola and Liarokapis, 2016) and VR (IJsselsteijn et al., 2006). However, it remains unclear whether extending the body with a virtual hand in AR rather than replacing the real hand by a virtual hand as is typical in VR, may affect any possible relation between ownership and performance.
Interacting with objects through virtual hands in AR requires continuous action, in which case there is evidence for a relation between the sense of agency and task performance (Wen et al., 2015). The link, then, between the sense of ownership and performance may be through the sense of agency. If we take into account the evidence that a sense of ownership may facilitate a sense of agency (Kalckert and Ehrsson, 2012), one may suggest that agency is a possible mediator of the relation between a sense of ownership and performance. A small number of studies in VR have measured these three phenomena simultaneously (Egeberg et al., 2016;Laha et al., 2016), but it is not common that the role of agency is further studied. The goal of this study is to investigate the relation between the sense of ownership and task performance in an AR task. Specifically, the objective is to investigate whether this relation is established through the sense of agency in an AR version of the RHI, namely the augmented reality supernumerary hand illusion (ARSHI).

Ownership and Agency
Research on the possible link between ownership and agency has mixed results, see Braun et al. (2018) for a thorough review. Here, we highlight a few recent studies. Tsakiris et al. used fMRI to compare activated brain areas in an RHI paradigm, with factors movement (active or passive) and visual feedback (synchronous or asynchronous), measured through questionnaires (Tsakiris et al., 2010). Their questionnaire responses to ownership and agency over the rubber hand both followed patterns typical to other RHIs, supporting an additive model of agency to ownership (i.e., agency entails body ownership). The neuroimaging data, on the other hand, showed that ownership and agency were associated with distinct and exclusive patterns of activation, supporting an independence model (i.e., they are qualitatively different experiences). The authors argue that the inconsistent results could be explained by the participants using common sense while responding to the questions, and that there may also not be a one-to-one mapping between brain activity and conscious experience.
Another study supporting the independence model is that of Kalckert and Ehrsson. Here, the authors performed a series of four experiments in an RHI paradigm to simultaneously measure ownership (through proprioceptive drift and questionnaire) and agency of the fake hand (by questionnaire) (Kalckert and Ehrsson, 2012). In the first pair of experiments they used factor movement timing (synchronous or asynchronous) while the rubber hand was passively moved, and found that both ownership and agency were experienced in the synchronous condition, and in the asynchronous condition there was lower but positive agency and no ownership. They found no correlation between proprioceptive drift and agency in either condition. In a second pair of experiments, they used factors movement mode (active or passive) and hand position (congruent versus incongruent) with synchronous movement. The questionnaire responses found a double dissociation of the two experiences: there was strong ownership (in both measures) and agency in the active congruent condition, ownership but no agency in the passive congruent condition, agency but no ownership in the active incongruent condition, and neither in the passive incongruent condition. Ownership and agency statements were positively correlated in the active congruent condition, but not in the other conditions. In summary, the results suggested ownership and agency were independent processes, and ownership modulated agency, that is, stronger agency was experienced when the hand model was owned.
A study supporting the additive model is that of Burin et al. The authors combined an RHI paradigm with a sensory attenuation (SA) paradigm to examine how body ownership contributes to agency, by only using self administered shocks (and not also by an "other" as is typical in SA alone) in three ownership-related conditions (Burin et al., 2017). In the synchronous visuotactile condition where ownership ratings and proprioceptive drift were high, intensity ratings were low and agency questionnaire responses were high, meaning the movement of the fake embodied finger was subjectively misattributed to the participant's own will and the stimulus intensity delivered by that finger was attenuated. On the other hand, in the two conditions where ownership ratings and drift were low (asynchronous visuotactile, and synchronous visuotactile with incongruent hand position), intensity ratings were high and agency questionnaire responses were low. This means that the movement of the fake not-embodied finger was not misattributed to the participant's own will and the stimulus intensity delivered by that finger was not attenuated. In summary, in the absence of (intent of) motor actions, when participants experienced ownership they also experienced agency, and when there was no ownership there was also no agency. The authors conclude: "owning the body would lead to the inference 'since this is my body part, any action would be intended by me'".
Lastly, in a similar study to that of Burin et al., Pyasik et al. perform two separate experiments on the same group of participants, one measuring ownership over a fake hand in an RHI through proprioceptive drift and questionnaire, the other measuring experienced agency in Libet's clock paradigm through intentional binding, intensity attenuation and questionnaire (Pyasik et al., 2018). Both experiments had the typical result patterns. These results were subsequently examined for correlations, and the authors found a positive correlation between proprioceptive drift and attenuation, and importantly, no correlation between both questionnaires. These results are in contrast to the study by Tsakiris et al. discussed above, in that here the questionnaire responses showed no overlap, whereas the quantitative measures did. The authors explain this may be because that study and many more use movement of an embodied fake hand to examine both ownership and agency. They conclude that spatiotemporal constraints in integrating sensory-related signals are common to both body ownership and sense of agency, supporting an additive model, whereas their subjective experience would rely on additional processes specific for any given sense, supporting an independence model.
To summarize, the discrepancy of results concerning the link between agency and ownership seems to depend on the type of measure, where qualitative measures typically find an overlap in experiences and quantitative measures do not, and whether movement was used to elicit agency, which may also accommodate body ownership. For VR and AR, the use of questionnaires combined with movement to examine body ownership and agency are in the majority compared to other measures and setups, meaning there may be a bias towards an additive model in the literature. Nonetheless, since the purpose of this study is to examine performance in a sensorimotor task which is to be executed by moving limbs, we hypothesize that there will be a positive correlation between experienced ownership and agency of the virtual hand (H1a).
RHI-related studies in AR are rare, so it is not clear whether factors from reality and VR may also influence experienced ownership and agency in AR in a similar manner. In our previous ARSHI study (Rosa et al., 2019), varying ownership experiences seemed to rely mostly on increasing numbers of synchronous multimodal feedback, while agency relied on the presence of visuomotor feedback, regardless of synchronicity. However, in terms of a task, one would not rely on decreasing the amount of information given to the participant, nor on providing asynchronous visuomotor feedback, as these could hamper performance regardless of ownership or agency. A more appropriate factor for investigating the relation between these phenomena is then connectedness of a virtual hand. Tieri et al. (2015) found that participants only experienced ownership and vicarious agency (i.e., virtual arm moved but participants stayed still) when the arm was completely connected, and not when arm segments were missing. Therefore, in our experiment, we hypothesize that ownership of the virtual hand will be higher in a connected condition than in a disconnected condition (H2a), and agency will similarly be higher in a connected condition than in a disconnected condition (H2b). We also hypothesize that as in our previous ARSHI study (Rosa et al., 2019), there will be a shift in experienced hand-location (H2c).
Wen et al. (2015) showed that action-feedback association (i.e., congruence between predicted and actual sensory information) and goal-directed inference (i.e., how well one was performing) both influenced the judgement of subjective agency in a continuous action task. They also showed that when the comparison between continuous action and feedback is difficult, then goal-directed inference plays a dominant role in judging agency. The experiment consisted of a key pressing task, where participants had to move a dot to a target by pressing arrow keys. They used conditions self-control versus assisted (i.e., incorrect key presses resulted in no movement, thus by definition better performance), and action delay of 100, 400, and 700 ms. Performance was measured through duration, number of keys pressed and frequency of keys pressed. Agency ratings and the three performance measures all showed the same effects, namely they increased as delay increased, and were higher in the assisted condition than in the self-control condition. A multivariate analysis was used to estimate the relative influence of task performance on the sense of agency, and found assistance influenced the sense of agency indirectly via task performance, and delay influenced the sense of agency directly. The participants felt a strong sense of agency when their task performance improved via computer assistance, even though a large proportion of their commands were not executed. This would suggest a correlation between agency and performance, even though the performance was not necessarily increased by the participants themselves.

Agency and Performance
Informally, one may suggest that a greater sense of being in control also coincides with better motor control. Possibly, experiencing more agency over a virtual hand makes the interaction performed by that hand feel more "natural" to the participant than if there were no sense of agency. One could then suggest that the interaction may require fewer cognitive resources. The eliciting of a sense of agency is typically described to arise from a comparison between prediction and result, and Hon et al. (2013) showed that these comparisons are consciously performed. In their experiment, participants rated agency after moving a dot on a screen by pressing arrow keys, while they concurrently were asked to memorize two or six consonants which they were tested on, as a means of low and high load conditions, respectively. They found that agency ratings were significantly lower in the high load condition than in the low load condition. This would suggest that, since resources from a cognitive resource pool are already allocated in order to elicit the sense of agency, fewer resources remain for task execution, which is in contrast to the idea of performing better when the interaction is more natural. However, this does not explain how studies in VR consistently find higher task performance coinciding with a higher sense of agency (Egeberg et al., 2016;Laha et al., 2016). These studies are further discussed below.

Ownership and Performance
Very few studies have examined the relation between body ownership and task performance directly. Older works have studied the relationship between performance and presence, a concept very related to the sense of embodiment, which can be defined as the experience of being present in a virtual environment. Snow confirmed that there was a positive relation between presence and performance of simple tasks related to a VR system's parameters, but that this relation was weak, and does not speak to the cause of this relation (Snow, 1998). Indeed, Welch describes the idea of presence causing better performance as a scientific urban legend, without there being evidence to support the causality (Welch, 1999).
In a more recent study on virtual wings in VR, Egeberg et al. (2016) investigated the role of different types of sensory feedback on body and wing ownership and agency using three conditions: only visuoproprioceptive feedback (no movement), only visuomotor feedback (rotating shoulders made the wings flap), and visuomotor and visuotactile feedback (during flapping). While visuoproprioceptive feedback alone did not in fact elicit any ownership or agency over the body or wings, the other two conditions did, where visuotactile feedback enhanced ownership and agency over the wings, but not the body. In a subsequent task participants were instructed to hit green balls and avoid red balls that were shot at them from a cannon, with or without visuotactile feedback. Although participants were equally good at hitting green balls in both visuomotor conditions, participants were able to avoid more red balls in the condition without visuotactile feedback. In summary, although participants experienced ownership and agency over both body and wings in both visuomotor conditions, performance was higher when there was no visuotactile feedback, which coincided with lower (but still positive) ownership and agency over the wings. This would suggest that a greater sense of ownership and/or agency can correspond to worse performance in a task, but the study lacks a correlation analysis, which makes it difficult to interpret whether such a decrease in performance is caused by an increase in ownership alone, agency alone, both, or neither of them.
Similarly, Laha et al. (2016) investigated the influence of control schema for a three-armed body in VR on task performance and, among other measures, body ownership and agency. They compared unimanual control (one wrist uses vertical and horizontal rotation for vertical and horizontal translation), bimanual (one wrist uses vertical, other horizontal) and head control (head uses vertical and horizontal rotation) of a third elongated arm protruding from the chest. Participants were instructed to touch three target cubes, each target being located in its own 3-by-3 array of cubes: to the left and 0.8 m in front of the participant, centered and 1.3 m in front of the participant, and to the right and 0.8 m in front of the participant. The results showed that participants completed touching three cubes fastest in the head control condition followed by the uni- and bimanual conditions. Body ownership was higher in the unimanual and head conditions than in the bimanual condition, and agency was higher in the head condition than in the bimanual condition. In summary, the sense of ownership and agency coincided with greater performance, but again since the focus of the study was the control schemes rather than the relation between the three phenomena, no correlation analysis was performed.
In another recent study, Burin et al. (2019) examined the effects of ownership and agency on the ability to draw straight lines in VR. Viewpoint was altered (first person perspective using right hand versus third person perspective using left hand), and ownership and agency of the virtual hand were measured after a baseline phase where participants were instructed to draw straight lines and simultaneously saw straight lines being drawn (matching their own drawing), and after a deviation phase where they instead saw curved lines being drawn (not matching their attempted straight lines). Correlation results showed that participants that reported a greater sense of body ownership, regardless of after which phase, were more inclined to follow the curved lines in their real drawings. That is, body ownership influenced motor actions. Moreover, there was a positive correlation between ownership of the virtual hand reported after the deviation phase and curve in the drawing, and also between agency after the deviation phase and curve in the drawing. It should be noted that ownership was maintained through the entire experiment in the first person perspective, but agency was only experienced prior to the deviation phase. If we interpret a more curved drawing as a worse performance, since they were instructed to draw a straight line, then these results would suggest that when there is visuomotor discrepancy, greater experience of ownership over a virtual hand can result in worse performance. The authors explain that motor control can behave differently depending on whether the errors between predicted and actual feedback are causally attributed to the body or the environment.
Grechuta et al. (2017) drew evidence from brain activity studies that showed an overlap in the brain areas corresponding to body ownership, and those corresponding to motor control, and further investigated this overlap by means of a sensorimotor task in an RHI paradigm. Participants were instructed to press a button as soon as the fake hand was stroked under congruent visuotactile stimulation, incongruent haptic and incongruent visual stimulation. The authors found that ownership was higher and reaction times were lower in the congruent condition compared to the incongruent conditions, and a significant negative correlation between ownership and reaction time. It is explained that this confirms a functional role of ownership in the domain of motor control. In a next study the authors further examine this relationship, by arguing that ownership in RHIs using movement relies on an internal forward model, which in turn integrates signals from both proximodistal and purely distal sensory cues relevant to the task (Grechuta et al., 2019). Therefore, incongruent distal cues should impede both performance and the eliciting of ownership. This was confirmed in a VR air hockey experiment, where a condition with congruent distal cues was compared to a condition with incongruent distal cues. Both performance and sense of ownership were higher in the congruent condition than the incongruent conditions, while agency ratings did not differ between conditions but were nonetheless high, as was expected since there was no change in visuomotor congruency.
Lastly, in a very recent study the so far positive relationship between ownership and performance was challenged by the notion of risk of danger in the context of VR-based machinery teleoperation (Shin et al., 2021). Participants performed pick and place tasks on a conveyor belt, during which a "raw material" had to be placed in a metal press machine for quick pressing (high risk) or slow pressing (low risk), using either a realistic hand or a robot hand. The results showed that body ownership significantly increased the risk perception during the operation, and was not moderated by actual risk of danger. Moreover, risk perception was negatively associated with work performance.
In summary, many studies have consistently found high ownership coinciding with high performance, but recent studies have found scenarios in which this suggested positive correlation becomes a negative correlation. However, one such scenario where intended sudden movement error was introduced seems unlikely in a scenario where high performance is desirable. Furthermore, although not supported through reasoning of allocated cognitive resources, studies repeatedly find coinciding agency and task performance. We hypothesize, therefore, that in a completely safe task performed within an ARSHI paradigm, any link between ownership and performance is mediated by agency (H1b).

Design
To investigate the relation between embodiment and performance, our experiment included one factor "connectedness", referring to the connectedness of the virtual hand, and was performed in a within-subjects design. We emphasize that the purpose of using the two conditions is to introduce variation in the responses to ownership and agency, in order to correlate both negative and positive ownership and agency responses to the performance values. The sense of ownership is measured by means of a questionnaire accompanied by galvanic skin responses (GSRs) in response to a threat. The senses of agency and self-location are also measured by means of a questionnaire. The Medical Research Ethics Committee of Utrecht did not raise objections to the execution of this experiment.

Participants
Twenty-three participants took part in the experiment (13 female, 10 male), with mean age 30.5 (SD 9.5, range 19-47). Inclusion criteria were: between 18 and 50 years of age, right-handed, light skin color (to match as much as possible with the virtual arm/hand model), not a right arm/hand/finger amputee, no prosthetic on right arm/hand/finger, no scars or tattoos on right hand, and no experience with severe motion sickness or cybersickness.

Material
To create a video see-through AR setup, a ZED mini camera was mounted onto an HTC Vive. The ZED mini lens has a maximum field of view of 90° (horizontal) × 60° (vertical) × 100° (depth), and can reach 60 frames per second with a side-by-side output resolution of 2,560 × 720 pixels. The HTC Vive offers a 110° field of view, a maximum refresh rate of 90 frames per second and a combined resolution of 2,160 × 1,200 pixels (1,080 × 1,200 pixels per eye). A Vive Tracker was placed on the table to determine the center of the interaction space, and another was strapped to the right wrist of the participant. The experiment was run on a Lenovo Legion T730-28ICO 90JF with a GEFORCE RTX 2080 Super graphics card and an Intel Core i9 processor. The project was created in Unity 2019.2.17f1 and Visual Studio 2019. The scene was visualized using SteamVR 1.15.19 and the SteamVR Unity Plugin 2.6.1. The "VR Hands and FP Arms Pack" by NatureManufacture was used for the arm and hand, where in the latter case the arm was removed to create a single hand, see Figure 1. The "Final IK" package by Rootmotion was used to allow the arm segments to move naturally. The "Modern Combat Knife" by Float3D was used for the knife threat. A Biosemi set was used to measure GSR. The acquisition software ran on a separate Dell Latitude E6540 laptop. Using a Biosemi trigger interface, triggers were sent from the Unity project to the acquisition device through a serial port. The individual output measures were primarily analyzed using IBM SPSS Statistics 24 and supplemented by correlation analyses performed in RStudio 1.2.1335. Post hoc power analyses for the condition comparisons were performed in GPower 1.3, and for the correlations in IBM SPSS Statistics 27.

Procedure
Participants signed a consent form and washed their hands with a mild non-abrasive soap upon arrival at the laboratory. The experimenter attached four electrodes to the left hand: two to the thenar and hypothenar eminences, and two to the distal phalanges of the index and middle fingers. The experimenter helped the participant put on the HMD and the tracker on the right wrist. The participant then sat in an indicated start position at a table, with hands 30 cm away from the table's edge and 50 cm apart, while looking straight forward. The experimenter started the first condition, and the participant could see the room through the HMD, but with an added virtual arm or hand, see Figure 1. The practice session then started, during which the participant could move the virtual limb for 90 s to learn how its movement corresponded to the movement of the tracker. The position of the tracker determined the position of the virtual fingertip in both conditions, not the virtual wrist; the participants were not told to hold their hand in a specific shape. In the Hand condition, the virtual hand had fixed orientation. In the Arm condition, the virtual hand would rotate according to inverse kinematics.
After this, a first half ring would appear at 67 cm from the table's edge and 65 cm above the Tracker on the table, see Figure 1. The apparent full ring (torus) was 30 cm (horizontal) × 30 cm (vertical) × 6.5 cm (depth). The participant was instructed to touch the center of the intersection of the torus pipe, starting from the green side (0.75 cm high, overlapping the end of the half ring), which would turn yellow once touched, to the red side (0.75 cm high, overlapping the end of the half ring). Once the red side was touched, the half ring would disappear and a new ring would appear at a new random horizontal location, with the same vertical and depth location. Here, location refers to the center point of the full ring. The half ring would also rotate randomly in multiples of 45°, and the drawing direction would also switch randomly from clockwise to counterclockwise.
After finishing 20 trials, the participant was asked to place their hand back in starting position, after which the virtual threat was launched: a knife would approach the virtual hand from the right and stop just before contact. After this the scene was turned off and the experimenter would orally ask seven questions to the participant in random order, who would answer on a scale from −3 to 3, representing "completely disagree" to "completely agree", respectively, see Table 1. The participant was also given the opportunity to orally provide comments on the session, which were noted by the experimenter. When the comment was lengthy or ambiguous, the experimenter would confirm the written piece with the participant. After this, the session was repeated using the other limb version, where ordering was counterbalanced across participants. After completing both sessions the participant was asked which hand they felt was more pleasant in use and was allowed to provide further comments about their decision.

Analysis
For the ring tracing analysis, we wrote an algorithm in C# using Visual Studio 2015 to calculate the root mean square (RMS) deviation. Each half ring was divided into 180 bins, where each bin was a rectangular prism with frontal width equating to 1 degree of the half ring. The depth and frontal height of the prisms were chosen to be 14 cm. Then all virtual fingertip data were sorted into these bins, and the smallest distance from the center of each prism to the sorted points was saved as the error $\epsilon$ for that bin. For empty bins, the error was automatically 7 cm. The final performance measure was then equal to $1/\sqrt{\sum_i \epsilon_i^2 / 20}$. Using this inverse measure, a higher value indicates less deviation and thus better performance.
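The binning procedure described above can be sketched as follows. This is a minimal Python illustration, not the C# implementation used in the study. The function names `bin_errors` and `inverse_rms`, the coordinate convention (fingertip samples in cm relative to the ring center, with the half ring spanning 0° to 180° in the frontal plane), the center-line radius supplied by the caller, and the approximation of each rectangular prism by a 1° sector are all assumptions made for this example. The divisor of the mean is left as a parameter.

```python
import math

N_BINS = 180               # one bin per degree of the half ring (from the text)
EMPTY_BIN_ERROR_CM = 7.0   # error assigned to empty bins (from the text)
HALF_EXTENT_CM = 7.0       # half of the 14 cm prism depth and frontal height


def bin_errors(points, radius_cm):
    """Return one error per bin for a single traced half ring.

    points: iterable of (x, y, z) fingertip samples in cm, relative to the
    ring center (hypothetical convention). Each bin is approximated by a
    1-degree sector around the ring's center line instead of a true prism.
    """
    best = [None] * N_BINS
    for x, y, z in points:
        angle = math.degrees(math.atan2(y, x))
        if not 0.0 <= angle < 180.0:
            continue  # sample lies outside the half ring
        radial_offset = math.hypot(x, y) - radius_cm
        if abs(radial_offset) > HALF_EXTENT_CM or abs(z) > HALF_EXTENT_CM:
            continue  # sample lies outside the bin volume
        k = int(angle)
        mid = math.radians(k + 0.5)  # angular center of bin k
        cx, cy = radius_cm * math.cos(mid), radius_cm * math.sin(mid)
        dist = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + z ** 2)
        if best[k] is None or dist < best[k]:
            best[k] = dist
    return [EMPTY_BIN_ERROR_CM if e is None else e for e in best]


def inverse_rms(errors, divisor):
    """Inverse root mean square of the errors; higher means better tracing."""
    mean_square = sum(e * e for e in errors) / divisor
    return math.inf if mean_square == 0 else 1.0 / math.sqrt(mean_square)
```

With no samples at all, every bin receives the 7 cm default, so `inverse_rms(bin_errors([], radius_cm), N_BINS)` evaluates to 1/7. Any closer trace yields a larger value, matching the interpretation that higher values indicate better performance.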
For the GSR analysis, we calculated each threat response by subtracting the average signal of the 5 s before the threat from the maximum signal in the 10 s after the threat. Here, "threat" means the moment the virtual knife reached the proximity of the virtual hand. These responses were then transformed to a logarithmic scale: $\log_{10}(\Delta\mathrm{GSR} + 1)$.
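As an illustration of this computation, the following minimal Python sketch derives a single log-transformed threat response from a sampled signal. The function name, the argument layout, and the exact handling of the window boundaries are assumptions made for the example; the text specifies only the 5 s baseline, the 10 s response window, and the transform.

```python
import math


def threat_response(times, signal, threat_time, pre=5.0, post=10.0):
    """Return log10(dGSR + 1) for one threat event.

    times, signal: equally long sequences of sample times (s) and GSR values.
    threat_time: moment the knife reached the proximity of the virtual hand.
    """
    # Baseline: samples in the 5 s before the threat.
    before = [s for t, s in zip(times, signal)
              if threat_time - pre <= t < threat_time]
    # Response window: samples in the 10 s after the threat.
    after = [s for t, s in zip(times, signal)
             if threat_time <= t <= threat_time + post]
    delta = max(after) - sum(before) / len(before)
    # Note: undefined if delta <= -1; that case is not handled in this sketch.
    return math.log10(delta + 1.0)
```

For example, a flat baseline of 2.0 followed by a peak of 11.0 gives a difference of 9.0 and therefore a transformed response of 1.0.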
For the participants' comments, all individual statements were grouped into separate categories. Here, a statement means a meaningful expression concerning a single theme, and a comment could consist of multiple statements. Duplicate statements were removed, for example "it was difficult to move" in the first condition and "it was easier to move" in the second condition are counted as a single statement that only occurred once. The participants were not obliged to give a comment, nor were they given a maximum number of allowed comments, so the number of comments differs per person.

Questionnaire Responses
For ease of reading, we discuss the questionnaire results by coding the Likert responses as −3, . . ., 3, corresponding to "completely disagree", . . ., "completely agree". An overview of all results is shown in Figure 2. Wilcoxon Signed Ranks Tests were performed on each pair of questionnaire responses. For agency (Q4), there was a statistically significant difference between conditions, Z = −3.281, p = 0.0002 one-tailed, 1 − β = 0.996; however, we found that the agency ratings were in fact higher for the Hand condition (median 2) than for the Arm condition (median 1), in contrast to the hypothesis. For shift in self-location (Q6 and Q7), both tests resulted in statistically significant differences between the conditions, Z = −2.200, p = 0.014 one-tailed and Z = −1.841, p = 0.036 one-tailed, respectively, but with low power, 1 − β = 0.735 and 1 − β = 0.615. Participants rated a higher degree of both full and partial shifts in the Hand condition (medians 0 and 1, respectively) than in the Arm condition (medians −1 and 0, respectively). All other responses (Q1, Q2, Q3, Q5) did not differ significantly between conditions. One-sample Wilcoxon Signed Rank tests showed that the responses to Q1 in the Arm condition were significantly less than 0 (Z = 56.50, p = 0.021, N = 23, 1 − β = 0.642, two-tailed), with median −1. The responses to Q4 in both conditions were significantly greater than 0, with median 2 for condition Hand (Z = 263.00, p = 0.0001, N = 23, 1 − β = 1.000, two-tailed) and with median 1 for condition Arm (Z = 168.00, p = 0.059, N = 23, 1 − β = 0.495, two-tailed). Responses to Q6 in the Arm condition were significantly lower than 0 (median −1, Z = 48.00, p = 0.029, N = 23, 1 − β = 0.510, two-tailed), and responses to Q7 in the Hand condition were significantly greater than 0 (median 1, Z = 170.00, p = 0.012, N = 23, 1 − β = 0.714, two-tailed). All other responses (to Q1 for Hand, Q2, Q3, Q5, Q6 for Hand, Q7 for Arm) did not significantly differ from 0.
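For readers unfamiliar with the signed-rank procedure, the rank sums on which such tests are based can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis that was run; the function name is hypothetical. It uses the common conventions of discarding zero differences and assigning tied absolute differences their average rank, which matters for Likert data where ties are frequent. Converting the rank sums into a Z value or p value is left to a statistics package.

```python
def signed_rank_sums(diffs):
    """Return (W_plus, W_minus) for a list of paired differences.

    For a paired test, diffs are the per-participant differences between
    conditions; for a one-sample test against 0, diffs are the ratings.
    """
    nonzero = [d for d in diffs if d != 0]      # zero differences are dropped
    ordered = sorted(nonzero, key=abs)          # rank by absolute value
    w_plus = 0.0
    w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1                              # ordered[i:j] is one tie group
        avg_rank = (i + 1 + j) / 2.0            # mean of ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus
```

The two sums always add up to n(n + 1)/2, where n is the number of non-zero differences, which provides a quick sanity check on any implementation.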

Task Performance
After confirming that the performance values were normally distributed with a Shapiro-Wilk test (W(23) = 0.960, p = 0.472 for Arm; W(23) = 0.952, p = 0.322 for Hand), a paired samples t-test was used to compare performance between conditions. This showed that participants had statistically significantly higher scores on the inverse deviation measure in the Hand condition (mean 2.208) than in the Arm condition (mean 1.956), t(22) = −3.460, p = 0.002, one-tailed, 1 − β = 0.973. Because a higher score indicates less deviation, this means that participants performed better in the Hand condition than in the Arm condition.
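The paired comparison can be illustrated with a minimal Python sketch of the t statistic. The function name is hypothetical and the sketch is not the analysis that was run; it only shows how the statistic and its degrees of freedom follow from the per-participant differences. Obtaining the p value requires the t distribution from a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for paired samples a and b of equal length."""
    diffs = [x - y for x, y in zip(a, b)]       # per-participant differences
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

With 23 participants this gives 22 degrees of freedom, matching the reported t(22). The sign of t depends only on the order in which the two conditions are passed.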

Galvanic Skin Responses
In two cases the GSR was not recorded due to equipment failure, so the data of the two relevant participants were excluded. No participants were classified as non-responders, that is, all participants demonstrated a difference in GSR within a single condition of more than 0.05 μSiemens (Venables and Christie, 1980).

Pleasantness and Comments
Three participants found the Arm condition more pleasant to use, and twenty participants found the Hand condition more pleasant to use. A full overview of the participant comments can be found in Supplementary Table S1. Here we note only the two most frequently provided statements:
• movement in Arm was different than mine: 12
• movement in Arm was more difficult than in Hand: 13

DISCUSSION
The goal of this study was to examine the relation between ownership and performance in the augmented reality supernumerary hand illusion. Participants were asked to trace a half ring as accurately as possible in two conditions: a connected Arm condition which was expected to result in high virtual hand ownership and agency ratings, and a disconnected Hand condition which was expected to result in low ratings. The results showed that ownership ratings did not differ between conditions, and, surprisingly, that agency ratings were higher in the Hand condition than in the Arm condition. In the correlation analyses we found a positive correlation between ownership and agency ratings, as well as between agency ratings and performance, but not between ownership ratings and performance.

Ownership
We had hypothesized that participants would experience greater ownership in the Arm condition than in the Hand condition based on previous results (Tieri et al., 2015). For all ownership responses including GSRs we found no significant differences between the two conditions, thus we reject H2a. There were weak correlations between different ownership measures but all with low power, making it difficult to draw conclusions. Since multiple RHI studies in reality and VR have found positive correlations between various qualitative and quantitative measures of ownership, we suspect that the experience in the ARSHI, i.e., expanding the real body with a virtual limb, greatly differs from these other real and virtual RHIs, where the real limb/body is replaced with a fake limb/body. Indeed, the study by Feuchtner and Müller (2017), which examined ways to present the virtual hand in an RHI in AR in order to interact with real objects, shows similarly low responses regarding direct attribution ("own hand", our Q1) in the conditions "abstract hand" (similar to our Hand) and "arm without inpaint" (similar to our Arm). Our responses to Q2 on body image ("three hands"), on the other hand, showed no significant difference between conditions, whereas the study by Feuchtner and Müller would suggest a stronger experience of three hands in the Arm condition than in the Hand condition.
Moreover, although the ratings were quite spread, the majority did not differ significantly from 0 and the responses to Q1 were significantly lower than 0, that is, the majority of participants did not experience any degree of direct attribution. In the following, we suggest three reasons for finding overall negative ownership results and thus also no difference between conditions.
First, one probable factor is the complex notion of visual realism in AR. Ownership studies in VR have similarly found mixed results on the effect of hand realism. For example, while Pyasik et al. found that using a 3D scan of the real hand in VR resulted in greater ownership over the virtual hand than over a virtual hand model, Jo et al. found the opposite effect, namely that a cartoon version of the participant with matching clothes elicited more body ownership than a 3D scan of the participant (Jo et al., 2017; Pyasik et al., 2020). In our study, five statements were made by participants about the strange appearance of the virtual arm/hand (see Supplementary Table S1), of which none explicitly referred to a mismatch between the appearance of their limb and the virtual limb. In our earlier pilot study of the ARSHI, we indeed found that with a virtual projection of the real hand, participants still found it difficult to accept the virtual hand as their own (Rosa et al., 2019). We discussed that in AR participants may have to be more willing to believe that what is fake is not fake, and in a follow-up study we indeed found a positive relation between the participants' immersive tendency (i.e., their capability to become immersed) and their ownership responses (Rosa et al., 2020).
Secondly, it may be that the fakeness of the virtual hand was experienced not only in its visual aspect, but also in its movement. The frequent comments about the movement of the virtual limb in the connected Arm condition may illustrate that there was a greater expectation about the abilities of the connected (i.e., more realistic) arm, in comparison to the disconnected (i.e., less realistic) hand. We emphasize that the positioning of the virtual fingertips using tracked wrist data was identical in both conditions, but the rotation of the wrist differed because inverse kinematics was used in the Arm condition and not in the Hand condition. This dip as a result of unfulfilled expectation resembles the popularly referenced uncanny valley effect. Initially, the uncanny valley referred to a graph of affinity against the human likeness of a robot in terms of appearance and movement in the field of robotics (Mori et al., 2012). In our study, the visible arm as opposed to no arm may have further reduced affinity by becoming a form of distraction, since all questions only referred to the hand, reducing the visible arm to be experienced as "noise". Furthermore, this "noise" is not related to the simpler notion of the number of presented distracting pixels (i.e., in Hand no arm pixels and in Arm many arm pixels), since Okumura et al. found higher ownership ratings of a supernumerary virtual hand in AR in a condition with high arm opacity (i.e., more pixels) than in a condition with low arm opacity (i.e., fewer pixels) (Okumura et al., 2020). However, it is difficult to attribute these findings solely to the uncanny valley, as the evidence of whether it even exists is mixed.
Lastly, it is also possible that the relation between realism (in whatever form it may take) and embodiment in AR differs in nature from the corresponding relation in reality and VR. That is, it may not be straightforward to expect an a priori positive relation between embodiment and realism, because a third virtual arm may be more disturbing in AR than an abstract tool. The study by Tieri et al. found a typical ownership experience, but our setups differ fundamentally in that those participants did not actually move the virtual hand themselves, nor were they subjected to a performance related task. Because of this different context, we suggest that in our case, even if users had not experienced the movement as improper for the "more realistic" connected arm (e.g., by using different inverse kinematics), then possibly still the floating hand may not have been experienced as more unrealistic than a third arm, as would be suggested by the uncanny valley discussion above. Our reasoning from Section 1, namely expecting a benefit in the interaction experience by embedding the interaction object into the user's body, is contrasted by the overwhelming majority of the participants that found that the disconnected hand was more pleasant to use.
Together, these findings suggest that (1) visual realism is more complex in AR than in reality and VR and requires more willingness to believe in order to accept a virtual object as real, (2) increasing realism in a single dimension can cause expectations in other realism-dimensions, that, when not fulfilled, may lead to an overall less pleasant experience, and (3) the relation between embodiment and realism in AR fundamentally differs from the corresponding relation in reality and VR.

Agency and Self-Location
We had hypothesized that participants would experience more agency in the Arm condition than in the Hand condition, also based on the results by Tieri et al. Our results showed that in both conditions agency ratings were highly positive, but that the responses in the Hand condition were actually higher than in the Arm condition, thus we reject H2b. Also, we had hypothesized that participants would experience a shift in hand location in both conditions, and our results showed that this was only the case for "partial shift" (Q7) in the Hand condition, while responses to "full shift" (Q6) in the Arm condition were actually negative and all others approximately 0, thus we reject H2c.
Regarding agency, in the previous section on ownership, we discussed how the setup of the study by Tieri et al. differed fundamentally from ours in terms of the cause of movement and the experimental context. Due to the absence of participant movement, the authors only measured vicarious agency, that is, the feeling of being the agent of others' actions. We acknowledge it may have been overly simplistic to assume a similar sense of body agency would occur in our study. From the comments, it became clear that the participants experienced some form of discrepancy in the movement of the virtual limb in the Arm condition, despite identical virtual fingertip positioning mechanisms in both conditions. We expect that this experience was largely caused by having to move with a specific purpose rather than performing just synchronous, but otherwise meaningless, movement as is typical in an RHI. Participants tried their best, as instructed, to trace the half ring, a seemingly straightforward and simple task, but found that it was more difficult than they had expected before execution. This may have led to frustration and to automatically thinking they were performing badly. In the case of the connected arm, they may moreover have been preoccupied by the way the arm segments were moving differently from theirs, leading to even more frustration, and in turn the feeling of performing worse. This would suggest, in line with the results of the study by Wen et al., that agency decreased as estimated performance decreased, even though participants were provided no information regarding their performance compared to the other participants or the other condition.
Regarding self-location, we expected that successfully reaching an object in peripersonal space would result in a change in experienced self-location, although it was uncertain whether this would take the form of a shift or a separation of normal self-location. This was different than in our previous ARSHI study, where participants did not have to actively reach to the boundaries of their personal space (Rosa et al., 2019). However, we saw that such a change did not occur, with the exception of the partial shift experienced only in the Hand condition. To explain our results, we turn to the definition of self-location, namely the volume in space where one feels to be located, which in daily life coincides with body-space, meaning one feels self-located inside the physical body (Kilteni et al., 2012). We found that participants made seven statements about having difficulty seeing depth, three of them occurring in the Hand condition and four in the Arm condition (see Supplementary Table S1). Possibly, then, participants struggled to make a mental spatial model, and as a result, they could not reliably say whether the virtual hand felt located in the personal or peripersonal space, as suggested by the 0-level responses rather than negative responses.

The Relation Between Ownership, Agency and Performance
We hypothesized that there would be a positive correlation between experienced ownership and performance of the virtual hand, and that this correlation would be mediated by agency. Our results showed a significantly better performance in the Hand condition than the Arm condition. We did not find a significant correlation between ownership and performance, thus we reject H1a. However, we found moderate positive correlations between ownership [in terms of direct attribution (Q1) and the source of experienced sensations (Q3)] and agency, and a moderate positive correlation between agency and performance, thus we accept H1b. In the following we attempt to place our findings within existing embodiment frameworks in order to present a possible causality.
It is well accepted that the experience of ownership is established through a combination of bottom-up and top-down processing mechanisms. However, it has been demonstrated that the top-down processing mechanism depends on whether there is self-generated movement or not (Grechuta et al., 2017). When the self-movement is congruent, ownership is high, even in cases with incongruent body characteristics, indicating that when a participant actively moves, the processing mechanism no longer depends on the internal body model as in the traditional non-moving RHI, but rather on predictive forward models. In our study, this would mean that ownership would be high regardless of connectedness of the virtual hand. However, as explained in the previous section on ownership, there was an experience of incorrect movement by the virtual hand beyond the positioning of the fingertip. Translating to internal forward models, the internal prediction about how the hand as a whole would move did not match the actual movement, since only the position of the fingertip matched their own real wrist movements and the rest of the limb did not correspond to their own movements. Thus the premise of congruent self-movement no longer holds, and the sense of ownership once again relies on the internal body model. As discussed in Section 5.1, our low ownership results may then have been the result of incongruence in visual appearance of the virtual hand.
However, although there was no overall experience of ownership in our study, ownership was found to be positively correlated to agency, but importantly, not to performance. This latter finding contrasts with what was found by Grechuta et al. (2017).
The authors explain that body ownership is a result of multimodal integration, and the results of this integration can be used to modulate performance, or, put in terms of Bayesian inference for decision-making, congruent information reduces perceptual ambiguity which can enhance motor response. This simply suggests, however, a causality of multimodal integration (i.e., the creation of a mental model) to the eliciting of ownership, and of multimodal integration to motor control, but does not restrict the increasing of performance to a case where ownership is also increased. Our results confirm this, since the connectedness had no effect on body ownership, but did affect performance. In other words, such a correlation can only exist when the factor used to alter experiences of ownership is related to motor control.
When the factor is not related to motor control, our results suggest that the sense of agency can still be altered, even if it does not alter the sense of ownership. A possible explanation is that the participants did not actually experience body agency in both conditions, but external (tool) agency in one or both conditions. Kalckert and Ehrsson (2012) provide evidence that these are distinct experiences, where body agency may only be related to transfer of sensorimotor integration mechanisms to the virtual hand (as is body ownership), whereas external agency relies on sensory predictions based on actions and goals from learned experiences. However, the positive correlation between ownership and agency in the present study contradicts the notion that participants exclusively experienced external agency rather than body agency, thus we do not think the participants reported external agency instead of body agency. Furthermore, Kalckert and Ehrsson have suggested a directional causality between the sense of ownership and the sense of body agency, since there is a general tendency to ascribe agency to an owned body part, and found little support for the opposite causality.
The findings by Wen et al. (2015) would further suggest that the positive correlation between the agency ratings and the performance scores was due to a causality from performance to agency. However, in their study, participants were clearly aware of being assisted, with a resulting positive effect on performance. In our study, participants were completely unaware of their real-time performance, nor were they provided any means by which they could deduce how well they performed overall, thus translating these causality findings to our study is not straightforward. The opposite causality would seem to contradict the evidence that creating a sense of agency in itself requires the allocation of cognitive resources (Hon et al., 2013). However, it is possible that the number of resources subsequently required in the sensorimotor task is lower, precisely due to the elicited sense of agency. If this gain outweighs the resources required for agency, then this would indeed suggest a causality from agency to performance.
To summarize, our results do not support a direct relation between the sense of ownership and sensorimotor task performance. Instead, we found evidence for a relation between the sense of ownership and the sense of agency, and also between the sense of agency and performance. The design of the experiment does not allow conclusions to be drawn regarding causality. However, based on the above discussion, we found support for the following causality: the sense of ownership over a third virtual hand in AR influences the sense of agency over that virtual hand, which in turn influences performance in a sensorimotor task performed by that hand. However, we emphasize that this was not the focus of our study, and further investigating possible causalities is an important topic for future research.

Limitation
In order to examine correlations between ownership, agency and performance, the formal statistical method would be to perform a repeated measures correlation on the data of both conditions together, taking into account that the questionnaire data is ordinal and the GSRs and performance values are both continuous. Unfortunately, such an analysis does not exist in SPSS or in R and writing an appropriate package is outside the scope of this study. Therefore the closest alternative, namely considering all data as continuous, was chosen. Although this may slightly affect the statistical results, this should not happen to such an extent that our conclusions are no longer valid. Alternatives for correlation analysis, such as a Spearman correlation on the combined data (i.e., not repeated measures) or two Spearman correlations on the split data per condition, are listed and visualized in the Supplementary Table S2 and Supplementary Figure S1. Note that the significant correlations found in these ways are not inconsistent with the results presented above. For completeness, a linear regression is not applicable to our results since both measures to be analysed are random variables and not fixed without error, and is therefore not performed in this study.
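To make the Spearman alternative mentioned above concrete, the following minimal Python sketch computes a rank correlation between an ordinal rating and a continuous score. It is an illustration under stated assumptions rather than the supplementary analysis itself: the function names are hypothetical, tied values receive their average rank, and the coefficient is computed as the Pearson correlation of the ranks. It does not account for repeated measures.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1                              # order[i:j] share one value
        for k in order[i:j]:
            ranks[k] = (i + 1 + j) / 2.0        # mean of ranks i+1 .. j
        i = j
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks enter the computation, the ordinal questionnaire data need not be treated as continuous, which is the property that makes this alternative attractive for Likert responses.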

CONCLUSION
The relationship between embodiment and task performance has been suggested in the literature for decades, but so far there was mixed empirical evidence to support these notions. The objective of this study was to investigate whether the often suggested relation between the sense of ownership and task performance is established through the sense of agency in an ARSHI. Our results showed that: 1) altering connectedness of a third virtual hand affected the sense of agency and task performance but not the sense of ownership, 2) the overall sense of hand ownership was weak, possibly due to a complex effect of realism, 3) there was no direct relation between the sense of ownership and sensorimotor task performance, but an indirect relation through the sense of agency. Our research was a first step in understanding the practical benefits of the experience of embodiment, and these findings have implications for serious domains in which optimal performance is crucial, such as the area of teleoperations. Future research should be performed in order to derive the exact causality relationship between the sense of body ownership, the sense of agency, and task performance.