Virtual Versus Augmented Reality Cooking-Task-Based Tools: A Behavioral and Physiological Study on the Assessment of Executive Functions

Virtual reality (VR) and augmented reality (AR) are two novel graphics immersive techniques (GIT) that, in the last decade, have been attracting the attention of many researchers, especially in psychological research. VR can provide 3D, realistically simulated synthetic environments in which controllers allow human interaction. AR overlays synthetic elements onto the real world, and gaze-based targeting allows hand gestures to act on those synthetic elements. Both techniques provide more ecologically valid environments than traditional methods. Most previous research has focused on the use of VR for treatment and assessment, showing positive effectiveness results; AR, in contrast, has been tested for the treatment of specific disorders, but no studies have investigated the feasibility and effectiveness of AR in neuropsychological assessment. Starting from these premises, the present study aimed to compare performance and sense of presence using both techniques during an ecological task, namely cooking. The study included 50 cognitively healthy subjects. The cooking task consisted of four levels that increased in difficulty; as the level increased, additional activities appeared. The order of presentation of the exposure conditions (AR and VR) was counterbalanced across participants. The VR cooking task was performed through the HTC VIVE and the AR task through the Microsoft HoloLens. Furthermore, the study recorded and compared the psychophysiological changes [heart rate and skin conductance response (SCR)] during the cooking task in both conditions. To measure the sense of presence during the two exposure conditions, subjects completed the Slater-Usoh-Steed Questionnaire (SUSQ) and the ITC-Sense of Presence Inventory (ITC-SOPI) immediately after each condition. The behavioral results showed that completion times were always lower in VR than in AR, increasing steadily with the difficulty of the tasks.
Regarding physiological responses, the findings showed that the AR condition produced greater individual arousal and activation than VR. Finally, VR produced higher levels of sense of presence than the AR condition. The overall results suggest that VR currently represents the GIT with greater usability and feasibility compared with AR, probably due to differences in human-computer interaction between the two techniques.


INTRODUCTION
Virtual reality (VR) and augmented reality (AR) are two novel graphics immersive techniques (GIT) that, in the last decade, have been attracting the attention of many researchers, especially in the fields of psychology and education (Chicchi Giglioli et al., 2015; Negut et al., 2016; Fleming et al., 2017; Cipresso et al., 2018; Jensen and Konradsen, 2018; Ventura et al., 2018; Germine et al., 2019). On one side, VR is an interactive and advanced computer technology that can create realistic simulated three-dimensional (3D) environments. Technologically, VR provides a wide field of view (FOV, the angular size of the scene visible to the user) of around 100°, and human-computer interaction can be ensured by various devices, such as a head-mounted display (HMD) for visual stimuli, headphones for acoustic stimuli, and controllers for hand interaction. These allow users to navigate and interact with the virtual environment, making them feel totally immersed in the virtual world. The accurate, realistically simulated 3D environment and the technological presence can help users generate a sense of presence, defined as the feeling of "being in" the virtual environment (Gregg and Tarrier, 2007; Slater, 2009; Parsons, 2015; Valmaggia et al., 2016; Freeman et al., 2017). On the other side, AR is a recent technology in which synthetic elements are incorporated into the physical world, adding information for the users (Chicchi Giglioli et al., 2015; Ventura et al., 2018). The FOV is narrower than in VR, between 35° and 45°, and interaction is ensured by sensors integrated into the headset, such as cameras that, through gaze-based targeting, allow real-hand interaction with the synthetic elements. AR, like VR, aims to provide high visual realism, fidelity of experience, and presence, highly similar to the real experience, while adding synthetic objects and information to the real world.
Visual realism and fidelity can depend on the FOV, accuracy, and complexity of the systems, as well as on the fidelity of the user's interaction. Regarding visual realism and fidelity, a wider FOV allows the user to see more of the scene at once and to use peripheral vision, while a narrower FOV, as in AR systems, may reduce distraction in the periphery and allow the user to focus on the area of interest in the scene (Ragan et al., 2010, 2012; McMahan et al., 2012). Furthermore, high graphical accuracy and complexity can enhance the level of fidelity of the experience, allowing behaviors learned in VR/AR to transfer to the real world, or allowing users to perform in the AR/VR world as if they were in real life (Dunkin et al., 2007; Seymour, 2008; Saposnik et al., 2010). Finally, interaction fidelity assumes that the more natural the interaction, the higher the fidelity (McMahan et al., 2012). However, comparison studies on different hand controllers showed that the more familiar, less natural type of controller provided better performance, although participants appreciated the more natural interaction (McMahan et al., 2010). All these features can generate immersion and the psychological state of being present in virtual and augmented environments (Slater, 2009). A valid and reliable measure of the sense of presence is the ITC-Sense of Presence Inventory (ITC-SOPI; Lessiter et al., 2001), which assesses four dimensions: sense of physical space, engagement (E), ecological validity (EV), and negative effects (NE). Tang et al. (2004) compared the sense of presence between a VR and an AR environment, showing a significantly higher score on sense of physical space for AR and no significant differences in the other three dimensions, although all means were higher in the AR than in the VR condition.
Accordingly, at present, both techniques provide advantages over traditional scientific research procedures, offering accurate control of realistically simulated stimuli and behavioral measurement of reaction times and scores, and allowing researchers to address issues that would simply be difficult to pose in natural environments (Bohil et al., 2011; Germine et al., 2012; De Leeuw, 2015; Reimers and Stewart, 2015). In psychology, both technologies have been extensively explored in the treatment of certain disorders, such as phobias, allowing patients to learn and repeat new behaviors to cope with fearful stimuli in safe and reactive environments, generating effective behavioral changes in real contexts (Chicchi Giglioli et al., 2015; Suso-Ribera et al., 2018; Ventura et al., 2018). In psychological assessment, conversely, several VR applications have been developed for neuropsychological evaluation in order to improve its EV (Pugnetti et al., 1998; Ku et al., 2003, 2004; Rizzo et al., 2004; Rand et al., 2007, 2009; Díaz-Orueta et al., 2012; Henry et al., 2012; Parsons et al., 2013; Cipresso et al., 2014; Díaz-Orueta et al., 2014). Traditional neuropsychological assessment consists of a performance-based approach, involving paper-and-pencil and/or computerized tests, to assess a variety of cognitive processes, such as attention, memory, inhibitory control, planning, cognitive flexibility, and the higher-order system of executive functions that governs cognitive processes toward goal-directed and adaptive behaviors. These tests consist of a set of predefined, abstract stimuli delivered in a controlled setting and have shown a moderate level of EV in predicting real-world functional performance (Elkind et al., 2001; Chaytor and Schmitter-Edgecombe, 2003; Chaytor et al., 2006).
For example, the Tower of London is a neuropsychological measure for the assessment of executive functioning, specifically planning abilities, in which a target configuration of colored beads is presented to the participant, who is asked to compute the minimal number of steps (ranging from 1 to 5) to reach that configuration. This test is a reliable and valid measure, but it is abstract and decontextualized from real-life activities.
In order to improve the similarity between tests and real-life activities, several VR environments have been developed, such as a virtual mall/supermarket (Rand et al., 2007, 2009; Cipresso et al., 2014) and a classroom (Rizzo et al., 2000, 2009; Díaz-Orueta et al., 2014). For example, Cipresso et al. (2014) tested a virtual supermarket in which participants (patients with normal cognition, patients with mild cognitive impairment, and cognitively healthy subjects) had to complete four shopping tasks. The findings revealed that the virtual shopping task was able to discriminate performance among the three groups and that the virtual supermarket was more sensitive than traditional assessment in detecting cognitive impairment. Furthermore, a recent meta-analytic review (Negut et al., 2016) of VR applications in neuropsychological assessment showed moderate sensitivity and effect sizes in detecting cognitive impairment by comparing the performance of healthy subjects and patients using both VR applications and traditional measures.
Despite the opportunities that VR provides in psychological assessment, to our knowledge no previous studies have investigated differences in behavioral responses to ecological tasks presented through AR compared with other methods, particularly VR.
Finally, both systems are also compatible with other neuroscientific tools, such as wrist devices able to measure changes in electrodermal activity (EDA) and heart rate variability (HRV) (Poh et al., 2010; Garbarino et al., 2014). EDA and HRV have shown results consistent with cognitive and information processing (Dawson et al., 2007; Sequeira et al., 2009) and can provide, together with behavioral data, implicit and objective measures of change during activities.
Starting from these premises, the first aim of this study was to analyze and compare behavioral and physiological data collected before, during, and after performing a cooking task in VR and AR environments. Second, the study aimed to determine the degree of presence, or the feeling of "being there," produced by VR through the HTC Vive and by AR through the Microsoft HoloLens.

MATERIALS AND METHODS

Participants
The experimental sample included 50 healthy individuals (16 men and 34 women). Participants were recruited through local advertisement among college students and workers of the Polytechnic University of Valencia. The mean age was 25.96 ± 6.51 years. To be included in the study, participants were required to score higher than 24 on the Mini-Mental State Examination (MMSE) (Folstein et al., 1975). Before participating, each participant was provided with written information about the study and was required to give written consent for inclusion. The study received ethical approval from the Ethical Committee of the Polytechnic University of Valencia. Table 1 includes the main sociodemographic data, such as age, gender, and education.

Psychological Assessment
Before the experimental session, the following questionnaires were administered to each participant:

• Attentional Control Scale (ACS) (Derryberry and Reed, 2002): evaluates attentional control; higher scores indicate a greater ability to voluntarily maintain attention on a task, while lower values indicate greater attentional rigidity.

• Barratt Impulsiveness Scale (BIS-11) (Barratt, 1959; Oquendo et al., 2001): a measure of impulsiveness; a score of 72 or more indicates that the individual is highly impulsive, scores between 52 and 71 are within the normal limits of impulsivity, and scores below 52 indicate an excessively controlled subject.

• Cognitive Flexibility Scale (CFS) (Martin and Rubin, 1995): consists of 12 questions scored on a 6-point Likert scale; a score of 60 or more indicates high cognitive flexibility.

Furthermore, participants completed a total of five standard tasks (ST): the Dot Probe Task (DOT), version published by Miller and Fillmore (2010); the Go/No-Go Task (Fillmore et al., 2006); the Stroop Test (Stroop, 1992); the Trail Making Test (TMT A-B), paper-and-pencil version published by Reitan (1958); and the Tower of London - Drexel (TOLDX; Culbertson and Zillmer, 1999). The ST were randomly presented and performed on a personal computer. Neuropsychological performance data of the participants are reported in Table 2.
After each exposure condition, the following presence questionnaires were administered to each participant:

• Slater-Usoh-Steed Questionnaire (SUSQ) (Slater and Steed, 2000): this post hoc test consists of three questions rated on a 7-point scale. The items evaluate the sensation of being in the environment, the extent to which the medium becomes the dominant reality, and the degree to which it is remembered as a "place."

• ITC-SOPI (Lessiter et al., 2001): this test consists of 42 items, rated on a 5-point scale, and evaluates four dimensions of presence: the sense of physical space or spatial presence (SP), E, EV, and NE.
Descriptive data on presence are reported in Table 3.

Physiological Assessment
At the beginning of and during the experimental session, skin conductance response (SCR) and HRV were recorded to obtain subjects' physiological responses to the VR and AR cooking task. SCR and HRV are considered indexes of arousal (Boucsein, 1992). The physiological signals were acquired using the Empatica E4 device, with the E4 Manager software used to record and export the raw signals. The SCR signal was sampled at 4 Hz and the HRV signal at 64 Hz; SCRs were scored within a time window of 1 to 2.5 s with an amplitude >0.01 µS (microsiemens).
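The SCR scoring criteria above (a rise of more than 0.01 µS within a 1-2.5 s window) can be sketched as a simple threshold detector. This is an illustrative reconstruction, not the Ledalab pipeline actually used in the study; the function name and its defaults are assumptions:

```python
def detect_scrs(eda, fs=4, min_amp=0.01, win=(1.0, 2.5)):
    """
    Flag skin conductance responses (SCRs) in a raw EDA trace.
    An SCR is counted when conductance rises by more than `min_amp`
    microsiemens within `win[0]` to `win[1]` seconds of an onset sample,
    mirroring the scoring criteria described in the text.
    """
    lo, hi = int(win[0] * fs), int(win[1] * fs)
    onsets = []
    i = 0
    while i < len(eda) - lo:
        # look for a rise of at least min_amp within the allowed window
        segment = eda[i + lo : i + hi + 1]
        if segment and max(segment) - eda[i] > min_amp:
            onsets.append(i / fs)  # onset time in seconds
            i += hi                # skip past this response
        else:
            i += 1
    return onsets
```

With the E4's 4 Hz EDA sampling rate, the 1-2.5 s window corresponds to 4-10 samples after each candidate onset.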

The Cooking Task
The virtual and augmented system was developed using Unity 5.5.1f1, applying the C# programming language in the Visual Studio tool. Participants performed the virtual cooking task wearing an HMD device (HTC VIVE) with two hand controllers, and the augmented cooking task using the Microsoft HoloLens. The AR experience took place in a real kitchen in which the augmented synthetic objects appeared in front of the subject according to the subject's gaze. Interaction in AR was ensured by sensors integrated into the headset, such as cameras that, through gaze-based targeting, allowed real-hand interaction with the synthetic elements. Before the VR and AR cooking tasks, participants performed two introductory tasks (tutorials), one for each technology, in order to learn the main body movements and hand interactions needed to perform the cooking task. Each tutorial consisted of a simulated task similar to the cooking task. In both conditions, body movements were real movements in physical space; hand interaction in VR was performed through the two controllers, whereas in AR participants interacted with objects with their own hands. Participants could train for as long as necessary, according to their individual needs. When they felt confident with the body and hand movements and interactions, they pressed a button to start the virtual cooking task.
The virtual cooking task consisted of four levels of difficulty, involving attention, planning, and shifting abilities. All were based on cooking a series of foods in a set time, avoiding burning them (leaving an ingredient on the fire for longer than the set time) or letting them cool (leaving a cooked food in the pan after switching off the glass-ceramic hob or after cooking finished). As the level increased, additional activities appeared (Figure 1). In the first level, participants had to cook three foods on one cooker in 2 min; in the second level, they had to cook five foods on two cookers in 3 min; in the third level, a dual task had to be performed: (a) five foods had to be cooked on two cookers in 4 min and (b) during the cooking, participants had to add the right dressing to the foods; in the last level, another dual task was proposed: (a) participants had to cook five foods on two cookers in 5 min and (b) they had to set a table. Each food had to be cooked in a scheduled time, and each level had a time limit that was visible at all times in the virtual and augmented environment. When a food was cooked, it had to be removed from the pan, the cooker turned off, and the food placed on the dish. The main aim of each level was to cook the foods in the scheduled time without burning them or letting them cool. Burning a food means not taking it out of the pan, or not turning the burner off, after the predefined cooking time. Cooling a food means leaving the food in the pan to cool down after it was cooked. The virtual system gathered various time/performance data for each subtask, including total times, burning times, and cooling times.

FIGURE 3 | Paired t-test significant differences between conditions for total times (*p ≤ 0.05, **p ≤ 0.01).

FIGURE 4 | Paired t-test significant differences between conditions for cooling and burning time (*p ≤ 0.05, **p ≤ 0.01).
Participants advanced to the next level once they had cooked all the foods, completing the current level. Before each level, instructions were shown explaining which activities participants had to carry out, the time available, the cooking time for each food, and a reminder to cook the foods without burning them or letting them cool (Figure 2).
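The four levels described above can be summarized as a configuration table. The sketch below is a hypothetical reconstruction of the task parameters stated in the text (per-food cooking times are not reported in the text and are therefore omitted); the names and structure are illustrative, not the study's actual Unity implementation:

```python
# Level parameters as described in the task section (illustrative).
LEVELS = [
    {"level": 1, "foods": 3, "cookers": 1, "limit_min": 2, "extra_task": None},
    {"level": 2, "foods": 5, "cookers": 2, "limit_min": 3, "extra_task": None},
    {"level": 3, "foods": 5, "cookers": 2, "limit_min": 4, "extra_task": "add dressing"},
    {"level": 4, "foods": 5, "cookers": 2, "limit_min": 5, "extra_task": "set the table"},
]

def is_burned(time_on_fire_s, cook_time_s):
    """A food burns when it stays on the fire longer than its set cooking time."""
    return time_on_fire_s > cook_time_s

def is_cooled(removed_after_cooking, in_pan_after_done):
    """A food cools when it is left in the pan after cooking finished."""
    return (not removed_after_cooking) and in_pan_after_done
```

The two predicates encode the burning and cooling error conditions that the system logs as burning times and cooling times.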

Experimental Procedure
The order of presentation of the exposure conditions (AR and VR) was counterbalanced across participants. Before the beginning of the experiment, participants were administered the MMSE and the standard questionnaires (ACS, BIS, CFS) and tasks (DOT, Go/No-Go, Stroop, TMT A-B, TOL). Once this first phase was completed, we recorded a 3-min EDA and HRV baseline, asking participants to stay completely relaxed during the recording. Once the physiological baseline was recorded, the experimental session started, and EDA and HRV were continuously recorded until the end of the experiment. To measure the sense of presence during the two exposure conditions, subjects completed the SUSQ and the ITC-SOPI immediately after each condition.
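Counterbalancing the order of exposure can be sketched as a simple alternating assignment. The study only states that order was counterbalanced, so the scheme below (even-indexed participants start with AR, odd-indexed with VR) is purely an illustrative assumption:

```python
def assign_orders(participant_ids):
    """
    Counterbalance exposure order across participants: alternate which
    condition comes first so each order (AR-first, VR-first) is used
    by half of an even-sized sample.
    """
    orders = {}
    for i, pid in enumerate(participant_ids):
        orders[pid] = ["AR", "VR"] if i % 2 == 0 else ["VR", "AR"]
    return orders
```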

Statistical Analyses
The analyses were performed using SPSS version 22.0 (Statistical Package for the Social Sciences for Windows, Chicago, IL, United States) for PC. The biosignals were processed and analyzed using MATLAB and Ledalab. First, we verified the assumption of normality with the Kolmogorov-Smirnov test and assessed the internal consistency of the scales via Cronbach's alpha. Second, the normal cognitive functioning and physiological health (SDNN and rMSSD HRV values) of the subjects were verified.
Next, four paired t-tests were conducted to compare behavioral, physiological data (SCR and HR), and sense of presence responses in AR and VR conditions. The level of significance was set at α = 0.05.
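A paired t-test tests whether the per-participant differences between the two conditions differ from zero. A minimal sketch of the statistic, independent of the SPSS implementation used in the study, is:

```python
import math

def paired_t(x, y):
    """
    Paired-samples t statistic: mean of the per-subject differences
    divided by its standard error. Returns (t, degrees of freedom).
    """
    assert len(x) == len(y), "paired samples must have equal length"
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

With n = 50 participants, each test has 49 degrees of freedom, matching the t(49) values reported in the results.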
RESULTS

Regarding cognitive functioning (Table 2), the mean total score on cognitive flexibility showed that the subjects had high cognitive flexibility (CFS total = 47.36; normal range: 10-60); the mean total value on impulsivity (BIS total = 65.21) was within the normal limits of impulsivity (normal range: 52-71); and for attentional control, a very high mean score was obtained (ACS total = 55.44), indicating that subjects were able to voluntarily control their attention. Table 2 also reports the descriptive data on the standardized tasks.
Focusing on health at the physiological level, the beats per minute (BPM) values at baseline and during the tasks were in the normal range of 60-100 beats/min. SDNN values indicate that participants were not at risk of suffering any cardiac episode, since the values were greater than 100 ms, and the rMSSD values were also in the normal range (greater than 25 ms) (Macías, 2016) (Table 4).
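The SDNN, rMSSD, and BPM indexes used for these health checks are all computed from the RR (inter-beat) intervals. A minimal sketch, assuming RR intervals expressed in milliseconds:

```python
import math

def sdnn(rr_ms):
    """SDNN: standard deviation of the RR (NN) intervals, in ms."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    return math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))

def rmssd(rr_ms):
    """rMSSD: root mean square of successive RR-interval differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))
```

Against the thresholds in the text, a recording would pass the health check when sdnn > 100 ms, rmssd > 25 ms, and 60 ≤ bpm ≤ 100.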

Behavioral Responses to Cooking Task
Regarding performance, Table 5 shows the means and standard deviations of the behavioral values of the cooking task for both conditions.
A paired t-test was conducted to compare behavioral responses in the AR and VR conditions. There were significant differences between conditions in the total times for the four levels (Figure 3) and in the cooling and burning times (Figure 4).
No other significant differences in physiological activation during the four levels of the cooking task were found.
Heart Rate Variability

Table 7 shows the mean and standard deviation of the HRV values of the cooking task for both conditions (AR vs. VR).
A paired t-test was computed to compare HRV in the AR and VR conditions. There was a significant difference in HRV between the AR pre-task (M = 79.57, SD = 13.43) and post-task (M = 81.20, SD = 14.07) measurements; t(49) = −1.97, p = 0.05 (Figure 6). No other significant differences in HRV during the four levels of the cooking task were found.

Sense of Presence
A paired t-test was computed to compare the SUSQ and ITC-SOPI scores in the AR and VR conditions. Regarding the SUSQ, there was a significant difference between the AR (M = 4.11, SD = 1.65) and VR (M = 5.85, SD = 1.00) conditions; p = 0.00. The ITC-SOPI showed significant differences between AR and VR in the four dimensions of presence: SP [AR (M = 3.29, SD = 0.64); VR

DISCUSSION AND CONCLUSION
The first aim of this study was to analyze and compare behavioral and physiological data collected before, during, and after performing the cooking task in the VR and AR environments. The comparison of behavioral data showed that times were always lower in VR than in AR, both for the mean times of the four levels and within the specific levels. This may be because interaction with VR is usually simpler; however, the descriptive data showed that task times decreased across the AR levels, while they increased across the VR levels (Bermejo Vidal, 2018). These results are partially in contrast with previous works that have compared VR and AR, and the differences could depend on the task to be performed and on the display fidelity, accuracy, and complexity of the technological systems (Irawati et al., 2008; Juan and Pérez, 2010; Khademi et al., 2013; Krichenbauer et al., 2017). Indeed, the main previous comparison studies implemented non-complex tasks in which one action was proposed at a time, such as object manipulation, whereas our study included an activity of daily living characterized by a succession of actions and/or two tasks at a time in a rich, realistic environment. For example, Khademi et al. (2013) compared an AR with a VR "pick and place" task for stroke patients, showing that participants performed better in the AR condition than in VR. Möller et al. (2014) performed a study on navigation with a guidance system, showing that participants navigated faster in VR than in AR but committed more errors. Furthermore, each previous study used a different technological system with specific characteristics in terms of display fidelity, accuracy, and complexity that could influence the variability of the results (Germine et al., 2019).
Coherent with the behavioral data are the physiological results, which showed that both conditions produced individual activation, with higher values in AR than in VR (Bermejo Vidal, 2018). The higher physiological activation in AR could depend on differences between the interaction systems. In more detail, as mentioned in the description of the cooking task, interaction in VR was ensured by two hand controllers, whereas in AR it depended on gaze-based targeting that allowed real-hand interaction with the synthetic elements.
Regarding the second aim, on the sense of presence, the scores between conditions showed that VR always produced a higher sense of presence than AR (Bermejo Vidal, 2018). Specifically, the significantly higher results on the SP dimension in VR than in AR could depend on the fact that the VR condition is mostly unmediated. Indeed, VR created a unitary and composite synthetic environment in which the user is totally immersed without interference from the real world, whereas AR adds synthetic objects to the real world, making it possible to perceive a discordance between reality and the artificial information in the environment.
Regarding the E and EV dimensions, we expected that the EV results of AR would be significantly higher than those of VR. Nevertheless, the higher score in VR could depend on the self-report measure used (ITC-SOPI) also being applied to evaluate the AR experience. Indeed, the ITC-SOPI items related to EV (5, 11, 15, 20, 27) evaluate whether the environment seemed natural or part of the real world, and in AR the environment is the real world. This suggests that in future studies a change of scale may be needed to evaluate EV in AR. Finally, participants evaluated AR more negatively than VR, as shown in the results and especially in the NE dimension of the ITC-SOPI, where AR scored higher than VR. This result seems to confirm previous findings on the comparison between both conditions in situations of acrophobia, in which the sense of presence was higher in VR than in AR (Juan and Pérez, 2010). This result could also depend on the subjects' lower difficulty and greater familiarity in using the VR controllers, despite the feeling of greater naturalness of interaction in AR, as mentioned in the introduction (McMahan et al., 2010).
Although the results are interesting for their possible applications in neuropsychological assessment, our study has some limitations that could affect the generalizability of the results or may have influenced the findings. The main issues are the small sample size and the specific sample of healthy subjects included in this study. At the technological level, the FOV and user-interaction differences between the two technological systems may have generated variability in the test scores. Future studies are needed to investigate differences in behavioral responses comparing clinical populations and healthy subjects, as well as comparing AR and VR with other conditions, such as a real-world condition. Furthermore, to overcome possible differences between the technological systems, it would be important in the test design to focus more on the accuracy of the responses rather than on reaction times, and to implement an individual baseline on the same or another measure using the different systems before the experimental task (Germine et al., 2019). In this way, it would be possible to account for and control system variability, producing greater generalizability of the results. To conclude, VR and AR are two novel GIT with high EV applicable to a wide variety of research fields, so it is relevant to understand the effects of different technological systems on neuropsychological effectiveness as well. Specifically, we focused on behavioral performance and physiological activation in the virtual cooking task and on the sense of presence, comparing VR and AR. We found higher results in VR than in AR for all comparison factors.
This research represents a step toward better understanding the differences between AR and VR and opens up several new avenues for future research. In particular, we suggest that future test designs take into consideration some changes in the experimental design, namely adding an individual technological baseline and also considering response accuracy, and in the self-report scale used to measure presence in AR.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request from the corresponding author.