Virtual multiple errands test (VMET): a virtual reality-based tool to detect early executive functions deficit in Parkinson’s disease

Introduction: Several recent studies have pointed out that early impairment of executive functions (EFs) in Parkinson’s Disease (PD) may be a crucial marker to detect patients at risk for developing dementia. The main objective of this study was to compare the performances of PD patients with mild cognitive impairment (PD-MCI) with PD patients with normal cognition (PD-NC) and a control group (CG) using a traditional assessment of EFs and the Virtual Multiple Errands Test (VMET), a virtual reality (VR)-based tool. In order to understand which subcomponents of EFs are early impaired, this experimental study aimed to investigate specifically which instrument best discriminates among these three groups. Materials and methods: The study included three groups of 15 individuals each (for a total of 45 participants): 15 PD-NC; 15 PD-MCI, and 15 cognitively healthy individuals (CG). To assess the global neuropsychological functioning and the EFs, several tests (including the Mini Mental State Examination (MMSE), Clock Drawing Test, and Tower of London test) were administered to the participants. The VMET was used for a more ecologically valid neuropsychological evaluation of EFs. Results: Findings revealed significant differences in the VMET scores between the PD-NC patients vs. the controls. In particular, patients made more errors in the tasks of the VMET, and showed a poorer ability to use effective strategies to complete the tasks. This VMET result seems to be more sensitive in the early detection of executive deficits because these two groups did not differ in the traditional assessment of EFs (neuropsychological battery). Conclusion: This study offers initial evidence that a more ecologically valid evaluation of EFs is more likely to lead to detection of subtle executive deficits.


INTRODUCTION
The umbrella term "executive function" (EF) refers to a broad set of high-level cognitive abilities used to regulate actions (Burgess and Simons, 2005;Chan et al., 2008;Otero and Barker, 2013). These cognitive abilities include problem solving, planning, sustained attention, the use of internal/external feedback, multitasking, cognitive flexibility, and the ability to deal with novelty (Damasio, 1995;Stuss et al., 1995;Grafman and Litvan, 1999;Burgess et al., 2000;Miller and Cohen, 2001;Strauss et al., 2006;Stuss, 2007;Chan et al., 2008;Goldberg, 2009). Impairment of EF is extremely common in neurological patients, specifically in those presenting with frontal pathology (Bechara et al., 1994;Stuss et al., 1995;Burgess and Shallice, 1996a,b;Dreher et al., 2008;Barker et al., 2010;Morton and Barker, 2010;Cole et al., 2013). Although EFs are thought to be mediated by frontal brain regions, frontal areas have multiple connections with cortical and subcortical regions, as well as with the amygdala, cerebellum, and basal ganglia (for a review, see Tekin and Cummings, 2002). Specifically, functional magnetic resonance imaging (fMRI) studies have shown that BOLD signals increase in the basal ganglia during the performance of EF tasks which require cognitive flexibility, shifting of mental sets, and updating of working representations (Cools et al., 2004;Leber et al., 2008;Hikosaka and Isoda, 2010). Further evidence that the basal ganglia are part of the circuitry crucial for executive functioning comes from studies of patients with basal ganglia lesions, specifically patients who suffer from Parkinson's Disease (PD; Cools et al., 1984, 2001;McKinlay et al., 2010).
Frontiers in Behavioral Neuroscience www.frontiersin.org December 2014 | Volume 8 | Article 405 | 1
Indeed, in addition to the typical motor signs, a number of different cognitive deficits have received considerable clinical attention in PD (Levy et al., 2002;Vingerhoets et al., 2003;Foltynie et al., 2004;Muslimović et al., 2005;Williams-Gray et al., 2009). The characteristics of cognitive impairment in PD may be extremely variable in regard to the timing of the onset and the rate of progression (Aarsland et al., 2005, 2007;Buter et al., 2008;Hely et al., 2008), and in terms of what cognitive functions are impaired (Verleden et al., 2007;Kehagia et al., 2010). Although the neuropsychological profile of patients who suffer from PD is heterogeneous, including memory deficits (Whittington et al., 2006;Ramanan and Kumar, 2013) and visuo-spatial impairments (Montse et al., 2001;Kemps et al., 2005), it is marked specifically by executive deficits (Cools et al., 2001;McKinlay et al., 2010). Moreover, the impairment of EFs appears to be the core feature of the neuropsychological profile in PD-related dementia (Girotti et al., 1988;Jacobs et al., 1995;Padovani et al., 2006;Pagonabarraga and Kulisevsky, 2012;Kudlicka et al., 2013). Similar executive deficits can also be found in nondemented PD patients (for reviews, see Kudlicka et al., 2011;Ceravolo et al., 2012), but they are more severe in patients who suffer from dementia. Following this direction, several recent studies have pointed out the predictive value of early EF deficits in the transitional stage of mild cognitive impairment (MCI) of the disease (Levy et al., 2002;Woods and Tröster, 2003). The concept of MCI, originally introduced to identify the earliest cognitive changes due to Alzheimer's Disease (AD; Petersen et al., 2001;Petersen, 2004), has also been applied to PD to improve the detection of patients at risk for developing dementia (Aarsland et al., 2011).
Litvan et al. (2012) proposed diagnostic guidelines to facilitate the diagnosis of "mild cognitive impairment in Parkinson's Disease" (PD-MCI). These criteria are generally based on the established principles of MCI given by Petersen, namely, subjective cognitive decline and objective evidence of impairment assessed by neuropsychological evaluation that does not interfere with functional independence (Petersen, 2004). Similar to AD, the risk of developing dementia increases appreciably with the presence of PD-MCI (Janvin et al., 2006). As underlined by Biundo et al. (2013), a great challenge today is to characterize the neuropsychological profile of PD-MCI and to evaluate the screening power of traditional neuropsychological tests. In their work, 104 PD patients were given an extensive neuropsychological evaluation. Results showed that specific neuropsychological tests measuring attentional/set-shifting, verbal memory, and visual-spatial functions are the best predictors of PD-MCI. In this perspective, EF dysfunction is a possible marker of potentially more severe cognitive impairment and may indicate a likely decline into dementia. Similarly, Goldberg proposed that EF deficits are also key markers for later dementia in AD (Goldberg, 2009). [...] (Petrova et al., 2010). The diagnosis of MCI was made according to modified criteria proposed by Petersen et al. (2001). They found that amnestic PD-MCI patients showed impairment in several aspects of attention/EFs, including the ability to inhibit irrelevant responses and cognitive flexibility, as measured by the Stroop test (Stroop, 1935) and the Modified Wisconsin Card Sorting Test (Nelson, 1976); the ability to formulate and follow a complex plan, as revealed by the Trail Making Test (Greenlief et al., 1985); and the ability to sustain a cognitive load during a language test, as highlighted by the phonemic and semantic verbal fluency test (Lezak, 1995).
These findings underline the need for a complex evaluation of EFs in PD-MCI patients, especially with regard to the possible relationship between these early executive impairments and behavioral change.
Previous studies indicate a need for rigorous, ecologically valid assessments that reliably capture subtle impairments that may be markers for later dementia. In fact, there are some critical issues in the traditional neuropsychological evaluation of EFs (Chan et al., 2008). A more ecological and prompt assessment of EFs is essential to evaluate the specific cognitive profile of different individuals (Goldstein, 1996;Chaytor and Schmitter-Edgecombe, 2003;Burgess et al., 2006). Indeed, the traditional evaluation does not reflect the complexity of EFs in everyday situations. A more detailed assessment may evaluate whether individuals are able to formulate, store, and check all the goals and subgoals in order to effectively respond to environmental and/or internal demands. In this direction, some instruments have been developed to measure executive deficits in situations similar to daily ones, such as the Behavioral Assessment of Dysexecutive Syndrome (BADS; Wilson et al., 1996) and the Multiple Errands Test (MET; Shallice and Burgess, 1991). The BADS (Wilson et al., 1996) consists of six subtests and a Dysexecutive Questionnaire (DEX). The DEX is designed to assess everyday cognitive, emotional, and behavioral changes, and it is completed by the patient (self-rating: DEX-S) and a person who knows the patient (independent rater: DEX-I). Although the BADS has good validity (Wilson et al., 1998), and the DEX was recently found to be, with some limitations, a useful instrument for capturing changes in day-to-day functioning (Barker et al., 2011), it does not measure performance during real-life tasks. An interesting example of a functional instrument is the MET (Shallice and Burgess, 1991), in which participants are invited to complete different tasks while adhering to specific rules within a specified time frame.
Even the simplified versions of the MET, however, adapted especially to be performed in a hospital setting or a nearby shopping mall (Alderman et al., 2003), can be particularly demanding for a patient because these versions require good motor skills; for a clinician, these versions are time-consuming and costly.
To address the issue of ecological validity and clinical utility, virtual reality (VR) appears to be an appropriate instrument for the evaluation of EFs because it provides the chance to deliver different tasks within ecologically valid, controlled, and secure environments (for a review, see Bohil et al., 2011). Based on this, the virtual version of the Multiple Errands Test (VMET) has been recently developed and tested in different clinical populations (Albani et al., 2011;Raspelli et al., 2012;Cipresso et al., 2013a). The VMET is a VR-based tool aimed at evaluating different aspects of EFs by enabling active exploration of a virtual supermarket, where participants are requested to buy various products presented on shelves and to abide by different rules. Thanks to the potential of VR, the VMET makes it possible to evaluate the real functional status of patients, as manifested in executive dysfunctions that had not been fully acknowledged in laboratory tests. Specifically, the VMET measures a patient's ability to formulate, store, and check all the goals and subgoals to effectively respond to environmental demands in ecological situations and to complete the specified task. The VMET has demonstrated good inter-rater reliability, showing an intraclass correlation coefficient (ICC) of 0.88 (Cipresso et al., 2013b), and good usability (Pedroli et al., 2013). It has also been shown that the test can be used with patients who are not familiar with computerized tests. On the basis of these methodological strengths, we argue that the VMET may significantly improve the traditional assessment of EFs in PD-MCI patients.
The main objective of this study is to compare the performances of PD-MCI with PD with normal cognition and cognitively healthy controls using traditional assessments of EFs and the VMET. In order to understand which subcomponents of EFs are early impaired, this experimental study aimed to specifically investigate the instruments that best discriminate among these three groups.

PARTICIPANTS
A total of 45 participants allocated to three groups were included in the study: 15 PD patients with normal cognition (PD-NC), 15 PD patients with MCI (PD-MCI), and 15 cognitively healthy individuals (CG, control group). The PD-NC group was composed of six women (40%) and nine men (60%); the PD-MCI group included seven women (46.7%) and eight men (53.3%), and the CG included nine women (60%) and six men (40%). CG and PD patients were recruited from the San Giuseppe Hospital's Istituto Auxologico Italiano in Verbania, Italy. Individuals did not receive money for their participation in the study. Detailed demographic and clinical characteristics of the three groups are reported in Table 1. Individuals gave their written consent for the procedures, which were approved by the Ethical Committee of the Istituto Auxologico Italiano.

NEUROPSYCHOLOGICAL GLOBAL ASSESSMENT AND PARKINSON'S DISEASE CLASSIFICATION
PD patients were classified into the two cognitive groups (PD-NC and PD-MCI), following the guidelines of the Task Force for the diagnosis of PD-MCI (Litvan et al., 2012). The proposed PD-MCI criteria utilize a two-level schema depending on the comprehensiveness of the neuropsychological testing. The Level I and II categories both represent PD-MCI, but they differ in regard to the type of neuropsychological assessment and, consequently, the level of diagnostic certainty. Specifically, for the diagnosis of PD-MCI by Level II criteria, the Task Force recommends comprehensive neuropsychological testing that highlights either two impaired tests in one cognitive domain or one impaired test in two different cognitive domains. For the division of PD patients into PD-NC and PD-MCI (Level II), a comprehensive neuropsychological battery with at least two neuropsychological tests per cognitive domain was employed. First, to evaluate the cognitive functioning of the participants in the study, the Mini Mental State Examination (MMSE; Folstein et al., 1975) was administered. The MMSE is a brief questionnaire widely used to obtain a picture of an individual's present cognitive performance in different cognitive domains (short- and long-term memory, orientation, attention, verbal fluency, and constructional apraxia). A score of <24 is generally the accepted cutoff, indicating the presence of cognitive impairment. The MMSE has been validated in an Italian sample of 1019 elderly subjects (aged 65-89 years) (Magni et al., 1996).
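The Level II test-count rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Task Force's or the authors' procedure: the function name `meets_level_ii_test_rule`, the domain labels, and the per-test boolean flags are hypothetical, and the sketch does not cover how a single test is judged impaired or the other PD-MCI requirements (for example, preserved functional independence).

```python
def meets_level_ii_test_rule(impaired_by_domain):
    """Check the Level II test-count rule on one patient's battery.

    impaired_by_domain maps a cognitive domain name (hypothetical labels)
    to a list of booleans, one per administered test, where True means
    the test was judged impaired. The rule is met with either two
    impaired tests in one domain or one impaired test in each of two
    different domains.
    """
    # Number of impaired tests in each domain (True counts as 1).
    counts = [sum(flags) for flags in impaired_by_domain.values()]
    two_in_one_domain = any(count >= 2 for count in counts)
    one_in_two_domains = sum(1 for count in counts if count >= 1) >= 2
    return two_in_one_domain or one_in_two_domains


# Hypothetical example: one impaired memory test and one impaired
# executive test, two tests administered per domain.
example_patient = {
    "memory": [True, False],
    "executive": [False, True],
    "language": [False, False],
}
print(meets_level_ii_test_rule(example_patient))
```

A single impaired test in a single domain does not satisfy the rule, which is why the battery needs at least two tests per domain.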
To evaluate the visuo-spatial function, the Behavioral Inattention Test (BIT; Wilson et al., 1987) was used. The BIT is traditionally used to screen for neglect behaviors, and it consists of six conventional pencil and paper subtests and nine behavioral subtests reflecting aspects of daily life. In the present study, the Italian validation of the BIT's conventional subtests was administered (Wilson et al., 2010): line crossing, letter cancellation, star cancellation, figure and shape copying, line bisection, and representational drawing. The maximum total score is 146 points.
To assess language comprehension abilities, the Token test was administered within the brief neuropsychological examination (Mondini et al., 2003). This is a simple test which requires 20 tokens that vary in shape, color, and size. The Italian validated test has 32 commands, each of which requires the attention and/or the manipulation of one or more of the tokens (e.g., "Put the small red square under the white large circle.").
The Italian validated Digit Span was used to evaluate short-term memory abilities (Orsini et al., 1987). In this easy-to-administer test, the researcher reads a series of digits aloud to the participant, who is requested to repeat back the same series of digits in the same sequence (e.g., 9-1-7 for 9-1-7). To assess long-term memory abilities, the Short Story test (Novelli et al., 1986a) was administered. The researcher read the Short Story aloud and asked participants for a first immediate recall, then read the story aloud again and requested another immediate recall. After a delay of around 15 min, participants were asked for a delayed retrieval. The final score is the average of the number of correctly recalled morphological units over the three recall trials. In order to specifically evaluate the spatial memory abilities of the study's participants, the following standard neuropsychological test was administered: the Corsi Block Test (Corsi, Unpublished Thesis; Spinnler and Tognoni, 1987). This task is used to measure short-term spatial memory (Corsi Span) and long-term spatial memory (Corsi Supraspan). The participants are invited to tap a sequence of wooden blocks in the same order as the researcher, with increasing span length on each trial.
Neuropsychological data for the three groups are reported in Table 2.
All scores obtained from these neuropsychological tests have been corrected for age, education level, and gender, according to Italian normative data.

EXECUTIVE FUNCTIONS EVALUATION
In order to fully evaluate the executive functioning of the study participants, a comprehensive standard neuropsychological battery focused on the different aspects of EF was administered.
The Clock Drawing Test (Freedman et al., 1994;Caffarra et al., 2011) has been traditionally used to assess a wide range of cognitive abilities including EFs, specifically understanding verbal instructions and abstract thinking and planning abilities. It is brief, easy to administer, and has excellent patient acceptability. Participants were required to draw numbers in a circle on a paper to resemble a clock and then draw the hands of the clock to read "10 after 11".
To evaluate multi-tasking and cognitive flexibility, two types of verbal fluency tests were employed. Phonological verbal fluency (Novelli et al., 1986b;Lezak, 1995) is a traditional neuropsychological measure of language production in which the number of words produced with a given initial letter (e.g., F) is evaluated. Semantic verbal fluency (Novelli et al., 1986b;Lezak, 1995) is a more complex traditional neuropsychological measure of language production in which the number of words in a specific category (e.g., animals) produced in 60 s is evaluated. Both tests require participants to use executive processes because an efficient and creative organization of the retrieved verbal material, as well as the inhibition of responses when appropriate, is crucial.
To specifically detect early deficits in problem-solving and planning, the Tower of London test (Shallice, 1982;Fancello et al., 2006) was administered. The researcher explained the rules of the task (e.g., don't make more moves than necessary), and then used one tower with three rods of descending heights and a set of beads to display the desired goal: Participants are invited to rearrange the set of beads on the tower to match the examiner's configuration.

THE VIRTUAL MULTIPLE ERRANDS TEST (VMET)
The VMET consists of a Blender-based application that enables the active exploration of a virtual supermarket, where participants are requested to select and buy various products presented on shelves. From a technical point of view, the VMET was created with the software NeuroVR 1 (Riva et al., 2011), a free virtual-reality platform for creating virtual environments useful for neuropsychological assessment and neurorehabilitation. NeuroVR is software that allows nonexpert users to adapt the content of several virtual environments to the specific needs of the clinical and research setting. Thanks to the NeuroVR Player, it is possible to visualize virtual environments: The user enters the virtual supermarket, and he/she is presented with virtual objects of the various items to be purchased. Each virtual object has been inserted through the NeuroVR Editor, which offers a rich database of 2D and 3D objects; these can be easily placed into the predesigned virtual scenario by using an icon-based interface. Using a joystick, the participant is able to freely navigate the various aisles (using the up-down joystick arrows) and to collect products (by pressing a button placed on the right side of the joystick), after having selected them with the viewfinder. After an initial training phase with a smaller supermarket, the user enters the virtual supermarket and is presented with virtual objects of the various items to be purchased (Figure 1).
The virtual supermarket contains products grouped into the main grocery categories, including beverages, fruits and vegetables, breakfast foods, hygiene products, frozen foods, garden products, and pet products. Signs at the top of each section indicate the product categories as an aid for navigation.
Participants are also given a shopping list, a map of the supermarket, some information about the supermarket (opening and closing times, products on sale, etc.), a pen, a wrist watch, and the instruction sheet. The instructions are fully illustrated for the participants, and the rules are explained with precise reference to the instruction sheet. The VMET test is composed of four main tasks. The first involves purchasing six items (e.g., one product on sale). The second involves asking the examiner information about one item to be purchased. The third involves writing the shopping list 5 min after beginning the test. The fourth involves responding to some questions at the end of the virtual session by using useful materials (e.g., the closing time of the virtual supermarket). To complete the task, participants have to follow several rules: (1) they have to execute all the proposed tasks; (2) they can execute all the tasks in any order; (3) they cannot go to a place unless it is a part of a task; (4) they cannot pass through the same passage more than once; (5) they cannot buy more than two items per category (look at the chart); (6) they have to take as little time as possible to complete the exercise; (7) they cannot talk to the researcher unless this is a part of the task; and (8) they have to go to their "shopping cart" after 5 min from the beginning of the task and make a list of all their products. The time is stopped when the participant says, "I finished." During the task, the examiner takes notes on the participant's behaviors in the virtual environment. As suggested by Shallice and Burgess (1991), the following errors were recorded (please also see the VMET validation procedure in Raspelli et al., 2012): task failures, inefficiencies, strategies, rule breaks, and interpretation failures. A task failure occurs when a subtask is not completed satisfactorily; for example, the first task required participants to purchase six items, so it was composed of six subtasks. For errors in executing the tasks, the scoring range was from 11 (the participant had correctly done the 11 subtasks) to 33 (the participant had totally omitted the 11 subtasks). The scoring scale for each task failure was from 1 to 3 (1 = the participant performed the task 100% correctly as indicated by the test; 2 = the participant performed aspects of the task, but did not complete it 100% accurately; 3 = the participant totally omitted the task). An inefficiency occurs when a more effective strategy could have been applied to accomplish the task. An example of the eight inefficiencies is not grouping similar tasks when it is possible. The scoring range was from 8 (many inefficiencies) to 32 (no inefficiencies). More precisely, the scoring scale for each inefficiency was from 1 to 4 (1 = always; 2 = more than once; 3 = once; 4 = never). To measure the participant's ability to use effective strategies that facilitate carrying out the tasks, 13 possible strategies can be evaluated. An example of a good strategy is doing accurate planning before starting a specific subtask. The scoring scale for each strategy was from 1 to 4 (1 = always; 2 = more than once; 3 = once; 4 = never). The total score range was from 13 (good strategies) to 52 (no strategies).

FIGURE 1 | Screenshot of the virtual multiple errands test (VMET).
A rule break occurs when one of the eight rules listed in the instruction sheet has been violated (e.g., talking with the examiner when not necessary). The scoring scale for each rule break was from 1 to 4 (1 = always; 2 = more than once; 3 = once; 4 = never). For rule breaks, the scoring range was from 8 (a large number of rule breaks) to 32 (no rule breaks). Finally, an interpretation failure occurs when the requirements of a particular task are misunderstood; for example, when a participant thinks that the subtasks all have to be done in the order presented in the information sheet. The scoring range was from 3 (a large number of interpretation failures) to 6 (no interpretation failures). The scoring scale for each interpretation failure was from 1 to 2 (1 = yes; 2 = no).
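The score ranges given above follow directly from the number of items in each scale and the per-item scoring scale. The short Python sketch below reproduces that arithmetic. It is an illustration only, not the authors' scoring software: the names `VMET_SCALES`, `score_range`, and `total_score` are hypothetical, and the count of three interpretation-failure items is inferred from the stated range (3 to 6) and item scale (1 to 2).

```python
# (number of items, lowest item score, highest item score) for each scale,
# as described in the text. The item count for interpretation failures is
# inferred from its stated range, not given explicitly.
VMET_SCALES = {
    "task_failures": (11, 1, 3),
    "inefficiencies": (8, 1, 4),
    "strategies": (13, 1, 4),
    "rule_breaks": (8, 1, 4),
    "interpretation_failures": (3, 1, 2),
}


def score_range(scale):
    """Return the (minimum, maximum) total score for a scale."""
    n_items, low, high = VMET_SCALES[scale]
    return n_items * low, n_items * high


def total_score(scale, item_scores):
    """Sum item scores after checking their count and allowed values."""
    n_items, low, high = VMET_SCALES[scale]
    if len(item_scores) != n_items:
        raise ValueError(f"{scale} expects {n_items} item scores")
    if any(not low <= score <= high for score in item_scores):
        raise ValueError(f"{scale} item scores must be in {low}..{high}")
    return sum(item_scores)


for name in VMET_SCALES:
    print(name, score_range(name))
```

Note that the direction of the scales differs. For task failures, a higher total means worse performance (33 = all subtasks omitted). For inefficiencies, rule breaks, and interpretation failures, a lower total means more errors. For strategies, a lower total means more frequent use of good strategies.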

PROCEDURE
After participants gave written informed consent to participate, they underwent a neuropsychological global assessment; this was done in order to obtain an accurate overview of their cognitive function and to split the PD sample according to the guidelines of the Task Force for the diagnosis of PD-MCI (Litvan et al., 2012). Then, all participants were required to complete the neuropsychological functions evaluation. At the beginning of the experimental session, participants were asked to sit at a desk in front of a computer monitor to complete the VMET. The VMET was rendered using a portable computer (Intel Core 2 Duo with an OpenGL-compatible graphics board and 256 MB of video memory; the operating system was Microsoft Windows XP). Participants also had a gamepad (Logitech Rumble F510), which allowed them to explore and interact with the environment. Before completing the VMET procedure, they received a training period of about 15 min in a smaller version of the virtual supermarket environment in order to become familiar with the navigation and shopping tasks.
[...] Windows, Chicago, IL, USA). To investigate differences in EFs and VMET scores between groups (CG vs. PD-NC vs. PD-MCI), a series of analyses of variance was computed. Post hoc tests (with Bonferroni's adjustment) were carried out to follow up significant group differences. The level of significance was set at α = 0.05.
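To make the analysis plan concrete, the sketch below computes a one-way between-groups F statistic and the Bonferroni-adjusted significance level for all pairwise comparisons among three groups. It is a minimal pure-Python illustration, not the software or code used in the study: the function names are hypothetical, the numbers in the usage example are made-up toy values rather than study data, and no p-value is computed because that would require the F distribution.

```python
from itertools import combinations


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def bonferroni_alpha(alpha, group_names):
    """Divide alpha by the number of pairwise group comparisons."""
    n_comparisons = len(list(combinations(group_names, 2)))
    return alpha / n_comparisons


# Toy values for illustration only (not data from the study).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(toy_groups))
print(bonferroni_alpha(0.05, ["CG", "PD-NC", "PD-MCI"]))
```

With three groups of 15 participants, the degrees of freedom are 2 and 42. Three groups give three pairwise comparisons, so each post hoc test is evaluated at 0.05/3, which is equivalent to multiplying each pairwise p-value by 3 and comparing it with 0.05.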

EXECUTIVE FUNCTION SCORES
In order to investigate differences in the neuropsychological evaluation of EFs, a series of analyses of variance was computed with group (CG vs. PD-NC vs. PD-MCI) as the between-subjects factor. Five participants (three from the PD-MCI group and two from the PD-NC group) were not included in the Clock Drawing Test analyses because of errors in score recording. Moreover, one patient from the PD-MCI group was excluded from the analyses of the phonological and semantic verbal fluency tests.

VMET SCORES
In order to investigate differences in VMET scores, a series of analyses of variance was computed with group (CG vs. PD-NC vs. PD-MCI) as the between-subjects factor. First of all, in regard to the time needed for each participant to complete the task, the analysis showed significant differences between groups [F(2,42) = 3.83, p < 0.05, partial η² = 0.154]. In particular, post hoc analyses indicated that the PD-MCI group took significantly more time (M = 1223, SD = 579, p < 0.05) compared with the CG (M = 727, SD = 308).
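The effect size reported for completion time can be checked from the F statistic and its degrees of freedom. Because F = (SS_effect / df_effect) / (SS_error / df_error) and partial eta squared = SS_effect / (SS_effect + SS_error), it follows that partial eta squared = F · df_effect / (F · df_effect + df_error). The Python sketch below applies this identity; the function name `partial_eta_squared` is hypothetical and the code is only a consistency check, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for completion time: F(2,42) = 3.83.
print(round(partial_eta_squared(3.83, 2, 42), 3))
```

The result rounds to 0.154, which agrees with the value reported in the text.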
Finally, no significant differences between groups were found in the rule breaks and in the interpretation failure. Results are summarized in Table 4.

DISCUSSION AND CONCLUSION
Because cognitive impairment is a common complication of PD and is associated with significant disability for patients and a burden for caregivers, it is crucial to fully investigate the distinguishing features of the neuropsychological profile in this clinical population (Aarsland et al., 1999, 2000;Schrag et al., 2000). As PD progresses, a relevant proportion of patients will develop dementia (Aarsland et al., 2003;Bosboom et al., 2004;Hely et al., 2008). Specifically, Aarsland et al. (2005) found that more than 30% of PD patients have dementia. The focus now is therefore to identify patients with a potentially higher risk of dementia, with the possibility of implementing an early and individualized cognitive rehabilitation treatment to improve their quality of life. In particular, an increasing number of studies have suggested that the executive deficits in PD are predictive of the conversion to dementia (Levy et al., 2002;Woods and Tröster, 2003). On these premises, the main objective of this study was to investigate the potential of the VMET to complement the traditional neuropsychological evaluation of EFs in PD with a more ecologically valid evaluation. This study offers initial evidence that a more ecologically valid evaluation of EFs is more likely to lead to detection of subtle executive deficits in PD patients. The VMET specifically seems to capture the early executive dysfunctions of PD-NC patients, who did not differ from the CG in the traditional assessment of EFs.
First, although some recent reviews suggested that executive deficits are present in the early stage of PD (Kudlicka et al., 2011;Ceravolo et al., 2012), our results showed that PD-NC patients were not impaired in the traditional neuropsychological evaluation of EFs when compared with the CG. In fact, in their review, Kudlicka et al. (2011) underlined that studies on EFs in PD are marked by a general lack of clarity in regard to the selection of measures and their clinical interpretation. It is crucial to acknowledge the possibility that different results across studies might reflect the different tests used and the underlying functions that the tests are thought to capture. It is therefore important to fully understand which subcomponents of EFs are impaired early in this population. In this direction, Kudlicka et al. (2013) used a data-driven approach to investigate which areas of EF are particularly deficient in 34 patients with PD. Results showed that the impairment was more profound in tests requiring time-efficient attentional control; for example, the Trail Making Test (Tombaugh, 2004).
Our findings showed a significant difference between PD-NC and cognitively healthy participants only in semantic and phonological verbal fluency. As previously explained, verbal fluency tests measure several EF components, including set-switching, strategy generation, and rule attainment, along with other non-EF components such as semantic memory and verbal lexicon. Our results are consistent with a meta-analysis that reports verbal fluency deficits in PD (Henry and Crawford, 2004). Specifically, Henry and Crawford (2004) found that PD patients were significantly more impaired in semantic fluency, concluding that this deficit may be associated not only with a problem in executive functioning, but also, more specifically, with an initial disorder in semantic memory (namely, concept-based knowledge). Also, in an interesting study with 88 PD patients and 65 healthy participants, Koerts et al. (2013) pointed out that verbal fluency deficits can be interpreted in light of the progression of the disease and the dysfunctions in other cognitive domains. Performance in the verbal fluency tests is explained by psychomotor speed in the mild stage of PD, while cognitive flexibility accounts for deficits in those tests in the moderate phases of the disease.
Concerning the VMET, as previously indicated, our main findings revealed significant differences in some VMET scores between the PD-NC and the cognitively healthy participants. Specifically, among all VMET scores, it is interesting to note a significant difference in task failures and strategies between these two groups. PD-NC patients, compared with cognitively healthy controls, made a greater number of errors in completing the subtasks of the VMET. Furthermore, compared with the CG, PD-NC patients showed poorer ability in using effective strategies that facilitate the carrying out of the tasks; for example, accurate planning before starting a specific subtask or using the map for navigating the virtual supermarket. These executive deficits may reflect a specific deficit in cognitive flexibility; namely, the ability with which a person's conceptualization changes selectively to effectively respond to external/internal stimulation. This may also explain why there is no significant difference between PD-NC and PD-MCI in these VMET scores. Indeed, to discriminate between PD-MCI and PD-NC, it is important to follow the recent guidelines of the Task Force (Litvan et al., 2012). So our findings confirm that the traditional assessment of EFs appears to be more useful to detect differences in the EFs between these two cognitive groups.
In conclusion, our results showed that the VMET appears sensitive in evaluating the functional status of PD patients with normal cognition, as manifested in executive deficits that had not been fully detected by traditional neuropsychological evaluations. The VMET makes it possible to evaluate some subcomponents of EFs in ecological settings, giving a more accurate estimate of patients' deficits that are difficult to detect with traditional tests.
As previously explained, one of the most crucial criticisms of the neuropsychological tests is the lack of ecological validity (Goldstein, 1996;Chaytor and Schmitter-Edgecombe, 2003;Chan et al., 2008). Even though patients with supposed executive deficits may perform as well as controls on traditional neuropsychological tests, they may experience difficulties in real world situations. VR may be used to offer a new human-computer interaction paradigm in which patients are active participants within an ecological virtual world (Riva, 2009). In virtual tasks such as the VMET, it is possible to simulate life-like challenges, which require a more complex series of goals to achieve and the cognitive flexibility to elaborate different strategies to accomplish them and to inhibit inappropriate actions.
Our results may also represent a theoretical contribution to the attempt to isolate the specific subcomponents of EF. Most traditional neuropsychological tests measure one specific EF component, but they do not give a true picture of a patient's functional status. According to different theories, however, EF is best conceptualized as a system of interconnected processes guided by a central supervisory system to facilitate goal-oriented behavior (Luria, 1966;Norman and Shallice, 1986;Miller and Cohen, 2001;Miller et al., 2002). Our findings support the idea that a breakdown in executive control mechanisms is reflected in deficits in many multitasking behaviors, such as effective planning and strategy allocation and monitoring.
The findings of this study are interesting and valuable, but there are some limitations. First, the small sample size of 45 participants may limit the generalizability of the results. The sample, however, was carefully evaluated with a comprehensive neuropsychological assessment according to the criteria established by Litvan et al. (2012). Second, considering the use of computerized tests for PD patients with motor deficits, it would be important to also assess the individual's perception of VMET usability (for example, difficulties during the experience in using the joystick, selecting products from aisles, and learning to move in the supermarket). As explained above, a recent study showed good usability of this virtual instrument (Pedroli et al., 2013). Performance on the VMET, however, must be interpreted with the motor deficit in mind. A final limitation of our study is the difference between the PD groups and the CG in terms of years of education. All scores obtained from neuropsychological tests were corrected for education level according to Italian normative data, but the results from the VMET must be viewed in light of this potential limitation. A future challenge is to explore the relative impact of age, gender, and education on VMET scores: for example, Boone (1999) found that the impact of educational level and gender was limited to some Wisconsin Card Sorting Test scores. Further studies are needed to evaluate the potential of the VMET, especially in terms of its temporal stability (namely, test-retest reliability) and its criterion validity for PD. This study, however, provides initial evidence that a more ecological evaluation of EFs may make it possible to detect subtle executive deficits in PD-NC patients as well.
All participants' data were stored in encrypted and password-protected files, following the criteria for protecting personal health information (El Emam et al., 2011) and using the PsychoPass method to generate and share password information among colleagues.