The self in conflict: actors and agency in the mediated sequential Simon task

Executive control refers to the ability to withstand interference in order to achieve task goals. Conflict adaptation refers to the finding that, after interference has been experienced, subsequent conflict effects are weaker. However, changes in the source of conflict have been found to disrupt conflict adaptation. Previous studies indicated that this specificity is determined by the degree to which one source causes episodic retrieval of a previous source. A virtual reality version of the Simon task was employed to investigate whether changes in a visual representation of the self would similarly affect conflict adaptation. Participants engaged in a mediated Simon task via 3D “avatar” models that either mirrored the participants’ movements or were presented statically. A retrieval cue was implemented as the identity of the avatar: switching it from a male to a female avatar was expected to disrupt the conflict adaptation effect (CAE). The results show that only in static conditions did the CAE depend on the avatar identity, while in dynamic conditions, changes did not cause disruption. We also explored the effect of conflict and adaptation on the degree of movement made with the task-irrelevant hand and replicated the reaction time pattern. The findings add to earlier studies of source-specific conflict adaptation by showing that a visual representation of the self in action can provide a cue that determines episodic retrieval. Furthermore, the novel paradigm is made openly available to the scientific community, and its significance for studies of social cognition, cognitive psychology, and human–computer interaction is described.


Introduction
Cognitive control refers to the ability to withstand temptation and avoid distraction in order to reach certain goals. This is true for definitions from both social and clinical studies, in which such goals are generally longer term, abstract and self-referencing (Baumeister et al., 2000), and cognitive science, in which they tend to be short term ("in the next block"), very specific ("press a button as cued by the center of the stimulus, not its flankers") and referencing a specific task designed by the experimenter (here Eriksen and Eriksen, 1974). Despite these differences, cognitive control is commonly portrayed as a kind of limited resource that allows us to handle conflicts and interferences: should the resource run low, we may fail to act quickly or correctly.
This somewhat dualistic characterization of control is reflected in models that formalize conflict and control in terms of two routes. A stimulus can trigger, quickly or automatically, responses that are typical for our normal functioning: the urge is to deal with this token stimulus as with any other of its kind. A secondary type of processing works its slow, willful way top-down from a goal level toward the more complex processing of the stimulus. For example, in the popular Stroop task (Stroop, 1935), in which we are asked to respond "green" if the word "red" is written in green, we are almost overwhelmed by the automatic reaction to repeat our well-rehearsed training and "read out loud" the word, rather than mind the coloring. Thus, the conflict is between the competing responses of the two routes, while executive control is supposed to suppress the incorrect response.
Despite the apparent simplicity of dual-route models, they do elegantly account for a more recently found effect called conflict adaptation. The effect has also been referred to as the Gratton effect, or sequential conflict modulation effect, and refers to the observation that after experiencing one instance of conflict, subsequent conflict becomes easier. The effect seems to extend across diverse conflict tasks, including the Stroop task (Egner and Hirsch, 2005b; Spapé and Hommel, 2008), the Simon task (Simon and Rudell, 1967; Hommel et al., 2004) and the Eriksen Flanker Task (Eriksen and Eriksen, 1974; Gratton et al., 1992). To improve clarity, we shall refer to the conflict adaptation effect (CAE) independently from the specific paradigm in which it is encountered, formulating it as: CAE = (cI − cC) − (iI − iC), in which capital Cs and Is denote currently compatible (congruent, non-conflicting) and incompatible (incongruent, conflicting) trials, whereas lower case cs and is refer to preceding (often termed N-1) compatibility and incompatibility. The formula thus quantifies the effect as the reduction of conflict effects as a function of preceding trials.
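The definition of the CAE given above can be sketched in a few lines of code. The following is a minimal illustration, assuming that the CAE is computed as the conflict effect after compatible trials minus the conflict effect after incompatible trials; the function name and the cell means are hypothetical and were invented for this example.

```python
def conflict_adaptation_effect(mean_rt):
    """Return the CAE from four cell means keyed by 'cC', 'cI', 'iC', 'iI'.

    Lower-case letter: compatibility of the preceding trial.
    Upper-case letter: compatibility of the current trial.
    """
    # Conflict effect (incompatible minus compatible) after a compatible trial.
    effect_after_compatible = mean_rt["cI"] - mean_rt["cC"]
    # Conflict effect after an incompatible trial.
    effect_after_incompatible = mean_rt["iI"] - mean_rt["iC"]
    # The CAE is the reduction of the conflict effect following conflict.
    return effect_after_compatible - effect_after_incompatible


# Hypothetical cell means in ms, invented for illustration only.
example_means = {"cC": 500, "cI": 570, "iC": 530, "iI": 550}
example_cae = conflict_adaptation_effect(example_means)  # (570 - 500) - (550 - 530)
```

A positive value indicates that the conflict effect was smaller after incompatible than after compatible trials.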
Dual-route models of executive control account for the CAE by suggesting that a conflicting trial (the word "red" in green) triggers the recruitment of attentional resources to cope with the response uncertainty (Botvinick et al., 2001). Depending on the preferred model, this would mean for our example either that the task-relevant route (the color-response association) is facilitated, or that part of the irrelevant stimulus processing route (the word-response association) is suppressed. The result is more or less the same: if, on a subsequent trial, the word "green" is presented in red, the system should be able to cope with ease: either our enhanced color route or our attenuated verbal route leaves us well-prepared for correct action.
However, recent observations suggest dual-route models may not adequately account for localized, or context-dependent, conflict adaptation. For example, if attentional resources are generically recruited after experiencing conflict, one should predict smaller subsequent conflict effects, independent of the task, which is not always the case (Notebaert and Verguts, 2008). Furthermore, even within a task, changing a task-irrelevant feature between two Stroop (Spapé and Hommel, 2008) or Simon displays critically reduces the CAE. Finally, the outcome of conflict in terms of reward has also been shown to affect the CAE (Van Steenbergen et al., 2009). It seems, then, that a unitary, limited resource type of executive control would fail to account for these observations. Sequences of conflict, however, involve many more cognitive functions than just executive control. To understand what happens in any kind of task repetitions, it is necessary to take a more detailed look at the specific features involved in sequences of conflict. For one, it has been argued that if conflict changes (i.e., cI and iC sequences), some part of the stimulus or response must be different as well, whereas if the conflict does not change (in cC and iI), there is usually a proportion of trials in which the whole stimulus-response scenario is repeated. In other words, priming, rather than cognitive control, was pointed out to be at least partly responsible for the CAE pattern (Mayr et al., 2003).
Further aggravating the situation was the observation by Hommel et al. (2004), who showed that increased errors and reaction latencies observed in cI and iC sequences could be traced back to their constituent features partly repeating. Following in the footsteps of Kahneman et al. (1992), they provided evidence that if one scenario (e.g., an arrow on the left pointing to the left) is similar to a previous representation in that features are repeated (an arrow on the left pointing to the right), an episodic retrieval effect ensues. This is problematic for two reasons: (1) the repeated feature (the location of the arrow) thus prompts a no longer relevant and indeed conflicting response; and (2) the partial overlap itself may be problematic for the cognitive system (Treisman, 1996; Hommel et al., 2001).
It is thus possible that the workings of episodic retrieval, memory and a type of pattern recognition may account for both the CAE and the context dependency of the CAE. This "stronger" account suggests that the data can fully be accounted for by referring to the "lower-level" functions involved in priming (Mayr et al., 2003), episodic retrieval (Hommel et al., 2004) and contingency learning (Schmidt and Besner, 2008). Thus, there would be very little theoretical need to postulate an extra limited resource that sometimes comes to our aid, and cognitive control would be reduced to an illusory epiphenomenon of free will.
Alternatively, a mechanism featuring episodic retrieval causing conflict adaptation could reconcile "pure control" with context dependency effects. As we have argued before (Spapé and Hommel, 2008, 2014), it is possible that the similarity of situations between two trials may not only retrieve the previous episodes in terms of their constituent features, but also in terms of control parameters. Thus, tasks involving a degree of similarity between trials may result in conflict adaptation, for example because a Simon stimulus gradually rotates into its new position, updating episodic memory (Spapé and Hommel, 2010), or because the voice presenting an auditory Stroop stimulus is repeated (Spapé and Hommel, 2008). Conversely, gradually rotating the Simon display to the wrong position or presenting a stimulus in a different tone of voice may interfere with retrieval of executive control (for a similar proposal, see Egner, 2014).

Present Study
The mapping of contingencies of conflict adaptation thus remains important while the debate concerning the status of conflict adaptation continues. The present study was somewhat inspired by the earlier cited observation of the context dependency of the CAE (Spapé and Hommel, 2008). In that study, the words "high" and "low" were mixed with high and low tones, and participants were asked to judge the pitch of the tones and ignore the words. A type of Stroop effect was observed (participants found it difficult not to imitate the voice), as well as conflict adaptation (the Stroop effect was smaller after incompatible trials). The context dependency was in the voice: although it was entirely irrelevant to the task, changing the voice from one gender to the other caused interference with the CAE.
A visual version of this task was designed for the present study, with one critical change: the degree of ownership over the contextual change. Rather than changing something entirely irrelevant as in the original study, or changing the task itself (Notebaert and Verguts, 2008), we set out to change the degree to which the change was related to the person involved in the task. Participants were engaged in the task in two conditions: directly or mediated by a visual representation of themselves, which we will refer to as the "avatar." Similar to the original study, this avatar served as a contextual cue, and could either alternate or repeat between two genders. Although entirely irrelevant to the task, changes in avatar identity should, according to the episodic retrieval account of the CAE, affect the conflict-control pattern. That is, repeating the avatar should act as a cue, prompting retrieval of the preceding trial and possibly its conflict-related aspects. Changes in the identity of the avatar should, conversely, interfere with retrieval and thereby reduce the CAE.
However, to go beyond previous studies related to the context dependency of the CAE, we investigated whether the relationship between the participant and their virtual identity would have an effect on conflict and control. By using a motion tracking device, we established a sense of agency over the avatar, projecting it as standing in front of the participant and mimicking the participant's gestures. Previous studies used similar techniques in order to manipulate the representation of the self toward the virtual identity (Lenggenhager et al., 2007). In the present experiment, we contrast this "dynamic" condition, in which the avatar is displayed as co-acting the participant's gestures, with a "static" control condition in which the avatar did not move.
On the one hand, creating a sense of agency over the avatar by making it respond to the task necessarily increases the degree to which the avatar is task-relevant. Given that conflict resolution has previously been found to work on task-relevant features (Egner and Hirsch, 2005a), a conflict-control point of view would predict changes in a task-related avatar's identity to be of greater impact than changes in a static, and therefore neutral and irrelevant, picture. On the other hand, however, the degree of agency over the avatar could create the impression that the avatar is "part of" the participant. Thus, a superficial change in the visual appearance of the self-related object should be negated by the sense that it acts as a pointer toward the distal representation: the participant him- or herself.
The motion tracking device furthermore enabled us to go a step beyond the traditional reaction times (RTs). Recent studies used single-handed pointing movements (Buetti and Kerzel, 2008) and mouse pointer trajectories (Scherbaum et al., 2010) and analyzed movement trajectories in order to dissociate conflict mechanisms underlying the Simon effect. In these studies, the spatial location of a stimulus was found to cause a shift in movement trajectory toward the stimulus (Buetti and Kerzel, 2009). Here, we explored whether this continuous, "visuomotor" Simon effect (Wiegand and Wascher, 2005) could similarly be observed in a gesture-based, two-handed paradigm. Similar to these studies, we expected the visual location of the stimulus to evoke unintentional movement toward that location. However, in this two-handed study, such movement should occur in the other hand, even though it is irrelevant for executing the desired gesture. To our knowledge, there are as yet no studies directly testing the conflict dependency of the CAE on this type of movement trajectory measure, but we expected the pattern of the irrelevant movement (IM) to largely follow that of traditional RT.

Participants
We partly based the number of participants on similar episodic studies, such as Spapé and Hommel (2008), who observed a sizable effect of identity switches on conflict control (ηp² = 0.56) with 14 subjects. However, given the unknown, additional factor of avatar animation, and the novel apparatus in use, we ultimately recruited 18 volunteers (seven female). They were 27.1 ± 3.2 years of age and took part in the study in exchange for cinema tickets. Before signing informed consent, they were informed of their rights in accordance with the Declaration of Helsinki. One (female) participant could not complete the study and was removed from further analysis.

Apparatus and Stimuli
The Xbox-360 Kinect (Microsoft, Redmond, WA, USA) is a motion sensing input device that uses a depth camera to track up to six persons and estimate full skeletal tracking information of two persons. Its sensor has a frame rate of 30 Hz, a field of view of 57° × 43°, and 27° of vertical tilt range, to obtain information for estimating the 3D spatial position of 20 joints for each body. In the study we used it for tracking the position of both hands relative to the torso. Furthermore, we calculated the participant's joint orientation. In the dynamic condition of the present study, the detected joint orientation was projected onto the avatar, giving it participant-avatar congruence in bodily motion. Figure 1 shows the basic characteristics of the Simon task, which was displayed on a 95.17 cm × 57.10 cm virtual screen which itself was projected on a 254 cm × 142.875 cm Screenline real screen. All task-related stimuli (the circles, stars, and fixation crosshair) were 28.55 cm × 28.55 cm. Left and right locations were defined as occurring at, respectively, 28.58 cm left and right from the center of the screen. The 3D character, referred to as the "avatar," was presented at a location below and slightly overlapping the central fixation, so as to give the impression that it was standing in between the participant and the virtual screen. It was 25.32 cm × 105.51 cm in size (of which the lower ca. 30 cm were not visible) and was of either male or female gender.

Procedure
After reading written instructions, participants witnessed a demonstration of the experiment involving one of the authors undertaking 16 trials to show the task. Participants were then asked to stand at a distance between 2.5 and 3.5 m from the screen with the arms spread wide, while the instruments were calibrated. If participants had no further questions, they were asked to move their hands together to start the first trial of the experiment.
FIGURE 1 | Schematic display of the trial procedure for compatible and incompatible conditions. Participants were instructed to keep their hands together until the target stimulus (a circle or star) was displayed, prompting a left or right handed action, respectively. Both positive (pictured) and negative feedback was displayed in the first 16 trials, whereas during the rest of the experiment, either negative feedback (following incorrect reactions) or blank virtual screens (following correct actions) were used. The Simon effect refers to the effect of incompatibility between stimulus and response location, as is the case in the lower middle panel.
Every trial started with a fixation crosshair, displayed for ca. 1 s of stable identification of both of the participant's hands remaining near the center of their body. Then, a star or circle was presented to the left or right of the virtual screen. Participants were instructed to move their left arm left if a circle was shown and their right arm right if a star was shown, irrespective of the location of the stimulus. Movements were detected if the participant moved either hand 20 cm lateral to their shoulders, at which point the star or circle was removed from the screen. Only once the participant moved both their hands back together would the next trial begin. Avatars were presented throughout the experiment as either "static" or "dynamic," the latter case referring to the scenario in which the movements of the participants were reflected in the movements of the avatar.

Design and Measurements
The general design of the experiment was based on 2 (locations, left vs. right) × 2 (shapes requiring left vs. right responses) × 2 (avatar identities) × 2 (animations) × 16 = 256 trials with one block of 128 trials for each type of animation, presented in counter-balanced order with equal numbers of compatible (location = response) and incompatible (location ≠ response) trials. The analysis was based on two four-way repeated measures ANOVAs with animation (static vs. dynamic), avatar repetition (vs. alternation), previous compatibility (vs. incompatibility), and current compatibility (vs. incompatibility) as factors. Within each block, a restricted random sampling procedure was used to generate at least 12 occurrences for each design cell.
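The factorial structure described above can be enumerated as a short sketch. It only illustrates how the 256 trials and the equal split of compatible and incompatible trials arise; all names are hypothetical, and the restricted random sampling of trial sequences used in the study is not reproduced here.

```python
from itertools import product

# Factor levels as described in the design; labels are illustrative.
LOCATIONS = ("left", "right")
SHAPE_TO_RESPONSE = {"circle": "left", "star": "right"}
AVATARS = ("male", "female")
ANIMATIONS = ("static", "dynamic")
REPETITIONS = 16


def build_trials():
    """Enumerate every design cell REPETITIONS times (unshuffled)."""
    trials = []
    cells = product(ANIMATIONS, AVATARS, SHAPE_TO_RESPONSE, LOCATIONS)
    for animation, avatar, shape, location in cells:
        response = SHAPE_TO_RESPONSE[shape]
        for _ in range(REPETITIONS):
            trials.append({
                "animation": animation,
                "avatar": avatar,
                "shape": shape,
                "location": location,
                "response": response,
                # Compatible when stimulus location matches response side.
                "compatible": location == response,
            })
    return trials


trials = build_trials()
static_block = [t for t in trials if t["animation"] == "static"]
```

In an actual session the order within each block would additionally have to be randomized under the constraint on sequence cells mentioned in the text.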
Two measurements were tested independently: RT and irrelevant movement (IM) velocity. The RT was measured as the difference between the onset of the target stimulus (i.e., the circle or star) and the time at which a displacement of either of the participant's hands was detected at least 20 cm relative to the corresponding shoulder. The IM was measured as the peak velocity of the average movement trajectory of the inactive hand prior to the final movement (occurring on average at 601 ± 25 ms after target onset). The movement of the correct hand was also recorded, but not analyzed, as it is confounded with RT (see Figure 2).
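The two measures can be illustrated with a simplified, per-trial sketch. It assumes that each hand's outward lateral displacement from its shoulder is available in cm at each sample time, and it expresses velocity in cm/s; the function name, data layout, and units are assumptions made for illustration, and the averaging of trajectories described in the text is not reproduced.

```python
def measure_trial(times_ms, left_cm, right_cm, threshold_cm=20.0):
    """Return (rt_ms, responding_hand, peak_inactive_velocity) or None.

    times_ms: sample times relative to target onset.
    left_cm / right_cm: outward displacement of each hand from its shoulder.
    """
    for i, t in enumerate(times_ms):
        if left_cm[i] >= threshold_cm or right_cm[i] >= threshold_cm:
            responding = "left" if left_cm[i] >= threshold_cm else "right"
            inactive = right_cm if responding == "left" else left_cm
            # Peak absolute velocity of the other hand up to the response.
            peak = 0.0
            for j in range(1, i + 1):
                dt_s = (times_ms[j] - times_ms[j - 1]) / 1000.0
                speed = abs(inactive[j] - inactive[j - 1]) / dt_s
                peak = max(peak, speed)
            return t, responding, peak
    return None  # threshold never reached


# Hypothetical samples at 100 ms spacing (the sensor itself ran at 30 Hz).
sample_times = [0, 100, 200, 300, 400]
sample_left = [0.0, 5.0, 12.0, 21.0, 25.0]
sample_right = [0.0, 1.0, 3.0, 2.0, 2.0]
result = measure_trial(sample_times, sample_left, sample_right)
```

In this invented example the left hand crosses the 20 cm threshold at the fourth sample, so the right hand is treated as the inactive hand.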

Results
The first eight trials as well as the first trial in each block were considered still part of training and removed from analysis. All trials with slow (RT > 1000 ms) or incorrect reactions were also removed, as well as the first trial directly after such scenarios, constituting 9.1 ± 6.3% of trials.
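The exclusion rules described above can be expressed as a simple filter. This is a minimal sketch under assumed field names (`rt_ms`, `correct`, `first_in_block`), not the analysis code used in the study.

```python
def usable_trials(trials, n_training=8, max_rt_ms=1000):
    """Return a list of booleans marking trials kept for analysis."""
    keep = []
    previous_bad = False
    for index, trial in enumerate(trials):
        # Slow or incorrect reactions are excluded ...
        bad = (not trial["correct"]) or trial["rt_ms"] > max_rt_ms
        excluded = (
            index < n_training          # initial training trials
            or trial["first_in_block"]  # first trial of each block
            or bad
            or previous_bad             # ... as is the trial that follows them
        )
        keep.append(not excluded)
        previous_bad = bad
    return keep


# Hypothetical data: 12 trials, one error and one slow response.
example = [
    {"rt_ms": 600, "correct": True, "first_in_block": i == 0}
    for i in range(12)
]
example[9]["correct"] = False
example[11]["rt_ms"] = 1200
example_mask = usable_trials(example)
```

Here only the ninth trial survives: the first eight count as training, the tenth is an error, the eleventh follows an error, and the twelfth is too slow.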
In repeated measures ANOVAs with animation of the avatar (static vs. dynamic), the repetition of the avatar (repeated vs. alternated), the previous compatibility (vs. incompatibility), and current compatibility (vs. incompatibility) on RT and IM, current compatibility significantly affected both RT, F(1,15) [...] None of the other main effects was significant for RT, ps > 0.59, or IM, ps > 0.20. In general, the IM measure showed a pattern similar to the RT, with interacting variables significantly affecting either both RT and IM, or neither. However, one effect was uniquely observed for one measure: compatibility significantly interacted with avatar identity, F(1,17) = 4.60, MSE = 10.70, p = 0.048, ηp² = 0.22, for IM only. This indicated that the compatibility effect was larger (C − I = 40.4 pts) after repeated than after alternated (23.3 pts) avatar identities.
Critically, a significant interaction effect between previous and current compatibility was observed for both measures, RT F(1,15) = 80.31, MSE = 545.71, p < 0.001, ηp² = 0.83; IM F(1,15) = 13.02, MSE = 16.88, p = 0.002, ηp² = 0.45. This showed a clear replication of a CAE, with the effect of incompatibility being reduced following incompatibility, for both RT (cC − cI = 73 ms, iC − iI = 22 ms) and IM (cC − cI = 49.8 pts, iC − iI = 13.9 pts). Finally, a significant four-way interaction suggested conflict adaptation to be dependent on both the repetition of the avatar and its animation, RT F(1,15) = 5.25, MSE = 84.36, p = 0.04, ηp² = 0.25, and IM F(1,15) = 10.37, MSE = 8.60, p = 0.005, ηp² = 0.39. To better understand the significant four-way interaction, we calculated the interaction term for each individual combination of avatar animation and avatar repetition. These CAE scores represent the decrease in the conflict effect as a function of preceding trial and are summarized in Figure 2. As can be seen from the figure, a maximal CAE was observed in repeated, static conditions for both RT and IM, indicating a replication of a standard CAE or Gratton effect (Gratton et al., 1992; Botvinick et al., 2001). CAEs were lower during static, alternated trials, with the CAE in IM becoming non-significant (4.15 ± 16.58 pts), replicating previous observations of the context dependency of the CAE. However, this context dependency itself was modulated by the animation of the avatar: in dynamic conditions, the alternated avatar identities no longer caused a disruption of the CAE.

Discussion
The results show that both the identity of the avatar, and its relation with the participant, affect cognitive performance. In general, participants suffered from a smaller conflict effect after conflict was repeated. Replicating previous studies suggesting conflict adaptation acts locally, or depends critically on irrelevant cues, the CAE was found to be disrupted if the identity of the avatar was changed. In other words, despite the avatar itself being entirely irrelevant to the task, a subtle change in its appearance reduced the CAE. This could be due to the change in cue disrupting recall of the preceding episode, disrupting feature integration and perhaps recall of control-related parameters.
One might imagine, as we sketched in the introduction, that perceiving the avatar as actively mimicking the participant's actions would make it necessarily related to the task, as opposed to, as in the static case, an accidental bystander. Consequently, a change in the mirror image could constitute a particularly disrupting, if not disturbing event: after all, such an imaginary change in self-perception is a classic motif in horror stories (Dietrich, 1992) and a symptom in psychiatry (Maack and Mullen, 1983). Whether frightful or merely task-relevant, the predicted effect of avatar changes should from this perspective be larger in animated than in static conditions. However, this prediction clearly did not hold. Conditions in which the avatar was displayed dynamically, with its movements mimicking those of the participant, no longer showed the disruptive effect of identity changes on the CAE. Indeed, if anything, the effect sometimes even seemed to increase after a change.
One way to account for this could be in terms of an integration process that makes the avatar similar to "a tool" as held by the participant. In the rubber-hand illusion, seeing an object being stroked and feeling the sensation on the real hand brings about the perception that the virtual object is part of oneself (Botvinick and Cohen, 1998). Here, a virtual persona is likewise presented in synchrony with the participant's actions. By acting consistently in concert with the subject, it is likely that a bi-directional association is formed (Hommel, 1996), between one's own intentions and the behavior carried out by the avatar. Such bidirectional association has recently been shown to elicit a certain unity between model and imitator, as shown by facilitated action execution if a model anticipates imitation rather than counter-imitation (Pfister et al., 2013).
Thus, if perceiving the dynamic avatar results in similar co-representation, the result could be that in the dynamic condition, the avatar is not necessarily an aspect of the task anymore, but an aspect of the agent. This, in turn, should have a critical effect on control in the degree to which the new and the old trial relate: the superficial identity of the avatar may have changed, but it should still point toward the same distal (Hommel, 2009) property. The repetition would then act as an episodic recall cue for the preceding trial, in which the same agent (i.e., the participant him- or herself) was present. In other words, different task-related features, whether relevant or irrelevant, may retrieve preceding, potentially partially overlapping trials, but changes in the avatar still relate to the self-same agent, who was always present in the preceding trial as well.
A competing explanation for the findings could be that the dynamically portrayed avatar made it more difficult to see changes affecting the identity of the avatar. However, this seems to run counter to previous studies showing that effects on conflict adaptation remain even with stimulus displays featuring dynamic contextual cues (Spapé and Hommel, 2014). Alternatively, it may not have been the animation itself that was critical in disturbing the context dependency of the CAE, but the fact that the animation was congruent with the participant's own movement. This form of agency could perhaps counteract the effect by inducing a type of "change-blindness" (Simons and Levin, 1997) to the changes in identity. In the end, however, this forward-interfering account seems presently difficult to distinguish from the earlier, retrieval-based one.
Finally, we would like to discuss some novel aspects of the platform and methodology used in the experiment, as with the publication of this article, we release it as open source, freely available (www.cognitology.eu/SelfInConflict.html) to the academic community. The compressed archive contains source, binaries and a short documentation file (see README.txt inside archive). Notice that, apart from the dynamic and static conditions referred to in the present manuscript, the platform also allows pre-programmed avatar animations with an onset equal to the average RT of the participant. We decided not to use these animations for the present study, as we had no predictions for model-imitator incongruency at the time (but see Pfister et al., 2013), but we could well imagine that this option could be of interest to fellow researchers.
The first aspect to note, particularly of interest for studies of conflict control, is the use of motion tracking. Although the field remains dominated by simple RTs and two- to four-alternative forced-choice paradigms, current theoretical models, neuroscience methods and motor control paradigms (Scherbaum et al., 2010; Spapé and Serrien, 2010; Serrien and Spapé, 2011) indicate that focusing on the far endpoint of an action (the time at which a button is fully pressed) ignores valuable data. Although previous studies found compatibility affecting response force as well as RT (van der Lubbe et al., 2001), the present study goes further to show the time-course of response conflict in the irrelevant response modality. It is possible that the other hand provides a more optimal indicator of conflict than the correct hand, as it is presumably less affected by early control operations that may partially negate the final RT. Of course, previous studies have circumvented the issue by providing measures related to the activation of the irrelevant motor cortex (Valle-Inclán, 1996) and muscles (Hasbroucq et al., 1999). However, the presented IM measure has the advantage of being very directly related to irrelevant response tendency as well as being rather cost-effective in terms of the expense of consumer-grade apparatus and the time involved for participants and researchers (no recording preparation or calibration requirements).
The second aspect of the study that merits further discussion is the virtualized design. In a wider setting, the experiment may provide a relatively low-cost virtual reality platform for studies of cognition and social identity. Here, we showed effects of changing one's identity, implying that the setup can be a useful tool for the study of social and virtual identity. Social psychological effects, such as social facilitation (Zajonc, 1965) and conformity (Asch, 1951), can be easily tested without relying on confederates by adding extra avatars and operating them remotely (see Blascovich et al., 2002 for an overview of the benefits of immersive virtual environments). Tests of implicit stereotyping and embodied cognition could involve the adjustment of the shape of the avatar to enable identification with various cultural stereotypes. In sum, the study demonstrates that the present design (open source code available at www.cognitology.eu/SelfInConflict.html) may provide an interesting new approach for researchers in a variety of fields.
Finally, the study blends the fields of executive control and conflict with the study of human-computer interaction (HCI). Given the growing diversity of input techniques and the heterogeneity of user interfaces, basic psychological studies can inform design by taking into account how different interaction techniques inflict conflict or provide control. User interfaces such as the one employed in the study are increasingly becoming part of everyday consumer products such as game consoles (Harper and Mentis, 2013) and public displays (Kuikkaniemi et al., 2011). This has prompted research in HCI to reconsider embodied interaction with virtual representations (Wilson et al., 2012). The study also demonstrates that self-representing avatars may positively contribute to interfaces designed for scenarios with common distraction and a high demand for attentional control. This should motivate further investigation of the effects of avatars on various persuasion phenomena in a wide range of application contexts.