EyeFrame: Real-Time Memory Aid Improves Human Multitasking via Domain-General Eye Tracking Procedures

Taylor, P.; Bilgrien, Noah; He, Ze; Siegelmann, Hava T.

doi:10.3389/fict.2015.00017

ORIGINAL RESEARCH article

Front. ICT, 02 September 2015

Sec. Human-Media Interaction

Volume 2 - 2015 | https://doi.org/10.3389/fict.2015.00017

P. Taylor¹,²*

Noah Bilgrien¹

Ze He¹,³

Hava T. Siegelmann¹,²

¹College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
²Neuroscience and Behavior Program, University of Massachusetts, Amherst, MA, USA
³Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst, MA, USA

Objective: We developed a broadly general closed-loop system to improve human interaction with semi-autonomous agents, processes, and robots in various multitasking scenarios.

Background: Much technology is converging toward semi-independent processes with intermittent human supervision distributed over multiple computerized agents. Human operators multitask notoriously poorly, in part due to cognitive load and limited working memory. To multitask optimally, users must remember task order, e.g., the most neglected task, since longer times not monitoring an element indicate greater probability of need for user input. The secondary task of monitoring attention history over multiple spatial tasks requires cognitive resources similar to those of the primary tasks themselves. Humans cannot reliably make more than ~2 decisions/s.

Methods: Participants managed a range of 4–10 semi-autonomous agents performing rescue tasks. To optimize monitoring and controlling multiple agents, we created an automated short-term memory aid, providing visual cues from users’ gaze history. Cues indicated when and where to look next, and were derived from an inverse of eye fixation recency.

Results: Contingent eye tracking algorithms drastically improved operator performance, increasing multitasking capacity. The gaze aid reduced biases, and reduced cognitive load, measured by smaller pupil dilation.

Conclusion: Our eye aid likely helped by delegating short-term memory to the computer, and by reducing decision-making load. Past studies used eye position for gaze-aware control and interactive updating of displays in application-specific scenarios, but ours is the first to successfully implement domain-general algorithms. Procedures should generalize well to process control, factory operations, robot control, surveillance, aviation, air traffic control, driving, military, mobile search and rescue, and many tasks where probability of utility is predicted by duration since last attention to a task.

1. Introduction

Interactions with partially autonomous processes are becoming an integral part of human industrial and civil function. Given semi-autonomy, many such tasks can often be monitored by a user at one time. In the course of operating a computer or vehicle, a single human might manage multiple processes, e.g., search and rescue type mobile robots, performing medical supply distribution, patient checkup, general cleanup, firefighting tasks, as well as process control with many dials or readings, security or surveillance monitoring, or other forms of human-based monitoring or tracking tasks. Generally, each automated agent or process only needs intermittent supervision and guidance from a human to optimize performance, and thus a single user can remotely operate or supervise multiple entities, for efficiency of labor. When controlling multiple automated processes at once, the user must decide how to distribute attention across each task. Even if the operator conducts the same type of task with each automated process, this form of human–system interaction requires a multitasking effort.

Unfortunately, most people are notoriously poor multitaskers (Watson and Strayer, 2010) and can remain unaware of visually subtle cues that indicate the need for user input. Further complicating the situation, individuals who perform worst at multitasking actually perceive they are better at multitasking, demonstrated by negative correlations between ability and perception of ability in large studies (Sanbonmatsu et al., 2013). To make matters worse, humans often naturally develop a plethora of biases of attention and perception. To address many of these issues, divided attention performance has been studied for many years (Kahneman, 1973; Gopher, 1993). A further difficulty in multitasking is that brains rely heavily upon prediction and, fundamentally, are incapable of knowing what important information they have missed.

Eye tracking to ascertain point of gaze is a highly effective method of determining where people orient their attention (Just and Carpenter, 1976; Nielsen and Pernice, 2010), as well as what they deem important (Buswell, 1935; Yarbus, 1967). Traditionally, eye tracking informed post-experiment analysis, rather than helping users in the field in real-time. For example, a study might analyze optimal gaze strategies in high-performing groups, and then at a later date, train new users on those previously discovered optimal search strategies (Rosch and Vogel-Walcutt, 2013). Similarly, studies have trained novice drivers’ gaze to mimic that of experienced drivers with lower crash risk (Taylor et al., 2013).

Alternatively, eye movement strategies can be employed to optimize real-time task performance, since many eye-movements are capable of being intentionally controlled. For those eye movements that cannot easily be intentionally controlled, salient “pop-out” cues (e.g., flashing red box around target) can reliably direct attention in a more automatic, bottom-up manner. As we discuss further, many eye tracking systems have been developed for real-time control, with very few attempting pure assistance, though none were both successful and domain-general (Taylor et al., 2015). There appears to be a need for such an assistive system. Here, we tested a solution, which was uniquely domain-general, non-interfering, purely gaze-aware, and most importantly, yielded large benefits in performance.

The work described here maintained several hypotheses and operational predictions. When a user manages multiple tasks in the real world, the primary task must be performed, while a secondary task of correctly remembering their attention history also often contributes to optimizing performance. Here, we provided participants a computerized system to perform the secondary task of remembering, processing, and then displaying actions derived from gaze history. This treatment was predicted to increase users’ ability to perform any task requiring semi-random gaze patterns over a large multi-faceted display or set of dials and readouts. Supplying participants with this highlighted inverse of eye gaze recency may improve user performance by delegating short-term working memory for attention history to the computer. This could generalize to many multitasking scenarios where probability of needed action on a given single task increases over duration since last interaction with the task.

2. Materials and Methods

We designed an eye tracking system to assist users in a monitoring and search task with multiple simulated agents. Assistance was specified by an algorithm to highlight in real-time the most neglected task elements. The game-like task designed for participants required human monitoring and interaction with multiple rescue robots at one time. Our algorithm helped users determine where to look, to improve performance and reduce cognitive load. Our goal was to create a game task that tested a more general utility of design, to extrapolate to many future tasks in which the probability of need per task increases with time.

2.1. Experimental Task for Human Subjects: Ember’s Game

To participants, the experimental task was presented as “Ember’s Game.” It was used to test our short-term working memory aid. A user managed from 4 to 10 mobile virtual agents (robots) on a computer monitor at one time, in sequential experimental blocks (trials). The game provided a measurable means to assess visual-spatial multitasking. Beyond the experimental questions it asked, the simulation was designed to be engaging for the user, practically oriented, and to represent a broad variety of human–robot, human–computer, or process-control tasks encountered in the fields of human factors and engineering psychology for industrial, civil, or military applications.

In each task, a human operator directed a firefighting robot to move semi-autonomously through a natural building-like environment, to rescue people and pets from a fire elsewhere in the building (Figure 1). Three types of objects were included in the game: (1) firefighting robots, which autonomously explored the space via a semi-random walk, saving fire victims upon contact, (2) primary targets, which were human rescue victims and earned the operator 10 points when saved, and (3) secondary targets, which were puppies and earned the operator 5 points when saved. The instructed goal of the game was to obtain as many points as possible by saving rescue victims within a fixed time limit. The user could improve on the robot’s performance by assuming control of the semi-random walk behavior to send the robot more directly to targets. This human intervention would accelerate rescue times over the less targeted random walk movement.

Figure 1. The experimental task: Ember’s Game. (A) Participants were instructed to control remote firefighting robots (top left) to save rescue targets (top right), on map-panel tasks (2 displayed at bottom). During the experimental trials, these robots were represented by red firefighter hats (above), with each traveling through one separate building floor to find rescue targets. The primary rescue targets were children (pictured) and the secondary targets were puppies (pictured). The participant was awarded points for each rescue target the robot contacted (10 points for primary child targets, 5 points for secondary dog targets). Each robot stayed within its own separate map-panel task, moved independently of the other robots on other panels, and needed to navigate around walls to get to a target. The semi-autonomous movement of the robot was controlled by the specified decision probabilities above, unless the human intervened to send the robot directly to a location. The participant had an opportunity to intermittently control multiple robots, each on a separate “building floor” (shown as a map-panel task above and in the experimental trials). Occasional human intervention could improve on semi-autonomous movement. Over the course of the experiment, more and more map-panel tasks were added for participants to simultaneously monitor (4–10 maps). (B) Gaze tracking assistive system design. Dark red frame outlined the frame looked at longest ago (most neglected frame), and pale red frame that which was looked at second longest ago.

Participants were instructed to manage multiple independent sections of the burning building at one time, where each building section had its own maze, firefighter, and rescue targets (Figure 1). The set of building map-panel tasks was arrayed across a computer screen in a grid formation (Figure 1). Since the robots were semi-autonomous and could follow human instructions, yet took around 1–10 s to reach their targets, each map-panel task only required user interaction intermittently. When a user interacted with a building map-panel task and satisfied task requirements (“saved” a rescue target), new rescue targets could appear afterwards, which was an opportunity to gain more points. Thus, the probability of new targets having appeared on that map panel increased with time. Consequently, the user could optimize their own behavior by switching between these tasks to interact with whichever map-panel task required intervention. Each user played multiple experimental blocks of the experiment, first with 4 map-panel tasks, then 5, and so on up to 10 map-panel tasks, requiring optimal switching between map panels to score well.

While Ember’s game is simple, it particularly taxes spatial working memory used in visual screen monitoring environments and should generalize well to many tasks in which the probability of task utility increases as a function of time since user intervention with the given task element. These methods are potentially beneficial, in part because they may yield improvement via a simple visual reminder for many types of working memory intensive spatial monitoring tasks.

2.2. Manipulated Independent Variable: Cue-Type

We employed eye tracking to produce contingent highlighting via gaze history across the set of all maps (Figure 1). This gaze history was inverted compared to traditional heat-maps for gaze history; in this study, the most neglected map-panel tasks were highlighted for the user. In this gaze-aware display, elements in the display array were highlighted in color-steps, like a gradient of time since the user last looked at the display, with the elements looked at longest ago highlighted most saliently. This gradient estimated probabilities of utility, which notably, need not be independently discoverable by the computer (e.g., via computer vision).
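The inverted gaze history described above can be illustrated with a short sketch. This is not the authors' code; it is a minimal illustration that assumes each panel stores the time of its most recent gaze, and the names `rank_by_neglect`, `choose_frames`, `dark_red`, and `pale_red` are hypothetical.

```python
from typing import Dict, List, Optional


def rank_by_neglect(last_looked: Dict[int, float], now: float) -> List[int]:
    """Return panel ids ordered from most to least neglected.

    last_looked maps a panel id to the time (s) of the most recent gaze on it.
    A longer time since the last look means a more neglected panel.
    """
    return sorted(last_looked, key=lambda p: now - last_looked[p], reverse=True)


def choose_frames(last_looked: Dict[int, float], now: float) -> Dict[str, Optional[int]]:
    """Pick the panels that receive the two frame cues (hypothetical labels)."""
    ranking = rank_by_neglect(last_looked, now)
    return {
        "dark_red": ranking[0] if len(ranking) > 0 else None,  # looked at longest ago
        "pale_red": ranking[1] if len(ranking) > 1 else None,  # second longest ago
    }


# Example with four panels; times are seconds since block start (made-up values).
history = {0: 10.0, 1: 2.0, 2: 7.5, 3: 9.0}
cues = choose_frames(history, now=12.0)

# Once the user looks at the most neglected panel, its timestamp is refreshed
# and the cues move on to the next most neglected panels.
history[cues["dark_red"]] = 12.0
next_cues = choose_frames(history, now=12.5)
```

Sorting by elapsed time gives the full gradient; a display with more than two color-steps could use more of the ranking than the first two entries.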

The experiment was a between-subjects design with three groups of participants (one test and two control), determined by three “frame types” that were employed: (1) helpful frame cues, (2) randomly moving frames (“active” control), and (3) no frames (“absent” control). These three experimental groups of participants are detailed further here:

1. Helpful frame cues (“On” experimental condition): our short-term working memory aid algorithm drew a red frame cue around the most neglected map-panel task (the one looked at longest ago), which may thus have been most in need of the user’s attention, and drew a pale red frame around the map-panel task looked at second longest ago. To do so, on every monitor refresh, the eye tracker notified the computer and game of the location of gaze. Frame cues for the most neglected map panels were removed and updated if the user glanced at or interacted with a map panel. As a backup for eye tracker error, if the user clicked on a map-panel task, the frame was removed, since the user had manually interacted with that map-panel task. In summary, a map-panel task glanced at recently was not highlighted, while a neglected map-panel task glanced at longest ago was brightly highlighted. This history of successive gazes was then used to provide real-time visual cues for more effective task switching between maps, throughout the entire course of the game.

2. Randomly moving frame highlighting (“Random” control condition): mimicked the Helpful condition (On) frames in most other aspects, other than their relationship to user gaze. Using the same physical stimulus as the Helpful condition, two randomly chosen map-panel tasks were highlighted at any given time, and stayed highlighted for a random amount of time between 1 and 2 s, closely approximating the amount of time a frame stays highlighted when a user looks at a given map-panel task. Random highlighting helped to control for novelty or pop-out effects using the same physical stimulus as during Helpful frames, but without the helpful information. Users were notified before they started playing that the red frames were random and irrelevant to the game, and the user was able to choose their own strategy for switching between maps.

3. No map-panel task frame highlighting (“Off” control condition): no frames were displayed in this second control condition, and the user chose their own strategy for switching between map-panel tasks.
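The Random control in item 2 can be sketched as a precomputed highlight schedule. The text only specifies that two randomly chosen panels were highlighted at any time, each for a random 1–2 s; this sketch additionally assumes that both frames move together and that durations are uniformly distributed. The function name `random_frame_schedule` is hypothetical.

```python
import random
from typing import List, Tuple


def random_frame_schedule(
    n_panels: int, block_seconds: float, rng: random.Random
) -> List[Tuple[float, float, Tuple[int, int]]]:
    """Build (start, end, (panel_a, panel_b)) intervals covering one block.

    Assumption: both frames jump at the same moment, and each interval
    lasts a uniformly random 1-2 s (the final interval may be cut short).
    """
    schedule = []
    t = 0.0
    while t < block_seconds:
        end = min(t + rng.uniform(1.0, 2.0), block_seconds)
        a, b = rng.sample(range(n_panels), 2)  # two distinct panels
        schedule.append((t, end, (a, b)))
        t = end
    return schedule


# Example: a 150 s block with 10 panels, seeded for reproducibility.
schedule = random_frame_schedule(10, 150.0, random.Random(0))
```

Because the schedule ignores gaze entirely, it keeps the physical stimulus of the Helpful condition while carrying no information about attention history.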

Each group experienced the same progression of 4 simultaneous maps to manage, up to 10, creating 7 levels of the map-panel task number factor.

2.3. Experimental Procedure

The experimental sequence was as follows: before starting, each participant received and signed an informed consent. Participants were randomly assigned to conditions (Helpful frame cues, Random frames, Off frames). The eye tracker apparatus was explained to participants. Where possible, all experimental groups received identical instructions (except as required for condition-differences) for playing Ember’s game, and any questions were answered (instructions included in Supplementary Material). Before starting, lights were dimmed to a level consistent across participants, for calibration reliability and pupil dilation consistency. Participants were given practice playing, first training 1 building map-panel task at a time, then training with 2 maps at the same time, followed by 3 maps, with 60 s for each block. Then, experimental trials began (different highlighting for each group), with participants playing an experimental block containing 4 map-panel tasks simultaneously, and subsequent blocks containing 5 through 10 maps simultaneously (150 s each block), adding 1 map-panel task per block.

2.4. Measurement and Dependent Variables

A variety of measures were recorded to index working memory load, cognitive load, and distribution of humans’ visual attention. Behavioral performance and reaction times were measured to analyze strategies. Point of gaze was recorded throughout the task. Measures of pupil dilation indexed cognitive load. Many studies have shown that pupil dilation is a reliable measure of cognitive load under certain conditions (Hess and Polt, 1964; Kahneman and Beatty, 1967), with more mental effort typically assumed to be associated with larger pupil size; expanded in the Discussion.

Data logging included the status of all experimental variables on every refresh (at 30 Hz) during experimental trials. Behavioral data were indexed by location and status of all game elements, such as robot location, path location, target location, and time of target detection. Eye data were indexed by left and right point of gaze on the screen (x, y coordinates) at the refresh rate frequency, the calibration quality data (error quantity) before every new block, and pupil dilation as left and right eye diameter in millimeters at every time-step. Mouse location (x, y coordinates) was recorded at the same frequency for comparison to gaze data.

2.5. Participants

A total of 44 human subjects participated in Ember’s game. All procedures complied with departmental and university guidelines for research with human participants and were approved by the university institutional review board. Participants were compensated for their time with $5 USD. Data were not excluded based on behavioral task performance, in order to obtain a generalizable sample of individual variation on performance of the task, while avoiding a restriction of range (Myers et al., 2010). Two participants with vision correction causing poor calibration quality for entire blocks were excluded, leaving 42 subjects. No data were excluded within this pool of subjects. Minimal pilot data were collected using extended game-play and testing on the experimenters themselves, though these data were not included due to their short length and differing parameters from the experimental subjects. Between-subjects design was selected, to avoid “order” and “training” effects which are present in within-subjects designs, particularly with 2 of 3 groups being controls. Each participant reported past video game experience, current vision correction, age, and sleep measures for the previous several days; this was done after rather than before experimental task participation to prevent bias.

2.6. Technical Implementation

We used a desk-mounted GazePoint GP3 eye tracker to pinpoint the users’ point of gaze, i.e., the point on the screen the user is fixating. This eye tracker has an accuracy of 0.5–1 degrees of visual angle and a 60 Hz update rate. Nine-point calibration was performed immediately before every new 150 s block. Python and PyGame were used to program the experiment, and interfaced with the eye tracker’s open standard API via TCP/IP, generously provided by GazePoint (http://www.gazept.com). Gaze on a panel (for the Helpful frames condition) was inferred if the user had been looking at a map-panel task continuously for 10 frames (about 1/3 of a second). Fixations are generally considered to be roughly 80–100 ms, and this 300 ms duration was chosen because it approximated the amount of time a player had to look at a map-panel task to focus on it and obtain the information they needed, while also being long enough to significantly reduce the effect of eye tracker error.
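The 10-frame dwell rule can be sketched as follows. This is an illustrative reconstruction, not the original implementation: the helper `panel_at`, the class `DwellDetector`, and the screen and grid dimensions in the example are assumptions, and no eye tracker API calls are shown.

```python
from typing import Optional, Tuple


def panel_at(x: float, y: float, screen: Tuple[int, int], grid: Tuple[int, int]) -> Optional[int]:
    """Map a gaze point (pixels) to a panel index in a cols x rows grid."""
    width, height = screen
    cols, rows = grid
    if not (0 <= x < width and 0 <= y < height):
        return None  # gaze off screen or tracker dropout
    col = int(x * cols / width)
    row = int(y * rows / height)
    return row * cols + col


class DwellDetector:
    """Report a panel as looked at after N consecutive frames of gaze on it."""

    def __init__(self, frames_required: int = 10) -> None:
        self.frames_required = frames_required
        self.current: Optional[int] = None
        self.count = 0

    def update(self, panel: Optional[int]) -> Optional[int]:
        """Feed one frame; return the panel id once the threshold is reached."""
        if panel is not None and panel == self.current:
            self.count += 1
        else:
            # Gaze moved to another panel (or was lost): restart the count.
            self.current = panel
            self.count = 1 if panel is not None else 0
        if self.count >= self.frames_required:
            return self.current
        return None


# Example: a hypothetical 1920x1080 screen with a 5x2 grid of panels.
detector = DwellDetector(frames_required=10)
panel = panel_at(100, 100, screen=(1920, 1080), grid=(5, 2))
results = [detector.update(panel) for _ in range(10)]
```

Requiring consecutive frames means that a single noisy sample landing on a neighboring panel resets the count, which is one way the threshold can reduce the effect of tracker error.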

2.7. Statistical Analysis

Most statistics are displayed within figures themselves, either (1) as standard error of the mean (SEM) bars, which, in our experiment, conservatively indicate statistically significant differences between groups by approximating t-tests if SEM bars are not overlapping between conditions, as explained below, (2) as pairwise t-tests superimposed on map bias task arrays, (3) as Pearson’s product moment correlation coefficient r and p-values superimposed on scatter-plots, and (4) as effect sizes calculated via Cohen’s d (Table S1 in Supplementary Material).
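For readers unfamiliar with the effect size measure in item (4), Cohen's d can be sketched as below. The text does not state which variant was computed, so the pooled-standard-deviation form used here is an assumption, and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Illustrative values only, not data from the experiment.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

In this toy example, the group means differ by 1 and the pooled standard deviation is 2, so d is 0.5.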

The t-statistic is defined as the difference between the means of two compared groups, divided by the SEM, (μ₁ − μ₂)/SEM. Thus, within the parameters of this experiment (and above any typical n), it is a mathematical necessity that when the SEM bars do not overlap, a t-test on those same data would be significant at an alpha criterion of around p < 0.03 for a one-tailed t-test for effects in the expected direction (as they were in this experiment).

The low number of tests within proposed statistical families, the presence of consistent global trends, and guidelines discussed below, all argue against correcting p-values themselves for multiple comparisons (Rothman, 1990; Saville, 1990; Perneger, 1998; Feise, 2002; Gelman et al., 2012). Further, many statisticians do not recommend numerically correcting for multiple comparisons (Rothman, 1990; Saville, 1990). Rather, it is often recommended to document individual uncorrected p-values, while being transparent that no correction was performed for multiple comparisons. In light of the differing opinions of what defines a statistical family, we provided Bonferroni-corrected alpha thresholds, though these are known to be overly conservative (Perneger, 1998), especially for measures predicted to be correlated, as ours were. The p-values presented for these experiments in Table S2 in Supplementary Material should be interpreted while considering that alpha thresholds would traditionally be significant at p < 0.05, while with seven hypothetical measures in a family the Bonferroni threshold was adjusted to around p < 0.007, with 21 hypothetical tests per family measure the Bonferroni alpha would be adjusted to approximately p < 0.002, or with an evaluation more similar to Dunnett’s method, a corrected measure for 10 tests was around p < 0.004. When these corrections were applied to the p-values found in Table S2 in Supplementary Material, no conclusions were changed. Post hoc, the percentage of significant tests out of the set of all tests can be observed, and evaluated as to whether it deviates or conforms to the statistically expected percentage. This is similar to methods, such as Holm’s, which rank p-values, or methods which evaluate the percentage of significant tests out of the set of total tests, and it should be noted that our conclusions rested not upon a single test, but upon globally uniform patterns.
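The Bonferroni thresholds quoted above follow from dividing the conventional alpha by the number of tests in a family. A minimal sketch (the function name `bonferroni_alpha` is hypothetical):

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Family sizes mentioned in the text: 7 measures, or 21 tests per measure.
for n_tests in (7, 21):
    print(n_tests, round(bonferroni_alpha(0.05, n_tests), 4))
# prints:
# 7 0.0071
# 21 0.0024
```

These values match the approximate thresholds of p < 0.007 and p < 0.002 given in the text. The Dunnett-like threshold is not reproduced here because it does not reduce to a simple division.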

Data processing and plotting were programmed in the R-project statistical environment (R Core Team, 2013).

3. Results

Below, we detail the practically relevant results with bearing on applications in human factors and related fields. For a cognitive and mechanistic deconstruction of the effect of our cuing system on human participants, we provide a more elaborate analysis in Taylor et al. (2015).

3.1. Helpful Frame Cues Improved Performance

Most importantly, our results demonstrated large performance improvements in mean game scores for operators using our memory aid, as compared to the two control groups (Figure 2). The Random frames control condition showed slightly reduced scores as compared to the Off frames control condition, indicating that the randomly moving frames may have been distracting to the users, and validating the importance of a second control. The benefit of the eye tracker appears largest for larger numbers of maps, likely because the eye tracking system compensated for the inability to optimize switching across seven or more panels.

Figure 2. Total number of targets acquired (Y-axis), averaged by condition (bottom X-axes) and across number of maps monitored at once (top X-axis in dark gray). Helpful frames were plotted as green triangles, Random as orange squares, and Off frames as blue circles. Error bars show SEM in this and all following figures, and thus, if each is not overlapping between conditions, these indicate statistically significant differences between the groups (t-test equivalent, explained in text). The benefit of Helpful frame cues varied as a function of the number of maps on the screen, and by proxy, the amount of information provided by the Helpful frames. Our Helpful frame cue eye aid demonstrated large improvements in direct performance, as measured by large effect sizes for seven maps and up (Cohen’s d values around 1; Table S1 in Supplementary Material).

3.2. Reaction Times were Improved

As would be predicted by higher scores, participants were also faster at managing their task, measured in several ways. First, reaction times were faster to targets, as measured by the delay from a target appearing, to the user directing the robot toward it (Figure 3A). Second, participants were faster to assist waiting robots, which required user input to actually save the primary targets (Figure 3B). Third, the delay from the time a target spawned until the target was actually acquired for points was shorter in the Helpful frame cue condition (Figure 3C). Interestingly, though total scores in the Off frames control were only marginally better than Random frames control (above), reaction times were reliably worse for the Random condition than the Off frames condition; compare Figure 2 with Figure 3.

Figure 3. Participants were faster to manage their game tasks. (A) Users were faster to set paths to send robots toward targets after the target’s appearance, (B) robots waited less time for human input at the primary target, and (C) users were faster to actually acquire points after target appearance. These results indicated that participants more quickly satisfied task requirements with the Helpful frame cue aid.

3.3. Global Bias was Reduced

When interacting with this simulation, like many real-world tasks, it is often important to eliminate biases when these biases are either artifactual or irrational. In this case, a bias would take the form of greater time spent attending to a single map-panel task, over the time spent looking at equally relevant other map-panel tasks on the screen at the same time. We employed two possible measures of bias. The first measure averaged the duration of time spent on each map-panel task, with a set of averages for each map-panel task in each experimental block (each experimental block has a different number of map-panel tasks). This first measure took the form of a heat overlay displaying the cumulative time the gaze spent on each map-panel task during the entire block (Figure 4A). In this exemplar case of 10 map-panel tasks, most subjects in Random and Off conditions were biased toward the maps in the middle of the screen (darker colors in the middle squares), while attention was more evenly spread in the Helpful frame experimental condition (more evenly spread heat map over the whole array). This measure agreed with previous literature that human subjects have an “edge effect,” being biased toward the center (Parasuraman, 1986). One limitation of this grand-mean measure is that it is primarily sensitive to spatial biases that occur across all subjects.

Figure 4. Helpful frame group was less biased to particular frames. (A) Total duration of time spent on each map in block with 10 map-panel tasks. Conditions were represented by three diagonal-super-arrays of 10 maps (Off, On, Random; blocks 4–9 not depicted), while each individual square was a map-panel. Dark panels were looked at for longer times than light panels. Statistical comparisons (t-test; p-values in squares) of each map and condition at matrix intersection in bottom left three super-arrays (On versus Off; Random vs. Off; On vs. Random). Random and Off controls were biased toward the center map-panel task, with more even distribution in Helpful frames. (B) Which map is biased toward may vary across subjects. A bias measure insensitive to which map was biased, variability (SD), was increased in both control groups, and when compared to Helpful assistance, illustrated lesser bias in the Helpful (On) condition. (C) Helpful eye tracking frames reduced variability (bias) in mouse duration per map.

Thus, to systematically quantify any biases at an individual subject level, we calculated the standard deviation (SD) of total times on each panel, across all map-panel tasks. This described the variation in cumulative time the eyes spent on each of the maps in an experimental block and map number condition (Figure 4B). For example, in the ten-map Helpful condition, the set of cumulative times across each of the ten maps had a low variability and thus each map-panel task was looked at for a similar amount of time, while in the control conditions the set of cumulative times across each of the ten maps had a higher variability, indicating that there was greater bias toward some maps and away from others. Confirming the eye gaze results, a similar reduction in bias was observed for the time the mouse spent over each map-panel task (Figure 4C). This decreased variability of total time spent on any single map-panel task compared to the rest indicated improved consistency of time spent on each map-panel task with the eye tracker, while participants favored some map-panel tasks (robots) irrationally when they did not have the eye tracker’s help.
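The per-subject bias measure can be illustrated with a short sketch. It assumes that gaze has already been reduced to one panel index per logged frame and uses the sample standard deviation; whether the authors used the sample or population form is not stated. The function name `gaze_bias` and the example sequences are hypothetical.

```python
from statistics import stdev
from typing import List, Optional, Sequence


def gaze_bias(
    panel_per_frame: Sequence[Optional[int]], n_panels: int, frame_seconds: float = 1 / 30
) -> float:
    """SD of cumulative gaze time per panel within one block.

    panel_per_frame holds the panel index under gaze on each logged frame
    (None when gaze was off all panels). A larger SD means more uneven attention.
    """
    frame_counts: List[int] = [0] * n_panels
    for panel in panel_per_frame:
        if panel is not None:
            frame_counts[panel] += 1
    totals = [count * frame_seconds for count in frame_counts]
    return stdev(totals)


# Made-up sequences for four panels: evenly spread gaze vs. gaze favoring panel 0.
even = gaze_bias([0, 1, 2, 3] * 30, n_panels=4)
biased = gaze_bias([0] * 90 + [1, 2, 3] * 10, n_panels=4)
```

Here `even` is zero because every panel receives the same total time, while `biased` is positive. Because the measure ignores which panel is favored, it can capture biases that differ from subject to subject.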

It is notable that not all bias is bad, since some task elements may be more important or require more frequent input than others. Current experiments in our lab automated rational biases by unevenly weighting delay for elements of the array panel, providing a gaze-contingent system to distribute gazes accordingly.

3.4. Measures of Cognitive Load were Reduced

To assess measures of cognitive load, pupil dilation was recorded, both over the course of each block and averaged across blocks. We demonstrated the dynamics of pupil dilation over the course of a trial, using data from the “7 map-panel tasks” block as an exemplar, since this is where behavioral benefits started to convincingly appear. To do so, three time-traces for pupil dilation, one per condition, were plotted over time (Figure 5A). For 7 map-panel tasks, the Helpful condition had the lowest pupil dilation relative to the other conditions, while Off frames was in the middle, and Random frames had the largest pupil dilation. Pupil dilation was then collapsed over the entire trial, for each condition and number of maps. Pupil dilation was reliably smaller with Helpful frames than in the Random frames condition (Figure 5B); though not predicted, pupil dilation did not significantly differ between the Off frames and Helpful frames conditions. This finding confirmed the benefit of including a second control group using the same physical condition with no information (Random frames). It is possible that the Random frames were distracting, and took cognitive effort to ignore, compared to the Helpful frames. Variations in pupil dilation over map number could be generally explained by experience, novelty, fatigue, or training, though since this factor was not experimentally manipulated to explore the effect of number of items on the screen, reliable interpretations of map number effects cannot be drawn. Interestingly, when pairing pupil dilation and score for each individual subject, within the Helpful condition, larger pupils were associated with higher total scores (Figure 5C). These results suggest operators exerting more “effort” had larger pupils.

Figure 5. Pupils were larger in the Random frames condition. (A) Y-axis was pupil dilation (in pixels) across time (X-axis) for an entire 150 s trial for 7 maps. Each line trace was plotted as a condition mean (Off, On, Random). (B) For a more fine-grained analysis, pupil size was averaged across each condition for each block (number of maps). (C) Each dot represented one participant’s data, with score on the Y-axis and pupil dilation on the X-axis. Within the Helpful frames condition, larger pupils were associated with better performance, perhaps due to greater cognitive effort.

3.5. Participant Sample Statistics

Lastly, we thoroughly confirmed there were no incidental differences between subject groups in each condition for features known to influence experimental performance. To do so, we tested the null hypothesis that each group had the same population mean using ANOVA for the following measures: (A) hours of sleep in the previous week did not differ (F = 0.2, p < 0.8), (B) age in years (mean = 26) did not differ (F = 1.2, p < 0.3), and (C) multiple measures of video game experience did not vary between conditions, as measured by post-experimental surveys assessing multiple measures of gaming frequency (F = 0.8, p < 0.5 – days/year; F = 0.8, p < 0.5 – h/week), and gaming history (F = 0.1, p < 0.9 – duration; F = 0.5, p < 0.6 – age started playing).
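
For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched as below. The function name and the sleep values are hypothetical, and the sketch stops at F; obtaining a p value additionally requires the F distribution, which a statistics library (for example, `scipy.stats.f_oneway`) would supply.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (conditions)
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical hours of sleep for three participants in each condition.
sleep_hours = [[6.0, 7.0, 8.0], [7.0, 8.0, 9.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova_f(sleep_hours)
```

A small F, as reported for every measure above, indicates that the group means differ little relative to the spread within groups.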

4. Discussion

4.1. Real-World Performance

Our gaze tracking algorithms improved human operator performance, with very large effect sizes, both for pure performance scores and for reaction times (Cohen’s d; Table S1 in Supplementary Material). These improvements have a good chance of being reflected in real-world applications of such an algorithm, because the scenarios tested here were designed to have realistic features and be difficult. For example, humans are reliably worse at dual-target compared to single-target searches, even though such search needs arise in everyday life and in some work settings, e.g., when scanning x-rays for both explosive devices and metal objects (Menneer et al., 2012). Relatedly, our task employed such a dual-target search for conjunction targets. While pop-out searches (for a stand-out object) are fast and hypothesized to be performed in parallel and pre-attentively, conjunction searches (for multiple features) are serial and slow (Treisman and Gelade, 1980). Our study included conjunction stimuli that required a more difficult serial search to find. Conversely, the red frame cue acted as a pop-out stimulus and was thus very easy to find.
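
As a reminder of what the effect size expresses, Cohen's d for two independent groups can be sketched as below. The text does not state which variant of d was computed, so the pooled-standard-deviation form here is an assumption, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical total scores for two conditions.
helpful_scores = [12.0, 14.0, 16.0]
control_scores = [8.0, 10.0, 12.0]
d = cohens_d(helpful_scores, control_scores)
```

By the usual convention, d values of about 0.8 or more are described as large, so a mean difference of two pooled standard deviations, as in this toy example, would count as very large.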

4.2. Bias

A user can adopt myriad biases, and visual biases are among the best known. For example, search tends to be biased toward central regions of available visual space, coined an “edge effect” (Parasuraman, 1986). We observed this type of effect in the two control conditions, and our assistive system greatly reduced this bias (Figure 4A). Visual biases, likely derived from reading, have also been observed for the upper left of a display (Megaw and Richardson, 1979), which we may have also observed. Further, individuals may vary in biases for or against different task elements, but these may not be present in group means. To address this issue, we demonstrated that the variability in looking time at each task was reduced in the assistive frames condition compared to either control, confirming more strongly that our system reduced user bias. It should be noted that not all bias is undesirable, and we extend these concepts in the Future Directions section below.
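
One simple way to quantify variability in looking time across task elements is sketched below. The exact metric used in the study is not given in this section, so the choice of the standard deviation of dwell-time proportions is an assumption, and the function names and dwell times are hypothetical.

```python
from statistics import pstdev

def dwell_proportions(dwell_seconds):
    """Convert per-panel dwell times to proportions of total looking time."""
    total = sum(dwell_seconds)
    return [d / total for d in dwell_seconds]

def dwell_variability(dwell_seconds):
    """Standard deviation of dwell proportions; 0 means perfectly even looking."""
    return pstdev(dwell_proportions(dwell_seconds))

# Hypothetical dwell times (s) on four map panels for one participant.
even_looker = [25.0, 25.0, 25.0, 25.0]
biased_looker = [55.0, 25.0, 10.0, 10.0]

even_score = dwell_variability(even_looker)
biased_score = dwell_variability(biased_looker)
```

Because the measure is computed per participant, it can reveal individual biases that cancel out in a group mean.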

4.3. Task Switching

Human operators appear to be unable to reliably make more than two decisions per second (Craik, 1948; Elkind and Sprague, 1961; Fitts and Posner, 1967; Debecker and Desmedt, 1970). Our system potentially eliminates one decision per second, or per map switch, a non-trivial benefit. Visual cues for task switching may assist operators (Allport et al., 1994; Wickens, 1997); this is particularly true when cuing important tasks (Wiener and Curry, 1980; Funk, 1996; Hammer, 1999). However, we are the first to implement such reminders via eye tracking in a manner that can easily be applied across domains and platforms. In many multitask scenarios, like 8 or more maps here, it is likely optimal to have an automated task-management strategy. Our algorithm leverages these phenomena to optimize the primary task: although secondary tasks (remembering order) are also beneficial to perform, they can be automated reliably. Our frame cue aid appeared to assist participants to more fully attend to a single map at a time, with efficient task switching between maps. Further, decision-making contributes to cognitive load.

4.4. Cognitive Load

In addition to attention, eye tracking can also be used to assess several nebulous mental states, including that of cognitive effort. For example, over 50 years of research suggests measuring pupil dilation over time (pupillometry) is a reliable measure of cognitive load, where larger pupils indicate greater load or arousal in controlled lighting conditions (Hess and Polt, 1964; Kahneman and Beatty, 1966, 1967; Bradshaw, 1968; Simpson and Hale, 1969; Goldwater, 1972; Hyona et al., 1995; Granholm et al., 1996; Just et al., 2003; Granholm and Steinhauer, 2004; Van Gerven et al., 2004; Haapalainen et al., 2010; Piquado et al., 2010; Laeng et al., 2012; Wierda et al., 2012; Hwang et al., 2013; Zekveld and Kramer, 2014; Zekveld et al., 2014). Our system also appeared to reduce cognitive load, as illustrated by reduced pupil size (lower cognitive effort) in the Helpful condition compared to the Random control (Figure 5). These results, in particular, were not unequivocal, though the general trend matched expectations.

4.5. Novelty of the Real-Time Eye Tracking System

Previous work has used point of gaze to update displays in real-time, contingent on pupil size or on location of gaze. Both types are summarized here.

4.5.1. Pupil Size in Real-Time

The DARPA augmented cognition initiative primarily evaluated pupil dilation as a measure of cognitive load, some in real time (Marshall, 2002; Marshall et al., 2003; St John et al., 2003; Taylor et al., 2003; Raley et al., 2004; St. John et al., 2004; Johnson et al., 2005; Mathan et al., 2005; Russo et al., 2005; Ververs et al., 2005; De Greef et al., 2007; Coyne et al., 2009). Though some attempted to use gaze location (Barber et al., 2008), none were successful.

4.5.2. Real-Time Eye Tracking as an Experimental Tool

Contingent eye tracking modifies computer displays in real-time based on gaze location and was traditionally used in psychology experiments, though usually to impede users, not to optimize performance. During the development of eye tracking technology, psycholinguists used methods, such as the moving window paradigm (Reder, 1973; McConkie and Rayner, 1975), which interferes with the participant by turning the upcoming periphery into noise or random stimuli while reading, to reduce parafoveal preview. Many other paradigms have been employed to impede participants, such as the moving mask paradigm (Rayner and Bertera, 1979; Castelhano and Henderson, 2008; Miellet et al., 2010), the parafoveal magnification paradigm (Miellet et al., 2009), or a central hole (Shimojo et al., 2003). Interfering with performance can successfully function to probe participants’ abilities in experiments. However, none attempted to improve performance, instead degrading it.

4.5.3. Real-Time Gaze Control

Gaze-based systems to control computer displays, wheelchairs, or other robots have been extensively developed. These methods often aimed to move cursors, wheelchairs, accessories, or graphical menus, to zoom windows, or to display context-sensitive information, while also including systems for mouse clicks (Jacob, 1990, 1991, 1993a,b; Jakob, 1998; Zhai et al., 1999; Tanriverdi and Jacob, 2000; Ashmore et al., 2005; Laqua et al., 2007; Liu et al., 2012; Sundstedt, 2012; Hild et al., 2013; Wankhede et al., 2013). Such systems have also been employed for robot and swarm robot control (Carlson and Demiris, 2009; Couture-Beil et al., 2010; Monajjemi et al., 2013). Many of these methods were primarily for control, though they could be interpreted to have some assistive component, such as displaying context-sensitive information. Our gaze-aware system can improve performance, without direct input, and may assist operators in a variety of scenarios, both control and non-control. This purely assistive (rather than control) algorithm may complement some of these control systems.

4.5.4. Gaze-Aware Assistive Systems

Gaze-aware assistive systems are much less common, and their interfaces have been domain specific, such as those used for reading, menu selection, view scrolling, or information display (Bolt, 1981; Starker and Bolt, 1990; Sibert and Jacob, 2000; Hyrskykari et al., 2003; Fono and Vertegaal, 2004, 2005; Iqbal and Bailey, 2004; Ohno, 2004, 2007; Spakov and Miniotas, 2005; Hyrskykari, 2006; Merten and Conati, 2006; Kumar et al., 2007; Buscher et al., 2008; Bulling et al., 2011). Few human–robot studies have taken gaze location into consideration for improving human task performance (DeJong et al., 2011), with such studies also being limited to application-specific industrial goals. Often such systems predict the participants’ gaze location in tasks like map scanning, reading, eye-typing, or entertainment media (Goldberg and Schryver, 1993, 1995; Salvucci, 1999; Qvarfordt and Zhai, 2005; Bee et al., 2006; Buscher and Dengel, 2008; Jie and Clark, 2008; Xu et al., 2008; Hwang et al., 2013). Some studies have succeeded at domain-general methods of prediction (Hwang et al., 2013); however, these methods all require advanced computation, such as image processing and machine learning. We argue that eliminating prediction and focusing on describing gaze history yields more general utility. Very few studies have attempted to create domain-general assistive systems via real-time eye tracking. However, such attempts have had drawbacks, for example, obscuring the display permanently (e.g., obscuring everything that has been glanced at), or applying only to search, rather than monitoring or control tasks (Pavel et al., 2003; Roy et al., 2004; Bosse et al., 2007). The system presented here addresses these deficiencies; it is domain-general, easy to use, closed-loop, non-predictive, does not require advanced computational methods, such as computer vision, and most importantly, is effective.

Despite the need and benefit from this situation-blind multitasking aid, and the seeming obviousness of our solution, no domain-general solutions such as this exist to date. The experimental results showed large benefits, and there is potential for wide generalization.

4.6. Behavioral Mechanisms and In-Depth Discussion

We provide an extended in-depth eye tracking analysis of these studies, as well as further discussion of prior assistive systems, divided attention, bias, supervisory sampling, search, task switching, working memory, augmented cognition, contingent eye tracking, gaze-control, gaze-aware displays, and other topics in Taylor et al. (2015).

4.7. Conclusion and Future Directions

Our real-time gaze aid was successfully domain-general in an unprecedented way, succeeding where previous studies fell short, whether by being application-specific or control-oriented, or by interfering with the user in a non-intuitive manner. It is likely that our algorithm, of displaying the inverse of gaze recency, could benefit many multi-monitoring or tracking tasks where probability of utility relates to duration since last attention to an element. These applications might include process control, multitasking factory operations, multi-robot control, multi-panel surveillance, aviation, air traffic control, driving, military, mobile search and rescue, or others.
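
The core idea of cueing by the inverse of gaze recency can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the released experiment code: the class name, method names, panel labels, and timestamps are hypothetical, and the sketch simply cues the single panel that has gone longest without a fixation.

```python
class RecencyCue:
    """Track the last fixation time per panel and cue the most neglected one."""

    def __init__(self, panel_ids, start_time=0.0):
        # Every panel starts as if it had last been looked at when the trial began.
        self.last_fixation = {p: start_time for p in panel_ids}

    def record_fixation(self, panel_id, timestamp):
        """Update the gaze history when the eye tracker reports a fixation."""
        self.last_fixation[panel_id] = timestamp

    def neglect(self, now):
        """Seconds since each panel was last fixated."""
        return {p: now - t for p, t in self.last_fixation.items()}

    def next_cue(self, now):
        """Return the panel with the longest time since its last fixation."""
        ages = self.neglect(now)
        return max(ages, key=ages.get)

# Hypothetical three-panel display.
cue = RecencyCue(["A", "B", "C"])
cue.record_fixation("A", 1.0)
cue.record_fixation("B", 2.0)
cue.record_fixation("C", 0.5)
first_cue = cue.next_cue(now=3.0)   # C has been neglected longest

cue.record_fixation("C", 3.0)
second_cue = cue.next_cue(now=4.0)  # now A is the most neglected
```

Nothing in the sketch depends on what the panels display, which is the sense in which the approach is domain-general: only gaze history is needed, not image processing or prediction.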

For future work, user-assisting systems should be flexible to the task needs, i.e., should be able to choose and change how frequently the cues appear, depending on the frequency with which agents need input. Currently, our lab has developed a platform-independent general overlay application, which includes weighting the value of each task in the array to be looked at. For example, some dials might be more important than others or require more frequent input. Further, to ensure compatibility with a variety of tasks, we incorporated varying colors, levels of visual transparency, and types of cue. Also, user-assisting systems should be practical and able to be implemented broadly, and thus we incorporated the use of the mouse rather than eye in a similar application, for broad use without eye tracking.
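
Weighting tasks by importance could be sketched as scaling each element's time since last fixation by a per-task weight. The multiplicative form, the function name, and the dial labels below are assumptions made for illustration; the text states only that tasks in the array are weighted.

```python
def weighted_next_cue(last_fixation, weights, now):
    """Return the element with the largest weight * seconds-since-last-fixation.

    Elements missing from `weights` default to a weight of 1.0.
    """
    def priority(element):
        return weights.get(element, 1.0) * (now - last_fixation[element])
    return max(last_fixation, key=priority)

# Hypothetical gaze history: dial_1 was viewed 2 s ago, dial_2 5 s ago.
last_fixation = {"dial_1": 8.0, "dial_2": 5.0}

# With equal weights, the longer-neglected dial_2 is cued.
unweighted = weighted_next_cue(last_fixation, {}, now=10.0)

# If dial_1 is three times as important, it is cued despite being viewed more recently.
weighted = weighted_next_cue(last_fixation, {"dial_1": 3.0, "dial_2": 1.0}, now=10.0)
```

A higher weight makes an element become due for attention sooner, which is one way to encode that some dials require more frequent input than others.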

Data Sharing

Source code for the Ember’s game experiment is provided under the open GPLv3 license: https://gitlab.com/hceye/HCeye/

Author Contributions

PT designed experiment, with assistance from NB, HS, and ZH. NB programmed experiment, with contributions and edits by PT. PT and ZH ran human subjects. ZH and PT programmed data analysis. PT wrote manuscript, with contributions from HS, ZH, and NB. HS supervised all research.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

Detailed experimental procedures can be found online at http://journal.frontiersin.org/article/10.3389/fict.2015.00017

Funding

We thank the Office of Naval Research, award: N00014-09-1-0069.

References

Allport, D., Styles, E., and Hsieh, S. (1994). “Shifting intentional set: exploring the dynamic control of tasks,” in Attention and Performance 15: Conscious and Nonconscious Information Processing, eds C. Umiltà and M. Moscovitch (Cambridge, MA: The MIT Press), 421–452.

Ashmore, M., Duchowski, A. T., and Shoemaker, G. (2005). “Efficient eye pointing with a fisheye lens,” in GI ’05 (Waterloo, ON: Canadian Human-Computer Communications Society, School of Computer Science, University of Waterloo), 203–210.

Barber, D., Davis, L., Nicholson, D., Finkelstein, N., and Chen, J. Y. (2008). The Mixed Initiative Experimental (MIX) Testbed for Human Robot Interactions with Varied Levels of Automation. Technical Report, Orlando, FL: DTIC Document.

Bee, N., Prendinger, H., André, E., and Ishizuka, M. (2006). “Automatic preference detection by analyzing the gaze ‘Cascade Effect’,” in Electronic Proceedings of the Second Conference on Communication by Gaze Interaction (Turin: COGAIN), 63–66.

Bolt, R. A. (1981). “Gaze-orchestrated dynamic windows,” in SIGGRAPH ’81 (New York, NY: ACM), 109–119.

Bosse, T., van Doesburg, W., van Maanen, P.-P., and Treur, J. (2007). “Augmented metacognition addressing dynamic allocation of tasks requiring visual attention,” in Foundations of Augmented Cognition, eds D. Schmorrow and L. Reeves (Berlin: Springer), 166–175. doi: 10.1007/978-3-540-73216-7_19