Similarities and Differences Between Eye and Mouse Dynamics During Web Pages Exploration

The study of eye movements is a common way to non-invasively understand and analyze human behavior. However, eye-tracking techniques are very hard to scale, and require expensive equipment and extensive expertise. In the context of web browsing, these issues could be overcome by studying the link between the eye and the computer mouse. Here, we propose new analysis methods, and a more advanced characterization of this link. To this end, we recorded the eye, mouse, and scroll movements of 151 participants exploring 18 dynamic web pages while performing free viewing and visual search tasks for 20 s. The data revealed significant differences of eye, mouse, and scroll parameters over time which stabilize at the end of exploration. This suggests the existence of a task-independent relationship between eye, mouse, and scroll parameters, which are characterized by two distinct patterns: one common pattern for movement parameters and a second for dwelling/fixation parameters. Within these patterns, mouse and eye movements remained consistent with each other, while the scrolling behaved the opposite way.


INTRODUCTION
Websites, and more particularly web pages, refer to a type of stimulus we potentially see every day. Such stimuli are rarely entirely visible, hence the fact that we cannot fully explore them using only our eyes. That is one of the reasons web browsing on a desktop computer requires the use and coordination of the eyes and the computer mouse. On the one hand, the eyes are used to explore and extract information of interest, such as the location of items. On the other hand, the mouse is used to interact with the content. This interaction can take multiple forms, including clicks, scrolling, and drags and drops. While clicks and drags and drops allow the user to perform actions on the visible content, scrolling drives which part of the web page is displayed. These characteristics specific to web pages induce more complex behaviors, as well as more challenging issues to address. One particularly interesting aspect is how the mouse relates to the eyes.
Eye movements have been extensively studied. For instance, we know that a fixation lasts on average 250-350 ms (Mackworth and Morandi, 1967;Yarbus, 1967) and that visual exploration is modulated by bottom-up and top-down factors regardless of the stimulus type (Yarbus, 1967;DeAngelus and Pelz, 2009;Helo et al., 2014;Itti and Borji, 2015). Bottom-up factors are characterized by low-level features of the stimulus, such as luminance, contrast, or edges (Tatler and Vincent, 2008), while top-down factors are characterized by high-level properties representing cognitive processes (Henderson and Hollingworth, 1999). It is generally assumed that the interaction between bottom-up and top-down factors influences how we orient our visual attention (Theeuwes and Failing, 2020). In that sense, top-down factors are usually treated as factors influencing bottom-up ones rather than as totally distinct factors (Theeuwes and Failing, 2020). Furthermore, Still and Masciocchi (2010) pointed out that most web-specific biases were top-down and mainly related to learned behaviors. Web pages often follow a similar template: a header with the main sections of the website, content with a left or right sidebar, and a footer at the end of the web page. Thus, users have developed strategies to maximize their efficiency in visual exploration (Buscher et al., 2009). Nielsen (2010) observed that users tend to spend more time on the left part of a web page than on the right one. He also observed this behavior on right-to-left reading web pages. A more recent study from Fessenden (2017) showed a similar behavior on search engine result pages (SERP). Nielsen (2006) ran a usability experiment during which he analyzed which part of a web page users were looking at. He observed a recurring viewing pattern in the shape of the letter F.
People started browsing at the top-left corner of the web page and read horizontally, then scrolled down to read horizontally a second time, and finally scanned the content vertically. Both factors have been widely investigated during website exploration in order to better understand user behavior and thus improve the usability of web pages. For instance, Pan et al. (2004) showed differences in visual exploration depending on the type of website, their presentation order and the gender of the user. They did not find any difference between a memorization and a free viewing task, highlighting the importance of adapting a website to its targeted audience. In his work, Tullis (2007) found that older users spent more time looking at a page's content, especially navigational areas, compared to younger users. Additionally, Roth et al. (2013) showed that user expectations had an influence on visual exploration, and, more particularly, that fewer fixations were needed to find items in expected locations compared to unexpected ones.
These studies clearly show an influence of bottom-up and top-down factors. However, Tatler and Vincent (2008) and Anderson et al. (2015) showed that bottom-up influence was higher at the beginning of visual exploration. Thus, both factors alternately influence visual exploration (Henderson, 2003;Torralba et al., 2006). As such, Cronin et al. (2020) stressed the need to focus more on the dynamic of eye movements. They showed that the study of global eye movement parameters could not necessarily be used to distinguish different experimental conditions. To do so, they compared fixation durations and saccade amplitudes between a memorization task and an esthetic judgment task. While they did not find differences in the mean level analyses, the use of temporal and distributional analyses allowed them to discriminate the two tasks.
Previous research already highlighted the dynamic of eye movements (Unema et al., 2005;Pannasch et al., 2008;Pannasch and Velichkovsky, 2009). These studies found that the amplitude of saccades decreased while the duration of fixations increased over time. Pannasch and Velichkovsky (2009) and Velichkovsky et al. (2002) defined two visual exploration modes based on the relationship between saccade amplitudes and fixation durations. The ambient mode corresponds to short fixations (<180 ms) followed by saccades with an amplitude >5°, while the focal mode corresponds to long fixations (>180 ms) followed by saccades with an amplitude of <5°. Generally, visual exploration begins in ambient mode before gradually switching to focal mode (Velichkovsky et al., 2002;Pannasch and Velichkovsky, 2009). Our knowledge on these visual modes is growing but still incomplete. A closer understanding of these two modes could help to better grasp the dynamic of eye movements when looking at complex stimuli, such as web pages. More specifically, in addition to eye movements, it would also be of interest to use these two visual modes to investigate the dynamic of mouse movements.
To our knowledge, despite the fact that the use of the computer mouse is well studied, its dynamic is rarely considered. Generally, research on the computer mouse focuses on how mouse movements could reveal users' intentions. Its availability and its potential for scalability enable innovative applications, such as authentication (Zheng et al., 2011), the prediction of the users' cognitive load (Rheem et al., 2018), the prediction of users' intentions (Guo and Agichtein, 2010;Fu et al., 2017), or pattern behavior analysis (Tzafilkou and Protogeros, 2018). One of the most studied topics is the computer mouse movement patterns commonly used by participants when browsing. Tzafilkou and Protogeros (2018) reviewed six patterns: the straight pattern (Griffiths and Chen, 2007), the hesitation pattern (Mueller and Lockerd, 2001), the horizontal reading pattern (Rodden et al., 2008), the vertical reading pattern (Rodden et al., 2008), the random pattern (Ferreira et al., 2010), and the fixed pattern (Griffiths and Chen, 2007).
Whether it is necessary to describe mouse movement patterns or their dynamic, mouse movements are not limited to moving the mouse and include scrolling as well. However, contrary to mouse movements, scrolling behavior has, to our knowledge, not been closely examined. For instance, Liu et al. (2017) investigated users' strategies when navigating SERP through their scrolling behavior. An SERP consists of a list of links corresponding to a query entered by a user in a search engine. Liu et al. (2017) analyzed the number of scrolls and their direction. In their work, Braganza et al. (2009) evaluated user preferences depending on the web page layout and the scrolling mechanism using the number of scrolls and their total duration. More generally, these studies show that the mouse is a convenient and cheap way to infer users' cognitive processes, such as intentions or reading strategies. However, these studies mostly focus on users' strategies and do not tackle quantitative analyses of mouse and scroll parameters. Such extensive statistical description could provide a baseline of typical behavior when exploring web pages and could be used to assess more precisely strategies or any other behavior.
These limitations can also be found when it comes to the relationship between the eye and the computer mouse. To this day, one of the most studied web stimuli for investigating this relationship is the SERP. On this type of web page, the coordination between the eyes and the computer mouse is higher for the vertical axis of the screen than for the horizontal axis (Rodden and Fu, 2007;Guo and Agichtein, 2010). However, this relationship remains uncertain, considering that the mouse could be used as a means to mark a potential result previously located with the eyes (Rodden et al., 2008). Furthermore, the amount of time spent by a user on an SERP can affect the location of the gaze and the mouse during the exploration (Huang et al., 2012). Navalpakkam et al. (2013) designed a model to predict the location of the eyes based on the mouse location and showed that the correlation between the eyes and the mouse is nonlinear and user dependent. More specifically, this correlation has been found for time periods during which a user looked at an area of interest (AOI) and when the user switched between AOIs. However, SERPs are not representative of the web and remain transitional web pages used to access content on a different website. As a matter of fact, users spend a significant cumulative amount of time on SERPs, but in short bursts of time. When focusing on common web pages, the eyes and the mouse are also coordinated on the vertical axis, and the scroll speed influences the position of the eyes during scrolling (Milisavljevic et al., 2018). The participant looks at the opposite part of the screen when scrolling at a high speed. Moreover, the presence of the cursor in a region of the screen correlates with the probability that the participant is fixating on this region (Chen et al., 2001). To better estimate if the eyes and the mouse are coordinated, Boi et al. (2016) generalized the work of Navalpakkam et al. (2013) by defining that the eyes and mouse must be positioned over the same content. This new definition allowed them to improve the predictive power of the models of Guo and Agichtein (2010) and Huang et al. (2012) when applied to classic web pages. Finally, when it comes to the coordination of the eyes and scrolling, web pages are not of primary interest. That is why, to our knowledge, no studies tackle the coordination between the two outside the reading field (Kumar et al., 2007;Sharmin et al., 2013).
The goal of our study was to contribute to this growing area of research by exploring the similarities and differences between movement of the eyes and computer mouse on web pages. First, we introduced a new segmentation threshold in order to differentiate two mouse movements or scrolls as precisely as possible. Then, with this new segmentation, analyses from eye movement methodology were applied to mouse movement and scrolling parameters. This methodology allowed us to investigate the influence of the tasks (free viewing and visual search) on eye, mouse, and scroll parameters. Beyond these global analyses, we also considered the influence of time on the dynamic of each type of movement through visual exploration modes.
Participants reported normal or corrected to normal vision and were naive about the purpose of the study. They were right-handed or accustomed to using a computer mouse with the right hand. A majority were undergraduate students from the psychology institute at the Université de Paris. Participants were compensated either by course credit or a 15 euro gift card. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (local Ethics Committee of Paris Descartes University, No. CER-PD: 2018-77) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. All subjects gave written informed consent.

Apparatus
Eye movements were recorded using an Eye-Link 1000 Plus (SR Research Ltd., Canada) at a 1,000 Hz sampling rate with 0.05° precision. We recorded the right eye of the participants with a 35 mm monocular lens. Mouse movements were recorded with a standard USB optical mouse with a 125 Hz polling rate. Stimuli were displayed on a 24.5 inch LCD computer screen with a 1,920 × 1,080 pixel resolution and a 144 Hz refresh rate. The experiment was run using Python 2.7 with Pylink from the manufacturer and Chromium 64.

Stimuli
In this experiment, 18 web pages (see example in Figure 1) from 18 different websites were presented to the participants in random order. The web pages had a width of 1,920 pixels and their total height was between 5,000 pixels and 19,230 pixels (M = 6,405 px, SD = ±2,673 px). Participants were allowed to freely move the mouse, scroll, or click. However, hyperlinks and content animations were deactivated, thus participants could not leave the displayed web page. The presented web pages and their topics were arbitrarily chosen, including blogs, front pages, textual pages, and articles (see example in Figure 1). We ensured that each selected web page followed several criteria to minimize biases. The first criterion was the language of the website. We ensured stimuli were from French websites. The second criterion was about the websites' news content. Since this study was run over several months, a web page could not have any content referring to current events or content related to a season, date, holiday, celebration, etc. As the third criterion, we checked that the web pages did not have any external advertising. In contrast to the first three criteria, which were respected on all web pages, the following criteria were counterbalanced between web pages. As Bruyer et al. (1987) explained, faces are handled differently by our brain during visual exploration. For this reason, we made sure that we kept a balance of faces between the web pages. We also made sure that a balance was maintained for images, texts, general layout, and total length of the web page to have stimuli with different content types and organization. Finally, as described in the following section, we used targets already present within the original web page. Thus, we checked the number of targets available on the web page and their distribution across the page.

Tasks
Participants had to perform a total of nine free viewing tasks and nine visual search tasks randomly distributed on the 18 websites following a uniform distribution. Thus, each participant executed one task per web page. The balance of tasks per web page was ensured before any analyses. During the free viewing task, the participants were instructed to explore the web page freely for exactly 60 s. This duration was chosen by trial and error to provide enough data for the study of long browsing. Thus, participants had enough time to fully explore the web page. In the visual search task, participants were asked to find the targets within an arbitrary maximum of 2 min. The participants did not know how many targets there were, but we informed them that there were up to three targets, with at least one, per web page. As previously defined, the targets were icons or images present on the original web page. Moreover, the targets were equally distributed between the top, middle, and bottom of the web page, and could be found on the sides, or in the content, header, or footer.

Procedure
In a quiet room, with constant luminosity, the participants were instructed to position their head on a chin rest in front of a computer screen at a viewing distance of 57 cm. The experiment then began with practice trials, one for each task. After this phase, the participants' right eye was calibrated at nine points and this was repeated until the error value was below 1°. Once the calibration was successfully completed, the participants had to click on the next trial with the mouse on a 3 × 6 table, as shown in Figure 2. Then the instructions were displayed on a new screen with a button to launch the trial. The position of this button was randomly chosen in order to avoid bias related to the first fixation commonly being at the same position as the button launching the trial. Furthermore, to ensure the web page had completely loaded before the trial started, a 3-s countdown was added to the button launching the trial. The countdown only began after the page had entirely loaded, thus visual elements displayed after a few seconds could be avoided. During this phase, the participants were informed of the presence of a maximum of three targets when carrying out the visual search task. After clicking on the button, the web page was displayed for 60 or 120 s, depending on the task. During the visual search task, the participants had to click on the targets when they found them. If the image clicked was one of the targets, a green rectangle surrounded the target to indicate that one of the targets had been found. The participants were instructed to press the space bar on the keyboard when they thought they had found all the targets. After 1 min in the free viewing task, or after 2 min or a space bar press in the visual search task, the recording was stopped, and the 3 × 6 table shown at the beginning was displayed again. Between trials, a 5-point calibration was performed.
A 9-point calibration was initiated after the ninth trial, or if any problems occurred during the experiment.

FIGURE 2 | Example of screen on which participants had to click the next item to get the instruction. The white button indicates a website not yet visited, the green button a website already visited, and the blue button the next website to visit. Only the blue button was clickable.

Data Cleaning
Data from 12 participants who did not finish the experimental protocol due to calibration problems were discarded. Among the remaining 139 participants (2,502 trials), we removed 4.88% of all trials (122 trials) due to problems encountered during the experiment, such as calibration problems, participants talking during a trial, or external noise. The remaining data (2,380 trials) were then pre-processed and cleaned in three steps. The first step was only applied to the visual search task: the last two seconds of recording were removed in order to discard the moment the participant looked at the keyboard when pressing the space bar. In addition, and for the same reason, residual fixations below the screen at the end of the exploration were removed. In the second step, blinks and fixations under 100 ms around a blink were removed (Holmqvist et al., 2011). During the third and final step, fixations located more than 3° of visual angle from the screen's border were deleted. Fixations outside the screen, but within the 3° threshold, were reset to the corresponding border of the screen. These three steps led to deletions within all the trials. All 139 participants, and 95% of the initial trials (2,378 trials), were kept. In total, 91.74% of all records were retained for analyses. Finally, only the first 20 s were selected for this work, and 18 more trials were deleted due to insufficient mouse moves or scrolling events (2,360 trials remaining). It should be noted that eye movement analyses were run on aggregated data, and scrolling and mouse events on raw data. All analyses were carried out using Python 3.6.
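The third cleaning step can be illustrated with a minimal sketch. It is not the pipeline used in the study: the pixel-to-degree conversion assumes square pixels on the 24.5 inch, 1,920 × 1,080 screen viewed from 57 cm, the distance to the border is assumed to be Euclidean, and the names `px_per_degree` and `clean_fixation` are hypothetical.

```python
import math

# Geometry taken from the Apparatus and Procedure descriptions.
SCREEN_W_PX, SCREEN_H_PX = 1920, 1080
DIAGONAL_CM = 24.5 * 2.54
VIEW_DIST_CM = 57.0
MAX_OUTSIDE_DEG = 3.0


def px_per_degree():
    """Approximate number of pixels covered by 1 degree at screen center."""
    diag_px = math.hypot(SCREEN_W_PX, SCREEN_H_PX)
    cm_per_px = DIAGONAL_CM / diag_px  # assumption: square pixels
    cm_per_deg = 2 * VIEW_DIST_CM * math.tan(math.radians(0.5))
    return cm_per_deg / cm_per_px


def clean_fixation(x, y):
    """Return the cleaned (x, y) in pixels, or None if the fixation is dropped."""
    limit_px = MAX_OUTSIDE_DEG * px_per_degree()
    # Distance outside the screen along each axis (0 when inside).
    dx = max(0 - x, 0, x - (SCREEN_W_PX - 1))
    dy = max(0 - y, 0, y - (SCREEN_H_PX - 1))
    if math.hypot(dx, dy) > limit_px:
        return None  # more than 3 degrees beyond the border: delete
    # Otherwise reset to the nearest border (no change for on-screen fixations).
    return (min(max(x, 0), SCREEN_W_PX - 1), min(max(y, 0), SCREEN_H_PX - 1))
```

Under these assumptions one degree covers roughly 35 pixels, so the 3° tolerance corresponds to a band of about 106 pixels around the screen.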

Events Segmentation
There are a number of well-established, and ever improving, methods to label raw data from eye recordings. However, mouse and scroll recordings lack such a method, specifically to differentiate two close events. While it is easy to determine if two mouse or scroll events separated by 2 or 3 s are indeed two distinct events, doing the same for two events separated by, for instance, <1 s is much more difficult.
In the literature, we can find multiple attempts to define a threshold allowing the differentiation of idle time and movement of the mouse. Since the mouse is a pointing device, a simple threshold seems to be appropriate, contrary to eye movements that are more complex. In their attempt to define a new behavioral biometric technique based on mouse movements, Gamboa and Fred (2004) treated a gap of more than 100 ms between two consecutive events as a pause in the user's interaction separating two mouse movements. In their work, Reeder and Maxion (2006) arbitrarily considered a threshold of 3 s during which the user was silent and inactive (with both the mouse and the keyboard) in order to propose a method to detect user difficulties when using an interface. On the other hand, Feher et al. (2012) empirically set this threshold to 500 ms to categorize mouse movements and thus uniquely identify users. More recently, Seelye et al. (2015) studied cognitive impairment using computer mouse movement patterns. They mentioned a median idle time, which is the time spent idling or pausing between mouse movements, of 310 ms. In the continuity of the work of Gamboa and Fred (2004), Antal and Egyed-Zsigmond (2019) used a threshold of 10 s to segment mouse movements and used them to detect intruders on a computer.
Moreover, several studies focused specifically on scroll segmentation (Braganza et al., 2009;Brady et al., 2018;Milisavljevic et al., 2018). In their study of scrolling behavior, Braganza et al. (2009) considered two scrolls recorded within 1 s of each other as a single scroll. To set this threshold, they tried values ranging from 200 ms to 4 s, with increments of 100 ms. They did not find any major differences between these timings, and consequently chose 1 s as a threshold. In their study, Milisavljevic et al. (2018) defined a scroll session as a set of continuous scroll events ended by a mouse movement. On the topic of scrolling when reading, Brady et al. (2018) sampled a frame every 100 ms to check if the displayed text had moved. If it had moved more than half a line between one frame and the next, it was counted as a scroll. Even though the presented techniques try to segment scrolling or mouse events, they are mostly based on arbitrary thresholds. Thus, our goal is to propose a better approach to mouse and scroll event segmentation to provide more robust analyses.
If we take a closer look at our previous attempt to segment events, we defined a threshold based on the number of events rather than on time (Milisavljevic et al., 2018). This definition does not take into account all the parameters that come into play when interacting using the mouse or scrolling. The main one is the fact that, on a desktop, it is possible to move the mouse during a scroll. In such a case, a single scroll would be labeled as two different scrolls. The bias remains if the participant uses the browser scroll bar, which allows the user to grab a bar on the right of the browser and scroll by moving it up or down. Furthermore, Brady et al. (2018) used a spatial threshold of 40 pixels to identify when a user was scrolling, but this is not applicable to mouse movements. In addition to highlighting the need to use a time-based threshold, none of the previously mentioned studies correctly handled stops and micro-stops. A stop is a period of time during which the user does not move the mouse or scroll. During this idle time, the user explores the web page and processes it. But based on this definition, a new question arises: what is the minimal length of this period of time to give the user enough time to process the stimulus and make the decision to keep moving, scrolling, or stop entirely? In other terms, how can we differentiate micro-stops from the movement itself? A micro-stop is an interruption during the action which is long enough to allow the user to make a decision, but which is not visible to the eye. To differentiate micro-stops from movements, we looked at the study from Moher and Song (2019) in which they compared behaviors between a 3D reach tracker, a computer mouse, and a stylus. Among multiple conditions, they measured an average response latency of 220 ms when displacing a target. This could be considered as the minimum time to visualize a target's new position and make the decision to reorient the movement.
Thus, a micro-stop cannot be shorter than 220 ms, and a stop below this threshold should be considered a continuation of the previous action. We used a unified threshold to segment mouse movements and scrolls: we chose 300 ms to differentiate two distinct movements or scrolls. This corresponds to the average visual fixation duration in scene viewing (Henderson and Hollingworth, 1998). Although visual fixations can be shorter than 300 ms, this does not apply to ecological conditions and semantic-rich stimuli, such as web pages.
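The time-based segmentation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: raw events are assumed to be timestamps in milliseconds, a pause strictly longer than 300 ms is assumed to split two movements, and the function names are hypothetical.

```python
GAP_THRESHOLD_MS = 300  # threshold chosen in the text


def segment_events(timestamps_ms, gap_threshold_ms=GAP_THRESHOLD_MS):
    """Group raw event timestamps into movements.

    Consecutive raw events belong to the same movement unless the pause
    between them exceeds the threshold. Returns a list of
    (start_ms, end_ms) tuples, one per movement.
    """
    segments = []
    start = prev = None
    for t in sorted(timestamps_ms):
        if start is None:
            start = prev = t
        elif t - prev > gap_threshold_ms:
            segments.append((start, prev))  # close the current movement
            start = prev = t
        else:
            prev = t  # short pause: same movement continues
    if start is not None:
        segments.append((start, prev))
    return segments


def dwell_durations(segments):
    """Pause between the end of one movement and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
```

For example, with events at 0, 8, 16, 250, 258, 900, and 908 ms, the 234 ms pause is absorbed into the first movement, while the 642 ms pause separates two movements and is counted as a dwell.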

Variables
After all cleaning processes, we ran our analyses on a wide range of new parameters. In the state-of-the-art, the same types of parameters are frequently used. For the use of the mouse, these include curvature, trajectory, clicks, dwells, or the number of movements (Zheng et al., 2011;Fu et al., 2017;Rheem et al., 2018;Tzafilkou and Protogeros, 2018), and for scrolls, amplitude, speed, and number (Braganza et al., 2009;Liu et al., 2017;Milisavljevic et al., 2018). In comparison, eye-mouse studied parameters are more related to their respective positions, but are not limited to this factor. For instance, eye-mouse distance, content hovered, lag, percentage of regions visited by both the eyes and mouse, etc., have been used to study their relationship (Chen et al., 2001;Rodden and Fu, 2007;Rodden et al., 2008;Guo and Agichtein, 2010;Huang et al., 2012;Navalpakkam et al., 2013;Boi et al., 2016).
In this paper, we propose a more complete set of parameters directly inspired from eye movement analyses. These parameters include dwell duration, movement duration, movement amplitude, and number of events. It should be noted that duration variables are expressed in seconds or milliseconds, while amplitude variables are expressed in degrees of visual angle. Furthermore, in order to better characterize the dynamic of the exploration through ambient and focal visual modes, we apply, for the first time, the K coefficient defined by Krejtz et al. (2016) to mouse and scroll events. This coefficient is calculated by averaging the differences in z-scores between the duration of each fixation and the amplitude of the next saccade, as shown in Equation (1). A negative value indicates that the fixation d_i is short and the next saccade a_{i+1} is long (>5°), which corresponds to an ambient mode. In contrast, a positive value suggests that the fixation d_i is long and the next saccade a_{i+1} is short (<5°), which corresponds to a focal mode.
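For reference, the K coefficient described in this paragraph can be written as below. The formula is restated from the verbal definition and from Krejtz et al. (2016); the symbols for the means and standard deviations of fixation durations and saccade amplitudes, and the number of fixation-saccade pairs n, are named here for convenience.

```latex
K_i = \frac{d_i - \mu_d}{\sigma_d} - \frac{a_{i+1} - \mu_a}{\sigma_a},
\qquad
K = \frac{1}{n} \sum_{i=1}^{n} K_i
```

Here d_i is the duration of the i-th fixation and a_{i+1} the amplitude of the saccade that follows it. For mouse and scroll events, dwell duration and movement amplitude would take the place of fixation duration and saccade amplitude.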
Milisavljevic et al. (2019) introduced two new variables to better capture the dynamic of focal and ambient modes. While the K coefficient did not discriminate between the different stimuli used in their study, the number of switches between modes did. It is for this reason that we are using these parameters to more precisely describe the dynamic of the exploration for both the eyes and mouse.
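A minimal sketch of how these quantities could be computed is given below. It rests on several assumptions that the text does not specify: z-scores use the population standard deviation, a mode switch is counted as a sign change between consecutive per-event K values, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def k_values(durations, next_amplitudes):
    """Per-event K_i: z-scored duration minus z-scored amplitude of the next movement."""
    if len(durations) != len(next_amplitudes):
        raise ValueError("each duration needs the amplitude of the movement that follows it")
    mu_d, sd_d = mean(durations), pstdev(durations)
    mu_a, sd_a = mean(next_amplitudes), pstdev(next_amplitudes)
    if sd_d == 0 or sd_a == 0:
        raise ValueError("z-scores are undefined when a variable is constant")
    return [(d - mu_d) / sd_d - (a - mu_a) / sd_a
            for d, a in zip(durations, next_amplitudes)]


def k_coefficient(ks):
    """Average K_i over a set of events (for example, one time-bin)."""
    return mean(ks)


def count_mode_switches(ks):
    """Number of sign changes in K_i (ambient <-> focal); exact zeros are skipped."""
    signs = [k > 0 for k in ks if k != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

Because the z-scores are computed over the same events, the mean of all K_i values is zero by construction; the coefficient becomes informative when averaged over a subset of events, such as one time-bin.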

Mouse and Scroll Overlap
Participants were able to independently move the mouse and scroll. Consequently, this led to overlaps between mouse movements and scrolls. We found that this overlap accounted for only 10% (SD = ±4.83%) of the total mouse movement time and 15% (SD = ±10.59%) of the total scrolling time. During these overlaps, we observed mouse movements with an amplitude of 0.02° (SD = ±0.02°) and a duration of 240 ms (SD = ±195.53 ms) for a total duration of 570 ms (SD = ±430 ms). As described, during overlaps, movements represented a negligible part of the exploration. Moreover, these overlaps followed three main patterns: move-scroll, scroll-move, and move-scroll-move. The move-scroll pattern refers to a scroll that began while the participant was already moving the mouse. This pattern occurred 43% of the time and was the most frequent. The second pattern we observed was the scroll-move pattern. This pattern is the exact opposite: the participant began to move the mouse while already scrolling. This pattern happened 25% of the time. The move-scroll-move pattern is when the participant scrolled within a single mouse move. This was less common and occurred 21% of the time. Finally, the remaining 11% consisted of rarer patterns, such as move-scroll-move-scroll or move-scroll-move-scroll-move, which represent 2% each. Due to the low frequency of overlaps between scrolls and mouse movements, we can safely conclude that these specific movements are residual movements or involuntary micro-movements generated by the use of the mouse wheel. For this reason, we did not take overlaps into account in the following analyses.
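The overlap shares reported above could be obtained along the following lines. This is a hedged sketch rather than the authors' procedure: movements and scrolls are assumed to be available as (start, end) intervals in milliseconds that do not overlap within each list, both lists are assumed to be non-empty, and the function names are hypothetical.

```python
def overlap_duration(moves, scrolls):
    """Total time during which a mouse movement and a scroll co-occur."""
    total = 0
    for m_start, m_end in moves:
        for s_start, s_end in scrolls:
            # Length of the intersection of the two intervals (0 if disjoint).
            total += max(0, min(m_end, s_end) - max(m_start, s_start))
    return total


def overlap_share(moves, scrolls):
    """Overlap as a fraction of total movement time and of total scrolling time."""
    overlap = overlap_duration(moves, scrolls)
    move_time = sum(end - start for start, end in moves)
    scroll_time = sum(end - start for start, end in scrolls)
    return overlap / move_time, overlap / scroll_time
```

With movements at 0-1,000 ms and 2,000-3,000 ms and scrolls at 800-1,200 ms and 2,500-2,600 ms, the overlap is 300 ms, that is, 15% of the movement time and 60% of the scrolling time.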

RESULTS
To study the similarities and differences between eye movements, mouse movements, and scrolling, we ran two types of analyses. We first described eye, mouse, and scroll parameters globally, to clearly define what a mouse or scroll movement was, and summarized them in Table 1. Then, we examined the role of tasks and time by performing 2 (free viewing and visual search) × 4 (0-5 s, 5-10 s, 10-15 s, and 15-20 s time-bins) repeated measures analyses of variance (ANOVAs). Post-hoc analyses were run using pairwise Student's t-tests with a Bonferroni correction. It should be noted that only mouse and scroll movement parameters are presented in this section (see Table 1 for dwell parameters). Contrary to a fixation, which provides information about current cognitive processes, a dwell generally means that the mouse has not been used. Moreover, the duration of a dwell is much longer than that of a fixation and can easily last the equivalent of 10 fixations. This difference of scale does not make it possible to determine what falls within the scope of the cognitive process in progress and what reflects the simple nonuse of the mouse. However, movement parameters remain comparable.
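The time factor of these analyses relies on splitting each 20 s trial into four 5 s time-bins. A minimal sketch of this binning is given below; it assumes that each event is assigned to a bin by its onset time, that bins are half-open intervals, and it uses hypothetical function names.

```python
from collections import defaultdict

BIN_WIDTH_S = 5
N_BINS = 4  # 0-5, 5-10, 10-15, and 15-20 s


def time_bin(onset_s):
    """Index (0-3) of the time-bin containing an event onset, or None outside 0-20 s."""
    if onset_s < 0 or onset_s >= BIN_WIDTH_S * N_BINS:
        return None
    return int(onset_s // BIN_WIDTH_S)


def mean_per_bin(events):
    """Average a parameter per time-bin from (onset_s, value) pairs."""
    values = defaultdict(list)
    for onset_s, value in events:
        b = time_bin(onset_s)
        if b is not None:
            values[b].append(value)
    return {b: sum(v) / len(v) for b, v in sorted(values.items())}
```

The per-participant, per-task means obtained this way would then form the cells of the 2 × 4 repeated measures design.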

Eye Movements Analysis
We measured a stable distribution of fixations and saccades across the different conditions. During the exploration of a website, participants spent approximately 14% (SD = ±1.72%) of the time making a saccade (see Table 1). Although this proportion was broadly maintained across the tasks, we found a task effect on the distribution of fixations/saccades [F (1, 138) = 231.98, p < 0.001]. Participants spent 13.6% (SD = ±1.79%) of the time making a saccade in the free viewing task and 15% (SD = ±1.84%) during the visual search task. Furthermore, we found a time effect [F (3, 414) = 685.59, p < 0.001] present between the first and second time-bins (t = −29.50, p < 0.001), and between the second and third time-bins (t = 8.98, p < 0.001), but not between the third and fourth time-bins (t = −2.33, p > 0.05). We also found a significant interaction effect between task and time [F (3, 414) = 3.48, p < 0.05], and post-hoc analyses confirmed that main effects were preserved (see Table 2).

Number of Fixations and Saccades
In the free viewing task, there were no significant differences between the successive time-bins (all p > 0.05). However, in the visual search task, the only difference from the main time effect was the absence of a reduction between the third and fourth time-bins (p > 0.05) (see Table 2).
The average fixation duration significantly increased over time [F (3,414) = 297.65, p < 0.001] up to the third time-bin. More precisely, the first time-bin was significantly different from the second time-bin (t = 20.91, p < 0.001), and this second time-bin was significantly different from the third time-bin (t = 6.80, p < 0.001). However, the third time-bin was not significantly different from the fourth (p > 0.05). There was also an interaction effect between task and time [F (3,414) = 3.29, p < 0.05], but post-hoc analyses confirmed that main effects were preserved (see Table 2).

Dominant Mode
Finally, to understand the dynamics of visual exploration, we computed the K coefficient and its associated variables, as defined by Krejtz et al. (2016) (see Table 2).
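For reference, Krejtz et al. (2016) define the K coefficient from the z-scored duration of each fixation minus the z-scored amplitude of the saccade that follows it; a positive mean indicates focal viewing (long fixations followed by short saccades) and a negative mean indicates ambient viewing. The sketch below follows that definition. The function name `k_values`, the sample values, and the choice to standardize against the whole trial before averaging within part of it are assumptions made for illustration.

```python
import statistics


def k_values(durations, next_amplitudes, ref_durations, ref_amplitudes):
    """K_i = z(duration of fixation i) - z(amplitude of the following saccade).

    Means and standard deviations come from the reference samples, so that
    a subset (for example one time-bin) can have a non-zero mean K.
    """
    mu_d, sd_d = statistics.mean(ref_durations), statistics.stdev(ref_durations)
    mu_a, sd_a = statistics.mean(ref_amplitudes), statistics.stdev(ref_amplitudes)
    return [
        (d - mu_d) / sd_d - (a - mu_a) / sd_a
        for d, a in zip(durations, next_amplitudes)
    ]


# Hypothetical trial: fixation durations (ms) and following saccade amplitudes (deg).
durations = [150.0, 180.0, 250.0, 300.0]
amplitudes = [8.0, 6.0, 3.0, 2.0]

k_all = k_values(durations, amplitudes, durations, amplitudes)
k_early = statistics.mean(k_all[:2])  # short fixations, long saccades: ambient
k_late = statistics.mean(k_all[2:])   # long fixations, short saccades: focal
```

Because the z-scores are taken over the whole trial, the mean of `k_all` is zero by construction; the sign only becomes informative once K is averaged over a subset such as a time-bin or a task.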

Visual Modes Switches
As described in the Methodology section, the number of visual modes switches corresponds to how many times a participant switched from ambient to focal and focal to ambient during a trial. It was not, however, significantly different between the third and fourth time-bins (t = −1.24, p > 0.05). Furthermore, we found a significant interaction between the task and time [F (3,414) = 6.33, p < 0.001]. The main task effect was maintained except for the third time-bin (t = 4.33, p > 0.05). Similarly, the main time effect was preserved for the free viewing task, but not in the visual search task, during which there were no significant differences between the second and third, and the third and fourth time-bins (all p >0.05) (see Table 2).
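One way to count such switches is to count sign changes in a sequence of K values. The following sketch assumes that a positive K marks focal mode and a negative K marks ambient mode, and that exact zeros are ignored; the function name `count_mode_switches` and the sample sequence are hypothetical, and the Methodology section remains the authoritative definition.

```python
def count_mode_switches(k_sequence):
    """Count ambient/focal alternations in a sequence of K values.

    Assumption: K > 0 is focal, K < 0 is ambient, and K == 0 is skipped.
    """
    modes = ["focal" if k > 0 else "ambient" for k in k_sequence if k != 0]
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)


# Hypothetical sequence: ambient, ambient, focal, focal, ambient.
switches = count_mode_switches([-0.5, -0.2, 0.3, 0.4, -0.1])
```

In this sequence the viewer goes from ambient to focal once and back to ambient once, so two switches are counted.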

Visual Modes Proportions
The participants spent, in total, 43% (SD = ±6.81%) of the time in ambient mode. This proportion significantly varied according to the task [F (1,138) = 358.75, p < 0.001]. It was higher in the visual search task (M = 48.35%, SD = ±7.33%) than in the free viewing task (M = 38.21%, SD = ±7.65%). There was a significant time effect [F (3,414) = 638.94, p < 0.001]. The proportion of time spent in ambient mode significantly decreased between the first and second time-bins (t = −31.30, p < 0.001) and between the second and third time-bins (t = −9.32, p < 0.001), but not between the third and fourth time-bins (t = −1.44, p > 0.05). We also found a significant interaction between the time and task [F (3,414) = 8.75, p < 0.001], but post-hoc analyses confirmed that main effects were preserved (see Table 2).
To summarize, we found task and time effects on all the eye movement parameters. Most of the parameters changed over time and then stabilized from the third time-bin onward (after 10-15 s). More specifically, fixation-related variables increased and movement-related variables decreased over time. Moreover, ambient mode was predominant during the exploration but progressively gave way to focal mode as time went by.

Mouse Analysis
The participants spent 20.85% (SD = ±8.33%) of the time moving the mouse during their exploration. We found a significant task effect [F (1,138) = 37.66, p < 0.001]: the proportion of time spent moving the mouse was significantly higher in the visual search task (M = 23.33%, SD = ±8.48%) than in the free viewing task (M = 18.94%, SD = ±10.11%). We also observed a time effect [F (3,414) = 420.24, p < 0.001] with a significant decrease between the first and second time-bins (t = −24.14, p < 0.001), and between the second and third time-bins (t = −3.25, p < 0.01). However, there was no significant difference between the third and fourth time-bins (t = −1.68, p > 0.05). There was a significant interaction between time and task [F (3,414) = 7.75, p < 0.001]. The main task effect was maintained except for the second time-bin (t = 1.2, p > 0.05). The main time effect was preserved in the free viewing task, but not entirely in the visual search task, where there was no significant difference between the second and third time-bins (p > 0.05) (see Table 3).

Number of Mouse Movements
The participants made 6.04 (SD = ±1.78) movements on average. We found a task effect [F (1,138) = 73.45, p < 0.001] with more mouse movements during the visual search task (M = 6.77, SD = ±2.01) than during the free viewing task (M = 5.43, SD = ±1.97). We found an influence of time [F (3,414) = 183.46, p < 0.001] with a significant decrease between the first and second time-bins (t = −14.34, p < 0.001), and between the second and third time-bins (t = −4.70, p < 0.001). However, there was no significant difference between the third and fourth time-bins (t = −1.79, p > 0.05). We also found a significant interaction between time and task [F (3,414) = 14.15, p < 0.001]. The main task effect was preserved except for the second time-bin (p > 0.05). In the free viewing task, the main time effect was preserved, but in the visual search task this main effect was maintained only between the first and second time-bins (p < 0.001) (see Table 3).

Duration of Mouse Movements
The participants moved the mouse for 768 ms (SD = ±342.55 ms) on average. We found a task effect [F (1,138) = 15.63, p < 0.001] with significantly longer mouse movements in the free viewing task (M = 772.68 ms, SD = ±362.58 ms) than in the visual search task (M = 767.43 ms, SD = ±386.39 ms). Moreover, we found a time effect [F (3,414) = 269.83, p < 0.001] with a significant decrease between the first and second time-bins (t = −19.53, p < 0.001), but no significant difference between the second and third time-bins (t = −2.56, p > 0.05) or between the third and fourth time-bins (t = 0.74, p > 0.05). We also found a significant interaction between time and task [F (3,414) = 3.69, p < 0.05]. However, the main task effect was preserved only for the last two time-bins (all p < 0.005), while the main time effect was only preserved for the visual search task. During the free viewing task, we observed significant differences between the first and second time-bins, and between the second and third time-bins (all p >0.05) (see Table 3).

Dynamic of Mouse Movements
Here, the K coefficient is used to better understand the dynamic of mouse movements. The K coefficient showed a dominance of the ambient mode (M = −0.35, SD = ±0.63). We found a significant difference between tasks [F (1,138) = 15.27, p < 0.001]: the K coefficient was slightly higher in the free viewing task (M = −0.31, SD = ±0.58) than in the visual search task (M = −0.39, SD = ±0.77). There was also a significant time effect [F (3,414) = 410.86, p < 0.001]. We found a significant increase between all successive time-bins (all p < 0.001). However, there was no significant interaction effect [F (3,414) = 2.48, p > 0.05] (see Table 3).
The non-significant column regroups all post-hocs with a p-value > 0.05. Not mentioned post-hocs have a p-value < 0.05 or less.

Mode Switches
On average, 3.78 (SD = ±0.89) switches occurred between modes given by the K coefficient. There was a significant task effect [F (1,138) = 70.08, p < 0.001], which was characterized by a lower number of mode switches during the free viewing task (M = 3.44, SD = ±1.04) than during the visual search task (M = 4.19, SD = ±1.07). There was also a significant time effect [F (3,414) = 109.86, p < 0.001]. The number of switches significantly increased between the first and second time-bins (t = 11.68, p < 0.001), and between the second and third time-bins (t = 3.72, p < 0.005), but there was no significant difference between the third and fourth time-bins (t = 1.42, p > 0.05). We also found a significant interaction between the time and task [F (3,414) = 11.93, p < 0.001]. The main task effect was preserved except for the first time-bin (p <0.05). Furthermore, the main time effect was maintained for the free viewing task, but, for the visual search task, the first and second time-bins were significantly different (p < 0.001), while the remaining time-bins did not differ significantly (all p > 0.05) (see Table 3).
To summarize, we found a task and time effect for all the mouse parameters. As found for eye movements, most of the mouse parameters stabilized at the end of the exploration. Interestingly, the mouse parameters behaved similarly to eye movements parameters. Finally, ambient mode was the prevailing mode for mouse movements, but, as for the eyes, progressively switched to the focal mode over time.

Scroll Analysis
Overall, the participants spent 16.58% (SD = ±5.32%) of a trial scrolling. There was a task effect [F (1,138) = 469.10, p < 0.001]. The proportion of time spent scrolling was higher in the visual search task (M = 23.80%, SD = ±8.28%) than in the free viewing task (M = 10.86%, SD = ±4.87%). We also found a time effect [F (3,414) = 239.92, p < 0.001]. There was a significant increase between the first and second time-bins (t = 20.74, p < 0.001), as well as between the third and fourth time-bins (t = 3.70, p < 0.005), while there was no significant difference between the second and third time-bins (t = 0.06, p > 0.05). We found a significant interaction between the time and task [F (3,414) = 11.94, p < 0.001]. The main task effect was maintained for all time-bins (all p < 0.001). However, the time effect was not preserved. In both tasks, the first and the second time-bins were significantly different (t = −20.5, p < 0.001), but we did not find significant differences between other time-bins (p > 0.05) (see Table 4).

Number of Scrolls
During the trial, the participants scrolled on average 8.77 (SD = ±2.04) times. We found a task effect [F (1,138) = 512.15, p < 0.001]. We measured lower numbers in the free viewing task (M = 6.62, SD = ±2.25) than in the visual search task (M = 11.44, SD = ±2.63). We also found a time effect [F (3,414) = 282.94, p < 0.001]. There was a significant increase between the first and second time-bins (t = 24.37, p < 0.001). However, there were no significant differences between the second and third time-bins (t = 0.19, p > 0.05) or between the third and fourth time-bins (t = −0.62, p > 0.05). There was a significant interaction between the time and task [F (3,414) = 6.03, p < 0.001]. However, post-hoc analyses showed that the main effects were maintained (see Table 4).

Scroll Duration
Scrolls lasted on average 367.57 ms (SD = ±121.65 ms). We found a task effect [F (1,138) = 205.20, p < 0.001]. Scrolls were shorter in the free viewing task (M = 328.64 ms, SD = ±99.57 ms) than in the visual search task (M = 417.24 ms, SD = ±186.17 ms). Additionally, we found a time effect [F (3,414) = 55.49, p < 0.001]. There was a significant increase between the first and second time-bins (t = 9.34, p < 0.001), as well as between the third and fourth time-bins (t = 3.39, p < 0.01). However, there was no significant difference between the second and third time-bins (t = 1, p > 0.05). We did not find any interaction [F (3,414) = 1.94, p > 0.05] (see Table 4). There was a significant increase between the first and second time-bins (t = 9.44, p < 0.001), but not between the second and third time-bins (t = 0.77, p > 0.05) or between the third and fourth time-bins (t = 1.20, p > 0.05). There was a significant interaction between the time and task [F (3,414) = 6.51, p < 0.001], but post-hoc analyses confirmed that main effects were preserved (see Table 4).

Scrolling Dynamic
In contrast to eye and mouse dynamics, the scrolling dynamic was dominated by the focal mode (M = 0.43, SD = ±0.45). There was a task effect on the K coefficient [F (1,138) = 454.64, p < 0.001], which was significantly more indicative of the focal mode in the free viewing task (M = 0.92, SD = ±0.67) than in the visual search task (M = −0.17, SD = ±0.47). There was also a time effect [F (3,414) = 5.58, p < 0.001]: the K coefficient significantly decreased between the first and second time-bins (t = −4.29, p < 0.001), but not between the following successive time-bins (all p > 0.05). We found an interaction between the time and task [F (3,414) = 39.55, p < 0.001]. The main task effect was maintained (all p < 0.001). However, the main time effect was maintained during the free viewing task but not in the visual search task, where we measured a significant reduction between the first and second time-bins, and between the second and third time-bins (all p < 0.05), but not between the third and fourth time-bins (p > 0.05) (see Table 4).

Modes Switches
The participants switched between modes an average of 3.63 (SD = ±0.74) times. There was a significant task effect [F (1,138) = 257.59, p < 0.001]. The number of switches between modes was significantly lower in the free viewing task (M = 2.99, SD = ±0.94) than in the visual search task (M = 4.37, SD = ±1). We also found a significant time effect [F (3,414) = 109.40, p < 0.001]. There was a significant decrease in the number of switches between the first and the second time-bins (t = −15.27, p < 0.001), but no significant differences between the following successive time-bins (all p >0.05). The interaction of the time and task was also significant [F (3,414) = 4.60, p < 0.001], but post-hoc analyses confirmed that main effects were preserved (see Table 4).
To summarize, we found task and time effects for all scrolling parameters. As with the eye and mouse parameters, most of the scrolling parameters stabilized at the end of the exploration. However, this evolution ran in the opposite direction to that of the eye and mouse movements. While the eye and mouse fixation or dwelling parameters increased over time, scrolling dwells decreased. Conversely, while the eye and mouse movement parameters decreased over time, scrolling increased. As such, the focal mode was predominant in the global exploration, but tended toward ambient mode over time.

CONCLUSION AND DISCUSSION
Since the seminal work of Buswell (1935), eye movements have been extensively studied in a wide variety of conditions. From viewing patterns (Yarbus, 1967) to average fixation durations (Mackworth and Morandi, 1967), the behavior of eye movement parameters is well known. The knowledge of these basic parameters led to more complex research aiming to infer cognitive processes occurring during eye movements (Velichkovsky et al., 2002; Unema et al., 2005; Pannasch et al., 2008). However, with the diversity of stimuli that has emerged over the last decades, it became crucial to extend and adapt this knowledge to new stimulus types. That is why our study aims to provide a detailed statistical description of eye movement parameters on ecological web pages. Contrary to other stimuli such as natural images, web pages allow the use of mouse movements and scrolls. As previously described, mouse movements are mostly studied as patterns or trajectories (Rodden et al., 2008; Guo and Agichtein, 2010; Tzafilkou and Protogeros, 2018) and scrolling is sparsely studied (Braganza et al., 2009; Liu et al., 2017; Milisavljevic et al., 2018). Although their respective parameters are mentioned, to our knowledge, no quantitative analyses of these parameters have been performed. Using the same approach as for the study of eye movements, we intended to run such analyses to describe mouse and scroll parameters. Thus, the purpose of our study is to provide a statistical baseline of eye movement, mouse movement, and scrolling parameters during web page exploration.

Eye Movement Parameters
We first found a task effect for all eye variables that replicated several studies in the literature (Yarbus, 1967; DeAngelus and Pelz, 2009; Itti and Borji, 2015). Fixation-related variables were higher in the free viewing task than in the visual search task, while movement-related variables were higher in the visual search task. We also found a time effect on all variables. Fixation-related variables increased over time for both tasks while movement-related variables decreased. Participants made fewer fixations and saccades, but longer fixations and shorter saccades over time (Unema et al., 2005). As a result, we observed a global domination of ambient mode (i.e., short fixations with long saccades), but over time the dominant mode progressively switched to focal mode (i.e., long fixations with short saccades). This behavior could indicate that participants try to contextualize the stimulus at the beginning of the exploration and then focus more and more on content as time goes by.

Mouse Parameters
Then we ran the same analyses on mouse movements and scrolls. We found a task effect for all parameters of the mouse exploration, except for the average amplitude and duration of the mouse movements. As for the eye movements, dwell-related variables were higher in the free viewing task than in the visual search task, while movement-related variables were higher in the visual search task. Again, we found a time effect on all variables. As with eye movement parameters, dwell-related variables increased over time and movement-related variables decreased over time for both tasks. This behavior is similar to that of eye movements and suggests strong similarities between the two. Hence, we applied visual mode concepts to mouse movements. However, it is worth noting that the number of mouse movements was considerably lower than the number of eye movements, so these results should be interpreted with caution. Despite the difference in the number of events, we observed similar behavior in the mouse dynamic, which began in ambient mode and progressively switched to focal mode over the course of the exploration.
Regarding scrolling, we found a task effect for all parameters, as with eye and mouse movement parameters. We also found a time effect on all the variables, but dwell-related variables decreased over time while scroll-related variables increased. However, the stabilization of scroll parameters began earlier than for mouse parameters (see Figures 3, 4). Although there were fewer scroll movements than eye movements, their frequency remained slightly higher than that of mouse movements. Therefore, we conducted analyses of dominant modes and found that, globally, scrolling was in focal mode. However, when looking over time, we observed that the focal mode was more dominant at the beginning of the exploration and the ambient mode at the end. Since participants scrolled increasingly over time but made longer eye fixations, they seemed to balance the natural emergence of the focal mode of the eyes by scrolling to keep changing and contextualizing the newly displayed content.

Similarities and Differences
When studying eye, mouse, and scroll parameters, we observed common tendencies over time. In order to study these tendencies, we separated the computed variables into two distinct categories: variables related to movements and variables related to fixations/dwells. Then, since we focused on tendencies, the relevant parameters were normalized between 0 and 1 to enable comparisons. This movement-fixation dichotomy is directly inspired by how the visual cortex processes visual information.
The visual cortex is divided into two main pathways: the ventral and dorsal streams. The ventral stream carries information about object recognition, while the dorsal stream is more related to visually guided movements (Goodale and Milner, 1992). Since saccades, mouse movements, and scrolls are all visually guided movements, they are analyzed together. However, while fixation is directly involved in object recognition (ventral stream), it is not clear whether a mouse or a scroll dwell is involved. The mouse remains a tool used to browse a web page, and the implication of a pause still needs further investigation. For the convenience of the following analyses, we compare eye fixations with mouse and scroll dwells.
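The normalization between 0 and 1 mentioned above can be read as a min-max rescaling of each parameter across time-bins, which puts variables measured in milliseconds, seconds, or degrees on a common scale so that only their tendencies are compared. The sketch below illustrates this reading; the function name `min_max` and the sample series are hypothetical, and the exact normalization used in the study may differ.

```python
def min_max(values):
    """Rescale a series to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no tendency; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values over four time-bins, in different units.
fixation_ms = min_max([210.0, 230.0, 240.0, 240.0])  # rises, then stabilizes
scroll_dwell_s = min_max([1.9, 1.5, 1.4, 1.4])       # falls, then stabilizes
```

After rescaling, the first series runs from 0 to 1 and the second from 1 to 0, which makes opposite tendencies directly visible on the same axis.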
The first common tendency we observed is depicted in Figure 3A. It shows a common pattern between the fixation-related variables of the eyes and the mouse, and an opposite one with the scroll. On the one hand, eye and mouse parameters behaved similarly. Fixation or dwell durations, and percentages of fixations/dwells, were at their lowest at the beginning of the exploration and increased up to the end of exploration. For instance, in the free viewing task, the average fixation duration was 211.27 ms at the beginning of the exploration and increased up to 241.28 ms, while the average mouse dwell lasted 1 s at the beginning and increased up to 1.52 s (see Tables 2, 3 for more details). On the other hand, scrolling behaved in exactly the opposite way. Scroll dwell was at its highest at the beginning of the exploration: it lasted 1.91 s on average during the first time-bin of the free viewing task and decreased over time to reach 1.44 s at the end of the exploration (see Table 4 for more details). These observations are consistent in both the free viewing and visual search tasks (Figures 3B,C). Yet we observed a stabilization of mouse and scroll dwell durations starting from the second time-bin.
We observed a second tendency describing the opposite pattern for movement-related variables, as presented in Figure 4A. Eye and mouse movement variables decreased over time and scroll variables increased. Eye and mouse parameters behaved in the opposite way to scroll parameters, just as with fixation-related variables. Furthermore, this relationship was maintained across both tasks (Figures 4B,C). For instance, we observed an average saccade amplitude of 6.26° and an average mouse amplitude of 0.43° at the beginning of exploration during the free viewing task. Both amplitudes then decreased to 4.47° and 0.13°, respectively, at the end of exploration. Under the same conditions, the scrolling amplitude increased from 5.88° at the beginning of the exploration to 6.78° at the end (see Tables 2-4 for more details).
Our results show a clear relationship between eye, mouse, and scroll parameters. Previous studies have already shown the spatial coordination of the eyes and mouse (Guo and Agichtein, 2010; Huang et al., 2012; Boi et al., 2016) and some coordination between the eyes and scroll speed (Milisavljevic et al., 2018). However, here we show that this relationship is even deeper than expected, and can be identified through analyzing eye, mouse, and scroll parameters. Indeed, coordination exists not only between the eyes and the mouse, or between the eyes and the scroll, but clearly between all three. Our findings show, for the first time, that eye and mouse parameters behave similarly, which confirms the interest of using mouse behavior to predict eye behavior. Yet the interaction described here does not take spatial coordinates into account; these could be combined with the parameter relationships to better predict eye movements from mouse events.
Even though further studies are needed to confirm our results, the relationship between eye and mouse parameters seems consistent over time. This may be related to similar processing in the ventral and dorsal streams (Goodale and Milner, 1992). For instance, Stone and Gonzalez (2015) reported several studies in which ventral and dorsal streams of congenitally blind individuals were preserved during pointing and grasping tasks. Thus, we can assume that the important role of both streams in hand movements and eye movements may explain why the eye and mouse parameters behave similarly during the exploration. However, this hypothesis does not address why the scroll parameters behave oppositely. The opposite behavior we observed for the scroll may be explained by the "sensory weighting hypothesis" (Ernst and Banks, 2002). This theory states that during a task involving sensory competition, here between vision and haptics, we tend to rely on the optimal sense to complete the task. For instance, before reaching an object whose position is unknown, we first need to look at it, but there are occasions when we reach objects without looking at them because we already know their exact position. In our case, the task is to browse the page with or without a target. At the beginning of the exploration, the optimal sensory input to fulfill this task would be the eyes. As time goes by, we discover more and more of the web page until we have browsed it entirely. The scroll would gradually become the optimal way to browse the web page, since fixation duration is increasing and saccade amplitude decreasing, and the scroll would then replace large saccades.
Further research is necessary to better understand what mechanisms are involved in the eyes and mouse coordination during web pages exploration. For instance, we did not differentiate scroll up from scroll down in our analyses. When we scroll down, we usually discover the content for the first time. But a scroll up is necessary to re-examine an already seen area of the web page. Differentiating the two directions might provide finer results on what cognitive processes are involved.

DATA AVAILABILITY STATEMENT
The data analyzed in this study was obtained from the company Sublime Skinz. Data cannot be distributed, remixed, adapted, used to build upon, changed in any way or used commercially. Requests to access the datasets should be directed to Coralie Petermann, coralie@sublime.xyz.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Research Ethics Board of Paris Descartes University (Comité d'éthique de la Recherche de l'université Paris Descartes). The patients/participants provided their written informed consent to participate in this study.