A State Space Approach to Dynamic Modeling of Mouse-Tracking Data

Mouse-tracking recording techniques are becoming very attractive in experimental psychology. They provide an effective means of enhancing the measurement of some real-time cognitive processes involved in categorization, decision-making, and lexical decision tasks. Mouse-tracking data are commonly analyzed using a two-step procedure which first summarizes individuals' hand trajectories with independent measures, and then applies standard statistical models on them. However, this approach can be problematic in many cases. In particular, it does not provide a direct way to capitalize the richness of hand movement variability within a consistent and unified representation. In this article we present a novel, unified framework for mouse-tracking data. Unlike standard approaches to mouse-tracking, our proposal uses stochastic state-space modeling to represent the observed trajectories in terms of both individual movement dynamics and experimental variables. The model is estimated via a Metropolis-Hastings algorithm coupled with a non-linear recursive filter. The characteristics and potentials of the proposed approach are illustrated using a lexical decision case study. The results highlighted how dynamic modeling of mouse-tracking data can considerably improve the analysis of mouse-tracking tasks and the conclusions researchers can draw from them.

1 Evaluating the state-space model

Computation time
In this section we evaluate the time required by the algorithm included in Table A.3 of the manuscript to completely fit the state-space model. The algorithm has been implemented in Matlab R2018a and it is freely available online at https://github.com/antcalcagni/SSM_mousetracking. We tested two instances of our model, namely the simple model with a categorical variable and the complete model including categorical and continuous variables together with their interaction. The simulation study was obtained by varying the type of model (categorical, interaction), the number of statistical units I = (12,25,50), the number of trials J = (12, 28) and the number of levels of the categorical variable K = (2, 4). The time step was kept fixed N = 61. The study involved I × J × K = 12 scenario for two type of models (see Table 1). For each scenario, data and model structure were kept fixed, one single chain was used without parallelization, warming-up and sampling periods were equal to 2000 samples (warm-up = 1000 samples), and the adaptive parameter of the MH-loop was equal to H = 25. The elapsed time was measured in seconds using the standard built-in Matlab function. Computations were performed on Intel Core i7-7500U 4-core 2.70GHz, Ubuntu 18.04 64bit, 8GB Ram. Figure 1 shows the results for both types of models. As expected, the elapsed time increased exponentially as a function of the complexity of the scenario. In both models, the most simple scenario (scenario no. 1) required less then one minute to complete the warming-up and about one minute to complete the sampling period. Instead, the medium scenario (scenario no. 6) required about one minute to complete the warming-up and less then three minutes to complete the sampling phase. Finally, the most complex scenario (scenario no. 12) required less than three minutes to finalize the warming-up and about seven minutes to complete the sampling period.

Simulation study
In this section, we report the results of a simulation study to evaluate the ability of the model to (i) recover the true parameters β and (ii) resemble the observed data Y. times by getting a total of 40000 final samples. The levels of I, J, and K were chosen to represent the number of individuals, trials, and variables commonly encountered in experimental psychology. The simulation was performed on a (remote) HPC machine based on 16 cpu Intel Xeon CPU E5-2630L v3 1.80GHz, 16x4GB Ram.

Simulation design
The simulation workflow proceeded as follows: 1. The hyperparameters µ β and Σ β of the prior f (β) = N (µ β , Σ β ) were kept fixed, µ 1 = 2.75, 2. For each scenario s = 1, . . . , 16, the parameters were sampled from the priors β 1,...,M ∼ f (β) and a series of M datasets Y s 1 , . . . , Y s M were generated according to the model equations (note that the generic dataset is of the form Y s I×J×N ) 3. For each of the datasets Y s 1 , . . . , Y s M , the model was fitted by obtaining the array of latent states and parameter posteriors: where the generic array Z s is of the form Z s I×N . The model was run using 10000 (samples) x 8 (chains), with warming-up of 2500 samples. Matlab Parallel computing was used to run the chains.

Outcome measures
The recovery problem was evaluated for s = 1, . . . , 16 by means of the overlapping measure between symmetric distributions: which measures the amount of agreement between two distributions, with Λ = 0 indicating no overlapping. In our context, as f (β) and f (β|Y) are symmetric with a unique mode, we interpret Λ as density similarity where Λ = 1 means that the two distributions are the same 1 .
The resembling problem was instead assessed by means of the Y|Ỹ-discrepancy measure: which measures the amount of reconstruction 2 of the observed array Y via the reproduced arrayỸ. The measure AoR takes values in [0,100] with 0% indicating that no reconstruction occurred at all.

Results
The results of the simulation study are described in Tables 2-3. Overall, both the models got acceptable fit with regard to recover the true model structure (Λ index) and the adequacy in reproducing the input data (AoR index). As expected, the goodness-of-fit increased as a function of the number of trials J and the number of individuals I.

Convergences of the MCMC chains
In this section we illustrate the convergence diagnostics for the three case studies described in Section 4 of the manuscript.

Fit of the model
The adequacy of the state-space model to reproduce the observed data was computed by means of a simulation-based approach. In particular, given the posteriors of the parametersβ and filtered/smoothed latent statesẐ, M new (simulated) datasets (Y * 1 , . . . , Y * M ) were generated according to the estimated model structure. For each new dataset, the AoR discrepancy measure was computed: In addition, to further evaluate the amount of reconstruction, we computed two kind of AoR measures: • Overall AoR: percentage of data reconstruction based on the overall I × J × N observed matrix Y.
• By-subject AoR: percentage of data reconstruction based on the J × N observed matrix Y i for each subject i = 1, . . . , I. The index allows for evaluating the adequacy of the model to reconstruct the individual-based set of trajectories.
The simulation was performed on a (remote) HPC machine based on 16 cpu Intel Xeon CPU E5-2630L v3 1.80GHz, 16x4GB Ram. Figures 5-7 show the results of AoR indices for the three models. Figure 8 shows the observed mean trajectories for each individual Y i , i = 1, . . . , I against the predicted trajectories by the three models.