To Explore the Predictive Power of Visuomotor Network Dysfunctions in Mild Cognitive Impairment and Alzheimer’s Disease

Background Research into Alzheimer’s disease has shifted toward the identification of minimally invasive and less time-consuming modalities to define preclinical stages of Alzheimer’s disease. Method Here, we propose visuomotor network dysfunctions as a potential biomarker in AD and its prodromal stage, mild cognitive impairment with underlying the Alzheimer’s disease pathology. The functionality of this network was tested in terms of timing, accuracy, and speed with goal-directed eye-hand tasks. The predictive power was determined by comparing the classification performance of a zero-rule algorithm (baseline), a decision tree, a support vector machine, and a neural network using functional parameters to classify controls without cognitive disorders, mild cognitive impaired patients, and Alzheimer’s disease patients. Results Fair to good classification was achieved between controls and patients, controls and mild cognitive impaired patients, and between controls and Alzheimer’s disease patients with the support vector machine (77–82% accuracy, 57–93% sensitivity, 63–90% specificity, 0.74–0.78 area under the curve). Classification between mild cognitive impaired patients and Alzheimer’s disease patients was poor, as no algorithm outperformed the baseline (63% accuracy, 0% sensitivity, 100% specificity, 0.50 area under the curve). Comparison with Existing Method(s) The classification performance found in the present study is comparable to that of the existing CSF and MRI biomarkers. Conclusion The data suggest that visuomotor network dysfunctions have potential in biomarker research and the proposed eye-hand tasks could add to existing tests to form a clear definition of the preclinical phenotype of AD.


INTRODUCTION
Alzheimer's disease (AD) is a devastating neurodegenerative disease. It is characterized by progressive cognitive decline. It is estimated that it affects 44 million people worldwide (Patterson, 2018). Despite the many attempts to find a cure for AD, no disease-modifying treatment has been developed (Au et al., 2015). Some suggest that experimental treatments were administered too late in the course of the disease . Instead, optimal treatment would be achieved in the preclinical stage of AD , when pathophysiological changes are already occurring, but no clinical symptoms-such as memory loss-have manifested (Jack et al., 2011). In clinical practice, it is impossible to accurately diagnose patients in the preclinical stage: in fact, a definitive diagnosis can only be established postmortem (Au et al., 2015). Biomarkers, such as cerebrospinal fluid (CSF) analysis and neuroimaging techniques, are one of the most promising methods for detecting preclinical AD (Barber, 2010). Yet, to date, no biomarker has been identified at the start of the pathophysiological processes of AD, that also has a strong link with the later emergence of AD's clinical symptoms . In addition, given the suboptimal sensitivity and specificity of CSF and structural neuroimaging markers, their use is mostly restricted to research settings (Almeida, 2002;De Leon et al., 2004;Schmand et al., 2010;Ritchie et al., 2015;Jack et al.). Another problem is the heterogeneous nature of AD: a certain level of a biomarker may be detected in some people, even though they will never develop AD, while the same biomarker level might not be detected in people who did develop AD (Au et al., 2015). For these reasons, AD is still primarily diagnosed on clinical grounds (Salimi et al., 2018) and consequently, most patients are not diagnosed until they have progressed to late stages of AD (Barber, 2010).
Recent studies have indicated that multiple functional networks throughout the entire brain are already affected by neurodegeneration in early stage AD (Seeley et al., 2009;Barber, 2010). For instance, there is not only damage to the hippocampus, a structure that is classically associated with the memory impairments in AD, but functional imaging studies showed that other areas, for example in the frontal and parietal lobe, also atrophied (Burnod et al., 1999;Jacobs et al., 2012). This atrophy damages several important networks in the brain: the visuomotor network is one such important function brain network. This network receives visual and proprioceptive sensory to compute motor control signals. The planning and executing of the tasks at hand rely on feedback and feedforward mechanisms through visual, motor, parietal, and frontal cortices (Ledberg et al., 2007) in a dynamic manner (Brovelli et al., 2017). The functionality of the visuomotor network can be quantified in terms of timing, accuracy, and speed of eye and hand movements when performing goal-directed tasks. Consequently, impairments in visuomotor functioning due to underlying neurodegenerative damage could be a marker of early or even preclinical AD and a valuable addition to existing markers (de Boer et al., 2016;Salimi et al., 2018). Several studies support this hypothesis: for instance, it was found that the initiation of saccades was slowed (Molitor et al., 2015;de Boer et al., 2016) and that the initiation and execution of motor sequences was impaired (Salek et al., 2011;de Boer et al., 2016) in AD and MCI patients compared to healthy elderly. Furthermore, visuomotor functioning has the benefit of being easy and quick to measure, compared to existing markers, which are often more timeconsuming (i.e., structural neuroimaging) or invasive (i.e., CSF). However, although aforementioned studies show a relation, much is still unclear about the added value of visuomotor performance for network degeneration and the sensitivity and specificity of the method. We aimed to study the predictive value of visuomotor performance by measuring several visuomotor parameters, such as the time needed to move the eyes or the hand to a target. Because only one parameter is not sufficient for a classification (Crawford, 2015), a combination of parameters with high predictive power were analyzed using machine learning. This is a promising method, as it can reduce a large dataset to a small subset of meaningful features using feature selection methods (Sarica et al., 2018). Thus, machine learning provides insight in the clinical relevance (i.e., what visuomotor parameters add to existing markers) of visuomotor performance on the basis of multiple parameters. A first step toward establishing the predictive value was taken by Lagun et al. (2011), who showed that classification algorithm trained on separating healthy elderly and AD patients could distinguish between healthy elderly and MCI patients based on their eye movements, with an accuracy of 87%.
The aim of the present study was to examine whether visuomotor functioning, in terms of both eye and hand movements, can be a potential new biomarker. We also examined whether there was additional predictive value in using eyehand coordination parameters over just using eye movement parameters (as in Lagun et al., 2011). To this end, we evaluated the performance of several classification algorithms when classifying controls, MCI, AD and consequently determining the predictive power of visuomotor functioning with only eye movements, or both eye and hand movements.

Participants
Participants were included from June 2010 to January 2018. This study was approved by the Medical Ethical Committee of the Erasmus MC Rotterdam (MEC-2008-365 andMEC 2012-524). Data collected until 2015 were previously published by Verheij et al. (2012) andde Boer et al. (2016). Patients were independently diagnosed using a standardized diagnostic procedure carried out by a panel of specialists from the departments of Geriatrics, Neurology, and Psychiatry from the Erasmus Medical Centre. AD subjects met the standard criteria for probable AD (NINCDS-ADRDA) and were considered to be in early stages of the disease. The clinical diagnosis of MCI and AD was based on patient history, laboratory findings, and imaging results. Family members or caregivers of the patients were recruited as healthy elderly controls. Subjects were excluded if their mini-mental state examination (MMSE) (Folstein et al., 1975) score was below 21/30, if they displayed neurological or psychiatric disorders (excluding cognitive decline related to the diagnosis), or if they had motor or ocular pathologies which rendered them unable to touch or see stimuli (unless eyesight could be corrected). The use of acetylcholinesterase inhibitors was permitted. A written informed consent was obtained from all subjects. All collected data were stored in a database.

Measurement Setup
Eye hand tasks were presented on a 32-inch touchscreen (ELO touch systems). Eye movements were recorded with a headmounted video infrared eye-tracking system. We used to systems: Chronos, Chronos Vision, Berlin, sampling rate of 200Hz, the EyeSeeCam system, which replaced the Chronos system in 2017 (EyeSeeTec GmbH, Munich, Germany, sampling rate of 220 Hz, resampled to 200 Hz so that the same code could be used to analyze EyeSeeCam and Chronos data). Both systems used pupil detection software. Chronos recorded gaze data of both eyes, whereas EyeSeeCam recorded the gaze data of the left eye. Hand movements were recorded by an infrared motion capture system (Vicon, Vicon Motion, Oxford, sampling rate 200 Hz). Subjects wore a glove with four markers on the dominant hand, which reflected infrared light back to 4 cameras to follow hand movements. All systems were synchronized using a trigger controlled by MATLAB R2017a (MathWorks, Natick, MA). The set-up is illustrated in Supplementary Figure 1.

Experimental Procedures
All participants were tested at the department of Neuroscience of the Erasmus Medical Centre. Each participant completed a MMSE to assess general cognitive functioning. Subjects were seated at 60 cm distance from the touchscreen using a chin rest. Eye and hand position were calibrated with a 5-point calibration scheme. Verbal instructions were given and subjects were asked to act as quickly and accurately as possible. Subjects performed eight different tasks, three eye tasks, see Supplementary Figure  (1) Reflex tasks: assessment of reflexive eye movements (prosaccade task) and eye/hand movements (pro-tapping task).
(2) Inhibition tasks: assessment of inhibitory eye movements (anti-saccade task) and eye/hand tasks (anti-saccade antitapping task and anti-tapping task). (3) Memory tasks: assessment of short-term memory eye movements (memory-saccade task) and eye/hand movements (memory-tapping task and sequential-tapping task).
Each task started with three practice trials, followed by eight measurement trials. First, the eye tasks were presented, followed by the eye-hand tasks. The tasks that involved hand movement were executed with the index finger of the dominant hand. Between 2010 and 2018, different protocols have been applied. Thus not every participant underwent each task. In Supplementary Table 1, an overview is presented of the number of participants per group who completed a given task. Supplementary Figure 3 represents a typical example of eye and hand traces. Data were pre-processed using custom MATLAB code. The following parameters were calculated for eye movements: -Eye latency: time between target stimulus presentation and eye movement initiation -Eye maximum velocity: maximum eye movement velocity during the primary saccade. Finally, several parameters were calculated from the eye trace to quantify pupil characteristics (Wang and Munoz, 2015): -Pupil peak: maximum pupil dilation between target onset and the primary saccade. -Pupil latency: time from target onset until maximum pupil dilation occurs. -Pupil baseline: mean pupil dilation during the fixation period.
Additionally, anticipation (measure of whether the eye movement toward the first and second stimuli preceded the hand movement toward the same stimulus) was calculated for the sequential-tapping task. Finally, a measure of task performance was computed for each task, based on the number of correct trials (all correctly performed trials) divided by the number of valid trials (all trials that could be analyzed). Trials that had not been recorded correctly (invalid trials) were excluded.

Data Handling
Data from the memory saccade task were excluded because data was collected from too few participants. Outliers-defined as data points that were more than two standard deviations removed from the group mean-were excluded per trial per subject. When eye latency, hand latency, or pupil latency were outliers, the eye, hand, or pupil parameters were all removed, because outliers in these parameters often indicated faulty measurements. In other cases, only the parameter that was an outlier was removed. On average, 4.26% of all data points (SD = 2.65%, range: 0-11.99%) were excluded as outliers. Any missing value was marked as "not a number." Lastly, data were standardized with z-scores within the external cross-validation procedure to prevent data leakage. The final dataset consisted of 96 subject instances where each instance contained 77 functional parameters and one nominal variable, which indicated the subject's group. The dataset was prepared for classification in WEKA 3.8 (WEKA, University of Waikato, New-Zealand). A Zero rule algorithm was used as baseline (ZeroR in WEKA). This algorithm predicts that every subject is part of the largest group in the dataset. The performance of three supervised algorithms was compared to this baseline: 1. Support vector machine (SVM) (Cortes and Vapnik, 1995): a linear classifier which separates groups at the point where the distance between the groups is the greatest by drawing a hyperplane. SVMs use the kernel trick (Shawe-Taylor and Cristianini, 2004) to perform non-linear classification: data are transformed, using a mapping function (defined by the kernel), to a high-dimensional space where a linear hyperplane can be found. When this hyperplane is transformed back to 2-dimensional space, it will become a non-linear separator. A drawback of the kernel trick is that the most effective type of kernel is not known beforehand, making it necessary to test multiple kernels (LibSVM function in WEKA). The SVM was chosen in this study so a direct comparison could be made with Lagun et al.'s study (2011), who looked at the value of eye movement parameters for distinguishing between healthy controls and MCI or AD patients. Furthermore, SVMs are fast to train and are flexible because different kernels can be included in the model. We included a variety of kernels (radial basis function kernel, polynomial kernel, sigmoid kernel, linear kernel) to accommodate potential differences in data structures and to compare the performance of simple and high dimensional kernels. Although better performance might be expected from higher dimensional kernels, simpler kernels save time and processing power: therefore, there is value in comparing the performance of simpler kernels to higher dimensional kernels. 2. Decision tree (Breiman et al., 1984): a simple algorithm that splits the dataset into smaller partitions until it can correctly classify most instances. For each split, the parameter that would provide the best separation between groups at that point is chosen. The decision tree was selected because it is a simple, easily interpretable, and computationally inexpensive algorithm that gives immediate insight in the parameters and weights used for classification (J48 algorithm in WEKA). 3. Artificial neural network (ANN) (Schmidhuber, 2015): the algorithm creates a network of hidden nodes, with differing weights, which are activated or deactivated depending on the input they receive. The activation pattern of all nodes is transformed into the predicted class for the subject. The ANN is a very flexible algorithm, that performs well on complex problems (multilayer perceptron in WEKA). ANNs are also suitable to handling multiclass problems, which the SVM does not readily lend itself for.
We selected these three models for comparison and to find the model with the best performance and the easiest implementation. For example, the ANN is expected to do well because it is tailored to complex problems, but it is computationally heavy and takes a long time to train. Therefore, if a decision tree or SVM could achieve similar performance, these types of algorithms would be preferred in practice. This is especially true for the decision tree, as this algorithm gives complete insight in which features are used. However, since the decision tree has more trouble with complex problems, we also included the SVM as the middle road. Figure 1 represents a flow chart of the steps that were involved to convert the data into the final model including:

Feature Selection
Feature selection was performed for all dataset views to identify the functional variables that would be most meaningful for classification and additionally, to prevent model overfitting by reducing the number of features that the ANN would be trained on. Additionally, we performed significance tests in SPSS 24 (IBM SPSS, Chicago, United States) to verify whether the selected variables could meaningfully separate the groups. Normality was tested for each selected variable using the Shapiro-Wilk test. For normally distributed data, two-tailed t-tests were performed and Cohen's d was calculated as a measure of effect size. For non-normally distributed data, a non-parametric Mann-Whitney U-test was performed and r-squared was calculated as a measure of effect size (Rosenthal et al., 1994). One variable (Anticipation) recorded the number of trials that the subject anticipated a stimulus in the sequential-tapping task and therefore, a chisquare test was performed. Phi was taken as a measure of effect size. The significance level for all tests was α = 0.05. Exact p-values were reported, unless p < 0.001.
To gain insight in the parameters used by models other than the ANN, we performed post hoc feature selection in WEKA using the most successful classifier with the same hyperparameter settings on the HC-controls, HC-MCI, HC-AD, and MCI-AD comparisons. This method returned the merit of a parameter and ranked them from most to least merit. Parameters were selected if their merit was > 0.01.

Hyper-Parameter Selection
Cross-validated parameter selection was performed to optimize each model's hyper-parameters. Hyper-parameters influence the learning process of a model. For parameter selection, the hyperparameters were varied over a pre-established range. All trained instances were compared and the hyper-parameters of the instance with the highest accuracy were chosen. The hyperparameter that was specified first was optimized first and with that optimal value, the second hyper-parameter was optimized. Optimization was internally cross-validated 10 times. Specifically, for the ANN we optimized learning rate and momentum (ranges: 0.0001-1); for the decision tree we optimized the confidence factor (range: 0.1-0.9) and the minimum number of objects (range: 1-10); and for the SVM we optimized the cost and the gamma (ranges: 0.0001-1).

Model Training
Eight optimized models were trained for each dataset view: one decision tree, three ANNs with one hidden layer with 2, 4, or 6 hidden neurons, and four SVMs with different kernels (radial basis function kernel, polynomial kernel, sigmoid kernel, linear kernel). Model training was performed in WEKA with 10-fold cross-validation. The model training was repeated 10 times in order to achieve more robust results.

Model Evaluation
Classification performance was assessed based on accuracy (correctly classified data points/total data points), sensitivity (data points correctly classified as positives/total data points classified as positives), specificity (data points correctly classified as negatives/total data points classified as negatives), and area under the curve (AUC, measure of classification accuracy, independent of class distribution). These outcome measures were averaged across the 100 repetitions and reported as the final performance estimates. Per dataset view, the model that scored the highest on most performance metrics, as determined with a one-way analysis of variance (ANOVA), was selected as the "best performing model." When there was no significant difference in performance between models (α = 0.05), models with a higher sensitivity were preferred, because it is more important to correctly classify patients. The performance of the "best performing model" was assessed using the following rules of thumb. These rules did not apply to the baseline algorithm, which was considered to be a poor performer in all cases because of its strategy. Thus, algorithms that performed similarly to the baseline were also considered poor performers.

Learning Curves
We generated learning curves in WEKA to assess whether sample size was sufficient. Learning curves are used to indicate how much data a model needs to be trained on to achieve a good performance (Figueroa et al., 2012). The curves were generated by plotting model accuracy against sample size. Sample size was manipulated by training the algorithm on a fraction of the total dataset (starting at 10%), which was increased by 10% until the complete dataset was used. Typically, a learning curve that indicates a sufficient sample size first shows a rapid increase in performance, followed by a turning point where the performance increase slows. The last part of the curve is flat and indicates that performance will not improve further, even if sample size is increased (Figueroa et al., 2012).

RESULTS
In total, 96 subjects were selected from the database. Table 1 provides an overview of the demographics of these subjects. 21 MCI patients were diagnosed with MCI due to Alzheimer's disease and 1 was diagnosed with MCI or early stage dementia. The mean age of all subjects was 74.9 years (SD = 6.15) and the mean MMSE score was 25 (SD = 4.01). A oneway ANOVA showed no differences between groups in age [F (2,91) = 1.097, p = 0.338] and a difference in MMSE scores between groups [F (2,83) = 23.37, p < 0.001]. Post hoc tests, corrected for multiple comparisons with a Bonferroni correction, showed that the mean MMSE scores of the controls were higher than of the MCI patients (p < 0.001) or the AD patients (p < 0.001). Figures 2-5 give an overview of the raw performance data of each group (Figure 2: eye latency; Figure 3: hand latency, and hand error; Figure 4; pupil latency; Figure 5: performance). Table 2 presents the information gain (if selected for the ANN) or merit (if selected by the SVM) and the results of the significance tests for each selected variable. The variables are divided per task to provide better insight in the importance of each task. For illustration purposes, we chose to present the results of the significance tests only for the HC and patients (both MCI and AD) dataset. All selected variables differed significantly between the control group and the aggregated patient group and most of the variables showed medium to very large effect sizes (Cohen, 1988). To give an indication of the similarities between the selected variables between datasets the feature selection results for the controls-MCI, controls-AD, and MCI-AD datasets, using all parameters are presented in Table 2. Table 3 reports the descriptive statistics of the functional variables that were selected based on their information gain (gain > 0), except for the variables that were only selected for the MCI-AD dataset, as these variables did not provide sufficient predictive power to classify MCI and AD patients.
The confusion matrices of these comparisons are shown in Table 7. Figure 6 shows the learning curves generated for all datasets, using the accuracy of the best performing algorithm plotted against sample size. The learning curves for the controls and patients, controls and MCI patients, and controls and AD patients show the initial large increase in accuracy and the turning point where this increase slows. The final phase, where performance plateaus, was not achieved. The learning curve for the MCI and AD dataset displayed only a marginal increase in performance, indicating that the algorithm learned minimally from the EHC parameters.

Predictive Power: Eye Parameters vs. Eye-Hand Parameters
The best performing algorithm for each comparison was trained on one dataset containing all data from the eye tasks and one dataset containing all data from eye-hand tasks. Classification performance was examined for the HC and MCI and the HC and AD datasets. The HC and the patient's dataset was disregarded FIGURE 4 | Mean performance on pupil latency for all groups. Error bars represent standard error. Significant differences were indicated with * p < 0.05, * * p < 0.01, * * * p < 0.001.
FIGURE 5 | Mean weighted performance over all trials for all groups, on all tasks. Error bars represent standard error. Significant differences were indicated with * p < 0.05, * * p < 0.01, * * * p < 0.001. MCI*: MCI group was excluded because of insufficient data (N = 3).
because we aimed to directly examine classification separately for the patient groups. The MCI and AD dataset was disregarded because classification failed using all EHC parameters, which indicated that classification would also fail on the splitted datasets. A one-way ANOVA was performed on all metrics to compare classification performance on the split datasets (significance level α = 0.05). The difference in performance was taken as the additional predictive value of eye-hand tasks. Table 4 reports the performance of the SVM (linear) on the split datasets, with the performance of the baseline for reference.
Classification based on eye tasks performed better on specificity [F (1,198) = 51.62, p < 0.001, partial η 2 = 0.21]; however, the confusion matrix of this algorithm (   Therefore, classification based on eye-hand tasks was considered superior on all metrics.

DISCUSSION
In the present study, the functional consequences of neurodegeneration on the visuomotor network have primarily been assessed in both patients with AD and in patients with mild cognitive impairments using eye-hand coordination (EHC) tasks. The aim of this study was to establish the predictive value of EHC for network dysfunction and the sensitivity and the specificity of this method. This predictive power depended on the predictive power of the individual parameters that constitute visuomotor performance, such as the timing, accuracy and speed of movements. An accuracy of 87% was found using a classification algorithm that distinguished between healthy elderly and MCI patients, based on only eye movement parameters (Lagun et al., 2011). In the present study, a combination of eye and hand movements parameters were obtained, because AD patients seem to have difficulties with handling increased load (Toepper, 2017). As expected, the best classification performance was achieved for HC and AD patients, followed by HC and all patients, and finally HC and MCI patients. The eye-hand task parameters, in comparison with only eye movement parameters, were shown to contribute most to classification performance for all comparisons (Tables 2, 4). Based on the eye tasks, we found a reduction in predictive power of ± 12%. This indeed suggests that an increase in cognitive load contributes to the best classification of controls and either patient group. The predictive power of visuomotor performance was determined by several parameters that were selected during the feature selection procedure using the best performing model for each comparison, to gain insight in the parameters that were most useful for classification (Tables 2, 4). The parameters with the most merit for classification primarily came from the reflexive tapping task and from the inhibition tapping task, contributing to the hypothesis that adding hand movement to the tasks increases their predictive value. In particular, parameters related to latency (time between target presentation and movement initiation), parameters that characterize hand movement (e.g., hand total distance, hand maximum velocity, hand movement time, hand error), and the number of saccades were the most useful information for classification. Several of these parameters have also been described in previous EHC studies (Molitor et al., 2015). For instance, the finding that reflexive saccades made by AD patients are often hypometric is comparable to the parameter saccadic error (the distance between the end position of a saccade and the actual target position) which was selected in this study. Another example is the increased latency of eye movements when executing the anti-saccade task. Furthermore, it was found that the initiation and execution of motor sequences were impaired in AD patients and in MCI patients when cognitive load was increased (Salek et al., 2011). We detected a similar impairment in our patient group, represented by the parameter from the protapping task. The remaining selected parameters ( Table 2 and  Supplementary Table 2) in the current study could not be related to previous literature, as studies into visuomotor performance in MCI and AD patients are scarce. However, the similarities discussed here do support the results of the feature selection procedure and provide additional evidence that these specific tasks and parameters contribute to the predictive power. Considering our findings, visuomotor performance appears as a promising approach to distinguish MCI or early stage AD patients from controls. Yet, it must also be compared to existing biomarkers to determine its clinical relevance. The findings of the current study were compared to a recent overview of studies that examined the performance of CSF and neuroimaging markers for the classification of controls, MCI patients, and AD patients (Figure 7; Henriques et al., 2018). Overall, the classification performance found in the present study seems comparable to that of the existing CSF and MRI biomarkers. Unfortunately, our findings are also similar in that neither visuomotor performance nor CSF markers are particularly successful in distinguishing between MCI patients and AD patients. Although the CSF marker classification achieved a fair AUC, sensitivity remained insufficient. In summary, visuomotor performance can compare to the existing biomarkers, although the classification of MCI and AD patients remains challenging. Assessment of visuomotor The abbreviation RBF stands for radial basis function kernel. The two models in bold are the best performing models. The result of the hyperparameter tuning for the SVM (linear) was a cost of 0.316. Cost represents the penalty given for misclassification: a higher cost allows less misclassification when drawing the decision boundary.  performance offers some practical advantages: it is simple and quick to perform, non-invasive, and relatively cheap when compared to, for example, MRI scans.
This study has several limitations. First, the predictive power of visuomotor performance, and thus its clinical relevance, depend on the tasks that are included in the eye-hand coordination protocol. Our current protocol focused on reflexive behaviors and inhibition and to a lesser extent on memory. The performance scores on the memory tasks were for a majority of the patients very low. Most of the patients failed to complete the memory tasks, indicating that subjects were too fatigued to properly perform these tasks at the end of the protocol, or that these tasks were not appropriate for this population. Still, memory tasks cannot be disregarded in general (Lagun et al., 2011). These tasks involve the recognition of a previously seen image, a function that relies on a temporalparietal memory network (Hirose et al., 2013). Interestingly, this network can also be related to the parietal lobe damage that patients sustain in early stages of AD (Jacobs et al., 2012). Because early stage neurodegeneration damages the parietal lobe, this network would likely also be disrupted (Hirose et al., 2013). Therefore, a recognition task could be an adequate addition to our protocol. Previous literature also suggests that a smooth pursuit task might contribute to classification, as studies reported that AD patients showed increased latencies when they initiated smooth pursuit as compared to controls. Additionally, while following the target, patients showed a lower gain and velocity of eye movements, and because patients tended to lag behind the stimulus, they made more catch-up saccades (Molitor et al., 2015).
Second, we performed no power calculations before the study was conducted, because it was an exploratory study. Instead, we assessed sample size post hoc by generating learning curves (Figure 2). The learning curves for the SVM on the controls and patients, controls and MCI patients, and controls and AD patients datasets indicate that classification would likely have benefitted from an increased sample size, as the curves do not plateau. An increase in sample size could also benefit model stability: the standard deviations of the performance metrics are large and could indicate noisy data, which could be remedied by increasing sample size. However, the learning curves do reach a point where performance only continues to increase minimally. This indicates that a larger sample size would only have a small effect, as the model has learned most of the important patterns in the data. Therefore, we advise that future studies increase the sample size, as that would likely improve classification performance. Furthermore, due to the relatively small sample size in this study, there was a discrepancy between the distribution of the FIGURE 7 | After Henriques et al. (2018). The classification performance of algorithms using CSF or MRI markers to classify HC, MCI patients, and AD patients, compared to classification performance using EHC parameters. control and the patient groups and the distribution of these groups in clinical practice. In our study, controls and AD patients made up an equal part of our sample, whereas in the clinic, most people would be patients: consequently, the group of controls would be relatively small. Even if groups of patients with other types of dementia would be included, 60-80% of the total patient group would still be an AD patient (Williams and First, 2013). This difference is important to keep in mind, because group distribution affects how a model classifies a subject. If almost all instances presented to a model are AD patients and not controls, the algorithm might learn that it can simply classify every subject as being part of the largest group, as it will still achieve a high accuracy. Such a strategy is useless for diagnosis and therefore, it would also be important for future studies to assess the classification models from this study in a more realistically distributed sample. Lastly, data on how subjects performed on conventional diagnostic methods and clinical information such as age of disease onset were unfortunately not available for the complete group of patients. This limits insights in for example how uniform the patient groups were and in which stage of the disease individual patients were. In future studies we recommend that this information is also collected.
As stated previously, early diagnosis is primarily hindered by the heterogeneity of AD, as some patients may show severe symptoms, but no abnormalities in biomarker measurements, or vice versa. As such, no single marker has been identified that can diagnose all cases of AD. It has recently been suggested that AD is not one disease, but a collection of subtypes (Au et al., 2015) that differ in their neuropathological and structural characteristics (Ferreira et al., 2017). The idea of subtypes with different underlying pathologies also gives rise to the hypothesis that an assay of biomarkers could be more useful for diagnosis than relying on a single biomarker, provided that each marker adds new information. For example, it was shown that MRI can complement the standard diagnostic procedure by distinguishing four types of AD using MRI scans, which could not be distinguished using routine cognitive evaluations or CSF analysis (Ferreira et al., 2017). Recent studies have also shown that combining existing biomarkers increases the predictive power beyond what the individual markers could achieve (Zhang et al., 2011;Henriques et al., 2018). Visuomotor performance could provide complementary information (i.e., about functional deficits in patients) in much the same way and when combined with the existing markers, it could further boost the predictive power for early stage AD. As such, testing of visuomotor network dysfunctions could potentially be added to the current diagnostic procedure.
Future research should aim to improve the classification performance of visuomotor network dysfunctions, by adding different types of tasks to develop an optimal protocol. Subsequently, it would be valuable to study whether visuomotor performance correlates with functional network degeneration as measured by neuroimaging techniques, as it is hypothesized to reflect network degeneration. If visuomotor performance can be related to neurodegeneration in this way, next steps should involve replicating the current results in different medical centers and in larger datasets, preferably with a realistic group distribution. Lastly, future research could investigate whether visuomotor functioning could also be a marker for other types of dementia. Neurodegeneration in other dementias also disrupts multiple pathways which are integral to good visuomotor functioning. For example, fronto-temporal dementias could be a valuable field of study, as the frontal lobe is also an important structure for visuomotor function (e.g., planning saccades).

CONCLUSION
Visuomotor network dysfunctions have potential in biomarker research and the proposed eye-hand tasks could add to a clear definition of the preclinical phenotype of AD.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethical Committee of the Erasmus MC Rotterdam. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
JuS selected, analyzed, and interpreted the data and was a major contributor to writing the manuscript. FM-R designed the study, supported patient recruitment, and interpreted the data. HD advised on the application of the machine learning methodology. JoS designed the eye-hand coordination tasks and interpreted the data. JP designed the study and interpreted the data and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.