Test-Retest-Reliability of Video-Oculography During Free Visual Exploration in Right-Hemispheric Stroke Patients With Neglect

Mean gaze position during free visual exploration (FVE) is a sensitive measure for detecting neglect in patients after a right-hemispheric stroke. Here we investigated the test-retest-reliability of mean gaze position during FVE in 23 patients with left-sided neglect after a first-ever sub-acute right-hemispheric stroke. We analyzed the reliability between different test sets administered within 11 days (test sets A and B, each including different images and their mirrored versions), and between repeated measures using the same test set administered three times within 2 days (test set C, including the same images and their mirrored versions). The intra-class correlation coefficient (ICC) showed good reliability between the two different test sets (test sets A and B; ICC = 0.819), and excellent reliability for the repeated measures with the same test set C (ICC = 0.964). FVE can therefore be recommended for the longitudinal assessment of patients’ neglect severity during neurorehabilitation as well as in treatment trials.


INTRODUCTION
Spatial neglect is characterized by the failure to attend or respond to the contralesional hemispace (Heilman et al., 1993). After stroke, neglect has been reported to occur in 43-80% of patients with a right-hemispheric lesion (Stone et al., 1991; Azouvi et al., 2002; Ringman et al., 2004). Recent studies have suggested that video-oculography may be an appropriate method to analyze visual neglect (visual exploration of naturalistic scenes, e.g., Delazer et al., 2018; Kaufmann et al., 2020a,b; visual exploration of abstract images, e.g., Mannan et al., 2005; visual exploration of faces, e.g., Van Belle et al., 2010a,b; Verfaillie et al., 2014). In particular, a free visual exploration (FVE) paradigm might be a fast and accurate screening tool to detect neglect (Delazer et al., 2018; Kaufmann et al., 2020a). Indeed, a relationship between mean horizontal gaze position and neglect in everyday behavior, as assessed by means of the Catherine Bergego Scale (CBS), has recently been shown (Kaufmann et al., 2020a). Furthermore, the mean gaze position on the horizontal axis has been shown to be more sensitive in detecting neglect than conventional neuropsychological paper-pencil tests, such as the Line Bisection, Bells Cancellation, or Random Shape Cancellation Test (Kaufmann et al., 2020a). For most of these neuropsychological paper-pencil tests, test-retest-reliability is well known in neglect patients. However, the test-retest-reliability of the mean gaze position during FVE in neglect patients remains unknown. This aspect is of high relevance if the mean gaze position during FVE is intended to be used in clinical practice or as an outcome measure in research. Therefore, the aim of the present study was to investigate the test-retest-reliability of the mean gaze position on the horizontal axis as an indicator of neglect, as assessed by video-oculography during FVE.
Test-retest-reliability was assessed between different test versions (i.e., including different images and their mirrored versions), as well as for the same test version applied over several measurement time points (i.e., including the same images and their mirrored versions) in patients with left-sided neglect after a first-ever subacute, right hemispheric stroke.

Patients
A total of 23 patients with left-sided neglect after a first-ever, right-hemispheric, subacute stroke were recruited in the Neurorehabilitation Center of the Luzerner Kantonsspital (mean age = 72.74, SD = 10.25; 5 female; mean time since stroke = 19.73 days, SD = 8.83).
All patients provided written informed consent to participate in the study. The study was approved by the local ethics committee.

Video-Oculography
Video-oculography was performed using an FVE paradigm, as previously described (Ptak et al., 2009; Cazzoli et al., 2011; Fellrath and Ptak, 2015; Paladini et al., 2019; Kaufmann et al., 2020a,b). In short, naturalistic images (e.g., colored photographs of everyday scenes such as the view of a mountain or of a public place; size 1200 × 900 pixels) and their mirrored versions (mirrored along the vertical axis) were presented on a computer screen. Each image was presented for 7 s and was preceded by a central, black fixation cross on a gray background (3 s), in order to enforce a common central starting point of visual exploration for all patients. All patients were instructed to freely explore the images, as if they were looking at pictures in a newspaper or a photo album. A 3 × 3-point grid was presented for calibration of the eye-tracking system and for its validation prior to the experiment. During video-oculography, patients were seated in front of the screen, and their heads were positioned on a chin-and-forehead rest, to ensure that their mid-sagittal plane was aligned with the middle of the screen at a constant distance of 68 cm (resulting in a viewing angle of 28° × 21°) and to minimize head movements. Eye movements were recorded using a remote, infrared-based, video-eye-tracking system (EyeLink 1000 Plus System, SR Research, Ottawa, ON, Canada). All fixations with a duration between 100 and 2000 ms were included in the offline data analyses (fixations excluded = 5.94%) (Salthouse and Ellis, 1980; Carpenter, 1988). The mean gaze position on the horizontal axis in degrees of visual angle (i.e., the mean x-position on the screen) was calculated using R. The mean gaze position, expressed in degrees of visual angle, allows neglect severity to be quantified. The mean gaze position can range from −14° (at the far left of the images) to +14° (at the far right of the images).
A mean gaze position of 0° thus indicates a spatially unbiased distribution of fixations, whereas positive values indicate a shift toward the right side of space, which is typical for right-hemispheric stroke patients with left-sided neglect (e.g., Paladini et al., 2019; Kaufmann et al., 2020a).
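The computation described above can be illustrated with a minimal sketch. It is not the authors' R code: it assumes a linear pixel-to-degree mapping across the 1200-pixel image width, an unweighted mean over fixations, inclusive duration limits, and a hypothetical input format of (x position in pixels, duration in ms) pairs. The function names are likewise illustrative.

```python
from statistics import mean

SCREEN_WIDTH_PX = 1200              # image width reported in the text
HORIZONTAL_EXTENT_DEG = 28          # horizontal viewing angle reported in the text
MIN_DUR_MS, MAX_DUR_MS = 100, 2000  # fixation duration limits reported in the text


def px_to_deg(x_px):
    """Map a horizontal pixel coordinate to degrees relative to the image center.

    Assumption: linear mapping, with 0 px at the left edge of the image.
    """
    deg_per_px = HORIZONTAL_EXTENT_DEG / SCREEN_WIDTH_PX
    return (x_px - SCREEN_WIDTH_PX / 2) * deg_per_px


def mean_gaze_position(fixations):
    """Unweighted mean horizontal fixation position in degrees of visual angle.

    `fixations` is a list of (x_px, duration_ms) tuples (hypothetical format).
    Fixations outside the duration limits are discarded (limits assumed inclusive).
    """
    kept = [px_to_deg(x) for x, dur in fixations if MIN_DUR_MS <= dur <= MAX_DUR_MS]
    if not kept:
        raise ValueError("no fixations within the accepted duration range")
    return mean(kept)


# Made-up fixations: the last two fall outside the duration limits and are dropped.
example = [(600, 250), (900, 300), (1200, 150), (0, 50), (300, 2500)]
print(mean_gaze_position(example))  # positive value = rightward shift
```

In this made-up example the retained fixations lie at the center and on the right half of the image, so the result is positive, which corresponds to the rightward bias expected in left-sided neglect.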
In a recent study, we also found a significant relationship between the mean gaze position and neglect severity in daily living as assessed by the CBS (Kaufmann et al., 2020a). To test whether this relationship holds in the present sample, Pearson's correlation between the mean gaze position (in degrees of visual angle) in the initial assessment of FVE and the CBS total score was computed (one-tailed).
To investigate the test-retest-reliability of video-oculography during FVE between different test versions and over several measurement points, different test sets were used.

Test Sets A and B
In a first study, test performance in 18 patients was compared between two different test sets, i.e., test set A and test set B, using a cross-over design (Figure 1A). Each test set included 24 images (12 images and their 12 mirrored versions). All patients viewed both test sets within 11 days, with the order of the sets randomized across patients. The mean time elapsed between the two testing sessions was 5.06 days (SD = 3.84 days).

Test Set C
In a second study, test-retest-reliability was assessed between three consecutive measurements, using test set C in 11 patients (six of whom had also participated in the first study). Test set C was a short version of FVE, including 12 images. To create it, test sets A and B were merged, and six images and their six mirrored versions were randomly selected to form test set C (Figure 1B). All three measurements were performed on 2 consecutive days in all patients. The mean time elapsed between M1 and M2 was thus 24 h, and between M2 and M3, 4 h (Figure 1B).

Reliability Analysis
The mean gaze position (in degrees of visual angle) was compared between test sets A and B using a paired t-test (two-tailed). The mean gaze position (in degrees of visual angle) in the three measurements using test set C was evaluated by means of a univariate ANOVA with repeated measures. For all statistical tests, the significance level of α = 5% was used.
The reliability of video-oculography was determined by computing the intra-class correlation coefficient (ICC) using SPSS (IBM SPSS Statistics, Version 25) based on a mean-rating, absolute-agreement, two-way mixed-effects model (Koo and Li, 2016). Several reliability analyses were conducted. For each analysis, each patient's agreement between measurements was plotted in a Bland-Altman plot including the 95% limits of agreement (Giavarina, 2015). Bland-Altman plots allow two measures of the same variable to be compared by plotting the mean of the two measures on the x-axis and the difference between the two measures on the y-axis (Giavarina, 2015; Kalra, 2017). The graphic interpretation of the plots may then be used to identify outliers and potential systematic over- or under-estimations in either of the two measures (Kalra, 2017).
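For readers who want to see what a mean-rating, absolute-agreement, two-way ICC involves, the following is a minimal sketch using the mean-squares formula commonly given for this model, ICC = (MSR − MSE) / (MSR + (MSC − MSE) / n). It is an illustration only and not the SPSS procedure used in the study; the function name and the example values are hypothetical.

```python
from statistics import mean


def icc_absolute_agreement_mean(data):
    """Mean-rating, absolute-agreement ICC from a two-way layout.

    `data` is a list of rows, one per patient, each holding the k
    measurements of that patient (hypothetical format, no missing values).
    """
    n = len(data)     # number of patients
    k = len(data[0])  # number of measurements per patient
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-patient mean square (MSR)
    ms_cols = ss_cols / (k - 1)                # between-measurement mean square (MSC)
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square (MSE)

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Made-up values: the second measurement is always 1 degree larger.
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_absolute_agreement_mean(shifted))
```

Because the model is one of absolute agreement, a constant offset between measurements lowers the coefficient even when patients keep exactly the same rank order, which is why the made-up example above yields a value below 1.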

Test-Retest-Reliability Between Two Test Sets Including Different Pictures (Test Set A and Test Set B)
The reliability of video-oculography between test set A and test set B was determined through the ICC. For each patient, the agreement between test set A and test set B was plotted in a Bland-Altman plot (Giavarina, 2015).
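The quantities shown in a Bland-Altman plot can be sketched as follows, assuming the conventional definition of the 95% limits of agreement as the mean difference ± 1.96 standard deviations of the differences. The function name and the example values are hypothetical and do not reproduce the study data.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return per-patient means, differences, bias, and 95% limits of agreement.

    `first` and `second` hold one value per patient for the two measures
    being compared (e.g., mean gaze position in two test sets).
    """
    diffs = [a - b for a, b in zip(first, second)]        # y-axis of the plot
    means = [(a + b) / 2 for a, b in zip(first, second)]  # x-axis of the plot
    bias = mean(diffs)                                    # systematic offset
    spread = 1.96 * stdev(diffs)                          # half-width of the limits
    return means, diffs, bias, (bias - spread, bias + spread)


# Made-up mean gaze positions (degrees) for four patients in two test sets.
set_a = [2.0, 1.0, 3.5, 0.5]
set_b = [1.5, 1.5, 3.0, 1.0]
means, diffs, bias, (lower, upper) = bland_altman(set_a, set_b)
print(bias, lower, upper)
```

A bias close to zero with differences scattered on both sides of the zero line corresponds to the absence of a consistent offset between the two measures.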

Test-Retest-Reliability Between Three Measurements Administered With the Same Test Set (C)
The reliability of video-oculography between three repeated measures of the same test set (C) was determined through the ICC. For each patient, the agreement between measurements was plotted for each combination (M1-M2, M1-M3, and M2-M3) separately using Bland-Altman plots (Giavarina, 2015).

RESULTS
The mean gaze position (in degrees of visual angle) in the initial assessment of FVE correlated significantly with neglect severity in daily living as assessed by the CBS (r = 0.362, moderate effect; p = 0.045, one-tailed; Table 1).

Test-Retest-Reliability Between Two Test Sets Including Different Pictures (Test Set A and Test Set B)
The mean gaze position did not differ between test sets A and B [mean gaze position in test set A = 2.016° (SD = 1.458°), mean gaze position in test set B = 2.134° (SD = 1.972°); t(17) = −0.370, p = 0.716]. Thus, in both sets, the spatial distribution of fixations is significantly shifted toward the right.
The intra-class correlation coefficient computed between the two test sets of FVE (test sets A and B) was 0.819, indicating good test-retest-reliability for FVE (Koo and Li, 2016; Table 2). The 95% confidence interval of the ICC ranged from 0.512 to 0.933 (Figure 2A). For each patient, the agreement between the two measurements was plotted in a Bland-Altman plot including the 95% limits of agreement (Giavarina, 2015). The graphic interpretation of the Bland-Altman plot confirms that all patients performed within the upper and lower limits of agreement. The individual values of the participants are distributed above and below the 0 line, which suggests that there is no consistent bias of one test set versus the other (Kalra, 2017).
The analysis of FVE between three measurements with the same test set C revealed an ICC of 0.964, indicating excellent test-retest-reliability (Koo and Li, 2016; Table 2). The 95% confidence interval of the ICC ranged from 0.903 to 0.990, so that even its lower bound falls within the range of excellent reliability. For each patient, the agreement between each pair of measurements (M1-M2, M1-M3, M2-M3) was plotted in a Bland-Altman plot including the 95% limits of agreement (Giavarina, 2015; Figures 2B-D). The graphic interpretation of the Bland-Altman plots confirms that all patients performed within the upper and lower limits of agreement. In all three plots, the individual values of the participants are distributed above and below the 0 line, which suggests that there is no consistent bias of one measurement versus the others (Kalra, 2017).

DISCUSSION
In the present study, we investigated the test-retest-reliability of video-oculography during FVE between different test sets (test set A, test set B) and between repeated measures using the same test set (test set C).
We found that the mean gaze position on the horizontal axis during FVE of naturalistic images (e.g., photographs of everyday scenes such as the view of a mountain or public places) and their mirrored versions shows good to excellent reliability and is stable concerning retesting.
The reliability between the two test sets (test set A, test set B), administered within 11 days, was good (ICC = 0.819). This suggests that the specific content of naturalistic photographs depicting everyday scenes is not crucial, provided that each picture is presented with its respective mirrored version.
Furthermore, comparing the ICC of our two analyses revealed that the reliability between test set A and test set B was slightly lower than the reliability for the repeated measures using test set C. This difference in ICC may have several causes. For example, since the mean time elapsed between measurements was 5 days, and all patients had sub-acute stroke, it is possible that neglect severity had already improved in some patients due to spontaneous neglect recovery or strategies learned in neurorehabilitation therapy (Bailey et al., 2004). Note that, for ethical reasons, all our patients received neurorehabilitative therapy between the assessments of test set A and test set B; this might have influenced neglect recovery. On the other hand, as the same test set (C) was administered three times within a relatively short time period (within 2 days), the patients' individual differences between test and retest measures might rather be related to variations in attentional level over time (Bailey et al., 2004).
Using video-oculography during FVE has several advantages. First, mean horizontal gaze position significantly correlates with neglect severity in daily living as assessed by the CBS (Kaufmann et al., 2020a), a finding that was replicated in the present study. Second, it has high sensitivity and specificity to diagnose neglect after a stroke, and it is even more sensitive than conventional neuropsychological cancellation tests (Kaufmann et al., 2020a). Third, FVE can be performed in less than 10 min and has the potential to be used as a fast and accurate screening tool that allows the initiation of comprehensive diagnostics and therapy from early on (Kaufmann et al., 2020a). Finally, visual exploration is spontaneous and requires only little effort from the patient.
Potential limitations of our study are the relatively small sample size and the lack of a healthy control group.
In conclusion, our results show good to excellent test-retest-reliability of FVE, with ICC values that are comparable to those of commonly used paper-pencil tests. FVE can therefore be recommended for the longitudinal assessment of a patient's neglect severity during neurorehabilitation as well as in treatment trials.

DATA AVAILABILITY STATEMENT
Individual participant data collected in this study will not be distributed openly to conform to the data privacy statements signed by our participants. However, specific aspects of the anonymized datasets and codes supporting the findings presented in this paper will be shared upon request to TNy, email: thomas.nyffeler@luks.ch.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee Nordwest and Zentralschweiz (EKNZ), Switzerland. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
BK, TNy, and DC contributed to the conception and design of the study and wrote the first draft of the manuscript. BK organized the database and performed the statistical analyses. TNe and RM wrote sections of the manuscript. DC, TNy, TNe, and RM critically revised the work for important intellectual content. All authors contributed to the manuscript revision, and read and approved the submitted version.

FUNDING
This work was supported by the SNF Grant No. 320030_169789.