Hyperparameter tuning using Lévy flight and interactive crossover-based reptile search algorithm for eye movement event classification

Introduction: Eye movement is one of the cues used in human–machine interface technologies for predicting the intention of users. The developing application in eye movement event detection is the creation of assistive technologies for paralyzed patients. However, developing an effective classifier is one of the main issues in eye movement event detection. Methods: In this paper, bidirectional long short-term memory (BILSTM) is proposed along with hyperparameter tuning for achieving effective eye movement event classification. The Lévy flight and interactive crossover-based reptile search algorithm (LICRSA) is used for optimizing the hyperparameters of BILSTM. The issues related to overfitting are avoided by using fuzzy data augmentation (FDA), and a deep neural network, namely, VGG-19, is used for extracting features from eye movements. Therefore, the optimization of hyperparameters using LICRSA enhances the classification of eye movement events using BILSTM. Results and Discussion: The proposed BILSTM–LICRSA is evaluated by using accuracy, precision, sensitivity, F1-score, area under the receiver operating characteristic (AUROC) curve measure, and area under the precision–recall curve (AUPRC) measure for four datasets, namely, Lund2013, collected dataset, GazeBaseR, and UTMultiView. The gazeNet, human manual classification (HMC), and multi-source information-embedded approach (MSIEA) are used for comparison with the BILSTM–LICRSA. The F1-score of BILSTM–LICRSA for the GazeBaseR dataset is 98.99%, which is higher than that of the MSIEA.


Introduction
The human eye is considered a spontaneous way of understanding human communication and interaction, which is exploited for processing data according to the nearby environment, in response to the respective situation.The physiological capacities are highly constrained from creating movement in any of the limbs or the head, as a result of various diseases such as Parkinson's, spinal cord injury, locked-in syndrome, muscular dystrophy, multiple sclerosis, complete paralysis, and arthritis.Hence, around 132 million disabled people require a wheelchair, and only 22% of them have access to one.Moreover, these disabled people cannot use a technically improved wheelchair.Therefore, an eye detection and tracking method is investigated for enhancing the interaction between humans and computers, and it will enhance the living standard of disabled people (Dahmani et al., 2020;Barz, and Sonntag, 2021;Koochaki, and Najafizadeh, 2021;Aunsri, and Rattarom, 2022).Brain activity triggers eye movements that are a response to visual stimuli or an intent to obtain information about the neighboring environment (Harezlak, and Kasprowski, 2020;Li et al., 2021;Vortmann, and Putze, 2021).Generally, eye movements are categorized into saccades and fixations, i.e., when the eye gaze moves from one position to another position and pauses at a certain position, respectively (Harezlak et al., 2019;Rahman et al., 2021;Yoo et al., 2021).
Eye tracking is the process of tracking and determining the movements of the eye and the focal point of the eye.The technology of eye tracking is used in various fields such as cognitive science, computer gaming, marketing, medicine, and psychology.Hence, eye tracking is extensively used in computer science applications by making use of the features of the eye for studying information processing tasks.In general, eye-tracking information is computed and acquired by using an eye-tracking sensor/camera.The acquired data offer many features and are useful in various classification tasks (Lim et al., 2022;Holmqvist et al., 2023).Eye tracking metrics are used for disclosing perceptions about the participant's actions and mindset in different circumstances.Significant eye-tracking metrics are saccades, duration, pointing, fixation, and pupil diameter (Bitkina et al., 2021;Elmadjian et al., 2023).Eye movement classification is affected because eye-tracking data contain a huge amount of user data that are not required for all applications.For example, eye movement discovers characteristics such as bio-markers, identity, and gender (David-John et al., 2021).
In this research, eye movement event detection is performed using a deep learning classifier with hyperparameter tuning.Generally, hyperparameter tuning is used to choose the parameter values and obtain improved classification (Shankar et al., 2020).The major contributions of this research are given as follows: • A BILSTM is used for classifying eye movement events to help disabled people.The BILSTM is used in this research because it considers both the past and upcoming data while classifying the given input.The remaining paper is organized as follows: Section 2 provides the details about existing eye-tracking applications.The proposed method is detailed in Section 4, whereas the outcomes of the proposed method are presented in Section 4. Finally, Section 5 offers the conclusion.
2 Related works Li et al. (2020) implemented a machine learning-based automated approach to perform fatigue detection and classification for equipment operators.Toeplitz inverse covariance-based clustering (TICC) was used to obtain various mental fatigue levels and labeling based on the movement of the eye.Features of eye movement were acquired for various construction sites, and supervised learning classified the mental fatigue levels of the operator.TICC along with machine learning were used in various construction sites due to higher accuracy.An additional enhancement in accuracy was achieved only by using a large number of eye movement metrics related to mental fatigue.Yang et al. (2023) developed an analysis for the attention patterns in depressed patients based on the region-of-interest (ROI) analysis.The established ROI recognition analysis was named ROI eventless clustering (REC) and did not need eye movement event discovery.For diverse attribute features, ROI clustering was operated with deflection elimination (RCDE) for supporting the discovery of depression.This RCDE also used noisy data for describing the attention patterns.Moreover, it was essential to use eye movement event because gaze features were vital while performing classification.Mao et al., 2020 implemented disease classification according to the movements of the eye by using a deep neural network.Normalized pupil data such as size and location were offered as features of the eye movement.For each feature, long short-term memory (LSTM) was used for developing a weak classifier.The weights of each weak classifier were discovered using a self-learning method.Next, a strong classifier was designed by synthesizing the weak classifiers.The classifier with fewer samples was less robust while performing the classification.Zemblys et al., 2019 implemented event detection using gazeNet without any requirement for hand-crafted signal features or signal thresholding.End-to-end deep learning was used in gazeNet, which categorized raw eye-tracking data into fixations, post-saccadic oscillations, and saccades.The problems created by unbalanced inputs were overcome by using heuristic data augmentation.However, the effect of previously classified information was required to be eliminated in gazeNet for enhancing event detection.Friedman et al., 2023 developed the classification of eye movement using human manual classification (HMC).In manual classification, higher inter-rater reliability was used because it was an important representation of an expressive standardized categorization.For evaluating the results acquired from automatic classification, inter-rater reliability was used, alongside the training of machine learning approaches to achieving better classification.The HMC was effective only when operated with less input data during eye movement detection.Yuan et al., 2021 presented the multi-source informationembedded approach (MSIEA) to investigate driving actions.A precise eye gaze was estimated by using the identification of eye gaze without gaze calibration.The information of multiscale sparse features of eye and head poses was combined for predicting the direction of gaze.Next, fused data were obtained by integrating the estimated gaze with vehicle data.The FastICA was used for discovering a large amount of drivingrelated data which were used for understanding the driving actions.The driver's head orientations affected the performances of the MSIEA.Kanade et al., 2021 developed gaze classification using convolutional neural networks (CNNs) for vehicular environments.From the input, the images of the face, right eye, and left eye were acquired using region-of-interest.Appropriate gaze features were obtained by fine-tuning the VGG-face network with pre-trained CNNs.Here, the classification was performed using the distance factor.The learning methodologies used in eye-tracking were utilized for enhancing the performance of eye feature evaluation.

BILSTM-LICRSA method
The classification of eye movement event detection is performed using a BILSTM deep learning classifier, whereas the LICRSA is used to optimize the hyperparameters.The important processes of this proposed method are dataset acquisition, data augmentation, feature extraction using VGG-19, BILSTM classification, and LICRSAbased hyperparameter tuning process.The block diagram for the BILSTM-LICRSA method is shown in Figure 1.

Lund2013
An annotated eye-tracker dataset, Lund2013 dataset, created at Humanities Lab (Larsson et al., 2013) was used to perform eye movement event detection.Monocular eye movement data of a person's viewing images, videos, and moving dots are included in the Lund2013 dataset.The Lund2013 dataset has different classes such as fixations, saccades, smooth pursuit, post-saccadic oscillations, blinks, and undefined events.The data of fixations, saccades, and post-saccadic oscillations from the Lund2013 dataset are used in this research, which are 136,078 samples in total.

Collected dataset
Participants were made to sit directly facing the webcam at a fixed distance for collecting real-time face images.Videos are obtained by the webcam while the user follows a pre-defined on-screen target, i.e., a dot that moves in different locations.An onscreen target's trajectory is recorded as the eye movement trajectory (EMT) in the system, and related angle images are saved using the real-time webcam.A sample real-time face image is shown in Figure 2.Both the images and EMT are combined as a collective dataset and used for real-time analysis.The total instances gathered are 10,000 from 100 test users, and it has the labels of fixations, saccades, and post-saccadic oscillations.

GazeBaseR
The GazeBaseR dataset has temporal motion features of gaze points and spatial distribution features of saccades.The gaze point's temporal motion features are a sequence of timestamps, and it is related to the gaze points, e.g., [0.1 s, (30, 20)], [0.2 s, (32, 22)], and [0.3 s, (31, 21)], which denotes the gaze point at various times.On the other hand, spatial distribution is a saccade vector series, e.g., (5, 3), (4, 2), and (3, 1), which denotes the distance and direction of eye movements.Two classes exist in this dataset during the prediction: movement or no movement.

UTMultiView dataset
The UTMultiView has eye image and 3D head pose features, where the eye image is a grayscale and low-resolution image that has the iris, pupil, and some portion of the sclera.Next, the 3D head pose is a three-element vector (such as 30, 45, and 60) denoting a person's head orientation by means of roll, pitch, and yaw.The prediction provides the three-element vector, e.g., 10, 20, and −10, denoting the evaluated direction of gaze.
The real-time images from the collected dataset and UTMultiView dataset are processed under FDA and VGG-19 to augment and extract features from the images.The text data are directly fed to the classifier along with the respective extracted features from VGG-19 for classification.

FDA-based data augmentation
FDA (Dabare et al., 2022) augments the collected data (i.e., images from the dataset), which is considered a preprocessing approach for avoiding overfitting issues in a classifier.There are two different phases that exist in the augmentation: fuzzification and generation of new augmented data.First, fuzzification is performed according to the clustering and identifying the membership grade of every record.A new input characteristic value is created according to the adequate cluster center value discovery and using the threshold value, which is an α-cut value.The parameters given as input to perform augmentation are specified in Table 1.

Fuzzification of an entire input space
For the input data, the attributes are clustered using the fuzzy C-means (FCM) clustering method.The FCM method is selected because of the identification capacity for each cluster's membership grade of every piece of information.The fuzzification process is described as follows: 1. FCM clusters the input data and deduces the membership degree of each attribute of the cluster.The usage of FCM clustering for unknown data offers a measure of belongingness, which is represented as the membership grade for every cluster.
If unknown information is clustered, it belongs to various clusters.In FCM clustering, data have membership degrees either as 1 or 0 for the cluster center.The membership degree is 1, when the data belong to a respective cluster; otherwise, the membership degree is 0. Therefore, the values of membership degree are defined by the cluster.
Therefore, y i x i1 x i2 x i3 . . .x ik , where the record of the dataset is represented as i.FCM clustering is performed over the input, and a new matrix E i is formed by the membership grade for each cluster as shown in Eq. 3.
where, μ represents the fuzzy membership grade for the dataset's first record and μ i2 represents the second cluster.Eq 3 shows the transformation of input X into the element E, which is in descending order, as shown in Eq. 4.

New augmented dataset generation
An allocation of cluster center α-cut for the fuzzified data is used to form the augmented data, wherein the steps are explained as follows: 1. Fuzzy uncertainty and fuzzy clustering are used to transform the input for identifying the appropriate cluster center of each data of cluster.2. The fixed α-levels between 0 and 1 are horizontally cut by the membership function according to Figure 3, which illustrates a  Noise type Gaussian noise: e.g., standard deviation (σ) = 0.01 to 0.05-low-to-moderate noise; salt-andpepper noise: e.g., noise density = 0.001 to 0.01-low-to-moderate noise sample of triangular membership function at the α-cut level of 0.25.The membership function is horizontally cut between 0 and 1 with a restricted amount of α-levels.The uncertainty is high when there is a huge support for the membership function.The fuzzy set has all components in a membership function of α[0, 1], and the above is referred as the membership function's α-cut.The finest augmented data are created by the optimum α-cut for generalization purposes without using an overall fuzzified data point.This FDA uses the trial-and-error approach to identify the optimum α-cut because an inadequate α-cut affects the balance of achieving the important features of gaze movements.According to the deliberation of how well the approach is generalized with the selected α-cut value, an optimum α-cut is selected for FDA.The group of elements that belong to the fuzzy set A and to the degree α is denoted as α-cut and is shown in Eq. 5, where the membership grade's belongingness is denoted as μ A .

Parameters Description and value
3. Data with a smaller membership degree than the selected α-cut are avoided from the new cluster center dataset.
Equation 6 is the identified threshold value for E α , which represents the optimal α-cut value.The formulated threshold value is used for knowing the amount of data filtered from the augmented dataset.The threshold value, i.e., α-cut in E α , is expressed in Eq. 6.
where U denotes the universe of discourse, μ E is the membership grade between [0, 1], and value between [0, 1] is denoted as α.If the fuzzy α-cut is used in E, the number of rows in E is changed according to the fuzzy α-cut.Moreover, a cluster center dataset is formed by using the centers of each cluster.For instance, Eq. 7 shows the cluster center dataset for n records of original data and three clusters.

CL
Hence, the cluster center of each element is transformed as the CL cluster center dataset.The cluster center is considered a symbol of fuzzy values of data.Subsequently, the identified cluster centers are included to the input dataset to generate the augmented dataset, as expressed in Eq. 8.

Aug data X, CL}. ( 8 )
The augmented data from the FDA is concatenated with the input data results, in 305,744 samples, which is 55.49% is higher than the given input.The sample for the augmented real-time image used along with the input is shown in Figure 4.

Feature extraction using VGG-19
In feature extraction, the VGG-19 (Mateen et al., 2019) is used to obtain important features from the augmented input (Aug data), i.e., augmented images.Generally, VGG-19 is a deep neural network with multilayered operation.Due to its simple architecture, VGG-19 is suitable, and the 3 × 3 convolutional layers are positioned at the top to increase the depth level.In VGG-19, the max-pooling layers are used as a handler to minimize volume size, and the two fully connected layers are used with 4,096 neurons.Here, feature extraction is accomplished by convolutional layers, and the dimensionality of features is reduced by max-pooling layers related to the convolutional layers.In the first convolutional layer, 64 kernels are used to accomplish the feature extraction, followed by a feature vector generated by fully connected layers.For each sample, VGG-19 returns 4,096 features during feature extraction.The extracted feature vectors are given as input for BILSTM classification.

Classification using BILSTM
The features from VGG-19 are given along with the text features from the respective dataset as input to the BILSTM (Ali et al., 2021) for classifying eye movements.In general, LSTM is an extended version of a recurrent neural network with a similar kind of architecture.The RNN and LSTM transfer the data from one stage to another stage.The LSTM classification offers higher success in longstanding dependencies.However, a single LSTM unit is restricted to classifying the output according to the previous data.Hence, the single LSTM has the possibility of providing misclassification without considering the forward data.Accordingly, the BILSTM approach is developed, which incorporates both the past and upcoming data, therefore enhancing the classification.There are two LSTM models operated in a parallel manner, as shown in Figure 5.In that, one LSTM operates from the input data's start, and the other operates from the input data's end.Consequently, the BILSTM classification supports both the previous and upcoming data.For instance, the first and second LSTM of BILSTM studies the data starting from left to right and ending from right to left.This helps the BILSTM model completely keep the information of eye movement data for classification.

Performance analysis
Eye movement event detection is an important objective of this proposed method, which is specifically designed for disabled people.For performing eye movement event detection and classification, data from the Lund2013 and collected datasets acquired in real-time are used for analysis.However, event detection using eye movement is not implemented by many types of research; hence, two more datasets: GazeBaseR dataset (https://figshare.com/articles/dataset/GazeBase_Data_Repository/12912257.) and UTMultiView (Sugano et al., 2014) are considered for further analyzing the proposed method.A total of 322 subjects are included in the GazeBaseR dataset, where everyone has completed two recording sessions.Moreover, UTMultiView has 24,320 samples, which include head pose and gaze directions.The training and testing ratio of 70:30 is considered for evaluating the BILSTM-LICRSA.In this section, the proposed method is evaluated with different classifiers and optimization algorithms for the hyperparameter tuning process.

BILSTM-LICRSA evaluation with different classifiers
This section shows the performance of BILSTM with different classifiers such as GAN, RNN, and LSTM.The confusion matrix (CM) of the GAN, RNN, LSTM, and BILSTM for the Lund2013 dataset is shown in Figure 6.The numbers 0, 1, and 2 represent the classes of fixations, saccades, and post-saccadic oscillations, respectively.This CM is used to determine how well the developed model performs an effective classification.From the analysis, it is concluded that BILSTM offers a better performance than the GAN, RNN, and LSTM.
Here, the LICRSA-based hyperparameter tuning is incorporated into the classifiers.The analysis of BILSTM with different classifiers is shown in Tables 2-5 for Lund2013, collected, GazeBaseR, and UTMultiView datasets, respectively.From these analyses, it is found that the proposed BILSTM method provides better performance than the GAN, RNN, and LSTM.For example, the accuracy of BILSTM for the GazeBaseR dataset is 98.95%, whereas the GAN obtains 95.93%, RNN obtains 96.80%, and LSTM obtains 97.40%.The reasons for BILSTM having superior performances are stated as follows: 1) the combined information of both the past and upcoming data is used to avoid misclassification while classifying the data and 2) the hyperparameter tuning process developed for BILSTM further enhances the classification.
The performance evaluation for the BILSTM-LICRSA according to the augmentation is shown in Table 6.This shows that BILSTM-LICRSA with augmented data from the FDA improves the classification than the actual input.The BILSTM-LICRSA with FDA achieves enhanced performance compared to the classifier with actual input by avoiding the overfitting issue.
The bootstrapping average precision (AP) and rank for different classifiers is shown in Table 7.According to the statistics, an average precision is investigated from a single test set sample, that is, the point     evaluation.The point evaluation is varied with the usage of various test sets for investigation, which falls into confidence intervals with a definite probability.These confidence intervals are utilized for evaluating the difference among the algorithms.This analysis represents that BILSTM has the first grade among other classifiers.

BILSTM-LICRSA evaluation with different optimizations
This section shows the performance of the LICRSA with different optimization algorithms such as PSO, GWO, and RSA.The evaluation of BILSTM-LICRSA with different optimization algorithms for Lund2013, collected, GazeBaseR, and UTMultiView datasets is given in Tables 8-11, respectively.This analysis shows that the LICRSA achieves improved classification than PSO, GWO, and RSA.The LICRSA with BILSTM for the GazeBaseR dataset achieves an accuracy of 98.95%, whereas the PSO obtains 93.56%, GWO obtains 96.24%, and RSA obtains 97.89%.In LICRSA, the Lévy flight and interactive crossover are used for searching the local solutions and enhancing the ability to search, which is further used to achieve the optimal set of hyperparameters.This helps improve the classification of eye movement.

Discussion
This section offers a detailed discussion related to the outcomes of BILSTM-LICRSA developed for eye movement event classification.Initially, the results of BILSTM were investigated with different state-of-the-art classifiers such as the GAN, RNN, and LSTM.Next, different optimization algorithms such as PSO, GWO, and RSA were used to investigate the efficiency of optimal hyperparameters discovered from the LICRSA.The developed BILSTM-LICRSA method is analyzed with four datasets: Lund2013, collected dataset, GazeBaseR, and UTMultiView.The evaluation of results represents that the BILSTM-LICRSA achieves a better performance than the GAN, RNN, LSTM, PSO, GWO, and RSA.Moreover, the BILSTM-LICRSA has better performance than the existing gazeNet (Zemblys et al., 2019), HMC (Friedman et al., 2023), and MSIEA (Yuan et al., 2021).BILSTM presents robust classification by integrating both the past and upcoming data during the recognition.Moreover, the optimum hyperparameters obtained from the LICRSA additionally help improve the classification.

Conclusion
In this paper, an effective eye movement event classification is achieved by using BILSTM with a hyperparameter tuning process.The LICRSA-based hyperparameter tuning is done according to the accuracy for improving the classification process.The Lévy flight and interactive crossover are used for searching the local solutions and improving the searching ability to achieve the optimal set of hyperparameters.On the other hand, the utilization of past and upcoming data in BILSTM further enhances the classification.The issue related to overfitting is avoided by using FDA-based augmentation.Therefore, the combination of BILSTM and the LICRSA achieves better classification of eye movements.The outcomes of the BILSTM-LICRSA show that it outperforms the gazeNet, HMC, and MSIEA.The F1-score of BILSTM-LICRSA for the GazeBaseR dataset is 98.99%, which is superior to that of the MSIEA.In future, different ways of feature aggregation can be studied to enhance the performance of the proposed eye movement event classification.

FIGURE 3
FIGURE 3Sample of the triangular membership function for α-cut.
2. The membership grade is considered in descending order for each attribute during the formation of the new membership dataset.For each record in the dataset, it is considered that the input contains n records and k attributes.Eqs 1, 2 show the input (X) and output (Y) spaces, respectively.

TABLE 2
BILSTM-LICRSA evaluation with different classifiers for the Lund2013 dataset.

TABLE 3
BILSTM-LICRSA evaluation with different classifiers for the collected dataset.

TABLE 4
BILSTM-LICRSA evaluation with different classifiers for the GazeBaseR dataset.

TABLE 5
BILSTM-LICRSA evaluation with different classifiers for the UTMultiView dataset.

TABLE 7
Analysis of bootstrapping AP and rank.

TABLE 9
BILSTM-LICRSA evaluation with different optimization algorithms for the collected dataset.
TABLE 10 BILSTM-LICRSA evaluation with different optimization algorithms for the GazeBaseR dataset.