Unleashing the potential of fNIRS with machine learning: classification of fine anatomical movements to empower future brain-computer interface

In this study, we explore the potential of using functional near-infrared spectroscopy (fNIRS) signals in conjunction with modern machine-learning techniques to classify specific anatomical movements, with the aim of increasing the number of control commands available to fNIRS-based brain-computer interface (BCI) applications. The study focuses on individual finger tapping. Finger tapping is a well-known task in fNIRS and fMRI studies, but previous work has been limited to left/right or few-finger discrimination. Twenty-four right-handed participants performed the individual finger-tapping task. Data were recorded using sixteen sources and detectors placed over the motor cortex according to the 10-10 international system. The event-averaged oxygenated (ΔHbO) and deoxygenated (ΔHbR) hemoglobin data were used as features to assess the performance of diverse machine learning (ML) models in a challenging multi-class classification setting. These methods include LDA, QDA, MNLR, XGBoost, and RF. A new deep learning (DL) model named “Hemo-Net” is proposed, which consists of multiple parallel convolution layers with different filters to extract features. This paper aims to explore the efficacy of using fNIRS along with ML/DL methods in a multi-class classification task. Complex models such as RF, XGBoost, and Hemo-Net produce relatively higher test-set accuracy than LDA, MNLR, and QDA. Hemo-Net showed superior performance, achieving the highest test-set accuracy of 76%. However, in this work we do not aim to improve model accuracy; rather, we are interested in whether fNIRS carries the neural signatures that allow modern ML/DL methods to perform multi-class classification, which can lead to applications such as brain-computer interfaces. Multi-class classification of fine anatomical movements, such as individual finger movements, is difficult with fNIRS data. Traditional ML models such as MNLR and LDA show inferior performance compared to the ensemble-based methods RF and XGBoost. The DL-based method Hemo-Net outperforms all methods evaluated in this study and demonstrates a promising future for fNIRS-based BCI applications.


Introduction
Functional near-infrared spectroscopy (fNIRS) is a non-invasive neuroimaging technique that uses near-infrared light to measure changes in oxygenated (HbO) and deoxygenated (HbR) hemoglobin in the brain (Ferrari and Quaresima, 2012; Wilcox and Biondi, 2015). Neural activity in a brain region is associated with blood flow changes due to neurovascular responses; fNIRS measures brain activation by using near-infrared light (optical window of wavelength: 650-1100 nm) (Strangman et al., 2003; Sato et al., 2004). The change in optical density is then converted to hemoglobin concentration changes using the Modified Beer-Lambert Law (MBLL) (Wilcox and Biondi, 2015). fNIRS is becoming popular in brain-computer interface (BCI) research because it is non-invasive, portable, and relatively low-cost compared to other neuroimaging techniques such as functional magnetic resonance imaging (fMRI) (Naseer and Hong, 2015; Khan et al., 2021a). In the early 2000s, it was anticipated that new developments in fMRI equipment, preprocessing algorithms, and robust statistical approaches would make fMRI suitable for BCI applications (Sitaram et al., 2007). However, because of fMRI's limited temporal resolution, restricted movement, constraints due to the strong magnetic field, and calibration issues, fNIRS is becoming more popular than fMRI for BCI applications. The fNIRS measurements of concentration changes yield a signal similar to the blood oxygen level-dependent (BOLD) response acquired through fMRI, with additional information about HbR (Strangman et al., 2002; Gagnon et al., 2012). Additionally, fNIRS can measure changes in brain activity at a relatively high temporal resolution, which is important for real-time BCI applications.
The main challenges in fNIRS-based BCI include reducing the response time, increasing the number of control commands, and improving the classification accuracy of the system (Hong and Khan, 2017). To achieve these goals and exploit the advantages of different brain imaging modalities for BCI applications, a new sub-field called hybrid BCI (hBCI) has emerged within BCI. In hybrid BCI, at least two brain signal modalities are combined with one another (Pfurtscheller et al., 2010; Hong et al., 2018). Different modalities, such as EEG-fMRI and EEG-fNIRS, have been merged to enhance the BCI system. Considerable technical and analytical developments in EEG-fMRI have made valuable scientific contributions to the field of BCI, but its use remains limited by a series of technical challenges, such as interference of the fMRI magnetic field with EEG signals, lack of portability, and safety concerns due to the strong magnetic field (Warbrick, 2022). On the other hand, fNIRS technology can easily be combined with other modalities, such as electroencephalography (EEG) and electromyography (EMG), to provide a better picture of brain activity. There is evidence that combined EEG-fNIRS BCI systems perform better than EEG-only or fNIRS-only BCI systems, although further research is still needed to understand the physiological interactions of EEG and fNIRS in detail. Hybrid EEG-fNIRS setups are fraught with problems arising from the differences in measurement methods and from the fundamentally different nature of these biosignals in two domains (temporal and spatial) that must be analyzed simultaneously (Ahn and Jun, 2017; Khan et al., 2021a). The focus of our study is to explore, for fNIRS alone, the potential of the information in fNIRS signals to increase the number of control commands. Our future work will explore the individual finger-tapping task with a hybrid EEG-fNIRS study.
In this study, we investigate the possibility of using fNIRS signals from individual channels with advanced machine-learning techniques to classify specific anatomical movements, instead of relying on hybrid modalities, to increase the number of control commands for BCI applications. To achieve this, we focus on a well-known task in fNIRS and fMRI studies called finger tapping. To make the task more complex, we focus on fine anatomical movements, such as individual finger movements, which to the authors' knowledge have never been explored with fNIRS. The apparent reason is that fNIRS has a much lower spatial resolution than fMRI. It is worth noting that each body part has a distinct area in the primary motor cortex dedicated to it. However, because of its lower spatial resolution compared with fMRI, accurately classifying individual finger movements using fNIRS alone presents challenges. On the other hand, fNIRS is more feasible to use in a naturalistic setting, where the participant can move around, than in an fMRI environment. As a single modality, fNIRS also has the advantage of supplying additional information about HbR, which could possibly be used to estimate the metabolic rate (Boas et al., 2003). We therefore hypothesize that the motor cortex signals contain valuable information that can be leveraged for enhancing control commands through modern machine learning algorithms.
Until 2017, deep learning methods had not shown any significant improvement over the state-of-the-art techniques used for bio-signal classification in BCI (Lotte et al., 2018). However, recent research shows their future potential due to their ability to simultaneously learn useful features and classifiers from raw data. Based on this potential, this research hypothesizes that fNIRS signal features can be used to distinguish even fine movements, such as individual finger tapping. This will be useful for various future applications in the field of fNIRS-based BCI. Increases in classification accuracy and in the number of control commands are significant in BCI because they directly impact the usability and effectiveness of the system. The classification accuracy of a BCI system refers to its ability to correctly identify and interpret the user's intent from their brain signals (Lotte et al., 2007). The finger-tapping task is a well-understood and relatively simple motor task with specific brain activation patterns that has been used in BCI experiments (Middendorf et al., 2000). But even for such tasks, detecting delicate anatomical structures, such as recognizing individual finger movements in BCI applications, is a complex and ongoing area of research. So far, continuous decoding and control of individual finger movements in BCI have been achieved using EMG signals, but this cannot be achieved in the case of muscle paralysis. On the other hand, an invasive brain signal modality, electrocorticography (ECoG)-based BCI, shows promising results in distinguishing between individual finger movements due to its good spatio-spectral features, such as discriminating between ipsilateral and contralateral movements as well as between thumb and index finger movements (Zanos et al., 2008). In another study using ECoG arrays, the intention for individual finger movements was classified using LDA with an overall accuracy of 67%, with the aim of enabling people with tetraplegia to use the BCI system (Jorge et al., 2020). However, these invasive methods are not feasible for BCI applications.
To develop a deeper understanding of the motor control system during the individual finger-tapping exercise and to classify the specific finger movements, we explore state-of-the-art data analytic tools. Machine learning (ML) is characterized by learning hidden patterns from data. In supervised ML, the labels for the target variable are known in the data, and a function or model is learned to map the input space to the output space. The targets are continuous in the case of regression and discrete in the case of classification. Similarly, the input variables can be pixels of an image, a portion of a time series, coefficients of statistical models, statistical summaries of data, Fourier and wavelet transformations, etc. Many traditional statistical techniques and machine learning algorithms, such as linear regression, logistic regression, and k-nearest neighbors, rely on pre-processed features for learning the classification and regression functions. In a recent study using a modern machine learning approach, improved and fast decoding of all five fingers along with the resting state (six classes) was achieved using ECoG, with a classification accuracy of 77% (Yao et al., 2022). However, ECoG is limited by its invasive nature. Therefore, inspired by the ECoG results, researchers have attempted to decode individual finger movements non-invasively using EEG. Liao et al. (2014) decoded individual finger movements from one hand with an average accuracy of 77.11% using binary classification, with a support vector machine (SVM) classifying between pairs of fingers based on spectral changes in EEG data. In a recent study using a high-density EEG electrode setup, the average pairwise finger classification accuracy (with SVM as the classifier) was 64.8%. Most other EEG studies demonstrate decoding of multiple finger movements, rather than single or contralateral finger movements, to enhance control command generation for the BCI system (Gannouni et al., 2020). All these findings motivate the investigation of individual finger movements with fNIRS, which can be utilized in various fNIRS-based BCI applications such as prosthetic arm development and rehabilitation.
fMRI has a comparatively high spatial resolution relative to fNIRS and has demonstrated reasonable accuracies for the classification of related tasks. In a study using real-time whole-brain imaging, right and left index finger movements were classified with an accuracy of 80% (LaConte et al., 2007). In another study, individual finger movements of the right hand were decoded from single-trial fMRI data using multivariate pattern classification analysis (Shen et al., 2014). An average accuracy of 63% (best trial 84.7%) was achieved in classifying between five fingers. Another fMRI study, using presses of four fingers (index to little), demonstrated that hyperalignment provides better between-subject classification accuracy (88.8%) than conventional anatomical alignment (46%) (Kilmarx et al., 2021). Because of the relatively lower spatial resolution of fNIRS, how well fine anatomical movements can be decoded from hemodynamic responses has yet to be investigated empirically. In this research, we hypothesize that the fNIRS signal, with its rich information, could be used with modern classification algorithms to distinguish delicate anatomical structures and thereby help enhance control commands for BCI systems, for example, for the control of prosthetic hands and finger movements. Current fNIRS-based BCI research is limited to tapping one or more fingers, single or both hand-tapping, right and left finger-tapping, or hand-tapping. Significantly higher classification accuracy has been achieved for two-class (98.7 ± 1%, left vs. right finger tapping) and three-class (98.7 ± 6.9%, left vs. right finger tapping vs. rest condition) problems using vector-based phase analysis (Nazeer et al., 2020). Using statistical features along with HbO and HbR data is also a common approach to enhance classification accuracy, even for motor imagery (MI) tasks (Shin and Jeong, 2014). In a single-trial MI study, a finger-tapping task was used to discriminate between thumb tapping and a complex sequential tapping task with an accuracy of 81% (Holper and Wolf, 2011). In our pilot study on the collected dataset, we classified the individual finger movements against the baseline with classical machine learning algorithms such as SVM, random forests (RF), decision trees (DT), AdaBoost, quadratic discriminant analysis (QDA), artificial neural networks (ANN), and k-nearest neighbors (kNN). The average classification accuracies achieved were 75 ± 4%, 75 ± 5%, and 77 ± 6% using kNN, RF, and XGBoost, respectively (Khan et al., 2021b). The current study classifies between finger movements, which is more difficult than classifying finger movements against baseline or resting states. Comparing fNIRS with fMRI data, the misclassification rate is comparable for hand or finger movement tasks. However, as noted, fNIRS is a more promising modality because of its portability, relatively low cost, and amenability to more naturalistic settings.
In this paper, we explore machine learning and deep learning (DL) models to classify fine anatomical movements (individual finger tapping) using fNIRS data. There are two primary difficulties: first, the classification of such minute movements using fNIRS, and second, the fact that the classification is a multi-class problem (classifying five fingers and the resting state). DL is inspired by information processing in the human brain, and DL models can learn complex, non-linear functions. They can potentially learn complex non-linear interactions between brain regions as well as associations between the fNIRS signal and the motor movement. Moreover, feature extraction is usually not required for DL models, as the features are extracted hierarchically in the hidden layers: the initial layers learn low-level features, and the deeper layers learn more complex high-level features corresponding to the task and loss function (Zeiler and Fergus, 2014). These models therefore learn the best representations from the data in the hidden layers. The study will be a foundation for classifying fine anatomical movements using fNIRS-based BCI. The paper is structured as follows: Section 2 describes the methodology, Section 3 presents the results and discussion, Section 4 covers the limitations of the current work, and Section 5 concludes.

Materials and methods
In this section, we describe the methods used for data collection, experimental design, and data analysis.

Participant recruitment and training
The experiment involved 24 healthy right-handed participants, 18 males (M = 30.44 ± 3.03 years; range: 24 to 34 years) and six females (F = 29.17 ± 3.06 years; range: 24 to 34 years). To meet the inclusion criteria, participants were required to write with their right hand and to have no neurological disorders or limitations in hand or finger motor abilities. All participants performed the finger tapping with the fingers of the right hand only. The experiment was performed in a relatively controlled environment: a quiet room was used to reduce attentional bias, and a black shower cap was used to reduce background noise from lamps and computer screens. A visual presentation in textual format, indicating rest or task (the name of the finger to tap), was displayed on the computer monitor. Experiments were preceded by practice sessions in which participants were informed of the protocol and procedures. Finger tapping was performed at a medium-to-fast pace without any specific frequency. Experiments were repeated for each participant based on their comfort and convenience.

Instrumentation, experimental paradigm, and montage
A continuous-wave (λ1 = 760 nm, λ2 = 850 nm) optical tomography machine, NIRScout (NIRx Medizintechnik GmbH, Germany), was used to acquire brain data at a sampling rate of 3.9063 Hz, as shown in Figure 1A. The block design consists of blocks of rest and task (thumb, index, middle, ring, and little finger tapping) of the right hand, as shown in Figure 2. A baseline rest of 30 sec and a tapping duration of 10 sec were used to achieve a robust hemodynamic response to finger-tapping activity (Khan et al., 2020). A single experimental paradigm consisted of three sessions of each finger-tapping trial. The total length of the experiment was 350 sec. A single trial included 10 sec of rest followed by 10 sec of the task. Before placing the fNIRS cap on the participant's head, cranial landmarks (inion and nasion) were marked to locate Cz. The emitters and detectors were placed according to the 10-10 international electrode positioning layout. The distance between source and detector was kept at a minimum of 3 cm using optode holders. Sixteen emitters and detectors were positioned over the motor cortex according to the standard motor 16x16 montage shown in Figure 1B. The source-detector pairs are assumed to cover part of the frontal lobe, the frontal-central sulcus lobe, the central sulcus lobe, part of the central-parietal lobe, and the temporal-parietal lobe.

Frontiers in Human Neuroscience frontiersin.org
Signal preprocessing
fNIRS data were processed using the pipeline shown in Figure 3. The data processing included data truncation (removal of data points before the first and after the last stimulus), spike removal (using the interpolation method), and channel rejection based on a coefficient of variation (CV) criterion of 7.5%. The Modified Beer-Lambert Law (MBLL) was used to convert optical densities into hemoglobin concentration changes. Contamination from physiological and non-physiological noise was removed using a band-pass Butterworth filter (high-pass cutoff: 0.01 Hz, low-pass cutoff: 0.5 Hz). Data were then cleaned of motion noise by applying temporal-derivative distribution repair (Fishburn et al., 2019). Signal corrections such as z-normalization and baseline-zero adjustment (value = 10) were performed.
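
The channel-rejection and filtering steps of this pipeline can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the filter order, the zero-phase filtering, the application of the CV criterion to a raw intensity trace, and the synthetic signal are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 3.9063  # sampling rate in Hz, as reported for the recording


def coefficient_of_variation(raw_intensity):
    """CV in percent of a raw intensity trace (std / mean * 100)."""
    raw_intensity = np.asarray(raw_intensity, dtype=float)
    return 100.0 * raw_intensity.std() / raw_intensity.mean()


def bandpass(signal, low_hz=0.01, high_hz=0.5, fs=FS, order=3):
    """Zero-phase Butterworth band-pass; the order is an assumption."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Synthetic channel: constant offset plus a slow 0.1 Hz oscillation.
t = np.arange(2000) / FS
raw = 5.0 + 0.2 * np.sin(2 * np.pi * 0.1 * t)

keep_channel = coefficient_of_variation(raw) <= 7.5  # 7.5% threshold from the text
filtered = bandpass(raw)  # the constant offset lies outside the pass band
```

Second-order sections are used here because a very low cutoff relative to the sampling rate can make the transfer-function form numerically fragile.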

Classification

Event-related averages
After the signal processing steps described in Section 2.3, the filtered event averages from each run corresponding to each finger-tapping task were calculated. The data were segmented into epochs from 5 sec before stimulus onset to 15 sec after onset (including the 10 sec task duration and 5 sec post-task) and averaged, ensuring no conflict with the onset of the next stimulus presentation, which may occur as early as 20 sec after the previous stimulus onset. The channel-specific event averages for HbO, HbR, and HbT are used as features for the machine learning models presented in the following sections.
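
The epoching and averaging described above can be sketched as follows. The onset positions and the synthetic channel are hypothetical, and truncating the window lengths to whole samples is an assumption; with the reported sampling rate it yields 19 + 58 = 77 samples per epoch.

```python
import numpy as np

FS = 3.9063            # sampling rate in Hz (from the text)
PRE_S, POST_S = 5, 15  # epoch window: 5 s before to 15 s after onset


def event_average(signal, onset_samples, fs=FS, pre_s=PRE_S, post_s=POST_S):
    """Average fixed-length epochs cut around each stimulus onset."""
    pre = int(pre_s * fs)    # samples before onset
    post = int(post_s * fs)  # samples after onset
    epochs = [signal[o - pre:o + post] for o in onset_samples
              if o - pre >= 0 and o + post <= len(signal)]
    return np.mean(epochs, axis=0)


rng = np.random.default_rng(0)
channel = rng.normal(size=1400)  # one synthetic channel
onsets = [200, 600, 1000]        # hypothetical onsets of one finger's trials
avg = event_average(channel, onsets)
```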

Classifier
Time series classification (TSC) deals with assigning time series to specific classes. Given a dataset $D = \{(X_i, Y_i)\}$, the goal in TSC is to learn a mapping function $f: X \to Y$; this mapping function is called a classifier. Machine learning was used to learn the classification function that predicts the class probabilities given the feature vector $X_i$. In the case of multivariate time series, a single training example is $X_i \in \mathbb{R}^{m \times t}$, where m is the total number of time series and t is the length of each time series in the data. The finger-tapping problem can be modeled as a multi-class classification problem; in the present case, $X_i$ is a 1 × 77-dimensional vector representing the oxygenation level at discrete time steps of an individual channel. Mathematically, the classification problem can be written as

$$Y = f(X; w) \tag{1}$$

where Y is the target vector containing the class labels for the five finger-tapping tasks, X is the feature vector, f(X) is the classification function learned from the data, w are the model parameters that need to be learned to build this classifier, $x_i$ is the event average from one of the channels, and $y_i$ is the label for the i-th training example. The classification function can be learned using parametric models such as a deep learning (DL) model, parameterized by the weights w of the neural network, or non-parametric models such as kNN. The length of the time series (event-related average) considered for each channel is 77 time points, corresponding to 20 sec of event-average data for each tapping task, as mentioned in Section 2.4.1. We are interested in the class conditional probability of a specific finger tapping given the time series data of one of the channels.

Data preparation
Deep learning (DL), also known as representation learning, enables end-to-end learning. Traditional ML models require feature extraction and engineering, which are generally not required for DL models, as the features are learned in the hidden layers. DL models learn by forward propagation and back-propagation of errors, in which gradients are calculated to update the model's weights until a convergent solution that minimizes the loss function is obtained. Scaling of the input or feature vector is required for DL models to achieve faster convergence with gradient-based methods and to keep input features on comparable scales. For this reason, we scaled the input feature vector using standard scaling.
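
Standard scaling subtracts the per-feature mean and divides by the per-feature standard deviation. A minimal sketch on synthetic data is shown below; the helper names are hypothetical, and fitting the statistics on the training split only is an assumption about good practice, not a detail stated in the text.

```python
import numpy as np


def fit_standard_scaler(X_train):
    """Return per-feature mean and standard deviation of the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def apply_standard_scaler(X, mean, std):
    """Scale features to zero mean and unit variance using given statistics."""
    return (X - mean) / std


rng = np.random.default_rng(0)
X_train = rng.normal(loc=3.0, scale=2.0, size=(100, 77))
X_test = rng.normal(loc=3.0, scale=2.0, size=(20, 77))

mean, std = fit_standard_scaler(X_train)
X_train_scaled = apply_standard_scaler(X_train, mean, std)
X_test_scaled = apply_standard_scaler(X_test, mean, std)  # reuse training statistics
```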

Model evaluation
The classifier's evaluation metrics are mainly accuracy, precision, and recall. In a classification problem, accuracy is defined as the ratio of correctly classified samples to the total number of samples in the data. Let $Y_{p,i}$ and $Y_{t,i}$ be the predicted label and true label for the i-th sample, and let N be the total number of examples. Accuracy is then defined mathematically as

$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} I(Y_{p,i} = Y_{t,i}) \tag{2}$$

where I is an indicator function. Precision is defined as the ratio of the true positives to the total number of samples predicted as positive by the classifier. For a class k, let $T_p$ be the total number of samples belonging to class k that the model has correctly predicted, $T_n$ be the total number of samples correctly predicted as belonging to other classes, $F_p$ be the total number of samples wrongly predicted as belonging to class k, and $F_n$ be the total number of samples of class k wrongly predicted as belonging to other classes. Precision is then defined as

$$\text{Precision} = \frac{T_p}{T_p + F_p} \tag{3}$$

Recall, also known as sensitivity, is defined as the ratio of the true positives to the sum of true positives and false negatives:

$$\text{Recall} = \frac{T_p}{T_p + F_n} \tag{4}$$
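
As a small worked example of these definitions, the sketch below computes accuracy and the one-vs-rest precision and recall of a single class on hypothetical labels. Returning 0.0 when a denominator is empty is an assumed convention.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def precision_recall(y_true, y_pred, k):
    """One-vs-rest precision and recall for class k."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == k) & (y_true == k))  # true positives
    fp = np.sum((y_pred == k) & (y_true != k))  # false positives
    fn = np.sum((y_pred != k) & (y_true == k))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


y_true = [0, 0, 1, 1, 2, 2]  # hypothetical true labels
y_pred = [0, 1, 1, 1, 2, 0]  # hypothetical predictions
acc = accuracy(y_true, y_pred)                         # 4 of 6 correct
prec_1, rec_1 = precision_recall(y_true, y_pred, k=1)  # 2/3 and 1.0
```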

Machine learning classifiers
This section gives the details of the ML-based multi-class classifiers built for the classification of individual finger tapping. Our analysis compares the performance of Random Forest (RF), XGBoost, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Multinomial Logistic Regression (MNLR).

Linear discriminant analysis
Linear Discriminant Analysis (LDA) is a supervised machine learning algorithm for classification (Fisher, 1936; Johnson et al., 2002). In LDA, classification is viewed as a problem of dimension reduction. It finds the linear boundary that separates the classes by learning discriminant functions that project the data into a lower-dimensional space. LDA finds the best projections by maximizing the separation between classes (the means of the projected vectors) while minimizing the variance within each class. Let $X = \{x_1, x_2, ..., x_N\}$ be the data, where $x_i \in \mathbb{R}^d$, and let $Y = \{y_1, y_2, ..., y_N\}$ be the corresponding class labels for K classes. Let $\mu_k$ be the class-specific mean and $\mu$ be the overall mean of the data. The within-class scatter matrix is given by

$$\hat{S}_W = \sum_{k=1}^{K} \sum_{i:\, y_i = k} (x_i - \mu_k)(x_i - \mu_k)^{\top} \tag{5}$$

where $\hat{S}_W$ represents the variance within each class. The between-class variance or scatter matrix, which has to be maximized, is given by

$$\hat{S}_B = \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^{\top} \tag{6}$$

where $\hat{S}_B$ is the between-class scatter matrix, $n_k$ is the number of examples in class k, and $\mu$ is the overall mean vector. The projection matrix W can be obtained by solving the generalized eigenvalue problem

$$\hat{S}_B v_i = \lambda_i \hat{S}_W v_i \tag{7}$$

given in Equation 7, by selecting the M largest eigenvalues $\lambda_i$ and the corresponding eigenvectors $v_i$.
We applied 5-fold cross-validation and values of solver [svd, lsqr, eigen] and shrinkage [auto, none] hyper-parameters were tuned using grid search.
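
A grid search of this kind can be sketched with scikit-learn as shown below. The synthetic data are an assumption standing in for the event averages. The grid is split into two parts because scikit-learn's `LinearDiscriminantAnalysis` supports `shrinkage` only with the `lsqr` and `eigen` solvers; how the original search handled this combination is not stated in the text.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the event averages: 5 classes, 20 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 40)
X = rng.normal(size=(200, 20)) + y[:, None] * 0.5

# Shrinkage is only valid with the lsqr and eigen solvers, so the grid
# is split into two parts instead of one full Cartesian product.
param_grid = [
    {"solver": ["svd"]},
    {"solver": ["lsqr", "eigen"], "shrinkage": ["auto", None]},
]
search = GridSearchCV(LinearDiscriminantAnalysis(), param_grid, cv=5,
                      scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_  # best hyper-parameter combination
best_score = search.best_score_    # mean 5-fold cross-validated accuracy
```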

Quadratic discriminant analysis
Quadratic Discriminant Analysis (QDA) is a supervised learning classification algorithm and an extension of LDA that relaxes the assumption of equal covariance across all classes (Hastie et al., 2009). For each class, QDA computes the class conditional densities separately and makes predictions using Bayes' rule. In QDA, the class conditional densities are modeled using multivariate Gaussian distributions. Let $\pi_k$ be the prior probability of a training sample $X_i$ belonging to class k. These prior probabilities can be estimated from the data as the proportion of samples belonging to class k. Let $\mu_k$ represent the class-specific mean and $\Sigma_k$ the class-specific covariance matrix. The posterior probability of a sample $X_i$ belonging to class k can be computed using Bayes' rule as follows:

$$P(Y = k \mid X_i) = \frac{\pi_k f_k(X_i)}{\sum_{l=1}^{K} \pi_l f_l(X_i)} \tag{8}$$

where $f_k$ is the multivariate Gaussian density function corresponding to class k, defined as

$$f_k(X_i) = \frac{1}{(2\pi)^{z/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(X_i - \mu_k)^{\top} \Sigma_k^{-1} (X_i - \mu_k)\right) \tag{9}$$

where z indicates the dimension of the vector $X_i$, with $X_i, \mu_k \in \mathbb{R}^z$. 5-fold cross-validation was used during training, and the hyper-parameter reg_param [0.0, 0.1, 0.5, 1.0], which controls the regularization of the covariance estimates, was tuned using grid search.
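
The posterior computation can be sketched in NumPy as follows. This is an illustrative implementation on synthetic two-class data, not the library code used in the study. Shrinking each class covariance toward the identity is an assumed form of regularization, and the function names are hypothetical.

```python
import numpy as np


def fit_qda(X, y, reg_param=0.0):
    """Estimate the prior, mean and (regularized) covariance of each class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        cov = np.cov(Xk, rowvar=False)
        # Assumed regularization: shrink the covariance toward the identity.
        cov = (1.0 - reg_param) * cov + reg_param * np.eye(X.shape[1])
        params[k] = (len(Xk) / len(X), Xk.mean(axis=0), cov)
    return params


def qda_posterior(X, params):
    """Posterior P(class k | x) from Bayes' rule with Gaussian densities."""
    classes = sorted(params)
    z = X.shape[1]
    log_joint = []
    for k in classes:
        prior, mu, cov = params[k]
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        log_joint.append(np.log(prior)
                         - 0.5 * (z * np.log(2 * np.pi) + logdet + maha))
    log_joint = np.column_stack(log_joint)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.0, (100, 3)), rng.normal(4, 2.0, (100, 3))])
y = np.repeat([0, 1], 100)
posterior = qda_posterior(X, fit_qda(X, y, reg_param=0.1))
train_accuracy = np.mean(posterior.argmax(axis=1) == y)
```

Working with log densities avoids underflow when the feature dimension is large, as it would be for 77-point event averages.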

XGBoost
Extreme Gradient Boosting (XGBoost) is an ensemble learning model widely used for supervised machine learning, i.e., classification and regression problems (Chen and Guestrin, 2016). The goal of XGBoost is to minimize the cross-entropy loss, which can be stated as follows:

$$L = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{it,k} \log(y_{ip,k}) \tag{10}$$

where $y_{ip}$ is the predicted class probability or the predicted label for instance i. The vector $y_{it}$ of true class labels is one-hot encoded with a dimension equal to K, the number of classes. Decision trees (DT) are used as base learners in the XGBoost algorithm, and the splits are performed based on the reduction in the loss. 5-fold cross-validation was used during training, and the hyper-parameters, including the number of estimators [50, 100], maximum depth [None, 5, 10], and learning rate [0.1, 0.01, 0.001], were tuned using grid search.
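
The multi-class cross-entropy between one-hot labels and predicted probabilities can be illustrated with a few lines of NumPy. The labels and probabilities below are hypothetical, the loss is summed rather than averaged (an assumption), and the clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import numpy as np


def cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    """Summed multi-class cross-entropy of probabilities against one-hot labels."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.sum(y_true_onehot * np.log(y_prob)))


labels = np.array([0, 2])
y_true = np.eye(3)[labels]  # one-hot encoding with K = 3 classes
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
loss = cross_entropy(y_true, y_prob)  # equals -(log 0.7 + log 0.6)
```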

Random forest
Random Forest (RF) is a widely used supervised ensemble learning method that can be used for classification and regression problems (Breiman, 2001). As the name suggests, RF comprises multiple classifiers, which are decision trees (DT) unless otherwise stated. RF builds different classifiers and uses bagging and feature randomness to reduce the prediction variance and make the model more robust and stable. RF also provides feature importances, which can be used for feature selection in downstream tasks. RF reduces the correlation between its individual DTs by incorporating a random split of the features; however, because it includes multiple DTs, RF is complex and not easily interpretable. 5-fold cross-validation was used during training, and the hyper-parameters, including the number of estimators [50, 100], maximum depth [None, 5, 10], minimum samples required to split a node [2, 5], and minimum samples required at a leaf node [1, 4], were tuned using grid search.

Multi-nomial logistic regression
Logistic Regression (LR) is a commonly used statistical model for binary classification (McCullagh, 2019). The extension of LR to the case of multiple classes is called Multinomial Logistic Regression (MNLR). Let X represent the features for the independent variables, and let Y be the true labels, which cover more than two distinct classes. In the present case, there are five distinct labels in Y, and $X_i \in \mathbb{R}^{1 \times 77}$. For each of the five classes, we learn a separate set of weights $W_k$. The discriminant function is defined in Equation 11, and the softmax function in Equation 12 is used to calculate the probabilities for each class:

$$\delta_k(X_i) = X_i W_k \tag{11}$$

$$P(Y_i = k \mid X_i) = \frac{\exp(\delta_k(X_i))}{\sum_{j=1}^{5} \exp(\delta_j(X_i))} \tag{12}$$

Equation 12 gives the probability that sample i with features $X_i$ belongs to a specific class. 5-fold cross-validation was used, and the regularization hyper-parameters (L1 and L2 penalties) were tuned using grid search. The values considered for the regularization strength are [0.01, 0.1, 1, 10, 20]. The SAGA solver was used to determine the optimal estimates of the model parameters.
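
The softmax step that turns linear discriminant scores into class probabilities can be sketched as follows. The weights and inputs are random placeholders rather than fitted parameters, and the function name is hypothetical.

```python
import numpy as np


def softmax_probabilities(X, W):
    """Class probabilities from linear discriminants X @ W via the softmax."""
    scores = X @ W                                        # (n_samples, K)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 77))        # four event averages of 77 time points
W = rng.normal(size=(77, 5)) * 0.1  # hypothetical weights, one column per class
probs = softmax_probabilities(X, W)
predicted_class = probs.argmax(axis=1)
```

Subtracting the row maximum before exponentiating leaves the probabilities unchanged but prevents overflow for large scores.
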
The sklearn package in Python was utilized to implement all machine learning models (Pedregosa et al., 2011).

Deep learning classifiers

Baseline model
A baseline model with 5 fully connected dense hidden layers was developed. A batch normalization layer and a ReLU activation follow each linear layer. The layers have 1,024, 512, 128, 64, and 5 neurons, respectively.

LSTM
Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber (1997), solves the vanishing gradient problem of Recurrent Neural Networks (RNN). LSTMs find their application in sequence modeling tasks (Sundermeyer et al., 2012; Cao et al., 2019; Yu et al., 2019; Zhang et al., 2020). These sequence modeling applications include DNA analysis, speech recognition, time series prediction, etc. Using a gating mechanism, LSTM can capture the long-term dependencies in the data by maintaining cell states. LSTM uses three gates, called the output, forget, and update gates, to learn the temporal dependence in the data. Let $x_t \in \mathbb{R}^d$ be the multivariate time series input, $a_t$ be the activation, and $\gamma_t$ be the cell state at instance t. Correspondingly, $W_\gamma$, $W_f$, $W_o$, $W_u$ and $b_\gamma$, $b_f$, $b_o$, $b_u$ are the learnable parameters (weights and biases) associated with the cell state, forget, output, and update gates, and $\theta_f$, $\theta_u$, $\theta_o$ represent the outputs of the forget, update, and output gates. The non-linear activation functions tanh and $\sigma$ are used in LSTM. The computations for the forward pass through LSTM can be stated as follows:

$$\gamma'_t = \tanh(W_\gamma \cdot [a_{t-1}, x_t] + b_\gamma) \tag{13}$$

$$\theta_u = \sigma(W_u \cdot [a_{t-1}, x_t] + b_u) \tag{14}$$

$$\theta_f = \sigma(W_f \cdot [a_{t-1}, x_t] + b_f) \tag{15}$$

$$\theta_o = \sigma(W_o \cdot [a_{t-1}, x_t] + b_o) \tag{16}$$

$$\gamma_t = \theta_u * \gamma'_t + \theta_f * \gamma_{t-1} \tag{17}$$

$$a_t = \theta_o * \tanh(\gamma_t) \tag{18}$$

The candidate new cell state is given by $\gamma'_t$, $(\cdot)$ is the dot product, and $(*)$ is the Hadamard (element-wise) product. The cell state at instance t in Equation 17 is expressed as a linear combination of the candidate cell state at instance t (Equation 13) and the cell state at instance t − 1. These states are scaled by the outputs of the update gate $\theta_u$ and the forget gate $\theta_f$. In a multi-class classification problem, LSTM units are followed by a dense layer with a softmax activation, which computes the class conditional probabilities to minimize the cross-entropy loss. We used a 5-layer network for multi-class classification. The first 2 layers are LSTM layers with 50 units, followed by 2 dense layers with 50 and 20 units, each with ReLU activation. Each of these layers is regularized using L2 regularization and followed by a batch normalization layer. The output layer has 5 units with a softmax activation.
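
One forward step of an LSTM cell with update, forget, and output gates can be sketched in NumPy as follows. This illustrates the gating computations only; it is not the trained network from the study. The parameter names (`W_c`, `b_c`, and so on), the random initialization, and the single-feature input are all assumptions.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_step(x_t, a_prev, c_prev, p):
    """One forward step; p holds weights W_* and biases b_* per gate."""
    concat = np.concatenate([a_prev, x_t])
    c_cand = np.tanh(p["W_c"] @ concat + p["b_c"])  # candidate cell state
    gate_u = sigmoid(p["W_u"] @ concat + p["b_u"])  # update gate
    gate_f = sigmoid(p["W_f"] @ concat + p["b_f"])  # forget gate
    gate_o = sigmoid(p["W_o"] @ concat + p["b_o"])  # output gate
    c_t = gate_u * c_cand + gate_f * c_prev         # element-wise products
    a_t = gate_o * np.tanh(c_t)                     # new activation
    return a_t, c_t


rng = np.random.default_rng(0)
n_in, n_units = 1, 50  # one value per time step, 50 units as in the text
params = {}
for g in "cufo":
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_units, n_units + n_in))
    params[f"b_{g}"] = np.zeros(n_units)

a, c = np.zeros(n_units), np.zeros(n_units)
series = rng.normal(size=(77, n_in))  # one synthetic 77-point event average
for x_t in series:
    a, c = lstm_step(x_t, a, c, params)
```

The final activation `a` would then be passed to the dense layers and the softmax output in a classification setting.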

Bi-directional LSTM
Standard LSTM/RNN blocks propagate past information to future time steps. However, in many sequence modeling tasks, such as named entity recognition, inferring information about the current time step also requires information from future time steps; in a time series context, these models depend on future information as well. Given a time series sequence, we can use a bi-directional LSTM, in which the recurrence is calculated from hidden states and gating mechanisms running over both the past and the future (Huang et al., 2015). We constructed a bi-directional LSTM for multi-class classification of the fNIRS finger-tapping event-average time-series data. The deep learning model consists of 5 layers, 3 of which are bi-directional LSTM layers with 50, 50, and 20 units, each followed by a batch normalization layer. The stacked bi-directional LSTM layers are followed by a dense layer with 20 units and ReLU activation. Both the stacked bi-directional LSTM layers and the dense layer are regularized using L2 regularization. The output layer consists of 5 units with softmax activation.
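The idea of combining past and future context can be illustrated with a minimal sketch. It is not the model used in the study: a toy scalar tanh recurrence stands in for the LSTM cell, and the weights `w_x` and `w_h` are hypothetical fixed values rather than learned parameters.

```python
import math


def run_recurrence(xs, w_x=0.5, w_h=0.8):
    """Toy recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    Stand-in for an LSTM cell; the weights are hypothetical constants.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional(xs):
    """Pair each time step's forward state with its backward state."""
    forward = run_recurrence(xs)
    # Run the same recurrence over the reversed sequence, then re-align.
    backward = run_recurrence(xs[::-1])[::-1]
    return list(zip(forward, backward))


# The only non-zero sample is at the last time step.
features = bidirectional([0.0, 0.0, 1.0])
for t, (f, b) in enumerate(features):
    print(t, round(f, 4), round(b, 4))
```

At the first time step the forward state is still zero, while the backward state already reflects the later sample, which is the extra information a bi-directional layer makes available.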

LSTM-CNN hybrid
LSTM-CNN hybrid architectures have shown promising results in time series classification tasks (Liu et al., 2019a,b; Garcia et al., 2020; Xie et al., 2020). We use a model with 2 separate heads that process the time series data using 1D convolution and bidirectional LSTM layers. The CNN head consists of 3 stacked 1D convolution layers with 64, 64, and 32 filters, followed by a dense layer with 32 units. The convolution layers have kernel sizes of 3, 5, and 3, with same padding, L2 kernel regularization, and Leaky ReLU activation. All these layers are followed by batch normalization layers. The LSTM head consists of 3 stacked LSTM layers with 64, 64, and 32 units, followed by a dense layer with 32 units and Leaky ReLU activation. All layers are regularized using L2 regularization and followed by batch normalization layers. The outputs of the LSTM and CNN heads are concatenated and passed to a dense block, which has 2 layers with 32 units and Leaky ReLU activation, followed by batch normalization layers. The output of the dense block is passed to the output layer with 5 units and softmax activation.

Hemo-Net
We propose a modified model for multiclass classification of fNIRS data in tasks such as finger tapping, inspired by Ismail Fawaz et al. (2020). The model is named Hemo-Net, after both the hemodynamic response and neural networks. It is based on the InceptionTime model and, as shown in Figure 4, is composed of three inception blocks followed by average pooling and a classifier layer. Our model uses the same inception blocks described in Ismail Fawaz et al. (2020); the details of the inception block are also given in Figure 4. The inception block comprises bottleneck, max pooling, convolution, and batch normalization layers, followed by a ReLU activation. The main advantage of the inception block is that it convolves the same input with kernels of varying sizes. In this work, we use kernels of dimensions 10 × 10, 20 × 20, and 40 × 40, chosen based on experimentation. The inception blocks are followed by an average pooling layer, which helps reduce the dimensionality of the data. The pooling layer is connected to a fully connected classification layer with softmax activation. A residual connection is used in the model to avoid the vanishing gradient problem.
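The core idea of the inception block, convolving the same input with several kernel sizes in parallel and stacking the results, can be sketched as follows. This is a simplified illustration under stated assumptions, not the Hemo-Net implementation: the input is a single channel, the kernel sizes 10, 20, and 40 are interpreted as lengths along the time axis, the moving-average weights are hypothetical placeholders for learned filters, and the bottleneck, max pooling branch, and batch normalization are omitted.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def multi_kernel_branches(signal, kernel_sizes=(10, 20, 40)):
    """Filter the same input once per kernel size and stack the outputs as channels."""
    channels = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # hypothetical weights; real filters are learned
        channels.append(relu(conv1d_same(signal, kernel)))
    return channels


signal = [float(t % 7) for t in range(60)]  # toy input, not fNIRS data
out = multi_kernel_branches(signal)
print(len(out), [len(c) for c in out])
```

Because every branch uses same padding, the outputs share the input length and can be concatenated along the channel axis, so short and long temporal patterns are represented side by side.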

Results and discussion
This study explores whether fNIRS signals carry robust information about subtle motor movements that classical ML and DL techniques can exploit. We evaluated the performance of various machine learning and deep learning algorithms on the multiclass classification task. We considered 5 machine learning models, including MNLR, QDA, LDA, RF, and XGBoost.
For Hemo-Net, recall is balanced across all five classes at around 75%. The F1 score, which is the harmonic mean of precision and recall and is generally used to evaluate multi-class classifiers, is 76%, which means that all of the classes are predicted with good precision and recall. The classification report of the model is given in Table 1.
We compared the performance of the different machine learning and deep learning classifiers. The performance of all the machine learning and deep learning models on the training and test data is listed in Tables 2 and 3, respectively. RF, XGBoost, and Hemo-Net showed overfitting, while LDA, QDA, and MNLR did not. This behavior can be attributed to the complexity of the models: RF, XGBoost, and Hemo-Net are comparatively complex algorithms with more learnable parameters than LDA, QDA, and MNLR. The simple models (LDA, QDA, and MNLR) did not perform well on either the training or the test data; among them, QDA achieved the better test set accuracy, with a score of 51.3%. Among the overfitting models, the DL-based Hemo-Net achieved the highest test set accuracy (76%).
The results demonstrate the potentially rich information in the fNIRS signal for representing and classifying fine anatomical movements, such as decoding individual finger tapping. The classifier performs markedly better than the baseline models. This model can serve as a baseline for training more complex models on larger data sets, because DL models tend to perform better when trained on large data sets, as observed previously. Our model can also be used for transfer learning on similar tasks such as foot tapping. With pre-trained weights, the accuracy of our model could be further improved, and the model could be used to classify various related fNIRS-based BCI tasks. The classification accuracy achieved with our Hemo-Net model on the five-class problem is also much improved compared to our previous work (Khan et al., 2021b) on binary classification of the same data with classical ML models, where the accuracy was approximately 77% for XGBoost. The multiclass classification problem is more complex for machine learning models to learn than the binary classification task. Brain imaging data sets are generally based on design-of-experiment strategies and are therefore limited in size. By conducting more experiments and collecting more data for the finger-tapping task, the accuracy of the model can be improved.
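The per-class precision, recall, and F1 score discussed in this section can be derived from a confusion matrix, as in the minimal sketch below. The 3-class matrix is a made-up toy example chosen for easy arithmetic; it is not the confusion matrix of this study, and the function name is hypothetical.

```python
def per_class_report(confusion):
    """Precision, recall, and F1 per class from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(confusion)
    report = []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return report


# Toy confusion matrix (illustrative values only).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
report = per_class_report(toy)
macro_f1 = sum(row["f1"] for row in report) / len(report)
print(report, macro_f1)
```

When precision and recall are similar for every class, as in this toy case, the per-class F1 scores are close to both, and the macro average summarizes the classifier without being dominated by any one class.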
One possible direction with limited data sets can be to explore deep generative models for data augmentation and then retrain the classifier with the augmented data.
Different sources of randomness might be contributing to the approximately 25% classification error. There is variation in the tapping pattern, and in the corresponding brain activation, from person to person. There is also variation across trials or replications for the same person. Future models can include these random effects to make the classifiers more robust. We also note that event averaging might be degrading the rich information inherent in the raw signals; therefore, in future work, we can experiment with models that extract features directly from raw or minimally processed data. This can also help in real-time applications: with minimal pre-processing of the data and full use of data-driven DL techniques, the lags in the BCI can be reduced.
Our primary focus in this study is to investigate the classification of fine anatomical movements, specifically individual finger-tapping tasks, using fNIRS. By exploring the potential of fNIRS in brain-computer interface (BCI) applications, we aim to enhance our understanding of the hemodynamic responses associated with such movements. In the future, our research will expand to include the localization and temporal response analysis of brain activity using hybrid EEG-fNIRS techniques. If finger movements can be accurately measured using fNIRS and the signals classified correctly, this could have several advantages from a biomedical perspective. These include improved prosthetic control, enhanced rehabilitation assessment and monitoring, better understanding and treatment of neurological disorders, advancements in human-computer interaction, development of brain-machine interfaces, and the potential for cognitive assessment. This technology has the potential to significantly benefit individuals with motor impairments and neurological conditions, providing them with greater control, functionality, and improved quality of life. Our ultimate goal is to develop effective and efficient BCI systems that serve this purpose.

Limitations and future work
In this work, we analyzed the performance of different ML and DL algorithms on multi-class classification of finger tapping from event-average data. We observed that the DL-based architecture Hemo-Net performs better but suffers from overfitting, which can be attributed to data scarcity and the model's complexity. We propose to extend our work by collecting more data for these complex models while using transfer learning to initialize the models from the weights learned on this data set. Event averages are currently used for the fNIRS channels; however, event averaging can destroy the temporal dependence structure in the data. Therefore, we can build models on the raw time series instead of event averages and compare the evaluation metrics of the classifiers for both approaches. Moreover, using raw signals with minimal preprocessing is also beneficial for real-time application of these classifiers on embedded systems.
Generative AI and DL methods can also be explored for building robust classifiers. The data can be augmented using Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs) for the 5 classes. The classifiers can then be trained on the augmented data, and the real data can be split three ways into training, validation, and test sets for evaluation, with the test set never used for augmentation. The hypothesis behind this approach is that the generative model for each class captures the class-specific patterns in the data while minimizing the randomness from uncontrolled sources (sensing, noise, inter-person tapping variation, and intra-person tapping variation).
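The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are toy feature vectors, the split fractions are arbitrary, and `augment` is a hypothetical placeholder that jitters real training samples with Gaussian noise where a class-conditional GAN or VAE would be used in practice.

```python
import random


def split_indices(n, train=0.6, val=0.2, seed=0):
    """Shuffle indices and split them into disjoint train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def augment(samples, copies=2, noise=0.05, seed=0):
    """Placeholder generator: noisy copies of real samples, labels preserved.

    A class-conditional GAN or VAE fitted on the training split would go here.
    """
    rng = random.Random(seed)
    synthetic = []
    for features, label in samples:
        for _ in range(copies):
            jittered = [v + rng.gauss(0.0, noise) for v in features]
            synthetic.append((jittered, label))
    return synthetic


# Toy data set: 50 samples, 5 balanced classes (illustrative values only).
data = [([float(i), float(i % 5)], i % 5) for i in range(50)]

train_idx, val_idx, test_idx = split_indices(len(data))
train_real = [data[i] for i in train_idx]

# Only the training split feeds the generator; validation and test stay real.
train_set = train_real + augment(train_real)
val_set = [data[i] for i in val_idx]
test_set = [data[i] for i in test_idx]
print(len(train_set), len(val_set), len(test_set))
```

Splitting before augmenting is the key design choice: synthetic samples derived from a test example would leak information into training and inflate the reported test accuracy.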

Conclusion
We explore the potential of functional near-infrared spectroscopy (fNIRS) in classifying fine anatomical movements using classical and modern machine learning (ML) and deep learning (DL) approaches. The results are promising, demonstrating acceptable accuracies for this challenging task. These findings highlight the potential of fNIRS signals in capturing information related to fine movements. However, to further advance brain-computer interface (BCI) applications, we require larger datasets, improved models, and reduced computational requirements. The proposed model, Hemo-Net, exhibits superior performance compared to the others. Complex models tend to overfit the training data but still perform better on the test set than simpler models. This reflects both the complexity of the problem at hand and the limited size of the available data. Collecting more data in the future can help improve these models' performance. Instead of training Hemo-Net from scratch, we propose training it on additional data starting from the pre-learned weights obtained in this study. This approach may decrease the training time required for the model.

FIGURE (A) Experimental setup demonstrating fNIRS NIRScout (NIRx Medical Technology GmbH, Germany). (B) Sixteen sources and sixteen detectors were placed over the motor cortex according to the 10-10 international system.
The experiment followed the Declaration of Helsinki. A no-objection letter for the experimental work was obtained from the Research Ethics Committee (REK No. 322236). In accordance with the Norwegian Center for Research Data AS (NSD Ref. No. 647457), informed consent for voluntary participation was given by all participants before the experiment. Further details can be found in Khan et al. (2021b).

FIGURE Data processing steps followed before application of the DL model.
We also considered 5 deep learning models: DNN, LSTM, bi-directional LSTM, LSTM-CNN hybrid, and Hemo-Net. The Hemo-Net model is inspired by Ismail Fawaz et al. (2020) and consists of inception blocks, which allow filters of different dimensions to be learned in a single block by back-propagation to extract useful features from the time series. Among all the models (machine learning and deep learning), the DL-based model Hemo-Net showed superior performance on the test set. Among the machine learning models, RF showed superior test set performance. Hemo-Net achieved a test accuracy of 76%. The confusion matrix is given in Figure 5. The precision is around 75% for each class, and we did not observe high variance in per-class precision on the test set for Hemo-Net. One reason for this balance may be the balance of the data set across classes: in our training data, the model has the same number of examples to learn from for each of the five classes.

Frontiers in Human Neuroscience frontiersin.org

FIGURE 4 Schematic illustrations of the InceptionTime model adaptation for Hemo-Net (left) and a single inception block (right).

FIGURE 5 Confusion matrix. The C labels represent the respective classes as listed in the corresponding table.

TABLE 1 Hemo-Net classification report on test data.
TABLE 2 Machine learning models performance.
TABLE 3 Deep learning models performance.