Reduction of Onset Delay in Functional Near-Infrared Spectroscopy: Prediction of HbO/HbR Signals

An intrinsic problem when using hemodynamic responses for the brain-machine interface is the slow nature of the physiological process. In this paper, a novel method that estimates the oxyhemoglobin changes caused by neuronal activations is proposed and validated. In monitoring the time responses of blood-oxygen-level-dependent signals with functional near-infrared spectroscopy (fNIRS), the early trajectories of both oxy- and deoxy-hemoglobins in their phase space are scrutinized. Furthermore, to reduce the detection time, a prediction method based upon a kernel-based recursive least squares (KRLS) algorithm is implemented. In validating the proposed approach, the fNIRS signals of finger tapping tasks measured from the left motor cortex are examined. The results show that the KRLS algorithm using the Gaussian kernel yields the best fitting for both ΔHbO (i.e., 87.5%) and ΔHbR (i.e., 85.2%) at q = 15 steps ahead (i.e., 1.63 s ahead at a sampling frequency of 9.19 Hz). This concludes that a neuronal activation can be concluded in about 0.1 s with fNIRS using prediction, which enables an almost real-time practice if combined with EEG.

The measured fNIRS signals (i.e., HbO, HbR) can be categorized into three durations (Frostig et al., 1990;Ernst and Hennig, 1994): (i) the initial dip, which represents the early extraction of oxygen by the nearby active neurons causing the HbO/ HbR to decrease/increase, (ii) the conventional hemodynamic response (HR) that is the large increase in cerebral blood flow (CBF) resulting in an increase/decrease in HbO/ HbR, respectively, and (iii) the undershoot before going back to the rest state. The changes in HbO/ HbR upon the functional stimulation can be translated into meaningful commands for BCI applications (Matthews et al., 2008). These converted signals can be further used to actuate external devices such as robotic arm/leg or wheelchairs for improving the quality of patient lives Wolpaw, 2010, 2011;Ortiz-Rosario and Adeli, 2013;Yazdani et al., 2018). In particular, fNIRS devices are portable and have shown great potential for BCI applications. The main limitation of fNIRS for BCI is its slow nature of the HR and the inherent delay from the onset of the neuronal activity (Jasdzewski et al., 2003;Cui et al., 2010;Ahn and Jun, 2017), which restricts its use for online BCI applications as well as hybridization with other rapid techniques such as EEG (Jiao et al., 2018;Li et al., 2018;Yang et al., 2018), magnetoencephalography, etc. Because of this limitation, various features in different temporal windows of 0-5, 2-7, 0-10, 0-15, 0-17, and 0-20 s were used in multi-class classification algorithms to classify HRs associated with the same or different brain regions for fNIRS-BCI applications (Power et al., 2011;Khan et al., 2014;Schudlo and Chau, 2014;Gateau et al., 2015;Khan and Hong, 2015;Hong et al., 2017;Shin et al., 2017;Liu et al., 2018;Yi et al., 2018). Thus far, the features frequently used from these windows include signal mean, signal slope, signal peak, skewness, kurtosis, variance, standard deviation, number and sum of peaks, root mean square, median, etc. (Hwang et al., 2016;Naseer et al., 2016;Liu and Hong, 2017;Hong et al., 2018b;Wibowo et al., 2018).
Another means of addressing this delay is to utilize the initial dip for fast fNIRS-BCI applications. The initial dip is an early change in oxygenation prior to any subsequent increase in CBF, which is spatially more specific to the site of neuronal activity (Vanzetta and Grinvald, 2008;Hong and Zafar, 2018). However, there is also a time lag in detecting the initial dip . A previous study by Hong and Naseer (2016) showed that the initial dip could be detected using a vector phase diagram with a single threshold circle. The vector phase diagram is a computationally efficient method to detect both the initial dip and the HR by displaying the trajectories of HbO and HbR, as orthogonal components, in the HbO-HbR polar coordinates (Oka et al., 2015). It was further proposed to use q-step-ahead prediction scheme in combination with the vector phase diagram to reduce the time lag in detecting the initial dip. They showed that the initial dip could be detected in 0.9 s using the q-step-ahead prediction scheme, showing high potential for BCI applications. Later, Zafar and Hong (2017) attempted to find the features and temporal window size for classifying the initial dip duration in fNIRS signals of different mental tasks. They showed that the running temporal window size for fNIRS could be reduced from 5 to 2.5 s using initial dip features (i.e., signal mean and signal minimum) in the classification process. Li et al.  Zafar and Hong, 2018).
(2017) also used the mean value of HbO and HbR signals in the 0-2 s window as an initial dip feature and achieved 85.5% classification accuracy for the classification of left-and righthand movements. Similarly, Khan and Hong (2017) used signal minimum as an initial dip feature and achieved a classification accuracy of 75.6% in classifying four mental tasks in a reduced window size (i.e., 0-2 s).
The use of dual threshold circles in the vector phase diagram was proposed to improve the detection of both initial dip and the conventional HR , see Figure 1. The threshold circles in the vector phase analysis helps to minimize the false detection of resting-state fluctuation and large fluctuations of HbO and HbR signals during the task period. The radius of the inner circle was set to the maximum HbO during the resting state, and the radius of the outer circle was set to the sum of the radius of the inner circle and 30% of the peak value of the main HR. The peak value of the HR was determined through the averaging over trials measured in the training phase. They showed that the use of dual threshold circles in the vector phase diagram resulted in an enhancement of the classification accuracies of two-finger tapping tasks. They also used the signal mean and the minimum signal value in 0-2.5 s time window to classify two-finger tapping tasks. However, windows of 0-2 s or 0-2.5 s are still too large for real-time BCI applications and hybridization of fNIRS with other rapid techniques such as EEG. Furthermore, the previously mentioned q-step-ahead prediction scheme by Hong and Naseer (2016) to reduce the delay was an offline analysis, and the validity of the predicted signals with multiple steps was not examined. Knowing the maximal prediction size of the q-step-ahead prediction method is important because the error of the predicted signals increases significantly with the increase of the number of step sizes. In addition, for real-time BCI applications, an online scheme is required to reduce the onset delay in fNIRS signals.
In this study, the use of a kernel recursive least squares algorithm (KRLS) is proposed for the q-step-ahead prediction of fNIRS signals. Three most commonly used kernels (i.e., Gaussian, polynomial, and sigmoid) are tested to compare the errors in the predicted fNIRS signals. Then, the effectiveness of the proposed prediction scheme was evaluated using fNIRS signals of finger tapping tasks measured from the left motor cortex of eleven subjects. The results of the proposed scheme were compared with those of the commonly used recursive least squares (RLS) algorithm. This paper further presents the applicability of the qstep-ahead prediction scheme to reduce the time lag in detecting the initial dips in fNIRS signals.

Brain Activity Model and Kernel Recursive Least Square
In this paper, a brain activity is modeled in a linear form using the autoregressive moving average with exogenous signals (ARMAX) model as follows.
where i represents the channel number; y is the measured HbO/ HbR; u is the desired hemodynamic response function (dHRF); w is the physiological noise; ε is the zero-mean Gaussian noise; a n , b m , c p , and c o are unknown coefficients that are recursively estimated; and n o , m o , and p o are the orders of the system, input, and exogenous signals, respectively. For fNIRS, the exogenous signal w consists of specifically three sinusoidal signals representing cardiac, Mayer, and respiration related physiological noises (Abdelnour and Huppert, 2009;Nguyen H.-D. et al., 2018). Also, the exogenous signals can be dropped out in the estimation process (i.e., p o = 0) if the prediction/tracking of the measured signal is focused. Nevertheless, the fNIRS signals were low-and high-pass filtered to minimize the effect of the physiological noises before the estimation process. Equation (1) can be written in a simplified vector form as follows.
where ϕ(k) ∈ ℜ (n+m+p+1)×1 is the regression vector and superscript T stands for the transpose operator. Figure 2 shows the estimation/prediction scheme. In this study, dHRF [i.e., u(k)] was generated by convolving the canonical HRF (cHRF), denoted by h(k), with a stimulation period, s(k), as follows. where task and rest represent the task period and the rest period, respectively (task = 10 s and rest = 20 s in this study). cHRF was generated as a linear combination of three gamma functions by the following equation (Shan et al., 2014).
where j represents the number of gamma functions, A j is the amplitude, α j and β j tune the shape and the scale, respectively, and k is the time step (in this work, A 1 = −1.5, A 2 = 7, A 3 = −2, α 1 = 1.5, α 2 = 6, α 3 = 16, and β 1 = β 2 = β 3 = 1 were used). The unknown coefficients in Equation (2) are estimated and updated using the KRLS based on the optimization of the cost function given by where κ represents the Mercer kernel, is the kernel matrix of all k input data points, R is a positive number known as the regularization parameter, H represents the reproducing kernel Hilbert space (RKHS) associated with the Mercer kernel, and λ (0.98 in this study) is the forgetting factor. The performances of the following three most commonly used kernels in improving the prediction of the fNIRS signals are tested (Muller et al., 2001): where σ is a scaling factor, and ϕ ′ represents the new upcoming data.
(ii) Polynomial kernel where c is a non-negative constant, and p is the order of the polynomial kernel.
where s and t are suitable non-negative constants. The basic idea is to map input data points to a high dimensional feature space (i.e., RKHS). This process allows the transformation of linear inner products into RKHS by simply changing their inner product into kernels (Schölkopf and Smola, 2002;Liu et al., 2010). The transformed feature space is then solved using the linear algorithm. The advantage of kernel-based algorithms is that they have a unique global solution that can be derived by solving a convex optimization problem . Furthermore, if data show a non-linear relationship, linear regression techniques cannot model them adequately. The kernel method can address this issue by moving to another feature space that is more likely to correspond to a linear model. However, the kernel method suffers from the overfitting problem because the Hilbert space induces high dimensionality of data. To address the issue of overfitting, the solution is penalized by limiting it to the L2 norm, as shown in Equation (8) (Evgeniou et al., 2000;Pillonetto et al., 2014), which is solved and updated as follows (Liu et al., 2010).
FIGURE 3 | Emitter-detector placement and experimental paradigm for the right-hand finger tapping task: (A) Emitter-detector placement and their distances, (B) experimental paradigm.
Frontiers in Neurorobotics | www.frontiersin.org As the kernel matrix grows linearly with the number of observations, the computational complexity of KRLS increases. The complexity is reduced by using the approximate linear dependency (ALD) criterion (Engel et al., 2004). The KRLS-ALD algorithm has been implemented using the Matlab TM kernel adaptive filtering toolbox (Van Vaerenbergh and Santamaría, 2013;Van Vaerenbergh, 2017). Accordingly, using Equation (2), the estimated brain activity model can be represented aŝ For q-step-ahead prediction, Equation (21) can be written as follows. Frontiers in Neurorobotics | www.frontiersin.org The performance of the algorithm was tested using the percentage fitting (%FIT) criterion as follows .
The performance criterion in (23) quantifies how much of the variance of y is captured by the q-step-ahead predicted signal . Furthermore, %FIT criteria measure how accurately the q-step-ahead predicted signals are estimated.

Experimental Data
Previously published experimental data  of right-hand (thumb and little) finger tapping sessions from 11 subjects were used for validating the proposed q-step-ahead prediction scheme. Brain signals generated by the finger-tapping were acquired from the left motor cortex using the frequency domain fNIRS system (ISS Imagent, ISS Inc.) at a sampling rate of 9.19 Hz. The electrode placement and the corresponding emitterdetector distances are shown in Figure 3A. A total of 36 channels were formed using emitter-detector pairs.
The experimental paradigm is shown in Figure 3B. The experimental paradigm consists of two sessions of finger tapping tasks. A session is composed of six trials of 30 s. Each trial includes a 10 s activity task followed by a 20 s rest. During the task period, the subjects were instructed to tap their right-hand finger as fast as they could without paying attention to the number of taps. The raw data ( HbO and HbR) obtained from the ISS Imagent data acquisition and analysis software (ISS-Boxy) were pre-processed to remove physiological noises related to respiration, cardiac, and low-frequency drift signals. Fourth-order Butterworth lowand high-pass filters with cutoff frequencies of 0.15 and 0.01 Hz, respectively, were used to minimize the respiration, cardiac, and low-frequency drift signals from the obtained fNIRS signals.

Detection of Initial Dip
The initial dip will be detected through the vector phase analysis with dual threshold circles (Yoshino and Kato, 2012;Hong and Naseer, 2016;Zafar and Hong, 2018), see Figure 1. Vector phase analysis is a polar coordinate plane method defined by HbO and HbR as orthogonal vector components. Two other vector components, cerebral oxygen exchange ( COE) and cerebral blood volume ( CBV), are obtained by rotating the vector coordinate system by 45 • counterclockwise using the following equations (Yoshino et al., 2013;Khan et al., 2018).
The magnitude and the phase of a vector p = ( HbO, HbR) in the phase plane can be calculated as follows.
The degree of oxygen exchange is defined by the ratio of COE and CBV. Therefore, the oxygen exchange in the blood vessel is represented by the change in COE. Using the abovementioned four indices, eight phases are defined on the vector phase diagram, see Figure 1. Phases 1-5 (i.e., Phase 1: 0 < HbR < HbO, COE < 0 < CBV; Phase 2: 0 < HbO < HbR, 0 < COE < CBV; Phase 3: HbO < 0 < HbR, 0 < CBV < COE; Phase 4: HbO < 0 < HbR, CBV < 0 < COE; Phase 5: HbO < HbR < 0, CBV < 0 < COE) are defined as the initial dip phases because they reflect an increase in either HbR or COE, whereas Phases 6 to 8 (i.e., Phase 6: HbR< HbO < 0, CBV < COE < 0; Phase 7: HbR < 0 < HbO, COE < CBV < 0; Phase 8: HbR < 0 < HbO, COE < 0 < CBV) are defined as HR phases. If there are no threshold circles in the vector diagram, the resting-state fluctuation and large fluctuations of HbO and HbR signals during the task period with COE > 0 can easily be interpreted as an initial dip. Threshold circles (i.e., dual threshold circles) incorporated in the vector phase analysis help in minimizing the detection of false dips Zafar and Hong, 2018). The radius of the first (inner) threshold circle in Figure 1 was determined during the resting state period as follows .
The single (inner) threshold circle can help to separate the resting-state fluctuation from the initial dip and task-related HR. However, a large fluctuation of HbO and HbR above the threshold circle can still falsely be interpreted as an initial dip. Therefore, a second (outer) threshold circle as an upper bound is drawn on the vector diagram to separate large HbO and HbR fluctuations from the initial dip. The radius for the second threshold circle was determined using the following  .
where p 1 and SD are the peak value and standard deviation of the averaged HbO trial over several trials from the most active channel, where the most active channel means the channel that shows the largest difference between the maximum HbO values in the resting state and the HR of the first trial during the training stage. The initial dips are detected if the trajectory lies in any phase from Phase 3 to Phase 5 and remains within the two threshold circles within first 2-4 s of the task period, and it moves to either Phases 7 or 8, after 2-4 s. The first (i.e., inner) threshold circle is used to detect the time instance of the occurrence of an initial dip and the HR. Any trajectory going outside the secondary threshold circle is considered as a false dip or noise.

RESULTS
The data during the resting-state and the first session were used in the training stage, whereas the second session data were used to test the proposed method. The parameters of the Gaussian (i.e., σ ), polynomial (i.e., c and p), and sigmoid (i.e., s and t) kernels were determined iteratively, and the value with the maximum %FIT for each kernel was selected for further analysis. Using the data of 792 channels [i.e., 11 subjects × (36 HbO + 36 HbR)], the values of parameters σ , c, p, s, and t for the Gaussian, polynomial, and sigmoid kernels were found to be 1 in the training stage. For the regularization parameter (R), several different values were tested through trial and error, and R = 10 −8 was found to achieve the best fitting (i.e., %FIT) of the predicted signals. R > 10 −8 did not affect the %FIT of the predicted signals, but lower values decreased %FIT. Figure 4 shows the fitting of the one-step-ahead predicted HbO and HbR signals on top of the measured signals ( HbO, HbR) using the RLS method and the Gaussiankernel RLS method for both active (i.e., Ch. 18) and non-active (i.e., Ch. 3) channels of Subject 1, respectively. Tables 1-4 reports the %FIT of HbO and HbR for individual subjects using RLS and KRLS. The statistical significance of the %FIT was verified using two-sample ttests. Signal information in the predicted signals (i.e., %FIT) significantly decreases (p < 0.05) as the step size increases. Table 5 shows a comparison of the averaged %FITs of RLS and KRLS with the Gaussian, polynomial, and sigmoid kernels for different q-step-ahead predicted fNIRS ( HbO, HbR) signals.
A number of previous studies reported that the peak of the initial dip occurred at approximately 1.9-2.5 s (Hu and Yacoub, 2012;Zafar and Hong, 2017). Therefore, q = 15 (i.e., 1.63 s since the sampling frequency was 9.19 Hz in this study) was selected for further analysis. Table 1 shows that KRLS with the Gaussian kernel yielded the best fitting (p < 0.05) for the 1.63 s ahead predicted HbO (i.e., 87.5%) and HbR (i.e., 85.2%) signals as compared to all other methods. Therefore, the Gaussian-kernel RLS was further used for reducing the delay in detecting initial dips in the fNIRS signals. Figure 5 shows the 15-step-ahead predicted HbO and HbR signals of active channels for different subjects. It can be clearly seen that the predicted signals are well-ahead (i.e., blue dotted lines) and perfectly tracking the measured signals (solid red line).
A comparison of vector-phase trajectories using measured and 1.63 s ahead predicted fNIRS signals for Subject 3 is shown in Figure 6. Table 6 shows the times of initial dip detection using 15-step-ahead predicted HbO and HbR signals for active channels of all subjects.

DISCUSSIONS
The newly emerging neuroimaging modality (i.e., fNIRS) has a disadvantage of inherent onset delay from neuronal activation, which limits its application for rapid BCIs. To overcome this, the use of a kernel method for q-step-ahead prediction of fNIRS signals was proposed for the first time. The novelty of this study lies in using an online prediction scheme to reduce this onset delay for online applications.
A previous study by Hong and Naseer (2016) could reduce the delay in detecting initial dip in fNIRS signals to approximately 0.9 s using an offline q-step-ahead ARMAX model-based prediction scheme. Our results (Tables 1-5) reveal that the fitting accuracy of the q-step-ahead predicted signals decreased significantly (p < 0.05) with the increase of prediction step sizes. Therefore, the selection of a proper step size is very crucial to ensure that the predicted signals contain the maximum information of the measured signals.
In this study, a linear combination of three gamma functions was used (i.e., dHRF, see Figure 2). Most early studies used only s two-gamma-function dHRF to analyze the fNIRS time-series (Abdelnour and Huppert, 2009;Ye et al., 2009;Hu et al., 2010). A key drawback in using two gamma functions is that the initial dip duration is neglected in the estimation/prediction process. This limitation was overcome by using three gamma functions, which provides an extra degree of freedom by including the initial dip in the dHRF model for better estimation/prediction of the fNIRS signal.
The KRLS algorithm improves the fitting of the predicted signals as compared to the RLS algorithm by moving from the input space to the transformed feature space, i.e., a high dimensional space (see Tables 1-5). The non-linear relationship   in the data cannot be adequately modeled by using linear regression techniques. The advantage of moving to a higher dimensional space is that there is a high probability that the data corresponds to a linear model, and it can be solved using the linear algorithms (Liu et al., 2010). Regarding the kernels, the Gaussian kernel yielded the best fitting of the predicted HbO  (i.e., 87.5%) and HbR (i.e., 85.2%) signals at q = 15 stepahead. The polynomial kernel also yielded good results for the HbO signals, but the fitting slightly decreased for the HbR signals. In contrast, the fitting of both HbO and HbR signals significantly decreased for the sigmoid kernel. Furthermore, the fitting of the predicted HbR signals was lower than that of the predicted HbO.
Early studies reported that the peak of initial dip occurred around 1.9-2.5 s (Malonek and Grinvald, 1996;Yacoub and Hu, 2001;Yacoub et al., 2001;Hu and Yacoub, 2012;Zafar and Hong, 2017). From this viewpoint, 1.63 s ahead prediction was selected in this study for an early detection of initial dips. Nevertheless, the peak of an initial dip depends on various factors, such as the type of task performed, the duration of the task period, and the brain area under investigation. The trajectories (Figure 6) for both measured and predicted signals were almost the same, showing that the predicted signals were well-tracking the measured signals. However, if the fitting of the predicted signal is not adequate, the trajectory can lead to a wrong decision regarding the detection of initial dip or HR. With 1.63 s ahead prediction (Table 6), the initial dips were detected in minimum 0.11 s (maximum 0.65 s), which is much lower than that of Hong and Naseer (2016) (i.e., 0.9 s). Furthermore, the initial dip phenomenon did not occur in some trials. In the literature, this issue has been discussed considering several issues. One interesting report is that it is due to the use of caffeine before the experiment (Behzadi and Liu, 2006;Hong and Zafar, 2018). In addition, the detection time of the initial dip varies among trials and subjects (Hu et al., 2013).
Finally, this study demonstrated a step moving toward the development of a real-time BCI and a brain monitoring system using fNIRS. The significance of this study lies in the fast detection of activity-related responses in fNIRS signals. Even if an initial dip is not present, the inherent onset delay in the conventional HR can be reduced using the proposed q-step-head prediction scheme. Moreover, the use of q-step-head prediction with improved fitting can help in the hybridization of fNIRS with other rapid modalities such as EEG. Nevertheless, further research is still required to improve the fitting of the predicted fNIRS signals with an accuracy more than 90% using advanced signal processing (Ghafoor et al., 2017;Chen et al., 2018;Hong et al., 2018a) and adaptive algorithms (Iqbal et al., 2018;Nguyen Q. C. et al., 2018). In the future, other types of kernels should also be investigated for further improvement of the predicted fNIRS signals. The limitations of this study are as follows: (i) the order of the system (a n ) and the input (b n ) was set as 1 to ensure low computational complexity. Therefore, the optimal order of the system and the input for the prediction of fNIRS signals should be investigated further. (ii) Exogenous signals were excluded from the estimation/prediction process. These signals should be considered for further improvement of the predicted fNIRS signals in the future.

CONCLUSION
In this study, the q-step-ahead prediction scheme based on KRLS was used to reduce the onset delay from the neuronal activation in fNIRS signals. fNIRS signals of right-hand finger tapping task acquired from the left motor cortex were used to evaluate the performance of the prediction scheme. The results show that the Gaussian kernel yields the best fitting for both HbO (i.e., 87.5%) and HbR (i.e., 85.2%) signals at q = 15 step ahead prediction (i.e., 1.63 s with the sampling frequency of 9.19 Hz). The application of the scheme was found to reduce the delay in detecting the initial dip. The improvement in the fitting of 1.63 s ahead predicted fNIRS signals enabled the detection of initial dip in 0.1 s. The reduction in the onset delay is a significant improvement in the development of real-time BCI applications using fNIRS.

DATA AVAILABILITY STATEMENT
The datasets analyzed in this article are not publicly available. Requests to access the datasets should be directed to kshong@pusan.ac.kr.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Pusan National University Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AZ carried out the data processing and wrote the first draft of the manuscript. K-SH suggested the theoretical aspects of the current study, corrected the manuscript, and supervised the entire process from the beginning.