Signal Processing in fNIRS: A Case for the Removal of Systemic Activity for Single Trial Data

Researchers using functional near infrared spectroscopy (fNIRS) are increasingly aware of the problem that conventional filtering methods do not eliminate systemic noise at frequencies overlapping with the task frequency. This is a problem when signals are averaged for analysis, even more so when single trial data are used as in online neurofeedback or BCI applications where insufficiently preprocessed data means feeding back noise instead of brain activity or when looking for brain-behavior relationships on a trial-by-trial basis. For removing this task-related noise statistical approaches have been proposed. Yet as evidence is lacking on how these approaches perform on independent data, choosing one approach over another can be difficult. Here signal quality at the single trial level was considered together with statistical effects to inform this choice. Compared were conventional band-pass filtering and wavelet minimum description length detrending and the combination of both with a more elaborate, published preprocessing approach for a motor execution—motor imagery data set. Temporal consistency between Δ[HbO] and Δ[HbR] and two measures of the spatial specificity of signals that are proposed here served as measures of data quality. Both improved strongly for the combinationed preprocessing approaches. Statistical effects showed a strong tendency toward getting smaller for the combined approaches. This underlines the importance to adequately deal with noise in fNIRS recordings and demonstrates how the quality of statistical correction approaches can be estimated.


INTRODUCTION
The functional near infrared spectroscopy (fNIRS) signal is intended to measure task-related cortical hemodynamic responses. It typically also comprises physiological noise such as heart beat (1-1.5 Hz), respiration (0.2-0.5 Hz) and low frequency content resulting from blood pressure fluctuations (Mayer waves; 0.1 Hz) (Naseer and Hong, 2015;Kamran et al., 2016). Several methods are available to remove these systemic, non-evoked components Zhang et al., 2013;Naseer and Hong, 2015;Kamran et al., 2016). The most widely used method is band-pass filtering with cut-off frequencies of approximately 0.01-0.9 Hz (Naseer and Hong, 2015;Kamran et al., 2016). However, depending on the context of an experiment (e.g., online vs. offline) filter types [e.g., finite impulse response (FIR) vs. infinite filter response (IIR); Pinti et al., 2018] and cutoff frequencies differ substantially (e.g, offline filter with cutoff frequencies of [0.01, 0.09] Hz as in Pinti et al., 2018 vs. online filter with cutoff frequencies of [0.01, 1.5] Hz as in . Since the exact frequency characteristics of the systemic components are unknown and might differ between subjects, band-pass filtering might not eliminate all frequencies related to this physiological contamination (Duan et al., 2018). One possible approach to overcome this limitation might be the application of wavelet filters since in this context it is not relevant to know the exact frequency components Duan et al., 2018).
In addition to the non-evoked systemic noise signals, another noise-component is task-dependent cerebral and extracerebral hemodynamic activity. This noise signal component is present in the signal due to near infrared (NIR) light passing cerebral and highly vascularized extracerebral layers, that is scalp and skull tissue, from a light source to a detector twice (Leff et al., 2011;Scholkmann et al., 2014;Tak and Ye, 2014;Brigadoi and Cooper, 2015;Tachtsidis and Scholkmann, 2016;Zhang et al., 2016;Pfeifer et al., 2018). During this journey the NIR light is contaminated by changes in the hemodynamic signals originating in these layers, mainly from the scalp veins (Takahashi et al., 2011;Kirilina et al., 2012). This noise component, consisting of evoked systemic cerebral and extracerebral components, here referred to as global component (GC), is considerably more problematic to remove than the non-evoked systemic noise because it is not constantly distributed over the head but varies from channel to channel (Gagnon et al., 2012). Moreover, it comprises the same taskrelated frequency range as the brain-related neuronal signal and can therefore not be removed by any filtering method. Although awareness for this fNIRS-specific problem is rising (Scholkmann et al., 2014;Tachtsidis and Scholkmann, 2016;Zhang et al., 2016;Pfeifer et al., 2018), to date most researchers do not correct for the GC and in consequence many of the reported experimental fNIRS effects are at risk of being inflated by artifacts (Brigadoi and Cooper, 2015;Pfeifer et al., 2018). In cases in which single trial data are used, the danger of quantifying artifacts is even more severe as here artifacts not properly removed by the standard filters will have a much stronger impact. Examples are neurofeedback or brain-computer interface (BCI) setups or studies aiming to demonstrate a brain-behavior relationship at the single trial level.
The use of additional short-distance channels (0.5-1 cm) is by now considered the most efficient method to remove the extracerebral component of the GC from the hemodynmaic fNIRS data but most up-and-running fNIRS systems are not equipped with the necessary hardware yet (Saager and Berger, 2005;Brigadoi and Cooper, 2015;Tachtsidis and Scholkmann, 2016;Nambu et al., 2017;Yücel et al., 2017;Pfeifer et al., 2018). Alternatives are a number of statistical correction methods, that however do not distinguish between the cerebral and the extracerebral aspect of the GC (Scholkmann et al., 2014;Tachtsidis and Scholkmann, 2016). Little is known of how these methods affect signals and statistical results in independent data sets (Zhang et al., 2005(Zhang et al., , 2016Kohno et al., 2007;Santosa et al., 2013;Pfeifer et al., 2018). Pfeifer et al. (2018) compared different approaches two of which included a self-implemented correction by means of a global regression or a unilateral regression. They found major differences between the averaged data in the tested approaches and also in the corresponding statistical results which emphasizes the extensive impact the GC might have on a study's results and corresponding conclusions.
This study continues this line of research with a particular focus on single trial data quality. Aim of the study was to quantify the impact of published preprocessing approaches on single trial data and statistical outcome of an independent data set. In a first step, the conventional band-pass filter was compared with a wavelet minimum description length (MDL) detrending filter (implemented in NIRS_SPM; Jang et al., 2009). The latter was expected to provide a superior removal of the non-evoked systemic noise from the fNIRS signal as compared to band-pass filtering. In a second step, to additionally remove the GC from the signal, both the band-pass filter and the wavelet filter were combined with an approach based on singular value decomposition (SVD) and a Gaussian kernel smoother (Zhang et al., 2016). These methods were selected because they can be easily implemented in an online neurofeedback or BCI experiment, yet several more exist, also regarding the removal of the GC (Zhang et al., 2005(Zhang et al., , 2016Kohno et al., 2007;Santosa et al., 2013;Pfeifer et al., 2018).
Across the four approaches it was expected that the respectively more elaborate approaches, that is, each filtering method in combination with the SVD and Gaussian kernel method, would result in a higher signal quality as compared to the filtering methods alone. Moreover, it was expected that the task signal would be more spatially specific after applying the spatial filter in addition as compared to either the bandpass or the wavelet filter alone. Higher spatial specificity would be reflected in lower correlations between channels and more clearly defined activation patterns. These differences in signal quality were predicted to affect experimental effect sizes in the data set with a decrease from band-pass filtering to the wavelet filter to the combination of the filtering methods and global component removal.

Subjects
The analyzed data was originally collected as part of a fNIRSbased motor imagery (MI) neurofeedback group comparison study. Data from the different neurofeedback groups were pooled for the present analysis (for details see section 2 in the Supplementary Material). In total, 50 data sets [27 females, 23 males; mean (± SD) age: 24.1 (±2.78) years, ranging from 19 to 30 years] were included. Another ten subjects were excluded based on their electromyography (EMG) data (for details see section 3 in Supplementary Material). Handedness was assessed by means of the Edinburgh Handedness Inventory (EHI; Oldfield, 1971). All participants were right handed with a mean laterality quotient of 84.75 ± 17.33 (mean ± SD). Participants were recruited by way of the virtual platform FIGURE 1 | Schematic illustration of the experimental design. All subjects started with a familiarization phase (A) in which they memorized the sequence, followed by the pre-test (B), consisting of five trials ME and five trials MI. Afterwards subjects performed 20 trials of MI training (C) and at the end a post-test (D) was performed, consisting of five trials of ME. The first seven trials in the MI training session (C) were defined as "early" trials and the last seven trials as "late" trials in order to statistically test for a "time on task" effect within the MI data.
of the University of Oldenburg. Only participants without a history of psychological and neurological disorders, with normal or corrected-to-normal vision and without any experience in piano playing were included. After explanation of the study, all subjects gave written informed consent. Participants were paid 8 €/h as reimbursement. The study was approved by the Ethics Committee of the University of Oldenburg. Figure 1 illustrates the structure of the experimental design. Each subject participated in one experimental session consisting of familiarization phase, pre-test [motor execution (ME) and MI], training session (MI) and post-test (ME) that together lasted no longer than 45 min. During the session, subjects sat comfortably in a chair with their arms on the chair's armrests in a sound-shielded booth. Participants sat in front of a computer monitor at a distance of approximately 175 cm. The motor task was a sequential 8-position finger-tapping task performed with the left hand. The task was semi self-paced in that the duration of the tapping period was fixed, but not the speed with which participants tapped. Participants memorized the fixed sequence of numbers (3-2-3-5-4-2-5-4) during the familiarization phase.

Experimental Design
In the ME parts of the pre-test and the post-test phases participants physically performed the motor task as often as possible in five 20 s trials. In the MI parts of the pre-test and the training phase, participants were instructed to perform the finger-tapping task as often as possible for five (pre-test phase) or 20 (training phase) 20 s trials using kinesthetic motor imagery. Participants were asked not to move their hands throughout the MI parts. In the training phase participants belonging to one of the feedback groups (cf. section 2 in Supplementary Material for details) were additionally instructed to increase a thermometer bar from trial to trial by trying to intensify the imagination of the motor task. Participants of the non-feedback group saw a comparable bar to keep visual input as similar as possible across groups. Trial structure was identical for MI and ME trials. A trial started with a rest stage lasting for 18-22 s (pseudo-randomized) indicated by a bright red fixation cross (cf. Figures 1B-D). The bright red fixation cross was preceded and followed by a 1 s dark red fixation cross indicating an upcoming change between trial stages. The subsequent task stage lasted for 20 s and was indicated by a blue fixation cross. In the training session the task stage was followed by an 1 s dark red fixation cross and either a visual feedback in shape of a thermometer filling up from the bottom according to the achieved change in hemoglobin concentration or a thermometer displaying the passing of time till the next trial would start. For a more detailed description of the experimental design including neurofeedback generation see section 2 in Supplementary Material.

FIGURE 2 | (A)
The eight sources (pink) and the 12 detectors (green) resulted in 24 channels (white). Analyses were restricted to channels covering the right hemisphere (box). (B) Sensorimotor cortical regions covered by the layout, approximated by means of the fOLD toolbox (Zimeo Morais et al., 2018), together with the fNIRS channels. The regions include primary motor cortex (M1; blue), premotor cortex and supplementary motor areas (PMC+SMA; green) and the primary somatosensory cortex (S1; red). As in (A), the box contains the channels used for analyses.

Electromyography (EMG) and Electroencephalography (EEG)
Electromyography (EMG) data was recorded from the extensor digitorum communis muscle on both arms during the whole experiment. To this end two electrodes were placed on each arm, resulting in two bipolar channels. A ground electrode was attached to the processus styloideus ulnae (see Figure S1 in Supplementary Material). EMG was recorded with a BrainVision Recorder (version 1.10) using a BrainAmp DC Amplifier (BrainProducts, Gilching, Germany). The sampling rate was 1 kHz with online filtering between 0.1 Hz and 250 Hz. The EMG data served as a control variable for the offline analysis only. All EMG-contaminated MI trials were removed from analysis (for details see section 3 in Supplementary Material).
In addition to the fNIRS data, EEG data was simultaneously recorded from 32 Ag/AECl electrodes placed together with the NIRS optodes in an elastic cap (EasyCap, Herrsching, Germany) according to the international 10-20 system. EEG data are of no relevance for the present study and will not be considered further.

Functional Near Infrared Spectroscopy (fNIRS)
FNIRS data was recorded with a NIRScout 816 device (NIRStar 14, NIRx Medizintechnik GmbH, Berlin, Germany). As visualized in Figure 2, the eight LED sources (intensity 5 mW/wavelength) and the 12 detectors covered the premotor cortex (PMC), primary motor areas (M1), supplementary motor areas (SMA), and somatosensory areas (S1) of both hemispheres (approximated with the fOLD toolbox; Zimeo Morais et al., 2018). The NIRS optodes were placed according to the international 10-20 system in a custom-made cap together with the EEG electrodes. The Cz position was used as a marker for correctly positioning the cap. The optodes were attached to the cap with spring-loaded grommets (NIRx Medizintechnik GmbH, Berlin, Germany) which reduce optode movement and, which improve the contact between optode and skull. The distance between a source and a neighboring detector was approx. 3 cm and each of these source-detector pairs gave rise to one channel. This layout resulted in a total of 24 channels. The sources emitted NIR light at wavelengths 760 and 850 nm. Light intensity at the detectors was sampled with a sampling rate of 7.81 Hz.

Data Preprocessing
For offline preprocessing using the band-pass filter, fNIRS data was imported via the MATLAB-based nilab2 toolbox (Rev. 2.0, NIRx Medizintechnik GmbH, Berlin, Germany) using MATLAB 2012a (The MathWorks Inc., Natick, MA, USA). To allow for the data to be preprocessed with the wavelet MDL detrending approach

Conventional Band-Pass Filter (BPF)
In the context of a general linear model (GLM) framework it is recommended to apply a finite impulse response (FIR) filter with very high filter orders (i.e., > 1, 000; Pinti et al., 2018). Since the context of the present analyses differs, that is, the prospective application within a (online) neurofeedback or BCI experiment, a FIR filter with such a high filter order (i.e., the filter length minus one data point) is not applicable without a very long baseline (e.g., around 1, 000/8s −1 = 125s with a sampling rate of approx. 8 Hz). In addition, MATLAB-based filter functions such as filtfilt() require data lengths of at least three times the filter order. Therefore, an infinite impulse response (IIR) filter was used in the present study in order to have a better comparison to the outcome of an online preprocessing version.
Although a broader frequency window of the band-pass filter was applied in the online version of this experiment (i.e., [0.01, 0.7] Hz; cf. section 3 in Supplementary Material) the guidelines regarding the cutoff frequency window of Pinti et al. (2018) was followed. That is, for either the conventional bandpass filtering (BPF) approach and its combination with the global component removal (BPF+GCR) the data of the △[HbO] and △[HbR] was firstly low-pass filtered with a third order Butterworth filter (0.09 Hz) and afterwards high-pass filtered by means of a first order Butterworth filter (0.01 Hz). The filters were applied using MATLAB-based functions [i.e., butter() and filtfilt()].

Wavelet Filter (WLF)
In order to eliminate the non-evoked systemic components from the hemoglobin signal (i.e., respiration, blood pressure) a wavelet MDL detrending method was applied  which is implemented in the MATLAB-based NIRS_SPM toolbox (Version 4; Ye et al., 2009). This wavelet filter (WLF) has some advantages over conventional band-pass filtering. For example, it is possible to remove the frequencies due to physiological noise and fast varying trends, which can not be removed by conventional filtering because otherwise parts of the hemodynamic signal would be removed ). The method was originally developed for fMRI analysis and later adapted for fNIRS data analysis. It is based on the discrete wavelet transform and it is used to decompose the fNIRS data into global trend (i.e., global drift), hemodynamic signals and uncorrelated noise at distinct scales . For the decomposition, the data is firstly temporally smoothed using a canonical hemodynamic response function (HRF) (Friston et al., 2000), whereupon a wavelet MDL detrending algorithm  removes the global trend from the signal. In the algorithm of Jang et al. (2009), this global trend is modeled within the general linear model (GLM): where the fNIRS signal is indicated by Y, the design matrix by X, β describes the weights and ǫ the residuals. The new term denoted by θ characterizes the additional global drift for each wavelength and at each channel location separately. Therefore, a modified GLM is derived for each signal type HbX as follows: Y HbX = X HbX β HbX + ǫ HbX + θ HbX Then, a discrete wavelet transform is used to separate the global trend θ HbX into wavelets. In comparison to the infinite sine and cosine waves used in Fourier analysis, wavelets are temporally limited "mini waves" used to represent other functions. Fourier analysis techniques use sine and cosine functions which are defined by frequency and not by time. A small change in frequency produces changes in the whole time domain, whereas wavelets are local in frequency and time (Vidakovic and Müller, 1994). To denoise or filter the data by means of wavelets some coefficients act as filters and others correspond to details in the data set. The idea is to set all coefficients below a specified threshold to zero before using them for reconstruction of the data in an inverse wavelet transform. This way of filtering is favored because it denoises the signal without removing sharp structures and details (Graps, 1995). Therefore, Jang et al. (2009) implemented a Cohen-Daubechies-Feauveau (CDF) 9/7 wavelet. For its usage in fMRI exists an implicit assumption that all wavelet coefficients at the same scale are simultaneously either all zero or all non-zero ). However, fNIRS data possesses a much higher number of data points and if the same assumption is considered it could lead to underor overfitting. To avoid this, the MDL principle is additionally implemented. Simplified, this procedure is based on Occam's razor (i.e., select the answer with the fewest assumptions) and successively includes the descending ordered wavelet coefficients until a sufficient complexity is found ).

Wavelet Filter and GC Removal (WLF+GCR) and Band-Pass Filter and GC Removal (BPF+GCR)
In addition to both the BPF and the WFL method, the GC was removed from the signal. One method for removing the GC was introduced by Zhang et al. (2016). This method is based on Gaussian spatial filtering and singular value decomposition (SVD). For implementing the method, the Montreal Neurological Institute (MNI) coordinates of each channel are required. The MNI coordinates for this data set were approximated by means of the ICBM-152 head model (ICBM 2009a Nonlinear Symmetric Atlas) with the NIRSite toolbox (Version 1, NIRx Medizintechnik GmbH, Berlin, Germany). The optodes were not digitized for any of the subjects, instead uniform coordinates according to the international 10-20 system (approximated by the fOLD toolbox; Zimeo Morais et al., 2018) were used. Although subject specific coordinates would have been more accurate, the distances between optodes are primarily important for this approach and should be comparable between subjects.

Gaussian Spatial Filtering
To smooth the data with the Gaussian kernel channel distances are required. In order to calculate the curved distance between two channels (Fenn, 2001) the conversion from cartesian (x, y, z) into spherical coordinates (r, θ , ϕ) is necessary. The distance between two channels is then calculated by means of the arc length which is defined as the great-circle distance, i.e., the shortest distance between two points (channels) on a curved surface. To calculate this distance, information about the latitude and longitude of two channels, given by ϕ 1 , θ 1 and ϕ 2 , θ 2 , is necessary, respectively. The central angle △c between the channels is then defined as: △c = arccos(sin ϕ 1 sin ϕ 2 + cos ϕ 1 cos ϕ 2 cos(△θ )) and distance d or arc length between these channels is then calculated as follows: The distances between all channels are stored in a n × n distance matrix D which is applied within the Gaussian filter defined by: A Gaussian filter is a two-dimensional kernel smoother mainly applied to remove detail and noise from a data set. The variable σ represents the width of the kernel. The authors of the approach suggested σ = 50 • because the width of the kernel should be on the one hand greater than the cortical neuronal activation and on the other hand smaller than the GC. In a later paper, enhanced results were reported by using a σ of 48 • which is also applied in this analysis (Zhang et al., 2017).

Singular Value Decomposition
The fNIRS signal over all channels is represented by an m × n matrix S where n specifies the number of channels and m the number of data points. A singular value decomposition (SVD) is basically a generalization of the eigendecomposition to nonsymmetric matrices. The SVD decomposes the matrix S into three matrices: where U is an m × n matrix consisting of left singular vectors and represents the temporal waveform of the n channels (Harner, 1990;Zhang et al., 2016). The columns represent the principal features in descending order starting with the first column which is the highly correlated (most global) feature of the original data (Harner, 1990). is a n × n diagonal matrix with n nonnegative singular values as diagonal elements. The singular values correspond to the square root of the variance of each spatiotemporal feature which is the same as the square root of the eigenvalues and are arranged in in descending order (Harner, 1990). V T is the transpose of a n × n matrix and consists of the right singular vectors containing the spatial information of the channels corresponding to the temporal information of each vector in U (Harner, 1990;Zhang et al., 2016). Since the matrix V contains the spatial information of the fNIRS data, the next step toward the GC removal is the smoothing of V by means of the Gaussian filter. This kernel smoothing is done by a discrete convolution of V, consisting of vectors v i , and the Gaussian kernel G: The resulting matrix V * , consisting of the vectors v * i , contains only the spatial information of the GC, because the convolution eliminates the localized neuronal pattern (Zhang et al., 2016). To reconstruct the waveforms of the GC of each channel the matrix V is replaced by the matrix V * in the SVD formula: The difference between S Global and S results in a matrix containing only data related to neuronal activity:

Channel Selection and Mean Value Calculation
Once data were preprocessed with either the BPF, the WLF, the BPF+GCR or the WLF+GCR approach, GLMs were fitted to the cleaned data using a Boynton canonical HRF (6 s delay). GLMs were fitted using the nilab2 toolbox (Rev. 2.0, NIRx Medizintechnik GmbH, Berlin, Germany). Thereupon, all data was epoched (−5 to 30 s around stimulus onset) and baseline corrected (−5 to onset) using custom MATLAB-based code. All steps were applied to both △[HbO] and △ [HbR]. Individual channel selection for statistical analysis was primarily based on the beta values derived from the GLM for the MI and ME pre-test phases and restricted to channels covering the right hemisphere, that is, the hemisphere contralateral to the active hand (cf. Figure 2). To select the individual channel, all contralateral channels were ranked according to their beta values separately for MI and ME (cf. Figure 3A). Then a second ranking was performed by averaging the ranks of ME and MI for each channel and the channel with the highest average rank was then chosen for further analyses (cf. Figures 3B,C). If there were two or more channels with the same average rank an additional criterion was applied. For this criterion, for each trial the mean value of the ±2 s window around the trials' peak was calculated (see Figure 3D).
The mean values were then averaged separately for MI and ME trials. The resulting mean concentration changes for ME and MI were multiplied, and the channel with the largest product was selected for further analyses. For the individually selected best channel, single trial mean values were extracted from a time window ranging from 4 to 20 s after stimulus onset. This window was selected because the hemodynamic response is not expected to peak before 4-6 s after stimulus onset due to the delayed nature of the signal and otherwise not task-related activity might influence the signal values used for statistics. Single trial mean concentration changes were derived as the mean value of the ±2 s window around the trial's peak. Single trial values were averaged separately for MI and ME trials.

Statistical Analyses
For all statistical analyses R (version 3.6.0 "Planting of a Tree"; R Core Team, 2017) together with RStudio (version 1.2.1335; RStudio Team, 2015) was used. The data was always visually checked for normally distributed residuals by means of a qqplot. In order to correct for violations of sphericity (ANOVA) the Greenhouse-Geisser correction method was used. As posthoc tests, Bonferroni corrected paired student's t-tests or pairwise t-tests (for parametric tests) as well as pairwise comparisons using Nemenyi multiple comparison test (for non-parametric tests) were applied. Additionally, test specific effect sizes were reported for all applied tests (i.e., Kendall's W, Cohen's d or generalized η 2 G ).

Temporal Consistency
To assess the quality of the neuronal signal S Neuronal , Zhang et al. (2016) where Oxy wave form and Deoxy wave form are vectors containing the grand average signal of the epoched data over all channels of interest with mean value removed. The β value describes the scalar to minimize the residual error ǫ. Zhang et al. (2016) suggested this error as a measure of temporal consistency and, this measure is predicted to decrease after GC removal. Additionally, the authors measured the spatial consistency by means of a similar model, and both measures showed FIGURE 3 | Schematic illustration of the channel selection based on the beta values (bars in A) for ME (blue) and MI (violet) of the pre-test phases. (A) Firstly, the channels were ordered due to the ranking of the beta values for ME and MI data separately, then (B) a second ranking was performed based on the mean rank of ME and MI ranks and finally (C), the channel with the best rank was selected (channel 13 in the example). In (D) the mean value calculation is exemplary visualized for both △[HbO] (pink) and △[HbR] (blue) data. Therefore, for each trial a mean value over the peak ±2 s was calculated.
significantly decreased residuals for the models fitted for S Neuronal (Equation 6) as compared to the models fitted for the signal S (Equation 9) (Zhang et al., 2016). This reduced ǫ indicates an enhanced temporal relationship between △[HbO] and △[HbR], interpretable as resulting from a cleaner signal. Temporal consistency was calculated as proposed by Zhang et al. (2016) for the different preprocessing approaches to judge preprocessing-related changes in data quality. In contrast to Zhang et al. (2016) no block averaged but single trial linear regression models were fitted separately for ME and MI data with △[HbO] and △[HbR] as the response variable and the explanatory variable (Equation 10). For all approaches, data for temporal consistency analysis were derived from the individual channels selected for the WLF+GCR preprocessed △[HbO] data because it was expected that this combination results in the cleanest signal and hence most accurate channel selection. For each subject and separately for ME and MI data, the residuals of the single trial regression models were extracted and a mean residual value was calculated. Note that for ME this average comprised all trials of pre-and post-test (ten trials in total) and for MI all trials of the training session after removal of the trials with significant EMG activity (17.72 ± 3.49 trials, ranging from 5 to 20 trials; for details of EMG analysis and EMG-based trial removal, see section 3 in Supplementary Material).
The mean of the single trial residuals were then used to test for differences in the temporal consistency of data derived with the four preprocessing approaches. Because the residuals of the data did not appear normally distributed a non-parametric repeated measures ANOVA (Friedman test) was conducted with the within-subject factor "approach" (BPF, WLF, BPF+GCR, WLF+GCR) and the mean residuals as response variable. Nonparametric tests are based on ranks which are in this case not informative, therefore in addition to mean ranks (± SEM), mean residuals (± SEM) are reported.

Spatial Specificity
Spatial specificity refers to whether the signals measured with a given imaging technique are clearly defined in space or smeared, where a clearly defined signal would be the desirable situation.
To quantify spatial specificity of △[HbO] and △[HbR], the mean of single trial correlation matrices and spatial variances were calculated and compared for the four preprocessing approaches. To this end, for each subject Spearman correlation matrices were calculated between the single trial hemodynamic data of all contralateral channels, separately for ME and MI as well as △[HbO] and △ [HbR]. Then two analyses were conducted.
For the first analysis, for each subject the single trial correlation matrices were averaged for each condition (ME and MI), signal type (△[HbO] and △[HbR]) and preprocessing approach (BPF, WLF, BPF+GCR, and WLF+GCR), resulting in 16 correlation matrices per subject. Then, for each of the 16 matrix types a mean correlation matrix over all subjects was calculated and normalized (Fisher Z transformation). The "cortest" function ("psych" package version 1.8.12) was used to test for differences between matrices resulting from the four preprocessing approaches. Matrices were expected to differ between BPF, WLF, BPF+GCR, and WLF+GCR as a result of improved artifact removal and, consequently, a more clearly defined hemodynamic signal. A significant difference between matrices is an indication that spatial specificity differs between approaches, it is however not unequivocal in this regard. That is, it could result from a spatially more clearly defined signal, but also from a spatially unspecific, general shift in correlations for one approach. This was addressed in the second analysis.
For the second analysis, the variances over the triangular matrix of the single trial correlation matrices was calculated. With this approach, a spatially unspecific signal would result in a smaller variance than a signal clearly defined in space. On the other hand, a spatially unspecific, general shift of correlations would not reduce the variance. For each participant and preprocessing approach, single trial variances were then averaged for ME and MI and △[HbO] and △[HbR], respectively. These values were used in four non-parametric repeated measures ANOVAs (Friedman test), all with the within-subject factor "approach" (BPF, WLF, BPF+GCR, WLF+GCR). Mean variances were expected to increase from BPF to WLF to BPF+GCR to WLF+GCR. Mean ranks (± SEM) and mean residuals (± SEM) are reported for these analyses. , 0] s) data of a single ME trials preprocessed by BPF and BPF+GCR approaches (BPF vs. BPF+GCR) and the subplots in the right column contain the data preprocessed by means of WLF and WLF+GCR approaches and the GC itself (WLF vs. WLF+GCR). For each subject, the first row contains the pre-test and the second row the post-test ME data. , 0] s) data of a single MI trials preprocessed by BPF and BPF+GCR approaches (BPF vs. BPF+GCR) and the subplots in the right column contain the data preprocessed by means of WLF and WLF+GCR approaches and the GC itself (WLF vs. WLF+GCR). For both subjects for a better visualization only the first 10 trials of the training session after EMG correction are shown.

Consequences for Experimental Outcomes
Removing artifacts from fNIRS data with more elaborate preprocessing pipelines has been shown to affect the outcome of a study (Pfeifer et al., 2018). To examine whether the same applies in the present study a 4 × 2 ANOVA with repeated measures was conducted for the within-subject factors "approach" (BPF, WLF, BPF+GCR, WLF+GCR) and "time on task" (pre, post [ME] or early, late [MI]), and the averaged single trial mean values of the concentration changes as response variable for ME and MI and signal type separately. With regard to the experimental factor "time on task" it was expected that concentrations would change from pre to post (ME) and from early to late (MI), reflecting practice effects in the finger tapping task. For ME, the levels of the "time on task" factor corresponded to the data collected in pre-and post-test phases respectively. In contrast, for MI the two levels of the "time on task" factor were defined as the first seven trials (early) and the last seven trials (late) out of a total 20 trials of the MI training session. Participants had to have at least two valid trials for each level (i.e., 2 out of 7) in order to be entered into the analysis. Due to the EMG-based epoch removal this criterion was not fulfilled in three out of 50 participants, yielding a sample of N = 47 for this analysis.

Temporal Consistency
In order to assess whether the temporal consistency between △[HbO] and △[HbR] differs between the preprocessing approaches, linear models were fitted for each trial for the individually selected channel (see section Temporal Consistency). Individual single trial residuals were averaged and Friedman tests with the within-subject factor "approach" (BPF, WLF, BPF+GCR, WLF+GCR) were conducted for ME and MI separately. Additionally, the mean (± SEM) ranks and residuals are represented for each method and ME and MI data.

Figures 4, 5 illustrate the △[HbO]
and △[HbR] data from two representative participants for ME (Figure 4) and MI (Figure 5) trials. Clearly, for both subjects the noise was highly reduced after applying the GCR approach in addition to each of the two filters in all trials. Furthermore, descriptively, after additionally removing the GC from the data the amplitudes of both △[HbO] and △[HbR] were greatly reduced as well as smoothed.
Results of the Friedman tests indicated for both MI and ME residuals a highly significant effect [ME: χ 2 (3) = 146.62, p < 0.00001, W = 0.89; MI: χ 2 (3) = 135.96, p < 0.00001, W = 0.87]. The subsequent pairwise comparisons using Nemenyi multiple comparison test resulted for all approaches and for both ME and MI in highly significant differences (p < 0.01).
Since the WLF+GCR approach was expected to result in the cleanest signal the channel selection was based on the data resulting from this approach. Against the expectations, the mean (± SEM) ranks and residuals were highest for WLF and lowest for BPF+GCR (cf. Table 1 for exact values), indicating that the highest temporal consistency resulted from the BPF+GCR approach. Hence, the analysis was repeated with the channel selection based on the BPF+GCR approach for which the same increments regarding the residuals were expected as found for the channel selection based on WLF+GCR.
This was confirmed by the results of the Friedman test which showed also a highly significant effect for both ME [χ 2 (3) = 142.01, p < 0.00001, W = 0.90] and MI [χ 2 (3) = 137.35, p < 0.00001, W = 0.88] data. The post-hoc tests showed highly significant differences in all comparisons (p < 0.01). As reported in Table 1 the highest residuals resulted from the WLF approach, followed by BPF and decreased further from WLF+GCR to BPF+GCR approach, indicating the strongest temporal consistency resulting from the BPF+GCR method.

Spatial Specificity
To examine whether the more elaborate approaches resulted in a higher spatial specificity, single trial Spearman correlations were computed between all contralateral channels separately for △[HbO] and △[HbR], ME and MI and the four preprocessing approaches. Afterwards, mean correlation matrices were calculated, normalized and used to test for differences between methods. Secondly, the variances of the single trial correlation matrices were averaged. These values entered the Friedman tests with "approach" (BPF, WLF, BPF+GCR, WLF+GCR) as within-subject factor. Figures 6, 7 show the mean magnitudes and mean correlations averaged over all trials for each channel and all subjects for each preprocessing approach and separately for ME and MI as well as △[HbO] and △[HbR] data. As illustrated in Figures 6A,B, 7A Figures 6, 7).
The mean correlation matrices are visualized in Figures 6,  7C,D. Note that the figures show non-normalized values for a clearer comparison. Descriptively, correlation matrices are spatially unspecific and highly similar for approaches BPF and WLF, whereas for approaches BPF+GCR and WLF+GCR, although also highly comparable, correlations seem much more spatially defined. This impression was confirmed by statistical tests.
For all mean correlation matrix comparisons of MI and ME and △[HbR] and △[HbO] data, the comparison of the correlation Each row corresponds to one of the preprocessing approaches. For both signal types only WLF+GCR and BPF+GCR approaches resulted in a higher spatial specificity indicated by both mean (Continued) FIGURE 6 | correlation matrices and mean magnitudes. The white numbers in the diagonal correspond to the channel number. Note that the order of the channels might differ between matrices due to the hierarchical clustering applied only for visualization ("ggcorrplot" function from the package "ggcorrplot"). Exact mean ± SEM Spearman correlation coefficients can be found in Figure S2 of the Supplementary Material. To represent the influence of the GC all channels are visualized in the channel maps although only channels of the right hemisphere were of interest. For a better comparison the channel maps of the mean magnitudes are underlaid with the corresponding ROIs (cf. Figure 2). matrices derived for the WLF and BPF as well as the comparison between BPF+GCR and WLF+GCR approaches did not result in significant differences. All other comparisons showed highly significant differences (all p < 0.00001; cf. Table 2). This pattern of results indicates on the one hand a strong influence of the GC on the spatial specificity of the signal and on the other hand, regarding the spatial specificity, no differences between the two applied filter types are evident. The results of all 24 correlation matrix comparisons can be found in Table 2.
A second set of analyses tested for differences in the variances of single trial Spearman correlations between all contralateral channels. Regarding the ME data, the results of the Friedman tests indicated highly significant differences for the factor "approach" for both △ For the ME data, subsequent pairwise comparisons using Nemenyi multiple comparison test showed highly significant results (p < 0.05) for all comparisons except for the comparison between BPF and WLF as well as between BPF+GCR and WLF+GCR approaches. Regarding the MI data, the post-hoc tests showed highly significant differences (p < 0.01) for all comparisons besides the comparison of BPF+GCR and WLF+GCR approaches. As showed in Table 3, the variances increased to a large degree from BPF to BPF+GCR and from WLF to WLF+GCR approaches but are highly comparable between WLF and BPF and WLF+GCR and BPF+GCR. This is in support of the conclusion that removing the GC has a particulary strong influence on spatial specificity.

Motor Execution: Time on Task Comparison
In order to compare the approaches with respect to experimental outcomes 4×2 ANOVAs with repeated measures and the withinsubject factors "approach" (BPF, WLF, BPF+GCR, WLF+GCR) and "time on task" (pre, post) were conducted.
The results regarding the △[HbO] data indicated a highly significant main effect of "time on task" [F (1,49) = 9.03, p < 0.01, η 2 G = 0.03] with a decreasing mean value from pre-test (3.53 ± 0.18µM) to post-test (2.91 ± 0.15µM). Furthermore, the main effect of "approach" [F (3,147) = 139.08, p < 0.00001, η 2 G = 0.34] as well as the interaction "approach:time on task" [F (3,147) = 7.84, p < 0.001, η 2 G = 0.006] were highly significant. The 4 × 2 ANOVA for △[HbR] indicated only a highly significant main effect for "approach" [F (3,147) = 83.54, p < 0.00001, η 2 G = 0.20]. Regarding the main effect of "approach, " the pairwise comparisons are documented in Tables 4, 5 and showed for all comparisons a highly significant difference (p < 0.00001) for △ [HbO] and △[HbR] data, respectively. To further explore the interaction for each preprocessing approach a paired t-test was conducted with △[HbO] as response variable and "time on task" (pre, post) as explanatory variable. Except for the BPF+GCR approach all t-tests showed significant differences with a general direction of a decrease from preto post-test (see Figure 8A). The size of the statistical effects decreased from WLF approach (|d| = 0.46) to the WLF+GCR approach (|d| = 0.36) to the BPF method (|d| = 0.28) to the to the BPF+GCR approach (|d| = 0.19).

Motor Imagery: Time on Task Comparison
For the MI data, 4 × 2 ANOVAs with the within-subject factors "approach" (BPF, WLF, BPF+GCR, WLF+GCR) and "time on task" (early, late) were performed to assess the effect of preprocessing approach on experimental effects.
For Neither the main effect of "time on task" nor an interaction between the two factors was significant for both signal types. This is illustrated in Figures 8C,D. Regarding both the △[HbO] and the △[HbR] data, all pairwise comparisons resulted in significant differences (p < 0.05).

Additional Exploratory Analyses
As evident from the statistical analyses and as illustrated in Figures 8B-D, "time on task" had no effect on the ME △[HbR] data as well as on the MI △[HbO] and △[HbR] data signal irrespective of preprocessing approach. Exploratory analyses were conducted to better understand this unexpected result. For the sake of completeness, although a main effect of "time on task" was evident for ME △[HbO] data, the exploratory analyses were also applied for this data.
A possible explanation of the lacking "time on task" effect could be that part of the subjects showed a decrease in signal over time, while the remaining subjects showed an increase, effectively averaging out any "time on task" effect at the group level. To test this idea, the sample was regrouped into a group with a signal increase ("increase") and a signal decrease group ("decrease"), separately for ME and MI, each preprocessing approach and for △[HbO] and △[HbR] data. This grouping was based on the difference of the mean signal value of pre-and post-test for ME and early and late data for MI, respectively. For ME and MI, each FIGURE 7 | correlation matrices and mean magnitudes. The white numbers in the diagonal correspond to the channel number. Note that the order of the channels might differ between matrices due to the hierarchical clustering applied only for visualization ("ggcorrplot" function from the package "ggcorrplot"). Exact mean ± SEM Spearman correlation coefficients can be found in Figure S2 of the Supplementary Material. To represent the influence of the GC all channels are visualized in the channel maps although only channels of the right hemisphere were of interest. For a better comparison the channel maps of the mean magnitudes are underlaid with the corresponding ROIs (cf. Figure 2). ) a 2 × 2 mixed repeated measures ANOVA with within-subject factor "time on task" [pre, post (ME) or early, late (MI)] and between-subjects factor "signal change" (increase, decrease) was conducted. All test results, effect sizes and group sizes are given in Tables 4-7.

Motor Execution
Regarding the ME data, the output of the ANOVAs are documented in Tables 4, 5. Overall, the effect of most interest was the interaction "signal change:time on task" which was for both △[HbO] and △[HbR] data and all preprocessing approaches highly significant (p < 0.00001; cf. Tables 4, 5). To further explore this interaction, paired t-tests were conducted for both △[HbO] and △[HbR] data and each preprocessing approach separately in order to find a potential "time on task" effect for the signal change subgroups. As illustrated in Figures 8E,F, the results of all applied t-tests showed highly significant differences between pre-and post-test data (p < 0.001).
For the △[HbO] data and all preprocessing approaches, the "decrease" signal change group comprised the larger amount of subjects as compared to the "increase" signal change group (△N = 6 for BPF+GCR and △N = 18 for all other approaches) which might be the reason for the main effect of "time on task" of the primary analysis (cf. section Motor Execution: Time on Task Comparison), which resulted in a signal decrease from preto post-test. For the △[HbR] data the size of the subgroups are more comparable (△N : 2-8) and is likely to be the reason for the lacking "time on task" effect in the main analysis (cf. section Motor Execution: Time on Task Comparison).
For all subgroups of both △[HbO] and △[HbR] data and each preprocessing approach the effect sizes always decreased from BPF to BPF+GCR and from WLF to WLF+GCR.

Motor imagery
In terms of the MI data, neither a significant main effect of "signal change" nor of "time on task" was evident. However, highly significant interactions between the two factors were found for all four preprocessing approaches (p < 0.0001; cf. Tables 6, 7).
In order to explore the significant interactions, paired t-tests were performed to test for differences between early and late MI data in the respective subgroups. The results are summarized in Tables 6, 7 and visualized in Figures 8G,H. All comparisons showed a highly significant difference between early and late MI trials (p < 0.001).
As for the ME △[HbR] data, for both the MI △[HbO] and △[HbR] data and all preprocessing approaches, the size of the subgroups are comparable (△N : 1-7) and might be also here the reason for the lacking main effect in the primary analysis (cf. section Motor Imagery: Time on Task Comparison).
Regarding the effect sizes, for both △[HbO] and △[HbR] data the effect sizes decreased in both subgroups from BPF to BPF+GCR approaches. For both subgroups of the △[HbR] data there was also a decrease in effect size from WLF to 3 | Spatial Specificity: Statistical results from the pairwise comparison of both the motor execution and the motor imagery data as well as mean ± SEM for variances and ranks for each signal type and preprocessing approach. WLF+GCR approaches. However, for the △[HbO] data only the "increase" subgroup showed a decrease in effect size from WLF to WLF+GCR whereas the effect size of the "decrease" group minimally increased. Taken together, the results of the exploratory analyses indicate that even though no experimental effect is evident for both the △[HbO] and △[HbR] data of the MI training phase as well as the △[HbR] data of the ME tests when considering the whole sample, significant experimental effects are evident for "signal change" subgroups. Because these experimental effects are of opposing direction they are annulled in the sample average. However, no clear pattern emerged with regard to the relationship between preprocessing approach and size of the "time on task" effect but, except for the comparison of the "decrease" group of the ME △[HbO] data, a decrease of effect size from one approach (BPF or WLF) to its more elaborate version (BPF+GCR or WLF+GCR) is evident. Notably, subgroup assignment was influenced by preprocessing approach. That is, at least for some participants the preprocessing approach determined whether △[HbO]/△[HbR] magnitudes increased or decreased as a function of "time on task."

DISCUSSION
The present study investigated the effects of four fNIRS preprocessing approaches on single trial data by means of temporal consistency between △[HbO] and △[HbR], and of spatial specificity of either signal. In addition, experimental outcomes were compared between preprocessing approaches. The preprocessing approaches chosen were conventional bandpass filtering (BPF), a wavelet MDL detrending filter (WLF; Jang et al., 2009) and the combination of each approach with global component removal (WLF+GCR; Zhang et al., 2016 and BPF+GCR). It was expected that the more elaborate approaches lead to a higher signal quality and consequently in a higher temporal consistency between △[HbO] and △[HbR] and a greater spatial specificity of the signals. Experimental effects were "Time on task" F (1,49) = 9.03, "Approach" F (3,147) = 139.08, "Time on task:approach" F (3,147) = 7.84,     expected to be reduced for the more elaborate preprocessing approaches as compared to BPF and WLF alone.

Temporal Consistency
Descriptively, and as illustrated in Figures 4, 5, against the expectations, for the WLF approach, in most cases signal amplitude and morphology were quite similar to that resulting from the BPF approach. However, as hypothesized after the combination of each filtering method with the GCR method, the signal amplitude was greatly reduced as compared to the BPF and WLF alone (cf. Figures 4, 5). Furthermore, multiple peaks were strongly suppressed, suggesting an origin in evoked systemic activity. Temporal consistency was statistically assessed based on the residuals of the linear models fitted for △[HbO] and △[HbR] data, where a small residual would indicate clean data and high signal quality (Zhang et al., 2016). It was hypothesized that the WLF method results in a higher temporal consistency as compared to the BPF method, consequently a further increase in temporal consistency was expected from BPF+GCR to WLF+GCR. This was not confirmed by means of the statistical results. Although, for both the ME and the MI data, the mean residuals resulting from preprocessing with the BPF and WLF approach were nearly twice as high as those resulting from the BPF+GCR and WLF+GCR approaches (cf . Table 1), the temporal consistency decreased from BPF to WLF and from BPF+GCR to WLF+GCR. Overall and independent of whether the channel selection was based on BPF+GCR or WLF+GCR, the lowest residuals resulted from the BPF+GCR approach which was also confirmed by statistical outcomes.
For block averaged data and using the WLF+GCR approach only, Zhang et al. (2016) proposed that a decrease in residuals reflects higher data quality. The results of the present study indicate that both BPF+GCR and WLF+GCR strongly reduce residuals also when applied to single trial data, and, following the notion of reduced residuals reflecting higher data quality, specifically BPF+GCR can be assumed to significantly improve single trial signal quality.

Spatial Specificity
Spatial specificity was quantified by means of (contralateral) channel correlation matrices and (contralateral) channel variance comparisons. Mean normalized correlation matrices differed significantly between all preprocessing approaches of both MI and ME and △[HbO] and △[HbR] data except for the comparison of BPF and WLF and of BPF+GCR and WLF+GCR approaches. Whereas for the BPF and WLF approaches nearly all contralateral channels were highly positively correlated in spite of considerable distances between the channels, descriptively, for the BPF+GCR and WLF+GCR approaches only adjacent channels formed positive correlation clusters, and negative and zero-correlation clusters appeared to be present as well (cf. Figures 6, 7). Variance increased significantly from BPF to BPF+GCR and from WLF to WLG+GCR, indicating that channel signals became increasingly independent from one another. However, likewise in these analyses no difference between BPF and WLF and between BPF+GCR and WLF+GCR approaches for the ME data and between BPF and WLF approaches in the MI data were evident. These results were unexpected in that it was hypothesized that WLF resulted in higher spatial specificity as compared to BPF, followed by BPF+GCR to WLF+GCR.
Taken together, differences in correlation matrices and the increase in variance support the conclusion that the removal of the global component results in a more specific signal. Furthermore, these results indicate no difference in terms of spatial specificity between BPF and WLF approaches.
The increased spatial specificity in particular for the BPF+GCR and WLF+GCR preprocessed data was clearly evident in the mean activation maps (cf. Figures 6, 7). Here, for △ [HbO] and △[HbR] the contralateral channels with the largest signals (respectively 12, 17, 21, and 22; and 13, 17, 18, and 22) covered supplementary motor and premotor areas (SMA/PMC) and primary motor areas (M1). This is in line with typical activation patterns for sequential tasks found in MI and motor learning research (Dayan and Cohen, 2011;Mizuguchi and Kanosue, 2017).
Interestingly, a high influence of the GC on spatial specificity was evident for both △[HbO] and △[HbR] data. This is in contrast to previous findings (Yücel et al., 2015;Zhang et al., 2016) where it has been suggested that mostly △[HbO] data are affected by global component activity. At present, the cause of this discrepancy is not clear, but our results should caution against relying on uncorrected △[HbR] data in the hope that here the GC poses no problem.

Consequences for Experimental Outcomes
For both ME and MI and respectively △[HbO] and △[HbR], the data resulting from the four preprocessing approaches differed highly significantly from each other. In addition, regarding the ME data, for the △[HbO] data a significant "time on task" effect was evident as well as an significant interaction between the effects of "preprocessing approach" and "time on task." The largest effect size for "time on task" was found for the WLF approach, followed by the WLF+GCR and then the BPF and BPF+GCR approaches. Concerning the MI data, the experimental factor "time on task" was neither associated with a significant main effect nor a significant interaction. Subsequent exploratory analyses performed on "signal change"based subsamples indicated that "time on task" effects were present and that they varied between preprocessing approaches. However, the expected decrease in effect size of the "time on task" effect from BPF to BPF+GCR and from WLF to WLF+GCR could not be reliably observed across all subsamples.
The pattern observed for ME data indicates that for an experimental task where a strong and reliable hemodynamic response can be expected, experimental outcomes will likely be more affected by artifactual sources when based on △[HbO] as compared to △[HbR] data, as reported in previous studies (Zhang et al., 2016;Pfeifer et al., 2018). Yet when responses are smaller and perhaps less reliable as for MI the present results suggest that this might not hold true, as experimental effects varied with preprocessing approach both for △ [HbO] and △[HbR]. This variation indicates that the presumed insensitivity of △[HbR] to artifacts might be a misconception. However, MI results were derived from exploratory analyses and rely on a much smaller sample than those obtained for ME, hence future research is needed to consolidate this observation.

Constraints and Conclusions
The results of the present study are object to a number of constraints. Regarding the GCR approach (Zhang et al., 2016) a possible issue is the optode coverage, which the authors of the approach recommended to be "much larger than the expected pattern of activity" (Zhang et al., 2016) due to the large-sized kernel. More specifically, they indicate the coverage to be 9 cm or more. Though total coverage in the present study was not as extensive as in Zhang et al. (2016), the subset of channels selected for GCR and hence relative coverage was comparable between the studies and fit the 9 cm criterion. As visualized in Figures 6C,D, 7C,D, applying the GC removal results for some of the frontal channels in a reversed hemodynamic response. Reversed response patterns have been reported before for motor imagery tasks (Holper et al., 2011;, and were for instance interpreted as a sign of motor inhibition due to a reduction of activation of the underlying brain regions . Alternatively, these negative responses could also be due to the fact that the SVD algorithm is not able to account for local differences within the global physiological changes. Hence, it offers a good but not perfect estimation of the systemic activity and therefore the results should always be considered with care. GCR, in combination with either WLF and BPF, might be an option not only for offline analyses of averaged and single trial data, but also for fNIRS online implementations such as intermittent neurofeedback. Intermittent neurofeedback describes the situation where, in contrast to continuous neurofeedback, feedback on brain responses is given only following the end of a trials' task period, giving room for some online signal correction. Besides, regarding nowadays' computer hardware, also the implementation of these approaches in a continuous online experiment should be possible since these approaches are not computationally demanding. Before being used for this purpose, the approach should however be validated online, in line with a recent proposal regarding the validation of methods and approaches intended for EEG-based braincomputer interfaces (Lotte et al., 2018).
In single-trial analyses in general and in neurofeedback or BCI setups specifically, clean signals are of particular relevance to ensure reliable measurements and, for neurofeedback, to ensure that a functionally specific modulation of brain activity can be achieved. The results of the present study show that for ME and MI both the WLF+GCR and the BPF+GCR are promising preprocessing approaches to obtain cleaner signals, as it was consistently linked to single trial signal improvement in terms of temporal consistency and spatial specificity. More generally, this studies' results highlight the tremendous need for an agreement on appropriate preprocessing pipelines in fNIRS. Notably, for other tasks which might not result in a strong hemodynamic response, as is typically found in motor tasks, a proper correction is not guaranteed.
Notwithstanding, the major constraint in the present study is the lack of a validation by means of a short-distance channel approach, that would allow to measure and remove the extracerebral component of the GC, and similarly, a means of validation the removal of the cerebral component of the GC. Thus, our results, similar to those of others (Zhang et al., 2016;Pfeifer et al., 2018), should be considered tentative until corresponding data sets and analyses are available.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The study was approved by the Ethics Committee of the University of Oldenburg.

AUTHOR CONTRIBUTIONS
FK and CK designed the study. FK collected and analyzed the data as well as wrote the first draft of the manuscript. CK supervised FK and edited as well as commented the manuscript in all of its versions.