An Information-Theoretic Approach to Quantitative Analysis of the Correspondence Between Skin Blood Flow and Functional Near-Infrared Spectroscopy Measurement in Prefrontal Cortex Activity

Effect of Skin blood flow (SBF) on functional near-infrared spectroscopy (fNIRS) measurement of cortical activity proves to be an illusive subject matter with divided stances in the neuroscientific literature on its extent. Whereas, some reports on its non-significant influence on fNIRS time series of cortical activity, others consider its impact misleading, even detrimental, in analysis of the brain activity as measured by fNIRS. This situation is further escalated by the fact that almost all analytical studies are based on comparison with functional Magnetic Resonance Imaging (fMRI). In this article, we pinpoint the lack of perspective in previous studies on preservation of information content of resulting fNIRS time series once the SBF is attenuated. In doing so, we propose information-theoretic criteria to quantify the necessary and sufficient conditions for SBF attenuation such that the information content of frontal brain activity in resulting fNIRS times series is preserved. We verify these criteria through evaluation of their utility in comparative analysis of principal component (PCA) and independent component (ICA) SBF attenuation algorithms. Our contributions are 2-fold. First, we show that mere reduction of SBF influence on fNIRS time series of frontal activity is insufficient to warrant preservation of cortical activity information. Second, we empirically justify a higher fidelity of PCA-based algorithm in preservation of the fontal activity's information content in comparison with ICA-based approach. Our results suggest that combination of the first two principal components of PCA-based algorithm results in most efficient SBF attenuation while preserving maximum frontal activity's information. These results contribute to the field by presenting a systematic approach to quantification of the SBF as an interfering process during fNIRS measurement, thereby drawing an informed conclusion on this debate. Furthermore, they provide evidence for a reliable choice among existing SBF attenuation algorithms and their inconclusive number of components, thereby ensuring minimum loss of cortical information during SBF attenuation process.

Effect of Skin blood flow (SBF) on functional near-infrared spectroscopy (fNIRS) measurement of cortical activity proves to be an illusive subject matter with divided stances in the neuroscientific literature on its extent. Whereas, some reports on its non-significant influence on fNIRS time series of cortical activity, others consider its impact misleading, even detrimental, in analysis of the brain activity as measured by fNIRS. This situation is further escalated by the fact that almost all analytical studies are based on comparison with functional Magnetic Resonance Imaging (fMRI). In this article, we pinpoint the lack of perspective in previous studies on preservation of information content of resulting fNIRS time series once the SBF is attenuated. In doing so, we propose information-theoretic criteria to quantify the necessary and sufficient conditions for SBF attenuation such that the information content of frontal brain activity in resulting fNIRS times series is preserved. We verify these criteria through evaluation of their utility in comparative analysis of principal component (PCA) and independent component (ICA) SBF attenuation algorithms. Our contributions are 2-fold. First, we show that mere reduction of SBF influence on fNIRS time series of frontal activity is insufficient to warrant preservation of cortical activity information. Second, we empirically justify a higher fidelity of PCA-based algorithm in preservation of the fontal activity's information content in comparison with ICA-based approach. Our results suggest that combination of the first two principal components of PCA-based algorithm results in most efficient SBF attenuation while preserving maximum frontal activity's information. These results contribute to the field by presenting a systematic approach to quantification of the SBF as an interfering process during fNIRS measurement, thereby drawing an informed conclusion on this debate. Furthermore, they provide evidence for a reliable choice among existing SBF attenuation algorithms and their inconclusive number of components, thereby ensuring minimum loss of cortical information during SBF attenuation process.
Keywords: functional near-infrared spectroscopy, skin blow flow, transfer entropy, conditional entropy, frontal cortex activity

INTRODUCTION
Functional near-infrared spectroscopy (fNIRS) time series of brain cortical activity is subject to various systemic physiological interferences, ranging from cardiac activity and respiration (Orbig et al., 2000;Tonorov et al., 2000;Payne et al., 2003) to motion artifacts Tak and Ye, 2014;Naseer and Hong, 2015) and scalp-hemodynamic (Scholkmann et al., 2014). While detection and correction strategies for most of these physiological and systemic interferences are widely practiced, effect of scalp-hemodynamic and skin blood flow (SBF) has been subject to much divided perspective. Whereas, Takahashi et al. (2011) argued that a major part of task-evoked changes in blood oxygenation hemoglobins ( Oxy-Hb) in the forehead is due to SBF, Sato et al. (2013) claimed that such changes are highly correlated with blood oxygen level dependent (BOLD) of functional magnetic resonance imaging (fMRI) and gray matter, as opposed to SBF or other soft tissues. This, in part, helps explain the lack of incorporation of analytical steps for SBF attenuation in most recent studies (Liand et al., 2010;Cuiand et al., 2011;Gagnonand et al., 2012;Fishbum et al., 2014;Ozawa et al., 2014;Bakerand et al., 2016;Guand et al., 2017;Liu et al., 2017) as well as NIRS-related analytical toolkits (Koh et al., 2007;Huppertand et al., 2009;Strangmann, 2009;Ye et al., 2009;Feketeand et al., 2011).
However, research suggests that systemic changes due to SBF are unpreventable since they are the result of the activation of the autonomic nervous system and/or varying blood pressure in response to action (Lee et al., 2002;Scremin and Kenney, 2004). In fact, Sato et al. (2016) argued that scalp-and cerebralhemodynamic (in particular Oxy-Hb) increase in a taskrelated manner, implying their similar temporal profiles. In light of these observations, the field has witnessed a growing number of research on SBF attenuation approaches. One class of SBF filters has considered the direct measurements from short source-detector distances (e.g., ≤ 2.0 cm) to attenuate the effect of scalp-hemodynamic on channels with longer distance (e.g., 3.0 cm ≤ d ≤ 4.5 cm) (Zhang et al., 2009;Gagnon et al., 2011Gagnon et al., , 2012Gagnon et al., , 2014Saager et al., 2011;Haeussinger et al., 2014), where d stands for source-detector distance. Another family of such filters has been built upon the assumption that scalp-hemodynamic changes are more global than cerebralhemodynamic, thereby introducing mathematically founded techniques for SBF removal. For instance, Zhang et al. (2005) assumed the orthogonality between the spatial interference and the spatial evoked activation subspaces to derive principal components from resting periods, thereby eliminating such components from task period as representatives of SBF. This approach was further extended by Zhang et al. (2016) through application of Gaussian filtering. On the other hand, Kohno et al. (2007) employed independent component analysis (ICA) to extract the most spatially uniform component of Oxy-Hb. Furthermore, they showed their extracted component was highly correlated with SBF through its comparison with simultaneous laser Doppler measurement (Johnson et al., 1984;Oberg, 1990). Kiguchi and Funane (2014) further extended this ICA-based approach for its online use. Sato et al. (2016) argued that although these filters offer reliable and accurate tool for SBF attenuation, their reliance on considerably large number of probes for data acquisition makes their application infeasible. In turn, they addressed this limitation through identification of the scalphemodynamic component from a small number of short sourcedetector channels, thereby removing this effect based on general linear model (GLM) (Mardia et al., 1979;Friston et al., 1995Friston et al., , 2011. Their approach showed a significant improvement in contrast with Zhang et al. (2005) and Kohno et al. (2007), as suggested by their fMRI-based comparative analysis.
Although our overview of research on scalp-hemodynamic and SBF demonstrates fascinating approaches with impressive results, it reveals an immediate shortcoming that is lack of quantitative realization of the utility of these methodologies for their verification toward a common consensus on their use. For instance, whereas Zhang et al. (2005) claimed the first three principal components for adequate SBF attenuation, Sato et al. (2016) suggested a non-significant difference in adaptation of first or combination of first two or three components. More importantly, these approaches fall short in assessing the state of fNIRS time series once the SBF attenuation process is complete. For example, Kohno et al. (2007) showed a high correlation between their selected most spatially uniform component with SBF signal while discarding a report on fNIRS content once this component was removed. Although a few have included measures such as correlation coefficient, signal-to-noise ratio, or Pearson R 2 (Zhang et al., 2005;Gagnon et al., 2011), these linear measures fail to detect the nonlinearity in interacting processes (Kinney and Atwal, 2014). Moreover, they cannot provide any causal insight on the observed effect: they are unable to quantify whether they attenuated the confounding effect of the interfering process or information pertinent to cortical activity. In addition, many of these results have derived the quality of their SBF attenuation through comparison of fNIRS data with fMRI recordings (Haeussinger et al., 2014;Sato et al., 2016). Although high spatial resolution of fMRI along with significant correlation between its BOLD and corresponding Blood de-/Oxygenation of fNIRS (Okamoto et al., 2004;Steinbrink et al., 2006;Strangman et al., 2006;Toronov et al., 2007;Cuiand et al., 2011) provide a basis for such comparative analyses, they are highly time-consuming and require extra care to ensure the least environmental and experimental discrepancies, making their adaptation impractical in a broader domain.
In this article, we address these shortcomings through proposal of information-theoretic criteria for preservation of information content of frontal brain activity. We utilize the concept of transfer entropy (TE) (Schreiber, 2000) to quantify the effect of SBF on fNIRS time series of frontal brain activity via transferring undesired information onto fNIRS measurement. TE provides a powerful tool for measuring the strength and direction (i.e., causation) of the coupling between simultaneously observed processes (Kaiser and Schreiber, 2002). Utilization of TE as a measure for information flow becomes more attractive, considering its minimal assumptions on dynamics of the time series under investigation, its numerical stability even for reasonably small sample sizes, and its ability in capturing both, linear as well as nonlinear effects (Lungarella and Sporns, 2006). In fact, recent years have witnessed a growing interest in application of TE in neuroscientific research (Lungarella and Sporns, 2006;Honey et al., 2007;Vakorin et al., 2010;Liao et al., 2011). Additionally, we exploit the concept of mutual information (MI) and its correspondence with conditional entropy between interacting continuous random variables (Cover and Thomas, 2006;Stone, 2015) to formalize criteria for preservation of the frontal activity's information in resulting fNIRS time series once the SBF attenuation is complete.
Our results suggest that mere reduction of SBF influence on fNIRS time series of frontal activity is insufficient to warrant frontal activity's information is retained. Moreover, they imply a higher fidelity of PCA-based algorithm in frontal activity's information preservation in comparison with ICAbased approach. Furthermore, they indicate that a combination of first two principal components of PCA-based algorithm results in most efficient SBF attenuation while ensuring maximum frontal activity's information preservation. Our results contribute to the field by presenting a systematic approach to quantification of the SBF and its effect as an interfering process during fNIRS measurement, thereby drawing an informed conclusion on this debate. Furthermore, they provide evidence for a reliable choice among existing SBF attenuation algorithms and their inconclusive number of components, thereby ensuring minimum loss of frontal activity's information during SBF attenuation process.

Preliminaries
Prior to formalizing our information-theoretic criteria, we restate the definitions of conditional entropy, MI, and TE, along with two Theorems and a Corollary (without proofs) from information theory to help better elaborate on rationale behind our criteria.
Definition 2.1. If (X , Y) ∼ f (x, y), ∀x ∈ X , y ∈ Y, the conditional entropy H(X |Y) is defined as Cover and Thomas (2006)[p. 249] H(X |Y) = − f (x, y)log(f (x|y))dxdy ( 1) where f (x, y), ∀x ∈ X , y ∈ Y indicates the joint density and f (x|y) is the probability of occurrence of x, given that y occurred. In other words, conditional entropy quantifies the average uncertainty regarding the value of X when the value of Y is known (Stone, 2015).
Definition 2.2. The mutual information I(X ; Y) between two random variables with joint density f (x, y), ∀x ∈ X , y ∈ Y is defined as Cover and Thomas (2006)[p. 251] Mutual information can be expressed in terms of conditional entropy as (ibid.): with H(.) representing the entropy of its argument and H(X |Y) and H(X , Y) are the conditional entropy and the joint entropy between X and Y, respectively. On the other hand, transfer entropy (Schreiber, 2000) aims at extracting directed flow or transfer of information (Lungarella and Sporns, 2006) between interacting processes. In essence, TE quantifies the deviation from generalized Markov property p(x t+1 |x t , y t ) = p(x t+1 |x t ), ∀x t , x t+1 ∈ X , y t ∈ Y, with p(x|y) being the probability of occurrence of x, given y occurred. TE is expressed as a specific version of Kullback-Leibler divergence (Cover and Thomas, 2006;Stone, 2015) i.e., the relative entropy (Lungarella and Sporns, 2006): One can also identify TE as a conditional MI (i.e., a causal inference on shared information) between two interacting processes: If this deviation is small, then the state of Y is assumed to have minimal or no relevance on the transition probabilities of X (Lungarella and Sporns, 2006), thereby implying an absence and/or a non-significant effect of Y on X . It is worthy of note that unlike MI, TE is explicitly and strictly non-symmetric under exchange of the role of the interacting processes (Kaiser and Schreiber, 2002). In other words, Theorem 2.1. (Conditioning reduces entropy): For any two random variables X and Y, we have (Cover and Thomas, 2006, p. 41 and p. 253) with equality if and only if X and Y are independent.
then (Cover and Thomas, 2006, p. 34) Corollary 2.1. In particular, if Z = g(Y), we have (Cover and Thomas, 2006, p. 35) Concretely, Data Processing Inequality (DPI), as formulated in Equations (7,8), states that the result of any manipulation of data cannot improve the inferences that are made from the data (Cover and Thomas, 2006;Kinney and Atwal, 2014). In other words, no matter how sophisticated a processing approach is, it inevitably results in loss of information. Implication of DPI in SBF attenuation is 2-fold: (1) it implies that no matter how well-defined an SBF attenuation process is, loss of information is inherent in its steps.
(2) Therefore, it is of utmost cruciality to ensure such a loss maximally reflects the undesirable information induced by SBF than actual frontal brain activity.
In what follows, we utilize the aforementioned observations from information theory to formalize our criteria. In essence, Criterion 2.1 through Criterion 2.3 provide necessary and sufficient conditions for quantification of the ability of SBF attenuation algorithms in reducing effect of SBF on fNIRS time series of frontal brain activity. On the other hand, Criterion 2.4 signifies utility of such algorithms in preservation of the frontal brain activity's information in resulting fNIRS times series. Finally, Criterion 2.5 examines whether preservation of the cortical activity in resulting fNIRS time series is maximized by adapted SBF algorithms.

The Criteria
Let X and X ′ represent the fNIRS time series of cortical activity before and after application of an SBF attenuation algorithm. Furthermore, let Y be the time series, representing SBF. The main premise of any SBF attenuation algorithm is its effectiveness for reducing the impact of SBF on fNIRS time series of cortical activity of human subjects.
Criterion 2.1. Transferred information from SBF to fNIRS times series is significantly reduced if adapted attenuation process is effective i.e., TE(Y → X ′ ) ≤ TE(Y → X).
Criterion 2.1, in turn, implies that given different SBF attenuation algorithms, the one with significantly smaller TE(Y → X ′ ) is more effective in attenuation of SBF influence on fNIRS time series of frontal brain activity in comparison with other SBF attenuation algorithms.
Furthermore, if X primarily represents the frontal activity that is partially contaminated by SBF, then attenuation of SBF must, in principle, have no effect on ability of X to explain X ′ in absence of Y. This, in turn, implies that Y → X → X ′ holds true (as per Theorem 2.2) and X ′ = g(X) i.e., X ′ is expressible as a function of X. We utilize this observation in conjunction with Corollary 2.1 to formalize our second criterion.
Criterion 2.2. fNIRS time series prior to SBF attenuation must, in principle, have more in common with Y than after SBF attenuation is complete i.e., I(Y; X) ≥ I(Y; g(X)). Using Theorem 2.2 and applying Equation (3), we get ⇒ H(Y|X) ≤ H(Y|X ′ ) As a direct consequence of Criterion 2.2, an SBF attenuation algorithm with significantly larger H(Y|X ′ ) is more effective, in comparison with any other such algorithms. Criterion 2.1 and Criterion 2.2 provide necessary conditions to validate that outcome of an adapted SBF attenuation algorithm is successful in reducing the correspondence between SBF and fNIRS time series of frontal brain activity. However, they are not sufficient conditions for quantification of significance of such a reduction since H(Y|X ′ ) = H(X ′ |Y), ∀X ′ , Y. Therefore, it is necessary to validate the sufficient condition of such an algorithm in reducing the SBF effect. Criterion 2.3. Y must, in principle, be more informative about fNIRS before than after attenuation is complete i.e., H(X|Y) ≤ H(X ′ |Y) As a result of Criterion 2.3, if an SBF attenuation algorithm is more effective, its application must result in significantly larger H(X ′ |Y) in comparison with other such algorithms.
Moreover, an effective SBF attenuation algorithm ensures the preservation of cortical activity while reducing the undesirable effect of SBF, thereby resulting in higher mutual information between X and X ′ than X and Y. We formulate this expectation through the fourth criterion.
Criterion 2.4. X ′ realizes the information content of X more than Y does i.e., I(X; X ′ ) ≥ I(X; Y). Applying Equation (3), this is equivalent to Considering Criterion 2.4, an effective SBF attenuation algorithm results in significantly smaller H(X|X ′ ) in comparison with other SBF attenuation algorithms. Criterion 2.4 forms a necessary condition to ensure that the acquired time series is primarily due to cortical activity that is affected by SBF than vice-versa, as demonstrated through the following Theorem.
Theorem 2.3. Let X, C, and Y represent recorded fNIRS, actual cortical activity, and SBF such that X = C + Y. Then, Criterion 2.4 is a necessary condition if C is reflective of cortical activity.
Proof: Taking derivative of X = C + Y with respect to C (i.e., variable of interest), we have: In addition, the entropy of a transformed random variable X is Stone (2015, p. 119): Substituting Equation (11) in Equation (12), we have: Frontiers in Neuroscience | www.frontiersin.org which, in turn, implies that and therefore, Considering the fact that I(X; C) ≥ 0, ∀X, C, we get which implies that the interval within which X lies i.e., X ∈ [a, b] must satisfy b − a ≥ 1 since b − a < 1 ⇒ H(X) < 0 2 . This observation along with Theorem 2.1 indicate that In addition, SBF attenuation must ensure that attenuated effect is maximally reflective of Y than X, thereby inducing minimum loss of information content of X once SBF attenuation is complete.
Criterion 2.5. Preserved information content of X in X ′ is maximized if attenuation primarily reduces the effect of SBF. Using Theorem 2.1, we have: and Considering Equations (19, 20) and applying Criterion 2.3 and Criterion 2.4, we get: Criterion 2.5, in turn, indicates that an SBF algorithm that significantly (i.e., in comparison with other such algorithms) maximizes the inequality H(X ′ ) − H(X|X ′ ) ≥ 0 also attains the maximum frontal activity's information preservation in the resulting fNIRS time series.
2 Continuous random variables can have negative entropy. For instance, H(X) < 0 if σ 2 X < 1 2π e or µ X < 1 e in case of normal and Poisson distributions.

Simulation-Based Verification
We simulated (Figure 2) four random time series of length 3,000 out of which 600 and 2,500 data points were used as resting and task periods, respectively. We considered four time series to adhere with the number of channels in adapted fNIRS device in this study. In addition, ICA-and PCA-based algorithm require more than a single sequence for their components' calculation. The reason behind 600 and 2,500 data points split was due to the requirement of resting data by PCA-based SBF attenuation algorithm (Zhang et al., 2005) to calculate the global trend in given time series (e.g., global hemodynamic responses such as SBF) in the form of principal components. We chose 600 data points to mimic the 1-min-long resting period in our realtime experimental settings (i.e., 60 s × 10.0 Hz sampling rate of our fNIRS device). This, in turn, resulted in the use of 2,500 data points during the PCA-and ICA-based algorithms analyses. We repeated this random time series simulation for fifty rounds (M = 5.76, SD = 18.61). Furthermore, we simulated SBF as a Gaussian noise with the same length i.e., 3,000 data points (50 different series one for each round of simulated time series above) with their mean and standard deviation equals 3 times the corresponding simulated channels time series mean and standard deviation (M = 15.56, SD = 43.25). This noise was also added to the first 600 data points of each of the simulated sequences (i.e., simulated resting portion for PCA-based SBF attenuation) to mimic the global effect of SBF. We report the averaged analysis results of these fifty simulation rounds for simulation-based verification of Criterion 2.1 through Criterion 2.5.

Participants
We conducted three experiments, namely, verbal fluency task (referred to as VFT hereafter), conversation task experiment (referred to as CTE hereafter), and logical memory test (LMT hereafter). Two different groups of 20 (10 females and 10 males, M = 70.20, SD = 3.78) and 18 (6 females and 12 males, M = 72.31, SD = 4.16) older adults participated in VFT and CTE. On the other hand, LMT included thirty two (sixteen females and sixteen males, M = 20.50, SD = 1.80) younger adults. We were unable to record fNIRS data from two participants during VFT and six participants in LMT. Therefore, these individuals were excluded from our analyses.
Choice of VFT was to ensure the correctness as well as effectiveness of our criteria in quantification of the SBF attenuation and frontal activity's information preservation in a fine-grained working memory task. On the other hand, CTE allowed us to evaluate our analysis results in a naturalistic setting. Last, we included LMT to verify our results are unaffected by age of the participants.
All participants were right-handed (confirmed using FLANDERS (Nicholls et al., 2013) handedness questionnaire), were free of neurological and psychiatric disorders, and had no history of hearing impairment. Prior to the data collection, we received approval (approval code: 16-601-1) from the ethical committee at the Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan. All subjects gave written informed consent. Subjects were seated on an easy armchair in the sound-attenuated experimental room, with instructions to fully relax and their eyes closed while resting.

VFT
We adapted the protocol by Takahashi et al. (2011) in their study of SBF effect on fNIRS time series. It consisted of two blocks. Each block consisted of a 6-s-long word generation. The word generation period was preceded with 30 s of silence period and followed by 70 s of control periods. During the word generation period, participants had to generate as many words as possible that started with the syllable that was auditorily presented every 20 s. In the control period, participants were instructed to repeat five syllables: /a/, /i/. /u/, /e/, and /o/. This task was also used by Sato et al. (2016) in their comparative fMRI-fNIRS analysis of reduction of SBF interference on fNIRS time series of frontal brain activity using their proposed GLM-based approach.
We started each experiment by acquiring a 1-min-long resting data, followed by its corresponding VFT session. We provided our participants with a 1-min-long resting break (while staying at their seat with their eyes closed) prior to the commencement of the experiment. We kept the content of the two blocks intact. However, we randomized the order of their occurrence among the participants. We used a speaker as a medium and generated the sequences of VFT using PsychoPy.

CTE
It included a 20-min-long conversation with participants about a site-seeing visit to Yakushima Island in southern Japan (in Japanese). We started by acquiring a 1-min-long resting data, followed by its corresponding 20-min-long experimental session. We provided our participants with a 1-min-long resting break (while staying at their seat with their eyes closed) prior to the commencement of the experiment. We kept the content of conversation intact. A female assistant, who was not aware of the purpose of this study, conversed with our participants. We controlled the conversational session in such a way that the operator was in charge of leading the topic, thereby requiring all the participants to respond to same series of statements and questions [e.g., their most interesting visited site(s), preferred cousin, etc.].

LMT
We adapted the protocol by Basso Moro et al. (2013). It started with participants listening to a 20-s-long story of the LMT (D. Wechsler, 1997) which was immediately followed by participants repeating the narrated story aloud, trying to recall as much of a detail as possible. The recall period lasted for 30 s. We started each experiment by acquiring a 1-min-long resting data, followed by its corresponding LMT session. We provided our participants with a 1-min-long resting break (while staying at their seat with their eyes closed) prior to the commencement of the experiment. We kept the content of the LMT story intact. A female assistant, who was not aware of the purpose of this study, read the stories to the participants through a speaker medium.

Data Acquisition
We used functional near infrared spectroscopy (fNIRS) (Ferrari and Quaresima, 2012;Dix et al., 2013) to collect frontal brain activity of the participants. We acquired fNIRS time series data of the participants using a wearable optical topography system "HOT-1000," developed by Hitachi High-Technologies Corporation (please refer to Figure 1). Participants wore this device on their forehead. It records the frontal brain activity through detection of total blood flow via emitting a wavelength laser light (810 nm) at 10.0 Hz sampling rate. Data acquisition is carried out through four channels (i.e., Left 1 , Left 3 , Right 1 , and Right 3 , as shown in Figure 1). Subscripted numerical values that are assigned to these channels specify their respective source-detector distances. In other words, Left 1 and Right 1 have a 1.0 cm and Left 3 and Right 3 have 3.0 cm source-detector distances, respectively. Research findings indicate that short source-detector channels [e.g., 0.5 cm (Takahashi et al., 2011), 1.0 cm (Gagnon et al., 2011), 1.5 cm (Sato et al., 2013(Sato et al., , 2016Kiguchi and Funane, 2014), and 2.0 cm (Yamada et al., 2009)] are mostly representatives of scalp hemodynamics than cortical blood flow (CBF). Moreover, choice of 3.0 cm source-detector distance is customary in NIRS-based studies of brain activity (Yamada et al., 2009;Gagnon et al., 2011;Takahashi et al., 2011;Sato et al., 2013Sato et al., , 2016Kiguchi and Funane, 2014). It is also worth noting that Zhang et al. (2005) and Kohno et al. (2007) adapted this source-detector distance in their original articles on PCA-and ICA-based SBF attenuation algorithms. Therefore, we primarily report our analysis results on long source-detector channels [Left 3 in the main body of this article and Right 3 in We placed a laser Doppler tissue blood flow meter probe (FLO-C1, Omegawave Incorporated, Tokyo, Japan) on participants' forehead close to the Left 1 for SBF recording. It collected data from the scalp layer within 1.0 mm from the probe and its analog output was recorded simultaneously alongside the fNIRS recording. This device uses a laser beam with a wavelength of 780.0 nm and has a sampling rate of 10.0 Hz that matches our fNIRS device sampling rate. This device has also been adapted by Takahashi et al. (2011) during their SBF effect study.
To ensure synchronized data acquisition across sensors, we collected data streams for NIRS and SBF through "labstreaminglayer" system.

Data Preprocessing
First, we baseline-normalized the data via subtracting the mean of 1-min-long resting period. This step that is customary in fMRI/fNIRS research is based on the assumption that it removes the brain activity that was present prior to the start of the task (as reflected in the time series data recorded during the resting period) and hence does not reflect the effect of the brain activation during the task period. Next and in oder to attenuate the effect of systemic physiological artifacts (Tak and Ye, 2014) (e.g., cardiac pulsations, respiration, etc.) we applied a one-degree polynomial butter worth filter with 0.01 Hz and 0.6 Hz for low and high bandpass. This was followed by linear detrending. Detrending of the signal that is adapted from signal processing and time series analysis and forecasting is a necessary step to relative locations of channels of fNIRS device in our study (i.e., L1, L3, R1, and R3) are depicted in red (i.e., sources) and green (i.e., detectors) squares. L1, R1, L3, and R3 are channels with short (i.e., 1.0 cm) and long (i.e., 3.0 cm) source-detector distances.
ensure that assumptions of stationarity and homoscedasticity (as reflected in wide spread application of linear models in analysis of fNIRS/fMRI time series) are not strongly violated (e.g., due to seasonality and/or repetitive increasing/decreasing patterns) by acquired fNIRS signal.
We adapted the SBF attenuation algorithms by Zhang et al. (2005) (referred to as PCA-based hereafter) and Kohno et al. (2007) (referred to as ICA-based hereafter) in present study. The first approach utilizes an eigenvector-based spatial filtering method that is applied to the rest (baseline) period, thereby removing the first r spatial eigenvectors calculated from the baseline data by PCA from the fNIRS time series that are recorded during the task period. We used the resting period time series of the four channels of our device (Figure 1), per participant, for this purpose. Given the combination of short (i.e., Left 1 and Right 1 ) and long (i.e., Left 3 , , and Right 3 ) sourcedetector channels of our device, the PCA-based algorithm is able to capture the components that best represent global hemodynamics, as noted by Zhang et al. (2005). This is due to the results that indicate the short source-detector channels [e.g., 0.5 cm (Takahashi et al., 2011), 1.0 cm (Gagnon et al., 2011), 1.5 cm (Sato et al., 2013(Sato et al., , 2016Kiguchi and Funane, 2014), and 2.0 cm (Yamada et al., 2009)] are mostly representatives of scalp hemodynamics and the long source-detector channels are suitable for recording of the CBF (Yamada et al., 2009;Gagnon et al., 2011;Takahashi et al., 2011;Sato et al., 2013Sato et al., , 2016Kiguchi and Funane, 2014). It is worth noting that Zhang et al. (2005) also adapted the 3.0 cm source-detector distance (i.e., similar to Left 3 , , and Right 3 in our case) for capturing the CBF in their original article on PCA-based SBF attenuation algorithm. On the other hand, ICA-based SBF attenuation algorithm (Kohno et al., 2007) removes the component that has the highest coefficient of spatial uniformity (i.e., the absolute value of the coefficient of variation; Everitt, 1998) among the independent components as a representative of the global scalphemodynamic component. Similar to PCA-based algorithm, we used the four channels of our device to determine the component with coefficient of spatial uniformity. It is worth noting that Kohno et al. (2007) also adapted the 3.0 cm source-detector distance (i.e., similar to Left 3 , , and Right 3 in our case) for capturing the CBF in their original article on ICA-based SBF attenuation algorithm. Sato et al. (2016) used the PCA-based algorithm after preprocessing steps for SBF attenuation. On the other hand, they utilized the ICA-based algorithm prior to preprocessing steps. These orders for application of SBF attenuation [i.e., preprocessing the fNIRS time series after (in case of ICA-based) and before (in case of PCA-base) SBF attenuation] were reported by Kohno et al. (2007) and Zhang et al. (2005) as well. Therefore, we followed the same orderings while applying these algorithms for attenuation of the effect of SBF on fNIRS time series data of cortical activity of our participants.
Although Zhang et al. (2005) did not provide any specific reason for selection of the number of components, they considered the combination of the first three components (referred to as PC123 hereafter) to be an adequate choice for attenuation of the SBF effect. On the other hand, Sato et al. (2016) argued that the difference between the first three components in contrast with only the first component (referred to as PC1 hereafter) is non-significant. Subsequently, they adapted the first component in their analysis. In present study, we used these two settings. However, considering the explanatory power of the first three components in PCAbased approach (Zhang et al., 2005;Sato et al., 2016), we also included the combination of first two components (referred to as PC12 hereafter) in our analyses. On the other hand, we followed Kohno et al. (2007) and removed the component with highest coefficient of spatial uniformity (referred to as IC1 hereafter) in case of ICA-based SBF attenuation algorithm.

Statistical Analysis
First, we applied Wilcoxon rank sum (i.e., one-sample) test to determine the effect of SBF on fNIRS (i.e., TE(Y → X)) time series of frontal brain activity of participants (in both, VFT and CTE). This was followed by testing Criterion 2.1 through Criterion 2.5 to determine the utility of PCA-and ICA-based algorithms and their respective adapted components (i.e., PC1, PC12, PC123 in case of PCA-based and IC1 in case of ICA-based SBF attenuations) in reduction of the effect of SBF on fNIRS while preserving information content of frontal brain activity.
In case of Criterion 2.1, we performed Kruskal-Wallis test on combination of TE before [i.e., TE(Y → X)] along side after [i.e., TE(Y → X ′ )] SBF attenuation by each of PC1, PC12, PC123, and IC1 to determine any significance induced by the choice of these components in reduction of TE. This was followed by post-hoc paired Wilcoxon rank sum test.
We followed the same steps in case of Criterion 2.2 and Criterion 2.3, replacing TE before [i.e., TE(Y → X)] and after [i.e., TE(Y → X ′ )] with the "reduction of SBF-related information in frontal brain activity before [i.e., H(Y|X)] and after [i.e., H(Y|X ′ )] SBF attenuation," in case of Criterion 2.2, and "reduction of correspondence between SBF and fNIRS time series of frontal brain activity before [i.e., H(X|Y)] and after [i.e., H(X ′ |Y)] SBF attenuation," in case of Criterion 2.3, respectively.
In case of Criterion 2.4, we performed Kruskal-Wallis test on combination of H(X|Y) alongside the measured H(X|X ′ ) by each of PC1, PC12, PC123, and IC1 to determine any significance induced by the choice of these components in retaining the correspondence between time series of frontal activity before and after application of SBF attenuation. This was followed by post-hoc paired Wilcoxon rank sum test.
In case of Criterion 2.5, we first applied a paired Wilcoxon rank sum (i.e., two-sample) between H(X|X ′ ) and H(X ′ ) for each of the components (i.e., PC1, PC12, PC123, and IC1) separately, thereby signifying their respective effect in preservation of information content of frontal brain activity. Next, we performed Kruskal-Wallis test on combination of preserved frontal brain activity [i.e., H(X ′ ) − H(X|X ′ )] by each of PC1, PC12, PC123, and IC1 to quantitatively determine any significance induced by the choice of these components in such an information preservation. This was followed by post-hoc paired Wilcoxon rank sum between every pair of these components, thereby determining the component(s) that significantly maximize such a preservation of information content of frontal brain activity in comparison with other components.
For Kruskal-Wallis, we reported the effect size r = χ 2 N , with N denoting the sample size, as suggested by Rosenthal and DiMatteo (2001). In case of Wilcoxon test, we used r = W √ N (Tomczak and Tomczak, 2014) as effect size with W denoting the Wilcoxon statistics and N is the sample size. All results reported are Bonferroni corrected (i.e., multiplying the p-values with the sample size, given the use of non-parametric tests). We used JIDT (Lizier, 2014) for calculation of TE, MI, and conditional entropies. All statistical analyses were carried out in Matlab R2016a environment. We used Gramm (Morel, 2018) for data visualization. We used Python 2.7 for simulated data generation, realtime data acquisition and processing, and informationtheoretic measures computation. All statistical analyses were carried out in Matlab R2016a. Figure 2A corresponds to the grand-average of the computed TE values for lags 1 through 100 (i.e., up to 10 s of lag, considering the sampling rate of 10.0 Hz of our device during the real-time data acquisition). This subplot indicates that maximum transferred information from simulated noise to simulated channels was, on average, at lag = 36 (i.e., 3.6 s in adapted fNIRS device in present study). This was the lag we used during our analyses. Figure 2B visualizes a sample spatial map of eigenvectors by PCA-based SBF attenuation algorithm along with the sample original simulated time series (in matrix forms for better visualization purpose) before and after application of PC1, PC12, and PC123 components. Distinctive effect of these components on the simulated data is evident in this subplot. We observed significant differences between data before and after application of PC1 [p < 0.001, W  Figure 2C shows a sample Gaussian noise (blue) along with the computed IC1 by ICA-based SBF attenuation algorithm (green) (pertinent to the case in which simulated channel 1 was selected as component with highest coefficient of spatial uniformity). Although this algorithm resulted in selection of different simulated channels as the component with highest coefficient of spatial uniformity (i.e., the absolute value of the coefficient of variation; Everitt, 1998) (Figure 2D, Ch1 = 20.00%, Ch2 = 16.00%, Ch3 = 40.00%, Ch4 = 24.00%), these components were significantly correlated with simulated noise (Ch1: r = 0.73, p < 0.001, Ch2: r = 0.78, p < 0.001, Ch3: r = 0.78, p < 0.001, and Ch4: r = 0.89, p < 0.001). On the other hand, PCA-based algorithm ( Figure 2E) exhibited a higher specificity in selecting the principal components (PC1 = 89.00%, PC12 = 8.10%, PC123 = 2.90%). In these subplots, IC1, PC1, PC12, and PC123 refer to the measured quantity by these components after their respective SBF attenuation algorithms. Asterisks mark the components with significant difference (*p < 0.05, **p < 0.01, ***p < 0.001).

VFT
In what follows, we examine the effectiveness of PCA-and ICA-based SBF attenuation algorithms on reduction of observed impact of SBF on fNIRS frontal brain activity as well as their utility in preservation of information content of this activity in resulting fNIRS time series through investigation of Criterion 2.1 through Criterion 2.5.  Figure 7 shows these results. Figure 8A corresponds to the grand-average of the computed TE values for lags 1 through 100 (i.e., up to 10 s of lag, considering the sampling rate of 10.0 Hz) in CTE. This subplot indicates that maximum SBF transferred information was, on average, at lag = 30 (i.e., 3.0 s). This was the lag we used during our analyses. Wilcoxon rank sum test implied significant effect of SBF on fNIRS time series of frontal brain activity of participants during conversation [p < 0.001, W (17) = 3.72, M = 0.13, SD = 0.13]. Figure 8B visualizes a sample spatial map of eigenvectors by PCA-based SBF attenuation algorithm along with a sample participant's PFC time series (in matrix forms for better visualization purpose) before and after application of PC1, PC12, and PC123 components. Distinctive effect of these components on participant's PFC time series is evident in this subplot. All matrices are scaled within [0, 1] interval for ease of comparison. We observed significant differences between PFC time series before and after application of PC1  Figure 8C shows a sample SBF (blue) along with the computed IC1 by ICA-based SBF attenuation algorithm (green) (pertinent to the grand-average of the cases in which Right 1 was selected as the component with highest coefficient of spatial uniformity). Although this algorithm resulted in selection of different channels as the component with highest coefficient of spatial uniformity (Figure 8D, Left 1 = 57.89%, Left 3 = 10.53%, Right 1 = 10.53%, Right 3 = 21.05%), these components were significantly correlated with simulated noise (Left 1 : 0.79, p < 0.001, Left 3 : r = 0.67, p < 0.001, Right 1 : r = 0.81, p < 0.001, and Right 3 : r = 0.76, p < 0.001). On the other hand, PCAbased algorithm ( Figure 8E) exhibited a higher specificity in selecting the principal components (PC1 = 97.42%, PC12 = 2.43%, PC123 = 0.15%).

CTE
In what follows, we examine the effectiveness of PCA-and ICA-based SBF attenuation algorithms on reduction of observed impact of SBF on fNIRS frontal brain activity as well as their utility in preservation of information content of this activity in resulting fNIRS time series through investigation of Criterion 2.1 through Criterion 2.5.  Figure 9, subplot C1, shows these results.  In these subplots, IC1, PC1, PC12, and PC123 refer to the measured quantity by these components after their respective SBF attenuation algorithms. Asterisks mark the components with significant difference (*p < 0.05, **p < 0.01, ***p < 0.001).   Figure 11A corresponds to the grand-average of the computed TE values for lags 1 through 100 (i.e., up to 10 s of lag, considering the sampling rate of 10.0 Hz of our device) in LMT. This subplot indicates that maximum SBF transferred information was, on average, at lag = 32 (i.e., 3.2 s). This was the lag we used during  Figure 11C shows a sample SBF (blue) along with the computed IC1 by ICA-based SBF attenuation algorithm (green) (pertinent to the case in which Right 1 was selected as the component with highest coefficient of spatial uniformity). Although this algorithm resulted in selection of different channels as the component with highest coefficient of spatial uniformity ( Figure 11D, Left 1 = 26.92%, Left 3 = 23.08%, Right 1 = 30.77%, Right 3 = 19.23%), these components were significantly correlated with simulated noise (Left 1 : 0.77, p < 0.001, Left 3 : r = 0.73, p < 0.001, Right 1 : r = 0.70,p < 0.001, and Right 3 : r = 0.76, p < 0.001). On the other hand, PCA-based algorithm ( Figure 11E) exhibited a higher specificity in selecting the principal components (PC1 = 94.53%, PC12 = 5.11%, PC123 = 0.36%).

LMT
In what follows, we examine the effectiveness of PCA-and ICA-based SBF attenuation algorithms on reduction of observed impact of SBF on fNIRS frontal brain activity as well as their utility in preservation of information content of this activity in resulting fNIRS time series through investigation of Criterion 2.1 through Criterion 2.5.  In these subplots, IC1, PC1, PC12, and PC123 refer to the measured quantity by these components after their respective SBF attenuation algorithms. Asterisks mark the components with significant difference (*p < 0.05, **p < 0.01, ***p < 0.001). Last, we found that PC1 was significantly more effective than PC123 [p < 0.03, W (50) = 2.41, r = 0.33]. Figure 13 shows these results.

DISCUSSION
In this article, we pinpointed the lack of perspective in previous studies on frontal activity's information preservation while attenuating the effect of SBF on fNIRS time series. Subsequently, we proposed five information-theoretic criteria for quantification of the frontal activity's information preservation during SBF attenuation process. We utilized the concept of TE (Schreiber, 2000) to quantify the effect of SBF on fNIRS time series of frontal brain activity via transferring undesired information onto fNIRS measurement. Advantages of TE for this purpose are twofold: First, TE, similar to MI and unlike other correlation measures, is always ≥ 0 that makes it a good quantitative metric for amount of information transferred/shared. Second, TE, unlike MI and other correlation measures, has a direction [i.e., TE(X ⇒ Y) = TE(Y ⇒ X)] which allows for causal reasoning about which process induces the observed effect on transferred/shared information. Additionally, we exploited the concept of MI and its correspondence with conditional entropy between interacting continuous random variables (Cover and Thomas, 2006;Stone, 2015) to formalize criteria for frontal brain activity's information preservation in resulting fNIRS time series once the process of SBF attenuation is complete.
We verified the validity our criteria on PCA- (Zhang et al., 2005) and ICA-based (Kohno et al., 2007) SBF attenuation algorithms using simulated time series with additive Gaussian noise. We chose these SBF attenuation algorithms due to their widespread adaptation in recent literature (Katura et al., 2008;Kiguchi and Funane, 2014;Tak and Ye, 2014;Naseer and Hong, 2015;Sato et al., 2016;Zhang et al., 2016). Although these algorithms make assumption on orthogonality and statistical independence of their components, their well-defined mathematical formulation help eliminate further empirical assumptions on causal and/or explanatory effects (e.g., channels with short source-detector distances as representatives of SBF, etc.). Subsequently, we examined our criteria on two different Working Memory (WM) tasks and a naturalistic conversational settings. Our results implied a significant effect of SBF on fNIRS time series of frontal brain activity through transfer of undesired information. This finding that was founded on analysis of the information flow from SBF onto fNIRS time series of frontal brain activity presented a systematic approach to quantification of the SBF as an interfering process during fNIRS measurement, thereby drawing an informed conclusion on this issue (Takahashi et al., 2011;Sato et al., 2013).
In addition, our results implied that mere reduction of SBF influence on fNIRS time series of frontal activity was insufficient to warrant frontal activity's information preservation. More importantly, we found this observation to hold true, irrespective of the nature of the adapted task or age of the participants. This, in turn, provided further support for inefficiency of such measures as correlation coefficient, signal-to-noise ratio, or Pearson R 2 (Zhang et al., 2005;Kohno et al., 2007;Gagnon et al., 2011) in determination of the significance of the reduced effect of SBF due to their inability in detecting the direction of the information flow (i.e., causality) between interacting processes (Kinney and Atwal, 2014).
Moreover, our results implied a higher fidelity of PCA-based algorithm in preservation of information content of frontal brain activity in comparison with ICA-based approach. This observation was in accord with the findings by Sato et al. (2016). Furthermore, our results revealed a substantial effect of selected number of components on performance of PCA-based algorithm. Concretely, they indicated that combination of first two principal components of PCA-based algorithm resulted in most efficient SBF attenuation while ensuring a significantly higher (i.e., in comparison with other adapted components) frontal activity's information preservation. This provided an evidence for a reliable choice among existing SBF attenuation algorithms and their inconclusive number of components (Zhang et al., 2005;Sato et al., 2016) to ensure minimum loss of frontal activity's information content during SBF attenuation process.
Further evidence on substantial difference in performance of these algorithms was due to the high variability of ICAbased SBF attenuation in determination of the component with highest coefficient of spatial uniformity. This variability that was originally reported by Kohno et al. (2007) reduces the reliability of this algorithm, given its contingency in selecting the channel of interest (e.g., long source-detector distance channels). On the other hand, PCA-based SBF attenuation algorithm exhibited a stable distribution of the percentages of the varianceexplained among its selected components. It is worth noting that observed distribution of these components in our results is in close correspondence with Sato et al. (2016), although their subtle differences is appreciated in light of comparably larger number of channels in their study. Considering these observations, it is apparent that such differences in stability of these algorithms in determination of their respective SBFrelated component(s) impose substantial variation in analyses results. Given the observed variability in choice of components, as indicated by our results on simulated as well as real time data and irrespective of the age of participants, it is apparent that an SBF algorithm with the stable choice of component(s) to reduce the effect of SBF to retain the information content pertinent to brain activity of human subjects is highly desirable. Our results suggested that ICA algorithm falls short in achieving this objective.
Taken together, we provided evidence that lack of perspective on preservation of information content of frontal brain activity during SBF attenuation underlies the contrasting findings with inconclusive results in previous studies. We showed that a mere reduction of SBF influence on fNIRS time series of frontal activity is insufficient in warranting frontal activity's information preservation in resulting fNIRS time series data. Subsequently, we showed a higher fidelity of PCA-based algorithm in achieving this information preservation in comparison with ICA-based approach. Lastly, we showed that combination of first two components of PCA-based algorithm resulted in most information preservation while ensuring significant attenuation of SBF effect. Our findings contribute to the field by presenting a systematic approach to quantification of the SBF as an interfering process during fNIRS measurement, thereby drawing an informed conclusion on this debate (Takahashi et al., 2011;Sato et al., 2013). Furthermore, they provide evidence for a reliable choice among existing SBF attenuation algorithms and their inconclusive number of components (Zhang et al., 2005;Sato et al., 2016), thereby ensuring minimum loss of information content of fNIRS cortical activity through SBF attenuation process.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the ethical committee of Advanced Telecommunications Research Institute International (ATR) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the ATR ethical committee (approval code:16-601-1).

AUTHOR CONTRIBUTIONS
SK proposed the information-theoretic criteria and carried out the analyses. HS acted as research lead, designing the experiments, supervising their progress, and taking part in experimental setups during data collections. MO conducted the experiments and performed data collection. As the head of Hiroshi Ishiguro Laboratories (HIL), HI oversaw the entire activity of all research teams and themes, ensuring the soundness of all proposals, quality of results, and their validity. SK and HS contributed equally in preparation of the manuscript.