A Methodological Review of fNIRS in Driving Research: Relevance to the Future of Autonomous Vehicles

As automobile manufacturers have begun to design, engineer, and test autonomous driving systems of the future, brain imaging with functional near-infrared spectroscopy (fNIRS) can provide unique insights about cognitive processes associated with evolving levels of autonomy implemented in the automobile. Modern fNIRS devices provide a portable, relatively affordable, and robust form of functional neuroimaging that allows researchers to investigate brain function in real-world environments. The trend toward “naturalistic neuroscience” is evident in the growing number of studies that leverage the methodological flexibility of fNIRS and, in doing so, significantly expand the scope of cognitive function that is accessible to observation via functional brain imaging (i.e., from the simulator to on-road scenarios). While more than a decade’s worth of fNIRS driving research has led to many interesting findings, the number of studies applying fNIRS during autonomous modes of operation is limited. To support future research that directly addresses this gap, we argue that a cogent distillation of the methods used to date will help facilitate and streamline the research of tomorrow. To that end, we provide a methodological review of the existing fNIRS driving research, with the overarching goal of highlighting the current diversity in methodological approaches. We argue that standardization of these approaches will facilitate greater overlap of methods by researchers from all disciplines, which will, in turn, allow for meta-analysis of future results. We conclude by providing recommendations for advancing the use of fNIRS technology in furthering our understanding of the adoption of safe autonomous vehicle technology.


INTRODUCTION
The era of autonomous driving is upon us. While semi-autonomous cars or "SAE Level 2 Automation Systems" (SAE, 2018) have become visible on our roads (Martinho et al., 2021; Tan et al., 2021), higher order automated systems (i.e., SAE Level 3-5 Automation Systems) are being engineered and tested (see Figure 1). Among the many advantages automated vehicles could bring to our streets are an increase in safety by outperforming the human through manipulation in driving dynamics (Goh et al., 2020), an increase in consumption efficiency (Phan et al., 2020), and a reduction of overall travel time by stabilizing traffic flow. However, until fully automated systems are engineered, approved, and legalized for the transport of (fully) passive passengers (i.e., SAE Level 4-5), the human driver will still manually drive the car, for at least segments of the drive, for many years to come. In fact, for Level 2-3 Automation, "fallback drivers" are needed to ensure that, in the event the autonomous vehicle is unable to operate or experiences a failure, the driver can safely take over and navigate the vehicle (SAE, 2018). The human operator must monitor the autonomous vehicle (AV) operations and its surroundings and, if possible, anticipate failures of the AV system and respond quickly to potential take-over events.
FIGURE 1 | The future of automobile travel may remove control from the human driver altogether, allowing drivers to become passengers who are able to engage in other, driving-unrelated tasks. Results from fNIRS brain imaging studies (left) will provide vital insights that may be integral in the design of future automated systems (right). Written informed consent was obtained from the individual for the publication of any potentially identifiable images included in this article.
Critically, it has been noted that drivers supervising automated vehicles are themselves less-than-perfect fallbacks. The simultaneous failure of the vehicle automation systems and the fallback driver can have disastrous consequences 1 . Research indicates that with the resumption of manual driving from lower levels of automation, drivers experience an increase in response time (Rudin-Brown and Parker, 2004) and in secondary task involvement (Winter et al., 2016). Further studies demonstrate that during periods of automation, drivers show increased sleepiness and drowsiness, leading to decreases in driver vigilance (Miller et al., 2015). Thus, as long as a human is a necessary component of driving, a better understanding of the biological correlates of safe driver take-over events is critical to the development of SAE Level 2-3 autonomous vehicles.
There has been increased effort in assessing driver state, for example via physiological measurement tools (e.g., heart rate variability, skin conductance, etc.) or image recognition via on-board camera (e.g., drowsiness detection, emotion recognition, etc.) (Begum, 2013; Chowdhury et al., 2018). At the same time, there have been research efforts to elucidate the neuro-cognitive processes that underlie or precede these physiological or behavioral states via brain imaging (Kim et al., 2020; Ware et al., 2020). One brain imaging technique that has particularly gained traction in the past decade is functional near-infrared spectroscopy (fNIRS). In short, fNIRS is an optical brain imaging approach that uses near-infrared light to measure changes in oxygen levels within the cortex of the brain. As shown in Figure 2, a light source is situated approximately 3 cm from a light detector. The emitted light scatters in all directions, and the portion that reaches the detector travels along a banana-shaped path. As light passes through blood in the cortex, much of it is absorbed by oxygenated and deoxygenated hemoglobin. The remaining light is detected and is used to calculate relative levels of oxygenated and deoxygenated hemoglobin within the region of the cortex between source and detector optodes. As previous research has shown, changes in cortical oxygenation (e.g., blood oxygen level dependence) occur when regions of the cortex become active (Strangman et al., 2002; Cui et al., 2011). Thus, by making such measurements quickly over time (e.g., ≥10 Hz), approximate real-time brain function may be observed and mapped to coinciding behavior.
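The calculation sketched in the paragraph above is commonly carried out with the modified Beer-Lambert law, which relates the change in optical density at two wavelengths to changes in oxygenated (HbO) and deoxygenated (HbR) hemoglobin. The following is a minimal sketch under stated assumptions, not the method of any particular study or device: the function name `mbll`, the extinction coefficients, the 3 cm source-detector distance, and the differential pathlength factor (DPF) are illustrative placeholders chosen for this example.

```python
def mbll(delta_od_1, delta_od_2, ext, distance_cm=3.0, dpf=6.0):
    """Solve the 2x2 modified Beer-Lambert system for (dHbO, dHbR).

    delta_od_1, delta_od_2: optical density changes at wavelengths 1 and 2.
    ext: ((e_hbo_1, e_hbr_1), (e_hbo_2, e_hbr_2)) extinction coefficients.
         The values passed in are the caller's assumption; units must be
         consistent with distance_cm (e.g., 1/(mM*cm) gives mM changes).
    distance_cm, dpf: assumed source-detector distance and pathlength factor.
    """
    (e_hbo_1, e_hbr_1), (e_hbo_2, e_hbr_2) = ext
    path = distance_cm * dpf  # effective optical path length
    det = e_hbo_1 * e_hbr_2 - e_hbr_1 * e_hbo_2
    if abs(det) < 1e-12:
        raise ValueError("wavelengths do not separate HbO from HbR")
    # Cramer's rule on: delta_od_i = path * (e_hbo_i * dHbO + e_hbr_i * dHbR)
    d_hbo = (delta_od_1 * e_hbr_2 - delta_od_2 * e_hbr_1) / (det * path)
    d_hbr = (e_hbo_1 * delta_od_2 - e_hbo_2 * delta_od_1) / (det * path)
    return d_hbo, d_hbr


# Illustrative coefficients only (roughly 760 nm and 850 nm); not from the source.
EXAMPLE_EXT = ((0.586, 1.548), (1.058, 0.691))
```

Two wavelengths are needed because HbO and HbR absorb differently on either side of roughly 800 nm; with a single wavelength the two unknowns could not be separated.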
Compared to electroencephalography (EEG), a commonly used functional neuroimaging approach that records electrical activity within the brain, fNIRS provides greater spatial resolution but slower sampling frequency (Scholkmann et al., 2014). Compared to functional magnetic resonance imaging (fMRI), the "gold standard" of functional brain imaging, fNIRS provides a faster sampling frequency but lower spatial resolution (Strangman et al., 2002; Cui et al., 2011). Thus, while maintaining a high sampling frequency, fNIRS also provides the spatial resolution needed to localize cortical brain function. Furthermore, fNIRS affords many other methodological benefits, such as a tolerance to movement and methodological flexibility (Strangman et al., 2002; Cui et al., 2011). As it relates to the study of brain function and driving, fNIRS makes it possible to observe the neural correlates of driving in a manner that may not be feasible with other modalities (Tachtsidis and Scholkmann, 2016; Herold et al., 2017). In particular, recent advances in the portability of fNIRS systems have afforded neuroscientists the methodological flexibility to investigate neurocognitive behavior in naturalistic settings outside the MRI scanner (Baker et al., 2017; Yücel et al., 2021). Though brain imaging research with fNIRS has reached a new era of Real-Life Neuroscience (Shamay-Tsoory and Mendelsohn, 2019; Holleman et al., 2020), it is important to note that the age of real-life neuroscience is still burgeoning, and that researchers are currently in the exploration phase of testing tools and methods to further strengthen our ability to study neurobiological signatures of more complex behavior in naturalistic environments such as driving. Furthermore, advances in technology and methodology have also made fNIRS accessible to groups outside of traditional neuroscience domains.
For instance, engineers from diverse backgrounds such as Human Factors and Ergonomics, Human-Computer Interaction, and Engineering Design/Affective Engineering, have begun to study the neural signatures of drivers from their own research perspective (Solovey et al., 2009;Canning and Scheutz, 2013;Balters et al., 2017;Bosworth et al., 2019;Zhu et al., 2020). The transition between laboratory and the real-world (e.g., on-road) combined with advances from researchers outside of classical neuroscience (e.g., engineering) has resulted in a wide array of interesting and differing fNIRS methodologies. With respect to fNIRS driving research, "into the wild" and "multi-disciplinary" approaches have led to an eclectic mix of experimental designs, analytical techniques, and hardware configurations as highlighted throughout our review below. While such scientific diversity is expected in this early phase of any research domain, it will be essential for future research to minimize such methodological differences in a concerted effort to advance the field.
We argue that the construction of safe autonomous driving systems is an ongoing engineering challenge with high impact for society, and that an understanding of human behavior as part of this system is an important and open task. As a complement to other psycho-physiological and behavioral measures, fNIRS affords unique insights into the underlying cognitive functions related to autonomous driving scenarios. To inform future autonomous driving studies, we therefore review the current state of fNIRS methodology pertinent to driving research. The overarching aim is to provide a methodological benchmark, which we can use to identify existing limitations that hamper the utilization of fNIRS in autonomous driving research. Specifically, we provide a detailed methodological review of the fNIRS-based studies of driving published prior to the year 2020. Throughout our review, we focus on seven relevant methodological domains that vary across studies: experimental environment; participant selection and documentation; task familiarization, physiological baseline, and inter-block/inter-trial intervals; task design and analytical approach; control task design; hardware specs, optode distance, and optode placement; and data processing and statistical analysis. For each of these domains, we identify potential best practices, summarize the approaches taken by researchers to date, highlight remaining hurdles, and provide recommendations for future research. It is our hope that this paper will serve as an anchor for future discussion and collaboration within and between fNIRS researchers from different disciplines.

METHODOLOGICAL REVIEW
We executed a Google Scholar and PubMed search and considered all peer-reviewed manuscripts that were published through December 31, 2019. Our search strategy included the following keywords: "fNIRS car," "fNIRS driver," "fNIRS driving," "NIRS car," "NIRS driver," and "NIRS driving." For each keyword, we inspected the first 250 entries and included all articles that met the criteria of "adult subjects," "car driving," and "simulator and/or on the road studies." We operationally defined "simulator" as any virtual interface that included a physical steering wheel and pedals. As such, this allowed for a wide variation in simulator complexity 2 . Additionally, we checked the reference lists of the included articles for any additional relevant articles. We only included journal and conference publications in the English language. For instances in which the same content was published in more than one peer-reviewed publication (i.e., journal article and conference proceedings; N = 6), we selected a key publication as representative in this review. From the initially identified 55 publications, a total of 48 publications met the above criteria. As shown in Figure 3A, the first studies in the field emerged about one decade ago, and about 5 years later fNIRS driving research significantly intensified. By means of thematic analysis of the identified publications, we derived nine research topics of interest (see Figure 3B) 3 . In Table 1, we provide a brief summary of each paper including year of publication, author(s), research topics, basic manipulation, and studied brain regions of interest. For an in-depth read on neural processes involved in driving, we refer the interested reader to the following reviews (Lohani et al., 2019; Kim et al., 2020; Ware et al., 2020).

Experimental Environment
As highlighted in Table 1, researchers have used a myriad of approaches to study brain function under multiple driving conditions. These include low-scale driving simulators (i.e., driving simulation conducted on a desktop computer with a small visual field; N = 17 studies), a range of more immersive simulator environments (i.e., driver seated in a mock automobile with a large visual field; N = 22 studies), and on-road conditions (N = 9). Simulated environments provide researchers with the ability to regulate and control the drivers' experience. For instance, researchers may provide repeated instances of a specific driving task (e.g., diverting around an unexpected obstacle) that may not occur frequently in real-world driving. Moreover, within simulated environments researchers may observe drivers in states (e.g., drowsiness) that would be too dangerous or irresponsible to observe in real-world driving. While simulated environments may require similar driver movements, many other mediating factors that affect the fNIRS signal during real driving are not present. For example, imperfections in road conditions combined with road camber, centrifugal forces, wind, and unique characteristics of the automobile also introduce sources of noise that may not be easily captured in a driving simulator. Another consideration regarding the task environment is the effect that the driving environment may have on the data. For example, the physical movements needed to operate low fidelity desktop-based simulated environments may differ greatly from immersive in-car experiences. In-car driving requires significant head and limb motions that have been shown to induce artifacts due to motion-induced optode shearing on the scalp (Huppert et al., 2009; Virtanen et al., 2011; Brigadoi et al., 2014). To help overcome such shearing, researchers may consider tightening the fit of the optodes on the participant's head. It is important to note, however, that these measures will not completely remove artifacts due to motion, may increase participant discomfort over time, and may also introduce data artifacts of their own (Baker et al., 2017). Finally, ambient light in real-world driving is often much greater than in simulated environments and thus can negatively affect the fNIRS signal compared to indoor lighting (Chenier and Sawan, 2007; Coyle et al., 2007; Baker et al., 2017). While these factors increase the difficulty of conducting fNIRS studies during on-road driving, they are often essential factors in an experimenter's methodological design.

Participant Selection and Documentation
Beyond assuring a sample size that is large enough for statistical interpretations, many participant characteristics (e.g., age, driving experience, gender, personality traits, etc.) have been shown to have a significant impact on driving behavior (Tao et al., 2017;Fountas et al., 2019). Participant selection and the reporting of participant characteristics, including but not limited to sample size, age, driving experience, and gender are important, as those factors may influence experimental design, analysis, and interpretation of the results.

Task Familiarization, Physiological Baseline, and Inter-Block/Inter-Trial Intervals
Another study component is the activity used to familiarize participants with the driving environment. Participants who have not been familiarized with the driving environment may experience confusion with the system controls or may otherwise attend to factors outside of the task of interest. This, in turn, may elicit unwanted cortical activation that would not be present in the absence of such confusion. It is further advisable to establish a physiological baseline at the beginning of each scan, prior to the start of the experiment. This may be accomplished by simply instructing the participant to sit quietly and without movement for 30 s to 1 min. This can serve to stabilize the fNIRS signal, so that it is not artificially inflated or deflated due to excessive movement. Finally, it is important that experimental designs incorporate appropriately timed inter-trial and/or inter-block intervals. These intervals act to separate hemodynamic responses elicited during a task event or block prior to the start of the next trial or block. Event-related designs require jittered inter-trial intervals (i.e., randomized or pseudorandomized durations) so that the onset of the successive trial/block is unknown to the participant (Plichta et al., 2006, 2007).
Among the papers reviewed here, 23 did not report information regarding the driving familiarization task used. Participants were allowed to experience the driving environment (simulated or on-road) within 18 studies (Shimizu et al., 2009; Khan and Hong, 2015; Pradhan et al., 2015; Oka et al., 2015; Foy et al., 2016; Sibi et al., 2016; Horrey et al., 2017; Chuang et al., 2018; Ihme et al., 2018; Yamamoto et al., 2018, 2019; Khan et al., 2019; Lin et al., 2019; Sturman and Wiggins, 2019), and seven studies allowed participants to familiarize themselves with the driving environment and task-related stimuli (Yoshino et al., 2013a,b; Unni et al., 2015; Balters et al., 2017; Sibi et al., 2017; Scheunemann et al., 2019). Only four studies reported a physiological baseline task (i.e., a task designed to allow participants' cortical activity to settle into a resting level), including sitting quietly with or without eyes open for a period of 2 min, 5 min, 10 min, or 20 min. Establishing a resting level of oxygenation is an important component of fNIRS methodology, as it provides a baseline of blood oxygenation from which hemodynamic response magnitudes during the task are determined. Should no baseline be established, researchers run the risk of missing true hemodynamic responses due to Type II (i.e., false negative) error. That is, detecting a rise in cortical oxygenation due to task demands (e.g., driving challenges) may be hampered if cortical blood oxygenation levels were artificially high to begin with. Generally, the recommended duration to establish a baseline is at least 30-60 s in which the participant sits quietly. Seven studies employed a block design (Nakano et al., 2013; Ihme et al., 2018; Hidalgo-Munoz et al., 2019). The inter-block duration ranged from roughly 30 s (Hidalgo-Munoz et al., 2019) to a maximum of 5 min (Ihme et al., 2018). In one study, a full day of rest between two block conditions (i.e., drowsy vs. rested) was given. Twelve studies used an event-related design with inter-trial intervals ranging between 10 s and 1 min (Shimizu et al., 2009; FakhrHosseini et al., 2015; Oka et al., 2015; Unni et al., 2015, 2017; Nosrati et al., 2016; Balters et al., 2017; Sibi et al., 2017; Bruno et al., 2018; Chuang et al., 2018; Lin et al., 2019; Scheunemann et al., 2019).
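To make the idea of jittered inter-trial intervals concrete, the following minimal sketch draws randomized interval durations and derives trial onsets that follow an initial resting baseline. The function names, the 10-20 s jitter range, and the 30 s baseline are assumptions chosen for illustration; they are not prescriptions taken from the reviewed studies.

```python
import random


def jittered_intervals(n_trials, min_s=10.0, max_s=20.0, seed=None):
    """Draw one randomized inter-trial interval (seconds) per trial."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    return [round(rng.uniform(min_s, max_s), 1) for _ in range(n_trials)]


def trial_onsets(intervals, trial_duration_s, baseline_s=30.0):
    """Onset time of each trial, starting after a quiet resting baseline.

    Each interval is the rest period that follows the corresponding trial.
    """
    onsets = []
    t = baseline_s
    for interval in intervals:
        onsets.append(t)
        t += trial_duration_s + interval
    return onsets
```

For example, `trial_onsets(jittered_intervals(12, seed=1), trial_duration_s=5.0)` yields twelve onsets whose spacing varies from trial to trial, so a participant cannot reliably anticipate the next event.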

Task Design and Analytical Approach
The analytical approach that a researcher intends to take with their study is inherently related to methodological elements such as trial number and number of experimental conditions (Plichta et al., 2006, 2007). For instance, an adequate number of trials is required for each condition to provide a normal distribution of fNIRS samples, which is in turn required to distinguish a true effect from background noise. If the number of repetitions is too low, the researcher will be more prone to both Type I (false positive) and Type II (false negative) errors because outlying values have a greater impact on smaller distributions. This means that single-trial studies may not be suitable for trial-based analyses. Researchers may obtain effect size estimates (e.g., Cohen's d) from published reports and use such information to estimate the number of trials needed to obtain a desired level of statistical power. Moreover, researchers may rely on online tools (e.g., Optseq) designed to aid in the methodological development of neuroimaging studies with respect to statistical power. As with other neuroimaging methods, fNIRS studies are typically conducted as block or event-related designs.
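As a hedged illustration of how a published effect size can be turned into a planning estimate, the sketch below uses the standard normal approximation for a two-sided one-sample or paired comparison, n ≈ ((z_{1-α/2} + z_{power}) / d)². The function name `required_n` is hypothetical, and the approximation slightly underestimates the value an exact t-based calculation would give; dedicated power-analysis software should be preferred for a final design.

```python
import math
from statistics import NormalDist


def required_n(effect_size_d, alpha=0.05, power=0.80):
    """Approximate number of observations for a paired/one-sample test.

    Uses the normal approximation; 'observations' may be trials or
    participants depending on the unit of analysis (an assumption here).
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size_d) ** 2)
```

With the defaults, a medium effect (d = 0.5) gives `required_n(0.5) == 32`, while a small effect (d = 0.2) requires several times as many observations, which illustrates why low repetition counts leave a design underpowered.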
Block design studies require participants to engage in a task for at least a duration long enough to observe an entire hemodynamic response to a given stimulus or experimental condition. For example, a researcher interested in the effect of talking on a cellphone while driving may require participants to drive a pre-defined course 10 times while talking on the phone, then again without talking for a total of 20 trial blocks. Assuming, for the sake of our example, that talking on the phone did elicit cortical activity that was captured by fNIRS, such activity may be observed as a rise in oxygenation that occurs shortly after the beginning of each talking block and lasting until talking ended. The approximate duration required to observe a hemodynamic response function (HRF) is at least 10 s (Strangman et al., 2002;Cui et al., 2011), meaning that each task block must be at least 10 s in duration, although additional time is required to also observe the decrease of the HRF as cortical activity returns to resting levels. While an HRF may not be easily identifiable in a single block of the task, averaging each talking block together may reveal such a response. Several metrics [e.g., area under the curve (AUC), max/min value, etc.] derived from the block-averaged time series may then be calculated and used as the primary dependent variable for group-level analyses. The AUC for talking blocks may be expected to be greater than for non-talking blocks, which may be tested using common inferential statistics (e.g., Student's t-test). However, because a hemodynamic response may vary for any number of reasons (e.g., attentional shift), the use of block durations that are significantly longer than a single HRF (e.g., 60 s or longer) may include unrelated cortical activity when averaging. Thus, researchers should consider limiting their block durations or parsing excessively long blocks into discrete sections for block averaging.
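The block-averaging workflow described above (average the blocks, derive a metric such as the AUC, then compare conditions) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, blocks are assumed to have equal length in samples, and only the paired t statistic is computed, since a p-value would require a t-distribution routine that the standard library does not provide.

```python
import math
from statistics import mean, stdev


def block_average(signal, onsets, block_len):
    """Average equal-length epochs that start at the given sample indices."""
    epochs = [signal[o:o + block_len] for o in onsets
              if o + block_len <= len(signal)]
    if not epochs:
        raise ValueError("no complete block fits inside the signal")
    # zip(*epochs) groups the i-th sample of every block together
    return [mean(samples) for samples in zip(*epochs)]


def area_under_curve(values, dt):
    """Trapezoidal area under an evenly sampled curve (dt in seconds)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def paired_t(task_values, control_values):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [t - c for t, c in zip(task_values, control_values)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1
```

In the cellphone example, each participant would contribute one AUC for the averaged talking blocks and one for the averaged non-talking blocks, and the two lists of per-participant AUCs would be passed to `paired_t`.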
One well-established alternative to block averaging is the general linear model (GLM) approach, which fits a model of the expected hemodynamic response, obtained through convolution with the timing of task events, to the recorded fNIRS data. This is done by time-locking the onset and duration of each task trial to the fNIRS time series. If the onset of a trial induces an expected hemodynamic response, the fit of the GLM will be greater compared to conditions that do not elicit a hemodynamic response. Because the entire HRF is not sought, researchers may present multiple trials that are shorter in duration compared to block averaging. Furthermore, trials from different conditions may be pseudo-randomized and jittered so that all conditions are experienced evenly throughout the study, yet their onsets may not be reliably anticipated by the participant. Ultimately, the GLM approach will calculate standardized beta weights for each condition, which quantify the degree to which each condition elicited an increase (positive beta) or decrease (negative beta) in cortical response. Task- and control-condition beta weights may then be contrasted and used as a primary dependent outcome. Finally, multiple advanced statistical approaches (e.g., machine learning, functional connectivity, etc.) have been developed for fNIRS data that may also provide greater methodological flexibility. For instance, unconstrained machine learning analyses seek to identify unique patterns of cortical responding that occur during naturalistic driving. This approach may be best suited for long durations of real-world driving that do not include explicit trials. Similarly, while also amenable to trial-based task structures, functional connectivity analyses may be used to identify inter- or intra-brain communication that occurs during naturalistic driving.
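A minimal sketch of the GLM idea for a single condition is given below: a boxcar built from trial onsets is convolved with an assumed hemodynamic response function, and the resulting regressor is fitted to the measured signal by ordinary least squares. The double-gamma shape and its parameters are a commonly used canonical form adopted here as an assumption, the function names are hypothetical, and the returned beta is unstandardized; analysis packages typically handle multiple conditions, nuisance regressors, and serial correlations, none of which is attempted here.

```python
import math


def canonical_hrf(dt, length_s=30.0):
    """Double-gamma HRF sampled every dt seconds (assumed canonical shape)."""
    hrf = []
    for i in range(int(length_s / dt)):
        t = i * dt
        peak = t ** 5 * math.exp(-t) / math.gamma(6)          # response peak
        undershoot = t ** 15 * math.exp(-t) / math.gamma(16)  # late undershoot
        hrf.append(peak - undershoot / 6.0)
    total = sum(hrf)
    return [h / total for h in hrf]  # normalize to unit sum


def task_regressor(n_samples, onsets_s, duration_s, dt):
    """Convolve a task boxcar with the HRF to get the expected response."""
    boxcar = [0.0] * n_samples
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = min(n_samples, start + int(round(duration_s / dt)))
        for i in range(start, stop):
            boxcar[i] = 1.0
    hrf = canonical_hrf(dt)
    regressor = [0.0] * n_samples
    for i, b in enumerate(boxcar):
        if b:
            for j, h in enumerate(hrf):
                if i + j < n_samples:
                    regressor[i + j] += b * h
    return regressor


def fit_glm(y, x):
    """Least-squares fit of y = intercept + beta * x; returns (intercept, beta)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta
```

Fitting one such regressor per condition and subtracting the control beta from the task beta gives the kind of contrast described above.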

Control Task Design
The objective of a control task is to provide a condition that is nearly identical to the primary task yet lacks the component that is expected to elicit a cortical response of interest. For instance, for a hypothetical study of the effect of distraction on the neurobiological signatures of driving, participants may find themselves actively driving the same course in the "distraction" and "no distraction" conditions. However, within the "distraction" condition an attentionally demanding task is added to the driving experience. These conditions allow the researcher to contrast cortical activation during distraction with activation under identical conditions save for the distracting component. The optimal control task will elicit activation from the same brain regions (e.g., motor cortex), but not the primary experimental region of interest. Thus, as opposed to contrasting activation during a task that requires movement with rest, it may be more appropriate to employ a control task that also requires movement. In other words, if our paradigm is to study distraction during manual driving, then the non-distraction control condition should also include manual driving to account for cortical activation related to the driving task itself (e.g., motor cortex, spatial processing, etc.). If, on the other hand, the aim is to derive neurocognitive signatures of distraction during autonomous driving, then the control task ought to include full automation as well. When selecting a control task, it is helpful to first evaluate brain regions that are expected to be active during the task of interest, including regions such as motor cortex that may not be of relevance to the primary experimental hypotheses yet may show activity due to motion when responding. The same concept applies to the design of the distraction task itself. Since driving requires a variety of different cognitive functions (e.g., motor planning, spatial processing, temporal processing, etc.), a distraction task is likely to draw on some of the same cognitive functions. From a "statistical power" perspective it is therefore desirable to include a distraction task that predominantly utilizes other cognitive functions (e.g., auditory stimuli). Other research questions, however, might be tied to ecologically valid scenarios that require similar cognitive functions to the baseline driving task (e.g., use of visual GPS during driving). In these cases, an increase in the number of trials might provide sufficient statistical power. It is up to the clever experimenter and extensive piloting to identify control tasks that remain ecologically valid while satisfying statistical power criteria. In general, because the cognitive and physiological state of a participant may be expected to change over the course of an experiment, the presence of control trials throughout the task is important.
Control tasks used in the reviewed fNIRS driving studies varied greatly, from active driving tasks to passive resting states to studies that did not include a control task in their study design. While the majority (N = 37) employed an active control task (e.g., "baseline" driving without an experimental stimulus), six studies employed a passive control task ranging from resting states (Tsunashima and Yanagisawa, 2009; Oka et al., 2015; Huve et al., 2019; Zhu et al., 2019) to monitoring autonomous driving in the simulator. Five studies did not include a control task in their study design (Shimizu et al., 2009; Inoue et al., 2014; Khan and Hong, 2015; Nguyen et al., 2017).

Hardware Specs, Optode Distance, and Optode Placement
At the time this manuscript was prepared, we identified 28 fNIRS devices that were being marketed for human subjects research 4 . The specifications of these devices differ in many respects. For example, the number of optodes available in a given system will greatly affect the size and weight of the device. For lab-based studies where obtaining the greatest amount of cortical coverage possible is a priority, portability may not be an issue. However, for real-world studies that attempt to make use of a product's portability, researchers must often sacrifice cortical coverage in favor of a smaller form factor. These issues may also have an effect on aspects of fNIRS data quality due to variations in source strength (e.g., LED vs. laser light sources) or detector sensitivity (e.g., standard vs. avalanche photo diode), as well as sampling frequency (e.g., time-locked vs. simultaneous). Within the studies included here, 92% of studies applied a total of 16 different commercially available devices, of which only three were not portable solutions, while the remaining four studies (8%) reported use of devices that were built "in-house" (Nosrati et al., 2016; Nguyen et al., 2017; Li et al., 2018). The mean number of channels available across all devices was M = 30.9 (SD = 25.8), and ranged from one (Sturman and Wiggins, 2019) and two channel solutions (Li et al., 2009; Nosrati et al., 2016; Horrey et al., 2017) to 98 channels. Two studies did not specify the number of channels used (Xu L. et al., 2017), and four studies reported the use of a "tandem" system to increase the number of optodes available (Shimizu et al., 2009; FakhrHosseini et al., 2015; Ihme et al., 2018; Unni et al., 2017). As depicted in Figure 4, the most common placement of optodes was over the prefrontal cortex (PFC), with 56% of all studies reporting placement solely over the PFC (Li et al., 2009; Tomioka et al., 2009; Tsunashima and Yanagisawa, 2009; Shimizu T. et al., 2011; Nakano et al., 2013; Inoue et al., 2014; FakhrHosseini et al., 2015; Khan and Hong, 2015; Pradhan et al., 2015; Foy et al., 2016; Nosrati et al., 2016; Sibi et al., 2016, 2017; Balters et al., 2017; Horrey et al., 2017; Nguyen et al., 2017; Foy and Chapman, 2018; Huve et al., 2018, 2019; Le et al., 2018; Li et al., 2018; Khan et al., 2019; Sturman and Wiggins, 2019). Twenty-one studies (44%) report placement over the PFC as well as the motor (Yoshino et al., 2013a,b; Oka et al., 2015; Orino et al., 2015; Xu L. et al., 2017; Chuang et al., 2018), occipital (Shimizu et al., 2009; Unni et al., 2017; Ihme et al., 2018; Hidalgo-Munoz et al., 2019; Zhu et al., 2019), and parietal (Yoshino et al., 2013a,b; Oka et al., 2015; Orino et al., 2015; Unni et al., 2015, 2017; Bruno et al., 2018; Chuang et al., 2018; Ihme et al., 2018; Yamamoto et al., 2018, 2019; Hidalgo-Munoz et al., 2019; Zhu et al., 2019) cortices. Only one study included here did not report placement over the PFC (Lin et al., 2019), while one other study reported coverage of "almost the whole head" (Scheunemann et al., 2019).
Optode distance mediates the photon path depth that is sampled at each measurement. The recommended specification of 30-40 mm is thought to optimally sample hemodynamic activity in the cortex while maintaining an acceptable signal-to-noise ratio (Brigadoi and Cooper, 2015). However, because all photons pass through scalp vasculature, fNIRS measurements at the recommended optode distance are confounded by extracortical hemodynamics. As a solution, many fNIRS vendors now offer "short-channel" optode distances of approximately 5 mm. Since the photon path of a channel this "short" is very shallow, thus sampling only extracortical blood flow, the short-channel signal may be used during pre-processing or in statistical procedures to remove this unwanted component from the standard channels. Eighteen studies followed established guidelines of a constant 30-40 mm distance between optodes (Shang et al., 2007; Shimizu S. et al., 2011; Yoshino et al., 2013a,b; Orino et al., 2015; Nosrati et al., 2016; Nguyen et al., 2017; Unni et al., 2017; Bruno et al., 2018; Li et al., 2018; Hidalgo-Munoz et al., 2019; Khan et al., 2019; Lin et al., 2019; Scheunemann et al., 2019). A total of 26 articles (55%) did not, however, report optode distance, and four used varying distances, e.g., 20-30 mm, 20-40 mm (Ihme et al., 2018), and 30-40 mm. No studies included here reported the use of short channels.
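One simple way a short-channel signal could be used, shown here only as a hedged sketch, is to scale it by a least-squares coefficient and subtract it from a neighboring standard ("long") channel. The function name is hypothetical, the single-regressor model is a simplification of what fNIRS packages offer, and, as noted above, none of the reviewed driving studies reported using short channels.

```python
def regress_out_short_channel(long_channel, short_channel):
    """Remove the part of a long channel explained by a short channel.

    Fits long = mean + scale * (short - mean_short) by least squares and
    returns the long channel with the scaled short-channel term removed.
    """
    n = len(long_channel)
    if n != len(short_channel):
        raise ValueError("channels must have the same number of samples")
    mean_long = sum(long_channel) / n
    mean_short = sum(short_channel) / n
    var_short = sum((s - mean_short) ** 2 for s in short_channel)
    if var_short == 0.0:
        return list(long_channel)  # flat short channel: nothing to remove
    cov = sum((l - mean_long) * (s - mean_short)
              for l, s in zip(long_channel, short_channel))
    scale = cov / var_short
    return [l - scale * (s - mean_short)
            for l, s in zip(long_channel, short_channel)]
```

The design choice here is deliberately conservative: only variance shared with the short channel is removed, so a cortical response that is uncorrelated with scalp blood flow passes through unchanged.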
In addition to optode distance, accurate optode placement is required to target regions of interest. The use of a standardized method to place optodes is necessary to ensure that the regions of interest are appropriately covered consistently across participants. Common methods such as the International 10/20 system have been shown to provide consistent coverage despite changes in head size across participants (Okamoto et al., 2004). Outside of the needs within a study, accurate reporting of optode placement will assist in the future replication of studies. About half (52%) of the research articles reviewed here do not specify the optode placement strategy, while four studies used the 10/10 International System (Liu et al., 2017; Xu L. et al., 2017), and nine other studies the 10/20 International System (Tomioka et al., 2009; Khan and Hong, 2015; Nosrati et al., 2016; Unni et al., 2017; Bruno et al., 2018; Chuang et al., 2018; Ihme et al., 2018; Hidalgo-Munoz et al., 2019; Lin et al., 2019). Other studies specified only a placement, such as 4 cm from the midline and 2 cm above the supra-orbital ridge (Shimizu T. et al., 2011; Horrey et al., 2017). Five studies used a 3D neuroscan digitizer (e.g., Polhemus) to co-register the optode positions on the head (Yoshino et al., 2013a,b; Oka et al., 2015; Orino et al., 2015, 2017), and one study computed a sensitivity profile by projecting the fNIRS probe onto a digital brain atlas. Only two studies report the use of optode placement software (Ihme et al., 2018).

Data Processing and Statistical Analysis
Efforts have been made to develop and standardize fNIRS data processing procedures and tools (Di Lorenzo et al., 2019). For example, the decision tree in Figure 5 outlines a common fNIRS data processing pipeline. We refer the reader to  for a more detailed overview of the most common fNIRS data processing steps. While a full review of these methods is outside the scope of this paper, the figure shows the order in which each step is generally taken, beginning with raw optical density. The attempt to standardize data processing procedures (Herold et al., 2017, 2018) has had a positive impact on the fNIRS community. Such efforts are also supported by the development and increasingly common usage of fNIRS-specific data analysis packages (e.g., HOMER2, HOMER3, NIRS SPM, nirsLAB, fNIRSOFT, open-potato, PHOEBE, etc.).
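To make the early steps of such a pipeline concrete, the sketch below converts raw intensity to a change in optical density and then to concentration changes with the modified Beer-Lambert law. It is a simplified illustration, not the pipeline of Figure 5 or of any named package: filtering and motion correction are omitted, and the extinction coefficients, differential pathlength factors, and function names are hypothetical placeholders chosen only to preserve the qualitative pattern that HbR absorbs more strongly at the shorter wavelength and HbO at the longer one.

```python
import numpy as np

# Hypothetical extinction coefficients, rows = wavelengths (short, long),
# columns = (HbO, HbR). Placeholder values for illustration only; real
# analyses should take tabulated values matching the logarithm base used.
EXT = np.array([[0.6, 1.5],
                [1.1, 0.8]])

def intensity_to_od(intensity):
    """Raw intensity (samples x wavelengths) -> change in optical density,
    using the mean intensity of each wavelength as the reference."""
    intensity = np.asarray(intensity, dtype=float)
    return -np.log(intensity / intensity.mean(axis=0))

def od_to_concentration(delta_od, distance_cm, dpf, ext=EXT):
    """Modified Beer-Lambert law.

    Solves delta_od(wl) = (ext[wl] @ [dHbO, dHbR]) * distance * DPF(wl)
    for the concentration changes at every sample.
    """
    path = distance_cm * np.asarray(dpf, dtype=float)
    scaled = np.asarray(delta_od, dtype=float) / path
    return np.linalg.solve(ext, scaled.T).T    # samples x (HbO, HbR)

# Round trip on simulated data (all numbers are assumptions).
true_conc = np.array([[0.000, 0.000],
                      [0.010, -0.003],
                      [0.020, -0.006]])
distance_cm, dpf = 3.0, np.array([6.0, 6.0])
od = (true_conc @ EXT.T) * (distance_cm * dpf)
intensity = 1000.0 * np.exp(-od)
recovered = od_to_concentration(intensity_to_od(intensity), distance_cm, dpf)
```

Because the reference is the mean intensity, the recovered values differ from the simulated ones by a constant offset per chromophore; the changes between samples are preserved.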
A similar variety of analysis approaches was noted for the reviewed studies. For example, the majority of studies (N = 29) averaged the hemodynamic activity recorded during all trials or events to calculate the mean values, standard deviations, and/or maximum values across them. These values were commonly used to conduct inferential group-level statistics, including t-tests and ANOVA. The remaining approaches varied between connectivity analyses (Xu L. et al., 2017), GLM (Sibi et al., 2017; Bruno et al., 2018), linear or logistic regression on subject-level time series data (Ihme et al., 2018; Scheunemann et al., 2019), machine learning (Le et al., 2018; Huve et al., 2018, 2019; Khan et al., 2019; Zhu et al., 2019), factor analysis, linear discriminant analysis (Khan and Hong, 2015), and frequency power analysis to conduct non-parametric tests across mean change in power.
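The trial-averaging approach used by the majority of studies can be sketched as follows. This is a minimal illustration assuming a single channel, known event onsets given as sample indices, and baseline correction by the pre-onset mean; the function name, window lengths, and synthetic signal are hypothetical.

```python
import numpy as np

def block_average(signal, onsets, pre, post):
    """Epoch a 1-D signal around each onset and average across trials.

    Each epoch spans `pre` samples before to `post` samples after the onset
    and is baseline-corrected by subtracting its pre-onset mean. Returns the
    across-trial average time course and the per-trial post-onset means.
    """
    signal = np.asarray(signal, dtype=float)
    epochs = np.array([signal[o - pre:o + post] for o in onsets])
    epochs = epochs - epochs[:, :pre].mean(axis=1, keepdims=True)
    return epochs.mean(axis=0), epochs[:, pre:].mean(axis=1)

# Synthetic example: three 10 s task blocks at 10 Hz (assumed values).
signal = np.zeros(1000)
onsets = [100, 400, 700]
for onset in onsets:
    signal[onset:onset + 100] = 1.0

average, trial_means = block_average(signal, onsets, pre=20, post=100)
```

The per-trial means returned here are the kind of summary values that would then be carried forward into group-level tests such as t-tests or ANOVA.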

DISCUSSION AND CONCLUSION
The widespread adoption of fNIRS to study the brain's response to driving has led to many interesting research questions and findings. This review highlighted how methodological approaches, data processing steps, and analyses often vary greatly across the studies. While such scientific diversity is to be expected in this early phase of "naturalistic neuroscience," it may also hamper the generalization of findings and make it harder for researchers to compare and confirm results moving forward. As a consequence, systematic comparisons (i.e., meta-analysis) of the findings to establish generalizable results are difficult if not impossible. One striking outcome of our review, and one that may help explain the high amount of methodological variance across studies, is the distribution of experiments conducted across the neuroscience and engineering disciplines. As shown in Figure 6A, roughly half (N = 25) of the manuscripts were published in engineering journals/conference-proceedings, while the remaining manuscripts (N = 23) were published in neuroscience journals/conference-proceedings. Both disciplines have largely focused on the same sub-topics of driving research (see Figure 6B). This similar distribution provides a naturally occurring overlap in empirical focus across disciplines that affords a unique opportunity to compare and contrast the strengths and weaknesses of both disciplines.
Notably, seven studies (i.e., six in engineering and one in neuroscience) applied fNIRS to assess cognitive function while driving with automated features (i.e., adaptive cruise control) or while engaging with higher automated systems (SAE Level 2-3; Balters et al., 2017; Sibi et al., 2017; Hidalgo-Munoz et al., 2019; Huve et al., 2019; Zhu et al., 2019). While these studies provide valuable first insights into brain function related to autonomous driving scenarios, the applied methodologies differed in many aspects, such as analysis approach (i.e., GLM vs. block averaging vs. machine learning), data processing steps, and metrics used (i.e., HbO vs. HbO and HbR vs. tHb vs. THR), to name a few. This diversity in experimental approaches highlights the need for methodological standardization so that meta-analyses of results may be conducted in the future. Moreover, all seven autonomous driving studies were executed within a simulator environment. While exclusive use of a simulator environment might be attributed to safety-critical considerations during the experiment, the need to conduct on-road "ecologically valid" studies is obvious. Multiple factors inherent to on-road driving (e.g., motion-induced artifacts due to driver movement or road imperfections, sunlight, etc.) introduce noise into fNIRS data that must be addressed to adequately analyze data from future studies. Beyond the repeatability of stimuli and the safety that a simulator provides, it is specifically artifacts, whether induced by the environment (e.g., road vibrations and sunlight) or by the driver (e.g., motion), that have to be mastered to advance valid on-road driving research. In order to move toward standardization and to "make fNIRS ready" for autonomous on-road driving, we provide recommendations below (see Figure 7).
We argue that an effort toward standardization and advancement within three domains (i.e., immediate methodological advancements, analysis, and hardware) may facilitate a more efficient and meaningful progression of fNIRS research toward reliable on-road measurements.

Immediate Methodological Advancements
As described above, the field can clearly benefit from attention to participant selection and more detailed documentation of the participant cohort. Proper participant selection (e.g., sufficient sample size) as well as proper reporting of the participant cohort will enhance research quality and generalizability, and provide the reader with the information needed for making consistent and valid inferences. Our methodological review discussed the importance of including task-familiarization procedures and the need for a physiological baseline, as well as sufficient inter-block/inter-trial intervals in future studies. We further highlighted the importance of determining task duration and task repetition, depending on the underlying analytical approach. We identified a trade-off in control task design, often driven by the desire to maintain high levels of ecological validity versus experimental control over cortical activations within a region of interest. Similarly, our review revealed considerable divergence in data processing steps across studies. Many papers did not use or did not document data filtering procedures, despite readily available processing software (e.g., the MATLAB-based HOMER2) and step-by-step instructions (e.g., Brigadoi et al., 2014; Di Lorenzo et al., 2019). We argue that future interdisciplinary fNIRS driving research should consider using these pre-existing analysis tools. At the same time, it is vital that these open access tools are maintained through ongoing discussion within the community to avoid detrimental biases and/or assumptions as well as stagnation in novel algorithm development. In addition, it is important that future research reports processing steps to enhance interpretation and replication. Further, our review demonstrated high variation in the hemodynamic signal used and reported. The majority of papers reported using traditional metrics of brain function (i.e., HbO and HbR), though not consistently across studies.
As others have suggested (e.g., Herold et al., 2017), we propose that future studies should utilize at least the two standard fNIRS metrics (i.e., HbO and HbR). Finally, while analytical variations are expected, and indeed often required, across studies, it should be noted that differences in data processing and analysis approaches can hamper our ability to compare obtained results (Tak and Ye, 2014; Herold et al., 2017). Accurate and detailed reporting of statistical output (e.g., effect size, sample size, degrees of freedom, etc.) is essential for allowing meta-analyses of study outcomes. Of the studies included in this review, only 77% reported statistics sufficient for meta-analysis. All of the above emphasizes that a detailed reporting of methods and results is needed to enhance future research.
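As an illustration of the statistical output named above, the following sketch derives sample size, degrees of freedom, a paired t statistic, and a standardized effect size from per-participant condition means. It assumes a within-subject design with two conditions and uses Cohen's d for paired differences (d_z) as one common choice; the function name and the numbers in the example are hypothetical and are not results from any reviewed study.

```python
import math
from statistics import mean, stdev

def paired_summary(condition_a, condition_b):
    """Summary statistics for a paired comparison of two conditions.

    Inputs are per-participant mean values (e.g., HbO change) in the same
    participant order. Returns the quantities a meta-analysis would need.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("conditions must contain the same participants")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    sd_diff = stdev(diffs)                  # sample SD of the differences
    d_z = mean(diffs) / sd_diff             # Cohen's d for paired data
    return {
        "n": n,
        "df": n - 1,
        "mean_diff": mean(diffs),
        "sd_diff": sd_diff,
        "t": d_z * math.sqrt(n),            # paired t statistic
        "cohens_dz": d_z,
    }

# Hypothetical per-participant values for two driving conditions.
manual = [0.42, 0.35, 0.50, 0.38, 0.47]
automated = [0.30, 0.33, 0.41, 0.30, 0.40]
summary = paired_summary(manual, automated)
```

Reporting all of these values together, rather than a p-value alone, lets later work recompute effect sizes under a different convention if needed.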

Analysis
Our methodological review highlighted many different approaches with respect to the choice of data analyses, data processing, and hemodynamic proxies (e.g., oxygenated and deoxygenated hemoglobin) across studies. This diversity is, in part, attributed to the wide range of research questions investigated. For instance, studies focusing on single driving events (e.g., near-collision events) employed different analytical approaches than those that focused on driver fatigue over long durations. We argue that future research should utilize experimental procedures that allow for inclusion of results in future meta-analyses. Future research should also prioritize experimental approaches that are ecologically valid and analytic pipelines that can be widely adopted across researchers in all disciplines. For example, the community could agree on and define standardized procedures for documentation (e.g., optode placement, data processing steps, metrics) that should be reported in publications. Journals could provide checklists, to both authors and reviewers, to ensure the reporting of necessary information. Our review further highlighted the need for the development of analytics that allow study of single events as well as over-time measures. Multi-disciplinary teams of both automotive engineers and neuroscientists could jointly tackle this challenge to address both the analytical needs of ecologically valid scenarios and neurophysiological feasibility. Further development and maintenance (along with usage) of open-source software will enhance cross-study comparability for future research. We argue that more effort is needed to develop and disseminate such analytical tools via peer-reviewed publication and open-source file sharing.
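One way such a reporting checklist could be made machine-checkable is sketched below. The record and its field names are hypothetical: they are drawn from the items this review found to be frequently unreported (optode distance, placement strategy, short channels, processing steps, metrics, sample size) and do not represent an agreed community standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class MethodsReport:
    """Hypothetical methods-reporting record; None means 'not reported'."""
    optode_distance_mm: Optional[str] = None
    placement_system: Optional[str] = None
    short_channels_used: Optional[bool] = None
    processing_steps: Optional[List[str]] = None
    hemodynamic_metrics: Optional[List[str]] = None
    sample_size: Optional[int] = None

def missing_items(report: MethodsReport) -> List[str]:
    """Return the names of checklist items that were left unreported."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]

# Example manuscript record with two items left unreported.
report = MethodsReport(
    optode_distance_mm="30",
    placement_system="International 10/20",
    hemodynamic_metrics=["HbO", "HbR"],
    sample_size=24,
)
unreported = missing_items(report)
```

Only `None` counts as unreported, so an explicit `False` for short channels or an empty list of processing steps is treated as a reported value.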

Hardware
Further advances in fNIRS portability as well as reduction in motion artifact effects (i.e., induced by both the human and the moving environment) will undoubtedly enhance our ability to execute high-quality fNIRS autonomous driving research on the road. Reducing the effect of sunlight and/or artificial light will further increase data usability. Including short channels in the standard setup of hardware systems will allow researchers to robustly filter and/or account statistically for noise caused by physiological signals (e.g., heart rate, Mayer waves, breathing rates). Higher-density head coverage with more optodes will allow researchers to derive holistic brain models of the human cortex. Improved fit and comfort of optodes on the scalp will allow for longer study durations, which may benefit long-duration driving studies. Alternatively, single-channel solutions could be applied for long-duration driver monitoring scenarios once a critical brain region of interest is identified. Several studies have demonstrated multi-modal sensor approaches, such as the use of EEG-fNIRS hybrid systems, which will permit the study of brain function with a high degree of temporal and spatial resolution (Lin et al., 2019). The integration of multi-modal brain imaging, along with concurrent physiological and behavioral measurements, will help to generate a more holistic and accurate model of human driving behavior. Notably, twelve of our reviewed studies already included additional sensors to detect heart rate, heart rate variability, breathing rate, as well as eye blinking and/or eye closure rate (Khan and Hong, 2015; Horrey et al., 2017; Nguyen et al., 2017; Unni et al., 2017; Bruno et al., 2018; Chuang et al., 2018; Foy and Chapman, 2018; Lin et al., 2019; Sturman and Wiggins, 2019). These are promising examples of such a multi-modal approach.
Overall, we believe that a joint effort on the part of neuroscience and engineering disciplines will continue to advance our ability to measure and understand brain function during autonomous driving scenarios in the simulator (e.g., safe environment for critical testing) and ultimately within on-road settings. Promoting the benefits of enhanced communication and interaction between the two disciplines holds promise for motivating new and productive interdisciplinary collaborations. Potential opportunities include the convening of special interest groups at conferences, the promotion of joint-disciplinary calls for papers and presentations, or the formation of a shared society. The interdisciplinary effort across engineering and neuroscience toward determining how the brain functions when we "operate" a motor vehicle across all SAE levels of automation will help to design and engineer safe driving of the future.