Capturing Human Interaction in the Virtual Age: A Perspective on the Future of fNIRS Hyperscanning

Advances in video conferencing capabilities, combined with dramatic socio-dynamic shifts brought about by COVID-19, have redefined the ways in which humans interact in modern society. From business meetings to medical exams, and from classroom instruction to yoga classes, virtual interfacing has permeated nearly every aspect of our daily lives. A seemingly endless stream of technological advances, combined with our newfound reliance on virtual interfacing, makes it likely that humans will continue to use this modern form of social interaction into the future. However, emergent evidence suggests that virtual interfacing may not be equivalent to face-to-face interaction. Ultimately, too little is currently understood about the mechanisms that underlie human interaction across the virtual divide, including how these mechanisms differ from those of traditional face-to-face interaction. Here, we propose functional near-infrared spectroscopy (fNIRS) hyperscanning (the simultaneous measurement of two or more brains) as an optimal approach to quantify potential neurocognitive differences between virtual and in-person interactions. We argue that increased focus on this understudied domain will help elucidate why virtual conferencing does not always measure up to in-person meetings and will also spur new technologies designed to improve the virtual interaction experience. On the basis of the existing fNIRS hyperscanning literature, we highlight current gaps in research on virtual interactions. Furthermore, we provide insight into current hurdles in fNIRS hyperscanning hardware and methodology that should be addressed in order to shed light on this newly critical element of everyday life.


INTRODUCTION
The COVID-19 pandemic has dramatically disrupted the daily lives of much, if not all, of the world's population. Almost overnight, in-person social interactions were replaced by video conferencing. Today, "Zoom meetings" are commonplace and have largely allowed us to continue our daily routines. Indeed, in the weeks after COVID-19 emerged across the globe, downloads of videoconferencing apps increased by more than 90% over the 2019 average (AppAnnie.com 2020). Since then, videoconferencing has been a vital tool for business, medicine, education, and social interaction alike. Despite our ability to stay "connected," there is both empirical and anecdotal evidence to suggest that these media are inadequate substitutes for traditional in-person social interaction. For example, virtual interactions have been shown to have adverse effects on emotional and mental health (Holmes et al., 2020; Pfefferbaum and North, 2020), educational outcomes (Ahmed et al., 2020; Schwartz et al., 2020), and medical care services (Hollander and Carr, 2020; Pappot et al., 2020). Moreover, the glut of popular press articles lamenting the negative effects of "Zoom fatigue" in its many forms (BBC, April 22, 2020; National Geographic, April 24, 2020; New York Times, May 4, 2020; the Wall Street Journal, June 5, 2020) testifies to the negative impact that this new form of communication may have on human-to-human interaction.
These reports are concerning given that video conferencing is likely to play a significant role in people's lives for the foreseeable future (Van Bavel et al., 2020). Critically, too little is currently understood about the neurocognitive mechanisms underlying the adverse effects reported above (e.g., increased social isolation, poorer learning outcomes, increased fatigue). In fact, to our knowledge, no study to date has directly compared the neural signatures of social interaction between virtual and in-person settings. We argue that it is critically important to understand the neural mechanisms that underlie digital human-to-human interaction, and specifically how these mechanisms may differ from those of traditional in-person interaction. We propose functional near-infrared spectroscopy (fNIRS) hyperscanning (i.e., measuring two or more brains simultaneously as they interact socially) as a tool to quantify and understand the potential differences between virtual and in-person interactions. As we argue below, fNIRS hyperscanning may provide an ideal approach to elucidate neurocognitive differences between virtual and in-person interactions that may arise from changes in social behavior (e.g., eye-to-eye contact), from differences in environmental information (e.g., disparate background/foreground lighting), and/or from technological parameters (e.g., unequal frame rates). A clear understanding of the underlying neural mechanisms could inform the development of behavioral interventions and/or the design and engineering of technology that helps mitigate these adverse effects. For example, imagine brief yet highly effective pro-social behavioral exercises that combat social isolation, or software that simply synchronizes frame rates to decrease fatigue during virtual teaching and learning.
There is conceptual and empirical evidence that social cognition is fundamentally different when we interact with others rather than merely observe them (Schilbach et al., 2013). Hyperscanning technology has allowed us to shed light on the neural processes underpinning social cognition (Babiloni and Astolfi, 2014; Wang et al., 2018). Over the past decade, the field of fNIRS hyperscanning has grown dramatically and has provided unique insight into signatures of brain-to-brain connectivity that are invisible to the naked eye (Dumas et al., 2011; Babiloni and Astolfi, 2014; Redcay and Schilbach, 2019). Specifically, fNIRS hyperscanning has highlighted inter-brain coherence (i.e., correlation of cortical activity between brains) that occurs during social interactions such as cooperation (Cui et al., 2012; Yang et al., 2020) and that is often associated with enhanced behavioral metrics of interaction (Baker et al., 2016). Importantly, given fNIRS' relatively robust tolerance to movement and its methodological flexibility, hyperscanning in this modality allows researchers to observe shared neural activity in naturalistic environments that are often not feasible in other modalities, such as fMRI or EEG (Scholkmann et al., 2013; Baker et al., 2017; Quaresima and Ferrari, 2019; Gvirts and Perlmutter, 2020). The dramatic increase in fNIRS hyperscanning research has spurred the publication of several systematic reviews, to which we refer the interested reader (Babiloni and Astolfi, 2014; Wang et al., 2018; Czeszumski et al., 2020). In this paper, we review the methodology used in fNIRS hyperscanning research and provide a novel framework to help guide future studies in advancing the field toward capturing human interaction in the virtual age.

DERIVING AN FNIRS HYPERSCANNING FRAMEWORK
We executed a keyword search via Google Scholar and PubMed up to May 15, 2020, using the keywords "fNIRS hyperscanning" and "NIRS hyperscanning." For each search engine, we inspected the first 250 entries for each keyword and checked the reference lists of the included articles for any additional relevant articles. We included English-language journal and conference articles only, resulting in a total of 69 fNIRS hyperscanning studies. Because this paper focuses on interaction between adults, we excluded nine infant-parent fNIRS hyperscanning studies (Leong et al., 2017; Reindl et al., 2018; Azhari et al., 2019, 2020; Miller et al., 2019; Quiñones-Camacho et al., 2019; Behrendt et al., 2020; Nguyen et al., 2020; Piazza et al., 2020). We further excluded two papers that compared temporally non-congruent fNIRS scans (Liu Y et al., 2017; Hou et al., 2020), leaving a total of 58 fNIRS hyperscanning papers (see Table 1 for an overview). From each of these 58 papers, we extracted all experimental conditions (i.e., "hyperscan" conditions) that were used and from which data were analyzed.
To identify a consistent methodological structure across the resulting 151 hyperscans, two researchers (SB and JMB) conducted a thematic analysis. Two dimensions (i.e., Transfer of Information and Type of Communication) emerged naturally across the scans. First, Transfer of Information (ToI) refers to the interface through which human-to-human interaction was conveyed. We clustered ToI into three levels: (1) hyperscans that comprised human-to-human interaction in a face-to-face setting (i.e., Analog), where no digital medium was present; (2) hyperscans that comprised a combination of analog and digital transfer methods (i.e., Mixed ToI), such as sitting side-by-side while problem
Table 1 notes: "Shielded" refers to a setup in which the participants' interaction is shielded by a physical divider, and "cond." abbreviates condition(s). Studies that included wavelet coherence analysis are marked "WTC." The table also lists the cognitive functions required to execute each experimental task; those that were explicitly investigated are marked with an asterisk (*).

EXISTING FNIRS HYPERSCANNING HURDLES
Taken together, our analysis highlights the areas of study that have received little to no attention. Specifically, no fNIRS hyperscanning study has, to date, focused on understanding pure Digital ToI (i.e., virtual meeting) nor has any study focused on comparing Digital ToI with Analog ToI (i.e., in-person meeting).
Similarly, Joint open-ended ToC (e.g., chit-chat with a friend via Zoom) has received very little empirical attention.
The lack of focus on Digital ToI is likely due, in part, to technological or methodological shortcomings that constrain this line of research. For instance, many fNIRS devices do not easily accommodate a digital hyperscanning setup, in which participants would sit in separate rooms so that no in-person communication can occur. While it may be feasible, for example, to build a structure that splits the optodes of one device so that two physically separated participants can be scanned, this may be unrealistic for many researchers. Thus, even interested researchers may find such methodology prohibitively difficult. One alternative is to use two individual fNIRS devices, each positioned in its own room. However, aside from the added cost, researchers must then be able to accurately synchronize the time series recorded by both devices to facilitate downstream processing and analysis of their data. This may require the development of sophisticated software to synchronize and timestamp event markers wirelessly across both devices. Notably, while promising tools for this purpose do exist (e.g., Lab Streaming Layer), there is currently no readily available tool designed specifically for fNIRS hyperscanning. We argue that more effort is needed to develop and disseminate such tools via peer-reviewed publication and open-source file sharing. Alternatively, researchers may video record both members of a separated dyad to capture events, then code the event timestamps post hoc. This procedure is useful but requires considerable time and manual effort. Moreover, such coding should be performed independently by at least two raters so that inter-rater reliability can be established. We hope that advances within the community will help overcome this hardware hurdle and facilitate the study of the Digital ToI domain.
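Once both devices' timestamps are expressed on a shared clock (the service that tools such as Lab Streaming Layer aim to provide), much of the remaining bookkeeping is resampling. The Python sketch below illustrates that step under stated assumptions: a constant clock offset estimated from a single trigger event seen by both devices (no clock drift), linear interpolation onto a common grid restricted to the overlapping interval, and event markers already timestamped on the common clock. The function names (`to_common_clock`, `align_streams`, `marker_indices`) are hypothetical and not part of any existing toolbox.

```python
import numpy as np


def to_common_clock(t_local, trigger_local, trigger_reference):
    """Shift one device's timestamps so a shared trigger lines up with the reference.

    Hypothetical helper; assumes a constant offset between clocks (no drift).
    """
    return np.asarray(t_local, dtype=float) + (trigger_reference - trigger_local)


def align_streams(t_a, x_a, t_b, x_b, fs_common):
    """Linearly resample two recordings onto one grid spanning their overlap.

    t_a, t_b: per-sample timestamps (s) on the common clock.
    x_a, x_b: arrays of shape (samples,) or (samples, channels).
    """
    start, stop = max(t_a[0], t_b[0]), min(t_a[-1], t_b[-1])
    if stop <= start:
        raise ValueError("recordings do not overlap in time")
    grid = np.arange(start, stop, 1.0 / fs_common)

    def resample(t, x):
        x = np.asarray(x, dtype=float)
        x = x[:, None] if x.ndim == 1 else x  # always samples x channels
        return np.column_stack([np.interp(grid, t, x[:, c]) for c in range(x.shape[1])])

    return grid, resample(t_a, x_a), resample(t_b, x_b)


def marker_indices(grid, marker_times, fs_common):
    """Index of the nearest grid sample for each event marker (common clock)."""
    idx = np.round((np.asarray(marker_times, dtype=float) - grid[0]) * fs_common).astype(int)
    return np.clip(idx, 0, len(grid) - 1)
```

Linear interpolation is usually adequate for slow hemodynamic signals; over long sessions with drifting clocks, the single offset would need to be replaced by a fitted mapping between the two clocks.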
The lack of data within the open-ended ToC domain may be due less to technological drawbacks and more to a lack of established analytical approaches for tasks that are not trial-based. To quantify and analyze brain-to-brain coupling, researchers have applied more traditional statistical approaches, such as block averaging (e.g., Holper et al., 2013), analysis of covariance (e.g., Funane et al., 2011), and correlation analysis (e.g., Duan et al., 2013). Cui et al. (2012) introduced wavelet transform coherence (WTC) to fNIRS hyperscanning, an approach in which the coherence and phase lag between two time series are assessed across both time and frequency. By contrasting the average coherence during the task (i.e., a cooperation paradigm) with that during rest, the authors demonstrated an increase in coherence during cooperation that dissipated during rest. WTC has since been widely adopted in fNIRS hyperscanning research (as shown in Table 1, roughly 70% of all studies included WTC analysis), and there are efforts to further improve its efficacy. However, because the method is typically applied to block-design studies in which a task frequency band and condition markers can be identified, it is currently poorly suited to capturing the momentary, fluctuating components of social interaction. Recent approaches (e.g., Mayseless et al., 2019) have therefore attempted to develop novel analytical methods that do not rely on task blocks and that may be applicable to open-ended task designs. Finally, Granger causality, a method for inferring the directionality of coupling between two time series, has also proven useful for investigating the fluctuations of interactive dynamics between individuals. As with WTC, further advances in Granger causality analysis might allow for investigations of fluctuating social dynamics during joint open-ended interactions.
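To make the WTC idea concrete, the Python sketch below computes a simplified wavelet coherence between two signals using NumPy only: coherence = |S(s⁻¹Wxy)|² / (S(s⁻¹|Wx|²) · S(s⁻¹|Wy|²)), where W are continuous wavelet transforms and S is smoothing in time and scale. It is a hedged illustration, not the pipeline of Cui et al. (2012) or of any toolbox. It assumes an analytic Morlet wavelet (ω0 = 6), Gaussian time smoothing with width equal to the scale, boxcar smoothing over roughly 0.6 octaves of neighboring scales, and circular (unpadded) transforms; it omits the cone of influence and significance testing. The function names are hypothetical.

```python
import numpy as np


def morlet_cwt(x, dt, scales, w0=6.0):
    """FFT-based continuous wavelet transform with an analytic Morlet wavelet."""
    n = len(x)
    x_hat = np.fft.fft(x - np.mean(x))
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)  # angular frequency of each bin
    W = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # Morlet wavelet in the Fourier domain, zeroed at non-positive frequencies.
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        W[i] = np.fft.ifft(x_hat * np.sqrt(2 * np.pi * s / dt) * psi_hat)
    return W


def smooth(A, dt, scales, dj, scale_width=0.6):
    """Smooth in time (Gaussian, std = scale) and across scales (boxcar)."""
    n = A.shape[1]
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    out = np.empty(A.shape, dtype=complex)
    for i, s in enumerate(scales):
        # Circular Gaussian smoothing in time, applied in the Fourier domain.
        out[i] = np.fft.ifft(np.fft.fft(A[i]) * np.exp(-0.5 * (s * omega) ** 2))
    # Boxcar over about `scale_width` octaves of neighboring scales.
    k = max(1, min(int(round(scale_width / dj)), len(scales)))
    box = np.ones(k) / k
    return np.apply_along_axis(lambda col: np.convolve(col, box, mode="same"), 0, out)


def wavelet_coherence(x, y, dt, dj=1 / 12, w0=6.0):
    """Return (coherence, relative phase, Fourier period) on a scale x time grid."""
    n = len(x)
    s0 = 2 * dt
    n_scales = int(np.log2(n * dt / s0) / dj) + 1
    scales = s0 * 2.0 ** (dj * np.arange(n_scales))
    Wx, Wy = morlet_cwt(x, dt, scales, w0), morlet_cwt(y, dt, scales, w0)
    inv_s = 1.0 / scales[:, None]  # scale normalization before smoothing
    Sxx = smooth(inv_s * np.abs(Wx) ** 2, dt, scales, dj).real
    Syy = smooth(inv_s * np.abs(Wy) ** 2, dt, scales, dj).real
    Sxy = smooth(inv_s * Wx * np.conj(Wy), dt, scales, dj)
    wtc = np.clip(np.abs(Sxy) ** 2 / (Sxx * Syy), 0.0, 1.0)  # clip round-off
    period = scales * 4 * np.pi / (w0 + np.sqrt(2 + w0 ** 2))
    return wtc, np.angle(Sxy), period
```

In a block design, task-related coherence could then be summarized by averaging `wtc` within a chosen period band separately over task and rest samples and contrasting the two, as described above; the band and the contrast are study-specific choices.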
It will be important for future research to build upon these approaches, and to develop algorithms and techniques to better facilitate analysis of hyperscanning data.
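The Granger causality approach mentioned above can be sketched as a textbook bivariate test with ordinary least squares: one participant's signal is said to Granger-cause the other's if adding its lagged history to an autoregressive model of the other signal significantly reduces the residual error. The Python sketch below assumes stationary, detrended signals and a fixed, user-chosen lag order `p`, and applies no correction for serial correlation or multiple comparisons; it is not the specific implementation used in the hyperscanning studies cited above, and the helper names are hypothetical. It returns only the F statistic; a p-value could be obtained from the F distribution with the degrees of freedom computed in the function (e.g., via `scipy.stats.f.sf`).

```python
import numpy as np


def lagged(v, p):
    """Design columns v[t-1], ..., v[t-p] for t = p, ..., len(v) - 1."""
    n = len(v)
    return np.column_stack([v[p - k:n - k] for k in range(1, p + 1)])


def residual_ss(X, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)


def granger_f(x, y, p):
    """F statistic for the hypothesis that past values of y help predict x."""
    target = x[p:]
    intercept = np.ones((target.size, 1))
    restricted = np.hstack([intercept, lagged(x, p)])  # x's own past only
    full = np.hstack([restricted, lagged(y, p)])       # plus y's past
    rss_r, rss_f = residual_ss(restricted, target), residual_ss(full, target)
    df_num, df_den = p, target.size - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)
```

Computing both `granger_f(x, y, p)` and `granger_f(y, x, p)` gives a rough sense of which partner's signal better predicts the other's; repeating the test over short sliding windows is one way to track fluctuating dynamics, at the cost of statistical power.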

A PERSPECTIVE OF THE FUTURE POTENTIAL OF FNIRS HYPERSCANNING
The structure presented in Figure 1 is reminiscent of a similar framework that was introduced earlier in this journal (Liu and Pelowski, 2014). Specifically, Liu and Pelowski (2014) proposed a framework that distinguished between task structure (interdependent vs. independent), interaction structure (concurrent vs. turn-based), and goal structure (cooperative vs. competitive) as variables that hyperscanning studies should consider during task design. As the field of fNIRS hyperscanning progresses toward Real-life Neuroscience (Shamay-Tsoory and Mendelsohn, 2019; Holleman et al., 2020), an updated framework that includes virtual social interactions (i.e., Digital ToI) as well as open-ended interactions (i.e., Joint open-ended ToC) is warranted. We propose that our updated framework, as depicted in Figure 2A, can help guide hyperscanning researchers toward a future in which all forms of human-to-human social interaction are fairly represented. To achieve such an even distribution, the community must overcome the hurdles described above. These include, but are not limited to, developing methodological designs that address each cell in Figure 2A, hardware that is amenable to hyperscanning when participants are physically separated, and software that is capable of managing the back-end data streams of such tasks. We hope that both hardware and software will become flexible enough to approach increasingly realistic scenarios in which complex and sudden social interactions can be captured (see Figure 2B).
FIGURE 2 | (A) This matrix provides a schematic of all nine possible intersections of ToI and ToC within our framework. The schematic shows three hypothetical tasks being conducted across each intersection. First, data analysis (denoted by the bar chart) provides an example of a Joint goal-directed ToC. Next, one person performing (denoted by the star) while one or more people watch passively (denoted by the eye) provides an example of a Mixed ToC task. Finally, friendly chit-chat (denoted by the chat bubbles) provides an example of a Joint open-ended ToC. Importantly, each of these activities may be conducted under Analog, Mixed, or Digital ToI. (B) This schematic demonstrates a hypothetical 3-person hyperscan that fluctuates continuously across time through multiple domains outlined in our framework. First, a pair of participants situated in the same room engage in open-ended conversation for a period of time (1). Next, a third participant joins the pair via a live video feed, which introduces a mixed digital interface between the three participants (2). Following a period of chit-chat, the triad begins work on a goal-driven task together (3). Next, one of the two participants situated together exits, leaving an interacting pair, separated by a digital divide, that works together on the goal-driven task (4). These participants continue to work on the goal-driven task until completion (5).
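To make the framework usable for continuous recordings like the one sketched in Figure 2B, each stretch of a session could be annotated with its ToI and ToC level so that inter-brain measures can later be summarized per framework cell. The Python sketch below is a hypothetical bookkeeping scheme: the class and function names and the segment timings in `example_session` are illustrative assumptions, not data from any study.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class ToI(Enum):
    ANALOG = "Analog"
    MIXED = "Mixed"
    DIGITAL = "Digital"


class ToC(Enum):
    JOINT_GOAL_DIRECTED = "Joint goal-directed"
    MIXED = "Mixed"
    JOINT_OPEN_ENDED = "Joint open-ended"


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from recording onset
    end: float    # exclusive
    toi: ToI
    toc: ToC


def cell_at(segments, t):
    """Return the (ToI, ToC) cell active at time t, or None if unannotated."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return (seg.toi, seg.toc)
    return None


def time_per_cell(segments):
    """Total annotated time in each of the nine framework cells."""
    totals = {cell: 0.0 for cell in itertools.product(ToI, ToC)}
    for seg in segments:
        totals[(seg.toi, seg.toc)] += seg.end - seg.start
    return totals


# Hypothetical timings (seconds) loosely following the phases of Figure 2B.
example_session = [
    Segment(0, 300, ToI.ANALOG, ToC.JOINT_OPEN_ENDED),         # (1) pair chats in person
    Segment(300, 600, ToI.MIXED, ToC.JOINT_OPEN_ENDED),        # (2) third joins by video
    Segment(600, 900, ToI.MIXED, ToC.JOINT_GOAL_DIRECTED),     # (3) triad starts the task
    Segment(900, 1500, ToI.DIGITAL, ToC.JOINT_GOAL_DIRECTED),  # (4)-(5) remote pair finishes
]
```

Per-sample coherence values could then be grouped with `cell_at`, for instance to compare Analog and Digital ToI within the same ToC level, and `time_per_cell` makes under-sampled cells visible at a glance.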
Efforts from the broader fNIRS community will be required to make fNIRS truly ready for realistic scenarios. With respect to hardware, these include increased device portability and robustness (e.g., to movement and ambient light), a greater number of optodes to cover more cortical areas, and short-separation channels to account for extracerebral blood flow that may contaminate fNIRS signals (Brigadoi and Cooper, 2015; Baker et al., 2017; Herold et al., 2017). Furthermore, efforts should be made to standardize fNIRS procedures, such as optode placement, data processing, and the choice of activation proxy (i.e., oxygenated vs. deoxygenated hemoglobin) (Brigadoi et al., 2014; Tachtsidis and Scholkmann, 2016; Herold et al., 2017; Di Lorenzo et al., 2019), and to adopt standardized open-source fNIRS-specific data analysis packages (e.g., HOMER2, NIRS-SPM, nirsLAB).
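The simplest form of the short-channel correction mentioned above regresses a nearby short-separation channel out of a long-separation channel by ordinary least squares, on the assumption that the short channel records mostly extracerebral (scalp) hemodynamics. The Python sketch below shows only that single-regressor case; the function name is hypothetical, and practical pipelines often include several short channels as nuisance regressors in a general linear model instead.

```python
import numpy as np


def short_channel_regression(long_ch, short_ch):
    """Remove the part of a long channel linearly explained by a short channel.

    Both inputs are 1-D arrays of equal length (e.g., HbO concentration changes).
    Returns the demeaned, corrected long channel and the fitted weight.
    """
    long_c = np.asarray(long_ch, dtype=float) - np.mean(long_ch)
    short_c = np.asarray(short_ch, dtype=float) - np.mean(short_ch)
    beta = float(short_c @ long_c) / float(short_c @ short_c)  # OLS slope
    return long_c - beta * short_c, beta
```

Because the weight is fitted per channel pair, the correction adapts to how strongly each long channel is contaminated; the trade-off is that any genuine cerebral signal that happens to correlate with the short channel is removed as well.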
While adherence to our framework will help to more completely elucidate the neurobiological signatures of human-to-human interaction across all platforms, future research in this field will not be without limitations. Chief among these is the cortical depth at which fNIRS can sample while maintaining acceptable signal quality. While efforts have been made to infer deep-brain activity using fNIRS (Liu N et al., 2015), the shallow sampling depth of ∼3 cm (Brigadoi and Cooper, 2015) limits the neurocognitive functions that can be directly measured by fNIRS. As shown in Table 1, existing fNIRS hyperscanning research has focused on cognitive functions supported by cortical regions underlying attention, executive function, language, social cognition, visuospatial processing, and motor activity. Methodological approaches in existing fNIRS hyperscanning studies have been diverse, focusing on social interactions during simple motor synchronization (e.g., Holper et al., 2012), cooperative and competitive gameplay (e.g., Cui et al., 2012), unstructured and structured conversation including singing (e.g., Osaka et al., 2014), teaching activities (e.g., Nozawa et al., 2019), and creative problem solving (e.g., Lu et al., 2019). Studies have also tested the effects of moderators, such as sex (Cheng et al., 2015), level of acquaintance, eye-to-eye contact (e.g., Hirsch et al., 2017), and pro-social priming (e.g., Balconi et al., 2019), on inter-brain cognitive functioning and task outcome. In fact, the methodological flexibility afforded by fNIRS is so great that researchers run the risk of creating methods so novel that they are difficult to interpret, replicate, or compare. We therefore encourage researchers in the immediate future to advance parsimoniously into the understudied areas of our framework (i.e., Digital ToI and Joint open-ended ToC).
For instance, it may be useful to begin studying differences between virtual and in-person interactions with established hyperscanning tasks, such as simple computer-based cooperation tasks (Cui et al., 2012). In this manner, researchers may directly investigate the effect of ToI on inter-brain coherence and compare new data with existing findings (i.e., confirmatory science). Another promising inroad could be to extend the study of differences in social cognition between "observing others" and "actually interacting with them" (Schilbach et al., 2013) to video and virtual interactions. In that case, prior fNIRS studies assessing temporally non-congruent inter-brain coherence between video-recorded individuals and spectators who watch the videos at a later time could serve as entry points (Liu Y et al., 2017; Hou et al., 2020).
Ultimately, multi-dimensional data approaches will allow us to determine which parameters (i.e., behavioral, environmental, and/or technological) best explain potential differences in neurocognitive signatures between virtual and in-person interactions. For example, simultaneous fNIRS-EEG recordings will improve temporal resolution. Physiological metrics (e.g., heart rate, heart rate variability, galvanic skin response, pupil dilation) along with behavioral measures (e.g., eye-gaze tracking, body-motion tracking, voice analysis, facial-emotion tracking) will provide vital information for better understanding participants' psychophysiological responses during social interactions. Lastly, monitoring environmental information (e.g., ambient noise, light reflected off reading glasses) and technological parameters (e.g., computer frame rate, computer audio, internet speed, computer screen activity) will be essential to control and account for potential external biases.
The potential of fNIRS hyperscanning is considerable, and it may well become a key component of our understanding of the neurobiological underpinnings of social behavior. From telehealth to tele-education, and from internet dating to online gaming, technology-driven activities will likely play a ubiquitous role in our social interactions moving forward. The framework presented here is meant to advance discussion among researchers studying all aspects of human interaction, including those that technology has yet to make possible.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/supplementary material.

AUTHOR CONTRIBUTIONS
SB: conceptualization, literature review, methodology, and writing. JMB: conceptualization, methodology, and writing. GH: conceptualization. ALR: conceptualization, methodology, supervision, and writing. All authors contributed to the article and approved the submitted version.