MEG dual scanning: a procedure to study real-time auditory interaction between two persons

Social interactions fill our everyday life and put strong demands on our brain function. However, the possibilities for studying the brain basis of social interaction are still technically limited, and even modern brain imaging studies of social cognition typically monitor just one participant at a time. We present here a method to connect and synchronize two faraway neuromagnetometers. With this method, two participants at two separate sites can interact with each other through a stable real-time audio connection with minimal delay and jitter. The magnetoencephalographic (MEG) and audio recordings of both laboratories are accurately synchronized for joint offline analysis. The concept can be extended to connecting multiple MEG devices around the world. As a proof of concept of the MEG-to-MEG link, we report the results of time-sensitive recordings of cortical evoked responses to sounds delivered at laboratories separated by 5 km.


INTRODUCTION
Humans spend a considerable amount of time interacting with other people, for example, communicating by verbal and nonverbal means and performing joint actions. Impressively, most persons deal with the ever-changing and intermingling conversations and tasks effortlessly. Various aspects of social interaction have been studied extensively in social sciences, for example by conversation analysis, but they have also recently started to gain interest in systems neuroscience and brain imaging communities (for reviews, see Hari and Kujala, 2009; Becchio et al., 2010; Dumas, 2011; Dumas et al., 2011; Hasson et al., 2012; Singer, 2012). However, many current approaches for studying the brain basis of social interaction are still methodologically clumsy, mainly because of the lack of suitable recording setups and analysis tools for simultaneous recordings of two persons.
Consequently, most brain imaging studies on social interaction have concentrated on recording brain activity of one participant at a time in "pseudo-interactive" situations (e.g., Schippers et al., 2009, 2010; Stephens et al., 2010; Anders et al., 2011). For example, a few-second-time-scale synchronization between the speaker's and listener's brain was demonstrated with functional magnetic resonance imaging (fMRI) by first recording one person's brain activity while she was narrating a story and later on scanning other persons while they listened to this story (Stephens et al., 2010). With near-infrared spectroscopy (NIRS), one person's brain activity was monitored during face-to-face communication with a time resolution of several seconds (Suda et al., 2010). With magnetoencephalography (MEG), more rapid changes were demonstrated, as the dominant coupling of the listener's cortical signals to the reader's voice occurred around 0.5 Hz and 4-6 Hz (Bourguignon et al., 2012).
However, in the above-mentioned studies, the data were obtained in measurements of one person at a time. For example, in the fMRI study by Stephens et al. (2010), brain data from the speaker and the listeners were obtained in separate measurements. In the MEG study, the interaction was more natural as two persons were present all the time, although only the listener's brain activity was measured. However, in these experimental setups, the flow of information was unidirectional, which is not typical for natural real-time social interaction. In addition, if only one subject is measured at a time, the complex pattern of mutually dependent neurophysiological or hemodynamic activities cannot be appropriately addressed.
Real-time two-person neuroscience (Hari and Kujala, 2009; Dumas, 2011; Hasson et al., 2012) requires accurate quantification of both behavioral and brain-to-brain interactions. In fact, brain functions have already been studied simultaneously from two or more participants during common tasks. The first demonstration of this type of "hyperscanning" was by Montague et al. (2002), who connected two fMRI scanners, located in different cities, via the Internet to study the brain activity of socially engaged individuals. No real face-to-face contact was possible as the subjects were neither visually nor auditorily connected and the communication was mediated through button presses. This approach has been applied, for example, to a trust game where the time lags inherent to fMRI are not problematic (King-Casas et al., 2005; Tomlin et al., 2006; Chiu et al., 2008). However, the sluggishness of the hemodynamics limits the power of fMRI in unraveling the brain basis of fast social interaction, such as turn-taking in conversation, that occurs within tens or hundreds of milliseconds. The same temporal limitations apply to NIRS, which has been used for studying two persons at the same time (Cui et al., 2012). Thus, methods with higher temporal resolution, such as electroencephalography (EEG) or MEG, are called for in studies of rapid natural social interaction.
EEG has previously been recorded from two to four interacting subjects to study inter-brain synchrony and connectivity during competition and coordination in different types of games (Babiloni et al., 2006, 2007; Astolfi et al., 2010a,b), playing instruments together (Lindenberger et al., 2009), and spontaneous nonverbal interaction and coordination (Tognoli et al., 2007; Dumas et al., 2010). This type of EEG hyperscanning enables visual contact between the participants who can all be placed in the same room.
EEG and MEG provide the same excellent millisecond-range temporal resolution, but MEG may offer other benefits as it enables a more straightforward identification of the underlying neuronal sources (for a recent review, see Hari and Salmelin, 2012). Here we introduce a novel MEG dual-scanning approach to provide both excellent temporal resolution and convenient source identification. In our setup, two MEG devices, located in separate MEG laboratories about 5 km apart, are synchronized and connected via the Internet. The subjects can communicate with each other via telephone lines. The feasibility of the developed MEG-to-MEG link was tested by recording time-sensitive cortical auditory evoked fields to sounds delivered from both MEG sites. Figure 1 (top) shows the experimental setup. MEG signals were recorded with two similar 306-channel neuromagnetometers, one at the MEG Core, Brain Research Unit (BRU), Aalto University, Espoo, and the other at the BioMag laboratory (BioMag) at the Helsinki University Central Hospital, Helsinki; both devices are located within high-quality magnetically shielded rooms (MSRs), and the sites are separated by 5 km.

GENERAL
We constructed a short-latency audio communication system that enables connecting two MEG recording sites. Specifically, the system allows:
1. Free conversation between the two subjects located at the two laboratories.
2. Instructing both subjects by an experimenter at either site.
3. Presentation of acoustic stimuli from either laboratory to both subjects.
Each laboratory is equipped with a custom-built system for recording the incoming and outgoing audio streams. The audio recording systems of both sites are synchronized with the local MEG devices and to each other, allowing millisecond-range alignment of the MEG and audio data streams.

AUDIO-COMMUNICATION SYSTEM
We devised a flexible audio-communication system for setting up audio communication between the subjects in the MSRs and/or experimenters in the MEG control rooms at the two laboratories.
The system comprises two identical sets of hardware at the two sites, each including:
1. An optical microphone (Sennheiser MO2000SET; Sennheiser, Wedemark, Germany) used for picking up the voice of the subject inside the MSR. The microphone is MEG-compatible and provides good sound quality.
2. Insert earphones with plastic tubes between the ear pieces and the transducer (Etymotic ER-2, Etymotic Research, Elk Grove Village, IL, USA) to deliver the sound to the subject.
3. Microphones and headphones for the experimenter in the control room.
4. Two ISDN landline phone adapters enabling communication between the laboratories.
5. An 8-channel full-matrix digital mixer (iDR-8; Allen & Heath, Cornwall, United Kingdom) connected to all the audio sources and destinations described above. Additionally, the mixer is connected to the local audio recording system and the stimulus computer.
To eliminate crosstalk between the incoming and outgoing audio streams, each of the two ISDN telephone landlines was devoted to streaming the audio in one direction only.
In "free" conversation experiments, the two subjects can talk to each other and the experimenters at both sites can listen to the conversation. In a simple auditory stimulation experiment (reported below), sounds can be delivered from the stimulus computer at one site to both subjects.

LATENCIES OF SOUND TRANSMISSION
We examined the delays introduced by our setup into the audio streams:
1. The silicone tubes used for delivering the sound to the subject's ear introduced a constant delay of 2.0 ms.
2. Each mixer introduced a constant delay of 2.7 ms from any input to any output.
3. The delay of the telephone landlines was stable and free of jitter. We estimated this delay before each experiment by measuring the round-trip time of a brief audio signal presented over a loop including the two phone lines and the two mixers; the round-trip time was consistently 16 ms.
In sum, the total local transmission delay from the stimulus computer to the local participant at each laboratory was 2.0 + 2.7 = 4.7 ms.
The lab-to-lab transfer time, computed from the local stimulus computer to the participant at the remote laboratory, was 12.7 ms (4.7 ms local transmission delay + 8 ms remote mixer and phone line delay). As the local transmission delays (4.7 ms) were identical for each participant, only the lab-to-lab transfer time was taken into account in the analysis of the two MEG datasets (see below).
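The delay budget above can be written out as a short calculation. The following is only an illustrative sketch: the constant and function names are hypothetical, and it assumes that the two directions of the measured loop are symmetric, so that one phone line plus one mixer accounts for half of the 16 ms round trip.

```python
# Delay values are taken from the text; all names are illustrative only.
TUBE_DELAY_MS = 2.0    # earphone tubing to the subject's ear
MIXER_DELAY_MS = 2.7   # one mixer, any input to any output
ROUND_TRIP_MS = 16.0   # loop over both phone lines and both mixers


def local_delay_ms():
    """Stimulus computer to the local participant: local mixer plus tubing."""
    return MIXER_DELAY_MS + TUBE_DELAY_MS


def one_way_link_delay_ms():
    """One phone line plus one mixer.

    Assumption: both directions of the loop contribute equally.
    """
    return ROUND_TRIP_MS / 2.0


def lab_to_lab_delay_ms():
    """Local stimulus computer to the participant at the remote site."""
    return local_delay_ms() + one_way_link_delay_ms()


print(round(local_delay_ms(), 1),
      round(one_way_link_delay_ms(), 1),
      round(lab_to_lab_delay_ms(), 1))
```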

AUDIO RECORDING
At each site, the audio signals were recorded locally using a dedicated PC (Dell OptiPlex 980) running Ubuntu Linux 10.04 and in-house custom-built audio-recording software. Each PC was equipped with an E-MU 1616m soundcard (E-MU Systems, Scotts Valley, California, USA), and it recorded the incoming and outgoing audio streams at a sampling rate of 22 kHz. The same audio signals were also recorded by the local MEG system as auxiliary analog input signals (at a rate of 1 kHz) for additional verification of the synchronization.

SYNCHRONIZATION
The audio and MEG data sets were synchronized locally by means of digital timing signals, generated by the audio-recording software and fed from the audio recording computer's parallel port to a trigger channel of the MEG device. To time-lock data from the two sites, the real-time clocks of the audio-recording computers at the two sites were synchronized via the Network Time Protocol (NTP). To pass through the hospital firewall (at BioMag), the NTP protocol was tunneled over a secure shell (SSH) connection established between the sites. The achieved local audio-MEG synchronization accuracy was about 1 ms. The typical latency of the network connection between the two sites (as measured by the "ping" command) was about 1 ms, and the NTP synchronization accuracy, as reported by the "ntpdate" command, was typically better than 1 ms. Thus we were able to achieve about 2-3 ms end-to-end synchronization accuracy between the two MEG devices. We did not observe any significant loss of the NTP synchronization in a 4.5 h test run.
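One way to picture the offline alignment is to treat each digital timing signal as a pair (MEG sample index, NTP-synchronized wall-clock time) and to fit a line through these pairs separately at each site. The sketch below is an assumption about how such an alignment could be done, not a description of the in-house software; the function names and the synthetic timing pulses are hypothetical. A linear fit also absorbs a small difference in the sampling rates of the two devices.

```python
def fit_clock(sample_idx, wall_time_s):
    """Least-squares line mapping a sample index to wall-clock time (s)."""
    n = len(sample_idx)
    mean_x = sum(sample_idx) / n
    mean_y = sum(wall_time_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_idx)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sample_idx, wall_time_s))
    slope = sxy / sxx                    # seconds per sample
    intercept = mean_y - slope * mean_x  # wall-clock time of sample 0
    return slope, intercept


def to_time(sample, clock):
    """Convert a sample index to wall-clock time using a fitted clock."""
    slope, intercept = clock
    return slope * sample + intercept


def to_sample(time_s, clock):
    """Convert wall-clock time to a (fractional) sample index."""
    slope, intercept = clock
    return (time_s - intercept) / slope


# Synthetic timing pulses, one per second (hypothetical values).
# Site A samples at exactly 1000 Hz; site B runs slightly faster.
clock_a = fit_clock([0, 1000, 2000, 3000], [100.0, 101.0, 102.0, 103.0])
clock_b = fit_clock([0, 1002, 2004, 3006], [103.25, 104.25, 105.25, 106.25])

# Where does an event at sample 5000 of site A fall in the data of site B?
event_time_s = to_time(5000, clock_a)
sample_in_b = to_sample(event_time_s, clock_b)
```

The accuracy of such a mapping is bounded by the accuracy of the timestamps themselves, that is, by the local audio-MEG synchronization and the NTP synchronization described above.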

STIMULATION FOR AUDITORY EVOKED FIELDS
For recording of cortical auditory evoked fields, 500 Hz 50 ms tone pips (10 ms rise and fall times) were generated with a stimulation PC (Dell Optiplex 755) running Windows XP and the Presentation software (Neurobehavioral Systems Inc., CA, USA; www.neurobs.com; version 14.8 at BRU and version 14.7 at BioMag). The sound level was adjusted to be clearly audible but comfortable for both participants. During each recording session, stimuli were generated at one laboratory and presented to both subjects (locally to the local subject and over the telephone line to the subject at the remote site). Stimulation was synchronized locally by recording the stimulation triggers generated by the Presentation software.
The interstimulus interval was 2005 ms, and each block comprised 120 tones. The stimuli were delivered in two blocks from each site.

DATA ACQUISITION
The MEG signals were recorded with two similar 306-channel neuromagnetometers by Elekta Oy (Helsinki, Finland): Elekta Neuromag® system at BRU and Neuromag Vectorview system at BioMag. Both devices comprise 204 orthogonal planar gradiometers and 102 magnetometers on a similar helmet-shaped surface. Despite the slightly different electronics and data acquisition systems, the sampling rates were the same within 0.16%. Both devices were situated within high-quality MSRs (at BRU, a three-layer room by Imedco AG, Hägendorf, Switzerland; at BioMag, a three-layer room by Euroshield/ETS Lindgren Oy, Eura, Finland). During the recording, the participants were sitting with their eyes open and their heads were covered by the MEG sensor arrays (see Figure 1). In addition to the MEG channels, vertical electro-oculogram, stimulus triggers, digital timing signals for synchronization, and audio signals were recorded simultaneously into the MEG data file. All channels of the MEG data file were filtered to 0.03-330 Hz, sampled at 1000 Hz and stored locally.
The position of the subject's head with respect to the sensor helmet was determined with the help of four head-position-indicator (HPI) coils, two attached to the mastoids and two to the forehead, on both sides of the head. Before the measurement, the locations of the coils with respect to three anatomic landmarks (nasion and left and right preauricular points) were determined using a 3-D digitizer. The HPI coils were activated before each stimulus block, and the head position with respect to the sensor array was determined on the basis of the signals picked up by the MEG sensors.
External interference on MEG recordings was reduced offline with the signal-space separation (SSS) method (Taulu et al., 2004). Averaged evoked responses were low-pass filtered at 40 Hz. The 900 ms analysis epochs included a 200 ms pre-stimulus baseline.
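The epoching step can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not the analysis software used in the study: the function names are hypothetical, the baseline is removed by subtracting the mean of the 200 ms pre-stimulus period, and the SSS and 40 Hz low-pass filtering steps are left out.

```python
SFREQ_HZ = 1000   # MEG sampling rate given in the text
PRE_MS = 200      # pre-stimulus baseline
EPOCH_MS = 900    # total epoch length, baseline included


def extract_epoch(signal, trigger_sample):
    """Cut one baseline-corrected epoch around a trigger.

    Returns None when the epoch would extend past the recording.
    """
    pre = PRE_MS * SFREQ_HZ // 1000
    total = EPOCH_MS * SFREQ_HZ // 1000
    start = trigger_sample - pre
    if start < 0 or start + total > len(signal):
        return None
    epoch = signal[start:start + total]
    baseline = sum(epoch[:pre]) / pre  # mean of the pre-stimulus samples
    return [value - baseline for value in epoch]


def average_epochs(signal, trigger_samples):
    """Average all complete epochs sample by sample (evoked response)."""
    epochs = [extract_epoch(signal, t) for t in trigger_samples]
    epochs = [e for e in epochs if e is not None]
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]
```

With these settings each epoch contains 900 samples, and the stimulus onset falls at index 200 of the epoch.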

DATA ANALYSIS
For comparable analysis of the two data sets, the 8 ms remote mixer and phone line delay to the remote laboratory had to be taken into account. First, the two datasets were synchronized according to the real-time stamps recorded during the measurement. Thereafter, the triggers in the remote data were shifted forward by 8 ms. With the applied 1000 Hz sampling rate, the accuracy of the correction was 1 ms.
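The trigger correction amounts to adding a fixed number of samples to each trigger position. The sketch below uses a hypothetical helper name and assumes that triggers are stored as integer sample indices; rounding to whole samples is what limits the accuracy of the correction to one sample, i.e., 1 ms at 1000 Hz.

```python
def shift_triggers(trigger_samples, delay_ms, sfreq_hz=1000.0):
    """Move triggers later in time by a fixed transmission delay.

    The delay is rounded to whole samples, so the correction is accurate
    to within one sample period.
    """
    shift = int(round(delay_ms * sfreq_hz / 1000.0))
    return [sample + shift for sample in trigger_samples]


# Example with made-up trigger positions and the 8 ms delay from the text
remote_triggers = shift_triggers([1000, 3005, 5010], delay_ms=8.0)
```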
The magnetic field patterns of the auditory evoked responses were modeled with equivalent current dipoles, one per hemisphere. The dipoles were found by a least-squares fit to best explain the variance of 28 planar gradiometer signals over each temporal lobe.

RESULTS
The lower part of Figure 1 shows, for both subjects, eight unaveraged MEG traces from temporal-lobe and occipital-lobe gradiometers. The two lowest channels below the MEG traces illustrate both the local and remote audio streams, in this case indicating alternate counting of numbers by the two subjects. Figure 2 shows the source waveforms for the auditory evoked fields modeled as current dipoles located in the supratemporal auditory cortices of each hemisphere. For both subjects, N100m peak latencies were similar for tones presented locally (black lines) and over the auditory link (red lines). Response amplitudes were well replicable both for the local and the remote presentations, as is evident from the superimposed two traces for both conditions; Table 1 shows source strengths and peak latencies for both subjects and stimulus repetitions.

FIGURE 2 | Source waveforms of averaged auditory evoked fields from both participants to tones presented locally (black lines) and remotely (red lines), separately for the left and right hemisphere. The superimposed traces illustrate replications of the same stimulus block. Please note that we did not rigorously control the sound intensities in this proof-of-concept experiment, and thus the early difference between local and remote sound presentations in the left hemisphere of the BRU subject likely reflects differences in sound quality.

DISCUSSION
We introduced a novel MEG-to-MEG setup to study two interacting subjects' brain activity with good temporal and reasonable spatial resolution. The impetus for this work derives from the view that dyads rather than individuals form the proper analysis units in studies of the brain basis of social interaction (Hari and Kujala, 2009; Dumas, 2011). Within this kind of "two-person neuroscience" framework, it is evident that one cannot obtain all the necessary information by studying just one person at a time, and simultaneous recordings of the two interacting persons' brain function are required.
It is well known that just the presence of another person affects our behavior. Daily social life comprises various types of interactions, from unfocused encounters happening on busy streets (where the main obligation is not to bump into strangers and, should it happen, to politely apologize) to focused face-to-face interactions with colleagues, friends, and family members. We spend much time observing other people's lives that intrude into our homes via audiovisual media and literature. Normal social interaction is, however, more symmetric and mutual so that information flows in both directions, with verbal and nonverbal cues tightly coupled.
Social interaction is characterized by its rapid and poorly predictable dynamics. One important issue for any hyperscanning approach is thus the required time resolution. The facial expression of a speaker can change clearly even during a single phoneme (Peräkylä and Ruusuvuori, 2006), and to pick up the brain effects of such fleeting nonverbal cues requires a temporal resolution not worse than a hundred or tens of milliseconds (Hari et al., 2010); similar time scales would also be needed for monitoring of brain events related to turn-taking in a conversation (Stivers et al., 2009). Moreover, brain rhythms that have been hypothesized to play an important role in social interaction (Wilson and Wilson, 2005; Tognoli et al., 2007; Schroeder et al., 2008; Lindenberger et al., 2009; Scott et al., 2009; Dumas et al., 2010; Hasson et al., 2012) are very fast (5-20 Hz) compared with hemodynamic variations and can be only picked up by electrophysiological methods. However, the below 1 Hz modulations of neuronal signals have clear correlates in the hemodynamics (Magri et al., 2012), meaning that the electrophysiological (MEG/EEG) and hemodynamic (fMRI/NIRS) approaches complement each other in the study of the brain basis of social interaction.

TABLE 1 | Data are given separately for both sessions (I and II) and for both hemispheres.

Site    Measure    I    II   I    II   I    II   I    II
BRU     Latency    105  105  98   96   105  105  99   101
BRU     Amplitude  38   39   45   52   34   37   36   38
BioMag  Latency    90   90   94   94   90   94   94   94
BioMag  Amplitude  59   47   67   64   115  115  107  98

Compared with EEG, the rather straightforward source analysis of MEG is beneficial for pinpointing the generators of both evoked responses and spontaneous activity. For example, the differentiation of the rolandic mu rhythm from the parieto-occipital alpha rhythm (for a review, see Hari and Salmelin, 1997), appearing in overlapping frequency bands, is easy with MEG (often evident just by examining the spatial distributions of the signals at the sensor level), but the corresponding differentiation is strenuous with EEG because extracerebral tissues smear the potential distribution that is also affected by the site of the reference electrode (Hari and Salmelin, 2012).
Our MEG-to-MEG setup, with its high temporal resolution and reasonable spatial resolution, therefore, provides a promising tool for studying the brain basis of social interaction. In the following, we discuss the technical aspects and future applications of the established MEG-to-MEG link.

TECHNICAL PERFORMANCE OF THE MEG-TO-MEG LINK
Our major technical challenge in building the MEG-to-MEG link was to create a stable and short-latency audio connection between two laboratories. Both these criteria were met. The obtained 12.7 ms lab-to-lab transmission time corresponds to sound lags during normal conversation between participants about 4 m apart. Thus, our subjects were not able to perceive the delays of the audio connection.
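The comparison with conversational distance follows from the speed of sound. As a rough check, assuming a speed of sound of about 343 m/s in room-temperature air (a value not given in the text) and using an illustrative function name:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius


def equivalent_distance_m(delay_ms):
    """Distance that sound travels in air during the given delay."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0


distance_m = equivalent_distance_m(12.7)  # lab-to-lab transfer time
```

For the 12.7 ms lab-to-lab transfer time this gives a little over 4 m, in line with the figure quoted above.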
High sound quality was another central requirement, and the selected optical microphones and the telephone-line bandwidth were sufficient for effortless speech communication.
Moreover, it was crucial to accurately synchronize the MEG datasets of the two laboratories. We achieved offline alignment accuracy of 3 ms by synchronizing the computers at the two sites to a real-time clock via NTP, and by recording the digital timing signal to both MEG data files. As a result, the millisecond temporal resolution of MEG was preserved in the analysis of the two subjects' brain signals in relation to each other.
Recording of auditory evoked cortical responses, used as a "physiological test" of the connection, also endorsed the good quality of the established MEG-to-MEG link: the prominent 100 ms deflections were similar in amplitude and latency when the stimuli were presented from either laboratory.

FURTHER DEVELOPMENT AND APPLICATIONS
The current setup with combined MEG and audio recordings could be extended to multi-person interaction studies with only a few extra steps, even connecting subjects located in various parts of the world. As the major part of human-to-human interaction is nonverbal, one evident further development is the implementation of an accurate video connection that, however, will inherently involve longer time lags than does the audio connection.
Face-to-face interaction, obtainable with such a video link, gives immediate feedback about the success and orientation of the interaction. Fleeting facial expressions that uniquely color verbal messages in a conversation cannot be mimicked in a conventional brain-imaging setting where one prefers to study all participants in as equal conditions as possible.
The MEG-to-MEG connection can be further enriched by adding, e.g., eye tracking and/or measures of the autonomic nervous system. Just glancing at another person, even briefly, during the interaction gives information about the mutual understanding between them; for example, too sluggish reactions would be interpreted as lack of presence of the partner. Eye gaze also informs about turn-taking times in conversation, and gaze directed to the same object tells about shared attention. Eye-gaze analysis has already given interesting results on the synchronization of two persons' behavior (Kendon, 1967; Richardson et al., 2007).
It has to be emphasized that analysis of the two-person datasets still remains the bottleneck in dual scanning experiments. The analysis approaches attempted so far range from looking at the similarities between the participants' brain signals and searching for inter-subject coupling at different time scales to combining the two persons' data to obtain a more integrative view of the whole situation. In a recent joint improvisation task (a mirror game where two persons follow and lead each other without any pre-set action goals), the participants entered smooth co-leadership states in which they did not know who was leading and who was following (Noy et al., 2011). Thus any causality measures trying to quantify information flow from one brain to another during a real-life-like interaction likely run into problems. This example also illustrates the uniqueness of real-life interaction: it would be impossible to recreate exactly the same states even if the same participants were involved again. Thus measuring the brain activity of both interaction partners at the same time is crucial for tracking down any coupling between their brain activities.
One may try to predict one person's brain activity with the data of the other or to use, e.g., machine-learning algorithms to "decode" from brain signals joint states of social interaction, such as turn-taking in a conversation. Beyond these data-driven approaches, this field of research calls for better conceptual basis for the experiments, analysis, and interpretations. One of the first steps is, however, the acquisition of reliable data, to which purpose the current work contributes.