Effects of Background Music on Objective and Subjective Performance Measures in an Auditory BCI

Several studies have explored brain-computer interface (BCI) systems based on auditory stimuli, which could help patients with visual impairments. Usability and user satisfaction are important considerations in any BCI. Although background music can influence emotion and performance in other task environments, and many users may wish to listen to music while using a BCI, auditory and other BCIs are typically studied without background music. Some work has explored the possibility of using polyphonic music in auditory BCI systems. However, this approach requires users with good musical skills, and has not been explored in online experiments. Our hypothesis was that an auditory BCI with background music would be preferred by subjects over a similar BCI without background music, without any difference in BCI performance. We introduce a simple paradigm (which does not require musical skill) using percussion instrument sound stimuli and background music, and evaluate it in both offline and online experiments. The results showed that subjects preferred the auditory BCI with background music. Different performance measures did not reveal any significant effect of background music vs. no background music. Since the addition of background music does not impair BCI performance but is preferred by users, developers of auditory (and perhaps other) BCIs should consider including it. Our study also indicates that auditory BCIs can be effective even if the auditory channel is simultaneously otherwise engaged.

Visual BCIs can yield high classification accuracy and information transfer rates (Kaufmann et al., 2011; Riccio et al., 2012; Jin et al., 2014, 2015; Zhang et al., 2014; Chen et al., 2015; Yin et al., 2016). However, these BCIs are not useful for patients who cannot see. Tactile BCIs have been validated with patients with visual disabilities, including persons with a disorder of consciousness (DOC) (Kaufmann et al., 2014; Edlinger et al., 2015; Li et al., 2016). However, devices that can deliver the tactile stimuli used in modern tactile BCIs are less readily available and usable than the tools required for auditory BCIs. Most end users of BCIs do not have vibrotactile stimulators or experience using them, but do have headphones, laptops, cell phones, and/or other devices that can generate auditory stimuli adequate for modern auditory BCIs.
Several groups have shown that an auditory P300 BCI could serve as a communication channel for severely paralyzed patients, including persons diagnosed with DOC. Indeed, DOC patients could also benefit from BCI technology to assess cognitive function (Risetti et al., 2013;Lesenfants et al., 2016;Käthner et al., 2013;Edlinger et al., 2015;Ortner et al., accepted). Since many DOC patients cannot see, and have very limited means for communication and control, they have a particular need for improved auditory BCIs.
Auditory BCI systems require users to concentrate on a target sound, such as a tone, chime, or word (Kübler et al., 2001, 2009; Hill et al., 2004; Vidaurre and Blankertz, 2010; Kaufmann et al., 2013; Treder et al., 2014). Auditory BCIs entail some challenges that differ from those of visual BCIs. Compared to vision, sound perception is relatively information-poor (Kang, 2006). Concordantly, the event-related potentials evoked in auditory BCI systems may support less effective discrimination between attended and unattended stimuli (Belitski et al., 2011; Chang et al., 2013). To improve the performance of auditory P300 BCI systems, many studies have focused on enhancing the difference between attended and ignored events, which could produce more recognizable differences in the P300 and/or other components (Hill et al., 2004; Furdea et al., 2009; Guo et al., 2010; Halder et al., 2010; Nambu et al., 2013; Höhne and Tangermann, 2014). These efforts have made progress, but also show the ongoing challenge of identifying the best conditions for an auditory BCI. Some work has explored BCIs that control music players and similar systems to improve quality of life. Music can affect users' emotions (Kang, 2006; Lin et al., 2014), which could make BCI users feel comfortable during BCI use. Tseng et al. (2015) developed a system to select music for users based on their mental state. Treder et al. (2014) explored a multi-streamed musical oddball paradigm as an approach to BCIs. Their article presents a sound justification for this paradigm: "In Western societies, the skills involved in music listening and partly, music understanding are typically overlearnt." This paper introduces a simple auditory BCI system that includes background music, which we validated through offline and online experiments. This BCI system does not require musical training or expertise.
Percussion sounds from cymbals, snare drums, and tom tom drums were used as stimuli and presented over headphones. We chose percussion stimuli because they are easy to recognize, and easy to distinguish from each other and from the background piano music. We hypothesized that subjects would prefer background music, and would not perform worse while background music is playing. Hence, we explored the effect of background music on auditory BCI performance and users' subjective experience, evaluated via surveys. If successful, our approach would render auditory BCIs more ecologically valid.

Subjects and Stimuli
Sixteen healthy right-handed subjects (8 male, 8 female, aged 21-27 years, mean age 24.8 ± 1.5) participated in this study. Nine of the subjects had prior experience with an auditory BCI. All subjects' native language was Mandarin Chinese. Each subject participated in one session within 1 day. The order of the conditions was counterbalanced across subjects for each session.
All subjects signed a written consent form prior to this experiment and were paid 50 RMB for their participation in each session. The local ethics committee approved the consent form and experimental procedure before any of the subjects participated.
Three percussion sounds (cymbals, snare drums, and tom tom drums) were used as stimuli. The "cymbals" stimulus was played in the right headphone, the "snare drum" stimulus was played through both headphones to sound as if it came from the middle, and the "tom tom drum" stimulus was played in the left headphone (see Figure 1). For each subject, we confirmed that the stimuli were clearly audible.
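This spatial assignment amounts to routing each mono percussion sample to the left channel, the right channel, or both. A minimal sketch (the sample rate and the NumPy stereo buffer are our assumptions; the paper does not describe its audio implementation):

```python
import numpy as np

FS = 44100  # audio sample rate (assumed; not stated in the paper)

def make_stereo(mono, channel):
    """Place a mono percussion sample in the stereo field.

    channel: 'left' (tom tom drum), 'right' (cymbals), or 'both'
    (snare drum, perceived as coming from the middle).
    """
    stereo = np.zeros((len(mono), 2))
    if channel in ('left', 'both'):
        stereo[:, 0] = mono
    if channel in ('right', 'both'):
        stereo[:, 1] = mono
    return stereo

# A 200 ms noise burst as a stand-in for a percussion sample
burst = np.random.randn(int(0.2 * FS))
tom_tom = make_stereo(burst, 'left')
snare = make_stereo(burst, 'both')
```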
Experimental Setup, Offline, and Online Protocols
EEG signals were recorded with a g.HIamp amplifier and a g.EEGcap (Guger Technologies, Graz, Austria) with active electrodes, sampled at 1200 Hz and band-pass filtered between 0.1 and 100 Hz. The g.HIamp uses wide-range DC-coupled amplifier technology in combination with 24-bit sampling, yielding an input voltage range of ±250 mV with a resolution of <60 nV. Electrode impedance was kept below 30 kΩ. Data were recorded and analyzed using the BCI platform software package developed at the East China University of Science and Technology. We recorded from 30 EEG electrode positions based on the extended international 10-20 system (see Figure 2). Active electrodes were referenced to the nose, with a frontal electrode (FPz) as ground. For analysis and classification, the recorded data were high-pass filtered at 0.1 Hz, low-pass filtered at 30 Hz, and notch-filtered at 50 Hz (Käthner et al., 2013). A prestimulus interval of 100 ms was used for baseline correction of single trials.
FIGURE 1 | The three percussion stimuli used in this study, and their spatial distribution.
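The filtering and baseline-correction steps for analysis can be sketched as follows. Only the cutoff frequencies, notch frequency, sampling rate, and the 100 ms baseline interval come from the text; the filter orders, zero-phase (forward-backward) filtering, and the use of SciPy are our assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

FS = 1200  # sampling rate used in the study (Hz)

# 0.1-30 Hz band pass plus 50 Hz notch, as described for analysis.
# Second-order sections keep the very low 0.1 Hz cutoff stable.
SOS_BP = butter(4, [0.1, 30], btype='bandpass', fs=FS, output='sos')
B_NOTCH, A_NOTCH = iirnotch(50, Q=30, fs=FS)

def preprocess(epoch, fs=FS, baseline_ms=100):
    """Filter one epoch (samples x channels), then subtract the mean
    of the 100 ms prestimulus interval from each channel."""
    x = sosfiltfilt(SOS_BP, epoch, axis=0)
    x = filtfilt(B_NOTCH, A_NOTCH, x, axis=0)
    n_base = int(baseline_ms / 1000 * fs)
    return x - x[:n_base].mean(axis=0)
```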
This study compared two conditions: with no background music (WNB) and with background music (WB). The latter condition used piano music as the background. The piano music was titled "Confession" from Falcom Sound Team jdk. Each subject participated in offline and online sessions for both conditions within the same recording session.
In the offline session, the order of the two conditions was decided pseudorandomly. Each subject completed fifteen runs of one condition, then fifteen runs of the other condition, with a 2 min break after every five runs. Each run contained twelve trials that each consisted of one presentation of each of the three auditory stimuli. At the beginning of each run, an auditory cue in Chinese told the subject which stimulus to count during the upcoming run. The first auditory stimulus began 2 s after the trial began. The stimulus "on" time was 200 ms and the stimulus "off" time was always 100 ms, yielding an SOA of 300 ms. The three auditory stimuli were presented in random order, with each stimulus type keeping its fixed location (see Figure 1) and with the constraint that the same stimulus did not occur twice in succession. The target-to-target interval (TTI) was at least 600 ms. There was a 4 s break at the end of each run, and no feedback was provided. Thus, the offline session took a little over 15 min (0.3 s SOA × 3 stimuli × 12 trials × 15 runs × 2 conditions = 324 s, plus five 2 min breaks). Subjects had a 5 min break after the offline session.
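The duration estimate above can be reproduced with simple arithmetic (as in the paper's own estimate, the per-run cue and the 2 s lead-in are excluded):

```python
# Offline-session timing, following the figures in the text.
STIM_SOA = 0.3          # s: 200 ms "on" + 100 ms "off"
STIMULI_PER_TRIAL = 3
TRIALS_PER_RUN = 12
RUNS_PER_CONDITION = 15
CONDITIONS = 2
BREAKS = 5              # 2 min breaks, one after every five runs

stim_time_s = (STIM_SOA * STIMULI_PER_TRIAL * TRIALS_PER_RUN
               * RUNS_PER_CONDITION * CONDITIONS)   # stimulation time
total_min = stim_time_s / 60 + BREAKS * 2           # "a little over 15 min"
```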
The online session presented the two conditions in the same order as the offline session. However, there were 24 runs per condition, the number of trials per run was selected adaptively, and subjects received feedback at the end of each run (Jin et al., 2011a). This "adaptive classifier" means that the system would end the run and present feedback if the classifier chose the same output on two consecutive trials. Thus, the minimum number of trials per run was two. Each run still began with an auditory cue (in Chinese) to instruct the subject which target stimulus to count. At the end of the run, the target that the BCI system identified was presented to the subject via a human voice played through the target speaker (left, right, or front), as well as via the monitor. The time required for the online session varied because of the adaptive classifier.
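The adaptive stopping rule — end the run and present feedback once the classifier selects the same output on two consecutive trials — can be sketched as follows. The `classify` callable is a hypothetical stand-in for the trained classifier's per-trial decision; the two-consecutive-agreements criterion and the minimum of two trials come from the text:

```python
def run_adaptive(classify, max_trials=12):
    """Present trials until the classifier returns the same output on
    two consecutive trials, then end the run with that output.

    Returns (selected_target, trials_used). The minimum possible
    number of trials is two; max_trials caps a non-converging run.
    """
    previous = None
    for trial in range(1, max_trials + 1):
        decision = classify(trial)  # target index chosen on this trial
        if decision == previous:
            return decision, trial
        previous = decision
    return previous, max_trials
```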

Classification Scheme
The EEG was down-sampled by retaining every 30th sample. The first 1000 ms of EEG after each stimulus presentation was used for feature extraction. Spatial-Temporal Discriminant Analysis (STDA) was used for classification (Zhang et al., 2013b). Data acquired offline were used to train the STDA classifier model, which was then used in the online BCI system. STDA has exhibited superior ERP classification performance relative to competing algorithms (Hoffmann et al., 2008).
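The epoching and down-sampling described above can be sketched as follows (the samples-by-channels array layout and NumPy slicing are our assumptions; the STDA classifier itself is not reproduced here):

```python
import numpy as np

FS = 1200     # Hz, recording sampling rate
DECIM = 30    # keep every 30th sample
EPOCH_S = 1.0 # first 1000 ms after each stimulus

def extract_features(eeg, onsets):
    """Cut the first 1000 ms after each stimulus onset and keep every
    30th sample. `eeg` is a samples x channels array; `onsets` are
    stimulus onsets in samples. Returns one (time x channels) feature
    matrix per stimulus, ready for an ERP classifier such as STDA.
    """
    n = int(EPOCH_S * FS)  # 1200 samples -> 40 after decimation
    return [eeg[o:o + n:DECIM] for o in onsets]
```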

Subjective Report
After completing the last run of each session, each subject was asked two questions about each condition. Each question could be answered on a 1-5 rating scale indicating strong disagreement, moderate disagreement, neutrality, moderate agreement, or strong agreement. Subjects were also allowed to answer with intermediate replies (i.e., 1.5, 2.5, 3.5, and 4.5), thus allowing nine possible responses to each question. All questions were asked in Chinese. The two questions were: (1) Did you prefer this condition when you were doing the auditory task? (2) Did this condition make you tired?

Statistical Analysis
Before statistically comparing classification accuracy, the "outputs per minute" and "correct outputs per minute" were tested for normal distribution (one-sample Kolmogorov-Smirnov test) and sphericity (Mauchly's test). Subsequently, repeated measures ANOVAs or t-tests with stimulus type as factor were conducted. Post-hoc comparisons were performed with Tukey-Kramer tests. The alpha level was adjusted according to Bonferroni-Holm. Non-parametric Kendall tests were computed to statistically compare the questionnaire replies.
Figure 3 shows the averaged evoked potentials from the online data over sites Fz, FCz, C5, Cz, C6, CPz, Pz, and Oz. These potentials were averaged from subjects who obtained higher than 70% classification accuracy in all conditions. Figure 3 shows fairly weak negative potentials before 200 ms, and less distinct potentials in occipital areas.
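The normality check followed by a paired comparison can be sketched with SciPy. For brevity this sketch omits Mauchly's test, the ANOVA branch, the Tukey-Kramer post-hoc tests, and the Bonferroni-Holm correction; the Wilcoxon fallback for non-normal data is our assumption, not a procedure stated in the text:

```python
import numpy as np
from scipy import stats

def compare_conditions(scores_wnb, scores_wb, alpha=0.05):
    """Test paired per-subject scores (e.g. accuracies) from the
    WNB and WB conditions; returns the p-value of the paired test."""
    diff = np.asarray(scores_wb) - np.asarray(scores_wnb)
    # One-sample Kolmogorov-Smirnov test of the standardized
    # differences against a standard normal distribution
    z = (diff - diff.mean()) / diff.std(ddof=1)
    _, p_norm = stats.kstest(z, 'norm')
    if p_norm > alpha:            # no evidence against normality
        _, p = stats.ttest_rel(scores_wnb, scores_wb)
    else:                         # fall back to a rank-based test
        _, p = stats.wilcoxon(scores_wnb, scores_wb)
    return p
```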

RESULTS
This study had two conditions: WNB and WB. Table 1 shows the online classification accuracy, "outputs per minute," and "correct outputs per minute" for both conditions. The "outputs per minute" was defined as N = 60 / (Na × t1 + t2), in which N is the "outputs per minute" and Na reflects the average number of trials in a run. The terms t1 and t2 denote the time required for a trial and the 4 s break between two runs, respectively.
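These throughput measures can be written out as follows. The closed form N = 60 / (Na × t1 + t2) is our reconstruction consistent with the variable definitions given in the text, and CN scales N by online accuracy:

```python
def outputs_per_minute(n_a, t1, t2=4.0):
    """N = 60 / (Na * t1 + t2): outputs per minute, where n_a is the
    average number of trials per run, t1 the time per trial in
    seconds, and t2 the 4 s break between runs."""
    return 60.0 / (n_a * t1 + t2)

def correct_outputs_per_minute(n, acc):
    """CN = N * acc: outputs per minute weighted by online accuracy."""
    return n * acc
```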
The "correct outputs per minute" (CN) was defined as CN = N × acc, in which acc is the accuracy of each subject in the online experiment. Paired samples t-tests were used to test the differences between the WNB and WB conditions. There were no significant differences between the WNB and WB conditions in classification accuracy [t(1, 15) = −1.2, p > 0.05], in "outputs per minute" [t(1, 15) = 0.8, p > 0.05], or in "correct outputs per minute" [t(1, 15) = −0.9, p > 0.05]. This result suggests that background music did not affect performance.
Table 2 presents the subjects' replies to questionnaires about the WNB and WB conditions. Non-parametric Kendall tests were used to explore these differences. Results showed a significant preference for background music (p < 0.05). Only one subject (AB44) showed a preference for the WNB condition. AB44 also verbally reported that he felt that the background music affected his task performance. There was no significant difference between the WNB and WB conditions in tiredness (p > 0.05).
Figure 4 shows the contributions of ERPs between 1 and 300 ms, between 251 and 450 ms, and between 451 and 800 ms to classification performance across subjects. The independent variables were the three time windows, and the dependent variable was the classification accuracy. Figure 4 shows that ERPs between 1 and 300 ms did not contribute strongly to classification, unlike the P300 potential between 251 and 450 ms. Figure 4 also shows that negative ERPs were predominant between 451 and 800 ms. A two-way repeated measures ANOVA was used to compare the classification accuracies based on these time windows [F(2, 30), p < 0.016]. Potentials between 451 and 800 ms yielded significantly higher classification accuracy than the ERPs between 251 and 450 ms (p < 0.016) and the ERPs between 1 and 300 ms (p < 0.016), and the potentials between 251 and 450 ms yielded significantly higher classification accuracy than the ERPs between 1 and 300 ms (p < 0.016).
One-way repeated measures ANOVAs were used to test the contributions of ERPs in different time windows to classification accuracy for the WNB condition [F(2, 30) = 14.1, p < 0.016] and the WB condition [F(2, 30) = 19.8, p < 0.016], respectively. The results showed that the potentials between 451 and 800 ms obtained significantly higher classification accuracy than the ERPs between 1 and 300 ms (p < 0.016) and the ERPs between 251 and 450 ms (p < 0.016), with one exception: in the WB condition, the potentials between 451 and 800 ms did not obtain significantly higher classification accuracy than the ERPs between 251 and 450 ms (p = 0.029).

Effects of Background Music
The main goal of this study was to assess the effects of background music on performance and user preferences in an auditory BCI. Results showed that the subjects preferred the WB condition over the WNB condition (see Table 2). There were no significant differences in classification accuracy between these two conditions. These results indicate that background music could make auditory BCI users more comfortable without impairing classification accuracy (see Figure 4, Tables 1, 2).
The classification accuracy and information transfer rate in the WB condition in the present study were at least comparable to related work. For example, Halder and colleagues presented results with a three-stimulus auditory BCI and discussed training with an auditory BCI (Halder et al., 2016). The information transfer rate of their three-stimulus BCI was lower than in the present study (the best was 1.7 bit/min). Käthner and colleagues reported that the average accuracy of their auditory BCI was only 66% (SD = 24.8) (Käthner et al., 2013). Other studies also reported average accuracies of about 70% for their auditory BCIs (Schreuder et al., 2009; Belitski et al., 2011; De Vos et al., 2014). Compared to these studies, the accuracy and information transfer rate in the present study were typical for auditory P300 BCIs.

Target-to-Target Interval (TTI) and Number of Stimuli
This study used a fairly long SOA (300 ms) because our paradigm only used three stimuli. Although we avoided successive repetition of the same stimulus in the same position, shorter TTIs could have made it difficult for subjects to distinguish the different stimuli and could have reduced P300 amplitude (Gonsalvez and Polich, 2002).
FIGURE 4 | The contributions of the evoked potentials between 1 and 300 ms, between 251 and 450 ms and between 451 and 800 ms to BCI classification performance, across subjects.
Most P300 BCIs use more than three stimuli. Adding more stimuli and making them more distinct could make a shorter SOA feasible. For example, Höhne and Tangermann (2014) used an 83.3 ms ISI with 26 stimuli, with a duration of 200-250 ms. In their study, the target stimulus was presented after 6-10 non-target stimuli, and the TTI was at least 600 ms, which should be enough for the subjects to detect the target stimulus.
Figure 4 shows that ERPs before 300 ms contributed less to classification accuracy than ERPs from the other two time windows that we analyzed. Some auditory studies reported that their paradigms could evoke clear mismatch negativity (MMN) potentials (Hill et al., 2004; Kanoh et al., 2008; Brandmeyer et al., 2013). In our data, however, there were no clear negative potentials around 200 ms in target trials, presumably due to differences in our stimuli and task instructions. Several factors can affect the MMN, including modality, stimulus parameters, target probability, sequence order, and task instructions (Näätänen et al., 1993; Pincze et al., 2002; Sculthorpe and Campbell, 2011; Kimura, 2012). Figure 4 shows that the time window between 451 and 800 ms yielded significantly higher classification accuracy than the other time windows (1-300 ms and 251-450 ms) in most comparisons. Thus, late potentials (after 450 ms) contributed more to classification accuracy than earlier potentials in this study.

ERPs and Relative Contributions to Classification Accuracy
The evoked potentials were weak in occipital areas (see Figure 3). This result suggests that the BCI presented here might be practical with a reduced electrode montage that does not include occipital sites. Thus, a conventional electrode cap may not be necessary. Alternate means of mounting electrodes on the head (such as headphones) might reduce preparation time and cost while improving comfort and ease-of-use. This could be especially important in long-term use for patients with DOC or other severe movement disabilities, when occipital electrodes can become uncomfortable if the head is resting on a pillow or cushion.

CONCLUSION
The main goal of this paper was to explore the effects of adding background music to an auditory BCI approach that used three stimuli. Results showed that users preferred background music to the canonical approach (no background music), without significant changes in BCI performance. While auditory BCIs have been validated in prior work (Hill et al., 2004; Kübler et al., 2009; Treder et al., 2014; Lesenfants et al., 2016; Ortner et al., accepted), this outcome suggests that future auditory, and perhaps other, BCIs could improve user satisfaction by incorporating background music. Further work is needed to explore issues such as: the best signal processing methods and classifiers; performance with target patients at different locations; improving performance with inefficient subjects; and different types of auditory stimuli and background music, including music chosen based on each subject's mental state (Tseng et al., 2015).

AUTHOR CONTRIBUTIONS
SZ ran the experiment and analyzed the data. BA improved the discussion and introduction. AK improved the experiment, the methods, and the results. AC helped with the algorithm. JJ conceived the idea for this paper and wrote the paper. YZ helped with the classification method. XW guided the experiment.