Feedback Related Potentials for EEG-Based Typing Systems

Error related potentials (ErrP), which are elicited in the EEG in response to a perceived error, have been used for error correction and adaptation in event related potential (ERP)-based brain computer interfaces designed for typing. In these typing interfaces, ERP evidence is collected in response to a sequence of stimuli, usually presented visually, and the intended stimulus is probabilistically inferred (as the stimulus with the highest probability) and presented to the user as the decision. If the inferred stimulus is incorrect, an ErrP is expected to be elicited in the EEG. Early approaches to using ErrP in typing interfaces make hard decisions on the perceived error: either the sequence of stimuli is repeated to obtain further ERP evidence or, without further repetition, the stimulus with the second highest probability is presented to the user as the decision of the system. Moreover, none of the existing approaches use a language model to increase typing performance. In this work, unlike the existing approaches, we study the potential benefits of fusing feedback related potentials (FRP), a form of ErrP, with ERP and context information (language model, LM) in a Bayesian fashion to detect the user intent. We present experimental results based on data from 12 healthy participants using RSVP Keyboard™ to complete a copy-phrase task. Three paradigms are compared: [P1] uses only ERP/LM Bayesian fusion; in [P2], each RSVP sequence is appended with the top candidate in the alphabet according to the posterior after ERP evidence fusion, and the corresponding FRP is then incorporated; and in [P3], the top candidate is shown as a prospect to generate FRP evidence only if its posterior exceeds a threshold. Analyses indicate that ERP/LM/FRP evidence fusion during decision making yields significant speed-accuracy benefits for the user.


INTRODUCTION
Event related potentials (ERPs) are commonly employed in the design of non-invasive electroencephalography (EEG)-based brain computer interfaces (BCIs) to detect the user intent (Farwell and Donchin, 1988; Acqualagna et al., 2010; Orhan et al., 2012; Akcakaya et al., 2014; Moghadamfalahi et al., 2015). The pioneering study by Farwell and Donchin demonstrated that ERPs can be used to design a letter-by-letter typing BCI (Farwell and Donchin, 1988). In addition to ERPs, depending on the BCI application, error-related potentials (ErrPs) can be used to indicate a perceived error. ErrPs are detectable as deflections in the EEG signal measured over the scalp of a person when they make or perceive an error (Falkenstein et al., 2000; Davies et al., 2004; Buttfield et al., 2006; Yazicioglu et al., 2006; Ferrez and del R. Millan, 2008; Gürel and Mehring, 2012; Margaux et al., 2012; Spüler et al., 2012; Kieffaber et al., 2016). Different variants of ErrPs can be measured in the recorded EEG signal. For example, when the user realizes that the interface failed to properly recognize the user's intention, an ErrP is induced, which can be characterized by two fronto-central positive peaks appearing 200 and 320 ms after the feedback, a fronto-central negativity near 250 ms, and, lastly, a broader fronto-central negative deflection about 450 ms after the feedback. These latencies can change depending on the experimental paradigm (Iturrate et al., 2013). Moreover, some studies have demonstrated a correlation between trial-by-trial estimates of the ErrP and post-error slowing (Debener et al., 2005). Based on these studies, it has been proposed that the negative deflection of the ErrP is the result of an error-detection mechanism, as opposed to being an inhibitory or corrective signal. In addition, it has been suggested that the positive components of the ErrP reflect conscious error processing or post-error adjustment of response strategies (Falkenstein et al., 2000).
While some BCI typing systems have shown encouraging results (Kawala-Sterniuk et al., 2021), there is still much work to be done to produce real-world-worthy systems that can be comfortably, conveniently, and reliably used by individuals with severe neuromuscular disabilities who cannot use standard communication pathways or other assistive technologies. This work presents several improvements to a language-model-assisted EEG-based typing BCI, RSVP Keyboard™ (Moghadamfalahi et al., 2015), which also apply to similar designs that depend on visually evoked P300 potentials. The baseline system fuses text/language and EEG evidence to infer user intent in EEG-controlled spelling to generate expressive language. In particular, we study the potential benefits of fusing feedback related potentials, a form of ErrP, with ERP and context information (language model, LM) in a Bayesian framework. The probabilistic evidence from ERP, ErrP, and non-EEG sources is computed using different probabilistic generative models.
We represent the domain knowledge and the causal relationships among the different variables in a probabilistic graphical model. The presented approach is a general dynamic fusion framework that could be used with various presentation paradigms. Typing interfaces aim to reach a certain confidence level before making a decision on the user intent, and accordingly, sequences of symbols are repeated multiple times. In our approach, after every presented sequence, we compute the posterior distribution over the symbol set (all the symbols in the English alphabet, the space symbol, and the backspace symbol) conditioned on ERP likelihoods and LM-based priors. The mode of the posterior distribution is selected as the prospect symbol that is presented to the user, either after every sequence or after a confidence threshold is reached. The prospect symbol is an additional visual stimulus, which induces an EEG response that is indicative of that prospect's correctness. We refer to this response as a feedback related potential (FRP), which takes the form of an ErrP or a non-ErrP, indicating that an incorrect or a correct prospect symbol was presented, respectively. After the prospect symbol is presented and the new FRP evidence is obtained, the FRP evidence is fused with the ERP and LM-based evidence through the Bayesian graphical model, and the posterior distribution of the symbols is updated. Given the low signal-to-noise ratio of EEG, we take an iterative update approach, presenting multiple sequences of ERP and FRP stimuli to the user to compute a more robust estimate, until the posterior reaches an information-theoretic confidence threshold. User intent is then selected using maximum a posteriori (MAP) inference.
Existing typing BCIs that attempt to use ERP/FRP jointly typically fall into one of these categories, in which a flag produced by the ErrP classifier results in: (a) the deletion of the last selection made using the ERP classifier (Dal Seno et al., 2010; Schmidt et al., 2012; Spüler et al., 2012; Chavarriaga et al., 2014); (b) the replacement of the last selection made using the ERP classifier with the second most probable option (Combaz et al., 2012; Margaux et al., 2012; Chavarriaga et al., 2014); or (c) the presentation of more stimuli to gather additional ERP evidence, without using the FRP to update symbol probabilities over the alphabet (Combaz et al., 2012). A language model is not fused with ERP evidence in these particular examples, but it has been suggested for boosting both ERP and FRP evidence assessment. Unlike these early attempts at using FRP evidence to make hard decisions based on ErrP classifier outputs, we seek Bayesian fusion of ERP, FRP, and language evidence using probabilistic generative models. The system presented in this paper automatically decides, in a probabilistic fashion, whether to select a letter to type or to proceed with more ERP/FRP evidence collection.
In an earlier study, we observed the potential enhancements that can be achieved through joint probabilistic inference from all evidence (i.e., FRP, ERP, and LM), rather than using FRP as a switch (Gonzalez-Navarro et al., 2016a; Orhan et al., 2016). In that study, Monte Carlo simulations were performed for five users using synthetic EEG features drawn from models calibrated with real ERP/FRP data (Gonzalez-Navarro et al., 2016a). Our simulation results suggested that Bayesian fusion of all evidence (FRP, ERP, and LM) yields faster typing speeds for all participants without compromising accuracy. Using ErrP in a sub-optimal fashion, by allowing FRP decisions to override ERP, also improved speed relative to not using FRP at all. However, our results indicated that Bayesian fusion of FRP with ERP, without treating the former as a de facto superior form of evidence, may yield better outcomes. Based on these results, we decided to conduct a new study, presented in this manuscript, to evaluate the performance of two different system strategies for a joint probabilistic inference framework. This is the first work in which we study experimental results of this framework based on data from healthy participants. We study the potential benefits of fusing feedback related potentials (FRP) with ERP and context information (LM) in a Bayesian fashion to detect the user intent.
To illustrate the efficacy of our approach we use RSVP Keyboard™ (Moghadamfalahi et al., 2015), an EEG-based BCI for letter-by-letter typing, which is described in more detail in section 3. Three strategies, [P1], [P2], and [P3], are compared in terms of speed, accuracy, and information transfer rate (ITR). The EEG for this study is acquired from 12 healthy participants using RSVP Keyboard™ to complete a copy-phrase task. [P1], the baseline system, fuses LM and ERP (collected from RSVPs) evidence in a Bayesian fashion to infer user intent. In contrast, our novel propositions, [P2] and [P3], use joint inference from all evidence (FRP, ERP, and LM) to make a decision. In [P2], FRP evidence is collected after every RSVP sequence, whereas in [P3], RSVP sequences are repeated until a confidence level is achieved, and then the mode of the estimated posterior is presented as feedback (in other words, FRP evidence is collected less frequently in [P3]).

Decision Framework
In a typical letter-by-letter typing BCI application, the user has to select among a discrete set of task symbols from a dictionary $D = \{A, B, \ldots, Z\} \cup \{<, -\}$, where "−" represents the space symbol and "<" represents the backspace symbol. Here, we examine how a BCI can infer a task symbol from different EEG evidence and prior context information. In particular, we build a decision framework that takes into account two types of EEG evidence: FRP and ERP evidence. We propose several methods for combining FRP evidence, ERP evidence, and prior context information using real-time posterior probability updates. This BCI application uses a visual presentation module to detect the user intent, and the EEG collected during the visual stimulation is then employed in the decision-making procedure.
Different visual presentation methods can be considered in order to evoke visual potentials. The rapid serial visual presentation (RSVP) paradigm is a minimally gaze-dependent alternative to matrix presentation paradigms that aims to induce ERPs for intent detection. In the RSVP paradigm, the symbols are rapidly presented as a time series at a fixed location on the screen in a pseudo-random order, to evoke a response when the target symbol appears (Acqualagna et al., 2010; Orhan et al., 2012; Moghadamfalahi et al., 2015). In this presentation scheme, each flashing letter is a trial, and in each "sequence," a subset of the dictionary is presented. From now on, when we mention an RSVP trial, we refer only to a trial that induces ERP (target) evidence. Figures 1A,B illustrate a flash of a prospect symbol and an RSVP trial, respectively. Due to the low signal-to-noise ratio (SNR) of EEG, the system usually needs to query the user with more than one "sequence" and "prospect symbol" to achieve a desired confidence level before making a decision. The set of "sequences" and "prospect symbols" that leads to a decision is called an "epoch." In every epoch, it is assumed that the target symbol remains unchanged. Figure 1C presents a schematic of an EEG epoch in the RSVP Keyboard™, including a series of letters in an ERP sequence and a feedback stimulus as a "prospect symbol." The feedback stimulus is always presented at the end of the RSVP sequence (shown in green). In Figures 1A,B, "Press Space Bar or Enter to pause" indicates the Pause/Play button, and "Esc to quit" indicates the exit button, should the participant choose to end the experimental session. Both options are added to the experimental design for the convenience of the user.

Probabilistic Graphical Model (PGM)
The proposed probabilistic graphical model (PGM) that represents the $k$-th "epoch" for an EEG-based typing application is presented in Figure 2.
Here, $a_k^*$ is a random variable that represents the user intent in epoch $k$; $A_c(t) = \{a_j^t \mid j = 1, \ldots, |A_c(t)|\}$ is a subset of the dictionary $D$, treated as the "sequence" at instant $t$ of epoch $k$, where $c$ denotes candidate and $|A_c(t)|$ is the number of symbols presented in the $t$-th sequence; and $C_k$ represents the context information provided by the language model, which is briefly described in section 2.5. Moreover, we introduce $A_p(t) \subset D$. This set is a singleton, $A_p(t) = \{\bar{a}_t\}$, which contains the prospect symbol at instant $t$ ($p$ denotes prospect). In addition, $e_c(a_j^t)$ and $e_p(\bar{a}_t)$ are the ERP and FRP evidence obtained in response to an RSVP trial $a_j^t$ and a feedback trial $\bar{a}_t$, respectively. We assume that the user intent does not change within an epoch. Hence, given that every $a_j^t \in A_c(t)$ and $\bar{a}_t \in A_p(t)$ is either target or non-target, the intent inference can be formulated as a binary decision problem. Therefore, $y(\cdot)$ and $z(\cdot)$ correspond to binary class labels for the ERP and FRP responses. Hence, $y(a_j^t) := \delta(a_j^t; a_k^*)$ has a one-to-one relationship with the true state $a_k^*$, such that $y(a_j^t) = 1$ if $a_k^* = a_j^t$ and $0$ otherwise. Similarly, $z(\bar{a}_t) := \delta(\bar{a}_t; a_k^*)$. $N_c$ and $N_p$ are the maximum numbers of "sequences" and "prospect symbols" that can be used in an epoch if a desired confidence level is not reached in a reasonable duration. If FRP evidence is not used, the right box of the graphical model in Figure 2 is eliminated and the rest remains the same. We use the graphical model to compute the posterior distribution of the intended character $a_k^*$ after collecting the EEG evidence and incorporating the language model evidence. The details of the posterior computation are given in section 2.4. In order to make inference on the user intent, we compare three different evidence acquisition paradigms (one for each strategy). These paradigms are discussed in section 2.3.

Evidence Acquisition Paradigms
Here, we present three different evidence acquisition paradigms:

[P1] (Baseline):
In this paradigm, a set of pseudo-randomly ordered stimuli is presented to the user to elicit ERPs. Each stimulus is a trial. A set of trials presented with no time gaps is called a sequence $A_c(t)$. Every sequence can contain at most one target stimulus. After each sequence, the posterior distribution over the character set is computed, and a decision is made if the maximum probability exceeds a predefined threshold or a time limit is reached. Otherwise, the system continues with more sequences. This paradigm is the baseline for RSVP Keyboard™ and does not include FRP evaluation.

[P2] (Always FRP):
In this paradigm, we first query the user with ERP sequences in a similar fashion to [P1]; then the mode of the posterior is selected as the prospect symbol, i.e., $A_p(t)$. $A_p(t)$ is then presented at a fixed location on the screen, as in regular RSVP trials, to induce an FRP in the EEG. Depending on the instructions given to the user, this FRP may take the form of an error-related potential (ErrP), indicating that an incorrect prospect symbol was presented. The EEG collected in response to each prospect symbol is used to update the posterior using the PGM shown in Figure 2. This paradigm also uses MAP inference, in a procedure similar to [P1].

[P3] (Confirm FRP):
This paradigm is similar to [P1] and [P2], but the top candidate is shown as a prospect symbol to generate FRP evidence only if its posterior probability exceeds a threshold. The graphical model presented in Figure 2 is directly used to fuse the ERP and FRP evidence to infer the user intent.
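The difference between the three paradigms can be summarized as a rule for when a prospect symbol is shown after an ERP sequence. The following is a minimal sketch of that rule, not the RSVP Keyboard™ implementation; the function name `select_prospect` and the dictionary-based posterior are hypothetical, and the default `alpha_p = 0.66` is the [P3] threshold reported later in the Experiment Design section.

```python
def select_prospect(paradigm, posterior, alpha_p=0.66):
    """Return the prospect symbol to show after an ERP sequence, or None.

    posterior: dict mapping each symbol to its current posterior probability.
    paradigm:  "P1" (baseline), "P2" (always FRP), or "P3" (confirm FRP).
    """
    # The prospect is always the mode of the current posterior.
    top_symbol = max(posterior, key=posterior.get)
    if paradigm == "P1":
        return None                      # baseline never collects FRP evidence
    if paradigm == "P2":
        return top_symbol                # prospect after every ERP sequence
    if paradigm == "P3":
        # prospect only once the top candidate is confident enough
        return top_symbol if posterior[top_symbol] > alpha_p else None
    raise ValueError(f"unknown paradigm: {paradigm}")
```

With a posterior of `{"A": 0.7, "B": 0.2, "C": 0.1}`, [P2] and [P3] both show "A", whereas with `{"A": 0.5, "B": 0.3, "C": 0.2}` only [P2] shows a prospect.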

Maximum a Posteriori (MAP) Inference
The decision-making process uses maximum a posteriori (MAP) inference for intent detection. The graphical model presented in Figure 2 is used to compute the posterior distribution of the intended symbol, after evaluating the ERP and FRP likelihoods of the EEG recorded during ERP and FRP sequences and using context priors. A general decision framework for the three evidence acquisition paradigms is presented in Figure 4. According to this framework, before a final decision is made, the ERP and FRP evidence corresponding to multiple sequences is aggregated and fused with the context prior. The different query selection methods $[P_i]$, $i \in \{1, 2, 3\}$, are presented in Figure 4 (see section 2.3 for more details). We estimate the prospect symbol $\bar{a}_t \in A_p(t)$ at instant $t$ as the mode of the posterior distribution:

$$\bar{a}_t = \hat{a}_k^* = \arg\max_{a \in D} P\big(a_k^* = a \mid \mathcal{E}_c, \mathcal{E}_p, C_k\big) \qquad (1)$$

where $\hat{a}_k^*$ is the estimated user intent; $\mathcal{E}_c = \{E_c(A_c(1)), \ldots, E_c(A_c(t))\}$ is the ERP EEG evidence for all the observed query sequences in epoch $k$ at instant $t$; $E_c(A_c(t)) = \{e_c(a_j^t) \mid j = 1, \ldots, |A_c(t)|\}$ is the set of observations for the query $A_c(t)$; $\mathcal{E}_p = \{E_p(A_p(1)), \ldots, E_p(A_p(t))\}$ is the FRP EEG evidence for all the observed prospective sequences in epoch $k$ at instant $t$; and $E_p(A_p(t)) = \{e_p(\bar{a}_t)\}$ is the set of observation vectors for the prospective set $A_p(t)$. For [P2] and [P3], the FRP EEG evidence $e_p(\bar{a}_t)$ is obtained in response to $\bar{a}_t$.
To compute the posterior distribution in (1), we use the assumptions of the graphical model presented in Figure 2. According to this PGM, the ERP evidence, the FRP evidence, and the context information are conditionally independent given the intended symbol $a_k^*$. Then, for epoch $k$ and at time instant $t$, after observing the query sets $A_c(t)$ and $A_p(t)$, the MAP estimate can be computed using the objective function in (2).
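As a hedged reconstruction rather than the authors' exact expression, the conditional independence assumption and Bayes' rule suggest that the objective has the following form. The symbols $\mathcal{E}_c$ and $\mathcal{E}_p$, denoting all ERP and FRP evidence collected so far in epoch $k$, are notation assumed here for illustration.

```latex
\hat{a}_k^{*}
  = \arg\max_{a \in D}\; P\big(a_k^{*} = a \mid \mathcal{E}_c, \mathcal{E}_p, C_k\big)
  = \arg\max_{a \in D}\;
    p\big(\mathcal{E}_c \mid a_k^{*} = a\big)\,
    p\big(\mathcal{E}_p \mid a_k^{*} = a\big)\,
    P\big(a_k^{*} = a \mid C_k\big)
```

The normalizing constant $p(\mathcal{E}_c, \mathcal{E}_p \mid C_k)$ does not depend on $a$ and therefore drops out of the maximization. The first two factors are the ERP and FRP likelihood terms, and the third is the language model prior.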
We can further assume that, conditioned on the unknown symbol $a_k^*$, the EEG evidence from different trials is independent, which simplifies the first two terms of Equation (2) into products of per-trial class-conditional likelihoods. According to the inference equation defined in (2), we need to estimate (i) the context prior $P(a_k^* = a \mid C)$, which we obtain from a language model; (ii) the class-conditional distributions over the ERP evidence, $p(e_c \mid 1)$ for the target class and $p(e_c \mid 0)$ for the non-target class; and (iii) the class-conditional distributions over the FRP EEG evidence, $p(e_p \mid 0)$ and $p(e_p \mid 1)$. We have implemented the proposed ERP and FRP data acquisition paradigms using the RSVP Keyboard™ framework (Moghadamfalahi et al., 2015).
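To make the fusion concrete, the sketch below updates a posterior over symbols from a language model prior and per-trial likelihoods. It relies on the observation that, under the per-trial independence assumption, dividing every symbol's likelihood by the product of all non-target likelihoods (a constant across symbols) leaves one likelihood ratio per trial, applied only to the symbol shown in that trial. The function names, the tuple layout of the trials, and the decision rule defaults (`alpha_d = 0.9`, at most 8 sequences, taken from the Experiment Design section) are illustrative assumptions, not the authors' code.

```python
import math


def fuse_evidence(prior, erp_trials, frp_trials):
    """Fuse an LM prior with ERP and FRP evidence into a posterior.

    prior:      dict symbol -> P(a | C), assumed strictly positive.
    erp_trials: iterable of (symbol_shown, p(e_c | 1), p(e_c | 0)).
    frp_trials: iterable of (prospect_shown, p(e_p | 1), p(e_p | 0)).
    """
    log_post = {a: math.log(p) for a, p in prior.items()}
    # Each trial multiplies the shown symbol's score by its likelihood ratio;
    # ERP and FRP evidence enter the posterior in exactly the same way.
    for symbol, lik_1, lik_0 in list(erp_trials) + list(frp_trials):
        log_post[symbol] += math.log(lik_1) - math.log(lik_0)
    # Normalize in a numerically stable way.
    peak = max(log_post.values())
    unnorm = {a: math.exp(v - peak) for a, v in log_post.items()}
    total = sum(unnorm.values())
    return {a: v / total for a, v in unnorm.items()}


def map_decision(posterior, n_sequences, alpha_d=0.9, max_sequences=8):
    """Return the MAP symbol if confident or out of sequences, else None."""
    top = max(posterior, key=posterior.get)
    if posterior[top] > alpha_d or n_sequences >= max_sequences:
        return top
    return None
```

Because FRP evidence enters as one more likelihood ratio, an FRP trial that favors the "incorrect" class (ratio below one) lowers the prospect's probability without forcing a hard rejection, which is the soft fusion behavior described above.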

Context Information
To compute $P(a_k^* = a \mid C)$, we use an n-gram language model, which provides a prior probability over every symbol in the dictionary. We have shown that context information, when fused with EEG evidence, improves system performance (Orhan et al., 2013; Moghadamfalahi et al., 2015). An n-gram LM is a Markov model of order $n-1$. Let $C = \{a_m^*\}_{m=n-1,\ldots,1}$, where $a_m^*$ is the $m$-th previously typed character. Then:

$$P(a \mid C) = P\big(a \mid \{a_m^*\}_{m=n-1,\ldots,1}\big) = \frac{P(a, a_{n-1}^*, \ldots, a_1^*)}{P(a_{n-1}^*, \ldots, a_1^*)}$$

In our system, we use a 6-gram language model, which is trained on the NY Times portion of the English Gigaword corpus (Roark et al., 2010).
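The ratio above can be estimated from counts. The following toy sketch builds a character n-gram prior from a training string. It is not the 6-gram model of Roark et al. (2010); the function names are hypothetical, and the add-one smoothing is an assumption introduced here only so that unseen continuations receive non-zero probability.

```python
from collections import Counter


def train_char_ngram(text, n):
    """Count n-grams and their (n-1)-character contexts in a training string."""
    starts = range(len(text) - n + 1)
    ngrams = Counter(text[i:i + n] for i in starts)
    contexts = Counter(text[i:i + n - 1] for i in starts)
    return ngrams, contexts


def next_symbol_prior(ngrams, contexts, typed, alphabet, n):
    """Return P(a | last n-1 typed characters) for every a in the alphabet."""
    ctx = typed[-(n - 1):] if n > 1 else ""
    # Add-one smoothing (an assumption of this sketch, not from the source).
    denom = contexts[ctx] + len(alphabet)
    return {a: (ngrams[ctx + a] + 1) / denom for a in alphabet}
```

For example, training a bigram model on "ABABAC" and asking for the prior after "A" over the alphabet "ABC" gives the highest probability to "B", since "AB" is the most frequent continuation.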

HUMAN-IN-THE-LOOP EXPERIMENTS
We perform a set of online experiments to compare the effects of [P1], [P2], and [P3] on system performance. We collected data from 12 healthy participants (5 females), 22-38 years old. After a calibration session, participants were asked to perform a copy phrase task with RSVP Keyboard™. The data were collected according to the guidelines of an IRB-approved protocol at Northeastern University (IRB 130107).

FIGURE 4 | General decision framework for the three evidence acquisition paradigms. The only part that differs in each paradigm is the select query block. The BCI channel decides which query is going to be presented, and the evidence from the query is collected in the user channel. $\alpha_d$ is the decision threshold. $N_d$ is the total number of sequences (ERP + FRP). A decision is made when the posterior probability of the selected symbol passes the threshold $\alpha_d$, or when the total number of sequences is reached (denoted with $\leq \alpha_d / N_d$). In (B), t%2 stands for t mod 2 (modulo operation), indicating that the prospect symbol is shown once after every RSVP sequence.

Method
In RSVP Keyboard™, the EEG signal is acquired using a g.USBamp biosignal amplifier with active g.  (Luck, 2014). In our work, temporally windowed EEG signals are filtered with a [1.5, 42] Hz bandpass filter (FIR, linear phase, length 153, 0 DC gain) to eliminate low-frequency drifts and high-frequency noise. Lower high-cutoff frequencies may be used (Orhan et al., 2016). In order to capture the ERP and FRP while omitting possible motor responses (Moghadamfalahi et al., 2015), EEG from a time window of [0, 500) ms after each flash's onset is processed as the raw data for the corresponding trial. After filtering, the EEG data for each channel are first down-sampled by 2 and projected to a lower dimensional space using principal component analysis (PCA); finally, the data from every channel are concatenated to form the feature vector $y_j^i$ for the $i$-th trial of type $j$, as defined in Equation (6). More specifically, $y_p^i$ represents the FRP evidence for the $i$-th prospect symbol trial, and $y_c^i$ represents the ERP evidence for the $i$-th query trial. After pre-processing,

$$y_j^i = \big[v_j^i[1]^\top, v_j^i[2]^\top, \ldots, v_j^i[N_{ch}]^\top\big]^\top, \quad j \in \{c, p\} \qquad (6)$$

where $v_j^i[n]$ is the multivariate measurement collected from channel $n$. Note that here $N_{ch} = 16$ is the number of channels and $N_t$ is the number of time samples for each channel after applying PCA.
We then perform a quadratic projection of these feature vectors onto a one-dimensional space that maximizes the separation between the two possible classes, non-target and target. This projection is obtained as the log-likelihood ratio of two multivariate normal density functions estimated using regularized discriminant analysis (RDA) over the target and non-target classes. $e_c(\cdot)$ and $e_p(\cdot)$ are the resulting one-dimensional ERP and FRP evidence, respectively. We estimate the class-conditional distributions $p(e_c(a) \mid 1)$ and $p(e_c(a) \mid 0)$ over the ERP evidence, and $p(e_p(a) \mid 1)$ and $p(e_p(a) \mid 0)$ over the FRP evidence, using kernel density estimation (KDE). We employ a Gaussian kernel with a bandwidth computed from the recorded labeled data using Silverman's rule (Silverman, 1986). Note that these distributions are computed from data collected in a calibration session. The estimated densities are then used in test sessions.
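The density estimation step can be illustrated with a small, dependency-free sketch. It fits a Gaussian KDE to one-dimensional scores of a single class. The bandwidth uses the common rule-of-thumb variant $h = 0.9 \min(\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$; which variant of Silverman's rule the authors used is not stated, so this choice, like the function names, is an assumption.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for a 1-D Gaussian KDE (assumed variant)."""
    n = len(samples)
    sigma = statistics.stdev(samples)
    quartiles = statistics.quantiles(samples, n=4)
    iqr = quartiles[2] - quartiles[0]
    # Fall back to the standard deviation alone if the IQR collapses to zero.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(samples):
    """Return a function x -> estimated density p(x) for one class."""
    h = silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density
```

In use, one density would be fitted per class and per evidence type from calibration scores, for example `p_target = gaussian_kde(target_scores)` and `p_nontarget = gaussian_kde(nontarget_scores)`, and the two would then be evaluated at each new score during a test session.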
Recall that the EEG (ERP and FRP) evidence and the language model prior are fused using the assumptions of the graphical model presented in Figure 2 to obtain the posterior probability mass function (PMF). The posterior probabilities are then used in the MAP inference framework to make a joint decision, as described in section 2.

Experiment Design
All users participate in three copy phrase tasks, each performed on a separate day. On each day, the user performs the task under one of the [P1], [P2], and [P3] paradigms. The order of the paradigms is randomly assigned to the users to reduce the impact of learning on typing performance.
A copy phrase task includes typing the following ten different phrases:
1. THE DOG "WILL" BITE YOU
2. GO TO "THE" MOVIES
3. GOOD HEALTH "CARE" IS CRUCIAL
4. SUPER "BOWL" SUNDAY
5. EAT THREE TIMES A "DAY"
6. THE THIRD "SEAT" FROM THE LEFT
7. MY PARENTS "FIND" ME FUNNY
8. SHE ALSO "PAID" FOR LUNCH
9. SOMETHING THAT "BUYS" US TIME
10. THE COMPOSER "SITS" QUIETLY
Each phrase includes a missing word, and the users are asked to complete these words. Here, the target words are enclosed in quotation marks. The entire sentence is shown to the user before each phrase is typed. We use phrases with different difficulty levels in terms of the prior probability provided by the language model. For instance, words such as "THE" or "WILL" are very easy to type because their initial letters are very likely under the LM prior, whereas words such as "PAID" or "BUYS" are very difficult to type. Figure 5 demonstrates an example of a user performing the copy phrase task.
FIGURE 5 | Copy phrase task performed on EEG-based BCI using RSVP Keyboard™ paradigm. The user is asked to type WILL.
Prior to each copy phrase task, all participants perform two calibration tasks: calibration ERP and calibration FRP. Calibration ERP is used to learn the statistics of the ERP classifier (target vs. non-target), using the calibration mode of the system to record labeled EEG data. Typically, each calibration ERP session consists of 100 sequences of symbols. Before each sequence, the user is asked to attend to a particular symbol. Then a sequence consisting of the target symbol and 9 non-target symbols is presented to the user in a random order. Calibration FRP is used to learn the statistics of the FRP classifier (correct vs. non-correct), using the copy mode of the system to record labeled EEG data. To obtain compatible evidence, we simulated the [P2] and [P3] paradigms to collect supervised FRP EEG data.
During calibration FRP, we modify the LM probabilities in order to record enough labeled data for the correct and non-correct classes. Users are asked to rest between the calibrations and the copy phrase tasks and to continue once they feel ready.
The length of each trial is 500 ms for all paradigms. There are 10 trials in one ERP sequence and 1 trial (i.e., the prospect symbol followed by a question mark) in one FRP sequence. $A_c(t)$ is selected based on the posterior probability (fusion of evidence + LM). In [P1], the trial symbol is shown for 150 ms, followed by a 50 ms blank screen (i.e., the inter-trial interval). The interval between successive sequences is 500 ms. In [P2] and [P3], after the ERP evidence is collected, the trial symbol for the FRP evidence is shown for 0.9 s, followed by a 0.1 s blank screen. The decision symbol is shown for 2 s. The maximum number of sequences allowed in an epoch is 100 for calibration tasks (simply because, during calibration, we have a single combined epoch and we do not make decisions). In copy phrase tasks, the maximum number of sequences allowed in an epoch is 8 (that is, a decision is made after at most 8 sequences in an epoch). In paradigm [P3], the posterior probability threshold for showing the prospect symbol is set to $\alpha_p = 0.66$. Note that we do not employ an $\alpha_p$ in paradigm [P2], because the prospect symbol is shown after every ERP sequence. For all three paradigms ([P1], [P2], and [P3]), the posterior probability threshold for a decision is set to $\alpha_d = 0.9$.
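The text states only that the query set is selected based on the posterior probability. One plausible reading, sketched below purely as an assumption, is to take the most probable symbols (10 per ERP sequence, as stated above) and shuffle them so that the presentation order is pseudo-random. The function name `select_sequence` and the top-k rule are hypothetical and may differ from the rule used in RSVP Keyboard™.

```python
import random


def select_sequence(posterior, n_trials=10, rng=None):
    """Pick the n_trials most probable symbols and shuffle their order.

    posterior: dict mapping each symbol to its current posterior probability.
    rng:       optional random.Random instance, for reproducible ordering.
    """
    rng = rng or random.Random()
    # Rank symbols from most to least probable (top-k rule is an assumption).
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    sequence = ranked[:n_trials]
    rng.shuffle(sequence)  # pseudo-random presentation order
    return sequence
```

A top-k rule of this kind keeps likely candidates in front of the user, at the cost that an unlikely target may not appear in a given sequence, which is consistent with the statement that a sequence contains at most one target stimulus.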

ANALYSIS RESULTS
Using the data collected in the human-in-the-loop copy phrase and calibration experiments described in section 3, we report the effect of the three evidence acquisition paradigms:

Human-in-the-Loop Calibration Experiment Results
Using the supervised data collected during calibration FRP, we first analyze the average EEG recorded in response to correct and incorrect feedback for the two evidence acquisition paradigms, [P2] and [P3]. We then compare classification accuracies across the different acquisition paradigms, using AUC values as the measure of EEG evidence classification accuracy. In particular, using the calibration data obtained in the human-in-the-loop calibration experiment (calibration ERP and calibration FRP), as described in section 3.2, we compare the offline target vs. non-target stimulus and correct vs. incorrect feedback classification results for the three data acquisition paradigms, [P1], [P2], and [P3]. Figure 8A compares the areas under the receiver operating characteristic curves (AUCs) for the FRP evidence of each user in the acquisition paradigms [P2] and [P3]. Similarly, Figure 8B shows the ERP classification AUCs for each user in the acquisition paradigms [P1], [P2], and [P3]. AUC values are calculated by cross-validating the classifier's performance on the training (calibration) data sets. For 10 of the 12 users tested, the classification AUC for paradigm [P2] is larger than that for [P3], as observed in Figure 8A. This may be a result of experiment [P2] being more controlled: since each RSVP sequence is appended with a prospect symbol in paradigm [P2], the user always knows when the feedback is going to be presented, as opposed to [P3]. Comparing the calibration results in Figures 8A,B, we can see that for most users the ERP calibration results have higher AUCs than the FRP classification. This difference may be due to the fact that the number of observations collected during ERP calibration is higher than the number of observations that can be collected during FRP calibration.

Human-in-the-Loop Copy Phrase Experiment Results
Using the data collected during the three copy phrase tasks, we analyze the typing performance of the three evidence acquisition paradigms. As explained in section 3.2, each copy phrase task includes typing ten different phrases with different difficulty levels. Table 1 shows the typing accuracy of the three evidence acquisition paradigms for all users in terms of two measures: accuracy in typing a letter correctly (ATL), which is the total number of correctly typed letters divided by the total number of typed letters; and probability of phrase completion (PPC), which is the total number of correctly typed phrases divided by the total number of phrases. We observe that both the [P2] and [P3] paradigms improve typing accuracy compared to [P1]. As shown in Table 1, none of the users are able to complete all 10 copy phrase tasks correctly using [P1]. A paired t-test is also performed on the ATLs to compare the typing accuracies of the different paradigms across the 12 users. In most EEG-based BCI systems, the signal recorded from multiple channels along the scalp is assumed to be a Gaussian process with an unknown covariance and mean (Gonzalez-Navarro et al., 2016b). Assuming Gaussianity of the recorded signal, we believe that applying a t-test is plausible. The result is presented in Table 3. Here, we use information transfer rate (ITR) (Obermaier et al., 2001) as another performance measure. ITR summarizes accuracy and speed in a single metric and is commonly used to measure BCI performance. Figure 9 illustrates the ITR (bits/sequence) values for all subjects, and Table 2 reports the mean of the ITR values among the 12 subjects for the three strategies.
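The performance measures defined above are simple ratios, and ITR can be illustrated with the widely used formula for an N-class selection made with accuracy P, $B = \log_2 N + P \log_2 P + (1 - P)\log_2\frac{1 - P}{N - 1}$ bits per selection. Whether the authors used exactly this form, and how they normalized it to bits/sequence, is not stated; dividing by the average number of sequences per selection is an assumption of this sketch, as are the function names.

```python
import math


def atl(correct_letters, typed_letters):
    """Accuracy in typing a letter correctly (ATL)."""
    return correct_letters / typed_letters


def ppc(correct_phrases, total_phrases):
    """Probability of phrase completion (PPC)."""
    return correct_phrases / total_phrases


def bits_per_selection(accuracy, n_symbols):
    """Standard ITR formula for one selection among n_symbols classes."""
    bits = math.log2(n_symbols)
    if accuracy > 0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1:
        bits += (1 - accuracy) * math.log2((1 - accuracy) / (n_symbols - 1))
    return bits


def bits_per_sequence(accuracy, n_symbols, sequences_per_selection):
    """ITR normalized by sequences per selection (assumed normalization)."""
    return bits_per_selection(accuracy, n_symbols) / sequences_per_selection
```

With the 28-symbol dictionary used here, a perfectly accurate selection carries $\log_2 28 \approx 4.81$ bits, and chance-level accuracy (1/28) carries zero bits, so for a fixed accuracy fewer sequences per decision translate directly into a higher ITR.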
The human-in-the-loop copy phrase experiment results in Figure 9 and Table 1 show that the proposed strategies [P2] and [P3] outperform strategy [P1] in terms of accuracy (with [P2] performing best), and result in significant improvements in both speed and accuracy when compared to [P1]. We believe that improving not only accuracy but also speed is highly desirable for BCI systems that are designed for real-life applications.
Finally, using the online copy phrase and calibration results, we report ITR as a function of the AUC obtained from the FRP and ERP calibrations. Compared to benchmark BCI spellers that rely on visually evoked potentials (VEPs) such as SSVEPs, our ERP/ErrP-based BCI speller has a slight advantage in accuracy (Liu et al., 2020; Wong et al., 2020). Compared to vision-independent BCI paradigms that rely on ERP elicitation via auditory and tactile stimulation, our visually evoked ERP/LM/FRP fusion BCI speller has a significant advantage in ITR, and the accuracies we obtain with paradigms [P2] and [P3] compete with state-of-the-art P300 BCIs in the literature (Eidel and Kübler, 2020; Kawala-Sterniuk et al., 2021).

CONCLUSIONS
In this manuscript, we compared three Bayesian inference frameworks that tightly fuse context information and different forms of EEG evidence for use in the intent inference engines of EEG-based brain computer interfaces. In particular, we studied the potential benefits of fusing FRP, ERP, and language evidence using probabilistic generative models for a speller BCI. Based on the human-in-the-loop (copy phrase and calibration) experiments with 12 healthy participants using RSVP Keyboard™, three strategies were compared: [P1]-Baseline, which only fuses ERP/LM evidence; [P2]-AlwaysFRP, where each RSVP sequence is followed by an FRP trial using the top candidate in the alphabet according to the posterior after ERP/LM evidence fusion; and [P3]-ConfirmFRP, where the top candidate is shown as a prospect to generate FRP evidence only if its posterior exceeds a threshold, possibly after multiple ERP-evidence acquisition sequences.
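The difference between the three strategies comes down to when an FRP trial is run after an RSVP sequence. A minimal sketch of that gating rule follows; the enum, the function name, and the threshold argument are hypothetical, and no particular threshold value is implied by the text.

```python
from enum import Enum


class Paradigm(Enum):
    P1_BASELINE = "P1"      # ERP/LM fusion only
    P2_ALWAYS_FRP = "P2"    # FRP trial after every RSVP sequence
    P3_CONFIRM_FRP = "P3"   # FRP trial only for a confident top candidate


def runs_frp_trial(paradigm, top_posterior, threshold):
    """Return True if the top candidate should be shown to elicit an FRP.

    top_posterior is the posterior probability of the current top
    candidate after ERP/LM fusion; threshold applies to [P3] only.
    """
    if paradigm is Paradigm.P1_BASELINE:
        return False
    if paradigm is Paradigm.P2_ALWAYS_FRP:
        return True
    return top_posterior > threshold
```

Under [P3], a sequence whose top candidate stays below the threshold simply leads to another ERP-evidence acquisition sequence, which is why several sequences may precede a single FRP trial.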
We performed several analyses on the human-in-the-loop copy phrase experiment results, covering (i) accuracy (in the form of AUC, ATL, and PPC) and (ii) speed (in the form of information transfer rate, ITR, in bits/sequence). Our results show that by using enough FRP evidence in addition to ERP evidence and the language model (LM), the typing speed can be increased compared to a model that does not use FRP evidence.
Overall, both proposed strategies [P2] and [P3], which utilize FRP evidence, outperform [P1] in terms of accuracy. Moreover, [P2] yields significant improvements in speed and accuracy, and therefore in ITR, compared to [P1], and also performs better than [P3]. These results could be due to the fact that for [P3] we do not collect enough FRP evidence during copy phrase tasks, and that [P2] causes less mental fatigue due to its deterministic presentation method. We think that, for a brain computer interface designed to be used daily, it is crucial to improve the speed as well as the accuracy. Our results suggest that probabilistic fusion of the FRP evidence can bring the true performance of a BCI one step closer to this objective.
According to the results, BCI users can benefit from fusing FRP evidence into the decision making, provided that enough FRP evidence is available. Based on the analyses, we propose a BCI typing system capable of employing multiple evidence acquisition paradigms. After individual assessments, this system will be able to determine the most beneficial evidence presentation/inference paradigm according to user preference, capabilities, and EEG signal statistics.
We demonstrate theoretically that probing the user's intent with FRP acquisition using the current top candidate is an optimal strategy in an active learning framework under the independent-trial-EEG-evidence assumption. This approach constitutes an improvement over previous literature employing ERP paradigms alone. In earlier work, we demonstrated that showing the top letters according to the current posterior in a sequence for ERP evidence acquisition is similarly optimal under the same independence assumption (Moghadamfalahi et al., 2015). Therefore, under the independent-trial-EEG-evidence model, the best strategy is to repeat the following until a decision is confidently made: show the top candidate, gather EEG evidence, and update the posterior.
Clearly, the independence assumption is incorrect, if not because of the autocorrelation of the EEG time series, then because of the overlapping time windows that are used for trial-EEG-evidence extraction. Consequently, in an improved ERP/FRP/LM fusion framework, the following issues need to be considered more carefully: (1) a signal model that captures the temporal dependency of the EEG features extracted for each trial, and (2) the temporal cost of gathering a sequence-worth of ERP evidence vs. FRP evidence by showing the current top prospect. In future work, we plan to address these issues and develop an ERP/FRP/LM fusion mechanism for BCI spellers that will dynamically decide whether to gather more ERP evidence, more FRP evidence, or neither during intent inference. The inference framework does not strictly rely on EEG evidence; therefore, we will also explore multi-modal physiological evidence fusion using signal sources such as EMG or eye-gaze trajectories.
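The show, gather, update loop described above can be sketched as a generic Bayesian update under the independent-trial assumption. This is an illustration only, not the RSVP Keyboard implementation: the function names are hypothetical, the language model is assumed to supply the prior over symbols, and the ERP and FRP classifiers are assumed to supply likelihood values for each piece of evidence.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return {symbol: w / total for symbol, w in weights.items()}


def fuse_erp(posterior, likelihood_ratios):
    """Fuse ERP evidence from one sequence.

    likelihood_ratios maps each presented symbol to the assumed ratio
    p(evidence | target) / p(evidence | non-target); symbols that were
    not presented keep a ratio of 1.
    """
    return normalize({symbol: p * likelihood_ratios.get(symbol, 1.0)
                      for symbol, p in posterior.items()})


def fuse_frp(posterior, probed, lik_if_correct, lik_if_error):
    """Fuse FRP evidence gathered by showing the symbol `probed`.

    lik_if_correct is the assumed likelihood of the observed EEG response
    if the probed symbol is the intended one; lik_if_error is the assumed
    likelihood if it is not.
    """
    return normalize({symbol: p * (lik_if_correct if symbol == probed
                                   else lik_if_error)
                      for symbol, p in posterior.items()})


def top_candidate(posterior):
    """Return the symbol with the highest posterior probability."""
    return max(posterior, key=posterior.get)


# Toy example with a three-symbol alphabet and made-up numbers.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}            # from the language model
after_erp = fuse_erp(prior, {"B": 4.0})           # ERP evidence favors B
probe = top_candidate(after_erp)                  # show the top candidate
after_frp = fuse_frp(after_erp, probe, 0.8, 0.2)  # response looks "correct"
```

In this toy run the FRP response raises the posterior of the probed symbol; a response that looked like an error would instead shift probability mass toward the remaining symbols. A full system would repeat these steps until the top posterior passes a decision threshold.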

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Northeastern University Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.