AUTHOR=La Fisca Luca, Vandenbulcke Virginie, Wauthia Erika, Miceli Aurélie, Simoes Loureiro Isabelle, Ris Laurence, Lefebvre Laurent, Gosselin Bernard, Pernet Cyril R.
TITLE=Biases in BCI experiments: Do we really need to balance stimulus properties across categories?
JOURNAL=Frontiers in Computational Neuroscience
VOLUME=Volume 16 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2022.900571
DOI=10.3389/fncom.2022.900571
ISSN=1662-5188
ABSTRACT=Brain-Computer Interfaces (BCIs) consist of an interaction between humans and computers through a specific means of communication, such as voice, gestures, or even brain signals, which are usually recorded with an electroencephalogram (EEG). To ensure an optimal interaction, the BCI algorithm typically classifies the input signals into predefined task-specific categories. However, a recurrent problem is that the classifier can easily be biased by uncontrolled experimental conditions, namely covariates, that are unbalanced across the categories. This issue led to the current solution of forcing these covariates to be balanced across the different categories, which is time-consuming and drastically decreases dataset diversity. The purpose of this research is to evaluate the need for this forced balance in BCI experiments involving EEG data. A typical design of neural BCIs involves repeated experimental trials using visual stimuli to trigger the so-called Event-Related Potential (ERP). The classifier is expected to learn spatio-temporal patterns specific to categories rather than patterns related to uncontrolled stimulus properties, such as psycho-linguistic variables (e.g., phoneme number, familiarity and age of acquisition) and image properties (e.g., contrast, compactness and homogeneity).
The challenges are then to determine how biased the decision is, which features affect the classification the most, which part of the signal is impacted, and how likely it is that the classifier performs neural categorization per se. To address these problems, this research has two main objectives: 1) modeling and quantifying the covariate effects to identify spatio-temporal regions of the EEG that allow maximal classification performance while minimizing the biasing effect, and 2) evaluating the need to balance the covariates across categories when studying brain mechanisms. To solve the modeling problem, we propose a linear parametric analysis applied to some observable and commonly studied covariates to identify the part of the EEG signal related to them. The biasing effect is quantified by comparing the regions highly influenced by the covariates with the regions of high categorical contrast, i.e., the parts of the ERP that allow a reliable classification.
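
The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes an ordinary least-squares model fitted independently at every channel and time point, synthetic single-trial data, one hypothetical covariate that is deliberately unbalanced across two categories, and an arbitrary t-value threshold. The names `t_map`, `cat_mask`, `cov_mask` and `overlap` are illustrative only.

```python
import numpy as np


def t_map(eeg, design, column):
    """t-values of one regressor at every (channel, time) point.

    eeg:    array (n_trials, n_channels, n_times), single-trial amplitudes
    design: array (n_trials, n_regressors), shared design matrix
    column: index of the regressor of interest in `design`
    """
    n_trials = eeg.shape[0]
    y = eeg.reshape(n_trials, -1)                 # one column per data point
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ betas
    dof = n_trials - np.linalg.matrix_rank(design)
    sigma2 = (resid ** 2).sum(axis=0) / dof       # residual variance per point
    xtx_inv = np.linalg.pinv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[column, column])
    return (betas[column] / se).reshape(eeg.shape[1:])


# Synthetic data (assumption): 200 trials, 4 channels, 50 time samples.
rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 4, 50
category = np.repeat([0.0, 1.0], n_trials // 2)
# Hypothetical covariate, shifted in category 1, i.e. unbalanced on purpose.
covariate = rng.normal(size=n_trials) + 0.5 * category

eeg = rng.normal(size=(n_trials, n_channels, n_times))
eeg[:, 0, 10:20] += 2.0 * category[:, None]       # categorical effect
eeg[:, 1, 30:40] += 1.5 * covariate[:, None]      # covariate effect

# Design matrix: intercept, category, z-scored covariate.
cov_z = (covariate - covariate.mean()) / covariate.std()
design = np.column_stack([np.ones(n_trials), category, cov_z])

t_cat = t_map(eeg, design, column=1)
t_cov = t_map(eeg, design, column=2)

threshold = 4.0                                   # arbitrary, uncorrected
cat_mask = np.abs(t_cat) > threshold              # high categorical contrast
cov_mask = np.abs(t_cov) > threshold              # strongly covariate-driven

# Share of the categorical-contrast region that is also covariate-driven.
overlap = (cat_mask & cov_mask).sum() / max(cat_mask.sum(), 1)
print(f"overlap fraction: {overlap:.2f}")
```

A low overlap fraction means the regions that separate the categories are largely distinct from those driven by the covariate. A real analysis would replace the fixed threshold with a proper correction for multiple comparisons.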