ORIGINAL RESEARCH article

Front. Neuroergonomics

Sec. Neurotechnology and Systems Neuroergonomics

Volume 6 - 2025 | doi: 10.3389/fnrgo.2025.1582724

This article is part of the Research Topic: Insights from the 5th International Neuroergonomics Conference.

The Impact of Cross-Validation Choices on pBCI Classification Metrics: Lessons for Transparent Reporting

Provisionally accepted
  • 1 Liverpool John Moores University, Liverpool, United Kingdom
  • 2 Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Toulouse, Occitanie, France
  • 3 Defence Science and Technology Laboratory, Salisbury, United Kingdom

The final, formatted version of the article will be published soon.

Neuroadaptive technologies are a type of passive brain-computer interface (pBCI) that aims to incorporate implicit user-state information into human-machine interaction by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches is a core aspect of research into neuroadaptive technologies. These evaluations are often conducted offline under controlled laboratory conditions, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to affect reported accuracy levels, potentially biasing conclusions. In the current study, we investigated one such source of bias: the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge under different cross-validation choices. A comparison of cross-validation schemes whose train and test subsets either respect or ignore the block structure of data collection illustrated how the relative performance of classifiers varies substantially with the evaluation method used. By computing bootstrapped 95% confidence intervals of the differences across datasets, we showed that the classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7%, while those of a Filter Bank Common Spatial Pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. Such differences across cross-validation implementations may affect the conclusions presented in research papers, complicating efforts to foster reproducibility. Our results exemplify why detailed reporting of data splitting procedures should become common practice.
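The block-structure effect described above is easy to reproduce in miniature. The following is a minimal, self-contained Python sketch, not the authors' pipeline: it uses synthetic features and a plain LDA rather than the RMDM or FBCSP+LDA classifiers from the study, and all data-generating parameters are illustrative assumptions. It contrasts trial-shuffled k-fold with block-wise cross-validation using scikit-learn's StratifiedKFold and GroupKFold, then bootstraps a 95% confidence interval of the accuracy difference (here resampling fold scores; the paper resamples across participants and datasets).

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import GroupKFold, StratifiedKFold, cross_val_score

    rng = np.random.default_rng(0)

    # Synthetic stand-in for single-trial EEG features: 6 recording blocks,
    # 40 trials each, 20 features. The class signal has a component shared
    # across blocks plus a block-specific drift, mimicking the
    # non-stationarity that makes cross-validation choices matter.
    n_blocks, n_trials, n_feat = 6, 40, 20
    w_common = rng.normal(0, 1, n_feat)
    X, y, groups = [], [], []
    for b in range(n_blocks):
        w_block = w_common + rng.normal(0, 1.5, n_feat)  # drifted direction
        labels = np.repeat([0, 1], n_trials // 2)        # balanced classes
        for lab in labels:
            X.append(0.4 * lab * w_block + rng.normal(0, 1, n_feat))
            y.append(lab)
            groups.append(b)
    X, y, groups = np.array(X), np.array(y), np.array(groups)

    clf = LinearDiscriminantAnalysis()

    # Scheme A: trial-shuffled stratified k-fold. Train and test trials are
    # drawn from the same blocks, so block-specific structure "leaks".
    acc_shuffled = cross_val_score(
        clf, X, y, cv=StratifiedKFold(n_splits=6, shuffle=True, random_state=0))

    # Scheme B: block-wise CV. Whole blocks are held out, respecting the
    # block structure of data collection.
    acc_blockwise = cross_val_score(
        clf, X, y, groups=groups, cv=GroupKFold(n_splits=6))

    print(f"trial-shuffled CV accuracy: {acc_shuffled.mean():.3f}")
    print(f"block-wise CV accuracy:     {acc_blockwise.mean():.3f}")

    # Bootstrapped 95% CI of the difference in mean accuracy between schemes.
    boot = [
        rng.choice(acc_shuffled, len(acc_shuffled)).mean()
        - rng.choice(acc_blockwise, len(acc_blockwise)).mean()
        for _ in range(10_000)
    ]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"difference: {np.mean(boot):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")

In this toy setup, trial-shuffled folds typically report higher accuracy than block-wise folds, because block-specific drift seen during training also appears in the test trials; which scheme is appropriate depends on the deployment scenario, which is exactly why the splitting procedure needs to be reported in detail.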

Keywords: passive brain-computer interfaces, pBCI, electroencephalography, EEG, cross-validation, non-stationarity, workload

Received: 24 Feb 2025; Accepted: 28 May 2025.

Copyright: © 2025 Schroeder, Fairclough, Dehais and Richins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Felix Schroeder, Liverpool John Moores University, Liverpool, United Kingdom

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.