AUTHOR=Borra Davide, Fantozzi Silvia, Magosso Elisa TITLE=A Lightweight Multi-Scale Convolutional Neural Network for P300 Decoding: Analysis of Training Strategies and Uncovering of Network Decision JOURNAL=Frontiers in Human Neuroscience VOLUME=Volume 15 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2021.655840 DOI=10.3389/fnhum.2021.655840 ISSN=1662-5161 ABSTRACT=Convolutional neural networks (CNNs), which automatically learn features from raw data to approximate functions, are being increasingly applied to end-to-end analysis of electroencephalographic (EEG) signals, especially for decoding brain states in Brain-Computer Interfaces (BCIs). Nevertheless, CNNs introduce a large number of trainable parameters, may require long training times, and lack interpretability of the learned features. The aim of this study is to propose a CNN design for P300 decoding, with emphasis on its lightweight design while guaranteeing high performance, on the effects of different training strategies, and on the use of post-hoc techniques to explain network decisions. The proposed design, named MS-EEGNet, learns temporal features at two different time scales (i.e., multi-scale, MS) in an efficient way that is optimized in terms of trainable parameters, and was validated on three P300 datasets. The CNN was trained using different strategies (within-participant and within-session, within-participant and cross-session, leave-one-subject-out, transfer learning) and was compared with several state-of-the-art (SOA) algorithms. Furthermore, variants of the baseline MS-EEGNet were analysed to evaluate the impact of different hyper-parameters on performance. Lastly, saliency maps were used to derive representations of the relevant spatio-temporal features that drove CNN decisions. MS-EEGNet was the lightest CNN among the tested SOA CNNs, despite its multiple time scales, and significantly outperformed SOA algorithms.
The post-hoc hyper-parameter analysis confirmed the benefits of the innovative aspects of our architecture. Furthermore, MS-EEGNet benefited from transfer learning, especially when only a small number of training examples was used, suggesting that our approach could be used in BCIs to accurately decode the P300 event while reducing calibration times. Representations derived from saliency maps matched the P300 spatio-temporal distribution, further validating the proposed decoding approach. By specifically addressing lightweight design, transfer learning, and interpretability, the present study can contribute to advancing the development of deep learning algorithms for P300-based BCIs.