AUTHOR=Gwon Daeun, Won Kyungho, Song Minseok, Nam Chang S., Jun Sung Chan, Ahn Minkyu
TITLE=Review of public motor imagery and execution datasets in brain-computer interfaces
JOURNAL=Frontiers in Human Neuroscience
VOLUME=Volume 17 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2023.1134869
DOI=10.3389/fnhum.2023.1134869
ISSN=1662-5161
ABSTRACT=Motor imagery is one of the major control paradigms in the brain-computer interface (BCI) field, and many datasets related to motor tasks are open to the public. However, to the best of our knowledge, no study has investigated and evaluated these public datasets, although data quality is an important issue for obtaining reliable results and for designing subject- or system-independent BCIs. In this study, we conducted a thorough investigation of the motor imagery/execution datasets published over the past 13 years. The 25 datasets were collected from six repositories, and we conducted a meta-analysis of all of them. In particular, we reviewed the specifications of the recording settings and experimental design, and we evaluated data quality, measured as the classification accuracy of standard algorithms such as Common Spatial Pattern (CSP) and Linear Discriminant Analysis (LDA), for comparison and compatibility across the datasets. As a result, we found that the stimulus type used in each dataset varies, and that one trial lasts 9.8 s on average (minimum 2.5 s, maximum 29 s). Each trial normally consists of multiple sections: the pre-rest (2.38 s), the imagination ready (1.64 s), the imagination (4.26 s, ranging from 1 s to 10 s), and the post-rest (3.38 s).
In a meta-analysis of a total of 850 sessions (or subjects), the average classification accuracy for the two-class problem (left-hand versus right-hand motor imagery) was 66.53%. According to the estimated accuracy distribution, 36.27% of the population were BCI illiterates, that is, users who are unable to reach proficiency in using a BCI system. We also analyzed CSP features and found that each dataset forms a cluster and that some datasets overlap in the feature space, indicating higher similarity among those datasets. Finally, we checked the minimal essential information (continuous signals, event type/latency, and channel information) that should be included in the datasets for convenient use, and we found that only 71% of the datasets met the conditions. We think that our attempts to evaluate and compare the public datasets are timely and that these results will contribute to understanding the quality and recording settings of the datasets, as well as to using public datasets for future BCIs.
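
The check for minimal essential information described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract names only the three categories (continuous signals, event type/latency, channel information), so the dictionary keys, the helper functions `missing_information` and `compliance_rate`, and the two toy dataset descriptions are all hypothetical.

```python
# Hypothetical keys standing in for the three categories named in the abstract.
REQUIRED_FIELDS = ("continuous_signals", "events", "channel_info")


def missing_information(dataset):
    """Return the required categories that a dataset description lacks."""
    # A category counts as missing when the key is absent or its value is empty.
    missing = [field for field in REQUIRED_FIELDS if not dataset.get(field)]
    # Assumption: an event is usable only if it has both a type and a latency.
    for event in dataset.get("events") or []:
        if "type" not in event or "latency" not in event:
            missing.append("events")
            break
    return missing


def compliance_rate(datasets):
    """Fraction of dataset descriptions with no missing category."""
    complete = sum(1 for d in datasets if not missing_information(d))
    return complete / len(datasets)


# Toy descriptions for illustration only; they are not taken from the review.
complete_dataset = {
    "continuous_signals": [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]],
    "events": [{"type": "left_hand", "latency": 512}],
    "channel_info": ["C3", "C4"],
}
incomplete_dataset = {
    "continuous_signals": [[0.1, 0.2, 0.3]],
    "events": [{"type": "right_hand", "latency": 1024}],
    # No channel information is provided here.
}

print(missing_information(incomplete_dataset))
print(compliance_rate([complete_dataset, incomplete_dataset]))
```

Applied to a real collection, a function like `compliance_rate` would yield the kind of percentage the abstract reports, provided each dataset is first described in a common structure such as the one assumed here.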