AUTHOR=Alonso-Vázquez Denise, Mendoza-Montoya Omar, Caraza Ricardo, Martinez Hector R., Antelis Javier M.
TITLE=From pronounced to imagined: improving speech decoding with multi-condition EEG data
JOURNAL=Frontiers in Neuroinformatics
VOLUME=Volume 19 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/neuroinformatics/articles/10.3389/fninf.2025.1583428
DOI=10.3389/fninf.2025.1583428
ISSN=1662-5196
ABSTRACT=Introduction: Imagined speech decoding using EEG holds promising applications for individuals with motor neuron diseases, although its performance remains limited due to small dataset sizes and the absence of sensory feedback. Here, we investigated whether incorporating EEG data from overt (pronounced) speech could enhance imagined speech classification. Methods: Our approach systematically compares four classification scenarios by modifying the training dataset: intra-subject (using only imagined speech, combining overt and imagined speech, and using only overt speech) and multi-subject (combining overt speech data from different participants with the imagined speech of the target participant). We implemented all scenarios using the convolutional neural network EEGNet. To this end, twenty-four healthy participants pronounced and imagined five Spanish words. Results: In binary word-pair classifications, combining overt and imagined speech data in the intra-subject scenario led to accuracy improvements of 3%–5.17% in four out of 10 word pairs, compared to training with imagined speech only. Although the highest individual accuracy (95%) was achieved with imagined speech alone, the inclusion of overt speech data allowed more participants to surpass 70% accuracy, increasing from 10 (imagined only) to 15 participants.
In the intra-subject multi-class scenario, combining overt and imagined speech did not yield statistically significant improvements over using imagined speech exclusively. Discussion: Finally, we observed that features such as word length, phonological complexity, and frequency of use contributed to higher discriminability between certain imagined word pairs. These findings suggest that incorporating overt speech data can improve imagined speech decoding in individualized models, offering a feasible strategy to support the early adoption of brain-computer interfaces before speech deterioration occurs in individuals with motor neuron diseases.
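
The four training scenarios named in the abstract differ only in which trials enter the training set. The following is a minimal illustrative sketch of that selection step, not the authors' code. Everything in it is an assumption made for illustration: the function name build_training_set, the scenario keys, the subject IDs, the placeholder word labels, and the reading of "different participants" as "all participants other than the target". Trials are represented as (trial_id, label) pairs standing in for EEG epochs, and holding out imagined-speech test trials is deliberately left out.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one EEG epoch and its word label.
Trial = Tuple[str, str]
# subject ID -> condition ("imagined" or "overt") -> list of trials
Dataset = Dict[str, Dict[str, List[Trial]]]

# Scenario keys are illustrative names, not terms defined by the source.
SCENARIOS = ("imagined_only", "overt_plus_imagined", "overt_only", "multi_subject")


def build_training_set(data: Dataset, target: str, scenario: str) -> List[Trial]:
    """Select training trials for one target participant under one scenario."""
    own = data[target]
    if scenario == "imagined_only":
        # Intra-subject: the target's imagined speech only.
        return list(own["imagined"])
    if scenario == "overt_plus_imagined":
        # Intra-subject: the target's overt and imagined speech combined.
        return list(own["overt"]) + list(own["imagined"])
    if scenario == "overt_only":
        # Intra-subject: the target's overt speech only.
        return list(own["overt"])
    if scenario == "multi_subject":
        # Assumption: overt speech comes from every participant except the target.
        others = [
            trial
            for subject, conditions in data.items()
            if subject != target
            for trial in conditions["overt"]
        ]
        return others + list(own["imagined"])
    raise ValueError(f"unknown scenario: {scenario!r}")


# Toy data with placeholder labels; real inputs would be EEG epochs.
toy: Dataset = {
    "S01": {
        "imagined": [("S01-i1", "word_a"), ("S01-i2", "word_b")],
        "overt": [("S01-o1", "word_a")],
    },
    "S02": {
        "imagined": [("S02-i1", "word_b")],
        "overt": [("S02-o1", "word_b"), ("S02-o2", "word_a")],
    },
}

for name in SCENARIOS:
    print(name, len(build_training_set(toy, "S01", name)))
```

In an actual evaluation, the imagined-speech trials used for testing would have to be excluded from every training set before a classifier such as EEGNet is fitted; that split is outside the scope of this sketch.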