ORIGINAL RESEARCH article
Front. Neurosci.
Sec. Auditory Cognitive Neuroscience
This article is part of the Research Topic: Factors impacting outcomes in adult cochlear implant users
Speech and Music Source Separation for Cochlear Implant users: Front-end and End-to-end Approach
Provisionally accepted
- 1 Hannover Medical School, Hanover, Germany
- 2 Universitat Autonoma de Barcelona, Barcelona, Spain
A cochlear implant (CI) is a surgically implanted neuroprosthetic device designed to restore auditory perception in individuals with profound sensorineural hearing loss. While CI users generally demonstrate good speech intelligibility in quiet listening environments, their performance declines significantly in the presence of competing sound sources. Moreover, music perception and appreciation remain limited for many CI users. These limitations are largely attributed to the inadequate representation of pitch information, which is critical for both music perception and speech stream segregation in complex auditory scenes. To address these challenges, source separation techniques have been increasingly employed to enhance target speech and to isolate singing voices in music. Previous research has shown that CI users report greater music enjoyment when vocals are enhanced relative to the accompanying background instrumentation. Building on this, recent studies have leveraged deep neural networks (DNNs) as both front-end and end-to-end modules to improve speech intelligibility and music enjoyment for CI users. In the present study, we compare front-end and end-to-end DNN-based source separation approaches for two tasks: separating target speech from competing speech, and separating the singing voice from its accompaniment in music. All implemented pipelines were first evaluated using objective instrumental metrics. Based on these results, the models were subsequently assessed in a listening experiment involving nine bilateral CI users. While the end-to-end pipeline outperformed the front-end pipeline in speech understanding tasks, the front-end approach yielded higher scores in music appreciation questionnaires. These findings support the hypothesis that CI sound coding strategies can be effectively combined with DNN-based source separation models. Furthermore, we hypothesize that the limited performance of end-to-end music source separation in enhancing music perception for CI users may be due to the absence of a dedicated sound coding strategy tailored to instrumental music.
Keywords: Cochlear Implants, DNNs, End-to-end source separation, Singing music, Speech
Received: 01 Sep 2025; Accepted: 18 Dec 2025.
Copyright: © 2025 Tahmasebi and Nogueira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Sina Tahmasebi
Waldo Nogueira
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
