ORIGINAL RESEARCH article

Front. Neurosci.

Sec. Auditory Cognitive Neuroscience

This article is part of the Research Topic: Factors impacting outcomes in adult cochlear implant users

Speech and Music Source Separation for Cochlear Implant users: Front-end and End-to-end Approach

Provisionally accepted
Sina Tahmasebi1*, Waldo Nogueira1,2*
  • 1Hannover Medical School, Hanover, Germany
  • 2Universitat Autonoma de Barcelona, Barcelona, Spain

The final, formatted version of the article will be published soon.

A cochlear implant (CI) is a surgically implanted neuroprosthetic device designed to restore auditory perception in individuals with profound sensorineural hearing loss. While CI users generally demonstrate good speech intelligibility in quiet listening environments, their performance declines significantly in the presence of competing sound sources. Moreover, music perception and appreciation remain limited for many CI users. These limitations are largely attributed to the inadequate representation of pitch information, which is critical for both music perception and speech stream segregation in complex auditory scenes. To address these challenges, source separation techniques have been increasingly employed to enhance target speech and isolate singing voices in music. Previous research has shown that CI users report greater music enjoyment when vocals are enhanced relative to the accompanying background instrumentation. Building on this, recent studies have leveraged deep neural networks (DNNs) as both front-end and end-to-end modules to improve speech intelligibility and music enjoyment for CI users. In the present study, we compare front-end and end-to-end DNN-based source separation approaches for two tasks: separating target speech from competing speech, and separating the singing voice from its accompaniment in music. All implemented pipelines were first evaluated using objective instrumental metrics. Based on these results, the models were subsequently assessed in a listening experiment involving nine bilateral CI users. While the end-to-end pipeline outperformed the front-end pipeline in speech understanding tasks, the front-end approach yielded higher scores in music appreciation questionnaires. These findings support the hypothesis that CI sound coding strategies can be effectively combined with DNN-based source separation models. Furthermore, we hypothesize that the limited performance of end-to-end music source separation in enhancing music perception for CI users may be due to the absence of a dedicated sound coding strategy tailored for instrumental music.

Keywords: Cochlear Implants, DNNs, End-to-end source separation, Singing music, Speech

Received: 01 Sep 2025; Accepted: 18 Dec 2025.

Copyright: © 2025 Tahmasebi and Nogueira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Sina Tahmasebi
Waldo Nogueira
