AUTHOR=Siegert Ingo, Weißkirchen Norman, Wendemuth Andreas

TITLE=Acoustic-Based Automatic Addressee Detection for Technical Systems: A Review

JOURNAL=Frontiers in Computer Science

VOLUME=Volume 4 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2022.831784

DOI=10.3389/fcomp.2022.831784

ISSN=2624-9898

ABSTRACT=Objective: Acoustic addressee detection is a challenge that arises in human group interactions as well as in interactions with technical systems. The topic has gained importance with the growing use of voice assistants. To allow natural interaction on the same level as human interaction, many studies have focused on the acoustic analysis of speech.
Methods: The survey followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We included all studies that analyzed acoustic and/or prosodic characteristics of speech utterances to automatically detect the addressee. For each study, we describe the dataset used, the feature set, the classification architecture, the performance, and other relevant findings.
Results: 1581 studies were screened, of which 23 met the inclusion criteria. The majority of studies utilized German or English speech corpora. 26% of the studies were tested on in-house datasets, for which little information is available. Nearly 40% of the studies employed hand-crafted feature sets; the other studies mostly relied on the Interspeech ComParE 2013 feature set or on Log-FilterBank Energy and Log Energy of Short-Time Fourier Transform features. 12 of the 23 studies used deep-learning approaches, while the other 11 used classical machine learning methods. Furthermore, 9 of the 23 studies employed classifier fusion.
Conclusion: Speech-based automatic addressee detection is a relatively new research domain. Device-directed speech is distinguished from non-device-directed speech mainly by using vast amounts of material or sophisticated models. Furthermore, a clear distinction can be drawn between in-house datasets and pre-existing ones, and a clear trend towards larger pre-defined feature sets (in part combined with feature selection methods) is apparent.