METHODS article
Front. Psychiatry
Sec. Computational Psychiatry
Volume 16 - 2025 | doi: 10.3389/fpsyt.2025.1451368
This article is part of the Research Topic: Mental Health in the Age of Artificial Intelligence
A Software Pipeline for Systematizing Machine Learning of Speech Data
Provisionally accepted
- 1 University of Alberta, Edmonton, Alberta, Canada
- 2 University of Oviedo, Oviedo, Asturias, Spain
The reproducibility and replicability of experimental findings are essential elements of the scientific process. The machine-learning community has a long-established practice of sharing data sets so that researchers can report the performance of their models on the same data. In speech analysis, and more specifically in the analysis of speech from individuals with mental health and neurocognitive conditions, a number of such data sets exist and are the subject of organized "challenge tasks". However, as the relevant software libraries and their parameters grow more complex, we argue that researchers should share not only their data but also their processing and machine-learning configurations, so that their experiments can be fully reproduced. To this end, we have designed and developed a suite of configurable software pipelines, built with the Luigi Python package, for speech-data preprocessing, feature extraction, fold construction for cross-validation, machine-learning training, and label prediction. These components implement many typical tasks in this field and rely on state-of-the-art software libraries frequently used by researchers, including scikit-learn, openSMILE, and LogMMSE, so that, given the configuration parameters of each task, the underlying experiments can be readily reproduced. We have evaluated our platform by replicating three machine-learning studies that aimed to detect depression, mild cognitive impairment, and aphasia from speech data.
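To illustrate why sharing a configuration alongside the data supports reproduction, the following minimal sketch uses only the Python standard library. It is not the authors' pipeline and does not use Luigi. The stage names follow the abstract, but every parameter name and value, as well as the helper functions `config_fingerprint` and `make_folds`, are hypothetical and chosen for illustration only.

```python
import hashlib
import json
import random

# Hypothetical experiment configuration. The stage names follow the abstract;
# all parameter names and values are illustrative assumptions.
CONFIG = {
    "preprocessing": {"denoise": "logmmse", "sample_rate_hz": 16000},
    "feature_extraction": {"tool": "opensmile", "feature_set": "eGeMAPS"},
    "folds": {"k": 5, "seed": 42},
    "training": {"model": "svm", "C": 1.0},
}


def config_fingerprint(config):
    """Return a short hash that is stable for equal configurations."""
    # Sorting keys makes the serialization independent of insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def make_folds(sample_ids, k, seed):
    """Split sample IDs into k disjoint folds, reproducibly for a given seed."""
    ids = sorted(sample_ids)          # fixed starting order
    random.Random(seed).shuffle(ids)  # local generator, no global state
    return [ids[i::k] for i in range(k)]  # deal shuffled IDs round-robin


if __name__ == "__main__":
    ids = [f"speaker_{i:02d}" for i in range(10)]
    folds = make_folds(ids, **CONFIG["folds"])
    print("config fingerprint:", config_fingerprint(CONFIG))
    for number, fold in enumerate(folds):
        print(f"fold {number}: {fold}")
```

Two researchers who run this sketch with the same configuration obtain the same fold assignment and the same fingerprint, whereas any change to a parameter yields a different fingerprint. In a workflow framework such as Luigi, comparable values would typically be declared as task parameters rather than kept in a plain dictionary.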
Keywords: speech analysis, digital mental health, depression, dementia, aphasia, machine learning for speech audio, software pipeline for speech signal processing
Received: 19 Jun 2024; Accepted: 02 Jun 2025.
Copyright: © 2025 Celeste, Tasnim, Valdés Cuervo, De La Cal and Stroulia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Jimuel Jr. Celeste, University of Alberta, Edmonton, T6G 2R3, Alberta, Canada
Mashrura Tasnim, University of Alberta, Edmonton, T6G 2R3, Alberta, Canada
Amable J Valdés Cuervo, University of Oviedo, Oviedo, 33003, Asturias, Spain
Enrique A De La Cal, University of Oviedo, Oviedo, 33003, Asturias, Spain
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.