Abstract
We introduce a novel audio processing architecture, the Open Voice Brain Model (OVBM), which improves detection accuracy for longitudinal discrimination of Alzheimer's Dementia (AD) from spontaneous speech. We also outline the OVBM design methodology that led us to this architecture, which in general can incorporate multimodal biomarkers and simultaneously target several diseases and other AI tasks. Key to our methodology is the use of multiple biomarkers that complement each other; when two of them uniquely identify different subjects in a target disease we say they are orthogonal. We illustrate the OVBM design methodology by introducing sixteen biomarkers, three of which are orthogonal, demonstrating simultaneous above-state-of-the-art discrimination for two apparently unrelated diseases, AD and COVID-19. Depending on the context, throughout the paper we use OVBM interchangeably to refer to the specific architecture or to the broader design methodology. Inspired by research conducted at the MIT Center for Brains, Minds and Machines (CBMM), OVBM combines biomarker implementations of the four modules of intelligence: the Brain OS chunks and overlaps audio samples and aggregates biomarker features from the Sensory Stream and Cognitive Core, creating a multi-modal graph neural network of symbolic compositional models for the target task. In this paper we apply the OVBM design methodology to the automated diagnosis of AD patients, achieving above-state-of-the-art accuracy of 93.8% using only raw audio, while extracting a personalized subject saliency map designed to longitudinally track relative disease progression using multiple biomarkers, 16 in the reported AD task. The ultimate aim is to help medical practice by detecting onset and treatment impact so that intervention options can be longitudinally tested.
Using the OVBM design methodology, we introduce a novel lung and respiratory tract biomarker created using 200,000+ cough samples to pre-train a model that discriminates the cultural origin of a cough. Transfer learning is subsequently used to incorporate features from this model into various other biomarker-based OVBM architectures. This biomarker yields consistent improvements in AD detection across all the starting OVBM biomarker architecture combinations we tried. The cough dataset sets a new benchmark as the largest audio health dataset, with 30,000+ subjects participating in April 2020, and demonstrates for the first time a cultural bias in cough.
1. Introduction
Since 2001, overall mortality from Alzheimer's Dementia (AD) has been increasing year-on-year. Between 2000 and 2020, deaths resulting from stroke, HIV, and heart disease decreased, while reported deaths from AD increased by about 150% (Alzheimer's Association, 2020). Currently no treatments are available to cure AD; however, if it is detected early, treatments may greatly slow, and possibly eventually halt, further deterioration (Briggs et al., 2016).
Current methods for diagnosing AD often include neuroimaging such as MRI (Fuller et al., 2019), PET scans of the brain (Ding et al., 2019), or an invasive lumbar puncture to test cerebrospinal fluid (Shaw et al., 2009). These diagnostics are far too expensive for large-scale testing and are usually applied only once family members or caregivers detect late-stage symptoms, when the disease is too advanced for onset treatment. On top of the throughput limitations, recent studies of the most widely used diagnostic, PET amyloid brain scans, have shown that expert AD doctors currently misdiagnose patients in about 83% of cases and change their management and treatment of patients nearly 70% of the time (James et al., 2020). This is mainly caused by the lack of longitudinal explainability of these scans. As a result, it is hard to track the effectiveness of treatments and even harder to evaluate personalized treatments tailored to specific onset populations of AD (Maclin et al., 2019). AI in general suffers from similar issues: it operates largely as a black box and does not offer explainable results linked to the specific causes in each individual subject (Holzinger et al., 2019).
Based on the above findings, our research aims to find AD diagnostic methods satisfying the following four requirements:
Onset Detection: detection needs to occur as soon as the first signs emerge, or even earlier if only probabilistic metrics can be provided. Preclinical AD diagnosis and subsequent treatment may offer the best chances of delaying the effects of dementia (Briggs et al., 2016). Therapeutic significance may require establishing subclassifications within AD (Briggs et al., 2016). Evidence that there are early signs of AD onset in the human body comes from recent research on blood-plasma phosphorylated-tau isoform diagnostic biomarkers, demonstrating chemical traces of dementia, and of AD in particular, decades before clinical diagnosis (Barthélemy et al., 2020; Palmqvist et al., 2020). These are encouraging findings, and hopefully there are also early onset signs in free-speech audio signals. In fact, preclinical AD is often linked to mood changes: in cognitively normal adults, onset AD includes depression (Babulal et al., 2016), while apathy and anxiety have been linked to some cognitive decline (Bidzan and Bidzan, 2014). Both may be detectable in preclinical AD using existing sentiment analysis techniques (Zunic et al., 2020).
Minimal Cost: we need a method with minimal side effects, so that a person can perform the test periodically, and with very low variable costs to allow broad pre-screening. Our suggestion is to develop methods that can run on smart speakers and mobile phones (Subirana et al., 2017b) at essentially no cost while respecting user privacy (Subirana et al., 2020a). No medically approved system allows preclinical AD diagnosis at scale. There are different approaches to measuring AD onset and progression, but all rely on expensive human assessments and/or medical procedures. We demonstrate our approach using only free speech, but it can also incorporate multi-modal data if available, including MRI images (Altinkaya et al., 2020) and EEG recordings (Cassani et al., 2018).
Longitudinal tracking: the method should include some form of AD degree metric, especially to evaluate improvements resulting from medical interventions. The finer the disease-progression increments that can be measured, the more useful they will be. Ideally, adaptive clinical trials would be supported (Coffey and Kairalla, 2008).
Explainability: the results need to have some form of explainability, if possible including the ability to diagnose other types of dementia and health conditions. Most importantly, the approach needs to be approved for broad use by the medical community.
Our approach is enabled by, and improves upon, advances in deep learning on acoustic signals to detect discriminating features between AD and non-AD subjects; it aims to address the requirements above, including explainability, which has been challenging for previous approaches. While research on AD detection from speech has been ongoing for several years, most approaches did not surpass the 90% detection mark, as shown in Table 1. These approaches use black-box deep learning algorithms that provide little to no explainability as to what led to the model's decision, making them hard for clinicians to use and hence slowing adoption by the healthcare system. In Petti et al. (2020), a review of the literature on AD speech detection, about two thirds of the papers reviewed use neural networks or support vector machines, while the rest focus on decision trees and naïve Bayes; neural networks seem to achieve the highest detection accuracy on average. Previous work, in contrast, draws very little inspiration from the different stages of human intelligence and at most focuses on modeling a small part of the brain, as shown in Nassif et al. (2019), de la Fuente Garcia et al. (2020), and Petti et al. (2020).
Table 1
| References | Date | Accuracy(%) |
|---|---|---|
| Syed et al. (2020) | 2020 | 85.4 |
| Haulcy and Glass (2021) | 2021 | 85.4 |
| Orimaye et al. (2014) | 2016 | 87.5 |
| Yuan et al. (2020) | 2020 | 89.6 |
| Karlekar et al. (2018) | 2018 | 91.1 |
| Laguarta and Subirana | 2021 | 93.8 |
A review of other AD diagnostic algorithms on the same dataset from Lyu (2018).
Our top performing model uses only audio, while Orimaye et al. used only 35 patients, hence risking high variance. Karlekar et al. used only transcripts. The rest used the transcripts from the ADReSS challenge (Luz et al., 2020).
Combining independent biomarkers with recent advances in our understanding of the four modules of the human brain, as researched at MIT's Center for Brains, Minds and Machines (CBMM) (CBM, 2020), we introduce a novel multi-modal processing framework, the MIT CBMM Open Voice Brain Model (OVBM). The approach described in this paper aims to overcome limitations of previous approaches, first by training the model on large speech datasets and using transfer learning so that the learned features improve AD detection accuracy even when the sample of AD patients is not large, and second by providing an explainable output in the form of a saliency chart that may be used to track the evolution of AD biomarkers.
The use of independent biomarkers in the CBMM Open Voice Brain Model enables assessing the value of each of them, simply by contrasting results with and without a given biomarker; we illustrate this point with a biomarker focused on cough discrimination (Subirana et al., 2020b) and one focused on wake words (Subirana, 2020). We believe this is an original contribution of our work, grounded in the connection between respiratory conditions and Alzheimer's.
Furthermore, we show that our framework lets us apply the same biomarker models to audio detection of multiple diseases, and lets us explore whether there may be common biomarkers between AD and other diseases. To that end, the OVBM framework we introduce may be extended to various other tasks such as speech segmentation and transcription. It has already proven able to detect COVID-19 from a forced-cough recording with high sensitivity, including 100% asymptomatic detection (Laguarta et al., 2020). Here we demonstrate it on the individualized and explainable diagnosis of Alzheimer's Dementia (AD) patients, where, as shown in Table 1, we achieve above-state-of-the-art accuracy of 93.8% (Pulido et al., 2020) using only raw audio as input, while extracting for each subject a saliency map with the relative disease progression of 16 biomarkers. Even with expensive CT scans, to date experts cannot create consistent biomarkers, as described in James et al. (2020), Henriksen et al. (2014), and Morsy and Trippier (2019), even when including emotional biomarkers, unlike our approach, which develops them automatically from free speech. Experts point to this lack of biomarkers as the reason why no new drug has been introduced in the last 16 years (Zetterberg, 2019), despite AD being the sixth leading cause of death in the United States (Alzheimer's Association, 2019) and one of the leading unavoidable causes of loss of healthy life.
We found that cough features, in particular, are very useful biomarker enablers, as shown in several experiments reported in this paper, and that the same biomarkers could be used for COVID-19 and AD detection. Our emphasis on detecting relevant biomarkers corresponding to the different stages of disease onset led us to build ten sub-models using four datasets. To do so, over 200,000 cough samples were crowdsourced to pre-train a model discriminating English from Catalan coughs, and transfer learning was then leveraged to exploit the resulting features by integrating the model into an OVBM brain model, showing improvements in AD detection no matter what transfer learning strategy was used. This COVID-19 cough dataset, which we created under MIT IRB approval 2004000133, sets a new benchmark as the largest audio health dataset, with over 30,000 subjects participating in less than four weeks in April 2020.
In the next section we present a literature review with evidence in favor of our choice of four biomarkers. In section 3, we present the different components of the Open Voice Brain Model AD detector. In sections 4 to 7 we introduce the 16 biomarkers, together with results and a novel personalized AD biomarker comparative saliency map. We conclude in section 8 with a brief summary and implications for future research.
2. Literature Review Supporting Our Choice of Four Sensory Stream Audio Biomarkers: Cough, Wake word, Sentiment, and Memory
Our choice of biomarkers is consistent with the vast literature resulting from AD research, as we discuss next.
2.1. Mood Biomarkers
Preclinical AD is often linked to mood changes; in cognitively normal adults these include depression (Babulal et al., 2016), while apathy and anxiety have been linked to some cognitive decline (Bidzan and Bidzan, 2014). Clinical evidence supports the importance of sentiment in AD early diagnosis (Costa et al., 2017; Galvin et al., 2020), and different clinical settings emphasize different sentiments, such as doubt or frustration (Baldwin and Farias, 2009). These findings motivate our sentiment biomarker.
2.2. Memory Biomarkers
One of the main early-stage AD biomarkers is memory loss (Chertkow and Bub, 1990), which occurs both at a conceptual level and at a muscular level (Wirths and Bayer, 2008) and is different from memory forgetting in healthy humans (Cano-Cordoba et al., 2017; Subirana et al., 2017a). A prominent symptom of early-stage AD is the malfunctioning of different parts of memory depending on the particular patient (Small et al., 2000), possibly affecting one or more of its subcomponents, including primary or working memory, remote memory, and semantic memory. The underlying causes of these memory symptoms may be linked to neuropathological changes, such as tangles and plaques, initially affecting selected areas of the brain like the hippocampi or the temporal and frontal lobes, and gradually expanding beyond these (Morris and Kopelman, 1986). These findings motivate our memory biomarker.
2.3. Respiratory Tract Biomarkers: Cough and Wake Word
The human cough is already used to diagnose several diseases using audio recognition (Abeyratne et al., 2013; Pramono et al., 2016), as it provides information corresponding to biomarkers in the lungs and respiratory tract (Bennett et al., 2010). People with chronic lung disorders are more than twice as likely to have AD (Dodd, 2015); we therefore hypothesize that features extracted from a cough classifier could be valuable for AD diagnosis.
There is extensive research on cough-based diagnosis of respiratory diseases, but to our knowledge no one had applied it to discriminating other, apparently unrelated, diseases like Alzheimer's. Our findings are consistent with the notion that AD patients cough differently and that cough-based features can help AD diagnosis; they are also consistent with the notion that cough features may help detect the onset of the disease. The lack of longitudinal datasets prevents us from exploring this point, but the available data do allow us to demonstrate the diagnostic power of cough-based features, to the point where without these features we would not have surpassed state-of-the-art performance.
The respiratory tract is often involved in the fatal outcome of AD. We introduce two biomarkers focused on the respiratory tract that may help discriminate between early- and late-stage AD. We have not found research indicating how early changes in the tract may be detected, but given its importance in the disease outcome, they may appear early on. This could also explain the success of many speech-based AD discrimination approaches, some of which have been applied to early stages of FTD. Significant research in AD, such as Heckman et al. (2017), has proven that the disease impacts motor neurons. In other diseases where motor neurons are affected, like Parkinson's, the vocal cords have proven to be among the first muscles affected (Holmes et al., 2000).
Dementia in general has been linked to increased deaths from pneumonia (Wise, 2016; Manabe et al., 2019) and COVID-19 (Azarpazhooh et al., 2020; Hariyanto et al., 2020), possibly linked to specific genes (Kuo et al., 2020). COVID-19 deaths are more likely with Alzheimer's than with Parkinson's disease (Yu et al., 2020). This different respiratory response depending on the type of dementia suggests that related audio features, such as coughs, may be useful not only to discriminate dementia subjects from others but also to discriminate specific types of dementia.
We contend there is correlation, rather than causality, between our two respiratory tract biomarkers and Alzheimer's, but further elucidation is necessary here, as it is in many other areas of AD research and, more broadly, in science in general (Pearl and Mackenzie, 2018). Some causal link may exist due to the simultaneous role of substance P in Alzheimer's (Severini et al., 2016) and in cough (Sekizawa et al., 1996). The existence of spontaneous cough per se may not be enough to predict onset risk, but in combination with other health parameters it may contribute to an accurate risk predictor (Song et al., 2011). Our biomarker suggestion is based on “forced coughs,” which, to our knowledge, have not been studied in connection with Alzheimer's. We believe they may be an early indication of future respiratory tract conditions that will later show in the form of spontaneous coughs. In patients with late-onset Alzheimer's Disease (LOAD), a unique delayed cough response has been reported in COVID-19-infected subjects (Isaia et al., 2020; Guinjoan, 2021). Dysphagia and aspiration pneumonia continue to be the two most serious conditions in late-stage AD, the latter being the most common cause of death of AD patients (Kalia, 2003), suggesting that substance-P-induced early signs in the respiratory tract may already be present in forced coughs, perhaps even unavoidably.
What seems unquestionable is the connection between speech and orofacial apraxia and Alzheimer's, and it has been suggested that this alone can be a good metric for longitudinal assessment (Cera et al., 2013). Various forms of apraxia have been linked to AD progression in different parts of the brain (Giannakopoulos et al., 1998). Nevertheless, given the difficulty of estimating speech and orofacial apraxia, these measures are not part of common Clinical Dementia Rating scales (Folstein et al., 1975; Hughes et al., 1982; Clark and Ewbank, 1996; Lambon Ralph et al., 2003). All these studies reveal the difficulty of building an objective, accurate, and personalized scale that can track each patient independently of the others (Olde Rikkert et al., 2011). The lack of metrics also extends to other related indicators such as quality-of-life estimations (Bowling et al., 2015). There are no reliable biomarkers for other neurodegenerative disorders either (Johnen and Bertoux, 2019).
Recent research has demonstrated that apraxia screening can also predict dementia disease progression (Pawlowski et al., 2019), especially as a way to predict AD in early-stage FTD subjects, a population that we are particularly interested in targeting with our biomarkers. In the behavioral variant of frontotemporal dementia (bvFTD), the second most common cognitive disorder caused by neurodegeneration in patients under 65, little tonal modulation and buccofacial apraxia, both targeted by our biomarkers, are established diagnostic domains (Johnen and Bertoux, 2019). We hope that our research can help establish reliable biomarkers for disease progression that can also distinguish at onset between the different possible diagnoses. The exact connection between buccofacial apraxia and dementia has not been as well-documented as that of other forms of apraxia. Recent results show that buccofacial apraxia may be present in up to fifty percent of dementia patients, with no association to oropharyngeal dysphagia (Michel et al., 2020). Oropharyngeal dysphagia, on the other hand, has been linked to dementia, in some studies in over fifty percent of cases, appearing in particular in late stages of FTD and in early stages of AD (Alagiakrishnan et al., 2013).
According to the NIH's National Institute of Neurological Disorders and Stroke information page on apraxia, the most common form of apraxia is orofacial apraxia, which causes the inability to carry out facial movements on request, such as coughing. Deterioration of cough reflex sensitivity and of the urge to cough has been shown to help distinguish AD from dementia with Lewy bodies and from control groups (Ebihara et al., 2020). Impairment of cough in the elderly is linked to dementia (Won et al., 2018).
3. Overview of the MIT Open Voice Brain Model (OVBM) Framework
The OVBM architecture shown in Figure 1 frames a four-unit system for testing biomarker combinations and provides the basis for an explainable diagnostic framework for a target task such as AD discrimination. The Sensory Stream is responsible for pre-training models on large speech datasets to extract features of individual physical biomarkers. The Brain OS splits audio into overlapping chunks and leverages transfer learning strategies to fine-tune the biomarker models on the smaller target dataset; for longitudinal diagnosis, it includes a round-robin five-stage graph neural network that marks salient events in continuous speech. The Cognitive Core incorporates medical knowledge specific to the target task to train cognitive biomarker feature extractors. The Symbolic Compositional Models unit combines the fine-tuned biomarker models into a graph neural network; its predictions on individual audio chunks are fed into an aggregating engine that reaches a final diagnosis plus a patient saliency map. To enable doctors to gain insight into the specific condition of a given patient, one of the novelties of our approach is that the outputs of each module are extracted to create a visualization, in the form of a health diagnostic saliency map, showing the impact of the selected biomarkers. This saliency map may be used to longitudinally track and visualize disease progression.
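The four-unit flow just described can be summarized in a minimal sketch (pure Python; all function and variable names are illustrative stand-ins, not the published implementation):

```python
def ovbm_diagnose(chunks, sensory_models, cognitive_models, gnn, aggregate):
    """Illustrative OVBM flow: per-chunk biomarker features -> combined
    score per chunk -> aggregated diagnosis plus a per-biomarker saliency map.

    All arguments are callables/dicts standing in for the trained models:
    sensory_models and cognitive_models map biomarker names to feature
    extractors, gnn combines one chunk's features into a score, and
    aggregate reduces the per-chunk scores to a final diagnosis.
    """
    chunk_scores, saliency = [], {}
    for chunk in chunks:
        # Sensory Stream + Cognitive Core: one feature per biomarker.
        features = {name: m(chunk) for name, m in sensory_models.items()}
        features.update({name: m(chunk) for name, m in cognitive_models.items()})
        # Symbolic Compositional Models: combine biomarker outputs.
        chunk_scores.append(gnn(features))
        # Record per-biomarker outputs for the saliency map.
        for name, f in features.items():
            saliency.setdefault(name, []).append(f)
    # Brain OS aggregation over chunks yields the final diagnosis.
    return aggregate(chunk_scores), saliency
```

The returned saliency dictionary is what the visualization of Figure 7 would be built from: one trace per biomarker across the recording.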
Figure 1

Diagram of MIT CBMM open voice 4 module brain model with the selected AD Biomarkers.
3.1. OVBM Applied to AD Detection
Next, we review each of the four OVBM modules in the context of AD, introducing 16 biomarkers and gradually explaining the partial GNN architecture shown in Figure 2. To enable model comparison, our baselines and 8 of the biomarkers are based on the ResNet50 CNN, due to its state-of-the-art performance on medical speech recognition tasks (Ghoniem, 2019). All audio samples are processed with the MFCC package published by Lyons et al. (2020) and padded accordingly. We operate on Mel Frequency Cepstral Coefficients (MFCC) instead of spectrograms (Lee et al., 2009) because of their resemblance to how the human cochlea captures sound (Krijnders and t Holt, 2017). All audio data uses the same MFCC parameters (window length: 20 ms, window step: 10 ms, cepstrum dimension: 200, number of filters: 200, FFT size: 2,048, sample rate: 16,000 Hz). All datasets follow a 70/30 train-test split, and models are trained with an Adam optimizer (Kingma and Ba, 2014).
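As a sanity check on this front end, the stated parameters can be converted into the shape of the MFCC image each model sees, using the standard sliding-window frame count `1 + floor((N - win) / step)` (pure Python sketch; end-padding conventions of the specific MFCC package may add a frame on non-divisible lengths):

```python
# MFCC front-end bookkeeping from the stated parameters.
SAMPLE_RATE = 16_000     # Hz
WIN_LEN_S = 0.020        # 20 ms analysis window
WIN_STEP_S = 0.010       # 10 ms step
NUM_CEP = 200            # cepstrum dimension (image width)
NFFT = 2_048             # FFT size (window of 320 samples fits easily)

win_samples = int(WIN_LEN_S * SAMPLE_RATE)    # 320 samples per window
step_samples = int(WIN_STEP_S * SAMPLE_RATE)  # 160 samples per step

def num_frames(duration_s):
    """MFCC frames produced by a clip of the given duration (no padding)."""
    n = int(duration_s * SAMPLE_RATE)
    return 1 + (n - win_samples) // step_samples

# A 6 s chunk (the length used for the cough biomarker) yields a
# 599 x 200 MFCC image.
print(num_frames(6.0), NUM_CEP)  # 599 200
```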
Figure 2

OVBM GNN architecture at a given Brain OS time.
The ADReSS dataset from DementiaBank (Luz et al., 2020) is used to train the OVBM framework and fine-tune all biomarker models on AD detection. It is the largest such publicly available dataset, consisting of subject recordings as full enhanced audio and short normalized sub-chunks, along with recording transcriptions, from 78 AD and 78 non-AD patients. The age and gender distributions are balanced and equal for AD and non-AD patients. Since this study focuses purely on audio processing, we use only the full enhanced audio and patient metadata, excluding transcripts from all processing. This is worth noting given the poor audio quality of some of the recordings.
4. OVBM AD Sensory Stream Biomarkers
We have selected four biomarkers inspired by previous medical community choices (Chertkow and Bub, 1990; Wirths and Bayer, 2008; Dodd, 2015; Heckman et al., 2017; Galvin et al., 2020), as reviewed next.
4.1. Biomarker 1 (Muscular Degradation)
We follow memory decay models from Subirana et al. (2017a) and Cano-Cordoba et al. (2017) to capture this muscular metric by degrading the input signals of all train and test sets with the Poisson mask shown in Equation (1), a commonly occurring distribution in nature (Reed and Hughes, 2002). The Poisson function is used as a mask with input MFCC image I_x, output mask M(I_x), λ = 1, and k taking each value in I_x:

M(I_x) = λ^k e^(−λ) / k!     (1)
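A minimal sketch of the mask follows (pure Python; rounding and clipping MFCC values to non-negative integers before using them as k is our assumption, since the text does not specify how real-valued coefficients are handled):

```python
import math

def poisson_mask(mfcc_image, lam=1.0):
    """Elementwise Poisson mask M(I_x) = lam**k * exp(-lam) / k!.

    Assumption (not specified in the text): each MFCC value is rounded
    and clipped to a non-negative integer before being used as k.
    """
    def pmf(value):
        k = max(0, round(value))
        return lam**k * math.exp(-lam) / math.factorial(k)
    return [[pmf(v) for v in row] for row in mfcc_image]

# With lam = 1, values 0 and 1 map to e**-1 ~ 0.368, value 2 to ~0.184,
# so larger coefficients are progressively attenuated.
```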
As shown in Table 2, this Poisson biomarker brings a unique improvement to each model except Cough, consistent with both inherently capturing similar features related to muscular degradation.
Table 2
| Model | W/o Poisson(%) | With Poisson(%) |
|---|---|---|
| Baseline | 65.6 | 68.8 |
| Cough | 75.0 | 75.0 |
| Intonation | 68.8 | 75.0 |
| Wake-Word “Them” | 75.0 | 78.1 |
| Multi-Modal | 90.6 | 93.8 |
| Avg. improvement (%) | | 3.1 |
Impact of Poisson mask on AD performance.
Baseline is a ResNet50 trained on the AD task without transfer learning.
4.2. Biomarker 2 (Vocal Cords)
We have developed a vocal cord biomarker to incorporate into OVBM architectures. We trained a Wake Word (WW) model to learn vocal cord features on LibriSpeech, an audiobook dataset with ≈1,000 h of speech (Panayotov et al., 2015), by creating a balanced sample set of 2 s audio chunks, half containing the word “Them” and half without. A ResNet50 (He et al., 2016) is trained for binary classification of “Them” on 3 s audio chunks (lr: 0.001, val_acc: 89%).
As illustrated in Table 3 and Figure 4, this vocal cord model contributes unique features: without any fine-tuning to the AD task it performs as well as the baseline ResNet50 fully trained on AD, and it significantly beats the baseline when fully fine-tuned.
Table 3
| Biomarker | Model Name | Alzheimer's(%) | COVID-19(%) |
|---|---|---|---|
| Respiratory tract | Cough | 9 | 23 |
| Sentiment | Intonation | 19 | 8 |
| Vocal cords | WW “THEM” | 16 | 19 |
| R. Tract and sentiment | Cough and Tone. | 0 | 0 |
| R. Tract and vocal cords | Cough and WW | 6 | 1 |
| Sentiment and vocal cords | Tone. and WW | 3 | 0 |
| In all 3 | | 41 | 34 |
| In neither of the 3 | | 6 | 15 |
To illustrate the complementary nature of the biomarkers we show the unique AD patients detected by each individual biomarker model with only the final classification layer fine-tuned on the target disease, Alzheimer's and COVID-19 in this case.
Each transfer model detects unique patients, reinforcing the orthogonality of the biomarkers and hence the potential of combining new ones. Note how exactly the same biomarker models can detect Alzheimer's and COVID-19 subjects, showing their transferability across diseases and how they behave “orthogonally” in both cases.
4.3. Biomarker 3 (Sentiment)
We train a sentiment speech classifier to learn intonation features on RAVDESS, an emotional speech dataset (Livingstone and Russo, 2018) of actors speaking in eight different emotional states. A ResNet50 (He et al., 2016) is trained on categorical classification of the eight corresponding intonations, such as calm, happy, or disgust (lr: 0.0001, val_acc on 8 classes: 71%).
As illustrated by Table 3 and Figure 4, this biomarker captures unique features for AD detection, and when only its final five layers are fine-tuned it outperforms a fully trained ResNet50 on AD detection.
4.4. Biomarker 4 (Lungs and Respiratory Tract)
We use the cough dataset collected through MIT Open Voice for COVID-19 detection (Subirana et al., 2020b), strip all metadata except the spoken language of the person coughing (English, Catalan), and split the audio into 6 s chunks. A ResNet50 (He et al., 2016) is trained on binary classification (input: MFCC of 6 s audio chunks (1 cough); output: English/Catalan; lr: 0.0001; val_acc: 86%).
Figure 4 and Table 3 justify the features extracted by this cough model as valuable for AD detection: it captures a unique set of samples and improves performance. Further, Figure 3 validates its impact on various OVBM architectures, including the top-performing multi-modal model, confirming the relevance of this novel biomarker.
Figure 3

Impact of sensory stream biomarkers on OVBM performance by removing transferred knowledge one at a time. Top dotted sections of bars indicate there is always performance gain from the cough biomarker. Baselines are the OVBM trained on AD without any transfer learning. In the other bars, a biomarker is removed and replaced with an AD pre-trained ResNet50, hence removing the transferred knowledge but conserving computational power, showing complementarities since all are needed for maximum results.
5. OVBM Brain OS Biomarkers
The Brain OS is responsible for capturing learned features from the individual biomarker models in the Sensory Stream and Cognitive Core, and for integrating them into an OVBM architecture, with the aim of training the ensemble for a target task, in this case AD detection.
To make the most of the short patient recordings, we split each patient recording into overlapping audio chunks (0–4, 2–6, 4–8 s, and so on). Once the best pre-trained biomarker models in the sensory stream and cognitive core are selected, we concatenate their outputs and pass them through a 1,024-neuron densely connected layer with ReLU activation. We also incorporate metadata such as gender at this point. We test three Brain OS transfer learning strategies: (1) the CNNs are used as fixed feature extractors without any fine-tuning; (2) the CNNs are fine-tuned by training all layers; (3) only the final layers of each CNN are fine-tuned.
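The overlapping-chunk scheme can be sketched as follows (pure Python; the 4 s length and 2 s stride are inferred from the 0–4, 2–6, 4–8 s example, and other experiments in the paper use other chunk sizes):

```python
def overlapping_chunks(duration_s, chunk_len=4.0, stride=2.0):
    """Return (start, end) times of overlapping chunks covering a recording.

    Defaults match the 0-4, 2-6, 4-8 s example in the text; chunk sizes
    of 2-20 s are explored elsewhere in the paper.
    """
    chunks, start = [], 0.0
    while start + chunk_len <= duration_s:
        chunks.append((start, start + chunk_len))
        start += stride
    return chunks

print(overlapping_chunks(8.0))  # [(0.0, 4.0), (2.0, 6.0), (4.0, 8.0)]
```

Each chunk is then fed to every biomarker model, so a single recording contributes several training examples and several per-chunk predictions for later aggregation.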
From Figure 5, it is evident that AD detection improves as chunk length increases, consistent with the fact that attention-marking has more per-chunk information with which to formulate a better AD prediction. From this attention-marking index (the quantity of information required in a chunk for a confident diagnosis) we select chunk sizes of 2, 8, 14, and 20 s, shown in Figure 7, as the Brain OS biomarkers, establishing individual AD progression. In terms of transfer learning strategies, Figure 4 shows that fine-tuning all layers always leads to better results; however, for most models almost no fine-tuning is required to beat the baseline.
Figure 4

Sensory Stream Saliency Bar Chart: to illustrate the potential of our approach, we show the strength of the simplest transfer models we tried. The numbers 0-5-10-ALL on the x-axis labels refer to the number of convolution layers trained after transfer learning, in addition to the final dense layer. Perhaps most surprising is that the simple wake-word model trained to find the word “Them” is as powerful as the baseline; if we let the model fine-tune the last few (0-5-10) layers, it goes well beyond it. Our novel cough database, inspired by the effect of AD on the respiratory tract, also shows surprising results, even without any adaptation at all; if we allow fine-tuning of the whole model, its validation accuracy improves by ≈10 percentage points with respect to the baseline. The baseline is the same OVBM architecture trained on AD without any transfer learning of features.
6. OVBM Cognitive Core Biomarkers
Neuropsychological tests are a common screening tool for AD (Baldwin and Farias, 2009). These tests evaluate, among other abilities, a patient's ability to remember uncommon words, contextualize, infer actions, and detect saliency (Baldwin and Farias, 2009; Costa et al., 2017). In this AD dataset, all patients are asked to describe the Cookie Theft picture created by Goodglass et al. (1983), and a set of words such as “kitchen” (context), “tipping” (unique), “jar” (inferred), and “overflow” (salient) is used to capture four cognitive biomarkers. To preserve the richness of speech, we train four wake-word models on LibriSpeech (Panayotov et al., 2015) with ResNet50s, following the same approach as Biomarker 2. The four chosen cognitive biomarkers aim to detect patients' abilities in context, uniqueness, inference, and saliency.
We could show the same saliency bar chart as in Figure 4 and a uniqueness table such as Table 3 to illustrate the impact of each cognitive biomarker. Instead, in Figure 5 we show the impact of removing the cognitive core: top OVBM performance drops ≈10%, validating the relevance of the cognitive core biomarkers.
Figure 5

The top two lines illustrate the full OVBM performance, with its biomarker feature models, as a function of chunk size. PT refers to individually fine-tuning each biomarker model for AD before retraining the whole OVBM. The middle line shows the OVBM without the cognitive core, illustrating how the core boosts performance by about 10% across the board. Baseline PT is the OVBM architecture with each internal ResNet50 individually trained on AD before retraining them together in the OVBM architecture.
7. OVBM Symbolic Compositional Model Biomarkers
This module fine-tunes the previous modules' outputs into a graph neural network. Predictions on individual audio chunks for one subject are aggregated and fed into competing models to reach a final diagnosis. We tested the model with various BERT configurations and found no improvement in detection accuracy. In the AD implementation, given that we had at most 39 overlapping chunks, three simple aggregation metrics are compared: averaging, linear positive (more weight given to later chunks), and linear negative (more weight given to earlier chunks).
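The three aggregation metrics can be written out directly. This is a sketch only: the exact linear weights used in the paper are not specified, so the unit-slope ramps below are our assumption.

```python
def aggregate_chunk_predictions(probs, scheme="average"):
    """Aggregate one subject's per-chunk AD probabilities into a
    single score using one of the three simple schemes compared in
    the paper: plain averaging, linear positive (later chunks
    weighted more), and linear negative (earlier chunks weighted
    more). The linear ramps are assumed weights for illustration.
    """
    n = len(probs)
    if scheme == "average":
        weights = [1.0] * n
    elif scheme == "linear_positive":
        weights = [i + 1 for i in range(n)]   # last chunk weighted most
    elif scheme == "linear_negative":
        weights = [n - i for i in range(n)]   # first chunk weighted most
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Weighted mean of the chunk probabilities.
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)
```

For example, with chunk probabilities `[0.2, 0.4, 0.9]`, the linear-positive score exceeds the linear-negative one, consistent with later chunks dominating when they are more informative.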
In Figure 6, averaging proves to be the most effective, while linear positive outperforming linear negative indicates that the later audio chunks are more informative than the earlier ones. Figure 7 includes four biomarkers derived from combining chunk predictions from biomarker models of the three other modules (Cummings, 2019). With more data and longitudinal recordings, the OVBM GNN may incorporate other biomarkers.
Figure 6

Relation between chunk size and AD discrimination error, showing the increased importance of the later chunks.
Figure 7

(A) Saliency map to study the explainable AD evolution of all the patients in the study, based on the predictions of individual biomarker models. BrainOS (2, 8, 14, 20) shows the model prediction at different chunk sizes. This map could be used to longitudinally monitor subjects, where a lower score on the biomarkers may indicate a more progressed AD subject. (B) Saliency map comparing AD+ subject S092 (solid line) with AD- subject S019 (dashed line).
8. Discussion
We conclude by providing a few insights further supporting our OVBM brain-inspired model for audio health diagnostics as presented above. We have demonstrated the success of the OVBM framework, setting a new state-of-the-art benchmark for AD classification despite incorporating only audio signals, in an architecture that can incorporate GNNs (Wu et al., 2020). Future work may improve this benchmark by also incorporating into the OVBM longitudinal GNN natural language biomarkers using NLP classifiers, or multi-modal graph neural networks incorporating non-audio diagnostic tools (Parisot et al., 2018).
One of the most promising insights is the discovery of cough as a new biomarker (Figure 3), one that improves every intermediate model we tested and that validates OVBM as a framework on which medical experts can hypothesize about and test existing and novel biomarkers. We are the first to report that cough biomarkers carry information related to gender and culture, and also the first to demonstrate how they improve simultaneous AD classification, as illustrated in the saliency charts (Figure 4), as well as that of other apparently unrelated conditions.
Another promising finding is the model's explainability: the biomarker AD saliency map tool offers novel methods to evaluate patients longitudinally on a set of physical and neuropsychological biomarkers, as shown in Figure 7. In future research, longitudinal data may be collected to properly test the onset-detection potential of OVBM GNN discrimination in continuous speech. We hope our approach brings AI health-diagnostic experts closer to the medical community and accelerates research on treatments by providing longitudinal and explainable tracking metrics that can help adaptive clinical trials of urgently needed innovative interventions succeed.
Statements
Data availability statement
The data analyzed in this study is subject to the following licenses/restrictions: in order to gain access to the datasets used in the paper, researchers must become a member of DementiaBank. Requests to access these datasets should be directed to https://dementia.talkbank.org/.
Author contributions
JL wrote the code. BS designed the longitudinal biomarker Open Voice Brain Model (OVBM) and the saliency map. All authors contributed to the analysis of the results and the article.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
1
AbeyratneU. R.SwarnkarV.SetyatiA.TriasihR. (2013). Cough sound analysis can rapidly diagnose childhood pneumonia. Ann. Biomed. Eng. 41, 2448–2462. 10.1007/s10439-013-0836-0
2
AlagiakrishnanK.BhanjiR. A.KurianM. (2013). Evaluation and management of oropharyngeal dysphagia in different types of dementia: a systematic review. Arch. Gerontol. Geriatr. 56, 1–9. 10.1016/j.archger.2012.04.011
3
AltinkayaE.PolatK.BarakliB. (2020). Detection of Alzheimer's disease and dementia states based on deep learning from MRI images: a comprehensive review. J. Instit. Electron. Comput. 1, 39–53. 10.33969/JIEC.2019.11005
4
Alzheimer's Association (2019). 2019 Alzheimer's disease facts and figures. Alzheimer's Dement. 15, 321–387. 10.1016/j.jalz.2019.01.010
5
Alzheimer's Association (2020). Alzheimer's disease facts and figures [published online ahead of print, 2020 mar 10]. Alzheimers Dement.1–70. 10.1002/alz.12068
6
AzarpazhoohM. R.AmiriA.MorovatdarN.SteinwenderS.ArdaniA. R.YassiN.et al. (2020). Correlations between covid-19 and burden of dementia: an ecological study and review of literature. J. Neurol. Sci. 416:117013. 10.1016/j.jns.2020.117013
7
BabulalG. M.GhoshalN.HeadD.VernonE. K.HoltzmanD. M.BenzingerT. L.et al. (2016). Mood changes in cognitively normal older adults are linked to Alzheimer disease biomarker levels. Am. J. Geriatr. Psychiatry24, 1095–1104. 10.1016/j.jagp.2016.04.004
8
Baldwin, S., and Farias, S. T. (2009). Unit 10.3: Assessment of cognitive impairments in the diagnosis of Alzheimer's disease. Curr. Protoc. Neurosci. 10:Unit10-3. 10.1002/0471142301.ns1003s49
9
BarthélemyN. R.HorieK.SatoC.BatemanR. J. (2020). Blood plasma phosphorylated-tau isoforms track CNS change in Alzheimer's disease. J. Exp. Med. 217:e20200861. 10.1084/jem.20200861
10
BennettW. D.DaviskasE.HasaniA.MortensenJ.FlemingJ.ScheuchG. (2010). Mucociliary and cough clearance as a biomarker for therapeutic development. J. Aerosol Med. Pulmon. Drug Deliv. 23, 261–272. 10.1089/jamp.2010.0823
11
BidzanM.BidzanL. (2014). Neurobehavioral manifestation in early period of Alzheimer disease and vascular dementia. Psychiatr. Polska48, 319–330.
12
BowlingA.RoweG.AdamsS.SandsP.SamsiK.CraneM.et al. (2015). Quality of life in dementia: a systematically conducted narrative review of dementia-specific measurement scales. Aging Ment. Health19, 13–31. 10.1080/13607863.2014.915923
13
BriggsR.KennellyS. P.O'NeillD. (2016). Drug treatments in Alzheimer's disease. Clin. Med. 16:247. 10.7861/clinmedicine.16-3-247
14
Cano-CordobaF.SarmaS.SubiranaB. (2017). Theory of Intelligence With Forgetting: Mathematical Theorems Explaining Human Universal Forgetting Using “Forgetting Neural Networks”. Technical Report 71, MIT Center for Brains, Minds and Machines (CBMM).
15
CassaniR.EstarellasM.San-MartinR.FragaF. J.FalkT. H. (2018). Systematic review on resting-state EEG for Alzheimer's disease diagnosis and progression assessment. Dis. Mark. 2018:5174815. 10.1155/2018/5174815
16
CeraM. L.OrtizK. Z.BertolucciP. H. F.MinettT. S. C. (2013). Speech and orofacial apraxias in Alzheimer's disease. Int. Psychogeriatr. 25, 1679–1685. 10.1017/S1041610213000781
17
ChertkowH.BubD. (1990). Semantic memory loss in dementia of Alzheimer's type: what do various measures measure?Brain113, 397–417. 10.1093/brain/113.2.397
18
ClarkC. M.EwbankD. C. (1996). Performance of the dementia severity rating scale: a caregiver questionnaire for rating severity in Alzheimer disease. Alzheimer Dis. Assoc. Disord. 10, 31–39. 10.1097/00002093-199603000-00006
19
CoffeyC. S.KairallaJ. A. (2008). Adaptive clinical trials. Drugs R & D9, 229–242. 10.2165/00126839-200809040-00003
20
Costa, A., Bak, T., Caffarra, P., Caltagirone, C., Ceccaldi, M., Collette, F., et al. (2017). The need for harmonisation and innovation of neuropsychological assessment in neurodegenerative dementias in Europe: consensus document of the joint program for neurodegenerative diseases working group. Alzheimer's Res. Ther. 9:27. 10.1186/s13195-017-0254-x
21
Cummings, L. (2019). Describing the cookie theft picture: sources of breakdown in Alzheimer's dementia. Pragmat. Soc. 10, 153–176. 10.1075/ps.17011.cum
22
de la Fuente GarciaS.RitchieC.LuzS. (2020). Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: a systematic review. J. Alzheimer's Dis. 78, 1547–1574. 10.3233/JAD-200888
23
DingY.SohnJ. H.KawczynskiM. G.TrivediH.HarnishR.JenkinsN. W.et al. (2019). A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG pet of the brain. Radiology290, 456–464. 10.1148/radiol.2018180958
24
DoddJ. W. (2015). Lung disease as a determinant of cognitive decline and dementia. Alzheimer's Res. Ther. 7:32. 10.1186/s13195-015-0116-3
25
EbiharaT.GuiP.OoyamaC.KozakiK.EbiharaS. (2020). Cough reflex sensitivity and urge-to-cough deterioration in dementia with Lewy bodies. ERJ Open Res. 6, 108–2019. 10.1183/23120541.00108-2019
26
FolsteinM. F.FolsteinS. E.McHughP. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198. 10.1016/0022-3956(75)90026-6
27
FullerS. J.CarriganN.SohrabiH. R.MartinsR. N. (2019). Current and developing methods for diagnosing Alzheimer's disease, in Neurodegeneration and Alzheimer's Disease: The Role of Diabetes, Genetics, Hormones, and Lifestyle, eds MartinsR. N.BrennanC. S.Binosha FernandoW. M. A. D.BrennanM. A.FullerS. J. (John Wiley & Sons Ltd.), 43–87. 10.1002/9781119356752.ch3
28
GalvinJ.TariotP.ParkerM. W.JichaG. (2020). Screen and Intervene: The Importance of Early Detection and Treatment of Alzheimer's Disease. The Medical Roundtable General Medicine Edition.
29
GhoniemR. M. (2019). Deep genetic algorithm-based voice pathology diagnostic system, in Natural Language Processing and Information Systems, eds MétaisE.MezianeF.VaderaS.SugumaranV.SaraeeM. (Cham: Springer International Publishing), 220–233. 10.1007/978-3-030-23281-8_18
30
GiannakopoulosP.DucM.GoldG.HofP. R.MichelJ.-P.BourasC. (1998). Pathologic correlates of apraxia in Alzheimer disease. Arch. Neurol. 55, 689–695. 10.1001/archneur.55.5.689
31
Goodglass, H., Kaplan, E., and Barressi, B. (1983). Cookie Theft Picture. Boston Diagnostic Aphasia Examination. Philadelphia, PA: Lea & Febiger.
32
GuinjoanS. M. (2021). Expert opinion in Alzheimer disease: the silent scream of patients and their family during coronavirus disease 2019 (covid-19) pandemic. Pers. Med. Psychiatry2021:100071. 10.1016/j.pmip.2021.100071
33
HariyantoT. I.PutriC.SitumeangR. F. V.KurniawanA. (2020). Dementia is a predictor for mortality outcome from coronavirus disease 2019 (COVID-19) infection. Eur. Arch. Psychiatry Clin. Neurosci. 26, 1–3. 10.1007/s00406-020-01205-z
34
HaulcyR.GlassJ. (2021). Classifying Alzheimer's disease using audio and text-based representations of speech. Front. Psychol. 11:3833. 10.3389/fpsyg.2020.624137
35
HeK.ZhangX.RenS.SunJ. (2016). Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV), 770–778. 10.1109/CVPR.2016.90
36
HeckmanP. R.BloklandA.PrickaertsJ. (2017). From age-related cognitive decline to Alzheimer's disease: a translational overview of the potential role for phosphodiesterases, in Phosphodiesterases: CNS Functions and Diseases eds ZhangH. T.XuY.O'DonnellJ. (Springer), 135–168. 10.1007/978-3-319-58811-7_6
37
HenriksenK.O'BryantS.HampelH.TrojanowskiJ.MontineT.JerominA.et al. (2014). The future of blood-based biomarkers for Alzheimer's disease. Alzheimer's Dement. 10, 115–131. 10.1016/j.jalz.2013.01.013
38
HolmesR.OatesJ.PhylandD.HughesA. (2000). Voice characteristics in the progression of Parkinson's disease. Int. J. Lang. Commun. Disord. 35, 407–418. 10.1080/136828200410654
39
HolzingerA.LangsG.DenkH.ZatloukalK.MüllerH. (2019). Causability and explainability of artificial intelligence in medicine. Wiley Interdisc. Rev. Data Mining Knowl. Discov. 9:e1312. 10.1002/widm.1312
40
HughesC. P.BergL.DanzigerW.CobenL. A.MartinR. L. (1982). A new clinical scale for the staging of dementia. Br. J. Psychiatry140, 566–572. 10.1192/bjp.140.6.566
41
IsaiaG.MarinelloR.TibaldiV.TamoneC.BoM. (2020). Atypical presentation of covid-19 in an older adult with severe Alzheimer disease. Am. J. Geriatr. Psychiatry28, 790–791. 10.1016/j.jagp.2020.04.018
42
JamesH. J.Van HoutvenC. H.LippmannS.BurkeJ. R.Shepherd-BaniganM.BelangerE.et al. (2020). How accurately do patients and their care partners report results of amyloid-β pet scans for Alzheimer's disease assessment?J. Alzheimer's Dis. 74, 625–636. 10.3233/JAD-190922
43
JohnenA.BertouxM. (2019). Psychological and cognitive markers of behavioral variant frontotemporal dementia-a clinical neuropsychologist's view on diagnostic criteria and beyond. Front. Neurol. 10:594. 10.3389/fneur.2019.00594
44
KaliaM. (2003). Dysphagia and aspiration pneumonia in patients with Alzheimer's disease. Metabolism52, 36–38. 10.1016/S0026-0495(03)00300-7
45
KarlekarS.NiuT.BansalM. (2018). Detecting linguistic characteristics of Alzheimer's dementia by interpreting neural models. arXiv preprint arXiv:1804.06440. 10.18653/v1/N18-2110
46
KingmaD. P.BaJ. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. Available online at: https://arxiv.org/abs/1412.6980
47
KrijndersJ.t HoltG. (2017). Tone-fit and MFCC scene classification compared to human recognition. Energy400:500. Available online at: https://www.researchgate.net/publication/255823915_Tonefit_and_MFCC_Scene_Classification_compared_to_Human_Recognition
48
Kuo, C.-L., Pilling, L. C., Atkins, J. L., Kuchel, G. A., and Melzer, D. (2020). APOE e2 and aging-related outcomes in 379,000 UK biobank participants. Aging 12, 12222–12233. 10.18632/aging.103405
49
LaguartaJ.HuetoF.SubiranaB. (2020). Covid-19 artificial intelligence diagnosis using only cough recordings. IEEE Open J. Eng. Med. Biol. 1, 275–281. 10.1109/OJEMB.2020.3026928
50
Lambon RalphM. A.PattersonK.GrahamN.DawsonK.HodgesJ. R. (2003). Homogeneity and heterogeneity in mild cognitive impairment and Alzheimer's disease: a cross-sectional and longitudinal study of 55 cases. Brain126, 2350–2362. 10.1093/brain/awg236
51
LeeH.PhamP.LargmanY.NgA. Y. (2009). Unsupervised feature learning for audio classification using convolutional deep belief networks, in Advances in Neural Information Processing Systems 22, eds BengioY.SchuurmansD.LaffertyJ. D.WilliamsC. K. I.CulottaA. (Curran Associates, Inc.), 1096–1104.
52
LivingstoneS. R.RussoF. A. (2018). The ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in north American english. PLoS ONE13:e196391. 10.1371/journal.pone.0196391
53
LuzS.HaiderF.de la FuenteS.FrommD.MacWhinneyB. (2020). Alzheimer's dementia recognition through spontaneous speech: the ADReSS Challenge, in Proceedings of INTERSPEECH 2020 (Shanghai). 10.21437/Interspeech.2020-2571
54
LyonsJ.WangD. Y.-B.ShteingartH.MavrinacE.GaurkarY.WatcharawisetkulW.et al. (2020). jameslyons/python_speech_features: release v0.6.1.
55
LyuG. (2018). A review of Alzheimer's disease classification using neuropsychological data and machine learning, in 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) Beijing, 1–5. 10.1109/CISP-BMEI.2018.8633126
56
MaclinJ. M. A.WangT.XiaoS. (2019). Biomarkers for the diagnosis of Alzheimer's disease, dementia lewy body, frontotemporal dementia and vascular dementia. Gen. Psychiatry32:e100054. 10.1136/gpsych-2019-100054
57
ManabeT.FujikuraY.MizukamiK.AkatsuH.KudoK. (2019). Pneumonia-associated death in patients with dementia: a systematic review and meta-analysis. PLoS ONE14:e0213825. 10.1371/journal.pone.0213825
58
MichelA.VerinE.HansenK.ChassagneP.RocaF. (2020). Buccofacial apraxia, oropharyngeal dysphagia, and dementia severity in community-dwelling elderly patients. J. Geriatr. Psychiatry Neurol. 34, 150–155. 10.1177/0891988720915519
59
MorrisR. G.KopelmanM. D. (1986). The memory deficits in Alzheimer-type dementia: a review. Q. J. Exp. Psychol. 38, 575–602. 10.1080/14640748608401615
60
MorsyA.TrippierP. (2019). Current and emerging pharmacological targets for the treatment of Alzheimer's disease. J. Alzheimer's Dis. 72, 1–33. 10.3233/JAD-190744
61
NassifA. B.ShahinI.AttiliI.AzzehM.ShaalanK. (2019). Speech recognition using deep neural networks: a systematic review. IEEE Access7, 19143–19165. 10.1109/ACCESS.2019.2896880
62
Olde RikkertM. G.TonaK. D.JanssenL.BurnsA.LoboA.RobertP.et al. (2011). Validity, reliability, and feasibility of clinical staging scales in dementia: a systematic review. Am. J. Alzheimer's Dis. Other Dement. 26, 357–365. 10.1177/1533317511418954
63
OrimayeS. O.WongJ. S.-M.GoldenK. J. (2014). Learning predictive linguistic features for Alzheimer's disease and related dementias using verbal utterances, in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (Baltimore, MD), 78–87. 10.3115/v1/W14-3210
64
PalmqvistS.JanelidzeS.QuirozY. T.ZetterbergH.LoperaF.StomrudE.et al. (2020). Discriminative accuracy of plasma phospho-tau217 for Alzheimer disease vs. other neurodegenerative disorders. JAMA324, 772–781. 10.1001/jama.2020.12134
65
Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015). Librispeech: an ASR corpus based on public domain audio books, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (South Brisbane, QLD), 5206–5210. 10.1109/ICASSP.2015.7178964
66
Parisot, S., Ktena, S. I., Ferrante, E., Lee, M., Guerrero, R., Glocker, B., et al. (2018). Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer's disease. Med. Image Anal. 48, 117–130. 10.1016/j.media.2018.06.001
67
PawlowskiM.JokschV.WiendlH.MeuthS. G.DuningT.JohnenA. (2019). Apraxia screening predicts Alzheimer pathology in frontotemporal dementia. J. Neurol. Neurosurg. Psychiatry90, 562–569. 10.1136/jnnp-2018-318470
68
PearlJ.MackenzieD. (2018). The Book of Why: The New Science of Cause and Effect. Basic books.
69
PettiU.BakerS.KorhonenA. (2020). A systematic literature review of automatic Alzheimer's disease detection from speech and language. J. Am. Med. Inform. Assoc. 27, 1784–1797. 10.1093/jamia/ocaa174
70
PramonoR. X. A.ImtiazS. A.Rodriguez-VillegasE. (2016). A cough-based algorithm for automatic diagnosis of pertussis. PLoS ONE11:e162128. 10.1371/journal.pone.0162128
71
PulidoM. L. B.HernándezJ. B. A.BallesterM. Á. F.GonzálezC. M. T.MekyskaJ.SmékalZ. (2020). Alzheimer's disease and automatic speech analysis: a review. Expert Syst. Appl. 2020:113213. 10.1016/j.eswa.2020.113213
72
ReedW. J.HughesB. D. (2002). From gene families and genera to incomes and internet file sizes: Why power laws are so common in nature. Phys. Rev. E66:067103. 10.1103/PhysRevE.66.067103
73
SekizawaK.JiaY. X.EbiharaT.HiroseY.HirayamaY.SasakiH. (1996). Role of substance p in cough. Pulmon. Pharmacol. 9, 323–328. 10.1006/pulp.1996.0042
74
SeveriniC.PetrellaC.CalissanoP. (2016). Substance p and Alzheimer's disease: emerging novel roles. Curr. Alzheimer Res. 13, 964–972. 10.2174/1567205013666160401114039
75
ShawL. M.VandersticheleH.Knapik-CzajkaM.ClarkC. M.AisenP. S.PetersenR. C.et al. (2009). Cerebrospinal fluid biomarker signature in Alzheimer's disease neuroimaging initiative subjects. Ann. Neurol. 65, 403–413. 10.1002/ana.21610
76
SmallB. J.FratiglioniL.ViitanenM.WinbladB.BäckmanL. (2000). The course of cognitive impairment in preclinical Alzheimer disease: three-and 6-year follow-up of a population-based sample. Arch. Neurol. 57, 839–844. 10.1001/archneur.57.6.839
77
SongX.MitnitskiA.RockwoodK. (2011). Nontraditional risk factors combine to predict Alzheimer disease and dementia. Neurology77, 227–234. 10.1212/WNL.0b013e318225c6bc
78
SubiranaB. (2020). Call for a wake standard for artificial intelligence. Commun. ACM63, 32–35. 10.1145/3402193
79
SubiranaB.BagiatiA.SarmaS. (2017a). On the Forgetting of College Academics: at “Ebbinghaus Speed”? Technical Report 68, MIT Center for Brains, Minds and Machines (CBMM). 10.21125/edulearn.2017.0672
80
SubiranaB.BivingsR.SarmaS. (2020a). Wake neutrality of artificial intelligence devices, in Algorithms and Law, eds EbersM.NavasS. (Cambridge University Press). 10.1017/9781108347846.010
81
SubiranaB.HuetoF.RajasekaranP.LaguartaJ.PuigS.MalvehyJ.et al. (2020b). Hi Sigma, do I have the coronavirus?: call for a new artificial intelligence approach to support health care professionals dealing with the COVID-19 pandemic. arXiv preprint arXiv:2004.06510.
82
SubiranaB.SarmaS.CantwellR.StineJ.TaylorM.JacobsK.et al. (2017b). Time to Talk: The Future for Brands is Conversational. Technical report, MIT Auto-ID Laboratory and Cap Gemini.
83
SyedM. S. S.SyedZ. S.LechM.PirogovaE. (2020). Automated screening for Alzheimer's dementia through spontaneous Speech, in INTERSPEECH 2020 Conference (Shanghai), 2222–2226. 10.21437/Interspeech.2020-3158
84
The Center for Brains Minds & Machines. (2020). Modules. Available online at: https://cbmm.mit.edu/research/modules (accessed April 14, 2020).
85
WirthsO.BayerT. (2008). Motor impairment in Alzheimer's disease and transgenic Alzheimer's disease mouse models. Genes Brain Behav. 7, 1–5. 10.1111/j.1601-183X.2007.00373.x
86
WiseJ. (2016). Dementia and flu are blamed for increase in deaths in 2015 in England and wales. BMJ353:i2022. 10.1136/bmj.i2022
87
WonH.-K.YoonS.-J.SongW.-J. (2018). The double-sidedness of cough in the elderly. Respir. Physiol. Neurobiol. 257, 65–69. 10.1016/j.resp.2018.01.009
88
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Philip, S. Y. (2020). A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32, 4–24. 10.1109/TNNLS.2020.2978386
89
YuY.TravaglioM.PopovicR.LealN. S.MartinsL. M. (2020). Alzheimer's and Parkinson's diseases predict different COVID-19 outcomes, a UK biobank study. medRxiv. 1–16. 10.1101/2020.11.05.20226605
90
YuanJ.BianY.CaiX.HuangJ.YeZ.ChurchK. (2020). Disfluencies and fine-tuning pre-trained language models for detection of Alzheimer's disease, in INTERSPEECH 2020 Conference (Shanghai), 2162–2166. 10.21437/Interspeech.2020-2516
91
ZetterbergH. (2019). Blood-based biomarkers for Alzheimer's disease–an update. J. Neurosci. Methods319, 2–6. 10.1016/j.jneumeth.2018.10.025
92
ZunicA.CorcoranP.SpasicI. (2020). Sentiment analysis in health and well-being: systematic review. JMIR Med. Inform. 8:e16023. 10.2196/16023
Summary
Keywords
multimodal deep learning, transfer learning, explainable speech recognition, brain model, graph neural-networks, AI diagnostics
Citation
Laguarta J and Subirana B (2021) Longitudinal Speech Biomarkers for Automated Alzheimer's Detection. Front. Comput. Sci. 3:624694. doi: 10.3389/fcomp.2021.624694
Received
01 November 2020
Accepted
25 February 2021
Published
08 April 2021
Volume
3 - 2021
Edited by
Saturnino Luz, University of Edinburgh, United Kingdom
Reviewed by
Mary Jean Amon, University of Central Florida, United States; Andrea Seveso, University of Milano-Bicocca, Italy
Copyright
© 2021 Laguarta and Subirana.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Brian Subirana subirana@mit.edu
This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Computer Science
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.