<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="editorial" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">780169</article-id>
<article-id pub-id-type="doi">10.3389/fcomp.2021.780169</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Editorial</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Editorial: Alzheimer&#x27;s Dementia Recognition through Spontaneous Speech</article-title>
<alt-title alt-title-type="left-running-head">Luz et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Editorial: Alzheimer&#x2019;s Recognition through Speech</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Luz</surname>
<given-names>Saturnino</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/141969/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Haider</surname>
<given-names>Fasih</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/779106/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>de la Fuente Garcia</surname>
<given-names>Sofia</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/932624/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fromm</surname>
<given-names>Davida</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/227222/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>MacWhinney</surname>
<given-names>Brian</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/10713/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Usher Institute, Edinburgh Medical School, The University of Edinburgh, <addr-line>Edinburgh</addr-line>, <country>United&#x20;Kingdom</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Department of Psychology, Carnegie Mellon University, <addr-line>Pittsburgh</addr-line>, <addr-line>PA</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited and reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/81342/overview">Anton&#x20;Nijholt</ext-link>, University of Twente, Netherlands</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Saturnino Luz, <email>s.luz@ed.ac.uk</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Computer Science</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>21</day>
<month>10</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>3</volume>
<elocation-id>780169</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>09</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Luz, Haider, de la Fuente Garcia, Fromm and MacWhinney.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Luz, Haider, de la Fuente Garcia, Fromm and MacWhinney</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<related-article id="RA1" related-article-type="commentary-article" xlink:href="https://www.frontiersin.org/researchtopic/13702" ext-link-type="uri">Editorial on the Research Topic <article-title>Alzheimer&#x27;s Dementia Recognition through Spontaneous Speech</article-title>
</related-article>
<kwd-group>
<kwd>Alzheimer&#x2019;s disease</kwd>
<kwd>signal processing (SP)</kwd>
<kwd>machine learning</kwd>
<kwd>speech processing and recognition</kwd>
<kwd>natural language processing (NLP)</kwd>
<kwd>computational paralinguistics</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<p>The need for inexpensive, safe, accurate and non-invasive biomarkers for Alzheimer&#x2019;s disease (AD) has motivated much current research (<xref ref-type="bibr" rid="B12">Mandell and Green, 2011</xref>). While diagnosis and evaluation of interventions are still primarily done through clinical assessment, &#x201c;digital biomarkers&#x201d; have attracted increasing interest. AI-enabled speech and language analysis has emerged as one such promising biomarker for the assessment of disease status (<xref ref-type="bibr" rid="B2">de la Fuente Garcia et&#x20;al., 2020</xref>).</p>
<p>While a number of studies have investigated speech and language features for the detection of AD and mild cognitive impairment (<xref ref-type="bibr" rid="B5">Fraser et&#x20;al., 2016</xref>), and proposed various signal processing and machine learning methods for this task (<xref ref-type="bibr" rid="B13">Petti et&#x20;al., 2020</xref>), the field still lacks balanced benchmark data against which different approaches can be systematically compared. This Research Topic addresses this issue by exploring the use of speech characteristics for AD recognition using balanced data and shared tasks, such as those provided by the ADReSS Challenges (<xref ref-type="bibr" rid="B10">Luz et&#x20;al., 2020</xref>, <xref ref-type="bibr" rid="B16">Luz et&#x20;al., 2021</xref>). These tasks have brought together groups working on this active area of research, providing the community with benchmarks for comparison of speech and language approaches to cognitive assessment. Reflecting the multidisciplinary character of the topic, the articles in this collection span three journals: Frontiers in Aging Neuroscience, Frontiers in Computer Science and Frontiers in Psychology.</p>
<p>Most papers in this Research Topic target two main tasks: AD classification, for distinguishing individuals with AD from healthy controls, and cognitive test score regression, to infer the patient&#x2019;s Mini-Mental State Examination (MMSE) score (<xref ref-type="bibr" rid="B4">Folstein et&#x20;al., 1975</xref>). Of the 20 papers published in this collection, 14 used the ADReSS dataset (<xref ref-type="bibr" rid="B10">Luz et&#x20;al., 2020</xref>), by itself or in combination with other data. The ADReSS dataset is a curated subset of DementiaBank&#x2019;s Pitt Corpus, matched for age and gender so as to minimise the risk of bias in the prediction tasks. The data consist of audio recordings of picture descriptions elicited from participants using the Cookie Theft picture from the Boston Diagnostic Aphasia Examination (<xref ref-type="bibr" rid="B1">Becker et&#x20;al., 1994</xref>; <xref ref-type="bibr" rid="B6">Goodglass et&#x20;al., 2001</xref>), transcribed and annotated using the CHAT coding system (<xref ref-type="bibr" rid="B11">MacWhinney, 2021</xref>). The papers covered a variety of approaches and models.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2020.607449">Antonsson et&#x20;al.</ext-link> aimed to distinguish progressive cognitive decline from stable cognitive impairment using semantic analysis of a discourse task. Support Vector Machine (SVM) models performed best (AUC &#x3d; 0.93) with both semantic verbal fluency scores and disfluency features from the discourse task. Discourse analysis revealed significantly greater use of unrelated speech in the progressive cognitive decline group compared with the stable group and healthy controls&#x20;(HC).</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.634360">Clarke et&#x20;al.</ext-link> examined the impact of five different speech tasks (picture description, conversation, overlearned narrative recall, procedural recall, novel narrative retelling) on classification of 50 participants: 25 HC, 13 mild AD, 12 MCI. Linguistic features (<italic>n</italic>&#x20;&#x3d; 286) were automatically extracted from each task and used to train SVMs. Classification accuracy varied across tasks (62&#x2013;78% for HC vs AD &#x2b; MCI, 59&#x2013;90% for HC vs AD, 50&#x2013;78% for HC vs MCI) as did which features were most important to the classification.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2021.635945">Balagopalan et&#x20;al.</ext-link> used linguistic and acoustic features derived from ADReSS speech and transcripts. They fine-tuned a pretrained BERT model (<xref ref-type="bibr" rid="B3">Devlin et&#x20;al., 2018</xref>) and compared its features to clinically interpretable language features. The BERT model outperformed the other feature sets, achieving an accuracy of 83.33% for AD classification. A ridge regressor with 25&#x20;pre-engineered features obtained a root mean squared error (RMSE) of 4.56 in MMSE prediction.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fpsyg.2020.623237">Chlasta and Wo&#x142;k</ext-link> used VGGish, a pretrained TensorFlow model for audio feature extraction, and DemCNN, a custom convolutional neural network (CNN) operating on raw waveforms, to model the acoustic characteristics of AD speech on the ADReSS dataset. DemCNN provided better results than VGGish (<xref ref-type="bibr" rid="B9">Hershey et&#x20;al., 2017</xref>) and achieved an accuracy of 62.5% using only the acoustic information.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2021.637404">De Looze et&#x20;al.</ext-link> combined structural MRI, neuropsychological testing and conversational features to explore temporal characteristics of speech in a collaborative referencing task. They investigated associations with cognitive function and volumetry in brain areas known to be affected by MCI and AD. A linear mixed-effect model was built for data of 32 individuals to assess the predictive power of conversational speech features to classify clinical groups. They found that slower speech and slower turn-taking may provide useful markers for early detection of cognitive decline.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.642517">Guo et&#x20;al.</ext-link> emphasized the importance of large normative datasets in training accurate and reliable machine learning models for dementia detection. They incorporated a new corpus of Cookie Theft picture descriptions (HC &#x3d; 839, NC &#x3d; 115) from the Wisconsin Longitudinal Study (<xref ref-type="bibr" rid="B8">Herd et&#x20;al., 2014</xref>) to train a BERT model and demonstrated improved performance on the detection task compared with results of the model trained on the ADReSS data alone (82.1 vs 79.8% accuracy, and 92.3 vs 88.3%&#x20;AUC).</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fpsyg.2020.624137">Haulcy and Glass</ext-link> investigated the use of i-vectors and x-vectors (<xref ref-type="bibr" rid="B15">Snyder et&#x20;al., 2018</xref>), which are acoustic features originally devised for speaker identification, and linguistic features to tackle AD detection and MMSE prediction. The i-vectors and x-vectors were pre-trained on existing datasets unrelated to AD as well as in-domain data. Several classification and regression models were tested, yielding 85.4% accuracy in AD detection with SVM and Random Forests, and 4.56 RMSE with a gradient boosting regressor. Linguistic and acoustic features were modelled separately. The former yielded better performance. The authors speculate that the poor performance of i-vectors and x-vectors was due to in- and out-of-domain training data mismatch.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.642633">Jonell et&#x20;al.</ext-link> proposed a multimodal analysis of patient behavior to improve early detection of dementia. Their system captured data from clinical interviews using nine different sensor devices which recorded speech, language, facial gestures, motor signs, gaze, pupil dilation, heart rate variability and thermal emission. This information was gathered from 25 patients with AD and later combined with brain scans, psychological tests, speech therapist assessments and other clinical data. They found that multimodality, in combination with the more established biomarkers, improves clinical discrimination.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.624694">Laguarta and Subirana</ext-link> presented an approach to the identification of different diseases that combines multiple biomarker features, including vocal cord, sentiment, and lung and respiratory tract features, among others. The authors employed transfer learning from other (non-AD) audio datasets to learn these features. The resulting model achieved up to 93% accuracy on the ADReSS dataset. Interestingly, the respiratory tract features, which had previously been used in the detection of COVID-19 from a cough dataset, also proved helpful in AD detection.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2021.642033">Lindsay et&#x20;al.</ext-link> investigated spontaneous speech of 78 HC and 76 AD individuals in English and French, proposing a multilingual model. Task-specific, semantic, syntactic and paralinguistic features were analysed. They found that language features, excluding task specific features, represent &#x201c;generalisable&#x201d; signs for cognitive language impairment in AD, outperforming all other feature sets. Semantic features were the most generalizable, with paralinguistic features showing no overlap between languages.</p>
<p>The work of <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2021.623607">Mahajan and Baths</ext-link> tested several acoustic and linguistic models, comparing their performance on ADReSS and a larger subset of DementiaBank. They employed a deep learning bimodal model to combine these features. For linguistic models, accuracy was lower on ADReSS than on DementiaBank (73 vs 88%). The authors attribute this to the smaller size of ADReSS and to overfitting in DementiaBank due to repeated samples from the same participant. Although the best linguistic model performed similarly to the bimodal learner, the authors suggest a number of possible improvements.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2021.642647">Martinc et&#x20;al.</ext-link> presented a multimodal approach to AD detection using ADReSS data. The Active Data Representation method (<xref ref-type="bibr" rid="B7">Haider et&#x20;al., 2020</xref>) was used for fusion of acoustic and textual features at sentence and word level, along with temporal aspects of linguistic features. They achieved an accuracy of 93.75% through late fusion of acoustic, text and temporal models.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.624558">Meghanani et&#x20;al.</ext-link> compared two approaches to the challenge tasks, both based on the manually created transcripts. Both methods relied on the extraction of n-grams of varying lengths (<italic>n</italic>&#x20;&#x3d; 2, 3, 4, and 5) from the transcripts. The first method employed CNNs with a single convolutional layer in which the kernel size was adapted to the n-gram size. The second method used the fastText model with bigrams and trigrams. The fastText models outperformed the CNN models, achieving 83.3% accuracy for classification and an RMSE of 4.87 for prediction of MMSE scores.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.649508">Millington and Luz</ext-link> approached the data representation problem in the ADReSS dataset by converting its text transcriptions into word co-occurrence graphs and computing several graph structure metrics. They found that AD graphs have lower heterogeneity and centralization, but higher edge density. These metrics were used as input features to standard machine learning classifiers and regressors. A graph embedding metric was tested for comparison. Graph metrics outperformed graph embedding, achieving 66.7% accuracy in classification, and a 5.67 RMSE in MMSE regression.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.640669">Nasreen et&#x20;al.</ext-link> investigated the role of conversational features such as dysfluencies, pauses, overlaps and other interactional elements in AD detection. They used the Carolinas Conversations Collection (<xref ref-type="bibr" rid="B14">Pope and Davis, 2011</xref>) to create classification models based on those features. The combination of dysfluency and interactional features resulted in a classification accuracy of 90%. These findings in conversational speech seem to agree with the findings from other papers in this Research Topic, which highlighted the importance of pauses and dysfluency in detecting AD in the ADReSS monologue&#x20;data.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fnagi.2020.605317">Parvin et&#x20;al.</ext-link> performed a randomised controlled clinical trial to investigate the effects of dual-task training on 26 patients with AD. Patients performed physical, cognitive and mental assessments and had their brain oscillations measured pre- and post-intervention, which consisted of a 12-week visual training program. The trained group showed significant improvements in cognitive function, mood and fitness. These improvements were associated with significant positive changes in brain oscillations.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.624594">Sadeghian et&#x20;al.</ext-link> examined the potential of an almost fully automated system for AD detection. Rather than using DementiaBank, they collected 72 new samples (26 AD, 46 HC) with higher quality audio. ASR was performed on data with pauses removed using voice activity detection. From this, they extracted 236 textual features and then used a genetic algorithm as well as a Multi-Layer Perceptron to identify the 10 most useful features, achieving 94% accuracy in detection.</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.624659">Shah et&#x20;al.</ext-link> used speech samples from the DementiaBank database for binary classification and MMSE regression. Although they developed models that combined acoustic and language-based features, their best performing model for binary classification used language-based features only with regularized logistic regression, achieving 85.4% accuracy on a hold-out test set. A smaller set of language features yielded their best performing model for the regression task, with an RMSE of&#x20;5.62.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2020.624488">Yuan et&#x20;al.</ext-link> presented a method for encoding filled and unfilled pauses in transcripts to fine-tune language models based on BERT and ERNIE. The accuracy of dementia detection improved to 89.6% (with ERNIE). Compared with controls, the individuals with dementia vocalised the filled pause <italic>um</italic> much less frequently than <italic>uh</italic>, and their language samples included more pauses.
<p>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fcomp.2021.624683">Zhu et&#x20;al.</ext-link> used a transfer learning technique to fine-tune the last layers of a pretrained model with customized layers for AD detection, employing the MobileNet and YAMNet network architectures. They then used speech and text versions of BERT, individually and in combination, for the same task. The text models outperformed the speech models, with the version pretrained on the longest input frames achieving 89.58% accuracy. The models that combined audio and text data generally performed better than either modality alone.
<p>The studies in this Research Topic represent the state of the art in dementia detection, and contribute to the growing body of evidence supporting the use of machine learning and spoken language analysis for detecting cognitive decline.</p>
</body>
<back>
<sec id="s1">
<title>Author Contributions</title>
<p>All authors made substantial contributions to the work and approved this manuscript for publication.</p>
</sec>
<sec id="s2">
<title>Funding</title>
<p>This work was funded by the European Union&#x2019;s Horizon 2020 research and innovation programme, grant agreement No 769661 (SAAM project). The original acquisition of the DementiaBank data was supported by NIH grants AG005133 and AG003705 to the University of Pittsburgh.</p>
</sec>
<sec sec-type="COI-statement" id="s3">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s4">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Becker</surname>
<given-names>J.&#x20;T.</given-names>
</name>
<name>
<surname>Boller</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Saxton</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>McGonigle</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>1994</year>). <article-title>The Natural History of Alzheimer&#x27;s Disease</article-title>. <source>Arch. Neurol.</source> <volume>51</volume>, <fpage>585</fpage>&#x2013;<lpage>594</lpage>. <pub-id pub-id-type="doi">10.1001/archneur.1994.00540180063015</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de la Fuente Garcia</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ritchie</surname>
<given-names>C. W.</given-names>
</name>
<name>
<surname>Luz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer&#x27;s Disease: A Systematic Review</article-title>. <source>J.&#x20;Alzheimers Dis.</source> <volume>78</volume>, <fpage>1547</fpage>&#x2013;<lpage>1574</lpage>. <pub-id pub-id-type="doi">10.3233/JAD-200888</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Devlin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>M.-W.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Toutanova</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2018</year>). <source>Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding</source>. <publisher-name>arXiv</publisher-name>. <comment>[Preprint] arXiv:1810.04805</comment>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Folstein</surname>
<given-names>M. F.</given-names>
</name>
<name>
<surname>Folstein</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>McHugh</surname>
<given-names>P. R.</given-names>
</name>
</person-group> (<year>1975</year>). <article-title>&#x201C;Mini-mental State&#x201D;</article-title>. <source>J.&#x20;Psychiatr. Res.</source> <volume>12</volume>, <fpage>189</fpage>&#x2013;<lpage>198</lpage>. <pub-id pub-id-type="doi">10.1016/0022-3956(75)90026-6</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fraser</surname>
<given-names>K. C.</given-names>
</name>
<name>
<surname>Meltzer</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Rudzicz</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Linguistic Features Identify Alzheimer&#x27;s Disease in Narrative Speech</article-title>. <source>J.&#x20;Alzheimers Dis.</source> <volume>49</volume>, <fpage>407</fpage>&#x2013;<lpage>422</lpage>. <pub-id pub-id-type="doi">10.3233/JAD-150520</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goodglass</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kaplan</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Barresi</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2001</year>). <source>BDAE-3: Boston Diagnostic Aphasia Examination</source>. <edition>Third Edition</edition>. <publisher-loc>Philadelphia, PA</publisher-loc>: <publisher-name>Lippincott Williams &#x26; Wilkins</publisher-name>. </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haider</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>de la Fuente</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Luz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>An Assessment of Paralinguistic Acoustic Features for Detection of Alzheimer&#x27;s Dementia in Spontaneous Speech</article-title>. <source>IEEE J.&#x20;Sel. Top. Signal. Process.</source> <volume>14</volume>, <fpage>272</fpage>&#x2013;<lpage>281</lpage>. <pub-id pub-id-type="doi">10.1109/jstsp.2019.2955022</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Herd</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Carr</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Roan</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Cohort Profile: Wisconsin Longitudinal&#x20;Study (WLS)</article-title>. <source>Int. J.&#x20;Epidemiol.</source> <volume>43</volume>, <fpage>34</fpage>&#x2013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dys194</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hershey</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chaudhuri</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ellis</surname>
<given-names>D. P. W.</given-names>
</name>
<name>
<surname>Gemmeke</surname>
<given-names>J.&#x20;F.</given-names>
</name>
<name>
<surname>Jansen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>R. C.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). &#x201c;<article-title>Cnn Architectures for Large-Scale Audio Classification</article-title>,&#x201d; in <conf-name>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name>, <conf-loc>New Orleans, LA, United States</conf-loc>, <conf-date>March 5&#x2013;9, 2017</conf-date>, <fpage>131</fpage>&#x2013;<lpage>135</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2017.7952132</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Haider</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Fuente</surname>
<given-names>S. d. l.</given-names>
</name>
<name>
<surname>Fromm</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>MacWhinney</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Alzheimer&#x27;s Dementia Recognition through Spontaneous Speech: The ADReSS Challenge</article-title>. <source>Proc. Interspeech</source> <volume>2020</volume>, <fpage>2172</fpage>&#x2013;<lpage>2176</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2020-2571</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Haider</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>de la Fuente</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fromm</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>MacWhinney</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2021</year>). &#x201c;<article-title>Detecting Cognitive Decline Using Speech Only: The ADReSSo Challenge</article-title>,&#x201d; in <conf-name>Proceedings of Interspeech 2021</conf-name>, <conf-loc>Brno, Czechia</conf-loc>, <conf-date>August 30&#x2013;September 3, 2021</conf-date>, <fpage>3780</fpage>&#x2013;<lpage>3784</lpage>. <pub-id pub-id-type="doi">10.21437/Interspeech.2021-1220</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>MacWhinney</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2021</year>). <source>Tools for Analyzing Talk Part 1: The CHAT Transcription Format</source>. <publisher-loc>Pittsburgh, PA</publisher-loc>: <publisher-name>Carnegie Mellon University</publisher-name>. <comment>Technical Report</comment>. <pub-id pub-id-type="doi">10.21415/3mhn-0z89</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mandell</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2011</year>). &#x201c;<article-title>Alzheimer&#x2019;s Disease</article-title>,&#x201d; in <source>Handbook of Alzheimer&#x2019;s Disease</source>. Editors <person-group person-group-type="editor">
<name>
<surname>Budson</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Kowall</surname>
<given-names>N. W.</given-names>
</name>
</person-group> (<publisher-loc>Malden, MA</publisher-loc>: <publisher-name>John Wiley &#x26; Sons</publisher-name>), <fpage>4</fpage>&#x2013;<lpage>91</lpage>. <comment>chap. 1</comment>. <pub-id pub-id-type="doi">10.1002/9781444344110.ch1</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Petti</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Korhonen</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>A Systematic Literature Review of Automatic Alzheimer&#x27;s Disease Detection from Speech and Language</article-title>. <source>J.&#x20;Am. Med. Inform. Assoc.</source> <volume>27</volume>, <fpage>1784</fpage>&#x2013;<lpage>1797</lpage>. <pub-id pub-id-type="doi">10.1093/jamia/ocaa174</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pope</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>B. H.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Finding a Balance: The Carolinas Conversation Collection</article-title>. <source>Corpus Linguist. Linguist. Theory</source> <volume>7</volume> (<issue>1</issue>), <fpage>143</fpage>&#x2013;<lpage>161</lpage>. </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Snyder</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Garcia-Romero</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sell</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Povey</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Khudanpur</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>X-vectors: Robust DNN Embeddings for Speaker Recognition</article-title>,&#x201d; in <conf-name>Procs IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name> (<publisher-name>IEEE</publisher-name>), <fpage>5329</fpage>&#x2013;<lpage>5333</lpage>. <pub-id pub-id-type="doi">10.1109/icassp.2018.8461375</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>