AUTHOR=Cho Sunghye, Nevler Naomi, Parjane Natalia, Cieri Christopher, Liberman Mark, Grossman Murray, Cousins Katheryn A. Q. TITLE=Automated Analysis of Digitized Letter Fluency Data JOURNAL=Frontiers in Psychology VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2021.654214 DOI=10.3389/fpsyg.2021.654214 ISSN=1664-1078 ABSTRACT=The letter-guided naming fluency task is a measure of an individual’s executive function and working memory. This study employed a novel, fully automated, quantifiable, and reproducible method to investigate how language characteristics of words produced during an F-letter fluency task relate to fluency performance and inter-word response time (RT), and how they change over the course of the task, using digitized F-letter-guided fluency recordings produced by 76 young healthy participants. Our automated algorithm counted the number of correct responses from the transcripts of the F-letter fluency data, and individual words were rated for concreteness, ambiguity, frequency, familiarity, and age of acquisition (AoA). With a forced-aligner, the transcripts were automatically aligned with the corresponding audio recordings. We measured inter-word RT, word duration, and word start time from the forced alignments. Articulation rate was also computed. Phonetic distance between two consecutive F-letter words was measured as a cumulative distance of the first 13 mel-frequency cepstral coefficients that were obtained from the speech signals of the two words. Semantic distance was calculated as the Euclidean distance between the vector representations of two consecutive F-letter words. We found that total F-letter score was significantly correlated with the mean values of word frequency, familiarity, AoA, word duration, phonetic similarity, and articulation rate; total score was also correlated with an individual’s standard deviation of AoA, familiarity, and phonetic similarity. 
RT was negatively correlated with frequency and ambiguity of F-letter words, and was positively correlated with AoA, number of phonemes, and phonetic and semantic distance. Lastly, the frequency, ambiguity, AoA, number of phonemes, and semantic distance of words produced significantly changed over time during the task. The automated method employed in this paper demonstrates the successful implementation of our automated language processing pipelines in a standardized neuropsychological task, extending the application of our algorithm beyond our prior analyses of semi-structured speech samples collected during a picture description task. This novel approach captures subtle and rich language characteristics during test performance that enhance informativeness and cannot be extracted manually without massive effort. This work will serve as the reference for letter-guided category fluency production similarly acquired in neurodegenerative patients.
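Two of the measures described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the alignment format `(word, start_sec, end_sec)`, the function names, and the reading of inter-word RT as the silent gap between the end of one word and the start of the next are all assumptions made for illustration. Only the use of Euclidean distance between word vectors comes from the abstract; the vectors themselves would come from some embedding model that the abstract does not name.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical alignment record: (word, start_sec, end_sec), sorted by time.
# A real forced aligner's output format will differ.
Alignment = Tuple[str, float, float]


def inter_word_rts(words: Sequence[Alignment]) -> List[float]:
    """Return the gap in seconds between each word's end and the next word's start.

    Assumption: inter-word RT is taken as end-to-start silence; the
    abstract does not spell out the exact definition.
    """
    return [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length word vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy data, invented purely to exercise the functions.
aligned = [("fish", 0.5, 1.0), ("fox", 2.5, 3.0), ("farm", 3.25, 3.75)]
rts = inter_word_rts(aligned)
dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

With N aligned words, `inter_word_rts` yields N - 1 values, one per pair of consecutive words, which matches the pairwise way the abstract describes phonetic and semantic distance.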