Language uncovers visuospatial dysfunction in posterior cortical atrophy: a natural language processing approach

Introduction: Posterior Cortical Atrophy (PCA) is a syndrome characterized by a progressive decline in higher-order visuospatial processing, leading to symptoms such as space perception deficit, simultanagnosia, and object perception impairment. While PCA is primarily known for its impact on visuospatial abilities, recent studies have documented language abnormalities in PCA patients. This study aims to delineate the nature and origin of language impairments in PCA, hypothesizing that language deficits reflect the visuospatial processing impairments of the disease.

Methods: We compared the language samples of 25 patients with PCA with age-matched cognitively normal (CN) individuals across two distinct tasks: a visually-dependent picture description and a visually-independent job description task. We extracted word frequency, word utterance latency, and spatial relational words for this comparison. We then conducted an in-depth analysis of the language used in the picture description task to identify specific linguistic indicators that reflect the visuospatial processing deficits of PCA.

Results: Patients with PCA showed significant language deficits in the visually-dependent task, characterized by higher word frequency, prolonged utterance latency, and fewer spatial relational words, but not in the visually-independent task. An in-depth analysis of the picture description task further showed that PCA patients struggled to identify certain visual elements as well as the overall theme of the picture. A predictive model based on these language features distinguished PCA patients from CN individuals with high classification accuracy.

Discussion: The findings indicate that language is a sensitive behavioral construct to detect visuospatial processing abnormalities of PCA. These insights offer theoretical and clinical avenues for understanding and managing PCA, underscoring language as a crucial marker for the visuospatial deficits of this atypical variant of Alzheimer’s disease.


Introduction
Posterior cortical atrophy (PCA) is a clinico-radiological syndrome characterized by a progressive decline in higher-order visuospatial processing with relative preservation in other cognitive domains at initial presentation (Benson et al., 1988; Renner et al., 2004; Crutch et al., 2017). Common visuospatial symptoms of the syndrome include impaired object and space perception, simultanagnosia, environmental agnosia, and visual field defects (Crutch et al., 2017). From the neuroimaging perspective, the syndrome is associated with atrophy, hypometabolism, and usually tau deposition in posterior parietal, occipital, and temporo-occipital cortices (Whitwell et al., 2007; Lehmann et al., 2011). As the majority of PCA cases are due to underlying Alzheimer's pathology, PCA is also referred to as the visual variant of Alzheimer's disease (AD) (Levine et al., 1993; Alladi et al., 2007). While the diagnostic criteria for PCA indicate preserved functions in cognitive domains outside of visuospatial processing at symptom onset, a growing literature has documented language abnormalities in PCA emerging early in the course of the illness. Specifically, impaired category fluency and confrontation naming have been documented on formal neuropsychological assessments (Tang-Wai et al., 2004; McMonagle et al., 2006; Putcha et al., 2018, 2020). Language abnormalities are also evident during spontaneous speech, such as using higher frequency words and slowed speech rate (number of words per minute) (Crutch et al., 2013). These emerging observations suggest that there is still much to be understood about the nature and origin of language impairments in PCA.
The specific types of language abnormalities observed in PCA may be related to dysfunction of the network that supports lexicosemantic retrieval, as has been previously postulated across the phenotypic spectrum of AD (Putcha et al., 2020). Another possible explanation is that the language abnormalities observed in PCA may stem from the visuospatial impairments central to the syndrome, rather than representing a primary language deficit. A large body of research supports close relationships between the visual processing of objects and the amodal semantic processing required for retrieving the names of those objects (Binder and Desai, 2011; Huth et al., 2016; Aliko et al., 2023). Recently, it has been shown that anterior to each region that is selective for the visual processing of a given category in the visual cortex, there is a corresponding area selective to its linguistic processing (Popham et al., 2021). This anatomical and functional configuration suggests that the anterior border of the visual cortex acts as a convergence zone where information from the unimodal visual system enters the amodal linguistic systems involved in linguistic retrieval. Therefore, the successful production of a word that has visual attributes requires intact visual processing, which provides the information needed to retrieve its corresponding linguistic representation (i.e., its name). Consequently, deficits in visual processing would theoretically impede the production of words with visual attributes. If the pathophysiology of language abnormalities in PCA involves disrupted visual processing, then it stands to reason that tasks heavily dependent on visual processing will exhibit significant language impairments. Conversely, tasks with minimal reliance on visual input should result in relatively intact language performance.
In the current study, we sought to test this hypothesis by contrasting the language used in two different speech samples as PCA participants described the Picnic scene from the Western Aphasia Battery (Kertesz et al., 2007) (visually dependent) and their prior jobs (visually independent). In each speech sample, we measured word frequency, word utterance latency, and the use of spatial relational words. For the picture description task, we hypothesized that PCA patients would use higher frequency words (e.g., replacing specific names of pictured items with superordinate words potentially including "thing"), have increased word utterance latency due to object recognition difficulty, and use fewer spatial relational words such as "into" or "underneath" compared to healthy individuals. For the non-visually dependent job description task, we expected these linguistic features to be comparable between PCA patients and healthy individuals.
Building on the hypothesis that speech patterns in visually dependent tasks reflect visuospatial processing deficits, we next sought to identify linguistic markers of these challenges. Specifically, we investigated which elements in the picnic scene presented particular retrieval difficulties and whether PCA patients could intuitively grasp and articulate the overall theme of the scene, such as using the word "picnic." Due to the difficulty with visually integrating a scene (simultanagnosia) (Tang-Wai et al., 2004; Singh et al., 2015; Cui et al., 2022), we hypothesized that PCA patients would be less likely to verbalize the term "picnic" compared to healthy individuals. Lastly, to address the clinical significance of this work, we used the language features derived from the picture description task to develop a classifier aimed at distinguishing PCA patients from healthy individuals and hypothesized a high degree of classification accuracy.

PCA patients
Twenty-five patients diagnosed with PCA were recruited from the Massachusetts General Hospital (MGH) Frontotemporal Disorders Unit PCA program for this study (Wong et al., 2019). All but one were confirmed amyloid positive (A+) and tau positive (T+) by either CSF analysis or amyloid and tau PET. The remaining participant's biomarker status is unknown due to a failed lumbar puncture. Each patient had posterior cortical atrophy and/or hypometabolism (see Putcha et al., 2019 for the atrophy map of our PCA cohort), consistent with the typical neurodegeneration (N+) of PCA. All participants received a standard clinical evaluation comprising a structured history obtained from both participant and informant, comprehensive neurological and psychiatric history, as well as neuropsychological assessment. See Table 1 for neuropsychological profiles of the PCA cohort included in this study. Clinicians determined the impaired performance of PCA patients using the available normative data on these tests (Humphreys and Riddoch, 1993; Delis et al., 2000; Stern and White, 2003; Herrera-Guzmán et al., 2004; Strauss et al., 2006; Shirk et al., 2011; Weintraub et al., 2018). The clinical formulation was performed through a consensus conference by our multidisciplinary team of neurologists, psychiatrists, neuropsychologists, and speech and language pathologists, with each patient classified based on all available clinical information as having a 3-step diagnostic formulation of mild cognitive impairment or dementia […] (Rabinovici et al., 2010; Villeneuve et al., 2015; Cho et al., 2023). Determination of tau positivity and neurodegeneration was conducted by visual read using internal methods similar to published work (Rabinovici et al., 2011; Fleisher et al., 2020; Sonni et al., 2020). This work was carried out according to The Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans. All participants and their caregivers provided informed consent in accordance with the protocol approved by the Mass General Brigham Human Research Committee Institutional Review Board in Boston, Massachusetts. A speech sample for the picture description task was acquired from all 25 PCA participants. Twenty-one PCA participants also took part in the job description task.

Cognitively normal individuals
Twenty-nine cognitively normal (CN) participants (CN1) were enrolled through the Speech and Feeding Disorders Laboratory at the MGH Institute of Health Professions to participate in the picture description task. These participants passed a cognitive screen, were native English speakers, and had no history of neurologic injury or developmental speech/language disorders. Twenty-two CN participants (CN2) were additionally recruited through Amazon's Mechanical Turk (MTurk) to describe their jobs. MTurk participants filled out the short and validated version of the 12-item Everyday Cognition questionnaire, a questionnaire designed to detect cognitive and functional decline (Tomaszewski Farias et al., 2011). Only language samples from participants who were native English speakers with no self-reported history of brain injury or speech/language disorder, either developmental or acquired, were included in the analyses. Table 2 compares demographic data across patients with PCA and CN groups. There was no statistical difference in any demographic data between CN1 and CN2. CN groups were matched to PCA patients with respect to age, gender, and handedness. We included healthy individuals if they had at least 12 years of education, which resulted in PCA participants having higher average years of education than CN1 [t(33.47) = −3.07, p = 0.004] and CN2 [t(39.13) = −3.42, p = 0.002]. To ensure that the difference in years of education did not confound our results of language analyses, we conducted two additional analyses as presented in the Supplementary materials. We found no correlations between the language variables of interest and years of education. Furthermore, we observed findings similar to our main results when we repeated the analyses in a subgroup of PCA patients and CN participants with matched years of education (see Supplementary materials).

Speech samples and data analysis
Speech samples were collected under two conditions. For the visually dependent task, participants described the Western Aphasia Battery-Revised (WAB-R) (Kertesz et al., 2007) Picnic Scene with the instruction to use full sentences. For the visually independent task, participants were asked to describe what they did for work. There were no time limits applied to either task. Speech was transcribed automatically using the Google Cloud Speech-to-Text API (Cloud Speech-to-Text API v1 - Package cloud, n.d.), and the transcripts were manually verified by research staff blinded to the diagnosis.

Speech sample analysis
All feature extraction was performed automatically using Quantitext, a fully automated speech and language analysis toolbox we developed in the Frontotemporal Disorders Unit of Massachusetts General Hospital. The program provides an objective assessment of language to enhance the precision of clinical evaluations as described previously (Rezaii et al., 2022). The program receives audio samples from participants and employs automated techniques for transcription, such as the Google API, ensuring reliable transcription accuracy. The toolbox uses a variety of software packages, such as Stanza (Qi et al., 2020) and Librosa (McFee et al., 2015), to conduct speech and language analysis.

Word frequency
To measure word frequency, we used the Switchboard corpus (Godfrey et al., 1992), which consists of spontaneous telephone conversations averaging 6 min in length spoken by more than 500 speakers of both sexes from a variety of American English dialects. We used this corpus to estimate word frequency in spoken English, independently of the patient and control sample. The corpus contains 2,345,269 words. Here, word frequency denotes the log frequency of content words (nouns, verbs, adjectives, and adverbs).
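The word-frequency measure can be illustrated with a minimal Python sketch. It assumes that tokens have already been part-of-speech tagged (for example with universal POS tags) and that reference counts come from a corpus such as Switchboard. The function name, the use of the natural logarithm of raw counts, and the choice to skip words absent from the reference corpus are assumptions made for illustration; the source does not specify them, and this is not the Quantitext implementation.

```python
import math

# Universal POS tags treated as content words (nouns, verbs, adjectives, adverbs).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def mean_log_frequency(tagged_tokens, corpus_counts):
    """Mean log frequency of the content words in one speech sample.

    tagged_tokens: list of (word, upos) pairs.
    corpus_counts: dict mapping lowercase word -> count in the reference corpus.
    Assumptions: natural log of the raw count; words missing from the
    reference corpus are skipped rather than smoothed.
    """
    logs = []
    for word, pos in tagged_tokens:
        count = corpus_counts.get(word.lower(), 0)
        if pos in CONTENT_POS and count > 0:
            logs.append(math.log(count))
    return sum(logs) / len(logs) if logs else float("nan")


# Toy counts invented for illustration only (not Switchboard values).
toy_counts = {"thing": 1000, "basket": 10, "the": 50000}
generic = mean_log_frequency([("the", "DET"), ("thing", "NOUN")], toy_counts)
specific = mean_log_frequency([("the", "DET"), ("basket", "NOUN")], toy_counts)
```

With these toy counts, a description that substitutes "thing" for "basket" receives a higher mean log frequency, which is the direction of effect hypothesized for PCA in the picture description task.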

Word utterance latency and articulation rate
Our analysis employed the Google Cloud Speech-to-Text API to ascertain word timestamps, pinpointing the onset and offset for each spoken word within the audio recordings. Speech rate, often quantified as the number of words spoken per minute, can vary based on factors such as word utterance latency and the individual articulation rate of each word. To ensure a more granular and accurate interpretation of the underlying phenomena, we sidestepped aggregated metrics like speech rate, focusing instead on separately evaluating its constituent components. Word utterance latency is defined as the time interval preceding the articulation of a word. This measure was computed for every word except the very first in each sample, as the time to start the description task depends on multiple factors. Articulation rate measures the number of syllables per second (Cordella et al., 2019).
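The two timing measures can be sketched from word-level timestamps as follows. This is an illustrative sketch under stated assumptions: latency is taken as the gap between the offset of the previous word and the onset of the current word, and articulation rate is taken as syllables divided by the summed word durations. The source does not give these exact formulas, and the function names and toy timestamps are hypothetical.

```python
def utterance_latencies(words):
    """Latency before each word except the first.

    words: list of (text, onset_sec, offset_sec) in spoken order.
    Assumption: latency = onset of the current word minus offset of the
    previous word. The first word is skipped, as in the text.
    """
    return [cur[1] - prev[2] for prev, cur in zip(words, words[1:])]


def articulation_rate(words, syllable_counts):
    """Syllables per second of articulated speech.

    Assumption: the denominator is the summed duration of the words
    themselves, so pauses between words do not lower the rate.
    """
    spoken = sum(offset - onset for _, onset, offset in words)
    return sum(syllable_counts) / spoken if spoken > 0 else float("nan")


# Toy timestamps invented for illustration (seconds).
toy_words = [("the", 0.5, 0.7), ("girl", 0.9, 1.3), ("is", 2.3, 2.4)]
latencies = utterance_latencies(toy_words)
mean_latency = sum(latencies) / len(latencies)
rate = articulation_rate(toy_words, [1, 1, 1])
```

Separating the two quantities this way means a long pause before "is" raises the mean latency without changing the articulation rate, which mirrors the rationale given above for not using an aggregate speech rate.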

Spatial relational words
Relational words are automatically tagged by Stanza with the dependency relation "case." Most relational words are spatial, for example, the word "under" in the sentence "I found the gem under my bed." In our analysis, we divided the number of relational words by the total number of words.
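The relational-word ratio can be sketched as below. To keep the example self-contained, the dependency labels for the example sentence are written by hand following Universal Dependencies conventions rather than produced by running Stanza, and excluding punctuation from the word total is an assumption; the source only says that relational words were divided by total words.

```python
def relational_word_ratio(parsed_tokens):
    """Share of words whose dependency relation is "case".

    parsed_tokens: list of (text, deprel) pairs for one sample.
    Assumption: punctuation tokens are excluded from the word total.
    """
    words = [(text, rel) for text, rel in parsed_tokens if rel != "punct"]
    if not words:
        return 0.0
    return sum(1 for _, rel in words if rel == "case") / len(words)


# Hand-written parse of the example sentence from the text.
example = [
    ("I", "nsubj"), ("found", "root"), ("the", "det"), ("gem", "obj"),
    ("under", "case"), ("my", "nmod:poss"), ("bed", "obl"), (".", "punct"),
]
ratio = relational_word_ratio(example)  # one "case" word out of seven words
```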

Content units
To determine which items within the picture posed greater challenges for PCA compared to CN participants, we coded the visual items using content units. Content units are words with correct information units that are intelligible in context and accurate about the picture or topic. Words do not have to be used in a grammatically correct manner to be counted as content units (Nicholas and Brookshire, 1993). Each content unit is only counted once, regardless of how many times it is mentioned in a sample. Morphological variants and synonymous labels were grouped within a single content unit. For example, the nouns "girl" and "daughter" are considered the same content unit. Therefore, if one participant used both words (girl and daughter), they would only be counted as one content unit. To specify content units, Quantitext first generates a Python dictionary using a predefined set of words as previously described and then uses this dictionary to automatically identify all content units in new texts it receives. Previously, we showed that the program has an accuracy of 99.7% in identifying content units (Josephy-Hernandez et al., 2023).
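A dictionary-based lookup of this kind can be sketched in a few lines. The lexicon below is a hypothetical five-entry stand-in built from words mentioned in this article; the actual Quantitext dictionary is not reproduced in the source, and the simple regular-expression tokenizer is likewise an assumption.

```python
import re

# Hypothetical mini-lexicon: surface word -> canonical content unit.
# Variants such as "girl" and "daughter" map to the same unit.
CONTENT_UNIT_LEXICON = {
    "girl": "girl",
    "daughter": "girl",
    "fisherman": "fisherman",
    "basket": "basket",
    "cloud": "clouds",
    "clouds": "clouds",
}


def content_units(text, lexicon=CONTENT_UNIT_LEXICON):
    """Return the set of distinct content units mentioned in a transcript.

    Using a set means each unit is counted once, no matter how many times
    (or through how many variants) it is mentioned.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return {lexicon[token] for token in tokens if token in lexicon}


sample = "The girl is by the basket. The daughter looks at the clouds, and the clouds move."
units = content_units(sample)
n_units = len(units)
```

Here "girl" and "daughter" collapse into one unit and the repeated "clouds" is counted once, so the sample yields three content units.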

Statistical analysis
We used Welch two-sample t-tests to compare the language features across the two groups. We performed point-biserial correlation analysis on our dataset to investigate the relationships between the likelihood of reporting each content unit and the group designation (with patients with PCA coded as 1 and healthy controls coded as 0). We applied Bonferroni's correction to account for multiple comparisons, setting the significance threshold at 0.0016 (Bonferroni, 1935). For classification, we used a binary logistic regression model. We employed a leave-one-out cross-validation (LOOCV) approach on our dataset to validate the model's performance. In each iteration of the LOOCV, a single observation was set aside as the test data, and the remaining observations were used to train the model.
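As a worked illustration of the Welch test, the sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom from summary statistics. The inputs are the content-unit means, standard deviations, and group sizes reported in this article (29 CN1 and 25 PCA participants); the function name is hypothetical. The Bonferroni line assumes that the 0.0016 threshold corresponds to 0.05 divided by the 32 content units of the picture, which matches the reported value but is not stated explicitly in the source.

```python
from math import sqrt


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from group summaries."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Total content units: CN (16.28 ± 4.41, n = 29) vs. PCA (7.20 ± 5.63, n = 25).
t_stat, df = welch_t_from_summary(16.28, 4.41, 29, 7.20, 5.63, 25)

# Assumed derivation of the corrected threshold: alpha / number of content units.
bonferroni_threshold = 0.05 / 32
```

Evaluated on the rounded summary values, this gives t of about 6.52 with roughly 45 degrees of freedom, in line with the reported t(45.21) = 6.52, and a corrected threshold of 0.0015625, which rounds to 0.0016.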

Language abnormalities in PCA are observed during picture description but not job description
We first compared the speech samples of PCA participants describing the WAB Picnic Scene to the CN1 group to determine language abnormalities in this task. We also compared the speech sample from the job description task between the PCA and CN2 groups (see Table 3). We used Welch two-sample t-tests to compare the means of the following features across the two groups (Figure 1).

Picture description task
Patients with PCA used higher frequency words (i.e., more commonly used words) (mean = 6.50 ± 0.53) compared to healthy […] There was a trend toward a slower articulation rate in PCA patients (mean = 2.92 ± 0.60) compared to healthy controls (mean = 3.14 ± 0.35) [t(33.88) = 1.616, p = 0.115], suggesting that slower speech rate may be primarily due to an increased word utterance latency rather than articulation rate.

Specific language indicators of visuospatial processing deficits of PCA can be extracted from the picture description task
We next probed the samples obtained from the picture description task to extract the specific language features that reflect visuospatial impairment in PCA compared to healthy controls. First, we determined the likelihood of mentioning each content unit by each diagnostic group. The picture consists of 32 content units, as shown in Figure 2. We performed point-biserial correlation analysis on our dataset to investigate the relationships between the likelihood of reporting each content unit and the group designation. As shown in Figure 2, we did not observe a uniform reduction in the likelihood of mentioning each content unit in PCA. Instead, certain content units had a much lower chance of being verbalized. Of all content units, "fisherman," a small, central feature of the WAB Picnic scene, was the least likely to be mentioned by a patient with PCA compared with healthy controls (r = −0.85, p < 0.001). A few content units had a numerically higher, though not statistically significant, likelihood of being mentioned by patients with PCA compared to healthy individuals, such as "clouds" (r = 0.18, p = 0.186). Figure 3 is the artistic rendering we developed to show the rate at which patients with PCA mention each content unit. We then compared the total number of content units retrieved across the two groups. Overall, PCA patients retrieved fewer content units (mean = 7.20 ± 5.63) compared to healthy individuals (mean = 16.28 ± 4.41) [t(45.21) = 6.52, p < 0.001]. Lastly, patients with PCA had a lower likelihood of reporting the overall theme of the picture (i.e., mentioning the word "picnic") (mean = 0.16 ± 0.37) than healthy individuals did (mean = 0.90 ± 0.31) [t(46.77), p < 0.001].
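The point-biserial correlation for a single content unit can be sketched as a Pearson correlation between the group code (PCA = 1, CN = 0) and a binary mention indicator. The counts used below for "fisherman" (2 of 25 PCA patients and 27 of 29 healthy participants) are not given in the source; they are inferred from the 8% and 93% mention rates reported in the Discussion and should be read as an assumption. The function name is hypothetical.

```python
from math import sqrt


def point_biserial(group, mentioned):
    """Pearson correlation between a binary group code and a binary outcome."""
    n = len(group)
    mean_g = sum(group) / n
    mean_m = sum(mentioned) / n
    cov = sum((g - mean_g) * (m - mean_m) for g, m in zip(group, mentioned))
    var_g = sum((g - mean_g) ** 2 for g in group)
    var_m = sum((m - mean_m) ** 2 for m in mentioned)
    return cov / sqrt(var_g * var_m)


# Group code: 25 PCA patients (1) followed by 29 healthy participants (0).
group = [1] * 25 + [0] * 29
# Inferred mention counts for "fisherman": 2/25 in PCA, 27/29 in CN.
mentioned = [1] * 2 + [0] * 23 + [1] * 27 + [0] * 2
r = point_biserial(group, mentioned)
```

Under these inferred counts the coefficient is about −0.85, which agrees with the value reported for "fisherman"; the negative sign reflects the lower mention rate in the group coded as 1.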

Diagnostic classification
We used binary logistic regression to classify PCA and healthy individuals. Our predictor variables consisted of word frequency, word utterance latency, relational words, the total number of content units, and the probability of mentioning "picnic." As the sixth variable, we included the probability of mentioning "fisherman" because this content unit had the highest correlation with the group designation, likely due to its visuospatial processing demands. The average accuracy of the model was 98.15% after leave-one-out cross-validation. The average precision across all iterations was found to be 0.96, which means that, on average, 96% of the predicted positive cases were actual positive cases. Moreover, the model demonstrated an average recall of 1, indicating that it successfully identified all the positive cases from the test data in each iteration. We also evaluated the performance of the model using a Receiver Operating Characteristic (ROC) curve. The Area Under the Curve (AUC) was 1, indicating perfect discrimination by the model (Figure 4). Similar prediction outcomes were achieved after word frequency was excluded as a predictor variable, resulting in the most parsimonious model with the highest prediction accuracy. Removing other variables led to a decline in the model's prediction accuracy.
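The leave-one-out procedure can be sketched in plain Python as follows. This is a toy illustration, not the authors' model: the single feature and its values are invented, the logistic regression is fitted by simple batch gradient descent with hypothetical settings, and the held-out predictions are pooled before computing accuracy, precision, and recall, which is an assumption about how the reported averages were obtained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Classify as 1 when the predicted probability reaches 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def loocv_predictions(X, y):
    """Hold out each observation once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(w, b, X[i]))
    return preds


def accuracy_precision_recall(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, precision, recall


# Toy, well-separated data invented for illustration (1 = patient, 0 = control).
X = [[0.0], [0.2], [0.4], [0.6], [2.0], [2.2], [2.4], [2.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
acc, prec, rec = accuracy_precision_recall(y, loocv_predictions(X, y))
```

As an arithmetic aside, the reported figures are what pooled predictions would give if 53 of the 54 picture-description participants were classified correctly with a single false positive (53/54 ≈ 0.9815 and 25/26 ≈ 0.96); this is an inference from the published numbers, not something the source states.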

Discussion
Using computational linguistic analysis in PCA, our study illuminated a distinction in language performance between visually-dependent and visually-independent contexts. At the theoretical level, we used this method to show that at least some of the language abnormalities increasingly being identified in PCA are byproducts of visuospatial deficits characteristic of this atypical AD syndrome. On the visually-independent job description task, the characteristics of language production we measured here were not impaired in patients with PCA. Translating our observations to clinical practice, we showed that computational linguistic analysis of a simple picture description task robustly classified nearly all PCA patients as distinct from healthy controls, supporting its value in clinical diagnostic evaluation. Our work is consistent with studies showing rich connections between networks representing information directly received from senses and information conveyed through spoken language (Damasio, 1989; Ralph et al., 2017; Popham et al., 2021). Unimodal sensory information and abstract language information are combined at multiple points across the cortex, such as inferior parietal lobule (comprising the angular and supramarginal gyri) and large swaths of posterolateral temporal cortex (Mesulam, 2000; Devereux et al., 2013), many of which can be affected in PCA. Therefore, language abnormalities in PCA may arise for at least two potential reasons. First, the neurodegeneration of PCA may extend beyond visuospatial areas to encompass regions involved in abstract language processing. In support of this hypothesis, evidence suggests that the brain regions affected in PCA overlap with those critical for word retrieval (Vonk et al., 1991; Migliaccio et al., 2009; Warren et al., 2012; Ossenkoppele et al., 2015; Leyton et al., 2017). Alternatively, language anomalies may arise as a consequence of visuospatial deficits hindering the transfer of necessary sensory information for amodal language processing.

FIGURE 3
The likelihood of mentioning each content unit of the WAB Picnic Scene by healthy individuals (A) and PCA patients (B). The shading intensity of each item corresponds to its verbalization probability by participants, with darker elements indicating a higher likelihood of being mentioned by PCA patients and healthy individuals.

Frontiers in Neuroscience, doi: 10.3389/fnins.2024.1342909, frontiersin.org

While these two possibilities are not mutually exclusive, our results suggest that language impairments might be largely secondary to visuospatial dysfunction. In our analysis comparing a variety of speech and language properties of the narratives produced when PCA patients describe a complex visual scene versus a recounting of their primary occupation from memory, we observed speech and language impairments in only the visually dependent picture description task. We believe the increased word frequency and word utterance latency in the picture description task reflect visual deficits in object recognition. Similarly, the reduced use of spatial relational words may reflect the patients' difficulty processing spatial relations between elements of the picnic scene. The absence of abnormalities in word frequency, word utterance latency, and relational words in the job description task provides evidence that these language impairments do not stem from an intrinsic deficit in the language system in PCA.
Since most daily communication is a blend of visuospatial cognition, episodic memory, and other cognitive domains, we anticipate that an analysis of everyday speech would reveal linguistic deficiencies proportionate to the visuospatial load of its content. This expectation aligns with prior research reporting linguistic anomalies in participants recounting their recent holiday, an account that naturally encompasses the visuospatial processing of a recent event, such as where they went and what they saw (Crutch et al., 2013). Relatedly, it has been shown that the autobiographical narratives of patients with PCA have diminished spatiotemporal and perceptual details compared to healthy individuals. This finding likely stems from the impaired access of PCA patients to visual information, which plays a crucial role in constructing autobiographical memories (Ahmed et al., 2018). Furthermore, the narratives of patients with PCA have been shown to be spatially fragmented when they were asked to imagine and describe commonplace scenes (Ramanan et al., 2018). These findings provide converging evidence for the hypothesis that language reveals the visuospatial impairments of PCA proportionate to the visuospatial load of the content.
Another consideration in interpreting these task differences is that the picture description task required the use of specific linguistic elements representing the specific visual stimulus. It is also possible that when given fewer constraints in the job description task, individuals had the freedom to choose potentially more familiar and more easily accessible language. Our explanation of the underlying language abnormalities in PCA is consistent with findings that showed a striking discrepancy between visual and verbal comprehension tasks in this population (Rogers et al., 2006). Our conclusion also aligns with results reporting very mild impairment in semantic memory in PCA, indicating that the apparent semantic impairment in these conditions may be secondary to visual impairment (Rogers et al., 2006).
Based on the observation that language is a sensitive indicator of visuospatial impairments of PCA, we performed an in-depth content unit analysis of language elicited through the picture description task. First, we measured the probability of verbalizing each content unit of the picture. The most distinguishing feature between PCA patients and healthy individuals was the probability of mentioning the "fisherman." While 93% of healthy participants mentioned this content unit, only 8% of PCA patients did so. This discrepancy could be attributed to the smaller size of this element in the picture. In addition, multiple elements are superimposed in the location of this content unit. Numerically, though not significantly, certain items, such as "clouds," had a higher likelihood of being mentioned by PCA patients than healthy individuals. This type of analysis provides a naturalistic way of identifying the visuospatial elements that are particularly challenging for PCA patients and could help clinicians devise rehabilitative strategies to alleviate these challenges. Moreover, we observed that PCA patients often missed describing the overarching theme of the image ("picnic"), even when they identified certain components related to it ("basket"). We believe this finding represents the effects of simultanagnosia, which prevents many PCA patients from grasping the integrated theme of a composite visual entity.
Finally, when specific quantitative language metrics were employed to differentiate PCA patients from healthy individuals, our predictive model achieved a high level of performance, as evidenced by an AUC of 1 and an accuracy rate of 98.15%. Automating this linguistic evaluation from an easily acquired speech sample would facilitate the integration of measures like this into digital healthcare infrastructure, which a wide array of healthcare providers could potentially use once trained. This work extends our prior research, which used narrative data from a simple picture description task for a high accuracy classification of the three primary progressive aphasia variants (Rezaii et al., 2022, 2023a,b). Future studies are needed to directly analyze the sensitivity and specificity of language predictors across a wide range of patient populations, examine the neurobiological underpinning of linguistic indicators of PCA, and address some of the limitations of this study. These limitations include the absence of neuropsychological scores for participants recruited via the MTurk online platform and the lack of normative data for tests used to determine the severity of PCA symptoms. Ultimately, these advancements will facilitate early diagnosis of PCA as well as monitoring response to disease-modifying, rehabilitative, or other therapies in this underserved atypical variant of AD.

FIGURE 2
PCA participants and healthy individuals described different content units on the picture description task. The point-biserial correlation coefficients between the likelihood of reporting each content unit in the picnic scene and the designated group. Negative values indicate that PCA patients have a lower chance of mentioning the content unit compared to healthy individuals. *** denotes p < 0.001, ** indicates 0.001 < p < 0.01, and * shows 0.01 < p < 0.05. Bars without an asterisk are not significantly different.

FIGURE 4
Diagnostic performance of a model that distinguishes PCA patients from healthy individuals using linguistic features from the picture description task. (A) The scatter plot shows the predicted probabilities of classifying an individual as having PCA using selected linguistic features. Each point represents an individual participant, with colors indicating the group. The sigmoid curve illustrates the general trend of predicted probabilities. The dashed black line at the predicted probability of 0.5 serves as a decision threshold to classify between PCA patients and healthy individuals. Points are jittered vertically for better visualization. (B) ROC curve illustrating the performance of the logistic regression model in discriminating between PCA patients and healthy individuals, with an AUC value of 1 indicating the model's overall perfect accuracy.
TABLE 1
[…] significant impairment in visuospatial functions and memory encoding and retrieval. Milder impairment was observed in specific language and memory tests. Performance on tests of simple attention (longest digit span forward) as well as auditory naming with phonemic cues was intact. CVLT-II-SF, California Verbal Learning Test-2nd Edition-Short Form; SDFR, short delay free recall; LDFR, long delay free recall; LDCR, long delay cued recall; BORB, Birmingham object recognition battery; VOSP, visual object spatial perception test.

Quantitext generates a comprehensive set of text-based metrics, including the ratio of parts of speech and dependency relations to the total word count, word frequency, syntax frequency (Rezaii et al., 2022, 2023a), content units (Josephy-Hernandez et al., 2023), total units, and efficiency of words (Rezaii et al., 2023c). Additionally, Quantitext assesses audio-based features encompassing the three primary domains of time, frequency, and time-frequency. The variables used in this work are described in the sections on word frequency, word utterance latency and articulation rate, spatial relational words, and content units.

TABLE 2
Clinical characteristics of the Aβ+ posterior cortical atrophy (PCA) and cognitively normal (CN) group.
Means and standard deviations are reported for continuous variables. MoCA, Montreal cognitive assessment; CDR, clinical dementia rating scale; SOB, sum of box scores.

TABLE 3
Comparing language features obtained from picture description and job description tasks across PCA patients and healthy individuals.