Cognitive Processes Underlying Verbal Fluency in Multiple Sclerosis

Background: Verbal fluency (VF) has been associated with several cognitive functions, but the cognitive processes underlying verbal fluency deficits in Multiple Sclerosis (MS) are controversial. Further knowledge about VF could be useful in clinical practice, because these tasks are brief, applicable, and reliable in MS patients. In this study, we aimed to evaluate the cognitive processes related to VF and to develop machine-learning algorithms to predict those patients with cognitive deficits using only VF-derived scores. Methods: Two hundred participants with MS were enrolled and examined using a comprehensive neuropsychological battery, including semantic and phonemic fluencies. Automatic linear modeling was used to identify the neuropsychological test predictors of VF scores. Furthermore, machine-learning algorithms (support vector machines, random forest) were developed to predict those patients with cognitive deficits using only VF-derived scores. Results: Neuropsychological tests associated with attention-executive functioning, memory, and language were the main predictors of the different fluency scores. However, the importance of memory was greater in semantic fluency and clustering scores, and that of executive functioning in phonemic fluency and switching. Machine learning algorithms predicted general cognitive impairment and executive dysfunction, with F1-scores of 67–71%. Conclusions: VF was influenced by many other cognitive processes, mainly including attention-executive functioning, episodic memory, and language. Semantic fluency and clustering were better explained by memory function, while phonemic fluency and switching were more related to executive functioning. Our study suggests that the multiple cognitive components underlying VF tasks in MS could serve for screening purposes and the detection of executive dysfunction.


INTRODUCTION
Multiple sclerosis (MS) is a demyelinating disease and the most common cause of non-traumatic disability in working-age adults (1). It presents different lesions and cortical/subcortical gray matter brain damage, as well as functional disconnection (2). The most prominent cognitive symptoms are slowed cognitive processing speed; impairments in attention, episodic memory, and executive function, including verbal fluency (VF) deficits; and visuospatial analysis impairment (3).
Executive functions are an essential part of cognitive assessment and include different specialized cognitive processes. One of these cognitive processes is fluency, understood as the ability to generate non-overlearned responses after a cue presentation in a certain time window (4). In this regard, verbal fluency tasks are some of the most widely used tasks, and according to the cue presentation, it is possible to distinguish two modalities: words of a specific semantic field, called semantic fluency; and words beginning with a specific letter, named phonemic fluency. Due to the time window of the task, sustained activation is necessary for the generation of non-overlearned responses (also called processes of energization). While search and access strategies are required during fluency tasks (4), selection mechanisms seem to be key to understanding the different mechanisms involved in phonemic and semantic fluency. Phonemic fluency tasks imply a selection effort to retrieve words according to the initial letter, instead of according to semantic fields, which is the more common retrieval route. Thus, semantically associated stored words are easily activated and must be inhibited based on the task instruction. In contrast, a semantic cue would activate interconnected words based on lexico-semantic networks, resulting in less competition between correct words and intrusions than in a phonemic fluency task (5,6). For this reason, deficits in phonemic fluency tasks have been more closely associated with executive dysfunction. On the other hand, deficits in semantic fluency tasks could be more related to semantic memory impairments than to executive dysfunction (7). After word retrieval, self-monitoring processes play a significant role in verbal fluency tasks (4,7).
Although verbal fluency tasks have been more studied than other fluency tasks, the cognitive processes involved in VF remain unclear (5). In this regard, the limited information provided by a total score (number of correct answers) has given rise to the study of other scores, such as word production during the first 15 s and errors. The higher number of words during the first 15 s, compared to the decrease in word production during the rest of the time window, suggests easy access to the lexico-semantic storage and the need for a search strategy to continue the word production over the course of the task (8). Errors have also been proposed as complementary information in VF. Repetitions and intrusions (also called rule break errors) are the most frequent and have been associated with inhibition impairments (8). However, these scores do not give specific information about the lexical access strategy (9), and the difference in error scores between MS patients and healthy controls is not significant enough to consider errors an optimal measure of executive dysfunction (8). For a deeper understanding of the lexical access strategy, the study of clustering and switching has been proposed (7). During task performance, participants generate different responses that can be classified into subcategories or clusters. Once a subcategory is exhausted, participants switch to a different subcategory (7). Thus, it is possible to obtain the number of clusters and switches, as well as qualitative information. These parameters could be more sensitive in detecting the underlying neuropsychological deficits involved in each patient and could contribute to the understanding of cognitive function in patients with MS (10).
The cognitive profile of MS is generally characterized by impairments in processing speed, attention-executive functioning, and memory. VF tasks have some important advantages in clinical practice. They are easy to administer and shorter than other neuropsychological assessments for MS. Because VF has been associated with executive functioning and memory and they are assessed in a limited time, VF could serve as sensitive and brief cognitive function measures in MS, with special interest during early onset of the disease (11). Furthermore, these tasks are well-tolerated and are not significantly impacted by visual or motor impairments (4,6). However, previous studies show inconsistent evidence (6). On the one hand, some authors have suggested VF as a screening test (9,12,13). On the other hand, while some studies suggested an equal impairment between phonemic and semantic fluency (6,14), others have found a greater impairment in phonemic or semantic fluency (6,15).
Our hypothesis is that VF reflects multiple cognitive components, and the assessment of different VF tasks and several parameters (number of words, clustering, switching, etc.) could be useful to disentangle the cognitive demands underlying each task and score. A comprehensive assessment of VF tasks could be useful to detect patients with cognitive impairment in MS, and the impairment of specific cognitive domains, particularly executive functioning. Accordingly, our aim was 2-fold: first, to evaluate the cognitive processes related to verbal fluency in patients with MS through the identification of predictor variables of verbal fluency scores in a comprehensive neuropsychological battery; second, to develop machine-learning algorithms to predict those patients with cognitive deficits using only VF-derived scores.

Neuropsychological Assessment
All participants were evaluated using the neuropsychological battery Neuronorma (17,18), previously validated for MS in our setting. This battery included the following tests: Digit Span forward and backward, Corsi's Test forward and backward, Trail Making Test A and B (TMT), Symbol Digit Modalities Test (SDMT), Tower of London-Drexel test (ToL) (total moves score, total correct score, total initiation time score, total execution time score, total problem-solving time score), a semantic fluency task (SF) (animals), and a phonemic fluency task (PF) (words beginning with "p"). According to this battery, patients were classified as cognitively impaired or cognitively preserved using the previously validated criteria (17). In brief, these criteria define cognitive impairment as at least two cognitive domains falling 1.67 standard deviations or more below the mean, according to age-, sex-, and education-adjusted scores. Similarly, cognitive domains were considered impaired according to the same criteria (17) (see Supplementary Material 1). Furthermore, the Paced Auditory Serial Addition Test (PASAT) and two extra phonemic fluency tasks (words beginning with "m" and "r") were also performed. In the VF tasks, participants were asked to produce as many words as possible in 1 min, according to the specified cues. One point was assigned for each correct word based on the guidelines by Ledoux et al. (19). In addition, Beck's Depression Inventory (20) and the Fatigue Severity Scale (21) were administered.
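The impairment rule summarized above can be illustrated with a minimal Python sketch. It assumes that each cognitive domain has already been reduced to a single adjusted z-score, which is a simplification: the validated criteria (17) determine how individual tests map to domains. The names `Z_CUTOFF`, `impaired_domains`, and `is_cognitively_impaired`, the inclusive cutoff, and the example values are illustrative assumptions, not part of the study.

```python
# Illustrative sketch only: one adjusted z-score per domain is an assumption.
Z_CUTOFF = -1.67            # threshold from the text; treating it as inclusive is an assumption
MIN_IMPAIRED_DOMAINS = 2    # "at least two cognitive domains"


def impaired_domains(domain_z):
    """Return the domains whose adjusted z-score is at or below the cutoff."""
    return [name for name, z in domain_z.items() if z <= Z_CUTOFF]


def is_cognitively_impaired(domain_z):
    """Apply the 'at least two impaired domains' rule."""
    return len(impaired_domains(domain_z)) >= MIN_IMPAIRED_DOMAINS


# Hypothetical patient, not study data.
example = {
    "attention_executive": -1.9,
    "processing_speed": -0.4,
    "memory": -1.7,
    "visuospatial": 0.3,
    "language": -1.0,
}
print(impaired_domains(example), is_cognitively_impaired(example))
```

In this hypothetical case two domains fall below the cutoff, so the patient would be labeled cognitively impaired; the same per-domain flags would serve as the targets for the domain-specific classifiers.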

Procedure
Patients were evaluated in a single session lasting ∼120 min. First, digit span, Corsi's test, and VF tasks were performed and took ∼10 min. Next, FCSRT was administered and, to avoid the interference of other verbal stimuli during the delay, tests without a high verbal load were performed, such as SDMT, TMT, ROCF copy, Stroop, ROCF recall after 3 min, and ToL. FCSRT took ∼15 min with a delay of 30 min. The SDMT and Stroop were timed tests, with performance times of 90 s and 45 s per Stroop part, respectively. TMT, ROCF copy, and recall after 3 min took ∼7 min (for mean time details, see Table 2), while the Tower of London test took ∼20 min. After the delayed recall of FCSRT and during the ROCF 30 min delay, tests with verbal responses were administered, such as PASAT and BNT, with a mean duration of 8 and 15 min, respectively. Then, ROCF recall after 30 min, the ROCF recognition task, and JLO were administered. Both ROCF tasks took ∼7 min, and JLO had a mean administration time of 15 min. Finally, patients completed the Beck's Depression Inventory and the Fatigue Severity Scale.
All scores obtained from fluency tasks were calculated by two of the authors working independently, and final scores were reached by consensus, according to the scoring criteria developed by Ledoux et al. (19). VF-derived scores included: (a) number of correct answers without repetitions or intrusions; (b) repetitions; (c) intrusions; (d) number of clusters; (e) number of switches; (f) mean cluster size (total words in clusters/number of clusters); (g) percentage of correct words in clusters; (h) correct words in clusters. In PF, results for words beginning with "p" were considered on their own, as well as summed across the three initial letters ("p," "m," and "r"), as in previous studies (22).
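As a rough illustration of how the VF-derived scores listed above relate to one another, the following Python sketch computes them from an ordered list of responses. It is not the scoring procedure of Ledoux et al. (19): the input format (each word paired with a rater-assigned subcategory, with `None` marking an intrusion), the rule that a cluster needs at least two consecutive words, and the definition of a switch as any change of subcategory run are all assumptions made for the example.

```python
from itertools import groupby


def vf_scores(responses):
    """responses: ordered list of (word, subcategory); subcategory None marks an intrusion."""
    seen = set()
    correct = []        # valid, non-repeated responses in production order
    repetitions = 0
    intrusions = 0
    for word, subcat in responses:
        if subcat is None:
            intrusions += 1
        elif word in seen:
            repetitions += 1
        else:
            seen.add(word)
            correct.append((word, subcat))

    # Lengths of runs of consecutive correct words sharing a subcategory.
    runs = [len(list(group)) for _, group in groupby(correct, key=lambda r: r[1])]
    clusters = [n for n in runs if n >= 2]   # assumption: a cluster has >= 2 words
    words_in_clusters = sum(clusters)
    return {
        "correct": len(correct),
        "repetitions": repetitions,
        "intrusions": intrusions,
        "clusters": len(clusters),
        "switches": max(len(runs) - 1, 0),   # assumption: one switch per change of run
        "mean_cluster_size": words_in_clusters / len(clusters) if clusters else 0.0,
        "words_in_clusters": words_in_clusters,
        "pct_words_in_clusters": 100 * words_in_clusters / len(correct) if correct else 0.0,
    }


# Hypothetical semantic fluency (animals) responses, not study data.
sample = [
    ("dog", "pets"), ("cat", "pets"), ("lion", "wild"), ("tiger", "wild"),
    ("zebra", "wild"), ("dog", "pets"), ("table", None), ("cow", "farm"),
]
scores = vf_scores(sample)
print(scores)
```

Here the repeated "dog" and the intrusion "table" are excluded before clusters are counted, so the mean cluster size is computed as total words in clusters divided by the number of clusters, matching score (f).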

Statistical Analysis
Statistical analysis was performed using SPSS Statistics 22.0. Descriptive data are shown as mean ± standard deviation. Pearson's correlation coefficient (r) was used for the analysis of the correlation between quantitative variables. The Pearson r coefficient was classified as very low (0-0.29), low (0.3-0.49), moderate (0.5-0.69), high (0.7-0.89), and very high (0.9-1). R software (ggplot) was used to create a heatmap of the correlation matrix. One-way ANOVA and the Tukey post hoc test were used to assess intergroup differences, with p < 0.05 considered statistically significant. The automatic linear modeling (LINEAR) procedure was used to identify the neuropsychological test predictors of VF scores (23). A different model was estimated for each VF score, introducing all Neuronorma tests, PASAT, and phonemic fluency scores as predictor variables. Only variables with p < 0.05 were considered predictors.
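The correlation bands above can be expressed as a short Python sketch. The analysis itself was run in SPSS; this dependency-free version is only an illustration. Applying the bands to the absolute value of r and treating each lower bound as inclusive (so that values between 0.29 and 0.3 fall in the lower band) are assumptions, as are the function names.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strength_label(r):
    """Map |r| onto the bands used in the text (band edges are an assumption)."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "very low"
    if magnitude < 0.5:
        return "low"
    if magnitude < 0.7:
        return "moderate"
    if magnitude < 0.9:
        return "high"
    return "very high"


print(strength_label(-0.284))  # magnitude of a negative coefficient is classified
```

Under these assumptions a coefficient such as r = −0.284 would be labeled "very low", and the same 0.7 boundary separates "moderate" from "high".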

Machine Learning Analysis
Two supervised classification algorithms, Support Vector Machine (SVM) with linear kernel and Random Forest (RF), were implemented with Scikit-learn v.0.22.1 in Python v.3.6.9. Six different binary classification tasks were performed depending on the class to predict: the presence of cognitive impairment or of cognitive dysfunction in five different cognitive domains (attention and executive functioning, information processing speed, memory, visuospatial function, and language), according to the criteria explained above. Before performing classification, highly and very highly correlated features (those with a Pearson's coefficient >0.7) were excluded. For each classification task, the dataset was randomly split into training (n = 140, 70%) and test (n = 60, 30%) sets. The split was stratified, that is, it preserved the distribution of each class. The best hyperparameters of each model were determined by a 5-fold cross-validation grid search on the training set. Each best model was then evaluated on its corresponding test set. Model performance was evaluated in terms of precision, recall, and F1-score values.
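Two of the preprocessing steps described above, the exclusion of highly correlated features and the class-preserving 70/30 split, can be sketched in plain Python. The study used Scikit-learn; this standard-library version is not the authors' code. The greedy "keep the first feature, drop later ones that correlate with it" policy, the random seed, the function names, and the toy data are assumptions, since the text does not state which member of a correlated pair was removed.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.7):
    """features: dict name -> values. Keep a feature only if |r| with every
    previously kept feature does not exceed the threshold (greedy, an assumption)."""
    kept = []
    for name in features:
        if all(abs(pearson_r(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), drawing the test share within each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data, not study data: "b" is nearly collinear with "a".
toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [5, 1, 4, 2, 3]}
kept = drop_correlated(toy)

labels = [1] * 80 + [0] * 120          # hypothetical class balance for 200 patients
train_idx, test_idx = stratified_split(labels)
print(kept, len(train_idx), len(test_idx))
```

With 200 hypothetical labels the split yields 140 training and 60 test cases while keeping the class proportions equal in both sets. In Scikit-learn, the equivalent steps would typically rely on `train_test_split` with its `stratify` argument, followed by `GridSearchCV` with `cv=5` on the training set.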

Ethical Approval
The study was conducted with the approval of our hospital's Ethics Committee, and all participants gave written informed consent.

VF Across Groups and Correlation With Non-cognitive Characteristics
Considering the classification of MS patients, there was only a significant difference between groups in semantic fluency total scores (F 2 = 5.39; p = 0.005). Tukey post hoc test showed differences between RR and SP groups with lower scores in SP (p = 0.004). Semantic fluency total score correlated with EDSS score (r = −0.284; p < 0.001). Phonemic fluency ("p" and "pmr" total scores) also correlated with EDSS (r = −0.208; p = 0.003 and r = −0.191; p = 0.008, respectively). There was a significant correlation between semantic fluency total score and depression (r = −0.195; p = 0.006). Phonemic fluency with "p" total score (r = −0.176; p = 0.012) and phonemic fluency with "pmr" total score (r = −0.210; p = 0.003) also correlated with depression.

Correlation Between VF and Other Neuropsychological Tests
Main neuropsychological results for the three subgroups are shown in Table 2. Correlations between VF and neuropsychological tests are shown in Figure 1. In summary, semantic fluency showed moderate correlations with BNT, FCSRT, phonemic fluency scores, SDMT, Stroop A, and Stroop B. Phonemic fluency with "p" correlated moderately with BNT, SDMT, and semantic fluency. Similar correlations were found in phonemic fluency with "pmr," including a moderate correlation with Stroop A.

Neuropsychological Predictors of VF Tests
The results of the automatic linear modeling assessing the neuropsychological predictors of each verbal fluency score are shown in Tables 3-5. The criterion variables with the highest percentage of variance explained by the predictor variables were correct answers, clusters, switches, and words in clusters. In semantic fluency, the linear modeling identified FCSRT (total free recall), BNT, Stroop A, and PASAT as predictors of correct answers and explained 54.6% of the variance. Regarding

Machine Learning Classification
Two different classifiers (Support Vector Machine and Random Forest) were used to predict the presence of cognitive impairment, as well as the presence of cognitive dysfunction in each evaluated cognitive domain. Tuned hyperparameters and specifications of each model can be found in Supplementary Material 2. Figure 2 shows the F1-score obtained for each classifier, and full information about precision, recall, and F1-score values is provided in Supplementary Material 3. Both classifiers performed better for cognitive impairment and for attention and executive dysfunction, with F1-scores between 67 and 71%. Conversely, classification performance scores for the other cognitive domains were lower. Feature importances in Random Forest models are shown in Figure 3.
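For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and F1-score are derived from predicted and true labels for a single positive class. It is a generic illustration with hypothetical labels, not the study's evaluation code; whether the reported F1-scores were computed per class or averaged across classes is not stated in the text, so the single-class form used here is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class (zero if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels (1 = impaired), not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, it drops whenever either one is low, which makes it a stricter summary than accuracy when impaired and preserved patients are unevenly represented.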

DISCUSSION
The cognitive processes involved in verbal fluency in MS remain controversial, due to the specific characteristics of cognitive impairment and brain damage associated with MS. In this study, we applied automatic linear modeling to investigate which neuropsychological tests best explained performance on the verbal fluency tests. Interestingly, we found different predictors according to the different fluencies (phonemic or semantic) and the different scores used (total words, clustering, and switching).
These results support the view that fluency tasks provide useful information about a wide range of cognitive functions. Specifically, semantic fluency (total score) was predicted by the FCSRT (total free recall), Boston Naming Test, Stroop A, and PASAT, which confirms the influence of memory and language tasks, but also of attention and time-dependent tests. Similarly, clustering in semantic fluency was predicted by the FCSRT, Stroop A, and Corsi test. Conversely, switching in semantic fluency was mainly explained by three attention-executive and time-dependent tests: Stroop A, ToL, and PASAT.
Regarding phonemic fluency, several tests measuring attention-executive functioning, language, and memory were the main predictors. Clustering was predicted by ToL and FCSRT, while switching was predicted by BNT, Stroop A, Corsi, ToL, FCSRT, and TMT-B. Thus, our results confirm the influence of three main cognitive domains in fluency tasks: attention-executive functioning, memory, and language. Although the tests mainly associated with these cognitive domains are predictors of the different fluencies and scores, the importance of memory was greater in semantic fluency and clustering, and that of executive functioning in phonemic fluency and switching. In addition, it is worth mentioning that several of the best predictors were time-dependent tasks, which also points to a potential role of processing speed. Although the SDMT was not included in any statistical model, it showed moderate correlations with all the fluency scores, as in previous studies (6,9). Overall, these findings emphasize the value of extracting several parameters from fluency tasks to capture as much information as possible.
Another interesting result is the role of the Boston Naming Test, which predicted several fluency scores, such as correct answers in semantic and phonemic fluency. This test shares some cognitive processes with fluency tasks, such as search, selection, and word retrieval, but with a lower degree of time restriction. Although language was usually considered to be largely preserved in MS, recent studies using novel tests evaluating the speed of lexical access have shown frequent impairment even in early stages (24).
We developed several machine-learning algorithms to predict which patients had cognitive impairment and which had dysfunction of specific cognitive domains. Interestingly, VF scores achieved acceptable values for the prediction of general cognitive impairment and executive dysfunction, which supports the major role of executive functioning in VF in MS. Scores derived from phonemic fluency (e.g., correct words beginning with "p," clusters, and switches) were more useful in the prediction of executive dysfunction. For general cognitive impairment prediction, a combination of scores from semantic and phonemic fluencies was amongst the most predictive, which suggests the value of combining semantic and phonemic VF in short batteries (14). Unfortunately, the algorithms showed low levels of accuracy in the other cognitive domains, which supports the need for a full and comprehensive neuropsychological assessment to evaluate specific cognitive deficits in MS.
These findings may also be interpreted in terms of the neural basis of cognitive dysfunction in MS. Semantic fluency and phonemic fluency have been associated with subcortical volumes in voxel-based morphometry analysis (2). Specifically, phonemic fluency was mainly correlated with caudate, while semantic fluency with both thalamus and caudate in both hemispheres. Impairment of these structures is considered key in the pathophysiology of cognitive impairment in MS, especially in attention and executive functioning. Conversely, in other functions, such as memory or language, other regions are necessary to predict cognitive performance (i.e., hippocampus and temporal lobe in memory) (25). Neural basis of cognitive assessment in MS shows several particularities, in contrast with other disorders (tumors, stroke, or neurodegenerative dementias). In this regard, in other disorders VF has been mainly correlated with several cortical regions in the left hemisphere (26). These specificities warrant the study of the cognitive processes and neuroimaging correlates of the neuropsychological tests used in the setting of MS to accomplish an adequate interpretation of neuropsychological assessment.
Our study has some limitations. First, the algorithms were trained against cognitive impairment criteria that themselves included impairment of VF. This could imply a certain degree of circularity in the machine learning analysis. However, these criteria were previously validated in an independent study, and impairment of VF according to these criteria was present in a relatively low percentage of cases classified as cognitively impaired (36.2% for semantic VF, and 29.8% for phonemic VF). Second, VF tasks are language-dependent, and our results should be confirmed in other cultures. In this regard, there are differences in the frequency of words between languages, and cross-cultural adaptations are required to minimize their effect, especially for phonemic fluency (27,28). For instance, words beginning with "f," "a," and "s" are common phonemic fluency tasks for English speakers, but for Spanish speakers the initial letters "p," "m," and "r" have been proposed as an alternative and are generally preferred, based on the frequency of words (27–29). Third, we did not include neuroimaging analysis in this study. Correlation between the different scores and neuroimaging techniques (voxel-based morphometry, cortical thickness, diffusion tensor imaging, etc.) may be of interest in future studies. Fourth, we did not perform a correction for motor dexterity. Because motor disorders in MS patients could compromise test interpretation, particularly in timed neuropsychological tests, this type of correction may be useful to improve the reliability of the neuropsychological examination (30). Finally, due to the aims of the study, a comprehensive battery was administered, so a fatigue effect may have been present.
In conclusion, our study highlights the interest of further research into the assessment of VF in patients with MS. VF was influenced by many other cognitive processes, mainly including attention-executive functioning, episodic memory, and language. Semantic fluency and clustering were more explained by memory function, while phonemic fluency and switching were more related to executive functioning. The multiple cognitive components underlying VF tasks could serve for screening purposes. In this regard, we have developed several machine learning algorithms that could be useful to detect patients with cognitive impairment using only VF, although these models performed adequately only for general cognitive impairment and executive dysfunction. Overall, our study supports the implementation of a comprehensive and qualitative assessment of verbal fluency in MS, which may provide interesting insights into cognitive function in patients with MS.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Comite de Etica e Investigacion Clinical del Hospital Clinico San Carlos. The patients/participants provided their written informed consent to participate in this study.