Language Patterns Discriminate Mild Depression From Normal Sadness and Euthymic State

Objectives Deviations from typical word use have been previously reported in clinical depression, but language patterns of mild depression (MD), as distinct from normal sadness (NS) and euthymic state, are unknown. In this study, we aimed to apply the linguistic approach as an additional diagnostic key for understanding clinical variability along the continuum of affective states.
Methods We studied 402 written reports from 124 Russian-speaking patients and 77 healthy controls (HC), including 35 cases of NS, using hand-coding procedures. The focus of our psycholinguistic methods was on lexico-semantic [e.g., rhetorical figures (metaphors, similes)], syntactic [e.g., predominant sentence type (single-clause and multi-clause)], and lexico-grammatical [e.g., pronouns (indefinite, personal)] variables. Statistical evaluations included Cohen’s kappa for inter-rater reliability measures, a non-parametric approach (Mann–Whitney U-test and Pearson chi-square test), one-way ANOVA for between-group differences, Spearman’s and point-biserial correlations to analyze relationships between linguistic and gender variables, and discriminant analysis (Wilks’ λ) of linguistic variables in relation to the affective diagnostic types, all performed in SPSS-22 (significance threshold, p < 0.05).
Results In MD, as compared with healthy individuals, written responses were longer, demonstrated a descriptive rather than analytic style, and showed signs of spoken and figurative language, a predominance of single-clause over multi-clause sentences, atypical word order, increased use of personal and indefinite pronouns, and verb use in continuous/imperfective and past tenses. In NS, as compared with HC, we found greater use of lexical repetitions, omission of words, and verbs in continuous and present tenses. MD was significantly differentiated from NS and euthymic state by linguistic variables [98.6%; Wilks’ λ(40) = 0.009; p < 0.001; r = 0.992]. The highest predictors in discrimination between MD, NS, and euthymic state groups were the variables of word order (typical/atypical) (r = −0.405), ellipses (omission of words) (r = 0.583), colloquialisms (informal words/phrases) (r = 0.534), verb tense (past/present/future) (r = −0.460), verb form (continuous/perfect) (r = 0.345), amount of reflexive (e.g., myself)/personal (r = 0.344), and negative (e.g., nobody)/indefinite (r = 0.451) pronouns. The most significant between-group differences were observed in MD as compared with both NS and euthymic state.
Conclusion MD is characterized by patterns of atypical language use distinguishing depression from NS and euthymic state, which points to a potential role of linguistic indicators in diagnosing affective states.

Introduction
Mild depression (MD) is a common mental state (1), observed in 15% of the adult population (2), with only 23% receiving any treatment (3). MD is mostly related to life stresses (4) and [unlike moderate and severe major depressive disorder (MDD)] is poorly responsive to antidepressant medication (1,5,6). Nonetheless, MD [as distinct from subthreshold, minor depression (7) or normal sadness (NS) (8,9)] is a serious medical condition causing professional and personal disabilities (10-12). Indeed, MD is associated with unemployment in 16% of cases (13). The chronic course of mild depressive symptoms within dysthymia brings an elevated suicidality risk, compared with MDD (14). MD is often prodromal to MDD (7,15,16). NS in the absence of clinical depression is also frequent (29.8%) in the general population (17).
The ICD-10 (18) diagnosis of MD requires four symptoms, whereas the DSM-V (19) criteria are based on seven main symptoms, and the Hamilton Depression Rating Scale (HDRS) gives an MD diagnosis threshold for scores ranging from 7 to 17 as widely accepted by clinicians or cutoff scores from 8 to 16 as suggested by the recent severity classification of HDRS (20)(21)(22)(23). However, depression is heterogeneous and presents with highly variable clinical symptoms, so its diagnosis cannot be made merely by the number of symptoms, but should include their detailed analysis and causal relations (24)(25)(26). Diagnosis of MD was reported to be less stable compared with diagnosis of severe depression using ICD-10 criteria and was characterized by a fair level of agreement (kappa = 0.25) between clinicians compared with the moderate reliability in severe depression cases (kappa = 0.53) (8,27). The claimed high prevalence of MD is sometimes viewed with skepticism, given the questionable reliability of psychiatric diagnoses in general (28), and especially with respect to the differentiation of MD from NS (8,29). Correct recognition of subthreshold forms of NS is based upon the number, duration, and quality of presented symptoms (30). Despite the elaboration of criteria cited above, psychiatry still lacks objective clinical tests of symptoms comparable with those routinely used in other medical disciplines (31). Affective (e.g., decreased mood) and cognitive (e.g., negative content of thoughts) components of MD and NS are mostly expressed through language, while more severe forms of depression are also recognized by a motor component (e.g., slow bodily movements). The search for objective indicators of MD vs. NS might help to increase the reliability of MD diagnosis. 
Andreasen and Pfohl (32) first showed that language is a specific marker of depression, and currently active study groups have concluded that an analysis of natural language processing could afford the foundation for developing objective diagnostic tests "based on dimensions of observable behavior" (33) (p. 904).
While a clinical interview remains the basic tool for diagnosing depression (34), linguistic research has demonstrated that systematic analysis of language content reliably classifies patients into appropriate diagnostic groups (35,36). Nguyen et al. (37) report that computerized word counting techniques (38,39) discriminate depression communities from other subgroups and also reveal strong online-language predictors of depression (40) and suicide (41). Aberrant written and spoken languages are frequently reported in patients with depression (42)(43)(44)(45)(46). Being a chronic affective disorder presenting either within mild depressive symptoms or with marked absence of pleasure in daily activities, dysthymia is characterized by increased speech flow, in contrast to the slowed speech typical of MDD (14). The excessive use of first-person singular pronouns (I) correlated with depression in many (22,23,38,46,47), but not all studies (48). Objective (me) and possessive (my) first-person pronouns were more frequent in speech of a group with depression, and predicted depression better than did subjective (I) pronouns (47). Elevated usage of first-person pronouns was attributed to self-focused attention or self-preoccupation (44,47,49). Among various measures of depressive self-focusing style, rumination (repetitions of the same, usually negative, information) has been mentioned in many studies (50)(51)(52). Other features of depression included elevated use of mental state verbs (think), words denoting causal relations (because) (53), greater use of generalizing terms (everything, always), negation (nothing, never), and words referring to ambivalent emotional states (54,55). The increased use of discrepancy words (should), possibly reflecting enhanced aspirations for the future (56), has been discussed as a marker of improvement with therapy for depression. 
Together, these promising results denote that "the styles in which people use words" represent no less meaningful information than "the content of what they say" about their symptoms (38) (p. 548). Nonetheless, language phenomena are still not widely considered for psychiatric diagnosis of affective states.
Hypotheses
Given this background, we predicted that our exploratory analysis of linguistic variables would reveal a set of word-use patterns for differentiation of MD from NS and euthymic state (see Russian/English examples in Table 1). Directional hypotheses.
(generalized, negative) pronouns, reflecting words of generalization, negation and ambivalent emotional states revealed in depression (54,55). Non-directional hypotheses. Our specific hypotheses follow: Focusing on syntactic and lexico-semantic variables, we explored whether MD patients (2) predominantly used single-clause vs. multi-clause sentences and (3) narration vs. reasoning, as reflecting descriptive vs. analytic thought style. We predicted (4) an increased number of lexical (tautologies) and semantic repetitions in MD as a marker of ruminations and depressive self-focusing style (51,52), and further explored whether MD patients (5) favor figurative language (metaphors, similes) and (6) use unusual/atypical word order related to their emotionally overwhelmed state (54). Based on some previous studies and our own clinical experience, we also hypothesized that, since ruminations are mostly focused on past negative events, MD patients would, within lexico-grammatical variables, predominantly use (7) the continuous form (the imperfective aspect of Russian verbs, denoting uncompleted actions) rather than the perfect form (perfective type/completed actions) [state-of-being verbs (32)], and (8) past rather than present or future tense verbs [past vs. future in depression (57); negative schemas of the past in depression (58)]. Thus, we aimed to apply the linguistic approach as an additional diagnostic key for understanding clinical variability along the continuum of affective states.

Participants
All 201 subjects gave written informed consent according to the Declaration of Helsinki to participate in the study. The research protocol was approved by the Samara State Medical University's Ethics Committee in 2009. Patients were examined at the University's Department of Psychiatry after referral from general practitioners, neurologists, and psychotherapists, and had not previously consulted a psychiatrist or been prescribed psychotropic medications before or during the brief period of investigation. The diagnoses were based on the results of clinical psychiatric interviews delivered by psychiatrists (Daria Smirnova and Gennadii Nosachev) and were coded using ICD-10 diagnostic criteria. Inclusion criteria for patients were (1)
Healthy controls (HC), including subgroups of normal healthy (NH) and individuals in a state of NS, were recruited from among volunteers invited by public announcement and signage. Each HC participant was interviewed separately by two psychiatrists (Daria Smirnova and Gennadii Nosachev) of the University's Department of Psychiatry to confirm the absence of any past history of mental disorders and of any present diagnoses based on ICD-10 diagnostic criteria. Qualification of NS state in HC participants was consensus-based (Daria Smirnova and Gennadii Nosachev). Inter-rater reliability on categorization of NH vs. NS between the two psychiatrists was high: k = 0.894, p < 0.001, 95% CI (0.795-0.993). HC included 77 age- and education-matched native Russian speakers (61 females) of mean (SD) age 40 (12) years. Among HC, 42 participants were designated as NH and 35 were qualified as being in a state of normal sadness (NS), based on reporting current life problems and low mood.
The NS individuals were coded as having potential health hazards according to the following ICD-10 categories: Z56-problems related to employment and unemployment (n = 7), Z59-housing and economic circumstances (n = 14), Z60-social environment (n = 4), and Z63-primary support group, including family circumstances (n = 10).

Data Collection Procedures
Clinical psychiatric interviews were used as a database for psychopathological evaluation. In the psycholinguistic approach, we focused on the written self-reports [on the topic (i) "The current state of life and future expectations" and (ii) "The meaning of life"] provided by all participants. The instruction for each of the two topics was given orally by a researcher as follows: "Please write as much as you think is necessary and take as much time as you need to describe your current state of life and future expectations." In total, 402 texts were analyzed by the research team, which included a psychiatrist (Daria Smirnova), linguist (Elena Sloeva), and clinical psychologist (Natalia Kuvshinova). While one rater (Daria Smirnova) was necessarily informed about the clinical state of the individuals (patients or HC), the other two raters were blind regarding the group assignment. Both blind raters analyzed the entire sample regarding linguistic variables. The validated Russian version of the HDRS (21 items) was administered to all subjects. HDRS raters were not blind to the MD group, as patients had been referred with the preliminary diagnosis of depression. As for the HC group, HDRS scores were recorded before the HC (NH vs. NS) group allocation.

Psycholinguistic Analysis
Written samples were analyzed with respect to the number of words in the text using MS Word properties and hand-coding procedures: (i) lexico-semantic [e.g., rhetorical figures (metaphors, similes)], (ii) syntactic [e.g., predominant sentence type (single-clause, multi-clause)], and (iii) lexico-grammatical [e.g., pronouns (indefinite, personal)]. We defined categorical variables according to the participant's predominant usage of each relevant linguistic unit in each linguistic sample. For example, if a participant used 5 single-clause sentences and 10 multi-clause sentences, then the estimate of the variable "Predominant sentence type" was specified as "multi-clause." Quantitative variables were scored as quotients according to the number of the relevant units over a span of 10 sentences. In other words, if a participant used 6 metaphors across 20 sentences, then the quotient of metaphors is equal to 3, calculated as the proportion per 10 sentences. All the variables are summarized in Table 1.
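The two scoring rules described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (the study used hand-coding); the function names are hypothetical, and the handling of ties in the categorical rule is an assumption, since the text does not say how equal counts were resolved.

```python
from typing import Dict, Optional


def quotient_per_10_sentences(unit_count: int, sentence_count: int) -> float:
    """Number of linguistic units expressed per span of 10 sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 10 * unit_count / sentence_count


def predominant_category(counts: Dict[str, int]) -> Optional[str]:
    """Label of the most frequently used category in one text sample.

    Assumption: a tie returns None, because the source does not state
    how equal counts were coded.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return winners[0] if len(winners) == 1 else None


# Worked examples taken from the text above.
metaphor_quotient = quotient_per_10_sentences(6, 20)
sentence_type = predominant_category({"single-clause": 5, "multi-clause": 10})
print(metaphor_quotient, sentence_type)
```

With the figures from the text, 6 metaphors across 20 sentences give a quotient of 3, and 5 single-clause against 10 multi-clause sentences give "multi-clause" as the predominant sentence type.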

Statistical Data Analysis
All data were checked for the assumption of normality using the Shapiro-Wilk test and by inspection of histograms. Differences between study groups were calculated using the non-parametric Mann-Whitney U-test, two-tailed, Pearson chi-square test, and one-way ANOVA, depending on the type of variables and number of groups compared. Spearman's bivariate and point-biserial correlations were used to analyze relationships between linguistic data and demographic variable of gender. Values of p < 0.05 were considered statistically significant. An inter-rater reliability analysis using the Cohen's kappa statistic was performed to determine consistency between raters on categorization of NH and NS groups and on linguistic variables. Discriminant analysis (Wilks' λ) was used to establish the level of significance in relation to diagnostic types based on linguistic variables. All statistical analyses were performed with the IBM SPSS Statistics 22 (59).
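For readers unfamiliar with the inter-rater statistic named above, the following sketch shows how Cohen's kappa is computed for two raters assigning categorical labels. It is a plain-Python illustration only: the study's analyses were run in SPSS, and the ratings below are invented for the example, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 control participants as NH or NS (not study data).
ratings_a = ["NH"] * 5 + ["NS"] * 5
ratings_b = ["NH"] * 4 + ["NS"] * 6
kappa = cohens_kappa(ratings_a, ratings_b)
print(round(kappa, 3))
```

In this invented example the raters agree on 9 of 10 items (observed agreement 0.9) while chance agreement is 0.5, so kappa is approximately (0.9 − 0.5) / (1 − 0.5) = 0.8.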

Results

Clinical Description of MD and NS
From the psychiatrist's clinical perspective using the classical approach of descriptive psychopathology, a state of MD was characterized by the following signs and symptoms: (i) depressed mood, consisting of sadness, sorrow, irritability, despondency, or melancholy, (ii) mood swings during the day with predominant hypothymia, and (iii) more prominent mood changes in reaction to current life events. The depressive condition affected the patient's quality of life and was perceived by the patient as a pattern of unwanted or even alien behavioral reactions. Furthermore, MD included partial anhedonia and distortion of self-image to reflect low self-esteem, lack of self-confidence, and self-dislike. Patients also expressed difficulties in decision-making, as well as a pessimistic perception of current life events. Their complaints included a negative view of the past, with emphasis on committed mistakes and failures. Finally, MD was associated with loss of energy, fatigue and lack of interest in social activities. Their somato-autonomic dysfunction manifested in sleep disturbances, changes in appetite, reduced libido, and asthenia. In contrast to MD, self-perception in the NS subgroup was expressed as an adequate and appropriate reaction to current adverse life events. While NS participants described their emotional experience as a constant subjective feeling of dissatisfaction regarding objective life circumstances arising from external reasons, their ideation was focused on the details of their problematic life situation. The NS group continued their usual daily activities, but with some muting of interests and periods of ruminations accompanied by feelings of sadness. The NS further differed from MD in their focus on present difficulties while analyzing their decision-making and problem-solving strategies, and in that they commonly described future aspirations.
Mild depression patients used more colloquialisms or informal words/phrases (Table 2). Responses in MD also had more repetitions, both with respect to re-using the same words (tautologies) and to expressing the same idea multiple times (lexical and semantic repetitions). The MD group used significantly more metaphors and similes (figurative language) than HC (Table 2). In comparison with euthymic NH, the NS group was impoverished at the lexico-semantic sublevel, showing greater use of tautologies and repetitions, in general (Table 2).

Lexico-Grammatical Variables
Patients' responses contained significantly more personal and indefinite pronouns, compared with NS and HC (

Discussion
By choosing the topics for written reports for patients, we intended the diagnostically relevant mental state to appear in the written speech, thus matching responses to the clinical interview and reflecting the context of past and present in the frame of the patients' description of their depressed mood. We assigned the topics about future expectations and meaning of life to document the patients' positive resources, motivations, and potential ability to use the context of the future as reflecting these perspectives for future recovery. However, we concede that these topics might have biased the emotional involvement in patients and thus influenced the content of written reports, as well as the writing style. Our study demonstrated that the language of MD patients was characterized by significant differences within the set of lexico-semantic, syntactic, and lexico-grammatical variables, as earlier shown within some language indicators for depression (32,43,44,46). In agreement with a report of increased speech flow in dysthymia, which is mostly characterized by mild depressive symptoms with a chronic course (14), as distinct from the briefer responses in MDD (60), we found that longer written responses emerged as a diagnostic sign for discrimination of MD and HC. As predicted, while providing longer responses, our MD patients predominantly used single-clause sentences, reduced utterances, and incomplete phrases with omission of words (ellipses), which reflects the language flow interruptions previously observed in studies of clinical depression (45,60,61). We suppose that the pattern of frequent usage of rhetorical figures within phenomena of figurative language (metaphors, similes) and atypical word order (inversions, ellipses) in MD could be interpreted as arising from overt emotional dominance in language content, following the concept presented by Pennebaker et al. (38) about language features reflecting
Our finding of increased usage of typically oral language expressions (colloquialisms), which was among the highest predictors for differentiation between MD, NS, and NH, together with unusual/atypical word order, confirmed the hypothesized predominance of conversational style over standard written language patterns in MD. Patients seemingly had a certain lack of flexibility, such that they could not readily shift from oral conversation with the researcher into the written style appropriate for the self-reporting task. This resembles their difficulty in switching from depressive self-focused attention and ruminations (lexical/word and semantic/topic repetitions) toward potential positive thinking and adaptive coping strategies (50-52).
We also established that, within multi-clause sentences, our MD patients more often used compound sentences (without causal relations between the content of the clauses within a sentence) than complex sentences (with causal relations between the clauses). This finding in MD stands somewhat in contrast to that of Pennebaker et al. (53), who found generally increased use of causation words (typical for complex-type rather than compound-type multi-clause sentences) in depression, although we did not explicitly rate causation words. In combination with the finding of a predominant use of single-clause sentences, these properties of sentence use revealed a more frequent recourse to descriptive rather than analytic thought strategies. From a developmental point of view (62), descriptive strategies within narration may represent an early acquired or basal form of verbal behavior, in comparison with the mature analytic style within reasoning acquired later in life. This scenario suggests that MD entails regression in the style of using verbal strategies for organizing the discourse (62). While HC used a mature strategy, including both analysis of events and intellectual reflection (self-analysis and problem-solving behavior), intellectual reflection in MD was subsumed by a sensual/emotional reflection within passive narration.
Consistent with previous findings on greater pronoun use within the context of depressive self-focusing or self-preoccupation style (38,44,46,47,49,52), we found that an increased number of personal (e.g., I), possessive (e.g., my), and reflexive (e.g., myself) pronouns gave significant discrimination of MD, NS, and NH. Higher use of personal pronouns was earlier described for healthy participants of female gender (63), but this was not evident in our sample. Increased use of generalized (e.g., everything) and negative (e.g., nobody) indefinite pronouns confirmed previously obtained data describing the overt emotional dominance within generalization, negation, and polarity in emotional expression in depression (54,55). Frequent use of negative pronouns may refer to the coping mechanisms of denial and negation associated with depressive symptoms or depressive personality traits (44). Insofar as pronouns lack semantic content in their word root, we suggest that their increased use in MD conveys loss of specific meanings in speech and could also be interpreted as a manifestation of semantic impoverishment; this is in keeping with data on reduced semantics in depression (64) and mild cognitive impairment (65).
As we hypothesized, written language in MD was shifted into the past, reflected not only through ruminations about past life events within lexical and semantic repetitions but also in the increased frequency of past tense verbs (57,58). This concurs with studies demonstrating that depressed patients use fewer discrepancy words (e.g., should), which typically symbolize aspirations for the future (56,66). Our patients used more verbs in continuous/imperfective form, as earlier noted by Andreasen and Pfohl (32). The self-perception of time in MD within the past tense verbs emerged as an additional discriminative feature of the high predictability for differentiation of affective states in our study.
Our findings regarding the patterns of language use as a result of affect or mood influence may reflect not only symptomatic behavior and thinking within the affective states of MD or NS but could also be indicative of stable personality traits or defensive mechanisms, a possibility that requires further investigation (44). However, our discriminant model significantly differentiated the conditions of MD from NS and euthymia with a probability of 98.6%. Another discriminant model using linguistic indicators significantly differentiated the states of MD with and without anxious features, NS and euthymia with the similar level of probability (97.6%). These data may support our hypothesis about the particular effect of affective component on the deviations in language use. This result, which confirms and extends the observations in depression by Oxman et al. (35), Desmet and Hoste (67), Kahn et al. (68), and others, also illuminates the role of assessment of verbal behavior in MD and NS for clarifying the continuum and variety of affective states.

Limitations of the Study and Implications for Further Research
We analyzed only written texts but did not record examples of natural oral speech flow. We used hand-coding procedures and did not apply the Linguistic Inquiry and Word Count (39) computer program elaborated to categorize the text into linguistic categories, because this does not yet exist for Russian language. Also, given the large number of variables examined in the study, we must consider the possible occurrence of type I errors related to interpretation of results. As no patients with psychiatric comorbidities were included, we accordingly isolated the influence of depressive affect on language. As such, we do not take a strong position related to the generalizability of findings in our sample but propose a broader investigation addressing these potential confounds. Future studies might benefit from examining the relationships between language patterns in patients with affective states and their personality traits, thus aiming to define the contribution of personality factors on language use. We expect that these results will draw more attention to the diagnostic significance of language assessment in psychiatry and clinical disciplines and show that verbal behavior is a sensitive diagnostic marker in MD. We also suggest that this would encourage practitioners to attend not only to what the patient utters but also how it is spoken. There remains a need for more data regarding linguistic features of conversational language in depression and for generalization to different languages, so as to support a broader applicability of the concept of diagnostic criteria based on written language, and to support precise recommendations for guidelines in clinical practice. In relation to practical implementation, for example, these results might inform the development of a standard questionnaire for diagnosis of MD through written language patterns, designed to be administered by non-experts, and perhaps automatically scored. 
Present results lead us to contend that linguistic study could inform future clinical approaches to non-pharmacological treatment of MD. Such psychotherapeutic approaches would address not only language content but also language remediation or cognitive training of language style and structure. If symptoms are indeed partially organized by language structure, a treatment approach to normalizing of language might play a beneficial role in improving affective state.

Ethics Statement
All subjects gave written informed consent according to the Declaration of Helsinki to participate in the study. The research protocol was approved by the Samara State Medical University's Ethics Committee.

Author Contributions
DS, GN, and ES designed the project. DS and GN collected the data. DS, GN, ES, and NK analyzed the data with advice from DR and PC. DS, GN, and PC wrote the first draft of the manuscript. All the authors reviewed the final version of the manuscript.

Acknowledgments
All the authors express the deepest gratitude to Prof. Alexander Krasnov, the Head of Pedagogy, Psychology and Psycholinguistics Department, Samara State Medical University, Samara, Russia, who recently passed away, for his consultation on psycholinguistic issues and strong support of the study. The authors thank their collaborator Prof. Evelina Fedorenko (Massachusetts General Hospital, Harvard Medical School, Department of Psychiatry, MGH/HST Martinos Center for Biomedical Imaging, Harvard, MA, USA) for providing independent expert opinion on the psycholinguistic approach and also for reviewing the manuscript. Their special gratitude goes to Prof. Norman Segalowitz of Concordia University, Montreal, Canada for his detailed review of their manuscript and for suggesting recalculations to highlight the most important findings of the study. The authors wish to express their gratitude to Dr. Maxim Ustinov (Samara State Medical University, Samara, Russia), who assisted in primary statistical evaluations and to all of their patients and healthy volunteers from the control group who participated in the study.