Linguistic markers for major depressive disorder: a cross-sectional study using an automated procedure

Introduction: The identification of language markers, referring to both form and content, for common mental health disorders such as major depressive disorder (MDD) can facilitate the development of innovative tools for early recognition and prevention. However, studies in this direction are only at the beginning and are difficult to implement due to linguistic variability and the influence of cultural contexts.
Aim: This study aims to identify language markers specific to MDD through an automated analysis process based on Ro-LIWC2015 (Linguistic Inquiry and Word Count).
Materials and methods: A sample of 62 medicated patients with MDD and a sample of 43 controls were assessed. Each participant provided language samples that described something that was pleasant for them.
Assessment tools: (1) Screening tests for MDD (MADRS and DASS-21); (2) Ro-LIWC2015 (Linguistic Inquiry and Word Count), a computerized text analysis software, validated for the Romanian language, that analyzes the morphology, syntax and semantics of word use.
Results: Depressive patients structure their sentences differently and communicate in short sentences. This requires frequent use of the period, which implies directive communication with limited exchange of ideas. Also, participants from the sample with depression mostly use impersonal pronouns, first person pronouns in the plural rather than the singular form, a limited number of prepositions and an increased number of conjunctions, auxiliary verbs, negations, and verbs in the past tense, with far fewer verbs in the present tense, as well as more words expressing negative affect and anxiety, with limited use of words indicating positive affect. The favorite topics of interest of patients with depression are leisure, time and money.
Conclusion: Depressive patients use a significantly different language pattern than people without mood or behavioral disorders, both in form and content. These differences are sometimes associated with years of education and sex, and might also be explained by cultural differences.


Introduction
Major depressive disorder (MDD) is one of the most common diagnoses in mental health (WHO, 2023). Therefore, early identification of both new cases and relapses is a priority for both policy makers and clinicians, to alleviate its global burden.
The ICD-11 definition of depressive episode requires at least 5 out of 10 symptoms; one of the mandatory symptoms should be either depressed mood, or significantly diminished interest or pleasure in activities. Hopelessness regarding the future is not listed in DSM-5 as a symptom of major depression (DSM, 2013); nevertheless, the ICD-11 includes it due to its power to discriminate patients with depression from those without (WHO, 2022). Other depression symptoms involving cognition include thoughts of low self-worth, guilt, one's own death, and suicide. Among depression symptoms relevant for cognitive functioning, ICD-11 also lists decreased sustained attention and concentration, and significant indecisiveness. The symptoms should be present for at least 2 weeks, almost daily and for most of the day on the days they occur; they should not be secondary to another health condition, medication or substance use, or bereavement. Moreover, they should generate significant functional impairment.
Building on these descriptors, existing studies explore the relationship between language and depression, as language is a natural way through which we give outward expression to thoughts, emotions and other mental processes. The challenge resides in our ability to identify language features that convey information about interest, pleasure, self-confidence, indecision, self-esteem, hopelessness, diminished appetite, social withdrawal and other main features of depression in a form that can be used by natural language processing (NLP) machines.

Language as a biomarker for depression
Literature suggests that depression influences how individuals communicate, and multiple studies have been performed in this direction. Recent research (Koops et al., 2023) states that depressive speech is characterized by several anomalies, such as lower speech rate, less pitch variability and more self-referential speech; moreover, data show that current technologies are able to predict these features in depression with an accuracy of up to 91%. Our previous studies (Trifu et al., 2015, 2017) aimed to explore the potential of language markers in major depressive disorder (MDD) via manual analysis and identified specific patterns of language, specific depression language markers, and their relationships with cognitive functioning.
Recent studies brought increasingly conclusive results that support the hypothesis that mood disorders leave an imprint on language. Linguistic imprints of mood disorders appear in every component of language, more specifically the lexical, semantic, morphological, syntactic, pragmatic and social communication components. Language assessment becomes more accessible if broken down into the aforementioned components.
Regarding language lexicology and morphology, a study of two bilingual samples identified the following linguistic imprints: self-referential language, i.e., 1st person speech using the 1st person singular pronoun ('I'), an increased amount of negative emotion words with decreased use of positive emotion words, and decreased use of 1st person plural speech ('we') (Behdarvandirad and Karami, 2022). Another study reports similar results for the use of negative emotion words and the 1st person singular. Moreover, the sample of participants with depression more often used words connected with work, family, sex, biology, and health. Additionally, the use of past tense, causation, achievement, family, death, psychology, impersonal pronouns, quantifiers, and preposition words outlined emotion-dependent differences between the sample with depression and controls (Yang et al., 2023). Another study supports these findings with similar results (Robertson et al., 2023).
Persons with depression also use fewer future tense constructions (e.g., will), fewer high-certainty constructions (certainly), more low-certainty constructions (could) and more deontic, especially volitive boulomaic modal constructions, such as hopes, wishes and desires. Also, absolute meaning words are more frequent in depression in general (Al-Mosaiwi and Johnstone, 2018; Yahya and Rahim, 2023). This concurs with the cognitive rigidity theory in depression and mental health (Aguilera et al., 2019).
Diminished communication in persons with depression is reflected in, and altered by, language syntax. The language displayed by a person with depression indicates use of truncated sentences, with omissions, short sentences and reversed topic of the sentence, which in turn render the message difficult to understand (Trifu et al., 2017). Predominance of single-clause over multiclause sentences and atypical word order are common in the group of depressive persons. Modified word order, presence of ellipses and colloquialisms were the highest predictors in discriminating between persons with and without depression (Smirnova et al., 2018). Similarly, simple vs. complex syntax can predict successful outcomes of recovery programs; more specifically, complex syntax predicts wellbeing (Zinken et al., 2010).
Regarding the pragmatic component of language, more precisely the use of natural language, colloquial expressions, and idioms that facilitate interpersonal communication and social relations, a recent study (Bridges et al., 2023) shows specific patterns of language. Based on an analysis of naturalistic/formulaic expressions in patients with treatment-resistant depression who underwent surgical deep brain stimulation of the subcallosal cingulate white matter pathways, the study indicates that patients with depression produced fewer conversational speech expressions pre- and postoperatively, compared with healthy controls. The study also ascertained a higher rate of non-nuanced familiar expressions, i.e., large lexical bundles that are fixed linguistic segments, e.g., "it is used to," "on the other hand," "finally, whatever." In this respect, studies ascertain (Biber, 2004; Zhang et al., 2021) that lexical bundles create building blocks of discourse and actively contribute to fluent linguistic production; however, they reduce interactivity in communication, due to their stereotypical nature. This contrasts with the decreased postoperative use of nuanced formulaic expressions, e.g., "Er," "Uhm," "Oh," and conversational speech formulas, e.g., "Are you ok?," "You've got to be kidding," "Excuse me?," which elicit more personal communication and subsequently better interpersonal relationships.

Computational analytic approaches to language in depression
In recent years, given the widespread use of social media and access to technologies, text analysis has emerged as a promising way to gather information about individuals and their illnesses. A comparison between human raters and NLP machines, such as Linguistic Inquiry and Word Count (LIWC), was performed in the medical field regarding self-reported psychological and physical health, based on written essays of chronic pain patients. The authors (Ziemer and Korkmaz, 2017) ascertained a better predictive power for human raters than for computerized text analysis on measures of depression, but LIWC and human raters had similar predictive power for other medical variables, such as pain severity, pain catastrophizing and illness intrusiveness. Nevertheless, other studies (Burkhardt et al., 2021; Dudău and Sava, 2022; Spruit et al., 2022; Santos et al., 2023) showed the efficacy of LIWC in identifying depressive mood and other mental health indicators from language. In their longitudinal research, Burkhardt et al. (2021) monitored the linguistic indicators in persons with depression and underlined that LIWC markers of depression and novel linguistic indicators of activation, such as linguistic indicators of planning and participation in enjoyable activities, strongly associate with depression scores, evaluated with the Patient Health Questionnaire 9 (PHQ-9), and with longitudinal patient trajectories, respectively. Likewise, emotional tone, pronoun rates, words related to sadness, health, and biology, and behavior activation-related LIWC categories appear to be complementary. Another study (Coello-Guilarte et al., 2019) managed to reasonably identify depression with LIWC even in the cross-linguistic context, i.e., by using an automatic translation of texts and bilingual dictionaries. Kimball et al. (2019) showed that LIWC is a sensitive tool in screening for anxiety and depression in tinnitus patients, even when the self-assessment fails to indicate relevant levels of anxiety and depression symptoms. Furthermore, a longitudinal study in which LIWC assessment was combined with other natural language processing (NLP) tools, such as SentiWordNet, LDA Topic and Word2Vec, indicated that language processing and analysis moderately predicts depression risk onset among pregnant persons, 30 days and 60 days after giving birth, respectively (Krishnamurti et al., 2022).
Thus, we may state that the study of the relationship between language and depressive disorder is a current, highly interesting topic. However, studies in this direction are only at the beginning and are difficult to implement due to linguistic variability, the difficulty of qualitative analysis of linguistic samples, the specificity of language, cultural context in relationship with language and depressive disorder, and the many different approaches to language evaluation used until now.

Study relevance
The current cross-sectional study uses an automated procedure and performs a computerized comparative assessment of language markers through Ro-LIWC2015 in a clinical sample of MDD participants and a control sample. To our knowledge, and based on the literature search, this is the first study that applies Ro-LIWC2015 in a clinical sample with depression. Previous validation studies (Dudău and Sava, 2022) use depressive language from books and literature, not clinical samples. Moreover, the current study complements the Ro-LIWC2015 validation study, one of the limits of the Dudău and Sava study being the lack of clinical samples. Furthermore, many of the previous studies used screening instruments like the PHQ-9 in the evaluation of depressive symptoms. We set out to use the Montgomery-Asberg Depression Rating Scale (MADRS) instead, which is a clinician-rated instrument designed specifically for the sensitive evaluation of the intensity of depressive symptoms (Montgomery and Asberg, 1979).
The current study provides relevant, valuable data regarding linguistic markers for the Romanian language and cultural context, since both the clinical and control samples who underwent automated assessment and rating are Romanian. Furthermore, such findings can be useful in building pre-trained language-based models to augment feature-based dictionaries for programming machines to identify people at risk for or suffering from depression (Rawsthorne et al., 2020; Balagopalan et al., 2021; Malins et al., 2022; Yu et al., 2022).

Participants
We included a sample of 62 participants with clinical depression, diagnosed with MDD based on ICD-10 criteria by two independent clinicians, and a sample of 43 controls without any history of mental health illness or mood disorders, assessed on ICD-10 criteria by two independent clinicians. The sample with depression consisted of psychiatric inpatients from Cluj County Emergency Hospital and outpatients under psychiatric medication. Exclusion criteria were: age (less than 18, more than 65) and the existence of other psychiatric disorders, also confirmed by the clinicians. Inclusion criteria for the control group were: age (between 18 and 65), no history of psychiatric disorder, and no current psychiatric symptoms, confirmed by two independent clinicians. Each participant was evaluated with clinical tests and language tests, and provided language samples in which they described something that made them happy or was pleasant for them.

Assessment tools
(1) The Depression, Anxiety and Stress Scale 21 (DASS-21) is a 21-item clinical instrument that assesses three dimensions of the emotional state present during the previous week (Lovibond et al., 2011). It is a widely used scale that has been translated, adapted and validated in numerous languages, including Romanian. (2) The Montgomery-Asberg Depression Rating Scale (MADRS) is one of the most widely used clinical instruments for quantifying the severity of depressive symptoms. It is a 10-item scale that covers the main characteristics of depression; it has the advantage of being more specific for depression than other scales since it does not include items that can be associated with other mental disorders (e.g., anxiety), and it is very sensitive to change (Montgomery and Asberg, 1979). (3) Linguistic Inquiry and Word Count (Chung and Pennebaker, 2013, 2018; Pennebaker et al., 2015) is a closed vocabulary approach, a tool that allows researchers to analyze specific language data. More precisely, it consists of an internal dictionary and automated software designed for language classification and word count. The instrument was developed initially in English and, with the help of technology and dictionaries, it was adapted to other languages. Easy download and use, affordability and a wider range of content and grammar features (Chung and Pennebaker, 2018) make it an ideal candidate for analyzing written language and language samples. Currently, the LIWC 2015 website (Pennebaker, 2023) lists the tool in 22 available languages, including Romanian.
LIWC has become a preferred research tool for language analysis. In 2023, a simple search in the WOS Core Collection databases returned 657 studies with the "topic" search criterion and 667 with the "all fields" search criterion. Some of these studies are validation and adaptation studies, but some are applicative and highlight the use of LIWC in different contexts of language analysis. A search for LIWC 2015 with the "all fields" criterion returns 54 studies, 16 of which have a specific focus on LIWC-2015. Of these, 7 are of interest for health and education (Stanton et al., 2017; Chippendale and Gentile, 2021; Monzani et al., 2021; Ansari and Du, 2022; Yang et al., 2022; Sengun et al., 2023; Wu et al., 2023). Moreover, 7 studies (Toma and D'Angelo, 2015; Coello-Guilarte et al., 2019; Kimball et al., 2019; Berkout et al., 2020; McDonnell et al., 2020; Dudău and Sava, 2022; Meyerhoff et al., 2023) focus on the use of LIWC in connection with depression, all of them relevant for the topic of this study. The value of LIWC in mental health and depression studies stems from its approach to software-based analysis. More specifically, LIWC distinguishes between content categories and functional categories; furthermore, it identifies different hierarchical patterns in the structuring of these two categories.
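The closed-vocabulary approach described above can be illustrated with a minimal sketch. It assumes a toy dictionary with hypothetical categories and English entries (not the actual LIWC or Romanian dictionary), a trailing `*` as a stem wildcard, and scores expressed as a percentage of total words; the function names are introduced here for illustration only.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (illustration only, not the LIWC dictionary).
# A trailing "*" means "match any word starting with this stem".
TOY_DICTIONARY = {
    "negate": ["not", "never", "no"],
    "posemo": ["happy", "enjoy*", "love*"],
    "i": ["i", "me", "my"],
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def matches(token, pattern):
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern

def category_percentages(text, dictionary):
    """Return, per category, the share of tokens that match it (in %)."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, patterns in dictionary.items():
            if any(matches(token, p) for p in patterns):
                counts[category] += 1
    total = len(tokens)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}

sample = "I never enjoyed walking, but I love my garden."
result = category_percentages(sample, TOY_DICTIONARY)
for category, pct in result.items():
    print(f"{category}: {pct:.2f}%")
```

Because every score is a share of the total word count, samples of different lengths can be compared directly, which matters when speech samples vary in length across participants.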

Procedure
We used interviews for collecting language samples. Each participant provided approximately 5 min of narrative production. The participants were asked to talk about something that is pleasurable for them. As language occurs in a natural manner, we used open-ended personal narrative questions. The investigator's request was: "Please describe something that you like. What are your hobbies?" Depending on the depressive condition, the investigator rephrased it as: "Please describe something that you enjoyed doing before you experienced this illness. Did you have any hobby?" Speech recording was performed with a Sony Recorder. Cloud Speech API was used to transcribe the .mp3 and .wav audio files into text files, followed by manual editing and verification of the transcripts, using WavePad Sound Editor and a text editor. Subsequently, the conversations were transferred into Microsoft Office Excel and exported to Ro-LIWC2015, which generated the data used for analysis.
The study was approved by the Ethical Committee of the Iuliu Hatieganu University of Medicine and Pharmacy Cluj-Napoca, with the number Av. 227/7.02.2022.

Statistical analysis
Data analysis was performed in R version 4.3.1. Normally distributed data is presented as mean ± standard deviation. Non-normally distributed data is presented as median (1st quartile; 3rd quartile). The Chi-square test was used to analyze differences in qualitative variables distribution across groups. The Wilcoxon Rank Sum test (Mann-Whitney U test) was used to assess the differences between non-normally distributed variables across groups. The Kolmogorov-Smirnov test was used to assess the normality of data distribution. Regression models were used to control for the effect of sex, age, and years of education on the significant differences in language parameters across groups. The models were built using a step-by-step approach, where we first tested for collinearity between age and years of education, then we introduced, consecutively, the variables group, sex, and then years of education in the regression model, if relevant for each language parameter. At each step, the standard error of the coefficient of interest was monitored for significant increase (suggestive for high collinearity). Variables that generated high collinearity were excluded from subsequent models. Spearman and Pearson correlation coefficients were calculated, according to data distribution, to test for linearity.
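As a rough illustration of the non-parametric comparison described above, the following sketch computes the median with quartiles and a Mann-Whitney U statistic with a two-sided p-value. It is written in Python rather than R, uses invented numbers, and relies on the normal approximation with tie correction but without continuity correction, so its p-values may differ slightly from those of the software used in the study. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median, quantiles

def describe(values):
    """Return (median, 1st quartile, 3rd quartile) of a sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p

# Invented category percentages for two groups (illustration only).
group_a = [5.1, 6.0, 4.8, 7.2, 5.5]
group_b = [3.2, 4.0, 3.9, 2.8, 4.1]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(describe(group_a), describe(group_b), u_stat, round(p_value, 4))
```

For small samples without ties, an exact test is usually preferable to the normal approximation used here.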

Results
The demographic characteristics of the two samples are presented in Table 1. Depressive patients were significantly older than controls and had significantly fewer years of education.
Concerning the presence of depressive symptoms, the sample with depression recorded a median DASS-21 score of 10 (7; 15), whereas controls recorded a median score of 1 (0; 3); concerning the severity of depression measured with MADRS, the sample with depression had a median score of 31 (25; 36), compared with 2 (0; 4) in controls.
The differences in language parameters between depressive patients and controls are summarized in Table 2.
The overview of differences regarding LIWC between the sample with depression and controls is presented in Figure 1. The significant differences recorded refer to all aspects of language, i.e., morphology, syntax and semantics.
Age and years of education were significantly correlated (Spearman's rho = −0.544, p < 0.001) in our samples. Therefore, to avoid violating the assumption of no collinearity, we only included sex and years of education in the regression models. The relationships between language parameters that were significantly different between our samples on the one side, and sex and years of education on the other side, are summarized in Table 3. The best regression models (i.e., highest multiple R², no parameter without statistical significance, and no highly collinear variables) built for controlling for the influence of sex and years of education are shown in Table 4. Regression models in which adding sex or years of education did not improve the model are not shown, i.e., models built to control for the confounding effect of sex and years of education on: prepositions, auxiliary verbs, negations, positive emotions, negative emotions, anxiety, tentative, biological processes, health, ingest, affiliation, power, focus on the past, leisure, home, and money. In all models, the group variable (sample with depression vs. controls) retained an independent effect, except for Big Words and achievement, for which years of education was a better predictor.
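The collinearity check between age and years of education can be sketched as follows. This is a minimal Python illustration with invented values, not the study data or the R code used by the authors; Spearman's rho is computed as the Pearson correlation of average ranks, and all names are hypothetical.

```python
import math

def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(midranks(x), midranks(y))

# Invented ages and years of education (illustration only).
age = [23, 35, 41, 52, 58, 64]
education_years = [16, 17, 12, 12, 10, 8]
rho = spearman_rho(age, education_years)
print(f"Spearman's rho = {rho:.3f}")
```

A strongly negative rho in such a check would suggest that the two predictors carry overlapping information, which is the reason given above for not entering both in the same model.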

Discussion
The current study investigates the linguistic markers and changes in narrative language in a sample of patients with depression versus controls. The study used an automated speech analysis tool, the Romanian version of LIWC 2015, for the assessment. Our study shows specific patterns of language in the sample of patients with depression, who display specific language markers. Nevertheless, we identified slight differences from our previous studies.
Regarding the choice of the instrument, the hierarchical structure of LIWC is significantly more relevant and brings a better outlook on the relationship between thinking and language, compared with other types of linguistic content analysis, manual or automated, such as SentiWordNet, SentiStrength, ANEW or General Inquirer (see Dudău and Sava, 2022), the SetembroBR corpus (Santos et al., 2023), and Inflexitext (Berkout et al., 2020). For instance, the Affect category in LIWC is a superordinate category, defined by several lower order categories (subcategories), more specifically Positive Affect, Negative Affect, Anxiety, Anger, and Sadness. Thus, LIWC may generate significant information from the relationship between the category, i.e., affect, and another psychological construct, such as a cognitive process, or one or more of the five aforementioned lower order categories. Moreover, the hierarchical sub-category of 1st person pronoun use in LIWC, listed in the superordinate category of Pronouns, appears significantly important for the diagnosis of depression. This result emerges from a meta-analysis (Edwards and Holtzman, 2017) focused on the association between depression and the use of first-person singular pronouns in a sample of k = 21, N = 3,758, using LIWC. The results revealed a small correlation, r = 0.13, 95% CI = [0.10, 0.16], and the authors conclude that first-person singular pronouns can be used as a linguistic marker of depression, in a manner transcending demographic parameters and not mediated by gender. The first-person pronoun is also discussed when addressing how interpersonal connections are reflected in the relationship between language and depression (Meyerhoff et al., 2023). People with higher levels of depressive symptoms (PHQ-8) tend to use more differentiation words; (1) in connection with close contacts, they used more first-person singular, filler, sexual, anger, and negative emotion words; (2) in connection with non-close contacts, they used more conjunctions, tentative, and sadness-related words, and fewer first-person plural constructions. A study carried out by Smirnova et al. (2018) showed slight differences between the samples with depression and normal sadness, respectively. The use of reflexive (e.g., myself) vs. personal pronouns is different in patients with mild depression. This concurs with the initial results of a study that compared the written language of students with current, previous, and no depression, respectively. The authors (Rude et al., 2004) found that 1st person pronouns were mainly used in the text by currently or previously depressed participants. This is a feature of interest for the sample of participants vulnerable to depression. In our previous study (Trifu et al., 2017), we used a manual scoring methodology and found that language in depression is sensitive to the use of the first person singular pronoun and the tendency of self-focus. This is similar to the study of Zimmerman et al. (2018), who ascertained that the predominance of first-person pronoun use is associated with the severity of depression and worsening of depression symptoms in clinical inpatients. More recently, a study (Yahya and Rahim, 2023) observed a higher rate of first-person pronoun use in X (Twitter) users who exhibit depression, i.e., almost double that of average users.
The computerized analysis through LIWC in our present study shows no significant differences in the use of the 1st person singular personal pronouns between the sample with depression and controls. However, we ascertained significant differences in the use of 1st person plural personal pronouns between samples. More specifically, the self-referential and self-focused language range is active in participants with depression. These results may be explained by the specificities of the Romanian language and the cultural context in which the person experiences isolation. Another explanation is that Romanian is a Romance language; this language family uses mostly non-accentuated or incomplete forms of 1st person singular pronouns, a feature that is probably not detected via computerized analysis. Data from the LIWC 2015 multilingual analysis study (Dudău and Sava, 2021) support this hypothesis: the "I" pronoun has the smallest usage in Romanian, with an average of 1.17 (SD = 1.12), compared with English, M = 2.78 (SD = 2.21), and Dutch, M = 2.81 (SD = 2.27). Brazilian Portuguese, another Romance language, exhibited values similar to Romanian in this respect, M = 1.87 (SD = 1.63). Moreover, Romanian expresses self-reference (1st person singular) via inflectional and morphological verb forms included in the conjugation, and via morphemes. The subject of the sentence is often implicit in Romanian; therefore, the actual use of a personal pronoun with the verb is not necessary, a feature specific to the Romanian language.
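The effect of implicit subjects on pronoun counts can be illustrated with a minimal sketch. It assumes a small, hypothetical list of Romanian 1st person singular pronoun forms (not the entries of the dictionary used in the study) and two invented sentences that both mean "I go to the mountains", one with and one without the explicit pronoun "eu".

```python
import re

# Hypothetical list of Romanian 1st person singular pronoun forms,
# used only for illustration; it is not the dictionary used in the study.
FIRST_PERSON_SINGULAR = {"eu", "mine", "mie", "mă", "îmi", "mi"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def first_person_rate(text):
    """Percentage of tokens that are listed 1st person singular pronouns."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FIRST_PERSON_SINGULAR)
    return 100.0 * hits / len(tokens)

explicit_subject = "Eu merg la munte."   # pronoun stated
implicit_subject = "Merg la munte."      # person carried by the verb form only

print(first_person_rate(explicit_subject))
print(first_person_rate(implicit_subject))
```

Both sentences are equally self-referential, yet only the first contributes to a pronoun-based score, which is one way a word-count approach could underestimate self-focus in a language with implicit subjects.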
The current study examines the influence of depression and affective states on linguistic style. Other studies of language markers related to emotional states and the use of personal pronouns have differentiated between types of use of personal pronouns and their connection with affective states. For example, a study focused on language, depression and affect (Bernard et al., 2016) examines the impact of affective states on language. The result of this study suggests that both depression and temporary negative affect impact pronoun use; however, only depression influences the use of 1st person pronouns, while negative affect influences the use of 3rd person pronouns. Similarly, our study shows a lower use of impersonal pronouns in the sample of participants with depression (M = 2.52), compared with controls (M = 3.2), p = 0.024. As personal and impersonal pronouns are opposite types of pronouns, these results suggest the tendency of patients with depression to self-focus. This supports the theory of social disintegration (Durkheim, 2005). Moreover, the effect of 1st person personal pronouns and 'I'-speech use in impersonal contexts appears limited, due to diminished between-persons variability (Tackman et al., 2019).
For other function words, the use of Prepositions, Auxiliary verbs, Conjunctions, and Negations categories is slightly different in the sample with depression.This sample uses less prepositions (p = 0.009), more auxiliary verbs (p < 0.001) and more conjunctions (0.010).The role of the preposition is to indicate direction, place, location, and spatial relationship (Encyclopedia Britannica, 2023;Nastachowski, 2023), or to create a connection between objects in a sentence.Limited use of prepositions appears to decrease daily communication, truncate sentences, and generate temporo-spatial disorientation.This result concurs with our previous research (Trifu et al., 2017) where linguistic markers show the use of impersonal, truncated sentences in connection with cognitive impairment.Moreover, persons with depression use short sentences, which in written language involves more use of punctuation marks, especially periods.Our results confirm this language pattern; more specifically, the use of LIWC categories All punctuation marks and Period is significantly higher in the sample with depression, compared with controls.Also, prepositions contribute to spatial cognition, together with the use of spatial terms.Limited spatial cognition and limited daily communication are characteristics of depression and other disorders based on perceptual errors, such as autism (Bochynska et al., 2020).Both people with depression, and those with autism perform poorly in the use of connective sentence structures such as prepositions.In contrast, increased use of auxiliary verbs and conjunctions in the sample of patients with depression reflects their need to focus on action and their mental state.By definition "auxiliary verbs, also known as helping verbs, are verbs used in conjunction with main verbs in order to express grammatical functions such as tense, mood, voice, aspect, and more" (IELTS, 2023).Increased use of auxiliary verbs reflects an extensive mood dysfunction, expressed through 
language.These data are in concordance with other observations from previous studies (Tausczik and Pennebaker, 2009).In the context of depression, people with mood changes tend to remain in a contemplative, obsessive thinking pattern and display the same revolving ideas.The presence of auxiliary verbs outlines a characteristic of this fixed pattern in thinking, while the use of strong verbs in the present tense, which express directive clear action, is limited.A longitudinal study of written samples of persons with high scores on PHQ-9, GAD-7 shows increased use of auxiliary verbs and negations in the language of people with depression and suicidal ideation (O'Dea et al., 2021).These language markers can act as robust predictors of mental health.In our study, we obtained very high scores (M = 5.34 compared with M = 3.71, p = 0.006) for negation (e.g., use of words like not, never, nowhere etc.).This confirms the tendency of people with depression toward negative interpretation and negative framing.Our data concur with data of another impactful study which suggests that negation words, together with negative emotions words, use of 1st person singular, use of 2nd person pronouns, and use of swear words, respectively, were "significantly positively associated with current depression symptom severity" (Kelley and Gillan, 2022, p.3).
A high number of conjunctions also emphasizes that people with depression tend to elaborate different cognitive scenarios which are very difficult to manage.When cognitive scenarios overlap, the effort to separate and clarify them generates a clutter of linguistic function words, expressing intention to communicate through language paired with poorly communicated content.Thus, patients with depression do not complete the intention to communicate through language.This increases the emotional aspect of communication, as these function words are considered primarily emotional intensifiers (Savekar et al., 2019).Their use is more automatically activated, compared with the use of the content words nouns, adjectives and verbs; hence, they occur in an increased number.The increased use of function words is evidence of a specific psychological status with diminished selfregulation and self-control, hence its potential as a linguistic marker of depression.
On a semantic level, the results of our study indicate that the use of "affect words" displays statistically significant differences. More specifically, the use of words that express positive affect is very limited in the sample with depression compared with controls, i.e., M = 3.48 (2.47; 5.16) compared with M = 5.66 (3.99; 7.01), p < 0.001. This indicates that people with depression tend to lack positive attitudes, more specifically those related to problem-solving, which in turn burdens everyday life. The results are similar to the available literature. Newell et al. (2018) found that people with chronic stress who use depressive language also used fewer positively valenced words and more negatively valenced words. Also, Capecelatro et al. (2013) found that individuals with a history of depression longer than 5 years used fewer words related to positive emotions. A consistent observation is underlined in a study (Rude et al., 2004) that compared the written language of college students who had previously experienced depression symptoms. This particular group used marginally fewer positive affect words and more negative affect words. In our study, negative affect words have a higher rate in the sample with depression, M = 2.40 (1.42; 3.47), compared with controls, M = 1.54 (1.01; 2.35), p = 0.011. Similar results were found in another study, where the authors underline that "negative affect predicted use of negatively focused emotion language, highlighting the potential importance of negative affect in negative emotion word use" (Bernard et al., 2016, p. 323). In our study, negative affect words were related to the presence of anxiety valence words, M = 0.29 (0.00; 0.68) compared with limited or no use in controls, p = 0.007; however, we found no differences between samples in the use of anger words or sadness affect words (SAW). Although the group with depression used almost double the amount of SAW, M = 0.76 (0.14; 1.57), compared with controls, M = 0.38 (0.00; 1.03), the difference was not statistically significant. This contradicts expected results, and may be influenced by sample size and study methodology, i.e., the type of information requested from participants, which was to describe something nice or pleasurable for them, not a neutral topic. Previous studies (Capecelatro et al., 2013) use the International Affective Picture System (IAPS) with more neutral stimuli. More recently, the Open Affective Standardized Image Set (OASIS) stimulus set, by Benedek Kurdi, Shayn Lozano, and Mahzarin R. Banaji, has become available in the public domain, and may prove a better choice for eliciting information from patients in future studies (Kurdi et al., 2023). Moreover, a study (Al-Mosaiwi and Johnstone, 2018) observed a lower prevalence of negative emotion and content dictionaries such as "sad," "affect," and "feel" in a forum of people with suicidal ideation, compared with people with anxiety and depression. In this respect, it is possible that the use of sadness affect words is related to the severity of depression and the presence of suicidal ideation, which should be addressed in future studies.
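The group summaries above follow the pattern M = x (a; b). This section does not say what the bracketed values are; the sketch below assumes, purely for illustration, that M is the median and the bracketed values are the first and third quartiles of per-participant category scores. The function name `summarize` and the input values are hypothetical and do not reproduce any study data.

```python
import statistics


def summarize(scores):
    """Format scores as 'M = median (Q1; Q3)', assuming that reading of the notation."""
    # method="inclusive" treats the data as the whole population of interest;
    # other quartile conventions would give slightly different cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return f"M = {median:.2f} ({q1:.2f}; {q3:.2f})"


# Made-up per-participant percentages for one LIWC category.
example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(summarize(example_scores))
```

With skewed, zero-inflated scores such as those for rarely used categories, a median with quartiles is generally more informative than a mean, which is one reason this reading of the notation seems plausible.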
Our study shows no significant differences between the sample with depression and controls for the LIWC categories that explore cognitive and perceptual processes via language, suggesting that both samples represent these processes similarly in language. An exception is the tentative subcategory from the LIWC cognitive process category. The sample with depression used fewer tentative words, M = 4.72 (2.83; 6.10), compared with controls, M = 5.30 (4.27; 7.79), p = 0.045. This result is expected and explainable in the context of depression. The reduced use of tentative words, i.e., words that express probability or uncertainty, indicates patterns of absolute, black-and-white thinking, associated with radical decisions and perseverance. Previous studies (Al-Mosaiwi and Johnstone, 2018; Aguilera et al., 2019) underline the absolute thinking bias and cognitive rigidity in persons with depression.
This tendency for absolute thinking and bias in cognition is also reflected in the category of Personal concerns, where the Leisure, Home and Money categories displayed statistically significant differences between the sample with depression and controls. The concern for leisure and relaxation is less important in the sample with depression, M = 1.09 (0.59; 1.84), compared with controls, M = 2.21 (1.51; 3.52), p < 0.001; the focus of attention is the concern for the categories Home in the sample with depression, M = 0.55 (0.00; 0.95), compared with controls, M = 0.22 (0.00; 0.62), p = 0.013, and Money, respectively, M = 0.27 (0.00; 0.65) in the sample with depression compared with controls, M = 0.00 (0.00; 0.21), p = 0.003. Surprisingly, we did not ascertain any difference between the depression and control samples regarding the categories of Religion and Death. Nevertheless, it is possible that these two categories become relevant within the sample of patients with depression, in specific subgroups such as those with a long history of depressive symptoms, suicidal ideation, or suicide attempts; this may be the scope of future studies.

FIGURE
The difference based on LIWC categories between the sample with depression and controls.
Likewise, the language of patients with depression indicates more focus on time components. The result of our study confirms previous results regarding the verb/action tense and relativity categories of LIWC. The language of the sample with depression is more focused on the past tense, M = 10.79 (7.83; 13.74), compared with controls, M = 6.47 (4.61; 9.71), p < 0.001. This tendency of increased focus on the past and diminished focus on the present and future tense was also observed in our previous study (Trifu et al., 2017). The tendency is in connection with a negative exploratory style, in which the person with depression focuses on negative past events (Kellogg et al., 2020), while their capacity to project themselves in a positive present or future action diminishes (Peterson and Seligman, 1984; Pomerantz and Rose, 2014). The use of interrogative structures underlines that people with depression tend to ask themselves and others many questions, and are more focused on asking than on finding solutions. These findings concur with the mood expression related to ruminations, which is also emphasized by the use of an increased number of past tense verbs (Eysenck et al., 2006; Groß et al., 2017). We ascertained a lower number of future tense verbs in the sample with depression than in controls, similar to Smirnova et al. (2018). However, the use of future tense words is high in our sample with depression, which indicates an inconsistent pattern at the group level.
Body representation reflected through language is different in the sample with depression, compared with controls. Patients with depression are more preoccupied with the health and ingestion categories, but perform similarly to controls in the body and sexual categories. A possible explanation regarding the health category is that the group with depression extensively ruminates on different topics and actions, especially in connection with their own health, and expresses concerns on health subjects. Results regarding ingestion can be interpreted in the context of somatic symptoms associated with depression. More specifically, reduced appetite and weight loss, or increased cravings for food and weight gain, are symptoms specific to depressive disorder in both ICD-11 and DSM-5 (DSM, 2013).
Moreover, symptoms listed as diagnostic criteria for depression (ICD-11 and DSM-5) reflect changes in drives, which in turn may be expressed through language. For example, social withdrawal as a diagnostic criterion reflects the decreased need for affiliation, which is expressed in the LIWC Drives category as decreased affiliation. In our study, the Drives category from LIWC elicits statistically significant differences between samples on multiple elements, such as affiliation, achievements and power, with higher or lower scores depending on the valence of the drives. The depression sample has lower affiliation and risk scores and higher achievements, power, and risk scores. The results confirm available literature; other studies associate depression and loneliness with lower scores for linguistic markers of social relationships and activities such as affiliation from LIWC (Liu et al., 2022). Similar to our study, another study indicates that in the Drives category, reward words were negatively associated with depressive symptoms (Bernard et al., 2016).
These findings, taken together with experimental results of other studies, can be useful in building pre-trained language-based models that augment feature-based dictionaries for programming machines to identify people at risk for or suffering from depression, since this approach has been demonstrated to be effective (Rawsthorne et al., 2020; Balagopalan et al., 2021; Malins et al., 2022; Yu et al., 2022).

Study limitations
An important limitation of this study is the relatively small sample size, which might lead to some language parameters being classified as irrelevant for differentiating between depressive patients and controls. The small sample size might also lead to negative results concerning the influence of years of education and sex on language parameters in the regression models. Our results, however, are generally in line with studies using significantly larger sample sizes and are backed by the theoretical models generally accepted in the field. Furthermore, we did not collect information about the specific medication the patients were using, their work status, marital status and duration of illness. The effect of antidepressant medication is of particular interest in this case, as some antidepressants are more "activating" and others might induce symptoms like fatigue, emotional blunting, or sedation, which might have an effect on language and thus represent an important confounding factor. Another limitation might be that we did not assess personality traits; these might act as confounding factors, since personality traits might influence both the risk for depressive disorders and language parameters, especially through character traits like self-transcendence. To our knowledge, however, there is no study published yet on the role of personality traits as a confounding factor in the relationship between the phenomenology of depression and its expression through language.

Conclusion
Depressive patients use significantly different language, in both form and content, compared with people without mood or behavioral disorders. Mainly, patients with depression use different approaches in sentence structure: short sentences, which require multiple use of the period and, implicitly, directive communication with limited content of ideas. The sample with depression predominantly uses impersonal pronouns, first person pronouns in the plural rather than the singular form, and a limited number of prepositions, together with an increased number of conjunctions, auxiliary verbs and negations. They also use verbs in the past tense, and much less in the present tense, an increased number of words indicating negative affect and anxiety, and a limited number of words indicating positive affect.
The main topics of interest of the sample with depression are leisure, time and money from the category Personal concerns, time from the category Relativity, and agreement from the informal language category. It is important to mention that, in our sample, the level of education acts as a predictor in the regression model, which is of interest for future research regarding the importance of the level of education as a potential protective factor for depression.

TABLE 1
Demographic characteristics of study sample.

TABLE 2
Differences in low-level features between groups.

TABLE 3
Relationship between significantly different language parameters across groups and sex and years of education.