The Effect of User Psychology on the Content of Social Media Posts: Originality and Transitions Matter

Multiple studies suggest that frequencies of affective words in social media text are associated with the user's personality and mental health. In this study, we re-examine these associations by looking at the transition patterns of affect. We analyzed the content originality and affect polarity of 4,086 posts from 70 adult Facebook users contributed over 2 months. We studied posting behavior, including silent periods when the user does not post any content. Our results show that more extroverted participants tend to post positive content continuously and that more agreeable participants tend to avoid posting negative content. We also observe that participants with stronger depression symptoms posted more non-original content. We recommend that both affect transition patterns derived from social media text and content originality be considered in further studies on mental health, personality, and social media.


INTRODUCTION
Many people express rich moods and emotions in their social media posts. Psychologists use the word "affect" to describe these experiences of feelings and emotions. Affect plays an important role in cognition (Gross et al., 1998) and well-being (Silvera et al., 2008). Therefore, affective expressions in social media text have emerged as a key variable for making inferences about users' personality traits (Golbeck et al., 2011; Bachrach et al., 2012; Farnadi et al., 2013) or mental health (De Choudhury et al., 2013; Coppersmith et al., 2014; De Choudhury and De, 2014; Bazarova et al., 2015).
Existing studies formulate the associations between affect and well-being based on the frequencies of affective words used in social media text (Yarkoni, 2010; Golbeck et al., 2011; Schwartz et al., 2013; Park et al., 2015; Chen et al., 2020). However, patterns of affect are an important class of symptoms of affective disorders (Frijda, 1993; Rottenberg, 2005; Bylsma et al., 2011; Carlo et al., 2012; Thompson et al., 2012; Houben et al., 2015; Sheppes et al., 2015). Personality may also predispose individuals to specific moods (Rusting and Larsen, 1995; Rusting, 1998). With this in mind, we examined how patterns of affect expressed in social media text are related to users' mental health and personality.
While non-original content has been extensively studied in opinion mining (Balahur et al., 2009; Agarwal et al., 2011), it has been comparatively neglected in the study of psychological interpretations of social media data. However, social media users often use lyrics or quotes to communicate their emotions. Such content comes from other media, such as literature, videos, films, or music, which can evoke strong emotional experiences (Scherer and Zentner, 2001; Juslin and Laukka, 2004; Scherer, 2004). Since the affect of the non-original content may be different from the social media users' affect when they post this content, we differentiated between original and non-original content in our analysis.
This pilot study was designed to examine the following research questions:

1. Changes in Affect: To what extent do changes in the affect of social media posts correlate with users' personality traits and mental well-being?
2. Originality: To what extent does the use of non-original material in their posts correlate with users' personality traits and mental well-being?
Following best practice in sentiment analysis and opinion mining, we distinguish between positive, negative, neutral, and mixed (both positive and negative) affect (Moilanen and Pulman, 2007; Agarwal et al., 2011; Rosenthal et al., 2015). We used a well-known dataset, myPersonality (Bachrach et al., 2012; Youyou et al., 2015), which enriches Facebook posts with many validated psychological measures. In myPersonality, positive mental well-being is measured using the Satisfaction with Life Scale (Diener et al., 1985, 1999), while the presence of depressive symptoms is assessed using the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff, 1977). Personality traits are established following the OCEAN model (McCrae and John, 1992), which consists of the five traits Openness to Experience, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.
We included all 70 adult users who provided sufficient, regular Facebook data for 2 months before completion of the CES-D questionnaire and corrected for multiple comparisons in our statistical analysis. We find that the transitions from one affective state to another expressed in social media posts give us a highly nuanced view of personality traits. While the number of non-original posts in one's social media status updates is closely linked to depression symptoms, this link is mediated by neuroticism.

BACKGROUND
Affect refers to both mood and emotion. Moods are slow-moving states that can be influenced by people, objects or situations, whereas emotions are quick reactions to stimuli (Watson, 2000; Rottenberg and Gross, 2003) and are highly situation- or object-specific (Bylsma et al., 2008). Mood influences the probability of having emotions of the same valence: negative mood facilitates negative emotions, and positive mood makes positive emotions more likely (Fredrickson, 1998; Rottenberg, 2005). Affect is an important predictor of mental well-being, including a person's overall satisfaction with life (Headey et al., 1993; Singh and Jha, 2008; Chen et al., 2017), and the level of symptoms of depression (Coppersmith et al., 2015; Resnik et al., 2015; Tsugawa et al., 2015).
Personality also predisposes people to certain affective states (Rothbart et al., 2000). While neuroticism is associated with negative affect (Pishva et al., 2011), positive affect is strongly linked to extroversion (Fujita et al., 1991; Watson and Clark, 1997). Extroverts experience more positive affect because they engage in more social situations (Diener and Emmons, 1984; Ryan and Deci, 2001). Individuals who score high on agreeableness have a greater ability to regulate negative affect (Meier et al., 2006; Haas et al., 2007). This relationship between affect and personality is also reflected in social media studies (Pennebaker and King, 1999; Golbeck et al., 2011; Schwartz et al., 2013; Lin et al., 2017). For example, people who use negative affective words in their social media posts tend to have lower conscientiousness, lower agreeableness (Golbeck et al., 2011), and higher neuroticism (Pennebaker and King, 1999).
In psychology, quantitative representations of affect are typically multidimensional (Russell, 1980). In this study, we focus on valence, which is represented in many classic affect models. Traditional measures, such as the Positive and Negative Affect Schedule (PANAS) (Watson et al., 1988), report the strength of positive and negative valence. Mixed valence can occur when people experience "dialectic" emotion, which is a mix of positive and negative emotions (Schimmack et al., 2002; Russell, 2003).
The personality trait measurements in myPersonality are based on Costa and McCrae's well-validated OCEAN model (McCrae and John, 1992). The model consists of five dimensions: extroversion, agreeableness, conscientiousness, neuroticism, and openness to experience. Neuroticism refers to the degree of emotional stability. Openness reflects the degree of creativity and curiosity. Conscientious individuals tend to be careful and diligent. Extroversion refers to a tendency to be energetic and friendly. Agreeableness reflects the tendency to be compassionate and to cooperate with others (Digman, 1990). The five-factor structure has proved to be robust in both self and peer ratings (McCrae and John, 1992), in both children and adults (Mervielde et al., 1995), and across different cultures (McCrae and Allik, 2002), and to be stable over time (McCrae and John, 1992).

DATA AND METHODOLOGY
The myPersonality data set (Bachrach et al., 2012; Youyou et al., 2015) contains more than 180,000 Facebook users, enriched with a variety of additional validated scales (Bachrach et al., 2012). The collection of myPersonality data complied with the terms of service of Facebook, informed consent for research use was obtained from all users, and researchers had to seek permission to use the dataset. Permission for the use of this database was obtained before it closed for new studies in 2018. The study was granted Ethical Approval by the Ethics Committee of the School of Informatics, University of Edinburgh.

Choice of Scales
From the extensive data collected within myPersonality, we chose two scales for quantifying mental well-being, the Center for Epidemiologic Studies Depression Scale (CES-D) and the Satisfaction with Life Scale (SWL). The CES-D scale measures a key aspect of mental health, the presence of depression symptoms (Radloff, 1977). The scale has high internal consistency, test-retest reliability (Radloff, 1977; Roberts, 1980; Orme et al., 1986), and validity (Orme et al., 1986). Following previous social media studies (Park et al., 2012; De Choudhury et al., 2013), we adopt a score of 22 or higher as a cut-off value for likely depressive disorder (maximum score: 60). The five-item SWL scale has been tested across different cultures and age groups (Pavot and Diener, 2009) and has been found to have high internal consistency and temporal reliability (Diener et al., 1985). Personality traits were measured with a 100-item scale built from items of the open-source International Personality Item Pool (Goldberg et al., 2006) that were validated against the NEO-PI-R (Schwartz et al., 2013) instrument.

Selection of Participants
The data set was originally designed for a study of the effect of mental well-being and values on social media disclosure. We therefore selected only those participants who had completed the CES-D scale, the SWL scale, and the Schwartz Value survey (Schwartz, 1992) in addition to the full personality questionnaire. A total of 301 participants in myPersonality provided full data for all four scales.
To ensure we had enough posts to assess the frequency of affect transitions, we only included users in our sample that regularly updated their public Facebook feed (regular users). We defined regular users as individuals who posted on average twice a week or more. We estimated posting frequency using the average post-count per day during the sampling frame. If an individual had a post-count per day of 0.3, this individual made around 110 posts in 365 days, which was roughly equivalent to an average of two posts per week. Of the original 301 participants, 122 (40.5%) were regular users.
Since the CES-D asks about symptoms in the past week, we excluded a further 31 users who had not posted any content in the week before completing the CES-D scale. We then focused on a 60-day span (2 months) before CES-D completion to ensure that we had sufficient data to track the development of users' moods. We removed 14 users who contributed <20 posts during that time. Finally, we removed four users who were under 18 years old and three users with more than 20% of their posts written in a language other than English, because English was the common language of the annotation team. The final sample consisted of 4,086 posts from 70 users.
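The selection steps above can be summarized as a single eligibility check. The following is only a minimal sketch: the record type, its field names, and the exact posting-rate threshold of 2/7 posts per day are illustrative assumptions, not the schema or code used with myPersonality.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical per-user summary; field names are illustrative only."""
    posts_per_day: float        # average over the sampling frame
    posted_in_last_week: bool   # any post in the week before CES-D completion
    posts_in_window: int        # posts in the 60 days before CES-D completion
    age: int
    non_english_share: float    # fraction of posts not written in English


def is_eligible(user: UserRecord) -> bool:
    """Apply the inclusion criteria described in the text."""
    return (
        user.posts_per_day >= 2 / 7          # assumed reading of "twice a week or more"
        and user.posted_in_last_week
        and user.posts_in_window >= 20       # users with <20 posts were removed
        and user.age >= 18                   # users under 18 were removed
        and user.non_english_share <= 0.20   # users with >20% non-English posts were removed
    )


# Sample sizes reported in the text at each stage of the funnel.
full_scale_users = 301
regular_users = 122
final_sample = regular_users - 31 - 14 - 4 - 3
```

With the counts reported above, `final_sample` evaluates to 70, which matches the stated final sample size.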

Social Media Affect
For the purpose of this study, we refer to the affect shown in social media posts as social media affect. Following Mohammad (2016), we operationalize valence as the post author's attitude toward a primary target of opinion. We refer to the "dialectic" affective state as mixed valence. If there is no clear trend toward positive or negative affect, the associated valence is neutral.
After extensive piloting, we created an annotation guideline (available as part of the supplementary material) that was largely based on Mohammad (2016)'s work on defining the valence of a social media post. Each post is assigned one of four affect polarities: + (positive), − (negative), ± (mixed), or 0 (neutral). We used manual annotation since this is commonly used in computational linguistics to create a baseline gold standard data set for further analysis (Teufel, 1999).
Of the 4,086 posts, 2,698 (66%) were annotated by a team of six trained annotators and 1,185 (29%) by the first author; 5% of all posts were annotated by all seven annotators to establish inter-rater reliability, which was measured using Cohen's κ (Gamer et al., 2019). Average inter-rater reliability between the first author and the annotators was 0.88, and it was 0.78 among the six annotators.
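For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two annotators with the agreement expected by chance from their label frequencies. The sketch below is a plain-Python illustration, not the R implementation cited above; averaging κ over annotator pairs is an assumption about how the reported averages could be obtained.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed proportion of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average kappa over all annotator pairs (assumed averaging scheme).

    `annotations` maps an annotator name to that annotator's label list.
    """
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)
```

For example, `cohens_kappa(["+", "+", "-", "-"], ["+", "+", "-", "+"])` gives an observed agreement of 0.75 against a chance agreement of 0.5, so κ = 0.5.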

Originality
We define posts that consist of quotes from sources, such as song lyrics, books, or movies as non-original content; all other content was defined as original. Since non-original content might not directly reflect the user's moods or emotions, annotators were instructed to annotate such posts according to the likely emotions of the author. For example, if a post consists of an uplifting motivational quote, annotators considered the underlying valence to be positive.
In order to establish the originality of a post, we retrieved the first page of results obtained by searching for the post text using the Google API. For each web page on the first page of results, we computed the cosine similarity between the post content and the page content. Posts with a cosine similarity >0.96 were labeled as non-original, and posts with a cosine similarity between 0.92 and 0.96 for which the website link or website name included the words "lyrics" or "quote" were labeled as potentially non-original. Posts with a cosine similarity lower than 0.92 were labeled as original. The cutoff points were determined based on a sample of 300 posts manually annotated for originality by the first author. On these posts, the classifier yields 100% recall, 81% precision, and an F1-score of 0.89. In our data set, 287 (7%) of all posts were identified as non-original.
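The thresholding rule can be illustrated as follows. This is a sketch under several assumptions that the text does not specify: a simple bag-of-words representation, a regular-expression tokenizer, and the treatment of posts in the 0.92 to 0.96 band without a "lyrics" or "quote" source as original. Retrieval of search results is left out; `pages` stands in for whatever the search step returns.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def classify_originality(post, pages):
    """Label a post given `pages`, a list of (source, page_text) pairs.

    `source` is the link or site name of a search result (hypothetical input).
    """
    label = "original"
    for source, page_text in pages:
        sim = cosine_similarity(post, page_text)
        if sim > 0.96:
            return "non-original"
        is_quote_site = any(k in source.lower() for k in ("lyrics", "quote"))
        if 0.92 <= sim <= 0.96 and is_quote_site:
            label = "potentially non-original"
    return label
```

In practice the page text would probably need to be segmented before comparison, since a short post compared against a whole web page would rarely reach a similarity above 0.9; how this was handled is not stated in the text.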

Modeling Affect Transitions
We examine two types of transitions:
• Post-level vs. Day-level: Post-level transitions focus on changes in affect between subsequent social media posts, whereas day-level transitions focus on changes in overall dominant affect between subsequent days.
• Silence vs. Non-silence: Not all users post every day. In our default models, these silent days are ignored, whereas in our with-silence models, days without posts are explicitly modeled as Silence.
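To make the two distinctions concrete, the sketch below counts transitions between consecutive labels, with and without an explicit Silence state. The label strings and the choice of one Silence token per day without posts are assumptions made for illustration; the text does not state how silent days were encoded.

```python
from collections import Counter

SILENCE = "S"  # assumed token for a day without posts


def count_transitions(labels):
    """Count (previous, next) pairs over a sequence of affect labels."""
    return Counter(zip(labels, labels[1:]))


def flatten_without_silence(day_to_labels, n_days):
    """Concatenate the posts of all days, skipping days without posts."""
    sequence = []
    for day in range(n_days):
        sequence.extend(day_to_labels.get(day) or [])
    return sequence


def flatten_with_silence(day_to_labels, n_days):
    """Concatenate the posts of all days, inserting SILENCE for empty days."""
    sequence = []
    for day in range(n_days):
        sequence.extend(day_to_labels.get(day) or [SILENCE])
    return sequence
```

For a user who posts two positive posts on day 0, nothing on day 1, and one negative post on day 2, the default model sees the sequence `+ + -`, while the with-silence model sees `+ + S -` and therefore records a positive-to-Silence and a Silence-to-negative transition instead of a direct positive-to-negative one.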
The post-level social media affect is likely to be influenced by underlying emotions, which change more quickly, whereas the day-level social media affect is likely to be influenced by underlying mood during the day. Day-level affect was calculated as follows. If the majority of the posts $p_{ij}$ on day $d_j$ have the same affect $a$, then the affect of day $d_j$ is set to $a$. If there is an equal number of positive (+) and negative (−) posts or if the number of mixed affect (±) posts is equal to the number of posts with other types of affect, affect is set to ± (mixed). For transitions between original and non-original posts, we only consider the post-level representation. Table 1 shows an example of the affect and originality representations.
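The day-level rule can be written as a small function. This sketch uses ASCII label strings as stand-ins for the four polarities and interprets "majority" as more than half of the day's posts. The text does not say what happens when neither condition applies, so the function returns `None` in that case rather than guessing.

```python
from collections import Counter

POSITIVE, NEGATIVE, MIXED, NEUTRAL = "+", "-", "+/-", "0"  # assumed encodings


def day_level_affect(post_labels):
    """Derive the affect of a day from the affect labels of its posts."""
    if not post_labels:
        return None  # silent day; handled separately in the with-silence models
    counts = Counter(post_labels)
    n = len(post_labels)
    label, top = counts.most_common(1)[0]
    # Rule 1: a strict majority of posts shares one affect.
    if top > n / 2:
        return label
    # Rule 2: positives and negatives tie, or mixed posts equal all others.
    if counts[POSITIVE] == counts[NEGATIVE] or counts[MIXED] == n - counts[MIXED]:
        return MIXED
    # Not covered by the description in the text.
    return None
```

For instance, a day with the posts `+`, `+`, `0` is positive, a day with `+`, `-` is mixed, and a day with `+`, `+`, `0`, `0` is left undetermined by the stated rules.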

Statistical Analysis
Demographic differences between users above and below the CES-D cut-off score for probable depression were assessed using Wilcoxon-Mann-Whitney tests (R package stats). We used Pearson correlation coefficients to assess the significance of correlations between social media data on the one hand and personality traits and mental well-being on the other hand. Due to the small sample size and the number of correlations computed, all correlation coefficients were estimated using a permutation approach (Higgins, 2003), as implemented in the R package jmuOutlier (Garren, 2017). Correlations that reach p < 0.01 or better are reported as significant; correlations that reach p < 0.05 are reported as trends in the data. For all correlations reported in the paper, we give the estimated correlation coefficient, the bootstrap 95% confidence interval, and the corresponding coefficient of determination $r^2$. Table 2 shows the basic statistics of our sample. Our data predominantly comes from single female Caucasian young adults. The average CES-D score is above the cut-off for possible depressive disorder.
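The permutation approach to testing a Pearson correlation, together with a percentile bootstrap confidence interval, can be sketched as follows. This is a generic standard-library illustration and not the jmuOutlier implementation; the number of resamples, the two-sided test, the seeds, and the percentile method are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for r (resampling pairs)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper
```

The coefficient of determination reported alongside each correlation is simply the square of the estimated coefficient, for example `pearson_r(x, y) ** 2`.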

Demographics and Baseline Statistics
Thirty-nine (56%) participants had a CES-D score of 22 or higher (mean: 33, SD: 6.5), which means that it is possible that they have depressive disorder, and 31 (44%) had a score of 21 or lower (mean: 12, SD: 6).
All scales are normally distributed (Shapiro-Wilk test), except for openness to experience (W = 0.96, p < 0.05) and satisfaction with life (W = 0.95, p < 0.05), which are bimodal. Figure 1 Plot 1 shows the correlations between different personality dimensions. As expected, the five personality dimensions are not orthogonal.

Social Media Affect: Frequencies vs. Transitions
For overall frequencies of affect category, the only clear correlation is between extroversion and positive content. Overall, more extroverted participants are more likely to have days where they make predominantly positive posts [r = 0.29, p < 0.01, 95% CI = (−0.15, 0.32), $r^2$ = 0.08]. In addition, participants who score higher on agreeableness tend to post fewer negative posts and have fewer days with predominantly negative posts [both r = −0.26, p < 0.05, 95% CI = (−0.48, −0.04), $r^2$ = 0.07].
When we look at transitions between affect categories, however, a more nuanced picture emerges. Table 3 summarizes the correlations between personality, well-being and transition types. Significant correlations are summarized in Table 4. Due to the number of correlations presented, we choose a cut-off of p < 0.01, which is stricter than the normal p < 0.05.
Several transition types are correlated positively and negatively with Extroversion and Agreeableness. Neuroticism, conscientiousness, and SWL show interesting trends (p < 0.05) that do not reach significance (cf. Table 3).
Since neuroticism is closely linked to depression symptoms, we also computed a partial correlation between content originality and CES-D while controlling for neuroticism. The resulting correlation was no longer significant (r = 0.14, p = 0.22, $r^2$ = 0.02). Therefore, the association between content originality and depression symptoms might be moderated by neuroticism.
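A first-order partial correlation removes the linear contribution of a control variable from the association between two other variables. The sketch below uses the standard formula based on the three pairwise Pearson coefficients; the variable roles (x as content originality, y as CES-D score, z as neuroticism) follow the text, but how the partial correlation and its p-value were actually computed in the study is not stated, so this is an illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If x and y are each driven by z plus unrelated noise, their raw correlation can be substantial while the partial correlation is close to zero, which is the pattern reported above for originality and CES-D once neuroticism is controlled for.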

Main Findings
Many studies have found associations between the frequency of affective words used in social media text and personality. However, existing studies often saw affect as static and only focused on the strength of bipolar valence (positive/negative). Instead, our work focuses on affect patterns. We encode posting behavior, transitions between affect states, and content originality. From a practical point of view, our technique can supplement experience sampling techniques (Myin-Germeys et al., 2018) to help clinicians and patients develop a more comprehensive view of a person's affect patterns, arrive at a better-substantiated diagnosis, and make improved treatment decisions. However, this depends on whether the patient is willing to share information from their social media feed with their therapist. Overall, the correlations seen between affect transitions and personality traits are in line with the consensus in the early literature (Gross et al., 1998). Extroverts tend to produce sequences of positive posts. This behavior fits well with the positive emotional core in extroverts stipulated by Watson and Clark (1997). Participants with higher agreeableness are less likely to post sequences of negative posts. This could be due to their ability to regulate negative affect (Meier et al., 2006; Haas et al., 2007).
Although the psychology literature suggests a strong association between negative mood states and neuroticism (Rusting and Larsen, 1995), we did not find this in our data. Our results are in line with previous studies of verbal cues to personality traits in social media (Yarkoni, 2010; Golbeck et al., 2011; Schwartz et al., 2013; Park et al., 2015). Golbeck et al. (2011) found that social media users who were more likely to talk about anxiety were on the higher end of the neuroticism scale. We speculate that self-presentation bias may influence how social media users regulate their expression of negative emotions in their public posts. The only relevant association we found was that social media users on the high end of neuroticism are more likely to switch between posting positive and negative affective content. This finding aligns well with the fact that high neuroticism is associated with high emotional instability (Costa and McCrae, 1992). The link between posting non-original content and elevated depression symptoms appears to be moderated by neuroticism. This suggests that high levels of neuroticism predispose users both to depressive symptoms and to an indirect disclosure of emotions through quotes and lyrics.

Table 4 | Summary of the significant correlations between transition states and the five personality traits (p < 0.01). Column headers: Transitions; Post-level (with-silence); Post-level (without-silence); Day-level.

In our sample, the prevalence of depressive symptoms is higher than would be expected in the general population. In the original CES-D paper, Radloff (1977) proposed three levels of depression severity: low (0-15), mild-to-moderate (16-22), and high (23-60). They found that only 21% of the general population scored above the low symptom level. In contrast, in our sample, nearly half of the participants exhibit a high level of symptoms (>22). Within the context of social media studies of depression, however, our data set is not exceptional. For many studies in the area, high-symptom individuals account for nearly half of the data set (De Choudhury et al., 2013; Tsugawa et al., 2015; Nadeem, 2016; Reece et al., 2017; Orabi et al., 2018).
Our results support the claim that affect expressed in social media text is associated with social media users' affect patterns in real life. However, the data set used in this study is from the early 2010s and only covers the well-established social media platform Facebook. The associations found in this study are likely to be slightly different from those found in other social networks (e.g., Instagram) or in a new data set collected 10 years later.

Limitations
Due to the restrictions imposed by the need for sufficient Facebook updates to allow analysis, our final sample is relatively small. Given the size of the significant effects we found in the data, power calculations indicate that a well-powered study should include data from around 200 users (Schönbrodt and Perugini, 2013). Our sample also skews heavily toward younger female Caucasians with relatively low satisfaction with life and strong depression symptoms. It is possible that other groups of users (e.g., non-Caucasians, males) are less likely to disclose personal information about mood and emotions on their public Facebook pages (Dosono et al., 2017; McDonald et al., 2019).

CONCLUSION AND FUTURE WORK
In this pilot study, we demonstrated the benefits of detailed representations of social media affect for unpacking the relationship between personality, mental well-being, and the content posted on social media. Importantly, our representations include non-binary affect categories (positive, negative, mixed, neutral), and take into account content originality. As a consequence, we were able to obtain a more detailed picture of the link between patterns of affect and depressive symptoms.
In future work, we plan to enrich our data set with more in-depth analyses of original vs. non-original content, extend coverage by including a larger sample of the myPersonality data set, and construct statistical models that allow us to observe long-term trends in posting patterns. Future studies should also examine the extent to which affect expressed in non-original content is aligned with the users' affect when they post the material.

DATA AVAILABILITY STATEMENT
The datasets generated for this study will not be made publicly available because the myPersonality database is closed for further research. Requests to access the datasets should be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Self-Certification according to the procedure of the School of Informatics, University of Edinburgh. The patients/participants provided their written informed consent to participate in this study. The secondary analysis of this data set was reviewed and approved by the Ethics Committee of the School of Informatics, University of Edinburgh, Reference Number 72771.

AUTHOR CONTRIBUTIONS
LC: study design, statistical analysis, analysis of results, and drafting of paper. WM: principal supervisor of LC. MW: second supervisor of LC. WM and MK contributed to paper writing, advised on study design, statistical analysis, and analysis of results.