Bored to death: Artificial Intelligence research reveals the role of boredom in suicide behavior

Background Recent advancements in Artificial Intelligence (AI) contributed significantly to suicide assessment, however, our theoretical understanding of this complex behavior is still limited. Objective This study aimed to harness AI methodologies to uncover hidden risk factors that trigger or aggravate suicide behaviors. Methods The primary dataset included 228,052 Facebook postings by 1,006 users who completed the gold-standard Columbia Suicide Severity Rating Scale. This dataset was analyzed using a bottom-up research pipeline without a-priory hypotheses and its findings were validated using a top-down analysis of a new dataset. This secondary dataset included responses by 1,062 participants to the same suicide scale as well as to well-validated scales measuring depression and boredom. Results An almost fully automated, AI-guided research pipeline resulted in four Facebook topics that predicted the risk of suicide, of which the strongest predictor was boredom. A comprehensive literature review using APA PsycInfo revealed that boredom is rarely perceived as a unique risk factor of suicide. A complementing top-down path analysis of the secondary dataset uncovered an indirect relationship between boredom and suicide, which was mediated by depression. An equivalent mediated relationship was observed in the primary Facebook dataset as well. However, here, a direct relationship between boredom and suicide risk was also observed. Conclusion Integrating AI methods allowed the discovery of an under-researched risk factor of suicide. The study signals boredom as a maladaptive ‘ingredient’ that might trigger suicide behaviors, regardless of depression. Further studies are recommended to direct clinicians’ attention to this burdening, and sometimes existential experience.


Introduction
Suicide, one of the major public-health concerns today following the COVID-19 crisis, 1-3 is a highly complex human tragedy. 4Unfortunately, despite decades of research, our understanding of this somewhat enigmatic phenomenon is unsatisfying, as implied in the well-cited meta-analysis by Franklin et al. 5 According to this wide-scope meta-analysis, "the [suicide] field has primarily focused on the same risk factors for the past 50 years, with risk factors becoming increasingly homogenous over past five decades".Nevertheless, a significant change may occur today, with the sweeping revolution in Artificial Intelligence (AI) and the ubiquitous spread of social media. 6e recent introduction of strong Large Language Models (LLMs), such as GPT-3, 7 allowed researchers to analyze large amounts of authentic and valuable personal texts, which became highly accessible with the growing popularity of the various social networking sites (e.g., Facebook, Twitter).LLMs are capable of capturing subtle and meaningful patterns in raw data (e.g., posts or tweets) and previous research had already demonstrated their promising potential for suicide predictions. 8To date, dozens of AI studies reported of high quality predictions of suicide risk from social media, using purely bottom-up methods, which did not involve the more traditional, top-down examination of pre-defined hypotheses. 9The few that do include top-down examination require predefined hypotheses and in that manner do not allow new scientific discoveries by themselves. 10tably, bottom-up AI studies often achieved superior predictions than the pure top-down, theorydriven studies, [11][12][13] however, to our knowledge, these AI-based predictions were never translated into actual theoretical advancements in suicidology.
In the absence of pre-defined hypotheses about specific risk factors and the opaque nature of the LLMs, which are often referred to as 'black box' models, 14 it has been difficult to pinpoint the exact patterns or themes (i.e., risk factors) that drove these complex models to make their high quality predictions.In a way, the improved predictions generated by the LLMs came at a cost, as we now struggle to understand the internal mechanism of these models.As opposed to the top-down studies that typically examined a limited number of well-defined risk factors, the bottom-up AI studies analyzed numerous and implicit, data-driven features, thus restricting our ability to isolate the concrete psychosocial risks that might have been involved in the creation or maintenance of the suicide behaviors.
The overall goal of this research was to address this gap and harness the power of the LLMs for scientific discovery of risk factors.To exhaust the full potential of the LLMs, the first steps of the research pipeline were designed in a purely bottom-up manner, so that the results will not reflect upon predefined risk factors, but on authentic, data-driven, and perhaps less researched factors.
Notably, the results from these almost fully automated steps indicated that the topic (i.e., theme) that contributed the most to the prediction of suicide addressed boredom experiences.
Boredom, or the "unfulfilled desire to be engaged in satisfying activity" as defined by Fahlman and collegues, 15 is typically accompanied by mild negative emotions, low arousal and attentiveness, and decreased sense of meaning in life. 16This negative experience of boredom is even considered as a significant risk factor, or an inherent component, of depression. 17,18However, to our knowledge, the role of boredom in the emergence and maintenance of suicide ideation and behaviors has not been characterized in the literature.[21][22] We therefore completed the research pipeline with a final step that consisted of the collection of a new dataset.Using this new dataset, we could now examine a pre-defined hypothesis (that emerged from the previous bottom-up steps) that boredom experiences will predict suicide risk, either directly or indirectly, through the mediating variable of depression.In this way, we could further characterize the role of boredom in suicide behavior and illustrate how the new LLMs can be leveraged, not only for practical prediction of suicide risk, but also for theoretical advancements in suicidology (Discussion).

Data
The collection of the data was conducted with the ethical approval of the Institutional Review Boards of the Hebrew University of Jerusalem and the Technion -Israel Institute of Technology.The complete description of the primary dataset of the current study is available in our previous publication that focused on prediction (rather than on understanding) of suicide. 9Briefly, upon reading and signing a consent form, participants recruited from Amazon's Mechanical Turk (MTurk) completed common psychiatric and psychological questionnaires and gave a one-off authorization to download their Facebook activity up to 12 months prior to the research date.This activity was extracted to an encrypted data storage through a designated application that was developed for this purpose.
Altogether, the primary dataset consisted of 228,052 Facebook postings that were uploaded by 1,006 active Facebook users (23.26% male).The participants were English speaking residents of the US (mean age = 44.7,SD = 13.9).The quality of the participants' responses was ensured via subtle measures and attention checks we developed (for further information about this primary dataset, see the Supplementary Material). 23mportantly, the collected postings were matched to the users' responses to the wellresearched Columbia Suicide Severity Rating Scale (CSSRS), 24 which was administered in the original study.The CSSRS is considered a 'diagnostic tool of choice', both in clinical settings and in empirical research. 25,26Upon consultation with the principal developer of the CSSRS (Posner, personal written communication), we chose to administer the electronic, screening version of the scale, in light of the fact that the research examined participants from a crowdsourcing platform.This version has been demonstrated to have psychometric validity and prediction accuracies that are comparable to the original clinician version of the scale. 27,28e structure of the CSSRS contains two parts.In the first part (items 1-2), participants are asked to respond YES or NO to questions about passive suicide ideation.In the second part (items 3-6, which are only shown to the participants if the first part indicated they had thoughts of killing themselves), the participants are asked about active suicide ideation (i.e., suicidal thoughts with method, intent, or a specific plan) suicide behavior (i.e., real-life activities aimed at ending one's life, such as collecting pills or obtaining a gun).In this study, the distribution of the participants' scores was as follows: 64.01% received zero, 10.47% received 1, 12.36% received 2, 6.08% received 3, 2.99% received 4, 3.29% received 5, and 0.8% received 6.

Procedures (a 5-step research pipeline)
Based on this high-quality dataset, we designed a 5-step research pipeline (Figure 1).A complete description of all the methodological details of this pipeline is provided in the Supplementary Material.The description below, in the current section, provides a brief overview of the five consecutive procedures that comprised the research pipeline, including the key computational methods that were implemented in each step and the secondary dataset that was collected in the fifth step.

An overall representation of the research pipeline
First, we used a state-of-the-art (yet still "black box") LLM named SBERT 29 to assign vector representations to the Facebook postings.Second, we applied a clustering algorithm named HDBSCAN, 30 with the goal of organizing the different posts (represented numerically) into groups that ideally capture meaningful 'topics' (i.e., themes).Third, we conducted a stepwise regression 31 and identified the topics that contributed the most to the prediction of suicide (i.e., the CSSRS scores).
Fourth, we analyzed the thematic content of the resulted topics using three methods: manual inspection of representative postings, 'consultation' with ChatGPT, 32 and content analysis using the well-established method of TF-IDF (Term Frequency -Inverse Document Frequency). 33fth, to validate and further examine the bottom-up results from the previous steps, which suggested that suicide risk is linked to boredom experiences (Results), a secondary new sample of 1,062 participants was collected (52% male, mean age = 44.7,SD = 13.9).In this secondary, questionnaire-based dataset, participants completed three well-validated psychological measurements: (1) the aforementioned suicide scale, 24 (2) the 9-item Patient Health Questionnaire (PHQ-9) that is commonly used to measure depression, [34][35][36] (considering its potential mediating role that was described in the Introduction), and (3) a well-researched measurement of boredom named the Multidimensional State Boredom Scale (MSBS).The descriptive information about the CSSRS is presented above.The PHQ-9 is a nine-item scale that targets the nine symptoms of depression that appear in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). 37Following the DSM diagnostic criteria, a cutoff point for the presence of depression can be calculated when participants report of five or more symptoms that disturbed them in the past two weeks, for at least "more than half the days".In addition, at least one of these symptoms should refer to either low interest or depressed mood, which are the core symptoms of the disorder. 34In the current secondary sample, a total of 130 users (12.24%) met this DSM-based cut-off point for depression.
The MSBS consists of 28 items designed to evaluate the state of boredom.An additional scale measuring the more general tendency of the person to feel bored (i.e., the Boredom Proneness Scale) 17 was also implemented in this secondary sample (see the Supplementary Material).However, the following analyses relate to the state boredom scale only to match the two additional scales of the study that evaluated the current/recent (state) depressive and suicidal experiences of the participants (i.e., the PHQ-9 and the CSSRS).The state boredom scale also reflected best the boredom-related expressions that were identified in the primary Facebook data.
The 28 items of the MSBS are classified into five theoretical dimensions: Disengagement, High arousal, Low arousal, Inattention, and Time perception.Of particular interest to our investigation is the central dimension of Disengagement.This dimension, which consists of the largest number of items (N = 10) and has the highest factor loading (0.97) includes items, such as: "Everything seems repetitive and routine to me", "I feel bored", and "I want to do something fun, but nothing appeals to me". 15 The mean scores of the participants on this dimension of Disengagement was 2.26 (SD = 1.56, range = 0 to 6).
Notably, this Disengagement dimension was found to be the closest dimension (out of the five potential dimensions of boredom) to the boredom expressions that were common in the primary Facebook data.Various similarity analyses, which are described in the Supplementary Material ensured this similarity formal concept of boredom, as manifested in this Disengagement dimension, and the informal, Facebook-based topic of boredom.Altogether, this boredom measure, alongside the two measurements of suicide and depression, were used to examine the hypothesis that was generated in the previous, bottom-up steps, regarding the direct and indirect links between boredom and suicide.

Results
The analysis of the data was conducted via Pingouina commonly used statistics package written in Python. 38The bottom-up analysis of the primary dataset from Facebook and the consecutive top-down analysis of both datasets are presented below.

Bottom-up, topic-based analysis
The stepwise regression model (step 3) of the entire set of topics (step 2) resulted in four topics that were significantly correlated with suicide risk [ 2 = 0.037, (4,1001) = 9.694,  < 0.0001].Table 1 presents the regression scores for each one of these four topics.The first topic that related to suicide [β=0.144,t(1005)= 3.3, p<0.005] consisted of boredom-related manifestations, as can be seen in the TF-IDF analysis and the response by ChatGPT (Step 4, Figure 2, Table S1, Table S2).
The second topic seemed to reflect the user's wish for something of value to him or her Table 1.
Results from the stepwise regression that aimed to predict suicide risk from topics.
Predictor (topic) Note.See the third step in the Supplementary Material for the description of the input and output of the regression as well as the selection thresholds that were implemented in the analysis.This study focuses on the topic with the highest prediction value (i.e., boredom).Future studies are recommended to further examine the role of the remaining topics in the context of suicide research.

Top-down hypothesis testing
To validate the purely bottom-up finding regarding boredom, we calculated the correlations between the participants' scores on the boredom and the suicide scales in the secondary dataset (Step 5).This calculation indicated that the Disengagement factor was moderately correlated with the CSSRS (r = .353,p < 0.001; Figure 3a), thus providing a first confirmation of the hypothesis that boredom is linked to suicide behaviors.
To further characterize this link, we used the secondary dataset to conduct a path analysis, 38 which also considered depression, which has been linked before, both to suicide and to boredom (Introduction).As illustrated in Figure 3b, this analysis indicated that the direct path between boredom and suicide was not significant (β = -.08,SE=.048, p = .093),while the indirect path between boredom and suicide through the mediating variable of depression was significant and quite substantial (β = .508,SE = .048,p < .0001).
Finally, we replicated this path analysis using the original Facebook data and replaced the boredom questionnaire with the boredom topic (see also the Supplementary Material).This analysis resulted in two significant paths from boredom to suicide, an indirect path as documented in the previous analysis of the newly collected measures (β = .046,SE = .024,p = .002)as well as a direct path (β = .021,SE = .040,p < .001) that suggests that boredom itself can increase the risk for suicide (Figure 3).Moreover, in this dataset from Facebook, the bivariate correlation between the topic of boredom and suicide risk (r = 0.185, p < 0.001) was stronger than the correlation between boredom and depression (r = 0.078, p < 0.05).A similar pattern of results was evidenced when the proneness boredom scale was used (see the Supplementary Material).

Discussion
This study, which consisted of a designated, 5-step research pipeline, aimed to harness the powerful LLMs for scientific discovery of risk factors for suicide.The first, purely bottom-up steps of the pipeline indicated that the topic that contributed the most to the prediction of suicide risk addressed experiences of boredom.The fifth, top-down step, which involved pre-defined hypotheses, resulted in two potential paths that link boredom to suicide behaviors: The analysis of the secondary questionnaire-based dataset suggested that the least severe construct of boredom predicts the more severe moderator of depression, which in turn, predicts suicide ideation and behaviors.The analysis of the primary Facebook data indicated that boredom is linked directly to the risk of suicide, thus implying that boredom might serve as a maladaptive 'active ingredient' that is capable of triggering suicide ideation regardless of depression.Specifically, this mediation analysis of the Facebook data, as well as the observed superiority of the correlation between boredom and suicide over the correlation between boredom and depression (Figure 3), imply that the role of boredom in the creation and maintenance of suicide behaviors may be more crucial than what we might have thought based on the scarce and mostly outdated literature on this subject.
This last conclusion is noteworthy considering the existing research on boredom.As mentioned in the introduction, following the almost fully automated emergence of the topic of boredom, we used APA PsycInfo to search for the terms boredom and depression in any possible search field.This search yielded 46 results, of which 24 consisted of empirical studies, and only four were relevant to the hypothesized link between boredom and suicide: A 30-year old research presented two surveys among Canadian and French adolescents, which yielded multiple links between suicide ideation and a range of risk factors, such as drug use, somatic complaints, and other self-perceived health problems, which included also a self-reported item regarding "bored/has a boring life". 19A second, relatively old study, among 127 students reported a correlation (r = 0.37) between suicide risk and tedium, but the term 'tedium' may not serve as a good reflection of boredom as the authors defined it as an "experience of physical, emotional, and mental exhaustion". 21A third, more recent PhD dissertation among prison inmates reported of multiple risk factors that were associated with suicide risk, including one of the items in the Psychopathy Checklist-Revised scale (PCL-R), which addressed the inmate's need for stimulation and his proneness to boredom. 20Finally, a fourth study that followed 31 depressed patients for a week, reported that suicide ideation was preceded by feelings of tension, sadness, previous suicide ideation, and boredom. 22gether, the findings from this literature review, which indicated that boredom did not receive sufficient research attention in the context of suicide, and the findings from the current study emphasize the need to keep investigating the potential harmful role of boredom.It is noted that we do not argue that the current study provides a definite proof for a direct pathway between boredom and suicide.However, we strongly recommend that further research will aim to explore this direction, especially considering the ambiguity of the concept of boredom and its intertwined relationships with depression.
Despite the fact that all humans experience boredom from time to time, 39,40 the term 'boredom' and its distinctiveness from similar psychological constructs, such as anhedonia, apathy, and emptiness, are not consistently defined in the literature. 41Correspondingly, a variety of theories were proposed over the years to explain this experience and to track its etiology, ranging from attention-based and arousal paradigms 42 to existential and psychodynamic theories. 43Indeed, some evolutionary-oriented theories emphasized the potentially positive role of boredom, which could encourage people to seek change and engage in creative and meaningful activities. 44,45However, in many cases, the experience of boredom seems to trigger negative emotions 46 and even psychopathologies, 47 including mainly depression.
The developers of the most common boredom proneness scale reported that it demonstrated moderate-size correlations (0.47 ≤ r ≤ 0.52) with three validated measurements of depression. 17ese strong correlations were expected, according to the authors, because boredom and depression share "overlapping symptomology" (p.11).Similarly, the developers of the brief State Boredom Measure (SBM) reported that all of its eight items correlated positively with depression (0.19 ≤ r ≤ 0.59). 18These psychometric data alongside the aforementioned theoretical statements about "overlapping symptomology" bring forth an unsettling discriminant validity issue whereby boredom is not clearly differentiated from depression. 45ndeed, some theorists aimed to pinpoint internal psychological aspects that differentiate boredom tendencies from pure depression. 48,49These aspects might include external ambivalence in which bored persons typically blame others for their failures rather than directing anger towards themselves as is common in depression.They may also be characterized by passive avoidance and tend to avoid work or evade responsibility rather than keep trying while failing to win the love and admiration of others as often is the case with depression. 48However, in practice, the phenomenological overlap between these two concepts of boredom and depression makes the research on the implications of boredom rather difficult.This discriminant validity problem may explain why the results from the path analysis of the questionnaire data were not identical to the results from the Facebook data.In the questionnaire data, it is possible that the concept of boredom was not differentiated sufficiently from the concept of depression, thus preventing us from isolating its direct impact on suicide.

Limitations
This study has several limitations.Firstly, the social media data of this study was extracted from relatively active users of Facebook, 9 thus limiting the generalizability of its bottom-up findings.
Users who suffer from depression (a close predictor of suicide), for example, might be less active, or hesitant to share their thoughts or emotions online, 50 and therefore be less represented in this sample of Facebook users.Secondly, despite the similarity between the Facebook-based topic of boredom and the questionnaire-based concept of boredom (Figure S1), the two psychological constructs may not be identical.This characteristic may explain the different patterns of results, but it also limits our ability to provide a perfect confirmation of the bottom-up findings through the top-down analyses.
Lastly, the observational nature of the two datasets used in this study limits our ability to propose causal oriented conclusions, thus emphasizing the importance of conducting further studies on this topic, preferably using longitudinal research designs.

Clinical implications
Considering these limitations, the findings of the current study should be interpreted with caution.Nevertheless, the purely bottom-up nature of the key finding of this study emphasizes the potential hazard in continuous, burdening experiences of boredom.Although boredom is a very common experience, its negative implications may be undervalued, both in the literature and in the clinical field.For some people, boredom may indeed feel like a relatively benign experience, or even a positive experience as it could lead to creative and novel ideas and activities, as well as meaningful positive change. 44,45However, for many people, boredom is perceived as a fairly aversive experience, 40 and our findings suggest that it might even lead a person to engage in dangerous suicidal behaviors.
The mechanism that might explain this hazard has not been the focus of the current study.
However, based on the existing literature on boredom, one potential explanation may relate to the increased risk for self-harm behaviors in boredom experiences, which in turn, might evolve into actual suicidal behavior.In the well-cited series of studies on "the challenges of the disengaged mind", a seemingly harmless and easy task of spending 6-15 minutes doing nothing was experienced by most participants as an inherently unpleasant mission, to the point that many participants (especially male participants) chose to give themselves an electric shock. 51In a way, for these individuals, an adverse stimulation was preferred to no stimulation at all.Indeed, self-harm behaviors are typically understood as a (non-adaptive) method to regulate unwanted negative emotions. 52,53However, there seems to be convincing evidence that self-harm behaviors can also result simply from the person's wish to avoid tedious experiences and disrupt their burdening monotony.5][56] This 'dark side' of boredom has even led researchers to reconsider its role in the self-harming behaviors that characterize the common Borderline Personality Disorder. 57second explanation for the observed link between boredom and suicide may be the enmeshment between boredom and deeper existential experiences, such as emptiness 58 and lack of meaning in life, 59 which are recognized as substantial risk factors of suicide. 60To our knowledge, these subjective, amorphic, and even mysterious human experiences typically do not receive sufficient space in the context of evidence-based treatments for suicide, such as interpersonal psychotherapy (IPT) 61 and Cognitive-Behavioral Therapy (CBT). 62We therefore conclude that boredom should be further considered, both in theory as argued above, and in practice, during suicide prevention programs and therapies.
This conclusion is noteworthy considering the current state of the literature on suicide.
Suicide research, as described in the introduction, typically revolves around the same risk factors, 5 and this study demonstrated that the integration of AI methods could lead to the exposure of underrecognized risks, such as boredom.Boredom, as documented in our thorough review of the literature above, has rarely been investigated directly within the context of suicide, and the current AI-inspired research allowed us to identify it and further explore it using complementing top-down analysis.We therefore join previous calls to combine AI tools in suicide research, 8,63 in a careful and responsible manner, as these tools can improve our understanding of suicide, and perhaps direct us to craft more clinically tailored treatments to individuals at risk.

Primary dataset from Facebook
The complete description of the Facebook dataset used in the current study is available in our previous publication that focused on prediction of suicide risk, rather than on understanding of the suicide phenomenon. 1 However, two characteristics differentiate between the current dataset and the previous one.First, the current sample (N = 1,006) included four additional participants whose Facebook activity could not be investigated with the (less advanced) language model that was used in our previous study.Second, the number of Facebook postings in the current dataset (N = 228,052) was larger than the number of postings in the previous dataset as it included also (textual) postings that were attached to videos, images, and internet links.This is in contrast to our previous study, which focused on standalone textual postings only.

5-step research pipeline
As mentioned in the main article, the procedures of the study consisted of a 5-step research pipeline (Figure 1, main article).Below is a detailed description of the five steps.

Step 1. Representing the Facebook texts
In the first step, we extracted numeric representations of the Facebook texts we collected, using a popular Deep Language Model (DLM) named Sentence-BERT (SBERT). 2 SBERT is a variant of the well-established BERT model, 3 and it is specifically suitable for generating representations of short texts, such as social media postings.
In addition to the two objectives that served the development and training of the original BERT model ("masked language model" and "next sentence prediction"), 3 SBERT has been further optimized to generate sentence representations that reflect similarities between sentences.These representations appear in the form of numerical, multidimensional vectors known as "embedding vectors", which have been demonstrated to capture many linguistic and semantic aspects of the language. 2In this study, we utilize the "base" version of SBERT (via the implementation of https://huggingface.co/sentence-transformers), which takes as input the entire sentence and provides a 768-dimensional vector.
These vectors are also known as 'embedding vectors' and previous studies showed that they are able to capture many linguistic and semantic aspects of natural language. 2Altogether, the representation process resulted in 228,052 vectors, one for each Facebook text.

Step 2. Clustering the text representations into topics
In the second step, we applied a clustering algorithm with the goal of organizing the different posts, which were now represented numerically, into groups that ideally capture meaningful 'topics' (we refer to the resulted clusters as topics because the spatial proximity of the representations is assumed to reflect proximity in the semantic meaning of the posts).This is achieved by projecting posts into a numerical space with semantic significance.Consequently, posts sharing similar subjects or themes tend to belong to the same cluster, which can be labeled with the common topic represented by the posts within it.Note that there are many other traditional methods for "topic modeling" (e.g., LDA: Latent Dirichlet Allocation) 4 that are based on statistical distributions of words/n-grams in the text.However, in this study we leveraged the power of the more rich and fine-grained embedding representations that capture the entire context of the words and not just the words themselves.
Furthermore, the clustering method ensures that each post is assigned to only one topic, a property that is crucial for subsequent analysis (see Step 3).
The clustering algorithm that was used for this task was HDBSCAN. 5 This algorithm applies a hierarchical density-based clustering method and automatically determines the optimal number of clusters (minimum cluster size = 25).Before applying the algorithm, the embedding representation vectors were dimensionally reduced from 768 to 10 dimensions using the Uniform Manifold Approximation and Projection (UMAP) algorithm. 5Altogether, this process yielded 771 topics, of which we removed 60 topics that were shared by less than ten different users, in order to avoid esoteric topics that are used only by very few.The number of posts in each topic ranged from 25 to 4956 (M = 96.9,SD =259.9 ).
Step 3. Identifying the topics that are most related to suicide risk In the third step, we aimed to filter out irrelevant topics and identify the topics that are mostly correlated with suicide risk.To achieve this aim, we implemented a stepwise regression model 6 that predicts the users' suicide risk score (i.e., the CSSRS scores, which ranged from 0 to 6) based on their proportional use of each of topic.More specifically, for each user u, we created a numerical topic distribution vector   : Where n is the number of topics and    is the proportion of posts user u published in topic i.
These vectors were fed as input to the regression model, with each topic considered as a potential independent variable.To determine a manageable number of topics and facilitate interpretation, we introduced strict thresholds in our stepwise algorithm.The forward selection (inclusion) threshold was set to p-values < 0.003 and the backward selection (exclusion) threshold was set to p-values > 0.005.This step yielded a small set of topics that can be viewed as the most significant predictors of suicide risk.
Step 4. Analyzing the thematic content of the suicide-related topics.
In the fourth step, we interpreted the resulting set of topics from the previous step, using three methods: a manual inspection of posts sampled from the topics by three of the authors, a 'consultation' with the currently popular LLM of ChatGPT 7 (which was asked to propose a shared topic for each list of posts, see Table S1), and a well-established analysis named TF-IDF (Term Frequency -Inverse Document Frequency). 8TF-IDF provides statistical measurements to weight words according to their relevance within each document (or 'topic,' in this case), by taking into consideration both the frequencies of words in the topic and their overall prevalence in the other Table S2.Step 5. Using a top-down hypothesis testing to further examine the strongest suicide-related topic (i.e., boredom).
To validate and further examine the results from the previous steps, we re-examined them using more conventional, top-down hypothesis testing methods.Note that the a-priory hypotheses needed to implement this top-down step could have only been formulated after we obtained the results of the previous steps, which suggested that boredom plays a role in the creation or maintenance of the suicide risk (Results).
In this step, as mentioned in the main article, we collected a new sample of 1,062 participants from MTurk and asked them to complete three psychological measurements: (1) the Columbia Suicide Severity Rating Scale (CSSRS), 9,10 (2) the 9-item Patient Health Questionnaire (PHQ-9) measuring depression, [11][12][13] and (3) the Multidimensional State Boredom Scale (MSBS). 14This last scale consists of 28 items that aims to assess five theoretical dimensions of boredom.Notably, the central dimension of this scale, the Disengagement dimension that has the largest number of items and the highest factor loading, reflected the topic of boredom that emerged from the primary Facebook data (see below).

Similarity analysis
As a complementary analysis we wished to evaluate the extent to which the bottom-up topic of boredom (which was extracted using the clustering algorithm that was applied to the LLM-based representations of the Facebook posts) is indeed related to the more formal concept of boredom, as might be extracted from the validated questionnaire that measures it.To this end, we used SBERT (see the Methods in the main article) to produce vector representations for the items that comprise the disengagement factor of the Multidimensional State Boredom Scale. 14In addition, we produced representations of the items of the two other common psychological scales that could serve as control scales for the analysis below -the previously mentioned PHQ-9, which measures depression, 13 and the short version of the well-known Big Five Inventory. 15In addition to the straightforward comparison with the boredom questionnaire, we were interested in the connection to depression, as the psychological literature draws meaningful connections between boredom and depression (see the introduction section of the main article).Furthermore, we sought to gauge the validity of our findings by employing a broad-spectrum personality assessment, the Big Five questionnaire, as a general measure.
Following this procedure, we computed the similarity score of these scales with the SBERT representation vectors of the postings that were classified to the boredom topic.To calculate this similarity score, we computed the similarity between each sentence in a post, since sentence level is more compatible to match a questionnaire item, and each item in the questionnaire.The final score for each post was obtained by selecting the highest similarity score between a sentence and a questionnaire item.To compute the overall similarity score between the questionnaire and a topic, we took the average of all the similarity scores calculated between the questionnaire and the posts of that topic.When comparing the similarities between the scores of the different scales, we revealed that

Path analysis of the Facebook data
The path analysis presented in the main article addressed three psychological constructs: suicide risk, depression, and boredom.The measures used to obtain the first two constructs in the primary Facebook dataset were the same as the measures implemented in the secondary questionnaire dataset (i.e., the CSSRS 16 and the PHQ-9 13 ).The third measure of boredom, in contrast, was obtained differently in the two datasets.In the primary Facebook dataset, the boredom scores were assigned to participants based on their usage of the boredom topic, while in the secondary questionnaire dataset the boredom scores were assigned to the participants based on their responses to the boredom scale (i.e., the MSBS).
This difference created a significant gap between the two datasets.While the secondary questionnaire dataset included boredom scores for all the participants (as all were required to complete the boredom questionnaire), in the primary Facebook data, only 3% of the sample (35 of 1,006 participants) had postings that were assigned to the topic of boredom by the clustering algorithm that was used in the study.This low percentage occurred because the clustering algorithm

Figure S1 Comparing
Figure S1 15 Top 10 TF-IDF words in each of the significant buttom-up topicsNote.The words in each column are ordered by their TF-IDF scores (which are presented in parenthesis).