The Differences Between Individuals Engaging in Nonsuicidal Self-Injury and Suicide Attempt Are Complex (vs. Complicated or Simple)

Background: Why do some people engage in nonsuicidal self-injury (NSSI) while others attempt suicide? One way to advance knowledge about this question is to shed light on the differences between people who engage in NSSI and people who attempt suicide. These groups could differ in three broad ways. First, they may differ in a simple way, such that one or a small set of factors is both necessary and sufficient to accurately distinguish the two groups. Second, they might differ in a complicated way, meaning that a specific set of a large number of factors is both necessary and sufficient to accurately classify them. Third, they might differ in a complex way, with no necessary factor combinations and potentially no sufficient factor combinations. In this scenario, at the group level, complicated algorithms would be either insufficient (i.e., no complicated algorithm produces good accuracy) or unnecessary (i.e., many complicated algorithms produce good accuracy) to distinguish between groups. This study directly tested these three possibilities in a sample of people with a history of NSSI and/or suicide attempt.
Method: A total of 954 participants who had engaged in NSSI and/or attempted suicide in their lifetime were recruited from online forums. Participants completed a series of measures on factors commonly associated with NSSI and suicide attempt. To test for simple differences, univariate logistic regressions were conducted. One theoretically informed multiple logistic regression model with suicidal desire, capability for suicide, and their interaction term was considered as well. To examine complicated and complex differences, multiple logistic regression and machine learning analyses were conducted.
Results: No simple algorithm (i.e., single factor or small set of factors) accurately distinguished between groups. Complicated algorithms constructed with cross-validation methods produced fair accuracy; complicated algorithms constructed with bootstrap optimism methods produced good accuracy, but multiple different algorithms with this method produced similar results.
Conclusions: Findings were consistent with complex differences between people who engage in NSSI and people who attempt suicide. Specific complicated algorithms were either insufficient (cross-validation results) or unnecessary (bootstrap optimism results) to distinguish between these groups with high accuracy.


INTRODUCTION
Nonsuicidal self-injury (NSSI) is defined as the direct and deliberate destruction of body tissue without any suicidal intent, whereas suicide attempt refers to the engagement in potentially self-injurious behavior with at least some intent to die from the behavior (1). Both behaviors are dangerous in nature, and both are unfortunately common. The prevalence rates of NSSI among the general population are estimated to be 17% among adolescents, 13% among young adults, and 5.5% among adults (2). For suicide attempt, the lifetime prevalence rates are estimated to be 2-4% (3,4). Given that NSSI significantly increases risk for future suicide attempt (5) and suicide attempt is associated with worse treatment course and increased risk of mortality (6, 7), it is important to understand why certain individuals only engage in NSSI whereas others engage in suicide attempt. A first step toward answering this question is to understand how the characteristics of individuals engaging in NSSI and those engaging in suicide attempt differ cross-sectionally.
There are three general ways that individuals with NSSI and individuals with suicide attempt might differ (Table 1). First, they might differ in a simple way. That is, one or a small set of factors might be both necessary and sufficient to distinguish between them. One example of a simple difference is how atoms are different from each other: the number of protons is the necessary and sufficient factor to identify each type of atom. Importantly, simple differences entail easily comprehensible and sharp distinctions rather than oversimplification. In terms of the differences between individuals engaging in NSSI and suicide attempt, the interpersonal theory of suicide, one of the most widely known theories in the field, posits that the presence of both suicidal desire and acquired capability for suicide (i.e., fearlessness about death) leads to suicidal behaviors (8,9). Therefore, the key differentiating factors between individuals who only engage in NSSI and those who attempt suicide should be the combination of both suicidal desire and capability for suicide (8)(9)(10).
Second, individuals engaging in NSSI and those engaging in suicide attempt might differ in a complicated manner, such that a specific set of a large number of factors is both necessary and sufficient to accurately classify them. As an example of a complicated difference, a functioning smartphone requires a large number of working components, including a circuit board, a speaker, a microphone, an antenna, a battery, a display screen, and a SIM card. If any component in the specific set is missing (e.g., a dead battery), the smartphone stops functioning. That is, to distinguish between functioning and nonfunctioning smartphones, the above-mentioned combination of a large number of factors is both necessary and sufficient. Any phone with all of the above components present is considered functioning, and any phone with even one component missing is considered nonfunctioning. Even though complicated differences involve a large number of factors, the distinctions are nonetheless sharp and clear.
In the context of NSSI and suicide attempt, perhaps individuals engaging in suicide attempt exhibit a specific set of characteristics that is both necessary and sufficient to distinguish the two groups. For example, all individuals with suicide attempt might have the following characteristics: presence of suicidal plans, nonzero suicidal desire, nonzero suicidal intent, acquired capability for suicide, no reasons for living, loneliness, hopelessness, access to means, and recent stressors. If the combination of these factors is both necessary and sufficient to distinguish between individuals with NSSI and those with suicide attempt, then any individual lacking even one of the factors could be classified with a high degree of certainty as an individual with NSSI only (vs. NSSI and suicide attempt), and any individual with all of the factors present could be confidently classified as someone engaging in suicide attempt.

TABLE 1 | Possible differences between individuals engaging in NSSI and suicide attempt.

Simple
Definition: One or a small number of factors are both necessary and sufficient for accurate distinction.
Example: The number of protons is both a necessary and sufficient factor to accurately distinguish between different types of atoms.
NSSI and suicide attempt example: The presence of suicidal desire and acquired capability for suicide might be both necessary and sufficient to distinguish between individuals only engaging in NSSI and individuals engaging in suicide attempt.

Complicated
Definition: A specific set of a large number of factors is both necessary and sufficient for accurate distinction.
Example: The presence of the following components is both necessary and sufficient to accurately distinguish between a functioning smartphone and a nonfunctioning smartphone: a circuit board, a speaker, a microphone, an antenna, a battery, a display screen, and a SIM card.
NSSI and suicide attempt example: The presence of the following factors might be both necessary and sufficient to distinguish between individuals only engaging in NSSI and individuals engaging in suicide attempt: suicidal plans, nonzero suicidal desire, nonzero suicidal intent, acquired capability for suicide, no reasons for living, loneliness, hopelessness, access to means, and recent stressors.

Complex
Definition: Many (but not all) combinations of factors are sufficient for accurate distinction, but no combination is necessary.
Example: The solutions to the following mathematical problems are complex: a + b = 1; a + b + c + … + x + y + z = 1.
NSSI and suicide attempt example: One possible combination that might accurately distinguish between individuals only engaging in NSSI and individuals engaging in suicide attempt: above 60 years old + male + … + access to firearm = an individual engaging in suicide attempt. Another possible combination: bullied + low socioeconomic status + childhood abuse + … + lack of friends = an individual engaging in suicide attempt. One combination that might not distinguish the two groups: shoe size above five + yellow as favorite color + … + have a pet.

Third, complex differences might exist between the two groups. Colloquially, it is common to refer to complicated systems and differences as complex. For example, in our prior work we sometimes referred to complicated algorithms and complicated factor relationships as complex (11). In the technical sense, however, there are many important differences between complicated systems/differences and complex systems/differences (12)(13)(14)(15)(16), highlighting the need to distinguish between complicated and complex. One such difference concerns necessary and sufficient factors.
Whereas complicated systems/differences involve a combination of necessary and sufficient factors (see above), complex systems/differences do not. If the difference between two groups is complex, there may be no algorithm that is sufficient to distinguish between all members of the two groups (i.e., no sufficient combination of factors). If a sufficient algorithm is found, the differences between the groups would still be complex if multiple algorithms with different factors or factor combinations were also sufficient to distinguish between the groups (i.e., no necessary factors or factor combinations).
It is important to note that there are degrees of complexity. For example, an algorithm that correctly classified 70% of the members of two groups would indicate more complex differences than an algorithm that correctly classified 95% of group members (i.e., the 70% algorithm indicates less sufficiency and, thus, greater complexity). Similarly, group differences would be considered more complex if 1,000 algorithms were sufficient to distinguish between groups than if only two algorithms were sufficient (i.e., less necessity and, thus, greater complexity in the scenario where 1,000 algorithms were sufficient).
Although it is intuitive for humans to attempt to model systems as simple (17)(18)(19), most natural systems are complex (20)(21)(22). As such, many consider complexity to be the default model; evidence must be provided to constrain from a complex model to a complicated or simple model (Figure 1). To constrain from a complex model to a complicated model, evidence must show that a complicated combination of factors is both necessary and sufficient to distinguish between all members of two groups. To further constrain from a complicated to a simple model, evidence must show that a simple combination of factors is both necessary and sufficient to distinguish between all members of two groups. Several lines of evidence have led some researchers to suggest that most biological, psychological, and social phenomena are complex rather than complicated or simple (16,20,22-26). We likewise hypothesize that the differences between people who engage in NSSI and people who attempt suicide are complex rather than complicated or simple. Accordingly, we expect that no simple or complicated algorithm will be both necessary and sufficient to correctly distinguish between all (or nearly all) people who engage in NSSI and people who attempt suicide.

FIGURE 1 | Evidence needed to constrain complex differences to simple or complicated differences. The null model is complexity, and evidence must be provided to constrain from a complex model to a complicated or simple model. Although sufficiency indicates perfect classification of the two groups, we lowered our criterion for sufficiency to good classification accuracy in terms of diagnostic accuracy metrics (e.g., areas under the curve [AUCs] ≥ 0.90) in consideration of measurement error. To demonstrate that one factor or one factor combination is necessary, it must be shown that no other algorithms with different factors or factor combinations are also sufficient (i.e., yield good classification accuracy).
The present study will test this hypothesis by evaluating whether any simple or complicated algorithms are necessary and sufficient to distinguish between people who engage in NSSI and people who attempt suicide. In consideration of measurement error, we will lower our criterion for sufficiency from perfectly distinguishing between these two groups to distinguishing between them with very good accuracy in terms of diagnostic accuracy metrics (e.g., areas under the curve [AUCs] ≥ 0.90). To test for simple differences, we will conduct univariate logistic regression analyses for each available factor. In addition, we will test a theoretically hypothesized simple difference by entering acquired capability for suicide, suicidal desire, and their interaction term as independent variables into a multiple logistic regression analysis (10). To support simple differences between individuals engaging in NSSI and suicide attempt, either the individual factors or the theoretically informed multiple logistic regression model should produce high classification accuracy. The absence of such evidence would suggest that these group differences are either complicated or complex.
To test for complicated differences, we will use multiple logistic regression analyses and machine learning analyses to construct complicated algorithms to distinguish between people who engage in NSSI and people who attempt suicide. To support complicated differences, two bars must be cleared: sufficiency and necessity. First, to clear the sufficiency bar, at least one algorithm must accurately distinguish between the two groups. The absence of such evidence would suggest that these group differences are complex. Second, if the sufficiency bar is cleared, to additionally clear the necessity bar, only one algorithm should accurately distinguish between the two groups. If more than one algorithm (e.g., with different factors or a different combination of the same factors) produces high accuracy, this would violate necessity and indicate that group differences are complex.
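The sufficiency and necessity bars described above can be written as a small decision function. This is a sketch under stated assumptions: each candidate algorithm is summarized by a single validated AUC, the lowered sufficiency criterion is treated as AUC ≥ 0.90 (an assumption about the exact cutoff), and the algorithm names and AUC values in the usage example are hypothetical rather than results from this study.

```python
from typing import Dict

# Assumed operational cutoff for "sufficient" classification.
SUFFICIENCY_AUC = 0.90


def classify_difference(aucs: Dict[str, float], cutoff: float = SUFFICIENCY_AUC) -> str:
    """Apply the sufficiency and necessity bars to a set of candidate algorithms.

    `aucs` maps a (hypothetical) algorithm name to its internally validated AUC.
    """
    sufficient = [name for name, auc in aucs.items() if auc >= cutoff]
    if not sufficient:
        # Sufficiency bar not cleared: no algorithm classifies the groups well.
        return "complex (no sufficient algorithm)"
    if len(sufficient) > 1:
        # Necessity bar not cleared: several different algorithms classify well.
        return "complex (no necessary algorithm)"
    # Exactly one sufficient algorithm: consistent with complicated (or simple) differences.
    return f"complicated or simple ({sufficient[0]})"
```

For example, `classify_difference({"logistic_all_factors": 0.78, "random_forest_all_factors": 0.74})` returns `"complex (no sufficient algorithm)"`, because neither hypothetical algorithm reaches the cutoff.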
The results of this study will advance the understanding of the nature of differences among individuals engaging in NSSI and suicide attempt, providing a foundation from which we can better understand why some people engage in NSSI whereas others engage in suicide attempts.

METHOD
Participants
A total of 954 participants were selected from a high-risk sample recruited internationally for a larger study (27). Participants were recruited from online forums that focused on topics of psychopathology, self-injury, and suicide. The inclusion criteria of the larger study required that participants (a) be 18 years of age or older; (b) demonstrate sufficient English fluency to understand study instructions; and (c) have engaged in nonsuicidal self-cutting at least twice in the past two weeks, have attempted suicide in the past year, or have thought about suicide more days than not in the past two weeks. The third inclusion criterion was designed to balance the need to recruit a large sample to avoid potential model overfitting (see Modeling Approach below) and the need to recruit a severe sample to ensure sufficient variance in the data (e.g., a sufficient number of suicide attempts). Because self-cutting is a severe yet common form of NSSI (28), the frequency of self-cutting was used as a screening criterion. Participants with other forms of NSSI (e.g., self-burning) were not excluded if they met one of the three criteria on previous self-injurious thoughts and behaviors.
In addition to the inclusion criteria of the larger study, the present study required that participants have engaged in NSSI or attempted suicide at least once in their lifetime: 319 participants had engaged in NSSI only, and 635 had attempted suicide (with or without NSSI). Participants who met the inclusion criteria of the larger study because they had thought about suicide more days than not in the past two weeks at screening (but might not have engaged in nonsuicidal self-cutting at least twice in the past two weeks or attempted suicide in the past year) were retained for the present study as long as they had engaged in NSSI or attempted suicide at least once in their lifetime.

Procedures
The Institutional Review Boards at Florida State University and Vanderbilt University approved all study procedures. With the approval of online forum moderators, study advertisements were posted in web forums about mental health, self-injury, and suicide. Individuals interested in participation were asked to complete a brief screening survey to determine their eligibility. To ensure anonymity, individuals were asked at the end of the screening survey to provide a nonidentifiable email address (e.g., one without names, dates of birth, or school and work information) for future study communication. Eligible individuals who provided consent were emailed their unique, randomly generated identification number and a link to complete the study assessment. The survey included approximately 50 minutes of computerized tasks and questionnaires. Within 24 hours of completion, participants were provided with a $10 electronic Amazon gift card as study compensation.
The present study elected to collect data online due to the benefits of this method and at the same time implemented multiple procedures to guard against potential threats to validity. The advantages of online recruitment include easier access to diverse populations, minimal geographical constraints, and increased possibility of recruiting severe clinical samples (29). In addition, research has shown that online studies produce results comparable to those obtained in traditional face-to-face settings (30). Consistent with best practices of online recruitment (31,32), multiple steps were adopted during the screening process to ensure data quality. First, to reduce the likelihood of individuals intentionally altering their responses to gain access to the study, the inclusion criteria were not included in the study advertisements, and relevant screening questions were embedded among irrelevant filler questions. Second, duplicate items and free-response items were included in the screening survey to check for consistency and English fluency. Third, to prevent the same individuals from entering the study more than once, only unique IP addresses were allowed to participate in the study.

Measures
We included factors that have been found to be broadly associated with NSSI and suicide attempt (33,34), such as demographics, psychopathology, prior self-injurious thoughts and behaviors, and explicit and implicit processes. We intentionally balanced relatively stable, distal factors with more variable, proximal factors (e.g., affective states). Theoretically relevant constructs (e.g., hopelessness, capability for suicide) were also assessed. Given that hundreds of factors have been studied in relation to NSSI and suicide attempt, it was not feasible to include all possibly relevant factors. However, the potential omission of one or a few specific factors is unlikely to affect the results. Meta-analytic evidence suggests that hundreds of factors confer risk for NSSI and suicide attempt to a similar extent, and no factor exerts particularly strong effects (33,34). Therefore, it is unlikely that any factors not included in the present study would exert an effect above and beyond the included factors.

Demographics
Demographic information including age, employment, gender, sexual orientation, and race was assessed using brief self-report items.

Modified Suicidal Thoughts and Behaviors Interview (SITBI)
The SITBI (35) is a standardized and validated measure assessing for thoughts of NSSI, NSSI, suicidal thoughts, plans, preparations, and attempts. The interview appears valid as it has been shown to strongly correspond to other measures of suicidal thoughts, suicide attempt, and NSSI. The scale also demonstrates strong interrater reliability and test-retest reliability (35). The present study adopted the modified SITBI, a self-report adaptation of the original interview that has been used in previous studies (36,37). In this study, the modules on NSSI and suicidal plans, preparations, and attempts were administered.

Acquired Capability for Suicide Scale-Fearlessness about Death (ACSS-FAD)
The seven-item ACSS-FAD (38) measures fearlessness about death, an important construct theorized to distinguish between individuals who engage in NSSI and suicide attempts (38).
Participants rated statements such as "I am very much afraid to die" on a five-point Likert scale from 0 (not at all like me) to 4 (very much like me). Higher scores suggest greater capability for suicide. This measure has been shown to demonstrate good convergent and discriminant validity (38). The internal consistency of the scale was good (Cronbach's α = .85).
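The internal consistency values reported for this and the following measures are Cronbach's α coefficients. As a reminder of what that statistic computes, here is a minimal sketch of the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), for a participants-by-items matrix with no missing values. The response matrices in the checks are toy data, not study data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a participants x items matrix (rows = participants)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    # Variance of each participant's total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```

Because the same sample-variance divisor appears in the numerator and denominator, using population variances instead would give the same α.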

Affective States Questionnaire (ASQ)
The ASQ (39, 40) was included to assess nine different negative affective states, such as feelings of self-hatred, abandonment, and humiliation. Participants were asked to answer either "yes" or "no" to experiences of these negative states. The ASQ demonstrates good validity and is predictive of future suicidal behavior (39).

Modified Affect Misattribution Procedure (AMP)
The present study included the modified AMP (41)(42)(43) to assess implicit affect toward suicide and self-injury stimuli, given that prior studies have established that reduced implicit aversion toward suicide and self-injury stimuli is associated with increased risk for NSSI (41,44). On each trial of the AMP, an image was presented to participants on the computer screen, followed by an ambiguous Chinese symbol. Participants were told to ignore the first image and treat it only as a cue that a Chinese symbol was about to flash on the screen. Participants were asked to rate whether they found each Chinese symbol pleasant or unpleasant. Research has shown that the pleasantness of the image or word stimuli influences ratings of the subsequent Chinese symbols (43). Through this misattribution, participants' implicit affective reactions to the original stimuli were assessed. For the present study, we used both positive stimuli (e.g., images of pets, babies, beaches) and suicide/self-injury stimuli. The intensity of the suicide/self-injury stimuli ranged from low (e.g., pills, heights, body bags) through moderate (e.g., a floating body in the water, bleeding from self-cutting) to high (e.g., a body with a severe burn, a corpse with a fatal gunshot wound to the head). Internal consistency was good for each category of images: Cronbach's α was .85 for both the low-intensity and moderate-intensity suicide/self-injury images, .86 for the high-intensity suicide/self-injury images, and .80 for the positive images.
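The section does not spell out how AMP responses were scored. A common approach, assumed here rather than taken from the source, is the proportion of trials within each prime category on which the following symbol was rated pleasant, so that lower scores on the suicide/self-injury categories indicate greater implicit aversion. The category labels in the check are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def amp_scores(trials: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Proportion of 'pleasant' ratings per prime category (assumed scoring rule).

    Each trial is (prime_category, rating), where rating is 'pleasant' or 'unpleasant'.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [pleasant count, total count]
    for category, rating in trials:
        if rating not in ("pleasant", "unpleasant"):
            raise ValueError(f"unexpected rating: {rating!r}")
        counts[category][1] += 1
        if rating == "pleasant":
            counts[category][0] += 1
    return {cat: pleasant / total for cat, (pleasant, total) in counts.items()}
```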

Beck Scale for Suicide Ideation (BSS)
The 21-item BSS (45,46) measures suicidal thoughts and behaviors. In this study, items 1-5, which assess suicidal desire, were administered. Each item was rated on a Likert scale ranging from 0 to 2, with lower scores indicating lower desire for suicide. The internal consistency of the suicidal desire subscale was acceptable (α = .85).

Brief Agitation Measure (BAM)
The BAM (47) includes three self-report items assessing agitation in the past week. Participants were asked to rate each statement (e.g., "I want to crawl out of my skin") on a seven-point Likert scale ranging from 0 (strongly disagree) to 6 (strongly agree). Higher scores indicate higher levels of agitation. This scale has been shown to have good validity and reliability (47). Cronbach's α indicated good internal consistency (α = .84).

Brief Symptom Inventory-18 (BSI-18)
The BSI-18 (48) was administered to assess psychological symptoms experienced in the past week (e.g., anhedonia, pains in heart or chest, nausea). Participants rated how much they experienced each symptom on a five-point Likert scale from 0 (not at all) to 4 (extremely). Higher scores reflect greater psychological distress. Previous research has found that this scale has good reliability and validity. The internal consistency of this scale was good (α = .81).

Explicit Affective Ratings
In addition to measuring implicit affect, we also measured explicit affect (41,49) toward positive stimuli and suicide/self-injury stimuli, given that implicit and explicit associations tend to diverge under certain circumstances [e.g., motivation to disguise explicit attitudes; (50,51)]. Moreover, reduced explicit aversion to suicide and self-injury stimuli has been linked with increased risk for NSSI (41,42). Explicit affect was assessed using a 10-point Likert scale, with higher scores indicating that participants found the stimuli more pleasant. Five positive images (Cronbach's α = .79) and five suicide/self-injury images (Cronbach's α = .90) were drawn from the stimuli used in the AMP task described previously. The suicide/self-injury images were of moderate intensity.

Disgust With Life Scale (DWLS)
With 12 self-report items, the DWLS (52, 53) includes two subscales (i.e., disgust with self, disgust with others). Participants rated each item (e.g., "I am disgusted with myself") on a seven-point Likert scale from 0 (not at all true of me) to 6 (very much true of me). Higher scores on the DWLS indicate greater disgust toward self and others. The subscales have shown strong convergent validity with other measures of disgust (52), as well as good internal consistency (α = .90).

Insomnia Severity Index (ISI)
The ISI (54) is a seven-item self-report inventory that measures symptoms of insomnia. The index has shown adequate internal reliability and convergent validity (55). The internal consistency of this scale was good (α = .86).

Missing Data
A total of 33 factors were considered (see Tables 2 and 3 for details). Missing data were minimal (< 0.01%) and addressed using multiple imputation. No outcome data (i.e., engagement in NSSI or suicide attempt) were missing.

Modeling Approach
Considering that it is common for individuals to engage in both NSSI and suicide attempt (56, 57), we elected not to exclude individuals with both behaviors from the models. Individuals engaging in both NSSI and suicide attempt were grouped with individuals with suicide attempt only. That is, the models were tasked with separating individuals with suicide attempt (regardless of their engagement in NSSI) from individuals engaging in NSSI only. This decision was intended to increase the clinical relevance of the study, as many clinicians are concerned with whether patients might attempt suicide. Retaining the whole sample also allowed the models to leverage a larger sample size and thereby produce more precise model performance estimates (i.e., narrower confidence intervals). For completeness, we repeated the analyses in the subsample of individuals with NSSI only and individuals with suicide attempt only; the results were statistically identical (Tables 4 and 5).
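The outcome coding described above can be stated precisely with two small helpers. The function names and the boolean inputs (lifetime NSSI, lifetime suicide attempt) are hypothetical; the logic mirrors the text: any suicide attempt is coded 1, NSSI without an attempt is coded 0, and the sensitivity subsample drops participants who report both behaviors.

```python
from typing import List, Tuple


def code_outcome(ever_nssi: bool, ever_attempt: bool) -> int:
    """1 = suicide attempt (with or without NSSI); 0 = NSSI only."""
    if ever_attempt:
        return 1
    if ever_nssi:
        return 0
    # Such participants do not meet the present study's inclusion requirement.
    raise ValueError("neither NSSI nor a suicide attempt reported")


def exclusive_subsample(rows: List[Tuple[bool, bool]]) -> List[Tuple[bool, bool]]:
    """Sensitivity subsample: keep NSSI-only and attempt-only participants.

    Each row is (ever_nssi, ever_attempt); rows with both (or neither) are dropped.
    """
    return [row for row in rows if row[0] != row[1]]
```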
All statistical analyses were performed in R (58) using the glm function in base R and the randomForest and pROC packages. To test for simple differences, univariate analyses were conducted for each factor. Even though the primary aim of the study is not to test specific theories, we also considered suicidal desire and capability for suicide as an example of a theorized simple difference, given that the interpersonal theory (8,9) is one of the most prominent theories in the field and has well-established measures of the posited factors. To test this theoretically driven model, a multiple logistic regression model was conducted with suicidal desire (as measured by the BSS), acquired capability for suicide (as measured by the ACSS-FAD), and their interaction term as independent variables.
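For readers who prefer equations, the two kinds of simple-difference models described above can be written as follows. The notation is ours and only a sketch: SA_i = 1 if participant i attempted suicide (with or without NSSI) and 0 if the participant engaged in NSSI only, X_ij is participant i's score on factor j, Desire_i is the BSS suicidal desire score, and Capability_i is the ACSS-FAD score. The text does not say whether predictors were centered before forming the interaction term, so no centering is assumed.

```latex
% Univariate model, fitted separately for each factor j
\operatorname{logit} \Pr(\mathrm{SA}_i = 1) = \gamma_{0j} + \gamma_{1j} X_{ij}

% Theory-informed model: desire, capability, and their interaction
\operatorname{logit} \Pr(\mathrm{SA}_i = 1) = \beta_0 + \beta_1 \mathrm{Desire}_i
  + \beta_2 \mathrm{Capability}_i + \beta_3 \left( \mathrm{Desire}_i \times \mathrm{Capability}_i \right)
```

If the interpersonal theory's account described a simple difference in the sense used here, this single three-term model would be expected to classify the two groups with high accuracy.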
A range of analyses was conducted to test for complicated differences between individuals engaging in NSSI and suicide attempt (i.e., to constrain from a complex model to a complicated model). First, we examined whether multiple logistic regression analyses with all variables might be sufficient (again, operationalized as AUC ≥ .90 in consideration of measurement error). This decision was based on prior research supporting the utility of logistic regression models in the classification and prediction of self-injurious thoughts and behaviors (59,60). Second, we analyzed whether machine learning analyses might be sufficient to distinguish the two groups. Specifically, we adopted random forest algorithms given that they have been commonly used in research on suicide, self-injury, and medicine (61-64). As a nonparametric method, random forests might also serve as a complement to the multiple logistic regression model (see below for details). If neither model was sufficient to classify individuals with NSSI and suicide attempt, this would indicate that the differences between the two groups were likely complex rather than complicated. If either model cleared the sufficiency requirement for complicated differences, we would then test for necessity by dropping variables from the models in several ways (i.e., removing the top five most important factors identified by random forests, the top five most discriminative factors identified by univariate analyses, or a randomly selected 10% of the variables). If multiple models produced similarly accurate classification, none of the models would be necessary. This would again indicate complex differences.
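The three ways of dropping variables to probe necessity can be sketched as follows. Assumptions: importance scores and univariate discrimination are supplied as dictionaries keyed by factor name, univariate discrimination is ranked by each factor's AUC (the text does not name the metric), and "10% of the variables" is rounded to the nearest whole number (3 of the 33 factors). Each reduced set would then be refit and checked against the sufficiency cutoff; if several reduced models still classify well, no single model is necessary.

```python
import random
from typing import Dict, List


def drop_sets(factors: List[str], rf_importance: Dict[str, float],
              univariate_auc: Dict[str, float], seed: int = 1) -> Dict[str, List[str]]:
    """Build the reduced factor lists used to probe necessity (hypothetical helper)."""
    top_rf = sorted(factors, key=lambda f: rf_importance[f], reverse=True)[:5]
    top_uni = sorted(factors, key=lambda f: univariate_auc[f], reverse=True)[:5]
    rng = random.Random(seed)
    n_random = max(1, round(0.10 * len(factors)))  # 10% of factors, at least one
    random_drop = rng.sample(factors, n_random)
    return {
        "minus_top5_rf": [f for f in factors if f not in top_rf],
        "minus_top5_univariate": [f for f in factors if f not in top_uni],
        "minus_random_10pct": [f for f in factors if f not in random_drop],
    }
```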

Random Forests
Given that random forest algorithms are relatively new compared to traditional logistic regression methods, we provide a brief overview of this method here. The random forest algorithm consists of an ensemble of decision trees. Randomness is strategically introduced into the algorithm to avoid overfitting (i.e., overcapitalizing on noise within the present sample) and to increase the likelihood that the algorithm will generalize to a different dataset. For instance, within each tree in the ensemble, only a random subset of factors is considered at each "split" of the decision tree. This procedure results in trees that are less correlated with one another, thereby making the overall algorithm more reliable and robust. Per common practice in the field, the number of factors randomly considered at each split in this study was set to the square root of the total number of factors (65). The fitting process was repeated 500 times in this study to produce a forest of trees (66,67). The predicted outcome for each participant (i.e., whether an individual engages in NSSI or suicide attempt) was determined by a majority vote of the 500 trees. The random forest algorithm also provides estimates of the importance of each factor within the algorithm by averaging and standardizing the decrease in classification accuracy after randomly permuting that variable.
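Three ingredients of the random forest described above can be sketched in plain Python: the number of factors tried at each split, the majority vote across trees, and permutation importance. This is a minimal illustration, not the randomForest package's implementation. In this sketch the split size is the floor of the square root of the number of factors (5 for the 33 factors in this study); vote ties are broken by first-seen order (an assumption; with an even number of trees such as 500, ties can occur); and the importance function uses a single permutation of one column and reports the raw drop in accuracy, whereas randomForest averages over trees on out-of-bag data and standardizes. The `predict` argument stands in for any fitted classifier (hypothetical).

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence


def mtry(n_factors: int) -> int:
    """Number of factors tried at each split: floor(sqrt(p)), at least 1."""
    return max(1, math.floor(math.sqrt(n_factors)))


def majority_vote(tree_votes: Sequence[int]) -> int:
    """Forest prediction for one participant from per-tree class votes (0/1)."""
    return Counter(tree_votes).most_common(1)[0][0]


def permutation_importance(predict: Callable[[List[List[float]]], List[int]],
                           X: List[List[float]], y: List[int], j: int,
                           seed: int = 0) -> float:
    """Drop in accuracy after shuffling column j (single permutation, unstandardized)."""
    def accuracy(rows: List[List[float]]) -> float:
        return sum(p == t for p, t in zip(predict(rows), y)) / len(y)

    base = accuracy(X)
    column = [row[j] for row in X]
    random.Random(seed).shuffle(column)
    # Rebuild rows with column j replaced by its shuffled values.
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return base - accuracy(X_perm)
```

A factor the classifier ignores should show zero importance, since shuffling it cannot change any prediction.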

Internal Validation
Internal validation methods help to reduce overfitting, in which algorithms capitalize on noise in a given dataset and yield performance estimates that may not generalize to a new dataset. We first employed 10-fold cross-validation, a commonly used internal validation method (65). This approach involves randomly dividing the data into 10 sets (folds); models are developed on nine of the sets combined and tested on the remaining set. This procedure is repeated 10 times, each time with a different set serving as the test set. Because of the sample imbalance and the accompanying possibility that a test set might not contain any individuals with NSSI only, we adopted a stratified approach during data splitting so that each fold preserved the proportions of the two groups. We also employed bootstrap optimism correction as an additional internal validation technique. To implement this method, the model is first trained on the complete available data and then on each of a set of bootstrap replicates (samples drawn with replacement from the original data). One hundred replicates were generated in this study. The models built on the replicates are subsequently applied to the original data, yielding performance estimates called "out of bag" estimates. The mean difference between the bootstrapped performance estimates (i.e., each replicate model's performance on its own replicate) and the "out of bag" estimates represents the extent of overfitting, which is termed "optimism." Optimism-corrected model performance indices are obtained by subtracting the optimism from the original model performance indices.
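Both internal validation schemes described above can be sketched with the standard library. The fold assignment shuffles each group separately and deals participants round-robin into 10 folds, so every fold receives roughly the same share of the smaller NSSI-only group (31 or 32 of the 319). The optimism function implements the correction as described: optimism is the mean, across replicates, of each replicate model's performance on its own bootstrap sample minus its performance on the original data (the values the text calls "out of bag" estimates), and it is subtracted from the apparent performance of the model trained on the full data. Model fitting itself is omitted, and the performance values in the checks are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def stratified_folds(y: Sequence[int], k: int = 10, seed: int = 0) -> List[int]:
    """Assign each participant a fold id 0..k-1, spreading each class evenly."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    fold = [0] * len(y)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold[idx] = position % k  # deal round-robin within each class
    return fold


def optimism_corrected(apparent: float, boot_apparent: Sequence[float],
                       boot_on_original: Sequence[float]) -> float:
    """Apparent performance minus mean optimism across bootstrap replicates."""
    optimism = mean(b - o for b, o in zip(boot_apparent, boot_on_original))
    return apparent - optimism
```

Dealing within each class, rather than shuffling the whole sample at once, is what guarantees that no test fold lacks NSSI-only participants.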
Bootstrap optimism correction has been employed in prior work using machine learning to study NSSI and suicide attempts (62,64). Some studies suggest that it is particularly appropriate for small samples because it allows models to be trained on the entire dataset (68-71). However, recent work indicates that this approach may not adequately correct for overfitting in some cases, producing higher accuracy estimates than other approaches (72). On the other hand, some studies indicate that bootstrap optimism correction performs similarly to other internal validation methods (73,74), that random forest models can generalize well to new data (75,76), and that random forest combined with bootstrap optimism correction performs similarly to other internal validation methods and other machine learning techniques (73,77,78). There is also evidence that Walsh et al.'s algorithm (64), which used this approach, generalizes well to new samples and new suicide-related outcomes (79,80). Nonetheless, much remains unknown about how these methods perform under various conditions; at a minimum, these discrepancies indicate that it is prudent to conduct analyses with multiple internal validation techniques.
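The bootstrap optimism correction described in the Internal Validation subsection can be sketched as follows. This is a hedged, standard-library illustration rather than the study's procedure: the function names are hypothetical, accuracy stands in for whichever index (e.g., AUC) is being corrected, and the toy one-nearest-neighbour model is chosen only because it memorizes its training data and therefore makes optimism easy to see.

```python
import random


def bootstrap_optimism(X, y, fit, metric, n_boot=100, seed=0):
    """Optimism-corrected performance (sketch). Returns (apparent, optimism, corrected)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = metric(fit(X, y), X, y)  # model trained and evaluated on the full data
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap replicate (with replacement)
        X_b, y_b = [X[i] for i in idx], [y[i] for i in idx]
        model = fit(X_b, y_b)
        on_replicate = metric(model, X_b, y_b)  # performance on its own replicate
        on_original = metric(model, X, y)       # the same model applied to the original data
        gaps.append(on_replicate - on_original)
    optimism = sum(gaps) / n_boot
    return apparent, optimism, apparent - optimism


# Hypothetical toy model: 1-nearest-neighbour on a single factor (it memorizes training data).
def fit_1nn(X, y):
    train = list(zip(X, y))

    def predict(x):
        return min(train, key=lambda pair: abs(pair[0][0] - x[0]))[1]

    return predict


def accuracy(model, X, y):
    return sum(model(x) == lab for x, lab in zip(X, y)) / len(y)
```

Because a memorizing model is perfect on whatever data it was trained on, its apparent accuracy is 1.0, yet it errs on original cases missing from a replicate; the average gap is the optimism that gets subtracted.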

Model Fit Indices
Consistent with prior research (64,81), a range of model fit indices was used to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to assess overall classification accuracy. Because individuals who had attempted suicide substantially outnumbered those who had engaged only in NSSI in the present sample, relying solely on AUC to evaluate models could be misleading. For instance, with imbalanced groups, a model can achieve a respectable AUC yet, at the threshold actually used for classification, correctly identify few members of the smaller group; such a model is not clinically meaningful. Therefore, we also considered precision (i.e., positive predictive value) and recall (i.e., sensitivity). Following guidelines in the field (34,64,81), AUCs of 0.50 to 0.59 indicate extremely poor classification, 0.60 to 0.69 poor classification, 0.70 to 0.79 fair classification, 0.80 to 0.89 good classification, and 0.90 or above excellent classification. These guidelines were also applied to precision and recall. Additionally, the Brier score was considered as an index of calibration. In clinical psychology, discrimination indices (e.g., AUC, precision, recall) have been used more commonly than calibration indices (82). Yet, for a model to be clinically useful, the probability of an outcome estimated by the model should approximate the actual probability of that outcome. In the context of this study, the proportion of individuals the model identified as having attempted suicide, relative to those it identified as engaging in NSSI only, should match the actual proportion in the sample. The Brier score ranges from zero to one, with zero indicating a complete match between projected probabilities and real-world outcomes. Higher scores indicate poorer model performance, reflecting greater deviation of the projected outcome probabilities from the real-world outcomes.
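Brier scores can be calculated with the following formula, where N is the number of classified individuals, p_i is the projected outcome probability for individual i, and o_i is the observed outcome for individual i (coded 1 if the outcome occurred and 0 otherwise) (83):

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2$$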
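The fit indices described in the Model Fit Indices subsection can be computed with a few lines of standard-library Python. This sketch rests on stated assumptions: suicide attempt is coded 1 and NSSI only is coded 0, predicted classes come from a 0.5 probability threshold, and the function names are hypothetical. AUC is computed as the probability that a randomly chosen case coded 1 receives a higher score than a randomly chosen case coded 0 (ties count one half).

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random negative one."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(preds, labels):
    """Precision (positive predictive value) and recall (sensitivity) for the class coded 1."""
    tp = sum(1 for p, lab in zip(preds, labels) if p == 1 and lab == 1)
    fp = sum(1 for p, lab in zip(preds, labels) if p == 1 and lab == 0)
    fn = sum(1 for p, lab in zip(preds, labels) if p == 0 and lab == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def brier(probs, outcomes):
    """Mean squared difference between projected probabilities and observed 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


# Verbal labels from the guidelines above (applied to AUC, precision, and recall).
CUTOFFS = [(0.90, "excellent"), (0.80, "good"), (0.70, "fair"), (0.60, "poor")]


def describe(value):
    for cutoff, label in CUTOFFS:
        if value >= cutoff:
            return label
    return "extremely poor"
```

For instance, with projected probabilities [0.9, 0.8, 0.3, 0.6, 0.2] and outcomes [1, 1, 0, 0, 1], the AUC is 4/6 (described as "poor"), precision and recall at the 0.5 threshold are both 2/3, and the Brier score is 0.228. A model that assigns every participant the same score has an AUC of exactly 0.5.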

Baseline Characteristics
Among the 319 individuals who engaged in NSSI but had not attempted suicide, 90.91% endorsed self-cutting, 42.63% endorsed self-burning, and 61.76% endorsed using methods other than cutting and burning. Many of these individuals were still actively engaging in these behaviors at the time of the study: 46.08% reported having cut themselves within the past month, and 23.51% within the past week. Approximately 5.96% and 3.45% reported having burned themselves in the past month and in the past week, respectively. For other NSSI methods, 28.53% reported such behaviors in the past month and 17.24% in the past week. Based on responses on the SITBI-SF (35), 40.12% of the participants reported no desire to stop engaging in NSSI. In terms of self-rated likelihood of engaging in NSSI again in the future, 94.36% reported a nonzero likelihood, and 74.29% reported at least a moderate likelihood (i.e., at least 5 on a 0-to-10 Likert scale).
Among the 635 individuals with a history of suicide attempt, the majority (91.65%) also endorsed previous engagement in NSSI. Most (75.59%) had attempted more than once in their lifetime. The median lifetime number of suicide attempts was 3 (M = 6.30, SD = 13.50). About 45.98% had attempted in the past year, 13.23% in the past month, and 4.41% in the past week. More than half of the participants (55.75%) reported at least one attempt that resulted in at least moderate physical damage and required medical attention. According to responses on the SITBI-SF (35), 94.80% of the participants with a lifetime history of suicide attempt reported a nonzero likelihood of attempting suicide again in the future, with 66.30% reporting at least a moderate likelihood (i.e., at least 5 on a 0-to-10 Likert scale).

Model Performance
Regarding the possibility of simple differences between individuals engaging in NSSI and those engaging in suicide attempt, univariate logistic regression analyses with both internal validation techniques showed that, on average, individual factors produced chance-level classification accuracy, and that all factors produced AUCs below 0.75 (Tables 2 and 3). Univariate classification was also weak on the other metrics (i.e., precision, recall, and Brier score) for most variables (Tables 4 and 5). The theoretically informed multiple logistic regression model including acquired capability for suicide, suicidal desire, and their interaction term produced near-chance accuracy, with fair precision, poor recall, and poor calibration (Tables 4 and 5). Thus, neither the univariate models nor the theoretically informed model appeared sufficient to distinguish between the two groups.
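To make the structure of the theoretically informed model concrete, the sketch below shows one way to fit a logistic regression with two main effects and their interaction term. It is a hedged illustration, not the study's analysis: the fitting routine (plain batch gradient ascent on the log-likelihood), the coding of suicide attempt as 1, and all function names are assumptions, and the toy data are invented solely to exercise the code. A univariate model is the same idea with a single predictor.

```python
import math


def design_row(desire, capability):
    """Intercept, the two main effects, and the desire x capability interaction term."""
    return [1.0, desire, capability, desire * capability]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, lr=0.5, n_iter=5000):
    """Maximum-likelihood logistic regression by batch gradient ascent (sketch)."""
    w = [0.0] * len(rows[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, label in zip(rows, y):
            err = label - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, desire, capability):
    """Projected probability of the group coded 1 (assumed here to be suicide attempt)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, design_row(desire, capability))))
```

The interaction column lets the model express the idea that capability matters more when desire is high; in toy data where only the high-desire, high-capability cell is coded 1, the fitted interaction weight comes out positive.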
Regarding possible complicated differences, traditional multiple logistic regression with either internal validation technique yielded only fair accuracy and did not appear sufficient to distinguish individuals with NSSI from those with suicide attempt (Tables 4 and 5). That is, results from the multiple logistic regression analyses did not allow us to constrain complex differences to complicated differences. When internally validated with 10-fold cross-validation, the random forest algorithm with all variables did not appear sufficient, as it yielded only fair accuracy (Table 4). When internally validated with bootstrap optimism correction, the random forest algorithm with all variables yielded an AUC close to 0.90, suggesting that it was sufficient to distinguish the two groups (Table 5). We then removed the following sets of variables from the models: the top five most important variables (i.e., confidence in killing self during preparations for suicide, intent on acting on suicide plans, lifetime history of preparations for suicide, self-rated likelihood of developing future suicide plans, disgust with self), the five most discriminative variables identified by univariate analyses (i.e., confidence in killing self during preparations for suicide, lifetime history of preparations for suicide, intent on acting on suicide plans, insomnia, past month frequency of suicide plan), and a randomly selected 10% of variables. After variables were removed in these various ways, however, the algorithms still produced similarly sufficient classification (Table 5), indicating that none of the algorithms was necessary. In other words, results from random forests with either internal validation technique failed to constrain complex differences to complicated differences.
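The necessity check described above (refitting after removing the most important variables, the most discriminative univariate variables, or a random 10% of variables) can be expressed as a small loop. In this hedged sketch, `evaluate` is a hypothetical placeholder that would refit the model on the kept variables and return an internally validated AUC; the 0.80 threshold is an assumption chosen to match the "good" cutoff in the Model Fit Indices guidelines, and all names are illustrative.

```python
import random


def ablation_sets(features, importance_rank, univariate_rank, k=5, frac=0.10, seed=0):
    """Variable sets used to probe necessity: drop top-k important, top-k univariate, random 10%."""
    rng = random.Random(seed)
    n_random = max(1, round(frac * len(features)))
    drops = {
        "top_importance": set(importance_rank[:k]),
        "top_univariate": set(univariate_rank[:k]),
        "random_10pct": set(rng.sample(features, n_random)),
    }
    return {name: [f for f in features if f not in dropped] for name, dropped in drops.items()}


def necessity_check(features, evaluate, importance_rank, univariate_rank, threshold=0.80, **kw):
    """Evaluate the full and reduced variable sets; True means accuracy stayed high in all of them."""
    results = {"all": evaluate(features)}
    for name, kept in ablation_sets(features, importance_rank, univariate_rank, **kw).items():
        results[name] = evaluate(kept)
    return results, all(auc >= threshold for auc in results.values())
```

If every reduced algorithm still clears the threshold, no particular variable set was necessary for high accuracy, which is the pattern reported for the bootstrap-corrected random forests; a drop below the threshold after removing a specific set would instead point toward a necessary (complicated) combination.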
Results remained consistent when analyses were restricted to individuals with NSSI only and individuals with suicide attempt only (Table 5): no model was able to constrain complex differences to either simple or complicated differences.

DISCUSSION
Although researchers have long been interested in how people who engage in NSSI differ from people who attempt suicide, the nature of these differences has remained unclear. The present findings indicate that these differences are complex in nature: we found no evidence of simple or complicated differences. Across all variables considered in the study, no single factor accurately separated the two groups in univariate analyses. The theoretically informed model with two factors (i.e., acquired capability for suicide and suicidal desire) likewise yielded near-chance accuracy. These results suggest that it is unlikely that an individual factor or a small set of factors is both necessary and sufficient to distinguish between individuals engaging in NSSI and those engaging in suicide attempt. Multiple logistic regression analyses and random forest analyses with 10-fold cross-validation produced fair accuracy, indicating that complicated algorithms constructed with these methods were insufficient to distinguish between the NSSI and suicide attempt groups with high accuracy. Random forest analyses with bootstrap optimism correction were sufficient to distinguish between these groups with high accuracy, but many complicated algorithms constructed with this approach produced comparable results. Accordingly, none of these algorithms was necessary to distinguish between these groups with high accuracy. These findings are most consistent with complex differences between people who engage in NSSI and people who attempt suicide, where no factor or factor combination is both necessary and sufficient to distinguish between these groups.
The current findings are consistent with prior research on self-injurious thoughts and behaviors. Multiple meta-analyses examining predictors of NSSI and suicidal thoughts and behaviors have found that, on average, univariate predictions yield accuracy only slightly above chance levels (33,34). Such findings indicate that all known factors and simple combinations of factors are insufficient to accurately predict self-injurious thoughts and behaviors or to distinguish among subgroups of people who engage in them. Also consistent with the present findings, several studies have found that complicated algorithms can produce fair-to-good accuracy using a range of statistical methods (64,84-87). Among complicated algorithms that have produced highly accurate classification or prediction, evidence across (and sometimes within) studies indicates that no particular factor combination is necessary to produce high accuracy. These broader findings, along with the present findings, show that even complicated algorithms are either insufficient or unnecessary to produce high-accuracy prediction or classification of self-injurious thoughts and behaviors. That is, existing evidence does not yet allow us to constrain from a complex view to a complicated view of self-injurious thoughts and behaviors.
All approaches employed in the present study converged on the same conclusion: the differences between people who engage in NSSI and people who attempt suicide are complex. However, the approaches indicated different degrees of complexity. Multiple logistic regression and random forest with 10-fold cross-validation indicated a higher degree of complexity, as these complicated algorithms were neither sufficient nor necessary for high-accuracy classification. Random forests with bootstrap optimism correction indicated a lower degree of complexity, with complicated algorithms that were sufficient but not necessary to produce high-accuracy classification.
It is important to note that, so far in this paper, we have discussed sufficiency in terms of the ability to produce high-accuracy classification within a single sample. However, NSSI and suicide research are primarily concerned with identifying simple or complicated factor combinations that accurately classify (or predict or cause) these phenomena across all samples. That is, we are primarily concerned with identifying nomothetic explanations or algorithms. To truly justify constraining from a complex to a complicated view of self-injurious thoughts and behaviors, we must show that a given algorithm is both sufficient and necessary across a large number of samples (ideally across different ages, cultures, etc.). Existing studies, including the present study, have been unable to detect a necessary and sufficient algorithm within a single sample, raising serious doubts about detecting such an algorithm that applies across all or most samples. It will always be possible that such a simple or complicated algorithm will be found, but in our opinion this possibility no longer appears plausible. We believe that it is most plausible that the causes, predictors, and correlates of self-injurious thoughts and behaviors are complex, and that it is most useful for researchers and clinicians to assume this complexity.
So, what would it mean if the causes, predictors, and correlates of self-injurious thoughts and behaviors truly are complex? In our view, this would mean at least six things. First, self-injurious thoughts and behaviors work like most other psychological phenomena, which are complex on the level of biopsychosocial factors [e.g., emotions: see (23,24)]. Second, the causes, predictors, and correlates of self-injurious thoughts and behaviors are indeterminate (i.e., they show degeneracy and pluripotentiality, which are core features of complex systems), but they are not random. There are likely to be many notable regularities across instances of self-injurious thoughts and behaviors (e.g., negative affect). But these regularities are unlikely to be either sensitive or specific to self-injurious thoughts and behaviors, and there are likely to be many irregularities. Third, this indeterminacy will likely make it impossible to form a simple (or even complicated) theory of the causes of self-injurious thoughts and behaviors that accounts for most instances. Fourth, this indeterminacy likely places a ceiling on the accuracy of prediction algorithms, especially across samples. Fifth, this indeterminacy likely places a ceiling on intervention efficacy, especially for interventions that target a few specific factors. Indeed, Fox et al. (88) meta-analyzed over 300 randomized controlled trials of interventions for self-injurious thoughts and behaviors, finding that many interventions slightly reduce these phenomena (~8-15% reductions), but no intervention produces large or even moderate reductions. Sixth, self-injurious thought and behavior research may benefit from moving to a different level of analysis. Although the contributions to these phenomena may be complex on the level of biopsychosocial factors, they may not be complex on other levels.
Facing similar difficulties, researchers in other areas of psychological science, most notably affective science (23,89-91), have moved to the level of psychological primitives (26).
Although a full treatment is beyond the scope of the present manuscript, we briefly outline this approach here to illustrate one potential way to understand self-injurious thoughts and behaviors on a level other than biopsychosocial factors. Psychological primitives are fundamental elements of the mind that cannot be reduced to anything else psychological (92). These psychological primitives give rise to all psychological phenomena. Three psychological primitives have been identified: internal stimuli, external stimuli, and conceptual knowledge (23,91,93,94). Psychological phenomena (including behaviors) emerge when an individual attempts to make meaning of their current internal and external stimuli based on their conceptual knowledge (i.e., prior experiences). For example, anger occurs when an individual makes sense of their ongoing internal and external stimuli based on their concept of anger. Each person's concept of anger is heterogeneous (i.e., includes many different exemplars of "anger") and partially unique. As a result, there is substantial heterogeneity in the internal and external stimuli associated with anger, and in behavioral expressions of anger (95-97). This heterogeneity helps explain why meta-analyses indicate that there is no neural or physiological signature for anger or any other emotion (98,99). In other words, biopsychosocial factor associations with anger are complex.
The primitive-based approach makes sense of this complexity by proposing that this complex set of factors is associated with anger via a common primitive-based mechanism: all of these factors activate the anger concept. As a result, a major focus of the primitive-based approach is to understand how concepts are formed, activated, implemented, and disrupted. For example, the anger concept can be disrupted with semantic satiation techniques, which makes it more difficult for people to experience anger and to identify stereotypically angry faces (100,101). Similarly, people with a certain form of semantic dementia do not possess concepts for specific emotions like anger, and they are accordingly unable to distinguish between stereotypically angry, fearful, or sad faces (102). The primitive-based approach further specifies that all behaviors are motivated by allostasis (i.e., prediction of whether the anticipated metabolic costs of a given behavior are worth the anticipated metabolic benefits; see 103). When an individual conceptualizes that a given behavior will promote allostasis more effectively than any other behavior considered in a given moment, they engage in that behavior.
From this perspective, NSSI and suicide attempts are best understood in terms of concepts for NSSI and suicide, and momentary conceptualizations of how NSSI and/or suicide might contribute to allostasis. Based on this approach, self-injury concepts (e.g., NSSI, suicide) are necessary (but not sufficient) for self-injury to occur. Consistent with this view, people who have immature self-injury concepts [e.g., young children: (104,105)] have very low rates of self-injurious thoughts and behaviors (4). As these concepts mature in late childhood and early adolescence (104,105), the rates of self-injurious thoughts and behaviors increase dramatically (4). Also based on this approach, the conceptualization that self-injury will promote allostasis more effectively than any other behavior in a given moment should be a necessary and sufficient cause of self-injurious behaviors. Recent work using a virtual reality (VR) suicide paradigm (106) is consistent with this possibility. These studies show that manipulations such as rejection, stress, and pain have little-to-no causal effect on VR suicide. But changing how someone conceptualizes the allostatic consequences of VR suicide (e.g., being told that engaging in VR suicide will help one avoid stress or pain, or obtain a reward) has a large causal effect on VR suicide (106,107). The greater the perceived likelihood of obtaining a reward or avoiding a punishment (i.e., of promoting allostasis), the more likely someone is to engage in VR suicide (108).
From this perspective, self-injury theories should focus on how people develop self-injury concepts and how they arrive at the momentary conceptualization that self-injury will promote allostasis. Self-injury prediction efforts should focus on how people conceptualize the potential consequences of self-injury (e.g., as providing major allostatic benefits vs. costs). And self-injury intervention efforts should focus on disrupting self-injury concepts and changing conceptualizations about the relative costs and benefits of engaging in self-injury. Once again, a full description of the primitive approach is beyond the scope of the present article (see 26 for a more detailed discussion), but the present findings, along with the broader literature, indicate that, clinically, we may benefit from developing primitive-based methods for predicting and preventing NSSI and suicidality.
A few limitations of the study should be considered when interpreting the findings. First, the present sample included individuals at high risk for self-injurious thoughts and behaviors. It is unclear how the findings might generalize to other samples. Second, most participants in the NSSI group reported self-cutting as their primary form of NSSI. Future studies are needed to directly examine the differences between individuals primarily engaging in other forms of NSSI (e.g., burning, scratching) and individuals engaging in suicide attempt. Because complexity already appeared to characterize the differences between the more uniform NSSI group (i.e., primarily self-cutting) and the suicide attempt group, we expect the current findings to replicate with a more heterogeneous NSSI group. Third, the study was unable to include all factors associated with NSSI or suicide attempt. Although it is possible that future studies might discover one individual factor or a specific set of factors that is both necessary and sufficient to separate individuals engaging in suicide attempt from individuals who only engage in NSSI, this possibility is increasingly implausible in light of previous meta-analytic evidence (33,34) and the present results.
In sum, the present study found that complex differences exist between individuals engaging in NSSI and those engaging in suicide attempt. It is always possible that future work will be able to constrain these differences to a complicated or simple set of factors. But we believe that it is most plausible to assume that these differences are truly complex and to shift some of our research questions and objectives to align with this complexity. One potential way to do this would be to move beyond biopsychosocial factors to a different level of analysis such as psychological primitives. Such a move may allow for biopsychosocial factor complexity while also providing an explanation for self-injurious thoughts and behaviors that is simple enough to advance theory, prediction, and treatment.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Florida State University Human Subjects Office and Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XH and JF conceived of the study. XH conducted analyses and drafted the initial manuscript. JR collected the data. JF and JR provided comments on the manuscript. XH and JF drafted the final manuscript.