REVIEW article

Front. Psychiatry, 27 November 2020
Sec. Psychopathology

Reviewing a Decade of Research Into Suicide and Related Behaviour Using the South London and Maudsley NHS Foundation Trust Clinical Record Interactive Search (CRIS) System

  • ¹Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
  • ²South London and Maudsley NHS Foundation Trust, London, United Kingdom

Suicide is a serious public health issue worldwide, yet current clinical methods for assessing a person's risk of taking their own life remain unreliable, and new methods for assessing suicide risk are being explored. The widespread adoption of electronic health records (EHRs) has opened up new possibilities for epidemiological studies of suicide and related behaviour amongst those receiving healthcare. These records capture valuable information entered by healthcare practitioners at the point of care. However, much recent work has relied heavily on the structured data in EHRs, whereas much of the important information about a patient's care pathway is recorded in the unstructured text of clinical notes. Accessing and structuring text data for use in clinical research, and particularly for suicide and self-harm research, is a significant challenge that is increasingly being addressed using methods from natural language processing (NLP) and machine learning (ML). In this review, we provide an overview of the range of suicide-related studies that have been carried out using the Clinical Record Interactive Search (CRIS): a database for epidemiological and clinical research that contains de-identified EHRs from the South London and Maudsley NHS Foundation Trust. We highlight the variety of clinical research questions, cohorts and techniques that have been explored for suicide and related behaviour research using CRIS, including the development of NLP and ML approaches. We demonstrate how EHR data provide comprehensive material for studying the prevalence of suicide and self-harm in clinical populations. Structured data alone are insufficient, and NLP methods are needed to identify relevant information in EHR data more accurately. We also show how the text in clinical notes provides signals for ML approaches to suicide risk assessment.
We envision increased progress in the decades to come, particularly in externally validating findings across multiple sites and countries, both in terms of clinical evidence and in terms of NLP and machine learning method transferability.


Suicidality Research Prior to CRIS

Prior to the introduction of electronic health records (EHRs), the study of suicidality in Camberwell, the southeast London catchment area served by King's College Hospital, was undertaken by paper case note review, for example of all referrals to a self-harm team over a 6-month period (1). Data were painstakingly extracted from each consecutive referral and checked to ensure that the referral met written criteria, and the Neeleman et al. (1) study posed a single research question, about ethnic differences.

Later, when Dutta et al. (2–4) sought to determine the epidemiology of completed suicide in a clinically representative cohort of patients experiencing their first episode of psychosis over a 40-year inception period, stringent diagnostic consistency was imperative. They achieved this by combining the Camberwell Cumulative Psychiatric Case Register for the period January 1, 1965, to December 31, 1983 (5) with the basic hospital computer records held at the time, which had structured fields, for the period January 1, 1984, to December 31, 2004, to generate a list of all patients admitted with any possible psychotic illness (according to ICD-9 and ICD-10 codes). They then read through the paper case records of all these patients, including medical, nursing, social work and occupational therapy notes, together with all correspondence relating to the year after each patient's first presentation, and used this information to complete the Operational Checklist for Psychotic Disorders (OPCRIT) (6). OPCRIT is a well-validated symptom checklist from which operational Research Diagnostic Criteria (RDC) (7) diagnoses can be generated by computer using the OPCRIT program.

This methodology meant that inclusion in the cohort was clearly and consistently defined. Deaths by suicide and open verdicts up to March 31, 2007, classified according to the International Classification of Diseases (ICD), were identified through a direct case-tracing procedure with the Office for National Statistics (ONS) for England and Wales and the General Register Office (GRO) for Scotland. This enabled the study of early risk factors for suicide in the cohort (3), as well as of unnatural and natural causes of mortality in first-episode psychosis patients (4).
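Identifying suicides and open verdicts from linked mortality data ultimately comes down to classifying each death's underlying-cause ICD-10 code. The sketch below is a minimal illustration, assuming the commonly used groupings X60–X84 (intentional self-harm) for suicide and Y10–Y34 (event of undetermined intent) for open verdicts. The function names and patient ids are hypothetical, and the exact code lists used by the ONS and by the studies cited above may differ (for example, in how sequelae codes are handled).

```python
import re
from collections import Counter

# Assumed ICD-10 groupings (illustrative, not taken from the cited studies):
#   X60-X84: intentional self-harm        -> "suicide"
#   Y10-Y34: event of undetermined intent -> "undetermined" (open verdict)
def classify_cause_of_death(icd10_code: str) -> str:
    """Map an underlying-cause ICD-10 code to 'suicide', 'undetermined' or 'other'."""
    code = icd10_code.strip().upper().replace(".", "")
    match = re.match(r"([A-Z])(\d{2})", code)
    if match is None:
        return "other"
    letter, category = match.group(1), int(match.group(2))
    if letter == "X" and 60 <= category <= 84:
        return "suicide"
    if letter == "Y" and 10 <= category <= 34:
        return "undetermined"
    return "other"


def summarise_linked_deaths(deaths: dict[str, str]) -> Counter:
    """Count outcomes for a mapping of (hypothetical) patient ids to cause-of-death codes."""
    return Counter(classify_cause_of_death(code) for code in deaths.values())


linked = {"p01": "X70", "p02": "Y12.0", "p03": "C34.9", "p04": "X84"}
print(summarise_linked_deaths(linked))
```

Treating open verdicts as a separate category, rather than merging them into suicide, lets an analysis report both definitions side by side.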

OPCRIT+ (a redesigned version of OPCRIT for use in clinical settings, with an expanded number of objectively rated items) gave access to structured symptom information entered by clinicians to generate diagnoses. It captured “suicidal ideation” but not self-harm (8), limiting its application to the study of self-harm and suicidal behaviour. A more fundamental reason it proved less useful than hoped, however, was that “clinicians may feel that these documents are overly prescriptive and restrict their clinical freedom.” There was “disgruntlement amongst the clinicians using the form; extra time on ‘paperwork’ is rarely popular” (9), and OPCRIT+ remains a research tool for obtaining “gold standard” research diagnoses, e.g., (10).

Why EHRs and CRIS?

The widespread adoption of EHRs has meant that large-scale clinical data are now available for clinical research, although researchers have to contend with the large volume, complexity and heterogeneity of these “big data” resources. Typical EHR systems store patient data in both structured fields and as unstructured text (as well as other media types, such as medical images). Structured data fields, such as drop-down menus, forms and checkboxes, tend to be made available to clinical practitioners as a means to directly encode patient diagnoses, assessment results, etc. in a predetermined format. However, rates of completion can vary. Unstructured text entry allows for more nuanced documentation, providing context to assessments, patient status, and other information pertinent to the clinical interaction. The availability of these electronic health data has greatly facilitated mental health research. Investigators can now use EHRs to gather data about clinical populations, identify participants for clinical trials, carry out retrospective case-control studies, develop and trial predictive models, and guide the implementation of evidence-based practices (11, 12).

In 2008, the South London and Maudsley National Health Service (NHS) Foundation Trust Biomedical Research Centre (SLaM BRC) developed the Clinical Record Interactive Search (CRIS) application. Since then, CRIS has become an extensive UK-based repository of anonymised structured and free-text data derived from the EHR system used by SLaM [see (13) for further details]. Under a strict governance model, CRIS provides secure access to the de-identified records of all patients in contact with SLaM services. SLaM provides comprehensive mental health services to an ethnically and socioeconomically diverse population of over 1.2 million residents of all ages across four inner-city and suburban London boroughs: Croydon, Lambeth, Lewisham and Southwark. SLaM also provides highly specialist services that treat patients from across the UK. SLaM CRIS has served as the UK exemplar for NHS Mental Health Trusts, demonstrating an approach for transforming the electronic health record into a data asset and research tool. The SLaM-based CRIS model has been replicated in 12 NHS Trusts across the UK¹, covering over 2.6 million patients.

CRIS provides unprecedented information on mental disorders and outcomes in routine clinical care at scale, particularly through enhancements from the use of natural language processing (NLP) to extract previously inaccessible information, ranging from patients' cognitive function, smoking status and education, to antipsychotic medication profiles and substance misuse (14), as well as linkages to external data sources such as national mortality data from the ONS (15), education data (National Pupil Database) (16), and Hospital Episode Statistics (HES) (17). CRIS has also allowed smaller-scale linkages, such as SHIELD, a service improvement project investigating self-harm at the emergency departments of two major London hospitals (18).

The availability of this type of large-scale data heralds the prospect of using statistical and data science approaches to analyse larger cohorts and better understand how these behaviours manifest in healthcare settings (19). However, using these data also presents major challenges, as much of the key clinical information, including information on suicidal behaviour, is recorded in unstructured clinical case notes and correspondence (20–22).

Over the last 10 years, researchers have used CRIS to conduct a number of epidemiological studies examining suicidal behaviours across a range of mental health conditions (e.g., autism, psychotic disorders) and demographic groups (e.g., adults, children and adolescents, pregnant women). Methodologies have evolved, improving both the accuracy with which suicidality-related constructs are identified and the predictive modelling of suicide risk. In the following sections, we review the evidence generated from CRIS on suicidal behaviours, the NLP methods used, and the value of the resulting cohorts and datasets.

Identification and Prevalence Estimates of Suicidality in CRIS Clinical Populations

Suicide-related behaviour is the manifestation of a complex set of phenomena that depend on many contextual factors, which can change quickly from one day to the next. Completed suicide remains relatively rare, meaning that tools to assess suicide risk must have high predictive validity to be of use in a clinical setting (23). Accurate identification of suicide-related behaviour is, therefore, both highly challenging and of prime importance, both for determining the prevalence of suicidal behaviour in clinical populations and for developing risk models. While the earliest studies on suicide and related behaviour in CRIS relied on structured fields and mortality data linkages to identify cohorts, increasing efforts have focussed on using NLP to identify suicidality-related concepts in the high volume of unstructured clinical text held in the database. Automatically identifying mentions of suicidal behaviour in clinical notes is complicated by the need to distinguish actual events relating to the patient from negated mentions, behaviour reported as family history, and mentions recorded with a degree of uncertainty (24). Furthermore, given the inherent variation across clinical populations, which is reflected in the language used in clinical reporting, NLP tools developed for one clinical subpopulation, such as working age adults, may not be reliably transferable to another group, such as school age children, without adaptation. NLP systems used to identify suicide-related constructs in clinical notes must, therefore, be developed for and validated within each target population.

A wide range of known risk and contributory factors are associated with suicide, with symptoms of mental illness being recognisable in more than 85% of people who die by suicide, according to psychological autopsy interviews with family, friends and medical professionals (25, 26). Over the last 10 years, research using CRIS has been conducted to examine the associations of self-harm, suicidality and death by suicide with mental health conditions and a broad range of situational factors, from homelessness to drug misuse to limited service continuity (27–29). As we describe in our summary below, initial studies on suicide and related behaviour in CRIS used structured fields held within standard assessment forms or diagnostic codes. Progressively, researchers began to make use of CRIS's free-text fields and search functionalities, while more recently, NLP techniques have been employed to extract and structure suicide-related information from within the case notes. The principal characteristics of the clinical cohorts mentioned in this review are summarised in Table 1.


Table 1. Summarised characteristics of clinical cohorts created using CRIS for the study of suicide and related behaviour.

Using Structured Data

Suicidality Outcome Data

The Health of the Nation Outcome Scales (HoNOS) were introduced in 1996 to measure the health and social functioning of people with mental illness. Within SLaM, as in most UK mental health trusts, clinicians are expected to complete HoNOS for all patients receiving care. The HoNOS non-accidental self-injury item has been shown to be the only individual item associated with higher mental health service costs (37), and it has been used in a number of CRIS studies to assess both the direct and indirect impact of self-harm. The item has been included as a covariate in several analyses of adverse outcomes within CRIS, including homelessness and length of hospital stay for psychiatric inpatients (27), functional status and mortality in serious mental illness (38), facilitated discharge and bed days (39), and the effects of clozapine on premature mortality (15). When assessing self-harm as a potential risk factor for mortality among patients with personality disorder, the HoNOS item was again used in isolation as a marker of self-harm risk (40). Despite the availability of optional structured questionnaires in CRIS, such as the Patient Health Questionnaire-9 (PHQ-9) (whose final item asks about thoughts of self-harm and suicide) and the Beck Scale for Suicide Ideation (BSS), very few are completed in general clinical work, where clinicians favour free-text input, which limits their value for studies of real-world clinical cohorts. Conversely, local NHS Trust requirements to complete structured suicide risk assessments for all patients mean that these data are better recorded, and they have been studied.

Suicide Risk Assessment Data

Structured suicide and violence risk assessments in mental health services have been shown to have low predictive accuracy for all-cause mortality (30); however, these assessments continue to be used in clinical practice. Lopez-Morinigo et al. examined the use of a risk assessment proforma in their investigation of suicide in secondary mental health care. The proforma, which clinicians were expected to complete at that time under local clinical policy, consisted of present/absent tick boxes for factors including suicidal history, suicidal ideation and alcohol misuse. They found that patients who had died by suicide with a diagnosis other than schizophrenia spectrum disorder were much less likely than those with schizophrenia to have had either a full risk assessment or a complete HoNOS, even though they showed increased frequency and greater predictability of key suicide risk assessment factors: suicidal ideation, hopelessness, impulsivity and significant loss (29). In a later study, they found structured suicide risk assessment in schizophrenia spectrum disorders to be of little use in predicting completed suicide, with risk assessments fully completed for only 43.6% of patients who had died by suicide (30). Subsequent work revealed a limited role for structured risk assessment, particularly in capturing more nuanced factors relevant to suicide risk such as “mental pain” (31). The authors suggest that research should “switch the focus from long-term risk factors to short-term risk algorithms, which are more relevant to the clinician.”

Suicide Mortality Data

Research into mortality, including death by suicide, has typically used ICD-10 diagnostic codes (which must be completed as part of clinical assessment) linked with outcome data from the ONS (15, 41). In a retrospective cohort study, Roberts et al. (32) used CRIS to investigate mortality among individuals in secondary and tertiary care who had been diagnosed with chronic fatigue syndrome (CFS). Although all-cause mortality for people with CFS did not differ significantly from that of the general population, their risk of completed suicide was significantly elevated. CRIS has also been used for a number of pharmaco-epidemiological studies; for example, Hayes et al. (15) examined whether psychopharmacological interventions were associated with the risk of death by suicide in patients with serious mental illness (including schizophrenia, schizoaffective and bipolar disorders). They found that treatment with clozapine was associated with a reduced risk of death from unnatural causes, including suicide, as well as from natural causes.

Using Unstructured Data

Free-Text Keywords to Study Self-Harm Presentations to Emergency Departments

Polling et al. (18) used external data linkages in combination with CRIS data (including keywords recorded in free-text fields) to create a novel dataset for the study of self-harm, which is strongly associated with mental health disorders and is the strongest single risk factor for future suicide. In England, population-level data on self-harm are recorded in the Hospital Episode Statistics (HES) database. However, many emergency department attendances, specifically those that do not lead to a hospital admission, still go unrecorded in HES, and the reason for presentation is often not completed, limiting the value of this data source for studies of self-harm presentations. Polling et al. addressed these shortcomings by combining routinely collected data from electronic health records in CRIS with HES. They validated their data against another dataset curated through manual review of emergency department notes and audit forms, and also compiled a list of self-harm search terms.

Free-Text Keywords to Study Perinatal Self-Harm in Women With Psychiatric Disorders

Using the self-harm-related terms identified by Polling et al. (18), Taylor et al. (33) investigated the prevalence of, and risk factors for, self-harm and suicidal ideation during pregnancy in women with psychotic disorders and bipolar disorder. They identified a cohort of 420 patients by performing a free-text search of CRIS records for both suicidal ideation and self-harm. The perinatal period is generally associated with a lower risk of both suicide and self-harm in the general population; however, women diagnosed with severe postpartum psychiatric disorders are at up to 70 times greater risk of suicide. In Taylor et al.'s cohort, 24.3% of women had a report of suicidal ideation and 7.9% had a recorded self-harm event during their index pregnancy.

Free-Text Keywords to Study Self-Harm and Human Trafficking

In a further study using the free-text search capabilities of CRIS, Borschmann et al. (42) analysed self-harm among victims of human trafficking. They identified patients for their cohort by searching CRIS free-text notes for terms indicating possible trafficking (e.g., “victim of trafficking,” “sex trafficking,” “trafficked”). Similarly, documents were screened for mentions of self-harm behaviour using a list of terms including “self-harm,” “DSH,” “burn*” and “electrocut*.” They found that 33% of all trafficked patients had self-harmed prior to care, and 25% did so during care. After self-harming, trafficked patients were more likely than non-trafficked patients to be admitted as psychiatric inpatients, but less likely to attend an emergency department.
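Term lists such as the one above mix exact phrases with wildcard stems. A minimal sketch of screening documents against such a list is shown below, assuming a trailing “*” means “any word continuation”; this is an illustrative regular-expression translation, not the search engine that CRIS actually uses, and the function names and example notes are hypothetical.

```python
import re


def compile_search_terms(terms):
    """Compile search terms into one case-insensitive regex.

    Assumption: a trailing '*' is a prefix wildcard ('burn*' matches 'burns',
    'burning', 'burnt'); other terms must match as whole words.
    """
    patterns = []
    for term in terms:
        if term.endswith("*"):
            patterns.append(r"\b" + re.escape(term[:-1]) + r"\w*")
        else:
            patterns.append(r"\b" + re.escape(term) + r"\b")
    return re.compile("|".join(patterns), re.IGNORECASE)


SELF_HARM_TERMS = ["self-harm", "DSH", "burn*", "electrocut*"]
self_harm_regex = compile_search_terms(SELF_HARM_TERMS)


def screen_documents(documents):
    """Return the ids of documents containing at least one matching term."""
    return [doc_id for doc_id, text in documents.items() if self_harm_regex.search(text)]


notes = {
    "a": "Patient reported burning her arm.",
    "b": "Discussed burnout at work.",
    "c": "No concerns today.",
    "d": "History of DSH.",
}
print(screen_documents(notes))
```

Note that document "b" is returned too: broad wildcards over-match (“burn*” also matches “burnout”), so keyword hits are best treated as candidates rather than confirmed cases.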

Using Natural Language Processing (NLP)

The first NLP approaches developed for CRIS data used pattern matching, implemented in the GATE framework (43), to identify specific pieces of information (e.g., medication, smoking status, substance misuse). In many cases, the information of interest is a particular clinical construct (e.g., hallucinations, echolalia) or a specific diagnosis, and a bespoke application, TextHunter (14), was developed for these types of constructs. TextHunter requires a set of manually annotated examples to train a supervised machine learning classifier (a Support Vector Machine). These NLP applications identify and classify the relevant constructs and produce structured variables indicating their presence or absence in the texts, which are stored as table columns in the CRIS database. Researchers can access these variables, along with the “standard” structured fields from the EHR (e.g., diagnosis codes, demographic information, dates), through the SQL interface of the CRIS database to identify patient cohorts for epidemiological studies and clinical research. Several studies cited herein have made use of these structured variables (28, 32, 36).
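Because NLP outputs end up as ordinary table columns, selecting a cohort becomes a SQL join between demographic tables and NLP-derived variables. The sketch below uses an in-memory SQLite database to illustrate the idea; the table names, column names and rows are hypothetical and do not reflect the real CRIS schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not the actual CRIS tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE nlp_annotations (
    patient_id INTEGER,
    document_date TEXT,
    construct TEXT,   -- name of the NLP-derived construct
    present INTEGER   -- 1 = present, 0 = absent or negated
);
INSERT INTO patients VALUES (1, 34), (2, 17), (3, 52);
INSERT INTO nlp_annotations VALUES
    (1, '2019-03-01', 'suicidal_ideation', 1),
    (2, '2019-05-10', 'suicidal_ideation', 0),
    (3, '2019-07-22', 'smoking', 1),
    (3, '2019-08-02', 'suicidal_ideation', 1);
""")

COHORT_QUERY = """
SELECT DISTINCT p.patient_id
FROM patients AS p
JOIN nlp_annotations AS a ON a.patient_id = p.patient_id
WHERE a.construct = ? AND a.present = 1 AND p.age >= ?
ORDER BY p.patient_id
"""


def select_cohort(construct, min_age):
    """Ids of patients with at least one positive NLP annotation for `construct`."""
    return [row[0] for row in conn.execute(COHORT_QUERY, (construct, min_age))]


print(select_cohort("suicidal_ideation", 18))
```

Parameterised queries (the `?` placeholders) keep the construct name and age threshold out of the SQL string itself.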

In addition to these “integrated” NLP applications, clinicians have worked alongside NLP researchers to develop custom NLP tools to identify suicide-related constructs in specific population samples within CRIS. As we have seen, the focus of most work has been the epidemiology and prevalence of suicidal behaviour, with NLP tools that use both rule-based (35, 44) and machine learning paradigms (45), including neural network architectures (46). Most recently, efforts have also been made to model dynamic suicide risk using supervised machine learning (36).

Study of Mortality in Opioid Use Disorder Patients Using NLP to Identify Cohorts

Using data from CRIS with an external linkage to ONS mortality data, Bogdanowicz et al. (28) investigated the effectiveness of addiction-specific clinical risk assessments for identifying groups with high mortality in opioid use disorder (OUD). Patients with OUD were identified by the ICD-10 code F11, supplemented with the structured output of a CRIS NLP tool that identifies diagnoses in unstructured clinical notes. Overdose (both accidental and intentional) was the most common cause of death, and clinically assessed suicidality was significantly associated with increased overdose mortality.
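Supplementing coded diagnoses with NLP-derived diagnoses amounts to taking the union of two patient sets. The sketch below is a minimal illustration, assuming both sources can be reduced to a mapping from a (hypothetical) patient id to a list of ICD-10 codes; it also reports which patients were found only through the clinical text, which shows what the NLP output adds.

```python
def has_code_prefix(codes, prefix="F11"):
    """True if any ICD-10 code in `codes` falls under the given category prefix."""
    return any(code.upper().startswith(prefix) for code in codes)


def identify_cohort(structured_diagnoses, nlp_diagnoses, prefix="F11"):
    """Union of structured and NLP-derived diagnoses (hypothetical input format).

    Both inputs map patient id -> iterable of ICD-10 codes.
    Returns (cohort, ids found only via NLP).
    """
    from_codes = {pid for pid, codes in structured_diagnoses.items() if has_code_prefix(codes, prefix)}
    from_text = {pid for pid, codes in nlp_diagnoses.items() if has_code_prefix(codes, prefix)}
    return from_codes | from_text, from_text - from_codes


structured = {"p1": ["F11.2"], "p2": ["F20.0"], "p3": []}
from_notes = {"p2": ["F11"], "p3": ["F10.2"], "p4": ["F11.1"]}
cohort, added_by_nlp = identify_cohort(structured, from_notes)
print(sorted(cohort), sorted(added_by_nlp))
```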

NLP to Identify Suicide-Related Behaviour

The growing body of research on suicide and related behaviour in CRIS, and the diversity of clinical populations under study, have created a need for more targeted methods of accessing the suicide-related data within unstructured clinical narratives. NLP systems designed for this task need to identify the different types of suicide-related behaviour (suicide attempt, suicidal ideation, self-harm, etc.) and to account for the linguistic variation that indicates whether a mention is affirmed, negated or uncertain, and whether it refers to the patient or to a family member. These considerations have spurred the recent development of bespoke NLP tools. For example, Gkotsis et al. (44) developed an NLP system specifically designed to detect whether a suicide-related concept is negated; this system was developed and evaluated on a random sample of clinical notes from CRIS. More recently, Fernandes et al. (45) developed two NLP approaches: one to detect relevant mentions of suicidal ideation and another to identify recorded suicide attempts.
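To make the negation and family-history problem concrete, the sketch below implements a simple NegEx-style heuristic: it finds a suicide-related term and checks a few preceding tokens for trigger words. This is a deliberately simplified, swapped-in technique for illustration; it is not the method of Gkotsis et al. (44), and the trigger lists, target pattern and window size are all assumptions.

```python
import re

# Illustrative, deliberately tiny trigger lists (assumptions, not a validated lexicon).
NEGATION_TRIGGERS = ("no", "not", "denies", "denied", "without", "no evidence of")
FAMILY_TRIGGERS = ("mother", "father", "brother", "sister", "family history of")
TARGET = re.compile(r"suicid\w*|self[- ]harm\w*|overdos\w*", re.IGNORECASE)
WINDOW = 5  # how many tokens before the target mention are inspected


def classify_mention(sentence: str) -> str:
    """Label a sentence as 'no_mention', 'negated', 'family' or 'affirmed'.

    Only the first target mention and only preceding triggers are considered.
    """
    match = TARGET.search(sentence)
    if match is None:
        return "no_mention"
    tokens = re.findall(r"[a-z]+", sentence[: match.start()].lower())
    context = " " + " ".join(tokens[-WINDOW:]) + " "
    if any(f" {trigger} " in context for trigger in NEGATION_TRIGGERS):
        return "negated"
    if any(f" {trigger} " in context for trigger in FAMILY_TRIGGERS):
        return "family"
    return "affirmed"


for s in ("Patient denies suicidal ideation.", "Her mother died by suicide.",
          "He took an overdose last night."):
    print(s, "->", classify_mention(s))
```

Heuristics like this miss post-mention cues (“suicidal ideation was ruled out”), uncertainty and longer-range syntax, which is part of why bespoke, validated tools have been developed for CRIS.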

NLP Features to Identify Key Suicide Risk Periods

Identifying periods during which a patient is at elevated risk of making a suicide attempt is key to enabling timely intervention. However, the information available to clinicians about the rapidly changing dynamic factors leading up to a suicide attempt has been limited. Bittar et al. (36) explored whether EHRs can be used to automatically predict suicide attempts in a broad clinical population (across all age groups) using only data from a relatively short period of 30 days leading up to an event. This work was based on the hypothesis that the period before a suicide attempt is a time of acute crisis that is reflected, explicitly or implicitly, in clinician records, making such periods distinguishable from periods not preceding an attempt. Combining all three feature types, namely (1) structured data from EHRs, (2) structured values extracted by NLP software, and (3) a vectorised bag-of-words representation of all documents, produced the best model for distinguishing “document windows” preceding a suicide attempt from those that did not, indicating that the feature types were complementary.
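The idea of combining three feature types for one 30-day document window can be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the tokenisation, the choice to exclude the index date from the window, and the prefixing scheme are all hypothetical, and it does not reproduce the feature engineering or classifier of Bittar et al. (36).

```python
import re
from collections import Counter
from datetime import date, timedelta


def documents_in_window(dated_docs, index_date, days=30):
    """Texts dated within `days` days before `index_date` (index date excluded)."""
    start = index_date - timedelta(days=days)
    return [text for doc_date, text in dated_docs if start <= doc_date < index_date]


def window_features(structured, nlp_flags, texts):
    """Merge structured fields, NLP-derived flags and bag-of-words counts.

    Group prefixes ("struct:", "nlp:", "bow:") stop names from colliding.
    """
    features = {f"struct:{name}": float(value) for name, value in structured.items()}
    features.update({f"nlp:{name}": float(bool(flag)) for name, flag in nlp_flags.items()})
    tokens = Counter(tok for text in texts for tok in re.findall(r"[a-z]+", text.lower()))
    features.update({f"bow:{tok}": float(n) for tok, n in tokens.items()})
    return features


notes = [
    (date(2020, 1, 1), "Routine review."),
    (date(2020, 1, 20), "Feels hopeless, low mood."),
    (date(2020, 2, 5), "Seen after the index date."),
]
window = documents_in_window(notes, index_date=date(2020, 2, 1))
features = window_features({"age": 34, "admissions": 2}, {"hopelessness": True}, window)
print(sorted(features))
```

The resulting sparse dictionaries could then be vectorised (for example with scikit-learn's `DictVectorizer`) and passed to any binary classifier that labels windows as preceding an attempt or not.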

NLP to Study Suicidal Behaviour in Children and Adolescents

The risk and conceptualisation of suicidal behaviour in children and adolescents can differ from those in adults (47). Downs et al. (34) concluded that clinicians' notes on suicide risk in children and adolescents differ from those in adult reviews. For example, clinicians may rely more heavily on third-party reports, in which caregivers voice concerns about the young person's suicidality. Suicidality may also be “discovered” rather than being the presenting complaint, which changes the emphasis and position of suicide-related text and progress notes within the young person's clinical record.

Adolescence is associated with a higher risk of suicide and self-harm than most other age groups, but few studies have examined the prevalence of suicidal behaviour in large adolescent patient cohorts. Downs et al. (34) first used CRIS to explore suicidality in young people, focussing on a population with autism spectrum disorders (ASD), who show a much greater risk of suicidal behaviours than neurotypically developing children. A cohort of young people diagnosed with ASD was identified, and NLP techniques were used to identify suicidal behaviour in their clinical notes in CRIS. The corresponding free-text notes (progress reports, medical correspondence, risk assessments, etc.) were manually annotated for mentions of suicidality by clinical researchers. A prevalence analysis on a sample of the data showed that only 3% of all documents mentioned suicide-related information.

Using a subset of this cohort, Holden et al. (48) applied NLP approaches within a historical cohort design to extract information on victimisation by bullying and on suicidal behaviour. They found that young people with ASD who were bullied were nearly twice as likely to report later suicidal ideation. The dataset created by Downs et al. has also recently proven useful for training machine learning models for suicide research. Song et al. (46) used a revised version of the data to develop a deep neural network classifier that identifies sentences containing positive mentions of suicidality while taking into account contextual information from surrounding sentences. This type of approach offers an alternative way of modelling suicide-related information in text, one that better accounts for the narrative discourse of clinical documentation.
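Taking surrounding sentences into account starts with how inputs are assembled: each target sentence is paired with its neighbours. The sketch below shows only this input-preparation idea, under the assumption of a fixed window of neighbouring sentences with padding at document boundaries; the function name, padding token and window width are hypothetical, and it does not reproduce the neural architecture of Song et al. (46).

```python
def context_windows(sentences, width=1, pad="<none>"):
    """Pair each sentence with `width` neighbouring sentences on each side.

    Positions beyond the start or end of the document are filled with `pad`.
    """
    padded = [pad] * width + list(sentences) + [pad] * width
    windows = []
    for i in range(len(sentences)):
        centre = i + width
        windows.append({
            "left": padded[centre - width:centre],
            "target": padded[centre],
            "right": padded[centre + 1:centre + 1 + width],
        })
    return windows


note = ["Mum worried about her.", "She said she wants to die.", "Safety plan reviewed."]
for w in context_windows(note):
    print(w["left"], "|", w["target"], "|", w["right"])
```

A classifier can then encode the target and its context separately, so that a caregiver's report in one sentence can inform the label of the next.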

Velupillai et al. (19) developed and validated an NLP method for identifying suicidality in EHRs across a more heterogeneous clinical adolescent population, extending the population beyond ASD. They examined 1,601,422 documents from 23,455 young people, and their method accurately identified suicidal behaviour information in this very broad clinical population. The resulting dataset and the NLP approaches used provide a powerful example of how NLP can be used to rapidly examine the prevalence of suicidal behaviour in very large adolescent clinical populations.

NLP to Study Depression and Suicidality in Older Adults

Free-text mentions of depressive symptoms were used as outcome measures in the assessment of later-life depression in people from ethnic minorities by Mansour et al. (49). This study used NLP tools designed to detect depressive symptoms recorded in unstructured texts in CRIS, including the identification of mentions of suicidal ideation. These depressive symptom NLP tools, developed to account for the presence of contextual markers such as negation and irrelevant concepts, were also used by Cai et al. (50) in their investigation into predictors of mortality in people with late life depression.

The Next Ten Years?

Although EHR data are not created for research purposes, they provide a rich resource for large-scale retrospective research, allowing the identification of diverse and comprehensive clinical study samples. One of the main challenges in suicide research is obtaining study samples large enough to study an outcome whose base rate is high enough for predictive modelling to achieve a meaningful positive predictive value. The low base rate of completed suicide limits the predictive value of any model, whether based on established statistical techniques or on machine learning (51), but related behaviours, such as suicidal ideation, intention, planning and self-harm, can be studied. Over the past 10 years, CRIS has provided an unprecedented resource for studying suicide and related behaviour in a UK clinical population, to an extent that would not have been possible before the introduction of EHRs. The development and implementation of this type of resource is a highly valuable investment that should be encouraged.
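The effect of base rate on positive predictive value (PPV) follows directly from Bayes' theorem: PPV = Se·p / (Se·p + (1 − Sp)(1 − p)), where Se is sensitivity, Sp is specificity and p is prevalence. The short calculation below uses purely illustrative numbers, a hypothetical classifier with 90% sensitivity and 90% specificity, not figures from any study cited here.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = Se*p / (Se*p + (1 - Sp)*(1 - p)), from Bayes' theorem."""
    p_true_positive = sensitivity * prevalence
    p_false_positive = (1 - specificity) * (1 - prevalence)
    return p_true_positive / (p_true_positive + p_false_positive)


# The same hypothetical classifier (90% sensitivity, 90% specificity) at three base rates.
for base_rate in (0.1, 0.01, 0.001):
    ppv = positive_predictive_value(0.9, 0.9, base_rate)
    print(f"base rate {base_rate}: PPV = {ppv:.3f}")
```

With these assumed values, PPV falls from 0.5 at a 10% base rate to below 0.01 at 0.1%: most positive predictions for a rare outcome are false alarms, which is why more common related behaviours are attractive modelling targets.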

One avenue of research being pursued in CRIS is the comparison of suicide-related phenomena over time, within the same hospital trust culture, but across mandatory changes in how assessments are made and recorded. Focusing a review on a single mental health trust also opens up opportunities for more detailed analyses than a review covering multiple sites (52).

Furthermore, EHRs reflect real-world clinical practice. This means that the context of how, for example, structured risk assessment tools and other schedules, like HoNOS, are used in daily clinical work needs to be well understood when including them as variables in clinical research studies. Most of the relevant information is found in the free text, and appropriate NLP solutions are key components for enabling risk modelling.

Looking to the future, replicating work based on SLaM CRIS, including the NLP applications developed, across other EHR systems and clinical catchment areas would provide insights into the generalisability of these models to new clinical settings. However, the portability of these NLP applications needs further scrutiny. The studies in this review all developed their methodologies within the same CRIS system, whose clinical text may be more internally homogeneous (e.g., in terminology) than that of CRIS systems in other health districts. Testing the generalisability of the NLP tools described in this review across other health organisations is essential and has only just begun. As described earlier, CRIS has also been implemented at other sites across the UK. One example is the Camden & Islington Research Database (53), which has proved a useful starting point for comparison with SLaM CRIS, as its data reflect a similar healthcare organisation with socio-demographic and geographic similarities, i.e., a comparable, but not identical, urban population.

Other studies using EHR data for suicide-related research range from rule-based approaches, e.g., to estimate the relative use of diagnostic codes and free-text information in EHR data (21) or to monitor suicidal patients in primary care (20), to more advanced ML and NLP approaches, e.g., for predicting psychiatric readmission risk from inpatient psychiatric discharge summaries, where suicidality could be an important risk factor (54), for automated epidemiological surveillance of suicide attempts in emergency departments in France (22), or for estimating the risk of death by suicide after discharge (55). Findings from these studies are, however, currently difficult to compare, as the underlying populations, healthcare settings, EHR systems and data-driven approaches differ.

Furthermore, given the inherent variation across clinical populations, which is reflected in the language used in clinical reporting, NLP tools developed for one clinical subpopulation, such as working-age adults, may not transfer reliably to another, such as school-age children, without adaptation. NLP systems used to identify suicide-related constructs in clinical notes must, therefore, be developed for and validated within each target population. The same principle applies to the application of NLP tools across institutions and EHR systems.
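As a minimal sketch of what validating within each target population can involve, the Python snippet below computes precision, recall and F1 separately for each subpopulation, given manual annotations of notes and a tool's predictions for the same notes. The function name `per_group_metrics`, the (group, gold, predicted) record format and the toy records are hypothetical illustrations invented for this example; they are not drawn from CRIS or from any of the reviewed studies.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Precision, recall and F1 computed separately for each subpopulation.

    Each record is a (group, gold, predicted) tuple: `gold` is the manual
    annotation and `predicted` the NLP tool's output, both booleans saying
    whether the note mentions suicidality. The record format is hypothetical.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, gold, predicted in records:
        c = counts[group]
        if predicted and gold:
            c["tp"] += 1  # true positive
        elif predicted:
            c["fp"] += 1  # flagged by the tool, not by annotators
        elif gold:
            c["fn"] += 1  # annotated mention missed by the tool
        # true negatives do not enter precision, recall or F1

    results = {}
    for group, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[group] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Invented toy annotations for illustration only (not CRIS data).
TOY_RECORDS = [
    ("adult", True, True),
    ("adult", False, False),
    ("adult", True, True),
    ("adult", False, True),
    ("adolescent", True, False),
    ("adolescent", True, True),
    ("adolescent", False, False),
    ("adolescent", True, False),
]

if __name__ == "__main__":
    for group, m in per_group_metrics(TOY_RECORDS).items():
        print(group, {k: round(v, 2) for k, v in m.items()})
```

Reporting metrics per group, rather than a single pooled score, makes it visible when a tool that performs acceptably in one population (here the toy "adult" group) misses many annotated mentions in another.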

When free text and NLP models are included, as mentioned above, the extent to which internal homogeneity (e.g., in terminology) affects results across institutions and clinical settings merits further study, both to advance the field and to provide evidence about the broader generalisability of findings. The culture, incentives, and structure of clinical systems outside the UK may introduce further differences in the signals that NLP systems rely on to detect discussion of suicide. Collaborative efforts are currently under way to compare methodologies and NLP tools across healthcare institutions, not only within the UK but also with collaborators in the USA. We envision that advances in ML and NLP methods, interoperability standards, and supporting infrastructure will enable such comparisons in the future.

Furthermore, advances in computational analysis of EHR data, e.g., machine learning in combination with NLP, will continue to develop and to provide novel solutions for suicide research (56). With the existing CRIS subsets, clinical cohorts, and NLP approaches developed for the studies described in this review, benchmarks have been created that allow appropriate comparisons between different methodologies.

Going beyond identification or prediction of those at risk, analysis of continuously collected data, and integration of EHR data with smartphone, wearable device and even social media data could allow collection of data across different time periods, not just at the time of clinical interactions, thus helping to understand suicidal crises and enabling delivery of targeted suicide prevention interventions (57).

Summary and Conclusion

In this review of a decade of research into suicide and related behaviour using CRIS, we have summarised the evolution of the methods employed to identify suicide and related behaviour, including linkage to mortality data, structured ICD-10 codes, manual review of clinical notes, keyword searching in free text, and identification of relevant mentions using NLP techniques. Cohorts under study have ranged in size from several hundred to tens of thousands of patients and have covered adult, elderly, and child and adolescent patients. A range of clinical groups has been examined from the perspective of suicide and related behaviour, including pregnant women with severe mental illness, people presenting with self-harm, and patients with severe mental illness, opioid use disorder, chronic fatigue syndrome or autism spectrum disorders. Finally, some studies have identified and investigated specific clinical events, such as emergency department attendances or hospital admissions.

In conclusion, the breadth and depth of research into suicide and related behaviour using CRIS over the past decade have accelerated the field in ways that were unthinkable before EHR data became available. These studies not only add to the clinical evidence base but also reflect an important evolution in the development and application of data-driven methods, which is central to advancing this field further. We envision increased progress in the decades to come, particularly in externally validating findings across multiple sites and countries, both for clinical evidence and for the transferability of NLP and machine learning methods.

Gaining Access to CRIS

The de-identified CRIS database has received ethical approval for secondary analysis: Oxford REC C, reference 18/SC/0372. The data are used in an anonymised and data-secure format under strict governance procedures. CRIS data are made available to researchers with appropriate credentials (provided by the South London and Maudsley NHS Foundation Trust) working on approved projects. Projects are approved by a CRIS Oversight Committee, a body set up by and reporting to the South London and Maudsley Caldicott Guardian. On request, and once appropriate credentials have been obtained and arrangements made with the lead of the respective CRIS project, the data presented in this study can be viewed within the secure system firewall.

Author Contributions

RD and SV proposed the manuscript and its contents. AB wrote the first draft of the manuscript, compiled data pertaining to study design, cohorts and NLP systems used in the cited literature, and incorporated edits by other authors. Each author contributed to specific sections of the manuscript: RD and SV on introductory and historical overviews; RS on use of structured data fields; JD on child and adolescent populations; AB and SV on natural language processing; RD, SV, and JD on perspectives and conclusions. All authors contributed to editing and revising the manuscript and approved the final version.


Funding

RD was funded by a Clinician Scientist Fellowship (research project e-HOST-IT) from the Health Foundation in partnership with the Academy of Medical Sciences, which also funds AB. SV was funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London. JD was supported by an NIHR Clinician Science Fellowship award (CS-2018-18-ST2-014) and has received support from a Medical Research Council (MRC) Clinical Research Training Fellowship (MR/L017105/1) and a Psychiatry Research Trust Peggy Pollak Research Fellowship in Developmental Psychiatry. RS is employed as an NIHR Academic Clinical Fellow and was funded by a BRC Preparatory Clinical Research Training Fellowship.


The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Conflict of Interest

RD and SV declare previous research funding received from Janssen.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Acknowledgments

The authors acknowledge infrastructure support from the National Institute for Health Research (NIHR).



References

1. Neeleman J, Jones P, Van Os J, Murray RM. Parasuicide in Camberwell-ethnic differences. Soc Psychiatry Psychiatr Epidemiol. (1996) 31:284–7. doi: 10.1007/BF00787921

2. Dutta R, Murray RM, Hotopf M, Allardyce J, Jones PB, Boydell J. Reassessing the long-term risk of suicide after a first episode of psychosis. Arch Gen Psychiatry. (2010) 67:1230–7. doi: 10.1001/archgenpsychiatry.2010.157

3. Dutta R, Murray RM, Allardyce J, Jones PB, Boydell J. Early risk factors for suicide in an epidemiological first episode psychosis cohort. Schizophr Res. (2011) 126:11–9. doi: 10.1016/j.schres.2010.11.021

4. Dutta R, Murray RM, Allardyce J, Jones PB, Boydell JE. Mortality in first-contact psychosis patients in the UK: a cohort study. Psychol Med. (2012) 42:1649–61. doi: 10.1017/S0033291711002807

5. Castle D, Wessely S, Der G, Murray RM. The incidence of operationally defined schizophrenia in Camberwell, 1965–84. Br J Psychiatry. (1991) 159:790–4. doi: 10.1192/bjp.159.6.790

6. McGuffin P, Farmer A, Harvey I. A polydiagnostic application of operational criteria in studies of psychotic illness: development and reliability of the OPCRIT system. Arch Gen Psychiatry. (1991) 48:764–70. doi: 10.1001/archpsyc.1991.01810320088015

7. Spitzer RL, Endicott J, Robins E. Research diagnostic criteria: rationale and reliability. Arch Gen Psychiatry. (1978) 35:773–82. doi: 10.1001/archpsyc.1978.01770300115013

8. Rucker J, Newman S, Gray J, Gunasinghe C, Broadbent M, Brittain P, et al. OPCRIT+: an electronic system for psychiatric diagnosis and data collection in clinical and research settings. Br J Psychiatry. (2011) 199:151–5. doi: 10.1192/bjp.bp.110.082925

9. Lobo SEM, Rucker J, Kerr M, Gallo F, Constable G, Hotopf M, et al. A comparison of mental state examination documentation by junior clinicians in electronic health records before and after the introduction of a semi-structured assessment template (OPCRIT+). Int J Med Inform. (2015) 84:675–82. doi: 10.1016/j.ijmedinf.2015.05.001

10. Davis KAS, Bashford O, Jewell A, Shetty H, Stewart RJ, Sudlow CLM, et al. Using data linkage to electronic patient records to assess the validity of selected mental health diagnoses in English Hospital Episode Statistics (HES). PLoS ONE. (2018) 13:e0195002. doi: 10.1371/journal.pone.0195002

11. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: a review of methods and applications. Annu Rev Public Health. (2016) 37:61–81. doi: 10.1146/annurev-publhealth-032315-021353

12. Castillo EG, Olfson M, Pincus HA, Vawdrey D, Stroup TS. Electronic health records in mental health research: a framework for developing valid research methods. Psychiatr Serv. (2015) 66:193–6. doi: 10.1176/

13. Perera G, Broadbent M, Callard F, Chang C-K, Downs J, Dutta R, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) case register: current status and recent enhancement of an electronic mental health record-derived data resource. BMJ Open. (2016) 6:e008721. doi: 10.1136/bmjopen-2015-008721

14. Jackson RG, Patel R, Jayatilleke N, Kolliakou A, Ball M, Gorrell G, et al. Natural language processing to extract symptoms of severe mental illness from clinical text: the clinical record interactive search comprehensive data extraction (CRIS-CODE) project. BMJ Open. (2017) 7:e012012. doi: 10.1136/bmjopen-2016-012012

15. Hayes RD, Downs J, Chang C-K, Jackson RG, Shetty H, Broadbent M, et al. The effect of clozapine on premature mortality: an assessment of clinical monitoring and other potential confounders. Schizophr Bull. (2014) 41:644–55. doi: 10.1093/schbul/sbu120

16. Downs J, Ford T, Stewart R, Epstein S, Shetty H, Little R, et al. An approach to linking education, social care and electronic health records for children and young people in South London: a linkage study of child and adolescent mental health service data. BMJ Open. (2019) 9:e024355. doi: 10.1136/bmjopen-2018-024355

17. Chang C-K, Chen C-Y, Broadbent M, Stewart R, O'Hara J. Hospital admissions for respiratory system diseases in adults with intellectual disabilities in Southeast London: a register-based cohort study. BMJ Open. (2017) 7:e014846. doi: 10.1136/bmjopen-2016-014846

18. Polling C, Tulloch A, Banerjee S, Cross S, Dutta R, Wood DM, et al. Using routine clinical and administrative data to produce a dataset of attendances at emergency departments following self-harm. BMC Emerg Med. (2015) 15:15. doi: 10.1186/s12873-015-0041-6

19. Velupillai S, Hadlaczky G, Baca-Garcia E, Gorrell GM, Werbeloff N, Nguyen D, et al. Risk assessment tools and data-driven approaches for predicting and preventing suicidal behavior. Front Psychiatry. (2019) 10:36. doi: 10.3389/fpsyt.2019.00036

20. Anderson HD, Pace WD, Brandt E, Nielsen RD, Allen RR, Libby AM, et al. Monitoring suicidal patients in primary care using electronic health records. J Am Board Fam Med. (2015) 28:65–71. doi: 10.3122/jabfm.2015.01.140181

21. Haerian K, Salmasian H, Friedman C. Methods for identifying suicide or suicidal ideation in EHRs. AMIA Annu Symp Proc. (2012) 2012:1244–53.

22. Metzger M-H, Tvardik N, Gicquel Q, Bouvry C, Poulet E, Potinet-Pagliaroli V. Use of emergency department electronic medical records for automated epidemiological surveillance of suicide attempts: a French pilot study: text-mining and epidemiology of suicide attempts. Int J Methods Psychiatr Res. (2017) 26:e1522. doi: 10.1002/mpr.1522

23. Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry. (2017) 210:387–95. doi: 10.1192/bjp.bp.116.182717

24. Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, et al. Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances. J Biomed Inform. (2018) 88:11–9. doi: 10.1016/j.jbi.2018.10.005

25. Arsenault-Lapierre G, Kim C, Turecki G. Psychiatric diagnoses in 3275 suicides: a meta-analysis. BMC Psychiatry. (2004) 4:37. doi: 10.1186/1471-244X-4-37

26. Cavanagh JTO, Carson AJ, Sharpe M, Lawrie SM. Psychological autopsy studies of suicide: a systematic review. Psychol Med. (2003) 33:395–405. doi: 10.1017/S0033291702006943

27. Tulloch AD, Khondoker MR, Fearon P, David AS. Associations of homelessness and residential mobility with length of stay after acute psychiatric admission. BMC Psychiatry. (2012) 12:121. doi: 10.1186/1471-244X-12-121

28. Bogdanowicz KM, Stewart R, Chang C-K, Downs J, Khondoker M, Shetty H, et al. Identifying mortality risks in patients with opioid use disorder using brief screening assessment: secondary mental health clinical records analysis. Drug Alcohol Depend. (2016) 164:82–8. doi: 10.1016/j.drugalcdep.2016.04.036

29. Lopez-Morinigo J-D, Fernandes AC, Chang C-K, Hayes RD, Broadbent M, Stewart R, et al. Suicide completion in secondary mental healthcare: a comparison study between schizophrenia spectrum disorders and all other diagnoses. BMC Psychiatry. (2014) 14:213. doi: 10.1186/s12888-014-0213-z

30. Lopez-Morinigo J-D, Ayesa-Arriola R, Torres-Romano B, Fernandes AC, Shetty H, Broadbent M, et al. Risk assessment and suicide by patients with schizophrenia in secondary mental healthcare: a case–control study. BMJ Open. (2016) 6:e011929. doi: 10.1136/bmjopen-2016-011929

31. Lopez-Morinigo J-D, Fernandes AC, Shetty H, Ayesa-Arriola R, Bari A, Stewart R, et al. Can risk assessment predict suicide in secondary mental healthcare? Findings from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register. Soc Psychiatry Psychiatr Epidemiol. (2018) 53:1161–71. doi: 10.1007/s00127-018-1536-8

32. Roberts E, Wessely S, Chalder T, Chang C-K, Hotopf M. Mortality of people with chronic fatigue syndrome: a retrospective cohort study in England and Wales from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Clinical Record Interactive Search (CRIS) Register. Lancet. (2016) 387:1638–43. doi: 10.1016/S0140-6736(15)01223-4

33. Taylor CL, van Ravesteyn LM, van den Berg MPL, Stewart RJ, Howard LM. The prevalence and correlates of self-harm in pregnant women with psychotic disorder and bipolar disorder. Arch Womens Ment Health. (2016) 19:909–15. doi: 10.1007/s00737-016-0636-2

34. Downs J, Velupillai S, Gkotsis G, Holden R, Kikoler M, Dean H, et al. Detection of suicidality in adolescents with autism spectrum disorders: developing a natural language processing approach for use in electronic health records. AMIA Annu Symp Proc. (2017) 2017:641–9.

35. Velupillai S, Epstein S, Bittar A, Stephenson T, Dutta R, Downs J. Identifying suicidal adolescents from mental health records using natural language processing. In: Proceedings of MEDINFO 2019: Health and Wellbeing e-Networks for All. (2019). p. 413–7. Available online at: (accessed February 18, 2020).

36. Bittar A, Velupillai S, Roberts A, Dutta R. Text classification to inform suicide risk assessment in electronic health records. Stud Health Technol Inform. (2019) 264:40–4. doi: 10.3233/SHTI190179

37. Twomey C, Prina AM, Baldwin DS, Das-Munshi J, Kingdon D, Koeser L, et al. Utility of the health of the nation outcome scales (HoNOS) in predicting mental health service costs for patients with common mental health problems: historical cohort study. PLoS ONE. (2016) 11:e0167103. doi: 10.1371/journal.pone.0167103

38. Hayes RD, Chang C-K, Fernandes AC, Begum A, To D, Broadbent M, et al. Functional status and all-cause mortality in serious mental illness. PLoS ONE. (2012) 7:e44613. doi: 10.1371/journal.pone.0044613

39. Tulloch AD, Khondoker MR, Thornicroft G, David AS. Home treatment teams and facilitated discharge from psychiatric hospital. Epidemiol Psychiatr Sci. (2015) 24:402–14. doi: 10.1017/S2045796014000304

40. Fok ML-Y, Stewart R, Hayes RD, Moran P. Predictors of natural and unnatural mortality among patients with personality disorder: evidence from a large UK case register. PLoS ONE. (2014) 9:e100979. doi: 10.1371/journal.pone.0100979

41. Das-Munshi J, Chang C-K, Dutta R, Morgan C, Nazroo J, Stewart R, et al. Ethnicity and excess mortality in severe mental illness: a cohort study. Lancet Psychiatry. (2017) 4:389–99. doi: 10.1016/S2215-0366(17)30097-4

42. Borschmann R, Oram S, Kinner SA, Dutta R, Zimmerman C, Howard LM. Self-harm among adult victims of human trafficking who accessed secondary mental health services in England. Psychiatr Serv. (2017) 68:207–10. doi: 10.1176/

43. Cunningham H, Tablan V, Roberts A, Bontcheva K. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS Comput Biol. (2013) 9:e1002854. doi: 10.1371/journal.pcbi.1002854

44. Gkotsis G, Velupillai S, Oellrich A, Dean H, Liakata M, Dutta R. Don't let notes be misunderstood: a negation detection method for assessing risk of suicide in mental health records. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology. (2016). p. 95–105.

45. Fernandes AC, Dutta R, Velupillai S, Sanyal J, Stewart R, Chandran D. Identifying suicide ideation and suicidal attempts in a psychiatric clinical research database using natural language processing. Sci Rep. (2018) 8:7426. doi: 10.1038/s41598-018-25773-2

46. Song X, Downs J, Velupillai S, Holden R, Kikoler M, Bontcheva K, et al. Using deep neural networks with intra- and inter-sentence context to classify suicidal behaviour. In: Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020). Marseille: LREC (2020).

47. Cha CB, Franz PJ, Guzmán EM, Glenn CR, Kleiman EM, et al. Annual research review: suicide among youth—epidemiology, (potential) etiology, and treatment. J Child Psychol Psychiatry Allied Discip. (2018) 59:460–82. doi: 10.1111/jcpp.12831

48. Holden R, Mueller J, McGowan J, Sanyal J, Kikoler M, Simonoff E, et al. Investigating bullying as a predictor of suicidality in a clinical sample of adolescents with autism spectrum disorder. Autism Res. (2020) 13:988–97. doi: 10.1002/aur.2292

49. Mansour R, Tsamakis K, Rizos E, Perera G, Das-Munshi J, Stewart R, et al. Late-life depression in people from ethnic minority backgrounds: differences in presentation and management. J Affect Disord. (2020) 264:340–7. doi: 10.1016/j.jad.2019.12.031

50. Cai W, Mueller C, Shetty H, Perera G, Stewart R. Predictors of mortality in people with late-life depression: a retrospective cohort study. J Affect Disord. (2020) 266:695–701. doi: 10.1016/j.jad.2020.01.021

51. McHugh CM, Large MM. Can machine-learning methods really help predict suicide? Curr Opin Psychiatry. (2020) 33:369–74. doi: 10.1097/YCO.0000000000000609

52. Francis ER, Freitas DF, de Colling C, Pritchard M, Kadra-Scalzo G, Viani N, et al. Measuring the incidence of suicidality in depression over a ten-year period in a large UK healthcare provider. Manuscript submitted for publication.

53. Werbeloff N, Osborn DPJ, Patel R, Taylor M, Stewart R, Broadbent M, et al. The Camden & Islington research database: using electronic mental health records for research. PLoS ONE. (2018) 13:e0190703. doi: 10.1371/journal.pone.0190703

54. Rumshisky A, Ghassemi M, Naumann T, Szolovits P, Castro VM, McCoy TH, et al. Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Transl Psychiatry. (2016) 6:e921. doi: 10.1038/tp.2015.182

55. McCoy TH, Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry. (2016) 73:1064–71. doi: 10.1001/jamapsychiatry.2016.2172

56. Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci. (2017) 5:457–69. doi: 10.1177/2167702617691560

57. Torous J, Walker R. Leveraging digital health and machine learning toward reducing suicide—from panacea to practical tool. JAMA Psychiatry. (2019) 76:999–1000. doi: 10.1001/jamapsychiatry.2019.1231

Keywords: electronic health records, natural language processing, machine learning, suicide attempted, suicide completed, self-injurious behaviour

Citation: Bittar A, Velupillai S, Downs J, Sedgwick R and Dutta R (2020) Reviewing a Decade of Research Into Suicide and Related Behaviour Using the South London and Maudsley NHS Foundation Trust Clinical Record Interactive Search (CRIS) System. Front. Psychiatry 11:553463. doi: 10.3389/fpsyt.2020.553463

Received: 18 April 2020; Accepted: 29 October 2020;
Published: 27 November 2020.

Edited by:

Jorge Lopez-Castroman, Centre Hospitalier Universitaire de Nîmes, France

Reviewed by:

Glen Coppersmith, Qntfy, United States
Ayah Zirikly, National Institutes of Health (NIH), United States

Copyright © 2020 Bittar, Velupillai, Downs, Sedgwick and Dutta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: André Bittar,