AUTHOR=Senior Morwenna, Burghart Matthias, Yu Rongqin, Kormilitzin Andrey, Liu Qiang, Vaci Nemanja, Nevado-Holgado Alejo, Pandit Smita, Zlodre Jakov, Fazel Seena
TITLE=Identifying Predictors of Suicide in Severe Mental Illness: A Feasibility Study of a Clinical Prediction Rule (Oxford Mental Illness and Suicide Tool or OxMIS)
JOURNAL=Frontiers in Psychiatry
VOLUME=Volume 11 - 2020
YEAR=2020
URL=https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2020.00268
DOI=10.3389/fpsyt.2020.00268
ISSN=1664-0640
ABSTRACT=BACKGROUND: The Oxford Mental Illness and Suicide tool (OxMIS) is a brief, scalable, freely available, structured risk assessment tool to assess suicide risk in patients with severe mental illness (schizophrenia-spectrum disorders or bipolar disorder). OxMIS requires further external validation, but a lack of large-scale cohorts with relevant variables makes this challenging. Electronic Health Records provide possible data sources for external validation of risk prediction tools. However, they contain large amounts of information within free-text documents that is not readily extractable. In this study, we examined the feasibility of identifying suicide predictors needed to validate OxMIS in routinely collected Electronic Health Records. METHODS: In study 1, we manually reviewed electronic health records of 57 patients with severe mental illness to calculate OxMIS risk scores. In study 2, we examined the feasibility of using natural language processing to scale up this process. We used anonymized free-text documents from the Clinical Record Interactive Search database to train a Named Entity Recognition model, a machine learning technique that recognizes concepts in free text. The model identified 8 concepts relevant for suicide risk assessment: medication (antidepressant/antipsychotic treatment), violence, education, self-harm, benefits receipt, drug/alcohol use disorder, suicide, and psychiatric admission.
We assessed model performance in terms of precision (similar to positive predictive value), recall (similar to sensitivity), and the F1 statistic (the harmonic mean of precision and recall, an overall performance measure). RESULTS: In study 1, we were able to manually extract information on all variables used by OxMIS from routine clinical records. Four variables were missing for a small number of patients: education, parental drug and alcohol use disorder, benefits receipt, and parental psychiatric admission. In study 2, the Named Entity Recognition model had an overall precision of 0.77, recall of 0.90, and F1 score of 0.83. The concept with the best precision and recall was medication (precision 0.84, recall 0.96); the weakest were suicide (precision 0.37) and drug/alcohol use disorder (recall 0.61). CONCLUSIONS: Some predictors of suicide used in scalable risk prediction tools can be identified in routine clinical records and extracted using natural language processing. However, methodological challenges are substantial because electronic health records differ from other data sources, particularly for family history variables.
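
The performance measures named in METHODS and reported in RESULTS are related by a simple formula, which the following minimal Python sketch illustrates. It is not the authors' evaluation code: the function names `precision_recall` and `f1_score` are hypothetical, and the only figures used are the overall precision (0.77) and recall (0.90) quoted in the abstract. F1 is computed under its standard definition as the harmonic mean of precision and recall.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from entity-level counts.

    tp: entities the model predicted that match a gold annotation
    fp: entities the model predicted that have no gold annotation
    fn: gold annotations the model missed
    (Hypothetical helper; how matches were counted in the study is not
    described in the abstract.)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall figures quoted in the abstract.
overall_precision = 0.77
overall_recall = 0.90
overall_f1 = f1_score(overall_precision, overall_recall)
print(f"F1 = {overall_f1:.2f}")  # prints "F1 = 0.83"
```

The result agrees with the overall F1 of 0.83 given in RESULTS. Because the harmonic mean is pulled toward the smaller of its two inputs, a concept with a low precision (such as suicide at 0.37) would have a low F1 even if its recall were high.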