AUTHOR=Karystianis George, Cabral Rina Carines, Han Soyeon Caren, Poon Josiah, Butler Tony
TITLE=Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence
JOURNAL=Frontiers in Digital Health
VOLUME=Volume 3 - 2021
YEAR=2021
URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2021.602683
DOI=10.3389/fdgth.2021.602683
ISSN=2673-253X
ABSTRACT=Family and domestic violence (FDV) is a global problem with significant social, economic and health consequences for victims, including increased health care costs, mental trauma and social stigmatisation. In Australia, the estimated annual cost of FDV is $22 billion, and one woman is murdered by a current or former partner every week. Despite this, tools that can predict future FDV from the characteristics of the person of interest (POI) and the victim are lacking. The New South Wales (NSW) Police Force attends thousands of FDV events each year and records their details both as fixed fields (e.g., demographic information for the individuals involved) and as text narratives that describe abuse types, victim injuries, threats, and the mental health status of POIs and victims in detail. The information in these narratives remains largely untapped for research and reporting. We applied a text mining methodology to extract abuse types, victim injuries and mental illness mentions from 492,393 FDV event narratives, then linked these characteristics with the corresponding fixed fields and with actual mental health diagnoses obtained from NSW Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offence types: ‘hands-on’, ‘hands-off’ and ‘Apprehended Domestic Violence Order (ADVO) breach’.
The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for ‘ADVO breach’ in a multilabel classification setup, with the binary classification setup returning similar results. ‘Hands-off’ proved the hardest offence type to predict (60.72% accuracy; 57.86% ROC at best, using BERT) but showed potential to improve with fine-tuning of the binary classification setups. ‘Hands-on’ benefitted least from the contextual information provided by BERT embeddings: an MLP with categorical embeddings outperformed BERT in three of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). These promising results indicate that future FDV offences can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely improve performance, which could help FDV professionals and law enforcement agencies better manage FDV for the benefit of victims.
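The abstract contrasts a multilabel setup (one model predicting all three offence types at once) with binary setups (one model per offence type). The minimal sketch below shows how the same labels could be encoded for each setup. It is an illustration under stated assumptions, not the authors' pipeline: the function names `to_multilabel`, `to_binary_tasks` and `accuracy` are hypothetical, and the toy events are invented, not taken from the NSW data.

```python
# Hypothetical sketch; only the three offence type names come from the abstract.
OFFENCE_TYPES = ("hands-on", "hands-off", "ADVO breach")


def to_multilabel(offences):
    """Multilabel setup: encode one event's future offences as a 0/1 vector,
    one position per offence type, so a single model predicts all three."""
    return [1 if t in offences else 0 for t in OFFENCE_TYPES]


def to_binary_tasks(events):
    """Binary setup: split the same labels into three independent 0/1 targets,
    one per offence type, each of which would train its own model."""
    vectors = [to_multilabel(e) for e in events]
    return {t: [v[i] for v in vectors] for i, t in enumerate(OFFENCE_TYPES)}


def accuracy(y_true, y_pred):
    """Per-offence accuracy: fraction of events where prediction matches label."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Invented toy events: each is the set of offence types that followed it.
events = [{"hands-on"}, {"ADVO breach", "hands-off"}, set()]
print(to_multilabel(events[1]))               # [0, 1, 1]
print(to_binary_tasks(events)["hands-on"])    # [1, 0, 0]
```

Either encoding carries the same information; the difference is whether the offence types share one model (and can inform each other) or are learned separately, which is the comparison the reported results draw.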