AUTHOR=Karystianis George, Cabral Rina Carines, Han Soyeon Caren, Poon Josiah, Butler Tony

TITLE=Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence

JOURNAL=Frontiers in Digital Health

VOLUME=Volume 3 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2021.602683

DOI=10.3389/fdgth.2021.602683

ISSN=2673-253X

ABSTRACT=Family and domestic violence (FDV) is a global problem with significant social, economic and health consequences for victims, including increased health care costs, mental trauma and social stigmatisation. In Australia, the estimated annual cost of FDV is $22 billion, and one woman is murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and the victim are lacking. The New South Wales Police Force attends thousands of FDV events each year and records details both as fixed fields (e.g., demographic information for individuals involved in the event) and as text narratives that describe in detail abuse types, victim injuries, threats, and the mental health status of POIs and victims. Information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information (abuse types, victim injuries and mental illness mentions) from 492,393 FDV event narratives, we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from NSW Health for the same cohort to form a comprehensive FDV dataset. These data were fed into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offence types (‘hands-on’, ‘hands-off’, ‘Apprehended Domestic Violence Order (ADVO) breach’). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for ‘ADVO breach’ in a multilabel classification setup, with the binary classification setup returning similar results. ‘Hands-off’ proved the hardest offence type to predict (60.72% accuracy; 57.86% ROC at best using BERT) but showed potential to improve with fine-tuning of binary classification setups. ‘Hands-on’ benefitted least from the contextual information gained through BERT embeddings: for this offence type, the MLP with categorical embeddings outperformed BERT in three of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). These promising results indicate that future FDV offences can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely improve performance, which can help FDV professionals and law enforcement agencies better manage FDV to benefit victims.