ORIGINAL RESEARCH article
Front. Sociol.
Sec. Medical Sociology
This article is part of the Research Topic: Enhancing Data Collection and Integration to Reduce Health Harms and Inequalities Linked to Violence
Improving Police Recorded Crime Data for Domestic Violence and Abuse through Natural Language Processing
Provisionally accepted
- 1 City University of London, London, United Kingdom
- 2 University of Lancashire, Preston, United Kingdom
Introduction: Domestic Violence and Abuse (DVA) is a growing public health and safeguarding concern in the UK, compounded by long-standing data quality issues in police records. Incomplete or inaccurate recording of key variables undermines the ability of police, health services, and partner agencies to assess risk, allocate resources, and design effective interventions.

Methods: We evaluated two machine learning models (Random Forest and DistilBERT) for classifying the type of victim/offender relationship (ex-partner, current partner, and family) from approximately 19,000 DVA incidents recorded by a UK police force. Models were benchmarked against a static rule-based classifier and assessed using precision, recall, and F1-score. To reduce false positives in the most challenging relationship categories, we implemented a selective classification strategy that abstained from low-confidence predictions.

Results: Both machine learning models outperformed the baseline across all metrics, with average absolute gains of 11 percentage points in precision and 16 in recall. Ex-partner cases were classified most accurately, while current partner cases were classified with the least accuracy. Selective classification substantially improved precision for underperforming categories, albeit at the expense of reduced coverage.

Discussion: These findings demonstrate that computational tools can enhance the completeness and reliability of police DVA data, provided their use balances predictive accuracy, interpretability, and safeguarding risks.
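The selective classification strategy mentioned in the Methods, and the precision/coverage trade-off reported in the Results, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.8 confidence threshold, and the use of the top class probability as the confidence score are all assumptions made for illustration, and only the three relationship labels come from the abstract.

```python
from typing import List, Optional, Sequence, Tuple

# Relationship categories named in the abstract.
LABELS = ("ex-partner", "current partner", "family")


def selective_predict(
    probs: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[Optional[str]]:
    """Return the top label per row, or None (abstain) if its probability is below threshold.

    `threshold` is a hypothetical value; in practice it would be tuned on held-out data.
    """
    preds: List[Optional[str]] = []
    for row in probs:
        best = max(range(len(row)), key=row.__getitem__)
        preds.append(LABELS[best] if row[best] >= threshold else None)
    return preds


def precision_and_coverage(
    preds: Sequence[Optional[str]], gold: Sequence[str], label: str
) -> Tuple[float, float]:
    """Precision for `label` among non-abstained predictions, plus overall coverage."""
    answered = [(p, g) for p, g in zip(preds, gold) if p is not None]
    coverage = len(answered) / len(preds) if preds else 0.0
    predicted = [(p, g) for p, g in answered if p == label]
    correct = sum(1 for p, g in predicted if p == g)
    precision = correct / len(predicted) if predicted else 0.0
    return precision, coverage
```

Under these assumptions, raising the threshold causes the classifier to abstain on more incidents, which lowers coverage but can remove uncertain predictions that would otherwise count as false positives for a difficult category such as current partner.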
Keywords: Natural Language Processing, police recorded crime, Domestic violence (DV), Text classification, Supervised machine learning, DistilBERT, Free text
Received: 15 Aug 2025; Accepted: 27 Oct 2025.
Copyright: © 2025 Cook, Weir and Humphreys. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Darren Cook, darren.cook@city.ac.uk
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
