PERSPECTIVE article
Front. Vet. Sci.
Sec. Veterinary Epidemiology and Economics
Volume 11 - 2024
doi: 10.3389/fvets.2024.1352726
Text mining for disease surveillance in veterinary clinical data: Part two, training computers to identify features in clinical text
Provisionally accepted
- 1 Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, United Kingdom
- 2 Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
- 3 Department of Computer Science, Faculty of Science and Engineering, The University of Manchester, Manchester, England, United Kingdom
- 4 Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah, Saudi Arabia
- 5 Department of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, Hampshire, United Kingdom
- 6 Department of Computer Science, Faculty of Science, Durham University, Durham, England, United Kingdom
In part two of this mini-series, we evaluate the range of machine-learning tools now available for veterinary clinical text mining. These tools will be vital for automating the extraction of information from the large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass. At volumes of millions of records, manual reading is impractical, and the complexity of clinical notes limits the usefulness of more 'traditional' text-mining approaches. We discuss machine learning techniques ranging from simple models that identify words and phrases with similar meanings, which can be used to expand lexicons for keyword searching, to more complex language models. Specifically, we describe the use of language models for record annotation and unsupervised approaches for identifying topics within large datasets, and we discuss recent developments in generative models (such as ChatGPT). As these models become increasingly complex, it is important that researchers and clinicians work together to ensure that model outputs are explainable, so as to instill confidence in any conclusions drawn from them.
Keywords: big data, machine learning, neural language modelling, clinical records, companion animals
Received: 08 Dec 2023; Accepted: 17 Jul 2024.
Copyright: © 2024 Davies, Nenadic, Alfattni, Arguello Casteleiro, Al Moubayed, Farrell, Radford and Noble. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Peter-John Noble, Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, United Kingdom
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Mercedes Arguello Casteleiro (5)