AUTHOR=Rooney Nicola J., Clark Corinna C. A. TITLE=Development of a Performance Monitoring Instrument for Rating Explosives Search Dog Performance JOURNAL=Frontiers in Veterinary Science VOLUME=Volume 8 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/veterinary-science/articles/10.3389/fvets.2021.545382 DOI=10.3389/fvets.2021.545382 ISSN=2297-1769 ABSTRACT=The growing body of working dog literature includes many examples of scales robustly developed to measure behavioural factors affecting the success of working dogs. However, most of these analyses rely on organisations' own long-established ratings of performance, or simply on pass/fail at selection or certification, as measures of success. Working ability is multifaceted, and it is likely that different aspects of ability are differentially affected. In order to understand how specific aspects of selection, training and operations impact dogs' working ability, these different facets of performance need to be considered. An accurate and validated method for quantifying multiple aspects of performance is therefore required. Here we describe the first stages of formulating a meaningful Performance Measurement Tool for two types of working search dogs. The systematic methodology used was: 1) interviews and workshops with a representative cross-section of stakeholders to produce a shortlist of behaviours integral to the current operational performance of vehicle (VS) and high assurance (HAS) search dogs; 2) assessing the reliability and construct validity of shortlisted behavioural measures (at the behaviour and individual rater level) using ratings of diverse videoed searches by experienced personnel; 3) selecting the most essential and meaningful behaviours based on their reliability/validity and importance. The resulting performance measurement tool was composed of twelve shortlisted behaviours, most of which proved reliable and valid when assessed by a group of raters.
At the individual rater level, however, there was variability between raters in their ability to use and interpret behavioural measures, particularly the more abstract behaviours such as Independence. This illustrates the importance of examining individual rater scores rather than extrapolating from group consensus (as is often done), especially when designing a tool which will ultimately be used by single raters. For ratings to be practically valuable, individual rater reliability needs to be improved, especially for behaviours deemed essential (e.g. Control, Confidence). We suggest the next step is to investigate why raters vary, and to undertake efforts to increase the likelihood that they reach a common conceptualisation of each behaviour-construct. Plausible approaches include improving the format in which behaviours are presented (e.g. by adding benchmarks) and utilising rater training.