Enhanced Selection of Assistance and Explosive Detection Dogs Using Cognitive Measures

MacLean, Evan L.; Hare, Brian

doi:10.3389/fvets.2018.00236

ORIGINAL RESEARCH article

Front. Vet. Sci., 04 October 2018
Sec. Veterinary Humanities and Social Sciences
Volume 5 - 2018 | https://doi.org/10.3389/fvets.2018.00236

Enhanced Selection of Assistance and Explosive Detection Dogs Using Cognitive Measures

Evan L. MacLean^1,2^*

Brian Hare^3,4

¹School of Anthropology, University of Arizona, Tucson, AZ, United States
²Department of Psychology, University of Arizona, Tucson, AZ, United States
³Evolutionary Anthropology, Duke University, Durham, NC, United States
⁴Center for Cognitive Neuroscience, Duke University, Durham, NC, United States

Working dogs play a variety of important roles, ranging from assisting individuals with disabilities, to explosive and medical detection work. Despite widespread demand, only a subset of dogs bred and trained for these roles ultimately succeed, creating a need for objective measures that can predict working dog aptitude. Most previous research has focused on temperamental characteristics of successful dogs. However, working dogs also face diverse cognitive challenges both in training, and throughout their working lives. We conducted a series of studies investigating the relationships between individual differences in dog cognition, and success as an assistance or detection dog. Assistance dogs (N = 164) and detection dogs (N = 222) were tested in the Dog Cognition Test Battery, a 25-item instrument probing diverse aspects of dog cognition. Through exploratory analyses we identified a subset of tasks associated with success in each training program, and developed shorter test batteries including only these measures. We then used predictive modeling in a prospective study with an independent sample of assistance dogs (N = 180), and conducted a replication study with an independent sample of detection dogs (N = 90). In assistance dogs, models using data on individual differences in cognition predicted higher probabilities of success for dogs that ultimately succeeded in the program, than for those who did not. For the subset of dogs with predicted probabilities of success in the 4th quartile (highest predicted probability of success), model predictions were 86% accurate, on average. In both the exploratory and prospective studies, successful dogs were more likely to engage in eye contact with a human experimenter when faced with an unsolvable task, or when a joint social activity was disrupted. In detection dogs, we replicated our exploratory findings that the most successful dogs scored higher on measures of sensitivity to human communicative intentions, and two measures of short term memory. These findings suggest that that (1) individual differences in cognition contribute to variance in working dog success, and (2) that objective measures of dog cognition can be used to improve the processes through which working dogs are evaluated and selected.

Introduction

Working dogs play a wide variety of important roles in human society, performing tasks ranging from assisting people with disabilities, to explosive and medical detection (1). Despite widespread demand, only a subset of dogs bred and trained for these roles are ultimately able to succeed as working dogs (2–4). Attrition from training programs, or failure to succeed after training, have important consequences with respect to public health (e.g., wait lists to receive certified assistance dogs) as well as the financial costs of breeding, training, and placing working dogs (e.g., through investment of resources in dogs that ultimately do not succeed). Therefore, there is an important need for objective measures that can predict whether individual dogs are likely to succeed in diverse types of working dog programs [reviewed in (5, 6)].

To date, most research on predictors of success as a working dog have focused largely on measures related to temperament and behavior. Studies of temperament have been motivated by the idea that working dogs are often utilized in highly stimulating environments, that these dogs frequently encounter unfamiliar people, other animals, and potentially startling stimuli, and that dogs must be able to remain calm, and task-focused in these situations. Similarly, inappropriate behaviors (e.g., excessive barking, scavenging, inappropriate elimination) can cause problems for dog handlers, or compromise a dog's ability to effectively perform his or her role. Studies across the last two decades have developed a wide range of approaches for assessing these characteristics, many of which serve as useful predictors of working dog success (2, 3, 7–17). For example, Wilsson and Sinn (16) found that scores on a principal component relating to a tendency to engage in tug-of-war, chasing, and interest in object retrieval, were positively associated with training success in the Swedish armed forces. In assistance dog populations, Duffy and Serpell (3) found that dogs prone to excitability, stranger- and dog-directed aggression, and social and nonsocial fear were less likely to successfully complete training. Lastly, Svobodová et al. (14) found that puppies that were more willing to chase and fetch a ball, and least reactive to noise, were the most likely to pass police dog certification. Therefore, previous studies have identified a range of temperamental and behavioral traits that relate to working dog training outcomes.

However, working dogs also face a variety of cognitive challenges, both in their initial training, and throughout their working lives (2, 18). Therefore, it is possible that individual differences in dog cognition also explain variance in aptitude for working roles (2). The cognitive skills that dogs require in these roles are likely to be diverse, extending beyond the basic learning mechanisms typically emphasized in dog training (e.g., operant and classical conditioning). For example, while animal trainers can shape behavior so that a dog associates the completion of a goal (e.g., retrieve the keys), with a social or food reward, trainers cannot train animals to flexibly and spontaneously respond to barriers that might prevent the completion of a trained goal (e.g., the closest door to the room where the keys are is closed, the keys are on the floor among many objects and are partially occluded by a book). In retrieving the keys, it is cognitive flexibility and not just temperament or trained responses that allows a dog to solve the problem. A dog's cognitive abilities allow her to mentally represent space and infer the need to take a detour (19, 20); to categorize objects as either being keys or not, and to inhibit bringing back incorrect object(s) (21–23); to maintain a mental representation of the referent of the verbal command (keys) in short-term memory even though it is not at first visible (24, 25); and to understand the communicative intention behind a human pointing gesture to infer the location of the occluded keys and finally retrieve them (26, 27).

Relative to studies of temperament, there have been very few investigations of whether individual differences in cognition relate to aptitude as a working dog. Bray et al. (2) recently tested young adult candidate guide dogs with a series of temperament and problem-solving tasks, and found that performance on a multistep problem-solving task was a significant predictor of subsequent success in the guide dog program. However, current work has employed relatively few cognitive measures, and has not explored associations between cognition and working dog outcomes in working roles beyond guide dogs.

The importance of assessing dog cognition broadly is evidenced through work describing the psychometric structure of individual differences in dog cognition. Specifically, individual differences in dogs are best described by multiple factors, reflecting psychological processes such as memory, understanding of communicative intentions, inhibitory control, and social engagement (28, 29). Thus, individual differences in dog cognition vary across multiple cognitive domains, yet we know little about which domains of cognition are most important for working dogs. Moreover, given that different working roles present different sets of job-specific challenges, it is probable that the aspects of cognition associated with working dog success will vary between different working roles. Therefore, the central challenges for this line of research are to (1) measure diverse cognitive processes when assessing links between cognition and working dog performance, and (2) identify the specific links between these aspects of cognition, and success in different working dog roles.

Here, we present a series of studies investigating associations between individual differences in dog cognition and success as an assistance or explosive detection dog. We implemented a similar research approach with both working dog populations. Within each population, we first conducted an exploratory study in which a sample of dogs was tested with a 25-item cognitive test battery, probing diverse aspects of dog cognition. We then identified associations between individual differences on items in this test battery, and measures of success as a working dog. Based on these preliminary findings, we then developed short-format test batteries including the subset of tasks that were most strongly associated with outcomes (specific to each population). Lastly, we implemented predictive models (Experiment 1) or a replication study with an independent sample of dogs (Experiment 2) to validate or confirm the associations between individual differences on the cognitive measures, and measures of success as a working dog.

General Methods (All Populations)

During exploratory studies, large samples from both populations of working dogs were tested in the Dog Cognition Test Battery [DCTB; (28)]. The DCTB consists of 25 problem solving tasks designed to assess skills for reasoning about social and physical problems as well as domain-general cognitive processes. A detailed description of the DCTB, including its factor structure and implementation with the populations described here, is reported by MacLean et al. (28). All tasks in the DCTB are described briefly in Table 1, and were conducted by trained experimenters (university students and researchers). Detailed methods for these tasks are provided in the Supplemental Material. In all experiments, researchers were blinded to the training outcomes of dogs during testing, and dog trainers were blind to the results of the cognitive tests.

TABLE 1

Table 1. Brief descriptions of measures included in the Dog Cognition Test Battery (DCTB).

All testing was voluntary, and dogs were free to stop participating at any time. Subjects participated for food and toy rewards, and were not deprived of food or water. All experimental procedures were approved by the Duke University Institutional Animal Care and Use Committee (protocol number: A138-11-06). Inter-rater reliability was assessed for a randomly selected sample of 20% of the data and was excellent for all measures [mean ± SEM: kappa = 0.96; correlation = 0.94 ± 0.02; (28)].

Experiment 1

Exploratory Study

Methods

Subjects

Candidate assistance dogs were tested at Canine Companions for Independence (CCI) in Santa Rosa, CA (N = 164; 107 females, 57 males, 19 Labrador retrievers, 4 golden retrievers, 141 Labrador retriever x Golden retriever crosses). CCI raises and places assistance dogs for diverse roles, including service dogs (placed with adults with physical disabilities), hearing dogs (placed with adults who are deaf or hard of hearing), skilled companion and facility dogs (placed with an adult or child with a disability under the guidance of a facilitator, or partnered with a facilitator in a health care, visitation, or education setting). Because hearing dogs are selected for a different behavioral phenotype than the other roles, hearing dogs (N = 21) were excluded from analysis. Dogs who aborted more than 2 tasks in the battery (N = 24) or were released for medical reasons (N = 8), were also excluded from analysis, yielding a final sample of 111 dogs included for exploratory analysis (mean age = 1.98 years, SD = 0.19 years).

Assistance dog outcome measures

After entering professional training and passing medical clearances, dogs in CCI's program either graduate and are placed in one of the roles described above, or are released for behavioral problems during training (a decision made by professional trainers without input from the researchers or knowledge of performance on cognitive tests). Therefore, success was coded as a categorical variable with two levels (graduate, release). In the sample for the exploratory study, 76 dogs graduated the program and 35 dogs were released. Assistance dogs were tested in the DCTB prior to obtaining an outcome in the program.

Analysis

For exploratory analysis, we implemented a variety of predictive modeling strategies. The cognitive data were prepared for analysis using a Box-Cox transformation (30) with missing data imputed using a K nearest neighbors approach. Exploratory analysis was conducted using eight different predictive modeling techniques to assess the utility of diverse modeling strategies, as well as consensus across models regarding variable importance. Specifically, we employed the following models from the caret package (31) in the R programming environment (32): (1) generalized linear model [GLM], (2) linear discriminant analysis [LDA], (3) regularized regression [RR], (4) partial least squares [PLS], (5) naïve Bayes classification [NB], (6) multivariate adaptive regression splines [MARS], (7) K nearest neighbors, and (8) random forest.

Because our aim was to develop a short-format battery using only the cognitive measures most strongly associated with training outcomes, we investigated the relative importance of predictor variables across models. To do so, we extracted the variable importance statistic from each of the training models, which reflects the relative importance of variables in a model (31), with values scaled to a range between 0 (unimportant) and 100 (most important). We also conducted univariate analyses (logistic regression) using each individual cognitive task as a predictor of training outcomes. We ranked the results of these analyses by p-value, and interpreted associations with the smallest p-values as those warranting future investigation. Across these analyses we identified 5 cognitive measures which were implicated as being strongly associated with training outcomes in exploratory analyses (causal reasoning [visual], spatial transpositions, inferential reasoning, cylinder, and social referencing). We further identified an additional 6 measures with more modest associations with training outcomes, or which were important covariates, that warranted further investigation (unsolvable task, odor discrimination, laterality: object manipulation, laterality: first step, arm pointing, and reward preference).

We then used data from these 11 measures to develop statistical models predicting training outcomes. Models were trained and evaluated using 4-fold cross validation, repeated 100 times (data randomly divided into 4-folds, 3 folds used for model construction, 1 fold used to assess model accuracy, with this process repeated 100 times). To assess model performance we used the cross-validated accuracy and area under the curve (AUC) from the receiver operating characteristic (ROC), a measure of sensitivity and specificity for a binary classifier. AUC values range between 0.5 and 1, with a value of 0.5 indicating a non-informative model, and a value of 1 indicating a perfectly predictive model.

Categorical predictions (graduate, release) were made using a probability threshold of 0.5 (i.e., predict release when predicted probability of graduation <0.5; predict graduate when predicted probability of graduation >0.5) but we retained predicted probabilities for additional analyses. Specifically, to assess whether accuracy was higher for observations with the highest and lowest predicted probability of success, we calculated cross-validated model accuracies for dogs with predicted probabilities of success in the 1st and 4th quartiles (at each iteration of the cross validation procedure).

Results and Discussion

The results from predictive models using the 11 candidate cognitive measures in the exploratory study are shown in Table 2. The best performing models (partial least squares, k nearest neighbors [n = 7]) yielded cross-validated AUCs of 0.76, and an overall cross-validated accuracy of 72%. However, all models tended to be much more accurate for dogs predicted to have the highest probabilities of graduating (Table 2). Specifically, for dogs in the 4th quartile of predicted probability of success (i.e., the 25% of dogs with the highest predicted probability of success), predictions were 87% accurate during cross validation. In contrast, for dogs in the 1st quartile of predicted probability of success, predictions tended to be less accurate (mean = 50%). Thus, we expected that future predictions would be most reliable for dogs with the highest predicted probabilities of success.

TABLE 2

Table 2. Model statistics from the exploratory phase of Experiment 1.

On a descriptive level, the largest differences between graduate and released dogs in the exploratory study were for the following tasks: spatial transposition, odor discrimination, causal reasoning (visual), unsolvable task (look at experimenter), inferential reasoning, social referencing. Specifically, graduate dogs scored higher than released dogs on the odor discrimination and inferential reasoning tasks, and made more eye contact with the experimenter during the social referencing and unsolvable tasks. However, release dogs scored higher than graduate dogs on the spatial transpositions task (one-cross condition) and the causal reasoning (visual) task. Therefore there was no clear pattern of graduates or releases systematically scoring higher across diverse cognitive measures.

Prediction Study

Following the exploratory study, we designed a shorter test battery consisting of the 11 tasks determined to be potentially promising measures during initial predictive modeling. Because some of these tasks included relatively few trials in their initial format, we added additional trials to assess whether more data from these measures would improve predictive power. Specifically, we implemented changes in the number of trials as follows: causal reasoning [visual]: 4 trials → 8 trials; spatial transpositions [one-cross condition]: 4 trials → 8 trials; inferential reasoning: 6 trials → 10 trials; odor discrimination: 6 trials → 10 trials. The revised test battery consisted of two test sessions (conducted on two consecutive days).

Subjects

We tested an independent sample of 180 dogs (i.e., none had participated in the exploratory study) in the revised assistance dog battery (115 females, 65 males, 43 Labrador retrievers, 4 golden retrievers, 133 Labrador retriever X golden retriever crosses). Of these, 33 dogs were excluded from analysis because they were transitioned into the hearing dog program (N = 19), released for medical reasons (N = 5), placed in a new program not represented in the training data (N = 7), or were still in training at the time of the analysis (N = 2). Twenty-six additional dogs were excluded from analysis due to missing data on more than two of the cognitive predictor variables. Therefore, our final sample for predictive modeling included 121 dogs (mean age = 1.88 years, SD = 0.23 years).

Procedure

Testing procedures were identical to those in the exploratory study with the exception that the revised battery included a smaller number of tasks, as well as additional test trials for some tasks, as described above. The order of tasks in the revised test battery for assistance dogs was: Day 1: warm-ups > causal reasoning (visual) > spatial transpositions > inferential reasoning > cylinder > mutual gaze; Day 2: warm-ups > unsolvable task > odor discrimination > laterality (object manipulation) > arm pointing > laterality (first step) > reward preference. Inter-rater reliability was assessed for ~40% of trials in the prediction study (Cohen's κ for discrete measures, Pearson correlation for continuous measures), and was excellent across measures (mean Cohen's κ = 0.98; mean Pearson's R: 0.96).

Analysis

To assess predictive validity, we used the predictive models in the exploratory study to predict training outcomes for dogs in the independent sample tested on the short-format battery. As in the exploratory study, we assessed model performance via accuracy and area under the curve (AUC) from the receiver operating characteristic (ROC). To assess the effect of including additional test trials in the short-format battery, we initially ran all predictive models both including and excluding data from these additional trials. Model performance was better using data including the additional trials, and we report these analyses below. Based on the results of the exploratory study, we expected that models would be most accurate for the subset of dogs with the highest predicted probability of success. To evaluate this prediction, we calculated accuracy separately for dogs with predicted probabilities of success in the 4th quartile of predicted probabilities.

Results and Discussion

At a descriptive level, all but one model (Naïve Bayes Classifier), predicted higher average probabilities of success for dogs that ultimately did graduate from the program, than for dogs who were released from training. However, one-tailed t-tests indicated that only the random forest model yielded predicted probabilities of success that were significantly higher for graduate than release dogs (Table 3). The best performing models (generalized linear model, random forest, and regularized regression) yielded AUCs of 0.60-0.61 (accuracy range: 68–74%; Table 4). Thus, overall model performance was considerably poorer than expected based on initial cross-validation with the training data set. However, the distribution of training outcomes varied considerably between the exploratory and prediction datasets, an issue that can seriously affect model performance (33). Specifically, in the exploratory dataset, 68% of dogs graduated from the program whereas considerably more dogs did so in the independent sample (77%). In addition, as expected based on the exploratory study, predictions were much more accurate for dogs in the 4th quartile of predicted probabilities of success. On average (across models), outcome predictions for dogs with predicted probabilities of success in the 4th quartile were 86% accurate (Table 4). Two models (linear discriminant analysis and random forest) yielded predictions that were 90% accurate for this subset of observations (Table 4).

TABLE 3

Table 3. Results from t-tests comparing the predicted probability of success for dogs that were ultimately successful (graduates) or unsuccessful (releases) in the assistance dog training program.

TABLE 4

Table 4. Model statistics from the prediction study of Experiment 1.

As a further test of the ability to discriminate between dogs most and least likely to succeed, we used one-tailed t-tests to compare the predicted probability of success for graduate and release dogs, restricting our analyses to dogs with predicted probabilities of success in the 1st and 4th quartiles (calculated separately for each model). In these analyses, 5 of 8 models produced significantly higher predicted probabilities of success for graduate compared to release dogs (Table 3; Figure 1). At a descriptive level, some differences in mean performance between graduate and release dogs were consistent between the exploratory and predictive studies, whereas others were not. Consistent with findings from the exploratory study, in the independent sample, graduate dogs again tended to make more eye contact with the experimenter in the unsolvable and social referencing tasks, and tended to score higher on the inferential reasoning task. However, in contrast to the exploratory study, graduate dogs scored higher on the spatial transpositions task, and scored lower on the odor discrimination task.

FIGURE 1

Figure 1. Mean predicted probability of success (±SEM) for dogs that ultimately graduated (blue points), or were released from the training program (yellow points), restricting data to dogs with predicted probabilities of the success in the 1st and 4th quartiles. Asterisks indicate significant differences at p < 0.05. LDA, Linear discriminant analysis; GLM, Generalized linear model; RR, Regularized regression; PLS, Partial least squares; NB, naïve Bayes; MARS, multivariate adaptive regression splines; KNN, K nearest neighbors; RF, random forest.

Overall, we were able to produce useful predictions regarding training outcomes with an independent sample, although relative to initial cross-validations, predictions for the independent sample tended to be less accurate. One important finding from this study was that predictions were much more accurate for the subset of dogs predicted to have the highest probability of success (with the strongest models performing at 90% accuracy in these cases). Therefore, from an applied perspective, we expect that it will be challenging to produce accurate predictions for all candidate assistance dogs, but that these measures and models may be particularly valuable for identifying the subset of dogs with the most potential for success. Given that these cognitive measures (1) can be collected in <2 h per dog, (2) do not require any training of dog participants, and (3) do not require specialized or costly equipment, these types of measures will provide a useful addition to the existing screening mechanisms employed by assistance dog agencies.