Testing for Behavioral and Physiological Responses of Domestic Horses (Equus caballus) Across Different Contexts – Consistency Over Time and Effects of Context

In a number of species, consistent behavioral differences between individuals have been described in standardized tests, e.g., the novel object and open field tests. Different behavioral expressions reflect different coping strategies of individuals in stressful situations. A causal link between behavioral responses and the activation of the physiological stress response is assumed but has not been thoroughly studied. Moreover, most standard paradigms investigating individual behavioral differences are framed in a fearful context; the present study therefore aimed to add a test in a more positive context, the feeding context. We assessed individual differences in physiological [heart rate (HR)] and behavioral responses (presence or absence of pawing, startle response, defecation, snorting) of 20 domestic horses (Equus caballus) in two behavioral experiments, a novel object presentation and a pre-feeding excitement test. Experiments were conducted twice, once between July and August, and once between September and October. Both experiments caused higher mean HR in the first 10 s after stimulus presentation compared to a control condition, but mean HR did not differ between the experimental conditions. In the novel object experiment, horses displaying stress-related behaviors also showed a significantly higher HR increase than horses that did not display any stress-related behaviors, reflecting a correlation between behavioral and physiological responses to the novel object. In contrast, in the pre-feeding experiment, horses that showed fewer behavioral responses had a greater HR increase, indicating that the physiological response was due to emotional arousal and not behavioral activity. Moreover, HR response to experimental situations varied significantly between individuals. Individual average HR was significantly repeatable across both experiments, whereas HR increase was significantly repeatable only during the novel object experiment and not the pre-feeding experiment. Conversely, behavioral response was not repeatable. In conclusion, our findings show that horses' behavioral and physiological responses differed between test situations and that emotional reactivity, shown via mean HR and HR increase, is not always displayed behaviorally, suggesting that behavioral and physiological responses may be regulated independently according to context.

INTRODUCTION
Repeatable individual variation in physiological and/or behavioral responses across time and contexts, known as personality or temperament, has been a major focus of scientific research in recent years and has been described in a wide variety of species, including horses (Equus caballus) (Goldsmith et al., 1987; Le Scolan et al., 1997; Momozawa et al., 2005; Cockrem, 2007; Lansade et al., 2008; Grajfoner et al., 2010; Olczak et al., 2018). Differences in responses result from an individual's perception of a potential threat to homeostasis, caused by extrinsic or intrinsic stimuli (stressors) that provoke the activation of the physiological stress response and cause behavioral changes (Moberg, 1985; Chrousus and Gold, 1992). In a framework provided by Koolhaas et al. (1999) for rodents, behavioral responses to a stressor usually range along an axis from proactive (fight-flight) to reactive (freeze-hide), linked to low hypothalamic-pituitary-adrenal (HPA) axis and high sympatho-adreno-medullary (SAM) axis reactivity in proactive individuals, and high HPA and low SAM axis reactivity in reactive individuals (Koolhaas et al., 1999). Since then, most studies investigating differences between individuals have focused on behavioral responses in experimental situations, such as novel object exposure or open field exploration (Carter et al., 2013). However, the relationship between behavior and individual physiological stress response remains unclear, as it seems to vary across studies.
Individual differences in behavioral and physiological responses to stressors are often determined by early environmental stimuli, such as differences in maternal investment (Stamps, 2003; Claessens et al., 2011). For example, Meaney (2001) showed that adult rats (Rattus norvegicus) are more fearful and sensitive to stress if, during the first 8 days of their life, they were raised by mothers that licked their body and anogenital regions at a frequency lower than the cohort average. In contrast, rats raised by mothers performing a higher-than-average frequency of body and anogenital region licking showed lower fearfulness and sensitivity to stress in adulthood. Moreover, animals can be bred to show either high or low responsiveness to stressors, indicating that individual differences in stress responsiveness are heritable (Flaherty and Rowan, 1989; Carere et al., 2003). Studies on great tits (Parus major) have shown individual physiological stress responses to be related to differences in exploration strategies (Carere and van Oers, 2004) and heritable throughout four generations (Drent et al., 2003).
In horses, diverse factors influencing individual differences in behavior and physiology have been identified in terms of experience, such as habituation (Leiner and Fendt, 2011), diet (Bulmer et al., 2015), handling (Visser et al., 2002), and maternal behavior (Houpt and Hintz, 1983). Similarly, breed has also been found to strongly influence individual reactivity, suggesting a relationship between individual responsiveness and the heritability of traits (Hausberger et al., 2004; Lloyd et al., 2008). Further studies on equine temperament have focused on the assessment of horse responsiveness to different stimuli, such as diverse environmental conditions (McCall et al., 2006; Schmidt et al., 2010a,b), novel situations (Visser et al., 2001, 2002; Fureix et al., 2009; Leiner and Fendt, 2011; Ellis et al., 2014), and human interactions in terms of both handling (Fureix et al., 2009; König von Brostel et al., 2011; Ellis et al., 2014) and riding (Visser et al., 2008), and have shown that an individual's response to threat, or "fearfulness," is stable across time (Visser et al., 2001, 2003; Lansade et al., 2008). However, similar to studies in other species, research investigating personality differences in horses often bases its categorization only on observations of behavior. For example, Grajfoner et al. (2010) compared behavioral ratings between high and low performing horses, showing how the combination of multiple traits, such as a horse being "nice," "patient," or "easy to handle," shapes the perceived personality of horses. Nonetheless, contrasting results have been found regarding the relationship between heart rate (HR) and behavioral parameters. For instance, Momozawa et al. (2003) describe correlations between behavior and HR in their fear-inducing experiments, with more anxious horses showing a higher HR increase and more stress-related behaviors, such as defecation, during the experiment. However, subsequent research reports a lack of this relationship (Christensen et al., 2005; Lansade et al., 2008). Moreover, the lack of stress-related behaviors does not always reflect a lack of physiological stress response, with studies in horses and cattle (Bos taurus) showing that individuals with a low behavioral response had higher physiological reactivity (e.g., Jezierski et al., 1999; Welp et al., 2004; Christensen et al., 2005; Lansade et al., 2008).
If a stimulus is perceived as a threat to homeostasis, individuals react with a physiological stress response, which often reflects an increase in emotional arousal. Emotional arousal is defined as an internal state that is triggered by specific extrinsic or intrinsic stimuli (Visser et al., 2003; Lansade et al., 2008; Anderson and Adolphs, 2014). Emotional arousal can range from calm (low arousal) to excited (high arousal), and the experience can be of positive or negative valence (Russell, 1980). Therefore, the perception of a threatening stimulus can result in an increase in negatively valenced emotional arousal. In contrast, if a stimulus is not perceived as a threat but as positively exciting, it would cause positively valenced emotional arousal. The activation of a physiological stress response is often quantified by measuring HR, which therefore can be regarded as a valid standardized, objective, and non-invasive indicator of emotional arousal. Emotional arousal causes changes in behavior, cognition, and physiology (Anderson and Adolphs, 2014). Studies in non-human animals mostly use behavioral measures to quantify emotional arousal (e.g., Anderson et al., 2015; Barnard et al., 2015; Finlayson et al., 2016; Bennett et al., 2017; Albuquerque et al., 2018). Briefer et al. (2015) studied the effect of emotional arousal and valence on the physiological and behavioral response in goats, showing that both positive (feeding) and negative (frustration, isolation) emotional contexts caused a significant HR increase and changes in behavior, such as ear posture, compared to a control situation. However, emotional arousal has also been found not always to be expressed behaviorally (Wascher et al., 2008). Therefore, it is important to understand how behavioral, emotional, and physiological responses are linked.
In the present study, we aim to investigate individual differences in emotional arousal in horses in response to two experimental paradigms, a novel object exposure and a test of pre-feeding excitement. Experimental assessments of animal personality usually focus on stressful contexts, e.g., novel object exposure or the open field test. In this study, we aim to investigate consistencies in behavioral and physiological responses across contexts of different valence. Furthermore, we aim to gain further understanding of how emotional arousal relates to individual differences in behavioral and physiological reactivity and how the responses are interlinked. In particular, we ask whether physiological responses during a novel object presentation and pre-feeding test are caused by behavioral changes, e.g., locomotion, or, in the absence of behavioral activity or locomotion, by emotional arousal. Furthermore, we ask whether behavioral and physiological responses are stable across time and contexts. We expect horses to show a greater physiological reaction to a fear-inducing situation, such as being exposed to a novel object, than to the anticipatory pre-feeding experiment. Also, due to its greater salience, we expect horses' physiological and behavioral responses to the novel object to be more similar over time than their responses to the pre-feeding excitement test, which would not represent an event that horses would often encounter in the wild.

Animals and Housing
The study was conducted at the equine yard of the College of West Anglia (United Kingdom) between July and November 2017. The research was conducted on 20 horses, which were individually stabled in loose boxes. Five of the 20 horses were tested in only one of the two experimental conditions, two solely in the pre-feeding excitement test and three only in the novel object test, owing to their lack of availability during testing periods (Table 1). The sample included 14 geldings [age: mean 11.8 ± 3.8 years (yrs), range 6-18 yrs] and six mares (age: mean 11.8 ± 2.6 yrs, range 10-17 yrs) of diverse breeds, use, and training experiences (Table 1). The horses were fed twice a day: once in the morning (0800-0830) and once in the afternoon (1500-1600). Water was available ad libitum, and feces were removed from the stables after the horses were fed. In the late afternoon (1600-1700), some of the horses were turned out in the paddock for the night.

TABLE 1 | Mean heart rate expressed in beats per minute (bpm) and standard deviation during the different experimental situations: novel object experiment (NO) and pre-feeding excitement test (PF). Numbers 1 and 2 indicate repetition of experiments, with number 1 indicating tests in summer (July/August) and number 2 tests in autumn (October/November). The color code indicates the horses being categorized as high (orange) or low (blue) behavioral responders.

Experimental Design
We conducted two experimental tests, the pre-feeding excitement test and the novel object test, presented in random order, with one trial per horse, repeated in summer (test 1, July/August) and in autumn (test 2, October/November). Behavior was recorded by video camera (Canon Legria HF R56), and HR was recorded using a Polar® V800 system. The system consists of an electrode belt with a built-in transmitter, connected via Bluetooth to a wristwatch (receiver). The belt was placed around the chest of the horse, positioned where a saddle or vaulting girth would normally sit. To optimize the contact between the belt and the skin, both the belt and the horse's coat in the contact area were wetted. The receiver was placed at the stable entrance or inside the stable. All horses were already habituated to wearing the HR belt prior to the present study. Before each test, an adjustment period of 5 min was allowed to exclude potential effects of prior handling (Figures 1, 2).
In the novel object test, the horses were exposed to one of three different objects in their stable. The first object was formed of a main cylindrical hard body (approximately 30 cm in length and 7 cm in diameter) filled with gravel which was fixed to a soft foam rubber ball (about 15 cm in diameter) and covered in blue fabric. The second object was formed of two cylindrical plastic tubes fixed together to form an "x." Similar to the first object, the cylinders were approximately 30 cm in length and 7 cm in diameter, filled with gravel and covered with yellow fabric. In addition, 12 tennis balls of different colors (green, blue, red, and yellow) and materials were pierced and attached to four strings (three balls per string) of approximately 50 cm in length. These were then tied to the main body of the object and left hanging. The third object was a pink inflatable guitar of approximately 1 m in length. All objects were attached to a string, around 4 m long, to allow their retrieval from the stable and are shown in Supplementary Figure S1.
The object assigned to each individual and the order in which the horses were tested were randomized for each season. For the second repeat, the object was chosen randomly from the remaining two objects. To prevent the horses from seeing the object before testing, the objects were covered from sight when carried around the yard. The novel object tests took place between the hours of 0900 and 1300 and between 1500 and 1800, when the yard was quiet and the horses had been fed. The test was based on the procedure described by Górecka-Bruzda et al. (2011) and Dai et al. (2015) and adapted for the present experiment. A novel object was placed over the box entrance, with the cord hanging over the stable door to keep the object at a height of approximately 1 m. The object was kept in this position for the following 5 min and was then dropped to the floor (the objects filled with gravel created a muffled noise). The horse's reaction was recorded for the following 5 min. Thereafter, the object was removed from the stable, while behavioral monitoring and HR measurement continued for another 15 min (Figure 3A).
The pre-feeding excitement test was conducted during morning feeds (0800-0830) and started with the horses being shown a bucket containing their individual mix of hard feed on the floor outside the stable, while the other horses were fed. The horses' physiological and behavioral responses were recorded for 5 min. Thereafter, the horses were given their hard feed by placing the feeding bucket inside the box, and their behavioral and physiological responses were measured for the following 10 min (Figure 3B).

FIGURE 1 | Example heart rate of an individual from the start of heart rate recordings and during the novel object experiment. The area shaded in blue presents a 5-min habituation period after the heart rate monitor is placed on the horse, but before the start of the novel object exposure. x-axis: time in minutes; y-axis: heart rate in beats per minute (bpm).

FIGURE 2 | Example heart rate of an individual from the start of heart rate recordings and during the pre-feeding excitement experiment. The area shaded in blue presents a 5-min habituation period after the heart rate monitor is placed on the horse, but before the start of the experiment. x-axis: time in minutes; y-axis: heart rate in beats per minute (bpm).

Frontiers in Psychology | www.frontiersin.org

Data Processing
Raw HR data were filtered with a moving average filter to remove biologically implausible outlier values. Due to the quick regulation of HR (von Borell et al., 2007), the following HR variables were calculated: (1) mean HR in beats per minute (bpm) for the 10 s preceding and following the introduction of the hard feed inside the box, as well as preceding and following the presentation, drop, and removal of the object; (2) HR increase in bpm following the food introduction and novel object presentation, drop, and removal, calculated as the difference between the maximum value within 60 s of exposure to the stimulus and the 3-s average HR before the presentation of the stimulus. These timeframes were selected in order to assess the immediate cardiac response of the horses to the stimulus, as well as to measure the degree of activation of the SAM axis, which can often appear only 20-30 s after the detection of the stressor (von Borell et al., 2007). For each horse, we calculated three average HR values for the novel object experiments: one following the presentation of the object, one following the drop of the object, and one following the removal of the object. For the pre-feeding experiments, two average HR values were calculated: one before the hard feed was given to the horse and one after the introduction of the hard feed in the stable. For both novel object and pre-feeding experiments, control average HRs were calculated from the 10 s preceding the presentation of the novel object (Figure 3A).
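The HR variables defined above can be illustrated with a short sketch. This is a hypothetical Python illustration, not the authors' processing code: the function names, the one-sample-per-second sampling, and the smoothing window length are all assumptions, since the text does not specify them.

```python
from statistics import mean

def moving_average(samples, window=5):
    """Centered moving average; the window length is an assumed value."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(mean(samples[lo:hi]))
    return smoothed

def hr_variables(hr, stimulus_s):
    """Compute the HR variables around a stimulus.

    hr is a list of bpm values, assumed to hold one sample per second;
    stimulus_s is the index (in seconds) at which the stimulus occurs.
    """
    # (1) mean HR for the 10 s preceding and following the stimulus
    mean_before = mean(hr[stimulus_s - 10:stimulus_s])
    mean_after = mean(hr[stimulus_s:stimulus_s + 10])
    # (2) HR increase: maximum within 60 s minus the 3-s pre-stimulus average
    baseline = mean(hr[stimulus_s - 3:stimulus_s])
    peak = max(hr[stimulus_s:stimulus_s + 60])
    return {"mean_before": mean_before,
            "mean_after": mean_after,
            "increase": peak - baseline}

# Toy trace (invented values): 40 bpm at rest, brief rise after a stimulus at 30 s.
hr = [40.0] * 30 + [40.0, 45.0, 50.0, 60.0, 65.0, 70.0] + [55.0] * 60
result = hr_variables(hr, stimulus_s=30)
```

In this toy trace the 60-s window captures a peak that the 10-s mean partly averages out, which is why the two variables can behave differently.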
Behavioral responses of the horses were analyzed from videos using Solomon Coder v. beta 17.03.22 (©András Péter); an ethogram of the coded behaviors can be found in Supplementary Table S1. For the pre-feeding excitement test, behavior was analyzed for the 5 min prior to the presentation of the hard feed. For the novel object task, the 5 min following the presentation and drop of the object and the 2 min following its removal were analyzed. The behaviors analyzed included walking, pawing, occurrence of vocalizations (snorting and whinnying), occurrence of startle response, and defecating, as their frequency has been shown to increase in threatening and stress-inducing situations (Seaman et al., 2002; Lansade et al., 2008; Leiner and Fendt, 2011). Behavior was recorded as continuous variables, either as duration of behavior in s per observation period (e.g., walking) or as frequency of behavior per observation period (e.g., snorting). The classification of individuals into high and low behavioral respondents was based on the frequency of vocalizations and duration of pawing behavior for the pre-feeding experiment, and on the frequency of startle response, defecation, and vocalizations for the novel object experiment (Table 2). Specifically, horses were classified as high behavioral respondents in the pre-feeding experiment if they performed more than two vocalizations and/or more than 20 s of pawing. Horses that performed less than two vocalizations and less than 20 s of pawing during the pre-feeding experiment were categorized as low respondents. For the novel object test, horses were classified as high behavioral respondents if they performed a startle response and/or defecated and/or vocalized more than four times. Horses were classified as low respondents in the novel object test if they performed less than four vocalizations, no startle response, and no defecation. The vocalization threshold was higher for the novel object test because the horses were exposed to more stimuli in this test (presentation, drop, and removal of the object) than in the pre-feeding test (sole presentation of the hard feed). Walking was recorded to assess effects of locomotion on HR responses.
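The categorization rules described above can be written as two small functions. This is a minimal Python sketch with hypothetical function names. One assumption is labeled in the code: the text specifies only "more than" and "less than" the thresholds, so counts exactly at a threshold (two or four vocalizations, 20 s of pawing) are treated here as "low".

```python
def classify_pre_feeding(vocalizations, pawing_s):
    """Return 'high' or 'low' for the pre-feeding excitement test.

    High: more than two vocalizations and/or more than 20 s of pawing.
    Assumption: values exactly at a threshold count as 'low'.
    """
    return "high" if vocalizations > 2 or pawing_s > 20 else "low"

def classify_novel_object(startled, defecated, vocalizations):
    """Return 'high' or 'low' for the novel object test.

    High: a startle response and/or defecation and/or more than four
    vocalizations. Assumption: exactly four vocalizations counts as 'low'.
    """
    return "high" if startled or defecated or vocalizations > 4 else "low"

# Invented example observations
pf_calm = classify_pre_feeding(vocalizations=1, pawing_s=5.0)
pf_pawing = classify_pre_feeding(vocalizations=0, pawing_s=32.5)
no_startle = classify_novel_object(startled=True, defecated=False, vocalizations=0)
no_quiet = classify_novel_object(startled=False, defecated=False, vocalizations=3)
```

Because the "high" criteria are joined by "and/or", a single criterion is enough to place a horse in the high category, whereas "low" requires every criterion to stay below its threshold.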

Statistical Analysis
All data were analyzed using R version 3.4.3 (RStudio Team, 2016; R Core Team, 2017). In order to investigate how behavioral and physiological reactivity of horses varied across contexts and how such responses were interlinked, we conducted two generalized linear mixed models (GLMMs) with the additional package "glmmADMB" (Skaug et al., 2016). The response variable was the 10-s average HR for the first model (GLMM1) and the HR increase for the second (GLMM2). Both models had the same fixed factors, namely, the experimental situation (pre-feeding excitement, novel object, or control), test number (first vs. second), behavioral categorization (high vs. low respondents), and locomotion (duration of walking), together with the interaction between experiment and behavioral response categorization, as well as the interaction between experiment and locomotion. For the purpose of the analysis, the different conditions of each experiment (the time before and after the presentation of the feed in the pre-feeding experiment, and the presentation, drop, and removal of the object in the novel object experiment) were individually included in the dataset, resulting in horses having multiple values for each experiment. The "multcomp" package (Hothorn et al., 2008) was used to conduct the post hoc analysis. In particular, the Tukey test for multiple comparisons was chosen to gain further understanding of the effect of the fixed factors in the models.

TABLE 2
Experiment | Category | Criteria | Test 1 (n) | Test 2 (n)
Pre-feeding | Low | Less than two vocalizations (snorting and/or whinnying) and less than 20 s of pawing behavior | 8 | 11
Pre-feeding | High | More than two vocalizations (snorting and/or whinnying) and/or pawing for more than 20 s | 9 | 6
Novel object | Low | No defecation, no startle response, less than four vocalizations (snorting and/or whinnying) | 9 | 14
Novel object | High | Defecation, and/or startle response, and/or more than four vocalizations (snorting and/or whinnying) | 9 | 4
We analyzed multicollinearity between fixed factors by calculating the variance inflation factors (VIFs) through the "vif" function in the package "car" (Fox and Weisberg, 2011). VIFs for both models were below 1.02, indicating no issue with multicollinearity (Zuur et al., 2009). A likelihood ratio test was used to compare model fit with and without the individual random effect.
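As a worked illustration of the likelihood ratio test, the p-value for a comparison with one degree of freedom follows directly from the deviance, because a chi-square variable with df = 1 is the square of a standard normal variable. The Python sketch below (a hypothetical helper, not the authors' R code) applies this to the two deviances reported in the Results, 5.688 for the average HR model and 1.05 for the HR increase model.

```python
import math

def lrt_p_value_df1(deviance):
    """Upper-tail chi-square probability for 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)), which holds only for df = 1.
    """
    return math.erfc(math.sqrt(deviance / 2.0))

p_average_hr = lrt_p_value_df1(5.688)   # random effect in the average HR model
p_hr_increase = lrt_p_value_df1(1.05)   # random effect in the HR increase model
```

The two values agree, to within rounding, with the reported p = 0.017 and p = 0.306.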
To analyze the consistency of both behavioral and physiological responses over time, we used the "rptR" package (Stoffel et al., 2017). In particular, we assessed the repeatability of the 10-s average HR and HR increase with 1000 permutations for the physiological reactivity data collected for the control, novel object, and pre-feeding conditions. The repeatability of behavioral categorization was assessed by coding individuals showing a high behavioral response as 1 and those showing little behavioral response as 0; this analysis was also run with 1000 permutations, separately for the novel object and pre-feeding experiments. The significance level was set at α = 0.05.
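To make the idea of repeatability concrete, the sketch below estimates it as the share of total variance that lies among individuals, with a permutation test for significance. This is a simplified stand-in and not the rptR procedure: it swaps in a one-way ANOVA estimator for balanced data where rptR fits mixed models, and the function names, the truncation of negative estimates at zero, and the example values are all assumptions for illustration.

```python
import random
from statistics import mean

def anova_repeatability(groups):
    """ANOVA-based repeatability for balanced data.

    groups maps an individual ID to a list of k repeated measurements.
    Negative estimates are truncated at 0 (an assumption of this sketch).
    """
    k = len(next(iter(groups.values())))
    n = len(groups)
    grand = mean(v for vals in groups.values() for v in vals)
    ms_among = k * sum((mean(vals) - grand) ** 2
                       for vals in groups.values()) / (n - 1)
    ms_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals) / (n * (k - 1))
    denominator = ms_among + (k - 1) * ms_within
    if denominator == 0:
        return 0.0
    return max(0.0, (ms_among - ms_within) / denominator)

def permutation_p_value(groups, n_perm=1000, seed=0):
    """Share of random reassignments with repeatability >= the observed value."""
    rng = random.Random(seed)
    observed = anova_repeatability(groups)
    ids = list(groups)
    k = len(groups[ids[0]])
    pooled = [v for i in ids for v in groups[i]]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = {i: pooled[j * k:(j + 1) * k] for j, i in enumerate(ids)}
        if anova_repeatability(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented HR values (bpm) for three horses, two repeats each
example = {"horse_a": [40, 42], "horse_b": [50, 51], "horse_c": [60, 58]}
r = anova_repeatability(example)
p = permutation_p_value(example)
```

When individuals differ strongly from one another but vary little between their own repeats, as in the invented example, the estimate approaches 1; when the repeats of one individual differ as much as values from different individuals, it falls to 0.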

Average Heart Rate
Average HR of the horses was significantly higher during the novel object experiment compared to the control period (Tukey: z = 4.980, p < 0.001; Figures 4A, 5) and tended to be higher during the pre-feeding excitement compared to the control period (Tukey: z = 2.104, p = 0.083; Figures 1A, 5). HR did not differ significantly between the novel object and pre-feeding excitement tests (Tukey: z = −1.986, p = 0.108; Figures 4A, 5). We found a significant interaction between behavioral response categorization and experiment affecting HR. Average HR during the novel object experiment was significantly lower in the group of horses showing a low behavioral response compared to horses showing a high behavioral response (GLMM1: z = −3.66, p < 0.001; Figure 4A). Locomotion did not have any effect on the average HR of the horses (GLMM1: z = −0.77, p = 0.439). The average HR model including individual identity as a random factor had a significantly better fit than the model without the random effect (ANOVA: deviance = 5.688, df = 1, p = 0.017). Full model results can be found in Table 3.

Heart Rate Increase
Despite there being no difference between average HRs in the two experimental conditions, horses showed a significantly lower HR increase in the pre-feeding experiment than in the novel object experiment (GLMM2: z = −2.97, p = 0.003; Figure 4B). Contrary to what was seen in the average HR data, horses showing a low behavioral response had a higher HR increase compared to individuals with a high behavioral response (GLMM2: z = 3.34, p < 0.001; Figure 4B). Locomotion affected the HR increase of the subjects (GLMM2: z = 2.61, p = 0.009). Nonetheless, its interaction with experiment did not affect the data (GLMM2: z = −1.36, p = 0.174). Finally, the fit of the HR increase model having subject identity as a random factor did not differ from that of the model without the random effect (ANOVA: deviance = 1.05, df = 1, p = 0.306). Full model results can be found in Table 4.

FIGURE 4 | Effect of behavioral categorization on the mean heart rate (HR) (A) and HR increase (B) of the horses recorded during the study. High and low behavioral category indicates whether the individual showed a more or less intense behavioral response during the testing situation. Boxplots represent the median (black bar), the interquartile range, IQR (boxes), maximum and minimum values excluding outliers (whiskers), and outliers (black dots). **p < 0.01 and ***p < 0.001.

Overall and Individual Repeatability of Physiological and Behavioral Responses
As a group, horses showed an overall tendency for mean HR to be higher during the first experimental session compared to the second repeat (GLMM1: z = −1.86, p = 0.063). Conversely, at the individual level, horses' average HR was significantly repeatable across both experiments, showing how individual horses were consistent in their average HR response across repeats. In particular, individual horses' average HR was more consistent during the novel object experiment (R = 0.372, CI 95% [0.129, 0.575], p = 0.001), compared to the average HR of horses during the pre-feeding test (R = 0.221, CI 95% [0, 0.467], p = 0.022).
Similarly, HR increase was not significantly different between the two experimental repeats (GLMM2: z = 1.51, p = 0.131). Nonetheless, at the individual level, only the HR increase during the novel object test was significantly repeatable (R = 0.386, CI 95% [0.142, 0.572], p = 0.001), with the HR increase of individual horses during the pre-feeding experiment not being repeatable, and therefore not consistent, across test repeats (R = 0, CI 95% [0, 0.443], p = 1).
Out of the 18 horses tested in the novel object experiment, 11 showed a consistent behavioral response between the two repeats. Of these 11, eight consistently showed a low behavioral response and three a high behavioral response. For the pre-feeding experiment, 12 horses showed a constant response across repeats, five of which were categorized as high behavioral respondents and seven as low behavioral respondents. Overall, nine horses showed a constant behavioral response in both experiments; of these, only four were consistent in their behavioral response across experiments and repeats, whereas five showed opposite responses in the two experimental tests (Table 1). Nonetheless, the analysis showed that behavioral categorization of horses was not significantly repeatable in either the novel object test or the pre-feeding test (

DISCUSSION
In the present study, we investigated individual behavioral and physiological responses of horses during two experimental procedures, a novel object experiment (NO) and a pre-feeding test (PF). We found a higher HR increase in response to the NO test compared to the PF test. Furthermore, our results suggest that in a fearful context (NO) behavioral arousal was  linked to a higher physiological arousal, whereas in a feeding context, the relationship between behavioral and physiological arousal was less pronounced. This is in line with previous studies describing that individuals classified as calmer had higher HR compared to more behaviorally excited individuals when tested for pre-feeding reactivity (horses: Jezierski et al., 1999;Christensen et al., 2005;Lansade et al., 2008;Ellis et al., 2014;cattle: Welp et al., 2004). Based on our analysis, we found rather little behavioral consistency between test repeats and only a limited number of individuals responded similarly across contexts and repeats, which would have been expected when behavioral responses in experimental tests are indicative of temperamental traits. Conversely, average HR was significantly repeatable across both experiments. In particular, in line with our predictions, the horses' physiological responses were more repeatable for the NO experiment compared to the PF test. Similarly, the HR increase resulting from the NO experiment was consistent across repeats, while it was not repeatable for the PF test. Such variation reflected how behavioral responses of horses do not necessarily predict physiological reactions during a NO and PF excitement test. This marked distinction in repeatability of behavioral and physiological responses highlight how current methods of behavioral classification of horse temperament may not be as appropriate as previously thought. 
In fact, although the two experiments measured the horses' responses in two different contexts, we would have expected their behavioral responses to be stable at least across time, if not across contexts, as the behaviors selected in our research are often used in the assessment of horse personality (Seaman et al., 2002; Lansade et al., 2008; Leiner and Fendt, 2011). Moreover, the contrast between the lack of repeatability of behavior and the general consistency of both HR indices across time suggests that behavioral and physiological responses may be decoupled and regulated independently. Classical models of individual differences in behavior and physiology assume the two to be associated with each other, forming different coping styles (Koolhaas et al., 1999). However, evidence for independent modulation of the HPA axis (Ferrari et al., 2013; Boulton et al., 2015; Dosmann et al., 2015), the SAM axis (Qu et al., 2018), and behavioral traits has recently been accumulating. For example, Harewood and McGowan (2005) showed that stabling naïve horses resulted in an elevated performance of stress-related behaviors, which did not correlate with changes in HR or salivary cortisol, contradicting previous literature (Goldsmith et al., 1987; Le Scolan et al., 1997; Momozawa et al., 2005; Lansade et al., 2008; Grajfoner et al., 2010).
In our NO experiment, we were able to exclude a possible effect of locomotion on the HR of the subjects, allowing us to link the physiological response to underlying emotional arousal. Conversely, during the PF task, individuals lacking a strong behavioral response during the experiment showed a higher HR increase, which indicates that emotional arousal, but not physical activity, accounted for the increase. Effects of emotional arousal on physiological responses have already been identified in other non-human animals. For example, in the study by Wascher et al. (2008), immobile greylag geese (Anser anser) watching aggressive interactions between conspecifics showed a significantly higher increase in HR compared to geese watching non-social interactions. Moreover, an increase in physiological reactivity resulting solely from emotional arousal was also identified in guide dogs (Fallani et al., 2007). Conversely, horses that were classified as high behavioral responders during the PF experiments had and once they were given the feed, despite having similar average HRs to low behaviorally respondent horses. Inferring emotional valence from arousal in different contexts has proved to be challenging, especially in experiments aiming at detecting positive emotional states (Reefmann et al., 2009). High emotional arousal triggers mechanisms of increased attention and energy mobilization to prepare the subject to cope with an adverse situation, facilitating a possible fight-or-flight response (Dawkins, 1998). Conversely, positive states, such as feeding or grazing, tend to show physiologically lower arousal levels compared to negative states, with some exceptions, e.g., sexual activity. Reefmann et al. (2009) suggest that, in sheep, behavioral responses together with physiological ones may help to identify the valence of emotional states in animals.
Therefore, both behavioral and physiological responses are needed for a conclusive assessment of individual emotional reactivity, as we have shown that the relationship between the two is not stable over time or contexts.
In some situations, showing emotional arousal behaviorally may represent an important adaptation in group-living species. While some behaviors, such as defecation, may simply result from sympathetic activation (Van Reenen et al., 2005), others can signal important information, e.g., danger, to group members (Špinka, 2012; Maigrot et al., 2017). The social aspect of emotion has been extensively studied in humans (Bastiaansen et al., 2009; Špinka, 2012), with studies showing that arousal can promote information sharing (Berger, 2011). In animals, however, research is still lacking. Pigs (Sus scrofa) have been a focus of attention regarding emotionally driven behavior, with studies showing that specific pitches of vocalizations were not only related to heightened arousal but also specific to the negative valence of the emotion felt (Düpjan et al., 2008). Similar findings were reported in horses, with Maigrot et al. (2017) providing further information regarding vocalizations deriving from emotional arousal in both Przewalski's (Equus przewalskii) and domestic horses. Such findings highlight that emotionally driven behaviors can represent reliable indicators of precise emotional/motivational states, which can play a key role in avoiding danger in group-living species. As group-living species rely on group coordination to survive, providing information to others about an individual's emotional arousal can allow for better coordination in avoiding negative aspects of an environment (Spoor and Kelly, 2004). In fact, it is thought that negative emotions related to high arousal, such as fear, ought to spread more quickly than positive ones due to the urgent nature of the signal (Špinka, 2012). The results from our study support this hypothesis. We have shown that the performance of behaviors linked to high arousal varies according to context.
In the NO condition, the horses' behavioral response was linked with their physiological arousal, whereas in the PF experiment, individual emotional arousal was not matched by an increased performance of stress-related behaviors. Such a difference may arise from the perceptually different stimuli, providing evidence that sharing arousal-related information is more likely to happen in a fearful situation to aid conspecific coordination. On the contrary, the conditions and stimuli during the PF experiment may not represent evolutionarily salient stressors, reducing the need to express emotional arousal behaviorally. In fact, in the wild, horses are less likely to suffer from food shortages than to be attacked by a potential predator.
Overall, the results of the present research must be interpreted cautiously due to the low sample size: only 15 out of 20 horses were tested in both testing conditions (NO and PF) and periods. Furthermore, we did not test for potential effects of sex, age, or breed. To conclude, our study suggests that physiological and behavioral reactions are independent in non-fearful contexts, and that behavioral responses, in contrast to physiological responses, show low repeatability in different test conditions over time.
To gain a more conclusive insight into individual differences in behavioral and physiological response patterns, we suggest combining tests across different experimental contexts. In particular, our findings from the PF test indicate that models such as coping style, which were derived from studies in the context of fear and aggression, might not translate to other contexts.

ETHICS STATEMENT
All applied methods were non-invasive, and the experimental procedure was approved by Anglia Ruskin University's Departmental Research Ethics Panel (Reference Number: A&EB DREP 17-053).

AUTHOR CONTRIBUTIONS
AS, DH, and CW designed the experiments and wrote the manuscript. AS and DH conducted the experiments. AS and CW analyzed the data.