Towards a Positive Welfare Protocol for Cattle: A Critical Review of Indicators and Suggestion of How We Might Proceed

Current animal welfare protocols focus on demonstrating the absence (or at least low levels) of indicators of poor welfare, potentially creating a mismatch between what is expected by society (an assurance of good animal welfare) and what is actually being delivered (an assurance of the absence of welfare problems). This paper explores how far we have come, and what work still needs to be done, if we are to develop a protocol for use on commercial dairy farms where the aim is to demonstrate the presence of positive welfare. Following conceptual considerations around a perceived “ideal” protocol, we propose that a future protocol should be constructed (i) of animal-based measures, (ii) of indicators of affective state, and (iii) be structured according to indicators of short-term emotion, medium-term moods and long-term cumulative assessment of negative and positive experiences of an animal's life until now (in contrast to the current focus on indicators that represent different domains/criteria of welfare). These three conditions imposed the overall structure within which we selected our indicators. The paper includes a critical review of the literature on potential indicators of positive affective states in cattle. Based on evidence about the validity and reliability of the different indicators, we select ear position, play, allogrooming, brush use and QBA as candidate indicators that we suggest could form a prototype positive welfare protocol. We emphasise that this prototype protocol has not been tested in practice and so it is perhaps not the protocol itself that is the main outcome of this paper, but the process of trying to develop it. In a final section of this paper, we reflect on some of the lessons learnt from this exercise and speculate on future perspectives. 
For example, while we consider we have moved towards a prototype positive welfare protocol for short-term affective states, future research energy should be directed towards valid indicators for the medium and long-term.

Keywords: animal welfare, welfare assessment, dairy (cows), positive emotion, animal-based measures, affective states

INTRODUCTION
Studies of citizens' views of animal welfare have identified that most people think of animal welfare in positive terms (Miele et al., 2011; Spooner et al., 2014). Nevertheless, animal welfare protocols focus on demonstrating the absence (or at least low levels) of indicators of poor welfare (Winckler, 2018). Thus, there is a mismatch between what is expected by society (an assurance of good animal welfare) and what is actually being delivered (an assurance of the absence of welfare problems). This paper explores how far researchers have come, and what work still needs to be done, if we are to develop a protocol for use on commercial dairy farms where the aim is to demonstrate the presence of positive welfare.
It was logical that the first approaches to animal welfare assessment focused on identifying the worst breaches against an animal's welfare. There is clearly a continued need to prevent these types of animal welfare challenges, for example by having and enforcing welfare regulations (Knierim and Pajor, 2018). Nevertheless, the emphasis on negative issues related to animal welfare, rather than on positive ones, has had long-term consequences for animal welfare in general and welfare assessment in particular. Once a threshold of what is acceptable in a given situation has been set, there is often little incentive to go beyond merely reaching that threshold. An example illustrating this is that almost all buildings are stocked at the maximum density allowed by legislation, which can be considered the lowest acceptable level in most countries, rather than what might be considered optimal from an animal welfare point of view.
To counteract this problem, more recently, and particularly in some countries, there has been an increase in the number of quality assurance schemes to satisfy the growing demand for products from animals that have a level of welfare above the minimum level required by regulation (Mench, 2008; Van Dijk et al., 2018). The expectation is that these quality assurance schemes identify the "best" farms for inclusion in their welfare-friendly labels, whereas the reality is that even these schemes work by checking for indicators of poor welfare, such as dirtiness and mortality. The changing terminology to refer to these indicators as cleanliness and liveability reflects an awareness of the need to present a more positive image.
A change towards a more positive image would also likely be beneficial for the farmers. There is a large body of literature on farmers' views on animal welfare in general (see Balzani and Hanlon, 2020 for a review) but much less work related to views on positive welfare (Vigors, 2019). One of the findings is that farmers seek indicators that signal to them that they are doing a good job. Studies in psychology have repeatedly demonstrated that positive feedback (on what is being done well) is received better by people than negative feedback (on problems or things to be improved). Thus one can speculate that changing from focusing on indicators of poor welfare to those of good welfare may be more effective in motivating farmers to have the welfare of their animals assessed as "even better" at the next control rather than "less bad." The extent to which this actually occurs and what it means for continuous improvement of animal welfare, rather than "plateauing out" once the minimum threshold has been achieved, remains to be investigated. Nevertheless, there are already several studies emphasising the importance for the farmer of a positive atmosphere during visits by animal welfare inspectors (e.g., Roe et al., 2011).
There is an increasing number of papers that discuss positive emotions and positive welfare (e.g., Boissy et al., 2007; Yeates and Main, 2008; Lawrence et al., 2019; Webb et al., 2019; Rault et al., 2020) and some that even focus on cattle (e.g., Napolitano et al., 2009; Mattiello et al., 2019). In this paper we aim to take the process one step further with an evaluation of the possible indicators that may be included in a future protocol. We argue that developing assurance schemes that focus more on positive welfare is a necessary next step in the progress of animal welfare assessment. Given the difficulties inherent in this "next step" and uncertainty about what such future protocols might look like, it is important to start that process now. In common language, a protocol is a predefined and precise method for carrying out or reproducing a given experiment or activity. It would be overambitious to try to reach that level of "end product" in a single paper, but we do try to present a form of prototype protocol. Prototypes are a part of the design process and challenge people to validate their concepts by putting an early version of the solution in front of real users. Thus, it is not the prototype protocol itself that is the main outcome of this paper, but the process of trying to develop it. To narrow the task, we focus on cattle, although many of the issues we encounter will apply to other species, and we focus on animal-based as opposed to resource-based or management-based indicators. Furthermore, since most research in animal welfare science reflects the hedonic view in line with the assumption that pain is bad and pleasure is good, we focus here on indicators of positive affective states.
The paper is structured into five sections. Firstly, we clarify the terminology we use throughout the paper. In the second section, we discuss conceptually what a "perfect" positive welfare protocol in an "ideal" world might look like. This is followed by our critical review of the literature on potential indicator candidates for our prototype protocol in the third section. In the fourth section we take a step "back to reality," attempting to come as far as we can with today's knowledge in presenting candidate indicators towards a prototype protocol of positive welfare in cattle. We acknowledge that this first attempt at a protocol, even a prototype one, is very far from a fully validated positive welfare protocol in an ideal world. In a final, fifth section, we reflect on some of the lessons learnt from this exercise and discuss future perspectives.

Terminology
Here we briefly outline how we use some animal welfare terminology. We chose these simplifications in terminology so as not to distract from the main aim of this paper, which is to contribute towards including indicators of positive welfare in on-farm welfare assessments. We use the term "positive welfare" to refer to the positively-valenced part of the whole scale of animal welfare and the term "negative welfare" to refer to the negatively-valenced part, based on the understanding that animal welfare (summarised for our purposes as how an individual feels and is experiencing its situation) can range from very poor to very good. We use the term "positive indicator" for something that indicates the presence of a positive affective state. Finally, we use the term "positive protocol" to mean a collection of two or more positive indicators and refer to a protocol designed to be feasible to implement on commercial dairy farms. Such a positive protocol could be used alone or be combined with indicators of negative welfare in a more general overall animal welfare assessment.

A "PERFECT" POSITIVE PROTOCOL IN AN "IDEAL" WORLD: SOME CONCEPTUAL CONSIDERATIONS

What Does "Positive" Actually Mean?
Conceptual thinking around the meaning of "positive" in positive welfare is crucial if we aim to develop a positive protocol, but translating these thoughts into practice on farm can be difficult. For simplicity in this paper, we defined positive welfare as the positively-valenced part of the whole scale of animal welfare. However, this implies that we can divide animal welfare into "negative" and "positive" as two distinct categories, while the commonly proposed view is that welfare ranges on a continuum from very negative to very positive welfare. In practice, it is hard to find methods for validating an overall positive state, and so in most studies a comparison is made between one situation with a certain welfare state and another situation that is thought to yield better welfare. Thus, the obvious question is where on the welfare scale negative becomes positive (as opposed to merely less negative) and so at what point we can actually start to talk about a positive welfare protocol.
Another way to consider the meaning of "positive" is within the area of positive affective states. If we take the view proposed by Fraser and Duncan (1998) that natural selection has favoured negative and positive affect as separate processes to solve two different types of motivational problems, then there is no longer a dilemma. Positive and negative emotions are not opposite ends of a one-dimensional scale and "positive" does actually mean positive. Unfortunately, this only applies when referring to a single short-term emotion. As animals can experience several emotions simultaneously, or within a short period, the overall experience at that time is some form of integration of them. In humans, subjective well-being is defined, in part, as people feeling many pleasant and few unpleasant emotions (Diener, 2000). One can speculate that an animal is experiencing positive welfare when it experiences more positive emotions than negative ones [e.g., discussed as "affect balance" in Webb et al. (2019) and as "positive welfare balance" in Rault et al. (2020)], but as yet we have no agreed way to integrate different emotions to determine when the overall experience is on the positive side of the welfare continuum. Thus, we are returned to our original dilemma, although now with the additional knowledge that an indicator of a positive emotion can be observed, even when the overall assessment of animal welfare is poor. Play behaviour in calves can be used as an example to illustrate this. There is converging evidence that play may reflect positive experiences as well as causing them (see later) and it has been shown that calves play more if not food deprived (Jongman et al., 2020). What we can conclude from this study is that it is probable that fed calves experience more positive emotions, but what we do not know is whether or not they actually experience positive welfare.

The One Perfect Indicator… or the Perfect Combination of Indicators
Ideally, we would not need a protocol encompassing several indicators or even to distinguish between indicators of positive or negative welfare at all. In an ideal world, there would be only one valid and reliable indicator that places an individual on a scale from having very poor to very good welfare. This indicator should preferably also be quick and cheap to analyse and feasible to take on all individuals on the dairy farm, thereby eliminating the need to select a representative sample. It would enable us to draw conclusions about the welfare of each individual as well as to assess the farm as a whole. One should be aware that such a single indicator would reflect an integrative measure of both negative and positive welfare rather than a positive only indicator. In this respect it would be different from an iceberg indicator, which is usually used to reflect a welfare outcome that has multifactorial causes in the housing and management of the animal, implying that there are probably other consequences (potentially also positive ones) arising from these same causal factors, that are not being recorded. Having the one perfect indicator is the holy grail of on-farm welfare assessment, but it is likely to remain unobtainable for a considerable time.
A conclusion for the present is that a positive welfare protocol for on-farm welfare assessment is going to consist of a combination of several different indicators, ideally complementing each other to better reflect the whole affective experience of the animal. This inevitably leads to discussion of how they complement one another, given that animal welfare is usually considered a multidimensional concept. There has already been considerable discussion, followed by conceptual and practical work, to decide how to aggregate animal-based welfare indicators (albeit mainly negative ones) into an overall welfare assessment, as well as concerns about how to do this in a good way (Botreau et al., 2007; Sandoe et al., 2019). We propose that similar concerns apply whether the indicators are negative or positive.

Positive Indicators Only, or Positive and Negative Indicators Combined in One Protocol?
Although in this section of the paper we are imagining a perfect positive protocol in an ideal world, we have already concluded that for the time being we will need a combination of different indicators. In this case a crucial question is whether an ideal positive protocol should consist only of positive indicators or whether it should also include negative indicators. If one reason underlying the need for a positive protocol is that the absence of indicators of poor welfare does not necessarily imply good welfare (only the absence of poor welfare) then surely the same criticism must be directed here. The absence of indicators of positive welfare does not necessarily imply poor welfare, but if we do not look for indicators of poor welfare, we cannot be sure.
The dilemma lies in part at the level of integration and the time dimension considered. It is possible for an animal to have some negative experiences, e.g., a minor twinge of pain occasionally, even if the overall experience is positive. That there is a higher affective ratio of positively- to negatively-valenced experiences is suggested by Yeates (2017) to be implicit in the Farm Animal Welfare Council's concept of a "Good Life" (FAWC, 2009). The advantages and disadvantages of integrating indicators of positive welfare into already existing protocols or developing a purely positive protocol are discussed below.
When developing a protocol combining negative and positive indicators, the "best of" the positive indicators are combined with indicators of negative welfare into an overall protocol. A major benefit here is the potential to build on existing knowledge and acceptance. For example, the positive indicators could be included in the Welfare Quality 12th criterion "positive emotional state" (Welfare Quality, 2009), or in the 5th "affective experience" domain in the Five Domains Model, in which potential positive affective states accompanying each of the other four domains, i.e., nutrition, environment, health, and behaviour, are described (Mellor and Beausoleil, 2015). These positive indicators could be reported separately or included in the overall welfare assessment, which ultimately could better reflect the full range from negative to positive welfare. A disadvantage would be that fewer positive indicators could be included if the whole protocol is to remain feasible.
A broader range of positive indicators could be considered if we develop a protocol containing only indicators of positive welfare. One approach would be to reconsider the dimensions that are currently considered important (summarised as covering nutrition, housing/environment, health and appropriate behaviour). We could thus start by considering the option to build at least initially on Welfare Quality and the Five Domains Model, but with only positive indicators in each principle/domain (e.g., Mattiello et al., 2019). Examples could include indicators of positive affective states associated with "good feeding" (e.g., anticipation of tasty food, post-feeding satisfaction), of "good housing/environment" (e.g., fun reflecting a stimulating environment, sense of security), of "good health" (i.e., mainly mental health and well-being) and "appropriate behaviour" (e.g., satisfied motivation). However, such an approach leads to the search for indicators of discrete emotional states rather than more general indicators of positive valence. We believe that a major benefit of a protocol with only positive indicators is the possibility to think outside the current (welfare assessment) "box."

Structuring a Protocol Around the Time Dimension
The focus so far has been on the importance of selecting indicators that reflect the different domains/criteria of animal welfare. Nevertheless, it is critical to reflect on the time periods that should be covered by on-farm welfare assessments. These can range from a snapshot of the animal's affective state at a given moment, to a cumulative assessment of negative and positive experiences of an animal's life until now, and all are relevant for on-farm welfare assessments. Most existing protocols consist of indicators for welfare consequences occurring over a range of different time periods (e.g., Welfare Quality, 2009; AssureWel, 2018). That is to say, they include indicators that reflect short-term welfare consequences (e.g., the pain associated with a fresh wound), those reflecting the long-term (e.g., hunger or disease that led to a poor body condition score) and those where the time period is unknown (e.g., fear reactions can result from one specific stimulus or an accumulation of different stimuli over time), but the protocol is not structured around these. However, when it comes to a positive welfare protocol, we argue that a structure based on the time dimension is more biologically appropriate, since positive states are likely to be less specific to a domain/criterion. An animal may experience pleasure for any one of many reasons. Furthermore, and regarding how a positive protocol may potentially be used in the future, short-term or medium-term positive indicators may provide valuable information on the effect of interventions to improve welfare or certain management procedures on the animal's welfare, whereas long-term indicators reflect society's requests to purchase products from animals with an overall good quality of life (Autio et al., 2018). Note that we are not arguing for the structure of protocols addressing negative states to be changed.
In the following section we critically review the research on animal-based indicators of positive welfare for cattle to date, focusing particularly on aspects of validity, reliability and feasibility. We specify whether we propose them as short-term (emotion), medium-term (mood) or long-term (whole life) indicators.

INDICATORS OF POSITIVE WELFARE
In this section we only include studies of suggested positive indicators in which the relevant measure is compared to a recognised welfare outcome, so that some attempt at validation is made in the study. Ideally, the indicator seen in a situation in which an animal is known to be in a positive affective state should be compared with the indicator in another situation. However, as argued above, studies generally use a comparison between two situations, with one presumed to yield better welfare than the other, without proof that the better situation is positive in an absolute sense. Strictly speaking most of the indicators below are therefore indicators of better welfare rather than necessarily good welfare. However, if one takes the view that some behaviours have become associated with positive emotion in order to promote their performance in certain opportunity situations, then it might be argued that during the time that the actual behaviour is being performed the animal is experiencing positive welfare.
Our current list of potential indicators is sorted with respect to the potential time dimension they reflect, thus including candidate indicators of short-term, medium-term and long-term positive affective states. We include behavioural, (neuro-)physiological as well as cognitive indicators. We have chosen not to include behaviours that are the result of the satisfaction of a physiological need, e.g., drinking when thirsty or feeding when hungry, nor of the possible reinforcing properties of these, e.g., pleasant taste or smell of the feed (for a discussion of these see Mattiello et al., 2019). The sections are structured as a review of their validity, followed by reliability and feasibility. In doing this, we also highlight where more research is needed (mapping of gaps). When there are two or more studies investigating the indicator of interest the findings are summarised in a table. The validity presented in the tables is the one given by the authors of the paper (and in some instances our interpretation of the statements of the authors). We do not address the sampling procedures for each indicator, except when they directly affect the feasibility of the indicator.

Eye White
The amount of visible eye white has been suggested as a measure of arousal, with a low amount of visible eye white possibly reflecting low arousal states in dairy cows, including positive ones (Proctor and Carder, 2015a; Gómez et al., 2018). Many of the studies on percentage of eye white in dairy cattle focus on the comparison between negative and neutral states and are thus not included in Table 1 since we focus on indicators of positive states. The results from the studies in Table 1 support the hypothesis that low eye white percentage is associated with low arousal, and in most cases this is interpreted as being a low arousal positive affective state. However, there does not seem to be any study on low arousal negative affective states, and so the discriminatory validity of this indicator may be low.
The percentage of visible eye white has been assessed in different ways, either by measuring previously taken pictures with a ruler on the computer screen and entering the measurements into the formula for an ellipse (Sandem et al., 2002; Gómez et al., 2018), or by assessing the total eye white percentage using an image analysis program (Core et al., 2009). The data for the method used by Core et al. (2009) indicate good to very good repeatability with Pearson coefficients from 0.77 to 0.97. To use eye white it is necessary to photograph the eye of the animal, which reduces the feasibility of the indicator. If this can be done, however, there is some first evidence that it may be an indicator for positive low arousal states.
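To make the ellipse-based calculation mentioned above more concrete, the following is a minimal sketch. It rests on an illustrative assumption that is not taken from the cited studies: both the whole visible eye and its dark region (iris plus pupil) are approximated as ellipses measured by width and height on a photograph, and eye white is whatever remains. The function names and the measurements are hypothetical.

```python
import math


def ellipse_area(width, height):
    """Area of an ellipse from its full width and full height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return math.pi * (width / 2) * (height / 2)


def eye_white_percentage(eye_w, eye_h, dark_w, dark_h):
    """Percentage of the visible eye that is white.

    Assumption (hypothetical, for illustration only): the visible eye and
    the dark region inside it are both treated as ellipses, and the white
    area is the difference between the two.
    """
    eye = ellipse_area(eye_w, eye_h)
    dark = ellipse_area(dark_w, dark_h)
    if dark > eye:
        raise ValueError("dark region cannot be larger than the visible eye")
    return 100 * (eye - dark) / eye


# Made-up ruler measurements in millimetres on a printed or on-screen photo.
example = eye_white_percentage(eye_w=40, eye_h=20, dark_w=30, dark_h=18)
print(f"Eye white: {example:.1f}%")
```

Because the result is a ratio of two areas, the unit of measurement and the zoom level of the picture cancel out, provided both ellipses are measured on the same image.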

Ear Positions
Ear positions have been suggested to be indicative of various emotional states, as well as serving a communicative function, in cattle (Lambert and Carder, 2019). As can be seen in Table 2, the studies on ear position in cattle have investigated a number of naturally occurring contexts (e.g., feeding, using a brush) compared to ear positions in other contexts, but little experimental work has been carried out. The exceptions to this are the studies comparing ear positions in response to the presence or absence of various forms of stroking (e.g., Proctor and Carder, 2014).
As for many indicators the same behaviour or behavioural expression, in this case ear position, may be caused by different contexts, and therefore possibly different underlying emotions or moods. For example, a number of studies suggest that both ears backwards and hanging ears reflect positive states, but with an important exception. Gleerup et al. (2015) found that cows that were assessed as being in pain (based on a clinical examination) also showed ears backwards and hanging ears.
The level of precision of the description of the ear position varies between articles. For example, Proctor and Carder (2014) use both photos and a verbal description, whereas Lee et al. (2018) give a much more cursory description. This makes it hard to know if, e.g., the ear position "ears backwards" is the same in all studies. Further research is needed to develop a standardised way of describing the different ear positions.
With 95% agreement between pairs of observers (Proctor and Carder, 2014) and a correlation coefficient of at least rs = 0.92 (Schmied et al., 2008a), inter-observer reliability of assessing ear positions has been shown to be high. Repeatability within observers ranged between Cohen's κ = 0.61 (Lange et al., 2020a) and κ = 0.78 (Lange et al., 2020b).
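For readers less familiar with the agreement statistics quoted above, the sketch below shows how percentage agreement and Cohen's kappa can be computed for two sets of categorical ear-position scores. The category labels and the scores are invented for illustration and do not come from any of the cited studies; the function names are likewise our own.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Proportion of observations on which the two score lists agree."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(scores_a)
    p_observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two marginal proportions per category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        return 1.0  # both lists use one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores for ten instantaneous observations of the same cows.
observer_1 = ["back", "back", "hang", "fwd", "back",
              "hang", "fwd", "back", "hang", "back"]
observer_2 = ["back", "back", "hang", "fwd", "hang",
              "hang", "fwd", "back", "back", "back"]

print(f"Agreement: {percent_agreement(observer_1, observer_2):.2f}")
print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")
```

The same functions apply to intra-observer repeatability if the two lists are one observer's scores of the same material on two occasions. Kappa is lower than raw agreement because it discounts matches that would be expected from the category frequencies alone.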
Instantaneous observations appear to be feasible (e.g., when arriving at the farm or in certain specific situations such as feeding), but they will be more time-consuming if proportions of different positions over time or transitions are to be assessed. We need also to be aware that ear positions, and especially changes in ear position, are affected by sounds on the farm.
Most of the concerns raised above could be addressed by improved standardisation of when and how ear position is determined. The evidence that the specific ear positions, ears backwards and ears hanging, are associated with positive emotion seems to be strong, once a clinical examination of the animal has excluded that the animal is in pain.

Tail Position
The tail is a body part that is often thought to be affected by the emotional state of an animal (e.g., Reimert et al., 2013; Marcet Rius et al., 2018; but see Reefmann et al., 2009). Three different aspects of the tail are thought to be important in various species: tail position, tail movement and laterality of tail movement. For cattle a raised or tucked tail may indicate fear (Goma et al., 2018; Rizzuto et al., 2020), but as far as we know there is no study investigating the association between tail position and positive emotions in cattle. There are some indications that tail movement, e.g., wagging, may be associated with pleasurable activities (brushing and feeding; de Oliveira and Keeling, 2018). However, since tail movement is related to fly density, care should be taken to control for this (Frantz et al., 2019). As described in more detail below, laterality of a behaviour or a behavioural expression may be associated with positive or negative emotions (Leliveld et al., 2020; Siniscalchi et al., 2021). At present we are not aware of studies on the effect of emotions on the laterality of tail movement in cattle. A complicating factor may be that a proportion of cows have been found to have a strong laterality and so show a right or left side preference independent of the situation and thus probably of the emotion experienced (Phillips et al., 2003).
Studies on intra- and inter-observer agreement in cattle have not been carried out so far. In pigs, correlations between two observers were 0.90 and 0.85 for tail movement duration and tail movement frequency, respectively (Marcet Rius et al., 2018). The use of 3D cameras allowed hanging tails to be detected with 79% accuracy (D'Eath et al., 2018). However, these results are specific to pigs; studies investigating the reliability and feasibility of live and automatic recordings are needed for dairy cattle.
To summarise, based on observations in other species, tail position, movement and/or laterality may be indicators of positive welfare in dairy cattle, but studies assessing the validity, reliability and feasibility of this potential indicator are currently lacking.

Allogrooming, Self-Grooming, and Brushing
Allogrooming in cattle consists mainly of licking movements on the head, neck and shoulder area of the receiver but also the back and rump regions including the tail (Sato et al., 1991; Schmied et al., 2008b; Val-Laillet et al., 2009). Licking is often preceded by a solicitation to be licked whereby the typical posture includes lowering the cheek near the other animal's mouth, which can be associated with gentle nudging or pushing the nose or cheek (Sato et al., 1991). Licking is often mentioned as a potential indicator of positive emotions (Knierim and Winckler, 2009), especially for the receiving animal.
The results for self-grooming are contradictory, with studies finding more self-grooming in situations associated with poorer welfare (Kerr and Wood-Gush, 1987;Lv et al., 2018), but also more self-grooming in healthy than in sick animals (Borderas et al., 2008).
Cattle brushes can be thought of as a special example of grooming. Brush use does not affect the level of self-grooming, at least in calves (Horvath and Miller-Cushon, 2019), but may affect the level of allogrooming as observed in feedlot steers (Park et al., 2020). Weaned beef calves will approach and use cattle brushes indicating that they find the use of the brushes enjoyable (Horvath et al., 2020) and dairy cows will even work for access to them (McConnachie et al., 2018). In line with our argument above for not including e.g., feeding for a hungry animal, we do not include studies on brush use if the animal is suffering from ectoparasites, e.g., mange (Moncada et al., 2020).
In conclusion the calming effect of receiving allogrooming is fairly well-documented (Table 3). However, this also means that although the experience of the individual animal is positive, a high level of social conflict, or other adverse conditions, may also be associated with a higher level of grooming as suggested by e.g., Sato et al. (1991) for youngstock and Napolitano et al. (2009) for cows. It should therefore be used with caution, especially if other indicators of poor conditions can be observed. Allogrooming may therefore be a behaviour that makes a receiving animal feel good, but which may occur because the situation is less than optimal (see also Sato et al., 1991). Further research is needed to assess which adverse conditions increase the level of allogrooming.
Allogrooming and self-grooming are rare but conspicuous behaviours which can be easily recognised. Allogrooming in dairy cows was reliably detected by multiple observers during live observations (Kendall's W = 0.96 for 3 observers, Westerath et al., 2009a; ICC = 0.87, de Freslon et al., 2020). A similar agreement has been reported for self-grooming and brush use in calves (Zobel et al., 2017; Horvath and Miller-Cushon, 2019) and steers (Toaff-Rosenstein et al., 2016). However, in the study of Westerath et al. (2009a), self-grooming in dairy cows occurred too rarely to allow meaningful measures of reliability to be calculated, indicating the need for even longer observations to obtain reliable data compared to allogrooming, at least in this age group. Attempts to automatically assess brush use have not been successful so far. In heifers, the accuracy of a radio-frequency identification system for detecting brush contact varied across animals (sensitivity 0.54-1.0; specificity 0.59-0.98) and the system generally overestimated the actual time spent using the brush (Toaff-Rosenstein et al., 2017), thus not allowing reliable recordings. Further developments of sensor technology are needed before automated recording of brush use can be recommended.
In summary, there seems to be sufficient evidence that being groomed by a conspecific and self-grooming using a mechanical brush (but not self-grooming without a brush) are associated with a positive state. Potential confounding with skin disorders for brush use would need to be excluded, and the extent to which social tension is a confounder for allogrooming should be explored further or allogrooming excluded if indicators of poor conditions are observed.
Table footnote: For three of the studies a reduction in heart rate has been described as a relaxation response. While a decrease in heart rate is not necessarily an indication of a positive emotion, the situation in which it has been studied here does make it a valid assumption. B., Beef; D., Dairy; Dual, Dual purpose.

Anticipatory Behaviour
According to the anticipatory behaviour theory, the reinforcing properties of a given stimulus are at least partly dependent on the situation of the animal (Spruijt et al., 2001). An animal with good welfare is thought to react less to a given reward than an animal with worse welfare (Spruijt et al., 2001). By giving a signal that predicts a reward it is possible to study the behaviour of the animal while it is anticipating the reward. Sensitivity to housing conditions along the lines predicted by the theory has been shown in several species, e.g., rats (Van der Harst et al., 2003;Makowska and Weary, 2016) and mink (Vinke et al., 2004). A recent study on calves showed differences in the predicted direction when comparing the anticipatory response of calves from basic and enriched housing (Neave et al., 2021), and similar results have been found for adult cattle (Crump et al., 2021) (Table 4). However, care needs to be taken since very bad situations may also lead to reduced anticipatory behaviour due to anhedonia (Lecorps et al., 2019). Moreover, it may be difficult to identify the anticipatory behaviour, see Anderson et al. (2020) for a critical review. In conclusion, anticipatory behaviour is a possible indicator of emotional state that would be comparably easy to automate, but which will need more work, both on validation and methodology, before it can be used.
Few tests for assessing anticipatory behaviour have been used in cattle. Neave et al. (2021) report high inter- and intra-observer reliability (K > 0.90) for behaviour observations before and during anticipatory periods. However, so far the approach has not been applied in an on-farm setting and feasibility may be low due to the stimulus specificity. The reward value can, however, be assessed in different ways, and specific test situations during which the reward value of e.g., food can be assessed (e.g., as a reaction task during feeding) appear to be more feasible than observations of spontaneous behaviour during daily routine.

Laterality-Differential Eye Use
For a number of species laterality, e.g., differential use of eyes when observing an object, is thought to be affected by the emotions associated with the object (for a review see Leliveld et al., 2013, 2020; Siniscalchi et al., 2021). In general, animals observe fear evoking stimuli primarily with their left eye, whereas familiar objects are observed with their right eye. This preference has also been found in dairy and beef cattle (Robins and Phillips, 2010; Phillips et al., 2015). While we are not aware of studies that validate a preferential use of the right eye in cattle when in a positive affective state, the results of Kappel et al. (2017) do suggest that dairy cows who readily approach and contact a novel object are preferentially using their right eye. Substantial intra- (K = 0.77) and almost perfect inter-observer agreement (K = 0.94) has been obtained for assessment of visual lateralisation in a feeding motivation test (Franchi et al., 2020).
In summary, while differential eye use when approaching a novel object may be used on farm, it requires isolating the individual animal and careful measuring, something that considerably reduces the feasibility of this measure in practice.

Oxytocin
Oxytocin is a hormone that has been linked to positive social interactions in a number of species (Scatliffe et al., 2019), and has been suggested as a possible physiological candidate for positive emotions also in cattle (Rault et al., 2017). In a study investigating the effect of interactions between cattle and humans, positive contact (talking to the animal in a gentle voice, petting, scratching) did not affect salivary oxytocin concentrations, but some behaviours shown during the interaction, such as neck stretching while being stroked, were positively correlated with oxytocin levels (Lürzel et al., 2020). Blood levels of oxytocin may be confounded by endogenous secretion during e.g., milk letdown.
In cattle, determination of oxytocin in saliva samples, which can be easier and less invasively obtained than blood samples, has been shown to be reliable after extraction (Lürzel et al., 2020). In general, validated laboratory protocols for the determination of blood and saliva constituents are assumed to provide reliable results.

Nasal Temperature
Nasal temperature in cattle is affected by both positive and negative events (Proctor and Carder, 2015b, 2016). The authors conclude that it is the change in valence rather than an absolute measure of the valence that induces the change in temperature. While work in humans has used it as a measure of arousal (Diaz-Piedra et al., 2019), work on primates suggests a more complex picture with differential responses to positive and negative treatments (Kano et al., 2016; Chotard et al., 2018). This indicator needs more studies on its validity as well as on the practical application before it can be suggested as a good candidate for positive emotions.
There are no studies on the reliability of measuring nasal temperature. Standardisation of infrared measurements is needed as they may be confounded by anatomical location, angle and ambient temperature (Proctor and Carder, 2016). While there is the advantage of non-invasiveness, taking accurate measures may be time-consuming.

Heart Rate Variability
While heart rate reflects arousal, heart rate variability (HRV) has been shown to be sensitive also to the emotional valence of the animal in a number of species (von Borell et al., 2007). A decrease in HRV was found in studies of cattle in stressful situations (e.g., Hagen et al., 2005;Kovács et al., 2013;Mandel et al., 2019).
One of the few attempts looking at the relation between positive experiences and HRV in cattle is a study by Lange et al. (2020a), who however failed to find an expected increase in HRV in response to a human stroking the animal. Whereas there is good evidence for HRV being affected by an animal's affective state, most of the evidence is on negative rather than positive affective states.
HRV is affected by factors such as physical activity, posture and diurnal rhythms, thus requiring standardised recording. While recording entire electrocardiograms provides the most reliable data source for analysis of HRV, mobile devices that may be used on farm usually detect the R-peaks and store the interbeat interval (IBI) data only. IBI measurements require thorough editing to identify and correct artefacts (von Borell et al., 2007;Kovács et al., 2013). Currently, non-invasive methods to assess cardiac activity require the animals to wear a belt to fix the electrodes and the heart rate monitor, which requires habituation to handling and wearing of the belt, rendering the indicator not (yet) feasible for on-farm assessments.
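To illustrate the kind of processing that stored IBI data require, the following minimal sketch computes one widely used time-domain HRV measure, the root mean square of successive differences (RMSSD), after a deliberately crude artefact filter. The function names, the 20% deviation threshold and the example values are assumptions made for illustration only and are not taken from the cited studies; published protocols use more careful artefact identification and correction.

```python
import math
from statistics import median


def filter_ibi(ibi_ms, max_rel_dev=0.2):
    """Crude artefact filter (assumed rule, for illustration only).

    Drops every interbeat interval that deviates from the series median
    by more than max_rel_dev (20% by default). Note that removing an
    interval joins its two neighbours, which a real protocol would
    handle more carefully (e.g., by interpolation).
    """
    med = median(ibi_ms)
    return [x for x in ibi_ms if abs(x - med) <= max_rel_dev * med]


def rmssd(ibi_ms):
    """Root mean square of successive differences between intervals (ms)."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    if not diffs:
        raise ValueError("at least two interbeat intervals are required")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical recording with one obviously missed beat (1600 ms).
raw = [800, 810, 1600, 790, 800]
clean = filter_ibi(raw)
print(clean, round(rmssd(clean), 2))
```

In this invented example the 1600 ms interval is discarded before the RMSSD is calculated; without the filter, the single artefact would dominate the result.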

Play
Play is a behaviour that is thought to be shown when the most basic needs of an animal are met and is often mentioned as an indicator of positive affective states (Held and Špinka, 2011;Ahloy-Dallaire et al., 2018). A number of papers have studied the effect of increasing the area accessible for play behaviour and have found a rebound effect in calves (e.g., Jensen and Kyhn, 2000). While this indicates that play may be a behavioural need, it does not by itself validate play as an indicator of positive welfare. These rebound studies have therefore not been included in Table 5.
Play behaviour in calves can be reliably recorded in terms of both intra- and inter-observer agreement. Reported correlations within observers for locomotor play range between 0.85 and 0.98 (Krachun et al., 2010; Rushen and de Passillé, 2012; Mintline et al., 2013) and between 0.82 and 1.00 for agreement between observers (Krachun et al., 2010; Rushen and de Passillé, 2012; Miguel-Pacheco et al., 2015). No such information is available for heifers and cows, most likely due to the rare occurrence of play in these age groups making testing of agreement difficult. Therefore, apart from being specific to an age category, a major drawback is the time it takes to gather enough data (Westerath et al., 2009a). There is, however, the potential for automated assessment with accelerometers, which have been validated (e.g., Luu et al., 2013; Größbacher et al., 2019; Gladden et al., 2020). Since leg data loggers are already used for on-farm welfare assessment, this together with the validation studies renders automatic monitoring of play behaviour a promising positive indicator for dairy calves.

Exploration
Like play, exploration is often thought to be exhibited by an animal that has its primary needs met, and may thus be a good candidate indicator of positive affective states (Boissy et al., 2007; but see Inglis et al., 2001 for an alternative view). A distinction is generally made between inquisitive and inspective exploration, where the former is spontaneous and will occur even in a stable well-known environment (related to agency, e.g., Špinka and Wemelsfelder, 2018; Špinka, 2019), whereas the latter is provoked by a change, e.g., the presence of a novel object. While inquisitive exploration thus refers to an animal actively looking for novelty, inspective exploration is done by an animal that is confronted with an ambiguous stimulus. Both types of exploration therefore have the potential to work as indicators for positive affective states (Table 6).
For inquisitive exploration to be used as a positive indicator, a distinction needs to be made between searching for information and searching for a specific resource, e.g., food or a possibility to escape. In many cases this is evident, but there may be instances in which it is more difficult.
There are two alternative hypotheses for how inspective exploration may change with welfare. According to the first hypothesis, animals with good welfare are thought to explore more (Lecorps et al., 2018). According to the second, animals with poor welfare in a barren environment are thought to explore e.g., a novel object more (Westerath et al., 2009b), in line with research on boredom-like states in mink (Meagher and Mason, 2012; Meagher et al., 2017). However, if animals are anhedonic, exploration may be diminished. For inspective exploration to be used as an indicator of positive emotions it is therefore necessary to exclude the effect of the complexity of the home environment. There is no consensus on whether the reaction to differing novel stimuli (e.g., novel food and novel object) is consistent within individuals (Herskin and Kristensen, 2004; Meagher et al., 2016, 2017; Hirata and Arimoto, 2018). Some researchers regard exploration to a high degree to be a personality trait (Foris et al., 2018; Neave et al., 2018, 2020) and a high level of individual baseline variation may therefore be expected. For inspective exploration to be a good indicator for positive affective states, further research is needed on how the nature of the novel stimulus affects the level of exploration. There are indications that the way inquisitive exploration is measured, i.e., duration vs. frequency, affects the way in which it can be interpreted and this needs to be investigated further (Kerr and Wood-Gush, 1987; Krohn, 1994).
There are few studies on the reliability of assessing exploration in cattle. In veal calves, inter-observer reliability for latency to touch a novel object on farm was high (Kendall's W = 0.8; Bokkers et al., 2009). Novel object tests (e.g., Westerath et al., 2009b) can be carried out in a short period of time. However, if the inquisitive component of exploration is of interest, observation of spontaneous behaviour requires more time due to the expected low occurrence, making the assessment less feasible. Feasibility may be improved when all types of exploratory behaviours are included (e.g., sniffing and licking equipment or the ground), as done by Krohn (1994).

Qualitative Behavioural Assessment
In Qualitative Behavioural Assessment (QBA) the observer is asked to assess the way in which an animal or a group of animals behaves. The focus is on how the behaviour is performed rather than on what behaviour is shown (Wemelsfelder et al., 2000a). There are two methods for using QBA, one using adjectives developed by each observer (Free Choice Profiling), and one using a list of adjectives agreed on beforehand (Fixed Terms). The results of the observations are analysed with a Procrustes analysis or a principal component analysis (PCA), respectively. Typically, two axes are identified. In many cases one axis is interpreted as being associated with valence and the second axis with arousal. There is, however, no theoretical reason for this interpretation to be valid in all cases.
In the list of adjectives of both approaches there are words describing positive as well as negative affective states. Since the positive states are explicit in the method, it has the potential to identify positive emotions and it may even be argued that all QBA studies include an aspect of positive affective states (Fleming et al., 2016). While the method has been employed in a large number of studies on a range of species, it has seldom been used to identify positive affective states in cattle. As far as we know there is only one study on cattle doing this: A positive interaction with a human (provision of feed concentrate) resulted in a significant decrease in avoidance response with a corresponding change in the QBA evaluation from "fear/distress/aversion" towards "relaxation/attraction/trust" (Schmitz et al., 2020).
While a number of studies have been conducted to validate the QBA approach on individual animals (especially in pigs, e.g., Wemelsfelder et al., 2000b, 2009), less work has been done at group level, and information is currently lacking on how e.g., the impression of one animal affects the impression of the total group. When using Free Choice Profiling (FCP), consistently significant consensus among different observers is achieved in their assessment of behavioural expression (e.g., cattle: Stockman et al., 2011; buffaloes: Napolitano et al., 2012). Work on sows indicates that the inter-observer reliability is similar for both FCP and the use of Fixed Terms (FT; Clark et al., 2016), and this may also be true for the assessment of individual animals (e.g., cows during tactile interactions and release from restraint: Kendall's W = 0.95; Ebinghaus et al., 2016). However, agreement can vary greatly when qualitatively assessing the spontaneous behaviour of groups of cows, with Kendall's W ranging between 0.14-0.48 (Bokkers et al., 2012) and 0.56-0.72 (Winckler, 2014). Intra-observer agreement appears to be higher for FCP (rs = 0.95/0.96; Rousing and Wemelsfelder, 2006) than for FT (rs = 0.56-0.76; Bokkers et al., 2012).
Regarding feasibility, FT assessments using instantaneous observations of 10-20 min (e.g., Welfare Quality, 2009) are comparatively easy to implement, but since time of the day seems to affect the outcomes (Gutmann et al., 2015) it is unclear how many observations spread over the day are needed for a reliable assessment. While FCP might have advantages in terms of validity, implementation on-farm is less feasible as multiple observers (10-15) are needed.
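Because Kendall's W is the agreement statistic reported for several indicators in this review, a minimal sketch of how it is calculated may be helpful. The sketch uses the standard formula W = 12S / (m²(n³ - n)) for m observers and n scored subjects, where S is the sum of squared deviations of the rank sums from their mean. It assumes that no observer gives tied scores (no tie correction is applied); the function names and the example scores are hypothetical.

```python
def rank(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def kendalls_w(scores):
    """Kendall's coefficient of concordance without tie correction.

    scores: one list per observer, each holding one score per subject
    (e.g., per animal or per video clip), in the same subject order.
    """
    m = len(scores)        # number of observers
    n = len(scores[0])     # number of subjects scored
    ranked = [rank(obs) for obs in scores]
    rank_sums = [sum(obs[j] for obs in ranked) for j in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three hypothetical observers scoring four clips on a continuous scale.
observers = [
    [10, 35, 60, 90],
    [15, 30, 70, 85],
    [20, 25, 55, 95],
]
print(kendalls_w(observers))
```

W ranges from 0 (no agreement in the rank order) to 1 (identical rank order for all observers), which is the case in the invented example above.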

Synchrony
The level of synchrony seen in a herd has been suggested to be associated with the level of welfare, with higher levels of synchrony seen in herds with positive welfare (Napolitano et al., 2009). However, it has also been hypothesised that very barren environments may lead to high levels of synchrony (Webster and Hurnik, 1994). The stocking density and space available may also affect the level of synchrony that animals show, not because of differences in welfare but because of physical constraints (Wierenga, 1983). If resources are restricted, high levels of synchrony may lead to competition, which in turn may be a welfare challenge. While synchrony may be a potential indicator for positive welfare, care needs to be taken when comparing groups of animals housed in different environments (Table 7).
The assessment of synchrony, e.g., using instantaneous scan sampling, appears to be feasible on-farm, but the degree of synchrony may vary with time of the day (Stoye et al., 2012). If proportions of the day exceeding a certain synchrony level are of interest, repeated observations are required. There is no information on the reliability of the observations of synchrony, but since it relies on the counting of animals, agreement between observers should be high except for large groups and poorly visible barn areas.
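A minimal sketch of how scan-sampling data might be summarised is given below. It assumes, purely for illustration, that synchrony in a scan is defined as the proportion of animals performing the most common behaviour, and that the measure of interest is the share of scans at or above a chosen synchrony level. The definition, the threshold, the function names and the example data are our assumptions here, not a validated procedure.

```python
from collections import Counter


def scan_synchrony(behaviours):
    """Proportion of animals in one scan performing the most common behaviour."""
    counts = Counter(behaviours)
    return counts.most_common(1)[0][1] / len(behaviours)


def share_of_scans_at_or_above(scans, threshold):
    """Share of scans whose synchrony reaches the (assumed) threshold."""
    values = [scan_synchrony(scan) for scan in scans]
    return sum(v >= threshold for v in values) / len(values)


# Three hypothetical scans of a group of ten animals.
scans = [
    ["lying"] * 8 + ["feeding"] * 2,
    ["lying"] * 5 + ["feeding"] * 5,
    ["feeding"] * 9 + ["standing"],
]
print([scan_synchrony(s) for s in scans])
print(share_of_scans_at_or_above(scans, threshold=0.7))
```

Since each value depends only on counts of animals per behaviour, the sketch also shows why observer agreement is expected to be high as long as all animals are visible.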

Cognitive Biases
From humans we know that mood can bias cognitive processes, including memory (Bradley et al., 1996), judgements (Wright and Bower, 1992) and attention (Roy et al., 2008), a phenomenon called cognitive bias or, more precisely, affect-induced cognitive bias (Mendl et al., 2009).

Judgement Bias
Judgement bias tasks assess whether an individual judges an ambiguous stimulus rather "optimistically," i.e., in the expectancy of something positive, or "pessimistically," i.e., in the expectancy of something negative. Since its first translation to non-human animals in 2004 (Harding et al., 2004), different judgement bias task designs have been applied to a broad range of species, including studies on calves and dairy cows (Table 8).
Data on the reliability of coding the outcome measures for judgement bias tasks is scarce. However, the nature of both commonly used outcome measures, i.e., choices and latencies to approach probe trials, suggests that they are reliably recordable. In line with this, a study in goats showed that latency to reach a goal bucket coded by two observers was highly and positively correlated (rs = 0.98; Baciadonna et al., 2016). Training for the judgement bias task is time-consuming. Implementation on farm thus requires automated systems that can be integrated into the animals' home environments. First automated designs exist in rodents (e.g., Jones et al., 2018;Krakenberg et al., 2019), but those have not yet been implemented into the animals' housing environments. In automated systems both choices and latencies would be assessed highly reliably since they can be detected easily by the automated system. Thus, it is within the area of automation for on-farm assessment that more work is needed.

Attention Bias
Positive mood biases attention towards positive stimuli, whereas negative mood biases attention towards negative stimuli. In humans, most research has been on negative attention bias with only a few studies investigating positive attention bias (e.g., Tamir and Robinson, 2007; Grafton et al., 2012). The same holds true for non-human animals, with most studies investigating negative attention bias in sheep (e.g., Lee et al., 2016; Monk et al., 2018; Raoult and Gygax, 2019). Two studies aiming to pharmacologically validate positive and negative attention bias in sheep failed to do so (Monk et al., 2019, 2020), which suggests that positive attention bias may be difficult to assess in non-human animals. To our knowledge, only two studies on attention bias exist in cattle, albeit in negative contexts (see Table 9). Thus, more studies on the validity of this potential indicator in this species and especially for positive welfare are needed before attention bias can be used as an indicator of positive affective states on farm.
There is no reliability data available from the two studies that have been conducted in cattle. Due to the nature of the outcome measures, which include subtle behaviours, observer training for reliable assessment is very likely needed. In contrast to judgement bias, the assessment of attention bias does not require training and would thus be more feasible for on-farm assessments. However, since the outcome measures are mostly different behaviours, scoring, e.g., from video recordings, may be time-consuming if not automated.

Telomere Length
There are many studies in various species that indicate telomere length and especially changes in telomere length are affected by stressors (Bateson, 2016). Interestingly, it seems that the stressors involved in telomere shortening can be both physical and psychological. There are fewer studies showing that telomere length may increase in the absence of stressors or caused by positive events (Hoelzl et al., 2016;Criscuolo et al., 2020). Two major obstacles, apart from the relative lack of studies, are whether it is telomere length or telomere attrition (which would need repeated measures of the same animal) that is the relevant measure, and the strong genetic component. One study has shown that there is a strong genetic component in cattle, and that telomere length differs markedly between individuals already at birth (Ilska-Warner et al., 2019). Attrition is more rapid earlier in life compared to later, and studies have shown associations to age, stage of lactation etc. (Brown et al., 2012;Laubenthal et al., 2016).
Telomere length has the potential to become a good welfare indicator because of its possible ability to be an aggregated measure of animal welfare over the lifetime of an individual. There are, however, still many aspects that have to be clarified before it can be used, not least the effect of positive experiences.
For example, in the study by Seeker et al. (2021) the average telomere attrition calculated over multiple repeated samples of individuals was linked to survival traits. However, to what extent these survival traits are linked to a high quality of life is also uncertain.
Similar to other blood components, the assessment of telomere length following established protocols is assumed to provide reliable results. Depending on the biological specimen (e.g., blood, epithelial swabs), sampling may be more or less invasive. However, in buffaloes nasal swabs used to sample epithelial cells turned out not to be a suitable alternative to blood samples (Seibt et al., 2019).

Hippocampal Biomarkers
It has been suggested that markers of cumulative affective experience of an animal might be found in the hippocampus, the part of the brain involved in learning, memory, and stress regulation. Support for the validity of two macroscopic (the size of the hippocampus and the amount of grey matter in the anterior/ventral hippocampus) and two microscopic (the rate of neurogenesis and the structural characteristics of mature neuronal cell bodies) categories of hippocampal biomarkers is well-reviewed by Poirier et al. (2019). Examples of this support include that hippocampal biomarkers correlate with psychological concepts such as subjective wellbeing in humans, which is close to the concept of cumulative affective experience (Van't Ent et al., 2017). Importantly, these hippocampal biomarkers have also been found to increase in individuals regularly exposed to events known to induce positive affective states, e.g., voluntary physical activity and mindfulness meditation in humans, and sexual behaviour, voluntary physical activity and cage enrichment in rodents (see Poirier et al., 2019).
As with telomere length, it is not possible to interpret absolute values of any hippocampal biomarker. Nevertheless, relative measures such as changes over time or differences between groups or individuals (when confounding factors like age, breed etc. are accounted for) could be useful. We are not aware of any work in cattle.
Macroscopic hippocampal biomarkers can be measured in vivo, allowing repeated measures on the same animals, although they require access to magnetic resonance imaging facilities and are thus not practical for on-farm welfare assessment. Microscopic hippocampal biomarkers are taken post-mortem and do not require expensive equipment in proximity to the animals. In this case, brains could be collected when animals are slaughtered and processed elsewhere.

OUR SELECTION OF PROMISING CANDIDATE INDICATORS
The critical review in the previous section identified no positive indicator that could already be considered to satisfy all three criteria of being validated, reliable and currently feasible, although it did identify some that seem to have potential for further development. So what we present here is not a protocol, but a selection of promising candidate indicators focusing on validity and reliability that may serve as the basis for a positive welfare protocol. We have chosen not to emphasise feasibility in this section since what is considered feasible will depend on the purpose of the assessment and may change over time.
Given the current state of knowledge, we propose the following indicators are the most promising candidates to be recorded on a commercial dairy farm (see Table 10; highlighted in green for both validity and reliability). We suggest that the ear positions "backwards" and "hanging" could be used as an indicator of short-term positive emotion, assuming that we can exclude that they are attributed to pain. We also propose observations of play behaviour as an indicator of positive emotion and mood. If play was observed, then one could assume the absence of pain. If play was not observed, there would need to be additional observations or a clinical evaluation to exclude the possibility of pain as the reason for the backwards or hanging ear positions. Although the evidence is not as strong, additional observations of allogrooming behaviour would further support that the receiving individual is in a positively-valenced affective state, as would brushing. However, it would need to be excluded that the reason was something negative, e.g., social tension in the group leading to increased allogrooming, or ectoparasites for brush use. Thus, observations of aggressive interactions and occurrence of, e.g., mange, would also be needed. Confidence that an individual is indeed on the positively-valenced side of its affective space could be increased by a Qualitative Behavioural Assessment.
Observations of these five candidate indicators could be carried out on a commercial farm even now, although the fact that they need to be carried out at the level of the individual and some negative indicators are also required, means this would be very time-consuming. We are also aware that the actual measures (e.g., frequency vs. duration of brush use or play behaviour) and details of the observation procedures (e.g., choosing a representative sample, scan vs. continuous sampling, time of the day) necessary even for a prototype protocol still need to be defined. However, here we can build upon earlier experiences of developing indicators of (negative) animal welfare, for example with regard to resting behaviour (Plesch et al., 2010).
We consider judgement bias to be a promising candidate indicator for medium-term affective states (mood) but there are still considerable developments needed regarding automation and integration into the animals' home environments before it is feasible for on-farm use. This is why it is not included in the list of candidates above, despite being highlighted in green for both validity and reliability.

LESSONS LEARNT AND FUTURE PERSPECTIVES
Table 10 | Summary of all indicators with regard to validity, reliability and feasibility sorted by time frame, i.e., short-term, medium-term and long-term, as well as components of affective states ("category"), i.e., behaviour, physiology, and cognition.
Firstly, we return to our earlier discussion on the inherent problem of whether we have really identified indicators that can help us decide if an animal has positive welfare, given that welfare is a continuum and most studies compare animals in two situations. As we discussed earlier (sections 'What Does "Positive" Actually Mean?' and 'Indicators of Positive Welfare'), the interpretation is that the welfare of the animal is better in one situation, but we do not know that it is necessarily on the positively-valenced side of the continuum. Indicators of short-term emotion are potentially least problematic, especially those with strong construct validity, since if observed then the animal is presumably in a positively-valenced state, at least while the behaviour is being performed. However, they have the limitation that they are insufficient to help us decide if the animal is actually experiencing positive welfare at another time (when not performing the behaviour), since we do not know how the animal weighs these positive moments against the negative moments into its overall emotional balance (see Rault et al., 2020 for further discussion). Indicators of medium-term affective states are useful because the animal has already weighed these different moments into its overall mood, but it is in the nature of the currently validated indicators of mood that they are relative. We do not know if the animal is experiencing less negative or more positive welfare. We presumably increase the likelihood of correctly identifying a positive emotional balance, and so correctly locating the animal on the positively-valenced side of the continuum, if we have evidence for a large proportion of time, or a high frequency, of performing short-term indicators of positive emotion. We would also increase the likelihood if we had some indication of the extent of the contribution of negative emotion to this overall balance. Any prototype protocol developed from our list would only exclude those negative affective states that have been associated with our selected positive indicators, not any other negative affective states.
Thus, our first lesson learnt is that outwith a "perfect world," a positive welfare protocol is probably insufficient to detect positive welfare reliably even if indicators of positive welfare are observed. It seems necessary in the "real world" to exclude the possibility that negative welfare may somehow outweigh the presence of positive welfare. In practice this means that any future positive welfare protocol might only be valid on a farm where there is no, or very little, evidence that animals are experiencing negative welfare. Only when this is done can we conclude that welfare is actually positive.
A second lesson learnt relates to the potential use of a future positive welfare protocol for cattle based on our list of potential candidate indicators. The sensitivity of a measure is the ability to correctly identify the presence of the condition of interest (true positives) and the specificity is the ability to correctly identify the absence of the condition of interest (true negatives). It seems that we may be moving towards a protocol that could be quite sensitive (which is good), but with low specificity (unless combined with indicators of negative welfare). More worrying is the risk that it would be a protocol that is valid only during the actual data gathering, since the majority of measures are short-term indicators of positive welfare. While still useful to a farmer, this would limit its usefulness for assurance schemes as farms are visited only at intervals.
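The definitions of sensitivity and specificity given above can be illustrated with a minimal sketch. The function names and the counts are hypothetical and chosen only to show a measure that is sensitive but not very specific; they are not data from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of animals truly in the condition that the measure flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of animals truly not in the condition that the measure clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (illustration only): 50 animals truly in a positive
# state and 50 animals that are not.
tp, fn = 45, 5    # positive-state animals: detected / missed
tn, fp = 20, 30   # other animals: correctly cleared / wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

With these illustrative counts the measure detects most animals that are in a positive state, but also flags many that are not, which is the pattern of high sensitivity and low specificity described in the text.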
Regarding future perspectives, our focus is on the paucity of experimental studies upon which to build a body of evidence for, or against, the validity of a particular indicator of positive welfare in dairy cattle. We have tried to make a distinction between evidence against the use of a particular indicator and a general lack of evidence for or against this indicator. However, it was disappointing that this critical review yielded so few positive indicators, and in fact none had we also required that they be currently feasible. For this reason, under each indicator in the section 'Indicators of Positive Welfare', we have noted where there are specific gaps and so where additional research is needed. Hopefully this mapping will make it easier for future research to be targeted towards filling in these gaps regarding validity, reliability and feasibility. In particular, we note a lack of construct and convergent validity for many of the indicators.
Admittedly, the focus on affective states adds a level of complication compared to developing clinical indicators (injury scores etc.). The fact that negative emotions are likely to be context specific whereas positive emotions are likely to be more general, as discussed earlier, probably further increases the challenge when identifying indicators of positive welfare. Nevertheless, we suggest useful insights could be gained by evaluating the process by which indicators of negative affective states such as pain and fear were developed. They are still being further developed, but of interest is how the different indicators of these states reached the stage where they could be used in everyday practice. One could even pose the question of whether a "best practices guide" for developing indicators of animal welfare would be useful in helping researchers have a more systematic approach as a scientific community.
In the introduction, we referred to the increasing number of papers that discuss positive emotions and positive welfare, and stated our aim in this paper of taking the process one step further by evaluating possible indicators with a view to proposing a list of potential candidates that could be included in a future protocol. We draw the conclusion that feasible indicators of short-term emotion seem, for various reasons, to have been easier to develop and validate. We acknowledge the work to increase the feasibility of medium-term indicators of mood, and the work on telomere length or hippocampal biomarkers recorded at slaughter shows promise for long-term indicators in the more distant future. But the lack of such longer-term positive indicators is the main stumbling block if we want to move towards a positive welfare protocol for cattle within a reasonable time frame.
This lack of longer-term indicators leads us to make two suggestions. We tentatively propose a calculation as an interim long-term indicator, derived from the "experience sampling" method used in research on human subjective well-being (e.g., Diener, 2000). The idea involves taking a snapshot of the positive welfare of individuals at regular intervals, and then inferring from those snapshots the positive experiences over a longer time period, up to the whole lifetime of the animal. This idea would need to be further developed and investigated in longitudinal studies, but it could give information until animal-based measures of the experiences of the animal over longer periods of time become available. Clearly, however, it would be best to leave it to the individual animal to integrate short and medium-term affective states. For this reason, our final suggestion to support the move towards assessing positive welfare in commercial practice is to focus on the strategic benefit of directing more research energy into the area of medium and long-term indicators.
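As a purely illustrative sketch of the interim "experience sampling" calculation suggested above, repeated snapshots could be aggregated as follows. The text does not specify the calculation, so everything here is an assumption: the function name, the scoring of each snapshot as a value between 0 and 1 (for example, the proportion of observation time in which positive indicators were seen), and the use of an unweighted mean.

```python
from statistics import mean


def interim_long_term_score(snapshots: list[float]) -> float:
    """Unweighted mean of periodic positive-welfare snapshot scores.

    Assumption: each snapshot is a score in [0, 1] recorded for one animal
    at one visit. The unweighted mean is one possible aggregation, not a
    validated method.
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    if any(not 0.0 <= s <= 1.0 for s in snapshots):
        raise ValueError("snapshot scores must lie in [0, 1]")
    return mean(snapshots)


# Hypothetical snapshot scores for one animal across four visits.
visits = [0.2, 0.4, 0.6, 0.4]
print(f"interim long-term score = {interim_long_term_score(visits):.2f}")
```

An unweighted mean treats every visit equally. Whether visits should be weighted, for example by the interval between them, and whether negative experiences should enter the calculation, are the kinds of questions that the longitudinal studies mentioned above would have to address.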