- 1Institute of Epidemiology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Greifswald-Insel Riems, Germany
- 2Animal Behaviour and Welfare Group, Research Institute for Farm Animal Biology (FBN), Dummerstorf, Germany
Good health and welfare are of great importance when rearing calves. Stress can negatively impact health. Changes in a calf’s behavior may indicate stress. To recognize these changes at early stages, continuous monitoring is required. The aim of this study was to evaluate a model of rapid video analysis of calf behavior. The focus was on the ‘standing’ and ‘lying’ behavior, time spent at the feeder or drinker, and the localization of the area in the pen where the animals stayed, including body orientation. The one-stage detection model YOLOv8-Pose was utilized for this objective. It combines object detection and keypoint estimation, and it was adapted to train a new model. Additionally, further programming work was done to determine the duration of each individual event of the various behaviors in the video recordings. The start and end times of the duration were compared with the manually coded observations of the same videos. The trained YOLOv8-pose model showed a high accuracy. It detected and differentiated the postures ‘standing’, ‘lying_prone’ and ‘lying_lateral’ with a mean average precision (mAP) of 96.78%. The mAP was 97.16% in the case of consumption. The mAP for detecting the calves’ location and orientation was 92.38% and 92.66%, respectively. With a recognition time of 5.6 ms per frame, it can be used for real-time monitoring. This model enables simultaneous monitoring of multiple observations and provides comprehensive behavior monitoring. Consequently, it offers a valuable basis for early warning systems and makes a decisive contribution to forward-looking health management in calf rearing.
1 Introduction
Ensuring animal health and welfare is of great importance for successful calf rearing. Calf welfare is a dynamic concept since physiological processes such as growth, hormonal and immune responses can fluctuate (Curtis, 1987; Stull and Reynolds, 2008). In practice, environmental factors such as temperature, humidity and nutritional programs as well as social and behavioral interactions with other calves and staff also have an impact (Jensen and Larsen, 2014; Jurkovich et al., 2024). Depending on the intensity and interaction of the individual factors, this can lead to stress. Stress in calves refers to the physiological or behavioral reactions that occur when calves are exposed to adverse or challenging conditions (van Reenen et al., 2005). This can result in a weakened immune system. The consequences are reduced resistance and increased susceptibility to infectious diseases, coupled with reduced growth rates and impaired welfare in young calves (Stull and Reynolds, 2008; Verdon, 2021). In addition, diseases can have long-term effects on the development and performance of calves (van der Fels-Klerx et al., 2002; Brickell et al., 2009; Duthie et al., 2021). Reducing or avoiding stressors is therefore an important part of disease prevention (Herbut et al., 2021). When combined with the early diagnosis of sick calves, appropriate treatment can be provided (Nikkhah and Alimirzaei, 2023; Duthie et al., 2021). This can reduce negative effects and the risk of infecting other calves and keep calves healthy (Duthie et al., 2021; Hart, 1988; Millman, 2007). Further, external influences such as noise, odors, and group composition can also be stressors, leading to stress reactions. These can manifest as evasive maneuvers, turning away, or jerky movements (Mandel et al., 2016). In all cases, calves affected by stress show changes in their behavior.
Basic calf behavior includes standing, lying, drinking, feeding, ruminating, activity and playing. The proportions of each behavior make up the behavioral budget. The behavioral budget of calves can provide information about their health and welfare (Hänninen et al., 2005; Tapkı et al., 2006; Stull and Reynolds, 2008). Proportional changes in individual behaviors from the typical behavioral budget may indicate reduced welfare or illness (Bowen et al., 2021). The first signs often appear before clinical symptoms (Duthie et al., 2021; Hart, 1988; Millman, 2007). These can be indicated by reduced activity (Stull and Reynolds, 2008; Hanzlicek et al., 2010). Another indicator is standing and lying behavior: illness typically results in fewer but longer lying periods (Swartz et al., 2016; Duthie et al., 2021). Stull and Reynolds (2008) also observed an increase in lying in lateral position. A further change can manifest itself in drinking and feeding behavior. Indications are reduced intake, a lower intake rate, and fewer visits to the feeding or drinking place (Duthie et al., 2021; Sutherland et al., 2018; Knauer et al., 2017). These changes can be identified early through daily observations and may indicate illness or welfare problems (Sharma and Koundal, 2018; Dittrich et al., 2019).
Given the importance of detecting early behavioral changes, the continuous and reliable observation of calves becomes essential. Tracking and motion analysis can be useful for quantifying such stress responses (Cangar et al., 2008). This requires continuous monitoring of the calves to identify any changes in behavior. Such monitoring is very time-consuming for the staff and practically impossible (Knauer et al., 2017; Sutherland et al., 2018; McDonagh et al., 2021). One option is automated video monitoring with the use of computer vision. It can be used 24/7 and in real time, is non-invasive and requires little staff time (Knauer et al., 2017; Sutherland et al., 2018). Such automated behavior monitoring can be used as an advantageous early warning system for management to identify changes in the health and welfare status of calves (Kovács et al., 2018).
In this context, various approaches to animal monitoring using object detection have been pursued. Most of them include posture recognition [e.g. cows: Bai et al. (2024), pigs: Nasirahmadi et al. (2019), chickens: Wang et al. (2020), red foxes: Schütz et al. (2021; 2022)]. In another study (Yuan et al., 2025), drinking and feeding were added to the behavioral traits using YOLOv8. Further studies have focused on identifying cows based on their coat patterns (Shen et al., 2020), faces (Dac et al., 2022), or muzzles (Kaur et al., 2022). The standing and lying behavior of individual heifers was recognized by Jahn et al. (2025) through the combination of two trained YOLOv4 models, one to detect the posture and one to detect the individual heifer.
Another approach to animal monitoring that has been receiving attention in recent years is keypoint estimation (Gong et al., 2022). Individual body regions are localized. This facilitates the precise determination of body posture and movement sequences. Farahnakian et al. (2024) used keypoint estimation to monitor the birthing process in pigs, with the intention of improving piglet survival and sow welfare. In the context of cattle, keypoint estimation is often used for lameness detection based on walking patterns (Russello et al., 2022; Duan et al., 2025; Jia et al., 2025). Li et al. (2024) used YOLOv8-pose to analyze rumination movements in cattle. YOLOv8-pose is a highly accurate detection model with a convolutional neural network architecture that combines object detection and keypoint estimation. It is also fast and can be used in real time (Cai et al., 2025; Xu et al., 2025).
The aim of the study was to develop a model that enables rapid video analysis of calf behavior in an experimental setting. In addition to posture and time spent at the feeder or drinker, the position of the calves in the pen, including their orientation, was determined. The respective duration of each behavior event was also recorded. This should enable changes in the behavioral budget to be detected more quickly. For this purpose, a YOLOv8-pose network was trained, as it combines object detection and keypoint estimation. We expected a mean average precision (mAP) of over 90% for the entire model to reliably recognize all relevant events. Object detection was used to determine standing, lying and consumption behavior. Additionally, the location of the calves and their body orientation were determined using specific keypoints.
2 Materials and methods
2.1 Experimental setup and execution
The video material employed was recorded during the animal experiment ‘UltraRind’ (authorized by the local authority in Mecklenburg-Western Pomerania (Landesamt für Landwirtschaft, Lebensmittelsicherheit und Fischerei (LALLF) Mecklenburg-Vorpommern), # 7221.3-1-023/22) at the Research Institute for Farm Animal Biology (FBN Dummerstorf, Germany). The experiment was conducted to determine whether calves can perceive ultrasonic sounds produced by farming equipment and whether these sounds are aversive to them. The experiment was conducted in several trials. Each trial lasted 7 weeks, with three experimental weeks (Monday to Friday) separated by two-week intervals. For each experimental week, one pair of calves was individually accommodated in the calf arena. The calf arena measured approximately 10 m in length and 5 m in width, separated length-wise by a fence that allowed for visual, auditory, olfactory and restricted tactile contact between the two calves, but stopped them from interfering with each other’s movements (Figure 1). The experiment was recorded using two AXIS M1135 cameras (Axis Communications, Lund, Sweden) placed centrally at the opposing short sides of the arena. The cameras were installed at a height of 2.65 m with an inclination angle of 39°. Compared with a vertical, top-down perspective, this configuration provided visibility of keypoints on the back line and extremities, even when the calves were located at the far end of the pen. Mondays were dedicated to habituation and were not recorded. On the remaining days, farm equipment noise (full frequency range, audible range and ultrasound range in pseudo-randomized order) or silence (control) was played back (data not presented here) and video data were recorded before, during and after the playbacks.
Figure 1. Construction of the test arena including the feeding and drinking stations and the camera positions. Colored areas show the blind spots of the cameras (blue: camera 1, yellow: camera 2). (Created in BioRender. Jahn (2025) https://BioRender.com/jm4jzus).
2.2 Calves
For this study, a subset of data from 5 of the experimental trials was used, i.e. comprising video data from 10 calves. Calves were between 98 and 138 days old at the beginning of each trial. The pairs of calves remained constant and were continuously accommodated within the same pen, even between experimental weeks.
2.3 Video and image data
Three videos of 02 h 25 min each were recorded per day, from 8:00-10:25, 10:45-13:10 and 13:30-15:55. In total, the video material used comprised 318 individual videos, with a total data volume of 898.7 GB. Each video was recorded at 1920 pixels (width) x 1080 pixels (height) (Full-HD) with a frame rate of 30 frames per second. A varying number of single frames was extracted from randomly selected videos to create an image set for model training. The objective was to achieve a high degree of variability, thereby ensuring the inclusion of a wide range of scenarios within the image set.
2.4 Image preparation/image set
The open source data labeling platform Label Studio (HumanSignal, Inc. San Francisco, California, USA) was used for labeling. Each image was annotated with bounding boxes delineating the three classes ‘standing’, ‘lying_prone’, ‘lying_lateral’ in the posture category. In addition to these, the two classes ‘drinking’ and ‘feeding’ in the consumption category were distinguished. It was not always possible to clearly see in the video whether the subjects were actually ‘drinking’ or ‘feeding’; hence, we used ‘muzzle in the drinker’ and ‘head through feeding fence’ as proxy measures. For ease of reading, the behaviors will be called ‘drinking’ and ‘feeding’ throughout the manuscript. The use of these proxy indicators may result in a certain risk of false positive (FP) classifications in direct behavioral analyses. For this study, the definitions were used to compare two methods. The definitions of the five classes are shown in Table 1.
Table 1. Definitions of the individual classes for the categories posture and consumption in the YOLOv8-pose model.
The entire image set comprised 2,223 labeled images. The dataset was partitioned such that 80% constituted the training set and 20% the validation set. Table 2 presents the distribution of the individual classes in the training set and the validation set. At least two labels were annotated per image, since both calves were visible in each one. Calves that were ‘feeding’ or ‘drinking’ were given an additional consumption label.
Table 2. Splitting of the image set into a training set and a validation set in the ratio of 80% to 20%.
No class balancing, loss weighting or class-specific sampling was applied during training. YOLOv8-pose uses standard data augmentations, such as horizontal flips, small rotations and translations, and brightness and color adjustments, to improve model generalization; however, these augmentations do not correct for class imbalance. In this dataset, ‘standing’ occurred more frequently because calves typically stand while ‘drinking’ or ‘feeding’. ‘Lying_lateral’ and ‘drinking’ did not occur as frequently as the other classes in the videos, making it challenging to select images with high variability for these two classes. Consequently, the observed class imbalance should be considered when interpreting model performance.
The location of each calf per image was determined by drawing a bounding box rectangle. Its body posture was classified according to the classes ‘standing’, ‘lying_prone’ or ‘lying_lateral’. Furthermore, different body parts were labeled with keypoints according to Figure 2 if they were visible in the image; a maximum of 30 keypoints was possible. The keypoints were assigned to the corresponding posture bounding box. If a calf was observed to be either ‘drinking’ or ‘feeding’, an additional bounding box was drawn and labeled with the corresponding class of the consumption category. The bounding boxes of the consumption category did not contain keypoints in order to avoid duplications.
Figure 2. Arrangement of keypoints on the calf. Keypoints that are hidden due to their position have dashed outlines. (Created in BioRender. https://BioRender.com/jm4jzus).
The distribution of the keypoints across the dataset is illustrated in Supplementary Table 1. Furthermore, the table lists the percentage of labeled keypoints relative to the possible keypoints in the image set. The distribution was imbalanced, with some keypoints rarely annotated. To address this, only seven keypoints relevant to this study were considered in subsequent analyses. Specifically, the front hooves were used to determine the calves’ location within the pen, and the keypoints ‘withers’, ‘back1’, ‘back2’, ‘back3’ and ‘tail_base’ were used to determine the orientation in the pen. Model accuracy was recalculated based on this subset to minimize the impact of the imbalanced keypoint distribution on performance.
2.5 Model training and evaluation of the model performance
In this study, the neural network YOLOv8s-pose (Version: YOLOv8.0.196, Ultralytics Inc., Frederick, MD, USA) was used to train the model. YOLOv8-pose is a one-stage algorithm based on the YOLOv8 object detection model that integrates keypoint detection functionality (Wang et al., 2024). The architecture consists of a backbone, a neck and a head and does not require a separate region proposal step. The algorithm therefore operates in a single forward pass in accordance with the one-stage principle, enabling its use for real-time analyses (Li et al., 2024). Object detection and keypoint estimation are performed in one network pass directly from the image (Jocher et al., 2023). The small model (YOLOv8s-pose) was selected because it offers a favorable balance between accuracy and computational efficiency: the nano model (YOLOv8n-pose) is generally less robust in feature extraction and pose estimation, while larger YOLOv8-pose models require substantially more computing resources without providing clear advantages for this application (Ultralytics, 2023). The parameters of the model training are shown in Table 3.
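To illustrate the workflow described above, the following minimal sketch shows how a YOLOv8s-pose model can be trained and validated with the Ultralytics Python API. The dataset configuration file ‘calf_pose.yaml’ and the hyperparameter values are placeholders, not the actual settings used in this study (those are listed in Table 3).

```python
from ultralytics import YOLO

# Load the pretrained small pose model as starting weights
model = YOLO("yolov8s-pose.pt")

# Train on a custom dataset; 'calf_pose.yaml' is a hypothetical dataset
# configuration listing the five classes and the 30 keypoints.
# epochs, imgsz and batch are placeholders, not the values from Table 3.
model.train(data="calf_pose.yaml", epochs=100, imgsz=1280, batch=16, device=0)

# Validate the trained model to obtain mAP for bounding boxes and keypoints
metrics = model.val()
```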
The evaluation of the model performance is separated into bounding boxes and keypoints. The following metrics were used to select the best model.
2.5.1 Bounding boxes
The Intersection over Union (IoU) is the quotient of the area of overlap between the predicted and the ground truth bounding box and their union area (Equation 1). The resulting ratio determines the level of recognition of an object by the model (Everingham et al., 2010).
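The standard form of this ratio, referenced above as Equation 1, can be written as follows; this is a reconstruction of the conventional definition, as the typeset equation is not reproduced here:

$$\mathrm{IoU} = \frac{\operatorname{Area}(B_{pred} \cap B_{gt})}{\operatorname{Area}(B_{pred} \cup B_{gt})} \tag{1}$$

where $B_{pred}$ denotes the predicted bounding box and $B_{gt}$ the ground truth bounding box.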
The threshold value of 0.5 was chosen, as it is a widely used standard that balances sufficient overlap with tolerance for natural variation in animal posture and annotation (Scaillierez et al., 2024; Cai et al., 2025; Widyadara and Mulya, 2025). If IoU ≥ 0.5, it was a true positive (TP) prediction. It was a FP prediction if IoU < 0.5. If IoU = 0, there was no overlap and the prediction was considered false negative (FN) (Chen et al., 2022). The accuracy of the YOLOv8-pose model’s object detection was evaluated using TP, FP and FN.
Precision and recall were calculated to evaluate the accuracy of the YOLOv8-pose model. The precision (Equation 2) indicates how often the model correctly recognizes an object. It is described by the ratio of TP predictions to the sum of TP and FP predictions (Gong et al., 2022).
Recall (Equation 3) determines whether the model has identified every object to be recognized. It is calculated from the quotient of TP and the sum of TP and FN predictions (Chen et al., 2022). The sum of TP and FN corresponds to the number of labeled objects in the validation set.
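Written out, the two metrics referenced as Equations 2 and 3 take the standard form (reconstructed from the definitions above):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$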
The precision-recall curve is calculated for each class. The area under this curve corresponds to the average precision (AP) and summarizes the precision recall performance for each class of the model (Equation 4).
In this study, the model has more than one class. To evaluate it, the mAP was used: the average value of the APs of all classes was calculated (Equation 5) (Chen et al., 2022).
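The standard formulations corresponding to Equations 4 and 5, reconstructed from the definitions above, are:

$$AP = \int_{0}^{1} p(r)\, dr \tag{4}$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{5}$$

where $p(r)$ denotes precision as a function of recall, $AP_i$ the average precision of class $i$, and $N$ the number of classes.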
2.5.2 Keypoints
Object keypoint similarity (OKS) is a metric for evaluating the keypoint estimation. The idea of IoU loss is extended from bounding boxes to keypoints and is calculated between ground truth and predicted keypoints (Gong et al., 2022; Li et al., 2024). The OKS is calculated separately for each keypoint and added up to obtain the final OKS (Gong et al., 2022; Maji et al., 2022; Su et al., 2024) (Equation 6).
with:
d_n: Euclidean distance between the predicted and the ground truth keypoint, where n is the ID of the keypoint.
s_m: surface area of the target bounding box, where m is the ID of the bounding box.
k_n: specific weight of the nth keypoint.
v_n: visibility flag for each keypoint.
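A commonly used form of the OKS that is consistent with the variables listed above is given below; this is a reconstruction of Equation 6 under the assumption of the COCO-style definition, in which the object scale enters via the bounding box area s_m:

$$\mathrm{OKS}_m = \frac{\sum_{n} \exp\!\left(-\frac{d_n^{2}}{2\, s_m\, k_n^{2}}\right) \delta(v_n > 0)}{\sum_{n} \delta(v_n > 0)} \tag{6}$$

where $\delta(v_n > 0)$ restricts the sum to keypoints that are labeled as visible.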
In addition, the AP50 is also considered for the complete images when the OKS threshold is 0.5 (Equation 7). A keypoint is recognized as correct if the OKS value is at least 0.5.
The mAP50 is the mAP for the keypoints with a threshold of 0.5 and is calculated with Equation 8.
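Assuming that the keypoint AP50 is computed analogously to the bounding box AP, Equations 7 and 8 can be written as:

$$AP50 = \int_{0}^{1} p_{\mathrm{OKS} \ge 0.5}(r)\, dr \tag{7}$$

$$mAP50 = \frac{1}{N} \sum_{i=1}^{N} AP50_i \tag{8}$$

where $p_{\mathrm{OKS} \ge 0.5}(r)$ is the precision-recall curve obtained when a predicted keypoint counts as correct only if its OKS is at least 0.5, and $N$ is the number of keypoint classes.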
The YOLOv8-pose model was trained and validated on a Linux Red Hat 8.0 operating system with an AMD EPYC 74F3 24-Core Processor with a 3.2 GHz CPU base-clock, 528 GB RAM and 4 NVIDIA A100-SXM4-80GB with 320 GB video RAM in total (Nvidia, Santa Clara, CA, USA). The algorithm was developed using Jupyter notebook (version:7.3.3) (Kluyver et al., 2016) and Python 3.11.5 (van Rossum and Drake, 2014).
2.6 Validation of the trained YOLOv8-pose model
2.6.1 Manual method
To obtain gold-standard data for validating the YOLOv8-pose model, employees from the FBN were invited to watch the videos in their entirety. All instances of the defined observations, namely ‘standing’, ‘lying_prone’, ‘lying_lateral’, ‘drinking’ and ‘feeding’, the area as a place of stay (‘area1’, ‘area2’, ‘area3’, ‘area4’ and ‘area5’) and the orientation in the pen (‘sound source’, ‘observation room’, ‘against sound source’ and ‘open field’), were documented per calf for each video using the software ‘The Observer® XT’ (Noldus Information Technology BV, Wageningen, the Netherlands). The resulting observations and their corresponding durations were then used for the validation process.
2.6.2 Automated method
The trained YOLOv8-pose model was applied to the videos of two days (three videos per day). For each video run, the YOLOv8-pose model calculated the coordinates and confidences of the bounding boxes (class, x_min, y_min, x_max, y_max, conf_bb) and of the 30 keypoints (x_kp, y_kp, conf_kp) for each frame. In this study, every fifth frame (six frames per second) was considered. The duration of 02:25:00 resulted in approximately 52,200 analyzed frames per video. The bounding box and keypoint coordinates determined for both recorded videos of a scenario were combined through further programming steps. This made it possible to avoid non-detections due to blind spots or obstructions.
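A minimal sketch of this extraction step is shown below, using the Ultralytics inference API and OpenCV. The weight and video file names are hypothetical placeholders; only the general procedure (analyzing every fifth frame and collecting box and keypoint coordinates with their confidences) follows the description above.

```python
import cv2
from ultralytics import YOLO

# Hypothetical paths; the trained weights and video file name are placeholders.
model = YOLO("calf_pose_best.pt")
cap = cv2.VideoCapture("camera1_0800-1025.mp4")

frame_idx, records = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 5 == 0:  # every fifth frame, i.e. six analyzed frames per second
        result = model(frame, verbose=False)[0]
        if len(result.boxes) > 0:
            for cls, xyxy, conf_bb, kp_xy, kp_conf in zip(
                result.boxes.cls.tolist(),       # class index of each bounding box
                result.boxes.xyxy.tolist(),      # x_min, y_min, x_max, y_max
                result.boxes.conf.tolist(),      # bounding box confidence
                result.keypoints.xy.tolist(),    # 30 keypoints: (x_kp, y_kp)
                result.keypoints.conf.tolist(),  # keypoint confidences
            ):
                records.append((frame_idx, int(cls), xyxy, conf_bb, kp_xy, kp_conf))
    frame_idx += 1
cap.release()
```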
Data cleaning was performed for each analyzed frame. First, the calves were assigned an identifier based on their location within the calf arena. The calf in the pen next to the observation room was assigned the identifier ‘calf1’ and the other one the identifier ‘calf2’. Subsequently, the bounding boxes were allocated to the corresponding identifier. In instances where multiple bounding boxes were estimated for a given calf based on its posture or consumption, the one with the highest confidence in the corresponding category was selected.
The y-coordinates of the front hoof keypoints (‘hoof_front_left’, ‘hoof_front_right’) were used to determine the calves’ location. If both front hooves were in the same area, the assignment was considered definitive. Otherwise, the keypoint of the front hoof (left or right) with the higher confidence was selected to determine the area. The area limits were defined using the pixels of the image height (y-axis) (Supplementary Table 2). In instances where no front hoof was detected at the start of the lying position, the last determined area was assigned to the entire lying period.
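The following sketch illustrates this area assignment. The pixel boundaries of the five areas are placeholders; the actual limits are given in Supplementary Table 2.

```python
# Hypothetical y-pixel boundaries of the five areas (placeholders for the
# values in Supplementary Table 2).
AREA_LIMITS = [(0, 400, "area1"), (400, 550, "area2"), (550, 700, "area3"),
               (700, 850, "area4"), (850, 1080, "area5")]

def area_of(y: float) -> str:
    for y_min, y_max, name in AREA_LIMITS:
        if y_min <= y < y_max:
            return name
    return "unknown"

def assign_area(hoof_left, hoof_right):
    """Each hoof is (y_coordinate, confidence) or None if not detected."""
    hooves = [h for h in (hoof_left, hoof_right) if h is not None]
    if not hooves:
        return None                    # no front hoof detected in this frame
    areas = {area_of(y) for y, _ in hooves}
    if len(areas) == 1:                # both hooves (or the only one) agree
        return areas.pop()
    best_y, _ = max(hooves, key=lambda h: h[1])  # hoof with higher confidence
    return area_of(best_y)
```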
The keypoint coordinates of the calf’s back line were used to determine the orientation within the pen. To this end, the keypoints ‘withers’, ‘back1’, ‘back2’, ‘back3’ and ‘tail_base’ were used. An average line was calculated from these keypoints. The ‘withers’ constituted the initial point of the line, thereby defining the calf’s viewing direction. The line angle was used to determine the calf’s orientation within the pen. Figure 3 depicts the floor plan of the calf arena, along with the possible orientations and their corresponding boundary angles.
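As an illustration, the sketch below fits an average line through the back-line keypoints, orients it towards the ‘withers’, and maps the resulting angle to an orientation class. The boundary angles used here are hypothetical; the actual limits follow Figure 3 and the perspective transformation in Supplementary Table 3, and the ‘withers’ keypoint is assumed to be detected.

```python
import numpy as np

BACKLINE = ["withers", "back1", "back2", "back3", "tail_base"]

def backline_angle(keypoints: dict) -> float:
    """keypoints maps a keypoint name to an (x, y) image coordinate.
    Returns the angle (degrees) of the averaged back line, oriented from the
    tail towards the withers, i.e. in the calf's viewing direction."""
    pts = np.array([keypoints[n] for n in BACKLINE if n in keypoints], dtype=float)
    centered = pts - pts.mean(axis=0)
    # First principal direction of the back-line keypoints (least-squares line)
    direction = np.linalg.svd(centered, full_matrices=False)[2][0]
    # Flip the direction vector if it does not point towards the withers
    if np.dot(pts[0] - pts.mean(axis=0), direction) < 0:
        direction = -direction
    return float(np.degrees(np.arctan2(direction[1], direction[0])))

# Hypothetical boundary angles for the four orientation classes.
def orientation_class(angle_deg: float) -> str:
    if -45 <= angle_deg < 45:
        return "sound source"
    if 45 <= angle_deg < 135:
        return "observation room"
    if -135 <= angle_deg < -45:
        return "open field"
    return "against sound source"
```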
Figure 3. Schematic ground plan of the calf arena. The marking points (P) were used to transform the coordinates and angles into the image perspectives of the cameras. The red dotted lines delimit the orientation areas, indicating the boundary angles. The colored areas correspond to the respective orientation. The measuring points (A) were calculated using the transformation rules.
The coordinates were transformed according to the camera perspectives and the angles were recalculated (Supplementary Table 3).
The YOLOv8-pose model was applied to both videos per period because the monitoring was carried out by two cameras located opposite to each other. The observations were summarized for each analyzed frame and each calf. If both cameras recorded the same observation, it was adopted. In instances where the observations differed, the following procedure was implemented. The principle of highest confidence was once again applied for the posture category and the calves’ location. In the consumption category, an observation was only detected if both cameras captured it. In the case of orientation in the pen, both were adopted if the orientation was different.
Subsequently, the start time, the end time and the duration of each observation were determined. Observations in the posture or consumption category were only counted if they lasted at least 3 seconds. The minimum duration of an observation was set at two seconds for the calves’ location and one second for orientation. Shorter observations were assigned to the class of the preceding observation. A further condition was applied with regard to the area: the calves’ location does not change while the calf is lying down. Therefore, the first recorded location of a lying period was assigned for the entire duration.
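The event segmentation described above can be sketched as follows. The function converts the per-frame class labels (six analyzed frames per second) into events with start and end times and merges events shorter than the class-specific minimum duration into the preceding event; this is a simplified illustration, not the exact implementation used in the study.

```python
FPS_ANALYZED = 6  # every fifth frame of a 30 fps video

def frames_to_events(labels, min_duration_s):
    """labels: per-frame class labels (one entry per analyzed frame).
    Returns (label, start_s, end_s) events; events shorter than
    min_duration_s are absorbed into the preceding event."""
    events, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            events.append([labels[start], start / FPS_ANALYZED, i / FPS_ANALYZED])
            start = i
    merged = []
    for label, t0, t1 in events:
        if merged and (t1 - t0) < min_duration_s:
            merged[-1][2] = t1          # absorb the short event into the previous one
        else:
            merged.append([label, t0, t1])
    return [(lab, t0, t1) for lab, t0, t1 in merged]

# Example: posture events with the 3 s minimum duration used in this study
# posture_events = frames_to_events(posture_labels, min_duration_s=3)
```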
Finally, a comparative analysis was conducted between the manually and the automatically recorded observations. Based on the start and stop times of the manual and automated recording, the temporal overlap and the temporal difference of each observation were calculated. The temporal overlaps were the periods that were recognized identically by both methods. The temporal differences described the discrepancy in the detection of both methods. A negative difference indicates that the automated method detected less time than the manual method; a positive difference indicates that the automated method detected more time, possibly including false positives. The durations, overlaps and differences were then cumulated per day across both calves.
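A minimal sketch of this comparison is given below, computing the total temporal overlap and the signed difference between the two methods for one observation class; the interval representation is an assumption made for illustration.

```python
def overlap_seconds(manual, automated):
    """manual, automated: lists of (start_s, end_s) intervals for one
    observation class. Returns the total time recognized identically
    by both methods."""
    total = 0.0
    for m_start, m_end in manual:
        for a_start, a_end in automated:
            total += max(0.0, min(m_end, a_end) - max(m_start, a_start))
    return total

def duration(intervals):
    return sum(end - start for start, end in intervals)

# A negative difference means the automated method detected less time than
# the manual observation; a positive difference means it detected more.
# difference = duration(automated_intervals) - duration(manual_intervals)
```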
3 Results
3.1 Model training and evaluation of the model performance
The trained YOLOv8-pose model achieved a mAP of 96.37% for bounding box detection with an IoU of 91.88%. It was evaluated on a data set with five classes (‘standing’, ‘lying_prone’, ‘lying_lateral’, ‘drinking’ and ‘feeding’). The class-specific AP values were 99.32% for ‘standing’, 98.03% for ‘lying_prone’, 97.26% for ‘lying_lateral’, 88.04% for ‘drinking’ and 99.22% for ‘feeding’. The posture classes (‘standing’, ‘lying_prone’ and ‘lying_lateral’) had an IoU of approximately 95%. For the consumption classes (‘drinking’ and ‘feeding’) the IoU was lower. ‘Drinking’ had the lowest precision at 89.13%. The precision of the other classes was over 97.00%. The recall was 96.49% for all classes. All results were obtained using the default IoU threshold of 0.5.
Overall, the YOLOv8-pose model achieved a mAP of 96.37%, a precision of 96.49% and a recall of 98.94% with regard to the bounding boxes (Table 4).
Using the OKS with a threshold value of 0.50, evaluation of the keypoint detection performance of the YOLOv8-pose model achieved an mAP50 of 67.38% for the entire model across all 30 keypoints. High mAP50 values were obtained for the back line from the horn_base to the tail_base, both ears, and both hips. Lower mAP50 values were obtained for the elbows, knees, and hooves, ranging between 72.4% and 79.3%. Very low AP50 values of less than 20.0% occurred at all four ankle_joints and both shoulders.
For this study, only the back line keypoints from the withers to the tail_base and the two front hooves were used. Reducing the model to these seven keypoints resulted in an mAP50 of 90.31% (Supplementary Table 4).
Figure 4 shows examples of the bounding box and keypoint detections in the analyzed videos.
Figure 4. Examples of the classes and keypoints detected when using the YOLOv8-pose model. Each class and keypoint is assigned a different color. (A–E) show the individual classes detected by the model. (A, D) ‘Drinking’ and ‘feeding’ are always detected in combination with ‘standing’.
The YOLOv8-pose model has an average recognition time of 5.60 ms per frame (min: 5.37 ms, max: 5.97 ms).
3.2 Validation of the trained YOLOv8-pose model
3.2.1 Posture
The overlap between the manual and automated method was 99.85% for the ‘standing’ posture on day1 and 99.80% on day2. For the lying posture, a differentiation was made between ‘lying_prone’ and ‘lying_lateral’. These postures showed differences on both days: the difference was negative for ‘lying_prone’ and positive for ‘lying_lateral’. The amounts were similar, almost balancing each other out in total. ‘Lying_prone’ was detected at 96.30% on day1 and 96.40% on day2. The overlap for ‘lying_lateral’ was 79.12% on day1 and 68.49% on day2.
Per day and calf, the YOLOv8-pose model was able to detect the posture for 07:14:35 of the daily video duration of 07:15:00. No posture was detected for 7 seconds. The overlap of both methods corresponded to 07:01:22 for posture. On average, 96.87% of the three postures could be detected by the YOLOv8-pose model according to the manual method (Table 5).
3.2.2 Consumption
Table 6 shows the results of consumption with the classes ‘drinking’ and ‘feeding’. On day1 and day2, the YOLOv8-pose model detected an overlap of 74.35% and 74.11% of the manually detected ‘drinking’ periods, respectively. For the ‘feeding’ periods, the overlap was 99.42% and 99.18% respectively. On average, the classes of consumption per day and calf were recognized by the YOLOv8-pose model in 97.16% of the manually recorded observations. In the case of ‘feeding’, the model recognized a 39 second longer ‘feeding’ duration.
3.2.3 Calves’ location
On day2, there were large differences between the two methods. ‘Area1’ was recognized more frequently by the YOLOv8-pose model. The difference between the manual and automated method was 01:47:15. ‘Area2’ was recognized more frequently by the manual method. In this case, the YOLOv8-pose model detected only 16.80% of the manual observations. The difference between the two methods was -01:47:22. In contrast, ‘area1’ was detected for a duration of 01:47:15. On average per day and calf, both methods recognized the same areas at 06:41:51. This corresponds to 92.38% of the total daily video duration of 07:15:00. The YOLOv8-pose model was unable to detect an area for 12 seconds (Table 7).
Figure 5 shows the same scenario from both perspectives. In the setting of camera 1 (A), the hooves of ‘calf2’ were not detected. In contrast, camera 2 (B) detected the hooves, which enabled the determination of the calves’ location.
Figure 5. The same scenario from opposite perspectives. The calves’ location is also assigned if only one camera detects the keypoints of the front hooves. (A) Camera 1 - Front hooves of ‘calf2’ are covered by its body and were not detected. (B) Camera 2 - Front hooves of both calves were detected.
Figure 6 shows a graphical comparison between manual observation and the YOLOv8-pose model for the categories posture, consumption and calves’ location. The manually and automatically determined observations of one calf from a video were compared here.
Figure 6. Graphical representation of the postures, consumption and stay in the areas of a calf during a complete video sequence.
3.2.4 Orientation in the pen
The orientation of the calves in their pens was measured, and an average of 92.66% of the manually recorded times per calf and day could be determined using the automated method. The orientation towards the ‘observation room’ exhibited the highest values on days 1 and 2, reaching 99.29% and 99.24%, respectively. On day 2, a negative difference of -00:25:16 was observed for the orientation towards the ‘sound source’. In this case, there was 68.69% agreement between the two methods, whereas on day1 it reached 92.10%. For the orientation toward the ‘open field’ and opposite the ‘sound source’, the automated method achieved over 90.00% on both days (Table 8).
Figure 7 shows the average line of the back line and the corresponding angle from both camera perspectives. The orientation was determined based on the angle. ‘Calf1’ was oriented towards the ‘open field’ and ‘calf2’ was facing ‘against sound source’.
Figure 7. The same scenario from opposite perspectives. The average back line and the corresponding orientation angle were calculated from the detected keypoints. (A) Camera 1. (B) Camera 2.
Across all observations, the comprehensive YOLOv8-pose model achieved a mAP of 94.77%.
4 Discussion
In this study, a YOLOv8-pose model was trained and validated to determine the ‘drinking’ and ‘feeding’ phases of calves, as well as their body posture. Additionally, it can determine their location and orientation within the pen. Despite its high accuracy, some limitations should be mentioned.
4.1 Data
Across all 30 keypoints, the model achieved an mAP50 of 67.38%. This suboptimal performance was primarily due to the inadequate APs of the ankles, shoulders, muzzle and eyes (Supplementary Table 4). In the present study only the following keypoints were considered for further analysis: ‘withers’, ‘back1’, ‘back2’, ‘back3’, ‘tail_base’, ‘hoof_front_left’, and ‘hoof_front_right’. Accordingly, the mAP50 was recalculated based on these seven keypoints, resulting in a substantially higher accuracy of 90.31% and demonstrating the model’s precision for this reduced configuration. Because several relevant joints – particularly shoulders and ankles – showed low detection accuracy in the full 30-keypoint configuration, the model is currently unsuitable for comprehensive motion analysis. In contrast, the reduced seven-keypoint configuration yielded robust detection performance, indicating that the underlying model architecture and dataset are fundamentally suitable for further optimization. The keypoint estimation by Gong et al. (2022) was based on a combination of YOLOv4 with its own keypoint extraction network for 16 keypoints. Their model, trained on 1,800 labeled images, achieved an accuracy of only 85% under daylight conditions. The keypoints of the legs achieved an AP50 of over 90%, while the AP50 of the other keypoints ranged between 51.5% and 83%. Comparing these results with the present study indicates that YOLOv8-pose achieves higher accuracy for the keypoints relevant here, although both models show considerable variation in the AP50 of the individual keypoints. An alternative approach was adopted by Peng et al. (2024), who trained a YOLOv8-pose model using seven keypoints and achieved a total accuracy of 92.4%. This is likely attributable to the use of a larger image set (2,985 images) combined with a reduced number of keypoints. In addition, the study by Peng et al. (2024) exclusively focused on the side view of cows, whereas the calves in the present study were recorded at an array of angles and at varying distances from the camera. These findings suggest that the image set utilized in this study should be expanded and that future work should prioritize annotations for the underperforming keypoints, thereby enabling accurate detection of all 30 keypoints and supporting comprehensive locomotion assessment. When selecting additional images, priority should be given to keypoints with a low AP50. The maximum number of labels that the annotators could assign to each keypoint in the image set used in this study was 4,446. It is noteworthy that keypoints detected with a high degree of accuracy received at least 70% of this potential number of labels, corresponding to approximately 3,100 labels per keypoint. This should be considered when expanding the image set for further investigations.
It should be noted that an assessment of inter-annotator reliability was not possible for the present study. Video-based behavioral observations and image annotations for model training were carried out independently by different annotators. However, temporally aligning these data proved impossible due to the absence of timestamps in the camera 2 screenshots. Consequently, calculating metrics such as Cohen’s kappa was infeasible. Additionally, the operational definitions for ‘drinking’ and ‘feeding’ were based on visible proxies, which may have introduced ambiguity. Future studies should implement synchronized annotation workflows and conduct systematic reliability assessments to ensure maximum consistency and objectivity.
4.2 Model
The trained YOLOv8-pose model demonstrates a high level of accuracy in object detection, as evidenced by a mAP of 96.37% across the five classes ‘standing’, ‘lying_prone’, ‘lying_lateral’, ‘drinking’ and ‘feeding’. In comparison, it achieved a higher accuracy than the models in the studies conducted by McDonagh et al. (2021) and Yuan et al. (2025). McDonagh et al. (2021) used a ResNet50 architecture for object detection to recognize seven classes (accuracies: ‘standing’ = 84%, ‘walking’ = 80%, ‘shuffle’ = 80%, ‘contractions’ = 83%, ‘lying’ = 90%, ‘drinking’ = 93%, ‘feeding’ = 95%). The lower accuracies were attributed to potential sources of error, including unfavorable lighting conditions and obstruction by other animals. In the study conducted by Yuan et al. (2025), a convolutional neural network with a reduced number of layers from the YOLOv8 family was employed. Their model also included five classes (APs: ‘standing’ = 94.0%, ‘walking’ = 97.7%, ‘lying’ = 79.0%, ‘drinking’ = 78.7%, ‘feeding’ = 91.5%). The accuracy of nighttime images was reported to be lower (although it should be noted that such images were not used in the present study).
Based on this overall high performance in object detection, the model also achieved high accuracy in recognizing different body postures. Nevertheless, a minor divergence of approximately eight minutes (3%) was observed between the manual and automated methods for lying postures during validation. This bias may have been introduced during video observation and subsequent image labeling by different annotators. Despite the definitions of ‘lying_prone’ and ‘lying_lateral’, these postures cannot always be determined unequivocally, and the observed discrepancy may reflect subjective misinterpretation. In the absence of any differentiation between the lying positions, the discrepancy would be reduced to only two seconds per calf, based on the average lying time during the daily observation period. In that case, the level of agreement between the manual and the automated method for the lying position corresponds to 95.68%. Van Erp-van der Kooij et al. (2019) reported on the difficulties and sources of error in interpreting lying positions. Short-term occlusions or subtle movements can further hinder reliable classification. For this reason, annotators were thoroughly trained in advance to minimize misinterpretations. For the present study, the definition of ‘lying_lateral’ should have been formulated more clearly, for example by specifying that the head must also be laid down. In this case, the image set can be adjusted or expanded through the incorporation of additional subclasses for the purpose of retraining. Conversely, re-examining all videos would necessitate a substantially greater investment of time.
Expanding beyond postures, the model occasionally struggled to recognize the act of ‘drinking’. The experimental design incorporated a condition wherein both cameras were required to detect the calf as ‘drinking’. In instances where the calf in one of the two videos obscured the drinker, the animal was not identified as ‘drinking’. This limitation is reflected in the consistent underestimation of ‘drinking’ periods (accuracy of ~74%) and is relevant for practical applications, as reduced ‘drinking’ is an early indicator of dehydration or illness (Lowe et al., 2019). No alternative confidence thresholds were tested in this study. The pattern of underestimation suggests that the primary constraint is limited visibility of the drinker rather than the model’s decision threshold. Improving camera placement and expanding the training dataset with additional ‘drinking’ postures may therefore be necessary to enhance detection accuracy and ensure reliable early warning of health-related changes.
A discrepancy was also observed in the classification of ‘feeding’. The feeding fence had two potential feeding locations, but the trough was only accessible via one of the two openings. In the manual method, only one feeding place was evaluated (where the trough was placed). The YOLOv8-pose model did not differentiate between the feeding places.
The YOLOv8-pose model can also be used to locate calves in the pen. The arena was subdivided into five sections. The front hooves are in direct contact with the ground; consequently, the impact of image perspective is negligible, and the location could be determined directly from their y-coordinates. Other body parts, including but not limited to the muzzle, horn base, and withers, are located at a variable distance from the ground, so localization based on them would require additional calculations to take the influence of image perspective into account. Therefore, the front hooves of the calves were utilized to identify their location. A total of 92.38% of the calves’ locations were recorded identically with both methods. It is important to note that the calves’ location may not be recognized in cases where the calf is lying at the beginning of a video and its hooves are obscured from view. In such cases, no area can be allocated for the designated lying period. This issue also explains the very low accuracy for ‘area2’ on day2 (16.8%), where the YOLOv8-pose model incorrectly classified 01:47:15 of the actual duration of ‘area2’ as ‘area1’. The misclassification occurred for ‘calf2’ while it was lying down, when the front hooves – the primary reference for area assignment – were not visible. If lying down began near the boundary between two areas, the model assigned the last recognized area while the calf was standing to the entire lying phase. A solution to these issues could be to additionally consider the shoulder keypoints in the lying position; in an upright position, the shoulder is positioned over the hoof and is therefore an alternative reference for localization. However, this requires expanding the existing image set and retraining the YOLOv8-pose model. In individual cases, the location of lying calves was incorrectly determined, usually at the border of two areas.
The orientation of the calves’ back line was also determined. In this case as well, a high level of accuracy was achieved, with a percentage of 92.66%. It should be noted that this level of accuracy reflects agreement with manually assigned orientation classes rather than a continuous angular measurement, as the manual annotations do not include ground-truth angles. Consequently, a direct calculation of continuous angular error metrics (e.g., MAE) was not possible. Future studies including explicit angle annotations will enable a more detailed assessment of angular accuracy.
4.3 Experimental setup
Two cameras were utilized to observe each time period and recorded the same scenery from opposite directions. This approach circumvented the issues of blind spots and occlusions that were previously observed by Jahn et al. (2025), with the exception of ‘drinking’.
In addition to the model limitations regarding localization, the experimental design may also be a reason for incorrect determinations of the calves’ location. In the manual method, the boundaries of these areas were determined by careful observation of the calves during video observation. For the automated method, the area boundaries were defined using the pixels on the y-axis of the image, and the respective y-coordinates of the hooves were utilized to delineate the area. Nevertheless, the determination of the calves’ location in this study was highly accurate. While the present study focused on a pixel-based approach using hoof coordinates, modern alternatives could further improve robustness. Techniques such as homography correction can compensate for camera angle and ground-plane distortion, enabling more consistent location mapping across the field of view. Likewise, multi-view camera calibration could be applied to fuse information from both cameras, create a common coordinate system, and thereby minimize perspective-related ambiguities. These methods offer potential avenues for enhancing localization performance in future work. In automated approaches, the location of cows and calves is determined either by sensor systems or by the use of computer vision algorithms. Bloch and Pastell (2020) used a sensor system that locates via Bluetooth Low Energy technology with an accuracy of up to 2 m. The ultrawide-band system from D’Urso et al. (2023) exhibited a 1 m deviation. Accuracies with a deviation of 1–2 m may be sufficient in large free-range barns. However, the use of cubicles or the determination of the distance between individual cows requires considerably higher precision. Such sensor systems would likewise lack the necessary precision for the calf arena (approximately 5 m x 10 m) used in the present study. Consequently, the pen was divided into five areas. Localization via the front hooves is point-based, and the delineation into areas illustrates the efficacy of localization strategies in this context. The use of a grid with quadrants is also a viable option; however, in this case, the x and y coordinates of the front hoof keypoints must be used for localization. The size of the areas or quadrants can be determined individually, but the model’s accuracy must be improved if smaller areas or quadrants are selected. In the context of computer vision, object detection is a predominant application in most cases, with its primary use being to identify the presence of objects. Previous studies (Mar et al., 2023; Yamamoto et al., 2025) used the center of the bounding boxes for localization. Mar et al. (2023) achieved a mAP of 97.1% for localization in their model. Yamamoto et al. (2025) achieved a mAP of approximately 90%. Nevertheless, Ponn et al. (2020) indicated that the center of the bounding box is not a fixed point on the object; changes in posture, movement, or orientation can cause it to shift, potentially leading to inaccuracies in localization. The present study’s findings are within the previously reported range, and the localization was performed accurately by focusing on the keypoints of the front hooves.
Furthermore, the localization of the calves was extended by the orientation in the pen. This focus on orientation corresponds to the approach in earlier studies. In their studies on cows kept on pasture, Begall et al. (2008; 2011) and Hert et al. (2011) used satellite images from Google Earth to manually determine the cow’s back line. Begall et al. (2008; 2011) observed and estimated the orientation angles manually. Hert et al. (2011) used a computer program for this task. Both the detection of the back line and the determination of the orientation angle are now integrated in the Yolov8-pose model. Automatically determining back lines and orientation is a substantial methodological improvement over previous manual approaches. Unlike earlier studies, the present system can analyze large datasets objectively and continuously without observer bias. This enables a more detailed investigation of behavioral dynamics over time. The orientation can be used to evaluate the social behavior. The back lines can be used to determine the distance between the calves and the orientation of the calves in relation to each other. Relative orientation and distance are key indicators of affiliative behavior, avoidance, and social interactions within a group. These factors offer valuable insights into how animals interact with each other, how they position themselves, and how they form social preferences (Schlägel et al., 2019).
The present study exclusively examined two calves separated by a fence. If the model were applied to larger groups of calves, statements could also be made about social behavior, as in a social network. When scaled to larger groups, the model can automatically generate social network representations. This allows for the detection of social bonds, dominance relationships, subgroup structures, and changes in social dynamics over time. Such analyses are generally time-consuming and dependent on human observation. This information can be instrumental in the management of herds, particularly in the context of group formation. Furthermore, continuous orientation monitoring could support precision livestock management by identifying disruptions and social conflicts earlier than conventional observation methods (Parivendan et al., 2025). In addition, the orientation of the calves can provide information regarding unfavorable housing conditions, as the calves will turn away from sources of disturbance such as unpleasant wind currents or odors. Therefore, systematic patterns in orientation, such as the consistent avoidance of specific areas, may serve as early indicators of environmental problems. This allows for timely intervention to improve animal welfare.
5 Outlook
This study presents a trained YOLOv8-pose model that can be extended to include other behaviors. For this, the image set must be expanded and further labeling work carried out in advance. Potential gains in accuracy are particularly relevant for keypoints that are currently detected less reliably, such as the ankles and shoulders. The interaction of the various keypoints enables the determination of repetitive behavior patterns such as licking, scratching, and ruminating. Furthermore, detailed statements can be made about activity behavior. The utilization of keypoints facilitates the correlation of the activity with specific anatomical regions, such as the head or legs.
6 Conclusion
In the present study, a YOLOv8-pose model was trained with high accuracy. As it combines object detection and keypoint estimation, it can be used for several purposes simultaneously. In addition to posture, it determines drinking and feeding behavior. Calves can also be tracked by recording the area of stay and their orientation. In this way, a larger spectrum of behavior can be monitored. If changes in the calves’ behavior occur, it is possible to react immediately. This allows calves to be reared in a less stressful and healthier way.
Occlusions, lying positions near area boundaries, and keypoints with lower detection accuracy can reduce reliability, particularly for behaviors such as lying at the beginning of recordings or ‘drinking’. This study employed a dual-camera system and an offline analytical approach. The implementation of this method within the context of commercial farm environments would necessitate real-time processing and integration with existing farm management systems. Furthermore, the augmentation of the image dataset and the incorporation of additional keypoints (e.g., shoulders) would likely enhance the system’s robustness under diverse lighting conditions, camera angles, and group sizes. Subsequent studies should concentrate on enhancing the model’s development for real-time, farm-level applications, with the objective of fully leveraging automated behavior monitoring in commercial contexts.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The animal study was approved by Landesamt für Landwirtschaft, Lebensmittelsicherheit und Fischerei (LALLF) Mecklenburg-Vorpommern, # 7221.3-1-023/22. The study was conducted in accordance with the local legislation and institutional requirements.
Author contributions
SJ: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. SD: Project administration, Resources, Writing – review & editing. VR: Resources, Validation, Writing – review & editing. J-NJ: Resources, Validation, Writing – review & editing. SA: Software, Writing – review & editing. TH-B: Conceptualization, Methodology, Project administration, Resources, Supervision, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study is part of the project “Innovations for healthy and ‘happy’ cows” (IGG), which is supported by BMEL (28N-3-039-02 and 28N-3-039-01).
Acknowledgments
We thank Lena Pabel, Katrin Siebert, Evelin Normann for their contribution to the animal experiment. We also thank the EAR staff at the FBN for taking care of the calves and providing further support in organizing the experiments.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fanim.2025.1718641/full#supplementary-material
Abbreviations
AP, average precision; FN, false negative; FP, false positive; IoU, Intersection over Union; mAP, mean average precision; OKS, object keypoint similarity; TP, true positive.
References
Bai Q., Gao R., Li Q., Wang R., and Zhang H. (2024). Recognition of the behaviors of dairy cows by an improved YOLO. Intell. Robotics. 4, 1–19. doi: 10.20517/ir.2024.01
Begall S., Burda H., Červený J., Gerter O., Neef-Weisse J., and Němec P. (2011). Further support for the alignment of cattle along magnetic field lines: reply to Hert et al. J. Comp. Physiol. A. 197, 1127–1133. doi: 10.1007/s00359-011-0674-1
Begall S., Červený J., Neef J., Vojtěch Oldřich, and Burda H. (2008). Magnetic alignment in grazing and resting cattle and deer. Proc. Natl. Acad. Sci. U.S.A. 105, 13451–13455. doi: 10.1073/pnas.0803650105
Bloch V. and Pastell M. (2020). Monitoring of cow location in a barn by an open-source, low-cost, low-energy bluetooth tag system. Sensors. 20, 3481. doi: 10.3390/s20143841
Bowen J. M., Haskell M. J., Miller G. A., Mason C. S., Bell D. J., and Duthie C.-A. (2021). Early prediction of respiratory disease in preweaning dairy calves using feeding and activity behaviors. J. dairy Sci. 104, 12009–12018. doi: 10.3168/jds.2021-20373
Brickell J. S., McGowan M. M., Pfeiffer D. U., and Wathes D. C. (2009). Mortality in Holstein-Friesian calves and replacement heifers, in relation to body weight and IGF-I concentration, on 19 farms in England. Animal: an Int. J. Anim. biosci. 3, 1175–1182. doi: 10.1017/S175173110900456X
Cai S., Xu H., Cai W., Mo Y., and Wei L. (2025). A human pose estimation network based on YOLOv8 framework with efficient multi-scale receptive field and expanded feature pyramid network. Sci. Rep. 15, 15284. doi: 10.1038/s41598-025-00259-0
Cangar Ö., Leroy T., Guarino M., Vranken E., Fallon R., Lenehan J., et al. (2008). Automatic real-time monitoring of locomotion and posture behaviour of pregnant cows prior to calving using online image analysis. Comput. Electron. Agric. 64, 53–60. doi: 10.1016/j.compag.2008.05.014
Chen Z., Wu R., Lin Y., Li C., Chen S., Yuan Z., et al. (2022). Plant disease recognition model based on improved YOLOv5. Agronomy. 12, 365. doi: 10.3390/agronomy12020365
Curtis S. E. (1987). Animal well-being and animal care. The Veterinary clinics of North America. Food Anim. Pract. 3, 369–382. doi: 10.1016/s0749-0720(15)31158-0
D’Urso P. R., Arcidiacono C., Pastell M., and Cascone G. (2023). Assessment of a UWB real time location system for dairy cows’ Monitoring. Sensors. 23, 4873. doi: 10.3390/s23104873
Dac H. Ho, Viejo G., Claudia L., Nir T., Eden D., Frank R., et al. (2022). Livestock identification using deep learning for traceability. Sensors. 22, 8256. doi: 10.3390/s22218256
Dittrich I., Gertz M., and Krieter J. (2019). Alterations in sick dairy cows’ daily behavioural patterns. Heliyon. 5, e02902. doi: 10.1016/j.heliyon.2019.e02902
Duan W., Wang F., Li H., Liu N., and Fu X. (2025). Lameness detection in dairy cows from overhead view: high-precision keypoint localization and multi-feature fusion classification. Front. veterinary Sci. 12, 1675181. doi: 10.3389/fvets.2025.1675181
Duthie C.-A., Bowen J. M., Bell D. J., Miller G. A., Mason C., and Haskell M. J. (2021). Feeding behaviour and activity as early indicators of disease in pre-weaned dairy calves. Animal: an Int. J. Anim. biosci. 15, 100150. doi: 10.1016/j.animal.2020.100150
Everingham M., van Gool L., Williams C. K. I., Winn J., and Zisserman A. (2010). The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision. 88, 303–338. doi: 10.1007/s11263-009-0275-4
Farahnakian F., Farahnakian F., Björkman S., Bloch V., Pastell M., and Heikkonen J. (2024). Pose estimation of sow and piglets during free farrowing using deep learning. J. Agric. Food Res. 16, 101067. doi: 10.1016/j.jafr.2024.101067
Gong C., Zhang Y., Wei Y., Du X., Su L., and Weng Z. (2022). Multicow pose estimation based on keypoint extraction. PloS One. 17, e0269259. doi: 10.1371/journal.pone.0269259
Hänninen L., Passillé A. M., and Rushen J. (2005). The effect of flooring type and social grouping on the rest and growth of dairy calves. Appl. Anim. Behav. Sci. 91, 193–204. doi: 10.1016/j.applanim.2004.10.003
Hanzlicek G. A., White B. J., Mosier D., Renter D. G., and Anderson D. E. (2010). Serial evaluation of physiologic, pathological, and behavioral changes related to disease progression of experimentally induced Mannheimia haemolytica pneumonia in postweaned calves. Am. J. veterinary Res. 71, 359–369. doi: 10.2460/ajvr.71.3.359
Hart B. L. (1988). Biological basis of the behavior of sick animals. Neurosci. Biobehav. Rev. 12, 123–137. doi: 10.1016/s0149-7634(88)80004-6
Herbut P., Hoffmann G., Angrecka S., Godyń D., Vieira F. M. C., Adamczyk K., et al. (2021). The effects of heat stress on the behaviour of dairy cows – a review. Ann. Anim. Sci. 21, 385–402. doi: 10.2478/aoas-2020-0116
Hert J., Jelinek L., Pekarek L., and Pavlicek A. (2011). No alignment of cattle along geomagnetic field lines found. J. Comp. Physiol. A. 197, 677–682. doi: 10.1007/s00359-011-0628-7
Jahn S., Schmidt G., Bachmann L., Louton H., Homeier-Bachmann T., and Schütz A. K. (2025). Individual behavior tracking of heifers by using object detection algorithm YOLOv4. Front. Anim. Sci. 5, 1499253. doi: 10.3389/fanim.2024.1499253
Jensen M. B. and Larsen L. E. (2014). Effects of level of social contact on dairy calf behavior and health. J. dairy Sci. 97, 5035–5044. doi: 10.3168/jds.2013-7311
Jia Z., Zhao Y., Mu X., Liu D., Wang Z., Yao J., et al. (2025). Intelligent deep learning and keypoint tracking-based detection of lameness in dairy cows. Veterinary Sci. 12, 218. doi: 10.3390/vetsci12030218
Jocher G., Chaurasia A., and Qiu J. (2023). YOLOv8 pose models. Available online at: https://github.com/ultralytics/ultralytics/issues/1915 (Accessed July 18, 2025).
Jurkovich V., Bakony M., and Reiczigel J. (2024). A retrospective study of thermal events on the mortality rate of hutch-reared dairy calves. Front. veterinary Sci. 11, 1366254. doi: 10.3389/fvets.2024.1366254
Kaur A., Kumar M., and Jindal M. K. (2022). Cattle identification with muzzle pattern using computer vision technology: a critical review and prospective. Soft Computing. 26, 4771–4795. doi: 10.1007/s00500-022-06935-x
Kluyver T., Ragan-Kelley B., Pérez F., Granger B., Bussonnier M., Frederic J., et al. (2016). Jupyter Notebooks - a publishing format for reproducible computational workflows. (Amsterdam, Netherlands: IOS Press), 87–90.
Knauer W. A., Godden S. M., Dietrich A., and James R. E. (2017). The association between daily average feeding behaviors and morbidity in automatically fed group-housed preweaned dairy calves. J. dairy Sci. 100, 5642–5652. doi: 10.3168/jds.2016-12372
Kovács L., Kézér F. L., Bakony M., Jurkovich V., and Szenci O. (2018). Lying down frequency as a discomfort index in heat stressed Holstein bull calves. Sci. Rep. 8, 15065. doi: 10.1038/s41598-018-33451-6
Li J., Liu Y., Zheng W., Chen X., Ma Y., and Guo L. (2024). Monitoring cattle ruminating behavior based on an improved keypoint detection model. Animals: an Open Access J. MDPI. 14, 1791. doi: 10.3390/ani14121791
Lowe G. L., Sutherland M. A., Waas J. R., Schaefer A. L., Cox N. R., and Stewart M. (2019). Physiological and behavioral responses as indicators for early disease detection in dairy calves. J. dairy Sci. 102, 5389–5402. doi: 10.3168/jds.2018-15701
Maji D., Nagori S., Mathew M., and Poddar D. (2022). YOLO-pose: enhancing YOLO for multi person pose estimation using object keypoint similarity loss. Available online at: http://arxiv.org/pdf/2204.06806v1 (Accessed December 4, 2025).
Mandel R., Whay H. R., Klement E., and Nicol C. J. (2016). Invited review: Environmental enrichment of dairy cows and calves in indoor housing. J. dairy Sci. 99, 1695–1715. doi: 10.3168/jds.2015-9875
Mar C. C., Zin T. T., Tin P., Honkawa K., Kobayashi I., and Horii Y. (2023). Cow detection and tracking system utilizing multi-feature tracking algorithm. Sci. Rep. 13, 17423. doi: 10.1038/s41598-023-44669-4
McDonagh J., Tzimiropoulos G., Slinger K. R., Huggett Z. J., Down P. M., and Bell M. J. (2021). Detecting dairy cow behavior using vision technology. Agriculture. 11, 675. doi: 10.3390/agriculture11070675
Millman S. T. (2007). Sickness behaviour and its relevance to animal welfare assessment at the group level. Anim. Welfare. 16, 123–125. doi: 10.1017/S0962728600031146
Nasirahmadi A., Sturm B., Edwards S., Jeppsson K.-H., Olsson A.-C., Müller S., et al. (2019). Deep learning and machine vision approaches for posture detection of individual pigs. Sensors. 19, 3738. doi: 10.3390/s19173738
Nikkhah A. and Alimirzaei M. (2023). Understanding calf behavioral responses to environmental changes and challenges: an applied update. Farm Anim. Health Nutr. 2, 72–78. doi: 10.58803/fahn.v2i4.35
Parivendan S. C., Sailunaz K., and Neethirajan S. (2025). Socializing AI: integrating social network analysis and deep learning for precision dairy cow monitoring – a critical review. Animals: an Open Access J. MDPI. 15, 1835. doi: 10.3390/ani15131835
Peng C., Cao S., Li S., Bai T., Zhao Z., and Sun W. (2024). Automated measurement of cattle dimensions using improved keypoint detection combined with unilateral depth imaging. Animals: an Open Access J. MDPI. 14, 2453. doi: 10.3390/ani14172453
Ponn T., Kröger T., and Diermeyer F. (2020). Identification and explanation of challenging conditions for camera-based object detection of automated vehicles. Sensors. 20, 3699. doi: 10.3390/s20133699
Russello H., van der Tol R., and Kootstra G. (2022). T-LEAP: Occlusion-robust pose estimation of walking cows using temporal information. Comput. Electron. Agric. 192, 106559. doi: 10.1016/j.compag.2021.106559
Scaillierez A. J., Izquierdo García-Faria T., Broers H., van Nieuwamerongen-de Koning S. E., van der Tol R. P. P. J., Bokkers E. A. M., et al. (2024). Determining the posture and location of pigs using an object detection model under different lighting conditions. Transl. Anim. Sci. 8, txae167. doi: 10.1093/tas/txae167
Schlägel U. E., Signer J., Herde A., Eden S., Jeltsch F., Eccard J. A., et al. (2019). Estimating interactions between individuals from concurrent animal movements. Methods Ecol. Evol. 10, 1234–1245. doi: 10.1111/2041-210X.13235
Schütz A. K., Krause E. T., Fischer M., Müller T., Freuling C. M., Conraths F. J., et al. (2022). Computer vision for detection of body posture and behavior of red foxes. Animals: an Open Access J. MDPI. 12, 233. doi: 10.3390/ani12030233
Schütz A. K., Schöler V., Krause E. T., Fischer M., Müller T., Freuling C. M., et al. (2021). Application of YOLOv4 for detection and motion monitoring of red foxes. Animals: an Open Access J. MDPI. 11, 1723. doi: 10.3390/ani11061723
Sharma B. and Koundal D. (2018). Cattle health monitoring system using wireless sensor network: a survey from innovation perspective. IET Wireless Sensor Syst. 8, 143–151. doi: 10.1049/iet-wss.2017.0060
Shen W., Hu H., Dai B., Wei X., Sun J., Jiang L., et al. (2020). Individual identification of dairy cows based on convolutional neural networks. Multimedia Tools Appl. 79, 14711–14724. doi: 10.1007/s11042-019-7344-7
Stull C. and Reynolds J. (2008). Calf welfare. Vet. Clin. North Am. Food Anim. Pract. 24, 191–203. doi: 10.1016/j.cvfa.2007.12.001
Su Q., Zhang J., Chen M., and Peng H. (2024). PW-YOLO-pose: A novel algorithm for pose estimation of power workers. IEEE Access. 12, 116841–116860. doi: 10.1109/ACCESS.2024.3437359
Sutherland M. A., Lowe G. L., Huddart F. J., Waas J. R., and Stewart M. (2018). Measurement of dairy calf behavior prior to onset of clinical disease and in response to disbudding using automated calf feeders and accelerometers. J. dairy Sci. 101, 8208–8216. doi: 10.3168/jds.2017-14207
Swartz T. H., McGilliard M. L., and Petersson-Wolfe C. S. (2016). Technical note: The use of an accelerometer for measuring step activity and lying behaviors in dairy calves. J. dairy Sci. 99, 9109–9113. doi: 10.3168/jds.2016-11297
Tapkı İ., Şahin A., and Önal A. G. (2006). Effect of space allowance on behaviour of newborn milk-fed dairy calves. Appl. Anim. Behav. Sci. 99, 12–20. doi: 10.1016/j.applanim.2005.09.006
Ultralytics (2023). Explore Ultralytics YOLOv8: performance metrics. Available online at: https://docs.ultralytics.com/models/yolov8/pose-coco (Accessed November 28, 2025).
van der Fels-Klerx H. J., Saatkamp H. W., Verhoeff J., and Dijkhuizen A. A. (2002). Effects of bovine respiratory disease on the productivity of dairy heifers quantified by experts. Livestock Production Sci. 75, 157–166. doi: 10.1016/S0301-6226(01)00311-6
van Erp-van der Kooij E., Almalik O., Cavestany D., Roelofs J., and van Eerdenburg F. (2019). Lying postures of dairy cows in cubicles and on pasture. Animals: an Open Access J. MDPI. 9, 183. doi: 10.3390/ani9040183
van Reenen C. G., O’Connell N. E., van der Werf J. T. N., Korte S. M., Hopster H., Jones R. B., et al. (2005). Responses of calves to acute stress: individual consistency and relations between behavioral and physiological measures. Physiol. Behav. 85, 557–570. doi: 10.1016/j.physbeh.2005.06.015
van Rossum G. and Drake F. L. Jr. (2014). The python language reference. (Wilmington, DE, USA: Python Software Foundation).
Verdon M. (2021). A review of factors affecting the welfare of dairy calves in pasture-based production systems. Anim. Production Sci. 62, 1–20. doi: 10.1071/AN21139
Wang F., Wang G., and Lu B. (2024). YOLOv8-poseBoost: advancements in multimodal robot pose keypoint detection. Electronics. 13, 1046. doi: 10.3390/electronics13061046
Wang J., Wang N., Li L., and Ren Z. (2020). Real-time behavior detection and judgment of egg breeders based on YOLO v3. Neural Computing Appl. 32, 5471–5481. doi: 10.1007/s00521-019-04645-4
Widyadara M. A. D. and Mulya M. A. J. (2025). Comparing YOLOv5 and YOLOv8 performance in vehicle license plate detection. Int. J. Res. Rev. 12, 8–17. doi: 10.52403/ijrr.20250202
Xu X., Wu T., Du Z., Rong H., Wang S., Li S., et al. (2025). Enhanced human pose estimation using YOLOv8 with Integrated SimDLKA attention mechanism and DCIOU loss function: Analysis of human body behavior and posture. PloS One. 20, e0318578. doi: 10.1371/journal.pone.0318578
Yamamoto Y., Akizawa K., Aou S., and Taniguchi Y. (2025). Entire-barn dairy cow tracking framework for multi-camera systems. Comput. Electron. Agric. 229, 109668. doi: 10.1016/j.compag.2024.109668
Keywords: animal behavior, calf, computer vision, Holstein Frisian, tracking, YOLOv8-pose
Citation: Jahn S, Düpjan S, Röttgen V, Jordt J-N, Albrecht S and Homeier-Bachmann T (2026) Integrating YOLOv8-pose for localization and behavior tracking of calves in precision livestock farming. Front. Anim. Sci. 6:1718641. doi: 10.3389/fanim.2025.1718641
Received: 04 October 2025; Revised: 11 December 2025; Accepted: 26 December 2025;
Published: 16 January 2026.
Edited by:
Marcia Endres, University of Minnesota Twin Cities, United States
Reviewed by:
Severiano Silva, Universidade de Trás-os-Montes e Alto Douro, Portugal
Maria Teresa Verde, University of Naples Federico II, Italy
Copyright © 2026 Jahn, Düpjan, Röttgen, Jordt, Albrecht and Homeier-Bachmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sarah Jahn, sarah.jahn@fli.de