
ORIGINAL RESEARCH article

Front. Anim. Sci., 30 January 2026

Sec. Precision Livestock Farming

Volume 7 - 2026 | https://doi.org/10.3389/fanim.2026.1676504

This article is part of the Research Topic "Exploring Novel Data Sources to Improve Health and Welfare in Dairy Cattle".

Advancing standardized cattle behavior classification with a random forest model evaluated across diverse datasets

Hannah James1, Clara Rial2, Julio Giordano2 and David Erickson1*
  • 1Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, United States
  • 2Department of Animal Sciences, Cornell University, Ithaca, NY, United States

The lack of precise, autonomous tools for monitoring and classifying cattle behavior limits farmers’ ability to make proactive and informed decisions regarding grazing and herd management. Currently, there is no standardized model for cattle behavior classification. To address this gap, we developed and evaluated a Random Forest machine learning model using accelerometer data collected from multiple sensor placements on the cow’s body (nose, ear, neck). Our model integrates open-source datasets with newly collected field data and classifies key behaviors such as eating, rumination, and movement. Unlike prior studies focused on single-herd or controlled environments, this work demonstrates that a single, standardized model architecture and feature pipeline can be applied consistently across five cattle behavior datasets, spanning different breeds, environments, and devices. Cross-dataset transfer experiments were conducted as an exploratory analysis to assess robustness rather than as a primary performance criterion. The model achieves high accuracy—meeting or exceeding published benchmarks—while maintaining low computational demands, making it practical for real-time applications such as virtual fencing. This scalable, data-driven model supports precision livestock monitoring by enabling accurate, real-time classification of key cattle behaviors from wearable sensor data across diverse datasets, facilitating automated and data-informed livestock management.

1 Introduction

As the global agricultural sector moves towards a more sustainable and food-secure future, identifying cattle behavior has become essential for optimizing cattle farming practices (Monteiro et al., 2021). By accurately recording and classifying cattle behavior, farmers can improve herd health through individualized management strategies, which in turn also enhances productivity (Doyle and Moran, 2015; Rial et al., 2024). These behavioral insights can be integrated into virtual fencing systems, which use GPS coordinates to define boundaries and trigger stimuli to keep cattle within designated areas. This technology enables precise and automated control of livestock movements, eliminating the need for physical fences and significantly reducing labor-intensive practices such as moving fences for rotational grazing (Goliński et al., 2022). By leveraging real-time behavior data, virtual fencing optimizes herd management and helps mitigate overgrazing. It enables rotational grazing that preserves soil structure and fertility, improves water retention, and enhances biodiversity (Wätzold et al., 2024; Murray et al., 2025). These systems allow farmers to implement adaptive grazing strategies, ensuring that pastures are utilized efficiently while providing adequate recovery time for vegetation (di Virgilio et al., 2018). Moreover, environmentally conscious practices taken from these insights contribute to reducing the carbon footprint of livestock farming, ultimately benefiting both the farm and the planet (Lovarelli, 2020).

Behavioral changes in cattle, such as reduced activity and decreased grazing time, can serve as an early warning system for potential health issues (Paudyal et al., 2018; Stangaferro et al., 2016; Dittrich et al., 2019). By detecting these changes in real time, farmers can implement timely veterinary interventions, preventing the spread of disease within the herd and minimizing its impact on individual animals (Stangaferro et al., 2016; Silva et al., 2021; Rial et al., 2024; Neethirajan, 2020; Doeschl-Wilson et al., 2021; Neethirajan, 2017). Moreover, health monitoring is closely tied to animal welfare by allowing farmers to assess whether cattle have adequate access to necessities like shelter, food, and water (Sharma and Koundal, 2018). For instance, a notable decrease in drinking time may indicate a water shortage. Translating these behavioral insights into effective management tools requires accurate and efficient computational models that can support real-time decision-making in grazing systems.

Among the computational approaches available, machine learning models provide a promising pathway for classifying cattle behaviors in real time. Beyond predictive performance, timely inference, computational efficiency, and low energy use are critical to enable practical deployment on embedded livestock devices, particularly in virtual fencing systems where extended operational lifetime is essential. Previous studies have explored a wide range of models for predicting cattle behaviors such as grazing, walking, ruminating, resting, and drinking, including Long Short-Term Memory networks (Roberts and Segev, 2020), Convolutional Neural Networks (Zhang et al., 2022), Logistic Regression (Kamphuis et al., 2013), Softmax (multinomial logistic regression), Support Vector Machines, Decision Trees, Linear and Quadratic Discriminant Analysis, Gaussian Naive Bayes (Arablouei, 2021), Deep Neural Networks (Arablouei, 2023a), Random Forests (Versluijs et al., 2023), and JRip (Williams et al., 2016). While many of these approaches achieve high accuracy, they are often optimized for specific herds (González et al., 2015; Robert et al., 2009), particular hardware constraints (Arablouei, 2021; Arablouei et al., 2023b), or rely on computationally intensive deep learning models that limit on-farm deployment (Hosseininoorbin et al., 2021). As a result, there is still no widely adopted or validated standard model for cattle behavior classification that performs consistently across datasets and environments. To address this gap, we propose a standardized, end-to-end cattle behavior classification pipeline based on a Random Forest classifier and evaluate it across diverse datasets, sensor placements, and breeds. Our approach is designed to balance accuracy, interpretability, and computational simplicity, enabling broader adoption across different farm environments and device types. Timely behavior classification can support prompt alerts and more responsive herd management, and comparison against reported benchmarks from each dataset provides context for the competitive performance of the proposed approach. To our knowledge, this study offers one of the first systematic cross-dataset evaluations of cattle behavior classification using a consistent methodology, demonstrating the generalizability of a simple, interpretable model across diverse conditions.

2 Methods

2.1 Datasets

We analyzed five datasets spanning different breeds, housing conditions, sensor placements, and sampling frequencies (Table 1). Dataset 1 (Feng et al., 2024) involved seven Holstein dairy cows in Xuniban Village, Hohhot, Inner Mongolia, each equipped with a nose-mounted ADXL362 triaxial accelerometer sampling at 10 Hz and integrated into a wearable nose ring; cows were housed in free-moving farm areas with access to feeding and movement zones, and behaviors (feeding, rumination, standing, lying, walking) were labeled from video-synchronized recordings. Dataset 2 (Ito et al., 2021) recorded neck-mounted Kionix KX122–1037 accelerometers (25 Hz) on six Japanese Black beef cattle at Shinshu University, Japan, over two days; thirteen behaviors (e.g., grazing, drinking, rumination, lying, urination) were annotated from 4K video by expert and lay observers with time alignment to the sensors. Dataset 3 (Mladenova, 2024) collected collar-mounted Bosch BMI270 accelerometer–gyroscope data from two Simmental cows housed in a free-stall barn in Bulgaria, sampled at 1 Hz; ground-truth behaviors (standing/eating, standing/ruminating, lying/ruminating) were obtained via continuous, time-aligned video. Dataset 4 (Andersen, 2024) included a mixed-breed herd in open-field grazing environments fitted with accelerometer and gyroscope sensors, comprising 5,732 grazing, 1,702 lying-resting, 2,034 lying-ruminating, 1,559 standing-resting, 832 standing-ruminating, and 1,229 walking samples; it reports the number of labeled behavior samples per class but does not specify the total number of individual animals included. Dataset 5 (James and Rial, 2024) recorded a single adult dairy cow in a free-stall barn with an ear-tag-mounted accelerometer over 15 hours, with behaviors annotated via synchronized video. Behaviors in Dataset 5 were grouped into three classes: eating (standing while feeding at the bunk), standing (standing without feeding activity), and walking (locomotion not associated with feeding). Collectively, these datasets provide diverse breeds, environments, placements, sampling regimes, and annotation protocols for a robust evaluation of our pipeline; they are summarized in Table 1 (abbreviations: NR = not reported; HQ = minutes retained after quality control). Although the raw datasets differ in sensor placement, sampling frequency, and available signals, all five datasets include tri-axial accelerometer measurements. To ensure comparability across datasets, a standardized feature extraction pipeline based on accelerometer-derived features was applied uniformly, providing a common feature basis across all datasets.


Table 1. Summary of datasets outlining the source, description, collection date, and key features relevant to cow behavior analysis.

All datasets used in this study consist of wearable sensor–derived inertial measurements (tri-axial accelerometer data, and gyroscope data where available) and behavior labels; the released datasets do not include production or performance variables (e.g., milk yield, growth, or reproduction). In addition, animal-level or context-level covariates such as herd identity, year, or season were not consistently available across the datasets. As a result, the modeling framework focuses exclusively on behavior classification using sensor-derived inertial features, without incorporating fixed effects or contextual covariates.

The analyses were conducted in Python 3.12.11. The primary libraries used included scikit-learn (v1.6.1) for machine learning models, imbalanced-learn (v0.14.0) for SMOTE implementation, LazyPredict (v0.2.16) for baseline model comparisons, NumPy (v2.0.2) and pandas (v2.2.2) for data processing, and Matplotlib (v3.10.0) for visualization. Code and scripts for reproduction are openly available in our GitHub repository: https://github.com/HannahJames123/cattle-behavior (James, 2024).

2.2 Model selection

To determine the most suitable machine learning model, we benchmarked a set of commonly used algorithms (listed in Figure 1) for cattle behavior classification across all five datasets. Models were trained and tested using default hyperparameters to establish a consistent and interpretable baseline for comparing accuracy and training time (Figure 1). While this approach may favor models that perform well out of the box (e.g., Random Forest), it enables an unbiased initial assessment before any fine-tuning, which can vary substantially across algorithms and datasets (Probst et al., 2019a). For this preliminary screening, we used the Lazy Predict library after applying minimal preprocessing: feature extraction, imputation of missing values, feature scaling, and class balancing via SMOTE. This allowed us to evaluate classifier performance without behavior-specific segmentation or model-specific optimization. Model selection was based on a combination of consistently high classification accuracy across datasets and low training time, reflecting a balance between predictive performance and computational efficiency. Based on this analysis, we selected Random Forest as the most consistently high-performing model. All subsequent modeling steps—including data windowing, refined feature engineering, and cross-validation—were then conducted as described in Section 2.3.
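For illustration, this screening step can be reproduced in a few lines. The sketch below assumes that X and y hold the preprocessed window-level features and behavior labels (hypothetical variable names); it follows standard LazyPredict usage rather than our exact scripts, which are available in the repository cited above.

```python
# Minimal sketch of the untuned model screening with LazyPredict.
# X, y are assumed to already be imputed, scaled, and SMOTE-balanced
# feature matrices and labels, as described in the text.
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# LazyClassifier fits a suite of scikit-learn classifiers with default
# hyperparameters and returns a leaderboard of metrics and fit times.
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models[["Accuracy", "Time Taken"]])
```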


Figure 1. Comparison of model performance across the five datasets given in Table 1, for ten machine learning classifiers using Lazy Predict. (a) Mean simple accuracy (%) for each model, i.e., the percentage of observations correctly classified. (b) Training times (seconds) for the same models. Each dataset is distinguished by color and hatch pattern, and abbreviations of model names are shown on the x-axis. Full names of the models are as follows: RF (Random Forest), XGB (XGBoost), LGBM (LightGBM), KNN (K-Nearest Neighbors), LR (Logistic Regression), DT (Decision Tree), SVM (Support Vector Machine), AdaBoost, GNB (Gaussian Naive Bayes), and ET (Extra Trees).

Accuracy refers to test accuracy, defined as the proportion of correctly predicted labels (e.g., feeding, resting, walking) out of all predictions. Training time was measured in seconds for each model–dataset pair to highlight computational efficiency. Figure 1 shows the results of 10 different models evaluated using the Lazy Predict Python library (Pandala, 2021), which enables rapid, untuned model screening. This approach was chosen deliberately to highlight how well models perform “out-of-the-box,” without dataset-specific tuning or feature engineering, thereby allowing a fair comparison across datasets. Random Forest consistently achieved high accuracy across multiple datasets, demonstrating robust performance under diverse sampling rates, sensor placements, and behavior categories. In terms of model efficiency, Random Forest consistently completed training in under 3 seconds across all datasets, compared to 5–35 seconds for boosting-based models (XGBoost, LightGBM) and more than 30 seconds for logistic regression. While simpler models such as Naïve Bayes and Decision Trees trained faster (<0.5 seconds), this came at the expense of substantially lower accuracy (Figure 1). Extra Trees (ET) also achieved high accuracy in some datasets but showed greater variability across datasets. Random Forest therefore offered the best trade-off: accuracy comparable to XGBoost and LightGBM, but with considerably lower computational cost.

2.3 The Random Forest model

To classify cattle behavior accurately, we implemented a Random Forest model using a standardized pipeline applied across all datasets (see Supplementary Figure S2). The process involved several key stages: segmenting time-series data, extracting features, addressing class imbalance, training the model, and evaluating its performance. To evaluate generalizability, we applied a standard 80/20 stratified train-test split to each dataset—preserving class balance in both subsets—and subsequently performed cross-validation to assess model robustness. This is a common approach in machine learning (Sugali et al., 2021a) that helps ensure fair evaluation and minimizes overfitting to specific behaviors. No dataset-specific tuning or feature engineering was performed between datasets. This approach tests whether a single model architecture and preprocessing pipeline can perform well across datasets with varying sampling frequencies, sensor placements, and behavior labels, without dataset-specific tuning or redesign.
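As a high-level illustration, the per-dataset pipeline can be summarized as follows. The helper names are hypothetical placeholders for the steps detailed in Sections 2.3.1–2.3.5, and the fixed random seed is an assumption for reproducibility rather than a reported setting.

```python
# High-level sketch of the standardized per-dataset pipeline
# (hypothetical helper names; each step is sketched in the
# corresponding subsection below).
from sklearn.model_selection import train_test_split

def run_pipeline(df):
    windows, labels = segment_into_windows(df, window_len=15)   # Section 2.3.1
    X = extract_features(windows)                               # Section 2.3.2
    # 80/20 stratified split preserves the class balance in both subsets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=42
    )
    X_train, y_train = balance_with_smote(X_train, y_train)     # Section 2.3.3
    model = train_random_forest(X_train, y_train)               # Section 2.3.4
    return evaluate(model, X_test, y_test)                      # Section 2.3.5
```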

2.3.1 Data segmentation and windowing

Each dataset was divided into fixed-length windows of 15 data points. The actual time span of each window, therefore, varies by dataset according to its sampling frequency—for instance, 0.5 seconds at 30 Hz and 15 seconds at 1 Hz. This segmentation captures short-term behavioral patterns, making the data more manageable and enhancing the model’s ability to learn from localized movement trends over time. This windowing technique is essential for handling sequential data, as it allows for the effective management of temporal dependencies and patterns (Ye and Keogh, 2009; Zheng et al., 2013; Ismail Fawaz et al., 2019; Ordóñez and Roggen, 2016). A fixed-length window in terms of data points, rather than seconds, was chosen to maintain consistency across datasets with different sampling rates. This decision supports the development of a generalizable model that can be applied to both high- and low-frequency accelerometer datasets without requiring dataset-specific preprocessing. While shorter windows offer finer temporal resolution—particularly beneficial for identifying rapid behaviors such as head movements—they may segment longer behaviors (e.g., grazing or ruminating) into sub-episodes (Walton et al., 2018; Alghamdi et al., 2022; Ni et al., 2025). We acknowledge that many applied ruminant studies use longer epochs (e.g., 5 to 60 seconds) to improve behavioral continuity and reduce noise. For example, Versluijs et al. (2023) used a running-mean preprocessing strategy with smoothing windows of 1, 5, and 20 seconds—rather than segmenting data into behavior-specific epochs—and found that a 20-second window yielded the best performance at 10 Hz. We opted for fixed 15-point windows (regardless of time duration) to ensure consistency across datasets with varying sampling frequencies, thereby standardizing feature extraction and enabling direct comparison across datasets. Exploring the trade-off between temporal resolution and behavioral stability remains an important area for future work.
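The sketch below illustrates this segmentation step. It assumes non-overlapping windows and illustrative column names, and it discards windows that span a behavior transition; the latter is our assumption for a clean example, not a detail reported above.

```python
# Minimal sketch of fixed-length windowing: each time-ordered dataset is
# cut into consecutive, non-overlapping 15-sample windows.
import numpy as np
import pandas as pd

def segment_into_windows(df, window_len=15, label_col="behavior"):
    """Split a time-ordered DataFrame into (windows, labels)."""
    signal_cols = ["acc_x", "acc_y", "acc_z"]  # illustrative column names
    windows, labels = [], []
    for start in range(0, len(df) - window_len + 1, window_len):
        chunk = df.iloc[start:start + window_len]
        # Assumption: keep only windows with a single behavior label, so
        # that windows straddling a transition are dropped.
        if chunk[label_col].nunique() == 1:
            windows.append(chunk[signal_cols].to_numpy())
            labels.append(chunk[label_col].iloc[0])
    return np.stack(windows), np.array(labels)
```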

2.3.2 Feature extraction

From each window, we derived statistical features to summarize movement signals across the X, Y, and Z axes. These included the mean (central tendency), standard deviation (variability), skewness (asymmetry), and kurtosis (presence of outliers). These features were selected because they capture both the average activity level and the distributional shape of movement signals, which are known to differentiate between steady postures (e.g., standing) and more dynamic behaviors (e.g., walking or feeding). Prior studies in livestock monitoring and human activity recognition have shown that low-order statistical moments from accelerometer data provide a compact yet informative representation of behavior, enabling accurate classification while keeping computational costs low (Nathan et al., 2012; Rahman et al., 2018; Rast et al., 2020; Arablouei, 2023a; Hoffman et al., 2024; Janaa et al., 2025).
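A minimal sketch of this feature extraction is shown below; it assumes windows shaped (n_windows, window_len, n_axes) and uses SciPy's default (Fisher) kurtosis, yielding 12 features per window for tri-axial data.

```python
# Per-window statistical features: mean, standard deviation, skewness,
# and kurtosis, computed separately for each axis.
import numpy as np
from scipy.stats import skew, kurtosis

def extract_features(windows):
    """windows: array of shape (n_windows, window_len, n_axes)."""
    feats = [
        windows.mean(axis=1),       # central tendency, per axis
        windows.std(axis=1),        # variability, per axis
        skew(windows, axis=1),      # asymmetry, per axis
        kurtosis(windows, axis=1),  # tailedness / outliers, per axis
    ]
    return np.concatenate(feats, axis=1)  # shape: (n_windows, 4 * n_axes)
```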

We intentionally avoided dataset-specific feature engineering to test whether a single pipeline using a consistent set of features could generalize across datasets with different sampling frequencies, sensor placements, and label definitions. This reflects a practical constraint of real-world applications, where models must perform well on new datasets without requiring bespoke adjustments.

2.3.3 Addressing class imbalance

To handle class imbalance—where some behaviors were underrepresented—we used the Synthetic Minority Over-sampling Technique (SMOTE), a method that generates new samples of minority classes by interpolating between existing ones. This helps the model learn to recognize rare behaviors more reliably. For our implementation, we used SMOTE with k = 2 to allow interpolation in small or sparsely represented behavior classes, further enhancing the model’s robustness and balance. This lower value supports robust minority class augmentation without requiring large neighborhood sizes—particularly important given the small number of labeled samples for certain behaviors. This approach is essential for improving the model’s ability to recognize all behavior classes effectively (Chawla et al., 2002; Blagus and Lusa, 2013), thereby enhancing its overall reliability and classification accuracy.
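For reference, this resampling step can be expressed with imbalanced-learn's SMOTE (the library named in Section 2.1). Applying it to the training split only, as shown, is a standard leakage precaution that we assume here rather than a detail reported above.

```python
# SMOTE with k_neighbors=2: synthetic minority samples are interpolated
# between each minority sample and one of its 2 nearest minority
# neighbors, so each minority class needs at least 3 real samples.
from imblearn.over_sampling import SMOTE

def balance_with_smote(X_train, y_train, random_state=42):
    smote = SMOTE(k_neighbors=2, random_state=random_state)
    # fit_resample returns the original samples plus synthetic ones,
    # with all classes brought up to the majority-class count.
    return smote.fit_resample(X_train, y_train)
```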

2.3.4 Model training

We trained a Random Forest classifier consisting of 100 decision trees. At each split in a tree, a random subset of the input features was considered—specifically, the square root of the total number of available features. This strategy increases tree diversity and helps prevent overfitting by ensuring that no single feature dominates the decision process across all trees in the ensemble. This approach allows the model to capture complex relationships in the data while still generalizing well to new data.
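This configuration corresponds to scikit-learn's defaults, as the sketch below makes explicit; the fixed random seed is our illustrative assumption.

```python
# Random Forest as described above: 100 trees, with sqrt(n_features)
# candidate features considered at each split.
from sklearn.ensemble import RandomForestClassifier

def train_random_forest(X_train, y_train):
    model = RandomForestClassifier(
        n_estimators=100,     # 100 decision trees in the ensemble
        max_features="sqrt",  # square root of total features per split
        random_state=42,
    )
    return model.fit(X_train, y_train)
```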

2.3.5 Model evaluation

For all datasets, we assessed model performance using test accuracy and the weighted F1 score, using the same modeling pipeline described above. Accuracy represents the percentage of correctly classified behaviors, while the weighted F1 score balances precision (how many selected items are relevant) and recall (how many relevant items are selected). Weighted F1 score accounts for class imbalance by averaging per-class F1 scores weighted by support. These metrics together provide a robust evaluation of the model’s ability to generalize across behavior classes and datasets. Like other recent studies in the field (e.g., Versluijs et al., 2023), we focus on accuracy, precision, and recall—summarized in our case by the weighted F1 score—to evaluate classification performance. Although SMOTE was used to balance the training data, evaluating metrics like weighted F1 score remains essential to ensure the model performs well across all classes, including those that were originally underrepresented. For Dataset 3, which uniquely provided separate accelerometer and gyroscope features, we evaluated our model on three configurations: (a) accelerometer-only, (b) gyroscope-only, and (c) the combined feature set. This allowed us to assess model performance across sensor modalities and compare results in line with prior studies that report accuracies separately for each sensor type.
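A minimal sketch of the metric computation, using scikit-learn's accuracy_score and f1_score with support-weighted averaging:

```python
# Test accuracy plus weighted F1; the weighted average sums per-class
# F1 scores weighted by each class's support in the test set.
from sklearn.metrics import accuracy_score, f1_score

def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "weighted_f1": f1_score(y_test, y_pred, average="weighted"),
    }
```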

Our evaluation approach is further informed by Ferdinandy et al. (2020) and Cunningham et al. (2024), which both highlight the importance of cross-validation strategies for behavioral data. Ferdinandy et al. (2020) discuss how validation methods such as 5-fold cross-validation and Leave-One-Individual-Out (LOIO) can significantly affect reported accuracy. Behavioral datasets often contain highly correlated samples within individual animals, leading to artificially inflated performance metrics when random train-test splits include data from the same individual in both sets. LOIO, which holds out one individual entirely for testing, provides a more realistic estimate of generalization to new animals but typically results in lower accuracy due to reduced training data and the challenges of inter-individual variation. The authors report overall accuracy between 51% and 60% when using data from the same species, and between 41% and 51% in cross-species prediction scenarios (Ferdinandy et al., 2020). Cunningham et al. (2024) explore how both cross-validation method and epoch length affect model performance. They find that shorter epochs (30–90 seconds) yield higher accuracy but greater variability, whereas longer epochs (600–900 seconds) offer more stable results at the expense of reduced performance—particularly for behaviors such as grazing and rumination. Leave-One-Steer-Out cross-validation revealed even more variability, underscoring the difficulty of generalizing across individuals. Reported accuracy ranged from 0.85 to 0.92 with 5-fold cross-validation and from 0.82 to 0.87 with LOIO, depending on the device and epoch used. These findings reinforce our decision to report both accuracy and weighted F1 score, and to evaluate multiple cross-validation strategies to better understand model generalization in realistic use cases.
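For datasets with per-animal identifiers, LOIO can be implemented with scikit-learn's LeaveOneGroupOut, as sketched below; the groups argument holds the animal ID for each window, and the helper name is illustrative.

```python
# Leave-One-Individual-Out (LOIO): each fold trains on all animals but
# one and tests on the held-out animal's windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

def loio_accuracies(X, y, groups):
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    # Mean and standard deviation across held-out individuals.
    return np.mean(scores), np.std(scores)
```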

In addition to within-dataset splits and cross-validation, we also evaluated cross-dataset generalization by training the Random Forest model on one dataset and testing it directly on another without retraining. These transfer experiments were selected to represent contrasting experimental conditions. Specifically, we compared Dataset 5 (ear-tag, 1 Hz, free-stall barn) with Dataset 2 (neck-mounted, 25 Hz, university farm) to test generalization across different sampling rates, sensor placements, and behavior taxonomies. We also compared Dataset 4 (open-field grazing herd) with Dataset 2 to evaluate transfer across grazing-focused datasets collected under different environments and annotation schemes. These scenarios were chosen to probe the limits of model robustness under heterogeneous real-world conditions.
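A transfer experiment of this kind reduces to fitting on one dataset's feature matrix and scoring on another's, as sketched below. This assumes both datasets passed through the same feature pipeline, so that columns align and behavior labels share a common vocabulary; the helper functions are the hypothetical ones sketched in Sections 2.3.4 and 2.3.5.

```python
# Cross-dataset transfer: train on the source dataset, evaluate directly
# on the target dataset with no retraining or adaptation.
def cross_dataset_transfer(X_src, y_src, X_tgt, y_tgt):
    model = train_random_forest(X_src, y_src)  # helper from Section 2.3.4
    return evaluate(model, X_tgt, y_tgt)       # helper from Section 2.3.5
```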

3 Results

Following the benchmarking outlined in Section 2.2, Random Forest performance was evaluated across the five datasets listed in Table 1. The results, presented in Figure 2, compare the accuracy of the model with published benchmarks for each dataset.


Figure 2. Comparison of accuracy and weighted F1 score for the proposed model across five datasets using four evaluation strategies: 80/20 train-test split, 5-fold cross-validation, Leave-One-Individual-Out (LOIO), and published benchmarks. Datasets 3a, 3b, and 3c correspond to different variants of Dataset 3, where 3a uses accelerometer-only features, 3b uses gyroscope-only features, and 3c combines accelerometer and gyroscope features. Error bars indicate the standard deviation across folds (5-fold) or individuals (LOIO). Published values are included where available; Dataset 5 is a novel dataset with no prior benchmark.

Published accuracies were sourced from existing studies where available. For Dataset 1, the reported accuracy comes from Feng et al. (2024), who used a 6.4-second time window and averaged performance across feeding, ruminating, and other behaviors. Dataset 2 references results from Russel (2024), who applied deep learning techniques to accelerometer data. Dataset 3 includes separate benchmark accuracies for accelerometer-only and gyroscope-only models from Mladenova (2024), although no published benchmark was available for the combined sensor configuration used in our Dataset 3c. In contrast, Dataset 5 was developed as part of this study using novel data collected at the Cornell University Ruminant Center (CURC), so no external benchmark exists. It is important to note that many published studies do not specify their validation strategy—whether they used random train-test splits, k-fold cross-validation, or leave-one-individual-out—which complicates direct comparison with our standardized evaluation methods. Additionally, not all datasets allowed the use of LOIO or 5-fold cross-validation. In some cases, individual animal identifiers were unavailable, or the dataset included only a single animal, making LOIO infeasible. Similarly, datasets with rare behavior labels could not support 5-fold validation due to insufficient representation across folds.

In the initial 80/20 train–test split, the Random Forest model achieved the following accuracies across datasets: Dataset 1 with 92.72%, Dataset 2 with 93.41%, Dataset 3a (accelerometer only) with 94.25%, Dataset 3b (gyroscope only) with 96.66%, Dataset 3c (combined) with 95.84%, Dataset 4 with 90.17%, and Dataset 5 with 86.08%. To further evaluate generalizability, we applied 5-fold and Leave-One-Individual-Out (LOIO) cross-validation selectively across datasets. Because 5-fold CV can be unstable in imbalanced datasets where rare behaviors are absent from some folds (He and Ma, 2013), it was not applied to Dataset 2. LOIO was used for Datasets 1 and 2, where individual-level metadata enabled testing generalization to unseen subjects. As shown in Figure 2, accuracy declined across all datasets when moving from the 80/20 split to 5-fold CV or LOIO. For example, Dataset 1 accuracy dropped from 92.7% (80/20) to 83.6% (5-fold) and 79.2% (LOIO), while Dataset 2 dropped from 93.4% (80/20) to 61.5% under LOIO.

Figure 3 presents the confusion matrix in percentages for Dataset 5, illustrating the model’s performance and guiding further improvements. We specifically chose Dataset 5 due to its novelty and the opportunity this presents to evaluate the model on unpublished data.

Figure 3 (panel diagonals): (a) eating 89.06%, standing 70.77%, walking 98.46%; (b) eating 66.85%, standing 76.71%, walking 35.71%.

Figure 3. Normalized confusion matrices (%) for Dataset 5. Behavior classes are eating, standing, and walking. Panel (a) shows results for an 80/20 train–test split, and panel (b) shows results using 5-fold cross-validation.

To evaluate performance across behavior classes, we examined confusion matrices for Dataset 5 under both the 80/20 train–test split and 5-fold cross-validation. In the 80/20 split (Figure 3a), walking was classified with high accuracy (98.46%), with negligible confusion with other behaviors. In contrast, 21.54% of standing instances were predicted as eating, indicating substantial overlap in sensor signals between these low-movement behaviors. Performance declined under 5-fold cross-validation (Figure 3b): classification accuracy for eating decreased to 66.85%, while walking dropped to 35.71%, with 42.86% of walking instances predicted as standing. These results highlight behavior pairs that are particularly difficult to distinguish and illustrate the increased challenge of generalizing under stricter validation.
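For reference, the row-normalized matrices in Figure 3 can be computed with scikit-learn's confusion_matrix; the sketch below assumes the three Dataset 5 class labels.

```python
# Row-normalized confusion matrix in percent: each row (true class)
# sums to 100%, matching the values reported in Figure 3.
from sklearn.metrics import confusion_matrix

def confusion_percent(y_true, y_pred, labels=("eating", "standing", "walking")):
    cm = confusion_matrix(y_true, y_pred, labels=list(labels), normalize="true")
    return cm * 100
```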

Finally, to further probe the robustness of the proposed approach, we conducted cross-dataset validation experiments by training the same Random Forest model on one dataset and evaluating it on another; all cross-dataset results reported below were generated in this study (Table 2). This analysis was exploratory in nature and complements the primary finding demonstrated in Figure 2, namely that a single standardized model architecture and feature pipeline can be applied consistently across all datasets when trained and evaluated within each dataset. When trained and tested on closely related datasets, such as the Dataset 3 variants, cross-dataset performance remained high. For example, training on Dataset 3a and testing on Dataset 3b achieved an accuracy of 80.33% and a weighted F1 score of 72.2%, indicating that while overall classification accuracy remained high, performance varied across behavior classes with different support. Similarly, training on Dataset 3c and testing on Dataset 3a reached 98.39% accuracy and a weighted F1 of 98.39%, reflecting both high overall accuracy and balanced performance across classes. These results indicate that the learned decision structure remains effective even when sensor configurations differ. Generalization across more heterogeneous datasets was more variable. For instance, training on Dataset 5 and testing on Dataset 2 yielded an accuracy of 75.7% and a weighted F1 of 83.2%. Performance was also asymmetric in certain cases: training on Dataset 4 generalized well to Dataset 2, achieving 95.3% accuracy and a weighted F1 of 93.0%, whereas the reverse transfer resulted in 47.5% accuracy and a weighted F1 of 32.4%.


Table 2. Cross-dataset generalization performance (exploratory transfer experiments).

Very low accuracies and weighted F1 scores were observed for some dataset pairs, particularly those involving Dataset 1. Rather than contradicting the main objective of this study, these cases highlight the challenges associated with directly transferring trained models across datasets that differ in labeling conventions and data representation. Importantly, these exploratory results do not detract from the central finding that the same Random Forest-based pipeline can be applied successfully across diverse cattle behavior datasets without dataset-specific model redesign; instead, they emphasize the role of standardized feature extraction and labeling in enabling reliable cross-dataset reuse.

In summary, the within-dataset evaluations (Figure 2) demonstrate that a single, standardized model architecture and feature pipeline can be applied consistently across diverse data sources, while the exploratory transfer experiments (Table 2) show that transferability depends strongly on dataset similarity: transfers between closely related datasets performed well, whereas transfers across more heterogeneous datasets were more variable and often asymmetric. This variability underscores the importance of harmonized feature definitions and labeling standards whenever cross-dataset transfer is required.

4 Discussion

The primary aim of this study is to evaluate whether a single, standardized Random Forest model can be applied consistently across multiple cattle behavior datasets, rather than to optimize cross-dataset training performance. As demonstrated in Figure 2, the model achieves strong within-dataset performance across all datasets when trained and evaluated using dataset-specific splits, showing that the same model architecture and feature pipeline can be applied across diverse data sources. Cross-dataset generalization (Table 2) was therefore conducted as an exploratory analysis to assess robustness rather than as a primary performance criterion. While transferability varied across dataset pairs, these results do not detract from the central finding that a single Random Forest-based pipeline can be applied across multiple datasets without dataset-specific model redesign and instead highlight the importance of harmonized feature definitions and labeling standards when cross-dataset transfer is required.

As shown in the within-dataset evaluation results (Figure 2), the proposed Random Forest model demonstrates high accuracy and weighted F1 scores, highlighting its promise for cattle monitoring applications. Across multiple datasets, the model achieved a competitive average accuracy of 92.7% under an 80/20 train–test split, accurately classifying the key behaviors available within each dataset, including feeding, walking, standing, ruminating, and lying where these labels were defined. This level of performance is consistent with recent accelerometer-based behavior classification studies, which have reported varying accuracies depending on methodological choices. For example, Coelho Ribeiro et al. (2021) reported ~74% accuracy using an artificial neural network with halter-mounted accelerometers, whereas Hosseininoorbin et al. (2021) achieved an F1-score of 89.3% with sequential deep neural networks. Arablouei (2021) reached 93.4% accuracy using multilayer perceptrons, and later work by Arablouei et al. (2023b) obtained 88.5% accuracy by fusing accelerometer and GNSS data. Similarly, González et al. (2015) reported 85.5–90.5% accuracy using decision tree algorithms with GPS- and accelerometer-equipped collars. Within this context, our Random Forest approach offers competitive performance while balancing accuracy, interpretability, and computational efficiency, making it well-suited for practical livestock monitoring applications.

Additional insights into robustness emerge when examining generalizability across datasets and experimental conditions. Using a fixed model architecture and preprocessing pipeline, we first applied an 80/20 split within each dataset. This yielded strong performance across diverse configurations, with accuracies ranging from 86.1% (Dataset 5) to 96.7% (Dataset 3b, gyroscope only). However, cross-dataset validation revealed mixed transferability. In some cases, generalization was strong—e.g., Dataset 4 to Dataset 2 achieved 95.3% accuracy and 93.0% weighted F1—while the reverse direction produced far lower results: 47.5% accuracy and 32.4% weighted F1. Such asymmetry reflects differences in sampling frequency, sensor placement, and behavioral taxonomies, underscoring the need for standardized labeling frameworks and data collection protocols.

Confusion matrix analysis pointed to specific weaknesses, particularly for behaviors with overlapping movement profiles. Misclassifications between walking and standing suggest that low-variance acceleration signals limit discrimination between similar activities. These errors became more pronounced under more rigorous validation, where intra-individual patterns contributed less to performance. Incorporating additional discriminative features—such as higher-frequency accelerometer data, GPS context, or temporal movement variation—could mitigate these ambiguities and improve robustness, especially in real-time monitoring where accurate separation of feeding and resting has welfare implications.

Applying 5-fold and Leave-One-Individual-Out (LOIO) cross-validation further highlighted these challenges. While 80/20 splits suggested consistently high accuracy, performance dropped when models were tested on more independent data. For example, Dataset 1 fell from 92.7% (80/20) to 83.6% (5-fold) and 79.2% (LOIO), while Dataset 2 declined from 93.4% to just 61.5% under LOIO. These results confirm the risk of performance inflation when train/test partitions share individual-specific patterns, as also noted by Ferdinandy et al. (2020). Although 5-fold CV provides useful variability estimates, it can be unstable in imbalanced datasets where rare behaviors are absent from some folds (He and Ma, 2013). LOIO, though more demanding, better reflects real deployment scenarios where models must generalize to entirely new individuals. Together, these findings emphasize the importance of validating livestock behavior models under cross-individual protocols rather than relying solely on random splits.

Despite the generalization challenges observed across datasets, Random Forest emerged as a particularly robust choice for behavior classification, consistently balancing accuracy with computational efficiency. The algorithm’s inherent tolerance for noisy, high-dimensional accelerometer data (Breiman, 2001; Reis et al., 2019), combined with its straightforward interpretability, makes it exceptionally well-suited for livestock monitoring applications. Our decision to use the default Random Forest parameters—100 trees with the square root of the total number of features considered at each split—ensured consistency in our comparative analysis and yielded strong baseline performance. The widespread adoption of Random Forest in livestock and animal behavior studies further reinforces its reliability for these applications. Kleanthous et al. (2018) successfully applied Random Forest to classify sheep and goat behaviors, while Dickinson et al. (2021) reported greater than 98% accuracy in Alpine ibex and pygmy goats. More recently, Dhakshinamoorthy et al. (2025) utilized Random Forest for both cattle behavior classification and estrus detection, and Muzzo et al. (2025) applied it effectively to foraging behavior classification. These studies, combined with our results, demonstrate Random Forest’s consistent effectiveness in precision livestock farming and support its suitability for integration into virtual fencing systems.

In addition to the generalization challenges discussed above, an important consideration in interpreting our results relates to the specific characteristics of the datasets used, particularly Dataset 5, which primarily focuses on confined feeding systems such as feedbunks. This presents a notable limitation for virtual fencing applications, which are typically intended for pasture-based systems with free-grazing cattle. These two environments can exhibit markedly distinct behavioral patterns, with pasture-based cattle displaying more complex grazing dynamics, varying activity levels due to larger roaming areas, environmental interactions, and seasonal behavioral changes. Consequently, models trained primarily on confined system data may not fully capture the nuances of pasture-based behaviors and may benefit from additional validation or adaptation to achieve optimal performance in these more dynamic environments. Additionally, the relatively short observation periods in some datasets may not adequately account for long-term behavioral trends, such as seasonal variations in grazing behavior, which are critical factors for effective pasture management systems.

Beyond dataset composition, several technical factors significantly influence model performance and warrant careful consideration for future development. Sensor frequency is a particularly important parameter. Higher-frequency accelerometers (e.g., 50–100 Hz) have been shown to enhance classification accuracy (Arablouei, 2023a), providing more granular movement information that improves pattern recognition (Cabezas et al., 2022), enables detection of subtle and rapid behavioral changes (Robert et al., 2009), and reduces classification ambiguity to better adapt to dynamic environments (González et al., 2015). Incorporating additional sensor modalities, particularly GPS data, would provide valuable spatial context that is especially important for pasture-based systems where grazing patterns and cattle movements vary significantly based on terrain and forage availability. Movement variation features, which capture acceleration changes between consecutive windows (Fogarty et al., 2020), have also shown promise for improving differentiation between active and passive behaviors. While not included in our current model, this represents a valuable avenue for future exploration to enhance temporal sensitivity and improve recognition of behavioral transitions. Our decision to use a fixed window size of 15 data points across all datasets prioritized cross-dataset consistency and computational efficiency, allowing us to apply a uniform preprocessing pipeline regardless of sampling frequency. However, this approach introduces variability in actual time spans (ranging from 0.5 seconds at 30 Hz to 15 seconds at 1 Hz), which may affect the model’s ability to capture long-duration behaviors such as grazing or rumination, particularly in low-frequency datasets. This trade-off between temporal resolution and behavioral stability is well-documented in the literature, with many applied studies favoring time-based epochs of 5 to 60 seconds (Walton et al., 2018).

Even with the noted limitations above, the Random Forest model demonstrates strong potential for deployment in real-world livestock monitoring systems. Random Forest and related approaches have already been applied successfully in pasture-based systems (e.g., Brennan et al., 2021; Cunningham et al., 2024), and our results reinforce their promise for real-time grazing management. Its robust performance across diverse datasets and sensor placements suggests that it could support real-time behavior tracking under varied farm conditions, potentially reducing manual labor requirements and enabling earlier detection of health or welfare issues (Halachmi et al., 2019; Džermeikaitė et al., 2023). The model’s low computational requirements also make it suitable for battery-powered embedded devices, such as those employed in virtual fencing systems (Aquilani et al., 2022). Although the present study does not incorporate GPS or GNSS data, accelerometer-based behavior classification remains a foundational component of next-generation virtual fencing technologies. Such systems increasingly aim to detect behavioral states in real time to guide stimulus delivery and improve animal welfare, even without continuous location tracking (Aaser et al., 2024). Accurate classification of behaviors such as walking and feeding could enable automated grazing management decisions and trigger alerts for behavioral anomalies, thereby improving both operational efficiency and pasture management effectiveness (Chelotti et al., 2024). The model’s adaptability across different sensor configurations further supports compatibility with existing precision livestock platforms, facilitating integration into established farm management systems.

Looking ahead, several opportunities exist to further enhance model performance and practical applicability. In this study, we intentionally used default Random Forest parameters to ensure consistency across datasets and avoid overfitting to specific data characteristics. Performance could likely be improved by fine-tuning hyperparameters such as maximum tree depth, minimum samples required to split an internal node, and minimum samples required at a leaf node, as discussed by Probst et al. (2019b). Future work should explore systematic optimization approaches—including grid search and random search—to improve generalization while balancing accuracy with computational efficiency, an especially important trade-off for embedded applications (Zhen, 2025). Methodological refinements may also strengthen robustness. For example, adaptive windowing strategies that vary window size with sampling frequency or use overlapping segments (Fida et al., 2015; Walton et al., 2018) could improve classification while preserving cross-dataset compatibility. Recent studies have also shown that adaptive or overlapping windowing strategies can improve recognition of short, transitional behaviors (Alghamdi et al., 2022) and that adjusting epoch length influences the accuracy of grazing and ruminating detection in livestock, with shorter windows capturing rapid movements and longer windows stabilizing sustained behaviors (Decandia et al., 2018). Incorporating additional modalities such as GPS could add valuable spatial context, while training and validating on longer-term datasets of free-grazing cattle would help capture seasonal and environmental variation in behavior. When such contextual metadata are available, future work could also examine the inclusion of explicit contextual effects (e.g., herd, year, or season) as additional covariates to assess their influence on behavior classification performance, although this was beyond the scope of the present study and not consistently supported by the available datasets. Finally, personalized fine-tuning approaches may increase sensitivity for health monitoring by addressing inter-individual differences and device-specific variation. Ultimately, validation in live, pasture-based settings will be essential to confirm practical utility and guide further refinements.
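As an illustration of such a search, the sketch below applies scikit-learn's RandomizedSearchCV over the three hyperparameters named above; the candidate values are illustrative, not settings from any reported experiment.

```python
# Randomized hyperparameter search over tree depth and split/leaf sizes,
# scored with weighted F1 to remain consistent with Section 2.3.5.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_distributions,
    n_iter=20,
    cv=5,
    scoring="f1_weighted",
    random_state=42,
)
# search.fit(X_train, y_train); search.best_params_ then holds the
# selected configuration.
```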

Taken together, these directions highlight Random Forest as not only a competitive baseline but also a practical foundation for precision livestock monitoring. Its balance of accuracy, interpretability, and computational efficiency makes it suitable for deployment on embedded, battery-powered devices in next-generation virtual fencing systems. By enabling accurate, real-time behavior classification, the model can support automated grazing management, early detection of welfare concerns, and integration into broader precision livestock platforms. While further development will enhance adaptability and robustness, our findings underscore the value of Random Forest as a core analytic tool for livestock monitoring across diverse farm environments.

5 Conclusion

This study presents a Random Forest model for classifying cattle behavior using accelerometer data, designed for real-time monitoring and integration into energy-efficient systems like virtual fencing. The model balances accuracy, generalizability, and computational simplicity—achieving strong performance across diverse datasets, sensor placements, and behavioral contexts. Its ability to detect key behaviors such as grazing and walking highlights its value in supporting timely, data-driven pasture management decisions. By bridging model performance with practical deployment requirements, this work advances the role of behavior classification in precision livestock systems. Importantly, our cross-dataset results show that while Random Forest is a strong out-of-the-box classifier, performance varies considerably when models are transferred across farms, sensor setups, or labeling conventions. Addressing these limitations through harmonized data standards and domain adaptation techniques will be essential for achieving fully generalizable models that can be deployed globally. Building on prior research demonstrating the feasibility of accelerometer-based cattle behavior classification and its use in pasture monitoring and virtual fencing systems, our model highlights the potential to integrate such classification into virtual fencing platforms, contributing to sustainable livestock production at scale.

Data availability statement

Data can be found here https://github.com/HannahJames123/cattle-behavior.

Ethics statement

The animal studies were approved by Cornell University, Ithaca, NY, USA. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent was obtained from the owners for the participation of their animals in this study.

Author contributions

HJ: Conceptualization, Data curation, Methodology, Writing – review & editing, Investigation, Validation, Writing – original draft, Software, Formal Analysis, Visualization. CR: Writing – review & editing, Data curation, Investigation. JG: Project administration, Writing – review & editing, Supervision, Funding acquisition, Resources. DE: Funding acquisition, Resources, Writing – review & editing.

Funding

The author(s) declared financial support was received for this work and/or its publication. We gratefully acknowledge the generous support of the Bezos Earth Fund for funding this Virtual Fencing project, which emphasizes accessibility to technology in low- and middle-income countries and sustainable agriculture. Furthermore, this research was supported by the USDA National Institute of Food and Agriculture (NIFA) Farm of the Future project #2023-77038-38865.

Acknowledgments

Special thanks to Juan Boza and Darke Hull for their invaluable contributions in troubleshooting and ideating artificial intelligence methodologies.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. The author(s) verify and take full responsibility for the use of generative AI in the preparation of this manuscript. Generative AI was used to assist with language editing, clarity improvements, and formatting of text. All scientific content, analyses, and interpretations were developed and verified by the authors.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Author disclaimer

Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the National Institute of Food and Agriculture (NIFA) or the United States Department of Agriculture (USDA).

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fanim.2026.1676504/full#supplementary-material

Supplementary Figure 1 | Three-dimensional scatter plots of raw accelerometer values (X, Y, Z; g) from Dataset 5 illustrating (A) walking, (B) standing, and (C) eating behaviors. All panels use identical axis limits to facilitate direct comparison. The notation n= indicates the number of labeled samples in each class. Clear differences in signal distribution are visible: walking samples (A) show greater spread across all three axes, reflecting dynamic movement; standing samples (B) cluster more tightly, indicating limited motion; and standing-eating samples (C) form a dense cluster with moderate variability, capturing head and jaw movements superimposed on a standing posture.

Supplementary Figure 2 | Workflow for the proposed Random Forest model.

References

Aaser M. F., Staahltoft S. K., Andersen M., Alstrup A. K. O., Sonne C., Bruhn D., et al. (2024). Using activity measures and GNSS data from a virtual fencing system to assess habitat preference and habitat utilisation patterns in cattle. Animals 14, 1506.

Alghamdi S., Zhao Z., Ha D. S., Morota G., and Ha S. S. (2022). Improved pig behavior analysis by optimizing window sizes for individual behaviors on acceleration and angular velocity data. J. Anim. Sci. 100, skac293. doi: 10.1093/jas/skac293

Andersen S. (2024). Cattle classification dataset (dataset_6.csv) (GitHub). Available online at: https://github.com/andssuu/cattle_classification/blob/main/data/dataset_6.csv (Accessed January 19, 2026).

Aquilani C., Confessore A., Bozzi R., Sirtori F., and Pugliese C. (2022). Precision Livestock Farming technologies in pasture-based livestock systems. Animal 16, 100429. doi: 10.1016/j.animal.2021.100429

Arablouei R. (2021). In-situ classification of cattle behavior using accelerometry data. Comput. Electron. Agric. 183, 106045. doi: 10.1016/j.compag.2021.106045

Arablouei R. (2023a). Animal behavior classification via deep learning on embedded systems. Comput. Electron. Agric. 207, 107707. doi: 10.1016/j.compag.2023.107707

Arablouei R., Wang Z., Bishop-Hurley G. J., and Liu J. (2023b). Multimodal sensor data fusion for in-situ classification of animal behavior using accelerometry and GNSS data. Smart Agric. Technol. 4, 100163. doi: 10.1016/j.atech.2022.100163

Blagus R. and Lusa L. (2013). SMOTE for high-dimensional class-imbalanced data. BMC Bioinf. 14, 106. doi: 10.1186/1471-2105-14-106

Breiman L. (2001). Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324

Brennan J., Johnson P., and Olson K. (2021). Classifying season long livestock grazing behavior with the use of a low-cost GPS and accelerometer. Comput. Electron. Agric. 181, 105957. doi: 10.1016/j.compag.2020.105957

Cabezas J., Yubero R., Visitación B., Navarro-García J., Algar M. J., Cano E. L., et al. (2022). Analysis of accelerometer and GPS data for cattle behaviour identification and anomalous events detection. Entropy 24, 336. doi: 10.3390/e24030336

Chawla N. V., Bowyer K. W., Hall L. O., and Kegelmeyer W. P. (2002). SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357. doi: 10.1613/jair.953

Chelotti J. O., Martinez-Rau L. S., Ferrero M., Vignolo L. D., Galli J. R., Planisich A. M., et al. (2024). Livestock feeding behaviour: A review on automated systems for ruminant monitoring. Biosyst. Eng. 246, 150–177. doi: 10.1016/j.biosystemseng.2024.08.003

Coelho Ribeiro L. A., Bresolin T., Rosa G. J. M., Rume Casagrande D., et al. (2021). Disentangling data dependency using cross-validation strategies to evaluate prediction quality of cattle grazing activities using machine learning algorithms and wearable sensor data. J. Anim. Sci. 99, skab206. doi: 10.1093/jas/skab206

Cunningham S. A., Augustine D. J., Derner J. D., Smith D., and Boudreau M. R. (2024). In search of an optimal bio-logger epoch and device combination for quantifying activity budgets in free-ranging cattle. Smart Agric. Technol. 9, 100646. doi: 10.1016/j.atech.2024.100646

Decandia M., Giovanetti V., Molle G., Acciaro M., Mameli M., Cabiddu A., et al. (2018). The effect of different time epoch settings on the classification of sheep behaviour using tri-axial accelerometry. Comput. Electron. Agric. 154, 112–119. doi: 10.1016/j.compag.2018.09.002

Dhakshinamoorthy D., Jha A., Majumdar S., Ghosh D., Chakraborty R., and Ray H. (2025). Classification of cattle behavior and detection of heat (estrus) using sensor data. arXiv preprint arXiv:2506.16380. Available online at: https://arxiv.org/abs/2506.16380 (Accessed January 19, 2026).

Dickinson E. R., Twining J. P., Wilson R., Stephens P. A., Westander J., Marks N., et al. (2021). Limitations of using surrogates for behaviour classification of accelerometer data: refining methods using random forest models in Caprids. Movement Ecol. 9, 28. doi: 10.1186/s40462-021-00265-7

Dittrich I., Gertz M., and Krieter J. (2019). Alterations in sick dairy cows’ daily behavioural patterns. Heliyon 5, e02902. doi: 10.1016/j.heliyon.2019.e02902

di Virgilio A., Morales J. M., Lambertucci S. A., Shepard E. L. C., and Wilson R. P. (2018). Multi-dimensional Precision Livestock Farming: a potential toolbox for sustainable rangeland management. PeerJ 6, e4867. doi: 10.7717/peerj.4867

Doeschl-Wilson A., Knap P. W., Opriessnig T., and More S. J. (2021). Review: Livestock disease resilience: from individual to herd level. Animal 15 (Suppl. 1), 100286. doi: 10.1016/j.animal.2021.100286

Doyle R. and Moran J. (2015). Cow talk: understanding dairy cow behaviour to improve their welfare on asian farms (Melbourne, Australia: CSIRO Publishing). doi: 10.1071/9781486301621

Džermeikaitė K., Bačėninaitė D., and Antanaitis R. (2023). Innovations in cattle farming: application of innovative technologies and sensors in the diagnosis of diseases. Animals 13, 780. doi: 10.3390/ani13050780

Fan D. (2023). Cow nose ring data set (Kaggle). Available online at: https://www.kaggle.com/datasets/fandaoerji/cow-nose-ring-data-set (Accessed January 29, 2026).

Fawaz H. I., Forestier G., Weber J., Idoumghar L., and Muller P.-A. (2019). Deep learning for time series classification: a review. Data Min. Knowl. Discov. 33, 917–963. doi: 10.1007/s10618-019-00619-1

Feng W., Fan D., Wu H., and Yuan W. (2024). Cow behavior recognition based on wearable nose rings. Animals 14, 1187. doi: 10.3390/ani14081187

Ferdinandy B., Gerencsér L., Corrieri L., Perez P., Újváry D., Csizmadia G., et al. (2020). Challenges of machine learning model validation using correlated behaviour data: Evaluation of cross-validation strategies and accuracy measures. PloS One 15, e0236092. doi: 10.1371/journal.pone.0236092

Fida B., Bernabucci I., Bibbo D., Conforto S., and Schmid M. (2015). Varying behavior of different window sizes on the classification of static and dynamic physical activities from a single accelerometer. Med. Eng. Phys. 37, 705–711. doi: 10.1016/j.medengphy.2015.04.005

Fogarty E. S., Swain D. L., Cronin G. M., Moraes L. E., and Trotter M. (2020). Behaviour classification of extensively grazed sheep using machine learning. Comput. Electron. Agric. 169, 105175. doi: 10.1016/j.compag.2019.105175

Goliński P., Sobolewska P., Stefańska B., and Golińska B. (2022). Virtual fencing technology for cattle management in the pasture feeding system—A review. Agriculture 13, 91. doi: 10.3390/agriculture13010091

González L. A., Bishop-Hurley G. J., Handcock R. N., and Crossman C. (2015). Behavioral classification of data from collars containing motion sensors in grazing cattle. Comput. Electron. Agric. 110, 91–102.

Halachmi I., Guarino M., Bewley J., and Pastell M. (2019). Smart animal agriculture: application of real-time sensors to improve animal well-being and production. Annu. Rev. Anim. Biosci. 7, 403–425. doi: 10.1146/annurev-animal-020518-114851

He H. and Ma Y. (2013). Imbalanced learning: Foundations, algorithms, and applications (Hoboken, NJ, USA: Wiley-IEEE Press). doi: 10.1002/9781118646106

Hoffman B., Cusimano M., Baglione V., Canestrari D., Chevallier D., DeSantis D. L., et al. (2024). A benchmark for computational analysis of animal behavior, using animal-borne tags. Movement Ecol. 12, 78. doi: 10.1186/s40462-024-00511-8

Hosseininoorbin S., Layeghy S., Kusy B., Jurdak R., Greenwood P. L., and Portmann M. (2021). Deep learning-based cattle behaviour classification using joint time-frequency data representation. Comput. Electron. Agric. 187, 106241. doi: 10.1016/j.compag.2021.106241

Ito H., Takeda K., Tokgoz K. K., Minati L., Fukawa M., Chao L., et al. (2021). Japanese black beef cow behavior classification dataset. [Dataset]. Zenodo. doi: 10.5281/zenodo.5399259

James H. (2024). CURC cattle behavior dataset and random forest model code. Available online at: https://github.com/HannahJames123/cattle-behavior (Accessed January 19, 2026).

James H. and Rial C. (2024). Cattle behavior and movement data (CURC accel). [Dataset]. GitHub. Available online at: https://github.com/HannahJames123/cattle-behavior/tree/main/curc%20data%20accel.

Janaa R., Dixit S., Sharma M., and Kumar R. (2025). An explainable AI based approach for monitoring animal health. arXiv preprint arXiv:2508.10210. doi: 10.48550/arXiv.2508.10210

Kamphuis C., Frank E., Burke J. K., Verkerk G. A., and Jago J. G. (2013). Applying additive logistic regression to data derived from sensors monitoring behavioral and physiological characteristics of dairy cows to detect lameness. J. Dairy Sci. 96, 7043–7053. doi: 10.3168/jds.2013-6993

Kleanthous N., Hussain A., Mason A., Sneddon J., Shaw A., Fergus P., et al. (2018). “Machine learning techniques for classification of livestock behavior,” in International Conference on Neural Information Processing (Cham: Springer International Publishing), 304–315.

Lovarelli D. (2020). A review on dairy cattle farming: Is precision livestock farming the compromise for an environmental, economic and social sustainable production? J. Clean. Prod. 262, 121409. doi: 10.1016/j.jclepro.2020.121409

Mladenova T. (2024). Accelerometer and gyroscope sensor data from cows. Available online at: https://figshare.com/articles/dataset/Accelerometer_and_Gryscope_Sensor_Data_from_Cows/25920463 (Accessed January 19, 2026).

Mladenova T., Valova I., Evstatiev B., Valov N., Varlyakov I., Markov T., et al. (2024). Evaluation of the efficiency of machine learning algorithms for identification of cattle behavior using accelerometer and gyroscope data. AgriEngineering 6, 2179–2197. doi: 10.3390/agriengineering6030128

Monteiro A., Santos S., and Gonçalves P. (2021). Precision agriculture for crop and livestock farming—Brief review. Animals 11, 2345. doi: 10.3390/ani11082345

Murray B. D., Wagner K. L., Reuter R., and Goodman L. E. (2025). Use of virtual fencing to implement critical conservation practices. Rangelands 47, 41–49. doi: 10.1016/j.rala.2024.08.003

Muzzo B. I., Bladen K., Perea A., Nyamuryekung’e S., and Villalba J. J. (2025). Multi-sensor integration and machine learning for high-resolution classification of herbivore foraging behavior. Animals 15, 913. doi: 10.3390/ani15070913

Nathan R., Spiegel O., Fortmann-Roe S., Harel R., Wikelski M., and Getz W. M. (2012). Using tri-axial acceleration data to identify behavioral modes of free-ranging animals: general concepts and tools illustrated for griffon vultures. J. Exp. Biol. 215, 986–996. doi: 10.1242/jeb.058602

Neethirajan S. (2017). Recent advances in wearable sensors for animal health management. Sens. Bio-Sensing Res. 12, 15–29. doi: 10.1016/j.sbsr.2016.11.004

Neethirajan S. (2020). The role of sensors, big data and machine learning in modern animal farming. Sens. Bio-Sensing Res. 29, 100367. doi: 10.1016/j.sbsr.2020.100367

Ni G., Shi Z., Xu Y., Han X., Miao J., and Tang W. (2025). “Optimization of inertial sensor’s sampling frequency and window length for classification of livestock behavior with machine learning,” in 2025 IEEE 2nd International Conference on Electronics, Communications and Intelligent Science (ECIS) (Piscataway, NJ, USA: IEEE), 1–6. doi: 10.1109/ECIS65594.2025.11087006

Ordóñez F. J. and Roggen D. (2016). Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Basel) 16, 115. doi: 10.3390/s16010115

Pandala S. R. (2021). LazyPredict: A simple and easy-to-use library for automatic model selection (PyPI). Available online at: https://pypi.org/project/lazypredict/.

Paudyal S., Maunsell F. P., Richeson J. T., Risco C. A., Donovan D. A., and Pinedo P. J. (2018). Rumination time and monitoring of health disorders during early lactation. Animal 12, 1484–1492. doi: 10.1017/S1751731117002932

Probst P., Boulesteix A. L., and Bischl B. (2019a). Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 20, 1–32. doi: 10.48550/arXiv.1802.09596

Probst P., Wright M. N., and Boulesteix A. L. (2019b). Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9, e1301. doi: 10.1002/widm.1301

Rahman A., Smith D. V., Little B., Ingham A. B., Greenwood P. L., and Bishop-Hurley G. J. (2018). Cattle behaviour classification from collar, halter, and ear tag sensors. Inf. Process. Agric. 5, 124–133. doi: 10.1016/j.inpa.2017.10.001

Rast W., Kimmig S. E., Giese L., and Berger A. (2020). Machine learning goes wild: Using data from captive individuals to infer wildlife behaviours. PloS One 15, e0227317. doi: 10.1371/journal.pone.0227317

Reis I. (2019). Probabilistic random forest: A machine learning algorithm for noisy data sets. Astron. J. 157, 16. doi: 10.3847/1538-3881/aaf101

Rial C., Stangaferro M. L., Thomas M. J., and Giordano J. O. (2024). Effect of automated health monitoring based on rumination, activity, and milk yield alerts versus visual observation on herd health monitoring and performance outcomes. J. Dairy Sci. 107, 11576–11596. doi: 10.3168/jds.2024-25256

Robert B., White B. J., Renter D. G., and Larson R. L. (2009). Evaluation of three-dimensional accelerometers to monitor and classify behavior patterns in cattle. Comput. Electron. Agric. 67, 80–84. doi: 10.1016/j.compag.2009.03.002

Roberts H. and Segev A. (2020). “Animal behavior prediction with long short-term memory,” in 2020 IEEE International Conference on Big Data (Big Data) (Piscataway, NJ, USA: IEEE). doi: 10.1109/BigData50022.2020.9378184

Russel N. S. (2024). Decoding cow behavior patterns from accelerometer data using deep learning. J. Vet. Behav. 74, 68–78. doi: 10.1016/j.jveb.2024.06.005

Sharma B. and Koundal D. (2018). Cattle health monitoring system using wireless sensor network: A survey from innovation perspective. IET Wireless Sensor Syst. 8, 143–151. doi: 10.1049/iet-wss.2017.0060

Silva M. A., Veronese A., Belli A., Madureira E. H., Galvão K. N., and Chebel R. C. (2021). Effects of adding an automated monitoring device to the health screening of postpartum Holstein cows on survival and productive and reproductive performances. J. Dairy Sci. 104, 3439–3457. doi: 10.3168/jds.2020-18562

Stangaferro M. L., Wijma R., Caixeta L. S., Al-Abri M. A., and Giordano J. O. (2016). Use of rumination and activity monitoring for the identification of dairy cows with health disorders: Part II. Mastitis. J. Dairy Sci. 99, 7411–7421. doi: 10.3168/jds.2016-10908

Sugali K., Sprunger C., and Inukollu V. N. (2021a). AI testing: ensuring a good data split between data sets (training and test) using K-means clustering and decision tree analysis. Int. J. Soft Comput. 12. doi: 10.5121/ijsc.2021.12101

Versluijs E., Niccolai L. J., Spedener M., Zimmermann B., Hessle A., Tofastrud M., et al. (2023). Classification of behaviors of free-ranging cattle using accelerometry signatures collected by virtual fence collars. Front. Anim. Sci. 4, 1083272. doi: 10.3389/fanim.2023.1083272

Walton E., Casey C., Mitsch J., Vázquez-Diosdado J. A., Yan J., Dottorini T., et al. (2018). Evaluation of sampling frequency, window size and sensor position for classification of sheep behaviour. R. Soc. Open Sci. 5, 171442. doi: 10.1098/rsos.171442

Wätzold F., Jauker F., Komainda M., Schöttker O., Horn J., Sturm A., et al. (2024). Harnessing virtual fencing for more effective and adaptive agri-environment schemes to conserve grassland biodiversity. Biol. Conserv. 297, 110736. doi: 10.1016/j.biocon.2024.110736

Williams M. L., Mac Parthaláin N., Brewer P., James W. P. J., and Rose M. T. (2016). A novel behavioral model of the pasture-based dairy cow from GPS data using data mining and machine learning techniques. J. Dairy Sci. 99, 2063–2075. doi: 10.3168/jds.2015-10254

Ye L. and Keogh E. (2009). “Time series shapelets: a new primitive for data mining,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA: ACM), 947–956.

Zhang L., Wang X., and Chen Y. (2022). CNN and transfer learning-based classification model for automated cow’s feeding behavior recognition from accelerometer data. bioRxiv [Preprint]. doi: 10.1101/2022.07.03.498612

Zhen T. (2025). Optimization strategies for low-power AI models on embedded devices. Appl. Comput. Eng. 133, 38–45. doi: 10.54254/2755-2721/2025.20598

Zheng Y., Wong W. K., Guan X., and Trost S. (2013). “Physical activity recognition from accelerometer data using a multi-scale ensemble method,” in Proceedings of the AAAI Conference on Artificial Intelligence 27, 1575–1581.

Keywords: accelerometer, cattle behavior classification, machine learning, precision livestock farming, random forest, virtual fencing, wearable sensors

Citation: James H, Rial C, Giordano J and Erickson D (2026) Advancing standardized cattle behavior classification with a random forest model evaluated across diverse datasets. Front. Anim. Sci. 7:1676504. doi: 10.3389/fanim.2026.1676504

Received: 30 July 2025; Revised: 23 December 2025; Accepted: 09 January 2026;
Published: 30 January 2026.

Edited by:

Christa Egger-Danner, ZuchtData EDV-Dienstleistungen GmbH, Austria

Reviewed by:

Seyed Abbas Rafat, University of Tabriz, Iran
Thai Ha Dang, Pukyong National University, Republic of Korea

Copyright © 2026 James, Rial, Giordano and Erickson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: David Erickson, de54@cornell.edu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.