AUTHOR=Amarilho-Silveira Fernando, De Barbieri Ignacio, Navajas Elly A., Cobuci Jaime Araujo, Ciappesoni Gabriel

TITLE=Machine learning approaches for predicting feed intake in Australian Merino, Corriedale, and Dohne Merino sheep

JOURNAL=Frontiers in Animal Science

VOLUME=Volume 6 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/animal-science/articles/10.3389/fanim.2025.1579974

DOI=10.3389/fanim.2025.1579974

ISSN=2673-6225

ABSTRACT=Feed intake is a challenging trait to measure due to the high costs associated with labor, feeding, and facilities. Machine learning approaches that treat other traits as potential predictors offer a cost-effective alternative to direct feed intake measurement. By leveraging existing animal data, these models can optimize resources and enable feed intake estimation across a larger population without the need for labor-intensive trials. This research aimed to test combinations of feature selection and prediction models to find the best feed intake (expressed as metabolizable energy intake) prediction approach for a dataset comprising Australian Merino, Corriedale, and Dohne Merino data. The study dataset with 1,708 observations included 920 Australian Merino, 215 Corriedale, and 337 Dohne Merino sheep from 17 feed intake trials conducted between 2019 and 2022. The dataset was randomly partitioned into two subsets: one for training the algorithms (80%) and the other for direct validation (20%). Feature selection methods included track analysis, stepwise model, and principal components analysis. The prediction models were stepwise, linear regression, nonlinear regression, k-nearest neighbor regression, random forest regression, and support vector machines. The highest R² value was found in the support vector machines using the stepwise model for feature selection, with a value of 0.91 in the cross-validation of the training dataset, and Pearson and Spearman correlation coefficients of 0.95 and 0.93, respectively. In direct validation, the k-nearest neighbor model with the stepwise feature selection model presented the highest Pearson and Spearman correlation coefficients, with values of 0.92 and 0.90, respectively. In the confusion matrix, the support vector machines with stepwise feature selection showed the best performance. The model correctly distinguished between high and low metabolizable energy intake in all cases, achieving an overall accuracy of 0.76. This indicates that support vector machines effectively capture the underlying patterns of feed intake distribution. The approaches that presented the best performance balance in both cross-validation and direct validation were the k-nearest neighbor model and the support vector machines using the stepwise model for feature selection.