AUTHOR=Gabur Iulian , Simioniuc Danut Petru , Snowdon Rod J. , Cristea Dan TITLE=Machine Learning Applied to the Search for Nonlinear Features in Breeding Populations JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 5 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.876578 DOI=10.3389/frai.2022.876578 ISSN=2624-8212 ABSTRACT=Large plant breeding populations are traditionally a source of novel allelic diversity and stand at the core of selection efforts for elite material. Finding rare diversity requires a deep understanding of the biological interactions between the genetic makeup of a genotype and its environmental conditions. Most modern breeding programs still rely on linear regression models to solve this problem, generalizing the complex genotype-by-phenotype interactions through manually constructed linear features. However, distinguishing positive alleles from the genetic background can be addressed with deep learning approaches, which have the capacity to learn complex non-linear functions of the inputs. Machine learning (ML) is an artificial intelligence (AI) approach that uses a range of algorithms to learn from input datasets and predict outcomes in other, related samples. This article describes a variety of techniques, including supervised and unsupervised machine learning algorithms, to improve our understanding of non-linear interactions in plant breeding datasets. Feature selection methods are combined with linear and non-linear predictors and compared with the traditional linear prediction methods used in plant breeding. Recent advances in machine learning have allowed the construction of complex models that can better differentiate between positive alleles and the genetic background.
Using data from real plant breeding programs, we show that deep machine learning methods can outperform current approaches, increase prediction accuracy, drastically reduce computing time, and improve the detection of important alleles involved in qualitative or quantitative traits.