ORIGINAL RESEARCH article
Front. Plant Sci.
Sec. Plant Breeding
Volume 15 - 2024
doi: 10.3389/fpls.2024.1373318
This article is part of the Research Topic: Utilizing Machine Learning with Phenotypic and Genotypic Data to enhance Effective Breeding in Agricultural and Horticultural Crops
Enhancing genomic prediction with Stacking Ensemble Learning in Arabica Coffee
Provisionally accepted
- 1 Universidade Federal de Viçosa, Viçosa, Brazil
- 2 Brazilian Agricultural Research Corporation (EMBRAPA), Brasília, Distrito Federal, Brazil
- 3 University of Florida, Gainesville, Florida, United States
Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines the predictions of multiple models and may make selection more accurate still. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for four important traits in Coffea arabica: yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer). We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. Model performance was assessed with a cross-validation (CV) scheme. GBLUP, multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, several options were explored: Ridge Regression, RF, GBLUP, and Single Average. SEL was able to predict the important traits in Coffea arabica, and its predictive ability (PA) was higher than that of every base learner. The gains in PA relative to GBLUP, computed from the ratio between the PA of the best stacking model and that of GBLUP, were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL is a promising approach for GS: by combining predictions from multiple models, it can potentially enhance the PA of GS for complex traits.
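The stacking workflow summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes scikit-learn, simulated marker data (195 individuals, but only 300 markers to keep it fast), and two stand-in base learners: ridge regression on markers in place of GBLUP, and a random forest. MARS and QRF are omitted. The meta-learner is ridge regression, one of the options the abstract lists. PA is taken here to be the Pearson correlation between observed and cross-validated predicted values, and the gain is computed as 100 × (PA_stacking / PA_reference − 1). Both are assumed readings of the abstract, and all function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict


def predictive_ability(y_true, y_pred):
    """Pearson correlation between observed and predicted values (assumed PA definition)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def relative_gain_percent(pa_reference, pa_stacking):
    """Gain in PA as a percentage, derived from the ratio of the two PAs (assumed formula)."""
    return 100.0 * (pa_stacking / pa_reference - 1.0)


def simulate_data(n_individuals=195, n_markers=300, n_causal=20, seed=0):
    """Simulated SNP matrix coded 0/1/2 and a phenotype; purely illustrative data."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    causal = rng.choice(n_markers, size=n_causal, replace=False)
    effects[causal] = rng.normal(size=n_causal)
    y = X @ effects + rng.normal(scale=2.0, size=n_individuals)
    return X, y


def build_stack(seed=0):
    """Stacking ensemble: base learners feed out-of-fold predictions to a ridge meta-learner."""
    base_learners = [
        ("ridge_markers", Ridge(alpha=100.0)),  # stand-in for GBLUP, not GBLUP itself
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
    ]
    # cv=5 controls the inner folds used to generate the meta-learner's training inputs.
    return StackingRegressor(estimators=base_learners, final_estimator=Ridge(alpha=1.0), cv=5)


X, y = simulate_data()
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Evaluate the reference model and the stack on the same outer folds.
models = {"ridge_only": Ridge(alpha=100.0), "stacking": build_stack()}
pa = {}
for name, model in models.items():
    y_hat = cross_val_predict(model, X, y, cv=outer_cv)
    pa[name] = predictive_ability(y, y_hat)

print("PA per model:", pa)
print("Relative gain (%):", relative_gain_percent(pa["ridge_only"], pa["stacking"]))
```

On simulated data the stack is not guaranteed to outperform the reference model. The sketch only shows how out-of-fold base predictions feed a meta-learner and how a PA ratio could be turned into a percentage gain.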
Keywords: Statistical and Machine Learning, prediction accuracy, plant breeding, ensemble methods, GBLUP
Received: 19 Jan 2024; Accepted: 12 Jun 2024.
Copyright: © 2024 Nascimento, Campana Nascimento, Azevedo, Baião, Caixeta and Jarquin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Moyses Nascimento, Universidade Federal de Viçosa, Viçosa, Brazil
Diego Jarquin, University of Florida, Gainesville, 32609, Florida, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Antonio Carlos Baião (affiliation 2)