AUTHOR=Montesinos López Osval Antonio, Mosqueda González Brandon Alejandro, Palafox González Abel, Montesinos López Abelardo, Crossa José

TITLE=A General-Purpose Machine Learning R Library for Sparse Kernels Methods With an Application for Genome-Based Prediction

JOURNAL=Frontiers in Genetics

VOLUME=Volume 13 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.887643

DOI=10.3389/fgene.2022.887643

ISSN=1664-8021

ABSTRACT=The adoption of machine learning in areas beyond computer science has been facilitated by the development of user-friendly software tools that require only an intermediate level of computer programming knowledge. In this paper, we present a new software package, SKM (sparse kernel methods), developed in the R language, that implements six of the most popular supervised machine learning algorithms (generalized boosted machines, generalized linear models, support vector machines, random forest, Bayesian regression models and deep neural networks) with the optional use of sparse kernels. SKM focuses on user simplicity: it does not try to include all available machine learning algorithms but rather the most important aspects of these six algorithms in an easy-to-understand format. Another relevant contribution of the package is a function that computes seven different kernels (Linear, Polynomial, Sigmoid, Gaussian, Exponential, Arc-Cosine 1 and Arc-Cosine L) and their sparse versions, which enables the creation of kernel machines without modifying the statistical machine learning algorithm. The main contribution of the package lies in this computation of the sparse versions of the seven basic kernels, which is of paramount importance for reducing the computational resources required by kernel machine learning methods without a significant loss in prediction performance. The performance of SKM is evaluated in a genome-based prediction framework using both a maize and a wheat data set. However, the use of this package is not restricted to genome prediction problems, and it can be used in many different applications.