Editorial: Machine Learning for Big Data Analysis: Applications in Plant Breeding and Genomics

learning calibration method (DL_M2) helped to increase the genome-enabled prediction performance in all datasets.
In terms of practical applications, there are many opportunities to apply state-of-the-art machine learning techniques. For example, ML can be used to improve the accuracy of plant phenotyping, and the predicted data can in turn be used for quantitative trait loci (QTL) studies. Traditional phenotyping methods are usually labor-intensive, time-consuming, and prone to errors, whereas high-throughput phenotyping platforms can efficiently capture physiological traits related to photosynthesis and secondary metabolites that can enhance breeding efficiency. Kumar et al. evaluated supervised machine learning models for their accuracy in distinguishing water-stressed plants and in identifying the most important water stress-related parameters in lettuce. The authors reported that random forest (RF) reached an accuracy of 89.7% using kinetic chlorophyll fluorescence parameters, whereas the neural network (NN) reached an accuracy of 89.8% using hyperspectral imaging-derived vegetation indices. The top 10 parameters selected by RF and NN were then genetically mapped using the Lactuca sativa × L. serriola interspecific recombinant inbred line (RIL) population, allowing the identification of 25 QTL segregating for water stress-related traits, 26 for the chlorophyll fluorescence traits, and 34 for spectral vegetation indices (VI). Shin and Nuzhdin also adopted random forest models to implement a sample prioritization scheme, showing how ML facilitated the identification of predictive causal markers in most of the biological scenarios simulated in their study.
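The parameter-ranking step described above, in which a tree ensemble is used to pick the most informative traits before mapping, can be illustrated with a toy sketch. It does not reproduce the pipeline of Kumar et al.: the data are synthetic, the trait names are hypothetical, and a bagged ensemble of one-split decision stumps with random feature subsets stands in for a full random forest. Traits are ranked by how often they provide the best split.

```python
import random
from statistics import mean

# Hypothetical trait names; only the first two carry signal in this toy data.
TRAITS = ["signal_1", "signal_2", "noise_1", "noise_2", "noise_3", "noise_4"]
INFORMATIVE = {0, 1}


def make_data(n_per_class, rng):
    """Synthetic plants: label 1 = water-stressed, 0 = control."""
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Informative traits shift by 3 SD under stress; noise traits do not.
            row = [
                rng.gauss(3.0 * label if j in INFORMATIVE else 0.0, 1.0)
                for j in range(len(TRAITS))
            ]
            rows.append(row)
            labels.append(label)
    return rows, labels


def stump_accuracy(rows, labels, j):
    """Training accuracy of a one-split stump on trait j.

    The split is placed midway between the two class means (a simplification
    of the exhaustive split search used by real decision trees).
    """
    m0 = mean(r[j] for r, y in zip(rows, labels) if y == 0)
    m1 = mean(r[j] for r, y in zip(rows, labels) if y == 1)
    cut = (m0 + m1) / 2.0
    hits = 0
    for r, y in zip(rows, labels):
        predicted_stressed = (r[j] > cut) == (m1 > m0)
        hits += predicted_stressed == (y == 1)
    return hits / len(rows)


def rank_traits(rows, labels, n_stumps=200, mtry=3, seed=0):
    """Rank traits by how often they give the best stump on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    wins = [0] * len(TRAITS)
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        xs = [rows[i] for i in idx]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip degenerate samples with a single class
            continue
        candidates = rng.sample(range(len(TRAITS)), mtry)  # random trait subset
        best = max(candidates, key=lambda j: stump_accuracy(xs, ys, j))
        wins[best] += 1
    order = sorted(range(len(TRAITS)), key=lambda j: wins[j], reverse=True)
    return [(TRAITS[j], wins[j]) for j in order]


rows, labels = make_data(n_per_class=40, rng=random.Random(42))
ranking = rank_traits(rows, labels)
top_traits = [name for name, _ in ranking[:2]]
print(ranking)
```

In a real analysis, the top-ranked traits (the top 10 in the study summarized above) would then be carried forward as phenotypes for QTL mapping; a library implementation of random forest with its built-in importance measures would normally replace the hand-written ensemble.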
Besides QTL mapping and GS studies, this Research Topic also covered the power of ML in predicting regulatory sequences involved in stress tolerance mechanisms. In this context, Gupta et al. developed the gene regulation and association network (GRAiN) for rice (Oryza sativa). GRAiN is an interactive, query-based web platform built by applying a combination of different network inference algorithms to publicly available gene expression data. Its supervised machine learning framework converts intricate network connectivity patterns of transcription factors (TFs) into a single drought score, which allowed the prediction and validation of OsbHLH148 as an important player in rice drought stress.
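The idea of collapsing several network-derived features into one score per TF can be sketched as follows. This is an illustration only: the editorial does not describe the model behind GRAiN, so logistic regression is used here as a stand-in supervised learner, the features (the fraction of a TF's predicted targets that are drought-responsive in each of three inferred networks) are hypothetical, and the data are synthetic.

```python
import math
import random

N_NETWORKS = 3  # hypothetical number of inferred networks (one feature each)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_tfs(n_per_class, rng):
    """Synthetic TFs: label 1 = known drought regulator, 0 = other TF."""
    X, y = [], []
    for label, (lo, hi) in ((0, (0.1, 0.5)), (1, (0.5, 0.9))):
        for _ in range(n_per_class):
            X.append([rng.uniform(lo, hi) for _ in range(N_NETWORKS)])
            y.append(label)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def drought_score(x, w, b):
    """Single score in (0, 1) summarizing a TF's connectivity features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X, y = make_tfs(n_per_class=20, rng=random.Random(1))
w, b = fit_logistic(X, y)

# Two hypothetical query TFs: strongly vs. weakly connected to drought genes.
high = drought_score([0.8, 0.8, 0.8], w, b)
low = drought_score([0.2, 0.2, 0.2], w, b)
print(round(high, 3), round(low, 3))
```

A TF whose targets are consistently enriched for drought-responsive genes receives the higher score, which is the kind of ranking that would let a candidate regulator be prioritized for experimental validation.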
Computational algorithms can be successfully applied to the analysis of big data generated by cutting-edge NGS platforms. Niu et al. reported the de novo assembly of a macadamia tree genome using a combination of Oxford Nanopore and Hi-C (high-throughput chromosome conformation capture) sequencing technologies. Although no ML model was reported by the authors, the extensive analyses performed shed light on the genome evolution of this species, providing experimental support for detecting genes underlying the biosynthesis of unsaturated fatty acids and thus laying the basis for genomics-assisted breeding.
In conclusion, the papers collected in this Research Topic have presented some of the recent advances in the application of machine learning models to different omics disciplines, enhancing their integration toward the resolution of key biological questions. For this purpose, "explainable machine learning" will be a key area for genotype-to-phenotype research, especially in generating accurate predictions combined with reliable interpretations, as also reported by Danilevicz et al.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.