AUTHOR=Duc Nguyen Trung , Ramlal Ayyagari , Rajendran Ambika , Raju Dhandapani , Lal S. K. , Kumar Sudhir , Sahoo Rabi Narayan , Chinnusamy Viswanathan TITLE=Image-based phenotyping of seed architectural traits and prediction of seed weight using machine learning models in soybean JOURNAL=Frontiers in Plant Science VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1206357 DOI=10.3389/fpls.2023.1206357 ISSN=1664-462X ABSTRACT=Among the seed attributes, weight is one of the main factors determining the harvest index of soybean. Recently, the focus of soybean breeding has shifted to improving seed size and weight in order to raise seed and oil yield. With recent technological advancements, imaging sensors are increasingly used to provide simple, real-time, non-destructive, inexpensive image data for rapid image-based prediction of seed traits in plant breeding programs. The present work applies digital image analysis of seed traits to the prediction of 100-seed weight (SW) in soybean. The image-based seed architecture traits (i-traits) measured were area size (AS), perimeter length (PL), length (L), width (W), length-to-width ratio (LWR), the intersection of length and width (IS), seed circularity (CS) and distance between IS and CG (DS). The phenotypic investigation revealed significant genetic variability among 164 soybean genotypes for both the i-traits and manually measured seed weight. Seven popular machine learning (ML) algorithms, namely Simple Linear Regression (SLR), Multiple Linear Regression (MLR), Random Forest (RF), Support Vector Regression (SVR), LASSO Regression (LR), Ridge Regression (RR) and Elastic Net regression (EN), were used to build models that predict the weight of soybean seeds from novel image-based features derived from Red-Green-Blue (RGB)/visual images.
Among the models, the random forest and multiple linear regression models, which use multiple explanatory variables related to seed size traits (AS, L, W and DS), were identified as the best models for predicting seed weight, with the highest prediction accuracy (coefficient of determination, R² = 0.98 and 0.94, respectively) and the lowest prediction error, i.e., root mean square error (RMSE) and mean absolute error (MAE). Finally, principal component analysis (PCA) and hierarchical clustering were used to identify IC538070 as a superior genotype with larger seed size and weight. The identified donors/traits can potentially be used in soybean improvement programs.
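
The three evaluation metrics named in the abstract (R², RMSE and MAE) can be illustrated with a minimal Python sketch. It assumes the common definition of R² as 1 minus the ratio of residual to total sum of squares; the abstract does not say which definition the authors used. The function name regression_metrics and the seed weights below are hypothetical, invented for illustration only, and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, rmse, mae) for paired observed and predicted values.

    Assumes R^2 = 1 - SS_res / SS_tot, which may differ from the
    definition used in the study.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical 100-seed weights in grams (not from the study).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

A higher R² together with lower RMSE and MAE indicates a better fit, which is the basis on which the abstract ranks the random forest and multiple linear regression models first.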