AUTHOR=Huang Yen-Hsiang, Ku Hsin-Mei, Wang Chong-An, Chen Ling-Yu, He Shan-Syue, Chen Shu, Liao Po-Chun, Juan Pin-Yuan, Kao Chung-Feng TITLE=A multiple phenotype imputation method for genetic diversity and core collection in Taiwanese vegetable soybean JOURNAL=Frontiers in Plant Science VOLUME=13 YEAR=2022 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.948349 DOI=10.3389/fpls.2022.948349 ISSN=1664-462X ABSTRACT=

Establishment of vegetable soybean (edamame) [Glycine max (L.) Merr.] germplasms has been highly valued in Asia and the United States owing to the increasing market demand for edamame. The idea of a core collection (CC) is to shorten the breeding program so as to improve the availability of germplasm resources. However, multidimensional phenotypes are typically highly correlated and have different missing rates, so analyses often fail to capture the underlying pattern of the germplasms and to select a CC precisely. Such problems are commonly observed in correlated samples. To address them, we introduced the “multiple imputation” (MI) method to iteratively impute missing phenotypes for 46 morphological traits, and jointly analyzed the high-dimensional imputed missing phenotypes (ECimpu) to explore population structure and relatedness among 200 Taiwanese vegetable soybean accessions. An advanced maximization strategy with a heuristic algorithm and PowerCore was used to evaluate the morphological diversity among the ECimpu. In total, 36 accessions (denoted as CCimpu) were efficiently selected, representing high diversity and full coverage of the ECimpu. Only 4 traits (8.7%) showed slightly significant differences between the CCimpu and the ECimpu. Compared to the ECimpu, 96% of traits retained all characteristics or had a slight diversity loss in the CCimpu. The CCimpu exhibited a small percentage of significant mean difference (4.51%) and a large coincidence rate (98.1%), variable rate (138.76%), and coverage (close to 100%), indicating that it is representative of the ECimpu. We noted that the CCimpu outperformed the CCraw in evaluation properties, suggesting that the multiple phenotype imputation method has the potential to deal with missing phenotypes in correlated samples efficiently and reliably without re-phenotyping accessions. Our results illustrate a significant role for imputed missing phenotypes, in support of the MI-based framework for plant-breeding programs.
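
The coincidence rate and variable rate quoted above can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in the core-collection literature: the coincidence rate is the mean, over traits, of the ratio of the core range to the entire-collection range, and the variable rate is the mean ratio of the coefficients of variation, both expressed as percentages. The abstract does not state the exact formulas used in the study, so these definitions, the function names, the trait names, and the toy values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def coincidence_rate(entire, core):
    """Assumed CR%: mean over traits of range(core) / range(entire), times 100."""
    ratios = []
    for trait, values in entire.items():
        range_entire = max(values) - min(values)
        range_core = max(core[trait]) - min(core[trait])
        ratios.append(range_core / range_entire)
    return 100 * mean(ratios)


def variable_rate(entire, core):
    """Assumed VR%: mean over traits of CV(core) / CV(entire), times 100."""
    ratios = []
    for trait, values in entire.items():
        cv_entire = stdev(values) / mean(values)
        cv_core = stdev(core[trait]) / mean(core[trait])
        ratios.append(cv_core / cv_entire)
    return 100 * mean(ratios)


# Hypothetical toy data: two traits measured on five accessions,
# with a three-accession core that keeps the extremes of each trait.
entire = {
    "pod_length": [4.0, 5.0, 6.0, 7.0, 8.0],
    "seed_weight": [20.0, 25.0, 30.0, 35.0, 40.0],
}
core = {
    "pod_length": [4.0, 6.0, 8.0],
    "seed_weight": [20.0, 30.0, 40.0],
}

cr = coincidence_rate(entire, core)
vr = variable_rate(entire, core)
print(f"CR% = {cr:.2f}, VR% = {vr:.2f}")
```

Under these assumed definitions, a core that keeps the extreme accessions preserves the full range of each trait while dropping intermediate values, so the variable rate can exceed 100%, which is consistent with a reported value such as 138.76%.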