AUTHOR=Huang Yen-Hsiang , Ku Hsin-Mei , Wang Chong-An , Chen Ling-Yu , He Shan-Syue , Chen Shu , Liao Po-Chun , Juan Pin-Yuan , Kao Chung-Feng TITLE=A multiple phenotype imputation method for genetic diversity and core collection in Taiwanese vegetable soybean JOURNAL=Frontiers in Plant Science VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.948349 DOI=10.3389/fpls.2022.948349 ISSN=1664-462X ABSTRACT=The establishment of vegetable soybean (edamame) [Glycine max (L.) Merr.] germplasms has been highly valued in Asia and the US, owing to the increasing market demand for edamame. The idea of a core collection (CC) is to shorten the breeding program so as to improve the availability of germplasm resources. However, multi-dimensional phenotypes are typically highly correlated and have different missing rates, so analyses often fail to capture the underlying pattern of the germplasms and to select the CC precisely. These problems are commonly observed in correlated samples. To address this scenario, we introduced the “multiple imputation” (MI) method to iteratively impute missing phenotypes for 46 morphological traits, and jointly analyzed the high-dimensional imputed missing phenotypes (ECimpu) to explore population structure and relatedness among 200 Taiwanese vegetable soybean accessions. The advanced maximization strategy with a heuristic algorithm, as implemented in PowerCore, was used to evaluate the morphological diversity among the ECimpu. In total, 36 accessions (denoted as CCimpu) were efficiently selected, representing high diversity and the entire coverage of the ECimpu. Only 4 traits (8.7%) showed slightly significant differences between CCimpu and ECimpu. Compared with the ECimpu, 96% of traits in the CCimpu retained all characteristics or lost only slight diversity. 
The CCimpu exhibited a small percentage of significant mean differences (4.51%) and a large coincidence rate (98.10%), variable rate (138.76%), and coverage (close to 100%), indicating that it is representative of the ECimpu. We noted that CCimpu outperformed CCraw in evaluation properties, suggesting that the multiple-phenotype imputation method has the potential to deal with missing phenotypes in correlated samples efficiently and reliably without re-phenotyping accessions. Our results illustrate the significant role of imputed missing phenotypes, in support of the MI-based framework for plant-breeding programs.
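
As a rough illustration of two of the evaluation properties named in the abstract, the sketch below computes a coincidence rate and a variable rate for a candidate core collection against the full collection. It assumes definitions commonly used in the core-collection literature (the mean ratio of trait ranges, and the mean ratio of coefficients of variation, each multiplied by 100); the abstract does not state the exact formulas the authors used. The function names, trait names, and values are hypothetical toy data, not results from the study.

```python
from statistics import mean, stdev

def coincidence_rate(ec, cc):
    """Assumed definition: mean over traits of range(CC) / range(EC), times 100."""
    ratios = []
    for trait, ec_values in ec.items():
        cc_values = cc[trait]
        ec_range = max(ec_values) - min(ec_values)
        cc_range = max(cc_values) - min(cc_values)
        ratios.append(cc_range / ec_range)
    return 100 * mean(ratios)

def variable_rate(ec, cc):
    """Assumed definition: mean over traits of CV(CC) / CV(EC), times 100."""
    ratios = []
    for trait, ec_values in ec.items():
        cc_values = cc[trait]
        ec_cv = stdev(ec_values) / mean(ec_values)  # sample standard deviation
        cc_cv = stdev(cc_values) / mean(cc_values)
        ratios.append(cc_cv / ec_cv)
    return 100 * mean(ratios)

# Hypothetical toy data: two quantitative traits for five accessions (full
# collection) and a three-accession subset standing in for a core collection.
full_collection = {
    "pod_length": [4.0, 5.0, 6.0, 7.0, 8.0],
    "seed_weight": [20.0, 30.0, 40.0, 50.0, 60.0],
}
core_collection = {
    "pod_length": [4.0, 6.0, 8.0],
    "seed_weight": [20.0, 40.0, 60.0],
}

print(f"coincidence rate: {coincidence_rate(full_collection, core_collection):.2f}%")
print(f"variable rate: {variable_rate(full_collection, core_collection):.2f}%")
```

Under these assumed definitions, a subset that keeps the extreme accessions preserves the full range of each trait, and a variable rate above 100% arises when the subset is relatively more dispersed than the full collection, which is the pattern the abstract reports for CCimpu.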