AUTHOR=Won Sohyoung , Park Jong-Eun , Son Ju-Hwan , Lee Seung-Hwan , Park Byeong Ho , Park Mina , Park Won-Chul , Chai Han-Ha , Kim Heebal , Lee Jungjae , Lim Dajeong TITLE=Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00134 DOI=10.3389/fgene.2020.00134 ISSN=1664-8021 ABSTRACT=Genomic prediction is an effective way to estimate breeding values from genetic information using statistical methods such as best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the accuracy of genomic prediction, since the probability that a quantitative trait locus is in strong linkage disequilibrium (LD) with the markers is higher. For efficient use of haplotypes in genomic prediction, finding optimal ways to define haplotypes is essential. In this study, 770K SNP chip data were collected from a Hanwoo (Korean cattle) population consisting of 3498 cattle. Using the SNP chip data, haplotypes were defined in three different ways: based on 1) the length of the haplotypes (bp), 2) the number of SNPs included, and 3) agglomerative hierarchical clustering based on LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average of 5, 10, 20 or 50 SNPs were tested. A modified genomic BLUP (GBLUP) method using haplotypes was applied to test the prediction accuracy of each haplotype set. GBLUP using individual SNPs was also tested as a baseline for evaluating the performance of the haplotype sets in genomic prediction. Carcass weight (CWT), backfat thickness (BFT) and eye muscle area (EMA) were used as the phenotypes. 
As a result, haplotypes defined by all three methods showed increased accuracy compared with GBLUP using individual SNPs for all traits. LD clustering-based haplotypes with an average of 10 and 20 SNPs showed the highest prediction accuracy for CWT and BFT, respectively. For EMA, the highest accuracy was obtained when length-based haplotypes with 50 SNPs were used. The maximum gain in accuracy was 3.06% for EMA, suggesting that genomic prediction accuracy can be substantially increased by using haplotypes. When the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles, reducing computational costs. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can improve the performance of genomic prediction.
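
To make the two size-based haplotype definitions and the notion of "number of alleles" concrete, the following is a minimal Python sketch. It is not the authors' implementation. The function names, the toy phased genotypes, the SNP positions, and the choice to start each length window at its first SNP are all assumptions made for illustration. LD-based hierarchical clustering and the modified GBLUP model are not covered.

```python
from collections import Counter

def blocks_by_snp_count(n_snps, block_size):
    """Split SNP indices 0..n_snps-1 into consecutive blocks of `block_size` SNPs."""
    return [range(start, min(start + block_size, n_snps))
            for start in range(0, n_snps, block_size)]

def blocks_by_length(positions, window_bp):
    """Group SNP indices into consecutive windows spanning less than `window_bp` bp.

    Assumption: each window starts at the position of its first SNP.
    `positions` must be sorted in ascending order.
    """
    blocks, current, window_start = [], [], None
    for idx, pos in enumerate(positions):
        if window_start is None or pos - window_start >= window_bp:
            if current:
                blocks.append(current)
            current, window_start = [], pos
        current.append(idx)
    if current:
        blocks.append(current)
    return blocks

def haplotype_alleles(phased, block):
    """Count how often each distinct haplotype allele occurs within one block."""
    return Counter(tuple(hap[i] for i in block) for hap in phased)

# Hypothetical toy data: 4 phased chromosomes, 6 biallelic SNPs coded 0/1.
phased = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 0],
]
positions = [100, 250, 900, 1200, 1900, 2100]  # bp, hypothetical

count_blocks = blocks_by_snp_count(len(positions), 3)
length_blocks = blocks_by_length(positions, 1000)

for block in count_blocks:
    alleles = haplotype_alleles(phased, block)
    print(list(block), "distinct haplotype alleles:", len(alleles))
```

Each distinct haplotype allele in a block would act as one marker in a haplotype-based model, so a definition that yields fewer alleles per block means fewer marker effects to handle, which is the computational-cost point made in the abstract.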