AUTHOR=Liao Xiangyu, Zhu Wufei, Liu Chaoyun TITLE=A high-precision genome size estimator based on the k-mer histogram correction JOURNAL=Frontiers in Genetics VOLUME=Volume 15 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2024.1451730 DOI=10.3389/fgene.2024.1451730 ISSN=1664-8021 ABSTRACT=In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge. In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction. The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. We have evaluated GSET on both simulated and real datasets. The experimental results demonstrate that this tool can estimate genome size with greater precision, even surpassing the accuracy of state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce usable results. GSET is freely available for use and can be accessed at the following URL: https://github.com/Xingyu-Liao/GSET.