AUTHOR=Luo Junwei, Chen Ranran, Zhang Xiaohong, Wang Yan, Luo Huimin, Yan Chaokun, Huo Zhanqiang TITLE=LROD: An Overlap Detection Algorithm for Long Reads Based on k-mer Distribution JOURNAL=Frontiers in Genetics VOLUME=Volume 11 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.00632 DOI=10.3389/fgene.2020.00632 ISSN=1664-8021 ABSTRACT=Third-generation sequencing technologies can produce large numbers of long reads, which have been widely used in many fields. When long reads are used for genome assembly, detecting overlaps between pairs of long reads is an important step. However, the sequencing error rate of third-generation sequencing technologies is very high, and obtaining accurate overlap detection results remains a challenging task. In this study, we present a long-read overlap detection algorithm (LROD) that can improve the accuracy of overlap detection results. To detect overlapping regions between two long reads, LROD first retains only the solid common k-mers between them, which simplifies the process of overlap detection. Second, LROD finds a chain (i.e., a candidate overlap) that includes the consistent common k-mers. In this step, LROD uses a two-stage strategy to evaluate whether two common k-mers are consistent. Finally, LROD uses a novel strategy to determine whether the candidate overlap is true and to revise it. To verify the performance of LROD, three simulated and two real long-read datasets are used in the experiments. Compared with two other popular methods (MHAP and Minimap2), LROD achieves good performance in terms of F1-score, precision, and recall. LROD is available from https://github.com/luojunwei/LROD.
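
The pipeline outlined in the abstract (keep solid common k-mers, then chain the consistent ones into a candidate overlap) can be illustrated with a minimal Python sketch. This is not the LROD implementation. The abstract does not define "solid" or the two-stage consistency test, so the sketch assumes that a solid k-mer is one whose count across the read set falls within a frequency range, and it substitutes a single greedy collinearity check for the two-stage strategy. All function names and thresholds (`solid_kmers`, `common_solid_kmers`, `greedy_chain`, `min_count`, `max_count`, `max_gap_diff`) are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def solid_kmers(reads, k, min_count=2, max_count=50):
    """Return k-mers whose count over all reads lies in [min_count, max_count].

    Assumption: the frequency-range definition and both thresholds are
    illustrative; the abstract does not specify them.
    """
    counts = Counter(km for read in reads for _, km in kmers(read, k))
    return {km for km, c in counts.items() if min_count <= c <= max_count}


def common_solid_kmers(a, b, k, solid):
    """Return sorted (pos_in_a, pos_in_b) pairs of solid k-mers shared by a and b."""
    index_b = {}
    for j, km in kmers(b, k):
        if km in solid:
            index_b.setdefault(km, []).append(j)
    return sorted((i, j) for i, km in kmers(a, k) for j in index_b.get(km, ()))


def greedy_chain(pairs, max_gap_diff=3):
    """Greedily chain hits that advance in both reads by similar distances.

    Assumption: this one-pass check stands in for LROD's two-stage
    consistency evaluation, which the abstract does not detail.
    """
    chain = []
    for i, j in pairs:
        if not chain:
            chain.append((i, j))
            continue
        prev_i, prev_j = chain[-1]
        step_a, step_b = i - prev_i, j - prev_j
        if step_a > 0 and step_b > 0 and abs(step_a - step_b) <= max_gap_diff:
            chain.append((i, j))
    return chain


if __name__ == "__main__":
    read_a = "AAACCGTTGACT"  # suffix CCGTTGACT overlaps the prefix of read_b
    read_b = "CCGTTGACTGGG"
    solid = solid_kmers([read_a, read_b], k=4)
    hits = common_solid_kmers(read_a, read_b, 4, solid)
    print(greedy_chain(hits))
```

In this toy input the two reads share a nine-base region, so every shared 4-mer lies on one diagonal and the chain keeps all of them. On real long reads with high error rates, shared k-mers are sparse and noisy, which is why a tolerance such as `max_gap_diff` (or a more careful strategy like the one the paper proposes) is needed.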