AUTHOR=Lüth Theresa, Schaake Susen, Grünewald Anne, May Patrick, Trinh Joanne, Weissensteiner Hansi
TITLE=Benchmarking Low-Frequency Variant Calling With Long-Read Data on Mitochondrial DNA
JOURNAL=Frontiers in Genetics
VOLUME=Volume 13 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.887644
DOI=10.3389/fgene.2022.887644
ISSN=1664-8021
ABSTRACT=Background: The quality of long-read sequencing has improved over the last decade, allowing for more accurate detection of somatic low-frequency variants. In this study, we used mixtures of mitochondrial samples with different haplogroups (i.e., a specific set of mitochondrial variants) to investigate the applicability of Nanopore sequencing for low-frequency single nucleotide variant detection.
Methods: We investigated the impact of base-calling, alignment/mapping, QC steps and variant calling by comparing the results to a previously derived short-read gold standard generated on the Illumina NextSeq. For Nanopore sequencing, six mixtures of four different haplotypes were prepared, allowing us to reliably check for expected variants at the predefined 5%, 2% and 1% mixture levels. We used two different versions of Guppy for base-calling, two aligners (i.e., Minimap2 and Ngmlr) and three variant callers (i.e., Mutserve2, Freebayes and Nanopanel2) to compare low-frequency variants. We used F1 scores to assess the performance of variant calling.
Results: We observed a mean read length of 11 kb and a mean overall read quality of 15. Ngmlr showed higher F1 scores but also higher allele frequencies (AF) of false-positive calls across the mixtures (mean F1 score = 0.83, false-positive AF < 0.17) compared to Minimap2 (mean F1 score = 0.82, false-positive AF < 0.06). Mutserve2 had the highest F1 scores of all callers across the mixture levels (5%-level: F1 score > 0.99, 2%-level: F1 score > 0.54, 1%-level: F1 score > 0.70).
Conclusion: Here we present a benchmark of low-frequency variant calling with Nanopore sequencing and identify its current limitations.
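The abstract evaluates variant calling with F1 scores against the expected variants of each mixture. The following is a minimal sketch of how such a score could be computed, not the authors' actual pipeline. It assumes that each variant is represented as a (position, alternate allele) pair; the function name `f1_score` and the example variants are hypothetical and chosen only for illustration.

```python
def f1_score(expected, called):
    """Return the F1 score of a call set against an expected variant set.

    Both arguments are iterables of hashable variant keys; here we assume
    (position, alternate_allele) tuples, which is an illustrative choice.
    """
    expected, called = set(expected), set(called)
    tp = len(expected & called)   # expected variants that were called
    fp = len(called - expected)   # called variants that were not expected
    fn = len(expected - called)   # expected variants that were missed
    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four expected variants, three of them recovered,
# plus one false-positive call.
expected_variants = {(73, "G"), (263, "G"), (750, "G"), (1438, "G")}
called_variants = {(73, "G"), (263, "G"), (750, "G"), (310, "C")}
score = f1_score(expected_variants, called_variants)
```

In this sketch one missed variant and one false-positive call are penalized equally, which is why a single summary value per caller and mixture level is convenient for comparing tools.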