ORIGINAL RESEARCH article
Front. Bioinform.
Sec. Genomic Analysis
This article is part of the Research Topic: AI in Genomic Analysis
A Novel and Accelerated Method for Integrated Alignment and Variant Calling from Short and Long Reads
Provisionally accepted
Sentieon Inc, San Jose, United States
Background: Integrating short-read and long-read sequencing technologies has become a promising approach for achieving accurate and comprehensive genomic analysis. While short-read sequencing (e.g., Illumina) offers high base accuracy and cost efficiency, it struggles with structural variation (SV) detection and complex genomic regions. In contrast, long-read sequencing (e.g., PacBio) excels at resolving large SVs and repetitive sequences but is limited by throughput, higher Indel error rates, and sequencing costs. Hybrid approaches can combine these technologies, leveraging their complementary strengths and sources of error to provide higher accuracy, more comprehensive results, and higher throughput by lowering the coverage requirement for the long reads.
Methods: This study benchmarks the DNAscope Hybrid pipeline, a novel integrated alignment and variant calling framework that combines short- and long-read data sequenced from the same sample. The pipeline runs on generic x86 CPUs. We evaluate its performance across multiple human genome reference datasets (HG002–HG004) using the draft Q100 and Genome in a Bottle v4.2.1 benchmarks. The pipeline's ability to detect small variants (SNPs/Indels), structural variants (SVs), and copy number variations (CNVs) is assessed using data from the Illumina and PacBio sequencing systems at varying read depths (5x–30x). Benchmark results are compared with those of DeepVariant.
Results: The DNAscope Hybrid pipeline significantly improves SNP and Indel calling accuracy, particularly in complex genomic regions. At lower long-read depths (e.g., 5x–10x), the hybrid approach outperforms standalone short- or long-read pipelines run at full sequencing depths (30x–35x), reducing variant calling errors by at least 50%. Additionally, DNAscope Hybrid outperforms leading open-source tools for SV and CNV detection and enhances variant discovery in challenging genomic regions. The pipeline also demonstrates clinical utility by identifying variants in disease-associated genes. Moreover, DNAscope Hybrid is highly efficient, with runtimes of less than 90 minutes on a single standard instance.
Conclusion: The DNAscope Hybrid pipeline is a computationally efficient, highly accurate variant calling framework that leverages the advantages of both short- and long-read sequencing. By improving variant detection in challenging genomic regions and offering a robust solution for clinical and large-scale genomic applications, it holds significant promise for genetic disease diagnostics, population-scale studies, and personalized medicine.
Keywords: NGS - next generation sequencing, Secondary analysis, Variant calling, Hybrid analysis, machine learning, Accelerated analysis
Received: 22 Aug 2025; Accepted: 30 Oct 2025.
Copyright: © 2025 Hu, Freed, Feng, Chen, Li and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jinnan Hu, jinnan.hu@gmail.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
