
ORIGINAL RESEARCH article

Front. Bioinform.

Sec. Genomic Analysis

This article is part of the Research Topic AI in Genomic Analysis

A Novel and Accelerated Method for Integrated Alignment and Variant Calling from Short and Long Reads

Provisionally accepted
Jinnan Hu*, Donald Freed, Hanying Feng, Hong Chen, Zhipan Li, Haodong Chen
  • Sentieon Inc, San Jose, United States

The final, formatted version of the article will be published soon.

Background: Integrating short-read and long-read sequencing technologies has become a promising approach for accurate and comprehensive genomic analysis. Short-read sequencing (e.g., Illumina) offers high base accuracy and cost efficiency, but it struggles with structural variant (SV) detection and complex genomic regions. In contrast, long-read sequencing (e.g., PacBio) excels at resolving large SVs and repetitive sequences but is limited by throughput, higher Indel error rates, and sequencing cost. Hybrid approaches can combine these technologies, leveraging their complementary strengths and distinct sources of error. This can provide higher accuracy and more comprehensive results. It can also provide higher throughput by lowering the coverage requirement for the long reads.

Methods: This study benchmarks the DNAscope Hybrid pipeline, a novel integrated alignment and variant calling framework that combines short- and long-read data sequenced from the same sample. The DNAscope Hybrid pipeline is a bioinformatics pipeline that runs on generic x86 CPUs. We evaluate its performance across multiple human genome reference datasets (HG002–HG004) using the draft Q100 and Genome in a Bottle (GIAB) v4.2.1 benchmarks. We assess the pipeline's ability to detect small variants (SNPs and Indels), structural variants (SVs), and copy number variations (CNVs), using Illumina and PacBio data at varying read depths (5x–30x). Benchmark results are compared with DeepVariant.

Results: The DNAscope Hybrid pipeline significantly improves SNP and Indel calling accuracy, particularly in complex genomic regions. At lower long-read depths (e.g., 5x–10x), the hybrid approach outperforms standalone short- or long-read pipelines run at full sequencing depth (30x–35x), reducing variant calling errors by at least 50%. Additionally, DNAscope Hybrid outperforms leading open-source tools for SV and CNV detection and enhances variant discovery in challenging genomic regions. The pipeline also demonstrates clinical utility by identifying variants in disease-associated genes. Moreover, DNAscope Hybrid is highly efficient, achieving runtimes of less than 90 minutes on a single standard instance.

Conclusion: The DNAscope Hybrid pipeline is a computationally efficient, highly accurate variant calling framework that leverages the advantages of both short- and long-read sequencing. It improves variant detection in challenging genomic regions and offers a robust solution for clinical and large-scale genomic applications. For these reasons, it holds significant promise for genetic disease diagnostics, population-scale studies, and personalized medicine.
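The abstract reports that the hybrid approach reduces variant calling errors by at least 50% relative to full-depth single-technology pipelines. The Python sketch below shows one way such a relative reduction can be computed from a truth-set comparison.

It rests on one assumption. Errors are counted as false positives plus false negatives against a benchmark truth set, which are the counts that GIAB-style comparison tools such as hap.py report. The `BenchmarkCounts` class, the `error_reduction` function, and every number below are hypothetical illustrations. They are not the study's method or results.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    """Hypothetical counts from comparing a call set against a truth set."""
    tp: int  # true positives: calls that match the truth set
    fp: int  # false positives: calls absent from the truth set
    fn: int  # false negatives: truth-set variants that were missed

    @property
    def errors(self) -> int:
        # Assumption: total errors are counted as FP + FN.
        return self.fp + self.fn

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


def error_reduction(baseline: BenchmarkCounts, candidate: BenchmarkCounts) -> float:
    """Fraction of the baseline's errors eliminated by the candidate pipeline."""
    return 1 - candidate.errors / baseline.errors


# Illustrative numbers only; these are NOT results from the study.
# Both call sets are compared against the same truth set, so TP + FN is equal.
short_read_full_depth = BenchmarkCounts(tp=3_300_000, fp=6_000, fn=14_000)
hybrid_low_depth = BenchmarkCounts(tp=3_308_000, fp=3_000, fn=6_000)

reduction = error_reduction(short_read_full_depth, hybrid_low_depth)
print(f"baseline errors: {short_read_full_depth.errors}")
print(f"hybrid errors:   {hybrid_low_depth.errors}")
print(f"error reduction: {reduction:.0%}")
print(f"hybrid F1:       {hybrid_low_depth.f1:.4f}")
```

Counting errors as FP + FN keeps the comparison symmetric. A pipeline cannot appear better simply by emitting fewer calls, because any true variants it drops are counted as false negatives.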

Keywords: NGS (next-generation sequencing), Secondary analysis, Variant calling, Hybrid analysis, Machine learning, Accelerated analysis

Received: 22 Aug 2025; Accepted: 30 Oct 2025.

Copyright: © 2025 Hu, Freed, Feng, Chen, Li and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jinnan Hu, jinnan.hu@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.