Frontiers | Advances in nucleic acid and protein sequence analysis

About this Research Topic

Submission closed

Background

This collection of articles explores recent computational advances in the analysis of complex genomic data, with emphasis on cancer genetics, mobile DNA elements, metagenomics, and small RNA profiling. The first article introduces a novel bootstrap resampling method to assess the statistical confidence of clone predictions from bulk tumor sequencing data, providing insights into tumor evolution and metastatic dynamics. The second article evaluates computational tools for detecting human endogenous retrovirus (HERV) insertions in short-read sequencing data, finding highly variable performance among programs and recommending specialized or consensus-based approaches coupled with experimental validation. The third article describes the Metagenomic Evaluation Tool Analyzer (META), a software platform that generates realistic simulated metagenomic datasets and helps researchers select sequencing and analysis pipelines; it also introduces the META Score for benchmarking classifier performance. Lastly, the fourth article presents miRPipe, a comprehensive framework for the accurate detection and quantification of small RNAs (including novel miRNAs and piRNAs) in next-generation sequencing data, which is shown to outperform existing methods on both synthetic and cancer datasets. Collectively, these studies highlight the critical role of robust bioinformatics pipelines in advancing genomic research and improving the reliability of complex biological data interpretation.
-----

The article collection highlights the rapid advancements and persistent challenges in computational genomics and sequencing analysis. Collectively, these studies showcase novel methodologies and benchmarking efforts for extracting reliable biological insights from next-generation sequencing data. The first article introduces a bootstrap-based framework to statistically assess the confidence in clonal sequence inference from bulk cancer sequencing, shedding light on the evolutionary dynamics of metastatic events. The second article rigorously compares computational tools for the detection of human endogenous retroviruses (HERVs), emphasizing variability in tool performance and advocating for multiple-tool and PCR-based validation strategies. The third article presents META, a flexible tool for simulating metagenomic data and benchmarking the accuracy and efficiency of metagenomic classifiers across various sequencing platforms, culminating in the development of the unified META Score for classifier evaluation. Finally, the fourth article addresses the need for reliable miRNA detection by introducing miRPipe, a comprehensive pipeline that outperforms existing methods in identifying known and novel miRNAs from small-RNA sequencing data, with particular relevance to cancer research. Together, these studies underscore the importance of unbiased evaluation, robust computational pipelines, and the integration of statistical confidence measures in advancing genomics research.
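As a rough illustration of what a bootstrap-based confidence measure looks like in general, the sketch below resamples a set of observations with replacement and reports how often a prediction is reproduced. It is not the method of the article summarized above: the function `bootstrap_support`, the toy predictor `majority_call`, and the example data are all hypothetical and chosen only to show the resampling idea.

```python
import random
from collections import Counter


def majority_call(observations):
    """Toy predictor (hypothetical): return the most frequent label."""
    return Counter(observations).most_common(1)[0][0]


def bootstrap_support(observations, predict, n_replicates=200, seed=0):
    """Estimate how often `predict` reproduces its full-data result.

    Each replicate draws len(observations) items with replacement,
    reruns the predictor, and checks agreement with the prediction
    made on the original data.
    """
    rng = random.Random(seed)
    reference = predict(observations)
    n = len(observations)
    agree = 0
    for _ in range(n_replicates):
        sample = rng.choices(observations, k=n)  # sample with replacement
        if predict(sample) == reference:
            agree += 1
    return reference, agree / n_replicates


if __name__ == "__main__":
    # Hypothetical data: nine observations support "A", one supports "B".
    data = ["A"] * 9 + ["B"]
    call, support = bootstrap_support(data, majority_call)
    print(call, support)
```

A support value close to 1 means the prediction is stable under resampling; a low value suggests it depends on a few observations.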
-----

The exceptional growth of biological sequence databases marks a new era for biology. Yet large-scale analysis of these data poses a computational challenge. This Research Topic focuses on algorithmic advances that lead to faster and/or more accurate nucleic acid and protein sequence analysis. It includes, but is not limited to, problems of sequence alignment, search, genome assembly, and variant calling. It emphasizes speed optimizations that leverage the architecture of modern processors, as well as innovations that reduce time and memory complexity, such as algorithmic improvements to computationally intensive subproblems.

The scope also includes qualitative advances in sequence analysis: improvements in the sensitivity and specificity of homology searches, more accurate models of sequences and sequence families (profiles) for alignment, more accurate statistics, algorithms for error detection and correction, and others. Machine learning approaches that increase the accuracy or speed of a particular aspect of sequence analysis also fall within the scope of this Research Topic.

In general, this Research Topic highlights algorithmic solutions important to studies of sequence evolution. These algorithms are therefore expected to facilitate the analysis of numerous individual sequences. Examples include genome assembly, error correction, the detection of variants and structural variations in genomes, and protein sequence classification, clustering, phylogenetics, and annotation. Scalable algorithmic and methodological developments increase the capacity to process large datasets, so advances in computing performance facilitate sequence analysis on a large scale. Relevant developments include sequence database indexing approaches for increasing speed or reducing disk space, novel hash functions for reducing the number of collisions, time- and memory-efficient alignment or alignment-free algorithms, novel and efficient data structures, algorithms for parallel computing, and others.
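To make two of the ideas above concrete, sequence indexing and alignment-free comparison, here is a minimal sketch using a k-mer hash index and Jaccard similarity over k-mer sets. It is a toy under stated assumptions, not an implementation from any contribution to this Research Topic: the function names, the choice of k, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(sequences, k):
    """Map each k-mer to the set of sequence IDs that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for kmer in kmers(seq, k):
            index[kmer].add(seq_id)
    return index


def query(index, read, k):
    """Count distinct k-mers of `read` shared with each indexed sequence."""
    hits = defaultdict(int)
    for kmer in set(kmers(read, k)):
        for seq_id in index.get(kmer, ()):
            hits[seq_id] += 1
    return dict(hits)


def jaccard(a, b, k):
    """Alignment-free similarity: Jaccard index of the two k-mer sets."""
    set_a, set_b = set(kmers(a, k)), set(kmers(b, k))
    union = set_a | set_b
    if not union:
        return 0.0  # both sequences are shorter than k
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical two-sequence "database".
    database = {"seq1": "ACGTACGTGA", "seq2": "TTTTGGGGCC"}
    index = build_index(database, k=4)
    print(query(index, "CGTACG", k=4))
    print(jaccard("ACGTACGT", "ACGTACGA", k=4))
```

The index trades memory for lookup speed: a query touches only the k-mers it contains rather than scanning every sequence. Practical tools typically go further, for example by storing hashed or subsampled k-mers to reduce space.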

Keywords: sequence alignment, homology search, sequence statistics, sequence indexing, variant calling, short and long read mapping, high-performance computing, time and space complexity

Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

Frontiers in Bioinformatics

Protein Bioinformatics
Genomic Analysis

Impact

20k topic views
14k article views
4,407 article downloads

Topic editors

Michiaki Hamada

Mindaugas Margelevicius
