Repetitive structures in biological sequences: algorithms and applications

93.5K views, 33 authors, 11 articles


Editorial

04 August 2016

Editorial: Repetitive Structures in Biological Sequences: Algorithms and Applications

Marco Pellegrini, Alberto Magi and Costas S. Iliopoulos

4,072 views

0 citations

Editors


Marco Pellegrini

Institute of Informatics and Telematics, Department of Engineering, ICT and Technology for Energy and Transport, National Research Council (CNR)

Alberto Magi

University of Florence

Costas Iliopoulos

King's College London


About

Repetitive structures in biological sequences are emerging as an active focus of research, and the unifying concept of the "repeatome" (the ensemble of knowledge associated with repeating structures in genomic and proteomic sequences) has recently been proposed to highlight several converging trends.

One main trend is the ongoing discovery that genomic repeats are linked to many biologically significant events and functions. For example, an abnormal number of Tandem Repeat units, in both coding and non-coding regions of the genome, has been found to cause a series of diseases, including Huntington's disease. Moreover, there are recent indications of a link between Tandem Repeat expansion and Amyotrophic Lateral Sclerosis.
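To make the notion of a "number of Tandem Repeat units" concrete, here is a minimal sketch that finds maximal exact tandem repeats in a DNA string. It assumes error-free input and short periods; dedicated repeat finders also tolerate substitutions and indels. The names `TandemRepeat`, `is_primitive` and `find_tandem_repeats`, and the default thresholds, are hypothetical choices for illustration. The example uses a CAG unit because Huntington's disease is associated with an expanded CAG repeat in the HTT gene.

```python
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int    # 0-based start of the periodic region
    period: int   # length of the repeat unit
    copies: int   # number of complete copies of the unit
    unit: str     # the unit in the phase it has at `start`


def is_primitive(unit: str) -> bool:
    """True unless `unit` is itself a repetition of a shorter string (e.g. 'CAGCAG')."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_tandem_repeats(seq: str, max_period: int = 6,
                        min_copies: int = 3) -> List[TandemRepeat]:
    """Report maximal exact tandem repeats with periods 1..max_period.

    For a period p, a maximal run of positions i..j-1 with seq[k] == seq[k + p]
    means that seq[i:j + p] is periodic with period p.
    """
    seq = seq.upper()
    n = len(seq)
    found: List[TandemRepeat] = []
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            if j == i:                     # no period-p run starts at i
                i += 1
                continue
            copies = (j - i + p) // p      # complete copies in seq[i:j + p]
            unit = seq[i:i + p]
            # skip non-primitive units so a CAG run is not also reported as CAGCAG
            if copies >= min_copies and is_primitive(unit):
                found.append(TandemRepeat(i, p, copies, unit))
            i = j + 1                      # seq[j] != seq[j + p] (or end reached)
    return found
```

For example, `find_tandem_repeats("TTA" + "CAG" * 5 + "TTA")` returns `[TandemRepeat(start=3, period=3, copies=5, unit='CAG')]`. Each period is scanned in a single left-to-right pass, so the sketch runs in O(n · max_period) time; counting copies of a known unit like this is the simplest way to flag an expansion.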

Copy Number Variations (CNVs), not necessarily in tandem, have been shown to be one of the main sources of genomic variation in humans and to participate in phenotypic variation and adaptation; they may also contribute to various diseases, including cancer, cardiovascular disease, HIV acquisition and progression, autoimmune diseases, and Alzheimer's and Parkinson's diseases. Intragenic Tandem Repeat polymorphisms may be involved in misregulation leading to cell toxicity through multiple pathways.
However, Tandem Repeats are difficult to detect and analyze in NGS data, and devising effective detection algorithms remains a very open area of research.
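As an illustration of how CNVs can be inferred from NGS data, one common family of methods compares read depth in fixed genomic windows between a test sample (for instance a tumor) and a control. The sketch below assumes per-window read counts have already been computed, normalizes each sample by its total count, and labels windows by a log2 depth ratio. The function name, the pseudocount and the 0.4 / -0.4 thresholds are hypothetical choices, not values taken from any specific tool; practical read-depth callers typically also correct for GC content and mappability and segment adjacent windows.

```python
import math
from typing import List, Sequence


def call_cnv_windows(test: Sequence[int], control: Sequence[int],
                     gain: float = 0.4, loss: float = -0.4,
                     pseudocount: float = 0.5) -> List[str]:
    """Label each window 'gain', 'loss' or 'neutral' from a read-depth log2 ratio.

    Thresholds are hypothetical. Counts are normalized by each sample's total so
    that differences in overall sequencing depth between the libraries cancel out.
    """
    if len(test) != len(control):
        raise ValueError("test and control must have the same number of windows")
    t_total, c_total = sum(test), sum(control)
    if t_total == 0 or c_total == 0:
        raise ValueError("each sample needs at least one read")
    calls: List[str] = []
    for t, c in zip(test, control):
        # the pseudocount avoids log2(0) in windows with no reads
        ratio = ((t + pseudocount) / t_total) / ((c + pseudocount) / c_total)
        log_r = math.log2(ratio)
        if log_r >= gain:
            calls.append("gain")
        elif log_r <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls
```

For example, `call_cnv_windows([100, 100, 200, 50], [100, 100, 100, 100])` returns `['neutral', 'neutral', 'gain', 'loss']`: after normalization, the third window has roughly 1.8 times the control depth and the fourth roughly 0.45 times.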

Repeating structures abound in human proteins and are key to elucidating sequence-structure-function relationships. Inverted repeats are a key feature of hairpins and have been shown to contribute to chromosomal fragility in the human genome.
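An inverted repeat of the kind that forms a hairpin can be described as a stem, a short loop, and then the reverse complement of the stem. The sketch below searches for exact inverted repeats with a fixed stem length and a bounded loop length. It assumes an uppercase ACGT alphabet and ignores mismatches, wobble pairs and thermodynamic stability, which real hairpin predictors take into account; the function names and default parameters are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick complement for an uppercase ACGT alphabet (other symbols are left unchanged)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3,
                          max_loop: int = 8) -> List[Tuple[int, int, str]]:
    """Find exact inverted repeats: left arm, loop, then reverse complement of the left arm.

    Returns (left_arm_start, loop_length, left_arm) for every hit.
    """
    seq = seq.upper()
    n = len(seq)
    hits: List[Tuple[int, int, str]] = []
    for i in range(n - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)   # what the right arm must spell
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop             # start of the candidate right arm
            if j + stem > n:
                break
            if seq[j:j + stem] == target:
                hits.append((i, loop, left))
    return hits
```

For `s = "AAAA" + "GCGGAT" + "TTTT" + "ATCCGC" + "AAAA"`, `find_inverted_repeats(s)` returns `[(4, 4, 'GCGGAT')]`: the stem starts at position 4 and its reverse complement follows a 4-base loop. Because the stem length is fixed, a longer true stem may be reported several times at shifted offsets; merging such hits is left out for brevity.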

A second converging trend has been the emergence of many different models and algorithms for detecting non-obvious repeating patterns in strings, with applications to genomic data (e.g., collections of reads from NGS). A key aspect still to be explored is the impact of evolutionary sequence divergence and evolutionary selection on the origin and functional significance of repeating substructures. Highly divergent repeats are harder to distinguish from the background; however, they may give us more insight into the evolution of functional units in the genome. To tackle these issues, new modeling and algorithmic schemes, focusing on the computational formulation of the individual entities involved in the repeatome, are emerging. Borrowing methodologies from combinatorial pattern matching, string algorithms, data structures, data mining and machine learning, these new approaches overcome the limitations of current approaches and offer a new way to design better trans-disciplinary research.
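Why divergence makes repeats harder to separate from the background can be shown with a small calculation. Under the simplifying assumption of an i.i.d. uniform background over A, C, G and T, the probability that a random window of length p lies within Hamming distance d of a given unit is the sum over k = 0..d of C(p, k) · 3^k / 4^p, which grows quickly with d. The sketch pairs this with a greedy counter of consecutive approximate copies that models substitutions only, not indels. The function names are hypothetical, and the uniform background ignores real sequence composition.

```python
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_approx_copies(seq: str, start: int, unit: str, max_mismatches: int) -> int:
    """Greedily count consecutive copies of `unit` beginning at `start`.

    A copy is accepted if it differs from `unit` in at most `max_mismatches`
    positions; insertions and deletions are not modeled.
    """
    p = len(unit)
    copies, pos = 0, start
    while pos + p <= len(seq) and hamming(seq[pos:pos + p], unit) <= max_mismatches:
        copies += 1
        pos += p
    return copies


def background_match_prob(p: int, d: int) -> float:
    """P(a uniform random length-p DNA window is within Hamming distance d of a fixed unit)."""
    return sum(comb(p, k) * 3 ** k for k in range(d + 1)) / 4 ** p
```

With `seq = "CAGCAGCAACAGCTGCAG"`, `count_approx_copies(seq, 0, "CAG", 0)` returns 2 because exact matching stops at the diverged copy `CAA`, while allowing one mismatch per copy recovers all 6 copies. The price is specificity: `background_match_prob(3, 1)` is 0.15625 (10/64), ten times the exact-match rate of 1/64, so roughly one random trinucleotide in 6.4 would already pass the looser test.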

This Research Topic will host recent progress in bioinformatics methods for repeat detection and will also encourage the submission of reviews of recent methodological developments and applications.

Topics in this area include, but are not limited to:

(1) New algorithms and software for repeat detection (both tandem and dispersed) in NGS data and other genomic sequences, including integrated pipelines and visualization tools;

(2) Statistical models for genome-wide or targeted association studies involving repeating elements;

(3) Algorithms and statistical models for repeat detection in genetically heterogeneous samples (e.g., tumor data vs. controls);

(4) Performance evaluation of existing or novel methods using simulated and experimental data sets;

(5) Statistical models and tools for detection of repeating structures in model and non-model organisms; computational tools developed for various sequencing platforms.



Impact

93,516 Total views

68,828 Article views

19,329 Article downloads

5,359 Topic views

Published In

Frontiers in Genetics

Computational Genomics
