About this Research Topic
Repetitive structures in biological sequences are emerging as an active focus of research, and the unifying concept of the "repeatome" (the ensemble of knowledge associated with repeating structures in genomic/proteomic sequences) has recently been proposed to highlight several converging trends.
One main trend is the ongoing discovery that genomic repetitions are linked to many biologically significant events and functions. For example, an abnormal number of Tandem Repeat units, in both coding and non-coding parts of the genome, has been found to cause a series of diseases, including Huntington disease. Moreover, there are recent indications of a link between Tandem Repeat expansion and Amyotrophic Lateral Sclerosis.
Copy Number Variations in the genome, not necessarily in tandem, have been demonstrated to be one of the main sources of genomic variation in humans, have been shown to participate in phenotypic variation and adaptation, and may contribute to various diseases, including cancer, cardiovascular disease, HIV acquisition and progression, autoimmune diseases, and Alzheimer and Parkinson diseases. Intragenic Tandem Repeat polymorphisms may be involved in misregulation leading to cell toxicity through multiple pathways.
Tandem Repeats in NGS data are, however, difficult to detect and analyze, and devising effective detection algorithms is still a very open area of research.
Repeating structures abound in human proteins and are key to elucidating sequence-structure-function relationships. Inverted repeats are a key feature of hairpins and have been shown to contribute to chromosomal fragility in the human genome.
A second converging trend has been the emergence of many different models and algorithms for detecting non-obvious repeating patterns in strings, with applications to genomic data (e.g. collections of NGS reads). A key aspect still to be explored is the impact of evolutionary sequence divergence and evolutionary selection on the origin and functional significance of repeating substructures. High-divergence repetitions are harder to distinguish from the background; however, they may give us more insight into the evolution of functional units in the genome. To tackle these issues, new modeling and algorithmic schemes, focusing on the computational formulation of the individual entities involved in the repeatome, are emerging. Borrowing methodologies from combinatorial pattern matching, string algorithms, data structures, data mining and machine learning, these new approaches overcome the limitations of current approaches and offer a new way to design better trans-disciplinary research.
This Research Topic will host recent progress in bioinformatics methods for repeat detection and also encourages the submission of reviews of recent methodological developments and applications.
The topics in this area include, but are not limited to:
(1) New algorithms and software for repeat detection (both tandem and dispersed) in NGS data and other genomic sequences, including integrated pipelines and visualization tools;
(2) Statistical models for genome-wide or targeted association studies involving repeating elements;
(3) Algorithms and statistical models for repeat detection in genetically heterogeneous samples (e.g. tumor data vs. controls);
(4) Performance evaluation of existing or novel methods using simulated and experimental data sets;
(5) Statistical models and tools for detection of repeating structures in model and non-model organisms; computational tools developed for various sequencing platforms.
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.