Scalable Bioinformatics: Methods, Software Tools, and Hardware Architectures

20.4K views, 31 authors, 6 articles, 3 editors

Copy number variation (CNV), defined as repetitions or deletions of genomic segments of 1 Kb to 5 Mb, is a major trigger of human disease. The high throughput and low cost of next-generation sequencing (NGS) make genome-wide detection of CNVs possible and greatly improve the clinical practicability of NGS testing. However, current CNV detection methods are easily affected by sequencing and mapping errors and by the uneven distribution of reads. In this paper, we propose an improved approach, CNV-MEANN, which changes the structure of the neural network used in the MFCNV method. It differs from MFCNV in three ways: (1) it uses a new feature, mapping quality, to replace two features in MFCNV; (2) it considers the influence of the loss categories of CNV on disease prediction and refines the output structure accordingly; and (3) it uses a mind evolutionary algorithm to optimize the backpropagation (BP) neural network model and calculates an individual score for each genome bin to predict CNVs. Using both simulated and real datasets, we tested CNV-MEANN and compared its performance with that of seven widely used CNV detection methods. Experimental results demonstrated that CNV-MEANN outperformed the other methods with respect to sensitivity, precision, and F1-score. The proposed method detected many CNVs that other approaches could not, and it reduced the boundary bias. CNV-MEANN is expected to be an effective method for analyzing CNV changes in the genome.

3,331 views
10 citations
Original Research
25 February 2021
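
The abstract above compares methods by sensitivity, precision, and F1-score. As a minimal sketch of how such bin-level metrics could be computed, assuming each genome bin carries one label from a hypothetical set (`"gain"`, `"loss"`, `"normal"`), the function below scores a predicted labeling against a ground truth. The function name, the label names, and the counting convention (a bin called with the wrong CNV type counts as both a false positive and a false negative) are illustrative assumptions, not the evaluation protocol of the CNV-MEANN paper.

```python
def cnv_metrics(truth, predicted, normal="normal"):
    """Bin-level sensitivity, precision, and F1 for CNV calls.

    Assumed convention (not taken from the paper): a bin is a true positive
    only when the predicted state equals the true non-normal state.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t != normal and p == t)
    fp = sum(1 for t, p in pairs if p != normal and p != t)
    fn = sum(1 for t, p in pairs if t != normal and p != t)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f1


# Toy labels for six bins; purely illustrative, not real data.
truth = ["normal", "gain", "gain", "loss", "normal", "loss"]
calls = ["normal", "gain", "normal", "loss", "gain", "loss"]
sens, prec, f1 = cnv_metrics(truth, calls)
print(sens, prec, f1)
```

In this toy case three CNV bins are called correctly, one gain is missed, and one normal bin is wrongly called as a gain, so all three metrics come out equal.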

Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that help explain biological mechanisms and support the analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets and visualized it as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized the topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features, described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.

4,438 views
11 citations
Methods
02 February 2021
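
The abstract above mentions estimating a centrality distance between networks. One common formulation sums, over all nodes, the absolute difference in a node's centrality between two graphs. The sketch below uses that formulation with normalized degree centrality; whether the study used this exact centrality measure is an assumption, and the function names, the edge lists, and the gene `GENE_X` are hypothetical placeholders.

```python
def degree_centrality(edges, nodes):
    """Normalized degree centrality over a fixed node set (undirected, no self-loops)."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = len(nodes) - 1
    return {n: (d / scale if scale > 0 else 0.0) for n, d in degree.items()}


def centrality_distance(edges_a, edges_b):
    """Sum over nodes of |C_a(v) - C_b(v)|, using the union of both node sets."""
    nodes = sorted({n for edge in list(edges_a) + list(edges_b) for n in edge})
    c_a = degree_centrality(edges_a, nodes)
    c_b = degree_centrality(edges_b, nodes)
    return sum(abs(c_a[n] - c_b[n]) for n in nodes)


# Toy subtype networks; the edges are illustrative, not results from the study.
subtype_a = [("EMC4", "TMEM30A"), ("EMC4", "GENE_X")]
subtype_b = [("EMC4", "TMEM30A")]
print(centrality_distance(subtype_a, subtype_b))
```

A distance of zero means the two networks assign every node the same centrality; larger values indicate that node importance is distributed differently between the two subtype networks.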

New high-performance computing architectures have recently been developed for commercial central processing units (CPUs). Yet they have not improved the execution time of widely used bioinformatics applications such as BLAST+. This is because the existing algorithms are not matched to the internals of the hardware in a way that takes full advantage of the available CPU cores. To exploit the new architectures, algorithms must be revised and redesigned, and usually rewritten from scratch. BLVector adapts the high-level concepts of BLAST+ to x86 architectures with AVX-512 in order to harness their capabilities. A comprehensive study was carried out to optimize the approach, yielding a significant reduction in execution time. BLVector reduces the execution time of BLAST+ when aligning protein sequences up to mid-size (∼750 amino acids). The gain in real-scenario cases is 3.2-fold. When applied to longer proteins, BLVector takes more time than BLAST+ but retrieves a much larger set of results. BLVector and BLAST+ are both fine-tuned heuristics. Therefore, the relevant results returned by both are the same, although they behave differently, especially when performing alignments with low scores. Hence, they can be considered complementary bioinformatics tools.

3,242 views
6 citations