MAV-seq: Platform for the NGS Data Workflow Management and Automation

  • 1 The Jackson Laboratory, Genomic Medicine, United States

The increasing amount of heterogeneous genomic data generated today necessitates a robust platform for dealing with the practical issues of genomic data interoperability, structure, standardization, security, quality, pre-processing, governance development, long-term support, and the management of the exponential growth of genomic applications and their large, diverse datasets. To address most of these challenges, we present MAV-seq (Ahmed et al., 2016), a new scientific platform for the automated management and processing of Next Generation Sequencing (NGS) data. MAV-seq (Management, Analysis, Visualization of Sequence data) is an interactive, user-friendly, cross-platform, secure, encrypted, automated, customized, centralized, multi-role database application for managing sample repertoires and automating the pre-processing of epigenomic and transcriptomic data. It supports:

  • Study data management
  • Experiments & projects data management
  • Centralized sample metadata management
  • Centralized NGS data management
  • Automation of NGS data quality checking
  • Automation of NGS data pre-processing
  • GUI-based access to the data clusters for NGS data transfer and management
  • Customized data export and sharing
  • Efficient data linking, tracking, querying and searching
  • Extraction, classification and loading of data from different formats
  • User data and control management
  • Data security and encryption
  • Event management and logging
  • Centralized and modular data administration
  • Privatization and globalization of data

MAV-seq (Figure 1) is a secure database management system that addresses security threats including privilege abuse, weak authentication, weak system configuration, backup, and front-end and back-end system vulnerabilities.
It applies different data encryption algorithms to encode data and gives users controlled access to the system based on their roles and privileges. It provides easy-to-use interfaces for raw data management, operational data management, user data management, and analysis of genomic data, including classification, tracking, processing, querying, and visualization. MAV-seq is a product line application, developed following bioinformatics methods, software engineering principles, the Butterfly paradigm (Ahmed et al., 2014), human-computer interaction guidelines, and big data analytics. MAV-seq integrates various genomic data quality-check and pre-processing pipelines (e.g. ATAC-seq, ChIP-seq, mRNA-seq, tRNA-seq, WES, WGS) with a user-friendly graphical interface so that biologists with no programming experience can process their NGS datasets. It requires the Java Runtime Environment to be installed on the operating system in use (e.g. Windows, Mac OS X), and all integrated applications, together with the reference genome for mapping, to be downloaded and installed on the data cluster. With this platform, we aim to simplify the management and storage of NGS datasets, including the standardization and automation of quality control and basic processing steps. MAV-seq is a simple, easy-to-learn platform that does not require bioinformatics or programming skills.

Figure 1


We acknowledge The Jackson Laboratory for Genomic Medicine for the financial support and ownership of this research and development.


