Interactive Quality and Pre-Processing Pipeline for ATAC-seq Data
The Jackson Laboratory, Genomic Medicine, United States
Profiling transposase-accessible chromatin is important for understanding gene regulation and chromatin biology. Modern methods include the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), a recent protocol that captures open chromatin sites (Buenrostro et al., 2013) by fragmenting open chromatin regions and ligating adaptors to them (Tsompana and Buck, 2014). ATAC-seq reveals the positions of individual nucleosomes and chromatin compaction at high resolution by inserting sequencing adaptors in vitro at cleaved DNA sites, starting from a low number of cells (500–50,000) (Buenrostro et al., 2013). ATAC-seq relies on high-throughput sequencing and purified Tn5 transposase to probe open chromatin sites and estimate quantitative genetic interactions.
Because ATAC-seq requires little biological sample and little library preparation time, many scientists are generating ATAC-seq libraries to decipher the chromatin landscape of a given cell type and condition of interest. Bioinformatics plays a vital role in ATAC-seq data pre-processing and analysis, both for determining the proportionally accessible regions of the genome with transposase activity and for separating reads shorter than the canonical length (38 bp) with positional information on nucleosomes and nucleosome-free regions (Buenrostro et al., 2015). An ATAC-seq data processing pipeline starts with quality checking and adapter trimming, followed by alignment, shifting, duplicate removal, sorting and peak calling to find regions with significant numbers of mapped reads, which indicate the presence of gene regulatory regions. Implementing such a pipeline is a complex task, as it involves integrating several non-interactive command-line applications through input/output (I/O) redirection in Unix, Linux and DOS environments, which requires good knowledge of bioinformatics tools and good programming skills. Currently, there is no open-source interactive software that supports biologists without a programming background in processing ATAC-seq data.
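The shifting step can be illustrated with a short Java sketch. It assumes the offsets described by Buenrostro et al. (2013), where reads aligning to the forward strand are moved by +4 bp and reads aligning to the reverse strand by -5 bp to account for the Tn5 insertion footprint. The class and method names (`Tn5Shift`, `shiftPosition`) are hypothetical and are not taken from I-ATAC; clamping at zero is a simplification added here.

```java
/** Illustrative sketch only; not the I-ATAC implementation. */
final class Tn5Shift {
    // Offsets reported by Buenrostro et al. (2013).
    static final int FORWARD_OFFSET = 4;   // applied to '+' strand reads
    static final int REVERSE_OFFSET = -5;  // applied to '-' strand reads

    /**
     * Returns the shifted coordinate for a read position on the given strand.
     * The result is clamped at 0 so that it never falls before the start
     * of the chromosome (an assumption of this sketch).
     */
    static int shiftPosition(int position, char strand) {
        if (strand != '+' && strand != '-') {
            throw new IllegalArgumentException("strand must be '+' or '-'");
        }
        int offset = (strand == '+') ? FORWARD_OFFSET : REVERSE_OFFSET;
        return Math.max(0, position + offset);
    }

    public static void main(String[] args) {
        System.out.println(shiftPosition(100, '+')); // 104
        System.out.println(shiftPosition(100, '-')); // 95
    }
}
```

In a real pipeline the same correction would be applied to every aligned read before peak calling, typically by a dedicated tool rather than hand-written code.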
Here, we present I-ATAC as the first interactive, cross-platform, user-friendly desktop application that supports transparent, reproducible and automatic generation of ATAC-seq data processing pipelines. I-ATAC integrates several non-interactive applications for quality checking, adapter trimming, alignment, shifting, duplicate removal, sorting and peak calling by automatically generating and running sequential, multiple-parallel and customized data analysis pipelines. It has been tested on large-scale private and public datasets and delivers direct results that can be visualized in available genome browsers and used in differential expression analysis to find regions with significant numbers of mapped reads, which indicate the presence of gene regulatory regions in the genome.
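The difference between the sequential and multiple-parallel modes can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not I-ATAC's actual code: the names `PipelineSketch`, `Stage`, `runStage`, `runSequential` and `runParallel` are hypothetical, and `runStage` only records which step would run instead of launching an external tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch only; not the I-ATAC implementation. */
final class PipelineSketch {
    /** Stages in the order the abstract lists them. */
    enum Stage {
        QUALITY_CHECK, ADAPTER_TRIMMING, ALIGNMENT, SHIFTING,
        DUPLICATE_REMOVAL, SORTING, PEAK_CALLING
    }

    /** Placeholder for launching an external tool; it only records the step. */
    static String runStage(String sample, Stage stage) {
        return sample + ":" + stage;
    }

    /** Sequential mode: one sample passes through every stage in order. */
    static List<String> runSequential(String sample) {
        List<String> log = new ArrayList<>();
        for (Stage stage : Stage.values()) {
            log.add(runStage(sample, stage));
        }
        return log;
    }

    /** Parallel mode: each sample runs its own sequential pipeline on a worker thread. */
    static List<List<String>> runParallel(List<String> samples, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String sample : samples) {
                futures.add(pool.submit(() -> runSequential(sample)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // results keep submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("pipeline run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runParallel(List.of("sampleA", "sampleB"), 2));
    }
}
```

Stages within one sample stay strictly ordered because each depends on the output of the previous one; only independent samples are run concurrently.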
The targeted end users of I-ATAC are mainly biologists who are accustomed to interactive operating systems (e.g. Windows, Mac OS X) and have no programming experience in command-line environments. I-ATAC is a user-friendly desktop application based on fundamental software engineering principles, the Butterfly paradigm (Ahmed et al., 2014), human-computer interaction guidelines and big data analytics. It is programmed in Java and requires the Java Runtime Environment to be installed on the operating system in use (e.g. Windows, Mac OS X). Additionally, it requires all integrated applications to be downloaded and installed on the data cluster, together with the reference genome for mapping. I-ATAC is easy to install, cross-platform and freely available to download (https://zenodo.org/record/46078#.VsJIpLS5LHM) and use.
Acknowledgements
The authors acknowledge The Jackson Laboratory for Genomic Medicine (JAX-GM) for financial support and ownership of this research and development. The authors also appreciate the valuable encouragement and input of members of the JAX-GM teams.
References
1. Ahmed, Z., Saman, Z., Dandekar, T. (2014) Developing sustainable software solutions for bioinformatics by the “Butterfly” paradigm. F1000Research, 3, 71.
2. Buenrostro, J.D., et al. (2015) A Method for Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol, 109, 21.29.1-9.
3. Buenrostro, J.D., et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods, 10, 1213-1218.
4. Tsompana, M., Buck, M.J. (2014) Chromatin accessibility: a window into the genome. Epigenetics & Chromatin, 7, 33.
Keywords: ATAC-seq, interactive, pipeline, pre-processing, quality
Conference: Neuroinformatics 2016, Reading, United Kingdom, 3 Sep - 4 Sep, 2016.
Presentation Type: Poster
Topic: Genomics and genetics
Citation: Ahmed Z (2016). Interactive Quality and Pre-Processing Pipeline for ATAC-seq Data. Front. Neuroinform. Conference Abstract: Neuroinformatics 2016. doi: 10.3389/conf.fninf.2016.20.00020
Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received: 22 Mar 2016; Published Online: 18 Jul 2016.
* Correspondence: Dr. Zeeshan Ahmed, The Jackson Laboratory, Genomic Medicine, Farmington, CT, 06032, United States, zahmed@ifh.rutgers.edu