AUTHOR=Yim Aldrin Kay-Yuen, Yu Allen Chi-Shing, Li Jing-Woei, Wong Ada In-Chun, Loo Jacky F. C., Chan King Ming, Kong S. K., Yip Kevin Y., Chan Ting-Fung
TITLE=The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module
JOURNAL=Frontiers in Bioengineering and Biotechnology
VOLUME=Volume 2 - 2014
YEAR=2014
URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2014.00049
DOI=10.3389/fbioe.2014.00049
ISSN=2296-4185
ABSTRACT=The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was less than 300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archiving. The two most notable demonstrations are from Church et al. and Goldman et al., whose approaches are well optimized for most sequencing platforms: short synthesized DNA fragments without homopolymers. Here we suggest improvements to the error-handling methodology that could enable the integration of DNA-based computational processes, e.g. algorithms based on self-assembly of DNA. As a proof of concept, a 438-byte picture was encoded into DNA with a Low-Density Parity-Check error-correction code. We salvaged a significant portion of the sequencing reads carrying mutations generated during DNA synthesis and sequencing, and successfully reconstructed the entire picture. A modular programming framework, DNAcodec, with an XML-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology.
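
The abstract mentions DNA fragments "without homopolymer" as a property of earlier encoding schemes. A minimal sketch of how such a constraint can be met is given below, assuming a simplified rotating ternary code in the spirit of the approach reported by Goldman et al.: each base-3 digit selects one of the three nucleotides that differ from the previous one. This is not the DNAcodec implementation and includes no Low-Density Parity-Check coding; the function names, the fixed six-digit width per byte, and the virtual starting base `A` are all assumptions made for illustration.

```python
# Illustrative sketch only: a fixed-width rotating ternary code.
# All names and parameters here are hypothetical, not taken from DNAcodec.
BASES = "ACGT"
WIDTH = 6  # assumed: 6 base-3 digits per byte, since 3**6 = 729 >= 256


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as WIDTH base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(WIDTH):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Map each trit to one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert trits_to_dna; raises ValueError if a base repeats its predecessor."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Collapse each group of WIDTH trits back into one byte."""
    out = bytearray()
    for i in range(0, len(trits), WIDTH):
        value = 0
        for t in trits[i:i + WIDTH]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes) -> str:
    return trits_to_dna(bytes_to_trits(data))


def decode(dna: str) -> bytes:
    return trits_to_bytes(dna_to_trits(dna))
```

Because every emitted base is chosen from the three that differ from its predecessor, the output never contains two identical adjacent nucleotides. A repeated base in a read is therefore detectable as an error, but this sketch cannot correct it; correction would need an additional code such as the LDPC scheme the abstract names.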