AUTHOR=Yim Aldrin Kay-Yuen , Yu Allen Chi-Shing , Li Jing-Woei , Wong Ada In-Chun , Loo Jacky F. C. , Chan King Ming , Kong S. K. , Yip Kevin Y. , Chan Ting-Fung TITLE=The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module JOURNAL=Frontiers in Bioengineering and Biotechnology VOLUME=2 YEAR=2014 URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2014.00049 DOI=10.3389/fbioe.2014.00049 ISSN=2296-4185 ABSTRACT=

The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archiving. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well-optimized for most sequencing platforms: short synthesized DNA fragments without homopolymers. Here, we suggest improvements to the error-handling methodology that could enable the integration of DNA-based computational processes, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of size 438 bytes was encoded to DNA with a low-density parity-check error-correction code. We salvaged a significant portion of the sequencing reads carrying mutations generated during DNA synthesis and sequencing, and successfully reconstructed the entire picture. A modular programming framework, DNAcodec, with an eXtensible Markup Language-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology.
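
To make the "without homopolymer" constraint mentioned above concrete, the following is a minimal sketch of a rotating base-3 code in the spirit of Goldman et al. It is not the DNAcodec encoding, and it omits the low-density parity-check layer entirely. The function names, the fixed six base-3 digits per byte, and the virtual starting base "A" are all assumptions made for illustration.

```python
# Illustrative sketch only: a rotating base-3 code that never repeats a base.
# This is NOT the DNAcodec scheme and contains no error correction.
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width, since 3**6 = 729 >= 256


def bytes_to_trits(data: bytes) -> list:
    """Expand each byte into six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list) -> bytes:
    """Inverse of bytes_to_trits; raises ValueError if a group exceeds 255."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map each trit to one of the three bases that differ from the previous base."""
    prev = start  # hypothetical virtual base preceding the sequence
    seq = []
    for trit in bytes_to_trits(data):
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[trit]
        seq.append(prev)
    return "".join(seq)


def decode(seq: str, start: str = "A") -> bytes:
    """Recover the bytes from an error-free sequence produced by encode."""
    prev = start
    trits = []
    for base in seq:
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)


def has_homopolymer(seq: str) -> bool:
    """True if any two adjacent bases are identical."""
    return any(a == b for a, b in zip(seq, seq[1:]))
```

Because every base is chosen from the three that differ from its predecessor, the output cannot contain a homopolymer run. A single substitution, insertion, or deletion would still corrupt the decoded bytes in this sketch, which is why an error-tolerating layer such as the one described in the abstract is needed on top of any such mapping.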