AUTHOR=Yang Aimin, Zhang Wei, Wang Jiahao, Yang Ke, Han Yang, Zhang Limin TITLE=Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA JOURNAL=Frontiers in Bioengineering and Biotechnology VOLUME=Volume 8 - 2020 YEAR=2020 URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2020.01032 DOI=10.3389/fbioe.2020.01032 ISSN=2296-4185 ABSTRACT=Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. Advances in sequencing technology have caused DNA sequence data to grow at an explosive rate, pushing the study of DNA sequences into the era of big data. Machine learning is a powerful technique for analyzing large-scale data that learns automatically to gain knowledge. It has been widely used in DNA sequence data analysis and has produced many research results. This article first introduces the development of sequencing technology and explains the concepts of DNA sequence data structure and sequence similarity. It then analyzes the basic process of data mining and summarizes several major machine learning algorithms. Next, it reviews four typical applications of machine learning to DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze their biological application background and significance, and systematically summarize recent work in the field of DNA sequence data mining. Finally, we summarize the content of the article and outline some research directions for future work.