About this Research Topic
Data science is the next frontier for data-driven decision making in domains such as e-commerce, healthcare, manufacturing, defense, government, and education. It is an interdisciplinary field that combines principles, concepts, and techniques from mathematics, statistics, computer science, and information science. One of the key goals of data science is to automatically extract meaningful insights and knowledge from structured, semi-structured, and unstructured data. Turning raw data into insights requires several tasks, including data collection, data storage and retrieval, data wrangling, data analysis using statistical techniques and machine learning, and data visualization.
By some estimates, the world will generate nearly 150 zettabytes of data by 2024, and this data deluge continues to challenge us. Large amounts of data are produced on the Web (e.g., social media). Enterprise data lakes and electronic health record systems hold massive amounts of sensitive customer and patient data. Sensors and Internet of Things (IoT) devices produce enormous amounts of data at very high rates. As the price of whole genome sequencing continues to drop, healthcare systems will soon face the challenge of managing massive amounts of genomic data. In recent years, machine learning (ML), deep learning (DL), and natural language processing (NLP) have become ubiquitous in commercial applications and services such as search, recommendation, image understanding, and speech recognition.
However, the explosion in the volume of structured (e.g., relational databases), semi-structured (e.g., graphs), and unstructured (e.g., web pages, images, videos) data poses serious technical challenges for data science research and applications. This Research Topic focuses on novel approaches and techniques, including scalable algorithms, models, and systems, for data science tasks on large, complex datasets.
The scope of this Research Topic includes theoretical advances, systems design, and algorithmic contributions in data science. We seek high-quality contributions of the following types: Original Research, Methods, Technology and Code, and Data Report. Topics of interest include, but are not limited to:
• Scalable approaches for data collection and data wrangling
• Scalable approaches for data storage and retrieval of structured, semi-structured, and unstructured data
• Scalable ML/DL/NLP techniques for data science tasks
• Scalable data visualization techniques
Keywords: data storage, data wrangling, machine learning, data extraction, big data, genomic data, deep learning, natural language processing, structured data, scalable algorithms, semi-structured, unstructured
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.