Research Topic

Scalable Data Science: From Theory to Practice

About this Research Topic

Data science is the next frontier for data-driven decision making in domains such as ecommerce, healthcare, manufacturing, defense, government, and education. It is an interdisciplinary field that combines principles, concepts, and techniques in mathematics, statistics, computer science, and information science. One of the key goals in data science is to automatically extract meaningful insights and knowledge from structured, semi-structured, and unstructured data. To turn raw data into insights, several tasks need to be performed including data collection, data storage and retrieval, data wrangling, data analysis using statistical techniques and machine learning, and data visualization.

It is predicted that by 2024 there will be nearly 150 zettabytes of data. The data deluge continues to challenge us. Large amounts of data are produced on the Web (e.g., social media). Enterprise data lakes and electronic health record systems contain massive amounts of sensitive data of customers and patients. Sensors and Internet of Things (IoT) devices produce enormous amounts of data at a very high rate. As the price of whole genome sequencing continues to drop, healthcare systems will be faced with the challenge of managing massive amounts of genomic data in the near future. In recent years, machine learning (ML), deep learning (DL), and natural language processing (NLP) have become ubiquitous in commercial applications and services such as search, recommendation, image understanding, and speech recognition.

However, the explosion in the volume of structured (e.g., relational databases), semi-structured (e.g., graphs), and unstructured data (e.g., web pages, images, videos) poses serious technical challenges for data science research and applications. The goal of this Research Topic is to focus on novel approaches and techniques including scalable algorithms, models, and systems for data science tasks on large, complex datasets.

The scope of this Research Topic includes theoretical advances, systems design, and algorithmic contributions in data science. We seek high-quality contributions of the following types: Original Research, Methods, Technology and Code, and Data Report. Topics of interest include but are not limited to:

  • Scalable approaches for data collection and data wrangling
  • Scalable approaches for data storage and retrieval of structured, semi-structured, and unstructured data
  • Scalable ML/DL/NLP techniques for data science tasks
  • Scalable data visualization techniques


Keywords: data storage, data wrangling, machine learning, data extraction, big data, genomic data, deep learning, natural language processing, structured data, scalable algorithms, semi-structured, unstructured


Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

About Frontiers Research Topics

With their unique mixes of varied contributions from Original Research to Review Articles, Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author.

Submission Deadlines

01 March 2021 Abstract
01 June 2021 Manuscript

