About this Research Topic

Submission closed.

While data analytics has advanced greatly in recent years, including distributed computing for Big Data and machine learning (deep learning in particular), less attention has been paid to the data curation and data governance processes that support analytics. A common complaint is that data scientists spend 80% of their time preparing data for analysis and only 20% on the analysis itself. This is because the tools and methods used in data preparation require substantial human time and effort for tasks such as data quality analysis, data cleaning, data enhancement, data standardization, data integration, testing, and validation. Data preparation is just one phase of data curation, which is the management of data through its entire life cycle, from acquisition to disposal. Furthermore, as organizations realize the value of their data, they are implementing data governance programs to ensure they have a complete inventory of their data and its contents, and a way to exercise authority and accountability over data as an organizational asset. As with data curation, most data governance processes require substantial human time and effort to be effective.

The aim of this Research Topic is to examine research on automated data curation and data governance automation that develops unsupervised methods and techniques to automate these processes to the greatest extent possible. The goal of fully automating data cleaning and integration has been labeled a “data washing machine” by Richard Wang, with some initial development led by John R. Talburt. Similar work has begun in industry to develop methods for automating many data governance tasks, such as “positive data control” for maintaining the enterprise data catalog. Replacing human analysis with scalable, unsupervised automation of these processes will not be easy, but it is necessary to keep pace with the increasing volume and variety of data driving modern decision systems.

Submissions to this Research Topic may address, but are not limited to, automated methods for:

• Data quality assessment and metrics
• Generating data quality validation rules
• Data cleansing (data washing machines)
• Spelling correction
• Missing value imputation
• Data standardization
• Multi-source data integration
• Entity and identity resolution
• Data governance policy and standards conformance
• Metadata generation
• Data catalog initialization and setup
• Updating data catalogs and business glossaries
• Data operations logging and data provenance
• Positive data control
• Generating data products
• Data as a service
• Data archiving, deletion, and disposal

Keywords: Data curation, data governance, data life cycle, data process automation, unsupervised data operations


Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.
