Scientific Data Management in the Age of Big Data: An Approach Supporting a Resilience Index Development Effort
- ¹Office of Research and Development, United States Environmental Protection Agency, United States
- ²Oak Ridge Associated Universities, United States
- ³University of West Florida, United States
The growing volume of publicly available data is, in many ways, changing how research is conducted. Cloud-based information resources not only supply supplementary data that bolster traditional scientific activities (e.g., field studies, laboratory experiments) but also serve as the foundation for secondary data research projects such as indicator development. Indicators and indices are a convenient way to synthesize disparate information to address complex scientific questions that are difficult to measure directly (e.g., resilience, sustainability, well-being). The current literature offers no shortage of indicator and index examples derived from secondary data, and a growing number of them are scientifically focused. However, little has been published on the management approaches and best practices used to govern the data underpinning these efforts. From acquisition to storage and maintenance, secondary data research products rely on relevant, high-quality data, repeatable data handling methods, and a multi-faceted data flow process to promote and sustain research transparency and integrity. The U.S. Environmental Protection Agency recently published a report describing the development of a climate resilience screening index, which used over one million data points to calculate the final index. The pool of data was derived exclusively from secondary sources such as the U.S. Census Bureau, the Bureau of Labor Statistics, the U.S. Postal Service, the Department of Housing and Urban Development, the U.S. Forest Service, and others. The available data came in various forms, including portable document format (PDF), delimited ASCII, and proprietary formats (e.g., Microsoft Excel, ESRI ArcGIS). The strategy employed for managing these data in an indicator research and development effort blended business practices, information science, and the scientific method.
This paper describes the approach, highlighting key points unique to managing the data assets of a small-scale research project in an era of “big data.”
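The abstract does not name specific tools, so the following is only a minimal sketch of what one "repeatable data handling" step for a delimited ASCII source might look like. It assumes a hypothetical `ProvenanceRecord` structure and a hypothetical `ingest_delimited` function (neither comes from the paper): each file is parsed, and its checksum, source agency, row count, and columns are recorded so that a later run can confirm it is working from the same inputs.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical metadata kept for each secondary-source file."""
    source_agency: str
    file_format: str
    sha256: str
    row_count: int
    columns: list
    # UTC timestamp of when this file was ingested.
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def ingest_delimited(raw: bytes, source_agency: str, delimiter: str = ","):
    """Parse a delimited ASCII file and return (rows, provenance record)."""
    # Checksum of the raw bytes lets a later run confirm the input is unchanged.
    digest = hashlib.sha256(raw).hexdigest()
    text = raw.decode("ascii")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)  # each row becomes a dict keyed by the header fields
    record = ProvenanceRecord(
        source_agency=source_agency,
        file_format="delimited ASCII",
        sha256=digest,
        row_count=len(rows),
        columns=list(reader.fieldnames or []),
    )
    return rows, record


if __name__ == "__main__":
    # Placeholder values for illustration only, not real agency data.
    sample = b"geo_id,value\nA,1\nB,2\n"
    rows, rec = ingest_delimited(sample, source_agency="Example Agency")
    print(rec.row_count, rec.columns)
```

Storing the checksum alongside the parsed rows is one simple way to support transparency: if a re-download from a source agency yields a different digest, the change is detected before it silently alters downstream indicator values.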
Keywords: data management, resilience, indicators, curation, framework
Received: 08 Nov 2018;
Accepted: 14 May 2019.
Edited by: Peng Liu, Institute of Remote Sensing and Digital Earth (CAS), China
Reviewed by: Michael Burgass, Imperial College London, United Kingdom
Vittore Casarosa, Istituto di scienza e tecnologie dell'informazione "Alessandro Faedo" (ISTI), Italy
Copyright: © 2019 Harwell, Vivian, McLaughlin and Hafner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Ms. Linda C. Harwell, United States Environmental Protection Agency, Office of Research and Development, Washington D.C., United States, firstname.lastname@example.org