Data Sharing and Global Public Health: Defining What We Mean by Data
- 1Heilbrunn Department of Population and Family Health, Columbia Mailman School of Public Health, New York, NY, United States
- 2Spark Street Advisors, New York, NY, United States
- 3The United Nations University - International Institute of Global Health, Kuala Lumpur, Malaysia
- 4Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
Improved information technology infrastructure and the widespread availability of mobile phones have contributed to an unprecedented amount of data produced globally each year. Data relevant to medicine and public health are being generated from a range of sources, including individuals (e.g., through social media and internet-connected devices), public and private health systems, and health researchers. These data are increasingly digitized and critical for the development of new health interventions, in particular those that use artificial intelligence and machine learning to improve health outcomes (1, 2). Recognizing that pooling and sharing data can accelerate innovation, there have been several calls in recent years to make data available as a global public good for health as part of a set of collective actions that are global in scope and required to address transnational health challenges (3, 4).
Most recently, researchers and global policy makers have called for data sharing related to the ongoing COVID-19 pandemic and developed platforms to facilitate the sharing of patient-level data for those who have contracted SARS-CoV-2 (5, 6). The pandemic has underscored the urgent need to recognize health data as a global public good with mechanisms to facilitate rapid data sharing and data governance. Doing so would support critical pandemic response activities. For example, researchers in China sequenced the SARS-CoV-2 genome and made the data publicly available through an open access platform in January 2020 (7). By making these data available, these researchers helped hasten the development of critical diagnostic assays.
Despite these efforts, there remain several persistent challenges that will need to be overcome. The technical challenges associated with data sharing have increased as datasets have grown in size and become more complex (8). There also remain several economic, legal, and political barriers that make sharing health-related data difficult. In the context of appeals for increased data sharing related to health and efforts to overcome these multifaceted challenges, different types of data and their sources are often conflated. This is problematic because the mechanisms to facilitate data sharing are often specific to data types. For example, data sharing platforms for clinical data remain in the nascent phase. While in some areas, such as cardiovascular disease, there are efforts emerging to address both drug development and clinical practice (9), in others, like genomics, where data sharing platforms are more established, the analytical software and associated outputs require significant effort for conversion and standardization (10).
Given these unique and data-specific challenges and toward establishing data as a global public good, we conducted a rapid review to characterize the current barriers to sharing data that are specific to the types of data used in public health efforts and to identify potential solutions that would support data sharing in the context of these challenges. The findings of this review could help in prioritizing the types of data that would benefit global public health and pandemic preparedness efforts through improved and expanded sharing (Table 1).
Table 1. Data domains, owners, and examples of data (11).
As the volume of public health data expands and data become more interconnected, a broad range of data is being used to support research and inform global health policies and practice. We expand on an existing framework (21) to classify data into four broad archetypes: (1) patient data, (2) health systems data, (3) routine public health data, and (4) health research data. It should be noted that this categorization describes how these data are generated and not necessarily their application or use. This data classification is also described in further detail in Table 1.
Patient Data
Data generated at the patient level include biomedical data (e.g., genomic data, proteomic data), electronic health records, and data generated by individuals themselves (e.g., data from wearables and social media). Sharing patient data allows for independent analysis of research results to ensure reliability, reproducibility, and accountability, all of which are increasingly important in the context of black box algorithms based on AI. Sharing datasets can provide researchers and health intervention developers with appropriate population data references. Data sharing can therefore make an important contribution to public health by strengthening the evidence base used to make clinical and regulatory decisions (22).
Health Systems Data
Health systems data include human resources data, service availability and utilization data (e.g., number of hospital beds, insurance claims data), and performance metrics (e.g., performance assessment results, patient satisfaction surveys). Many of these data are found in health management information systems (HMIS) datasets used around the world, including in many resource-limited settings. Sharing health systems data can strengthen coordination and collaboration between the public and private sectors toward achieving common public health goals and outcomes. Sharing health systems data with the research community can also provide insights into strategies to improve the effectiveness and efficiency of health services and to measure the impact of new health policies and interventions.
Public Health Data (Routine and Non-routine)
Routine sources include health facility and community information systems. Non-routine sources are household and other population-based surveys (e.g., demographic and health surveys, multiple indicator cluster surveys), censuses, civil registration (e.g., births and deaths) and vital statistics systems, disease surveillance systems, health facility surveys, and administrative data systems (23). Many of these data are already made publicly available either in aggregate or at the individual level. In addition to supporting research activities as already described, sharing routine and non-routine public health data can also serve to establish global health guidelines, norms, and standards. In particular, these data can be used to provide more comprehensive estimates of morbidity and mortality, including cause-specific estimates.
Health Research Data
The data described above are largely collected for administrative purposes or for program planning and management purposes. However, a large volume of data is also generated through observational and experimental health research efforts. Sharing clinical trial data can accelerate advances in public health by generating evidence on the safety and effectiveness of interventions. There are numerous examples of data from clinical trials being used to demonstrate the ineffectiveness of interventions or to improve clinical care (24). Making such data available supports transparency and reproducibility in research.
Barriers to Sharing Data
The barriers to sharing health data have been well-characterized by Van Panhuis et al. (25). Based on our rapid review, we expanded on this nomenclature to identify the barriers specific to data sharing in global public health, and have illustrated in the table below how they might apply to each archetype.
1. Technical barriers are the challenges faced by health information management systems. They include incomplete data, lost data, restrictive or conflicting data formats, lack of metadata and standards, lack of interoperability of datasets (e.g., structure or “language”), and lack of appropriate analytic solutions.
2. Motivational barriers include those that prevent individuals or organizations from readily sharing data. Specifically, these barriers include the lack of incentives, opportunity costs, fear of criticism, and disagreement on data use and access.
3. Economic barriers include both the potential and immediate costs of sharing data.
4. Political barriers are those that are inherent to the local health governance standards and typically manifest as policies and guidelines. They can also include issues of trust and ownership.
5. Legal challenges are those that arise from data collection, analysis, and use. They include concerns about who owns or controls data, transparency, informed consent, security, privacy, copyright, human rights, harm, and stigma.
6. Ethical barriers include the lack of perceived reciprocity (i.e., the other party will not share data) and proportionality (i.e., deciding not to share data based on an assessment of the risks and benefits). An overarching challenge is that frameworks, laws, and regulations have not kept pace with the technological advances that are changing how data collection, analysis, sharing, or usage occur.
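The technical barriers in item 1 (conflicting formats, incomplete data, lack of interoperability) can be made concrete with a small, purely illustrative Python sketch. It maps records from two hypothetical sources onto a shared schema and reports which required fields are missing. The source names, field names, date formats, and the `COMMON_FIELDS` schema are assumptions made for illustration; they do not correspond to any real HMIS standard or to a method described by the authors.

```python
from datetime import datetime

# Hypothetical shared schema: every harmonized record should carry these fields.
COMMON_FIELDS = ("facility_id", "report_date", "cases")

# Hypothetical per-source profiles: source field name -> common field name,
# plus the date format that each source happens to use.
SOURCE_PROFILES = {
    "source_a": {
        "fields": {"fac": "facility_id", "date": "report_date", "n_cases": "cases"},
        "date_format": "%d/%m/%Y",
    },
    "source_b": {
        "fields": {"FacilityCode": "facility_id", "ReportedOn": "report_date", "Count": "cases"},
        "date_format": "%Y-%m-%d",
    },
}


def harmonize(record, source):
    """Map one source record onto the common schema.

    Returns a tuple (harmonized_record, missing_fields).
    """
    profile = SOURCE_PROFILES[source]
    out = {}
    for src_name, common_name in profile["fields"].items():
        value = record.get(src_name)
        if value in (None, ""):
            continue  # leave the field absent so it is reported as missing
        if common_name == "report_date":
            # Normalize every source-specific date format to ISO 8601.
            value = datetime.strptime(value, profile["date_format"]).date().isoformat()
        elif common_name == "cases":
            value = int(value)
        out[common_name] = value
    missing = [field for field in COMMON_FIELDS if field not in out]
    return out, missing


a, missing_a = harmonize({"fac": "F001", "date": "03/11/2020", "n_cases": "12"}, "source_a")
b, missing_b = harmonize({"FacilityCode": "F002", "ReportedOn": "2020-11-03"}, "source_b")
```

In this sketch both records end up with the same field names and date format, and the second record is flagged as lacking a case count. Real harmonization efforts would also need agreed code lists, units, and metadata, which are not shown here.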
Data protection regulations vary across jurisdictions, so definitions of ownership, control, and use, as well as the tools or applications that apply these definitions, need to align with the applicable regulations in each relevant jurisdiction (26). With regard to transparency, when users create or generate data on a web or mobile application, the application's information collection and sharing policies are often unclear to them. As a result, users cannot readily see how information may flow beyond its initial or intended audience or purpose (27). Informed consent is also difficult to implement. Collection, sharing, and use of data can take place without individuals understanding that it is happening or the extent to which it occurs. There is therefore a need to ensure a clear understanding of when data are being collected, what is collected, how the data will be shared and used, and how consent can be revoked (28). Security is another major barrier. Data breaches can expose confidential information, lead to third-party manipulation, and violate rights or dignity. Data protection standards are needed to protect against unauthorized access, disclosure, or use of personal data (29). Security is closely linked to rights and stigma, and to the need to protect against the use of information to exploit, marginalize, criminalize, discriminate against, or exclude vulnerable or disproportionately affected populations (30).
For all data archetypes, the prospect of sharing data raises concerns related to threats to the privacy of personal information that can identify or be linked to a person (31, 32). Privacy issues are particularly acute with regard to patient data, including in resource-constrained settings where data systems are rapidly evolving, corruption is common, and data managers might lack sufficient security training (33–35). In such settings, regulations may be lacking or not be enforced (34). Even when consent procedures are in place, patients are not always told the extent to which their data are being shared, with whom, or for what purpose (34). The use of mobile tracking apps to assist with contact tracing in the COVID-19 pandemic has highlighted many of these concerns, with multiple examples of data being sold to private companies without individual consent (11, 36, 37). Given these considerations, efforts related to data sharing in global health should require patient data and research data to be fully anonymized.
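To illustrate why anonymization requires more than deleting names, the following minimal Python sketch removes direct identifiers, coarsens age into bands, and then measures the size of the smallest group of records that share the same remaining quasi-identifiers (the k in k-anonymity). The field names and the choice of k-anonymity as the criterion are assumptions introduced here for illustration, not a method proposed in this paper, and passing such a check would not by itself amount to full anonymization.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "phone"}
QUASI_IDENTIFIERS = ("age_band", "district")


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Drop direct identifiers and replace exact age with an age band."""
    out = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "age"
    }
    out["age_band"] = age_band(record["age"])
    return out


def smallest_group(records):
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(groups.values())


patients = [
    {"name": "A", "phone": "000", "age": 34, "district": "North", "diagnosis": "X"},
    {"name": "B", "phone": "001", "age": 37, "district": "North", "diagnosis": "Y"},
    {"name": "C", "phone": "002", "age": 52, "district": "South", "diagnosis": "X"},
]

shared = [deidentify(p) for p in patients]
k = smallest_group(shared)
```

In this toy dataset the third record remains unique on age band and district even after names and phone numbers are removed, so it could still be linked to an individual. Further generalization or suppression would be needed before such data could reasonably be described as anonymized.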
To date in public health, with perhaps some notable exceptions [i.e., research (38) and “omics”], there are few efforts focused on health data sharing across borders. Rather, existing data sharing activities seem to focus primarily on individual country standardization and interoperability (39). Also, where they exist, data sharing frameworks, laws, and regulations have not kept pace with the technological advances that are changing how data collection, sharing, and analysis occur (27, 30). While many of these issues are not unique to global public health, there is perhaps particular urgency in this area given the self- or group-identifying characteristics of data and the potential for misuse or harm if data fall into the wrong hands or are shared without proper safeguards. Addressing these pitfalls, including the misuse of public health data, will require a range of interventions that are tailored to the respective data archetype and that take into account the motivations and incentives of data owners. It will also require countries and data owners to commit to data sharing principles and acknowledge data as a global public good (4, 40).
Toward this end, there are lessons that can be learned from other disciplines where data sharing and standardization efforts are more advanced. The best known is the International Organization for Standardization (ISO), which has supported the development of over 23,000 standards in a diverse range of areas, covering almost every aspect of technology and manufacturing. Less well-known is the World Meteorological Organization, which hosts a data exchange and technology transfer platform through which national weather agencies regularly share large atmospheric and meteorological datasets (41). The Society for Worldwide Interbank Financial Telecommunication (SWIFT) is an international cooperative based in Belgium that enables secure and standardized financial transactions around the world. Its Business Identifier Code (BIC; also known as the SWIFT-BIC or SWIFT code) is a unique identification code that is used to facilitate the reliable transfer of funds between banks.
We believe that, in its role as the normative global health agency, the World Health Organization (WHO) could also play an active role in coordinating data sharing efforts, engaging in partnerships, and developing guidelines and standards, in particular for patient data, public health data, and health systems data. There are multiple examples where such efforts are already underway. WHO has developed a number of policies on data sharing (42). While not legally binding, these policies do provide direction for member states (43, 44). Another example is EPI-BRAIN (Epidemic Big Data Resource and Analytics Innovation Network) (45), an online platform intended to bring together public health data with information such as population movement, animal diseases, meteorological, and other environmental factors to allow for analysis of large-scale datasets for outbreak prediction and emergency preparedness. WHO has also recently established a new platform to support clinical characterization of patients with COVID-19 (46). By aggregating patient-level data, this initiative aims to increase understanding of the severity, spectrum, and impact of the disease in the hospitalized population globally. On health systems data, WHO is collaborating with the DHIS2 platform and the University of Oslo on an open-source, web-based health management information system. While these data are not necessarily aggregated across countries, the architecture of the software makes this a possibility (18).
While engaging WHO does not imply compliance, by focusing on norms, standards, and guidance, including issues related to privacy, data standardization, and interoperability, WHO can serve to advance data sharing efforts for the benefit of global health.
Author Contributions
NS conceptualized the paper. BW reviewed the literature. BW and NS jointly drafted the manuscript. JS and SL contributed to analysis and writing of the paper. NS and BW are joint first authors.
Funding
This work was supported by Fondation Botnar.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
We would like to thank Joseph Chiu and Ursula Jasper for their review of the manuscript.
2. Wahl B, Cossy-Gantner A, Germann S, Schwalbe N. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Global Health. (2018) 3:e000798. doi: 10.1136/bmjgh-2018-000798
3. World Health Organization. Solidarity Call to Action to Realize Equitable Global Access to COVID-19 Health Technologies through Pooling of Knowledge, Intellectual Property and Data. WHO (2020). Available online at: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov/covid-19-technology-access-pool/solidarity-call-to-action/docs/default-source/coronaviruse/solidarity-call-to-action/solidarity-call-to-action-01-june-2020 (accessed November 3, 2020).
4. The Center for Policy Impact in Global Health. Intensified Multilateral Cooperation on Global Public Goods for Health: Three Opportunities for Collective Action. Durham, NC: Duke University (2018).
8. Reinsel D, Gantz J, Rydning J. The Digitization of the World: From Edge to Core. IDC (2018). Available online at: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf (accessed September 25, 2020).
9. BigData@Heart. BigData@Heart. Official webpage. (2020). Available online at: https://www.bigdata-heart.eu/ (accessed September 23, 2020).
12. The Nextstrain team. Genomic Epidemiology of Novel Coronavirus - Global Subsampling. (2020). Available online at: https://nextstrain.org/ncov/global (accessed September 23, 2020).
13. ESP Health. Electronic Medical Record Support for Public Health. (2020). Available online at: https://www.esphealth.org/ (accessed May 26, 2020).
14. US Food and Drug Administration. FDA's Sentinel Initiative. (2020). Available online at: https://www.fda.gov/safety/fdas-sentinel-initiative (accessed September 27, 2020).
15. PCORnet. National Patient-Centered Clinical Research Network. (2020). Available online at: https://pcornet.org/ (accessed May 26, 2020).
16. United States Census Bureau. Home Page. (2020). Available online at: https://www.census.gov/ (accessed May 26, 2020).
17. Office for National Statistics. Census. (2020). Available online at: https://www.ons.gov.uk/census (accessed May 26, 2020).
18. The Health Information Systems Program (HISP) at the University of Oslo (UiO). DHIS2 Digital Data Packages for WHO. (2020). Available online at: https://www.dhis2.org/who (accessed September 19, 2020).
19. DHS Program. Data. (2020). Available online at: https://dhsprogram.com/data/ (accessed May 26, 2020).
20. UNICEF. Statistics and Monitoring. (2020). Available online at: https://www.unicef.org/statistics/index_24302.html (accessed May 26, 2020).
22. Bull S, Roberts N, Parker M. Views of ethical best practices in sharing individual-level data from medical and public health research: a systematic scoping review. J Empir Res Hum Res Ethics. (2015) 10:225–38. doi: 10.1177/1556264615594767
23. MEASURE Evaluation. Decision Support Systems for Linking Routine and Nonroutine Data Sources. MEASURE Evaluation (2018). Available online at: https://www.measureevaluation.org/resources/publications/fs-18-288 (accessed November 1, 2020).
25. van Panhuis WG, Paul P, Emerson C, Grefenstette J, Wilder R, Herbst AJ, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. (2014) 14:1144. doi: 10.1186/1471-2458-14-1144
26. Ursin G, Stenbeck M, Chang-Claude J, Gunter M, Kaaks R, Kampman E, et al. Data must be shared—also with researchers outside of Europe. Lancet. (2019) 394:1902–3. doi: 10.1016/S0140-6736(19)32633-9
28. Principles for Digital Development. (2020). Available online at: https://digitalprinciples.org/ (accessed September 27, 2020).
29. The United Nations Development Group. Data Privacy, Ethics and Protection Guidance Note on Big Data for Achievement of the 2030 Agenda. UNDG (2017). Available online at: https://unsdg.un.org/sites/default/files/UNDG_BigData_final_web.pdf (accessed September 27, 2020).
30. World Economic Forum. Data Collaboration for the Common Good. Enabling Trust and Innovation Through Public-Private Partnerships. Insight Report. World Economic Forum (2019). Available online at: http://www3.weforum.org/docs/WEF_Data_Collaboration_for_the_Common_Good.pdf (accessed September 27, 2020).
34. Wyber R, Vaillancourt S, Perry W, Mannava P, Folaranmi T, Celi LA. Big data in global health: improving health in low- and middle-income countries. Bull World Health Organ. (2015) 93:203–8. doi: 10.2471/BLT.14.139022
35. Beck EJ, Gill W, De Lay PR. Protecting the confidentiality and security of personal health information in low- and middle-income countries in the era of SDGs and Big Data. Glob Health Action. (2016) 9:32089. doi: 10.3402/gha.v9.32089
37. Bengio Y, Ippolito D, Janda R, Jarvie M, Prud'homme B, Rousseau JF, et al. Inherent privacy limitations of decentralized contact tracing apps. J Am Med Inf Assoc. (2020) 27:ocaa153. doi: 10.1093/jamia/ocaa153
38. SPARC. Research Funder Data Sharing Policies. (2020). Available online at: https://sparcopen.org/our-work/research-data-sharing-policy-initiative/funder-policies/ (accessed September 23, 2020).
39. World Health Organization. WHO Forum on Health Data Standardization and Interoperability. WHO (2012). Available online at: https://www.who.int/ehealth/WHO_Forum_on_HDSI_Report.pdf?ua=1 (accessed September 27, 2020).
41. World Meteorological Organization. Data Exchange and Technology Transfer. (2020). Available online at: https://public.wmo.int/en/our-mandate/what-we-do/data-exchange-and-technology-transfer (accessed May 26, 2020).
42. World Health Organization. WHO Policy on the Use and Sharing of Data Collected by WHO in Member States Outside the Context of Public Health Emergencies. (2020). Available online at: https://www.who.int/about/who-we-are/publishing-policies/data-policy (accessed September 27, 2020).
43. World Health Organization. Policy statement on data sharing by WHO in the context of public health emergencies (as of 13 April 2016). Weekly Epidemiol Record. (2020) 91:237–40. Available online at: https://apps.who.int/iris/handle/10665/254440 (accessed November 1, 2020).
45. World Health Organization. EPI-BRAIN. (2020). Available online at: https://www.epi-brain.com/ (accessed September 23, 2020).
46. World Health Organization. Global COVID-19 Clinical Data Platform for Clinical Characterization and Management of Hospitalized Patients With Suspected or Confirmed COVID-19. (2020). Available online at: https://www.who.int/teams/health-care-readiness-clinical-unit/covid-19/data-platform (accessed September 21, 2020).
Keywords: data sharing, global public goods, public health, public health data types, global health
Citation: Schwalbe N, Wahl B, Song J and Lehtimaki S (2020) Data Sharing and Global Public Health: Defining What We Mean by Data. Front. Digit. Health 2:612339. doi: 10.3389/fdgth.2020.612339
Received: 30 September 2020; Accepted: 11 November 2020;
Published: 14 December 2020.
Edited by: Wendy Chapman, The University of Melbourne, Australia
Reviewed by: Karmen S. Williams, City College of New York (CUNY), United States
Copyright © 2020 Schwalbe, Wahl, Song and Lehtimaki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Nina Schwalbe, firstname.lastname@example.org
†These authors have contributed equally to this work and share first authorship