Having Our “Omic” Cake and Eating It Too?: Evaluating User Response to Using Blockchain Technology for Private and Secure Health Data Management and Sharing

This paper reports on end users' perspectives on the use of a blockchain solution for private and secure individual “omics” health data management and sharing. This solution is one output of a multidisciplinary project investigating the social, data, and technical issues surrounding application of blockchain technology in the context of personalized healthcare research. The project studies potential ethical, legal, social, and cognitive constraints of self-sovereign healthcare data management and sharing, and whether such constraints can be addressed through careful design of a blockchain solution.


Introduction
There is a news story almost every day about how individuals' personal data are being harvested, shared with and used by third parties without their consent and in ways that have real potential to cause harm.
The result is an erosion of user trust and a reluctance to use services that gather sensitive information (Edelman, 2019). This remains true for a significant percentage of individuals even if they could greatly benefit from receiving a personalized health service that they can use to understand their health risks and maintain or improve their overall health (Betts and Korenda, 2018; Shabani, Bezuidenhout, and Borry, 2014). Individuals' reluctance may stem from uncertainty about how health data services will store and use their data over time (Sanderson et al., 2016; Shabani, Bezuidenhout, and Borry, 2014). Recent revelations about how Facebook, 23andMe, and other platforms use individuals' sensitive personal data validate concerns that consumers' data may be shared with third parties without their informed consent (see, e.g., Rosenberg, 2018; Geggel, 2018).

Background Literature
Advancing personalized medicine: Why both data sharing and data privacy matter
Omics science, including genomics, proteomics, exposomics, phenomics, microbiomics and metabolomics (Horgan and Kenny, 2011), provides insights into health at a molecular level never before possible and has the potential to radically alter healthcare. Omic science establishes a sophisticated, systemic understanding of the "complex, longitudinal, and dynamic nature of biological networks (and their fluctuations in response to social/environment exposures) that fundamentally govern human health and disease" (Holmes et al., 2010, 327). Indeed, Bencharit (2012) asserts that "the new era of omics studies . . . may lead to a true clinical application of personalized medicine." The undeniable social good that omics could do is not without challenges and risks, however. Privacy for participants in research and in clinical applications is a major concern because "[b]y nature, the genome encodes a sensitive yet heritable signature of an individual that is marked by genetic variation reflecting one's ancestry and disclosing one's susceptibility to health and diseases" (Shi and Wu, 2017, 61). Both Canada and the United States have passed genetic non-discrimination acts (e.g., Genetic Non-Discrimination Act, S.C. 2017, c. 3; Genetic Information Non-Discrimination Act, 29 USC §216(e), 29 USC §1132) in light of the potential medical, professional, legal and social consequences that individuals might face should their genomic information be disclosed. Other omic information has the same potential for abuse. Given the very grave potential consequences of unauthorized disclosure of omic data, protecting the privacy of individuals is of paramount importance.
Family privacy is also a concern, since omics science extends not just to the individual, but to their family as well (Shi and Wu, 2017, 61). After all, genes are heritable: a breach of one individual's genetic data may easily reveal private information about those who share that individual's genes. "Clinical genetics guidelines [in the United Kingdom] conceptualise genetic information as confidential to families, not individuals" (Dheensa, Fenwick and Lucassen, 2017, 1).
Beyond consideration of the consequences of privacy breaches, however, lies a deeper reason to ensure that individuals' privacy is protected. In a world in which we increasingly live online, we are our data and our data are us (Cheney-Lippold, 2018). The philosopher and information ethicist Luciano Floridi views consequentialist ethical frames of reference, which judge actions as moral or not based on their outcomes, as insufficient (Floridi, 1999). He writes that, "Typically, privacy and confidentiality are treated as problems concerning S' ownership of some information, the information being somehow embarrassing, shameful, ominous, threatening, unpopular or harmful for S' life and well-being, yet this is very misleading, for the nature of the information in question is quite irrelevant. It is when the information is as innocuous as one may wish it to be that the question of privacy acquires its clearest value. The husband, who reads the diary of his wife without her permission and finds in it only memories of their love, has still acted wrongly. The source of the wrongness is not the consequences, nor any general maxim concerning personal privacy, but a lack of care and respect for the individual, who is also her information" (Floridi, 1999, p. 53). Thus, we see in Floridi an approach that views an individual's data as equivalent to the individual themselves, which suggests that to abuse a person's data is tantamount to an assault on their physical being.
Simply locking data away, then, is a poor solution. "Sharing genetic findings is vital for accelerating the pace of biomedical discoveries and for fully realizing the promises of the genetic revolution" (Erlich and Narayanan, 2014, 409). Thus, if omic research is to be utilized to its full potential, solutions must be found to protect privacy while still permitting data sharing and usage.
Blockchain Technology: A possible solution to private & secure data sharing
Blockchain's design and networked, distributed, autonomous, and global operation establish it as a disruptive technology with social, political, and economic implications that far exceed those of other emerging technologies with many potential applications (Economist, 2015; Casey & Vigna, 2018). One of the key applications identified has been in connection with decentralized management of data and privacy. Swan (2015) notes that, by managing electronic medical records in the blockchain, they "could be analyzed but remain private, with an embedded economic layer to compensate data contribution and use." She also envisions "a standardized secure mechanism for digitizing health data into a health data commons" where patients could consent to making their health data available for research use in exchange for cryptocurrency. Benchoufi and Ravaud advocate for blockchain to address "reproducibility, data sharing, personal data privacy concerns and patient enrolment" (2017, 335), and emphasize "the transparency of the Blockchain ledger - owned by no one, publicly writable by anyone […] users do not need any third party to trust the system" (2017, 338). Gropper (2016), in turn, proposes the application of a decentralized identity management solution within the healthcare sector.
With the level of trust that it can enforce, blockchain also could be considered a path through the complexities of user consent. Meaningful consent is critical if health data is to be used both ethically and legally, with "[C]onsent [being] a cornerstone of both biomedical research ethics and data protection law" (Thorogood and Zawati, 2015, 693). A number of studies have aimed to apply blockchain technology to giving individuals direct control over access to their medical records and consenting to secondary use of their health data for research purposes. Ekblaw et al. (2016), Ivan (2016), Broderson et al. (2016), Li et al. (2017), Linn and Koo (2016), and Dagher et al. (2018) discuss blockchain-based medical records systems that incorporate user-defined permissioning while still storing patient records in a provider's existing systems. Yue et al. (2017) propose the Healthcare Data Gateway application to allow users to control their own health data and permission its use for research purposes. Zhang et al. (2018) present a decentralized application for patient-defined access to structured pieces of their health data record, and Patel (2018) discusses a blockchain-based framework for medical image sharing that allows for patient-defined access permissions. Finally, Hofman et al. (2018) discuss a blockchain prototype for managing user consent in the use of clinical data for precision health research.
While blockchain could be a solution to some of the challenges of securing and protecting patients' health data (Engelhardt, 2017), giving patients greater control over their data using this technology is not without its challenges (Gordon and Catalini, 2018). The cryptography and networking involved in blockchain technology can make it difficult for even IT specialists to understand, let alone users (Lunggren, 2019). Many patients already have difficulty navigating the healthcare system, which raises questions about whether placing upon them the added burden of managing their own healthcare records, and the associated consents to access and use of the data within these records, will truly generate a net positive effect (Gordon and Catalini, 2018). Omic data is particularly challenging in terms of meaningful, informed consent. Indeed, omic data represents an extreme form of "the transparency paradox […] If notice (in the form of a privacy policy) finely details every flow, condition, qualification, and exception, we know that it is unlikely to be understood, let alone read. […] An abbreviated, plain-language policy would be quick and easy to read, but it is the hidden details that carry the significance" (Nissenbaum, 2010, 36). After all, omic research techniques, and therefore research purposes, advance quickly, making it challenging to explain the purpose, risks, and benefits of studies in an accessible way. Indeed, it is difficult to even predict "all the informational benefits and risks of research with complex genomic information" (Thorogood and Zawati, 2015, 694). Moreover, some bioethicists also worry about the possibility of coercion if patients are financially incentivized to share their personal health data (Gammon, 2018). A blockchain solution can give users greater control over access to their health records and consent to use of their health data, but will they be able to navigate both the complexity of consent and a novel technology? Searching for answers matters.

Methodology
We followed a multi-method, two-stage methodology to investigate whether blockchain technology can be used to protect the privacy of individuals' personal health data and enable secure data sharing without introducing cognitive and other barriers that might prevent users from understanding and effectively navigating such systems.

Stage One: Designing a Self-Sovereign Health Data Management Solution
In the first stage, we set out to design a technical artefact, in the form of a blockchain solution that fundamentally respects users' right to privacy and provides them with the same level of choice and control over the sharing of their data as they would expect over the sharing of their bodies, with a view to exploring our research question. We decided that the blockchain protocols that came closest to our vision were those that supported self-sovereign identity. Self-sovereign identity (SSI), a variant of decentralized digital identity, leverages the affordances of blockchain technology to increase users' control of their identities in the digital world (Allen, 2016). It implies that individuals' identities and the data associated with them are neither bestowed by, revocable by, nor owned by any authority save the individual herself. Christopher Allen writes that "[s]elf-sovereign identity is the next step beyond user-centric identity […] the user must be central to the administration of identity [with] true user control of that digital identity, creating user autonomy" (Allen, 2016). Kaliya Young and Heather Vescent (2018) explain that "Self Sovereign Identity is a new technology layer that enables individuals and organizations to assert their own identity." Tobin and Reed (2017) describe Self-Sovereign Identity as "the result of trying to satisfy three basic requirements: 1. Security - the identity information must be protected from unintentional disclosure; 2. Control - the identity owner must be in control of who can see and access their data and for what purposes (see Figure 1); and 3. Portability - the user must be able to use their identity data wherever they want and not be tied into a single provider." Tim Bouma (2019) argues that in "the old (centralized and federated) models the locus of control was between the other parties that could make decisions about me, whether I was in the picture or not." The basic tenets of SSI can be summarized at a high level as follows: 1) every individual human being is the original source of their own identity; 2) identity is not an administrative mechanism for others to control; and 3) each individual is the root of their own identity and central to its administration (IBM, 2018). This approach differs markedly from Privacy by Design (Cavoukian, 2011) and the Global Alliance for Genomics and Health (GA4GH)'s Framework for Responsible Sharing of Genomic and Health-Related Data (GA4GH, 2016), wherein data stewards, research ethics boards, and researchers still make decisions about a data subject's data. By contrast, with self-sovereign identity the locus of ownership and control of decision-making shifts to the individual.
Having decided upon an SSI-based solution design, we created a design artefact using prototyping and agile software development. The agile approach draws upon a group-based, collaborative software development methodology that uses iterative, highly context-sensitive requirements identification, design, implementation, and evaluation. Agile development typically involves short, intense sprints wherein cross-functional teams gather in "scrums" to identify requirements, develop code, and evaluate the functionality of a proof-of-concept software application (Agile Alliance, 2013).
Given the focus of the solution design on shifting the locus of control, custody and decision-making about health data to users of the solution, we also employed user-centered design (UCD) as the general methodological approach to the design and implementation of our prototype. UCD methodology is also widely used when designing health care services (LeRouge & Wickramasinghe, 2013; Xie & Carayon, 2015). UCD ensures the involvement of users and the inclusion of their perspectives in the research, development and assessment phases of a design (Ghulam Sarwar Shah and Robinson, 2006).
The architecture of the resulting technical artefact, which was developed on Hyperledger Indy (HLI), is shown in Figure 2. HLI comprises four basic components: 1) verifiable claims, 2) peer-to-peer agents, 3) decentralized identifiers, and 4) a distributed ledger. A verifiable claim is a "machine-readable statement made by an entity that is cryptographically authentic (non-repudiable)" (W3C, 2019). A verifiable claim is made when a "holder" agent sends a cryptographic credential (provable, digitally signed data) received from an "issuer" agent to a "verifier" agent across a peer-to-peer connection, and the receiving verifier agent is able to cryptographically prove that the credential is authentic (see Figure 4). Each interacting agent has a decentralized identifier (DID), a new type of identifier for verifiable, "self-sovereign" digital identity that is independent from any centralized registry, identity provider, or certificate authority (W3C, 2019; DIF, 2019), which is used to facilitate communication in each pairwise connection. HLI employs privacy-enhancing techniques such as selective disclosure and zero-knowledge proofs, a cryptographic technique that allows an agent to prove that they know something, such as a password, without revealing what they know. A distributed ledger, that is, a ledger shared across a set of nodes (i.e., network endpoints) and synchronized between the nodes using a consensus mechanism (e.g., Practical Byzantine Fault Tolerance), is used to store public DIDs (e.g., of issuer agents), data schemas for credentials, credential definitions, and revocation registries, all of which enable the cryptographic verification of claims.
Our specific solution design incorporated four main actors: 1) MYco, which is an issuer of individuals' health credentials; 2) Ethics Review Boards (ERB), which issue ethics credentials to researchers so that individuals can verify that their data will be handled properly when shared; 3) Researchers, who apply to the ERB to conduct research projects and market these to data owners; and 4) Data Owners, MYco clients who hold health credentials issued by MYco that are shared with researchers with the data owner's consent. The solution was designed to support a number of steps in the process of providing individuals with control of their health data and enabling privacy-preserving and secure data sharing in support of personalized health research.
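To make the idea of a zero-knowledge proof mentioned in the description of the platform's privacy-enhancing techniques more concrete, the following is a minimal sketch of a Schnorr-style proof of knowledge, made non-interactive by hashing the transcript. It is an illustration only and is not the credential scheme used by Hyperledger Indy; the group parameters are toy values chosen for readability rather than security, and the function names (`prove`, `verify`) are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters for readability only (assumption): NOT a secure group choice.
P = 2**127 - 1      # a Mersenne prime used as the modulus
G = 3               # base element
ORDER = P - 1       # exponents may be reduced mod P - 1 (Fermat's little theorem)


def _challenge(public_value: int, commitment: int) -> int:
    """Derive the challenge by hashing the public transcript."""
    data = f"{G}|{public_value}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def prove(secret: int) -> Tuple[int, int, int]:
    """Prove knowledge of `secret` behind public_value = G**secret mod P."""
    public_value = pow(G, secret, P)
    nonce = secrets.randbelow(ORDER)            # fresh randomness masks the secret
    commitment = pow(G, nonce, P)
    c = _challenge(public_value, commitment)
    response = (nonce + c * secret) % ORDER
    return public_value, commitment, response   # the secret itself is never sent


def verify(public_value: int, commitment: int, response: int) -> bool:
    """Check that G**response == commitment * public_value**c (mod P)."""
    c = _challenge(public_value, commitment)
    return pow(G, response, P) == (commitment * pow(public_value, c, P)) % P


# Usage: a secret derived from a password (hypothetical example value).
secret = int.from_bytes(hashlib.sha256(b"example password").digest(), "big") % ORDER
print(verify(*prove(secret)))
```

The verifier sees only the public value, the commitment and the response, yet the check succeeds only if the prover knew the secret exponent, which mirrors the "prove that they know something without revealing what they know" property described above.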
The first step in the process of data owners sharing their data and receiving rewards for their contributions occurs when they request cryptographic credentials for each of their biomarkers from MYco. MYco then issues these credentials to the personal health wallet (agent) of each individual data owner (MYco client).
The solution also supports researchers' application to an Ethics Review Board (ERB) for ethics certificates to conduct their research, and once approved, the sending of the ethics certificate in the form of a cryptographic credential to the wallet (agent) of the applicant/researcher. Once they have ethics approval, researchers are then able to use the solution to advertise their research projects to data owners.
When a data owner notices a research project in which s/he would like to participate, this initiates a "handshake" process using a peer-to-peer connection in which data owners verify that the research project has the necessary ethics approval and researchers verify that data owners meet their study criteria and consent to the sharing of their data. The process completes when data owners share the specific health data (e.g., biomarkers) needed for the study and researchers send data owners a reward for their participation in the study in the form of a cryptographic credential. Figure 3 provides a visual overview of the "handshake" process that takes place between data owners and researchers. A key feature of the entire handshake process is its alignment with the motivating theoretical and design principles; that is, the solution is designed to ensure that the identity of data owners is never revealed to researchers, no personal health information is ever recorded or stored on the blockchain to prevent conflicts with privacy laws and reduce the potential for privacy breaches, and data owners remain in control of their personal health information at all times, revealing only as much information as they feel comfortable with given their assessment of the risk-benefits of the transaction.
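The issuance and handshake steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the prototype's implementation and not the Hyperledger Indy API: a toy Schnorr-style signature stands in for Indy's credential scheme, there are no DIDs, ledger or zero-knowledge proofs, study criteria are reduced to a set of required biomarker names, and all names (`issue`, `verify`, `handshake`, `Credential`, the biomarker labels) are hypothetical.

```python
import hashlib
import json
import secrets
from dataclasses import dataclass

# Toy signature parameters (assumption): for illustration only, NOT secure.
P, G = 2**127 - 1, 3
ORDER = P - 1


def keygen():
    """Return a (private key, public key) pair for an issuer."""
    sk = secrets.randbelow(ORDER - 1) + 1
    return sk, pow(G, sk, P)


def _digest(r, claims):
    data = str(r).encode() + json.dumps(claims, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


@dataclass(frozen=True)
class Credential:
    claims: dict   # contains no identifying information about the holder
    r: int
    s: int


def issue(sk, claims):
    """Issuer signs a set of claims, producing a credential for a wallet."""
    k = secrets.randbelow(ORDER - 1) + 1
    r = pow(G, k, P)
    s = (k + _digest(r, claims) * sk) % ORDER
    return Credential(claims, r, s)


def verify(pk, cred):
    """Anyone holding the issuer's public key can check the credential."""
    e = _digest(cred.r, cred.claims)
    return pow(G, cred.s, P) == (cred.r * pow(pk, e, P)) % P


def handshake(ethics_cred, erb_pk, owner_creds, myco_pk, required, consent, researcher_sk):
    """Return (shared claims, reward credential), or None if any check fails."""
    # 1. The data owner verifies the researcher's ethics approval.
    if not verify(erb_pk, ethics_cred):
        return None
    # 2. The researcher verifies that the owner holds the required biomarkers.
    needed = [c for c in owner_creds if c.claims.get("biomarker") in required]
    if {c.claims["biomarker"] for c in needed} != set(required):
        return None
    if not all(verify(myco_pk, c) for c in needed):
        return None
    # 3. Nothing is shared without the owner's explicit consent.
    if not consent:
        return None
    # 4. Only the required biomarkers are shared; the researcher issues a reward.
    shared = [c.claims for c in needed]
    reward = issue(researcher_sk, {"reward": "participation",
                                   "study": ethics_cred.claims["study"]})
    return shared, reward


# Usage with hypothetical values.
erb_sk, erb_pk = keygen()
myco_sk, myco_pk = keygen()
researcher_sk, researcher_pk = keygen()
ethics = issue(erb_sk, {"study": "demo-study", "approved": True})
wallet = [issue(myco_sk, {"biomarker": "vitamin_d", "value": 42}),
          issue(myco_sk, {"biomarker": "glucose", "value": 5.1})]
result = handshake(ethics, erb_pk, wallet, myco_pk, {"vitamin_d"}, True, researcher_sk)
```

Issuing one credential per biomarker, as in the first step of the process, is what allows the data owner in this sketch to disclose only the biomarkers a study requires while leaving the rest of the wallet unshared.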

Qualitative data analysis
As described in the previous section, validation of the solution design was done throughout the process of designing and implementing a prototype, with final evaluation of the prototype relying upon data gathered from three focus groups. Focus groups are suitable for exploring attitudes towards new phenomena such as blockchain, as the relatively open-ended discussions can sensitize researchers to unrealized issues and increase the comprehensiveness of large-scale surveys conducted afterward (Morgan, 2005). In total, 26 individuals participated in our study, with eight in the first focus group, eight in the second group and ten in the third group. The focus groups comprised individuals aged 25-60 years old recruited from an online graduate student group. Among the participants, three had recently finished their advanced degrees (Master's/PhD), and the rest were enrolled in a Master's or PhD program. Five participants had an educational background in information management or archival science, and eight participants were enrolled in graduate programs in the medical field. All the participants had been patients at some point in life.
During the focus group, participants were primed with a presentation that contained information about the following topics: consent, management, privacy of personal health data and blockchain technology. Then they were shown wireframes of the user interface of the prototype solution and asked a set of semi-structured questions relating to their understanding of blockchain technology, their views on data privacy and sharing, and their thoughts on the user interface. The focus groups were audio recorded with participant consent. The audio recordings were transcribed verbatim and then the recordings were destroyed. Transcriptions were pseudonymized and coded for analysis using NVivo qualitative analysis software. The research team read the participants' responses and extracted 6 main codes as shown in Table 1.

Code | Description
Sharing | Health data sharing with or without consent, who to share with and benefits and risks of sharing (not including ethical issues and concerns).
Privacy | Discussion about privacy issues, ownership and anonymization.
Systems design | The usability and design of the platform; suggestions for improvement.
Trustworthiness, security and comfort | Discussion about the trustworthiness, security and level of comfort with using a decentralized system.

Findings
Participants' responses flag a number of unresolved challenges to the adoption of blockchains as solutions for private and secure data sharing in healthcare, as well as specific areas for improvement of our solution design. The following sections provide a high-level summary of participants' feedback.
Focus group participants were generally aware of the challenges of data sharing across healthcare providers. For example, they noted that hospitals could not easily share with one another and that moving across jurisdictions often meant losing access to their health records. They also were aware of cases when very sensitive health information had been inadvertently exposed. Individuals saw value in using a blockchain-based solution as a means to support privacy-preserving data sharing. However, some individuals expressed reluctance to use such a platform until it has been thoroughly tested and more widely adopted. Areas of ongoing concern included who they would be sharing with and for what purpose. Generally, participants expressed willingness to consent to having university researchers use their data, or to share it with government agencies in the event of a public health crisis, but were reluctant to share with pharmaceutical companies or insurers for fear of being discriminated against. This highlights the importance of providing upfront information about the type of organization requesting access and a clear explanation of its reason for wanting to use individuals' health data. Individuals also wanted assurances that researchers or other users of their data would not be able to reuse data for another purpose without their consent or assemble data about them from disparate sources to create a health profile about them (a "mosaic effect" [Wittes, 2011]). Participants were not universally hesitant to engage with a more experimental platform; as one focus group participant put it: ". . . someone has to start, right? There would be falls and all that and there would be corrections, I'm willing to be on the beta."
One cognitive constraint leading to possible lack of trust was in connection with the way that the cryptographic proofs operated. Focus group participants expressed a lack of understanding and a need for more transparency about the manner in which cryptography protected privacy and validated claims, with one participant referring to the proofs as a "black box". This suggests a need for informational tools and techniques, such as decision aids that could support participants' choices to engage with the platform (Williams et al., 2014), or algorithmic transparency. Unlike in artificial intelligence (AI) solutions, where solution designers have often resisted requests to reveal their algorithms in order to protect their interests (Diakopoulos, 2016), there is a longstanding practice of algorithmic transparency in cryptography. Kerckhoffs's Principle, one of the guiding axioms of cybersecurity solution design, specifies that a cryptosystem should be secure even if everything about the system, except the private key, is public knowledge (Stewart, Tittel & Chapple, 2008). Thus, cybersecurity solution designers have much stronger incentives for revealing their cryptographic algorithms than do AI researchers, suggesting that this cognitive barrier can be overcome.
Focus group participants generally liked the idea of having greater control and custody of their personal data, though one participant did express concern: "My first impression was 'crap, now I have to keep track of it all'." Another participant said they would share the power of control and consent with immediate family members in case anything happened to them. Universally, participants did not want to bear the risk, typical of current blockchain solutions, of losing access to their data if they lost their private cryptographic key. They were all willing to give up some self-sovereignty for the ability to have a way to regain access.
In terms of usability of a decentralized cryptosystem, individuals expressed a number of concerns. In particular, some participants identified the risk of exclusion of non-tech-savvy and older users. However, another participant in an older age demographic noted: ". . . actually today older people have more access to smartphones than they have had in the last 5 or 10 years." Another noted, "I think it will come to a stage that it will be much easier to use for older people." Participants also expressed concern about the understandability of consent terms and conditions, pointing to the fact that these statements can be very complex and difficult to interpret, which is consistent with the findings of previous studies. They requested that terms and conditions be presented in understandable language upfront in the handshake process, not at step five as in the technical prototype they were shown.
In relation to the offering of a reward, most individuals felt comfortable with this idea but did express some concern about potential effects in relation to the scale and granularity of data being shared and the use to which the data would be put. For example, one study participant wondered: "would that become a barrier for researchers who didn't have that kind of [money], that a company has to compensate people, and how would that affect the landscape of information sharing?" Another said, "I would also worry that the outcomes would then be skewed because if you're putting forth opportunities for compensation, then especially if you're talking $50 or less, who are you attracting? Are you really attracting a broad enough range of people that have data that's applicable to whatever the study is, so I don't like that idea." As a result, participants generally expressed a preference for smaller rewards functioning more as honoraria than as market-based compensation. Others wanted to know more about the form a reward would take. For example, if the reward were provided in the form of a gift card, participants wondered if they could be traced back to the research study. As a result, some participants expressed a preference for a reward in the form of cryptocurrency, like Bitcoin, or even food. Overall, users noted that they had higher levels of trust in the process knowing that a research ethics board had reviewed the study design, including the issue of compensation, even if that meant the platform was not fully decentralized.

Conclusion
No one solution can solve the challenges of protecting participants' privacy, and of respecting their autonomy and dignity, in complex, revealing areas such as omic science. However, blockchain technology could address a number of the technical and social limitations of our current systems for onboarding participants and collecting, storing, and disseminating data. As Dove, Ozdemir, and Joly (2012, 439) remind us, "open innovation models, such as open access, open source, expert sourcing, and patent pools" are one of the primary means of "overcoming the 'transfer problem' in omics research that continues to hinder the full realization of concrete applications for human health". One of the major hindrances to the full embrace of open innovation in omic science is the very real danger of a breach of patient privacy should their data be subject to unauthorized access or disclosure. Blockchain technology could let us have our omic cake and eat it too, by permitting the data to be studied while remaining private. Nevertheless, the above evaluation flags a number of ongoing areas of concern and future research challenges.
Betts, D., & Korenda, L. (2018, September 25)

Table 1: Analytic codes extracted from focus group participant statements.