HYPOTHESIS AND THEORY article
Blockchain, Wikis, and the Ideal Science Machine: With an Example From Genomics
- Department of Learning and Instruction, Graduate School of Education, University at Buffalo, Buffalo, NY, United States
Until now we have lacked a technology that could capture the processes of science in a way that preserves every step and creates a full audit trail, such that nearly complete transparency is maintained. Combining wiki technologies and blockchain provides us with a new opportunity: to create a “science machine” that folds all processes together, from instrument to publication, and enables science to proceed as never before, with greater transparency, accountability, and reproducibility. Here, I discuss the theory and method that might be combined to create such a science machine, with special reference to the nature of genomic science.
Scientific Institutional Reality
Scientific progress is a non-linear process. It resembles more an edifice whose parts are constantly being added to, re-arranged, sometimes torn down, and even potentially replaced. The institutions of science are constantly in flux as we attempt to model nature and understand her laws over time, keeping even our most cherished theories ever contingent and liable to contradiction, ready to tear down whole wings of the edifice if need be. Empirical “research” can be traced in essence back to the ancient Greeks, but this did not constitute “science” (Aristotle, 1938). Empiricism helps us to discover phenomena, but without more it fails to provide the bases for hypotheses and theory—the bedrock of modern science. While empiricism in its nascent forms existed well before modern science, the emergence of philosophical societies and especially their journals in what we call the Enlightenment marks the beginnings of modernity and scientific institutions (Peters et al., 2012). Publication of results that can be tested by a community of researchers, as well as the cataloging of raw data accessible to that community allows for a sea-change in society and the emergence of the scientific method writ large (Koepsell, 2010).
Over the past century, the nature of scientific communication has shifted dramatically, and the manner and accessibility of those communications have, I have argued, threatened to undermine the steady progress of science. The laws of nature and her objects are facts, and the world we catalog, uncover, attempt to understand, and relate as part of our theorizing remains unchanged, but these have become less accessible due to the application of intellectual property over the substratum of the world, and the locking up of scientific communications behind paywalls. By way of analogy: nature's evolutionary processes evolved genomes over the course of billions of years. Built upon the processes of survival and sex, current genomes represent an imperfect catalog of nature's successes and failures, writing the code for those into the chemicals that compose life. We can learn and understand a great deal about the history of life by unlocking that code, as long as we develop better means of reading it and deciphering it. This venture has been made easier by the development of better, cheaper tools to read the code, and thankfully genomes of all species maintain clues about their pasts, their preceding generations, and their adaptations and extinctions. Combined with archeology, new sequencing techniques give us an increasingly clear view into the evolution of life, and the relations of species and individuals to each other.
The records of scientific institutions, preserved in raw data and in journal publications, could provide us the tools to understand the universe as a whole if we captured them properly, and allowed for their greater use by a broader community with fewer impediments. The trends that propelled scientific advances for two centuries risk being undermined by both social and technical phenomena, and yet both society and technology also hold the promise of rescuing these institutions.
Science and Its Ideals
“Science” is a process undertaken by a diffuse community working under an organic code of unspecified rules developed over time without coordination or a regulating body. Yet it works. Scientists working on related research problems do experiments or studies geared toward testing hypotheses, which are guesses about what results might occur due to ideas regarding why things work as they do, or why we observe what we do. Early “natural philosophers” realized that they could better approach the understanding they sought if they described what they were doing to others and looked for their peers to find corroboration, fault, or better explanations for their studies. A growing and geographically diffuse group of peers needed a means of review, of ongoing dissemination and discussion of research programs. Scientific publications began to emerge, beginning in 1665 with the French Journal Des Sçavans and the English Philosophical Transactions of the Royal Society. Publications afforded scientific researchers a means of disseminating results and inviting further testing, and thus either confirmation or falsification of hypotheses (Burns, 2003). Besides serving communication among scientific peers, journals and conferences afforded laypersons an opportunity to view the manner of the progress of the sciences at a high level, opening its processes to the world for further corroboration or dispute, and laying bare the foundations for the institution that would most alter our world and improve the most lives.
I have argued that “science” as embodied in the history of its publications and underlying data, is a sort of hypertext. Like a hypertext, it evolves, with traces of its references to other texts etched in its footnotes and margins, preserving at its best the halting flow of inquiry, error, understanding, and theory that emerges over time among its community. The emergence of Encyclopedias was an Enlightenment attempt to create some sort of static representation of current knowledge, coalescing the fragmented publication of state of the art science at any one historical moment, but because of the nature of publishing, they were quickly outdated and required new editions at a regular pace (Headrick, 2000). Only recently has technology offered a means of fixing the epistemological barrier posed by traditional scientific publication processes. Scientific journals posed other problems, such as the costs of their maintenance, throwing most into dustbins over time, relegated at best to dusty shelves, at worst, to total loss. Even the internet, which was thought to pose the ability to preserve all knowledge, creating the ultimate hypertext, has proven unreliable at best as pages that fail to be maintained also fall into the abyss.
Ideally, we could trace the course of a scientific discipline by reviewing the web of connections among articles in journals associated with that discipline. Doing so over the course of time, since science's inception, would show a multitude of references linking studies to each other, laying the foundations for new studies, and at times providing the bases for paradigm shifts as old theories fall to new observations. Hypertext was first conceived to solve the problem of reference and preservation of knowledge. It forms even now the basis of the world wide web, whose fundamental language is HyperText Markup Language still. Scientific journals could, with the proper technology, function as idealized, preserving every link and every path, allowing for us to trace the true development of the current state of a discipline, publishing for a broad audience of investigators and laypersons the history of knowledge until now.
Because science is a social phenomenon, it is imperfect and prone to both error and manipulation. Pathological science has, at times, threatened to derail disciplines and undermine public confidence. Humans create and edit texts, and they have also historically been the keepers of the “raw data” that they organize, analyze, and report upon. Significant lapses due to human error have at times undermined studies, including, famously, Nobel Prize-winning Robert Millikan's study of the charge of the electron. We learned only later, when his notebooks were finally found, that the results that were reported did not track the observations. As Richard Feynman noted, Millikan's lapses, caused by his expectations and lack of disinterest, led others into similar lapses: “Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that…” To gain greater precision, to overcome the effects of human psychology, and to create a more perfect record of scientific progress that enables better understanding of nature for a broader audience, we can begin to envision a sort of four-dimensional hypertext: by linking the tools of observation to the publications of those observations and to commentary and analysis by peers over time, the idealized structure of the scientific method could begin to be implemented.
The oil-drop apparatus should have fed uninterrupted and unmediated data into a notebook that could not be altered, and that would have been available to editors reviewing the article submitted reporting the experiment and reaching its conclusions, and available to anyone wishing to challenge or try to reproduce the experiment and its findings.
The ideal of science is never realized in practice, and no technology could ensure it either. Humans are inextricably involved in its processes and institutions, and must forever be for it to be science as such. Our tendencies toward distrust, hubris, deceit, prejudice, and a million additional failures make us forever likely to help lead studies astray. But we can attempt to allay these errors, to prevent their propagation, to create new institutions and use new technologies to make our human frailties less successful over time. Science works, remarkably, over the long term, tens of years or more, and the continued, generally successful adherence to its ideals by the community of researchers and their supporters helps to ensure that we move forward on average and that errors and harms are mitigated. Let's look briefly at some of the pressures causing some of those harms and then explore how new technologies might be adapted to help research progress more smoothly, more transparently, and to greater benefit, with particular attention to issues in genetic science.
Science, Society, and Scientists: Sources of Error
Perhaps at some point, science was pursued by men of leisure, those with resources and time, as well as curiosity, and whose livelihoods were not dependent upon their investigations. Over time, however, science became a profession, and even more recently, a business. To continue, it needs money, and institutions that evolved around it have become profit-making ventures. As a profession, the pressures that exist have also changed since the first professors of the sciences began their academic careers. As competition has heated up for academic positions and status, for grant money, and for profits from publications, so too have the incentives grown to cut corners, to bluster, or even to deceive (Greenberg, 2007; Gardner, 2009). Millikan may well have believed that his correctness about the charge of an electron was so certain that it justified his excluding results, and a consequence, besides the Nobel Prize, was that his correct judgment was accepted as true regardless of the flaws of his study. The prize came then, as now, with money and we might wonder if it was deserved. But many of the pressures faced then in academia exist in worse forms today. The need to publish, the pursuit of higher H-index scores, the shrinking potential for tenured positions, the flood of new, hungry students, and the disappearing pool of research funds add pressures that no human could ignore, and that no technology can solve. It is likely that these pressures, and the enormous mass of scientific data and publication that now occurs, make the reasons to “cheat” increase and the chances of being caught slight. Regardless, cheating still harms scientific progress, and numerous examples exist that show this. Added to this is the more frequent use of press releases accompanying journal publications, to try to enhance sales and bring press attention to “breaking” scientific news (Bucchi, 1998).
The pressures of both the professions of the sciences and the profits necessary to keep it going add up to recurring crises, costs, and calamities, some of which have even cost lives. How can we address these issues and help propel scientific progress again in a positive direction? (Wood, 1904).
Salami science is but one of the possible results of the increasing pressures to “publish or perish” and the competition to attain rank in academia, as well as for increasingly rare permanent positions (Editorial, 2005—Nature Materials). Researchers are slicing up existing single studies into smaller, not necessarily coherent or self-contained parts in order to publish incremental papers, rather than waiting until the successful “end” or natural conclusion of some part of a broad study. As well, they may rehash their work in other forms for different publications to increase their publication numbers.
Can we merge the tools we have created in the past decade or so into a solution that helps, if not to dissuade humans from their frailties, then at least to provide better opportunities to trace and overcome their errors? Before we explore this, let's examine briefly the nature of a major object of scientific study: genomes, and how they work in ways our proposed solution might well work.
The “human genome” we've all heard about is far more complex than we are given to imagine. The first way it exceeds our expectations, based solely on terminology, is that it is only slightly “human.” In fact, genomes of all species relate to each other, and ultimately originate from the same organism at some distant point in the past. From that original strand of DNA, RNA, or some combination of the two, all life on Earth evolved. It is, interestingly, a partial record of the billions of years of evolutionary history and one of the great ongoing projects of science is to trace the genetic relations of all organisms to others. Evolution is like an ongoing experiment keeping its notes in the code of each subject, and understanding how the code relates to the evolution of whole species, as well as the functioning of individual organisms, is the goal of much of evolutionary genetic study as well as genomics in general.
The “genome” of an individual interacts with that individual over its lifetime, changes according to its environment to some degree, and becomes part of the blueprint for the offspring (if any) of the individual. We find over generations, and even sometimes during the course of one generation, changes, adaptations, and alterations that we can trace to an individual's or group's behaviors and experiences. But the record is incomplete, it has gaps, and we would know so much more if we could capture, at any particular point in time, as well as over the course of time, a complete snapshot of each individual's genome. Our understanding of “the genome” would flourish, given today's computing power and that which we anticipate in the near future, if we could record all that data and crunch it, looking for patterns.
We are beginning only now to possess the tools to do as I describe, to begin to create a picture of data that will extend our understanding greatly, and that can help both with a theoretical understanding of how genomes relate to species and with providing better, personalized healthcare based upon a person's individual genetic makeup. It is also the foundation for fixing the problems of the institutions of science I have partially described above, and it models, in a way, nature's original blockchain. Before we delve into what a blockchain is, let's return to a proposal I made about a decade ago for “wikifying” science.
The Web of Knowledge
We are not keeping enough records, we allow too much data to be lost, and there is no easy way to keep track of data as it is produced around the world, by different individuals and groups, in various languages. Wikipedia is a sort of solution, bringing the Enlightenment project of encyclopedias into the digital age, affording a democratic, trackable, and verifiable means to add to the cumulative body of worldwide knowledge at a high level, even if not with the granularity we require in science (Maxwell, 2007). As with the scientific method itself, the wiki-model for disseminating knowledge builds upon experiment. Each editor provides a version of events, referencing claims, and others (in a form of peer review) rate and edit the versions that appear as evidence warrants. Editors too gain reputation through the strength of their contributions and the longevity of their edits to the articles. Finally, each version of an article becomes forever accessible and the trail of edits and contributions, deletions and changes, remains available to explore.
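The version trail described above can be illustrated with a minimal sketch. It is not how any particular wiki engine stores its history; the `Article` and `Revision` classes, the editor names, and the sample text are all hypothetical, and only Python's standard `difflib` module is assumed. The point is simply that edits are appended rather than overwritten, so every earlier version stays retrievable and comparable.

```python
import difflib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    number: int  # 1-based position in the article's history
    editor: str  # hypothetical editor name
    text: str    # full article text as of this revision


@dataclass
class Article:
    title: str
    revisions: list = field(default_factory=list)

    def edit(self, editor: str, text: str) -> Revision:
        """Append a new revision; earlier revisions are never overwritten."""
        revision = Revision(len(self.revisions) + 1, editor, text)
        self.revisions.append(revision)
        return revision

    def version(self, number: int) -> str:
        """Return the article text as it stood at a given revision."""
        return self.revisions[number - 1].text

    def changes(self, old: int, new: int) -> list:
        """List lines added ("+ ") or removed ("- ") between two revisions."""
        diff = difflib.ndiff(
            self.version(old).splitlines(),
            self.version(new).splitlines(),
        )
        return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical usage: two editors contribute to one entry.
article = Article("Oil-drop experiment")
article.edit("editor_a", "Millikan measured the charge of the electron.")
article.edit(
    "editor_b",
    "Millikan measured the charge of the electron.\n"
    "Later work revised the value.",
)
added_or_removed = article.changes(1, 2)
```

Here `article.version(1)` still returns the first editor's text after the second edit, and `added_or_removed` holds the single line the second editor added.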
Ward Cunningham first developed the technology behind Wikipedia in 1995 and it too has been improved upon by countless others since then as the technology is open-source, meaning anyone can submit new versions. While the underlying technology has been most widely applied in Wikipedia, it is also used in numerous other “wikis” that collect, curate, and disseminate information in the democratized form that Wikipedia employs. According to Nature, as of 2005, Wikipedia had grown from nothing to 1.8 million articles (now almost 6 million) in 200 languages, with nearly 15,000 contributors.
Studies have shown that even while the editing process is more or less democratic (restrictions and bots have helped to inhibit some of the mischief that has over time affected Wikipedia through those who sought to use it as a mechanism of propaganda), it is highly accurate. While Wikipedia disclaims itself as a reliable source given the ability of anyone to edit an entry, on the whole it is as accurate as standard references such as the Encyclopedia Britannica. As with science, the best model of truth tends to emerge over time, despite setbacks and attempts at deceit or manipulation. In a way, the Wikipedia community acts like that of scientists, self-correcting over time, discovering greater accuracy and better descriptions, and providing a mechanism of extended peer review that enables anyone to contribute, find error, and publish. Unlike the typical media of science, there is no real professional reward for even the best contributions to Wikipedia, which is why, ultimately, its model offers us hope for science.
The greatest threat to full-scale adoption of wiki-science rather than journal publishing is the role that journal articles now serve in academic science, particularly in the promotion and tenure process. In academia, publications are nearly everything, even more important ultimately than receiving grants. The primary purpose of grants, after all, is to provide raw data for publications. The number and quality of your publications, measured by the “impact factor” of journals based upon their reputations, help to build an academic's own reputation, defining their career trajectory, their ability to get tenure, and the ease with which they might be able to make lateral moves to other departments at other institutions. An academic author's H-Index marks them as valuable, and is a primary means of assessment. Whether or not one's primary or referenced work appears in Wikipedia is meaningless in the halls of academia.
The body of knowledge of nature and her laws is immanent in nature itself. Science seeks to extract as accurately as possible and model in a useful way those facts and laws that comprise the universe. But the institutions of humans have requirements and expectations about which nature and her laws care not at all. Merging human institutions with the ideals of discovery about nature is simply not possible. But bringing them in closer alignment may be occasioned by better uses of existing technologies, and careful adherence to the ethos of science. For instance: assessment of researchers, as long as it depends upon the number and impact factor of publications, will be an incomplete process, marred by the nature of devising H-Indices, and in need of external elaboration.
Consider this, though: the perfect companion to scientific inquiry, a complete record of all scientific observations and analysis, captured permanently, fully transparent, and an additional mechanism of recognition of the contributions of researchers. A blockchain-wiki combination, ensuring that each new observation and analysis is traceable and accessible worldwide. An encyclopedia “galactica,” preserving in four dimensions the progress of all sciences. Much as “the genome” provides a sort of record of the evolution of all species, the Science-Blockchain can grant us an even more perfect view of each new entry in the database of human knowledge of nature, her parts, and her laws.
In the case of a genomic study, for example, the blockchain-enabled Perfect Science Machine I propose would automatically capture the following at every stage of a study:
- every subject's unique ID, anonymized as a hash on the blockchain, with a private key to unlock that data and take control of it,
- the tissue sample ID,
- the ID and metadata (time, date, etc.) from the genotyping or sequencing machine,
- similar data about the operator of the machine (all, again, anonymized, and tagged with only a unique hash),
- an ID for the text file report of the sequencing or genotyping,
- ideally the de-identified text file itself (perhaps encrypted and readable only with a key held by the subject, sharable with researchers, perhaps only upon payment for the use of the data),
- any and all analyses of the raw data,
- IDs of researchers accessing the data, etc.
By keeping such a trail, from sample to analyses and use, publications referencing data used in studies can allow anyone to track back the provenance of data and replicate more successfully (or refute) the study, providing ultimately a full record of all data, immutably preserved in an environment of anonymity. Finally, the ideal science machine also places the publications themselves on blockchain, accessible perhaps through payment gateways, but ideally also available for review by all researchers, again with the aim of perfect transparency and committing to the record an immutable log of the scientific enterprise as a whole, as well as its atomic parts.
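A rough sketch may make the trail from sample to analysis more concrete. Everything here is an assumption made for illustration: the `Event` record, the field names, the keys, and the identifiers are hypothetical, and the use of a keyed hash (HMAC-SHA256) to pseudonymize people is one plausible reading of "anonymized as a hash", not a method the proposal specifies. Files are identified by the SHA-256 digest of their contents, and each event names the artifact it was derived from, so provenance can be walked backwards.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


def pseudonym(secret_key: bytes, identifier: str) -> str:
    """Keyed hash so the raw identifier never appears in the record."""
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()


def digest(data: bytes) -> str:
    """Content digest used as the ID of a report or analysis file."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Event:
    kind: str               # "sample", "sequencing", or "analysis"
    actor: str              # pseudonym of subject, operator, or researcher
    artifact: str           # ID of what this step produced
    parent: Optional[str]   # artifact this one was derived from, if any
    timestamp: str          # instrument or system time, as reported


def trace(events: list, artifact: Optional[str]) -> list:
    """Walk parent links from an artifact back toward the original sample."""
    by_artifact = {event.artifact: event for event in events}
    path = []
    while artifact is not None and artifact in by_artifact:
        event = by_artifact[artifact]
        path.append(event.kind)
        artifact = event.parent
    return path


# Hypothetical keys and identifiers, for illustration only.
subject_key = b"key-held-by-the-subject"
lab_key = b"key-held-by-the-lab"
subject = pseudonym(subject_key, "subject-001")
report = digest(b"ACGTACGT")            # stand-in for a sequencing report file
analysis = digest(b"placeholder analysis output")

events = [
    Event("sample", subject, "tissue-17", None, "2019-01-01T10:00:00Z"),
    Event("sequencing", pseudonym(lab_key, "operator-9"), report,
          "tissue-17", "2019-01-02T09:30:00Z"),
    Event("analysis", pseudonym(lab_key, "researcher-4"), analysis,
          report, "2019-01-05T14:00:00Z"),
]
provenance = trace(events, analysis)
```

In this sketch `provenance` lists the steps from the analysis back through the sequencing run to the tissue sample, which is the kind of tracking back that a publication citing the analysis would allow. Encryption of the report, payment, and access logging are left out.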
Open, Fast, and Visible
Blockchains are, in their ideal, immutable ledgers maintained by a “trustless” network where every machine participating is rewarded for its constant verification of the ledger. While the most well-known application thus far has been Bitcoin, the same technology can be applied to maintaining any record (Nguyen and Kim, 2018). Capturing data on a blockchain can address lapses that exist in wiki technologies, namely the potential for losing data over time and the hasty inclusion of entries that are inaccurate and not properly verified by the community (Bartoletti et al., 2017).
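The property that matters most for record-keeping, that an earlier entry cannot be quietly changed, can be shown with a toy hash-linked ledger. This is only a sketch under simplifying assumptions: it runs on a single machine, and it omits the network, the consensus mechanism, and the rewards that a real blockchain depends on. The `Ledger` class and the placeholder readings are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # conventional placeholder for "no previous block"


def block_hash(index: int, payload: dict, previous_hash: str) -> str:
    """Hash a block's contents together with the hash of its predecessor."""
    body = json.dumps(
        {"index": index, "payload": payload, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    def __init__(self):
        self.blocks = []

    def append(self, payload: dict) -> dict:
        """Add a record whose hash depends on every record before it."""
        previous_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        index = len(self.blocks)
        block = {
            "index": index,
            "payload": payload,
            "previous_hash": previous_hash,
            "hash": block_hash(index, payload, previous_hash),
        }
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash; any altered block breaks the chain."""
        previous_hash = GENESIS_HASH
        for index, block in enumerate(self.blocks):
            if block["index"] != index or block["previous_hash"] != previous_hash:
                return False
            if block["hash"] != block_hash(index, block["payload"], previous_hash):
                return False
            previous_hash = block["hash"]
        return True


# Placeholder readings, not real data.
ledger = Ledger()
ledger.append({"reading": "run-1", "value": 1.0})
ledger.append({"reading": "run-2", "value": 2.0})
intact = ledger.is_valid()

# Someone edits an earlier observation after the fact.
ledger.blocks[0]["payload"]["value"] = 1.5
tampering_detected = not ledger.is_valid()
```

Before the edit `intact` is true; afterwards the recomputed hash of the first block no longer matches the stored one, so `tampering_detected` is true. In a distributed setting it is the many independent copies of the chain, each checked this way, that make such an edit hard to pass off.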
Calls for greater transparency are often repeated following notable lapses in scientific conduct. Deceit and distrust grow in the shadows, and sunlight is the best disinfectant. When processes are opened up to maximum scrutiny by the broadest public, opportunities to commit fraud diminish. As well, the progress of science need no longer be halted by greed or self-centered attempts to profit. At least in the ideal, science demands maximal openness, and wikis provide that transparency. Blockchains too provide transparency and trust to record-keeping, creating repositories that cannot generally be hacked or altered, and providing consensus, a sort of blind peer review with rewards, to maintain distributed and open databases. Combined with one another, blockchains and wikis can become relatively permanent and more-or-less perfect scientific repositories upon which anyone can build, and can even be linked to our machines of observation, providing links back to each element of a study. Wiki technology could help create a universal repository for the current state of scientific knowledge and inquiry, free from the bureaucratic and pragmatic obstructions involved in traditional journal publishing. In so doing, it would democratize the scientific process, create greater transparency and accountability, and hopefully speed up the dissemination of experimental results, criticisms, propagation and refinement of hypotheses, and development of robust theories. We need not necessarily fret over the effect our perfect-science machine would have on the academic or industrial institutions that many scientists belong to, because we should assume that improving the mechanisms of science is a good that institutions can adjust to, and around which they should evolve rather than vice-versa. We should broadly agree on the truth of the following:
- Science as an institution must value the pursuit of truths above individual careers,
- scientists who make positive contributions to the pursuit of truths ought to be recognized,
- a CV listing publications is not necessarily an accurate reflection of a scientist's contributions to the institutions of science,
- wiki technology can capture individual contributions of authors and researchers in some fair manner,
- and career decisions can be made based upon real contributions to scientific institutions (Wray, 2009).
All of these assumptions ought to be theoretically acceptable, at least in the context of idealized scientific method. Can we devise new methods for professional evaluation and recognition of individual achievement and still improve the methods of the sciences with wikis and blockchains combined, mimicking in a way the manners in which nature has herself captured the evolution of species, revealing even more of scientific progress?
It is likely that our (nearly) perfect science machine will help us reform our human institutions and refocus on the elements of scientific inquiry and contribution that truly matter for the progress of science, and even possibly help us to better give credit where credit is due. For instance, if it had existed when the structure of DNA was first discovered, there would be a direct link and perfect record tracing back to Rosalind Franklin whose contributions were, at the time, unfortunately not fully revealed.
A number of existing wiki-type efforts are already augmenting some research programs, and being embraced by scientists. Among these are tools for gene annotation (Salzberg, 2007), chemical discovery (Williams, 2008), and the life-sciences. In fact, a tool called “WikiGenes” (Maier et al., 2005) takes into account the needs in the culture of the sciences for authors to be recognized, and promotes authorship tracking to help to “appraise origin, authority and reliability of information” in the wiki (Hoffmann, 2008). Others are working on additional technologies both to enable large-scale, collaborative science through networks, and to preserve institutions and norms necessary for scientific progress and careers (Goble et al., 2006). Other current examples include Project Polymath (a collaborative, wiki and blog-based tool for mathematical research), Bizarro's Bioinformatics Organization, and ResearchGate (Ingram, 2009). There are even blockchain-enabled wiki projects underway. Recent blockchain projects have begun to move beyond electronic money to large-scale data applications. Put together with wiki technology, blockchain can add a new layer of transparency, potential reward, and trust.
An article in The Economist discusses the merits of science “blogging” as a way to open up research, avoid some of the pitfalls to traditional publishing, and work collaboratively (Sept. 20, 2008, p. 85) (Economist, 2008). This too could work hand-in-hand with wiki-science, providing references and greater depth to entries, and serving as a parallel channel of communication among scientists and the general public. As the Economist article recognizes, any significant change in the way that the institutions of science work (not to mention academia) is a “chicken and egg” problem. But as scientists themselves begin to embrace these tools for the sake of scientific communication, collaboration, and record-keeping, barriers to adoption should begin to fall, as they have in the realm of electronic journals. At one time not so long ago, electronic journals were viewed as substandard and unworthy, but now no such knee-jerk dismissal of the medium exists, and excellent electronic journals are universally acknowledged. The same will be true for blockchained-WikiScience, our perfect science machine. The promise and usefulness of the technology itself will outweigh institutional prejudices, and Vannevar Bush's universal “memex” machine, envisioned in his 1945 Atlantic Monthly article “As We May Think” (and an inspiration for wikis) will finally be realized.
The author confirms being the sole contributor of this work and has approved it for publication.
Conflict of Interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Bartoletti, M., Lande, S., Pompianu, L., and Bracciali, A. (2017). “A general framework for blockchain analytics,” in Proceedings of the 1st Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers (Las Vegas, NV: ACM), 7.
Gardner, T. (2009). Hacked Climate E-mails Awkward, Not Game Changer. Reuters. Available online at: http://www.webcitation.org/5lYoX1TvY (retrieved November 26, 2009).
Ingram, M. (2009). Social Media Help Generate Science 2.0. Internet Evolution. Available online at: http://www.internetevolution.com (accessed November 30, 2009).
Maier, H., Dohr, S., Grote, K., O'Keefe, S., Werner, T., Hrabe de Angelis, M., et al. (2005). LitMiner and WikiGene: identifying problem-related key players of gene regulation using publication abstracts. Nucleic Acids Res. 33, W779–W782. doi: 10.1093/nar/gki417
Peters, M. A., Liu, T.-C., and Ondercin, D. J. (2012). “Learned societies, public good science and openness in the digital age,” in The Pedagogy of the Open Society (Rotterdam: Sense Publishers), 105–127.
Keywords: wiki, blockchain, genomics, open science, open source
Citation: Koepsell D (2019) Blockchain, Wikis, and the Ideal Science Machine: With an Example From Genomics. Front. Blockchain 2:25. doi: 10.3389/fbloc.2019.00025
Received: 26 September 2019; Accepted: 25 November 2019;
Published: 06 December 2019.
Edited by: Sean T. Manion, Science Distributed, United States
Reviewed by: Jason Goldwater, Atlas Research, United States
Erika Beerbower, Independent Researcher, Denver, United States
Copyright © 2019 Koepsell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: David Koepsell, email@example.com