<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2020.00031</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Big Data and the Little Big Bang: An Epistemological (R)evolution</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Balazka</surname> <given-names>Dominik</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/856569/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Rodighiero</surname> <given-names>Dario</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/200215/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Center for Information and Communication Technology (FBK-ICT) and Center for Religious Studies (FBK-ISR), Fondazione Bruno Kessler</institution>, <addr-line>Trento</addr-line>, <country>Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Comparative Media Studies/Writing, Massachusetts Institute of Technology</institution>, <addr-line>Cambridge, MA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Berkman Klein Center for Internet &#x00026; Society, Harvard University</institution>, <addr-line>Cambridge, MA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Brian D. Davison, Lehigh University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kenneth Joseph, University at Buffalo, United States; Yidong Li, Beijing Jiaotong University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Dario Rodighiero <email>d.rodighiero&#x00040;icloud.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;ORCID: Dominik Balazka <ext-link ext-link-type="uri" xlink:href="http://orcid.org/0000-0002-1070-8673">orcid.org/0000-0002-1070-8673</ext-link>; Dario Rodighiero <ext-link ext-link-type="uri" xlink:href="http://orcid.org/0000-0002-1405-7062">orcid.org/0000-0002-1405-7062</ext-link></p></fn></author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>09</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>31</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>11</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>08</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Balazka and Rodighiero.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Balazka and Rodighiero</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Starting from an analysis of frequently employed definitions of big data, it will be argued that, to overcome the intrinsic weaknesses of big data, it is more appropriate to define the object in relational terms. The excessive emphasis on volume and technological aspects of big data, derived from their current definitions, combined with neglected epistemological issues gave birth to an objectivistic rhetoric surrounding big data as implicitly neutral, omni-comprehensive, and theory-free. This rhetoric contradicts the empirical reality that embraces big data: (1) data collection is not neutral nor objective; (2) exhaustivity is a mathematical limit; and (3) interpretation and knowledge production remain both theoretically informed and subjective. Addressing these issues, big data will be interpreted as a methodological revolution carried over by evolutionary processes in technology and epistemology. By distinguishing between forms of nominal and actual access, we claim that big data promoted a new digital divide changing stakeholders, gatekeepers, and the basic rules of knowledge discovery by radically shaping the power dynamics involved in the processes of production and analysis of data.</p></abstract>
<kwd-group>
<kwd>big data</kwd>
<kwd>power dynamics</kwd>
<kwd>knowledge discovery</kwd>
<kwd>epistemology</kwd>
<kwd>sociology</kwd>
</kwd-group>
<contract-num rid="cn001">P2ELP1_181930</contract-num>
<contract-sponsor id="cn001">Schweizerischer Nationalfonds zur F&#x000F6;rderung der Wissenschaftlichen Forschung<named-content content-type="fundref-id">10.13039/501100001711</named-content></contract-sponsor>
<counts>
<fig-count count="1"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="136"/>
<page-count count="13"/>
<word-count count="11458"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>The former director of the <italic>Oxford Internet Institute</italic>, Luciano Floridi, claims that while 180 exabytes of data were collected between the <italic>invention of writing</italic> and 2006, by 2011 they had grown to 1,600 exabytes (Floridi, <xref ref-type="bibr" rid="B35">2012</xref>, p. 435). Two years later, Andrej Zwitter argued that while 5 billion gigabytes were collected between the <italic>beginning of recorded history</italic> and 2003, the same amount was generated every 2 days in 2011, with an estimated 5 billion gigabytes every 10 s in 2015<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> (Zwitter, <xref ref-type="bibr" rid="B136">2014</xref>, p. 2). Although the approximations of Floridi and Zwitter differ, data collection is constantly and exponentially growing &#x0201C;at a rate between 40 and 60% a year&#x0201D; (Bughin, <xref ref-type="bibr" rid="B17">2016</xref>, p. 1).</p>
<p>This unprecedented abundance has been addressed over the years using expressions such as <italic>deluge</italic> (Anderson, <xref ref-type="bibr" rid="B3">2008</xref>; Bell et al., <xref ref-type="bibr" rid="B8">2009</xref>) or <italic>avalanche</italic> (Miller, <xref ref-type="bibr" rid="B86">2010</xref>). The experts declare that big data are provoking a <italic>computational turn</italic> (Lazer et al., <xref ref-type="bibr" rid="B67">2009</xref>; Berry, <xref ref-type="bibr" rid="B9">2011</xref>), leading toward a <italic>fourth paradigm</italic> of science (Kelling et al., <xref ref-type="bibr" rid="B57">2009</xref>; Chandler, <xref ref-type="bibr" rid="B22">2015</xref>), a sort of <italic>quiet revolution</italic> (Bollier, <xref ref-type="bibr" rid="B11">2010</xref>) capable of transforming how we live, work, and think (Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>), opening the door to the Petabyte Age (Anderson, <xref ref-type="bibr" rid="B3">2008</xref>; Manovich, <xref ref-type="bibr" rid="B79">2011</xref>).</p>
<p>The first references to &#x0201C;big data&#x0201D; appear as early as 1993 (see <xref ref-type="table" rid="T1">Table 1</xref>), but it was only in 2012 that the literature on the topic started to grow exponentially. Despite the increased relevance of the subject and the various challenges raised by big data, papers that engage directly and explicitly with the underlying epistemological issues remain a minority, roughly 0.5% of publications.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Number of papers about &#x0201C;big data&#x0201D; by year and references to epistemology as of 24 June 2020, 1980&#x02013;2020.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Refers to epistemology?</bold></th>
<th/>
</tr>
<tr>
<th valign="top" align="left"><bold>Year</bold></th>
<th valign="top" align="center"><bold>No</bold></th>
<th valign="top" align="center"><bold>Yes</bold></th>
<th valign="top" align="center"><bold>Total</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1993</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">1994</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">1995</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">1996</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">1997</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">1998</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">1999</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">2000</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">2001</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">2002</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left">2003</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">2004</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">6</td>
</tr>
<tr>
<td valign="top" align="left">2005</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">2006</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">7</td>
</tr>
<tr>
<td valign="top" align="left">2007</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">2008</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">16</td>
</tr>
<tr>
<td valign="top" align="left">2009</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">16</td>
</tr>
<tr>
<td valign="top" align="left">2010</td>
<td valign="top" align="center">17</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">17</td>
</tr>
<tr>
<td valign="top" align="left">2011</td>
<td valign="top" align="center">31</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">31</td>
</tr>
<tr>
<td valign="top" align="left">2012</td>
<td valign="top" align="center">284</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">286</td>
</tr>
<tr>
<td valign="top" align="left">2013</td>
<td valign="top" align="center">1,325</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">1,327</td>
</tr>
<tr>
<td valign="top" align="left">2014</td>
<td valign="top" align="center">2,904</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2,908</td>
</tr>
<tr>
<td valign="top" align="left">2015</td>
<td valign="top" align="center">5,620</td>
<td valign="top" align="center">19</td>
<td valign="top" align="center">5,639</td>
</tr>
<tr>
<td valign="top" align="left">2016</td>
<td valign="top" align="center">7,511</td>
<td valign="top" align="center">30</td>
<td valign="top" align="center">7,541</td>
</tr>
<tr>
<td valign="top" align="left">2017</td>
<td valign="top" align="center">8,561</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">8,596</td>
</tr>
<tr>
<td valign="top" align="left">2018</td>
<td valign="top" align="center">9,536</td>
<td valign="top" align="center">41</td>
<td valign="top" align="center">9,577</td>
</tr>
<tr>
<td valign="top" align="left">2019</td>
<td valign="top" align="center">9,154</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">9,192</td>
</tr>
<tr>
<td valign="top" align="left">2020</td>
<td valign="top" align="center">3,503</td>
<td valign="top" align="center">16</td>
<td valign="top" align="center">3,519</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Results based on Web of Science: Science Citation Index Expanded; Social Sciences Citation Index; Arts and Humanities Citation Index; Conference Proceedings Citation Index&#x02014;Science; Conference Proceedings Citation Index&#x02014;Social Science and Humanities; Emerging Sources Citation Index. The query considers title, abstract, author keywords, and keywords plus</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>We are not suggesting that a lack of epistemological debate implies a lack of methodological concerns. Numerous papers discuss big data-related issues without connecting them to the methods, scope, or validity of a presumably new paradigm in the theory of knowledge. However, this is precisely the heart of the matter. A new paradigm has been frequently invoked and occasionally outlined, but it still needs further development. Researchers have asserted a radically new and independent status for the big data field, claiming considerable autonomy for themselves, but without managing to justify this conceptual move and without establishing new epistemological standards.</p></sec>
<sec id="s2">
<title>What Qualifies as Big Data?</title>
<p>The scientific community is still struggling to reach a shared definition of big data. By contrast, popular and widespread sources, like the Oxford English Dictionary or Wikipedia, use the term <italic>big data</italic> when the traditional modes of computational storage and analysis are not sufficient to deal with large datasets. In other words, big data are big. The concept of volume is widely employed in scientific literature as well, and it occasionally becomes the sole defining feature (Manovich, <xref ref-type="bibr" rid="B79">2011</xref>; Strom, <xref ref-type="bibr" rid="B108">2012</xref>; Jenkins, <xref ref-type="bibr" rid="B53">2013</xref>; Taylor et al., <xref ref-type="bibr" rid="B113">2014</xref>). However, the use of the term <italic>volume</italic> raises two major problems. First, the epistemological problem is identified with technical issues such as storage and maintenance (Strom, <xref ref-type="bibr" rid="B108">2012</xref>; Trifunovic et al., <xref ref-type="bibr" rid="B118">2015</xref>), which underestimates the bias that collecting and processing data introduce. In this perspective, which promotes a structured epistemological myopia, increasing computational power is all we need to solve, once and for all, the challenges raised by big data (see Mercer, <xref ref-type="bibr" rid="B84">2019</xref>). However, epistemological issues require epistemological solutions (Floridi, <xref ref-type="bibr" rid="B36">2014</xref>). Second, the volume of big data is still largely undefined. Kitchin and McArdle (<xref ref-type="bibr" rid="B62">2016</xref>) observe that defining a volume threshold is not easy. Moreover, the volume of a dataset can be measured either by the number of records or by their size, and the two measures produce different results.</p>
<p>The inconsistency of these definitions makes the entire phenomenon blurry, providing safe ground to affirm that big data have been employed for centuries (Arbesman, <xref ref-type="bibr" rid="B5">2013</xref>; Kaplan and di Lenardo, <xref ref-type="bibr" rid="B56">2017</xref>). While volume is not as relevant as the velocity and the exhaustivity that <italic>usually</italic> characterize big data (<italic>ivi</italic>, Kitchin and McArdle, <xref ref-type="bibr" rid="B62">2016</xref>), the discussion about volume is, in reality, a discussion about perception. The point is not how we measure but rather how we perceive a dataset. Data abundance is indeed perceived through the &#x0201C;technologies [that] were invented to deal with the perceived overload&#x0201D; (Strasser, <xref ref-type="bibr" rid="B107">2012</xref>, p. 85). Being big thus becomes a <italic>historically contextualized</italic> quality that a dataset might have with regard to the technologies available in a specific time period (Lagoze, <xref ref-type="bibr" rid="B64">2014</xref>). Although the current amount of available information has never been experienced before, the same was equally true at many moments of human history. It is sufficient to think, for example, about the corpus of 17,000 clay tablets recording administrative data that were produced in the ancient city of Ebla between the third and second millennium BC (Kaplan and di Lenardo, <xref ref-type="bibr" rid="B56">2017</xref>), and to consider the massive impact that movable type had on the velocity of the printing process and on the volume of printed material during the so-called &#x0201C;printing revolution&#x0201D; of 1455 (Eisenstein, <xref ref-type="bibr" rid="B31">1983</xref>). So, what makes the current overload so different from the previous ones?</p>
<p>Concepts such as velocity, variety, and veracity provide a less tautological definition (Laney, <xref ref-type="bibr" rid="B65">2001</xref>; Floridi, <xref ref-type="bibr" rid="B35">2012</xref>; Arbesman, <xref ref-type="bibr" rid="B5">2013</xref>; Lowrie, <xref ref-type="bibr" rid="B76">2017</xref>). Big data are thus defined as large datasets generated in real time, characterized by messiness and by different types of content such as images, text, or numbers. &#x0201C;Versatility, volatility, virtuosity, vitality, visionary, vigor, viability, vibrancy, and even virility&#x0201D; are other concepts employed by scholars (see Uprichard, <xref ref-type="bibr" rid="B120">2013</xref>, p. 1). The variety of nuances supposed to have indexical power, as noted by Emma Uprichard, makes the substantial lack of agreement in the scientific community clear. This thesis is also supported by Kitchin and McArdle (<xref ref-type="bibr" rid="B62">2016</xref>), who compared 26 datasets labeled as &#x0201C;big data&#x0201D; according to volume, velocity, variety, exhaustivity, resolution and indexicality, relationality, extensionality, and scalability. None of these traits was present in all datasets. Since big data do not share common traits, only <italic>prevailing</italic> ones, Kitchin and McArdle argued that big data do not constitute a genus but belong to different species (<italic>ivi</italic>, Kitchin and McArdle, <xref ref-type="bibr" rid="B62">2016</xref>). Yet how can these species be defined if their common genus cannot be isolated? It is dangerous to define and classify species in the absence of any unifying characteristic.</p>
<p>An alternative set of approaches adopted a slightly different perspective. Mayer-Sch&#x000F6;nberger and Cukier, for example, stress how big data create a shift from a causal approach to knowledge discovery, to an approach based on inductive reasoning and correlation (Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>). Similarly, Boyd and Crawford claim that big data are not just a technological issue but also a cultural and scholarly phenomenon (Boyd and Crawford, <xref ref-type="bibr" rid="B15">2012</xref>, p. 663). These definitions suggest that big data should be classified according to the way they are used and perceived, rather than their intrinsic characteristics. If presumably defining features, like volume or velocity, lack indexical power and are historically contextualized, then a relational approach might represent an important step toward a shared definition capable of distinguishing big data from lots of data.</p>
<p>The epistemological problem is concerned with the way big data are used to produce and justify knowledge. To approach the puzzle, it is thus important to examine the complex relations between produced knowledge, knowledge producers, and means of knowledge production. What exactly constitutes such means in big data research, however, is currently unclear. Since the meaning of big data still works as an umbrella for a multitude of different theoretical solutions (Favaretto et al., <xref ref-type="bibr" rid="B34">2020</xref>), the problem of definition remains inherently bound to the epistemological one. Lots of data are mixed up with big data, evolutionary and revolutionary aspects are blended together, and a strong objectivistic rhetoric is minimizing the challenges raised by the scientific discussion.</p></sec>
<sec id="s3">
<title>The Promise of Revolution: Positivism in Incognito</title>
<p>At a deeper level, technocentric definitions that ignore epistemological issues have led to widespread overconfidence in the exactitude of data. Today, big data form an emerging field pervaded by the mantra &#x0201C;let the data speak.&#x0201D; Many practitioners invoke a <italic>paradigm shift</italic>, oriented toward an utterly new epistemological and methodological answer based on Kuhn&#x00027;s concept of scientific revolution (Kuhn, <xref ref-type="bibr" rid="B63">1962</xref>). Using provocative terminology, Chris Anderson announced the Petabyte Age in which figures &#x0201C;speak for themselves&#x0201D; without any previous knowledge involved. Asking what scientists can learn from Google, Anderson opens the door to a data-driven and -intensive approach to intelligent computation (Anderson, <xref ref-type="bibr" rid="B3">2008</xref>).</p>
<p>During the following years, big data have been employed by universities and companies to identify universal laws (Lehrer, <xref ref-type="bibr" rid="B70">2010</xref>; West, <xref ref-type="bibr" rid="B129">2017</xref>) and forecast future trends (Ginsberg et al., <xref ref-type="bibr" rid="B41">2009</xref>), ignoring errors and producing biased results (for an overview, see Lazer et al., <xref ref-type="bibr" rid="B66">2014</xref>; McFarland and McFarland, <xref ref-type="bibr" rid="B83">2015</xref>; Boulamwini and Gebru, <xref ref-type="bibr" rid="B13">2018</xref>; Zunino, <xref ref-type="bibr" rid="B135">2019</xref>).</p>
<p>Five years after the publication of Anderson&#x00027;s article, Viktor Mayer-Sch&#x000F6;nberger and Kenneth Cukier argued that big data are producing a three-fold revolution: (1) the shift from data-poor to data-rich science makes sampling procedures useless and obsolete; (2) the shift from sampling to <italic>n</italic> = all datasets makes methodological concerns about the <italic>exactitude</italic> of data pointless; and (3) the shift from the <italic>age-old search for causality</italic> to correlation produces a radical change in our understanding of the explanatory process (Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>). In the same year, Anderson&#x00027;s former colleague, Ian Steadman, went a step further. Steadman claims not only that &#x0201C;algorithms find the patterns and the hypothesis follows from the data&#x0201D; but also that &#x0201C;we&#x00027;re reaching a point where everyone can use big data&#x0201D; and no expertise is required to be a scientist anymore (Steadman, <xref ref-type="bibr" rid="B106">2013</xref>).</p>
<p>More than a century earlier, Max Weber identified a three-fold intrusion of subjectivity into science: (1) a scientist&#x00027;s personal interests and values guide them toward a specific understanding of objects (Weber, <xref ref-type="bibr" rid="B127">1922</xref>, p. 10&#x02013;16); (2) knowledge has to be understood as a &#x0201C;knowledge from particular points of view&#x0201D; (Weber, <xref ref-type="bibr" rid="B127">1922</xref>, p. 47&#x02013;49); and (3) the &#x0201C;criteria by which this segment is selected&#x0201D; are inseparable from the cultural framework through which the ultimate meaning is acquired (Weber, <xref ref-type="bibr" rid="B127">1922</xref>, p. 51&#x02013;52). In Weber&#x00027;s text, scientific objectivity ceased to be assumed <italic>a priori</italic>, becoming a problematic question firmly connected with the notion of methodological strictness. More than a century later, it seems that big data have definitively solved the issues raised by Weber.</p>
<sec>
<title>The Pre-social Output of a Socially Created Process</title>
<p>One of the assumptions that allows for the objectivistic rhetoric of big data is the pre-social origin of collected data. Some authors defend this position, believing that data are raw digital <italic>traces</italic> left behind by daily deeds and that the problem of subjectivity lies in their analysis and interpretation (Chandler, <xref ref-type="bibr" rid="B22">2015</xref>; Goldberg, <xref ref-type="bibr" rid="B42">2015</xref>; Severo and Romele, <xref ref-type="bibr" rid="B102">2015</xref>; Shaw, <xref ref-type="bibr" rid="B103">2015</xref>; Venturini et al., <xref ref-type="bibr" rid="B123">2017</xref>; Kim and Chung, <xref ref-type="bibr" rid="B59">2018</xref>; Jan et al., <xref ref-type="bibr" rid="B52">2019</xref>; Osman, <xref ref-type="bibr" rid="B90">2019</xref>; Shu, <xref ref-type="bibr" rid="B104">2020</xref>). Other authors argue instead for a pure data-driven approach in which intrusions of subjectivity are entirely ruled out (Kelling et al., <xref ref-type="bibr" rid="B57">2009</xref>; Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>). For the latter group, the hypotheses emerge from data, excluding any need to know the question in advance. As Johnson (<xref ref-type="bibr" rid="B54">2014</xref>) writes, &#x0201C;the constructed nature of data makes it quite possible for injustices to be embedded in data itself,&#x0201D; that is, specific groups are more likely to be represented, values are embedded in data through design decisions, and not all the available information is transformed into data. While Johnson is aware of errors and biases in data collection, he agrees with his colleagues by saying that big data are the solution to a problem circumscribed exclusively to theoretically informed and sample-based datasets.</p>
<p>The first objection to this standpoint rests on the fact that datafication necessarily involves the transformation of a flow into discrete categories. In this process, data are first decontextualized and subsequently recontextualized to be employed in scientific research. What becomes data is thus only the part of the flow that lends itself to being easily adapted to the process of datafication (Berry, <xref ref-type="bibr" rid="B9">2011</xref>; Leonelli, <xref ref-type="bibr" rid="B71">2014</xref>; Wagner-Pacifici et al., <xref ref-type="bibr" rid="B125">2015</xref>). A second objection is that big data collection remains theoretically informed. Since collections cannot be utterly exhaustive, what to collect and how to collect are design-specific decisions that are embedded in data (Bollier, <xref ref-type="bibr" rid="B11">2010</xref>; Crawford, <xref ref-type="bibr" rid="B28">2013</xref>; Bowker, <xref ref-type="bibr" rid="B14">2014</xref>; Frick&#x000E9;, <xref ref-type="bibr" rid="B37">2014</xref>; Kitchin, <xref ref-type="bibr" rid="B61">2014</xref>; Diesner, <xref ref-type="bibr" rid="B29">2015</xref>; Seaver, <xref ref-type="bibr" rid="B101">2017</xref>). Third, those acting as data intermediaries hold the ultimate power in deciding which information will become available, for how long, when, and to whom (Schwartz and Cook, <xref ref-type="bibr" rid="B100">2002</xref>; Zwitter, <xref ref-type="bibr" rid="B136">2014</xref>; Schrock and Shaffer, <xref ref-type="bibr" rid="B99">2017</xref>).</p>
<p>These three objections underline human intervention during data collection and storage. The previously discussed idea of rawness thus rests on two implicit assumptions: that digital traces capture natural actors enacting natural behaviors and that data-collecting algorithms are intrinsically neutral. The first assumption runs into the signaling problem, that is, the lack of correspondence between the social and the digital world, and will be discussed in greater detail in the following section. The second assumption raises a question that is relatively well known in science and technology studies (see Mowshowitz, <xref ref-type="bibr" rid="B87">1984</xref>): can algorithms really be neutral and objective quantifiers of the social world? Can the problem of subjectivity in data collection be solved? Technology itself has neither preferences nor ideas, but the designer does and influences the way the technology works, whether intentionally or not. The faith in objective quantification, or <italic>dataism</italic> (van Dijck, <xref ref-type="bibr" rid="B121">2014</xref>, p. 198), is the belief in the efficiency of a &#x0201C;pseudo omniscient algorithmic deity&#x0201D; (Gransche, <xref ref-type="bibr" rid="B45">2016</xref>, p. 60). Algorithms are not only designed by humans for other humans but also embedded within a capitalist mode of production (Mager, <xref ref-type="bibr" rid="B77">2011</xref>, <xref ref-type="bibr" rid="B78">2014</xref>; Bibli&#x00107;, <xref ref-type="bibr" rid="B10">2016</xref>; Burrell, <xref ref-type="bibr" rid="B18">2016</xref>; Ames, <xref ref-type="bibr" rid="B2">2018</xref>; Caplan and Boyd, <xref ref-type="bibr" rid="B21">2018</xref>; Grosman and Reigeluth, <xref ref-type="bibr" rid="B46">2019</xref>). Google, for instance, remains a &#x0201C;profit-oriented, advertising-financed moneymaking machine&#x0201D; that promotes a &#x0201C;stratified attention economy&#x0201D; and delivers &#x0201C;a distorted picture of reality&#x0201D; (Fuchs, <xref ref-type="bibr" rid="B38">2011</xref>). The same goes for alternative search engines, such as Bing or Baidu, and for other companies, such as Twitter or Facebook (see Gaubert, <xref ref-type="bibr" rid="B39">2017</xref>). In this perspective, data-collecting algorithms are constantly changing, theory-laden, and naturally selective human artifacts produced within a business environment.</p>
<p>To maintain problematic assumptions about implicit neutrality is particularly dangerous because it leads to overconfidence in exactitude, underestimation of risks, and minimization of epistemological issues. The situation is made even worse by the fact that algorithms are not stable over time and that their changes remain widely unknown. This undermines our ability to identify instances of misuse of data and threatens two of the basic assumptions of science: comparability and replicability of findings (Gelman, <xref ref-type="bibr" rid="B40">2013</xref>; Lazer et al., <xref ref-type="bibr" rid="B66">2014</xref>; Bibli&#x00107;, <xref ref-type="bibr" rid="B10">2016</xref>; Leonelli, <xref ref-type="bibr" rid="B72">2018</xref>). Moreover, digital memory is <italic>forgetful</italic>. Links easily decay, updates occasionally make older files unreadable, and pages are constantly updated and rewritten (see Floridi, <xref ref-type="bibr" rid="B36">2014</xref>). Once these issues are combined with the volatility of algorithms, it becomes evident that big data blend together three different kinds of potential biases: (1) a rewritten algorithm may be applied in the same context, treating data differently at time points A and B; (2) the same algorithm can be applied in another context, treating data at different time points in the same way, but without considering the influence that the changed online environment exercises on monitored users; and (3) a rewritten algorithm may be applied in a mutated context, mixing together the two problems described above.</p>
<p>By highlighting these issues in big data usage, we are not suggesting that &#x0201C;small data&#x0201D; are unproblematic or less problematic when it comes to comparability or replicability. Comparability is a persistent problem whenever different studies and/or different waves of the same research are involved. Replicability is no different. A study about replicability in economics conducted on 60 papers coming from 13 different journals shows that only 43% of results were replicable (Chang and Li, <xref ref-type="bibr" rid="B23">2015</xref>). A psychology report published by the Open Science Collaboration (<xref ref-type="bibr" rid="B89">2015</xref>) likewise shows that only 47% of the considered studies are fully replicable, while an additional 21% produce a &#x0201C;weaker evidence for the original findings despite using materials provided by the original authors.&#x0201D;</p>
<p>It is relatively common to define different standards for scientific research and business. The widespread adoption of online surveys in the private sector, despite severe coverage bias and self-selection issues holding back academic circles, is an example of this attitude. As big data are progressively leaving the private companies which collected them for business purposes&#x02014;be it through web scraping (ten Bosch et al., <xref ref-type="bibr" rid="B114">2018</xref>), trading platforms (Yu and Zhao, <xref ref-type="bibr" rid="B133">2019</xref>), direct data collection (Poppinga et al., <xref ref-type="bibr" rid="B93">2012</xref>), or publicly available sources (Chun-Ting Ho, <xref ref-type="bibr" rid="B26">2020</xref>)&#x02014;they are increasingly used for scientific research and to inform public policy (Ulbricht, <xref ref-type="bibr" rid="B119">2020</xref>). From this perspective, business standards are simply no longer enough to define acceptable data practices.</p>
<p>In conclusion, the expression &#x0201C;raw data&#x0201D; is nothing else but an oxymoron (Bowker, <xref ref-type="bibr" rid="B14">2014</xref>). The rawness of data is made impossible by the selectivity of theoretically informed algorithms, by the instability of the digital memory, by management decisions of data intermediaries, and by the implicit problems of quantification whenever a flow is reduced into a limited set of discrete categories.</p></sec>
<sec>
<title>A Photo Stole My Soul: The End of Theory and Other Selected Tales</title>
<p>The second pillar of the objectivistic rhetoric, partially grounded on the previous one, is the idea that big data are exhaustive. Researchers today have more data, a fact that is clear and not harmful by itself. What is problematic is the assumption that <italic>more</italic> means <italic>all</italic>, that is <italic>n</italic> = all. The idea that these datasets do not constitute a subset but are rather an exhaustive representation of social reality leads to an overestimated rhetoric of exactitude:</p>
<disp-quote>
<p>&#x0201C;<italic>The social science disciplines largely relied on sampling studies and questionnaires. But when the data is collected passively while people do what they normally do anyway, the old biases associated with sampling and questionnaires disappear&#x0201D;</italic> (Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>).</p>
</disp-quote>
<p>Big data are thus not just a selection of raw traces but are rather the collection of all of them (Ekstrom, <xref ref-type="bibr" rid="B32">2013</xref>; Kitchin, <xref ref-type="bibr" rid="B60">2013</xref>, <xref ref-type="bibr" rid="B61">2014</xref>; Walker, <xref ref-type="bibr" rid="B126">2015</xref>; Cheung et al., <xref ref-type="bibr" rid="B24">2019</xref>; Tani, <xref ref-type="bibr" rid="B111">2019</xref>; Taylor and Meissner, <xref ref-type="bibr" rid="B112">2019</xref>; Tian, <xref ref-type="bibr" rid="B115">2020</xref>). Assuming that data are neutral and fully exhaustive, the problem in handling them becomes technical. In this perspective, new technologies, methods, and procedures are all that is needed to cope with big data (see Strom, <xref ref-type="bibr" rid="B108">2012</xref>; Taylor et al., <xref ref-type="bibr" rid="B113">2014</xref>; Trifunovic et al., <xref ref-type="bibr" rid="B118">2015</xref>; Smith, <xref ref-type="bibr" rid="B105">2019</xref>). On the contrary, once we recognize that data are socially created artifacts, the technological and the technical improvements are no longer enough on their own without a careful methodological and epistemological reflection. The position openly in disagreement with the <italic>n</italic> = all assumption can be summarized in four points:</p>
<list list-type="simple">
<list-item><p>- Even if <italic>n</italic> = all is accepted as correct in a restricted sense (i.e., there is effective access to all data generated by every user on a given platform), big data suffer from a <italic>signal problem</italic> causing a lack of correspondence between the social and the digital worlds (Manovich, <xref ref-type="bibr" rid="B79">2011</xref>; Crawford, <xref ref-type="bibr" rid="B28">2013</xref>; Lewis, <xref ref-type="bibr" rid="B74">2015</xref>; Gransche, <xref ref-type="bibr" rid="B45">2016</xref>);</p></list-item>
<list-item><p>- Since big data are constantly growing second by second, it is implicitly impossible to examine them in their totality since every time a new analysis is performed new data are, at the same time, generated (Symons and Alvarado, <xref ref-type="bibr" rid="B110">2016</xref>);</p></list-item>
<list-item><p>- Since specific portions of the population are more or less likely to actively participate in certain online environments, big data are often a biased sample of the population rather than the population itself (Lewis, <xref ref-type="bibr" rid="B74">2015</xref>; McFarland and McFarland, <xref ref-type="bibr" rid="B83">2015</xref>; Chun-Ting Ho, <xref ref-type="bibr" rid="B26">2020</xref>); and</p></list-item>
<list-item><p>- Due to the implicit selectivity in data collection, big data never represent a complete set of information (Lagoze, <xref ref-type="bibr" rid="B64">2014</xref>; Leonelli, <xref ref-type="bibr" rid="B71">2014</xref>).</p></list-item>
</list>
<p>These positions see the <italic>n</italic> = all assumption as a mathematical limit which can be approached but not reached. The exhaustivity, described as one of the core features of big data (Kitchin and McArdle, <xref ref-type="bibr" rid="B62">2016</xref>), is thus a highly questionable assumption at very best.</p>
<p>Big data can be generated by natural actors, physical phenomena, and artificial actors (Zwitter, <xref ref-type="bibr" rid="B136">2014</xref>). Natural actors are not necessarily individuals: an account can hide a collective (Park and Macy, <xref ref-type="bibr" rid="B91">2015</xref>), and individuals can have multiple accounts. As a result, non-random errors are constantly embedded in data. Last year&#x00027;s Cambridge Analytica scandal and the case of Russian trolls targeting teens with memes over Facebook demonstrate the extent of the issue and how artificially certain supposedly <italic>natural</italic> actors can behave. Just as a photograph might not be a truthful representation of reality, big data might be neither fully exhaustive nor accurate (Bollier, <xref ref-type="bibr" rid="B11">2010</xref>; Arbesman, <xref ref-type="bibr" rid="B5">2013</xref>; Brooks, <xref ref-type="bibr" rid="B16">2013</xref>; Frick&#x000E9;, <xref ref-type="bibr" rid="B37">2014</xref>; Welles, <xref ref-type="bibr" rid="B128">2014</xref>; Bail, <xref ref-type="bibr" rid="B6">2015</xref>; Jones, <xref ref-type="bibr" rid="B55">2019</xref>; Corple and Linabary, <xref ref-type="bibr" rid="B27">2020</xref>; Lee and Cook, <xref ref-type="bibr" rid="B68">2020</xref>). Everything is significant and outliers are difficult to identify; as such, artificial actors cannot always be distinguished from natural ones, online and offline behaviors can differ, there may be multiple users behind an account, etc.</p>
<p>From this point of view, theory is the victim of an ongoing process of mystification that pushes forward a mistaken conceptualization of big data as inherently neutral, unproblematic, and objective. As Hargittai writes, big data are reproducing social inequalities in digital form (Hargittai, <xref ref-type="bibr" rid="B48">2008</xref>). It is thus of utmost importance to ask: &#x0201C;Which people are excluded [?] Which places are less visible? What happens if you live in the shadow of big data sets?&#x0201D; (Crawford, <xref ref-type="bibr" rid="B28">2013</xref>). When these issues are left unspoken, crucial questions such as the ones formulated by Crawford are not just unanswered but even unasked. Theory is more necessary today than it ever was.</p></sec>
<sec>
<title>Let&#x00027;s Let the Raw Meat Speak</title>
<p>No one will ever claim that a piece of meat on a pan will cook itself or that it arrived on the pan all by itself, nor will anybody suggest that every piece of meat implicitly leads toward a specific dish just like that, by itself. It is simple; there is a cook who decides which cut of meat to buy, how to cook it, and what should be the final result in terms of composition and esthetics. Furthermore, the cook&#x00027;s actions and decisions are embedded in a rich sociocultural context that profoundly influences them. However, this seems not to be the case of data processing. No one generates big data, no one analyzes them, and no one interprets them. Big data speak and the scientists listen. Being a cook implies an active effort of comprehension, elaboration, and interpretation. Even when there is a recipe to follow, many factors influence the process, from the selection of ingredients to the plating&#x02014;cooking thus remains a creative act. For some reason, however, big data users refuse to picture themselves as thoughtful professionals interacting with data, promoting instead an image of scientists as neutral listeners of the concert produced by the world in motion (Anderson, <xref ref-type="bibr" rid="B3">2008</xref>; Kelling et al., <xref ref-type="bibr" rid="B57">2009</xref>; Prensky, <xref ref-type="bibr" rid="B94">2009</xref>; Dyche, <xref ref-type="bibr" rid="B30">2012</xref>; Torrecilla and Romo, <xref ref-type="bibr" rid="B116">2018</xref>).</p>
<p>It has been already discussed how big data are far from being pre-social artifacts and how their exactitude and accuracy should be the object of a critical examination rather than an assumed <italic>a priori</italic>. The third pillar of the objectivistic rhetoric, the myth of speaking data, is no different from the previous two in terms of its inner fragility.</p>
<p>Whether a simple metaphor or not, assuming that data-derived knowledge is a-problematic can be highly problematic in itself. Different analytical strategies are always possible, and each of them can potentially lead to a different conclusion. The specific compromise adopted by a researcher is influenced by a variety of factors like time, money, or previous knowledge. Furthermore, specific organizational and professional subcultures influence data collection, structure the analysis, and guide the interpretation. This is true for traditional scientific research and remains true once big data become a part of it (Gould, <xref ref-type="bibr" rid="B43">1981</xref>; Boyd and Crawford, <xref ref-type="bibr" rid="B15">2012</xref>; Jenkins, <xref ref-type="bibr" rid="B53">2013</xref>; Bail, <xref ref-type="bibr" rid="B6">2015</xref>). In this sense, data are like ingredients which do not directly lead to a specific recipe but merely push the cook in a given direction. Even when the ingredients perfectly fit an existing recipe, the ingredients <italic>never</italic> perform the required actions and <italic>never</italic> substitute for the cook as the ultimate meaning producer. A dataset might likewise facilitate or obstruct specific approaches to a given question, but it will not generate meaning instead of the researcher. Only when the existence of a &#x0201C;pseudo omniscient algorithmic deity&#x0201D; is refused will the datafied world and society live as two separate and substantially different entities (see Gransche, <xref ref-type="bibr" rid="B45">2016</xref>). Even if data were metaphorically able to speak, their language would require much more than passive listeners to be understood and correctly interpreted. 
While the situation of journalists, political professionals, and other data outsiders, who continue to rely on &#x0201C;inflated accounts of the objectivity of analytics&#x0201D; (Baldwin-Philippi, <xref ref-type="bibr" rid="B7">2020</xref>), did not change much over the years, instances and claims of pure objectivity (see Robinson, <xref ref-type="bibr" rid="B97">2018</xref>; Succi and Coveney, <xref ref-type="bibr" rid="B109">2019</xref>) became progressively rarer to find in scientific research. In fact, in recent years, the talk about &#x0201C;data-scientific objectivity&#x0201D; in big data relied on transparency, replicability, and the presumably shareable nature of decision-making (Williamson and Piattoeva, <xref ref-type="bibr" rid="B131">2019</xref>) to translate standardization into a form of quasi-objective construction of knowledge.</p></sec>
<sec>
<title>The Moral of the Story</title>
<p>More than a century after Weber&#x00027;s theories, scientists struggle to reaffirm what used to be taken for granted. Big data critics move along three main argumentative lines: (1) data are not neutral representations of society as they are collected through specific <italic>modes of production</italic> (Mager, <xref ref-type="bibr" rid="B78">2014</xref>); (2) data do not represent the totality of the population but are rather a &#x0201C;misrepresentative mixture of subpopulations&#x0201D; captured in their online environment and subject to various types of biases (McFarland and McFarland, <xref ref-type="bibr" rid="B83">2015</xref>); and (3) the meaning does not emerge from the data themselves but rather from an effort of interpretation performed by fallible human beings (Gransche, <xref ref-type="bibr" rid="B45">2016</xref>). Retracing Weber&#x00027;s thoughts, specific interests are at work in data production and what is accessed is a part of reality from a specific, culturally mediated standpoint.</p>
<p>At an analytical level, big data users might be divided into two different currents of thought. On one side, the objectivistic approach is deeply rooted in the private sector, with several representatives in academic circles. Objectivists variously support the pillars described above, developing and reiterating the rhetoric of neutrality. These forms of empiricism, in particular in their most radical instances, were extensively and repeatedly criticized by the scientific community (see Resnyansky, <xref ref-type="bibr" rid="B96">2019</xref>). On the other side, evaluativists question the objectivistic claims of neutrality and promote a critical re-examination of big data&#x00027;s multiple facets. While objectivists view big data as a revolution that solves most of the challenges traditionally established in the scientific domain, evaluativists say that big data shape those challenges, solve some of them, and introduce new ones.</p>
<p>With respect to the past, the big data phenomenon represents both a revolution and an evolution. Some basic assumptions in the philosophy of science are becoming increasingly troublesome to uphold. Highly restricted accessibility to data&#x02014;linked with great ethical dilemmas&#x02014;and the constant variation of processing algorithms obstruct both comparability and Popper&#x00027;s via <italic>negativa</italic> (Popper, <xref ref-type="bibr" rid="B92">1935</xref>).</p></sec></sec>
<sec id="s4">
<title>A (R)Evolving Paradigm</title>
<p>From an epistemological standpoint, the lack of agreement over the definition of big data (Favaretto et al., <xref ref-type="bibr" rid="B34">2020</xref>) is particularly cumbersome. If the underlying question is &#x0201C;how to use big data to produce and justify knowledge?&#x0201D;, then it becomes clear that not being able to univocally circumscribe the central phenomenon is a major impediment. Vague and omni-comprehensive definitions promote confusion which, in turn, promotes an objectivistic rhetoric. The resulting <italic>techno-optimism</italic> was extensively criticized throughout the previous pages.</p>
<p>To further address the issue and counter the diffused hype-related discourses (Vydra and Klievink, <xref ref-type="bibr" rid="B124">2019</xref>), it is first necessary to establish and underline the evolutionary characteristics that link big data to previous knowledge. We will argue that challenges raised by big data require an answer that should come from within the current scientific paradigm and that big data differentiate themselves from small data at a relational level, altering the power dynamics involved in knowledge production.</p>
<sec>
<title>Size and Its Struggles</title>
<p>At the turn of the twenty-first century, big data were welcomed as a game changer, even though not all of the large datasets were actually new (Lagoze, <xref ref-type="bibr" rid="B64">2014</xref>). Where do big data establish evolutionary links with small data, and which aspects of this supposedly new phenomenon truly break with the past? This is a key question that requires an answer in order to strip big data of their current ambivalence and ambiguity.</p>
<p>Technological advancement and rapidly increasing connectivity produced a progressively growing amount of data. The sheer quantity of available information is offering great opportunities to science. For example, the availability of real-time data makes it possible to run a timely analysis capable of answering relevant and pressing questions, thereby hastening institutional reactions to emerging social issues. Big data also provide a way to study social groups that were traditionally difficult to reach with survey methods (McFarland and McFarland, <xref ref-type="bibr" rid="B83">2015</xref>). On the downside, however, such growth took a toll on the research process, undermining n-sensitive statistical approaches (Lee and Martin, <xref ref-type="bibr" rid="B69">2015</xref>). The data deluge thus delivered a flood of false positives and called for <italic>big methods</italic> (Williamson, <xref ref-type="bibr" rid="B130">2014</xref>; Ahonen, <xref ref-type="bibr" rid="B1">2015</xref>). Most of the traditional statistical methods were designed to deal with small samples collected through survey methods. As the size and the complexity of a dataset increase, assumptions about data are frequently violated and techniques sensitive to the number of cases produce distorted results. While big data are not replacing small data (see Hekler et al., <xref ref-type="bibr" rid="B51">2019</xref>), the applicability of small methods to big data is highly questionable. What is needed is not just a mere technological improvement but rather a change in the way we look at data in a data-rich context. In this sense and at a methodological level, big data require a huge process of renovation that goes well beyond a mere evolution of small methods.</p></sec>
<sec>
<title>Knowledge Discovery</title>
<p>Big data are said to have triggered a shift from a theory-driven paradigm based on hypotheses, experiments, and simulations to a data-intensive exploratory science which is rather collaborative, networked, and data-driven (Bell et al., <xref ref-type="bibr" rid="B8">2009</xref>; Bollier, <xref ref-type="bibr" rid="B11">2010</xref>; Kitchin, <xref ref-type="bibr" rid="B61">2014</xref>; Chandler, <xref ref-type="bibr" rid="B22">2015</xref>; Trabucchi and Buganza, <xref ref-type="bibr" rid="B117">2019</xref>). While big data impacted certain scientific domains more than others (see Kelling et al., <xref ref-type="bibr" rid="B57">2009</xref>), claims about the rise of an entirely new paradigm in knowledge discovery rest on a misleading interpretation of these two paradigms as completely separated and independent (see also Hekler et al., <xref ref-type="bibr" rid="B51">2019</xref>). In fact, past and contemporary research has &#x0201C;always rested on a combination of hypothesis-driven and data-driven methods&#x0201D; (Strasser, <xref ref-type="bibr" rid="B107">2012</xref>, p. 86) and the current <italic>enchantment</italic> with data-driven methods must face the fact that</p>
<disp-quote>
<p>&#x0201C;<italic>the studies are irreproducible, the data is irreproducible, the data is unreliable, there is a lack of positive and negative controls, there is the inappropriate use of statistics (often leading to results that the investigator &#x02018;likes&#x02019;), there is the investigator&#x00027;s ignoring of negative results, there is a pro-positive-result publication bias, and more&#x02026;&#x0201D;</italic> (Frick&#x000E9;, <xref ref-type="bibr" rid="B37">2014</xref>, p. 659).</p>
</disp-quote>
<p>Data-driven science is too <italic>post-hoc</italic> (Frick&#x000E9;, <xref ref-type="bibr" rid="B37">2014</xref>, p. 660) but, rather than seeing two radically opposed paradigms, it is possible to see them as two potentially convergent <italic>cultures of modeling</italic> (Veltri, <xref ref-type="bibr" rid="B122">2017</xref>).</p>
<p>With different degrees of emphasis, it was highlighted that big data were also producing a parallel shift from causal models to correlations (Anderson, <xref ref-type="bibr" rid="B3">2008</xref>; Bollier, <xref ref-type="bibr" rid="B11">2010</xref>; Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>). Opponents to this view claimed that correlation is only enough for business purposes and stressed the dangers of the emerging &#x0201C;data fundamentalism&#x0201D; (Crawford, <xref ref-type="bibr" rid="B28">2013</xref>; Bowker, <xref ref-type="bibr" rid="B14">2014</xref>; Gransche, <xref ref-type="bibr" rid="B45">2016</xref>). However, it is once again possible to see these two paradigms as overlapping and convergent (Succi and Coveney, <xref ref-type="bibr" rid="B109">2019</xref>). The theory-driven paradigm frequently relies on correlations, while the data-driven paradigm never truly abandoned causal aspirations (see Canali, <xref ref-type="bibr" rid="B20">2016</xref>). Since causality is difficult to prove, theory-driven approaches often stop at correlations. Big data, on the other hand, make correlation-based explanations both more precise and easier to provide but do not exclude <italic>a priori</italic> integration with causal models (Veltri, <xref ref-type="bibr" rid="B122">2017</xref>; Hassani et al., <xref ref-type="bibr" rid="B49">2018</xref>).</p>
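<p>As a rough illustration of how sample size makes correlation estimates more precise without making them more meaningful, consider the standard Fisher transformation of a sample correlation <italic>r</italic>. This is a textbook approximation, introduced here only for illustration and not taken from the works cited above; it assumes independent, roughly bivariate normal observations, which digital trace data frequently violate.</p>
<disp-formula><tex-math>z = \tanh^{-1}(r), \qquad \mathrm{SE}(z) \approx \frac{1}{\sqrt{n-3}}</tex-math></disp-formula>
<p>Under these assumptions, a sample of <italic>n</italic> = 103 cases gives a standard error of 1/&#x0221A;100 = 0.1, while a sample of <italic>n</italic> = 1,000,003 cases gives 1/&#x0221A;1,000,000 = 0.001. A correlation of <italic>r</italic> = 0.01 (for which <italic>z</italic> is approximately 0.01) therefore lies about one tenth of a standard error from zero in the small sample but about ten standard errors from zero in the large one. The estimate has become far more precise, yet the two variables still share only <italic>r</italic><sup>2</sup> = 0.0001 of their variance, that is 0.01%, and nothing in the calculation indicates a causal direction.</p>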
<p>Kuhn defined scientific revolutions as &#x0201C;those non-cumulative developmental episodes in which an older paradigm is replaced in whole or in part by an incompatible new one&#x0201D; (Kuhn, <xref ref-type="bibr" rid="B63">1962</xref>, p. 92). At an epistemological level and within the realm of social sciences, we argue that this is not the case of big data: (1) big data epistemology within the scientific literature is still heavily grounded in the basic assumptions of the third paradigm and obeys the principles developed by Karl Popper; (2) big data are integrating small data and not replacing them; and (3) theory- and data-driven approaches share commonalities that make them potentially convergent rather than radically divergent.</p>
<p>Big data introduce significant changes at multiple levels of the process of knowledge discovery. From the methodological point of view, the urge for <italic>big methods</italic> is revolutionary in Lagoze&#x00027;s terms but not in Kuhn&#x00027;s. The perceived radicalness of epistemological changes, in turn, rests on an excessively polarized view of theory- and data-driven approaches and of their respective implications.</p></sec>
<sec>
<title>The New Digital Divide</title>
<p>The match between correlation and causation hides a performative struggle between companies and universities. In this sense, different perspectives on big data separate experts from scientists, causing science to leak from academia (Savage and Burrows, <xref ref-type="bibr" rid="B98">2007</xref>; Lazer et al., <xref ref-type="bibr" rid="B67">2009</xref>; Boyd and Crawford, <xref ref-type="bibr" rid="B15">2012</xref>; Burrows and Savage, <xref ref-type="bibr" rid="B19">2014</xref>). Experts claim to produce better science than scientists, explicitly challenging established standards and practices. However, as Strasser rightly pointed out, &#x0201C;this has contributed to an exaggerated trust in the quality and comparability of the data and to many irreproducible results&#x0201D; (Strasser, <xref ref-type="bibr" rid="B107">2012</xref>, p. 86). The fracture between business and academic circles is further reinforced by the parallel fracture between those who are &#x0201C;big data rich,&#x0201D; typically collective actors of private nature, and those who stay &#x0201C;big data poor&#x0201D; (Gelman, <xref ref-type="bibr" rid="B40">2013</xref>; Andrejevi&#x0010D;, <xref ref-type="bibr" rid="B4">2014</xref>; Taylor et al., <xref ref-type="bibr" rid="B113">2014</xref>).</p>
<p>The problem of access conceals two radically different issues: <italic>nominal access</italic> to a dataset, that is, the effective possibility of obtaining the data, and <italic>actual access</italic>, the possibility not just of obtaining such data but also of using them effectively. By distinguishing the two types of access to data, it becomes possible to differentiate the problems derived from restricted accessibility to data from the binding effects of not having the required skills to adequately deal with them. While both of these forms of access are far from being easily reachable, we interpret actual access as more restrictive because, without nominal access to data, it is impossible to exercise it.</p>
<p>Steadman (<xref ref-type="bibr" rid="B106">2013</xref>) argued that we will soon reach a point at which everyone will have the possibility to use big data to produce science. Today it is relatively easy to perform some basic analysis on open source data using free statistical software. In principle, everyone can do it and, at least on paper, it is not difficult to extend this argument from small to big data. Nevertheless, from a practical point of view, things are not that easy. Even if the nominal access to big data is undergoing a slow and tortuous democratizing transformation that makes it difficult to forecast future trends, a certain degree of professional skill is and will always be required for the analysis (Manovich, <xref ref-type="bibr" rid="B79">2011</xref>; Boyd and Crawford, <xref ref-type="bibr" rid="B15">2012</xref>; Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B81">2013b</xref>; Andrejevi&#x0010D;, <xref ref-type="bibr" rid="B4">2014</xref>; Williamson, <xref ref-type="bibr" rid="B130">2014</xref>). Due to the complexity of big data, contrary to what Steadman claimed, it is thus much more likely that big data will require <italic>big skills</italic>. The democratic idea of science has crashed against an oligarchy of big data users established by limitations in nominal access and perpetuated by issues of actual access. This characteristic of big data is seriously threatening both the transparency and the replicability of scientific procedures by marking the mismatch between research ethics and <italic>big methods</italic> (Lewis, <xref ref-type="bibr" rid="B74">2015</xref>; Levy and Johns, <xref ref-type="bibr" rid="B73">2016</xref>; Metcalf and Crawford, <xref ref-type="bibr" rid="B85">2016</xref>). 
In the near future, unlike what was suggested by Steadman, it is far more likely to observe the democratization of technological means and of the nominal access&#x02014;the European General Data Protection Regulation (GDPR) represents a first crucial step in this direction&#x02014;and a restriction of actual access due to the increased difficulty in data computing.</p>
<p>The democratization of nominal access will have to deal with the rising concerns about privacy. The awareness of great risks for privacy emerged shortly after the diffusion of big data (Bollier, <xref ref-type="bibr" rid="B11">2010</xref>; McDermott, <xref ref-type="bibr" rid="B82">2017</xref>), but with the &#x0201C;collapse of the control zone&#x0201D; (Lagoze, <xref ref-type="bibr" rid="B64">2014</xref>, p. 6) and the normalization of <italic>dataveillance</italic> (van Dijck, <xref ref-type="bibr" rid="B121">2014</xref>), it seemed that big data were destined to bypass all privacy issues anyway: &#x0201C;Google knows what you&#x00027;re looking for. Facebook knows what you like. Sharing is the norm, and secrecy is out&#x0201D; (Preston, <xref ref-type="bibr" rid="B95">2014</xref>).</p>
<p>Nevertheless, this impression faced numerous examples of ethical ambiguity in big data research. Tsvetkov&#x00027;s artistic project <italic>Your Face Is Big Data</italic> showed that anyone can use pictures of random strangers to easily identify their profiles on social networks (Chulkovskaya, <xref ref-type="bibr" rid="B25">2016</xref>). In 2006, a research group from Harvard gathered data about the Facebook profiles of 1,700 unaware students to investigate changes in interests and relationships over time. While the results were published respecting the anonymity of these users (Lewis et al., <xref ref-type="bibr" rid="B75">2008</xref>), it was soon proved that de-anonymization of the employed and publicly available dataset was still possible (Zimmer, <xref ref-type="bibr" rid="B134">2008</xref>; Boyd and Crawford, <xref ref-type="bibr" rid="B15">2012</xref>). In 2016, a study employing geographical data argued that using big data it was possible to give a name and a surname to the anonymous artist known as Banksy (Hauge et al., <xref ref-type="bibr" rid="B50">2016</xref>; Metcalf and Crawford, <xref ref-type="bibr" rid="B85">2016</xref>). After a legal battle that delayed the publication of the article, the authors finally managed to publish and added a short ethical note:</p>
<disp-quote>
<p>&#x0201C;<italic>the authors are aware of, and respectful of Mr. Gunningham and his relatives and have thus only used data in the public domain. We have deliberately omitted precise addresses&#x0201D;</italic> (Hauge et al., <xref ref-type="bibr" rid="B50">2016</xref>, p. 5).</p>
</disp-quote>
<p>In the article, graffiti were defined as &#x0201C;terrorism-related acts&#x0201D; and Robin Gunningham was publicly associated with vandalism. Whether Gunningham really is Banksy or not remains unclear. The study was strongly criticized at an ethical level and its methodological validity was questioned. Banksy was obviously not pleased by the article and newspapers started to pester Gunningham and his family, revealing even more about their personal lives and whereabouts. Three years later, Banksy still remains an anonymous artist.</p>
<p>These brief examples show how easily scientific research can harm studied subjects in the Petabyte Age. It is no longer possible to assume that public data are <italic>unproblematic</italic> from an ethical point of view. On the contrary, the availability of data is today a sensitive topic in itself. As for anonymity and informed consent, things are arguably even more complicated. Small adaptive changes to information privacy law will not suffice, since big data have offered a radically new perspective on the issue at hand.</p>
<p>The main and arguably the most radical effect of big data thus rests at the crossroads between business methods, academic research, emerging laws, and accessibility. Big data entirely changed the rules of the game by redefining the power dynamics involved in the processes of data production and knowledge discovery. We therefore propose a theoretical macro-level model (see <xref ref-type="fig" rid="F1">Figure 1</xref>) to orientate future research. The model focuses on the collective actors involved in the above-mentioned processes and on the relations they establish with one another.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Power dynamics, data collection, and nominal access in big data production.</p></caption>
<graphic xlink:href="fdata-03-00031-g0001.tif"/>
</fig>
<p>The center of <xref ref-type="fig" rid="F1">Figure 1</xref> is occupied by information privacy law, which directly affects not only the individual and collective actors involved in big data usage but also their relations, by regulating access to collected data. The GDPR, for example, dictates that, to collect data of a given kind, there must be a specific business purpose. This limits the type of information that a company can collect and further accentuates the signal problem discussed above. The renewed attention to privacy and stricter regulations emphasize compliance with the existing set of rules, which places information privacy law at the center of the complex network of relations in data production (Gruschka et al., <xref ref-type="bibr" rid="B47">2018</xref>). Furthermore, the GDPR places part of the power directly in the hands of consumers, who can forbid certain uses of the data that they willingly share (Yeh, <xref ref-type="bibr" rid="B132">2018</xref>). Moreover, companies involved in data collection may impose further limitations on nominal access, in accordance with current regulations, to increase their competitive advantage (Fuchs, <xref ref-type="bibr" rid="B38">2011</xref>). As a result, usually only a part of the collected data becomes nominally available. Data unavailable to &#x0201C;outsiders&#x0201D; are here addressed as intra-data. Information privacy laws and business secrecy-related dynamics thus set a limit to nominal access.</p>
<p>Social scientists are typically not involved in the collection and storage of big data, which means that they have no control over the population or the data collection process and that they experience issues of actual access (Burrows and Savage, <xref ref-type="bibr" rid="B19">2014</xref>; Bonenfant and Meurs, <xref ref-type="bibr" rid="B12">2020</xref>). The extent of this limitation varies across disciplines and does not affect all members of the scientific community equally (Savage and Burrows, <xref ref-type="bibr" rid="B98">2007</xref>; Kelling et al., <xref ref-type="bibr" rid="B57">2009</xref>; O&#x00027;Leary, <xref ref-type="bibr" rid="B88">2013</xref>). What will be collected, how it will be collected, and how it will be stored and made accessible are thus usually defined within a business context, in the interaction between algorithmists and the company that employs them. The use of secondary data, that is, the analysis of data produced for different purposes, is common in research. So, why is it a problem when dealing with big data? When using secondary data, an important part of the researcher&#x00027;s work that typically precedes the analysis is the evaluation of the dataset, aimed at assessing the quality and the appropriateness of the data. With big data, algorithmic opacity and the private nature of relevant information (Burrell, <xref ref-type="bibr" rid="B18">2016</xref>) both negatively affect actual access, making critical examination of data for scientific purposes significantly more difficult, if not nearly impossible (Bonenfant and Meurs, <xref ref-type="bibr" rid="B12">2020</xref>). In this sense, researchers are marginalized and deprived of power, losing control over data, meant as a primary means of knowledge discovery.</p>
<p>Once data-collecting algorithms are defined and set in motion, the data collection begins. The target population can be divided into its online and offline forms. While data-collecting algorithms capture most of the online information (all <italic>if</italic> we consider only data that algorithms were designed to collect and ignore the issues raised by GDPR), the access to offline data is limited and rather indirect. Since there is no necessary correspondence between online and offline behavior, the collected data tell us more about the online world than about its offline counterpart (see Crawford, <xref ref-type="bibr" rid="B28">2013</xref>; Lewis, <xref ref-type="bibr" rid="B74">2015</xref>; Gransche, <xref ref-type="bibr" rid="B45">2016</xref>). Algorithms improve over time through machine learning and feed &#x0201C;data back to users, enabling them to orient themselves in the world,&#x0201D; thereby shaping human agency directly (Kennedy et al., <xref ref-type="bibr" rid="B58">2015</xref>, p. 1; see also Graham, <xref ref-type="bibr" rid="B44">2018</xref>).</p>
<p>While the methodological changes produced by big data do not seem sufficient to invoke a whole new paradigm in knowledge discovery (see also Leonelli, <xref ref-type="bibr" rid="B71">2014</xref>), the rise of big data drastically reshaped the actors involved and their relations. The scientific community was thrown to the borders of the process, losing the control to which it is traditionally accustomed. In this sense, the fractures between business and science on one side and between business methods and research ethics on the other, joined with issues of nominal and actual access, are causing tensions at an epistemological level and pushing science outside of academia.</p></sec></sec>
<sec sec-type="conclusions" id="s5">
<title>Conclusions</title>
<p>This paper offered an extensive literature review while addressing the problem of defining big data, the harmful diffusion of an objectivistic rhetoric, and the impact of big data on knowledge discovery within the scientific domain. As discussed, many authors repeatedly failed in their attempt to provide big data with a distinctive and unitary status by focusing on the inherent characteristics of big data.</p>
<p>Big and small data continue to be affected by subjective decisions and errors at multiple levels. The intrinsic logical fallacies of the presumed neutrality and exhaustivity in data collection, analysis, and interpretation have been explored and illustrated.</p>
<p>Following Lagoze&#x00027;s (<xref ref-type="bibr" rid="B64">2014</xref>) distinction between evolutionary and revolutionary dimensions, big data have been interpreted as a methodological revolution carried over by epistemological and technological evolution. In this sense, we argue, big data are not calling for a radical change of paradigm as other authors claimed but rather for an adaptive redefinition and re-discussion of current standards in social sciences. By shifting the attention from the intrinsic characteristics of the object to the relations established between acting subjects and the object at hand, it becomes possible to trace a demarcation line between small and big data. In fact, the area where big data are provoking major changes, differentiating themselves from the so-called small data, is precisely the one of relations involved in data collection, data storage, and data processing. In this sense, big data are pushing the scientific community to the periphery of the new geography of power dynamics in knowledge discovery and entirely redesigning its landscape while changing stakeholders, gatekeepers, and even the rules of the game.</p>
<p>The widespread talk about &#x0201C;revolution&#x0201D; placed big data in a sort of virgin territory where everything was possible. By emphasizing sources of continuity, we tried to bring the debate back to the third paradigm to start anew from a common ground. It is undeniable that the developments observed during the past two decades cannot always and entirely be dismissed as simple evolutionary and adaptive changes, and yet neglecting these aspects in favor of the distracting twinkle of novelty risks undermining interdisciplinary cooperation and promoting structural shortsightedness. In the European context, this will arguably be even more important in future years given the recent attempt of the European Commission (<xref ref-type="bibr" rid="B33">2020</xref>) to pursue an &#x0201C;ecosystem of trust&#x0201D; through a &#x0201C;coordinated European approach [&#x02026;] on the better use of big data&#x0201D; in artificial intelligence research. Once the hype is over, the scientific community will have to face the fact that the changing power dynamics have led to the privatization of relevant information. The talk about transparency, representativeness, robustness, privacy, replicability, and comparability will thus have to resume, not to satisfy some remote theoretical need disconnected from reality but to establish acceptable practices and standards in a changed context and to provide an effective tool for policy-making. To do so, at least some degree of agreement about what actually constitutes the subject of the discussion will be needed.</p></sec>
<sec id="s6">
<title>Author Contributions</title>
<p>The manuscript has been written by DB under the supervision of DR. Both authors contributed to the article and approved the submitted version.</p></sec>
<sec id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</body>
<back>
<ack><p>Thanks also go to the Center for Information and Communication Technology and Center for Religious Studies of Fondazione Bruno Kessler, Ivano Bison and the University of Trento, Kurt Fendt and the MIT Active Archive Initiative and Jeffrey Schnapp and the Harvard Metalab, along with Tiffany Hwang and Greg Izor who carefully reviewed the English of this article.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahonen</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Institutionalizing big data methods in social and political research</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715591224</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ames</surname> <given-names>M. G.</given-names></name></person-group> (<year>2018</year>). <article-title>Deconstructing the algorithmic sublime</article-title>. <source>Big Data Soc.</source> <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1177/2053951718779194</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <source>The end of theory: the data deluge makes the scientific method obsolete</source>. In: <italic>WIRED</italic>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory">http://www.wired.com/science/discoveries/magazine/16-07/pb_theory</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andrejevi&#x0010D;</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>The big data divide</article-title>. <source>Int. J. Commun.</source> <volume>8</volume>, <fpage>1673</fpage>&#x02013;<lpage>1689</lpage>.</citation></ref>
<ref id="B5">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Arbesman</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Five myths about big data,</article-title> in <source>The Washington Post</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.washingtonpost.com/opinions/five-myths-about-big-data/2013/08/15/64a0dd0a-e044-11e2-963a-72d740e88c12_story.html">https://www.washingtonpost.com/opinions/five-myths-about-big-data/2013/08/15/64a0dd0a-e044-11e2-963a-72d740e88c12_story.html</ext-link> (accessed November 8, 2016).</citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bail</surname> <given-names>C. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Lost in a random forest: using big data to study rare events</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715604333</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldwin-Philippi</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Data ops, objectivity, and outsiders: journalistic coverage of data campaigning</article-title>. <source>Polit. Commun.</source> <volume>37</volume>, <fpage>1</fpage>&#x02013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1080/10584609.2020.1723751</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bell</surname> <given-names>G.</given-names></name> <name><surname>Hey</surname> <given-names>T.</given-names></name> <name><surname>Szalay</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>Beyond the data deluge</article-title>. <source>Science</source> <volume>323</volume>, <fpage>1297</fpage>&#x02013;<lpage>1298</lpage>. <pub-id pub-id-type="doi">10.1126/science.1170411</pub-id><pub-id pub-id-type="pmid">19265007</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berry</surname> <given-names>D. M.</given-names></name></person-group> (<year>2011</year>). <article-title>The computational turn: thinking about the digital humanities</article-title>. <source>Culture Mach.</source> <volume>12</volume>, <fpage>1</fpage>&#x02013;<lpage>22</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bili&#x00107;</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>Search algorithms, hidden labour and information control</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716652159</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bollier</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). <source>The Promise and Peril of Big Data. Report, The Aspen Institute, USA, January. Communications and Society Program</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.aspeninstitute.org/publications/promise-peril-big-data/">https://www.aspeninstitute.org/publications/promise-peril-big-data/</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bonenfant</surname> <given-names>M.</given-names></name> <name><surname>Meurs</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Collaboration between social sciences and computer science: toward a cross-disciplinary methodology for studying big social data from online communities,</article-title> in <source>Second International Handbook of Internet Research</source>, eds <person-group person-group-type="editor"><name><surname>Hunsinger</surname> <given-names>J.</given-names></name> <name><surname>Allen</surname> <given-names>M.</given-names></name> <name><surname>Klastrup</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Springer</publisher-name>) <fpage>47</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1007/978-94-024-1555-1_39</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buolamwini</surname> <given-names>J.</given-names></name> <name><surname>Gebru</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Gender shades: intersectional accuracy disparities in commercial gender classification</article-title>. <source>Proc. Mach. Learn. Res.</source> <volume>81</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bowker</surname> <given-names>G. C.</given-names></name></person-group> (<year>2014</year>). <article-title>The theory/data thing. Commentary</article-title>. <source>Int. J. Commun.</source> <volume>8</volume>, <fpage>1795</fpage>&#x02013;<lpage>1799</lpage>.</citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boyd</surname> <given-names>D.</given-names></name> <name><surname>Crawford</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). <article-title>Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon</article-title>. <source>Inform. Commun. Soc.</source> <volume>15</volume>, <fpage>662</fpage>&#x02013;<lpage>679</lpage>. <pub-id pub-id-type="doi">10.1080/1369118X.2012.678878</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brooks</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>, February 19). <article-title>What data can&#x00027;t do</article-title>. <source>The New York Times</source>, p. <fpage>A23</fpage>. <pub-id pub-id-type="pmid">30857405</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bughin</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>Big data, big bang?</article-title> <source>J. Big Data</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1186/s40537-015-0014-3</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burrell</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>How the machine &#x0201C;thinks&#x0201D;: understanding opacity in machine learning algorithms</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715622512</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Burrows</surname> <given-names>R.</given-names></name> <name><surname>Savage</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>After the crisis? Big data and the methodological challenges of empirical sociology</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714540280</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Canali</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Big data, epistemology and causality: knowledge in and knowledge out in EXPOsOMICS</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716669530</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caplan</surname> <given-names>R.</given-names></name> <name><surname>Boyd</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>Isomorphism through algorithms: institutional dependencies in the case of Facebook</article-title>. <source>Big Data Soc.</source> <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951718757253</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chandler</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>A world without causation: big data and the coming of age of posthumanism</article-title>. <source>Millennium J. Int. Stud.</source> <volume>43</volume>, <fpage>833</fpage>&#x02013;<lpage>851</lpage>. <pub-id pub-id-type="doi">10.1177/0305829815576817</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Chang</surname> <given-names>A. C.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Is economics research replicable? Sixty published papers from thirteen journals say &#x0201C;Usually Not,&#x0201D;</article-title> in <source>Finance and Economics Discussion Series 2015-083</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf">https://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf</ext-link> (accessed November 4, 2016).</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheung</surname> <given-names>K.</given-names></name> <name><surname>Leung</surname> <given-names>W. K.</given-names></name> <name><surname>Seto</surname> <given-names>W.</given-names></name></person-group> (<year>2019</year>). <article-title>Application of big data analysis in gastrointestinal research</article-title>. <source>World J. Gastroenterol.</source> <volume>25</volume>, <fpage>2990</fpage>&#x02013;<lpage>3008</lpage>. <pub-id pub-id-type="doi">10.3748/wjg.v25.i24.2990</pub-id><pub-id pub-id-type="pmid">31293336</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Chulkovskaya</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <source>Russian photographer matches random people with social network photos. Russia Beyond the Headlines</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://rbth.com/science_and_tech/2016/04/12/russian-photographer-matches-random-people-with-social-network-photos_584153">http://rbth.com/science_and_tech/2016/04/12/russian-photographer-matches-random-people-with-social-network-photos_584153</ext-link> (accessed March 18, 2017).</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chun-Ting Ho</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>How biased is the sample? Reverse engineering the ranking algorithm of Facebook&#x00027;s graph application programming interface</article-title>. <source>Big Data Soc.</source> <volume>7</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1177/2053951720905874</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Corple</surname> <given-names>D. J.</given-names></name> <name><surname>Linabary</surname> <given-names>J. R.</given-names></name></person-group> (<year>2020</year>). <article-title>From data points to people: feminist situated ethics in online big data research</article-title>. <source>Int. J. Soc. Res. Methodol.</source> <volume>23</volume>, <fpage>155</fpage>&#x02013;<lpage>168</lpage>. <pub-id pub-id-type="doi">10.1080/13645579.2019.1649832</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Crawford</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>The hidden biases in big data,</article-title> in <source>Harvard Business Review</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://hbr.org/2013/04/the-hidden-biases-in-big-data">https://hbr.org/2013/04/the-hidden-biases-in-big-data</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Diesner</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Small decisions with big impact on data analytics</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715617185</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Dyche</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Big data &#x0201C;Eurekas!&#x0201D; don&#x00027;t just happen,</article-title> in <source>Harvard Business Review</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://hbr.org/2012/11/eureka-doesnt-just-happen">https://hbr.org/2012/11/eureka-doesnt-just-happen</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Eisenstein</surname> <given-names>E. J.</given-names></name></person-group> (<year>1983</year>). <source>The Printing Revolution in Early Modern Europe</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B32">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ekstrom</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>N=All: 3 reasons why HR should be all in on big data,</article-title> in <source>Sourcecon</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.sourcecon.com/nall-3-reasons-why-hr-should-be-all-in-on-big-data/">https://www.sourcecon.com/nall-3-reasons-why-hr-should-be-all-in-on-big-data/</ext-link> (accessed May 2, 2018).</citation></ref>
<ref id="B33">
<citation citation-type="web"><person-group person-group-type="author"><collab>European Commission</collab></person-group> (<year>2020</year>). <source>White Paper on Artificial Intelligence. A European approach to excellence and trust</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://ec.europa.eu/info/sites/info/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf">https://ec.europa.eu/info/sites/info/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf</ext-link> (accessed July 20, 2020).</citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Favaretto</surname> <given-names>M.</given-names></name> <name><surname>De Clercq</surname> <given-names>E.</given-names></name> <name><surname>Schneble</surname> <given-names>C. O.</given-names></name> <name><surname>Elger</surname> <given-names>B. S.</given-names></name></person-group> (<year>2020</year>). <article-title>What is your definition of big data? Researchers&#x00027; understanding of the phenomenon of the decade</article-title>. <source>PLoS ONE</source> <volume>15</volume>:<fpage>e0228987</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0228987</pub-id><pub-id pub-id-type="pmid">32097430</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Floridi</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Big data and their epistemological challenge</article-title>. <source>Philos. Technol.</source> <volume>25</volume>, <fpage>435</fpage>&#x02013;<lpage>437</lpage>. <pub-id pub-id-type="doi">10.1007/s13347-012-0093-4</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Floridi</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <source>The Fourth Revolution. How the Infosphere is Reshaping Human Reality</source>. <publisher-loc>New York</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frick&#x000E9;</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). <article-title>Big data and its epistemology</article-title>. <source>J. Assoc. Inform. Sci. Technol.</source> <volume>66</volume>, <fpage>651</fpage>&#x02013;<lpage>661</lpage>. <pub-id pub-id-type="doi">10.1002/asi.23212</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fuchs</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>A contribution to the critique of the political economy of Google</article-title>. <source>Fast Capitalism</source> <volume>8</volume>:<fpage>263</fpage>. <pub-id pub-id-type="doi">10.32855/fcapital.201101.006</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Gaubert</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>The real reason why Facebook introduced &#x0201C;Reactions,&#x0201D;</article-title> in <source>The Digital Diary</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.the-digital-diary.com/single-post/James-Gaubert-The-Real-Reason-Why-Facebook-Introduced-Reactions">https://www.the-digital-diary.com/single-post/James-Gaubert-The-Real-Reason-Why-Facebook-Introduced-Reactions</ext-link> (accessed September 25, 2017).</citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gelman</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Ethics and statistics: it&#x00027;s too hard to publish criticisms and obtain data for replication</article-title>. <source>Chance</source> <volume>26</volume>, <fpage>49</fpage>&#x02013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1080/09332480.2013.845455</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ginsberg</surname> <given-names>J.</given-names></name> <name><surname>Mohebbi</surname> <given-names>M. H.</given-names></name> <name><surname>Patel</surname> <given-names>R. S.</given-names></name> <name><surname>Brammer</surname> <given-names>L.</given-names></name> <name><surname>Smolinski</surname> <given-names>M. S.</given-names></name> <name><surname>Brilliant</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>Detecting influenza epidemics using search engine query data</article-title>. <source>Nature</source> <volume>457</volume>, <fpage>1012</fpage>&#x02013;<lpage>1014</lpage>. <pub-id pub-id-type="doi">10.1038/nature07634</pub-id><pub-id pub-id-type="pmid">19020500</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldberg</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>In defense of forensic social science</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715601145</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gould</surname> <given-names>P.</given-names></name></person-group> (<year>1981</year>). <article-title>Letting the data speak for themselves</article-title>. <source>Ann. Assoc. Am. Geogr.</source> <volume>71</volume>, <fpage>166</fpage>&#x02013;<lpage>176</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8306.1981.tb01346.x</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Graham</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>Platforms and hyper-choice on the world wide web</article-title>. <source>Big Data Soc.</source> <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951718765878</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gransche</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title>The oracle of big data &#x02013; prophecies without prophets</article-title>. <source>Int. Rev. Inform. Ethics</source> <volume>24</volume>, <fpage>55</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.29173/irie152</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grosman</surname> <given-names>J.</given-names></name> <name><surname>Reigeluth</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Perspectives on algorithmic normativities: engineers, objects, activities</article-title>. <source>Big Data Soc.</source> <volume>6</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951719858742</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gruschka</surname> <given-names>N.</given-names></name> <name><surname>Mavroeidis</surname> <given-names>V.</given-names></name> <name><surname>Vishi</surname> <given-names>K.</given-names></name> <name><surname>Jensen</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Privacy issues and data protection in big data: a case study analysis under GDPR,</article-title> in <source>2018 IEEE International Conference on Big Data (Big Data)</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>5027</fpage>&#x02013;<lpage>5033</lpage>. <pub-id pub-id-type="doi">10.1109/BigData.2018.8622621</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hargittai</surname> <given-names>E.</given-names></name></person-group> (<year>2008</year>). <article-title>The digital reproduction of inequality,</article-title> in <source>Social Stratification: Class, Race and Gender in Sociological Perspective</source>, ed <person-group person-group-type="editor"><name><surname>Grusky</surname> <given-names>D. B.</given-names></name></person-group> (<publisher-loc>Boulder</publisher-loc>: <publisher-name>Westview Press</publisher-name>), <fpage>936</fpage>&#x02013;<lpage>944</lpage>.</citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hassani</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Ghodsi</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Big data and causality</article-title>. <source>Ann. Data Sci.</source> <volume>5</volume>, <fpage>133</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1007/s40745-017-0122-3</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hauge</surname> <given-names>M. V.</given-names></name> <name><surname>Stevenson</surname> <given-names>M. D.</given-names></name> <name><surname>Rossmo</surname> <given-names>D. K.</given-names></name> <name><surname>Le Comber</surname> <given-names>S. C.</given-names></name></person-group> (<year>2016</year>). <article-title>Tagging banksy: using geographic profiling to investigate a modern art mystery</article-title>. <source>J. Spat. Sci.</source> <volume>61</volume>, <fpage>185</fpage>&#x02013;<lpage>190</lpage>. <pub-id pub-id-type="doi">10.1080/14498596.2016.1138246</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hekler</surname> <given-names>E. B.</given-names></name> <name><surname>Klasnja</surname> <given-names>P.</given-names></name> <name><surname>Chevance</surname> <given-names>G.</given-names></name> <name><surname>Golaszewski</surname> <given-names>N. M.</given-names></name> <name><surname>Lewis</surname> <given-names>D.</given-names></name> <name><surname>Sim</surname> <given-names>I.</given-names></name></person-group> (<year>2019</year>). <article-title>Why we need a small data paradigm</article-title>. <source>BMC Med.</source> <volume>17</volume>:<fpage>133</fpage>. <pub-id pub-id-type="doi">10.1186/s12916-019-1366-x</pub-id><pub-id pub-id-type="pmid">31311528</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jan</surname> <given-names>B.</given-names></name> <name><surname>Farman</surname> <given-names>H.</given-names></name> <name><surname>Khan</surname> <given-names>M.</given-names></name> <name><surname>Imran</surname> <given-names>M.</given-names></name> <name><surname>Islam</surname> <given-names>I. U.</given-names></name> <name><surname>Ahmad</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Deep learning in big data analytics: a comparative study</article-title>. <source>Comput. Electr. Eng.</source> <volume>75</volume>, <fpage>275</fpage>&#x02013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1016/j.compeleceng.2017.12.009</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Jenkins</surname> <given-names>T.</given-names></name></person-group> (<year>2013</year>). <article-title>Don&#x00027;t count on big data for answers,</article-title> in <source>The Scotsman</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.scotsman.com/news/opinion/tiffany-jenkins-don-t-count-on-big-data-for-answers-1-2785890">http://www.scotsman.com/news/opinion/tiffany-jenkins-don-t-count-on-big-data-for-answers-1-2785890</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname> <given-names>J. A.</given-names></name></person-group> (<year>2014</year>). <article-title>From open data to information justice</article-title>. <source>Ethics Inform. Technol.</source> <volume>16</volume>, <fpage>263</fpage>&#x02013;<lpage>274</lpage>. <pub-id pub-id-type="doi">10.1007/s10676-014-9351-8</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jones</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>What we talk about when we talk about (big) data</article-title>. <source>J. Strategic Inform. Syst.</source> <volume>28</volume>, <fpage>3</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1016/j.jsis.2018.10.005</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaplan</surname> <given-names>F.</given-names></name> <name><surname>di Lenardo</surname> <given-names>I.</given-names></name></person-group> (<year>2017</year>). <article-title>Big data of the past</article-title>. <source>Front. Digit. Human.</source> <volume>4</volume>:<fpage>12</fpage>. <pub-id pub-id-type="doi">10.3389/fdigh.2017.00012</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kelling</surname> <given-names>S.</given-names></name> <name><surname>Hochachka</surname> <given-names>W. M.</given-names></name> <name><surname>Fink</surname> <given-names>D.</given-names></name> <name><surname>Riedewald</surname> <given-names>M.</given-names></name> <name><surname>Caruana</surname> <given-names>R.</given-names></name> <name><surname>Ballard</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Data-intensive science: a new paradigm for biodiversity studies</article-title>. <source>BioScience</source> <volume>59</volume>, <fpage>613</fpage>&#x02013;<lpage>620</lpage>. <pub-id pub-id-type="doi">10.1525/bio.2009.59.7.12</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kennedy</surname> <given-names>H.</given-names></name> <name><surname>Poell</surname> <given-names>T.</given-names></name> <name><surname>van Dijck</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Data and agency</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715621569</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Chung</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>Associative feature information extraction using text mining from health big data</article-title>. <source>Wireless Pers. Commun.</source> <volume>105</volume>, <fpage>691</fpage>&#x02013;<lpage>707</lpage>. <pub-id pub-id-type="doi">10.1007/s11277-018-5722-5</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kitchin</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>Big data and human geography: opportunities, challenges and risks</article-title>. <source>Dialog. Hum. Geogr.</source> <volume>3</volume>, <fpage>262</fpage>&#x02013;<lpage>267</lpage>. <pub-id pub-id-type="doi">10.1177/2043820613513388</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kitchin</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Big data, new epistemologies and paradigm shifts</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714528481</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kitchin</surname> <given-names>R.</given-names></name> <name><surname>McArdle</surname> <given-names>G.</given-names></name></person-group> (<year>2016</year>). <article-title>What makes big data, big data? Exploring the ontological characteristics of 26 datasets</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716631130</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kuhn</surname> <given-names>T. S.</given-names></name></person-group> (<year>1962</year>). <source>The Structure of Scientific Revolutions</source>. <publisher-loc>Chicago, IL</publisher-loc>: <publisher-name>The University of Chicago Press</publisher-name>, 1970.</citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lagoze</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Big data, data integrity, and the fracturing of the control zone</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714558281</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Laney</surname> <given-names>D.</given-names></name></person-group> (<year>2001</year>). <article-title>3D data management: controlling data volume, velocity, and variety,</article-title> in <source>META Group, File 949</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf">https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf</ext-link> (accessed November 4, 2016).</citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lazer</surname> <given-names>D.</given-names></name> <name><surname>Kennedy</surname> <given-names>R.</given-names></name> <name><surname>King</surname> <given-names>G.</given-names></name> <name><surname>Vespignani</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>The parable of google flu: traps in big data analysis</article-title>. <source>Science</source> <volume>343</volume>, <fpage>1203</fpage>&#x02013;<lpage>1205</lpage>. <pub-id pub-id-type="doi">10.1126/science.1248506</pub-id><pub-id pub-id-type="pmid">24626916</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lazer</surname> <given-names>D.</given-names></name> <name><surname>Pentland</surname> <given-names>A.</given-names></name> <name><surname>Adamic</surname> <given-names>L.</given-names></name> <name><surname>Aral</surname> <given-names>S.</given-names></name> <name><surname>Barab&#x000E1;si</surname> <given-names>A.-L.</given-names></name> <name><surname>Brewer</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Computational social science</article-title>. <source>Science</source> <volume>323</volume>, <fpage>721</fpage>&#x02013;<lpage>723</lpage>. <pub-id pub-id-type="doi">10.1126/science.1167742</pub-id><pub-id pub-id-type="pmid">19197046</pub-id></citation></ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>A. J.</given-names></name> <name><surname>Cook</surname> <given-names>P. S.</given-names></name></person-group> (<year>2020</year>). <article-title>The myth of the &#x0201C;data-driven&#x0201D; society: exploring the interactions of data interfaces, circulations, and abstractions</article-title>. <source>Sociol. Compass</source> <volume>14</volume>:<fpage>e12749</fpage>. <pub-id pub-id-type="doi">10.1111/soc4.12749</pub-id></citation></ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>M.</given-names></name> <name><surname>Martin</surname> <given-names>J. L.</given-names></name></person-group> (<year>2015</year>). <article-title>Surfeit and surface</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715604334</pub-id></citation></ref>
<ref id="B70">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Lehrer</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <source>A Physicist Solves the City. The New York Times, MM46</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.nytimes.com/2010/12/19/magazine/19Urban_West-t.html">https://www.nytimes.com/2010/12/19/magazine/19Urban_West-t.html</ext-link> (accessed June 31, 2020).</citation></ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leonelli</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>What difference does quantity make? On the epistemology of big data in biology</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714534395</pub-id><pub-id pub-id-type="pmid">25729586</pub-id></citation></ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leonelli</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>Rethinking reproducibility as a criterion for research quality</article-title>. <source>Res. Hist. Econ. Thought Methodol.</source> <volume>36B</volume>, <fpage>129</fpage>&#x02013;<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1108/S0743-41542018000036B009</pub-id></citation></ref>
<ref id="B73">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levy</surname> <given-names>K. E. C.</given-names></name> <name><surname>Johns</surname> <given-names>D. M.</given-names></name></person-group> (<year>2016</year>). <article-title>When open data is a trojan horse: the weaponization of transparency in science and governance</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715621568</pub-id></citation></ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>K.</given-names></name></person-group> (<year>2015</year>). <article-title>Three fallacies of digital footprints</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715602496</pub-id></citation></ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>K.</given-names></name> <name><surname>Kaufman</surname> <given-names>J.</given-names></name> <name><surname>Gonzalez</surname> <given-names>M.</given-names></name> <name><surname>Wimmer</surname> <given-names>A.</given-names></name> <name><surname>Christakis</surname> <given-names>N.</given-names></name></person-group> (<year>2008</year>). <article-title>Tastes, ties, and time: a new social network dataset using facebook.com</article-title>. <source>Soc. Netw.</source> <volume>30</volume>, <fpage>330</fpage>&#x02013;<lpage>342</lpage>. <pub-id pub-id-type="doi">10.1016/j.socnet.2008.07.002</pub-id></citation></ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lowrie</surname> <given-names>I.</given-names></name></person-group> (<year>2017</year>). <article-title>Algorithmic rationality: epistemology and efficiency in the data sciences</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1177/2053951717700925</pub-id></citation></ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mager</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>Algorithmic ideology. How capitalist society shapes search engines</article-title>. <source>Inform. Commun. Soc.</source> <volume>15</volume>, <fpage>769</fpage>&#x02013;<lpage>787</lpage>. <pub-id pub-id-type="doi">10.1080/1369118X.2012.676056</pub-id></citation></ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mager</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Defining algorithmic ideology: using ideology critique to scrutinize corporate search engines</article-title>. <source>Triple C</source> <volume>12</volume>, <fpage>28</fpage>&#x02013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.31269/triplec.v12i1.439</pub-id></citation></ref>
<ref id="B79">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Manovich</surname> <given-names>L.</given-names></name></person-group> (<year>2011</year>). <source>Trending: the promises and the challenges of big social data. Manovich</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://manovich.net/content/04-projects/067-trending-the-promises-and-the-challenges-of-big-social-data/64-article-2011.pdf">http://manovich.net/content/04-projects/067-trending-the-promises-and-the-challenges-of-big-social-data/64-article-2011.pdf</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B80">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mayer-Sch&#x000F6;nberger</surname> <given-names>V.</given-names></name> <name><surname>Cukier</surname> <given-names>K.</given-names></name></person-group> (<year>2013a</year>). <source>Big Data. A Revolution That Will Transform How We Live, Work, and Think</source>. <publisher-loc>Boston, MA; New York, NY</publisher-loc>: <publisher-name>Eamon Dolan Book/Houghton Mifflin Harcourt</publisher-name>.</citation></ref>
<ref id="B81">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Mayer-Sch&#x000F6;nberger</surname> <given-names>V.</given-names></name> <name><surname>Cukier</surname> <given-names>K.</given-names></name></person-group> (<year>2013b</year>). <source>With big data, we are creating artificial intelligence that no human can understand. Quartz</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://qz.com/65925/with-big-data-we-are-creating-artificial-intelligences-that-no-human-can-understand/">http://qz.com/65925/with-big-data-we-are-creating-artificial-intelligences-that-no-human-can-understand/</ext-link> (accessed November 4, 2016).</citation></ref>
<ref id="B82">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McDermott</surname> <given-names>Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Conceptualising the right to data protection in an era of big data</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716686994</pub-id></citation></ref>
<ref id="B83">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McFarland</surname> <given-names>D. A.</given-names></name> <name><surname>McFarland</surname> <given-names>H. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Big data and the danger of being precisely inaccurate</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715602495</pub-id></citation></ref>
<ref id="B84">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Mercer</surname> <given-names>O.</given-names></name></person-group> (<year>2019</year>). <source>Big data requires bigger hardware. <italic>TDAN</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://tdan.com/big-data-requires-bigger-hardware/24339">https://tdan.com/big-data-requires-bigger-hardware/24339</ext-link> (accessed April 2, 2020).</citation></ref>
<ref id="B85">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Metcalf</surname> <given-names>J.</given-names></name> <name><surname>Crawford</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>Where are human subjects in big data research? The emerging ethics divide</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716650211</pub-id></citation></ref>
<ref id="B86">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miller</surname> <given-names>H. J.</given-names></name></person-group> (<year>2010</year>). <article-title>The data avalanche is here. Shouldn&#x00027;t we be digging?</article-title> <source>J. Reg. Sci.</source> <volume>50</volume>, <fpage>181</fpage>&#x02013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9787.2009.00641.x</pub-id></citation></ref>
<ref id="B87">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mowshowitz</surname> <given-names>A.</given-names></name></person-group> (<year>1984</year>). <article-title>Computers and the myth of neutrality,</article-title> in <source>Proceedings of the ACM 12th Annual Computer Science Conference on SIGCSE Symposium</source> (<publisher-loc>Philadelphia, PA</publisher-loc>), <fpage>85</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1145/800014.808144</pub-id></citation></ref>
<ref id="B88">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Leary</surname> <given-names>D. E.</given-names></name></person-group> (<year>2013</year>). <article-title>Artificial intelligence and big data</article-title>. <source>IEEE Intell. Syst.</source> <volume>28</volume>, <fpage>96</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1109/MIS.2013.39</pub-id></citation></ref>
<ref id="B89">
<citation citation-type="journal"><person-group person-group-type="author"><collab>Open Science Collaboration</collab></person-group> (<year>2015</year>). <article-title>Estimating the reproducibility of psychological science</article-title>. <source>Science</source> <volume>349</volume>:<fpage>aac4716</fpage>. <pub-id pub-id-type="doi">10.1126/science.aac4716</pub-id><pub-id pub-id-type="pmid">26315443</pub-id></citation></ref>
<ref id="B90">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Osman</surname> <given-names>A. M. S.</given-names></name></person-group> (<year>2019</year>). <article-title>A novel big data analytics framework for smart cities</article-title>. <source>Future Generat. Comput. Syst.</source> <volume>91</volume>, <fpage>620</fpage>&#x02013;<lpage>633</lpage>. <pub-id pub-id-type="doi">10.1016/j.future.2018.06.046</pub-id></citation></ref>
<ref id="B91">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Park</surname> <given-names>P.</given-names></name> <name><surname>Macy</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>The paradox of active users</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715606164</pub-id></citation></ref>
<ref id="B92">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Popper</surname> <given-names>K. R.</given-names></name></person-group> (<year>1935</year>). <source>The Logic of Scientific Discovery.</source> Trans. Eng. <publisher-loc>London; New York, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>, 2002.<pub-id pub-id-type="pmid">29318475</pub-id></citation></ref>
<ref id="B93">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Poppinga</surname> <given-names>B.</given-names></name> <name><surname>Cramer</surname> <given-names>H.</given-names></name> <name><surname>B&#x000F6;hmer</surname> <given-names>M.</given-names></name> <name><surname>Morrison</surname> <given-names>A.</given-names></name> <name><surname>Bentley</surname> <given-names>F.</given-names></name> <name><surname>Henze</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Research in the large 3.0: app stores, wide distribution, and big data in MobileHCI research,</article-title> in <source>Proceedings of the 14th International Conference on Human-Computer Interaction with Mobile Devices and Services Companion</source> (<publisher-loc>San Francisco, CA</publisher-loc>), <fpage>241</fpage>&#x02013;<lpage>244</lpage>. <pub-id pub-id-type="doi">10.1145/2371664.2371724</pub-id></citation></ref>
<ref id="B94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prensky</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>H. sapiens digital: from digital immigrants and digital natives to digital wisdom</article-title>. <source>Innovate J. Online Educ.</source> <volume>5</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.1108/10748120110424816</pub-id></citation></ref>
<ref id="B95">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Preston</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <source>The death of privacy. <italic>The Guardian</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.theguardian.com/world/2014/aug/03/internet-death-privacy-google-facebook-alex-preston">https://www.theguardian.com/world/2014/aug/03/internet-death-privacy-google-facebook-alex-preston</ext-link> (accessed March 18, 2017).</citation></ref>
<ref id="B96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Resnyansky</surname> <given-names>L.</given-names></name></person-group> (<year>2019</year>). <article-title>Conceptual frameworks for social and cultural big data analytics: answering the epistemological challenge</article-title>. <source>Big Data Soc.</source> <volume>6</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951718823815</pub-id></citation></ref>
<ref id="B97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Robinson</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Understanding big data: fundamental concepts and framework,</article-title> <source>Presented at International Workshop on Big Data for Central Bank Policies</source> (Bali).</citation></ref>
<ref id="B98">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savage</surname> <given-names>M.</given-names></name> <name><surname>Burrows</surname> <given-names>R.</given-names></name></person-group> (<year>2007</year>). <article-title>The coming crisis of empirical sociology</article-title>. <source>Sociology</source> <volume>41</volume>, <fpage>885</fpage>&#x02013;<lpage>899</lpage>. <pub-id pub-id-type="doi">10.1177/0038038507080443</pub-id><pub-id pub-id-type="pmid">31782142</pub-id></citation></ref>
<ref id="B99">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrock</surname> <given-names>A.</given-names></name> <name><surname>Shaffer</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Data ideologies of an interested public: a study of grassroots open government data intermediaries</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951717690750</pub-id></citation></ref>
<ref id="B100">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schwartz</surname> <given-names>J. M.</given-names></name> <name><surname>Cook</surname> <given-names>T.</given-names></name></person-group> (<year>2002</year>). <article-title>Archives, records, and power: the making of modern memory</article-title>. <source>Arch. Sci.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1007/BF02435628</pub-id></citation></ref>
<ref id="B101">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seaver</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Algorithms as culture: some tactics for the ethnography of algorithmic systems</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1177/2053951717738104</pub-id></citation></ref>
<ref id="B102">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Severo</surname> <given-names>M.</given-names></name> <name><surname>Romele</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <source>Traces Num&#x000E9;riques et Territoires</source>. <publisher-loc>Paris</publisher-loc>: <publisher-name>Presses des Mines</publisher-name>.</citation></ref>
<ref id="B103">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shaw</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Big data and reality</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715608877</pub-id></citation></ref>
<ref id="B104">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shu</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <source>Knowledge Discovery in the Social Sciences. A Data Mining Approach</source>. <publisher-loc>Oakland, CA</publisher-loc>: <publisher-name>University of California Press</publisher-name>. <pub-id pub-id-type="doi">10.2307/j.ctvw1d683</pub-id></citation></ref>
<ref id="B105">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>B. C.</given-names></name></person-group> (<year>2019</year>). <article-title>Big data and us: human-data interactions</article-title>. <source>Eur. Rev.</source> <volume>27</volume>, <fpage>357</fpage>&#x02013;<lpage>377</lpage>. <pub-id pub-id-type="doi">10.1017/S1062798719000048</pub-id></citation></ref>
<ref id="B106">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Steadman</surname> <given-names>I.</given-names></name></person-group> (<year>2013</year>). <article-title>Big data and the death of the theorist,</article-title> in <source>WIRED</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.wired.co.uk/article/big-data-end-of-theory">http://www.wired.co.uk/article/big-data-end-of-theory</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B107">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strasser</surname> <given-names>B. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Data-driven sciences: from wonder cabinets to electronic databases</article-title>. <source>Stud. Hist. Philos. Biol. Biomed. Sci.</source> <volume>43</volume>, <fpage>85</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/j.shpsc.2011.10.009</pub-id><pub-id pub-id-type="pmid">22326076</pub-id></citation></ref>
<ref id="B108">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Strom</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Big data makes things better,</article-title> in <source>Dice</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://insights.dice.com/2012/08/03/big-data-makes-things-better/">http://insights.dice.com/2012/08/03/big-data-makes-things-better/</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B109">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Succi</surname> <given-names>S.</given-names></name> <name><surname>Coveney</surname> <given-names>P. V.</given-names></name></person-group> (<year>2019</year>). <article-title>Big data: the end of the scientific method?</article-title> <source>Philos. Trans. R. Soc. A</source> <volume>377</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1098/rsta.2018.0145</pub-id><pub-id pub-id-type="pmid">30967041</pub-id></citation></ref>
<ref id="B110">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Symons</surname> <given-names>J.</given-names></name> <name><surname>Alvarado</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Can we trust big data? Applying philosophy of science to software</article-title>. <source>Big Data Soc.</source> <volume>3</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1177/2053951716664747</pub-id></citation></ref>
<ref id="B111">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tani</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>L&#x00027;incidenza dei big data e del machine learning sui principi alla base del Regolamento Europeo per la tutela dei dati personali (2016/679/UE) e proposte per una nuova normativa in tema di privacy,</article-title> in <source>Societ&#x000E0; Delle Tecnologie Esponenziali e General Data Protection Regulation: Profili critici Nella Protezione Dei Dati</source>, ed <person-group person-group-type="editor"><name><surname>Bonavita</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Milano</publisher-loc>: <publisher-name>Ledizioni Ledi Publishing</publisher-name>), <fpage>35</fpage>&#x02013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.4000/books.ledizioni.3946</pub-id></citation></ref>
<ref id="B112">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>L.</given-names></name> <name><surname>Meissner</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>A crisis of opportunity: market-making, big data, and the consolidation of migration as risk</article-title>. <source>Antipode</source> <volume>52</volume>, <fpage>270</fpage>&#x02013;<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1111/anti.12583</pub-id><pub-id pub-id-type="pmid">32063659</pub-id></citation></ref>
<ref id="B113">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>L.</given-names></name> <name><surname>Schroeder</surname> <given-names>R.</given-names></name> <name><surname>Meyer</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>Emerging practices and perspectives on big data analysis in economics: bigger and better or more of the same?</article-title> <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714536877</pub-id></citation></ref>
<ref id="B114">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>ten Bosch</surname> <given-names>O.</given-names></name> <name><surname>Windmeijer</surname> <given-names>D.</given-names></name> <name><surname>van Delden</surname> <given-names>A.</given-names></name> <name><surname>van den Heuvel</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Web scraping meets survey design: combining forces,</article-title> <source>Presented at BigSurv18 Conference, 26 October, Barcelona</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.bigsurv18.org/conf18/uploads/73/61/20180820_BigSurv_WebscrapingMeetsSurveyDesign.pdf">https://www.bigsurv18.org/conf18/uploads/73/61/20180820_BigSurv_WebscrapingMeetsSurveyDesign.pdf</ext-link> (accessed July 25, 2020).</citation></ref>
<ref id="B115">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>A prospect for the geographical research of sport in the age of big data</article-title>. <source>Sport Soc.</source> <volume>23</volume>, <fpage>159</fpage>&#x02013;<lpage>169</lpage>. <pub-id pub-id-type="doi">10.1080/17430437.2018.1555233</pub-id></citation></ref>
<ref id="B116">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torrecilla</surname> <given-names>J. L.</given-names></name> <name><surname>Romo</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Data learning from big data</article-title>. <source>Stat. Probabil. Lett.</source> <volume>136</volume>, <fpage>15</fpage>&#x02013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1016/j.spl.2018.02.038</pub-id></citation></ref>
<ref id="B117">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trabucchi</surname> <given-names>D.</given-names></name> <name><surname>Buganza</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Data-driven innovation: switching the perspective on big data</article-title>. <source>Eur. J. Innov. Manag.</source> <volume>22</volume>, <fpage>23</fpage>&#x02013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1108/EJIM-01-2018-0017</pub-id></citation></ref>
<ref id="B118">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trifunovic</surname> <given-names>N.</given-names></name> <name><surname>Milutinovic</surname> <given-names>V.</given-names></name> <name><surname>Salom</surname> <given-names>J.</given-names></name> <name><surname>Kos</surname> <given-names>A.</given-names></name></person-group> (<year>2015</year>). <article-title>Paradigm shift in big data supercomputing: dataflow vs. controlflow</article-title>. <source>J. Big Data</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1186/s40537-014-0010-z</pub-id></citation></ref>
<ref id="B119">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ulbricht</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Scraping the demos. Digitalization, web scraping and the democratic project</article-title>. <source>Democratization</source> <volume>27</volume>, <fpage>426</fpage>&#x02013;<lpage>442</lpage>. <pub-id pub-id-type="doi">10.1080/13510347.2020.1714595</pub-id></citation></ref>
<ref id="B120">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uprichard</surname> <given-names>E.</given-names></name></person-group> (<year>2013</year>). <article-title>Focus: big data, little questions?</article-title> <source>Discover Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</citation></ref>
<ref id="B121">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Dijck</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Datafication, dataism and dataveillance: big data between scientific paradigm and ideology</article-title>. <source>Surveill. Soc.</source> <volume>12</volume>, <fpage>197</fpage>&#x02013;<lpage>208</lpage>. <pub-id pub-id-type="doi">10.24908/ss.v12i2.4776</pub-id></citation></ref>
<ref id="B122">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Veltri</surname> <given-names>G. A.</given-names></name></person-group> (<year>2017</year>). <article-title>Big data is not only about data: the two cultures of modeling</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1177/2053951717703997</pub-id></citation></ref>
<ref id="B123">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Venturini</surname> <given-names>T.</given-names></name> <name><surname>Jacomy</surname> <given-names>M.</given-names></name> <name><surname>Meunier</surname> <given-names>A.</given-names></name> <name><surname>Latour</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>An unexpected journey: a few lessons from Sciences Po m&#x000E9;dialab&#x00027;s experience</article-title>. <source>Big Data Soc.</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2053951717720949</pub-id></citation></ref>
<ref id="B124">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vydra</surname> <given-names>S.</given-names></name> <name><surname>Klievink</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Techno-optimism and policy-pessimism in the public sector big data debate</article-title>. <source>Gov. Inf. Q.</source> <volume>36</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1016/j.giq.2019.05.010</pub-id></citation></ref>
<ref id="B125">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wagner-Pacifici</surname> <given-names>R.</given-names></name> <name><surname>Mohr</surname> <given-names>J. W.</given-names></name> <name><surname>Breiger</surname> <given-names>R. L.</given-names></name></person-group> (<year>2015</year>). <article-title>Ontologies, methodologies, and new uses of big data in the social and cultural sciences</article-title>. <source>Big Data Soc.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1177/2053951715613810</pub-id></citation></ref>
<ref id="B126">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Walker</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <source>From Big Data to Big Profits: Success with Data and Analytics</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>. <pub-id pub-id-type="doi">10.1093/acprof:oso/9780199378326.001.0001</pub-id></citation></ref>
<ref id="B127">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Weber</surname> <given-names>M.</given-names></name></person-group> (<year>1922</year>). <source>Il Metodo Delle Scienze Storico-Sociali. Trans. It</source>. <publisher-loc>Torino</publisher-loc>: <publisher-name>Einaudi</publisher-name>. 2003.</citation></ref>
<ref id="B128">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Welles</surname> <given-names>B. F.</given-names></name></person-group> (<year>2014</year>). <article-title>On minorities and outliers: the case for making big data small</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>2</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714540613</pub-id></citation></ref>
<ref id="B129">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>West</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <source>Scale. The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life, in Organisms, Cities, Economies, and Companies</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Penguin Press</publisher-name>.</citation></ref>
<ref id="B130">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Williamson</surname> <given-names>B.</given-names></name></person-group> (<year>2014</year>). <article-title>The death of the theorist and the emergence of data and algorithms in digital social research,</article-title> in <source>The London School of Economics and Political Science: the Impact of Social Science Blog</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://blogs.lse.ac.uk/impactofsocialsciences/2014/02/10/the-death-of-the-theorist-in-digital-social-research/">https://blogs.lse.ac.uk/impactofsocialsciences/2014/02/10/the-death-of-the-theorist-in-digital-social-research/</ext-link> (accessed November 1, 2016).</citation></ref>
<ref id="B131">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Williamson</surname> <given-names>B.</given-names></name> <name><surname>Piattoeva</surname> <given-names>N.</given-names></name></person-group> (<year>2019</year>). <article-title>Objectivity as standardization in data-scientific education policy, technology and governance</article-title>. <source>Learn. Media Technol.</source> <volume>44</volume>, <fpage>64</fpage>&#x02013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1080/17439884.2018.1556215</pub-id></citation></ref>
<ref id="B132">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeh</surname> <given-names>C.-L.</given-names></name></person-group> (<year>2018</year>). <article-title>Pursuing consumer empowerment in the age of big data: a comprehensive regulatory framework for data brokers</article-title>. <source>Telecommun. Policy</source> <volume>42</volume>, <fpage>282</fpage>&#x02013;<lpage>292</lpage>. <pub-id pub-id-type="doi">10.1016/j.telpol.2017.12.001</pub-id></citation></ref>
<ref id="B133">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>B.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>Research on the construction of big data trading platform in China,</article-title> in <source>Proceedings of the 2019 4th International Conference on Intelligent Information Technology</source> (<publisher-loc>Da Nang</publisher-loc>), <fpage>107</fpage>&#x02013;<lpage>112</lpage>. <pub-id pub-id-type="doi">10.1145/3321454.3321474</pub-id></citation></ref>
<ref id="B134">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Zimmer</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). <source>More on the &#x0201C;Anonymity&#x0201D; of the Facebook dataset &#x02013; It&#x00027;s Harvard College (updated). MichaelZimmer</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.michaelzimmer.org/2008/10/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-college/">https://www.michaelzimmer.org/2008/10/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-college/</ext-link> (accessed November 3, 2016).</citation></ref>
<ref id="B135">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Zunino</surname> <given-names>C.</given-names></name></person-group> (<year>2019</year>). <source>Scuola, trasferimenti di 10mila docenti lontano da casa. Il Tar: &#x0201C;L&#x00027;algoritmo impazzito fu contro la Costituzione&#x0201D;. La Repubblica</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.repubblica.it/cronaca/2019/09/17/news/scuola_trasferimenti_di_10mila_docenti_lontano_da_casa_il_tar_l_algoritmo_impazzito_fu_contro_la_costituzione_-236215790/">https://www.repubblica.it/cronaca/2019/09/17/news/scuola_trasferimenti_di_10mila_docenti_lontano_da_casa_il_tar_l_algoritmo_impazzito_fu_contro_la_costituzione_-236215790/</ext-link> (accessed January 12, 2020).</citation></ref>
<ref id="B136">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwitter</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Big data ethics</article-title>. <source>Big Data Soc.</source> <volume>1</volume>, <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1177/2053951714559253</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Alternative sources report 300 exabytes of data in 2007 and 1,200 exabytes in 2013, with the share of non-digital data falling from 7% to a mere 2% (Mayer-Sch&#x000F6;nberger and Cukier, <xref ref-type="bibr" rid="B80">2013a</xref>).</p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was made possible by the generous contribution of the Swiss National Science Foundation (SNSF), which awarded DR an Early Postdoc.Mobility grant titled Worldwide Map of Research (Grant No. P2ELP1_181930).</p>
</fn>
</fn-group>
</back>
</article>