ORIGINAL RESEARCH article

Front. Res. Metr. Anal., 22 December 2017
Sec. Research Assessment
Volume 2 - 2017 | https://doi.org/10.3389/frma.2017.00013

Zenodo in the Spotlight of Traditional and New Metrics

  • 1Web Science, ZBW Leibniz Information Centre for Economics, Kiel University, Kiel, Germany
  • 2Social Computing Area, Know-Center, Graz, Austria
  • 3Open Knowledge Maps, Vienna, Austria
  • 4Graz University of Technology, Institute of Interactive Systems and Data Science, Graz, Austria
  • 5Bibliometrics and Publication Strategies Department, University of Vienna, Library and Archive Services, Vienna, Austria

In this case study, we aim to explore the characteristics and the reception of files uploaded to Zenodo, and the role the repository itself plays in generating usage. To this end, we first apply descriptive statistics to Zenodo's full set of data record metadata with digital object identifiers (DOIs) up to and including January 2017 (n = 141,777 records). Second, we estimate the coverage of Zenodo datasets in the Data Citation Index, of Zenodo journal articles in the Science Citation Index, Social Science Citation Index, and Arts and Humanities Citation Index, of Zenodo books and book chapters in the Book Citation Index, and of Zenodo conference papers in the Proceedings Citation Index, and analyze their citedness according to the different data record types in Zenodo (e.g., journal article, dataset, book, or conference paper). Third, we provide a bibliometric analysis of Zenodo records by using different metrics for citedness, including citation, usage, and altmetrics. Altmetrics data are gathered from two of the most popular tools for altmetric analyses, PlumX and Altmetric.com, and we compare the results. Moreover, we study how open access and the DOIs provided by Zenodo influence the impact of Zenodo data records, and we find a tendency toward a positive relationship between permissive access rights and altmetrics for certain data record types.

Introduction

Driven by urgent calls for science to become more open, for example, from the European Commission's Directorate-General for Research & Innovation and Commissioner Carlos Moedas,1 several platforms have been developed that facilitate the storing, publication, and sharing of diverse types of research products, e.g., articles, software, and data. Sharing facilities for research data are of special importance to many, since research data are believed to be the major resource that enables innovation in various areas. Because research data span many kinds, from interview data to videos, and different stakeholders are interested in different data, several types of data repositories are available to date (Pampel et al., 2013; He and Nahar, 2016): institutional (e.g., Harvard Dataverse), disciplinary (e.g., Pangaea), multidisciplinary (e.g., Dryad, Figshare, Mendeley Data, and Zenodo), or project-specific repositories (e.g., CERN Open Data Portal). Over the last few years, research data repositories have witnessed exponential growth in data deposits. For example, He and Nahar (2016) report that by the end of 2014, Dryad stored almost 40 times as many datasets as by the end of 2010; Robinson-García et al. (2017) identified a sudden increase of data publications in 2016, probably due to the embargoes these records had been subject to. Among other factors, this growth can be attributed to an increased demand by publishers and funders to make scientific data available (e.g., PlosOne,2 Nature,3 or the Bill & Melinda Gates Foundation4).

The rise in publication outlets for research data has been accompanied by an increasing number of services that make research data searchable, such as Elsevier's DataSearch. Other services, e.g., Clarivate's Data Citation Index (DCI; Robinson-García et al., 2016), track whether and to what extent such data are reused by others. Social media engagement with research data has also become visible, because data records are usually equipped with a digital object identifier (DOI) or another persistent identifier (Mayernik and Maull, 2017), which are tracked by altmetrics aggregators such as Altmetric.com or PlumX. Given that an increase in reputation (reflected by metrics, e.g., citations) is one of the most significant drivers of data sharing (Bruno et al., 2015; Kratz and Strasser, 2015), there is a growing interest in the impact of research data. Several studies have confirmed that articles that share code (Vandewalle, 2012) or data (Piwowar et al., 2007; Piwowar and Vision, 2013; Drachen et al., 2016) are cited more often and gain higher altmetrics scores.

Literature Review

Robinson-García et al. (2016) provide an extensive study of Clarivate's DCI and find that 88.1% of data records have not been cited at all. Of the data records cited, data sets received the most citations (73%), but data studies are cited more often on average. All repositories included in the DCI are cited at least once. In terms of disciplinary coverage, the DCI is biased toward the hard sciences. The majority of data records are described as “data set,” and four repositories account for 75% of DCI's records (Torres-Salinas et al., 2014).

Dryad has been the subject of several analyses. He and Nahar (2016) studied 550 URL citations to Dryad from scholarly publications indexed by Scopus. They found that 95% of publications that cite data from Dryad are research articles and that data citations increase by 3% every year. Almost 50% of Dryad data were cited in the same year, and 70% were cited within 2 years. Interestingly, 84% of articles that cite Dryad data are self-citations, meaning that the title of the cited data matches the title of the citing publication. Thus, the authors conclude that only “16% of the total data cited in the references of publications are ‘real data reuse,’ which means that the shared scientific data are reused by others.” Raw data, e.g., experimental data, received significantly more non-self-citations than other forms of data.

In a follow-up study, He and Han (2017) analyzed the citedness of data records from the Dryad repository and the relation between their usage counts (i.e., number of downloads) and their citations. Citation counts from the Web of Science (WoS) were obtained for the 9,333 articles from Dryad whose DOIs were found in the WoS. All Dryad records have been downloaded at least once, but the percentage of records with 0 citations varies between 7% for the 42 articles from “Cell biology” and 57% for the 23 articles from “Biochemical research methods.” The subject with the most Dryad records, “Ecology” (3,686 articles), revealed 25% of uncited articles. The authors showed that there are positive, but discipline-specific, correlations between download numbers and citation counts (maximum Pearson coefficient of 0.593 for “behavioral science”).

Robinson-García et al. (2017) provide an extensive description of DataCite,5 how to access its more than seven million data records, what kinds of data are included, where stored data stem from, and when data were published. They found that 12% of valid records are text or articles and that 23% of all DataCite records include DOIs, although this number is highly dependent on the data record type (i.e., 90% of DOIs belong to datasets). However, more than one million records (14.7% of all records) were empty due to data structure modifications carried out by DataCite and should be removed when using DataCite for bibliometric or other analyses. The authors revealed further conceptual issues with DataCite (e.g., what constitutes a data record, how it should be described, or what the correct publication year is) that impact the description of data and should also be taken into account when working with DataCite information.

Thelwall and Kousha (2016) explored the Figshare repository via profiles of 2,675 Figshare members and showed that data labeled as “media” are viewed more often than other data types. Moreover, while “datasets” are the most common data record type, they are the least viewed and second least shared. However, “the results give no suggestion that any particular resource type is ignored in Figshare” (Thelwall and Kousha, 2016). Engagement with data on Figshare is nevertheless influenced by the subject category and the resource type. Later, Park and Wolfram (2017) performed a citer-based analysis on the basis of two million data publications from the WoS subject area with the most data citations, i.e., “Genetics and Heredity.”

While efforts are in place that foster data citation standards (e.g., DataCite or Martone, 2014), it has been pointed out (He and Nahar, 2016) that signals of research data reuse still most often appear in two different forms: first, as data citations that formally cite data in the reference section of the article and point to the data sets themselves (so-called “reuse citations”; Mayo et al., 2016; see also Robinson-García et al., 2016), and second, as mentions or footnotes (so-called “intratextual citations”; Mayo et al., 2016) in the body of the article. Mayo et al. (2016) found that intratextual citations are most common in articles and that formal reuse citations only appear in 8% of articles. These varying citation practices have severe implications for bibliometric studies, which typically rely on citation data that are most often extracted from the reference sections of articles (Konkiel, 2013).

There have been calls for alternative metrics for data reuse that go beyond traditional forms of citing (Dorch, 2013). Ingwersen (2014) argues that a Data Usage Index (DUI) is needed, which includes, among others, the number of searched records, download frequencies, and the number of searches. By using such metrics, he showed that “a Bradford distribution can be observed for searched biodiversity dataset records dispersed over datasets” (p. 111), which led him to conclude “that the DUI is appropriate” (p. 116).

In addition to citing research data in articles, a growing number of data journals publish descriptions of and links to research data. A study conducted by the CWTS for Elsevier6 revealed that one-third of researchers do not publish data at all. If they share data, 33% of the surveyed researchers publish data as an appendix to the publication, 28% as a stand-alone data publication, and 13% in a data repository. Thus, the authors conclude that “data journals are still a relatively small-scale phenomenon […] [but] the popularity of data journals is growing quite rapidly” (p. 18).

Aims of the Study

Our case study follows in the vein of works that describe novel platforms and their potential for bibliometric studies, for example, Kraker et al. (2015), Peters et al. (2015, 2016), and Thelwall and Kousha (2016), who studied Figshare; Robinson-García et al. (2017), who investigated DataCite; and He and Han (2017) and He and Nahar (2016), who studied Dryad.

In our work, we focus on the multidisciplinary open access (OA) research data repository Zenodo. Zenodo was created by OpenAIRE and CERN in 2013 and has by now been integrated into reporting lines for research funded by the European Commission. Zenodo accepts any file format as well as both positive and negative results, and it assigns every publicly available upload a DOI to make the upload easily and uniquely citable. As one research data outlet among many, Zenodo has not been extensively studied to date; hence, its characteristics and the reuse and reception of its data records are yet to be explored. Therefore, in this case study, we investigate Zenodo's data records, their temporal distribution, and the relation between data records and their access rights [e.g., closed access (CA), embargoed access, or OA]. In addition, we are interested in the extent to which Zenodo records are reused or engaged with.

To understand the broader impact of the different kinds of data records uploaded to Zenodo, we conduct a bibliometric analysis using different metrics, including citations, usage, and altmetrics. Metrics data are gathered from Clarivate's Web of Science Core Collection (including the Proceedings and Book Citation Indices; WoS CC) and the DCI, as well as from one of the currently most popular tools for altmetrics studies, PlumX. The results of the bibliometric analysis are compared, especially with respect to the citedness of data records, the relationship between citations/altmetrics and data record types, and the data availability and intensity of altmetrics provided by PlumX and Altmetric.com. Also, we assess how the access rights of data records affect altmetrics.

As such, with our case study, we aim to shed light on the characteristics of Zenodo and the broader impact of the different kinds of data records uploaded to the repository. Furthermore, we give insight into the reception of research data in Zenodo in comparison to research data from our previous analyses of Figshare and the DCI (Kraker et al., 2015; Peters et al., 2015, 2016). Thus, we will be able to show whether the repository itself (and its popularity, as induced, for example, by mandates from the European Commission) affects the usage and reception of data records. Such information can guide authors' strategic publishing behavior.

Data and Methods

This section gives an overview of the data and methods used in this study.

Descriptive Statistics of Zenodo Data Records

The basis for our analyses is Zenodo, from which we downloaded all metadata records via the OAI-PMH interface using the R package oai (Chamberlain and Bojanowski, 2016) on 17 January 2017, resulting in 142,131 data records. This dataset has been uploaded to Zenodo (see Peters et al., 2017). Of these, 141,777 data records (99.8%) were assigned a DOI and could be used for our further analyses. Table 1 provides an overview of the types of data records contained in Zenodo and their quantities. In addition, the access rights the authors have assigned to the data records are given. Please note that most of the data stored in Zenodo are OA.

Table 1. Description of Zenodo data records with digital object identifier (DOI) and all publication years.
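
For readers who wish to reproduce such a harvest, the following is a minimal Python sketch of OAI-PMH paging via resumption tokens. The study itself used the R package oai; the endpoint URL and metadata prefix below are assumptions based on Zenodo's public OAI-PMH service.

```python
# Minimal OAI-PMH harvester sketch (assumption: Zenodo's endpoint is
# https://zenodo.org/oai2d and serves Dublin Core via metadataPrefix=oai_dc).
import requests
import xml.etree.ElementTree as ET

OAI = "https://zenodo.org/oai2d"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(endpoint=OAI, metadata_prefix="oai_dc"):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(endpoint, params=params).content)
        for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
            yield record
        token = root.find(".//oai:resumptionToken", NS)
        if token is None or not token.text:
            break  # no more pages
        # Per the OAI-PMH protocol, follow-up requests carry only the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text}

records = list(harvest())
print(len(records), "metadata records harvested")
```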

Zenodo's records date back to 1762 (the year of first publication, not of publication in Zenodo), but the majority of records (133,917 or 94.5%) were published between 2002 and 2016 (1,123 records were published in 2017; six records have publication dates in the future). Figure 1 shows the temporal trend of data record types per publication year. It reveals that journal articles and software have seen strong growth over the last 3–4 years, whereas the number of figures saw sharp drops in 2014 and 2016. The numbers of datasets and presentations have been growing steadily, but conference papers and books saw a drop in 2016. Note that these numbers may be influenced by the data collection taking place in January 2017, when not all items published in 2016 may yet have been entered into Zenodo. This may explain the sharp drop in figures in 2016 but would need to be confirmed by further data collection.

Figure 1. All data record types per publication year in Zenodo with digital object identifier and publication years 2002–2016 (n = 133,917 records).
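
As an illustration, the counts underlying Figure 1 could be derived from the harvested metadata roughly as follows; the DataFrame column names are hypothetical placeholders for the parsed metadata fields.

```python
# Sketch: per-year, per-type counts as plotted in Figure 1. The columns
# 'type' and 'publication_year' are assumed names, not Zenodo's schema.
import pandas as pd

def counts_per_year(records: pd.DataFrame) -> pd.DataFrame:
    span = records[records["publication_year"].between(2002, 2016)]
    return (span.groupby(["publication_year", "type"])
                .size()
                .unstack(fill_value=0))  # one column per data record type
```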

For the further bibliometric and altmetric analyses, we took into consideration 68,339 journal articles, books and book sections, conference papers, datasets, software, reports, technical notes and working papers, theses, and other non-figures (e.g., diagrams, audiovisuals) with all publication years. To facilitate a more comprehensive description of the results and to condense the large variety of data record types available in Zenodo, the data records were summarized into the following seven groups (a minimal lookup-table sketch follows the list):

(1) journal articles including all journal articles (n = 33,708),

(2) books including Zenodo’s data records “book and book section” (n = 3,366),

(3) conference papers including “conference papers” (n = 11,967),

(4) datasets including “data sets” (n = 3,660) and “software” (n = 13,747),

(5) reports including “working papers” and “technical notes” (n = 1,276),

(6) theses (n = 614), and

(7) other non-figures (n = 4,737) including “audiovisual,” “diagram,” “drawing,” “interactive resource,” “other,” “patent,” “photo,” “plot,” “poster,” “preprint,” “presentation,” “project deliverable,” “project milestone,” “proposal,” and “software documentation.”
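
A minimal sketch of this grouping as a lookup table; the raw type labels are taken from the list above and may not match Zenodo's internal vocabulary exactly.

```python
# Seven-group categorization of Zenodo data record types (sketch; raw
# labels taken from the list above, not from Zenodo's internal schema).
GROUPS = {
    "journal article": "journal articles",
    "book": "books", "book section": "books",
    "conference paper": "conference papers",
    "dataset": "datasets", "software": "datasets",
    "working paper": "reports", "technical note": "reports",
    "thesis": "theses",
}

def group_of(record_type: str) -> str:
    """Map a raw data record type to one of the seven analysis groups."""
    return GROUPS.get(record_type.lower(), "other non-figures")
```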

The data record type “figure” was excluded from the bibliometric analyses because it is not indexed by the databases used (i.e., Web of Science) and because its overrepresentation in Zenodo (n = 68,701) would distort the results.

In this article, we only report findings for the data record types “book,” “conference paper,” “journal article,” and “dataset.” The complete results for data availability as well as for bibliometric and altmetric analyses for each Zenodo data record type (i.e., “journal articles,” “conference papers,” “books,” “datasets,” “software,” “reports,” “theses,” and “other”) including a short statistical analysis [i.e., items with data available, % of items with data available, sum (intensity), mean, mean available (density), maximum, SD, and T confidence interval (α = 0.05)] are included in the Tables A1–A14 in Supplementary Material.

Data and Methods for Bibliometric Analyses of Zenodo Data Records in WoS CC and DCI

A coverage analysis of all Zenodo data records in the categories “book,” “conference paper,” and “journal article” was performed in the Web of Science Core Collection (WoS CC): journal articles in the Science Citation Index, Social Science Citation Index, and Arts and Humanities Citation Index (but not the Emerging Sources Citation Index, which only became available in 2015), books and book chapters in the Book Citation Index, and conference papers in the Proceedings Citation Index. To this purpose, Zenodo DOIs from all publication years were retrieved from the complete WoS CC using the advanced search and the DOI field. All retrieved records (n = 3,332 books, n = 11,942 conference papers, and n = 27,252 journal articles) were then downloaded to enable further citation analyses according to the number of citations attracted in the WoS CC (Download Field = TC) and in the complete WoS platform (WoS; Download Field = ZA). Data collection on WoS CC and WoS took place in July 2017.
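
A sketch of how such an advanced search can be assembled from a DOI list is shown below; the field tag DO for DOIs and the batch size are assumptions about the WoS advanced-search syntax and its query-length limits.

```python
# Sketch: build WoS advanced-search queries from a DOI list. The "DO"
# field tag and the batch size of 50 are assumptions for illustration.
def wos_queries(dois, batch_size=50):
    for i in range(0, len(dois), batch_size):
        batch = dois[i:i + batch_size]
        yield "DO=(" + " OR ".join(f'"{doi}"' for doi in batch) + ")"

# Example: one query per batch, pasted into the advanced-search box.
for query in wos_queries(["10.5281/zenodo.1065506", "10.5281/zenodo.6773"]):
    print(query)
```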

Likewise, the coverage of the category “datasets” (including the data record types “datasets” and “software”) was analyzed in the corresponding tool hosted by Clarivate Analytics, the DCI. A citation analysis was also performed for these data record types (n = 17,408) in the DCI. Data were collected from the DCI in January 2017.

The coverage of Zenodo records in the WoS CC (see “Zenodo items in WoS CC” in Table 2) and the DCI is reflected by the ratio of the number of data records found in the databases to the number of records searched for. The citedness (i.e., citation rate) of Zenodo records is the ratio of all citations to all records searched in the WoS CC (see “Zenodo items cited in WoS CC” in Table 2) and the DCI.
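
Expressed as a minimal sketch (the function names are ours, not from the study):

```python
# Sketch of the two ratios defined above.
def coverage(records_found: int, records_searched: int) -> float:
    """Share of searched Zenodo records found in the database."""
    return records_found / records_searched

def citedness(total_citations: int, records_searched: int) -> float:
    """Citation rate: all citations relative to all records searched."""
    return total_citations / records_searched
```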

Table 2. Coverage of Zenodo data records in WoS CC and PlumX (Zenodo data record types: books, conference papers, and journal articles—all publication years).

Data and Methods for Altmetric Analyses of Zenodo Data Records in PlumX and Altmetric.com

For the altmetrics data collection and aggregation, three major tools are currently available: ImpactStory, Altmetric.com, and PlumX. Whereas Altmetric.com and PlumX focus on institutional customers (e.g., publishers, libraries, or universities) by gathering and providing data on a large scale, ImpactStory rather targets individual researchers who wish to include altmetrics information in their CVs (Jobmann et al., 2014; Peters et al., 2016; Torres-Salinas et al., 2017). The provider of PlumX is Plum Analytics, a 100% subsidiary of EBSCO Information Services since 2014; however, during the course of writing this article, Elsevier took over PlumX from EBSCO. Altmetric.com is a subsidiary of Digital Science and was launched in 2011. By using different identifiers for scholarly products (e.g., DOI, PubMed-ID, and URN), it tracks altmetrics signals on platforms such as Facebook, Twitter, and Mendeley, but also in news outlets, policy documents, and Wikipedia.

For this study, we have used the fee-based PlumX altmetrics dashboard because it gathers and offers publication-level metrics for so-called artifacts, which include monographs and books, articles, audios, videos, book chapters, patents, and clinical trials, and because it allows DOIs as well as many other identifiers to be entered directly.

To gather the data in PlumX, a plain text file containing the DOIs for all publication years of the Zenodo sample for each data record type (see Table 1 and the data record type categorization) was uploaded to PlumX and processed by the tool, yielding a new dataset that includes all resulting “artifacts” (as data records are named in PlumX) and the corresponding altmetric scores gathered from each consulted source. The resulting dataset for each data record type can be exported as a CSV file. The final dataset that we used for the analysis contains n = 27,252 journal articles, n = 3,332 books, n = 11,942 conference papers, n = 1,178 reports, n = 3,606 datasets, n = 13,309 software, n = 600 theses, and n = 111 other document types. The resulting dataset also includes the scores of all measures according to their origin. The measures are categorized into five separate dimensions: Usage, Captures, Mentions, Social Media, and Citations (Torres-Salinas et al., 2017). This categorization may be subject to criticism, but one big advantage of PlumX is that the results are differentiated in the resulting dataset for each measure and its origin and can be aggregated according to user-defined criteria.
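
A sketch of such a user-defined aggregation over the CSV export is shown below; the per-source column names and their mapping to dimensions are assumptions for illustration, not PlumX's actual export schema.

```python
# Sketch: aggregate a PlumX CSV export into the five dimensions. Column
# names are assumed placeholders; a real export uses per-source metric
# columns that must first be mapped to dimensions.
import pandas as pd

DIMENSIONS = {
    "Usage": ["abstract_views", "downloads"],
    "Captures": ["mendeley_readers", "exports_saves"],
    "Mentions": ["blog_mentions", "wikipedia_links"],
    "Social Media": ["tweets", "facebook_likes"],
    "Citations": ["scopus_citations", "crossref_citations"],
}

def dimension_totals(csv_path: str) -> pd.Series:
    df = pd.read_csv(csv_path)
    return pd.Series({
        dim: df.reindex(columns=cols).fillna(0).sum().sum()
        for dim, cols in DIMENSIONS.items()
    })
```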

The comparison of the altmetrics data from PlumX and Altmetric.com is based on an Altmetric.com database dump. It includes altmetric data for publications with assigned and trackable identifiers (e.g., DOI, PubMed-ID) from its launch in 2011 until June 2016. To work with complete yearly data, we have only used altmetrics for Zenodo data records with publication years from 2011 until 2015 and altmetrics events until June 2016 (time of download from Altmetric.com) and January 2017 (time of download from PlumX). This resulted in a dataset of 51,051 data records, which makes up 36% of all Zenodo data records with a DOI.

PlumX and Altmetric.com apply different data collection procedures that massively affect the number of data records and altmetrics signals that can be analyzed. PlumX counts a data record even if no altmetric signal could be found (e.g., if the DOI has not been tweeted). Altmetric.com tracks Mendeley data for scholarly products only if they receive at least one other altmetrics event. Hence, Mendeley data should be analyzed carefully (Costas et al., 2014).

To enable the comparison of both altmetrics providers for data availability and altmetric analyses, we report several indicators that accommodate the data collection differences:

• data availability = number of data records traced in PlumX and Altmetric.com;

• data with scores = number of data records traced in PlumX and Altmetric.com with at least one score (>1)7;

• data with scores (%) = number of data records traced in PlumX and Altmetric.com with at least one score (>1) and in relation to the number of Zenodo records searched;

• intensity = sum of all altmetric signals or scores;

• density (mean available) = sum of all altmetric signals or scores in relation to the number of Zenodo records with at least one score (>1).

Data availability reflects that data records could be found in PlumX and Altmetric.com even though they may not have any altmetric signals to analyze (i.e., similar to uncitedness). Data with scores shows the number of data records that have been engaged with at least once (e.g., via tweeting a DOI). As explained by Haustein et al. (2015), the intensity describes how often a data record has been referenced on a social media platform. Density resembles the citation rate and is highly affected by the data with scores (e.g., low data with scores lead to poor values for density).
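
The following sketch computes these indicators from a per-record score table; the tabular layout (one row per data record, one column per altmetric signal) is hypothetical.

```python
# Sketch: the five reported indicators from a per-record score table.
import pandas as pd

def indicators(scores: pd.DataFrame, n_searched: int) -> dict:
    per_record = scores.sum(axis=1)        # total signals per data record
    with_scores = (per_record >= 1).sum()  # records engaged with at least once
    intensity = per_record.sum()           # sum of all altmetric signals
    return {
        "data availability": len(scores),  # records traced at all
        "data with scores": int(with_scores),
        "data with scores (%)": 100 * with_scores / n_searched,
        "intensity": int(intensity),
        "density (mean available)": intensity / with_scores if with_scores else 0.0,
    }
```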

It is noteworthy that the majority of altmetrics tools currently do not rely on a source index as featured in citation databases. Therefore, retrieval is only successful for documents that have been commented on, discussed, or mentioned at least once; consequently, a 0 score is not available. For comparison, citation databases like the Web of Science are always based on both a source and a citation index: a document that is available in the source index but has so far remained uncited in the citation index automatically results in a 0 score.

In this respect, PlumX is unique among the current altmetrics tools, since it creates an index of all documents that have been searched and processed by the tool at least once. In doing so, bibliographical data are enhanced automatically; e.g., PlumX includes different variations of an ISBN that were already used for identification purposes in previous searches (see Torres-Salinas et al., 2017).

Data and Methods for the Analysis of the Relationship between Altmetrics Indicators of Zenodo Data Records and Access Rights

For this study, we investigated, as an example, the relationship between altmetrics indicators and access rights for journal articles in Zenodo. Our final dataset contains n = 1,422 OA, n = 2,370 CA, n = 23 embargoed access (EA), and n = 12 restricted access (RA) journal articles. Due to the low numbers of EA and RA articles, we merged them with the CA articles, resulting in a total of 2,405 CA articles. We treated our analysis as a classification problem in which the two types of access rights represent the target classes, i.e., OA and CA. Each journal article is described by a vector whose dimensions are the journal article's altmetrics indicators. We specifically took as dimensions the counts of the following eight indicators extracted from Altmetric.com: blog posts, unique users on Twitter, F1000 posts, news posts, Facebook posts, Wikipedia posts, readers in Mendeley, and readers in CiteULike, because our dataset contained the most data for these indicators, while for other indicators, such as posts on Reddit, mostly no data were available. Our aim was to study the relevance of and relationship between these eight altmetrics indicators and the access rights, with the hypothesis in mind that, e.g., OA would have a positive impact on the number of Mendeley readers or Wikipedia articles. For the experiments, we utilized a stochastic gradient descent (SGD) approach (Zhang, 2004), since it is an iterative, efficient approach to fit linear models and since its model parameters (i.e., its coefficients) let us assess feature relevance. We used the SGD implementation from the Python library scikit-learn.8 Because SGD is sensitive to feature scaling, we standardized our data using the StandardScaler, also from scikit-learn, which scales the data to a mean of 0 and an SD of 1.
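
A minimal sketch of this setup follows; the feature matrix here is a random placeholder, and the loss and other hyperparameters are assumptions, since the study names only SGD and the StandardScaler from scikit-learn.

```python
# Sketch: standardize the eight altmetrics counts, fit a linear model via
# stochastic gradient descent, and read coefficients as feature relevance.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

FEATURES = ["blogs", "twitter_unique_users", "f1000", "news",
            "facebook", "wikipedia", "mendeley_readers", "citeulike_readers"]

X = np.random.poisson(1.0, size=(3827, len(FEATURES)))  # placeholder counts
y = np.array([1] * 1422 + [0] * 2405)                   # 1 = OA, 0 = CA

X_std = StandardScaler().fit_transform(X)               # mean 0, SD 1
clf = SGDClassifier(loss="log_loss", random_state=0).fit(X_std, y)

for name, coef in zip(FEATURES, clf.coef_[0]):
    print(f"{name:>22}: {coef:+.3f}")  # sign indicates the relation to OA
```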

Results

We first present results of the coverage of Zenodo records in the WoS CC and DCI as well as data availability in the altmetrics aggregators PlumX and Altmetric.com. After that, we compare the citedness and altmetrics signals of Zenodo records via WoS CC, DCI, PlumX, and Altmetric.com. Then, an analysis of the access rights of Zenodo’s data records is presented.

Coverage and Data Availability of Zenodo Data Records in WoS CC, DCI, PlumX, and Altmetric.com

WoS CC

The results for the data record types “books,” “conference papers,” and “journal articles” are summarized in Table 2. They show an almost negligible coverage percentage for “books” and “conference papers” (less than 1%), due in part to the low indexing degree of these publication types in the WoS CC. In contrast, almost 13% of the “journal articles” uploaded to Zenodo are indexed in the WoS CC.

Data Citation Index

In total, the DCI indexed 13,788 records with Zenodo as “source” (including the repository itself): one “repository,” 147 “data studies,” 2,855 “data sets,” and 10,785 “software.” All records except ten could be downloaded. The categories “data set” and “data study” from the DCI were compared with the data record type “dataset” in Zenodo, as was the category “software,” which is identical in both data sources. The results are shown in Figure 2. Almost 80% of the data records categorized in Zenodo as “datasets” or “software” were also indexed in the DCI. Conversely, 1.5% of the data records categorized as “data sets,” “data studies,” or “software” and assigned to Zenodo in the DCI could not be found in Zenodo, due to different categorization, indexing delays, removal, or incorrect indexing or DOIs.

Figure 2. Coverage of Zenodo data records in the Data Citation Index (DCI) (Zenodo data record types: datasets and software; all publication years).
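
The coverage figure above rests on matching records between the two sources; a minimal sketch of such matching via DOI set intersection (the input sets are hypothetical exports from Zenodo and the DCI download) could look as follows.

```python
# Sketch: coverage as DOI set intersection between Zenodo and the DCI.
def doi_coverage(zenodo_dois: set, dci_dois: set) -> float:
    """Share of Zenodo DOIs that are also indexed in the DCI."""
    return len(zenodo_dois & dci_dois) / len(zenodo_dois)
```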

PlumX

Table 2 also provides information about the data availability of Zenodo records in PlumX. It should be noted that some PlumX records could not be rematched with the original Zenodo data via DOI: the data used in this table cover 99.99% (3,331 of 3,332) of “books,” 99.98% (11,940 of 11,942) of “conference papers,” and 99.81% (27,201 of 27,252) of “journal articles” in PlumX. The data availability percentages are much higher than in the WoS CC for all document types. Almost all indexed Zenodo journal articles were also retrieved in PlumX.

Altmetric.com

Of the 51,051 data records, 4,692 (9%) were found in the Altmetric.com dataset (see Table 3). The availability of altmetrics data varies highly across data record types: 15% of journal articles, 11% of reports (working papers and technical notes), and 10% of datasets could be retrieved. The other data record types were found less frequently.

Table 3. Comparison of data records found in Altmetric.com and PlumX (n = 51,051; publication years <2016).

PlumX versus Altmetric.com

As can be seen in Table 3, in terms of data availability, PlumX outperforms Altmetric.com for all data record types except “other,” which was not recorded by PlumX. However, the direct comparison of PlumX and Altmetric.com is challenging because the two altmetrics providers collect and deliver different altmetric signals. As Table 4 shows, only the signal “Mendeley readers” is provided by both aggregators.

Table 4. Full list of signals for journal articles available in Altmetric.com and PlumX (n = 3,399; publication years <2016).

Citedness of Zenodo Data Records in WoS CC and DCI

WoS CC

About 9% of Zenodo data records of the types “books,” “journal articles,” and “conference papers” have been cited according to the WoS CC, with “journal articles” being the data record type gaining the highest share of citations. Almost 64% of the “journal articles” indexed by the WoS CC also received at least one citation.

Data Citation Index

Twenty-six data records did not contain cited references, and three records contained only incomplete cited references. Furthermore, only 24 data records (15 “software” and 9 “data studies”) were cited according to the DCI, and each just once. This result is in good agreement with a previous study by Peters et al. (2016), which revealed that research data remain mostly uncited (about 85%), although there has been an increase in the citing of data sets published since 2008.

Altmetrics Signals of Zenodo Data Records from PlumX and Altmetric.com

PlumX

About 37% of Zenodo data records found in PlumX also have altmetrics counts associated with them. In Table 5, we present, as an example, the results for the data record type “journal articles.” It lists all measures and indicators according to the sources they have been retrieved from, to provide sufficient information about the distribution of the scores. The measures are categorized into the five separate dimensions typical for PlumX data: Usage, Captures, Mentions, Social Media, and Citations. It should be noted that the total values for each dimension were only calculated to give a quick overview of the percentage of items with available data. Please note, though, that the dimensions reflect different types of engagement with Zenodo data records and should not be conflated.

Table 5. Results from PlumX (Zenodo data record type: journal articles; all publication years; see also Tables A1–A7 in Supplementary Material).

As a summarized overview of the peculiarities of each data record type, Table 6 shows the data availability of Zenodo data records in PlumX according to each data record type and the PlumX dimensions. It should be noted that the degree of data availability (i.e., the number of items traced in PlumX) is not the same as the degree of data with scores (i.e., the number of items with at least one score in one of the measures or dimensions reported by PlumX). Data availability can differ strongly in PlumX. For example, in the case of “journal articles,” the DOIs of 33,708 journal articles were entered into PlumX. However, the data availability reported by PlumX only included 27,252 records. Of these 27,252 records, only 9,491 records have at least one score in at least one of the tools traced by PlumX (i.e., data with scores). Therefore, the total degree of data with scores for this data record type is 34.83% (see Table 6).

Table 6. Results from PlumX: data availability for each Zenodo data record type (all publication years).

Table 7. Results from PlumX: mean value of the available scores for each Zenodo data record type (all publication years).

Furthermore, Table 7 shows the mean score value computed in PlumX for each dimension (captures, citations, social media, mentions, and usage) and for each Zenodo data record type. It should be noted that this mean score value (see, for example, Table 6 under “mean available”) was calculated for all data records with at least one score (n = 5,713) and not as the mean value of all records included in the PlumX resulting dataset (n = 14,454). This value, labeled as “mean” in our analysis, is also reported for each data record type (see, for example, Table 5 for “journal articles”).

Altmetric.com

In total, 15.3% of journal articles were found on one of the platforms Altmetric.com tracks. Twitter (12.66%) and Mendeley (11.25%) are the services that cover most of the journal articles uploaded to Zenodo. In terms of altmetrics intensity, which describes how often particular data record types have been shared on social media platforms, Mendeley yielded the most engagement (39,127 readers), followed by Twitter with 11,516 unique tweeters, Wikipedia with 1,080 posts, and Facebook with 1,012 posts (see Table 8). Pinterest and Q&A services are the platforms where Zenodo data records are neither found (i.e., data with scores) nor frequently shared (i.e., intensity).

Table 8. Results from Altmetric.com (Zenodo data record type: journal articles, publication years <2016; see also Tables A8–A14 in Supplementary Material).

Across all data record types studied, the Pearson correlation between Twitter and Mendeley (computed only for records for which data are available in both social media services, n = 3,962) is r = 0.3; for books it is r = 0.12, for conference papers r = 0.03, for reports, working papers, and technical notes r = 0.24, for software r = 0.25, for theses r = 0.88, and for other non-figures r = 0.23. The Pearson correlation of r = 0.33 between Twitter users and Mendeley readers of journal articles reflects that articles receive varying attention across social media platforms.
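
A minimal sketch of this paired computation (the column names are hypothetical):

```python
# Sketch: keep only records with signals in both services, then compute
# Pearson's r between Twitter unique users and Mendeley readers.
import pandas as pd
from scipy.stats import pearsonr

def paired_pearson(df: pd.DataFrame) -> float:
    both = df[(df["twitter_unique_users"] > 0) & (df["mendeley_readers"] > 0)]
    r, _ = pearsonr(both["twitter_unique_users"], both["mendeley_readers"])
    return r
```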

Comparison of Altmetric Results from PlumX and Altmetric.com

Altmetrics data are highly skewed across the data record types collected in both tools, as illustrated by the statistical analysis provided in Tables 6 and 9 (as well as in the tables provided as Supplementary Material). For example, of the 199 datasets in Altmetric.com, a single dataset accounts for 81 of the 360 Mendeley readers (22.5%), and of the 223 datasets in PlumX, a single dataset accounts for 135 of the 1,202 Mendeley readers (>10%). Furthermore, one dataset gained 108 of the 682 unique Twitter users in total in Altmetric.com (16%), and one single dataset gained 384 of 1,608 tweets.

Comparing the performance of both tools, PlumX retrieves altmetric scores for almost twice as many data records as Altmetric.com. This is in agreement with the high number of sources traced and measures reported in PlumX (see Table 5) and also corresponds well with the results of previous analyses (Peters et al., 2016). Almost 90% of the journal articles and 80% of all documents traced in Altmetric.com were also detected in PlumX.

The comparison of PlumX and Altmetric.com reveals that even when information is retrieved from the same source, the measured information differs, for example: “mentions in blogs” in PlumX versus “counts in blogs” in Altmetric.com, “+1” versus “unique user counts” in Google+, or “tweets” versus “unique user counts” in Twitter. Captures in Mendeley is the only identical measure collected in both tools (see Table 5), for which the correlation of the traced information in PlumX and Altmetric.com is very high (0.95).

Comparison of Bibliometric and Altmetric Results

WoS CC and PlumX

For all journal articles uploaded to Zenodo and at the same time indexed in the WoS CC, correlations were computed between the number of citations attracted in the WoS CC (WoS CC TC) and in the whole WoS platform (WoS CC-ZA) versus the number of citations attracted in Scopus and CrossRef according to PlumX. The results are summarized in Table 9. They show a very high correlation between WoS and Scopus, and a lower one between WoS and CrossRef.

Table 9. Citation correlations Web of Science (WoS) TC and ZA versus PlumX-Scopus/CrossRef (n = 4,233; all publication years).

A summary of the descriptive statistics for the citation counts in the four data sources (i.e., WoS CC, WoS, Scopus, and CrossRef) is provided in Table 10. The higher number of citation counts retrieved in WoS can be due to the different dates of data collection (6 months later in WoS than in PlumX).

Table 10. Statistical report for the citation analysis performed in WoS CC, Web of Science (WoS), Scopus, and CrossRef (journal articles; all publication years).

Relationship between Bibliometrics, Altmetrics, and Access Rights of Zenodo Data Records

Only a few books, book sections, and journal articles make use of the broad range of access rights, which also reflects the current situation of publishers often granting only restricted or embargoed access to scholarly products. As an example, we studied the relationship between the access rights of journal articles and a selection of altmetrics indicators from Altmetric.com; the results of this analysis are given in Figure 3.

Figure 3. Relationship between altmetrics and access rights of Zenodo data records. Closed access combines data records with closed, embargoed, and restricted access (data from Altmetric.com, publication years <2016).

As shown in Figure 3, our preliminary study seems to confirm our hypothesis that journal articles with permissive access rights tend to attract more altmetrics counts in our data sample. For example, the indicators “number of (unique) Twitter users,” “readers in CiteULike,” and “readers in Mendeley” have positive coefficients with the class OA. The same indicators are negatively correlated with CA, as well as the indicators “f1000 posts” and “Facebook posts,” albeit with much lower coefficients.

Conclusion

Zenodo has gained momentum and popularity, not only due to its integrated reporting lines for research funded by the European Commission but also due to the free assignment of DOIs to all publicly available uploads. Our case study results reflect this upward trend. In particular, the numbers of uploaded journal articles and software are increasing almost exponentially, and we see strong growth for datasets as well. We found this shift toward increased data sharing also in our previous analyses of the DCI (Peters et al., 2016) and Figshare (Kraker et al., 2015).

Zenodo's data records are also indexed in other databases, which is paramount to gaining visibility and, along with it, to driving the reuse of data records. Almost 13% of the journal articles uploaded to Zenodo are indexed in the WoS CC, and nearly 80% of the datasets and software uploaded to Zenodo are covered in the DCI, where this repository plays an important role as a data source.

Data records are, however, also shared via other platforms, such as social media. Traces of this are gathered by altmetrics aggregators such as PlumX. The percentage of altmetric data availability for Zenodo data records varies strongly across data record types but barely exceeds the one-third threshold. Of the searched Zenodo records, journal articles (~35%) are the record type attracting the highest altmetric scores, followed by reports (~15%) and datasets (~12%). These are mostly captures traced in Mendeley, a phenomenon that applies to all data record types. This corroborates earlier research, which found that Mendeley is one of the altmetrics data sources with the highest coverage (e.g., Zahedi et al., 2017).

The availability of citations from the WoS CC is only substantial for journal articles. However, citedness is relatively low due to the short citation window, since the majority of data records uploaded to Zenodo were published within the last 3 years. Data records of the types “dataset” and “software” have not seen considerable reuse in terms of citations, which might also be due to the lack of standards in citing and indexing references to software (Howison and Bullard, 2016). Most Zenodo data records of these types have not been cited at all. However, those that have been cited or were mentioned on social media platforms have been reused several times.

The observed high percentages of altmetrics availability in social media for datasets, reports, and theses are remarkable (see Supplementary Material). Except for journal articles, the percentage of data records with mentions mostly remains below 1% for all other data record types. The same is true for usage counts. In contrast to previous studies (for example, Torres-Salinas et al., 2017), we found low values for the usage of books and book chapters. This can be explained by the low number of books uploaded to Zenodo and by the use of DOIs, rather than ISBNs, to trace their scores in PlumX.

In general, datasets, journal articles, and reports are the document types with the highest altmetric mean values. Furthermore, high mean values were observed for the usage of reports and datasets in comparison to the other document types. The high mean value of scores obtained by theses probably reflects the more positive attitude of young academics toward social media, which they utilize for the dissemination of their output beyond traditional channels (Work et al., 2015).

With respect to altmetrics and access rights, a preliminary experiment seems to confirm our hypothesis that journal articles in Zenodo with permissive access rights tend to attract more altmetrics counts in our data sample.

In our case study, we also compared the citation metrics retrieved from WoS and PlumX, the latter also collecting data from Scopus as well as from CrossRef. As could be expected, the correlations between citation counts in WoS and in Scopus via PlumX are very high and in agreement with previous results (e.g., Gorraiz and Schloegl, 2008; Archambault et al., 2009). The number of citations included in CrossRef is significantly lower (almost half as many) than in WoS or Scopus. The citation correlations concerning CrossRef are also considerably weaker than the one observed between Scopus and WoS.

Comparing the performance of both altmetrics tools, PlumX retrieves altmetric scores for almost twice as many documents as Altmetric.com. Our results show that almost 90% of the journal articles and 80% of all documents traced in Altmetric.com were also detected in PlumX.

A more detailed analysis of the altmetric information provided by both tools reveals that they are complementary rather than comparable.

Furthermore, PlumX generally collects the total number of scores originating from each source, whereas Altmetric.com inconsistently counts either the number of events or the number of users. Counting the number of users rather than the number of scores is perceived as a good means to avoid data manipulation; however, combining both approaches in a consistent way would be desirable. In comparison, citation metrics are more straightforward, as they are strictly based on counting cited and citing publications.

We hope that with our case study, we can help spark further research on usage, reception, and engagement of and with repositories for scientific artifacts.

Author Contributions

Conception or design of the work; critical revision of the article; and final approval of the version to be published: CG, JG, PK, EL, and IP. Data collection: CG, JG, PK, and IP. Data analysis and interpretation; drafting the article: CG, JG, EL, and IP. Data upload: PK.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank Stephan Buettgen (EBSCO) and Christina Lohr (Elsevier) for granted trial access to PlumX during data collection and Benedikt Blahous for his help processing the data. Moreover, the authors thank Altmetric.com for altmetric data and Steffen Lemke for assistance in data processing. The Know-Center is funded within the Austrian COMET Program—Competence Centers for Excellent Technologies—under the auspices of the Austrian Ministry of Transport, Innovation and Technology, the Austrian Ministry of Economics and Labor, and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency (FFG). Part of this work has been conducted in the project *metrics funded by the Deutsche Forschungsgemeinschaft (DFG, No. 314727790).

Supplementary Material

The Supplementary Material for this article can be found online at http://www.frontiersin.org/articles/10.3389/frma.2017.00013/full#supplementary-material.

Footnotes

  1. ^http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=16022.
  2. ^https://wwws.plos.org/plosone/s/data-availability.
  3. ^https://www.nature.com/sdata/policies/repositories.
  4. ^https://www.gatesfoundation.org/How-We-Work/General-Information/Open-Access-Policy.
  5. ^https://www.datacite.org/cite-your-data.html.
  6. ^https://www.elsevier.com/__data/assets/pdf_file/0004/281920/Open-data-report.pdf.
  7. ^In Altmetric.com, data with scores = 0 are only available for Mendeley and CiteULike. Because of the data collection process at Altmetric.com, data availability and data with scores for Mendeley data are almost always identical. Differences can appear because of synchronization lags, e.g., when a Mendeley user removes an article from her library and the reader number is then set to 0 (https://mendeleyapi.wordpress.com/category/readership-statistics).
  8. ^http://scikit-learn.org/stable/modules/sgd.html.

References

Archambault, É., Campbell, D., Gingras, Y., and Larivière, V. (2009). Comparing bibliometric statistics obtained from the Web of Science and Scopus. J. Assoc. Info. Sci. Technol. 60, 1320–1326. doi:10.1002/asi.21062

Bruno, B., Ferus, A., Gorraiz, J., Gründhammer, V., Gumpenberger, C., Maly, N., et al. (2015). Forschende und ihre Daten. Ergebnisse einer österreichweiten Befragung – Report 2015. Version 1.2. Available at: https://zenodo.org/record/32043

Chamberlain, S., and Bojanowski, M. (2016). oai: General Purpose ‘Oai-PMH’ Services Client. R Package Version 0.2.2. Available at: https://CRAN.R-project.org/package=oai

Costas, R., Zahedi, Z., and Wouters, P. (2014). Do “altmetrics” correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. J. Assoc. Info. Sci. Technol. 66, 2003–2019. doi:10.1002/asi.23309

Dorch, S. B. F. (2013). Altmetrics to Quantify the Impact of Scientific Research Published in Open Full Text Repositories. Zenodo. doi:10.5281/zenodo.6773

Drachen, T., Ellegaard, O., Larsen, A., and Dorch, S. (2016). Sharing data increases citations. Liber Q. 26, 67–82. doi:10.18352/lq.10149

Gorraiz, J., and Schloegl, C. (2008). A bibliometric analysis of pharmacology and pharmacy journals: Scopus versus Web of Science. J. Info. Sci. 34, 715–725. doi:10.1177/0165551507086991

Haustein, S., Costas, R., and Larivière, V. (2015). Characterizing social media metrics of scholarly papers: The effect of document properties and collaboration patterns. PLoS ONE 10:e0120495. doi:10.1371/journal.pone.0120495

He, L., and Han, Z. (2017). Do usage counts of scientific data make sense? An investigation of the Dryad repository. Library Hi Tech 35, 332–342. doi:10.1108/LHT-12-2016-0158

He, L., and Nahar, V. (2016). Reuse of scientific data in academic publications: an investigation of Dryad digital repository. Aslib J. Info. Manage. 68, 478–494. doi:10.1108/AJIM-01-2016-0008

Howison, J., and Bullard, J. (2016). Software in the scientific literature: problems with seeing, finding, and using software mentioned in the biology literature. J. Assoc. Info. Sci. Technol. 67, 2137–2155. doi:10.1002/asi.23538

Ingwersen, P. (2014). “Scientific datasets: informetric characteristics and social utility metrics for biodiversity data sources,” in Library and Information Sciences, eds C. Chen, and R. Larsen (Berlin Heidelberg: Springer), 107–117.

Jobmann, A., Hoffmann, C. P., Künne, S., Peters, I., Schmitz, J., and Wollnik-Korn, G. (2014). Altmetrics for Large, Multidisciplinary Research Groups: Comparison of Current Tools. Bibliometrie: Praxis und Forschung, 3.

Konkiel, S. (2013). Tracking citations and altmetrics for research data: challenges and opportunities. Bull. Assoc. Info. Sci. Technol. 39, 27–32. doi:10.1002/bult.2013.1720390610

Kraker, P., Lex, E., Gorraiz, J., Gumpenberger, C., and Peters, I. (2015). “Research data explored II: the anatomy and reception of figshare,” in Proceedings of the 20th International Conference on Science and Technology Indicators (Lugano, Switzerland), 77–79. Available at: http://sti2015.usi.ch/sites/www.sti2015.usi.ch/files/media/book-abstracts-sti2015.pdf

Kratz, J. E., and Strasser, C. (2015). Researcher perspectives on publication and peer review of data. PLoS ONE 10:e0117619. doi:10.1371/journal.pone.0117619

Martone, M. (ed.) (2014). Joint Declaration of Data Citation Principles (Data Citation Synthesis Group: Joint Declaration of Data Citation Principles). San Diego, CA: FORCE11.

Mayernik, M. S., and Maull, K. E. (2017). Assessing the uptake of persistent identifiers by research infrastructure users. PLoS ONE 12:e0175418. doi:10.1371/journal.pone.0175418

Mayo, C., Vision, T. J., and Hull, E. A. (2016). “The location of the citation: changing practices in how publications cite original data in the Dryad Digital Repository,” in Proceedings of the 11th International Digital Curation Conference (Amsterdam, Netherlands).

Pampel, H., Vierkant, P., Scholze, F., Bertelmann, R., Kindling, M., Klump, J., et al. (2013). Making research data repositories visible: the re3data.org registry. PLoS ONE 8:e78080. doi:10.1371/journal.pone.0078080

Park, H., and Wolfram, D. (2017). An examination of research data sharing and re-use: implications for data citation practice. Scientometrics 111, 443–461. doi:10.1007/s11192-017-2240-2

Peters, I., Gumpenberger, C., Kraker, P., Lex, E., and Gorraiz, J. (2016). Research data explored: an extended analysis of citations and altmetrics. Scientometrics 107, 723–744. doi:10.1007/s11192-016-1887-4

Peters, I., Gumpenberger, C., Kraker, P., Lex, E., and Gorraiz, J. (2017). Zenodo Public Metadata Records Until 17 January 2017 [Data Set]. Zenodo. doi:10.5281/zenodo.1065506

Peters, I., Kraker, P., Lex, E., Gumpenberger, C., and Gorraiz, J. (2015). “Research data explored: citations versus altmetrics,” in Proceedings of the 15th International Conference on Scientometrics and Informetrics, eds A. A. Salah, Y. Tonta, A. A. Akdag Salah, C. Sugimoto, and U. Al (Istanbul, Turkey), 172–183. Available at: http://issi2015.org/files/downloads/all-papers/0172.pdf

Piwowar, H. A., Day, R. S., and Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE 2:e308. doi:10.1371/journal.pone.0000308

Piwowar, H. A., and Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ 1, e175. doi:10.7717/peerj.175

Robinson-García, N., Jiménez-Contreras, E., and Torres-Salinas, D. (2016). Analyzing data citation practices using the data citation index. J. Assoc. Info. Sci. Technol. 67, 2964–2975. doi:10.1002/asi.23529

Robinson-García, N., Mongeon, P., Jeng, W., and Costas, R. (2017). DataCite as a novel bibliometric source: coverage, strengths and limitations. arXiv Preprint. Available at: https://arxiv.org/abs/1707.06070

Thelwall, M., and Kousha, K. (2016). Figshare: a universal repository for academic resource sharing? Online Info. Rev. 40, 333–346. doi:10.1108/OIR-06-2015-0190

Torres-Salinas, D., Gumpenberger, C., and Gorraiz, J. (2017). PlumX as a potential tool to assess the macroscopic multidimensional impact of books. Front. Res. Metr. Anal. 2:5. doi:10.3389/frma.2017.00005

Torres-Salinas, D., Martín-Martín, A., and Fuente-Gutiérrez, E. (2014). An introduction to the coverage of the Data Citation Index (Thomson-Reuters): disciplines, document types and repositories. arXiv Preprint. Available at: https://arxiv.org/abs/1306.6584

Vandewalle, P. (2012). Code sharing is associated with research impact in image processing. Comput. Sci. Eng. 14, 42–47. doi:10.1109/MCSE.2012.63

Work, S., Haustein, S., Bowman, T. D., and Larivière, V. (2015). Social Media in Scholarly Communication. A Review of the Literature and Empirical Analysis of Twitter Use by SSHRC Doctoral Award Recipients. Study commissioned by the Social Sciences and Humanities Research Council. Available at: http://crctcs.openum.ca/files/sites/60/2015/12/SSHRC_SocialMediainScholarlyCommunication.pdf

Zahedi, Z., Costas, R., and Wouters, P. (2017). Mendeley readership as a filtering tool to identify highly cited publications. J. Assoc. Info. Sci. Technol. 68, 2511–2521. doi:10.1002/asi.23883

Zhang, T. (2004). “Solving large scale linear prediction problems using stochastic gradient descent algorithms,” in Proceedings of the Twenty-First International Conference on Machine Learning (ICML’04) (New York, NY, USA: ACM), 116.

Keywords: Zenodo, data citation, citation analysis, altmetrics, repository, bibliometrics

Citation: Peters I, Kraker P, Lex E, Gumpenberger C and Gorraiz JI (2017) Zenodo in the Spotlight of Traditional and New Metrics. Front. Res. Metr. Anal. 2:13. doi: 10.3389/frma.2017.00013

Received: 09 October 2017; Accepted: 07 December 2017;
Published: 22 December 2017

Edited by:

Xianwen Wang, Dalian University of Technology (DUT), China

Reviewed by:

Xiaolei Huang, Fujian Agriculture and Forestry University, China
Philippe Mongeon, Université de Montréal, Canada

Copyright: © 2017 Peters, Kraker, Lex, Gumpenberger and Gorraiz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Isabella Peters, i.peters@zbw.eu
