Comments on the Use of the Journal Impact Factor for Assessing the Research Contributions of Individual Authors

Pudovkin, Alexander I.

doi:10.3389/frma.2018.00002

OPINION article

Front. Res. Metr. Anal., 02 February 2018

Sec. Scholarly Communication

Volume 3 - 2018 | https://doi.org/10.3389/frma.2018.00002

This article is part of the Research Topic: In Memory of Eugene Garfield: A Review of His Life and Work

Alexander I. Pudovkin*

Institute of Marine Biology, Russian Academy of Sciences, Vladivostok, Russia

The journal impact factor (JIF) is probably the best known invention of Eugene Garfield. Originally the JIF was introduced at the Institute for Scientific Information in the early 1960s as an “in-house” index to assist in deciding whether to cover a journal in the Science Citation Index (Garfield and Sher, 1963; Garfield, 2006). The JIF, a measure of the short-term average citation rate of papers published in a journal, permitted Garfield to make a fair comparison of journals that publish hundreds of papers a year with “mastodons,” like the Journal of Biological Chemistry, which publish thousands of papers each year. Only later did the JIF acquire its present importance for librarians, journal publishers, and scientists. Along with the popularity of the JIF has come criticism. The literature critical of the JIF and its uses is huge and cannot be considered here, but a few general themes are worth attention.

It is often pointed out that the distribution of citations to papers in a journal is highly skewed, so an arithmetic average, used in the JIF calculation, is a poor representation of central tendency or typical performance: a few highly cited papers may significantly shift the JIF score upward from the median value. Other critics note that unscrupulous editors may manipulate the types of articles a journal publishes to game the JIF calculation, which counts all citations to a journal’s content but divides this number by citable items only (articles and reviews). Highly cited editorials, for example, would raise the JIF, because their citations enter the numerator while the editorials themselves are excluded from the denominator. There are also many reports of journal editors requiring authors to add citations to recent papers in the journal and thereby increase the JIF artificially. All three are valid complaints, but there are reasonable rebuttals to each. Although the distribution of citations to papers published in a journal is indeed skewed, there is a very high correlation between the JIF and the citation score of the median paper (Anonymous, 2013; Garfield and Pudovkin, 2015). This means that the JIF reflects not the citedness of a few top cited papers, but rather the citedness of the bulk of the journal’s papers. The citation skewness itself does not seem to be an important obstacle to using the JIF in the assessment of papers, as Ludo Waltman and Vincent Traag of CWTS, Leiden University, have argued in theoretical terms in an as-yet unpublished article (Waltman and Traag, 2017).
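The first two complaints are arithmetic in nature and can be illustrated with a toy calculation. The sketch below uses invented citation counts (they are not data from any journal), and the function name `jif_like_ratio` is a hypothetical helper. It shows how a single highly cited paper pulls the mean above the median, and how citations to non-citable items such as editorials raise the ratio further. It says nothing about the rebuttal, which concerns the correlation between the JIF and the median across many journals.

```python
from statistics import median

# Hypothetical citation counts for ten citable items (articles and reviews)
# in one journal; the numbers are invented to show a skewed distribution.
citable_item_citations = [0, 1, 1, 2, 2, 3, 4, 5, 12, 70]

# Hypothetical citations received by editorials, which are not citable items.
editorial_citations = 20


def jif_like_ratio(total_citations, n_citable_items):
    """All citations to the journal divided by the number of citable items."""
    return total_citations / n_citable_items


n_items = len(citable_item_citations)

# Mean citation rate of the citable items alone.
mean_rate = jif_like_ratio(sum(citable_item_citations), n_items)

# Citation count of the "median paper".
median_rate = median(citable_item_citations)

# Same denominator, but editorial citations are added to the numerator.
inflated_rate = jif_like_ratio(
    sum(citable_item_citations) + editorial_citations, n_items
)

print(mean_rate, median_rate, inflated_rate)
```

With these invented numbers the mean is 10.0, the median is 2.5, and the ratio including editorial citations is 12.0.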

Manipulations of the numerator/denominator numbers have been documented only occasionally and for only a few journals. Moreover, Clarivate Analytics monitors JIF data and suspends journals exhibiting “anomalous” patterns in their citation profiles (Anonymous, 2017). In general, JIF values are quite stable through time, and the traditional 2-year and the 5-year JIFs are quite similar.

The most intense criticism of the JIF is reserved for its use to characterize the research impact of individual papers and individual scientists (Seglen, 1989, 1992, 1994, 1997; DORA Declaration, 2012; Hicks et al., 2015; Wilsdon et al., 2015; Zhang et al., 2017). It is repeatedly emphasized that it is a great mistake to consider the JIF value as a proxy for citedness of individual papers in a journal. From this, it is argued that it is wrong to consider the JIF as a proxy for the influence of an author. Well, the JIF is certainly not a proxy for the citation score of a paper. But the skewness of the citation distribution is not the issue here (see Waltman and Traag, 2017). Even if the distribution were symmetrical or even normal, the citation scores of individual papers would differ from the JIF value simply because of the intrajournal variation of paper citedness: about half of the papers would be more cited than would be expected from the JIF value and the other half would be less cited. For instance, the number of citations for the document type “articles” (regular research reports) published in Nature in 2010 (by November 2017) ranges from 18 to 3,240, the first and third quartiles being 96 and 326 [data from Web of Science (WoS)]. Thus, the citation frequency of a paper may be, and mostly is, quite different from the average citedness.

However, the fact that a paper is published in a high-impact journal offers some evidence of its putative quality and potential importance. A similar opinion is expressed by Kurtz and Henneken (2017). Take the case of a relatively poorly cited paper appearing in Science or Nature. Such a paper was evaluated by at least two reviewers, who are specialists, in a process overseen by a knowledgeable editor. If the paper was approved for publication, it was judged for quality and importance by at least three informed, critical persons. For these high-impact journals, the ratio of submitted to accepted manuscripts is very high. This results in strong competition among the submitted papers, which allows these journals to impose stricter requirements on the scientific merit of manuscripts and the quality of their presentation. The sieve of strict reviewing and editing by high-impact journals aims to ensure that the papers published are of very high quality. Few will doubt that selection of a paper for publication in Nature or Science is prima facie evidence of quality regardless of its eventual actual citedness.

Garfield was ambivalent about using the JIF as an indicator of the impact of individual papers or authors. He wrote: “It would be more relevant to use the actual impact (citation frequency) of individual papers in evaluating the work of individual scientists rather than using the journal impact factor as a surrogate. The latter practice is fraught with difficulties, as Seglen and others have pointed out” (Garfield, 2001). Privately he told me that JIF values of papers published by an author may, however, be representative of an author’s professional standing. This indicator should not be the only measure considered, but rather it should be combined with other bibliometric indices (total number of citations, Hirsch index, etc.), and these always in combination with the expert opinion of peers. The JIF may be especially useful in consideration of recent publications, which have not yet had much time to accumulate citations (Abramo et al., 2010). Bornmann and I expressed this opinion (Bornmann and Pudovkin, 2017), and this was fully supported by Garfield, who reviewed our manuscript.

I would like to consider the well-known papers by Seglen (1989, 1992, 1994, 1997) and a recent review of Seglen’s conclusions using a much larger data set (Zhang et al., 2017). The conclusion of the recent paper is “None of our findings are contrary to the understanding that JIFs should not be used as performance measures of individual researchers and their publications.” However, I think that this conclusion is not sufficiently supported by the actual data presented. In Figure 13 in the paper by Seglen (1994), reproduced as Figure 1 in the paper by Zhang et al., one can see that for highly cited authors, there is a strong correlation between the mean article citedness and the impact factor of the journals in which the articles were published. For less cited authors, there is a weaker correlation in the lower range of the JIF (0.5–4.0). For higher values of the JIF (4.5–8.0), the citedness does not grow with an increase of the JIF. A similar though more complex pattern is presented in the study by Zhang et al., in which the authors interpret the data as evidence of poor correlation between paper citedness and the JIF.

I suggest another interpretation. In his 1994 paper, Seglen studied the performance of 16 researchers from biomedical fields (a very heterogeneous category). Some subfields of biomedicine, like molecular genetics or immunology, are characterized by high citation intensity or density. In other subfields, like general biology or taxonomy, citation intensity is much lower. For highly cited authors, who probably work in the subfields of high citation intensity, the strong correlation mentioned above holds between paper citedness and the JIF (although it is not quite linear). The less cited authors probably worked in fields exhibiting lower citation density. Thus, their most cited papers, even when published in high-impact journals, are cited less frequently, at a rate more characteristic of their specialty subfield. Hence, their citedness reaches a plateau, clearly seen in the paper by Seglen (1994). The same interpretation may be valid for the recent paper revisiting his conclusions (Zhang et al., 2017). The Spearman correlation between paper citedness and the JIF for the pooled data in the latter paper is 0.55, which is not negligibly small.

As stated above, the main argument against the use of the JIF for characterization of individual papers (and scientists) is the lack of a strong correlation between the JIF of a journal and the citedness of the individual papers published in it. The authors expressing this view implicitly believe that the citedness of a paper is a measure of its scientific merit. However, many would argue that the best available criterion of the merit of a research paper is the opinion of peers. In this context, it is interesting to note that there is no strong correlation between citedness of papers and their appraisal by peers: the correlation is quite modest, 0.45 (Bornmann and Leydesdorff, 2013), not much different from the correlation between citedness and the JIF just mentioned (Seglen, 1994; Zhang et al., 2017).

I argue that the impact factor of the journal in which a paper was published can be used as an indication of the paper’s scientific merit, along with other bibliometric indicators, raw or normalized. The prognostic value of normalized JIFs in differentiating poor and high performers is shown by Bornmann and Williams (2017). The importance of normalization of JIFs was specifically discussed in the study by Pudovkin and Garfield (2004). For instance, the median JIF in the WoS specialty category “Agronomy” is only one quarter of that in “Biochemistry & Molecular Biology.” That is why Garfield and I suggested rank normalization to compare journals across subject categories. Percentile normalization of JIFs was implemented in the JCR in 2015. Thus, when considering scientists of different specialties, one should use both the original (“raw”) JIF and the normalized JIF percentile.
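To make the idea of rank normalization concrete, here is a minimal sketch. It assumes one simple form of the score, (K - R + 1) / K, where R is a journal's descending rank by JIF among the K journals of its category; this form is an assumption made for illustration, not a formula stated in this article, and it is not necessarily the percentile calculation used in the JCR. The two category lists are invented so that the median of the first is one quarter of the median of the second.

```python
def rank_normalized(jif, category_jifs):
    """Rank-normalized score of `jif` within its category.

    Assumed form: (K - R + 1) / K, with R the descending rank of `jif`
    among the K journals in `category_jifs` (ties share the best rank).
    """
    k = len(category_jifs)
    # Descending rank: 1 plus the number of journals with a strictly higher JIF.
    rank = 1 + sum(1 for other in category_jifs if other > jif)
    return (k - rank + 1) / k


# Hypothetical categories (invented values, four journals each).
low_density_category = [0.5, 1.0, 1.5, 2.0]    # median 1.25
high_density_category = [2.0, 4.0, 6.0, 8.0]   # median 5.0

# The same raw JIF of 2.0 is the top of one category and the bottom of the other.
top_in_low = rank_normalized(2.0, low_density_category)
bottom_in_high = rank_normalized(2.0, high_density_category)

print(top_in_low, bottom_in_high)
```

Under these assumptions the same raw value of 2.0 scores 1.0 in the low-density category and 0.25 in the high-density one, which is why the raw and the normalized figures are best read together.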

My own 30-year experience as a member of committees for recruitment of new research staff and for regular reviews at the Institute of Marine Biology, Russian Academy of Sciences is relevant. I know that committee members usually only look through the list of publications and rarely even read the titles of papers; however, they do pay attention to the journals in which the papers of the applicant were published. Thus, the judgment of the committee members is based mostly on the journal titles. The impact factor of the journals (“raw” and percentile) would be much more informative. I believe such a situation is quite common in developing countries such as China and India, and throughout Latin America and Eastern Europe. The mere fact that a paper is published in a journal covered by WoS or Scopus is considered in some countries substantial evidence of the quality of the paper, regardless of the impact factor of the journal or the actual citedness of the paper.

I believe that the main role of bibliometric indicators is in screening out poor performers rather than identifying and discriminating among “champions.” Moed (2017) has recently expressed a similar view in recommending the use of citation data for determining “minimum performance standards.” To gauge differences in excellence using the indicators is unreasonable for many reasons. I will list only a few:

(1) Number of authors on a paper. A team of authors can produce more papers than a single author, so a member of a research team may easily co-author many papers a year, while a single researcher will produce fewer. Using fractional credit with the weight of 1/n (n being the number of authors) seems a poor solution: it makes participation in multiauthor research “bibliometrically” less rewarding. Possibly a better weight would be 1/sqrt(n). In any case, explicit statements of each author’s contributions would be required to address author credit for papers and citations in a precise way (Sauermann and Haeussler, 2017).

(2) Difference in citation density in various fields and subfields. A key problem is the delineation of fields and subfields to obtain valid baseline measures to assess relative performance (Waltman and van Eck, 2013; Ruiz-Castillo and Waltman, 2015; Ioannidis et al., 2016).

(3) Citations differ in importance. The weight of a citation mentioning one paper among many others in a literature review is not large. However, if a paper is specifically cited as a reason for performing the research presented, then the weight of the citation is significant. Citation context, including analysis of sentiment in the text of the citing passage (a “citance”), has long been desired but only recently explored, thanks to greater availability of full text in a digital form (Moravcsik and Murugesan, 1975; Ding et al., 2014).
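Returning to point (1), the two weighting schemes can be compared directly. The sketch below is only an illustration: the function name `credit_per_author` and its `scheme` argument are hypothetical, and equal credit for every co-author is assumed, which is precisely the simplification that explicit contribution statements would remove.

```python
import math


def credit_per_author(n_authors, scheme="fractional"):
    """Credit assigned to each of `n_authors` equally credited co-authors.

    scheme="fractional" gives 1/n; scheme="sqrt" gives 1/sqrt(n).
    """
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    if scheme == "fractional":
        return 1 / n_authors
    if scheme == "sqrt":
        return 1 / math.sqrt(n_authors)
    raise ValueError(f"unknown scheme: {scheme}")


# Compare the two weights for a few team sizes.
for n in (1, 4, 16):
    print(n, credit_per_author(n, "fractional"), credit_per_author(n, "sqrt"))
```

For a four-author paper each author receives 0.25 under 1/n but 0.5 under 1/sqrt(n). Note that under the square-root weight the credits of all n authors sum to sqrt(n) rather than 1, so a multiauthor paper carries more total credit than a single-author one; whether that is desirable is a policy choice, not something the arithmetic settles.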

So, gauging the differences in performance among highly productive authors is difficult because of the imperfections and limitations of citation-based indicators, including JIFs. However, identifying poor performers is a relatively simple task: they do not publish in high-impact journals, and their papers are poorly (if at all) cited.

Bibliometric theoreticians seem to operate very far from the actual problems of those managing research institutions and conducting research assessments. Simple appeals not to use the JIF in the evaluation of individual scientists are ignored by those managing the merit rating procedures. JIFs are used as proxy measures worldwide. What is the alternative? It is as Garfield said: to use instead the actual citedness of authors and their papers. That is reasonable, even to be encouraged, but it is much more time consuming. Besides, as I argue, citedness is not the ultimate evidence of the quality of a paper (Bornmann and Leydesdorff, 2013), especially for recent publications. To base a merit rating on the opinion of peer reviewers would be preferable, but it is not practical for regular assessment procedures since, generally, not enough peers with the necessary expertise are available when needed and there is usually not enough time to devote to evaluation. Summing up my opinion, it is reasonable and justified to use JIFs for merit rating of individual scientists; this is an easy and quick procedure. For a thorough analysis of individual performance, it would be preferable to use a set of bibliometric indices (total citation number, Hirsch index, JIFs) and peers’ reviews.

Author Contributions

The author confirms being the sole contributor of this work and approved it for publication.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The author appreciates the help and valuable advice of Mr. David Pendlebury (Clarivate Analytics).

References

Abramo, G., D’Angelo, C. A., and Di Costa, F. (2010). Citations versus journal impact factor as proxy of quality: could the latter ever be preferable? Scientometrics 84, 821–833. doi:10.1007/s11192-010-0200-1