Eugene Garfield’s Influences upon the Future of Evaluative Informetrics
- Sapienza University of Rome, Rome, Italy
This contribution highlights Eugene Garfield’s influences upon the author’s views and proposals. In addition, it presents the author’s perspective on the future of evaluative informetrics, based on his monograph Applied Evaluative Informetrics, published in September 2017. It discusses the main criticisms of current practices in the use of informetric indicators in research assessment, and sketches a series of alternative approaches.
Eugene Garfield has had an enormous influence on library and information science and on quantitative studies of science and technology, especially on the development of indexing systems for the scholarly literature, on the study of scholarly communication and the reward system of science, and on a subject as sensitive as the assessment of scholarly performance.
I worked for many years with director Anthony van Raan and other colleagues at the Centre for Science and Technology Studies (CWTS) at Leiden University. Hence, in this contribution I focus on the science studies and research performance part of the wide spectrum of academic disciplines Eugene so strongly influenced. If there had been no Science Citation Index from the early 1960s, and if Eugene had not enabled us at CWTS to further explore its potential in research assessment, I would not have become active in the field, and CWTS might not have been established, at least not in the form in which it was founded in the 1980s.
As a personal note, I wish to add that the volumes of Eugene’s Current Comments essays published in Current Contents were on my bedside table from the day I became active in the field. All these essays are currently available via the website http://www.garfield.library.upenn.edu/. Although I carefully read many of them and used some of his ideas in our research and development activities at CWTS, I always had, and still have today, the impression that many of his ideas have not yet been sufficiently explored. I also had the great opportunity of meeting Eugene in person several times and of collaborating with him directly in the preparation of a joint research article; moreover, he commented on draft versions of each chapter of my monograph Citation Analysis in Research Evaluation, published in 2005.
In the current contribution, I will highlight in Section “How Eugene’s Activities Influenced R&D at CWTS” a series of concrete influences of Eugene’s achievements—both ideas and information products—upon the work I conducted with other colleagues at CWTS. Next, I will discuss my perspective on the future of what I termed Applied Evaluative Informetrics, by presenting the main lines of a second monograph I published in September 2017, and highlighting Eugene’s influences upon my views and proposals.
This book discusses the pros and cons of using bibliometric or informetric indicators in the assessment of research performance. It highlights the basic assumptions underlying this approach, discusses major criticisms of current application practices, and sketches alternative practices that exploit the potential of informetric indicators while taking these criticisms into account. Section “Main Criticisms against the Use of Informetric Indicators in Research Assessment” discusses my views on the main criticisms of the use of informetric indicators in research assessment, while Section “Alternative Approaches to the Assessment of Academic Research” presents a series of alternative approaches to the assessment of academic research. Section “Evaluative Frameworks” discusses in more detail one of the key concepts used in the book, namely the evaluative framework.
Research performance is conceived as a multi-dimensional concept, and the book deals not only with classical indicators based on publication and citation counts, but also with new generations of indicators, denoted by terms such as altmetrics, webometrics, and usage-based metrics, and derived from multiple multidisciplinary citation indexes, electronic full-text databases, information systems’ user log files, social media platforms, and other sources. These sources are manifestations of the computerization of the research process and the digitization of scientific-scholarly communication. This is why the book uses the term informetrics rather than bibliometrics to indicate its subject.
How Eugene’s Activities Influenced R&D at CWTS
i. Eugene was always very interested in science mapping. As Sandy Grimwade argues in his contribution to this volume (Grimwade, 2018), Eugene was a promoter of the Atlas of Science project at the Institute for Scientific Information (ISI), and he gave it his strong support during his presidency at ISI. Like many other groups in the field of science studies, CWTS was most interested in the groundbreaking work on co-citation analysis by Small (1973) and by White and Griffith (1981) in the 1970s and 1980s. Moreover, in 1983 a group of distinguished colleagues from France launched co-word analysis, a clustering technique structurally similar to co-citation analysis but based on keywords from titles, abstracts, or full texts rather than on citations (Callon et al., 1983). We hypothesized that an intelligent combination of these two techniques could overcome a known limitation: authors who work on similar topics may not be well aware of each other’s work and do not cite each other, and therefore remain invisible in a purely citation-based map. This resulted in a series of research papers by Robert Braam, Anthony van Raan, Harry Peters, and myself (see, for instance, Braam et al., 1991), which are still among the most cited papers from CWTS.
ii. Eugene’s papers contained many descriptions of how citation analysis was used to monitor and improve the citation indexes developed at ISI (e.g., Garfield, 1972, 1979). Just as quantitative studies of the scientific literature were used to study science itself, Eugene used the data he collected to study his own products, especially the SCI, with a view to ensuring that he was indexing the internationally influential journals, those with outsized impact. When we had created a so-called bibliometric version of the SCI, SSCI, and A&HCI at CWTS, we were able to present a broader overview by also including the social sciences and humanities, and at the same time to show more detail within the sciences by conducting analyses by research subdiscipline. We found that especially in the social sciences and the humanities, where books are a more important medium of publication than in the sciences and where fields appear more fragmented, the joint ISI indexes had a lower rate of coverage than in the natural and life sciences. Detailed results were published in Moed (2005, Ch. 7).
iii. In discussions about ISI coverage, Eugene firmly criticized the claim that the ISI indexes (Web of Science) do not cover books. “The ISI indexes contain millions of citations to books,” he said. I agree with Eugene that citations to books from journals are a rich resource for the retrieval and analysis of books, but these data are not well known and deserve more attention. Currently, books are being added as sources to Web of Science. In addition to adding individual monographs and book series as new sources to a citation index, one could consider putting more effort into exploiting the “millions of citations to books” already available. This observation opens up a perspective toward enriching non-covered cited references so as to enhance their utility in literature retrieval and, possibly, in research assessment as well.
iv. In Eugene’s 1979 book Citation Indexing (Garfield, 1979), he emphasized caution in the use of absolute citation counts, noting that different fields exhibited distinctly different average rates of citation. He described this as the citation potential of a field, determined in large part by the average number of references in papers of a particular field. This was the original insight that led us to develop a new journal impact measure called SNIP (Source Normalized Impact per Paper), which complements the traditional journal impact factor as a measure of journal influence (Moed, 2010; Waltman et al., 2013). SNIP uses the literature citing a journal to characterize the journal’s citation potential. Eugene’s concept of citation potential inspired many other so-called citing-side normalization methods, including those of Zitt and Small (2008) and the recently introduced relative citation ratio of the NIH (Hutchins et al., 2016).
v. In several essays, Eugene underlined the limitations of using journal impact factors to measure the performance of individual researchers, and indicated alternative approaches to the assessment of individuals (see, for instance, Garfield, 1996). In my 2017 monograph (Moed, 2017), I defend the position that there is an increasing demand for researcher self-assessment using bibliometric data. Authors need sound bibliometric applications to check the indicator data calculated about themselves, decompose the indicators’ values, learn more about informetric indicators, and defend themselves against inaccurate calculation or invalid interpretation of indicators. I believe that the challenge is to make optimal use of the potential of current information and communication technologies and to create an online application incorporating Eugene’s proposals for the evaluation of faculty (Garfield, 1983a,b). It could also include Robert K. Merton’s notions about the formation of a “reference group,” i.e., “the group with which individuals compare themselves, but to which they do not necessarily belong but aspire to” (Holton, 2004).
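Item (i) combines two co-occurrence techniques. The minimal sketch below, using a hypothetical three-paper corpus (all paper IDs, reference IDs, and keywords are invented for illustration), counts co-citation pairs (two references cited by the same paper) and co-word pairs (two keywords used in the same paper). It shows how a paper whose only reference is never co-cited with any other (P3 citing R4) can still be linked to other papers through shared keywords. It illustrates the general idea only; it is not a reproduction of the combined procedure of Braam et al. (1991).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each citing paper has cited references and keywords.
papers = {
    "P1": {"refs": {"R1", "R2", "R3"}, "words": {"citation", "mapping"}},
    "P2": {"refs": {"R1", "R2"}, "words": {"citation", "clustering"}},
    "P3": {"refs": {"R4"}, "words": {"clustering", "mapping"}},
}


def cooccurrence(item_sets):
    """Count how often each unordered pair of items appears together in one set."""
    counts = Counter()
    for items in item_sets:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


# Co-citation: two cited references are linked each time a paper cites both.
cocitation = cooccurrence(p["refs"] for p in papers.values())
# Co-word: two keywords are linked each time a paper uses both.
coword = cooccurrence(p["words"] for p in papers.values())

# References that never co-occur with another reference stay unlinked
# in a purely citation-based map.
all_refs = set().union(*(p["refs"] for p in papers.values()))
isolated_refs = all_refs - {ref for pair in cocitation for ref in pair}


def papers_sharing_words(paper_id):
    """Papers sharing at least one keyword with `paper_id` (the word-based link)."""
    words = papers[paper_id]["words"]
    return {other for other, p in papers.items() if other != paper_id and words & p["words"]}


print(isolated_refs)               # {'R4'}
print(papers_sharing_words("P3"))  # contains P1 and P2
```

In a real map, both kinds of counts would typically be normalized and clustered; the point here is only that the word layer supplies links the citation layer lacks.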
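Item (ii) refers to measuring how well the citation indexes cover a field. One simple way to express this, assumed here for illustration rather than presented as the exact procedure of Moed (2005), is the share of cited references in indexed papers that point to items the index itself covers. A low share suggests that a field relies heavily on non-indexed sources such as books. All data below are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample of cited references extracted from papers in the index.
# Each entry: (field of the citing paper, whether the cited item is itself indexed).
cited_references = [
    ("chemistry", True), ("chemistry", True), ("chemistry", True), ("chemistry", False),
    ("history", True), ("history", False), ("history", False), ("history", False),
]


def coverage_by_field(references):
    """Share of each field's cited references that point to items covered by the index."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for field, is_indexed in references:
        totals[field] += 1
        covered[field] += int(is_indexed)
    return {field: covered[field] / totals[field] for field in totals}


print(coverage_by_field(cited_references))  # {'chemistry': 0.75, 'history': 0.25}
```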
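Item (iv) rests on the idea that a journal’s citation rate should be read against the citation potential of its field, measured on the citing side. The following is a deliberately simplified sketch, not the published SNIP formula (Moed, 2010; Waltman et al., 2013), which defines the citation window, the counted references, and the database reference value more precisely. Here citation potential is assumed to be the mean number of references in the papers citing a journal, and the database-wide value is a hypothetical constant.

```python
from statistics import mean


def raw_impact_per_paper(citations: int, papers: int) -> float:
    """Citations received by a journal's papers divided by the number of those papers."""
    return citations / papers


def citation_potential(citing_reference_counts: list[int]) -> float:
    """Simplified citation potential: mean reference-list length of the citing papers."""
    return mean(citing_reference_counts)


def citing_side_normalized_impact(
    citations: int,
    papers: int,
    citing_reference_counts: list[int],
    database_potential: float,
) -> float:
    """Divide raw impact by the journal's citation potential relative to the database."""
    relative_potential = citation_potential(citing_reference_counts) / database_potential
    return raw_impact_per_paper(citations, papers) / relative_potential


# Hypothetical journals: one in a low-referencing field, one in a high-referencing field.
DATABASE_POTENTIAL = 24.0
maths = citing_side_normalized_impact(100, 100, [10, 14, 12], DATABASE_POTENTIAL)
biology = citing_side_normalized_impact(300, 100, [40, 32, 36], DATABASE_POTENTIAL)
print(maths, biology)  # 2.0 2.0
```

In this toy example, the biology journal has three times the raw impact per paper of the mathematics journal, but its citing papers also carry three times as many references, so both journals end up with the same normalized value.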
Main Criticisms Against the Use of Informetric Indicators in Research Assessment
I wish to express the following views, which are partly supportive and partly a counter-critique toward the criticisms of current practices in the use of research performance indicators.
• Calculating indicators at the level of an individual researcher and claiming that they by themselves measure that individual’s performance suggests false precision. In my view, a strong bibliometric argument supports this claim: research tends to be teamwork, and more and more research articles are multi-authored. Hence, a bibliometric indicator based on authorship has only limited value in assessing the contribution of an individual to a collective piece of work. A valid and fair assessment of individual research performance should be based on sufficient background knowledge of the particular role researchers played in the research presented in their publications, and should also take into account other types of information on their performance.
• Societal value cannot be assessed in a politically neutral manner. Establishing the criteria for assessing societal value is not a matter in which scientific experts, including informetricians, have a privileged status; it should ultimately take place in the policy domain. One possible option is to move away from the objective of evaluating an activity’s societal value, toward measuring, in a neutral manner, researchers’ orientation toward any articulated, lawful need in society.
• Studies on changes in editorial and author practices under the influence of assessment exercises are highly relevant and illuminating. But the issue at stake is not whether scholars’ practices change under the influence of informetric indicators, but rather whether or not the application of such measures enhances research performance. I am fully aware that this is difficult to assess; simply using the same bibliometric indicators to evaluate the outcomes of the process as those used in the assessment itself would easily lead to circular arguments and magical thinking. It must also be noted that there are indeed clear traces of mere indicator manipulation with no positive effect on performance at all, especially related to journal impact factors (see, for instance, Reedijk and Moed, 2008 for typical examples). Eugene was aware of this, and my understanding is that he fully supported a policy for the Web of Science to monitor such behavior and to penalize publishers believed to be guilty of manipulation.
• A typical example of a constitutive effect is that research quality is increasingly conceived as what citations measure. I fully agree that more empirical research on the size of constitutive effects is needed, although one should realize that one cannot look “inside the heads” of those who are actually using indicators. If there is a genuine constitutive effect of informetric indicators in quality assessment, one should not direct the critique of current assessment practices merely at informetric indicators as such, but rather at any claim of absolute status for a particular way of assessing research quality, regardless of whether such status is assigned to peer review or to indicator-based approaches.
• If the role of informetric indicators has become too dominant, it does not follow that the notion of intelligently combining peer judgments and indicators is fundamentally flawed and that indicators should be banned from the assessment arena. But it does show that the combination of the two methodologies has to be organized in a more balanced manner. A proper exchange of information between informetricians as producers of indicators, and evaluators and policy officials as users, is a prerequisite for, and at the same time the Achilles heel of, the successful application of informetric methods in research assessment.
• It is crucial that informetricians maintain their neutrality with respect to evaluative criteria or political values. The informetric component and the domain of evaluative or political values in an assessment can be disentangled by distinguishing between quantitative-empirical informetric evidence on the one hand, and an evaluative framework based on normative views on what constitutes research performance and which policy objectives should be achieved, on the other. The informetric domain is thus separate and deals especially with application: how informetric tools are used in practice, their benefits, and the problems related to their use. In large part, the issues are technical, involving analytics and data collection.
• In the proper use of informetric tools, an evaluative framework and an assessment model are indispensable. To the extent that in a practical application an evaluative framework is absent or implicit, there is a vacuum that may easily be filled either with ad hoc arguments of evaluators and policy makers, or with unreflected assumptions underlying informetric tools. Perhaps the role of such ad hoc arguments and assumptions has nowadays become too dominant. It can be reduced only if evaluative frameworks become stronger and more actively determine which tools are to be used, and how. The notion of evaluative framework is further discussed in Section “Evaluative Frameworks.”
Alternative Approaches to the Assessment of Academic Research
I propose the following alternative approaches to the assessment of academic research.
• A key assumption in the assessment of academic research has been that it is not the potential influence or importance of research, but the actual influence or impact that is of primary interest to policy makers and evaluators. In my book, I address the question whether this is a valid assumption. Should it indeed be impact that is of primary interest, especially in academic research? I argue that an academic assessment policy is conceivable that rejects this assumption. It embodies a shift in focus from the measurement of performance itself to the assessment of preconditions for performance. It acknowledges that research quality or contribution to scholarly progress itself cannot be measured, but that factors can be identified that may or may not be favorable for achieving quality and progress, and that make performance more or less likely.
• Rather than serving as an indicator of research importance or quality, citations could provide a tool for assessing communication effectiveness, expressing the extent to which researchers bring their work to the attention of a broad, potentially interested audience. This extent can in principle be measured with informetric tools. This view discourages the use of citation data as a principal indicator of importance or research quality. In this way, the meaning of citation-based indicators is founded primarily in the domains of scholarly communication and information dissemination, the two domains in which Eugene built up his information systems.
• The functions of publications and other forms of scientific-scholarly output, as well as their target audiences, should be taken into account more explicitly than they have been in the past. Scientific-scholarly journals could be systematically categorized according to their function and target audience, and separate indicators could be calculated for each category. Indicators of the internationality of communication sources can be calculated that are more sophisticated than the journal impact factor and its variants. I believe that the potential of bibliometric data for measuring performance-relevant characteristics of journals and other types of scholarly sources, a potential so well recognized and explored by Eugene, is far from being fully exploited. There are opportunities here for informetricians.
• One possible approach to the use of informetric indicators in research assessment is a systematic exploration of indicators as tools to set minimum performance standards. When baseline indicators are used, researchers will most probably change their research practices as they are stimulated to meet the standards; but if the standards are appropriate and fair, this behavior will actually increase their performance and that of their institutions. These minimum standards relate to the above-mentioned preconditions for performance rather than to performance itself. This perspective focuses on the lower end of the quality distribution. It is clear to me that articulating such minimum standards requires a great deal of debate, both within the scholarly community and between the research and policy domains, but in any case I see opportunities here for informetricians to facilitate this debate, using the creativity Eugene showed in his bibliometric activities.
• At the upper end of the quality distribution, it is perhaps feasible to distinguish entities that are “hors catégorie,” or “at Nobel Prize level.” Assessment processes focusing on the very top of the quality distribution could further operationalize the criteria for this qualification. Eugene himself is well known for making the point that at extremely high citation frequencies there is a strong correlation with the possibility, or even probability, of winning a Nobel Prize, but that this relationship is certainly not causal (Garfield, 1990). I fully agree with Eugene. The point I want to make is that I would not recommend an approach in which assessors in the upper segment of the quality distribution seek to discriminate between “top” research and “good-but-not-top” research merely for the sake of discrimination, without firm justification.
• Realistically speaking, rankings of world universities are here to stay. Academic institutions could, individually or collectively, seek to influence the various systems by formally sending to their creators a request to consider the implementation of a series of new features: more advanced analytical tools; more insight into how the methodological decisions influence rankings; and more information in the system about additional, relevant factors, such as teaching course language.
• In response to major criticisms of current national research assessment exercises and performance-based funding formulas, an alternative model would require less effort, be more transparent, stimulate new research lines, and to some extent reduce the Matthew Effect, according to which “the rich get richer and the poor get poorer,” a concept introduced by Robert K. Merton and Harriet Zuckerman (Merton, 1988). The basic unit of assessment in such an alternative model is the emerging research group rather than the individual researcher. Institutions submit emerging groups and their research programs, which are assessed in a combined peer review-based and informetric approach, applying minimum performance criteria. A funding formula is partly based on an institution’s number of acknowledged emerging groups. I fully realize that such a model cannot be easily implemented across all countries. Whether or not the model is feasible depends, among other factors, upon the policy context and the overall state of the research infrastructure in a country. Its presentation in this contribution and in my monograph aims to illustrate that alternative approaches are at least thinkable and perhaps, under certain conditions, practically feasible.
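The minimum-standards approach in the list above can be sketched as a screening step: each unit is compared with a set of baseline values, only shortfalls are reported, and units that pass are not ranked against each other. The indicator names, thresholds, and groups below are hypothetical; in the approach described here, such standards would relate to preconditions for performance and would be agreed through debate between the research and policy domains.

```python
# Hypothetical baseline standards on precondition-type indicators; names and
# thresholds are placeholders, to be agreed through debate, not fixed here.
MINIMUM_STANDARDS = {
    "peer_reviewed_outputs_per_year": 1.0,
    "share_in_international_journals": 0.3,
}


def below_minimum(units, standards=MINIMUM_STANDARDS):
    """Map each unit to the indicators on which it falls below the baseline.

    A missing indicator value is treated as not meeting the standard.
    Units meeting every standard are omitted (no ranking among them).
    """
    flagged = {}
    for unit, values in units.items():
        misses = [name for name, floor in standards.items() if values.get(name, 0.0) < floor]
        if misses:
            flagged[unit] = misses
    return flagged


groups = {
    "Group A": {"peer_reviewed_outputs_per_year": 2.5, "share_in_international_journals": 0.6},
    "Group B": {"peer_reviewed_outputs_per_year": 0.5, "share_in_international_journals": 0.4},
    "Group C": {"peer_reviewed_outputs_per_year": 1.2},
}
print(below_minimum(groups))
# {'Group B': ['peer_reviewed_outputs_per_year'], 'Group C': ['share_in_international_journals']}
```

Given the emphasis on combining peer review and informetrics, the flagged list is best read as input for a peer committee rather than as a verdict.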
Evaluative Frameworks
The notion of an evaluative framework can be further developed at two distinct analytical levels. The first is a scientific-scholarly foundation of an assessment approach and the informetric tools employed therein. As far as citation analysis is concerned, such a foundation can be said to be embodied in a “citation theory.” My new book does not focus on this subject. There is a vast literature on citation and other indicator theories. Harriet Zuckerman’s contribution to Eugene Garfield’s Memorial Event included in this Research Topic provides an excellent introduction (Zuckerman, under review)¹.
At the second level, the term evaluative framework, as used in my new book, refers to a specification of the qualitative principles and considerations that provide guidance to a concrete assessment process. A core element in an evaluative framework for the assessment of research performance is the specification of a performance criterion, in a set of propositions on what constitutes research quality or performance. From such propositions follow the indicators that should be used and, in a next logical step, the data sources from which these are to be calculated.
To develop such a framework, the book proposes to study the various approaches to the assessment of “performance” or “quality” in other research disciplines, namely business studies measuring business performance, educational research assessing both student and teacher performance, psychological research measuring human performance, and even technical domains assessing technological performance. A core question would be: what can practitioners in the domain of research performance assessment learn from the debates and solutions explored in these other fields?
An author who claims that informetrics itself does not evaluate, and that actual assessments should be guided by an essentially extra-informetric evaluative framework, should be cautious in expressing his views on what such a framework should look like. Doing so risks directing attention too strongly toward his personal views rather than toward the claim that such a framework is needed as such, and may even lead readers to confuse the principle with one particular realization of it.
But in order to stimulate the debate on evaluative frameworks in research assessment, it could be useful to give a few typical examples of possible elements in an evaluative framework, and an opinion piece is perhaps the right place to do so. Below I give three examples relating to three distinct application contexts. The choice of indicators, as well as the underlying performance criterion, depends strongly on the context: what is the unit of assessment; which quality dimension is to be assessed; what is the objective of the process; and what are the relevant general, or “systemic,” characteristics of the units of assessment?
(i) In an assessment process aiming to select the best candidate for a tenured position from a set of early career scientists, important criteria for me would be integrity, impartiality, creativity, open-mindedness, and the capability to reason at distinct analytical levels. These criteria form core elements of an evaluative framework to be used in this assessment. None of them can be assessed with bibliometric indicators; they require an in-depth interview, possibly informed by interview techniques. Evidently, in assessing professional competence, the ability to write and to present orally would be important factors too. But bibliometric indicators such as publication counts and journal impact factors tend to be of little use in assessing such aspects. Making a solid contribution to a paper in a good specialist journal would be more significant than a co-authorship of a multi-team paper published in a high-impact-factor outlet.
(ii) A national research assessment exercise of a large number of research groups in a particular science field could focus on the bottom rather than the top of the performance distribution, and identify activities in groups or subfields below a certain minimum level. This focus would set the evaluative framework. Peer review and bibliometrics could be combined by providing in an initial phase a core peer review committee with a bibliometric study presenting a condensed overview of all groups, and by using this information to select additional committee members who are experts in subfields about which the bibliometric study raises questions. In a later phase, given the committee’s need to focus its attention for practical reasons, the outcomes of the study could be used to select groups to be interviewed in on-site visits.
(iii) In some countries, science policies at a national level aim to stimulate domestic researchers to integrate into international networks, and to expose their work to the critical judgment of an international peer group by submitting papers to international, peer-reviewed journals applying high-quality standards. This policy objective sets the evaluative framework and specifies its evaluative criteria. Although such an objective would not make much sense in the UK, and although I am aware that several colleagues in my field disagree with me, I find the use of bibliometric/informetric indicators to operationalize the evaluative criteria mentioned in the statement above defensible in principle. But I do have serious doubts about the validity of the currently used indicators, based on publication counts and journal impact measures derived from the currently available multidisciplinary databases, for measuring such aspects.
The practical realization of these proposals requires a large amount of informetric research and development. The book proposes several new directions for indicator development. They constitute important elements of a wider R&D program of applied evaluative informetrics. The further exploration of measures of communication effectiveness, minimum performance standards, new functionalities in research information systems, and tools to facilitate alternative funding formulas should be conducted in close collaboration between informetricians and external stakeholders, each with their own domain of expertise and responsibilities.
The use of a well-documented and validated informetric method in an assessment process enables an evaluator to achieve a certain degree of standardization in the process, and to compare units of assessment against an independent yardstick. These characteristics are sometimes indicated with the term “objective.” Use of such a method reduces the risk that the outcomes of an assessment are biased in favor of particular external interests. This is one of the most positive features and justifications of the use of informetric or bibliometric indicators. Eugene was very well aware of this. But since he introduced the Journal Impact Factor as an “objective” tool to expand the journal coverage of his citation index independently of journal publishers, the landscape of scientific information providers and users has changed significantly.
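For reference, the Journal Impact Factor mentioned above is, in its standard two-year form, the number of citations received in a given year by items a journal published in the two preceding years, divided by the number of citable items it published in those two years. A minimal sketch with hypothetical counts:

```python
def journal_impact_factor(cites_received, citable_items, year):
    """Two-year impact factor for `year`.

    cites_received: citations received during `year`, keyed by the
        publication year of the cited items.
    citable_items: number of citable items published, keyed by publication year.
    """
    window = (year - 1, year - 2)
    cites = sum(cites_received.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items


# Hypothetical journal: citations received in 2017, by publication year of cited items.
jif_2017 = journal_impact_factor({2014: 40, 2015: 120, 2016: 80}, {2015: 50, 2016: 50}, 2017)
print(jif_2017)  # 2.0
```

The citations to 2014 items are ignored because they fall outside the two-year window. In the published definition, the numerator counts citations to all items while the denominator counts only citable items; the sketch simply takes both counts as inputs.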
I believe that there are reasons for concern with respect to the influence of the business interests of the information industry upon the development of indicators. While, on the one hand, politicians and research managers at various institutional levels need valid and reliable fit-for-purpose metrics in the assessment of publicly funded research, there is, on the other hand, a tendency for indicators to become tools in the business strategies of companies whose product portfolios may include underlying databases, social networking sites, or even indicator products. This may be true both for “classical” bibliometric indicators and for alternative metrics.
Sections “Main Criticisms against the Use of Informetric Indicators in Research Assessment” and “Alternative Approaches to the Assessment of Academic Research” of this contribution are largely based on the monograph Applied Evaluative Informetrics published by the author with Springer in September 2017, and these sections reuse selected paragraphs from this book. The author is most grateful to David Pendlebury for his kind willingness to present this contribution at the Eugene Garfield Memorial Event in Philadelphia on September 15, 2017. He also thanks David for his useful comments on an earlier version of this article.
The author confirms being the sole contributor of this work and approved it for publication.
Conflict of Interest Statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
¹ Zuckerman, H. (2018). The Sociology of Science and the Garfield Effect: Happy Accidents, Unpredictable Developments and Unexploited Potentials. (Under Review).
Braam, R. R., Moed, H. F., and van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis, I: structural aspects. J. Am. Soc. Inform. Sci. 42, 233–251. doi: 10.1002/(SICI)1097-4571(199105)42:4<233::AID-ASI1>3.0.CO;2-I
Callon, M., Courtial, J. P., Turner, W. A., and Bauin, S. (1983). From translations to problematic networks: an introduction to co-word analysis. Soc. Sci. Inform. 22, 191–235. doi: 10.1177/053901883022002003
Garfield, E. (1990). Forecasting the Nobel Prize Winners: Some Caveats Are in Order, Vol. 4. The Scientist. Also published in: Garfield, E. (1991). Essays of an Information Scientist: Science Reviews, Journalism, Inventiveness and Other Essays, Vol. 14. Philadelphia: ISI Press, 382–383.
Hutchins, B. I., Yuan, X., Anderson, J. M., and Santangelo, G. M. (2016). Relative citation ratio (RCR): a new metric that uses citation rates to measure influence at the article level. PLoS Biol. 14:e1002541. doi: 10.1371/journal.pbio.1002541
Keywords: citation analysis, research assessment, Eugene Garfield, co-citation analysis, journal impact factor, future perspective
Citation: Moed HF (2018) Eugene Garfield’s Influences upon the Future of Evaluative Informetrics. Front. Res. Metr. Anal. 3:5. doi: 10.3389/frma.2018.00005
Received: 13 December 2017; Accepted: 01 February 2018;
Published: 07 March 2018
Edited by: Staša Milojević, Indiana University Bloomington, United States
Reviewed by: Philipp Mayr, Leibniz Institut für Sozialwissenschaften (GESIS), Germany
Carlos Olmeda-Gómez, Universidad Carlos III de Madrid, Spain
Copyright: © 2018 Moed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Henk F. Moed, firstname.lastname@example.org