Toward the Discovery of Citation Cartels in Citation Networks

on how to approach the subject


INTRODUCTION
A citation network is a kind of social network that can be represented as a directed graph with nodes representing papers {P1, . . . , Pn} and edges e(Pi, Pj) between two nodes Pi and Pj denoting a citation relationship [1], where the paper Pi cites the paper Pj. This relationship is shown in the directed graph as an arrow from node Pi to node Pj. At present, the raw number of citations of scientific articles is becoming one of the most important measures of scientific impact and quality. Hence, some authors try to obtain as many citations as possible for their works by creating so-called citation cartels, whose members cite each other in order to increase their own citation counts. This phenomenon is amplified by academic pressure, which forces scientists to publish as many papers as possible. When they do not publish enough, they risk losing their jobs or sliding down the career ladder. In line with this, the concept "publish or perish" [2,3] has been introduced, which has further intensified scientific competitiveness. Moreover, many citation-based measures of scientific impact are coming into use [4][5][6][7][8]. By the same token, bibliographical databases [9], for example Web of Science, Scopus, and DBLP, are covering more and more data, which are later used for evaluations of researchers and their institutions.
In a nutshell, the modern tools for estimating the quality of research work force scientists to work hard in order to hold their positions. Additionally, every year many more researchers join the scientific community (e.g., scientists from China and India). With so many people in science, many new ways of increasing their visibility have been introduced. In particular, lowly ranked researchers are trying to obtain citations synthetically [10]. Thus, new ways of achieving more citations, some legitimate and others not, have been developed. Interdisciplinary research collaborations [11] and motivators for work [12] of course help in this endeavor, and rightfully so. But there are also people, sometimes collaborators [13], sometimes friends and colleagues, and sometimes third parties, who increasingly often cite each other inside citation networks to increase the impact of their papers by establishing citation cartels.
In this paper, we expose the problem of detecting citation cartels inside citation networks, visualize a randomly generated author citation network, and show how to discover a citation cartel inside citation networks.

THE CITATION RACE
These days, many people argue that scientific publishing has become a kind of race. Every researcher would like to be reputed and well known in the scientific world. Some years ago, researchers from the prominent institutions were the most famous in science, but with the arrival of the Internet knowledge started to spread around the world, and many more people have gained access to data and publications. Therefore, more and more people have decided to work in the scientific community. Consequently, the ranking of researchers has become more rigorous. As stated before, many metrics were developed for assessing researchers. On the other hand, the Matthew effect by Perc [14], asserting that people who have huge resources, a good citation pool, and connections are rewarded much more than people without these, has become much more visible recently. Additionally, social networks also help some researchers to promote their works [15,16].
It is worth mentioning that the citation race is much more visible nowadays than it was in the past. Researchers have found many ways to promote their papers and raise the number of citations of their articles. In his book "The Secret Race: Inside the Hidden World of the Tour de France", former professional road cyclist Hamilton [17] exposed the tricks that riders used to improve their results. Most attention was given to blood doping and the use of many medications. A very similar story was also depicted by Millar [18]. The question is whether scientific publishing has become a kind of doping as well. After analyzing a number of papers, it seems that there are many patterns and complex connections in citation networks. In the next sections, we outline our concept of citation cartels.

CITATION CARTELS
Citation networks describe relationships between researchers and papers connected by citation relationships. These networks are a useful means of analyzing hidden relationships, e.g., information about the citation behavior of researchers. In line with this, many questions can be asked, for instance, how authors select which publications to cite.
In the past, the citation process was fair. The intention to cite a paper followed the principle that "the right cites are only those obtained from unknown readers." In this case, readers recognize the author's work first, rather than the author. However, each cite acknowledges an author's good work while, at the same time, increasing the author's reputation. As a result, the fair citation game has become more and more unfair. Finally, almost all rules have been ignored in the struggle for citations.
Consequently, citation cartels have been established in order to widen the differences in the quality of scientists as measured by the number of cites. The concept of citation cartels was first exposed in 1999 in an essay by Franck [19], who defined this phenomenon as groups of Editors and Journals working together for mutual benefit. Specifically, this definition refers to Editors using inter-journal cites to increase the Impact Factors of their Journals. Recently, the notion of citation cartels has been extended to other relationships, such as Editor to authors or authors to authors.
In line with this, new levels of citation networks have emerged. These can be represented as multi-layer graphs [20] (Figure 1, left). For instance, the original paper citation network is represented as a directed graph with nodes denoting papers and edges describing the relation IsCitedThe(Pi, Pj), meaning that the paper Pi cites the paper Pj. The authors' collaboration networks, as well as the author citation networks, can be extracted from the original paper citation network. Interestingly, each edge in the author citation networks (Figure 1, right) also carries a weight denoting the number of cites that author Ai gives to author Aj, and vice versa. Finally, citation cartels are derived from the author citation networks, where the number of inter-citations between two nodes needs to exceed some threshold value in a clique of order 2. In this sense, the discovery of citation cartels in networks is much the same as the discovery of communities in networks [21,22], although with a twist in that members of citation cartels might do their best to stay undetected.
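The derivation of a weighted author citation network from a paper citation network can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the toy papers, authors, and citations are hypothetical, self-citations are ignored, and "inter-citations" between two authors are assumed to be the sum of the cites in both directions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: paper -> list of authors.
authors_of = {
    "P1": ["A1"],
    "P2": ["A2"],
    "P3": ["A1", "A3"],
    "P4": ["A2"],
}
# Paper-level citations as (citing paper, cited paper).
paper_citations = [
    ("P1", "P2"), ("P2", "P1"), ("P3", "P2"), ("P4", "P3"), ("P4", "P1"),
]


def author_citation_weights(authors_of, paper_citations):
    """Count, for each ordered pair (Ai, Aj), the cites that Ai gives to Aj."""
    weights = Counter()
    for citing, cited in paper_citations:
        for ai in authors_of[citing]:
            for aj in authors_of[cited]:
                if ai != aj:  # assumption: self-citations are ignored
                    weights[(ai, aj)] += 1
    return weights


def candidate_pairs(weights, threshold):
    """Return author pairs whose inter-citations (both directions) exceed threshold."""
    authors = sorted({a for pair in weights for a in pair})
    result = []
    for ai, aj in combinations(authors, 2):
        total = weights[(ai, aj)] + weights[(aj, ai)]
        if total > threshold:
            result.append((ai, aj, total))
    return result


weights = author_citation_weights(authors_of, paper_citations)
pairs = candidate_pairs(weights, threshold=3)
print(pairs)
```

With this toy data, only the pair A1 and A2 exceeds the (arbitrarily chosen) threshold of 3. Requiring each direction to exceed the threshold separately would be an equally plausible reading and changes only the comparison in candidate_pairs.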

DISCOVERING THE CARTELS
Discovering citation cartels in multi-level graphs can be implemented easily using modern semantic web tools for manipulating knowledge on the Internet, i.e., the Resource Description Framework (RDF) and the RDF query language SPARQL. Knowledge in RDF format is stored as structured metadata in the form of graphs, where nodes represent resources and the graphs consist of triplets "subject-predicate-object." RDF uses graphs as a formal basis [23], where subjects are connected with their corresponding objects by edges that denote property relations. SPARQL provides a semantic query language for retrieving and handling data stored in RDF format, and it was used for discovering the citation cartels in this study. The results of simulations on an artificially generated citation network showed that it was easy to discover citation cartels in this network using these semantic web tools.
Let us suppose two researchers, A1 and A2, who published their own papers P1 and P2, respectively. The paper P1, published by author A1, references the paper P2, published by author A2. Here, two scenarios can be taken into account:
1. the paper P1 is not coauthored by author A2 and vice versa (Figure 2, left), and
2. authors A1 and A2 are coauthors of other, mutual papers (Figure 2, right).
In both cases there is a suspicion that both authors belong to a citation cartel, especially when the number of their mutual citations is high. However, the first scenario is more restrictive and therefore requires a more sophisticated method of discovery. The illustrated scenarios are described easily using first-order predicate logic, where concepts like authors and papers are connected by corresponding relations. Let us assume that a relation connects an author (the subject) to his/her paper (the object) using a property (the predicate). Then, in general, the relation can be defined as a predicate connecting the subject with the object and described as a triplet subject-predicate-object. In predicate logic, this triplet can also be written as predicate(subject, object).
The corresponding citation network is constructed by the following relations:
• IsCitedThe(Pi, Pj): The paper Pi cites the paper Pj in the paper citation networks.
• IsAuthorOf(Aj, Pi): The paper Pi is (co-)authored by the author Aj; this relation connects paper citation networks with author citation networks.
• NumberOfCites(Ai, Aj): Determines the number of cites that the author Ai gives to the author Aj; this relation connects the author citation networks with the citation cartels.
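The correspondence between a triplet subject-predicate-object and the form predicate(subject, object) can be made concrete with a small Python sketch. Plain Python sets stand in here for an RDF store, and the helper names match and holds are hypothetical; the facts are toy data, and IsCitedThe(Pi, Pj) is assumed to read "Pi cites Pj".

```python
# Hypothetical toy facts written as (subject, predicate, object) triplets.
triples = {
    ("A1", "IsAuthorOf", "P1"),
    ("A2", "IsAuthorOf", "P2"),
    ("P1", "IsCitedThe", "P2"),  # assumption: read as "P1 cites P2"
    ("P2", "IsCitedThe", "P1"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the sorted triplets matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


def holds(triples, predicate, subject, obj):
    """Evaluate predicate(subject, object) against the stored triplets."""
    return (subject, predicate, obj) in triples


# All papers written by A1, i.e., every P with IsAuthorOf(A1, P).
papers_of_a1 = [o for (_, _, o) in match(triples, subject="A1", predicate="IsAuthorOf")]
print(papers_of_a1)
print(holds(triples, "IsCitedThe", "P1", "P2"))
```

The wildcard pattern in match plays the role that variables play in a graph query: leaving a position open asks for every stored triplet that fits the remaining positions.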
Note that the relation NumberOfCites acts as a filter: authors Ai and Aj belong to a citation cartel when

NumberOfCites(Ai, Aj) > Threshold, (1)

where Threshold determines the level needed to declare the existence of a citation cartel. In place of the threshold value, other measures can also be used for filtering, e.g., the ratio between the number of cites that the author Ai gives the author Aj and the number of all cites by the author Aj, and vice versa. The citation cartel for Scenario 1 can be detected by a SPARQL query formally expressed in first-order predicate logic as Equation (2), while the query for Scenario 2 is expressed as Equation (3). Note that the last conjunction of the predicate IsAuthorOf denotes all papers written, or not written, by either author A or B. In order to declare the existence of a citation cartel, the number of inter-citations given by authors A1 and A2 must be higher than the threshold value of 10.

Figure 2: Scenario 1, depicted at the left, presents a situation where authors A1 and A2 have no mutual paper, i.e., the corresponding condition in Equation (2) is valid. In this case, they are not familiar with each other. Therefore, this kind of citation cartel is hard to detect. Scenario 2, depicted at the right, presents a situation where authors A1 and A2 have at least one mutual paper, i.e., the conjunctive relation IsAuthorOf(A1, P3) ∧ IsAuthorOf(A2, P3) in Equation (3) holds. In this case, the authors are familiar with each other and therefore deliberately help each other in achieving higher citation counts.
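The distinction between the two scenarios can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the SPARQL query used in this study: the function names and toy data are hypothetical, inter-citations are taken as the sum of cites in both directions, and a pair is assigned to Scenario 2 as soon as the two authors share at least one coauthored paper.

```python
from collections import Counter

THRESHOLD = 10  # threshold value mentioned in the text


def cites_between(authors_of, paper_citations):
    """Count the cites each author gives to each other author."""
    counts = Counter()
    for citing, cited in paper_citations:
        for ai in authors_of[citing]:
            for aj in authors_of[cited]:
                if ai != aj:  # assumption: self-citations are ignored
                    counts[(ai, aj)] += 1
    return counts


def classify_pair(a, b, authors_of, paper_citations, threshold=THRESHOLD):
    """Return 'scenario 1', 'scenario 2', or None for the author pair (a, b)."""
    counts = cites_between(authors_of, paper_citations)
    inter_citations = counts[(a, b)] + counts[(b, a)]
    if inter_citations <= threshold:
        return None  # filter not passed: no suspicion of a cartel
    # Scenario 2 if some paper P3 has both a and b among its authors.
    share_paper = any(a in auths and b in auths for auths in authors_of.values())
    return "scenario 2" if share_paper else "scenario 1"


# Hypothetical toy network: A1 wrote X1..X3, A2 wrote Y1..Y3, A3 wrote W1.
xs, ys = ["X1", "X2", "X3"], ["Y1", "Y2", "Y3"]
authors_of = {**{x: ["A1"] for x in xs}, **{y: ["A2"] for y in ys}, "W1": ["A3"]}
# Every X cites every Y and vice versa (18 inter-citations); W1 cites X1 once.
paper_citations = (
    [(x, y) for x in xs for y in ys]
    + [(y, x) for y in ys for x in xs]
    + [("W1", "X1")]
)

no_joint = classify_pair("A1", "A2", authors_of, paper_citations)
with_joint = classify_pair("A1", "A2", dict(authors_of, Z1=["A1", "A2"]), paper_citations)
weak_pair = classify_pair("A3", "A1", authors_of, paper_citations)
print(no_joint, with_joint, weak_pair)
```

As stressed in the concluding remarks, a pair flagged in this way is only a candidate; the outcome indicates a pattern worth a detailed analysis, not a confirmed cartel.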

CONCLUDING REMARKS
Today, citation cartels, whose members mutually cite the papers of authors whom they may or may not know personally, have become a reality in the research domain. In the ever harder citation race, the cartels offer an easy way to obtain the appearance of scientific excellence by increasing the number of one's own citations. The aim of this paper is to discover citation cartels using modern semantic web tools for manipulating knowledge on the Internet, i.e., RDF and SPARQL. However, proclaiming that two authors form a citation cartel is very dangerous, because we can never be sure that this indictment really holds in the real world. We can only indicate that there is a high probability that a citation cartel exists, and this needs to be confirmed by a detailed analysis.
In general, our purpose is not to prevent this phenomenon or to discredit authors who could be caught in a citation cartel incidentally, but to show that citation cartels really do exist, and to make all those responsible for publishing papers, i.e., Editors and Reviewers, aware of this.

AUTHOR CONTRIBUTIONS
IF Jr., IF, and MP designed and performed the research as well as wrote the paper.

FUNDING
This research was supported by the Slovenian Research Agency (Grants P5-0027 and J1-7009).