<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2019.00039</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Is Performance of Scholars Correlated to Their Research Collaboration Patterns?</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Jeon</surname> <given-names>Hyeon-Ju</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/799969/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Lee</surname> <given-names>O-Joun</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/800687/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Jung</surname> <given-names>Jason J.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/147634/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Engineering, Chung-Ang University</institution>, <addr-line>Seoul</addr-line>, <country>South Korea</country></aff>
<aff id="aff2"><sup>2</sup><institution>Future IT Innovation Laboratory, Pohang University of Science and Technology</institution>, <addr-line>Pohang-si</addr-line>, <country>South Korea</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Feng Xia, Dalian University of Technology (DUT), China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jiaying Liu, Dalian University of Technology (DUT), China; Yi Bu, Indiana University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Jason J. Jung <email>j3ung&#x00040;cau.ac.kr</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data</p></fn></author-notes>
<pub-date pub-type="epub">
<day>05</day>
<month>11</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>2</volume>
<elocation-id>39</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>08</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>10</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Jeon, Lee and Jung.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Jeon, Lee and Jung</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>This study examines whether the research performance of scholars correlates with how the scholars work together. Although the most straightforward approaches are centrality measurements or community detection, scholars usually participate in multiple research groups and have different roles in each group. Thus, we concentrate on the subgraphs of co-authorship networks rooted in each scholar, which cover (i) the overlap of research groups on the scholar and (ii) the roles of the scholar in those groups. This study calls the subgraphs &#x0201C;collaboration patterns&#x0201D; and applies subgraph embedding methods to discover and represent them. Based on the embedded collaboration patterns, we have clustered scholars according to their collaboration styles. Then, we have examined whether scholars in each cluster have similar research performance, using quantitative indicators. The coherence of the indicators is not solid proof of a correlation between collaboration and performance. Nevertheless, the examination of the clusters has shown that the collaboration patterns can reflect the research styles of scholars. This information should enable us to predict research performance more accurately, since research styles are more consistent and sustainable features of scholars than a few high-impact publications.</p></abstract>
<kwd-group>
<kwd>bibliographic network embedding</kwd>
<kwd>research performance estimation</kwd>
<kwd>research group analysis</kwd>
<kwd>research collaboration</kwd>
<kwd>collaboration pattern discovery</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Research Foundation of Korea<named-content content-type="fundref-id">10.13039/501100003725</named-content></contract-sponsor>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="9"/>
<ref-count count="30"/>
<page-count count="10"/>
<word-count count="6901"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>As academic communities grow broader and more subdivided, scholars increasingly need intelligent services (e.g., recommendation of collaborators, research topics, or journals). For those services, measurements for evaluating the performance of scholars, the quality of journals, or the prominence of research topics are essential components.</p>
<p>Therefore, there have been various studies on defining quantitative indicators to evaluate and compare entities in academia (Hirsch, <xref ref-type="bibr" rid="B13">2005</xref>, <xref ref-type="bibr" rid="B14">2010</xref>; Sidiropoulos et al., <xref ref-type="bibr" rid="B26">2007</xref>; Wu, <xref ref-type="bibr" rid="B29">2010</xref>; Galam, <xref ref-type="bibr" rid="B9">2011</xref>). These indicators have mostly employed (i) count-based and (ii) network-based approaches. The count-based approach comes from intuitive assumptions: a highly-cited scholar/paper/journal might have higher quality than lowly-cited ones, or a scholar who has published a larger number of papers might have higher performance than the others. However, these assumptions are not &#x0201C;always&#x0201D; correct. First, if a scholar publishes many low-quality papers with self-citations, he/she will ostensibly have many highly-cited articles. Also, the numbers of publications and citations depend on how active a research field is. Besides, even if two scholars have the same number of citations, we cannot conclude that the two scholars have similar research performance.</p>
<p>In order to avoid this problem, various indicators have been proposed to evaluate the academic entities based on their influence (i.e., impact in academic communities). They measure the influence of scholars or papers based on bibliographic networks (e.g., co-authorship networks or citation networks). The network-based approaches mostly use centrality measurements to estimate the significance of scholars/papers in the research communities. Nevertheless, estimating the significance is too na&#x000EF;ve to reflect what kinds of roles the scholars/papers have in the research communities; e.g., whether a scholar is a principal investigator (PI) of a research group or an independent researcher participating in numerous research projects.</p>
<p>To improve the network-based indicators, various studies (Ganesh et al., <xref ref-type="bibr" rid="B10">2016</xref>; Ganguly and Pudi, <xref ref-type="bibr" rid="B11">2017</xref>) have proposed methods for learning representations of scholars/papers based on structures of the bibliographic networks. However, these methods mostly consider only the first-order proximity for embedding entities in the bibliographic networks. In the case of scholars, the first-order proximity can reflect collaborators of each scholar. Nevertheless, the proximity cannot consider (i) how a group of scholars work together and (ii) what kinds of roles each scholar has in the research group. We assume that characteristics of research groups affect the research of each scholar; not only on the research performance but also on styles of scholars or types of publications. Based on this assumption, we attempt to discover and represent how scholars work together. Then, this pattern of research collaboration might enable us to predict and analyze the performance of the scholars.</p>
<p>Thereby, in this study, we address the research question of whether the research collaboration patterns of scholars are correlated with their research performance. To discover and compare the collaboration patterns, we propose a method for learning representations of structural features of co-authorship networks. First, based on subgraph discovery techniques, we extract and describe the collaboration patterns rooted in each scholar. The collaboration patterns are then embedded using Word2Vec-based graph embedding methods with regard to their scale and adjacency. Finally, we examine the research question by clustering scholars according to their collaboration patterns and checking whether the scholars in each cluster are coherent in terms of research performance.</p>
</sec>
<sec id="s2">
<title>2. Related Work</title>
<p>In this section, we introduce the existing approaches for assessing research performance. We also present the existing studies that attempted to validate the correlation between collaborations of scholars and their research performance, even though they merely applied centrality measurements to represent the collaborations.</p>
<sec>
<title>2.1. Count-Based Indicators</title>
<p>Papers are the channel that most directly exposes the performance of scholars. However, each paper has a different quality, and it is challenging to assess papers one-by-one: a massive number of papers are published every year (e.g., 42,311 papers were indexed in DBLP during August 2019), and they cover a highly diverse range of research areas. The number of citations is one of the most effective indicators of the quality of a paper. Therefore, various indicators have been proposed to measure research performance by considering both the number of citations and the number of papers. Among them, one of the most widely-used indicators is the <italic>h</italic>-index (Hirsch, <xref ref-type="bibr" rid="B13">2005</xref>), which is the largest number <italic>h</italic> such that the scholar has <italic>h</italic> papers with at least <italic>h</italic> citations each. The <italic>h</italic>-index is more effective than simply comparing the numbers of papers and citations, since it only counts papers that have themselves attracted enough citations.</p>
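<p>As a worked illustration of the <italic>h</italic>-index, the following minimal sketch computes it from a list of per-paper citation counts. The function name <monospace>h_index</monospace> and the citation counts are hypothetical values chosen for this example only.</p>
<preformat>
```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one scholar (not data from this study).
example = [10, 8, 5, 4, 3]
print(h_index(example))  # 4
```
</preformat>
<p>In this example, four papers have at least four citations each, but there are not five papers with at least five citations, so the value is 4 regardless of how highly the top paper is cited.</p>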
<p>In order to more accurately measure the performance of scholars, other indicators have been proposed to reflect more diverse features of the research performance. First, the <italic>h</italic>-index counts citations of a few top papers. However, it is important to consider overall performance; e.g., <italic>g</italic>-index (Egghe, <xref ref-type="bibr" rid="B6">2006</xref>), <italic>h</italic><sub>(2)</sub>-index (Kosmulski, <xref ref-type="bibr" rid="B17">2006</xref>), <italic>w</italic>-index (Wu, <xref ref-type="bibr" rid="B29">2010</xref>), <italic>EM</italic>-index (Bihari and Tripathi, <xref ref-type="bibr" rid="B2">2017</xref>), and so on. Second, indicators should reflect that co-authors have different levels of contribution for each paper; e.g., <inline-formula><mml:math id="M1"><mml:mover accent="true"><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:math></inline-formula>-index (Hirsch, <xref ref-type="bibr" rid="B14">2010</xref>), <italic>gh</italic>-index (Galam, <xref ref-type="bibr" rid="B9">2011</xref>), <italic>Ab</italic>-index (Biswal, <xref ref-type="bibr" rid="B3">2013</xref>), and so on. Lastly, recent papers have a relatively smaller number of citations than older ones. Therefore, indicators have to consider publication ages of papers; e.g., <italic>v</italic>-index (Vaidya, <xref ref-type="bibr" rid="B27">2005</xref>), <italic>AR</italic>-index (Jin et al., <xref ref-type="bibr" rid="B16">2007</xref>), contemporary <italic>h</italic>-index (Sidiropoulos et al., <xref ref-type="bibr" rid="B26">2007</xref>), Trendy <italic>h</italic>-index (Sidiropoulos et al., <xref ref-type="bibr" rid="B26">2007</xref>), and so on.</p>
<p>However, most of the count-based indicators concentrate only on the results of research. Measuring performance based on a subset of papers cannot reflect whether the performance is sustainable. Co-authorship networks represent not only the performance of scholars but also how the scholars collaborate to achieve the results. Thus, the collaboration of a scholar is closer to research capacity, which is an expectation of the performance, than the number of citations or papers. It also enables us to analyze how high research performance can be achieved. Additionally, the number of citations or papers depends on how active a research area is. This dependency makes the quantitative indicators difficult to compare across research areas.</p>
<p>Abbasi et al. (<xref ref-type="bibr" rid="B1">2009</xref>) have proposed <italic>RC</italic>-index and <italic>CC</italic>-index for enhancing the count-based indicators by considering quantity of collaborations and quality of collaborators. These indicators evaluate scholars based on their collaboration activities, and the activities are assessed based on citations for co-authored papers. Nevertheless, they only evaluate collaborators of each scholar rather than consider how they work together. The following section introduces indicators for measuring research performance based on collaborations with co-authorship networks in detail.</p>
</sec>
<sec>
<title>2.2. Network-Based Indicators</title>
<p>Although there have been various studies on analyzing collaborations of scholars, they concentrated only on measuring the centrality [e.g., closeness centrality (Sabidussi, <xref ref-type="bibr" rid="B24">1966</xref>), betweenness centrality (Freeman, <xref ref-type="bibr" rid="B8">1977</xref>), PageRank (Haveliwala, <xref ref-type="bibr" rid="B12">2002</xref>), and so on] of each scholar in co-authorship networks. The centrality of a node in a social network indicates how much influence the node has. Nevertheless, these centrality measurements are also affected by the quantitative inequality between research fields. Furthermore, the centrality cannot reflect collaboration styles and organizational cultures of scholars and their research groups. Nowadays, most studies are conducted through collaborations among research groups of various scales. Therefore, the organizations and cultures of the research groups are likely to be key features that affect the performance of scholars.</p>
<p>Newman (<xref ref-type="bibr" rid="B22">2001</xref>) analyzed structures of co-authorship networks. After this attempt, various studies applied social network analysis techniques on co-authorship networks, mainly focused on the centrality of scholars. Erjia and Ying (<xref ref-type="bibr" rid="B7">2009</xref>) validated that centrality of scholars and the number of their citations are significantly related. In their study, betweenness centrality and the number of citations showed the highest correlation. However, both of the measurements can be affected by the number of papers. Therefore, a few studies employed more reasonable indicators to validate the correlation between centrality and performance. A few studies (Yan and Ding, <xref ref-type="bibr" rid="B30">2011</xref>; Waltman and Yan, <xref ref-type="bibr" rid="B28">2014</xref>) validated correlation between PageRank and academic influence of scholars. Ding and Cronin (<xref ref-type="bibr" rid="B5">2011</xref>) also attempted to verify that the number of citations for papers cannot reflect influence of scholars on academic societies by measuring PageRank in citation networks. Bordons et al. (<xref ref-type="bibr" rid="B4">2015</xref>) showed correlation between centrality of scholars and their <italic>g</italic>-index (Egghe, <xref ref-type="bibr" rid="B6">2006</xref>).</p>
<p>As validated in the existing studies, the network-based indicators are correlated with the research performance of scholars. However, methods for estimating performance based on co-authorship networks have been limited to simply measuring the centrality. To reflect collaboration relationships in more detail, a few studies focused on the observation that scholars mainly collaborate with a few steady partners. Reyes-Gonzalez et al. (<xref ref-type="bibr" rid="B23">2016</xref>) classified scholars into research groups according to frequency of co-authoring. Then, they verified that similar research groups have similar performance. This method is valuable for comparing the performance of research groups, but not for assessing the performance of individual scholars. The existing studies do not consider that scholars participate in multiple research groups and that members of the groups have different roles and significance. From this perspective, we focus on collaboration patterns in co-authorship networks.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Representing Collaboration Patterns</title>
<p>This study aims to (i) discover collaboration patterns of scholars and (ii) represent the collaboration patterns. We assume that the collaboration patterns are correlated to research performance of the scholars and implicitly reflect influence of the scholars in academia. First, we propose a method for discovering the collaboration patterns from co-authorship networks, in section 3.1. To detect and describe relationships between each scholar and his/her collaborators, we employ the WL (Weisfeiler-Lehman) relabeling process. Then, to simplify comparisons between the collaboration patterns, we adopt graph embedding techniques. Section 3.2 describes a method for learning representations of the collaboration patterns.</p>
<p>In this paper, we analyze collaborations based on co-authorship networks, which represent the frequency of co-authored publications among scholars. Although there are various kinds of research collaboration other than co-authoring (e.g., co-organizing seminars/workshops/conferences, editing journals, planning/operating research projects, and so on), publications and co-authorships are the most explicit results and forms of research collaboration.</p>
<p>As shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>, the co-authorship network is a social network among scholars. In this network, each node indicates a scholar, each edge represents the existence of collaborations between two scholars, and the weight of an edge corresponds to the frequency of collaborations between the two scholars. Thus, the co-authorship network is an undirected, weighted graph. Based on the network, we can analyze how each scholar is connected to other scholars and how each research group works together. The co-authorship network can be defined as:</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Discovering collaboration patterns from co-authorship networks. In both <bold>(A)</bold> and <bold>(B)</bold>, a node indicates a scholar, and each edge denotes whether two corresponding scholars have ever collaborated. The weight of an edge in the co-authorship network indicates the proximity between two scholars, which is measured by the number of co-authored publications. <bold>(B)</bold> presents an example of a collaboration pattern rooted in &#x0201C;Jason.&#x0201D; To discover collaboration patterns, we classify edges according to the proximity distribution of each scholar. Even when two scholars share a common collaborator, the importance of that collaborator can differ between them.</p></caption>
<graphic xlink:href="fdata-02-00039-g0001.tif"/>
</fig>
<p>Definition 1 (Co-Authorship Network). <italic>Suppose that</italic> <italic>n</italic> <italic>is the number of scholars that are in bibliography data. When</italic> <inline-formula><mml:math id="M2"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:math></inline-formula> <italic>indicates a co-authorship network</italic>, <inline-formula><mml:math id="M3"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:math></inline-formula> <italic>can be described as a symmetric matrix</italic> &#x02208; &#x0211D;<sup><italic>n</italic>&#x000D7;<italic>n</italic></sup><italic>. Each element of</italic> <inline-formula><mml:math id="M4"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:math></inline-formula> <italic>denotes a degree of proximity between two corresponding scholars. This can be formulated as:</italic></p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x022EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x022EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x022F1;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x022EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x022EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><italic>where</italic> <italic>a</italic><sub><italic>i, j</italic></sub> <italic>indicates proximity of</italic> <italic>s</italic><sub><italic>i</italic></sub> <italic>for</italic> <italic>s</italic><sub><italic>j</italic></sub> <italic>when</italic> <inline-formula><mml:math id="M6"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:math></inline-formula> <italic>is a universal set of scholars that are in bibliography data and</italic> <italic>s</italic><sub><italic>i</italic></sub> <italic>is the</italic> <italic>i</italic><italic>-th element of</italic> <inline-formula><mml:math id="M7"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
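<p>To make Definition 1 concrete, the following minimal sketch builds the symmetric matrix from a list of author lists, using the number of co-authored papers as the proximity. The function name <monospace>build_coauthorship_matrix</monospace>, the toy data, and the choice to leave the diagonal at zero are assumptions made for this illustration; it is a sketch rather than the implementation used in this study.</p>
<preformat>
```python
from itertools import combinations


def build_coauthorship_matrix(papers, scholars):
    """Return a symmetric n x n list-of-lists of co-authored paper counts."""
    index = {name: i for i, name in enumerate(scholars)}
    n = len(scholars)
    matrix = [[0] * n for _ in range(n)]
    for authors in papers:
        # Each unordered pair of distinct co-authors gains one joint paper.
        for a, b in combinations(sorted(set(authors)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # mirror the update to keep the matrix symmetric
    return matrix


# Hypothetical toy bibliography (not data from this study).
scholars = ["Jason", "Alice", "Bob", "Carol"]
papers = [["Jason", "Alice"], ["Jason", "Alice", "Bob"], ["Bob", "Carol"]]
N = build_coauthorship_matrix(papers, scholars)
for row in N:
    print(row)
```
</preformat>
<p>In this toy example, the entry for (Jason, Alice) is 2 because they share two papers, and the matrix equals its own transpose, as the definition requires.</p>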
<p>In the co-authorship network, relationships between scholars are intricately entangled. Graph theory-based measurements can reflect only a few aspects of research collaborations (e.g., who is leading a research group). However, to reveal the collaboration styles of research groups and scholars, we have to analyze the structural features of the research groups and the position of each scholar in the groups. In particular, scholars can participate in multiple research groups at the same time. The existing network-based indicators have difficulty reflecting the various research groups that overlap on a scholar.</p>
<p>To deal with this problem, we attempt to extract and describe structures of research groups at multiple scales. The structures are described by the collaborators of each scholar at various scales (i.e., <italic>n</italic>-hop connectivity), using subgraph discovery techniques. We assume that the subgraphs of co-authorship networks represent collaboration patterns between scholars. <xref ref-type="fig" rid="F1">Figure 1B</xref> presents an example of extracting a collaboration pattern of &#x0201C;Jason&#x0201D; from the co-authorship network in <xref ref-type="fig" rid="F1">Figure 1A</xref>. The transformation shown in <xref ref-type="fig" rid="F1">Figures 1A,B</xref> reassigns the label of &#x0201C;Jason&#x0201D; based on the labels of his collaborators, which represents only one-hop connectivity. This approach has a point in common with ego-centered citation networks (Huang et al., <xref ref-type="bibr" rid="B15">2018</xref>), since both concentrate on the neighborhood of a target node in bibliographic networks. Therefore, <xref ref-type="fig" rid="F1">Figure 1B</xref> can be called an &#x0201C;ego-centered co-authorship network.&#x0201D; However, unlike the ego-centered network, we iterate the transformation shown in <xref ref-type="fig" rid="F1">Figures 1A,B</xref> for each scholar. With each iteration, the coverage of the collaboration patterns becomes wider. This approach enables us to represent the structures of research groups overlapping on each scholar at various scales. The collaboration pattern is defined as:</p>
<p>Definition 2 (Collaboration Pattern). <italic>Suppose that</italic> <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> <italic>indicates a collaboration pattern of</italic> <italic>s</italic><sub><italic>i</italic></sub> <italic>at degree</italic> <italic>d</italic> &#x02208; [0, <italic>D</italic>]<italic>. Collaboration patterns rooted in</italic> <italic>s</italic><sub><italic>i</italic></sub> <italic>reflect (i) collaborators of</italic> <italic>s</italic><sub><italic>i</italic></sub> <italic>and (ii) significance of each collaborator for</italic> <italic>s</italic><sub><italic>i</italic></sub><italic>. Also, the degree lets us know (iii) coverages of the collaboration patterns, which are observation ranges for discovering the patterns. To represent this information iteratively, we describe a collaboration pattern on degree</italic> <italic>d</italic> <italic>based on (i) itself and (ii) its neighborhoods on degree</italic> <italic>d</italic> &#x02212; 1<italic>. When</italic> <italic>a</italic><sub><italic>i, j</italic></sub>, <italic>a</italic><sub><italic>i, k</italic></sub>, <italic>a</italic><sub><italic>i, l</italic></sub> <italic>are only non-zero elements within</italic> &#x02200;<italic>a</italic><sub><italic>i</italic>, &#x0002A;</sub>, <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> <italic>can be formulated as:</italic></p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="true">&#x02329;</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>;</mml:mo><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">&#x0232A;</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
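<p>The recursion in Equation (2) can be sketched as follows: at each degree, the pattern of a scholar is rebuilt from its own pattern and the patterns of its collaborators at the previous degree. The function name <monospace>collaboration_patterns</monospace>, the uniform degree-0 label, and the nested-tuple encoding are assumptions made for this illustration, and the sketch uses adjacency only, ignoring the proximity of each collaborator.</p>
<preformat>
```python
def collaboration_patterns(adjacency, max_degree):
    """Return labels[d][i], the pattern rooted in scholar i at degree d."""
    n = len(adjacency)
    # Degree 0 (assumption of this sketch): every scholar starts with the same label.
    labels = [["s"] * n]
    for _ in range(max_degree):
        previous = labels[-1]
        current = []
        for i in range(n):
            # Collect the previous-degree patterns of all collaborators of i.
            neighbours = sorted(
                previous[j] for j in range(n) if j != i and adjacency[i][j] > 0
            )
            # Eq. (2): own previous pattern, then the collaborators' patterns.
            current.append((previous[i], tuple(neighbours)))
        labels.append(current)
    return labels


# Hypothetical 4-scholar network: 0-1, 0-2, 1-2, and 2-3 have co-authored.
adjacency = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
labels = collaboration_patterns(adjacency, max_degree=2)
print(labels[1][0] == labels[1][1])  # scholars 0 and 1 both have two collaborators
print(labels[2][0] == labels[2][3])  # scholars 0 and 3 sit in different structures
```
</preformat>
<p>Sorting the collaborators' patterns makes the result independent of the order in which collaborators are listed, so two scholars receive the same pattern exactly when their neighborhoods look alike up to the chosen degree.</p>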
<p>In the following section, we propose a method for extracting the collaboration patterns from the co-authorship networks.</p>
<sec>
<title>3.1. Discovering Collaboration Patterns</title>
<p>In this study, we extract the collaboration patterns from the co-authorship network using the WL (Weisfeiler-Lehman) relabeling process, which comes from the WL graph isomorphism testing (Shervashidze et al., <xref ref-type="bibr" rid="B25">2011</xref>). The WL relabeling can discover multi-scaled subgraphs rooted in each node by iteratively assigning a new label based on neighbors of the node. The variety of scales lets us know the structures of research groups of each scholar from various viewpoints.</p>
<p>Although the existence of an edge tells us which scholars are connected, collaborations among scholars also differ in significance. Considering which collaborators are significant for each scholar reveals (i) the roles of scholars in their research groups and (ii) the structures of the research groups. Even if a scholar has relationships with multiple other scholars, not all of these relationships are equivalent. Therefore, the collaboration patterns should be described with regard to the proximity between scholars. We describe collaboration patterns of scholars based on (i) the adjacency and (ii) the distribution of proximity between the scholars. These two features provide the following information.</p>
<list list-type="bullet">
<list-item><p>Adjacency: The adjacency between scholars in the co-authorship networks indicates that they have collaborated on at least one publication.</p></list-item>
<list-item><p>Proximity: Among the collaborators, the proximity enables us to discriminate which ones are more significant or valuable. Also, a group in which a few scholars lead most of the studies differs from a group in which all the scholars participate equally in the research. Thereby, the distribution of proximity can reflect even the organizational cultures of the research groups.</p></list-item>
</list>
<p>However, the standard WL relabeling process considers only adjacency and ignores the degree of proximity (i.e., collaboration frequency). To address this issue, Lee (<xref ref-type="bibr" rid="B18">2019</xref>) proposed a modification of WL relabeling that labels edges according to proximity. We apply this method to discover the collaboration patterns of scholars. As in the existing method (Lee, <xref ref-type="bibr" rid="B18">2019</xref>; Lee and Jung, <xref ref-type="bibr" rid="B19">2019</xref>), we classify relationships between scholars into three categories: high (<inline-formula><mml:math id="M11"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>), medium (<inline-formula><mml:math id="M12"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>), and low (<inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) proximity, based on collaboration frequency. However, research fields and communities differ in how frequently scholars collaborate. Thus, we set adaptive thresholds between the categories according to the distribution of collaboration frequency among each scholar's collaborators. When we discover subgraphs rooted in <italic>s</italic><sub><italic>i</italic></sub>, the edge between <italic>s</italic><sub><italic>i</italic></sub> and <italic>s</italic><sub><italic>j</italic></sub> (<italic>a</italic><sub><italic>i, j</italic></sub>) is labeled as:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M14"><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">H</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x000A0;</mml:mtext><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0003E;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>&#x022C5;</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>else&#x000A0;if&#x000A0;</mml:mtext><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>&#x003BC;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>&#x022C5;</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">M</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>otherwise</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where &#x003BC;<sub><italic>i</italic></sub> is the average number of collaborations between <italic>s</italic><sub><italic>i</italic></sub> and each of his/her collaborators, &#x003C3;<sub><italic>i</italic></sub> is the standard deviation of the collaboration frequency over those collaborators, and &#x003B8; is a weighting factor that controls the thresholds between the three categories. Let <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> denote the subgraph rooted in <italic>s</italic><sub><italic>i</italic></sub> at degree <italic>d</italic>. Then <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> can be described by <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and the subgraphs rooted in the neighbors of <italic>s</italic><sub><italic>i</italic></sub> at degree <italic>d</italic> &#x02212; 1, grouped into the three categories. This can be formulated as:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M23"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="true">&#x02329;</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">&#x0232A;</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M24"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">H</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mtext>&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">H</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M25"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math id="M26"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, and <inline-formula><mml:math id="M27"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> denote sets of subgraphs adjacent with <inline-formula><mml:math id="M28"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> in high, medium, and low proximity, respectively. 
<xref ref-type="fig" rid="F1">Figure 1B</xref> illustrates an example of a collaboration pattern, and Algorithm 1 presents the overall procedure for discovering the research collaboration patterns, where <italic>N</italic>(<italic>s</italic><sub><italic>i</italic></sub>) indicates the set of collaborators of <italic>s</italic><sub><italic>i</italic></sub>. In Line 2 of Algorithm 1, &#x003BC;<sub><italic>i</italic></sub> and &#x003C3;<sub><italic>i</italic></sub> are computed to determine which collaborators are more or less significant to <italic>s</italic><sub><italic>i</italic></sub> than the others. In Line 14, <italic>HASH</italic>(&#x000B7;) indicates the hash function that assigns an identifier to each collaboration pattern.</p>
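<p>The adaptive thresholds of Equation 3 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name <italic>label_edges</italic>, the dictionary input, and the value of &#x003B8; used in the example are assumptions. The standard deviation is the population form, matching Line 2 of Algorithm 1.</p>
<preformat>
```python
import math


def label_edges(freq_i, theta):
    """Label each collaborator of one scholar as 'H', 'M', or 'L'.

    freq_i: dict mapping collaborator -> collaboration frequency a_ij.
    theta:  weighting factor for the thresholds (chosen by the caller).
    """
    n = len(freq_i)
    mu = sum(freq_i.values()) / n
    # Population standard deviation of the collaboration frequencies.
    sigma = math.sqrt(sum((a - mu) ** 2 for a in freq_i.values()) / n)
    labels = {}
    for s_j, a in freq_i.items():
        if a > mu + theta * sigma:
            labels[s_j] = "H"          # high proximity
        elif mu - theta * sigma > a:
            labels[s_j] = "L"          # low proximity
        else:
            labels[s_j] = "M"          # medium proximity
    return labels


# Hypothetical frequencies: mu = 4 and sigma = 3, so with theta = 0.5
# the thresholds are 5.5 (high) and 2.5 (low).
labels = label_edges({"B": 9, "C": 3, "D": 3, "E": 1}, theta=0.5)
```
</preformat>
<p>Because &#x003BC;<sub><italic>i</italic></sub> and &#x003C3;<sub><italic>i</italic></sub> are computed per scholar, the same raw frequency can fall into different categories for different scholars.</p>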
<table-wrap position="float">
<label>Algorithm 1</label>
<caption><p>Proximity-aware WL relabeling process</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left">1:</td>
<td valign="top" align="left"><bold>procedure</bold> WL<sc>RE</sc>LABELLING(<inline-formula><mml:math id="M29"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:math></inline-formula>)</td>
</tr>
<tr>
<td valign="top" align="left">2:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Set <inline-formula><mml:math id="M30"><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x000D7;</mml:mo><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math id="M31"><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x000D7;</mml:mo><mml:munder 
class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">3:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> <italic>d</italic> : 1 &#x02192; <italic>D</italic> <bold>do</bold></td>
</tr>
<tr>
<td valign="top" align="left">4:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;Set <inline-formula><mml:math id="M32"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mi>&#x02205;</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mi>&#x02205;</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mi>&#x02205;</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">5:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>for</bold> <italic>s</italic><sub><italic>j</italic></sub> &#x02208; <italic>N</italic>(<italic>s</italic><sub><italic>i</italic></sub>), <italic>s</italic><sub><italic>i</italic></sub> &#x02260; <italic>s</italic><sub><italic>j</italic></sub> <bold>do</bold></td>
</tr>
<tr>
<td valign="top" align="left">6:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M33"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">7:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>if</bold> <inline-formula><mml:math id="M34"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> <bold>then</bold></td>
</tr>
<tr>
<td valign="top" align="left">8:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M35"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0222A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">9:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>else if</bold> <inline-formula><mml:math id="M36"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> <bold>then</bold></td>
</tr>
<tr>
<td valign="top" align="left">10:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M37"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0222A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">11:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<bold>else</bold></td>
</tr>
<tr>
<td valign="top" align="left">12:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M38"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0222A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">13:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M39"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mrow><mml:mo stretchy="true">&#x02329;</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">H</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">M</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">&#x0232A;</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td valign="top" align="left">14:</td>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;<inline-formula><mml:math id="M40"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mi>H</mml:mi><mml:mi>A</mml:mi><mml:mi>S</mml:mi><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow><mml:mo>&#x02190;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow><mml:mo>&#x0222A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
 </tbody>
</table>
</table-wrap>
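<p>A compact Python sketch of the procedure in Algorithm 1 is given below. It rests on several assumptions that the text does not fix: all scholars start from the same initial label, the network is given as a symmetric dictionary of collaboration frequencies in which every scholar has at least one collaborator, and <italic>HASH</italic> is replaced by a dictionary that assigns a fresh integer to each new signature. The function names are hypothetical.</p>
<preformat>
```python
import math


def proximity_labels(freq_i, theta):
    """Split the collaborators of one scholar into H / M / L (Equation 3)."""
    n = len(freq_i)
    mu = sum(freq_i.values()) / n
    sigma = math.sqrt(sum((a - mu) ** 2 for a in freq_i.values()) / n)
    out = {}
    for s_j, a in freq_i.items():
        if a > mu + theta * sigma:
            out[s_j] = "H"
        elif mu - theta * sigma > a:
            out[s_j] = "L"
        else:
            out[s_j] = "M"
    return out


def wl_relabel(freq, depth, theta):
    """Return {scholar: [pattern id at degree 1, ..., degree depth]}."""
    edge_cat = {s: proximity_labels(nb, theta) for s, nb in freq.items()}
    current = {s: 0 for s in freq}   # assumed uniform initial label
    ids = {}                         # stands in for HASH: signature -> id
    patterns = {s: [] for s in freq}
    for _ in range(depth):
        updated = {}
        for s_i in freq:
            groups = {"H": [], "M": [], "L": []}
            for s_j, cat in edge_cat[s_i].items():
                groups[cat].append(current[s_j])
            # Own previous label plus sorted neighbor labels per category.
            signature = (current[s_i],
                         tuple(sorted(groups["H"])),
                         tuple(sorted(groups["M"])),
                         tuple(sorted(groups["L"])))
            # len(ids) + 1 keeps new ids distinct from the initial label 0.
            updated[s_i] = ids.setdefault(signature, len(ids) + 1)
            patterns[s_i].append(updated[s_i])
        current = updated
    return patterns


# Hypothetical star-shaped network centered on scholar A.
freq = {
    "A": {"B": 9, "C": 3, "D": 3, "E": 1},
    "B": {"A": 9},
    "C": {"A": 3},
    "D": {"A": 3},
    "E": {"A": 1},
}
patterns = wl_relabel(freq, depth=2, theta=0.5)
```
</preformat>
<p>In this example the four peripheral scholars receive identical pattern identifiers at both degrees, because each of them sees a single collaborator whose frequency equals their own mean, whereas the central scholar receives different identifiers.</p>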
</sec>
<sec>
<title>3.2. Learning Representations of Collaboration Patterns</title>
<p>Based on the WL relabeling process, we can describe the collaboration patterns of a scholar <italic>s</italic><sub><italic>i</italic></sub> as a multi-set of subgraphs rooted in <italic>s</italic><sub><italic>i</italic></sub>. To compare the collaboration patterns of scholars, one of the most na&#x000EF;ve approaches is to apply similarity measures for categorical data (e.g., the Jaccard index) to examine whether the scholars share identical collaboration patterns. However, since the WL relabeling process assigns nominal labels to the collaboration patterns, such measures can compare only the composition of patterns within each scholar; they cannot tell how similar two distinct patterns are to each other.</p>
<p>To solve this problem, we propose a method for learning representations of collaboration patterns. Embedding the patterns lets us compare the collaboration of scholars easily using similarity measures between vectors. Embedding techniques for entities in graphs (e.g., nodes, subgraphs, and meta-paths) are mostly based on adjacency and proximity between the entities. Although adjacency of subgraphs does not by itself indicate that the corresponding collaboration patterns are similar, vector representations learned from adjacency will reflect the structural features of the subgraphs and of the research groups that include them. <xref ref-type="fig" rid="F2">Figure 2</xref> presents a simple example of how the adjacency between subgraphs can reflect their structural features.</p>
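<p>The limitation of the na&#x000EF;ve approach can be seen in a short Python sketch. It uses a multi-set generalization of the Jaccard index (sum of minimum counts divided by sum of maximum counts); this particular variant, the function name <italic>multiset_jaccard</italic>, and the integer pattern labels are assumptions chosen for illustration.</p>
<preformat>
```python
from collections import Counter


def multiset_jaccard(patterns_a, patterns_b):
    """Jaccard index generalized to multi-sets of nominal pattern labels."""
    a, b = Counter(patterns_a), Counter(patterns_b)
    keys = set(a) | set(b)
    intersection = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return intersection / union if union else 0.0


same = multiset_jaccard([1, 3], [1, 3])      # identical multi-sets
partial = multiset_jaccard([1, 3], [1, 4])   # one shared label out of three
none = multiset_jaccard([2, 4], [5, 6])      # no shared label at all
```
</preformat>
<p>The last case scores zero even if patterns 4 and 6 happened to be structurally almost the same, since nominal labels carry no notion of closeness.</p>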
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>(A,B)</bold> Learning representations of research collaboration patterns. Dotted ellipses indicate the collaboration patterns rooted in gray nodes. For embedding <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, collaboration patterns of <italic>s</italic><sub><italic>a</italic></sub> and <italic>s</italic><sub><italic>i</italic></sub> have different structures. In the WL relabeling process, labels of the collaboration patterns can provide information only about <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02260;</mml:mo><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>. To compare the collaboration patterns, we attempt to learn representations of the patterns based on their adjacency. 
Since the neighborhoods of <italic>s</italic><sub><italic>a</italic></sub> and <italic>s</italic><sub><italic>i</italic></sub> have similar local structures, <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M19"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are located close to each other in the vector space despite their structural differences.</p></caption>
<graphic xlink:href="fdata-02-00039-g0002.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="F2">Figures 2A,B</xref>, collaboration patterns are described by the adjacency between scholars, and the collaboration patterns are themselves adjacent to one another. In <xref ref-type="fig" rid="F2">Figure 2</xref>, <inline-formula><mml:math id="M41"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M42"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> have different structures, but the neighborhoods of <inline-formula><mml:math id="M43"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M44"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are structurally identical. 
If we apply only the WL relabeling, we learn only that <inline-formula><mml:math id="M45"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M46"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are not identical. Nevertheless, by observing the neighborhoods of <inline-formula><mml:math id="M47"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M48"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, we can see that they are structurally similar. In other words, we can identify whether the collaboration patterns have similar meanings. 
Thus, if we allocate close vector coordinates to adjacent collaboration patterns, <inline-formula><mml:math id="M49"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M50"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> will have similar vector representations, conclusively. Thereby, <inline-formula><mml:math id="M51"><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M52"><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, which are vector representations of <inline-formula><mml:math id="M53"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M54"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, will be able to reflect structural features of research groups including <italic>s</italic><sub><italic>a</italic></sub> and <italic>s</italic><sub><italic>i</italic></sub>.</p>
<p>We learn representations of the collaboration patterns using Subgraph2Vec (Narayanan et al., <xref ref-type="bibr" rid="B21">2016</xref>), a well-known algorithm based on the adjacency between subgraphs. For embedding, Subgraph2Vec employs radial skip-gram and negative sampling. The radial skip-gram is a modification of the original skip-gram in Word2Vec (Mikolov et al., <xref ref-type="bibr" rid="B20">2013</xref>). In language processing, the adjacency of words is determined with a fixed window size. In graph data such as co-authorship networks, on the other hand, the number of adjacent subgraphs varies. The radial skip-gram therefore handles the varying number of adjacent collaboration patterns with a window of unfixed size. In addition, we compose the neighborhood of <italic>s</italic><sub><italic>i</italic></sub> at degree <italic>d</italic> from its adjacent patterns at degrees <italic>d</italic> &#x02212; 1 to <italic>d</italic> &#x0002B; 1, to consider the meanings of collaboration patterns on various scales. Negative sampling is applied to reduce the computational complexity of the learning process. The co-occurrence probability of an arbitrary collaboration pattern (<inline-formula><mml:math id="M55"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) as a neighbor of <italic>s</italic><sub><italic>i</italic></sub> at degree <italic>d</italic> is formulated as:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M56"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>&#x02243;</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x022BA;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C3;(&#x000B7;) indicates the sigmoid function, and &#x003A6;(&#x000B7;) denotes a projection function for the vector representations.</p>
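<p>As a minimal sketch of Equation (6), the following Python fragment evaluates the sigmoid of the inner product of two embedding vectors. It assumes that the vector representations are plain lists of floats; the function names and the toy vectors are hypothetical and are not taken from the Subgraph2Vec implementation.</p>
<preformat>
```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def cooccurrence_probability(phi_neighbor, phi_target):
    """Approximate P(S_a | Phi(s_i^(d))) as sigma(Phi(S_a)^T Phi(s_i^(d)))."""
    inner = sum(a * b for a, b in zip(phi_neighbor, phi_target))
    return sigmoid(inner)


# Toy 3-dimensional embeddings (illustrative values only).
phi_target = [0.5, -0.2, 0.1]
phi_close = [0.6, -0.1, 0.2]    # similar direction to the target
phi_far = [-0.6, 0.3, -0.2]     # roughly opposite direction

p_close = cooccurrence_probability(phi_close, phi_target)
p_far = cooccurrence_probability(phi_far, phi_target)
```
</preformat>
<p>In this toy setting, a pattern whose embedding points in a similar direction to the target obtains a probability above 0.5, and one pointing in the opposite direction obtains a probability below 0.5.</p>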
<p>By modifying the skip-gram and negative sampling (Mikolov et al., <xref ref-type="bibr" rid="B20">2013</xref>), we define an objective function for embedding the collaboration patterns. We maximize the occurrence probability for the neighborhoods and minimize the probability for collaboration patterns that are not neighboring. This is formulated as:</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M57"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>a</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>a</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:msub><mml:mi 
mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>&#x02209;</mml:mo><mml:mi mathvariant="-tex-caligraphic">N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02243;</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mo>&#x02200;</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>a</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">N</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi></mml:mrow></mml:mstyle><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x003A6;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x022BA;</mml:mo></mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>+</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mo>&#x1D53C;</mml:mo><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>~</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mi>&#x003C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mo>&#x003A6;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi 
mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x022BA;</mml:mo></mml:msup><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M60"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0221D;</mml:mo><mml:mi>U</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:math></inline-formula> denotes a noise distribution of collaboration patterns, <inline-formula><mml:math id="M61"><mml:mi>U</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> refers to a unigram distribution of all the collaboration patterns, and <inline-formula><mml:math id="M62"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> indicates a set of collaboration patterns that are in neighborhoods. 
This objective function brings <inline-formula><mml:math id="M63"><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M64"><mml:mo>&#x003A6;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> closer to each other when <inline-formula><mml:math id="M65"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M66"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are neighboring; otherwise, it pushes them apart. We have not significantly modified the objective function or the learning method of Subgraph2Vec; we have only modified and extended the WL relabeling process to apply Subgraph2Vec to co-authorship networks. The contribution of this study lies in extracting and comparing the collaboration patterns, not in proposing a novel representation learning method. Therefore, to avoid redundancy, we do not present the detailed procedures for learning the representations.</p>
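<p>The approximated form of Equation (7) can be illustrated with the following Python sketch. It is not the Subgraph2Vec code: the pattern identifiers, occurrence counts, and embedding values are hypothetical, the expectation over the noise distribution is estimated with one sample per negative term, and sampled negatives are not filtered against the neighborhood.</p>
<preformat>
```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def noise_distribution(counts):
    """P_n(S) proportional to U(S)^(3/4), with U the unigram distribution."""
    total = sum(counts.values())
    weights = {s: (c / total) ** 0.75 for s, c in counts.items()}
    norm = sum(weights.values())
    return {s: w / norm for s, w in weights.items()}


def objective(target, neighbors, embeddings, noise, k, rng):
    """Monte Carlo estimate of the approximated objective for one target.

    One positive term per neighboring pattern, plus k negative terms whose
    patterns are drawn from the noise distribution.
    """
    phi_t = embeddings[target]
    positive = sum(log_sigmoid(dot(embeddings[s], phi_t)) for s in neighbors)
    patterns = list(noise)
    probs = [noise[s] for s in patterns]
    sampled = rng.choices(patterns, weights=probs, k=k)
    negative = sum(log_sigmoid(-dot(embeddings[s], phi_t)) for s in sampled)
    return positive + negative


# Hypothetical occurrence counts and 2-dimensional embeddings.
counts = {"p0": 8, "p1": 4, "p2": 2, "p3": 2}
embeddings = {
    "p0": [0.4, 0.1],
    "p1": [0.5, 0.2],
    "p2": [-0.3, 0.6],
    "p3": [-0.5, -0.4],
}
noise = noise_distribution(counts)
value = objective("p0", ["p1"], embeddings, noise, k=5, rng=random.Random(0))
```
</preformat>
<p>Raising the unigram distribution to the power 3/4 flattens it, so that frequent patterns are sampled as negatives somewhat less often, and rare patterns somewhat more often, than their raw frequencies would suggest. Each term is the logarithm of a sigmoid, so the estimate is never positive, and maximizing it drives it toward zero.</p>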
</sec>
</sec>
<sec id="s4">
<title>4. Evaluation</title>
<p>We have attempted to validate the correlation between the performance of scholars and their research collaboration patterns. For the validation, we clustered the scholars according to the vector representations of their collaboration patterns. Subsequently, we compared the clusters with quantitative indicators of research performance, to examine whether scholars in a cluster exhibit similar research performance. To conduct the comparison, we applied the following indicators: (i) the number of papers written by each scholar, (ii) the total number of citations for all papers written by each scholar, (iii) the average number of citations for all papers written by each scholar, (iv) PageRank (Haveliwala, <xref ref-type="bibr" rid="B12">2002</xref>), (v) betweenness centrality (Freeman, <xref ref-type="bibr" rid="B8">1977</xref>), and (vi) closeness centrality (Sabidussi, <xref ref-type="bibr" rid="B24">1966</xref>). The centrality measurements are calculated for each scholar in the co-authorship network. As a preliminary study, we restrict our observation to a small part of the bibliographic network. This restriction makes it challenging to measure count-based indicators or to acquire the indicators from external bibliography databases (e.g., Web of Science).</p>
<p>For the experiment, we collected bibliography data from the DBLP dataset<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> for well-known conferences (e.g., ICDE, SIGMOD, and VLDB) over the last 5 years. The dataset contains rich bibliographic information, including the authors, titles, publication years, and venues. The number of citations for the collected papers was acquired from Scopus<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. <xref ref-type="table" rid="T1">Table 1</xref> presents statistical features of the collected dataset. We implemented the proposed model by modifying an open-source project of Subgraph2Vec<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. The implemented model is also publicly accessible<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref>. The proposed method has several hyper-parameters, which we determined heuristically: the number of epochs (&#x003F5;): 10, the learning rate (&#x003B7;): 0.025, the number of dimensions (&#x003B4;): 256, the maximum degree (<italic>D</italic>): 3, the number of negative samples (<italic>k</italic>): 200, and the weighting factor (&#x003B8;): 0.25.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Descriptions of the experimental dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Statistics</bold></th>
<th valign="top" align="center"><bold>Venues</bold></th>
<th valign="top" align="center"><bold>Number of publications</bold></th>
<th valign="top" align="center"><bold>Number of scholars</bold></th>
<th valign="top" align="center"><bold>Time span</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Value</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">2896</td>
<td valign="top" align="center">5884</td>
<td valign="top" align="center">2014&#x02013;2018</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The experimental procedure consists of four steps. First, we extracted the collaboration patterns of all the collected scholars based on their adjacency and proximity. Second, we composed vector representations of the scholars by learning representations of the collaboration patterns and concatenating the representations of the patterns rooted in each scholar. Third, we clustered the scholars based on the vector representations, using the Gaussian Mixture Model and the Expectation-Maximization algorithm. The number of clusters was set to 16 by minimizing the external adjacency between clusters. Lastly, we analyzed whether scholars in each cluster have a similar research style, based on the quantitative indicators. <xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="fig" rid="F3">Figure 3</xref> present the experimental results.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Experimental results for coherence of the research performance of scholars in each cluster.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th/>
<th valign="top" align="center"><bold>C&#x00023;0</bold></th>
<th valign="top" align="center"><bold>C&#x00023;1</bold></th>
<th valign="top" align="center"><bold>C&#x00023;2</bold></th>
<th valign="top" align="center"><bold>C&#x00023;3</bold></th>
<th valign="top" align="center"><bold>C&#x00023;4</bold></th>
<th valign="top" align="center"><bold>C&#x00023;5</bold></th>
<th valign="top" align="center"><bold>C&#x00023;6</bold></th>
<th valign="top" align="center"><bold>C&#x00023;7</bold></th>
<th valign="top" align="center"><bold>C&#x00023;8</bold></th>
<th valign="top" align="center"><bold>C&#x00023;9</bold></th>
<th valign="top" align="center"><bold>C&#x00023;10</bold></th>
<th valign="top" align="center"><bold>C&#x00023;11</bold></th>
<th valign="top" align="center"><bold>C&#x00023;12</bold></th>
<th valign="top" align="center"><bold>C&#x00023;13</bold></th>
<th valign="top" align="center"><bold>C&#x00023;14</bold></th>
<th valign="top" align="center"><bold>C&#x00023;15</bold></th>
</tr>
</thead>
<tbody>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Num</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">1.90</td>
<td valign="top" align="center">0.36</td>
<td valign="top" align="center">8.70</td>
<td valign="top" align="center">0.12</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.11</td>
<td valign="top" align="center">0.19</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">6.66</td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.31</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">0.53</td>
<td valign="top" align="center">2.42</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center"><bold>10.33</bold></td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">0.67</td>
<td valign="top" align="center">0.59</td>
<td valign="top" align="center">0.58</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center"><bold>9.98</bold></td>
<td valign="top" align="center">0.62</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.88</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="center">0.80</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Sum</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">3.59</td>
<td valign="top" align="center">2.57</td>
<td valign="top" align="center">6.96</td>
<td valign="top" align="center">1.17</td>
<td valign="top" align="center">1.08</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.68</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">1.30</td>
<td valign="top" align="center">5.58</td>
<td valign="top" align="center">0.99</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">1.59</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">1.29</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">1.62</td>
<td valign="top" align="center">7.26</td>
<td valign="top" align="center">5.74</td>
<td valign="top" align="center"><bold>10.42</bold></td>
<td valign="top" align="center">2.23</td>
<td valign="top" align="center">1.60</td>
<td valign="top" align="center">1.13</td>
<td valign="top" align="center">0.87</td>
<td valign="top" align="center">1.80</td>
<td valign="top" align="center">1.82</td>
<td valign="top" align="center"><bold>8.52</bold></td>
<td valign="top" align="center">2.23</td>
<td valign="top" align="center">1.30</td>
<td valign="top" align="center">2.86</td>
<td valign="top" align="center">1.20</td>
<td valign="top" align="center">1.90</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">Avg</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">2.29</td>
<td valign="top" align="center">4.52</td>
<td valign="top" align="center">6.02</td>
<td valign="top" align="center">3.15</td>
<td valign="top" align="center">2.78</td>
<td valign="top" align="center">2.40</td>
<td valign="top" align="center">1.53</td>
<td valign="top" align="center">1.59</td>
<td valign="top" align="center">2.05</td>
<td valign="top" align="center">2.63</td>
<td valign="top" align="center">3.05</td>
<td valign="top" align="center">2.13</td>
<td valign="top" align="center">2.11</td>
<td valign="top" align="center">3.35</td>
<td valign="top" align="center">1.65</td>
<td valign="top" align="center">2.81</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">3.96</td>
<td valign="top" align="center"><bold>7.98</bold></td>
<td valign="top" align="center"><bold>14.26</bold></td>
<td valign="top" align="center">4.40</td>
<td valign="top" align="center">5.45</td>
<td valign="top" align="center">3.41</td>
<td valign="top" align="center">2.68</td>
<td valign="top" align="center">2.14</td>
<td valign="top" align="center">4.20</td>
<td valign="top" align="center">3.11</td>
<td valign="top" align="center">2.99</td>
<td valign="top" align="center">3.75</td>
<td valign="top" align="center">3.02</td>
<td valign="top" align="center">6.55</td>
<td valign="top" align="center">2.74</td>
<td valign="top" align="center">4.41</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">PR</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">4.48</td>
<td valign="top" align="center">11.18</td>
<td valign="top" align="center">9.22</td>
<td valign="top" align="center">16.78</td>
<td valign="top" align="center">5.70</td>
<td valign="top" align="center">6.52</td>
<td valign="top" align="center">4.53</td>
<td valign="top" align="center">6.88</td>
<td valign="top" align="center">7.17</td>
<td valign="top" align="center">7.25</td>
<td valign="top" align="center">13.18</td>
<td valign="top" align="center">4.32</td>
<td valign="top" align="center">4.60</td>
<td valign="top" align="center">6.80</td>
<td valign="top" align="center">7.69</td>
<td valign="top" align="center">6.99</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">1.20</td>
<td valign="top" align="center">4.24</td>
<td valign="top" align="center">1.98</td>
<td valign="top" align="center"><bold>13.39</bold></td>
<td valign="top" align="center">2.10</td>
<td valign="top" align="center">1.97</td>
<td valign="top" align="center">3.77</td>
<td valign="top" align="center">2.49</td>
<td valign="top" align="center">2.57</td>
<td valign="top" align="center">1.76</td>
<td valign="top" align="center"><bold>11.60</bold></td>
<td valign="top" align="center">2.97</td>
<td valign="top" align="center">2.31</td>
<td valign="top" align="center">1.56</td>
<td valign="top" align="center">1.59</td>
<td valign="top" align="center">2.06</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">BC</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.68</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">3.83</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.13</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">1.71</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.13</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">1.87</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center"><bold>8.37</bold></td>
<td valign="top" align="center">0.51</td>
<td valign="top" align="center">0.74</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.83</td>
<td valign="top" align="center"><bold>5.36</bold></td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.09</td>
<td valign="top" align="center">0.47</td>
</tr> <tr style="border-top: thin solid #000000;">
<td valign="top" align="left">CC</td>
<td valign="top" align="left">&#x003BC;</td>
<td valign="top" align="center">54.39</td>
<td valign="top" align="center">58.28</td>
<td valign="top" align="center">45.34</td>
<td valign="top" align="center">66.09</td>
<td valign="top" align="center">46.66</td>
<td valign="top" align="center">48.20</td>
<td valign="top" align="center">24.72</td>
<td valign="top" align="center">17.44</td>
<td valign="top" align="center">11.42</td>
<td valign="top" align="center">57.34</td>
<td valign="top" align="center">63.20</td>
<td valign="top" align="center">37.44</td>
<td valign="top" align="center">44.19</td>
<td valign="top" align="center">55.20</td>
<td valign="top" align="center">14.48</td>
<td valign="top" align="center">47.08</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">&#x003C3;</td>
<td valign="top" align="center">19.04</td>
<td valign="top" align="center">17.80</td>
<td valign="top" align="center"><bold>28.35</bold></td>
<td valign="top" align="center">16.55</td>
<td valign="top" align="center">26.36</td>
<td valign="top" align="center">23.98</td>
<td valign="top" align="center"><bold>27.50</bold></td>
<td valign="top" align="center">26.50</td>
<td valign="top" align="center">22.44</td>
<td valign="top" align="center">17.16</td>
<td valign="top" align="center">17.31</td>
<td valign="top" align="center">27.38</td>
<td valign="top" align="center">25.21</td>
<td valign="top" align="center">19.48</td>
<td valign="top" align="center">24.17</td>
<td valign="top" align="center">26.36</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Coherence is shown indirectly by the mean (&#x003BC;) and standard deviation (&#x003C3;) of the six quantitative indicators: the number of papers (Num), the total number of citations (Sum), the average number of citations (Avg), PageRank (PR), betweenness centrality (BC), and closeness centrality (CC). For readability, cells report &#x003BC; and &#x003C3; multiplied by 10<sup>2</sup>. Bold values indicate the highest and second-highest values</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Distribution of the quantitative indicators for scholars in each cluster. Box-and-whisker plots indicate distributions of indicator values, and dots denote outliers. <bold>(A)</bold> The number of papers, <bold>(B)</bold> the total number of citations, <bold>(C)</bold> the average number of citations, <bold>(D)</bold> PageRank, <bold>(E)</bold> betweenness centrality, and <bold>(F)</bold> closeness centrality.</p></caption>
<graphic xlink:href="fdata-02-00039-g0003.tif"/>
</fig>
<p><xref ref-type="table" rid="T2">Table 2</xref> presents the mean and standard deviation of each indicator for the scholars in each cluster. While most of the clusters had a very low standard deviation, two clusters had a much higher standard deviation than the others. Except for the closeness centrality, clusters with a higher mean for an indicator also had a higher variance for that indicator. This result arises because most of the scholars had low performance (e.g., 3870 of 5884 scholars wrote only one paper), whereas high-performance scholars exhibited extremely varied indicator values, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
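<p>As a minimal sketch of how per-cluster summaries such as those in <xref ref-type="table" rid="T2">Table 2</xref> could be computed, the following Python snippet groups one indicator by cluster and reports the mean and standard deviation multiplied by 10<sup>2</sup>. The function name cluster_summary, the toy values, and the use of the population standard deviation are assumptions made for illustration; the estimator actually used is not stated here.</p>
<preformat>
```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_summary(records, scale=100.0):
    """Return {cluster: (mean, std)} for one indicator, both scaled by `scale`.

    records: iterable of (cluster_label, indicator_value) pairs.
    The population standard deviation is an assumption of this sketch.
    """
    by_cluster = defaultdict(list)
    for label, value in records:
        by_cluster[label].append(value)
    return {
        label: (scale * mean(values), scale * pstdev(values))
        for label, values in by_cluster.items()
    }


# Hypothetical toy values for a single indicator (not data from the study).
toy = [("C#3", 0.25), ("C#3", 0.75), ("C#8", 0.125), ("C#8", 0.125)]
summary = cluster_summary(toy)
```
</preformat>
<p>In this toy input, the cluster whose members all share the same value has a standard deviation of zero, whereas the cluster with spread-out values has both a higher mean and a higher standard deviation.</p>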
<p><xref ref-type="fig" rid="F3">Figure 3</xref> presents the distribution of the quantitative indicators for scholars in each cluster using box-and-whisker plots. Each box spans the 1st quartile to the 3rd quartile of the data, and the horizontal bar marks the 2nd quartile (the median). The ends of the whiskers represent the lowest and highest data points within 1.5 times the interquartile range of the lower and upper quartiles. Additionally, we show outliers, which are data points outside the whisker range. The scholars in C&#x00023;3 and C&#x00023;10 had the highest variance and the largest number of outliers. <xref ref-type="fig" rid="F3">Figure 3A</xref> shows that the scholars in C&#x00023;3 and C&#x00023;10 wrote far more papers than those in the other clusters. In our dataset, most of the scholars wrote one or two papers. However, productive scholars wrote many more papers than the others, and the number of papers written by the productive ones varied greatly. <xref ref-type="fig" rid="F3">Figure 3B</xref> indicates that the scholars in C&#x00023;3 and C&#x00023;10 received many citations for their papers. This result may be explained by the large number of papers written by the members of C&#x00023;3 and C&#x00023;10. At the same time, however, their average number of citations is relatively small, as displayed in <xref ref-type="fig" rid="F3">Figure 3C</xref>. We also examined whether the scholars in C&#x00023;3 and C&#x00023;10 were distinctive regarding the structure of the co-authorship network. The scholars in C&#x00023;3 and C&#x00023;10 are closely connected to other significant scholars, as revealed by the PageRank algorithm in <xref ref-type="fig" rid="F3">Figure 3D</xref>. They also had higher betweenness centrality than the others (<xref ref-type="fig" rid="F3">Figure 3E</xref>), which indicates that they participated in larger research groups than the others. In <xref ref-type="fig" rid="F3">Figure 3F</xref>, the closeness centrality shows that they directly collaborated with a large number of scholars relative to the scale of their research groups. These results imply that members of C&#x00023;3 and C&#x00023;10 might be closely connected and compose large sub-networks.</p>
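<p>The box-and-whisker convention described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name box_plot_stats and the paper counts are hypothetical, and the quartile interpolation rule (the "inclusive" method of the standard library) is an assumption, since plotting tools differ on this point.</p>
<preformat>
```python
from statistics import quantiles


def box_plot_stats(values):
    """Summarize values the way a box-and-whisker plot draws them."""
    data = sorted(values)
    # Quartile interpolation differs between tools; "inclusive" is one choice.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie within the fences.
    inside = [v for v in data if high_fence >= v >= low_fence]
    outliers = [v for v in data if v > high_fence or low_fence > v]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical paper counts for one cluster (not data from the study).
papers = [1, 1, 1, 1, 1, 2, 2, 3, 20]
stats = box_plot_stats(papers)
```
</preformat>
<p>With these toy counts, the single scholar with 20 papers lies above the upper whisker and would be drawn as a dot, while the box itself stays between one and two papers.</p>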
<p>C&#x00023;1 and C&#x00023;2 also showed interesting characteristics. In <xref ref-type="fig" rid="F3">Figure 3A</xref>, the scholars in C&#x00023;1 and C&#x00023;2 wrote a small number of papers. On the other hand, in <xref ref-type="fig" rid="F3">Figure 3B</xref>, they had a large number of citations relative to the number of papers. In particular, in <xref ref-type="fig" rid="F3">Figure 3C</xref>, most of the scholars who exhibited a large average number of citations belonged to C&#x00023;1 and C&#x00023;2. In other words, the scholars in C&#x00023;1 and C&#x00023;2 participated in a small number of papers that obtained a large number of citations. These results suggest that they generally concentrated on the quality of papers rather than the number of papers. From this perspective, the scholars in C&#x00023;1 and C&#x00023;2 achieved high performance in a different way from the scholars in C&#x00023;3 and C&#x00023;10. The network-based indicators also showed the difference. As shown in <xref ref-type="fig" rid="F3">Figure 3E</xref>, the members of C&#x00023;1 and C&#x00023;2 had relatively smaller research groups than those of C&#x00023;3 and C&#x00023;10. Although C&#x00023;3 and C&#x00023;10 showed similar tendencies for all the indicators, C&#x00023;1 and C&#x00023;2 differed from each other in the PageRank and closeness centrality. In <xref ref-type="fig" rid="F3">Figure 3F</xref>, the scholars in C&#x00023;1 had many collaborations within their research groups. In contrast, the scholars in C&#x00023;2 showed no clear tendency regarding direct collaborations, considering the high variance in their closeness centrality. As shown in <xref ref-type="fig" rid="F3">Figure 3D</xref>, the scholars in C&#x00023;1 had stronger relationships with their collaborators than those in C&#x00023;2.</p>
<p>Furthermore, scholars in C&#x00023;8 obtained low scores on most of the indicators, since they wrote only one paper that was infrequently cited. Nevertheless, in <xref ref-type="fig" rid="F3">Figure 3F</xref>, C&#x00023;8 had many outliers, although most of its other members had closeness centrality near 0. In other words, most of the scholars in C&#x00023;8 participated in a paper that had a short author list.</p>
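<p>To illustrate why a scholar whose only paper has a short author list can obtain closeness centrality near 0, the following Python sketch builds a toy co-authorship network and computes closeness by breadth-first search. The author names, the paper lists, and the scaling by the fraction of reachable scholars (a normalization often attributed to Wasserman and Faust) are assumptions of this sketch; how disconnected components were handled in the experiments is not specified here.</p>
<preformat>
```python
from collections import deque
from itertools import combinations


def coauthorship(papers):
    """Build an undirected co-authorship graph from lists of author names."""
    adjacency = {}
    for authors in papers:
        for author in authors:
            adjacency.setdefault(author, set())
        for left, right in combinations(authors, 2):
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency


def closeness(adjacency, source):
    """Closeness of `source`, scaled by the share of scholars it can reach."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    reachable = len(distance) - 1
    if reachable == 0:
        return 0.0
    total = sum(distance.values())
    # Assumed normalization: small components are penalized.
    fraction = reachable / (len(adjacency) - 1)
    return fraction * reachable / total


# Hypothetical papers: one isolated two-author paper and two linked papers.
papers = [["a", "b"], ["c", "d", "e", "f"], ["f", "g", "h"]]
network = coauthorship(papers)
scores = {author: closeness(network, author) for author in network}
```
</preformat>
<p>In this toy network, the two authors of the isolated two-author paper obtain a closeness of 1/7, lower than every scholar in the larger component; in a network with thousands of scholars, the same situation would yield values close to 0.</p>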
<p>In conclusion, by clustering the collaboration patterns, we have examined whether the collaboration patterns are correlated not only with the performance of scholars but also with their styles of research and collaboration. In both cases, four clusters (C&#x00023;1, C&#x00023;2, C&#x00023;3, and C&#x00023;10) included scholars who exhibited high performance. However, in terms of the number of publications, the scholars in C&#x00023;3 and C&#x00023;10 showed higher performance than those in C&#x00023;1 and C&#x00023;2. The opposite holds in terms of the quality of papers. Regarding the structure of research groups, the scholars in C&#x00023;3 and C&#x00023;10 had large research groups, were directly connected to group members, and had collaborators with high centrality. In C&#x00023;1 and C&#x00023;2, the scholars had smaller research groups and fewer adjacent scholars than in the former case. While the existing indicators reduce research performance to a few features, this result demonstrates that the proposed method can reflect various aspects of research performance.</p>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusion</title>
<p>In this study, we have attempted to discover and represent the research collaboration patterns of scholars. To this end, we have proposed a method for learning vector representations of the collaboration patterns rooted in scholars. To demonstrate the efficacy of the method, we clustered the scholars according to their collaboration patterns and compared the clusters with the existing quantitative indicators of research performance. Based on the comparison, we could partially validate whether the collaboration styles of scholars are correlated with their performance.</p>
<p>The proposed method and evaluation procedures have a few limitations. First, we did not conduct a quantitative evaluation and therefore could not rigorously verify the research question. To validate whether collaboration patterns are correlated with the research performance of scholars, we need a way to evaluate their relevance. Second, although we clustered the scholars, we did not propose a novel indicator for evaluating the collaboration patterns; we do not yet know which collaboration patterns help improve research performance. Third, the bibliographic network has time-sequential features that change dynamically. However, since the proposed method does not account for these dynamics, it treats outdated publications and collaborations in the same way as recent ones. These limitations should be addressed in future research.</p>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The datasets analyzed in this study can be found in DBLP [<ext-link ext-link-type="uri" xlink:href="https://dblp.uni-trier.de">https://dblp.uni-trier.de</ext-link>] and Scopus [<ext-link ext-link-type="uri" xlink:href="https://www.scopus.com">https://www.scopus.com</ext-link>].</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>H-JJ and O-JL conceived of the presented idea and developed the theory, discussed the results, and contributed to the final manuscript. The experiments were conceived by O-JL and conducted by H-JJ. JJ supervised the findings of this work. All authors reviewed the manuscript. JJ and O-JL provided critical feedback.</p>
<sec>
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abbasi</surname> <given-names>A.</given-names></name> <name><surname>Altmann</surname> <given-names>J.</given-names></name> <name><surname>Hwang</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>Evaluating scholars based on their academic collaboration activities: two indices, the RC-index and the CC-index, for quantifying collaboration activities of researchers and scientific communities</article-title>. <source>Scientometrics</source> <volume>83</volume>, <fpage>1</fpage>&#x02013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-009-0139-2</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bihari</surname> <given-names>A.</given-names></name> <name><surname>Tripathi</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>EM-index: a new measure to evaluate the scientific impact of scientists</article-title>. <source>Scientometrics</source> <volume>112</volume>, <fpage>659</fpage>&#x02013;<lpage>677</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-017-2379-x</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Biswal</surname> <given-names>A. K.</given-names></name></person-group> (<year>2013</year>). <article-title>An absolute index (ab-index) to measure a researcher&#x00027;s useful contributions and productivity</article-title>. <source>PLoS ONE</source> <volume>8</volume>:<fpage>e84334</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0084334</pub-id><pub-id pub-id-type="pmid">24391941</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bordons</surname> <given-names>M.</given-names></name> <name><surname>Aparicio</surname> <given-names>J.</given-names></name> <name><surname>Gonz&#x000E1;lez-Albo</surname> <given-names>B.</given-names></name> <name><surname>D&#x000ED;az-Faes</surname> <given-names>A. A.</given-names></name></person-group> (<year>2015</year>). <article-title>The relationship between the research performance of scientists and their position in co-authorship networks in three fields</article-title>. <source>J. Informetr.</source> <volume>9</volume>, <fpage>135</fpage>&#x02013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1016/j.joi.2014.12.001</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Cronin</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>Popular and/or prestigious? measures of scholarly esteem</article-title>. <source>Inform. Process. Manage.</source> <volume>47</volume>, <fpage>80</fpage>&#x02013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1016/j.ipm.2010.01.002</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Egghe</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>An improvement of the h-index: the g-index</article-title>. <source>ISSI Newslett.</source> <volume>2</volume>, <fpage>8</fpage>&#x02013;<lpage>9</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://issi-society.org/media/1183/newsletter06.pdf">http://issi-society.org/media/1183/newsletter06.pdf</ext-link></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Erjia</surname> <given-names>Y.</given-names></name> <name><surname>Ying</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>Applying centrality measures to impact analysis: a coauthorship network analysis</article-title>. <source>J. Am. Soc. Inform. Sci. Technol.</source> <volume>60</volume>, <fpage>2107</fpage>&#x02013;<lpage>2118</lpage>. <pub-id pub-id-type="doi">10.1002/asi.v60:10</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freeman</surname> <given-names>L. C.</given-names></name></person-group> (<year>1977</year>). <article-title>A set of measures of centrality based on betweenness</article-title>. <source>Sociometry</source> <volume>40</volume>:<fpage>35</fpage>. <pub-id pub-id-type="doi">10.2307/3033543</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Galam</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Tailor based allocations for multiple authorship: a fractional gh-index</article-title>. <source>Scientometrics</source> <volume>89</volume>, <fpage>365</fpage>&#x02013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-011-0447-1</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ganesh</surname> <given-names>J.</given-names></name> <name><surname>Ganguly</surname> <given-names>S.</given-names></name> <name><surname>Gupta</surname> <given-names>M.</given-names></name> <name><surname>Varma</surname> <given-names>V.</given-names></name> <name><surname>Pudi</surname> <given-names>V.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Author2vec: Learning author representations by combining content and link information,&#x0201D;</article-title> in <source>Proceedings of the 25th International Conference on World Wide Web (WWW 2016)</source>, eds <person-group person-group-type="editor"><name><surname>Bourdeau</surname> <given-names>J.</given-names></name> <name><surname>Hendler</surname> <given-names>J.</given-names></name> <name><surname>Nkambou</surname> <given-names>R.</given-names></name> <name><surname>Horrocks</surname> <given-names>I.</given-names></name> <name><surname>Zhao</surname> <given-names>B. Y.</given-names></name></person-group> (<publisher-loc>Montreal, QC</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>49</fpage>&#x02013;<lpage>50</lpage>.</citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ganguly</surname> <given-names>S.</given-names></name> <name><surname>Pudi</surname> <given-names>V.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Paper2vec: Combining graph and text information for scientific paper representation,&#x0201D;</article-title> in <source>Advances in Information Retrieval - Proceedings of the 39th European Conference on Information Retrieval (ECIR 2017), volume 10193 of Lecture Notes in Computer Science</source>, eds <person-group person-group-type="editor"><name><surname>Jose</surname> <given-names>J. M.</given-names></name> <name><surname>Hauff</surname> <given-names>C.</given-names></name> <name><surname>Alting&#x000F6;vde</surname> <given-names>I. S.</given-names></name> <name><surname>Song</surname> <given-names>D.</given-names></name> <name><surname>Albakour</surname> <given-names>D.</given-names></name> <name><surname>Watt</surname> <given-names>S. N. K.</given-names></name> <name><surname>Tait</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Aberdeen</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>383</fpage>&#x02013;<lpage>395</lpage>.</citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Haveliwala</surname> <given-names>T. H.</given-names></name></person-group> (<year>2002</year>). <article-title>&#x0201C;Topic-sensitive PageRank,&#x0201D;</article-title> in <source>Proceedings of the eleventh international conference on World Wide Web - WWW &#x00027;02</source>, number 10 in WWW &#x00027;02, eds <person-group person-group-type="editor"><name><surname>Lassner</surname> <given-names>D.</given-names></name> <name><surname>Roure</surname> <given-names>D. D.</given-names></name> <name><surname>Iyengar</surname> <given-names>A.</given-names></name></person-group> (<publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>ACM Press</publisher-name>), <fpage>517</fpage>&#x02013;<lpage>526</lpage>.</citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hirsch</surname> <given-names>J. E.</given-names></name></person-group> (<year>2005</year>). <article-title>An index to quantify an individual&#x00027;s scientific research output</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>102</volume>, <fpage>16569</fpage>&#x02013;<lpage>16572</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0507655102</pub-id><pub-id pub-id-type="pmid">16275915</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hirsch</surname> <given-names>J. E.</given-names></name></person-group> (<year>2010</year>). <article-title>An index to quantify an individual&#x00027;s scientific research output that takes into account the effect of multiple coauthorship</article-title>. <source>Scientometrics</source> <volume>85</volume>, <fpage>741</fpage>&#x02013;<lpage>754</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-010-0193-9</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>Y.</given-names></name> <name><surname>Bu</surname> <given-names>Y.</given-names></name> <name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>Direct citations between citing publications</article-title>. <source>CoRR</source>, abs/1811.01120.</citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname> <given-names>B.</given-names></name> <name><surname>Liang</surname> <given-names>L.</given-names></name> <name><surname>Rousseau</surname> <given-names>R.</given-names></name> <name><surname>Egghe</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>The R- and AR-indices: complementing the h-index</article-title>. <source>Chinese Sci. Bull.</source> <volume>52</volume>, <fpage>855</fpage>&#x02013;<lpage>863</lpage>. <pub-id pub-id-type="doi">10.1007/s11434-007-0145-9</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kosmulski</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>A new Hirsch-type index saves time and works equally well as the original h-index</article-title>. <source>ISSI Newslett.</source> <volume>2</volume>, <fpage>4</fpage>&#x02013;<lpage>6</lpage>.</citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>O.-J.</given-names></name></person-group> (<year>2019</year>). <source>Learning Distributed Representations of Character Networks for Computational Narrative Analytics</source> (Ph.D. thesis). <publisher-name>Chung-Ang University</publisher-name>, <publisher-loc>Seoul, South Korea</publisher-loc>.</citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>O.-J.</given-names></name> <name><surname>Jung</surname> <given-names>J. J.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Character network embedding-based plot structure discovery in narrative multimedia,&#x0201D;</article-title> in <source>Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS 2019)</source>, eds <person-group person-group-type="editor"><name><surname>Akerkar</surname> <given-names>R.</given-names></name> <name><surname>Jung</surname> <given-names>J. J.</given-names></name></person-group> (<publisher-loc>Seoul</publisher-loc>: <publisher-name>ACM</publisher-name>), <volume>15</volume>:<fpage>1</fpage>&#x02013;<lpage>15</lpage>:9.</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mikolov</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <name><surname>Corrado</surname> <given-names>G.</given-names></name> <name><surname>Dean</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;Efficient estimation of word representations in vector space,&#x0201D;</article-title> in <source>Proceedings of the 1st International Conference on Learning Representations, ICLR 2013</source>, eds <person-group person-group-type="editor"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Scottsdale, AZ</publisher-loc>).</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Narayanan</surname> <given-names>A.</given-names></name> <name><surname>Chandramohan</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Saminathan</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs</article-title>. <source>arXiv</source> preprint: 1606.08928.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Newman</surname> <given-names>M. E. J.</given-names></name></person-group> (<year>2001</year>). <article-title>The structure of scientific collaboration networks</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>98</volume>, <fpage>404</fpage>&#x02013;<lpage>409</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.98.2.404</pub-id><pub-id pub-id-type="pmid">11149952</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reyes-Gonzalez</surname> <given-names>L.</given-names></name> <name><surname>Gonzalez-Brambila</surname> <given-names>C. N.</given-names></name> <name><surname>Veloso</surname> <given-names>F.</given-names></name></person-group> (<year>2016</year>). <article-title>Using co-authorship and citation analysis to identify research groups: a new way to assess performance</article-title>. <source>Scientometrics</source> <volume>108</volume>, <fpage>1171</fpage>&#x02013;<lpage>1191</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-016-2029-8</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sabidussi</surname> <given-names>G.</given-names></name></person-group> (<year>1966</year>). <article-title>The centrality index of a graph</article-title>. <source>Psychometrika</source> <volume>31</volume>, <fpage>581</fpage>&#x02013;<lpage>603</lpage>. <pub-id pub-id-type="doi">10.1007/bf02289527</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shervashidze</surname> <given-names>N.</given-names></name> <name><surname>Schweitzer</surname> <given-names>P.</given-names></name> <name><surname>van Leeuwen</surname> <given-names>E. J.</given-names></name> <name><surname>Mehlhorn</surname> <given-names>K.</given-names></name> <name><surname>Borgwardt</surname> <given-names>K. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Weisfeiler-Lehman graph kernels</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume>, <fpage>2539</fpage>&#x02013;<lpage>2561</lpage>.</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sidiropoulos</surname> <given-names>A.</given-names></name> <name><surname>Katsaros</surname> <given-names>D.</given-names></name> <name><surname>Manolopoulos</surname> <given-names>Y.</given-names></name></person-group> (<year>2007</year>). <article-title>Generalized Hirsch h-index for disclosing latent facts in citation networks</article-title>. <source>Scientometrics</source> <volume>72</volume>, <fpage>253</fpage>&#x02013;<lpage>280</lpage>. <pub-id pub-id-type="doi">10.1007/s11192-007-1722-z</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaidya</surname> <given-names>J. S.</given-names></name></person-group> (<year>2005</year>). <article-title>V-index: a fairer index to quantify an individual&#x00027;s research output capacity</article-title>. <source>BMJ</source> <volume>331</volume>, <fpage>1339</fpage>&#x02013;<lpage>1340</lpage>. <pub-id pub-id-type="doi">10.1136/bmj.331.7528.1339-c</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Waltman</surname> <given-names>L.</given-names></name> <name><surname>Yan</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;PageRank-related methods for analyzing citation networks,&#x0201D;</article-title> in <source>Measuring Scholarly Impact</source>, eds <person-group person-group-type="editor"><name><surname>Ding</surname> <given-names>Y.</given-names></name> <name><surname>Rousseau</surname> <given-names>R.</given-names></name> <name><surname>Wolfram</surname> <given-names>D.</given-names></name></person-group> (<publisher-name>Springer International Publishing</publisher-name>), <fpage>83</fpage>&#x02013;<lpage>100</lpage>.</citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Q.</given-names></name></person-group> (<year>2010</year>). <article-title>The w-index: a measure to assess scientific impact by focusing on widely cited papers</article-title>. <source>J. Am. Soc. Inform. Sci. Technol.</source> <volume>61</volume>, <fpage>609</fpage>&#x02013;<lpage>614</lpage>. <pub-id pub-id-type="doi">10.1002/asi.21276</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>E.</given-names></name> <name><surname>Ding</surname> <given-names>Y.</given-names></name></person-group> (<year>2011</year>). <article-title>Discovering author impact: a PageRank perspective</article-title>. <source>Inform. Process. Manage.</source> <volume>47</volume>, <fpage>125</fpage>&#x02013;<lpage>134</lpage>. <pub-id pub-id-type="doi">10.1016/j.ipm.2010.05.002</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://dblp.uni-trier.de">https://dblp.uni-trier.de</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://www.scopus.com">https://www.scopus.com</ext-link></p></fn>
<fn id="fn0003"><p><sup>3</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/MLDroid/subgraph2vec_gensim">https://github.com/MLDroid/subgraph2vec_gensim</ext-link></p></fn>
<fn id="fn0004"><p><sup>4</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/higd963/Collaboration2Vec">https://github.com/higd963/Collaboration2Vec</ext-link></p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2017R1A2B4010774). Also, this work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017S1A6A3A01078538).</p>
</fn>
</fn-group>
</back>
</article>