Co-Investment Network of ERC-20 Tokens: Network Structure Versus Market Performance

Cryptocurrencies have attracted extensive attention from individual and institutional investors in recent years. In this emerging and inefficient capital market, the roles that institutional investors play can have a remarkable impact on the market. This paper investigates the ERC-20 token investment market from a network perspective. Using a dataset containing 317 ERC-20 tokens and their institutional investors at the end of June 2020, we construct a co-investment network of tokens connected by the sharing of institutional investors. Specifically, we examine whether the tokens’ market embeddedness, measured by their network structural properties, can influence their market performance, as well as whether the tokens’ structural similarity in the co-investment network can influence the similarity of their market performance. Our results indicate that strength centrality, closeness centrality, betweenness centrality, and clustering coefficient have a significant impact on the trading volume and liquidity of the tokens. Moreover, there is a significantly positive correlation between the Jaccard similarity index and the tokens’ market performance similarity. This work demonstrates the non-negligible influence of institutional investors and the diffusion of such influence through co-investment relationships in the cryptocurrency market. We expect this analysis to further enhance the understanding of the inefficiency and vulnerability of this emerging market.


INTRODUCTION
As of the end of 2020, there were more than 7,000 cryptocurrencies in circulation worldwide. The total cryptocurrency market value has exceeded 300 billion US dollars, with a daily trading volume topping 200 billion [1]. However, only a few hundred of these cryptocurrencies run on their own blockchains, while the others reside on Ethereum-like blockchain platforms, which allow users to issue smart contract-based cryptocurrencies, also known as tokens, following token standards such as ERC-20, ERC-721, and ERC-777. The number of smart contract-based tokens on Ethereum exceeded 300,000 as of 2020 [2], though not all of them are publicly traded on cryptocurrency exchanges.
Despite the soaring capitalization, the emerging cryptocurrency market also exhibits extremely high volatility. Hence, finding the driving forces of the market is crucial to understanding the formation and development of cryptocurrencies' prices. The available evidence suggests that this market is highly inefficient. Buchholz et al. [3] claimed that supply and demand in the market are among the main drivers of the Bitcoin price. Wijk [4] emphasized the role of global macroeconomic indicators, e.g., stock indices, exchange rates, and oil prices, in determining Bitcoin's price. He found that the Dow Jones Index, the euro-dollar exchange rate, and the WTI oil price have a long-term and significant effect on Bitcoin's value. Kristoufek [5] found that, in the long run, the price of Bitcoin was significantly and positively correlated with public interest measured by Google Trends and Wikipedia queries, as well as with technical indexes such as hash rates and mining difficulty. Moreover, the cryptocurrency market's performance has also been found to be related to media exposure [6,7], policies and regulations [8,9], and other financial assets [10,11], all revealing the inefficiency of the market.
An inefficient market is easily manipulated, especially by large investors. Compared to individual investors, institutional investors can rely on their capital, talent, and information advantages to profit [12], and they also have a stronger ability to capture and conceal bad news in the market [13]. In the case of ERC-20 tokens, institutional investors play a crucial role in both the primary and secondary markets. In a typical ERC-20 token initial coin offering (ICO) process, i.e., the primary market, the institutional investors first purchase a large chunk of tokens from the issuer and redistribute a proportion to individual investors before public listing, while retaining some tokens for market-making in the secondary market. Institutional investors commonly invest in more than one token to spread their risk among multiple projects. As a result, they act as intermediaries between different tokens, thereby transmitting market influences from one token to another. To the best of our knowledge, there is still a lack of research on the relationship between institutional investors' investment preferences and the performance of the cryptocurrency market.
This paper investigates the impact of institutional investors' dispersed investments on the cryptocurrency market, i.e., how the market performance of individual ERC-20 tokens, e.g., price, volatility, and trading volume, is affected by their sharing of institutional investors. We construct a co-investment network that uses ERC-20 tokens as nodes and the pairwise sharing of institutional investors as edges. From the macroscopic perspective, such a co-investment network offers a panorama of how the institutional investors' influence is distributed. From the microscopic perspective, it allows us to closely examine the intertwined influence of multiple institutional investors on individual ERC-20 tokens.
Specifically, we try to answer two research questions. First, how does the market "embeddedness" of individual ERC-20 tokens, in analogy to Granovetter's market embeddedness of socioeconomic actors [14] and measured by the corresponding nodes' network structural properties, affect the tokens' market performance? Second, do tokens with similar "embeddedness", measured by the similarity of their network structure and therefore experiencing similar market impacts, also exhibit convergent market performance?
The rest of the paper is organized as follows. Section 2 describes the data and their sources. Section 3 describes the research methods, including the selection and calculation of six indicators quantifying market performance, as well as the construction and calculation of the co-investment network. Section 4 presents the empirical results of the research hypothesis in detail and makes an in-depth analysis of the results. Section 5 summarizes the whole paper and discusses the direction of future work.

DATA
The institutional investors' investments in ERC-20 tokens can be obtained from Block123.com [15]. As of June 2020, the website listed 556 cryptocurrency projects, of which 317 are ERC-20 tokens, together with their institutional investors. At the time of data acquisition, all 317 ERC-20 tokens were actively trading in the cryptocurrency market. To the best of the authors' knowledge, Block123.com provides the largest and most complete token-investor relationship dataset that is publicly available. A detailed description of the dataset is given in Supplementary Section 1.
Market data, including the daily closing price, trading volume, and market capitalization (all in USD), of the 317 tokens are obtained from CoinMarketCap.com. Since cryptocurrencies are traded 24/7, we take the last reported price of each day as the daily closing price. The market data range from 1 to 31 July 2020, spanning one month after the acquisition of the token-investor dataset. Moreover, 85% of the 317 tokens are valued in the top 20% of the market.
In addition, we consider three drivers of token prices previously proposed by Liu et al. [16] as extra factors influencing market performance. First, the numbers of the tokens' transactions on the blockchain are used as a proxy for adopters' activity. The data are obtained from Etherscan [2]. Second, indicators of the tokens' attention on social platforms, including Twitter followers, Telegram channel subscribers, Reddit board activities, and website rankings, are used to represent public interest in the tokens. The data are obtained through the CoinGecko API [17]. Third, technical indicators of the cryptocurrency projects, such as GitHub popularity, are used to indicate the blockchain projects' technical development. The data are also obtained from CoinGecko. These control variables are summarized in Table 1. Note that reddit_discussions and tech_score are combined values of similar factors. Details of the combination methods are described in Supplementary Section 2.

ERC-20 Tokens' Market Performance and Similarity
We use six indicators to quantify the market performance of ERC-20 tokens. Daily price $p_t$, trading volume $v_t$, and market capitalization $m_t$ are as provided in the data, while daily return, volatility, and (il)liquidity are defined as follows.
The daily return $r_t$ is defined as the relative change of the daily closing price,
$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}}.$$
The volatility in a $W$-day window is defined as the standard deviation of the daily logarithmic returns in that window, where $r_{\ln,t} = \ln p_t - \ln p_{t-1}$ is the daily logarithmic return and $\bar{r}_{\ln} = \frac{1}{W}\sum_{t=1}^{W} r_{\ln,t}$ is the average return in the $W$-day window. ILLIQ [18] is the most commonly used indicator to measure market liquidity. For a $W$-day window,
$$\mathrm{ILLIQ} = \frac{1}{W}\sum_{t=1}^{W}\frac{|r_t|}{v_t}.$$
ILLIQ is a direct reflection of how sensitive the price is to trading volume: the larger its value, the greater the price change per unit of trading volume. Since the average trading volume is in millions of dollars, we express the volume in units of 1,000,000 USD.
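The indicator definitions in this subsection can be illustrated with a short Python sketch. The function names and the toy price and volume series are hypothetical, and two choices are assumptions rather than the authors' confirmed implementation: the daily return is taken as the simple relative price change, and the volatility divides by W (population standard deviation).

```python
import math

def simple_returns(prices):
    """Daily simple return r_t = (p_t - p_{t-1}) / p_{t-1} (assumed form)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Daily logarithmic return r_ln,t = ln p_t - ln p_{t-1}."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def volatility(log_rets):
    """Standard deviation of log returns in one window (assumption: divide by W)."""
    w = len(log_rets)
    mean = sum(log_rets) / w
    return math.sqrt(sum((r - mean) ** 2 for r in log_rets) / w)

def illiq(rets, volumes_usd):
    """Amihud-style illiquidity: mean of |r_t| / v_t, volume in millions of USD."""
    w = len(rets)
    return sum(abs(r) / (v / 1_000_000) for r, v in zip(rets, volumes_usd)) / w

# Hypothetical data: four closing prices and the volumes of the last three days.
prices = [1.00, 1.10, 1.045, 1.045]
volumes = [2_000_000, 4_000_000, 1_000_000]

rets = simple_returns(prices)
vol = volatility(log_returns(prices))
liq = illiq(rets, volumes)
print(rets, vol, liq)
```

A larger `liq` value means that the same traded volume moves the price more, i.e., the token is less liquid.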
Bitcoin and Ether, the native cryptocurrency of Ethereum, are the leaders in the cryptocurrency market. To capture the similarity between the market performance of two tokens, we use the partial correlation coefficient of their market indicator time series, which eliminates the common influence exerted by the market leader. For example, the partial correlation between two daily return series $r_i$ and $r_j$ is
$$\rho_{r_i,r_j(r_{Ether})} = \frac{\rho_{r_i,r_j} - \rho_{r_i,r_{Ether}}\,\rho_{r_j,r_{Ether}}}{\sqrt{\left(1-\rho_{r_i,r_{Ether}}^{2}\right)\left(1-\rho_{r_j,r_{Ether}}^{2}\right)}},$$
where $\rho_{x,y}$ is the correlation coefficient between series $x$ and $y$, and $r_{Ether}$ is the daily return of Ether, whose influence is to be eliminated. The time series of liquidity and volatility are composed of values calculated on a 3-day window, and the series of the other four indicators are daily values. Refer to Supplementary Section 3 for a detailed analysis.
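To make the role of the partial correlation concrete, the following minimal sketch computes the standard first-order partial correlation of two return series while controlling for a third. The helper names and the toy return series are hypothetical, and the use of the Pearson correlation is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical daily returns: both tokens follow Ether plus token-specific noise.
r_ether = [0.02, -0.01, 0.03, -0.02, 0.01, 0.00]
noise_i = [0.01, 0.00, -0.01, 0.01, 0.00, -0.01]
noise_j = [-0.01, 0.01, 0.00, 0.00, 0.01, -0.01]
r_i = [e + n for e, n in zip(r_ether, noise_i)]
r_j = [e + n for e, n in zip(r_ether, noise_j)]

raw = pearson(r_i, r_j)                      # inflated by the common Ether driver
partial = partial_corr(r_i, r_j, r_ether)    # co-movement left after removing Ether
print(raw, partial)
```

In this toy example the raw correlation is high only because both series track Ether; once Ether is controlled for, the remaining correlation is close to zero.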

Construction of the Co-investment Network
We define the co-investment network $G = (V, E)$, where $V$ is a set of nodes representing the ERC-20 tokens and $E$ is a set of edges connecting the nodes and representing the sharing of institutional investors between two tokens. The edges are weighted by the number of investors shared by the two tokens. Figure 1 shows a visualization of the ERC-20 token co-investment network.
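The construction amounts to projecting the token-investor relation onto the tokens. The sketch below illustrates this with a hypothetical mapping from token names to investor sets; the names and the adjacency-dictionary representation are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical token -> set of institutional investors.
token_investors = {
    "TOKEN_A": {"Fund1", "Fund2", "Fund3"},
    "TOKEN_B": {"Fund2", "Fund3"},
    "TOKEN_C": {"Fund3", "Fund4"},
    "TOKEN_D": {"Fund5"},
}

def build_coinvestment_network(token_investors):
    """Return {token: {neighbor: weight}}, where weight = number of shared investors."""
    adj = {token: {} for token in token_investors}
    for u, v in combinations(sorted(token_investors), 2):
        shared = len(token_investors[u] & token_investors[v])
        if shared > 0:  # an edge exists only if at least one investor is shared
            adj[u][v] = shared
            adj[v][u] = shared
    return adj

network = build_coinvestment_network(token_investors)
print(network)
```

Tokens that share no investor with any other token, such as `TOKEN_D` here, remain isolated nodes.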

Market Embeddedness Measures
The market embeddedness of ERC-20 tokens can be measured by various network structural properties, each reflecting a unique aspect of their market status.
Strength centrality of a node $v$ is defined as
$$s(v) = \sum_{u \in V} w(u, v),$$
where $w(u, v)$ is the weight of the edge connecting nodes $u$ and $v$, i.e., the number of institutional investors shared by these two tokens. A higher strength centrality implies that the token shares more institutional investors with other tokens and, hence, is more commonly selected by institutional investors for their portfolios. When the edge weights $w(u, v)$ are not considered, the strength centrality is equivalent to degree centrality.
Closeness centrality is the reciprocal of the average distance from the node to all other nodes, i.e.,
$$C(v) = \frac{N - 1}{\sum_{u \neq v} d_{u,v}},$$
where $N$ is the number of nodes and $d_{u,v}$ is the shortest path length from node $u$ to $v$. For the unweighted centrality, all edge lengths are considered to be equal. When calculating the weighted centrality, the reciprocal of the edge weight is used as the edge's length. The higher the closeness centrality of a node, the more central it is in the network. Tokens near the center of the network can be affected by other tokens in the whole network faster and more directly than peripheral ones, and therefore receive a more direct influence from market factors.
Betweenness centrality of a node $v$ is the sum, over all pairs of nodes, of the fraction of shortest paths that pass through $v$, i.e.,
$$B(v) = \sum_{s \neq v \neq t} \frac{g_{s,t}(v)}{g_{s,t}},$$
where $g_{s,t}$ is the number of shortest paths from node $s$ to $t$, and $g_{s,t}(v)$ is the number of those paths passing through node $v$. As for closeness centrality, all edge lengths are considered to be equal for the unweighted centrality, and the reciprocal of the edge weight is used as the edge's length for the weighted centrality. Betweenness centrality describes the degree to which a node acts as a mediator between other nodes. A token with high betweenness centrality plays a key role in the investment network, as it passes the market influence between different sectors.
Local clustering coefficient $c(v)$ of node $v$ is the fraction of possible triangles through that node that actually exist, i.e.,
$$c(v) = \frac{2T(v)}{k(v)\left(k(v) - 1\right)},$$
where $T(v)$ is the number of triangles through node $v$ and $k(v)$ is the degree of $v$. From the perspective of structural hole theory [19], the lower the local clustering coefficient of a node, the more structural holes there are around it. The existence of structural holes lets the node dominate the spread of influence among its neighbors. Hence, the lower the clustering coefficient of a token, the greater the influence it passes on to other tokens through shared institutional investors.
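As an illustration of these measures, the sketch below evaluates them on a small hypothetical weighted graph with the NetworkX library. The toy graph and the attribute name `length` are assumptions, and NetworkX is not necessarily the tool used by the authors. Note that, for undirected graphs, NetworkX counts each unordered node pair once in the unnormalized betweenness.

```python
import networkx as nx

# Hypothetical co-investment graph; weight = number of shared investors.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 2), ("A", "C", 1), ("B", "C", 1), ("C", "D", 3),
])

# Edge length for weighted shortest paths: the reciprocal of the edge weight.
for u, v, data in G.edges(data=True):
    data["length"] = 1.0 / data["weight"]

strength = dict(G.degree(weight="weight"))   # sum of incident edge weights
degree = dict(G.degree())                    # unweighted counterpart
closeness_w = nx.closeness_centrality(G, distance="length")
betweenness_w = nx.betweenness_centrality(G, weight="length", normalized=False)
clustering = nx.clustering(G)                # 2 T(v) / (k(v) (k(v) - 1))

for node in G:
    print(node, strength[node], degree[node],
          round(closeness_w[node], 3), betweenness_w[node],
          round(clustering[node], 3))
```

In this toy graph, node C bridges D to the rest of the network, so it has the highest strength, closeness, and betweenness but a lower clustering coefficient than A and B.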

Structural Similarity Measure
The Jaccard index defines the structural similarity between two nodes $u$ and $v$ based on their common neighbors, i.e.,
$$J(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|},$$
where $\Gamma(v)$ is the set of neighbor nodes of node $v$. If directly connected tokens are regarded as belonging to a common portfolio, a higher Jaccard index indicates a higher degree of overlap between the investment portfolios of the two tokens. Therefore, they may be affected by similar market factors through institutional investors.
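A minimal sketch of the Jaccard index on neighbor sets is given below. The adjacency dictionary is a hypothetical example, and returning 0 for two isolated nodes is a convention assumed here, not one stated in the paper.

```python
def jaccard(adj, u, v):
    """Jaccard index of the neighbor sets of u and v."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    # Assumed convention: two isolated nodes have similarity 0.
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical unweighted co-investment graph as node -> neighbor set.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

j_ab = jaccard(adj, "A", "B")
j_bc = jaccard(adj, "B", "C")
j_cd = jaccard(adj, "C", "D")
print(j_ab, j_bc, j_cd)
```

Note that C and D are not directly connected, yet they obtain the highest index here because their neighbor sets overlap the most.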

Structure Properties of the Co-investment Network
Structural properties, i.e., the number of nodes $N$, number of edges $M$, average degree $\langle k \rangle$, network diameter $D$, average path length $L$, average clustering coefficient $\langle c \rangle$, and the density $\rho$, of the ERC-20 co-investment network as of June 2020 are shown in Table 2. For comparative analysis, we also construct 1,000 randomized networks with the same degree distribution as the token network based on the edge rewiring algorithm. We find that the average clustering coefficient $\langle c \rangle$ in the co-investment network is significantly higher than that in the randomized networks, indicating that the co-investment network is a typical small-world network like many real networks [20]. Figure 2 shows the cumulative degree distribution of the co-investment network on a semilog coordinate. CDF($k$) is the proportion of nodes with degree greater than $k$ in the whole network. The distribution follows an exponential function CDF($k$) $\sim e^{-k/36.49}$, based on nonlinear least squares fitting. The Kolmogorov-Smirnov test statistic for the goodness of fit is 0.06 with a corresponding p-value of 0.69.
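The comparison with degree-preserving randomized networks can be sketched as follows. This is an illustration under stated assumptions: NetworkX's `double_edge_swap` stands in for the edge rewiring algorithm, a connected caveman graph stands in for the token network, and the function name, the number of randomized networks, and the number of swaps per edge are hypothetical.

```python
import networkx as nx

def randomized_clustering(G, n_networks=20, swaps_per_edge=10, seed=0):
    """Average clustering of degree-preserving randomizations of G."""
    values = []
    for i in range(n_networks):
        R = G.copy()
        nswap = swaps_per_edge * R.number_of_edges()
        # Each swap replaces edges (u, v), (x, y) by (u, x), (v, y),
        # so every node keeps its degree.
        nx.double_edge_swap(R, nswap=nswap, max_tries=100 * nswap, seed=seed + i)
        values.append(nx.average_clustering(R))
    return values

# Hypothetical stand-in for the co-investment network: a highly clustered graph.
G = nx.connected_caveman_graph(6, 5)
observed = nx.average_clustering(G)
null = randomized_clustering(G)

print(observed, sum(null) / len(null))
```

If the observed value lies far above the values obtained from the randomized networks, the clustering cannot be explained by the degree sequence alone.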

Market Embeddedness Versus Market Performance
In light of the high skewness of the market indicators, we pre-process them before further analysis. The price, market capitalization, and trading volume are log-transformed, and the illiquidity is transformed by a negative logarithm. Furthermore, all variables are standardized as $x_i' = (x_i - \bar{x})/\sigma$, where $\bar{x}$ is the mean value and $\sigma$ is the standard deviation.
We adopt ordinary least squares (OLS) linear regression models to analyze the relationship between market performance and various market embeddedness measures of the tokens in the co-investment network. For each market indicator, e.g., price $p$, we first develop a baseline multiple linear regression (MLR) model
$$p = \beta_0 + \boldsymbol{\beta}^{\top} V_{control} + \varepsilon, \quad (1)$$
where $V_{control}$ is the vector of control variables. Then, for each of the market embeddedness measures $e_i$, we construct another MLR model
$$p = \beta_0 + \boldsymbol{\beta}^{\top} V_{control} + \beta_{e} e_i + \varepsilon. \quad (2)$$
We are interested in the statistical significance of $e_i$ in model 2 and in the difference $\Delta R^2$ between the predictability, i.e., the $R^2$ values, of the two models.
Table 3 shows the regression results for model 1 on all six market indicators. The $R^2$ values range from 0.05 to 0.43. The p-values indicate that the number of blockchain transfers and public interest (Alexa ranking) both have significant impacts on the price, market capitalization, volume, and liquidity of tokens. Meanwhile, the number of Telegram subscribers has a significant impact on the volume and liquidity, while Reddit user activity has an impact on the tokens' market capitalization. However, none of the control variables is significantly correlated with market return or volatility. Moreover, the effect sizes of public interest and social network user activity are larger than those of blockchain activity. This suggests that people may prefer to treat tokens as an investment tool instead of using them for actual transactions or consumption.
Table 4 shows the regression results of model 2. Specifically, strength centrality, at a significance level of 0.1%, improves the $R^2$ values of trading volume and liquidity by 0.06 and 0.05, respectively. It means that the more a token is favored by institutional investors, the larger its trading volume and liquidity will be. Closeness centrality is significant at the level of 0.1% for the trading volume and liquidity of tokens, and significant at the level of 5% for the market capitalization, with positive estimated coefficients of 0.40, 0.44, and 0.10, respectively. It suggests that the more direct the market impact a token receives through institutional investors, the higher its market trading activity will be. Betweenness centrality is significant at the level of 1% for the trading volume, with a positive estimated coefficient of 0.26, and at the level of 5% for the liquidity, with a positive estimated coefficient of 0.27. It shows that the stronger the mediation power a token has in the market, the larger its trading volume and liquidity. Clustering coefficient is significantly negatively correlated with the market capitalization, trading volume, and liquidity of tokens at a level of 0.1%, and significantly positively correlated with the tokens' volatility. This evidence indicates that tokens with less local influence have a low market capitalization and trading volume, poor liquidity, and high volatility. Conversely, we can infer that tokens with greater local influence have better market liquidity and lower volatility. For market embeddedness measures with both weighted and unweighted definitions, only those with better regression results are reported here. Other results can be found in Supplementary Section 4.
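The comparison of the baseline model with the model that adds one embeddedness measure can be sketched with NumPy as below. The data are synthetic, the helper names are hypothetical, and the sketch reports only the change in $R^2$; significance tests such as those behind the reported p-values would require an additional statistics package.

```python
import numpy as np

def standardize(x):
    """Z-score each column: (x - mean) / standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic data: three control variables and one embeddedness measure.
rng = np.random.default_rng(0)
n = 200
controls = rng.normal(size=(n, 3))
embed = rng.normal(size=n)
y = 0.3 * controls[:, 0] + 0.4 * embed + rng.normal(size=n)

controls_z, embed_z, y_z = standardize(controls), standardize(embed), standardize(y)

r2_base = r_squared(controls_z, y_z)                              # controls only
r2_full = r_squared(np.column_stack([controls_z, embed_z]), y_z)  # plus e_i
delta_r2 = r2_full - r2_base
print(r2_base, r2_full, delta_r2)
```

Because the two models are nested, the difference in $R^2$ is never negative; its size indicates how much additional variance the embeddedness measure explains beyond the controls.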

Structural Similarity Versus Market Performance Similarity
Again, we use OLS regression models to test our hypothesis that token nodes with similar network structures in the co-investment network, hypothetically impacted by similar market factors, will show convergence in their market performance. The linear regression model between the structural similarity and the partial correlation coefficient of the tokens' market indicators is defined as
$$\rho_{i,j(Ether)} = \beta_0 + \beta_1 J_{i,j} + \varepsilon, \quad (3)$$
where $J_{i,j}$ represents the Jaccard similarity between node $i$ and node $j$, and $\rho_{i,j(Ether)}$ represents the partial correlation coefficient. Table 5 shows the regression results of model 3. We find that the Jaccard similarity index is significantly positively correlated with the partial correlations of all the market indicators, confirming our hypothesis. That is to say, the more common neighbors two token nodes have in the co-investment network, i.e., the more their portfolios overlap, the more similar their market performance, including price, market capitalization, trading volume, liquidity, return, and volatility, will be.

CONCLUSION AND DISCUSSION
This paper studies the role institutional investors play in the ERC-20 token market and how they affect the market performance of tokens, e.g., price, trading volume, market capitalization, liquidity, return, and volatility. We construct a co-investment network with ERC-20 tokens as nodes and the pairwise sharing of institutional investors as edges. As such, the intertwined influences of institutional investors on different tokens are embedded in the network.
The significant correlations between the strength centrality, closeness centrality, betweenness centrality, clustering coefficient of tokens, and their market performance reveal institutional investors' positive impact on promoting market liquidity and reducing market volatility. Moreover, token nodes' structural similarity measured by the Jaccard index is significantly positively correlated with their market indicator similarity, suggesting that the sharing of investment institutions between tokens may result in converged market performance.
Our work demonstrates the inefficiency and vulnerability of this emerging market, the non-negligible influence of institutional investors, and the diffusion of such influence through co-investment relationships in the cryptocurrency market. Furthermore, we remind individual investors to pay extra attention in this highly speculative market, because institutional investors may deliberately manipulate the market, creating bubbles and crashes for profit.
Note that our dataset contains only the institutional investors' investments in 317 tokens out of approximately 7,000 tokens in circulation as of 2020. Nonetheless, as these tokens are mostly highly valued ones, we believe that their co-investment network is a representative sample of the core of the cryptocurrency market, and hence that our conclusions can be generalized to other parts of the cryptocurrency market.