Is Export a Probe for Domestic Production?

Recent works leverage export data to assess the production structure of countries and, ultimately, their relative competitiveness. These works mostly rely only on the exported part of the total country output for reasons of data availability, homogeneity, and quality. Here we use the World Input-Output Database (WIOD), which offers cross-country harmonized data accounting for both domestic production and export, to investigate to what extent export is a proxy for domestic production. We find that export mirrors domestic production remarkably well for manufacturing sectors and for sectors related to physical goods. Conversely, this relation fades away for service-related sectors. We find these relations consistently across most of the 40 countries for which data are available.


INTRODUCTION
Over the last decades, awareness has grown that economic thinking must embrace new paradigms in order to properly tackle the challenges set by the complex and adaptive nature of economic systems [1][2][3]. This shift has acted as a breeding ground for cross-disciplinary theories in economics and finance and has led to a number of flourishing works bridging several fields, ranging from network theory to complexity science. To illustrate a small fraction of these approaches we refer, for instance, to the complex linkage between micro and macro economic fluctuations [4], the non-trivial topology of the World Trade Web [5][6][7], the modeling of the inter-bank network [8] to assess financial systemic risk [9,10], the modeling of technological and scientific progress [11,12], and complex firm diversification trajectories [13,14].
However, the design of these studies and their general setups often reflect the general vision of economies as complicated rather than complex (adaptive) systems. This means that these empirical analyses tend to look at very limited channels of interaction suggesting direct and simple cause-effects (or in-out effects) [1]. This general frame for the empirical search of development determinants faces critical issues when the systems are increasingly complex and adaptive because internal feedback tends to break in-out schemes and lead to the emergence of collective and evolutionary behavior.
Economic Complexity reverses this perspective and starts from the final outputs in order to explain the roots of country competitiveness and the consequent growth trajectories. It aims to infer country competitiveness from economic outputs, specifically from cross-country output differences, as they reflect cross-country endowment differences which encode relative country strength [16,18,19,39,40,41,42]. Conceptually speaking, these approaches are close to the PageRank methodology: they extract information from the network topology in order to measure features of the nodes.
The economic output which is typically leveraged to measure differences of production across countries is the countries' export basket, which is a subset of the total output of an economic system. Domestic production represents the remaining part of the economic output of a country. Exports are the preferred output for evaluating cross-country differences in terms of productive capabilities, as they occur mainly on a competitive basis. A country exporting a product is likely signaling a competitive advantage and proving that it owns the capabilities required to produce that specific product. In addition, exports offer a number of auxiliary features which make them an ideal candidate for these analyses:
• Export datasets are harmonized across countries, being the result of data collection from customs offices. This means that all countries identify and define a specific product in the same way, making export baskets, once suitably normalized, comparable across countries;
• The value of the flows is often doubly reported, by the exporter and by the importer, which makes it possible to correct many errors and to de-bias inaccurate reporters;
• They are available continuously since the sixties [43,44];
• They are available up to a very disaggregated level. In the Harmonized System, products are hierarchically organized using different levels of aggregation identified by the number of digits used for the product code. For instance, 2-digit codes refer to about one hundred aggregated sectors, while 4-digit codes identify more than one thousand different products. Exports are available up to 6 digits. As a reference, the 8-10 digit levels specify the level at which single firms compete (i.e., at those levels, if two firms produce the same product, they are likely direct competitors). Exports are then available, consistently for all countries, just one level of aggregation higher than the level setting firm competition [44].
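The hierarchical organization of the Harmonized System can be illustrated with a minimal Python sketch. Since coarser levels are prefixes of finer codes, 6-digit export values can be aggregated to 4 or 2 digits by truncating the code. The function name `aggregate_hs`, the sample codes, and the values are hypothetical and serve for illustration only.

```python
from collections import defaultdict


def aggregate_hs(exports_by_hs6, digits):
    """Aggregate 6-digit export values to a coarser level by code prefix.

    exports_by_hs6: dict mapping 6-character code strings to export values
                    (hypothetical input format, not a real dataset layout).
    digits:         target aggregation level (2, 4, or 6).
    """
    if digits not in (2, 4, 6):
        raise ValueError("digits must be 2, 4, or 6")
    totals = defaultdict(float)
    for code, value in exports_by_hs6.items():
        # A coarser code is simply the leading `digits` characters.
        totals[code[:digits]] += value
    return dict(totals)


# Illustrative values only.
sample = {"851712": 10.0, "851762": 5.0, "854231": 7.0, "010121": 1.0}
by_4_digits = aggregate_hs(sample, 4)
by_2_digits = aggregate_hs(sample, 2)
```

Codes are kept as strings so that leading zeros survive the truncation.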
In this work we want to address the relationship that exists between export and domestic production of countries. In particular we want to understand to what extent export flows are mirroring production and therefore they can be used as a good proxy to decode the complexity of domestic production.
In particular, we want to understand whether there exist sector-wise patterns of variability in the probing power of export flows. Unfortunately, from an operational point of view, we do not have direct, reliable, highly disaggregated, and consistent cross-country datasets tracing the structure of internal production, unlike what is available for the bilateral trade network. However, we are still able to test the relation at a more aggregate level. In order to test the validity of this assumption, we leverage a type of data recently made available by a number of scholars. The dataset we refer to in this work is the so-called WIOD [45,46]. This dataset extends to a global scale the original Leontief input-output approach, which is usually provided for internal intra-sector flows (further details are provided in the next section). In WIOD we have access to the input and output flows for 34 sectors, due to both domestic and import/export contributions, for a limited but significant set of 40 countries covering more than 85% of the world GDP in 2008. Additionally, the trades due to the remaining non-covered part of the world are estimated and included in an additional "country" called "Rest of the World" (RoW). We design a number of tests to statistically assess the probing power of export flows along the two possible cross-sections we can explore: first fixing sectors and then fixing countries.
Our results can be summarized as follows: i) At an aggregate level, exports are a good proxy for internal production for manufacturing sectors and sectors delivering physical goods. ii) The relationship between internal production and export fades away for service-related sectors. This highlights differences between products and services and shows that services exports might not have the same meaning as exports related to tangible goods. This questions approaches aiming to achieve a straightforward extension of cross-country export differences to the service domain by treating this class of activities as an extra set of products [47,48].
We found those relationships consistently for the countries considered and discuss the exceptions in the remainder of the paper. The paper is structured as follows: in section 2 we present the results of our research. Particularly, section 2.1 describes how we calculate the internal production and the export for each sector of each country considered. We then analyze those data in sections 2.2 and 2.3, respectively, sector by sector and country by country. We conclude in section 3 discussing our findings and presenting an outlook for our work. Finally, section 4 provides technical details on the statistical methods used. In the Supplementary Information, we provide further results and analyses supporting the main findings of the paper.

Assessing Domestic Production and Export
Input-Output analysis, formalized by Leontief [49], provides a picture of inter-industrial relationships. This kind of analysis gives a matrix representation of the interactions between the industrial sectors of a country. The model considers an exchange economy divided into a certain number of industrial sectors, in which the output of a sector becomes an input for another. In this way, it is possible to see how much each sector depends upon the others. The idea of Dietzenbacher et al. [45,46] was to extend Leontief's approach to world trade. They created the World Input-Output Database (WIOD), which reports the flows, quantified in current US dollars, exchanged between the industrial sectors of several countries of the world. The WIOD contains annual time series of World Input-Output Tables (WIOT), collected for a period of 17 years ranging from 1995 to 2011.
From each WIOT we created a network (whose properties have been studied in [50]) as in Figure 1. We distinguish three different types of links: (i) self-links, representing the inputs that an industry takes from itself (colored dashed lines in the figure); (ii) links between the same industrial sector in two different countries (gray dotted lines in the figure); (iii) links between different industrial sectors in the same country (colored solid lines) and among countries (gray solid lines). Self-links are mainly due to the aggregated industry classification [50] and often represent a large amount of the total sector input/output. We neglect these data together with any link connecting the same industrial sector across countries. Therefore, we keep only the links represented by solid lines in Figure 1. For each industrial sector $s$ of country $c$ we define the internal flow $I_{sc}$ as the sum of the output flows toward industrial sectors of the same country. Similarly, we define the export flow $E_{sc}$ as the sum of the output flows toward industrial sectors belonging to other countries. The sum of $I_{sc}$ and $E_{sc}$ gives the total output flow of industrial sector $s$ of country $c$.
Internal and export flows show high variability in terms of volume from country to country. A better parameter to estimate the importance of an industrial sector is its share with respect to the country's overall internal production or export. Hence, for each industrial sector of each country, we define an internal share $i_{sc}$ and an export share $e_{sc}$. The former reflects the importance of that sector relative to the country's internal economy, while the latter reflects its importance relative to the country's export. Shares are defined as:

$$i_{sc} = \frac{I_{sc}}{\sum_{s'} I_{s'c}}, \qquad e_{sc} = \frac{E_{sc}}{\sum_{s'} E_{s'c}},$$

where $s'$ runs over all the 34 industrial sectors.
The main goal of this work is to measure the similarity between domestic and export shares, together with its statistical significance, both sector-wise and country-wise.
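The construction of internal and export flows and of the corresponding shares can be sketched in Python as follows. This is a minimal illustration and not the code used for the paper: it assumes the inter-industry block of a WIOT is available as a square NumPy array whose rows (origins) and columns (destinations) are ordered country by country and, within each country, sector by sector. The function name `flows_and_shares` and this layout are assumptions.

```python
import numpy as np


def flows_and_shares(Z, n_countries, n_sectors):
    """Internal/export flows and shares from an inter-industry flow matrix.

    Z has shape (n_countries * n_sectors, n_countries * n_sectors); entry
    [(c, s), (c2, s2)] is the flow from sector s of country c to sector s2
    of country c2 (country-major ordering is an assumption of this sketch).
    Returns arrays of shape (n_countries, n_sectors).
    """
    Z4 = np.asarray(Z, dtype=float).reshape(
        n_countries, n_sectors, n_countries, n_sectors
    )
    # Discard links between identical sectors: self-links and links between
    # the same sector in two different countries.
    different_sector = ~np.eye(n_sectors, dtype=bool)
    Z4 = Z4 * different_sector[None, :, None, :]

    # Total output of (c, s) toward each destination country.
    to_country = Z4.sum(axis=3)                  # shape (c, s, c2)
    I = np.einsum("csc->cs", to_country)         # destination = own country
    E = to_country.sum(axis=2) - I               # all other destinations

    # Shares relative to the country's total internal production / export.
    i_share = I / I.sum(axis=1, keepdims=True)
    e_share = E / E.sum(axis=1, keepdims=True)
    return I, E, i_share, e_share


# Toy example with 2 countries and 2 sectors (values are arbitrary).
Z_toy = np.arange(16, dtype=float).reshape(4, 4)
I_toy, E_toy, i_toy, e_toy = flows_and_shares(Z_toy, n_countries=2, n_sectors=2)
```

The sketch does not guard against countries whose total internal production or export is zero, for which the shares are undefined.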

Sector-Wise Analysis
Let us first consider the sector-wise similarity. We want to measure, sector by sector, whether domestic shares mirror export shares across the available countries. Denoting by $n$ the number of countries, we define $d_s = \{i_{sc_1}, i_{sc_2}, \ldots, i_{sc_n}\}$ as the vector of the domestic shares of a sector across countries and $ex_s = \{e_{sc_1}, e_{sc_2}, \ldots, e_{sc_n}\}$ as the vector of the corresponding export shares. We measure the per-sector similarity as the sample Pearson correlation of the vectors $d_s$ and $ex_s$. The limited number of countries ($n = 41$) and the consequent limited statistics make a robust statistical validation of the measured correlation essential. We therefore need strategies to exclude that the sample correlation we observe is associated with a vanishing correlation for the underlying population, i.e., we want to reject the null hypothesis $\rho = 0$, where $\rho$ is the population correlation coefficient. We denote population correlations with Greek letters and sample correlations with Roman letters. The statistical validation of a correlation can be achieved using different strategies; we perform the most common ones and discuss the similarity of their results, which attests to the robustness of our set of analyses. In detail:
• Mitigation of outliers' role (analysis I): to study the typical range of variability of the observed sample correlation coefficient between the domestic and the export shares, we develop a bootstrapping procedure. Domestic and export shares occasionally show a broad distribution and therefore, for some sectors, we may fall into an outlier-type regime. We then devise a procedure to mitigate the effect of outliers and test the robustness of our findings. The procedure combines a modified bootstrap with a permutation test and is easily described by means of a concrete example.
In Figure 2A, we show the scatter plot of the internal shares and export shares, i.e., $d_s$ and $ex_s$, for two sectors (namely Electrical and Optical Equipment and Inland Transport). Each point in the graph represents a country. The sample correlation coefficient of these data is calculated through a modified bootstrap to mitigate the possible effects of outliers. We essentially re-sample many subsets of the original pairs (further details are provided in the Methods section). This allows us to evaluate the typical range of variability of the correlation coefficients, as shown by the histogram in Figure 2B. We define the sample correlation coefficient $r$ for a sector as the average of the data in this histogram (marked by the vertical dotted black line in the same figure panel). To assess the significance of the obtained $r$ we develop a p-value analysis: for each data subset extracted during the bootstrap we calculate the p-value as the result of a permutation test (see the Methods section for further details). We then construct the cumulative distribution function of the obtained p-values, shown in Figure 2C. A significant correlation is usually attested by a low p-value. This translates into a cumulative distribution of the p-values approaching 1 already for small p-values.
In the examples shown in Figure 2 this is the case for the "Electrical and Optical Equipment" industrial sector, while it is not the case for "Inland Transport." We set a threshold $T = 0.15$ to classify a sector as correlated or not.
If the 70th percentile of the p-value distribution is below $T$, the sector is said to be correlated; otherwise it is not. In panel (c) of the same figure we mark the 70th percentile of the data by a dotted black line and the 0.15 threshold by a dashed red line. We see that for the "Electrical and Optical Equipment" sector the 70th percentile of the data is well below the threshold. This means that the internal shares $d_s$ and the export shares $ex_s$ of this sector are significantly correlated according to our definition of statistical significance. Conversely, for "Inland Transport" the 70th percentile of the data is greater than the threshold, meaning that a significant correlation is lacking.
• Simple bootstrap (analysis II): to estimate the correlation confidence level, we also performed a standard bootstrapping procedure. We sample with replacement $n$ pairs from the original pairs defining our sample and, by repeating this procedure several times, we estimate the distribution characterizing the variability of the sample correlation.
• Permutation test (analysis III): to compare the sample correlation with a null model we perform a permutation test, shuffling $d_s$ (or alternatively $ex_s$), measuring the correlation, and repeating this procedure several times in order to build the ensemble corresponding to the null case we want to exclude, i.e., the zero-correlation scenario. A slightly different way to estimate the distribution of the sample correlation under the null case is to generate $n$ pairs of uncorrelated (normal) random numbers, measure the correlation, and repeat the procedure several times. Both procedures allow us to define a p-value for the observed sample correlation under the null hypothesis $\rho = 0$. In this work we provide both approaches.
• Fisher transformation: the Fisher transformation of the sample correlation, $F(r) = \frac{1}{2}\ln[(1 + r)/(1 - r)]$, approximately follows a Gaussian distribution with mean $\mu_\rho = F(\rho)$ and variance $\sigma_\rho^2 = 1/(n - 3)$, where $n$ is the sample size. It follows that the p-value of the sample correlation $r$ under the null hypothesis $\rho = 0$ can be retrieved from the z-score $z = (F(r) - \mu_\rho)/\sigma_\rho = F(r)\sqrt{n - 3}$.
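The simple bootstrap, the permutation test, and the Fisher transformation listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the function names, the default numbers of repetitions, and the two-sided definition of the p-values are assumptions of the sketch.

```python
import math

import numpy as np


def pearson(x, y):
    """Sample Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def bootstrap_correlations(x, y, n_boot=2000, seed=0):
    """Simple bootstrap: resample n pairs with replacement, recompute r."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    return np.array([pearson(x[i], y[i]) for i in idx])


def permutation_pvalue(x, y, n_perm=5000, seed=0):
    """Permutation test: shuffle x to build the rho = 0 ensemble.

    The p-value is two-sided (an assumption of this sketch): the fraction
    of shuffled samples whose |r| is at least the observed |r|.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    r_obs = pearson(x, y)
    null = np.array([pearson(rng.permutation(x), y) for _ in range(n_perm)])
    return float(np.mean(np.abs(null) >= abs(r_obs)))


def fisher_pvalue(r, n):
    """Two-sided p-value of r under rho = 0 via z = F(r) * sqrt(n - 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)   # atanh(r) = 0.5 * ln((1+r)/(1-r))
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For a sector, `x` and `y` would be the vectors of domestic and export shares across countries; the quantiles of the array returned by `bootstrap_correlations` give an estimate of the variability band of the sample correlation.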

Sector Analysis I: Outlier Mitigation
In Figure 3, we present the 70th percentile p-value for all the sectors in the years from 1996 to 2011. Sectors are sorted by their p-value in 2011, and the names of the sectors belonging to services [51] are in bold text. We identify sectors for which the correlation is validated and sectors for which it is not. Visually, we see that the service sectors are mostly at the bottom of the figure and present a large p-value for most of the years analyzed. This reflects the fact that for those sectors there is no statistically significant correlation between domestic production and export. The three analyses underline the same trend in terms of validated and non-validated sectors (see Supplementary Information for detailed graphs). In general, a clear clustering is present between two categories of sectors:
• Sectors showing a statistically significant correlation: "Wood and Products of Wood and Cork," "Agriculture, Hunting, Forestry and Fishing," "Textiles and Textile Products," "Mining and Quarrying," "Leather, Leather and Footwear," "Pulp, Paper, Printing and Publishing," "Basic Metals and Fabricated Metal," "Electrical and Optical Equipment," "Post and Telecommunication." We note that, with the only exception of "Post and Telecommunication," all these sectors belong to the manufacturing and raw materials industries.
• Sectors not showing a significant correlation: "Inland Transport," "Health and Social Work," "Public Admin and Defense; Compulsory Social Security," "Air Transport," "Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies," "Electricity, Gas, and Water ...
The general ordering of the sectors in terms of the significance of the measured correlation, and in particular the different behavior of manufacturing and service sectors, is robust across all the years available and not specific to a limited time period. However, a few exceptions and trends can be spotted.
A more explicit visualization of the evolution of significance in time is provided in Figure S3, where we show the time evolution of the 70th percentile p-value from 1996 to 2011 for each sector. We identify a temporal trend for some industrial sectors. In particular "Food, Beverages and Tobacco," "Coke, Refined Petroleum and Nuclear Fuel," "Chemicals and Chemical Products," "Machinery, Nec," and "Transport Equipment" show an increase in the significance of the correlation between internal share and export share in the period considered. On the contrary, the industrial sectors "Other Non-Metallic Mineral," "Manufacturing, Nec; Recycling," "Electricity, Gas and Water Supply," "Sale, Maintenance and Repair of Motor Vehicles and Motorcycles; Retail Sale of Fuel" and "Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods" show a clear worsening of the significance of the correlation between the two quantities with time. Several factors may play a role in shaping the similarity between internal and export production. As a general consideration, an increasing correlation may be a signature of increasing globalization and of a reduction of trade barriers. However, pinpointing the origin of the forces underlying these trends is beyond the scope of this work.
The general ordering of sectors by statistical significance induced by the different tests proposed exhibits minor differences. However, two common features are shared by all analyses:
1. The statistical significance of the similarity between domestic and export shares tends to increase in time. We argue this may be due to increasing trade liberalization and openness, together with more integrated global value chains;
2. Two groups of sectors emerge in a consistent way in the period under investigation. One group, composed of sectors related to the manufacturing and raw material industries, presents a significant correlation between domestic output and export. This correlation is instead not significant for another group, composed of service-related sectors.

Sector Analysis II: Simple Bootstrap Results
The correlation coefficients between the domestic and export shares of the statistically validated sectors are typically observed in the range 0.2 to 0.9, as shown in Figure S2, and tend to increase over time. Interestingly, negative coefficients are almost never statistically validated.

Frontiers in Physics | www.frontiersin.org

Sector Analysis III: Permutation Test and Fisher Transformation Results
In Figure S1, we show two Yes-No grids summarizing the statistical validation of the sample correlations we measure for the 34 sectors available, respectively, for the two permutation tests we propose and for the Fisher transformation. Sectors are ordered according to decreasing p-value in 2011. The ordering is similar across tests, but a few differences apply. A solid green dot corresponds to p ≤ 0.05, an empty red square to a non-validated sector. As a first result, both strategies provide essentially the same results and, more interestingly, we observe that non-validated sectors tend to be service-related sectors. Detailed p-value tables are provided in the Supplementary Information.

Country-Wise Analysis
So far we have seen that industrial sectors can be approximately classified into two groups on the basis of the statistical significance of the correlation between domestic and export shares. On average, export is a good proxy for domestic production for manufacturing sectors. Let us now consider the second cross-section analysis we are interested in: the country-wise analysis. Specifically, we want to investigate whether there exist country-specific patterns in the relationship between export and domestic production. In this section we deal directly with flows ($I_{sc}$ and $E_{sc}$) instead of shares since, unlike in the previous section, we are not comparing flows from different countries and therefore have no scaling issues. We estimate the statistical robustness of the per-country correlations with the same methods we used for the sector-wise analyses, namely permutation tests, Fisher transformation, bootstrap, and the test with mitigation of outliers' effects. Referring to Figure 2 again, the general framework is similar: we still consider scatter plots as in panel (a), but now they have $\log(I_{sc})$ and $\log(E_{sc})$ on the horizontal and vertical axis, respectively, i.e., the internal flows of the sectors and the export flows of the same sectors for a specified country.
In this section, we will run all the statistical tests proposed with two setups: excluding and keeping those sectors which are discarded by the outlier mitigated test discussed in the previous section. Discarded sectors can be retrieved year-by-year from Figure 3 (they are identified by a red dashed edge).
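The per-country quantity tested in the two setups can be sketched as follows. The sketch assumes that the internal and export flows of the sectors of one country are available as arrays and that sectors with a vanishing flow, for which the logarithm is undefined, are simply dropped; this handling of zeros, the function name `country_log_correlation`, and the boolean mask `validated` are assumptions, not details given in the text.

```python
import numpy as np


def country_log_correlation(I_c, E_c, validated=None):
    """Pearson correlation between log internal and log export flows.

    I_c, E_c:   flows of the sectors of a single country.
    validated:  optional boolean mask selecting the sectors validated in
                the sector-wise analysis (None keeps all sectors).
    """
    I_c = np.asarray(I_c, dtype=float)
    E_c = np.asarray(E_c, dtype=float)
    keep = (I_c > 0) & (E_c > 0)          # log is undefined for zero flows
    if validated is not None:
        keep &= np.asarray(validated, dtype=bool)
    if keep.sum() < 3:
        raise ValueError("at least three sectors are needed")
    return float(np.corrcoef(np.log(I_c[keep]), np.log(E_c[keep]))[0, 1])


# Toy flows (arbitrary values): all sectors vs. a subset of sectors.
I_toy = [1.0, 10.0, 100.0, 0.0, 1000.0]
E_toy = [2.0, 20.0, 200.0, 5.0, 2000.0]
r_all = country_log_correlation(I_toy, E_toy)
r_subset = country_log_correlation(I_toy, E_toy, [True, True, True, True, False])
```

The resulting correlation can then be passed to any of the significance tests used in the sector-wise analysis.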

Country Analysis I: Outlier Mitigation
We define $p^{\mathrm{all}}_{c,yr}$ and $p_{c,yr}$ as the 70th percentile p-value calculated with all sectors and with only the validated sectors, respectively. We present the results of this analysis in Figure 4 for the years ranging from 1996 to 2011. Solid black lines represent $p^{\mathrm{all}}_{c,yr}$ while colored bars represent $p_{c,yr}$. The color of the bar is:
• Light green if $p^{\mathrm{all}}_{c,yr} > T$ and $p_{c,yr} < T$, i.e., the correlation is statistically significant only when excluding the sectors discarded in the sector-wise analysis;
• Dark green if $p^{\mathrm{all}}_{c,yr} < T$ and $p_{c,yr} < T$, i.e., in both cases the correlation is statistically significant. However, we note that $p^{\mathrm{all}}_{c,yr}$ tends to be larger than $p_{c,yr}$; therefore the case excluding services tends to be more significant from a statistical point of view;
• Dark red if $p^{\mathrm{all}}_{c,yr} > T$ and $p_{c,yr} > T$, i.e., the correlation is not statistically significant in either case;
• Light red if $p^{\mathrm{all}}_{c,yr} < T$ and $p_{c,yr} > T$, i.e., excluding the discarded sectors decreases the statistical significance of the correlation between internal flows and export flows.
A visual inspection of Figure 4 (countries are ordered with respect to $p_{c,yr}$ in 2011) reveals that the vast majority of countries show a notable increase in the significance, and in the correlation itself, after the removal of the sectors not validated in the sector-wise analysis. This visually corresponds to the fact that empty bars are larger than colored ones for almost all countries. For instance, in 2011 only 24 countries out of 41 have validated correlation coefficients when all sectors are included; after removing the non-validated sectors, only 3 countries (i.e., France, Romania, and Taiwan) are not validated as statistically correlated in the country-wise analysis.
The second main observation revealed by the visual inspection of Figure 4 is the presence of a well-defined temporal trend: the number of validated correlation coefficients between export and internal production grows during the considered period. We have already identified this trend in the sector-wise analysis (Figure 3), where the 70th percentile of the p-value is overall lower in the last years than in the earlier ones. However, in this perspective the country-wise analysis is a more suitable playground to look at structural changes of trade (Figure 4). We observe a clear increase of green bars over time. Light red bars completely disappear after 2008 and, as mentioned, in the last year available the correlation is validated for 37 countries out of 40. This implies that country-specific patterns are disappearing and that export is a good probe for internal output for the majority of the countries we can test. A tentative explanation of this behavior can be rooted in the evolution and rise of world trade due to the globalization process and to the reduction of trade barriers in the period studied. In particular, starting from 2008 a very high correlation between export and internal production is present for the vast majority of the countries under examination.
Interestingly, most of the cases in which the correlation fails to be validated can be easily interpreted. Starting with persistent light red bars which are, in the perspective of the previous section, the most surprising cases, we find for instance Luxembourg, which is indeed an economy traditionally dominated by services. We also find Italy but, as argued in Di Clemente et al. [13], Italy's economic system has peculiar evolutionary features which may affect the internal output. We do not have, instead, an obvious interpretation of Brazil's behavior in the late nineties and of Japan's in the early 2000s. The trends of Turkey and India toward an increasing correlation reflect their rising economic trajectories, which are leading both countries to become pivotal nodes in the trade network.
A persistent anomaly with respect to the observed positive trend is represented by Romania, where not only is the correlation lacking for all the years considered, but removing the non-correlated sectors also worsens the situation. Regarding France and Taiwan, in some years the correlation is missing, but we still find an improvement when removing the selected sectors. France's trade network appears to have specific features, since some anomalies have also been detected in [18].
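The color coding described above amounts to a two-by-two decision table, which can be written as a short Python function. The function name `bar_color` is hypothetical, and the text does not specify how a p-value exactly equal to $T$ is treated; in this sketch it counts as not significant.

```python
def bar_color(p_all, p_valid, T=0.15):
    """Color of a bar given the 70th percentile p-values.

    p_all:   p-value computed with all sectors.
    p_valid: p-value computed with the validated sectors only.
    Values equal to T are treated as not significant (an assumption).
    """
    if p_valid < T:
        # Significant once the discarded sectors are excluded.
        return "dark green" if p_all < T else "light green"
    # Not significant with validated sectors only.
    return "light red" if p_all < T else "dark red"
```

For example, a country with `p_all = 0.30` and `p_valid = 0.05` is drawn in light green: the correlation becomes significant only after the non-validated sectors are removed.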

Country Analysis II: Simple Bootstrap Results
The correlation coefficients between the domestic and export flows of the statistically validated countries are typically observed in the range 0.0 to 0.8, as shown in Figure S5. The red bands correspond to the case with all sectors, while the blue bands correspond to the case with validated sectors only.

Country Analysis III: Permutation Test and Fisher Transformation Results
In Figure S4, we show two Yes-No grids summarizing the statistical validation of the sample correlations we measure for the 41 countries available, respectively, for the two permutation tests we propose and for the Fisher transformation. In all cases we provide the results both keeping and discarding the non-validated sectors. Countries are ordered according to decreasing p-value in 2011. As for sectors, a solid green dot corresponds to p ≤ 0.05, an empty red square to a non-validated country. As a first result, both strategies provide essentially the same results: a country validated by the permutation tests is also validated by the Fisher test. However, major differences apply when we discard the non-validated sectors (the small symbols beside the larger ones represent the results in this latter case). Discarding the non-validated sectors, we observe that the number of validated countries increases over time and that the majority of countries are validated in 2011. Detailed p-value tables are provided in the Supplementary Information.

DISCUSSION
World Input-Output tables allow us to investigate, at an aggregate level, the relationship between the two parts of the economic output of a country: export and domestic production. The former can be leveraged as a proxy for cross-country production differences in order to assess country competitiveness. A natural question then arises, namely to what extent the fully competition-driven part of a country's output, the export, mirrors the features of the domestic production network, and whether significant differences apply between the two parts. Input-Output tables allow a direct comparison, as they provide sector output flows broken down into domestic and foreign contributions. The relation holds country-wise, even if a few exceptions exist, as in the case of Romania.
The main finding is instead the existence of a sector-wise pattern of validity of statistical equivalence between domestic and export-destined production. While export mirrors domestic production structure for manufacturing sectors, the relationship fades away for services sectors. This implies that services export cannot be interpreted as in the case of manufacturing or physical goods: on the contrary, services are economic products characterized by an elusive and subtle nature, which shares features of both products and endowments/capabilities.
We point out, however, that this result does not necessarily question a straightforward extension of country competitiveness measures to services [47,48] that simply makes use of data on international trade. Indeed, services are very different in nature from manufacturing, and are far less tradable; this shows up in the results of the present work. However, the economic complexity framework tries to track the hidden capabilities of countries, and these could emerge more clearly by looking at exports rather than at internal production, given that international competition plays a major role in the former.
In any case, this analysis sets constraints and caveats on the general meaning of services export on a competitive basis. Services are economic activities for which geographical localization is often hard. For some of these activities the concept of localization is likely ill-defined, as in the case, for instance, of strategic consulting firms whose teams and projects operate worldwide.
The results also provide a perspective for reconciling manufacturing and services sectors in order to join the two dimensions. Starting from those few countries for which export and domestic services are correlated, one should first understand at an aggregate level how these two parts are mutually linked. Then, by segmenting countries on the basis of the similarity of their domestic services diversity, we can try to extend the mapping obtained for the countries where the relationship holds to the countries belonging to the same cluster but lacking a correlation between domestic services and services export. Once a scheme to reconcile export and domestic services diversity, and a re-scaling profile for each country, are available in this way, the mapping can eventually be extended to a disaggregated level.

Datasets
We used data extracted from the World Input-Output Tables (WIOT) [45,46]; they consist of 17 different tables, one for each year from 1995 to 2011. The structure of each table is a matrix that lists economic sectors associated with countries, in the same sequence, both vertically and horizontally. The values in a column represent the inputs of the industrial sector and country at the head of the column, expressed in monetary value, while the values in a row represent the outputs of the sector and country at the beginning of the row. Thus, any sector can be analyzed in terms of the direction and amount of its inputs and outputs. We used only the information relative to the fluxes exchanged between the industrial sectors of all the countries considered in the database, which covers 27 European countries and 13 other major countries in the world. The 40 countries considered cover more than 85% of the world gross domestic product (GDP) in 2008. A model for the Rest of the World (RoW), which accounts for the remaining 15% of world GDP, is used to predict the remaining trades. Each table contains fluxes in current US dollars between 35 industrial sectors. Fluxes both inside the same economy and toward foreign economies are reported. We use only the data for 34 sectors, since "Private Households with Employed Persons" often has null input or output. WIOT also provides data for the final demand, government expenditures, depreciation of capital, taxes, etc. However, we do not use these data for two reasons. On the one hand, we are interested in inter-industrial trades. On the other hand, by performing an analysis of country competitiveness as in Tacchella et al. [19] using export flows derived from the WIOD dataset, we observe that the correlation with the results of the same analysis run on bilateral trade flows is higher when we remove final consumption, especially when services are included in the analysis.
This again points in the direction of a non-trivial relationship between domestically-consumed and exported services.
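As an illustration of the table structure described above, the following minimal sketch splits an inter-industry flow matrix into its domestic and exported parts. It is not the code used for the paper: the function name `split_domestic_export`, the country-by-country ordering of rows and columns, and the toy matrix are assumptions made only for this example.

```python
import numpy as np

def split_domestic_export(Z, n_countries, n_sectors):
    """Split inter-industry flows into domestic and exported parts.

    Assumption: Z is a square (C*S, C*S) array ordered country by country,
    where Z[i, j] is the flow from the row industry to the column industry.
    Returns two (C, S) arrays, domestic[c, s] and export[c, s].
    """
    Z = np.asarray(Z, dtype=float)
    n = n_countries * n_sectors
    if Z.shape != (n, n):
        raise ValueError("Z must have shape (C*S, C*S)")
    # Axes: origin country, origin sector, destination country, destination sector.
    blocks = Z.reshape(n_countries, n_sectors, n_countries, n_sectors)
    # Total flow of each origin (country, sector) toward each destination country.
    by_destination = blocks.sum(axis=3)
    # Domestic flows are the blocks where origin and destination country coincide.
    domestic = np.einsum("csc->cs", by_destination)
    # Everything sent to the other countries counts as export.
    export = by_destination.sum(axis=2) - domestic
    return domestic, export

# Toy table with 2 countries and 2 sectors (hypothetical values).
Z_toy = np.arange(16).reshape(4, 4)
domestic_toy, export_toy = split_domestic_export(Z_toy, n_countries=2, n_sectors=2)
```

In this sketch the row sums of each country block give the internal flux of a sector, and the remaining row entries give its export flux, which are the two quantities compared in the correlation analyses.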

Sector Names
Throughout the paper we use shortened versions of the WIOT sector names. The Supplementary Information provides the mapping between these shortened names and the WIOD ones.

Correlation Significance Assessment for Sectors: Outlier Mitigation (Analysis I)
Our aim is to study the correlation between the internal production of a country and its export. We call these quantities correlated if the p-value associated with the distribution of the correlation coefficients is lower than T = 0.15. In this way we can exclude an accidental correlation between internal production and export. As a first step, we need a method that mitigates the effect of outliers in a systematic way, so that they do not influence the value of the correlation coefficient. For this reason, we perform a bootstrap using only 80% of the data, randomly drawn, and we calculate the correlation coefficient on these data only. We repeat this operation 2,500 times, which yields an empirical distribution measuring the typical range of the correlation coefficients (as shown in Figure 3). To assess the statistical robustness of the correlation coefficients, we calculate a p-value for each bootstrapped subset, so that we have 2,500 p-values. Each p-value is estimated by reshuffling the data of the bootstrapped subset 5,000 times and by calculating the percentile of the subset's correlation coefficient with respect to the distribution of correlations obtained from this random ensemble. If the 70th percentile of the distribution of p-values is below T, the sector is said to be correlated; otherwise it is not.
It is worth noting that this approach is robust against noise, thanks to the bootstrapping and to the calculation of the p-values on the bootstrapped data. This is necessary when dealing with this kind of data, which naturally contains outliers and a noise component.
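The procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the paper: we assume that the 80% subsets are drawn without replacement, that the correlation is the Pearson coefficient, and that the p-value is one-sided (the fraction of reshuffled correlations at least as large as the observed one). The function name `bootstrap_correlation_test` and its parameters are hypothetical.

```python
import numpy as np

def bootstrap_correlation_test(x, y, threshold=0.15, n_boot=2500,
                               n_shuffle=5000, frac=0.8, pct=70, seed=0):
    """Return (coefficients, p-values, is_correlated) for paired data x, y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = int(round(frac * x.size))          # size of each subset (80% by default)
    coeffs = np.full(n_boot, np.nan)
    pvals = np.ones(n_boot)
    for b in range(n_boot):
        # Assumption: subsets are drawn without replacement.
        idx = rng.choice(x.size, size=k, replace=False)
        xs = x[idx] - x[idx].mean()
        ys = y[idx] - y[idx].mean()
        denom = np.linalg.norm(xs) * np.linalg.norm(ys)
        if denom == 0.0:
            continue                       # constant subset: keep p-value at 1
        coeffs[b] = xs @ ys / denom        # Pearson coefficient of the subset
        # Null ensemble: reshuffle ys independently in each of n_shuffle rows.
        shuffled = rng.permuted(np.tile(ys, (n_shuffle, 1)), axis=1)
        null = shuffled @ xs / denom       # reshuffling preserves mean and norm
        # Assumption: one-sided p-value for a positive correlation.
        pvals[b] = np.mean(null >= coeffs[b])
    is_correlated = bool(np.percentile(pvals, pct) < threshold)
    return coeffs, pvals, is_correlated
```

With the default parameters each call evaluates 2,500 × 5,000 reshuffled correlations, so smaller values of `n_boot` and `n_shuffle` are convenient for a quick check, for example `bootstrap_correlation_test(x, y, n_boot=50, n_shuffle=200)`.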

Correlation Significance Assessment for Countries: Outlier Mitigation (Analysis I)
When we study the correlation between internal production and export for each country, we deal directly with fluxes instead of shares. Indeed, in this case we do not mix data from different countries. The values we finally compare are the logarithm of the internal flux, log(I_sc), and the logarithm of the export flux, log(E_sc).
The procedure we adopted to establish the correlation is exactly the same as the one used for sectors. We take the 70th percentile of the distribution of p-values: if it is lower than T, export is a good probe of internal production for that country; otherwise it is not.

DATA AVAILABILITY STATEMENT
The WIOD dataset is publicly available and can be downloaded from the project website (www.wiod.org/home).