Enhanced Gravity Model of trade: reconciling macroeconomic and network models

The structure of the International Trade Network (ITN), whose nodes and links represent world countries and their trade relations respectively, affects key economic processes worldwide, including globalization, economic integration, industrial production, and the propagation of shocks and instabilities. Characterizing the ITN via a simple yet accurate model is an open problem. The traditional Gravity Model (GM) successfully reproduces the volume of trade between connected countries, using macroeconomic properties such as GDP, geographic distance, and possibly other factors. However, it predicts a network with complete or homogeneous topology, thus failing to reproduce the highly heterogeneous structure of the ITN. On the other hand, recent maximum-entropy network models successfully reproduce the complex topology of the ITN, but provide no information about trade volumes. Here we integrate these two currently incompatible approaches via the introduction of an Enhanced Gravity Model (EGM) of trade. The EGM is the simplest model combining the GM with the network approach within a maximum-entropy framework. Via a unified and principled mechanism that is transparent enough to be generalized to any economic network, the EGM provides a new econometric framework wherein trade probabilities and trade volumes can be separately controlled by any combination of dyadic and country-specific macroeconomic variables. The model successfully reproduces both the global topology and the local link weights of the ITN, parsimoniously reconciling the conflicting approaches. It also indicates that the probability that any two countries trade a certain volume should follow a geometric or exponential distribution with an additional point mass at zero volume.


I. INTRODUCTION
The International Trade Network (ITN) is the complex network of trade relationships existing between pairs of countries in the world. The nodes (or vertices) of the ITN represent nations and the edges (or links) represent their (weighted) trade connections. In a global economy extending across national borders, there is increasing interest in understanding the mechanisms involved in trade interactions and how the position of a country within the ITN may affect its economic growth and integration [1][2][3][4][5]. Moreover, in the wake of recent financial crises the interconnectedness of economies has become a matter of concern as a source of instability [6]. As the modern architecture of industrial production extends over multiple countries via geographically wider supply chains, sudden changes in the exports of a country (due, e.g., to unpredictable financial, environmental, technological or even political circumstances) can rapidly propagate to other countries via the ITN. The assessment of the associated trade risks requires detailed information about the underlying network structure [7]. In general, among the possible channels of interaction among countries, trade plays a major role [2][3][4].
The above considerations imply that the empirical structure of the ITN plays a crucial role in increasingly many economic phenomena of global relevance. It is therefore becoming more and more important to characterize the ITN via simple but accurate models that identify both the basic ingredients and the mathematical expressions required to accurately reproduce the details of the empirical network structure. Reliable models of the ITN can better inform economic theory, foreign policy, and the assessment of trade risks and instabilities worldwide.
In this paper, we emphasize that current models of the ITN have strong limitations, and that none of them is satisfactory, from either a theoretical or a phenomenological point of view. We point out equally strong (and largely complementary) problems affecting on one hand traditional macroeconomic models, which focus on the local weight of the links of the network, and on the other hand more recent network models, which focus on the existence of links, i.e. on the global topology of the ITN. We then introduce a new model of the ITN that preserves all the good ingredients of the models proposed so far, while at the same time overcoming the limitations of each of them. The model can be easily generalized to any (economic) network and provides an explicit specification of the full probability distribution of the volume of trade connecting a given pair of countries, fixing an otherwise arbitrary choice in previous approaches. This distribution is found to be either geometric (for discrete volumes) or exponential (for continuous volumes), with an additional point mass at zero volume. This feature, which is different from all previous specifications of international trade models, is shown to replicate both the local trade volumes and the global topology of the empirical ITN remarkably well.

II. PRELIMINARIES: BUILDING BLOCKS OF THE MODEL
Before we fully specify our model, we preliminarily identify its building blocks by reviewing the strengths and weaknesses of the two main modelling frameworks adopted so far.

A. Gravity models of trade
We start by discussing traditional macroeconomic models of international trade. These models have mainly focused on the volume (i.e. the value, e.g. in dollars) of trade between countries, largely because the economic literature perceives trade volumes as being a priori more informative than the topology of the ITN: the striking heterogeneity of trade volumes observed between different pairs of countries is clearly not captured by a purely 'binary' description where all connections are effectively given the same weight. Based on this argument, emphasis has been put on explaining the (expected) volume of trade between two countries, given certain dyadic and country-specific macroeconomic properties.
Jan Tinbergen, the physics-educated¹ Dutch economist who was awarded the first Nobel memorial prize in economics, introduced the so-called Gravity Model (GM) of trade [8]. The GM aims at inferring the volume of trade from the knowledge of Gross Domestic Product, mutual geographic distance, and possibly additional dyadic factors of macroeconomic relevance [9,10]. In one of its simplest forms, the GM predicts that, if $i$ and $j$ label two different countries ($i, j = 1, \dots, N$ where $N$ is the total number of countries in the world), then the expected volume of trade from $i$ to $j$ is

$$\langle w_{ij} \rangle = c\, \frac{\mathrm{GDP}_i^{\alpha}\, \mathrm{GDP}_j^{\beta}}{R_{ij}^{\gamma}} \tag{1}$$

where $\mathrm{GDP}_k$ is the Gross Domestic Product of country $k$, $R_{ij}$ is the geographic distance between countries $i$ and $j$, and $c, \alpha, \beta, \gamma$ are free global parameters to be estimated.
In the above directed specification of the GM, the flows $w_{ij}$ and $w_{ji}$ can be different. An analogous undirected specification exists, where the volumes of trade from $i$ to $j$ and from $j$ to $i$ are added together into a single value $w_{ij} = w_{ji}$ of bilateral trade. In the latter case, Eq. (1) still holds but with the symmetric choice $\alpha = \beta$. With this in mind, we will keep our discussion entirely general throughout the paper and, unless otherwise specified, allow all quantities to be interpreted either as directed or as undirected. Only in our final empirical analysis will we adopt an undirected description for simplicity.
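As a minimal illustration of Eq. (1), the sketch below evaluates the expected volume for one pair of countries, assuming the standard form in which the product of GDP powers is divided by a power of the distance. The function name `gravity_expected_volume` and all numerical inputs are hypothetical and chosen only for illustration; they are not estimates taken from Table I.

```python
def gravity_expected_volume(gdp_i, gdp_j, distance, c, alpha, beta, gamma):
    """Expected trade volume from i to j in the simplest Gravity Model.

    Assumed form (illustrative): c * GDP_i^alpha * GDP_j^beta / R_ij^gamma.
    """
    if gdp_i <= 0 or gdp_j <= 0 or distance <= 0:
        raise ValueError("GDPs and distance must be positive")
    return c * gdp_i ** alpha * gdp_j ** beta / distance ** gamma


# Hypothetical inputs, not taken from the paper's data.
# Directed specification (alpha != beta): the two directions generally differ.
w_ij = gravity_expected_volume(2.0, 8.0, 4.0, c=1.0, alpha=1.0, beta=0.5, gamma=1.0)
w_ji = gravity_expected_volume(8.0, 2.0, 4.0, c=1.0, alpha=1.0, beta=0.5, gamma=1.0)

# Undirected specification (alpha == beta): the expectation is symmetric.
w_sym_ij = gravity_expected_volume(2.0, 8.0, 4.0, c=1.0, alpha=1.0, beta=1.0, gamma=1.0)
w_sym_ji = gravity_expected_volume(8.0, 2.0, 4.0, c=1.0, alpha=1.0, beta=1.0, gamma=1.0)
```

With $\alpha \neq \beta$ the two directed expectations differ, while the symmetric choice $\alpha = \beta$ makes them coincide, as required by the undirected description.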
¹ Jan Tinbergen studied physics in Leiden, where he carried out a PhD under the supervision of the theoretical physicist Paul Ehrenfest. Tinbergen defended his thesis in 1929, and then became a leading economist. He was awarded the first Nobel memorial prize in economics in 1969.
More complicated variants of Eq. (1) use additional factors (with associated free parameters) either favoring or resisting trade [9,10]. Like the GDP and geographic distances, these factors can be either country-specific (e.g. population) or dyadic (e.g. common currency, trade agreements, shared borders, common language, etc.). In general, if we collectively denote with $\vec{n}_i$ the vector of all node-specific factors and with $\vec{D}_{ij}$ the vector of all dyad-specific factors used (note that these vectors may have different dimensionality), Eq. (1) can be generalized to

$$\langle w_{ij} \rangle = F_{\vec{\phi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij}) \tag{2}$$

where the functional form of $F_{\vec{\phi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})$ need not be of the same type as in Eq. (1), and $\vec{\phi}$ is a vector containing all the free parameters of the model (like $c, \alpha, \beta, \gamma$ for the particular case above). Indeed, although in this paper we focus on the GM applied to the international trade network, our discussion equally applies to many other models of (socio-economic) networks as well. For instance, the recently proposed Radiation Model (RM) [11] is also described by Eq. (2), where $\vec{n}_i$ and $\vec{D}_{ij}$ include certain geographical and demographical variables. Our following discussion applies to both the GM and the RM, as well as any other model described by Eq. (2). Similarly, it does not only apply to trade networks, since both the GM and the RM have been successfully applied to other systems as well, including mobility and traffic flows [11][12][13][14], communication networks [15], and migration patterns [16] (the latter representing, to our knowledge, the earliest application of the GM to a socio-economic system, dating back to 1889 [17]).
It is generally accepted that the expected trade volumes postulated by the GM, already in its simplest form given by Eq. (1), are in good agreement with the observed flows between trading countries. To illustrate this result, in Fig. 1 we show a typical log-log plot comparing the empirical volume of the realized (bilateral) international trade flows with the corresponding expected values calculated under the GM as defined in Eq. (1) (with parameters calculated as reported in Table I). The figure shows the typical qualitative consistency between the GM and the empirical non-zero trade volumes. However, it should be noted that, while Eqs. (1) and (2) define the expected value of $w_{ij}$, the full probability distribution from which this expected value is calculated is not specified, and actually depends on how the model is implemented in practice. In the GM case, the distribution is chosen to be either Gaussian (corresponding to additive noise, in which case the expected weights can be fitted to the observed ones via a simple linear regression [18,19]), log-normal (corresponding to multiplicative noise and requiring a linear regression of log-transformed weights [20], as we did to produce Fig. 1 and Table I), Poisson [20], or more sophisticated [21] (see [22] for a review). The arbitrariness of the weight distribution already highlights a fundamental weakness of the traditional formulation of the model. Moreover, for both additive and multiplicative Gaussian noise, the model can produce undesired negative values.

FIG. 1. Log-log plot comparing the empirical volume (y-axis) of all non-zero bilateral trade flows in the ITN with the corresponding expected volume (x-axis) predicted by the Gravity Model defined in Eq. (1), with parameters estimated as reported in Table I. Top left: year 1970, top right: year 1980, bottom left: year 1990, bottom right: year 2000. The black line is the identity line corresponding to the ideal, perfect match that would be achieved if the empirical weights were exactly equal to their expected values, i.e. in complete absence of randomness.

A related but more fundamental limitation of the GM is that, at least in its simplest and most natural implementations, it cannot generate zero volumes, thereby predicting a fully connected network [22][23][24]. This means that the GM can be fitted only to the non-zero weights, i.e. the volumes existing between pairs of connected countries. If used in this way, the model effectively disregards the empirical structure of the network, both as input (thus making predictions on the basis of incomplete data) and as output (thus failing to reproduce the topology). Operatively, the GM can be used only after the presence of a trade link has been established independently [22]. As observed in [21], "Omitting zero-flow observations implies that we loose information on the causes of (very) low trade", because any fit to positive-only flows would significantly underestimate the effects of factors that diminish trade. This problem is particularly critical since roughly half of the possible links are found not to be realized in the real ITN [25][26][27][28]. Clearly, the same problem holds for the RM and any more general model of the form specified in Eq. (2). While there are variants and extensions of the GM that do generate zero weights and a realistic link density (e.g. the so-called Poisson pseudo-maximum likelihood models [20] and 'zero-inflated' gravity models [21]), these variants systematically fail to reproduce the observed topology [10,22]. In other words, while these models can generate the correct number of connections, they tend to put many of the latter in the 'wrong place' in the network. Indeed, even in its generalized forms, the GM predicts a largely homogeneous network structure, while the empirical topology of the ITN is much more heterogeneous and complex [22,23]. Established empirical signatures of this heterogeneity include a broad distribution of the degree (number of connections) and the strength (total trade volume) of countries [25][26][27][28][29][30][31][32][33][34][35], the rich-club phenomenon (whereby well-connected countries are also connected to each other) [36,37], strong clustering and (dis)assortative patterns [26,27]. These highly skewed structural properties are remarkably stable over time. However, they are not replicated by any current version of the GM [22].
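The log-linear fit mentioned above (a linear regression of log-transformed weights) can be sketched as follows for the undirected case $\alpha = \beta$. This is only an illustrative ordinary-least-squares implementation using the Python standard library, not the procedure actually used for Table I; the helper names and the synthetic data are hypothetical. Note how zero flows must be discarded before taking logarithms, which is precisely the loss of information discussed in the text.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_log_linear_gravity(flows):
    """Least-squares fit of log w = log c + alpha*log(GDP_i*GDP_j) - gamma*log R.

    flows: list of (w, gdp_i, gdp_j, distance). Returns (c, alpha, gamma).
    """
    rows, targets = [], []
    for w, gdp_i, gdp_j, dist in flows:
        if w <= 0:
            continue  # log(0) is undefined: zero flows are dropped
        rows.append([1.0, math.log(gdp_i * gdp_j), -math.log(dist)])
        targets.append(math.log(w))
    k = 3
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(k)]
    log_c, alpha, gamma = solve_linear_system(xtx, xty)
    return math.exp(log_c), alpha, gamma


# Synthetic, noise-free flows generated with c=2, alpha=0.8, gamma=1.2,
# plus one zero flow that the fit silently ignores.
pairs = [(1.0, 2.0, 1.5), (3.0, 4.0, 3.0), (5.0, 1.0, 2.0),
         (2.0, 7.0, 8.0), (10.0, 3.0, 4.0)]
flows = [(2.0 * (gi * gj) ** 0.8 / d ** 1.2, gi, gj, d) for gi, gj, d in pairs]
flows.append((0.0, 1.0, 1.0, 1.0))
c_hat, alpha_hat, gamma_hat = fit_log_linear_gravity(flows)
```

On noise-free data the fit recovers the generating parameters, while the zero flow contributes nothing to the estimate.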

B. Network models of trade
As we mentioned at the beginning, many processes of great economic relevance crucially depend on the large-scale topology of the ITN. In light of this result, the sharp contrast between the observed topological complexity of the ITN and the homogeneity of the network structure generated by the GM (including its extensions) calls for major improvements in the modelling approach. In particular, in assessing the performance of a model of the ITN, emphasis should be put on how reliably the (global) empirical network structure, besides the (local) volume of trade, is replicated. In the network science literature, successful models of the ITN have been derived from the Maximum Entropy Principle [24][25][26][27][28][38][39][40][41][42][43][44]. These models construct ensembles of random networks that have some desired topological property (taken as input from empirical data) and are maximally random otherwise. Typically, the constrained properties are chosen to be the degrees and/or the strengths of all nodes. In this way the models can perfectly replicate the observed strong heterogeneity of these purely local properties, and at the same time illustrate its immediate (i.e. prior to invoking any other more complicated network formation mechanism) structural effects on any higher-order topological property of the network. In the different context of financial networks, where the main challenge is a reliable inference of the unobserved topology of a network (typically of interconnected firms or banks) starting from partial, node-aggregate information [45], maximum-entropy models have recently turned out to deliver the best-performing reconstruction methods so far [43][44][45].
In general, different choices of the constrained properties lead to different degrees of agreement between the model and the data. This can generate intriguing and counter-intuitive insight about the structure of the ITN. For instance, contrary to what naive economic reasoning would predict, it turns out that the knowledge of purely binary local properties (e.g. node degrees) can be more informative than the knowledge of the corresponding weighted properties (e.g. node strengths). Indeed, while the binary network reconstructed only from the knowledge of the degrees of all countries is found to be topologically very similar to the real ITN, the weighted network reconstructed only from the strengths of all countries is found to be much denser and very different from the real network [26][27][28]. This is somewhat surprising, given that the economic literature largely postulates that weighted properties are per se more informative than the corresponding binary ones.
The solution to this apparent paradox lies in the fact that, while the knowledge of the entire weighted network is necessarily more informative than that of its binary projection (in accordance with economic postulates), the knowledge of certain marginal properties of the weighted network can be unexpectedly less informative than the knowledge of the corresponding marginal properties of the binary network. In fact, it turns out that the maximum-entropy model reproduces the empirical weighted network of international trade satisfactorily if the degrees of countries are specified in addition to their strengths, and fails to do so if they are not [27,40,41].
An important take-home message is that, in contrast with the mainstream literature, models of the ITN should aim at reproducing not only the strength of countries (as the GM automatically does by approximately reproducing all non-zero weights), but also their degree (i.e. the number of trade partners) [26][27][28][41]. In addition to these studies, an alternative approach, the Linear Gravity Transportation Model (LGTM), has also demonstrated the importance of the ITN topology [46]. In this model the monetary flow is balanced for each country (node) based on the number of trade partners (degree). The model produces expectations of the GDP of countries that are consistent with real data, using both the volume of trade flows and the topology of the ITN as input. These studies indicate that, in order to devise improved models of the ITN, one should include the degrees, which are purely topological properties, among the main target quantities to replicate. This is the guideline we will follow in this paper.
Unlike the GM, maximum-entropy models of trade are a priori non-explanatory, i.e. they take as input structural properties (as opposed to explanatory economic factors) to explain other structural properties. However, they can in fact be used to select a posteriori an explicit, empirically validated functional dependence of the structure of the ITN on underlying explanatory factors. For models with country-specific constraints, this operation can be carried out as follows. Mathematically, controlling for node-specific properties is realized by assigning one or more Lagrange multipliers, also known as 'hidden variables' or 'fitness parameters' $\vec{x}_i$, to each node. If a certain choice of local constraints is found to replicate the higher-order properties of the real-world network satisfactorily, then one can look for an empirical relationship between the values of the associated hidden variables and those of candidate non-topological, country-specific factors of the type $\vec{n}_i$, like the GDP or total import/export. If the hidden variables are indeed (at least approximately) found to be functions of some country-specific factors (i.e. if $\vec{x}_i \approx f(\vec{n}_i)$), then one can replace $\vec{x}_i$ with $f(\vec{n}_i)$ in the maximum-entropy model, thus reformulating the latter as a model with explanatory variables (i.e. 'regressors') of trade, precisely like the GM. Already in one of the earliest studies on the ITN topology [30], the approach outlined above led to the definition of a GDP-driven model for the binary structure of the network, where $x_i \propto \mathrm{GDP}_i$ (i.e., in this case $\vec{x}_i$ is taken to be one-dimensional). The model, which is a reformulation of a maximum-entropy model for binary networks with given degrees, predicts that the probability of a trade connection existing from country $i$ to country $j$ is

$$p_{ij} = \frac{\delta\, \mathrm{GDP}_i\, \mathrm{GDP}_j}{1 + \delta\, \mathrm{GDP}_i\, \mathrm{GDP}_j} \tag{3}$$

where $\delta$ is a free parameter that allows one to reproduce the empirical link density. The model has been tested successfully in multiple ways [24,25,30,32,38].
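To make the role of $\delta$ concrete, the following sketch evaluates the GDP-driven connection probability of Eq. (3), assuming the form $p_{ij} = \delta\,\mathrm{GDP}_i\,\mathrm{GDP}_j / (1 + \delta\,\mathrm{GDP}_i\,\mathrm{GDP}_j)$, and tunes $\delta$ by bisection so that the expected number of undirected links equals an observed value. The calibration criterion, the function names and the toy GDP values are illustrative assumptions, not the procedure or the data of the cited studies.

```python
def connection_probability(gdp_i, gdp_j, delta):
    """Probability of a link between i and j in the GDP-driven binary model."""
    z = delta * gdp_i * gdp_j
    return z / (1.0 + z)


def expected_link_count(gdps, delta):
    """Expected number of undirected links (sum of p_ij over pairs i < j)."""
    n = len(gdps)
    return sum(connection_probability(gdps[i], gdps[j], delta)
               for i in range(n) for j in range(i + 1, n))


def calibrate_delta(gdps, observed_links, iterations=200):
    """Find delta such that the expected link count matches the observed one."""
    n_pairs = len(gdps) * (len(gdps) - 1) // 2
    if not 0 < observed_links < n_pairs:
        raise ValueError("observed_links must lie strictly between 0 and N(N-1)/2")
    lo, hi = 0.0, 1.0
    while expected_link_count(gdps, hi) < observed_links:
        hi *= 2.0  # the expected count increases monotonically with delta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_link_count(gdps, mid) < observed_links:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy example with hypothetical GDP values (arbitrary units).
toy_gdps = [1.0, 2.0, 3.0, 4.0]
delta_hat = calibrate_delta(toy_gdps, observed_links=3)
density_check = expected_link_count(toy_gdps, delta_hat)
```

Because each $p_{ij}$ increases monotonically with $\delta$, the expected link count has a unique root, which is why a simple bisection suffices in this sketch.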
The GM in Eq. (1) and the maximum-entropy model in Eq. (3) have complementary strengths and weaknesses, the former being a good model for non-zero volumes (while being a bad model for the topology) and the latter being a good model for the topology (while providing no information about trade volumes). An attempt to reconcile these two complementary and currently incompatible approaches has been recently proposed via the definition of an extension of the maximum-entropy model to the case of weighted networks [42]. Since, as we mentioned, a maximum-entropy model of weighted networks with given strengths and degrees [40] can correctly replicate many structural properties of the ITN [41], it makes sense to reformulate such a model as an economically inspired model of the ITN. Indeed, like in the binary case, the hidden variables enforcing the constraints are found to be strongly correlated with the GDP, thus allowing one to express both $p_{ij}$ and $\langle w_{ij} \rangle$ as functions of the GDP [42]. The resulting model is confirmed to be in good accordance with both the topology and the volumes observed in the real ITN.
Unfortunately, in the above approach the choice of country-specific constraints (degrees and strengths) only allows for regressors that have a corresponding country-specific nature. This makes the model in Ref. [42] incompatible with the inclusion of dyadic variables of the type $\vec{D}_{ij}$ and represents a strong limitation for (at least) two reasons. Firstly, one of the main lessons learnt from the traditional GM is that the addition of geographic distances improves the fit to the empirical volumes significantly. Indeed, in the light of the large body of knowledge accumulated in the international economics literature, it is hard to imagine a realistic and economically meaningful model of international trade that does not allow for simple pair-wise quantities controlling for trade costs and incentives, including geography [9,10]. Secondly, even if the structure of the ITN can be replicated satisfactorily in terms of the 'GDP-only' model defined in Eq. (3) [25,30,32], recent analyses have found evidence that certain metric (although not necessarily geographic²) distances do also play a role in determining the topology of the ITN [47]. Together, these two pieces of evidence call for an inclusion of dyadic factors in $\langle w_{ij} \rangle$ and $p_{ij}$, and highlight a limitation of current maximum-entropy models based only on country-specific constraints.
Combining all the above considerations, it is clear that an improved model of the ITN should aim at retaining the realistic trade volumes postulated by models based on Eq. (2) (including the GM, the RM, and possibly many more), while combining them with a realistic network topology generated by (extensions of) maximum-entropy models. Such a model should also aim at providing the full probability distribution, and not only the expected values as in Eq. (1), of trade flows and, unlike the GDP-only model in Eq. (3) [25] or its current weighted extension [42], allow for the inclusion of both dyadic and node-specific macroeconomic factors.

III. THE ENHANCED GRAVITY MODEL OF INTERNATIONAL TRADE
In this Section, we introduce what we call the Enhanced Gravity Model (EGM) of trade. The EGM mathematically formalizes the two ingredients that, in the light of the previous discussion, any 'good' model of economic networks should feature: namely, realistic (trade) volumes and a realistic topology, both controllable by macroeconomic factors.

A. A single model for topology and weights
The first lesson we have learnt is that Eq. (2) is successful in reproducing link weights only after the existence of the links themselves has been preliminarily established. This implies that Eq. (2), as a model of real-world trade flows, is actually unsatisfactory and should rather be reformulated as a conditional expectation of the weight $w_{ij}$, given that $w_{ij} > 0$. In other words, if $a_{ij}$ denotes the entry of the adjacency matrix $A = \Theta(W)$ of the ITN, defined via the step function as $a_{ij} = \Theta(w_{ij})$ (i.e. $a_{ij} = 1$ if $w_{ij} > 0$ and $a_{ij} = 0$ if $w_{ij} = 0$), an improved model should be such that Eq. (2) is replaced by

$$\langle w_{ij} \,|\, a_{ij} = 1 \rangle = F_{\vec{\phi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij}) \tag{4}$$

where $\langle w_{ij} \,|\, a_{ij} = 1 \rangle$ is the conditional expected weight of the trade link from country $i$ to country $j$, given that such a link exists. This operation ensures that, whatever the new model looks like, its predictions for the expected trade volume between connected pairs of countries remain identical to the ones proposed in more traditional macroeconomic models. For instance, choosing $F_{\vec{\phi}}$ as in Eq. (1) allows us to retain (in almost intact form) all the empirical knowledge that has accumulated in the econometrics literature since Jan Tinbergen's introduction of the GM. An important difference, however, is that in our model the trade volumes will be drawn from a different probability distribution.
The second lesson we have learnt is that, in analogy with Eq. (4), Eq. (3) should be generalized to allow for both dyadic ($\vec{D}_{ij}$) and node-specific ($\vec{n}_i$) factors as follows:

$$p_{ij} = \frac{G_{\vec{\psi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})}{1 + G_{\vec{\psi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})} \tag{5}$$

where a crucial requirement is that $G$ can in general be different from $F$ in Eq. (4) and, correspondingly, the vector $\vec{\psi}$ of parameters can be different from $\vec{\phi}$. Note that, since $p_{ij}$ is monotonic in $G$, the above expression is entirely general, i.e. we have put no restriction on the functional form of $p_{ij}$. It is also worth noticing that the explanatory factors used in Eqs. (4) and (5) need not coincide. However, to avoid using different symbols for the arguments of the two functions, we adopt the convention that $\vec{D}_{ij}$ and $\vec{n}_i$ denote the sets of all factors used as arguments of either $F$ or $G$, and that these functions can have flat (i.e. no) dependence on some of their arguments. For instance, Eq. (5) reduces to Eq. (3) by setting $\vec{n}_i = \mathrm{GDP}_i$ and assuming flat dependence on $\vec{D}_{ij}$, or it reduces to the hyperbolic model in Ref. [47] by setting $\vec{D}_{ij}$ equal to the hyperbolic distance and assuming flat dependence on $\vec{n}_i$.

We want our model to produce both Eq. (4) as the desired (gravity-like) conditional expectation for link weights and Eq. (5) as a realistic expected topology. To do so, we introduce the full probability $P(W)$ that the model produces a weighted network specified by the $N \times N$ matrix $W$ with entries $w_{ij}$. We are free to choose whether $w_{ij}$ takes non-negative integer values (in which case $P(W)$ is a multivariate Probability Mass Function, or PMF) or non-negative real values (in which case $P(W)$ is a multivariate Probability Density Function, or PDF). The distribution $P(W)$ is the key quantity that fully specifies the model and determines both the topology and the link weights of the ITN. From $P(W)$, focusing on a single pair $i, j$ of nodes and integrating out all other pairs, we can define the dyadic distribution $q_{ij}(w)$ indicating the probability (mass or density) that $w_{ij}$ takes the particular value $w$. Note that the event $w_{ij} > 0$ indicates the presence of a trade link (i.e. $a_{ij} = 1$). By contrast, the event $w_{ij} = 0$ indicates the absence of a trade link (i.e. $a_{ij} = 0$) but is also included as a possible outcome in $q_{ij}(w)$. The normalization condition is therefore $\sum_{w \ge 0} q_{ij}(w) = 1$ (for integer weights) or $\int_{w \ge 0} dw\, q_{ij}(w) = 1$ (for continuous weights, in which case we anticipate that $q_{ij}(w)$ will have a delta-like point mass at $w = 0$) for all $i, j$. Note that we are not assuming independence of the trade volumes $w_{ij}$ and $w_{kl}$ between two distinct country pairs, or equivalently the factorization of $P(W)$ into the product $\prod_{i,j} q_{ij}(w_{ij})$ of dyadic probabilities. However, we will later find that the desired model has precisely this independence property. Importantly, unlike in the traditional GM, in our approach dyadic independence is a consequence and not a postulate.
We now look for the form of $q_{ij}(w)$ that enforces both Eqs. (4) and (5). Let us consider the latter first. In terms of $q_{ij}(w)$, the probability $p_{ij}$ that $i$ and $j$ are connected (irrespective of the volume of trade) is given by the complement of the probability $q_{ij}(0)$ that they are not connected, i.e.

$$p_{ij} = 1 - q_{ij}(0) \tag{6}$$

where, for real-valued weights, $q_{ij}(0)$ denotes the point mass, i.e. the magnitude of the delta-like probability density function $q_{ij}(w)$, at $w = 0$. Imposing that Eq. (6) has the form dictated by Eq. (5) leads to the following unique choice for $q_{ij}(0)$:

$$q_{ij}(0) = \frac{1}{1 + G_{\vec{\psi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})} \tag{7}$$

We now relate $q_{ij}(w)$ to Eq. (4) in a similar manner. The expected trade volume, irrespective of whether a link exists, is

$$\langle w_{ij} \rangle = \sum_{w > 0} w\, q_{ij}(w) \tag{8}$$

(written here for integer weights, with the sum replaced by an integral for continuous weights; note that the event $w = 0$ does not contribute to the above quantity). On the other hand the conditional probability that $w_{ij}$ equals $w$, given that the link is realized ($w > 0$), is

$$q_{ij}(w \,|\, w > 0) = \frac{q_{ij}(w)}{1 - q_{ij}(0)} = \frac{q_{ij}(w)}{p_{ij}} \tag{9}$$

and its expected value gives the conditional expectation of the link weight, given that the link exists:

$$\langle w_{ij} \,|\, a_{ij} = 1 \rangle = \sum_{w > 0} w\, q_{ij}(w \,|\, w > 0) = \frac{\langle w_{ij} \rangle}{p_{ij}} \tag{10}$$

Setting Eq. (10) equal to Eq. (4) leads to

$$\langle w_{ij} \rangle = p_{ij}\, F_{\vec{\phi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij}) = \frac{F_{\vec{\phi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})\, G_{\vec{\psi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})}{1 + G_{\vec{\psi}}(\vec{n}_i, \vec{n}_j, \vec{D}_{ij})} \tag{11}$$

Equation (11) carries an important message. It reveals that, while a superficial inspection of Eq. (8) might suggest that the expected trade volume $\langle w_{ij} \rangle$ is independent of the topology of the ITN, i.e. of $q_{ij}(0)$ or equivalently $G$, this is actually not the case. In fact, $q_{ij}(0)$ is coupled to the other values $q_{ij}(w)$ (with $w > 0$) through the normalization condition manifest in Eq. (6). This necessarily implies that the topology of the ITN must have an immediate effect on the expected volume of trade between any two countries. This effect is rigorously quantified in Eq. (11), which shows that $\langle w_{ij} \rangle$ depends on both $F$ and $G$. This result confirms the inconsistency of the traditional GM defined in terms of Eq. (1) and of any of its extensions of the form given by Eq. (2). By contrast, the expected topology of the ITN is independent of the expected volumes of trade, since $p_{ij}$ depends on $G$ but not on $F$. This simple but, to the best of our knowledge, previously unrecognized result highlights a nontrivial asymmetry between weights and topology in the ITN and, by extension, in any (economic) network described by our generic expressions involving $F$ and $G$. This basic finding provides a natural explanation for the aforementioned empirical observation that the topology of the ITN and several other networks can be satisfactorily reconstructed from aggregate local constraints [26,40], while the same result does not hold for the weighted structure of the same network(s) [27,28], unless topological information is explicitly included as an additional constraint [40,41].
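The asymmetry between weights and topology can be read off numerically. The sketch below assumes the relations $p_{ij} = G/(1+G)$ from Eq. (5) and $\langle w_{ij} \rangle = p_{ij} F$ from Eq. (11); the function names and the values of $F$ and $G$ are hypothetical and serve only to illustrate the point.

```python
def link_probability(g):
    """Connection probability p = G / (1 + G); it does not involve F."""
    if g < 0:
        raise ValueError("G must be non-negative")
    return g / (1.0 + g)


def unconditional_expected_volume(f, g):
    """Expected volume <w> = p * F, which involves both F and G."""
    return link_probability(g) * f


# Same conditional expectation F, two different topological propensities G:
# the unconditional expected volume changes with G.
volume_sparse = unconditional_expected_volume(10.0, 0.25)  # p = 0.2
volume_dense = unconditional_expected_volume(10.0, 4.0)    # p = 0.8

# Same G, two different values of F: the connection probability is unchanged.
p_low_f = link_probability(4.0)   # paired with F = 10, say
p_high_f = link_probability(4.0)  # paired with F = 1000, say
```

Changing $G$ at fixed $F$ moves the expected volume from 2 to 8 in this toy case, whereas no choice of $F$ can alter the connection probability.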

B. Maximum entropy construction
Equations (7) and (11) fix two important properties we require for $q_{ij}(w)$ and ultimately $P(W)$, but they do not specify these probability distributions uniquely. To do so, we invoke the Maximum-Entropy Principle to ensure that the functional form of $P(W)$ is maximally random, given the desired constraints. As is well known, this procedure is guaranteed to lead to the least biased inference, i.e. to introduce no unjustified 'hidden' assumption in picking a specific form of $P(W)$ [43,44]. In applying this method we will focus primarily on the case of integer-valued link weights, since this requirement matches the datasets in our analysis. The case of real-valued link weights is treated in the Appendix and the corresponding key results are briefly reported at the end of this Section.
We look for the form of $P(W)$ that maximizes the entropy functional

$$S[P] = -\sum_{W} P(W) \ln P(W) \tag{12}$$

(where the sum extends over all weighted graphs with $N$ nodes, non-negative integer link weights, and such that $w_{ii} = 0$ for all $i$) subject to the constraints specified by Eqs. (7) and (11). Since Eq. (7) is equivalent to Eq. (5), we select $\langle a_{ij} \rangle$ and $\langle w_{ij} \rangle$ (for all pairs $i \neq j$) as the two sets of constraints specifying our model. In this way, if we introduce $\alpha_{ij}$ and $\beta_{ij}$ as the (real-valued) Lagrange multipliers required to enforce the expected value of $a_{ij} = \Theta(w_{ij})$ and $w_{ij}$ respectively (where $\Theta(x) = 1$ if $x > 0$ and $\Theta(x) = 0$ otherwise), then the maximum-entropy problem becomes equivalent to one solved exactly in Ref. [48]. There, it was shown that upon introducing the so-called Hamiltonian

$$H(W) = \sum_{i,j} \left[ \alpha_{ij}\, \Theta(w_{ij}) + \beta_{ij}\, w_{ij} \right] \tag{13}$$

(representing a linear combination of the quantities whose expected value is being constrained) and the partition function $Z = \sum_{W} e^{-H(W)}$, the maximum-entropy probability $P^*(W)$ with constraints $\langle a_{ij} \rangle$ and $\langle w_{ij} \rangle$ is found to be

$$P^*(W) = \frac{e^{-H(W)}}{Z} = \prod_{i,j} q^*_{ij}(w_{ij}) \tag{14}$$

where, given $x_{ij} \equiv e^{-\alpha_{ij}} \in (0, +\infty)$ and $y_{ij} \equiv e^{-\beta_{ij}} \in (0, 1)$,

$$q^*_{ij}(w) = \frac{x_{ij}^{\Theta(w)}\, y_{ij}^{w}\, (1 - y_{ij})}{1 - y_{ij} + x_{ij} y_{ij}} \tag{15}$$

is the resulting (maximum-entropy) probability that the link from node $i$ to node $j$ carries a weight $w$. This probability is called the Bose-Fermi distribution, as it unifies the Bose-Einstein and Fermi-Dirac distributions encountered in quantum statistical physics [48]. We stress again that all our formulas apply to both directed and undirected representations of the network and, correspondingly, the sums and products over $i, j$ should be interpreted as $i \neq j$ in the directed case (where the pairs $i, j$ and $j, i$ are different) and as $i < j$ in the undirected one (where the pair $i, j$ is the same as the pair $j, i$). As we had anticipated, the factorization of $P^*(W)$ in terms of products of $q^*_{ij}(w)$ shows that, for this particular choice of the constraints, pairs of nodes turn out to be statistically independent as in the standard GM approach, even if we have not assumed this independence as a postulate in our approach.
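As a sanity check on the Bose-Fermi form, the sketch below evaluates the dyadic probability $q^*(w) = x^{\Theta(w)} y^{w} (1 - y) / (1 - y + x y)$ for integer $w \ge 0$, which follows from normalizing $e^{-\alpha\Theta(w) - \beta w}$ over the non-negative integers. The function name and the values of $x$ and $y$ are illustrative assumptions.

```python
def bose_fermi_pmf(w, x, y):
    """Probability that a link carries integer weight w >= 0.

    x = exp(-alpha) > 0 controls the existence of the link,
    y = exp(-beta) in (0, 1) controls its reinforcement.
    """
    if w < 0 or int(w) != w:
        raise ValueError("w must be a non-negative integer")
    if x <= 0 or not 0 < y < 1:
        raise ValueError("need x > 0 and 0 < y < 1")
    theta = 1 if w > 0 else 0
    return (x ** theta) * (y ** w) * (1.0 - y) / (1.0 - y + x * y)


# Hypothetical multipliers for a single pair of countries.
x_ij, y_ij = 2.0, 0.8

# Truncated sums: the neglected tail beyond w = 2000 is numerically negligible.
support = range(0, 2001)
total_probability = sum(bose_fermi_pmf(w, x_ij, y_ij) for w in support)
mean_weight = sum(w * bose_fermi_pmf(w, x_ij, y_ij) for w in support)
p_link = 1.0 - bose_fermi_pmf(0, x_ij, y_ij)
```

For these values the probabilities sum to one, the connection probability equals $x y / (1 - y + x y) = 8/9$, and the mean weight equals $p/(1 - y) = 40/9$.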
Importantly, while the constraints used in the maximum-entropy models of the ITN considered so far in the literature are observed topological properties (e.g. the degrees and/or the strengths of nodes), the constraints considered here are economically-driven expectations, namely Eqs. (5) and (11). This key step allows us to reconcile macroeconomic and network approaches within a generalized framework, and represents an important difference with respect to previous models. In particular, we use Eqs. (6), (8) and (10) to express $p_{ij}$, $\langle w_{ij}\rangle$ and $\langle w_{ij}|a_{ij} = 1\rangle$ in terms of $x_{ij}$ and $y_{ij}$ [48]:

$$p_{ij} = \frac{x_{ij} y_{ij}}{1 - y_{ij} + x_{ij} y_{ij}} \tag{16}$$

$$\langle w_{ij}\rangle = \frac{x_{ij} y_{ij}}{(1 - y_{ij})(1 - y_{ij} + x_{ij} y_{ij})} = \frac{p_{ij}}{1 - y_{ij}} \tag{17}$$

$$\langle w_{ij}|a_{ij} = 1\rangle = \frac{\langle w_{ij}\rangle}{p_{ij}} = \frac{1}{1 - y_{ij}} \tag{18}$$

The above expressions allow us to rewrite Eq. (15) as

$$q^*_{ij}(w) = \begin{cases} 1 - p_{ij} & w = 0, \\ p_{ij}\, y_{ij}^{w-1} (1 - y_{ij}) & w > 0. \end{cases} \tag{19}$$

Now, equating Eq. (16) to Eq. (5) and Eq. (17) to Eq. (11) (or, equivalently, Eq. (18) to Eq. (4)) allows us to find the values of $x_{ij}$ and $y_{ij}$ solving the original problem:

$$x_{ij} = \frac{G_\psi(n_i, n_j, D_{ij})}{F_\phi(n_i, n_j, D_{ij}) - 1} \tag{20}$$

$$y_{ij} = \frac{F_\phi(n_i, n_j, D_{ij}) - 1}{F_\phi(n_i, n_j, D_{ij})} \tag{21}$$

Inserting Eqs. (20) and (21) into Eq. (19), we finally get the explicit probability $q^*_{ij}(w)$ of any two countries trading a volume $w$, as a function of any choice of the factors $n_i$ and $D_{ij}$. In terms of conditional probabilities, the model becomes extremely simple: establishing a link from country $i$ to country $j$ is a Bernoulli trial with success probability $p_{ij}$ given by Eq. (5); if realized, this link acquires a weight $w$ with probability

$$q^*_{ij}(w|a_{ij} = 1) = y_{ij}^{w-1} (1 - y_{ij}), \qquad w > 0, \tag{22}$$

which is a geometric distribution representing the chance of $w - 1$ consecutive successes, each with probability $y_{ij}$, followed by a failure with probability $1 - y_{ij}$. The above result provides an insightful interpretation of the realized volumes in the model in terms of processes of link establishment and link reinforcement (see Discussion).
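As a numerical illustration of the construction above, the following minimal Python sketch evaluates the Bose-Fermi probability for a single pair of countries. It assumes the form $q^*(w) = x^{\Theta(w)} y^{w} (1-y)/(1-y+xy)$, a conditional mean $F > 1$ and a connection probability $p = G/(1+G)$; the function names and the numerical values of $F$ and $G$ are hypothetical and are not taken from the data.

```python
def bose_fermi_params(F, G):
    """Map the conditional mean F (> 1) and the odds G (> 0) to (x, y).

    Assumption of this sketch: y = (F - 1) / F and x = G / (F - 1).
    """
    if F <= 1 or G <= 0:
        raise ValueError("this sketch assumes F > 1 and G > 0")
    y = (F - 1.0) / F
    x = G / (F - 1.0)
    return x, y


def q_star(w, x, y):
    """Probability that a pair trades the non-negative integer volume w."""
    theta = 1 if w > 0 else 0
    return (x ** theta) * (y ** w) * (1.0 - y) / (1.0 - y + x * y)


# Illustrative values only (not fitted to any dataset).
F, G = 5.0, 0.25
x, y = bose_fermi_params(F, G)

p_link = 1.0 - q_star(0, x, y)                             # connection probability
total = sum(q_star(w, x, y) for w in range(2000))          # normalization check
mean_w = sum(w * q_star(w, x, y) for w in range(2000))     # unconditional mean

print(p_link, total, mean_w)
```

With these inputs the connection probability equals $G/(1+G)$ and the unconditional mean equals $p\,F$, up to the negligible truncation of the sums.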

C. Maximum-Likelihood parameter estimation
We now take an econometric perspective and discuss how the model parameters can be chosen to optimally fit a specific empirical instance of the network. To this end, we use the Maximum Likelihood (ML) principle applied to network models [38]. If $\mathbf{W}^*$ denotes the weight matrix (with entries $w^*_{ij}$) of the empirical network, our model generates this particular matrix with probability $P^*(\mathbf{W}^*)$. We therefore define the log-likelihood function as

$$\mathcal{L}(\phi, \psi) \equiv \ln P^*(\mathbf{W}^*) = \sum_{i,j} \left[ w^*_{ij} \ln \frac{F_\phi - 1}{F_\phi} - a^*_{ij} \ln (F_\phi - 1) + a^*_{ij} \ln G_\psi - \ln (1 + G_\psi) \right]$$

(where $a^*_{ij} = \Theta(w^*_{ij})$ and we have dropped the dependence of $F$ and $G$ on $n_i$, $n_j$, $D_{ij}$) and look for the parameter values $\phi^*$, $\psi^*$ that maximize $\mathcal{L}(\phi, \psi)$ by requiring that all the first derivatives with respect to $\phi$ and $\psi$ vanish simultaneously:

$$\frac{\partial \mathcal{L}}{\partial \phi} = \sum_{i,j} \frac{w^*_{ij} - a^*_{ij} F_\phi}{F_\phi (F_\phi - 1)} \frac{\partial F_\phi}{\partial \phi} = 0 \tag{23}$$

$$\frac{\partial \mathcal{L}}{\partial \psi} = \sum_{i,j} \left( a^*_{ij} - \frac{G_\psi}{1 + G_\psi} \right) \frac{1}{G_\psi} \frac{\partial G_\psi}{\partial \psi} = 0 \tag{24}$$

For probability distributions belonging to the exponential family, i.e. of the form given by Eq. (14) like the one we are considering, the second derivatives of the log-likelihood coincide with (minus) the covariances between the constraints included in the Hamiltonian defined in Eq. (13) (see for instance [49,50]). Since covariance matrices are positive-semidefinite (and actually positive-definite if the chosen constraints are linearly independent, i.e. non-redundant), $\mathcal{L}(\phi^*, \psi^*)$ is indeed a (global, in the positive-definite case) maximum for $\mathcal{L}(\phi, \psi)$, ensuring that the solution $(\phi^*, \psi^*)$ to Eqs. (23) and (24) yields the optimal parameter values in our model. Inserting these values into Eqs. (20) and (21) yields the values $x^*_{ij}$ and $y^*_{ij}$ that, when inserted into Eq. (15), fully specify the model.
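The log-likelihood can be evaluated directly once $F$ and $G$ are known for every pair. The sketch below is a hypothetical helper (not the authors' code) for an undirected network; it assumes $F_{ij} > 1$, $G_{ij} > 0$, a geometric conditional weight distribution with mean $F_{ij}$ and a connection probability $G_{ij}/(1+G_{ij})$. It returns the weight-related and the topology-related contributions separately, to make explicit that the first depends only on $F$ and the second only on $G$.

```python
import math


def log_likelihood_parts(W_obs, F, G):
    """Return (weight_part, topology_part) of ln P*(W*), summed over pairs i < j.

    W_obs, F, G are square nested lists. Assumptions of this sketch:
    F[i][j] > 1, G[i][j] > 0, integer weights, undirected network.
    """
    n = len(W_obs)
    weight_part = 0.0
    topology_part = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = W_obs[i][j]
            a = 1 if w > 0 else 0
            f, g = F[i][j], G[i][j]
            # Contribution that depends only on F (hence only on phi).
            weight_part += w * math.log((f - 1.0) / f) - a * math.log(f - 1.0)
            # Contribution that depends only on G (hence only on psi).
            topology_part += a * math.log(g) - math.log(1.0 + g)
    return weight_part, topology_part


# Tiny illustrative example with made-up numbers.
W_example = [[0, 3, 0], [3, 0, 1], [0, 1, 0]]
F_example = [[2.0, 4.0, 3.0], [4.0, 2.0, 2.5], [3.0, 2.5, 2.0]]
G_example = [[1.0, 0.5, 0.2], [0.5, 1.0, 2.0], [0.2, 2.0, 1.0]]
print(log_likelihood_parts(W_example, F_example, G_example))
```

Because the two parts share no parameters, each can be maximized on its own with any standard optimizer.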
The above expressions, which are valid for any specification of the EGM, show that the estimation of the parameter $\phi$ nicely separates from that of $\psi$. This result solves, in a single step, two major problems encountered in previous econometric approaches: on one hand, in most alternative models the estimation of the parameters determining the expected weights is badly affected by the presence of the zeroes; on the other hand, the expected number of zeroes may paradoxically depend on the (arbitrary) units of measure for the weights. For instance, if $q_{ij}(w)$ is a Poisson distribution as in zero-inflated GMs [20][21][22], then its only parameter (the mean) determines both the magnitude of link weights and the connection probability $p_{ij}$. As the monetary units in the data are changed arbitrarily (e.g. from dollars to thousands of dollars), so will the estimated mean and the resulting expected number of zeroes. By contrast, in our model the monetary units affect $\phi^*$ but not $\psi^*$ (hence $F$ as they should, but not $G$).
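The dependence on monetary units can be made concrete with a small calculation. The sketch below compares the probability of a zero flow under a plain Poisson distribution, whose mean is rescaled when the units change, with the probability of a zero flow $1/(1+G)$ in the model described here, which does not involve $F$. The numerical values and function names are illustrative assumptions.

```python
import math


def poisson_zero_prob(mean):
    """Probability of observing w = 0 under a Poisson distribution."""
    return math.exp(-mean)


def egm_zero_prob(G):
    """Probability of a missing link, 1 - p = 1 / (1 + G); F plays no role."""
    return 1.0 / (1.0 + G)


mean_usd = 3000.0             # expected flow in dollars (made-up value)
mean_kusd = mean_usd / 1000   # same flow in thousands of dollars

p0_usd = poisson_zero_prob(mean_usd)    # essentially 0
p0_kusd = poisson_zero_prob(mean_kusd)  # about 0.05

G = 0.25                      # unit-free odds of a link (made-up value)
p0_egm = egm_zero_prob(G)     # identical in either unit system

print(p0_usd, p0_kusd, p0_egm)
```

The Poisson zero probability changes by many orders of magnitude under a mere change of units, while the zero probability controlled by $G$ is unaffected.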

D. Real-valued trade flows
The above results can be adapted in a straightforward, although more technical, fashion to the case when link weights are assumed to take non-negative real values. The entire derivation is reported in the Appendix. For brevity, here we only report the main results.
In the real-valued case, $P^*(\mathbf{W})$ is a multivariate PDF (rather than a PMF) and we look for its form by maximizing a modified version of the entropy functional $S[P]$, under the same constraints on $\langle a_{ij}\rangle$ and $\langle w_{ij}\rangle$ (for all pairs $i \neq j$) used above and still given by Eqs. (5) and (11). The result is again of the factorized form given by Eq. (14), where the Hamiltonian $H(\mathbf{W})$ is still the one defined in Eq. (13) while the partition function $Z$ is different and the resulting expression for $q^*_{ij}(w)$ is

$$q^*_{ij}(w) = (1 - p_{ij})\, \delta(w) + p_{ij}\, \frac{e^{-w / F_\phi(n_i, n_j, D_{ij})}}{F_\phi(n_i, n_j, D_{ij})}, \qquad w \geq 0, \tag{25}$$

where $\delta(w)$ is the Dirac delta function and $p_{ij}$ is still given by Eq. (5). The above expression shows that $q^*_{ij}(w)$ now has a point mass of magnitude $q^*_{ij}(0) = 1 - p_{ij}$ at $w = 0$, followed by a purely exponential probability density for $w > 0$. By design, the above PDF still produces the desired conditional expected trade volume $\langle w_{ij}|a_{ij} = 1\rangle$, connection probability $p_{ij}$ and unconditional expected trade volume $\langle w_{ij}\rangle$ given by Eqs. (4), (5) and (11) respectively. Establishing a link from country $i$ to country $j$ is still a Bernoulli trial with success probability $p_{ij}$ given by Eq. (5); if realized, this link acquires a weight $w$ with conditional probability density

$$q^*_{ij}(w|a_{ij} = 1) = \frac{e^{-w / F_\phi(n_i, n_j, D_{ij})}}{F_\phi(n_i, n_j, D_{ij})}, \qquad w > 0, \tag{26}$$

which is now a purely exponential distribution with the desired (conditional) mean $F_\phi(n_i, n_j, D_{ij})$.
The estimation of the parameters $\phi$ and $\psi$ can be carried out using the ML principle, via a straightforward recalculation of the log-likelihood $\mathcal{L}(\phi, \psi) = \ln P^*(\mathbf{W}^*)$ and a corresponding adaptation of Eqs. (23) and (24).
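The two-step structure of the real-valued model (a Bernoulli trial for the link, then an exponential draw for its weight) translates directly into a sampler. The following sketch is illustrative only: the function name, the seed and the values of $F$ and $G$ are assumptions, and $p = G/(1+G)$ is taken as the connection probability.

```python
import random


def sample_real_weight(F, G, rng):
    """Draw one real-valued trade volume for a pair of countries.

    Assumptions of this sketch: link probability p = G / (1 + G) and,
    if the link exists, an exponential weight with mean F.
    """
    p = G / (1.0 + G)
    if rng.random() >= p:
        return 0.0                    # no link: point mass at zero volume
    return rng.expovariate(1.0 / F)   # expovariate takes the rate 1 / mean


rng = random.Random(0)                # fixed seed for reproducibility
F, G = 5.0, 0.25                      # made-up values
draws = [sample_real_weight(F, G, rng) for _ in range(100000)]

zero_fraction = sum(1 for w in draws if w == 0.0) / len(draws)
positive = [w for w in draws if w > 0.0]
mean_positive = sum(positive) / len(positive)
print(zero_fraction, mean_positive)
```

The fraction of zeros should be close to $1 - p$ and the mean of the positive draws close to $F$, up to sampling noise.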

IV. EMPIRICAL ANALYSIS
We can finally test the predictions of our model against empirical international trade data. The datasets are described in the Appendix. Here, it suffices to report that trade volumes are reported in U.S. dollars and are therefore integer-valued. For this reason, throughout our analysis we will adopt the formulas we obtained assuming integer weights. Clearly, the same analysis can be easily repeated for real-valued volumes by using the corresponding formulas we have provided for real weights.

A. Model specification
We adopt an undirected network description (where the connection between two countries carries a weight equal to the total trade in either direction) to facilitate the definition of the topological properties characterizing the ITN. Previous work has shown that, given the highly symmetric structure of the ITN, the undirected representation retains all the basic properties of the network [26,27,30].
We choose $F_\phi(n_i, n_j, D_{ij})$ in such a way that the expected non-zero trade flow $\langle w_{ij}|a_{ij} = 1\rangle$ is the same as in the GM defined by Eq. (1) (now interpreted as a conditional expectation). This means choosing

$$F_\phi(n_i, n_j, D_{ij}) = c\, \frac{(\mathrm{GDP}_i\, \mathrm{GDP}_j)^{\alpha}}{R_{ij}^{\gamma}}, \qquad \phi = (c, \alpha, \gamma), \tag{27}$$

where we have set $\beta \equiv \alpha$ due to undirectedness. Similarly, we choose $G_\psi(n_i, n_j, D_{ij})$ in such a way that the probability $p_{ij}$ is the same as in the model defined in Eq. (3), i.e. $\psi = \delta$ and

$$G_\psi(n_i, n_j, D_{ij}) = \delta\, \mathrm{GDP}_i\, \mathrm{GDP}_j. \tag{28}$$

With the above specification, the expected topology does not depend on any dyadic factor. This is the simplest choice that is found to reproduce the topology of the ITN very well [25,30,32,38] and is supported by empirical evidence that dyadic factors like geographic distances [51] and trade agreements [47] have a much weaker effect on the purely binary topology of the ITN than on trade volumes. Of course, our formalism has been designed in such a way that we can immediately add dyadic factors, and is therefore much more general. For instance, we might easily add 'hidden' metric distances inferred via an optimal geometric embedding [47] (although they would not be identifiable with empirically measurable, 'external' macroeconomic factors like those used elsewhere in our model).
Given the above model specification, for a given instance $\mathbf{W}^*$ of the empirical network we find the optimal parameter values $c^*$, $\alpha^*$, $\gamma^*$ and $\delta^*$ through the ML conditions given by Eqs. (23) and (24). Importantly, Eq. (24) reads in this case $\partial \mathcal{L} / \partial \delta = 0$ and yields a value $\delta^*$ that ensures that the expected number of links $\sum_{i,j} p_{ij} = \sum_{i,j} G_\psi / (1 + G_\psi)$ is exactly equal to the empirical number $L^* = \sum_{i,j} a^*_{ij}$, irrespective of the volumes of trade. This result, which is equivalent to what is found for the purely binary model defined by Eq. (3) [38], shows that, unlike the standard GM, our model always generates the correct number of links and, unlike some more complicated variants of the GM, it does so independently of the monetary units chosen for the volumes.
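The condition that the expected number of links matches the observed one can be solved numerically in one dimension. The sketch below is a minimal illustration, assuming $G = \delta\, \mathrm{GDP}_i\, \mathrm{GDP}_j$ and an undirected network (pairs $i < j$); the bisection routine, its name and the toy GDP values are hypothetical choices, and any root finder would do since the expected number of links is monotonically increasing in $\delta$.

```python
def expected_links(delta, gdp):
    """Expected number of links sum_{i<j} p_ij with p_ij = G / (1 + G)."""
    n = len(gdp)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            g = delta * gdp[i] * gdp[j]
            total += g / (1.0 + g)
    return total


def fit_delta(gdp, n_links, iterations=200):
    """Find delta such that expected_links(delta, gdp) equals n_links."""
    n = len(gdp)
    max_links = n * (n - 1) // 2
    if not 0 < n_links < max_links:
        raise ValueError("need 0 < n_links < N(N-1)/2 for a finite solution")
    low, high = 0.0, 1.0
    while expected_links(high, gdp) < n_links:   # bracket the root
        high *= 2.0
    for _ in range(iterations):                  # plain bisection
        mid = 0.5 * (low + high)
        if expected_links(mid, gdp) < n_links:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


gdp_toy = [1.0, 2.0, 3.0, 4.0]    # made-up GDP values in arbitrary units
delta_star = fit_delta(gdp_toy, n_links=3)
print(delta_star, expected_links(delta_star, gdp_toy))
```

Note that trade volumes never enter this calculation: only the observed number of links and the GDPs do.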

B. Testing the model against real data
We first test the performance of the EGM in replicating the empirical trade volumes, i.e. the purely local (dyadic) structure of the ITN. In Fig. 2, superimposed on the previous results for the standard GM given by Eq. (1) and already shown in Fig. 1, the empirical non-zero link weights $w^*_{ij}$ are also compared with their conditional expected value $\langle w_{ij}|a_{ij} = 1\rangle$ under the EGM given by Eq. (27). As mentioned above, for the EGM the parameters are obtained via the ML principle as prescribed by Eq. (23) and their resulting values are reported in Table II. As expected, the sets of points generated by the two models largely overlap, confirming that, in terms of trade volumes, the EGM cannot do worse than the GM. Moreover, the EGM turns out to be more parsimonious than the GM as it achieves a narrower scatter of points while having no dedicated free parameter to tune the variance (as already mentioned, the GM usually assumes that each trade volume is drawn from a certain probability distribution, typically a normal or log-normal one, with mean value given by Eq. (1) and variance specified by an additional free parameter).

FIG. 2. [...] I) and by the Enhanced Gravity Model defined in Eqs. (4) and (27) (blue, parameters estimated as reported in Table II). Top left: year 1970, top right: year 1980, bottom left: year 1990, bottom right: year 2000. The black line is the identity line corresponding to the ideal, perfect match that would be achieved if the empirical weights were exactly equal to their (conditional) expected values, i.e. in complete absence of randomness.
Importantly, comparing the values of the parameters $\alpha$, $\beta$, $\gamma$ reported in Table II for the EGM with the corresponding values of the same parameters shown previously in Table I for the GM, we see that the GM yields systematically larger parameter values (especially so for $\alpha$, $\beta$). This means that, with respect to the EGM, the GM overestimates the effects of both GDP and geographic distance, and this is especially true for the GDP. This is due to the fact that the EGM is used to explain not only the volume of realized trade flows, but also their existence, and has separate functions ($F$ and $G$) with possibly overlapping sets of explanatory factors (GDP is the common element in this case) but in any case distinct sets of parameters ($\alpha$, $\beta$, $\gamma$ on one hand and $\delta$ on the other), to take these two aspects into account. The effects of GDP and distance captured by the parameters $\alpha$, $\beta$, $\gamma$ are only those conditional on a link being created, while discounting the effects of link creation itself via the parameter $\delta$.
Note that $\alpha$, $\beta$, $\gamma$ and $\delta$ are all found to be monotonically increasing over time by the EGM, highlighting a steady increase of the effects of GDP and distance (even if milder than observed in the GM) and of the density of connections. In fact, as the network density becomes higher (larger $\delta$ in the EGM), we see a smaller discrepancy between the fitted values of $\alpha$, $\beta$ in the two models, consistent with the idea that, if all pairs of countries were connected, then both the GM and the EGM would estimate the effects of GDP only through the lens of trade volumes, because the GDP would no longer explain the (fully connected) topology in such an extreme situation.
In order to better understand the differences between the trade volumes predicted by the two models, in Fig. 3 we plot the cumulative distribution $P_{\geq}(w)$ counting the fraction of link weights larger than or equal to $w$ in the empirical (red), GM-generated (green) and EGM-generated (blue) networks. All distributions are normalized as $P_{\geq}(0) = 1$ in order to include zero weights, corresponding to pairs of countries that are not connected, in their support. Note that $P_{\geq}(w)$ is not simply the integral of $q^*_{ij}(w)$ because the latter is a probability distribution defined for a specific pair of countries, while the former is defined for the entire network and hence determined by the combination of all pair-specific probabilities. We see that the empirical distribution has a discontinuous jump at $w = 1$, as it drops from a value $P_{\geq}(1 - \epsilon) = 1$ to a value $P_{\geq}(1 + \epsilon) \approx 0.53$, where $\epsilon > 0$ is arbitrarily small. Recalling that link weights take only non-negative integer values in our analysis, this discontinuity indicates that roughly 47% of the pairs of countries are not connected ($w = 0$) in this particular snapshot of the ITN, so that the distribution keeps the value $P_{\geq}(w) = 1$ for $w \in [0, 1)$ and, as we cross the smallest allowed non-zero weight value ($w = 1$), it drops by a value 0.47 as it no longer 'sees' those unconnected pairs. As bigger weights ($w > 1$) are considered, the distribution continues to decrease continuously all the way to $P_{\geq}(+\infty) = 0$, indicating that the only discontinuity we see at $w = 1$ is actually due to the excess probability mass at zero weights produced by the link-generating process. Remarkably, the empirical distribution is closely matched by the EGM. The fact that this model replicates both the location and size of the discontinuity indicates a correctly predicted number of missing trade connections in the ITN topology. By contrast, the GM predicts a fully connected network, as evidenced by the absence of the discontinuity. Pairs of countries that are unconnected in the real ITN are unavoidably given a positive weight by the GM and hence misplaced to the right in the distribution, which results in exceedingly large values of the green curve with respect to the other two curves. We know that in the EGM the discontinuity is indeed due to the extra point mass at $w = 0$ in the expression of $q^*_{ij}(w)$ given by Eqs. (15) or (19). Note that, technically, one can speak of a 'discontinuity' only if weights take continuous values. This would be possible by replicating our analysis in the case of real-valued weights using the results provided in the Appendix and summarized in Eq. (25). Importantly, in this case the jump in $P_{\geq}(w)$ would be observed precisely at the 'true' value $w = 0$, consistent with the genuine delta-like form of $q^*_{ij}(w)$ given by Eq. (25) (only, it would no longer be possible to show the discontinuity of $P_{\geq}(w)$ along a logarithmic axis and plot the full cumulative distribution). The EGM would again correctly match both location and size of the empirical discontinuity (since $p_{ij}$, hence the expected number of positive weights, is identical in the discrete and continuous versions of the model). For positive weights, the real-valued EGM would continuously interpolate the discrete points of the integer-valued EGM, because this is a generic property of geometric and exponential distributions with the same expected value. So in either specification, the EGM nicely replicates both the empirical distribution of strictly positive link weights and the sharp peak 'jumping out' from it, while the GM does not.

FIG. 3. [...] Eq. (1) (green, parameters estimated as reported in Table I) and the Enhanced Gravity Model defined in Eqs. (4) and (27) (blue, parameters estimated as reported in Table II). Note the discontinuous jump due to the ≈ 47% pairs of unconnected countries in both the empirical and the EGM-generated curves, and the absence of such a feature in the GM-generated curve (for which missing links are incorrectly given a positive weight).
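The cumulative distribution $P_{\geq}(w)$ discussed above is straightforward to compute from the list of all pair weights, zeros included. The following sketch uses a made-up list of weights and a hypothetical function name; it evaluates $P_{\geq}$ only at the distinct observed weight values.

```python
def cumulative_distribution(weights):
    """Return [(w, P>=(w))] for each distinct observed weight w.

    P>=(w) is the fraction of pairs (unconnected ones included, with
    weight 0) whose weight is larger than or equal to w.
    """
    n = len(weights)
    result = []
    for value in sorted(set(weights)):
        fraction = sum(1 for w in weights if w >= value) / n
        result.append((value, fraction))
    return result


# Made-up example: 8 pairs of countries, 3 of which do not trade.
toy_weights = [0, 0, 0, 1, 2, 5, 5, 10]
curve = cumulative_distribution(toy_weights)
print(curve)
```

In this toy example the drop between $w = 0$ and the smallest positive weight equals the fraction of unconnected pairs (3 out of 8), which is the feature the EGM reproduces and the GM misses.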
We now want to check whether the trade links, besides being predicted in correct number by the EGM, are also placed between the correct pairs of countries by the same model. This means moving the focus of our analysis towards the purely binary, global topology of the ITN. As a first qualitative illustration setting the stage for this analysis, in Fig. 4 we show all the trade links of the country with maximum degree (USA), the one with minimum degree (Western Sahara) and one with intermediate degree (Vanuatu). We also show the corresponding predictions under the standard GM (where Eq. (1) is first fitted to the non-zero flows and then extended to all pairs of countries) and the EGM. The traditional GM predicts a fully connected network, i.e. an expected degree $\langle k_i\rangle_{GM} = N - 1$ for all $i$. This prediction may be accidentally correct for one or a few countries with maximal degree, if such countries turn out to be present in the network (in this case, this does not even happen as the maximum observed degree is $k = 203$ for USA), but deteriorates unavoidably and dramatically for other countries as their degree decreases. By contrast, the EGM gives an expected degree $\langle k_i\rangle_{EGM} = \sum_{j \neq i} p_{ij}$ (see Appendix) which is in good agreement with the empirical one for the entire range of connectivity.

FIG. 4. Country-based network configurations for year 2011 in the real ITN (red), the GM (green) and the EGM (blue). For three representative countries, we show the connections to all trade partners in the world. The total number of countries in the data (see Appendix) is $N = 208$. The three countries are selected on the basis of their empirical degree $k$: the country with maximum degree (USA, $k = 203$), the one with minimum degree (Western Sahara, $k = 13$) and one with intermediate degree (Vanuatu, $k = 91$). The GM always produces the maximum possible number ($N - 1$) of connections. By contrast, the EGM produces connections randomly with probability $p_{ij}$, so links change from realization to realization. The expected degree is however independent of the individual realizations and is close to the empirical one for all countries. We have selected a typical realization that produces a degree equal to the expected degree for each of the three countries.
We now consider higher-order topological properties as a more stringent and quantitative test. In the top left panel of Fig. 5 we plot the average degree ($k^{nn}_i$) of the trade partners of each country $i$ versus the number of such partners, i.e. the degree ($k_i$) of country $i$ itself. Similarly, in the top right panel of Fig. 5 we plot the clustering coefficient ($c_i$), i.e. the fraction of trade partners of country $i$ that trade with each other, again versus the number ($k_i$) of such partners. The empirical quantities are compared with the expected quantities under the GM and the EGM. The exact expressions for both empirical and expected quantities are provided in the Appendix. The decreasing empirical trends observed in both plots show that the trade partners of poorly connected countries (small $k_i$) are on average highly connected, both to the rest of the world (large $k^{nn}_i$) and among themselves (large $c_i$). By contrast, countries that trade with a high-degree country (large $k_i$) are on average poorly connected, both to the rest of the world (small $k^{nn}_i$) and among themselves (small $c_i$). For both properties, we find that the EGM is in excellent agreement with the empirical ITN, as opposed to the classical GM which systematically generates nearly constant and much higher values, as a result of predicting a complete network.
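The expected binary properties can be approximated analytically by replacing each $a_{ij}$ with $p_{ij}$ in the definitions, as described in the Appendix. The sketch below applies this substitution to the degree, the average nearest-neighbor degree and the clustering coefficient. It assumes a symmetric matrix of connection probabilities with zero diagonal and strictly positive expected degrees; the function name and the toy matrix are hypothetical.

```python
def expected_binary_properties(p):
    """Approximate <k_i>, <k_nn_i> and <c_i> from connection probabilities.

    p is a symmetric nested list with zero diagonal. Each a_ij in the
    definitions is simply replaced by p_ij (ratio-of-expectations
    approximation); all expected degrees are assumed to be positive.
    """
    n = len(p)
    degree = [sum(p[i][j] for j in range(n) if j != i) for i in range(n)]
    knn, clustering = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        knn.append(sum(p[i][j] * degree[j] for j in others) / degree[i])
        triangles = sum(p[i][j] * p[j][l] * p[l][i]
                        for j in others for l in others if l != j)
        wedges = sum(p[i][j] * p[l][i]
                     for j in others for l in others if l != j)
        clustering.append(triangles / wedges)
    return degree, knn, clustering


# Toy example: 4 countries, every pair connected with probability 0.5.
p_toy = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
k_exp, knn_exp, c_exp = expected_binary_properties(p_toy)
print(k_exp, knn_exp, c_exp)
```

Setting all off-diagonal probabilities to 1 recovers the complete network predicted by the GM, for which every degree equals $N - 1$ and every clustering coefficient equals 1.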
Having checked that the EGM does very well in separately replicating both the local link weights and the global topology of the ITN, we now perform a last and most severe test monitoring properties that combine topological and weighted information together (all definitions are again given in the Appendix). In the bottom left panel of Fig. 5 we plot the average strength ($s^{nn}_i$), i.e. the average traded volume, of the trade partners of each country $i$ versus the strength ($s_i$) of country $i$ itself. In the bottom right panel, we plot a weighted version of the clustering coefficient ($c^w_i$) of country $i$, again versus the strength ($s_i$) of country $i$. The empirical trends are compared with the predictions of the GM and EGM (see Appendix for all definitions). These two plots are in some sense the weighted counterparts of the purely binary plots considered above. We find that, on average, countries connected to countries with a low trade activity (small $s_i$) trade a lot with the rest of the world (large $s^{nn}_i$) but relatively less so among themselves (small $c^w_i$). Countries connected to countries with a large volume of trade (large $s_i$) have instead a small trade activity with the rest of the world (small $s^{nn}_i$), but trade relatively strongly with each other (large $c^w_i$). Again, we find that both trends are replicated very well by the EGM, while the standard GM fails systematically.

V. DISCUSSION
In this paper we have introduced the EGM as a novel, advanced model for the ITN and economic networks in general. Phenomenologically, the EGM allows us to reconcile two very different approaches that have remained incompatible so far: on one hand, the traditional GM that is well established in economics and successfully reproduces non-zero trade volumes in terms of GDP and distance but fails in predicting the correct topology [22]; on the other hand, network models that have appeared more recently in the statistical physics literature and have been successful in replicating the topology [25,44] but are more limited in predicting link weights [42]. To our knowledge, the EGM is the first model that can reliably reproduce the binary and the weighted empirical properties of the ITN simultaneously. Just like the standard GM, the RM [11] or similar models, the EGM can accommodate additional economic factors in terms of extra dyadic and country-specific properties. Yet, it can attribute to each of these factors two different roles, by considering its measurable effects on the topology and on the trade volumes separately from each other, although in a combined fashion. For instance, already in the analysis presented here, we have noticed that the EGM uses the GDP in two different ways when explaining the presence and the intensity of links. By discounting the effects of GDP in determining the existence of links from the effects of the same factor in determining the volume of the realized trade connections, the EGM produces different parameter values with respect to the GM. By contrast, the latter lacks this possibility and tends to overestimate the effects of GDP and distances on the measured trade volumes.
The agreement between the EGM and trade data calls for an interpretation of the process generating the network in the model. In this respect, we notice that Eqs. (15) and (22) allow us to interpret the realized trade volumes in the EGM as the outcome of two equivalent processes (a serial and a parallel one) of link creation and link reinforcement. In the serial process, for a given pair of countries $i, j$ we first establish a trade link of unit weight with success probability $p_{ij}$ and then increment its volume in unit steps, each with success probability $y_{ij}$. After the first failure, we stop the process for the pair of countries under consideration and start it again for a different pair, and so on until all pairs are considered. In the equivalent parallel process, all pairs of countries simultaneously explore the mutual benefits of trade and engage in a first connection, each with its probability $p_{ij}$. Then, all pairs of nodes for which the previous event has been successful reinforce their existing connection by a unit weight, each with its probability $y_{ij}$. The process stops as soon as there are no more successful events. In either case, Eq. (15) gives the resulting probability that the realized volume is $w$.
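The serial process of link creation and reinforcement described above can be simulated for a single pair of countries in a few lines. The sketch below is illustrative: the function name, the seed and the values of $p$ and $y$ are assumptions, not quantities fitted to data.

```python
import random


def serial_process(p, y, rng):
    """Simulate link creation followed by unit-step link reinforcement.

    A link of unit weight is created with probability p; its weight is
    then incremented by one with probability y at each step, until the
    first failure.
    """
    if rng.random() >= p:
        return 0                  # link creation failed: zero volume
    weight = 1
    while rng.random() < y:       # each reinforcement succeeds with prob. y
        weight += 1
    return weight


rng = random.Random(1)            # fixed seed for reproducibility
p, y = 0.2, 0.8                   # made-up values
volumes = [serial_process(p, y, rng) for _ in range(100000)]

zero_fraction = sum(1 for w in volumes if w == 0) / len(volumes)
positive = [w for w in volumes if w > 0]
mean_positive = sum(positive) / len(positive)
print(zero_fraction, mean_positive)
```

The simulated fraction of zero volumes approaches $1 - p$ and the mean of the positive volumes approaches $1/(1 - y)$, as expected for a geometric distribution with an extra point mass at zero.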
Importantly, Eq. (19) shows that $q^*_{ij}(w)$ is a modified geometric distribution with an extra point mass $q^*_{ij}(0)$ at zero volume, i.e. the first event has a probability $p_{ij}$ which is in general different from the probability $y_{ij}$ of each of the $w - 1$ subsequent events required to produce a weight equal to $w$. This distinguishing property of the Bose-Fermi distribution [48] ensures a realistic network formation mechanism where the establishment of a trade connection for the first time is intrinsically different (and therefore associated with a different 'cost') from the reinforcement of an already existing trade connection. This desirable distinction, interpretable for instance in terms of profitability of trade, has been advocated in previous studies [9,10,21]. Here, it is implemented naturally within the maximum-entropy framework via Eq. (13), where the (expected) binary topology is enforced separately from the (expected) link weights. Notice that the distinction disappears if the parameter $\alpha_{ij}$ in Eq. (13) is set to zero, i.e. if the constraint on the expected value of $\Theta(w_{ij})$ (the expected topology) is removed as in the standard GM. In such a case, $p_{ij}$ becomes equal to $y_{ij}$ (i.e. link creation and link reinforcement become equally likely) and therefore $q^*_{ij}(w)$, not only $q^*_{ij}(w|a_{ij} = 1)$, becomes a geometric distribution. However, this operation would lead to an unrealistically dense network because the expected topology would no longer be controllable separately from the link weights.
Consistent with the fact that trade volumes are typically reported as integer multiples of some indivisible monetary unit (e.g. dollars), the above discussion and most of our analysis have assumed non-negative integer link weights. However, we may also take the limit of a vanishing monetary unit, in which case trade volumes become non-negative real numbers and, as we have shown, $q^*_{ij}(w)$ becomes an exponential density with an extra point mass at zero volume as reported in Eq. (25), while $q^*_{ij}(w|a_{ij} = 1)$ becomes a purely exponential density as shown in Eq. (26). Crucially, the extra point mass $q^*_{ij}(0)$ ensures that, even in this continuous limit, $p_{ij}$ is unchanged and the expected topology is still described by Eq. (5). In the absence of topological constraints, i.e. if we imposed $\alpha_{ij} = 0$, in this real-valued case the network would degenerate to a fully connected graph as in all specifications of the GM with continuous volumes [39]. This would happen due to the disappearance of the point mass at zero volume, implying that 'missing links' become events with zero measure in probability.
Our results may have strong implications both for the theoretical foundations of trade models and for the resulting policy implications. It is known that the traditional GM is consistent with a number of (possibly conflicting) micro-founded model specifications [52][53][54][55]. For instance, a gravity-like relation can emerge as the equilibrium outcome of models of trade specialization and monopolistic competition with intra-industry trade [10,56]. The empirical failure of the standard GM highlights a previously unrecognized limitation of these micro-founded models, at least in their current form, and indicates the need for an appropriate reformulation that makes these models consistent with the EGM, i.e. with a realistic topology of the ITN. How policy implications change as the result of such a reformulation of current micro-founded models is an important point to add to the future research agenda. Research in the field of interbank networks [45] has shown that, if unrealistically dense networks are assumed, then the outcomes of stress tests typically carried out by central banks to study the propagation of stress among financial institutions are dangerously biased towards a systematic underestimation of systemic risk. Indeed, running the stress test on a network with the 'right' density and topology turns out to be crucial in order to achieve a reliable estimate of risk propagation [45]. These results make us confident that, in the field of international economics where the propagation of trade risks is determined by the ITN topology, the EGM may offer a novel benchmark supporting improved theories of trade and refined policy scenarios.

APPENDIX

From integer to real link weights
If the link weights $w_{ij}$ take non-negative real values instead of non-negative integer values, the probability $P(\mathbf{W})$ has to be interpreted as a PDF rather than a PMF. We then look for its maximum-entropy functional form $P^*(\mathbf{W})$ by maximizing the following modified version of the entropy introduced in Eq. (12):

$$S[P] = -\sum_{\mathbf{A}} \int_{\Theta(\mathbf{W}) = \mathbf{A}} d\mathbf{W}\, P(\mathbf{W}) \ln P(\mathbf{W}), \tag{29}$$

where the constraints on $\langle a_{ij}\rangle$ and $\langle w_{ij}\rangle$ (for all pairs $i \neq j$) are still given by Eqs. (5) and (11), and we keep assuming zero-diagonal matrices (no self-loops in the network), i.e. $a_{ii} = w_{ii} = 0$ for all $i$. Note that, in going from Eq. (12) to Eq. (29), the summation $\sum_{\mathbf{W}}$ over all $N \times N$ zero-diagonal matrices with non-negative integer entries has been replaced by an integral $\int_{\Theta(\mathbf{W}) = \mathbf{A}} d\mathbf{W}$ over all $N \times N$ zero-diagonal matrices with non-negative real entries and such that their binary projection $\Theta(\mathbf{W})$ is a given adjacency matrix $\mathbf{A}$ (i.e. such that $\Theta(w_{ij}) = a_{ij}$ for all $i, j$), followed by a discrete sum $\sum_{\mathbf{A}}$ over all such possible binary matrices. The resulting integral, written in the combined form $\sum_{\mathbf{A}} \int_{\Theta(\mathbf{W}) = \mathbf{A}} d\mathbf{W}$ rather than in the unconstrained form $\int_{\mathbf{W}} d\mathbf{W}$, allows us to treat the binary constraint $\langle a_{ij}\rangle$ more naturally and to recover more general 'mixed' (i.e. containing a mixture of a discrete and a continuous part) solutions for $P^*(\mathbf{W})$ that are otherwise inaccessible, as we confirm later.
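Since the set of constraints is the same as in the integer-valued case, we arrive at the same expression for $P^*(\mathbf{W})$ given by Eq. (14), where the Hamiltonian $H(\mathbf{W})$ is still given by Eq. (13) but, importantly, the partition function $Z$ is now calculated as

$$Z = \sum_{\mathbf{A}} \int_{\Theta(\mathbf{W}) = \mathbf{A}} d\mathbf{W}\, e^{-H(\mathbf{W})} = \prod_{i,j} \left( 1 + \frac{x_{ij}}{\beta_{ij}} \right) = \prod_{i,j} \frac{\beta_{ij} + x_{ij}}{\beta_{ij}}, \tag{30}$$

where we have again used the definition $x_{ij} = e^{-\alpha_{ij}}$, while in this case we find it more convenient not to introduce the corresponding transformation $y_{ij} = e^{-\beta_{ij}}$, for reasons that will be clear below. Inserting Eq. (30) into Eq. (14) yields the following new form of $q^*_{ij}(w)$, replacing the one appearing in Eq. (15):

$$q^*_{ij}(w) = \frac{\beta_{ij}\, x_{ij}^{\Theta(w)}\, e^{-\beta_{ij} w}}{\beta_{ij} + x_{ij}}. \tag{31}$$

Using Eqs. (6) and (8), we can now calculate the connection probability and the (conditional) expected weight as

$$p_{ij} = 1 - q^*_{ij}(0) = \frac{x_{ij}}{\beta_{ij} + x_{ij}}, \tag{32}$$

$$\langle w_{ij}\rangle = \int_{w > 0} dw\, w\, q^*_{ij}(w) = \frac{x_{ij}}{\beta_{ij} (\beta_{ij} + x_{ij})} = \frac{p_{ij}}{\beta_{ij}}, \tag{33}$$

$$\langle w_{ij}|a_{ij} = 1\rangle = \frac{\langle w_{ij}\rangle}{p_{ij}} = \frac{1}{\beta_{ij}}. \tag{34}$$

Equations (32), (33) and (34) replace Eqs. (16), (17) and (18) in the case of real-valued link weights. Inserting these expressions into Eq. (31), we get

$$q^*_{ij}(w) = \begin{cases} 1 - p_{ij} & w = 0, \\ p_{ij}\, \beta_{ij}\, e^{-\beta_{ij} w} & w > 0, \end{cases} \tag{35}$$

which replaces Eq. (19) in the real-valued case and shows that $q^*_{ij}(w)$ is now a mixture of a discrete part, characterized by a probability mass of magnitude $q^*_{ij}(0) = 1 - p_{ij}$ at $w = 0$, and a continuous part characterized by an exponential probability density for $w > 0$. If we want to interpret $q^*_{ij}(w)$ uniquely as a PDF throughout its domain (or on the entire real axis), we may rewrite it via the Dirac delta function $\delta(x)$ as

$$q^*_{ij}(w) = (1 - p_{ij})\, \delta(w) + p_{ij}\, \beta_{ij}\, e^{-\beta_{ij} w}, \tag{36}$$

which allows for a fully continuous treatment. For instance, the normalization can be correctly stated as $\int dw\, q^*_{ij}(w) = (1 - p_{ij}) + p_{ij} = 1$. Clearly, the above solution would not be obviously retrieved if we used the unconstrained integral $\int_{\mathbf{W}} d\mathbf{W}$ in Eq. (29), unless we imposed, a priori and ad hoc, the presence of a delta-like spike at zero weight.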
In terms of conditional probabilities, we still find that establishing a link from country $i$ to country $j$ is a Bernoulli trial with success probability $p_{ij}$ given by Eq. (5) as desired; if realized, this link acquires a weight $w$ with probability density

$$q^*_{ij}(w|a_{ij} = 1) = \begin{cases} 0 & w = 0, \\ \beta_{ij}\, e^{-\beta_{ij} w} & w > 0, \end{cases} \tag{37}$$

which is now a purely exponential distribution with (conditional) mean $\beta_{ij}^{-1}$ as prescribed by Eq. (34). Now, equating Eq. (32) to Eq. (5) and Eq. (34) to Eq. (4) yields the values of $x_{ij}$ and $\beta_{ij}$ solving the original problem:

$$x_{ij} = \frac{G_\psi(n_i, n_j, D_{ij})}{F_\phi(n_i, n_j, D_{ij})}, \tag{38}$$

$$\beta_{ij} = \frac{1}{F_\phi(n_i, n_j, D_{ij})}. \tag{39}$$

Note that Eq. (11) holds in this case as well, as it should because it does not depend on whether link weights are taken to be integer or real. Inserting Eqs. (38) and (39) into Eqs. (36) and (37), we get the explicit form of $q^*_{ij}(w)$ and $q^*_{ij}(w|a_{ij} = 1)$ as a function of the factors $n_i$ and $D_{ij}$, as reported in the main text in Eqs. (25) and (26) respectively.

Data
We have used international trade and GDP data from the database curated by Gleditsch [57] for the years 1950, 1960, 1970, 1980, 1990 and 2000. This database includes yearly trade volumes $w_{ij}$ (which we have symmetrized by taking the sum $w_{ij} + w_{ji}$), yearly GDP values, and the (time-independent) distance matrix $R_{ij}$. The number $N$ of countries increases over time from roughly 85 in 1950 to approximately 200 in 2000. Both GDP and trade data are reported in U.S. dollars and are therefore integer-valued. To produce Fig. 4, we have used the BACI database [58], which reports imports and exports between $N = 208$ countries in 2011. The BACI data were originally in disaggregated form, where total trade was resolved into 96 different non-overlapping commodity classes. We have aggregated all these commodity classes together, and again symmetrized, to obtain a dataset consistent with the Gleditsch data used for the earlier years.
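The aggregation and symmetrization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `aggregate` and `symmetrize` and the toy matrices are hypothetical, and the actual file formats of the Gleditsch and BACI databases are not reproduced here.

```python
def aggregate(layers):
    """Sum commodity-specific flow matrices into a single total flow matrix."""
    n = len(layers[0])
    return [[sum(layer[i][j] for layer in layers) for j in range(n)]
            for i in range(n)]

def symmetrize(flows):
    """Return the undirected weights w_ij + w_ji, with a zero diagonal."""
    n = len(flows)
    return [[0 if i == j else flows[i][j] + flows[j][i] for j in range(n)]
            for i in range(n)]

# Toy example: two commodity classes, three countries (hypothetical values).
layers = [
    [[0, 5, 0], [1, 0, 0], [2, 0, 0]],
    [[0, 0, 0], [2, 0, 0], [0, 0, 0]],
]
total = aggregate(layers)
weights = symmetrize(total)
# Binary adjacency matrix a_ij = Theta(w_ij).
adjacency = [[1 if w > 0 else 0 for w in row] for row in weights]
```

In this toy case the pair of countries 1 and 2 (zero-based indices) exchanges nothing in either direction and therefore remains unconnected after symmetrization.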

Observed network properties
Given a weighted undirected network with weight matrix $\mathbf{W}$ and adjacency matrix $\mathbf{A}$, with entries related through $a_{ij} = \Theta(w_{ij})$, the degree of node $i$ is defined as

$$k_i = \sum_{j\neq i} a_{ij}, \tag{40}$$

the average nearest-neighbor degree of node $i$ is defined as

$$k^{nn}_i = \frac{\sum_{j\neq i} a_{ij}\, k_j}{k_i}, \tag{41}$$

and the (binary) clustering coefficient of node $i$ is defined as

$$c_i = \frac{\sum_{j\neq i}\sum_{k\neq i,j} a_{ij}\, a_{jk}\, a_{ki}}{\sum_{j\neq i}\sum_{k\neq i,j} a_{ij}\, a_{ki}}. \tag{42}$$

The average nearest-neighbor strength of node $i$ is defined as

$$s^{nn}_i = \frac{\sum_{j\neq i} a_{ij}\, s_j}{k_i} \tag{43}$$

(where $s_i = \sum_{j\neq i} w_{ij}$ is the strength of node $i$) and the weighted clustering coefficient of node $i$ is defined as

$$c^{w}_i = \frac{\sum_{j\neq i}\sum_{k\neq i,j} (w_{ij}\, w_{jk}\, w_{ki})^{1/3}}{\sum_{j\neq i}\sum_{k\neq i,j} a_{ij}\, a_{ki}}. \tag{44}$$

Expected network properties

The expected value (under the EGM) of each of the network properties defined above can be calculated either numerically, by averaging over many network realizations sampled independently from the probability $P^*(\mathbf{W})$ in Eq. (14), or analytically, using the following approach. First of all, in this model the expected value of all ratios can be approximated by the ratio of the expected values [40,41]. Secondly, all numerators and denominators involve only products over distinct pairs of nodes, which are statistically independent in the model. Using Eq. (15), the expected values of such products can therefore be calculated exactly in terms of $x_{ij}$ and $y_{ij}$ as follows:

$$\Big\langle \prod_{(i,j)} a_{ij} \Big\rangle = \prod_{(i,j)} \langle a_{ij}\rangle, \qquad \Big\langle \prod_{(i,j)} w_{ij}^{\gamma} \Big\rangle = \prod_{(i,j)} \langle w_{ij}^{\gamma}\rangle, \qquad \langle w_{ij}^{\gamma}\rangle = \frac{x_{ij}\,(1 - y_{ij})\, \mathrm{Li}_{-\gamma}(y_{ij})}{1 - y_{ij} + x_{ij}\, y_{ij}},$$

where the products run over distinct pairs of nodes, $\langle a_{ij}\rangle = p_{ij}$ as given by Eq. (16), and $\mathrm{Li}_n(z) = \sum_{l=1}^{\infty} z^l / l^n$ denotes the so-called $n$-th polylogarithm of $z$. From the above two considerations, it follows that the expected values of all quantities of interest can be approximated with entirely analytical expressions obtained by simply replacing $a_{ij}$ with $p_{ij}$ and $w_{ij}^{\gamma}$ with $\langle w_{ij}^{\gamma}\rangle$ in Eqs. (40), (41), (42), (43) and (44). Via $x_{ij}$ and $y_{ij}$, the expected values are ultimately a function of only the GDPs and distances. In our analysis, after a preliminary check that the analytical expressions matched the numerical averages over realizations extremely well, we have systematically adopted the analytical approach, which requires no sampling of networks and is therefore extremely efficient.
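The quantities discussed in the two preceding sections can be evaluated with a single routine, sketched below. This is a minimal sketch under stated assumptions rather than the code used in the analysis. It assumes the standard definitions of degree, strength, average nearest-neighbor degree and strength, and clustering as a ratio of (weighted) triangles to wedges, with the weighted triangles built from the cube root of the product of the three weights. It also assumes, for `expected_power`, that the integer-weight distribution has the form $q^*_{ij}(w) \propto x_{ij}^{\Theta(w)} y_{ij}^{w}$. All function names and the toy matrix are hypothetical. Feeding the routine $a_{ij}$, $w_{ij}$ and $w_{ij}^{1/3}$ gives the observed properties; feeding it $p_{ij}$, $\langle w_{ij}\rangle$ and $\langle w_{ij}^{1/3}\rangle$ gives the analytical approximation of the expected ones.

```python
def expected_power(x, y, gamma, terms=500):
    """<w^gamma> for integer weights, ASSUMING
    q(w) = x**Theta(w) * y**w * (1 - y) / (1 - y + x*y).
    Li_{-gamma}(y) = sum_{l>=1} l**gamma * y**l is truncated after `terms` terms."""
    polylog = sum(l ** gamma * y ** l for l in range(1, terms + 1))
    return x * (1.0 - y) * polylog / (1.0 - y + x * y)

def network_properties(a, w, w_cbrt):
    """Node-level properties from matrices a (links or link probabilities),
    w (weights or expected weights) and w_cbrt (w**(1/3) or its expectation).
    Undefined ratios (zero denominator) are returned as None."""
    n = len(a)
    k = [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]
    s = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    out = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pairs = [(j, m) for j in others for m in others if m != j]
        wedges = sum(a[i][j] * a[m][i] for j, m in pairs)
        tri = sum(a[i][j] * a[j][m] * a[m][i] for j, m in pairs)
        wtri = sum(w_cbrt[i][j] * w_cbrt[j][m] * w_cbrt[m][i] for j, m in pairs)
        out.append({
            "k": k[i],
            "s": s[i],
            "knn": sum(a[i][j] * k[j] for j in others) / k[i] if k[i] > 0 else None,
            "snn": sum(a[i][j] * s[j] for j in others) / k[i] if k[i] > 0 else None,
            "c": tri / wedges if wedges > 0 else None,
            "cw": wtri / wedges if wedges > 0 else None,
        })
    return out

# Toy observed network: a triangle (nodes 0, 1, 2) plus a pendant node 3.
W = [[0, 8, 1, 0],
     [8, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
A = [[1 if v > 0 else 0 for v in row] for row in W]
W_cbrt = [[v ** (1.0 / 3.0) for v in row] for row in W]
props = network_properties(A, W, W_cbrt)
```

Because the same function accepts either realized or expected matrices, no sampling of networks is needed for the analytical estimates, which mirrors the replacement rule stated above.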

FIG. 1. Empirical non-zero trade flows vs. the corresponding expectation under the traditional Gravity Model. Log-log plot comparing the empirical volume (y-axis) of all non-zero bilateral trade flows in the ITN with the corresponding expected volume (x-axis) predicted by the Gravity Model defined in Eq. (1), with parameters estimated as reported in Table I. Top left: year 1970, top right: year 1980, bottom left: year 1990, bottom right: year 2000. The black line is the identity line corresponding to the ideal, perfect match that would be achieved if the empirical weights were exactly equal to their expected values, i.e. in complete absence of randomness.

FIG. 2. Empirical non-zero trade flows vs. the corresponding expectations under the traditional Gravity Model and the Enhanced Gravity Model. Log-log plot comparing the empirical volume (y-axis) of all non-zero bilateral trade flows in the ITN with the corresponding (conditional) expected volume (x-axis) predicted by the Gravity Model defined in Eq. (1) (green, parameters estimated as reported in Table I) and by the Enhanced Gravity Model defined in Eqs. (4) and (27) (blue, parameters estimated as reported in Table II). Top left: year 1970, top right: year 1980, bottom left: year 1990, bottom right: year 2000. The black line is the identity line corresponding to the ideal, perfect match that would be achieved if the empirical weights were exactly equal to their (conditional) expected values, i.e. in complete absence of randomness.

FIG. 3. Empirical and model-generated cumulative distributions of trade flows. Log-linear plot comparing the empirical cumulative distribution of trade flows (normalized in order to include zero flows) in the ITN for the year 2000 (red) with the corresponding distributions obtained using the Gravity Model defined in Eq. (1) (green, parameters estimated as reported in Table I) and the Enhanced Gravity Model defined in Eqs. (4) and (27) (blue, parameters estimated as reported in Table II). Note the discontinuous jump due to the ≈ 47% of pairs of countries that are unconnected in both the empirical and the EGM-generated curves, and the absence of such a feature in the GM-generated curve (for which missing links are incorrectly given a positive weight).

FIG. 5. Network properties in the real ITN (red), the GM (green) and the EGM (blue). Top left: average nearest-neighbor degree $k^{nn}_i$ versus degree $k_i$ for all nodes. Top right: clustering coefficient $c_i$ versus degree $k_i$ for all nodes. Bottom left: average nearest-neighbor strength $s^{nn}_i$ versus strength $s_i$ for all nodes. Bottom right: weighted clustering coefficient $c^{w}_i$ versus strength $s_i$ for all nodes. All results are for the snapshot of the ITN in the year 2000. For all the other years in the analysed sample, we systematically obtained very similar results. See Appendix for information about the data and all definitions of empirical and observed quantities.