Was Thebes Necessary? Contingency in Spatial Modelling

When data is poor we resort to theory modelling. This is a two-step process. We have first to identify the appropriate type of model for the system under consideration and then to tailor it to the specifics of the case. To understand settlement formation, which is the concern of this paper, this not only involves choosing input parameter values such as site separations but also input functions which characterises the ease of travel between sites. Although the generic behaviour of the model is understood, the details are not. Different choices will necessarily lead to different outputs (for identical inputs). We can only proceed if choices that are"close"give outcomes are similar. Where there are local differences it suggests that there was no compelling reason for one outcome rather than the other. If these differences are important for the historic record we may interpret this as sensitivity to contingency. We re-examine the rise of Greek city states as first formulated by Rihll and Wilson in 1979, initially using the same"retail"gravity model. We suggest that, whereas cities like Athens owe their position to a combination of geography and proximity to other sites, the rise of Thebes is the most contingent, whose success reflects social forces outside the grasp of simple network modelling.


Introduction
Data is never perfect. However, when it is good, as is often the case in the physical sciences, there are various programmes for tackling data deficiency (e.g. Kennedy and O'Hagan, 2001;Babtie et al., 2014). In social science, and in archaeology in particular, data is often too poor for these approaches to be applicable as they stand. The information we have is incomplete and fragmentary, biased by what has survived, by what has been investigated and what has been made publicly available. We have to supplement the data that we possess with informed guesswork of various levels of detail and reliability. In the light of this, in order to make the best use of our limited knowledge or, complementarily, the best use of our ignorance a good starting point is to adopt the fall-back position of answering the question "All other things being equal, I would expect . . . to have happened"? This is where modelling can add to the debate. How to make the best use of our ignorance is an old problem and we shall not attempt to document its history much beyond the observation that economists, who popularised the approach in the 20th century, refer to it (see Keynes, 1921, chapter 4) as Laplace's 'Principle of Indifference', although Laplace himself termed it the 'Principle of Insufficient Reason' (Laplace, 1825). However named, the basic idea (which precedes Laplace) is simple in principle and is exactly how a wise card-player would have played at the gambling tables of the ancien regime. List all the 'worlds' (in this case, hands of cards) which are compatible with your knowledge/ignorance. Each world is equally likely (or information is being withheld) and the most typical of these is the way in which the system is most likely to have behaved. In the language of Bayesian statistics (Laplace's analysis rather subsumed the ideas of Bayes, but it is Bayes whose name is invoked by statisticians) we have assumed a flat Bayesian prior. Necessarily our conclusions will have a large degree of uncertainty about them and our main aim in this paper is to attempt to show how this uncertainty can be estimated. This is a basic issue here that sets us, as archaeologists, apart from historians, particularly where data is poor. This is that modelling of the type that we are discussing applies only to generic societal behaviour, behaviour that arises when 'history is idling'. While is is easy to discount the historical staples of major social unrest, famine and disease it is less easy to estimate those aspects of local alliances that lead to one decision rather than another. That is, in the context of the above, we are less interested in an enraged gambler overthrowing the table than in the presence of organised card-sharps. To take the example of this paper that we shall develop at great length later, the formation of Greek city-states in ninth and eight century BC, we shall see that there is a contingency to the presence of Thebes that does not apply to Athens. Indeed, it is hard to get Thebes to appear as a major site. We would see this as a consequence of social forces (not necessarily dramatic) that modelling of this type, based on broad statistical inference, cannot accommodate. Whereas to a historian the absence of Thebes in its historical position is nonsense, a Bayesian is happy with a local proxy which differs in detail and position from the original, provided something sensible is said about Athens, Corinth and other important sites.
Jaynes (Jaynes, 1957(Jaynes, , 1973 reformulated the Bayes/Laplace approach as the Principle of Maximum Ignorance (or Principle of Epistemic Modesty) and this is the view closest to the methods we will use in this work, in that it can be reformulated as the Principle of Maximum Entropy. Here we are thinking of entropy in the context of information theory, in which it is simply related to the number of questions with which we need to interrogate the system to have complete knowledge of it. Maximum entropy states are, roughly speaking, the ones which have the largest number of ways of creating a result with the specified macroscopic observables (our limited knowledge) and this fits in with Bayes/Laplace. As always, there are certain caveats that we shall not address and leave to the interested reader (Neapolitan and Jiang, 2014).
Our focus here is on models for networks of exchange in the protohistoric past where the primary information contained in the model is spatial. Our worlds are patterns of exchange and a typical use of such models is to see to what extent spatial features alone can account for these. The entropy maximising that we shall discuss here is only one of several approaches which have evolved in part from urban planning, migration/commuting and economics, for all of which exchange is a key component. We have delineated some of the alternative lines of approach elsewhere (Evans et al., 2012) and will not pursue those further.
To examine the issues with quantifying levels of expectation with their concomitant uncertainty and ambiguity, we will build on a classic study using entropic methods by Rihll and Wilson (1987;1991). They examined how spatial models, based on models for siting retail outlets (Huff, 1964;Lakshmanan and Hansen, 1965;Harris and Wilson, 1978), can be used to study the rise of city states in central mainland Greece around the eighth century BC. Specifically, the question is to understand to what extent spatial features can explain why certain settlements out of the many in the region came to dominate it. The success of the model led to its adoption for understanding city state formation in other times and other places; 2nd millennium BC Crete (Bevan and Wilson, 2013), Middle Bronze Age and Iron Age NE Syria (Davies et al., 2014), early 2nd millennium BC Central Anatolia (Palmissano and Altaweel, 2015) and late 1st Millenium Latenian urbanisation (Filet, 2017).
Our aim is, using the data of Rihll and Wilson as the example, to show how to improve the treatment of uncertainty in such spatial modelling. This lies behind the title of this paper. We shall show that some sites, like Athens, are likely to be significant despite our poor knowledge whereas some, like Thebes, are much more contingent on information that we don't possess. Unlike Athens, its historical importance is likely a consequence of social forces that models like ours cannot accommodate. We had suggested this previously (Rivers and Evans, 2014) and this current paper provides an extensive development and analysis of this earlier proposition. However our focus in this article is less on the archaeology of late Geometric/Early Archaic Greece and more on the nature of uncertainty in such approaches. As such it is equally applicable to the other case studies and some preliminary work on Latenian urbanisation is under way.
As we have said, these studies of urbanisation are framed in the language of exchange networks. In the first instance 'exchange' means the physical transportation of things and people, with secondary meanings in terms of power and culture that these things and people embody, the details of which we know very imperfectly. Uncertainty comes in many forms but, roughly, it can be divided into uncertainty in 'physical' parameters and uncertainty in 'calibration' (Kennedy and O'Hagan, 2001). For the former a major source of uncertainty lies in our imperfect knowledge of site locations. In this regard, we take the sites of the Rihll and Wilson model as given. Even then, further uncertainty lies in the nature of the paths taken (we are thinking here of land-based and riverine travel) e.g. to what extent do existing roads and waterways determine long-distance exchange? We consider the effect of adopting different definitions. A third problem arises in estimating the effort/cost/time involved in the transportation along these routes. These, plus the ambiguity in the site carrying capacities or resource bases, constitute the major uncertainties in the physical parameters of the model. As for calibration, there is the question of how to characterise the ease of exchange for different 'costs' or distances. This is encoded in the so-called 'deterrence' function, a calibration function, with its own calibration parameters, which imposes cost penalties on long-distance single transactions.
We consider the effect of different choices of this deterrence function. In fact, what began our analysis of this data set (Rivers and Evans, 2014) was our inability to replicate the results of Rihll and Wilson when using a different choice of deterrence function from them.
Superficially, our conclusions seem very plausible just on geometric grounds from the distribution of sites. This leads us to question whether we needed the apparatus of entropy and Bayesian analysis. Could we have reached good enough conclusions with a ruler and compass? To test such a null approach we compare the results of this model to the results obtained using more traditional data clustering methods, which rely on geography alone. We find that the answer is no. As part of this programme we produce an open source list of site locations and distances used which allows future researchers to test their ideas on this data sets.
There is the caveat that even seemingly epistemic approaches like this can have a manifestly ontic realisation in terms of sites as actors in their evolution. For example, Wilson rephrases the model in terms of dynamical Lokta-Volterra (predator-prey) equations (Wilson, 2008), and Altaweel has re-examined Syrian city-state formation from an agent-based approach (Altaweel, 2015). Such approaches provide useful complementary perspectives.

Sites
The primary data we use is that of Rihll and Wilson (1991, pp.68-71); namely the choice and location of sites. The sites chosen constitute 109 population centres from the late Geometric period, the ninth and eight century BC,in a region roughly 130km across, as specified in Fig. 1 of Rihll and Wilson (1987) and shown here in Fig. 1. Some of the hypotheses behind the choices are outlined in Rihll and Wilson (1987), e.g. using the existence of a temple or the size of a site as a marker of importance. Relying on such expert judgement is inevitable, but this approach immediately provides a largely unquantifiable source of uncertainty. One response would be to try different 'reasonable' choices of sites to see what effect this has on conclusions. However we will not try this here and we take this set of 109 sites in Fig. 1 as given.
Some models can accommodate some sense of the size of the site. However Rihll and Wilson judge that this information is very uncertain in this period and it is more reasonable to assume all sites were roughly equal in size and importance at the start of the period (see Rihll and Wilson, 1991, pp.69-70). This reflects a entropy viewpoint that this is the least biased assumption to make when there is no other information available.
There is one obvious feature about the spatial arrangement of these 109 sites and that is there is a natural and obvious division into three regions: Boeotia, Attica and the Isthmus-Argolid region. We will comment further on the properties of the spatial arrangement of these 109 sites below, see section 2.3.  Rihll and Wilson (1987). The index of site numbers and the precise locations used in this study are given in Table 5 in the Supplementary Material. Note that site 64, Vouliagmeni, was not labelled in the original figure, see Evans (2016a) for more details. The 'normal' distance set are the direct straight line distances between these sites and the full table of distances are given in Evans (2016a). The edges are a subset of those given by Delaunay Triangulation and are used to calculate a second set of distances (denoted DTMedit). Each edge is linked to the straight distance between the end points. The distance between other pairs of points is found by taking the length of the shortest path along the edges where the path distance is the sum of the distances associated with the edges traversed in the path.

Distances
The only other explicit information used here, and in the original papers, is the distance between every pair of sites (see Rihll and Wilson, 1991, pp.63-65). There are many different ways to assess the effective separation of two sites: effort, time of travel, financial costs, distance of the actual route followed, etc. For our context, only the last can be measured with any accuracy using modern GIS methods, but lack of knowledge about the details of exchange and the way landscapes were actually used still leaves unquantifiable uncertainties. Again the best response is to make few assumptions so in the first instance, like Rihll and Wilson, we use a straight-line distance between sites. In fact the choice of significant sites may well contain some implicit information about the landscape, e.g. an area of difficult terrain is likely to have fewer settlements. We will try to judge the effect of the uncertainty of distance as part of our study, one aspect where we extend the original work of (Rivers and Evans, 2014).
One complication when comparing our results to the original studies is that the distances used in the latter are not provided. To obtain a comparable set of distances we have digitised Fig. 1 of Rihll and Wilson (1987); our coordinates for the sites are given in the Supplementary Material in Table 5. The distance between sites is then the length of a straight line between these sites in these coordinates. This comprises what we will call our 'normal' distance data set. This data is given in Evans (2016a) and the process is discussed there in more detail. The scale is that roughly 6 of our units are equivalent to 1km. If we look at the nearest neighbour to each site, we find that the closest pair are 8 units apart (Nauplia -Pronaia) and the furthest distance between nearest neighbours is 75 units (from Sikyon to Akraia). These minimum distances have a first quartile of 20.0, a median of 28 (e.g. Kopai-Olmous), and a third quartile of 39.0. The mean is 30.9 units implying a short distance scale of around 5km.
Perhaps the most obvious problem with this data is that the shortest path between many pairs of points includes paths which are over the sea. Travelling on water in the ancient world was, where it was available, the most effective method of long distance communication, both in terms of speed and in terms of bulk. However land and water borne transport have different advantages and disadvantages meaning that we cannot simply mix these two modes of travel. This issue largely effects long range links across our region. We shall see later that the bulk of the results seem to be sensitive to smaller scales, say 30km or less (the maximum walking distance in one day). We might reasonably hope that sea travel is likely to have little impact on our results and have used this to produce a second set of separations based purely on routes which avoid the open sea.
From the above we see sites are so close that a determined individual could visit several sites in a day, most likely making use of an existing system of tracks/roads, which we assume occur between nearest neighbours. Thus, if going from site A to site C takes us near an intermediate site B, the traveler is more likely to go from A to C via B than strike off a direct route to save a short distance. Given the uncertainty in assigning useful distances between these ancient sites, we judge that any differences in the distances used here as compared to the original studies should be no more significant than many of the other sources of uncertainty. To test this we applied Delaunay Triangulation to the same 109 sites so as to produce a set of non-interscting edges between 'nearest neighbour' sites. Again, for sites around the edge of the region, this produces several long distance links which cut across the sea. We choose which links to remove using our judgement, ending with a set of largely land based links. The distance for every remaining link was exactly as before, with a typical separation between nearest neighbours of around 5 to 10 km. We then found the distance between each pair of sites by assuming that the path between sites will always be along the links between nearest neighbours, using the path with the shortest total path length. This Delaunay Triangulated derived distance set (labelled 'DTMedit' in figures) is shown in Fig. 1.
In practice this means that sites which are not nearest neighbours are actually slightly further apart than in our normal data set. Again this process is not precise, but the differences between the set of Delaunay Triangulation derived distances and our normal distance set (direct paths) can serve as a proxy for the sort of errors we might expect in our estimations of effective distance between sites. The differences between our two different sets of distances and the Rihll and Wilson direct path distances will serve as a test of the robustness of results against uncertainty in travel.  Figure 2: Hierarchical Agglomerative Clustering for the normal distance set using the average criterion for agglomeration.

Asine
2.3 Regional structure of the data It is worth looking at our distance sets for obvious features. Using Hierarchical Agglomerative Clustering (for example see Manning et al., 2008), a standard method to cluster the data, we find that there are only really three main scales in either distance data set. The typical intersite separation marks the onset of clustering, the overall size of the data set (around 130km) marks the end. In between there is only one characteristic scale in the dendrogram of Fig. 2 (see also Fig. 12 in the Supplementary Material). This is a region of three clusters, which reflect the regions delineated by clear gaps in the map of sites, as seen in Fig. 1. They correspond to Boeotia in the north (containing Thebes), Attica in the east (containing Athens), and the Isthmus/Argive region in the south west (containing Corinth and Argos). While the two distance data sets produce different dendrograms in detail, the large scale picture is broadly similar for both.

The Rihll and Wilson model
The model used by Rihll and Wilson (1987;1991) and by later authors (Wilson and Dearden, 2010;Bevan and Wilson, 2013;Rivers and Evans, 2014) to describe urbanisation and state formation in historical contexts was originally devised to study the emergence of dominant retail centres (Huff, 1964;Lakshmanan and Hansen, 1965;Harris and Wilson, 1978). The model differs from several other well known spatial models, such as Gravity Models, in that it is a 'Zone of Control' model in the terminology of Evans et al. (2012). That is the model produces a structure of dominant sites, here called 'terminal sites', surrounded by a cluster of smaller nearby satellite sites, see In all that follows we consider a network of N sites labelled i = 1, 2, ..., N . As mentioned above in section 2.1, we take the sites to be equal, differing only in terms of their locations relative to one another. The output of the model will be a set of 'flows' F ij each of which describes a flow from site i to a distinct site j, separated by an appropriately defined distance d ij . In many spatial models this flow F ij represents the transmission of goods, people, influence, ideas etc. from site i to site j. Here however we will use it to represent the "pulling power or attractiveness" (Rihll and Wilson, 1987, pp.8) of site i to site j. This is, of course, very abstract and unquantifiable but in most ancient contexts it is nearly always impossible to quantify exchanges of any type between sites, even where they are physical goods, so this is not a particular weakness of this study.

Construction of the Rihll and Wilson model
In the Supplementary Material we outline the derivation of the model though the maximum entropy approach of Wilson (1967). The key assumption is that if all other things are equal, then every possible exchange counted by the flows is equally likely. Of course in reality there are strong constraints. We impose these here in three steps producing first the simple gravity model, then the output constrained gravity model and finally, the Rihll and Wilson model. We shall avoid algebra as much as possible here and refer the reader to the Supplementary Material, where it is given in greater detail.

The Simple Gravity Model
Exchange between sites i and j requires some effort and we imagine that this can be roughly quantified as a 'cost' c ij . For instance the effort to maintain a link between two sites will generally increase with separation d ij so we might choose to capture this with the simple choice that our costs are equal to the distances, c ij = d ij . Our single constraint is that the total cost/effort that can be expended on exchange is capped. This is natural given that any society has limited resources. Making the best use of this knowledge (maximising the entropy of the exchange flows subject to this constraint) gives the most likely distribution as Here the constant A is determined by the total cost we allow but it can be fixed more easily by specifying the total amount of exchange, F total = i,j F ij . The function f (x) is known as the deterrence function and it expresses the effect of the costs incurred on a trip from i to j in terms of the distance. The likelihood of a single exchange occurring over distance The new parameter D is a calibration scale for the effect on distances on travel. Often f (x) is chosen such that f (0) = 1. The appropriate form for the deterrence function is rarely known so another source of uncertainty comes from its choice. Note that we have swapped our lack of knowledge about the costs c ij for the lack of knowledge about the function f . The gain is that we have reexpressed the aspect in terms of a function of distance. Distances are more accessible quantities than costs and we have some knowledge, or at least intuition, about the effect of distance on exchange.
For instance we imagine that the deterrence function should always decrease as distance gets larger and indeed all our examples have this simple feature. Part of our study is to see how resilient the results from our modelling are to different choices. A common form for the deterrence function corresponds to the simplest choice that costs are proportional to distances. This leads to the simple exponential form (labelled EXP in figures) This is also the form used in Rihll and Wilson (1987;1991). In general the deterrence function can depend on additional calibration parameters that alter its shape, as we see in the second form shown in Fig. 3. This 'ariadne' deterrence function (labelled AEP in figures and used in Evans et al. (2006), Knappett et al. (2008;2011), Evans et al. (2012), might seem more appropriate to the case in hand where the proximity of sites suggests low penalties for short distance exchange. It takes what we term the ariadne form As well as the calibration distance scale D, this form has two further parameters, α and γ, which alter the shape of the function. We will work with α = 3.6, 4.0 and 4.4 and while we will use γ values of 0.9, 1.0 and 1.1.

The Output Constrained Gravity Model
The Simple Gravity model as given above can also be derived from a manifestly ontic, agent based approach (Jensen-Butler, 1972). However, the Simple Gravity model is deficient in many ways. One clear limitation is that there is no particular limit on the amount of exchange emerging from each site in the Simple Gravity model, only the total amount of exchange is limited. By way of comparison we note that some of the earliest and most successful examples of spatial modelling in archaeology have employed Proximal Point Analysis (for example Terrell, 1977;Broodbank, 2000). Proximal Point Analysis posits that each site only has interactions/exchanges with a fixed number of their closest sites, i.e. outflows are constrained. This has parallels in the idea that individuals only have the capacity to really interact with a limited number of other individuals (Dunbar, 1992).
Adding to the Simple Gravity model the constraint that individual site outflows are fixed to be a given value O i produces the Output Constrained Gravity model where the most likely exchange flows are found to be The normalisation factors A i ensure that the output from each site is

The Rihll and Wilson model
Unless we extend these models to accommodate strongly unequal site sizes initially, they have no mechanism for generating a handful of dominant sites as outputs, as identified in the archaeological record. The presence of dominant sites requires the inclusion of non-linear behaviour in the model constraints. Drawing on earlier work modelling retail outlets (Huff, 1964;Lakshmanan and Hansen, 1965;Harris and Wilson, 1978), Rihll and Wilson do this by constraining the entropy of the site inflows I j = i F ij though the motivation for this is best provided posthoc from the form of the final model. In the absence of this input entropy constraint, inflows over similar distance scales tend not to have wide variation as seen in either of the two previous models. The effect of this input entropy constraint is to give non-linear feedback so that sites which develop an advantage build on that advantage to become even more important, suggestive of the synoikism and urbanisation that is expected to lie behind the appearance of regionally dominant sites. It is the inflow I j outputs determined by the model that are used by Rihll and Wilson to assign an importance or 'attractiveness' to a site 1 . The result (see supplementary material section B) is that the flow F ij from site i to site j (i = j) now takes the form 2 As before, the normalisation factors A i ensure that the output from each site is indeed O i . We stress that the total flows into each site j, the I j , are not parameters of the theory but are all specified by the solution. The cost of each exchange is encoded in the deterrence function, f (d ij /D), as before. Again the model demands that the outputs are fixed. Given our limited information, we are unable to specify the outflows on a site by site basis and we follow our usual response to a lack of information and take the outflows to be equal in this case. Rihll and Wilson also set the output from each site to be equal as they suggest it is most reasonable to assume all sites were roughly equal in size and importance at the start of the period (see Rihll and Wilson, 1991, pp.69-70).
The feature which distinguishes this model from other gravity models is the factor of I β j in (5). The I β j leads to solutions where for most sites I j is zero or close to zero so that all the available flow is input to a few sites, the "terminal sites", which have non-trivial input flows. These terminal sites send their output flows to a variety of other terminal sites. Roughly speaking if the deterrence function becomes negligible for sites separated by distance scale D or more, then a site T with a large input flow (or attractiveness) I T will suppress the attractiveness of sites within a radius of D or so through the normalisation factor A i . Basically a first guess is that in this model (5)

Terminal sites
The difference from gravity models with input and output constraints (for instance see Evans et al., 2012) is that outflows O i are still input parameters of the theory but now the inflows I j are outputs determined by the model. These are used by Rihll and Wilson to assign an importance or 'attractiveness' to a site.
To identify the most important sites, Rihll and Wilson use a particular implementation of a scheme of Nystuen and Dacey (1961) and we will follow suit. We define a terminal site to be a site where the total flow into that site is bigger than the largest flow out of that terminal site along any one edge. That is a terminal site T satisfies

Assessing the results
The basic idea in the original studies is to find the sites which emerge from the initial equal sized settlements to dominate a region, based purely on the relative geographical positions.
In this context there are certainly four cities which dominate the history of subsequent periods in this region: Thebes, Athens, Corinth and Argos 3 . We will use these four well known sites as our key measure of the effectiveness of our various modelling attempts. Of course other sites play important roles; Rihll and Wilson provide commentaries on several other sites, both important and less so (throughout Rihll and Wilson 1987 and 1991 but see pp.71 of the latter in particular). Undoubtedly scholarship about the settlements in this region at this time will have moved on but our focus is on the methodology and the role of uncertainty in modelling so we are content to remain with a data set which captures the broad picture. It also enables us to build directly on the specific results present in the original papers.
Rihll and Wilson looked at several parameter values in their 1987 and 1991 papers for the exponential deterrence function of (2). Outcomes are determined by two parameters; the distance scale D of the deterrence function and the 'attractiveness' exponent β. Their results for these four significant sites, as derived from their figures, are shown in Table 1. Thebes and Athens were always identified as terminal sites. Corinth is identified in four of these figures but sometimes the terminal site was not Corinth itself but a close neighbour. There was also always a terminal at Argos or a close neighbour.  This D RW is simply the inverse of γ RW , the scaling factor of distance in the exponential form of the potential (2) as it is this γ RW value which is the value quoted in the original papers.
In terms of robustness, Rihll and Wilson look at a range of parameter values, producing between eight and thirteen terminal in the figures 4 to 6 of Rihll and Wilson (1991).
The key results are given in Table 1. We repeat this in the first stage of our study by looking at how sensitive our results are to changes in the calibration parameters D and β for a given deterrence function. In Fig. 6 we show the sensitivity of the terminal number to the distance D and β parameter values, for the standard exponential deterrence function and for the normal (direct) distance set. This is broadly similar to Fig. 6 of Rihll and Wilson (1987). Essentially the only solutions which are relatively stable are those with three terminals. This is clear from the dendrograms of figures 2 (see also 12 in the Supplementary Material) where as the distance scale is raised, clusters start to form at around the minimum separation scale and then grow without any characteristics scale until they reach three clusters. There is then a long range of the distance scale where the dendrograms are stable with three clusters. Normally when clustering data, such a large region of stability is taken to indicate that this is a 'good' clustering and the user is confident that these clusters represent a significant feature in the data. Here this stability with three clusters reflects the only clear spatial feature of our sites, the geographical gap which separates them into the three regions of Boeotia, Attica and Isthmus/Argolid.

Fixed β
Both here and in the original papers changing the non-linearity parameter β in the range 1.05 to 1.25 does not seem to change the qualitative nature of the results for choices of D; the pattern of terminals is qualitatively the same. So we will fix β and then look at the remaining uncertainty in further detail. We chose to fix β = 1.05 as a typical value from the earlier studies. Initially we adopt the normal distance data set and the exponential deterrence function of (2) corresponding to the case where cost equals distance. The sensitivity to distance scale can be seen clearly for this fixed beta value in figs. 7 and 8.
We have also repeated the analysis for our Delaunay Triangulation based distances. The change of the distances from normal to Delaunay Triangulation derived values has only a small effect. Generally the latter reproduce the same number of terminals at a slightly higher distance scale. This is to be expected as the distances between any two points in the Delaunay derived distances is always equal to or greater than the same points in the normal distance set. The main feature is that both these two cases using the exponential deterrence function show the instability in solutions with respect to small changes in the parameters as seen from the contour plots in Fig. 6. So solutions with a given number of terminals T , when T is bigger than three, are only produced for very limited ranges of D, typically less 10 or less, roughly the nearest neighbour site separation.
As a separate exercise Fig.s 7 and 8 illustrate the effects of changing the deterrence function to the ariadne form (3) with the values of its two further calibration parameters α and γ varied by 20%. We see the same instability as seen in the contour plots in terms of small changes in parameter D until we get to large distance scales with six or less terminal sites in their solution. Again the most stable solution has three terminals. The most interesting effect is that when we compare the solutions with the same number of T terminals and the same normal distance set, we see that changing the shape of the potential changes the value of the distance scale D.
This leads to an important conclusion. We should not try to compare two different theories at the same distance scale directly, even though in some qualitative way they correspond to the mode of transport. The relationship between a parameter of a theory, here D, and the actual physical properties it corresponds to, perhaps the typical physical separation of our terminals, is highly non-trivial and depends on the details of the theory. This is a simple example of what is called 'renormalisation' in physics. What we must do is to pick a physical property, say the number of terminals, and then find the values of the a theoretical parameter such as D where different models give similar physical results. Here the exponential potential distances scales can be as much as half the ariadne deterrence function distance scales for results with the same number of terminals.
So for the next stage of investigation we fix β and find solutions with a specified number T of terminal sites.

Eight terminals
We choose to focus on model outputs with β = 1.05 and eight terminals, to see to what extent we can replicate Rihll and Wilson and how to take the analysis further. These are typical values considered in the original paper Rihll and Wilson (1991), with Fig.6 satisfying the criteria exactly. To do this we start with our best representation of the distances between sites as outlines in section 2.1. We set β = 1.05 and, initially, adopt the same exponential form for the deterrence function as shown in (2). We have to find our own distance scale D as the units used in the original papers are unknown. To do this we scan through all possible value for D, other parameters fixed as described, looking for solutions with 8 terminals. We also repeated this exercise for the distances derived from our modified Delaunay Triangulation. The results are shown in Fig. 9.
We found three distinct solutions with eight terminals at D = 80, 85, and 90 for our normal distance matrix and only one at D = 90 for our modified Delaunay Triangulation (DTMedit) distance set. The precise sites are shown in Tables 2 and 3. The results show a lot of consistency with variations on the scale of about 10km. That is, all our examples give Athens as one of the dominant sites, as Rihll and Wilson also found. There is also at least one terminal site close to Argos and another close to Corinth, but often it is one of their close neighbours and not the sites which became dominant in later times. However this is on a relatively small scale of roughly the average site separation, under 10km. This 'error' is emphasised by the fact none of our results here gives historic Thebes as the dominant site, but rather we find one of its close neighbours. Looking at Table 2 we see the order of the sites, as defined by the terminal flow (A17), moves around between the different solutions on top of the actual again emphasising the level of robustness of the results.
Perhaps more interestingly,   Table 3: The eight terminal sites found for β = 1.05 using an exponential deterrence function, organised by location. One using our modified Delaunay Triangulation (DTMedit) derived distances, one with out direct distances (normal) and the last taken from fig.6 of Rihll and Wilson (1991).

Three terminals
The clustering suggests that three terminal solutions are the only ones which are in a noticeably stable region of parameter space. We are not suggesting that, in this period, there were only three significant sites. Our purpose is to understand the nature of the modelling process better when there is less ambiguity. The three sites found for the exponential deterrence function and the two different distance sets are given in Table 4. As might be expected given the noticeable gap visible to the eye in the map of sites in Fig. 1 there is one site for each of the three clear regions: Boeotia, Attica and Isthmus/Argolid. For Boeotia, yet again Thebes itself is never a terminal, but the terminal site picked is always close to Thebes. Unlike the results for eight terminals, this site is no longer always a nearest neighbour, e.g. Plataia is about 13km south of Thebes. For Attica, we find that the Delaunay Triangulation distance data no longer picks Athens but instead picks an extremely close neighbour of Athens. The difference between the two data sets reflects the fact that they differ only on longer distance scales. With three terminals, each site is the centre of attraction for about a third of the sites spread over a much larger distance so the differences between the two distance sets will be more important when we have fewer terminals.
The most interesting case is the terminal in the Isthmus/Argolid region, either Berbati or Tenea being chosen. These sites are somewhere in between the two neighbourhoods, that close to Argos and the second around Corinth, which provided Ishmus/Argolid region terminal sites when there were eight terminals in total. What this result suggests is that to a first approximation, the Rihll and Wilson model splits the space into roughly equal area patches and with one terminal per region, the terminal lying close to the geometric centre of its region.
This has implications for our more general analysis, which we now test.

The Cluster-and-Centre method
The typical type of network produced by the Rihll and Wilson model is shown in Fig.s   to a nearby terminal site. This effectively defines a clustering of the sites 4 where each cluster contains one terminal site along with all the sites with a strong connection to the terminal site of the cluster. Looking at the solutions it appears that the terminals are often close to the geometric centre of the clusters. This is not surprising at the site closest to the centre is likely to minimise the sum over deterrence function terms in the denominator of (4) which would allow a larger I i value in the solutions and a large I j values characterises the terminal sites. This leads us to the conjecture that the Rihll and Wilson model is clustering close sites and then giving the most central node as the terminal node. One way to test this is to attempt a different approach to finding the key sites which we will refer to as a "Cluster-and-Centre" method. This is a null phenomenological approach, relying on geographic data alone, without the trappings of entropy and Bayesian analysis. First we use a standard data clustering method, one which takes distances between data points (here the sites) as their input. The only parameter of the method is the number of clusters to be considered. Once we obtain our clusters of sites, we then find the most central site by looking for the site which is closest to the point with coordinates given by the mean (or the median) of the coordinates of all the sites in the cluster.
In Table 6 we show the results for eight clusters so we can compare them with the eight terminal results discussed above. We used k-means clustering and five variations of Hierarchical Agglomerative Clustering (HAC) (for example see Manning et al., 2008, for both methods). The implementation of k-means clustering that we use is stochastic so we are just showing one possible result here.
For three of the four flagship sites we are using to test of results, this approach gives reasonable results, except when using the method labelled 'HAC single'. The 'single' version of Hierarchical Agglomerative Clustering is well known to be an extreme version and is often not appropriate. It is clearly an outlier here and we will ignore it from now on 5 . For the remaining five methods shown in Table 6, Argos is always picked as a centre. Corinth is only picked out only once and either 80 Lekhaion or 78 Kromna are chosen instead. However these are close neighbours of Corinth and so we are finding the methods chooses the same type of 10km wide region as we did with the Rihll and Wilson model. Athens is picked out most often in the northern Attica region (six times from the ten reasonable methods) with 57 Koropi or 50 Menidi picked otherwise. However neither of these alternative sites is particularly close to Athens, lying outside the range of results we found with the Rihll and Wilson model. The big difference comes in the Boeotian region. Thebes is never picked out. Further, the two close neighbours in the typical small region picked out in our analysis using the Rihll and Wilson model,26 Potniai and 24 Kabirion, are only picked out in only four of the ten reasonable results.
Overall, while there is some correlation between the Rihll and Wilson model and the Cluster-and-Centre method outlined in this section, it is not an overwhelmingly strong or clear relationship. We need more than geography to understand why some sites develop to become so important at the expense of their neighbours. The entropic/Bayesian approach of Rihll and Wilson provides a useful next step to our understanding of state formation.

Conclusions
In our earlier paper (Rivers and Evans, 2014) we speculated on the contingent nature of Thebes in the Rihll and Wilson model, taking their results for granted. In this paper we have shown that Thebes (or, indeed, any site) can only be understood in the much larger context of uncertainty in spatial network modelling and how this is reflected in model outcomes, a context which has implications for all entropic attempts to understand urbanisation and city-state development.
We have highlighted several sources of uncertainty: site choice, distances, model choice, parameter choice. In this paper we have looked at the effect on outcomes of some (but not all) these sources of uncertainty. We have tried two ways to calculate distances, our normal and Delaunay Triangulation derived distances. By comparing with the results of Rihll and Wilson 1987;1991 we have effectively a third set of distances. We have tried also two major ways to encode the costs of distance in the models: an exponential deterrence function (2) and the ariadne deterrence function (3).
An important requirement of this work is that to evaluate uncertainties we must make fair comparisons between results from different variations of the same model or even completely different models. The key problem is that we do not know what parameter values to use in different models in order to make this fair comparison. Even when a parameter is apparently linked to a physically measurable scale, such as our distance scale parameter D, the relationship between model value and actual measurable physical quantity is complicated. Should we relate the the distance scale D directly to the typical daily walking distance, or should that be 1.5D or 0.9D? Whatever this relationship is, it will change with different model parameter values and indeed when using different models. This is the key principle of 'renormalisation', that we must never assume that a model parameter is simply related to a physical quantity. The answer we suggest is to compare results from different models only when they are giving the same output by some suitable measure.
In terms of the data used in this paper, we have arrived at similar conclusions to the original authors, though we put these within larger but quantified margins of uncertainty. Our results show that there are regions about 10km across which have distinct geographical advantages for encouraging urbanisation and which can help explain the different roles of sites in later periods. We do not predict that Thebes, and to a lesser extent Athens, always going to be these sites, unlike the original papers, but we do agree that sites in their close neighbourhood are continually shown to have this advantage. Thebes is not necessary but something like it was always likely.
Of course similar links have been made between geographical locations and the role of sites in history: Delos in the case of Davis (1982), Knossos in the case of Knappett et al. (2008). However in the latter case the modelling also suggested a that pair of candidates, Knossos and Malia, had these spatial advantages, again compatible with the uncertainties in modelling.
In terms of the method of Rihll and Wilson itself, our work has highlighted that, qualitatively at least, it seems to divide up the sites into regions of roughly equal size, each with a terminal site. To test the hypothesis that the terminals are close to the geometric centres of these zones, we have compared this 'zone-of-control' spatial ordering against a null 'Cluster-and-Centre' method which is based on generic clustering methods using geographical data alone. Specifically, it takes any clustering method to create the clusters and then finds their geometric centres to locate the dominant site for each cluster. This enables us to come up put 'error bars' on our results, estimates of the range of reasonable results as illustrated by our Fig. 10.
For the clustering methods we tried here, we found there is some correlation between the Rihll and Wilson model and the Cluster-and-Centre method. However it is not an overwhelmingly strong or clear relationship. It seems that we need more than geography alone, a need arguably satisfied in part by entropic/Bayesian analysis. It could be argued that the popular and generic data clustering methods we used here, k-means and Hierarchical Agglomerative Clustering, are more effective on higher dimensional data than our two-dimensional world-surface. However, even if it had been in better agreement with our outputs, which we don't think is likely, part of the rationale for using the Rihll and Wilson model is the interpretation it brings. The Rihll and Wilson model comes with a powerful epistemic interpretation in terms of the entropy of microstates and a natural interpretation in terms of flows, inputs, outputs and costs. We learn from the nature of the calibration. A regular data clustering method such as k-means brings no intrinsic interpretation to the table, it merely hopes to provide reasonable clusterings in a calibration-free way.
1 Figure 11: The approximate locations of the 109 sites used as the starting point for this study, derived from Fig. 1 of Rihll and Wilson (1987). The index of site numbers is given in Table 5 in the Supplementary Material. Note that site 64, Vouliagmeni, was not labelled in the original figure, see Evans (2016a) for more details. The edges are those used to derive the second set of distances (denoted DTMedit) and are a subset of the edges of a Delaunay Triangulation.
is that if all other things are equal, then every possible exchange counted by the flows is equally likely. Put another way, if there is no other information about these exchanges then the best we can do is to assume all exchanges are equally likely. The maximum entropy framework provides a rigourous mathematical basis for this simple idea.
Of course in reality there are strong constraints and different models add in different types of extra information in an attempt to provide a more realistic description of the individual exchanges. Equivalently, they can be understood as the most likely configurations of a microcanonical ensemble subject to these same constraints.  Figure 12: Hierarchical Agglomerative Clustering for distances based on the edited Delaunay Triangulation network (DTMedit) using average criterion for agglomeration.
For the shopping model (Huff, 1964;Lakshmanan and Hansen, 1965;Harris and Wilson, 1978) used by Rihll and Wilson, the flows defined by the model maximise the entropy S, more conveniently understood as minimising the Hamiltonian H = −S where, after some rewriting, (see Harris and Wilson, 1978, equation 30-33) The solutions are the most likely pattern for exchanges given the constraints (given in the square brackets in (A7)) imposed in the model. For simplicity we are assuming that every site i has the same number of potential origins or destinations for a trip. The first term reflects the assumption that the probability that an exchange occurs on any given edge is independent of the edge if all other things are equal. The second term, with coefficient α imposes a constraint that the total output of site i, the total flow leaving site i, is equal to a parameter O i which we must specify. The third term supposes that the cost of each exchange from i to j is c ij , and that we demand that the total cost is C, another parameter we must satisfy. This 'cost' is not necessarily in terms of money.
Rather it is generally expressed in terms of some characteristic fixed by the geography of the site, typically some measure of the distance between two sites. All of the first three terms are frequently seen in this type of entropy approach. It is the last term which is distinctive and is a key feature of the model of Rihll and Wilson. It involves the total incoming flow, I j = k F kj , but it is not a simple constraint on the total input to each site. Rather, it is a constraint on the entropy of site outflows which, in the absence of this (and other constraints) would be uniform. To understand what type of effect this last term is giving, and indeed to get a better understanding of what all the terms lead to, it is easier to quote the solution for the pattern of flows which maximises the entropy, which can be expressed in simple algebraic form as The parameters α i have been fixed by the requirement that the outputs for each site are fixed to be are O i from which the normalisation factors A i are determined in terms of other quantities in the theory. The total cost C and the cost of each exchange c ij are equivalent to specifying what is known as the deterrence function f ij , where D is a distance scale against which 'costs' are measured. More generally, we write where f (x) relates costs to the separation variables d ij between sites i and j. For instance if the costs are just the distances, c ij = d ij then we arrive at a simple exponential form We stress that the outflows O i are input parameters for the Rihll and Wilson model (A8) whereas the inflows I j = i F ij are to be determined from (A8) as outputs from the model. The inflows are interpreted as the attractiveness or importance of a site and as used to determine the dominant city state sites.
In this case varying site size can be accommodated. The obvious extension to include site size (that has its counterpart in the work of Alonso (1978)) is to take Extremal solutions now take the form Consistency then requires that We said that, in the absence of further evidence, we had taken all the S i to be identical. We note that explicit dependence on site size is very weak, going as S 1−β i , where 1 − β ≈ 0.05, with negligible effect. However, there is implicit dependence in the way the given outputs depend on site size (e.g. O i ∝ S i ).

B.1 Non Linearity in the Rihl and Wilson Model
The factor of I β j in the Rihll and Wilson model is the key difference from most other gravity models. This term introduces non-linear feedback based on the inputs though the self-consistent normalisation factors.
To understand how this works, we will consider a simple way to find the solution to (A8) for a given set of parameters. The idea is that if we are given a set of input flows at some time t, say I j (t), then we specify the next round of values at iteration number (t + 1), I j (t + 1). To show this iterative process, we first rewrite the solution (A14) in terms of just the total inputs I j for each site along with the other fixed parameters as (see equation (23) of Rihll and Wilson (1987) and appendix of Rihll and Wilson (1991)) Note that this solution shows explicitly that the model can be derived from the N different I j values alone. This is a non-linear equation which can be solved using standard methods. In practice in this paper we used a simple approach where at each step of the numerical process we have our current best guess for the input values I j (t). The next set of values, I j (t + 1), is then defined to be We start the process using the site output values as initial values I(t = 0) = O i . We iterate many times until the changes in the I j (t) values are small, as defined by us. For more details on the convergence and uniqueness of this approach see Harris and Wilson (1978). This iterative form (A16) is also useful as it helps us understand the role of the distinctive non-linear factors I β j found in the Rihll and Wilson model. If one site, say T , has a large input flow, so I T (t) I j (t) (j = T ), it is very "attractive" in the language of Rihll and Wilson. The normalisation factor, the denominator in (A16), will then be be dominated by the one large factor of I T (t). This will pull down all values of I j (t + 1) except for I T (t + 1) which is the only one boosted by a large factor in the numerator, the (I T (t)) β . This creates a feedback loop as at each stage the I T entry gets larger and the others smaller, reinforcing the process. The feedback is enhanced the larger β is. The logical end is to have most I j becoming zero so that all the input goes to one site, I T .
In practice the solutions are a little more complicated. If we assume that the deterrence function becomes negligible for sites much more than distance scale D apart, then a site with a growing attractiveness I T will only suppress the attractiveness of sites within a radius of D or so. Basically we should expect to see the system split up into patches of radius D , each with one dominant site. This is roughly what is normally seen. We have a pattern of stars where all the flow from most sites, O i , is directed to just one site, the terminal site in their neighbourhood. The formal definition of a terminal site used by Rihll and Wilson is a particular implementation of a scheme of Nystuen and Dacey (1961) and we will follow suit. We define terminal sites T ∈ T ⊂ V to be sites where the total flow into a site is greater than the largest single flow out of that site along any edge Basically for a terminal site more flow comes in than leaves the site along any one edge.
In practice for many of the parameter values found here we found that terminal sites were the only ones with any significant flows into their sites so the in-strength, I i , is usually sufficient to detect these important sites.

C Cluster-and-Centre Methods and Results
The Cluster-and-Centre algorithm first takes the distances between the sites and uses these to produce a partition of the sites using a standard data clustering method. Let us suppose that these clusters are {P c } where every site in is one and only one cluster, i.e. c P c = V, and P c ∩ P d = if c = d. We then define the centre of each cluster c to be at (x c , y c ). We will define the centre in two ways, using the mean or the median of the coordinates (x i , y i ) of sites i in the cluster c, i ∈ P c . Finally we define the central site to be the site closest to the central point.
The six clustering methods we have used are k-means and five variations of Hierarchical Agglomerative Clustering (HAC); for example see Manning et al. (2008) for both methods. Hierarchical Agglomerative Clustering methods work by starting with each site in a group by itself. A distance scale D is slowly increased and two groups are joined together as soon as the distance between these two groups equals the scale D. The different types of Hierarchical Agglomerative Clustering differ in the way they define the distance between two groups of sites. For instance the single (or minimum) Hierarchical Agglomerative Clustering method joins two groups if just one pair of sites, one from each group, is separated by D, while the complete (or maximum) method only joins groups when all pairs of sites (one from each group) are separated by at least D. The average Hierarchical Agglomerative Clustering method joins two groups together if the dendrogram scale D is equal to the average distance between all possible pairs of sites, one site from each of the two groups.
These central sites of the Cluster-and-Centre algorithm can then be compared to terminal sites (A17) defined in the Rihll and Wilson model. More generally the partitions {P c } of the Cluster-and-Centre algorithm can be compared to clusters {C c } formed in the Rihll and Wilson model which we define as follows. Let T ∈ T be a terminal site as defined by (A17). Then we define a cluster C T for each terminal site T where Provided the largest flow leaving a non-terminal site is unique for each site, then this defines a partition, that is each site is in one and only one cluster, i.e. T ∈T C T = V, and C S ∩ C T = if S = T . In Table 6 below, we show the results of several attempts to cluster the data using standard methods followed by finding the geometric centre of each cluster. We note that the clusters are much more uneven in size for the Hierarchical Agglomerative Clustering methods with the single method being particularly poor. That is one reason we do not consider it seriously in the discussions in the main text but in any case this single method is known to produce long thin clusters which don't seem appropriate for this context. Table 6: Results of partitioning the data into eight clusters and then finding the geometric centre. The normal distance data set was use to define the distances between sites and then standard clustering methods were applied: k-means or various versions of hierarchical agglomerative clustering (HAC). Once these clusters were defined, the geometric centre of each cluster was found by finding the site closest to the mean and closest to the median of the coordinates of the sites in each cluster. The results for one cluster are shown on each line.