ORIGINAL RESEARCH article
Sec. Social Physics
Volume 10 - 2022 | https://doi.org/10.3389/fphy.2022.861678
A Fractal Theory of Urban Growth
Centre for Advanced Spatial Analysis, Bartlett Faculty of the Built Environment, University College London, London, United Kingdom
This paper presents an analytical framework for the physical environment of cities using fractal theory. The strength of the approach lies in its simplicity and precision. The equations presented in this article cover: the number of occupied sites in an area; the population and the length of roads of a city; its fractal dimension; its average and maximum number of levels (floors per building); the average density of population and roads; the limits to growth; and an analysis of some of the city’s scaling laws. These equations describe, to a high level of precision, the real values measured in the system of the United Kingdom for every city above 5,000 people, which amounts to a sample size of 1,031 cities. This work will allow further research into the nature of cities, since it enables the creation of synthetic cities and further analytical derivations that can arise from these building blocks. As a possible example of an application of the framework, the paper also shows how the same set of equations can be used to characterise the internal distribution of a city from the perspective of its growth.
The field of urban studies is a continuous pursuit for regularities that expand our capacity to describe and understand cities. From its origins, the field has been intimately related to ideas and methodologies in the field of statistical physics, and an overarching summary of the path and main ideas of the application of statistical physics to urban environments is presented in .
We cannot claim that we understand how cities evolve as long as we do not have an exact set of equations that relate the variables to one another, allowing us to understand the effects of population growth. This is paramount for a large number of fields, including research, urbanism, and political and economic science. The aim of this work is to create an analytical framework for the analysis of the most important variables in a city from a geometrical standpoint.
This paper presents a theory of the physical aspect of the city using a fractal framework. Cities are defined by their occupation of space, and I show over the next sections that cities increase their fractal dimension as they grow in population, a fundamental property that was already noted in . In fact, the study of cities as fractals has a long tradition in the scientific literature [2–8]. This view is fundamental to understanding cities, since it governs the occupation of space for a given city, and therefore it is the only valid way to extract its geometric framework.
The work presented in this paper studies the population, the road network, the occupation of space, the fractal dimension of the city, the average heights, the maximum heights and the interactions or GDP of a city. I show how all these variables are related to each other and how they were derived. This work is not the first to attempt to produce a set of equations that describe the main variables of the city. Previous work includes [9, 10], which show how congestion influences the growth of cities, the aforementioned  that presents a theory of the growth of cities based on scaling theory, and several theories on the growth of cities [11–13].
This work also examines how this fractal framework influences the scaling of a number of variables. Scaling theory [14–17] in urban science [18–23] studies the allometric relations that appear between the size of the city in terms of population and variables such as the length of roads, the number of gas stations, the cost of maintaining a city’s infrastructure, its GDP and many other quantities. In previous work , it was shown that those scaling exponents could be derived from a simplified and approximated version of the equations that governed those same variables. The current work improves on the calculation of the scaling exponents presented in , since we now have better and more precise equations to describe the length of roads, population and GDP.
The work also shows an application of the set of formulas to develop an approximation of the internal distribution of a city, derived from its growth. In order to do so, it assumes that the same formulas that describe a city in its current state have to remain meaningful to describe a city at any specific instant of its history, meaning that this growth is ergodic.
Cities begin to form with the construction of a single house. Slowly other houses join, percolating space, and soon the city’s fractal dimension starts increasing. New occupied sites are incorporated with a certain probability of occupation over the territory that surrounds the city, while the existing urban fabric is densified. This probability of occupation tends to a constant value because of a self-optimisation pattern. Going any lower would break the city apart into different clusters and spread it over a large area, increasing travel times and decreasing economies of scale. Going higher would increase traffic and other problems derived from density. This leads to an optimal solution in which the probability of occupation is as low as possible while still keeping the city as one single cluster. This probability is very close to the critical probability of percolation on a square lattice in two dimensions, because the topology of cities is on average similar to that lattice.
In its first stages the city densifies its road network, subdividing the occupied sites, which increases the density of occupation. At some point in its growth, the density of people that can live in a planar city saturates and, in order to keep growing, the city needs to extend into the third dimension by increasing its height, eventually pushing the fractal dimension of the population above 2. From its initial state, the density of population keeps increasing as the city first densifies and later grows into the third dimension.
The current section constructs the main framework of this work, presenting the derivation of its equations for several geometrical variables in a city. To simplify the equations and reasoning, I will use an idealised system throughout this section, leaving aside the multipliers and characteristic scales that need to be fitted in order to obtain realistic values. I will show how to calculate those multipliers in the last section of the paper, where I adapt the equations to work with a real system and give the value of the constants in the specific case of the United Kingdom, which serves as an example throughout the paper.
As a city grows, the pre-existing city does not disappear, meaning that it must maintain at least the current probability of occupation of the sites where buildings are constructed. Consider a square city with linear dimension L, planar fractal dimension d and a number of occupied sites n = L^d (shown in Figure 1B), arising from a probability of occupation ρ over an area a = L^2. Then ρa = n, which in turn means that L^(d−2) = ρ, so:
d = 2 + ln(ρ)/ln(L)    (1)
FIGURE 1. Comparison between real data and their corresponding equations. (A) fractal dimension as a function of the linear dimension of the city. (B) number of occupied sites as a function of the linear dimension of the city, measured using GHS data [26]. (C) population as a function of the linear dimension of the city, from GHS data. (D) total length of roads as a function of the linear dimension of the city, taken from OSM data [28]. In blue, real data; in black, the equations derived with this approach. The vertical dotted line corresponds to the critical threshold.
This equation means that the fractal dimension of a city needs to increase as its linear dimension grows in order to avoid having its preexisting city disappear; this was already noted in . Otherwise, if it were to remain constant (or decrease), the only solution would be to decrease its probability of occupation, meaning that in order to occupy sites in its outskirts the city would need to vacate sites in its preexisting city. This explains the behaviour observed in real systems (Figure 1A), where the error between the predicted and real values follows a normal distribution with mean −0.0142 and standard deviation 0.0677.
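As a minimal numerical sketch of Eq. 1 in the idealised (unitless) system, the function below evaluates d = 2 + ln(ρ)/ln(L) for a few sizes. The function name and the sample values of L are illustrative assumptions; the default ρ = 0.5991 is the value quoted later in the paper.

```python
import math

RHO = 0.5991  # probability of occupation quoted later in the paper


def fractal_dimension(L: float, rho: float = RHO) -> float:
    """Planar fractal dimension d = 2 + ln(rho) / ln(L), valid for L > 1."""
    if L <= 1.0:
        raise ValueError("L must exceed 1 so that ln(L) is positive")
    return 2.0 + math.log(rho) / math.log(L)


if __name__ == "__main__":
    for L in (2.0, 10.0, 100.0, 1000.0):
        d = fractal_dimension(L)
        # By construction the number of sites n = L**d equals rho * a = rho * L**2.
        print(f"L={L:7.1f}  d={d:.4f}  n={L ** d:.1f}  rho*a={RHO * L ** 2:.1f}")
```

Because ln(ρ) is negative, d stays below 2 and approaches it as L grows, which is the behaviour described above.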
Cities grow vertically above their two-dimensional footprint. In fact, population becomes a fractal volume that starts below dimension 2 but surpasses it as cities become larger. Furthermore, the population always has a larger fractal dimension than the footprint of a city. As was shown in , the fractal dimension of the population (dp) can be obtained by adding a fractal vertical component η to the planar fractal dimension of sites d, that is dp = d + η, and therefore the population (shown in Figure 1C) can be expressed as
p = L^(dp) = L^(d+η)    (2)
Throughout its growth, a city starts by densifying its street network, increasing the number of people that can live within it, and at some threshold xc the density of population per site in a planar city saturates and can no longer grow through this process. In order to keep growing above that critical threshold xc, the city needs to start increasing its height. Therefore, the growth of people per site
is a constant.
Numerically, at xc, the density of roads reaches its maximum value of 1 (understood as the probability of finding a road segment in a site) ⟨ℓ⟩c = 1 and the average number of levels of the city is also 1, ⟨h⟩c = 1, since it was 1 from the beginning of the growth of the city and it only starts increasing right after xc. So we have that:
We also have that since ρa = n,
This last equation means that at xc the population is proportional to the area since ⟨ℓ⟩c = 1 and ⟨h⟩c = 1 and k and ρ are constants. Therefore, the fractal dimension of the population at xc is the same as the fractal dimension of the area
and since pc = L^2 = ac and a/n = ρ^(−1) then
for any city size. This is a constant for any city: the limit of population density in each vertical floor belonging to a site, per meter of road (how dense the network is, how subdivided the system is). The system cannot hold more people than this value per site, per level. Furthermore, Eq. 5 becomes
which means that at the critical threshold xc, when saturation is reached and both the density of roads and the number of levels are 1, the population equals the area. Moreover, as an indirect consequence of Eq. 6, the threshold xc is reached when the linear dimension of the city is Lc = ρ^(−1/η).
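The steps of this derivation can be written compactly as follows. This is a reconstruction from the surrounding definitions, offered as a sketch rather than a verbatim copy of the article's displayed equations; the tags are an inferred mapping onto the in-text references to Eqs. 3 to 8, and the subscript notation is an assumption.

```latex
\begin{align}
\frac{p}{n\,\langle \ell \rangle\,\langle h \rangle} &= k \tag{3}\\
\frac{p_c}{n_c} &= k
  \qquad \text{since } \langle \ell \rangle_c = \langle h \rangle_c = 1 \tag{4}\\
p &= k\,\rho\,a\,\langle \ell \rangle\,\langle h \rangle
  \qquad \text{since } n = \rho a \tag{5}\\
d_{p,c} = d_c + \eta &= 2 \tag{6}\\
k = \frac{p_c}{n_c} = \frac{a_c}{n_c} &= \rho^{-1} \tag{7}\\
p &= a\,\langle \ell \rangle\,\langle h \rangle
  \quad\Longrightarrow\quad p_c = a_c \tag{8}
\end{align}
```

Combining Eq. 1 with d_c = 2 − η gives ln(ρ)/ln(L_c) = −η, which is the stated L_c = ρ^(−1/η).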
A city cannot grow without limit. The equations portrayed in this work show that its density would reach impossible values, the heights of its buildings would reach levels that are physically unattainable, and a large number of other issues, such as congestion and competition for space, would arise. As we see in Eq. 7, ρ^(−1) is the density limit for the night-time population, that is, how many people can live in each site per level. This is a hard limit that no city outgrows. As cities grow beyond this threshold (xc), the day-time population will spread over its area: people will walk through the parks, the plazas and the avenues, and will of course be present in buildings. Since the area of the city cannot contain several levels, at some point in its growth the population spread over its area will also reach this same limit
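If at xm we have that ρpm = am and, for all cities, ρa = n, then pm/nm = ρ^(−2), which together with p = L^(d+η) and n = L^d gives Lm = ρ^(−2/η) and dm = 2 − η/2.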
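The two thresholds can be checked numerically. The sketch below works in the idealised units and is an illustration, not the article's code: the function names are hypothetical, ρ = 0.5991 is the value quoted in the text, and η = 0.2092 is inferred from the quoted dc = 2 − η = 1.7908. The expression for Lm is derived here from ρpm = am together with p = L^(d+η) and d = 2 + ln(ρ)/ln(L).

```python
import math

RHO = 0.5991  # probability of occupation quoted in the text
ETA = 0.2092  # inferred from d_c = 2 - eta = 1.7908 quoted in the text


def dimension(L, rho=RHO):
    """Planar fractal dimension d = 2 + ln(rho) / ln(L)."""
    return 2.0 + math.log(rho) / math.log(L)


def population(L, rho=RHO, eta=ETA):
    """Idealised population p = L**(d + eta)."""
    return L ** (dimension(L, rho) + eta)


def critical_length(rho=RHO, eta=ETA):
    """L_c = rho**(-1/eta): the size at which population equals area."""
    return rho ** (-1.0 / eta)


def limit_length(rho=RHO, eta=ETA):
    """L_m = rho**(-2/eta): the size at which rho * p equals the area."""
    return rho ** (-2.0 / eta)


if __name__ == "__main__":
    for name, L in (("x_c", critical_length()), ("x_m", limit_length())):
        print(f"{name}: L={L:.3f}  d={dimension(L):.4f}  p/a={population(L) / L ** 2:.4f}")
```

At xc the ratio p/a is 1, and at xm it is 1/ρ, so the same density limit is reached first per level and then over the whole area.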
Regarding ⟨ℓ⟩ and ⟨h⟩ given that we know that both are complementary, since at xc both are 1, and one cannot exist in the numeric range of the other (one has to be less than 1 and the other more than 1) then we have that
which is shown in Figure 2D, and since
portrayed in Figure 2B.
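One reading that is consistent with the statements above (both quantities equal 1 at xc, ⟨h⟩ stays at 1 before xc and ⟨ℓ⟩ stays at 1 after it) follows from the relation p = a⟨ℓ⟩⟨h⟩ used later in the text: the product ⟨ℓ⟩⟨h⟩ equals p/a = ρL^η, with one factor pinned at 1 on each side of the threshold. The sketch below encodes that reading. The piecewise min/max form and the function name are assumptions, not equations taken from the article.

```python
RHO = 0.5991  # probability of occupation quoted in the text
ETA = 0.2092  # inferred from d_c = 2 - eta = 1.7908 quoted in the text


def road_density_and_levels(L, rho=RHO, eta=ETA):
    """Return (<l>, <h>) assuming <l> * <h> = p / a with one factor fixed at 1.

    p / a = L**(d + eta - 2) = rho * L**eta, because L**(d - 2) = rho.
    """
    ratio = rho * L ** eta
    road_density = min(1.0, ratio)  # saturates at 1 once x_c is reached
    levels = max(1.0, ratio)        # stays at 1 until x_c, then grows
    return road_density, levels


if __name__ == "__main__":
    for L in (2.0, 5.0, 11.6, 50.0, 200.0):
        l_avg, h_avg = road_density_and_levels(L)
        print(f"L={L:6.1f}  <l>={l_avg:.3f}  <h>={h_avg:.3f}")
```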
FIGURE 2. Comparison between number of levels and densities in the system with their respective equations. (A) maximum number of levels as a function of the linear dimension of the city. The number of levels is taken from OpenStreetMap data [28]. (B) average number of levels as a function of the linear dimension of the city. (C) approximation of η using
In order to calculate the maximum number of levels of the city (⌈h⌉) we use an approximation from  where we obtained that
which is shown in Figure 2A.
We can also obtain equations that describe the average number of population in a site projected to the floor (collapsing all levels) ⟨pp⟩ (Figure 2E), the average number of people per meter of road ⟨pℓ⟩ and the average number of people per site and per level ⟨ph⟩
Over the next sections I will show how to adapt this framework to real data and how to obtain the value of η and ρ to be able to get the final values of our exponents.
Originally derived in the field of biology, scaling theory studies the allometric scaling of variables in a city as they relate to its population growth. Some of those variables scale sub-linearly with the size of the city, meaning that the larger a city gets, the slower that variable grows. This is the case for variables where economies of scale arise, such as the length of roads needed to cover the city or the number of gas stations. Other variables grow linearly because they correspond to some fixed value per person, such as the amount of water consumed. Finally, some variables grow super-linearly, meaning that they grow faster than the population. These usually arise through feedback effects and include traffic congestion, criminality, interactions or the GDP of a city. In  we showed that this relation to size was due to the fractal nature of cities, and calculated the expected exponents from the fractal scaling of the population and road network.
In that previous work  we reasoned that since the length of roads was proportional to L^d and the population was proportional to
this means that the previous reasoning still stands, but only for the largest cities (those above xc), while small cities are better represented by a different γ. The resulting length of roads fits the data to a very high degree of precision (the differences between the logged real and predicted values follow a normal distribution with μ = 0.01466 and σ = 0.1618), as shown in Figure 3A, while Figure 3C shows the value of γ.
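The change of regime at xc can be illustrated with a short sketch. It assumes that the idealised road length is ℓ = n⟨ℓ⟩ with ⟨ℓ⟩ = min(1, ρL^η), which is one reading of the text and not an equation quoted from it, and it uses the definition ℓ = p^γ, so that γ = ln(ℓ)/ln(p). The function name is hypothetical and η = 0.2092 is inferred from the quoted dc = 1.7908.

```python
import math

RHO = 0.5991  # probability of occupation quoted in the text
ETA = 0.2092  # inferred from d_c = 2 - eta = 1.7908 quoted in the text


def road_exponent(L, rho=RHO, eta=ETA):
    """Sub-linear exponent gamma = ln(l) / ln(p) in idealised units (L well above 1)."""
    d = 2.0 + math.log(rho) / math.log(L)
    road_density = min(1.0, rho * L ** eta)  # assumed form of <l>
    roads = L ** d * road_density            # assumed l = n * <l>, with n = L**d
    people = L ** (d + eta)                  # p = L**(d + eta)
    return math.log(roads) / math.log(people)


if __name__ == "__main__":
    for L in (5.0, 8.0, 11.6, 50.0, 200.0):
        print(f"L={L:6.1f}  gamma={road_exponent(L):.4f}")
```

Under these assumptions γ reduces to d/(d + η) above xc, where the road density is saturated, and takes a smaller value below it.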
FIGURE 3. Relations between some of the variables and the scaling equations presented in this paper. (A) length of roads as a function of the population and the sublinear exponent (ℓ = p^γ). (B) GDP as a function of the population and the superlinear exponent
In that same paper, we also reasoned that interactions occur when people go out into the street, and that the number of interactions should therefore be proportional to the square of the number of people at ground level, multiplied by the number of locations in which that is possible. In that work, the equations were approximated and we used ℓ ∼ n, which gave i = (p/n)(p/n − 1)n, where i represents the total possible interactions. Since in this work we distinguish between the two values (ℓ and n), it is more precise to say that the people in the street interact and that the number of possible locations is the length of the street network. Therefore:
In order to obtain an approximation of the super-linear exponent of interaction
Similarly to what was done in , we assume that the GDP of a city is a direct consequence of the interactions between individuals, and use that quantity to showcase the validity of the formulation as shown in Figure 3B while the super-linear exponent is shown in Figure 3C.
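The interaction count described above can be sketched with a single helper. With the number of sites n as the number of locations it reproduces the earlier approximation i = (p/n)(p/n − 1)n; passing the road length ℓ instead is an assumption about the form used in this work, since the refined equation is not shown here. The function and argument names are hypothetical.

```python
def pair_interactions(people: float, locations: float) -> float:
    """Potential interactions: (people per location) * (that minus 1) * locations."""
    per_location = people / locations
    return per_location * (per_location - 1.0) * locations


if __name__ == "__main__":
    p = 1000.0
    print("locations = n (earlier approximation):", pair_interactions(p, locations=100.0))
    print("locations = road length (assumed form):", pair_interactions(p, locations=50.0))
```

When the number of people per location is large, the count behaves like people squared divided by locations, so fewer locations for the same population means more potential interactions.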
Formulation, Constants and Units for Real Data
The current framework represents an idealised system. It is unitless because everything is divided by an implicit characteristic scale, which we make explicit in this section, and it has no multiplying constants for the sake of simplicity. This section completes the framework by including those factors, thus creating the final set of equations for the system.
From this point onward we will use the subscript r to refer to real variables as measured from the data.
To determine the side of our real square (in meters) we use the area of the city.
The characteristic scale for the length of the side of our squared area is called L0 and it is measured in meters.
where for the United Kingdom L0 = 538.924 m. This value was calculated by measuring the fractal dimension and the area of all cities. An approximation can be obtained by performing those measurements for the largest city (max(dr), max(Lr)) and calculating our theoretical max(L) = exp(ln(ρ)/(max(dr) − 2)), to find L0 = max(Lr)/max(L). Of course, for this we need to determine ρ. This can be done either by directly measuring occupied space (buildings and roads) against open space in the city (parks and plazas), or by assuming our theoretical value ρ = 0.5991, taken from the next section. As an example, approximately 40% of London’s surface is occupied by parks, which means that its ρr = 0.60. I use this value for L0 as a starting guess and perform a least-squares estimate of L0 using the measured fractal dimension and area for all cities (I assumed the theoretical ρ to be valid). Notice that we cannot use Lr and d to directly calculate ρ because.
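The two estimates of L0 described in this paragraph can be sketched as follows. The starting guess uses exactly the expressions given in the text. The least-squares step is an assumption: it minimises squared residuals in ln(L0), for which the optimum is the mean of the per-city estimates, and this may differ from the objective the author actually used. Function names and the input format (pairs of Lr in meters and measured dr) are hypothetical.

```python
import math

RHO = 0.5991  # theoretical value quoted in the text


def l0_from_largest_city(L_r_max, d_r_max, rho=RHO):
    """Starting guess: L0 = max(L_r) / exp(ln(rho) / (max(d_r) - 2))."""
    L_max = math.exp(math.log(rho) / (d_r_max - 2.0))
    return L_r_max / L_max


def l0_least_squares(cities, rho=RHO):
    """Log-space least squares over (L_r, d_r) pairs.

    Each city implies ln(L0) = ln(L_r) - ln(rho) / (d_r - 2); the value that
    minimises the squared residuals in ln(L0) is the mean of those estimates.
    """
    logs = [math.log(L_r) - math.log(rho) / (d_r - 2.0) for L_r, d_r in cities]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Synthetic cities generated with a known scale, purely for illustration.
    true_l0 = 500.0
    cities = [(s, 2.0 + math.log(RHO) / math.log(s / true_l0))
              for s in (2000.0, 10000.0, 40000.0)]
    print(l0_from_largest_city(*cities[-1]), l0_least_squares(cities))
```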
For completeness, we will show how to obtain the area as a function of the side of the square.
where for the United Kingdom p0 = 1,207 people, which was measured by adjusting the theoretical population to the real data measured until their differences were minimised.
where for the United Kingdom n0 = 7.85 sites, measured against the real data.
where for the United Kingdom ℓ0 = 7,700 m, measured against the real data.
where for the United Kingdom h0 = 2 levels, measured against the real data.
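As a hedged illustration of how the constants enter, the sketch below converts a measured area into real-unit estimates. It assumes simple multiplicative forms, L = Lr/L0 with Lr the square root of the area, pr = p0·L^(d+η) and nr = n0·L^d. Only L = Lr/L0 is confirmed by an expression elsewhere in the text; the other two forms are assumptions. The constants are the United Kingdom values quoted above, η = 0.2092 is inferred from the quoted dc = 1.7908, and the area used in the example is hypothetical.

```python
import math

RHO = 0.5991   # probability of occupation quoted in the text
ETA = 0.2092   # inferred from d_c = 2 - eta = 1.7908 quoted in the text
L0 = 538.924   # characteristic length in meters (United Kingdom)
P0 = 1207.0    # population multiplier in people (United Kingdom)
N0 = 7.85      # site multiplier in sites (United Kingdom)


def real_city(area_m2: float) -> dict:
    """Estimate real-unit quantities from an area, under the assumed forms."""
    L_r = math.sqrt(area_m2)   # side of the equivalent square, in meters
    L = L_r / L0               # unitless linear dimension
    d = 2.0 + math.log(RHO) / math.log(L)
    return {
        "L_r": L_r,
        "d": d,
        "p_r": P0 * L ** (d + ETA),  # assumed p_r = p0 * L**(d + eta)
        "n_r": N0 * L ** d,          # assumed n_r = n0 * L**d
    }


if __name__ == "__main__":
    # A hypothetical city of 1,000 square kilometers.
    for key, value in real_city(1.0e9).items():
        print(f"{key}: {value:,.3f}")
```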
The constant that limits growth becomes:
and the densities:
Regarding the scaling equations, we have that the real length of the road network as a function of the real population becomes:
Notice that, the typical equation of a scaling
For the GDP we have:
where for the United Kingdom g0 = 2.3 ⋅ 10^7 euro, measured against the real data. This is an approximation and, in my view, it is preferable to use the actual equation instead of an approximated scaling law whenever possible, even though the two equations look indistinguishable when plotted against each other or against the data. The equation for the interaction of population is:
with an unknown value of i0, since there is no data to measure it. This factor i0 represents the probability that a potential interaction becomes a real one. Furthermore, if the assumption of proportionality between interactions and GDP holds:
One interesting side effect of this is that, given that L0, n0, p0, ℓ0 and h0, or even g0, are pure constants for a system of cities (the variability is absorbed by the rest of the equation), they are much better descriptors of a system, and when calculated for other systems they will allow us to make comparisons between different countries with less noise.
The Value of ρ and η
To render this analytical approach useful we need to be able to obtain the values of our two constants η and ρ. My approach was to use a genetic algorithm whose inputs were the area (ar, from which we obtain Lr), the fractal dimension (dr) and the population (pr) of each city, and whose parameters to be optimised were ηp and ρp. Using this, I apply a two-step approximation.
In the first step, in order to obtain the heuristic value for each individual, I calculate L0 using the parameter ρp given by the algorithm, and then after calculating the real density
In the second step, we fix η and optimise only ρ, restricting the search to the neighbourhood of the approximate value obtained in the first step. The only difference between the two steps is that we no longer use the real ρr and instead use the parameter ρp directly at every step (to calculate L0 and d = 2 + ln(ρp)/ln(Lr/L0)). From this second step we obtain our constant ρ.
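To make the role of d = 2 + ln(ρp)/ln(Lr/L0) in the second step concrete, the sketch below fits ρ by ordinary least squares. This is a deliberately simplified stand-in for the genetic algorithm used in the paper: it holds L0 fixed, whereas the paper recomputes L0 from ρp at every step, and it exploits the fact that d − 2 is linear in ln(ρ). The function name and input format are hypothetical.

```python
import math


def fit_rho(cities, L0):
    """Closed-form least-squares fit of rho with L0 held fixed.

    cities: iterable of (L_r, d_r) pairs, with L_r > L0.
    Model: d_r - 2 = ln(rho) * x, where x = 1 / ln(L_r / L0).
    The slope that minimises the squared error is sum(x*y) / sum(x*x).
    """
    sxy = 0.0
    sxx = 0.0
    for L_r, d_r in cities:
        x = 1.0 / math.log(L_r / L0)
        sxy += x * (d_r - 2.0)
        sxx += x * x
    return math.exp(sxy / sxx)


if __name__ == "__main__":
    # Synthetic cities generated with a known rho, purely for illustration.
    true_rho, L0 = 0.6, 538.924
    cities = [(s, 2.0 + math.log(true_rho) / math.log(s / L0))
              for s in (5000.0, 20000.0, 40000.0)]
    print(fit_rho(cities, L0))
```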
The values obtained were:
η = 0.2092 and ρ = 0.5991,
where less significant digits were discarded. These values mean that dm = 2 − η/2 = 1.8954 and dc = 2 − η = 1.7908. The measurements of η and ρ were obtained from approximate processes and could be improved in the future.
This is surprisingly close to the values for site percolation on a 2D lattice given in the literature, where η = 0.2083 is the exponent of the function that controls the probability of two sites belonging to the same cluster as a function of distance, pc = 0.5927 is the critical probability and df = 1.8958 is the fractal dimension of the percolating cluster. Given that percolation has been tied in the literature  to the formation of cities, I expect that there exists a logical link between the two, but the reasoning behind this numerical coincidence falls outside the scope of this paper and is left for future work.
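The closeness of these numbers can be checked with a few lines of arithmetic. The exact fractions η = 5/24 and df = 91/48 are the standard two-dimensional percolation values behind the rounded figures 0.2083 and 1.8958; they are not stated in the paper and are added here as an assumption of this sketch. The snippet applies the paper's expression dm = 2 − η/2 to both the fitted and the percolation value of η.

```python
from fractions import Fraction

ETA_FIT = 0.2092                 # fitted value implied by d_c = 1.7908
RHO_FIT = 0.5991                 # fitted value quoted in the text
ETA_PERC = Fraction(5, 24)       # 2D percolation exponent, about 0.2083
DF_PERC = Fraction(91, 48)       # 2D percolation fractal dimension, about 1.8958
PC_SITE = 0.5927                 # site percolation threshold quoted in the text


def d_m(eta):
    """Fractal dimension at the limit to growth, d_m = 2 - eta / 2."""
    return 2 - eta / 2


if __name__ == "__main__":
    print("d_m from fitted eta:     ", round(d_m(ETA_FIT), 4))
    print("d_m from percolation eta:", float(d_m(ETA_PERC)))
    print("difference in rho vs p_c:", round(RHO_FIT - PC_SITE, 4))
```

With the percolation exponent, 2 − η/2 equals 91/48 exactly, so dm plays the role of the percolating cluster's fractal dimension in this comparison.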
Scaling studies have shown that different city systems across the world have very similar scaling exponents . Following our derivation, we see from Eqs. 16 and 18 that the scaling exponent depends on d and η. Since d is a function of the linear size of the city and ρ (Eq. 1), the scaling exponent is a function of ρ and η. If the scaling exponents are truly universal, it would mean that ρ and η are in fact universal, and therefore these values should remain stable across different systems.
In the following section we show a possible application of the framework contained in this paper, in order to demonstrate its expressiveness.
Growth of a Single City
We can apply the same reasoning presented above to obtain the internal distribution of a single city, since at each stage of its growth, the city has to follow the equations presented for fractal dimension, population, number of sites, and length of roads if we consider urban growth to be an ergodic process.
Upon growth, the city increases from a current linear length L to L + dL. In this change of linear size, it modifies its fractal dimension from d_L to d_(L+dL), and its population change is
This population change will be partitioned between the stripe of land added to the city and the existing urban tissue. I assumed a simple formula for this, that uses a weight to balance the two, w. We then consider a value δ that is the density of population added at each step, which multiplied by the respective areas gives us the increase of population in the new area and the preexisting one. The basic formula for the population at a certain stage of its growth is then:
where w is adjusted to fit the real distributions. In the modelling process w has been made dependent on the step size, so that variations in the step size do not influence the final distribution. The adjusted value was
Of course, as we add new population, each city stripe must remain under the maximum possible population. This maximum possible population can be calculated from Eq. 8, where max(p) = max (a⟨ℓ⟩⟨h⟩) = a max ⟨ℓ⟩ max ⟨h⟩ and since max ⟨ℓ⟩ = 1 then max(p) = a⌈h⌉. So each stripe must remain below its area multiplied by the maximum height of the city for the current linear dimension.
When deciding where to locate in the city, a new inhabitant cares only about the distance to the center. To simulate this when distributing the population (δwL^2) over the pre-existing city, we weight each strip by how many more people fit in it, divided by its perimeter.
We can repeat the same steps for the number of sites and the length of roads, obtaining the most important variables. For number of sites, we choose a maximum possible density of 0.9 (being 1 complete occupation), this value was obtained from the data observed, while length of roads is limited by the number of sites. From it we can calculate the heights of buildings expected and the density of sites per area or of people per site.
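The strip-weighting rule described above can be sketched as follows. The weights are the remaining room in each strip divided by its perimeter, as stated in the text, and each strip is capped at its capacity (area multiplied by the maximum number of levels). The function name, the list-based inputs and the choice to drop any overflow rather than redistribute it are simplifying assumptions of this illustration.

```python
def distribute(extra, population, capacity, perimeter):
    """Share `extra` people among existing strips.

    Each strip receives a share proportional to
    (capacity - population) / perimeter, and never exceeds its capacity.
    Overflow above a strip's capacity is dropped in this simplified sketch.
    """
    room = [max(cap - pop, 0.0) for pop, cap in zip(population, capacity)]
    weights = [r / per for r, per in zip(room, perimeter)]
    total = sum(weights)
    if total == 0.0:
        return list(population)  # every strip is already full
    updated = []
    for pop, r, w in zip(population, room, weights):
        share = extra * w / total
        updated.append(pop + min(share, r))
    return updated


if __name__ == "__main__":
    # Two strips with equal room; the inner one has the shorter perimeter.
    print(distribute(5.0, [0.0, 0.0], [10.0, 10.0], [1.0, 4.0]))
```

Dividing by the perimeter favours the inner strips, which mimics the preference for locations close to the center.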
The height of buildings for the real data is a direct measurement taken from the LIDAR data available at the Copernicus site [27], and no transformation was applied other than dividing it by 3 m, which is taken as an average floor height; this is shown in Figure 4E. In order to calculate the density of a site, we count how many occupied sites (grid elements in which there is population, using data from the GHS [26]) lie in the area surrounding each site (within a radius of 6.250 km) and divide that count by the maximum possible number of sites in that circle, as shown in Figure 4F. The last comparison (population per site) is more complicated, and we need to consider how this data was created. The population data is obtained from the Global Human Settlement Layer [26]. This data has been produced by taking the population in census sections, determining the building footprints from satellite data and interpolating the population with the perceived density of buildings. Most probably, since we do not see any clear cuts between census sections, a spatial interpolation that averages out large discontinuities was also performed. Both interpolations (and even the aggregation of the data to a census section) reduce the peaks of population, softening the overall distribution. Therefore, in order to create a fair comparison, we applied similar steps to our results. Figure 4C shows both the real distribution obtained (dotted points) and the distribution obtained after a process of clustering and interpolation.
FIGURE 4. Internal growth of a city. (A) internal distribution of population per area of each stripe located at distance L from the center of the city. (B) number of levels per stripe at distance L from the center. (C) internal distribution of the number of occupied sites per area of each stripe located at distance L from the center of the city. (D) comparison between the expected population per site using our equations and the measured data from the GHS [26]. (E) comparison between the average number of levels obtained from our equations and measured LIDAR data for London taken from Copernicus data [27], divided by 3 m as an average floor height. (F) comparison between the expected density of sites and the measured data from the GHS. In blue, real data; in black, the equations derived with this approach.
The correspondence between the distributions obtained using the model and the real ones is fairly strong, indicating that this process could be a valid model for the internal growth of a city. However, Figure 4B shows a strange behaviour at the outskirts of the city, where the height of buildings starts growing again instead of decreasing, which means that there is still room for improvement. This problem arises because the number of sites decreases faster than the population in that range.
The analytical derivations that give rise to the equations portrayed in this work make them exact functional forms of many aspects of the city’s physical environment. This is of extreme importance, since every derivation made from them, and every operation on them, will still represent what they are meant to convey. As is often said, we stand on the shoulders of giants: approximate equations for similar quantities have been presented before in the literature, and while these shed light on many issues, they are of limited applicability because of their approximate nature.
I believe that following this text, new ideas will become easier to test and derive, aiding the process of solving the puzzle of cities.
This work portrays the equations for fractal dimension, population, area, length of roads, different densities of population, average and maximum heights (levels) for a city, and interactions (or GDP). Moreover, it shows how using this framework we can study the internal distributions of those same variables within the city.
The article uses population data from the Global Human Settlement Layer (GHS) [26], height data from the Copernicus satellite LIDAR data [27], height and road data from OpenStreetMap [28] and GDP data from Eurostat [29].
Data Availability Statement
The datasets used in this work fall under the umbrella of open data and are available at their respective websites as referenced in the bibliography.
The author confirms being the sole contributor of this work and has approved it for publication.
Conflict of Interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
3. Murcio R, Masucci AP, Arcaute E, Batty M. Multifractal to Monofractal Evolution of the London Street Network. Phys Rev E Stat Nonlin Soft Matter Phys (2015) 92(6):062130. doi:10.1103/PhysRevE.92.062130
26. European Commission. Global Human Settlement Layer. Population Grid, European Commission (2015). Available at: http://ghsl.jrc.ec.europa.eu/ghs_pop.php (Accessed September 2019).
27. European Commission. Copernicus Urban Atlas (2012). Available at: https://land.copernicus.eu/local/urban-atlas/building-height-2012 (Accessed September 2019).
28. Planet OSM. OpenStreetMap Contributors, Planet Dump (2017). Available at: https://planet.osm.org, https://www.openstreetmap.org (Accessed September 2019).
29. Eurostat. Eurostat GDP Data at NUTS-3 Level (2017). Available at: https://ec.europa.eu/eurostat/web/rural-development/data (Accessed September 2019).
Keywords: fractal theory, urban growth, urban science, scaling theory, complexity science
Citation: Molinero C (2022) A Fractal Theory of Urban Growth. Front. Phys. 10:861678. doi: 10.3389/fphy.2022.861678
Received: 25 January 2022; Accepted: 21 April 2022;
Published: 09 June 2022.
Edited by: Haroldo V. Ribeiro, State University of Maringá, Brazil
Reviewed by: Luiz G. A. Alves, Northwestern University, United States
Satyam Mukherjee, Shiv Nadar University, Greater Noida, India
Copyright © 2022 Molinero. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: C. Molinero, email@example.com