Dynamics of Racial Residential Segregation and Gentrification in New York City

Racial residential segregation is interconnected with several other phenomena, such as income inequality, property value inequality, and racial disparities in health and education. Furthermore, recent literature points to gentrification as a cause of the perpetuation or increase of racial residential segregation in some American cities. In this paper, we analyze the dynamics of racial residential segregation for white, black, Asian, and Hispanic citizens in New York City in 1990, 2000, and 2010. We observe that segregation between white and Hispanic citizens and between white and Asian citizens has grown, while segregation between white and black citizens has remained relatively stable. Furthermore, we analyze the per capita income and the Gini coefficient in each segregated zone, showing that the highest inequalities occur in the zones where the high-density zones of a pair of races overlap. Focusing on the change in population density across the city during these 20 years, and analyzing the segregation between white and black people, we find that a positive flux of white (black) people is associated with a substantial increase (decrease) in property values compared with the city mean. Furthermore, by clustering the region with a high density of black citizens, we measure the variation in area and the displacement of the four most significant clusters from 1990 to 2010. The large displacements (≈ 1.6 km) observed for two of these clusters, namely one in the neighborhood of Harlem and the other inside the borough of Brooklyn, led to the emergence of typically gentrified regions.


INTRODUCTION
Although it is not a recent phenomenon, racial residential segregation (RRS) continues to permeate metropolitan areas in the United States and is still an object of study for scientists from different fields. The decrease of RRS in American cities is controversial and varies drastically from one city to another. Furthermore, it shows different trends depending on the race analyzed. For example, several studies show that segregation between white and black citizens has decreased in the last 50 years [2,6,10,15]. In contrast, segregation between white and Hispanic citizens and between white and Asian citizens has increased [10,15].
Several indexes have been developed to quantify RRS [1, 3-5, 14, 16, 19, 23-25]. The first, and still the most widely used, is the dissimilarity index created by Duncan and Duncan in 1955 [25]. Subsequently, in 1988, Massey and Denton [23] defined five distinct axes of measurement of residential segregation: evenness, exposure, concentration, centralization, and clustering. The authors argued that, to fully analyze residential segregation, at least five indexes, one for each of these spatial dimensions, are necessary. Later, in 2004, Reardon and O'Sullivan developed several measures of multigroup segregation, among which they consider the Information Theory Index the most conceptually and mathematically satisfactory measure of residential segregation [16].
RRS is both a cause and an effect of several inequalities. Studies show relations between racial segregation and income inequality [11] as well as property value inequality. Furthermore, RRS causes racial disparities in health and education [11,17,20,21]. In New York City, for instance, the mortality rates of black citizens vary substantially by locality according to the pattern of racial segregation [21].
In recent years, some researchers have also suggested that gentrification is a cause of the perpetuation, or even the increase, of RRS [26][27][28][29]. Gentrification is defined by The Encyclopedia of Housing [30,31] as "the process by which central urban neighborhoods that have undergone disinvestment and economic decline experience a reversal, reinvestment, and the in-migration of a relatively well-off, middle and upper middle-class population."
The main reason for pointing to gentrification as a cause of the perpetuation of racial segregation is the presumed displacement of the low-income class, in many cases predominantly black and/or Hispanic citizens, from their native neighborhoods during the gentrification process [26,29,30,32,33]. Taking the example of New York City once again, there is an intense debate about the gentrification of regions inside the neighborhood of Harlem and the borough of Brooklyn [34][35][36]. It is important to note that gentrification is a socio-economic phenomenon whose positive and negative impacts are still under discussion in the scientific community. It may therefore present more nuanced outcomes, and may even lead to opportunity benefits, reduce out-migration pressure, and promote long-term affordability [37].
This paper aims to study the dynamics of RRS in New York City from 1990 to 2010. Here, we develop a method able both to measure RRS and to delimit the segregated zones. Unlike previous measures [38], our approach provides a clusterization inside high-density areas, allowing a detailed description of the dynamics of such clusters.
Within the limits of the segregated zones, we analyze the per capita income in each high-density zone of population (defined by race) and in the zones of overlap between them. To quantify income inequality, we calculate the Gini coefficient in each location. Then, we study the variation of the per capita income and of the property values for the census tracts that change zone during these 20 years. Finally, we focus on the segregation between white and black citizens. In particular, we use a simplified version of the City Clustering Algorithm (CCA) [39][40][41][42][43][44][45][46][47] to cluster the high-density zone of black citizens and measure the displacement and area of the four most significant clusters. One of these clusters includes the neighborhood of Harlem, and another is inside the borough of Brooklyn.
The paper is structured as follows: first, we introduce our method. Then, we show the results of the application of the technique to New York City. After discussing the results, we finally present our conclusions.

METHODS
The method consists of the following steps: first, we define the limits of the city using the City Clustering Algorithm (CCA) [39][40][41][42][43][44][45][46][47]. Second, we find the high-density zones for white, black, Asian, and Hispanic citizens. Finally, we measure the RRS through the Overlap Coefficient.
The CCA is an algorithm introduced to define the boundaries of metropolitan areas [39][40][41][42][43][44][45][46][47]. Its result depends on two parameters: a population density threshold $D^*$ (in people/km²) and a cutoff length $\ell$ (in km). The elementary population data are provided at the census tract level, where tracts are geographic regions defined by the United States Census Bureau [48] (see Supplementary Appendix A for more information about the database). From the total population and area of a given tract $i$, we calculate its population density $D_i$. Following the CCA, we assume that only the tracts with $D_i > D^*$ are populated.
The next step of the algorithm is clusterization, in which we define the urban center. For each populated tract, we draw a circle of radius $\ell$ centered at the centroid of the tract. All populated tracts whose centroids lie inside the circle belong to the same cluster and, therefore, to the same city. We choose the parameters $D^*$ and $\ell$ respecting the isometry between the area and the population of the cities [39][40][41]. We apply the algorithm to the entire country and subsequently extract only the cluster corresponding to New York City.
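The two CCA steps described above (density thresholding followed by linking centroids closer than the cutoff length) can be illustrated with a minimal sketch. It assumes planar centroid coordinates in km, treats "inside the circle" as a distance less than or equal to the cutoff, and uses hypothetical field names (`x`, `y`, `population`, `area`) and toy values; it is not the implementation used in the paper.

```python
import math
from collections import deque


def cca_clusters(tracts, d_star, ell):
    """Cluster populated tracts; returns sorted lists of tract indices."""
    # Step 1: keep only tracts whose population density exceeds D*.
    populated = [i for i, t in enumerate(tracts)
                 if t["population"] / t["area"] > d_star]
    unvisited = set(populated)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        # Step 2: breadth-first search linking centroids within distance ell.
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.hypot(tracts[i]["x"] - tracts[j]["x"],
                                  tracts[i]["y"] - tracts[j]["y"]) <= ell]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical toy data: centroids in km, areas in km^2.
toy_tracts = [
    {"x": 0.0, "y": 0.0, "population": 10000, "area": 1.0},
    {"x": 2.0, "y": 0.0, "population": 6000, "area": 1.0},
    {"x": 4.0, "y": 0.0, "population": 100, "area": 1.0},    # below D*
    {"x": 10.0, "y": 0.0, "population": 9000, "area": 1.0},  # isolated
]
clusters = cca_clusters(toy_tracts, d_star=4560.0, ell=3.0)
print(clusters)
```

In this toy case the first two tracts are merged into one cluster, the sparse tract is discarded, and the distant tract forms its own cluster. The pairwise distance search is quadratic in the number of tracts, so a spatial index would likely be preferable for country-scale data.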
Using the CCA to define the urban area of New York City is important because RRS depends strongly on the definition of urban areas [2,3,5]. For example, it was shown in [39,41] that the Metropolitan Statistical Areas (MSA) have large inhabited regions. Instead, our research aims to analyze RRS in a very dense urban area, specifically New York City.
We define the high-density (HD) zones as the sets of tracts inside the city with a high population density of a specific race. The HD zone of a race $r$ is defined by applying a density threshold $D_r^*$: we consider the tracts with $D_r > D_r^*$ as being populated by race $r$, where $D_r$ is the population density of race $r$ in the tract. We choose the threshold $D_r^*$ by analyzing, for each race $r$, the parameter $p_r$, defined as the ratio between the population of race $r$ inside its HD zone and the total population of race $r$ in the whole city. To make the analysis as uniform as possible, we choose $D_r^*$ so that both $D_r^*$ and $p_r$ take similar values for all considered races $r$. Figure 1 shows the parameter $p_r$ as a function of $D_r^*$ for each race in New York City. Three cases are of interest: $D_r^*$ close to 0, very large values of $D_r^*$, and the value of $D_r^*$ for which the HD zone contains close to 80% of the total population of a given race. The first two correspond to an HD zone containing all of the population and none of the population, respectively. In Figure 1, the dashed line highlights the value $p_r = 0.8$, providing the value of $D_r^*$ for which the HD zone contains 80% of the total population of each race. The parameter $p_r$ has been tested in the interval from 0.7 to 0.9 without finding any discrepancies in the results. Therefore, at the end of this step, the method provides well-defined geographic limits of the HD zones for each race. In this work, the threshold density is set to $D^* = 4560$ people/km² and the cutoff length is set to $\ell = 3$ km, in accordance with the methodology presented in Refs. [39][40][41][42][43][44][45][46][47].
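The choice of the race-specific threshold can be sketched as a simple search: for each candidate threshold, compute the fraction of the race's population living in tracts above that density, and keep the candidate whose fraction is closest to the target of 0.8. The function names and the toy tract list below are hypothetical, and scanning only the observed densities is an assumption made for brevity.

```python
def fraction_in_hd_zone(tracts, threshold):
    """Fraction p_r of a race's population in tracts with density > threshold.

    tracts: list of (race_population, area_km2) pairs, one per tract.
    """
    total = sum(pop for pop, _ in tracts)
    inside = sum(pop for pop, area in tracts if pop / area > threshold)
    return inside / total


def choose_threshold(tracts, target=0.8):
    """Candidate threshold whose p_r is closest to the target fraction."""
    # Assumption: candidates are 0 plus the observed tract densities.
    candidates = sorted({0.0} | {pop / area for pop, area in tracts})
    return min(candidates,
               key=lambda d: abs(fraction_in_hd_zone(tracts, d) - target))


# Hypothetical toy data for one race: (population, area in km^2).
toy = [(500, 1.0), (300, 1.0), (150, 1.0), (50, 1.0)]
d_r_star = choose_threshold(toy)
print(d_r_star, fraction_in_hd_zone(toy, d_r_star))
```

Here a threshold of 150 people/km² keeps the two densest tracts, which hold 80% of the toy population. A threshold near 0 returns a fraction of 1, and a very large threshold returns 0, matching the two limiting cases discussed above.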
Here we quantify the RRS between two races, through their two HD zones, in terms of the Overlap coefficient (or Szymkiewicz-Simpson coefficient [49]), defined as
$$O_{rr'} = \frac{|X_r \cap X_{r'}|}{\min\left(|X_r|, |X_{r'}|\right)}, \tag{1}$$
where $X_r$ and $X_{r'}$ are the HD zones of races $r$ and $r'$, respectively, and $|\cdot|$ denotes the area of a zone. The coefficient $O_{rr'}$ is the area shared by the HD zone of race $r$ and the HD zone of race $r'$ divided by the smaller of the two zone areas. Therefore, the overlap coefficient is a real number between 0 and 1. When it is close to 0 (low overlap), the coefficient indicates high segregation, while when it is close to 1 (high overlap), it indicates low segregation. As defined, the overlap coefficient is a measure between a pair of races and may be considered a two-fold measure. Here we choose this one-versus-one strategy [50] to investigate the changing dynamics between all pairs of races. In what follows, we show that this is an efficient strategy to study the multiracial dynamics of the system.
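As an illustration of the Overlap coefficient (the shared area of two HD zones divided by the smaller of the two zone areas), the sketch below represents each HD zone as a set of tract identifiers, so that the shared area is the summed area of the tracts present in both sets. The identifiers and areas are hypothetical.

```python
def overlap_coefficient(zone_a, zone_b, tract_area):
    """Shared area of two HD zones divided by the smaller zone area."""
    shared = sum(tract_area[t] for t in zone_a & zone_b)
    area_a = sum(tract_area[t] for t in zone_a)
    area_b = sum(tract_area[t] for t in zone_b)
    return shared / min(area_a, area_b)


# Hypothetical tract areas in km^2 and two toy HD zones.
tract_area = {"t1": 2.0, "t2": 1.0, "t3": 3.0, "t4": 1.0}
hd_white = {"t1", "t2", "t3"}   # total area 6 km^2
hd_black = {"t3", "t4"}         # total area 4 km^2
print(overlap_coefficient(hd_white, hd_black, tract_area))  # 3 / 4 = 0.75
```

Because the denominator is the smaller zone, a small zone fully contained in a larger one gives a coefficient of 1, while disjoint zones give 0.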

RESULTS
First, we define the limits of New York City by applying the CCA to the population data for 2010 (see Supplementary Appendix A for more details about the data). Then, we calculate the HD zones for white, black, Asian, and Hispanic citizens for 1990, 2000, and 2010. In Figure 2, we show the HD zones for white and black citizens with the respective Overlap zone in the year 2010. Table 1 shows each race's population and population density for the years 1990, 2000, and 2010 in the New York City area defined by the CCA. For each pair of races, we calculate the Overlap coefficient. Table 2 shows that the segregation between white and black citizens and between black and Asian citizens remains relatively stable during the time interval. While segregation between white and Hispanic, white and Asian, and Hispanic and Asian citizens has increased, the segregation between black and Hispanic citizens has decreased. Black citizens are frequently the most segregated, having a high overlap coefficient only with Hispanic citizens.
Frontiers in Physics | www.frontiersin.org February 2022 | Volume 9 | Article 777761
After defining the HD zones and the Overlap zones, we calculate the average per capita income of each race inside each zone for 1990, 2000, and 2010. Figure 3 presents these results. White citizens earn more than all the other races in all zones, except in those where there is segregation between white and Asian citizens. Black and Hispanic citizens earn less than white citizens in all zones. Moreover, Figure 3 shows that income inequality between white and black citizens is more significant in the Overlap zone than in the exclusively white and exclusively black zones. In order to identify the per capita income inequalities in the segregation between pairs of races (white and black, white and Hispanic, and white and Asian), we calculate the Gini coefficient [51] for each case (see Figure 4). The Gini coefficient varies from 0 to 1. When it is close to 0, there is no inequality, while when it is close to 1, inequality is maximal [51]. Figure 4 shows that, in all cases, inequality is more significant in the Overlap zones, in favor of white citizens.
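For reference, a minimal sketch of a Gini coefficient computation is given below. It uses the standard rank-based formula on an unweighted list of incomes; whether the analysis in this paper weights the values (for example by population) is not stated, so the unweighted form is an assumption, and the input values are purely illustrative.

```python
def gini(incomes):
    """Unweighted Gini coefficient of a list of non-negative incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))    # identical incomes: no inequality
print(gini([0, 0, 0, 10]))   # one earner holds everything
```

For a finite sample of size n, the maximum of this estimator is (n - 1)/n rather than exactly 1, which is why the second toy example gives 0.75.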
We also analyze the tracts that migrated from one zone to another from 1990 to 2010 for the segregation between white and black citizens (Figure 5), white and Asian citizens (Figure 6), and white and Hispanic citizens (Figure 7). The colors in the maps in Figures 5-7 show the possible migrations of the tracts from one zone to another. For each case, we calculate the average variation of the per capita income, $\Delta I$, and the average variation of the property values, $\Delta H$ (Eqs. 2 and 3), where the differences $\Delta I$ and $\Delta H$ are calculated between two given years $y_1$ and $y_2$, $N$ is the number of tracts of the analyzed pair of races, $\delta I_i$ and $\delta H_i$ are the variations of the per capita income and property values of tract $i$, respectively, and $\overline{\delta I} = \sum_{i=1}^{N} (I_i^{y_2} - I_i^{y_1})/N$ and $\overline{\delta H} = \sum_{i=1}^{N} (H_i^{y_2} - H_i^{y_1})/N$. Therefore, a positive $\Delta I$ or $\Delta H$ means growth higher than the city mean, while a negative $\Delta I$ or $\Delta H$ means growth lower than the city mean.
At this point, let us focus on the segregation between white and black citizens and on the flux of people from 1990 to 2010 inside the tracts that migrated from one zone to another or to the Overlap zone. The flux of people of a race $X$ inside a tract $i$ corresponds to the variation of the population of that race inside the tract compared with its mean variation in the whole city. The average flux $\Delta \mathrm{Flux}_X$ is defined analogously to Eqs. 2 and 3, with $\overline{\delta \mathrm{Flux}_X}$ denoting the mean flux of race $X$ in the whole city. Still focusing on the segregation between white and black citizens, we show in Figure 8 the variation of income, the variation of property values, and the flux of people in the tracts that change zone between 1990 and 2010.
For the tracts presented in Figure 8, we compare in Figure 9 the variation of the flux of white and black citizens with the variation of the property values. It shows that where the flux of white citizens is positive on average, the property values increase more than the mean. On the other hand, where the flux of black citizens is negative on average, the property values decrease more than the mean.
Next, we investigate the displacement of black citizens in New York City. We divide the HD black zone into clusters by using a simplified version of the CCA: we ignore the threshold $D^*$ and apply only a cutoff length $\ell'$. The parameter $\ell'$ is chosen by analyzing the distribution of tract areas. We represent each tract as a circle with the same area, which gives a mean radius of $r = 1.3$ km. Therefore, to consider two neighboring tracts as part of the same cluster, we use $\ell' = 1.5$ km. The results of the clusterization are shown for the years 1990 and 2010 in Figure 10. For the four most significant clusters (A, B, C, and D), Table 3 shows the area of each for the years 1990 and 2010. Also presented are the displacements of the clusters' centroids, highlighting the fact that clusters A and C have a displacement about three times larger than clusters B and D. In Figure 11, we show the displacement of clusters A and C from 1990 to 2010. Cluster A includes a region in the neighborhood of Harlem, while cluster C is located inside the borough of Brooklyn. Also shown in Figure 11 is the variation of the per capita income $\Delta I$ for the tracts that change zone in the analyzed period.
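Two quantities in this step lend themselves to a short sketch: the radius of the circle with the same area as a tract, and the displacement of a cluster centroid between two years. The sketch assumes planar coordinates in km and an area-weighted centroid over the tract centroids; the centroid definition used in the paper is not specified, so that weighting is an assumption, and all values are hypothetical.

```python
import math


def equivalent_radius(area_km2):
    """Radius (km) of the circle whose area equals the tract area."""
    return math.sqrt(area_km2 / math.pi)


def cluster_centroid(tracts):
    """Area-weighted centroid of a cluster given (x, y, area) per tract."""
    total = sum(area for _, _, area in tracts)
    cx = sum(x * area for x, _, area in tracts) / total
    cy = sum(y * area for _, y, area in tracts) / total
    return cx, cy


def displacement(centroid_a, centroid_b):
    """Euclidean distance (km) between two centroids."""
    return math.hypot(centroid_b[0] - centroid_a[0],
                      centroid_b[1] - centroid_a[1])


# Hypothetical cluster membership in two census years: (x km, y km, area km^2).
cluster_1990 = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
cluster_2010 = [(1.0, 0.0, 1.0), (4.0, 0.0, 1.0)]
shift = displacement(cluster_centroid(cluster_1990),
                     cluster_centroid(cluster_2010))
print(shift)
```

With real tract data, geographic coordinates would first need to be projected to a planar system so that distances are expressed in km.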

Comparison With the Dissimilarity Index
In order to verify the robustness of our method, we compare the Overlap coefficient defined in Eq. 1 with the dissimilarity index [25]:
$$D_{rr'} = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{r_i}{R} - \frac{r'_i}{R'} \right|,$$
where $r_i$ is the population of race $r$ in tract $i$ and $r'_i$ is the population of race $r'$ in the same tract. $R$ and $R'$ are the total populations of races $r$ and $r'$ in the whole city, respectively, and the city is defined using the CCA. $N$ is the number of tracts that belong to New York City. The value of $D_{rr'}$ indicates the fraction of one of the two populations that has to move in order to eliminate segregation. Specifically, if it is close to 1, RRS is high, while no segregation is detected if $D_{rr'} = 0$ [25]. The results for New York City are presented in Table 4.
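The dissimilarity index is half the sum, over all tracts, of the absolute difference between each tract's share of one race's total population and its share of the other race's total population. A minimal sketch with hypothetical tract counts follows.

```python
def dissimilarity_index(pop_r, pop_s):
    """Dissimilarity index between two races.

    pop_r, pop_s: per-tract populations of the two races, in the same
    tract order.
    """
    total_r, total_s = sum(pop_r), sum(pop_s)
    return 0.5 * sum(abs(a / total_r - b / total_s)
                     for a, b in zip(pop_r, pop_s))


# Hypothetical two-tract cities.
print(dissimilarity_index([100, 0], [0, 50]))    # fully separated
print(dissimilarity_index([60, 40], [30, 20]))   # identical distributions
```

The first toy city, where the two races never share a tract, gives the maximum value of 1, and the second, where both races are distributed in the same proportions, gives 0.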
To analyze the correlation between the two indices, we plot in Figure 12 the dissimilarity indexes $D_{rr'}$ found for New York City as a function of their respective Overlap coefficients $O_{rr'}$ (where $X_r$ is the HD zone of race $r$ and $X_{r'}$ that of race $r'$). As depicted, the least-squares fit of a linear relation, $D_{rr'} = m\,O_{rr'} + b$, to the data points gives $m = -0.57 \pm 0.01$ and $b = 0.90 \pm 0.01$, with a Pearson correlation coefficient of $\rho = -0.96$. Note that, even though the indices exhibit a strong correlation, they provide essentially different information. While $D_{rr'}$ refers to the absolute difference between the fractions of races $r$ and $r'$, the $O_{rr'}$ index measures the overlap area occupied by the races, as defined by the CCA. Furthermore, since $O_{rr'}$ makes use of the HD zones, it also provides an overview of the geographical information of segregated/integrated areas and their dynamics over time.
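The kind of comparison described above, an ordinary least-squares line together with a Pearson correlation coefficient, can be sketched as follows. The input values are synthetic points lying exactly on a line, chosen only to make the behavior easy to check; they are not the values of Tables 2 and 4, and the function name is hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson correlation of (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rho = sxy / math.sqrt(sxx * syy)
    return slope, intercept, rho


# Synthetic data on the line D = 0.9 - 0.5 * O (not the paper's data).
overlap = [0.1, 0.3, 0.5, 0.7]
dissim = [0.9 - 0.5 * o for o in overlap]
m, b, rho = linear_fit(overlap, dissim)
print(m, b, rho)
```

A negative slope and a negative correlation are expected in this kind of comparison, since a high overlap indicates low segregation while a high dissimilarity index indicates high segregation.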

DISCUSSION
We developed a new method to measure and define the topography of RRS and applied it to the metropolitan area of New York City for 1990, 2000, and 2010. Even though several studies show that, on average, segregation between white and black citizens in the United States has decreased in the last fifty years [2,6,10,15], our results for the overlap index show that it has remained relatively stable during the time interval 1990-2010 in the metropolitan area of New York City. For the dissimilarity index, we found a slight decrease in segregation between white and black citizens. The same pattern can be identified for black and Asian citizens. In contrast, segregation between white and Asian citizens and between Hispanic and Asian citizens has grown according to both indexes. Additionally, while the segregation between white and Hispanic citizens increased according to the overlap index, it slightly decreased according to the dissimilarity index. Compared to 1990, a decrease in segregation in 2010 could only be observed between black and Hispanic citizens.
The choice of New York City to test the CCA-based mapping of geographical racial communities and the effectiveness of the Overlap index is justified by the richness of the scientific literature on this topic for the city. The behavior of the present analysis in other regions of the world is a topic for future work. The Overlap index, as presented, has some limitations, since it only considers two racial groups at a time. A possible generalization would be to extend it to a multi-group analysis. This generalization could result in losing part of the information that the metric provides to the dynamical analysis presented here, and it is therefore also left for future work.
By analyzing the per capita income, we observe that white citizens earn more than the other races in all the regions, except when analyzing the segregation between white and Asian citizens, since Asian citizens have a similar income to white citizens. Regarding the segregation between white and black citizens, we verify that black citizens earn less than white citizens in all the regions. Furthermore, the inequality between white and black citizens is more significant in the areas with a high population density of both races. The Gini coefficient confirms this result: it is higher in regions with a high population density of two or more races.
Furthermore, we study the segregation between white and black citizens and the segregation between white and Hispanic citizens. We analyze the tracts that, from 1990 to 2010, changed from regions with a high density of black citizens, Hispanic citizens, or overlap with white citizens to a region where white citizens are the only ones to present a high population density. Our results indicate that the per capita income and property values increased more than the city average in this region. Conversely, in the tracts that migrated from a region of overlap to a region with a high population density of only black or Hispanic citizens, the per capita income and the property values increased less than the average.
Focusing on the segregation between white and black citizens, we analyze the flux of white and black citizens as a function of the variation of the property values. Where the flux of white citizens is positive, the real estate property values increased more than the city average, while, where the flux of black citizens is positive, the property values increased less than the city average. Therefore, our analysis suggests that the influx of ethnic groups with higher income increases the overall cost of living in a given area, displacing the lower-income population. Although this is the standard explanation for the gentrification phenomenon, the present work provides a technical and deterministic method of diagnosing it in real-world data sets. We hope that this may help to better understand, through agent-based modeling [52,53,54], the role of the forces that create the patterns of city segregation.
Previous studies [34][35][36] questioned the effects of gentrification in the neighborhood of Harlem and the borough of Brooklyn. Here, by clustering the region with a high density of black citizens, we calculated the displacements of the clusters that include an area inside the neighborhood of Harlem and an area inside the borough of Brooklyn to be 1.55 km and 1.57 km, respectively, in 20 years. These results confirm in a quantitative way the displacement of black citizens in Harlem and the borough of Brooklyn.

DATA AVAILABILITY STATEMENT
All the data used in this paper were extracted from the National Historical Geographic Information System (NHGIS) [55]. The platform provides population, housing, agricultural, and economic data with GIS-compatible boundary files for geographic units in the United States from 1790 to the present. From the platform, we extracted population data by race, per capita income data, and the number of owner-occupied housing units by value.