A Synoptic Framework for Forecasting the Urban Rainfall Effect Using Composite and K-Means Cluster Analyses

Observational and numerical modeling studies continue to affirm the existence of the “urban rainfall effect” (URE), a discernible anomaly in warm season precipitation due to urbanization. However, the literature has made little progress toward the predictability of the URE. Atlanta, Georgia appears consistently in the literature because of its well-studied urban rainfall anomalies. Using the Multi-sensor Precipitation Estimates (MPE) dataset and the ERA-Interim reanalysis dataset, an 18-year period (2002–2019) is examined. Three similar but distinct methods are used to define urban rainfall days (URDs), or periods when the precipitation in the urbanized areas of Atlanta is greater than that in the surrounding rural areas. A combination of compositing, wind rose, and k-means cluster analyses is employed to extract the synoptic framework supportive of the URE in Atlanta, Georgia. The synoptic-scale compositing analysis reveals a consistent set of meteorological ingredients needed to produce a URD, including weaker-than-average southwesterly-to-northwesterly flow at 700 hPa, copious moisture throughout the tropospheric column, and background low-level convergent flow. Composite atmospheric soundings reveal enhanced moisture throughout the tropospheric column on URDs, leading to marginal instability that favors localized convection across the Atlanta metropolitan area. The study also clarifies how often the URE is present (roughly 8% of the time) during warm season days across the Atlanta metropolitan area. Taken together, this synoptic framework will aid in forecasting the URE in Atlanta and can be readily applied to other cities.


INTRODUCTION
More than half of the global population lives in urban spaces, and that number may exceed 65% by 2025 (Shepherd et al., 2013). The literature chronicles how urban environments modify weather and hydroclimate (Seto and Shepherd, 2009; Debbage and Shepherd, 2019). As with urban heat islands, urban effects on the hydroclimate are increasingly well understood. For decades, a scholarly question loomed regarding the role of urbanization in precipitation processes (Landsberg, 1956; Shepherd et al., 2002; Mitra and Shepherd, 2016). Investigators have employed field campaigns (Shepherd et al., 2013), remote sensing platforms and long-term observations (Shepherd et al., 2002; Mote et al., 2007; Jin and Shepherd, 2008; Mitra et al., 2012; McLeod et al., 2017; Johnson and Shepherd, 2018), and numerical weather models (Schmid and Niyogi, 2013; Debbage and Shepherd, 2019) to answer two fundamental questions: Can cities, via urban landscapes and/or associated aerosol processes, initiate or modify precipitating cloud systems? If so, what processes are of first-order significance in urban-influenced precipitating storms? Liu and Niyogi (2019) recently established, with some degree of conclusiveness, that urbanization affects rainfall processes. Key findings from their meta-analysis established or confirmed several long-standing issues within the urban hydrometeorological community. While historical studies from the METROMEX (Metropolitan Meteorological Experiment) era argued that urban effects on rainfall occur primarily downwind of the city, their analysis revealed that urbanization modifies rainfall in distinct spatial patterns around a city's central business district: by about 18% downwind, 16% over the city, 2% on the left, and 4% on the right relative to storm direction. This is consistent with work by other investigators (e.g., Bentley et al., 2012) and is an important step in establishing what Shepherd et al. (2010) called the "urban rainfall effect," or URE. The URE is a broad term that captures the various ways in which the urban environment initiates or modifies precipitation processes.
While the concept of the URE has been established for many cities using a variety of methodological approaches, the predictability of urban-generated rainfall remains largely unexplored. Previous studies (e.g., Bentley et al., 2012) have shown that the URE occurs most frequently during periods of weak synoptic flow, and McLeod et al. (2017) revealed that the magnitude and spatial pattern of urban-induced rainfall around Atlanta, Georgia depend on the synoptic flow regime. In addition, Dixon and Mote (2003) showed that low-level moisture, rather than the intensity of the urban heat island, was the most significant predictor of urban-generated convection for Atlanta. However, little effort has focused on identifying which meteorological variables in the synoptic-scale environment most effectively distinguish days exhibiting the URE. Bentley et al. (2012) revealed that a synoptic environment characterized by moderate thermodynamic instability was most favorable for producing urban convection and rainfall over Atlanta. This research was motivated by the following questions: (1) Can urban-generated rainfall be predicted with satisfactory skill for a city with a strong warm season URE, such as Atlanta, GA? (2) If so, which suite of meteorological variables most effectively distinguishes days with a detectable URE from days lacking a URE signal?
This paper presents what may be the first attempt at establishing a framework for predicting the URE. Building upon the work of McLeod et al. (2017), we use the city of Atlanta as a testbed for the analysis. Section 2 provides an overview of the data and methodology, Section 3 presents the results, and Section 4 summarizes key conclusions and potential pathways forward.

Data
One novel aspect of this study is the use of the Multi-sensor Precipitation Estimates (MPE) dataset to create a database of urban rainfall days (URDs). MPE (Fulton et al., 1998; Seo, 1998; Seo et al., 2010) is a gridded precipitation product that blends Doppler radar estimates with station gauge observations. This gridded dataset offers a more complete representation of precipitation across the study region, which is particularly important given the convective nature of warm season precipitation. The network of station gauges across northern Georgia is not able to adequately capture convective precipitation totals at the local scale, but Doppler radar provides estimated precipitation totals that help fill these coverage gaps. Our analysis spans 2002-2019 (18 years), which reflects the availability of MPE data. The spatial resolution is approximately 4 km × 4 km, and the temporal resolution is a hydrologic day, or the 24-h period extending from 1200 UTC to 1200 UTC. This high spatial resolution is another important reason for using the MPE dataset in this study. It is important to acknowledge that typical inaccuracies associated with Doppler radar (e.g., bright band contamination, Z-R relationships, spatial coverage, precipitation type, and range issues) and station gauges (e.g., wind, siting, and undercatch) will affect the precipitation estimates analyzed in this study (Smith et al., 1996; Sieck et al., 2007; Seo et al., 2010).
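The hydrologic-day convention can be illustrated with a short Python sketch. It assumes (hypothetically) that each precipitation record carries a timezone-aware UTC timestamp marking the start of its accumulation interval, and that the window from 1200 UTC on day D-1 to 1200 UTC on day D is labeled D; the labeling convention actually used in the MPE archive may differ. The function names are illustrative only.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone

def hydrologic_day(start: datetime) -> date:
    """Label of the 1200-1200 UTC hydrologic day containing `start`.

    Assumptions (hypothetical): `start` is the timezone-aware UTC start of an
    accumulation interval, and the window [D-1 1200 UTC, D 1200 UTC) is labeled D.
    """
    if start.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Shifting forward 12 h maps 1200 UTC on D-1 to 0000 UTC on D.
    return (start.astimezone(timezone.utc) + timedelta(hours=12)).date()

def daily_totals(records):
    """Sum (start_time, precip_mm) records into hydrologic-day totals."""
    totals = defaultdict(float)
    for start, precip_mm in records:
        totals[hydrologic_day(start)] += precip_mm
    return dict(totals)

utc = timezone.utc
hourly = [
    (datetime(2019, 7, 1, 11, tzinfo=utc), 2.0),  # last hour of the 1 July day
    (datetime(2019, 7, 1, 12, tzinfo=utc), 5.0),  # first hour of the 2 July day
    (datetime(2019, 7, 2, 3, tzinfo=utc), 1.5),   # still before 1200 UTC, so 2 July
]
print(daily_totals(hourly))
```

Under these assumptions, an hourly total beginning at 1100 UTC on 1 July falls in the 1 July hydrologic day, while one beginning at 1200 UTC starts the 2 July day.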
The ERA-Interim reanalysis dataset (Dee et al., 2011) is employed to construct synoptic-scale composite maps of the following meteorological variables: 500 hPa geopotential heights, 700 hPa vertical velocity, 700 hPa wind speed, 2-m dew point temperature, integrated vapor transport, precipitable water, and 1,000 hPa divergence. This dataset is produced from a 4-dimensional variational (4D-Var) analysis of observations over a 12-h analysis window. The spatial resolution is 0.75° × 0.75° (~80 km × 80 km), and the temporal resolution is 6-hourly (0000, 0600, 1200, 1800 UTC), which was further aggregated to the daily scale for this study. The period of record used for the ERA-Interim reanalysis is the same as that of the MPE dataset.
The ERA-Interim reanalysis is also used to create a daily database of 700 hPa wind speed and direction over the metropolitan area of Atlanta. Shepherd et al. (2002) noted that the 700 hPa level can be used for defining prevailing wind flow in urban climatology studies. The average of four ERA-Interim reanalysis grid cells centered over the Atlanta area (32.75-34.25°N, 83.5-85°W) was used to develop the wind climatology. While the 700 hPa wind climatology could have been developed from atmospheric sounding data collected at nearby Peachtree City, GA (KFFC; located about 25 miles southwest of downtown Atlanta), using the reanalysis data has a few advantages. First, atmospheric sounding data are collected only twice per day (0000 and 1200 UTC), whereas the 6-hourly reanalysis provides a more representative daily average of 700 hPa winds. In addition, the gridded reanalysis data yield a more spatially representative average of the 700 hPa wind field across the Atlanta metropolitan area, whereas the sounding data provide a localized, point-based estimate of 700 hPa winds.
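Because wind direction is a circular quantity, how the four grid cells and four 6-hourly times are averaged matters. The text does not specify the procedure, so the Python sketch below assumes vector averaging: the u and v components are averaged first and then converted to speed (in knots, matching the wind rose bins) and meteorological direction (the direction the wind blows from). The 8-sector binning is likewise an illustrative assumption about how wind roses can be built, and all names are hypothetical.

```python
import math

KNOTS_PER_MS = 1.0 / 0.514444  # 1 knot = 0.514444 m/s
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def daily_wind(u_samples, v_samples):
    """Vector-average u/v samples (m/s); return (speed_kt, direction_deg).

    Assumption: all grid-cell/time samples are weighted equally.
    """
    if not u_samples or len(u_samples) != len(v_samples):
        raise ValueError("need equal, non-empty u and v samples")
    u = sum(u_samples) / len(u_samples)
    v = sum(v_samples) / len(v_samples)
    speed_kt = math.hypot(u, v) * KNOTS_PER_MS
    # Meteorological convention: direction the wind blows FROM, clockwise from north.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed_kt, direction

def sector(direction_deg):
    """Assign a direction to one of eight 45-degree sectors centered on N, NE, ..."""
    return SECTORS[int(((direction_deg + 22.5) % 360.0) // 45.0)]

# Hypothetical day: 4 grid cells x 4 six-hourly times, all blowing from the southwest.
speed, direction = daily_wind([3.0] * 16, [3.0] * 16)
print(round(speed, 1), round(direction), sector(direction))  # 8.2 225 SW
```

Averaging components rather than directions avoids the wraparound problem: an arithmetic mean of 350° and 10° gives 180°, whereas the vector mean correctly points north.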
The third key dataset used in this study is the archive of atmospheric soundings provided by the University of Wyoming's Department of Atmospheric Science (University of Wyoming, 2020). For the 18-year period from 2002 to 2019, sounding data were gathered at KFFC for every day during meteorological summer (JJA). Only soundings launched at 0000 UTC were collected, since the focus is on the most convectively active portion of the day. Because each sounding reports a different number of observations at unique pressure levels, linear interpolation was applied to the pressure, air temperature, dew point temperature, and wind component profiles so that the soundings could be combined on common levels. Composite soundings were then constructed from these interpolated values, similar to Schroeder et al. (2016).
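A minimal Python sketch of the interpolate-then-average step follows. It assumes interpolation that is linear in pressure onto a fixed set of target levels (interpolating in the logarithm of pressure is a common alternative), with levels outside a sounding's observed range left missing; the target levels, function names, and sample values are hypothetical.

```python
def interp_profile(pressures, values, target_levels):
    """Linearly interpolate one sounding variable onto target pressure levels.

    `pressures` must be strictly decreasing (surface upward), as in a raw sounding.
    Target levels outside the observed range are returned as None (missing).
    """
    out = []
    for p in target_levels:
        result = None
        for i in range(len(pressures) - 1):
            p_below, p_above = pressures[i], pressures[i + 1]
            if p_above <= p <= p_below:
                weight = (p_below - p) / (p_below - p_above)
                result = values[i] + weight * (values[i + 1] - values[i])
                break
        out.append(result)
    return out

def composite(profiles):
    """Average interpolated profiles level by level, ignoring missing values."""
    means = []
    for level_values in zip(*profiles):
        present = [x for x in level_values if x is not None]
        means.append(sum(present) / len(present) if present else None)
    return means

# Hypothetical 0000 UTC temperature profiles (hPa, deg C) with different reported levels.
levels = [1000, 925, 850, 700]
day1 = interp_profile([1000, 900, 800, 650], [30.0, 24.0, 18.0, 6.0], levels)
day2 = interp_profile([990, 850, 700], [29.0, 19.0, 8.0], levels)
print(composite([day1, day2]))
```

The same two functions would be applied separately to dew point temperature and to each wind component.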

Methods
The analysis in this study was restricted to the core of the convective season, which we define as meteorological summer (JJA). Previous scholars (e.g., Hand and Shepherd, 2009; Mote et al., 2007) have noted that there is less large-scale atmospheric forcing and more locally-forced convective activity during this period. Similar to Hand and Shepherd (2009) and McLeod et al. (2017), a geographic framework of nine grid cells centered on the Atlanta metropolitan area was adopted. Figure 1 shows the average daily precipitation across the Atlanta metropolitan area from 2002 to 2019, with the grid cell framework overlaid. Similar to Figure 3 in McLeod et al. (2017), the urban enhancement of summer precipitation over and downwind of Atlanta can be readily observed in Figure 1A.
Climatologically, the entire 9-cell region lies within a maritime tropical regime, so the enhancement of warm season rainfall (yellow, orange, and red shading) over and to the east of the central business district of Atlanta is indicative of the URE and consistent with previous literature (Shepherd et al., 2002; Mote et al., 2007). Figure 1B shows the average daily precipitation during summer over a broader region of the central and eastern United States. The urban enhancement of precipitation is still clearly visible over and downwind of the Atlanta metropolitan area. These figures serve as the foundation for all analyses conducted in this study. Recent work is increasingly refining what "downwind" means in this context. Earlier studies from the METROMEX era (see Shepherd et al., 2013 for a review) and into the early 2000s (Shepherd et al., 2002) assumed a "fixed" downwind region based on climatology. McLeod et al. (2017) and Schmid and Niyogi (2013) found that the URE is strongly a function of the prevailing wind regime, such that the location of the "downwind" enhancement may shift around the urban area.
For each day during the study period, the average precipitation amount was computed for each of the nine grid cells centered on the Atlanta metropolitan area. To define a URD, two methods were adapted from previous studies and one novel method was devised. Three methods were used to account for sensitivity in selecting URDs, with the goal of creating a more robust, comprehensive definition of a URD. The first method was adapted from Debbage and Shepherd (2015), who defined the intensity of an urban heat island as the minimum temperature averaged over the urban area minus the minimum temperature averaged over the rural area. Similarly, we compute the average precipitation over the central urban grid cell minus the precipitation averaged across the remaining 8 "rural" grid cells. A URD is observed when this difference is positive, that is, when the average precipitation over the central urban grid cell exceeds the average precipitation across the remaining 8 "rural" grid cells. The second method was adapted from Mote et al. (2007), who compared radar-based precipitation data over an urban grid cell centered on Atlanta with a more rural grid cell located immediately to the west. Following the same procedure, a URD is observed when the average precipitation over the central urban grid cell exceeds the average precipitation over the rural grid cell immediately to the west. Finally, a novel third method tests whether the average precipitation across the urban core, eastern, and northeastern grid cells is greater than the average precipitation over the remaining six grid cells. McLeod et al. (2017) showed that the greatest rainfall anomalies occur climatologically in the urban core, eastern, and northeastern grid cells because of the prevailing southwesterly-to-northwesterly mid-tropospheric flow during summer. Therefore, under this method a URD occurs if the precipitation averaged across the urban core, eastern, and northeastern grid cells exceeds that of the remaining six grid cells. This novel third method is customized for the Atlanta metropolitan area, but future work could explore whether it can be applied to other regions, depending on the local mid-tropospheric wind pattern.
Figure 2 shows the percentage of URDs by month and season for the three methods. The first method, based on Debbage and Shepherd (2015), is clearly the most conservative, as only 30 percent of all summer days from 2002 to 2019 are classified as URDs. In contrast, over 40 percent of all summer days are classified as URDs by the Mote et al. (2007) and novel methods. Similar monthly patterns emerge among the three methods, as all three yield their greatest percentage of URDs in July. The Debbage and Shepherd (2015) and Mote et al. (2007) methods yield their lowest percentage of URDs in June, while the novel method yields its lowest percentage in August.
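The three URD definitions can be written compactly. The Python sketch below assumes the nine cell-average daily totals are stored as a 3 × 3 list with row 0 to the north and column 0 to the west, so that the urban core is the center cell; this layout, the function names, and the sample values are hypothetical.

```python
from statistics import mean

# Hypothetical layout: grid[row][col], row 0 = north, col 0 = west.
CENTER, WEST, EAST, NORTHEAST = (1, 1), (1, 0), (1, 2), (0, 2)
ALL_CELLS = [(r, c) for r in range(3) for c in range(3)]

def cell(grid, rc):
    return grid[rc[0]][rc[1]]

def urd_debbage_shepherd(grid):
    """Method 1: urban core minus the mean of the 8 surrounding cells is positive."""
    rural = [cell(grid, rc) for rc in ALL_CELLS if rc != CENTER]
    return cell(grid, CENTER) - mean(rural) > 0

def urd_mote(grid):
    """Method 2: urban core exceeds the rural cell immediately to the west."""
    return cell(grid, CENTER) > cell(grid, WEST)

def urd_novel(grid):
    """Method 3: mean of core, east, and northeast cells exceeds mean of the other six."""
    urban = {CENTER, EAST, NORTHEAST}
    urban_mean = mean(cell(grid, rc) for rc in urban)
    rural_mean = mean(cell(grid, rc) for rc in ALL_CELLS if rc not in urban)
    return urban_mean > rural_mean

def urd_fraction(days, method):
    """Percentage of days flagged as URDs by one method."""
    return 100.0 * sum(method(g) for g in days) / len(days)

# Hypothetical daily cell averages (mm), north row first.
day = [
    [1.0, 2.0, 6.0],  # NW, N, NE
    [0.5, 8.0, 7.0],  # W, urban core, E
    [0.0, 1.0, 2.5],  # SW, S, SE
]
print(urd_debbage_shepherd(day), urd_mote(day), urd_novel(day))  # True True True
```

Because each test uses a strict inequality, a completely dry day (all cells zero) is never counted as a URD under any of the three methods.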
A HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) backward trajectory analysis was conducted for URDs at or above the 99th percentile (n = 17 days), based on the novel method of defining URDs (Stein et al., 2015; Rolph et al., 2017). Each trajectory was initialized over Atlanta and run backward in time for a total of 72 h using the North American Regional Reanalysis (NARR) dataset. Three trajectories, initialized at 10 m, 500 m, and 1,500 m above ground level, were computed for each day. Finally, the hourly latitude and longitude coordinates of all 17 trajectory simulations were averaged to form a single composite trajectory at each of the three heights above ground level.
FIGURE 3 | Composite anomaly map of 500 hPa geopotential height for URDs compared to non-URDs. Cross hatching (dot hatching) indicates height anomalies that are at least 1 standard deviation greater (lower) than the sample mean. The 9-cell gridded framework centered on Atlanta is indicated by the black-outlined boxes.
Frontiers in Environmental Science | www.frontiersin.org March 2022 | Volume 10 | Article 808026
Zhang et al. (2020) used a k-means cluster analysis in their assessment of urbanization and rainfall variability; however, their cluster analysis was used to categorize the intensity of urbanization. Herein, a k-means cluster analysis was performed on the daily MPE data spanning each summer (JJA) season from 2002 to 2019, for a total of 1,652 days. This sample of 1,652 days was partitioned into 8 clusters, with each daily observation assigned to the cluster with the nearest mean. Thus, the precipitation value for each pixel depicted in Figure 12 represents a cluster mean, or centroid. A total of 8 clusters was chosen to provide enough variability in the cluster partitioning for an urban signal to be detected. A sensitivity analysis using 7 and 9 clusters produced results generally similar to the analysis using 8 clusters. Because the urban signal was more

Synoptic-Scale Composites
The synoptic-scale compositing analysis reveals a consistent set of meteorological ingredients needed to produce a URD. Because the three methods produce very similar composite maps across all seven meteorological variables, only the maps for the most conservative method (i.e., Debbage and Shepherd, 2015) are displayed in the corresponding figures. Note that the anomalies for each map were calculated as the difference between the average of all URDs and the average of all non-URDs during 2002-2019. While the magnitude of the anomalies may appear small, this is likely a function of the time of year being analyzed (i.e., summer) and the mesoscale nature of the precipitation pattern under examination.
FIGURE 7 | Composite anomaly map of precipitable water for URDs compared to non-URDs. The 9-cell gridded framework centered on Atlanta is indicated by the black-outlined boxes.
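The compositing step can be sketched as follows. The anomaly follows the definition in the text (mean of URDs minus mean of non-URDs at each grid point). The hatching test is an assumption: the Figure 3 caption describes anomalies at least 1 standard deviation above or below the sample mean, which this sketch interprets as standardizing the anomaly map across its grid points; the study's exact Z-score formulation may differ. Fields are flattened lists, and all names and values are hypothetical.

```python
from statistics import mean, pstdev

def composite_anomaly(fields, is_urd):
    """Mean over URDs minus mean over non-URDs at each grid point.

    fields: one flattened daily field (list of grid-point values) per day.
    is_urd: one boolean per day.
    """
    urd = [f for f, flag in zip(fields, is_urd) if flag]
    non = [f for f, flag in zip(fields, is_urd) if not flag]
    if not urd or not non:
        raise ValueError("need at least one URD and one non-URD")
    return [mean(u) - mean(n) for u, n in zip(zip(*urd), zip(*non))]

def hatch_flags(anomaly, threshold=1.0):
    """Assumed test: +1 (-1) where a grid point's anomaly is >= threshold standard
    deviations above (below) the mean of the anomaly map, else 0."""
    mu, sigma = mean(anomaly), pstdev(anomaly)
    if sigma == 0:
        return [0] * len(anomaly)
    flags = []
    for a in anomaly:
        z = (a - mu) / sigma
        flags.append(1 if z >= threshold else -1 if z <= -threshold else 0)
    return flags

# Hypothetical 4-point fields for two URDs followed by two non-URDs.
fields = [[5.0, 1.0, 0.0, 2.0], [3.0, 1.0, 0.0, 2.0],
          [1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0]]
anom = composite_anomaly(fields, [True, True, False, False])
print(anom, hatch_flags(anom))
```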
In Figure 3, the composite map of 500 hPa geopotential heights indicates that Atlanta lies between weak negative anomalies to the west and stronger positive anomalies to the northeast. Both the negative height anomalies over the Great Plains and the positive height anomalies over the Mid-Atlantic and Northeast U.S. are statistically significant based on a Z-score test. This anomaly pattern should promote west-southwesterly flow and advection over the Atlanta area, which is consistent with the wind rose analysis presented in Figure 4. Figure 5A shows negative anomalies of 700 hPa vertical velocity over the Atlanta metropolitan area, indicating upward motion downstream of the weakly negative 500 hPa height anomalies over the Central Plains region. Figure 5B indicates slightly weaker-than-average 700 hPa winds over Atlanta. This is consistent with the results of McLeod et al. (2017), who found that positive rainfall anomalies over the urban core of Atlanta are often associated with weaker-than-average 700 hPa winds. Low-level moisture is anomalously high over the Atlanta area, as depicted in Figure 6. Dixon and Mote (2003) found that low-level moisture was more important than UHI intensity in producing urban-generated precipitation over Atlanta. In addition, Figures 7, 8 reveal positive anomalies of precipitable water and integrated vapor transport across the Atlanta metropolitan area during URDs. Thus, the entire atmospheric column is moister than average, which is supported by the composite sounding analysis shown later in Figure 11. Schroeder et al. (2016) found that for extreme urban flooding, precipitable water values typically ranked in the top 1-3% of historical values.
FIGURE 8 | Composite anomaly map of integrated vapor transport for URDs compared to non-URDs. The 9-cell gridded framework centered on Atlanta is indicated by the black-outlined boxes.
FIGURE 9 | HYSPLIT backward trajectory analysis for the 99th percentile and greater URDs (n = 17 days). The trajectories are color-coded as follows: 10 m above ground level (red line), 500 m above ground level (blue line), and 1,500 m above ground level (green line).
FIGURE 10 | Composite anomaly map of 1,000 hPa divergence for URDs compared to non-URDs across the Southeast region (A) and the metropolitan Atlanta area (B). The 9-cell gridded framework centered on Atlanta is indicated by the black-outlined boxes.
In Figure 9, a HYSPLIT backward trajectory analysis is presented for the 99th percentile and greater URDs (n = 17 days), according to the novel method of defining URDs. It is clear that the most extreme URDs are characterized by low-level moisture transport from the Atlantic Ocean, while mid-level moisture transport occurs from the Gulf of Mexico. This deep-layer moisture transport from two subtropical bodies of water supports the positive anomalies in both precipitable water and integrated vapor transport that were shown over the Atlanta area in Figures 7, 8.
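The composite trajectories described in the Methods amount to averaging positions hour by hour across the 17 simulations at a given starting height. The Python sketch below assumes each trajectory is a list of hourly (latitude, longitude) pairs of equal length; plain arithmetic averaging of coordinates is reasonable over a region like the southeastern United States but would fail for paths crossing the 180° meridian. The function name and sample coordinates are hypothetical.

```python
def composite_trajectory(trajectories):
    """Average hourly (lat, lon) positions across equal-length trajectories.

    Each trajectory is a list of (lat, lon) tuples ordered from the start hour
    backward in time (e.g., 73 points for a 72-h run including hour 0).
    """
    if not trajectories or len({len(t) for t in trajectories}) != 1:
        raise ValueError("need one or more trajectories of equal length")
    n = len(trajectories)
    return [(sum(lat for lat, _ in points) / n, sum(lon for _, lon in points) / n)
            for points in zip(*trajectories)]

# Hypothetical 2-point back trajectories started over Atlanta at one height (AGL).
by_height = {
    500: [[(33.75, -84.39), (33.0, -85.0)],
          [(33.75, -84.39), (32.0, -83.0)]],
}
composites = {h: composite_trajectory(trajs) for h, trajs in by_height.items()}
print(composites[500])
```

In the study's setup, `by_height` would hold the 17 trajectories for each of the 10 m, 500 m, and 1,500 m starting heights.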
Given that the synoptic environment on URDs is supportive of convection, a composite of 1,000 hPa divergence was constructed. Figure 10 shows that a broad area of convergence is present on URDs compared to non-URDs. We hypothesize that this background convergent flow supports convective forcing at the meso-gamma scale associated with the urban environment itself. Shepherd et al. (2010) and Debbage and Shepherd (2019) found evidence of enhanced low-level convergence in their modeling studies of urban-enhanced convective precipitation.
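Reanalysis products such as ERA-Interim archive divergence directly, but the quantity can also be estimated from gridded u and v winds. The Python sketch below applies centered finite differences to the spherical form of horizontal divergence, D = [∂u/∂λ + ∂(v cos φ)/∂φ] / (a cos φ), where λ is longitude, φ is latitude, and a is Earth's radius; negative values indicate convergence. The grid, wind values, and function name are hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius

def divergence(u, v, lats, lons, i, j):
    """Centered-difference horizontal divergence (s^-1) at interior point (i, j).

    u, v: 2-D lists indexed [lat][lon] in m/s; lats, lons in degrees.
    D = [du/dlambda + d(v cos phi)/dphi] / (a cos phi); negative means convergence.
    """
    phi = math.radians(lats[i])
    dlam = math.radians(lons[j + 1] - lons[j - 1])
    dphi = math.radians(lats[i + 1] - lats[i - 1])
    du_dlam = (u[i][j + 1] - u[i][j - 1]) / dlam
    vcos_plus = v[i + 1][j] * math.cos(math.radians(lats[i + 1]))
    vcos_minus = v[i - 1][j] * math.cos(math.radians(lats[i - 1]))
    dvcos_dphi = (vcos_plus - vcos_minus) / dphi
    return (du_dlam + dvcos_dphi) / (EARTH_RADIUS_M * math.cos(phi))

# Hypothetical 3 x 3 patch (0.75-degree spacing) with winds converging on the center.
lats = [33.0, 33.75, 34.5]
lons = [-85.0, -84.25, -83.5]
u = [[0.0, 0.0, 0.0], [2.0, 0.0, -2.0], [0.0, 0.0, 0.0]]  # westerly on the west side, easterly on the east side
v = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, -2.0, 0.0]]  # southerly to the south, northerly to the north
print(f"{divergence(u, v, lats, lons, 1, 1):.2e} s^-1")
```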
The compositing results suggest that a synoptic urban convective regime (SUCR) represents a baseline environment supportive of the urban rainfall effect. However, additional analysis is required to strengthen the SUCR hypothesis.

Wind Rose Analysis
In Figure 4, 700 hPa wind roses for the three methods of defining URDs (Figures 4B-D) are compared against the climatological wind rose for all summer days during 2002-2019 (Figure 4A). Although the climatological wind rose is nearly identical to the three URD wind roses, several important points can be made with respect to URDs. First, URDs can occur under any 700 hPa wind direction, but they are most commonly observed when the flow is southwesterly-to-northwesterly. Indeed, over 50% of URDs for each of the three methods occurred under southwesterly-to-northwesterly flow at 700 hPa. McLeod et al. (2017) captured this finding when describing the urban rainfall signature around Atlanta as "flow regime dependent." In addition, URDs can occur under a wide range of 700 hPa wind speeds, but they most commonly occur when wind speeds are between 5 and 10 knots; around one-third of URDs for each of the three methods occurred with 700 hPa wind speeds in this range. McLeod et al. (2017) showed that slower 700 hPa wind speeds were more effective at producing positive precipitation anomalies directly over the Atlanta urban core, while faster wind speeds tended to be associated with enhanced precipitation downwind of the urban core.

Composite Atmospheric Soundings
Figure 11 presents composite atmospheric soundings for all URDs identified using each of the three methods. Each sounding has a set of dashed and solid air temperature and dew point temperature lines. The dashed lines represent the climatological mean sounding for every June-August day during 2002-2019, while the solid lines represent only the URDs. The air temperature lines mostly overlap, indicating very little difference between URDs and climatology. However, the dew point temperature lines for URDs are consistently warmer and closer to the air temperature lines than in climatology. The atmosphere during URDs is thus moister than on the climatological average summer day, and this enhanced moisture extends throughout the tropospheric column. Finally, the wind profiles reveal that flow is predominantly westerly throughout much of the atmospheric column, with wind speed increasing with height. These soundings indicate marginal instability, a condition in which localized convection is favored over widespread thunderstorm activity. Through its heat island and convergent forcing, the urban environment of Atlanta can more effectively initiate convection during periods of weak instability, when mesoscale processes dominate over synoptic-scale processes.
These findings are strongly supported by Dixon and Mote (2003), who also compared soundings from UHI-induced precipitation events against the climatology of their study period. While they found that both air temperature and dew point temperature differed significantly between UHI-induced precipitation days and average days, the differences in dew point temperature were much greater in magnitude, particularly in the bottom half of the troposphere (Dixon and Mote, 2003). This led them to conclude that low-level moisture, rather than UHI intensity, is the dominant factor controlling the occurrence of urban-generated convection.
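The observation that URD dew point lines lie closer to the air temperature lines can be quantified as a smaller dew point depression (T - Td) or, equivalently, a higher relative humidity. The Python sketch below uses a common Magnus-type approximation for saturation vapor pressure over liquid water; the function names and sample temperatures are hypothetical and are not values read from Figure 11.

```python
import math

def saturation_vapor_pressure_hpa(t_c):
    """Magnus-type approximation over liquid water (T in deg C, result in hPa)."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

def moisture_summary(temp_c, dewpt_c):
    """Per-level (dew point depression in K, relative humidity in %)."""
    return [(t - td, 100.0 * saturation_vapor_pressure_hpa(td) / saturation_vapor_pressure_hpa(t))
            for t, td in zip(temp_c, dewpt_c)]

# Hypothetical 850 and 500 hPa composite values (deg C), not read from Figure 11.
climatology = moisture_summary([17.0, -6.0], [12.0, -17.0])
urd = moisture_summary([17.0, -6.0], [14.0, -12.0])
for level, (clim, wet) in zip([850, 500], zip(climatology, urd)):
    print(f"{level} hPa: depression {clim[0]:.0f} -> {wet[0]:.0f} K, "
          f"RH {clim[1]:.0f}% -> {wet[1]:.0f}%")
```

With air temperature held fixed, as in the nearly overlapping composite temperature profiles, any warming of the dew point shrinks the depression and raises relative humidity.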

K-Means Cluster Analysis
Figures 12A-H show the results of the k-means cluster analysis, with the 8 clusters mapped across the Atlanta, GA region. Over 60% of the sample corresponds to days with relatively minimal precipitation (Cluster #2), which is consistent with the summertime climatology of thunderstorm activity across the Atlanta metropolitan area (Rose et al., 2008; Bentley et al., 2012): most summer days are characterized by little to no convective precipitation. The next most frequent cluster is Cluster #7 (~20% of days), in which the greatest precipitation occurs over northern portions of the region and within the urban core of Atlanta. This cluster provides some indication of an urban rainfall effect over the I-285 corridor of Atlanta, but the most compelling evidence of an urban rainfall effect is found in Clusters #1 and #4. Although together they occur only about 8% of the time, these two clusters reveal a pronounced urban rainfall signal, with precipitation maximized immediately over and downwind (i.e., to the east) of downtown Atlanta. Taken together, the results of this cluster analysis confirm that the urban rainfall effect can be readily detected in the summertime precipitation climatology across the Atlanta metropolitan area.
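For readers unfamiliar with the technique, the partitioning and the cluster frequencies quoted above can be sketched with Lloyd's algorithm, the standard k-means procedure. The Python sketch assumes each day is a flattened list of MPE pixel values, Euclidean distance, and random initialization from the sample days; the study's actual implementation, initialization, and convergence settings are not described here. The toy example uses k = 2 and three hypothetical pixels rather than the 8 clusters used in the study, and all names are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two flattened precipitation maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(days, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of daily maps."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(days, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each day joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(d, centroids[c])) for d in days]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean map of its member days.
        for c in range(k):
            members = [d for d, lab in zip(days, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(px) / len(members) for px in zip(*members)]
    return centroids, labels

def cluster_frequencies(labels, k):
    """Percentage of days assigned to each cluster."""
    return [100.0 * labels.count(c) / len(labels) for c in range(k)]

# Hypothetical 3-pixel maps (mm): four nearly dry days and two wet urban-core days.
days = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.1], [0.0, 0.0, 0.0],
        [9.0, 12.0, 1.0], [11.0, 10.0, 0.5]]
centroids, labels = kmeans(days, k=2)
print(labels, cluster_frequencies(labels, 2))
```

The centroids returned here play the role of the cluster-mean maps in Figure 12, and `cluster_frequencies` gives the share of days in each cluster.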

DISCUSSION
Overall, this study has moved the literature a step forward towards the predictability of the urban rainfall effect. Using a combination of synoptic compositing, wind rose, and k-means cluster analyses, we present a set of ingredients supportive of the urban rainfall effect in the Atlanta area as representative of a synoptic urban convective regime (SUCR):
• Primarily southwesterly-to-northwesterly flow at 700 hPa
• Weaker-than-average winds at 700 hPa
• Weakly negative (strongly positive) 500 hPa height anomalies to the west (northeast)
• Anomalously moist surface and atmospheric column
• Background low-level convergent flow
Our analysis, possibly for the first time, also quantifies how often the urban rainfall effect occurs during the warm season. The k-means cluster analysis suggests that the urban rainfall effect is detectable on about 8% of the sample days.
Ultimately, the urban rainfall effect will need to be a part of detailed predictive capabilities by forecasters and modeling systems. This study has laid a basic foundation for understanding the synoptic-scale environment that is supportive of the urban rainfall effect. This could provide some initial guidance for forecasters on large-scale conditions in order to refine localized precipitation or lightning guidance around cities. Future regional modeling studies can evaluate our findings by conducting simulations of case days representative of this synoptic urban convective regime under varying urban landscape scenarios. A limitation of this study is the focus on one geographic region. Future studies should consider other cities and methodologies like self-organizing maps.

AUTHOR CONTRIBUTIONS
MS and JM contributed equally to the development of this manuscript, including analysis and interpretation. JM conducted much of the data analysis, and MS conducted the interpretation and writing. JM also assisted with writing and interpretation.

FUNDING
This research was funded by NASA Grant 80NSSC20K1268.