LCZ Generator: A Web Application to Create Local Climate Zone Maps

Since their introduction in 2012, Local Climate Zones (LCZs) have emerged as a new standard for characterizing urban landscapes, providing a holistic classification approach that takes into account micro-scale land-cover and associated physical properties. In 2015, as part of the community-based World Urban Database and Access Portal Tools (WUDAPT) project, a protocol was developed that enables the mapping of cities into LCZs using freely available data and software packages, but performed on local computing facilities. The LCZ Generator described here further simplifies this process, providing an online platform that maps a city of interest into LCZs and expects only a valid training area file and some metadata as input. The web application (available at https://lcz-generator.rub.de) integrates the state-of-the-art of LCZ mapping, and simultaneously provides an automated accuracy assessment, training data derivatives, and a novel approach to identify suspicious training areas. As this contribution explains all front- and back-end procedures, databases, and underlying datasets in detail, it serves as the primary “User Guide” for this web application. We anticipate this development will significantly ease the workflow of researchers and practitioners interested in using the LCZ framework to study a variety of urban-induced human and environmental impacts. In addition, this development will improve the accessibility and dissemination of maps and their metadata.


INTRODUCTION
Urbanization and climate change may be the two most important trends to shape global development in the decades ahead. On the one hand, cities serve as engines of change, drive economic progress and pull more people out of poverty than at any other time in history. On the other hand, climate change could undercut all of this by exacerbating resource scarcity and putting (vulnerable) communities at risk from a myriad of environmental challenges (e.g., heat waves, droughts, floods, and poor air quality) (Baklanov et al., 2018). The magnitude of this risk will increase in the coming decades, as it is predicted that global urban land will increase significantly, and by 2050, almost 70% of the world's population will be urban dwellers (UN, 2019). Moreover, as Earth's climate continues to change over the coming decades, projected global warming and aggravated hydro-climatic extremes will hit urban centers especially hard, posing a major threat to the health and well-being of human populations and urban ecosystems (Costello et al., 2009).
Successful mitigation of and adaptation to climate change will depend centrally on what happens in cities, as urban areas house the majority of people, assets and infrastructure, and are responsible for about 70% of the world's energy-related CO2 emissions (Lucon et al., 2014). At the international level, cities are of increasing concern: the new United Nations Agenda and Sustainable Development Goals have a clear focus on urban resilience, climate, and environmental sustainability of smart cities. The Intergovernmental Panel on Climate Change (IPCC) held its first "cities and climate change" conference in 2018, and announced a special report on cities which will be part of the panel's seventh assessment cycle. Finally, of the four challenges identified by the World Meteorological Organization (WMO) World Weather Research Program, two are urban related: high-impact weather, including impacts in cities, and urbanization (Creutzig et al., 2016; Masson et al., 2020).
Despite this new focus on cities as a critical scale for climate change management, we know very little about most cities on the planet, being generally ignorant of their extent, how they are constructed and how they are occupied. First and foremost, climate-relevant urban data consistent in coverage, scale, and content are needed to support risk assessment and its management and to enable effective knowledge transfer between cities. The right data at the right scale are an essential prerequisite for developing fit-for-purpose urban planning policies (Georgescu et al., 2015). A number of projects have mapped the global urban extent in ever finer detail (e.g., Pesaresi et al., 2013; Corbane et al., 2017; Esch et al., 2017; Gong et al., 2020), but these efforts need to be complemented by a wider range of information-rich intra-urban classes that describe different types of urban land covers and land uses: the Local Climate Zone (LCZ) typology is a good example of such a classification scheme (Stewart and Oke, 2012; Demuzere et al., 2020a; Reba and Seto, 2020).
Local Climate Zones refer to a classification system that consists of 17 classes, 10 of which can be described as urban (Figure 1). The system was originally designed to provide a framework for urban heat island studies, allowing the standardized exchange of urban temperature observations (Stewart and Oke, 2012). The LCZ classes are formally defined as "regions of uniform surface cover, structure, material, and human activity that span hundreds of meters to several kilometers in horizontal scale," exclude "class names and definitions that are culture or region specific," and are characterized by "a characteristic screen-height temperature regime that is most apparent over dry surfaces, on calm, clear nights, and in areas of simple relief" (Stewart and Oke, 2012). Its universality has important advantages, as it allows a systematic comparability of global intra- and inter-urban heat island studies (e.g., Bechtel et al., 2019a), provides a common platform for knowledge exchange and the description of urban canopy parameters in urban ecosystem processes, and supports model applications, especially for cities with little or insufficient data infrastructure (Stewart and Oke, 2012; Ching et al., 2018; Brousse et al., 2019, 2020b; Demuzere et al., 2020a; Varentsov et al., 2020).
In the early 2010s, Bechtel (2011) and Bechtel and Daneke (2012) first proposed mapping entire cities into Local Climate Zones. This procedure was formalized by Bechtel et al. (2015), relying on an "off-line" workflow that integrates training areas (TAs, a set of LCZ labeled polygons) and Landsat 8 (L8) imagery within the SAGA software package over a limited spatial domain. More specifically, each TA is identified using Google Earth images aided by the visual and numerical information provided in Stewart and Oke (2012). The TA dataset is then used to extract spectral information from L8 images, which in turn is used in a supervised random forest classifier to categorize the entire region of interest into LCZ types. This procedure was afterwards adopted by the World Urban Database and Access Portal Tools (WUDAPT) community project to create consistent LCZ maps of global cities (Ching et al., 2018).
While this framework is valuable (currently ∼150 cities mapped), it will not result in a database that could support urban decision-making globally in a reasonable time frame. Therefore, Demuzere et al. (2019b,c, 2020a) developed a number of strategies to expand LCZ coverage rapidly. The first recognizes that much of the information contained in TA data for one city is transferable to other cities for which no TA data is available. The second employs Google's Earth Engine (EE), a cloud-based platform for planetary-scale analysis (Gorelick et al., 2017), to use its computational power, its access to a range of geospatial datasets (Landsat, Sentinel, and others), and its large number of predefined algorithms. Among others, this cloud-based approach resulted in high-resolution Local Climate Zone maps for global cities, Europe and the continental United States of America (Bechtel et al., 2019a,b; Demuzere et al., 2019a,b,c, 2020a; Brousse et al., 2020a).
The LCZ Generator web application described here further simplifies this process, as it provides an online platform that maps a city of interest into LCZs and expects only a valid TA file and some metadata as input. The application integrates all of the above-mentioned developments and procedures, and simultaneously provides an automated accuracy assessment, TA data derivatives and a novel approach to identify suspicious TAs. As this contribution explains all front- and back-end procedures, databases and underlying datasets in detail, it serves as the primary "User Guide" for this web application.

LCZ GENERATOR DESIGN
The LCZ Generator web application consists of three major steps (Figure 2). In a first step, personal and training information needs to be submitted via the web application (section 2.1). Upon successful submission, the LCZ classification and quality control is launched in the back-end, to produce a quality-controlled LCZ map, metadata statistics, and labels for suspicious polygons (sections 2.2 and 2.3). In a third and final step, compressed results are sent to the user via e-mail, and simultaneously added to the online submission table (section 2.4). Each of these steps is discussed in more detail in the following sections.

User Input
When accessing the LCZ Generator, the user is directed to a submission form that consists of two sections: personal information and TA information (Table 1). The personal information consists of the author's first and last name and e-mail address. The name refers to the primary author of the TA file, who can be acknowledged in case the file is used by others. The e-mail address is required since the results of the LCZ Generator are sent via e-mail. If the author consents, the author's first and last name are displayed in the publicly accessible submission table and factsheet (see section 2.4). The second section of the submission form asks for information about the TA file. A user can select the continent and country via a dropdown menu, and provide the name of the city of interest. The date field refers to the date for which the training polygons are representative. This is not necessarily the date on which the TA file is created, but rather the date of the imagery (e.g., in Google Earth, see Bechtel et al., 2015) on which the labeled TAs are developed. The optional "Reference" and "Remarks" fields allow the user to provide additional metadata about the TA file. The former can be the Digital Object Identifier (DOI) in case the TA set is published in a (peer-reviewed) paper, a reference to an online resource, or left blank if neither is available. The latter allows free text and can, e.g., be used to list additional authors that contributed to the creation of the TA file, or any other information that is relevant to understand the content of the TA file.
Table 1, "Remarks" field: Additional information on potential co-authors for the TA file (firstname, lastname) or any other information that supports the interpretation of the TAs. Fields denoted with * are required.
Key to the submission is the TA file itself, which can be uploaded via a button and can have any name. Upon submission, a file check makes sure that the file has not been uploaded before and is compatible with the remainder of the LCZ Generator. First, the file extension must be .kml (Keyhole Markup Language) or .kmz (its zipped version); a .kmz file is unzipped to .kml. Second, it is checked whether the TA file can be read and contains one or more LCZ folders, as provided in the default WUDAPT LCZ .kml template (footnote 1). This strategy is chosen because users can provide any label to an LCZ class (e.g., "LCZ 2a," "compact midrise 1," "not sure about this one," ...), making it difficult for the application to assign the appropriate LCZ label required for the classification. If folders are available, the folder names are used to rename their underlying polygons. Third, if present, empty polygons are removed (e.g., "Style Place Holders" that were not deleted from the .kml template). Fourth, each polygon is provided with a unique ID, which is required to perform the automated TA quality control (see section 2.3). Finally, the size of the region of interest (ROI) is checked. The ROI is defined as the outer extent of the TA polygons, currently with an additional buffer of 10 km on all sides. In order to maintain computational efficiency, the maximum allowed ROI size is currently set to 2.5° x 2.5°.
If any of the above checks fail, a red-framed message is returned to the user upon submission, explaining how to solve the issue. If all tests pass, a green-framed message is returned, and the LCZ Generator is launched in the back-end.
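The file checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application's actual code: the function names, the dictionary layout, the rule that a closed ring needs at least four vertices, and the conversion of the 10 km buffer to degrees (a spherical approximation of about 111.32 km per degree) are all hypothetical choices made for the example.

```python
import math
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}
BUFFER_KM = 10.0      # buffer around the TA extent, as stated in the text
MAX_ROI_DEG = 2.5     # maximum ROI width and height, as stated in the text
KM_PER_DEG = 111.32   # assumption: spherical approximation of one degree


def read_training_areas(kml_text):
    """Parse a foldered KML string into a list of labelled polygons."""
    root = ET.fromstring(kml_text)
    folders = root.findall(".//kml:Folder", KML_NS)
    if not folders:
        raise ValueError("TA file contains no LCZ folders")
    polygons = []
    for folder in folders:
        # The folder name, not the polygon name, provides the LCZ label.
        label = folder.findtext("kml:name", default="", namespaces=KML_NS).strip()
        for placemark in folder.findall("kml:Placemark", KML_NS):
            text = placemark.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
            ring = [tuple(float(v) for v in c.split(",")[:2]) for c in text.split()]
            if len(ring) < 4:  # assumption: fewer vertices means an empty placeholder
                continue
            polygons.append({"id": len(polygons) + 1, "label": label, "ring": ring})
    return polygons


def roi_with_buffer(polygons):
    """Return (west, south, east, north) of all polygons plus the buffer."""
    lons = [lon for p in polygons for lon, _ in p["ring"]]
    lats = [lat for p in polygons for _, lat in p["ring"]]
    mid_lat = (min(lats) + max(lats)) / 2
    dlat = BUFFER_KM / KM_PER_DEG
    dlon = BUFFER_KM / (KM_PER_DEG * math.cos(math.radians(mid_lat)))
    return (min(lons) - dlon, min(lats) - dlat, max(lons) + dlon, max(lats) + dlat)


def roi_is_allowed(roi):
    """Check the ROI against the maximum allowed size in degrees."""
    west, south, east, north = roi
    return (east - west) <= MAX_ROI_DEG and (north - south) <= MAX_ROI_DEG
```

Whether the size limit is evaluated before or after adding the buffer is not stated explicitly; the sketch applies it to the buffered extent, following the definition of the ROI given above.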

LCZ Classification and Quality Control
Before the TAs are used in the classification procedure, they undergo a final pre-processing step: the surface area of large polygons (>1.5 km2) is reduced to a radius of approximately 350 m, in line with Demuzere et al. (2019b,c, 2020a) and the minimum allowed surface area described in section 2.3. These large polygons typically represent homogeneous areas such as water bodies and forests. Their large size is neither needed nor wanted, as it leads to more imbalanced TA data and computational inefficiency of the classifier. In addition to the TAs, one needs earth observation data and a supervised classifier. The default WUDAPT workflow relies on Landsat 8 data as input to the random forest classifier, embedded as an "LCZ classification tool" in SAGA GIS (Breiman, 2001; Bechtel et al., 2015; Conrad et al., 2015). Here, the LCZ Generator builds further upon the findings of Demuzere et al. (2019b,c, 2020a) and Brousse et al. (2020a), in which additional earth observations are used, in combination with the TAs, as input to EE's implementation of the random forest classifier. Currently, a total of 33 input features are available globally at a 100 m resolution, and are stored in EE's online WUDAPT asset folder (3 TB of data) (Table 2). They consist of 16 features derived from Landsat 8, 5 features from Sentinel-1, 8 features from Sentinel-2, and 4 additional features reflecting terrain and forest canopy height. Note that the list of input features used in Demuzere et al. (2019b, 2020a) is expanded with Sentinel-2 red edge bands to improve the mapping of wetlands (Forkuor et al., 2018; Kaplan and Avdan, 2018; Brousse et al., 2020a), and a Sentinel-2-based combinational shadow index (CSI) and shadow enhancement index (SEI) median composite (Sun et al., 2019).
The system is designed in such a way that, whenever additional, new or improved global earth observation datasets become available, they can easily be added to the asset folder and activated in the classification procedure.
To ensure the quality of the resulting LCZ map, quality control is a vital step (Verdonck et al., 2017). Hence, an automated cross-validation approach using 25 bootstraps is applied. In each bootstrap, 70% of the TA polygons are used to train and 30% to test; the polygons are selected by stratified (LCZ type) random sampling, maintaining the original LCZ class frequency distribution. Repeating this procedure 25 times allows us to provide confidence intervals around the accuracy metrics. In addition, this approach allows the creation of a probability map, which indicates how often (in %) the mode was mapped in the iterative procedure. The resulting LCZ map provided to the user is based on all TAs (100% of the TA polygons) and input features. A filtered version is also provided, using the morphological Gaussian filter described in more detail in Demuzere et al. (2020a). This filter is preferred over WUDAPT's traditional majority post-classification filter, as it accounts for the distance from the center of the kernel and differences in the typical patch size between classes. For example, linear features like rivers are typically removed by the majority filter. The LCZ map, its Gaussian-filtered version and the probability map are provided to the user as a single .tif with three bands: "lcz," "lczFilter," and "classProbability," respectively.
The accuracy metrics used follow previous work (see Demuzere et al., 2020a, and references therein): overall accuracy (OA), overall accuracy for the urban LCZ classes only (OA_u), overall accuracy of the built vs. natural LCZ classes only (OA_bu), a weighted accuracy (OA_w), and the class-wise metric F1. The overall accuracy denotes the percentage of correctly classified pixels. OA_u reflects the percentage of correctly classified pixels from the urban LCZ classes only, and OA_bu is the overall accuracy of the built vs. natural LCZ classes only, ignoring their internal differentiation. The weighted accuracy (OA_w) is obtained by applying weights to the confusion matrix and accounts for the (dis)similarity between LCZ types (Bechtel et al., 2017). For example, LCZ 4 is most similar to the other open urban types (LCZs 5 and 6), leaving these pairs with higher weights compared to, e.g., an urban and natural LCZ class pair. This results in penalizing confusion between dissimilar types more than confusion between similar classes. Finally, the class-wise accuracy is assessed with the F1 metric.
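The stratified 70/30 bootstrap can be illustrated with the following minimal sketch. It is not the back-end implementation: the function names, the rounding of the per-class training share, and the use of a seeded random generator are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, rng=None):
    """Split polygon indices into train/test sets, class by class."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy before shuffling
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))  # assumption: simple rounding
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


def bootstrap_splits(labels, n_bootstraps=25, seed=0):
    """Return one (train, test) pair of index lists per bootstrap."""
    rng = random.Random(seed)
    return [stratified_split(labels, rng=rng) for _ in range(n_bootstraps)]
```

With this rounding rule, a class represented by a single polygon always ends up in the training set, so no test sample (and hence no class-wise metric) exists for that class in any bootstrap.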
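As a rough illustration of these metrics, the sketch below computes them from paired reference and predicted labels. Several details are assumptions rather than the published definitions: OA_u is computed over samples whose reference label is urban, the similarity weights are passed in as a hypothetical dictionary keyed by (reference, predicted) pairs with a default of 0, and F1 is computed in its usual form from true positives, false positives, and false negatives.

```python
URBAN = set(range(1, 11))  # LCZ 1-10 are the urban classes; 11-17 are natural


def _share_correct(pairs):
    """Fraction of pairs whose two entries agree (NaN if there are none)."""
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else float("nan")


def accuracy_metrics(y_true, y_pred, weights=None):
    """Return OA, OA_u, OA_bu and OA_w for paired reference/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    weights = weights or {}  # hypothetical similarity weights in [0, 1]
    urban_pairs = [(t, p) for t, p in pairs if t in URBAN]
    built_pairs = [(t in URBAN, p in URBAN) for t, p in pairs]
    weighted = sum(1.0 if t == p else weights.get((t, p), 0.0) for t, p in pairs)
    return {
        "OA": _share_correct(pairs),
        "OA_u": _share_correct(urban_pairs),
        "OA_bu": _share_correct(built_pairs),
        "OA_w": weighted / len(pairs),
    }


def f1_score(y_true, y_pred, lcz):
    """Class-wise F1 for one LCZ class (NaN if the class never occurs)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == lcz and p == lcz for t, p in pairs)
    fp = sum(t != lcz and p == lcz for t, p in pairs)
    fn = sum(t == lcz and p != lcz for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
```

Giving a pair of similar classes a weight between 0 and 1 means that confusing them costs less than confusing dissimilar classes, which is the behavior described for OA_w.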

Automated TA Quality Control
Sections 2.1 and 2.2 are at the core of the LCZ Generator application, explaining how a user's TA dataset combined with a wealth of earth observation input feeds the random forest classifier, resulting in a quality-controlled LCZ map. In addition, an automated 3-step TA quality control aims to facilitate the revision of the original TA submission and resulting LCZ map. Previous work by Bechtel et al. (2017, 2019a) and Verdonck et al. (2019) highlighted that multiple iterations can significantly improve the overall accuracy of the LCZ map, and such iterations are thus recommended. Stewart and Oke (2012) suggested that the typical horizontal scale of a Local Climate Zone, reflecting an area of uniform surface cover, structure, and material, spans hundreds of meters to several kilometers. In addition, the number of TAs selected for each zone can be an indicator of zones that are hard to classify, and the WUDAPT protocol suggests digitizing compact and simple TA sets, characterized by a shape ratio close to one (Verdonck et al., 2019). Therefore, a summary table (_TA_statistics.csv) is added to the output, providing, for each available LCZ class, the number of polygons (Count, C), the average and total surface area (Avg. / Total area, km2), the perimeter (km), the shape (-), and number of vertices (-).
Subsequently, a 3-step automated quality control (QC) is applied to label suspicious TA polygons. In a first step (qc_step1), polygons with a surface area below 0.04 km2 (too small) or a shape ratio above 3 (too complex shape) are flagged. In a second step (qc_step2), the non-parametric density-based spatial clustering of applications with noise (DBSCAN) (Ester et al., 1996; Schubert et al., 2017) is used to identify whether the average spectral value of a polygon of LCZ class i is an outlier compared to the average spectral values of all other polygons of that class i. The method requires two parameters: ε, which is the maximum distance between two samples for one to be considered as in the neighborhood of the other, and MinPoints, the minimum number of samples in a neighborhood for a point to be considered a core point. Here, ε is set to 0.3 and MinPoints to C_i/10, based on a number of iterations and expert judgement. Since this method is efficient on large, multi-dimensional datasets, it is applied simultaneously on all earth observation input features discussed in section 2.2.
FIGURE 3 | Factsheet example for Saint Petersburg. Note that in reality, the factsheet also contains a "Terms of Service" and "Attribution" section (see section 2.5.4). These sections are omitted here for clarity.
A third and final QC step (qc_step3) considers all individual pixel values of all polygons in each LCZ class i, in contrast to the polygon averages used in qc_step2. The same parameter values for ε and MinPoints are used, and the procedure is also applied on all available input features simultaneously. The pixel's latitude and longitude coordinates here serve as a unique identifier to tag suspicious points within polygons.
If polygons are identified as suspicious, the user receives two shapefiles containing the results of the automated quality control procedure. The first shapefile (ID_auto_qc_polygon.shp) contains all polygons flagged as suspicious in at least one of the three steps. Since qc_step3 returns points, each polygon that intersects with at least one of these flagged points is added. All shapes in this file contain additional metadata fields characterizing their geometry (area, perimeter, shape, vertices) and a boolean value for each of the three QC steps: True (1) / False (0) in case a TA passed / failed one of the three QC tests. The second shapefile (ID_auto_qc_point.shp) contains the individual flagged points, which might provide additional insights into why certain polygons are flagged as suspicious. In case no polygons or points are labeled as suspicious, the same files are created yet only contain a point with a dummy identifier and a geometry indicating the center pixel of the ROI.
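The first two QC steps can be sketched as follows. This is a simplified, hypothetical illustration: it assumes the per-polygon mean features have already been rescaled to comparable ranges, uses Euclidean distance, counts a point as part of its own neighborhood, and rounds C_i/10 down with a floor of one. None of these details is specified in the text. The noise rule itself follows the standard DBSCAN definition: a point is noise if it is neither a core point nor within ε of a core point.

```python
import math


def qc_step1(area_km2, shape_ratio, min_area=0.04, max_shape=3.0):
    """Flag a polygon that is too small or has a too complex shape."""
    return area_km2 < min_area or shape_ratio > max_shape


def noise_indices(points, eps=0.3, min_points=None):
    """Return the indices of points that DBSCAN would label as noise.

    `points` holds one feature vector per polygon of a single LCZ class.
    """
    n = len(points)
    if min_points is None:
        min_points = max(1, n // 10)  # assumption: C_i / 10, rounded down
    # Neighborhoods include the point itself (distance 0 <= eps).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_points}
    # Noise: not a core point and not within eps of any core point.
    return [i for i in range(n) if i not in core and not core.intersection(neighbours[i])]
```

Applying `noise_indices` per LCZ class to polygon averages corresponds to the idea behind qc_step2; feeding it the individual pixel vectors instead mirrors the pixel-level variant, although the brute-force neighbor search used here scales quadratically and would need a spatial index for large inputs.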
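Linking the flagged points from qc_step3 back to their polygons amounts to a point-in-polygon test. The sketch below uses the classic ray-casting rule on plain coordinate lists; it is an assumption-laden illustration (planar coordinates, hypothetical function names and data layout), not the spatial intersection actually used in the back-end.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is the point inside the ring of (lon, lat) vertices?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def polygons_hit_by_points(polygons, flagged_points):
    """Return the sorted IDs of polygons containing at least one flagged point.

    `polygons` maps a polygon ID to its ring; `flagged_points` is a list of
    (lon, lat) tuples.
    """
    return sorted(
        pid
        for pid, ring in polygons.items()
        if any(point_in_polygon(lon, lat, ring) for lon, lat in flagged_points)
    )
```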

Generated Output
If the LCZ Generator successfully completes all processes, the user is notified via an e-mail that contains a compressed (.zip) archive as attachment. This archive (Table 3) contains the various outputs described in sections 2.2 and 2.3.
The output is listed in an online searchable and sortable submission table including information about the city, country, continent, date of the submission, overall accuracy, and a button (Show Factsheet) linking to the factsheet that provides a visual summary of all results. In case a user did not agree to display their name (see section 2.1), the Author field is left blank in both the submission table and factsheet. By checking one or multiple entries using the left-hand side check-boxes of the submission table, one can also download the corresponding .zip archive(s).
The submission table is structured as follows. If a user submitted multiple TAs for one city, only the submission having the best overall accuracy is displayed. In case multiple users submit TAs for the same city, only the best result is displayed, but this time for each individual user. A button (Show all submissions) allows the user to view and download all submissions including those where one author submitted multiple versions of TAs for the same city. This structure ensures that only results with the best possible quality are directly available for download, but also that this web application can be used for learning purposes and improving the TA creation technique without adding multiple previous submissions of minor quality to the table.
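The display rule for the submission table boils down to keeping the best result per author and city. A minimal sketch, assuming each submission is a dictionary with hypothetical keys "author", "city", and "oa" (overall accuracy):

```python
def best_submissions(submissions):
    """Keep, for every (author, city) pair, the submission with the highest OA."""
    best = {}
    for sub in submissions:
        key = (sub["author"], sub["city"])
        if key not in best or sub["oa"] > best[key]["oa"]:
            best[key] = sub
    return list(best.values())
```

The full list of submissions remains what the "Show all submissions" view would present; only the default view is reduced in this way.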
In the event the LCZ Generator fails after successfully submitting the TAs, the user is notified via e-mail as well. In this case, the developers automatically receive a message, and can use the log stored in the back-end to solve the issue.

Database
All data including the author and submission information, as well as the processing outputs are stored with a unique ID in a PostgreSQL database. The TAs are stored in a PostGIS table as individual polygons.

Versioning
The LCZ Generator code will be versioned according to semantic versioning (footnote 2): breaking changes to the application programming interface (API), including changes to the input features (Table 2), will be indicated by an incremented major version. After the release of version 1.0.0, and for each subsequent release, all changes will be described in a changelog, available on the issue page (section 2.5.3). The version used for creating each LCZ map is stored for each submission and included in the corresponding factsheet.

Support
Guidance on how to use the LCZ Generator is provided via the "Getting started" and "Frequently Asked Questions (FAQ)" pages, accessible via the navigation bar of the web application. If users run into issues while using the LCZ Generator, they can open a public issue on the application's GitHub issue tracker (footnote 3). In case security bugs are found, we ask the user not to create a public issue but instead to reach out to us directly via lcz-generator@rub.de.

Terms of Service and Attribution Guidelines
The web application uses the CC BY-SA 4.0 license (footnote 4) for all submissions made. The terms of service (footnote 5) need to be accepted upon submission. In addition, attribution guidelines (footnote 6) are provided on how to acknowledge the materials produced by the LCZ Generator, the authors of the TAs or any of the underlying methods used in the Generator's classification procedures. This information is also embedded at the end of the factsheet (see also section 3.1).

Test Samples
In this paper, the performance of the LCZ Generator web application is demonstrated via three new TA samples, compiled by three student assistants at the Ruhr University Bochum (Germany). The samples are from different urban ecoregions, which stratify urban areas based on general climate and vegetation characteristics, regional differences in urban topology, and the level of economic development (Schneider et al., 2010), and include Saint Petersburg (Russia, "Temperate forest in Asia"), Bamako (Mali, "Tropical, sub-tropical Savannah in Africa"), and Havana (Cuba, "Tropical broadleaf forest in South America"). The TAs are a first version, and did not undergo a manual review by an experienced operator.

RESULTS
This section presents and discusses all contents of the resulting .zip archive in more detail. Note that all LCZ results in this paper are displayed with labels 1-10 for the urban classes, and A to G for the natural classes, in line with Stewart and Oke (2012) (Figure 1). However, all underlying files output by the LCZ Generator use integers, with labels 11 to 17 for the natural classes.
FIGURE 7 | All TA polygons tagged as suspicious during the first quality control step, for Saint Petersburg. Color scheme as in Figure 1.
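Converting between the integer labels in the output files and the labels used in the figures is a simple lookup. The helper below is a hypothetical convenience function, not part of the LCZ Generator output:

```python
def lcz_label(value):
    """Map an integer LCZ code (1-17) to its display label (1-10, A-G)."""
    if 1 <= value <= 10:
        return str(value)
    if 11 <= value <= 17:
        return chr(ord("A") + value - 11)  # 11 -> A, ..., 17 -> G
    raise ValueError(f"not a valid LCZ code: {value}")
```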

LCZ Map and Accuracies
Feeding the random forest in a bootstrapping manner with the submitted TAs (Figure 4) and the earth observation input features (Table 2) results in a raw and filtered LCZ map, a pixel probability map (Figure 5) and overall accuracy metrics (Figure 6). Combined with the information from the factsheet (Figure 3) and the ID_TA_statistics.csv file, one can directly assess the amount and distribution of TA polygons. For Saint Petersburg, a total of 310 TA polygons are available, with the highest frequencies for LCZ 6 (Open lowrise) and 14 (Low plants), and the lowest for LCZ 9 (Sparsely built) and 10 (Heavy industry).
The raw and filtered LCZ maps (Figure 5) differ mainly in their fine-scale heterogeneity: as single pixels do not constitute an LCZ class, the Gaussian filter procedure is able to remove this granularity. Since the Gaussian parameters (standard deviation and kernel size) are currently derived by experts, and expected to differ between cities and continents, they deserve further attention and potential adjustments in future versions of the LCZ Generator. The probability maps indicate how often (in %) the mode LCZ class was mapped during the bootstrapping procedure. In general, areas covered by TAs are often mapped as the same LCZ class more than 80% of the time (>20/25 iterations). Areas at the boundaries of the ROI, e.g., the southern edge of the Havana domain, or east of Bamako, are often characterized by lower probability scores. Such information helps authors to identify where confusion exists in their ROI.
Finally, the accuracy of the LCZ map can be assessed using the accuracy metrics discussed in section 2.2 and displayed in Figure 6. For all three cities, the average overall accuracy metrics reach values above 0.5, a minimum accuracy level proposed by Bechtel et al. (2019a) to pass the automated quality control. The lowest class-wise F1 metrics can be seen for LCZs 9 and 10 in Saint Petersburg (corresponding to the LCZs with the lowest TA polygon frequencies), and LCZ 6 in Havana. Note that no F1 metric is available for LCZ 7 in Bamako, even though one TA polygon is available in the TA set (Figure 4A). This is because a single polygon does not suffice to perform a quality assessment, due to the stratified random sampling of the TAs into training and test data. This is in line with the results of the HUMan INfluence EXperiment (HUMINEX, Bechtel et al., 2017; Verdonck et al., 2019), indicating that, when the number of TAs for a specific zone is low, the representativeness of these TAs might be low, leading to lower accuracies. This is often caused by (inexperienced) authors spending a lot of time searching for TAs for all seventeen LCZs, even though some of the zones are not large enough or occur too sparsely in the city to constitute an LCZ.
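For a single pixel, the probability value can be thought of as the share of bootstrap iterations that agree with the most frequent class. A minimal sketch, assuming the per-iteration predictions for one pixel are available as a list (the function name and the tie-breaking by first occurrence are choices made for this example):

```python
from collections import Counter


def mode_and_probability(predictions):
    """Return the most frequent LCZ class and how often (in %) it was mapped."""
    lcz, count = Counter(predictions).most_common(1)[0]
    return lcz, 100.0 * count / len(predictions)
```

A pixel mapped as LCZ 6 in 21 of 25 iterations would thus receive a probability of 84%, which falls in the ">20/25 iterations" range mentioned above.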

Automated TA Quality Control
In total, 36 (25%), 80 (25%), and 27 (16%) polygons are flagged as suspicious in at least one of the quality control steps, for Bamako, Saint Petersburg, and Havana, respectively. Some examples from all cities and for each quality control step are described in more detail below. Figure 7 displays all polygons from Saint Petersburg flagged as suspicious during the first quality control step. Two polygons are flagged because they have a surface area below the 0.04 km 2 threshold (Figures 7C,H), the remainder because of their shape exceeding the maximum allowed value of 3. The latter polygons typically correspond to linear (narrow and very long) shapes, often pointing to rivers (LCZ 17, Figures 7E,G,I) or complex shapes not adhering to the guidelines of digitizing simple block shapes (Figures 7A,B). While these are not necessarily wrong, complex shapes may lead to a suboptimal sampling of the satellite input features, or may lead to a mixed spectral signature in Some examples for the second quality control step are shown in Figure 8. They are all natural LCZ classes consisting out of LCZ 11 (or A, Dense trees), 12 (or B, Scattered trees), 16 (or F, Bare soil or sand) and 17 (or G, Water). The true color RGB satellite information reveals that the dense tree polygon ( Figure 8A) might be closer to LCZ B (Scattered trees). This is supported by the spectral profiles in Figure 9A, with e.g., lower values for the forest canopy height (GFCH), and higher values for Landsat's red (L8_B4) and thermal infrared (L8_B10/B11) bands, when compared to the expected spectral value space for all LCZ 11 polygons. For the LCZ 12 polygons (Figures 8B,C), the true color satellite imagery reveals a rather heterogeneous landscape, covered by patches of dense and scattered trees, agricultural fields, bare soils, small settlements or sparsely built areas, and a small (seasonal) river. 
The latter two are captured by the higher than expected value for Landsat's NDWI (L8_NDWI) and a lower than expected enhanced built up and bare soil index (L8_EBBI), where lower EBBI values refer to built-up areas (Assyakur et al., 2012) ( Figure 9B). The polygon in Figure 8D is labeled as bare soil or sand, even though the man-made land use pattern suggest this area to be farm land, which should thus be labeled as LCZ 14 (or D, Low plants). This is also 7 http://www.wudapt.org/create-lcz-training-areas/ visible from Landsat's median, 10 and 90th percentile normalized difference vegetation index values (L8_NDVI(_P10/_P90) being higher than the expected LCZ 16 values ( Figure 9C). Lastly, the LCZ 17 polygons in Figures 9E,F represent two sections of the Niger river, characterized by strong fluctuations in water levels according to the rainy and dry seasons. Using the Global Surface Water Explorer 8 (Pekel et al., 2016) or Google's timelapse tool 9 , one can infer that these polygons are mapped in sections of the river that are seasonal and thus only have water for some time of the year. This is supported by the Landsat's NDVI and NDWI values for the LCZ 17 polygons (Figures 9D, 10): while all LCZ 17 polygons are sampling from the Niger river (Figure 4A), the NDWI values for the polygons in Figures 8E,F are significantly lower than those from the other polygons. The same but opposite observation can be made for the NDVI values.
The third quality control step performs a similar analysis as the second step, yet this time on the pixel level. Figure 11 displays a selected number of polygons over Havana, together with the pixels flagged as suspicious. The first polygon (Figure 11A) is labeled as LCZ 9 (Sparsely built), reflecting the small or medium-sized buildings widely spaced across a landscape with abundant vegetation. Yet the polygon also includes a water body large enough to be detected by the 100 m input feature pixels. Plotting the NDWI values of these pixels against, for example, the combined shadow index derived from Sentinel-2 (S2_CSI) reveals the outlier position of these pixels (Figure 12A). A similar analysis can be done for the other selected polygons: the LCZ 14 polygon in Figure 11B mostly consists of agricultural land, yet also contains a farm that is flagged as suspicious. The compact lowrise LCZ 3 polygon in Figure 11C contains a park in the middle, surrounded by trees, which is flagged as suspicious. The polygon in Figure 11D is labeled as LCZ 13 (Bush and scrub) even though it should probably be LCZ 14 (or D, Low plants). The flagged pixels in this case refer to areas with seasonal waters, which can again be visualized using Google Earth's historical imagery tool. Finally, Figures 11E,F are two additional examples of compact lowrise polygons. Even though some of the spectral signatures tend to be outliers compared to all other pixel values for this LCZ class (Figures 12E,F), it is not straightforward to pinpoint the exact reasons for the polygons to be flagged. In Figure 11E, a pixel with abundant vegetation is flagged, yet similar areas elsewhere in the polygon are not flagged. The polygon in Figure 11F represents a homogeneous neighborhood in terms of urban form, yet here the flagged pixel lies on top of a large-scale warehouse, potentially large enough to influence the pixel's spectral values with its different radiative characteristics.
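Returning to the first quality control step, its two geometric thresholds (a minimum area of 0.04 km² and a maximum shape value of 3) can be sketched as follows. The thresholds are taken from the text above, but the shape measure used here, the polygon perimeter divided by the circumference of a circle with equal area, is an assumption made for this example, as are all function names. Coordinates are assumed to be planar and expressed in kilometers.

```python
import math

MIN_AREA_KM2 = 0.04  # minimum area threshold quoted in the text
MAX_SHAPE = 3.0      # maximum shape value quoted in the text


def area_and_perimeter(coords):
    """Shoelace area and perimeter of a simple polygon with (x, y) in km."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shape_index(area, perimeter):
    """ASSUMED shape measure: perimeter relative to an equal-area circle."""
    if area == 0:
        return math.inf
    return perimeter / (2.0 * math.sqrt(math.pi * area))


def check_polygon(coords):
    """Return the list of reasons ('area', 'shape') a polygon is flagged."""
    area, perimeter = area_and_perimeter(coords)
    reasons = []
    if area < MIN_AREA_KM2:
        reasons.append("area")
    if shape_index(area, perimeter) > MAX_SHAPE:
        reasons.append("shape")
    return reasons


# Hypothetical examples: a compact block, a long narrow river strip,
# and a polygon that is too small.
block = [(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]
river = [(0, 0), (5, 0), (5, 0.1), (0, 0.1)]
tiny = [(0, 0), (0.1, 0), (0.1, 0.1), (0, 0.1)]
for name, poly in [("block", block), ("river", river), ("tiny", tiny)]:
    print(name, check_polygon(poly))
```

Under this assumed measure a square scores about 1.13, so only strongly elongated or convoluted outlines exceed the threshold of 3, in line with the river-like shapes discussed for Figure 7.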

DISCUSSION AND CONCLUSIONS
Since their introduction in 2012 (Stewart and Oke, 2012), Local Climate Zones (LCZs) have emerged as a new standard for characterizing urban landscapes, providing a holistic classification approach that takes into account micro-scale land-cover and associated physical properties. This is reflected by the growing number of scientific publications having "LCZ" or "Local Climate Zones" listed as keywords: according to Web of Science, as of February 4, 2021, a total of 139 papers were published, 38 of them in 2020 alone. The default LCZ mapping procedure, adopted as Level 0 (lowest level of detail) by the WUDAPT grassroots effort, and relying only on open-source data (Landsat 8) and software (SAGA GIS, Conrad et al., 2015), was certainly instrumental to this success (Ching et al., 2018). However, some features of this default procedure inhibit global up-scaling in a reasonable time, e.g., the need to download and pre-process Landsat 8 data from the United States Geological Survey (USGS) Earth Explorer, the need to run the LCZ classifier embedded in SAGA GIS on a local computer, the unavailability of an automated cross-validation, and the manual review by an experienced operator before the data is made publicly available.
The LCZ Generator addresses these shortcomings by adopting well-tested and well-documented cloud-based LCZ mapping strategies using Google Earth Engine (Gorelick et al., 2017; Brousse et al., 2019, 2020a; Demuzere et al., 2019b,c, 2020a; Varentsov et al., 2020). The result is an online platform that maps a city of interest into LCZs, solely expecting a valid TA file and some metadata as input. The web application simultaneously provides an automated accuracy assessment, in line with the cross-validation procedure detailed in Bechtel et al. (2019a). To date, this bootstrap-based accuracy assessment was not available in the SAGA GIS context, often leading to insufficiently robust accuracy assessments during the production of LCZ maps (Verdonck et al., 2017). In addition, the novel 3-step TA quality control facilitates the revision of the original TAs, allowing the user to revise the initial submission and re-submit it to the LCZ Generator, as previous work highlighted the importance of additional iterations (Bechtel et al., 2017; Verdonck et al., 2019). Results in this study reveal, for example, that users should be more careful when digitizing TAs (e.g., compact shapes, scales, and borders), and should take into account seasonal properties of the underlying land cover/use. Note however that this TA quality control implementation is still experimental, and has so far been tested successfully on only a limited number of TA samples. The LCZ Generator can assist in this respect by gathering more TA samples in order to populate a spectral LCZ library across urban (eco)regions (Jackson et al., 2010; Schneider et al., 2010; Demuzere et al., 2019c), enabling a better assessment of spectral outliers.
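The idea behind a repeated, polygon-based accuracy assessment can be sketched in a few lines. The example below is an illustration under stated assumptions, not the LCZ Generator's implementation: the number of repeats (25), the 70/30 split, the single input feature, and the toy nearest-mean classifier standing in for the actual classifier are all hypothetical choices, and the repeats use random splits without replacement. What the sketch does preserve is the key design point that training areas are split at the polygon level and per LCZ class, so that pixels from one polygon never end up in both the training and the test set.

```python
import random
from statistics import mean


def nearest_mean_classifier(train_pixels):
    """Toy stand-in classifier: predict the class with the closest mean value."""
    by_class = {}
    for label, value in train_pixels:
        by_class.setdefault(label, []).append(value)
    centers = {label: mean(values) for label, values in by_class.items()}
    return lambda value: min(centers, key=lambda label: abs(centers[label] - value))


def repeated_accuracy(polygons, n_repeats=25, train_fraction=0.7, seed=0):
    """Overall accuracy for repeated stratified polygon-level splits.

    polygons: list of (label, [pixel values]) tuples.
    n_repeats and train_fraction are illustrative assumptions.
    """
    rng = random.Random(seed)
    labels = sorted({label for label, _ in polygons})
    scores = []
    for _ in range(n_repeats):
        train, test = [], []
        for label in labels:  # stratify the split by LCZ class
            group = [p for p in polygons if p[0] == label]
            if len(group) < 2:
                raise ValueError(f"class {label!r} needs at least two polygons")
            rng.shuffle(group)
            # Keep at least one polygon per class on each side of the split.
            n_train = min(len(group) - 1, max(1, round(train_fraction * len(group))))
            train.extend(group[:n_train])
            test.extend(group[n_train:])
        predict = nearest_mean_classifier(
            [(label, v) for label, values in train for v in values]
        )
        test_pixels = [(label, v) for label, values in test for v in values]
        correct = sum(predict(v) == label for label, v in test_pixels)
        scores.append(correct / len(test_pixels))
    return scores


# Hypothetical NDWI-like samples for two well-separated classes.
polygons = [
    ("LCZ 17", [0.40, 0.42]), ("LCZ 17", [0.45, 0.43]), ("LCZ 17", [0.41, 0.44]),
    ("LCZ 11", [-0.30, -0.28]), ("LCZ 11", [-0.35, -0.31]), ("LCZ 11", [-0.29, -0.33]),
]
scores = repeated_accuracy(polygons)
print(f"mean overall accuracy over {len(scores)} repeats: {mean(scores):.2f}")
```

Reporting the spread of the per-repeat scores, rather than a single number from one split, is what makes the resulting accuracy estimate more robust.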
The LCZ Generator should be considered a dynamic application that will be updated whenever new scalable mapping techniques and globally available input features become available. Future updates will be tracked via the software version number and described in the changelog available on the GitHub Issue page. For example, some studies have successfully tested the use of object-based image analysis (Collins and Dronova, 2019; Simanjuntak et al., 2019), while others obtained promising results using (residual) convolutional neural networks (Qiu et al., 2019; Yoo et al., 2019; Liu and Shi, 2020; Rosentreter et al., 2020; Zhu et al., 2020). To date, however, the feasibility of such procedures for large-scale LCZ mapping has not been demonstrated. Many others have developed GIS-based approaches using datasets from e.g., city administrations or derived from crowd-sourced cartographic services such as OpenStreetMap (Lelovics et al., 2014; Quan et al., 2017; Samsonov and Trigub, 2017; Wang et al., 2018; Hidalgo et al., 2019; Quan, 2019; Oliveira et al., 2020; Zhou et al., 2020). The latter study also proposes an extension to the default WUDAPT accuracy assessment, by integrating GIS data (e.g., building footprints and heights, and pervious surface fraction). While all these efforts are considered valuable, they have one thing in common that limits their implementation into the LCZ Generator: the underlying datasets are to date not globally available.
We anticipate that the LCZ Generator will ease the production, quality assessment and dissemination of LCZ maps and related products. This easy-to-use and accessible online platform should therefore continue to support researchers and practitioners in using the LCZ framework for a variety of applications, such as urban heat (risk) assessment studies, climate sensitive design and urban planning (policies) (Perera and Emmanuel, 2016; Vandamme et al., 2019; Maharoof et al., 2020), anthropogenic heat and building carbon emissions (Wu et al., 2018; Santos et al., 2020), quality of life (Sapena et al., 2021), multi-temporal urban land change (Vandamme et al., 2019; Wang et al., 2019), and urban health issues (Brousse et al., 2020a). This development will in addition accelerate the key aim of WUDAPT, that is "to capture consistent information on urban form and function for cities worldwide that can support urban weather, climate, hydrology and air quality modeling" (Ching et al., 2018). Examples of modeling systems currently using LCZ information are the Surface Urban Energy and Water Balance Scheme (SUEWS, Alexander et al., 2016), ENVI-met (Bande et al., 2020), the urban multi-scale environmental predictor (UMEP, Lindberg et al., 2018), MUKLIMO_3 (Bokwa et al., 2019; Gál et al., 2021), COSMO-CLM and the WUDAPT-TO-COSMO tool (Wouters et al., 2016; Brousse et al., 2019, 2020b; Varentsov et al., 2020), and the Weather Research and Forecasting model (WRF, Brousse et al., 2016; Hammerberg et al., 2018; Wong et al., 2019; Patel et al., 2020; Zonato et al., 2020). While WRF currently uses the WUDAPT-to-WRF tool to ingest LCZ information (Brousse et al., 2016), its next release expected in spring 2021 should offer this compatibility by default (A. Zonato, personal communication).
To conclude, and in line with the assessment of Creutzig et al. (2019), we firmly believe that the LCZ Generator has the potential to become a key part in mainstreaming and harmonizing urban data collection, upscaling urban climate solutions, and effecting change at the global scale.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found at: https://lcz-generator.rub.de.

AUTHOR CONTRIBUTIONS
The authors jointly devised the concept of the LCZ Generator. MD developed the LCZ-related codes. JK developed the database and front- and back-end. MD developed all visualizations. MD led the writing with contributions from JK and BB.