Typical Modes of the Wind Speed Diurnal Variation in Beijing Based on the Clustering Method

Wind speed is an important meteorological condition affecting the urban environment, so analyzing the typical characteristics of its diurnal variation is helpful for forecasting pollutant diffusion. Based on the K-means clustering method, the diurnal variation characteristics of the wind speed in Beijing during 2008–2017 are studied, and their spatiotemporal characteristics are analyzed. The results show that there are mainly five to seven typical clusters of the wind speed diurnal variation at different stations in Beijing, and the number of clusters near the city is smaller than that in the suburbs. The number of typical clusters during 2013–2015 is smaller than that in other periods, which means that the anomalous clusters of the diurnal variation were reduced. In addition, the cluster numbers often switch from year to year; switches between five and six clusters and between six and seven clusters are especially frequent. Based on the second clustering of the clustering results at the stations in Beijing, we find 12 clusters of the diurnal variation, including nine clusters of “large in the daytime, while small at night,” two clusters of “monotonous,” and one cluster of “strong wind.” Furthermore, the low-speed clusters are mainly located in the city and show a significant increasing trend, while the high-speed clusters and the monotonous clusters are located in the suburbs and show a decreasing trend.


INTRODUCTION
There are significant environmental problems in big cities and industrial areas [1]. Surface meteorological conditions are important factors affecting air quality [2], and strong wind is associated with the rapid diffusion of pollutants [3]. With the rapid expansion of cities, the urban heat island effect has become significant, and the heat island circulation in the daytime is more significant than that at night, which means that the characteristics of the wind speed diurnal variation have changed [4,5]. Beijing, one of the largest cities in China, is located at the northern end of the North China Plain, where the Taihang Mountains and the Yanshan Mountains meet. The special terrain leads to increasingly serious environmental problems, which become more significant with the expansion of the city [6]. In Beijing, the concentration of the main pollutants shows a two-peak pattern that coincides with the rush hours [7]. If the wind speed is not high enough during these two periods, the pollutants do not dissipate easily. The timing of strong wind is therefore important for pollution.
Extracting the typical modes of the daily variation of wind speed helps to study the timing of strong wind. Classification has been verified to be useful for extracting typical modes and can obtain more information from the system [8,9]. In classification, typical spatial modes can be extracted by taking the spatial field as the sample (Makra et al. [10]). Similarly, taking the diurnal variation as the sample, the typical diurnal variation modes can be extracted. Clustering is an effective technique for extracting typical modes. The K-means clustering method [11,12] is the most widely used clustering method; it classifies a set of samples into k clusters according to the average distance from each sample to the cluster center. Clustering is a form of unsupervised learning, which does not rely on predefined samples and can automatically learn and label samples through iteration [13]. At present, it is widely used in fields such as machine learning [14], image recognition [15,16], speech recognition [17], and climate change [18]. Because the K-means clustering method is based on calculating the spatial distance, it is generally applied to numerical samples. Therefore, texts, risk levels, and logical decisions need to be quantified first [19]. The clustering analysis algorithm is simple and easy to operate. However, on the one hand, clustering tends to fall into local optima and to be unstable because of the randomness of the initial values; that is, the clustering result depends on the selection of the initial values [20]. On the other hand, the selection of the k value is generally subjective and lacks self-adaptability. To solve the problem of selecting the initial values, improved K-means algorithms such as the Kd-tree [21], the K-means++ [22], the cluster center initialization algorithm [23], and the fast search and find of density peaks [24] were proposed. For self-adaptation of the k value, the author of Ref. 13 proposed a new method based on the degree of dispersity and aggregation, which automatically determines the k value by presetting a large k value and then degenerating. The authors of Ref. 25 also proposed a support vector machine decision tree method to determine the k value based on the dichotomy K-means. In addition, the elbow method is widely used to determine the k value because it is simple to operate. The authors of Ref. 26 proposed a new method to automatically obtain the k value based on the elbow method. These new methods make up for the shortcomings of the K-means method and effectively promote the development and application of the K-means clustering method. The time consumption of the clustering algorithm in the iteration process is another problem that must be considered, because it increases linearly with the size of the dataset. Therefore, a second clustering method is proposed in this study to reduce time consumption and improve clustering efficiency.
In this study, the clustering analysis of the characteristics of the wind speed diurnal variation in Beijing is carried out based on the K-means clustering method. In "Results and Analyses" section, the data and the method are introduced briefly. In "First Clustering: The Number of Clustering" section, the clustering analysis is carried out for each station based on the hourly wind speed data at 160 observation stations during 2008–2017, and the typical characteristics of the diurnal variation and the number of main clusters are obtained. In "Second Clustering: Typical Characteristics of the Wind Speed Diurnal Variation" section, according to the classification results, a second clustering is carried out to obtain the typical modes of the wind speed diurnal variation in Beijing. The temporal and spatial variations of the typical modes are also analyzed.

Data
Hourly wind speed observations at meteorological stations in Beijing are used. After quality control, relatively complete hourly wind speed data at 160 stations are retained, covering the period from January 1, 2008 to December 31, 2017. The distribution of the stations is shown in Figure 1. Most of them are in the flat region of Beijing. Some stations in Yanqing district, Mentougou district, and Fangshan district are more than 1,000 m above sea level. The city and the suburbs are divided by the Fifth Ring Road. There are significantly more stations in the city than in the suburbs.

K-Means Clustering Method and Its Improvements
The K-means algorithm is a common unsupervised clustering algorithm that is often applied in data mining [26]. The calculation steps are as follows:
1) Let N be the number of samples; k samples are selected randomly as the initial cluster centers.
2) The Euclidean distance $D_{i,j}$ from each sample $x_i$ to each cluster center $c_j$ is calculated according to Eq. 1, $D_{i,j} = \sqrt{\sum_{t} (x_{i,t} - c_{j,t})^2}$ (1), where t runs over the components of the sample. The sample $x_i$ is assigned to the cluster center $c_j$ for which $D_{i,j}$ is the shortest. All samples are processed in this way and assigned to cluster centers.
3) To make the cluster centers as representative as possible, each cluster center is recalculated from the samples assigned to it.
4) Steps 2 and 3 are repeated until the cluster assignment of each sample no longer changes.
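The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation; the function name `kmeans`, the `seed` and `max_iter` parameters, and the toy profiles are assumptions made for the example.

```python
import math
import random


def kmeans(samples, k, seed=0, max_iter=100):
    """Plain K-means on a list of equal-length numeric sequences."""
    rng = random.Random(seed)
    # Step 1: pick k samples at random as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to the center with the shortest
        # Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centers[j]))
            for x in samples
        ]
        # Step 4: stop once no sample changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy "diurnal profiles" with three values each: two calm days, two windy days.
days = [[1.0, 1.2, 1.1], [0.9, 1.1, 1.0], [4.0, 5.2, 4.1], [4.2, 5.0, 3.9]]
centers, labels = kmeans(days, k=2)
```

With real data, each sample would hold the hourly wind speeds of one day rather than three values.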
The K-means method has two shortcomings. The first is that the random selection of the initial values leads to different results. To address this, an ensemble approach is used in this study: we repeat the clustering several times with randomly selected initial values, calculate the ensemble results, and finally determine the clustering results. The second shortcoming is that the selection of the k value directly affects the clustering results. The optimal number of clusters is related to the structure of the data themselves, which is hard to determine, so it is very difficult to find the optimal k value. Therefore, the elbow method is used to determine the k value in this study.
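The elbow method is usually applied by reading the bend off a plot of the average distance against k. One possible way to automate that reading is sketched below; the function name `pick_elbow`, the relative tolerance `tol`, and the numbers in `curve` are assumptions for illustration and are not taken from the study.

```python
def pick_elbow(avg_dist, tol=0.05):
    """Return the smallest k after which the curve is nearly flat.

    avg_dist maps each candidate k to the average distance between the
    samples and their cluster centers.
    """
    ks = sorted(avg_dist)
    for k, k_next in zip(ks, ks[1:]):
        if avg_dist[k] == 0:
            return k  # the samples are already matched exactly
        # Relative improvement gained by moving on to the next k.
        drop = (avg_dist[k] - avg_dist[k_next]) / avg_dist[k]
        if drop < tol:
            return k
    return ks[-1]  # no flat region found within the tested range


# Hypothetical curve: steep decrease up to k = 4, nearly flat afterwards.
curve = {1: 10.0, 2: 6.0, 3: 4.0, 4: 3.0, 5: 2.95, 6: 2.92}
print(pick_elbow(curve))  # 4
```

The result depends on `tol`, so a rule like this supports rather than replaces a visual check of the curve.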

Second Clustering
If the number of samples is too large to be clustered directly, a second clustering method can be used. The first clustering is carried out according to the characteristics of the samples, and the clustering results are then used as samples for the second clustering. As shown in Figure 2, the clustering is first carried out at each station, and the second clustering is then performed on the resulting clusters. This distributed clustering can greatly reduce the calculation time and save computing resources. $Y_{ij}$ ($i = 1, 2, 3, \ldots, m$; $j = 1, 2, 3, \ldots, n$) represents the samples for the first clustering, where i indexes the stations and j indexes the moments. $C_{il}$ ($i = 1, 2, 3, \ldots, m$; $l = 1, 2, 3, \ldots, k$) represents the first clustering results, which also serve as the samples of the second clustering, where k is the number of first-stage clusters at each station. $P_t$ ($t = 1, 2, 3, \ldots, o$) represents the results of the second clustering.
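The data flow of the second clustering can be sketched as follows. This is a minimal sketch under stated assumptions: the compact `kmeans` helper, the function name `second_clustering`, and the station names and profiles are all hypothetical, and every first-stage center is treated as one equally weighted sample in the second stage.

```python
import math
import random


def kmeans(samples, k, seed=0, max_iter=100):
    """Compact K-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: math.dist(x, centers[j]))
               for x in samples]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def second_clustering(station_samples, k_first, k_second):
    """station_samples: {station: [profile, ...]}, i.e. the Y_ij."""
    pooled = []  # first-stage centers C_il collected from all stations
    origin = []  # (station, l) for each pooled center
    for station, profiles in station_samples.items():
        centers, _ = kmeans(profiles, k_first)
        for l, center in enumerate(centers):
            pooled.append(center)
            origin.append((station, l))
    # The second stage clusters the pooled centers into the patterns P_t.
    patterns, labels = kmeans(pooled, k_second)
    return patterns, dict(zip(origin, labels))


# Hypothetical two-value profiles at two stations.
data = {
    "A": [[1.0, 1.5], [1.1, 1.4], [5.0, 6.0], [5.2, 6.1]],
    "B": [[0.9, 1.6], [1.0, 1.5], [4.9, 6.2], [5.1, 5.9]],
}
patterns, pattern_of = second_clustering(data, k_first=2, k_second=2)
```

Because the first stage runs independently at each station, the stations can be processed separately, and the second stage only sees a small number of centers.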

Analyses of the Clustering Results at a Single Station
Taking Shunyi station as an example, we illustrate the process of the K-means clustering. The hourly data during 2008–2017 are taken as samples, and the total number of samples is 87,600. First, the elbow method is used to determine the k value, as shown in Figure 3A. The X-axis represents the k value, and the Y-axis is the average Euclidean distance between the samples and their corresponding cluster centers. Note that the average distance is 0 when k = N. In practice, we want k to be as small as possible while the clusters still represent the samples. In Figure 3A, the average distance decreases continuously as k increases. When k > 10, the average distance is nearly constant. Thus, k is set to 10. Next, the ensemble method is used to minimize the influence of the initial value selection. In this study, the initial values are selected randomly 100 times, and the cluster frequencies of the runs are shown in Figure 3B as a percentage box plot. The 10 clusters are marked as C1, C2, ..., and C10, respectively. Their average sample proportions are 29.80, 19.01, 12.25, 9.96, 8.56, 6.10, 4.90, 4.14, 3.03, and 2.25%, respectively. Their variances are small, which means that the cluster membership of most samples does not change between runs. A given sample may nevertheless be assigned to different cluster centers in different runs; we assign such a sample to the cluster center to which it is assigned most often in the 100 runs. Finally, 10 cluster centers of wind speed at Shunyi station are obtained and shown in Figure 4. Among them, the wind speed cluster of C1 shows the diurnal variation characteristic of "small at night and large in the daytime" with the maximum wind speed around 12:00 CST (China Standard Time; the same below). The wind speed cluster of C2 shows a skewed distribution, with the maximum wind speed in the afternoon. The diurnal variation of C3 is similar to that of C1, but the wind speed is higher than that of C1. The diurnal variation of C4 is similar to that of C2, but the wind speed is higher than that of C2. The wind speed cluster of C5 is different from the other clusters, showing a monotonically increasing diurnal variation. The diurnal variation of C7 is also significantly different from that of the other clusters, showing a decreasing characteristic. The wind speed of C8 remains low before 10:00 CST and increases rapidly from 11:00 to 16:00 CST. The diurnal variation of C9 shows a linear increase followed by a linear decrease. The diurnal variation of wind speed of C10 is similar to that of C6.
Using the clustering method, we can obtain the typical characteristics of the wind speed diurnal variation. However, the temporal and spatial distributions of the typical characteristics are not yet clear, nor is the number of typical characteristics (clusters) that can be obtained at different stations. Therefore, the observed wind speed data of the other 159 stations in Beijing are clustered in the same way as those of Shunyi station.
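The ensemble step used for Shunyi station, in which a sample is given the cluster it receives most often over the repeated runs, can be sketched as a majority vote. Cluster labels from independent runs are arbitrary, so they have to be matched before voting; the text does not say how this was done, and the greedy nearest-center matching below is an assumption. The names `align_labels` and `ensemble_vote` and the toy runs are hypothetical.

```python
import math
from collections import Counter


def align_labels(ref_centers, centers, labels):
    """Relabel one run so that cluster j is the one closest to ref_centers[j]."""
    free = set(range(len(centers)))
    mapping = {}
    for j, ref in enumerate(ref_centers):
        # Greedy choice: the closest center that is still unmatched.
        best = min(free, key=lambda c: math.dist(ref, centers[c]))
        mapping[best] = j
        free.remove(best)
    return [mapping[lab] for lab in labels]


def ensemble_vote(runs):
    """runs: list of (centers, labels) from K-means with random starts."""
    ref_centers = runs[0][0]
    aligned = [align_labels(ref_centers, c, lab) for c, lab in runs]
    # For each sample, keep the label that appears most often across runs.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Three toy runs over four samples; the second run has its labels swapped.
runs = [
    ([[1.0], [5.0]], [0, 0, 1, 1]),
    ([[5.1], [0.9]], [1, 1, 0, 0]),
    ([[1.1], [4.9]], [0, 1, 1, 1]),
]
print(ensemble_vote(runs))  # [0, 0, 1, 1]
```

Greedy matching is adequate when the centers of different runs are close to each other; an optimal assignment could be substituted if the runs differ more strongly.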

Numbers of the Clusters at Different Stations
The above analyses show that the diurnal variation of the wind speed in Beijing is diverse and that the sample numbers of different clusters can differ greatly. Thus, only the typical clusters are analyzed. Cluster analyses are carried out based on the diurnal variations of the wind speed at 160 stations in Beijing, with a cluster number (k) of 10. According to Figure 3B, the first few clusters contain the most samples and therefore represent most of the data, so they are considered to be typical clusters. In this study, a cluster is considered typical if its sample percentage is greater than 5%.
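The 5% criterion can be checked against the cluster shares reported for Shunyi station. The sketch below only restates that rule; the function name `count_typical` is an assumption made for the example.

```python
def count_typical(shares_percent, threshold=5.0):
    """Number of clusters whose sample share exceeds the threshold (in %)."""
    return sum(1 for share in shares_percent if share > threshold)


# Average cluster shares reported for Shunyi station (C1 to C10), in percent.
shunyi = [29.80, 19.01, 12.25, 9.96, 8.56, 6.10, 4.90, 4.14, 3.03, 2.25]
print(count_typical(shunyi))  # 6, because C1 to C6 each exceed 5%
```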
The spatial distribution of the typical cluster number at different stations is shown in Figure 5A. The number of typical clusters in urban and flat areas is smaller than that in suburban mountainous areas. There are 6 stations with 4 clusters, which are mainly in the suburbs, including Shunyi district, Huairou district, and Daxing district, with a low average altitude of 80.83 m (Figure 5B). There are 29 stations with 5 clusters, which are mainly located in the city (including Chaoyang district, Haidian district, Fengtai district, Shijingshan district, Dongcheng district, and Xicheng district) and in areas near the city (including Shunyi district, Changping district, and the south of Huairou district). Their average altitude is 113.17 m. There are 79 stations with 6 clusters, which are mainly in the urban area, and their average altitude is 172.85 m. There are 31 stations with 7 clusters. Some of them are located in the city, and the others are in areas far away from the city, including Miyun district, Pinggu district, and the south of Daxing district. Their average altitude is 243.48 m. There are 12 stations with 8 clusters and 3 stations with 9 clusters, which are mainly in the northern area, such as Yanqing district, Miyun district, and Pinggu district. Their average altitudes are 335.75 and 588.00 m, respectively. In conclusion, stations at lower altitudes have fewer typical clusters, mainly 5 in urban areas and 7 in the suburbs, whereas stations at high altitudes mainly have 8 or 9 clusters.

The Interannual Variation of the Cluster Number at Different Stations
The spatial and temporal variations of the typical cluster numbers in different years are further studied. The relationship between the typical cluster number and the station number is shown in [...] The spatial distribution of the annual cluster numbers is shown in Figure 7. Before 2011, the cluster numbers in Yanqing district, Miyun district, Pinggu district, and Tongzhou district, which are all suburban areas, were mainly 7–9, while those in the other areas were mainly 5–6. During 2011–2012, the cluster numbers in the suburban areas, including Yanqing district and Miyun district, changed from 8–9 to 6, while the cluster numbers in the urban area and Shunyi district changed from 7 to 6. During 2013–2015, the cluster numbers in Yanqing district, Huairou district, Fangshan district, Daxing district, and the urban area changed from 6–7 to 4–5. This variation shows that the number of main clusters was decreasing. During 2016–2017, the cluster numbers in the urban areas, including Yanqing district, Changping district, and Pinggu district, changed from 4–5 to 6–7, which means that the diurnal variation of wind speed has become more significant. The variation of the cluster numbers in recent years is further studied, as shown in Figure 8. Five to seven clusters remain unchanged before and after the transformation.
In summary, the cluster numbers of the wind speed diurnal variation in different regions of Beijing are significantly different. In urban areas, the cluster numbers are mainly 5 and 6. In the suburbs, the cluster numbers are mainly 7–9. Before 2015, the cluster numbers mainly changed from 7–9 to 4–5, and they increased after 2016. Over the 10 years, the changes at most stations are between 5 and 6 clusters and between 6 and 7 clusters, and decreases are more frequent than increases.
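Statements about which switches between cluster numbers are most frequent amount to a tally of (before, after) pairs over the stations. A minimal sketch of such a tally follows; the function name `count_transitions` and the station values are hypothetical and are not the study's data.

```python
from collections import Counter


def count_transitions(before, after):
    """Count stations by (cluster number before, cluster number after).

    before and after map each station to its number of typical clusters
    in two periods.
    """
    pairs = Counter()
    for station, n_before in before.items():
        if station in after:  # skip stations missing in the later period
            pairs[(n_before, after[station])] += 1
    return pairs


# Hypothetical station values, for illustration only.
before = {"S1": 6, "S2": 6, "S3": 7, "S4": 5, "S5": 5}
after = {"S1": 5, "S2": 6, "S3": 6, "S4": 6, "S5": 5}
pairs = count_transitions(before, after)
decreasing = sum(n for (a, b), n in pairs.items() if b < a)
increasing = sum(n for (a, b), n in pairs.items() if b > a)
print(decreasing, increasing)  # 2 1
```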

Classification and Diurnal Variation of Different Wind Speeds
The cluster results at different stations are simplified by the second clustering, in which the cluster results at all stations in Beijing are used as new samples. Figure 9A is the elbow diagram of the second clustering. When the cluster number k is larger than 12, the average distance is almost constant. Therefore, the cluster number is set to 12, and the clusters are marked as P1, P2, ..., and P12, respectively. Figure 9B shows the percentage of cumulative days at stations with different clusters. P1–P3 clusters each account for more than 10% (39.85, 21.04, and 11.39%, respectively). Their cumulative percentage is 72.28%, representing most of the wind speed diurnal variation. The proportions of P4, P5, and P6 are 9.04, 4.64, and 4.31%, respectively. The days represented by P7–P12 are fewer than the others, accounting for only 9.73% in total, as shown in Figure 10.
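One way to obtain percentages of days per second-stage cluster is to carry the day counts of the first-stage clusters through to the pattern they were assigned to. The sketch below assumes this bookkeeping; the function name `pattern_shares`, the dictionary layout, and the counts are hypothetical, and the text does not state how the reported percentages were computed.

```python
from collections import defaultdict


def pattern_shares(day_counts, pattern_of):
    """Percentage of days falling into each second-stage pattern.

    day_counts: {(station, first_cluster): number of days}
    pattern_of: {(station, first_cluster): pattern label such as "P1"}
    """
    totals = defaultdict(int)
    for key, days in day_counts.items():
        totals[pattern_of[key]] += days
    all_days = sum(totals.values())
    return {p: 100.0 * n / all_days for p, n in totals.items()}


# Hypothetical counts for two stations with two first-stage clusters each.
day_counts = {("A", 0): 600, ("A", 1): 400, ("B", 0): 700, ("B", 1): 300}
pattern_of = {("A", 0): "P1", ("A", 1): "P2", ("B", 0): "P1", ("B", 1): "P2"}
shares = pattern_shares(day_counts, pattern_of)
print(shares)  # {'P1': 65.0, 'P2': 35.0}
```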
The wind speed diurnal variations of different types are shown in Figure 10. The wind speeds of P1–P3 clusters are significantly lower than those of the other clusters, and the diurnal average wind speeds are 1.07, 1.56, and 1.95 m s⁻¹, respectively. The wind speed diurnal variation of P1 cluster presents a quasi-symmetric structure: it is stable (about 1.00 m s⁻¹) during 18:00–08:00 CST and increases during 08:00–14:00 CST, reaching a maximum of 1.46 m s⁻¹. Then, the wind speed decreases during 14:00–18:00 CST. The wind speed of P2 cluster is asymmetric. It is almost constant (1.00 m s⁻¹) during 00:00–08:00 CST and increases during 08:00–15:00 CST, with a maximum of 2.63 m s⁻¹. The wind speed decreases during 15:00–19:00 CST and decreases slowly during 19:00–23:00 CST. The wind speed of P3 cluster is also distributed asymmetrically. It is about 2 m s⁻¹ during 00:00–08:00 CST and then increases. The maximum wind speed occurs at 11:00 and 12:00 CST (2.53 m s⁻¹ at both times). Then, the wind speed decreases slowly and remains constant after 19:00 CST (the average wind speed is 1.43 m s⁻¹).
Compared with P1–P3 clusters, P4–P6 clusters have higher average wind speeds of 2.24, 3.03, and 2.14 m s⁻¹, respectively. The wind speed diurnal variation of P4 cluster is similar to that of P2 cluster. The wind speed hardly changes during 00:00–08:00 CST (the average wind speed is 1.31 m s⁻¹) and then increases. The maximum wind speed occurs at 15:00 CST and is 3.99 m s⁻¹. The wind speed decreases during 15:00–19:00 CST and becomes constant during 19:00–23:00 CST. The wind speeds of P5 and P6 clusters are monotonous. P5 cluster is a monotonously decreasing type, and P6 is a monotonously increasing type. The wind speed diurnal variation of P7 and P8 is slightly similar to that of P4, but the maximum wind speeds appear at different times. The maximum wind speeds of P7 and P8 clusters occur at 12:00 and 15:00 CST and are 4.41 and 5.07 m s⁻¹, respectively. Before reaching the [...] The variation of P12 cluster is different from that of the other clusters. The diurnal variation is insignificant. The wind speed shows a linear increasing trend during 00:00–08:00 CST. Then, the wind speed decreases rapidly during 08:00–10:00 CST and remains nearly constant at 8.88 m s⁻¹ during 11:00–23:00 CST.
To sum up, the diurnal variation of wind speed at all stations in Beijing can be divided into 12 clusters. Except for P5, P6, and P12, the wind speed diurnal variations of the clusters show the characteristics of "large in the daytime and small at night." However, differences in the time of the maximum wind speed and in the wind speed itself lead to multiple clusters of this type. P5 is a monotonic increasing cluster. P6 is a monotonic decreasing cluster. The wind speed of P12 is high without significant diurnal variation.

Interannual Variation and Trend of Different Wind Speed Clusters
Furthermore, the interannual variations of different wind speed cluster frequencies are analyzed, as shown in Figure 11. For P1 cluster, the frequency increases rapidly during 2008–2013, with a trend of 672.54 a⁻¹ (significant at the 98% confidence level, according to the linear trend regression test (LTRT)), which means that low wind speeds became increasingly frequent. In 2014–2017, the increasing trend stops (the annual average frequency is 2.37 × 10⁴). P1 cluster is mainly distributed within the Fifth Ring Road, which might be related to the larger roughness of the city. The frequency of P2 cluster does not increase or decrease significantly in the past 10 years, with an annual average frequency of 1.23 × 10⁴. The frequency of P3 cluster shows a significant negative trend of −105.89 a⁻¹ (significant at the 95% confidence level, according to the LTRT). The frequency of P4 cluster in 2009 (6,062 times) is significantly higher than the annual average (5,193.44 times). After removing this year, there is a significant increasing trend of 42.39 a⁻¹ (significant at the 95% confidence level, according to the LTRT). The frequencies of P5 and P6 significantly decrease with trends of −87.55 a⁻¹ and −71.99 a⁻¹, respectively (both significant at the [...]
In conclusion, there are significant differences in the variation trends of different wind speed clusters in different years. The variation trend of P2 is not significant. P1 and P4 show significant increasing trends. P3, P5, and P6 show significant decreasing trends. The frequencies of P7–P10 decrease before 2014 and then increase. The frequencies of P11 and P12 increase after 2014.
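A trend in a⁻¹ and its significance can be estimated with an ordinary least-squares fit of the annual frequency against the year, followed by a t test on the slope. Whether this matches the LTRT used in the study is an assumption; the function name `linear_trend` and the annual frequencies below are hypothetical.

```python
import math


def linear_trend(years, freq):
    """Least-squares slope (per year) and its t statistic."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(freq) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, freq))
    if sse == 0:
        return slope, math.inf  # perfect fit, no residual scatter
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical annual frequencies for six years.
years = [2008, 2009, 2010, 2011, 2012, 2013]
freq = [100, 210, 290, 410, 500, 590]
slope, t_stat = linear_trend(years, freq)
# With n - 2 = 4 degrees of freedom, |t| > 2.776 is significant at the
# 95% level (two-sided).
print(round(slope, 2), t_stat > 2.776)  # 98.29 True
```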

Clusters of Wind Speed at Different Stations
There are regional differences in the frequency of the wind speed clusters at different stations. The frequencies of P1–P12 clusters at [...] Figure 12. P1 cluster is the most common type. As for the spatial distribution, the frequencies at stations in urban areas are significantly higher than those in suburban areas. There are 25 stations with a frequency of more than 2,400 times, and 81 stations in the whole area with a frequency of more than 1,200 times, showing that P1 cluster is the main cluster at most stations. The 12 stations with frequencies of less than 800 times are mainly in Yanqing district, Pinggu district, and Fangshan district. The 24 stations where the frequency of P2 cluster is more than 1,200 times are mainly distributed in Yanqing district, Huairou district, Miyun district, and Pinggu district, whereas the frequency at most urban stations is less than 400 times. The stations with high frequencies of P3 cluster are mainly in the suburbs. There are 56 stations with frequencies of more than 500 times, which are mainly distributed in Fangshan district, Changping district, Yanqing district, Miyun district, Pinggu district, and Shunyi district. The frequency of P3 cluster in most urban areas is less than 400 times. P4 cluster mainly occurs in the northwest of Beijing, including Mentougou district, Yanqing district, and Changping district. There are 26 stations with a frequency of more than 500 times, while the frequency of P4 cluster at most urban stations is less than 300 times. As for P5 cluster, there are 17 stations with a frequency of more than 400 times, which are mainly distributed in Yanqing district, Changping district, and Pinggu district. As for P6 cluster, there are 12 stations with a frequency of more than 400 times, which are mainly distributed in the urban area and Miyun district. The frequencies of P7 and P8 clusters are low. There are 9 and 10 stations with a frequency of more than 300 times, respectively, which are distributed in the west and northwest of Beijing.
There are few stations with P9-P12 clusters. As for P9 cluster, there are seven stations with a frequency of more than 200 times, which are mainly in Changping district, and most of the other stations are less than 100 times. As for P10 cluster, there are nine stations with a frequency of more than 200 times, which are mainly in Changping district and Yanqing district. As for P11 cluster, there are 15 stations with a frequency of more than 200 times, which are mainly distributed in Yanqing district. There are only four stations with P12 cluster. Among them, the frequency at Lingshan station in Mentougou district is 440 times, and the frequency at Foyeding station in Yanqing district is 396 times. At these two stations, the altitudes are 1,669 and 1,217 m, respectively. Therefore, the wind speeds are always high.
To sum up, there are differences in the main areas of different clusters of wind speed. P1 cluster mainly appears in urban areas. P2 cluster is mainly distributed in urban areas and northern areas. P3 cluster is mainly distributed in the central and northern areas. P4 and P5 clusters are mainly distributed in the northwest. P6 cluster is mainly distributed in the central area.

CONCLUSION AND DISCUSSION
In this study, the initial value of the K-means clustering method is selected using an ensemble method. The hourly observation data at 160 observation stations in Beijing in recent 10 years are used for cluster analyses. The different clusters of the wind speed diurnal variation at different stations are studied, and the spatial and temporal variations of the cluster numbers and types at different stations are analyzed. The conclusions are as follows.
1) The cluster analyses are carried out at each station. The wind speed at most stations can be divided into four to nine clusters, and the main clusters are five to seven clusters. There are mainly five to six clusters near the urban area and seven clusters far away from the urban area. The altitudes are high at the stations with 8 and 9 clusters.
2) As for the long-term variation, the number of stations with cluster numbers of four to five increased significantly during 2013-2015, and the number of stations with cluster numbers of six to eight decreased, which means the total number of the wind speed clusters decreased during this period. As for the transformation of cluster numbers in the recent 10 years, the stations with five to six clusters and six to seven clusters tend to transform more than the others, and the transformation to fewer cluster numbers is more than that to more cluster numbers.
3) For all stations, the diurnal variation of the wind speed can be divided into 12 clusters including 9 clusters of "large in the daytime and small at night," with 1 cluster of monotonous increase, 1 cluster of monotonous decrease, and 1 cluster of strong wind. Among them, nine different clusters of "large in the daytime and small at night" are mainly caused by the different time and value of the maximum wind speed.
4) As for the long-term variation trend, P1 and P4 increase significantly. P3, P5, and P6 decrease. P7-P12 show opposite trends before and after 2014. As for the spatial distribution, P1 cluster is mainly in urban areas, while other types are mainly in suburbs.
The daily variation of wind speed at the stations near the urban area is consistent, while in the suburban area the diurnal variation of wind speed differs considerably among stations, especially at high altitudes. At more and more stations, the difference in the daily variation of wind speed is small, and the wind speed itself is low. Against the background of urbanization, more and more buildings increase the surface roughness, reduce the wind speed, and reduce the difference in the daily variation of wind speed among stations [27]. This is not conducive to the dissipation of urban pollutants and deserves more attention from the government.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.