Regional Characteristics of Precipitation in the Nanpan River Basin, China

The Nanpan River is the source of the Pearl River in China; thus, the exploitation of water resources in the Nanpan River Basin directly affects development in the middle and lower reaches of the Pearl River. In the present study, the availability of water resources in the Nanpan River Basin and its regional differences were investigated. Sixteen statistical variables, including the 25th and 75th percentiles and the coefficient of variation of the seasonal and annual precipitation and the annual precipitation concentration index, were examined using monthly precipitation data collected at 33 stations in the Nanpan River Basin from 1956 to 2016. This paper examines the relationship between Spearman's rank correlation coefficient and the distance between stations, and uses principal component analysis (PCA) and cluster analysis to identify homogeneous precipitation regions in the basin. The results reveal the following: 1) The Spearman's rank correlation coefficients for the monthly, seasonal and annual precipitation of the stations are negatively correlated with interstation distance, and this spatial correlation is stronger at shorter time scales. 2) In addition to precipitation amount, the interannual and intra-annual variations in precipitation control the spatial pattern of precipitation in the basin. 3) Precipitation in the Nanpan River Basin falls into two homogeneous regions, which are associated with the influences of the South Asian monsoon, the North Atlantic Oscillation and the South Branch Trough. The first region lies mainly east of 104°E, and the second mainly west of it.


INTRODUCTION
The Pearl River is a major river in southern China that connects East China and West China. It has the second-largest discharge of any river in China, and its downstream area is one of the most economically developed regions in the country. Existing studies on precipitation in the Pearl River Basin have mainly focused on its lower reaches (Zhang et al., 2009; Fischer et al., 2010; Fischer et al., 2012). However, the Nanpan River Basin, the source of the Pearl River system (Dou and Qiyang, 2020), and particularly its precipitation, has received relatively little attention.
In recent years, climate change has increased the heterogeneity of the temporal and spatial distribution of precipitation (Li et al., 2016; Vu et al., 2018), changing the intensity and frequency of precipitation (Ma et al., 2015; Cooley and Chang, 2017) and, in turn, affecting the development and utilization of water resources (Immerzeel et al., 2012; Berghuij et al., 2017). Research on the spatial distribution and regionalization of precipitation supports the development and utilization of water resources (Berndtsson and Niemczynowicz, 1988; Dinpashoh et al., 2004; Ilbay-Yupa et al., 2021). Correlation analysis, principal component analysis (PCA), spectral analysis and cluster analysis (Fazel and Berndtsson, 2018) are commonly used for this purpose, and combining PCA with cluster analysis can better describe the spatial distribution of precipitation and identify homogeneous precipitation regions (Dinpashoh et al., 2004; Modarres and Sarhadi, 2011; Fazel and Berndtsson, 2018). However, the spatial distribution of precipitation differs from basin to basin (Fazel and Berndtsson, 2018), and although the Nanpan River is the source of the river with the second-largest discharge in China, few studies have examined the spatial distribution and zoning of precipitation in its basin.
Therefore, we performed spatial correlation analysis on the monthly precipitation data from 33 precipitation stations in the Nanpan River Basin. The relationship between Spearman's rank correlation coefficient and interstation distance was analysed to reveal the spatial variation and overall pattern of regional precipitation and to determine whether the basin is a homogeneous precipitation area (Berndtsson and Niemczynowicz, 1988). On this basis, 16 precipitation-related statistical indicators were extracted for PCA; the principal component scores were used in a cluster analysis that divided the Nanpan River Basin into two homogeneous precipitation regions; the differences between the two regions were analysed; and the effects of the South Asian monsoon, the North Atlantic Oscillation (NAO) and the South Branch Trough on the two regions were investigated.

Study Area and Data
The Nanpan River Basin is located in Yunnan Province, China, and has a drainage area of 43,181 km² (Chen et al., 2017). The Nanpan River provides important water resources for the development of the Central Yunnan Economic Zone in Yunnan Province (Xu-Yu and Zhang, 2012). The topography of the basin is complex, with high elevations in the west and low elevations in the east; elevations range from 297 to 2,820 m, the highest peak being Liangwang Mountain (2,820 m above sea level). The basin spans 102.2-106.2°E and 23.2-26.8°N. It lies in the subtropical monsoon climate zone: the wet season lasts from May to October, the dry season lasts from November to April of the following year, and precipitation is concentrated in summer (Zhou et al., 2006).
In this study, the monthly precipitation data of 33 precipitation stations with complete records for 1956-2016 were selected to analyse the regional characteristics of precipitation in the Nanpan River Basin. The data came from the Yunnan Hydrology and Water Resources Bureau, China, an organization that specializes in data collection and the management of regional hydrology and water resources, so the data are deemed reliable. The elevations of the selected stations range from 530 to 2,060 m (a maximum difference of 1,530 m between any two stations), and the stations are fairly evenly distributed. The stations span 102.4-105.82°E and 23.38-26.22°N (Figure 1). From the monthly precipitation data, the multiyear average precipitation and the spring (March-May), summer (June-August), autumn (September-November) and winter (December-February) precipitation at each station were derived. The multiyear average precipitation ranges from 712.1 to 1,646.7 mm (Table 1). The spatial distribution of precipitation is uneven: the wettest station, Luoping, receives 934.6 mm more precipitation than the driest station, Yuguopo. In addition, the South Asian summer monsoon index (SASMI) (http://www.lijianping.cn) (Jianping and Zeng, 2002; Li and Zeng, 2003) and the North Atlantic Oscillation Index (NAO Index) (Jones et al., 1997) (http://www.cru.uea.ac.uk/cru/data/nao/) were collected.
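The seasonal and annual totals described above can be derived from the monthly records roughly as follows. This is a minimal Python sketch, not the authors' processing code: the dictionary layout keyed by (year, month) is an assumption, and so is assigning each winter (December-February) to the year of its January and February, which the text does not specify.

```python
from statistics import mean

# Season definitions from the text; winter (DJF) crosses the year boundary.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}


def seasonal_totals(monthly):
    """Aggregate monthly precipitation (mm) into annual and seasonal totals.

    monthly: dict mapping (year, month) -> precipitation in mm.
    Returns {"annual": {year: total}, "spring": {...}, ...}.
    Assumption: winter of year Y = December of Y-1 plus January and February of Y.
    Years or seasons with any missing month are skipped.
    """
    years = sorted({year for (year, _month) in monthly})
    totals = {"annual": {}}
    totals.update({name: {} for name in SEASONS})
    for year in years:
        months = [monthly.get((year, m)) for m in range(1, 13)]
        if None not in months:
            totals["annual"][year] = sum(months)
        for name, season_months in SEASONS.items():
            # December belongs to the previous calendar year for winter.
            keys = [(year - 1 if (name == "winter" and m == 12) else year, m)
                    for m in season_months]
            values = [monthly.get(k) for k in keys]
            if None not in values:
                totals[name][year] = sum(values)
    return totals


def multiyear_mean(series):
    """Multiyear average of a {year: total} series."""
    return mean(series.values())
```

A complete year of data always yields an annual total, but the first winter of the record is dropped because the preceding December is missing.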

Methods
In this study, Spearman's rank correlation coefficients between the precipitation series of station pairs are first calculated and related to interstation distance to describe the spatial correlation of precipitation. Sixteen statistical indicators are then extracted from the monthly precipitation data, and the principal components of these indicators are obtained by dimensionality reduction using PCA. Finally, cluster analysis is carried out on the principal component scores of the individual stations to identify the spatial pattern of precipitation and the homogeneous precipitation regions in the Nanpan River Basin.

Spearman's Rank Correlation Coefficient
Spearman's rank correlation coefficient ρ is a nonparametric measure of the strength of association between two variables x and y (Hubert, 1972; Szmidt and Kacprzyk, 2010) whose relationship can be described by a monotonic function. If y tends to increase as x increases, then ρ > 0; if y tends to decrease as x increases, then ρ < 0; and if y shows no tendency to increase or decrease as x increases, then ρ ≈ 0. Because it is computed from ranks rather than raw values, the Spearman coefficient is less sensitive to outliers and non-normal distributions than Pearson's correlation coefficient. The correlation coefficient ρ is given by the following equation:
$$\rho = 1 - \frac{6\sum_{i=1}^{N}\left[\operatorname{rank}(x_i) - \operatorname{rank}(y_i)\right]^2}{N\left(N^2 - 1\right)} \tag{1}$$
where x and y are two random variables, N is the number of paired observations, $x_i$ and $y_i$ denote the ith values of the two variables ($1 \le i \le N$), and rank(·) is the rank of a value within its series. Spatial correlations between stations can be derived by combining Spearman's rank correlation coefficients for station precipitation with the interstation distances.
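The procedure above can be sketched in Python as follows, as an illustration rather than the authors' implementation. Spearman's ρ is computed as the Pearson correlation of ranks (tied values receive their average rank), which reduces to Eq. 1 when there are no ties. Interstation distance is assumed to be the great-circle (haversine) distance on a spherical Earth of radius 6,371 km, since the text does not say how distances were measured. All function names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (equals Eq. 1 without ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def rho_vs_distance(series, coords):
    """series: {station: precipitation list on a common time axis};
    coords: {station: (lat, lon)}.
    Returns the (distance_km, rho) points and Pearson R between distance and rho."""
    pairs = []
    for a, b in combinations(sorted(series), 2):
        d = haversine_km(*coords[a], *coords[b])
        pairs.append((d, spearman(series[a], series[b])))
    dists, rhos = zip(*pairs)
    return pairs, pearson(list(dists), list(rhos))
```

Running `rho_vs_distance` once per time scale (monthly, seasonal, annual series) gives one scatter of points and one R per panel of a distance-correlation plot.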

PCA
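PCA is a validated climate zoning method (Richman, 2010). The main steps are as follows:
1) Standardize the original data.
2) Calculate the correlation coefficient matrix R of the standardized variables.
3) Find the eigenvalues and eigenvectors of the correlation matrix R.
4) Determine the number of principal components. Arrange the eigenvalues in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, and choose the smallest m such that
$$\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_m}{\lambda_1 + \lambda_2 + \cdots + \lambda_p} \ge 85\%$$
i.e., the first m principal components explain more than 85% of the information in all variables. The eigenvector corresponding to the eigenvalue $\lambda_j$ is denoted $u_j = [u_{1j}, u_{2j}, \cdots, u_{pj}]^{\mathsf{T}}$ ($j = 1, 2, \cdots, m$).

TABLE 1 | Geographic information and annual and seasonal precipitation at the rainfall stations of the Nanpan River Basin.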
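Steps 1-4 of the PCA procedure can be sketched with NumPy as below. This is an illustrative sketch assuming a stations × indicators matrix (for example 33 × 16), not the authors' SPSS workflow, and the function name is hypothetical. Because the eigenvalues of a correlation matrix sum to the number of variables p, the cumulative share is the running sum of the eigenvalues divided by p.

```python
import numpy as np


def pca_retain(X, threshold=0.85):
    """Standardize, form R, eigen-decompose, and keep the first m components
    whose cumulative share of total variance reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each indicator (column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix of the indicators.
    R = np.corrcoef(Z, rowvar=False)
    # Step 3: eigen-decomposition; eigh returns ascending order for symmetric R.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # lambda_1 >= lambda_2 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: smallest m whose cumulative share >= threshold.
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum_share, threshold) + 1)
    return Z, R, eigvals, eigvecs[:, :m], cum_share, m
```

The columns of the returned eigenvector block are the $u_j$ of step 4, ordered by decreasing eigenvalue.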

5) Perform Varimax Rotation
Varimax rotation is adopted to simplify the structure of the factor loading matrix and to facilitate the interpretation of each principal factor. The rotation is chosen to maximize the variance V of the squared loadings of each factor, summed over all factors, and iterating on this criterion yields the loading matrix with the simplest structure.
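As a sketch of the rotation step, the function below implements varimax with Kaiser's pairwise-angle method. Each pair of factor columns is rotated by the angle that maximizes the varimax criterion for that pair, and the sweep repeats until the angles become negligible. Kaiser row normalization is applied by default, as in many statistics packages. Whether the authors' software used normalization is an assumption, and the function name and tolerances are hypothetical.

```python
import numpy as np


def varimax(loadings, normalize=True, max_sweeps=100, tol=1e-10):
    """Varimax rotation of a p x m loading matrix by pairwise planar rotations.

    Returns (rotated_loadings, T) with rotated_loadings = loadings @ T and T orthogonal.
    """
    L = np.asarray(loadings, dtype=float).copy()
    p, m = L.shape
    # Kaiser normalization: scale each row to unit communality, undo at the end.
    h = np.sqrt(np.sum(L ** 2, axis=1, keepdims=True)) if normalize else np.ones((p, 1))
    L = L / h
    T = np.eye(m)
    for _ in range(max_sweeps):
        max_angle = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                x, y = L[:, j], L[:, k]
                u, v = x ** 2 - y ** 2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u ** 2 - v ** 2), 2 * np.sum(u * v)
                # Angle that maximizes the varimax criterion for columns j and k.
                phi = 0.25 * np.arctan2(D - 2 * A * B / p, C - (A ** 2 - B ** 2) / p)
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                L[:, [j, k]] = L[:, [j, k]] @ G
                T[:, [j, k]] = T[:, [j, k]] @ G
                max_angle = max(max_angle, abs(phi))
        if max_angle < tol:
            break
    return L * h, T
```

Because only orthogonal rotations are applied, each station's communality (row sum of squared loadings) is unchanged. Only the distribution of the loadings across factors changes.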

6) Calculate the Factor Score
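The factor score function is
$$F = A^{\mathsf{T}} R^{-1} X \tag{2}$$
where F is the vector of factor scores, A is the rotated factor loading matrix, X is the vector of standardized original variables, and R is the correlation coefficient matrix of the original variables.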
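Assuming the regression (Thomson) method, a common default for factor scores, the scores for all stations can be computed in row form as F = Z R⁻¹ A, where Z holds the standardized data (stations × indicators) and A the rotated loadings. The sketch below is illustrative, and the helper names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores using the sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def regression_factor_scores(Z, loadings):
    """Regression-method factor scores F = Z R^{-1} A for standardized data Z."""
    R = np.corrcoef(Z, rowvar=False)
    # Solve R B = A rather than forming R^{-1} explicitly (better conditioned).
    B = np.linalg.solve(R, loadings)
    return Z @ B
```

For principal component loadings $A = U_m \Lambda_m^{1/2}$, possibly followed by an orthogonal rotation, these scores have zero mean and unit variance and are mutually uncorrelated. This makes them convenient inputs for the cluster analysis.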
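In addition, before PCA, the Kaiser-Meyer-Olkin (KMO) test needs to be run (Henry, 1970) to verify whether the data are suitable. The test value ranges from 0 to 1; it should be greater than 0.5 and, ideally, close to 1 for the data to be suitable for PCA. The KMO statistic is calculated as follows:
$$\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^{2}}{\sum_{i \ne j} r_{ij}^{2} + \sum_{i \ne j} u_{ij}^{2}} \tag{3}$$
where $r_{ij}$ are the elements of the correlation matrix and $u_{ij}$ are the partial correlation coefficients between variables i and j, controlling for all other variables.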
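The KMO statistic can be computed from the correlation matrix alone, because the partial correlations follow from its inverse P as $u_{ij} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$. The NumPy sketch below returns the overall KMO value. It is an illustration, not the software the authors used, and the function name is hypothetical.

```python
import numpy as np


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure for an n x p data matrix X."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    U = -P / np.outer(d, d)                 # partial correlations u_ij
    off = ~np.eye(R.shape[0], dtype=bool)   # sum over i != j only
    r2 = np.sum(R[off] ** 2)
    u2 = np.sum(U[off] ** 2)
    return r2 / (r2 + u2)
```

When variables share strong common factors, simple correlations are large while partial correlations are small, which pushes the KMO value towards 1.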
The 16 precipitation indicators used in this study (the precipitation concentration index (PCI) and the 25th percentile, 75th percentile and coefficient of variation (CV) of annual and seasonal precipitation) include indicators of precipitation amount (the 25th and 75th percentiles of annual and seasonal precipitation) as well as indicators of the intra-annual distribution and interannual variation of precipitation (PCI and CV). PCA reduces the number of variables and retains only the main components for further study.
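For illustration, the 16 indicators for one station might be computed as follows. The sketch rests on several assumptions, because the text does not give the formulas. It uses Oliver's definition of the PCI (100 Σpᵢ² / (Σpᵢ)² over the 12 monthly totals of a year) and averages the yearly PCI values over the record. It also uses linearly interpolated percentiles and computes the CV with the sample standard deviation. The function names and dictionary keys are hypothetical.

```python
import numpy as np


def pci(monthly_12):
    """Oliver's precipitation concentration index for one year of 12 monthly totals."""
    p = np.asarray(monthly_12, dtype=float)
    return 100.0 * np.sum(p ** 2) / np.sum(p) ** 2


def cv(totals):
    """Coefficient of variation (sample standard deviation / mean)."""
    t = np.asarray(totals, dtype=float)
    return t.std(ddof=1) / t.mean()


def station_indicators(monthly, totals):
    """monthly: (years x 12) array of monthly precipitation;
    totals: {"annual": [...], "spring": [...], "summer": [...], "autumn": [...], "winter": [...]}.
    Returns 16 indicators: mean PCI plus P25, P75 and CV of each series."""
    out = {"PCI": float(np.mean([pci(row) for row in np.asarray(monthly)]))}
    for name in ("annual", "spring", "summer", "autumn", "winter"):
        t = np.asarray(totals[name], dtype=float)
        out[f"{name}_P25"] = float(np.percentile(t, 25))
        out[f"{name}_P75"] = float(np.percentile(t, 75))
        out[f"{name}_CV"] = float(cv(t))
    return out
```

Under Oliver's definition, a perfectly uniform year gives a PCI of 100/12 ≈ 8.3, and a year with all precipitation in one month gives 100.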

Cluster Analysis
Cluster analysis is an important method for quantitatively studying the classification and zoning of samples. Common cluster analysis methods include hierarchical cluster methods, dynamic cluster methods, and fuzzy cluster methods. The basic principle is that, according to the attributes of the samples, the affinity between the samples is quantitatively described using mathematical methods based on a certain similarity or difference, and the samples are clustered according to the calculated affinity (Xu, 1996).
In this study, to determine which stations in the Nanpan River Basin are similar, hierarchical cluster analysis was performed with Ward's method, one of the most widely used methods in climate zoning, in SPSS Statistics 26 (Uvo, 2003; Sarhadi and Heydarizadeh, 2013; Fazel and Berndtsson, 2018). The cosine measure was used (Hajeer, 2012), calculated as shown in Eq. 4:
$$d_{ij} = \frac{\sum_{k=1}^{n} x_{ik} x_{jk}}{\sqrt{\sum_{k=1}^{n} x_{ik}^{2} \sum_{k=1}^{n} x_{jk}^{2}}} \tag{4}$$
where $x_{ik}$ and $x_{jk}$ are the kth statistics calculated for stations i and j, respectively, n is the number of statistics for each station, and $d_{ij}$ is the cosine measure between stations i and j (the cosine of the angle between their vectors of statistics).
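The sketch below shows one way to pair the cosine measure of Eq. 4 with Ward's method. This pairing is an assumption, not a description of SPSS's internals. Rows are scaled to unit length, so that the squared Euclidean distance between two stations equals 2(1 − cos), and Ward's merging rule is then applied through the Lance-Williams update. The function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine measure d_ij (Eq. 4) between rows of X."""
    U = np.asarray(X, dtype=float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return U @ U.T


def ward_clusters(X, k):
    """Agglomerative Ward clustering on unit-normalized rows, cut at k clusters."""
    n = len(X)
    # Squared Euclidean distance between unit vectors = 2 * (1 - cosine).
    D = 2.0 * (1.0 - np.clip(cosine_similarity_matrix(X), -1.0, 1.0))
    np.fill_diagonal(D, np.inf)
    clusters = {i: [i] for i in range(n)}
    sizes = {i: 1 for i in range(n)}
    active = set(range(n))
    while len(active) > k:
        # Closest pair of active clusters under the current Ward distance.
        best = None
        for a in active:
            for b in active:
                if a < b and (best is None or D[a, b] < D[best]):
                    best = (a, b)
        a, b = best
        na, nb = sizes[a], sizes[b]
        # Lance-Williams update for Ward's method (on squared distances).
        for c in active - {a, b}:
            nc = sizes[c]
            D[a, c] = D[c, a] = ((na + nc) * D[a, c] + (nb + nc) * D[b, c]
                                 - nc * D[a, b]) / (na + nb + nc)
        clusters[a] += clusters.pop(b)
        sizes[a] = na + nb
        active.remove(b)
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        labels[members] = label
    return labels
```

Applied to a stations × principal-component-scores matrix, `ward_clusters(scores, 2)` would return one region label per station.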

Spatial Correlation
Based on Eq. 1, Spearman's rank correlation coefficients of precipitation between stations were calculated at different time scales, and their relationship with interstation distance for all 33 stations is shown in Figure 2. This analysis reveals the spatial correlation at different time scales and whether the basin is a homogeneous precipitation region. Figure 2A shows that on the annual scale, Spearman's rank correlation coefficient varies widely, between -0.2 and 0.9, and that the variation is greatest when the interstation distance is less than 50 km. This can occur because slopes with the same exposure receive rainfall of similar magnitude: stations on equally oriented slopes may be more strongly correlated than stations that are closer together but on differently oriented slopes. This behaviour is known as the hole effect (Bacchi and Kottegoda, 1995). As distance increases, the range of Spearman's rank correlation coefficient gradually narrows; at 200 km, it lies between 0 and 0.6. The Pearson correlation coefficient R between interstation distance and Spearman's rank correlation coefficient of annual precipitation is -0.058, which is significant at the 90% confidence level, indicating that annual precipitation has a weak negative spatial correlation.
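The significance of R can be checked with the usual t statistic, t = R√(n − 2)/√(1 − R²). The sketch below assumes that all 33 × 32 / 2 = 528 station pairs enter the annual-scale regression. It also approximates the t distribution by the standard normal, which is adequate for several hundred degrees of freedom. Neither assumption is stated in the text, and the function name is hypothetical.

```python
import math


def pearson_r_significance(r, n_pairs):
    """Return (t, one-sided p, two-sided p) for H0: population correlation = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and, as a large-sample
    approximation, the standard normal CDF instead of the t distribution.
    """
    df = n_pairs - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # standard normal CDF at t
    p_one_sided = cdf if t < 0 else 1.0 - cdf          # tail in the direction of r
    p_two_sided = 2.0 * min(cdf, 1.0 - cdf)
    return t, p_one_sided, p_two_sided


# Assumption: every pair of the 33 stations contributes one point to Figure 2A.
n_pairs = 33 * 32 // 2
t, p1, p2 = pearson_r_significance(-0.058, n_pairs)
print(f"pairs={n_pairs}, t={t:.3f}, one-sided p={p1:.3f}, two-sided p={p2:.3f}")
```

Under these assumptions, t ≈ -1.33, the one-sided p-value is about 0.09 and the two-sided p-value is about 0.18. The reported 90% level is therefore consistent with a one-tailed test. Pairs that share a station are not independent, so the nominal degrees of freedom are optimistic.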
As shown in Figures 2B-F, at a given distance the range of Spearman's rank correlation coefficients on the seasonal and monthly scales is smaller than that on the annual scale, and the coefficient gradually decreases with increasing distance. At these scales, the absolute value of R between distance and Spearman's rank correlation coefficient exceeds 0.7: the R values for spring, summer, autumn, winter and monthly precipitation are −0.791, −0.713, −0.804, −0.83 and −0.924, respectively, all significant at the 99.9% confidence level. These results indicate a significant negative spatial correlation in the seasonal and monthly precipitation of the Nanpan River Basin. Across time scales, |R| is smallest at the annual scale (-0.058), intermediate at the seasonal scale (−0.791, −0.713, −0.804 and −0.83) and largest at the monthly scale (−0.924). Thus, the spatial correlation of precipitation in the Nanpan River Basin is stronger at finer time scales, and the basin is not a homogeneous precipitation region. PCA and cluster analysis can therefore be used to further analyse and identify homogeneous precipitation regions.
Frontiers in Environmental Science | www.frontiersin.org January 2022 | Volume 9 | Article 783515

Homogeneous Rainfall Groups and Precipitation Characteristics
The Kaiser-Meyer-Olkin (KMO) test value of 0.647 obtained using Eq. 3 indicates that the selected variables are suitable for PCA. Based on the monthly precipitation data, 16 precipitation indicators, including the 25th and 75th percentiles and the CVs of the annual and seasonal precipitation and the PCI, were calculated for each station (Table 2). PCA of these indicators yielded five principal components (PC1-PC5), and varimax rotation was applied to make the influencing variables easier to interpret. According to the rotated component matrix (Table 3), PC1, PC2, PC3, PC4 and PC5 explain 39.2, 16.3, 11.5, 11.4 and 10.1% of the total variance, respectively (Figure 3A), i.e., 88.5% in total. The comprehensive scores of the five PCs were obtained from the factor scores calculated using Eq. 2, and the spatial distribution of the comprehensive scores for each station was mapped using ArcGIS 10.2 (Figure 4).
According to Table 3, the indicators load differently on the components; a high loading (>0.75) indicates a significant correlation between an indicator and the rotated principal component. The high loadings after rotation are all significant at the 95% confidence level.
According to the data presented in Table 3 and displayed in Figure 3B, PC1, which explains 39.2% of the total variance, is associated with high loadings for the 25th and 75th percentiles of the total precipitation in spring, summer and autumn, as well as for the 25th percentile of winter precipitation. The highest loading, 0.927, for the 25th percentile of summer precipitation suggests that seasonal precipitation is the main factor controlling PC1. Spatially, Figure 4A shows that PC1 scores are low in the west and high in the east of the basin, ranging from -3 to 8. Because PC2, which explains 16.27% of the total variance, is dominated by the 25th and 75th percentiles of annual precipitation, annual precipitation emerges as its principal controlling factor. PC2 scores range from -2 to 1 across the basin, with negative values in the east and positive values elsewhere (Figure 4B). Therefore, the most important factors affecting PC1 and PC2 are the annual precipitation and the 25th and 75th percentiles of the seasonal precipitation, which highlights the importance of precipitation amount.
In the CV calculation, n is the number of years of data, $R_i$ is the annual (seasonal) total precipitation of each year (season), and $\bar{R}$ is the average annual (seasonal) total precipitation.
The high positive PC3 scores along the southeastern edge of the basin highlight the importance of PCI variability in the precipitation pattern of the region (Figure 4C). The CVs for winter and spring, which have high positive loadings, are the main factors controlling PC4, and PC5 reflects the contribution of the autumn CV to the precipitation pattern of the region (Figure 4E). These results indicate that the interannual and intra-annual variations in precipitation also influence the spatial pattern of precipitation in the basin.
Therefore, according to the results of the PCA, the integrated influence of the precipitation at each station and the interannual and intra-annual variation of precipitation account for the spatial pattern of precipitation in the Nanpan River Basin.
The PCA produced five PCs that explained 88.5% of the total variance, and the principal component scores for each station are presented in Table 4. Two homogeneous precipitation regions were then delineated in the Nanpan River Basin by hierarchical cluster analysis using the cosine measure in Eq. 4 (Figure 5). The first region comprises nine stations in the east of the basin (mainly east of 104°E) at elevations between 600 and 1,485 m. Its annual average precipitation ranges from 990.42 to 1,646.67 mm, indicating that the entire region receives high precipitation. The second region comprises 24 stations in the west of the basin (mainly west of 104°E) at elevations between 960 and 2,060 m. Its annual average precipitation, ranging from 712.13 to 1,005.64 mm, is generally lower than that of the first region, highlighting the role of differences in precipitation in the delineation of the regions. Among the 25th and 75th percentiles of seasonal precipitation and the multiyear average precipitation, the 75th percentile of winter precipitation differs by only 2.23% between the two regions, whereas the other indicators differ by more than 19%, with a maximum of 36.73% (Table 5). Overall, the 25th and 75th percentiles of spring, summer and autumn precipitation and the 25th percentile of winter precipitation are the main factors controlling PC1, whereas the 25th and 75th percentiles of annual precipitation are the principal factors influencing PC2.
The relationships between the main factors controlling PC1 and elevation were also analysed (Table 6). Except for the 25th percentile of autumn precipitation, which is positively correlated with elevation, all indicators are negatively correlated with elevation. These results reflect the influence of topography, including elevation, on the differences in precipitation. Therefore, the two homogeneous precipitation regions in the basin reflect differences in precipitation across the basin and the associated influence of elevation.

DISCUSSION
In the present study, monthly precipitation data collected in 33 stations in the Nanpan River Basin from 1956 to 2016 were used to compute spatial correlations between data for various stations. The PCA and cluster analyses were then integrated to highlight the characteristics of precipitation in the basin. According to the relationships between the Spearman's rank correlation coefficient of the precipitation recorded at each station and the distances between the stations in the Nanpan River Basin, the strength of the spatial correlation increases as the time scale becomes finer. Based on the cluster analysis, the first homogeneous precipitation region is mainly to the east of longitude 104°E, and it is characterized by lower elevations and higher precipitation values relative to the second region. The second region, which is associated with higher elevations and lower precipitation values, is mainly to the west of longitude 104°E. These results are probably linked to the effects of the water vapour transport and weather systems on the precipitation in the basin.
First, precipitation in the Nanpan River Basin occurs mainly in summer, and summer precipitation in the area originates mainly from water vapour transported by the South Asian monsoon (Webster and Song, 1992). This monsoon is a coupled climate system (Webster et al., 1998) that is correlated with summer precipitation in the Northern Hemisphere (Jianping and Zeng, 2002) and thus affects the amount of summer precipitation. According to Zheng and Huang (2012), the influence of the South Asian monsoon on summer precipitation in Yunnan Province decreases from east to west. In the present study, the R values between the SASMI and summer precipitation for the two homogeneous precipitation regions in the Nanpan River Basin are 0.382 and 0.359 (Table 7), both significant at the 90% confidence level. The higher R in the first region than in the second indicates that the South Asian monsoon exerts a stronger influence on precipitation in the east of the basin than in the west; in other words, the influence of the South Asian monsoon on summer precipitation in the basin decreases from east to west. This finding is consistent with the conclusion of Zheng and Huang (2012) that this influence weakens from east to west across Yunnan Province. Second, in winter, because of the influence of the subtropical westerlies, the water vapour for precipitation in the Nanpan River Basin comes primarily from the Bay of Bengal and is transported to Yunnan Province mainly by the activity of the South Branch Trough.
However, the North Atlantic Oscillation (NAO) weakens the activity of the South Branch Trough and thus partially hinders the flow of moist air into Yunnan Province (Yun and Wei, 2014; Li et al., 2016). Therefore, in the present study, the Pearson correlations between the NAO index and winter precipitation in the two subregions of the Nanpan River Basin were examined (Table 8). The first and second regions yielded correlation coefficients of 0.255 and 0.233, respectively, and the coefficient for the first region was significant at the 90% confidence level. To further highlight differences between the two regions in the relationship between winter precipitation and the NAO index, the correlations for each station located between 23°N and 24°N (where station density is high) were analysed (Table 9). The correlations for stations in the first region are higher than those for stations in the second region. Combining these results with Figure 6, the correlation between winter precipitation and the NAO index increases with increasing longitude (R = 0.82). These results indicate that the influence of the NAO on precipitation in the Nanpan River Basin changes with longitude. According to previous studies (Duan and Tao, 2012), water vapour associated with the South Branch Trough is transported to Yunnan Province both from east to west and from west to east. The water vapour transport gradually weakens during east-to-west movement, whereas during west-to-east transport the influence of the South Branch Trough on precipitation is weakened by the higher elevations in the second region. In addition, during the dry season (November-April), the Kunming quasi-stationary front (KQSF), an important weather system affecting the Yunnan-Kweichow Plateau (Jong, 1949; Jong, 1950; Duan and Ying, 2002), occurs near 104°E.
In fact, the KQSF is an important weather phenomenon in southwestern China in winter and spring (Tao and Cao, 2014). Based on the KQSF, the Nanpan River Basin can be partitioned into two precipitation regions: the first, with relatively low topography, lies behind the KQSF to the east of 104°E, and the second, with higher topography, lies in front of the KQSF to the west of 104°E. Moreover, the KQSF gradually weakens as it moves westward (Zhang and Zhang, 2016), which may be another factor influencing our results.

CONCLUSION
The spatial correlation analysis between Spearman's rank correlation coefficient of the precipitation at each station and the interstation distance showed that |R| is smallest at the annual scale (-0.058), followed by the seasonal scale, with R values of −0.791, −0.713, −0.804 and −0.83 in spring, summer, autumn and winter, respectively, and largest at the monthly scale (−0.924). Our results indicate that precipitation in the Nanpan River Basin is spatially correlated and that this correlation is stronger at finer time scales.
Through PCA, five principal components that explained 88.5% of the total variance were obtained, and the scores of the five principal components indicated that the precipitation at each station and the interannual and intra-annual variations in precipitation constituted the most prominent characteristics of precipitation in the basin.
Through cluster analysis, two homogeneous precipitation regions were identified: the first is located mainly east of 104°E, and the second mainly west of 104°E. In summer, the two regions are affected by the South Asian monsoon to different degrees, and this effect weakens from east to west, resulting in more precipitation in the first region than in the second. In winter, under the influence of the NAO and the South Branch Trough, the first region also receives more precipitation than the second. In the dry season, the KQSF near 104°E may also contribute to this regional division. Under these conditions, it is reasonable to divide the basin into two homogeneous precipitation regions.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
XQ and LW contributed to the conception and design of the study, XL provided the data, and HY, KW and DF performed the statistical analysis. XQ wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

FUNDING
This study was supported by the Science and Technology Project of Yunnan Province Education Department (No. 2020J0246, 2022J0341).