Determination of Factors Affecting Dengue Occurrence in Representative Areas of China: A Principal Component Regression Analysis

Background: Determining the key factors that affect dengue occurrence is important for responding successfully to outbreaks. Yunnan and Guangdong Provinces in China have been hotspots of dengue outbreaks in recent years. However, few studies have examined how multi-dimensional factors drive dengue occurrence, and most have failed to consider possible multicollinearity among the studied factors, which may bias the results. Methods: In this study, multiple linear regression analysis was used to assess multicollinearity among the natural and social factors related to dengue occurrence. Principal component regression (PCR) analysis was then used to determine the key factors driving dengue in Guangzhou city, Guangdong Province, and in Xishuangbanna prefecture, Yunnan Province. Results: Multicollinearity was present in both Guangzhou city and Xishuangbanna prefecture. The PCR model revealed that the top three factors contributing to dengue occurrence in Guangzhou were the Breteau Index (BI) (positive correlation), the number of imported dengue cases lagged by 1 month (positive correlation), and the monthly average of maximum temperature lagged by 1 month (negative correlation). In contrast, the top three factors contributing to dengue occurrence in Xishuangbanna were the monthly average of minimum temperature lagged by 1 month (positive correlation), the monthly average of maximum temperature (positive correlation), and the monthly average of relative humidity (positive correlation). Conclusion: Meteorological factors had stronger impacts on dengue occurrence in Xishuangbanna, Yunnan, while BI and the number of imported cases lagged by 1 month played important roles in dengue transmission in Guangzhou, Guangdong. Our findings could facilitate the formulation of tailored dengue response mechanisms in representative areas of China.

Assessing the impact of driving factors such as vectors and climate-related features on dengue occurrence has become a research hotspot (24)(25)(26)(27)(28). For instance, several studies have explored the factors influencing dengue occurrence in representative regions such as Guangdong and Yunnan Provinces, and reported potential dengue-related factors including temperature, relative humidity, precipitation, sunshine and mosquito density (29)(30)(31). However, although these multi-dimensional factors are strongly correlated with one another, most previous studies have not examined multicollinearity among the independent variables, which can severely distort model estimates. Principal component regression (PCR) can effectively minimize multicollinearity among the factors and has been widely used in biomedicine (32,33). The occurrence of dengue is complex, involving viruses, hosts, human populations, ecological, environmental and social factors, and the interactions among them. It is therefore necessary to consider many factors in model estimation while minimizing the possible impact of multicollinearity.
Accordingly, this study selected one representative city and one representative prefecture from two provinces that have experienced frequent dengue outbreaks in recent years and that differ in ecological, meteorological and socioeconomic characteristics, and then established PCR models to identify the key factors influencing dengue occurrence. The results may provide scientific evidence for formulating tailored dengue prevention and control strategies and measures in China.

Research Site
In this study, Xishuangbanna Dai Autonomous Prefecture (hereinafter Xishuangbanna prefecture) in Yunnan Province and Guangzhou city in Guangdong Province were selected as two representative sites at high risk of dengue in China, based on the dengue epidemic situation in recent years (Figure 1).
Xishuangbanna prefecture is located at 21°10′ to 22°40′ N and 99°55′ to 101°50′ E. It lies in the humid tropical zone south of the Tropic of Cancer and borders Jiangcheng county and Puer city to the east and west, Lancang county to the northwest, and Laos and Myanmar to the southeast, south and southwest. Covering an area of 19,582.45 square kilometers, the prefecture had 1.196 million permanent residents in 2019.
Guangzhou, the capital of Guangdong Province, is a sub-provincial city, a National Central City and a megacity. It is located at 22°26′ to 23°56′ N and 112°57′ to 114°3′ E, on terrain that slopes from higher altitude in the northeast to lower altitude in the southwest. The northern part of the city is hilly and mountainous, with concentrated forests. Guangzhou lies near the subtropical coast, and the Tropic of Cancer passes through this metropolis, one of the largest cities in China. The area has a marine subtropical monsoon climate, characterized by long, warm and rainy summers, abundant light and heat, and a short frost period in winter. The annual average temperature ranges from 20 to 22°C, and the annual temperature range is small. Covering an area of 7,434.4 square kilometers, the city had 15.359 million permanent residents in 2019.
(34). The Breteau Index (BI) is a surveillance index of Aedes larval density, calculated as the number of containers positive for Aedes larvae per 100 houses inspected. It is currently widely used in dengue risk assessment. BI data for high-risk areas of Guangzhou city and Xishuangbanna prefecture from June to October, 2006 to 2017, were obtained from the annual surveillance reports on major infectious diseases and vectors of China CDC.
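As a minimal illustration of the BI definition (containers positive for Aedes larvae per 100 houses inspected), the hypothetical function `breteau_index` below computes the index from survey counts. The function name and the example counts are illustrative assumptions, not surveillance data from this study.

```python
def breteau_index(positive_containers: int, houses_inspected: int) -> float:
    """Breteau Index: containers positive for Aedes larvae per 100 houses inspected."""
    if houses_inspected <= 0:
        raise ValueError("houses_inspected must be positive")
    if positive_containers < 0:
        raise ValueError("positive_containers cannot be negative")
    return 100.0 * positive_containers / houses_inspected


# Hypothetical survey: 12 positive containers found in 150 inspected houses.
print(breteau_index(12, 150))  # 8.0
```

Because a single house can hold several positive containers, BI is not capped at 100.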
Based on the major meteorological factors affecting dengue occurrence in representative areas of China reported in previous research, and considering the availability of data for different indicators at different time scales, monthly meteorological data from 2006 to 2017 were obtained from the National Meteorological Science Data Sharing Service (http://cdc.nmic.cn/home.do). These indicators include the monthly average of temperature (Tmean), monthly average of temperature lagged by 1 month (Tmean1), monthly average of maximum temperature (Tmax), monthly average of maximum temperature lagged by 1 month (Tmax1), monthly average of minimum temperature (Tmin), monthly average of minimum temperature lagged by 1 month (Tmin1), monthly average of relative humidity (Hum%), cumulative precipitation (CP), cumulative precipitation lagged by 1 month (CP1), days of precipitation in the current month (DP), and days of precipitation lagged by 1 month (DP1).
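The lagged indicators (Tmean1, Tmax1, Tmin1, CP1, DP1) pair each month with the preceding month's value, so June is paired with May, which lies outside the June to October analysis window. A minimal sketch, assuming (the text does not say so explicitly) that the lag is taken on the full monthly 2006-2017 series before restricting to June to October; the helper names and the sample temperatures are hypothetical.

```python
def previous_month(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) immediately before the given month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def lag_one_month(series: dict[tuple[int, int], float]) -> dict[tuple[int, int], float]:
    """Map each month to the value observed in the preceding month.

    Months whose predecessor is missing from the series are left out.
    """
    lagged = {}
    for (year, month) in series:
        prev = previous_month(year, month)
        if prev in series:
            lagged[(year, month)] = series[prev]
    return lagged


def restrict_to_season(series, months=range(6, 11)):
    """Keep only the June-October months analysed in the study."""
    return {key: value for key, value in series.items() if key[1] in months}


# Hypothetical monthly averages of maximum temperature (°C) for part of 2006.
tmax = {(2006, 5): 31.2, (2006, 6): 32.5, (2006, 7): 33.1}
tmax1 = restrict_to_season(lag_one_month(tmax))
print(tmax1)  # {(2006, 6): 31.2, (2006, 7): 32.5}
```

The same helpers apply unchanged to precipitation, precipitation days and monthly imported case counts.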
Population data for the study period were retrieved from the Guangdong and Yunnan Statistical Yearbooks. The datasets of indigenous and imported dengue cases, the key meteorological indices and the BI data from June to October during 2006-2017 were integrated for the subsequent principal component regression (PCR) analysis.

Statistical Analysis
Multiple linear regression analysis was first used to detect possible multicollinearity among the variables included in this study. Collinearity diagnostics were carried out, and eigenvalues and condition indices were computed. All analyses were performed using SPSS version 18.0 (SPSS Inc., Chicago, IL, USA).
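The eigenvalue and condition-index diagnostics can be sketched in NumPy. This assumes the common Belsley-style recipe (constant column included, columns scaled to unit length without centering), which we take to be close to what SPSS reports; it is an illustration on simulated data, not a reproduction of the study's output. A condition index above about 30 is a commonly cited sign of serious collinearity.

```python
import numpy as np


def collinearity_diagnostics(X: np.ndarray):
    """Eigenvalues and condition indices of the scaled design matrix.

    A constant column is prepended to the predictors in X, each column is
    scaled to unit length (without centering), and the eigenvalues of the
    resulting cross-product matrix are returned in descending order along
    with the condition indices sqrt(largest eigenvalue / eigenvalue).
    """
    design = np.column_stack([np.ones(X.shape[0]), X])
    design = design / np.linalg.norm(design, axis=0)     # unit-length columns
    eigenvalues = np.linalg.eigvalsh(design.T @ design)[::-1]
    condition_index = np.sqrt(eigenvalues[0] / eigenvalues)
    return eigenvalues, condition_index


# Simulated predictors (not study data): Tmean is almost exactly Tmax minus 4 °C.
rng = np.random.default_rng(0)
tmax = rng.normal(30, 2, 60)
tmean = tmax - 4 + rng.normal(0, 0.1, 60)
hum = rng.normal(80, 5, 60)
eigenvalues, condition_index = collinearity_diagnostics(np.column_stack([tmax, tmean, hum]))
print(np.round(eigenvalues, 5), np.round(condition_index, 1))
```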
The variables related to dengue occurrence were first standardized. Principal component analysis was then used to determine the number of principal components of the standardized key factors, including meteorological factors, mosquito density and imported dengue cases in the previous month, in the selected regions of Guangdong and Yunnan Provinces.
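The standardization and component-selection steps can be sketched as below: z-score the factors, take the eigenvalues of their correlation matrix, and report each component's variance contribution and the cumulative contribution. The 0.85 cumulative threshold is an assumption for illustration (the text does not state the cut-off; retaining components with eigenvalues greater than 1 is a common alternative), and the data are simulated.

```python
import numpy as np


def pca_contributions(X: np.ndarray, threshold: float = 0.85):
    """Principal component analysis of standardized predictors.

    X has one observation per row and one factor per column. The columns
    are z-scored, the eigenvalues/eigenvectors of their correlation matrix
    are sorted in descending order, and the number of leading components
    whose cumulative variance contribution first reaches `threshold` is
    returned alongside them.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score standardization
    R = np.corrcoef(Z, rowvar=False)                    # correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(R)       # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    contribution = eigenvalues / eigenvalues.sum()      # share of total variance
    cumulative = np.cumsum(contribution)
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigenvalues))
    return eigenvalues, eigenvectors, contribution, cumulative, k


# Simulated data (not study data): two nearly redundant temperature series
# plus two unrelated factors, 60 monthly observations.
rng = np.random.default_rng(0)
tmax = rng.normal(32, 1.5, 60)
tmean = tmax - 4 + rng.normal(0, 0.3, 60)
hum = rng.normal(80, 5, 60)
bi = rng.normal(10, 3, 60)
X = np.column_stack([tmax, tmean, hum, bi])
eigenvalues, eigenvectors, contribution, cumulative, k = pca_contributions(X)
print(np.round(eigenvalues, 3), np.round(cumulative, 3), k)
```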
Principal component regression analysis was then performed with the number of dengue cases as the dependent variable (y) and the principal components as the independent variables. Specifically, the leading principal components replaced the original independent variables (x) in a multiple linear regression, yielding a regression model between the principal components and the dependent variable (y). The number of principal components retained was determined by their cumulative contribution to the total initial eigenvalues. The principal components were then transformed back to the standardized independent variables, and these in turn to the original independent variables, to obtain the regression model between the original independent variables and the dependent variable. In this study, the established models are as follows.
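a. The structure of the established principal component regression model using standardized scores of principal components:
$$\hat{y} = b_0 + b_1 Z_1 + b_2 Z_2 + \cdots + b_p Z_p$$
where $Z_1, Z_2, \ldots, Z_p$ are the standardized principal component scores and $b_1, b_2, \ldots, b_p$ are called principal component contribution rates.
b. The structure of the established regression model using the original variables:
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
where $x_1, x_2, \ldots, x_p$ are the original independent variables and $\beta_1, \beta_2, \ldots, \beta_p$ are called marginal coefficients.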
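The PCR procedure and its back-transformation can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the SPSS workflow used in the study: the component scores are scaled to unit variance (as "standardized scores" suggests), the dependent variable is left unstandardized, and the data are simulated. With all components retained, PCR reduces to ordinary least squares, which gives a simple check of the back-transformation.

```python
import numpy as np


def pcr_fit(X: np.ndarray, y: np.ndarray, k: int):
    """Principal component regression on the first k components.

    Returns the coefficients on the standardized component scores
    (intercept first) and the back-transformed intercept and slopes
    for the original, unstandardized predictors.
    """
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / sd                                  # standardized predictors
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    keep = np.argsort(eigenvalues)[::-1][:k]             # k leading components
    lam, V = eigenvalues[keep], eigenvectors[:, keep]
    scores = Z @ V / np.sqrt(lam)                        # unit-variance component scores
    design = np.column_stack([np.ones(len(y)), scores])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)       # y = b0 + b1*Z1 + ... + bk*Zk
    gamma = V @ (b[1:] / np.sqrt(lam))                   # slopes on standardized predictors
    beta = gamma / sd                                    # slopes on original predictors
    beta0 = b[0] - beta @ mean                           # intercept on original scale
    return b, beta0, beta


# Simulated example (not study data).
rng = np.random.default_rng(1)
x1 = rng.normal(30, 2, 60)
x2 = x1 + rng.normal(0, 0.2, 60)          # nearly collinear with x1
x3 = rng.normal(80, 5, 60)
X = np.column_stack([x1, x2, x3])
y = 2 + 1.5 * x1 + 0.5 * x3 + rng.normal(0, 1, 60)
b, beta0, beta = pcr_fit(X, y, k=2)
print(np.round(b, 3), round(float(beta0), 3), np.round(beta, 3))
```

Dropping the smallest-eigenvalue component (k=2 here) discards the direction along which x1 and x2 are nearly redundant, which is what stabilizes the coefficients.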

Principal Component Analysis
Based on the above findings, multicollinearity existed among some variables in both Guangzhou city and Xishuangbanna prefecture. Therefore, principal component analysis was adopted to avoid possible multicollinearity. The eigenvalues, variance contributions and cumulative variance contributions of the components for Guangzhou city and Xishuangbanna prefecture are shown in Table 2.

Principal Component Regression Analysis in Guangzhou City
After principal component analysis, a multiple linear regression model was established with dengue cases as the dependent variable and the principal component scores (Z values) as the independent variables. PCR model parameter estimates are shown in Table 3. The established model was statistically significant (R = 0.881, F = 11.899, P < 0.001). t tests showed that the model constant, regression (REGR) factor score 1 for analysis 1 and REGR factor score 4 for analysis 1 were statistically significant (P = 0.034, 0.012 and < 0.001, respectively) (Table 4). The structure of the established PCR model using standardized scores of principal components in Guangzhou is as follows.

Principal Component Regression Analysis in Xishuangbanna Prefecture
Similarly, after principal component analysis, a multiple linear regression model was established for Xishuangbanna with dengue cases as the dependent variable and Z values as the independent variables.
Model parameter estimates for Xishuangbanna are shown in Table 5. The established model was statistically significant (R = 0.898, F = 3.557, P < 0.01). t tests showed that REGR factor score 2 for analysis 1 and REGR factor score 4 for analysis 1 were statistically significant (P = 0.014 and 0.015, respectively) (Table 6). The structure of the established PCR model using standardized scores of principal components in Xishuangbanna prefecture, Yunnan Province, is as follows.

DISCUSSION AND CONCLUSION
Guangzhou city (12,35) and Xishuangbanna prefecture (17) are two representative regions of dengue outbreaks in China in recent years, with different dengue vectors (21). The actual number of dengue cases may be underestimated in the current notification system because many cases with subclinical infections or atypical symptoms go undetected (36). Therefore, identifying the key factors influencing dengue occurrence in these two high-risk areas is of great significance for the scientific control of dengue.
Many studies have focused on the factors influencing local transmission, and even outbreaks, of dengue in representative regions of mainland China, including the two areas in this study (7). Dengue outbreaks usually result from a combination of many favorable conditions (6,8). Unfortunately, most previous studies neglected some key factors and did not adequately account for the complex collinearity among the studied factors, which makes it difficult to reveal the true contribution of each influencing factor. Furthermore, different reports reached inconsistent conclusions because of differences in study periods and regions. Building on previous research, we included a total of 13 potential influencing factors, including climatic factors, indigenous and imported dengue cases, and mosquito larval density (BI), and analyzed them systematically using principal component regression analysis to avoid possible multicollinearity and confounding bias (33).
In Guangzhou, a dengue epidemic area where Aedes albopictus is the dominant vector (37), we observed multicollinearity among the variables. Specifically, monthly maximum, minimum and mean temperatures and their 1-month lags were highly correlated with one another, with correlation coefficients above 0.85. Correlations above 0.60 were also found between DP and humidity, DP and Tmin, DP and CP, and DP1 and CP1. Similar correlations existed in Xishuangbanna prefecture, where Ae. aegypti is the dominant vector of dengue transmission.
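The pairwise correlation screening described above can be sketched as follows: compute the Pearson correlation for every pair of candidate predictors and list the pairs whose absolute correlation exceeds a cut-off (0.85 and 0.60 are the values quoted in the text). The series below are hypothetical numbers for illustration only, not data from Guangzhou or Xishuangbanna.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated_pairs(data: dict[str, list[float]], threshold: float):
    """List variable pairs whose absolute correlation exceeds threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical June-October values for one year (not study data).
data = {
    "Tmax": [31.0, 32.5, 33.1, 32.0, 30.2],
    "Tmin": [22.1, 23.4, 24.0, 23.0, 21.5],
    "Tmean": [26.0, 27.5, 28.3, 27.1, 25.4],
    "DP": [14, 18, 12, 16, 10],
}
flagged = flag_correlated_pairs(data, threshold=0.85)
print(flagged)
```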
In this study, the PCR model established for Guangzhou revealed that the key contributing factors for dengue occurrence included BI (positive correlation) and the number of imported dengue cases lagged by 1 month (Im1; positive correlation). Dengue is transmitted through the bites of Aedes mosquitoes. Therefore, as one of the major indicators of mosquito density, BI is regarded worldwide as a promising and direct index for dengue early warning (38). Dengue cases are the source of infection, and transmission involves two incubation periods: an intrinsic incubation period (3-15 days) in the human host and an extrinsic incubation period (3-15 days) in the Aedes mosquito. Together, these two periods mean that secondary local cases may appear up to about a month after an imported case, which is consistent with the 1-month lag of Im1. Furthermore, as an international metropolis, Guangzhou has a large floating population from other provinces of China and from abroad. The identification of Im1 as a major driving factor in Guangzhou also confirms that the risk of local outbreaks caused by imported dengue cases in this city is relatively high. This finding is in accordance with a previous study of Guangzhou (34).
Temperature is regarded as one of the most important climatic factors for dengue transmission (39). Previous studies indicate that suitable temperatures influence the reproduction of vector mosquitoes and the replication of dengue viruses, which in turn affects dengue risk (40). Relative humidity can affect the oviposition, egg hatching, dispersal range, feeding behavior and lifespan of Aedes mosquitoes. Our results in Xishuangbanna showed that the top three factors driving dengue occurrence were Tmin1 (positive correlation), Tmax (positive correlation) and Hum (positive correlation). Xishuangbanna prefecture has a much smaller floating population than Guangzhou. Moreover, Xishuangbanna lies in a tropical rainforest area, where suitable meteorological conditions are more conducive to Aedes breeding and favor local dengue outbreaks. These findings are consistent with those of a previous investigation conducted in the border areas of Yunnan and Myanmar (BYM) (7). Because evidence from Xishuangbanna prefecture remains limited, further studies are warranted to explore the key factors driving local dengue outbreaks in this area.
Overall, our study found that meteorological factors, especially Tmin1, Tmax and Hum, played key roles in dengue occurrence in Xishuangbanna, Yunnan, while mosquito density (BI) and Im1 played important roles in dengue transmission in Guangzhou, Guangdong. For targeted dengue control in Guangzhou city, it is urgent to closely monitor the dengue epidemic situation both in China and abroad, track imported dengue cases in a timely manner, and reduce adult mosquito density through source reduction of Aedes breeding sites. For dengue control in Xishuangbanna, key meteorological factors should be incorporated when implementing dengue early warning and precise control. Furthermore, differences in dengue determinants between provinces could be taken into account when formulating tailored dengue prevention and control strategies and response mechanisms in representative areas of China.
This study has some limitations. First, the factors influencing local dengue transmission and outbreaks are complicated. Although we tried to include as many risk factors as possible, other factors, such as public health control programs (30), invasion of new dengue virus strains (41,42), emerging insecticide resistance in dengue vectors (43,44), the El Niño Southern Oscillation (28), changes in land use and surface water (2), key areas within a city such as "urban villages (UVs)" (3), social media surveillance data (45), air travel data (46), human behavior and other socio-economic indicators (47), are also important for dengue occurrence and should be examined in future studies. Second, the actual number of dengue cases may be underestimated (36) because inapparent infections and mild cases were not captured and therefore not included in this study.

CONCLUSIONS
Climatic factors and mosquito density are key drivers of dengue occurrence in representative high-risk dengue areas of China. Meteorological factors, such as the monthly average of minimum temperature lagged by 1 month, the monthly average of maximum temperature and relative humidity, have stronger impacts on dengue occurrence in Xishuangbanna prefecture, Yunnan, while Aedes larval density and the number of imported dengue cases lagged by 1 month have more profound impacts on dengue occurrence in Guangzhou city. These findings may provide scientific evidence for developing early warning systems and targeted dengue control strategies and measures in China.

DATA AVAILABILITY STATEMENT
The data analyzed in this study are subject to the following licenses/restrictions: some data may be made available in accordance with China CDC regulations. Requests to access these datasets should be directed to liuxiaobo@icdc.cn.

ETHICS STATEMENT
This study was approved by the Ethics Committee of National Institute for Communicable Disease Control and Prevention, China CDC. No human or animal samples were included in the current study.

AUTHOR CONTRIBUTIONS
XL planned the project and wrote the paper. XL, QL, and YG conducted the field survey. XL, KL, YY, HW, SY, DR, NZ, and JY contributed to data analysis. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We would like to acknowledge the disease prevention and control staff of Guangdong and Yunnan Provinces for collecting the relevant data. Special thanks to Prof. Peng Bi from the School of Public Health, The University of Adelaide, for revising the manuscript.