Development and validation of nomogram including high altitude as a risk factor for COPD: A cross-sectional study based on Gansu population

Background Chronic Obstructive Pulmonary Disease (COPD) is a common and harmful disease, and an effective tool is needed for early screening of high-risk individuals. Gansu has distinctive environments and customs, so the prevalence and etiology of COPD there may differ from those in other regions. The association between altitude and COPD has attracted epidemiologists' attention. However, the prevalence in Gansu and the role of altitude remain unclear.
Methods In Gansu, a multistage stratified cluster sampling procedure was used to select a representative sample aged 40 years or older. A questionnaire and spirometry examination were used to collect participants' information. COPD was diagnosed and assessed by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criterion, while post-bronchodilator FEV1/FVC < LLN was used for sensitivity analysis. The effect of high altitude on COPD was evaluated by a logistic regression model after propensity score matching (PSM). Finally, the participants were randomly divided into training and validation sets. The training set was used to screen the related factors and construct a nomogram, which was assessed by the receiver operating characteristic (ROC) curve, calibration curve, and decision curve analysis (DCA) in both sets.
Results There were 2,486 eligible participants in the final analysis, of whom 1,584 lived at low altitudes and 902 lived at high altitudes. Based on the GOLD criterion, the crude and standardized prevalences in Gansu were 20.4% (18.7–22.0) and 19.7% (17.9–21.6). After PSM, the logistic regression model indicated that high altitude increased COPD risk [PSM OR: 1.516 (1.162–1.978)]. Altitude, age, sex, history of tuberculosis, coal as fuel, and smoking status were retained for developing a nomogram that demonstrated excellent discrimination, calibration, and clinical benefit in both sets.
Conclusions COPD has become a serious public health problem in Gansu. High altitude is a risk factor for COPD. The nomogram has satisfactory efficiency in screening high-risk individuals.


Introduction
Chronic Obstructive Pulmonary Disease (COPD) is the most common chronic respiratory disease and the third leading cause of death, accounting for 3.2 million deaths worldwide (1,2). A population-based survey conducted from 2002 to 2004 indicated that the prevalence among Chinese adults older than 40 was about 8.2% (3). The rate rose to 13.6% just a decade later and was higher in men and in the southwest region than in women and other regions (4). Because the lung lesions are irreversible and progressive, airflow limitation and respiratory symptoms seriously disturb patients' life and work (5). Besides, data demonstrated that the direct medical cost of COPD accounted for 33.33%−118.09% of the average annual income in China (6). COPD has increasingly become a serious public health problem in China, leading to tremendous economic, social, and healthcare burdens.
Gansu province, located in northwestern China, is characterized by a wide range of altitudes (900-3,800 m) and multi-ethnic habitation (7,8). Compared to flat regions, high-altitude regions differ in the prevalence and etiology of many diseases because of their particular environments (cold temperatures, low humidity, hypobaric and hypoxic conditions, etc.) and lifestyles (coal or biomass as fuel, etc.) (9-11). However, the prevalence of COPD in Gansu remains unclear. In addition, COPD is mediated by environmental and genetic factors, among which smoking is the most recognized risk factor (2). Nevertheless, the considerable number of non-smokers who develop COPD indicates its etiological complexity (12). Although some researchers have argued that altitude is associated with COPD, no consensus has been reached (13-15).
We therefore conducted a cross-sectional study to estimate the prevalence of COPD in Gansu and clarify the association between altitude and COPD. In addition, a nomogram incorporating candidate risk factors was constructed to screen the high-risk population.
During 2018-2019, a multistage stratified cluster sampling procedure was used to collect a representative sample from Gansu (Figure 1). Specifically, four monitoring sites in Gansu were selected by a probability proportional to size method (Longnan City, Jiuquan City, Qingyang City, and Gannan City). Using the same method, three towns were selected from each city, and more than two villages were further selected from each town. Subsequently, we selected a village with more than 150 households by the cluster random sampling method and randomly selected 100 families with members older than 40. Finally, the KISH table was used to select an eligible member from each family for investigation.
All participants were Chinese residents older than 40. Those physically unable to complete the spirometry examination were excluded (active tuberculosis, pregnancy, cardiovascular and cerebrovascular accidents in the past month, heart rate >120 beats/min or blood pressure >180/120 mmHg, related surgeries in the past 3 months, etc.) (16). The flowchart of participant screening is shown in Figure 2. Our study was approved by the ethics review committee of Guangzhou Medical University and Xi'an Jiaotong University Health Science Center.

Procedures
A professional questionnaire was used to collect information about demographic characteristics, respiratory symptoms, and exposure to COPD-related risk factors (17). Smokers were defined as those who had smoked continuously for more than 6 months. Former smokers were defined as those who had quit smoking for more than 1 month at the time of the interview. Biomass was defined as wood, grass, animal dung, and crop waste. Coal was defined as coal and lignite. Occupational exposure was defined as exposure to dust or chemicals in the workplace for more than 1 year (4,17). According to previous reports, physiological alterations of the respiratory and circulatory systems begin at altitudes >1,500 m (18-21). Thus, we chose 1,500 m as the cutoff between low and high altitudes.
Lung function was examined with the EasyOne Spirometer (NDD Medizintechnik AG, Switzerland) following the procedures of the American Thoracic Society and the European Respiratory Society (16). In brief, eligible participants underwent baseline spirometry and were retested 15 min after inhaling 400 µg salbutamol. A test was acceptable only when its quality was rated A, B, or C, and the overall proportion of tests rated A was required to exceed 70%.
The questionnaire and the spirometry examination were performed by trained staff from local medical systems.

Outcomes
According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 2019, participants were diagnosed with COPD if the ratio of forced expiratory volume in one second (FEV1) to forced vital capacity (FVC) was <70% after inhaling bronchodilators. Patients were categorized as GOLD stage I (mild: FEV1 ≥80% predicted), GOLD stage II (moderate: FEV1 ≥50 to <80% predicted), GOLD stage III (severe: FEV1 ≥30 to <50% predicted), and GOLD stage IV (very severe: FEV1 <30% predicted). The modified Medical Research Council (mMRC) dyspnea scale was used to estimate the dyspnea of COPD patients (5). In this study, a second diagnostic criterion was used for sensitivity analysis: post-bronchodilator FEV1/FVC below the lower limit of normal (LLN). The LLN and the predicted FEV1 and FVC were calculated by specialized formulas that consider the characteristics of the Chinese population (22).
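The diagnostic and staging rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the text; the function names `gold_stage` and `copd_by_lln` are hypothetical, and the LLN value is assumed to be supplied by the population-specific formulas cited in (22), which are not reproduced here.

```python
from typing import Optional


def gold_stage(fev1_fvc: float, fev1_pct_pred: float) -> Optional[str]:
    """Return the GOLD stage, or None if the fixed-ratio criterion is not met.

    fev1_fvc: post-bronchodilator FEV1/FVC as a fraction (e.g. 0.65).
    fev1_pct_pred: post-bronchodilator FEV1 as a percentage of predicted.
    """
    if fev1_fvc >= 0.70:  # COPD requires FEV1/FVC < 70%
        return None
    if fev1_pct_pred >= 80:
        return "I"    # mild
    if fev1_pct_pred >= 50:
        return "II"   # moderate
    if fev1_pct_pred >= 30:
        return "III"  # severe
    return "IV"       # very severe


def copd_by_lln(fev1_fvc: float, lln: float) -> bool:
    """Sensitivity-analysis criterion: FEV1/FVC below the lower limit of normal.

    The LLN is assumed to come from an external reference equation.
    """
    return fev1_fvc < lln
```

For example, `gold_stage(0.65, 62.0)` classifies a participant with a ratio of 0.65 and an FEV1 of 62% predicted as stage II.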

Statistical analysis
Continuous variables were described by mean [standard deviation (SD)] or median (lower quartile, upper quartile) and were compared by t-test or Mann-Whitney test. Categorical variables were described by rate [95% confidence interval (95% CI)] and were compared by Chi-square test or Fisher's exact test. The Cochran-Armitage test was used to compare ordinal data between two groups. The overall standardized prevalence of COPD was calculated based on the population structure of the 2010 Chinese census (23).
The 1:1 propensity score matching (PSM), using the nearest-neighbor method with a caliper of 0.02, was implemented to balance the populations' characteristics between low and high altitudes. Subsequently, logistic regression models estimated odds ratios (ORs) and 95% CIs to evaluate the association between high altitude and COPD risk (24).
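As an illustration of how a standardized prevalence is typically obtained, the sketch below applies direct standardization: stratum-specific prevalences are weighted by the reference population's share of each stratum. The strata, rates, and counts shown are invented placeholders, not values from this study or from the census, and the stratification variables actually used are an assumption.

```python
def directly_standardized_rate(stratum_rates, reference_counts):
    """Weight each stratum-specific rate by the reference population's size.

    stratum_rates: dict mapping stratum label -> prevalence in the sample.
    reference_counts: dict mapping the same labels -> reference population size.
    """
    total = sum(reference_counts.values())
    weighted = sum(stratum_rates[k] * reference_counts[k] for k in reference_counts)
    return weighted / total


# Hypothetical age strata and numbers, for illustration only.
example_rates = {"40-49": 0.10, "50-59": 0.20, "60+": 0.30}
example_reference = {"40-49": 500, "50-59": 300, "60+": 200}
standardized = directly_standardized_rate(example_rates, example_reference)
```

With these placeholder inputs the weighted sum is (50 + 60 + 60) / 1,000, i.e. a standardized prevalence of about 0.17.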
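A minimal sketch of 1:1 nearest-neighbor matching with a caliper is given below. It assumes that propensity scores have already been estimated (for example by a logistic model), that matching is greedy and without replacement, and that the caliper of 0.02 applies to the raw propensity score; the text does not state these details, and the function name `match_nearest` is hypothetical.

```python
def match_nearest(treated, control, caliper=0.02):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping participant id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by no more than the caliper; unmatched participants are dropped.
    """
    available = dict(control)  # controls not yet used
    pairs = []
    # Process treated participants in order of increasing score.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Toy scores, for illustration only.
high_altitude = {"t1": 0.30, "t2": 0.60, "t3": 0.90}
low_altitude = {"c1": 0.31, "c2": 0.45, "c3": 0.61, "c4": 0.10}
matched = match_nearest(high_altitude, low_altitude)
```

In the toy data, `t3` has no control within the caliper and is therefore discarded, which is how matching reduces the analyzed sample.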
Furthermore, the participants were randomly divided into the training and validation sets at the ratio of 6:4. In the training set, the univariable and stepwise logistic regression models were performed to screen the related factors to construct a nomogram for COPD (GOLD criterion). The nomogram was assessed by the receiver operating characteristic (ROC) curve, calibration curve, and decision curve analysis (DCA), which was reassessed in the validation set for internal validation.
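The random 6:4 split can be sketched as follows. The seed and the function name `split_train_validation` are assumptions made for reproducibility of the example; the original analysis was run in SPSS and R, and its exact splitting procedure is not described.

```python
import random


def split_train_validation(ids, train_fraction=0.6, seed=2022):
    """Shuffle participant ids and split them into training and validation sets.

    The seed is an arbitrary illustrative value.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor of 60% of the sample
    return shuffled[:n_train], shuffled[n_train:]


train_ids, validation_ids = split_train_validation(range(2486))
```

Applied to 2,486 participants, flooring 60% of the sample gives 1,491 training and 995 validation participants.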
Statistical analysis was performed by SPSS 26.0 and R 4.13. The result was considered statistically significant only when the two-sided P was <0.05.

Results

Participant characteristics
We interviewed 2,925 residents. Finally, 2,486 eligible participants were included in the analysis, of which 1,584 lived in low altitudes (mean: 1,190.56 m, range: 996-1,442 m) and 902 lived in high altitudes (mean: 2,684.92 m, range: 2,416-2,896 m). Compared to the low-altitude population, the high-altitude population showed significant differences in demographic characteristics except for body mass index (BMI), history of tuberculosis (TB), and coal as fuel (P < 0.05; Table 1).

The prevalence of COPD in Gansu province
There were 508 individuals diagnosed with COPD by post-bronchodilator FEV1/FVC < 70%. The crude and standardized prevalences were 20.4% (18.7-22.0) and 19.7% (17.9-21.6). Taking 1,500 m as the boundary between low and high altitudes, we found that the prevalence at high altitudes was significantly higher than at low altitudes [high altitudes vs. low altitudes: 23.4% (20.7-26.4) vs. 18.8% (16.9-20.5), P = 0.006]. This tendency was also statistically significant in some subgroups (P < 0.05). The details are summarized in Table 2.

The assessment of patients
The formula for calculating the predicted FEV1 was suitable for participants aged 40-81 years, so five patients older than 81 were excluded. Finally, 505 patients were included in the severity assessment. Among the eligible patients, the proportions of GOLD stage I, GOLD stage II, and GOLD stage III or IV are given in Table 3. Furthermore, compared to low-altitude patients, high-altitude patients had worse lung function and more serious dyspnea (both P < 0.05; Table 3, Supplementary Table S2).

The role of high altitude on COPD
Biomass as fuel and ethnic minority status occurred almost exclusively at high altitudes (Table 1), so including them in PSM would greatly decrease the sample size. In addition, the distributions of ethnicity and biomass as fuel between cases and controls were similar to that of altitude, which might lead to multicollinearity and conceal the actual association between altitude and COPD if they were included in the PSM-processed and multivariate logistic regression models. Thus, the two variables were not considered in calculating the propensity score (PS) or in the multivariate logistic regression model.
After matching, 699 participants were retained in each group; the two populations had similar distributions of PS and comparable characteristics (SMD < 0.2, P < 0.05; Figure 3, Supplementary Table S3). Further, logistic regression models indicated that high altitude increased COPD risk in both the criteria of FEV 1

Construction and validation of the nomogram
Participants were randomly divided into two sets. The training set had 1,491 individuals, while the validation set had 995 individuals. The difference in characteristics between the two sets was not statistically significant (P > 0.05; Supplementary Table S4). The univariable logistic regression model first eliminated the factors that were not statistically associated with COPD. Then, the stepwise logistic regression model retained altitude, age, sex, TB, coal as fuel, and smoking status for developing a nomogram (Table 5, Figure 4).
The ROC curve, calibration curve, and DCA were used to assess the efficiency of the nomogram. In the training set, the AUC was 0.722 (0.690-0.754); the predicted probabilities were almost identical to the actual probabilities; when the threshold probability ranged from 0.08 to 0.45, the clinical benefit was positive (Figures 5A-C). In the validation set, the AUC was 0.678 (0.634-0.722); the predicted probabilities also fluctuated around the actual probabilities; when the threshold probability ranged from 0.09 to 0.36, the clinical benefit was positive (Figures 5D-F). When altitude was removed from the nomogram, the discrimination, calibration, and clinical benefit declined to some degree (Figures 5G-I).
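To make the two summary measures concrete, the sketch below computes the AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, and the DCA net benefit at a threshold probability p_t as TP/n - (FP/n) * p_t / (1 - p_t). These are the standard definitions rather than the authors' code, and the labels and predicted risks are toy values.

```python
def auc(labels, scores):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit when screening everyone with risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy outcome labels (1 = COPD) and predicted risks, for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.5]
toy_auc = auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
```

A positive net benefit at a given threshold means that screening with the model at that threshold does better than screening no one; a decision curve additionally compares it with screening everyone.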

Discussion
In this cross-sectional study, we investigated a representative sample of 2,486 individuals aged 40 years or above in Gansu, selected by a multistage stratified cluster sampling procedure. Using both the criteria of post-bronchodilator FEV1/FVC < 0.7 and < LLN, we found that the prevalence and severity of COPD were higher at high altitudes than at low altitudes. Furthermore, the univariable, multivariable, and PSM-processed logistic regression models showed that high altitude was a risk factor for COPD. In addition, we identified altitude, age, sex, TB, coal as fuel, and smoking status as risk factors and developed a nomogram for screening the high-risk population. The nomogram showed excellent discrimination, calibration, and clinical benefit in the internal validation.
The prevalence of COPD in China has risen rapidly in recent years. Among residents older than 40, the overall prevalence was about 8.2% (7.9-8.6) in 2002-2004 and increased to 13.6% (12.0-15.2) just a decade later (3,4). Fang et al. (4) drew on data from surveillance points in northwest China, only two of which were located in Gansu. Previous studies found that unclean fuels were frequently used in northwest China (coal as fuel: 49.4%; biomass as fuel: 48.1%), but our study found a higher frequency in Gansu (coal as fuel: 53.7%; biomass as fuel: 83.9%), which might lead to the much higher prevalence in Gansu than in northwest China (4). Notably, in the total, low-altitude, and high-altitude populations, the prevalences diagnosed by post-bronchodilator FEV1/FVC < LLN were higher in the middle-aged and lower in the elderly than those diagnosed by the GOLD criterion (Figures 6A-C). LLN as the diagnostic criterion might have an advantage in improving diagnostic sensitivity in the middle-aged and avoiding overdiagnosis in the elderly (22).
The role of altitude in COPD is still unclear (13). Some epidemiological studies indicated that high altitude was a protective factor for COPD (25-29). Recently, an epidemiological study combining several databases even suggested that high altitude (>1,500 m) was not associated with COPD (15). However, the participants of those studies were Westerners, whose genetic background might differ from that of Chinese populations (15,25-29). In some of that research, the exposure frequencies of some known risk factors were lower at high altitudes, in contrast to the situation in our country (15,25,29). In addition, some studies either did not perform multivariate analysis or omitted crucial risk factors (coal as fuel, etc.) from the model used to adjust for confounders (15,26,27,29). We suggest that high altitude increases COPD risk and severity, which is also one of the mainstream views (14,29-33). There are two reasonable explanations. On the one hand, high-altitude residents tend to use unclean energy for cooking and heating, and long-term exposure to indoor air pollutants might promote inflammation and oxidative stress in the lung (34). On the other hand, hypobaric and hypoxic conditions would stimulate hypoxia-related genes, which might play a harmful role in a series of physiological and biochemical processes such as inflammation, lung development, pulmonary hypertension or remodeling, and vascular permeability (35,36).

FIGURE 3
Distributions of propensity scores of participants before and after matching.

Our results retained only age, sex, TB, coal as fuel, and smoking as COPD-related predictors. In the univariable analysis, childhood hospital admission for severe respiratory disease, educational level, biomass as fuel, occupational exposure, and BMI were not significantly associated with COPD. One reason might be the crude evaluation of exposures such as biomass as fuel and occupational exposure. Besides, because the exposure frequencies of these factors were similar in cases and controls, a larger sample would be required to identify their association with COPD (37). Notably, ethnicity was significantly associated with COPD in the univariable analysis, but the stepwise logistic regression model eliminated it, which might be caused by its strong association with altitude (38).
Based on the multivariable regression model, the nomogram provides a quick, visual, and accurate tool to predict the probability of clinical outcomes and is popular among medical workers (39). As far as we know, this tool is rarely utilized for screening the COPD high-risk population (40,41). In this study, we developed a nomogram for COPD screening, which was evaluated by the ROC curve, calibration curve, and DCA. In the training and validation sets, the nomogram demonstrated excellent calibration and clinical benefit. However, the AUC in the training set was 0.722 (0.690-0.754), while that in the validation set was 0.678 (0.634-0.722), indicating merely acceptable discrimination. In future work, the sample size should be expanded so that more candidate predictors can be included to improve the efficiency of the nomogram. We also found that the discrimination, calibration, and clinical benefit declined after removing altitude from the nomogram, reiterating that altitude is an important risk factor for COPD.
This study has some limitations: (1) As a retrospective and cross-sectional study, recall bias and limited causal inference are inevitable. (2) The evaluation of some risk factors is crude, which may lead to false negative results. (3) The sample at high altitudes is relatively small, and a larger sample is required to detect candidate risk factors with a similar distribution in cases and controls (37). (4) Our nomogram needs further external validation.

FIGURE 4
Nomogram for COPD retaining altitude, age, sex, history of tuberculosis, coal as fuel, and smoking status as predictors.

Conclusions
COPD has become a severe public health problem in Gansu. High altitude is a critical risk factor in addition to older age, male sex, TB, coal as fuel, and smoking. The nomogram with the above risk factors has satisfactory efficiency in screening high-risk individuals.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Institutional Review Board of Guangzhou Medical University and Xi'an Jiaotong University Health Science Center-approval: GZMC2007-07-0676 and XJTU2016-411. The patients/participants provided their written informed consent to participate in this study.

Author contributions
AL analyzed and interpreted the data and completed the writing. CM, BR, YW, GY, and HL took part in the investigation. JL, XW, HZ, and XZ contributed to conceptualization, participant recruitment, and project management. JL, YD, CX, and DH guided the methodology and revised the manuscript. All authors have read and approved the final manuscript.