- 1 Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- 2 The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China
- 3 Center of Clinical Evaluation and Analysis, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China
- 4 College of Pharmacy, Jinan University, Guangzhou, China
- 5 Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Jinan University, Guangzhou, China
- 6 Cancer Research Institution, Jinan University, Guangzhou, China
- 7 Department of Good Clinical Practice (GCP), Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- 8 Department of Good Clinical Practice (GCP), Hospital of Jiangxi University of Traditional Chinese Medicine, Nanchang, China
- 9 Chronic Disease Management Department, Tao Zhuang Branch, The First People's Hospital of Jiashan County, Jiaxing, China
- 10 College of Information Science and Technology, Jinan University, Guangzhou, China
- 11 Department of Respiration, Wenzhou Hospital of Integrated Traditional Chinese and Western Medicine, Wenzhou, China
Background: Chronic obstructive pulmonary disease (COPD) is the third leading cause of death globally and a major public health issue in China. This study aimed to develop a COPD predictive model and to perform risk stratification based on key indicators that were not included in the model.
Methods: We collected data from inpatients and outpatients with and without COPD who attended three hospitals between January 2018 and December 2022. These data were divided into a training set and an internal validation set; logistic regression was used to build a COPD predictive model, which was then validated internally. External validation was performed using data collected at two additional units from November 2019 to June 2022.
Results: A total of 1,056 retrospective cases were included (740 in the training set and 316 in the internal validation set), and 408 prospectively enrolled individuals formed the external validation set. Six risk factors were identified: age (OR = 1.05, 95% CI: 1.02–1.08), secondhand smoke exposure (OR = 8.27, 95% CI: 2.70–25.34), cough (OR = 23.52, 95% CI: 12.64–43.77), “occasional episodes of wheezing that are mild and do not interfere with sleep or activity” (OR = 6.06, 95% CI: 2.59–14.19), “bouts of wheezing that worsen with movement” (OR = 21.40, 95% CI: 10.32–44.37), and “persistent episodes of wheezing, occurring at rest, unable to lie down” (OR = 10.97, 95% CI: 1.02–118.28). The predictive model equation was: y = −5.920 + 0.047 (age) + 2.113 (secondhand smoke exposure) + 3.158 (cough) + 1.801 (wheezing 1) + 3.063 (wheezing 2) + 2.396 (wheezing 3). In the training set, the model achieved 94.1% accuracy, 98.5% sensitivity, and 89.2% specificity; the AUC was 0.976 in the internal validation set and 0.691 in the external validation set. The critical cut-off value was 0.258.
Conclusion: We have successfully developed a model for the diagnosis of COPD. The predictive model equation was: y = −5.920 + 0.047 (age) + 2.113 (smoke exposure) + 3.158 (cough) + 1.801 (wheezing 1) + 3.063 (wheezing 2) + 2.396 (wheezing 3).
1 Introduction
Chronic obstructive pulmonary disease (COPD) is a prevalent condition marked by persistent respiratory symptoms and airflow limitation. It is primarily characterized by chronic and often progressive airflow obstruction due to abnormalities in the airways (bronchitis) and/or alveoli (emphysema), resulting in chronic respiratory symptoms (dyspnea, cough, and sputum production) (1). According to the World Health Organization (WHO) (2), COPD ranks as the third leading cause of death globally, after ischemic heart disease and stroke. A high percentage of COPD cases remain undiagnosed. The GOLD 2023 guidelines discuss the role of case-finding tools in improving COPD diagnosis rates, medical practices, and outcomes (1). In China, nearly 100 million people are affected by COPD, with the prevalence among those aged 40 years and older rising from 8.2% in 2007 to 13.7% in 2018 (3). COPD in China is characterized by high prevalence, morbidity, disability, mortality, and economic burden, along with low awareness (4). It has become one of the most prominent public health and medical problems in China in recent years.
The “gold standard” for diagnosing COPD relies on lung function testing. Despite standardized diagnosis and treatment protocols recommended by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), most patients are not diagnosed until their symptoms become very pronounced. Consequently, by the time they seek medical attention, many patients already have impaired lung function. A nationwide epidemiological survey of COPD revealed that only 10% of respondents had undergone pulmonary function tests, and medication adherence among patients with COPD was as low as 11.7%. With the increasing prevalence of smoking in developing countries and population aging in high-income countries, the incidence of COPD is projected to continue to rise over the next 40 years, with more than 5.4 million deaths from COPD and related diseases (5).
China has a high prevalence of COPD, but because of its vast territory and numerous influencing factors, there is currently no widely used predictive tool to promote early diagnosis of COPD. This study aimed to establish a practical, triage-oriented predictive model for COPD.
2 Methods
The research adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines throughout the investigation and was conducted with the approval of the First Affiliated Hospital of Zhejiang Chinese Medical University (ethics number: 2019-KL-095-02). All patients who participated in the prospective study signed an informed consent form.
2.1 Study cohort and subgroups
Clinical data were retrospectively collected from inpatients and outpatients with COPD admitted to the First Affiliated Hospital of Zhejiang Chinese Medical University, the Affiliated Hospital of Jiangxi Chinese Medical University, and the Affiliated Hospital of Chengdu Chinese Medical University from January 2018 to December 2022. Data were also collected from non-COPD patients attending these hospitals during the same period. In addition, individuals attending the physical examination center of the First Affiliated Hospital of Zhejiang Chinese Medical University and the First People’s Hospital of Jiashan County (Tao Zhuang Branch) were prospectively enrolled from November 2019 to June 2022. The retrospective data were divided into training and internal validation sets, and the prospective data were used as an external validation set.
The retrospective data used to establish the predictive model were obtained from inpatients and outpatients, or from individuals undergoing health examinations, who actively visited the respiratory medicine clinic. Because these were retrospective cases, the patients’ information, including sex, age, and other details, had already been recorded in electronic medical records. For the non-COPD group, two senior attending physicians independently assessed all non-COPD patients who visited the respiratory medicine clinic between January 2018 and December 2022. The following were excluded: (1) repeat visits; (2) acute exacerbation of disease; and (3) prior lung surgery, lung tumors, interstitial lung disease, or other diseases that affect lung function or produce clinical symptoms similar to COPD. The non-COPD patients remaining after these exclusions served as the control group. The same procedure was used in all three hospitals.
For the recruitment of the external validation population, investigators regularly arranged for two senior attending physicians to visit the health examination center and Taozhuang Health Center to conduct pulmonary function tests on individuals undergoing routine health screenings on a voluntary basis. If a patient’s pulmonary function met the diagnostic criteria for COPD according to the GOLD 2019 guidelines and the patient was willing to participate, we invited the patient to sign an informed consent form and complete a questionnaire.
2.2 Inclusion criteria
Retrospective cases: patients with a definitive diagnosis of COPD.
Prospective cases: according to the GOLD 2019 guidelines (6), the diagnosis of COPD is based primarily on a history of exposure to risk factors, symptoms, signs, and clinical data such as pulmonary function tests, after excluding other diseases that can cause similar symptoms and persistent airflow limitation, followed by a comprehensive analysis. Lung function testing is required to confirm the diagnosis: a post-bronchodilator FEV1/FVC ratio of <0.70 indicates persistent airflow limitation.
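The spirometric part of this criterion is a fixed-ratio test and can be sketched in a few lines of Python. This is a minimal illustration only, assuming FEV1 and FVC are post-bronchodilator volumes in litres; the function name is hypothetical, and the check does not replace the clinical assessment and exclusion of other diseases described above.

```python
def has_persistent_airflow_limitation(fev1_l: float, fvc_l: float) -> bool:
    """Fixed-ratio check: post-bronchodilator FEV1/FVC below 0.70.

    fev1_l and fvc_l are assumed to be post-bronchodilator volumes in litres.
    """
    if fev1_l <= 0 or fvc_l <= 0:
        raise ValueError("FEV1 and FVC must be positive volumes")
    return fev1_l / fvc_l < 0.70


# FEV1 1.8 L and FVC 3.0 L give a ratio of 0.60, which meets the criterion.
print(has_persistent_airflow_limitation(1.8, 3.0))  # True
print(has_persistent_airflow_limitation(2.4, 3.0))  # False
```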
2.3 Data acquisition
The clinical information collected in this study included baseline characteristics, medical history, laboratory tests, and clinical symptoms:
Baseline characteristics (6): age, sex, Body Mass Index (BMI) (7–10), history of smoking, history of exposure to secondhand smoke, family history of respiratory disease, and a definite diagnosis of COPD;
Medical history (11–13): history of hypertension, hyperlipidemia, diabetes mellitus, stroke, and osteoporosis;
Laboratory tests: white blood cell count (WBC) (14), platelet count (PLT), hemoglobin level (Hb), neutrophil percentage (NE%), red blood cell count (RBC) (15), eosinophil count (Eos), apolipoprotein, uric acid (UA), fasting blood glucose (FBG), and pulmonary function;
Clinical symptoms: cough, sputum production (divided into three categories by sputum volume: “0” indicates “no or little sputum (sputum volume <50 mL)”; “1” indicates “moderate amount of sputum (sputum volume 50–100 mL)”; and “2” indicates “large amount of sputum (sputum volume >100 mL)”), and wheezing (divided into four categories by severity: “0” indicates “no significant wheezing”; “1” indicates “occasional episodes of wheezing that are mild and do not interfere with sleep or activity”; “2” indicates “wheezing episodes that worsen with movement”; and “3” indicates “persistent episodes of wheezing, occurring at rest, unable to lie down”).
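The final model contains three separate wheezing terms, which suggests that wheezing was entered as indicator (dummy) variables with “no significant wheezing” as the reference level. The Python sketch below shows that coding together with the sputum categories; it assumes that volumes of exactly 50 and 100 mL fall in the moderate category (the text gives “50–100 mL”), and all names are hypothetical rather than taken from the authors' data dictionary.

```python
# Category definitions as given in the text.
SPUTUM_LEVELS = {
    0: "no or little sputum (<50 mL)",
    1: "moderate amount of sputum (50-100 mL)",
    2: "large amount of sputum (>100 mL)",
}
WHEEZING_LEVELS = {
    0: "no significant wheezing",
    1: "occasional, mild episodes not interfering with sleep or activity",
    2: "wheezing episodes that worsen with movement",
    3: "persistent episodes at rest, unable to lie down",
}


def sputum_category(volume_ml: float) -> int:
    """Map a sputum volume in mL to the 0/1/2 category (boundaries assumed inclusive)."""
    if volume_ml < 0:
        raise ValueError("volume cannot be negative")
    if volume_ml < 50:
        return 0
    if volume_ml <= 100:
        return 1
    return 2


def wheezing_dummies(level: int) -> dict:
    """Indicators for wheezing levels 1-3; level 0 is the reference (all zeros)."""
    if level not in WHEEZING_LEVELS:
        raise ValueError(f"unknown wheezing level: {level}")
    return {f"wheezing_{k}": int(level == k) for k in (1, 2, 3)}


print(sputum_category(75))  # 1
print(wheezing_dummies(2))  # {'wheezing_1': 0, 'wheezing_2': 1, 'wheezing_3': 0}
```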
2.4 Sample size estimation, culling, and missing value treatment
According to the events per variable (EPV) principle, the number of outcome events needed to build a predictive model should be at least 10 times the number of candidate predictors (16). Samples with >10% missing values were excluded, and multiple imputation was used to fill in the remaining missing values.
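Both rules in this paragraph are simple enough to express directly. The Python sketch below assumes missing values are stored as `None`; the count of 22 candidate predictors in the usage line is purely illustrative (roughly the number of variables compared between the training and internal validation sets), not a figure reported by the authors. The multiple imputation step is not shown because the imputation model is not described.

```python
def min_events_required(n_candidate_predictors: int, epv: int = 10) -> int:
    """Events-per-variable rule: at least `epv` outcome events per candidate predictor."""
    return epv * n_candidate_predictors


def drop_sparse_records(records: list, max_missing_fraction: float = 0.10) -> list:
    """Keep records (dicts) whose share of missing (None) fields does not exceed the limit."""
    kept = []
    for rec in records:
        n_missing = sum(value is None for value in rec.values())
        if n_missing / len(rec) <= max_missing_fraction:
            kept.append(rec)
    return kept


# Illustrative only: 22 candidate predictors would call for at least 220 COPD cases;
# the training set described in the Results contains 388.
print(min_events_required(22))  # 220
```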
2.5 Statistical analysis
A predictive model was developed and validated based on the TRIPOD guidelines (17). SAS (version 9.4) was used to randomly divide the retrospective data into a training set and an internal validation set at a ratio of 7:3. The training set, internal validation set, and external validation set were used for modeling, internal validation of the model, and external validation of the model, respectively.
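The authors performed the 7:3 split in SAS 9.4; the Python sketch below is only an analogue of a seeded random split, not their code. The seed value is hypothetical, and the training-set size is rounded up, which is an assumption that happens to reproduce the 740/316 split reported for 1,056 records.

```python
import math
import random


def split_train_validation(records, train_fraction=0.7, seed=20230101):
    """Shuffle records with a fixed seed and split them into two sets.

    The seed is hypothetical; the training-set size is rounded up.
    """
    rng = random.Random(seed)  # independent, reproducible generator
    shuffled = list(records)
    rng.shuffle(shuffled)
    # round() first to absorb floating-point noise before ceiling
    n_train = math.ceil(round(len(shuffled) * train_fraction, 9))
    return shuffled[:n_train], shuffled[n_train:]


train, validation = split_train_validation(range(1056))
print(len(train), len(validation))  # 740 316
```

Fixing the seed makes the split reproducible, which is what allows the equivalence comparison between the two sets to be re-run.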
IBM SPSS Statistics 26 was used to statistically analyze the data.
A systematic review (18) found no performance benefit of machine learning over logistic regression for clinical prediction models. Because machine learning also carries a risk of overfitting, this study used logistic regression to establish the prediction model.
First, in the training set, all candidate variables were screened using univariate logistic regression. Variables significantly associated with COPD were then entered into a multivariable logistic regression analysis, and the final predictive model was obtained through backward stepwise selection. The model was then applied to the internal and external validation sets. Discrimination was assessed by the area under the receiver operating characteristic (ROC) curve (AUC); additional evaluation indices included accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). A nomogram was constructed, and calibration was assessed using calibration curves. Finally, risk stratification was performed in subgroups defined by smoking history, BMI, pack-years of smoking, smoking cessation history, age at cessation, and eosinophil percentage (Eos%) (Figure 1).
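All of the threshold-based indices listed above come from a 2×2 table of predicted versus true status, and the AUC equals the probability that a randomly chosen COPD case receives a higher score than a randomly chosen control. The plain-Python sketch below illustrates these definitions on hypothetical toy data; it is not the SPSS procedure the authors used, the function names are hypothetical, and it omits guards against empty cells.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels (1 = COPD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc_rank(y_true, scores):
    """ROC AUC as P(random case scores higher than random control); ties count 0.5."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: scores are predicted probabilities, thresholded at 0.258.
y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
pred = [int(s > 0.258) for s in scores]
print(classification_metrics(y, pred))
print(auc_rank(y, scores))  # 0.75
```

The pairwise AUC loop is quadratic in sample size, which is acceptable for cohorts of a few hundred to a few thousand subjects.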
3 Results
3.1 Study sample
In this study, 5,916 individuals were initially included retrospectively. After excluding 755 cases with missing pulmonary function data, 590 with missing BMI, 254 with missing family history of respiratory disease, 1,332 with missing history of hypertension, hyperlipidemia, diabetes mellitus, or stroke, 1,657 with missing Eos data, 247 with missing apolipoprotein data, and 25 with missing cough, sputum, or wheezing data, a final sample of 1,056 participants was included for analysis.
Multiple imputation was performed on the 1,056 samples, and the imputed values were averaged across five imputations. There were no statistically significant differences in the data before and after imputation (p > 0.05), as shown in Table 1.
In addition, a total of 408 individuals with complete data were prospectively included in this study.
3.2 Establishment and validation of the prediction model for COPD
3.2.1 Comparison of equivalence between the training set and internal validation set
Using SAS 9.4, all 1,056 cases were randomly split with a random seed into a training set comprising 70% (n = 740) and an internal validation set comprising 30% (n = 316). There were no significant differences between the two groups in terms of age, sex, BMI, history of hypertension, hyperlipidemia, diabetes mellitus, stroke, osteoporosis, WBC count, PLT count, Hb content, neutrophil ratio, RBC count, apolipoprotein A, FBG, EOS count, history of smoking, exposure to secondhand smoke, family history of respiratory diseases, cough, sputum, and wheezing (Table 2).
3.2.2 Basic characteristics of COPD and non-COPD patients in the training set
In the training set of 740 samples, there were 388 patients with COPD and 352 non-COPD patients. Compared with non-COPD patients, patients with COPD were older and more often male; had higher rates of hypertension, diabetes mellitus, cigarette smoking, exposure to secondhand smoke, cough, sputum production, and wheezing; and had higher uric acid levels, WBC counts, and NE%. In contrast, patients with COPD had lower BMI and apolipoprotein A levels than non-COPD patients (Table 3).
3.2.3 Univariate logistic regression in the training set
Univariate logistic regression analysis of the training set showed that the following factors were significantly associated with COPD: age, sex, BMI, history of hypertension, history of diabetes, uric acid levels, WBC count, NE%, RBC count, apolipoprotein A, history of cigarette smoking, exposure to secondhand smoke, cough, sputum, and wheezing (Table 4).
3.3 COPD predictive model and nomogram
All variables that were significant in the univariate logistic regression were entered into a multivariable logistic regression analysis with backward stepwise selection to establish a predictive model for COPD. The final predictors in the model were age, secondhand smoke exposure, cough, and wheezing (Table 5).
The odds of COPD increased by approximately 5% for each additional year of age (OR = 1.05, 95% CI: 1.02–1.08). Individuals exposed to secondhand smoke had higher odds of COPD than those without such exposure (OR = 8.27, 95% CI: 2.70–25.34). Patients with cough had higher odds of COPD than those without this symptom (OR = 23.52, 95% CI: 12.64–43.77). Each level of wheezing was also associated with higher odds of COPD relative to no significant wheezing (level 1: OR = 6.06; level 2: OR = 21.40; level 3: OR = 10.97).
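Each odds ratio is the exponential of the corresponding log-odds coefficient in the model equation, so the ORs can be checked directly from the published coefficients. The short Python sketch below does this; the results agree with the reported ORs to within about 0.01, the small differences for the wheezing levels reflecting rounding of the published coefficients. Note also that when the outcome is common, as here (388 of 740 training cases had COPD), odds ratios are larger than the corresponding risk ratios, so “times the odds” is not the same as “times the risk”.

```python
import math

# Log-odds coefficients of the final model, as published.
COEFFICIENTS = {
    "age (per year)": 0.047,
    "secondhand smoke exposure": 2.113,
    "cough": 3.158,
    "wheezing level 1": 1.801,
    "wheezing level 2": 3.063,
    "wheezing level 3": 2.396,
}

for name, beta in COEFFICIENTS.items():
    odds_ratio = math.exp(beta)          # OR = e^beta
    pct_change = (odds_ratio - 1) * 100  # % change in odds per one-unit increase
    print(f"{name}: OR = {odds_ratio:.2f} ({pct_change:+.0f}% odds)")
```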
The formula for the final model is as follows:

$$y = \operatorname{logit}(p) = \ln\frac{p}{1-p} = -5.920 + 0.047\,(\text{age}) + 2.113\,(\text{history of secondhand smoke}) + 3.158\,(\text{having cough}) + 1.801\,(\text{wheezing symptom 1}) + 3.063\,(\text{wheezing symptom 2}) + 2.396\,(\text{wheezing symptom 3})$$

$$p = \frac{1}{1 + e^{-y}}$$

where p is the predicted probability of COPD, which lies between 0 and 1. A higher p indicates a greater risk of COPD.
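The model equation can be evaluated directly: the linear predictor y is the log-odds, and p = 1/(1 + e^(−y)) converts it to a probability. The Python sketch below assumes that secondhand smoke and cough are coded 0/1, that wheezing is coded 0–3 with level 0 as the reference, and that the reported cut-off of 0.258 applies to the predicted probability p; function names are hypothetical.

```python
import math

INTERCEPT = -5.920
BETA_AGE = 0.047
BETA_SECONDHAND_SMOKE = 2.113
BETA_COUGH = 3.158
BETA_WHEEZING = {0: 0.0, 1: 1.801, 2: 3.063, 3: 2.396}  # level 0 = reference
CUTOFF = 0.258  # assumed to apply to the predicted probability p


def linear_predictor(age, secondhand_smoke, cough, wheezing_level):
    """y = logit(p); secondhand_smoke and cough are 0/1 indicators."""
    if wheezing_level not in BETA_WHEEZING:
        raise ValueError(f"wheezing level must be 0-3, got {wheezing_level}")
    return (INTERCEPT
            + BETA_AGE * age
            + BETA_SECONDHAND_SMOKE * secondhand_smoke
            + BETA_COUGH * cough
            + BETA_WHEEZING[wheezing_level])


def predicted_probability(age, secondhand_smoke, cough, wheezing_level):
    """Inverse logit: p = 1 / (1 + exp(-y))."""
    y = linear_predictor(age, secondhand_smoke, cough, wheezing_level)
    return 1.0 / (1.0 + math.exp(-y))


def above_cutoff(age, secondhand_smoke, cough, wheezing_level):
    """True when the predicted probability exceeds the reported cut-off."""
    return predicted_probability(age, secondhand_smoke, cough, wheezing_level) > CUTOFF


# A 70-year-old with cough, no secondhand smoke exposure and no wheezing:
# y = -5.920 + 0.047*70 + 3.158 = 0.528, so p is about 0.63.
print(round(predicted_probability(70, 0, 1, 0), 3))
print(above_cutoff(70, 0, 1, 0))  # True
```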
Using R 4.1.3, a nomogram was plotted in which each predictor corresponds to a score (points). The points for all predictors are summed to obtain a total score (total points), which maps to the corresponding risk of COPD (Figure 2).
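A nomogram is a graphical rendering of the same linear predictor. One common scaling convention gives the predictor whose coefficient spans the widest range 0–100 points and scales every other predictor proportionally. The Python sketch below follows that convention under an assumed age axis of 40–90 years (the axis range drawn in Figure 2 is not stated in the text); it is an illustration, not the authors' R code. Converting the total points back to the linear predictor recovers exactly the probability given by the model equation.

```python
import math

INTERCEPT = -5.920
BETA_AGE = 0.047
BETA_SMOKE = 2.113
BETA_COUGH = 3.158
BETA_WHEEZE = {0: 0.0, 1: 1.801, 2: 3.063, 3: 2.396}

# Hypothetical age axis; the article does not report the range used in Figure 2.
AGE_MIN, AGE_MAX = 40, 90

# Range of beta * x covered by each predictor's axis.
SPANS = {
    "age": BETA_AGE * (AGE_MAX - AGE_MIN),
    "smoke": BETA_SMOKE,
    "cough": BETA_COUGH,
    "wheeze": max(BETA_WHEEZE.values()),
}
# The widest span is mapped to 100 points; others scale proportionally.
SCALE = 100 / max(SPANS.values())


def nomogram_points(age, smoke, cough, wheeze):
    """Per-predictor points and their total (ages outside the axis fall off the scale)."""
    pts = {
        "age": BETA_AGE * (age - AGE_MIN) * SCALE,
        "smoke": BETA_SMOKE * smoke * SCALE,
        "cough": BETA_COUGH * cough * SCALE,
        "wheeze": BETA_WHEEZE[wheeze] * SCALE,
    }
    return pts, sum(pts.values())


def points_to_probability(total):
    """Undo the scaling to recover the linear predictor, then apply the inverse logit."""
    y = INTERCEPT + BETA_AGE * AGE_MIN + total / SCALE
    return 1 / (1 + math.exp(-y))


pts, total = nomogram_points(70, 0, 1, 0)
print({k: round(v, 1) for k, v in pts.items()}, round(total, 1))
print(round(points_to_probability(total), 3))
```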
3.4 Characterization of the external validation set
The external validation set comprised a total of 408 samples, consisting of 141 patients with COPD and 267 non-COPD individuals (Table 6).
3.5 Validation of prediction model
The model was tested on both internal and external validation sets to assess its discrimination and calibration.
3.5.1 Discrimination test
In the COPD prediction model, the area under the curve (AUC) for the training set was 0.964 (95% CI: 0.950–0.978), with an accuracy of 94.1%, a sensitivity of 98.5%, and a specificity of 89.2%. For the internal validation set, the AUC was 0.976 (95% CI: 0.962–0.990), with an accuracy of 93.4%, a sensitivity of 96.2%, and a specificity of 89.6%. These results indicate that the model effectively discriminates samples from the same source and demonstrates excellent predictive capability for assessing the risk of COPD (Figure 3).
The AUC of the external validation set was 0.691, with an accuracy of 49.3%, a sensitivity of 94.3%, and a specificity of 25.5%. The PPV was 92.6% and the NPV was 94.5% (Table 7). These results indicate limited generalizability of the model.
The cut-off value of the predictive model was 0.258: an individual whose predicted probability p exceeds 0.258 is classified by the model as having COPD; otherwise, the individual is classified as not having COPD.
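The text does not state how the cut-off of 0.258 was chosen. A common criterion is the threshold that maximizes Youden's index, J = sensitivity + specificity − 1; the Python sketch below shows that criterion as an assumption, not as the authors' method. It calls a subject positive when the score is at or above the candidate threshold and tries every observed score as a candidate.

```python
def youden_cutoff(y_true, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A subject is called positive when score >= threshold; every observed
    score is tried as a candidate threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical toy data in which one threshold separates the groups perfectly.
print(youden_cutoff([0, 0, 1, 1], [0.10, 0.30, 0.35, 0.80]))  # (0.35, 1.0)
```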
3.5.2 Calibration test
A calibration curve was constructed to assess the agreement between the predicted and observed probabilities of the logistic regression model (19). The bias-corrected curve lies close to the ideal curve, indicating good calibration of the model (Figures 4–6).
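A calibration curve compares predicted probabilities with observed event rates. The Python sketch below computes only the apparent (uncorrected) curve by grouping predictions into equal-width probability bins; bias-corrected curves are commonly obtained by bootstrap resampling, which is not shown here. The function name and the binning scheme are assumptions for illustration.

```python
def calibration_table(y_true, probs, n_bins=10):
    """Apparent calibration: mean predicted probability vs observed COPD rate per bin.

    Uses equal-width probability bins; empty bins are skipped.
    Returns (bin index, count, mean predicted, observed rate) tuples.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_predicted = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        rows.append((index, len(members), mean_predicted, observed_rate))
    return rows


for row in calibration_table([0, 0, 1, 1], [0.05, 0.15, 0.85, 0.95], n_bins=2):
    print(row)
```

For a well-calibrated model, the mean predicted probability and the observed rate in each bin lie close to the diagonal of the calibration plot.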
3.6 Stratified analyses based on some risk factors
During the modeling process, certain variables had to be excluded because of excessive missing values. However, according to guidelines and numerous previous studies, BMI (20–24), smoking history (3, 25–27), and smoking cessation history (28, 29) may be high-risk factors for COPD and may play a significant role in its diagnosis and evaluation. Therefore, the model was applied across various population subgroups. Except for subgroups with insufficient data for model fitting, the model showed robust predictive capability in individuals with or without a history of smoking, at different BMI levels, with different smoking cessation histories, with ≥40 pack-years of smoking, with cessation of smoking at age <65 years, and with different eosinophil percentages (Eos%) (Table 8).
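The article does not define pack-years; the Python sketch below assumes the standard definition (packs smoked per day multiplied by years of smoking, with one pack taken as 20 cigarettes) and shows how an individual would be assigned to the ≥40 pack-year subgroup. Function names are hypothetical.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs per day x years smoked, assuming 20 cigarettes per pack."""
    return cigarettes_per_day / 20 * years_smoked


def pack_year_stratum(py: float, threshold: float = 40) -> str:
    """Assign the >=40 vs <40 pack-year subgroup used in the stratified analysis."""
    return ">=40 pack-years" if py >= threshold else "<40 pack-years"


print(pack_years(30, 40))                     # 60.0
print(pack_year_stratum(pack_years(10, 30)))  # <40 pack-years
```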
4 Discussion
With China’s economic and social development, its elderly population is growing rapidly. Attention to the health of older adults has gradually shifted toward disease prevention, improving individual function, promoting good health, and prolonging healthy life expectancy. Healthy China 2030 (30) emphasizes that COPD is characterized by high prevalence, disability, mortality, and disease burden.
In this study, we developed a predictive model for COPD using large-sample retrospective data, identified four risk factors for COPD (with wheezing entered at three severity levels), and derived a predictive formula. In discrimination and calibration tests, the formula accurately estimated the probability of COPD in samples from the same source, but its diagnostic performance in the external population was only moderate (AUC 0.691).
Age is a significant risk factor for COPD. Prevalence increases with age, likely because of age-related decline in lung function and cumulative exposure to environmental pollutants such as tobacco smoke (31). COPD is highly prevalent in individuals aged 40 years or older: according to the China Pulmonary Health (CPH) study published in 2018 (3), the prevalence of COPD in this age group in China was 13.7%. In our study, the mean age of the patients with COPD included in the modeling was 70.37 years. The coefficient of age in the final model was 0.047, indicating a positive association between age and COPD risk. This finding is consistent with the known age distribution of COPD and underscores the impact of age on its development.
The effects of age on COPD are reflected mainly in the following aspects. First, lung function declines naturally with age, including reduced respiratory function, decreased alveolar elasticity, thinning of the alveolar walls, and increased airway resistance, leading to symptoms such as dyspnea and cough. Second, aging is associated with declining nutritional status (9), affecting food intake and absorption. Patients with COPD are in a hypercatabolic state with increased daily energy expenditure, which significantly raises the risk of malnutrition. Long-term malnutrition leads to muscle atrophy, especially of the respiratory muscles, which makes the lungs less compliant and causes a decline in pulmonary ventilation (32). Third, because poor nutrition becomes more common with aging, the immune system may not function properly, increasing the risk of lung infection (33). In older adults, each infection poses a significant threat to lung function, and the resulting damage is difficult to reverse. In patients with COPD, inflammatory irritation of the airways persists and the airways undergo continuous remodeling (34). Repeated infections exacerbate inflammation and airway remodeling, further worsening pre-existing airway obstruction.
Tar and nicotine, major components of tobacco smoke, cause inflammation, oxidative stress, and apoptosis. Cigarette smoke induces chronic systemic inflammatory responses by increasing the levels of inflammatory factors such as IL-1, IL-6, and TNF-α (35). The brain is highly sensitive to hypoxia, and cigarette smoke impairs pulmonary ventilation and aggravates hypoxemia (31), which further slows cellular metabolism and promotes neuronal apoptosis (36).
COPD is a heterogeneous lung condition characterized by persistent airflow obstruction due to airway and/or alveolar abnormalities, often accompanied by chronic cough. Pathological changes in COPD involve the airways, lung parenchyma, and pulmonary vessels. Airway alterations in particular play a significant role in causing cough (34), as they sustain persistent inflammation that leads to mucus hypersecretion and ciliary dysfunction (37). In addition, airway narrowing makes it difficult to expel sputum, which in turn irritates the airways and causes cough. Many patients with COPD also have allergic diseases, such as asthma and allergic rhinitis, which heighten airway receptor sensitivity and exacerbate cough in response to allergic triggers. Patients with COPD are also susceptible to bacterial and viral infections because of decreased immunity, which further irritates the airways and causes coughing.
Wheezing is common in patients with COPD, especially in severe disease or during acute exacerbations. This study categorized wheezing into four levels of severity to assess its diagnostic utility in mild COPD. All three levels of wheezing were diagnostic factors for COPD, suggesting that wheezing, once present, has diagnostic significance for identifying COPD.
While several prediction models for COPD have been developed in China, most of them focused on risk factors for acute exacerbation and were conducted within specific medical units or regions. In contrast, the present study is a multicenter clinical study with modeling samples from provincial-level tertiary hospitals in Zhejiang, Jiangxi, and Sichuan (Chengdu). This design allowed us to achieve a larger sample size, enhancing the regional representativeness and practical applicability of the final model. The external validation set used data from medical examination centers and health centers affiliated with tertiary hospitals, which increased sample diversity and allowed us to test whether the model can be applied widely in clinical settings.
China is a country with a high prevalence of COPD. Although lung function testing is an important basis for diagnosing COPD, many regions lack the conditions to perform it. We therefore aimed to establish a predictive model based, as far as possible, on symptoms and routine indicators, because such a model has broader potential for application. For example, during annual physical examinations, if a doctor finds that a patient exceeds the high-risk threshold predicted by the model, the patient can be referred for pulmonary function testing. The model can also be used to stratify COPD risk in the examined population.
However, the sample size of this study is not yet sufficient, particularly for the external validation set, in which specificity was low. One main limitation is sample selection bias. The retrospective data used to build our model came from three provinces in eastern and southwestern China, but because of the COVID-19 outbreak, we were able to include external validation data from only one province in eastern China, resulting in sample bias. In addition, because we used large-scale retrospective data to build the model, many indicators had to be excluded because more than 10% of their data were missing; although we still analyzed some indicators we considered important in the risk stratification, this exclusion is another source of bias. The model is also mainly suited to older adults, which is a further limitation. Despite these limitations, the model is based on a multicenter design and has undergone internal and external validation, and we believe it still has significant value.
A study published in Lancet Respiratory Medicine in 2020 (38) developed a tool to predict, at the individual level, the rate and severity of COPD exacerbations, reported its performance in an independent external cohort, and used case studies to illustrate its potential clinical application. In 2022, an article in Thorax (39) used causal machine learning to explore the effect of individualized treatment on COPD exacerbations. These two studies suggest that identifying individual responses in COPD progression, exacerbations, and treatment may be more valuable for the clinical diagnosis and management of COPD, which provides significant inspiration for our future research. Our team is continuing clinical research on COPD: we are enrolling patients with COPD from different provinces and aim to develop a more adaptive predictive model and, potentially, a digital diagnostic tool.
5 Conclusion
We have developed a predictive model for COPD for clinical use, enabling healthcare professionals, especially those in primary care settings, to quickly and conveniently assess the risk of COPD, thereby promoting timely diagnosis and treatment. However, this model still needs further verification. Until the model is more refined, it is recommended to use it with caution.
Data availability statement
The datasets presented in this article are not readily available because all data are stored at the First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China, and are not publicly available due to privacy or ethical restrictions. The data used and/or analyzed during the current study can be obtained from the corresponding author. Requests to access the datasets should be directed to WZ, wangzhen610@sina.cn.
Ethics statement
The studies involving humans were approved by the First Affiliated Hospital of Zhejiang Chinese Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
YW: Visualization, Data curation, Validation, Writing – original draft. YL: Writing – original draft, Visualization, Data curation. QL: Methodology, Data curation, Writing – review & editing. RZ: Writing – review & editing, Funding acquisition, Project administration. BY: Investigation, Validation, Writing – review & editing. HX: Writing – review & editing, Investigation, Resources. XQ: Validation, Writing – review & editing. YY: Methodology, Data curation, Writing – original draft. KN: Writing – review & editing, Validation, Investigation. JZ: Data curation, Writing – review & editing. XM: Data curation, Writing – review & editing. RG: Project administration, Resources, Data curation, Writing – review & editing, Validation. ZW: Validation, Funding acquisition, Writing – review & editing, Investigation, Supervision, Resources, Data curation, Methodology, Project administration.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This study was funded by the National Key R&D Program of China (2018YFC2002500).
Acknowledgments
We would like to thank Jinan University, the Affiliated Hospital of Chengdu Chinese Medical University, the Affiliated Hospital of Jiangxi Chinese Medical University, and The First People’s Hospital of Jiashan County.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Global Initiative for Chronic Obstructive Lung Disease. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease (2023). Available online at: https://goldcopd.org/ (Accessed January 15, 2023).
2. WHO. Mortality and global health estimates. Available online at: https://www.who.int/data/gho/data/themes (Accessed March 4, 2022).
3. Wang, C, Xu, J, Yang, L, Xu, Y, Zhang, X, Bai, C, et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China pulmonary health [CPH] Study): a national cross-sectional study. Lancet. (2018) 391:1706–17. doi: 10.1016/S0140-6736(18)30841-9
4. Yahong, C. Interpretation of the GOLD global strategy for the diagnosis, treatment, and prevention of chronic obstructive pulmonary disease 2021. Chin J Front Med. (2021) 13:16–37. doi: 10.12037/YXQY.2021.01-02
5. Christenson, SA, Smith, BM, Bafadhel, M, and Putcha, N. Chronic obstructive pulmonary disease. Lancet. (2022) 399:2227–42. doi: 10.1016/S0140-6736(22)00470-6
6. Global Initiative for Chronic Obstructive Lung Disease. Global strategy for the diagnosis, management and prevention of chronic obstructive pulmonary disease 2019 report. Available online at: https://goldcopd.org/gold-reports/ (Accessed December 2, 2018).
7. Liu, S, Zhou, Y, Wang, X, Wang, D, Lu, J, Zheng, J, et al. Biomass fuels are the probable risk factor for chronic obstructive pulmonary disease in rural South China. Thorax. (2007) 62:889–97. doi: 10.1136/thx.2006.061457
8. Zhong, N, Wang, C, Yao, W, Chen, P, Kang, J, Huang, S, et al. Prevalence of chronic obstructive pulmonary disease in China: a large, population-based survey. Am J Respir Crit Care Med. (2007) 176:753–60. doi: 10.1164/rccm.200612-1749OC Erratum in: Am J Respir Crit Care Med. 2007; 176 (11): 1169
9. Zhou, Y, Wang, D, Liu, S, Lu, J, Zheng, J, Zhong, N, et al. The association between BMI and COPD: the results of two population-based studies in Guangzhou, China. COPD. (2013) 10:567–72. doi: 10.3109/15412555.2013.781579
10. Zhang, X, Chen, H, Gu, K, Chen, J, and Jiang, X. Association of body mass index with risk of chronic obstructive pulmonary disease: a systematic review and meta-analysis. COPD. (2021) 18:101–13. doi: 10.1080/15412555.2021.1884213
11. Alter, P, Lucke, T, Watz, H, Andreas, S, Kahnert, K, Trudzinski, FC, et al. Cardiovascular predictors of mortality and exacerbations in patients with COPD. Sci Rep. (2022) 12:21882. doi: 10.1038/s41598-022-25938-0
12. Alter, P, Kahnert, K, Trudzinski, FC, Bals, R, Watz, H, Speicher, T, et al. Disease progression and age as factors underlying multimorbidity in patients with COPD: results from COSYCONET. Int J Chron Obstruct Pulmon Dis. (2022) 17:1703–13. doi: 10.2147/COPD.S364812
13. Cazzola, M, Bettoncelli, G, Sessa, E, Cricelli, C, and Biscione, G. Prevalence of comorbidities in patients with chronic obstructive pulmonary disease. Respiration. (2010) 80:112–9. doi: 10.1159/000281880
14. Han, Z, Hu, H, Yang, P, Li, B, Liu, G, Pang, J, et al. White blood cell count and chronic obstructive pulmonary disease: a Mendelian randomization study. Comput Biol Med. (2022) 151:106187. doi: 10.1016/j.compbiomed.2022.106187
15. Huang, Y, Wang, J, Shen, J, Ma, J, Miao, X, Ding, K, et al. Relationship of red cell index with the severity of chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. (2021) 16:825–34. doi: 10.2147/COPD.S292666
16. Peduzzi, P, Concato, J, Feinstein, AR, and Holford, TR. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol. (1995) 48:1503–10. doi: 10.1016/0895-4356(95)00048-8
17. Snell, KI, Levis, B, Damen, JA, Dhiman, P, Debray, TP, Hooft, L, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ. (2023) 381:e073538. doi: 10.1136/bmj-2022-073538
18. Christodoulou, E, Ma, J, Collins, GS, Steyerberg, EW, Verbakel, JY, and Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. (2019) 110:12–22. doi: 10.1016/j.jclinepi.2019.02.004
19. Denguezli, M, Daldoul, H, Harrabi, I, Gnatiuc, L, Coton, S, Burney, P, et al. COPD in nonsmokers: reports from the Tunisian population-based burden of obstructive lung disease study. PLoS One. (2016) 11:e0151981. doi: 10.1371/journal.pone.0151981
20. Sobrino, E, Irazola, VE, Gutierrez, L, Chen, CS, Lanas, F, Calandrelli, M, et al. Estimating prevalence of chronic obstructive pulmonary disease in the southern cone of Latin America: how different spirometric criteria may affect disease burden and health policies. BMC Pulm Med. (2017) 17:187–96. doi: 10.1186/s12890-017-0537-9
21. Stern, DA, Morgan, WJ, Wright, AL, Guerra, S, and Martinez, FD. Poor airway function in early infancy and lung function by age 22 years: a non-selective longitudinal cohort study. Lancet. (2007) 370:758–64. doi: 10.1016/S0140-6736(07)61379-8
22. Skripak, JM. Persistent effects of maternal smoking during pregnancy on lung function and asthma in adolescents. Pediatrics. (2014) 134:S146. doi: 10.1542/peds.2014-1817X
23. Joehanes, R, Just, AC, Marioni, RE, Pilling, LC, Reynolds, LM, Mandaviya, PR, et al. Epigenetic signatures of cigarette smoking. Circ Cardiovasc Genet. (2016) 9:436–47. doi: 10.1161/CIRCGENETICS.116.001506
24. Imboden, M, Wielscher, M, Rezwan, FI, Amaral, AFS, Schaffner, E, Jeong, A, et al. Epigenome-wide association study of lung function level and its change. Eur Respir J. (2019) 54:1900457. doi: 10.1183/13993003.00457-2019
25. Forey, BA, Thornton, AJ, and Lee, PN. Systematic review with meta-analysis of the epidemiological evidence relating smoking to COPD, chronic bronchitis and emphysema. BMC Pulm Med. (2011) 11:36. doi: 10.1186/1471-2466-11-36
26. Wheaton, AG, Liu, Y, Croft, JB, VanFrank, B, Croxton, TL, Punturieri, A, et al. Chronic obstructive pulmonary disease and smoking status-United States, 2017. MMWR Morb Mortal Wkly Rep. (2019) 68:533–8. doi: 10.15585/mmwr.mm6824a1
27. Diver, WR, Jacobs, EJ, and Gapstur, SM. Secondhand smoke exposure in childhood and adulthood in relation to adult mortality among never smokers. Am J Prev Med. (2018) 55:345–52. doi: 10.1016/j.amepre.2018.05.005
28. He, Y, Jiang, B, Li, LS, Li, LS, Ko, L, Wu, L, et al. Secondhand smoke exposure predicted COPD and other tobacco-related mortality in a 17-year cohort study in China. Chest. (2012) 142:909–18. doi: 10.1378/chest.11-2884
29. Korsbæk, N, Landt, EM, and Dahl, M. Second-hand smoke exposure associated with risk of respiratory symptoms, asthma, and COPD in 20,421 adults from the general population. J Asthma Allergy. (2021) 14:1277–84. doi: 10.2147/JAA.S328748
30. The Communist Party of China Central Committee and the State Council. Healthy China 2030 blueprint. Available online at: http://www.gov.cn/xinwen/2016-10/25/content_5124174.htm (Accessed December 23, 2020).
31. Grahn, K, Gustavsson, P, Andersson, T, Lindén, A, Hemmingsson, T, Selander, J, et al. Occupational exposure to particles and increased risk of developing chronic obstructive pulmonary disease (COPD): a population-based cohort study in Stockholm, Sweden. Environ Res. (2021) 200:111739. doi: 10.1016/j.envres.2021.111739
32. Langer, D, Ciavaglia, C, Faisal, A, Webb, KA, Neder, JA, Gosselink, R, et al. Inspiratory muscle training reduces diaphragm activation and dyspnea during exercise in COPD. J Appl Physiol (1985). (2018) 125:381–92. doi: 10.1152/japplphysiol.01078.2017
33. Holtjer, JC, Bloemsma, LD, Beijers, RJ, Cornelissen, ME, Hilvering, B, Houweling, L, et al. Identifying risk factors for COPD and adult-onset asthma: an umbrella review. Eur Respir Rev. (2023) 32:230009. doi: 10.1183/16000617.0009-2023
34. Dey, S, Eapen, MS, Chia, C, Gaikwad, AV, Wark, PA, and Sohal, SS. Pathogenesis, clinical features of asthma COPD overlap, and therapeutic modalities. Am J Physiol Lung Cell Mol Physiol. (2022) 322:L64–83. doi: 10.1152/ajplung.00121.2021
35. Lytras, T, Kogevinas, M, Kromhout, H, Carsin, AE, Antó, JM, Bentouhami, H, et al. Occupational exposures and 20-year incidence of COPD: the European Community respiratory health survey. Thorax. (2018) 73:1008–15. doi: 10.1136/thoraxjnl-2017-211158
36. Xie, W, Dumas, O, Varraso, R, Boggs, KM, Camargo, CA Jr, and Stokes, AC. Association of occupational exposure to inhaled agents in operating rooms with incidence of chronic obstructive pulmonary disease among US female nurses. JAMA Netw Open. (2021) 4:e2125749. doi: 10.1001/jamanetworkopen.2021.25749
37. GBD 2019 Risk Factors Collaborators. Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the global burden of disease study 2019. Lancet. (2020) 396:1223–49. doi: 10.1016/S0140-6736(20)30752-2
38. Adibi, A, Sin, DD, Safari, A, Johnson, KM, Aaron, SD, FitzGerald, JM, et al. The acute COPD exacerbation prediction tool (ACCEPT): a modelling study. Lancet Respir Med. (2020) 8:1013–21. doi: 10.1016/S2213-2600(19)30397-2
Keywords: chronic obstructive pulmonary disease, predictive model, risk factor, chronic obstructive pulmonary disease (COPD), clinical analysis
Citation: Wang Y, Lv Y, Li Q, Zhang R, Yan B, Xue H, Qian X, Yang Y, Ni K, Zhong J, Meng X, Gao R and Wang Z (2025) Development and validation of a predictive model for COPD: a multicenter study. Front. Med. 12:1615642. doi: 10.3389/fmed.2025.1615642
Edited by:
Shabana Urooj, Princess Nourah bint Abdulrahman University, Saudi Arabia
Reviewed by:
Monica Ewomazino Akokuwebe, University of the Witwatersrand, South Africa
Yunhuan Liu, Tongji University, China
Copyright © 2025 Wang, Lv, Li, Zhang, Yan, Xue, Qian, Yang, Ni, Zhong, Meng, Gao and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Rundi Gao, moyuangina@sina.com; Zhen Wang, wangzhen610@sina.cn