ORIGINAL RESEARCH article

Front. Med., 09 September 2025

Sec. Pulmonary Medicine

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1615642

Development and validation of a predictive model for COPD: a multicenter study

  • 1. Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China

  • 2. The First School of Clinical Medicine, Zhejiang Chinese Medical University, Hangzhou, China

  • 3. Center of Clinical Evaluation and Analysis, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China

  • 4. College of Pharmacy, Jinan University, Guangzhou, China

  • 5. Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Jinan University, Guangzhou, China

  • 6. Cancer Research Institution, Jinan University, Guangzhou, China

  • 7. Department of Good Clinical Practice (GCP), Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China

  • 8. Department of Good Clinical Practice (GCP), Hospital of Jiangxi University of Traditional Chinese Medicine, Nanchang, China

  • 9. Chronic Disease Management Department, Tao Zhuang Branch, The First People's Hospital of Jiashan County, Jiaxing, China

  • 10. College of Information Science and Technology, Jinan University, Guangzhou, China

  • 11. Department of Respiration, Wenzhou Hospital of Integrated Traditional Chinese and Western Medicine, Wenzhou, China

Abstract

Background:

Chronic obstructive pulmonary disease (COPD) is the third leading cause of death globally and a major public health issue in China. This study aimed to develop a predictive model for COPD and to perform risk stratification on key indicators that were not included in the model.

Methods:

We collected data from inpatients and outpatients with and without COPD who attended three hospitals between January 2018 and December 2022. The data were divided into a training set and an internal validation set; logistic regression was used to build a COPD predictive model, which was then validated internally. External validation of the model was performed using data from two additional units collected between November 2019 and June 2022.

Results:

A total of 1,056 retrospective cases were included (740 in the training set and 316 in the internal validation set), together with 408 cases in the external validation set. Six risk factors were identified: age (OR = 1.05, 95% CI: 1.02–1.08), second-hand smoke exposure (OR = 8.27, 95% CI: 2.70–25.34), cough (OR = 23.52, 95% CI: 12.64–43.77), “occasional episodes of wheezing that are mild and do not interfere with sleep or activity” (OR = 6.06, 95% CI: 2.59–14.19), “bouts of wheezing that worsen with movement” (OR = 21.40, 95% CI: 10.32–44.37), and “persistent episodes of wheezing, occurring at rest, unable to lie down” (OR = 10.97, 95% CI: 1.02–118.28). The predictive model equation was: y = −5.920 + 0.047 (age) + 2.113 (smoke exposure) + 3.158 (cough) + 1.801 (wheezing 1) + 3.063 (wheezing 2) + 2.396 (wheezing 3). In the training set, the model achieved 94.1% accuracy, 98.5% sensitivity, and 89.2% specificity; the AUC was 0.976 in internal validation and 0.691 in external validation. The critical cut-off value was 0.258.

Conclusion:

We have successfully developed a model for the diagnosis of COPD. The predictive model equation was: y = −5.920 + 0.047 (age) + 2.113 (smoke exposure) + 3.158 (cough) + 1.801 (wheezing 1) + 3.063 (wheezing 2) + 2.396 (wheezing 3).

1 Introduction

Chronic obstructive pulmonary disease (COPD) is a prevalent condition marked by persistent respiratory symptoms and airflow limitation. It is primarily characterized by chronic and often progressive airflow obstruction due to abnormalities in the airways (bronchitis) and/or alveoli (emphysema), resulting in chronic respiratory symptoms (dyspnea, cough, and sputum production) (1). According to the World Health Organization (WHO) (2), COPD ranks as the third leading cause of death globally, after ischemic heart disease and stroke. A high percentage of COPD cases remain undiagnosed. The GOLD 2023 guidelines discuss the impact of case-finding tools in improving COPD diagnosis rates, medical practices, and outcomes (1). In China, nearly 100 million people are affected by COPD, with the prevalence among those aged 40 years and older rising from 8.2% in 2007 to 13.7% in 2018 (3). COPD in China is characterized by high prevalence, morbidity, disability, mortality, and economic burden, along with low awareness (4). It has become one of the most prominent public health and medical problems in China in recent times.

The “gold standard” for diagnosing COPD relies on lung function testing. Despite standardized diagnosis and treatment protocols recommended by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), most patients are not diagnosed until their symptoms become very pronounced. Consequently, by the time they seek medical attention, many patients already have impaired lung function. A nationwide epidemiological survey of COPD revealed that only 10% of respondents had undergone pulmonary function tests, and medication adherence among patients with COPD was as low as 11.7%. With the increasing prevalence of smoking in developing countries and increasing ageing in high-income countries, the incidence of COPD is projected to continue to rise over the next 40 years, with more than 5.4 million deaths from COPD and related diseases (5).

China is a country with a high prevalence of COPD, but due to its vast territory and numerous influencing factors, there is currently no widely used predictive tool to promote early diagnosis of COPD. This study aims to establish a more triage-oriented and applicable predictive model for COPD.

2 Methods

The research adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines throughout the investigation and was conducted with the approval of the First Affiliated Hospital of Zhejiang Chinese Medical University (ethics number: 2019-KL-095-02). All patients who participated in the prospective study signed an informed consent form.

2.1 Study cohort and subgroups

Clinical data were retrospectively collected from inpatients and outpatients with COPD admitted to the First Affiliated Hospital of Zhejiang Chinese Medical University, the Affiliated Hospital of Jiangxi Chinese Medical University, and the Affiliated Hospital of Chengdu Chinese Medical University from January 2018 to December 2022. Data were also collected from non-COPD patients attending these hospitals during the same period. Additionally, the prospective inclusion of patients with COPD who visited the physical examination center of the First Affiliated Hospital of Zhejiang Chinese Medical University and the First People’s Hospital of Jiashan County (Tao Zhuang Branch) was performed from November 2019 to June 2022. The retrospective data were categorized into training and internal validation sets, while the prospective data were used as an external validation set.

The retrospective data used to establish the predictive model were obtained from inpatients, outpatients, and patients undergoing health examinations who actively visited the respiratory medicine clinic. Since these were retrospective cases, the patients’ information had already been recorded in electronic medical records, including gender, age, and other details. For the non-COPD group, we invited two senior attending physicians to individually assess all non-COPD patients who visited the respiratory medicine clinic between January 2018 and December 2022. We excluded the following: (1) repeat visits; (2) acute exacerbation of the disease; (3) patients who had undergone lung surgery or had lung tumors, interstitial lung disease, or other diseases that affect lung function or produce clinical symptoms similar to COPD. The non-COPD patients remaining after these exclusions served as the control group. The same method was used in all three hospitals.

For the recruitment of the external validation population, investigators regularly arranged for two senior attending physicians to visit the health examination center and Taozhuang Health Center to conduct pulmonary function tests on individuals undergoing routine health screenings on a voluntary basis. If a patient’s pulmonary function met the diagnostic criteria for COPD according to the GOLD 2019 guidelines and the patient was willing to participate, we invited the patient to sign an informed consent form and complete a questionnaire.

2.2 Inclusion criteria

Retrospective cases: patients with a definitive diagnosis of COPD.

Prospective cases: According to the GOLD 2019 guidelines (6), the diagnosis of COPD is primarily based on a history of exposure to risk factors, symptoms, signs, and clinical data, such as pulmonary function tests. It involves excluding other diseases that can cause similar symptoms and persistent airflow limitation and conducting a comprehensive analysis. Lung function tests showing persistent airflow limitation are necessary to confirm the diagnosis of COPD, with an FEV1/FVC ratio of <70% after bronchodilator inhalation, clearly indicating persistent airflow limitation.
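The spirometric part of this criterion can be written as a simple check. The sketch below is illustrative only: the function name and the example volumes are hypothetical, and it covers only the post-bronchodilator FEV1/FVC < 70% threshold, not the exposure history, symptoms, or exclusion of other diseases that the guideline also requires.

```python
def has_persistent_airflow_limitation(fev1_litres: float,
                                      fvc_litres: float,
                                      threshold: float = 0.70) -> bool:
    """Return True when the post-bronchodilator FEV1/FVC ratio is below the threshold.

    Both volumes are assumed to be post-bronchodilator values in litres.
    """
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres < threshold


# Hypothetical example values, not taken from the study data.
obstructed = has_persistent_airflow_limitation(1.8, 3.0)      # ratio 0.60
not_obstructed = has_persistent_airflow_limitation(3.2, 4.0)  # ratio 0.80
```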

2.3 Data acquisition

The clinical information collected in this study included baseline characteristics, medical history, laboratory tests, and clinical symptoms:

Baseline characteristics (6): age, sex, Body Mass Index (BMI) (7–10), history of smoking, history of exposure to secondhand smoke, family history of respiratory disease, and a definite diagnosis of COPD.

Medical history (11–13): history of hypertension, hyperlipidemia, diabetes mellitus, stroke, and osteoporosis;

Laboratory tests: white blood cell count (WBC) (14), platelet count (PLT), hemoglobin level (Hb), neutrophil percentage (NE%), red blood cell count (RBC) (15), eosinophil count (Eos), apolipoprotein, uric acid (UA), fasting blood glucose (FBG), and pulmonary function;

Clinical symptoms: cough; sputum production (three categories by volume: “0” indicates no or little sputum (sputum volume <50 mL); “1” indicates a moderate amount of sputum (sputum volume of 50–100 mL); “2” indicates a large amount of sputum (sputum volume >100 mL)); and wheezing (four categories by severity: “0” indicates no significant wheezing; “1” indicates occasional episodes of wheezing that are mild and do not interfere with sleep or activity; “2” indicates wheezing episodes that worsen with movement; and “3” indicates persistent episodes of wheezing, occurring at rest, unable to lie down).

2.4 Sample size estimation, culling, and missing value treatment

According to the events per variable (EPV) principle, the minimum sample size required to build a predictive model is 10 times the number of variables included (16). Samples with >10% missing values were excluded, and multiple imputation was used to fill in the remaining missing values.
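As a rough illustration of the EPV rule of thumb, the sketch below computes the minimum number of events needed for a given number of candidate parameters, and the number of parameters a given sample can support. The function names are hypothetical. Taking the smaller outcome class as the number of "events" is the usual convention and is assumed here; the counts in the example are the training-set group sizes reported in Section 3.2.2.

```python
def min_events_required(n_parameters: int, events_per_variable: int = 10) -> int:
    """Minimum number of events needed under the EPV rule of thumb."""
    return n_parameters * events_per_variable


def max_parameters_supported(n_with_outcome: int,
                             n_without_outcome: int,
                             events_per_variable: int = 10) -> int:
    """Largest number of parameters supported by the smaller outcome class."""
    limiting_class = min(n_with_outcome, n_without_outcome)
    return limiting_class // events_per_variable


# The final model has six estimated terms (age, second-hand smoke, cough,
# and three wheezing indicators).
events_needed = min_events_required(6)

# Training set: 388 COPD and 352 non-COPD patients.
supported = max_parameters_supported(388, 352)
```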

2.5 Statistical analysis

A predictive model was developed and validated based on the TRIPOD guidelines (17). SAS (version 9.4) was used to randomly divide the retrospective data into a training set and an internal validation set at a ratio of 7:3. The training set, internal validation set, and external validation set were used for modeling, internal validation of the model, and external validation of the model, respectively.
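The 7:3 random split can be sketched in a few lines. This is a Python analogue, not the authors' SAS procedure: the seed value and function name are hypothetical, and rounding the training size up is an assumption made so that 1,056 records give the reported 740/316 split.

```python
import math
import random


def split_indices(n_samples: int, train_fraction: float = 0.7, seed: int = 2025):
    """Randomly partition record indices into training and validation sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(indices)
    # Assumption: round up, so that 1,056 * 0.7 = 739.2 becomes 740.
    n_train = math.ceil(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_indices(1056)
```

Fixing the seed matters more than its value: it lets the same partition be regenerated when the analysis is rerun.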

IBM SPSS Statistics 26 was used to statistically analyze the data.

A systematic review (18) shows no performance benefit of machine learning over logistic regression for clinical prediction models. Furthermore, machine learning carries the risk of overfitting; thus, this study uses logistic regression to establish a prediction model.

First, in the training set, all independent variables were screened using univariate logistic regression to identify independent risk factors. All independent risk factors were then included in a multivariate logistic regression analysis, and the final predictive model was obtained through backward stepwise regression. This model was then applied to the internal and external validation sets for validation. The model’s performance was assessed by calculating the area under the receiver operating characteristic (ROC) curve (AUC). Additional evaluation indices included accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). A nomogram was constructed, and the calibration of the model was assessed using calibration curves. Finally, risk stratification of the model was performed in subgroups based on smoking history, BMI, pack-years of smoking, smoking cessation history, age at cessation, and eosinophil percentage (Eos%) (Figure 1).
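To make the discrimination measure concrete, the AUC can be computed directly as the proportion of (case, non-case) pairs in which the case receives the higher predicted score, with ties counted as one half. The sketch below is a minimal pure-Python version of that definition; it is not the software routine used in the study, and the function name and toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


# Toy data: labels are 1 for COPD and 0 for non-COPD.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```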

Figure 1

Flowchart depicting the study of COPD and non-COPD populations. Data from 5,916 patients were collected from three provincial tertiary hospitals from January 2018 to December 2022. Samples with over ten percent missing values were excluded. Missing values included lung function, BMI, respiratory disease history, anamnesis, eosinophils, apolipoprotein, cough, and wheezing. After multiple imputation, 1,056 samples remained (570 COPD, 486 non-COPD), divided in a 7:3 ratio into training (740) and internal validation (316) sets. The model was externally validated with 408 new patients. Evaluation involved subgroup analyses.

A technology roadmap of this study.

3 Results

3.1 Study sample

In this study, 5,916 individuals were initially included retrospectively. After excluding 755 cases with missing pulmonary function data, 590 with missing BMI, 254 with missing family history of respiratory disease, 1,332 with missing history of hypertension, hyperlipidemia, diabetes mellitus, or stroke, 1,657 with missing EOS data, 247 with missing apolipoprotein data, and 25 with missing cough, sputum, or wheezing data, a final sample of 1,056 participants was included in the analysis.

Multiple imputation was performed on the 1,056 samples, and the results were averaged across five imputations. There were no statistically significant differences in the data before and after imputation (p > 0.05), as shown in Table 1.

Table 1

| Characteristics | Missing values, N (%) | Before imputation | After imputation | Statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| PLT count, M (Q1, Q3) | 1 (0.09) | 205.00 (162.00, 254.00) | 204.50 (162.00, 254.00) | Z = 0.018 | 0.986 |
| Hb content, Mean ± SD | 1 (0.09) | 128.44 ± 18.89 | 128.41 ± 18.91 | t = 0.04 | 0.970 |
| NE%, Mean ± SD | 3 (0.28) | 61.94 ± 18.70 | 61.91 ± 18.76 | t = 0.04 | 0.969 |
| FBG, Mean ± SD | 66 (6.25) | 5.44 ± 1.54 | 5.45 ± 1.58 | t = −0.18 | 0.859 |
| Second-hand smoke, n (%) | 48 (4.55) | | | χ2 = 0.006 | 0.938 |

Sensitivity analysis before and after missing value imputation.

Additionally, a total of 408 individuals with complete data were prospectively included in this study.

3.2 Establishment and validation of the prediction model for COPD

3.2.1 Comparison of equivalence between the training set and internal validation set

Using SAS 9.4, all 1,056 cases were randomly split with a random seed into a training set comprising 70% (n = 740) and an internal validation set comprising 30% (n = 316). There were no significant differences between the two groups in terms of age, sex, BMI, history of hypertension, hyperlipidemia, diabetes mellitus, stroke, osteoporosis, WBC count, PLT count, Hb content, neutrophil ratio, RBC count, apolipoprotein A, FBG, EOS count, history of smoking, exposure to secondhand smoke, family history of respiratory diseases, cough, sputum, and wheezing (Table 2).

Table 2

| Characteristics, N (%) | Total sample (n = 1,056) | Internal validation set (n = 316) | Training set (n = 740) | t/χ2/Z | p-value |
| --- | --- | --- | --- | --- | --- |
| Age, Mean ± SD | 62.86 ± 14.83 | 62.75 ± 15.11 | 62.91 ± 14.72 | t = −0.16 | 0.871 |
| Sex | | | | χ2 = 0.396 | 0.529 |
|   Male | 670 (63.45) | 205 (64.87) | 465 (62.84) | | |
|   Female | 386 (36.55) | 111 (35.13) | 275 (37.16) | | |
| BMI, Mean ± SD | 23.39 ± 4.67 | 23.12 ± 4.06 | 23.50 ± 4.91 | t = −1.29 | 0.199 |
| BMI | | | | χ2 = 0.304 | 0.859 |
|   <18.5 | 99 (9.38) | 32 (10.13) | 67 (9.05) | | |
|   18.5–24 | 544 (51.52) | 161 (50.95) | 383 (51.76) | | |
|   ≥24 | 413 (39.11) | 123 (38.92) | 290 (39.19) | | |
| History of hypertension | | | | χ2 = 0.079 | 0.778 |
|   No | 715 (67.71) | 212 (67.09) | 503 (67.97) | | |
|   Yes | 341 (32.29) | 104 (32.91) | 237 (32.03) | | |
| History of hyperlipidemia | | | | χ2 = 1.843 | 0.175 |
|   No | 1,037 (98.20) | 313 (99.05) | 724 (97.84) | | |
|   Yes | 19 (1.80) | 3 (0.95) | 16 (2.16) | | |
| History of diabetes | | | | χ2 = 0.079 | 0.778 |
|   No | 715 (67.71) | 212 (67.09) | 503 (67.97) | | |
|   Yes | 341 (32.29) | 104 (32.91) | 237 (32.03) | | |
| History of stroke | | | | χ2 = 0.354 | 0.552 |
|   No | 1,017 (96.31) | 306 (96.84) | 711 (96.08) | | |
|   Yes | 39 (3.69) | 10 (3.16) | 29 (3.92) | | |
| History of osteoporosis | | | | χ2 = 1.841 | 0.175 |
|   No | 1,038 (98.30) | 308 (97.47) | 730 (98.65) | | |
|   Yes | 18 (1.70) | 8 (2.53) | 10 (1.35) | | |
| Uric acid, Mean ± SD | 304.03 ± 89.93 | 302.49 ± 88.11 | 304.69 ± 90.75 | t = −0.36 | 0.715 |
| WBC count, M (Q1, Q3) | 5.90 (4.70, 7.70) | 6.00 (4.90, 7.55) | 5.90 (4.70, 7.70) | Z = 0.655 | 0.513 |
| PLT count, M (Q1, Q3) | 204.50 (162.00, 254.00) | 207.00 (164.00, 253.50) | 203.50 (161.50, 255.00) | Z = 0.507 | 0.612 |
| Hb content, Mean ± SD | 128.41 ± 18.91 | 129.84 ± 17.88 | 127.81 ± 19.32 | t = 1.60 | 0.110 |
| NE%, Mean ± SD | 61.91 ± 18.76 | 61.23 ± 19.32 | 62.20 ± 18.53 | t = −0.78 | 0.438 |
| RBC count, M (Q1, Q3) | 4.20 (3.80, 4.60) | 4.20 (3.90, 4.70) | 4.10 (3.80, 4.60) | Z = 1.925 | 0.054 |
| Apolipoprotein A, M (Q1, Q3) | 1.30 (1.10, 1.50) | 1.30 (1.10, 1.50) | 1.30 (1.10, 1.50) | Z = 0.258 | 0.796 |
| FBG, Mean ± SD | 5.45 ± 1.58 | 5.55 ± 1.61 | 5.41 ± 1.56 | t = 1.26 | 0.207 |
| EOS count, M (Q1, Q3) | 0.10 (0.00, 0.45) | 0.11 (0.00, 0.40) | 0.10 (0.00, 0.47) | Z = 0.323 | 0.747 |
| Cigarette smoking | | | | χ2 = 0.112 | 0.738 |
|   No | 766 (72.54) | 227 (71.84) | 539 (72.84) | | |
|   Yes | 290 (27.46) | 89 (28.16) | 201 (27.16) | | |
| Second-hand smoke exposure | | | | χ2 = 1.037 | 0.308 |
|   No | 897 (84.94) | 263 (83.23) | 634 (85.68) | | |
|   Yes | 159 (15.06) | 53 (16.77) | 106 (14.32) | | |
| Family history of respiratory disease | | | | - | 0.509 |
|   No | 1,054 (99.81) | 315 (99.68) | 739 (99.86) | | |
|   Yes | 2 (0.19) | 1 (0.32) | 1 (0.14) | | |
| Cough | | | | χ2 = 0.089 | 0.765 |
|   No | 502 (47.54) | 148 (46.84) | 354 (47.84) | | |
|   Yes | 554 (52.46) | 168 (53.16) | 386 (52.16) | | |
| Phlegm | | | | Z = 1.355 | 0.176 |
|   0 | 522 (49.43) | 144 (45.57) | 378 (51.08) | | |
|   1 | 208 (19.70) | 70 (22.15) | 138 (18.65) | | |
|   2 | 326 (30.87) | 102 (32.28) | 224 (30.27) | | |
| Wheezing | | | | Z = 1.085 | 0.278 |
|   0 | 539 (51.04) | 148 (46.84) | 391 (52.84) | | |
|   1 | 178 (16.86) | 66 (20.89) | 112 (15.14) | | |
|   2 | 321 (30.40) | 99 (31.33) | 222 (30.00) | | |
|   3 | 18 (1.70) | 3 (0.95) | 15 (2.03) | | |
| COPD | | | | χ2 = 2.376 | 0.123 |
|   No | 486 (46.02) | 134 (42.41) | 352 (47.57) | | |
|   Yes | 570 (53.98) | 182 (57.59) | 388 (52.43) | | |

Comparability of the training set and internal validation set.

t refers to t-test, χ2 refers to chi-square test, Z refers to Mann–Whitney U test, - refers to Fisher’s exact test, SD refers to standard deviation, M refers to median, Q1 refers to 1st quartile, and Q3 refers to 3rd quartile.
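As a worked check of one row, the chi-square statistic for COPD status across the two sets can be recomputed from the four cell counts in Table 2. The sketch below uses the shortcut formula for a 2×2 table without continuity correction, which is an assumption about how the reported value was obtained; it reproduces the reported χ2 = 2.376 and p = 0.123. The function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: internal validation set, training set. Columns: non-COPD, COPD.
chi2, p_value = chi_square_2x2(134, 182, 352, 388)
```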

3.2.2 Basic characteristics of COPD and non-COPD patients in the training set

In the training set of 740 samples, there were 388 patients with COPD and 352 non-COPD patients. Compared with non-COPD patients, patients with COPD were older and more often male; more often had a history of hypertension, diabetes mellitus, cigarette smoking, and second-hand smoke exposure; had higher uric acid levels, WBC counts, and NE%; and more often presented with cough, phlegm, and wheezing. Patients with COPD also had lower BMI and apolipoprotein A levels than non-COPD patients (Table 3).

Table 3

| Characteristics, n (%) | Total sample (n = 740) | Non-COPD (n = 352) | COPD (n = 388) | t/χ2/Z | p-value |
| --- | --- | --- | --- | --- | --- |
| Age, Mean ± SD | 62.91 ± 14.72 | 54.69 ± 14.72 | 70.37 ± 10.02 | t = −16.77 | <0.001 |
| Sex | | | | χ2 = 124.288 | <0.001 |
|   Male | 465 (62.84) | 148 (42.05) | 317 (81.70) | | |
|   Female | 275 (37.16) | 204 (57.95) | 71 (18.30) | | |
| BMI, Mean ± SD | 23.50 ± 4.91 | 24.04 ± 4.94 | 23.01 ± 4.84 | t = 2.87 | 0.004 |
| BMI | | | | χ2 = 20.200 | <0.001 |
|   <18.5 | 67 (9.05) | 15 (4.26) | 52 (13.40) | | |
|   18.5–24 | 383 (51.76) | 184 (52.27) | 199 (51.29) | | |
|   ≥24 | 290 (39.19) | 153 (43.47) | 137 (35.31) | | |
| History of hypertension | | | | χ2 = 17.790 | <0.001 |
|   No | 503 (67.97) | 266 (75.57) | 237 (61.08) | | |
|   Yes | 237 (32.03) | 86 (24.43) | 151 (38.92) | | |
| History of hyperlipidemia | | | | χ2 = 0.494 | 0.482 |
|   No | 724 (97.84) | 343 (97.44) | 381 (98.20) | | |
|   Yes | 16 (2.16) | 9 (2.56) | 7 (1.80) | | |
| History of diabetes | | | | χ2 = 17.790 | <0.001 |
|   No | 503 (67.97) | 266 (75.57) | 237 (61.08) | | |
|   Yes | 237 (32.03) | 86 (24.43) | 151 (38.92) | | |
| History of stroke | | | | χ2 = 0.463 | 0.496 |
|   No | 711 (96.08) | 340 (96.59) | 371 (95.62) | | |
|   Yes | 29 (3.92) | 12 (3.41) | 17 (4.38) | | |
| History of osteoporosis | | | | - | 0.111 |
|   No | 730 (98.65) | 350 (99.43) | 380 (97.94) | | |
|   Yes | 10 (1.35) | 2 (0.57) | 8 (2.06) | | |
| Uric acid, Mean ± SD | 304.69 ± 90.75 | 295.61 ± 80.93 | 312.94 ± 98.20 | t = −2.63 | 0.009 |
| WBC count, M (Q1, Q3) | 5.90 (4.70, 7.70) | 5.55 (4.35, 6.70) | 6.30 (5.10, 8.30) | Z = −5.975 | <0.001 |
| PLT count, M (Q1, Q3) | 203.50 (161.50, 255.00) | 209.50 (162.00, 263.50) | 196.00 (161.00, 246.00) | Z = 1.412 | 0.158 |
| Hb content, Mean ± SD | 127.81 ± 19.32 | 127.40 ± 19.15 | 128.18 ± 19.48 | t = −0.55 | 0.585 |
| NE%, Mean ± SD | 62.20 ± 18.53 | 60.37 ± 16.11 | 63.86 ± 20.35 | t = −2.60 | 0.010 |
| RBC count, M (Q1, Q3) | 4.10 (3.80, 4.60) | 4.10 (3.80, 4.50) | 4.10 (3.80, 4.70) | Z = −0.240 | 0.810 |
| Apolipoprotein A, M (Q1, Q3) | 1.30 (1.10, 1.50) | 1.40 (1.30, 1.60) | 1.20 (1.10, 1.40) | Z = 8.590 | <0.001 |
| FBG, Mean ± SD | 5.41 ± 1.56 | 5.34 ± 1.27 | 5.48 ± 1.78 | t = −1.21 | 0.228 |
| EOS count, M (Q1, Q3) | 0.10 (0.00, 0.47) | 0.10 (0.00, 0.40) | 0.10 (0.00, 0.50) | Z = −1.719 | 0.086 |
| Cigarette smoking | | | | χ2 = 128.920 | <0.001 |
|   No | 539 (72.84) | 325 (92.33) | 214 (55.15) | | |
|   Yes | 201 (27.16) | 27 (7.67) | 174 (44.85) | | |
| Second-hand smoke | | | | χ2 = 95.141 | <0.001 |
|   No | 634 (85.68) | 348 (98.86) | 286 (73.71) | | |
|   Yes | 106 (14.32) | 4 (1.14) | 102 (26.29) | | |
| Family history of respiratory disease | | | | - | 1.000 |
|   No | 739 (99.86) | 352 (100.00) | 387 (99.74) | | |
|   Yes | 1 (0.14) | 0 (0.00) | 1 (0.26) | | |
| Cough | | | | χ2 = 479.547 | <0.001 |
|   No | 354 (47.84) | 317 (90.06) | 37 (9.54) | | |
|   Yes | 386 (52.16) | 35 (9.94) | 351 (90.46) | | |
| Phlegm | | | | Z = −19.407 | <0.001 |
|   0 | 378 (51.08) | 317 (90.06) | 61 (15.72) | | |
|   1 | 138 (18.65) | 17 (4.83) | 121 (31.19) | | |
|   2 | 224 (30.27) | 18 (5.11) | 206 (53.09) | | |
| Wheezing | | | | Z = −19.690 | <0.001 |
|   0 | 391 (52.84) | 325 (92.33) | 66 (17.01) | | |
|   1 | 112 (15.14) | 11 (3.13) | 101 (26.03) | | |
|   2 | 222 (30.00) | 15 (4.26) | 207 (53.35) | | |
|   3 | 15 (2.03) | 1 (0.28) | 14 (3.61) | | |

Basic characteristics of COPD and non-COPD in the training set.

t refers to t-test, χ2 refers to chi-square test, Z refers to Mann–Whitney U test, −: Fisher’s exact test, SD refers to standard deviation, M refers to median, Q1 refers to 1st quartile, Q3 refers to 3rd quartile.

3.2.3 Univariate logistic regression in the training set

The univariate logistic regression analysis of the training set revealed that the following factors were independent risk factors for COPD: age, sex, BMI, history of hypertension, history of diabetes, uric acid levels, WBC count, NE%, RBC count, apolipoprotein A, history of cigarette smoking, exposure to secondhand smoke, presence of cough, sputum, and wheezing (Table 4).

Table 4

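| Characteristics | β | S.E. | Wald | OR (95% CI) | p-value |
| --- | --- | --- | --- | --- | --- |
| Age | 0.105 | 0.009 | 151.315 | 1.11 (1.09–1.13) | <0.001 |
| Sex | | | | | |
|   Male | Reference | | | | |
|   Female | −1.817 | 0.170 | 114.263 | 0.16 (0.12–0.23) | <0.001 |
| BMI | | | | | |
|   <18.5 | 1.165 | 0.310 | 14.081 | 3.21 (1.74–5.89) | <0.001 |
|   18.5–24 | Reference | | | | |
|   ≥24 | −0.189 | 0.156 | 1.468 | 0.83 (0.61–1.12) | 0.226 |
| History of hypertension | | | | | |
|   No | Reference | | | | |
|   Yes | 0.678 | 0.162 | 17.544 | 1.97 (1.43–2.71) | <0.001 |
| History of hyperlipidemia | | | | | |
|   No | Reference | | | | |
|   Yes | −0.356 | 0.509 | 0.488 | 0.70 (0.26–1.90) | 0.485 |
| History of diabetes | | | | | |
|   No | Reference | | | | |
|   Yes | 0.678 | 0.162 | 17.544 | 1.97 (1.43–2.71) | <0.001 |
| History of stroke | | | | | |
|   No | Reference | | | | |
|   Yes | 0.261 | 0.384 | 0.461 | 1.30 (0.61–2.76) | 0.497 |
| History of osteoporosis | | | | | |
|   No | Reference | | | | |
|   Yes | 1.304 | 0.794 | 2.697 | 3.68 (0.78–17.47) | 0.101 |
| Uric acid | 0.002 | 0.001 | 6.657 | 1.01 (1.01–1.01) | 0.010 |
| WBC count | 0.183 | 0.033 | 31.308 | 1.20 (1.13–1.28) | <0.001 |
| PLT count | −0.001 | 0.001 | 0.311 | 1.00 (1.00–1.00) | 0.577 |
| Hb content | 0.002 | 0.004 | 0.299 | 1.00 (0.99–1.01) | 0.584 |
| NE% | 0.010 | 0.004 | 6.441 | 1.01 (1.01–1.02) | 0.011 |
| RBC count | 0.077 | 0.033 | 5.601 | 1.08 (1.01–1.15) | 0.018 |
| Apolipoprotein A | −1.659 | 0.283 | 34.295 | 0.19 (0.11–0.33) | <0.001 |
| FBG | 0.057 | 0.048 | 1.397 | 1.06 (0.96–1.16) | 0.237 |
| EOS count | 0.060 | 0.043 | 1.920 | 1.06 (0.98–1.15) | 0.166 |
| Cigarette smoking | | | | | |
|   No | Reference | | | | |
|   Yes | 2.281 | 0.225 | 102.966 | 9.79 (6.30–15.21) | <0.001 |
| Second-hand smoke | | | | | |
|   No | Reference | | | | |
|   Yes | 3.435 | 0.516 | 44.326 | 31.03 (11.29–85.29) | <0.001 |
| Family history of respiratory disease | | | | | |
|   No | Reference | | | | |
|   Yes | 12.112 | 447.33 | 0.001 | | 0.978 |
| Cough | | | | | |
|   No | Reference | | | | |
|   Yes | 4.453 | 0.248 | 321.953 | 85.92 (52.82–139.75) | <0.001 |
| Phlegm | | | | | |
|   0 | Reference | | | | |
|   1 | 3.611 | 0.294 | 150.474 | 36.99 (20.77–65.86) | <0.001 |
|   2 | 4.085 | 0.283 | 208.756 | 59.47 (34.17–103.51) | <0.001 |
| Wheezing symptoms | | | | | |
|   0 | Reference | | | | |
|   1 | 3.811 | 0.345 | 122.034 | 45.21 (22.99–88.91) | <0.001 |
|   2 | 4.219 | 0.300 | 198.376 | 67.94 (37.77–122.20) | <0.001 |
|   3 | 4.233 | 1.044 | 16.447 | 68.92 (8.91–533.09) | <0.001 |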

Univariate logistic regression in the training set.

3.3 COPD predictive model and nomogram

All independent risk factors identified in the univariate logistic regression were included in a multivariate logistic regression analysis using backward stepwise regression to establish a predictive model for COPD. The final factors included in the predictive model were age, secondhand smoke exposure, coughing, and wheezing (Table 5).

Table 5

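| Characteristics | β | S.E. | Wald | OR (95% CI) | p-value |
| --- | --- | --- | --- | --- | --- |
| Constant | −5.920 | 0.863 | 47.040 | | <0.001 |
| Age | 0.047 | 0.013 | 13.177 | 1.05 (1.02–1.08) | <0.001 |
| Second-hand smoke | | | | | |
|   No | Reference | | | | |
|   Yes | 2.113 | 0.571 | 13.694 | 8.27 (2.70–25.34) | <0.001 |
| Cough | | | | | |
|   No | Reference | | | | |
|   Yes | 3.158 | 0.317 | 99.290 | 23.52 (12.64–43.77) | <0.001 |
| Wheezing | | | | | |
|   0 | Reference | | | | |
|   1 | 1.801 | 0.434 | 17.216 | 6.06 (2.59–14.19) | <0.001 |
|   2 | 3.063 | 0.372 | 67.755 | 21.40 (10.32–44.37) | <0.001 |
|   3 | 2.396 | 1.213 | 3.900 | 10.97 (1.02–118.28) | 0.048 |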

Variables included in final model.

The odds of COPD increased by about 5% for each additional year of age (OR = 1.05, 95% CI: 1.02–1.08). Individuals exposed to secondhand smoke had 8.27 times the odds of COPD compared with those without such exposure (OR = 8.27, 95% CI: 2.70–25.34). Patients with cough had 23.52 times the odds of COPD compared with those without this symptom (OR = 23.52, 95% CI: 12.64–43.77). The different grades of wheezing were also associated with different levels of COPD risk.
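The odds ratios and confidence intervals in Table 5 follow from the regression coefficients: OR = exp(β), with a Wald 95% interval of exp(β ± 1.96 × S.E.). The sketch below recomputes two rows from the reported β and S.E. values. The function name is hypothetical, and small differences in the last digit (for example 25.33 against the reported 25.34) are expected because the published coefficients are rounded to three decimals.

```python
import math


def odds_ratio_with_ci(beta: float, standard_error: float, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper


# Coefficients and standard errors as reported in Table 5.
age_or = odds_ratio_with_ci(0.047, 0.013)
smoke_or = odds_ratio_with_ci(2.113, 0.571)
```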

The formula for the final model is as follows: y = −5.920 + 0.047 (age) + 2.113 (history of secondhand smoke) + 3.158 (having cough) + 1.801 (wheezing symptom 1) + 3.063 (wheezing symptom 2) + 2.396 (wheezing symptom 3), where y = logit(p) = ln[p/(1 − p)], so that p = 1/(1 + e^(−y)). Here p represents the predicted probability of COPD and lies between 0 and 1. A higher p indicates a greater risk of COPD.
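To show how the formula is applied to an individual, the sketch below evaluates the linear predictor y and converts it to a probability with the standard logistic transform p = 1/(1 + e^(−y)). The coefficients are those reported in Table 5; the function names and the example patient are hypothetical, and applying the reported cut-off of 0.258 on the probability scale is an assumption.

```python
import math

# Coefficients copied from the reported final model (Table 5).
INTERCEPT = -5.920
BETA_AGE = 0.047
BETA_SECOND_HAND_SMOKE = 2.113
BETA_COUGH = 3.158
BETA_WHEEZING = {0: 0.0, 1: 1.801, 2: 3.063, 3: 2.396}  # grade 0 is the reference

# Assumption: the reported cut-off is compared against the probability p.
CUTOFF = 0.258


def linear_predictor(age_years, second_hand_smoke, cough, wheezing_grade):
    """Return y, the log-odds of COPD, for one individual."""
    return (INTERCEPT
            + BETA_AGE * age_years
            + BETA_SECOND_HAND_SMOKE * int(second_hand_smoke)
            + BETA_COUGH * int(cough)
            + BETA_WHEEZING[wheezing_grade])


def predicted_probability(age_years, second_hand_smoke, cough, wheezing_grade):
    """Convert the log-odds to a probability between 0 and 1."""
    y = linear_predictor(age_years, second_hand_smoke, cough, wheezing_grade)
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical patient: 70 years old, no second-hand smoke, cough, wheezing grade 2.
y_example = linear_predictor(70, False, True, 2)
p_example = predicted_probability(70, False, True, 2)
flagged = p_example > CUTOFF
```

For this hypothetical patient y is 3.591, which corresponds to a predicted probability of roughly 0.97.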

Using R 4.1.3, a nomogram was plotted where each diagnostic factor corresponds to a score (also called a point). The scores from these factors were summed to obtain a total score (total points), which correlates with the corresponding risk of COPD (Figure 2).

Figure 2

Nomogram chart assessing risk based on points, age, second-hand smoke exposure, cough, and gasp frequency. Points range from 0 to 100, age from 10 to 100 years, and gasp from 0 to 3. Risk is plotted from 0.1 to 0.9. Total points range from 0 to 350.

Nomogram of COPD prediction model.

3.4 Characterization of the external validation set

The external validation set comprised a total of 408 samples, consisting of 141 patients with COPD and 267 non-COPD individuals (Table 6).

Table 6

| Characteristics, N (%) | Total sample (n = 408) | Non-COPD (n = 267) | COPD (n = 141) | t/χ2/Z | p-value |
| --- | --- | --- | --- | --- | --- |
| Age, Mean ± SD | 66.71 ± 18.50 | 64.08 ± 21.65 | 71.70 ± 8.11 | t = −5.12 | <0.001 |
| Sex | | | | χ2 = 11.684 | <0.001 |
|   Male | 280 (68.63) | 168 (62.92) | 112 (79.43) | | |
|   Female | 128 (31.37) | 99 (37.08) | 29 (20.57) | | |
| BMI, Mean ± SD | 24.29 ± 12.37 | 24.75 ± 14.91 | 23.42 ± 4.73 | t = 1.33 | 0.184 |
| Second-hand smoke | | | | χ2 = 3.662 | 0.056 |
|   No | 202 (49.51) | 123 (46.07) | 79 (56.03) | | |
|   Yes | 206 (50.49) | 144 (53.93) | 62 (43.97) | | |
| Cough | | | | χ2 = 7.146 | 0.008 |
|   No | 127 (31.13) | 95 (35.58) | 32 (22.70) | | |
|   Yes | 281 (68.87) | 172 (64.42) | 109 (77.30) | | |
| Wheezing symptoms | | | | | <0.001 |
|   0 | 238 (58.33) | 190 (71.16) | 48 (34.04) | | |
|   1 | 95 (23.28) | 57 (21.35) | 38 (26.95) | | |
|   2 | 68 (16.67) | 16 (5.99) | 52 (36.88) | | |
|   3 | 7 (1.72) | 4 (1.50) | 3 (2.13) | | |
| WBC count, M (Q1, Q3) | 5.35 (4.50, 6.40) | 5.70 (4.70, 6.50) | 5.00 (4.50, 6.00) | Z = −0.877 | 0.381 |
| PLT count, M (Q1, Q3) | 182.00 (153.00, 227.00) | 161.50 (147.00, 206.00) | 224.0 (174.00, 252.00) | Z = 2.852 | 0.004 |
| RBC count, Mean ± SD | 4.52 ± 0.58 | 4.51 ± 0.55 | 4.53 ± 0.62 | t = −0.15 | 0.879 |
| EOS count, M (Q1, Q3) | 0.32 (0.10, 1.70) | 1.00 (0.10, 1.90) | 0.14 (0.10, 0.20) | Z = −2.062 | 0.039 |
| Cigarette smoking | | | | χ2 = 1.545 | 0.214 |
|   No | 192 (55.17) | 122 (52.81) | 70 (59.83) | | |
|   Yes | 156 (44.83) | 109 (47.19) | 47 (40.17) | | |
| Phlegm | | | | χ2 = 31.179 | <0.001 |
|   0 | 142 (35.24) | 116 (44.11) | 26 (18.57) | | |
|   1 | 183 (45.41) | 111 (42.21) | 72 (51.43) | | |
|   2 | 78 (19.35) | 36 (13.69) | 42 (30.00) | | |

Baseline comparison of external validation set.

3.5 Validation of prediction model

The model was tested on both internal and external validation sets to assess its discrimination and calibration.

3.5.1 Discrimination test

In the COPD prediction model, the area under the curve (AUC) for the training set was 0.964 (95% CI: 0.950–0.978), with an accuracy of 94.1%, a sensitivity of 98.5% and a specificity of 89.2%. For the internal validation set, the AUC was 0.976 (95% CI: 0.962–0.990), with an accuracy of 93.4%, a sensitivity of 96.2%, and a specificity of 89.6%. These results indicate that the model effectively discriminates samples from the same source and demonstrates excellent predictive capability for assessing the risk of COPD (Figure 3).

Figure 3

ROC curve showing three lines: green for the training set (AUC: 0.964), orange for the test set (AUC: 0.976), and blue for an outer set (AUC: 0.691). The curve plots sensitivity against 1-specificity. A diagonal dashed line represents a random classifier.

ROC curves of the model and its internal and external validation sets.

The AUC of the external validation set was 0.691 (95% CI: 0.638–0.744), with an accuracy of 49.3%, a sensitivity of 94.3%, and a specificity of 25.5%. The PPV was 40.1% and the NPV was 89.5%, indicating limited generalizability of the model (Table 7).

Table 7

| Characteristics | Training set | Internal validation set | External validation set |
| --- | --- | --- | --- |
| Cutoff | 0.258 | 0.258 | 0.258 |
| AUC (95% CI) | 0.964 (0.950–0.978) | 0.976 (0.962–0.990) | 0.691 (0.638–0.744) |
| Accuracy (95% CI) | 0.941 (0.921–0.956) | 0.934 (0.900–0.958) | 0.493 (0.443–0.542) |
| Sensitivity (95% CI) | 0.985 (0.967–0.994) | 0.962 (0.922–0.984) | 0.943 (0.891–0.975) |
| Specificity (95% CI) | 0.892 (0.855–0.922) | 0.896 (0.831–0.942) | 0.255 (0.204–0.311) |
| NPV (95% CI) | 0.981 (0.960–0.993) | 0.945 (0.890–0.978) | 0.895 (0.803–0.953) |
| PPV (95% CI) | 0.910 (0.878–0.935) | 0.926 (0.879–0.959) | 0.401 (0.347–0.456) |

Results of predictive model and internal and external validation.
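The point estimates in the external validation column are mutually consistent with a single confusion matrix. The counts used below (133 true positives, 8 false negatives, 68 true negatives, 199 false positives) are not reported in the article; they are back-calculated here from the reported sensitivity, specificity, and group sizes (141 COPD, 267 non-COPD), so they should be read as an assumption. The function name is hypothetical.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # proportion of COPD cases detected
        "specificity": tn / (tn + fp),   # proportion of non-COPD correctly ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Back-calculated counts for the external validation set (assumption, see above).
external = classification_metrics(tp=133, fn=8, tn=68, fp=199)
rounded = {name: round(value, 3) for name, value in external.items()}
```

With these counts, the low specificity explains the low accuracy and PPV: most non-COPD individuals in the external set fall above the cut-off.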

The cutoff value of the predictive model was 0.258: when the predicted probability p was > 0.258, the model classified the individual as having COPD; otherwise, the individual was classified as not having COPD.

3.5.2 Calibration test

A calibration curve was constructed to determine the consistency of the logistic regression model (19). The ideal curve aligns closely with the bias-corrected curve, indicating excellent calibration of the model (Figures 4–6).

Figure 4

Calibration plot showing actual probability versus predicted probability. Three lines are depicted: a dotted line for apparent probability, a solid line for bias-corrected probability, and a dashed line for ideal probability. The plot illustrates the calibration of a predictive model.

Training set calibration curve.

Figure 5

Calibration plot showing predicted probability on the x-axis and actual probability on the y-axis. Three lines represent apparent, bias-corrected, and ideal probabilities. The bias-corrected line closely follows the ideal line, indicating good model calibration.

Internal validation set calibration curve.

Figure 6

Calibration curve comparing predicted probability versus actual probability. Three lines are shown: apparent (dotted), bias-corrected (solid), and ideal (dashed). The bias-corrected line closely follows the ideal line initially, then diverges. The x-axis is labeled "Predicted Probability," and the y-axis is labeled "Actual Probability."

External validation set calibration curve.

3.6 Stratified analyses based on some risk factors

During the modeling process, certain variables had to be excluded due to excessive missing values. However, based on guidelines and numerous previous studies, BMI (20–24), smoking history (3, 25–27), and smoking cessation history (28, 29) may be high-risk factors for COPD development and may play a significant role in the diagnosis and evaluation of COPD. Therefore, in this study, the model was applied across various subgroups of the population. The results indicated that, except for subgroups with insufficient data for fitting, the model demonstrated robust predictive capability across populations with or without a history of smoking, different BMI levels, varying smoking cessation histories, ≥40 pack-years of smoking, cessation of smoking at age <65 years, and different percentages of EOS (Table 8).

Table 8

| Subgroup | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|
| Cigarette smoking | | | | |
| No | 0.964 (0.918–1.000) | 0.955 (0.889–0.988) | 0.974 (0.910–0.997) | 0.818 (0.482–0.977) |
| Yes | 0.977 (0.961–0.992) | 0.925 (0.883–0.956) | 0.952 (0.891–0.984) | 0.902 (0.836–0.949) |
| BMI | | | | |
| <18.5 | 1.000 (1.000–1.000) | 1.000 (0.891–1.000) | 1.000 (0.872–1.000) | 1.000 (0.478–1.000) |
| 18.5–23.9 | 0.978 (0.959–0.996) | 0.929 (0.877–0.964) | 0.955 (0.888–0.987) | 0.938 (0.850–0.983) |
| 24–27.9 | 0.986 (0.968–1.000) | 0.938 (0.870–0.977) | 0.980 (0.891–0.999) | 0.896 (0.773–0.965) |
| ≥28 | 0.896 (0.764–1.000) | 0.800 (0.593–0.932) | 0.857 (0.572–0.982) | 0.727 (0.390–0.940) |
| Pack-years of smoking (a) | | | | |
| <40 | - | - | - | - |
| ≥40 | 0.979 (0.921–1.000) | 0.947 (0.740–0.999) | 1.000 (0.794–1.000) | 0.667 (0.094–0.992) |
| History of quitting smoking | | | | |
| Yes | 0.975 (0.916–1.000) | 0.933 (0.779–0.992) | 1.000 (0.872–1.000) | 0.333 (0.008–0.906) |
| No | 0.963 (0.903–1.000) | 0.949 (0.859–0.989) | 0.961 (0.865–0.995) | 0.875 (0.473–0.997) |
| Age of cessation | | | | |
| <65 | 0.976 (0.962–0.990) | 0.928 (0.892–0.954) | 0.959 (0.917–0.983) | 0.888 (0.822–0.936) |
| ≥65 | - | - | - | - |
| Eos % | | | | |
| <2% | 0.978 (0.964–0.991) | 0.957 (0.914–0.983) | 0.893 (0.823–0.942) | 0.939 (0.879–0.975) |
| ≥2% | 0.953 (0.878–1.000) | 1.000 (0.815–1.000) | 0.846 (0.546–0.981) | 1.000 (0.715–1.000) |

Risk stratification of the model.

- indicates insufficient frequency to fit.

(a) indicates grouping based on median.
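The 95% confidence intervals in Table 8 are reported without naming the method used. Purely as an illustration, the following sketch computes an exact (Clopper-Pearson) binomial interval for a proportion using only the Python standard library. The function names and the example count of 5 correct out of 5 are assumptions chosen for demonstration; the source does not state the subgroup counts.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""

    def solve(f, target):
        # Bisection on [0, 1]; f must be increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2, i.e. 1 - CDF = 1 - alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical example: 5 correct classifications out of 5.
low, high = clopper_pearson(5, 5)
print(f"{low:.3f}-{high:.3f}")  # prints 0.478-1.000
```

With 5 of 5 correct, the lower limit is about 0.478, which coincides with the lower limit reported for specificity in the BMI <18.5 row. This is consistent with, but does not confirm, the use of an exact binomial method. It also shows why estimates from very small subgroups carry wide intervals even when the point estimate is 1.000.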

4 Discussion

With China’s economic and social development, its elderly population is growing rapidly. Attention to the health of the elderly has gradually shifted toward disease prevention, improving individual function, promoting good health, and prolonging healthy life expectancy. Healthy China 2030 (30) emphasizes that COPD is characterized by high prevalence, disability, mortality, and disease burden.

In this study, we developed a predictive model for COPD using large-sample retrospective data, identified four reliable risk factors for COPD, and derived predictive formulas. In discrimination and calibration tests, the formulas accurately predicted the probability of COPD within the same sample source but showed only average diagnostic effectiveness in external populations.

Age is a significant risk factor for COPD. Prevalence rises with age, likely due to age-related decline in lung function and cumulative exposure to environmental pollutants such as tobacco smoke (31). COPD is highly prevalent in individuals aged over 40 years. According to the China Pulmonary Health (CPH) study published in 2018 (3), the prevalence of COPD among individuals aged over 40 years in China was 13.7%. In our study, the average age of patients with COPD included in the modeling was 70.37 years. The coefficient of age in the final model was 0.047, indicating a positive association between age and COPD risk. This finding reaffirms the demographic distribution characteristics of COPD and underscores the impact of age on its development.
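To make the age coefficient easier to interpret, the sketch below converts it to an odds ratio. It assumes that 0.047 is a log-odds coefficient from the logistic regression model, that age is entered in years, and that the other predictors are held fixed. The function name `odds_ratio` is illustrative.

```python
from math import exp

AGE_COEFFICIENT = 0.047  # coefficient of age reported in the text (assumed per year, log-odds scale)


def odds_ratio(coefficient, delta=1.0):
    """Odds ratio for an increase of `delta` units in the predictor."""
    return exp(coefficient * delta)


print(f"per 1 year:   {odds_ratio(AGE_COEFFICIENT):.3f}")      # prints 1.048
print(f"per 10 years: {odds_ratio(AGE_COEFFICIENT, 10):.3f}")  # prints 1.600
```

Under these assumptions, each additional year of age multiplies the odds of COPD by about 1.05, and each additional decade by about 1.6.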

The effects of age on COPD are mainly reflected in the following aspects. First, there is a natural decline in lung function as individuals age. This decline includes reduced respiratory function, decreased alveolar elasticity, thinning of the alveolar wall, and increased airway resistance, leading to the emergence of symptoms such as dyspnea and cough. Second, aging correlates with declining nutritional status (9), impairing food intake and absorption. In patients with COPD, the body is in a highly catabolic state, leading to increased daily energy expenditure and a significantly increased risk of malnutrition. Long-term malnutrition leads to muscle atrophy, especially atrophy of the respiratory muscles, which makes the lungs less compliant and causes a decline in pulmonary ventilation (32). Third, poor nutritional status weakens immune defenses. Without adequate nutrition, the immune system cannot function properly and the risk of lung infection increases (33). In the elderly, each infection poses a significant threat to lung function, and the resultant damage is difficult to reverse. In patients with COPD, inflammatory irritation of the airways persists, and the airways are constantly remodeling (34). Repeated infections exacerbate inflammation and airway remodeling, further worsening pre-existing airway obstruction.

The primary components of tobacco are tar and nicotine, which cause inflammation, oxidative stress, and apoptosis. Cigarette smoke induces chronic inflammatory responses throughout the body by increasing the levels of inflammatory factors such as IL-1, IL-6, and TNF-α (35). The brain is highly sensitive to hypoxia, and cigarette smoke impairs pulmonary ventilation and aggravates hypoxemia (31). This situation further slows cellular metabolism and promotes neuronal apoptosis (36).

COPD is a heterogeneous lung condition characterized by persistent airflow obstruction due to airway and/or alveolar abnormalities, often accompanied by chronic cough. Pathological changes in COPD involve the airways, lung parenchyma, and blood vessels. Airway alterations, in particular, play a significant role in causing cough (34), as they sustain persistent inflammation leading to mucus hypersecretion and ciliary dysfunction (37). In addition, narrowing of the airways makes it difficult to expel sputum from the lungs, which in turn stimulates the airways and causes cough. Many patients with COPD also have allergic diseases, such as asthma and allergic rhinitis, which heighten airway receptor sensitivity and exacerbate cough in response to allergic triggers. Patients with COPD are also susceptible to bacterial and viral infections due to decreased immunity, which further stimulate the airways and cause coughing.

Wheezing is common in patients with COPD, especially in severe disease or acute exacerbation. This study categorized wheezing into four distinct levels of symptoms to assess its diagnostic utility for mild COPD. The results showed that the different levels of wheezing symptoms were diagnostic factors of COPD, suggesting that the presence of wheezing symptoms holds diagnostic significance for identifying COPD once they manifest.

While several prediction models for COPD have been developed in China, most of them focused on studying risk factors for acute exacerbation and have been conducted within specific medical units or regions. In contrast, the present study is a multi-center clinical study with modeling samples from provincial-level tertiary hospitals in Zhejiang, Jiangxi, and Chengdu. This approach has allowed us to achieve a larger sample size, enhancing the regional representativeness and practical application of our final model. The external validation set utilized data from medical examination centers and health centers affiliated with tertiary hospitals, ensuring sample diversity across a broad spectrum. This approach effectively demonstrates whether our model can be widely applied in clinical settings.

China is a country with a high prevalence of COPD. Although lung function is an important basis for diagnosing COPD, many regions lack the facilities for lung function testing. Therefore, we aimed to establish a predictive model that relies as far as possible on symptoms and routine biological indicators, because such a model would have broader application potential. For example, during annual physical examinations, if a doctor finds that a patient has reached the high-risk threshold predicted by the model, the patient can be referred for pulmonary function testing. Additionally, the model can be used to stratify the risk of COPD among the examined population, thereby better assessing the risk of COPD.

However, the sample for this study is not yet sufficient, particularly for assessing specificity in external validation. One main limitation is sample selection bias. The retrospective data used to build our model came from three provinces in eastern and southwestern China, but because of the sudden outbreak of COVID-19 we were able to include external validation data from only one province in eastern China, which introduces sample bias. Additionally, because the model was built on large-scale retrospective data, many indicators had to be excluded because more than 10% of their values were missing, although we still examined some indicators we deemed important through risk stratification. A further limitation is that the model is suited mainly to the elderly population. Despite these sources of bias, the model is based on a multicenter design and has undergone rigorous validation, and we believe it still has significant value.

A study published in Lancet Respiratory Medicine in 2020 (38) developed a predictive tool to forecast, at an individual level, the rate and severity of COPD exacerbations, reported its performance in an independent external cohort, and illustrated its potential clinical application with case studies. In 2022 (39), Thorax published an article using causal machine learning to explore the impact of individualized treatment on COPD exacerbations. These two studies suggest that identifying individual responses to COPD progression, exacerbations, and treatment may be more valuable for the clinical diagnosis and management of COPD. This provides significant inspiration for our future COPD research. Our team is continuing its clinical research on COPD: we are still enrolling patients with COPD from different provinces and aim to develop a more adaptive predictive model, and eventually a digital diagnostic tool.

5 Conclusion

We have developed a predictive model for COPD for clinical use, enabling healthcare professionals, especially those in primary care settings, to quickly and conveniently assess the risk of COPD, thereby promoting timely diagnosis and treatment. However, this model still needs further verification. Until the model is more refined, it is recommended to use it with caution.

Statements

Data availability statement

The datasets presented in this article are not readily available because all data are stored in the First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, China. The data used and/or analyzed during the current study can be obtained from the corresponding author. However, the data are not publicly available due to privacy or ethical restrictions. Requests to access the datasets should be directed to WZ, .

Ethics statement

The studies involving humans were approved by the First Affiliated Hospital of Zhejiang Chinese Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YW: Visualization, Data curation, Validation, Writing – original draft. YL: Writing – original draft, Visualization, Data curation. QL: Methodology, Data curation, Writing – review & editing. RZ: Writing – review & editing, Funding acquisition, Project administration. BY: Investigation, Validation, Writing – review & editing. HX: Writing – review & editing, Investigation, Resources. XQ: Validation, Writing – review & editing. YY: Methodology, Data curation, Writing – original draft. KN: Writing – review & editing, Validation, Investigation. JZ: Data curation, Writing – review & editing. XM: Data curation, Writing – review & editing. RG: Project administration, Resources, Data curation, Writing – review & editing, Validation. ZW: Validation, Funding acquisition, Writing – review & editing, Investigation, Supervision, Resources, Data curation, Methodology, Project administration.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study was funded by the National Key R&D Program of China (2018YFC2002500).

Acknowledgments

We would like to thank Jinan University, the Affiliated Hospital of Chengdu Chinese Medical University, the Affiliated Hospital of Jiangxi Chinese Medical University, and The First People’s Hospital of Jiashan County.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Global Initiative for Chronic Obstructive Lung Disease. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease (2023). Available online at: https://goldcopd.org/ (Accessed January 15, 2023).

2. WHO. Mortality and global health estimates. Available online at: https://www.who.int/data/gho/data/themes (Accessed March 4, 2022).

3. Wang C, Xu J, Yang L, Xu Y, Zhang X, Bai C, et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China pulmonary health [CPH] Study): a national cross-sectional study. Lancet. (2018) 391:1706–17. doi: 10.1016/S0140-6736(18)30841-9

4. Yahong C. Interpretation of the GOLD global strategy for the diagnosis, treatment, and prevention of chronic obstructive pulmonary disease 2021. Chin J Front Med. (2021) 13:16–37. doi: 10.12037/YXQY.2021.01-02

5. Christenson SA, Smith BM, Bafadhel M, Putcha N. Chronic obstructive pulmonary disease. Lancet. (2022) 399:2227–42. doi: 10.1016/S0140-6736(22)00470-6

6. Global Initiative for Chronic Obstructive Lung Disease. Global strategy for the diagnosis, management and prevention of chronic obstructive pulmonary disease 2019 report. Available online at: https://goldcopd.org/gold-reports/ (Accessed December 2, 2018).

7. Liu S, Zhou Y, Wang X, Wang D, Lu J, Zheng J, et al. Biomass fuels are the probable risk factor for chronic obstructive pulmonary disease in rural South China. Thorax. (2007) 62:889–97. doi: 10.1136/thx.2006.061457

8. Zhong N, Wang C, Yao W, Chen P, Kang J, Huang S, et al. Prevalence of chronic obstructive pulmonary disease in China: a large, population-based survey. Am J Respir Crit Care Med. (2007) 176:753–60. doi: 10.1164/rccm.200612-1749OC. Erratum in: Am J Respir Crit Care Med. 2007; 176 (11): 1169

9. Zhou Y, Wang D, Liu S, Lu J, Zheng J, Zhong N, et al. The association between BMI and COPD: the results of two population-based studies in Guangzhou, China. COPD. (2013) 10:567–72. doi: 10.3109/15412555.2013.781579

10. Zhang X, Chen H, Gu K, Chen J, Jiang X. Association of body mass index with risk of chronic obstructive pulmonary disease: a systematic review and meta-analysis. COPD. (2021) 18:101–13. doi: 10.1080/15412555.2021.1884213

11. Alter P, Lucke T, Watz H, Andreas S, Kahnert K, Trudzinski FC, et al. Cardiovascular predictors of mortality and exacerbations in patients with COPD. Sci Rep. (2022) 12:21882. doi: 10.1038/s41598-022-25938-0

12. Alter P, Kahnert K, Trudzinski FC, Bals R, Watz H, Speicher T, et al. Disease progression and age as factors underlying multimorbidity in patients with COPD: results from COSYCONET. Int J Chron Obstruct Pulmon Dis. (2022) 17:1703–13. doi: 10.2147/COPD.S364812

13. Cazzola M, Bettoncelli G, Sessa E, Cricelli C, Biscione G. Prevalence of comorbidities in patients with chronic obstructive pulmonary disease. Respiration. (2010) 80:112–9. doi: 10.1159/000281880

14. Han Z, Hu H, Yang P, Li B, Liu G, Pang J, et al. White blood cell count and chronic obstructive pulmonary disease: a Mendelian randomization study. Comput Biol Med. (2022) 151:106187. doi: 10.1016/j.compbiomed.2022.106187

15. Huang Y, Wang J, Shen J, Ma J, Miao X, Ding K, et al. Relationship of red cell index with the severity of chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. (2021) 16:825–34. doi: 10.2147/COPD.S292666

16. Peduzzi P, Concato J, Feinstein AR, Holford TR. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol. (1995) 48:1503–10. doi: 10.1016/0895-4356(95)00048-8

17. Snell KI, Levis B, Damen JA, Dhiman P, Debray TP, Hooft L, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ. (2023) 381:e073538. doi: 10.1136/bmj-2022-073538

18. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. (2019) 110:12–22. doi: 10.1016/j.jclinepi.2019.02.004

19. Denguezli M, Daldoul H, Harrabi I, Gnatiuc L, Coton S, Burney P, et al. COPD in nonsmokers: reports from the Tunisian population-based burden of obstructive lung disease study. PLoS One. (2016) 11:e151981. doi: 10.1371/journal.pone.0151981

20. Sobrino E, Irazola VE, Gutierrez L, Chen CS, Lanas F, Calandrelli M, et al. Estimating prevalence of chronic obstructive pulmonary disease in the southern cone of Latin America: how different spirometric criteria may affect disease burden and health policies. BMC Pulm Med. (2017) 17:187–96. doi: 10.1186/s12890-017-0537-9

21. Stern DA, Morgan WJ, Wright AL, Guerra S, Martinez FD. Poor airway function in early infancy and lung function by age 22 years: a non-selective longitudinal cohort study. Lancet. (2007) 370:758–64. doi: 10.1016/S0140-6736(07)61379-8

22. Skripak JM. Persistent effects of maternal smoking during pregnancy on lung function and asthma in adolescents. Pediatrics. (2014) 134:S146. doi: 10.1542/peds.2014-1817X

23. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR, et al. Epigenetic signatures of cigarette smoking. Circ Cardiovasc Genet. (2016) 9:436–47. doi: 10.1161/CIRCGENETICS.116.001506

24. Imboden M, Wielscher M, Rezwan FI, Amaral AFS, Schaffner E, Jeong A, et al. Epigenome-wide association study of lung function level and its change. Eur Respir J. (2019) 54:1900457. doi: 10.1183/13993003.00457-2019

25. Forey BA, Thornton AJ, Lee PN. Systematic review with meta-analysis of the epidemiological evidence relating smoking to COPD, chronic bronchitis and emphysema. BMC Pulm Med. (2011) 11:36. doi: 10.1186/1471-2466-11-36

26. Wheaton AG, Liu Y, Croft JB, VanFrank B, Croxton TL, Punturieri A, et al. Chronic obstructive pulmonary disease and smoking status-United States, 2017. MMWR Morb Mortal Wkly Rep. (2019) 68:533–8. doi: 10.15585/mmwr.mm6824a1

27. Diver WR, Jacobs EJ, Gapstur SM. Secondhand smoke exposure in childhood and adulthood in relation to adult mortality among never smokers. Am J Prev Med. (2018) 55:345–52. doi: 10.1016/j.amepre.2018.05.005

28. He Y, Jiang B, Li LS, Li LS, Ko L, Wu L, et al. Secondhand smoke exposure predicted COPD and other tobacco-related mortality in a 17-year cohort study in China. Chest. (2012) 142:909–18. doi: 10.1378/chest.11-2884

29. Korsbæk N, Landt EM, Dahl M. Second-hand smoke exposure associated with risk of respiratory symptoms, asthma, and COPD in 20,421 adults from the general population. J Asthma Allergy. (2021) 14:1277–84. doi: 10.2147/JAA.S328748

30. The Communist Party of China Central Committee the State Council. Healthy China 2030 blueprint. Available online at: http://www.gov.cn/xinwen/2016-10/25/content_5124174.htm (Accessed December 23, 2020).

31. Grahn K, Gustavsson P, Andersson T, Lindén A, Hemmingsson T, Selander J, et al. Occupational exposure to particles and increased risk of developing chronic obstructive pulmonary disease (COPD): a population-based cohort study in Stockholm, Sweden. Environ Res. (2021) 200:111739. doi: 10.1016/j.envres.2021.111739

32. Langer D, Ciavaglia C, Faisal A, Webb KA, Neder JA, Gosselink R, et al. Inspiratory muscle training reduces diaphragm activation and dyspnea during exercise in COPD. J Appl Physiol. (1985) 125:381–92. doi: 10.1152/japplphysiol.01078.2017

33. Holtjer JC, Bloemsma LD, Beijers RJ, Cornelissen ME, Hilvering B, Houweling L, et al. Identifying risk factors for COPD and adult-onset asthma: an umbrella review. Eur Respir Rev. (2023) 32:230009. doi: 10.1183/16000617.0009-2023

34. Dey S, Eapen MS, Chia C, Gaikwad AV, Wark PA, Sohal SS. Pathogenesis, clinical features of asthma COPD overlap, and therapeutic modalities. Am J Physiol Lung Cell Mol Physiol. (2022) 322:L64–83. doi: 10.1152/ajplung.00121.2021

35. Lytras T, Kogevinas M, Kromhout H, Carsin AE, Antó JM, Bentouhami H, et al. Occupational exposures and 20-year incidence of COPD: the European Community respiratory health survey. Thorax. (2018) 73:1008–15. doi: 10.1136/thoraxjnl-2017-211158

36. Xie W, Dumas O, Varraso R, Boggs KM, Camargo CA Jr, Stokes AC. Association of occupational exposure to inhaled agents in operating rooms with incidence of chronic obstructive pulmonary disease among US female nurses. JAMA Netw Open. (2021) 4:e2125749. doi: 10.1001/jamanetworkopen.2021.25749

37. GBD 2019 Risk Factors Collaborators. Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the global burden of disease study 2019. Lancet (London, England). (2020) 396:1223–49. doi: 10.1016/S0140-6736(20)30752-2

38. Adibi A, Sin DD, Safari A, Johnson KM, Aaron SD, FitzGerald JM, et al. The acute COPD exacerbation prediction tool (ACCEPT): a modelling study. Lancet Respir Med. (2020) 8:1013–21. doi: 10.1016/S2213-2600(19)30397-2

39. Verstraete K, Gyselinck I, Huts H, Das N, Topalovic M, De Vos M, et al. Estimating individual treatment effects on COPD exacerbations by causal machine learning on randomised controlled trials. Thorax. (2023) 78:983–9. doi: 10.1136/thorax-2022-219382

Summary

Keywords

chronic obstructive pulmonary disease, predictive model, risk factor, chronic obstructive pulmonary disease (COPD), clinical analysis

Citation

Wang Y, Lv Y, Li Q, Zhang R, Yan B, Xue H, Qian X, Yang Y, Ni K, Zhong J, Meng X, Gao R and Wang Z (2025) Development and validation of a predictive model for COPD: a multicenter study. Front. Med. 12:1615642. doi: 10.3389/fmed.2025.1615642

Received

21 April 2025

Accepted

25 August 2025

Published

09 September 2025

Volume

12 - 2025

Edited by

Shabana Urooj, Princess Nourah bint Abdulrahman University, Saudi Arabia

Reviewed by

Monica Ewomazino Akokuwebe, University of the Witwatersrand, South Africa

Yunhuan Liu, Tongji University, China

*Correspondence: Rundi Gao, Zhen Wang,
