Predicting atrial fibrillation in patients with acute respiratory failure using machine learning: application of the MIMIC-III and MIMIC-IV datasets

Li, Rixuan

doi:10.3389/fcvm.2025.1696609

ORIGINAL RESEARCH article

Front. Cardiovasc. Med., 09 October 2025

Sec. Cardiac Rhythmology

Volume 12 - 2025 | https://doi.org/10.3389/fcvm.2025.1696609

This article is part of the Research Topic: Precision Strategies for Atrial Fibrillation: Diagnosis, Risk, and Treatment Innovations

Rixuan Li*

Jinzhou Medical University, Jinzhou, China

Background: Acute respiratory failure (ARF) and atrial fibrillation (AF) are common diseases. This study established a predictive model for the risk of atrial fibrillation in patients with ARF, aiming to provide tools for clinical application.

Methods: This study examined data from 21,594 patients in the MIMIC-IV database, including age, vital signs, and laboratory results from the first day of admission. Six feature selection techniques and six machine learning algorithms were used to construct the prediction models, which were then validated externally using the MIMIC-III database. Model performance was evaluated by comparing the results across models.

Results: A total of 59 predictor variables were identified, among which age was the most important. These variables were used to build the predictive models. The XGBoost model (AUC: 0.816 in internal validation) and the Random Forest (RF) model (AUC: 0.822 in external validation) performed best. To our knowledge, this study presents the first predictive model for atrial fibrillation in patients with acute respiratory failure.

Conclusions: Both the XGBoost and RF models performed well. These findings may contribute to the diagnosis of clinical complications and to addressing related public health issues.

Introduction

Respiratory failure is a respiratory disorder characterized by hypoxemia or hypercapnia (1). Atrial fibrillation (AF) is the most prevalent type of persistent arrhythmia in the circulatory system. The incidence of AF in patients with respiratory failure is between 10% and 15%, and it can reach as high as 50% in patients with type 2 respiratory failure (2, 3). Factors such as atrial fibrosis, aging, hypertension, obesity, diabetes, and genetic predisposition play a significant role in triggering AF (4). Bemis et al. propose that hypoxemia and hypercapnia are risk factors for AF. In hypercapnic patients, factors such as intrathoracic pressure, autonomic nervous system fluctuations, atrial stretching, and remodeling play a role in the development of AF. Furthermore, elevated pulmonary artery pressure in respiratory failure patients leads to right ventricular hypertension and right atrial enlargement, which in turn triggers AF (5). AF can result in serious complications, such as stroke, coronary heart disease, cognitive impairment, and even dementia, suggesting that respiratory failure may trigger AF by impacting the pulmonary veins (6–8).

Schüttler et al. simulated human atrial fibrillation using modeling techniques (9). Roussos et al. highlighted that patients with respiratory failure often experience neural suppression and neuromuscular conduction disorders, which may be indicative of neurotransmitter disturbances associated with AF and myocardial dysfunction (10). However, the mechanisms by which respiratory failure induces AF remain poorly understood.

Although the diagnosis of respiratory failure and AF is well-established (11–13), two-thirds of AF patients report shortness of breath, making it difficult to promptly identify AF in patients with respiratory failure (14). Consequently, developing an early prediction model to prevent or predict AF in these patients is crucial for improving clinical outcomes.

Machine learning has become a powerful tool in medicine, driving significant progress across various medical disciplines (15). Previous AF prediction models have focused on ICU patients (16) and are limited by cohort heterogeneity, which can reduce accuracy and generalizability. This study includes both ICU and non-ICU patients but targets a single disease, minimizing heterogeneity and improving the model's sensitivity and applicability. To date, no predictive model exists for forecasting AF in patients with acute respiratory failure. This study aims to fill this critical research gap by developing a machine learning-based model for early detection of AF in patients with acute respiratory failure. The main objective of this model is to help healthcare professionals proactively prevent or detect AF early, facilitating personalized treatment plans, improving patient outcomes, reducing healthcare costs, and potentially saving lives.

To date, predictive models specifically designed to identify atrial fibrillation (AF) in patients with acute respiratory failure remain scarce. Existing research has predominantly focused on general AF risk factors or populations with cardiovascular disease, leaving a critical research gap in this high-risk cohort. To our knowledge, this study represents the first development and validation of a machine learning model specifically designed to predict AF in this distinct population. By integrating comprehensive clinical, laboratory, and physiological variables, our model demonstrates robust predictive performance, offering an innovative and clinically actionable tool for early risk stratification in patients with acute respiratory failure.

This study utilizes the MIMIC-IV and MIMIC-III databases alongside machine learning techniques to create a predictive model for the development of AF in patients with acute respiratory failure.

Materials and methods

Study population

This retrospective observational study extracted data from patients diagnosed with acute respiratory failure (ARF) using the MIMIC-IV (version 3.1) and MIMIC-III (version 1.4) databases. At present, there are no international guidelines for diagnosing respiratory failure. However, as mentioned earlier, the diagnoses of respiratory failure and atrial fibrillation (AF) are widely recognized. A central aim of cohort selection was to exclude patients in whom AF was diagnosed before respiratory failure, thereby laying the foundation for further research.

The diagnostic criteria for acute respiratory failure (ARF) are as follows: Type I ARF is characterized by arterial oxygen partial pressure (PaO₂) ≤60 mmHg without hypercapnia (arterial carbon dioxide partial pressure, PaCO₂, <45 mmHg), while Type II ARF is characterized by PaCO₂ ≥45 mmHg. In the MIMIC database, ARF is defined by specific ICD-9 and ICD-10 codes (e.g., 51881 and J95821, respectively).

Diagnostic criteria for atrial fibrillation (AF) include a history of diagnosed or undiagnosed AF and an electrocardiogram (ECG) upon admission showing arrhythmia, irregular heartbeats, and absent P waves. In leads II, III, and aVF, fibrillation waves replace P waves. Additionally, rapid ventricular response (RVR) or a normal ventricular rate may be present, particularly in patients with chronic atrial fibrillation.

The order of diagnoses was determined from the sequence label (seq_num) in the MIMIC database, which was used to establish that atrial fibrillation occurred after acute respiratory failure. Furthermore, based on seq_num and the recorded diagnoses, all included patients were free of atrial fibrillation upon admission.
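The ordering rule described above can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the row layout (hadm_id, seq_num, icd_code), the function name, and the AF codes 42731 and I4891 are assumptions made for illustration, and only the two ARF codes quoted in the text are used.

```python
from collections import defaultdict

# ARF codes quoted in the text; the full code list used in the study is not given.
ARF_CODES = {"51881", "J95821"}
# Assumed AF codes for illustration only (ICD-9 42731, ICD-10 I4891).
AF_CODES = {"42731", "I4891"}


def classify_admissions(rows):
    """Label each admission by comparing the seq_num of ARF and AF diagnoses.

    rows: iterable of (hadm_id, seq_num, icd_code) tuples (hypothetical layout).
    """
    by_admission = defaultdict(list)
    for hadm_id, seq_num, code in rows:
        by_admission[hadm_id].append((seq_num, code))

    labels = {}
    for hadm_id, diagnoses in by_admission.items():
        arf_positions = [s for s, c in diagnoses if c in ARF_CODES]
        af_positions = [s for s, c in diagnoses if c in AF_CODES]
        if not arf_positions:
            labels[hadm_id] = "not_arf"
        elif not af_positions:
            labels[hadm_id] = "arf_without_af"
        elif min(af_positions) > min(arf_positions):
            labels[hadm_id] = "af_after_arf"
        else:
            labels[hadm_id] = "excluded_af_before_arf"
    return labels


rows = [
    (1, 1, "51881"), (1, 2, "42731"),   # ARF listed first, AF later
    (2, 1, "I4891"), (2, 3, "J95821"),  # AF listed before ARF
    (3, 1, "J95821"), (3, 2, "E119"),   # ARF with an unrelated diagnosis
]
labels = classify_admissions(rows)
print(labels)
```

Under these assumptions, admissions labeled "af_after_arf" would form the outcome-positive group, "arf_without_af" the outcome-negative group, and "excluded_af_before_arf" would be dropped from the cohort.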

Data collection

In this retrospective study, patient information was obtained from the MIMIC-IV (version 3.1) (17) and MIMIC-III (version 1.4) (18) databases. The following data were extracted for analysis:

Demographic information

Specific demographic details of the patient, including age, gender and other identifying characteristics.

Vital signs and laboratory results on the first day of admission

Vital signs: body temperature, respiratory rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, weight.

Laboratory tests: white blood cell count, hemoglobin, platelet count, urea nitrogen, serum creatinine, blood glucose, serum sodium, serum chloride, serum potassium, hematocrit, eosinophils, basophils, neutrophils, monocytes, lymphocytes, serum calcium, international normalized ratio, prothrombin time, activated partial thromboplastin time, bicarbonate concentration, anion gap, alanine aminotransferase, aspartate aminotransferase, alkaline phosphatase, bilirubin.

Scores and Indices:

Glasgow Coma Scale (GCS)

Simplified Acute Physiology Score II (SAPS II)

Acute Physiology Score III (APS III)

Sequential Organ Failure Assessment (SOFA)

Oxford Acute Severity of Illness Score (OASIS)

Comorbidities:

Congestive heart failure

Cerebrovascular disease

Liver disease

Kidney disease

AIDS

Chronic lung disease

Diabetes

Paralysis

Malignancies

Metastatic solid tumors

Peptic ulcer

Dementia

Rheumatic diseases

Blood Gas Analysis:

Partial pressure of oxygen (PaO2)

Partial pressure of carbon dioxide (PaCO2)

Oxygen saturation

Lactate

Base excess

Total carbon dioxide (TCO2)

Other Variables:

Mechanical ventilation status

Oxygen flow rate

Whether elective surgery was performed

Length of hospitalization prior to ICU admission

Handling missing data

Variables with more than 20% missing values were excluded from the analysis. For variables with 20% or fewer missing values, multiple imputation was applied using the “mice” package in R (version 4.5.1). Specifically, five imputed datasets (m = 5) were generated via the random forest method (method = “rf”), which preserves nonlinear relationships and interactions between variables. The second imputed dataset was selected as the representative sample for subsequent analyses [“complete(imputed_data, 2)”], and the consistency of results across all imputed datasets was verified.

This method ensures thorough data processing while adhering to best practices for handling missing data in clinical research.
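The 20% exclusion rule that precedes imputation can be sketched as follows. This is an illustrative Python sketch only: the study itself used R, the record layout and function name are hypothetical, and the imputation step performed by mice is not reproduced here.

```python
def drop_sparse_columns(records, max_missing=0.20):
    """Keep only variables whose missing fraction is at most max_missing.

    records: list of dicts mapping variable name to value; None marks a
    missing value (hypothetical layout for illustration).
    """
    columns = sorted({name for record in records for name in record})
    n = len(records)
    kept = []
    for name in columns:
        missing = sum(1 for record in records if record.get(name) is None)
        if missing / n <= max_missing:
            kept.append(name)
    filtered = [{name: record.get(name) for name in kept} for record in records]
    return filtered, kept


records = [
    {"age": 71, "lactate": 2.1, "sbp": 118},
    {"age": 64, "lactate": None, "sbp": 131},
    {"age": None, "lactate": None, "sbp": 102},
    {"age": 58, "lactate": 1.4, "sbp": 125},
    {"age": 80, "lactate": 3.0, "sbp": 97},
]
filtered, kept = drop_sparse_columns(records)
print(kept)
```

In this toy example, age is missing in 20% of records and is retained for imputation, whereas lactate is missing in 40% and is excluded.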

Statistical analysis

Continuous variables were summarized using the median and interquartile range (IQR). The normality of continuous variables was evaluated using the Kolmogorov–Smirnov (K-S) test, and the rank sum test was used for inter-group comparisons. Categorical variables were expressed as frequencies and percentages, and chi-square tests were used for inter-group comparisons. The MIMIC-IV dataset was divided into a training set and an internal validation set in a ratio of 4:1, while the MIMIC-III dataset served as the external validation set. This study did not employ K-fold cross-validation or bootstrapping methods, but no statistically significant differences were observed between the internal training set and the internal validation set.
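A 4:1 split of the internal cohort can be sketched as below. The seed, the rounding rule, and the function name are assumptions; the study does not report how the random split was implemented. With 21,594 patients, rounding 80% gives the set sizes shown in Figure 1.

```python
import random


def split_4_to_1(ids, seed=42):
    """Shuffle the identifiers and split them 4:1 (seed chosen arbitrarily)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 21,594 MIMIC-IV patients.
train_ids, valid_ids = split_4_to_1(range(21594))
print(len(train_ids), len(valid_ids))
```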

Feature selection employed six methods: LASSO, random forest mean decrease in accuracy (MDA) and mean decrease in Gini (MDG), and three stepwise regression variants (FS, BS and BE). Variables selected by at least four of the six methods were used to develop models with XGBoost, random forest, logistic regression, decision tree, support vector machine and artificial neural network.

Model performance was evaluated using the ROC curve, calibration curve, decision curve analysis, F1 score, sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV) and cutoff analysis. The model with the highest area under the ROC curve (AUC) was selected as the main model. SHAP analysis was used to explain the final model.
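For reference, the threshold-dependent metrics listed above follow from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics derived from a 2x2 confusion matrix."""
    sen = tp / (tp + fn)            # sensitivity (recall)
    spe = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    return {
        "SEN": sen,
        "SPE": spe,
        "PLR": sen / (1 - spe),     # positive likelihood ratio
        "NLR": (1 - sen) / spe,     # negative likelihood ratio
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sen / (ppv + sen),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, fp=30, tn=170, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```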

All statistical analyses were conducted using R software (version 4.5.1). A p value < 0.05 was considered statistically significant.

Ethics approval and consent to participate

The Collaborative Institutional Training Initiative (CITI) certification has been completed, meeting the database access requirements. All the databases used in this study contain de-identified data, so patient consent is not required. All procedures comply with relevant guidelines and regulations.

Results

A total of 21,723 patients with acute respiratory failure were extracted from the MIMIC-IV (v3.1) database. After excluding patients under 18 years old and those diagnosed with atrial fibrillation before the onset of acute respiratory failure, the final cohort included 21,594 patients diagnosed with acute respiratory failure, as shown in Figure 1.

Figure 1

Flowchart illustrating patient selection for acute respiratory failure study. MIMIC-IV database initially has 21,723 patients, refined to 21,594 after exclusions, split into internal training (17,275) and validation sets (4,319). MIMIC-III database starts with 6,879 patients, reduced to 6,615 after exclusions, forming the external validation set. Exclusions: age under eighteen, atrial fibrillation before respiratory failure, and missing data.

Figure 1. Flow chart.

Patients in the MIMIC-IV database were divided into two groups based on the presence of atrial fibrillation. Tables 1 and 2 present a comparison of baseline characteristics between the atrial fibrillation group and the non-atrial fibrillation group.

Table 1

Table 1. Internal set.

Table 2

Table 2. External set.

Baseline characteristics of patients with acute respiratory failure, with or without atrial fibrillation

Several clinical and laboratory indicators were significantly higher in patients with atrial fibrillation than in those without, including age, pCO₂, pH, base excess, TCO₂, SpO₂, Hct, Hgb, PLT, WBC, AG, HCO₃, BUN, Cl, Cr, K, absolute basophils and absolute eosinophils. In addition, variables such as other blood cell counts, coagulation indicators, liver function tests, vital signs, GCS, OASIS, SOFA, APS III, SAPS II, Charlson comorbidity index, and pre-ICU length of stay (preiculos) were recorded, as were therapeutic factors such as oxygen flow rate, the use of non-steroidal anti-inflammatory drugs, mechanical ventilation and elective surgery.

There were statistically significant differences in ALP, total bilirubin, respiratory rate, body weight, OASIS, SOFA, APS III, SAPS II, and pre-ICU length of stay (preiculos) between the two groups. The severity scores were higher in the atrial fibrillation group, while the other indicators were either comparable or lower.

Supplementary Figure 1 shows the correlations among the variables in the internal dataset, and Supplementary Table 2 presents the collinearity analysis. There were strong correlations between hemoglobin and hematocrit (r = 0.959), PT and INR (r = 0.920), and DBP and MBP (r = 0.909), and significant correlations between SOFA and APS III (r = 0.724), total CO₂ and base excess (r = 0.884), bicarbonate and base excess (r = 0.722), and AST and ALT (r = 0.819) (all P < 0.001).

The variance inflation factors (VIFs) showed significant multicollinearity for base excess, total CO₂, hematocrit, hemoglobin, SOFA, SAPS II, DBP and MBP (VIFs ranged from 11.20 to 29.73). However, the VIFs of bicarbonate, AST, ALT, total bilirubin, liver function, PT, INR and systolic blood pressure were all below 10, indicating no significant multicollinearity.
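The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others, so the threshold of 10 corresponds to R² = 0.9. The sketch below illustrates the relation under a simplifying assumption: when a predictor has only one other predictor, R² equals the squared pairwise correlation. The VIFs reported in the study come from the full predictor set and will therefore differ from this two-variable approximation.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor with the given R^2."""
    return 1.0 / (1.0 - r_squared)


# Two-predictor simplification: R^2 is the squared pairwise correlation.
r_hgb_hct = 0.959  # hemoglobin-hematocrit correlation reported in the text
vif_pair = vif_from_r_squared(r_hgb_hct ** 2)
vif_threshold = vif_from_r_squared(0.9)

print(f"Two-variable VIF for r = {r_hgb_hct}: {vif_pair:.2f}")
print(f"VIF at R^2 = 0.9: {vif_threshold:.2f}")
```

Even this pairwise approximation places hemoglobin and hematocrit above the threshold of 10, which agrees in direction with the multicollinearity reported for these two variables.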

Table 3 presents the selected predictor variables. Because of differences between the MIMIC-IV and MIMIC-III databases, data on variables such as PCO₂, pH, calcium, absolute basophils, absolute neutrophils, ALT, ALP, AST, body weight, and non-steroidal anti-inflammatory drugs were lacking in the external dataset. Despite these differences, the performance of the model was not significantly affected.

Table 3

Table 3. Predictor variables for the internal set obtained using six selection methods.

Selection of predictor variables

The predictor variables selected by the six methods are listed in Table 3. The specific parameters of the six methods, namely LASSO, RF-MDA, RF-MDG, SR-FS, SR-BS and SR-BE, are detailed in Supplementary Tables 3–7 and Supplementary Figures 3–12. Figure 2 shows the intersections of the six groups of predictor variables. Features that appeared in at least four of the six methods were selected as the final predictor variables.
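The "at least four of six" rule amounts to a simple vote count across methods. The following Python sketch shows the idea; the feature sets assigned to each method are invented for illustration and do not reproduce Table 3.

```python
from collections import Counter


def vote_features(selections, min_votes=4):
    """Return features chosen by at least min_votes selection methods."""
    counts = Counter(
        feature
        for features in selections.values()
        for feature in set(features)
    )
    return sorted(f for f, votes in counts.items() if votes >= min_votes)


# Hypothetical selections; the real per-method results are in Table 3.
selections = {
    "LASSO":  ["age", "inr", "weight", "glucose"],
    "RF-MDA": ["age", "inr", "weight", "pt"],
    "RF-MDG": ["age", "inr", "pt", "sbp"],
    "SR-FS":  ["age", "weight", "glucose"],
    "SR-BS":  ["age", "inr", "weight"],
    "SR-BE":  ["age", "glucose", "sbp"],
}
final_features = vote_features(selections)
print(final_features)
```

A vote threshold is less strict than a full intersection across all six methods, which in this toy example would retain age alone.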

Figure 2

Bar chart displaying intersection sizes of sets represented by blue bars, with values ranging from twenty-four to one. A second chart below shows the set sizes for MDG, MDA, BE, FS, BS, and LASSO in red. An intersection matrix indicates overlapping sets.

Figure 2. UpSet plot.

Table 4 shows various performance metrics of the six models on both the internal validation set and the external validation set.

Model development and comparison

The internal dataset was divided into a training set and a validation set in a ratio of 4:1. Supplementary Table 6 summarizes the baseline characteristics of these two groups, showing comparable distributions. The logistic regression (LR) results, including odds ratios and p values, are shown in Supplementary Tables 7, 8, and the corresponding forest plots are presented in Supplementary Figures 6, 7.

Table 4 presents the evaluation metrics for the internal and external validation sets along with their 95% confidence intervals. The AUC ranged from 0.734 to 0.816 in the internal set and from 0.685 to 0.822 in the external set. DeLong tests (Supplementary Tables 9–11) identified XGBoost and Random Forest (RF) as the top performers on the internal and external datasets, respectively. The metrics of these two models were as follows.

XGBoost, internal validation: AUC 0.816 [0.804–0.829], SEN 0.626 (0.601–0.651), SPE 0.800 (0.785–0.815), PLR 3.133 (2.882–3.405), NLR 0.467 (0.436–0.501), PPV 0.610 (0.585–0.635), NPV 0.811 (0.797–0.825), F1-score 0.6267.

XGBoost, external validation: AUC 0.771 [0.758–0.784], SEN 0.704 (0.684–0.725), SPE 0.702 (0.689–0.716), PLR 2.366 (2.247–2.501), NLR 0.421 (0.390–0.452), PPV 0.499 (0.478–0.520), NPV 0.849 (0.838–0.860), F1-score 0.58.

Random Forest, internal validation: AUC 0.810 [0.796–0.823], SEN 0.673 (0.648–0.697), SPE 0.786 (0.771–0.810), PLR 3.148 (3.040–3.256), NLR 0.416 (0.378–0.450), PPV 0.611 (0.596–0.646), NPV 0.828 (0.813–0.842), F1-score 0.6332.

Random Forest, external validation: AUC 0.822 [0.811–0.834], SEN 0.748 (0.728–0.766), SPE 0.741 (0.727–0.754), PLR 2.889 (2.718–3.057), NLR 0.341 (0.315–0.368), PPV 0.549 (0.529–0.569), NPV 0.874 (0.864–0.884), F1-score 0.63.

Both models performed well across multiple metrics, with the Random Forest model showing a slight overall advantage over XGBoost.

Table 4

Table 4. Predictive performance metrics of different machine learning algorithms of the validation set.

Figures 3–6 display the confusion matrices for the internal validation set and external validation set of the two optimal models (XGBoost and RF).

Figure 3

Confusion matrix for XGBoost classifier, showing predictions versus reference values. Top-left cell: 415 true negatives. Top-right cell: 760 false positives. Bottom-left cell: 2459 false negatives. Bottom-right cell: 684 true positives. Color gradient represents frequency, ranging from light to dark blue.

Figure 3. Confusion matrix- XGBoost.

Figure 4

Confusion matrix for a random forest model shows prediction results. True positives: 264, false positives: 637, true negatives: 807, false negatives: 2610, indicated by varying shades of green for frequency.

Figure 4. Confusion matrix-random forest.

Figure 5

Confusion matrix for XGBoost external validation. True negatives: 4025, false positives: 535, false negatives: 249, true positives: 1267. Color gradient indicates frequency, from light blue (low) to dark blue (high).

Figure 5. Confusion matrix-XGBoost (external validation).

Figure 6

Confusion matrix from a Random Forest model validation, with Prediction and Reference axes. Top-left shows 243, top-right 746, bottom-left 4031, and bottom-right 1056. A color gradient indicates frequency, from light to dark green marked from 1000 to 4000.

Figure 6. Confusion matrix-random forest (external validation).

For the internal validation set, the RF metrics were: sensitivity (SEN) 0.673 [0.648–0.697], specificity (SPE) 0.786 [0.771–0.810], positive likelihood ratio (PLR) 3.148 [3.040–3.256], negative likelihood ratio (NLR) 0.416 [0.378–0.450], positive predictive value (PPV) 0.611 [0.596–0.646], negative predictive value (NPV) 0.828 [0.813–0.842], and F1 score 0.633. For the external validation set, the RF metrics were: SEN = 0.748 [0.728–0.766], SPE = 0.741 [0.727–0.754], PLR = 2.889 [2.718–3.057], NLR = 0.341 [0.315–0.368], PPV = 0.549 [0.529–0.569], NPV = 0.874 [0.864–0.884], and F1 score = 0.63.
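The reported likelihood ratios can be cross-checked against the reported sensitivity and specificity, since PLR = SEN / (1 − SPE) and NLR = (1 − SEN) / SPE. The short Python sketch below performs this check on the RF values quoted above; small discrepancies are expected because the published inputs are rounded to three decimals.

```python
def likelihood_ratios(sen, spe):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    return sen / (1 - spe), (1 - sen) / spe


# RF values as reported in the text.
reported = {
    "RF internal": {"SEN": 0.673, "SPE": 0.786, "PLR": 3.148, "NLR": 0.416},
    "RF external": {"SEN": 0.748, "SPE": 0.741, "PLR": 2.889, "NLR": 0.341},
}

recomputed = {}
for name, m in reported.items():
    plr, nlr = likelihood_ratios(m["SEN"], m["SPE"])
    recomputed[name] = (plr, nlr)
    print(f"{name}: PLR {plr:.3f} (reported {m['PLR']}), "
          f"NLR {nlr:.3f} (reported {m['NLR']})")
```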

The ROC curves, calibration curves and DCA curves of the six models in the internal validation set and the external validation set are shown in Figures 7–12.
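As a reminder of what the AUC measures, it equals the probability that a randomly chosen patient with AF receives a higher predicted risk than a randomly chosen patient without AF, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores are hypothetical, and this quadratic-time approach is meant only for illustration, not as the method used in the study.

```python
def auc_by_ranking(scores_pos, scores_neg):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks for patients with and without AF.
scores_af = [0.9, 0.8, 0.4]
scores_no_af = [0.7, 0.3, 0.2, 0.4]
auc = auc_by_ranking(scores_af, scores_no_af)
print(f"AUC = {auc:.3f}")
```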

Figure 7

ROC curves of multiple models comparing sensitivity against one minus specificity. Models include XGBoost, RF, LR, DT, SVM, and ANN with AUC values of 0.816, 0.81, 0.802, 0.734, 0.806, and 0.759, respectively.

Figure 7. ROC plot.

Figure 8

Figure 8. ROC plot.

Figure 9

Calibration curves of multiple models, showing observed proportion versus predicted probability. Six lines represent different models: ANN (red), Decision Tree (blue), Logistic Regression (green), Random Forest (purple), SVM (orange), and XGBoost (yellow). A diagonal dashed line indicates perfect calibration.

Figure 9. Calibration curves.

Figure 10

Calibration curves compare the observed frequency to the mean predicted probability for six models: ANN, Decision Tree, Logistic Regression, Random Forest, SVM, and XGBoost. A dashed line represents perfect calibration.

Figure 10. Calibration curves.

Figure 11

Line graph comparing standardized net benefit against high risk threshold for various models: XGB, RF, LR, DT, SVM, ANN, with a baseline and no model curve. Each model is represented by a different colored line.

Figure 11. DCA plot.

Figure 12

Decision curve analysis graph showing the net benefit against threshold probability for various models, including XGB, RF, LR, DT, SVM, and ANN, with cost-benefit ratios. Each line represents a different model, illustrating their performance across various threshold probabilities.

Figure 12. DCA plot.

The superior performance of the XGBoost and Random Forest (RF) models can be attributed to several factors. Firstly, the AUC values of both models are relatively high (XGBoost: 0.816 in internal validation; RF: 0.822 in external validation), indicating that both models discriminate well between patients with and without atrial fibrillation. Secondly, their calibration curves are closely aligned with the ideal diagonal, indicating more accurate probability estimation than the other models. Finally, the two models demonstrated higher net benefits in the decision curve analysis (DCA) within the threshold probability range, indicating that they have greater clinical utility in guiding decision-making. These advantages highlight the accuracy, reliability and practical value of XGBoost and RF in predicting atrial fibrillation in patients with acute respiratory failure.
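The net benefit plotted in a decision curve is commonly defined as TP/N − (FP/N) × pt / (1 − pt), where pt is the threshold probability. The sketch below applies this standard definition to hypothetical counts and compares the result with a treat-all strategy; it is not derived from the study's data, and Figure 11 reports a standardized version of this quantity.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical cohort: 300 patients, 100 with AF; at a 20% threshold the
# model flags 80 true positives and 30 false positives.
nb_model = net_benefit(tp=80, fp=30, n=300, threshold=0.20)
nb_all = net_benefit_treat_all(prevalence=100 / 300, threshold=0.20)
print(f"model: {nb_model:.4f}, treat all: {nb_all:.4f}")
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.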

Optimal model interpretability

This study developed two dynamic graphs (available at http://127.0.0.1:7312) to visualize the impact of various variables on disease outcomes and to assist clinicians in quickly determining the likelihood of atrial fibrillation (Supplementary Figures 13–16). Variables with extreme values (representing a small portion of the data) were not excluded because determining the upper limit of clinical trial indicators is challenging.

To better understand the influence of different variables on atrial fibrillation, we used SHapley Additive exPlanations (SHAP) plots (Figure 13). As shown in Figures 13A,B, age is the most important factor influencing the development of atrial fibrillation. Figure 13B also highlights the magnitude and direction of the influence of each variable. For local interpretation, Figures 13C,D show how specific variables increase or decrease the risk of atrial fibrillation.

Figure 13

Two SHAP analysis plots explaining feature importance in a predictive model. Panel A shows a bar chart ranking features by mean absolute SHAP values, with age, congestive heart failure, prothrombin time (pt), INR, and weight as the top contributors. Panel B is a summary plot combining SHAP values and feature values, where each point represents an observation. Color indicates feature value (yellow = high, purple = low), while position along the x-axis shows positive or negative contribution to the prediction. Age and congestive heart failure are the most influential predictors. Two SHAP value plots (C and D) illustrating feature contributions to model predictions. Panel C shows positive contributions from INR, platelet count, age, and congestive heart failure, with glucose slightly reducing the prediction, resulting in f(x) = 0.909. Panel D highlights negative contributions from age and glucose, outweighing positive effects from weight, urine output, and congestive heart failure, leading to f(x) = -2.19. Both plots visualize how individual patient features influence prediction outcomes.

Figure 13. SHAP plot.
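Panels C and D of Figure 13 report model outputs of f(x) = 0.909 and f(x) = -2.19. If these values are on the log-odds scale, which is common for SHAP explanations of tree-based classifiers but is not stated in the article, they can be converted to probabilities with the logistic function. The sketch below shows that conversion under this assumption.

```python
import math


def logit_to_probability(fx):
    """Convert a log-odds value to a probability (assumes fx is log-odds)."""
    return 1.0 / (1.0 + math.exp(-fx))


# f(x) values quoted in the description of Figure 13C and 13D.
p_panel_c = logit_to_probability(0.909)
p_panel_d = logit_to_probability(-2.19)
print(f"Panel C: {p_panel_c:.3f}, Panel D: {p_panel_d:.3f}")
```

Under this assumption, the patient in panel C would have a predicted AF probability of roughly 0.71 and the patient in panel D roughly 0.10.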

The clinical application of this model (highlighted by the SHAP summary graph) provides valuable insights into how individual characteristics can contribute to prediction and thereby assist in clinical decision-making. Key characteristics, such as age, blood glucose level and platelet count, play a crucial role in the prognosis of patients. The high interpretability of this model enables clinicians to understand the impact of each feature on risk prediction, thereby supporting informed real-time decision-making during patient management.

For instance, clinicians can guide treatment adjustments for patients with diabetes or metabolic disease based on how blood glucose contributes to the model's prediction, in order to prevent complications. Systolic blood pressure (SBP) and body weight can be monitored to enable targeted intervention for patients with cardiovascular risk. Age is regarded as a key predictive feature that can assist clinicians in assessing high-risk patients, as it is closely related to chronic diseases such as hypertension, diabetes and cardiovascular diseases. This allows for more personalized care and effective treatment strategies.

For elderly patients, especially those with a history of heart failure, this model provides insights that help clinicians assess the risks and benefits of specific intervention measures. By combining age-related risks with other clinical data such as AST levels and systolic blood pressure, clinicians can make timely and accurate treatment decisions, prevent over-prescription, and ensure that high-risk situations are addressed.

In addition, the model also provides predictive insights into biomarkers such as AST levels, which are potential signals of liver dysfunction. This enables clinicians to proactively adjust treatment methods and prevent health problems from escalating. Integrating these insights into clinical practice can enhance the accuracy of diagnosis and improve the ability of clinicians to intervene effectively at an early stage.

In conclusion, this model supports clinical decision-making by providing real-time, data-driven insights. It improves the accuracy of intervention, helps identify high-risk patients at an early stage, and optimizes treatment plans, all of which contribute to improving the prognosis of patients. Its ability to support tailored, proactive care makes it a valuable tool for patient management.

Discussion

This study is the first to propose a predictive model for acute respiratory failure combined with atrial fibrillation, integrating six feature selection methods with real-world vital signs to identify key risk factors. Unlike previous research, no comparison model was included, as no predictive models for this condition currently exist. Instead, the study systematically identifies the most relevant risk factors using multiple methods, ensuring model robustness and reliability.

Interestingly, the predictive performance of our machine learning models was slightly higher in the external validation cohort compared with the internal validation cohort. This finding may reflect differences in the distribution of patient characteristics between datasets, or suggest that the models are robust and generalizable across independent patient populations. Nevertheless, careful interpretation is warranted, and further prospective validation is needed to confirm model stability and clinical applicability. Given the large sample size of this study, the likelihood of model overfitting is relatively low. Large-scale datasets can adequately reflect the variability and representativeness of the underlying population, enabling machine learning algorithms to capture true patterns rather than noise. Nevertheless, external validation remains crucial for confirming the model's robustness and generalization capabilities.

Six machine learning algorithms were applied, with performance evaluated using various metrics. XGBoost and Random Forest (RF) emerged as the top performers. A key innovation of this study is its novel approach to addressing a gap in existing research. The model's interpretability was enhanced through SHAP values and visualizations, making it a valuable tool for healthcare professionals. The study found that age is the primary risk factor for atrial fibrillation, consistent with previous research (21). Age plays a crucial role in the development of atrial fibrillation (AF) in patients with acute respiratory failure (ARF). As age increases, the heart's atrial structure undergoes dilation and fibrosis, leading to instability in atrial electrophysiological properties and an increased risk of AF. Elderly patients often have comorbid chronic conditions such as hypertension, diabetes, and coronary heart disease, with hypertension, in particular, causing left atrial enlargement and electrical remodeling, further promoting the occurrence of AF. Additionally, the aging process is accompanied by a decline in the heart's self-regulation ability, making elderly patients more susceptible to electrophysiological abnormalities during acute pathological states such as hypoxia and acid-base imbalance. The mechanisms through which ARF leads to AF primarily include hypoxemia, acid-base imbalance, and hypercapnia. Hypoxia triggers instability in cardiac electrical activity, acid-base imbalance (e.g., metabolic acidosis) alters myocardial electrical activity, and hypercapnia causes atrial dilation and increased load, all of which contribute to the onset of AF. Furthermore, ARF-induced hemodynamic changes, particularly in patients with concomitant heart failure, increase atrial pressure and stretch the atrial walls, further exacerbating the risk of AF. Pulmonary hypertension and elevated left atrial pressure also play key roles. 
The interaction of these pathological mechanisms makes ARF a significant trigger for AF, and a better understanding of these mechanisms helps in the prevention and management of AF in these patients. Additionally, factors such as prothrombin time (PT), international normalized ratio (INR), blood pressure (DBP and SBP), body weight, glucose, hemoglobin, partial thromboplastin time (PTT), aspartate aminotransferase (AST), and bilirubin were identified as reversible risk factors for atrial fibrillation. For patients without other systemic issues, these values should be carefully managed within an appropriate range.

Other studies have shown that when patients develop atrial fibrillation, certain indicators can be used to predict the likelihood of their atrial fibrillation resolving (19). Age, gender, duration of AF, and other factors are important predictive indicators that can be used to assess a patient's recovery from AF.

Limitations

This study has several limitations. Firstly, the external validation set lacks some predictors, which may affect the performance of the model. Future research should utilize more comprehensive external databases and improve data collection to reduce missing values, possibly through advanced imputation or multi-center data.

Secondly, the respiratory and circulatory system diseases discussed in this study are both strongly associated with smoking (20). However, smoking status could not be obtained from the MIMIC database, and the absence of this important characteristic is a significant shortcoming of this research.

Thirdly, factors not related to acute respiratory failure, such as previous cardiovascular diseases, medications, or genetics, may affect the occurrence of atrial fibrillation. Future research should incorporate more covariates and consider multi-factor models or propensity scoring methods to address confounding.

Finally, the samples are from the MIMIC database and may not represent the global population, which limits the generalizability of the findings. More diverse populations need to be studied to validate the model in different regions.

Conclusion

Six feature selection methods and six machine learning algorithms were used to establish the model. XGBoost and Random Forest demonstrated the best performance in external validation. A model based on vital signs and laboratory data may assist timely clinical decision-making, helping to reduce complications and improve outcomes.
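The external validation results are summarized by the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen patient with AF receives a higher predicted risk than a randomly chosen patient without AF, with ties counted as one half. The labels and scores are made-up values, and `roc_auc` is an illustrative helper rather than the study's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = AF) and predicted risks for six patients.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.6, 0.2, 0.4]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of patients, so it suits small illustrations; for cohorts of the size used here, a sort-based computation would be preferable.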

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Beth Israel Deaconess Medical Center Institutional Review Board and the Massachusetts Institute of Technology Institutional Review Board (MIT IRB/Committee on the Use of Humans as Experimental Subjects, COUHES). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and institutional requirements. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

RL: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2025.1696609/full#supplementary-material

Abbreviations

Po2, partial pressure of oxygen; Pco2, partial pressure of carbon dioxide; PH, acidity; BE, base excess; Total_co2, total carbon dioxide; SPo2, peripheral capillary oxygen saturation; Platelets, platelet count; Wbc, white blood cell count; Bun, blood urea nitrogen; Inr, international normalized ratio; Pt, prothrombin time; Ppt, partial thromboplastin time; Alt, alanine aminotransferase; Alp, alkaline phosphatase; Ast, aspartate aminotransferase; Sbp, systolic blood pressure; Dbp, diastolic blood pressure; Mbp, mean blood pressure; Resp rate, respiratory rate; Gcs, Glasgow Coma Scale; Gcs Motor, GCS motor response; Gcs Verbal, GCS verbal response; Gcs eyes, GCS eye opening; Gcs Unable, GCS unable to score; Sofa, Sequential Organ Failure Assessment; Apsiii, Acute Physiology Score III; Sapsii, Simplified Acute Physiology Score II; Oasis, Oxford Acute Severity of Illness Score.

References

1. Fujishima S. Guideline-based management of acute respiratory failure and acute respiratory distress syndrome. J Intensive Care. (2023) 11(1):10. doi: 10.1186/s40560-023-00658-3

2. Bordignon S, Chiara Corti M, Bilato C. Atrial fibrillation associated with heart failure, stroke and mortality. J Atr Fibrillation. (2012) 5(1):467. doi: 10.4022/jafib.467

3. Mentes O, Celik D, Yıldız M, Özdemir T, Ari M, Aksoy Güney EN, et al. Atrial fibrillation among ICU patients with type 2 respiratory failure: who is at risk and what are the outcomes? Diagnostics (Basel). (2025) 15(13):1612. doi: 10.3390/diagnostics15131612

4. Sagris M, Vardas EP, Theofilis P, Antonopoulos AS, Oikonomou E, Tousoulis D. Atrial fibrillation: pathogenesis, predisposing factors, and genetics. Int J Mol Sci. (2021) 23(1):6. doi: 10.3390/ijms23010006

5. Bemis CE, Serur JR, Borkenhagen D, Sonnenblick EH, Urschel CW. Influence of right ventricular filling pressure on left ventricular pressure and dimension. Circ Res. (1974) 34(4):498–504. doi: 10.1161/01.res.34.4.498

6. Choi SE, Sagris D, Hill A, Lip GYH, Abdul-Rahim AH. Atrial fibrillation and stroke. Expert Rev Cardiovasc Ther. (2023) 21(1):35–56. doi: 10.1080/14779072.2023.2160319

7. Liang F, Wang Y. Coronary heart disease and atrial fibrillation: a vicious cycle. Am J Physiol Heart Circ Physiol. (2021) 320(1):H1–12. doi: 10.1152/ajpheart.00702.2020

8. Aldrugh S, Sardana M, Henninger N, Saczynski JS, McManus DD. Atrial fibrillation, cognition and dementia: a review. J Cardiovasc Electrophysiol. (2017) 28(8):958–65. doi: 10.1111/jce.13261

9. Schüttler D, Bapat A, Kääb S, Lee K, Tomsits P, Clauss S, et al. Animal models of atrial fibrillation. Circ Res. (2020) 127(1):91–110. doi: 10.1161/CIRCRESAHA.120.316366

10. Roussos C, Koutsoukou A. Respiratory failure. Eur Respir J Suppl. (2003) 47:3s–14. doi: 10.1183/09031936.03.00038503

11. Van Gelder IC, Rienstra M, Bunting KV, Casado-Arroyo R, Caso V, Crijns HJGM, et al. 2024 ESC guidelines for the management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS). Eur Heart J. (2024) 45(36):3314–414. doi: 10.1093/eurheartj/ehae176

12. Roca O, Hernández G, Díaz-Lobato S, et al. Acute respiratory failure: clinical diagnosis and management recommendations. Intensive Care Med. (2022) 48(6):677–94. doi: 10.1007/s00134-022-06723-x

13. Ferrer M, Antonelli M, Roca O, et al. Chest imaging in acute respiratory failure: European respiratory society statement. Eur Respir J. (2019) 54(3):1900205. doi: 10.1183/13993003.00205-2019

14. van der Velden RMJ, Hermans ANL, Pluymaekers NAHA, Gawalko M, Elliott A, Hendriks JM, et al. Dyspnea in patients with atrial fibrillation: mechanisms, assessment and an interdisciplinary and integrated care approach. Int J Cardiol Heart Vasc. (2022) 42:101086. doi: 10.1016/j.ijcha.2022.101086

15. Swanson K, Wu E, Zhang A, Alizadeh AA, Zou J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell. (2023) 186(8):1772–91. doi: 10.1016/j.cell.2023.01.035

16. Guan C, Gong A, Zhao Y, Yin C, Geng L, Liu L, et al. Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit Care. (2024) 28(1):349. doi: 10.1186/s13054-024-05138-0

17. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. (2023) 10(1):1. doi: 10.1038/s41597-022-01899-x

18. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. (2016) 3:160035. doi: 10.1038/sdata.2016.35

19. Mariani MV, Pierucci N, Trivigno S, Cipollone P, Piro A, Chimenti C, et al. Probability score to predict spontaneous conversion to sinus rhythm in patients with symptomatic atrial fibrillation when less could be more? J Clin Med. (2024) 13(5):1470. doi: 10.3390/jcm13051470

20. Wu Z, Tang C, Wang D. Bidirectional two-sample Mendelian randomization study of association between smoking initiation and atrial fibrillation. Tob Induc Dis. (2024) 22. doi: 10.18332/tid/189380

21. Wang N, Yu Y, Sun Y, Zhang H, Wang Y, Chen C, et al. Acquired risk factors and incident atrial fibrillation according to age and genetic predisposition. Eur Heart J. (2023) 44(47):4982–93. doi: 10.1093/eurheartj/ehad615

Keywords: machine learning, predictive model, atrial fibrillation, acute respiratory failure, MIMIC-IV database

Citation: Li R (2025) Predicting atrial fibrillation in patients with acute respiratory failure using machine learning: application of the MIMIC-III and MIMIC-IV datasets. Front. Cardiovasc. Med. 12:1696609. doi: 10.3389/fcvm.2025.1696609

Received: 1 September 2025; Accepted: 22 September 2025;
Published: 9 October 2025.

Edited by:

Dimitrios Vrachatis, National and Kapodistrian University of Athens, Greece

Reviewed by:

Chengchun Tang, Southeast University, China
Vincenzo Mirco La Fazia, Texas Cardiac Arrhythmia Institute, United States

Copyright: © 2025 Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rixuan Li
