Reappraisal of Non-vitamin K Antagonist Oral Anticoagulants in Atrial Fibrillation Patients: A Systematic Review and Meta-Analysis

Background: Recent observational studies have compared the effectiveness and safety of non-vitamin K antagonist oral anticoagulants (NOACs) and warfarin in patients with atrial fibrillation (AF). However, because these studies rely on clinical practice-based data, confounding may undermine the reliability of their results. This systematic review and meta-analysis was conducted to compare the effect of NOACs vs. warfarin in propensity score-based observational studies with that in randomized clinical trials (RCTs).
Methods: The PubMed and EMBASE databases were systematically searched through March 2021 for relevant studies. The primary outcomes were stroke or systemic embolism (SSE) and major bleeding. Hazard ratios (HRs) and 95% confidence intervals (CIs) of the outcomes were extracted and pooled using a random-effects model.
Results: A total of 20 propensity score-based observational studies and 4 RCTs were included. In the observational studies, compared with warfarin, dabigatran (HR, 0.82 [95% CI, 0.71–0.96]), rivaroxaban (HR, 0.80 [95% CI, 0.75–0.85]), apixaban (HR, 0.75 [95% CI, 0.65–0.86]), and edoxaban (HR, 0.71 [95% CI, 0.60–0.83]) were associated with a reduced risk of SSE. Dabigatran (HR, 0.76 [95% CI, 0.65–0.87]), apixaban (HR, 0.61 [95% CI, 0.56–0.67]), and edoxaban (HR, 0.58 [95% CI, 0.45–0.74]), but not rivaroxaban (HR, 0.92 [95% CI, 0.84–1.00]), were significantly associated with a decreased risk of major bleeding. Furthermore, the risk of major bleeding with dabigatran 150 mg was significantly lower in observational studies than in the RE-LY trial, whereas in the other comparisons the pooled results of observational studies were similar to the data from the corresponding RCTs.
Conclusion: Data from propensity score-based observational studies and NOAC trials consistently suggest that the use of four individual NOACs is non-inferior to warfarin for stroke prevention in AF patients.


INTRODUCTION
Atrial fibrillation (AF), the most common arrhythmia in clinical practice, is associated with a five-fold increase in the risk of ischemic stroke and a twofold increase in all-cause mortality (1,2). Before 2010, warfarin was the main drug used to prevent stroke in AF patients, but its use is limited by a narrow therapeutic range, which requires regular monitoring of the international normalized ratio (INR) and frequent dose adjustment (3). Subsequently, non-vitamin K antagonist oral anticoagulants (NOACs), including a direct thrombin inhibitor (dabigatran) and factor Xa inhibitors (rivaroxaban, apixaban, and edoxaban), have been recommended as the preferred drugs for stroke prevention among nonvalvular AF patients (4-6). Compared with warfarin, NOACs do not require anticoagulation monitoring, have simpler dosing regimens, and have fewer food and drug interactions (7).
Previous randomized clinical trials (RCTs) have shown that the efficacy and safety of NOACs are superior or non-inferior to those of warfarin in AF patients. Specifically, compared with warfarin, dabigatran is associated with lower rates of stroke or systemic embolism (SSE) and a similar rate of major bleeding (8), apixaban with lower rates of SSE and major bleeding (9), rivaroxaban with non-inferior rates of SSE and a similar rate of major bleeding (10), and edoxaban with non-inferior rates of SSE and a lower rate of major bleeding (11). Although randomization balances patient characteristics between treatment groups and allows a fair evaluation of the treatment effect, RCTs provide limited information on the risks and benefits of interventions across the full range of patients treated in real-world settings. By contrast, observational studies can cover a wider range of patient characteristics and evaluate a broader range of outcomes over a longer period (12,13). Many observational studies comparing the effectiveness and safety of NOACs vs. warfarin in AF patients have recently been published. However, because they rely on clinical practice-based data, several of these studies may be subject to confounding and significant bias, which could undermine the reliability of their findings.
The propensity score (PS) offers an effective way to evaluate the effectiveness of interventions in typical clinical settings (14). PS methods match or reweight the study population so that the distribution of baseline characteristics becomes comparable between treatment groups. PS analysis can therefore reduce bias in comparisons between treated patients and controls. In the present meta-analysis, we aimed to compare the effectiveness and safety profiles of NOACs and warfarin based on PS-based observational studies, and to test whether the pooled results of these high-quality observational studies were consistent with data from the corresponding RCTs.

METHODS
This systematic review and meta-analysis was carried out in accordance with the Cochrane Handbook for Systematic Reviews of Interventions. The results are presented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement.
Ethical approval was not required because we included only published studies.
We systematically searched the PubMed and EMBASE databases through March 2021 to identify all relevant studies. To obtain a balanced covariate distribution between the NOAC and warfarin groups, we included only observational studies that applied PS-based methods. In addition, 4 RCTs of NOACs vs. warfarin were selected (dabigatran [RE-LY], rivaroxaban [ROCKET AF], apixaban [ARISTOTLE], and edoxaban [ENGAGE AF-TIMI 48]). The primary outcomes were SSE and major bleeding. Data extraction was conducted independently by two researchers. Hazard ratios (HRs) and 95% confidence intervals (CIs) were used as the effect sizes and were pooled using the random-effects model. To test the stability of the results, we repeated the analysis using the fixed-effects, inverse variance heterogeneity (IVhet), and quality effects (QE) models. Detailed information on the eligibility criteria, literature search, study selection, data extraction, quality assessment, and statistical analysis is provided in the Supplementary Materials.
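The pooling step described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice for random-effects models but is not confirmed by the text as the estimator used here. The function names and the recovery of the standard error from the 95% CI are illustrative assumptions.

```python
import math

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Convert an HR and its 95% CI to a log-HR and its standard error."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

def pool_random_effects(estimates, z=1.96):
    """Pool (hr, lower, upper) tuples with a DerSimonian-Laird model (assumed)."""
    pairs = [log_hr_and_se(*e, z=z) for e in estimates]
    ys = [y for y, _ in pairs]
    ses = [se for _, se in pairs]
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1 / se ** 2 for se in ses]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "hr": math.exp(y_re),
        "ci": (math.exp(y_re - z * se_re), math.exp(y_re + z * se_re)),
        "tau2": tau2,
        "q": q,
    }
```

When the estimated between-study variance is zero, the random-effects weights reduce to the inverse-variance weights, so the same function also reproduces the fixed-effects estimate in that case.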
All the statistical analyses were carried out by Review Manager

Study Selection
The flow chart of the literature retrieval is presented in Supplementary Figure 1. A total of 1,139 studies were identified from the two electronic databases. After removal of duplicates, 782 studies remained, of which 57 were retained after screening of titles and abstracts. Of the 57 studies that underwent full-text screening, 33 were excluded for the following reasons: (1) 23 studies used overlapping databases; (2) 3 studies included single-center patients with a sample size of less than 1,000; (3) 5 studies compared combined NOACs vs. warfarin or did not use warfarin as the reference; (4) 2 studies did not use PS-based methods to balance baseline patient characteristics. Finally, 24 studies (3, 7-11, 15-32) (20 observational cohort studies and 4 RCTs) were included in the current meta-analysis.

Baseline Characteristics of the Included Studies
The baseline characteristics of the included RCTs are shown in Supplementary Table 1, with detailed information grouped by NOAC dose. Baseline characteristics of the 20 observational studies are shown in Table 1. Although some studies extracted data from the same database, they analyzed different NOACs, covered different study periods, or reported different outcomes. All 20 observational studies applied PS-based methods to balance the covariates between groups: propensity score matching (PSM), n = 11 (3, 7, 20, 23-27, 29, 31, 32), and inverse probability of treatment weighting (IPTW), n = 9 (15-19, 21, 22, 28, 30). For the PS diagnostics, 14 studies used standardized differences, and 6 studies did not report any diagnostic.
The results of the risk of bias assessment for the RCTs are shown in Supplementary Table 2 and suggest a low risk of bias. The methodological quality of the observational cohorts was assessed with the Newcastle-Ottawa Scale (NOS) (Supplementary Table 3). All articles scored 7 or more points, indicating relatively high quality.

Comparisons Between Individual NOAC and Warfarin
For the observational studies, the crude event rates and pooled HRs (based on the random-effects model) of the outcomes for each NOAC vs. warfarin are summarized in Table 2.

Primary Outcomes Between Each NOAC vs. Warfarin
As presented in Figure 1, compared with warfarin, dabigatran was associated with reduced risks of SSE ( (Supplementary Figure 2).

Sensitivity Analysis and Subgroup Analysis
In the sensitivity analysis, the results of the primary outcomes from the IVhet and QE models (Supplementary Figures 6-9) were similar to those from the primary analysis using the random-effects model. In addition, the results did not change substantially when we repeated the analyses using the fixed-effects model (Supplementary Table 4). As shown in Table 4, the subgroup analyses of the primary outcomes suggested no significant interactions by NOAC dose or follow-up period. In the subgroup analysis by region, Asians showed lower risks of SSE and major bleeding than non-Asians in the dabigatran vs. warfarin comparison. In the rivaroxaban vs. warfarin comparison, Asians showed a lower risk of major bleeding than non-Asians. In the apixaban vs. warfarin comparison, the risk of SSE was significantly lower in Asians than in non-Asians. There were not enough studies for subgroup analyses of edoxaban vs. warfarin.
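One way to check for an interaction between two subgroups (for example, Asians vs. non-Asians) is a z-test on the difference of the two log-HRs. The text does not state which interaction test was used, so the sketch below is an assumption, and the function name `interaction_test` is hypothetical. It requires that the two estimates come from independent sets of patients.

```python
import math

def interaction_test(hr_a, ci_a, hr_b, ci_b, z=1.96):
    """Compare two independent HRs; returns (ratio of HRs, z statistic, p-value)."""
    # Recover each standard error on the log scale from the 95% CI
    se_a = (math.log(ci_a[1]) - math.log(ci_a[0])) / (2 * z)
    se_b = (math.log(ci_b[1]) - math.log(ci_b[0])) / (2 * z)
    diff = math.log(hr_a) - math.log(hr_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z_stat = diff / se_diff
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return math.exp(diff), z_stat, p_value
```

The same comparison could in principle be applied to a pooled observational estimate and the corresponding RCT estimate, since those are also derived from independent populations.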

Summary Effect Estimates Between Observational Studies and RCTs
Comparative effect estimates of NOACs vs. warfarin between observational studies and RCTs are shown in Table 3

Publication Bias
For the observational studies, visual inspection of the funnel plots of the primary outcomes showed no evidence of publication bias (Supplementary Figures 10-13). In addition, Begg's and Egger's tests indicated no significant publication bias (all P > 0.1; Supplementary Table 5). For the secondary outcomes, Egger's test suggested potential publication bias for intracranial hemorrhage in the dabigatran vs. warfarin comparison and for ischemic stroke in the rivaroxaban vs. warfarin comparison. Nevertheless, the trim-and-fill analysis did not trim any studies, and the corresponding pooled results were unchanged. For the RCTs, no publication bias analysis was needed because only four NOAC trials were included.
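Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (log-HR divided by its standard error) on precision (the reciprocal of the standard error). An intercept far from zero suggests funnel plot asymmetry. The sketch below is illustrative and does not reproduce the software used in this analysis; the function name is hypothetical. It returns the intercept, its standard error, and the t statistic, which would be compared with a t distribution on n - 2 degrees of freedom to obtain a P value.

```python
import math

def egger_intercept(log_effects, ses):
    """Egger's regression: returns (intercept, SE of intercept, t statistic)."""
    n = len(log_effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least 3 studies")
    x = [1 / s for s in ses]                        # precision
    y = [e / s for e, s in zip(log_effects, ses)]   # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else 0.0
    return intercept, se_int, t_stat
```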

DISCUSSION
In the current meta-analysis, we compared the studied outcomes between NOACs and warfarin by including only PS-based observational studies. In these studies, the results from the different pooling models consistently suggested that, compared with warfarin, dabigatran, rivaroxaban, apixaban, and edoxaban were associated with a reduced risk of SSE, whereas dabigatran, apixaban, and edoxaban, but not rivaroxaban, were associated with a decreased risk of major bleeding. We further tested whether the pooled results of high-quality observational studies were consistent with data from the corresponding RCTs. The risk of major bleeding with dabigatran 150 mg was significantly lower in observational studies than in the RE-LY trial, whereas in the other comparisons the pooled results of observational studies were consistent with data from the corresponding RCTs for both SSE and major bleeding.
Over the past few decades, vitamin K antagonists such as warfarin have been confirmed to be effective for preventing stroke in AF patients (33). However, warfarin has several shortcomings, mainly a slow onset of action, a dose-response relationship that varies widely among patients, a narrow therapeutic window, and frequent interactions with other drugs, all of which potentially limit its clinical use (34). NOACs are now increasingly used because they may be more effective, easier to manage, and safer than warfarin (7). Previous NOAC trials (RE-LY, ROCKET AF, ARISTOTLE, and ENGAGE AF-TIMI 48) suggested that NOACs were comparable to warfarin in efficacy but significantly reduced the risk of bleeding. Based on data from these trials, current guidelines recommend NOACs as the first-line drugs for the prevention of thrombogenesis and stroke in patients with nonvalvular AF (5). Although RCTs have long been regarded as the gold standard for evaluating clinical efficacy, their results may not apply well in practice. In this setting, observational studies can be a useful complement (35).
Clinical practice-based data are increasingly used to evaluate the effectiveness and safety profiles of NOACs compared with warfarin. Xue et al. (34) compared the overall effectiveness and safety outcomes of three NOACs (dabigatran, rivaroxaban, and apixaban) with warfarin in Asians with AF. Based on real-world studies, the authors demonstrated that in Asians with AF, NOACs could have potential advantages over warfarin in all effectiveness and safety outcomes, irrespective of drug type and dose. Nevertheless, heterogeneous real-world studies that lack proper methods to balance the covariate distribution can be affected by confounders (36), which may undermine the reliability of their results. PS methods, including PSM and IPTW, are the most frequently used approaches to this problem. They take all measured baseline variables into account, especially confounding factors, making the analyzed sample more similar to the population of an RCT. PSM pairs treated and untreated patients with similar PS values and can thus control for multiple confounders at once by matching on the PS alone (37). IPTW weights each patient by the inverse of the probability of receiving the treatment actually received, which balances the PS distribution between groups (37). However, PSM and IPTW are often not conducted properly (36). Therefore, to further improve the reliability of the study outcomes and reduce the influence of confounding factors, PS diagnostics such as standardized differences, the C-statistic, and visual inspection can be applied after PSM or IPTW. The standardized difference is a property of the sample that does not depend on sample size. It is easy to compute and understand, and it is the most commonly used diagnostic for the balance of covariate distribution between treatment groups (36,38).
In our current analysis, all 20 observational studies applied PSM or IPTW to balance the covariates between the NOAC and warfarin groups. For the PS diagnostics, 14 studies used standardized differences, and 6 studies did not report any diagnostic.
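The relationship between IPTW and the standardized difference can be illustrated with a small sketch. It assumes that the propensity scores are already estimated, uses weights for the average treatment effect, and computes variances by normalizing with the sum of weights rather than with a sample-size correction. These choices, the function names, and the toy cohort are illustrative assumptions and are not taken from the included studies.

```python
import math

def iptw_weights(treated, ps):
    """Weight = 1/PS for treated patients and 1/(1 - PS) for controls."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]

def weighted_mean_var(values, weights):
    """Weighted mean and weight-normalized variance (a simplifying assumption)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def standardized_difference(x, treated, weights=None):
    """Absolute standardized difference of covariate x between the two groups."""
    if weights is None:
        weights = [1.0] * len(x)
    x1 = [v for v, t in zip(x, treated) if t]
    w1 = [w for w, t in zip(weights, treated) if t]
    x0 = [v for v, t in zip(x, treated) if not t]
    w0 = [w for w, t in zip(weights, treated) if not t]
    m1, v1 = weighted_mean_var(x1, w1)
    m0, v0 = weighted_mean_var(x0, w0)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    return abs(m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0

# Hypothetical toy cohort: a binary covariate that strongly predicts treatment
x = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
treated = [True] * 10 + [False] * 10
ps = [0.8 if v == 1 else 0.2 for v in x]   # assumed known propensity scores

before = standardized_difference(x, treated)
after = standardized_difference(x, treated, iptw_weights(treated, ps))
```

In this toy cohort the covariate is heavily imbalanced before weighting and balanced after it, which is the pattern a standardized-difference diagnostic is meant to confirm.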
Agreement between RCTs and observational studies can greatly strengthen confidence in the results and in their use as a reference for routine clinical practice. It remains unclear whether the findings of observational studies are consistent with data from the NOAC trials. Siontis et al. (35) compared RCTs and observational studies of NOACs vs. warfarin for consistency. The authors found that the effects of NOACs vs. warfarin were consistent between RCTs and observational studies for most outcomes. However, some exceptions appeared in the dabigatran vs. warfarin comparison. The RE-LY trial found an increased risk of myocardial infarction in patients treated with dabigatran 150 mg compared with patients using warfarin, whereas the reverse was found in observational studies. Also, significantly higher risks of major and gastrointestinal bleeding were found in observational studies than in the RE-LY trial in the dabigatran group. Conversely, the RE-LY trial showed a lower rate of SSE than the observational studies. However, Siontis et al. did not describe the baseline characteristics of the treated and non-treated groups in detail, nor did they clarify the statistical methods used in the included studies. Without rigorous study design and statistical analysis, the results are easily affected by confounding bias and are therefore less reliable. Given these issues, we decided to conduct a more comprehensive meta-analysis by including only PS-based observational studies. In our analysis, the effectiveness and safety results are largely in agreement, with some discrepancies that occurred mainly in the dabigatran vs. warfarin comparison. The findings of Siontis et al. on the consistency between observational studies and RCTs are quite similar to ours.

LIMITATIONS
This meta-analysis has several limitations. First, most of the included observational studies were retrospective; therefore, we evaluated associations between the drugs and the event outcomes rather than causal relationships. Second, despite the detailed information extracted from the included studies, some articles lacked key data (e.g., drug dosage, follow-up period of NOAC treatment), which may introduce uncertainty into the results. Third, several important cardiovascular events, including myocardial infarction, were not included in our analysis due to a lack of data. Fourth, we did not include observational studies that focused only on special populations with AF. Nevertheless, we have previously discussed the effect of NOACs in special AF populations (e.g., chronic kidney disease, hypertrophic cardiomyopathy, peripheral artery disease, prior stroke) (39-42). Finally, although we included comparisons of outcomes between edoxaban and warfarin, we could not assess some outcomes due to insufficient data.

CONCLUSION
This meta-analysis suggests that, based on data from PS-based observational studies, the use of NOACs for stroke prevention in AF is non-inferior or even superior to warfarin. The consistency between the observational studies and the corresponding RCTs further supports this view.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.