Development of a Risk Nomogram Model for Identifying Interstitial Lung Disease in Patients With Rheumatoid Arthritis

The clinical features of rheumatoid arthritis (RA)-associated interstitial lung disease (RA-ILD) usually manifest at an advanced stage of lung disease, which makes early diagnosis challenging and complicates treatment guidance for patients with RA-ILD in clinical settings. The aim of this study was to construct a nomogram for identifying ILD in RA patients. The nomogram was built from the plasma level of matrix metalloproteinase-3 (MMP-3), demographics, clinical features, and laboratory parameters of 223 RA patients (85 with RA-ILD), who were grouped into training cohorts and validation cohorts. Candidate variables for the nomogram were screened using univariable analysis and multivariable logistic regression analysis. The performance of the diagnostic nomogram was measured via the concordance index (C-index), calibration plots, and decision curve analysis (DCA). Results showed that plasma MMP-3 protein was elevated in RA-ILD patients compared with non-ILD RA patients in both the training cohorts (p = 0.0475) and the validation cohorts (p = 0.0006). Following a final regression analysis, male sex, current smoking, and levels of circulating rheumatoid factor (RF), C-reactive protein (CRP), and MMP-3 were identified as risk factors for the construction of the nomogram. The calibration plots further showed favorable consistency between the probabilities given by the nomogram and actual clinical findings. Consistently, the C-index (0.826 for both training cohorts and validation cohorts) indicated satisfactory discriminative ability of the nomogram. Although incorporating MMP-3 did not significantly improve the net benefit of the nomogram as determined by DCA, including the level of circulating MMP-3 increased the diagnostic accuracy of the nomogram for ILD in RA patients. Thus, our proposed model can serve as a non-invasive tool to identify ILD in RA patients, which may assist physicians in making treatment decisions for RA patients.


INTRODUCTION
Pulmonary involvement is one of the common extra-articular manifestations in patients with rheumatoid arthritis (RA) (1,2) and presents with different clinical phenotypes, including pleural disorders, interstitial lung disease (ILD), and obliterative bronchiolitis (OB). Of these, ILD contributes significantly to morbidity and mortality in RA patients (3,4). The difficulty of making an accurate and early diagnosis has a significant impact on the treatment and prognosis of this disease in clinical settings.
In general, a number of factors can increase the likelihood of progression and mortality in RA-ILD. For instance, the incidence of RA-ILD is higher in men than in women (3,5). Cigarette smoking is also associated with the development of RA-ILD, as cigarette smoke inhalation induces genotoxic and inflammatory citrullination of lung antigens that extend from the conducting airway to the gas exchange zones, particularly in patients with the shared human leukocyte antigen DRB1 (HLA-DRB1) epitope (6). Strong interactions among environmental, genetic, and/or clinical risk factors of RA thus promote the development of ILD in RA patients.
Consistent with the above paradigm, several groups have shown an increased level of anti-cyclic citrullinated peptide (anti-CCP) antibody in patients with RA-ILD (7-10). Beyond elevated anti-CCP antibody and rheumatoid factor (RF), other biomarkers have also been evaluated for their ability to diagnose RA-ILD, such as the epithelial cell-derived Krebs von den Lungen-6 (KL-6) (11), surfactant protein D (SP-D), matrix metalloproteinase-7 (MMP-7), interferon-γ-inducible protein 10 (IP10/CXCL10), and pulmonary and activation-regulated chemokine (PARC) (12,13). In addition, the potential adverse effects of various disease-modifying antirheumatic drugs on RA-ILD development have recently attracted considerable attention. Of these agents, methotrexate (MTX) and tumor necrosis factor-alpha inhibitors (TNFi) have been linked to drug-induced pneumonitis, raising concerns about their safety in RA treatment (14). However, pulmonary involvement, particularly ILD, is often unrecognized in RA patients, highlighting the need for clinical tools such as predictive models that can identify the early stages of RA-ILD in clinical settings.
RA is a common inflammatory disease that predominantly damages articular cartilage. A compelling body of evidence has demonstrated that cartilage degradation is mediated by members of the matrix metalloproteinase (MMP) enzyme family. Among these members, MMP-3 (stromelysin-1) is a proteinase synthesized and secreted by synovial fibroblasts, which can degrade many matrix protein components, such as fibronectin, gelatins, and collagens, in the synovial joints. In addition, MMP-3 can activate other MMPs such as MMP-1, MMP-7, and MMP-9, giving MMP-3 a crucial role in connective tissue remodeling (15,16). A recent study found that the level of circulating MMP-3 was correlated with RA disease activity (17). Of note, further studies revealed that the MMP-3 level was mainly elevated in bronchial and alveolar epithelial cells, interstitial fibroblasts, alveolar macrophages, and other leukocytes of idiopathic pulmonary fibrosis (IPF) lungs (18,19). Mechanistically, MMP-3 has been described as playing a role in fibrogenesis by driving epithelial-to-mesenchymal transition (EMT) (18). These studies strongly support the involvement of MMP-3 in the development of RA and pulmonary fibrosis.
In recent years, the nomogram has been widely used as a predictive approach across the majority of cancer types (20,21), primarily owing to its ability to meet the requirements for an integrated model (22). The availability of web-based tools for prognosis and risk prediction has contributed to the popularity of nomograms among clinicians and patients themselves (22,23). In light of the aforementioned findings on MMP-3 in RA and pulmonary fibrosis, we sought to develop a model that incorporates plasma MMP-3 and independent clinical risk factors for identifying ILD in RA patients.

Study Population and Design
In accordance with protocols approved by the Ethics Committee for the Conduct of Human Research at the General Hospital of Ningxia Medical University (2020-916), blood samples were collected from the outpatient rheumatology and respiratory clinic of the General Hospital of Ningxia Medical University between December 2019 and January 2021. Two hundred twenty-three patients who were diagnosed with RA by fulfilling the 1987 American College of Rheumatology (ACR) classification criteria (24,25) were further classified into 138 non-ILD RA and 85 RA-ILD patients based on their chest high-resolution computed tomography (HRCT) and clinical manifestations. A multidisciplinary team (MDT) including pneumologists, radiologists, pathologists, and rheumatologists comprehensively assessed the absence or presence of RA-ILD. Patients with other chronic pulmonary diseases or infectious diseases were excluded. Furthermore, patients who had severe heart, renal, or lung dysfunction were also excluded from this study. Eligible patients were randomly assigned to the training cohorts or the validation cohorts. The training cohorts were used to screen risk factors and construct the model. The validation cohorts were used to validate the accuracy of the model generated with the training cohorts.

MMP-3 Measurement
The level of plasma MMP-3 was determined by a commercially available ELISA according to the kit's protocol (NeoBioscience Inc., Shenzhen, China) (assay range: 0.0313-2 ng/ml). For detection of MMP-3 protein, the optical density was measured at a wavelength of 450 nm. Experiments were repeated three times.

Statistical Analysis
The independent risk factors for the presence of ILD in RA patients were evaluated using univariate logistic regression analysis in the training cohorts. Variables with a p-value less than 0.05 were candidates for stepwise multivariate analysis. A nomogram was then established based on the results of the final regression analysis using the rms package of R, version 4.1.0 (http://www.r-project.org/). The predictive performance of the nomogram was assessed by the concordance index (C-index) and by calibration with 1,000 bootstrap samples to reduce overfitting bias, as well as by decision curve analysis (DCA) (26,27). DCA was applied to the nomogram by quantifying net benefits at different threshold probabilities (28). In this study, the clinical benefit of the nomogram that incorporated plasma MMP-3 as a risk factor was compared with that of a nomogram without MMP-3 data. Continuous variables were expressed as mean (SE) and compared with the two-sample t-test or the Mann-Whitney test. Categorical demographic data were compared using the χ² test or Fisher's exact test. In all analyses, p < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS for Windows (version 26.0) (SPSS Inc., Chicago, IL, USA) and the "rms" and "dca.r" packages in R.
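To make the two performance measures above concrete, the following is a minimal Python sketch, not the analysis code used in this study (which relied on R). It assumes a binary outcome, in which case the C-index equals the proportion of (ILD, non-ILD) patient pairs in which the ILD patient receives the higher predicted probability, and it uses the standard net benefit definition from decision curve analysis, TP/n - (FP/n) × pt/(1 - pt), at a threshold probability pt. The function names and the toy data are illustrative assumptions, and the bootstrap correction for overfitting is omitted.

```python
from itertools import product


def c_index(y_true, y_prob):
    """Concordance index for a binary outcome (ties count as 0.5)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    # Compare every (event, non-event) pair of predicted probabilities.
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a patient positive when y_prob >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = ILD present), not taken from the study cohorts.
outcome = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.8]

print(c_index(outcome, predicted))           # 0.9375
print(net_benefit(outcome, predicted, 0.5))  # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.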

Baseline Demographic and Clinical Features
A total of 223 patients were enrolled in the study after the exclusion of 100 patients who had no chest HRCT scan, 19 patients who had an uninterpretable HRCT scan, and 58 patients who did not have serum samples available. Of the 223 enrolled patients, 122 and 101 were randomly grouped into the training and validation cohorts according to the timeline, respectively (Figure 1). Table 1 depicts the baseline clinical characteristics of the patients. With respect to baseline clinical data, there were no statistically significant differences between the training and validation cohorts. In addition, HRCT-identified ILD was found in 50 (41.0%) and 35 (34.7%) patients in the training and validation cohorts, respectively. In the training cohorts, 38 of the 50 (76.0%) patients with imaging features of ILD had a usual interstitial pneumonia (UIP) pattern, 6 of 50 (12.0%) had nonspecific interstitial pneumonia (NSIP), 2 of 50 (4.0%) had organizing pneumonia (OP), and the remaining 4 (8.0%) had other ILD patterns. In the validation cohorts, 25 of the 35 (71.4%) patients with imaging features of ILD had UIP, 2 of 35 (5.7%) had NSIP, 5 of 35 (14.3%) had OP, and the remaining 3 (8.6%) had other ILD patterns.

Elevated MMP-3 Protein in the Plasma of RA-ILD Patients
The Gene Expression Omnibus (GEO) database is widely used by the research community and provides invaluable gene expression profiles from which new hypotheses can be derived (29). We evaluated the data available in NCBI GEO (GSE128813 and GSE47460) for differentially expressed genes (DEGs) in RA fibroblast-like synoviocytes (RA-FLS) and in lung tissue from ILD. A total of 43 DEGs were identified in the two datasets, consisting of 24 up-regulated genes and 19 down-regulated genes (Figure 2 and Table 2). To obtain the best-fit RA-ILD proteins, we filtered these DEGs through a protein-protein interaction (PPI) network and visualized them with Cytoscape. As shown in Figure 3A, the network consisted of 18 nodes and 25 edges. The red and green circles represent the up-regulated and down-regulated proteins encoded by the DEGs, respectively. The top eight hub genes selected by the MCODE in the CytoHubba plug-in included MMP-1, MMP-3, disc large-associated protein 5 (DLGAP5), DNA topoisomerase II alpha (TOP2A), centromere protein F (CENPF), kinesin family member 4A (KIF4A), SHC SH2 domain-binding protein (SHCBP1), and a disintegrin and metalloproteinase with thrombospondin motif (ADAMTS) (Figure 3B). Next, we explored plasma MMP-3 in RA patients with and without ILD. The plasma level of the MMP-3 protein was significantly higher in RA-ILD patients (0.94 ± 0.12 ng/ml) than in RA patients without ILD (0.36 ± 0.04 ng/ml) (p = 0.0457) in the training cohorts (Figure 4A). In addition, the concentration of plasma MMP-3 in RA-ILD patients (0.49 ± 0.04 ng/ml) was also higher than in RA patients without ILD (0.32 ± 0.02 ng/ml) (p = 0.006) in the validation cohorts (Figure 4B).

Nomogram Variable Screening
Following univariable analysis, the variables sex (male) (

Development and Validation of an ILD-Identifying Nomogram
Based on the variables screened, a nomogram was constructed by incorporating the five significant risk factors for identifying ILD in RA patients. The total nomogram score was obtained by summing the individual scores and was then used to obtain the probability of the presence of ILD; most patients had total risk points ranging from 56 to 160 in the present study. Total risk points of 26, 58, 77, 97, 109, and 128 corresponded to ILD probabilities of 10%, 30%, 50%, 70%, 80%, and 90%, respectively. For instance, the ILD probability of a given patient can be read from the nomogram (Figure 5A). Of note, the calibration plots graphically showed good agreement between the identified probability of ILD and the actual observation in the training cohorts and validation cohorts, which indicates good calibration of this model (Figures 5B, C). These results suggest that the nomogram built with a combination of MMP-3, sex, current smoking, RF, and CRP had considerable discriminative and calibrating abilities for identifying ILD in RA patients.
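As an illustration of how total points translate into a probability, the short Python sketch below interpolates between the six point-probability pairs reported above. It is only an approximation under a stated assumption: because the nomogram is derived from a logistic regression model, the interpolation is done linearly on the logit scale between neighboring reported values. The fitted model coefficients are not used, the function name `probability_from_points` is hypothetical, and values outside the reported range of 26 to 128 points are rejected rather than extrapolated.

```python
import math

# (total points, probability of ILD) pairs reported in the text.
ANCHORS = [(26, 0.10), (58, 0.30), (77, 0.50),
           (97, 0.70), (109, 0.80), (128, 0.90)]


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def probability_from_points(points):
    """Approximate ILD probability for a total nomogram score.

    Assumption: probability is interpolated linearly on the logit scale
    between the two nearest reported anchor values.
    """
    if not ANCHORS[0][0] <= points <= ANCHORS[-1][0]:
        raise ValueError("points outside the reported range (26 to 128)")
    for (x0, p0), (x1, p1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= points <= x1:
            fraction = (points - x0) / (x1 - x0)
            return inv_logit(logit(p0) + fraction * (logit(p1) - logit(p0)))
    raise AssertionError("unreachable: range was checked above")


print(probability_from_points(77))  # 0.5
print(round(probability_from_points(90), 2))
```

In practice the probability would be read directly from the nomogram axis or computed from the fitted model; the sketch only shows that the reported pairs behave like a monotonic, roughly logistic mapping.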

The Incorporation of Plasma MMP-3 Increases the Identifying Probability of Nomogram for ILD in RA Patients
The change in C-index was used to test the performance of the nomogram after incorporating circulating MMP-3 into the available clinical risk factors. Indeed, adding MMP-3 to the four independent risk factors (sex, current smoking, RF, and CRP) improved the diagnostic accuracy of the nomogram for ILD in RA patients compared with the nomogram built from the four risk factors alone (Figure 6). The C-indexes of the nomogram generated with the four risk factors alone were 0.801 (95% CI, 0.721-0.881) and 0.819 (95% CI, 0.731-0.907) in the training cohorts and validation cohorts, respectively (left panel, Figures 6A, B), while the C-indexes of the MMP-3-incorporated nomogram were 0.826 (95% CI, 0.751-0.901) and 0.826 (95% CI, 0.740-0.912) in the training cohorts and validation cohorts, respectively (right panel, Figures 6A, B).
In addition, DCA curves showed that both nomograms could identify ILD in RA patients, although incorporating MMP-3 did not significantly improve the net benefit in either the training or the validation cohorts (Figures 7A, B).

DISCUSSION
Although the mortality rate in patients with RA is declining, the death of patients due to RA-ILD is increasing (4), emphasizing the need for improved early diagnosis and intervention in the era of precision medicine. Therefore, we constructed a nomogram to identify the presence of ILD in RA patients. In the present study, we observed that plasma MMP-3 protein was elevated in RA-ILD patients compared with non-ILD RA patients. Five variables (male sex, current smoking, RF, CRP, and MMP-3) were identified by multivariable logistic regression based on univariable analysis and were incorporated into the nomogram for the identification of ILD in RA patients. The calibration plots graphically showed excellent agreement between the nomogram and actual observations. Consistently, the C-index (0.826 for the training cohorts and 0.826 for the validation cohorts) indicated satisfactory discriminative ability of the nomogram. The C-index further showed that the nomogram established with a combination of four conventional factors and MMP-3 increased the accuracy of identifying RA-ILD compared with the nomogram built with the four risk factors alone. However, DCA curves demonstrated that net benefits were not significantly improved in this study. Of note, DCA cannot replace measures of accuracy such as sensitivity and specificity. This is because DCA mainly focuses on evaluating clinical application, while the receiver operating characteristic (ROC) curve is more often used to evaluate the diagnostic accuracy of identifying and/or predictive models. In this regard, the area under the ROC curve (AUC) can be interpreted as the probability that, for a pair of individuals in which one experienced the event and the other did not, the individual who experienced the event has the higher predicted probability (30). Therefore, this observation requires further investigation with more evaluation indicators, such as the net reclassification index (NRI).
MMP-3 is closely associated with joint destruction in RA patients and has been used for diagnosing RA and monitoring its disease activity (17). Additionally, a recent study showed that a higher level of MMP-3 was associated with the interleukin-1β (IL-1β)-induced inflammatory response in FLS (31). These data highlight the potential clinical value of MMP-3 in the personalized medical management of RA. Previous studies using cell culture models revealed that exposure of lung epithelial cells to MMP-3 led the cells to undergo EMT. For instance, when A549 human lung epithelial cells were treated with MMP-3, the expression of EMT signature genes was altered and the epithelial cell structure broke down, with a conversion to a more spindle-like morphology (32). Moreover, a number of experiments in transgenic mice have suggested that MMP-3 from lung type II alveolar epithelial cells acts as an inducer that leads to the development of fibrotic characteristics (33). Therefore, the aberrant expression of MMP-3 may be instrumental in promoting lung diseases. In agreement with these findings, based on available high-throughput mRNA expression profiles, we found that the expression of MMP-3 was significantly co-up-regulated in RA-FLS and ILD tissues compared with healthy controls in the present study. In addition, we found that the plasma level of MMP-3 was higher in RA-ILD patients than in non-ILD RA patients.
Emerging evidence has shown that baseline characteristics [old age (≥60 years), male sex, smoking, and RA duration] are risk factors for the development of RA-ILD (34-36). Indeed, smokers are prone to develop interstitial lung abnormalities (ILAs), which are associated with numerous pro-inflammatory molecules and MMPs, including MMP-3 (37). This observation is consistent with our nomogram, which includes smoking status and male sex as risk factors for ILD in RA patients. Unlike previous studies (38), the present cohorts did not identify old age and RA duration as predominant risk factors for RA-ILD; we reason that this might be due in part to the mild disease status of the cohorts in this study.
Apart from the baseline risk factors, several serological indexes, such as RF, C-reactive protein (CRP), and erythrocyte sedimentation rate (ESR), have also been shown to have diagnostic value for ILD in RA (9,39). In agreement with these findings, levels of CRP and RF emerged as risk factors for RA-ILD in this study and may be useful biomarkers. However, the study showed no association between ESR and RA-ILD in the multivariable model; some patients probably had an elevated ESR because of a change in physiological conditions, including strenuous exercise and emotional agitation.
Recent evidence suggests that a proportion of lung disease may be caused by the therapeutic agents used in the treatment of RA, such as methotrexate (MTX). MTX is recommended as the first-line treatment for RA in most regions of the world, and large prospective studies have demonstrated that the use of MTX effectively reduces disease activity and the risk of death in RA patients. However, most of these studies had several potential biases (40,41). Intriguingly, recent studies and meta-analyses suggest that MTX exposure may be associated with an increased risk of developing ILD in RA patients (42-44). To date, the association between MTX and ILD is still debated, as no randomized controlled trials in RA-ILD have been conducted and the available evidence comes from case reports and retrospective studies (14). Similarly, we did not find the various disease-modifying antirheumatic drugs to be associated with an increased risk of RA-ILD.
Of note, previous studies have reported that anti-CCP antibody and anti-keratin antibody (AKA) are risk factors for ILD. However, neither anti-CCP antibody nor AKA exhibited a correlation with ILD in this study, and hence both were excluded at the single-factor screening stage. We reason that this might be caused by the relatively small sample size or by the varied treatments of the patients enrolled in this study. This observation requires further investigation with a larger sample size.
There were several limitations in the current study. First, subclinical ILD is a frequent extra-articular manifestation of RA; we cannot rule out such patients, mainly because of the lack of a universally accepted evidence-based screening approach. Secondly, although internal validation of the model yielded optimal discrimination and excellent calibration, bootstrapping is a sample-reuse method, and the predictive performance of the nomogram still requires external validation using additional datasets, particularly from other research centers, to ensure external applicability. Thirdly, this nomogram model was derived from a cross-sectional cohort study and is only able to identify the presence of ILD in RA patients. Blood samples collected before the appearance of ILD are needed to predict the development of ILD in RA patients. Finally, further follow-up data collection, survival data, and some well-recognized molecular factors could improve this model for future use. The above limitations may partly account for the differences between our study and others. Therefore, the findings presented in this report should be confirmed in subsequent prospective studies. Collectively, despite these potential shortcomings of the nomogram model, we clearly demonstrate that a combination of conventional risk factors (male sex, current smoking, RF, and CRP) and MMP-3 is strongly associated with the presence of ILD in RA patients. Our findings may facilitate earlier identification of the spectrum of RA-ILD, potentially leading to improvement in clinical outcomes.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee for the Conduct of Human Research, General Hospital of Ningxia Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
JX and JW collected the clinical data. JX, WH, and SW performed the serological analysis. SC collected the plasma samples. JX analyzed the data and drafted the manuscript. XL designed the experiments and revised the manuscript. SC and XL analyzed and interpreted the data. All authors contributed to the article and approved the submitted version.