Comprehensive analysis of an endoplasmic reticulum stress-related gene prediction model and immune infiltration in idiopathic pulmonary fibrosis

Background Idiopathic pulmonary fibrosis (IPF) is a chronic progressive interstitial lung disease. This study aimed to investigate the involvement of endoplasmic reticulum stress (ERS) in IPF and explore its correlation with immune infiltration. Methods ERS-related differentially expressed genes (ERSRDEGs) were identified by intersecting differentially expressed genes (DEGs) from three Gene Expression Omnibus datasets with ERS-related gene sets. Gene Set Variation Analysis and Gene Ontology were used to explore the potential biological mechanisms underlying ERS. A nomogram was developed using the risk signature derived from the ERSRDEGs to perform risk assessment. The diagnostic value of the risk signature was evaluated using receiver operating characteristics, calibration, and decision curve analyses. The ERS score of patients with IPF was measured using a single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm. Subsequently, a prognostic model based on the ERS scores was established. The proportion of immune cell infiltration was assessed using the ssGSEA and CIBERSORT algorithms. Finally, the expression of ERSRDEGs was validated in vivo and in vitro via RT-qPCR. Results This study developed an 8-ERSRDEGs signature. Based on the expression of these genes, we constructed a diagnostic nomogram model in which agouti-related neuropeptide had a significantly greater impact on the model. The area under the curve values for the predictive value of the ERSRDEGs signature were 0.975 and 1.000 for GSE70866 and GSE110147, respectively. We developed a prognostic model based on the ERS scores of patients with IPF. Furthermore, we classified patients with IPF into two subtypes based on their signatures. The RT-qPCR validation results supported the reliability of most of our conclusions. Conclusion We developed and verified a risk model using eight ERSRDEGs. These eight genes can potentially affect the progression of IPF by regulating ERS and immune responses.


Introduction
Idiopathic pulmonary fibrosis (IPF) is a chronic and progressive interstitial lung disease that has a high mortality rate and limited treatment options (1).The average survival time after diagnosis is only 2-3 years (2).Although two Food and Drug Administrationapproved drugs, pirfenidone and nintedanib, can slow the decline in lung function, they do not halt disease progression or reduce mortality (3).Currently, IPF is diagnosed by identifying a specific pattern of lung damage known as usual interstitial pneumonia using high-resolution computed tomography or lung biopsy (4).Despite advancements in imaging technology and disease classification systems, diagnosis remains challenging.Certain serum biomarkers, such as surfactant protein A/D, matrix metallopeptidase-7, periostin, osteopontin, and chemokine ligand 18, have shown predictive value for diagnosis, prognosis, and treatment response (5,6).However, the heterogeneity of IPF and the involvement of various pathophysiological factors in its progression make its diagnosis and treatment difficult.Therefore, there is a pressing need for more accurate, noninvasive, and feasible markers that could aid in the diagnosis and prognosis of IPF, allowing for more precise and personalized treatment approaches for patients with this condition.
The endoplasmic reticulum (ER) is an essential organelle that maintains proteostasis, or the balance between protein production and degradation within cells (7).ER stress (ERS) is a cellular response that occurs when misfolded proteins accumulate in the ER and disrupt protein homeostasis (8).Several factors, such as the expression of mutant proteins, oxidative stress, impaired autophagy, mechanical stretch, hypoxia, and aging, can induce ERS in IPF (9).Emerging evidence suggests that ERS plays a critical role in IPF by regulating the apoptosis and senescence of alveolar epithelial cells (AECs) (10), promoting epithelial-mesenchymal transition (11) and facilitating myofibroblast differentiation (12).Specifically, a fibroblast-enriched ER protein called TXNDC5 promotes pulmonary fibrosis by enhancing transforming growth factor-b (TGF-b) signaling through stabilizing TGFBR1 (13).However, the exact relationship between ERS and IPF has not been fully elucidated.
Recently, there has been growing interest in the role of immune infiltration in cancer research.However, it is important to note that immune dysregulation also plays a crucial role in the development of IPF.Various immune cells are involved in the pathogenesis of pulmonary fibrosis.When AECs are continuously injured, alveolar macrophages release profibrotic cytokines and chemokines, such as CCL18, CHI3L1, MMPs, Wnt, and TGF-b (14).These substances activate fibroblasts, which differentiate into myofibroblasts.Myofibroblasts produce an extracellular matrix (ECM), leading to the thickening of the pulmonary interstitium.Regulatory T cells inhibit Th1 cell activation, which alters the Th1/Th2 balance in the lungs.Th2 cells produce IL-4 and IL-13, which promote the polarization of alveolar macrophages into a profibrotic M2 phenotype and differentiation of circulating fibrocytes into fibroblasts (15).Studies have shown that ERS regulates the polarization of M2 macrophages in the lungs (16,17).However, the precise mechanism underlying this relationship is poorly understood.Furthermore, whether ERS is involved in interactions between other immune cells in IPF is unclear.
This study successfully established a connection between ERS and immune infiltration in the context of IPF through a comprehensive set of methodical bioinformatics analyses.Furthermore, a diagnostic and prognostic prediction model and a subclass classification system for patients with IPF were developed based on genes associated with ERS.This study contributes substantially to advancing knowledge regarding the intricate relationship s among ERS, immune infi ltration , and IPF pathogenesis.
The GSE28042 dataset consists of 19 healthy subjects and 75 IPF peripheral blood mononuclear cell (PBMC) specimens.These specimens were analyzed using the Agilent-014850 Whole Human Genome Microarray 4×44K G4112F on the GPL6480 platform.
The GSE110147 dataset included 11 healthy, 15 nonspecific interstitial pneumonia, and 22 IPF fresh-frozen lung tissue samples.The samples were analyzed using the Affymetrix Human Gene 1.0 ST Array on the GPL6244 platform.Fifteen nonspecific interstitial pneumonia samples were excluded from further analysis.The GSE70866, GSE28042, and GSE110147 datasets were used as training sets.The details of these datasets are listed in Table 1.
The GSE24206 dataset comprised six healthy donors and 17 IPF lung tissue specimens.These specimens were analyzed using the Affymetrix Human Genome U133 Plus 2.0 Array on the GPL570 platform.The GSE93606 dataset included 20 healthy controls and 154 IPF BALF and PBMC samples.The samples were analyzed using the Affymetrix Human Gene 1.1 ST Array on the GPL11532 platform.The GSE24206 and GSE93606 datasets were selected as external validation sets.Details of these datasets are provided in Supplementary Table S1.

Screening of ERS-related differentially expressed genes
The raw microarray data from three datasets, GSE70866, GSE28042, and GSE110147, were subjected to preprocessing for quality control using the "limma" R package (version 3.56.2) (28).This involved background adjustments and normalization.The resulting expression matrices before and after normalization were visualized using boxplots.The "limma" R package (version 3.56.2) was also employed to identify DEGs between the IPF patient and control groups.The threshold values for significance were set at an adjusted p value (P.adj)< 0.05 and |log2 fold change| > 0.5 in all three datasets.The results were visualized using the "ggplot2" and "pheatmap" R packages (version 1.0.12).Common DEGs (co-DEGs) were identified by intersecting the DEGs from the three datasets.Subsequently, the co-DEGs were further intersected with ERSRGs to determine the ERSRDEGs.Co-DEGs and ERSRDEGs were identified using an online program, and Venn diagrams were generated to illustrate overlapping genes.

Functional annotation of ERSRDEGs and IPF datasets
Gene Ontology (GO) analysis was conducted to assess the biological functions of ERSRDEGs.The analyses included the evaluation of biological processes (BP), molecular functions (MF), and cellular components (CC).The R package "ClusterProfiler" (version 4.8.3)(29) was utilized for this analysis.A significance threshold of p.adj< 0.05 and false discovery rate (FDR)< 0.05 were applied to determine significant enrichment.
To further evaluate the relationship between genes in a predefined gene set and a specific phenotype, gene set enrichment analysis (GSEA) (30) was employed.This analysis was performed on datasets related to IPF.The gene set "c2.cp.all.v2022.1.Hs.symbols.gmt"from the MSigDB database (25) was used for GSEA, and only terms with p.adj< 0.05 and FDR< 0.05 were considered significant.

Weighted gene coexpression network analysis
A weighted gene coexpression network analysis (WGCNA) (31) was conducted on the DEGs between individuals with IPF and control individuals in the GSE70866 dataset.The analysis employed the WGCNA R package (32) with the RsquaredCut parameter set to 0.90, the minimum number of module genes set to 100, and the module merge cut height set to 0.2.This package constructs scalefree coexpression networks for clinical phenotypes.Hierarchical clustering analysis was employed to filter the discrete cases.Subsequently, an appropriate soft power was selected to construct a weighted adjacency matrix, which was then transformed into a topological overlap matrix (TOM).The TOM contained module assignments labeled with color and module features (ME).

GSE70866
GSE28042 GSE110147 Pearson's correlation coefficients were calculated to evaluate the relationship between ME and the clinical features.Finally, the genes in the most significant module associated with IPF were intersected with ERSRDEGs for further investigation.

Co-ERSRDEG-associated diagnostic model construction
In this study, we initially developed a Support Vector Machine (SVM) model using the SVM algorithm (33) to identify ERSRDEGs associated with IPF.The number of ERSRDEGs was selected based on accuracy and error rate.Additionally, we employed the random forest (34) (RF) algorithm, which utilizes bootstrap aggregation and randomization of predictors to further screen candidate ERSRDEGs for IPF diagnosis.The RF model was implemented using the "randomForest" package (version 4.7-1.1)(35) in R, with the following parameters: ntrees = "1000" and "set.seed(234)".The number of trees and error value of the 10-fold cross-validation were plotted on the X and Y axes.Subsequently, a binary logistic regression analysis was conducted to investigate the impact of the selected ERSRDEGs on IPF.The logistic regression model was visualized using the Forest Plot.To prevent overfitting, we employed the least absolute shrinkage and selection operator (36) (LASSO) regression algorithm to identify the ERSRDEGs with the highest predictive value for IPF in the logistic regression model.The "glmnet" (37) package (version 4.1-8) in R was utilized for this analysis.The ERSRDEGs and their coefficients were determined using the best penalty parameter l, which was associated with the smallest 10-fold cross-validation error.The results of the LASSO regression analysis were presented using a diagnostic model and variable locus diagrams.Finally, the identified genes were used to construct an optimal risk signature, which was determined by a linear combination of their expression levels, weighted with the regression coefficients from the LASSO analysis.The risk score for each sample was calculated as follows: Finally, the ERSRDEGs identified by the SVM, RF, and Logistic-LASSO models were compared using a Venn diagram to identify overlapping ERSRDEGs.A nomogram (38) was constructed using the "rms" package (version 6.7-1) in R based on the co-ERSRDEGs to visualize the diagnostic model for IPF.The model accuracy was assessed by evaluating its predictive value using a calibration curve.Furthermore, the clinical impact of the model's judgment on patients with IPF was quantified using decision curve analysis (DCA) (39) by plotting a clinical impact curve.

Validation of expression of co-ERSRDEGs and effect of the diagnostic prediction model
The expression matrix of co-ERSRDEGs was obtained from four datasets: GSE70866, GSE28042, GSE110147, and GSE24206.The expression differences between IPF and healthy samples were assessed using the Wilcoxon rank sum test and visualized using the "ggplot2" package.Statistical significance was set at p< 0.05.To evaluate the accuracy of the model's predictions, receiver operating characteristic (ROC) (40) curves and corresponding area under the curve (AUC) values were calculated using the "survivalROC" R package.The diagnostic model based on co-ERSRDEGs was tested in the training sets GSE70866 and GSE110147, as well as the validation sets GSE24206 and GSE93606.

Spearman correlation analysis of co-ERSRDEGs
In this study, Spearman correlation analysis was conducted to examine the expression levels of co-ERSRDEGs in the GSE70866 dataset.The analysis was performed using the "limma" package (28), with a significance threshold of |R| > 0.2 and p< 0.05.Scatter plots were generated to visualize the results using the "ggplot2," "ggpubr," and "ggExtra" packages.

Construction of ERS score prognostic model
Initially, single-sample GSEA (ssGSEA) was employed to quantify the ERS phenotype scores of all samples in the GSE70866 dataset.This was performed by utilizing the expression matrix of co-ERSRDEGs and the "GSVA" package (41).Subsequently, the patients with IPF were divided into highand low-risk groups based on the median ERS score obtained from the GSE70866 dataset.The overall survival (OS) of these two groups was compared via Kaplan-Meier analysis and the log-rank test.To assess the prognostic value of the risk model, ROC analysis, time-dependent ROC curve analysis, and calculation of the AUC values were conducted.These evaluations were performed to determine the accuracy of the model for predicting patient outcomes.

Classification of IPF subtypes
The "ConsensusClusterPlus" (35) package in R was employed to cluster the patients with IPF in the GSE70866 dataset.This clustering was based on the expression of co-ERSRDEGs between patients with IPF and controls to identify the molecular subtypes of IPF.The analysis was conducted using the following parameters: maxK = 8, reps = 50, pItem = 0.8, pFeature = 1, clusterAlg = "pam," and distance = "spearman."The outputs included consensus cumulative distribution function (CDF) plots and the relative change in the area under the CDF curve.Principal component analysis (PCA) was conducted to further validate gene expression patterns in the identified clusters.Additionally, Kaplan-Meier survival analysis using the survival package was employed to assess the differences in OS between the various subtypes of IPF.

Immune infiltration and correlation between co-ERSRDEGs and immune cells
This study divided IPF cases in the GSE70866 dataset into two groups based on their risk scores obtained from the LASSO regression analysis.Similarly, patients were divided into high and low ERS score groups based on their ERS phenotype scores.The relative abundances of 28 immune cell types in patients with IPF were quantified using the ssGSEA algorithm (42,43).The differences in immune cell abundance between the risk score groups, ERS score groups, and IPF subtypes were analyzed using the Mann-Whitney U test and presented using boxplots.The correlation between the abundance of immune cell infiltration in different groups and subtypes was assessed using Pearson's correlation analysis and visualized using a correlation matrix plot created with the "ggplot2" R package.Pearson's correlation analysis was used to examine the relationship between the abundance of immune cell infiltration and expression of co-ERSRDEGs.The results were displayed using a correlation dot plot generated with the "ggplot2" R package.Additionally, the CIBERSORT algorithm (44) was used to calculate the infiltration fraction of 22 immune cell types in the risk score groups, ERS score groups, and IPF subtypes.The results were presented as histograms.Differences in immune cell abundance between different groups were analyzed using the Wilcoxon rank-sum test and are displayed using boxplots.The correlation between the abundance of immune cell infiltration and co-ERSRDEG expression was assessed using Pearson's correlation analysis and visualized using a correlation matrix plot created with the "ggplot2" R package.

Preliminary validation of expression of co-ERSRDEGs in ERS A549 cell model and embryonic mouse fibroblasts 3T3 cell model
A549 and embryonic mouse fibroblast 3T3 cell lines were obtained from the American Type Culture Collection (ATCC).The cells were maintained in specific media; DMEM and Ham's F-12K were used for embryonic mouse fibroblast 3T3 and A549 cells respectively.The media were supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin.The cells were cultured at 37°C with 5% v/v CO2.The study consisted of control and treatment groups.The treatment group was exposed to 1 mg/mL tunicamycin (MCE, NJ, USA) for 24 h in embryonic mouse fibroblast 3T3 cells and 4 mg/mL tunicamycin for 48 hours in A549 cells.RNA was then extracted from the cells for RT-qPCR.The relative gene expression was determined using the Equation 2 ^-DDct.Primer sequences are listed in Supplementary Table S3.

Validation of expression of co-ERSRDEGs in bleomycin-induced pulmonary fibrosis mouse model
Male C57BL/6 mice aged 6-9 weeks, with an average weight of 20-25 g, were obtained from the Laboratory Animal Center of Guizhou Medical University.The study protocol was reviewed and approved by the Animal Ethics Committee of the institution.To induce fibrotic changes, the experimental mice received 50 mL of bleomycin (5 mg/kg) via intratracheal administration, whereas the control mice were administered an equal volume of phosphate buffer (PBS).After 21 days, lung tissue samples were collected.A portion of the sample was used for subsequent RT-qPCR.In contrast, the remaining portion was subjected to hematoxylineosin (HE) and Masson staining following the instructions in the staining kit.

Statistical analysis
Statistical analysis was performed using R software (version 4.2.2) and the corresponding packages.All data were expressed as the mean ± standard deviation (SD).Gene expression in the two groups was compared using Student's t-test, Wilcoxon test, or Mann-Whitney U test, where appropriate.Correlation analyses were performed using Spearman's or Pearson's correlation analyses.GraphPad software (version 8.0) was used to visualize statistical results.Two-tailed p<0.05 was considered significant.
A flowchart representing the overall concepts and procedures employed in this study is shown in Figure 1.

Identification of DEGs and ERSRDEGs between IPF and control
To ensure the comparability of gene expression data across samples, the gene expression levels of the GSE70866, GSE28042, and GSE110147 datasets were normalized, and batch effects were subsequently eliminated (Figures 2A-F).Furthermore, the "limma" package was used to examine DEGs in patients with IPF and healthy controls using p-value<0.05and |log2FC|>0.5 as thresholds.A total of 1,256 DEGs were identified in the GSE70866 cohort of the GEO database.Among these genes, 499 were upregulated, and 757 were downregulated.Similarly, the GSE28042 dataset revealed 1,294 DEGs, of which 608 were upregulated and 686 were downregulated.Additionally, the GSE110147 dataset contained 8,139 DEGs, with 4,211 upregulated and 3,928 downregulated genes.Figures 3A-C depict the expression patterns of DEGs using heatmap visualization.DEGs from three datasets, GSE70866, GSE28042, and GSE110147, were compared, and co-DEGs were identified.A total of 90 co-DEGs were shared among these datasets, as illustrated in a Venn diagram (Figure 3D).A collection of 2,269 ERSRGs was acquired from the GeneCards database and PubMed.These genes intersected with the co-DEGs, resulting in 13 overlapping ERSRDEGs, which were further examined.Figure 3E shows the 13 ERSRDEGs (ADM, AGRP, BIRC3, CDA, FAM20C, IER3, MT1E, NELL2, PDGFA, RAI14, SNCA, SOCS3, and ZNF91) in a Venn diagram.The expression patterns of these genes in the GSE70866, GSE28042, and GSE110147 datasets are represented using heat maps, as shown in Figures 3, F-H, respectively.

Functional annotation of ERSRDEGs
To gain further insight into the biological function of the 13 ERSRDEGs (ADM, AGRP, BIRC3, CDA, FAM20C, IER3, MT1E, NELL2, PDGFA, RAI14, SNCA, SOCS3, and ZNF91) in IPF, we performed GO enrichment analyses (Table 2).GO analysis encompassed three categories: BP, CC, and MF.In terms of BP, ERSRDEGs were predominantly enriched in negative regulation of phosphate metabolism, morphogenesis of a branching epithelium, response to insulin, response to copper ion, and negative regulation of the G protein-coupled receptor signaling pathway (Figures 4A, B).With regard to CC, the top two enriched were platelet alpha granule and Golgi lumen (Figures 4A, C).In the context of MF, ERSRDEGs were closely associated with enzyme inhibitor activity, receptor-ligand activity, signaling receptor activator activity, hormone activity, and G proteincoupled receptor binding (Figures 4A, D).Furthermore, a combined GO and log FC enrichment analysis revealed that ERSRDEGs were primarily enriched in BP pathways, as visualized in a bubble diagram (Figure 4E).

GSEA of IPF datasets
GSEA was performed to examine the relationship between gene expression and the BP, CC, and MF implicated in patients with IPF compared with healthy controls.The GSE70866, GSE28042, and GSE110147 datasets were used for this analysis.To identify significant enrichment, the enrichment screening criteria were set at a p.adj< 0.05 and FDR value (q.Vue)< 0.05.Results showed a significant concentration of genes associated with various pathways in patients with IPF compared with the control group.Specifically, in the GSE70866 dataset, the genes were concentrated in pathways such as surfactant metabolism, ECM receptor interaction, lung fibrosis, and cytokinecytokine receptor interaction.These results are shown in Figures 5B-E and Table 3, as well as in the mountain maps in Figure 5A.Similarly, in the GSE28042 dataset, genes were significantly enriched in pathways, including interlerkin-10 signaling, oxidative stress response, neuroinflammation, and diseases of programmed cell death.These findings are shown in Figures 5G-J and Table 4, along with mountain maps in Figure 5F.Furthermore, in the GSE110147 dataset, genes were notably enriched in pathways, such as the diseases of DNA repair, cell cycle checkpoints, apoptotic execution phase, and G2 M checkpoints.These results are presented in Figures 6B-E and Table 5, as well as in the mountain maps in Figure 6A.

Analysis of the GSE70866 dataset using WGCNA
The WGCNA algorithm was used to construct the coexpression modules in the GSE70866 dataset.Initially, genes exhibiting variance Flowchart of data analysis.IPF, Idiopathic Pulmonary Fibrosis; WGCNA, Weighted gene coexpression network analysis; DEGs, differentially expressed genes; ERSRDEGs, endoplasmic reticulum stress-related DEGs; SVM, support vector machine; LASSO, least absolute shrinkage and selection operator; GSEA, gene set enrichment analysis; GO, gene ontology; ERS score, endoplasmic reticulum stress scores; ssGSEA, single-sample gene set enrichment analysis.
in the top 20% of all genes were selected as input genes.Subsequently, a hierarchical clustering analysis was employed to filter out discrete cases (Figure 7A).Furthermore, a soft threshold power of 5 was set as the key parameter to ensure overall connectivity of the coexpression module (Figure 7B).Subsequently, 13 modules were identified based on the optimal soft threshold capability, as illustrated in the cluster dendrogram (Figure 7C).The module merge cut height was then set to 0.2 (Figure 7C), resulting in the final acquisition of 13 coexpression modules with the gene clusters color-coded, as displayed in Figure 7D.The correlation between module membership and IPF samples is shown (Figure 7E).Notably, the blue module exhibited the most significant correlation with IPF (|COR|=0.36,P=2e-05) (Figure 7E), and its characteristic genes with the highest correlation intersected with the 13 ERSRDEGs.The Venn diagram in Figure 7F revealed that two genes (ADM and IER3) were common to both sets.The expression levels of two module ERSRDEGs, ADM and IER3, were analyzed using the Wilcoxon rank-sum test in the GSE70866 dataset.The results indicated that both genes were highly expressed in patients with IPF compared with normal controls (p< 0.001) (Figure 7G).Moreover, a significant positive correlation between the expression of the two module ERSRDEGs was observed using the Spearman algorithm (r = 0.848, p< 0.001) (Figure 7H).

Construction of ERSRDEGs-related diagnostic prediction model
A binary logistic regression analysis was conducted to evaluate the diagnostic significance of the 13 DEGs (ERSRDEGs) in the GSE70866 dataset for IPF.Subsequently, a logistic regression model was constructed, and the results were visually represented using forest plots (Figure 8A).Of the 13 genes, nine had odds ratios (OR) greater than 1, and four had OR less than 1, with statistical significance at p-value<0.05.To further narrow down the ERSRDEGs, SVM and RF models were successively developed.The SVM model was used to determine the optimal number of genes yielding the lowest error and highest accuracy rates.The results showed that the SVM model achieved the highest accuracy when all 13 genes were used (Figures 8B, C).
In addition, RF analysis was used to rank and screen the most important diagnostic markers based on the expression levels of ERSRDEGs in the GSE70866 dataset.ADM, AGRP, BIRC3, CDA, FAM20C, IER3, MT1E, NELL2, PDGFA, SNCA, and ZNF91 were the top 11 genes identified (Figures 8D, E).Furthermore, LASSO regression was used to identify 13 ERSRDEGs based on a previous logistic regression model with the highest predictive value for IPF (Figure 8F, G).Among these genes, AGRP, BIRC3, CDA, FAM20C, MT1E, NELL2, SNCA, and ZNF91 were of high importance; thus, an 8-gene signature consisting of these ERSRDEGs was constructed.The risk score was calculated as follows: To identify more reliable diagnostic markers for ERS-related conditions, we conducted an analysis to obtain co-ERSRDEGs.The results from three models, Logistic-LASSO regression, SVM, and RF, were combined.Eight co-ERSRDEGs were identified, which were also the eight ERSRDEGs obtained from the Logistic-LASSO regression model (Figure 8H).A diagnostic nomogram model was then established based on the expression of these genes, and the expression of AGRP was found to have a significantly higher effect on the diagnostic model than the other variables (Figure 8J).The accuracy of the diagnostic model was assessed using a ROC curve, which showed an AUC of 0.975 (95% CI: 0.950-0.999)for the GSE70866 dataset, indicating high prediction accuracy (Figure 8I).The calibration curve demonstrated that the model had good predictive performance (Figure 8K).Additionally, DCA was conducted to evaluate the relationship between the nomogram and gene score in predicting the benefits and risks of different cutoff points in the prevalence model for IPF.The results showed that the ERSRDEG-related diagnostic model was more beneficial at various threshold probabilities (Figure 8L).

Validation of the effect of the diagnostic prediction model and the expression of co-ERSRDEGs
To assess the validity of the ERSRDEG-related diagnostic model based on the GSE70866 dataset, we evaluated the expression of eight co-ERSRDEGs in the GSE110147 dataset.Furthermore, the coefficients of these genes in the diagnostic model were used to compute the risk score for each sample in the GSE110147 dataset.Using the RiskScore and grouping information available in the GSE110147 dataset, we constructed ROC curves and observed an AUC of 1.000, indicating a high level of prediction accuracy (Figure 9A).To further validate our prediction method, we applied it to external datasets, specifically GSE24206 and GSE93606.The diagnostic model demonstrated a high level of prediction accuracy in the GSE24206 dataset (AUC=0.902)and relatively accurate prediction in the GSE93606 dataset (AUC=0.723)(Figures 9B, C).In summary, these findings suggest that our diagnostic prediction model holds significant value for BALF, PBMC, and lung tissues.
The Wilcoxon rank-sum test was used to investigate variations in the expression levels of the eight co-ERSRDEGs between patients with IPF and healthy individuals.This analysis included three training datasets (GSE70866, GSE28042, and GSE110147) and an external validation dataset (GSE28042).The results indicated that within the training set, the expression of co-ERSRDEGs was significantly different (p< 0.01) (Figures 9D, E, H).In the GSE24206 dataset, CDA, MT1E, NELL2, and ZNF91 expression levels also exhibited significant differences (p< 0.05).However, differences in the expression of AGRP, BIRC3, FAM20C, and SNCA were not significant (Figure 9G).To evaluate the diagnostic impact of the differences in the expression levels of co-ERSRDEGs on IPF, ROC curves were constructed within the GSE70866 dataset.ROC analysis revealed that, except for NELL2, all co-ERSRDEGs had AUC values exceeding 0.750 (Figures 9H-O).

Correlation analysis of expression of co-ERSRDEGs
We employed the "RCircos" package (version 1.2.2) to annotate the chromosomal positions of eight co-ERSRDEGs.Results showed that these genes were predominantly located on chromosomes 1, 4, 11, 16, and 19, with chromosome 16 exhibiting the highest  10A).This suggests a close genomic relationship between these co-ERSRDEGs, which are in close proximity to each other on the chromosomes.
To further investigate the correlation between these eight co-ERSRDEGs in the GSE70866 dataset, we utilized the Spearman algorithm to perform a correlation analysis of their expression levels (Figure 10B).The results were visualized using a correlation heatmap, demonstrating that most correlations between these genes were significant (p<0.05),encompassing positive and negative associations.Subsequently, we selected the six gene pairs with the most significant positive or negative correlations and depicted them using correlation scatter plots.Notably, CDA exhibited the most significant positive correlation with FAM20C, BIRC3 with ZNF91, and FAM20C with SNCA (Figures 10C-E).Conversely, the most significant negative correlations were observed between CDA and ZNF91, AGRP and CDA, and AGRP and FAM20C (Figures 10F-H).

Immune infiltration in IPF dataset GSE70866 and correlation between co-ERSRDEGs and immune cells among different risk score groups
After identifying the group of patients with IPF from the GSE70866 dataset, they were further divided into high-and lowrisk groups based on the median risk score obtained from the ERSRDEG-related diagnosis model.The ssGSEA and CIBERSORT algorithms were used to assess the differences in immune cell infiltration levels between the high-and low-risk-score groups.
Initially, the ssGSEA algorithm was employed to calculate the abundances of 28 different types of immune cells in both groups.The Mann-Whitney U test was used to analyze differences in infiltration between the two groups.The results indicated that 11 immune cell types showed significant differences in infiltration abundance between the high-and low-risk groups (p<0.05) Results of GO enrichment analysis using ERSRDEGs.both groups.The results showed a correlation between the abundance of immune cell infiltration and expression levels of co-ERSRDEGs.In the low-risk group, most immune cells showed a positive association with the expression levels of co-ERSRDEGs, except for AGRP and MT1E (Figure 11D).In the high-risk group, most immune cells exhibited a positive correlation with the expression levels of co-ERSRDEGs, whereas only AGRP and CDA expression levels negatively correlated with the abundance of immune cell infiltration (Figure 11E).
The CIBERSORT algorithm was used to calculate the infiltration abundance of 22 different types of immune cells in both groups.The results indicated that the infiltrating abundance of these immune cells was not uniformly zero, with macrophages M0 and M1 showing a substantial proportion of infiltration abundance across various samples (Figure 12A).The Wilcoxon rank-sum test was employed to examine the differences in the infiltration abundance of the 22 immune cell types between the high-and low-risk groups.The results revealed that the eight immune cell types exhibited significant differences between the two groups (p<0.05)(Figure 12B).These eight immune cell types include activated dendritic cells, resting dendritic cells, macrophages M1, activated mast cells, resting mast cells, monocytes, resting memory CD4 T cells, and naïve CD4 T cells.Furthermore, the Pearson algorithm was used to calculate the correlation between the abundance of these eight immune cell infiltrations and the expression levels of the eight co-ERSRDEGs in both groups.The results demonstrated a significant correlation in both groups, with the positive correlation being more pronounced than the negative correlation (Figures 12C, D).

Identification of ERS score and prognostic prediction model
In this study, the ssGSEA algorithm was used to quantify the ERS phenotype scores of all samples in the GSE70866 dataset.This score was based on the expression matrix of eight co-ERSRDEGs.Subsequently, patients with IPF were categorized into two groups, high-ERS and low-ERS, using the median ERS phenotype score as a threshold.To evaluate the prognostic value of the ERS score model in predicting the outcomes of patients with IPF, Kaplan-Meier curves were generated.The findings revealed that patients in the high ERS score group had significantly shorter OS than those in the low ERS score group (hazard ratio [HR] = 1.66, p = 0.034) (Figure 13A).We also employed a ROC curve to demonstrate the efficacy of the ERS score model in predicting IPF prognosis, with an AUC of 1.000 (Figure 13C).Additionally, a time-dependent ROC curve analysis was performed, yielding AUC values of 0.636, 0.676, and 0.867 for the 1-, 3-, and 5-year survival rates, respectively (Figure 13B).Furthermore, ROC curves were constructed for individual co-ERSRDEGs within the GSE70866 dataset to assess their prognostic impact on IPF.The analysis revealed that except AGRP, MT1E, NELL2, and ZNF91, the ERSRDEGs (BIRC3, CDA, FAM20C, and SNCA) exhibited AUC values exceeding 0.750 (Figures 13D-K).Moreover, the expression levels of certain ERSRDEGs, including BIRC3, CDA, FAM20C, NELL2, and SNCA, were higher in the high ERS score group than in the low ERS score group (Figure 13L).These findings suggest that the ERS score model based on co-ERSRDEGs is a promising prognostic tool for patients with IPF.Enrichment plots from GSEA in GSE110147 dataset.

Immune infiltration in IPF dataset GSE70866 and correlation between co-ERSRDEGs and immune cells among different ERS score groups
Two algorithms, ssGSEA and CIBERSORT, were utilized to investigate the variations in the levels of immune cell infiltration between the high and low ERS score groups.
Initially, the ssGSEA algorithm was used to determine the abundance of 28 distinct types of immune cells in both groups.Subsequently, the Mann-Whitney U test was used to analyze the disparities in infiltration between the two groups.The findings revealed that 20 immune cell types exhibited significant differences in infiltration abundance between the high and low ERS score groups (p<0.05)(Supplementary Figure 1A).These immune cells included activated B cells, activated CD4 T cells, activated CD8 T cells, activated dendritic cells, CD56bright NK cells, CD56dim NK cells, central memory CD4 T cells, effector memory CD8 T cells, eosinophils, gamma delta T cells, immature B cells, macrophages, MDSC, monocytes, NK cells, NK T cells, neutrophils, plasmacytoid dendritic cells, regulatory T cells, T follicular helper cells, and type 1 T helper cells.Furthermore, the Pearson algorithm was employed to examine the correlation between the infiltration abundances of these 20 immune cell types in both groups.The results indicated predominantly positive correlations between the infiltration abundances of these immune cells (Figures 14B, C).Notably, the highest positive correlation was observed between activated CD8 T cells and effector memory CD8 + T cells in both groups.Additionally, the Pearson algorithm was used to analyze the correlation between the abundance of immune cell infiltration and the expression levels of eight common ERSRDEGs in both groups.In the low ERS score group, most immune cells exhibited a positive association with the expression levels of co-ERSRDEGs, except for AGRP (Supplementary Figure 1D).In the high ERS score group, most immune cells positively correlated with the expression levels of co-ERSRDEGs (Supplementary Figure 1E).
Concurrently, the CIBERSORT algorithm was used to calculate the infiltration abundance of 22 different types of immune cells in both groups.The results indicated that the infiltrating abundance of these immune cells was not uniformly zero, with macrophages M0 and M1 exhibiting a substantial proportion of infiltration abundance across various samples (Supplementary Figure 2A).The Wilcoxon rank-sum test was used to examine the differences in the infiltration abundance of the 22 immune cell types between the high and low ERS score groups.The results revealed that the nine immune cell types exhibited significant differences between the two groups (p<0.05)(Supplementary Figure 2B).These nine immune cell types include resting dendritic cells, macrophages M0, activated mast cells, resting mast cells, monocytes, activated NK cells, resting NK cells, resting memory CD4 T cells, and gamma delta T cells.The Pearson algorithm was used to calculate the correlation between the abundance of these nine immune cell infiltrations and the expression levels of the eight co-ERSRDEGs in both groups.The results showed that there was a relatively stronger negative correlation between immune cells and co-ERSRDEGs in the low ERS score group, whereas there was a stronger positive correlation between immune cells and co-ERSRDEGs in the high ERS score group (Supplementary Figure 2C, D).

Classification of IPF subtypes in IPF dataset GSE70866
To illustrate the ERS-related patterns of IPF, unsupervised cluster analysis was conducted on 112 IPF samples from the GSE70866 dataset.This analysis utilized the "ConsensusClusterPlus" R package and focused on the expression patterns of eight co-ERSRDEGs.In the consistency matrix of cluster 2, the intragroup correlation was higher, and the intergroup correlation was low (Figure 14A).Compared with clusters 2-8, the growth rate in cluster 2 was flat in the CDF plot (Figure 14B).Furthermore, Figure 14C shows a significant increase in the relative change in the area under the CDF curve from k = 2 to k = 5.Based on these findings, the 112 IPF samples were divided into two subtypes, cluster 1 (n = 71) and cluster 2 (n = 41), using PCA (Figure 14D).Kaplan-Meier curves were generated to evaluate the prognostic implications of disease classification.The analysis revealed that patients in cluster 2 had a significantly shorter OS than those in cluster 1 (p=0.014)(Figure 14E).Further analysis revealed significant differences in the gene expression patterns between the two subtypes (Figure 14F).We then intersected the 2,633 genes from the difference analysis with the eight co-ERSRDEG to obtain five DEGs: BIRC3, CDA, FAM20C, NELL2, and ZNF91(Figures 14G).To better understand the molecular characteristics that distinguish these subtypes, the expression levels of the five DEGs were evaluated.The results indicated that CDA, FAM20C, and NELL2 were significantly upregulated in cluster 2, whereas BIRC3 and ZNF91 were significantly upregulated in cluster 1 (Figure 14H).

Immune infiltration in IPF dataset GSE70866 and correlation between co-ERSRDEGs and immune cells among different IPF subtype groups
To investigate the disparities in the levels of immune cell infiltration between the two subtypes of IPF, two algorithms, ssGSEA and CIBERSORT, were utilized.Initially, the ssGSEA algorithm was used to determine the abundance of 28 distinct immune cell types in both IPF subtypes.Subsequently, the Mann-Whitney U test was used to analyze the differences in infiltration between the two subtypes.The results revealed that the five immune cell types exhibited significant differences in infiltration abundance between the two IPF subtypes (p<0.05)(Supplementary Figure 3A).These immune cell types included activated CD4 T cells, effector memory CD4 + T cells, immature B cells, memory B cells, and type 2 T helper cells.Furthermore, the Pearson algorithm was used to examine the correlation between the infiltration abundance of these five immune cell types in the two subtypes.The findings indicated predominantly positive correlations between the infiltration abundances of these immune cells (Supplementary Figure 3B, C).Notably, cluster 1 displayed the highest positive correlation between activated CD4 T cells and type 2 T helper cells, whereas cluster 2 demonstrated the highest positive correlation between effector memory CD4 T cells and memory B cells.Additionally, the Pearson algorithm was used to analyze the correlation between the abundance of immune cell infiltration and the expression levels of five DEGs (BIRC3, CDA, FAM20C, NELL2, and ZNF91) in the two subtypes.In cluster 1, most immune cells were negatively associated with the expression levels of the five DEGs (Supplementary Figure 3D).In cluster 2, most immune cells positively correlated with the expression levels of five DEGs, except for CDA (Supplementary Figure 3E).
Concurrently, the CIBERSORT algorithm was used to calculate the infiltration abundance of 22 distinct immune cell types in the two subtypes.The results indicated that the infiltrating abundance of these immune cells was not uniformly zero, with macrophages M0 and M1 exhibiting a substantial proportion of infiltration abundance across various samples (Supplementary Figure 4A).Furthermore, the Pearson algorithm was used to examine the correlation between the infiltration abundance of these 22 immune cell types in the two subtypes.The findings revealed that the number of positively and negatively correlated cell pairs between immune cells was approximately equal between the two subtypes (Supplementary Figures 4B, C).Additionally, the Pearson algorithm was used to calculate the correlation between the abundance of 22 immune cell infiltrations and the expression levels of the five DEGs in the two subtypes.The results demonstrated several positive correlations between immune cells and the five DEGs in the two subtypes (Supplementary Figures 4D, E).

Validation of expression of co-ERSRDEGs in vivo and in vitro
A previous analysis demonstrated a strong correlation between ERS and the diagnosis, prognosis, and classification of IPF.To validate the mRNA expression of the eight co-ERSRDEGs, we established A549 cell and embryonic mouse fibroblast 3T3 cell ERS models using tunicamycin.RT-qPCR results revealed that SNCA, AGRP, ZNF91, FAM20C, and MT1E exhibited high expression levels, whereas CDA, BIRC3, and NELL2 were relatively low in the tunicamycin-treated group compared with the control group in embryonic mouse fibroblast 3T3 cells.This difference was significant, except for MT1E (Figure 15A).Similarly, in the A549 cell line, RT-qPCR results indicated high SNCA, NELL2, ZNF91, MT1E, and FAM20C expression levels.In contrast, CDA, BIRC3, and AGRP levels were relatively lower in the tunicamycin-treated group than in the control group.The difference was significant, except for FAM20C (Figure 15B).
To further validate the findings of the in vitro cell experiments and simulate the pathological process of pulmonary fibrosis in patients, we successfully established a bleomycin-induced pulmonary fibrosis mouse model for in vivo experiments.This model allowed us to investigate ERS events during the development of pulmonary fibrosis.RT-qPCR results demonstrated significantly high expression levels of CDA, BIRC3, SNCA, AGRP, ZNF91, FAM20C, and MT1E in the lung tissues of mice treated with bleomycin.However, results showed that NELL2 was weakly expressed in the bleomycin-induced lung tissues of mice.However, the difference was not significant (Figure 15C).Additionally, histological examination using HE and Masson staining revealed significantly increased lung tissue inflammation and collagen deposition in the bleomycin group compared with the control group (Figure 15D).

Discussion
IPF is a prevalent form of interstitial lung disease characterized by complex and multifaceted pathogenesis and progression (45).The initial presentation of IPF is often characterized by non-specific symptoms, leading to significant delays in diagnosis.The heterogeneity of IPF poses challenges to the effectiveness of treatment strategies, resulting in a poor prognosis for affected individuals (46).Furthermore, the prediction of the course and prognosis of IPF on an individual basis remains challenging.In recent years, numerous studies have been conducted to identify the clinical, imaging, and pathological indicators that could aid in diagnosing, predicting progression, and estimating survival in IPF (3).However, the retrospective nature of these studies and the inherent uncertainty associated with the employed metrics have resulted in varying degrees of accuracy for the identified predictors.Consequently, there is an urgent need to develop appropriate and Immune infiltration analysis evaluated between groups with high-and low-risk scores by the ssGSEA algorithm in GSE70866 dataset.precise diagnostic and prognostic models for IPF to improve clinical practice.ERS has been extensively investigated for its association with IPF.Although its involvement in the pathogenesis and progression of IPF is likely, the exact mechanism remains unclear.Given that IPF is a complex pathophysiological process, this study aimed to explore the significance of ERS in the diagnosis and prognosis of IPF using samples derived from the BALF, PBMC, and lung tissue.Initially, we identified and validated eight ERSRGs to diagnose IPF: AGRP, BIRC3, CDA, FAM20C, MT1E, NELL2, SNCA, and ZNF91.Additionally, we explored the association between ERS and IPF outcomes by calculating an ERS score based on the expression levels of eight co-ERSRDEGs.These genes have been identified as potential protective prognostic factors.Consistent clustering was performed to further predict the regulatory patterns of ERS-related prognosis.This analysis revealed two subgroups with significantly different prognoses, with cluster 2 exhibiting poorer outcomes than cluster 1.Moreover, we discovered a correlation between the infiltration of various immune cell types and ERS in the lung tissue of patients with IPF, as well as differences in immune cell infiltration between normal and IPF lung tissues.Our findings indicated that ERS plays a crucial role in IPF development.Notably, this study represents the first bioinformatics investigation to elucidate the relationship between ERS and IPF using human samples and has laid the foundation for further exploration of ERS in the context of IPF.Therefore, the identification of novel therapeutic targets is imperative.
In this study, diagnostic and prognostic models comprised eight ERS-related genes.AGRP, also known as agouti-related neuropeptide, encodes an antagonist of the melanocortin-3 and melanocortin-4 receptors.Its role in regulating feeding behavior through the melanocortin receptor and/or intracellular calcium regulation suggests its involvement in weight homeostasis (47).Previous studies on body weight homeostasis have shown a correlation between AGRP and ERS and the inflammatory response.However, the specific mechanism of action may be contradictory.Zhou  medications, such as olanzapine, induce ERS in hypothalamic neurons, leading to increased expression of neuropeptide Y and AGRP, autophagy, and resistance to leptin and insulin (48).This ultimately results in the inflammation of the central nervous system (CNS), leading to weight gain.BIRC3, also known as cellular IAP2, is a member of the human inhibitor of apoptosis protein (IAP) family (50).Previous research has demonstrated that BIRC3 is a multifunctional protein that regulates caspases, apoptosis, inflammatory signaling, immunity, mitogenic kinase signaling, and cell proliferation (51).In specific cell types, such as polarized human myeloid leukemia THP-1 cells and primary human macrophages, cIAP1 and cIAP2 are highly expressed in M1 macrophages (52).Studies have indicated that BIRC3 regulates immune-related lung diseases, including asthma and Klebsiella pneumoniae pneumonia.Increased BIRC3 expression may contribute to asthma pathogenesis by influencing eosinophilic and allergic inflammation (53).In a mouse model of infection, blocking the Birc3/TLR4/Myd88 signaling pathway showed a protective effect against carbapenem-resistant K. pneumoniae (54).Additionally, studies have revealed the involvement of ERS and BIRC3 in tumor regulation.SNHG1, a KLF4-regulated lncRNA, suppresses ERS-induced apoptosis and promotes gliomagenesis by increasing BIRC3 expression (55).BIRC3 may also play a role in liver fibrosis by regulating the inflammatory response.Targeting cIAPs, including BIRC3, has been suggested as a potential therapeutic strategy for liver fibrosis by increasing MMP9 expression induced by CCL5 chemotactic neutrophils (56).Notably, recent studies have proposed targeting IAPs as a potential therapy for IPF by promoting apoptosis of mesenchymal cells.Fibroblasts from IPF tissues show increased expression of XIAP, which is also a member of the IAP family and is associated with resistance to apoptosis in lung fibroblasts (57).Furthermore, research has demonstrated that the profibrotic mediator TGF-b1 could enhance the expression of both XIAP and cIAPs in murine mesenchymal cells.Increased expression of XIAP and cIAP1 has also been observed in bleomycin-induced IPF models.Consistently, the IAPs inhibitor AT-406 protected mice from bleomycin-induced lung fibrosis (58).In this study, the AUC value of BIRC3 in the diagnostic and prognostic models of IPF exceeded 0.750.Moreover, there was a positive association between the expression levels of BIRC3 and the extent of immune cell infiltration.These findings suggest that BIRC3 is important in the context of IPF, and further investigation is warranted to elucidate its underlying mechanism.Metallothionein (MT) is a low-molecular-weight protein with a high cysteine content that binds to metals and is present in all eukaryotes.In humans, MT is categorized into four subfamilies: MT1, MT2 (also known as MT2A), MT3, and MT4 (59).MT1 is involved in ERS regulation Zn treatment prevents type 1 diabetesinduced hepatic oxidative damage, ERS, and cell death by upregulating hepatic MT expression (60).MT also protects against ERS-induced cardiac anomalies, possibly by attenuating cardiac autophagy (61).Furthermore, MT1 plays a role in the differentiation and functioning of immune cells.It positively regulates the differentiation of CD4+ T cells into Tregs and negatively regulates the differentiation of CD4+ T cells into Tr1 and Th17 cells (62).MT1 may also regulate lung diseases, such as chronic obstructive pulmonary disease (COPD), by affecting immune responses.Researchers identified a subpopulation of macrophages with high MT expression in patients with advanced COPD (63).Moreover, MT1 regulates pulmonary fibrosis, and its induction attenuates the progression of lung fibrosis in mice exposed to long-term intermittent hypoxia (64).In this study, MT1 expression in BALF and PBMC was significantly higher in patients with IPF than in healthy individuals.Conversely, the expression of MT1 in the lung tissue was lower in patients with IPF than in healthy controls.Furthermore, the AUC value of MT1 in the diagnostic model for IPF was 0.755, indicating its potential as a diagnostic marker for this condition.Interestingly, a negative association was observed between the expression levels of MT1 and extent of immune cell infiltration.This finding is in contrast to previous studies investigating the relationship between MT1 and inflammation, suggesting the need for further investigation.
Alpha-synuclein (SNCA) is a member of the synuclein protein family, which includes a-, b-, and g-synuclein, as well as synoretin (65).It is primarily expressed in regions of the adult CNS associated with synaptic plasticity (66).SNCA and ERS have been implicated in several neurological disorders.The aggregation of SNCA disrupts the ability of neurons to respond to misfolded proteins in the ER.It has been suggested that enhancing multiple proteostatic pathways is therapeutically beneficial in Parkinson's disease (PD) (67).Additionally, SNCA is involved in ERS induced by manganese through the PERK signaling pathway in brain slice cultures (68).SNCA and inflammation also play a role in the development and progression of neurological diseases, including PD. Intracellular translocation of toxic SNCA species could trigger hyperactivity in microglia, activate astrocytes, upregulate the expression of proinflammatory factors, and recruit peripheral immune cells to the vicinity of preapoptotic and apoptotic dopaminergic neurons in the CNS, all of which could contribute to neuronal dysfunction (69).
Studies have shown that SNCA is involved in the regulation of renal fibrosis.Disruption of SNCA signaling in renal proximal tubular epithelial cells contributes to the pathogenesis of renal tubulointerstitial fibrosis by promoting a partial epithelial-tomesenchymal transition and accumulation of ECM (70).This study showed that SNCA expression exhibited a notable increase in the high ERS score group compared with the low ERS score group among patients diagnosed with IPF.Additionally, the AUC value of SNCA in the diagnostic and prognostic models for IPF exceeded 0.750, indicating its potential as a valuable diagnostic and prognostic marker for this particular condition.Furthermore, a positive correlation was observed between SNCA expression and the extent of immune cell infiltration.Collectively, these findings suggest that SNCA plays a significant role in the development and progression of pulmonary fibrosis via ERS.Cytidine deaminase (CDA) is an enzyme that plays a crucial role in the pyrimidine salvage pathway (71).Family with sequence similarity 20 member C (FAM20C), previously known as Golgi casein kinase (G-CK), is a protein specifically localized in the Golgi apparatus (72).NELL2 is a glycoprotein involved in the development of neural cells and guidance of axons, including their repulsion (73).Zinc finger protein 91 (ZNF91) is a nuclear protein that is 63.5 kDa in size and exhibits structural motifs characteristic of transcription factors (73).These proteins were associated with the GO enrichment analysis results of ERSRDEGs.ERSRDEGs were enriched in the negative regulation of the phosphate metabolic process and G protein-coupled receptor signaling pathway.Additionally, they were enriched in cellular components of the Golgi lumen.Regarding MF, ERSRDEGs were closely associated with enzyme inhibitors, receptor ligands, and signaling receptor activator activities.
This study showed significant differences in the expression of AGRP, BIRC3, CDA, FAM20C, MT1E, NELL2, SNCA, and ZNF91 between patients with IPF and healthy controls.These were also significantly correlated with immune cell infiltration.Furthermore, our in vivo and in vitro experiments demonstrated that the expression of these molecules differed between the tunicamycin and bleomycin groups compared with the normal control group.This suggests that these molecules are involved in the development of IPF via ERS and inflammatory responses.However, it is important to note that the inconsistent trend in their expression may be attributed to the different sources of samples analyzed in our database, including BALF, PBMC, and lung tissue.Additionally, the A549 cells used in our in vitro experiments represent alveolar epithelial cells, whereas the 3T3 cells represent lung fibroblasts.Therefore, AGRP, NELL2, FAM20C, and MT1E may exert varying effects on alveolar epithelial cells and lung fibroblasts.Moreover, our in vivo experiments used bleomycin to induce pulmonary fibrosis in mice, which may not fully reflect the pathophysiological processes of IPF in humans.Furthermore, the stress state of the ER in mice might not have been fully represented in our in vivo experiments without the addition of tunicamycin.An important consideration regarding the use of animal models is that they may not completely mimic the pathophysiology of IPF.Nevertheless, animal models provide detailed mechanistic insights that are difficult to obtain from human studies.In addition, we did not assess the protein levels of these molecules.Therefore, further investigation is required to explore the changes in protein levels and necessitate including more BALF, blood, and lung tissue samples from patients with IPF for verification.
Recent advances have shed light on the role of the immune system in IPF, revealing that immune dysregulation is a crucial factor in the pathophysiology of the disease (14,15).Both human and mouse studies have provided evidence that monocytes are recruited to the lungs in response to tissue injury and subsequently differentiate into long-lived alveolar macrophages (Ams) (74).These Ams play a pivotal role in promoting fibrosis through various mechanisms, including the production of TGF-b (75), chemokine ligand 18 (CCL18) (76), chitinase 3-like protein 1 (CHI3L1) (77), matrix metalloproteinases (MMPs) (78), and activation of the Wnt/bcatenin pathway (79).These processes ultimately lead to fibroblast activation, myofibroblast differentiation, and ECM remodeling.Importantly, several of these profibrotic mechanisms are associated with M2 polarization in Ams (79).Th17 cells, CD8+ T cells, and Tregs have been observed to contribute to the progression of fibrosis, whereas Th1 cells and tissue-resident memory CD4+ T cells have shown potential protective effects (15).In accordance with previous studies, our findings demonstrated a higher abundance of monocytes and macrophages in the high-risk group than in the low-risk group, suggesting a potential association between elevated levels of these immune cells and IPF progression.Additionally, we observed a decrease in the number of resting memory CD4+ T cells in the BALF of the high-risk group, indicating the potential protective role of these cells in IPF.Furthermore, significant differences were observed in the dendritic cells, mast cells, NK cells, neutrophils, and myeloid-derived suppressor cells (MDSC) between the high-and low-risk groups.Notably, a significant correlation was observed between most ERSRDEGs and various immune cell populations.However, the precise contribution of these immune cells to IPF pathogenesis requires further investigation.
This study established a correlation between ERS and its associated genes in IPF diagnosis and progression.However, this study had some limitations.First, it relied on data from the GEO database, which may have introduced potential biases.Despite including multiple datasets, the sample size was relatively small, which may have affected the generalizability of our findings.Second, the majority of the datasets lacked crucial clinical information such as pulmonary function parameters, St. George's respiratory questionnaire scores, and the use of antifibrotic medications.Lastly, the relationship between the risk score and immune activity requires further investigation through basic experiments.Therefore, additional research is necessary to validate the clinical significance of these results.

2
FIGURE 2 Standardized processing of IPF datasets.(A) Boxplot of the GSE70866 data prior to normalization.(B) Boxplot of the GSE70866 data post normalization.(C) Boxplot of the GSE28042 data prior to normalization.(D) Boxplot of the GSE28042 data post normalization.(E) Boxplot of the GSE110147 data prior to normalization.(F) Boxplot of the GSE110147 data post normalization.IPF, idiopathic pulmonary fibrosis.

5
FIGURE 5Enrichment plots from GSEA in GSE70866 and GSE28042 datasets.(A, F) Mountain maps showing the GSEA results of GSE70866 (A) and GSE28042 (F) datasets.(B-E) Enrichment showed that DEG function mainly focused on surfactant metabolism (B), ECM receptor interaction (C), lung fibrosis (D), and cytokine-cytokine receptor interaction (E) in the GSE70866 dataset.G-J Enrichment showed that DEGs function mainly focused on interlerkin-10 signaling (G), oxidative stress response (H), neuroinflammation (I), and diseases of programmed cell death(J) in the GSE28042 dataset.GSEA, gene set enrichment analysis; DEGs, differentially expressed genes; ECM, extracellular matrix.
(A) Mountain maps showing the GSEA results of the GSE110147 dataset.(B-E) Enrichment showed that DEGs function mainly focused on the diseases of DNA repair (B), cell cycle checkpoints (C), apoptotic execution phase (D), and G2 M checkpoints (E).GSEA, gene set enrichment analysis; DEGs, differentially expressed genes.

7 WGCNA
FIGURE 7 WGCNA analysis identified the coexpression modules in GSE70866 dataset.(A) Sample dendrogram of GSE70866 dataset.(B) Scale-free index analysis for soft-threshold power and mean connectivity analysis for various soft-threshold powers.(C) Cluster of gene modules in GSE70866 dataset.(D) Module clustering dendrogram based on a dissimilarity measure (1-TOM).Each color represents one module.(E) Heatmap of the correlation between module eigengenes and IPF, each containing the corresponding correlation and P-value.(F) Venn diagram showing overlapping genes between ERSRDEGs and genes in MEblue.(G) Violin plot showing the differential expression analysis of ADM and IER3 between patients with IPF and healthy individuals in the GSE70866 dataset.(H) Scatter plot of the correlation between ADM and IRE3.WGCNA, Weighted gene coexpression network analysis; ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes; TOM, topological overlap matrix; IPF, idiopathic pulmonary fibrosis.***p<0.001.

8
FIGURE 8 Construction of the ERSRDEGs-associated diagnostic model.(A) Forest plot of logistic regression model.(B) The number of genes with the most minimal error rate obtained by the SVM algorithm.(C)The number of genes with the highest accuracy obtained by the SVM algorithm.(D) Correlation between the error and number of trees.(E) The importance scores of 13 ERSRDEGs were calculated based on the RF model.(F, G) Screening of characteristic genes by LASSO regression analysis.(H) Venn diagram illustrating overlapping genes among characteristic genes selected in SVM, RF, and Logistic-LASSO models.(I) Nomogram predicting the ROC of prevalence in the GSE70866 series.(J) Nomogram of predicted prevalence according to gene score of 8 common ERSRDEGs.(K) Calibration curves to assess the predictive power of the diagnostic model.(L) DCA curve to evaluate the clinical value of the diagnostic model.SVM, support vector machine; ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes; LASSO, least absolute shrinkage, and selection operator; ROC, receiver operating characteristic curve; AUC, area under the curve; DCA, decision curve analysis.
(A) The difference in expression of 28 immune cells between the high-and low-risk score groups.(B, C) Correlation heatmap showed the correlation coefficient between different immune cells in low-(B) and high-risk score groups (C).(D, E) The correlation analysis between common ERSRDEGs and specific immune cells in the low-(D) and high-risk score groups (E)."ns" not significant (p-value >0.05).*p<0.05,**p<0.01,***p<0.001.ssGSEA, single-sample gene-set enrichment Analysis; ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes.
FIGURE 12Immune infiltration analysis evaluated between groups with high-and low-risk scores by the CIBERSORT algorithm in GSE70866 dataset.(A) Histogram showing the distribution of 22 immune cell infiltration between the high-and low-risk score groups.(B) Boxplot showing the differences in infiltrated immune cells between the high-and low-risk score groups.(C, D) The correlation analysis between common ERSRDEGs and specific immune cells in the low-(C) and high-risk score groups (D)."ns" not significant (p-value >0.05).*p<0.05,**p<0.01,***p<0.001.ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes.

13
FIGURE 13 Construction of the ERS score prognostic model.(A) KM plot of overall survival between the high-and low-ERS score groups in the GSE70866 dataset.(B) ROC curve evaluated the predictive value of the model for the prognosis of patients in the discovery cohort for 1-,3-, and 5-year.(C) ROC curve evaluated the predictive value of the model for the prognosis of patients in the discovery cohort.(D-K) ROC curves of eight common ERSRDEGs in the discovery cohort, (D) AGRP, (E) BIRC3, (F) CDA, (G) FAM20C, (H) MT1E, (I) NELL2, (J) SNCA, (K) ZNF91.(L) The differential expression analysis of eight common ERSRDEGs between high-and low-ERS score groups."ns" not significant (p-value >0.05).***p<0.001.KM, Kaplan-Meier; ERS score, endoplasmic reticulum stress score; ROC, receiver operating characteristic curve; AUC, area under the curve; ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes.
FIGURE 15Expression of eight common ERSRDEGs genes verified via RT-qPCR in A549 cells, embryonic mouse fibroblasts 3T3 cells, and bleomycin-induced pulmonary fibrosis in mice.The experimental groups included DMSO and treatment groups in vitro.The treatment group was exposed to 1 mg/mL tunicamycin for 24 h in embryonic mouse fibroblasts 3T3 cells and 4 mg/mL tunicamycin for 48 h in A549 cells.The experimental group was induced with bleomycin (5 mg/kg), whereas the control group was treated with PBS in vivo.After 21 days, lung tissues were collected for RT-qPCR, HE, and Masson staining.(A, B) The expression of eight common ERSRDEGs between tunicamycin treatment and DMSO groups in embryonic mouse fibroblasts 3T3 (A) and A549 (B) cells.(C) The expression of eight common ERSRDEGs between bleomycin treatment and PBS groups in mice.(D) HE and Masson staining results of bleomycin treatment and PBS groups.*p<0.05,**p<0.01,***p<0.001.DMSO, dimethyl sulfoxide; RT-qPCR, real-time quantitative PCR; ERSRDEGs, endoplasmic reticulum stress-related differentially expressed genes.

TABLE 1 Idiopathic
Pulmonary Fibrosis data set information list.

TABLE 2
GO enrichment Analysis results of ERSRDEGs.

TABLE 3
GSEA enrichment analysis results of IPF dataset GSE70866.

TABLE 5
GSEA enrichment analysis results of IPF dataset GSE110147.
However, Hagimoto et al. showed that glucocorticoid-induced AGRP expression is suppressed through the NF-kB-p65 pathway in ERS (49).Currently, there is a lack of research examining the role of AGRP in the context of pulmonary fibrosis.This study showed that AGRP expression has a notably greater impact on the ERS-related diagnostic model than other co-ERSRDEGs.Furthermore, AGRP expression was negatively correlated with immune cell infiltration.This finding contradicts the positive correlation between AGRP and inflammation reported by Zhou et al., necessitating further comprehensive investigations.