CD247, a Potential T Cell–Derived Disease Severity and Prognostic Biomarker in Patients With Idiopathic Pulmonary Fibrosis

Background Idiopathic pulmonary fibrosis (IPF) has high mortality worldwide. The CD247 molecule (CD247, also known as T-cell surface glycoprotein CD3 zeta chain) has been reported as a susceptibility locus in systemic sclerosis, but its correlation with IPF remains unclear. Methods Datasets were acquired by searching the Gene Expression Omnibus (GEO). CD247 was identified as the hub gene associated with percent predicted diffusion capacity of the lung for carbon monoxide (Dlco% predicted) and prognosis according to Pearson correlation, logistic regression, and survival analysis. Results CD247 was significantly downregulated in patients with IPF compared with controls in both blood and lung tissue samples. Moreover, CD247 was significantly positively associated with Dlco% predicted in blood and lung tissue samples. Patients with low CD247 expression had shorter transplant-free survival (TFS) time and more composite end-point events (CEP; death or decline in FVC >10% over a 6-month period) than patients with high CD247 expression (blood). Moreover, at the 1-, 3-, 6-, and 12-month follow-up visits, low expression of CD247 remained a risk factor for CEP in the GSE93606 dataset (blood). Thirteen genes were found to interact with CD247 according to the protein–protein interaction network, and the 14 genes including CD247 were associated with the functions of T cells and natural killer (NK) cells, such as the PD-L1 expression and PD-1 checkpoint pathway and NK cell-mediated cytotoxicity. Furthermore, we found that low expression of CD247 might be associated with lower TIL (tumor-infiltrating lymphocyte), checkpoint, and cytolytic activity scores and with higher activity of macrophages and neutrophils. Conclusion These results imply that CD247 may be a potential T cell-derived disease severity and prognostic biomarker for IPF.


INTRODUCTION
Idiopathic pulmonary fibrosis (IPF) causes worsening dyspnea and deteriorating lung function, which results in a poor prognosis (1). Patients with IPF often die within 2-3 years after diagnosis (2,3), and the 5-year survival rate is less than 40% (4). Therefore, it is important to identify effective biomarkers of disease severity and prognosis in patients with IPF. Such biomarkers might allow early identification of patients with a worse predicted prognosis, who might then benefit from more aggressive interventions or earlier referral for transplantation.
A growing number of studies have shown that innate and adaptive immune processes can coordinate existing fibrotic responses and are associated with prognosis in patients with IPF (5,6). CD247 (also referred to as T-cell surface glycoprotein CD3 zeta chain) is part of the T-cell antigen receptor (TCR) complex and plays an important role in receptor expression and signaling (7,8). Studies have suggested that low expression of CD247 arising in the setting of chronic inflammation is associated with decreased T cell activity (9-11). Interestingly, the resulting immunosuppression is associated only with downregulation of CD247, while the remaining TCR subunits are unaffected, which implies that CD247 downregulation occurs during chronic inflammation rather than during the acute inflammatory response (9,10). Furthermore, downregulation of CD247 has been reported in chronic inflammatory diseases such as celiac disease (12), chronic obstructive pulmonary disease (11), systemic lupus erythematosus (13), and systemic sclerosis (14). As far as we know, the association between CD247 and IPF has not been reported.
In this study, using publicly available databases, we present evidence of an association between CD247 and the immune microenvironment phenotype and evaluate the role of CD247 expression in patients with IPF.

Analysis of scRNA-seq Data
The computational analysis of the GSE141259 dataset was performed using the R package "Seurat" (4.0.3) (26). Quality control had already been completed by the authors of the GSE141259 dataset; therefore, 29,297 cells were analyzed. The Seurat SCTransform() function was used to normalize the scRNA-seq data. Principal component analysis (PCA) was performed using the Seurat RunPCA() function. UMAP embedding and Louvain clusters were calculated using the first 50 principal components with the Seurat RunUMAP() and FindClusters() functions, respectively. Resolution was set to 0.9. The Seurat FindAllMarkers() function was used to find markers of the 31 clusters, and cell types were identified based on the markers of each cluster according to the CellMarker (27) and PanglaoDB (28) databases. Expression and distribution of Cd247 were visualized with the Seurat DotPlot() and FeaturePlot() functions. Dynamic Cd247 expression was visualized with the web tool (https://theislab.github.io/LungInjuryRegeneration/).

Functional Analysis
Differentially expressed genes (DEGs) were defined as genes whose expression levels differed significantly between IPF patients with low CD247 expression and those with high CD247 expression (|log fold change| > 0.5 and false discovery rate (FDR) < 0.05). The "limma" package (v.3.46.0) (31) was used for the analysis of DEGs. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment was analyzed and visualized with the R package "clusterProfiler" (32). P values were adjusted with the Benjamini and Hochberg (BH) correction method. The single-sample gene set enrichment analysis (ssGSEA) scores of 19 immune cells and the activity of 15 immune-related pathways (33) were calculated with the "GSVA" R package (v.1.38.2) (34). CIBERSORT is a large-scale analysis tool of RNA mixtures for cellular biomarkers and therapeutic targets based on the gene expression feature sets of 22 immune cell subtypes (http://cibersort.stanford.edu/) (35). Subsequently, 20 of these immune cell subtypes were classified into four types: lymphocytes (B cells naive, B cells memory, plasma cells, T cells CD8, T cells CD4 naive, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory, T cells gamma delta, NK cells resting, NK cells activated), macrophages (monocytes, macrophages M0, macrophages M1, macrophages M2), dendritic cells (dendritic cells resting, dendritic cells activated), and mast cells (mast cells resting and mast cells activated).
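The DEG thresholds above can be illustrated with a minimal sketch, assuming log fold changes and raw P values have already been produced by a differential-expression tool. This is not the limma workflow used in the study: the function names `bh_adjust` and `select_degs`, the gene labels, and all numbers are hypothetical and serve only to show how the BH correction and the two cutoffs combine.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log_fc, pvalues, lfc_cut=0.5, fdr_cut=0.05):
    """Keep genes with |log fold change| > lfc_cut and BH-adjusted p < fdr_cut."""
    fdr = bh_adjust(pvalues)
    return [g for g, lfc, q in zip(genes, log_fc, fdr)
            if abs(lfc) > lfc_cut and q < fdr_cut]


# Hypothetical toy input (not data from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [1.2, -0.8, 0.3, 0.9]
pvalues = [0.001, 0.01, 0.02, 0.5]

degs = select_degs(genes, log_fc, pvalues)
print(degs)
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the FDR cutoff, so only the first two genes are retained.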

Statistical Analysis
SPSS Statistics 23 (IBM SPSS) and R software (Version 4.0.3) were used for statistical analysis. Categorical variables were described as number (%) and were compared by the chi-square test or the Fisher exact test. Continuous variables were compared by independent group t tests. The diagnostic accuracy of CD247 for IPF or for Dlco% predicted decline ≥15% over 12 months (Dlco15) was estimated by receiver operating characteristic (ROC) analysis. The Youden index was used to calculate the optimal cutoff values of CD247 for Dlco15. Logistic regression analysis was used to estimate the odds ratio (OR) for statistically significant correlations between gene expression and Dlco15. Kaplan-Meier analysis was used to compare TFS, CEP, or survival between groups. The R package "survminer" (https://CRAN.R-project.org/package=survminer, v.0.4.8) was used to estimate the optimal cutoff expression value of CD247 for the survival analysis. The hazard ratio (HR) for statistically significant correlations between CD247 expression and TFS or CEP was estimated by univariate Cox regression. The R package "survivalROC" (https://CRAN.R-project.org/package=survivalROC, v.1.0.3) was used to construct the time-dependent ROC curve to evaluate the predictive value of CD247. Some statistical analyses were visualized with GraphPad Prism 9. All tests were two-sided.
Table footnotes: Values are presented as n (%) or mean ± standard deviation (SD). FVC% predicted, percent predicted forced vital capacity; Dlco% predicted, percent predicted diffusion capacity of the lung for carbon monoxide; TFS, transplant-free survival; CEP, composite end point.
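The Youden-index cutoff mentioned in the statistical analysis can be sketched as follows. This is a minimal illustration, not the SPSS or R procedure used in the study: the function name `youden_cutoff`, the assumption that low expression marks the event, and the toy values are all hypothetical.

```python
def youden_cutoff(values, labels, low_is_positive=True):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for the event (for example Dlco15), 0 otherwise.
    With low_is_positive=True a sample is called positive when
    value <= cutoff (assumption: low expression marks the event).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    # Every observed value is tried as a candidate cutoff.
    for cutoff in sorted(set(values)):
        if low_is_positive:
            calls = [v <= cutoff for v in values]
        else:
            calls = [v >= cutoff for v in values]
        tp = sum(1 for c, y in zip(calls, labels) if c and y == 1)
        tn = sum(1 for c, y in zip(calls, labels) if not c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        # Strict ">" keeps the smallest cutoff when several tie.
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical expression values and event labels (not study data).
expression = [5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.0, 7.2]
dlco15 = [1, 1, 1, 0, 1, 0, 0, 0]

cutoff, j = youden_cutoff(expression, dlco15)
print(cutoff, j)
```

Ties are resolved here in favor of the smallest cutoff; statistical packages may apply a different tie rule, so results on real data should be checked against the tool actually used.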

Identification of the Hub Gene: CD247
The association between CD247 expression and lung function was determined separately in the GSE38958, GSE132607, and GSE93606 datasets (blood samples). The significant correlations (p < 0.05) between genes and lung function are shown in Supplemental Material 1. After intersecting the genes with significant correlations in the three blood datasets, 33 genes associated with Dlco% predicted and 17 genes associated with FVC% predicted were selected for further analysis (Supplementary Tables 2, 3 and Figure 1). Furthermore, of the 33 genes, 14 genes with consistent positive correlations (ATP10A, CD244, CD247, DDX19A, DIS3L, OSBPL3, RFX5, SH3YL1, TDRKH, TMTC4, TTC39B, TXK, UBE3C, and UTP15) and 11 genes with consistent negative correlations (BNIPL, C9orf131, DUSP13, GJA5, GUCY2D, MYL4, RHAG, SLC28A1, SLC6A7, TNFRSF19, and TULP2) were selected as the candidate hub genes (Figure 1B, pink rectangles). Of the 17 genes, one gene with consistent positive correlations (PMPCB) and five genes with consistent negative correlations (ALLC, ANLN, DRD3, GLT8D2, and NECAB1) were also selected as candidate hub genes (Figure 1D, pink rectangles). The follow-up data (4, 8, 12 months) were extracted from the GSE132607 dataset (blood). In the fourth month, only MYL4 was significantly negatively associated with Dlco% predicted (Table 3). In the eighth month, six genes (CD247, DIS3L, OSBPL3, TDRKH, TTC39B, and UTP15) were significantly positively associated with Dlco% predicted, and three genes (GUCY2D, MYL4, and RHAG) were significantly negatively associated with Dlco% predicted (Table 3). In the 12th month, five genes (CD244, CD247, DDX19A, RFX5, and TXK) were significantly positively associated with Dlco% predicted, and MYL4 was significantly negatively associated with Dlco% predicted (Table 3). However, no genes were significantly associated with FVC% predicted in the follow-up data.
Therefore, CD247 and MYL4, which showed the most consistent significant correlations, were chosen for further analysis.
Heatmap legend (Figure 1): The numbers in the heatmap represent the correlation coefficients; red represents positive correlation and green represents negative correlation, with darker shades indicating stronger correlation. The pink rectangles mark the genes selected as candidate hub genes.
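The candidate selection described above (intersecting significantly correlated genes across datasets, then keeping those whose correlation has the same sign everywhere) can be sketched as below. This is an illustration under stated assumptions: significance filtering (p < 0.05) is assumed to have been done upstream, and the helper names `pearson_r` and `consistent_genes`, the gene labels, and the coefficients are hypothetical rather than taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def consistent_genes(per_dataset_r):
    """Split shared genes by the sign of their correlation.

    per_dataset_r: list of dicts mapping gene -> r, one dict per dataset,
    each already restricted to significant correlations (assumption).
    Returns (positive, negative): genes present in every dataset whose
    r is positive in all of them, or negative in all of them.
    """
    shared = set(per_dataset_r[0])
    for d in per_dataset_r[1:]:
        shared &= set(d)
    positive = sorted(g for g in shared if all(d[g] > 0 for d in per_dataset_r))
    negative = sorted(g for g in shared if all(d[g] < 0 for d in per_dataset_r))
    return positive, negative


# Hypothetical coefficients for three datasets (not study results).
ds1 = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.25}
ds2 = {"GENE_A": 0.38, "GENE_B": -0.29, "GENE_C": -0.22, "GENE_D": 0.50}
ds3 = {"GENE_A": 0.33, "GENE_B": -0.40, "GENE_C": 0.30}

positive, negative = consistent_genes([ds1, ds2, ds3])
print(positive, negative)
```

GENE_C is shared by all three toy datasets but changes sign, so it is dropped, and GENE_D is dropped because it is significant in only one dataset.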
Dlco% predicted decline ≥15% over 12 months (Dlco15) is a useful index for evaluating whether IPF is progressing. Patients with Dlco15 were more likely to have lower CD247 expression and higher MYL4 expression than those without Dlco15 in the GSE132607 dataset (Figures 2A, C and Supplementary Table 4). In addition, Dlco% predicted in patients with low CD247 expression, classified by the median value of CD247 at baseline, was significantly lower than that in patients with high CD247 expression at the 4-, 8-, and 12-month visits in the GSE132607 dataset (Table 4 and Figure 2B). Likewise, Dlco% predicted in patients with high MYL4 expression, classified by the optimal cutoff value (4.37) of MYL4 at baseline, was significantly lower than that in patients with low MYL4 expression at the 4-, 8-, and 12-month visits (Table 4 and Figure 2D). Subsequently, two lung tissue datasets (GSE32537 and GSE47460) were used. As shown in Figure 3, CD247 expression was significantly positively associated with Dlco% predicted in both blood and lung tissue. However, MYL4 was significantly negatively associated with Dlco% predicted in blood samples but not in lung tissue samples (Figure 1). Furthermore, CD247 was not significantly correlated with GAP (gender, age, and physiological index) in the GSE70866 dataset (BALF sample, data not shown). The expression of CD247 in patients with IPF was lower than that in controls in the GSE93606 (blood), GSE33566 (blood), GSE47460 (lung tissue), and GSE110147 (lung tissue) datasets, whereas the expression of CD247 did not differ in the GSE38958 (blood) and GSE32537 (lung tissue) datasets (Figure 4). The expression of CD247 in patients with IPF (n = 75) was lower than that in controls (n = 19) in the GSE28042 dataset (13.24 vs. 13.56, p = 0.0043).
Immunosuppressive therapy was not used before the blood samples were collected in the GSE93606 and GSE33566 datasets, implying that the expression of CD247 in these blood samples was not affected by immunosuppressive therapy (Table 1). The diagnostic value of CD247 for IPF varied across datasets (Supplementary Figure 2). In contrast, the expression of MYL4 did not show a significant difference in these datasets (data not shown). Therefore, CD247 was considered the key gene.

The Prognostic Value of CD247 in the Blood Samples
According to the ROC curve analysis and the R package "survminer", the optimal cutoff value of CD247 was chosen for the logistic regression analysis in the GSE132607 (blood) dataset and for the prognosis-related analyses in the GSE93606 (blood), GSE27957 (blood), and GSE28042 (blood) datasets, respectively. According to logistic regression analysis, low expression of CD247 at the 0-, 8-, and 12-month visits was a risk factor for Dlco15 in the GSE132607 dataset, whereas low expression of CD247 was not a risk factor for Dlco15 in the fourth month (Figure 5A). According to Cox regression analysis and Kaplan-Meier analysis, low CD247 expression at the 0-, 1-, 3-, 6-, and 12-month visits was a risk factor for CEP in the GSE93606 dataset and was significantly associated with shorter TFS time in the GSE27957 and GSE28042 datasets (Figure 5B, Supplementary Figures 3A-G). Furthermore, the ROC curve showed that the areas under the curve (AUC) were 0.736 at 1 year and 0.741 at 2 years for CEP in the GSE93606 dataset (Supplementary Figure 3H). Moreover, AUCs for non-TFS were 0.889, 0.787, and 0.702 at 1, 2, and 3 years, respectively, in the GSE27957 dataset, whereas the AUC was relatively low in the GSE28042 dataset (Supplementary Figures 3I, J). In the GSE70866 dataset (BALF samples), CD247 was not significantly associated with mortality (p > 0.05, Supplementary Table 5).
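The Kaplan-Meier comparison reported above rests on the product-limit estimator, which can be sketched as follows. This is a minimal illustration rather than the SPSS or R implementation used in the study: the function name `kaplan_meier` and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time of each patient.
    events: 1 if the event (for example death or transplant) was observed,
            0 if the patient was censored at that time.
    Returns a list of (time, S(time)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Patients censored at t still count as at risk at t.
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
        i += removed
    return curve


# Hypothetical follow-up data in months for one expression group.
times = [3, 5, 5, 8, 12]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function on the high- and low-expression groups separately gives the two step curves that a log-rank test or Cox model would then compare; those comparisons are not implemented in this sketch.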
To reveal the biological significance of CD247, the DEGs between patients with high and low CD247 expression, split at the median value, were used to conduct GO enrichment and KEGG pathway analysis in the GSE38958, GSE132607, and GSE93606 datasets (blood samples). In the three datasets, DEGs were mainly enriched in inflammation- and immune-related responses and pathways, such as neutrophil activation involved in immune response, T cell activation, T cell differentiation, leukocyte chemotaxis, IL-17 signaling pathway, T cell receptor signaling pathway, Th17 cell differentiation, and PD-L1 expression and PD-1 checkpoint pathway in cancer (Figure 7). Furthermore, ssGSEA of related functions and pathways showed that patients with low CD247 expression were more likely to have lower immune activity [lower T cell general, Th1 cells, TIL (tumor-infiltrating lymphocytes), checkpoint, cytolytic activity, T cell co-inhibition, and T cell exhaustion scores] and a higher degree of inflammatory response [higher DC (dendritic cell), M2 macrophage, and neutrophil scores] than patients with high CD247 expression in the three datasets (Figure 8), which was consistent with the results of the CIBERSORT analysis (Supplementary Figures 4A, C, E). In addition, low CD247 expression was significantly associated with a low lymphocyte score (Supplementary Figures 4B, D, F).

The scRNA-seq Data Analysis of CD247
According to the LungMAP database (36), CD247 is mainly expressed by T cells and NK cells in the human lung based on scRNA-seq data (37) (Supplementary Figure 5A). In the mouse lung, Cd247 is also mainly expressed by T cells and NK cells, and the expression of Cd247 in T cells increases in the acute inflammatory stage and then decreases in the fibrotic stage after bleomycin injection (Supplementary Figures 5B-D).

DISCUSSION
IPF is a serious lung disease with high mortality. In this study, we observed significantly downregulated CD247 expression in patients with IPF compared with controls, and CD247 was significantly positively associated with Dlco% predicted in both blood and lung samples. The significantly positive association remained consistent at the 8- and 12-month follow-up visits in the GSE132607 dataset (blood). In addition, low CD247 expression was a risk factor for Dlco15 in the GSE132607 dataset (blood) and was significantly associated with more CEP events in the GSE93606 dataset (blood). At the 1-, 3-, 6-, and 12-month follow-up visits, low CD247 expression remained a risk factor for CEP in the GSE93606 dataset (blood). Moreover, low expression of CD247 was significantly associated with shorter TFS time in the GSE27957 and GSE28042 datasets (blood). The low AUC in the GSE28042 dataset might be explained by the high rate of lung transplantation. Therefore, CD247 might be a potential biomarker for disease severity and prognosis in patients with IPF. MYL4, by contrast, was significantly negatively associated with Dlco% predicted in blood samples but was associated neither with Dlco% predicted in the lung tissue samples nor with prognosis in the blood samples. Thus, MYL4 might be an effective biomarker of disease severity for IPF in blood samples. However, the roles of the two genes need further study for verification. CD247, as part of the TCR complex, plays an important role in receptor expression and signaling (7,8) and is associated with chronic inflammation (9). According to the LungMAP database, CD247 is mainly expressed by T cells and NK cells in the lung based on scRNA-seq data, implying that CD247 could be an important regulator of immune responses in the lung. Indeed, inflammation caused by pathogens such as viruses and bacteria has been proposed to play a role in the development of IPF.
In the bleomycin-induced pulmonary fibrosis model, cytomegalovirus is considered to accelerate existing fibrosis by enhancing TGFB1 activation and the expression of both phospho-SMAD2 and Vimentin (38). In addition, Epstein-Barr virus (EBV, a member of the herpesvirus family) is enriched in the BALF and lung tissue of patients with IPF compared with healthy controls (39,40). Furthermore, GERD (gastroesophageal reflux disease) is common in patients with IPF, and the ongoing microaspiration could lead to repeated inoculation with oral and gastric organisms (41).
In this study, results from blood, lung tissue, and BALF samples revealed significant co-expression associations between CD247 and six interacting genes (CD3E, ZAP70, LCK, FYN, JAK3, and PTPN3). When the expression of CD247 was downregulated, the expression of CD3E, ZAP70, LCK, and FYN was also downregulated. JAK3 was negatively associated with CD247 in the blood samples but positively associated with CD247 in the lung tissue and BALF samples. These genes are key signaling molecules not only in the selection and maturation of developing T cells but also in the activation of T cells. These results suggest that downregulation of CD247 caused by inflammation might suppress immune activity by regulating the expression of these genes, which needs further study for verification.
In addition, according to the GO and KEGG analyses of DEGs between patients with low and high CD247 expression, these DEGs were mainly enriched in inflammation- and immune-related responses and pathways. Moreover, patients with low CD247 expression were more likely to have lower activity of T cells in general, Th1 cells, NK cells, and TIL (tumor-infiltrating lymphocytes) and higher activity of dendritic cells (DCs), M2 macrophages, and neutrophils than patients with high CD247 expression. Furthermore, the response process analysis showed that patients with low CD247 expression were more likely to have lower checkpoint, cytolytic activity, and T cell activation scores and a higher inflammation score than patients with high CD247 expression. These results are consistent with the following studies.
The incidence of cancer, especially lung cancer, is higher in IPF patients than in matched controls (42). Interestingly, five genes (CD247, CD3E, ZAP70, LCK, PTPN6) and the DEGs between patients with high and low CD247 expression were associated with the hsa05235 pathway (PD-L1 expression and PD-1 checkpoint pathway in cancer), and patients with low CD247 expression had a lower TIL ssGSEA score, which may explain the high incidence of cancer. In addition, the downregulation of T cell regulatory genes associated with the immune checkpoint CTLA-4 was significantly associated with reduced event-free survival in the PBMCs (peripheral blood mononuclear cells) of patients with IPF (21). Interestingly, inflammatory interstitial lung diseases can be caused by checkpoint inhibitors used as cancer immunotherapy (43). We speculate that low immune checkpoint expression may be associated with the development and progression of IPF, which needs further study for verification. Immune processes can coordinate existing fibrotic responses and are associated with prognosis in patients with IPF (5,6). Th1 cells and their secretory products such as IL-12 (a potent inducer of IFN-γ) are considered anti-fibrotic (44,45). Macrophages play an important role in the pathogenesis of IPF through the regulation of both injury and repair of the lung (46,47). A relative excess of M1/M2 macrophages leads to epithelial cell death as well as aberrant and dysregulated repair responses, which could cause progression or acute exacerbation of IPF (48,49). Neutrophils are associated with production of cytokines and chemokines, tissue injury, regulation of ECM (extracellular matrix) turnover, and generation of NETs (neutrophil extracellular traps), which result in fibroblast activation and ECM accumulation in IPF (6).
Taken together, our results suggest that chronic inflammation could participate in the development and progression of IPF through the downregulated expression of CD247.
There are several limitations in this study. First, the study was conducted based on the retrospective data from GEO, and the number of samples in each dataset was relatively small. Second, we have only considered a single variable in the logistic regression and Cox regression. Many prominent prognostic clinical parameters such as lung function, treatment measures, and underlying diseases were not reported in most datasets that we used; thereby, the prognostic value of CD247 was limited. Third, the treatment of patients with IPF was unknown in some datasets, which may affect both disease and data analysis. Fourth, results regarding Dlco15 and CEP were based on a single dataset, which limited the generalizability of these findings. Finally, larger-sample prospective studies are needed to estimate the clinical relevance of CD247.

CONCLUSION
These results suggest that CD247 could reflect immune activity in both lung and blood and may be a potential biomarker for predicting lung function and prognosis in patients with IPF. In addition, MYL4 may be a potential biomarker for Dlco% predicted in blood samples. However, the results need further study for verification.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: GSE32537

ETHICS STATEMENT
Approval of the Ethics Committee was not necessary for the datasets from GEO database.

AUTHOR CONTRIBUTIONS
HL, XL, and XW performed the data analysis. YL and SC performed the data collection, prepared the first manuscript draft, validated the data collection, refined the research idea, performed the data analysis, and edited the manuscript. HC and SN designed and developed the research idea, refined the research idea, validated the data collection, and edited the manuscript. HC and SN are the guarantors of the manuscript. All authors contributed to the article and approved the submitted version.