Differential Genomic Profile in TERT, DSP, and FAM13A Between COPD Patients With Emphysema, IPF, and CPFE Syndrome

Background: Genetic association studies have identified single nucleotide polymorphisms (SNPs) associated with chronic lung diseases such as Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF), as well as with their simultaneous presentation, known as Combined Pulmonary Fibrosis and Emphysema (CPFE) syndrome. It is unknown whether these diseases share genetic variants that were previously described independently. This study aims to identify common or differential variants between COPD, IPF, and CPFE. Materials and methods: The association analysis was carried out through a case-control design in a Mexican mestizo population (n = 828); three patient groups were included: COPD smokers (COPD-S, n = 178), IPF patients (n = 93), and CPFE patients (n = 16). Also, two comparison groups were analyzed: smokers without COPD (SWOC, n = 367) and healthy subjects belonging to the Mexican Pulmonary Aging Cohort (PAC, n = 174). Five SNPs in four genes previously associated with interstitial and obstructive diseases were selected: rs2609255 (FAM13A), rs2736100 (TERT), rs2076295 (DSP), and rs5743890 and rs111521887 (TOLLIP). Genotyping was performed by qPCR using predesigned TaqMan probes. Results: In the IPF vs. PAC comparison, the rs2609255 G allele differed significantly in frequency and was associated with IPF risk (OR = 1.68, p = 0.01). Also, the GG genotype of rs2609255 (OR = 2.86, p = 0.01) and the TT genotype of rs2076295 (OR = 1.79, p = 0.03) were associated with an increased risk of IPF; after adjusting for covariates, only the rs2609255 G allele remained significant (p = 0.01). In the CPFE vs. PAC comparison, the rs2736100 C allele was associated with an increased CPFE risk (OR = 4.02, p < 0.01; adjusted p < 0.01). For COPD-S, the rs2609255 TG genotype was associated with increased COPD risk after adjusting for covariates. Conclusion: The rs2736100 C allele is associated with decreased IPF risk and confers an increased risk for CPFE. Also, the rs2076295 TT genotype is associated with increased IPF risk, while the GG genotype is associated with CPFE susceptibility. The rs2609255 G allele and GG genotype are associated with IPF susceptibility, while the TG genotype is present in patients with emphysema.


INTRODUCTION
Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF) are the leading causes of morbidity and mortality of pulmonary etiology in individuals over the fifth decade of life (1,2), each with an independent pathophysiological and clinical behavior. In the last 30 years, a new entity in which both diseases coexist has been described: the Combined Pulmonary Fibrosis and Emphysema (CPFE) syndrome, which has a worse prognosis and higher mortality than either disease alone (3-5).
One of COPD's main characteristics is emphysema, defined as the irreversible destruction of the alveolar wall beyond the terminal bronchiole; however, not all COPD patients present it (3,5-7). IPF, in turn, is a lung disease with a fatal prognosis and a survival of ∼5 years from diagnosis (8,9). It results from the combination of excessive extracellular matrix production, loss of the alveolar epithelium, and permanent collapse of the alveolar sacs (5).
COPD is a multifactorial disease, and tobacco smoking is the main risk factor described; however, only around 20% of smokers will develop COPD, suggesting that other factors, such as genetics, can influence disease susceptibility. The most representative genetic factor is α1 antitrypsin deficiency, but its frequency is low and it occurs mainly in the European population. Nevertheless, genome-wide association studies have identified genetic markers, mainly SNPs, associated with susceptibility to COPD; for example, at least 20 polymorphisms in matrix metalloproteinase (MMP) genes have been associated with this disease. The MMP1 rs1799750 has been associated with an apical distribution of emphysema in these patients (10,11).
CPFE is a recently described entity in which emphysema predominates in the upper lobes and fibrosis in the basal lobes and subpleural regions (9,12). The etiology and pathogenesis of this disease remain unknown, making it a diagnostic challenge with poor prognosis and high mortality (4,12,13). In 2020, Kinjo et al. reported that the minor allele of the AGER rs2070600 was associated with CPFE among COPD patients in the dominant model in a Japanese cohort (14). Variants in MMP genes have also been described in IPF and CPFE; for instance, Xu et al. reported that the T allele of MMP9 (C-1562T) might predispose to the development of emphysema in patients with IPF in a Chinese population (15). Whether an interaction between COPD and IPF can result in the development of CPFE remains unclear, since each one has a different pathophysiological profile. However, both entities share genetic variants that have been independently associated with each of them. Among the most reported are single-nucleotide variants in the MUC5B, FAM13A, DSP, and TERT genes (16,17).
We hypothesize that SNP-type variants are involved in the susceptibility and pathogenesis of CPFE and present a differential profile in COPD with emphysema and IPF. This study aims to identify differential variants between COPD, IPF, and CPFE in patients from a mestizo-Mexican population.

Study Population
In this case-control study, a total of 828 participants were included. Three groups of patients were defined: COPD smokers (COPD-S), Idiopathic Pulmonary Fibrosis (IPF) patients, and subjects with a diagnosis of Combined Pulmonary Fibrosis and Emphysema (CPFE) syndrome. As control groups, smokers without COPD (SWOC) and healthy subjects over 50 years of age belonging to the Pulmonary Aging Cohort (PAC) (18,19) were included.
The diagnosis of COPD was confirmed through spirometry, based on an FEV1/FVC ratio <70% after bronchodilator administration, taking as reference the values for Mexicans defined by Pérez-Padilla et al. (20).
The control group included smokers without COPD with normal spirometry values (FEV1/FVC >70%). Individuals of either sex with a tobacco index (TI) >10 pack-years were included; patients with clinical evidence of other bronchopulmonary diseases were excluded. We also included subjects with emphysema extending over at least 10% on tomography (9). Participants were recruited from 2009 to 2016.
The IPF diagnosis was established according to the criteria of the 2018 ATS/ERS/JRS/ALAT guidelines (tomographic or histopathological pattern of usual interstitial pneumonia) (21). Smokers and non-smokers of both sexes were included. Patients with evidence of a secondary cause of fibrosis (autoimmune processes, systemic diseases, or a history of exposure) were excluded. Recruitment was carried out in the period 2013-2019. The control group from the pulmonary aging cohort was randomly selected (18), matching the age, sex ratio, and smoking history of the IPF group. All control subjects were defined as pulmonary healthy by spirometry (using the reference values of Pérez-Padilla) or imaging (20).
The CPFE diagnosis was determined through high-resolution tomography showing emphysematous lesions in the upper lobes and fibrosis in the subpleural region of the lower lobes. Smokers without evidence of other pulmonary or systemic disease were included. Recruitment began in 2019 but was suspended due to the COVID-19 lockdown. For this patient group, the PAC subjects were considered as controls.
Participants in the COPD-S, SWOC, and CPFE groups were recruited from the Tobacco Smoking and COPD Research Department and the clinical service 5. The IPF group was evaluated and managed in the Interstitial Lung Disease and Rheumatology Unit (ILD&RU), while the PAC participants came from the Translational Research Laboratory on Aging and Pulmonary Fibrosis of the "Moises Selman Lama" Research Unit. All the departments above are part of the Instituto Nacional de Enfermedades Respiratorias Ismael Cosio Villegas (INER) in Mexico City, Mexico.
The clinical and demographic selection of the case and control groups is shown in Supplementary Figure 1.

Statistical Analysis
The clinical and demographic variables of the comparison groups were analyzed and compared using the RStudio software (22). The normality of the variables was evaluated with the Kolmogorov-Smirnov test; based on its results, non-parametric statistics were used. Quantitative variables were compared between groups using the Mann-Whitney U-test, and the frequencies of qualitative variables were compared with the χ2-test.
Allele and genotype frequency analyses were performed using Epi Info 7.1.4.0 software (23) (Centers for Disease Control and Prevention, Atlanta, GA, USA), using the χ2-test and Fisher's exact test (when the frequency of a variable was <10) to obtain the OR values and their 95% confidence intervals. The Hardy-Weinberg equilibrium of the variants was calculated using the PLINK v1.07 software (24). Results and associations were considered significant when p < 0.05. Logistic regression analysis was performed to adjust for possible confounding variables using PLINK v1.07.
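As an illustration of the allele-level comparison described above, the following minimal Python sketch computes an odds ratio, a Woolf (log-based) 95% confidence interval, and an uncorrected Pearson χ2 p-value from a 2x2 table of allele counts. It is only a sketch under stated assumptions: the function name and the example counts are hypothetical, the counts are not data from this study, and Epi Info may apply a different interval method or a continuity correction.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio, Woolf 95% CI, and Pearson chi-square for a 2x2 table.

    a, b: counts of the tested allele and the other allele in cases.
    c, d: counts of the tested allele and the other allele in controls.
    (Hypothetical helper; not the software used in the study.)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    # Pearson chi-square without continuity correction (1 degree of freedom).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, ci_low, ci_high, chi2, p_value


# Illustrative allele counts only (not taken from the study tables).
result = odds_ratio_2x2(40, 60, 25, 75)
print("OR = %.2f, 95%% CI = %.2f-%.2f, chi2 = %.2f, p = %.4f" % result)
```

When any cell count is small, an exact test (such as Fisher's, as used in the study) is preferable to the χ2 approximation shown here.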
Univariate and multivariate logistic regression models were built in RStudio to evaluate the relationship between the genotypes of the polymorphisms and clinical variables related to the prognosis and development of the disease.

Ethical Approval and Informed Consent
This study was reviewed and approved by the Institutional Committees for Research, Ethics in Research, and Biosecurity of the INER (approval numbers: C09-19 and C39-14). In addition, all participants signed the written informed consent form and were provided with a privacy statement that describes the legal protection of their data; both documents were approved by the Institutional Research and Ethics in Research Committees.
All experiments were performed following pertinent regulations and considering the STREGA (STrengthening the REporting of Genetic Association) guidelines to design this genetic association study.

Demographic Variables in Cases and Controls
Three case-control comparisons were included: the first compares COPD-smoker patients (178 COPD-S) with smokers without COPD (367 SWOC); the second, 93 IPF patients vs. 174 PAC subjects; and the third, 16 CPFE patients vs. 174 PAC subjects. The clinical and demographic variables are shown in Table 1. When comparing COPD-S vs. SWOC, significant differences were found in age (p < 0.01) and in male sex (p < 0.01), which was more frequent in COPD-S. Also, the BMI was lower in the COPD-S group than in the SWOC group, and TI was higher in the COPD-S group (p < 0.01). In the IPF vs. PAC comparison, no differences were found in the demographic variables. However, when comparing CPFE vs. PAC, statistically significant differences were found in TI (p < 0.01).
The differences in pulmonary function tests between the case and control groups are expected, since these tests are part of the diagnostic criteria that differentiate them. It should be noted that the CPFE patients had the lowest DLCO levels among the case groups.

Hardy-Weinberg Equilibrium
The Hardy-Weinberg equilibrium (HWE) was tested in the control group of each comparison. The rs2609255 and rs111521887 variants did not meet HWE in the SWOC group (p < 0.05), while in the PAC group, rs2736100, rs5743890, and rs111521887 did not meet this criterion.
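The HWE check can be illustrated with a short Python sketch that compares observed genotype counts in a control group against the counts expected from the estimated allele frequency. This is an assumption-laden illustration: it uses the simple χ2 goodness-of-fit approximation with 1 degree of freedom, which is not necessarily the test applied by PLINK, and the function name and example counts are hypothetical rather than taken from the study.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the AA, AB, and BB genotypes.
    Returns (frequency of allele A, chi-square statistic, p-value).
    (Hypothetical helper using the chi-square approximation.)
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes provided")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    if p == 0 or q == 0:
        raise ValueError("monomorphic variant: HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Illustrative control-group genotype counts only (not study data).
freq_a, chi2, p_value = hwe_chi2(40, 20, 40)
print("freq(A) = %.2f, chi2 = %.2f, deviates from HWE: %s"
      % (freq_a, chi2, p_value < 0.05))
```

A heterozygote deficit such as the one in the example counts produces a large χ2 value and therefore a departure from HWE at the p < 0.05 threshold.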

Allele and Genotype Frequencies
In the COPD comparison group, no differences were found in the allele frequencies of any of the included SNPs (data presented in Table 2); however, after correcting for covariates (age, sex, BMI, and tobacco index), a significant value was found for the FAM13A rs2609255 TG genotype (OR = 1.60, CI 95% = 1.09-2.35, p < 0.01), indicating an association with an increased risk for COPD.
Interestingly, the C allele of rs5743890 in the TOLLIP gene had the lowest frequency of all the SNPs included (9%). Table 3 shows the comparison of allele and genotype frequencies between the IPF patients and PAC subjects. A significant difference of ∼10% between cases and controls was found in the frequency of the G allele of the FAM13A rs2609255 (OR = 1.69, CI 95% = 1.11-2.57, p = 0.01). In the same way, the GG genotype was significantly more frequent in the case group (OR = 2.86, CI 95% = 1.26-7.06, p = 0.01), conferring a greater IPF susceptibility. Notably, the G allele association remains significant after adjusting for covariates (p = 0.01). For rs2076295, the preliminary analysis showed that the TT genotype occurs significantly more frequently in cases (OR = 1.79, CI 95% = 1.02-3.12, p = 0.03), conferring a greater risk for IPF development; however, this association does not remain significant after adjustment for covariates.
For the CPFE comparison, statistically significant differences were found for rs2736100, with a higher A allele frequency in controls, associated with a decreased risk for CPFE (OR = 0.24, CI 95% = 0.11-0.55, p < 0.01), while the C allele, with a higher frequency in cases, was associated with an increased risk for CPFE (OR = 4.02, p < 0.01). These data are presented in Table 4.

Genetic Association Models
Codominant, dominant, and recessive models were applied for significant SNPs in the IPF comparison (Tables 5, 6). The rs2609255 association remains in the codominant (OR = 3.29) and recessive (OR = 2.86) models. For rs2076295, the association remains significant when applying the dominant model (OR = 1.90). These models were also applied in the COPD and CPFE comparisons. In the COPD comparison, no significant association was found for any of the SNPs evaluated (Supplementary Tables 2, 3). For the CPFE comparison, the rs2736100 AA genotype association remains in the dominant model (OR = 0.08); in the recessive model, a significant association was also found for the CC genotype (OR = 3.93). Likewise, in the recessive model, we found a significant association for the rs2076295 GG genotype (OR = 5.12), conferring an increased risk for CPFE (Supplementary Tables 4, 5).
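The way genotype counts are collapsed under each genetic model can be sketched as follows. This is a hedged illustration rather than the study's actual procedure: the function names and counts are hypothetical, the counts are not taken from the study tables, and the labeling of the models depends on which allele is designated as the tested (risk) allele, which here is assumed to be the one carried by the last genotype category.

```python
def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed):
    """Crude odds ratio for a 2x2 table (all cells assumed positive)."""
    return (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)


def model_odds_ratios(cases, controls):
    """Odds ratios under codominant, dominant, and recessive models.

    cases, controls: genotype counts ordered as
    (reference homozygote, heterozygote, risk homozygote).
    (Hypothetical helper for illustration only.)
    """
    c_ref, c_het, c_risk = cases
    k_ref, k_het, k_risk = controls
    return {
        # Codominant: each genotype compared with the reference homozygote.
        "codominant_het": odds_ratio(c_het, c_ref, k_het, k_ref),
        "codominant_hom": odds_ratio(c_risk, c_ref, k_risk, k_ref),
        # Dominant: carriers of at least one risk allele vs. reference homozygotes.
        "dominant": odds_ratio(c_het + c_risk, c_ref, k_het + k_risk, k_ref),
        # Recessive: risk homozygotes vs. all other genotypes.
        "recessive": odds_ratio(c_risk, c_ref + c_het, k_risk, k_ref + k_het),
    }


# Illustrative genotype counts only (not study data).
for model, value in model_odds_ratios((20, 50, 30), (40, 45, 15)).items():
    print("%s: OR = %.2f" % (model, value))
```

Under this layout, a genotype such as GG that is significant in the recessive model is being compared against the pooled remaining genotypes, whereas in the codominant model it is compared only against the reference homozygote.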

Univariate and Multivariate Logistic Regression Model
Univariate and multivariate logistic regression analyses were performed for the recessive model of rs2609255 (FAM13A) in the IPF comparison to explore a possible relationship between the genotype and clinical variables associated with prognosis or disease development. The variables considered were the tomographic pattern, BAL cell count, and pulmonary function tests (FVC and DLCO); however, no relationship was found between patients' genotype and the clinical variables (Supplementary Table 6).

DISCUSSION
In this case-control study, we investigated the potential associations of rs2609255 (FAM13A), rs2736100 (TERT), rs2076295 (DSP), and rs5743890 and rs111521887 (TOLLIP) with IPF, COPD, and CPFE syndrome in a Mexican-mestizo population. We found that the rs2609255/G allele (FAM13A) has an independent effect on the risk for IPF in our population. In addition, the rs2076295/TT genotype (DSP) was significantly associated with increased susceptibility for IPF. The rs2609255/TG genotype (FAM13A) is associated with COPD susceptibility. For the CPFE group, we also found a significant association of the rs2076295/GG genotype (DSP) and the rs2736100/CC genotype (TERT) with a higher risk for CPFE syndrome. The prominent and distinctive characteristic of the CPFE syndrome is the marked loss of gas diffusion capacity, represented by DLCO values (25), which is consistent with our results when comparing the DLCO levels among the three case groups.
Regarding the clinical and demographic variables of the patient groups and control subjects, we observed that the CPFE patients presented the lowest DLCO, FEV1, and FVC values when compared with the IPF patients and the PAC subjects. Interestingly, despite the decrease in FEV1 and FVC, the FEV1/FVC ratio was maintained at >70%, another characteristic previously reported in CPFE patients (9,13,25).
Multiple studies have identified variants associated with susceptibility for both COPD and IPF (17,26). However, there are few genetic association studies reported for CPFE. Multiple GWAS have identified common single-nucleotide variants with a MAF>5% associated with IPF in several genes that play a crucial role in different pathways related to disease pathogenesis, among which are MUC5B, FAM13A, DSP, TOLLIP, TERT, TERC, MDGA2, SPPL2C (17,26).
FAM13A (family with sequence similarity 13, member A) is a gene located on cytogenetic band 22 of the long arm of chromosome 4 and is expressed in multiple tissues and cells, such as type 2 epithelial cells and airway macrophages. However, little is known about its biological function (27,28). Genetic variants in FAM13A have been associated with both COPD and IPF (17,27,29,30). In 2016, Hirano et al. reported in a Japanese cohort that the rs2609255 G allele was associated with IPF susceptibility; furthermore, they demonstrated that in the recessive, dominant, and additive genetic association models, the GG genotype was also associated with an increased IPF risk (27). Our IPF group showed higher frequencies of the G allele and the GG genotype of rs2609255, both associated with an increased risk for IPF. However, when adjusting for covariates, only the G allele remained significantly associated. On the other hand, when applying the codominant and recessive association models, the GG genotype remained associated, conferring a higher risk for IPF. Different studies have also described that FAM13A is associated with COPD susceptibility. Wang et al. evaluated 5 SNPs (rs7671167, rs2869966, rs2869967, rs2045517, and rs6830970), showing that they were associated with lower FEV1/FVC values and that rs7671167 conferred a higher risk for COPD. On the other hand, Zhang et al. reported an association between rs17014601 and COPD in additive, heterozygous, and dominant models (31). No association between COPD and rs2609255 has been previously reported; in our case, when analyzing allele and genotype frequencies in the COPD-S and SWOC groups, we found after correction for covariates (age, sex, TI, and BMI) that the TG genotype was associated with an increased risk for developing COPD. This variant does not comply with the HWE in the SWOC group; however, the HWE is not met in multiple populations.
The Mexican mestizo population has high genetic variability due to years of genetic admixture with other populations, such as European, Amerindian, and Asian, so several SNPs probably do not behave in the way previously described in the literature (32,33). FAM13A is associated with the Wnt signaling pathway; in a COPD animal model, it promotes β-catenin degradation and decreases Wnt signaling, whereas in fibrosis, no mechanism by which FAM13A may condition susceptibility has been proposed. In IPF patients, there is an increase in FAM13A protein, inducing fibroblast migration (27), which suggests that FAM13A plays an important role in IPF pathogenesis.
In the crude analysis of genotype frequencies, another significant association was found within the IPF patient group: the rs2076295 TT genotype of DSP is associated with increased susceptibility to IPF. Previous studies showed an association between intronic variants of the DSP gene and IPF. Mathai et al. reported that the rs2076295 G allele in intron 5 confers a higher risk in the Caucasian population and that the minor allele was associated with lower DSP expression in the lung (34). In our IPF patient group, the TT genotype is associated with higher risk, and this association remains significant when applying the dominant model. The DSP gene encodes desmoplakin, a binding protein present in desmosomes; rs2076295 is considered a binding site for the transcription factor PU.1, which is involved in macrophage activation and airway remodeling (34,35).
Alterations in telomeres and their length are etiological factors of multiple diseases. For example, rs2736100 of the TERT gene, a component of the telomerase enzyme, is strongly associated with telomere length and increased susceptibility for IPF (36). Previous studies have shown that the A allele of this variant is associated with shorter telomere length in peripheral blood leukocytes (37,38). On the other hand, this same variant is associated with increased telomere length in patients with lung cancer (37).
Mutations in TERT have also been associated with the development of emphysema and COPD. For example, Ding et al. (38) reported an increased risk for COPD with several SNPs, including rs10069690, rs2853677, and rs2853676. In addition, Stanley et al. (39) found mutations in TERT present in female smokers with severe emphysema and a tendency to the presence of pneumothorax.
Our study found an association of the C allele of rs2736100 of TERT and the CC genotype in the recessive and codominant model, giving increased susceptibility for CPFE; this is the first significant association between rs2736100 and CPFE syndrome described.
It should be noted that in the IPF group, this variant is associated when comparing allele frequencies. The A allele is associated with increased risk for IPF while the C allele with CPFE risk; we could speculate then that there is a differential genetic profile between these two conditions. CPFE syndrome is a relatively new and poorly described entity; there are still many unknowns about its etiopathogenesis. The participation of immunological, inflammatory, and genetic factors has been proposed in several reviews; however, being such a rare entity, few studies have been reported to date. Although we have a limited sample size for this entity, we could observe a difference in the behavior of the SNPs analyzed in each disease, conferring risk for one and not for the other two.
Patients with IPF have a poor prognosis, with a survival of about 5 years from diagnosis. Previous studies have searched for relationships between the genetic variants analyzed and clinical variables associated with a worse prognosis in patients, such as DLCO values, the extension of the tomographic pattern, and BAL cell count. For example, in IPF patients, Wang and collaborators (36) reported that the CC + CT model of rs868903 of MUC5B was associated with shorter survival, shorter telomere length, and greater extension of the tomographic pattern (honeycomb). In addition, Bonella et al. reported an association between the C allele of rs5743890 of the TOLLIP gene and lower survival and disease progression. On the other hand, the T allele of rs2609255 is associated with lower DLCO levels and lower survival (40). We performed univariate and multivariate logistic regression models looking for a relationship between the TT + TG vs. GG model and variables associated with prognosis or development (tomographic pattern, BAL cell count, and spirometry values); however, no significant association was identified, possibly due to our sample size. Therefore, additional investigations should be done to identify factors associated with a worse prognosis.
The well-defined case groups (clinically and tomographically) and the 2:1 control-to-case ratio are some of the strengths of our research. Besides, the PAC group is ideal for making comparisons since it comprises aging subjects without evidence of lung disease, avoiding potential confounding factors. In addition, there are no previous studies in which the genetic variants of both entities involved in CPFE syndrome are compared.
Our study is not free of limitations. For example, the sample size was small compared to other studies such as GWAS. All the participants were recruited from a single center, and the CPFE recruitment was suspended due to the COVID-19 lockdown. We also had limitations regarding the tomographic data: only the diagnostic tomography was considered when analyzing genotypes and variables associated with a worse prognosis, because several patients did not have more than one tomography during follow-up. Besides, we only considered whether the usual interstitial pneumonia pattern (honeycomb) was typical or not, without considering its extension in the analysis. More studies with a larger number of patients are needed to corroborate these findings.
In summary, the rs2736100 C allele of TERT is associated with decreased IPF risk and confers an increased risk for CPFE; conversely, the A allele is associated with increased IPF susceptibility, while in the CPFE comparison it acts as a protective factor (OR <1.0). Also, the rs2076295 TT genotype of DSP is associated with increased IPF risk, while the GG genotype is associated with CPFE susceptibility. The rs2609255 G allele and GG genotype of FAM13A are associated with IPF susceptibility, while the TG genotype is present in patients with emphysema and confers COPD susceptibility. These findings support the hypothesis that there is a differential genomic profile between COPD patients with emphysema, IPF, and CPFE. Our findings point to different molecular pathways among the three diseases and to a role of SNP variants in the pathogenesis of CPFE; these variants could even be used as genetic markers to differentiate these patients.
In conclusion, we described for the first time that some polymorphisms in the TERT and DSP genes are associated with a higher risk for CPFE. Interestingly, one of these SNPs is associated with reduced risk for IPF while not associated with COPD. These findings suggest the existence of a differential genomic profile between COPD, IPF, and CPFE syndrome. However, more studies are needed to elucidate the role of genetic variants in the development of CPFE.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Committees for Research, Ethics in Research, and Biosecurity of the INER (approval number: C09-19 and C39-14). The patients/participants provided their written informed consent to participate in this study.