Polymorphisms of Antigen-Presenting Machinery Genes in Non-Small Cell Lung Cancer: Different Impact on Disease Risk and Clinical Parameters in Smokers and Never-Smokers

Lung cancer is strongly associated with cigarette smoking; nevertheless, some never-smokers develop cancer. Immune eradication of cancer cells is dependent on polymorphisms of HLA class I molecules and antigen-processing machinery (APM) components. We have already published highly significant associations of single nucleotide polymorphisms (SNPs) of the ERAP1 gene with non-small cell lung cancer (NSCLC) in Chinese, but not in Polish populations. However, the smoking status of participants was not known in the previous study. Here, we compared the distribution of APM polymorphic variants in larger cohorts of Polish patients with NSCLC and controls, stratified according to their smoking status. For all tested ERAP1 SNPs but one (rs26618), namely rs26653, rs2287987, rs30187, and rs27044, we found significant but opposite associations in never-smokers and in smokers. No significant associations were seen in other genes. Haplotype analysis indicated that the distribution of many ERAP1/2 haplotypes is opposite, depending on smoking status. Additionally, the haplotypic combination of low-activity ERAP1 and the lack of an active form of ERAP2 seems to favor the disease in never-smokers. We also revealed associations of some APM polymorphisms with age at diagnosis (ERAP1 rs26653), disease stage (ERAP1 rs27044, PSMB9 rs17587), overall survival (ERAP1 rs30187), and response to chemotherapy (ERAP1 rs27044). The results presented here suggest an important role for ERAP1 in the anti-cancer response, which differs between smokers and never-smokers, depends to some extent on the presence of ERAP2, and affects the clinical course of NSCLC.


INTRODUCTION
Lung cancer is the most common cause of cancer death. Smoking is considered to be its predominant risk factor (1,2). The ratio of mortality to incidence is 0.87, so most patients eventually die of the disease (3). In addition, survival is inversely related to smoking, i.e., it decreases with the growing number of pack-years (1). However, epidemiological data indicate that other risk factors exist. First, only about 15% of smokers get lung cancer (2). Second, this disease also appears, albeit much less frequently (10-25% of all lung cancer cases), among never-smokers (4-6). Although lung cancer develops in never-smokers worldwide, geographic variation is striking, with 30 to 40% of Asian patients with lung cancer being never-smokers, compared with 10 to 20% of Caucasian patients (2). Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all cases of lung cancer, especially in never-smokers (almost all cases), and adenocarcinoma is a major type of NSCLC (7-9). However, accurate data on the incidence of lung cancer in never-smokers are rare due to the lack of information about smoking habits in the majority of cancer registries (10).
Malignant cells are eliminated by both innate and adaptive immunity, which are dependent on the HLA class I (HLA-I) molecule expression level on the surface of tumor cells (11). Innate natural killer cells, with their KIR or NKG2/CD94 receptors, recognize the markedly reduced amount or lack of HLA class I molecules which frequently happens on tumor cells (12). The efficient adaptive immunity against tumors is mediated by CD8+ cytotoxic T lymphocytes (CTLs) which recognize tumor neoantigens presented to their specific T cell receptors by HLA-I molecules (13). Antigen presentation by HLA-I glycoproteins depends (i) on genetics of the extremely polymorphic HLA-I molecules themselves, (ii) on differential expression and/or activity of antigen-processing machinery (APM) elements due to some polymorphisms of their genes, and (iii) on regulatory effects of other molecules. Associations of HLA and other genes with human diseases, including neoplasms, have already been studied for decades (14).
Peptides bound and presented by HLA-I molecules are prepared by the antigen processing machinery (APM) composed of multiple components. Proteins synthesized in the cytosol, but not transported to the site of their destiny, are polyubiquitinated and degraded in the proteasome, a multiprotein proteolytic complex in the cytosol. Under conditions of immune response, some protein subunits of the proteasome are replaced by functionally different counterparts: PSMB9, PSMB8, and PSMB10 (called PSMB for "proteasome 20S subunit beta" or, according to older nomenclature, LMP2, LMP7, and LMP10 for "low-molecular-mass polypeptide"), and such a protein complex is called the "immunoproteasome". Both the proteasome and the immunoproteasome proteolytically cut protein chains into peptides which are transported to the endoplasmic reticulum by the transporter associated with antigen processing (TAP) molecules (consisting of TAP1 and TAP2 subunits) (13). However, peptides produced by the immunoproteasome fit TAP-mediated transport and MHC-I binding better than peptides produced by a common proteasome (13,15). Then, ERAP1 (endoplasmic reticulum aminopeptidase associated with antigen presentation 1) and ERAP2 trim the transported peptides (if these are too long) to a suitable length (usually 8-10 amino acids). Empty HLA class I molecules bind the peptides with the help of other molecules present in the endoplasmic reticulum. Finally, HLA-I/peptide complexes are transported to the cell surface, where they may be recognized by CD8+ T cells via their specific T cell receptors (13-15) or by NK cells via KIRs and CD94/NKG2 (16,17).
Defects in APM may enable tumor cells to evade T lymphocyte-mediated recognition and lysis, which has already been shown for multiple tumor types (13). Disturbances of APM component expression have been shown for several types of human tumors (18). For example, PSMB, TAP, and/or ERAP molecules were down- or up-regulated in different cancers (19-27). Similarly, loss, retention or acquisition as well as imbalances of ERAP1 and ERAP2 expression in different solid tumor tissues as compared to normal counterparts have been observed (28), which may affect disease outcome and response to treatment (29).
Importantly, genes encoding APM molecules exhibit some levels of polymorphism, and some genetic variations may result in changes in expression or activity of these molecules (14). Moreover, some variants were found to be associated with human diseases, including neoplasms (30). For example, PSMB8 variant encoding a protein with lysine in position 49 (PSMB8-K) has a lower mRNA stability and a lower interferon-gamma-mediated induction than an alternative form with glutamine in this position (PSMB8-Q) and was associated with colon cancer (31). Although polymorphism of TAP1 and TAP2 genes is relatively low, some variants have been associated with cancer risk. For instance, the rs1057141/TAP1-I393V variation was positively associated with multiple myeloma (32) and higher risk of cervical intraepithelial neoplasia after HPV infection (33). Furthermore, two TAP1 polymorphisms: rs1057141 and rs1135216 [D697G] were associated with high-grade cervical intraepithelial carcinoma (34). With regard to TAP2, heterozygous genotype GA of the SNP rs2228396 [A565T] was associated with susceptibility to chronic lymphoid leukemia (CLL), whereas rs241447 G allele [T665A] was a risk factor for chronic myeloid leukemia and multiple myeloma alongside CLL (32). ERAP genes, particularly ERAP1, are polymorphic, which influences their expression, activity and substrate specificity (35,36) as well as the HLA class I-bound immunopeptidome repertoire (37). According to this, several ERAP1 polymorphisms were described as affecting susceptibility to cervical carcinoma in Dutch (38) and Indonesian (39) populations.
Polymorphisms of antigen-processing machinery genes have not so far been tested in non-small cell lung cancer, except for our previous report on the ERAP1 gene showing highly significant associations in Chinese but not in Polish populations (40). However, smoking status was not established in these two cohorts. Therefore, we hypothesized that genetic polymorphisms of the APM components (PSMB8, PSMB9, TAP1, TAP2, ERAP1, ERAP2) might be associated with NSCLC also in Poles, but rather in never-smokers, who are devoid of this strong environmental factor. Results presented here show that for ERAP1, four out of the five examined SNPs were significantly associated with NSCLC, not only in never-smokers but, surprisingly, also in smokers, although in opposite directions. In addition, three studied polymorphisms in ERAP1 and one in PSMB9 were also associated with some clinical characteristics.

Study Subjects
A total of 464 newly diagnosed patients with pathologically documented NSCLC (according to WHO criteria) were enrolled in our study by the Department of Pulmonology and Lung Cancer, Wrocław Medical University, Wrocław and by the Thoracic Surgery Center, Lower Silesian Centre of Lung Diseases, Wroclaw. The histological type of lung cancer was identified according to the World Health Organization (WHO 2015) classifications. Pathologic stages were determined according to the International System for Staging Lung Cancer (41) as described (42). According to histopathological reports, NSCLC included adenocarcinoma (AC), squamous cell carcinoma (SCC), and a few cases of large cell carcinoma. In several cases, a detailed subtype of NSCLC was not determined due to small cytological specimen and/or only palliative regimen of treatment. NSCLC patients with a history of primary cancer other than lung cancer were excluded from this study. The study also did not include palliative patients whose stage of the disease was not established due to the general condition and/or coexisting serious unstable diseases. Advancement of the disease was determined according to the TNM staging system (43) based on radiological exam and endoscopic techniques. Chest computed tomography, PET-CT, and as needed brain MRI/CT, and bone scintigraphy were done. All patients underwent bronchofiberoscopy. If it was necessary to verify the status of mediastinal lymph nodes, additional EBUS-TBNA or mediastinoscopy was applied. In the case of curative surgery the definite clinical stage was verified by histopathological examination of postoperative specimens especially those of the lymph nodes. The ECOG scale was used to assess the general condition and quality of life of patients. 
Patients underwent surgery with adjuvant or neoadjuvant chemotherapy (two/three courses given at three-week intervals) as needed, radiotherapy, radiochemotherapy, chemotherapy, or palliative treatment only, depending on disease stage based on local recommendations. The treatment response was based on radiological examinations including mainly chest computed tomography evaluation (CT, Response Evaluation Criteria in Solid Tumors) or chest X-rays, if chest CTs were not available, with clinical and radiological data monitoring appearance of distant metastases (other lung, pleura, central nervous system, bones, adrenal glands) in advanced stage. Tumor response was assessed every 2 months during the first year after diagnosis, every 3 months between 12 and 18 months, and thereafter the interval of assessment was at the physician's discretion. Overall survival was assessed from the date of NSCLC diagnosis until death from any cause or until 2 years when data collection was finished. Detailed characteristics of the patients are shown in Table 1.
A total of 409 unrelated healthy Polish individuals (42 females and 367 males, for details see Supplementary Table 1), mostly from the same geographic region (Lower Silesia), were taken as a control group.
Both patients and controls were interviewed about their history of smoking and divided into never-smokers (0 pack-years) and smokers (present or past smokers, the latter quitting at least one year before diagnosis). Additionally, for each patient we had information about the number of pack-years. Information about smoking history for patients and controls is presented in Table 1 and in Supplementary Table 2, respectively. As can be seen from Table 1, women with NSCLC generally smoked fewer cigarettes than men (e.g., 47.1% of women smoked up to 11-20 pack-years, while among men it was only 21.4%, i.e., almost 80% of men smoked more than 11-20 pack-years), had lower disease stage, had slightly lower values on the ECOG scale, and had lower risk of death than men of the same age and the same disease stage.
Strength of association was measured with odds ratios (ORs) for smokers and never-smokers with 95% confidence intervals. In the case of stable association among strata, i.e., $OR_{smokers} \approx OR_{never\text{-}smokers}$, association was measured with the common odds ratio, estimated with the Mantel-Haenszel estimator (OR.MH). OR.MH is useful when the OR seems stable among smokers and never-smokers. This homogeneity of ORs was tested as hypothesis $H_0$: $OR_{smokers} = OR_{never\text{-}smokers}$ vs. $H_1$: $OR_{smokers} \neq OR_{never\text{-}smokers}$, and the p-value is reported ($p_{MH}$) as the result of the Breslow test. If the true odds ratios among the two strata are not identical but do not vary much, OR.MH is still a useful summary of the conditional associations between SNP and risk of cancer. Departure from the Hardy-Weinberg equilibrium (HWE) was tested with the chi-square test and measured as $f = \frac{p_{cc} - p_c^2}{p_c (1 - p_c)}$, where $p_c$ and $p_{cc}$ are the allele c and genotype cc frequencies. $f < 0$ in the case of a deficiency of homozygotes, $f > 0$ corresponds to a deficiency of heterozygotes, and $f = 0$ when the locus is in HWE. Difference in genotype distributions between patients and controls was tested with the chi-square test on one degree of freedom (score test). Based on properties of gamma family distributions, the null hypothesis $H_0$: all four ORs = 1 both in the smokers and in the never-smokers group, against $H_1: \exists\, i \in \{1, 2, 3, 4\}: OR_i \neq 1$, was tested with a chi-square statistic on k = 2 degrees of freedom and reported in Table 2 as P*. The global statistic for association of the SNPs of a gene with the risk of cancer has a $\chi^2_{2 \times L}$ distribution, where L is the number of analyzed SNPs in the gene. This procedure tests $H_0: \forall\, l \in \{1, 2, \ldots, L\} \wedge \forall\, i \in \{1, 2, 3, 4\}: OR_{l,i} = 1$ against $H_1: \exists\, l \wedge \exists\, i: OR_{l,i} \neq 1$. Basic statistics used to describe the main variables considered in the paper were median, first and third quartiles (Q1, Q3), minimal and maximal observations.
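The stratified measures described above can be illustrated with a minimal sketch. All genotype counts below are hypothetical and chosen only to show the arithmetic; the function names are illustrative and are not the software used in the study. The example also shows why a common odds ratio can hide opposite effects in the two strata.

```python
def odds_ratio(a, b, c, d):
    """OR of a 2x2 table: a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel common OR; each stratum is a tuple (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def hwe_f(n_cc, n_cx, n_xx):
    """f = (p_cc - p_c^2) / (p_c * (1 - p_c)) from genotype counts
    (cc homozygotes, heterozygotes, other homozygotes)."""
    n = n_cc + n_cx + n_xx
    p_cc = n_cc / n
    p_c = (2 * n_cc + n_cx) / (2 * n)
    return (p_cc - p_c ** 2) / (p_c * (1 - p_c))


# Hypothetical counts, not data from the study.
smokers = (60, 40, 50, 50)        # OR = 1.5 (risk)
never_smokers = (20, 40, 40, 40)  # OR = 0.5 (protection)

or_s = odds_ratio(*smokers)
or_ns = odds_ratio(*never_smokers)
or_mh = mantel_haenszel_or([smokers, never_smokers])
print(f"OR smokers = {or_s:.2f}, OR never-smokers = {or_ns:.2f}, OR.MH = {or_mh:.3f}")

# f = 0 in HWE; f > 0 when heterozygotes are deficient.
print(hwe_f(25, 50, 25), hwe_f(40, 20, 40))
```

With these hypothetical counts the pooled OR.MH is close to 1 although the stratum-specific ORs point in opposite directions, which is why the homogeneity test is reported alongside the common odds ratio.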
Median was used as a location parameter, and the $S_n$ statistic was computed as a measure of variability (44): $S_n = \mathrm{med}_i \{ \mathrm{med}_j\, |x_i - x_j| ;\ j = 1 \ldots n \}$, which is an average difference between two randomly sampled observations. Expected age of diagnosis depending on genotype in SNPs related with risk of NSCLC was analyzed with the F-test of ANOVA for contrasts constructed based on the risks (ORs) of NSCLC and adjusted to sex and smoking status. $R^2$ is the determination coefficient of the model. Haplotype frequencies were estimated iteratively by maximization of a posteriori probabilities. The test for association of NSCLC with haplotypes is chi-squared distributed. Survival was analyzed with proportional hazards models, with the survival function estimated with the Kaplan-Meier estimator. Stage of NSCLC was analyzed with Monte Carlo simulation and tested with the Mahalanobis $D^2$ statistic. Response to chemotherapy was analyzed with a logistic model where Y is the type of response, r is the score for the type of response (progressive disease, stable disease, partial response, or complete response), x is the vector of predictors, and h is the logit function. In the case of small samples, the 95% confidence interval (CI95%) was estimated with the bootstrap method.
Table footnote: Strength of association is measured with odds ratios (OR) for smokers (S) and never-smokers (NS) with 95% confidence interval. OR.MH is the common odds ratio estimated with the Mantel-Haenszel estimator, which is useful when the OR seems stable among smokers and never-smokers. This homogeneity of ORs was tested as $H_0$: $OR_{smokers} = OR_{never\text{-}smokers}$ vs. $H_1$: $OR_{smokers} \neq OR_{never\text{-}smokers}$, and the p-value is reported ($p_{MH}$). If the true odds ratios among the two strata are not identical but do not vary much, OR.MH is still a useful summary of the conditional associations between SNP and risk of cancer. The table also presents results of testing the hypothesis $H_0$: there are no associations between genotype and risk of cancer, i.e., all ORs = 1, against the alternative $H_1$: $H_0$ is false, and reports them as P*-values.
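The $S_n$ variability measure named in the methods can be sketched directly from its definition as a median of medians of pairwise absolute differences. This is a naive version written from the formula as given in the text; it omits the consistency constant and the low/high-median refinements of the published estimator, and the sample values are hypothetical.

```python
from statistics import median


def s_n(xs):
    """Naive S_n: for each observation take the median absolute difference
    to all observations, then take the median of those values."""
    return median(median(abs(xi - xj) for xj in xs) for xi in xs)


# Hypothetical ages; the second sample contains one gross outlier.
clean = [1, 2, 4, 7, 11]
contaminated = [1, 2, 4, 7, 1000]

print(s_n(clean), s_n(contaminated))
```

Both samples give the same value here, which illustrates why a median-based scale estimate may be preferred over the standard deviation when outliers are possible.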
Supplementary Table 4 and Supplementary Figure 1 present power estimation of the test for association between genotype and risk of cancer.
Therefore, in never-smokers the presence of one C allele reduced the disease risk by a factor of two. As shown in Table 2, homogeneity of odds ratios among smokers and never-smokers was tested. In the case of rs26653, the p-value is reported as $p_{MH}$ = 0.014. Again, it is clear that the association between this SNP and NSCLC risk is not the same in smokers and never-smokers. We also tested the null hypothesis that there is no association between genotype in this SNP and risk of cancer either in the smokers or in the never-smokers. The test for this null hypothesis gives the result P* = 0.011. In conclusion, there is a relationship between genotype in rs26653G>C and risk of NSCLC, and this relationship depends on smoking status. Similarly, in the case of rs30187 and rs27044, we observed that along with the growing number of minor alleles the risk of NSCLC increased in smokers whereas it decreased in never-smokers. These results are not surprising because of the existence of linkage disequilibrium between the analyzed polymorphisms in ERAP1 (see Supplementary Table 7). In contrast, for rs2287987 we observed a reverse effect, as minor homozygotes protected smokers and predisposed never-smokers (Table 2).

Genetic Data
The global test for association of all ERAP1 polymorphisms with lung cancer gives $\chi^2_{df=10} = 32.6$, $p = 0.000314$. It is worth noting that this association can be discovered in the Caucasian population only after taking into account information about smoking history.
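The reported p-value can be checked approximately from the test statistic alone. The sketch below uses the closed-form upper tail of the chi-square distribution for an even number of degrees of freedom, so it needs only the standard library; the function name is illustrative. Because the statistic is given rounded to 32.6, the result agrees with the reported value only to about two significant digits.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Global ERAP1 test: five SNPs, 2 degrees of freedom each.
p_global = chi2_sf_even_df(32.6, 10)
print(f"p = {p_global:.6f}")
```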

ERAP2 rs2248374
The rs2248374G>A was distributed similarly in smoking and never-smoking patients and controls (Table 2), therefore not showing any significant association with NSCLC by itself, but seemed to influence the effect of ERAP1 (see below). We also investigated another polymorphism of ERAP2, namely rs2549782. However, because of the nearly complete LD detected between both SNPs in our population ($R^2$ = 0.99), we did not include rs2549782 in the statistical analysis.

ERAP1,2 Haplotype Distribution in Smoking and Never-Smoking Patients and Controls
As genes for ERAP1 and ERAP2 aminopeptidases are located close to each other on chromosome 5, and both enzymes may function not only separately but also in a complex (45, and references therein), which can change their activity and substrate specificity, we analyzed frequencies of ERAP1 (five SNPs) plus ERAP2 (one SNP) haplotypes in our groups of patients and controls depending on their smoking status. Details of this analysis are shown in Tables 3 and 4. Associations of individual haplotypes with NSCLC were different between smokers and never-smokers. Overall, our analysis suggested that observable effects of haplotypes, like those of individual SNPs, were more distinct in never-smokers. In this group, we detected a significant difference in the frequency of haplotypes between patients and controls ($\chi^2_{df=14} = 24.57$; $p = 0.039$) (Table 4). The haplotype rs26653G-rs26618T-rs2287987C-rs30187C-rs27044C-rs2248374G increased the risk of NSCLC nearly two-fold (OR = 1.99), whereas the presence of the haplotype rs26653C-rs26618T-rs2287987T-rs30187T-rs27044G-rs2248374G decreased the disease risk two-fold (OR = 0.52).
Although none of the haplotypes reached a level of significance in smokers, the effects of some haplotypes were quite different in smokers and never-smokers. The most striking difference was seen for the haplotype rs26653G-rs26618C-rs2287987T-rs30187C-rs27044C-rs2248374G, which increased the disease risk in never-smokers nearly four-fold (OR = 3.71), but turned out to be protective in smokers (OR = 0.56) (Tables 3 and 4). Of note, the other haplotype, rs26653G-rs26618C-rs2287987T-rs30187C-rs27044C-rs2248374A, possessing the same alleles in all ERAP1 SNPs and differing only by the ERAP2 allele, was not associated with NSCLC independently of smoking status.
The effects of individual ERAP1,2 haplotypes on NSCLC risk in never-smokers and smokers are shown also in Figure 1, where the effect size of single haplotypes was measured with log ORs. It clearly shows that the majority of haplotypes behaved in an opposing way depending on smoking status.

TAP1 rs1135216 and rs1057141, TAP2 rs4148876, rs1800454, rs241447 and rs16870908, PSMB9 rs1351383 and rs17587, PSMB8 rs2071543
All these SNPs were distributed similarly in smoking and never-smoking patients and controls (Supplementary Table 5).

Age at Diagnosis and ERAP1 Polymorphism
Our analysis revealed that ERAP1 polymorphism influenced age at diagnosis (AAD) of NSCLC depending on smoking status. Figure 2 and Supplementary Table 6 show the expected age of diagnosis for each ERAP1 SNP genotype that was calculated on the basis of observed values, adjusted to gender and smoking. Overall, SNP genotypes associated with disease risk in smokers (see Table 2) were also associated with earlier AAD in this group, whereas opposite associations of the same genotypes in never-smokers correlated with later age of diagnosis (Figure 2). The strongest relationship was found for ERAP1 rs26653 (p = 0.0087). In smokers, the disease risk for genotypes GG and GC was nearly the same (OR = 1.0 and 1.01 respectively, Table 2), and the ... A similar effect to that observed for rs26653, albeit weaker, was demonstrated for rs30187 and rs27044. Thus, rs30187TT and rs27044GG, bearing some disease risk in smokers, were associated with lower age at diagnosis values, whereas the opposite was true for the same genotypes in never-smokers (Figure 2).
It should be noted here, however, that the $R^2$ coefficient of determination values for all tested ERAP1 SNPs were between 0.022 and 0.032 (Supplementary Table 6), which means that genotypes of ERAP1 SNPs in combination with the information about gender and smoking status explain only 2-3% of the variability of the age of onset.
SNPs in other APM genes did not exhibit any associations with age at diagnosis.

Stage of the Disease, ERAP1 and PSMB9
Two SNPs revealed associations with the disease stage: rs27044 in ERAP1 and rs17587 in PSMB9 (Table 5). We noticed that the main division for cut-off of age at diagnosis was 62.5 years. Generally, the mean disease stage was lower in patients diagnosed later (i.e., above 62.5 years) and corresponded to a value of 2.55 on the severity scale 1-4. In patients who were diagnosed with NSCLC before reaching the age of 62.5 years, the average disease severity was 2.88.
FIGURE 1 | Haplotype effect size of risk of NSCLC measured with log OR and sorted by effect in never-smokers group. A negative value of log OR (log OR < 0) corresponds to a protective effect of haplotype (OR < 1), while log OR > 0 corresponds to a predisposing haplotype effect (OR > 1). Differences in haplotype distributions between cases and controls were tested for smokers (p = 0.3742) and never-smokers (p = 0.03906).
Additionally, in patients diagnosed before the age of 62.5 years, the value of the mean stage of disease was different depending on the PSMB9 rs17587 genotype: GA heterozygotes had a lower value (2.65) than both homozygotes (AA and GG, 3.04). When the GA heterozygote group was divided into those with age at diagnosis of 54.5 years or less, and those above 54.5 years, the first group had a mean disease stage of 3.19, whereas the second group had only 2.49 (Table 5).
On the other hand, in the group of patients diagnosed at an age above 62.5, the mean disease stage depended not on the PSMB9 rs17587, but on the ERAP1 rs27044 genotype. In this case, CG heterozygotes had a lower mean disease stage (2.32) than both homozygotes (2.71) (Table 5). Figure 3 shows the mean expected disease stage in relation to age at diagnosis and both polymorphisms: PSMB9 rs17587 and ERAP1 rs27044. It shows that the mean stage of disease decreased with age at diagnosis, but only in the group of ERAP1 rs27044 CG heterozygotes (left panel). This decrease was stronger in PSMB9 heterozygotes than in both homozygotes. In contrast, in the group of both ERAP1 rs27044 homozygotes, the age at diagnosis was not important for disease stage in PSMB9 heterozygotes, but the mean stage of disease increased with the age at diagnosis in PSMB9 homozygotes (right panel). Other genetic markers were not associated with the stage of disease. Gender and smoking also did not play a role as factors affecting stage of NSCLC, whether they were considered independently or together with age at diagnosis and genetic polymorphisms (data not shown).

Surgery, ERAP1 rs30187, and Overall Survival
In this study, the overall survival (OS) of NSCLC patients depended on several factors: disease stage, gender, treatment, and ERAP1 rs30187 polymorphism.
In our group of patients we observed that the risk of death increased with the severity of disease: with each point on the scale of disease stage this risk increased $HR_{stage}$ = 1.53 times [CI95% (1.22; 1.92), p = 0.00022]. Therefore, a patient in disease stage III had an $HR_{III\ vs\ I\ stage}$ = 2.34 times higher death risk than a patient in disease stage I. A patient in disease stage IV had an $HR_{IV\ vs\ I\ stage}$ = 3.58 times higher death risk than a patient in stage I.
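The stage-to-stage hazard ratios above follow from the per-stage estimate if stage enters the proportional hazards model as a linear score, so that the hazard ratio between two stages is the per-stage ratio raised to the stage difference. That linearity is an assumption of this sketch, although it reproduces the reported values; the function name is illustrative.

```python
HR_PER_STAGE = 1.53  # per-stage hazard ratio reported in the text


def hr_between_stages(stage_a, stage_b, hr_per_stage=HR_PER_STAGE):
    """Hazard ratio of stage_a relative to stage_b, assuming the log-hazard
    is linear in the stage score (1-4)."""
    return hr_per_stage ** (stage_a - stage_b)


hr_3_vs_1 = hr_between_stages(3, 1)
hr_4_vs_1 = hr_between_stages(4, 1)
print(f"HR III vs I = {hr_3_vs_1:.2f}, HR IV vs I = {hr_4_vs_1:.2f}")
```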
Generally, men had a higher risk of death than women. At any time from diagnosis, the death risk for a man was $HR_{man}$ = 1.63 times higher than that of a woman of the same age, with the same disease stage and the same ERAP1 rs30187 genotype [CI95% (1.09; 2.43), p = 0.0161].
Surgery was the most important factor for a patient's survival. When we compare two patients of the same gender, at the same disease stage and with the same rs30187 genotype, the individual who did not have surgery had a death risk $HR_{surgery}$ = 3.88 times higher than a person who underwent surgery [CI95% (2.33; 6.48), $p = 1.03 \times 10^{-7}$].
Finally, the ERAP1 rs30187 genotype also affected survival: patients carrying the T allele (TT homozygotes or CT heterozygotes) had an $HR_{ERAP1\ rs30187\ C>T}$ = 1.46 times higher risk of death than patients carrying the CC genotype with the same disease stage, gender, age at disease diagnosis, treatment and other features [CI95% (0.82; 2.68), p = 0.0972]. This effect was relatively strong: the death risk for a person with genotype CT or TT in comparison with CC was as high as the death risk of an individual in stage III compared to a patient in stage II (see above). Table 6 shows the survival of patients 5, 12, and 21 months after diagnosis depending on surgery as the strongest factor affecting survival, and on ERAP1 rs30187 genotype. We observed that in the group of patients who underwent surgery, higher percentages of those with the ERAP1 rs30187CC genotype survived after 5, 12, and 21 months as compared to those with the CT or TT genotype. Interestingly, even in the group of patients who did not qualify for surgery, there was a similar difference between CC versus CT + TT genotypes in months 5 and 12. After 21 months the growth of a tumor was so advanced that the rs30187 genotype had no effect. The data for Table 6 were derived from the survival curves shown in Figure 4. The left panel (A) of the figure presents survival curves of patients who underwent surgery or who were not qualified for it, divided according to their ERAP1 rs30187 genotype. We were able to note, firstly, that survival of non-operated patients was much shorter than that of those who underwent surgical intervention. Secondly, survival of CC homozygotes was better than that of patients with CT and TT genotypes. This effect was much stronger for patients who underwent surgery than for non-operated patients; importantly, the effect was detectable even for the latter during the first three months of observation.
The right panel (B) presents an estimation of cumulative hazard function (only for patients who underwent surgery) confirming previously described relationship between the possession of CT or TT genotype and higher risk of death in comparison to CC genotype during the whole observation period.
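For readers unfamiliar with the two quantities plotted in such figures, the following minimal sketch computes a Kaplan-Meier survival estimate together with a cumulative hazard. The follow-up times are hypothetical, and the Nelson-Aalen form of the cumulative hazard is an assumption of this sketch, since the text does not state which estimator was used.

```python
def km_and_cumulative_hazard(times, events):
    """Return a list of (time, S(t), H(t)) at each distinct death time.
    times: follow-up in months; events: 1 = death, 0 = censored.
    S(t) is the Kaplan-Meier estimate, H(t) the Nelson-Aalen estimate."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, cum_haz = 1.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            cum_haz += deaths / n_at_risk
            curve.append((t, surv, cum_haz))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data, not taken from the study.
months = [3, 5, 5, 8, 12, 21]
died = [1, 1, 0, 1, 0, 1]
curve = km_and_cumulative_hazard(months, died)
for t, s, h in curve:
    print(f"t = {t:>2} months  S(t) = {s:.3f}  H(t) = {h:.3f}")
```

A group with a consistently higher cumulative hazard curve, as described for CT and TT carriers, has a higher estimated risk of death over the whole observation period.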

Response to Chemotherapy and ERAP1 rs27044
Only one SNP, rs27044 in ERAP1, was associated with response to chemotherapy. Namely, the heterozygous CG genotype was associated with better response, whereas both homozygotes responded worse (Table 7). That is, only 20.0% of heterozygotes (CG), but 30.12% of homozygotes (CC and GG), had tumor progression despite chemotherapy. Complete response was achieved by 26.7% of heterozygotes but only 16.87% of homozygotes. Therefore, a heterozygote had an OR = 1.76 times higher chance of responding better to chemotherapy than a homozygote (p = 0.0698) adjusted for disease stage,
FIGURE 3 | Average stage of NSCLC according to the age at diagnosis and two polymorphisms in genes PSMB9 rs17587 and ERAP1 rs27044. See also Table 5 for comparison of differences among two SNPs. Hypothesis H0: There is no difference between genotypes among PSMB9 (red vs. blue line) as well as there is no difference between genotypes of ERAP1 was tested against H1: This is not true at least in one gene; p = 0.000299.

DISCUSSION
Gene expression profiles of cancers arising in smokers and never-smokers are very different (8,46,47). Mutations in lung cancer cells also differ between smokers and never-smokers; for example, those in TP53 and KRAS genes are more frequent in smokers, whereas lung cancer in never-smokers is characterized by EGFR TK mutations and ALK, RET and ROS fusions. Mutation numbers are also lower in never-smokers, but most of them are suspected to be causative for malignant transformation, whereas mutations in smokers, although more numerous, are believed to be mostly passengers without effect on transformation (48). Moreover, it has been shown that multiple germ-line genetic polymorphisms affect susceptibility to lung cancer differently in smokers and never-smokers. Thus, the effects of polymorphisms in DNA repair genes were found to depend on smoking status: for example, polymorphisms of XRCC1 and ERCC2, involved in decreased adduct removal, were associated with risk in never or light smokers, but tended to be protective in heavy smokers. The authors explain this paradoxical finding as follows: "Although the reasons for these differences are not clear, it is possible that in heavy smokers, the effects of the DNA repair polymorphisms are overwhelmed" (8). Therefore, lung cancers in smokers and never-smokers are, according to many authors, two different diseases (49). It is conceivable, therefore, that genetic associations of APM polymorphisms with NSCLC in smokers (where tobacco smoke is the strongest factor) and never-smokers (where genetics may play a much stronger role in addition to environmental factors other than tobacco) may differ. The main issues addressed in our study were: (i) whether smokers and never-smokers differ in the associations of APM polymorphisms with NSCLC risk and clinical outcome, (ii) whether ERAP2 influences the ERAP1 effect, if any, as we observed it in other diseases: psoriasis (50) and ankylosing spondylitis (51).
In this study we demonstrated that all tested ERAP1 single nucleotide polymorphisms, except rs26618, conferred susceptibility to cancer not only in never-smokers, but also in smokers, although in the opposite direction. Therefore, as far as we know, we describe here for the first time the opposite associations of ERAP1 polymorphisms with NSCLC in smokers versus never-smokers. This result may be explained by the differing expression of distinct proteins as well as their different mutations in smokers versus never-smokers. This should result in differences in the repertoire of neoantigenic peptides bound and presented by class I HLA molecules between smokers and never-smokers. Different peptides, in turn, may require different ERAP variants for efficient antigen presentation to cytotoxic T cells, resulting in elimination (or not) of cancer cells. Of course, multiple differences in peptide repertoire may exist also at the level of individuals in each of these groups, and some of them may blur the differences between smokers and never-smokers to some extent. Hence the detectable effects of individual SNPs and especially their haplotypes (in which the same allele of an individual SNP may be distributed in many haplotypes) were not very strong.
The rs2248374 SNP in ERAP2 was not associated with susceptibility to NSCLC when analyzed individually; however, we obtained interesting results for this variant when it was analyzed as part of a haplotype with ERAP1 SNPs. Normal ERAP2 protein may form not only ERAP2 homodimers (52) but also ERAP2/ERAP1 heterodimers in which both enzymes may be functional and, in some cases, both of them are required to produce some epitopes (45). In light of this, we found the results of our analysis for the rs26653G-rs26618C-rs2287987T-rs30187C-rs27044C-rs2248374G [ERAP1 127P/276M/349M/528R/730E, ERAP2(-)] and rs26653G-rs26618C-rs2287987T-rs30187C-rs27044C-rs2248374A [ERAP1 127P/276M/349M/528R/730E, ERAP2(+)] haplotypes intriguing. Both haplotypes contain the same ERAP1 alleles, and in both of them the combination of these alleles should probably result in low enzymatic activity (because of the Arg528 and Glu730 residues) (53)(54)(55). The former haplotype, ERAP2-negative, was protective (albeit non-significantly) in smokers, but it was associated with higher risk in never-smokers. The latter haplotype, with the same ERAP1 alleles but differing by the presence of functional ERAP2, did not differentiate between smokers and never-smokers. The rs2248374A allele gives normal ERAP2 protein expression, whereas rs2248374G produces unstable mRNA that is degraded by nonsense-mediated decay and usually results in undetectable protein expression (56). However, it has been reported recently that during stimulation by microbial infections the rs2248374G allele may give two truncated protein isoforms, absent in rs2248374AA homozygotes and with an unknown biological role (57)(58)(59), possibly interfering with ERAP1 activity. It is not clear why the GCTCCG haplotype, without a functional ERAP2 molecule, gives opposite results in never-smokers versus smokers, whereas the same ERAP1 alleles with functional ERAP2 (GCTCCA) do not affect NSCLC prevalence in either of these groups.
As mentioned earlier, smokers and never-smokers suffering from lung cancer differ in gene expression, germ-line genetic polymorphisms, and total number of mutations (8,46-48). Therefore, we may speculate that in smokers the GCTCCG haplotype results in the production of epitopes which could stimulate cytotoxic T cells to kill cancer cells. In never-smokers with the same ERAP1 haplotype, other proteins are expressed that are a potential source of epitopes leading to the recognition and killing of cancer cells by T cells, but this particular ERAP1 variant does not produce suitable epitopes from them. This may result in uncontrolled growth of the tumor. The GCTCCA haplotype, coding for active ERAP2, allows this enzyme, either alone or in a heterodimer with ERAP1, to destroy stimulating epitopes in never-smokers and protective epitopes in smokers, resulting in a lack of ERAP effect on NSCLC risk.
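The compact labels GCTCCG and GCTCCA used above are the alleles of the six SNPs read in the order in which the haplotypes are written out in full. As an illustrative sketch only, the labels can be expanded back into per-SNP alleles as shown below. The function and variable names are hypothetical, and the lookup table is deliberately limited to the allele consequences stated in this article; any other allele is reported as "not stated".

```python
# Order of SNPs in the haplotype labels, as written out in the text.
SNP_ORDER = ["rs26653", "rs26618", "rs2287987", "rs30187", "rs27044", "rs2248374"]

# (SNP, allele) -> consequence, restricted to what the article itself states.
# This table is an illustration, not a complete annotation of these SNPs.
KNOWN_EFFECTS = {
    ("rs26653", "G"): "ERAP1 127P",
    ("rs26618", "C"): "ERAP1 276M",
    ("rs2287987", "T"): "ERAP1 349M",
    ("rs30187", "C"): "ERAP1 528R",
    ("rs27044", "C"): "ERAP1 730E",
    ("rs2248374", "G"): "ERAP2(-)",
    ("rs2248374", "A"): "ERAP2(+)",
}


def expand_haplotype(label):
    """Turn a six-letter label such as 'GCTCCG' into (SNP, allele, effect) tuples."""
    if len(label) != len(SNP_ORDER):
        raise ValueError("expected one allele per SNP")
    return [
        (snp, allele, KNOWN_EFFECTS.get((snp, allele), "not stated"))
        for snp, allele in zip(SNP_ORDER, label)
    ]


def differing_snps(label_a, label_b):
    """List the SNPs at which two haplotype labels carry different alleles."""
    return [
        snp
        for snp, a, b in zip(SNP_ORDER, label_a, label_b)
        if a != b
    ]


for snp, allele, effect in expand_haplotype("GCTCCG"):
    print(f"{snp}{allele}: {effect}")

print("Differ at:", differing_snps("GCTCCG", "GCTCCA"))
```

Run on the two haplotypes discussed here, the comparison reports a difference only at rs2248374, which is the point of the argument: the ERAP1 part is identical and only the ERAP2 status changes.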
It is important to mention here that the effect of ERAPs on tumor antigen presentation depends mainly on HLA-I allotypes which are highly polymorphic, and therefore distinct in different individuals. Hence, the influence of the ERAPs without stratification for HLA-I alleles cannot be very strong. In addition, not all epitopes to be bound by HLA-I and presented to T cells require trimming by ERAPs (60).
It should be explained here why we did not previously detect any association of ERAP1 SNPs with NSCLC in Poles, whereas such associations were found in Chinese populations (40). In that work, we had incomplete information on the smoking status of patients and controls, and therefore we could not use it in statistical calculations. As we see here, ERAP1 SNPs seem to act in opposite directions in smokers and never-smokers. Therefore, in a randomly selected sample of the Polish population, consisting of both smokers and never-smokers, these effects neutralized each other. In the Chinese populations examined in that study, smokers might have accounted for a smaller proportion of tested individuals (Dr. Yao Yufeng, personal communication), and indeed, their results for rs26653 were concordant with those of our never-smokers, although for rs27044 they were concordant rather with our smokers, and inconsistent with our results for other ERAP1 SNPs independently of smoking status [compare our Table 2 with Table 1 of Yao et al. (40)]. However, Chinese people are genetically distant from Poles, as reflected in highly significant differences in ERAP1 SNP frequencies between healthy Chinese and Polish populations [(40), Table 2], as well as in HLA-I allele frequencies [www.allelefrequencies.net], which play a major role in the presentation of tumor antigens. Interestingly, ERAP2 SNPs were found to be associated with NSCLC in the Chinese population, as compared to healthy individuals, even without stratification for smoking status, and there were also differences between smoking and never-smoking patients. However, these results were published in Chinese only (61), so their verification on the basis of a short English abstract is not possible. Nevertheless, it looks as if both ERAPs are associated with NSCLC in Chinese even without division into smokers and never-smokers.
For ERAP2, as for ERAP1, this may be explained by differences in the allele frequencies of these variants between Chinese and Polish populations. Indeed, frequencies of rs2248374 A and G alleles were 37.8 and 62.2%, respectively, in healthy Chinese individuals (61), but 49.1 and 50.9% in Poles (our data).
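The neutralization argument made above for the unstratified Polish cohort can be illustrated with a small numerical sketch. The counts below are entirely hypothetical and were chosen only to make the arithmetic transparent; they are not data from this or any other study. The sketch computes an odds ratio for carriers of an allele in each smoking stratum and in the pooled sample.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for carrying the allele in cases versus controls (2x2 table)."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# HYPOTHETICAL counts, for illustration only (not study data).
# The allele is enriched among cases in smokers and depleted among cases in never-smokers.
smokers = dict(case_carriers=60, case_noncarriers=40,
               control_carriers=40, control_noncarriers=60)
never_smokers = dict(case_carriers=40, case_noncarriers=60,
                     control_carriers=60, control_noncarriers=40)

# Pooling ignores smoking status, as in an analysis without that information.
pooled = {key: smokers[key] + never_smokers[key] for key in smokers}

or_smokers = odds_ratio(**smokers)
or_never = odds_ratio(**never_smokers)
or_pooled = odds_ratio(**pooled)

print(f"OR in smokers:        {or_smokers:.2f}")
print(f"OR in never-smokers:  {or_never:.2f}")
print(f"OR in pooled sample:  {or_pooled:.2f}")
```

In this deliberately symmetric example the two strata show opposite associations, while the pooled odds ratio equals 1. With real cohorts the strata differ in size and effect magnitude, so the cancellation would generally be partial and would depend on the proportion of smokers in the sample.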
We demonstrated that some ERAP1 genotypes which conferred NSCLC risk or protection also influenced the patient's age at diagnosis. We observed an interesting relationship: the higher the disease risk conferred by a given genotype, the lower the expected AAD (earlier diagnosis); this was seen most clearly for rs26653. In smokers, rs26653CC homozygosity predisposed to NSCLC and also decreased AAD by about 3 years. In contrast, in never-smokers the same genotype protected against lung cancer and increased the age at diagnosis by as much as about 10 years in comparison to the GG and GC genotypes. We may try to explain this by supposing that rs26653CC never-smoking subjects develop cancer less often than others and are diagnosed significantly later because they suffer from a more benign lung tumor. Similar but weaker associations were observed for rs30187: as expected, in never-smokers the protective TT genotype also raised the patient's age at diagnosis. Conversely, in smokers the TT genotype weakly predisposed to cancer and, in line with that, lowered the age at diagnosis.
There is no information in the literature about a similar association of any known genetic variant with age at diagnosis of NSCLC; therefore, these observations need further investigation. Interestingly, rs26653 in ERAP1 has been associated with several autoimmune diseases including psoriasis (62)(63)(64), ankylosing spondylitis and inflammatory bowel disease (65). This information indicates that rs26653 may be functional. However, data regarding its potential impact on enzyme activity or expression level are scarce. The rs26653 heterozygote was claimed by Mehta et al. (66) to be associated with normal ERAP1 expression in cervical carcinoma when contrasted with both homozygotes, which had lower expression (in terms of the proportion of positive cells in immunohistochemistry). However, no detailed data were shown. They also demonstrated for the same cancer that rs26653 heterozygotes had better survival than both homozygotes (66). In an earlier report, the same authors showed that low ERAP1 expression was significantly associated with reduced survival of patients with cervical carcinoma (67). However, this SNP had no effect on the survival of our patients with NSCLC, in contrast to rs30187 (discussed below). For the role of ERAP1 expression in other cancers, see (29). It should be noted that the overall influence of ERAP1 genotypes on AAD was relatively small. Even in the case of rs26653 (where the association was the strongest), the effect of this polymorphism in combination with additional information about smoking status and the patient's gender explained only 3.2% of the entire variability of AAD.
Our results indicated that the stage of cancer was primarily associated with age at diagnosis: the mean disease stage was lower in patients diagnosed with NSCLC at a later age. We may explain this by assuming that a less malignant, slower-growing tumor causes symptoms that prompt the patient to see a doctor for the first time later than a more malignant, fast-growing cancer does; hence, a later age at diagnosis correlates with a lower disease stage.
The influence of SNPs on cancer stage was observed here for rs27044 in ERAP1 and rs17587 in PSMB9. For both SNPs, the heterozygotes had a lower stage of disease, particularly at a later age. Interestingly, the decline of disease stage with increasing AAD was the strongest in those patients who were simultaneously heterozygous for both rs27044 and rs17587. In contrast, in patients homozygous for rs27044C or rs27044G and simultaneously homozygous for rs17587A or rs17587G, the disease stage increased with age at diagnosis, while in rs17587 heterozygotes who were homozygous for rs27044C or rs27044G, the stage of cancer did not change with age. In all these situations, the effect of rs27044 was stronger than that of rs17587, suggesting that, for disease stage, a difference in the production of peptides by different variants of PSMB9 is less important than more or less active trimming of these peptides by ERAP1, depending on rs27044. For rs17587, it was reported that in GG individuals the immunoproteasome activity was higher than in GA individuals (68), but this result was not confirmed subsequently in a large panel of cancer cell lines (69). Therefore, the difference in activity of the two PSMB9 variants seems to be weak, if any, hence the stronger effect of the ERAP1 variant. In addition, as rs17587 did not affect susceptibility to lung cancer, we cannot expect it to contribute strongly to the staging of cancer. Some effects of rs17587 heterozygosity, modulating to some extent the rs27044 effect, might result from a wider spectrum of peptides produced by double heterozygotes.
We showed that the overall survival (OS) of NSCLC patients depended mostly on whether surgery was performed. As anticipated, non-operated patients had a risk of death (at any time) about four-fold higher than operated patients. Other factors influencing OS were: stage of cancer, gender, and, unexpectedly, the ERAP1 rs30187 genotype. Patients with the CC genotype of this polymorphism survived significantly longer than patients possessing CT or TT. Of note, the effect of the rs30187 SNP was seen in both operated and non-operated patients.
Moreover, the effect of rs30187 on survival was relatively strong, particularly in operated patients, which suggests an influence of the rs30187 variation on the repertoire of peptides contributing to immune-mediated elimination of residual cancer cells. Therefore, rs30187 in ERAP1 was found here to be associated with both risk of cancer and survival. The rs30187 SNP is located in hinge domain III and can indirectly affect enzymatic function by affecting the conformational dynamics of ERAP1 (70). In addition, as mentioned previously, the rs30187C allele is a part of several ERAP1 haplotypes (71) coding for a low-activity form of the enzyme (53). If CC homozygous patients suffering from NSCLC survived longer, we may suppose that less active ERAP1 favored the generation of much more immunogenic cancer epitopes recognized by the immune system. Alternatively, high-activity ERAP1, present in patients with the TT and, at a lower level, the CT genotypes, might destroy (by over-trimming) some cancer epitopes, thereby impairing efficient cancer elimination by immune cells. We therefore suppose that in patients with the rs30187CC genotype, effective treatment together with a more efficiently working immune system is able to prolong life.
To date, no data have been published about the influence of ERAP1 SNPs on the survival of NSCLC patients. In a single study concerning cervical carcinoma, Mehta and colleagues analyzed the influence of several SNPs in ERAP1 (including rs30187) on disease risk and OS. However, an association with OS was found only for rs3734016 and rs26653, but not for rs30187 (66). Thus, ERAP1 polymorphisms may work differently in different cancers, which are most likely associated with different HLA alleles and therefore require different processing of antigenic peptides for an efficient immune response.
In response to chemotherapy, only one SNP, ERAP1 rs27044, seemed to play some role. Here, again, the heterozygote had an advantage over both homozygotes: complete response was achieved by 26.7% of patients with the CG genotype, whereas only 16.9% of homozygotes (CC and GG) reached this result. Similarly, progressive disease in spite of therapy was observed in only 20.0% of heterozygotes but in 30.1% of homozygotes. The observed effect concerned patients with either a complete response or, conversely, disease progression, but was undetectable in patients who experienced partial response or stable disease. It is unclear why patients heterozygous for rs27044 had some advantage over those with homozygous genotypes. In contrast to ERAP1 rs30187 [R528K], which directly influences the kinetics of the enzyme, rs27044 [E730Q] may modulate enzymatic activity depending on substrate length. The presence of glutamic acid at position 730 increases the preference of ERAP1 for shorter peptides (37,70). Thus, we suggest that in heterozygotes the two allotypes of ERAP1 could together utilize a broader spectrum of substrates, so that the chance of generating more immunogenic cancer epitope(s) is much greater than in homozygotes. In this scenario, "useful" cancer epitopes could be produced efficiently by both alleles. However, this finding requires confirmation in a larger cohort of patients and, optimally, also an in vitro study of the peptide trimming efficiency of different ERAP1 allotypes, as we have done for another SNP in another disease (72). If confirmed, it may prove useful as a predictor of a better treatment response.
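To make the size of the rs27044 heterozygote advantage easier to read, the percentages quoted above can be compared directly. The short sketch below uses only those four reported percentages; the function and variable names are hypothetical. Because the group sizes are not restated here, it gives descriptive differences and ratios only and says nothing about statistical significance.

```python
# Percentages reported in the text for ERAP1 rs27044 genotype groups.
reported = {
    "complete response": {"CG": 26.7, "CC or GG": 16.9},
    "progressive disease": {"CG": 20.0, "CC or GG": 30.1},
}


def compare(het_pct, hom_pct):
    """Return (difference in percentage points, ratio) of heterozygotes vs homozygotes."""
    difference = round(het_pct - hom_pct, 1)
    ratio = round(het_pct / hom_pct, 2)
    return difference, ratio


summary = {
    outcome: compare(groups["CG"], groups["CC or GG"])
    for outcome, groups in reported.items()
}

for outcome, (difference, ratio) in summary.items():
    print(f"{outcome}: {difference:+.1f} percentage points, ratio {ratio:.2f}")
```

On these figures, heterozygotes reached complete response about 1.6 times as often as homozygotes and showed progression about two-thirds as often, a difference of roughly 10 percentage points in each direction.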
It should be mentioned here that an advantage (or disadvantage) of the heterozygote over both homozygotes (i.e., positive or negative heterosis) has been observed in many clinical situations. The best known example is the advantage of sickle cell heterozygotes in regions where malaria is prevalent: wild-type homozygotes are prone to malaria, sickle cell homozygotes are prone to anemia, and heterozygotes are the best survivors (73).
One of the limitations of the current study is an insufficient number of never-smoking participants, especially among NSCLC patients. This resulted from the dominant contribution of smoking to lung cancer prevalence.

CONCLUSIONS
To the best of our knowledge, this is the first study to report opposite associations of ERAP1 polymorphisms with NSCLC in smokers versus never-smokers. We also described the relationships between some tested APM variants (again, mostly in ERAP1) and several clinical parameters such as age at diagnosis, disease stage, overall survival and response to chemotherapy. The multiple associations of ERAP1 genetic variants described here, not only with NSCLC risk but also with disease course and response to treatment, support the thesis that ERAP1 aminopeptidase can take part in the anti-cancer immune response. Although rs2248374 in ERAP2, analyzed separately, was not associated with NSCLC, the presence or absence of functional ERAP2 in a few haplotypes seems to modify the effect of ERAP1 to some extent, especially in the case of the ERAP1 variant with low enzymatic activity. Our novel results, however, need to be considered preliminary and should be further developed and confirmed by other research groups. Such studies are worthwhile, as their results may contribute to better prediction of treatment outcomes.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Bioethical Committee of Wrocław Medical University, Wrocław, Poland. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
PK and AW conceived and designed the experiments. AW, MJ, MW, and WN-M performed the experiments. MS performed statistical analysis. KP, IP, and AK contributed to patients' recruitment and clinical characteristics. JD and NJ contributed to controls recruitment. PK, AW, MS, and IP wrote the paper. MJ and MW critically reviewed the paper. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to our patients and healthy volunteers for their kind agreement to give us blood and share their clinical data.