STRA6 Polymorphisms Are Associated With EGFR Mutations in Locally-Advanced and Metastatic Non-Small Cell Lung Cancer Patients

Retinol plays a significant role in several physiological processes through its nuclear receptors, whose expression depends on the cytoplasmic retinol concentration. Loss of expression of nuclear receptors and low retinol levels have been correlated with lung cancer development. Stimulated by retinoic acid 6 (STRA6) is the only described cell membrane receptor for retinol uptake. Some chronic diseases have been linked with specific polymorphisms in STRA6. This study aimed to evaluate four STRA6 single nucleotide polymorphisms (SNPs) (rs4886578, rs736118, rs351224, and rs974456) among 196 patients with locally-advanced and metastatic non-small cell lung cancer (NSCLC). Genotyping, performed with a validated SNP assay using real-time PCR, was correlated with clinical features and outcomes. NSCLC patients with a TT genotype in SNPs rs4886578 and rs736118 were more likely to be >60 years old, non-smokers, and to harbor EGFR mutations. Patients with a TT genotype compared with a CC/CT genotype in SNP rs974456 had a median progression-free survival (PFS) of 3.2 vs. 4.8 months, p = 0.044, under a first-line platinum-based regimen. Furthermore, patients with a TT rs351224 genotype showed a longer, although not statistically significant, overall survival (OS), 47.5 months vs. 32.0 months, p = 0.156. This study showed a correlation between STRA6 SNPs and clinical characteristics, such as age, non-smoking history, and EGFR mutational status, as well as oncological outcomes. The TT genotype of STRA6 SNPs rs4886578 and rs736118 might be a potential biomarker in locally-advanced and metastatic NSCLC patients.


INTRODUCTION
Lung cancer is the leading cause of cancer-related deaths worldwide and is currently responsible for 1.8 million deaths per year; moreover, it will soon represent around 30% of all cancer-related deaths (1). Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancers. In Mexico, this tumor subtype is the leading cause of cancer deaths, with about 7,000 deaths per year (2,3), and almost 90% of them were diagnosed in advanced stages (4)(5)(6).
Retinoids are a family of signaling molecules related to vitamin A (retinol) that play a significant role in several cellular processes, such as embryogenesis, cell proliferation, differentiation, and apoptosis (7). Moreover, retinoids have been demonstrated to stop carcinogenesis in breast, lung, prostate, and bladder cancer (7)(8)(9)(10). Retinoid effects occur through the activation of the nuclear retinoic acid receptors (RARs) and retinoid X receptors (RXRs), which are involved in the development of the pulmonary epithelium and in lung cancer pathogenesis (11,12). Evidence indicates that NSCLC patients overexpress RXRa and RARg, whereas RARb is suppressed, suggesting an imbalance in receptor expression that could play an essential role in cancer genesis (12). Preclinical and clinical studies have demonstrated that retinoids might reverse premalignant epithelial lesions and prevent secondary neoplasms.
Nuclear retinol receptor expression depends on cytoplasmic retinol concentrations, and the plasma retinol-binding protein (RBP) is the primary transporter for retinol in blood. Retinol enters the cells from the RBP via a cell-surface receptor named STRA6 (Stimulated by Retinoic Acid 6) (13,14). The exogenous addition of retinoic acid has increased RAR levels in preclinical models (15). Furthermore, natural or synthetic retinoids added to paclitaxel and cisplatin-based chemotherapy have improved the median overall survival (OS) in advanced NSCLC patients (9,16).
STRA6 is a cell-surface membrane protein located predominantly in blood-organ barriers and expressed during embryonic development and adult stages (17,18). It belongs to a large group of retinoic-acid-stimulated genes encoding transmembrane proteins whose function remains unknown (19). The STRA6 gene is located in the chromosome 15q24.1 region and is made up of 20 exons and 19 introns encoding a 667-residue protein. However, until now, the topology and transport mechanisms of human STRA6 remain unknown. STRA6 is overexpressed in colon, breast, and gastric tumor tissue, suggesting a role in tumorigenesis (18,20,21). Recently, an in vitro study showed that STRA6 is down-regulated by miR-873 in gastric cancer cells (22); yet, scarce information about its role in lung cancer pathogenesis is available.
STRA6 mutations and single nucleotide polymorphisms (SNPs) alter its function and cellular topography, mainly in the RBP-binding domain (23). The Matthew-Wood syndrome, a severe pathological phenotype characterized by malformations in multiple human organs, including the eye, brain, heart, and lung, has been associated with some STRA6 mutations (24,25). Prior reports have shown an association of type 2 diabetes mellitus (DM2) in Chinese and Indian populations with 3 SNPs (rs736118, rs974456, and rs4886578); however, the allelic variation expression varies within these populations (26,27).
Limited information concerning STRA6 polymorphisms in the Mexican population has been documented. Furthermore, information regarding STRA6 genetic variability and its relationship with lung cancer remains unexplored. This study aims to analyze four STRA6 SNPs (rs4886578, rs736118, rs351224, rs974456) and their association with clinical features, PFS, and OS in locally-advanced and metastatic NSCLC patients.

Clinical Data Collection, Treatment Regimen, and Patient Follow-Up
All relevant clinical features from eligible patients were obtained from electronic medical records. The choice of therapy was at the discretion of the treating physicians, in agreement with international guidelines (4). Most of the patients received platinum-based chemotherapy in combination with paclitaxel or pemetrexed as first-line treatment. After progression on first-line chemotherapy, EGFR-mutated patients received EGFR-tyrosine kinase inhibitors (EGFR-TKIs) as further lines of therapy.
Subsequently and independently, each allelic variant was confirmed by direct sequencing in five percent of the samples, which were amplified based on their high integrity and purity. Primers were designed using Primer Express Software v.

Statistical Analysis
For descriptive purposes, continuous variables were summarized as arithmetic means and standard deviations (SD), while categorical variables were presented as frequencies and proportions. Student's t-test was employed to assess differences among continuous variables. Chi-squared or Fisher's exact tests were used to assess significant differences among categorical variables. For the survival analysis, variables were dichotomized. PFS was defined as the time from the date of treatment initiation until disease progression, unacceptable toxicity, death, or loss to follow-up. In contrast, overall survival (OS) was defined as the time from diagnosis until death or loss to follow-up. PFS and OS were estimated using the Kaplan-Meier method; log-rank tests were employed for comparisons among subgroups. Adjustment for potential confounders was performed using a multivariate Cox regression model, and hazard ratios (HR) were calculated along with their corresponding 95% confidence intervals (CI) as a measure of association. For each clinical factor, we performed a univariate analysis followed by multivariable Cox regression analysis. The inclusion criterion for variables in multivariable analyses was a p < 0.10. A p-value <0.05 was considered significant based on a two-sided test. All statistical analyses were performed using the SPSS software package, v. 15 (SPSS Inc, Chicago, IL).
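As an illustration of the Kaplan-Meier estimate mentioned above, the following is a minimal pure-Python sketch. It is not the SPSS procedure used in this study; the function names `kaplan_meier` and `median_survival` and the toy data are hypothetical and serve only to show how censored follow-up times enter the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up durations (e.g., months)
    events -- 1 if the event (progression or death) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored cases both leave the risk set
    return curve


def median_survival(curve):
    """First time at which the estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical toy data (months, event flag); NOT taken from the study cohort.
toy_times = [2, 3, 3, 5, 8, 8, 12]
toy_events = [1, 1, 0, 1, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why loss to follow-up does not bias the estimate in the same way as simply excluding those patients would.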

Study Population
The median age at diagnosis was 61 (± 12.5) years; most of the patients were female (56.1%), without wood-smoke exposure (55.1%), and half of them were current/former smokers. Adenocarcinoma was the most common histological subtype (85.7%), and 40% of the total cohort (73/196) had a positive EGFR mutational status. All baseline characteristics are summarized in Table 1. Most patients (84.2%, 165/196) received a platinum-based chemotherapy regimen as first-line treatment, and 86 (43.8%) of the cases harboring an EGFR mutation received an EGFR-TKI as second or third line. Gefitinib was the most commonly used TKI (17.9%), followed by afatinib (16.8%) and erlotinib (7.7%) (Table S1).

Genotyping Frequency of STRA6 SNPs in NSCLC Patients
Genotype and allele frequencies are presented in Table S2. Four STRA6 SNPs (rs4886578, rs736118, rs351224, and rs974456) were analyzed, and all frequencies were consistent with Hardy-Weinberg expectations, i.e., they are in genetic equilibrium (Table S2).
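A Hardy-Weinberg check of this kind is commonly carried out with a one-degree-of-freedom chi-squared goodness-of-fit test on the genotype counts. The sketch below illustrates that approach under stated assumptions: the function name `hardy_weinberg_chi2` and the example counts are hypothetical, and the source does not specify which test or software was used for this step.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb -- observed counts of the two homozygous genotypes
                        and the heterozygous genotype (e.g., CC, CT, TT)
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts; NOT the counts reported in Table S2.
chi2_value, p_value = hardy_weinberg_chi2(25, 50, 25)
```

A p-value above the chosen significance threshold indicates no detectable departure from Hardy-Weinberg equilibrium; for SNPs with very low genotype counts, an exact test is usually preferred over the chi-squared approximation.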

Association Between the Clinical Characteristics and the STRA6 SNPs
The relationship between clinical characteristics and STRA6 polymorphisms was analyzed, and all findings are summarized in Table 2. Remarkably, the TT genotype in SNP rs4886578 was associated with non-smoking vs. current smoking status (71.4 vs. 28.6%, p = 0.038) and with EGFR-mutated vs. wild-type status (71.4 vs. 28.6%, p = 0.001).
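For small genotype subgroups, associations such as these are typically assessed with Fisher's exact test on a 2x2 table (genotype by clinical feature). The following is a minimal sketch of the two-sided test; the function name `fisher_exact_two_sided` and the example table are hypothetical, and the counts are not those underlying Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table whose top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, total)


# Hypothetical table: rows = TT vs. CC/CT genotype,
# columns = EGFR mutated vs. wild-type. NOT study data.
example_p = fisher_exact_two_sided(3, 1, 1, 3)
```

The test conditions on the row and column totals, so it remains valid when some cells contain only a handful of patients, as is often the case for the less frequent homozygous genotype.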

PFS and Clinical Features
The median PFS on first-line platinum-based chemotherapy was 4.6 months. Locally-advanced disease (stage III) was associated with a better PFS than metastatic disease (stage IV) (5.5 vs. 4.3 months, p = 0.027). Additionally, patients with a CC/CT genotype in the rs974456 polymorphism showed a better median PFS than those with a TT genotype (4.9 vs. 3.3 months, p = 0.044) (Figure 1). However, in the multivariate analysis, clinical stage was the only significant variable, representing a higher hazard [HR 1.85, 95% CI (1.0-3.3), p = 0.038] (Table 3).

OS and Clinical Features
The median overall survival (OS) in the whole population was 34.9 months. Concerning the STRA6 SNPs, patients with a TT genotype in rs351224 showed a longer median OS than those without this genotype (47.5 vs. 32.0 months; p = 0.156); however, this difference was not statistically significant. In the univariate analysis, age ≤60 years vs. >60 years (43.9 vs. 26.9 months, p = 0.024), ECOG PS 0-1 vs. ≥2 (39 vs. 13.3 months, p < 0.001), positive EGFR mutation (47.2 vs. 20.4 months, p < 0.001), and glycaemia <120 mg/dL (38.9 vs. 17.8 months, p = 0.001) were associated with an improvement in OS. No other characteristics were significant for OS (Table 3). In the multivariate analysis, only early-stage disease, ECOG PS (0-1), and EGFR mutated status were significant for an improvement in OS (Table 3). The

SNP Validation in NSCLC Samples by Sequencing
Following the screening analysis for STRA6 allelic discrimination, the detection of polymorphisms was validated by means of Sanger sequencing; notably, results showed 100% concordance with those previously obtained by RT-PCR.

Linkage Disequilibrium and Haplotype
Linkage disequilibrium (LD) between pairs of STRA6 SNP loci was analyzed based on the RT-PCR genotyping results, using Haploview software. Haplotype presence in the population was studied based on Gabriel's definition (D' > 0.9; minor allele frequency >5%) (29). We found three red points of LD between SNPs (one intense and two weaker). The four STRA6 SNPs are shown in Figures 2A, B, with a color code (dark red = strong LD; light red = intermediate LD; and white = in equilibrium). According to our data, there is only one block of LD in STRA6 in the analyzed population. The SNP rs351224 variant was in equilibrium with respect to the other variants (rs4886578, rs736118, and rs974456). There is a strong LD between rs4886578 and rs736118, followed by intermediate LD between rs4886578, rs736118, and rs974456 (Figures 2A, B).
Possible haplotypes in the study population were investigated using the confidence interval criteria pre-established by Gabriel et al., with r² = 0.8 (29). Concerning the haplotype (rs4886578/rs736118), 68.0% showed CC, 28.3% TT, and 2.5% CT (Figures 2A, C).
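The pairwise LD measures referred to above (D' and r²) follow directly from haplotype and allele frequencies. The sketch below shows the standard formulas for two biallelic loci; it assumes the haplotype frequency is already known, whereas Haploview estimates it from unphased genotypes. The function name `linkage_disequilibrium` and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A at locus 1
            and allele B at locus 2
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independent assortment
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # |D| scaled to its maximum possible value
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies; NOT estimated from the study genotypes.
d, d_prime, r_squared = linkage_disequilibrium(0.3, 0.3, 0.3)
```

D' close to 1 means that at least one of the four possible haplotypes is nearly absent, while r² close to 1 additionally requires similar allele frequencies at both loci, so the two measures can differ for the same SNP pair.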

Clinical Features of Haplotype (TT/TT, rs4886578/rs736118)
An analysis of whether the presence of a specific haplotype was associated with clinical features was performed. Table 4 summarizes the main findings. The TT/TT haplotype (rs4886578/rs736118) was correlated with the presence of an EGFR mutation (64.7 vs. 35.3%, p = 0.019) compared with wild-type patients. No other clinical characteristics showed any association with this haplotype. Additionally, 8.1% (16/196) of the patients with a TT genotype for SNP rs4886578 also have the TT genotype for rs351224 and rs974456. The TT genotype was associated with older patients (> 60 years), 81.3 vs. 18.8%, p = 0.034, and EGFR mutated status, 62.5 vs. 37.5%, p = 0.029 (Table 4).
Significantly, a homozygous condition in both haplotypes was associated with older age (> 60 years) and EGFR mutated status (Table 4). Nevertheless, no association of these haplotypes with PFS or OS was observed (Table S4).

DISCUSSION
Currently, STRA6 is the only known retinol transporter mediating its entry into the cell cytoplasm. Upon association with the RBP-retinol complex, STRA6 acts as a cytokine receptor that activates downstream JAK2-STAT signaling (30,31). STRA6 has previously been reported to be overexpressed in breast and colon tumor tissue compared with healthy tissue (32); however, its expression or variation in lung cancer is unknown.
In the present study, we analyzed four STRA6 SNPs in patients with a lung cancer diagnosis, and all were in Hardy-Weinberg equilibrium. According to the Genome Aggregation Database (gnomAD, https://gnomad.broadinstitute.org/), all four SNPs    database, all previously mentioned SNPs have a benign role; therefore, more robust evidence is needed to define the SNPs' role in lung cancer pathogenesis. On the other hand, the association between STRA6 polymorphisms and type 2 diabetes mellitus (DM2) has been studied in Indian and Chinese populations. Both studies analyzed the same STRA6 SNPs as the present work. In the Indian population, there was a positive correlation between the haplotype (AAT-rs974456/rs736118/rs4886578) and the presence of DM2 (26). Likewise, in the Chinese population, the rs736118 and rs974456 SNPs were associated with DM2 (27). In Mexico, approximately 10% of the population has DM2. Our group showed that NSCLC patients with DM2 but adequate glycemic control had better survival than those with hyperglycemia (34). In the present study, 15% of the patients had a confirmed DM2 diagnosis. Although it was not part of the main objectives, we analyzed the possible association between the STRA6 SNPs and DM2, and in contrast with previous studies, we did not find a connection between both conditions. However, in line with our previous findings, patients without a DM2 diagnosis and with blood glucose levels lower than 120 mg/dL had a better median PFS.
The TT genotypes of SNPs rs4886578 and rs736118 were associated with EGFR mutated status. A larger proportion of Latin American patients with NSCLC harbor EGFR mutations compared with other populations, particularly non-smokers, young women, and patients with a history of wood-smoke exposure (35)(36)(37); similar findings were observed in the current study. Two of the SNPs (rs4886578 and rs736118) might serve as potential biomarkers in non-smokers harboring EGFR mutations, although the relation between the EGFR pathway and STRA6 function remains unknown. We performed an additional linear statistical analysis, which strengthened the relationship between the TT haplotype of both SNPs and the presence of EGFR mutations (Table S4). EGFR mutations can induce sustained downstream signaling activation, promoting proliferation, differentiation, and survival. At the same time, EGFR signaling could activate the STAT3 (signal transducer and activator of transcription 3) pathway (38), which can also be stimulated via STRA6 activation. Although the other two SNPs were not associated with clinical characteristics, the TT genotype of rs974456 showed a statistical association with shorter PFS, suggesting a role as a poor prognosis biomarker in NSCLC patients. The rs736118 SNP is a G>A polymorphism that leads to a Met to Ile substitution in the C-terminal region of STRA6. According to the literature, this change can alter protein trafficking and cell surface expression (26). Even though the other three SNPs are located in intronic regions, SNPs in non-coding regions can modulate gene expression (39). Since STRA6 modulates retinol entry to the cytoplasm, the TT genotype of rs736118 or rs974456 could modify STRA6 function at the cell membrane, thus limiting retinol uptake and inhibiting the expression and activity of nuclear receptors. This could lead to shorter PFS and OS and would support its probable significance as a poor prognosis predictor.
Previously we have reported that loss of RARa and RARb expression in tumor tissue from advanced NSCLC patients was associated with a worse prognosis (40). Indeed, in an independent study, advanced NSCLC patients who received chemotherapy complemented with all-trans-retinoic acid showed a better overall response rate than those who received chemotherapy alone (55.8 vs. 25.4%, respectively). Moreover, a small patient subgroup who expressed RARb on tumor tissue showed a better response (66.6%) than those without RARb expression (42%) (9).
In contrast, it seems that the TT genotype of SNP rs351224 might not influence STRA6 localization or function, but could induce an overactivation, facilitating retinol entry to the tumor cell, high expression of nuclear receptors, and favorable PFS and OS outcomes. Although the survival benefit was not significant for patients with the TT genotype of SNP rs351224, a strong trend was observed. Patients with the TT genotype of rs351224 did not exhibit a concomitant presence of any other SNP; thus, it could be considered an independent factor compared to the other SNPs. An intriguing topic would be to define the role of the TT genotype of SNP rs351224 in NSCLC patients under retinoic acid substitution in combination with systemic therapy, and whether it could be adopted as a prognostic biomarker.
Two STRA6 polymorphisms have been related as a haplotype (rs4886578/rs736118) with prognostic clinical characteristics. First, this haplotype was correlated with EGFR mutated status, supporting the rationale that both SNPs by themselves could participate in the development of lung cancer in patients with a non-smoking history. Although the statistical analysis was not significant, a slight trend toward a more prolonged OS was observed in patients with the TT/TT genotype in the haplotype (rs4886578/rs736118). This tendency disappeared when the patients were homozygous for three SNPs (rs4886578/rs736118/rs974456), which reinforces the assumption that the rs974456 TT genotype might be an adverse prognostic marker.
We propose that STRA6 expression could be related to normal retinol levels in the tumor cell cytoplasm, favoring the expression of retinol nuclear receptors and playing an essential role in improving oncological outcomes in NSCLC patients. Nuclear receptor expression depends on the retinoic acid concentration in the cellular cytoplasm. STRA6 function could be essential for the downstream activation pathway, strengthening the hypothesis that it plays a determinant role in lung carcinogenesis. Further research regarding STRA6 expression in tumor tissue and its relation with RAR expression and clinical outcomes in advanced NSCLC patients is in progress.
Our study had some limitations due to the small and heterogeneous sample. Moreover, important variations in allelic frequencies between populations have been described; thus, it will be of interest to explore STRA6 SNPs in other populations. Ideally, this should be pursued in larger cohorts of patients to increase validity in subgroup analyses. To the best of our knowledge, this is the first study that addresses a potential association between STRA6 SNPs, relevant clinical characteristics, and oncological outcomes in NSCLC patients, and promising results have emerged in this first analysis. However, it remains to be confirmed whether SNPs rs4886578 and rs736118 participate in lung carcinogenesis regardless of tobacco exposure, by stratifying patients based on EGFR mutational status. There is an urgent need to develop reliable biomarkers; in this context, it is relevant to discern whether any STRA6 SNPs could help to establish prognosis or predict benefit from current standards of treatment.

CONCLUSION
A positive association of the TT genotype of SNPs rs4886578 and rs736118, individually and in haplotype form, with non-smoking history and EGFR mutational status was observed, suggesting their involvement in the genesis of lung cancer unrelated to tobacco exposure.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Bioethics and scientific INCan (Instituto Nacional de Cancerología Mexico) committees (015/026/IBI) (CEI/954/15). The patients/participants provided their written informed consent to participate in this study.

FUNDING
This work was partially financed by the Instituto Nacional de Cancerología of Mexico.