Single Nucleotide Polymorphisms in HOTAIR Are Related to Breast Cancer Risk and Prognosis in the Northeastern Chinese Population

Background: The long noncoding RNA HOX transcript antisense RNA (HOTAIR) is highly expressed in breast cancer (BC) tissues and is associated with the recurrence and metastasis of BC. Studies of the associations between several functional single nucleotide polymorphisms (SNPs) in HOTAIR (rs920778, rs1899663, and rs4759314) and BC susceptibility, carried out in different regions of China, have so far produced inconsistent results. No study has examined the correlation between HOTAIR SNPs and BC prognosis in the Chinese population. Therefore, we investigated the relationship between HOTAIR SNPs and susceptibility to and prognosis of BC.
Method: We conducted a population-based case-control study involving 828 BC cases and 905 healthy controls. Peripheral blood DNA was used for genotyping. Associations between HOTAIR genotypes and BC risk were estimated by odds ratios (ORs) computed using the binary logistic regression model. The relationships between HOTAIR SNPs and clinicopathological features were tested by Pearson's chi-square test or Fisher's exact test. Survival was analyzed using the Kaplan-Meier method.
Results: The functional rs920778 genetic variant increased BC risk in the codominant model. Individuals with the rs920778 GG genotype had an OR of 2.426 (95% confidence interval [CI] = 1.491-3.947, P < 0.001) for developing BC compared to individuals with the AA genotype. Individuals with the AG genotype had an OR of 1.296 (95% CI = 1.040-1.614, P = 0.021) for developing BC compared to individuals with the AA genotype. Individuals with the rs4759314 GA genotype had a lower BC risk than individuals with the rs4759314 AA/GG genotype (OR = 0.566, 95% CI = 0.398-0.803, P = 0.001). The rs1899663 genotype had no correlation with BC susceptibility. Haplotypes composed of rs920778-rs1899663 and rs920778-rs1899663-rs4759314 could increase BC risk (all P < 0.001). There were no statistically significant associations between HOTAIR SNPs and clinicopathological characteristics. The rs920778 GG/AG genotypes were associated with worse disease-free survival (DFS) (P = 0.012), and the rs4759314 GA genotype was associated with worse DFS and overall survival (OS) (P = 0.011).
Conclusion: HOTAIR SNPs (rs920778 and rs4759314) are significantly related to BC susceptibility and prognosis in the northeastern Chinese population, indicating their importance in the occurrence and development of BC.


INTRODUCTION
Breast cancer (BC) is one of the most common cancers among women, and its morbidity and mortality have continued to increase worldwide in recent years, reflecting its strong invasive and metastatic characteristics (1,2). In China, the incidence of BC is increasing annually, and BC is currently the most common malignant tumor in women (3,4).
Long noncoding RNAs are non-protein-coding transcripts longer than 200 nt that play important roles in the epigenetic regulation of gene expression. One such RNA, HOX transcript antisense RNA (HOTAIR), is transcribed from the antisense strand of the HOXC locus and mainly regulates HOXD genes. HOTAIR can guide the polycomb repressive complex 2/lysine-specific histone demethylase 1 complex to a specific target gene, where the complex then trimethylates lysine 27 of histone H3 and demethylates dimethylated lysine 4 of histone H3, causing chromatin remodeling (5-7). This can silence some metastasis suppressor genes, such as junctional adhesion molecule 2, protocadherin beta 5, and protocadherin 10 (6).
HOTAIR is overexpressed in BC and is related to the occurrence, development, recurrence, and metastasis of BC. Numerous studies indicate that HOTAIR has oncogenic effects. In the diagnosis of gastric cancer, pancreatic cancer, and colorectal cancer, HOTAIR expression is used to distinguish benign from malignant tissues, because its expression is higher in tumor tissues than in benign tissues. HOTAIR is also a biomarker of therapeutic response and poor prognosis (8). In our previous studies, we identified several single nucleotide polymorphisms (SNPs) in HOTAIR (rs920778, rs4759314, and rs1899663). These SNPs are located in the intronic region of HOTAIR and can regulate its expression (9-11). Therefore, these SNPs are expected to be related to the occurrence, development, recurrence, and metastasis of BC, and they may have potential as new therapeutic targets. Further research demonstrated that these sites are related to susceptibility to gastric cancer, esophageal cancer, and papillary thyroid cancer. Several meta-analyses showed that these SNPs are associated with susceptibility to gastrointestinal cancer and estrogen-dependent tumors (12-17), especially in Asian populations. However, the prevalence of these SNPs differs among regions and races: they are more common in Asian populations than in Caucasian populations, and their prevalence also varies across different parts of Asia (12,17).
Few studies have reported a relationship between HOTAIR SNPs and BC susceptibility. The participants in the existing studies were mainly Chinese, Turkish, Iranian, and Indian, and the results from populations in different regions are inconsistent and controversial. There are obvious regional differences in the distribution of HOTAIR genetic polymorphisms in gastrointestinal cancer: the frequency of the rs920778 GG genotype is higher in the northeastern population than in the central or southern populations, the frequency of the rs4759314 GG genotype is higher in the southeastern population than in the central and northern populations, and the frequency of the rs1899663 GG genotype is lower in the southeastern population than in the central and northern populations. It is therefore of great significance to study, for the first time, the role of HOTAIR gene polymorphisms in the occurrence, development, and prognosis of BC in the northeastern Chinese population, which can provide a research basis for discovering new pathogenic targets of BC. Accordingly, we retrospectively analyzed the relationship between HOTAIR SNPs (rs920778, rs1899663, and rs4759314) and BC clinicopathological features and prognosis in the northeastern Chinese population.

Ethics
This study was approved by the Institutional Ethics Committee of our hospital (ethical approval number 2014-031). Written informed consent was obtained from each participant at recruitment. The study methods were carried out in accordance with the relevant guidelines.

Selection and Description of Participants
We investigated the relationship between HOTAIR SNPs (rs920778, rs1899663, and rs4759314) and the risk of BC in a case-control study. All of the participants were genetically unrelated Han Chinese individuals from northeast China. This study enrolled 828 BC patients and 905 age-matched healthy control individuals from The First Affiliated Hospital of Jilin University (Changchun, Jilin Province, China) between April 2013 and September 2016. The median follow-up time was 6.7 years. The participants' clinical characteristics were collected through medical records. The inclusion criteria were female patients with early breast cancer diagnosed by pathology.

HOTAIR SNP Genotyping
DNA was extracted from peripheral blood samples. Genotypes were detected using the MassArray system (Agena, San Diego, CA, USA) by the matrix-assisted laser desorption ionization-time of flight mass spectrometry method. HOTAIR SNPs were selected and genotyped as described previously (9-11). SNP genotyping was performed without knowledge of case status. Repeat genotyping was performed in a random 15% of the samples, and the reproducibility was 99.7%.

Statistics
SPSS 24.0 (IBM Corp., Armonk, NY, USA) and the online SNPStats program (https://www.snpstats.net/start.htm, developed by the Institut Català d'Oncologia) were used to analyze BC risk. Categorical variables are reported as percentages. The Hardy-Weinberg equilibrium test was conducted to check whether the genotype distributions in the case group and the control group deviated from equilibrium. Pearson's chi-square test was used to examine differences in demographic variables and HOTAIR htSNP genotype distributions between BC cases and controls. Associations between HOTAIR genotypes and BC risk were estimated by odds ratios (ORs) and their 95% confidence intervals (CIs), which were computed using the binary logistic regression model. All ORs were adjusted for age whenever appropriate. Pearson's chi-square test or Fisher's exact test was used to evaluate the relationships between HOTAIR SNPs and clinicopathological features. The effects of the HOTAIR SNPs on disease-free survival (DFS) and overall survival (OS) were evaluated using the Kaplan-Meier method and the univariate Cox model. All statistical tests were two-sided. P values < 0.05 were considered statistically significant.
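As an illustration of the Hardy-Weinberg equilibrium check described above, the following minimal Python sketch applies the standard chi-square goodness-of-fit test (1 degree of freedom) to the three genotype counts of a biallelic SNP. The function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study; the study itself used SNPStats, whose exact test procedure may differ.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes of a biallelic SNP
    and returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are present (no expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p                      # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts chosen to sit exactly at equilibrium (p = 0.7)
chi2_eq, p_eq = hwe_chi_square(490, 420, 90)

# Hypothetical counts with a clear heterozygote deficit
chi2_dev, p_dev = hwe_chi_square(300, 400, 300)
```

A P value above the chosen threshold (0.05 in this study) would be read as no evidence of deviation from equilibrium.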

Participant Characteristics
The control group was composed of healthy women who had undergone routine physical examination in our hospital and did not have a family history of cancer. The median age of the control group was 38 years (range 32-53 years). There were 678 premenopausal women and 226 postmenopausal women. The median age of the case group was 51 years (range 44-58 years), with 398 premenopausal women and 430 postmenopausal women. Only 32 cases had a family history of cancer. Among the 828 BC cases, 793 were invasive ductal carcinoma and 35 were other types. Detailed information on the characteristics of the BC patients can be found in Table 1.

Relationship Between HOTAIR SNPs and Risk of BC
The genotype distributions of the HOTAIR SNPs showed no deviation from Hardy-Weinberg equilibrium in either controls or cases (Table 2). The functional rs920778 genetic variant was associated with an increased risk of BC in three genetic models. We used the Akaike Information Criterion (AIC) to select the optimal genetic model, and the lowest AIC was found in the codominant genetic model. The rs920778 GG genotype had an OR for BC development of 2.426 (95% CI = 1.491-3.947, P < 0.001) compared to the AA genotype. The rs920778 AG genotype was also associated with an increased BC risk compared to the rs920778 AA genotype (OR = 1.296, 95% CI = 1.040-1.614, P = 0.021). The functional rs4759314 genetic variant had different associations with BC risk in different genetic models (i.e., the codominant model, dominant model, and overdominant model). The AIC was lowest in the overdominant model; therefore, using that model, the rs4759314 GA genotype was associated with a lower risk of BC development (OR = 0.566, 95% CI = 0.398-0.803, P = 0.001) than the AA/GG genotype. The rs1899663 SNP did not show an association with BC risk (Table 3).
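The model-selection step above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the counts are hypothetical (not the study's data), the models are unadjusted (the study adjusted ORs for age), and the helper names `odds_ratio_wald`, `aic_grouped`, and `merge` are introduced here for illustration only. For a logistic model whose only predictor is a genotype grouping, the maximum-likelihood fitted probability in each group equals the observed case fraction, so the AIC can be computed in closed form without an iterative fit.

```python
import math


def odds_ratio_wald(case_exp, ctrl_exp, case_ref, ctrl_ref, z=1.96):
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 table."""
    odds_ratio = (case_exp * ctrl_ref) / (ctrl_exp * case_ref)
    se = math.sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_ref + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def aic_grouped(groups):
    """AIC of a logistic model with one categorical predictor.

    groups: list of (cases, controls), one tuple per genotype group.
    The number of parameters equals the number of groups.
    """
    loglik = 0.0
    for cases, controls in groups:
        n = cases + controls
        for k in (cases, controls):
            if k > 0:
                loglik += k * math.log(k / n)
    return 2 * len(groups) - 2 * loglik


# Hypothetical (cases, controls) counts per genotype, for illustration only
counts = {"AA": (400, 500), "AG": (330, 320), "GG": (70, 30)}


def merge(*genotypes):
    """Pool the case and control counts of several genotypes."""
    return (sum(counts[g][0] for g in genotypes),
            sum(counts[g][1] for g in genotypes))


# Each genetic model is a different grouping of the three genotypes
models = {
    "codominant": [counts["AA"], counts["AG"], counts["GG"]],
    "dominant": [counts["AA"], merge("AG", "GG")],
    "recessive": [merge("AA", "AG"), counts["GG"]],
    "overdominant": [merge("AA", "GG"), counts["AG"]],
}
aic = {name: aic_grouped(groups) for name, groups in models.items()}
best_model = min(aic, key=aic.get)  # model with the lowest AIC

# Codominant comparison: GG versus the AA reference genotype
or_gg, lo_gg, hi_gg = odds_ratio_wald(70, 30, 400, 500)
```

The log-additive model, which SNPStats also reports, is omitted here because it requires an iterative logistic fit.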

Haplotype Analysis
To analyze the influence of different haplotype systems composed of the three HOTAIR SNP sites on the occurrence of BC, we explored the correlation between haplotypes and BC risk by comparing the distribution of each haplotype in the case group and the control group. There were significant differences between the case and control groups in the distributions of the following haplotypes: rs920778-rs1899663 and rs920778-rs1899663-rs4759314 (all P < 0.001). However, rs1899663-rs4759314 was not related to BC risk (Table 4). Haplotype 1 is composed of the wild-type alleles of the three SNPs. Haplotype 2 increased BC risk compared with haplotype 1 (OR = 1.39, 95% CI = 1.13-1.70, P = 0.002). Haplotypes 4, 5, and 6 reduced BC risk compared with haplotype 1 (all P < 0.001) (Table 5).

Relationship Between HOTAIR SNPs and Prognosis of BC
We did not find any significant associations between HOTAIR SNPs and clinicopathological characteristics of BC, including tumor size, lymph node metastasis, lymphovascular invasion, molecular type, histological grade, family history, menstrual status, and pathological type (Table 6). We then assessed the correlation between HOTAIR SNPs and survival in Cox regression analysis. The AG genotype of rs920778 and the GA genotype of rs4759314 predicted poor prognosis in both univariate and multivariate analyses (Tables 7 and 8).
For the rs920778 SNP, there was a significant difference in DFS (P = 0.012) when all three genotypes were compared. The GG genotype was associated with the worst DFS of the three genotypes (GG, AG, and AA) in univariate analysis (HR = 1.909, P = 0.048). The AG genotype was associated with worse DFS than the AA genotype (HR = 1.48, P = 0.037). However, there was no significant difference in OS (P = 0.13) (Figure 1 and Table 8).
There was no difference in DFS or OS between individuals with the rs1899663 CC or CA genotypes and those with the AA genotype in multivariate analysis (Figure 2 and Table 8).
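The Kaplan-Meier estimates underlying DFS and OS comparisons of this kind can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function name `kaplan_meier` and the follow-up data are hypothetical, and the sketch covers only the survival curve for one group, not the log-rank test or the Cox model used to compare genotypes.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up duration for each patient.
    events: 1 if the event (e.g., recurrence or death) was observed,
            0 if the patient was censored at that time.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with events and censored patients both leave the risk set
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up times (years) for one genotype group
times = [1.0, 2.0, 2.0, 3.5, 4.0, 5.0]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Patients censored at the same time as an event are counted as still at risk at that time, which is the usual convention.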
When all three rs4759314 genotypes were compared, individuals with the GA genotype had worse DFS than those with the AA genotype (P = 0.008). OS also differed significantly across the three genotypes (P = 0.011); individuals with the GA genotype had the worst OS (P = 0.001), whereas individuals with the GG genotype and those with the AA genotype had similar OS (P = 0.968) (Figure 3 and Table 8).
[Figure 3 caption fragment: subjects with the GA genotype had worse DFS than subjects with the AA genotype (P = 0.026); the three OS curves differed significantly overall, with the GA genotype showing the worst OS and the GG and AA genotypes showing similar OS.]
We found that the G allele is rare and can increase BC risk. The distributions of rs920778 genotypes in BC patients in these five BC studies differ slightly. However, in these five BC studies, the AA genotype is more common whereas the GG genotype is rare [the exception is Yan et al. (26), who found that the GG genotype is more common, the AA genotype is rare, and the A allele carries disease risk]. One possible reason for this is differences in tumor type and gender. Further, the study by Yan et al. has limitations in terms of sample size, detection methods, research results, and population. Therefore, we think that the rs920778 GG/AG genotypes can increase BC risk.
The rs1899663 SNP (C > A) is located in the intronic region of HOTAIR, and the AA genotype can increase the expression of HOTAIR by altering the binding affinity of various transcription factors, such as paired box 4, spermatogenic leucine zipper 1, and zinc finger protein 281 (ZFP281) (28). [...] Yan et al., and the present study). In Taheri et al.'s study, a relationship between the rs1899663 SNP and prostate cancer susceptibility was not observed because of the sample size; however, they compared prostate hyperplasia tissues and prostate cancer tissues and found that the risk associated with the AA genotype in tumor tissues was higher than that associated with the CC genotype. This result suggests that the AA genotype might increase prostate cancer susceptibility (28). The P value of 0.087 in the present study is close to 0.05. Therefore, we think that the rs1899663 SNP has a weak effect on BC risk, and that an association might be detected with a larger sample size.
The rs4759314 SNP (A > G) is located in the intronic region of HOTAIR, and the GG genotype can increase the expression of HOTAIR by enhancing the promoter activity of HOXC11. Of five studies examining the relationship between rs4759314 and BC susceptibility (Table 9), only two Chinese studies [Yan et al. (26) and this study] have shown a significantly decreased risk of BC in individuals with at least one G allele (GA or GG) compared to individuals with homozygous A alleles. The other three studies did not show any association of rs4759314 with BC risk. Two studies in the Iranian population [Hassanzarei et al. (25) and Khorshidi et al. (27)] are too small to draw such conclusions, and another Chinese study in southeast China (Lin et al.'s study) showed that rs4759314 has no correlation with the risk of BC (21). This may reflect population differences in BC, as the populations in the other two Chinese studies were from central and northeast China.
We also examined the haplotypes of these three SNPs. We found that the rs920778-rs1899663 and rs920778-rs1899663-rs4759314 haplotypes significantly increase BC risk (P < 0.001). We believe that the gene effect of rs920778 affects the gene effects of the other two SNPs, which leads to an increase in breast cancer susceptibility. In Bayram's study, researchers found an association between the rs920778 SNP and clinicopathological features in the Turkish population, including advanced TNM stage, larger tumor size, distant metastasis, perineural invasion, and poor histological grade (23). In Hassanzarei's study, they found that the rs920778 SNP was only significantly associated with ER status (25). In Rajagopal's study, they found that the rs920778 variant (AG + GG genotype) increased BC risk in premenopausal women (OR = 5.86, 95% CI = 3.87-8.88, P < 0.0001) (24). However, we did not find any relationship between the rs920778 SNP and any clinicopathological features. This may be because all of these studies were retrospective and there might be an inherent selection bias. Because of the low distribution frequency of the GG genotype (about 3-8% among common populations), a large sample size is needed to analyze the relationship between the GG genotype and clinical characteristics.
We initially found that the rs920778 SNP is associated with the prognosis of BC patients. Our study found that the DFS of patients with the AG/GG genotypes was much shorter than that of patients with the AA genotype (P = 0.012). However, we did not find similar results for OS. Our result is consistent with that of Weng et al.'s study (29), which showed that subjects with the GG genotype of rs920778 had poor OS. In contrast, Xavier-Magalhães et al.'s study (30) found that, among glioma patients, subjects with the AG genotype of rs920778 had longer overall survival than GG subjects. Differences in sample size and tumor type might explain these inconsistent results. HOTAIR is regarded as an oncogene involved in both the initiation and progression of cancer. The rs920778 SNP is located in the intronic enhancer region of HOTAIR, and polymorphism of rs920778 could alter the activity of this enhancer and lead to overexpression of HOTAIR. Elevated expression of HOTAIR has been reported to be associated with reduced DFS and OS in cervical cancer patients (31). Therefore, we infer that the influence of the rs920778 SNP on BC prognosis is mediated by the resultant increased expression of HOTAIR. This hypothesis needs to be tested further in BC tissue.
The rs1899663 SNP had no effect on DFS. However, in subgroup analysis, individuals with the CA genotype had worse DFS than those with the AA genotype (P = 0.007), which could provide a reference for future research. Individuals with the rs4759314 GA genotype had worse DFS and OS than patients with other genotypes (P = 0.008 and P = 0.001, respectively), which is also of interest and warrants further study. Because of the low distribution frequency of the rare genotypes AA of rs1899663 and GG of rs4759314 (no more than 2.4%), a larger sample size is needed to assess their associations with prognosis. Because the rs1899663 and rs4759314 SNPs can increase the expression of HOTAIR, their effect on BC prognosis appears to be mediated by the increased expression of HOTAIR. However, this hypothesis needs to be tested further in BC tissue. Although not all of the survival results were verified in multivariate analysis, our results suggest that some gene loci may play a role in the occurrence and development of BC.
In summary, this study demonstrates, for the first time, that the functional HOTAIR SNPs rs920778 and rs4759314 are related to the risk and prognosis of BC in the northeastern Chinese population, suggesting that these two SNP sites may be involved in the occurrence, development, and metastasis of BC by regulating the expression of HOTAIR. This may have significance for the future diagnosis, drug development, and prognostic assessment of BC. The genotype frequency distribution of the three functional HOTAIR SNP loci is correlated with region and population. This study examined only the northeast Chinese population, and it therefore cannot explain how these three HOTAIR SNP loci contribute to the occurrence and development of BC in the overall Chinese population. A larger prospective, multicenter, multi-regional, multi-ethnic study is therefore needed to analyze the significance of HOTAIR SNPs in BC development and to find a treatment target.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary files. Further inquiries can be directed to the corresponding authors.