Significance of Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNAs

Single-nucleotide variants (SNVs) are the most common type of genetic variant and are found throughout the human genome. Genome-wide association studies (GWASs) have identified a large number of disease- or trait-associated variants, many of which are located in non-coding regions. Long intergenic non-protein coding RNAs (lincRNAs) are the major subtype of long non-coding RNAs and play crucial roles in various disorders and cellular models through multiple mechanisms. With the rapid growth in the number of identified lincRNAs and genetic variants, there is great demand for investigating SNVs in lincRNAs. In this article, we therefore summarize the significant roles of SNVs within human lincRNA regions. Some pivotal variants may serve as risk factors for the development of various disorders, especially cancer. They may also act as important regulatory signatures involved in the modulation of lincRNAs in a tissue- or disorder-specific manner. A growing body of research indicates that lincRNA variants could provide additional options for genetic testing and disease risk assessment in the era of personalized medicine.


INTRODUCTION
A single-nucleotide variant (SNV), also known as a single-nucleotide polymorphism (SNP), is a variation in a single nucleotide at a specific genomic position. It is the most common type of genetic variant and has long been documented at various loci across the genome (Human Genome Structural Variation Working Group, Eichler et al., 2007). Over the past few decades, genetic variants have typically been used to dissect complex human disorders through candidate-gene studies and, in particular, through genome-wide association studies (GWASs). A GWAS is an observational study of a genome-wide set of genetic variants in different individuals, performed to identify whether any variant is associated with a phenotype. As a representative large-scale variant analysis, it has provided an approach to identifying potential genetic variant loci associated with heterogeneous disorders, including cancer susceptibility (Freedman et al., 2011). Emerging technologies, such as microarray-based genotyping and high-throughput next-generation sequencing, offer a novel avenue for the clinical application of genetic variants (GTEx Consortium et al., 2017). As might be expected, genetic variants will become indispensable in the near future for understanding disease pathogenesis and therapeutic response, and ultimately for personalized medicine. Building on the International HapMap Project and the 1000 Genomes Project, great breakthroughs have been achieved in the research field of genetic variants, particularly for variants of protein-coding genes (Human Genome Structural Variation Working Group, Eichler et al., 2007; Genomes Project et al., 2015). However, genetic variants, especially SNVs, occur not only in protein-coding sequences; many of them also fall within non-coding regions, including the intergenic regions between genes.
For instance, a considerable genetic component has been confirmed to contribute to the susceptibility of various cancers, and the genomic contexts of cancer-associated SNVs (SNPs) have been analyzed within a comprehensive GWAS catalog. Of these risk variants, less than 10% map to protein-coding regions, whereas most are located in intronic or intergenic regions (Figure 1A). This raises the question of what these non-coding loci do and how important they are in cancer research (Hindorff et al., 2009).
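At its core, a case-control GWAS asks, for each variant, whether the risk allele is more frequent in cases than in controls. The sketch below shows one common form of that per-variant test: a Pearson chi-square on a 2 × 2 table of allele counts, together with the allelic odds ratio. It is a minimal illustration under stated assumptions: the counts are hypothetical, the names `AlleleCounts` and `allelic_test` are ours, and real GWAS pipelines typically use regression models that adjust for covariates and population structure and then apply a genome-wide significance threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Allele copies counted in one group (hypothetical example data)."""
    risk: int   # copies of the putative risk allele
    other: int  # copies of the alternative allele


def allelic_test(cases: AlleleCounts, controls: AlleleCounts) -> tuple[float, float, float]:
    """Return (odds_ratio, chi2, p_value) for a 2x2 allele-count table.

    Assumes all four cells are non-zero; no continuity correction is applied.
    """
    a, b = cases.risk, cases.other
    c, d = controls.risk, controls.other
    n = a + b + c + d
    # Allelic odds ratio: odds of the risk allele in cases vs. controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail p-value of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 allele copies per group.
or_, chi2, p = allelic_test(AlleleCounts(600, 400), AlleleCounts(500, 500))
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, p = {p:.1e}")
```

With these made-up counts the allelic odds ratio is exactly 1.5; a real association would additionally have to pass a multiple-testing threshold across all tested variants.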
The Human Genome Project (HGP) determined the complete sequence of nucleotide base pairs that make up the human genome and initially identified approximately 20,000 proteins that could serve as therapeutic targets (Venter et al., 2001). Subsequent large-scale annotation efforts, such as the Encyclopedia of DNA Elements (ENCODE) project, surprisingly identified hundreds of thousands of non-coding RNAs transcribed from sequences previously regarded as "junk DNA" (The Encode Project Consortium, 2012). Among these, large numbers of long non-coding RNAs (lncRNAs) are transcribed from mammalian genomes. Based on their locations and characteristics, lncRNAs can be placed into five broad categories: (1) intergenic, (2) antisense, (3) sense, (4) intronic, and (5) overlapping (Ponting et al., 2009; Derrien et al., 2012). Of these, long/large intergenic non-protein coding RNAs (lincRNAs), which are located within the genomic interval between two coding genes, are the major subtype, accounting for approximately 63% of lncRNAs (Figure 1B). Like other lncRNAs, about which relatively little is known, lincRNAs remain largely unexplored. About half of lincRNAs are transcribed from the vicinity (<10 kb) of protein-coding loci and are more likely to be involved in cis-regulation of the expression of adjacent genes; transcripts located far from any adjacent gene seem to have little chance of cis-regulation within the nearby region. Although lincRNAs rarely form triplexes with double-stranded DNA owing to their poor complementarity to sequences elsewhere in the genome, they often act as trans-regulatory players within ribonucleoprotein complexes (Ulitsky and Bartel, 2013). LincRNAs may thus have crucial roles in various disorders and cellular models via multiple mechanisms.
Alterations in lincRNA expression levels have been linked to the occurrence of various disorders, such as cancers, in which lincRNAs may act as tumor suppressors or proto-oncogenes (Huarte, 2015). Currently, advances in high-throughput RNA sequencing and computational approaches allow an unparalleled analysis of transcriptomes. Of the diverse kinds of RNA transcripts, lincRNAs are attractive because they can be identified from existing RNA-seq datasets using available bioinformatics methods (Cabili et al., 2011).
According to recent reports from the ENCODE project, thousands of variant loci are present in the non-coding regions of the human genome, and the total number continues to increase (Schaub et al., 2012). Genetic variants such as SNVs generally occur more frequently in non-coding loci than in conserved protein-coding regions. A large number of GWAS-identified SNV loci reside in regions that encode lincRNAs, suggesting that lincRNA variants may play a crucial role in disease susceptibility. More than three-quarters of disease-associated genetic variants overlap promoter or enhancer regions, suggesting that SNVs may be important players in the regulation of transcript levels (Hindorff et al., 2009). Therefore, identifying such variant loci and elucidating their biological functions would be of profound significance for understanding the etiology of disorders and for developing novel approaches to their diagnosis, prevention, and treatment.

LONG INTERGENIC NON-PROTEIN CODING RNA VARIANTS AND DISEASE SUSCEPTIBILITY
The occurrence of complex diseases (e.g., cancer) is related to multiple factors, including genetic, environmental, and lifestyle factors. Among them, genetic factors are of particular interest, as GWASs and next-generation sequencing studies have greatly broadened our understanding of the genetic variants that confer disease risk. Numerous genetic variants in lincRNA regions have been found to be associated with susceptibility to heterogeneous diseases, especially multiple types of cancer. Herein, we review some lincRNAs that encompass disease- or trait-associated variants (Tables 1, 2).

Long Intergenic Non-protein Coding RNA Variants on the chr8q24 Locus
Genome-wide association studies have pointed to the chr8q24 genomic locus as a hotspot for cancer-associated variants because of the high density, strong effects, and high allele frequencies of these variants (Yeager et al., 2007; Tuupanen et al., 2009). Chromosome 8q24 has been considered a "gene desert" owing to the absence of functionally annotated genes, with the notable exception of the frequently amplified MYC, a proto-oncogene involved in tumorigenesis (Chung et al., 2011). Surprisingly, large-scale studies have revealed that several lincRNAs are transcribed from the chr8q24 locus, such as CCAT1 (Kim et al., 2014), CCAT2 (Ling et al., 2013), PVT1 (Hanson et al., 2007), PCAT1 (Guo et al., 2016), and PRNCR1, all of which encompass multiple cancer-associated variants. For instance, lincRNA CCAT2 (Colon Cancer-Associated Transcript 2, also termed LINC00873), a transcript spanning SNV rs6983267, is associated with an increased risk of prostate, breast, colon, and colorectal cancers (Yeager et al., 2007; Tuupanen et al., 2009; Ling et al., 2013). CCAT2 is overexpressed in various types of cancer and may contribute to tumor growth, metastasis, and chromosomal instability by increasing MYC expression (Ling et al., 2013).

FIGURE 1 | (A) Genomic distribution of single-nucleotide variants in cancers; a majority of cancer-associated single-nucleotide variants (SNVs) [single-nucleotide polymorphisms (SNPs)] are found in the intergenic or intronic regions, and only small numbers are located in protein-coding regions of the human genome. (B) Classification of long non-coding RNA (lncRNA) transcripts; long intergenic non-protein coding RNA (lincRNA) is a major subtype of lncRNA.
LincRNA PRNCR1 has been reported to be involved in prostate carcinogenesis and may play an oncogenic role by modulating the androgen receptor (Chung et al., 2011). PRNCR1 variants, especially rs1456315, are associated with susceptibility to prostate and colorectal cancers (Teerlink et al., 2016). Through an integrative analysis of the lncRNA transcriptome and GWAS data, Guo et al. (2016) identified a prostate cancer-associated transcript, PCAT1, and 10 risk loci on chr8q24.21, including the PCAT1 variants rs10086908 and rs7463708, which are significantly associated with prostate cancer susceptibility. As for PVT1 (also termed LINC00079), a GWAS analysis identified its variants rs13255292 and rs4733601 as associated with susceptibility to diffuse large B cell lymphoma (Cerhan et al., 2014). Other independent SNVs mapped to PVT1 (e.g., rs2720709 and rs2648875) contribute to the development of end-stage renal disease (ESRD) in patients with type 2 diabetes (Hanson et al., 2007). A recent meta-analysis summarized the relationship of two common variants (rs10505477 and rs7837328) in the intronic region of CASC8 (LINC00860) at the 8q24 locus with the risk of cancers (Cui et al., 2018), including colorectal, gastric, and lung cancers (Hu et al., 2016). Another intronic locus, rs378854, is related to adiposity in individuals of African ancestry (Ng et al., 2017).

Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNA H19 Locus
H19 (also termed LINC00008), located on chromosome 11p15.5, is a paternally imprinted oncofetal gene that is typically down-regulated in adult tissues but can be overexpressed in multiple types of solid cancer. LincRNA H19 expression is closely related to tumor growth, metastasis, recurrence, and clinical prognosis (Ge et al., 2018). H19 variants are involved in susceptibility to multiple diseases. A meta-analysis indicated that the T allele of rs2107425 is correlated with a decreased risk of developing cancers (e.g., breast, ovarian, lung, and bladder cancers) (Chu et al., 2016; Wu et al., 2017), whereas rs2839698 is associated with an increased risk of digestive cancers (colorectal and gastric cancers) via up-regulation of H19 expression; of note, no significant association was observed between the rs217727 variant and cancer susceptibility (Chu et al., 2016). However, other reports have linked H19 rs217727 to the risk of hepatocellular carcinoma (HCC) (Ge et al., 2018), oral squamous cell carcinoma (OSCC), and bladder cancer in the Chinese population (Guo Q. Y. et al., 2017). For coronary artery disease (CAD), the T allele of rs217727 is associated with an increased risk, whereas the rs2067051 A allele is linked to a decreased risk (Gao et al., 2015). H19 rs217727, but not rs2107425, is associated with susceptibility to preeclampsia (PE) in women (Harati-Sadegh et al., 2018). Additionally, maternally transmitted fetal H19 variants (e.g., rs217727, rs2071094, and rs10732516), along with paternal IGF2 variants, are independently correlated with placental DNA methylation levels (Marjonen et al., 2018) and the birth weight of newborns (Petry et al., 2011).
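The meta-analyses cited for H19 (and for CASC8 above) combine per-study effect estimates into a single summary estimate. As an illustration only, the sketch below performs a fixed-effect, inverse-variance pooling of log odds ratios, assuming each study reports an odds ratio with a 95% confidence interval. The cited meta-analyses may have used other models (for example, random-effects models), the names `StudyEstimate` and `fixed_effect_meta` are hypothetical, and the study values in the usage line are invented.

```python
import math
from typing import NamedTuple


class StudyEstimate(NamedTuple):
    """Odds ratio with its 95% confidence interval, as reported by one study."""
    odds_ratio: float
    ci_low: float
    ci_high: float


Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def fixed_effect_meta(studies: list[StudyEstimate]) -> tuple[float, float, float, float]:
    """Pool odds ratios with inverse-variance weights on the log scale.

    Returns (pooled_or, ci_low, ci_high, cochran_q).
    """
    log_ors, weights = [], []
    for s in studies:
        # Back out the standard error of log(OR) from the reported 95% CI.
        se = (math.log(s.ci_high) - math.log(s.ci_low)) / (2 * Z_95)
        log_ors.append(math.log(s.odds_ratio))
        weights.append(1.0 / se**2)
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations, a simple heterogeneity check.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * pooled_se),
        math.exp(pooled + Z_95 * pooled_se),
        q,
    )


# Made-up per-study estimates for one variant (not data from the cited studies).
studies = [StudyEstimate(0.80, 0.65, 0.98), StudyEstimate(0.75, 0.60, 0.94)]
print(fixed_effect_meta(studies))
```

Pooling narrows the confidence interval relative to any single study; a large Cochran's Q would suggest that a random-effects model is more appropriate.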

Single-Nucleotide Variants in MALAT1 and MIAT Regions
LincRNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1, also termed LINC00047) harbors the rs619586 A > G variant, which is significantly associated with susceptibility to pulmonary arterial hypertension (PAH); carriers of G-allele genotypes have a decreased PAH risk (Zhuo et al., 2017). A recent study suggested that the rs619586 AG/GG genotypes could reduce the risks of coronary atherosclerotic heart disease and congenital heart disease (CHD) by regulating MALAT1 expression (Li et al., 2018b). Another report showed that MALAT1 is overexpressed in colorectal cancers and that SNV rs1194338, which maps to its promoter region, is significantly associated with a decreased risk of colorectal cancer. Moreover, large-scale case-control association studies identified a novel myocardial infarction-associated transcript, MIAT (also termed LINC00066), which encompasses rs2331291 and other variants that confer susceptibility to myocardial infarction (Ishii et al., 2006). MIAT, a component of the nuclear matrix, is mainly expressed in neurons; Rao et al. (2015) reported that SNV rs1894720 is correlated with susceptibility to paranoid schizophrenia and that MIAT may contribute to the pathogenesis of schizophrenia.
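Phrases such as "carriers of AG/GG genotypes" describe the genetic model under which an association was tested. The sketch below shows how an unphased genotype can be encoded under additive, dominant, and recessive models before it enters an association test. It assumes biallelic SNVs written as two-letter genotype strings, and the names `GeneticModel` and `code_genotype` are hypothetical.

```python
from enum import Enum


class GeneticModel(Enum):
    ADDITIVE = "additive"    # 0, 1 or 2 copies of the alternate allele
    DOMINANT = "dominant"    # carriers (heterozygous + homozygous alternate) vs. non-carriers
    RECESSIVE = "recessive"  # homozygous alternate vs. everyone else


def code_genotype(genotype: str, alt: str, model: GeneticModel) -> int:
    """Encode an unphased biallelic genotype such as 'AG' under a genetic model."""
    if len(genotype) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    dosage = genotype.count(alt)  # number of alternate-allele copies
    if model is GeneticModel.ADDITIVE:
        return dosage
    if model is GeneticModel.DOMINANT:
        return int(dosage >= 1)
    return int(dosage == 2)


# rs619586 A > G: the "AG/GG vs. AA" comparison is a dominant model for G.
for g in ("AA", "AG", "GG"):
    print(g, code_genotype(g, "G", GeneticModel.DOMINANT))
```

The same coding applies to the other genotype contrasts in this review, for example CT/TT versus CC (dominant for T) or AA versus all other genotypes (recessive for A).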

Other Long Intergenic Non-protein Coding RNA Variants in Human Cancers
In addition to the above lincRNAs, recent studies have identified many other cancer-associated variants within lincRNA regions. For example, the tissue differentiation-inducing non-protein coding RNA (TINCR), also termed LINC00036, is essential for somatic tissue differentiation and tumor progression (Kretz et al., 2013). Two TINCR variants (rs2288947 and rs8105637) have been shown to be significantly correlated with susceptibility to colorectal cancer and its lymph node metastasis, and the TINCR rs2288947 G allele and rs8113645 A allele genotypes may reduce the risk of gastric cancer. HULC (an HCC up-regulated lncRNA, also termed LINC00078) and its variants rs7763881 and rs1041279 are linked to HCC susceptibility. In thyroid carcinoma, several papillary thyroid carcinoma susceptibility candidates carry risk variants: PTCSC2 contains rs965513 and PTCSC3 encompasses rs944289, and the expression of both lincRNAs is strongly down-regulated in thyroid carcinoma tissues (Jendrzejewski et al., 2012; He et al., 2015). Additionally, GWAS analyses have identified five tag-SNVs, including rs944289 in PTCSC3, that are associated with large-vessel ischemic stroke (Lee et al., 2016). Xue et al. (2013) reported that the prostate cancer gene expression marker PCGEM1 (LINC00071) contains two SNVs (the rs6434568 C and rs16834898 A alleles) that are associated with a decreased risk of prostate cancer. Another prostate cancer risk-associated allele, rs75823044, which maps to the promoter of LINC00676, is found almost exclusively in populations of African ancestry (Conti et al., 2017). In a GWAS analysis, five common variants, including rs3803662 in the exon of CASC16 (LINC00918), were identified as contributing to susceptibility to lung and breast cancers (Orr et al., 2011). Furthermore, the colorectal cancer risk SNV rs11776042 is located in the promoter of LINC00964, a lincRNA whose expression is significantly decreased in colorectal cancer tissues.
For the tumor suppressor lncRNA GAS5, an insertion/deletion variant, rs145204276, is associated with susceptibility to HCC (Tao et al., 2015) and to colorectal and gastric cancers (Li et al., 2018a).

Other Disease-Associated Variants in Long Intergenic Non-protein Coding RNA Regions
Beyond cancer susceptibility, some lincRNA variants have been found to be associated with the risk of other heterogeneous diseases. GWAS and expression quantitative trait locus (eQTL) analyses have identified SNV rs1875147 as a risk factor for pathological inflammatory responses in leprosy; it is an eQTL variant for lincRNA LOC105378318, located on chromosome 10p21.2 (Fava et al., 2017). A 2011 GWAS analysis identified a major depressive disorder (MDD) risk-associated variant, rs12526133, which resides in an exon of LINC01108, a lincRNA that is overexpressed in patients with MDD. Moreover, variants of the maternally expressed imprinted gene MEG3 (also termed LINC00023), including rs941576 (Wallace et al., 2010) and rs34552516 (Westra et al., 2018), have been found to be associated with susceptibility to type 1 diabetes. Comprehensive GWAS meta-analyses by Nikpay et al. (2015) reported an association of CAD susceptibility with several SNVs, such as rs1870634, located downstream of LINC00841; its GG genotype is strongly linked to CAD risk and is more frequent in CAD patients.

For Clinicopathological Characteristics and Prognosis
In addition to disease susceptibility, trait-associated SNVs are widely used as indicators of clinicopathological characteristics, prognosis, and treatment response (Gong et al., 2017). For example, for the neuroblastoma-associated variant rs6939340, which maps to the intronic locus of lincRNAs CASC15 and NBAT1, individuals with neuroblastoma who carry the risk alleles are more likely to have a clinically aggressive presentation, including metastatic disease, tumors with MYCN amplification, and disease relapse (Maris et al., 2008). Two independent cohort studies observed that the PVT1 risk SNV rs2608053 is correlated with survival outcomes of patients with classical Hodgkin lymphoma (Ghesquieres et al., 2018). In multiple sclerosis, several risk loci of PVT1 may help predict an optimal response to treatment with glatiramer acetate (Kulakova et al., 2017). LincRNA H19 variants have been found to increase the risk of ischemic stroke, and up-regulated H19 may induce cerebral ischemia-reperfusion injury by activating autophagy. Recent studies reported that the H19 rs2839698 variant may serve as an indicator of increased risk and poor prognosis of HCC. For coal workers' pneumoconiosis (CWP), carriers of H19 rs2067051 CT/TT genotypes have a decreased risk, suggesting rs2067051 as a possible biomarker for CWP prevention. A case-control study showed that the lincRNA MALAT1 variant rs4102217 is related to increased HCC risk; this SNV may be a potential predictor of risk and prognosis for patients with HCC (Wang et al., 2018b). The MALAT1 rs3200401 T allele has also been found to confer better survival in patients with advanced lung adenocarcinoma.
Furthermore, TDRG1 (testis development related 1, also termed LINC00532) is overexpressed in esophageal squamous cell carcinoma (ESCC) tissues. The AA genotype of variant rs8506 is linked to an increased risk of ESCC, and this risk allele may regulate TDRG1 expression by disrupting miR-526b binding; high TDRG1 expression and the rs8506 A allele may contribute to advanced tumor-node-metastasis stage and poor survival in ESCC patients (Han et al., 2017). Recent GWAS analyses have demonstrated that variant rs11672691 of PCAT19 (LINC01190) on 19q13 is positively associated with aggressive prostate cancer. Further cohort studies have confirmed the association of rs11672691 with clinical characteristics of aggressive disease, including high tumor stage, prostate-specific antigen (PSA) progression, and development of castration-resistant prostate cancer (CRPC). The risk GG genotype of rs11672691 is also associated with poor prognosis in patients with prostate cancer. These results highlight the clinical potential of trait-associated SNVs as risk stratification markers in the management of cancer patients.

Indication of Treatment Response
Recent GWAS analyses have identified two common SNVs (rs4476990 and rs3802201) mapping to MIR2052HG that may affect the recurrence risk of breast cancer patients treated with aromatase inhibitors. The expression of MIR2052HG and estrogen receptor α (ERα, encoded by the ESR1 gene) is induced by aromatase inhibitors and estrogen in a variant-dependent manner. MIR2052HG sustains ERα levels by promoting AKT/FOXO3-mediated ESR1 transcription and limiting ubiquitin-mediated ERα degradation. Its risk variant genotypes enhance ERα binding to estrogen response elements and alter the response of cancer patients to aromatase inhibitor treatment (Ingle et al., 2016). In the evaluation of adverse reactions in lung cancer patients receiving platinum-based chemotherapy, the variants CASC8 rs10505477 (Hu et al., 2016) and ANRIL rs1333049 are correlated with overall toxicity, especially severe hematologic and gastrointestinal toxicity, and lincRNA MEG3 rs116907618 is correlated with severe gastrointestinal toxicity; these variants may be considered as biomarkers for evaluating platinum-based treatment (Gong et al., 2017). Moreover, the CASC8 rs10505477 GG genotype is also associated with tumor size, lymph node metastasis, and tumor-node-metastasis stage and may affect survival in patients with gastric cancer. In nasopharyngeal carcinoma (NPC), the lncRNA GAS5 variant rs2067079 is associated with an increased risk of severe myelosuppression and neutropenia, whereas rs6790 may decrease the incidence of toxic reactions induced by chemoradiotherapy in NPC patients (Guo Z. et al., 2017). Functional genomic studies have revealed that SNV rs55829688 (T > C) in the GAS5 promoter up-regulates GAS5 expression through interaction with the transcription factor TP63 and may aggravate myelosuppression, resulting in a poor prognosis for patients with acute myeloid leukemia (AML) (Yan et al., 2017).
Additionally, GWAS analyses have identified genetic variants correlated with the pharmacokinetics of psychotropic drugs, such as rs16935279 in an intron of LINC01592, whose C allele carriers have a lower metabolic rate for antiepileptic drugs (Athanasiu et al., 2015).

LONG INTERGENIC NON-PROTEIN CODING RNA VARIANTS REGULATE GENE TRANSCRIPTION
Genome-wide association studies have identified many trait-associated variants, most of which reside in non-coding regions of the human genome. However, the functional mechanisms of these variants remain unclear, which is one of the major challenges for post-GWAS research (Schaub et al., 2012). Regulatory elements are mainly located within non-coding DNA and play critical roles in the transcription of target genes. Emerging studies have shown that these regulatory elements can affect the expression of lincRNAs and other related genes through long-range chromatin interactions in a cell-type- or tissue-specific manner. Many genetic variants reside in the regulatory regions of lincRNAs and may disrupt the binding of transcription factors to the regions containing the SNVs (Figures 2A,B). The mapping of SNVs to lincRNA regulatory regions (especially promoters and enhancers) may therefore indicate a potential impact of these variants on the transcription of target genes (GTEx Consortium et al., 2017).

Single-Nucleotide Variants in Super-Enhancer Locus of MYC Gene
Many genetic variants associated with susceptibility to multiple cancers are located upstream of MYC, in the 8q24 gene desert. Several lines of evidence, including chromosome conformation capture (3C) assays and analyses of histone acetylation and methylation marks, have demonstrated that these SNV-containing regulatory regions may serve as tissue-specific enhancers of the MYC gene. Functional investigations suggest that lincRNA CCAT2 augments the binding of the transcription factor TCF7L2 (also known as TCF4) to the MYC promoter region, activates WNT signaling, and increases the expression of target genes, especially the MYC proto-oncogene (Pomerantz et al., 2009; Ling et al., 2013). Although the association between rs6983267 and MYC expression is disputed (Tuupanen et al., 2009), its risk G allele produces more CCAT2 transcripts, which are retained exclusively in the nucleus. Interestingly, the risk SNV rs6983267 also contributes to increased expression of CCAT1, a lincRNA adjacent to CCAT2 (Zhao et al., 2016), by affecting long-range chromosomal interactions between the MYC enhancer and the CCAT1 promoter, which in turn influence cell-cycle regulation and tumor development (Kim et al., 2014). Guo et al. (2016) reported that the prostate cancer risk-associated T allele of rs7463708 at lincRNA PCAT1 exhibits enhancer activity by modulating the binding of a novel transcription factor, ONECUT2, to a distal enhancer that loops to the PCAT1 promoter; this process increases PCAT1 expression upon prolonged androgen treatment and promotes prostate transformation. Moreover, the G allele of another prostate cancer risk SNV, rs378854, has been found to increase expression of the PVT1 oncogene by regulating the interaction of the transcription factor YY1 with the promoters of the PVT1 or MYC genes (Meyer et al., 2011). Similarly, the GG genotype of rs13281615 increases PVT1 transcription and promotes cell proliferation in breast cancer (Zhang et al., 2014).
Overexpression of PVT1 may contribute to high levels of MYC mRNA and protein, along with increased copy number, eventually leading to tumorigenesis (Zou et al., 2017). These results demonstrate an association between genetic variants and lincRNA transcription, although further studies are needed to clarify the relationships between these SNVs and lincRNAs at the chromosome 8q24 locus.

Single-Nucleotide Variants in Promoter Regions
Some SNVs reside in gene promoter regions and may influence the transcription of their target genes. Through an eQTL analysis of candidate genes and genetic variants in different tissues, the endometriosis risk SNV rs3820282 was found to down-regulate LINC00339 expression by affecting LINC00339 promoter activity (Powell et al., 2016). Tao et al. (2015) reported that the indel variant rs145204276 in the GAS5 promoter region up-regulates GAS5 by affecting the methylation status of the promoter and regulating its transcriptional activity, thereby enabling a proto-oncogenic role for GAS5. Furthermore, the PTCSC3 variant rs944289 reportedly resides in a binding site for CCAAT/enhancer binding proteins (C/EBPα/β); this variant may affect PTCSC3 promoter activity and down-regulate its transcription, resulting in abnormal expression of downstream genes and the progression of papillary thyroid carcinoma (Jendrzejewski et al., 2012). Notably, a gene promoter region may overlap with a super-enhancer locus, suggesting that it may also have enhancer-like roles. In such interactions, lincRNA loci may serve both as target genes of their SNVs and as distal regulatory elements of other related genes. Integrative functional genomic and epigenomic analyses identified the osteoporosis risk-associated SNV rs6426749 as a distal, variant-specific enhancer that plays a pivotal role in bone metabolism. The risk G allele of rs6426749 affects enhancer activity by binding the transcription factor TFAP2A; this process may increase LINC00339 transcription and modulate the expression of downstream genes via long-range chromatin loop formation in osteoblast cells.
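An eQTL analysis such as the one that linked rs3820282 to LINC00339 expression tests, at its simplest, whether expression changes linearly with the number of alternate alleles an individual carries. The sketch below fits that additive model by ordinary least squares and reports the slope and its t-statistic. The data and the function name `eqtl_fit` are hypothetical, and published eQTL studies typically also adjust for covariates and correct for multiple testing, which is omitted here.

```python
import math
from statistics import fmean


def eqtl_fit(dosages: list[float], expression: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of expression = alpha + beta * dosage.

    dosage is the alternate-allele count (0, 1 or 2) per individual.
    Returns (alpha, beta, t_beta), where t_beta = beta / SE(beta).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = fmean(dosages), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, expression))
    se_beta = math.sqrt(ssr / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite statistic then.
    t_beta = beta / se_beta if se_beta > 0 else math.copysign(math.inf, beta)
    return alpha, beta, t_beta


# Hypothetical normalized expression for six individuals with 0, 1 or 2 alt alleles.
alpha, beta, t = eqtl_fit([0, 0, 1, 1, 2, 2], [5.1, 4.9, 4.0, 4.2, 3.0, 2.8])
print(f"beta = {beta:.2f} per allele, t = {t:.1f}")
```

A negative slope, as in these made-up values, corresponds to the kind of allele-dependent down-regulation described for rs3820282; a p-value would come from a t-distribution with n - 2 degrees of freedom.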
Recent studies have reported that the prostate cancer risk-associated G allele of rs11672691 is associated with increased expression of lincRNA PCAT19 and the oncogene CEACAM21; rs11672691 is located in an enhancer element and may alter a binding site for the oncogenic transcription factor HOXA2. CRISPR/Cas9-mediated interference and activation assays have demonstrated that the rs11672691 variant regulates the expression of its eQTL genes PCAT19 and CEACAM21 and affects the aggressiveness of prostate cancer cells. In an alternative mechanism, the risk variant rs11672691 is associated with decreased levels of a short isoform of PCAT19 (PCAT19-short) and increased levels of a long isoform (PCAT19-long). This risk SNV locus is bifunctional, with both promoter and enhancer activity: it maps to the promoter of PCAT19-short and to the third intron of PCAT19-long. The rs11672691 risk allele and rs887391, a SNV in linkage disequilibrium with it, may alter the binding of the transcription factors NKX3.1 and YY1, thereby elevating the abundance of PCAT19-long through promoter-enhancer switching. Ultimately, this increases formation of the HNRNPAB-PCAT19-long complex, which activates a subset of cell-cycle genes and promotes prostate cancer aggressiveness (Hua et al., 2018).
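Describing rs887391 as being in linkage disequilibrium (LD) with rs11672691 means that alleles at the two sites are co-inherited more often than expected by chance. The sketch below computes the standard pairwise LD measures D, D′, and r² from counts of phased two-SNV haplotypes. It assumes phased data and uses hypothetical counts and a hypothetical function name; in practice such values are usually estimated from reference panels such as those of the 1000 Genomes Project.

```python
def linkage_disequilibrium(counts: dict[str, int]) -> tuple[float, float, float]:
    """Return (D, D_prime, r_squared) from phased two-SNV haplotype counts.

    Keys are haplotypes: 'A'/'a' are the alleles at SNV 1, 'B'/'b' at SNV 2.
    """
    total = sum(counts[h] for h in ("AB", "Ab", "aB", "ab"))
    hap_AB = counts["AB"] / total                   # frequency of haplotype AB
    freq_A = (counts["AB"] + counts["Ab"]) / total  # frequency of allele A
    freq_B = (counts["AB"] + counts["aB"]) / total  # frequency of allele B
    denom = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    if denom == 0:
        raise ValueError("both SNVs must be polymorphic")
    # D: excess of the AB haplotype over independent assortment.
    d = hap_AB - freq_A * freq_B
    # D' rescales D by its largest possible magnitude given the allele frequencies.
    if d > 0:
        d_max = min(freq_A * (1 - freq_B), (1 - freq_A) * freq_B)
    else:
        d_max = min(freq_A * freq_B, (1 - freq_A) * (1 - freq_B))
    return d, d / d_max, d * d / denom


# Hypothetical haplotype counts showing strong but incomplete LD.
print(linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
```

D′ reflects how rarely the two sites have been separated by recombination, whereas r² reflects how well one SNV predicts the other, which is why r² is usually the measure used to pick tag SNVs and to flag candidate functional variants near a GWAS signal.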
Another causal cis-regulatory mechanism has been established through integrative genomic analyses: the breast cancer-associated variant rs4415084 is located in a GATA3-binding motif of LINC02224, which leads to differential GATA3 binding and chromatin accessibility, thereby promoting the transcription of the LINC02224 and MRPS30 genes. It is reasonable to postulate that interactions among lincRNAs, trait-associated variants, and regulatory factors may contribute to the development of specific disorders.

SINGLE-NUCLEOTIDE VARIANTS AFFECT THE BIOLOGICAL FUNCTION OF LONG INTERGENIC NON-PROTEIN CODING RNA
Currently, genetic variants in lincRNA regions are attracting increasing interest, and many SNVs have been established to be associated with susceptibility to multiple diseases. The expression and function of lincRNAs may be influenced by their SNVs in a cell-type- or tissue-specific manner. A comprehensive analysis suggested that genetic variants in lincRNA regions may also affect splicing and the stability of lincRNA conformation, thereby modifying their interacting partners, as shown in Figures 2C-E (Hon et al., 2017).

Effect of Single-Nucleotide Variants on the Role of Long Intergenic Non-protein Coding RNA CCAT2
Several observations, such as eQTL and DNase peak analyses, indicate that genetic variants in lincRNA exons may change lincRNA secondary structure, thereby affecting its stability, interactive properties, and regulatory functions (Khurana et al., 2016). For example, lincRNA CCAT2 can act as a scaffold or assembly platform and modulate the alternative splicing of glutaminase (GLS) pre-mRNA by directly binding the Cleavage Factor I (CFIm) complex. SNV rs6983267 (G/T) may affect the interaction of CCAT2 with the CFIm complex by changing lincRNA secondary structure and initiating a domino effect; this process leads to allele-specific reprogramming of cellular energy metabolism in colon cancer (Redis et al., 2016). Moreover, using allele-specific CCAT2 transgenic mice, Shah et al. (2018) recently revealed that overexpression of CCAT2 may lead to genomic instability and myeloid malignancies; rs6983267 allele-specific RNA editing induces genome-wide dysregulation of gene expression by down-regulating EZH2, a histone-lysine N-methyltransferase, which in turn impairs immune processes and leads to the development of myelodysplastic neoplasms in vivo. In another study, Sur et al. (2012) generated mice lacking a Myc enhancer region spanning the risk SNV rs6983267; the mutant mice did not show an overt phenotype but were resistant to intestinal tumorigenesis induced by the Apc Min mutation. These studies indicate that cancer risk-associated variants identified in the human genome may also exert functional effects in animal models in vivo.

Effect of Single-Nucleotide Variants on the Long Intergenic Non-protein Coding RNA Secondary Structure
Notably, lincRNAs are relatively long on average and their exons contain numerous trait-associated variants, so exonic SNVs may significantly alter lincRNA secondary structure. Many such variants lie in exon regions; for example, rs1456315 G/A in PRNCR1 (prostate cancer-associated non-coding RNA) has been predicted to affect the secondary structure of PRNCR1 (Chung et al., 2011), which could in turn alter lincRNA stability and conformation and even modify its interacting partners. Xue et al. (2015) also reported that the exonic SNV rs7958904 G/C does not affect the transcriptional activity of HOTAIR, although in silico analyses predicted that it alters the RNA secondary structure of HOTAIR. These findings indicate that genetic variants, especially exonic SNVs, may exert distinct effects by altering lincRNA structure.
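Published predictions of this kind rely on free-energy folding tools such as RNAfold. The sketch below is a deliberately simplified stand-in: it uses the Nussinov base-pair maximization algorithm, which counts nested Watson-Crick and G-U pairs and ignores stacking energies, to illustrate how a single exonic substitution can reduce the pairing capacity of a hairpin. The sequences are hypothetical toy hairpins, not PRNCR1 or HOTAIR sequences, and the three-nucleotide minimum hairpin loop is an assumed parameter.

```python
# Simplified sketch: Nussinov base-pair maximization (NOT a free-energy model).
# The sequences used below are hypothetical toy hairpins, not real lincRNA sequences.

# Canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in `seq`.

    Paired bases must enclose at least `min_loop` unpaired nucleotides
    (an assumed minimum hairpin loop size).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of nested pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, splitting the interval into two parts
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return `seq` with the base at 0-based position `pos` replaced by `alt`."""
    return seq[:pos] + alt + seq[pos + 1 :]


if __name__ == "__main__":
    ref = "GGGAAAUCCC"               # hypothetical reference-allele hairpin
    alt = substitute(ref, 1, "A")    # hypothetical G>A SNV at position 2 (1-based)
    print(ref, max_base_pairs(ref))  # GGGAAAUCCC 3
    print(alt, max_base_pairs(alt))  # GAGAAAUCCC 2
```

Because base-pair counting ignores thermodynamics, it can only hint at structural disruption; energy-based predictions, such as those from RNAfold, remain the appropriate tool for analyses like those cited above.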

Effect of Single-Nucleotide Variants on MicroRNA Binding
Not surprisingly, it has been documented that some microRNAs (miRNAs) can function in a non-canonical manner to regulate lincRNA expression levels or to interact directly with lincRNA molecules. In the competing endogenous RNA (ceRNA) mechanism, an lncRNA competitively binds, or "sponges," miRNAs. MALAT1 is one such ceRNA: its exonic variant rs619586 A > G can significantly up-regulate the expression of XBP1 (X-box-binding protein 1) by sponging miR-214, thereby suppressing the proliferation and migration of vascular endothelial cells in vitro (Zhuo et al., 2017). In another case, the exonic variant rs11752942 of LINC00951 is linked to susceptibility to esophageal squamous cell carcinoma (ESCC); the risk G allele of rs11752942 may decrease LINC00951 expression by affecting the binding of miR-149-3p, thereby regulating cell proliferation and tumor growth. Intriguingly, recent studies have demonstrated that the pancreatic cancer risk SNV rs11655237 G > A in a LINC00673 exon is likely to create a binding site for miR-1231, reducing LINC00673 function in an allele-specific manner. Down-regulation of LINC00673 may attenuate the interaction of PTPN11 with the E3 ubiquitin ligase PRPF19 and suppress ubiquitin-mediated PTPN11 degradation; these processes enhance oncogenic signaling while diminishing STAT1-dependent anti-oncogenic signaling in cancer cells (Zheng et al., 2016). These findings highlight variant-specific regulatory relationships between miRNAs and lincRNAs and open a broad field for future lincRNA research.
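The allele-specific creation of a miRNA binding site, as proposed for rs11655237, can be illustrated with a simple seed-match scan. The sketch below assumes the canonical 7mer-m8 site (a target stretch complementary to miRNA nucleotides 2-8) and ignores other site types, 3′ pairing, and site context, all of which dedicated target-prediction tools take into account. The miRNA and lincRNA sequences are hypothetical and are not the real miR-1231 or LINC00673 sequences.

```python
from __future__ import annotations

# Sketch: does an SNV create (or destroy) a canonical 7mer-m8 miRNA seed match?
# All sequences are hypothetical; they are NOT the real miR-1231 or LINC00673 sequences.

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Target-site sequence (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]                               # nucleotides 2-8, 1-based
    return seed.translate(_RNA_COMPLEMENT)[::-1]    # reverse complement


def find_sites(target: str, site: str) -> list[int]:
    """0-based start positions of exact matches to `site` in `target`."""
    w = len(site)
    return [i for i in range(len(target) - w + 1) if target[i : i + w] == site]


def allele_site_positions(ref_seq: str, pos: int, alt: str, mirna: str) -> tuple[list[int], list[int]]:
    """Seed-match positions in the reference and alternate-allele transcripts."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1 :]
    site = seed_site(mirna)
    return find_sites(ref_seq, site), find_sites(alt_seq, site)


if __name__ == "__main__":
    mirna = "UCGAUCCAAGGUCAUGCAGUCA"   # hypothetical mature miRNA
    ref_seq = "AAACUGGAUCAAACC"        # hypothetical lincRNA exon fragment
    print(seed_site(mirna))                                 # UGGAUCG
    print(allele_site_positions(ref_seq, 10, "G", mirna))   # ([], [4])
```

Here the hypothetical A > G substitution completes the seed match, so a site appears only in the alternate-allele transcript; a variant that destroys a site would instead show a match only in the reference sequence.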

APPROACHES FOR IDENTIFYING DRIVERS
As summarized above, genetic variants play an important role in the transcription and biological function of lincRNAs, contributing to disease susceptibility, progression, prognosis, and treatment response. Genetic variants may therefore act as driving factors that shape the role of lincRNAs: just as a driver steers a vehicle, lincRNA variants may serve as putative drivers that regulate lincRNA molecules.

Computational Approaches
Identifying drivers is challenging because of their complex and diverse modes of action and our limited understanding of non-coding regions; computational prediction of non-coding drivers is even more difficult than that of protein-coding drivers. In addition, non-coding variants far outnumber protein-coding variants, so the key variants with functional significance must be distinguished from a much larger set of passenger events (Khurana et al., 2016). Several online databases have been constructed to describe genomic variants in lncRNA regions, such as lincSNP, lncRNASNP2, and LncVar. More specifically, lincSNP 2.0 is an integrated database for identifying and annotating disease-associated SNVs in human lincRNAs and their transcription factor binding sites (Ning et al., 2017). LncRNASNP2 is an updated database of comprehensive information about SNVs and mutations in human and mouse lncRNAs, as well as their impacts on lncRNA structure and their potential effects on miRNA binding (Miao et al., 2018). LncVar catalogs genetic variants associated with lncRNAs in multiple species and their effects on the biological function of lncRNAs. Furthermore, a large number of GWASs have identified an array of genetic variants associated with various human disorders (MacArthur et al., 2017). Numerous widely used public databases provide comprehensive descriptions of genetic variants and GWAS data in the human genome (Genomes Project et al., 2015). A brief overview of these databases, with their key features and corresponding references, is presented in Table 3.
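At their core, annotation resources of this kind map each SNV onto lincRNA gene models. The minimal sketch below, under assumed conventions (1-based inclusive coordinates, a hypothetical gene model, and an arbitrary 2 kb promoter window), labels an SNV as exonic, intronic, promoter, or intergenic relative to a single lincRNA; it does not reproduce the logic of lincSNP, lncRNASNP2, or LncVar.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LincRNAModel:
    """Hypothetical lincRNA gene model; coordinates are 1-based and inclusive."""

    name: str
    chrom: str
    strand: str                          # "+" or "-"
    exons: tuple[tuple[int, int], ...]   # sorted, non-overlapping (start, end)


def classify_snv(chrom: str, pos: int, gene: LincRNAModel, promoter_window: int = 2000) -> str:
    """Label an SNV as 'exonic', 'intronic', 'promoter', or 'intergenic' relative to `gene`.

    `promoter_window` (bp upstream of the transcription start site) is an
    assumed cutoff chosen for illustration, not a community standard.
    """
    if chrom != gene.chrom:
        return "intergenic"
    gene_start, gene_end = gene.exons[0][0], gene.exons[-1][1]
    if any(start <= pos <= end for start, end in gene.exons):
        return "exonic"
    if gene_start <= pos <= gene_end:
        return "intronic"
    # The TSS is the leftmost coordinate on "+" and the rightmost on "-".
    if gene.strand == "+" and gene_start - promoter_window <= pos < gene_start:
        return "promoter"
    if gene.strand == "-" and gene_end < pos <= gene_end + promoter_window:
        return "promoter"
    return "intergenic"


if __name__ == "__main__":
    gene = LincRNAModel("LINC_HYPOTHETICAL", "chr8", "+", ((10_000, 10_500), (12_000, 12_800)))
    for pos in (10_200, 11_000, 9_000, 5_000):
        print(pos, classify_snv("chr8", pos, gene))
    # 10200 exonic / 11000 intronic / 9000 promoter / 5000 intergenic
```

A genome-wide version would index many gene models (for example with an interval tree), but the per-variant decision remains the same position-based classification.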
Functional annotation and linkage disequilibrium analyses of genetic variants can be performed using public databases and bioinformatic methods. For tag SNVs and variants in strong linkage disequilibrium with them, eQTL analysis can reveal significant genotype-specific effects on lincRNA expression (GTEx Consortium et al., 2017). Trait-associated SNVs can then be prioritized using ChIP-seq data from the ENCODE database; variants mapping to cis-regulatory motifs may affect the binding of many interrelated transcription factors and chromatin regulators, including EZH2, CHD1, TCF7L2, and CTCF. These factors may be closely related to the occurrence and progression of various human disorders; for example, enhancer of zeste homolog 2 (EZH2) is overexpressed in several human tumors and contributes to the aggressiveness and unfavorable prognosis of various cancers.
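The two statistics underlying this workflow can be sketched in a few lines. Below, linkage disequilibrium between two biallelic SNVs is summarized as r² computed from phased haplotypes, and an eQTL effect is estimated by an ordinary least-squares regression of lincRNA expression on allele dosage (0, 1, or 2 copies of the alternate allele), an additive model assumed here for simplicity. The input values are hypothetical; published eQTL pipelines such as GTEx additionally adjust for covariates and multiple testing, which this sketch omits.

```python
from __future__ import annotations

import math


def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Linkage disequilibrium r^2 between two biallelic SNVs.

    Each haplotype is a phased pair (allele at SNV 1, allele at SNV 2),
    coded 0 = reference and 1 = alternate.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # alt-allele frequency, SNV 1
    p_b = sum(b for _, b in haplotypes) / n          # alt-allele frequency, SNV 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNVs must be polymorphic in the sample")
    return d * d / denom


def eqtl_fit(dosage: list[int], expression: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of expression = intercept + slope * dosage.

    Returns (slope, intercept, t statistic of the slope with n - 2 df).
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("all samples carry the same genotype")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    if se_slope == 0:
        t_stat = math.copysign(math.inf, slope) if slope != 0 else math.nan
    else:
        t_stat = slope / se_slope
    return slope, intercept, t_stat


if __name__ == "__main__":
    # Hypothetical phased haplotypes for a tag SNV and a nearby lincRNA SNV.
    print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0 (perfect LD)
    # Hypothetical lincRNA expression for six samples with dosage 0, 1, or 2.
    slope, intercept, t_stat = eqtl_fit([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
    print(f"slope={slope:.2f} intercept={intercept:.2f} t={t_stat:.1f}")
```

A high r² means the tag SNV can stand in for the lincRNA variant in association tests, while a large positive or negative slope indicates that each copy of the alternate allele shifts lincRNA expression.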

Function Verification
Many functional verification studies of genetic variants have focused on protein-coding regions of the human genome. With a growing appreciation that non-coding variants play a crucial role in the development of disorders, several recent studies have explored approaches for evaluating the function of non-coding variants (Khurana et al., 2016). For example, experimental methods for understanding how cis-regulatory variants within a promoter or enhancer affect cellular biological functions can be summarized as follows. First, the sequence variant must be introduced: the mutated DNA fragment can be constructed by site-directed mutagenesis, the CRISPR-Cas system (Konermann et al., 2015), or oligonucleotide synthesis. Next, the functional output of the non-coding variant is measured, either with luciferase reporter assays or with high-throughput sequencing-based assays such as cis-regulatory element analysis by sequencing (CRE-seq) (Kwasnieski et al., 2014) and self-transcribing active regulatory region sequencing (STARR-seq) (Arnold et al., 2013). Functional verification is then required to determine direct biological significance, such as oncogenic properties, which can be demonstrated through cancer cell experiments (e.g., cell proliferation, cell cycle, cell death, migration, and invasion assays) together with in vivo models. Other approaches are needed to demonstrate the effects of genetic variants within introns, exons, or intergenic regions. For instance, variants mapping to lincRNA exons may alter the lincRNA secondary structure, which can be partly predicted using the RNAfold web server (Hofacker and Stadler, 2006).
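For the luciferase reporter assays mentioned above, a common analysis normalizes firefly luciferase signal to a co-transfected Renilla control and then compares allele-specific constructs. The sketch below assumes that dual-luciferase design, uses hypothetical readings, and reports the alt/ref fold change with a Welch t statistic; converting the statistic to a p value (for example with `scipy.stats.ttest_ind(..., equal_var=False)`) is left out to keep the example dependency-free.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def normalized_activity(firefly: list[float], renilla: list[float]) -> list[float]:
    """Firefly/Renilla ratio for each replicate well (Renilla = transfection control)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired by well")
    return [f / r for f, r in zip(firefly, renilla)]


def compare_alleles(ref: list[float], alt: list[float]) -> tuple[float, float]:
    """Return (alt/ref fold change, Welch t statistic) for normalized activities.

    Requires at least two replicates per allele so that the sample SD is defined.
    """
    m_ref, m_alt = mean(ref), mean(alt)
    se = math.sqrt(stdev(ref) ** 2 / len(ref) + stdev(alt) ** 2 / len(alt))
    return m_alt / m_ref, (m_alt - m_ref) / se


if __name__ == "__main__":
    # Hypothetical triplicate readings for reference- and alternate-allele constructs.
    ref = normalized_activity([2000, 2200, 1800], [1000, 1000, 1000])
    alt = normalized_activity([1000, 1100, 900], [1000, 1000, 1000])
    fold, t_stat = compare_alleles(ref, alt)
    print(f"alt/ref = {fold:.2f}, Welch t = {t_stat:.2f}")  # alt/ref = 0.50, Welch t = -7.75
```

Welch's form is used here because the two constructs need not share the same variance; with only a few replicates per allele, any such test has limited power.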
Variants in the 5′ untranslated region (5′ UTR) may affect splicing and the stability of RNA conformation; a functional splicing reporter minigene assay can be used to assess the effect of such variants on RNA splicing (Giorgi et al., 2015). Comprehensive functional verification of non-coding variants with the strategies above is essential for understanding their biological consequences, and more practical methods and strategies for functional verification are urgently needed.
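Minigene results are often summarized as percent spliced in (PSI), the fraction of transcripts that include the exon of interest, estimated from exon-inclusion and exon-skipping RT-PCR products. The sketch below computes PSI per allele and the allelic difference (ΔPSI) from hypothetical band intensities. The optional length correction assumes that band signal scales with product length, as with intercalating-dye staining; whether it applies depends on the detection method.

```python
from __future__ import annotations


def percent_spliced_in(inclusion: float, skipping: float,
                       inclusion_len: int | None = None,
                       skipping_len: int | None = None) -> float:
    """PSI (0-1): fraction of minigene transcripts that include the exon.

    If both product lengths (bp) are given, band intensities are divided by
    length first, assuming signal scales with product length.
    """
    if inclusion_len is not None and skipping_len is not None:
        inclusion, skipping = inclusion / inclusion_len, skipping / skipping_len
    total = inclusion + skipping
    if total <= 0:
        raise ValueError("no splicing product detected")
    return inclusion / total


def delta_psi(ref_bands: tuple[float, float], alt_bands: tuple[float, float]) -> float:
    """PSI(alt) - PSI(ref); negative values mean the alt allele favors exon skipping."""
    return percent_spliced_in(*alt_bands) - percent_spliced_in(*ref_bands)


if __name__ == "__main__":
    # Hypothetical (inclusion, skipping) band intensities for each allele.
    ref_bands, alt_bands = (800.0, 200.0), (300.0, 700.0)
    print(f"PSI ref = {percent_spliced_in(*ref_bands):.2f}")  # PSI ref = 0.80
    print(f"PSI alt = {percent_spliced_in(*alt_bands):.2f}")  # PSI alt = 0.30
    print(f"dPSI = {delta_psi(ref_bands, alt_bands):.2f}")    # dPSI = -0.50
```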

PERSPECTIVES AND DISCUSSION
Single-nucleotide variants are the most common genetic variants and are universally present in the human genome, including non-coding regions. One current view is that heterogeneous diseases and traits (e.g., cancer susceptibility) may result from the accumulation of multiple driver genetic variations. GWASs have identified a large number of disease- or trait-associated SNVs, many of which are located in non-coding regions of the human genome. The functions of most of these variants are unknown and remain to be elucidated. A widely shared view is that the significance of lincRNA variants depends on their genomic position. Certain SNVs located in regulatory regions of lincRNA genes have been found to affect transcription factor binding efficiency and may thereby regulate the transcription of lincRNAs and other related genes in a cell-type- or tissue-specific manner. SNVs mapping to the lincRNA transcript itself may affect splicing and the conformational stability of the lincRNA or modulate its secondary structure; these effects may alter the interactive properties and regulatory functions of the lincRNA (Khurana et al., 2016). Collectively, these findings indicate that genetic variants in lincRNA regions may serve as regulatory signatures of early events, which may explain the genomic basis of differential lincRNA expression in a tissue- or disorder-specific manner.
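Whether an SNV in a regulatory region alters transcription factor binding is often first assessed by scoring a position weight matrix (PWM) against the reference and alternate sequences. The sketch below builds a log2-odds PWM from hypothetical per-position base counts (with an assumed pseudocount and a uniform background) and reports the best score on either DNA strand for each allele; the motif and sequences are illustrative only and do not represent any factor named in this review.

```python
from __future__ import annotations

import math

BASES = "ACGT"
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: list[dict[str, int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_motif_score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Highest PWM score over all windows on both strands of a DNA sequence."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, seq.translate(_DNA_COMPLEMENT)[::-1]):
        for i in range(len(strand) - width + 1):
            best = max(best, sum(pwm[k][strand[i + k]] for k in range(width)))
    return best


if __name__ == "__main__":
    # Hypothetical motif with consensus TGAC (10 observations per position).
    pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}])
    ref_seq = "AATGACAA"   # hypothetical reference-allele context
    alt_seq = "AATGCCAA"   # same context carrying an A>C SNV
    print(f"{best_motif_score(ref_seq, pwm):.2f}")  # 7.23
    print(f"{best_motif_score(alt_seq, pwm):.2f}")  # 2.84
```

A drop in the best score for the alternate allele suggests, but does not prove, weaker binding; ChIP-seq evidence and reporter assays are still needed to confirm the effect.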
Given their important regulatory roles, lincRNAs may serve as promising biomarkers for the diagnosis, prognosis, and treatment response of various diseases (Zou et al., 2018). Because of their tissue and disease specificity, lincRNAs may be explored as target molecules for personalized medicine in the future (Huarte, 2015). Currently, molecularly targeted drugs approved by the US Food and Drug Administration (FDA) mainly act on proteins. However, because the number of druggable protein-coding genes in the human genome is limited, expanding the pool of potentially druggable targets may require including lncRNA molecules. A lincRNA PCA3-based test (the FDA-approved PROGENSA PCA3 assay) has been used as a marker for the detection of prostate cancer (Evaluation of Genomic Applications in Practice and Prevention [EGAPP] Working Group, 2014). Moreover, RNA-targeting therapeutics offer a treatment strategy distinct from classical small molecules and antibodies, which mainly target proteins: they use oligonucleotides to target disease-related RNA directly. Two major approaches are used to target RNA: double-stranded RNA-mediated interference (RNAi) and antisense oligonucleotides (ASOs) (Kole et al., 2012). Both approaches have entered clinical trials. Nusinersen (Spinraza), an ASO drug for spinal muscular atrophy (SMA), has been approved by the FDA (Finkel et al., 2017), and patisiran, an RNAi therapeutic for hereditary transthyretin amyloidosis (hATTR), has also shown promising results (Adams et al., 2018). Hence, lincRNA molecules can be expected to provide additional options for RNA therapeutics. Importantly, disease-associated variants are enriched in non-coding regions, which encompass enhancers, promoters, and other regulatory elements.
The role of genetic variants in lincRNA regions likely needs to be characterized at the level of regulatory networks. Genetic variants may make it possible to link information from adjacent protein-coding or non-coding regions to heterogeneous diseases. Therefore, a combination of SNVs, lincRNAs, and proteins may bring personalized medicine closer to clinical application in the foreseeable future (Li et al., 2019).
Previous studies appear to have been somewhat biased against genetic variants located in non-coding regions, whose roles have not been explored to the same extent as those of protein-coding genes. In particular, for disease-associated variants in lincRNA regions, risk-variant-induced changes in lincRNA function or expression may contribute to disease development and pathogenesis. Verifying these mechanisms requires a detailed understanding of lincRNA structure and function, as well as suitable experimental systems that can distinguish the subtle differences caused by genetic variants. Because the consequences of genetic variants in non-coding regions are difficult to characterize, emerging technologies and approaches are urgently needed to explore the driving effects of genetic variants in lincRNA regions.

AUTHOR CONTRIBUTIONS
HZ reviewed the relevant literature, wrote the manuscript, and revised the English. LT prepared the figures and tables. F-FS provided constructive feedback and guidance. L-XW and H-HZ critically revised and proofread the manuscript.