Circulating Long Noncoding RNAs Act as Diagnostic Biomarkers in Non-Small Cell Lung Cancer

Identification of novel, effective early diagnostic biomarkers may provide alternative strategies to reduce mortality in non-small cell lung cancer (NSCLC) patients. Circulating long non-coding RNAs (lncRNAs) have emerged as a new class of promising cancer biomarkers. Our study aimed to identify circulating lncRNAs for diagnosing NSCLC. A total of 528 plasma samples were continuously collected and allocated to four progressive phases: discovery, training, verification, and expansion. The expression of candidate lung cancer-related lncRNAs was detected using quantitative reverse-transcriptase polymerase chain reaction (qRT-PCR). We identified a 4-lncRNA panel (RMRP, NEAT1, TUG1, and MALAT1) that provided high diagnostic value in NSCLC (AUC = 0.86 and 0.89 for the training and verification phases, respectively). Subgroup analyses showed that the 4-lncRNA panel had a sensitivity of 78.95% [95% confidence interval (CI) = 62.22%–89.86%] in stage I-II patients and 75.00% (95% CI = 52.95%–89.40%) in patients with small tumor size (≤3 cm). Notably, the sensitivity of the 4-lncRNA panel was significantly higher than that of the routine protein panel (CEA, CA125, and CYFRA21-1) in adenocarcinoma (86.30% vs. 73.96%). Adding the 4-lncRNA panel to protein markers significantly improved the diagnostic capacity in both adenocarcinoma (AUC = 0.85, 95% CI = 0.78–0.91) and squamous cell carcinoma (AUC = 0.93, 95% CI = 0.86–0.97). In conclusion, we identified a plasma 4-lncRNA panel that has considerable clinical value in diagnosing NSCLC. The 4-lncRNA panel could improve the diagnostic value of routine tumor protein markers in diagnosing NSCLC. Circulating lncRNAs could be used as promising candidates for NSCLC diagnosis.


INTRODUCTION
Lung cancer is one of the most malignant tumors, with high incidence and mortality in China and around the world (1,2). Non-small cell lung cancer (NSCLC) is the major histological type, accounting for about 85% of lung cancers (3). Although NSCLC patients at early stages have a relatively high survival rate with optimized treatment, more than 75% of patients are diagnosed at advanced stages (4). The 5-year survival rate of NSCLC patients is still less than 20% because effective early detection methods are scarce (4).
Early detection and treatment are among the most effective ways to improve curative effect and reduce mortality for NSCLC patients. Early detection methods should be non-invasive and easily accessible (5). Low-dose computed tomography (LDCT) screening, which can reduce lung cancer mortality by 20%, is recommended for the early detection of lung cancer (6). However, the false-positive rate of LDCT is relatively high (7). Moreover, high cost and repeated scanning have limited the application of LDCT (7). Circulating tumor protein markers, such as carcinoembryonic antigen (CEA), squamous cell carcinoma antigen (SCC), and cytokeratin 19 fragment antigen (CYFRA21-1), can also act as noninvasive biomarkers to improve early diagnosis of NSCLC (8). However, the diagnostic performance of these protein markers for the early detection of NSCLC is limited by unsatisfactory sensitivity and specificity (8). Thus, identification of novel, effective early diagnostic markers may provide alternative strategies to reduce mortality for NSCLC patients.
Long non-coding RNAs (lncRNAs) have been shown to play important roles in the occurrence and progression of many diseases, including NSCLC (9,10). Studies have found a variety of abnormally expressed lncRNAs that have important biological functions in NSCLC. Moreover, lncRNAs can be stably detected in the peripheral circulation. These features make circulating lncRNAs ideal noninvasive biomarkers for lung cancer diagnosis (11). In fact, differential expression of several circulating lncRNAs, including MALAT1 (12), GAS5 (13), SNHG1 (14), TUG1 (15), and HOTAIR (16), in patients with NSCLC was reported recently. Although these circulating lncRNAs can distinguish lung cancer patients from non-lung cancer patients, several challenges must be overcome to further develop circulating lncRNA-based biomarkers for clinical applications. The sample sizes in most current studies are relatively small, and the results have not been verified in multiple-stage clinical studies. In addition, few studies have investigated circulating lncRNAs for early detection of NSCLC while simultaneously comparing their diagnostic performance with that of routine tumor protein biomarkers.
In the present study, we investigated the diagnostic value of circulating lncRNAs in multiple progressive phases (discovery, training, and verification phases) to identify a panel of lncRNAs for the diagnosis of NSCLC. We also compared the diagnostic performance of the lncRNA panel with that of protein tumor markers in lung adenocarcinoma and squamous cell carcinoma, respectively.

Patient Samples and Study Design
Participants were continuously recruited from November 2015 to December 2017 at the Xinqiao Hospital Affiliated to Army Medical University in Chongqing, China. A total of 528 participants were enrolled, comprising patients with newly diagnosed and histopathologically confirmed primary NSCLC, chronic obstructive pulmonary disease (COPD), pulmonary tuberculosis, pulmonary inflammation, or other benign lung disease, as well as healthy controls. Blood samples from patients were collected prior to any treatment under fasting conditions. Demographic and clinicopathological characteristics were obtained from all participants via a combination of a structured questionnaire and medical records. The serum concentrations of tumor markers (CEA, CA125, CYFRA21-1, SCC, and NSE) prior to any treatment were also collected. The research protocol was reviewed and approved by the ethics committee of the Army Medical University (Chongqing, China), and all participants provided informed consent.
A multi-phase study was designed to identify a panel of plasma lncRNA biomarkers. The study comprised four phases: the discovery phase, the training phase, the verification phase, and the expansion phase (Figure 1). In the discovery phase, a total of 31 candidate lung cancer-related lncRNAs were selected as potential diagnostic biomarkers according to previous studies and the LncRNADisease database (Supplementary Table S1). The expression of the 31 lncRNAs was detected with quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) in 40 plasma samples (20 NSCLC patients and 20 controls). In the training phase, the lncRNAs that were stably expressed in the discovery phase were first detected with qRT-PCR in an independent cohort of plasma samples from 265 participants. LncRNAs that were differentially expressed between NSCLC and control groups (healthy and benign controls) were used to construct the diagnostic model. In the verification phase, the diagnostic performance of the lncRNA panel from the training phase was verified in an independent cohort of 223 plasma samples. In the expansion phase, diagnostic models for lung adenocarcinoma and squamous cell lung carcinoma were constructed using the five tumor protein markers (CEA, CA125, CYFRA21-1, SCC, and NSE) in 240 participants. The diagnostic performance of the lncRNA-based model and the tumor protein marker-based model was compared in lung adenocarcinoma and squamous cell lung carcinoma, respectively.

Plasma Processing, RNA Isolation, and qRT-PCR Analysis
All blood samples from patients were collected prior to any treatment under fasting conditions. Blood samples were processed to separate plasma within 2 h from collection by centrifugation (2,000 g for 10 min at 4°C, 12,000 g for 10 min at 4°C). Plasma samples were transferred to RNase/DNase-free tubes and stored at −80°C awaiting total RNA extraction. Total RNA from plasma was extracted using the TRIzol LS reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions.
For qRT-PCR analysis, 7 μl of total RNA from plasma was first reverse transcribed into complementary DNA using the PrimeScript™ RT reagent kit with gDNA eraser (TaKaRa, Dalian, China) as follows: 37°C for 15 min, followed by 85°C for 5 s. Then, real-time PCR was performed using the SYBR Premix Ex Taq (TaKaRa) with the following thermocycling conditions: 95°C for 30 s, followed by 40 cycles of 95°C for 5 s, 60°C for 30 s, and 72°C for 30 s, followed by a final cycle of 72°C for 2 min. Results were normalized to the expression levels of β-actin as described previously (17,18). Primer sequences are provided in Supplementary Table S1. The qRT-PCR results were calculated using the 2^(−ΔΔCq) method.
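The 2^(−ΔΔCq) calculation named above can be illustrated with a minimal sketch. The Cq values are hypothetical and are not data from this study. Using the mean ΔCq of the control samples as the calibrator is an assumption, since the text does not state which calibrator was used.

```python
from statistics import mean


def relative_expression(cq_target, cq_reference, calibrator_delta_cq):
    """Return 2^(-ddCq) for one sample.

    cq_target: Cq of the lncRNA of interest.
    cq_reference: Cq of the reference gene (beta-actin in the text).
    calibrator_delta_cq: dCq of the calibrator (assumed: mean control dCq).
    """
    delta_cq = cq_target - cq_reference            # normalize to the reference gene
    delta_delta_cq = delta_cq - calibrator_delta_cq  # compare with the calibrator
    return 2.0 ** (-delta_delta_cq)


# Hypothetical (target Cq, reference Cq) pairs, for illustration only.
control_samples = [(30.0, 20.0), (31.0, 21.0)]
patient_sample = (28.0, 20.0)

# Assumed calibrator: mean dCq across the control samples (10.0 here).
calibrator = mean(target - ref for target, ref in control_samples)
fold_change = relative_expression(*patient_sample, calibrator)
print(fold_change)  # 4.0
```

In this toy example the patient ΔCq is 2 cycles lower than the calibrator, which corresponds to a fourfold higher relative expression.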

Detection of Tumor Markers
Serum CEA, CA125, and CYFRA21-1 levels were detected by a chemiluminescence method using Roche reagent sets (Roche Diagnostics, Shanghai, China) following the manufacturer's instructions. Serum SCC and NSE were determined by a chemiluminescence method using Abbott reagent sets (Abbott, Chicago, USA) following the manufacturer's instructions.

Statistical Analysis
The relative expression levels of lncRNAs were expressed as median (interquartile range) [M (P25, P75)], and the expression differences were evaluated using the Mann-Whitney U test in SPSS 19.0 software (SPSS, Inc., Chicago, IL, USA). Differences in lncRNA expression across freeze-thaw cycles and room temperature incubation times were evaluated by one-way repeated measures analysis of variance using SPSS 19.0 software. ROC curves were generated using MedCalc 19.0.7 (MedCalc, Mariakerke, Belgium), and the areas under the curves (AUCs) were compared by the DeLong test. The Clinical Calculator online tool (http://vassarstats.net/clin1.html?tdsourcetag=s_pctim_aiomsg) was used to calculate the sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio. A two-sided P-value less than 0.05 was taken as statistically significant.
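For readers unfamiliar with the six measures listed above, the following minimal sketch shows how each is derived from a 2×2 table of test results against the reference diagnosis. The counts are hypothetical, and the function is an illustration of the standard definitions, not a reproduction of the online calculator used in the study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic accuracy measures from a 2x2 table.

    tp: diseased, test positive    fn: diseased, test negative
    fp: non-diseased, test positive    tn: non-diseased, test negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the proportion of cases in the sample, whereas sensitivity, specificity, and the likelihood ratios do not.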

Patient Characteristics
A total of 528 participants were enrolled in our study. These participants were randomly allocated to a discovery phase (n = 40), a training phase (n = 265), and a verification phase (n = 223; Figure 1). The characteristics of the study participants are summarized in Table 1. A total of 20 NSCLC patients (including 12 adenocarcinomas and 8 squamous cell carcinomas) and 20 controls (including 5 healthy controls and 15 benign lung disease patients) were included in the discovery phase. Subsequently, 148 NSCLC patients (including 87 adenocarcinomas and 61 squamous cell carcinomas) and 117 controls (including 35 healthy controls and 82 benign lung disease patients) were included in the training phase. The verification phase comprised 120 NSCLC patients (including 73 adenocarcinomas and 47 squamous cell carcinomas) and 103 controls (including 30 healthy controls and 73 benign lung disease patients). Among the above-mentioned 528 participants, a total of 240 participants who received the full panel of tumor marker tests (CEA, CA125, CYFRA21-1, SCC, and NSE) before therapy were allocated to an expansion phase to compare the diagnostic performance of the lncRNAs with that of tumor protein markers (Figure 1).
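As a quick consistency check, the cohort counts reported in this paragraph can be tallied as follows. The dictionary layout is just an illustrative convenience; every number is taken from the text above.

```python
# Participant counts per phase, as reported in the text.
phases = {
    "discovery": {
        "adenocarcinoma": 12, "squamous": 8,
        "healthy": 5, "benign": 15,
    },
    "training": {
        "adenocarcinoma": 87, "squamous": 61,
        "healthy": 35, "benign": 82,
    },
    "verification": {
        "adenocarcinoma": 73, "squamous": 47,
        "healthy": 30, "benign": 73,
    },
}

# Per-phase totals and case/control breakdowns.
phase_totals = {name: sum(groups.values()) for name, groups in phases.items()}
nsclc_total = sum(g["adenocarcinoma"] + g["squamous"] for g in phases.values())
control_total = sum(g["healthy"] + g["benign"] for g in phases.values())

print(phase_totals)                # {'discovery': 40, 'training': 265, 'verification': 223}
print(nsclc_total, control_total)  # 288 240
print(sum(phase_totals.values()))  # 528
```

The subgroup counts sum to the stated phase sizes (40, 265, and 223) and to the overall enrollment of 528.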

LncRNAs Screening
By reviewing previous studies and the LncRNADisease database, we selected 31 lung cancer-related lncRNAs as potential diagnostic candidates (Supplementary Table S1). qRT-PCR was conducted to quantify the expression levels of the 31 lncRNAs in 40 plasma samples (20 NSCLC patients and 20 controls).
Five lncRNAs (RMRP, NEAT1, TUG1, MALAT1, and H19) were stably expressed in plasma. Differential expression analysis showed that RMRP and TUG1 had significantly lower expression levels in the NSCLC group than in the control group (P = 0.001, Figure 2A, Supplementary Table S2). In contrast, NEAT1 and MALAT1 had significantly higher expression levels in the NSCLC group than in the control group (P = 0.022 and 0.002, respectively, Figure 2A, Supplementary Table S2). The expression level of H19 showed no significant difference between NSCLC and control groups (P = 0.534, Figure 2A, Supplementary Table S2).
The stability of RMRP, NEAT1, TUG1, MALAT1, and H19 in plasma was evaluated under harsh conditions. Their expression levels in plasma were measured after repeated freeze-thaw cycles (1, 3, 5, and 8) or after incubation at room temperature for various durations (0, 4, 12, and 24 h). One-way repeated measures analysis of variance showed no significant difference in the expression levels of the lncRNAs among the freeze-thaw or incubation groups (P > 0.05) (Supplementary Figure S1A, B). In summary, the 5 stably expressed lncRNAs were identified as candidates for further testing in the training phase.

Determination of the Diagnostic Value of the 4-lncRNA Panel in the Training Phase
The 5 lncRNAs were detected with qRT-PCR in an independent cohort of 265 plasma samples (148 NSCLC and 117 control) in the training phase. Four of the 5 lncRNAs (RMRP, NEAT1, TUG1, and MALAT1) had significantly different expression levels between the NSCLC and control groups, which was consistent with the results in the discovery phase (Figure 2B, Supplementary Table S3). Thus, RMRP, NEAT1, TUG1, and MALAT1 were selected as the final candidates for constructing a diagnostic model.
The diagnostic value of RMRP, NEAT1, TUG1, and MALAT1 was first assessed individually by ROC curves, which demonstrated a good discriminative ability between NSCLC and control groups (AUC = 0.70, 0.73, 0.65, and 0.66, respectively) (Table 2, Figure 3A). Then, a predictive lncRNA panel was established by a stepwise logistic regression model using the training phase samples. All four lncRNAs turned out to be significant predictors. The predicted probability of the 4-lncRNA panel was calculated using the following formula: Logit(P) = −1.083 × RMRP + 0.955 × NEAT1 − 0.594 × TUG1 + 0.530 × MALAT1. The diagnostic performance of the established 4-lncRNA panel was evaluated by ROC analysis, and the AUC for the 4-lncRNA panel was 0.86 (95% CI = 0.81–0.91; at a cut-off of 0.679, sensitivity = 85.32%, specificity = 76.19%; Table 2, Figure 3A). The AUC value of the 4-lncRNA panel was significantly higher than that of any lncRNA alone (Table 2, P < 0.05).
Table note: The area under the curve of each lncRNA marker was compared with that of the 4-lncRNA panel using the DeLong test, with P < 0.05. AUC, area under the curve; +LR, positive likelihood ratio; -LR, negative likelihood ratio; PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.
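To make the scoring step concrete, the following minimal sketch applies the reported coefficients to one sample and converts the logit into a predicted probability. Several points are assumptions rather than statements from the text: the formula is used exactly as printed (with no intercept term), the inputs are taken to be relative expression values, and the cut-off of 0.679 is assumed to apply to the predicted probability P. The sample values are hypothetical.

```python
import math

# Coefficients as printed in the text (no intercept is reported there).
COEFFICIENTS = {"RMRP": -1.083, "NEAT1": 0.955, "TUG1": -0.594, "MALAT1": 0.530}
CUTOFF = 0.679  # assumed to be a threshold on the predicted probability P


def panel_probability(expression):
    """Convert the four lncRNA expression values into a predicted probability."""
    logit = sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))  # inverse of the logit function


def classify(expression, cutoff=CUTOFF):
    """Return True when the predicted probability reaches the cut-off."""
    return panel_probability(expression) >= cutoff


# Hypothetical relative expression values, for illustration only.
sample = {"RMRP": 0.5, "NEAT1": 2.0, "TUG1": 0.5, "MALAT1": 2.0}
print(round(panel_probability(sample), 3))  # 0.894
print(classify(sample))                     # True
```

The signs of the coefficients match the expression pattern reported in the results: lower RMRP and TUG1 and higher NEAT1 and MALAT1 all push the predicted probability upward.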

Evaluation of the Diagnostic Performance of the 4-lncRNA Panel in the Verification Phase
To further assess the diagnostic value of the 4-lncRNA panel, we measured the expression levels of the 4 lncRNAs in another independent cohort of 223 plasma samples (120 NSCLC and 103 control) in the verification phase. The expression levels of the 4 lncRNAs were significantly different between patients with lung cancer and controls, which was consistent with the results in the training phase (Figure 2C, Supplementary Table S4). Similarly, the AUC of the 4-lncRNA panel was 0.89 (95% CI = 0.84–0.94; at a cut-off of 0.679, sensitivity = 86.96%, specificity = 74.65%; Table 3, Figure 3B).
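The verification step amounts to applying a fixed cut-off to scores from a new cohort and summarizing discrimination with the AUC. The sketch below illustrates both calculations on hypothetical scores. It uses the rank-based (Mann-Whitney) definition of the AUC, which is a standard equivalence and not necessarily how the software used in the study computes it. Treating scores at or above the cut-off as positive is also an assumption.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the probability that a case scores higher than a control.

    Ties count as half, which matches the Mann-Whitney U statistic
    divided by the number of case-control pairs.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Apply a fixed cut-off (score >= cutoff is called positive)."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


# Hypothetical predicted probabilities, for illustration only.
cases = [0.9, 0.8, 0.7, 0.6]
controls = [0.75, 0.5, 0.4, 0.3]

print(auc_from_scores(cases, controls))                 # 0.875
print(sensitivity_specificity(cases, controls, 0.679))  # (0.75, 0.75)
```

Because the cut-off is fixed in advance from the training data, the sensitivity and specificity in a verification cohort are not tuned to that cohort.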

DISCUSSION
In this study, we revealed that plasma RMRP, NEAT1, TUG1, and MALAT1 were potential circulating biomarkers for diagnosing NSCLC. The 4-lncRNA panel established by the logistic regression model provided a high diagnostic value in NSCLC. We also compared its diagnostic value with that of tumor protein markers and found that the 4-lncRNA panel had a markedly higher sensitivity in diagnosing NSCLC.
New detection methods for the diagnosis of cancer, including imaging and biomarker detection, continue to emerge (19). However, the early diagnosis of lung cancer is still a great challenge (20). Circulating lncRNAs are stably expressed and are considered novel potential biomarkers for diagnosing lung cancer (21,22). In the training and verification phases, our study revealed that plasma RMRP, NEAT1, TUG1, and MALAT1 were potential circulating markers for diagnosing NSCLC. The 4-lncRNA panel showed high accuracy in the diagnosis of NSCLC. Moreover, the 4-lncRNA panel had high specificity (76.05%–76.23%) while maintaining high sensitivity (85.26%–87.02%), indicating that the model had a strong ability to detect NSCLC patients and could specifically exclude non-NSCLC patients.
MALAT1 was first identified as a candidate circulating biomarker for the diagnosis of NSCLC (23). Subsequently, the diagnostic roles of circulating lncRNAs for NSCLC have been demonstrated in several studies. However, the diagnostic performance of circulating lncRNAs has been inconsistent across studies. For example, Guo et al. (24) reported MALAT1 as a candidate blood-based biomarker to diagnose lung cancer with an AUC value of 0.718. By contrast, Liang et al. (13) found that the plasma GAS5 expression level could be used to distinguish NSCLC patients from control patients with a relatively high AUC value of 0.832. Wang et al. (25) conducted a meta-analysis including 2,121 NSCLC patients and 1,528 healthy controls and suggested miRNAs had a moderate diagnostic accuracy for lung cancer (sensitivity 75%, specificity 79%). Xie et al. (26) developed a diagnostic panel consisting of SOX2OT, ANRIL, CEA, CYFRA21-1, and SCCA, which could be valuable in NSCLC diagnosis (sensitivity = 77.1%, specificity = 79.2%). Tao et al. (27) detected the expression of exosomal lncRNAs in NSCLC and found that the combination of two exosomal lncRNAs had a similar diagnostic efficiency (sensitivity = 81.3%, specificity = 69.3%). In addition, Kamel et al. (28) demonstrated that the combination of GAS5 and SOX2OT showed a better diagnostic efficiency (sensitivity = 83.8%, specificity = 81.4%). The sensitivity of the 4-lncRNA panel in our study is higher than that reported in the above studies, and the specificity is similar. The inconsistency among reported diagnostic lncRNA panels may be attributed to differences in candidate lncRNA profiles, specimen types (serum, plasma, and serum exosomes), and the sources of controls and patients. Various methods of RNA isolation and qRT-PCR analysis are used to detect the expression of circulating lncRNAs, but so far there is no uniform standard. Thus, heterogeneity in RNA isolation and qRT-PCR methods could also be an important reason for the inconsistent findings.
Compared with previous studies of circulating lncRNAs in diagnosing NSCLC (29–32), our study is unique for the following reasons. First, we selected 31 lung cancer-related lncRNAs based on previous studies and screened the expression of these lncRNAs in plasma, which makes it easier to obtain effective diagnostic markers for lung cancer. Second, we included not only healthy controls but also patients with benign lung diseases in the control group. Our subgroup analyses indicated that the 4-lncRNA panel also provided a good diagnostic capacity to distinguish NSCLC patients from patients with COPD, pulmonary tuberculosis, or pulmonary inflammation.
We also confirmed the stability of the 4 lncRNAs in plasma under harsh conditions. One possible reason for this stability is that lncRNAs are encapsulated in small vesicles, such as exosomes (33,34). In addition, lncRNAs can fold into secondary and tertiary architectural domains and combine with proteins to form complexes that are protected from RNase degradation (35,36). The stable expression of lncRNAs in plasma lays the foundation for their use as diagnostic markers in NSCLC.
In clinical practice, tumor protein markers are widely used for screening and diagnosis of lung cancer (37–39). Thus, we further evaluated the diagnostic value of the 4-lncRNA panel for NSCLC by comparison with tumor protein markers. The 4-lncRNA panel showed a higher sensitivity than the protein panel in adenocarcinoma, while its specificity was lower than that of the tumor protein markers. Therefore, we believe that the 4-lncRNA panel could compensate for the limited sensitivity of tumor protein markers. In fact, our additional analyses showed that combining the 4-lncRNA panel with protein markers gave a better diagnostic value in distinguishing lung adenocarcinoma or lung squamous cell carcinoma from controls.
In conclusion, we identified a plasma 4-lncRNA panel that distinguished NSCLC patients from healthy controls and patients with benign lung diseases with a high degree of sensitivity and specificity. In addition, the 4-lncRNA panel could improve the diagnostic value of traditional tumor protein markers. Our plasma 4-lncRNA panel showed robust potential for the early diagnosis of NSCLC, suggesting that circulating lncRNAs could be used as promising candidates for NSCLC diagnosis.

DATA AVAILABILITY STATEMENT
All the data generated for this study are included in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the Army Medical University (Chongqing, China). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YL led the study by designing, conducting, interpreting results, writing the manuscript, and obtaining the funding. SY, YX, and XG performed the majority of the experiments and participated in the study design, result interpretation, and manuscript writing. YZ, CL, and LW collected human tissue samples and clinical data. WX and NW performed the statistical analysis. TC and XM contributed to the result interpretation and discussions. LB and ZY participated in the study design, participant recruitment, and result interpretation. All authors contributed to the article and approved the submitted version.