These authors have contributed equally to this work
This article was submitted to Molecular Diagnostics and Therapeutics, a section of the journal Frontiers in Molecular Biosciences
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Gastric cancer (GC) is one of the leading causes of cancer-related death worldwide, ranking third in males and fifth in females (
Deregulation of lipid metabolism has a critical role in the promotion of tumorigenesis and tumor progression (
GC progression is closely associated with alterations in lipid metabolism. A low level of serum high-density lipoprotein predicted a high risk of GC development, a high rate of lymphatic and vascular invasion, advanced nodal metastasis, and a poor prognosis in patients with GC (
The mechanisms of deregulation of lipid metabolism in cancers are complicated, including alteration in pathways involved in
Two GEO (Gene Expression Omnibus,
The normalized gene expression data of the GSE62254 dataset were downloaded from GEO. Prognosis-relevant genes from lipid metabolism–related gene sets were identified using the “survival” package. All the lipid metabolism–related genes were subjected to univariate Cox regression, and 63 genes were found to be associated with overall survival (OS). These 63 genes were then subjected to LASSO Cox regression analysis using the glmnet package, and 19 genes were selected for construction of the risk prognostic scoring system. The risk score of each patient was calculated as the sum of the expression value of each gene multiplied by its LASSO coefficient. According to the risk scores, patients were classified into low-risk and high-risk groups using the cut-off value (risk score = −3.793587) that best separated patients with different OS.
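The text does not state how the cut-off that best stratified patients was found. One common approach, sketched here in Python under that assumption, scans candidate cut-offs and keeps the one with the largest two-group log-rank statistic. The function names `logrank_chi2` and `best_cutoff` and the `min_group` parameter are hypothetical and are not taken from the authors' R workflow.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times  : follow-up time of each patient
    events : 1 if death was observed, 0 if the patient was censored
    groups : True for patients in group 1 (e.g. high risk), False otherwise
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                        # at risk, both groups
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g)  # at risk, group 1
        d = sum(1 for x, e in zip(times, events) if x == t and e)   # deaths at t
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g)                             # deaths at t, group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=1):
    """Return (cutoff, statistic) maximizing the log-rank statistic.

    Assumption: patients with score > cutoff form the high-risk group, and
    every observed score is tried as a candidate cut-off.
    """
    best = (None, -1.0)
    for c in sorted(set(scores)):
        groups = [s > c for s in scores]
        n_high = sum(groups)
        if n_high < min_group or len(scores) - n_high < min_group:
            continue  # skip splits that leave one group too small
        stat = logrank_chi2(times, events, groups)
        if stat > best[1]:
            best = (c, stat)
    return best
```

Because such a cut-off is chosen to maximize the statistic, the p-value obtained on the same data is optimistic, which is one reason the cut-off is then applied unchanged to an independent validation cohort.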
The model and coefficients derived from the training cohort were applied unchanged to the validation cohort (GSE26942 dataset). The normalized gene expression data of GSE26942 were downloaded. The efficacy of the risk score prognostic classification was evaluated by ROC analysis with the timeROC package. Survival analysis was conducted as described above and was further validated in two additional datasets (TCGA GC and GSE84437).
The timeROC package of R software was used to perform time-dependent receiver operating characteristic (ROC) curve analysis. Survival analysis was performed with Kaplan–Meier plots and the log-rank test, together with univariate and multivariate Cox proportional hazards models. GO and KEGG functional enrichment analyses were conducted with the R package clusterProfiler.
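To make the survival analysis step concrete, the following is a minimal Python sketch of the Kaplan–Meier product-limit estimator that underlies the survival curves. It is an illustration only: the analysis in this study was run in R, and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per distinct
    time at which at least one death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t
        at_risk -= leaving
    return curve
```

Calling `kaplan_meier` separately on the low-risk and high-risk groups gives the two step functions that a Kaplan–Meier plot displays; censored patients lower the number at risk without producing a step.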
The expression matrices of GSE62254 and GSE26942 datasets were uploaded to CIBERSORT to determine the tumor-infiltrating immune cell fractions, which were calculated according to the LM22 signature with 1,000 permutations, as described previously (
Independent prognostic factors were identified through univariate and multivariate Cox regression analysis. The independent prognostic factors were used to construct the prognostic nomogram, which assessed the OS probability at 1, 3, and 5 years with the “rms” package in R software.
Student’s
Human lipid metabolism–related pathways were downloaded from the KEGG (
The risk prognostic model was developed using the training dataset (GSE62254). The univariate Cox regression model was used to identify genes with prognostic relevance for overall survival (OS). As a result, 63 genes were found to be significantly associated with OS, and their correlations with each other were validated (
Risk score = (0.100 * LPL) + (−0.374 * IPMK) + (−0.122 * PLCB3) + (0.311 * CDIPT) + (0.146 * PIK3CA) + (−0.263 * DPM2) + (0.310 * PIGZ) + (−0.594 * GPD2) + (0.043 * GPX3) + (0.077 * LTC4S) + (−0.782 * CYP1A2) + (−0.102 * GALC) + (−0.189 * SGMS1) + (−0.061 * SMPD2) + (−0.184 * SMPD3) + (−0.081 * FUT6) + (0.130 * ST3GAL1) + (0.549 * B4GALNT1) + (−0.040 * ACADS).
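As a worked illustration of the formula above, the following Python sketch computes a risk score from a patient's expression values and assigns a risk group. The coefficients and the cut-off are copied from the text; the function names, the dictionary input format, and the rule that a score exactly at the cut-off counts as low risk are assumptions made for this example.

```python
# LASSO coefficients of the 19 genes, copied from the risk score formula.
COEFFICIENTS = {
    "LPL": 0.100, "IPMK": -0.374, "PLCB3": -0.122, "CDIPT": 0.311,
    "PIK3CA": 0.146, "DPM2": -0.263, "PIGZ": 0.310, "GPD2": -0.594,
    "GPX3": 0.043, "LTC4S": 0.077, "CYP1A2": -0.782, "GALC": -0.102,
    "SGMS1": -0.189, "SMPD2": -0.061, "SMPD3": -0.184, "FUT6": -0.081,
    "ST3GAL1": 0.130, "B4GALNT1": 0.549, "ACADS": -0.040,
}

# Cut-off reported in the text for separating low-risk and high-risk patients.
CUTOFF = -3.793587


def risk_score(expression):
    """Weighted sum of normalized expression values for the 19 genes.

    expression: dict mapping gene symbol to normalized expression value
    (hypothetical input format; extra genes are ignored).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError("missing expression values for: " + ", ".join(missing))
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff=CUTOFF):
    """Classify a score as 'high' or 'low' risk.

    Assumption: scores strictly above the cut-off are high risk; the text
    does not say how a score exactly equal to the cut-off is handled.
    """
    return "high" if score > cutoff else "low"
```

The expression values must be normalized in the same way as the training data (GSE62254), since both the coefficients and the cut-off were derived on that scale.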
The GSE26942 dataset was adopted as the validation dataset. With a cut-off value of −3.793587, which stratified patients into two groups with the largest OS difference, patients were classified into low-risk and high-risk groups. The difference between the low-risk and high-risk groups in OS was statistically significant in the training dataset, the validation dataset, and both datasets combined. Kaplan–Meier curves are displayed in
The risk predictive score model had high efficacy of prediction in the training set, the validation set, and the combination of both datasets. Kaplan–Meier curves of overall survival stratified by risk score (low/high) in the training set
Univariate and multivariate Cox regression analysis in the combined dataset showed that the risk prognostic score model was an independent and significant prognostic factor for OS (
We compared the risk score between patients with different clinical characteristics. The results showed that the risk score was comparable between patients of different ages (<60 or ≥60 years) (
Association of the risk score with clinical characteristics and immune cells: tumor grade
We analyzed the percentage of 22 immune cell subtypes in the tumors of both datasets and compared their level between patients with low and high risk scores (
Functional analysis of differentially expressed genes was performed with KEGG and GO functional enrichment analyses. The top 10 enriched GO terms were identified in each of the biological process (BP), cellular component (CC), and molecular function (MF) categories (
Enriched pathways by risk score with KEGG and GO functional enrichment analyses.
Factors identified by univariate and multivariate Cox regression analysis as independent and significant prognostic factors were applied in the construction of a nomogram model (
Construction of a nomogram based on the risk predictive score and other prognostic factors. Nomogram constructed based on the risk score and three other factors
GC has long been recognized as a recalcitrant cancer for its high incidence, high mortality, aggressive behavior, refractory traits, and poor prognosis (
In this study, we developed a novel prognostic scoring model based on the expression of lipid metabolism–related genes in gastric cancer. We used independent datasets from GEO to construct and validate the risk predictive scoring system, which contains 19 lipid metabolism–related genes. ROC analysis showed that this scoring system was efficient in predicting patient survival, and OS differed significantly between the high-risk and low-risk score groups. We further generated a nomogram integrating the risk predictive scoring system with three other prognostic factors (patients’ age, TNM stage, and adjuvant chemotherapy); this improved prognostic performance and accurately predicted the 1-, 3-, and 5-year OS of GC patients in the GEO datasets.
The risk predictive score calculated with our scoring system was significantly associated with the aggressiveness of GC. Patients with higher-grade tumors and more advanced disease had higher risk scores, suggesting that dysregulation of lipid metabolism was not only associated with cancer progression in GC but also served as a driving factor for its aggressiveness. Some of the genes included in our risk scoring system have been found to be involved in cancers, such as lipoprotein lipase (LPL) (
Interestingly, the present study also revealed that patients with high and low risk scores had distinct profiles of tumor-infiltrating immune cells. Patients with high risk scores had significantly reduced numbers of plasma cells, activated CD4 memory cells, follicular helper T cells, and resting dendritic cells and increased numbers of naïve CD4 T cells, monocytes, M2 macrophages, and resting mast cells. Activated CD4 memory cells were associated with favorable outcomes in patients with cervical cancer (
The major limitation of the present study is the lack of validation in larger patient cohorts from multicenter real-world clinical practice. The risk predictive score is therefore still far from ready for clinical use. Another important limitation is that most of the genes used to construct this risk predictive score model have scarcely been investigated in cancers. In addition, we did not perform basic experiments to validate their roles and related mechanisms in GC cells, so the underlying biological mechanism remains unclear and needs further experimental validation. However, as a prognostic risk score, our model was repeatedly validated and gave consistent results, so the conclusions of our study remain convincing despite the lack of experimental validation of each gene’s role in GC.
In the present study, a novel lipid metabolism–related gene-based risk predictive score model was constructed and validated in datasets of patients with GC. This risk predictive scoring system efficiently predicted patient outcomes and correlated significantly with immune cell subtypes. A nomogram containing the risk score was generated, and it improved the prognostic predictive value of the current TNM staging system. This study may be helpful for the development of biomarkers and therapeutics for GC patients.
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/
The study followed the Declaration of Helsinki and was approved by the Clinical Research Ethics Committee of Sun Yat-sen University Cancer Center. Because of the retrospective nature of the study, patient consent for inclusion was waived.
CP, Y-BC, and X-LW designed the study. T-QL, J-NL, and Z-CX retrieved the data and conducted analysis. T-QL, YW, and YZ drew the tables and figures. X-LW, T-QL, and J-NL wrote the manuscript. All authors read and approved the manuscript.
This study was supported by the Guangdong Basic and Applied Basic Research Foundation (2019A1515110171).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors would like to thank the authors who submitted the related data on the GEO website.
The Supplementary Material for this article can be found online at:
Flowchart of the study. Two GEO datasets, GSE62254 and GSE26942, were used as the training and validation datasets for the risk predictive score model construction. Further comparisons and establishment of a nomogram based on the risk scores were conducted.
Construction of a risk predictive score model based on lipid metabolism–related genes. 63 prognostic relevant genes in lipid metabolism–related pathways were screened
Kaplan–Meier curves of overall survival stratified by risk score (low/high) in another two datasets: TCGA GC dataset
Subgroup analyses of Kaplan–Meier curves for overall survival stratified by adjuvant chemotherapy (no/yes) and TNM stage (I + II/III + IV) in the combined dataset. Adjuvant chemotherapy—no
Expression of 19 genes
Decision curve analysis (DCA) for 3-year OS and 5-year OS. DCA for 3-year OS in the training dataset