Edited by: Yuanwei Zhang, University of Science and Technology of China, China
Reviewed by: Xinhua Xie, Sun Yat-sen University Cancer Center (SYSUCC), China; JIangyang Zhao, ATGC Inc, United States
†These authors have contributed equally to this work
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The lung is one of the most common sites of distant metastasis in breast cancer (BC). Identifying ideal biomarkers to construct a more accurate prediction model than conventional clinical parameters is crucial. MicroRNAs (miRNAs) data and clinicopathological data were acquired from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database. miR-663, miR-210, miR-17, miR-301a, miR-135b, miR-451, miR-30a, and miR-199a-5p were screened to be highly relevant to lung metastasis (LM) of BC patients. The miRNA-based risk score was developed based on the logistic coefficient of the individual miRNA. Univariate and multivariate logistic regression selected tumor node metastasis (TNM) stage, age at diagnosis, and miRNA-risk score as independent predictive parameters, which were used to construct a nomogram. The Cancer Genome Atlas (TCGA) database was used to validate the signature and nomogram. The predictive performance of the nomogram was compared to that of the TNM stage. The area under the receiver operating characteristics curve (AUC) of the nomogram was higher than that of the TNM stage in all three cohorts (training cohort: 0.774 vs. 0.727; internal validation cohort: 0.763 vs. 0.583; external validation cohort: 0.925 vs. 0.840). The calibration plot of the nomogram showed good agreement between predicted and observed outcomes. The net reclassification improvement (NRI), integrated discrimination improvement (IDI), and decision-curve analysis (DCA) of the nomogram showed that its performances were better than that of the TNM classification system. Functional enrichment analyses suggested several terms with a specific focus on LM. Subgroup analysis showed that miR-30a, miR-135b, and miR-17 have unique roles in lung metastasis of BC. Pan-cancer analysis indicated the significant importance of eight predictive miRNAs in lung metastasis. This study is the first to establish and validate a comprehensive lung metastasis predictive nomogram based on the METABRIC and TCGA databases, which provides a reliable assessment tool for clinicians and aids in appropriate treatment selection.
Breast cancer (BC) is the most common cancer diagnosed (excluding skin cancers) and is the second leading cause of cancer death among United States women (
Currently, the traditional tumor node metastasis (TNM) staging system is a standard tool for risk evaluation in clinical practice. However, BC patients with the same stage can have varying clinical outcomes (
MicroRNAs (miRNAs) are small, non-coding single-stranded RNAs (18–25 nucleotides) and negatively regulate gene expression by binding to complementary sequences in the 3' untranslated region (3' UTR) of mRNAs (
Therefore, the purpose of this study was to establish and validate a comprehensive nomogram that incorporated both the miRNAs signature and clinical-related risk features for the individual prediction of lung metastasis status of BC patients. The new prediction model was compared with the traditional TNM staging system in order to determine its reliability. Aiding with this model, clinicians might be able to evaluate the lung metastasis risk of BC patients, thus choosing appropriate medical examinations and optimizing therapeutic regimen.
To identify lung metastasis-related miRNA and mRNA in BC, public datasets with matched miRNA, mRNA, and clinical data were used in this study. A European Genome-phenome Archive (RRID: SCR_004944),
Among the 439 BC patients in the METABRIC dataset, two subsets of patients were defined based on their metastasis status: a lung metastasis group (those who had lung metastasis) and an NM group (those who did not report metastasis until the last follow-up). We identified 853 miRNAs annotated in the METABRIC dataset, and differentially expressed miRNAs (DEmiRNAs) between the two groups were identified using the
An optimal cut-off point was determined using receiver operating characteristic (ROC) curve, to classify samples into low (≤0.168) and high risk (>0.168) group. The Kaplan-Meier (KM) survival analysis with a log-rank test was implemented to compare the survival difference between the two groups (
Univariate logistic regression analysis was performed to compare the predictive power of the eight-miRNA risk score and clinical parameters including age at diagnosis, tumor size, TNM stage, grade, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, and hormone therapy. Furthermore, we used a multivariate logistic regression analysis to determine whether the eight-miRNA risk score could be an independent predictive factor for lung metastasis in BC patients. Other clinical parameters with values of
The ROC curves were plotted to assess the sensitivity and the specificity of independent predictive parameters including eight-miRNA signature, age at diagnosis, TNM stage, and miRNA-based nomogram in predicting lung metastasis (
The target genes of eight predictive miRNAs were first predicted and analyzed using miRWalk3.0 (RRID: SCR_016509;
For the screened overlapped target genes of each miRNA separately or hub genes for upregulated or downregulated miRNAs, gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis were performed (clusterProfiler, RRID: SCR_016884;
MicroRNA transcriptome data of BC patients from the TCGA dataset were selected to perform two differential miRNA expression analyses between different subgroups of BC patients. Around 48 DEmiRNAs between patients with lung metastasis only and patients with distant metastasis (except for the lung) were identified using the DESeq2 package of R (DESeq2, RRID: SCR_015687;
The miRNA expression data and corresponding clinical data of the patients of six cancer types [adrenocortical carcinoma (ACC), bladder urothelial carcinoma (BLCA), sarcoma (SARC), skin cutaneous melanoma (SKCM), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), and stomach adenocarcinoma (STAD)] were downloaded from the TCGA database. DEmiRNAs between patients with lung metastasis and patients without metastasis were identified in each type of cancer using the DESeq2 package of R.
All the statistical analyses were performed with the SPSS software (RRID: SCR_002865) and R software (version 4.0.0; RRID: SCR_001905).
A total of 479 BC patients from METABRIC and 449 BC patients from TCGA were included in this study. Baseline clinical and pathological characteristics of the study participants in the training and two validation cohorts were listed in
Demographics of the samples chosen for the study.
Variables | Training cohort ( |
Internal validate cohort ( |
External validate cohort ( |
---|---|---|---|
Median age at diagnosis in years (IQR) | 61.11 (51.09–68.99) | 60.57 (50.94–70.25) | 60.00 (71.00–67.00) |
Median follow up time from diagnosis in days (IQR) | 3,318 (1916–4,719) | 3,144 (1781–4,479) | 343.5 (114–1,108) |
Lung metastasis status | |||
No metastasis | 300 | 141 | 433 |
Lung metastasis | 27 | 11 | 16 |
Pam50 subtype | |||
Luminal A | 151 | 78 | 205 |
Luminal B | 83 | 37 | 66 |
HER2 | 26 | 7 | 21 |
Basal like | 46 | 22 | 100 |
Normal breast-like | 21 | 8 | 14 |
Unknown | 0 | 0 | 43 |
TNM stage | |||
1 | 203 | 93 | 152 |
2 | 114 | 58 | 281 |
3 | 8 | 1 | 14 |
4 | 2 | 0 | 2 |
ER status | |||
Positive | 259 | 118 | 304 |
Negative | 68 | 34 | 124 |
Unknown | 0 | 0 | 21 |
PR status | |||
Positive | 187 | 87 | 270 |
Negative | 142 | 65 | 156 |
Unknown | 0 | 0 | 23 |
HER2 status | |||
Positive | 41 | 9 | 53 |
Negative | 286 | 143 | 248 |
Unknown | 148 | ||
Menopausal state | |||
Pre | 71 | 32 | 84 |
Post | 256 | 120 | 312 |
Peri | 0 | 0 | 19 |
Unknown | 0 | 0 | 34 |
Vital status | |||
Alive | 198 | 86 | 435 |
Dead | 129 | 66 | 14 |
PAM50, prediction analysis of microarray 50; ER, estrogen receptor; PR, progesterone receptor; HER2, human epithelial growth factor receptor 2; and TNM, the tumor node metastasis.
The METABRIC dataset includes 1,302 BC samples, of which 479 (36.79%) of them reached the inclusion criteria for the analysis of identifying DEmiRNAs. About 327 samples were randomly assigned as a training cohort and the rest were assigned as the internal validation cohort based on a computer number generator. The flow chart of the study design was showed in
Study design. METABRIC, molecular taxonomy of BC international consortium; LASSO, the least absolute shrinkage and selector operation; KEGG, Kyoto encyclopedia of genes and genomes; Abs, absolute value; FC, fold change; miRNA, microRNA; and TCGA, the cancer genome atlas.
Parameter selection to develop an eight-miRNA signature to distinguish lung metastasis status of breast cancer.
In the training cohort, we used LASSO-based logistic regression and identified eight miRNAs from the 16 DEmiRNAs, which were as follows: miR-663, miR-210, miR-17, miR-301a, miR-135b, miR-451, miR-30a, and miR-199a-5p (
In the training cohort, according to the results of univariate logistic regression analysis, the eight-miRNA signature, and five clinical risk factors (age at diagnosis, tumor size, grade, TNM stage, and HER2 status) with values of
Risk factors for lung metastasis (LM) in training cohort.
Univariate analysis | Multivariate analysis | |||
---|---|---|---|---|
OR (95% CI) |
|
OR (95% CI) |
|
|
miRNA score | 1.898 (1.237–2.912) | 0.0033 | 1.651 (1.046–2.606) | 0.0311 |
Age at diagnosis | 0.583 (0.330–1.020) | 0.0587 | 0.486 (0.275–0.862) | 0.0134 |
Tumor size | 1.499 (1.148–1.958) | 0.0030 | ||
Grade | 3.129 (0.824–11.884) | 0.094 | ||
TNM stage | 3.494 (1.905–6.407) | <0.0001 | 4.025 (2.078–7.795) | <0.0001 |
ER status | 0.738 (0.298–1.824) | 0.511 | ||
PR status | 1.085 (0.487–2.416) | 0.842 | ||
HER2 status | 2.759 (1.087–7.005) | 0.0328 | ||
Hormone therapy | 0.563 (0.253–1.254) | 0.1599 |
LM, lung metastasis; miRNA, microRNA; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; and TNM, the tumor node metastasis.
Development and assessment of the miRNA-based nomogram. Constructed a miRNA-based nomogram to predict LM for BC patients in the training cohort, with age at diagnosis, stage, and eight-miRNA signature incorporated. LM, lung metastasis; BC, breast cancer; and miRNA, micro RNA.
Area under the receiver operating characteristics curve (AUC) of prognostic indicators for lung metastasis in breast cancer (BC).
Variables | Training cohort | Internal validation cohort | External validation cohort |
---|---|---|---|
miRNA score | 0.681 (95% CI, 0.589–0.774) | 0.754 (95% CI, 0.561–0.946) | 0.711 (95% CI, 0.608–0.815) |
Age at diagnosis | 0.403 (95% CI, 0.290–0.516) | 0.282 (95% CI, 0.117–0.448) | 0.623 (95% CI, 0.479–0.768) |
TNM stage | 0.727 (95% CI, 0.628–0.825) | 0.583 (95% CI, 0.407–0.759) | 0.840 (95% CI, 0.716–0.963) |
Nomogram model | 0.774 (95% CI, 0.669–0.879) | 0.763 (95% CI, 0.597–0.929) | 0.925 (95% CI, 0.846–1.000) |
TNM, the tumor node metastasis; AUC, area under the receiver operating characteristics curve.
Assessment of the miRNA-based nomogram. Receiver operating characteristic (ROC) curves of eight-miRNA signature, age at diagnosis, stage, and the miRNA-based nomogram model predicting LM in
We then examined the predictive ability of our eight-miRNA signature and nomogram model in two validation cohorts. The distributions of the miRNA-based risk score, OS, OS status, and the expression profiles of predictive miRNAs in the internal validation cohorts have been shown in
An independent external validation cohort of 449 patients who fulfilled the same requirements as above was recruited from the TCGA dataset. A total of seven of the eight miRNAs identified in our study were found in the TCGA miRNA dataset (the exception being miR-663). The distributions of the miRNA-based risk score, OS, OS status, and the expression profiles of predictive miRNAs in the external validation cohorts has been shown in
Survival curves of BC patients stratified by different variables. KM curves of overall survival of breast cancer patients stratified by
Risk factors for lung metastasis in external validation cohort.
Univariate analysis | Multivariate analysis | |||
---|---|---|---|---|
OR (95% CI) |
|
OR (95% CI) |
|
|
miRNA score | 2.748 (1.299–5.816) | 0.0082 | 4.207 (1.440–12.290) | 0.0086 |
Age at diagnosis | 1.678 (0.861–3.268) | 0.1277 | 1.748 (0.811–3.769) | 0.1540 |
TNM stage | 29.345 (9.153–94.086) | <0.0001 | 32.540 (8.986–117.830) | <0.0001 |
TNM, the tumor node metastasis.
Currently, the conventional TNM staging system is the standard tool for risk evaluation in clinical practice. When comparing the AUC, we found that the miRNA-based prediction nomogram achieved better predictive accuracy than the TNM stage in the training cohort and two validation cohorts (
The improvement of miRNA-based nomogram in predicting lung metastasis according to net reclassification improvement (NRI) and integrated discrimination improvement (IDI).
Training cohort | Internal validation cohort | External validation cohort | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
NRI (95% CI) |
|
IDI (95% CI) |
|
NRI (95% CI) |
|
IDI (95% CI) |
|
NRI (95% CI) |
|
IDI (95% CI) |
|
0.216 (0.048–0.384) | 0.012 | 0.065 (0.015–0.115) | 0.011 | 0.307 (0.020–0.594) | 0.036 | 0.093 (0.021–0.165) | 0.011 | 0.308 (0.081–0.535) | 0.008 | 0.025 (−0.048–0.098) | 0.5 |
NRI, net reclassification improvement; IDI, the integrated discrimination improvement; and P,
Decision-curve analysis was conducted to compare the clinical use of our nomogram to that of the TNM staging system (
We identified the gene targets for predictive miRNAs using
Identification of potential targets for predictive miRNAs and their role in lung metastasis.
Gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were performed for the overlapped target genes of each predictive miRNAs. Among pro-metastatic miRNAs, miR-17 mainly interfered with cell cycle arrest (BP), mitotic G1/S transition checkpoint (BP), positive regulation of autophagy (BP), signal transduction by p53 class mediator (BP), focal adhesion (KEGG), signaling pathways regulating pluripotency of stem cells (KEGG), regulation of actin cytoskeleton (KEGG), and hippo signaling pathway (KEGG;
These miRNAs functioned together in the organism, so then we tried to identify the role of five upregulated or three downregulated miRNAs as a whole. Hub genes of the target genes for five upregulated or three downregulated miRNAs were generated to identify central elements of pro-metastatic and anti-metastatic biological networks (
In order to determine whether these eight predictive miRNAs were unique to lung metastasis in BC patients, we first identified DEmiRNAs between patients with lung metastasis only and patients with distant metastasis except for the lung (
Demographics of the samples recruited in subgroup analysis.
Variables | lung metastasis only ( |
distant metastasis except for the lung ( |
without metastasis ( |
---|---|---|---|
Median age at diagnosis in years (IQR) | 65 (56–71.5) | 57 (47–63.25) | 60 (50.5–67.00) |
Median follow up time from diagnosis in days (IQR) | 1,233 (645.3–3,578) | 1,096 (190.5–2,405) | 343.5 (109.3–1,064) |
Pam50 subtype | |||
Luminal A | 0 | 22 | 201 |
Luminal B | 1 | 11 | 64 |
HER2 | 0 | 5 | 21 |
Basal like | 2 | 6 | 94 |
Normal breast-like | 1 | 2 | 13 |
Unknown | 2 | 8 | 40 |
TNM stage | |||
1 | 0 | 7 | 151 |
2 | 2 | 27 | 276 |
3 | 4 | 14 | 5 |
4 | 0 | 6 | 1 |
ER status | |||
Positive | 1 | 37 | 115 |
Negative | 5 | 12 | 297 |
Unknown | 0 | 5 | 21 |
PR status | |||
Positive | 1 | 31 | 266 |
Negative | 5 | 19 | 144 |
Unknown | 0 | 4 | 23 |
HER2 status | |||
Positive | 1 | 2 | 52 |
Negative | 1 | 17 | 241 |
Unknown | 4 | 35 | 140 |
Menopausal state | |||
Pre | 1 | 12 | 82 |
Post | 5 | 32 | 299 |
Peri | 0 | 2 | 19 |
Unknown | 0 | 8 | 33 |
Patient metastatic sites | |||
Lung |
|
|
|
Bone |
|
|
|
Brain |
|
|
|
Liver |
|
|
|
Multi-organ Metastasis |
|
|
|
No metastasis |
|
|
|
Vital status | |||
Alive | 1 | 16 | 433 |
Dead | 5 | 38 | 0 |
PAM50, prediction analysis of microarray 50; ER, estrogen receptor; PR, progesterone receptor; HER2, human epithelial growth factor receptor 2; and TNM, the tumor node metastasis.
miR-30a and miR-135b have unique roles in lung metastasis of BC. Dot plots were plotted to show the distributions of miR-30a, miR-135b, and miR-17 in BC patients with distant metastasis except for the lung, BC patients with lung metastasis only, and BC patients without metastasis. miRNA, microRNA; BC, breast cancer. *
We performed differential miRNA expression analyses between patients with lung metastasis and patients without metastasis in six cancer types (ACC, BLCA, SARC, SKCM, CESC, and STAD;
Based on Surveillance, Epidemiology, and End Results (SEER) database, the median survival time for BC patients with lung metastases was 21 months, and only 15.5% of the patients were alive for more than 3 years (
Subgroup analysis suggested that miR-30a and miR-135b have distinct roles in lung metastasis of BC patients. miR-30 has been reported to be able to stabilize pulmonary vessels and inhibit pulmonary vascular hyperpermeability in the premetastatic phase (
The significance of miRNAs is better appreciated from the aspect of their potential functional impact on biological pathways, as these influence the outcomes for the patient (
We also conducted a pan-cancer analysis to figure out whether the eight predictive miRNAs were specific to BC. Some of the miRNAs had consistent effects in different cancer types, such as miR-30a, miR-17, miR-451a, and miR-135b, while others showed controversial effects, such as miR-210, miR-301a, and miR-199a. Previous studies also identified the role of these predictive miRNAs in lung metastasis of other types of cancer (
The univariate and multivariate logistic regression analysis showed that the eight-miRNA signature could be an independent risk factor in training and validation cohorts. The AUC of eight-miRNA signature alone for lung metastasis prediction showed a little bit smaller than that of the TNM staging system in training and external validation cohort. Therefore, the comprehensive predictive nomogram was constructed integrating the risk score and conventional clinical parameters including stage and age, all of which were verified as an independent risk factor using univariate and multivariate logistic regression analysis for the lung metastasis status of BC patients. Apart from AUC, the calibration plot was also used to assess the discrimination performance of the nomogram model. Although the overall trend was in line with the 45-degree ideal diagonal line, yet the calibration plot showed some deviation, which may due to the limited events and thus affecting the power. NRI, IDI, and DCA were used to evaluate the prediction ability between miRNA-based nomogram and the TNM staging system. The results of NRI indicated the significant improvement of miRNA-based nomogram in all three cohorts, and the results of IDI suggested that the nomogram model improved the predictive power, yet failed to reach a significant level in the external validation cohort. DCA results also indicated that our miRNA-based nomogram improved current treatment standards, while the ideal model was the model with the positive net benefit at any given threshold.
However, several limitations of our study should be acknowledged. Firstly, due to the different sequence platforms, only seven of eight predictive miRNAs were identified in the external validation cohort, so we did not adopt the risk scores and cut-off points generated in the training set as previous research suggested (
In this study, we constructed a nomogram model based on multiple lung metastasis-related miRNAs and clinical risk factors to predict the lung metastasis of BC patients. We screened the high-throughput sequence data from the METABRIC database to explore DEmiRNAs and used the LASSO method to identify an eight-miRNA signature. The risk score was calculated by the multivariate logistic coefficient multiplied by the expression of the miRNA. Then the risk score and clinical risk factors were combined together to construct a miRNA-based nomogram, which was assessed by the calibration plot, ROC analysis, NRI, IDI, and DCA. Internal and external validation was also performed to evaluate the nomogram model. Functional enrichment analyses were performed to identify the potential biological roles of eight predictive miRNAs. Subgroup analysis of BC patients with different distant metastasis showed that miR-30a, miR-135b, and miR-17 have unique roles in lung metastasis of BC. Pan-cancer analysis of patients with lung metastasis or without metastasis in six types of cancer indicated the significant importance of eight predictive miRNAs in lung metastasis. A biomarker-based approach to accurately predict the metastasis status of BC patients is urgently needed in the era of precision medicine. Risk assessment is vital for making appropriate therapeutic decisions and follow-up strategies in BC patients. If a patient has a high probability to have lung metastasis in the future, we might recommend the patient to take a close inspection of the lung and adopt advanced treatment. This model might be able to perform well in all patients, for it was constructed based on large-scale datasets. In addition, this risk score was also a significant factor in affecting survival. Therefore, this nomogram could be used as a supportive graphic tool in clinical practice to facilitate treatment decisions of BC patients.
In our current study, we identified eight predictive miRNAs from publicly available data and constructed an eight-miRNA based nomogram that incorporated other clinical parameters including stage and age to predict the lung metastasis status of BC patients, whose prediction power was better than that of conventional TNM stage system. Subgroup analysis suggested that miR-30a, miR-135b, and miR-17 may have unique roles in lung metastasis of BC patients. On the basis of the GO, KEGG enrichment, and pan-cancer analyses, the eight miRNAs played crucial roles in lung metastasis cascade. Therefore, our eight-miRNA-based nomogram might be a vital tool for lung metastasis prediction in BC patients, aiding in developing personalized treatment strategies.
The datasets analyzed for this study can be found in The Cancer Genome Atlas (
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.
LZ initiated and organized the study. LZ and JP designed and carried out bioinformatics and statistical analyses, drew figures, and drafted the manuscript. ZW, CY, and JH participated in editing the manuscript. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The datasets of this study are obtained by the Cancer Genome Atlas database and European Genome-phenome Archive. We are grateful to them for the establishment and management of the databases.
The Supplementary Material for this article can be found online at:
BC | Breast cancer | ||
miRNAs | MicroRNAs | ||
METABRIC | Molecular taxonomy of breast cancer international consortium | ||
DEmiRNAs | Differentially expressed miRNAs | ||
TCGA | The cancer genome atlas | ||
AUC | Area under the receiver operating characteristics curve | ||
TNM | The tumor node metastasis | ||
NRI | Net reclassification improvement | ||
IDI | The integrated discrimination improvement | ||
DCA | Decision-curve analysis | ||
LASSO | Least absolute shrinkage and selection operator | ||
OS | Overall survival | ||
3' UTR | 3' untranslated region | ||
KM | Kaplan-Meier | ||
ROC | Receiver operating characteristic curve | ||
ER | Estrogen receptor | ||
PR | Progesterone receptor | ||
HER2 | Human epidermal growth factor receptor 2 | ||
DEmRNAs | Differentially expressed mRNAs | ||
GO | Gene ontology | ||
KEGG | Kyoto encyclopedia of genes and genomes | ||
DFS | Disease-free survival | ||
SEER | Surveillance, epidemiology, and end results | ||
EMT | Epithelial-mesenchymal transition | ||
TGFβ | Transforming growth factor beta | ||
ACC | Adrenocortical carcinoma | ||
BLCA | Bladder urothelial carcinoma | ||
SARC | Sarcoma | ||
SKCM | Skin cutaneous melanoma | ||
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma | ||
STAD | Stomach adenocarcinoma | LM | Lung metastasis |
1
2
3
4
5
6
7
8
9
10