Construction of Bone Metastasis-Specific Regulation Network Based on Prognostic Stemness-Related Signatures in Breast Invasive Carcinoma

Background Bone is the most common metastatic site of Breast invasive carcinoma (BRCA). In this study, the bone metastasis-specific regulation network of BRCA was constructed based on prognostic stemness-related signatures (PSRSs), their upstream transcription factors (TFs) and downstream pathways. Methods Clinical information and RNA-seq data of 1,080 primary BRCA samples (1,048 samples without bone metastasis and 32 samples with bone metastasis) were downloaded from The Cancer Genome Atlas (TCGA). The edgeR method was performed to identify differential expressed genes (DEGs). Next, mRNA stemness index (mRNAsi) was calculated by one-class logistic regression (OCLR). To analyze DEGs by classification, similar genes were integrated into the same module by weighted gene co-expression network analysis (WGCNA). Then, univariate and multivariate Cox proportional hazard regression were applied to find the PSRSs. Furthermore, PSRSs, 318 TFs obtained from Cistrome database and 50 hallmark pathways quantified by GSVA were integrated into co-expression analysis. Significant co-expression patterns were used to construct the bone metastasis-specific regulation network. Finally, spatial single-cell RNA-seq and chromatin immunoprecipitation sequence (ChIP-seq) data and multi-omics databases were applied to validate the key scientific hypothesis in the regulation network. Additionally, Connectivity Map (CMap) was utilized to select the potential inhibitors of bone metastasis-specific regulation network in BRCA. Results Based on edgeR and WGCNA method, 43 PSRSs were identified. In the bone metastasis-specific regulation network, MAF positively regulated CD248 (R = 0.435, P < 0.001), and hallmark apical junction was the potential pathway of CD248 (R = 0.353, P < 0.001). This regulatory pattern was supported by spatial single-cell RNA sequence, ChIP-seq data and multi-omics online databases. Additionally, alexidine was identified as the possible inhibitor for bone metastasis of BRCA by CMap analysis. Conclusion PSRSs played important roles in bone metastasis of BRCA, and the prognostic model based on PSRSs showed good performance. Especially, we proposed that CD248 was the most significant PSRS, which was positively regulated by MAF, influenced bone metastasis via apical junction pathway. And this axis might be inhibited by alexidine, which providing a potential treatment strategy for bone metastasis of BRCA.


INTRODUCTION
Breast Invasive Carcinoma (BRCA) was the most common tumor in female, which originated from ducts and acinar epithelium at all levels of the breast, and most patients suffered from malignant epithelial tumor. And the BRCA can be classified into several types according to the state of progesterone receptor (PR), estrogen receptor (ER), and ERBB2 receptor (HER2) in histological stratification, which was applied in clinical practice (1). Estimated by American Cancer Society (ACS), there were 279,100 new cases, and 42,690 new death BRCA patients in 2020 (2). Besides, the five-year survival rate for BRCA in stage I, II, III, and IV were 98, 92, 75, and 27% according to the statistic from 2009 to 2015, respectively (3). Although the five-year survival rate of primary BRCA was high, the five-year survival rate of bone metastasis was only 20%, and patients were trapped in a vicious cycle between osteolytic degeneration and proliferation of cancer cells (4). Besides, the osteolytic lesions like pain in bone, fractures, spinal compression and hypercalcemia leaded to poor survival quality and death (4).
Machine learning based on high-throughput data played an important role in prognosis of cancer, and new characters defined by algorithm like mRNA stemness index (mRNAsi) and stemness-related gene provided a new way for analysis (5). Recently, analysis on triple-negative breast cancer were launched to explore the key gene related to the stemness and find the target for further therapy (6). Therefore, molecules participated in the bone metastasis required to be explored, and corresponded biomarkers and possible mechanism needed to be drug for clinical strategy and therapy.
In this study, data of RNA-seq in BRCA were identified by edgeR, and differential expressed genes (DEGs) were performed by machine-learning algorithm to define the mRNAsi. Then, the profiling of DEGs were integrated into weighted gene coexpression network analysis (WGCNA) to classify similar genes into multiply modules and outline the phenotypic characteristics of modules. Besides, the mRNAsi and Hallmark gene sets were qualified to annotate modules. Then, the key module and genes most associated with mRNAsi were selected. Univariate and multivariate Cox regression analysis were applied to access the prognostic value of genes. What is more, based on the Pearson analysis for TF, genes, and Hallmark gene sets, a bone metastasisspecific network was constructed. The scientific hypothesis was determined by the correlation coefficient. Moreover, the CMap analysis was applied to find potential inhibitors for signal axis. Finally, spatial single-cell RNA sequence and chromatin immunoprecipitation sequence (ChIP-seq) data and multi-omics online databases were applied to validate the key scientific hypothesis in the regulation network. The bone metastasisspecific regulation network and inhibitors provided potential treatment strategy for bone metastasis of BRCA.

Data Acquisition
Based on the Cancer Genome Atlas (TCGA) database (https:// portal.gdc.cancer.gov/), RNA-seq data of 1,080 primary BRCA samples were downloaded, including 1,048 samples without bone metastasis and 32 samples with bone metastasis. And the bone metastasis was diagnosed by imaging examinations like CT or PET-CT. Besides, demographics like age and gender, tumor information like TNM stage and grade, and follow-up data of all patients were also obtained from the TCGA database. Besides, samples without follow-up information were excluded.

Differentially Expressed Genes Identification
The edgeR package was used to screen the RNA-seq data to define DEGs between primary BRCA patients with and without bone metastasis, and the criteria must fit following two points at the same time: the absolute value of log Fold Change (log FC) must more than 1, and the False Discovery Rate (FDR) must less than 0.05. Then, Gene Oncology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were utilized to annotate DEGs.

mRNAsi
The mRNAsi was calculated by the one-class logistic regression (OCLR) machine-learning algorithm (7) based on the RNA-seq data of 1,080 primary BRCA samples.

Weighted Gene Co-Expression Network Analysis
With the aim to holistically analyze DEGs, modules were classified by WGCNA (8) R package (http://www.r-project. org/), every single module gathered highly similar DEGs. Then based on RNA-seq profiling, the Pearson correlation analysis was utilized to construct the gene co-expression network. Additionally, the power function was applied to build a weighted adjacency matrix: S ij represented the Pearson correlation between gene i and j, a ij represented the weighted network adjacency between gene i and j. And b equaled to 4 was the soft-threshold parameter set by pickSoft Threshold from WGCNA R package. What's more, the application of soft-threshold parameter of weighted network make it possible to show the continuous variety of co-expression information in [0,1], and it might promote the idea of scale-free co-expression network come true. In addition, the correlation coefficients were utilized to construct hierarchical clustering, and a topological overlap method was performed to classify DEGs with the similar expression patterns into same module. Besides, the capacity of module must more than 20 genes, and when a certain module with less than 20 genes, similar modules will be merged.
Based on the H Collection of Molecular Signatures Database (MSigDB) v7.0 (https://www.gsea-msigdb.org/gsea/msigdb/ genesets.jsp?collection=H) (9), 50 Hallmark gene sets were qualitied by the computational approach named Gene Set Variation Analysis (GSVA). And these Hallmark gene sets were related to biological process and states. Then, with the aim to annotate the specific phenotypic traits for module, 50 Hallmark gene sets and mRNAsi were defined as phenotypes for a co-analysis with modules. Besides, gene significance (GS) coefficients and p values were illustrated the correlation between DEGs and phenotypes. Similarly, module significance (MS) coefficients and p values showed the correlation between modules and phenotypes, and the MS was calculated from the average absolute GS for all genes in every single module. Moreover, the first principle component of module genes showed the gene expression level in the certain module, and module eigengenes (MEs) represented the first principle component. And module membership (MM) represented the correlation between gene to MEs. Next, mRNAsi was the key phenotype to choose the stemness-related module, and the largest MS with p value less than 0.05 were applied to determine the key module for next exploration. What's more, the Hallmark gene set significantly related to the key module and p value less than 0.05 was selected as the key Hallmark gene set, and the key Hallmark gene set was defined as the key pathway for stemness-related signatures (SRSs). Besides, SRSs with MM >0.300 and GS >0.300 were obtained from the key module for further analysis.

Multivariate Prognosis Model Construction
The univariate and multivariate Cox proportional hazard regression were applied to find prognostic stemness-related signatures (PSRSs). SRSs with p value <0.001 in univariate Cox proportional hazard regression were defined as PSRSs. Then, the LASSO regression analysis was applied to avoid the over-fitness. Then, residual PSRSs were integrated into the multivariate Cox regression model, and risk score for each BRCA patient was calculated by the formula: In the formula, "m" represented the number of each patient, "n" represented prognostic PSRS, and "b" represented the coefficient of each prognostic PSRS. Then, patients with BRCA were divided into low-and high-risk groups according to risk score. And the efficiency of risk score of the model was detected by Kaplan-Meier survival analysis. What's more, and accuracy was detected by the Receiver Operating Characteristic Curve (ROC) curve and C-index. Finally, the demographic information, TNM stage and risk score were applied for correction, and univariate and multivariate Cox regression were performed to validate the independent prognostic value.

Potential Signal Axis Identification
Based on the Cistrome database (http://cistrome.org/) (10), the list of 318 Translate factors (TFs) were downloaded. And edgeR was utilized to find differential expressed TFs. Then coexpression analysis for TFs and PSRSs by Pearson correlation analysis, and the significant paired TF-PSRSs were selected.
Aim to identify the significantly co-expressed pathway, the absolute quantification of 50 Hallmark gene sets between nonmetastasis and metastasis patients was screened by GSVA. And to explore the up-and down-regulated pathways between nonmetastasis and metastasis patients, Gene Set Enrichment Analysis (GSEA) was conducted based on the 50 gene sets of Hallmark (11). In addition, the intersection of GSVA, GSEA, and module phenotypic traits was defined as the key pathway. Then, the Pearson correlation analysis was utilized to analysis the interaction between Hallmark gene sets and PSRSs.
Eventually, the network based on TFs, PSRSs and Hallmark gene sets was constructed. And String database (12) was applied to plot a protein-protein interaction (PPI) network. Besides, the criteria for TF-PSRS paired was the absolute value of the correlation coefficient more than 0.400 and p value less than 0.05, for PSRS-Hallmark gene set was the absolute value of correlation coefficient more than 0.300 and p value less than 0.05.

Connectivity Map Analysis
To expend the application of potential signal axis, inhibitors of the signal pathway were selected by the Connectivity Map (build 02) (CMap) (https://portals.broadinstitute.org/cmap/) (13). Then, inhibitors for BCRA were identified. Besides, the information like chemical structural formula and biologic function of inhibitor compounds were available from the mechanism of actions (MoA) (http://clue.io/) (14). Ultimately, the key inhibitor was found according to the TF, PSRS, and Hallmark gene set.

Assay for Targeting Accessible-Chromatin With High-Throughout Sequencing-Seq and Chromatin Immunoprecipitation Sequence Validation
Assay for Targeting Accessible-Chromatin with high-throughout sequencing (ATAC-seq) and ChIP-seq data were used to validated the regulation mechanism of the network. ATAC-seq and ChIP-seq data were obtained from TCGA GDC (https://gdc. cancer.gov/about-data/publications/ATACseq-AWG) and Cistrome database (http://cistrome.org/), respectively (10,15). WashU Epigenome Browser and Gviz package were used to visualize the binding peaks (16,17).

Spatial Transcriptome Validation
The regulatory relationship between TF and PSRS, PSRS and Hallmark gene set required to be validated the direct mechanism in the molecular experiment. When linked with single-cell RNA sequence data, spatial transcriptome data can validate the cell subtype localization of the key genes (https://support. 10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_ Breast_Cancer_Block_A_Section_1). For quality control, only fit the following standards at the same time can be selected: genes must express in more than 3 single cells, gene counts more than 1, cell transcripts range from 1,500 to 100,000.
To integrate data analysis, the Seurat method was performed (18). Then, the "sctransform" algorithm was utilized for normalization. In addition, to identify variable genes and spatial-specific genes, "vst" and "markvariogram" method were utilized, respectively. Next, the principal component analysis (PCA) was performed based on variable genes (18). In addition, the jackstraw analysis was utilized to select the principal components (PCs), and p value must less than 0.05. Then, the further t-distributed Stochastic Neighbor Embedding (t-SNE), Haematoxylin and Eosin (HE) and UMAP (Uniform Manifold Approximation and Projection) were applied to identify the cell sub-cluster based on the PCs (resolution = 0.50) (19). And in sub-cluster, DEGs were filtered when the absolute value of log FC less than 0.5 and FDR more than 0.05. Moreover, the location and expression of DEGs were demonstrated in feature plots and violin plots, respectively. Besides, every cluster was annotated by scMatch (20), singleR (21), and CellMarker (22) databases. Aim to annotate single cells, 50 hallmark gene sets were performed to absolutely quantify the signaling pathway activity in each single cell.

Multidimension Validation
To decrease the inherent defects of analysis in silicon, multiply online databases were utilized to validate the scientific hypothesis in several aspects. And top five genes in the key pathway selected by GeneCard (https://www.genecards.org/) were also validated with TFs and PSRSs.

Statistics Analysis
The R software (www.r-project.org; version 3.6.1; Institute for Statistics and Mathematics, Vienna, Austria) was applied in all statistics analysis in our study, and two-sided p value <0.05 was determined as statistically significant.

Differentially Expressed Genes Identification
The expression profiling of 1,048 primary BRCA samples without bone metastasis and 32 primary BRCA samples with bone metastasis were obtained from TCGA database, and all patients' demographics information was summarized in Table 1. And all analysis processes were illustrated in Figure 1. RNA-seq data from TCGA database was screened by edgeR to filter DEGs, and the information of BRCA (Figure 2A

Multivariate Prognostic Model Construction
The heatmap ( Figure 4A) and volcano plot ( Figure 4B) demonstrated the results of DEG analysis of 125 SRSs. And univariate Cox regression analysis was utilized to find the prognostic genes, and 43 PSRSs were identified. In addition, forest plot ( Figure 4C) showed CD248 (HR = 1.00004, 95%CI (1.00001-1.00006), p = 0.007) was significantly associated with prognosis of BRCA. Then, the PSRSs were integrated into multivariate Cox regression analysis, and a prognostic model was constructed. The scatter plot ( Figure 5A) and risk line plot ( Figure 5B) were drawn to illustrated the risk distribution of each patient. In addition, in term of model diagnosis, area under curve (AUC) of ROC curve was 0.711 ( Figure 5C). Based on the median risk score calculated by multivariate model, low-and high-risk groups were accessed by Kaplan-Meier survival analysis ( Figure 5D), and the result displayed a significant difference (p < 0.001). Finally, the risk score was coanalysis with age, T stage, N stage, M stage, stage, the univariate ( Figure 5E) (HR = 167.019, 95%CI (39.677-703.066), p < 0.001) and multivariate ( Figure 5F) (HR = 1.050, 95%CI (1.033-1.067), p < 0.001) Cox regression, suggesting that the risk score was an independent prognostic factor.

Potential Signal Axis Identification
The expression levels of Hallmark gene sets were shown in heatmap plot ( Figure 6A), and differential expressed gene sets showed in volcano plots ( Figure 6B). A total of 47 significant expressed Hallmark gene sets were identified by GSVA ( Figure  6C), and five up-regulated and 15 down-regulated Hallmark gene sets were identified by GSEA ( Figure 6D). What's more, Hallmark gene sets were co-analyzed with PSRSs by the Pearson correlation analysis.
Based on data of 318 TFs from the Cistrome database, a series of analysis on expression launched, and heatmap plot ( Figure  7A) and volcano plot ( Figure 7B) were applied. Moreover, 96 TFs were significantly differential expressed. Then, the Pearson correlation analysis was utilized to find the relation between TFs and PSRSs. , and KEGG (E) enrichment analysis of differential expressed genes. Cell-cell adhesion via plasma-membrane adhesion molecules, contractile fiber, receptor ligand activity were the most significant GO items, and Neuroactive ligand-receptor interaction was the most significant KEGG item.
In addition, the intersection between co-analysis Hallmark gene sets in GSVA and significant Hallmark gene sets in GSVA was illustrated in Venn plots ( Figure 7C), 22 Hallmark gene sets were found.
In sum, the scientific hypothesis was defined: MAF positively regulated CD248, promoting apical junction pathway in BRCA, which might play a role in bone metastasis.

Spatial Transcriptome and Chromatin Immunoprecipitation Sequence Validation
With the aim to further explore the location of key genes in subtype cell clusters, the profiling of scRNA-seq and spatial transcriptome were co-analyzed. Fourteen clusters were identified in UMAP and t-SNE, and pare-carcinoma, invasive ductal carcinoma and intraductal carcinoma in situ were illustrated in HE-stained section ( Figure 9A). For validation, the feature and spatial feature plots of MAF, CD248, GJA1, LAMA3, TJP1, LAMC2, and COL17A1 demonstrated to show the location in BRCA samples, and they were highly-expressed in the invasive ductal carcinoma tissue (cluster 2, 4, 7, 9, 12) ( Figures 9B, C). Besides, a series of analysis based on cell cycle indicated that genes in cluster 4 and 7 of invasive ductal carcinoma and 10 of intraductal carcinoma highly related to phase G2M and S ( Figure 9D). In addition, tumor inhibited pathways like apoptosis, p53 pathway and TNFa signaling via NF-kB were down-regulated while tumor genesis related pathways G2M checkpoint and E2F targets were up-regulated ( Figure 9E). In ChIP-seq analysis, binding peaks were illustrated in CD248 sequence ( Figure 10). Moreover, genes in our hypothesis were validated in ATAC-seq (Figure 11), and they were all regulated in BRCA samples. Therefore, patterns of direct transcriptional regulatory between TF-PSRS interaction pairs were identified.

DISCUSSION
BRCA was a common tumor in female, and patients with bone metastasis suffered from the pain and the risk of fracture and even death (4). What's worse, tumor genesis and osteolytic damage could be mutually reinforcing (4). As consequence, the mechanism of bone metastasis must be expounded for early diagnosis and precise therapy. In the recent study, a total of 813 primary BRCA samples were analyzed. Based on WGCNA method and univariate Cox regression analysis, several modules were annotated by mRNAsi and Hallmark gene sets to find the key module and correspond PSRSs. And the multivariate Cox model was constructed. In addition, multivariate Cox model and risk score were accessed by ROC curve and Kaplan-Meier survival analysis. And the risk score was an independent predict factor. Then, a metastasisspecific regulation network was constructed by the Pearson analysis, and MAF, CD248 and apical junction were significant. Moreover, the regulatory pattern was supported by spatial single-cell RNA sequence and ChIP-seq data and multiomics online databases. Based on CMap analysis, trifluoperazine was identified as the possible inhibitor for bone metastasis of BRCA. MAF was MAF basic leucine zipper transcription factor, which belonged to AP-1 super family, and it regulated the terminal differentiation (34). Besides, MAF was crucial in promoting osteoblast differentiation of bone marrow stromal cell (35). What' more, MAF directly regulated osteoblastspecifical promoter Bglap1, and co-regulated the osteoblast differentiation with RUNX2 (35). In addition, Milica Pavlovic et al. found that high-expressed MAF was significant related to bone metastasis instead of visceral metastasis based on Genomic copy number distortion analysis and immunohistochemistry, and MAF was also correlated to overall survival (36). Moreover, MAF played a role in promoting bone metastasis rather than cancer cell proliferation in vivo (36), which was consistent with our scientific hypothesis. Additionally, MAF also potentially controlled biological processes like migration, adhesion, and osteoclast differentiation in bone metastasis (36).
CD248 highly-expressed in tumor tissue, especially in BRCA, and it was related to the prognosis of patients (37). And the transcription product of CD248 was endosialin, which expressed on the cell surface of fibroblasts and pericytes in tumor instead of tumor endothelium (38). Further, Carmen Viski et al. found CD248 played a pivotal role by promoting the step of infiltration from primary to circulatory system via pericytes in metastasis, which affected on tumor microenvironment (39). Besides, CD248 enhanced the adhesion to the extracellular matrix, and activated the matrix metalloproteinase 9 (MMP9) in tumor metastasis (40). Moreover, CD248 expressed in osteoblast instead of osteoclast, and it has negative effects on osteoblast maturation and ossification (41).
Although none of the study reported the correlation of MAF and CD248, we proposed that MAF positively regulated the transcription of CD248: promoted the function of the transcript in cell adhesion, invasion and migration, and regulated osteoblast function (36,40,41).
Apical junction was a structure of apical domain of epithelial cells, and it linked adjacent epithelial cells by tight and adherent junction, which was essential for maintaining the epithelial barrier (42). And in type III epithelial-mesenchymal transition (EMT), the apical junction was disrupted, the apical-basal polarity lost, and the mesenchymal characteristics emerged, finally cells migrated to vessels and traveled to multiply organs and tissues (43). However, some epithelial characteristics were retained, especially the E-cadherin, and the collective migration was observed in metastasis BRCA (44,45). What's more, when in the stage of transplanting to the bone, E-cadherin and Ecadherin adherent junction formed between cancer cells, then the E-cadherin and N-cadherin adherent junction formed between cancer cells and osteoblasts, and cell-cell contact promoted the tumor proliferation via activated mTOR pathway in tumor microenvironment (46).
Up to date, few of study focused on the relationship between CD248 and apical junction. We speculated that CD248 regulated the apical junction by promoting the bone colonization in bone metastasis (41,46).
Trifluoperazine was a type of calmodulin blocker and dopamine D2 like receptor antagonist, which could inhibit the cancer cell metastasis in vitro, but the collective migration can be promoted by knocking out MRP (47). However, trifluoperazine might disturb the electrostatic surface potential (47). Besides, it inhibited the differentiation of osteoclasts, tumor genesis, metastasis and bone loss and promoted bone formation in breast cancer (48). Therefore, we proposed trifluoperazine might target on the MAF-CD248-apical junction axis, and play a role in inhibiting metastasis.
Last but not least, there were some limitations in our study. Firstly, our analysis was only based on high-throughput bioinformatic analysis rather than mechanism exploration. Then, data and platform were also limited, and more validations based on different data sets were needed (e.g. most ChIP-seq samples were not BRCA). And verifications based on clinical samples (with and without metastasis) were required. Next, more function experiments and directly mechanism needed to be verified. Therefore, gain/loss experiments based on MAF-CD248, CD248-apical junction and MAF-apical junction, rescue experiments on MAF-CD248-apical junction axis in vivo and vitro and Co-Immunoprecipitation will be lunched. Importantly, the MAF-CD248-apical junction axis in bone metastasis BRCA was firstly reported, and our study provided the idea of application on clinical prognosis and precise therapy.

CONCLUSION
In summary, we proposed that MAF positively regulated CD248, then promoted apical junction pathway in BRCA, which played a The feature and spatial plots of MAF, CD248, GJA1, LAMA3, TJP1, LAMC2, and COL17A1, showing that these key genes were highly expressed in the invasive ductal carcinoma tissue (B, C). UMAP plot related to cell cycle, violin plots of phase G2M and S, and phase annotated section (D), and cluster 4 and 7 of invasive ductal carcinoma and 10 of intraductal carcinoma highly related to phase G2M and S. The heatmap of Hallmark gene sets in cell clusters (E). role in bone metastasis. And it could be inhibited by trifluoperazine. Besides, the MAF-CD248-apical junction signal axis was verified by spatial single-cell RNA sequence and ChIPseq data and multi-omics online databases.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: TCGA program (https://portal.gdc.cancer. gov) and 10x genomics support center (https://support. 10xgenomics.com/spatial-gene-expression/datasets).

ETHICS STATEMENT
The study was approved by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University. The patients/ participants provided their written informed consent to participate in this study.  The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

ACKNOWLEDGMENTS
We thank the The Cancer Genome Atlas (TCGA) team for using their data.

SUPPLEMENTARY MATERIAL
The