Screening and Identification of Key Biomarkers in Acquired Lapatinib-Resistant Breast Cancer

Lapatinib, targeting the human epidermal growth factor receptor family members HER1 and HER2, has been approved by the US Food and Drug Administration for use in metastatic HER2-positive breast cancer. However, resistance to lapatinib remains a common challenge to HER2-positive metastatic breast cancer. Until now, the molecular mechanisms of acquired resistance to lapatinib (ALR) have remained unclear. With no definite biomarkers currently known, we aimed to screen for key biomarkers in ALR. In this research, we identified 55 differentially expressed genes (DEGs, 20 upregulated, 35 downregulated) through bioinformatic analysis using microarray datasets GSE16179, GSE38376, and GSE51889 from the Gene Expression Omnibus (GEO) database. The related gene function was explored using the Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The protein-protein interaction (PPI) network was constructed with the Search Tool for the Retrieval of Interacting Genes (STRING) and Cytoscape. The functional enrichment of the DEGs was analyzed, including negative regulation of the B cell apoptotic process, DNA replication, solute:proton symporter activity, synthesis, and degradation of ketone bodies, and metal sequestration by antimicrobial proteins. Analysis of seven hub genes revealed their concentration mainly in DNA replication and cell cycle. Survival analysis revealed that MCM10 and SPC24 may be related with poor prognosis in patients with ALR. Meanwhile, the prediction model of lapatinib sensitivity was constructed, and emerging role of the model was further analyzed using several webtools. In conclusion, hub genes are involved in the complex mechanisms underlying ALR in breast cancer and provide favorable support for treatment of ALR in future.


INTRODUCTION
Based on molecular markers, breast cancer is divided into four subgroups: luminal A, luminal B, basal-like, and human epidermal growth factor receptor 2 (HER2)-enriched (Perou et al., 2000). Receptor tyrosine-protein kinase HER2, also known as erbB-2, is included in the epidermal growth factor receptor (EGFR) family of receptor tyrosine kinases (Oh and Bang, 2020). HER2 is overexpressed in 20%-25% of breast cancer patients. HER2 over-expression is known as an aggressive tumor phenotype and is associated with worse survival (Parakh et al., 2017). Lapatinib, a reversible tyrosine kinase inhibitor with specificity for both EGFR and HER2, is approved for treating HER2-positive metastatic breast cancer after disease progression with trastuzumab therapy (Gradishar, 2013;Moasser and Krop, 2015). Compared with capecitabine monotherapy, lapatinib in combination with capecitabine improved objective response rate and progression-free survival (Geyer et al., 2006). Despite the effectiveness of lapatinib in HER2-positive breast cancer, acquired resistance remains a major clinical obstacle. D'Amato et al. have pointed out multiple mechanisms of ALR in breast cancers, including activation of compensatory pathways, mutation of the HER2 kinase domain, and gene amplification (D'Amato et al., 2015). Critically, there are currently no definite biomarkers to predict patients' responses to lapatinib.
With the development of gene sequencing and bioinformatics, increasing number of genetic studies have revealed the mechanism of tumorigenesis and drug resistance. By introducing microarray data and bioinformatic analysis that have been widely applied to investigate whole expression of genes in cancer, researchers have deepened their understanding of the differentially expressed genes (DEGs) and functional enrichment analysis among the complex diseases (Zhu et al., 2017). Although there are some bioinformatic studies corresponding to resistance to anti-HER2 therapies, scarce data and different laboratory conditions make it difficult to acquire reliable results. To overcome the limitation of insufficient data, we identified DEGs through bioinformatic analysis with three Gene Expression Omnibus (GEO) microarray datasets. Additionally, the related gene function was explored with Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and the protein-protein interaction (PPI) network was determined to express functions to establish a solid theoretical framework for potential molecular mechanisms. In the present study, we aimed to screen for biomarkers in ALR and found that there were 55 DEGs and 7 hub genes, which may be potential biological markers and provide a theoretical support for further treatment of ALR. The specific prediction model was established to evaluate the relationship between clinicopathologic characteristics of breast cancer patients and sensitivity of lapatinib.

Definition of DEGs
Three datasets were merged using Perl scripts to obtain more genes fully. To reduce deviation in data processing, we normalized batch effect using "sva" package in R software and screened the DEGs between lapatinib-sensitive cell samples and lapatinib-resistant cell samples using the "limma" package in R software (http://www.r-project.org/) (Davis and Meltzer, 2007). The difference was considered significant when |logFC| (fold change) was ≥ 1 and P value was < 0.05.

Enrichment Analysis of DEGs
The web-based Gene SeTAnaLysis Toolkit (WebGestalt) (http:// www.webgestalt.org/) is one of the most widely used online databases that helps researchers extract biological information from genes of interest (Liao et al., 2019). The goal of GO is to provide a vocabulary that can be applied to the shared genes and proteins, annotating genes, and analyzing biological process (Ashburner et al., 2000). KEGG is a database that sheds light on higher-order functional behaviors from molecular information generated by genome sequencing and other highthroughput experimental techniques (Kanehisa, 2002). Reactome functions as an extended version of a classic metabolic map (Jassal et al., 2020). To confirm characteristic biological functions of DEGs, analyses were performed using WebGestalt. The difference was considered significant when P value was < 0.05 and top 10 would be selected.

Construction of PPI Network
The PPI network was utilized to reveal many functional relationships and interactions among predicted target proteins using the Search Tool for the Retrieval of Interacting Genes (STRING) (version 11.0) database (http://string-db.org) (Szklarczyk et al., 2015). An interaction score > 0.4 was considered statistically significant, and disconnected nodes in the network were hidden. Cytoscape (version 3.7.2) is a bioinformatics software platform for visualizing modules of the PPI network (Smoot et al., 2011). The cytoHubba of Cytoscape is an application for exploring important hubs and clustering an interactome network with topological algorithms (Chin et al., 2014). For each module, the GO, KEGG, and reactome analysis were performed using WebGestalt.

Selection of Hub Genes
The hub genes were selected with the Maximal Clique Centrality algorithm (MCC). A network of their co-expressed genes was appraised by GeneMANIA (http://genemania.org/), which can provide gene function and extend the list with similar genes (Warde-Farley et al., 2010). Efficient hierarchical cluster analysis of hub genes was performed using UCSC Cancer Genomics Browser (http://xena.ucsc.edu/) (Kent et al., 2002). To perform survival analysis in a larger number of patients, gene expression data and clinicopathologic data of breast cancer patients were downloaded from The Cancer Genome Atlas (TCGA) at the UCSC Cancer Genomics Browser (http://xena.ucsc.edu/). The

Dataset
Platform   survival analysis of hub genes was constructed by the R packages "survival" and "survminer". The difference was considered significant when P value was < 0.05.

Prediction of Drug Sensitivity
Based on the data from TCGA of patients diagnosed with breast cancer, and drug sensitivity data from the Cancer Genome Project (CGP), the prediction model was performed using R package "pRRophetic" (Garnett et al., 2012;Geeleher et al., 2014a). The half-maximal inhibitory concentration (IC50) of lapatinib in each treated patient was obtained by ridge regression, and the prediction accuracy was measured through 10-fold crossvalidation. The default parameters were chosen, including "combat" for removing the batch effect, "allSolidTumors" for tissue type, and mean value for identifying the duplicate gene expression (Geeleher et al., 2014b). The median IC50 was selected for risk stratification to develop a specific model: lower IC50 was more sensitive to lapatinib. GraphPad Prism 7.0 was used to analyze data, and categorical data was compared by chi-square test. The difference was considered significant when P value was < 0.05.

Definition of DEGs in ALR
After normalization of the microarray data, 14,653 genes were merged among three datasets. Fifty-five DEGs are shown in the volcano plot (Figure 2A), consisting of 20 upregulated genes and 35 downregulated genes between lapatinib-sensitive cells and lapatinib-resistant cells.

Enrichment Analysis of DEGs
To obtain a comprehensive understanding of the biological effect of 55 DEGs, WebGestalt was applied to describe functional enrichment (Supplementary Table 1). GO analysis indicated that negative regulation of B cell apoptotic process, glomerular mesangium development, sequestering of metal ion, DNAdependent DNA replication, and protein autoubiquitination were the top 5 relevant biological process ( Figure 3A). For the cellular component, replication terms were mainly enriched ( Figure 3A). Molecular functions were concentrated mainly on solute:proton symporter activity, chromatin binding, and carbohydrate:proton symporter activity ( Figure 3A). KEGG pathway analysis revealed that synthesis and degradation of ketone bodies, terpenoid backbone biosynthesis, and butanoate metabolism were three of the most enriched pathways ( Figure  3B). Reactome pathway analysis demonstrated that DEGs were mainly enriched in metal sequestration by antimicrobial proteins, DNA replication, and cellular hexose transport ( Figure 3B).

Construction of PPI Network
To understand the biological activity at the protein level, an integrated PPI network of these DEGs was performed ( Figure  2B), and the most important module was constructed using Cytoscape ( Figure 2C, Supplementary

Selection of Hub Genes
The hub genes were calculated by Matthews Correlation Coefficient (MCC), and the top seven genes were selected. A network of their co-expressed genes was interpreted by GeneMANIA ( Figure 4). The hub genes could distinguish breast cancer samples from normal tissue samples through hierarchical clustering (Figures 5A-C). When the hub genes were expressed highly, an increasing number of samples presented estrogen receptor status ( Figure 5D). The survival analysis of hub genes from TCGA was verified in 185 patients diagnosed with HER2-positive breast cancer. Patients with high MCM10 and SPC24 expression showed worse overall survival ( Figure 6).

Prediction of Drug Sensitivity
The specific prediction model could evaluate the relationship between breast cancer patients and sensitivity of lapatinib ( Figure 7, Table 2). A cohort involving 664 breast cancer patients with clinicopathologic data from TCGA was analyzed. The distributions of patients' age and gender were not significantly different between the low-IC50 and high-IC50 groups. The high-IC50 group tended to include more nonwhite patients (chi-square test, P = 0.03) and more patients with stage II cancer (chi-square test, P = 0.0003). More patients with infiltrating lobular carcinoma and mixed histology were included in low-IC50 group (chi-square test, P = 0.0004). Interestingly, there were more patients with both HER2-positive and hormone receptor-positive and luminal subtype in low-IC50 group (chi-square test, P < 0.0001) indicating that the status of hormone receptor could influence lapatinib sensitivity.

DISCUSSION
Although lapatinib has been approved for treating HER2positive metastatic breast cancer after trastuzumab failure, acquired resistance to lapatinib (ALR) remains a major clinical challenge. However, the molecular mechanisms of ALR remain unclear. Thus, in-depth study of the mechanism of ALR and discovery of biomarkers with high sensitivity and specificity are of great value to improve the prognosis of patients with HER2positive breast cancer. Microarray data and bioinformatic analysis have enabled us to explore the whole expression of genes in ALR and improved our understanding of DEGs and functional pathways in complex diseases.
In the current study, DEGs were analyzed using three mRNA microarray datasets between lapatinib-sensitive cell samples and lapatinib-resistant cell samples. Fifty-five DEGs were analyzed, consisting of 20 upregulated genes and 35 downregulated genes. Functional enrichment analysis of GO and KEGG revealed that the majority of DEGs were associated with the following cellular processes: cell cycle, development, apoptosis, and signal transduction. In particular, several terms, including GO terms "negative regulation of B cell apoptotic process", "replication fork", "solute:proton symporter activity", KEGG terms "synthesis and degradation of ketone bodies", "terpenoid backbone biosynthesis", and reactome terms "metal sequestration by antimicrobial proteins", "DNA replication", were closely related with classic mechanisms of ALR in breast cancers. The hub genes were calculated by MCC, and the top seven genes including AURKB, GINS2, MCM10, UHRF1, POLE2, SPC24, and E2F2 were presumed to be associated with drug resistance. Seven hub genes were later corroborated with TCGA data from 185 HER2-positive breast cancer patients. Survival analysis showed that MCM10 and SPC24 may be related with poor prognosis in patients with

Biomarkers in Acquired Lapatinib-Resistance
Frontiers in Pharmacology | www.frontiersin.org September 2020 | Volume 11 | Article 577150 acquired lapatinib resistance. Furthermore, the specific prediction model revealed that the distributions of patients' races, pathologic tumor stages, histological types, and molecular subtypes were significantly different between low-IC50 and high-IC50 groups. MCM10, a highly conservative mini-chromosome maintenance protein, is involved in the initiation of eukaryotic genome replication. In a recent study, MCM10 induced migration and invasion of breast cancer via the Wnt/b-catenin pathway (Yang and Wang, 2019). SPC24, a component in the kinetochore microtubule interface, is associated with tumorigenic transformation (Zhu et al., 2015). Additionally, SPC24-which monitors the PI3K/ AKT kinase pathway-is overexpressed in breast cancer, implying its importance in clinical treatment (Zhou et al., 2018). Although E2F2 was not found to be of significance in survival analysis, it could be a gene of interest. E2F2, a member of the E2F family, is typically repressed by the retinoblastoma protein pRB. E2F transcription factors exist in the CDK4/6-RB1 pathway, and this pathway is often dysregulated in hormone receptor-positive breast cancer (Spring et al., 2020). Many researches have reported that E2F1-3 transcripts are highly expressed in HER2-positive tumors (Andrechek, 2015;Wu et al., 2015). Nikolai et al. found that the potential for combined targeting of HER2 and CDK signaling pathways may be a prospective strategy (Nikolai et al., 2016). Therefore, CDK4/6 inhibitors may overcome resistance to lapatinib. Overall, seven hub genes were analyzed, and results showed that these genes were concentrated mainly on DNA replication and cell cycle. Most interestingly, after analyzing the hub genes, we found that several signaling pathways may be related with ALR. Thus, many targeted drugs could be expected to reverse ALR.

CONCLUSION
In conclusion, the overexpression of AURKB, GINS2, MCM10, UHRF1, POLE2, SPC24, and E2F2 in HER2-positive breast cancer patients with ALR showed that these hub genes could be potential prognostic biomarkers in such patients. Survival analysis revealed that high MCM10 and SPC24 expression were negative prognostic factors in patients with acquired lapatinib FIGURE 7 | The prediction model was performed by R package "pRRophetic". P < 0.05 was considered statistically significant.
resistance. Future preclinical and translational studies should be directed at defining mechanisms involved in ALR and a combination of targeted agents.

DATA AVAILABILITY STATEMENT
The data analyzed in this paper were obtained from the microarray datasets GSE16179, GSE38376, and GSE51889 from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih. gov/geo). Gene expression data and clinicopathologic data of breast cancer patients were downloaded from The Cancer Genome Atlas (TCGA) at the UCSC Cancer Genomics Browser (http://xena.ucsc. edu/).

AUTHOR CONTRIBUTIONS
YY designed the study. SB and YC conducted all statistical analyses and drafted the manuscript. FY, CS, MY, WL, XH, JL, and HW revised the manuscript. All authors contributed to the article and approved the submitted version.