Gene Expression Profiling for Diagnosis of Triple-Negative Breast Cancer: A Multicenter, Retrospective Cohort Study

Background: Triple-negative breast cancer (TNBC) accounts for 12–20% of all breast cancers. Diagnosis of TNBC is sometimes quite difficult based on morphological assessment and immunohistochemistry alone, particularly in the metastatic setting with no prior history of breast cancer. Methods: Molecular profiling is a promising diagnostic approach that has the potential to provide an objective classification of metastatic tumors with unknown primary. In this study, performance of a novel 90-gene expression signature for determination of the site of tumor origin was evaluated in 115 TNBC samples. For each specimen, expression profiles of the 90 tumor-specific genes were analyzed, and similarity scores were obtained for each of the 21 tumor types on the test panel. Predicted tumor type was compared to the reference diagnosis to calculate accuracy. Furthermore, rank product analysis was performed to identify genes that were differentially expressed between TNBC and other tumor types. Results: Analysis of the 90-gene expression signature resulted in an overall 97.4% (112/115, 95% CI: 0.92–0.99) agreement with the reference diagnosis. Among all specimens, the signature correctly classified 97.6% of TNBC from the primary site (41/42) and lymph node metastasis (41/42) and 96.8% of distant metastatic tumors (30/31). Furthermore, a list of genes, including AZGP1, KRT19, and PIGR, was identified as differentially expressed between TNBC and other tumor types, suggesting their potential use as discriminatory markers. Conclusion: Our results demonstrate excellent performance of a 90-gene expression signature for identification of tumor origin in a cohort of both primary and metastatic TNBC samples. These findings show promise for use of this novel molecular assay to aid in differential diagnosis of TNBC, particularly in the metastatic setting.


INTRODUCTION
Breast cancer is the most common malignancy and the sixth leading cause of cancer-related mortality among women in China, accounting for ∼268,600 new cases and 69,500 deaths in 2015 (1). Histologically, breast cancer is a heterogeneous disease with distinct subtypes and pathological features, leading to variable treatment options and prognoses. Triple-negative breast cancer (TNBC) accounts for 12-20% of all breast cancer cases and is characterized by a lack of estrogen receptor (ER) and progesterone receptor (PR) expression, combined with an absence of both overexpression and amplification of the human epidermal growth factor receptor-2 (HER-2) gene (2).
TNBC is associated with a high rate of relapse and poor outcomes within the first 3 years after treatment (3,4). Given the latest promising data on immunotherapy, precise diagnosis of this malignancy is more important than ever for determining patient prognosis and facilitating patient-tailored therapy (5). In most cases, the primary tumor can be recognized based on morphological assessment and immunohistochemistry (IHC). Mammaglobin (MGB) and gross cystic disease fluid protein-15 (GCDFP-15) are currently the best immunohistochemical markers available for metastatic breast cancer, with reported overall sensitivities ranging from 50 to 87% and 10 to 79%, respectively (6). However, both markers demonstrate considerably lower sensitivities in TNBC than in ER-positive tumors (7)(8)(9). Therefore, clinical identification of the site of tumor origin, in particular for metastatic cancers without a prior history of breast cancer, is difficult, and more reliable diagnostic tools are urgently needed. In recent years, efforts have been made toward establishing new supplementary diagnostic tools for identification of primary tumor sites. Molecular profiling is a promising diagnostic approach that has the potential to provide an objective classification of metastatic cancers with an uncertain or unknown tissue of origin and to facilitate more time- and cost-effective diagnostic work-up of cancer patients (10). Molecular diagnostic profiling methods that use either microarrays or real-time reverse transcription polymerase chain reaction (RT-PCR) have been developed to classify a multitude of tumor types or to diagnose certain types of cancer. Kerr et al. described a 92-gene molecular classifier with an overall accuracy of 99% for determination of the site of origin of tumors with neuroendocrine differentiation (11). Additionally, Benjamin et al. analyzed microRNA expression profiles to identify malignant pleural mesothelioma (12).
Previously, we developed a pan-cancer gene expression signature with an overall accuracy of 97.1% for classification of carcinomas originating in 22 major tissue types, including adrenal gland, brain, breast, cervix, colorectum, endometrium, gastroesophagus, head and neck, kidney, liver, lung, lymphatic tissues, skin, mesothelial tissues, neuroendocrine tissues, ovary, pancreas, prostate, connective tissue, testis, thyroid, and urinary tract (13). Recently, we updated this gene expression-based signature by eliminating lymphoma-related genes and reference tumor samples to reduce the influence of lymphocytes when classifying lymph node metastases. Therefore, a new version of the gene expression signature was developed using 90 tumor-specific genes corresponding to 21 major tumor types (14).
The aim of the current study was to evaluate the performance and highlight the potential diagnostic utility of this 90-gene expression signature for identifying the anatomical origin of TNBC tumors. In addition, exploratory analyses were conducted to examine and identify subsets of genes within the 90-gene panel for specific use in the diagnosis of TNBC.

MATERIALS AND METHODS
Figure S1. In addition, 12 non-TNBC tumor samples including four cases with lymph node metastasis and eight cases with distant metastasis were enrolled in this study. Before inclusion, hematoxylin and eosin (H&E)-stained slides from tumor samples were reviewed by pathologists for evaluation of the percentage of tumor cells and necrotic areas. If fewer than 60% tumor cells or >40% necrotic area was present by inspection, regions of interest were circled on the H&E-stained slides, and the corresponding areas from unstained FFPE tissue sections were then manually macrodissected for tumor enrichment.

Sample Preparation and RNA Isolation
Total RNA was isolated from FFPE tissue sections using an FFPE Total RNA Isolation Kit (Canhelp Genomics, Hangzhou, China). Briefly, paraffin sections were placed in sterile 1.5-ml microcentrifuge tubes, deparaffinized with 100% xylene, and washed twice in 100% ethanol. Deparaffinized tissue was digested with proteinase K at 56 °C for 15 min and then incubated at 80 °C for another 15 min to partially reverse the crosslinking of nucleic acids. Samples were DNase treated and eluted in 40 µl of RNase-free water. The concentration of total RNA was determined spectrophotometrically from the absorbance at 260 nm, and purity was assessed using the A260/A280 ratio. RNA samples with A260/A280 ratios of 1.9 ± 0.2 were included in this study.
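
The purity criterion above can be expressed as a simple check. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the conversion of roughly 40 ng/µL of RNA per A260 unit (1 cm path length) is the commonly used approximation rather than a value stated in this study.

```python
def rna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate RNA concentration from absorbance at 260 nm.

    Assumes the common approximation that 1 A260 unit corresponds to
    about 40 ng/uL of RNA at a 1 cm path length (not stated in the paper).
    """
    return a260 * 40.0 * dilution_factor


def passes_purity(a260, a280, target=1.9, tolerance=0.2):
    """Return True if the A260/A280 ratio lies within target +/- tolerance."""
    ratio = a260 / a280
    # Small epsilon so that boundary values are not rejected by rounding error.
    return abs(ratio - target) <= tolerance + 1e-9


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(rna_concentration_ng_per_ul(0.5))
    print(passes_purity(1.71, 1.0), passes_purity(1.5, 1.0))
```

With the stated criterion of 1.9 ± 0.2, ratios between 1.7 and 2.1 are accepted.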

Expression Profiling of 90 Tumor-Specific Genes
For each sample, cDNA was generated from total isolated RNA using a High-Capacity cDNA Reverse Transcription Kit with RNase Inhibitor (Applied Biosystems, Foster City, CA, United States). The expression profiles of 90 tumor-specific genes were analyzed simultaneously on a 96-well plate using the Applied Biosystems 7500 Real-Time PCR System (Applied Biosystems). The PCR program was initiated at 95 °C for 10 min, followed by 40 thermal cycles, each at 95 °C for 15 s and at 60 °C for 1 min. For each sample, the turnaround time of the real-time PCR analysis was 90 min.

Data Analysis
Gene expression analysis was performed using R software and packages from the Bioconductor project (15)(16)(17)(18). For each specimen, the expression profile of the 90 tumor-specific genes was analyzed, and a similarity score was obtained for each of the 21 tumor types on the test panel (14). The similarity score represents the degree of certainty by which the gene expression pattern of the specimen matches the gene expression pattern of the indicated tumor type, and scores range from 0 (low certainty) to 100 (high certainty) with a sum of 100 across all 21 tumor types on the panel.
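
To make the scoring scheme concrete, the following minimal sketch rescales a set of raw per-tumor-type scores so that they sum to 100 and takes the highest-scoring type as the prediction. It is an illustration under stated assumptions only: the raw scores, the proportional rescaling, and the function names are hypothetical, the toy example uses three tumor types rather than 21, and the classifier that produces the scores in the actual assay is described in reference (14), not here.

```python
def similarity_scores(raw_scores):
    """Rescale non-negative raw scores so that they sum to 100.

    raw_scores: dict mapping tumor type -> non-negative raw score
    (hypothetical classifier output; the real scoring model is not shown).
    """
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw scores must have a positive sum")
    return {tumor: 100.0 * score / total for tumor, score in raw_scores.items()}


def predict_tumor_type(scores):
    """Return the tumor type with the highest similarity score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy values for three tumor types, for illustration only.
    raw = {"breast": 8.0, "lung": 1.5, "ovary": 0.5}
    scores = similarity_scores(raw)
    print(scores, predict_tumor_type(scores))
```

Because the scores are constrained to sum to 100, a high score for one tumor type necessarily implies low scores for the others.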
For each specimen, the predicted primary site of the tumor was compared with the reference diagnosis. In the current study, a true positive result was indicated when the predicted tumor type was breast cancer. When the predicted tumor type and reference diagnosis did not match, the result for that specimen was marked as an error. The end point was diagnostic accuracy, defined as the number of correct predictions divided by the total number of evaluable cases.
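
The end point can be computed as follows. This is a minimal sketch: the text does not state which method was used for the 95% confidence interval, so the Wilson score interval used here is an assumption, and the function names are hypothetical.

```python
import math


def accuracy(n_correct, n_total):
    """Diagnostic accuracy: correct predictions / evaluable cases."""
    return n_correct / n_total


def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%).

    Assumption: the paper does not name its interval method; Wilson is
    one common choice for proportions close to 0 or 1.
    """
    p = n_correct / n_total
    denom = 1 + z ** 2 / n_total
    center = (p + z ** 2 / (2 * n_total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)
    ) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts reported in this study: 112 concordant of 115 evaluable cases.
    low, high = wilson_interval(112, 115)
    print(f"accuracy = {accuracy(112, 115):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For 112 of 115 cases, this interval runs from roughly 0.93 to 0.99, close to the interval reported for the overall agreement; an exact (Clopper-Pearson) interval would differ slightly.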
A non-parametric analysis (rank product) was performed to identify genes that were differentially expressed between the 115 TNBC samples (42 primary tumors, 42 lymph node metastases, and 31 distant organ metastases) and 188 non-TNBC samples. Gene expression data for the 188 non-TNBC samples were retrieved from a comprehensive cohort of FFPE tumor samples that was used to assess the overall performance of the 90-gene expression signature on 21 major tumor types (14). All non-TNBC samples were collected from Fudan University Shanghai Cancer Center. The clinical characteristics of the non-TNBC samples are summarized in Table S1. Of note, the 188 non-TNBC samples did not overlap with any of the 115 TNBC samples validated in this study. Genes with an estimated percentage of false predictions (PFP) below 0.001 were selected as candidate markers for TNBC. Discriminative power of the selected genes was assessed by hierarchical clustering and visualized using a two-dimensional heat map to examine separation between TNBC and non-TNBC tumors.
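
The core of a two-class rank product statistic can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' R/Bioconductor analysis: the function names and toy data are hypothetical, expression values are assumed to be on a log scale with higher values meaning higher expression, and the permutation step needed to estimate the PFP is omitted.

```python
import math


def rank_desc(values):
    """Return 1-based ranks; rank 1 goes to the largest value."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product_up(cases, controls):
    """Rank product per gene for upregulation in cases vs. controls.

    cases, controls: lists of samples; each sample is a list of
    log-scale expression values in the same gene order.
    A small rank product means the gene is consistently near the top
    of the fold-change ranking across all case/control pairs.
    """
    n_genes = len(cases[0])
    log_rank_sum = [0.0] * n_genes
    n_pairs = 0
    for case in cases:
        for ctrl in controls:
            # Log fold change for every gene in this case/control pair.
            diffs = [c - k for c, k in zip(case, ctrl)]
            for gene, rank in enumerate(rank_desc(diffs)):
                log_rank_sum[gene] += math.log(rank)
            n_pairs += 1
    # Geometric mean of the ranks across all pairs.
    return [math.exp(s / n_pairs) for s in log_rank_sum]


if __name__ == "__main__":
    # Toy data: 3 genes, 2 case samples, 2 control samples.
    cases = [[9.0, 5.0, 2.0], [8.5, 5.2, 2.1]]
    controls = [[4.0, 5.1, 6.0], [4.5, 4.9, 6.2]]
    print(rank_product_up(cases, controls))
```

Downregulated genes can be scored the same way after swapping the two groups. In a full analysis, the observed rank products would be compared with those obtained from permuted data to estimate the PFP for each gene.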

RESULTS
Patients and Samples
All TNBC patients were female. Cases were divided into three groups based on the biopsy site: 42 paired primary breast tumors (PT) and 42 lymph node metastases (LNM), as well as 31 distant organ metastases (DOM). Table 1 shows detailed clinicopathological characteristics of the validation samples. The median age of patients at diagnosis was 51 years, ranging from 27 to 84 years. There were six early-onset TNBC cases (≤35 years old) and 67 late-onset TNBC cases. Of the 42 PT cases, 19 were in the left breast and 23 were in the right breast. The most common histological subtype was invasive ductal carcinoma, while only one case was ductal carcinoma in situ. For DOM cases, lung, liver, and brain were the most common metastatic sites. We further investigated the family history of the 73 patients. Five patients had a family history of breast or ovarian cancer, and 14 patients had a family history of other neoplastic disorders, such as colorectal, gastric, or liver cancer. Detailed clinical information for each patient is provided in Table S2.

Performance of the 90-Gene Expression Signature in TNBC Primary Tumors, Lymph Node Metastases, and Distant Organ Metastases
Tissue sections from 115 samples were processed for isolation of total RNA, and concentrations ranged from 1.82 to 371.81 ng/µL, with a median of 63.23 ng/µL. The A260/A280 ratio ranged from 1.71 to 2.09.
The 90-gene expression signature exhibited 97.4% agreement with the reference diagnosis (112/115, 95% confidence interval: 0.92-0.99). The concordance rate was 97.6% (41/42), 97.6% (41/42), and 96.8% (30/31) for PT, LNM, and DOM cases, respectively (Table 2). Distribution of the similarity scores for the three groups is shown in Figure 1A. For cases that were concordant with the reference diagnosis, the similarity score of PT cases ranged from 72.9 to 99.1, with a median similarity score of 96.1, whereas the similarity score of LNM cases ranged from 33.5 to 98.6, with a median similarity score of 86.6. The similarity scores of the cases that were discordant with the reference diagnosis were 31.4 and 65.7 for the paired PT and LNM cases, respectively. Differences in the similarity score of paired samples are shown in Figure 1B. Next, we evaluated the biopsy site and prediction results for DOM cases. The 90-gene expression signature identified the correct tissue of origin for 30 of 31 samples. The median similarity score was 77.05, ranging from 16.8 to 96.5, for cases that were concordant with the reference diagnosis.
A discrepancy analysis was then performed to determine the characteristics of the three cases that were discordant with the reference diagnosis (Table 3). In one patient, the paired PT and LNM samples were predicted to be brain tumor and germ cell tumor, respectively. The third discordant case was from a patient whose tumor was histopathologically diagnosed as TNBC that had metastasized to the lung but was identified as a germ cell tumor by the gene expression signature.

Performance of the 90-Gene Expression Signature in Non-TNBC Metastatic Tumors
A total of 12 non-TNBC metastatic cases were included in the study. The clinical data of these 12 patients are summarized in Table S3. The cohort included seven males and five females with a median age of 58.5 years, ranging from 34 to 76 years. The metastatic sites of non-TNBC tumors included lymph node (four cases), brain (four cases), liver (three cases), and colorectum (one case). Thus, the distribution of metastatic sites of non-TNBC tumors was very similar to that of TNBC tumors. For the 12 cases, predictions of 90-gene expression

Comment. This patient has triple-negative breast cancer that was confirmed by imaging examination of the breast. The imaging test and IHC stains were primarily non-diagnostic, and the 90-gene expression profiling of her initial biopsy predicted breast carcinoma, highlighting the organ that needed to be inspected.

Identification of Novel TNBC Biomarkers
Furthermore, rank product analysis was performed to select a small subset of genes from the 90-gene panel with diagnostic utility for TNBC. The top 17 upregulated genes and 15 downregulated genes with PFP below 0.001 were identified as candidate genes to distinguish TNBC tumors from other types of tumors. These genes are described in more detail in Table 4. Hierarchical clustering based on the 32 differentially expressed genes showed clear separation between 115 TNBC and 188 non-TNBC tumors, indicating excellent discriminative power of the selected candidate markers (Figure 2).

DISCUSSION
In this study, we evaluated the potential usefulness of a 90-gene expression signature to identify TNBC using FFPE samples. Among all samples, 97.6% of PT (41/42) and LNM (41/42) and 96.8% of DOM (30/31) were correctly classified. Among the 19 patients with a family history of breast cancer or other neoplastic disorders, one with a family history of colorectal cancer was misclassified. Furthermore, the 90-gene expression signature achieved 100% accuracy in the six early-onset patients and 97% accuracy in the 67 late-onset patients. According to these results, there was no significant correlation between age or family history and signature performance. The overall accuracy of 97.4% reported here indicates excellent performance of the 90-gene expression signature in the identification of tumor origin in a heterogeneous group of TNBC tumors.
In 2006, Bryan et al. explicitly defined TNBC for the first time based on negative expression of ER, PR and HER-2 (19). Although TNBC has been extensively studied in clinical and pre-clinical settings, there are only a few reports on suitable differential diagnostics for this subtype of breast cancer. The standard IHC markers employed in most pathology laboratories are useful for identifying breast cancer; however, clinical identification of TNBC, in particular for metastatic tumors without a prior history of breast cancer, is difficult, and more reliable diagnostic tools are urgently needed. Notably, in most cases, the treatment of choice is dictated by the differential diagnosis. For example, in the case of a small, solitary lung tumor in the absence of swollen lymph nodes, the patient can be treated with either chemotherapy or partial resection of the lung when the lung tumor is diagnosed as metastatic breast cancer. However, when the same tumor is diagnosed as primary non-small cell lung cancer, standard lobectomy may be the preferred treatment (9). Recently, the IMpassion130 trial showed that for patients with PD-L1-positive tumors, the combination of atezolizumab and nab-paclitaxel led to a significant improvement in overall survival of 25.0 vs. 15.5 months with nab-paclitaxel and placebo. Therefore, precise diagnosis of TNBC is more important than ever for clinical decision making in the era of immunotherapy (5). The results reported here demonstrate that the 90-gene expression signature reliably identifies TNBC in both primary and metastatic settings. To the best of our knowledge, this is the first report of a novel molecular assay that can be used to differentially diagnose TNBC.
Two specific patient examples illustrate the pathologic and clinical significance of our 90-gene expression signature. In the first case, a Villin+ lesion would be excluded histopathologically from a diagnosis of breast cancer, and a pathologist relying entirely on this marker would be led to an incorrect diagnosis when imaging analysis showed no positive result. The molecular profiling assay provides an effective approach to avoid current limitations of IHC, including low specificity, false positives, and the lack of an accurate molecular biomarker for tumor origin. Clinically, in patients with highly suspected breast cancer, use of a molecular profiling assay may be able to identify tumor attributes more quickly when imaging and IHC examination are uninformative. In the second case, the molecular profiling assay not only provided evidence to guide the selection of imaging examination methods but also indicated which part of the body needed to be examined.
The three cases with discordant 90-gene expression signature results raise an intriguing question: how can TNBC be recognized as a tumor of brain or germ cell origin? Embryologically, both breast and brain develop from the ectoderm, the outer germ layer of the embryo; therefore, it is understandable that tumors from these two organs may show high similarity in their gene expression profiles. In addition, it has been reported that basal-like breast cancers can exhibit high mRNA expression correlations with serous ovarian cancers and lung squamous carcinomas, suggesting that tumors from different organs may share the same driving events for carcinogenesis (20). Further work should focus on how to overcome this obstacle. Notably, 32 of the 90 genes in the panel were significantly differentially expressed between TNBC and non-TNBC tumors. Among these genes, the 17 genes upregulated in TNBC are particularly interesting. Several of these genes have previously been associated with TNBC. In a recent study by Lehmann et al. (21), KRT14 and KRT19 were found to be differentially expressed between basal-like and luminal-like TNBC. Expression of SFRP1 was found to be significantly higher in TNBC than in other breast cancer subtypes. Additionally, SFRP1 expression is significantly correlated with an increased probability of a positive response to neoadjuvant chemotherapy (22). Moreover, expression of KRT15 was found to be 3.8- and 3.5-fold higher in mammary stem cells than in myoepithelial cells and luminal cells of TNBC tumors, respectively (23). Regarding the remaining upregulated genes, such as AZGP1, PIGR, SPINK1, RPS11, TACSTD2, and EPCAM, we are, to our knowledge, the first to report their overexpression in TNBC. Future studies on the proteins encoded by these genes may provide useful insights into potential novel markers for differential diagnosis of TNBC.
In conclusion, this 90-gene expression signature shows high accuracy in identifying primary and metastatic TNBC tumors, suggesting its potential as a complementary tool to support the diagnosis of TNBC. In clinical practice, common sites of breast cancer metastasis are bone, lung, and liver. When there is a solitary mass in one of these organs, it is extremely important to determine whether it is a metastasis from breast cancer or a second primary. In cases where the morphology and immunohistochemistry work-up cannot confirm the primary (most often in the TNBC setting), the 90-gene expression signature could provide valuable information for differential diagnosis. Furthermore, this molecular assay may also be useful for distinguishing primary TNBC tumors from other poorly differentiated tumors of rare tissue origin that metastasize to the breast, especially in the absence of an in situ component in needle biopsy samples.