Distinct Microbial Signatures Associated With Different Breast Cancer Types

A dysbiotic microbiome can potentially contribute to the pathogenesis of many different diseases including cancer. Breast cancer is the second leading cause of cancer death in women. Thus, we investigated the diversity of the microbiome in the four major types of breast cancer: endocrine receptor (ER) positive, triple positive, Her2 positive and triple negative breast cancers. Using a whole genome and transcriptome amplification and a pan-pathogen microarray (PathoChip) strategy, we detected unique and common viral, bacterial, fungal and parasitic signatures for each of the breast cancer types. These were validated by PCR and Sanger sequencing. Hierarchical cluster analysis of the breast cancer samples, based on their detected microbial signatures, showed distinct patterns for the triple negative and triple positive samples, while the ER positive and Her2 positive samples shared similar microbial signatures. These signatures, unique or common to the different breast cancer types, provide a new line of investigation to gain further insights into prognosis, treatment strategies and clinical outcome, as well as better understanding of the role of the micro-organisms in the development and progression of breast cancer.


INTRODUCTION
Breast cancer, the second leading cause of cancer death in women, is responsible for the death of 1 in 52 women below 50 years of age (American Cancer Society, 2017). The American Cancer Society estimated that there would be 255,180 new breast cancer cases (2,470 in men and 252,710 in women) in the US in 2017 (American Cancer Society, 2017). Based on the immunohistochemical classification of hormone receptor status in the cancerous breast cells, there are 4 major groups of breast cancers: endocrine receptor (estrogen or progesterone receptor) positive (abbreviated in the study as BRER), human epidermal growth factor receptor 2 (HER2) positive (abbreviated in the study as BRHR), triple positive (estrogen, progesterone and HER2 receptor positive) (abbreviated in the study as BRTP) and triple negative (absence of estrogen, progesterone and HER2 receptors) (abbreviated in the study as BRTN) (Schnitt, 2010; American Cancer Society, 2017). These four types have specific prognoses and responses to therapy. Specifically, the hormone receptor positive breast cancers (BRER, BRTP) respond to endocrine therapy and show a better prognosis, while the hormone receptor negative types (BRHR, BRTN) are more aggressive, non-responsive to endocrine therapy and have a poor prognosis (Schnitt, 2010). BRTN cancer is seen in 15-20% of breast cancer patients, is the most aggressive of all the breast cancers, is unresponsive to treatment, is highly angiogenic and proliferative, and has the lowest survival rate (Siegel et al., 2016).
Among the risk factors for developing cancer in general, infectious agents are known to be the third highest after tobacco usage and obesity, contributing to 15-20% of cancer incidence (Morales-Sanchez and Fuentes-Panana, 2014). Age and genetic predisposition are also known cancer risk factors; however, the majority of cancers have unknown etiology (Madigan et al., 1995). Recent studies of microbiome dysbiosis in human health suggest specific changes in the microbiome in a number of disease states (Turnbaugh et al., 2006; Xuan et al., 2014; Chen and Wei, 2015), including cancer (Sheflin et al., 2014; Xuan et al., 2014). Further, studies have suggested the association of particular micro-organisms with specific cancers (Banerjee et al., 2015, 2017a; Chen and Wei, 2015). Thus, a distinct microbiome may contribute to the cause or development of cancer. Conversely, the tumor micro-environment may provide a specialized niche in which these viruses and microorganisms may persist. In either case, cancer-type specific microbial signatures may provide clues for early diagnosis, prognosis and the design of treatment strategies.
We have recently identified a distinct microbial signature associated with triple negative breast cancer (Banerjee et al., 2015). In the present study we asked whether the microbial signatures associated with BRTN are shared by other breast cancer types, or do different breast cancer types have unique signatures. To study this we screened BRTN, BRTP, BRER, and BRHR samples using PathoChip, a pan-pathogen array containing oligonucleotide probes for the detection of all known, sequenced viruses, as well as known human bacterial, parasitic, and fungal pathogens. Additionally, PathoChip contains viral family specific conserved probes that allow for detection of uncharacterized members of the viral families (Baldwin et al., 2014). The PathoChip screen includes a whole genome and transcriptome amplification step that allowed detection of very low copy number of both DNA and RNA viruses and micro-organisms from the cancer tissues (Baldwin et al., 2014;Banerjee et al., 2015). Our analyses now show distinct microbial signatures for BRTN and BRTP samples, while the BRER and BRHR samples had similar microbial signatures.

Study Samples
The study was approved by the institutional review board at the University of Pennsylvania (Protocol number 819358). All methods were performed in accordance with the relevant guidelines and regulations and reviewed by resident pathologists at the UPENN hospital. In the present study, 50 endocrine receptor (estrogen or progesterone receptor) positive (abbreviated as BRER in the study), 34 human epidermal growth factor receptor 2 (HER2) positive (abbreviated as BRHR in the study), 24 triple positive (estrogen, progesterone and HER2 receptor positive) (abbreviated as BRTP in the study) and 40 triple negative (absence of estrogen, progesterone and HER2 receptors, abbreviated as BRTN in the study) breast cancer tissues were included along with 20 breast control samples from healthy individuals. Due to HIPAA regulations, we could not obtain any information regarding the type of treatment these breast cancer patients received, or whether they were new patients. These tissues were obtained as de-identified archived samples. Tumors needing macro-dissection were received in the form of 10 µm sections on glass slides with marked guiding H&E slides, while tumors that did not require macro-dissection were received as 10 µm paraffin rolls. The 20 non-matched control tissues were derived from breast reduction surgeries and obtained as 10 µm paraffin rolls. Our resident pathologist reviewed the case history, confirmed the tumor types and demarcated the cancer cells. All the samples were de-identified FFPE (formalin fixed paraffin embedded) samples of breast tumors or controls, and were received from the Abramson Cancer Center Tumor Tissue and Biosample Core. Extreme care was taken to avoid contamination during cutting of the FFPE sections (Banerjee et al., 2015). For each sample, the microtome and other equipment were cleaned with 70% ethanol. Further, a new blade was used to prepare and cut each sample, and the area was also decontaminated before cutting each sample (Banerjee et al., 2015).

Pathochip Design, Sample Preparation, and Microarray Processing
The PathoChip Array design has been previously described in detail (Baldwin et al., 2014). The PathoChip probes were generated in silico using the genome sequences of all known viruses, as well as known human bacterial, parasitic and fungal pathogens. The PathoChip comprises 60,000 probe sets manufactured as SurePrint glass slide microarrays (Agilent Technologies Inc.), containing 8 replicate arrays per slide. Each probe is a 60-nt DNA oligomer, and the probes target multiple genomic regions of the micro-organisms: for example, the 18S rRNA gene, 5.8S rRNA gene, 28S rRNA gene, ITS1 and ITS2 for parasite detection; the 16S rRNA gene for bacterial detection; the 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2 and 26S rRNA gene for fungal detection; and conserved and specific viral genes to detect viral families and specific viruses. PathoChip screening was done using both DNA and RNA extracted from formalin-fixed paraffin-embedded (FFPE) tumor tissues as described previously (Baldwin et al., 2014; Banerjee et al., 2015). The quality of the extracted nucleic acids was determined by agarose gel electrophoresis and the A260/280 ratio. The extracted RNA and DNA were subjected to whole genome and transcriptome co-amplification (WTA) as previously described (Banerjee et al., 2015). A non-template control (RNase/DNase free water) was used during the WTA step to determine whether any contamination was present during amplification. The quality of the WTA products was determined by agarose gel electrophoresis. Human reference RNA and DNA were also extracted from the human B cell line BJAB and were used for WTA as previously described (Banerjee et al., 2015). The WTA products were purified (PCR purification kit, Qiagen, Germantown, MD, USA); the WTA products from the cancers were labeled with Cy3 and those from the human reference DNA were labeled with Cy5 (SureTag labeling kit, Agilent Technologies, Santa Clara, CA). The labeled DNAs were purified and hybridized to the PathoChip as described previously (Banerjee et al., 2015). Post-hybridization, the slides were washed, scanned and visualized using an Agilent SureScan G4900DA array scanner (Banerjee et al., 2015).
Potential contamination of FFPE blocks, or contamination during processing, is always a concern. In these experiments all samples were handled and processed in the pathology laboratory using standard aseptic conditions. Likewise, the preparation of the DNA and RNA from the samples was done in a dedicated laboratory under established conditions designed to minimize laboratory contamination.

Microarray Data Extraction and Statistical Analysis
Agilent Feature Extraction software (Baldwin et al., 2014; Banerjee et al., 2015) was used to extract the raw data from the microarray images. We used R for normalization and data analyses (R Core Team, 2015). We calculated a scale factor from the signals of the green (Cy3) and red (Cy5) channels for the human probes, as the ratio of the sum of the green signals to the sum of the red signals. The scale factors were then used to obtain normalized signals for all other probes. For all probes except the human probes, the normalized signal is the log2 transform of the green signal divided by the scale-factor-modified red signal (log2 g - scale factor * log2 r). On the normalized signals, a one-sided t-test was applied to select probes significantly present in the cancer samples by comparing cancer samples vs. controls. The significance cutoff was a log2 fold change of signal ≥1, an adjusted p-value ≤0.01 (all p-values were adjusted via the Benjamini-Hochberg procedure to control the FDR), a control prevalence ≤25%, and a case prevalence ≥40%. Prevalence was calculated as the percentage of cancer samples and of control samples in which a microbial signature was detected. For a microbial signature with multiple probes, prevalence was calculated from the maximum number of samples that contained at least one of the probes of that signature.
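The normalization and probe-selection steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study. The function names are hypothetical, the normalized signal is assumed to be log2(g / (scale factor × r)), which is one reading of "green signal over scale-factor-modified red signal", and the t-test itself is omitted (its p-values are taken as input).

```python
import math

def scale_factor(human_green, human_red):
    """Sum of green (Cy3) over sum of red (Cy5) signals across the human probes."""
    return sum(human_green) / sum(human_red)

def normalize(green, red, factor):
    """Assumed form of the normalized signal: log2(g / (factor * r))."""
    return math.log2(green / (factor * red))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passes_cutoff(log2_fc, adj_p, case_prev, control_prev):
    """Cutoffs quoted in the text: log2 FC >= 1, adjusted p <= 0.01,
    case prevalence >= 40%, control prevalence <= 25%."""
    return (log2_fc >= 1 and adj_p <= 0.01
            and case_prev >= 40 and control_prev <= 25)
```

For example, with hypothetical human-probe signals summing to 500 (green) and 250 (red), the scale factor is 2, and a microbial probe with green signal 800 and red signal 100 has a normalized signal of log2(800 / 200) = 2.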
The cancer samples were also subjected to hierarchical clustering, based on the detection of microbial signatures in the samples. We used a hierarchical clustering technique (Euclidean distance, complete linkage, normalized hybridization signals not scaled) to cluster the samples, which were represented as heatmaps (Kolde, 2015). The clusters were then further validated by the CH-index (Calinski and Harabasz index), which is implemented in the R package NbClust (Charrad et al., 2014). The CH-index is a cluster index that is maximized when inter-cluster distances are large and intra-cluster distances are small. We selected the cluster solution that maximized the index value to achieve the best clustering of the data. Statistical significance between different groups was determined using the two-sided t-test.
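To make the clustering and validation step concrete, the sketch below implements complete-linkage agglomerative clustering with Euclidean distance and then picks the number of clusters that maximizes the Calinski-Harabasz index. It is a plain-Python stand-in for illustration only. The study used R packages (heatmap clustering and NbClust), and the function names and toy data here are assumptions.

```python
import math

def complete_linkage(points, n_clusters):
    """Agglomerative clustering; cluster distance = largest pairwise distance."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def calinski_harabasz(points, labels):
    """CH = [between-cluster SS / (k - 1)] / [within-cluster SS / (n - k)]."""
    n, dims = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    overall = [sum(p[d] for p in points) / n for d in range(dims)]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((v - c) ** 2 for v, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))

def best_k(points, candidates):
    """Number of clusters among `candidates` with the highest CH index."""
    return max(candidates,
               key=lambda k: calinski_harabasz(points, complete_linkage(points, k)))
```

On a toy data set of three well-separated pairs of points, `best_k` selects three clusters, since splitting or merging the pairs lowers the ratio of between-cluster to within-cluster dispersion.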
Based on the clinical outcomes of the different breast cancer patients, the cases for each breast cancer type were divided into two groups: alive and deceased (with severe outcomes) (Supplementary Table S4). We calculated the proportion of the two groups in each hierarchical cluster/sub-cluster of the 4 breast cancer types. A one-sided t-test was also used to compare the average hybridization signals of organisms between these two groups. Nominal p-values along with log fold changes were calculated. Microbial signatures that were detected with a significantly (nominal p-value < 0.05) higher average hybridization signal in the deceased cases or in the patients that survived were selected for representation as box plots. Differences in the detection of some signatures that were not statistically significant between the outcomes, but showed a trend, were plotted as well. Where the p-value was > 0.05, we can only suggest that the higher detection of those microbial signatures with either outcome is a trend.
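The per-cluster outcome proportions described above amount to a simple tally. The sketch below shows one way to compute them in plain Python. The function name, the outcome labels, and the example records are hypothetical and are not data from the study.

```python
from collections import Counter

def deceased_percent_by_cluster(assignments):
    """assignments: iterable of (cluster_label, outcome) pairs,
    where outcome is 'alive' or 'deceased'.
    Returns the percentage of deceased cases in each cluster."""
    totals = Counter()
    deceased = Counter()
    for cluster, outcome in assignments:
        totals[cluster] += 1
        if outcome == "deceased":
            deceased[cluster] += 1
    return {c: 100.0 * deceased[c] / totals[c] for c in totals}

# Hypothetical records, for illustration only.
records = [("2b", "deceased"), ("2b", "alive"), ("2a", "alive"), ("2a", "alive")]
percentages = deceased_percent_by_cluster(records)
```

With small sub-clusters, such percentages rest on very few cases, which is one reason the outcome analysis is presented as a trend rather than a statistically verified result.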

PCR Validation of Pathochip Results
PCR primers from the conserved and/or specific regions of the micro-organisms detected by the PathoChip screen were used. The PCR amplification reaction mixtures for each reaction contained 200-400 ng of WTA product and 20 pM each of forward and reverse primers (Table 7), 300 µM of dNTPs and 2.5 U of LongAmp Taq DNA polymerase (NEB). DNA was denatured at 94 °C for 3 min, followed by 30 cycles of 94 °C for 30 s, a primer-set-specific annealing temperature for 30-45 s, and 65 °C for 30 s. The PCR conditions for each of the primer sets are given in Table 7.

Microbial Signatures Associated With Different Breast Cancer Types
Unique and common microbial signatures associated with the different breast cancer types are listed in Table 1 and represented in Figures 1A, 2B, 3C,F. To establish the microbial signatures in the cancers, we compared the average hybridization signal for each probe in the cancer samples vs. the controls. Probes that detected significantly higher hybridization signals in the cancer samples (p-value < 0.05, log2 fold change in hybridization signal > 1), and that were present in at least 40% of the cancer samples and ≤25% of the controls, were considered in the present study. A more stringent cut-off, requiring detection of microbial signatures only in the cancers and not (0% prevalence) in the controls, mostly led to the detection of fewer probes for a [...]. We further averaged the hybridization signals of all the significant probes for each microbial genus and viral family, shown in Figures 1-3. Supplementary Table S1 shows the average hybridization signals of the probes of microorganisms significantly detected in the cancers vs. the controls, with the respective p-values adjusted for multiple comparisons. Supplementary Table S2 shows the proportion of probes that were detected significantly in each of the breast cancer types vs. the controls. Supplementary Figure S1 shows the average fold change in hybridization signal intensity for the significantly detected probes of each of the signatures detected in the different breast cancer types over their respective signals in the control breast samples. Additionally, we calculated the percent prevalence of the significant microbial signatures in the cancer samples, which indicates how prevalent a significant virus or microorganism signature is in the cancer samples regardless of the hybridization intensity.
Figure 1 caption (partial): For example, hybridization signals for Polyomaviridae probes were 4, 6, and 3% of the total hybridization signals detected in BRER, BRTP, and BRHR, respectively. (D) Prevalence of viral signatures in the 4 breast cancer types. Since the hybridization signals for Polyomaviridae, Hepadnaviridae and Parapoxviridae were lower than the cut-off (log2 fold change in hybridization signal >1) in one or more breast cancer types, they are depicted as negative in this panel. (E) Heat map of the hybridization signals for those viral signatures, which were still significantly higher in the cancers than in the controls.
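The two summary measures used in this section, percent prevalence and percent of total hybridization signal, can be illustrated with a short plain-Python sketch. Prevalence for a multi-probe signature counts every sample in which at least one of its probes was detected, as stated in the statistical methods. The function names, probe and sample identifiers, and numbers below are hypothetical.

```python
def signature_prevalence(probe_hits, n_samples):
    """probe_hits maps each probe of a signature to the set of samples in
    which that probe was detected. A sample counts as positive if any probe
    of the signature was detected in it."""
    positive = set().union(*probe_hits.values())
    return 100.0 * len(positive) / n_samples

def percent_of_total_signal(mean_signal_by_family):
    """Share of each family's averaged hybridization signal in the total."""
    total = sum(mean_signal_by_family.values())
    return {fam: 100.0 * s / total for fam, s in mean_signal_by_family.items()}

# Hypothetical example: two probes of one signature across 10 samples.
hits = {"probe_1": {"s1", "s2"}, "probe_2": {"s2", "s3"}}
prevalence = signature_prevalence(hits, 10)
shares = percent_of_total_signal({"family_A": 1.0, "family_B": 3.0})
```

The two measures are independent: a signature can be highly prevalent across samples while contributing only a small share of the total signal, and vice versa, which is why both are reported.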

Viral Signatures Associated With Different Breast Cancer Types
Significant hybridization (described above), at levels above the controls, was detected for 28 viral families among the four breast cancer types (Figures 1A,D). Of these, 17 viral families were detected with significantly higher hybridization signals in greater than 50% of the samples representing all 4 breast cancer types, as compared to the controls (Figures 1B,D). They include signatures of Adenoviridae, Anelloviridae, Arenaviridae, Bunyaviridae, Coronaviridae, Filoviridae, Flaviviridae, Herpesviridae, Iridoviridae, Papillomaviridae, Paramyxoviridae, Parvoviridae, Picornaviridae, Poxviridae, Reoviridae, Retroviridae, and Rhabdoviridae (Figure 1B). Importantly, in examining the percent hybridization signal (Figure 1C) and percent prevalence (Figure 1D), we find that a number of viral families were significantly detected only in a subset of breast cancer types. Specifically, the signatures for Birnaviridae and Hepeviridae were only detected in BRTP, and Nodaviridae only in BRHR (Figures 1C,D). Further examination of the percent prevalence (Figure 1D) [...]. Hybridization signal intensity offers an additional way to compare the data. Here we noted marked differences for specific viral families between the different breast cancer types. For example, probes for Polyomaviridae were detected with the highest hybridization signal in the BRHRs, followed by BRERs and BRTPs (Figure 1E). Polyomaviridae were also detected in the BRTNs compared to the controls, but at a lower hybridization signal (log2 fold change in hybridization signal = 0.4-1; Figure 1E), which is below the cut-off to consider the signal positive; thus Polyomaviridae are not shown to be present in the BRTNs in Figure 1C or Figure 1D. Similarly, probes of Hepadnaviridae were significantly detected with a low hybridization signal in the BRTNs (Figure 1E), while they were detected with higher hybridization signal intensity (log2 fold change in hybridization signal >1) in the BRERs and BRTPs (Figure 1E).
Signatures of Herpesviridae, Adenoviridae and Poxviridae were detected in >90% of the BRER samples screened (Figure 1D), while the highest hybridization signal was detected for Anelloviridae and Flaviviridae (Figure 1B). Signatures of Astroviridae, Herpesviridae, and Reoviridae were detected in all of the BRTP samples tested (Figure 1D), with the highest hybridization signal detected for Polyomaviridae signatures (Figure 1C). For BRHR samples, signatures of Reoviridae and Flaviviridae were detected in >90% of the samples screened (Figure 1D), with signatures of Togaviridae showing the highest hybridization signal (Figure 1C). Among the BRTN samples, we detected signatures of Reoviridae in 90% of the samples screened (Figure 1D), with signatures of Picornaviridae and Anelloviridae showing the highest hybridization signal (Figure 1B).
Probes of the Poxviridae family were detected significantly in >80% of all the breast cancer types analyzed. Interestingly, probes of Parapoxviridae were detected significantly with high hybridization signal intensity in BRER cancers vs. the controls (Figure 1E). Probes of Parapoxviridae were also detected significantly in the other 3 types of breast cancers compared to the controls, but with much lower hybridization signal intensity (log2 fold change in hybridization signal ∼0.5) (Figure 1E).
The data show that the cancer samples as a whole have a robust viral signature. However, there are significant and defining differences between the four types with BRTN having the least complex viral signature.
Signatures of the viral families Arteriviridae, Hepadnaviridae, Hepeviridae, and Nodaviridae, which were detected in one or more of the cancer types, were not detected in the healthy control breast tissues (Figure 1D).

Bacterial Signatures Associated With Different Breast Cancer Types
Figures 2A-E show the analysis of bacterial signatures in the 4 breast cancer types. Significant hybridization, above the levels of the controls, was detected for 56 bacterial genera; the majority (50-60%) were Proteobacteria, the major group of Gram-negative bacteria. These partitioned into bacterial signatures unique to each cancer type, as well as signatures that were common to multiple breast cancer types (Figures 2B,C).
The marked diversity in bacterial signatures between the breast cancer types is shown in Figure 2B. We identified distinct bacterial signatures uniquely associated with each type of breast cancer analyzed. In this regard, BRTN had the least complex bacterial signature, while BRER had the most complex (Figures 2D,E). Signals for Arcanobacterium, Bifidobacterium, Cardiobacterium, Citrobacter, and Escherichia were significantly detected in the BRER samples compared to the controls, while those of Bordetella, Campylobacter, Chlamydia, Chlamydophila, Legionella, and Pasteurella were significantly associated with the BRTPs. Signals for Streptococcus were detected significantly in the BRHRs, whereas Aerococcus, Arcobacter, Geobacillus, Orientia, and Rothia were found associated with the BRTNs.
Hybridization signal intensity again provides an additional view of the complexity of the bacterial community and its diversity among the different breast cancers (Figures 2C,D). Signals for Brevundimonas were detected with higher average hybridization signals in the endocrine receptor positive BRER and BRTP compared to the endocrine receptor negative BRHR and BRTN (Figures 2C,D). Hybridization signals of Mobiluncus and Mycobacterium were predominantly detected in the endocrine receptor negative samples.
Bacterial signatures of Actinomyces were detected in all 4 cancer types; however, their hybridization signal intensity was markedly lower in the BRTN samples (Figure 2C). Similarly, Bartonella was significantly detected in all cancer types, but its hybridization signal intensity was markedly lower in the BRER samples compared to the others (Figure 2C). The bacterial probes detected with the highest hybridization signals were those for Acinetobacter in BRER and BRHR samples, Brevundimonas in BRTP samples and Caulobacter in BRTN samples (Figure 2D). As in the case of the viruses, our data show that the cancer samples have a robust bacterial signature with significant and defining differences between the four breast cancer types. The healthy control samples lacked some of the bacterial signatures that were detected in one or more of the cancer types, namely Actinomyces, Aerococcus, Arcanobacterium, Bifidobacterium, Bordetella, Cardiobacterium, Corynebacterium, Eikenella, Fusobacterium, Geobacillus, Helicobacter, Kingella, Orientia, Pasteurella, Peptoniphilus, Prevotella, Rothia, Salmonella, and Treponema (Figure 2E).

Fungal Signatures Associated With Different Breast Cancer Types
Significant hybridization, above the levels of the controls, was detected for 21 different genera of fungi among the 4 types of breast cancer (Figures 3A,B). Interestingly, none of these genera were detected in all four cancer types (Figures 3B,C). In fact, the fungal signatures for each type of breast cancer were relatively unique; only 7 fungal genera (Aspergillus, Candida, Coccidioides, Cunninghamella, Geotrichum, Pleistophora, and Rhodotorula) were detected in more than one type of breast cancer. The receptor positive cancer samples (BRER and BRTP) showed much more complex fungal diversity than the BRTN samples (Figures 3A,B). Table 1 and Figure 3C show the unique fungal signatures associated with the different breast cancer types. Fungal signatures of Filobasidiella, Mucor, and Trichophyton were found to be significantly associated with BRER samples; Penicillium with BRTP samples; Epidermophyton, Fonsecaea, and Pseudallescheria with BRHR samples; and Alternaria, Malassezia, Piedraia, and Rhizomucor with BRTN samples. Fungal signatures of Ajellomyces, Alternaria, Cunninghamella, Epidermophyton, Filobasidiella, Rhizomucor, and Trichophyton, detected in one or more breast cancer types, were not detected in the healthy controls (Figure 3B).

Parasitic Signatures Associated With Different Breast Cancer Types
Significant hybridization, above the levels of the controls, was detected for 29 different genera of parasites among the 4 types of breast cancer (Figures 3D,E). As in the case of the fungi, no single genus of parasite was significantly detected in all four breast cancer types (Figures 3E,F). Each cancer type showed a relatively distinct parasitic signature pattern, with BRHR showing the least diverse signatures. Table 1 and Figure 3F show the unique and common parasitic signatures among the different breast cancer types.
Analysis of hybridization signal intensity in Figure 3D shows that Plasmodium was detected with the highest hybridization signal in the BRHR samples and was also detected in the BRER and BRTP samples, but not in the BRTN samples. In BRTN, the highest hybridization signal intensity was detected for the probes of Mansonella followed by Centrocestus, whereas Strongyloides was detected in almost all of the BRTN samples. Naegleria was detected with the highest hybridization signal intensity in BRTP (Figure 3D), while Sarcocystis and Babesia were detected in 92% of BRTP samples (Figure 3E). Among the BRER samples, Brugia showed the highest hybridization signal intensity (Figure 3D), while Thelazia showed the highest prevalence (Figure 3E). Signatures of Brugia and Paragonimus were only detected in BRER samples (Table 1, Figures 3D,E). Ancylostoma, Angiostrongylus, Echinococcus, Sarcocystis, Trichomonas, and Trichostrongylus were found uniquely associated with BRTP samples (Table 1, Figures 3D-F). Balamuthia signatures were associated significantly with BRHR samples, and those of Centrocestus, Contracaecum, Leishmania, Necator, Onchocerca, Toxocara, Trichinella, and Trichuris were detected significantly only in BRTN samples (Table 1, Figures 3D-F). Signatures of Ancylostoma, Ascaris, Centrocestus, Contracaecum, Hartmannella, Leishmania, Paragonimus, Thelazia, Toxocara, Trichinella, and Trichuris, detected in one or more cancer types, were not detected in the healthy controls (Figure 3E).

Hierarchical Clustering of the Breast Cancer Samples Based on the Detection of Microbial Signatures
Using hierarchical clustering analysis based on the detection of microbial signatures associated with the 4 breast cancer types, we determined whether the breast cancer types fell into any unique and identifiable clusters. This analysis identified distinct clusters within each of the breast cancer types based on their microbial signature patterns (Figures 4A-D). It also showed that BRTNs and BRTPs each had a distinct microbial signature pattern, whereas BRER and BRHR shared similar microbial signatures (Figure 4E).
Individually, the different breast cancer types fell into distinct microbial signature clusters. BRER samples fell into 2 distinct clusters, 1ER and 2ER, along with 2 ungrouped samples (ungrouped 1ER) (Figure 4A). Samples grouped in clusters 1ER and 2ER differed significantly, based on the higher detection of mostly bacterial and viral, and certain fungal and parasitic, signatures in the samples of cluster 2ER (Table 2). The ungrouped BRER samples (ungrouped 1ER) were significantly different from clusters 1ER and 2ER (Table 2).
The majority of the BRTP samples had similar microbial detections and grouped together into 1 major cluster (cluster 1TP), while a few samples remained ungrouped (Figure 4B).
The BRHR samples formed 2 major clusters (cluster 1HR and cluster 2HR) (Figure 4C), which differed from each other in the higher detection of certain bacterial and viral signatures in cluster 2HR compared to cluster 1HR (Table 3). Bacterial signatures of Kingella, Brevundimonas, Eikenella, Bartonella, Acinetobacter, Actinomyces, Aeromonas, Mobiluncus, Fusobacterium, Alcaligenes, Brucella, and Staphylococcus; viral signatures of Orthomyxoviridae, Parvoviridae, Papillomaviridae, Nodaviridae, and Astroviridae; and fungal signatures of Aspergillus showed significantly higher detection in cluster 2HR. The 3 BRHRs that could not be grouped (ungrouped 1HR and 2HR) showed higher detection of certain microbial signatures listed in Table 3 compared to the clustered BRHR samples; these included, in particular, the parasitic signature of Entamoeba and the bacterial signatures of Listeria and Corynebacterium.
The BRTN samples formed two distinct clusters (cluster 1TN and 2TN), with 2 samples that did not cluster into a distinct group (ungrouped 1TN) (Figure 4D). Cluster 1TN differed from cluster 2TN in having higher detection of bacterial probes of Caulobacter, Brevundimonas, Peptoniphilus, Rothia, Geobacillus, Aerococcus, Mobiluncus, Actinomyces, and Bartonella; fungal probes of Malassezia, Piedraia, Rhodotorula, and Rhizomucor; and parasitic signatures of Leishmania, Toxocara, Contracaecum, Centrocestus, Trichuris, and Strongyloides (Table 4). In contrast, samples in cluster 2TN had significantly higher hybridization signal intensity for viral signatures of Poxviridae, Paramyxoviridae, Reoviridae, Parvoviridae, and Arenaviridae; bacterial signatures of Sphingomonas, Brucella, Orientia, and Stenotrophomonas; fungal signatures of Pleistophora; and parasitic signatures of Trichinella. The ungrouped samples differed from the grouped samples in having significantly higher detection of certain viral probes of Anelloviridae, Retroviridae, Poxviridae, and Arenaviridae compared to cluster 1TN and cluster 2TN samples (Table 4).
Figure 4E shows the comparison of the microbial signatures from all four breast cancer types together in the clustering analysis. The data show that the different breast cancers grouped into 4 major clusters plus a few ungrouped BRER (2 samples), BRHR (3 samples), and BRTN (2 samples) samples (ungrouped 1, 2, and 3, respectively). Most of the BRTNs were very distinct in their microbial signature pattern, and they clustered together (cluster 3). Similarly, all the BRTPs screened clustered together to form a distinct cluster 4. Conversely, most of the BRER samples shared a similar microbial signature pattern with all of the BRHR samples, forming the distinct cluster 1, while the remaining 11 BRER samples formed cluster 2. The BRERs in cluster 2 differed from those in cluster 1 in having significant [...] (Table 5).
Thus, we identified specific microbial signature patterns associated with the different breast cancer types. It will be interesting to see whether these distinct microbial signature patterns correlate with differences in pathogenesis and clinical outcome.

Association of Microbial Signatures With Clinical Outcomes in the Four Breast Cancer Types
The samples we used in this study were de-identified. Thus, due to HIPAA regulations, we were able to procure only a limited subset of data from the Tumor Registry. This included outcome, specifically whether the patient was alive or dead since diagnosis and treatment; the cause of death and length of survival were not available. These data provide only indications of trends, which will have to be statistically verified in future studies using samples with associated clinical data. For these analyses, the hierarchical clusters for each of the four different breast cancer types were further grouped into sub-clusters based on microbial detections (Figure 5A). In the BRTNs, the cases of sub-cluster 2b had the highest proportion (63%) of patients who had died, followed by cluster 1 (33%), while sub-clusters 2a and 2c had a higher number of surviving patients (Figure 5B). The shared feature of cluster 1 and sub-cluster 2b is a higher detection of fungal and parasitic signatures (Figure 5A). BRTP samples did not fall into discrete sub-clusters (Figure 5A), but overall 82% of BRTP patients survived. For BRER samples, sub-clusters 1a, 1b, and 1c had similar proportions of patients who had died (25, 22, and 33%, respectively), while these proportions were much lower for sub-clusters 2a and 2b. Sub-clusters 2a and 2b are notable in that they have an overall more robust and diverse microbial signature. The sub-clusters for BRHR show a high proportion of surviving patients in all sub-clusters (1a, 1b, and 2; 75, 86, and 85%, respectively). Within the limits of the data, these analyses suggest that specific microbial signatures may correlate with outcome, especially in the case of BRTN.
Using the survival data, we also examined variation in average hybridization signal for microbial signatures between the breast cancer types (Figure 6 and Table 6). Interestingly, these analyses showed that high hybridization signals of specific viruses and microbes in a particular breast cancer type may trend with patients who had died, while others trended with surviving patients. For example, Herpesviridae signatures were detected at significantly higher levels in BRTP patients who had died. Similarly, BRTN patients who had died had significantly higher hybridization signals for certain fungal (Malassezia, Rhizomucor, Rhodotorula) and parasitic (Centrocestus, Strongyloides, Trichuris, Contracaecum, Leishmania) signatures. In the BRERs we found a trend of higher detection of signatures of the bacterium Peptoniphilus in the deceased cases. Similarly, we found a trend of higher detection of certain bacteria (Listeria, Lactobacillus, Borrelia) in the BRHR cases with severe outcome. Conversely, high hybridization signals for Paramyxoviridae, Astroviridae, and Polyomaviridae were found with greater frequency in the BRTN, BRTP, and BRER cancer patients who survived, respectively. Additionally, high hybridization signals for the bacterium Sphingomonas and the fungus Aspergillus were detected in the BRHR patients who survived. Again, within the limits of the clinical data, these findings suggest that the qualitative and quantitative nature of the microbial signatures associated with a patient's cancer may provide diagnostic and prognostic information.
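To illustrate how such group comparisons are commonly summarized, the sketch below computes a log2 fold change between the mean hybridization signals of two outcome groups and adjusts a set of nominal p-values for multiple testing. It rests on assumptions that are not taken from the study: signals are treated as linear-scale intensities, the Benjamini-Hochberg procedure stands in for whatever correction was actually applied, and all function names and numbers are hypothetical.

```python
import math


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean signals (group_a over group_b).

    Assumes positive, linear-scale intensities (an assumption here).
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a / mean_b)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up signals for one probe set: deceased vs. surviving patients.
log_fc = log2_fold_change([8.0, 8.0], [2.0, 2.0])
# Made-up nominal p-values for four signatures.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A positive log2 fold change here means a higher mean signal in the first group; the adjusted p-values are never smaller than the nominal ones, which is why some nominally significant trends may not survive correction.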

Validation of PathoChip Screen Results by PCR
We selected several viruses and microorganisms detected in the BC samples for verification by non-quantitative PCR and sequencing. These included several viral families and individual viruses (Herpesvirus, Polyoma, Papilloma, Parapox, and MMTV), as well as a prevalent bacterium (Brevundimonas) and fungus (Pleistophora). The primers used were either previously published (Table 7) or were designed based on sequences from the conserved and specific regions of the microorganisms. For detection of parasites we used pan-parasite diagnostic PCR primers enabling exhaustive detection of nonhuman eukaryotic species-specific small subunit rDNA in human clinical samples. For the validation experiments we used the same WTA preparations that were used for the initial screening. The PCR amplification showed the expected amplicons for the PathoChip-detected viruses, as well as the selected bacterium, fungus and parasite (Figure 7). Sequencing of the PCR products verified the detection of the appropriate virus or other microorganism (Supplementary Table S3, Supplementary Figure S3).

DISCUSSION
The human microbiome is comprised of mutualistic, pathogenic, transient and residential viruses and microorganisms. Many recent studies have suggested that the body's microbiome dramatically affects health, where perturbation of the microbiome leads to altered physiology and pathology, including cancer. However, the reverse may also be true, that different human diseases create disease microenvironments amenable to the persistence of a differential microbiome, with or without a direct effect on the establishment or progression of the disease. Such differential microbiomes could be specific to each such disease. Using our in-house metagenomic array technology (PathoChip), we previously established distinct microbial signatures in triple negative breast cancers (BRTNs) (Banerjee et al., 2015). In the present study we determined the microbial signatures that were significantly higher in the 4 major breast cancer types (BRTN, BRTP, BRER, BRHR) compared to the healthy breast control tissues, and also determined whether the microbial signatures associated with the BRTNs were a specific feature of BRTNs, or a generic feature shared with other types of breast cancers.
Nominal p-value (p-value) and p-value with multiple correction (adjust p-value) for each microbial signature detection along with the log2 fold change (logFC) for the t-tests are mentioned.
Our data showed that the various breast cancers harbor robust and varied micro-organisms, with aspects that are unique to each type as well as shared components. The data suggest that breast cancer microbial signatures may reflect communities of organisms specific to each breast cancer type. We also point out that our control FFPE samples, processed in the same way as the tumor samples, had different signatures, generally with much lower hybridization signals, arguing against gross contamination.
Examining viral signatures, we found that the majority of the viral families detected were associated with all 4 breast cancer types. However, several important viruses were differentially detected; for example, among known oncogenic viruses the signatures of Polyomaviridae were detected with high significance and high signal intensity in the BRER and BRHR samples and with low signal intensity in the other breast cancer types. Signatures of Hepadnaviridae were similarly detected in BRERs and BRTPs with high signal intensity, but with very low signal intensity in the other two cancer types. It is intriguing that signatures for the Parapoxviridae family were found in all the breast cancers, with BRERs showing the highest level of detection. Parapox viruses are known to have homologs to human genes responsible for angiogenesis (Ueda et al., 2003; Delhon et al., 2004).
There were a number of bacterial families shared by all four breast cancer types. For example, all four breast cancer types had dominant signatures for Proteobacteria followed by Firmicutes. The presence of these two bacterial phyla in breast cancer tissues has been reported (Urbaniak et al., 2014, 2016; Hieken et al., 2016), and suggested to be a result of adaptation to the fatty acid environment and metabolism in the tissue (Urbaniak et al., 2014). Another study found a positive correlation between Proteobacteria and the metabolic by-products of fatty acid metabolism, along with host-derived genes involved in fatty acid biosynthesis (El Aidy et al., 2013). In particular, the signature of the proteobacterial genus Brevundimonas was detected with high hybridization signal and prevalence in all four breast cancer types. Brevundimonas causes bacteremia and has been found associated with immunocompromised and/or cancer patients in other studies (Han and Andrade, 2005; Lee et al., 2011; Banerjee et al., 2015). Additionally, the Mobiluncus family was detected in all four types. It is mostly known to be associated with bacterial vaginosis (Gatti, 2000); however, its association with breast cancers may be consistent with studies showing an association with breast abscesses and extragenital infections (Glupczynski et al., 1984; Sturm, 1989). We also detected Actinomyces signatures in all four breast cancers, especially in BRHRs, where they were detected with very high signal intensity. Previous studies have reported actinomycosis in the breast tissues of breast cancer patients (Aamir and Bokhari, 2005; Abdulrahman and Gateley, 2015; Banerjee et al., 2015), as primary (Salmasi et al., 2010) or secondary infections (Brunner et al., 2000) in the breast, and in breast abscess (Attar et al., 2007). Additionally, each type of breast cancer held signatures for unique bacterial genera, which may provide an ability to detect specific breast cancer types.
Fungal infections in cancer patients are common. The fungal signatures we detected included yeasts such as Candida, Geotrichum, Rhodotorula and Trichosporon, fungi causing mucormycosis and aspergillosis (cutaneous infections), and dermatophytes such as Epidermophyton and Trichophyton, all of which are commonly known to be associated with cancers (Mays et al., 2006; Ansari et al., 2015; Banerjee et al., 2015; Jung et al., 2015; Rodríguez-Gutiérrez et al., 2015; Berkovits et al., 2016). We also detected signatures of Fonsecaea, infection with which is seen to predispose to squamous cell carcinoma development (Azevedo et al., 2015).
Possibly the most intriguing and unexpected result of the PathoChip screening is the detection of parasite signatures in different breast cancer types. These signatures were quite unique to the different breast cancer types, with no single parasite being prevalently found in all four. Many parasite signatures were distinctly detected in only one type of breast cancer. It should be kept in mind that our sensitive detection approach allows us to detect low abundance organisms, as well as unknown members of parasite families. However, the association of specific parasites with cancer is known. Among the parasites detected, Trichinella (detected in BRTN) has been found in a patient with recurrent ductal invasive breast carcinoma (Kristek et al., 2005). Schistosoma (detected in BRTN, BRTP) has been linked to bladder cancer (Samaras et al., 2010; Benamrouz et al., 2012); additionally, we detected signatures of Ascaris (BRHR, BRER) and Trichuris (BRTN), which have been associated with pediatric cancers (Menon et al., 1999). Similarly, Strongyloides (BRTN, BRHR) has been associated with adult cancer patients (Guarner et al., 1997). Other signatures detected, Leishmania (BRTN) and Plasmodium (BRHR, BRTP, BRER), induce the inhibition of apoptosis (Heussler et al., 2001), which may promote oncogenesis (Lowe and Lin, 2000).
It was interesting to further investigate if detection of certain microbial signatures in breast cancers differed among patients who survived or died. We noticed higher detection of certain parasitic and fungal signatures in BRTN patients who died. Of particular interest in these analyses was the finding of high hybridization signals of specific viruses and microbes in a particular breast cancer type that may trend with patients who died, while others trended with surviving patients. Within the limits of the clinical data that could be provided, our findings suggest that the qualitative and quantitative nature of the microbial signatures associated with a patient's cancer may provide diagnostic and prognostic information.

FIGURE 7 | PCR validation of microbial signatures in the 4 types of breast cancers and healthy control, using the primers from Table 7. Among the breast cancer types, the endocrine receptor (estrogen/progesterone) positives are abbreviated as BRER, human epidermal growth factor receptor 2 positives are abbreviated as BRHR, triple positives (estrogen, progesterone, and HER2 receptor positive) are abbreviated as BRTP and the triple negatives (absence of estrogen, progesterone, and HER2 receptors) are abbreviated as BRTN. The breast control samples obtained from healthy individuals are abbreviated as NC. The left shows the cropped gel pictures of EtBr stained amplicons run on agarose gel, where M is DNA ladder of RsaI digested φX174, NTC is non-template control. The sequenced amplicons were subjected to nucleotide blast program in NCBI, and the results are shown in the right. In the Polyomavirus PCR gel picture, the orange and the green arrow heads signify Simian virus 40 and Merkel cell polyomavirus amplicons respectively, the electropherogram of the sequences of which are marked with the same arrow heads in Supplementary Figure S3.

Our findings suggest that the micro-organisms in breast cancers are diverse, extensive and have unique aspects that differentiate the four different breast cancers tested. We reported the microbial signatures that were significantly higher in the breast tumor microenvironment when compared to healthy breast tissues. Some of these tumor microbial signatures overlapped with the reported skin microbiome (Findley and Grice, 2014; Hieken et al., 2016). For example, bacteria such as Lactobacillus, Prevotella, Staphylococcus, Lactococcus and Streptococcus have been reported earlier as healthy breast skin flora (Hieken et al., 2016; Urbaniak et al., 2016), and Propionibacterium and Corynebacterium bacteria and Malassezia fungi have been reported to be common skin commensals (Grice and Segre, 2011). Although the detection of those common skin/healthy breast flora in the breast tumor microenvironment in the current study is not surprising, there still exists a breast tumor specific microbiome, which was also reported by other studies (Urbaniak et al., 2014; Xuan et al., 2014).
Many of the microbial signatures that were detected in one or more of the breast cancer types were not detected in the healthy controls, as mentioned in the results section. Most of those micro-organisms were found in earlier studies to be associated with cancer and/or immunocompromised patients (Kontoyianis et al., 1994; Menon et al., 1999; Narikiyo et al., 2004; Aamir and Bokhari, 2005; Kristek et al., 2005; Ramanan et al., 2014; Abdulrahman and Gateley, 2015; Banerjee et al., 2015, 2016).
It is possible that micro-organisms in the breast cancer could contribute to the origin, potentiation or modulation of oncogenesis. However, it is equally possible that the tumor microenvironment provides favorable conditions for specific micro-organisms to persist more readily than in the normal tissue microenvironment. Moreover, due to HIPAA regulations we could not obtain any information on the type of treatment these breast cancer patients received. Thus, we can only assume that samples from some of the patients were obtained before treatment, while other patients may already have been receiving treatment at the time of sample procurement. Patients already receiving treatment, in particular, could be immunocompromised, which exposes them to a higher infection rate; detecting a higher number of micro-organisms in those samples is therefore not surprising.
Our data demonstrate for the first time that the microbial signatures of BRTNs and BRTPs are distinct and significantly different from the microbial signatures largely shared by BRERs and BRHRs. Furthermore, the unique characteristics of the breast cancer associated microbial signatures potentially provide tools for specific diagnosis and treatment of these cancers. These findings are hypothesis-generating and need further investigation to identify a microbial risk signature for the different breast cancer types and potential microbial-based prevention therapies. A complete review of the microbiome in these breast cancers and healthy controls would provide more insight into those questions.

AUTHOR CONTRIBUTIONS
ER and JA conceptualized the study; SB and ER planned the experiments; SB performed the experiments, analyzed part of the data, made tables and figures for the manuscript, wrote the manuscript, with contributions from ER and JA; ZW and TT analyzed the micro-array data; KP provided technical assistance during the experiments; NS and MF were the pathologists who provided and evaluated the samples for identification of breast cancer and controls; AD identified the patients with different breast cancer types for inclusion in the study.