Exploring the tumor micro-environment in primary and metastatic tumors of different ovarian cancer histotypes

Ovarian cancer is a highly heterogeneous disease consisting of at least five different histological subtypes with varying clinical features, cells of origin, molecular composition, risk factors, and treatments. While most single-cell studies have focused on High grade serous ovarian cancer, a comprehensive landscape of the constituent cell types and their interactions within the tumor microenvironment are yet to be established in the different ovarian cancer histotypes. Further characterization of tumor progression, metastasis, and various histotypes are also needed to connect molecular signatures to pathological grading for personalized diagnosis and tailored treatment. In this study, we leveraged high-resolution single-cell RNA sequencing technology to elucidate the cellular compositions on 21 solid tumor samples collected from 12 patients with six ovarian cancer histotypes and both primary (ovaries) and metastatic (omentum, rectum) sites. The diverse collection allowed us to deconstruct the histotypes and tumor site-specific expression patterns of cells in the tumor, and identify key marker genes and ligand-receptor pairs that are active in the ovarian tumor microenvironment. Our findings can be used in improving precision disease stratification and optimizing treatment options.


Introduction
Ovarian cancer is the second most common and most malignant cancer in the female reproductive tract.Epithelial ovarian cancers (EOC) which account for about 90% of ovarian malignancies can be further divided into serous, endometrioid, clear cell, and mucinous histotypes (Matulonis et al., 2016).The risk factors of epithelial ovarian cancer vary by histotype but generally include age, weight, hormone therapy after menopause, as well as family history (American Cancer Society, 2020).Previous genomic studies (Network, 2011) have demonstrated that mutations in TP53 and NF1, and dysfunction of BRCA1 are involved in the pathogenesis of the serous carcinoma in the ovary (Sangha et al., 2008).However, only a few studies have investigated the cellular landscape and transcriptomic profile of ovarian cancer histotypes which can inform targeted therapies.In recent years, the development of single-cell technology allows researchers to zoom in on the cell-level transcriptome of the tumor tissue and provides a better understanding of the tumor microenvironment (TME).
Single-cell technology has been applied to ovarian cancer previously on malignant abdominal fluid (ascites) associated with high grade serous ovarian carcinoma (HGSOC) histotype (Izar et al., 2020) to resolve the HGSOC landscape and investigate inflammation programs.The stress associated chemo-resistance in solid tumors from metastatic sites with HGSOC was investigated together with stroma signaling to provide insight into chemotherapy resistance (Zhang et al., 2022).A recent study used scRNA-seq on primary and untreated peritoneal metastatic sites (Kan et al., 2022) and identified a subset of RGS5+ cancer-associated fibroblasts (CAFs) strongly supporting tumor metastasis and cancer recurrence in EOC.However, comparisons across multiple sites and histotypes are yet to be performed.We previously reported the cellular composition of metastatic ovarian tumors from the omentum using single-cell RNA sequencing technology, categorized tumor samples based on T cell infiltration signatures which were confirmed by immunohistochemistry, and identified the presence of a GNLY + CD4 T cell population in high T cell infiltrated samples (Olalekan et al., 2021).
In this current study, we characterized primary and metastatic tumors of different histotypes from 12 ovarian cancer patients using Drop-seq, a high-throughput single-cell RNA-seq technique (Evan et al., 2015).We analyzed the distribution of cell types with the tumor microenvironment and investigated possible cell-cell interactions by histotype.Cancer stem cells with increased expression of IFIT1, IFIT2, IFIT3, and ISG15 were uniquely present in HGSOC and MMMT tumors.Cancer-associated fibroblast (CAF) sub-clusters showing high expression of IL6, CCL2, S100A4, PDPN, and FGF7 were identified in both primary and metastatic samples.Our previously reported immune cell populations were validated in this bigger sample collection.Finally, we identified a cluster of IL32+ plasma B cells that were found exclusively in the primary tumor sites that may be of prognostic value.
With the inclusion of additional histotypes and tumor sites in our collection, this study allows us to characterize the differences in cell compositions between sites and different levels of their T cell infiltration, build cell or gene signatures to characterize the different ovarian cancer histotypes, and further investigate the underlying molecular mechanism in the TME.We further explored cell-to-cell communication among different cell sub-clusters, using inferred ligandreceptor (LR) interactions.We note that such interactions are enriched among epithelial cells and fibroblasts, and that LR interaction signatures vary across different tumor sites and histotypes.

Establishing cell lineages, molecular subtypes, and cell-cycle states across samples
To study the cell composition of ovarian cancers, tumor tissues resected from 12 ovarian cancer patients undergoing debulking surgery in the ovaries, omentum, and rectum were analyzed (Figure 1A; Table 1).Briefly, the cohort consisted of seven white, two Asian, two Black, and one woman of unknown racial origin, and ranged between 39-77 years in age (mean ~62 years).Most patient tumors were stage IIIB or above, according to staging by a pathologist.Solid tumor samples of different histotypes were collected from primary (ovaries) and metastatic (omentum, rectum) sites (Table 2) which enabled us to investigate histotypeand site-specific signatures at single cell level.Tumor samples were obtained fresh from surgery and processed using Drop-seq (Evan et al., 2015) within 24 h or fixed in formalin for immunohistochemistry (IHC).Immune (CD45 + ) and cancer cells enriched from a subset of samples were also profiled by Drop-seq to obtain a better representation of immune cells in our single-cell data.
A total of 26 gene expression matrices were generated from Drop-seq experiments on 21 ovarian cancer tumors from 12 patients.We identified a total of 38,811 genetic features across 25,326 cells from tumors resected from multiple tissue sites in this study.The filtered gene expression matrices were integrated using the anchor-based alignment.Unsupervised clustering analysis yielded 11 distinct clusters of cells.The resulting clusters were annotated using Template-based Automated Cell type Assignment (sc-TACA; Section 4), yielding ten major cell types including epithelial, endothelial, mesenchymal stem (MSC), embryonic stem (ESC), fibroblast, macrophage, T, B, and plasma B cells and a small cluster of 37 cells marked as N1 that shared markers with astrocytes which we saw in our previous study (Olalekan et al., 2021) (Figure 1B).Percentages of each cell type comprising each tumor sample are shown in Supplementary Figure S1A; Table 2. Due to the small number of N1 cells in any given sample (<0.1%), we excluded them from further analysis.For simplicity, the cell types were classified into three compartments: epithelia, containing epithelial cells and ESC, stroma containing endothelial cells, MSC and fibroblasts, and immune, containing macrophages, B and plasma B cells, and T cells (Figure 1B).
Next, we explored the expression of the genes associated with the four molecular subtypes of ovarian cancer -differentiated, immunoreactive, mesenchymal, and proliferative -identified from The Cancer Genome Atlas (TCGA) (Network, 2011) in our dataset.We were able to assign one of the four molecular subtypes with the highest TCGA module score to 93.7% of cells; cells with a negative module score were marked as not assigned (NA) (Olalekan et al., 2021).When each cell on the dimension reduction of the Uniform Manifold Approximation and Projection (UMAP) was marked with the molecular subtype assigned to it (Figure 1C), we noted that the major cell types and the cellular compartments they belong to (Figure 1B) match the predominant molecular subtype of ovarian cancer identified by TCGA.The epithelial cells were distributed through all four cancer subtypes and comprised 80% of the cells predicted as differentiated subtype.73% of cells from the predicted immunoreactive subtype were immune cells (B cells, T cells, and macrophages).The mesenchymal subtype, associated with worst survival (Konecny et al., 2014), consisted of the least epithelial cells and contained the highest percentage (82%) of stroma cells, including MSC, fibroblasts, and endothelial cells.The proliferative subtype contained 56% of cells from the epithelial cell category; 26.2% of cells from the ESC (about half of the total ESC population) that also showed unique stem cell features described later, were of the proliferative subtype.Sample-specific composition of TCGA subtypes is shown in Supplementary Figure S1B.
To study the cell cycling effects under the TME, Cell cycle analysis was performed on the combined dataset to assign a cellcycle module score to each cell for the G1/S, G2/M, and M/G1 phases.Cells that could not be assigned to one of these phases were marked as "NA".We noted that the cell cycling patterns were roughly similar for all cell types (Figure 1D), with the exception of ESC.A large fraction of cells across all cell types were assigned to the M/G1 phase (64.3%;Supplementary Figure S1C), as seen previously (Tay et al., 1991).In contrast, most ESC (>70%) were  assigned to the G2/M phase where they likely stalled during the cell cycle (Becker et al., 2006).

Immune cells and their expression in ovarian cancer samples
We identified 5,453 cells as immune cells that could be further split into B cells, plasma B cells, T cells, and macrophages (Supplementary Figure S2A).We also found a few dendritic cells and common myeloid progenitor cells (n = 52 and 30, respectively) that co-clustered with macrophages; these cells were not included in the downstream analysis due to the low cell counts.When identifying the subclusters within each cell type, we denote them as "EP" for epithelial cells, "TC" for T cells, "BC" for B cells, "MA" for macrophages, "ES" for ESC, "FB" for fibroblasts, "MS" for MSC, and "EN" for endothelial cells.We used a single digit starting from 0 to index the sub-clusters for each cell type, e.g., EP0 denotes cluster 0 of epithelial cells.An adjusted p-value threshold (adjusted p < 0.05) was applied to all the gene markers mentioned below for subclusters.

Cellular composition by ovarian cancer histotypes and tumor sites
We conducted further analysis on our tumor samples to examine cell types described above (Figure 1B; Supplementary Table S4A), cancer histotypes (Figure 3B; Table 1), and T cell infiltration into tumors (Figure 3A; Supplementary Figure S2A; Table 2).Based on pathology grading, the samples in this study belong to six ovarian cancer histotypes: serous ovarian carcinoma (SOC), high grade serous ovarian carcinoma (HGSOC), low grade serous ovarian carcinoma (LGSOC), clear cell, endometrioid with serous features, and malignant mixed Müllerian tumors (MMMT).Figure 3B shows the heatmap of cell type compositions combined across all samples, grouped by cancer histotypes.We noted the highest fraction of epithelial cells in MMMT and the highest fraction of MSC in endometrioid samples.Expression of previously established immunohistochemical markers (Köbel et al., 2016;Peres et al., 2018) WT1, NAPSA, and PGR for histotype classification were checked on EP and ES cell lineages.We confirmed higher expression of WT1 in HGSOC, and NAPSA in clear cell histotypes compared to the remaining histotypes, and the presence of PGR expression in endometrioid histotype with serous features (Figure 3C).Due to the limitation of 3' scRNA-seq assays, we were not able to assess the abnormality for TP53 to differentiate HGSOC and LGSOC.Therefore, the over-expression pattern for marker CDKN2A was used to identify HGSOC.The expression of TP53 exhibited low prevalence across all histotypes (Figure 3C; Supplementary Figure S3C).The expression levels of additional markers (Sakata et al., 2017;Peres et al., 2018;Leskela et al., 2019) VIM and ARID1A were used to distinguish other histotypes.The EMT repressors, zinc finger E-Box binding homeobox 1 and 2, (ZEB2, ZEB1) (Sakata et al., 2017;Leskela et al., 2019) related to endometrial carcinosarcoma, a mix of (epithelial) carcinoma, and (mesenchymal) sarcoma, were used to distinguish endometrioid and MMMT histotypes.In epithelial and ES cells, we observed WT1+CDKN2A + for HGSOC, and WT1+CDKN2A-VIM + for LGSOC (Figure 3C).Among the other histotypes, we find SOC to be WT1+CDKN2A + VIM+, clear cell to be WT1-NAPSA+, endometrioid to be WT1-NAPSA-PGR + ZEB2+ARID1A+, and MMMT to be WT1+VIM + ZEB1+ in our limited sample space (Figure 3C).
We project the histotype-specific GWAS (genome wide association study) markers (Lengyel et al., 2022) from HGSOC, LGSOC, clear cell, Endometrioid, and Mucinous histotypes on both epithelial and stromal lineages using averaged expression values in the dot plot (Supplementary Figure S3D).The HGSOC specific markers were more commonly detected in HGSOC samples compared to the other histotypes (Supplementary Figure S3D, rectangle in the HGSOC panel); the LGSOC specific markers were exclusive to the LGSOC samples (Supplementary Figure S3D, rectangle in the LGSOC panel); more HGSOC and less LGSOC specific markers were detected in the SOC sample (Supplementary Figure S3D, rectangle boxes in the SOC panel).The expressions of clear cell-specific markers were not exclusive or elevated in the clear cell samples (Supplementary Figure S3D, rectangle in the clear cell panel), while the expression of endometrioid specific markers were not exclusive but elevated in the endometrioid samples (Supplementary Figure S3D, rectangle in the endometrioid panel).The MMMT samples exhibited a mixed signal from all markers from different histotypes' GWAS genes as well as the borderline GWAS markers samples (Supplementary Figure S3D, rectangles in the MMMT panel).
The secretory and ciliated epithelial markers, and hallmark epithelial markers collected in a previous study (Lengyel et al., 2022) were projected to HGSOC, LGSOC, and SOC samples.The ciliated epithelial marker CAPS was only expressed in EP0.The secretory epithelial marker PAX8 was detected across the epithelial (EP and ES) lineage in HGSOC and SOC but absent in EP1 and all ES cells in LGSOC samples (Supplementary Figure S3E, orange rectangle).The markers for cell cycle (CDK4) and MHC II cluster (HLA-DQA1, HLA-DPA1) were absent in HGSOC (Supplementary Figure S3E, green rectangle).The markers for cytokeratin, KRT23 and epithelial stem cell, CD44 were only found in LGSOC's EP and ES subpopulation respectively (Supplementary Figure S3E, purple rectangle).On the contrary, the markers for chromatin remodeling and panepithelial were lower expressed in LGSOC (Supplementary Figure S3E, blue rectangle).
Tumors from the ovaries were considered primary tumor sites while the tumors from the omentum and rectum were categorized as metastatic (Table 2).We categorized each sample based on the level of T cell infiltration (T Inf ) (Olalekan et al., 2021), and whether its tumor site was primary or metastatic, thus grouping our samples into 4 categories: Metastatic High (T Inf ), Metastatic Low, Primary High, and Primary Low.We found no significant differences in the composition of major cell lineages between primary and metastatic sites.However, at the sub-cellular level, the ratios of FB4, FB2, MA3, and MA2 were higher in metastatic sites, while EP2 showed an opposite pattern (Supplementary Tables S4B, C).T-tests performed on the ratios of different cell lineages showed significantly higher fractions of TC and BC in the high T Inf group, and MS was lower in T Inf (Supplementary Table S4B).Zooming in, the TC0 and BC3 appear to drive these differences, while higher MS2 correlated with low T cell infiltration (Figure 3A; Supplementary Table S4C).The percentages of each immune cell type in these four categories are shown in Supplementary Figure S2A (bottom panel).We found FB sub-clusters, FB0, FB2, FB4, and FB5 expressing CAF markers to be enriched in samples classified as Metastatic Low (Supplementary Figure S3B), along with CSC (EP3 sub-cluster, Supplementary Figure S3B).
Overall, the cell sub-type fractions from the same main cell types were correlated with each other.For example, the fractions of subclusters by sample in EP, ES, and CAF sub types-FB0, FB2, FB3, FB4, and FB5 were similar to each other and thus clustered together (Figure 3B, rectangles).Interestingly, the only non-CAF subcluster of fibroblast, FB1 clustered with MSC (Figure 3B, rectangle).
Due to the limited number of samples available for all histotypes, we were unable to calculate the statistical significance of the cell cluster compositions.Nevertheless, several intriguing observations merit attention.The EP0 cluster was observed in all histotypes (Supplementary Table S4A).The fractions of EP1, EP2, and EP3 cells were higher in MMMT compared to the other histotypes, while the fraction of cells in ES1, ES2, and ES3 were higher in HGSOC (Supplementary Figure S3A).For clear cell histotype, the percentage of cells in TC0, BC2, and BC3 were higher than in other histotypes.The endometrioid histotype showed a high fraction of MSC, MA1, and FB1.The percentage of certain macrophage and T cell subtypes in LGSOC (MA0, MA2, and TC1) was higher than in other histotypes.The SOC sample of undetermined pathology grade appeared more similar to HGSOC from the primary site than LGSOC in terms of cell type composition (Supplementary Figure S3A).In the SOC sample, the human leukocyte antigen (HLA) genes have higher average expression compared to HGSOC samples.BC2 derived from tumors with high T cell infiltration and were identified primarily in clear cell and SOC histotypes (Supplementary Figures S3A, B).
Immunohistochemical staining of vimentin (VIM), CD45, and cytokeratin-7 (CK7) was also performed on tumor tissues from metastatic (Supplementary Figure S3F) and primary (Supplementary Figure S3G) tumors belonging to different ovarian cancer histotypes to investigate the fractions of the major cell lineages in these tumors.We correlated the percentage of each cellular subset in our combined dataset from 18 samples to the IHC results; three samples from patients P3 and P4 that were enriched for CD45 + cells alone for Drop-seq were excluded from this analysis.The percentage that stained for CD45 was well correlated (Pearson correlation = 0.51 and a significant 0.03 p-value estimation, Supplementary Figure S3H, left panel) with the immune population (MA, TC, and BC).The correlations between area staining for CD45 (IHC), and the percentage of T cells, B cells, and macrophages out of all cells are 0.59, 0.53, and 0.17 (not significant), respectively.
The stroma population was estimated using the union of FB, MS, and EN cells in Drop-seq data.The Pearson product-moment correlation with the percentage of cells that stained for vimentin was negative (−0.45, Supplementary Figure S3H, middle panel) with a non-significant p-value and may be caused by the epithelial cells undergoing EMT (we observed a consistently smaller percentage of the stromal subpopulation compared to the VIM-stained percentage).The CK7 percentage was positively correlated (Pearson's correlation of 0.24, Supplementary Figure S3H, right panel) with the epithelia (EP and ES cells), however not significantly (p-value = 0.34).Out of the 18 samples, only 3 samples had more than 30% difference between the stained CK7 and annotated epithelial sub-population.
As seen previously (Olalekan et al., 2021), we noted significant differences in the abundance of T cells between samples reported by Drop-seq.The T cell percentages in Drop-seq data showed the highest correlation with CD45 staining in IHC.Due to the correlation between T cells in tumors and cancer outcome (Olalekan et al., 2021), we categorized a sample as having high T cell infiltration (T inf ) if the percentage of T cells was greater than 10%, and low T inf if less than 10% in the sample as per Drop-seq data (Supplementary Figure S2A; Table 2).

Inferring cellular interactions in the tumor microenvironment using ligandreceptor analysis
To understand the patient-specific TME, we predicted the ligand-receptor interactions among the cell sub-clusters, using CellPhoneDB (Efremova et al., 2020) and additional cancerspecific ligand-receptor (LR) pairs that were curated from previous studies (see Section 4).We found that FB, EP, ES, and MS cells were highly activated for ligand-receptor (LR) interactions (Figure 4A).The higher abundance of FB and EP cells in the TME and high numbers of putative LR interactions identified within and between EP, FB, and immune cells in our data allowed us to further dissect histotype-or site-specific LR repertoires.Accordingly, we selected the following lineage pairs: epithelial-to-fibroblast, immune-to-epithelial, and immune-to-fibroblast. Clusters with less than 50 cells were excluded from the downstream LR analysis.
We first examined the resulting cancer-specific LR interactions in epithelial-to-fibroblasts. To identify LR interactions common to each histotype, we integrated interactions from all samples grouped by their histotypes.Histotype-specific LR signatures across epithelial-to-fibroblasts were identified (Supplementary Figure S4A).HGSOC displayed higher interactions of receptors ITGB1 in epithelial cells (to COL1A2, MDK, and VEGFA in fibroblasts), as well as FGFR1 in epithelial cells (to FGF12 and FGF18 in fibroblasts).
LGSOC histotype had higher LR signatures for ITGA5_ADAM17, MET_SEMA5A, LAMB1_ITGA2, LAMC1_ITGB4, and VEGFA_ NRP2.Receptor FGFR1 was also highly expressed in epithelial cells in LGSOC, though the ligand it enriched for was FGF9.
For cancer associated LR interactions from the immune-toepithelial (Supplementary Figure S4B), the HGSOC patients had higher ITGA4_MDK and BTLA (to VTCN1 and TNFRSF14).
For samples with abundant immune subpopulations, it is feasible for us to break down the immune cells into those compartments with sufficient number of cells captured.The original CellPhoneDB database was used to capture commonly occurring LRs that may not be specific to cancer in immune cell subpopulations.We ranked samples by the number of LR interactions (Figure 4B) and selected four samples with high LR interactions for comparison: P1-1 (metastatic, low T Inf ), P6-1, P7-1 (metastatic, high T Inf ), and P5-1 (primary, high T Inf ).In particular, we examined LR interactions of T cells with fibroblasts (Supplementary Figure S5A) and ESC (Supplementary Figure S5B).We observed common signals for TIGIT in T cells and NECTIN2 in fibroblasts and ESC; TIGIT contains ITIM motifs in its cytoplasmic tail that binds to NECTIN2 and triggers inhibitory signals (Deuss et al., 2017).This ligand-receptor signal was lower in P5-1 (Supplementary Figures S5A, B), which came from the primary tumor site.Similarly, the IL7R_IL7 pair was observed in all four samples for fibroblast (Supplementary Figure S5A), with the lowest signal in P5-1 (IL7R_IL7 was observed in P1-1 and P6-1 for ESCs, Frontiers in Cell and Developmental Biology frontiersin.orgsee Supplementary Figure S5B).This ligand-receptor pair has been correlated with immune cell infiltration in the TME (Wang et al., 2022).The ligand FASLG in T cells to receptors FAS, TNFRSF10A, and TNFRSF1B in fibroblast interaction pairs were detected in all samples (P5-1, 6-1, and 7-1) but not in P1-1 (Supplementary Figure S5A).Their interaction leads to apoptosis of thymocytes that fail to rearrange correctly their T cell receptor (TCR) genes and activationinduced cell death responsible for the peripheral deletion of activated T cells (Volpe et al., 2016).For fibroblast interactions, the LGALS9_CD44/r was enriched in both fibroblasts and ESC in P6-1 and P7-1, which are metastatic, with high T inf (Supplementary Figures S5A, B).The LGALS9_CD44 pair appeared on ESC from all samples except P6-1 (Supplementary Figure S5B).Gal-9 has direct cytotoxic effects, binds to CD44 expressed on cancer cells to limit cancer metastasis, and enhances the stability and function of adaptive regulatory T cells (Wu et al., 2014;Ustyanovska Avtenyuk et al., 2021).Different interactions associated with immune regulation in tumors (Li et al., 2021) were also found: CD74 interactions with APP or COPA were found in samples P5-1 and P6-1, while HLA-C_FAM3C interaction was enriched in samples P1-1 and P7-1 (Supplementary Figures S5A, B).CD2_ CD58 interactions between T cells and ESC were noted in all four samples (Supplementary Figure S5B), and between T cells and fibroblasts in P6-1 and P7-1 (Supplementary Figure S5A).NOTCH2 interactions (Galic et al., 2013;Jia et al., 2019) (with JAG2 and DLL3) were seen in fibroblasts in P1-1 (Supplementary Figure S5A).We also detected intriguing patterns of certain integrin complex-collagen binding pairs (Zeltz and Gullberg, 2016) on fibroblasts enriched in specific samples (Supplementary Figure S5C): integrin complex A2B1 appeared in P6-1 only; enhanced expression of α2β1 integrins may influence spheroid disaggregation and proteolysis responsible for the peritoneal dissemination of ovarian carcinoma (Shield et al., 2007).A1B1 was intriguingly absent in P1-1; instead, integrins A10B1 and A11B1, appear in P1-1 alone.Integrin α11β1 62 was previously seen overexpressed in NSCLC, especially in CAFs (Zhu et al., 2007;Navab et al., 2011).The CD40LG_A5B1 pair was seen for fibroblasts in P1-1, P6-1, and P7-1 (Supplementary Figure S5A).Integrin α5β1 plays an important role in tumor progression (Hou et al., 2020).In addition, strong A4B1 interactions with FN1, VCAM1 and other ligands (Baiula et al., 2019) are seen with fibroblasts in P5-1, P6-1, and P7-1 (Supplementary Figure S5A), and ESC in P6-1 (Supplementary Figure S5B); A4B1 receptors have been proposed to target therapy in inflammatory disorders and cancer (Baiula et al., 2019).
These results suggest that different patient samples may have unique LR signatures that are associated with specific cell types, which may be used to target therapy.

Discussion
Ovarian cancer is a collection of different carcinomas that manifest as different histotypes, each with different cellular compositions and pathogenic mechanisms.Analysis of the TME in different ovarian cancer histotypes at the single-cell resolution can potentially connect the different histotypes with their unique cellular and molecular signatures, understand disease etiology, and help guide therapy.With this aim in mind, we ran Drop-seq on 21 tumor samples from 12 patients and across 6 histotypes of ovarian cancer.We detected three major cell compartments: epithelia (epithelial cells and ESC), stroma (fibroblast, endothelial cells, and MSC), and immune (T, B, plasma B, macrophage) by integrating all single-cell experiments.The four ovarian cancer subtypes using the TCGA gene expression signature revealed highly correlated cell types: the immunoreactive subtype showed a higher correlation with immune cells, while the mesenchymal subtype correlated most with stroma cells and least with epithelial cells.The differentiated and proliferative subtypes both consisted of epithelia but with low and high percentages of ESC, respectively.This suggests that the molecular subtypes classified by TCGA may be driven by the cell type compositions of the tumor samples.Because each tumor sample showed a unique cellular makeup that differed between primary and metastatic sites, it follows that the dominant molecular subtype of a tumor sample is specific to its site of origin, rather than being patient-specific, e.g., patients, P6 and P8, while sharing the HGSOC histotype, have different TCGA subtypes.
For most cell types, we found that the cell cycle phases G1/S, G2/M, and M/G1 were consistently distributed with a higher percentage of the M/G1 phase, with the exception of ESC, where over 70% of the ESC belonged to the G2/M phase.Tumors with high G2/M gene activity have been associated with metastasis and worse outcomes in patients with particular subtypes of breast and pancreatic cancers (Oshi et al., 2020a;Oshi et al., 2020b).The role of p53 in G2/M related cell-cycle arrest in response to DNA damage has been studied extensively (Agarwal et al., 1995;Concin et al., 2003;Müller et al., 2020).
We found five different subtypes of cancer-associated fibroblasts, FB0 and FB2-5 in both primary and metastatic sites, based on the expression of IL6, CCL2, S100A4, PDPN, and FGF7.Each CAF sub-cluster supports different roles in the progression and metastasis of ovarian cancer.Cells in FB0 expressed genes associated with angiogenesis, Integrin signaling, and T cell receptor signaling pathways.These pathways were related to extracellular matrix remodeling and immune crosstalk under the tumor microenvironment (TME).FB2 supported upregulation of NF-kappa B signaling pathway genes and chemokine receptors associated with cancer metastasis.FB2 and FB4 exhibited elevated expressions of growth factor binding genes as well as genes enriched for angiogenesis and blood vessel development.Top differentially expressed genes in FB3 may be involved with endothelial cell signaling and vascular function.FB5 showed genes enriched in immune crosstalk and cytokine/interferon signaling pathways.Among epithelial cells, we identified the EP3 sub-cluster as cancer stem cells, based on high expression of IFIT1 and ISG15.
The majority of the immune sub-clusters were consistent with those identified in our previous study on metastatic ovarian cancer (Olalekan et al., 2021).We identified a new cluster of IL32 + B cells (BC2) that are CD38-SDC1-S100A4+GAPDH+; these cells were found in both primary and metastatic tumor sites with high T cell infiltration, deriving primarily in clear cells and SOC histotypes.
We did not observe any significant difference in the overall composition of cell lineages between primary vs. metastatic sites.We noted higher ratios of specific CAF (FB4 and FB2) and macrophage (MA2 and MA3) subsets and lower ratio of an epithelial subcluster (EP2) in metastatic sites, compared to primary tumors.
Overall percentages of T and B cells were higher in high T Inf samples, be it from primary or metastatic site, while the percentage of MS was lower overall.At the sub-cluster level, the TC0 and BC3 were positively correlated with T Inf status, with MS2 showing a negative correlation.The CAF sub-clusters FB0, FB2, FB4, and FB5, and the CSC (EP3) were enriched in samples classified as metastatic, low T Inf .
The IHC and GWAS markers showed distinct expression patterns on different histotypes, especially for the epithelial and stromal lineages.The immune lineage was overall less sensitive to these known markers.
Besides tumor site and T Inf status, there were also differences in the makeup of cellular sub-types between histotypes.The percentages of epithelial cells from EP1-3 were higher in HGSOC and MMMT histotypes, while the percentages of ESC in clusters ES1-3 were higher in HGSOC only.For clear cell histotype, the percentage of cells in TC0, BC2, and BC3 was higher.The endometrioid histotype had a higher percentage of MS and FB1 cells.The percentages of MA0, MA2, and TC1 cells were higher in LGSOC than in other histotypes.
Lastly, we found fibroblasts and MSC to be active players in the TME, exhibiting potentially distinct LR interactions with epithelial and immune subclusters in patients and histotypes.Imputed ligands and receptors may be leveraged to target therapy in ovarian cancer patients.
Limitations of the study: The total number of patient samples collected in this study is limited due to the pandemic.Certain cancer subtypes such as MMMT were less represented in our samples because of their lower prevalence (Xu et al., 2020).We observed marked heterogeneity in the patients' TME in our datasets; however, due to limitations of sample size, we focused on conservative signals within each group of interest.The treatment and outcome information were not available for the patient cohort and therefore, could not be included in the analyses.The cell subpopulations in tumors dissected from different individuals, tumor sites (primary vs. metastatic) or even different regions sampled from the same tumor may vary.The ligand-receptor interactions were inferred in silico through statistical testing, with the caveat that the same ligand or receptor can account for multiple inferred ligandreceptor pairs.Further validation tests are needed to confirm the ligand-receptor interactions.

Tissue collection, sample preparation, and drop-seq
Ovarian cancer tissues from primary and metastatic sites were collected from women undergoing debulking surgery at the University of Chicago.Some of the tissue collected from the different sites were patient-matched.The University of Chicago's Institutional Review Board for human research approved the collection of human tissue after patient deidentification.Ovarian tumors were transported in DMEM/F12 containing 10% FBS and 1% P/S (100% DMEMF/12), and processed as previously described (Olalekan et al., 2021).Red blood cells and dead cells were removed from cell suspensions using Miltenyi Biotec, 130-094-183, 130-090-101, respectively, used according to manufacturer's protocols.Additionally, some samples were enriched for immune-only, non-immune, tumor-only, and non-tumor cell compartments, using magnetic bead-based isolation or fluorescence activated flow sorting (Miltenyi Biotec,used according to manufacturer's protocols).
Drop-seq was performed as previously described (Olalekan et al., 2021) on ovarian cancer tumor samples from 12 patients (Table 1).A total of 21 tumor samples were present in this study, including 5 patients with Matched primary (right and/or left ovaries) and metastatic (omentum, rectum) tumors (Table 2).Of these, a few randomly selected samples were enriched for select cellular compartments prior to running Drop-seq: CD45 + (5 samples), tumor (2 samples), and non-tumor (1 sample); 18 samples were processed without any enrichment.

Data processing, alignment, and clustering analysis
A total of 40 sequencing runs were performed on Illumina's NextSeq 500 using the 75 cycle v3 kit, as previously described (Olalekan et al., 2021).Some samples were sequenced multiple times to achieve deeper resolution.Each run produced pairedend reads, with Read 1 representing the 12 bp cell barcode and a 6 bp long unique molecular identifier (UMI), and Read 2 representing a 60-64 bp mRNA fragment.Paired-end reads from the same samples were combined to generate 26 paired-end fastq files.Read count matrices were generated from sequence reads from the Drop-seq experiments for both exon and intron regions in the human genome (gencode (Frankish et al., 2019) hg38 v.27) using a Snakemake pipeline (Selewa et al., 2020) andSTAR version 2.5.3 aligner (Dobin et al., 2013).
To select high-quality cells, we applied a filter based on the number of genes detected per cell.Prior to filtering, each sequenced sample produced approximately 5,000 cells.Based on the median number of captured genes per cell, cells with less than 400 genes detected were removed from the dataset.A total of 26,421 cells were retained for the downstream analysis.We followed a standardized pipeline using the single-cell analysis tool suite, Seurat v3.0.2 (Butler et al., 2018;Stuart et al., 2019).A logarithmic normalization method (Butler et al., 2018) was applied to all samples to transfer the gene expression counts [+1, to avoid log(0)] scaled by a factor of 10,000 (TP10K) to log units.The normalized matrices for all samples were integrated by the anchorbased alignment method Canonical Correlation Analysis (CCA) using Seurat (Stuart et al., 2019).The top 1,311 highly variable genes and top 20 canonical vectors were selected to perform the alignment integration, where the integrated gene expression matrix had a lesser number of features (genes) than the original gene expression matrix.The integrated matrix was scaled by a linear transformation to center the mean gene expression for all cells.We applied PCA on the scaled integrated expression matrix to extract the top 50 components in the data, followed by a heuristic elbow plot on the standard deviation of each PC.We selected the top 16 variant PCs based on the elbow plot.
The selected PCs were used in further exploration of the data, such as UMAP (McInnes et al., 2018) dimension reduction, construction of K-nearest neighbor graphs, shared nearest neighbor modularity optimization-based clustering (Stuart et al., 2019), etc.We used dimension reduction methods, UMAP, to generate 2D plots to visualize different cell populations in the experiments.Hierarchical clustering on the shared nearest neighbor graph was applied to infer the clustering structure on the cells where the resolution parameter was set to 0.2.Differential expression analysis was performed through the FindMarkers function in Seurat using the Wilcoxon Rank Sum test, and statistically significant markers were extracted for sub-populations or contrast groups based on an adjusted p-value (adj.p-val.)threshold of 0.05.

Cell cycling effects
We inferred the cell-cycle phase for all cells based on previously curated gene markers reflecting three phases of the cell cycle in chemically synchronized cells (G1/S, G2/M, and M/G1) (Whitfield et al., 2002;Evan et al., 2015).For each cell-cycle phase, the module scores were calculated as the average expression levels of binned gene markers subtracted by the aggregated expression of random gene sets from the same bin.The Seurat AddModuleScore function was used to assign all five module scores to each cell where 24 bins of aggregate expression levels for the marker genes were used and a hundred control genes were selected from the same bin per gene.The highest scored cellcycle phase was assigned to the cell.If none of the module scores were positive, the cell was designated as not assigned (NA).

Cancer subtype classification
Four cancer subtypes-differentiated, immunoreactive, mesenchymal, and proliferative were categorized by previous bulk sequencing study in ovarian cancer (Tothill et al., 2008;Network, 2011).The marker genes for each subtype were determined by the upregulated marker signatures on the four subtypes (Verhaak et al., 2012).The Seurat AddModuleScore function was used to assign four module scores to each cell where 24 bins of aggregate expression levels for the marker genes were used and a hundred control genes were selected from the same bin per gene.The subtypes were then assigned to individual cells by the highest positive modular score.In the absence of positive modular scores, the subtype was considered not assigned (NA).

Cancer patient survival prediction
The cancer outcome was categorized as poor and good in the previous research on the TCGA ovarian cancer dataset (Tothill et al., 2008;Network, 2011), where a list of gene signatures based on RNA-seq data was extracted for both outcomes.We obtained the module scores based on these lists of predictive gene markers using the Seurat AddModuleScore function as described in the cancer subtype classification.The predicted outcome was assigned to the cells according to the module score.

Cell type classification using templatebased method
We assigned the cell type using a template-based cell annotation method, namely, sc-TACA (https://github.com/bingqing-Xie/taca)(Xie, 2021).The sc-TACA method utilizes an annotated single-cell dataset as a template.In this study, six HGSOC metastatic samples in the 26 samples have been previously annotated, which was used as the template.The cell types annotated in this template were denoted by T t i , t 1..p , where p is the total number of unique cell types.All samples were integrated by an anchor-based alignment via Canonical Correlation Analysis (CCA) in Seurat (Butler et al., 2018;Stuart et al., 2019).Then modularity optimization-based hierarchical clustering FindClusters was applied on the integrated dataset with a resolution r = 0.2 that resulted in 11 cell clusters.For each cluster i, we obtained the annotated cell type vector C i c 1 , c 2 , . . ., c Ni where N i is the total number of cells from cluster i and c i ∈ T. The annotation t i of a given cluster i was determined by the highest ratio of annotated cell type within the cluster t i argmax t r i t where r i t ci|ci t Ni .A threshold r min 0.7 was enforced to ensure the robust assignment.If max t r i t < r min for cluster i, it was labeled as undecided.

Analysis of fibroblasts, epithelial cells, and immune sub-population
After identifying the cell types, we extracted the fibroblasts, epithelial cells, and immune cells (T cells, B cells, macrophages) to conduct further investigation.Each sub-population expression matrix was a subset of the integrated matrix.The expression matrix was scaled and PCA analysis was performed to extract the top components in the data.Top PCs were selected based on the elbow plot, which varied from 10 to 20 based on the sub-population variation.Hierarchical clustering on the shared nearest neighbor graph was applied where the resolution parameter was set to a range between 0.2 and 0.5.The same UMAP was used to project the cells to a 2D space to visualize the sub-types for each cell type.Differential expression analysis was performed through the FindMarkers function in Seurat using the Wilcoxon Rank Sum test, and statistically significant markers were extracted for subpopulations or contrast groups based on an adjusted p-value (adj.p-val.)threshold of 0.05.The differences in cell composition ratios between primary and metastatic sites, and between high and low T inf groups were evaluated by two-sided t-test with p-value estimation.

Ligand-receptor interaction analysis on cell type subclusters
We constructed a customized pan-cancer ligand-receptor (LR) interaction database, using CellphoneDB (Efremova et al., 2020) and published cancer studies, including 27 immune checkpoint LR pairs (Pardoll, 2012), 114 interaction pairs between cancer cells and T cells in lung cancer (Chen et al., 2020), 1380 LR pairs in a pan-cancer study (Ghoshdastider et al., 2021), and 216 LR pairs related to ovarian cancer (Castellano et al., 2006).For each sample, we inferred LR interactions among any pair of the cell sub-clusters, SC sc j i , where i is the lineage such as EP, and j is the subcluster index, using the pan-cancer LR database.We obtained a p-value for the likelihood of cell-type enrichment of each ligand-receptor complex (L l j i , where i is a ligand and j is a corresponding receptor).We denote sc j1 i1 , sc j2 i2 for a sub-clusters pair.p-value is calculated by the proportion of means that are as high as (or higher) than the random permutation for all pairs, SC (sc j1 i1 , sc j2 i2 ) .Interactions with adjusted p-value <0.05 were considered significant.
The "significant means" vector, M m sc l ,l ∈ L, sc ∈ SC was extracted for each sample and m l was set to 0 when p-value >0.05 or sub-clusters with insufficient cell counts, (|sc|) < 50.The number of absolute interactions (M > 0) was used as a proxy to estimate the frequency of the cell-cell crosstalk among cell types.The hierarchical clustered heatmap was used to identify the shared patterns for the sub-clusters from different cell types.We then grouped the samples by histotype and site for the downstream comparative analysis.A linear model was built using lmfit in Limma R library for a given contrast group (e.g., one histotype against the rest of the histotypes), and the empirical Bayes moderated t-statistics test ebayes was used to estimate the significance of any LR signature, l ∈ L (Dixon, 2003;Smyth, 2005).Significant positive LR pairs were used as the signature for any given condition group.

FIGURE 2
FIGURE 2 Cellular sub-types in the Immune, Epithelia, and Stroma.(A-C) Subclusters of major immune cell types: T cells, B cells, and macrophages, respectively.(D,E) Subcluster annotation for epithelial and embryonic stem cells (ESC), respectively.(F-H) Subcluster annotation for major cell-types in the stroma: fibroblasts, mesenchymal stem cells, and endothelial cells, respectively.

FIGURE 3
FIGURE 3 Cell composition by tumor site, T cell infiltration, and histotypes; fractions of immune, stromal, and epithelial cells are explored using immunohistochemistry. (A) Heatmap of major cell type composition (left) and sub cell type (right) for all patient samples.The column z-scores are calculated from cellular compositional percentages within each sample; the rows are split by site and T cell infiltration status.(B) Heatmap of cell type subclusters composition percentage for all patient samples.The values are column z-scores normalizing the percentage and the rows are split by histotypes.(C) Dot plot of histotype markers expression in epithelial (EP) and embryonic stem (ES) cells.The expression in the dot plot is the averaged scaled log normalized TP10k value.

FIGURE 4
FIGURE 4Ligand-receptor (LR) interactions predicted by CellPhoneDB using a customized cancer database.(A) Total number of interactions between all cell subtypes.(B) Counts of significant Ligand-receptor pairs for all cell type subclusters stratified by sample, the columns are grouped by the cell lineage of the first interactor.

TABLE 1
Metadata for ovarian cancer patients included in our study.

TABLE 2
Total number of cells and breakdown of cellular composition in each sample, color-coded by tumor site (Primary, Meta) and T cell infiltration (T inf ) status.