Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity

Amini, Payam; Hajihosseini, Morteza; Pyne, Saumyadipta; Dinu, Irina

doi:10.3389/fcell.2023.1065586

ORIGINAL RESEARCH article

Front. Cell Dev. Biol., 09 March 2023

Sec. Molecular and Cellular Pathology

Volume 11 - 2023 | https://doi.org/10.3389/fcell.2023.1065586

Payam Amini ^1,2
Morteza Hajihosseini ^3,4
Saumyadipta Pyne ^5,6,†,*
Irina Dinu ^3,†,*

1. Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
2. School of Medicine, Keele University, Keele, Staffordshire, United Kingdom
3. School of Public Health, University of Alberta, Edmonton, AB, Canada
4. Stanford Department of Urology, Center for Academic Medicine, Palo Alto, CA, United States
5. Health Analytics Network, Pittsburgh, PA, United States
6. University of California, Santa Barbara, Santa Barbara, CA, United States

Abstract

Background: The impact of gene-sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, for combining gene set analysis with spatial data modeling to provide a new statistical test for location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample.

Methods: The main advantage of GWLCT consists of an analysis beyond global significance, allowing the association between the gene-set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and kernel function. Whether fixed or adaptive, the bandwidth is determined by a cross-validation procedure. Our proposed method is compared to the global version of the linear combination test (LCT), and to bulk and random-forest based gene-set enrichment analyses, using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample, as well as 144 different simulation scenarios.

Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene-sets that are significantly associated at each location with the five spatially continuous phenotypic contexts in the tumors defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene-sets. A spatial heatmap of combined significance over all selected gene-sets is also produced. Extensive simulation studies demonstrate that our proposed approach outperforms other methods in the considered scenarios, especially when the spatial association increases.

Conclusion: Our proposed approach considers the spatial covariance of gene expression to detect the most significant gene-sets affecting a continuous phenotype. It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.

Introduction

Globally, there were approximately 2.3 million new cases of, and 685,000 deaths due to, breast cancer (BC) in 2020 (Lei et al., 2021). BC is also the leading cause of cancer-related deaths among women (Beiki et al., 2012; World Health Organization, 2018) and the second leading cause of cancer deaths worldwide (Beiki et al., 2012). Tumorigenesis involves uncontrolled growth of cells in breast tissues, which can be either benign or malignant (Liu et al., 2013). Several studies on breast cancer patients have revealed the different anti- and pro-tumorigenic roles of the cancer-associated fibroblasts (CAFs) involved (Chang et al., 2012; Brechbuhl et al., 2017; Su et al., 2018).

Extensive studies over the past few decades have uncovered a variety of cell populations in tumors, thus leading to the active research area of intratumor heterogeneity (ITH) (Marusyk et al., 2020). In 2010, Hanahan and Weinberg noted that tumors exhibit an additional dimension of complexity through their “tumor microenvironment” that contributes to the acquisition of the so-called hallmark traits of cancer. ITH is attributed to genetic, epigenetic, and microenvironmental factors (McGranahan and Swanton, 2017; Marusyk et al., 2020) and associated with poor prognosis, therapeutic resistance and treatment failure leading to poor overall survival in cancer patients (Landau et al., 2013; Patel et al., 2014; Zhang et al., 2014; Jamal-Hanjani et al., 2015; Jamal-Hanjani et al., 2017). Indeed, the persistence of some of the drug-tolerant intratumor cell populations could be attributed to their high phenotypic plasticity (Flavahan et al., 2017).

Interestingly, hierarchies of differentiation also exist among normal cells in healthy tissues, but the populations of tumor cells display far greater cell-to-cell variability and resulting phenotypic instability (Landau et al., 2014; Jenkinson et al., 2017). Such ITH could be attributed to genetic causes ranging from aneuploidy to other factors such as complex contextual signals in the highly aberrant tumor microenvironments, or even global alterations in cancer cell epigenomes (Senovilla et al., 2012). ITH also involves immune cell infiltration, which is important to immunotherapies. Tumor antigen diversity could be determined by the T-cell clonality in the different regions of the same tumor (Senovilla et al., 2012). Studies have shown spatially complex interactions between tumor microenvironments and the patient’s immune system (Hanahan and Weinberg, 2011; Vogelstein and Kinzler, 2015).

While heterogeneous cell types are prevalent within the tumor microenvironment, some of which may account for cancer development and progression, it also contains different non-malignant components, including the cancer-associated fibroblasts (CAFs) (Pietras and Östman, 2010; Cortez et al., 2014; Kalluri, 2016). Although the origin and activation mechanism of CAFs remain an area of active research (Anderberg and Pietras, 2009; Shiga et al., 2015; LeBleu and Kalluri, 2018; Chen and Song, 2019), studies have attributed the processes of formation and derivation of CAFs to various precursor cells (Anderberg and Pietras, 2009; Shiga et al., 2015; LeBleu and Kalluri, 2018; Chen and Song, 2019), which may be the source of the well-known heterogeneity among the CAFs (Du and Che, 2017; Öhlund et al., 2017; Costa et al., 2018; Raz et al., 2018; Lee et al., 2020). Indeed, in certain tumors, such as in the breast, in which the prevalence of CAFs could be as high as 80%, they can play both anti- as well as pro-tumorigenic roles (Chang et al., 2012; Brechbuhl et al., 2017; Su et al., 2018). Importantly, CAFs can facilitate drug resistance dynamically by altering the cell-matrix interactions that control the outer layer of cells’ sensitivity to apoptosis, producing proteins that control cell survival and proliferation, assisting with cell-cell communications, and activating epigenetic plasticity in neighboring cells (Cuiffo and Karnoub, 2012; Junttila and De Sauvage, 2013). CAF-targeted treatments can have dual effects depending on the target and the tissue under consideration (Özdemir et al., 2014; Koliaraki et al., 2015; Wagner, 2016). For instance, spatial proximity to CAFs has been shown to impact molecular features and therapeutic sensitivity of breast cancer cells influencing clinical outcomes (Marusyk et al., 2016).

In recent years, higher resolution, tissue-specific gene expression analysis has been made possible by new platforms such as single-cell RNA sequencing (scRNA-seq), which has rapidly evolved as a powerful and popular tool (Kalisky et al., 2018; Sun et al., 2021). Unlike previous transcriptomic studies that assayed a “bulk” sample, scRNA-seq data can provide a detailed characterization of each tumor. Indeed, the Human Tumor Atlas Network (https://humantumoratlas.org) is increasingly enriched with data on human cancers based on scRNA-seq assays. The high-resolution transcriptomic platform has led to several scRNA-seq studies of the composition of CAFs in different stages of cancer (Bernardo and Fibbe, 2013; Li et al., 2017; Puram et al., 2017; Lambrechts et al., 2018; Elyada et al., 2019; Hosein et al., 2019; Davidson et al., 2020; Dominguez et al., 2020; Friedman et al., 2020). For a focused understanding of the heterogeneous expression of genes, different sites of the same tumor were analyzed with multiregional RNA sequencing for different cancers (Gerlinger et al., 2012; Zhang et al., 2014; Yates et al., 2015; Thrane et al., 2018).

Despite the advancements and efficacy of scRNA-seq, the lack of spatial information is a significant shortcoming of typical scRNA-seq methods for capturing cellular heterogeneity. For a tumor sample, spatial context might play a major role, and it could be combined with scRNA-seq data with the explicit aim of capturing microenvironmental heterogeneity. Spatial cell-to-cell communication in a given tissue image can be recovered from spatial scRNA sequencing data via computational spatial re-mapping (Teves and Won, 2020). Alternatively, integration of high-resolution gene expression data with spatial coordinates can resolve such experimental shortcomings (Eng et al., 2019). While imaging the transcriptome in situ with high accuracy has been a major challenge in single-cell biology, the development of high-throughput platforms for sequential fluorescence in situ hybridization such as RNA seqFISH+ and of algorithms such as CELESTA can identify cell populations and their spatial organization in intact tissues (Zhang et al., 2014; Eng et al., 2019). Towards this, many recent efforts have developed methods to analyze spatial information in single-cell studies (Lee et al., 2014; McKenna et al., 2016; Shah et al., 2016; Frieda et al., 2017; Alemany et al., 2018; Codeluppi et al., 2018; Raj et al., 2018; Spanjaard et al., 2018; Wang et al., 2018).

High-throughput transcriptomic data are useful not only for identifying genes that are differentially expressed, but also for testing for co-regulation of multiple genes, i.e., a gene-set, based on existing empirical knowledge of biological pathways and gene signatures, e.g., the well-known hallmarks of cancer. In this direction, several methods for gene-set analysis (GSA) have been introduced (Goeman et al., 2004; Mansmann and Meister, 2005; Subramanian et al., 2005; Kong et al., 2006; Dinu et al., 2007; Efron and Tibshirani, 2007). Since the genes within such gene-sets share a common biological function, considering the correlations within each set is a key aspect of a useful GSA method. However, Tsai and Chen (2009) showed that the above GSA methods were affected by large type II errors.

An important limitation of many GSA methods is that they can only accommodate binary outcomes, such as disease versus control. Our method, the Linear Combination Test (LCT), is a GSA method that was designed to address this limitation by taking into account correlations across genes and outcomes, and by dealing with binary, univariate or multivariate continuous outcomes, measured either at a single point in time or at multiple time points. It therefore allows us to analyze a wider range of studies involving complex study designs (Wang et al., 2014). Studies have shown that LCT can overcome difficulties such as small sample size and large gene-sets, and can accommodate correlations across gene-sets, time points, and multiple correlated continuous phenotypes (Dinu et al., 2013). Thus, while a specific gene may not show consistent expression across individual cells, LCT is more likely than traditional approaches to detect the regulation of a functional process or biological pathway associated with the intercellular diversity of outcomes in a single cell level experiment.

Recently, we have extended LCT beyond any other “bulk” GSA method for application to single cell experiments (Dinu et al., 2021). However, GSA is considerably more complicated in the presence of spatial information since the analyzed gene-sets need not have a uniform impact over the entire area of a spatially continuous phenotype. In fact, the significance of association between a selected gene-set and a particular phenotypic context at various microenvironmental neighborhoods could be different. Yet, variable as they may be, since spatial effects are generally continuous in nature, proximity may determine more correlated associations than those across distant locations within the same tumor space. Notably, this alludes to Tobler’s First Law of Geography, which states that “everything is related to everything else, but near things are more related than distant things.” Traditional testing of such relationships involves global or “aspatial” regression, with the implicit assumption that the impact of the genes in a gene-set (covariates) on the phenotype (spatial outcome) is constant across the tumor space (study area). In the presence of ITH, such a stationarity assumption is unlikely to be valid. Geographically weighted regression (GWR) is a well-known method (Brunsdon et al., 1996) that avoids this problem by performing the regression within local windows, in which each observation is weighted according to its proximity to the center of the window. Adaptive kernel bandwidths allow for heterogeneity among densities of gene expression over the windows in different parts of the study region. Local regression coefficients and associated statistics are mapped to visualize how the explanatory power of a gene-set on the associated phenotypes changes spatially.

In the present study, we combined gene-set analysis of LCT with the local spatial modeling of GWR with the aim to develop geographically weighted LCT (GWLCT) as a statistical test. We demonstrated it on spatial scRNA-seq data from a real breast tumor sample and obtained key insights into its molecular heterogeneity across different spatially continuous phenotypic contexts defined by five well-known markers of CAFs. We note that GWLCT has several distinct advantages. While the popular GSA methods are aspatial and use only bulk gene expression data, GWLCT is developed for spatial single cell gene expression data. The geographical weighting scheme allows nearby neighborhoods to contribute more to each local model, and the regions with significant association of a selected gene-set and a corresponding phenotype are detected using scan statistics on the local test scores and illustrated as maps. At each location, the combined significance of such associations for the selected gene-sets is computed and visualized with a spatial heatmap. We also present new 3D interactive tools for insightful visualization of the tumor space. In the next section, we describe the data and methods, followed by the results of real tumor data analysis and simulations of different association scenarios using GWLCT, and end with discussion, including future work.

Materials and methods

Data

Data for spatial transcriptomics were downloaded from the 10x Genomics website (https://www.10xgenomics.com/). In brief, the data were created using the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample that is expressing the Estrogen Receptor (ER), Progesterone Receptor (PR), and Human Epidermal Growth Factor Receptor (HER) negative. Illumina NovaSeq 6000 was used to generate the RNA sequencing data, which had a sequencing depth of 72,436 mean reads per cell. The downloaded dataset was filtered for average gene expression values greater than 1, and the resulting data matrix had 1,981 rows (genes) and 4,325 columns (single cells). The zero counts were substituted as part of the RNAseq data preparation with a relatively small random jitter about zero that would have the least impact on the remaining gene expression values. Using the bestNormalize package in the R programming language, we used a 10-fold cross-validation based data transformation strategy to normalize each gene’s expression across samples (Peterson and Peterson, 2020).
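The filtering and zero-substitution steps described above can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study; the function names, the jitter scale `eps`, the fixed seed and the toy matrix are all assumptions made for illustration.

```python
import random

def filter_genes(expr, min_mean=1.0):
    """Keep genes whose mean expression across cells exceeds min_mean.

    expr maps a gene name to its list of expression values (one per cell).
    """
    return {gene: values for gene, values in expr.items()
            if sum(values) / len(values) > min_mean}

def jitter_zeros(values, eps=1e-3, seed=0):
    """Replace exact zeros with a small random value drawn around zero.

    eps is a hypothetical jitter scale; the source only states that the
    jitter is small relative to the remaining expression values.
    """
    rng = random.Random(seed)
    return [rng.uniform(-eps, eps) if v == 0 else v for v in values]

# Toy example with two hypothetical genes measured in three cells.
expr = {"GENE_A": [0, 5, 4], "GENE_B": [0, 0, 1]}
kept = filter_genes(expr)
prepared = {gene: jitter_zeros(values) for gene, values in kept.items()}
```

Only `GENE_A` survives the mean-expression filter here, and its single zero is replaced by a value within `eps` of zero while the non-zero values are left unchanged.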

Gene-sets

We downloaded from the Molecular Signatures Database (MSigDB) candidate gene-sets that represent commonly known “hallmarks” of cancer (Liberzon et al., 2011). To ensure their relevance and non-redundancy, we selected 8 hallmark gene-sets with at least 25% overlap with the expressed genes (see above text on preprocessing) but mutual gene-set overlap of less than 10%. The selected hallmark gene-sets are: Epithelial Mesenchymal Transition (EMT, size = 81) (Sun et al., 2020), Angiogenesis (size = 12) (Madu et al., 2020), DNA Repair (DNA_Rep, size = 42) (Paluch-Shimon and Evron, 2019), Pi3k AKT MTOR Signaling (Pi3k, size = 28) (Dong et al., 2021), Fatty Acid Metabolism (FAM, size = 41) (Xu et al., 2021), P53 Pathway (P53, size = 50) (Gasco et al., 2002), Estrogen Response Early (ERE, size = 63) (Oshi et al., 2020), and Estrogen Response Late (ERL, size = 62) (Takeshita et al., 2021).
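The two selection criteria can be illustrated with a short Python sketch. The source does not specify how the overlaps were computed or in which order gene-sets were considered, so the definitions below (fraction of a gene-set found among expressed genes, pairwise overlap relative to the smaller set, and a greedy pass in input order) are assumptions, and all names are hypothetical.

```python
def expressed_fraction(gene_set, expressed):
    """Fraction of a gene-set's genes that appear among the expressed genes."""
    return len(gene_set & expressed) / len(gene_set)

def mutual_overlap(a, b):
    """Shared genes relative to the smaller of the two gene-sets (assumed definition)."""
    return len(a & b) / min(len(a), len(b))

def select_gene_sets(gene_sets, expressed, min_expressed=0.25, max_mutual=0.10):
    """Greedily keep gene-sets that are sufficiently expressed and non-redundant."""
    kept = {}
    for name, genes in gene_sets.items():
        if expressed_fraction(genes, expressed) < min_expressed:
            continue  # too few of its genes passed preprocessing
        if all(mutual_overlap(genes, other) < max_mutual for other in kept.values()):
            kept[name] = genes
    return kept

# Toy example with hypothetical gene identifiers.
expressed = {f"g{i}" for i in range(1, 11)}
gene_sets = {
    "S1": {"g1", "g2", "g3", "x1"},
    "S2": {"g1", "g2", "g3", "g4"},        # redundant with S1
    "S3": {"g9", "y1", "y2", "y3", "y4"},  # only 20% expressed
    "S4": {"g5", "g6", "g7", "g8"},
}
selected = select_gene_sets(gene_sets, expressed)
```

In this toy input, `S2` is dropped for redundancy with `S1` and `S3` for insufficient expression, leaving `S1` and `S4`.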

CAF markers

We selected a set of five CAF phenotypes, each represented by the expression of its corresponding marker gene (the respective phenotypes are noted in parentheses): CXCL12 (CAF-S1), FBLN1 (mCAFs), C3 (inflammatory CAFs), S100A4 (sCAFs), and COL11A1, which is a fibroblast-specific “remarkable biomarker” that codes for collagen 11-α1 and shows expression gain in CAFs (Vázquez-Villa et al., 2015). For details on the CAF markers, see reviews, e.g., (Gascard and Tlsty, 2016; Lee et al., 2020).

Statistical analysis

Similar to our previously developed GSA methods, GWLCT is motivated by a research gap, more precisely, the need for a statistical method taking into account spatial correlations across genes. The main goal of GSA methods is to efficiently screen large catalogues of a priori defined sets of genes sharing common biological functions, easily accessible to GSA users. GSA methods are testing for associations of such sets with a phenotype. To the best of our knowledge, there are no such methods developed for situations where gene measurements at spatial proximity could exhibit higher correlations. What is popularly known as Tobler’s “first law of geography” states that “everything is related to everything else, but near things are more related than distant things.” Based on this fundamental concept, which we borrowed from spatial data analysis, we provide below statistical derivations of an extension of LCT to geographically weighted spatial omic data.

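Consider gene expression data of $p$ gene variables ($X_1, \ldots, X_p$), $n$ cells (points) and $k$ sets of genes at locations given by 2-dimensional Cartesian coordinates. The LCT approach assumes a null hypothesis in which there is no association between a linear combination of $X_1, \ldots, X_p$ with the phenotype $Y$ (Dinu et al., 2013). For a local point $l$, we can define a univariate regression as:

$$Y_l = Z(X, \beta_l) + \varepsilon_l,$$

where $Z(X, \beta_l) = \sum_{j=1}^{p} \beta_{lj} X_j$ is the linear combination of $X_1, \ldots, X_p$ and $\beta_l$. For each location in the dataset, we can find the most significant linear combination as follows:

$$\hat{\beta}_l = \arg\max_{\beta} \frac{\beta^{T} s_l s_l^{T} \beta}{\beta^{T} \Sigma^{*}_l \beta},$$

where $s_l$ is the geographically weighted covariance vector between the genes and the phenotype at $l$, and $\Sigma^{*}_l$ represents the weighted shrunken covariance matrix for each calibration location. The weights are generated using a bisquare kernel function, based on the Euclidean distance $d_{ll'}$ between two points $l$ and $l'$ and bandwidth $h$, which determines the radius around the point $l$. Here, the optimal bandwidth is calculated using cross validation (CV) based on the sum of squared errors at each cell point and set of genes:

$$CV(h) = \sum_{l=1}^{n} \left( y_l - \hat{y}_{\neq l}(h) \right)^2,$$

where $\hat{y}_{\neq l}(h)$ is the value fitted at $l$ with bandwidth $h$ when observation $l$ is left out of the calibration.

The bandwidth with the least measure of CV is used for localization. Weighting functions of bisquare and tricube type kernels are used to take the weighted location at $l$ against another location $l'$ into account. The bisquare kernel weighting function is defined as:

$$w_{ll'} = \begin{cases} \left(1 - (d_{ll'}/h)^2\right)^2, & d_{ll'} < h \\ 0, & \text{otherwise} \end{cases}$$

and the tricube kernel weighting function as:

$$w_{ll'} = \begin{cases} \left(1 - (d_{ll'}/h)^3\right)^3, & d_{ll'} < h \\ 0, & \text{otherwise.} \end{cases}$$

For the weighting functions, the bandwidth $h$ can be determined either beforehand (fixed distance) or as the distance between the point $l$ and its $N$-th nearest neighbor (adaptive), where $N$ is predetermined as well.

The shrunken covariance matrix of the gene expression data in the cell $l$ and around the estimated bandwidth ($h$) can be written as:

$$\Sigma^{*}_l = \lambda_l T_l + (1 - \lambda_l) S_l, \qquad S_l = \frac{\sum_{l'} w_{ll'} (x_{l'} - \bar{x}_l)(x_{l'} - \bar{x}_l)^{T}}{\sum_{l'} w_{ll'}},$$

where $S_l$ is the geographically weighted sample covariance matrix with weighted mean $\bar{x}_l$, $T_l$ is the shrinkage target, and $\lambda_l \in [0, 1]$ is the shrinkage intensity.

Using the weighted shrunken covariance matrices, the most significant linear combination at location $l$ can be determined as the eigenvector associated with the largest eigenvalue of $(\Sigma^{*}_l)^{-1} s_l s_l^{T}$.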
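The local computation can be illustrated with a small NumPy sketch. It is a minimal example under stated assumptions rather than the authors' implementation: the shrinkage intensity `lam` is fixed instead of estimated, the shrinkage target is the diagonal of the weighted covariance, and the numerator of the criterion is taken to be the rank-one matrix built from the weighted gene-phenotype covariance, in which case the leading eigenvector is proportional to the solution of a single linear system. All function and parameter names are hypothetical.

```python
import numpy as np

def bisquare(d, h):
    """Bisquare kernel: (1 - (d/h)^2)^2 inside the bandwidth h, 0 outside."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)

def tricube(d, h):
    """Tricube kernel: (1 - (d/h)^3)^3 inside the bandwidth h, 0 outside."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def adaptive_bandwidth(dists, n_neighbors):
    """Distance from the focal point to its n_neighbors-th nearest neighbor.

    dists includes the focal point itself (distance 0) at sorted index 0.
    """
    return np.sort(dists)[n_neighbors]

def local_direction(X, y, coords, l, h, lam=0.2, kernel=bisquare):
    """Unit-norm direction maximizing the weighted criterion at location l.

    X: (n, p) expression matrix, y: (n,) phenotype, coords: (n, 2) positions.
    lam is a fixed shrinkage intensity (assumption; normally estimated).
    """
    d = np.linalg.norm(coords - coords[l], axis=1)
    w = kernel(d, h)
    w = w / w.sum()                      # normalize kernel weights
    Xc = X - w @ X                       # center on the weighted mean
    yc = y - w @ y
    S = (Xc * w[:, None]).T @ Xc         # weighted gene covariance
    s = (Xc * w[:, None]).T @ yc         # weighted gene-phenotype covariance
    S_star = lam * np.diag(np.diag(S)) + (1.0 - lam) * S  # shrink to diagonal
    # For the rank-one matrix inv(S_star) @ s @ s.T, the leading eigenvector
    # is proportional to inv(S_star) @ s, so one linear solve is enough.
    beta = np.linalg.solve(S_star, s)
    return beta / np.linalg.norm(beta)
```

A typical call would be `local_direction(X, y, coords, l=0, h=5.0)` for a fixed bandwidth, or `h=adaptive_bandwidth(dists, N)` for an adaptive one. In practice the significance of the resulting statistic would then be assessed by permutation, which this sketch does not cover.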

Combined significance mapping

We used Fisher’s method, based on the sum of logarithms of independent p-values, to calculate the combined significance ($S_c$) of association of the $k$ gene-sets (e.g., 8 hallmarks in the present example) with a fixed phenotype at each location. The sum $-2\sum_{i=1}^{k} \ln p_i$ follows a chi-square distribution with $2k$ degrees of freedom, which yields a combined p-value $p_c$ corresponding to $S_c$ at a given location. Thus, the combined significance is computed as $S_c = -\log(p_c)$ and plotted as a spatial heatmap.
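Fisher's combination at a single location can be sketched in a few lines of Python. The example relies on the closed form of the chi-square upper tail for an even number of degrees of freedom, so it needs only the standard library. The function names are hypothetical, the base-10 logarithm used for the heatmap score is an assumption, and p-values must be strictly positive (a permutation p-value of exactly zero would need to be floored first).

```python
import math

def fisher_combined_pvalue(pvalues):
    """Return Fisher's statistic and the combined p-value for k p-values.

    The statistic -2 * sum(ln p_i) is chi-square with 2k degrees of freedom
    under the null; for even degrees of freedom the upper tail is
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

def combined_significance(pvalues):
    """Negative log10 of the combined p-value (log base is an assumption)."""
    _, p_combined = fisher_combined_pvalue(pvalues)
    return -math.log10(p_combined)

# Eight hypothetical local p-values, one per hallmark gene-set.
local_p = [0.01, 0.20, 0.03, 0.50, 0.04, 0.70, 0.02, 0.10]
score = combined_significance(local_p)
```

Evaluating `combined_significance` at every location gives the values that would be drawn as the spatial heatmap.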

Spatial cluster detection

Based on the count of significant gene-sets as determined by GWLCT at a given location, spatial clusters are detected and mapped for the user. For this purpose, assuming a Poisson distribution of such counts over a grid of points placed on the tumor space, scan statistics are computed with Openshaw’s Geographical Analysis Machine (GAM) (Openshaw et al., 1987) function as implemented in the R package DCluster.
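The idea behind this kind of scan can be conveyed with a simplified Python sketch: circles are placed on grid points, the count inside each circle is compared with its expectation under a homogeneous Poisson model, and circles with a small upper-tail probability are flagged. This is not the DCluster implementation; the function names, the single fixed radius, and the `alpha` threshold are illustrative assumptions.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for i in range(observed):
        cdf += term
        term *= expected / (i + 1)
    return max(0.0, 1.0 - cdf)

def scan_circles(points, counts, grid, radius, alpha=0.002):
    """Return grid centres whose circle holds more counts than expected.

    points: (x, y) locations; counts: significant gene-sets per location.
    The expected count assumes one overall rate per location (assumption).
    """
    rate = sum(counts) / len(points)
    flagged = []
    for gx, gy in grid:
        inside = [c for (x, y), c in zip(points, counts)
                  if math.hypot(x - gx, y - gy) <= radius]
        if not inside:
            continue
        observed = sum(inside)
        expected = rate * len(inside)
        if poisson_upper_tail(observed, expected) < alpha:
            flagged.append((gx, gy))
    return flagged

# Toy tumor space: a 6 x 6 lattice with a hot spot in one corner.
points = [(x, y) for x in range(6) for y in range(6)]
counts = [8 if x <= 1 and y <= 1 else 0 for x, y in points]
clusters = scan_circles(points, counts, grid=[(0.5, 0.5), (4.5, 4.5)], radius=1.0)
```

Only the circle centred on the hot spot is flagged in this toy example; the circle in the opposite corner contains no significant gene-sets.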

Comparative analysis

A comparative analysis against popular aspatial GSA methods should help the reader understand the relevance of the GWLCT extension to the GSA literature. In addition to GWLCT, the global LCT, GSEA, and a Random Forest based GSEA (RF-GSEA) (Chien et al., 2014) were also performed to identify the global gene-sets associated with the outcome. In the domain of regression, the random forest based technique is used when the outcome of concern is a continuous phenotype. The GSEA ignores the continuous phenotype and checks if the gene-sets show statistical difference between biological states. The RF-GSEA combines bootstrapping and classification trees to find the proportion of explained variance of a continuous phenotype for a specific gene-set. The small sample size issue has been previously considered in these methods so that variable selection is conducted only from a small random subset of the variables. Moreover, the RF-GSEA is able to accommodate a continuous phenotype when the associations between gene-sets and phenotypes are non-linear and contain complicated high-order interaction effects (Chien et al., 2014). Next, we give details of the simulation studies, including our approach to generating low and high spatial correlations among genes’ expressions. This is a key aspect in observing and understanding the advantages of GWLCT over aspatial methods in our study.

Simulation study

Several scenarios were designed to find the impact of simulation components such as bandwidth, number of coordinate points, number of genes, spatial association among genes, spatial association between phenotype and genes, and gene-set probability on the statistical power of the methods. Similar to our previous simulation studies for LCT and its extensions (Wang et al., 2014), we generated a random subset of genes. The first step of the simulation was designed by making assumptions for the spatial distribution of the genes and phenotype. To do so, we used the “gstat” function in R for spatial and spatio-temporal geostatistical modeling, prediction and simulation. This function creates an R object with the necessary fields for univariate or multivariate geostatistical prediction, or its conditional or unconditional Gaussian or indicator simulation equivalents (Pebesma, 2004). Values were set for the variogram model components as 10 for the partial sill, 3 for the range parameter, 10 for the nugget, and 30 for the number of nearest observations that are used for the kriging simulation. Moreover, a Gaussian model was assumed for the distribution of the gene expressions and phenotypes.

In the next step, the GWLCT components were defined. For the gene-set matrix, a binomial distribution was used to generate a membership indicator matrix in which the proportions of genes belonging to the gene-sets were characterized using the probability parameter (Low = 0.3, and High = 0.9). Three different values for the spatial covariance were considered to imply the spatial association among genes’ expressions. The higher the spatial covariance, the lower the spatial association. Thus, a variance of 50 was considered as high which gives a corresponding low spatial association, a variance of 5 a moderate spatial association, and a variance of 0.1 a high spatial association.

As well, the spatial association between the continuous phenotype and the gene expression data was taken into account by a spatially and normally distributed phenotype generated from the gene expression data with the same parameters as for the spatial association among genes’ expressions. High and low levels for the radius/bandwidth around each location for the local analyses were 20 and 6, respectively. The number of coordinate points was 10 by 10 for low, and 100 by 100 for high. Finally, the two levels of low and high were defined for the total number of genes as 100 and 1000 respectively.

The above simulation components resulted in 144 scenarios in which an adaptive bandwidth kernel with the bisquare weighting function was used. The number of permutations was fixed and considered as 500, and the threshold of significance was assumed as 0.05. The methods were compared based on their statistical power. An average of statistical powers across the locations was computed and used as the performance measure for GWLCT.
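The scenario count can be checked by enumerating the factor levels listed above. The sketch below, in Python with hypothetical factor names, shows one factorization consistent with the reported total: four two-level factors and two three-level factors give 2 × 2 × 2 × 2 × 3 × 3 = 144 combinations. It also shows how power would be estimated as the proportion of p-values below the 0.05 threshold.

```python
from itertools import product

# Factor levels as described in the text; the key names are illustrative.
factors = {
    "bandwidth": [6, 20],
    "grid_side": [10, 100],                     # 10 x 10 or 100 x 100 points
    "n_genes": [100, 1000],
    "gene_set_probability": [0.3, 0.9],
    "gene_spatial_variance": [50, 5, 0.1],      # low, moderate, high association
    "phenotype_spatial_variance": [50, 5, 0.1],
}

scenarios = [dict(zip(factors, combo)) for combo in product(*factors.values())]

def empirical_power(pvalues, alpha=0.05):
    """Proportion of tests rejected at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)
```

For GWLCT, `empirical_power` would be computed per location and then averaged across locations, as described above.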

R version 4.1.1 was used for data analysis, with packages such as corpcor (Schafer et al., 2017), qvcalc (Firth and Firth, 2020), stringr (Wickham, 2010), and plotly (Sievert, 2020).

Results

The three aspatial methods, LCT, RF-GSEA, and GSEA, were used to identify the “hallmark” cancer gene-sets that are significantly associated with five spatially continuous CAF phenotypes represented by their known markers C3, COL11A1, CXCL12, FBLN1, and S100A4 in the single cell breast cancer spatial transcriptomic data. For each of the three aspatial methods, the phenotype, the size of each gene-set, the p-value of the test, and the corresponding q-value are shown in Table 1. We note that such global p-values cannot be obtained from our proposed GWLCT, as this method is spatial in nature and assesses significance at every coordinate point rather than producing an overall significance measure. Using GSEA, the only significant gene-set was P53 (p-value = 0.009, q-value = 0.010). The results of LCT and RF-GSEA revealed that all the gene-sets are strongly associated with the five continuous phenotypes, with p-values and q-values less than 0.001. This was expected since the candidate gene-sets represent commonly known “hallmarks” of cancer (Liberzon et al., 2011). The interpretation of the breast cancer data analysis based on the three aspatial methods is limited to these global significance values, which is an important limitation of aspatial methods.

For the remainder of this section, we emphasize the advantages of interpreting GWLCT results in the context of spatial data analysis. Since the main advantage of GWLCT consists of an analysis beyond global significance, we examine significance specifically at locations across the tumor space. Scan statistics provide a well-established computational method for detecting spatial clusters based on point count data. We computed scan statistics to detect the putative clusters of spatial regulation based on the number of cells with significantly enriched gene-sets, at locations where such counts exceeded what may be expected from an underlying Poisson distribution defined over the tumor space. The clusters are demarcated as white regions in Figure 1.

Notably, the 3D plots, which are instrumental in showcasing the advantages of GWLCT over existing aspatial GSA methods, are presented here as well as on GitHub: https://mortezahaji.github.io/GWLCT-Project/. By clicking on any point in the plot, one can find the coordinates of the cell and the corresponding number and names of significant gene-sets at three different levels of expression of the selected CAF marker gene: Low, Moderate, and High.

TABLE 1

Method	Phenotype	Gene-set name	Gene-set size	p-value	q-value
GSEA (No phenotype specified)	Not Applicable	EMT	81	0.504	0.648
		ANGIOGENESIS	12	0.227	0.486
		DNA_Rep	41	0.425	0.763
		PI3K	28	0.059	0.155
		FAM	40	0.623	0.648
		P53	50	0.009	0.010
		ERE	64	0.178	0.486
		ERL	62	0.297	0.486
Global LCT	C3, COL11A1, CXCL12, FBLN1, S100A4	EMT	81	<0.001	<0.001
		ANGIOGENESIS	12	<0.001	<0.001
		DNA_Rep	41	<0.001	<0.001
		PI3K	28	<0.001	<0.001
		FAM	40	<0.001	<0.001
		P53	50	<0.001	<0.001
		ERE	64	<0.001	<0.001
		ERL	62	<0.001	<0.001
RF-GSEA	C3, COL11A1, CXCL12, FBLN1, S100A4	EMT	81	<0.001	<0.001
		ANGIOGENESIS	12	<0.001	<0.001
		DNA_Rep	41	<0.001	<0.001
		PI3K	28	<0.001	<0.001
		FAM	40	<0.001	<0.001
		P53	50	<0.001	<0.001
		ERE	64	<0.001	<0.001
		ERL	62	<0.001	<0.001

An evaluation of cancer hallmark gene-sets associated with five continuous phenotypes C3, COL11A1, CXCL12, FBLN1, and S100A4 using the aspatial methods including GSEA, LCT, and RF-GSEA on a single cell breast cancer study.

FIGURE 1

In addition, for local GWLCT, three different CAF categories were identified as Low (CAF gene expression less than 0.5), Moderate (CAF gene expression between 0.5 and 1), and High (CAF gene expression exceeding 1). Figure 2 shows a snapshot of the 3D plots for the 5 phenotypes at three CAF levels. A snapshot of the 3D plot for the phenotype COL11A1 at the high CAF level is also shown in Figure 3. One can detect the frequency as well as the names of significant gene-sets at each location (based on the 8 gene-sets) by clicking on each dot, and by rotating and zooming into the 3D interactive plot available at the above-mentioned website. The 3D plot in Figure 3 is divided into eight 2-dimensional plots in Figure 4 so that one can evaluate the distribution of one to eight significant gene-sets across the regions with COL11A1 expressed at the high CAF level.

FIGURE 2

FIGURE 3

FIGURE 4

Finally, Figure 5 shows the combined significance heatmap of the 8 hallmarks for the COL11A1 phenotype at a high CAF level. Higher values of the combined significance in Figure 5 represent locations where the gene-sets are jointly significant.

FIGURE 5

Simulation study results

Supplementary Table S1 and Figure 6 show the estimated statistical power of the methods in the simulation study. GSEA has the least statistical power among the four methods in all 144 scenarios. Using 500 iterations in the simulation study, it was revealed that regardless of any parameter in the simulation, GSEA and GWLCT always have the least and most statistical power among the methods, respectively. Overall, we note that bandwidth, number of coordinate points, gene-set size, and total number of genes in each simulation experiment do not affect the statistical power of the methods, a desirable feature shared by sound gene-set analysis methods.

FIGURE 6

Specifically, taking GWLCT as the reference category and using Dunnett’s test, we compared the average power of GWLCT against Bulk GSEA, LCT, and RF-GSEA by varying from Low to Moderate to High levels of six different parameters, as shown in Figure 6A through F.

(A) For the low bandwidth, the mean power of GWLCT was significantly higher than that of LCT and Bulk GSEA (p < 0.001), whereas RF-GSEA and GWLCT were not statistically different (p = 0.534). For the high bandwidth, GWLCT was not different from RF-GSEA (p = 0.709) or LCT (p = 0.126), but was different from Bulk GSEA (p < 0.001).

(B) GWLCT performed better than Bulk GSEA (p < 0.001 for low and high numbers of coordinate points) and LCT (p = 0.021 for the low number of coordinate points, and p = 0.002 for the high number). No statistical difference was found between GWLCT and RF-GSEA (p = 0.541 for the low number of coordinate points, and p = 0.715 for the high number).

(C) GWLCT performed better than Bulk GSEA (p < 0.001 for low and high numbers of genes) and LCT (p = 0.004 for the low number of genes, and p = 0.019 for the high number), and the same as RF-GSEA (p = 0.599 for the low number of genes, and p = 0.590 for the high number).

(D) GWLCT performed better than Bulk GSEA and LCT (p < 0.001 for low and high levels of gene-set probability, i.e., size of gene-sets). No statistical difference was found between GWLCT and RF-GSEA (p = 0.397 for the low level of gene-set probability, and p = 0.820 for the high level).

(E) For the high level of spatial association between genes and continuous phenotype, GWLCT outperformed Bulk GSEA (p < 0.001) and LCT (p = 0.005), with no statistical difference between GWLCT and RF-GSEA (p = 0.778). For the moderate level, GWLCT outperformed Bulk GSEA (p < 0.001) and LCT (p < 0.001), with no statistical difference between GWLCT and RF-GSEA (p = 0.399). For the low level, GWLCT outperformed RF-GSEA (p < 0.001), Bulk GSEA (p < 0.001), and LCT (p < 0.001).

(F) For the high level of spatial association between genes, GWLCT outperformed Bulk GSEA (p < 0.001), with no statistical difference between GWLCT and RF-GSEA (p = 0.849) or LCT (p = 0.523). For the moderate level, GWLCT outperformed Bulk GSEA (p < 0.001) and LCT (p = 0.012), with no statistical difference between GWLCT and RF-GSEA (p = 0.768). For the low level, GWLCT outperformed Bulk GSEA (p < 0.001) and LCT (p < 0.001), with no statistical difference between GWLCT and RF-GSEA (p = 0.528).

The statistical power of GSEA is affected by all the simulation variables; however, this is mostly due to the zero power observed in some scenarios. For GSEA, a larger number of coordinate points, a higher number of genes, a higher spatial association among gene expressions as well as between the phenotype and the gene expressions, and a higher number of genes in the gene-sets all lead to higher statistical power. Overall, GSEA has clearly lower statistical power than the other three approaches. Regardless of the scenario, the results of a one-way analysis of variance show that there are significant differences in the statistical power of LCT, GWLCT, and RF-GSEA (F = 7.842, p < 0.001). Tukey’s multiple comparison revealed that the difference is due to the lower statistical power of LCT compared to the other two methods. Treating the other simulation variables as fixed effects, LCT has lower statistical power than GWLCT and RF-GSEA when the bandwidth is low. For a study with a low number of coordinate points, the power of GWLCT is significantly higher than that of LCT. The almost flat trend of power across the number of coordinate points also shows that the performance of the methods is robust to this parameter. As the number of genes increases, the power of LCT drops significantly compared to GWLCT and RF-GSEA. Moreover, as the spatial association among genes decreases, the power of LCT becomes significantly lower than that of GWLCT and RF-GSEA. At high levels of spatial association among genes, LCT performs reasonably well compared to GWLCT and RF-GSEA, as it is designed to accommodate correlations across genes in a set or biological pathway via a shrinkage correlation matrix. However, at lower levels of spatial correlation among genes, GWLCT and RF-GSEA outperform LCT. Moreover, as the spatial association between phenotype and genes increases, the power of GWLCT is significantly higher than that of LCT and RF-GSEA. In addition, the statistical power is robust to the probability of genes belonging to the gene-sets, which is directly related to gene-set size. Although the power increases slightly for all the methods when the gene-sets are larger, the increase between the Low and High levels of the binomial probability parameter is not statistically significant.
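As a reading aid for the F statistic reported above, the following minimal Python sketch computes a one-way ANOVA F statistic for three groups of power values. The numbers are hypothetical placeholders, not the simulation results of this study, and the function name `one_way_anova_f` is an assumption introduced for illustration. The p-value and Tukey’s post hoc comparisons would normally be obtained from a statistics package.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-scenario power values (NOT taken from the paper).
power_lct = [0.40, 0.55, 0.60, 0.45]
power_gwlct = [0.80, 0.90, 0.95, 0.85]
power_rf_gsea = [0.75, 0.85, 0.90, 0.80]

f_stat, df1, df2 = one_way_anova_f([power_lct, power_gwlct, power_rf_gsea])
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

A large F indicates that the group means differ by more than the within-group scatter would explain; which pairs differ is then a question for a post hoc procedure such as Tukey’s.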

Therefore, the most important variables influencing statistical power are the spatial association features. There is a dose-response trend of improvement in statistical power for each method as the spatial association among genes increases. GWLCT outperforms all other methods at the low, medium, and high levels of spatial association among genes. The performance gap narrows as we move from Low to High levels, indicating that correlations of higher magnitude are easier to detect. Interestingly, GWLCT picks up even subtle spatial associations across genes, exhibiting its largest improvement in statistical power over the other methods at the Low level of spatial association. The spatial association between phenotype and gene expressions also plays a key role in method performance. When there is a low spatial association between phenotype and genes, GWLCT is still able to detect true significant associations between gene-sets and phenotype, while all other methods have a flat zero statistical power. As with the spatial associations across genes, GWLCT picks up subtle signals at low levels of spatial association between phenotype and gene expression. GWLCT outperforms all other methods at Moderate and High spatial association between phenotype and genes, with the highest statistical power, 0.95, across all simulation scenarios. Statistical power increases significantly as the spatial association between phenotype and genes increases. The highest power for GWLCT and RF-GSEA is achieved when both spatial association variables are high. In contrast to GWLCT, RF-GSEA loses its statistical power when the spatial association among the genes is at the Low level. GWLCT can identify more subtle signals of spatial association, which is an attractive property of the proposed method. More details on the statistical power of the methods under different circumstances can be found in Supplementary Table S2.

Discussion

Observations of heterogeneity of cell subpopulations in a tumor, and of the complex interplay of functions involved in the diverse morphological and phenotypic profiles of cancers, have a long history. Even in the 19th century, pleomorphism of cancer cells within tumors was observed by the “father of modern pathology”, Rudolf Virchow. More recently, in the 1970s, G.H. Heppner, I.J. Fidler and others showed the existence of distinct subpopulations of cancer cells in tumors, which differed in terms of their tumorigenicity, resistance to treatment, and potential to metastasize. Heppner reviewed the concept of tumor heterogeneity in 1984, and recognized cancers as being composed of multiple subpopulations (Heppner and Miller, 1983), which leads to heterogeneity of cellular morphology, gene expression, metabolism, motility, proliferation, etc. (Marusyk et al., 2020). Importantly, ITH has been shown to be associated with poor outcome and decreased response to cancer treatment in multiple human cancer types, implying a general role in therapeutic resistance (Landau et al., 2013; Patel et al., 2014; Zhang et al., 2014).

The past decade has revealed the immense potential of immunotherapy in cancer. Therapies that promote anti-tumor immune responses have resulted in marked and durable responses in subsets of patients in several cancers (Egen et al., 2020). For instance, abundance of tumor-infiltrating lymphocytes (TILs) and absence of lymphovascular invasion were found to be useful prognostic factors for disease-free survival in patients with HR-/HER2+ breast cancer who were treated using adjuvant trastuzumab (Lee et al., 2015). A spatial transcriptomic approach (Ståhl et al., 2016) was used to identify a type I interferon response overlapping with regions of T cell and macrophage subset co-localization in HER2+ breast tumors (Andersson et al., 2021). To address the complex interplay between different molecular backgrounds that can characterize ITH with spatial precision, we used GWLCT at a given location in the tissue space to test for the association between a phenotype of interest and different selected gene-sets. A score summarizing the overall significance of such associations is then computed and visualized as a spatial heatmap.
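The idea of testing "at a given location" can be illustrated with a small Python sketch of kernel weighting around one focal spot. It assumes a Gaussian kernel with a fixed bandwidth and computes a geographically weighted correlation between a single gene and the phenotype. This is a simplified stand-in, not the GWLCT statistic itself, which works with a shrunken covariance matrix and a linear combination of the genes in a set. All function names and data below are hypothetical.

```python
import math


def gaussian_weights(coords, focal, bandwidth):
    """Gaussian kernel weight of every spot relative to the focal location."""
    weights = []
    for x, y in coords:
        d = math.hypot(x - focal[0], y - focal[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    return weights


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y under non-negative weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


# Hypothetical toy data: four spots on a line, one gene, one phenotype.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
gene = [1.0, 2.0, 3.0, 1.0]
phenotype = [0.5, 1.0, 1.5, 3.0]

# Spots close to the focal location dominate the local estimate.
w = gaussian_weights(coords, focal=(0.0, 0.0), bandwidth=1.0)
print(round(weighted_correlation(gene, phenotype, w), 3))
```

Repeating the calculation with each spot in turn as the focal location gives one local value per spot, which is the kind of quantity a spatial heatmap displays.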

Gene-set analysis (GSA) is a well-established methodological approach in bioinformatics to test for significant regulation of a selected collection of genes across given samples that represent distinct outcomes. At the level of single cells, GSA could be extended to samples that are individual cells, which can take different phenotypes of interest (Dinu et al., 2021). Furthermore, for spatial single cell analysis, such phenotypes would ideally have a spatially correlated and continuous representation. The gene-sets used in GSA are typically curated based on existing experimentally obtained knowledge of genes and their involvement in molecular pathways. In the present study, for illustrative purposes, we selected a collection of 8 gene-sets that represent certain distinctive hallmarks of cancer (Hanahan, 2022). To test for their enrichment in relevant intratumor contexts, we selected 5 different CAF phenotypes of interest, since CAFs are well known for their contribution to heterogeneity and plasticity in the tumor microenvironment (Ping et al., 2021).

The usual methods for GSA follow one of two major approaches: (a) competitive, which examines whether the correlation of a gene-set with the phenotype is the same as that of the other gene-sets, and (b) self-contained, which investigates whether the expression of a gene-set changes with the experimental condition. Our LCT method belongs to the former approach, which is more likely than traditional methods to detect the regulation of a functional process or biological pathway that is significantly associated with the gene expression results of a given SCA experiment (Dinu et al., 2021). Interestingly, LCT also extends to longitudinal (Khodayari Moez et al., 2019), multivariate and continuous outcomes (Wang et al., 2014), capabilities that we built upon here to represent the single cell level stochasticity of transcriptomic behavior more accurately than the univariate and discrete class labels typically used in traditional bulk sample studies.
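To make the distinction concrete, the sketch below shows how a self-contained null hypothesis is commonly assessed: the phenotype values are permuted across samples while the gene-set is held fixed. (A competitive test would instead compare the set against other genes.) The statistic used here, the absolute correlation between the per-sample mean expression of the set and the phenotype, is a deliberately simple stand-in and is not the LCT statistic; the function names, the default of 999 permutations, and the seed are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def self_contained_pvalue(expr, phenotype, n_perm=999, seed=0):
    """Permutation p-value for a gene-set versus a continuous phenotype.

    expr: one row per sample, each row holding the expression values
    of the genes in the set (hypothetical input layout).
    """
    rng = random.Random(seed)
    score = [sum(row) / len(row) for row in expr]  # per-sample set score
    observed = abs(pearson(score, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the sample-phenotype pairing
        if abs(pearson(score, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 10 samples, a 2-gene set tracking the phenotype.
expr = [[i, i + 1] for i in range(10)]
phenotype = [2.0 * i for i in range(10)]
print(self_contained_pvalue(expr, phenotype))
```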

Simulations, along with real omic data analysis, have served as a powerful and effective tool for establishing the performance of new GSA methods. Past studies have thus used simulation for comparative analysis of LCT and other major GSA methods under different performance criteria. It was found that LCT has type I error and power that are comparable to MANOVA-GSA (Wang et al., 2014) and superior to SAM-GS (Dinu et al., 2007), particularly at higher magnitudes of correlation across gene-sets (as is commonly noted during GSA). In terms of computational efficiency, LCT outperformed both methods. In another simulation study, LCT also outperformed GSEA (Moez et al., 2018). Along this direction, therefore, in the present study, we conducted a large number of simulations to compare the performance of GWLCT against multiple known GSA techniques based on a variety of well-defined criteria under different experimental assumptions (or scenarios). Interestingly, statistical power did not change with variation in set size, number of coordinate points, bandwidth, or total number of genes in the simulation dataset for any of the methods considered, which is a desirable property for sound GSA methods. Larger spatial correlation across gene expression measurements and between genes and phenotype is the key driver of improved statistical power across our simulation experiments.
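In simulation studies of this kind, power and type I error are usually estimated as rejection rates over replicates. The short Python sketch below shows that calculation under a conventional significance level of 0.05; the p-values are hypothetical and the function name `rejection_rate` is introduced only for illustration.

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated replicates whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values from replicates simulated WITH a true association.
p_alt = [0.001, 0.020, 0.300, 0.004, 0.049]
# Hypothetical p-values from replicates simulated under the null.
p_null = [0.60, 0.03, 0.45, 0.81, 0.27]

print("empirical power:", rejection_rate(p_alt))         # 4 of 5 rejected
print("empirical type I error:", rejection_rate(p_null))  # 1 of 5 rejected
```

The same function yields power when the replicates contain a true signal and type I error when they do not, which is why both quantities can be tabulated per scenario from one set of simulation outputs.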

In the present study, we introduced GWLCT as a new computational platform that presents a fusion of ideas from spatial data analysis (GWR) and bioinformatics (GSA). We understand that the dual modes, both spatial and single-cell, in which GWLCT provides a joint extension to other GSA approaches place it in a unique category, thus making it difficult to compare with the existing methods. Yet, we conducted extensive simulation studies, which revealed better performance of GWLCT on several criteria as compared to many known GSA methods that work either on bulk transcriptomics for different scenarios or on an aspatial version of single-cell transcriptomics. In particular, the use of multiple different kernels and a flexible (adaptive) choice of corresponding bandwidths for geographical weighting allows the linear combination test to detect local associations between selected gene-sets and phenotypic contexts within a tumor sample. Thus, GWLCT provides a novel spatial version of gene-set analysis using high-resolution spatial scRNA-seq data. It does have some limitations that will be addressed in our future work. For instance, as it is difficult to determine a priori the precise spatial scale at which a gene-set or pathway may be regulated in a given phenotypic context, new multi-scale geographical weighting techniques (Fotheringham et al., 2017) may prove to be useful. We will also extend GWLCT to other omic data, as we have previously demonstrated with LCT (Khodayari Moez et al., 2019). As spatial single cell omic platforms become increasingly popular, GWLCT will enrich the ongoing efforts in this rapidly emerging area of research (Zhang et al., 2014; Eng et al., 2019; Hajihosseini et al., 2022). Clinical verification of such new analytical methods will require follow-up studies that must be systematically designed for that specific purpose.
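The difference between a fixed and an adaptive bandwidth can be sketched in a few lines of Python. The example assumes two conventions that are common in geographically weighted regression but are not confirmed here as the authors' implementation: the adaptive bandwidth at a location is the distance to its k-th nearest spot, and weights follow a bisquare kernel. Function names and coordinates are hypothetical.

```python
import math


def adaptive_bandwidth(coords, focal, k):
    """Distance from the focal location to its k-th nearest spot."""
    distances = sorted(math.hypot(x - focal[0], y - focal[1]) for x, y in coords)
    return distances[k - 1]


def bisquare_weights(coords, focal, bandwidth):
    """Bisquare kernel: smooth decay inside the bandwidth, zero at and beyond it."""
    weights = []
    for x, y in coords:
        d = math.hypot(x - focal[0], y - focal[1])
        weights.append((1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0)
    return weights


# Hypothetical spots: three close together and one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
focal = (0.0, 0.0)

# The focal spot itself is in coords, so it counts as the 1st neighbour.
h = adaptive_bandwidth(coords, focal, k=3)
print(h, bisquare_weights(coords, focal, h))
```

With an adaptive rule the bandwidth shrinks where spots are dense and grows where they are sparse, so each local test draws on a comparable number of observations; a fixed bandwidth instead keeps the spatial scale constant. Choosing between them, and choosing k or h, is typically done by cross-validation.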

Statements

Data availability statement

The de-identified “Spatial Gene Expression Dataset by Space Ranger 1.2.0” used in the present study was obtained from the 10X Genomics website at https://www.10xgenomics.com/resources/datasets/human-breast-cancer-whole-transcriptome-analysis-1-standard-1-2-0. 10X Genomics obtained fresh frozen human Invasive Lobular Carcinoma breast tissue from BioIVT Asterand. It was AJCC/UICC Stage Group I, ER positive, PR positive, HER2 negative. For further details, see the above website.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions

PA: software coding, data analysis, drafting the manuscript, reviewing the results, and approving the final version of the manuscript. ID: study conception and design, interpreting the results, reviewing the results, and approving the final version of the manuscript. SP: study conception and design, interpreting the results, reviewing the results, and approving the final version of the manuscript. MH: data preparation, contribution to data analysis, reviewing the results, and approving the final version of the manuscript.

Funding

This research was supported by a Fellowship from Mathematics of Information Technology and Complex Systems Accelerate program (grant number: RES0047324, recipient: ID).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2023.1065586/full#supplementary-material

References

1
Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J., and Van Oudenaarden, A. (2018). Whole-organism clone tracing using single-cell sequencing. Nature 556 (7699), 108–112. doi:10.1038/nature25969
2
Anderberg, C., and Pietras, K. (2009). On the origin of cancer-associated fibroblasts. Oxfordshire, United Kingdom: Taylor and Francis.
3
Andersson, A., Larsson, L., Stenbeck, L., Salmén, F., Ehinger, A., Wu, S. Z., et al. (2021). Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12 (1), 6012. doi:10.1038/s41467-021-26271-2
4
Beiki, O., Hall, P., Ekbom, A., and Moradi, T. (2012). Breast cancer incidence and case fatality among 4.7 million women in relation to social and ethnic background: A population-based cohort study. Breast Cancer Res. 14 (1), R5–R13. doi:10.1186/bcr3086
5
Bernardo, M. E., and Fibbe, W. E. (2013). Mesenchymal stromal cells: Sensors and switchers of inflammation. Cell Stem Cell 13 (4), 392–402. doi:10.1016/j.stem.2013.09.006
6
Brechbuhl, H. M., Finlay-Schultz, J., Yamamoto, T. M., Gillen, A. E., Cittelly, D. M., Tan, A-C., et al. (2017). Fibroblast subtypes regulate responsiveness of luminal breast cancer to estrogen. Clin. Cancer Res. 23 (7), 1710–1721. doi:10.1158/1078-0432.CCR-15-2851
7
Brunsdon, C., Fotheringham, A. S., and Charlton, M. E. (1996). Geographically weighted regression: A method for exploring spatial nonstationarity. Geogr. Anal. 28 (4), 281–298. doi:10.1111/j.1538-4632.1996.tb00936.x
8
Chang, P. H., Hwang-Verslues, W. W., Chang, Y. C., Chen, C. C., Hsiao, M., Jeng, Y. M., et al. (2012). Activation of Robo1 signaling of breast cancer cells by Slit2 from stromal fibroblast restrains tumorigenesis via blocking PI3K/Akt/β-catenin pathway. Cancer Res. 72 (18), 4652–4661. doi:10.1158/0008-5472.CAN-12-0877
9
Chen, X., and Song, E. (2019). Turning foes to friends: Targeting cancer-associated fibroblasts. Nat. Rev. Drug Discov. 18 (2), 99–115. doi:10.1038/s41573-018-0004-1
10
Chien, C-Y., Chang, C-W., Tsai, C-A., and Chen, J. J. (2014). MAVTgsa: an R package for gene set (enrichment) analysis. BioMed Res. Int. 2014, 346074. doi:10.1155/2014/346074
11
Codeluppi, S., Borm, L. E., Zeisel, A., La Manno, G., van Lunteren, J. A., Svensson, C. I., et al. (2018). Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15 (11), 932–935. doi:10.1038/s41592-018-0175-z
12
Cortez, E., Roswall, P., and Pietras, K. (2014). Functional subsets of mesenchymal cell types in the tumor microenvironment. Seminars Cancer Biol. 25, 3–9. Elsevier. doi:10.1016/j.semcancer.2013.12.010
13
Costa, A., Kieffer, Y., Scholer-Dahirel, A., Pelon, F., Bourachot, B., Cardon, M., et al. (2018). Fibroblast heterogeneity and immunosuppressive environment in human breast cancer. Cancer Cell 33 (3), 463–479. doi:10.1016/j.ccell.2018.01.011
14
Cuiffo, B. G., and Karnoub, A. E. (2012). Mesenchymal stem cells in tumor development: Emerging roles and concepts. Cell Adhesion Migr. 6 (3), 220–230. doi:10.4161/cam.20875
15
Davidson, S., Efremova, M., Riedel, A., Mahata, B., Pramanik, J., Huuhtanen, J., et al. (2020). Single-cell RNA sequencing reveals a dynamic stromal niche that supports tumor growth. Cell Rep. 31 (7), 107628. doi:10.1016/j.celrep.2020.107628
16
Dinu, I., Moez, E. K., Hajihosseini, M., Leite, A., and Pyne, S. (2021). Use of linear combination test to identify gene signatures of human embryonic development in single cell RNA-seq experiments. Statistics Appl. 19 (1), 431–442.
17
Dinu, I., Potter, J. D., Mueller, T., Liu, Q., Adewale, A. J., Jhangri, G. S., et al. (2007). Improving gene set analysis of microarray data by SAM-GS. BMC Bioinforma. 8 (1), 242–313. doi:10.1186/1471-2105-8-242
18
Dinu, I., Wang, X., Kelemen, L. E., Vatanpour, S., and Pyne, S. (2013). Linear combination test for gene set analysis of a continuous phenotype. BMC Bioinforma. 14 (1), 212–219. doi:10.1186/1471-2105-14-212
19
Dominguez, C. X., Müller, S., Keerthivasan, S., Koeppen, H., Hung, J., Gierke, S., et al. (2020). Single-cell RNA sequencing reveals stromal evolution into LRRC15+ myofibroblasts as a determinant of patient response to cancer immunotherapy. Cancer Discov. 10 (2), 232–253. doi:10.1158/2159-8290.CD-19-0644
20
Dong, C., Wu, J., Chen, Y., Nie, J., and Chen, C. (2021). Activation of PI3K/AKT/mTOR pathway causes drug resistance in breast cancer. Front. Pharmacol. 12, 628690. doi:10.3389/fphar.2021.628690
21
Du, H., and Che, G. (2017). Genetic alterations and epigenetic alterations of cancer-associated fibroblasts. Oncol. Lett. 13 (1), 3–12. doi:10.3892/ol.2016.5451
22
Efron, B., and Tibshirani, R. (2007). On testing the significance of sets of genes. Ann. Appl. Statistics 1 (1), 107–129. doi:10.1214/07-aoas101
23
Egen, J. G., Ouyang, W., and Wu, L. C. (2020). Human anti-tumor immunity: Insights from immunotherapy clinical trials. Immunity 52 (1), 36–54. doi:10.1016/j.immuni.2019.12.010
24
Elyada, E., Bolisetty, M., Laise, P., Flynn, W. F., Courtois, E. T., Burkhart, R. A., et al. (2019). Cross-species single-cell analysis of pancreatic ductal adenocarcinoma reveals antigen-presenting cancer-associated fibroblasts. Cancer Discov. 9 (8), 1102–1123. doi:10.1158/2159-8290.CD-19-0094
25
Eng, C-H. L., Lawson, M., Zhu, Q., Dries, R., Koulena, N., Takei, Y., et al. (2019). Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568 (7751), 235–239. doi:10.1038/s41586-019-1049-y
26
Firth, D., and Firth, M. D. (2020). Package ‘qvcalc’.
27
Flavahan, W. A., Gaskell, E., and Bernstein, B. E. (2017). Epigenetic plasticity and the hallmarks of cancer. Science 357 (6348), eaal2380. doi:10.1126/science.aal2380
28
Fotheringham, A. S., Yang, W., and Kang, W. (2017). Multiscale geographically weighted regression (MGWR). Ann. Am. Assoc. Geogr. 107 (6), 1247–1265. doi:10.1080/24694452.2017.1352480
29
Frieda, K. L., Linton, J. M., Hormoz, S., Choi, J., Chow, K-H. K., Singer, Z. S., et al. (2017). Synthetic recording and in situ readout of lineage information in single cells. Nature 541 (7635), 107–111. doi:10.1038/nature20777
30
Friedman, G., Levi-Galibov, O., David, E., Bornstein, C., Giladi, A., Dadiani, M., et al. (2020). Cancer-associated fibroblast compositions change with breast cancer progression linking the ratio of S100A4+ and PDPN+ CAFs to clinical outcome. Nat. Cancer 1 (7), 692–708. doi:10.1038/s43018-020-0082-y
31
Gascard, P., and Tlsty, T. D. (2016). Carcinoma-associated fibroblasts: Orchestrating the composition of malignancy. Genes and Dev. 30 (9), 1002–1019. doi:10.1101/gad.279737.116
32
Gasco, M., Shami, S., and Crook, T. (2002). The p53 pathway in breast cancer. Breast Cancer Res. 4 (2), 70–76. doi:10.1186/bcr426
33
Gerlinger, M., Rowan, A. J., Horswell, S., Larkin, J., Endesfelder, D., Gronroos, E., et al. (2012). Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892. doi:10.1056/NEJMoa1113205
34
Goeman, J. J., Van De Geer, S. A., De Kort, F., and Van Houwelingen, H. C. (2004). A global test for groups of genes: Testing association with a clinical outcome. Bioinformatics 20 (1), 93–99. doi:10.1093/bioinformatics/btg382
35
Hajihosseini, M., Amini, P., Voicu, D., Dinu, I., and Pyne, S. (2022). Geostatistical modeling and heterogeneity analysis of tumor molecular landscape. Cancers (Basel) 14 (21), 5235. Preprints; 2022090388. doi:10.3390/cancers14215235
36
Hanahan, D. (2022). Hallmarks of cancer: New dimensions. Cancer Discov. 12 (1), 31–46. doi:10.1158/2159-8290.CD-21-1059
37
Hanahan, D., and Weinberg, R. A. (2011). Hallmarks of cancer: The next generation. Cell 144 (5), 646–674. doi:10.1016/j.cell.2011.02.013
38
Heppner, G. H., and Miller, B. E. (1983). Tumor heterogeneity: Biological implications and therapeutic consequences. Cancer Metastasis Rev. 2 (1), 5–23. doi:10.1007/BF00046903
39
Hosein, A. N., Huang, H., Wang, Z., Parmar, K., Du, W., Huang, J., et al. (2019). Cellular heterogeneity during mouse pancreatic ductal adenocarcinoma progression at single-cell resolution. JCI Insight 5 (16), e129212. doi:10.1172/jci.insight.129212
40
Jamal-Hanjani, M., Quezada, S. A., Larkin, J., and Swanton, C. (2015). Translational implications of tumor heterogeneity. Clin. Cancer Res. 21 (6), 1258–1266. doi:10.1158/1078-0432.CCR-14-1429
41
Jamal-Hanjani, M., Wilson, G. A., McGranahan, N., Birkbak, N. J., Watkins, T. B., Veeriah, S., et al. (2017). Tracking the evolution of non–small-cell lung cancer. N. Engl. J. Med. 376 (22), 2109–2121. doi:10.1056/NEJMoa1616288
42
Jenkinson, G., Pujadas, E., Goutsias, J., and Feinberg, A. P. (2017). Potential energy landscapes identify the information-theoretic nature of the epigenome. Nat. Genet. 49 (5), 719–729. doi:10.1038/ng.3811
43
Junttila, M. R., and De Sauvage, F. J. (2013). Influence of tumour micro-environment heterogeneity on therapeutic response. Nature 501 (7467), 346–354. doi:10.1038/nature12626
44
Kalisky, T., Oriel, S., Bar-Lev, T. H., Ben-Haim, N., Trink, A., Wineberg, Y., et al. (2018). A brief review of single-cell transcriptomic technologies. Briefings Funct. Genomics 17 (1), 64–76. doi:10.1093/bfgp/elx019
45
Kalluri, R. (2016). The biology and function of fibroblasts in cancer. Nat. Rev. Cancer 16 (9), 582–598. doi:10.1038/nrc.2016.73
46
Khodayari Moez, E., Hajihosseini, M., Andrews, J. L., and Dinu, I. (2019). Longitudinal linear combination test for gene set analysis. BMC Bioinforma. 20 (1), 650–719. doi:10.1186/s12859-019-3221-7
47
Koliaraki, V., Pasparakis, M., and Kollias, G. (2015). IKKβ in intestinal mesenchymal cells promotes initiation of colitis-associated cancer. J. Exp. Med. 212 (13), 2235–2251. doi:10.1084/jem.20150542
48
Kong, S. W., Pu, W. T., and Park, P. J. (2006). A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics 22 (19), 2373–2380. doi:10.1093/bioinformatics/btl401
49
Lambrechts, D., Wauters, E., Boeckx, B., Aibar, S., Nittner, D., Burton, O., et al. (2018). Phenotype molding of stromal cells in the lung tumor microenvironment. Nat. Med. 24 (8), 1277–1289. doi:10.1038/s41591-018-0096-5
50
Landau, D. A., Carter, S. L., Stojanov, P., McKenna, A., Stevenson, K., Lawrence, M. S., et al. (2013). Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152 (4), 714–726. doi:10.1016/j.cell.2013.01.019
51
Landau, D. A., Clement, K., Ziller, M. J., Boyle, P., Fan, J., Gu, H., et al. (2014). Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26 (6), 813–825. doi:10.1016/j.ccell.2014.10.012
52
LeBleu, V. S., and Kalluri, R. (2018). A peek into cancer-associated fibroblasts: Origins, functions and translational impact. Dis. Models Mech. 11 (4), dmm029447. doi:10.1242/dmm.029447
53
Lee, H. J., Kim, J. Y., Park, I. A., Song, I. H., Yu, J. H., Ahn, J-H., et al. (2015). Prognostic significance of tumor-infiltrating lymphocytes and the tertiary lymphoid structures in HER2-positive breast cancer treated with adjuvant trastuzumab. Am. J. Clin. Pathology 144 (2), 278–288. doi:10.1309/AJCPIXUYDVZ0RZ3G
54
Lee, J. H., Daugharthy, E. R., Scheiman, J., Kalhor, R., Yang, J. L., Ferrante, T. C., et al. (2014). Highly multiplexed subcellular RNA sequencing in situ. Science 343 (6177), 1360–1363. doi:10.1126/science.1250212
55
Lee, Y. T., Tan, Y. J., Falasca, M., and Oon, C. E. (2020). Cancer-associated fibroblasts: Epigenetic regulation and therapeutic intervention in breast cancer. Cancers 12 (10), 2949. doi:10.3390/cancers12102949
56
Lei, S., Zheng, R., Zhang, S., Wang, S., Chen, R., Sun, K., et al. (2021). Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020. Cancer Commun. 41 (11), 1183–1194. doi:10.1002/cac2.12207
57
Li, H., Courtois, E. T., Sengupta, D., Tan, Y., Chen, K. H., Goh, J. J. L., et al. (2017). Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49 (5), 708–718. doi:10.1038/ng.3818
58
Liberzon, A., Subramanian, A., Pinchback, R., Thorvaldsdóttir, H., Tamayo, P., and Mesirov, J. P. (2011). Molecular signatures database (MSigDB) 3.0. Bioinformatics 27 (12), 1739–1740. doi:10.1093/bioinformatics/btr260
59
Liu, X., Wang, L., Zhang, J., Yin, J., and Liu, H. (2013). Global and local structure preservation for feature selection. IEEE Trans. Neural Netw. Learn. Syst. 25 (6), 1083–1095. doi:10.1109/TNNLS.2013.2287275
60
Madu, C. O., Wang, S., Madu, C. O., and Lu, Y. (2020). Angiogenesis in breast cancer progression, diagnosis, and treatment. J. Cancer 11 (15), 4474–4494. doi:10.7150/jca.44313
61
Mansmann, U., and Meister, R. (2005). Testing differential gene expression in functional groups. Methods Inf. Med. 44 (03), 449–453. doi:10.1055/s-0038-1633992
62
Marusyk, A., Janiszewska, M., and Polyak, K. (2020). Intratumor heterogeneity: The rosetta stone of therapy resistance. Cancer Cell 37 (4), 471–484. doi:10.1016/j.ccell.2020.03.007
63
Marusyk, A., Tabassum, D. P., Janiszewska, M., Place, A. E., Trinh, A., Rozhok, A. I., et al. (2016). Spatial proximity to fibroblasts impacts molecular features and therapeutic sensitivity of breast cancer cells influencing clinical outcomes. Cancer Res. 76 (22), 6495–6506. doi:10.1158/0008-5472.CAN-16-1457
64
McGranahan, N., and Swanton, C. (2017). Clonal heterogeneity and tumor evolution: Past, present, and the future. Cell 168 (4), 613–628. doi:10.1016/j.cell.2017.01.018
65
McKenna, A., Findlay, G. M., Gagnon, J. A., Horwitz, M. S., Schier, A. F., and Shendure, J. (2016). Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353 (6298), aaf7907. doi:10.1126/science.aaf7907
66
Moez, E. K., Pyne, S., and Dinu, I. (2018). Association between bivariate expression of key oncogenes and metabolic phenotypes of patients with prostate cancer. Comput. Biol. Med. 103, 55–63. doi:10.1016/j.compbiomed.2018.09.017
67
Öhlund, D., Handly-Santana, A., Biffi, G., Elyada, E., Almeida, A. S., Ponz-Sarvise, M., et al. (2017). Distinct populations of inflammatory fibroblasts and myofibroblasts in pancreatic cancer. J. Exp. Med. 214 (3), 579–596. doi:10.1084/jem.20162024
68
Openshaw, S., Charlton, M., Wymer, C., and Craft, A. (1987). A mark 1 geographical analysis machine for the automated analysis of point data sets. Int. J. Geogr. Inf. Syst. 1 (4), 335–358. doi:10.1080/02693798708927821
69
Oshi, M., Tokumaru, Y., Angarita, F. A., Yan, L., Matsuyama, R., Endo, I., et al. (2020). Degree of early estrogen response predict survival after endocrine therapy in primary and metastatic ER-positive breast cancer. Cancers 12 (12), 3557. doi:10.3390/cancers12123557
70
Özdemir, B. C., Pentcheva-Hoang, T., Carstens, J. L., Zheng, X., Wu, C-C., Simpson, T. R., et al. (2014). Depletion of carcinoma-associated fibroblasts and fibrosis induces immunosuppression and accelerates pancreas cancer with reduced survival. Cancer Cell 25 (6), 719–734. doi:10.1016/j.ccr.2014.04.005
71
Patel, A. P., Tirosh, I., Trombetta, J. J., Shalek, A. K., Gillespie, S. M., Wakimoto, H., et al. (2014). Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344 (6190), 1396–1401. doi:10.1126/science.1254257
72
Paluch-Shimon, S., and Evron, E. (2019). Targeting DNA repair in breast cancer. Breast 47, 33–42. doi:10.1016/j.breast.2019.06.007
73
Pebesma, E. J. (2004). Multivariable geostatistics in S: The gstat package. Comput. Geosciences 30 (7), 683–691. doi:10.1016/j.cageo.2004.03.012
74
Peterson, R. A., and Peterson, M. R. A. (2020). Package ‘bestNormalize’. Normalizing transformation functions R package version.
75
Pietras, K., and Östman, A. (2010). Hallmarks of cancer: Interactions with the tumor stroma. Exp. Cell Res. 316 (8), 1324–1331. doi:10.1016/j.yexcr.2010.02.045
76
Ping, Q., Yan, R., Cheng, X., Wang, W., Zhong, Y., Hou, Z., et al. (2021). Cancer-associated fibroblasts: Overview, progress, challenges, and directions. Cancer Gene Ther. 28 (9), 984–999. doi:10.1038/s41417-021-00318-4
77
Puram, S. V., Tirosh, I., Parikh, A. S., Patel, A. P., Yizhak, K., Gillespie, S., et al. (2017). Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171 (7), 1611–1624. doi:10.1016/j.cell.2017.10.044
78
Raj, B., Wagner, D. E., McKenna, A., Pandey, S., Klein, A. M., Shendure, J., et al. (2018). Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain. Nat. Biotechnol. 36 (5), 442–450. doi:10.1038/nbt.4103
79
Raz, Y., Cohen, N., Shani, O., Bell, R. E., Novitskiy, S. V., Abramovitz, L., et al. (2018). Bone marrow–derived fibroblasts are a functionally distinct stromal cell population in breast cancer. J. Exp. Med. 215 (12), 3075–3093. doi:10.1084/jem.20180818
80
Schafer, J., Opgen-Rhein, R., Zuber, V., Ahdesmaki, M., Silva, A. P. D., Strimmer, K., et al. (2017). Package ‘corpcor’.
81
Senovilla, L., Vitale, I., Martins, I., Tailler, M., Pailleret, C., Michaud, M., et al. (2012). An immunosurveillance mechanism controls cancer cell ploidy. Science 337 (6102), 1678–1684. doi:10.1126/science.1224922
82
Shah, S., Lubeck, E., Zhou, W., Cai, L. (2016). In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92 (2), 342–357. doi:10.1016/j.neuron.2016.10.001
83
Shiga, K., Hara, M., Nagasaki, T., Sato, T., Takahashi, H., Takeyama, H. (2015). Cancer-associated fibroblasts: Their characteristics and their roles in tumor growth. Cancers 7 (4), 2443–2458. doi:10.3390/cancers7040902
84
Sievert, C. (2020). Interactive web-based data visualization with R, plotly, and shiny. Florida, United States: CRC Press.
85
Spanjaard, B., Hu, B., Mitic, N., Olivares-Chauvet, P., Janjuha, S., Ninov, N., et al. (2018). Simultaneous lineage tracing and cell-type identification using CRISPR–Cas9-induced genetic scars. Nat. Biotechnol. 36 (5), 469–473. doi:10.1038/nbt.4124
86
Ståhl, P. L., Salmén, F., Vickovic, S., Lundmark, A., Navarro, J. F., Magnusson, J., et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353 (6294), 78–82. doi:10.1126/science.aaf2403
87
Su, S., Chen, J., Yao, H., Liu, J., Yu, S., Lao, L., et al. (2018). CD10+GPR77+ cancer-associated fibroblasts promote cancer formation and chemoresistance by sustaining cancer stemness. Cell 172 (4), 841–856. doi:10.1016/j.cell.2018.01.009
88
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102 (43), 15545–15550. doi:10.1073/pnas.0506580102
89
Sun, G., Li, Z., Rong, D., Zhang, H., Shi, X., Yang, W., et al. (2021). Single-cell RNA sequencing in cancer: Applications, advances, and emerging challenges. Mol. Therapy-Oncolytics 21, 183–206. doi:10.1016/j.omto.2021.04.001
90
Sun, X., Wang, M., Wang, M., Yao, L., Li, X., Dong, H., et al. (2020). Exploring the metabolic vulnerabilities of epithelial–mesenchymal transition in breast cancer. Front. Cell Dev. Biol. 8, 655. doi:10.3389/fcell.2020.00655
91
Takeshita, T., Oshino, T., Tokumaru, Y., Oshi, M., Patel, A., Tian, W., et al. (2021). Clinical relevance of estrogen reactivity in the breast cancer microenvironment. Front. Oncol. 12, 865024. doi:10.3389/fonc.2022.865024
92
Teves, J. M., Won, K. J. (2020). Mapping cellular coordinates through advances in spatial transcriptomics technology. Mol. Cells 43 (7), 591–599. doi:10.14348/molcells.2020.0020
93
Thrane, K., Eriksson, H., Maaskola, J., Hansson, J., Lundeberg, J. (2018). Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 78 (20), 5970–5979. doi:10.1158/0008-5472.CAN-18-0747
94
Tsai, C-A., Chen, J. J. (2009). Multivariate analysis of variance test for gene set analysis. Bioinformatics 25 (7), 897–903. doi:10.1093/bioinformatics/btp098
95
Vázquez-Villa, F., García-Ocaña, M., Galván, J. A., García-Martínez, J., García-Pravia, C., Menéndez-Rodríguez, P., et al. (2015). COL11A1/(pro)collagen 11A1 expression is a remarkable biomarker of human invasive carcinoma-associated stromal cells and carcinoma progression. Tumor Biol. 36, 2213–2222. doi:10.1007/s13277-015-3295-4
96
Vogelstein, B., Kinzler, K. W. (2015). The path to cancer: Three strikes and you’re out. N. Engl. J. Med. 373 (20), 1895–1898. doi:10.1056/NEJMp1508811
97
Wagner, E. F. (2016). Cancer: Fibroblasts for all seasons. Nature 530 (7588), 42–43. doi:10.1038/530042a
98
Wang, X., Allen, W. E., Wright, M. A., Sylwestrak, E. L., Samusik, N., Vesuna, S., et al. (2018). Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361 (6400), eaat5691. doi:10.1126/science.aat5691
99
Wang, X., Pyne, S., Dinu, I. (2014). Gene set enrichment analysis for multiple continuous phenotypes. BMC Bioinforma. 15 (1), 260–269. doi:10.1186/1471-2105-15-260
100
Wickham, H. (2010). stringr: modern, consistent string processing. R. J. 2 (2), 38. doi:10.32614/rj-2010-012
101
World Health Organization (2018). Breast cancer: Breast cancer and early diagnosis. Georgia, United States: American Cancer Society.
102
Xu, S., Chen, T., Dong, L., Li, T., Xue, H., Gao, B., et al. (2021). Fatty acid synthase promotes breast cancer metastasis by mediating changes in fatty acid metabolism. Oncol. Lett. 21 (1), 27. doi:10.3892/ol.2020.12288
103
Yates, L. R., Gerstung, M., Knappskog, S., Desmedt, C., Gundem, G., Van Loo, P., et al. (2015). Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21 (7), 751–759. doi:10.1038/nm.3886
104
Zhang, J., Fujimoto, J., Zhang, J., Wedge, D. C., Song, X., Zhang, J., et al. (2014). Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346 (6206), 256–259. doi:10.1126/science.1256930

Summary

Keywords

intratumor heterogeneity, gene-set analysis, geographically weighted regression, linear combination test, spatial single-cell analysis, cancer-associated fibroblast

Citation

Amini P, Hajihosseini M, Pyne S and Dinu I (2023) Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity. Front. Cell Dev. Biol. 11:1065586. doi: 10.3389/fcell.2023.1065586

Received

09 October 2022

Accepted

22 February 2023

Published

09 March 2023

Volume

11 - 2023

Edited by

Xin Wang, The Chinese University of Hong Kong, China

Reviewed by

Haoyun Lei, Carnegie Mellon University, United States

Zheng Xia, Oregon Health and Science University, United States

Canping Chen, Oregon Health and Science University Portland, United States, in collaboration with reviewer ZX

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Saumyadipta Pyne, spyne@ucsb.edu; Irina Dinu, idinu@ualberta.ca

†These authors have contributed equally to this work

This article was submitted to Molecular and Cellular Pathology, a section of the journal Frontiers in Cell and Developmental Biology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
