Modes of Selection in Tumors as Reflected by Two Mathematical Models and Site Frequency Spectra

The tug-of-war model was developed in a series of papers of McFarland and co-authors to account for existence of mutually counteracting rare advantageous driver mutations and more frequent slightly deleterious passenger mutations in cancer. In its original version, it was a state-dependent branching process. Because of its formulation, the tug-of-war model is of importance for tackling the problem as to whether evolution of cancerous tumors is “Darwinian” or “non-Darwinian.” We define two Time-Continuous Markov Chain versions of the model, including identical mutation processes but adopting different drift and selection components. In Model A, drift and selection process preserves expected fitness whereas in Model B it leads to non-decreasing expected fitness. We investigate these properties using mathematical analysis and extensive simulations, which detect the effect of the so-called drift barrier in Model B but not in Model A. These effects are reflected in different structure of clone genealogies in the two models. Our work is related to the past theoretical work in the field of evolutionary genetics, concerning the interplay among mutation, drift and selection, in absence of recombination (asexual reproduction), where epistasis plays a major role. Finally, we use the statistics of mutation frequencies known as the Site Frequency Spectra (SFS), to compare the variant frequencies in DNA of sequenced HER2+ breast cancers, to those based on Model A and B simulations. The tumor-based SFS are better reproduced by Model A, pointing out a possible selection pattern of HER2+ tumor evolution. To put our models in context, we carried out an exploratory study of how publicly accessible data from breast, prostate, skin and ovarian cancers fit a range of models found in the literature.


Mutations decreasing expected fitness (sp < dq)
In the case of Model A, the large number of living clones started from a driver are present at the end of simulation ( Figure S1C). Many of them descend directly from the initial clone. Each of clones includes a small number of cells. In contrast, the result obtained with use of Model B ( Figure S1D) indicate less clones present at the end of simulation, with initial clone no longer present in the population. Mutations not connected to any ancestor (clones 521 and 594 in Figure S1D descend from the initial clone, with possible earlier passenger mutations.

Neutrality testing 1.2.1 Mutations preserving expected fitness (sp = dq)
In the population where impacts of driver and passenger mutations are in equilibrium, the allele count spectrum have different shape than expected under neutrality in case of model B, (Figure S2B), while in case of model A observed allele count fits better to the expectations ( Figure S2A). The count of singletons calculated based on simulations with use of both Models correspond well to the expected one ( Figure S2C and D).

Mutations increasing expected fitness (sp > dq)
In the case of higher driver mutations impact, results obtained with use of model B ( Figure S3 B and D), especially allele count ( Figure S3B), differ substantially from expected ones. Model A generates singleton count spectrum corresponding well to the expected distribution ( Figure S3C), but observed allele count spectrum deviates significantly from the theoretical curve ( Figure S3A).

Mutations decreasing expected fitness (sp < dq)
In the last example, both models yield allele count spectra with significant deviation from neutrality ( Figure  S4 A and B). In this case, the effect is stronger for spectrum generated by model A ( Figure S4A). However, for this Model We observe similar number of singletons as expected for given allele count S4C, which is not the case for Model B, where simulated and expected singleton counts differ significantly ( Figure S4D).  One-sample Kolmogorov-Smirnov test rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (B) Model B. One-sample Kolmogorov-Smirnov test rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (C) Model A. Two-sample two-sided Wilcoxon test does not reject the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p = 0.76). (D) Model B. Two-sample two-sided Wilcoxon test does not reject the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p = 0.82). Figure S3. Results of neutrality testing for allele counts (top panels) and singleton number (bottom panels) calculated for simulations with parameters: Kolmogorov-Smirnov test rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (B) Model B. One-sample Kolmogorov-Smirnov test rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (C) Model A. Two-sample two-sided Wilcoxon test does not reject the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p = 0.91). (D) Model B. Two-sample two-sided Wilcoxon test rejects the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p < 0.0001). Figure S4. Results of neutrality testing for allele counts (top panels) and singleton number (bottom panels) calculated for simulations with parameters: Kolmogorov-Smirnov rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (B) Model B. One-sample Kolmogorov-Smirnov test rejects the null hypothesis that the empirical distribution of allele count fits the theoretical one at the 5% significance level (p < 0.0001). (C) Model A. Two-sample two-sided Wilcoxon test does not reject the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p = 0.54). (D) Model B. Two-sample two-sided Wilcoxon test rejects the null hypothesis that simulated and expected singleton counts come from distributions with equal medians at 5% significance level (p < 0.0001).

Comparison of simulated cumulative tails of the SFS to experimental data obtained from patient G31
We supplement the comparisons in Figure 13 in the main text, which were performed for experimental data obtained from patient G30, with analogous comparisons for data from patient G31.

Comparison of simulated cumulative tails of the SFS to data obtained from breast cancers -in the log-log scale
We supplement the comparisons in Figures 12, 13 and 14 in the main text and Figure S5, which were depicted in semi-logarithmic coordinates, with analogous comparisons in the log-log scale.

SUPPLEMENTAL INFORMATION: SITE FREQUENCY SPECTRA -COMPARISON OF VAF OF TCGA CANCER SPECIMENS TO MATHEMATICAL MODELS
In this section of Supplement, we compare the variant allele frequencies of selected TCGA tumors of breast, prostate, skin (melanoma), and ovary, to different models considered in the current paper. Breast and prostate cancers VAF histograms are predominantly unimodal, while among cancers of lung and ovary, multimodal VAF histograms are quite common, and we focus on these for these two cancers.
For these reasons, we compare breast and prostate cancer VAFs to Models A and B, since the model of McDonald et al. (2018), as mathematically analyzed in Tung and Durrett (2021), seems to give SFS similar to Model B ( Figure 13B in the main text). On the other hand, we compare the melanoma and ovarian cancers SFS to those generated by the Model of Dinh et al. (2020). Comparisons we carry out here do not have the ambition of being exhaustive. However, they seem to illustrate the diversity of possible models of evolution to be expected in different cancers.
We selected 2 specimens of each cancer type, and we focus on comparisons of the VAF cumulative tails T (x) in the semi-logarithmic coordinates to their mathematically modelled SFS counterparts. As specified in the main body, comparison of cumulative tails, as opposed to probability mass functions or probability distribution functions, seems meaningful for several reasons; (i) the cumulative tails are smoother, (ii) they are all inscribed into a unit square, if VAF frequency x = m/n is used as an argument, (iii) semi-logarithmic coordinates allow resolving differences in the "deep tail" (T (x) small), and (iv) they are also less sensitive to differences in T (x) for x small, which might be caused by DNA sequencing errors and data "massaging".
However, in the case of melanoma and ovarian cancer, we also supplement the model-fitted VAF histograms, to visualize how the model of Dinh et al. (2020) allows separating the clonal components.
The samples obtained from individuals G30 and G31 of the HER2+ breast cancer, which serve as an illustration for validity of Model A, were obtained from the collection of the Maria Sklodowska-Curie National Research Institute of Oncology, Krakow Branch, Poland, as a part of a larger study of the molecular evolution of breast cancer in Poland (as described in the main body of the paper). To obtain these, we commissioned a search of the archive of preserved specimens, all of which were paraffin-embedded (FFPE) samples. For this reason, direct validation using fast-frozen (FF) samples was impossible.
DNA was extracted from formalin-fixed paraffin-embedded (FFPE) tissues. DNA was isolated from tissue sections using a semi-automatic method available from Maxwell ® RSC Instrument (Promega). The Maxwell ® RSC DNA FFPE Kit (Promega) was used to extract genomic DNA as recommended by the manufacturer. The quality and amount of DNA was assessed using spectrophotometer NanoDrop 2000c and fluorimetrically using the Qubit TM dsDNA HS Assay Kit and Qubit 3.0 device (ThermoFisher Scientific).
Coverage charts (Figures S17 -S18) and a sheet with detailed statistics for these two samples (DNA sequencing quality control (table).xlsx) are provided. They list the read statistics for G30 and G31 tumors, subdivided into primary tumor (P), cancerous lymph node (L) and benign lymph node (C), used as control for the detection of somatic mutations.
Based on our experience the quality of data obtained based on the FFPE material, from patients G30 and G31, is comparable to the quality of data obtained typically using FF material. We were able to obtain >100x true median coverage for the regions captured by the Agilent SureSelect Human All Exon v7 probes, as shown on Figure S17. This was achieved both due to high quality of the reads, moderate read duplication rate for the total number of reads sequenced (>100 M/sample) and high mappability rate, as shown on Figure S18.
While we are unable to compare the results based on FFPE and FF samples obtained from the same individuals it seems safe to assume that for isolated DNA of good quality the results can be comparable between FFPE and FF, as shown in Astolfi et al. (2015). Figure S17. Exon coverage characteristics for the G30 (red) and G31 (blue) tumors. Figure S18. Reads statistics in the two G30 and G31 tumors.