Elucidation of Functional Roles of Sialic Acids in Cancer Migration

Sialic acids (SA), negatively charged nine-carbon sugars, have long been implicated in cancer metastasis since 1960's but its detailed functional roles remain elusive. We present a computational analysis of transcriptomic data of cancer vs. control tissues of eight types in TCGA, aiming to elucidate the possible reason for the increased production and utilization of SAs in cancer and their possible driving roles in cancer migration. Our analyses have revealed for all cancer types: (1) the synthesis and deployment enzymes of SAs are persistently up-regulated throughout the progression for all but one cancer type; and (2) gangliosides, of which SAs are part, tend to converge to specific types that allow SAs to pack at high densities on cancer cell surface as a cancer advances. Statistical and modeling analyses suggest that (i) a highly plausible reason for the increased syntheses of SAs is to produce net protons, used for neutralizing the OH− persistently generated by elevated intracellular iron metabolism coupled with chronic inflammation in cancer tissues; (ii) the level of SA accumulation on cancer cell surface strongly correlates with the stage of cancer migration, as well as multiple migration-related characteristics such as altered cell-cell adhesion, mechanical stress, cell protrusion, and contraction; and (iii) the pattern of SA deployment correlates with the 5-year survival rate of a cancer type. Overall, our study provides strong evidence for that the continuous accumulation of SAs on cancer cell surface gives rise to increasingly stronger cell-cell repulsion due to their negative charges, leading to cell deformation by electrostatic force-induced mechanical compression, which is known to be able to drive cancer cell migration established by recent studies.


INTRODUCTION
It has been observed that increased syntheses of sialic acids (SAs) are associated with cancer development and metastasis since 1960's (1,2), where SAs are negatively charged nine-carbon sugars and generally serve as the capping molecules of cell-surface glycan, as part of plasma membrane-embedded gangliosides (3). Under physiological conditions, brain tissues have the highest concentration of SAs, used for synaptogenesis. Outside of brain, red blood cells have the highest cell-surface concentration of SAs. As of now, it remains largely unknown of why cancer cells produce unusually large numbers of SAs and accumulate them on cell surface (4). Published studies have been mostly focused on their signaling roles with other cell types such as immune and endothelial cells, via binding to siglecs and selectins, to facilitate interactions between cancer and immune cells (5) and to enable cancer cells interactions with and penetration into blood vessels (1), respectively. Very little has been established regarding if they may play roles in creating mechanical compression within cancer tissues, knowing their negative charges and being increasingly placed on cell surface, hence possibly resulting in increasingly stronger cell-cell repulsion, while mechanical pressure has been widely observed in cancer tissues but largely attributed to the confined space for growing tumors (6,7).
We have recently studied 44 reprogrammed metabolisms widely observed in cancer, including persistent SA synthesis, and discovered that each of them produces more protons than its original metabolism (8). In addition, we have also discovered that all cancer tissue cells harbor Fenton reactions: Fe 2+ + H 2 O 2 -> Fe 3+ + q OH + OH − in their cytosol; and the rates of OH − production can saturate the intracellular pH buffer within days, hence increasing the intracellular pH if not neutralized (9), which posts a major stress to the host cells. A linear regression analysis was conducted of the predicted level of Fenton reactions against the predicted levels of all ∼50 reprogrammed metabolisms ( Figure S1), which achieves high R 2 values with statistically significant p-values for each cancer type (8). This strongly suggests that these reprogrammed metabolisms are induced to respond to cytosolic Fenton reactions, serving as neutralizers for the OH − persistently produced by Fenton reactions.
In this context, we present a computational study of transcriptomic data of SA related vs. migration related genes in cancer tissues of eight cancer types from the TCGA database (10). Our analyses have revealed: (1) majority of the cancer types has increased production and deployment of SAs on cell surface, where the synthesis of a SA generates two net protons; (2) the level of SA synthesis correlates with the level of cytosolic Fenton reaction for all cancer types studied; and (3) as a cancer progresses, it tends to converge to use of specific types of gangliosides, facilitating SA packing at high densities. Further analyses lead to the following discoveries: (i) a simple model for predicting the level of SA accumulation on cell surface can statistically well explain cancer progression from stage N0 through stage N3 and then stage M (using the TNM stage notation), where Ni represents cancer tissues that have metastasized to i lymph nodes and M for distant metastasis; (ii) strong correlations are observed between a range of cell migration-associated characteristics, such as increased mechanical stress, cell contraction and protrusion, and SA production and/or deployment; and (iii) the detailed expression patterns of SA synthesis and degradation genes can statistically well explain the average 5-year survival rates of each cancer type, hence the level of metastasis since the survival rate strongly correlates with the level of metastasis across all eight types. Overall, our analyses provide strong evidence for that the SA accumulation on cancer cell surface plays key roles in mechanical compression and cell deformation in cancer tissues.
It has been observed that cancer tissues have strong mechanical pressure within (6,7). It was suggested that this is due to the confined space for the growing tumors. However, the "confined space" argument may not be supported by experimental data. For example, skin melanoma starts to metastasize as soon as the cancer starts to grow vertically, which is clearly not confined by space. Our analyses suggest: mechanical pressure strongly correlates with the cell-surface accumulation level of SAs. A previous study has convincingly demonstrated that mechanical compression can lead to cell deformation, which can drive the activation of actomyosin filaments and associated contractile motion, ultimately driving collective migration by cell clusters with enhanced cell-cell adhesion (11).
By integrating all these together, we have developed a model for how SA synthesis and deployment, responding to cytosolic Fenton reactions, can give rise to increasingly stronger cell-cell repulsion, further leading to mechanical compression and cell deformation, which can drive cancer cell migration.

Elevated Synthesis of Sialic Acids in Cancer
We have examined the key genes involved in SA synthesis (CMAS) and degradation (NEU1). CMAS is up-regulated in seven of the eight cancer types (except for COAD); and NEU1 tends to correlate with CMAS throughout the major portion of a cancer progression for all cancer types, except for the last stage(s), where the two curves may diverge or converge for some cancer types, as detailed in Figure 1.
Previous studies generally attribute the increased SA production to their signaling roles in cancer (12). However, this is not supported by the transcriptomic data analyzed here. It is known that SAs conduct their functions through binding with the siglec and/or selectin molecules. Our analyses of the gene expression data in (bulk) cancer tissues show that there is no or little correlation between CMAS and any siglec gene (SIGLEC1-16), selectin gene (SELE, SELL, and SELP) or their total expressions (data not shown).
We have previously predicted that 44 known reprogrammed metabolisms (8), including persistent production of SA, in cancer are induced to generate protons for neutralizing OH − persistently generated by cytosolic Fenton reactions. Figure 2 shows the predicted levels of cytosolic Fenton reactions (9) vs. the expression level of CMAS across different stages of all cancer types under study. The majority of the cancer types show statistically significant positive correlations like BRCA (cor = 0.352, p-value = 3.05E-33), COAD (cor = 0.216, p-value = 2.32E-04), LUAD (cor = 0.343, p-value = 1.04E-15), LUSC (cor= 0.206, p-value = 3.45E-06), STAD (cor = 0.212, p-value = 9.96E-04). While the detailed correlation between the two curves may not always be high due to contributions by other reprogrammed metabolisms to neutralize OH − by Fenton reactions, their overall trends are generally the same. Hence, we predict: the SA synthesis is induced to respond to cytosolic Fenton reactions.

Accumulation of Sialic Acids on Cancer Cell Surface and 5-Year Survival
It has been established that cancer cells accumulate SAs on cell surface, and the level of such accumulation could be considerably  higher than that of red blood cells (13), which are known to have high-levels of cell-surface SAs that prevent red blood cells from adhering to each other due to their negative charges (14). If our immune system detects adhered red blood cells, it will destroy them as their adhesion indicates that such cells have reached the end of their working lives. Knowing that the accumulation rate of SAs depends on the rates of SA synthesis, degradation, and transfer to cell-surface glycan, we have used the expressions of the following genes to estimate the rate of such accumulation. CMAS is used for SA synthesis, NEU1 for its degradation, and sialyltransferase genes ST3GAL1, 2, 5, ST6GALNAC4, 5, and ST8SIA1, 5 for SA transfer to glycan. The following is used to estimate the rate of the SA placement onto glycan: where E(X) is for the expression of gene X, and Y represents the seven SA transferase genes. Similarly, the following is used to estimate the rate of SA degradation, where CTSA (Cathepsin A) is needed to form a complex with NEU1 to conduct the degradation function. Figure 3 shows the comparative levels of these two quantities across different stages of all eight cancer types. An assumption used here is: for a gene X with expression level E(X), the maximum reaction rate of the enzyme encoded by X is proportional to K cat * E(X), with K cat being the reaction rate constant catalyzed by enzyme X in the Michaelis-Menten formulation [NOTE: this is essentially equivalent to the assumption that (i) the expression level of a gene is linearly proportional to its protein concentration; and (ii) the reactant concentration is higher than the reaction constant K M , a common assumption used when modeling human metabolisms based on Michaelis-Menten formulation]. Hence the reaction rates of the SA placement and degradation should be linearly proportional to the two curves for each cancer in Figure 3.
Knowing that CMAS and NEU1 have comparable K cat values (15,16), we predict that the two curves reflect the relative reaction rates of the two enzymes.
From Figure 3, we conclude: (1) pancreatic cancer (PAAD) has by far the largest (positive) difference between the SA placement and degradation, measured using the total area between the curves of SA placement and SA degradation (the area is negative if the degradation curve is above the placement curve), hence giving rise to highest level of the SA accumulation and the strongest cell-cell repulsion, which is consistent with the known fact that the cancer has the highest death rate, among all cancer types; and (2) more generally, cancer types with higher 5year survival rates tend to have higher SA degradation rates than their placement rates, especially toward the last stage of a cancer. Figure 4 summarizes the average 5-year survival rates of the eight cancers under study.
To further demonstrate that the rates of the SA placement and degradation may have implications to a cancer's 5-year survival rate, we have conducted a linear regression analysis of the survival rate against changes in the rate of the SA placement and the rate of its degradation in the last two stages for each caner type. Specifically, let P1 be the difference between the slopes of the SA placement curve and the slopes of SA degradation in the last two stages and P2 be the difference between the values of SA placement curve and degradation in the last stage of each cancer (Figure 3). Figure 4 shows a visualization of the values of these two parameters along with the corresponding survival rate for each cancer type. We can see that the survival rate can be well explained by these two parameters achieving R 2 = 0.876, p-value = 6.882E-03, when not considering LUSC, which does not fit our model. Hence, the regression analysis is conducted only on the other seven cancer types, which gives the following function and achieves a high fitness level R 2 = 0.876 and p-value is 6.882E-03.

Ganglioside Types as a Cancer Advances
Under physiological conditions, gangliosides, as the hosts of SAs, are predominantly used in brains and red blood cells. In the embryonic stage, brain cells tend to use simple gangliosides, i.e., those with simple glycan structures and gradually switch to more complex structures, such as GM1, GD1a, GD1b, and GT1b (17), where GM is for monosialoganglioside, GD for disialoganglioside and so on.
It has been observed that advanced stage cancers tend to use specific gangliosides such as GD2 and GM2, or GD3 and GM3 to a lesser extent (18). Majority of the published studies focus on the possible signaling roles of such gangliosides like GD2 or GM2 (19,20). Other authors have examined the issue from the perspective that different gangliosides enable different packing densities of gangliosides inside plasma membrane, and observed: generally, the simpler a glycan structure, the more gangliosides can be packed into a fixed area.
To understand these observations, we have examined the expression data of the enzyme genes in the synthesis network of gangliosides (Figure 5), where the "typical" relative expression levels of these genes are shown across the eight cancer types. We note: the synthesis pathways of numerous gangliosides do not quite follow the normal pathways as shown in Figure S2, instead they form distinct synthesis fluxes as shown in Figure 5, shown by the red/blue arrows with different widths. Specifically, the flux first goes downwards along the first column, and then travels to the second column via the ST6GALNAC4/5-catalyzed reactions more than the typical ST3GAL5 reactions. The flux then travels upwards along the second column; and from the second to the third column, the flux is relatively weak via the moderately expressed ST8SIA1.
We have then examined the total expression level of the influx enzymes for each ganglioside (i.e., those that produce the ganglioside) vs. that of the efflux enzymes (i.e., those that utilize the ganglioside); and predict that a ganglioside will have cellular accumulation if its influx rate is higher than its efflux rate, otherwise the ganglioside will not be accumulated. Note that such accumulation of a ganglioside should be proportional to the level of its deployment inside the plasma membrane. The reason we do this simplified qualitative network flux analyses, instead of the detailed kinetics-based Michaelis-Menten analyses is: we do not have the kinetic rate constants, K cat and K M , for multiple enzymes under consideration.
This analysis has revealed that advanced-stage cancer tissues generally have high accumulations of GM2 and GD2, followed by GM3 and GD3 ( Table 1). These results are highly consistent with the published results.
A key discovery made in our previous work on cancer metabolic reprogramming is: cancer tends to produce as many protons as possible in each reprogrammed pathway; and the altered ganglioside synthesis processes is one of the reprogrammed metabolisms studied (8). Hence, we hypothesize: cancer may select specific ganglioside types that maximize the total number of protons produced per cell plasma membrane.
To examine if this is the case, we have examined the packing densities of GM2 and GM1. It has been reported that 451 GM2 molecules pack into a cluster with head (cross-section) radius 66.0 Å and 301 GM1 form a cluster with head radius 58.7 Å (22), hence the ratio between the head areas per GM1 and GM2 is 58.7 2 /301: 66 2 /451, namely 11.45: 9.66. Therefore, for a fixed area, the ratio between the numbers of GM1 and GM2 that can pack into is: 9.66: 11.45.
Note that the normal pathways for synthesizing GM1 and GM2 produce 3 and 2 net protons (the numbers next to the red arrows along the pathway), respectively, but the altered pathways each produce 4 protons. In addition, the synthesis of each ganglioside requires the synthesis of some SAs, each of FIGURE 5 | Metabolic pathway of ganglioside synthesis and metabolism, where the name in bold is the name of the ganglioside above it; and the name next each arrow represents the gene encoding the enzyme that catalyzes the reaction represented by the arrow [adapted from Yu et al. (21)]. The width of each red/blue arrow represents roughly the relative expression level of the corresponding gene, and color red is for reactions that each produce one net proton and blue for pH neutral reactions while reactions without such arrows are for those without expressions.
which produces two net protons. Figure 5 shows the number of SAs attached to each ganglioside. Hence for the same area in plasma membrane, the ratio between the numbers of protons that the syntheses of GM1 and GM2 each produce is: 9.66 x (4 + 2): 11.45 x (4 + 2), namely 9.66: 11.45, respectively. Therefore, more protons are produced by the synthesis and utilization of GM2 than that of GM1.
Considering that no published data regarding the packing densities in the same setting for the other gangliosides are publicly available (to the best of our knowledge), we extrapolate that GM3 and GM2 have a similar relationship to that between GM2 and GM1 presented above. Hence we predict that maximizing the proton production is a key determinant in a cancer's selection of utilization frequencies of GM3, GM2, and GM1 as observed above.

A Sialic Acid-Based Model for Cancer Metastasis
Under the TNM scheme, a cancer tissue is classified into stage N0, N1, N2, N3, or M, where Ni is for tissues with i nearby lymph node(s), 0 ≤ i ≤ 3, being metastasized and M is for distant metastasis. In the following, N4 represents M for the simplicity of presentation. Our goal is to demonstrate that for each cancer type, the average SA accumulation level in stage Ni, 0 ≤ i ≤ 4, can be calculated as C i + C i , where C i denotes the SA level accumulated solely in stage Ni and C i is a fixed positive value denoting the SA level accumulated in the previous i-1 stages for i ≥ 1 and C 0 = 0. Furthermore, C i is a function of the rates of SA synthesis, degradation and transfer to cellsurface glycan, respectively, and the duration of stage Ni, which can be estimated based on the expression data of seven genes: CMAS for SA synthesis, NEU1 for its degradation, ST3GAL1, 2, 5 and B4GALNT1 for its transfer onto cell-surface glycan, and POLDIP2 (a DNA polymerase gene) for the rate of cell cycle whose inverse is a proportional to the duration of one cell cycle. Mathematically, this problem can be formulated as: for each cancer type, search for an unknown function F():  This problem is solved as a multi-classification and a regression problem using two separate neural networks, each with two-hidden layers, one for evaluating the performance of multi-classification model, and on this basis, the other for solving the Ci values and finding the F() function, respectively. Figure 6 shows the two neural network architectures. In the left panel of Figure 6, for each cancer type and each stage Ni, 70% samples are randomly selected as training data and the remaining as the testing data, and this process is repeated for 10 times. Table 2A shows the prediction results for each stage Ni and each cancer type, measured using the macro F1 score, defined as: In the right panel of Figure 6, the loss function of the neuron network is min n i=1 (y −ŷ) 2 ,whereŷ is the predicted c value. When the maximum number of iterations reached 1,000,000 or the loss is less than 0.1, the training process is done. Table 2B summarizes the predicted Ci values for each cancer type, which increase monotonically over stage, hence logically meaningful.
From the table, we conclude that for each cancer stage, the seven genes can statistically well explain the metastasis stage,   hence strongly suggesting that the SA accumulation level on cancer cell surface is a key factor in dictating the level of metastasis of a cancer tissue.

Supporting Evidence
To provide further evidence that the SA accumulation on cancer cell surface can strongly influence the level of metastasis of a cancer, we have examined the statistical relationships between the above seven genes and a range of characteristics uniquely associated with cancer migration, each of which is reflected by a set of marker genes given in Table 3. For a given set of marker genes {g 1 , . . . , g k } over n samples, let X = (x 1 , . . . , x n ) T be the solution that minimizes the following function: where E i g j is the expression level of gene g j in sample I, and X represents the feature vector of {g 1 , . . . , g k } over the n samples, which is used to calculate the co-expression level with the SA related genes. This problem is solved as a linear regression problem. a. Mechanical compression marker genes and SA related genes: We have examined co-expression levels between the above SA related genes and known marker genes of compressive stress. Figure 7A shows that compressive stress marker genes strongly correlate with the SA genes. We have then conducted a regression analysis of the marker genes against the SA genes, with regression results shown in Table 4. b. Cell-cell adhesion genes and SA related genes: It has been known that cancer cells tend to alter their cell-cell adhesion genes as cancer cells start to migrate. The coexpression data and regression results are given in Figure 7B and Table 4. c. Cell contractile genes and SA related genes: The coexpression data and regression results are given in Figure 7C and Table 4. d. Protrusion genes and SA related genes: The co-expression data and regression results are given in Figure 7D and Table 4. e. Motion marker genes and SA related genes: The coexpression data and regression results are given in Figure 7E and Table 4.
From these figures and tables, we can see that each of the key migration-related characteristics can be well explained statistically by the SA genes, hence providing further evidence to that SA accumulation plays key roles in driving cancer migration.

Regulation of Key Genes Leading to Cancer Migration
We have conducted an integrated analysis of genomic, epigenomic, and transcriptomic data to estimate how the key SA genes, namely CMAS, ST3GAL1, ST3GAL5, and NEU1, are regulated across different cancer types. For each gene, we assess the level of contribution to its transcription regulation by transcription factors vs. DNA methylation using a regression analysis (see section Methods). Table 6 shows the detailed regression results for all four genes. From the table, we can see (1) more DNA methylation utilization in cancer samples than in normal samples; and (2) while the level of contribution to transcription regulation by DNA methylation varies across different genes and different cancer types, DNA methylation in general makes substantial contributions to the regulation of the key SA genes.  Table 3) for multiple aspects of cancer metastasis: (A) Mechanical compression genes, (B) Cell-cell adhesion genes, (C) Cell contraction genes, (D) Protrusion genes, and (E) Motion/migration genes across eight cancer types. In the heatmaps, the red represent positive correlation, the green represent negative correlation and the white represent insignificant correlation respectively.

DISCUSSION
Our prediction for linking overproduction and gradual accumulation of sialic acids to cancer migration is based on identified connections, statistical or physical, among seemingly unrelated molecular species and cellular activities. It is the multiplicity of such chains of simple and subtle associations, which are otherwise do not exist, that give us the statistical confidence about our prediction. Some of the detected associations are only useful for making our overall prediction while others, such as the 7-gene signature for predicting the status of cancer metastasis, could be used independently as novel markers for cancer metastasis. In addition, our prediction confidence also comes from the previous work that causally connects mechanical forces to cancer migration.
Tse et al. (11) presented an elegant study that shows external mechanical compression on cancer cells can lead to their deformation, which can give rise to enhanced cellcell adhesion, actomyosin contraction, filopodial protrusion, ultimately collective migration by clusters of cells. This model provides strong support to our prediction that SA accumulation on cancer cell surface will generate increasingly stronger electrostatic repulsion due to the increased densities of electric charges from SAs, leading to enhanced cell-cell adhesion, actomyosin contraction, protrusion as well as migration as established above. Compared to previous studies on SAs and  cancer migration, our study provides a fundamentally novel perspective regarding the roles of SA in cancer migration, namely it is their physical property rather than the signaling functions that may play the predominant role in driving cancer metastasis.
This model can answer a few long standing open questions regarding cancer metastasis: (1) while SAs have long been known to be associated with cancer metastasis, very little has been established regarding why it generally takes years for a cancer to become metastatic, knowing that the expression levels of SA synthesis and transferase genes do not go up very substantially (Figure 1). Our model, in conjunction with Tse et al.'s model, suggests that the gradual accumulation of the SA-associated negative charges on cell surface will activate the migration program as discussed in Tse et al. (11) once their electrostatic repulsion reaches a critical point; (2) very little has been established in the literature regarding why certain cancer types metastasize easily and early while other cancer types are less likely to metastasize-our model suggests that it is the combination of the rates of SA production, degradation and transfer to cell surface glycan that determines when a cancer starts to migrate; (3) gangliosides of certain types such as GM2 and GD2 have long been found to be associated with cancer metastasis and the current literature suggests that it is their signaling roles that may be relevant to migration (17); our model suggests that two factors may contribute to the selection of specific types of gangliosides, namely: (i) the number of SAs that can be put on gangliosides per cell, which is determined by the packing density of individual ganglioside types inside the plasma membrane and the number of SAs that each ganglioside of the type can harbor; and (ii) the number of protons produced by the synthesis process of individual ganglioside types: more complex gangliosides produce more protons through their synthesis process per molecule but result in their lower packing densities inside the plasma membrane, hence possibly giving rise to a lower number of total protons per cell. Hence we postulate that the selected ganglioside types are the result of tradeoff between these two factors, which ultimately enables the maximum number of protons to be produced through this combination. Clearly, our model is a statistical model. We plan to develop a more physics-based model that will allow us to estimate the actual density-level changes as a cancer evolves as well as to calculate the level of electrostatic repulsion in a realistic media environment, hence enabling us to accurately estimate the shield effect of the electrons.

Data
Cancer survival data: The data given in Table 5 are collected from the NCI TCGA website (https://portal.gdc.cancer.gov), which provide clear and detailed clinical information of each cancer patient.
Transcriptomics data: We have used all the transcriptomic data of eight cancer types: BLCA, BRCA, COAD, HNSC, LUAD, LUSC, STAD, and PAAD in the TCGA database. Table 6 summarizes the relevant information. Here we used eight cancer types instead of the 14 cancer types we typically use in our recent studies (8,9,23) is that we have used TNM staging scheme rather than the more traditional staging approach: stage 1, 2, 3, and 4 since a focus of the study is on the stage of metastasis. For this the TNM staging is clearly more appropriate.
Genomics data: Somatic mutation and copy number variation data in TCGA were collected for each of the four SA genes from the UCSC Xena platform (24). Gene-level non-silent mutations calls and copy number variation estimates are used in our study. DNA methylation data: The beta values of the Illumina Human Methylation 450K platform were collected from the UCSC Xena platform. Probes in the gene body, first exon, UTRs or within 1,500 bp upstream the transcription start site for each of the four SA genes were used.
Transcription factor data: ChIP-Seq validated transcription factor-target gene pairs for each of the four SA genes were collected from the ENCODE database (25), TRRUST database (26), and Marbach et al. (27).

Prediction of Regulatory Mechanisms via Regression Analysis
For each of the four SA genes, samples with somatic mutations or copy number variations in the gene were filtered out as we're interested in how the relevant transcription factors and DNA methylation co-regulate the expression of the gene.
TPM values for gene expression and M values for DNA methylation were first centered and scaled to have mean 0.0 and standard deviation 1.0. For each target gene g, our goal is to find real values: µ, {α i } and {β j } so that the following function is minimized where g is the expression level of the target gene g, x e1 , · · · , x ep are the expression levels of the transcription factors that regulate g, and x m1 , · · · , x mq are the DNA methylation levels of the probes in or around g in the genome. The least squares method was used to solve the optimization problem with Lasso penalty using the R package glmnet (28).
The analysis of variance table was then computed using ANOVA to get the sum of squares (SS) for each parameter. The SS for groups x e and x m were then summed up to get SS TF and SS MT for contributions by transcription factors and DNA methylations, respectively. SS MT SS TF +SS MT is used to estimate the level of contribution to transcription regulation by DNA methylation of target gene g.

Prediction of Concentrations of Gangliosides Based on Gene-Expression Data
We have predicted the relative concentrations of all the gangliosides based on gene expression levels of the relevant enzymes. A key simplifying assumption is: the K cat values for all the enzymes involved in this metabolism are approximately the same since we do not have these values. For each metabolite G, we calculate its total influx and efflux as j G i j × E(g j ) and where G i j -> G is the jth influx reaction and G -> G o k is the kth efflux reaction of G; and E(g j ) is the expression level of gene g j whose enzyme catalyzes the jth reaction. An iterative procedure is employed to estimate these two quantities till the system converges on {G i,o j } values. Then we predict G is intracellularly accumulated if j G i j ×E(g j ) is significantly higher than k G o k × E g k , hence used in the plasma membrane. Two levels of "significance" is used: high and moderate in Table 1.

AUTHOR CONTRIBUTIONS
YX designed the research. HS, YZ, and HJ performed research. HS, YZ, and YX analyzed data and wrote the paper.