Navigating Multi-Scale Cancer Systems Biology Towards Model-Driven Clinical Oncology and Its Applications in Personalized Therapeutics

Rapid advancements in high-throughput omics technologies and experimental protocols have led to the generation of vast amounts of scale-specific biomolecular data on cancer that now populates several online databases and resources. Cancer systems biology models built using this data have the potential to provide specific insights into complex multifactorial aberrations underpinning tumor initiation, development, and metastasis. Furthermore, the annotation of these single- and multi-scale models with patient data can additionally assist in designing personalized therapeutic interventions as well as aid in clinical decision-making. Here, we have systematically reviewed the emergence and evolution of (i) repositories with scale-specific and multi-scale biomolecular cancer data, (ii) systems biology models developed using this data, (iii) associated simulation software for the development of personalized cancer therapeutics, and (iv) translational attempts to pipeline multi-scale panomics data for data-driven in silico clinical oncology. The review concludes that the absence of a generic, zero-code, panomics-based multi-scale modeling pipeline and associated software framework, impedes the development and seamless deployment of personalized in silico multi-scale models in clinical settings.


INTRODUCTION
In 1971, President Richard Nixon declared his euphemistic "war on cancer" through the promulgation of the National Cancer Act (1). Five decades later, despite ground-breaking discoveries and advancements in the field of cancer systems biology, a definitive and affordable cure for all types of cancer still evades humankind (2). Numerous "breakthrough" treatments have also gone on to exhibit adverse side effects (3,4) that lower patients' quality of life (QoL) or have reported degrading efficacies (5). At the heart of this problem lies our limited understanding of the bewildering multifactorial biomolecular complexity as well as patient-centricity of cancer.
Recent advances in biomolecular cancer research have helped factor system-level oncological manifestations into mutations across genetic, transcriptomic, proteomic, and metabolomic scales (6)(7)(8)(9) that also act in concert (10,11). Crosstalk between multi-scale pathways comprising of these oncogenic mutations can further exacerbate the etiology of the disease (7,(12)(13)(14). The combination of mutational diversity and interplay between the constituent pathways adds genetic heterogeneity and phenotypic plasticity in cancer cells (15,16). Hanahan and Weinberg (17,18) summarized this heterogeneity and plasticity into "Hallmarks of Cancer"a set of progressively acquired traits during the development of cancer.
Experimental techniques such as high-throughput nextgeneration sequencing, and mass spectrometry-based proteomics are now providing specific spatiotemporal cues on patient-specific biomolecular aberrations involved in cancer development and growth. The voluminous high-throughput patient data coupled with the remarkable complexity of the disease has given impetus to data integrative in silico cancer modeling and therapeutic evaluation approaches (19). Specifically, scale-specific molecular insights into key regulators underpinning each hallmark of cancer are now helping unravel the complex dynamics of the disease (20) besides creating avenues for personalized therapeutics (21,22). In this review, we will evaluate the emergence, evolution, and integration of multiscale cancer data towards building coherent and biologically plausible in silico models and their integrative analysis for employment in personalized cancer treatment in clinical settings. The review concludes by highlighting the need of integrating and modeling multi-omics data and associated software pipelines for employment in developing personalized therapeutics.

Biomolecular Databases
Biomolecular and clinical data generated from large-scale omics approaches for cancer research can be divided into four sub-categories: (i) genome, (ii) transcriptome, (iii) proteome, and (iv) metabolome (53).

Genome-Scale Databases
The foremost endeavor to collect and organize large-scale genomics data into coherent and accessible repositories led to the establishment of GenBank in 1986 (54) ( Figure 1B, Table  S1). This open-access resource now forms one of the largest public databases for nucleotide sequences from large-scale sequencing projects comprising over 300,000 species (55,56). In a salient study employing GenBank, Diez et al. (57) screened breast and ovarian cancer families with mutations in BRCA1 and BRCA2 genes and its distribution in the Spanish population.
Medrek et al. (58) employed microarray profile sets from GenBank to analyze gene levels for CD163 and CD68 in different breast cancer patient groups. The study established the need for localization of tumor-associated macrophages as a prognostic marker for breast cancer patients. To date, GenBank remains a comprehensive nucleotide database; however, its data heterogeneity poses a significant challenge in its employment in the development of personalized cancer therapeutics. Towards an improved data stratification and retrieval of genome-scale data, in 2002, Hubbard et al. (59) launched the Ensembl genome database. Ensembl provides a comprehensive resource for human genome sequences capable of automatic annotation and organization of large-scale sequencing data. Amongst various genome-wide studies utilizing Ensembl, Easton et al. (60) used this database to extract human sequence information to identify novel breast cancer susceptibility loci. Patientspecificity (10,11) and mutational diversity (19) in cancer can manifest across spatiotemporal scales. Hence, the availability of patient-specific data for each type of cancer can furnish valuable insights into the biomolecular foundation of the disease. In an attempt to provide cancer type-specific mutation data, Wellcome Trust's Sanger Institute developed Catalogue of Somatic Mutations in Cancer (COSMIC) (61) database. COSMIC comprises of 10,000 somatic mutations from 66,634 clinical samples. Schubbert et al. (62) employed COSMIC's mutation data to investigate Ras activity in cancers as well as developmental disorders. The study concluded that the duration, as well as the strength of hyperactive Ras signaling controls the probability of tumorigenesis. Similarly, Weir et al. (63) utilized COSMIC data on tumor-suppressor and protooncogenes in the study to characterize the genome of lung adenocarcinomas. The systematic copy-number analysis with SNP data indicated that several lung cancer genes remain to be elucidated and characterized.

Transcriptome-Scale Databases
Gene-level information can facilitate the development of personalized cancer models; however, gene expression may vary from cell to cell and across cancer patients. As a result, cancer patients have divergent genetic signatures and transcriptlevel information. Hence, high-throughput transcriptomic data has the potential to provide valuable insights into the transcriptomic complexity in cancer cells and can be useful in investigating cell state, physiology, and relevant biological events (64) ( Figure 1B and Table S1). Towards developing such a transcriptional information resource, in 2000, Edgar et al. (31) launched the Gene Expression Omnibus (GEO) initiative. GEO acts as a tertiary resource providing coherent high-throughput transcriptomic and functional genomics data. The platform now hosts over 3800 datasets and is expanding exponentially. GEO was employed by Chakraborty et al. (65) for annotation of chemo-resistant cell line models which helped investigate chemoresistance and glycolysis in ovarian cancers. The study identified Mitochondrial Calcium Uptake 1 (MICU1) as an important component for cancer metabolism that influences aerobic glycolysis and chemoresistance and can have a potential role in cancer therapeutics. The curation of patientspecific gene and protein expression data led to the development of The Cancer Genome Atlas (TCGA) (66). TCGA also captures the copy number variations and DNA methylation profiles for different cancer subtypes. TCGA's potential (49,67) is well exhibited by Leiserson et al.'s (68) pan-cancer analysis which helped identify 14 significantly mutated subnetworks containing numerous genes with rare somatic mutations across many cancers types. Davis et al. (69) further evaluated the genomic landscape of chromophobe renal cell carcinomas (ChRCCs) to elicit molecular patterns as clues for determining the origin of cancer cells. To facilitate in data management across different cancer projects as well as to ensure data uniformity towards developing data-driven models, the International Cancer Genome Consortium (ICGC) was launched in 2010 (70). ICGC adopts a federated data storage architecture that enables it to host a collection of scale-specific data from TCGA and 24 other projects (71). Burn et al. (72) estimated the distribution of cytosine in liver tumor data using ICGC. The study reported APOBEC3B (A3B) catalyzed deamination as a chronic source of DNA damage in breast cancer which also explains tumor cell evolution and heterogeneity. Supek et al. (73) compared mutation rates between different human cancers and reported the influence of "silent" mutations as a frequent contributor to cancer. Numerous databases have been established to store largescale genomic data, however, insights from an integrated analysis of genomic data across databases have the potential to provide precise biomolecular cues into complex processes and evolution in cancer cells. This was enabled by cBio Cancer Genomics Portal (cBioPortal) (74) in 2012, with multidimensional dataset retrieval, and exploration from multiple databases. The platform additionally provides data visualization tools, pathway exploration, statistical analysis, and selective data download features for seamless utilization of large-scale genomics data across genes, patient samples, projects, and databases (75). Numerous studies have effectively employed cBioPortal (76)(77)(78); in particular, Jiao et al. (79) evaluated the prognostic value of TP53 and its correlation with EGFR mutations in advanced non-small-cell lung cancer (NSCLC). The study established that TP53 coupled with EGFR mutation can lead to the more accurate prognosis of advanced NSCLC. Hou et al. (80) also used cBioPortal to deduce targetable genotypes which are present in young patients with lung adenocarcinomas and revealed that young patients with lung adenocarcinoma were more likely to harbor targetable genotype.

Proteome-Scale Databases
Transcriptomic data remains limited in providing a deterministic proteome profile (81)(82)(83). Particularly, transcripts produced in a cell can be degraded, translated inefficiently, or modified due to post-translational modification (84,85) resulting in no or a very small amount of functional protein (64). This relatively low correlation between transcriptome and proteome data was highlighted in 2019, by Bathke et al. (86) where it was shown that an increase in transcript synthesis cannot be directly associated with an increase in functional response in a cell. To facilitate functional analysis, there is a need to utilize proteomiclevel data, which can help to capture a more accurate quantitative assessment of complex biomolecular regulation for functional studies ( Figure 1B and Table S1). Following the successful completion of the Human Genome Project (HGP) (1998), in 2003, a group of Swedish researchers reported the Human Protein Atlas (HPA) (87,88) with an aim to map the entire set of human proteins in cells, tissues, and organs for normal as well as cancerous state (89). HPA employs large-scale omics-based technologies to localize and quantify protein expression patterns. The database has successfully managed to host comprehensive information on human proteins from cells, tissues, pathology, brain, and blood region-related studies. HPA data can be employed for various purposes such as investigating the spatial distribution of proteins in different tissue and comparing normal and cancerous protein expression patterns across samples, etc (87). In a salient study employing HPA, Gaḿez-Pozo et al. (90) studied the localized expression pattern of proteins to help profile human lung cancer subtypes. The study reported a combination of peptide-level expressions which can distinguish between non-small lung cancer samples and normal lung cancer in different histological subtypes. Imberg-Kazdan et al. (91) employed HPA to identify novel regulators of androgen receptor (AR) function in prostate cancer towards therapy. Another significant stride towards generation of proteomelevel information came with the establishment of the Human Proteome Project (HPP) (92,93) in 2008 (94) by the Human Proteome Organization (HUPO). HPP consolidated mass spectrometry-based proteomics data, and bioinformatics pipelines, with the aim to organize and map the entire human proteome. To date, numerous studies have utilized HPP towards identifying the complex protein machinery involved in cancer cell fate outcomes (95)(96)(97)(98)(99)(100). Amongst the earliest attempts, in 2001, Sebastian et al. (101) employed the HPP platform to deduce the complex regulatory region of the human CYP19 gene ('armatose'), one of the contributors of breast cancer regulation. HPP project was later segmented into "biology and disease-oriented HPP" (B/D HPP) (102) and chromosomecentric HPP (C-HPP) (103). Specifically, Gupta et al. (104) carried out an extensive analysis of existing experimental and bioinformatics databases to annotate and decipher proteins associated with glioma on chromosome 12, while, Wang et al. (95) performed a qualitative and quantitative assessment of human chromosome 20 genes in cancer tissue and cells using C-HPP resources. The study revealed that several cancerassociated proteins on chromosome 20 were tissue or celltype specific.

Metabolome-Scale Databases
Metabolic reprogramming is one of the earliest manifestations during tumorigenesis (105) and therefore, potentiates the usefulness of identifying metabolic biomarkers involved in cancer onset, its prognosis, as well as treatment. Large-scale efforts to collect metabolomics data have led to the development of several online databases (106-108) ( Figure 1B and Table S1) including the Golm Metabolome Database (GMD) (106), in 2004. GMD provides a comprehensive resource on metabolic profiles, customized mass spectral libraries, along spectral information for use in metabolite identification. GMD was employed in 2011 by Wedge et al. (109) to identify and compare metabolic profiles in serum and plasma samples for small-cell lung cancer patients towards determining optimal agent for onwards analysis. The study showed that the discriminatory ability of both serum and plasma was equivalent. In 2013, Pasikanti et al. (110) utilized GMD to identify biomarker metabolites present in bladder cancer. The study proposed a potential role of kynurenine in malignancy and therapeutic intervention in bladder cancer. To allow for largescale metabolic data stratification and retrieval, in 2007, Wishart et al. published the Human Metabolome Database (HMDB) (107,111). HMDB contains organism-specific information on metabolites across various biospecimens and their accompanying environments. It is now the world's largest metabolomics database with around 114,100 metabolites that have been characterized and annotated. HMDB was employed by Sugimoto et al. (112) to identify environmental compounds specific to oral, breast, and pancreatic cancer profiles. The study identified 57 principal metabolites to help predict disease susceptibility, besides being potential markers for medical screening. Agren et al. (113) employed metabolomic data from HMDB to construct metabolic network models for 69 human cells and 16 cancer types. The study's comprehensive metabolic network analysis between disease and normal cell types has the potential to provide avenues for the identification of cancer-specific metabolic targets for therapeutic interventions. The HMDB supports data deposition and dissemination, however, integrated exploratory analysis is not available. The Metabolomics Workbench (108), reported in 2016, provides information on metabolomics metadata and experimental data across species, along with an integrated set of exploratory analysis tools. The platform also acts as a resource to integrate, deposit, track, analyze, as well as disseminate large-scale heterogeneous metabolomics data from a variety of studies. In a case study built using this platform, Hattori et al. (114) studied the aberrant BCAA (branched-chain amino acids) metabolism activation by MSI2 (Musashi2)-BCAT1 axis which they reported to drive myeloid leukemia progression.

Biomolecular Pathway Databases
Investigations restricted to single biomolecular scales have limited translational potential as cancer dysregulation is driven by tightly coupled biomolecular pathways constituted by biomolecules from a variety of spatiotemporal scales (discussed above). Such biomolecular pathways represent organized cascades of interactions integrating different spatiotemporal scales towards reaching specific phenotypic cell fate outcomes. Numerous scale-specific and multi-omics biomolecular pathway databases now exist to help retrieve, store and analyze existing pathway information towards understanding cellular communication in light of complex cancer regulation (32,34,115). One of the earliest attempts at integrating genomics data for pathway construction came in 1995, with the establishment of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (32) ( Figure 1C and Table S2). Over time, KEGG has significantly expanded to include high-throughput multi-omics data (116). As a result, the resource is divided into fifteen subgroups including KEGG Genome (for genome-level pathways), KEGG Compound (for small molecules level pathways), KEGG Gene (for gene and protein pathways), KEGG Reaction (for biochemical reaction and metabolic pathways), etc. Li et al. (117) used the KEGG database to perform pathway enrichment analysis for predicting the function of circular RNA (circRNA) dysregulation in colorectal cancer (CRC). The study highlighted circDDX17 potential role as a tumor suppressor and biomarker for CRC. While employing KEGG, Feng et al. (118) identified four up-regulated differentially expressed genes associated with poor prognosis in ovarian cancer. Further, to furnish information on pathways for high-throughput functional analysis studies, PANTHER (protein annotation through evolutionary relationship) database was established in 2010 (34,115). PANTHER hosts information on ontological gene and protein-protein interaction pathways by leveraging GenBank and Human Gene Mutation Database (HGMD) (119), etc. Turcan et al. (120) employed PANTHER to perform network pathway enrichment for biological processes in differentially expressed genes, especially to investigate IDH1 mutations in glioma hypermethylation phenotype. The study provided a framework for understanding gliomas and the interplay between genomic as well as epigenomic regulation in cancer. To store metabolic pathway information in a cell, Karp et al. (121) developed MetaCyc, a comprehensive reference database comprising of metabolic pathways. MetaCyc is currently available as a web-based resource with metabolic pathway information which can be employed to investigate metabolic reengineering in cancers, carry out biochemistrybased studies, and explore cancer cell metabolism, etc. Miller et al. (122) demonstrated the utility of MetaCyc database by evaluating plasma metabolomic profiles after limonene intervention in breast cancer patients. The study employed MetaCyc to perform pathway-based interpretations and revealed that such alterations were related with tissue-level cyclin D1 expression changes.

Biomolecular Network Databases
The regulatory complexity of cancer is compounded by the crosstalk between numerous multi-scale biomolecular pathways resulting in the formation of complex interaction networks ( Figure 1D and Table S3). One of the earliest biomolecular network databases, the Biomolecular Interaction Network Database (BIND) (123) was established in 2001, with an aim to organize biomolecular interactions between genes, transcripts, proteins, metabolites, as well as small molecules. Chen et al. (124) employed BIND to construct a biological interaction network (BIN) towards investigating tyrosine kinase regulation in breast cancer development. The study identified SLC4A7 and TOLLIP as novel tyrosine kinase substrates which are also linked to tumorigenesis. BIND provides a comprehensive resource of predefined interacting pathways; however, it does not contain 'indirect' interaction information. In contrast, the Molecular INTeraction database (MINT) (37), developed in 2002, curates existing literature to develop networks with both direct as well as indirect interactions from large-scale projects with information from genes, transcripts, proteins, promoter regions, etc. MINT can store data on "functional" interaction such as enzymatic properties and modifications present in biomolecular regulatory networks. The database was employed by Vinayagam et al. (125) to construct a human immunodeficiency virus (HIV) network that helped identify novel cancer genes across genomic datasets. The Database of Interacting Proteins (DIP) (126) was developed to mine existing literature and experimental studies on biomolecules and their pathways to construct protein interaction networks (127)(128)(129). Goh et al. (127) constructed a protein-protein interaction network for investigating liver cancer using DIP. The study revealed that hepatocellular carcinoma (HCC) at moderate stage is enriched in proteins that are involved in the immune response. Similarly, Zhao et al. (128) identified autophagic pathways in plants lectin-treated cancer cells which are regulated by microRNAs. The study showed that plant lectin has the potential to block sugar-containing receptor EGFRmediated pathways thereby leading to autophagic cell death. To further consolidate and integrate protein interaction data across pathways as well as organisms, the Search Tool for the Retrieval of Interacting Genes/Proteins -STRING database was developed in 2005 (130). STRING provides a comprehensive text-mining and computational prediction platform which is accessible through an intuitive web interface (131). STRING database also provides information on interaction weights for edges between biomolecules to show an estimated likelihood for each interaction in the network (131). Mlecnik et al. (132) employed STRING database to study T-cells homing factors in colorectal cancer and demonstrated that specific chemokines and adhesion molecules had high densities of T-cell subsets in cancer.

Cellular Environment Databases
Each pathway within a biomolecular network requires input cues from the extracellular environment for onward downstream signal transduction (133)(134)(135). In the case of cancer, the biomolecular milieu constituting the tumor microenvironment (TME) acts as a niche for tumor development, metastasis, as well as therapy response (136) ( Figure 1E and Table S4). Efforts to curate information from the environmental factors such as metabolites, matrisome, and other environmental compounds led to the development of MatrixDB (38) in 2011, which hosts matrixbased information on interactions between extracellular proteins and polysaccharides. MatrixDB additionally links databases with information on genes encoding extracellular proteins such as Human Protein Atlas (137) and UniGene (138) as well as host transcripts information. Celik et al. (139) employed MatrixDB data to evaluate epithelial-mesenchymal transition (EMT) inducers in the environment using nine ovarian cancer datasets and discovered a novel biomarker, HOPX, with the potential to drive tumor-associated stroma. To host studies on extracellular matrix (ECM) proteins from normal as well as disease-inflicted tissue samples, MatrisomeDB (39) was established in 2020. The database curates 17 studies on 15 physiologically healthy murine and human tissue as well as 6 cancer types from different stages (including breast, colon, ovarian, and lung cancer) along with other diseases. Levi-Galibov et al. (140) employed MatrisomeDB to investigate the progression of chronic intestinal inflammation in colon cancer. The study revealed the critical role of heat shock factor 1 (HSF1) during early changes in extracellular matrix structure as well as its composition.

Cell Line Databases
In vitro cell lines derived from cancer patients have become an essential tool for clinical and translational research (141). These cell lines are defined based on gene expression profiles and morphological features which have been cataloged in various databases such as the Cancer Cell Line Encyclopedia (CCLE) (42) ( Figure 1F and Table S5). CCLE contains mutation data on 947 different human cancer cell lines coupled with pharmacological profiles of 24 anti-cancer drugs (42) for evaluating therapeutic effectiveness and sensitivity. Li et al. (142) employed CCLE data to investigate cancer cell line metabolism. The study showed that the mutated asparagine synthetase (ASNS) hypermethylation can cause gastric as well as hepatic cancers to sensitized asparaginase therapy. Hanniford et al. (143) demonstrated epigenetic silencing of RNA during invasion and metastasis in melanoma using the CCLE database. Other cell line databases include Cell Line Data Base (CLDB) (43), and The COSMIC Cell Lines Project (40), and CellMinerCDB (41). The CellMinerCDB (2018) curates data from National Cancer Institute (NCI) (144), BROAD institute (145), Sanger institute (146), and Massachusetts General Hospital (MGH) (147) and provides a platform for pharmacological and genomic analysis.

Histopathological Image Databases
Additionally, histopathological image datasets derived from the microscopic examination of tumor biopsy samples furnish information on cellular structure, function, chemistry, morphology, etc. Numerous histopathological image-based databases have been developed to store, manage, and retrieve such information ( Figure 1G and Table S6). Amongst these databases, The Cancer Imaging Archive (TCIA) (46), reported in 2013, provides a multi-component architecture with various types of images including region-specific (e.g., Breast), cancer-type specific (e.g., TCGA-GBM, TCGA-BRCA), radiology, and anatomy images (e.g., Prostate-MRI). The cancer image collection in TCIA has been captured using a variety of modalities including radiation treatment, X-ray, mammography, and computed tomography (CT), etc (148). Li et al. (149) exploited TCIA by using radiomics data in predicting the risk for breast cancer recurrence, while Sun et al. (150) employed image data to perform a cohort study to validate a radiomics-based biomarker in cancer patients. The study developed a radiomic signature for CD8 cells using the TCGA dataset for predicting the immune phenotype of tumors and deduce clinical outcomes. In 2013, image data from TCIA was integrated with The Cancer Digital Slide Archive (CDSA) (44). The CDSA hosts imaging as well as histopathological data and provides more than 20,000 whole-slide images of 22 different cancer types. The whole-slide images of individual patients present in CDSA help in linking tumor morphology with the patient's genomic and clinical data. Khosravi et al. (151) performed a deep convolution study using CDSA, to distinguish heterogeneous digital pathology images across different types of cancers. To associate patient's genetic information and histology images, Genotype-Tissue Expression (GTEx) (45) was reported in 2014, and was curated using datasets from non-disease tissues of 1000 individuals. Patel et al. (152) employed GTEx to investigate intratumoral heterogeneity present in glioblastoma and concluded that glioblastoma subtype classifiers are variably expressed in individual cells.

Mutation and Drug Databases
Pharmacological investigations have elucidated the mechanism as well as efficacies of numerous cancer drugs, in clinical and preclinical studies (153,154). Databases with drug-target information can be employed in precision oncology towards designing efficacious patient-centric therapeutic strategies ( Figure 1H, Table S7). These databases include DrugBank (155), which was established in 2006 and contains information from over 4100 drug entries, 800 FDA-approved small molecules, and 14,000 protein or drug target sequences. DrugBank combines drug data with drug-target information thus enabling applications in cancer biology including in silico drug target discovery, drug design, drug interaction prediction, etc. In a study employing DrugBank, Augustyn et al. (156) evaluated potential therapeutic targets of achaete-scute homolog 1 (ASCL1) genes in lung cancers and reported unique molecular vulnerabilities for potential therapeutics, while Han et al. (157) determined synergistic combinations of drug targets in K562 chronic myeloid leukemia (CML) cells including BCL2L1 and MCL1 combination. Further to evaluate drugs in light of the patient's genomic signature, PanDrugs (52) database was established in 2019 and currently hosts data from 24 sources and 56297 drug-target pairs along with 9092 unique compounds and 4804 genes. Using PanDrugs, Fernańdez-Navarro et al. (158) prioritized personalized drug treatments using PanDrugs, for Tcell acute lymphoblastic (T-ALL) patients.
Altogether, the availability of voluminous high-resolution biomolecular data has enabled the development of a quantitative understanding of aberrant mechanisms underpinning hallmarks of cancer as well as create avenues for personalized therapeutic insights. Recently, Karimi et al. (159) systematically evaluated the current multi-omics data generation approaches, as well as their associated analysis pipelines for employment in cancer research.

Genome-Scale Models
Amongst initial attempts at developing cancer gene regulation models, in 1987, Leppert et al. (169) reported a computational model to study the genetic locus of familial polyposis coli and its involvement in colonic polyposis and colorectal cancer. The study was also validated by Mehl et al. (170) in 1991, which further elucidated the formation and development of familial

Transcriptome-Scale Models
In silico models developed using transcriptomic expression data can assist in comparing gene expression patterns in cancer for investigating genetic heterogeneity and cancer development as well as towards precision therapy ( Figure 2B

Proteome-Scale Models
To capture the quantitative aspects of biomolecular regulations and functional studies (82,83), data-driven proteomic-based cancer models are essential ( Figure 2C). Such models can be particularly helpful in diagnostic as well as prognostic purposes as well as for monitoring response to treatment (82,83). In a study employing proteome-level information, in 2011, Baloria et al. (190) carried out an in silico proteome-based characterization of the human epidermal growth factor receptor 2 (HER-2) to evaluate its immunogenicity in an in silico DNA vaccine. Akhoon et al. (191) simplified this approach with the development of a new prophylactic in silico DNA vaccine using IL-12 as an adjuvant. In 2017, Fang et al. (192) employed proteome level data towards predicting in silico drug-target interactions for applications in targeted cancer therapeutics, while in 2018, Azevedo et al. (193) designed novel glycobiomarkers in bladder cancer. Recently, in 2020, Lee et al. (194) reported an integrated proteome model of macrophage migration in a complex tumor microenvironment. However, proteome-level models are limited in their ability to provide a complete analysis of the biomolecules present in a cell since they lack information on low molecular weight biomolecular compounds such as metabolites (105).

Metabolome-Scale Models
Metabolic data-driven models can be especially useful in understanding cancer cell metabolism, mitochondrial dysfunction, metabolic pathway alteration, etc ( Figure 2D

Tumor Environment Models
Integrative mathematical models of environmental cues and extracellular matrix can help researchers abstract tumor microenvironments. Such data-driven models can be used to study angiogenesis (201), cell adhesion (202), and vasculature (203), etc ( Figure 2E). Amongst initial attempts at developing cancer environment-based in silico models, in 1972, Greenpan et al. (172) designed a solid carcinoma in silico model to evaluate cancer cell behavior in limited diffusion settings. In 1976, the model was expanded (173) to investigate tumor growth in asymmetric conditions. In 1996, Chaplain developed a mathematical model to elucidate avascular growth, angiogenesis, and vascular growth in solid tumors (174). Anderson and Chaplain (204), in 1998, expanded this strategy and reported a continuous and discrete model for tumorinduced angiogenesis. This modeling approach was further augmented in 2005 by Anderson (202) to a hybrid mathematical model of a solid tumor to study cellular adhesion in tumor cell invasion. Organ-specific metastases and associated survival ratios in small cell lung cancer patients have been modeled and evaluated (205) using similar models.

Cell Models
To model cell population-level behavior in cancer, researchers are increasingly developing innovative cell lines in silico models which can complement in vivo wet-lab experiments, while overcoming wet-lab limitations (206) (Figure 2F). Such models are employed to investigate cell-to-cell interactions as well as evaluate physical features of the synthetic extracellular matrix (ECM) (206), etc (175,207). Amongst initial attempts at developing in silico cell line models, in 1989, Shackney et al. (175) proposed an in silico cancer cell model to study tumor evolution. Results showed an association between discrete aneuploidy peaks with the activation of growth-promoting genes. In 2007, Aubert et al. (207) developed an in silico glioma cell migration model and validated cell migration preferences for homotype and heterotypic gap junctions with experimental results. Gerlee and Nelander (176) expanded this work, in 2012, to investigate the effect of phenotype switching in glioblastoma growth. In conclusion, cell-based cancer models help provide scale-specific insights into cancer, however, they remain limited in investigating the spatiotemporal tissue diversity and heterogeneity in cancer patients.

Multi-Scale Models
Recently, data-driven multi-scale models are becoming increasingly popular in cancer (208)  Summarily, data-integrative computational models have now assumed the forefront in decoding the complex biomolecular regulations involved in cancer and are increasingly been employed for the development of personalized preclinical models as well as therapeutics design (220).

Biomolecular Modeling Platforms
Software platforms aimed at modeling biomolecular entities, abstract information from published literature as well as highthroughput technologies, and model them using a Boolean modeling approach or a differential equations modeling strategy.

Boolean Modeling Software
Boolean network modeling technique was first introduced by Kauffman (232, 233), in 1969. This approach has been widely adopted as a tool to model gene, transcript, protein, and metabolite regulatory networks. Numerous mathematical and computational cancer models have been developed using this representation (171,(221)(222)(223)(224)(225). To facilitate the Boolean model development and analysis process, several platforms have been devised (234-236) ( Table S8). The applicability of these platforms can be further categorized into qualitative or quantitative Boolean network modeling.

Qualitative Boolean Modeling
Biomolecular qualitative Boolean models are a widely employed approach in cancer systems biology research to cater for cases where there is insufficient quantitative information, and/or lack of mechanistic understanding. Thus far, numerous platforms have been reported to help researchers develop qualitative Boolean network models. Amongst them, FluxAnalyzer (237) (244) studied mutually exclusive and cooccurring genetic alterations in bladder cancer. GIMsim also employed multi-valued logical functions, useful in simulating qualitative dynamical behavior in cancer research. However, the platform was unable to program automatic theoretical predictions, moreover, it only employed qualitative analysis approaches and could not be used to accurately map cell fates based on quantitative biomolecular expression data. Taken together, classical qualitative Boolean modeling approaches remain limited in developing predictive cancer models that could leverage quantitative biomolecular expression data generated from next-generation proteomics and relatedsequencing projects.

Quantitative Boolean Modeling
Platforms aimed to integrate quantitative expression data from existing literature and databases towards carrying out network annotation and onwards analysis can be particularly useful in developing personalized cancer models. One such platform, the Markovian Boolean Stochastic Simulator (MaBoss), was established by Stoll et al. (245)

Differential Equations Modeling Software
Boolean models have proven to be a powerful tool in modeling complex biomolecular signaling networks, however, these models are unable to describe continuous concentration, and cannot be used to quantify the time-dependent behavior of biological systems, necessitating the need to switch to quantitative differential equations (254). As a result, numerous stand-alone and web-based tools have been developed to build continuous network models to help describe the temporal evolution of biomolecules towards elucidating more accurate cell fate outcomes from quantitative expression data (Table S8).
Amongst initial attempts at developing such software, GEPASI (GEneral PAthway Simulator) was designed by Pedro Mendes et al. (228), in 1993. GEPASI is a stand-alone simulator that facilitates formulating mathematical models of biochemical reaction networks. GEPASI can also be used to perform parametric sensitivity analysis using an automatic pipeline that evaluates networks in light of exhaustive combinatorial input parameters. Ricci

Cell Environment Modeling Software
To facilitate the development of environmental models that can help investigate inter-, intra-, and extracellular interactions between cellular network models and their dynamic environment, several software and platforms have been reported (229,264). These platforms employ discrete, continuous, and hybrid approaches to develop models of cellular microenvironments towards setting up specific biological contexts such as normoxia, hypoxia, Warburg effect, etc (  (206) in 2020. SALSA is general-purpose software that employs a minimum programming requirement, a significant advantage over its predecessors. The platform has been useful in studying cellular diffusion in 3D cultures. This recent tool was validated in 2021, with a case study that evaluated and predicted therapeutic agents in 3D cell cultures (270).

Cell-Level Modeling Software
Towards modeling cancer cell-specific behaviors such as cellular adhesion, membrane transport, loss of cell polarity, etc several software have been reported which can help develop in silico cancer cell models ( Table S10)

Multi-Scale Modeling Software
Multi-scale cancer modeling approaches bring together scalespecific information towards undertaking an integrative analysis of heterogeneous experimental data by building coherent and biologically plausible models (279)(280)(281). Several multi-scale modeling platforms have been reported to help develop multilevel cancer models (231) ( Table S11) Although CompuCell provides an intuitive framework modeling paradigm, the platform core is not conducive to multi-scale cancer modeling. The focus of the software is primarily multi-agent simulations rather than multi-scale cancer modeling. This poses a significant challenge in the utilization of the software. In 2013, CHASTE (280) (Cancer Heart and Soft Tissue Environment) was launched, which provides a computational simulation pipeline for the mathematical modeling of complex multi-scale models. Users can employ CHASTE for a wide range of problems involving on and off-lattice modeling workflows. CHASTE has also previously been employed to model colorectal cancer crypts. Nonetheless, CHASTE does not have a GUI and can only be executed by command line text commands. Furthermore, it requires recompilation on part of the modeler to use the code updates performed by the group. To further improve the multi-scale modeling approach for investigating cancer, in 2013, Chaudhary et al. (281) published ELECANS (Electronic Cancer System). ELECANS had an intuitive but rigid GUI along with a programmable SDK besides the lack of a highperformance simulation engine (286,287). ELECANS was employed to model the mitochondrial processes in cancer towards elucidating the hidden mechanisms involved in cell death (165). ELECANS provided a feature-rich environment for constructing multi-scale models, however, the platform lacked a biomolecular network integration pipeline, and also placed a heavy programming requirement on its users. In contrast, in 2018, PhysiCell (288) (225). Summarily, multi-scale modeling software has enabled the development of biologically plausible cancer models to varying degrees. These platforms, however, fall short of providing a generic and high-throughput environment that could be conveniently translated into clinical settings.

PIPELINING PANOMICS DATA TOWARDS IN SILICO CLINICAL SYSTEMS ONCOLOGY
In silico multi-scale cancer models, annotated with patientspecific biomolecular and clinical data, can help decode complex mechanisms underpinning tumorigenesis and assist in clinical decision-making. Clinically driven in silico multi-scale cancer models simulate in vivo tumor growth and response to therapies across biocomplexity scales, within a clinical environment, towards evaluating efficacious treatment combinations. To facilitate the development of multi-scale cancer models, several large-scale program projects have been launched (Table S12)  Here, we review and evaluate five salient projects for multi-scale cancer modeling towards their clinical deployment.

ACGT Project
The ACGT project (293), launched in 2007, proposed to develop Clinico-Genomic infrastructure for organizing clinical and genomic data towards investigating personalized therapeutics regimens for an individual cancer patient. The ACGT platform provides an open-source and open-access infrastructure designed to support the development of "oncosimulators" to help clinicians accurately compare results from different clinical trials and enhance their efficiency towards optimizing cancer treatment. The ACGT framework employs molecular and clinical data generated from different sources including whole FIGURE 5 | Salient projects pipelining multi-scale panomics data into clinical settingsa timeline. Timeline highlighting salient project platforms for developing realistic and clinically-driven multi-scale cancer models, along with their associated leading case studies. genome sequencing, histopathological, imaging, molecular, and clinical data, etc. to develop simulators for mimicking clinical trials. Personalized panomics data employed to develop oncosimulators in ACGT is extracted from real patients which enables the oncosimulators to be clinically relevant for predictive purposes. Additionally, ACGT provides data retrieval, storage, integrative, anonymization, and analysis as well as results presentation capabilities. Using the platform, in 2004, a personalized, spatiotemporal oncosimulator model of breast cancer was developed to mimic a clinical trial based on protocols outlined in the Trial of Principle (TOP), towards evaluating the model response to chemotherapeutic treatment in neoadjuvant settings (298). Similarly, in 2009, Graf et al. (299) modeled nephroblastoma oncosimulator, a childhood cancer of the kidney, based on a clinical trial run by the International Society of Paediatric Oncology (SIOP) for simulating tumor response to therapeutic regimens in clinical trials. The results generated from the TOP and SIOP trials enabled the ACGT oncosimulators to adapt in light of real clinical conditions and the software to be validated against multi-scale patient data. The focus of ACGT oncosimulators, however, is limited to existing clinical trials for predicting efficacious treatment combinations.

The ContraCancrum Project
Towards establishing a platform for the development of composite multi-scale models for simulating malignant tumor models, in 2008, ContraCancrum project (294) was initiated. ContraCancrum aimed to construct a multi-scale computational framework for translating personalized cancer models into clinical settings towards simulating malignant tumor development and response to therapeutic regimens. For that, the Individualized MediciNe Simulation Environment (IMENSE) platform (300,301) was established, to undertake the oncosimulator development process. The platform was employed for molecular and clinical data storage, retrieval, integration, and analysis. Oncosimulators, developed under the ContraCancrum project, employ patient data across biologically and clinically relevant scales including molecular, environmental, cell, and tissues level. These oncosimulators can be used to optimize personalized cancer therapeutics for assisting clinician decision-making process. To date, several oncosimulators have been reported, under the ContraCancrum project (27,294,299,(301)(302)(303)(304)(305). In particular, the initial validation of the ContraCancrum workflow was performed using two case studies; glioblastoma multiforme (GBM) (294,305) and non-small cell lung cancer (NSCLC) (294,304). In 2010, Folarin and Stamatakos (306) designed a glioblastoma oncosimulator using personalized molecular patient data to evaluate treatment response under the effect of a chemotherapeutic drug (temozolomide) (300). Similarly, Roniotis et al. (305) developed a multi-scale finite elements workflow to model glioblastoma growth, while Giatili et al. (307) outlined explicit boundary condition treatment in glioblastoma using an in silico tumor growth model. The results from the case studies were validated by comparing in silico prediction with pre-and post-operative imaging and clinical data (294,302). Moreover, The ContraCancrum project hosts more than 100 lung patients' tumor and blood samples (294). This data is employed to develop clinically validated in silico multilevel cancer models for NSCLC using patient-specific data (304). In 2010, using a biochemical oncosimulator, Wan et al. (304) investigated the binding affinities for AEE788 and Gefitinib tyrosine kinase inhibitor against mutated epidermal growth factor receptor (EGFR) for NSCLC treatment. Similarly, Wang et al. (308) developed a 2D agent-based NSCLC model to investigate proliferation markers and evaluated ERK as a suitable target for targeted therapy. Although ContraCancrum's project has created numerous avenues for multilevel cancer model development, the platform is limited to only specific types of cancer for which data is internally available on the platform.

p-Medicine Project
Towards improving clinical deployment capabilities of oncosimulators, another pilot project: the personalized medicine (p-medicine) project was launched in 2011 (295). The main aim of p-medicine was to create biomedical tools facilitating translation of current medicine to P5 medicine (predictive, personalized, preventive, participatory, and psychocognitive) (309). For that, the p-medicine portal provides a webbased environment that hosts specifically-purpose tools for personalized panomics data integration, management, and model development. The portal has an intuitive graphical user interface (GUI) with an integrated workbench application for integrating information from clinical practices, histopathological imaging, treatment, and omics data, etc. Computational models developed under p-medicine workflow, are quantitatively adapted to clinical settings since they are derived using real multi-scale data. Several multi-scale cancer simulation models (oncosimulators) (295,309) have been devised, using the pmedicine workflow. Amongst these, in 2012, Georgiadi et al.  (316) expanded the ALL oncosimulator model to a hybrid oncosimulator for predicting pre-phase treatments for ALL patients. The validation of these models was undertaken using clinical trial data in pre and posttreatment (317). Although p-medicine presents a state-of-the-art in silico multi-scale cancer modeling environment, the project is limited in its application for determining individual biomarkers for potential novel therapy identification. Moreover, the project is limited to its niche set of tools and models that cannot be integrated with other similar model repositories and platforms.

TUMOR Project
In 2012, a transatlantic USA-EU partnership was initiated with the launch of Transatlantic Tumor Model Repositories (TUMOR) (296). TUMOR provides an integrated, interoperable transatlantic research environment for developing a clinically driven cancer model repository. The repository aimed to integrate computational cancer models developed by different research groups. TUMOR's transatlantic aim was to couple models from ACGT and ContraCancrum projects with those at the Center for the Development of a Virtual Tumor (CViT) (318) as well as other relevant centers. TUMOR is predicted to serve as an international clinical translation, interoperable and validation platform for in silico oncology hosting multi-scale cancer models from different cancer model repositories and platforms such as E-Cell (230), CellML (319), FieldML (320), and BioModels (35,321). To support model data interoperability between platforms, TumorML (Tumor model repositories Markup Language) (321) was developed towards facilitating inter-data operability in the TUMOR project. The TUMOR environment also offers a wide range of additional services supporting predictive oncology and individualized optimization of cancer treatment. For example, the platform allows remote access of predictive cancer models in hospitals and to clinicians for the development of quantitative cancer research and personalized cancer therapy model. TUMOR environment also incorporates deterministic as well as stochastic models through COPASI simulator (257). An automatic validation pipeline is also embedded for the execution and deployment of these models in clinical settings (296). TUMOR's ability to couple and integrate models from different scales, approaches, and repositories towards increasing model accuracy in predictive oncology was exemplified with Wang et al.'s (322) NSCLC model. In 2007, Wang et al. developed a 2D multi-scale NSCLC model for evaluating tumor expansion dynamics in NSCLC patients, in CViT. This model was exported in TumorML and made available in the TUMOR repository to help evaluate growth factor influence in aggressive cancer (321). Similarly, in 2014 Sakkalis et al. (323) employed the TUMOR platform to interlink and couple three independent glioblastomaspecific cancer models (EGFR signaling (324), cancer metabolism (325), Oncosimulator (323), reported by different research groups. The resultant model was used to investigate the impact of radiation and temozolomide (chemotherapy) on glioblastoma multiforme to evaluate treatment effectiveness.

CHIC Project
Another transatlantic project Computational Horizons In Cancer (CHIC) (297) launched in 2013, aimed to provide an oncosimulator modeling platform for in silico oncology. CHIC was initiated to develop and implement predictive oncology and individualized multi-scale cancer modeling tools towards assisting quantitative cancer research and personalized cancer therapy. The workflow and tools established under the ambient of the CHIC project allowed the development of robust, interoperable, and collaborative in silico models in cancer and normal conditions. The CHIC project also proposed a pipeline to translate the model towards supporting clinicians to make optimal personalized treatment plans for individual patients. To this end, several models are created using CHIC such as non-small cell lung cancer (326), glioblastoma (327), leukemia model (328), etc. These models furnish a quantitative understanding of tumorigenesis towards providing avenues for promoting individual cancer patient treatment combinations. One such model reported by Kolokotroni et al. (326) evaluated the efficacy of cisplatin-based therapy for NSCLC patients using in silico multiscale cancer model. While, in 2015, Antonopoulos and Stamatakos (327) modeled the infiltration of glioblastoma cells in normal brain regions using a novel treatment. In 2017, Stamatakos and Giatili (329) extended glioblastoma oncosimulator for modeling tumor growth using reaction-diffusion numerical handling based on multi-scale panomics data. They further proposed a clinical pipeline to translate the model into clinical settings towards supporting clinicians to make optimal personalized treatment plans for individual patients. Ouzounoglou et al. (328) developed an in silico multi-scale leukemia oncosimulator model towards modeling deregulations in the G1/S pathway to investigate the altered function of retinoblastoma in ALL patients. However, a clinical translation of these models is currently in the works (297).

DISCUSSION
In this work, we have evaluated the use of data-driven multi-scale cancer models in deciphering complex biomolecular underpinnings in cancer research towards developing personalized therapeutics interventions for clinical decisionmaking. Specifically, we have discussed the chronological evolution of online cancer data repositories that host highresolution datasets from multiple spatiotemporal scales. Next, we evaluate how this data drives single-and multi-scale systems biology models towards decoding complex cancer regulation in patients. We then track the development of various modeling software and associated applications in enhancing the translational role of cancer systems biology models in clinics.
We conclude that the contemporary multi-scale modeling software line-up remains limited in their clinical employment due to the lack of a generic, zero-code, panomic-based framework for translating research models into clinical settings. Such a framework would help annotate in silico cancer models developed using single and multi-scale databases (45,61,(330)(331)(332)(333)(334). The framework should also provide an environment for developing the extra-cellular matrix of a cancer cell which can then be integrated into cellular models. Existing environment (38,39) and cell line databases (41)(42)(43) need to be integrated to design environmental models along with biologically plausible cell line structures. These cell lines could then be assembled into three-dimensional geometries to create multi-scale in silico organoids. The pipeline should also furnish capabilities such as a convenient import workflow for clinical data integration through histopathological image data repositories (44)(45)(46) for designing biologically accurate organoid structures based on each cancer patient's underlying cellular morphology. Once the personalized multi-scale model has been constructed, the pipeline would allow investigation into the temporal evolution of the multiscale organoid under personalized inputs and user-designed biomolecular entities. Data generated from the multi-scale model simulation can be analyzed to elicit biomolecular cues for each cancer patient as well as determine its role. Due to the heterogeneity and complexity of biomolecular data a coherent panomics-based pipeline can be challenging to develop and will require a collaborative effort by various research groups, through close collaborations and data standardization.
Taken together, a translational in silico systems oncology pipeline is the need of the hour and will help develop and deliver personalized treatments of cancer as well as substantively inform clinical decision-making processes.

AUTHOR CONTRIBUTIONS
SC and MG carried out the literature review and drafted the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
This manuscript has been released as a pre-print at bioRxiv (335).