Cancer is a heterogeneous and complex disease and one of the leading causes of death worldwide. The high tumor heterogeneity between individuals affected by the same cancer type is accompanied by distinct molecular and phenotypic tumor profiles and by variation in drug treatment response. In silico modeling of cancer as an aberrantly regulated system of interacting signaling molecules provides a basis to enhance our biological understanding of disease progression, and it offers the means to use computer simulations to test and optimize drug therapy designs for particular cancer types and subtypes. This sets the stage for precision medicine: the design of treatments tailored to individuals or groups of patients based on their tumor-specific molecular cancer profiles. Here, we show how a relatively large, manually curated logical model can be efficiently enhanced further by including components highlighted by a multi-omics analysis of data from the Consensus Molecular Subtypes of colorectal cancer. The model expansion was performed in a pathway-centric manner, following a partitioning of the model into functional subsystems, named modules. The resulting approach constitutes a middle-out modeling strategy that enables a data-driven expansion of a model from a generic, intermediate level of molecular detail to one that better covers the processes affected in specific cancer subtypes. The expanded model comprises 183 biological entities and 603 interactions between them, partitioned into 25 functional modules of varying size and structure. We tested the model's ability to correctly predict drug combination synergies against a dataset of experimentally determined cell growth responses to 18 drugs in all combinations across eight cancer cell lines.
The results indicate that the extended model had improved accuracy for drug synergy prediction for the majority of the experimentally tested cancer cell lines, although the model's predictive performance still needs significant improvement. Our study demonstrates how a tumor-data-driven, middle-out approach to refining a logical model of a biological system can further customize a computer model to represent specific cancer cell lines and provide a basis for identifying synergistic effects of drugs targeting specific regulatory proteins. This approach bridges preclinical cancer model data and clinical patient data and may thereby ultimately help develop patient-specific in silico models that can steer treatment decisions in the clinic.
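The abstract does not spell out how a logical model produces a synergy call. As a rough illustration only, the sketch below uses a six-node toy Boolean network (all node and drug names are hypothetical and unrelated to the 183-entity model), simulates each drug by clamping its target node to OFF, reads a growth score from two assumed pro-survival outputs, and calls a drug pair synergistic when the combination lowers growth below the better of its two single drugs. That highest-single-agent style criterion is an assumption chosen for simplicity, not the study's scoring method.

```python
from itertools import combinations

# Toy Boolean rules (HYPOTHETICAL nodes, not the study's model).
# Each rule computes a node's next value from the current state.
RULES = {
    "RTK":  lambda s: True,        # constitutively active input
    "RAS":  lambda s: s["RTK"],
    "MEK":  lambda s: s["RAS"],
    "ERK":  lambda s: s["MEK"],
    "PI3K": lambda s: s["RTK"],
    "AKT":  lambda s: s["PI3K"],
}
OUTPUTS = ("ERK", "AKT")  # assumed pro-survival read-outs

# HYPOTHETICAL drugs, each inhibiting (clamping OFF) one target node.
DRUGS = {"MEKi": "MEK", "PI3Ki": "PI3K", "RASi": "RAS"}


def steady_state(inhibited, max_steps=50):
    """Synchronous updates from an all-OFF state until a fixed point."""
    state = {node: False for node in RULES}
    for _ in range(max_steps):
        new = {node: (False if node in inhibited else rule(state))
               for node, rule in RULES.items()}
        if new == state:
            return state
        state = new
    raise RuntimeError("no fixed point reached (attractor may be cyclic)")


def growth(drugs):
    """Growth score = number of active pro-survival outputs at steady state."""
    targets = {DRUGS[d] for d in drugs}
    state = steady_state(targets)
    return sum(state[o] for o in OUTPUTS)


def synergies():
    """Flag a pair as synergistic if it beats its best single drug."""
    calls = {}
    for a, b in combinations(sorted(DRUGS), 2):
        best_single = min(growth([a]), growth([b]))
        calls[(a, b)] = growth([a, b]) < best_single
    return calls


if __name__ == "__main__":
    print(growth([]))  # 2: both survival branches active without drugs
    for pair, is_syn in synergies().items():
        print(pair, "synergistic" if is_syn else "not synergistic")
```

In this toy network, MEKi and PI3Ki each leave one survival branch intact, so only their combination abolishes growth, whereas RASi and MEKi act on the same branch and gain nothing together.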
In highly non-linear datasets, attributes or features do not readily reveal visual patterns that identify common underlying behaviors. Therefore, classification or regression cannot be achieved with linear or mildly non-linear hyperspace partition functions. Hence, supervised learning models built with most existing algorithms are limited, and their performance metrics are low. Linear transformations of variables, such as principal component analysis, cannot avoid the problem, and even models based on artificial neural networks and deep learning are unable to improve the metrics. Sometimes, even when features allow classification or regression in reported cases, the performance metrics of supervised learning algorithms remain unsatisfactorily low. This problem recurs in many areas of study, such as the clinical, biotechnological, and protein engineering fields, where many of the attributes are correlated in an unknown and highly non-linear fashion or are categorical and difficult to relate to a target response variable. In such areas, being able to create predictive models would dramatically improve the quality of their outcomes, generating immediate added value for both the scientific community and the general public. In this manuscript, we present RV-Clustering, a library of unsupervised learning algorithms, and a new methodology designed to find optimal partitions within highly non-linear datasets that allow deconvoluting variables and markedly improve the performance metrics of supervised classification or regression models. The partitions obtained are statistically cross-validated, ensuring that they are representative and not over-fitted. We have successfully tested RV-Clustering on several highly non-linear datasets of different origins.
The proposed approach has generated classification and regression models with high performance metrics, which further supports its ability to generate predictive models for highly non-linear datasets. Advantageously, the method does not require significant human input, which makes it more usable for members of the biological, biomedical, and protein engineering communities without specific expertise in machine learning.
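The abstract does not describe the RV-Clustering algorithms themselves. The sketch below only illustrates the general partition-then-model idea on a hypothetical V-shaped dataset, y = 2|x| + 1: a plain 1-D k-means split (a stand-in, not RV-Clustering's method, and without the statistical cross-validation of partitions the abstract mentions) followed by one ordinary least-squares line per cluster. A single global line cannot capture the V shape, whereas per-cluster lines can.

```python
def kmeans_1d(xs, k=2, iters=100):
    """Plain Lloyd's k-means on 1-D data (assumes k >= 2).

    Centroids start evenly spread between min(xs) and max(xs).
    """
    lo, hi = min(xs), max(xs)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in xs]
        # Move each centroid to the mean of its members (keep it if empty).
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (needs >= 2 distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2(ys, preds):
    """Coefficient of determination over the whole dataset."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def global_vs_partitioned(xs, ys, k=2):
    """Return (R^2 of one global line, pooled R^2 of per-cluster lines)."""
    a, b = fit_line(xs, ys)
    global_score = r2(ys, [a + b * x for x in xs])
    labels = kmeans_1d(xs, k)
    models = {}
    for c in set(labels):
        cx = [x for x, lab in zip(xs, labels) if lab == c]
        cy = [y for y, lab in zip(ys, labels) if lab == c]
        models[c] = fit_line(cx, cy)
    preds = [models[lab][0] + models[lab][1] * x for x, lab in zip(xs, labels)]
    return global_score, r2(ys, preds)


if __name__ == "__main__":
    xs = [x for x in range(-5, 6) if x != 0]  # hypothetical inputs
    ys = [2 * abs(x) + 1 for x in xs]         # V-shaped response
    print(global_vs_partitioned(xs, ys))      # (0.0, 1.0)
```

The global line is flat (slope 0, R² = 0) because the two arms cancel, while the two cluster-specific lines reproduce the data exactly. Real non-linear datasets are rarely this clean, which is why the abstract stresses cross-validating the partitions.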
Dementia-related diseases such as Alzheimer's disease (AD) carry a tremendous social and economic cost. A deeper understanding of the underlying pathophysiology of AD may provide an opportunity for earlier detection and therapeutic intervention. Previous approaches to characterizing AD targeted single aspects of the disease, yet, owing to the complex nature of AD, their success was limited. In recent years, however, advances in integrative disease modeling built on a wide range of AD biomarkers have taken a global view of the disease, facilitating more comprehensive analysis and interpretation. Integrative AD models can be sorted into two primary types, namely hypothetical models and data-driven models. The latter group splits into two subgroups: (i) models that use traditional statistical methods such as linear models, and (ii) models that take advantage of more advanced artificial intelligence approaches such as machine learning. Although many integrative AD models have been published over the last decade, their impact on clinical practice is limited. Integrative AD modeling faces major challenges, namely data missingness and censoring, imprecise human-derived a priori knowledge, model reproducibility, dataset interoperability, dataset integration, and model interpretability. In this review, we highlight recent advances and future possibilities of integrative modeling in AD research, showcase and discuss the limitations and challenges involved, and finally propose avenues to address several of these challenges.
Biological systems respond to environmental perturbations and to a large diversity of compounds through gene interactions, and these genetic factors form complex networks. In particular, a wide variety of gene co-expression networks have been constructed in recent years thanks to the dramatic increase in experimental information obtained with techniques such as microarrays and RNA sequencing. These networks allow the identification of groups of co-expressed genes that may function in the same process, and these groups may in turn be related to biological functions of industrial, medical, and academic interest. In this study, gene co-expression networks for 17 bacterial organisms from the COLOMBOS database were analyzed via weighted gene co-expression network analysis (WGCNA), and the genes of each species were clustered into modules with similar expression patterns. Relevant modules were then identified with a hypergeometric approach based on the set of transcription factors and enzymes of each genome. The most enriched modules were characterized using PFAM families and KEGG metabolic maps. Additionally, we conducted a Gene Ontology enrichment analysis of biological functions. Finally, using comparative genomics, we identified modules that shared similarity across all the studied organisms.
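The abstract names a hypergeometric approach without giving its form. The sketch below assumes the standard one-sided over-representation test: a genome of N genes, K of which are annotated as transcription factors (or enzymes), and a module of n genes containing k annotated ones; the p-value is the probability of drawing at least k annotated genes by chance. The function name and the example numbers are hypothetical, not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn).

    N: genes in the genome; K: annotated genes (e.g. transcription factors);
    n: genes in the module; k: annotated genes observed in the module.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical module: 50 genes, 12 of them TFs, in a 4000-gene genome
    # with 300 TFs (about 3.75 TFs expected by chance).
    p = hypergeom_enrichment_p(N=4000, K=300, n=50, k=12)
    print(f"enrichment p-value: {p:.2e}")
```

With one such test per module and annotation set, a multiple-testing correction would normally be applied before calling modules relevant; the abstract does not say which, if any, was used.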
Frontiers in Molecular Biosciences
Function and Dysfunction of Large Bio-Molecules Assemblies: Insights from Multidisciplinary Computational Approaches