- Laboratory of Computer Science and Intelligent Systems, Department of Computer Sciences, Faculty of Sciences Semlalia, Cadi Ayyad University, Marrakesh, Morocco
Anticancer drug design plays a critical role in developing targeted therapies to combat the complexity and heterogeneity of cancer, a leading cause of mortality worldwide. However, the process of discovering and optimizing anticancer drugs is fraught with challenges, including the need to account for genetic variability, drug resistance, and off-target effects. Traditional methods, such as high-throughput screening and structure-based drug design, have advanced the field but often face limitations due to their computational cost, time-intensive nature, and inability to fully capture the dynamic nature of cancer biology. Recent advancements in artificial intelligence (AI), particularly deep learning (DL), have revolutionized drug design, including anticancer drug design, by enabling the analysis of complex biological data, prediction of drug-target interactions, and generation of novel therapeutic compounds. This article provides a comprehensive review of recent advances in anticancer drug design, with a focus on the transformative role of deep learning. While numerous studies have explored deep learning applications in general drug design, specific research focusing on anticancer drug development remains limited. In this context, we highlight the importance of optimizing chemical properties to transform generated molecules into effective therapeutic candidates. Furthermore, real-world applications are examined, and both challenges and future research opportunities are discussed to guide the development of more precise and personalized approaches to anticancer drug discovery.
1 Introduction
Cancer remains one of the most pressing global health challenges, accounting for nearly 10 million deaths annually and representing a leading cause of mortality worldwide (World-Health-Organization, 2020). Its complexity and heterogeneity, driven by genetic mutations, epigenetic alterations, and dynamic tumor microenvironments, have made the development of effective anticancer therapies a formidable task (Bhattacharya et al., 2025). Traditional treatment modalities, such as chemotherapy and radiation, have significantly improved patient outcomes over the years. However, these approaches are often limited by issues such as drug resistance, off-target effects, and an inability to address the unique molecular profiles of individual tumors (El-Tanani et al., 2025). As a result, there is an urgent need for innovative strategies that can overcome these challenges and deliver targeted, personalized therapies to cancer patients.
The process of discovering and optimizing anticancer drugs is inherently complex, time-consuming, and costly. Traditional methods, such as high-throughput screening (HTS) and structure-based drug design (SBDD), have identified numerous potential drug candidates (Siddiqui et al., 2025). However, these approaches are often constrained by their reliance on static models, high computational costs, and limited ability to capture the dynamic nature of cancer biology (Siddiqui et al., 2025). Moreover, the average cost of bringing a new drug to market exceeds $2.8 billion, with a timeline spanning 10–17 years and a success rate of less than 10% in clinical trials (Mulcahy et al., 2025). These challenges underscore the need for more efficient and cost-effective strategies to accelerate anticancer drug discovery.
Beyond costs and timelines, classical HTS pipelines are prone to inflated false-positive rates because they rely on 2D monolayer readouts, while SBDD must navigate the combinatorial complexity of a vast chemical space using static structures that miss microenvironment-dependent conformational changes. In contrast, DL can integrate multi-omics, 3D imaging, and biophysical features to prioritize candidates that remain robust to extracellular matrix (ECM) density, hypoxia, and stromal interactions, while uncertainty-aware ranking can curb over-confident false positives.
A critical bottleneck in translating computational predictions to clinical success lies in the quality and biological relevance of training data. Deep learning models trained on 2D cell culture data often fail to predict drug behavior in vivo, where spatial heterogeneity, extracellular matrix (ECM) barriers, and microenvironmental factors profoundly influence therapeutic efficacy. To address this limitation, advanced 3D biological models, including tumor spheroids, patient-derived organoids (PDOs), and tumor-on-a-chip platforms, are increasingly being integrated into the drug discovery pipeline. These systems generate higher-fidelity data that better capture the complexity of solid tumors, thereby improving the predictive power of deep learning models for penetration, efficacy, and resistance mechanisms (discussed in detail in Section 3.4).
While the use of deep learning models allows for the rapid generation of novel drug candidates, their success as therapeutics depends not only on biological activity but also on favorable chemical and pharmacokinetic properties (Crouse and Leung, 2023). Without systematic optimization of these parameters, many generated compounds may fail in later stages due to poor bioavailability or toxicity. Therefore, the integration of chemical property optimization into de novo drug design is crucial, especially in the context of anticancer therapies where delivery, selectivity, and toxicity profiles are tightly constrained (Prada-Gracia et al., 2016). The detailed framework for property optimization is discussed in Section 4.
In recent years, artificial intelligence (AI), particularly deep learning (DL), has emerged as a transformative tool in drug discovery, offering unprecedented capabilities to analyze complex biological data, predict drug-target interactions, and generate novel therapeutic compounds (Tyagi et al., 2025). Deep learning, a subset of machine learning, leverages multilayered neural networks to model intricate patterns in large datasets, making it particularly well-suited for tasks such as molecular property prediction, virtual screening, and de novo drug design (Hu et al., 2019). Unlike traditional methods, DL-based approaches can integrate diverse data types, including genomic, proteomic, and chemical data, to uncover hidden relationships and generate hypotheses that would be difficult to discern through conventional means (Titilayo and Adetunji, 2025). This has opened new avenues for the development of precision oncology therapies tailored to individual patients and specific cancer subtypes (Delle Cave, 2025).
Despite the growing interest in deep learning for drug discovery, its application to anticancer drug design remains relatively underexplored compared to other therapeutic areas (Desai et al., 2024). While several studies have demonstrated the potential of DL in predicting drug-target interactions, optimizing lead compounds, and identifying biomarkers, there is a lack of comprehensive reviews focusing specifically on its role in anticancer drug development. This gap highlights the need for a deeper understanding of how deep learning can address the unique challenges of cancer biology and contribute to the discovery of next-generation anticancer therapies.
This article provides a focused review of recent advances in deep learning for anticancer drug design. We begin by examining the role of deep learning in de novo drug generation, highlighting its potential to accelerate the discovery process. We then delve into the specific applications of deep learning in anticancer drug discovery, including its integration with cancer biology and its use in predicting molecular properties, virtual screening, and personalized therapies. In addition, we explore the critical role of chemical property optimization in translating AI-generated molecules into clinically viable anticancer drugs. Essential physicochemical guidelines, structural criteria, and case studies are examined to highlight how these properties impact drug-likeness and therapeutic success. Following this, we address the current challenges and opportunities in the field, and conclude by outlining promising future research directions.
2 Deep learning for de novo drug design
De novo drug design refers to the construction of entirely new ligands within a defined receptor pocket by assembling atoms or molecular fragments (Jorgensen, 2004). In recent years, deep learning (DL) has emerged as a transformative tool in this area, capable of generating novel drug candidates by modeling their underlying chemical and physical properties. A key step in this approach is the extraction of meaningful molecular features (Yang et al., 2021). However, challenges arise due to the enormous size of the chemical search space (Polishchuk et al., 2013) and the discontinuous nature of the optimization landscape (Stumpfe and Bajorath, 2012).
Deep learning helps to mitigate these difficulties by automatically learning task-relevant features (LeCun et al., 2015). Early molecular generation methods approached the problem as string generation, with SMILES (Simplified Molecular Input Line Entry System) serving as the primary representation (Kusner et al., 2017). While effective, SMILES has two major drawbacks: a single molecule can be encoded by multiple valid SMILES strings (Arús-Pous et al., 2019), and the linear format does not explicitly capture molecular topology and key chemical properties, so small edits to a string can yield an invalid molecule.
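To make the string representation concrete, the sketch below splits SMILES into the atom-level tokens a sequence model would generate. It assumes the regular expression widely used for SMILES tokenization since the Molecular Transformer work; the name `tokenize_smiles` is hypothetical. Writing ethanol as `CCO` or `OCC` yields two different token sequences for the same molecule, which is the non-uniqueness problem described above.

```python
import re

# Atom-level SMILES tokenizer pattern (illustrative assumption, after the regex
# popularized by the Molecular Transformer). Bracket atoms are matched first, and
# the two-letter halogens Br/Cl are tried before single-letter atoms.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into the tokens a sequence generator would emit."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Ethanol written two valid ways: one molecule, two different token sequences.
    for s in ("CCO", "OCC"):
        print(s, tokenize_smiles(s))
```

Canonicalization with a cheminformatics toolkit, or SMILES enumeration as data augmentation, are the usual ways to cope with this ambiguity.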
The introduction of graph neural networks (GNNs) has significantly improved molecular design, as graph-based representations often outperform SMILES-based approaches (Xu et al., 2019). Traditional graph generative models build molecules atom by atom using nodes and edges (Li et al., 2018), yet this strategy often generates chemically invalid intermediates (Jin et al., 2018). To address this, researchers assemble valid subgraphs in the form of junction trees, where each node represents a chemically consistent substructure (Jin et al., 2018).
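A minimal sketch of why atom-by-atom generation needs chemical checks: the hypothetical `MolGraph` class below grows a graph one atom and bond at a time and rejects any bond that would push an atom past its maximum valence. The valence table is a simplifying assumption (neutral atoms, no formal charges or hypervalent states). Valence masking alone cannot rule out every invalid intermediate, for example a partially built aromatic ring, which is one motivation for assembling chemically valid substructures instead.

```python
# Maximum valences for a few neutral organic elements (simplifying assumption).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


class MolGraph:
    """Hypothetical molecular graph grown atom by atom with a valence check."""

    def __init__(self):
        self.atoms = []   # element symbol for each node index
        self.bonds = {}   # (i, j) with i < j -> bond order

    def add_atom(self, symbol):
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def used_valence(self, idx):
        """Sum of bond orders already attached to atom idx."""
        return sum(order for (i, j), order in self.bonds.items() if idx in (i, j))

    def can_bond(self, i, j, order):
        # Reject self-bonds and duplicate bonds.
        if i == j or tuple(sorted((i, j))) in self.bonds:
            return False
        # Both endpoints must stay within their maximum valence.
        return all(
            self.used_valence(k) + order <= MAX_VALENCE[self.atoms[k]] for k in (i, j)
        )

    def add_bond(self, i, j, order=1):
        """Add a bond only if it keeps the graph valence-valid; report success."""
        if not self.can_bond(i, j, order):
            return False
        self.bonds[tuple(sorted((i, j)))] = order
        return True


if __name__ == "__main__":
    # Build O=C=O, then try (and fail) to attach a fluorine to the saturated carbon.
    g = MolGraph()
    c, o1, o2 = g.add_atom("C"), g.add_atom("O"), g.add_atom("O")
    print(g.add_bond(c, o1, 2), g.add_bond(c, o2, 2), g.add_bond(c, g.add_atom("F")))
```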
Researchers employ a variety of generative architectures for this purpose, including recurrent neural networks (RNNs) (Olurotimi, 1994), autoencoders (AEs), variational autoencoders (VAEs), adversarial autoencoders (AAEs) (Kingma and Welling, 2018; Rezende et al., 2014; Makhzani et al., 2015), and generative adversarial networks (GANs) (Goodfellow et al., 2014). Among them, VAEs and GANs are particularly prominent. VAEs encode molecules into a continuous latent space and then decode optimized candidates from this space. For instance, ChemVAE (Gómez-Bombarelli et al., 2018) generates novel SMILES strings by learning the molecular distribution, while SD-VAE (Dai et al., 2018) enhances ChemVAE by addressing both syntactic and semantic constraints in SMILES.
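The two ingredients that let a VAE learn a smooth molecular latent space fit in a few lines. The sketch assumes an encoder that outputs a mean vector `mu` and a log-variance vector `log_var` per molecule (these names and the optional `beta` weight are illustrative assumptions). It shows the reparameterization trick used for sampling and the closed-form KL divergence to a standard normal prior, not the full ChemVAE or SD-VAE models.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_nll, mu, log_var, beta=1.0):
    """Negative ELBO: reconstruction negative log-likelihood plus beta-weighted KL."""
    return reconstruction_nll + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.1], [-1.0, 0.2]
    print(reparameterize(mu, log_var, rng), kl_to_standard_normal(mu, log_var))
```

Because sampling is rewritten as a deterministic function of `mu`, `log_var`, and external noise, gradients can flow through the encoder; the KL term keeps the latent space close to the prior so that nearby points decode to related molecules.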
GANs, in contrast, rely on the adversarial interplay between a generator and a discriminator. The generator produces candidate molecules, while the discriminator learns to distinguish real from synthetic ones, improving both components over time. Notable examples include ORGAN (objective-reinforced generative adversarial network), which integrates reinforcement learning to generate molecules with specific properties (Guimaraes et al., 2017); MolGAN, which employs annotation matrices and dense adjacency tensors to generate diverse, drug-like compounds (De Cao and Kipf, 2018); and L-Net, which directly produces high-quality 3D drug-like molecules from graph-based models (Li et al., 2021).
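ORGAN's central idea is to give the reinforcement-learning generator a reward that mixes the discriminator's realism score with a property objective, weighted by a tunable factor λ (Guimaraes et al., 2017). The sketch below mirrors that weighted sum alongside the standard discriminator loss; function names are hypothetical, and the objective (for example a drug-likeness score scaled to [0, 1]) is assumed to come from a separate property predictor.

```python
import math


def discriminator_bce(d_real, d_fake, eps=1e-12):
    """Standard GAN discriminator loss: -mean log D(real) - mean log(1 - D(fake))."""
    real = -sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake = -sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real + fake


def organ_style_reward(d_prob, objective, lam=0.5):
    """Blend realism (discriminator probability) with a property objective in [0, 1].

    lam = 1 gives a purely adversarial signal; lam = 0 gives pure objective-driven RL.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * d_prob + (1.0 - lam) * objective


if __name__ == "__main__":
    # A realistic-looking molecule with a mediocre property score.
    print(organ_style_reward(d_prob=0.9, objective=0.3, lam=0.5))
```

Tuning λ trades chemical realism against the target property: too little adversarial weight tends to let the generator exploit the objective with unrealistic structures.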
Finally, recent advances in foundation models highlight how large-scale architectures trained on massive chemical datasets can dramatically accelerate molecular design, producing pharmacologically relevant compounds at unprecedented speed (Swanson et al., 2025).
While de novo design frameworks have revolutionized molecular generation, their translation into oncological contexts requires integration with cancer-specific biological data. The next section explores how deep learning approaches intersect with cancer biology to tailor drug discovery efforts toward oncogenic targets.
3 Deep learning for anticancer drug design
3.1 The role of AI in cancer biology: scope and potential
The rapid development of multiomics technologies (Johnson et al., 2016; Hasin et al., 2017; Kim and Kim, 2018) has revolutionized cancer research, enabling artificial intelligence (AI) to uncover novel therapeutic targets (Vinayagam et al., 2016; do Valle et al., 2018; Yang et al., 2017). Figure 1 categorizes these technologies into five domains: epigenetics, genomics, proteomics, metabolomics, and multiomics integration. Below, we detail their roles in AI-driven oncology.
Figure 1. Artificial intelligence in cancer biology: Integrating multi-omics for therapeutic target identification.
Epigenetics investigates reversible DNA and protein modifications that regulate gene expression without altering the DNA sequence (Perakakis et al., 2018). AI analysis of epigenetic data has elucidated cancer mechanisms and identified therapeutic targets, such as the histone modifiers KDM1A, KDM3A, and EZH2, which are critical in oncogenesis and drug resistance (Wilson and Filipp, 2018; Filipp, 2017). For example, Wilson and Filipp (2018) integrated transcriptomic and epigenetic networks to reveal how these regulators control mitogenic pathways.
Genomics leverages genome-scale sequencing to map genotype-phenotype relationships, biomarkers, and regulatory elements (Holmes et al., 2021; Ozaki et al., 2002). Network-based methods (e.g., gene coexpression, protein interactions) (Lanza et al., 2017; Califano et al., 2012) have enhanced target discovery, as demonstrated by Kori and Yalcin Arga (2018), who identified cervical cancer drivers (CRYAB, PARP1), and Cantini et al. (2015), who used multi-layer genomic networks to pinpoint pancreatic cancer genes (F11R, HDGF) via machine learning (Zhang et al., 2021; 2019; Zhang et al., 2017a).
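As an illustration of the network-based approach, the sketch below builds a simple gene coexpression network from an expression matrix by connecting genes whose absolute Pearson correlation meets a threshold, then ranks genes by degree as candidate hubs. The threshold, the toy data, and the function names are assumptions for illustration; real analyses usually involve more careful thresholding and richer centrality measures.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_hubs(expression, threshold=0.9):
    """Connect genes with |r| >= threshold and return (gene, degree) by descending degree."""
    degree = {gene: 0 for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return sorted(degree.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values across five samples.
    toy = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 4, 3, 2, 1], "D": [1, 3, 2, 5, 4]}
    print(coexpression_hubs(toy))
```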
Proteomics focuses on protein abundance, protein-protein interactions (PPIs), and post-translational modifications (Ong and Mann, 2005; Li et al., 2017). Vinayagam et al. (2016) applied control theory (Kalman, 1963) to PPIs, classifying proteins as indispensable, neutral, or dispensable based on network controllability. This approach revealed 56 cancer-linked genes, 46 of which were novel (Ravindran et al., 2017). Similarly, do Valle et al. (2021) quantified polyphenol-disease protein proximity, linking network topology to therapeutic effects.
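The controllability-based classification can be sketched with the standard maximum-matching result for structural controllability of directed networks: the minimum number of driver nodes is N_D = max(N - |M*|, 1), where M* is a maximum matching. A node is indispensable if its removal increases N_D, dispensable if its removal decreases N_D, and neutral otherwise. The code below is a small, unoptimized illustration using Kuhn's augmenting-path matching, not the authors' implementation, and the function names are hypothetical.

```python
def max_matching(nodes, edges):
    """Maximum matching of the bipartite (out-copy -> in-copy) form of a directed graph."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        # Try to match u, re-routing earlier matches along an augmenting path.
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in matched_by or augment(matched_by[v], seen):
                    matched_by[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in nodes)


def n_driver_nodes(nodes, edges):
    """Minimum driver nodes for structural controllability: max(N - |M*|, 1)."""
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return max(len(nodes) - max_matching(nodes, kept), 1)


def classify_nodes(nodes, edges):
    """Label each node by how its removal changes the driver-node count."""
    all_nodes = set(nodes)
    base = n_driver_nodes(all_nodes, edges)
    labels = {}
    for n in nodes:
        after = n_driver_nodes(all_nodes - {n}, edges)
        labels[n] = "indispensable" if after > base else "dispensable" if after < base else "neutral"
    return labels


if __name__ == "__main__":
    # A directed chain a -> b -> c: removing the middle node splits control.
    print(classify_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```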
Metabolomics profiles metabolites to identify biomarkers and dysregulated pathways (Johnson et al., 2016). Network biology has enabled systems-level insights, such as the work of Basler et al. (2016), who identified metabolic driver reactions that inhibit tumor growth, highlighting their therapeutic potential.
Multiomics integration combines omics layers to model tumor-host interactions (Chakraborty et al., 2018). AI-powered network analysis reveals cross-layer regulatory mechanisms, as shown by Gov et al. (2017), who integrated transcriptomics and interactomes to identify ovarian cancer hubs (GATA2, miR-124-3p). Such approaches provide a holistic view of carcinogenesis (Zhang C. et al., 2017).
These advances set a strong foundation for integrating AI-driven methods, particularly deep learning architectures, into anticancer drug discovery pipelines, where they now play critical roles in target identification, virtual screening, de novo molecule generation, and pharmacokinetic prediction. The following section examines in more depth how deep learning is transforming each phase of the drug development process.
3.2 Applications of deep learning in anticancer drug discovery
Deep learning integration has transformed anticancer drug discovery, directly addressing the high costs, prolonged timelines, and low success rates that characterize traditional drug development. Deep learning (DL), a subfield of artificial intelligence (AI), uses layered neural networks to identify complex, non-linear patterns in large biomedical datasets, which helps make therapeutic development more efficient and targeted. Recent studies have emphasized that AI-driven methods are particularly impactful in oncology, where integrating heterogeneous biological and chemical data accelerates anticancer drug discovery (Pu et al., 2025).
One of the most profound applications of DL in this domain is anticancer target identification. DL models can integrate complex multi-omics data (genomic, transcriptomic, proteomic, and epigenomic) to identify novel cancer-related targets (You et al., 2022). Graph neural networks (GNNs), in particular, can improve target prediction by representing biological entities such as genes, proteins, or patient samples as nodes in a graph and learning from their connections. For instance, the Multi-Omics Graph Convolutional Network (MOGONET) (Wang et al., 2021) integrates heterogeneous omics data to classify breast cancer subtypes and identify top-ranking biomarkers. In oncology, such approaches have already shown promise, with AI frameworks successfully identifying novel cancer-related targets and drug candidates (Pu et al., 2025).
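MOGONET builds on graph convolutional networks. As a minimal sketch, assuming the widely used symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), one GCN layer can be written in plain Python as below; in a MOGONET-style setting, nodes would be patient samples linked by similarity within one omics type. This is a generic layer, not MOGONET's full architecture, and the function name is hypothetical.

```python
import math


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric 0/1 adjacency; features: n x f; weights: f x h (lists of lists).
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # H W: project node features into the hidden space.
    hw = [[sum(features[i][k] * weights[k][c] for k in range(len(weights)))
           for c in range(len(weights[0]))] for i in range(n)]
    # Aggregate normalized neighbor messages, then apply ReLU.
    return [[max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n)))
             for c in range(len(hw[0]))] for i in range(n)]


if __name__ == "__main__":
    # Two connected samples with one feature each and an identity weight.
    print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```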
In de novo drug design, DL-based generative models such as recurrent neural networks (RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), and transformers (e.g., MolGPT) (Lim et al., 2018) can learn the structural features of biologically active molecules and generate novel compounds with desirable pharmacological properties. These models not only accelerate compound generation but can also improve structural novelty and drug-likeness. Illustratively, Insilico Medicine's GENTRL model designed a potent discoidin domain receptor 1 (DDR1) kinase inhibitor in just 21 days, a timeline that traditionally spans years; the lead compound showed favorable preclinical pharmacokinetics and efficacy in fibrosis models (Zhavoronkov et al., 2019). Complementing this, a recently reported, computationally generated oral ectonucleotide pyrophosphatase/phosphodiesterase 1 (ENPP1) inhibitor, designed as a STING pathway modulator, was optimized simultaneously for oral bioavailability, target selectivity, and favorable ADMET; it showed antitumor activity in preclinical xenograft models and is advancing toward IND-enabling studies (Pu et al., 2025).
Deep learning models have been reported to outperform traditional QSAR methods (Bruce et al., 2007) in predicting ADMET properties. Integrating these predictions with chemical property constraints (detailed in Section 4) is essential for translating generated molecules into viable therapeutics.
3.3 Comparison of deep learning approaches for anticancer drug design
Deep learning has enabled diverse strategies for de novo anticancer drug design, each leveraging different molecular representations and generative paradigms. Table 1 summarizes key approaches, their data modalities, evaluation outcomes, and trade-offs. Hybrid architectures such as PaccMannRL, which combine variational autoencoders with reinforcement learning, directly integrate multi-omics profiles (e.g., TCGA, GDSC) with chemical representations to generate molecules tailored to cancer-specific vulnerabilities (Born et al., 2021). In contrast, methods like ACGT and NEVAE focus on SMILES or graph-based encodings to enable smoother latent spaces and chemically valid outputs, making them suitable for library expansion and scaffold exploration (Hong et al., 2019; Samanta et al., 2020).
Sequence-based models, including BiRNN-based BIMODAL and reinforcement learning frameworks like ReLeaSE and MoleGuLAR, emphasize multi-objective optimization, allowing joint tuning of potency, drug-likeness, and pharmacokinetic properties (Grisoni et al., 2020; Popova et al., 2018; Goel et al., 2021). Generative adversarial networks (GANs) such as Mol-CycleGAN and GAN-Drug-Generator are increasingly applied to lead optimization, supporting scaffold hopping and feedback-driven affinity improvement, although they remain limited by training instability and pipeline complexity (Maziarka et al., 2020; Abbasi et al., 2022).
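Multi-objective optimization in reinforcement learning often requires turning several predicted properties into a single scalar reward. A minimal, generic sketch (not the specific reward schemes of ReLeaSE or MoleGuLAR; MoleGuLAR, for instance, alternates between objectives) maps each property onto a [0, 1] desirability and combines them with a weighted geometric mean, so a molecule that fails badly on any one objective receives a low reward. Property ranges, weights, and names are hypothetical.

```python
import math


def desirability(value, low, high):
    """Linearly map value to [0, 1]: 0 at `low`, 1 at `high`.

    Pass low > high for properties where smaller is better (e.g. predicted toxicity).
    """
    if high == low:
        raise ValueError("low and high must differ")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def multi_objective_reward(scores, weights):
    """Weighted geometric mean of desirabilities; any zero score zeroes the reward."""
    if any(scores[k] <= 0.0 for k in weights):
        return 0.0
    total = sum(weights.values())
    return math.exp(sum(w * math.log(scores[k]) for k, w in weights.items()) / total)


if __name__ == "__main__":
    scores = {
        "potency": desirability(7.0, 5.0, 9.0),    # hypothetical predicted pIC50 range
        "drug_likeness": 0.8,                       # e.g. a score already in [0, 1]
        "toxicity": desirability(0.2, 1.0, 0.0),    # lower predicted risk is better
    }
    print(multi_objective_reward(scores, {"potency": 2.0, "drug_likeness": 1.0, "toxicity": 1.0}))
```

The geometric mean is a deliberate choice over a weighted sum: a weighted sum lets a very high potency score mask an unacceptable toxicity score, whereas the geometric mean does not.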
Overall, these models illustrate the balance between exploration (novel scaffolds, chemical space diversity) and exploitation (high binding affinity, cancer-context specificity). The choice of method depends heavily on the research objective: omics-informed reinforcement learning is well suited for precision oncology, while graph- or SMILES-based models remain robust for high-throughput generative tasks and early-stage compound discovery.
However, a persistent challenge in translating these computational advances to clinical impact is the limited biological fidelity of the data used for model training and validation. Most DL models are trained on datasets derived from 2D monolayer cultures or simplified in vitro assays, which fail to recapitulate the spatial organization, ECM complexity, hypoxic gradients, and stromal interactions characteristic of solid tumors. This gap often results in promising in silico candidates that underperform in vivo due to poor penetration, microenvironment-dependent resistance, or unanticipated toxicity. The next section addresses this critical limitation by examining how advanced 3D biological models can provide more predictive training datasets and enable DL models to better anticipate real-world therapeutic performance.
3.4 Advanced 3D biological models for deep learning in anticancer drug design
3.4.1 Rationale: Bridging the 2D-to-3D gap
Rapid advances in in silico design and property optimization have not eliminated a persistent translational bottleneck in oncology. When we train or validate models on 2D monolayers, we systematically underpredict extracellular matrix (ECM) barriers, spatial heterogeneity, and microenvironmental selection pressures. These blind spots inflate false positives and produce fragile leads that falter during preclinical and clinical testing. In contrast, high-fidelity 3D systems, namely multicellular tumor spheroids, patient-derived organoids (PDOs), and tumor-on-a-chip microphysiological platforms, reproduce stromal composition, oxygen and nutrient gradients, pH dynamics, perfusion, and crosstalk between immune cells and cancer-associated fibroblasts. By generating labels that reflect these constraints, 3D systems increase the ecological validity of deep learning (DL) models and strengthen predictions of in vivo outcomes (Guerrero-Aspizua et al., 2020).
3.4.2 3D platforms as higher-fidelity training and validation sources
Tumor spheroids model diffusion-limited penetration and form hypoxic cores that foster ECM-mediated drug tolerance, features that 2D cultures do not capture. Confocal and two-photon z-stacks reveal spatial drug distributions, intraspheroid concentration profiles, and rim-to-core viability gradients. DL architectures such as 3D convolutional neural networks and vision transformers have already leveraged these data to estimate penetration depth, quantify the spatial concordance of cytotoxicity, and map exposure-response surfaces as a function of ECM density.
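Penetration depth, one of the quantities mentioned above, can be derived from a rim-to-core intensity profile. As a minimal sketch, assuming the drug signal decays roughly exponentially with depth (the steady-state profile expected when diffusion is balanced by first-order consumption, with characteristic length sqrt(D/k) for diffusivity D and consumption rate k), a log-linear least-squares fit recovers the characteristic penetration length. The function name and the synthetic profile are illustrative; real z-stack profiles would first need background subtraction and shell-wise averaging.

```python
import math


def penetration_length(depths_um, intensities):
    """Fit I(x) = I0 * exp(-x / lam) by least squares on log-intensity; return lam.

    depths_um: distance from the spheroid rim (micrometres); intensities: > 0.
    """
    ys = [math.log(i) for i in intensities]
    n = len(depths_um)
    mx, my = sum(depths_um) / n, sum(ys) / n
    # Ordinary least-squares slope of log-intensity against depth.
    slope = sum((x - mx) * (y - my) for x, y in zip(depths_um, ys)) / sum(
        (x - mx) ** 2 for x in depths_um
    )
    if slope >= 0:
        raise ValueError("profile does not decay with depth")
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic profile with a 50 um characteristic length.
    depths = list(range(0, 201, 25))
    profile = [1000.0 * math.exp(-x / 50.0) for x in depths]
    print(penetration_length(depths, profile))
```

A scalar like this can serve as a regression label for image-based models, or as a sanity check on their predicted concentration maps.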
PDOs retain the clonal diversity and lineage programs of the source tumors, preserving both inter- and intratumor heterogeneity. Studies increasingly pair high-content imaging with post-treatment bulk or single-cell RNA-seq and spatial transcriptomics. Cross-attention transformers and multimodal fusion networks integrate these modalities to predict efficacy under microenvironmental constraints and to stratify likely responders in difficult indications.
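The fusion step can be illustrated with scaled dot-product cross-attention, in which queries come from one modality (assumed here to be image-patch embeddings) and keys and values from another (assumed to be omics-derived tokens). This minimal sketch omits the learned query, key, and value projections and the multiple heads that a real cross-attention transformer would use; all names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality, keys/values from another.

    queries: m x d (e.g. image patches); keys: k x d and values: k x v (e.g. omics tokens).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Each image query attends over all omics tokens.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]
    omics_keys = [[1.0, 0.0], [0.0, 1.0]]
    omics_values = [[0.9, 0.1], [0.2, 0.8]]
    print(cross_attention(image_tokens, omics_keys, omics_values))
```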
Microfluidic tumor-on-a-chip models recapitulate physiological flow, shear stress, and compartmentalized interactions between stroma and immune cells. Under controlled perfusion, time-lapse imaging produces dynamic sequences amenable to temporal CNNs, recurrent networks, and transformers. These models can capture transport-kill kinetics and in situ adaptation, revealing how flow and gradients shape therapeutic impact.
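A simple scalar summary of such time-lapse data, usable as a regression target for temporal models, is the time-to-effect. The sketch below assumes viability has already been normalized to the untreated control (1.0 means no effect) and uses a hypothetical 50% threshold, returning the first crossing by linear interpolation between imaging time points.

```python
def time_to_effect(times_h, viability, threshold=0.5):
    """First time normalized viability drops below `threshold` (linear interpolation).

    Returns the first time point if viability starts below threshold, None if never.
    """
    if viability[0] < threshold:
        return times_h[0]
    for i in range(1, len(times_h)):
        v0, v1 = viability[i - 1], viability[i]
        if v0 >= threshold > v1:
            t0, t1 = times_h[i - 1], times_h[i]
            # Interpolate between the two frames that bracket the crossing.
            return t0 + (v0 - threshold) * (t1 - t0) / (v0 - v1)
    return None


if __name__ == "__main__":
    # Hypothetical perfusion time course (hours, normalized viability).
    print(time_to_effect([0, 12, 24, 48], [1.0, 0.8, 0.4, 0.2]))
```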
Humanized, tissue-engineered stroma provides a fidelity anchor. By improving tumor engraftment and recapitulating vascularized, collagen-rich ECM architectures, these systems supply stringent substrates for model assessment and increase confidence in translational predictions (Guerrero-Aspizua et al., 2020).
3.4.3 A conceptual framework for incorporating 3D-derived labels into DL pipelines
We propose a closed-loop workflow that connects DL generation with biologically faithful assessment. First, we prioritize candidates in silico using generative models, including VAEs, GANs, and reinforcement learning, and filter them by ADMET properties, permeability, and drug-likeness. Next, we acquire informative 3D readouts: spheroid z-stacks and viability maps to characterize spatial penetration and kill; PDO imaging integrated with single-cell or spatial transcriptomics to contextualize efficacy within heterogeneous microenvironments; and chip-based perfusion time series to quantify dynamic responses under flow.
We then formulate DL tasks that align with the biology. For spatial penetration and kill mapping, we predict drug distribution and cytotoxic effects as functions of depth and ECM density. For efficacy under ECM constraints, we fuse images with omics to forecast responses across variable microenvironments. For temporal transport-kill kinetics, we train sequence models on time-resolved imaging to capture time-to-effect and adaptation. We feed these insights back into the generators by shaping reward functions or losses that penalize shallow penetration and loss of efficacy at depth while rewarding robustness across ECM densities and stromal compositions. Throughout, we quantify uncertainty with ensembles or conformal prediction to flag out-of-distribution organoids or contexts that warrant additional experiments. For reporting, we summarize penetration error as RMSE in
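Two pieces of this loop can be sketched compactly under stated assumptions. The first function is a hypothetical reward that averages predicted efficacy across ECM densities, subtracts a penalty for variability across them, and applies a fixed penalty when predicted penetration falls below an assumed minimum depth; all weights and the 100 µm cutoff are placeholders. The second is standard split conformal prediction: given absolute residuals on a held-out calibration set, the ceil((n + 1)(1 - α))-th smallest residual gives a half-width whose interval covers new points with probability at least 1 - α under exchangeability.

```python
import math
import statistics


def robustness_shaped_reward(efficacy_by_ecm, penetration_um, min_depth_um=100.0,
                             variability_weight=1.0, depth_penalty=0.5):
    """Hypothetical reward for feeding 3D-derived predictions back to a generator.

    efficacy_by_ecm: predicted efficacy (0-1) keyed by ECM-density condition.
    penetration_um: predicted penetration depth in micrometres.
    """
    values = list(efficacy_by_ecm.values())
    # Reward high average efficacy, but penalize efficacy that varies with ECM density.
    reward = statistics.mean(values) - variability_weight * statistics.pstdev(values)
    # Fixed penalty for candidates predicted to stall near the spheroid rim.
    if penetration_um < min_depth_um:
        reward -= depth_penalty
    return reward


def split_conformal_halfwidth(calibration_residuals, alpha=0.1):
    """Half-width q so that [y_hat - q, y_hat + q] has >= 1 - alpha marginal coverage.

    Uses the ceil((n + 1)(1 - alpha))-th smallest absolute calibration residual and
    returns math.inf when the calibration set is too small for the requested level.
    """
    n = len(calibration_residuals)
    # A tiny tolerance keeps floating-point noise from bumping an exact integer up.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf
    return sorted(abs(r) for r in calibration_residuals)[k - 1]


if __name__ == "__main__":
    print(robustness_shaped_reward({"low_ecm": 0.85, "high_ecm": 0.60}, penetration_um=140.0))
    print(split_conformal_halfwidth([0.4, -1.2, 0.8, 2.1, -0.3, 0.9, -0.7, 1.5, 0.2], alpha=0.1))
```

With plain absolute residuals the interval width is identical for every test point, so flagging individual out-of-distribution organoids requires a locally adaptive score, for example residuals divided by the spread of an ensemble's predictions.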
3.4.4 DL for anticipating drug resistance from 3D spatial and transcriptomic data
3D systems expose resistance phenotypes that remain muted in 2D, including hypoxic and quiescent niches, ECM-dependent persistence, and efflux-enriched domains. We can anticipate such resistance along three complementary axes. First, spatial resistance mapping uses graph-based models (graph convolutional and graph attention networks) to link pre-treatment neighborhood structure to post-treatment resistant niches. Hypoxia-associated signatures, epithelial-to-mesenchymal transition programs, and integrin-FAK signaling hubs identified in space can guide combination strategies that co-target bulk disease and resistant subpopulations. Second, multi-task DL trained on scRNA-seq time courses models trajectories from susceptible states to tolerant and resistant phenotypes. Counterfactual simulations then nominate combinations predicted to collapse resistant attractors within defined ECM contexts. Third, computational pathology integrates nonlinear registration, semantic segmentation, and pixel-level classification to align multiplex histology and build heterogeneity heatmaps that mark proliferative zones, angiogenic regions, and immune infiltrates. These maps serve as labels for CNNs that predict co-localization patterns and microenvironmental “safe harbors” where residual disease may persist (Nicolás-Sáenz et al., 2020).
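Graph-based resistance mapping presupposes a graph over cells. A common, simple choice, assumed here, connects each segmented cell centroid to its k nearest neighbors; neighborhood composition (for example, the fraction of hypoxic neighbors) then becomes a node feature or label for graph convolutional or attention layers. The brute-force sketch below is illustrative only; the value of k, the cell labels, and the function names are hypothetical.

```python
import math


def knn_graph(coords, k=2):
    """Undirected k-nearest-neighbor edges over cell centroids (brute force, O(n^2))."""
    edges = set()
    for i, p in enumerate(coords):
        # Sort other cells by distance; ties are broken by index for determinism.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def neighbor_fraction(edges, labels, target):
    """For each cell, the fraction of its graph neighbors labeled `target`."""
    nbrs = {i: [] for i in range(len(labels))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return [sum(labels[j] == target for j in ns) / len(ns) if ns else 0.0 for ns in nbrs.values()]


if __name__ == "__main__":
    cells = [(0, 0), (1, 0), (2, 0), (10, 0)]
    g = knn_graph(cells, k=1)
    print(g, neighbor_fraction(g, ["tumor", "hypoxic", "tumor", "hypoxic"], "hypoxic"))
```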
3.4.5 Evaluation tasks and minimal reporting checklist
We evaluate DL models against tasks that mirror key translational hurdles. For spheroid penetration, we test whether models recover drug distribution depth and spatial cytotoxicity, reporting RMSE in
We also standardize minimal reporting to improve comparability across studies. Authors should describe imaging and omics acquisition parameters, including z- and time-resolution and library preparation protocols; provide ECM composition and porosity measurements; list co-culture components such as CAFs, immune subsets, and endothelial cells; explain how 3D labels shape loss functions or evaluation endpoints; and, when available, include cross-donor external validation in PDOs.
3.4.6 Relation to clinical machine learning in oncology
Clinical machine learning in pathology already demonstrates that spatial context, multiscale analysis, and interpretable features improve diagnostic and prognostic performance. By importing these principles, we enhance DL-driven drug discovery: we register and segment images carefully, quantify spatial statistics rigorously, and prioritize labels that reflect tissue architecture. This alignment increases generalization and clinical relevance (Nicolás-Sáenz et al., 2020).
3.4.7 Summary and transition
Evidence from spheroids, PDOs, and microphysiological chips converges on a clear conclusion: when we train and, critically, evaluate models with 3D-derived, spatially resolved labels, we better anticipate penetration limits imposed by dense ECM, predict whether efficacy is sustained under microenvironmental constraints such as hypoxia, acidity, and stromal interactions, and detect the emergence of spatially localized resistance niches. By consolidating task definitions, metrics, and reporting standards into a common framework, we enable cross-platform comparisons and accelerate the path to in vivo models and clinical settings.
While rigorous 3D validation strengthens biological fidelity, successful translation still depends on physicochemical and pharmacokinetic suitability. Even compelling hits can fail due to poor solubility, unfavorable membrane permeability, or metabolic liabilities. In the next section, we examine how chemical property optimization complements biological validation so that DL-generated molecules are not only efficacious but also pharmaceutically viable.
4 Chemical property optimization in anticancer de novo drug design
While deep learning (DL) has significantly advanced the field of de novo drug design (Wang et al., 2022), the generation of novel molecules is only the first step. For a candidate molecule to succeed as a therapeutic agent, especially in oncology, it must exhibit not only biological efficacy but also favorable chemical, pharmacokinetic, and pharmacodynamic properties (Ernstmeyer and Christman, 2023; Mao et al., 2016). Improving these properties, a process known as chemical property optimization, is therefore an important step in discovering new anticancer drugs. Figure 2 illustrates a conceptual framework for integrating deep learning with property-driven optimization in anticancer drug design. This section explores key principles, guidelines, and case studies relevant to property-driven drug design and discusses how these can be integrated with deep learning approaches.
Figure 2. Deep learning-driven pipeline for anticancer drug design: Focus on chemical property optimization.
4.1 Importance of property optimization in oncology
In anticancer drug design, researchers must ensure that computational methods generate molecules that are synthetically feasible, biologically relevant, and pharmacologically viable (Prada-Gracia et al., 2016; Garg et al., 2024). This requires optimizing multiple chemical properties that govern a drug's absorption, distribution, metabolism, excretion, and toxicity (ADMET) (De Sá and Ascher, 2025). Without this optimization, promising candidates identified by DL algorithms may fail due to poor solubility, low bioavailability, or toxicity (Singh, 2016). Deep learning supports ADMET property optimization through three key mechanisms. First, DL models can often predict ADMET properties more accurately than traditional QSAR methods by learning complex, non-linear structure-property relationships from large datasets (Bruce et al., 2007); graph neural networks (GNNs) and convolutional neural networks (CNNs) process molecular structures directly, capturing subtle features that influence pharmacokinetic behavior. Second, multi-task learning architectures simultaneously predict multiple ADMET endpoints (e.g., solubility, permeability, hepatotoxicity, cardiotoxicity), sharing learned representations across related properties, which can improve prediction accuracy for individual endpoints. Third, modern generative frameworks integrate ADMET predictions directly into the molecule generation process through reinforcement learning and conditional generation, steering designed molecules toward pharmacokinetic constraints from the outset rather than filtering candidates post hoc. Consequently, integrating chemical property constraints into DL frameworks (Guzman-Pando et al., 2024) is essential to bridge the gap between in silico design and real-world therapeutic potential.
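The second mechanism hinges on how missing labels are handled, since few compounds are measured on every ADMET endpoint. As a minimal sketch, assuming a shared network that outputs one probability per binary endpoint (for example, a hepatotoxicity or cardiotoxicity flag), the training loss can simply skip unmeasured entries, marked `None` here, so each endpoint contributes only where data exist. The function name and data layout are hypothetical.

```python
import math


def masked_multitask_bce(predictions, labels, eps=1e-7):
    """Mean binary cross-entropy over all measured (molecule, endpoint) pairs.

    predictions: per-molecule lists of probabilities, one per endpoint.
    labels: matching lists with 0/1 for measured endpoints and None for missing ones.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue  # unmeasured endpoint: contributes no loss or gradient
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two molecules, three endpoints; the second molecule lacks the last label.
    preds = [[0.9, 0.2, 0.7], [0.4, 0.6, 0.5]]
    labs = [[1, 0, 1], [0, 1, None]]
    print(masked_multitask_bce(preds, labs))
```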
4.2 Lipinski’s rules and fragment-based design
One of the most widely used filters in early-stage drug development is Lipinski’s Rule of Five (Ro5) (Pollastri, 2010), which evaluates a compound’s drug-likeness based on four parameters: molecular weight (≤500 Da), logP (≤5), hydrogen bond donors (≤5), and hydrogen bond acceptors (≤10). Compounds that violate more than one of these criteria are more likely to show poor oral absorption. While not absolute, Ro5 provides a useful benchmark for predicting oral bioavailability, as illustrated in Table 2.
In parallel, researchers often apply the Rule of Three (Ro3) in fragment-based drug discovery (FBDD) (Jhoti et al., 2013; Congreve et al., 2003). Proposed by Congreve et al. as a fragment counterpart to Ro5, it focuses on smaller molecules with stricter criteria: molecular weight below 300 Da, logP ≤3, and no more than three hydrogen bond donors and three hydrogen bond acceptors, with ≤3 rotatable bonds and a polar surface area ≤60 Å² suggested as additional filters. These fragments serve as starting points in constructing more complex drug-like molecules, and adherence to Ro3 increases the likelihood of developing leads that will eventually satisfy Ro5 criteria (see Table 2).
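Both rules reduce to simple threshold checks once descriptors are available. The sketch below assumes the descriptor values were computed beforehand (for example with a cheminformatics toolkit such as RDKit, not shown). The `Descriptors` container and the example values are hypothetical. The Ro5 check tolerates one violation, following Lipinski's original criterion.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # Da
    logp: float         # calculated octanol/water partition coefficient
    hbd: int            # hydrogen bond donors
    hba: int            # hydrogen bond acceptors
    rot_bonds: int = 0  # rotatable bonds (used only by the Ro3 check)
    tpsa: float = 0.0   # polar surface area in A^2 (used only by the Ro3 check)


def ro5_violations(d: Descriptors) -> int:
    """Count Lipinski Rule-of-Five violations (0 to 4)."""
    return sum([d.mol_weight > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    # Poor oral absorption is flagged when more than one rule is broken.
    return ro5_violations(d) <= max_violations


def passes_ro3(d: Descriptors) -> bool:
    """Rule of Three for fragments, including the optional rotatable-bond and PSA limits."""
    return (d.mol_weight < 300 and d.logp <= 3 and d.hbd <= 3 and d.hba <= 3
            and d.rot_bonds <= 3 and d.tpsa <= 60)


# Hypothetical descriptor values for a fragment and for a large macrocycle-like compound.
fragment = Descriptors(mol_weight=180.0, logp=1.2, hbd=1, hba=3, rot_bonds=2, tpsa=45.0)
large = Descriptors(mol_weight=850.0, logp=3.5, hbd=4, hba=14, rot_bonds=14, tpsa=220.0)
```

Here the fragment satisfies both rules, whereas the large compound breaks two Ro5 limits (molecular weight and acceptors). Many anticancer agents also fall outside these limits, which is why such checks are often used as soft preferences rather than hard filters.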
Deep learning applications to drug-likeness optimization: Generative models incorporate Ro5 and Ro3 criteria as soft or hard constraints during molecular design. For instance, reinforcement learning frameworks use Lipinski compliance as a reward component, guiding molecule generation toward drug-like chemical space. Variational autoencoders (VAEs) learn latent representations in which drug-like molecules cluster together, enabling conditional sampling from drug-like regions. Junction tree VAEs (JT-VAEs) assemble molecules from a vocabulary of chemically valid substructures such as rings and bonds, which parallels the fragment-based strategy of growing Ro3-compliant fragments into Ro5-compliant structures. This integration helps ensure that generated molecules begin with appropriate physicochemical foundations.
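To show how Lipinski compliance can act as one component of a reinforcement learning reward, here is a deliberately tiny sketch. The "generator" is a categorical policy over three hypothetical, precomputed candidates rather than a real molecular model. The 0.6/0.4 reward weights are arbitrary assumptions. The policy is trained with REINFORCE and a running-mean baseline.

```python
import math
import random

# Hypothetical candidates: (predicted activity in [0, 1], MW, logP, HBD, HBA).
CANDIDATES = [
    (0.90, 720.0, 6.1, 6, 12),  # most potent, but breaks all four Ro5 limits
    (0.75, 430.0, 3.2, 2, 6),   # potent and Ro5-compliant
    (0.30, 310.0, 2.0, 1, 4),   # compliant but weakly active
]


def lipinski_score(mw, logp, hbd, hba):
    violations = sum([mw > 500, logp > 5, hbd > 5, hba > 10])
    return 1.0 - violations / 4.0  # 1.0 = fully compliant, 0.0 = all four rules broken


def reward(candidate, w_activity=0.6, w_druglike=0.4):
    activity, *desc = candidate
    return w_activity * activity + w_druglike * lipinski_score(*desc)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train(steps=4000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward(CANDIDATES[a])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)  # running-mean baseline reduces variance
        # REINFORCE: gradient of log softmax(a) w.r.t. the logits is onehot(a) - probs
        for i in range(len(logits)):
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(logits)


final_probs = train()
```

The most potent candidate scores 0.54 because it earns no drug-likeness credit, while the compliant, potent one scores 0.85. As a result, the policy concentrates on the latter. A real system would replace the fixed candidate list with a sequence or graph generator and the activity column with a learned predictor.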
In oncology, exceptions to these rules are frequent. For example, some anticancer agents exceed typical molecular weight limits but retain efficacy due to specific mechanisms or delivery systems. Therefore, a flexible but informed application of these rules is recommended when designing drugs for complex diseases like cancer (Shultz, 2019). Deep learning models trained specifically on anticancer drug databases can learn these context-dependent exceptions, enabling adaptive application of drug-likeness rules while maintaining therapeutic relevance.
4.3 Structural and physicochemical criteria
Beyond Ro5 and Ro3, multiple structural and physicochemical features (Davis and Leeson, 2023) shape drug-likeness and ultimately therapeutic success, as summarized in Table 3. In contemporary pipelines, deep learning models increasingly optimize these features in concert. For membrane permeability, researchers treat topological polar surface area (TPSA) as a primary lever: values below
Table 3. Structural and physicochemical criteria for anticancer drugs (Mao et al., 2016).
Lipophilicity (logP/logD) exerts a parallel influence on permeability and exposure (Bhal, 2007). Excessive lipophilicity can erode aqueous solubility and inflate off-target risks, whereas excessive hydrophilicity can suppress cellular uptake. Deep neural networks trained on large-scale libraries (e.g., ChEMBL, PubChem) predict logP/logD with mean absolute errors below
Solubility remains a gatekeeper for absorption (Savjani et al., 2012). Ensemble models that fuse convolutional and recurrent encoders over SMILES achieve
Size-related descriptors such as molecular weight and heavy atom count also modulate drug-like behavior (Biala et al., 2023). Smaller molecules often permeate membranes more readily and show favorable bioavailability, while high heavy atom counts can reduce ligand efficiency and raise metabolic liability. Conditional variational autoencoders (cVAEs) enable explicit control of molecular weight by conditioning generation on target ranges, and multi-objective schemes optimize ligand efficiency (LE
Functional groups and ring systems further refine drug-likeness in domain-specific ways (Mao et al., 2016). Anticancer agents can often tolerate up to three aromatic rings that provide rigidity and target-binding interactions. Models preferentially select groups such as –
4.4 Deep learning enhancement of ADMET properties
Deep learning (DL) models are increasingly reported to outperform traditional QSAR approaches (Bruce et al., 2007) in predicting ADMET properties, and combining them with the physicochemical constraints introduced earlier helps translate generated structures into viable therapeutics. In absorption modeling, multi-task networks jointly learn human intestinal absorption (HIA%), Caco-2 permeability, and P-glycoprotein (P-gp) efflux liability. Attention mechanisms surface patterns in hydrogen bonding and conformational flexibility that govern barrier crossing, and generative models use these signals to avoid efflux substrates while maintaining permeability. In practice, models learn that moderating flexibility and positioning charged groups strategically can reduce P-gp recognition and thereby improve oral bioavailability for anticancer agents.
Distribution predictions follow naturally from these absorption-aware designs. Deep regression networks estimate volume of distribution and tissue penetration, while specialized classifiers infer blood–brain barrier permeability from 3D conformational features, with reported accuracies above 90%, a capability relevant to CNS oncology. In parallel, neural predictors of plasma protein binding to albumin and
Metabolism remains a central arena for DL impact. Multi-class classifiers attribute candidate molecules to specific cytochrome P450 isoforms and localize sites of metabolism with atom-level resolution, while regression models infer hepatic clearance and half-life directly from structure using microsomal stability data. The same architectures forecast CYP inhibition and induction, a crucial safeguard given the polypharmacy common in oncology. Reinforcement learning agents exploit these signals to introduce or shield metabolic soft spots, tuning clearance toward therapeutic optima that balance exposure with timely elimination.
Excretion and toxicity complete the ADMET picture. Classifiers predict predominant elimination routes (renal, biliary, metabolic) and, when trained on transporter substrate datasets, estimate renal clearance as a combination of glomerular filtration and active secretion, information that guides dosing in patients with renal impairment. Multi-task toxicity models simultaneously estimate hERG blockade (cardiotoxicity), drug-induced liver injury risk, Ames mutagenicity, and carcinogenicity. Attention layers highlight the toxicophores responsible for these liabilities, enabling structure-guided detoxification, while transfer learning helps bridge animal-to-human translation. Generative models, including adversarially trained ones, extend this paradigm by proposing structural edits that reduce predicted toxicity without eroding potency, effectively designing out safety risks earlier in the pipeline.
Modern DL pipelines therefore embed ADMET constraints throughout molecular generation rather than applying them as post hoc filters. PaccMannRL illustrates this direction: it conditions molecule generation on cancer transcriptomic profiles and optimizes candidates by reinforcement learning against a reward derived from predicted drug sensitivity (Born et al., 2021). Such a loop can be extended with multi-objective rewards spanning binding affinity, synthetic accessibility, and ADMET desiderata, so that candidates are optimized across these axes together. This holistic approach can reduce attrition due to poor pharmacokinetics and accelerate the path from in silico design to clinical candidacy.
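Microsomal stability data are typically scaled to an intrinsic clearance (CL_int) and then converted to hepatic clearance with a liver model. A commonly used choice is the well-stirred model, CL_H = Q_H · fu · CL_int / (Q_H + fu · CL_int). Half-life then follows from clearance and volume of distribution, t½ = ln 2 · Vd / CL, assuming one-compartment kinetics. The sketch below shows how predicted quantities would combine. It assumes the model predicts whole-liver CL_int, and it ignores blood-to-plasma partitioning. The hepatic blood flow of about 90 L/h and all other inputs are illustrative assumptions, not values taken from this article.

```python
import math


def hepatic_clearance_well_stirred(cl_int: float, fu: float, q_h: float) -> float:
    """Well-stirred liver model: CL_H = Q_H * fu * CL_int / (Q_H + fu * CL_int).

    cl_int -- intrinsic clearance scaled to the whole liver (L/h)
    fu     -- unbound fraction in blood (0-1)
    q_h    -- hepatic blood flow (L/h)
    """
    return q_h * fu * cl_int / (q_h + fu * cl_int)


def half_life(volume_l: float, clearance_l_per_h: float) -> float:
    """One-compartment elimination half-life in hours: t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * volume_l / clearance_l_per_h


# Illustrative inputs: predicted CL_int = 100 L/h, fu = 0.1, Q_H ~ 90 L/h, Vd = 100 L.
cl_h = hepatic_clearance_well_stirred(cl_int=100.0, fu=0.1, q_h=90.0)  # 9.0 L/h
t_half = half_life(volume_l=100.0, clearance_l_per_h=cl_h)             # about 7.7 h
```

The model also encodes a useful limit. As fu · CL_int grows large, hepatic clearance approaches blood flow, so further reductions in metabolic stability barely change clearance for high-extraction compounds.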
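The combination of glomerular filtration and active secretion mentioned above is often written as CL_R = (fu · GFR + CL_sec) · (1 − F_reabs). The sketch below illustrates how a predicted secretion term would feed into a renal clearance estimate and why impaired filtration matters for dosing. It holds secretion constant when GFR falls, which is a simplifying assumption, and all numeric inputs (including a GFR of about 120 mL/min as "normal") are illustrative only.

```python
def renal_clearance(fu: float, gfr: float, cl_secretion: float,
                    f_reabsorbed: float = 0.0) -> float:
    """Mechanistic renal clearance in the same units as gfr (e.g. mL/min).

    Filtration of unbound drug plus active tubular secretion, reduced by the
    fraction reabsorbed: CL_R = (fu * GFR + CL_sec) * (1 - F_reabs).
    """
    if not 0.0 <= fu <= 1.0 or not 0.0 <= f_reabsorbed <= 1.0:
        raise ValueError("fractions must lie in [0, 1]")
    return (fu * gfr + cl_secretion) * (1.0 - f_reabsorbed)


normal = renal_clearance(fu=0.5, gfr=120.0, cl_secretion=20.0)    # 80 mL/min
impaired = renal_clearance(fu=0.5, gfr=30.0, cl_secretion=20.0)   # 35 mL/min
```

Under linear pharmacokinetics, average steady-state exposure scales with dose rate divided by clearance. A drop from 80 to 35 mL/min therefore implies a proportionally lower maintenance dose when renal elimination dominates.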
4.5 Anticancer drug properties
Beyond general pharmacokinetic profiling (see section 4.1), evaluation of anticancer-specific properties is vital for prioritizing candidates based on cytotoxic potential, target selectivity, and resistance mechanisms. A summary is provided in Table 4, and key details are described below.
Cell cycle arrest represents a common strategy in anticancer therapy, in which compounds block progression at critical checkpoints such as G1/S or G2/M, thereby suppressing proliferation. CDK inhibitors exemplify this approach (Iwaloye et al., 2023; Pozarowski and Darzynkiewicz, 2004). Complementing this mechanism, apoptosis induction is another hallmark of effective anticancer agents, often mediated via intrinsic mitochondrial or extrinsic receptor pathways. Imatinib triggers apoptosis in leukemic cells by inhibiting BCR-ABL, and natural compounds such as epigallocatechin gallate (EGCG) promote apoptosis through modulation of Bcl-2 family proteins (Iwaloye et al., 2023; Elmore, 2007). Angiogenesis inhibition is similarly critical: it prevents tumor vascularization by interfering with VEGF/VEGFR signaling. Small molecules such as Axitinib and natural products like resveratrol restrict blood supply to tumors through this mechanism (Iwaloye et al., 2023; Staton et al., 2009).
Binding affinity, typically quantified through metrics such as IC50, Kd, and Ki, remains a central parameter for characterizing drug–target interactions. Small molecules such as Gefitinib and Imatinib achieve strong binding to EGFR and BCR-ABL, respectively, and researchers can predict these properties computationally using QSAR models with RDKit descriptors and machine learning methods (Wang et al., 2023; Landrum, 2006; Lionta et al., 2014). Beyond mere affinity, selectivity proves vital to ensure that compounds preferentially act on cancer-specific targets, such as mutant EGFR, with reduced toxicity toward wild-type proteins. Researchers have successfully modeled such differential activity using multi-task machine learning approaches trained on mutant versus wild-type bioassays (Wang et al., 2023; Shoemaker, 2006).
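Because IC50, Kd, and Ki span several orders of magnitude, models usually work on a log scale, and several derived quantities recur in affinity and selectivity modeling. The helpers below are a minimal sketch of four of them:

- pIC50 (−log10 of IC50 in mol/L).
- The Cheng–Prusoff conversion from IC50 to Ki, which assumes competitive inhibition with Michaelis–Menten kinetics.
- The common approximation LE ≈ 1.37 · pIC50 / N_heavy kcal/mol per heavy atom, at roughly room temperature.
- A fold-selectivity ratio, such as wild-type IC50 over mutant IC50.

All numbers in the usage lines are hypothetical.

```python
import math


def pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); input given in nanomolar."""
    return -math.log10(ic50_nm * 1e-9)


def cheng_prusoff_ki(ic50: float, substrate_conc: float, km: float) -> float:
    """Ki for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km), same units as ic50."""
    return ic50 / (1.0 + substrate_conc / km)


def ligand_efficiency(ic50_nm: float, heavy_atoms: int) -> float:
    """Approximate LE in kcal/mol per heavy atom: 1.37 * pIC50 / N_heavy."""
    return 1.37 * pic50(ic50_nm) / heavy_atoms


def selectivity_fold(ic50_offtarget_nm: float, ic50_target_nm: float) -> float:
    """Fold selectivity, e.g. wild-type IC50 / mutant IC50 (> 1 favours the mutant)."""
    return ic50_offtarget_nm / ic50_target_nm


# Hypothetical compound: IC50 = 10 nM on a mutant kinase, 500 nM on the wild type,
# 30 heavy atoms; assay run at [S] = Km.
p = pic50(10.0)                          # 8.0
ki = cheng_prusoff_ki(10.0, 1.0, 1.0)    # 5.0 nM
le = ligand_efficiency(10.0, 30)         # about 0.37
fold = selectivity_fold(500.0, 10.0)     # 50-fold
```

Predicting pIC50 separately for mutant and wild-type assays, for instance with the multi-task models mentioned above, and then taking the difference on the log scale is equivalent to modeling log fold selectivity.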
Anti-metastatic potential relates to the inhibition of migration and invasion, processes that contribute substantially to cancer lethality. Studies report that quercetin interferes with
Cancer stem cell (CSC) targeting aims to eradicate CSC populations that drive relapse and resistance. Agents such as salinomycin and curcumin act through Wnt, Notch, or Hedgehog signaling pathways (Dontu et al., 2003), though computational prediction remains challenging due to limited datasets. Tumor microenvironment (TME) modulation considers the roles of stromal cells, hypoxia, acidity, and immune evasion in supporting tumor progression. Evofosfamide, a hypoxia-activated prodrug, exemplifies this strategy. While computational proxies exist for HIF-1
Together, these properties provide a structured framework for the in silico and experimental evaluation of anticancer candidates. While computational methods such as machine learning, QSAR, and docking can predict some features, others, including metastasis inhibition, CSC targeting, and microenvironment modulation, require direct experimental confirmation.
4.6 Case studies in anticancer drug design
Real-world examples of successful anticancer drugs highlight how chemical property optimization directly affects efficacy and deliverability, as illustrated in Table 5. Paclitaxel (Taxol) (Chunarkar-Patil et al., 2024; Markman and Mekhail, 2002), a natural product that stabilizes microtubules and disrupts mitotic spindle formation, demonstrates the critical interplay between potency and formulation. Despite its remarkable efficacy, paclitaxel required significant formulation innovations to overcome poor aqueous solubility, ultimately leading to specialized delivery systems such as Cremophor EL-based formulations and, later, albumin-bound nanoparticle formulations.
In contrast, Imatinib (Gleevec) (Prada-Gracia et al., 2016) exemplifies rational drug design with a strong emphasis on both selectivity and oral bioavailability. As a tyrosine kinase inhibitor targeting the BCR-ABL fusion protein, Imatinib was developed through systematic optimization of chemical properties to achieve potent on-target activity while minimizing off-target effects. Its favorable pharmacokinetic profile enabled convenient oral administration, revolutionizing the treatment of chronic myeloid leukemia. Fluorouracil (Xiao et al., 2016; Pinedo and Peters, 1988), an antimetabolite that mimics the pyrimidine uracil, illustrates how a single well-placed substituent, a fluorine atom at the C-5 position, enables interference with DNA synthesis pathways. By exploiting its structural similarity to natural nucleotides, Fluorouracil disrupts thymidylate synthase activity and becomes incorporated into RNA and DNA, leading to cytotoxic effects.
Topotecan (Xiao et al., 2016), a topoisomerase I inhibitor, further demonstrates the importance of balancing multiple design objectives. This camptothecin derivative was engineered to achieve potent enzyme inhibition while maintaining manageable toxicity profiles and sufficient pharmacokinetic performance for clinical utility. Its development involved careful optimization of the lactone ring system to enhance stability under physiological conditions while preserving DNA-binding activity.
These examples collectively underscore that even the most potent molecules must be carefully tuned for ADMET characteristics to succeed clinically. The translation from laboratory discovery to clinical application invariably requires optimization across multiple dimensions, including solubility, permeability, metabolic stability, and target selectivity.
4.7 Integrating property optimization with deep learning
Modern DL frameworks increasingly incorporate chemical property constraints into the molecular generation process. Researchers now commonly use techniques such as reinforcement learning with reward functions based on ADMET scores, transfer learning from property-annotated datasets, and multi-objective optimization with graph neural networks.
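One way to fold several ADMET scores into a single reward is a desirability function. In this approach, each predicted property is mapped to [0, 1], and the results are combined with a weighted geometric mean, so that a single unacceptable property drives the total to zero. The sketch below assumes linear ramp desirabilities. The property names, bounds, and equal weights are hypothetical, and the predicted values would come from upstream models not shown here.

```python
import math
from typing import Callable, Dict


def ramp(low: float, high: float, increasing: bool = True) -> Callable[[float], float]:
    """Linear desirability: 0 at one bound, 1 at the other, clipped outside [low, high]."""
    def d(x: float) -> float:
        t = (x - low) / (high - low)
        t = min(max(t, 0.0), 1.0)
        return t if increasing else 1.0 - t
    return d


def geometric_desirability(props: Dict[str, float],
                           funcs: Dict[str, Callable[[float], float]],
                           weights: Dict[str, float]) -> float:
    """Weighted geometric mean of per-property desirabilities (0 if any property scores 0)."""
    total_w = sum(weights[name] for name in funcs)
    log_sum = 0.0
    for name, f in funcs.items():
        d = f(props[name])
        if d <= 0.0:
            return 0.0  # a single unacceptable property vetoes the molecule
        log_sum += weights[name] * math.log(d)
    return math.exp(log_sum / total_w)


# Hypothetical objectives: predicted potency, hERG-blocker probability, and log solubility.
funcs = {
    "pic50": ramp(5.0, 9.0),                        # higher potency is better
    "herg_prob": ramp(0.0, 1.0, increasing=False),  # lower cardiotoxicity risk is better
    "log_s": ramp(-6.0, -2.0),                      # higher solubility is better
}
weights = {"pic50": 1.0, "herg_prob": 1.0, "log_s": 1.0}
score = geometric_desirability({"pic50": 8.0, "herg_prob": 0.2, "log_s": -3.0}, funcs, weights)
```

Compared with a weighted sum, the geometric mean prevents an exceptional potency score from compensating for a disqualifying toxicity prediction. This property suits it to serve as a reinforcement learning reward or a ranking criterion.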
For example, models such as MolGPT (Lim et al., 2018) and GENTRL (Ma et al., 2015) not only generate novel molecules but can also steer generation toward structures with desirable physicochemical and pharmacokinetic profiles. In anticancer drug discovery, this integration allows researchers to generate compounds that are both structurally novel and more likely to be clinically viable, with the aim of reducing downstream failure rates.
5 Challenges and opportunities
The application of deep learning (DL) in anticancer drug design has shown promising results, yet it remains a relatively underexplored area. One of the key challenges lies in the limited number of scientific articles and research efforts dedicated specifically to anticancer drug design using DL techniques. Several factors contribute to this limitation, including the complexity of cancer as a disease (Sugihara and Saya, 2013), the scarcity of well-curated, labeled biomedical datasets, and the interdisciplinary expertise required to merge oncology, chemistry, and machine learning (Pandiyan and Wang, 2022).
Moreover, data quality and availability present significant obstacles. Anticancer drug discovery depends on large-scale biological and chemical datasets, which often suffer from inconsistency, noise, or missing values (Bajorath, 2022). Additionally, the lack of standardization across datasets and experimental protocols hampers reproducibility and model generalization (Bajorath, 2022). Several strategies are emerging to address these challenges. First, data augmentation techniques can synthetically expand limited datasets while preserving chemical validity, through methods such as SMILES enumeration, molecular graph perturbations, and scaffold-based generation. Second, federated learning approaches enable model training across distributed datasets without sharing raw data, allowing institutions to collaborate while maintaining regulatory compliance. Third, active learning frameworks strategically select the most informative experiments to maximize data efficiency, reducing the number of costly wet-lab validations required. Furthermore, community-driven initiatives such as the Therapeutics Data Commons (Huang et al., 2021) are establishing standardized benchmarks and data curation protocols that improve reproducibility and facilitate model comparison across studies. Another limitation is the difficulty of harmonizing heterogeneous data modalities, ranging from molecular structures and omics profiles to clinical outcomes, which are crucial for capturing the full complexity of anticancer drug response. Integration frameworks that employ multi-modal learning architectures and attention mechanisms show promise in addressing this challenge.
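In federated learning, each institution trains on its own data and shares only model parameters, which a coordinating server aggregates. Federated Averaging (FedAvg) is one widely used aggregation rule: it weights each client's parameters by the size of its local dataset. The sketch below shows only that aggregation step. It assumes parameters are flattened into plain lists and omits local training, secure aggregation, and communication. The client values are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector per client")
    n_total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, wi in enumerate(w):
            global_w[i] += (n / n_total) * wi
    return global_w


# Two hypothetical hospitals with 100 and 300 local samples.
global_params = federated_average([[1.0, 0.0], [3.0, 4.0]], [100, 300])  # [2.5, 3.0]
```

In each round, the server broadcasts the averaged parameters back to the clients, which continue training locally. Raw patient or assay records never leave their institution, although the shared parameters can still leak information unless they are combined with additional privacy safeguards.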
Another critical challenge is the black-box nature of deep learning models (Chamorey et al., 2024). Understanding and interpreting model decisions remains difficult, which is particularly problematic in high-stakes fields such as oncology. Without interpretability, gaining trust from domain experts and regulatory bodies becomes harder, thereby slowing real-world adoption. Even when predictions are accurate, the lack of mechanistic insight into drug–target interactions or toxicity pathways limits their translational impact. To enhance model transparency, several explainable AI (XAI) techniques are being adapted for drug discovery. Attention mechanisms can highlight which molecular substructures drive predictions, enabling chemists to understand model reasoning at a granular level. Gradient-based attribution methods, such as integrated gradients and saliency maps, identify the features that most influence model outputs. Rule extraction approaches translate neural network decisions into human-interpretable chemical rules, bridging the gap between statistical correlations and domain knowledge. Furthermore, hybrid models that combine interpretable symbolic reasoning with deep learning are emerging as promising solutions, offering both predictive accuracy and mechanistic insight. For instance, neuro-symbolic architectures can encode chemical knowledge graphs alongside learned representations, making predictions more transparent and chemically grounded. These interpretability enhancements can support regulatory review and also enable researchers to identify and correct systematic biases or errors in model reasoning.
Generalization to novel data is also a hurdle. Many DL models perform well on training datasets but fail when applied to new, unseen compounds or biological targets (Tang, 2023). This limits their practical usefulness in drug discovery pipelines, where predicting efficacy across diverse patient profiles and cancer subtypes is essential. In particular, chemical property optimization, such as balancing potency with ADMET characteristics, remains difficult, since most generative models focus on structural novelty without fully integrating pharmacokinetic constraints. Researchers are employing several strategies to improve model robustness. Transfer learning leverages foundation models pre-trained on massive chemical datasets, enabling rapid adaptation to specific anticancer tasks with limited labeled data. Multi-task learning approaches share knowledge across related prediction tasks, such as jointly modeling toxicity, solubility, and binding affinity, which improves generalization by capturing shared underlying patterns. Domain adaptation techniques adjust models for new cancer types or molecular scaffolds through methods such as adversarial training or fine-tuning on target-domain data. Additionally, uncertainty quantification methods are being integrated to flag predictions in which models lack confidence, enabling more informed experimental prioritization. Bayesian neural networks, ensemble methods, and conformal prediction frameworks can provide calibrated uncertainty estimates or prediction intervals, helping researchers distinguish between reliable predictions and those requiring experimental validation. These approaches can reduce failure rates in downstream testing and guide efficient resource allocation in drug discovery campaigns.
Despite these challenges, several opportunities are emerging. With the growth of open-access databases, improved data sharing practices (Huang et al., 2021), and advances in techniques such as transfer learning and graph neural networks, DL models are becoming increasingly powerful and adaptable. Additionally, efforts to develop explainable AI (XAI) can enhance transparency and trust in model predictions (Chamorey et al., 2024; Dwivedi et al., 2023). Recent progress in foundation models trained on massive chemical and biological corpora also offers the potential for cross-domain transfer, in which models are fine-tuned for cancer-specific tasks with limited labeled data. Models such as ChemBERTa, MolGPT, and Uni-Mol suggest that large-scale pretraining on chemical structures can support few-shot learning, substantially reducing the data requirements for specialized applications.
Integrating DL frameworks with experimental validation pipelines offers another promising direction. By closely linking computational predictions with wet-lab experiments, researchers can iteratively refine models and accelerate the drug development process. Closed-loop systems that combine robotic synthesis, high-throughput screening, and automated retraining of predictive models are beginning to demonstrate the feasibility of autonomous drug discovery platforms. Furthermore, personalized and precision cancer therapies stand to benefit from DL’s ability to learn complex, non-linear patterns in multimodal data. Combining patient-specific genomic information with DL-driven compound design could enable anticancer drugs to be tailored to individual molecular profiles, pushing the field closer to true precision oncology. Patient stratification models that integrate transcriptomic signatures, mutational landscapes, and drug response data can identify the subpopulations most likely to benefit from specific therapeutic interventions, optimizing both efficacy and safety outcomes.
In conclusion, while deep learning in anticancer drug design still faces several barriers, including limited domain-specific research, data-related issues, and interpretability concerns, it holds transformative potential. Addressing these challenges through interdisciplinary collaboration, better data practices, and explainable, robust modeling approaches will be key to unlocking its full impact. The strategies outlined above (data augmentation, federated learning, active learning, XAI techniques, transfer learning, and uncertainty quantification) provide concrete pathways forward. Ultimately, the successful integration of DL into anticancer drug pipelines will depend on bridging computational innovation with chemical property optimization, biological validation, and clinical translation. As these methods mature and regulatory frameworks adapt, AI-driven drug discovery has the potential to fundamentally reshape oncology therapeutics, delivering more effective, personalized treatments at unprecedented speed and scale.
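Integrated gradients, one of the attribution methods named above, averages the model's gradient along a straight path from a baseline input to the actual input. It then multiplies that average by the input difference, so the per-feature attributions sum to f(x) − f(baseline). The sketch below applies it to a toy logistic "toxicity" model whose gradient is known analytically. The weights, bias, and substructure-count features are hypothetical. With a deep network, the gradient would instead come from automatic differentiation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Toy logistic classifier over (hypothetical) substructure-count features."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def integrated_gradients(w, b, x, baseline, steps: int = 200) -> List[float]:
    """Midpoint-rule approximation of integrated gradients for the logistic model.

    For f(x) = sigmoid(w.x + b), the gradient is df/dx_i = f(x) * (1 - f(x)) * w_i.
    """
    attributions = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each interval along the straight path
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        p = predict(w, b, point)
        for i in range(len(x)):
            grad_i = p * (1.0 - p) * w[i]
            attributions[i] += (x[i] - baseline[i]) * grad_i / steps
    return attributions


w, b = [2.0, -1.0, 0.5], -1.0
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(w, b, x, baseline)
```

Checking that the attributions sum to the change in prediction (the completeness property) is a quick sanity test before the scores are mapped back onto atoms or substructures for chemists to inspect.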
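Conformal prediction, mentioned above, can wrap any fitted regressor, such as a model predicting pIC50 or aqueous solubility. In the split variant, the model's absolute residuals on a held-out calibration set are used to set a single interval radius. Under the assumption that calibration and test compounds are exchangeable, the resulting intervals cover the true value with probability at least 1 − α on average. The sketch below assumes those residuals are already computed, and the example values are hypothetical. The exchangeability assumption is exactly what breaks under strong scaffold or target shift.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(cal_residuals: Sequence[float], alpha: float) -> float:
    """Split-conformal radius: the ceil((n + 1)(1 - alpha))-th smallest absolute residual."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(abs(r) for r in cal_residuals)[k - 1]


def conformal_intervals(preds: Sequence[float], cal_residuals: Sequence[float],
                        alpha: float = 0.1) -> List[Tuple[float, float]]:
    """Symmetric prediction intervals of the same width around every point prediction."""
    q = conformal_quantile(cal_residuals, alpha)
    return [(p - q, p + q) for p in preds]


# Hypothetical calibration residuals (observed minus predicted) for 10 compounds.
residuals = [0.5, -0.2, 0.1, 0.9, -0.4, 0.3, 0.7, -0.6, 0.8, 0.05]
intervals = conformal_intervals([6.0], residuals, alpha=0.2)  # roughly (5.2, 6.8)
```

Wide intervals, or an infinite radius when calibration data are scarce, are themselves informative. They mark predictions that should be prioritized for experimental validation rather than trusted outright.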
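In a closed loop, after each round of synthesis and testing, the model is retrained and an acquisition rule decides which compounds to make next. One simple active learning rule is uncertainty sampling by ensemble disagreement: train several models independently and send the compounds on which they disagree most to the lab. The sketch below shows only that selection step. The compound IDs and predictions are hypothetical, and other acquisition functions (for example, expected improvement) are equally common.

```python
import statistics
from typing import Dict, List, Sequence


def select_batch(ensemble_preds: Dict[str, Sequence[float]], batch_size: int) -> List[str]:
    """Pick the compounds on which ensemble members disagree most (largest std. dev.).

    ensemble_preds maps a compound ID to predictions from independently trained models.
    """
    spread = {cid: statistics.pstdev(p) for cid, p in ensemble_preds.items()}
    return sorted(spread, key=spread.get, reverse=True)[:batch_size]


# Hypothetical pIC50 predictions from a three-member ensemble.
preds = {
    "cmpd-A": [7.1, 7.0, 7.2],
    "cmpd-B": [5.0, 7.5, 6.0],
    "cmpd-C": [6.0, 6.1, 5.9],
    "cmpd-D": [4.0, 8.0, 6.5],
}
next_round = select_batch(preds, batch_size=2)  # ["cmpd-D", "cmpd-B"]
```

Pure uncertainty sampling explores. Practical campaigns often blend it with predicted potency so that each batch also contains compounds likely to be active.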
6 Regulatory and ethical considerations in AI-Driven oncology drug discovery
The integration of AI into anticancer drug discovery raises important regulatory and ethical considerations that must be addressed to ensure responsible deployment.
6.1 Regulatory frameworks
Current regulatory pathways (e.g., FDA, EMA) were designed for traditional development and may not fully address AI-specific challenges (US Food and Drug Administration, 2021; MHRA, 2021; European Parliament and Council, 2024). Key concerns include: (i) validation requirements for AI models used in critical decision-making, (ii) standards for data quality and algorithmic transparency, (iii) guidelines for updating or retraining models post-approval, and (iv) mechanisms for auditing AI-driven predictions. Recent efforts such as the FDA’s SaMD initiatives and the EU AI Act are important steps, but oncology-focused guidance for AI in drug discovery remains under development.
6.2 Algorithmic bias and health equity
Deep learning models trained on historically biased datasets may perpetuate or amplify disparities. In oncology, this is especially concerning given documented differences in drug response across demographic groups and cancer subtypes. Mitigations include curating diverse, representative training data, applying fairness-aware learning and evaluation, and conducting pre-specified bias audits across subpopulations (Hasanzadeh, 2025).
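A pre-specified bias audit can be as simple as computing the same error metric separately for each subpopulation and checking the largest gap against a threshold fixed before evaluation. The sketch below uses mean absolute error of a drug-response predictor. The subgroup labels (which could denote demographic groups or cancer subtypes), the data, and the threshold are all hypothetical, and MAE is only one of many possible audit metrics.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def subgroup_mae(y_true: Sequence[float], y_pred: Sequence[float],
                 groups: Sequence[str]) -> Dict[str, float]:
    """Mean absolute error of a response model, computed separately per subgroup."""
    errs = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errs[g].append(abs(t - p))
    return {g: sum(e) / len(e) for g, e in errs.items()}


def audit(y_true, y_pred, groups, max_gap: float) -> Tuple[Dict[str, float], bool]:
    """Pass only if the worst and best subgroup MAE differ by at most max_gap."""
    per_group = subgroup_mae(y_true, y_pred, groups)
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap <= max_gap


per_group, passed = audit([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 5.0],
                          ["a", "a", "b", "b"], max_gap=0.3)
```

Fixing the metric, the subgroups, and the threshold before looking at results is what distinguishes a pre-specified audit from post hoc analysis.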
6.3 Data privacy and consent
Training on patient-derived multi-omics and clinical data raises questions about informed consent, data ownership, and re-identification risk. Privacy-preserving techniques (e.g., federated learning, differential privacy) can reduce but not eliminate these risks; clear ethical and governance frameworks are needed to balance scientific progress with patient autonomy and rights (European Parliament and Council, 2016; US Department of Health and Human Services, Office for Civil Rights, 2025; Kairouz, 2025).
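Differential privacy can be illustrated with the Laplace mechanism. To release an aggregate statistic, such as the number of patients in a cohort carrying a given mutation, the mechanism adds Laplace noise with scale sensitivity/ε. For a counting query the sensitivity is 1, because adding or removing one patient changes the count by at most 1. The sketch below is illustrative only. It uses Python's non-cryptographic `random` module, whereas production systems should rely on vetted differential-privacy libraries. The count and ε values are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """epsilon-DP release of a count via the Laplace mechanism (noise scale = sensitivity / epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
released = private_count(true_count=100, epsilon=1.0, rng=rng)
```

A smaller ε gives stronger privacy but noisier statistics. This tradeoff is one reason privacy-preserving techniques reduce re-identification risk without eliminating the need for governance.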
6.4 Accountability and liability
As AI contributes to target identification and candidate selection, responsibility for adverse outcomes must be clearly delineated across developers, sponsors, and clinical users. Shared accountability models and auditability requirements, aligned with good machine learning practice, can support safe deployment and post-market oversight (MHRA, 2021).
6.5 Explainability requirements
Beyond technical interpretability, there is an ethical imperative for clinically meaningful explanations that patients and clinicians can understand, particularly when AI influences high-stakes treatment decisions in cancer (Qadri, 2025; Ding, 2025).
Addressing these considerations will require ongoing collaboration among AI researchers, pharmaceutical companies, regulators, ethicists, and patient advocacy groups to develop transparent, enforceable, and adaptive frameworks that enable innovation while protecting patient welfare and public trust.
7 Conclusion
In conclusion, the successful integration of deep learning into anticancer drug pipelines depends on bridging computational innovation with chemical property optimization, biological validation through 3D models, regulatory compliance, and ethical deployment. The field can overcome current limitations through four steps: consolidating task definitions, evaluation metrics, and reporting standards; addressing data quality through augmentation, federated learning, and active learning; enhancing interpretability through XAI techniques and neuro-symbolic approaches; and establishing robust regulatory and ethical frameworks. Interdisciplinary collaboration, improved data practices, and rigorous validation approaches will be key to unlocking the full transformative potential of AI-driven drug discovery. As these methods mature and regulatory frameworks adapt, deep learning has the potential to fundamentally reshape oncology therapeutics, delivering more effective, personalized treatments at unprecedented speed and scale. The future of anticancer drug design lies at the intersection of deep learning, precision medicine, and responsible AI deployment, a frontier rich in both challenges and opportunities to improve outcomes for cancer patients worldwide.
Author contributions
KM: Conceptualization, Data curation, Investigation, Methodology, Visualization, Writing – original draft, Writing – review and editing. M-AC: Conceptualization, Funding acquisition, Methodology, Supervision, Validation, Writing – review and editing. HM: Conceptualization, Funding acquisition, Methodology, Supervision, Validation, Writing – review and editing.
Funding
The authors declare that financial support was received for the research and/or publication of this article. This work was supported by the IRC “Institut de Recherche sur le Cancer” under Grant no: 1144/AAmP2023.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abbasi, M., Santos, B. P., Pereira, T. C., Sofia, R., Monteiro, N. R., Simões, C. J., et al. (2022). Designing optimized drug candidates with generative adversarial network. J. Cheminformatics 14, 40. doi:10.1186/s13321-022-00623-6
Arús-Pous, J., Johansson, S. V., Prykhodko, O., Bjerrum, E. J., Tyrchan, C., Reymond, J. L., et al. (2019). Randomized SMILES strings improve the quality of molecular generative models. J. Cheminformatics 11, 1–13. doi:10.1186/s13321-019-0393-0
Bajorath, J. (2022). Deep machine learning for computer-aided drug design. Front. Drug Discov. 2, 829043. doi:10.3389/fddsv.2022.829043
Basler, G., Nikoloski, Z., Larhlimi, A., Barabási, A. L., and Liu, Y. Y. (2016). Control of fluxes in metabolic networks. Genome Research 26, 956–968. doi:10.1101/gr.202648.115
Bhal, S. K. (2007). Lipophilicity descriptors: understanding when to use logP & logD. Application note. Toronto, Canada: Advanced Chemistry Development, Inc. (ACD/Labs), 27.
Bhattacharya, S., Sharma, M., Page, A. B., Mukherjee, D., and Kanugo, A. (2025). Advancements in cancer research: exploring diagnostics and therapeutic breakthroughs. Sharjah, United Arab Emirates: Bentham Science Publishers.
Biala, G., Kedzierska, E., Kruk-Slomka, M., Orzelska-Gorka, J., Hmaidan, S., Skrok, A., et al. (2023). Research in the field of drug design and development. Pharm. (Basel) 16, 1283. doi:10.3390/ph16091283
Born, J., Manica, M., Oskooei, A., Cadow, J., Markert, G., and Martínez, M. R. (2021). PaccMannRL: de novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning. iScience 24, 102269. doi:10.1016/j.isci.2021.102269
Bruce, C. L., Melville, J. L., Pickett, S. D., and Hirst, J. D. (2007). Contemporary QSAR classifiers compared. J. Chemical Information Modeling 47, 219–227. doi:10.1021/ci600332j
Califano, A., Butte, A. J., Friend, S., Ideker, T., and Schadt, E. (2012). Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat. Genetics 44, 841–847. doi:10.1038/ng.2355
Cantini, L., Medico, E., Fortunato, S., and Caselle, M. (2015). Detection of gene communities in multi-networks reveals cancer drivers. Sci. Reports 5, 17386. doi:10.1038/srep17386
Chakraborty, S., Hosen, M. I., Ahmed, M., and Shekhar, H. U. (2018). Onco-multi-omics approach: a new frontier in cancer research. BioMed Research International 2018, 9836256. doi:10.1155/2018/9836256
Chamorey, E., Gal, J., Mograbi, B., and Milano, G. (2024). Critical appraisal and future challenges of artificial intelligence and anticancer drug development. Pharm. (Basel) 17, 816. doi:10.3390/ph17070816
Chunarkar-Patil, P., Kaleem, M., Mishra, R., Ray, S., Ahmad, A., Verma, D., et al. (2024). Anticancer drug discovery based on natural products: from computational approaches to clinical studies. Biomedicines 12, 201. doi:10.3390/biomedicines12010201
Congreve, M., Carr, R., Murray, C., and Jhoti, H. (2003). A ‘rule of three’ for fragment-based lead discovery? Drug Discovery Today 8, 876–877. doi:10.1016/s1359-6446(03)02831-9
Crouse, E. L., and Leung, J. G. (2023). Chapter 1. Pharmacokinetics, pharmacodynamics, and principles of drug-drug interactions. Clin. Man. Psychopharmacol. Medically Ill, 1–49. doi:10.1176/appi.books.9781615375264.lg01
Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. (2018). Syntax-directed variational autoencoder for structured data. arXiv Preprint arXiv:1802.08786. doi:10.48550/arXiv.1802.08786
Davis, A. M., and Leeson, P. D. (2023). “Physicochemical properties,” in The handbook of medicinal chemistry: principles and practice. Cambridge, United Kingdom: The Royal Society of Chemistry (RSC). doi:10.1039/9781788018982-00001
De Cao, N., and Kipf, T. (2018). MolGAN: an implicit generative model for small molecular graphs. arXiv Preprint arXiv:1805.11973. doi:10.48550/arXiv.1805.11973
de Sá, A. G., and Ascher, D. B. (2025). Auto-ADMET: an effective and interpretable AutoML method for chemical ADMET property prediction. arXiv Preprint arXiv:2502.16378.
Delle Cave, D. (2025). Advances in molecular mechanisms and therapeutic strategies in colorectal cancer: a new era of precision medicine. Int. J. Mol. Sci. 26, 346. doi:10.3390/ijms26010346
Desai, K., Goyal, M. K., and Nachappa, M. N. (2024). “Investigating the applications of deep learning in drug discovery and pharmaceutical research,” in 2024 international conference on advances in computing research on science engineering and technology (ACROSET), 1–6. doi:10.1109/ACROSET62108.2024.10743819
Ding, X., et al. (2025). Explainable artificial intelligence in drug discovery. WIREs Comput. Mol. Sci. Available online at: https://wires.onlinelibrary.wiley.com/doi/10.1002/wcms.70049.
do Valle, Í. F., Menichetti, G., Simonetti, G., Bruno, S., Zironi, I., Durso, D. F., et al. (2018). Network integration of multi-tumour omics data suggests novel targeting strategies. Nat. Commun. 9, 4514. doi:10.1038/s41467-018-06992-7
Do Valle, I. F., Roweth, H. G., Malloy, M. W., Moco, S., Barron, D., Battinelli, E., et al. (2021). Network medicine framework shows that proximity of polyphenol targets and disease proteins predicts therapeutic effects of polyphenols. Nat. Food 2, 143–155. doi:10.1038/s43016-021-00243-7
Dontu, G., Abdallah, W. M., Foley, J. M., Jackson, K. W., Clarke, M. F., Kawamura, M. J., et al. (2003). In vitro propagation and transcriptional profiling of human mammary stem/progenitor cells. Genes & Dev. 17, 1253–1270. doi:10.1101/gad.1061803
Dwivedi, R., Dave, D., Naik, H., Singhal, S., Omer, R., Patel, P., et al. (2023). Explainable AI (XAI): core ideas, techniques, and solutions. ACM Comput. Surv. 55, 1–33. doi:10.1145/3561048
El-Tanani, M., Rabbani, S. A., Satyam, S. M., Rangraze, I. R., Wali, A. F., El-Tanani, Y., et al. (2025). Deciphering the role of cancer stem cells: drivers of tumor evolution, therapeutic resistance, and precision medicine strategies. Cancers 17, 382. doi:10.3390/cancers17030382
Elmore, S. (2007). Apoptosis: a review of programmed cell death. Toxicol. Pathol. 35, 495–516. doi:10.1080/01926230701320337
Ernstmeyer, K., and Christman, E. (2023). Nursing pharmacology. 2nd edn. Eau Claire, WI: Chippewa Valley Technical College, Open Resources for Nursing (Open RN).
European Parliament and Council (2016). Regulation (EU) 2016/679 (General Data Protection Regulation). Available online at: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng (Accessed on August 27, 2025).
European Parliament and Council (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). Official J. Eur. Union. Available online at: https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng (Accessed on August 27, 2025).
Filipp, F. V. (2017). Crosstalk between epigenetics and metabolism—yin and yang of histone demethylases and methyltransferases in cancer. Briefings Functional Genomics 16, 320–325. doi:10.1093/bfgp/elx001
Garg, P., Singhal, G., Kulkarni, P., Horne, D., Salgia, R., and Singhal, S. S. (2024). Artificial intelligence–driven computational approaches in the development of anticancer drugs. Cancers 16, 3884. doi:10.3390/cancers16223884
Goel, M., Raghunathan, S., Laghuvarapu, S., and Priyakumar, U. D. (2021). MoleGuLAR: molecule generation using reinforcement learning with alternating rewards. J. Chem. Inf. Model. 61, 5815–5826. doi:10.1021/acs.jcim.1c01341
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., et al. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science 4, 268–276. doi:10.1021/acscentsci.7b00572
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. Adv. Neural Information Processing Systems 27. doi:10.48550/arXiv.1406.2661
Gottesman, M. M., Fojo, T., and Bates, S. E. (2002). Multidrug resistance in cancer: role of ATP-dependent transporters. Nat. Rev. Cancer 2, 48–58. doi:10.1038/nrc706
Gov, E., Kori, M., and Arga, K. Y. (2017). Multiomics analysis of tumor microenvironment reveals GATA2 and miRNA-124-3p as potential novel biomarkers in ovarian cancer. Omics A J. Integr. Biol. 21, 603–615. doi:10.1089/omi.2017.0115
Grisoni, F., Moret, M., Lingwood, R., and Schneider, G. (2020). Bidirectional molecule generation with recurrent neural networks. J. Chemical Information Modeling 60, 1175–1183. doi:10.1021/acs.jcim.9b00943
Guerrero-Aspizua, S., González-Masa, A., Conti, C. J., Garcia, M., Chacon-Solano, E., Larcher, F., et al. (2020). Humanization of tumor stroma by tissue engineering as a tool to improve squamous cell carcinoma xenograft. Int. J. Mol. Sci. 21, 1951. doi:10.3390/ijms21061951
Guimaraes, G. L., Sanchez-Lengeling, B., Outeiral, C., Farias, P. L. C., and Aspuru-Guzik, A. (2017). Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv Preprint arXiv:1705.10843. doi:10.48550/arXiv.1705.10843
Guzman-Pando, A., Ramirez-Alonso, G., Arzate-Quintana, C., and Camarillo-Cisneros, J. (2024). Deep learning algorithms applied to computational chemistry. Mol. Divers. 28, 2375–2410. doi:10.1007/s11030-023-10771-y
Hasanzadeh, F. (2025). Bias recognition and mitigation strategies in artificial intelligence for healthcare. Npj Digit. Med. Available online at: https://www.nature.com/articles/s41746-025-01503-7.
Hasin, Y., Seldin, M., and Lusis, A. (2017). Multi-omics approaches to disease. Genome Biology 18, 1–15. doi:10.1186/s13059-017-1215-1
Holmes, M. V., Richardson, T. G., Ference, B. A., Davies, N. M., and Davey Smith, G. (2021). Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development. Nat. Rev. Cardiol. 18, 435–453. doi:10.1038/s41569-020-00493-1
Hong, S. H., Ryu, S., Lim, J., and Kim, W. Y. (2019). Molecular generative model based on an adversarially regularized autoencoder. J. Chemical Information Modeling 60, 29–36. doi:10.1021/acs.jcim.9b00694
Hu, Y., Lu, Y., Wang, S., Zhang, M., Qu, X., and Niu, B. (2019). Application of machine learning approaches for the design and study of anticancer drugs. Curr. Drug Targets 20, 488–500. doi:10.2174/1389450119666180809122244
Huang, K., Fu, T., Gao, W., Zhao, Y., Roohani, Y., Leskovec, J., et al. (2021). Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. arXiv Preprint arXiv:2102.09548. doi:10.48550/arXiv.2102.09548
Iwaloye, O., Ottu, P. O., Olawale, F., Babalola, O. O. B., Elekofehinti, O. O., Kikiowo, B., et al. (2023). Computer-aided drug design in anti-cancer drug discovery: what have we learnt and what is the way forward? Inform. Med. Unlocked 41, 101332. doi:10.1016/j.imu.2023.101332
Jhoti, H., Williams, G., Rees, D. C., and Murray, C. W. (2013). The ‘rule of three’ for fragment-based drug discovery: where are we now? Nat. Reviews Drug Discovery 12, 644–645. doi:10.1038/nrd3926-c1
Jin, W., Barzilay, R., and Jaakkola, T. (2018). “Junction tree variational autoencoder for molecular graph generation,” in International conference on machine learning. Cambridge, MA: Proceedings of Machine Learning Research (PMLR), 2323–2332.
Johnson, C. H., Ivanisevic, J., and Siuzdak, G. (2016). Metabolomics: beyond biomarkers and towards mechanisms. Nat. Reviews Mol. Cell Biology 17, 451–459. doi:10.1038/nrm.2016.25
Jorgensen, W. L. (2004). The many roles of computation in drug discovery. Science 303, 1813–1818. doi:10.1126/science.1096361
Justus, C. R., Leffler, N., Ruiz-Echevarria, M., and Yang, L. V. (2014). In vitro cell migration and invasion assays. J. Vis. Exp., 51046. doi:10.3791/51046
Kairouz, P. (2025). Federated learning: a survey on privacy-preserving distributed machine learning. Available online at: https://arxiv.org/abs/2504.17703 (Accessed on August 27, 2025).
Kalman, R. E. (1963). Mathematical description of linear dynamical systems. J. Soc. Industrial Appl. Math. 1, 152–192. doi:10.1137/0301010
Kim, H., and Kim, Y. M. (2018). Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types. Sci. Reports 8, 6041. doi:10.1038/s41598-018-24379-y
Kori, M., and Yalcin Arga, K. (2018). Potential biomarkers and therapeutic targets in cervical cancer: insights from the meta-analysis of transcriptomics data within network biomedicine perspective. PLoS One 13, e0200717. doi:10.1371/journal.pone.0200717
Kusner, M. J., Paige, B., and Hernández-Lobato, J. M. (2017). “Grammar variational autoencoder,” in International conference on machine learning. Cambridge, MA: Proceedings of Machine Learning Research (PMLR), 1945–1954.
Landrum, G. (2006). RDKit: open-source cheminformatics. Available online at: http://www.rdkit.org.
Lanza, V. F., Baquero, F., de la Cruz, F., and Coque, T. M. (2017). AcCNET (accessory genome constellation network): comparative genomics software for accessory genome analysis using bipartite networks. Bioinformatics 33, 283–285. doi:10.1093/bioinformatics/btw601
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi:10.1038/nature14539
Li, Z., Ivanov, A. A., Su, R., Gonzalez-Pecchi, V., Qi, Q., Liu, S., et al. (2017). The OncoPPi network of cancer-focused protein–protein interactions to inform biological insights and therapeutic strategies. Nat. Communications 8, 14356. doi:10.1038/ncomms14356
Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (2018). Learning deep generative models of graphs. arXiv Preprint arXiv:1803.03324. doi:10.48550/arXiv.1803.03324
Li, Y., Pei, J., and Lai, L. (2021). Learning to design drug-like molecules in three-dimensional space using deep generative models. arXiv Preprint arXiv:2104.08474. doi:10.48550/arXiv.2104.08474
Lim, J., Ryu, S., Kim, J. W., and Kim, W. Y. (2018). Molecular generative model based on conditional variational autoencoder for de novo molecular design. J. Cheminformatics 10, 1–9. doi:10.1186/s13321-018-0286-7
Lionta, E., Spyrou, G., Vassilatis, D., and Cournia, Z. (2014). Molecular docking in drug discovery: principles and recent applications. J. Pharm. Biomed. Analysis 93, 85–95.
Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., and Svetnik, V. (2015). Deep neural nets as a method for quantitative structure–activity relationships. J. Chemical Information Modeling 55, 263–274. doi:10.1021/ci500747n
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. (2015). “Adversarial autoencoders,” in arXiv preprint arXiv:1511.
Mao, F., Ni, W., Xu, X., Wang, H., Wang, J., Ji, M., et al. (2016). Chemical structure-related drug-like criteria of global approved drugs. Molecules 21, 75. doi:10.3390/molecules21010075
Markman, M., and Mekhail, T. M. (2002). Paclitaxel in cancer therapy. Expert Opinion Pharmacotherapy 3, 755–766. doi:10.1517/14656566.3.6.755
Maziarka, Ł., Pocha, A., Kaczmarczyk, J., Rataj, K., Danel, T., and Warchoł, M. (2020). Mol-CycleGAN: a generative model for molecular optimization. J. Cheminformatics 12, 2. doi:10.1186/s13321-019-0404-1
MHRA, FDA, and Health Canada (2021). Good machine learning practice for medical device development: guiding principles. Available online at: https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles (Accessed on August 27, 2025).
Mulcahy, A., Rennane, S., Schwam, D., Dickerson, R., Baker, L., and Shetty, K. (2025). Use of clinical trial characteristics to estimate costs of new drug development. JAMA Netw. Open 8, e2453275. doi:10.1001/jamanetworkopen.2024.53275
Nicolás-Sáenz, L., Guerrero-Aspizua, S., Pascau, J., and Muñoz-Barrutia, A. (2020). Nonlinear image registration and pixel classification pipeline for the study of tumor heterogeneity maps. Entropy 22, 946. doi:10.3390/e22090946
Olurotimi, O. (1994). Recurrent neural network training with feedforward complexity. IEEE Trans. Neural Networks 5, 185–197. doi:10.1109/72.279184
Ong, S. E., and Mann, M. (2005). Mass spectrometry–based proteomics turns quantitative. Nat. Chemical Biology 1, 252–262. doi:10.1038/nchembio736
Ozaki, K., Ohnishi, Y., Iida, A., Sekine, A., Yamada, R., Tsunoda, T., et al. (2002). Functional snps in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nat. Genetics 32, 650–654. doi:10.1038/ng1047
Pandiyan, S., and Wang, L. (2022). A comprehensive review on recent approaches for cancer drug discovery associated with artificial intelligence. Comput. Biol. Med. 150, 106140. doi:10.1016/j.compbiomed.2022.106140
Perakakis, N., Yazdani, A., Karniadakis, G. E., and Mantzoros, C. (2018). Omics, big data and machine learning as tools to propel understanding of biological mechanisms and to discover novel diagnostics and therapeutics. Metabolism 87, A1–A9. doi:10.1016/j.metabol.2018.08.002
Pinedo, H. M., and Peters, G. (1988). Fluorouracil: biochemistry and pharmacology. J. Clinical Oncology 6, 1653–1664. doi:10.1200/JCO.1988.6.10.1653
Polishchuk, P. G., Madzhidov, T. I., and Varnek, A. (2013). Estimation of the size of drug-like chemical space based on GDB-17 data. J. Computer-Aided Molecular Design 27, 675–679. doi:10.1007/s10822-013-9672-4
Pollastri, M. P. (2010). Overview on the rule of five. Curr. Protocols Pharmacology 49, 9–12. doi:10.1002/0471141755.ph0912s49
Popova, M., Isayev, O., and Tropsha, A. (2018). Deep reinforcement learning for de novo drug design. Sci. Advances 4, eaap7885. doi:10.1126/sciadv.aap7885
Pozarowski, P., and Darzynkiewicz, Z. (2004). Analysis of cell cycle by flow cytometry. Methods Cell Biol. (Elsevier) 75, 301–321. doi:10.1385/1-59259-811-0:301
Prada-Gracia, D., Huerta-Yépez, S., and Moreno-Vargas, L. M. (2016). Application of computational methods for anticancer drug discovery, design, and optimization. Bol. Médico Del Hosp. Infant. México English Ed. 73, 411–423. doi:10.1016/j.bmhimx.2016.10.006
Prasanna, S., and Doerksen, R. J. (2009). Topological polar surface area: a useful descriptor in 2D-QSAR. Curr. Med. Chem. 16, 21–41. doi:10.2174/092986709787002817
Pu, C., Cui, H., Yu, H., Cheng, X., Zhang, M., Qin, L., et al. (2025). Oral ENPP1 inhibitor designed using generative AI as next generation STING modulator for solid tumors. Nat. Commun. 16, 4793. doi:10.1038/s41467-025-59874-0
Qadri, Y. A., Shaikh, S., Ahmad, K., Choi, I., Kim, S. W., and Vasilakos, A. V. (2025). Explainable artificial intelligence: a perspective on drug discovery and development. Pharmaceutics 17, 1119. doi:10.3390/pharmaceutics17091119
Ravindran, V., Sunitha, V., and Bagler, G. (2017). Identification of critical regulatory genes in cancer signaling network using controllability analysis. Phys. A Stat. Mech. Its Appl. 474, 134–143. doi:10.1016/j.physa.2017.01.059
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). “Stochastic backpropagation and approximate inference in deep generative models,” in International conference on machine learning. Cambridge, MA: Proceedings of Machine Learning Research (PMLR), 1278–1286.
Samanta, B., De, A., Jana, G., Gómez, V., Chattaraj, P., Ganguly, N., et al. (2020). NeVAE: a deep generative model for molecular graphs. J. Machine Learning Research 21, 1–33.
Savjani, K. T., Gajjar, A. K., and Savjani, J. K. (2012). Drug solubility: importance and enhancement techniques. Int. Sch. Res. Notices 2012, 195727. doi:10.5402/2012/195727
Semenza, G. L. (2000). HIF-1: mediator of physiological and pathophysiological responses to hypoxia. J. Appl. Physiology 88, 1474–1480. doi:10.1152/jappl.2000.88.4.1474
Shoemaker, R. H. (2006). The NCI-60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813–823. doi:10.1038/nrc1951
Shultz, M. D. (2019). Two decades under the influence of the rule of five and the changing properties of approved oral drugs. J. Med. Chem. 62, 1701–1714. PMID: 30212196. doi:10.1021/acs.jmedchem.8b00686
Siddiqui, B., Yadav, C. S., Akil, M., Faiyyaz, M., Khan, A. R., Ahmad, N., et al. (2025). Artificial intelligence in computer-aided drug design (CADD) tools for the finding of potent biologically active small molecules: traditional to modern approach. Comb. Chem. & High Throughput Screen. 28. doi:10.2174/0113862073334062241015043343
Singh, D. (2016). Defining desirable natural product derived anticancer drug space: optimization of molecular physicochemical properties and ADMET attributes. ADMET DMPK 4, 98–113. doi:10.5599/admet.4.2.291
Staton, C. A., Reed, M. W. R., and Brown, N. J. (2009). A critical analysis of current in vitro and in vivo angiogenesis assays. Int. J. Exp. Pathology 90, 195–221. doi:10.1111/j.1365-2613.2008.00633.x
Stumpfe, D., and Bajorath, J. (2012). Exploring activity cliffs in medicinal chemistry: miniperspective. J. Medicinal Chemistry 55, 2932–2942. doi:10.1021/jm201706b
Sugihara, E., and Saya, H. (2013). Complexity of cancer stem cells. Int. Journal Cancer 132, 1249–1259. doi:10.1002/ijc.27961
Swanson, K., Wu, W., Bulaong, N. L., Pak, J. E., and Zou, J. (2025). The virtual lab of AI agents designs new SARS-CoV-2 nanobodies. Nature 1–3. doi:10.1038/s41586-025-09442-9
Tang, Y. (2023). Deep learning in drug discovery: applications and limitations. Front. Comput. Intell. Syst. 3, 118–123. doi:10.54097/fcis.v3i2.7575
Titilayo, O., and Adetunji, C. O. (2025). “Artificial intelligence and deep learning process and drug discovery,” in Health technologies and informatics. Boca Raton, FL: CRC Press (Taylor & Francis Group), 102–108.
Tyagi, E., Kumari, P., Prakash, A., and Bhuyan, R. (2025). Revolutionizing anti-cancer drug discovery: the role of artificial intelligence. Int. J. Bioinforma. Intelligent Comput. 4, 01–38. doi:10.61797/ijbic.v3i2.323
US Department of Health and Human Services, Office for Civil Rights (2025). Summary of the HIPAA Privacy Rule. Available online at: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html (Accessed on August 27, 2025).
US Food and Drug Administration (2021). Artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) action plan. Available online at: https://www.fda.gov/media/145022/download (Accessed on August 27, 2025).
Vinayagam, A., Gibson, T. E., Lee, H. J., Yilmazel, B., Roesel, C., Hu, Y., et al. (2016). Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc. Natl. Acad. Sci. 113, 4976–4981. doi:10.1073/pnas.1603992113
Wang, T., Shao, W., Huang, Z., Tang, H., Zhang, J., Ding, Z., et al. (2021). MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Communications 12, 3445. doi:10.1038/s41467-021-23774-w
Wang, M., Wang, Z., Sun, H., Wang, J., Shen, C., Weng, G., et al. (2022). Deep learning approaches for de novo drug design: an overview. Curr. Opinion Structural Biology 72, 135–144. doi:10.1016/j.sbi.2021.10.001
Wang, L., Song, Y., Wang, H., Zhang, X., Wang, M., He, J., et al. (2023). Advances of artificial intelligence in anti-cancer drug design: a review of the past decade. Pharmaceuticals 16 (2), 253. doi:10.3390/ph16020253
Wilson, S., and Filipp, F. V. (2018). A network of epigenomic and transcriptional cooperation encompassing an epigenomic master regulator in cancer. Npj Syst. Biol. Appl. 4, 24. doi:10.1038/s41540-018-0061-4
World-Health-Organization (2020). Cancer. Available online at: https://www.who.int/news-room/fact-sheets/detail/cancer.
Xiao, Z., Morris-Natschke, S. L., and Lee, K. H. (2016). Strategies for the optimization of natural leads to anticancer drugs or drug candidates. Med. Res. Rev. 36, 32–91. doi:10.1002/med.21377
Xu, Y., Lin, K., Wang, S., Wang, L., Cai, C., Song, C., et al. (2019). Deep learning for molecular generation. Future Medicinal Chemistry 11, 567–597. doi:10.4155/fmc-2018-0358
Yang, K., Xia, B., Wang, W., Cheng, J., Yin, M., Xie, H., et al. (2017). A comprehensive analysis of metabolomics and transcriptomics in cervical cancer. Sci. Reports 7, 43353. doi:10.1038/srep43353
Yang, S. Q., Ye, Q., Ding, J. J., Yin, M. Z., Lu, A. P., Chen, X., et al. (2021). Current advances in ligand-based target prediction. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1504. doi:10.1002/wcms.1504
You, Y., Lai, X., Pan, Y., Zheng, H., Vera, J., Liu, S., et al. (2022). Artificial intelligence in cancer target identification and drug discovery. Signal Transduct. Target. Ther. 7, 156. doi:10.1038/s41392-022-00994-0
Zak, K. M., Grudnik, P., Guzik, K., Zieba, B. J., Musielak, B., Dömling, A., et al. (2017). Structural basis for small-molecule inhibition of the PD-1/PD-L1 interaction. Curr. Opin. Struct. Biol. 45, 18–26. doi:10.18632/oncotarget.8730
Zhang, L., Liu, Y., Wang, M., Wu, Z., Li, N., Zhang, J., et al. (2017a). EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients. J. Molecular Cell Biology 9, 477–488. doi:10.1093/jmcb/mjx056
Zhang, C., Peng, L., Zhang, Y., Liu, Z., Li, W., Chen, S., et al. (2017b). The identification of key genes and pathways in hepatocellular carcinoma by bioinformatics analysis of high-throughput data. Med. Oncology 34, 1–13. doi:10.1007/s12032-017-0963-9
Zhang, L., Li, J., Yin, K., Jiang, Z., Li, T., Hu, R., et al. (2019). Computed tomography angiography-based analysis of high-risk intracerebral haemorrhage patients by employing a mathematical model. BMC Bioinformatics 20, 109–116. doi:10.1186/s12859-019-2741-5
Zhang, L., Dai, Z., Yu, J., and Xiao, M. (2021). CpG-island-based annotation and analysis of human housekeeping genes. Briefings Bioinformatics 22, 515–525. doi:10.1093/bib/bbz134
Keywords: anticancer drug design, deep learning, AI in drug discovery, chemical property optimization, anticancer drug properties
Citation: M’rhar K, Chadi M-A and Mousannif H (2025) Exploring deep learning approaches in anticancer drug design: a review of recent advances. Front. Drug Discov. 5:1713308. doi: 10.3389/fddsv.2025.1713308
Received: 27 September 2025; Accepted: 20 November 2025;
Published: 10 December 2025.
Edited by:
Nethaji Muniraj, Children’s National Hospital, United States
Reviewed by:
Sara Guerrero-Aspizua, University Carlos III of Madrid, Spain
Iqbal Azad, Integral University, India
Copyright © 2025 M’rhar, Chadi and Mousannif. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kaoutar M’rhar, k.mrhar.ced@uca.ac.ma