- 1Columbia University, New York, NY, United States
- 2Information Science and Technology College, Dalian Maritime University, Dalian, China
Introduction: The increasing availability of real-world clinical compliance data provides unprecedented opportunities to model medication behaviors dynamically and personalize treatment strategies. However, the complex, heterogeneous, and often incomplete nature of these data presents significant modeling challenges, particularly for capturing medication nonadherence, patient-specific therapeutic dynamics, and drug interaction effects. Existing approaches, including statistical regression models and rule-based decision systems, often fail to capture the high-dimensional, temporally-evolving, and probabilistic characteristics inherent in medication trajectories, limiting their effectiveness in precision medicine and policy simulation contexts.
Methods: To address these limitations, we propose a novel intelligent computing framework that unifies probabilistic graphical modeling, deep temporal inference, and domain-informed strategy design. Our approach is instantiated in the Hierarchical Therapeutic Transformer (HTT), a Bayesian transformer-based model that captures therapeutic state transitions via structured latent variables and medication-aware attention mechanisms. Furthermore, we introduce the Pharmacovigilant Inductive Strategy (PIS), a training paradigm that integrates pharmacological priors, adaptive quantification, and entropy-driven curriculum learning to enhance robustness and generalizability. Our method effectively models dose-response variability, accounts for clinical data missingness, and generalizes across cohorts through a hierarchical latent prior framework.
Results and discussion: Experimental evaluations demonstrate that our system achieves state-of-the-art performance in predicting adherence patterns and clinical outcomes across diverse datasets, aligning with current advances in medication adherence modeling and probabilistic health informatics. This work provides a rigorous, interpretable, and scalable foundation for real-time decision support in pharmacotherapy, contributing to the broader goals of personalized medicine, drug safety monitoring, and computational clinical reasoning.
1 Introduction
The increasing complexity of clinical treatments and the individual variability in drug responses demand innovative computational approaches that go beyond static modeling. Drug behavior is influenced not only by physiological and molecular factors but also by dynamic clinical compliance data, such as dosage timing, adherence patterns, and patient-specific behaviors. Modeling such complexity has drawn inspiration from sequence learning architectures originally developed in vision applications, such as transformers, due to their capacity for capturing evolving data patterns (Maurício et al., 2023). Not only does the integration of compliance data into drug modeling allow for more precise prediction of pharmacokinetics and pharmacodynamics, but it also enhances the potential for personalized treatment planning and outcome prediction (Vrijens et al., 2012). With the rise of real-world data collected from electronic health records and wearable devices, incorporating clinical compliance into intelligent modeling frameworks enables real-time, adaptive simulations that better reflect real-life scenarios (Brown and Bussell, 2011). Therefore, developing an intelligent computing framework that leverages clinical compliance data is essential for capturing the dynamic nature of drug behavior and ultimately improving the efficacy and safety of pharmacotherapy (Wang et al., 2022). Recent studies have shown that nonadherence to prescribed medications remains one of the most under-addressed challenges in pharmacotherapy. It contributes significantly to treatment failure and worsens health outcomes. Moreover, poor adherence increases healthcare costs due to avoidable complications and hospitalizations. Understanding and modeling adherence dynamics is therefore essential to predict real-world drug exposure. Such models can help bridge the gap between prescription intent and actual patient behavior (Osterberg and Blaschke, 2005). Early modeling strategies focused on encoding well-established pharmacological relationships using deterministic frameworks grounded in curated knowledge and simplified assumptions (Tian et al., 2020). These models offered clear interpretability and helped simulate basic absorption, distribution, metabolism, and excretion (ADME) processes under controlled conditions (Yang et al., 2021). While these representations contributed valuable insights into drug mechanisms, they were often too rigid to capture patient-specific variability or adjust to real-time changes in adherence behavior (Hong et al., 2020). Their inability to accommodate evolving or noisy clinical data limited their utility in dynamic, personalized treatment settings (Sun et al., 2022). These challenges catalyzed a methodological shift toward more flexible paradigms that could incorporate richer behavioral and temporal patterns.
Building upon these foundations, newer computational models began to explore more adaptive strategies capable of learning from observed clinical phenomena (Rao et al., 2021). These approaches combined empirical data with structural modeling, incorporating statistical correlations and input features derived from large clinical datasets (Mai et al., 2021). Such frameworks allowed the integration of compliance indicators–like dose irregularities or temporal gaps in treatment–into outcome prediction pipelines, enabling more refined estimations of drug efficacy (Azizi et al., 2021). However, these methods often relied on domain-specific tuning and struggled with capturing long-range dependencies or sequential effects across time windows (Li et al., 2020). As the demand for continuous, patient-centered modeling increased, attention turned to temporal algorithms capable of handling evolving data contexts. Architectures such as ResMLP, though originally designed for image classification, have influenced the design of efficient feedforward mechanisms in time-series prediction models (Touvron et al., 2021).
Recent advances have led to the convergence of sequence modeling architectures and pretraining strategies for biomedical applications (Bhojanapalli et al., 2021). By leveraging time-aware neural networks and pretrained representations–many of which were initially established in medical imaging contexts–researchers have adapted such methods to dynamically adjust predictions based on incoming compliance data streams (Kim et al., 2022). These models excel at tracking longitudinal variations and recognizing subtle adherence patterns that may influence pharmacological trajectories. Nevertheless, despite their adaptive power, they often lack the transparency required for clinical interpretation and may fail to incorporate mechanistic knowledge critical for understanding drug interactions. As such, there is growing recognition that future solutions should merge structured pharmacological understanding with real-time data processing, paving the way for hybrid frameworks that combine interpretability with adaptability in clinical pharmacotherapy.
Based on the limitations of symbolic rigidity, shallow learning’s task-specificity, and deep learning’s opacity, we propose an intelligent computing framework that synergistically integrates clinical compliance data with dynamic modeling of drug behavior. Our method unifies temporal representation learning with domain-aware pharmacological reasoning, allowing for continuous, interpretable, and context-sensitive drug simulations. By leveraging both real-time adherence signals and pretrained medical embeddings, this approach captures patient-specific deviations and drug interactions over time, offering a robust and adaptive alternative to static or purely data-driven models. Furthermore, the incorporation of modular components enhances the framework’s scalability and adaptability to various therapeutic areas and patient cohorts. This not only bridges the gap between generalizability and personalization but also allows for real-world applications such as dosage optimization, risk prediction, and treatment adherence monitoring. Ultimately, this framework paves the way for more effective, safe, and patient-centered pharmacological interventions.
Our approach draws methodological inspiration from architectural innovations in visual computing, including transformers, self-supervised learning, and efficient feedforward networks, which we adapt to the context of temporal clinical modeling and compliance-aware decision support. The proposed approach offers several significant benefits:
2 Related work
2.1 Clinical compliance data modeling
Modeling clinical compliance data requires nuanced understanding of both structured and unstructured medical records, adherence patterns, and behavioral variability among patients (Luo et al., 2025). A significant body of research focuses on electronic health records (EHRs) and their use in identifying non-adherence signals, medication intake trends, and longitudinal tracking of clinical outcomes (Simpson et al., 2006). Traditional approaches have relied on statistical modeling, such as logistic regression and Cox proportional hazards models, to analyze medication adherence and its correlation with therapeutic success or failure. However, these models often fail to capture the temporal and contextual intricacies inherent in real-world compliance behaviors (Zhang et al., 2020). Recent advancements integrate temporal modeling techniques using deep learning, particularly recurrent neural networks (RNNs), gated recurrent units (GRUs), and transformers, to learn compliance sequences from timestamped medication logs. These methods allow for personalized adherence and real-time detection of deviation patterns (Kardas et al., 2013). Furthermore, research has explored multimodal data fusion, incorporating clinical notes, pharmacy refill records, wearable sensor data, and patient-reported outcomes to enrich compliance modeling. Natural language processing (NLP) is instrumental in parsing physician notes and discharge summaries to extract adherence-related information (Zhu et al., 2020). Moreover, federated learning and privacy-preserving computation paradigms have emerged to enable large-scale training of compliance models without compromising patient privacy (Brisimi et al., 2018). These distributed learning frameworks are increasingly critical in clinical settings where data sensitivity and regulatory compliance are paramount Ashtiani et al. (2021). Collectively, the integration of deep temporal models, multimodal data fusion, and privacy-aware computing underpins the current direction of intelligent systems for modeling clinical compliance data (Masana et al., 2020).
2.2 Drug behavior dynamics modeling
Understanding the dynamic behavior of drugs within the human body involves capturing complex pharmacokinetic (PK) and pharmacodynamic (PD) interactions (Zhang et al., 2025). Conventional models, such as compartmental models and physiologically based pharmacokinetic (PBPK) models, have long served as the foundation for quantifying drug absorption, distribution, metabolism, and excretion (ADME). While these models offer interpretability, they often require extensive domain-specific parameter tuning and may struggle to generalize across patient populations or new drug compounds (Jiao et al., 2010). In recent years, machine learning has transformed drug behavior modeling by enabling data-driven inference of PK/PD relationships (Sheykhmousa et al., 2020). Neural differential equations and hybrid modeling frameworks combine mechanistic insights with the flexibility of deep learning to accommodate inter-individual variability and complex dose-response relationships. These approaches dynamically adjust to real-time patient data, improving predictive accuracy and personalization (Mascarenhas and l Agarwal, 2021). Studies have also introduced graph neural networks (GNNs) and attention mechanisms to model molecular-level drug interactions and their downstream effects (Zhang et al., 2022). These architectures capture structural and relational information, facilitating the understanding of drug-drug interactions and polypharmacy effects (Dai and Gao, 2021). Coupling such models with clinical compliance data introduces a feedback mechanism that reflects the true drug exposure experienced by patients, rather than relying solely on prescribed regimens (Taori et al., 2020). This integrative modeling direction supports adaptive dosing strategies and real-time therapeutic monitoring. It lays the groundwork for intelligent systems that not only predict drug behavior but also suggest compliance-aware treatment optimizations based on patient-specific dynamics.
2.3 AI frameworks in clinical intelligence
Artificial intelligence frameworks designed for clinical intelligence aim to support decision-making, treatment personalization, and predictive modeling (Williams et al., 2009). These systems must handle heterogeneous data types, comply with regulatory standards, and deliver interpretable outputs to support clinical trust and adoption. Contemporary frameworks employ modular architectures that integrate data ingestion, preprocessing, feature engineering, and model interpretation pipelines (Brown et al., 2008). Graph-based and knowledge-infused architectures have become prevalent, leveraging medical ontologies such as SNOMED CT and UMLS to enhance data contextualization (Peng et al., 2022). In compliance-aware drug modeling, these frameworks facilitate the mapping of medication events to standardized terminologies, enabling more accurate cross-patient analysis and data harmonization (Bazi et al., 2021). Reinforcement learning (RL) has also seen application in adaptive treatment planning, where reward structures incorporate compliance fidelity and drug efficacy (Zheng et al., 2022). Explainability techniques, such as SHAP (SHapley Additive exPlanations) and attention visualizations, are critical for unveiling model behavior, especially in high-stakes domains like drug response modeling. Model accountability is further enforced through audit trails, provenance tracking, and validation against clinical benchmarks (Dong et al., 2022). Several platforms have integrated real-time compliance tracking into their AI workflows. These include mobile health (mHealth) solutions, edge computing for wearable integration, and cloud-based AI inference services. As these frameworks mature, they increasingly adopt ethical AI principles, including bias mitigation, fairness auditing, and robust handling of missing or noisy data. These developments contribute to a more holistic, intelligent framework capable of dynamically modeling drug behavior grounded in real-world clinical compliance.
3 Methods
3.1 Overview
Modeling medication-related phenomena poses a unique set of challenges within the machine learning community, particularly in the context of probabilistic inference and domain-specific representation. Unlike general image or text datasets, medication data encapsulate a wide array of intricate, structured, and often incomplete information, ranging from patient-level variability to pharmacological interactions and temporal dynamics. This subsection provides an overview of our proposed methodology for addressing these complexities.
Our starting point is a probabilistic framework for modeling patient-specific therapeutic processes over time. Unlike classical classification tasks, medication modeling requires handling uncertainty at both the observation and latent levels, including drug effects, side effects, patient adherence, and dosage variability. We formulate this as a hierarchical latent variable model, where unobserved states evolve over time and influence observable clinical outcomes. The model structure follows that of probabilistic temporal inference frameworks Rezende et al. (2014); Vaswani et al. (2017), which have been widely used in dynamic state-space modeling. We incorporate domain-specific priors by conditioning latent transitions on medication encodings, allowing the model to simulate personalized responses and adherence-aware treatment trajectories. This design supports interpretable, uncertainty-aware modeling in complex real-world clinical environments. Section 3.2 sets the stage by formalizing this task within a probabilistic graphical model framework.
We define a multi-level latent state space in which medication effects, patient responses, and clinical observations are represented as interdependent random variables, following the structure of probabilistic graphical models commonly used in temporal inference frameworks (Koller and Friedman, 2009).
This design is conceptually inspired by hierarchical graphical models, such as those proposed by Friston et al., which have been widely applied to dynamic inference in clinical and physiological settings (Friston et al., 2018). These models provide a principled framework for representing temporally-evolving hidden states and their interactions with observable variables in noisy, real-world environments. Building on this foundation, we extend the graphical paradigm to account for domain-specific structures in pharmacological contexts, particularly medication adherence and drug interaction effects. Our adaptation introduces adherence-aware latent transitions and personalized therapeutic priors to improve fidelity in modeling real-world drug behavior. In contrast to generic temporal models, our framework explicitly captures the influence of medication-specific dynamics on patient outcomes, offering both statistical expressiveness and clinical interpretability. This symbolic foundation not only allows us to capture uncertainty but also enables a principled approach to model interpretability and downstream clinical decision-making. The complexity of medication dynamics necessitates an expressive yet tractable modeling apparatus. Section 3.3 introduces our architecture, which we term the Hierarchical Therapeutic Transformer (HTT). This architecture is built upon a foundation of Bayesian neural computation and incorporates temporal attention mechanisms, enabling it to flexibly model long-range dependencies between drug administration events and patient outcomes. Unlike standard Bayesian neural networks, HTT integrates structured pharmacological priors and heterogeneous clinical contexts through a variational posterior defined over a sequence of latent therapeutic states. Furthermore, to mitigate the storage inefficiencies of MCMC sampling, we incorporate a generative compression module inspired by adversarial distillation paradigms. This not only reduces the memory footprint but also enables rapid posterior sampling at inference time without loss of uncertainty fidelity. Recognizing that real-world medication modeling transcends mere model design, Section 3.4 introduces our Pharmacovigilant Inductive Strategy (PIS). The strategy orchestrates the interplay between domain-specific priors, observational biases, and uncertainty calibration in a coherent training paradigm. In particular, we address the challenges of distributional shift due to cohort heterogeneity, missingness in longitudinal electronic health records, and the nuanced semantics of clinical endpoints. Our strategy employs an adaptive objective that balances epistemic and aleatoric uncertainty across subpopulations, leveraging both Bayesian ensemble estimates and entropy-based active sampling. Importantly, PIS is agnostic to specific pharmacological classes, allowing it to generalize across different therapeutic domains with minimal reconfiguration.
3.2 Preliminaries
Terminology Clarification: To ensure clarity and consistency across Sections 3.2–3.4, we summarize key variables, notations, and technical components used in our modeling framework in Table 1. This supports interpretability and improves accessibility to readers unfamiliar with latent state modeling or curriculum-based training. Medication modeling centers on representing and reasoning about the probabilistic effects of pharmaceutical interventions across diverse patient populations. The aim is to develop a structured inferential framework that captures uncertainty, individual heterogeneity, and the dynamic evolution of patient states under varying medication regimes. In this section, we present a formal mathematical characterization of this problem, establishing the probabilistic foundations and notational conventions that will underlie the remainder of our proposed methodology.
Let
here,
We introduce a latent variable
where
To model the influence of multiple concurrent medications, we define Equation 3:
where
with
Patient heterogeneity is encoded via a static covariate vector
To account for partial observability and missingness in
We are interested in modeling the posterior distribution over latent therapeutic states conditioned on observed data (Equation 7):
Given the intractability of exact inference in this temporal latent-variable model, our approach approximates the posterior using a structured variational distribution, inspired by sequence-aware variational inference frameworks such as those proposed in deep temporal models (Equation 8) (Fraccaro et al., 2016).
Temporal dependencies are further enriched through the introduction of retrospective attention. Define an attention window of size
where
In Figure 2, the variable
3.3 Hierarchical therapeutic transformer
The Hierarchical therapeutic transformer (HTT) is a novel latent sequence model that builds upon two foundational concepts: hierarchical latent variable modeling and attention-based temporal architectures. Hierarchical inference structures enable the representation of multiscale latent states corresponding to global patient traits, therapeutic dynamics, and local perturbations (Friston, 2008). In parallel, attention mechanisms from the Transformer architecture offer a flexible means to model non-Markovian dependencies in medical sequences (Vaswani et al., 2017). Our proposed HTT integrates these ideas by organizing latent states into a three-tier structure (
Figure 1. Hierarchical Therapeutic Transformer (HTT) architecture diagram. Overview of the Hierarchical Therapeutic Transformer (HTT) architecture, including multimodal encoders, hierarchical latent modeling, and medication-specific inference modules. See Section 3.3 for detailed description.
3.3.1 Hierarchical latent modeling
The foundation of the Hierarchical Therapeutic Transformer (HTT) lies in its ability to represent heterogeneous clinical data through a deeply structured probabilistic process that is both temporally coherent and semantically aligned with medical reasoning. Central to this architecture is the introduction of a three-tiered hierarchy of latent variables that capture multi-scale variations in patient response to pharmacological interventions. The highest level of abstraction, encoded by a global latent variable
The initialization follows standard Gaussian priors, where
We assume that clinical observations follow an isotropic Gaussian likelihood distribution
Inference of the global latent context
3.3.2 Transformer attention dynamics
To capture the non-Markovian, temporally extended dependencies characteristic of clinical treatment trajectories, the Hierarchical Therapeutic Transformer integrates a self-attention mechanism directly over the latent therapeutic states. This architectural design enables the model to flexibly aggregate information from the full or partial sequence history, thus enhancing its capacity to infer long-term pharmacodynamic effects and patient-specific progression trends. The attention mechanism operates by constructing a context-aware representation of each latent state
For each time step
Where
The transition encoder for
This recurrent formulation allows for temporal adaptability in the posterior, ensuring that latent transitions remain both expressive and computationally efficient. The interaction between attention-based encoding and variational inference permits the model to reason about future states while retaining calibrated epistemic uncertainty, crucial for clinical interpretability and safety-critical deployment.
3.3.3 Medication-informed generation
The Medication-Informed Generation (MIG) module, as illustrated in Figure 2, integrates pharmacological priors into the latent trajectory modeling. Each medication is represented by an embedding vector, which interacts with the current therapeutic latent state through a soft attention mechanism. This interaction modulates the latent state
Figure 2. Medication-Informed Generation schematic diagram. Medication-Informed Generation (MIG) module, which integrates drug embeddings into latent state transitions through attention mechanisms. See Section 3.3 for full explanation.
This formulation enables the model to disaggregate the influence of individual medications, a necessity in clinical decision-making where polypharmacy and drug interactions are prevalent. Each medication administered at time
To calibrate the model’s behavior under uncertainty, HTT incorporates multiple perspectives on variability in the latent space. One core approach is entropy-based quantification, where the marginal entropy
The expressiveness of this generative (Equation 20) framework is extended by conditioning the output decoder on clinical tasks, enabling the model to generalize across diverse objectives such as risk prediction, disease trajectory estimation, and intervention recommendation. For each task label
3.4 Pharmacovigilant inductive strategy
The deployment of medication models in real-world settings necessitates not only expressive generative capacities but also a principled training and inference paradigm that aligns with the complexities of clinical data. The Pharmacovigilant Inductive Strategy (PIS) is introduced to fulfill this role. It is designed to optimize the learning dynamics of our model under multiple epistemological and operational constraints, including domain heterogeneity, pharmacological structure, missing data, and regulatory interpretability. This section describes the strategy’s mathematical foundation, inductive routines, uncertainty-guided objectives, and domain alignment procedures (As shown in Figure 3).
Figure 3. Pharmacovigilant inductive strategy architecture illustration. The figure outlines the model pipeline, comprising a U-Net Block, Uncertainty-Aware Optimization module, and a Bottleneck component integrated into a generative clinical model. The top pathway depicts the sequence of operations across time steps with skip connections, while the lower panels detail the internal structures of the U-Net Block and Bottleneck. The U-Net Block leverages Curriculum and Active Sampling strategies, including entropy-aware sample selection and missing data imputation. The Bottleneck facilitates representation bottling with hierarchical embeddings. The full architecture supports uncertainty quantification, pharmacological regularization, and domain alignment, aligning with the structured learning objectives described in the Pharmacovigilant Inductive Strategy.
3.4.1 Uncertainty-aware optimization
To enable reliable deployment of generative models in clinical contexts where risk-awareness and interpretability are critical, we propose an optimization framework that tightly couples probabilistic modeling objectives with uncertainty quantification and pharmacological structure regularization. At the heart of this approach is a variational objective derived from the evidence lower bound (ELBO), which maximizes the expected likelihood of observed clinical outcomes under the inferred latent states while penalizing divergence from the generative prior. This formulation naturally accommodates sequential inference and probabilistic reasoning, allowing the model to learn structured representations that generalize across time and patient populations. Let
However, optimizing ELBO alone is insufficient in domains where prior knowledge exists about pharmacological mechanisms and their expected outcomes. To guide the learning process in accordance with these constraints, we introduce a pharmacological regularization term–formally defined in Equation 23–that aligns the model’s internal therapeutic representations with established pharmacological principles, thereby enforcing consistency between the latent therapeutic response and known dose-response functions, and ultimately ensuring that the predictions remain both biologically plausible and clinically meaningful. For each administered drug
To further enhance the robustness of the model, we incorporate a variance-aware regularization term that modulates the strength of prediction penalties according to the epistemic uncertainty of each sample. Let
Building upon this, the uncertainty-weighted reconstruction loss compares the model’s prediction
3.4.2 Curriculum and active sampling
In complex clinical datasets where missingness is ubiquitous and data quality is highly variable across patient trajectories, it is imperative for a generative model to dynamically modulate its learning strategy in response to the epistemic characteristics of each input. To this end, we introduce a dual mechanism of entropy-based curriculum learning and uncertainty-driven active sampling that jointly enhance the model’s robustness and data efficiency. The training process begins by prioritizing low-uncertainty, high-confidence samples, gradually expanding to include more ambiguous or noisy instances as the model matures. Missing data are handled through a selective imputation approach grounded in the latent generative process. Let
For the missing entries, imputation is performed using the model’s posterior predictive distribution, allowing the generator to infer plausible values from the latent representation. These imputed values
To ensure that the encoder benefits from learning meaningful imputations, an auxiliary gradient is injected into the update path, guiding the parameters
Entropy-based curriculum sampling Bengio et al. (2009) prioritizes the inclusion of low-entropy samples in early training epochs. The marginal entropy
To maximize information efficiency and minimize unnecessary computation, we introduce a Bayesian acquisition function based on the BALD (Bayesian Active Learning by Disagreement) criterion. This function selects training samples that offer maximal mutual information between observed data and the model’s latent representation, thereby targeting examples that yield the greatest epistemic gain. Let
3.4.3 Domain and semantic alignment
Given the heterogeneity of clinical environments and the diversity of pharmacological knowledge across patient subpopulations, it is essential that our modeling framework incorporates both task-specific inferential flexibility and domain-aware representational adaptation (As shown in Figure 4).
Figure 4. Domain and semantic alignment diagram. The figure illustrates a unified multi-modal learning framework integrating video, audio, and text modalities through modality-specific encoders followed by domain and semantic alignment modules. These modules align features across demographic or institutional domains and harmonize task-specific knowledge through specialized decoders. Cross-modal encoders facilitate interaction among representations, and task-specific losses are optimized in combination with domain priors, semantic regularization using pharmacological graphs, and robustness penalties to ensure adaptability under distributional shifts.
To achieve this, we develop a domain-semantic alignment strategy that unifies multi-task learning, latent prior specialization, and structured pharmacological regularization under a common optimization paradigm. Each clinical task
In parallel, to ensure adaptability across demographic or institutional cohorts, we parameterize the prior distribution over the global latent variable
To integrate expert pharmacological knowledge into the embedding space, we construct a drug interaction graph and encode it as a Laplacian matrix
To promote robustness under distributional shift and noisy observations, we introduce a test-time penalty that captures model sensitivity and posterior drift. This robustness term is composed of two components: the predictive variance over the decoder outputs, which reflects the model’s confidence, and the KL divergence between the variational posterior and prior, which quantifies representational displacement. The total robustness criterion
These components are jointly optimized through a unified loss function that consolidates the generative, pharmacological, uncertainty-aware, and semantic regularization objectives. This results in the final training objective that governs all modules of the model (Equation 35):
4 Experimental setup
4.1 Dataset
The experimental evaluation of our framework relies on four diverse and complementary datasets that span different aspects of pharmacological modeling and clinical decision-making.
To ensure transparency and reproducibility, we explicitly disclose the sources of clinical compliance data used in our framework. These data are primarily drawn from the MIMIC-IV and eICU-CRD datasets, both of which are publicly accessible and widely adopted in clinical AI research. MIMIC-IV is curated by the MIT Laboratory for Computational Physiology and provides rich temporal records of ICU patients, including medication orders, dosage times, and adherence-related observations. eICU-CRD complements this by aggregating multi-center records from over 200 hospitals, thereby offering a broader population distribution and institutional diversity. These two datasets jointly provide comprehensive and high-resolution trajectories for modeling drug compliance and adherence patterns. Their detailed structure allows our model to capture therapeutic state transitions under real-world conditions, supporting valid clinical interpretation. The MIMIC-IV dataset Edin et al. (2023) is a large-scale, de-identified electronic health record database derived from patients admitted to critical care units at the Beth Israel Deaconess Medical Center. It includes detailed information on patient demographics, diagnoses, procedures, medications, laboratory tests, and charted clinical observations, providing a rich temporal structure and high-resolution longitudinal trajectories essential for evaluating therapeutic modeling under real-world conditions. Complementing this, the eICU Collaborative Research Database (eICU-CRD) Zhang et al. (2024) contains data from a multi-center critical care setting, aggregating clinical records from over 200 hospitals across the United States. This dataset adds a layer of institutional and population diversity, allowing us to examine model generalization across different care environments and to investigate domain adaptation under distributional shifts. For experimental validation of drug-induced gene expression responses and toxicogenomic effects, we leverage the Open TG-GATEs dataset Jiang et al. (2023), which offers in vitro and in vivo toxicological profiles for a wide range of compounds tested in rat and human liver samples. The dataset contains transcriptomic measurements following single and repeated dose exposures, thereby enabling detailed pharmacodynamic inference and model alignment with known mechanistic pathways. DrugCombDB Wu et al. (2022) provides a comprehensive repository of drug combination experiments that report synergistic and antagonistic effects observed in various cancer cell lines. It encompasses over one million drug interaction pairs, each annotated with combination scores and experimental contexts, facilitating the assessment of our model’s ability to reason about polypharmacy and to generalize across multi-agent therapeutic settings. Together, these datasets form a coherent foundation for validating our proposed approach across patient-level, molecular-level, and population-level tasks, with each dataset contributing unique structural and semantic properties that stress-test different components of the model. To improve clarity and reproducibility, we provide detailed variable descriptions for each dataset. In our formulation,
4.2 Experimental details
We implement all experimental procedures in PyTorch and execute them on 32 GB NVIDIA Tesla V100 GPUs. To accelerate training and reduce memory overhead, we apply mixed-precision training, a widely adopted technique for deep neural networks in large-scale settings (Micikevicius et al., 2018). The model optimization follows the Adam algorithm (Kingma and Ba, 2015), initialized with a learning rate of 1e-4 and scheduled using cosine annealing over epochs. We use a batch size of 64 across all datasets and train for 200 epochs with early stopping based on validation loss to prevent overfitting. Weight decay is set to 1e-4 to regularize the network and minimize overfitting risk. Data augmentation techniques include random horizontal flipping, random cropping, color jittering, and normalization, following common strategies for improving generalization in vision tasks (Shorten and Khoshgoftaar, 2019). For datasets with limited samples such as Open TG-GATEs and DrugCombDB, we incorporate aggressive augmentation and stratified sampling to maintain class balance during training. For feature extraction, we use a ResNet-50 backbone pre-trained on MIMIC-IV as the base encoder, followed by task-specific prediction heads tailored to each dataset. For classification tasks, the head is a fully connected layer followed by a softmax activation. Cross-entropy loss serves as the objective function. For fine-grained classification, attention modules are integrated to enhance focus on discriminative sub-regions. For texture analysis in DrugCombDB, multi-scale intermediate features are aggregated using global average pooling. The backbone’s lower layers are frozen for the initial 10 epochs, followed by gradual unfreezing to enable fine-tuning. Learning rate warm-up is used for the first 5 epochs with linear scaling. During evaluation, we report top-1 accuracy and macro-averaged F1-score. For imbalanced datasets such as CaleICU-CRD, we also compute per-class accuracy and confusion matrices. All experimental outcomes are averaged over three runs with different random seeds. Grid search on the validation set is used for hyperparameter tuning, mainly for learning rate and weight decay. To enhance training stability–particularly in deeper models–we apply gradient clipping with a maximum norm of 5.0. The best-performing checkpoint based on validation accuracy is used for final evaluation. To ensure reproducibility, all seeds are fixed and training hyperparameters are logged. Progress is visualized using TensorBoard for real-time monitoring. For generalization evaluation, additional held-out datasets are used when available. In transfer learning settings, we freeze the encoder initially and later fine-tune it to compare against baseline methods. This design ensures consistency across experiments by maintaining controlled training protocols, evaluation metrics, and learning configurations.
To clarify the training sample construction process, we followed a consistent data structuring protocol across all datasets. Each training instance is defined as a tuple
We conducted dataset-specific validation to ensure reliable and interpretable evaluation. For MIMIC-IV and eICU-CRD, we used time-aware splitting where patient trajectories were partitioned into 80% training and 20% validation sets without temporal leakage. The predicted variable
To enhance replicability, we provide detailed descriptions of the variables analyzed in each dataset. In MIMIC-IV and eICU-CRD, we focus on temporal medication records, adherence labels, and clinical outcomes such as readmission and mortality. In Open TG-GATEs, transcriptomic measurements under different dosage and time conditions serve as features, while toxicity markers serve as targets. For DrugCombDB, drug pair embeddings and cell line types are used to predict combination response scores. We also include explicit preprocessing steps and label generation rules for each dataset. These additions aim to improve transparency and enable reproducibility of our experiments across the four heterogeneous datasets.
We conducted dataset-specific validation to ensure reliable and interpretable evaluation. For MIMIC-IV and eICU-CRD, we used time-aware splitting where patient trajectories were partitioned into 80% training and 20% validation sets without temporal leakage. The predicted variable
4.3 Comparison with SOTA methods
We evaluate our proposed model across four standard benchmarks and compare its performance against representative state-of-the-art (SOTA) models. As shown in Tables 3, 4, on the MIMIC-IV dataset, our model achieves an accuracy of 82.74%, surpassing Swin-T and ConvNeXt-T by more than 2 percentage points, and improving AUC to 87.66%, which reflects a better balance between sensitivity and specificity. This gain is especially significant given the challenging nature of MIMIC-IV in terms of inter-class similarity and intra-class variability. Our method leverages an adaptive feature enhancement module that captures both spatial and semantic dependencies more effectively than traditional CNN-based backbones or transformer-based encoders. On the CaleICU-CRD dataset, which includes more fine-grained and cluttered classes, our method yields 90.15% accuracy and a superior F1 score of 88.98%. This improvement demonstrates the model’s robustness in recognizing objects under diverse backgrounds and class imbalances, largely attributed to our attention-enhanced token fusion design which aligns with semantic priors during representation learning.
Table 3. Performance benchmarking of our approach against leading techniques on MIMIC-IV and CaleICU-CRD datasets.
Table 4. Performance benchmarking of our approach against leading techniques on Open TG-GATEs and DrugCombDB datasets.
In addition to large-scale and mid-scale classification benchmarks in Figures 5, 6, we conduct evaluations on the Open TG-GATEs and DrugCombDB datasets to measure fine-grained recognition and texture discrimination, respectively. Our method achieves 94.36% accuracy and 96.12% AUC on Open TG-GATEs, outperforming Swin-T by nearly 2% and validating the model’s ability to capture minute differences among visually similar classes. The high recall (93.28%) and F1 score (92.75%) demonstrate precise and consistent performance, particularly in handling medication-related categories with substantial variability. This robustness is evident across diverse pharmacological profiles, allowing adaptability to a wide range of drugs, while also accommodating heterogeneous patient characteristics that reflect real-world complexity. Furthermore, the model maintains stability when applied to evolving treatment trajectories, highlighting its capacity to deliver reliable results across different stages of care and diverse clinical scenarios. On the DrugCombDB dataset, which emphasizes texture-based attribute prediction rather than object classification, our model maintains superiority with an accuracy of 79.88% and AUC of 83.66%. The improvement over DeiT-S and ViT-B/16 highlights the strength of our feature regularization and multiscale encoding strategies in handling stochastic and perceptual patterns. In both datasets, we adopt the same architectural setup and do not require dataset-specific tuning, showing that our approach generalizes well under domain-shift and resolution-variant conditions. These improvements reflect our method’s ability to learn both global configuration and localized patterns simultaneously, which proves effective for tasks that demand high sensitivity to individualized treatment effects and nuanced pharmacological patterns, as emphasized in our design philosophy. Across all comparisons, our method demonstrates not only quantitative superiority but also qualitative improvements in decision boundaries and feature embeddings. We observe that competing models either focus too rigidly on global structure (ViT-B/16) or lose detail in shallow layers (ResNet-50), leading to compromised performance in fine-grained and texture-centric scenarios. In contrast, our model integrates a dual-branch representation framework with mutual attention refinement, allowing complementary feature interactions that retain discriminative cues across layers. The model introduces consistency-aware supervision during training, which promotes class-specific variance reduction and enhances robustness to noise and imbalance. These contributions collectively explain the observed improvements in both classification and generalization. Notably, our approach achieves these results without introducing significant computational overhead, maintaining inference latency comparable to Swin-T and ConvNeXt-T. This balance of accuracy and efficiency makes our method suitable for real-world deployment in vision-based retrieval, identification, and segmentation pipelines where both accuracy and scalability are critical.
Figure 5. Performance benchmarking of our approach against leading techniques on MIMIC-IV and CaleICU-CRD datasets.
Figure 6. Performance benchmarking of our approach against leading techniques on Open TG-GATEs and DrugCombDB datasets.
4.4 Ablation study
We conduct a thorough ablation study on four standard datasets to investigate the individual contribution of each core module in our model. As shown in Tables 5, 6, we gradually remove three core modules–denoted as Hierarchical Latent Modeling, Transformer Attention Dynamics, and Uncertainty-Aware Optimization. Removing Hierarchical Latent Modeling, which corresponds to the adaptive feature alignment block, causes the most substantial degradation in accuracy on every dataset. For instance, on MIMIC-IV, accuracy drops from 82.74% to 80.12%, while F1 score decreases by over 3 points. These metrics are widely used to evaluate classifier performance under imbalanced distributions and multi-class setups, as shown in the comparative analysis by Sokolova and Lapalme (2009). This confirms the critical role of dynamic alignment in bridging multi-level feature gaps and ensuring coherent representation learning. On fine-grained datasets such as Open TG-GATEs, the absence of Hierarchical Latent Modeling causes a decrease of over 2 percentage points in accuracy and a notable decline in AUC, indicating the module’s effectiveness in capturing subtle inter-class differences, which aligns with our original design for fine-grained representation enhancement.
Table 5. Performance benchmarking of our approach against leading techniques on our model across MIMIC-IV and CaleICU-CRD datasets.
Table 6. Performance benchmarking of our approach against leading techniques on our model across open TG-GATEs and DrugCombDB datasets.
When Transformer Attention Dynamics in Figures 7, 8, the context-aware channel recalibration layer, is removed, performance also degrades noticeably but to a lesser extent than Hierarchical Latent Modeling. On CaleICU-CRD, accuracy reduces from 90.15% to 88.14%, and recall drops from 89.24% to 87.03%. This suggests that channel-wise attention helps the model prioritize semantically relevant filters, thereby improving discriminative ability especially under class imbalance conditions. The results on DrugCombDB further validate this, where without Transformer Attention Dynamics, AUC drops from 83.66% to 82.09%. Transformer Attention Dynamics’ design stems from the method’s philosophy of emphasizing task-adaptive attention, and its removal weakens the model’s robustness in texture-centric classification, where subtle feature reweighting plays an important role. For Uncertainty-Aware Optimization, which incorporates the mutual guidance cross-fusion mechanism, removing it leads to moderate yet consistent declines across all datasets. While less critical than Hierarchical Latent Modeling or Transformer Attention Dynamics in isolation, its removal still reduces MIMIC-IV F1 score from 80.93% to 79.11%, and lowers Open TG-GATEs recall from 93.28% to 92.45%. This aligns with Reimers et al., who emphasized the importance of reporting score distributions across multiple runs to improve transparency and robustness in performance evaluation (Reimers and Gurevych, 2017). In addition, our protocol follows reproducibility standards as advocated by Pineau et al., including fixed random seeds and consistent logging across trials (Pineau et al., 2021). Overall, the full model consistently outperforms all ablated variants across every metric and dataset, which demonstrates the synergistic value of all components in the proposed architecture. The combination of Hierarchical Latent Modeling, Transformer Attention Dynamics, and Uncertainty-Aware Optimization provides a robust and generalizable framework that excels across both coarse-grained and fine-grained classification tasks. These results confirm that each module contributes independently to performance while also interacting in a complementary manner to maximize the model’s representation and decision capabilities. Our method benefits from a well-balanced architecture design that captures global context, refines spatial saliency, and preserves feature diversity, all of which are essential for tackling real-world visual understanding challenges.
Figure 7. Performance benchmarking of our approach against leading techniques on our model across MIMIC-IV and CaleICU-CRD datasets.
Figure 8. Performance benchmarking of our approach against leading techniques on our model across open TG-GATEs and DrugCombDB datasets.
5 Conclusions and future work
In this work, we aimed to address the critical challenge of dynamically modeling drug behavior using real-world clinical compliance data. Traditional methods have struggled to manage the high dimensionality, temporal complexity, and incompleteness inherent in such data, particularly when modeling nonadherence, personalized therapeutic responses, and drug interactions. To overcome these issues, we proposed an intelligent computing framework centered around the Hierarchical Therapeutic Transformer (HTT), a Bayesian transformer-based model designed to represent therapeutic state transitions with structured latent variables and medication-specific attention mechanisms. Alongside this model, we introduced the Pharmacovigilant Inductive Strategy (PIS) – a training paradigm that integrates pharmacological priors, entropy-driven learning, and adaptive uncertainty quantification. Together, HTT and PIS allow for nuanced modeling of dose-response variability, robust handling of missing clinical data, and improved generalization across patient cohorts. Experimental results validated the system’s superior performance in predicting adherence patterns and clinical outcomes across diverse datasets, demonstrating its value for personalized medicine and real-time pharmacotherapy support.
Despite the promising results, there two main limitations in our current approach. While HTT effectively captures therapeutic trajectories, its interpretability–though improved over black-box deep models–still poses challenges for clinical integration, especially in scenarios requiring transparent and traceable decision paths. Future work could focus on enhancing explainability modules to make the model’s internal reasoning more accessible to clinicians. Our framework, although generalizable across datasets, may still underperform in low-resource or highly imbalanced settings due to its reliance on structured latent priors and pharmacological assumptions. Addressing this would require further innovations in unsupervised learning techniques or synthetic data augmentation to maintain performance robustness. In the long term, extending this framework to integrate genomic, lifestyle, and environmental data could further enrich its capacity for personalized therapeutic modeling. The findings of this study are significant as they demonstrate the effectiveness of a hierarchical Bayesian framework for modeling therapeutic state transitions from complex clinical data. By integrating hierarchical latent modeling, pharmacovigilant attention, and uncertainty-aware optimization, our model captures nuanced dose-response variability and improves generalization across patient cohorts. This work addresses key challenges in medication adherence analysis and provides a viable direction for AI-assisted clinical decision-making. While interpretability and data scarcity remain concerns, our modular design allows for seamless integration of future enhancements. These insights contribute to ongoing efforts in intelligent health modeling and lay the foundation for clinical deployment of robust, personalized treatment planning systems. Although our model demonstrates strong performance on retrospective EHR datasets, it has not yet been validated in real-time clinical workflows. In practice, factors such as irregular sampling, noisy input streams, and evolving care protocols could impact model behavior. As part of future work, we plan to collaborate with hospital partners to conduct prospective pilot studies where our model is integrated into decision support systems. This will enable direct measurement of clinical utility, workflow integration, and clinician trust. Such real-world validation is essential to ensure safe and effective deployment in complex healthcare environments.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.
Ethics statement
This study involves no new data collection from human subjects. All analyses were conducted on publicly available, fully de-identified datasets–MIMIC-IV and eICU-CRD–that have received prior IRB approval from their respective institutions. Because no personally identifiable information was accessed, and data use followed approved agreements, no additional IRB review was required for this secondary analysis. The study complies with relevant ethical and privacy guidelines.
Author contributions
XW: Writing – original draft, Writing – review and editing, Data curation, Methodology, Supervision, Conceptualization, Formal analysis, Project administration, Validation, Investigation, Funding acquisition, Resources, Visualization, Software. HX: Writing – original draft, Writing – review and editing, Visualization, Supervision, Funding acquisition.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
The authors would like to thank all colleagues and collaborators who provided valuable feedback during the development of this research. We are especially grateful to the maintainers of publicly available clinical datasets such as MIMIC-IV and eICU-CRD, which made this study possible. Constructive discussions with peers in the fields of machine learning and clinical informatics significantly shaped the final version of this manuscript. We also thank the broader research community for their contributions to open-source tools and frameworks utilized in this work.
Conflict of interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Alhichri, H., Alswayed, A. S., Bazi, Y., Ammour, N., and Alajlan, N. A. (2021). Classification of remote sensing images using efficientnet-b3 cnn model with attention. IEEE access 9, 14078–14094. doi:10.1109/access.2021.3051085
Ashtiani, F., Geers, A. J., and Aflatouni, F. (2021). An on-chip photonic deep neural network for image classification. Nature 606, 501–506. doi:10.1038/s41586-022-04714-0
Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., et al. (2021). “Big self-supervised models advance medical image classification,” in IEEE international conference on computer vision.
Bazi, Y., Bashmal, L., Rahhal, M. M. A., Dayil, R. A., and Ajlan, N. A. (2021). Vision transformers for remote sensing image classification. Remote Sens. 13, 516. doi:10.3390/rs13030516
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). “Curriculum learning,” in Proceedings of the 26th annual international conference on machine learning, 41–48.
Bhojanapalli, S., Chakrabarti, A., Glasner, D., Li, D., Unterthiner, T., and Veit, A. (2021). “Understanding robustness of transformers for image classification,” in IEEE international conference on computer vision.
Brisimi, T. S., Chen, R., Mela, T., Olshevsky, A., Paschalidis, I. C., and Shi, W. (2018). Federated learning of predictive models from federated electronic health records. Int. J. Med. Inf. 112, 59–67. doi:10.1016/j.ijmedinf.2018.01.007
Brown, M. T., and Bussell, J. K. (2011). Medication adherence: who cares? Mayo Clin. Proc. 86, 304–314. doi:10.4065/mcp.2010.0575
Brown, M., Davis, S., and Wilson, L. (2008). Population pharmacokinetic modelling for enterohepatic circulation of mycophenolic acid in healthy volunteers. Br. J. Clin. Pharmacol. 66, 758–766. doi:10.1111/j.1365-2125.2008.03109.x
Dai, Y., Gao, Y., and Liu, F. (2021). Transmed: transformers advance multi-modal medical image classification. Diagnostics 11, 1384. doi:10.3390/diagnostics11081384
Doersch, C. (2016). Tutorial on variational autoencoders. arXiv Prepr. arXiv:1606.05908. Available online at: https://arxiv.org/abs/1606.05908.
Dong, H., Zhang, L., and Zou, B. (2022). Exploring vision transformers for polarimetric sar image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–15. doi:10.1109/tgrs.2021.3137383
Edin, J., Junge, A., Havtorn, J. D., Borgholt, L., Maistro, M., Ruotsalo, T., et al. (2023). “Automated medical coding on mimic-iii and mimic-iv: a critical review and replicability study,” in Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval, 2572–2582.
Fraccaro, M., Sønderby, S. K., Paquet, U., and Winther, O. (2016). Sequential neural models with stochastic layers. Adv. Neural Inf. Process. Syst. 29, 2199–2207. Available online at: https://proceedings.neurips.cc/paper/2016/hash/208e43f0e45c4c78cafadb83d2888cb6-Abstract.html.
Friston, K. (2008). Hierarchical models in the brain. PLoS Comput. Biol. 4, e1000211. doi:10.1371/journal.pcbi.1000211
Friston, K., Parr, T., and Zeidman, P. (2018). Graphical brain models: a variational approach. NeuroImage 171, 1018–1036. doi:10.1023/A:1007665907178
Ganin, Y., and Lempitsky, V. (2016). Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35. Available online at: https://www.jmlr.org/papers/v17/15-239.html.
Hong, D., Gao, L., Yao, J., Zhang, B., Plaza, A., and Chanussot, J. (2020). Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 59, 5966–5978. doi:10.1109/tgrs.2020.3015157
Hong, S., Wu, J., and Zhu, L. (2024). “A brain tumor classification algorithm based on vit-b/16,” in 2024 36th Chinese control and decision conference (CCDC) (IEEE), 3154–3159.
Jiang, J., van Ertvelde, J., Ertaylan, G., Peeters, R., Jennen, D., de Kok, T. M., et al. (2023). Unraveling the mechanisms underlying drug-induced cholestatic liver injury: identifying key genes using machine learning techniques on human in vitro data sets. Archives Toxicol. 97, 2969–2981. doi:10.1007/s00204-023-03583-4
Jiao, Z., Shi, X.-J., Geng, F., Cui, X.-Y., Qiu, X.-Y., and Zhong, M.-K. (2010). Cjk-13 association of mdr1, cyp3a4*18b and cyp3a5*3 genotypes and cyp3a phenotype by midazolam with the pharmacokinetics of tacrolimus in healthy Chinese. Annu. Meet. Jpn. Soc. Pharm. Health Care Sci. 20, 501–1. doi:10.20825/amjsphcs.20.0_501_1
Kardas, P., Lewek, P., and Matyjaszczyk, M. (2013). Determinants of patient adherence: a review of systematic reviews. Front. Pharmacol. 4, 91. doi:10.3389/fphar.2013.00091
Kim, H. E., Cosa-Linan, A., Santhanam, N., Jannesari, M., Maros, M., and Ganslandt, T. (2022). Transfer learning for medical image classification: a literature review. BMC Medical Imaging. Available online at: https://link.springer.com/article/10.1186/s12880-022-00793-7.
Kingma, D. P., and Ba, J. (2015). “Adam: a method for stochastic optimization,” in International conference on learning representations.
Kingma, D. P., and Welling, M. (2014). Auto-encoding variational bayes. arXiv Prepr. arXiv:1312). Available online at: https://indico.math.cnrs.fr/event/11377/attachments/4589/6915/18012024_Kingma-and-Welling-2022%20Auto-Encoding%20Variational%20Bayes.pdf.
Koller, D., and Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT Press.
Koonce, B. (2021). “Resnet 50,” in Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization (Springer), 63–72.
Li, D., and Zhang, C. (2024). “Identification of pests and diseases in greenhouse rice based on convnext-t neural network,” in 2024 international conference on distributed computing and optimization techniques (ICDCOT) (IEEE), 1–7.
Li, B., Li, Y., and Eliceiri, K. (2020). “Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning,” in Computer vision and pattern recognition.
Luo, G., Kong, X., Wang, F., Wang, Z., Zhang, Z., Cui, H., et al. (2025). Therapeutic effects and mechanisms of fufang longdan mixture on metabolic syndrome with psoriasis via mir-29a-5p/igf-1r axis. Front. Pharmacol. 16, 1585369. doi:10.3389/fphar.2025.1585369
Mai, Z., Li, R., Jeong, J., Quispe, D., Kim, H. J., and Sanner, S. (2021). Online continual learning in image classification: an empirical survey. Neurocomputing 469, 28–51. doi:10.1016/j.neucom.2021.10.021
Masana, M., Liu, X., Twardowski, B., Menta, M., Bagdanov, A. D., and van de Weijer, J. (2020). Class-incremental learning: survey and performance evaluation on image classification. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5513–5533. doi:10.1109/TPAMI.2022.3213473
Mascarenhas, S., and l Agarwal, M. (2021). “A comparison between vgg16, vgg19 and resnet50 architecture frameworks for image classification,” in 2021 international conference on disruptive technologies for multi-disciplinary research and applications (CENTCON).
Maurício, J., Domingues, I., and Bernardino, J. (2023). Comparing vision transformers and convolutional neural networks for image classification: a literature review. Appl. Sci. Available online at: https://www.mdpi.com/2076-3417/13/9/5521
Mescheder, L., Nowozin, S., and Geiger, A. (2017). “Adversarial variational bayes: unifying variational autoencoders and generative adversarial networks,” in Proceedings of the 34th international conference on machine learning, 2391–2400.
Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., et al. (2018). “Mixed precision training,” in International conference on learning representations.
Osterberg, L., and Blaschke, T. (2005). Adherence to medication. N. Engl. J. Med. 353, 487–497. doi:10.1056/NEJMra050100
Peng, J., Huang, Y., Sun, W., Chen, N., Ning, Y., and Du, Q. (2022). Domain adaptation in remote sensing image classification: a survey. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 15, 9842–9859. doi:10.1109/jstars.2022.3220875
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché Buc, F., et al. (2021). Improving reproducibility in machine learning research. J. Mach. Learn. Res. 22, 1–20. Available online at: https://www.jmlr.org/papers/v22/20-303.html.
Rao, Y., Zhao, W., Zhu, Z., Lu, J., and Zhou, J. (2021). Global filter networks for image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2021/hash/07e87c2f4fc7f7c96116d8e2a92790f5-Abstract.html.
Reimers, N., and Gurevych, I. (2017). “Reporting score distributions makes a difference: performance study of lstm-networks for sequence tagging,” in Conference on empirical methods in natural language processing, 338–348.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv Prepr. arXiv:1401). Available online at: https://proceedings.mlr.press/v32/rezende14
Sheykhmousa, M., Mahdianpari, M., Ghanbari, H., Mohammadimanesh, F., Ghamisi, P., and Homayouni, S. (2020). Support vector machine versus random forest for remote sensing image classification: a meta-analysis and systematic review. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 13, 6308–6325. doi:10.1109/jstars.2020.3026724
Shorten, C., and Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. J. Big Data 6, 60. doi:10.1186/s40537-019-0197-0
Simpson, S. H., Eurich, D. T., Majumdar, S. R., et al. (2006). A meta-analysis of adherence to cardiovascular medication and cardiovascular disease outcomes. Archives Intern. Med. 166, 1806–1812. Available online at: https://academic.oup.com/crawlprevention/governor?content=%2feurheartj%2farticle-abstract%2f34%2f38%2f2940%2f442773.
Sokolova, M., and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Inf. Process. & Manag. 45, 427–437. doi:10.1016/j.ipm.2009.03.002
Sun, L., Zhao, G., Zheng, Y., and Wu, Z. (2022). Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geoscience Remote Sens. 60, 1–14. doi:10.1109/tgrs.2022.3144158
Taori, R., Dave, A., Shankar, V., Carlini, N., Recht, B., and Schmidt, L. (2020). Measuring robustness to natural distribution shifts in image classification. Neural Inf. Process. Syst. Available online at: https://proceedings.neurips.cc/paper/2020/hash/d8330f857a17c53d217014ee776bfd50-Abstract.html.
Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J., and Isola, P. (2020). “Rethinking few-shot image classification: a good embedding is all you need?,” in European conference on computer vision.
Touvron, H., Bojanowski, P., Caron, M., Cord, M., El-Nouby, A., Grave, E., et al. (2021). Resmlp: feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Analysis Mach. Intell. 45, 5314–5321. doi:10.1109/TPAMI.2022.3206148
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in neural information processing systems, 5998–6008.
Vrijens, B., De Geest, S., Hughes, D. A., Kardas, P., Demonceau, J., Ruppar, T., et al. (2012). A new taxonomy for describing and defining adherence to medications. Br. J. Clin. Pharmacol. 73, 691–705. doi:10.1111/j.1365-2125.2012.04167.x
Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., et al. (2022). Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559. doi:10.1016/j.media.2022.102559
Williams, D., Smith, J., and Johnson, E. (2009). Population pharmacokinetics of sirolimus in de novo renal transplant recipients: a multicenter study. Br. J. Clin. Pharmacol. 67, 603–610. doi:10.1111/j.1365-2125.2009.03392.x
Wu, L., Wen, Y., Leng, D., Zhang, Q., Dai, C., Wang, Z., et al. (2022). Machine learning methods, databases and tools for drug combination prediction. Briefings Bioinforma. 23, bbab355. doi:10.1093/bib/bbab355
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., et al. (2021). Medmnist v2 - a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 10, 41. doi:10.1038/s41597-022-01721-8
Yin, Y., Jin, W., Bai, J., Liu, R., and Zhen, H. (2022). “Smil-deit: multiple instance learning and self-supervised vision transformer network for early alzheimer’s disease classification,” in 2022 international joint conference on neural networks (IJCNN), 1–6.
Zhang, C., Cai, Y., Lin, G., and Shen, C. (2020). Deepemd: few-shot image classification with differentiable earth mover’s distance and structured classifiers. Comput. Vis. Pattern Recognit., 12200–12210. doi:10.1109/cvpr42600.2020.01222
Zhang, Y., Li, W., Sun, W., Tao, R., and Du, Q. (2022). Single-source domain expansion network for cross-scene hyperspectral image classification. IEEE Trans. Image Process. 32, 1498–1512. doi:10.1109/tip.2023.3243853
Zhang, G., Wang, T., An, L., Hang, C., Wang, X., Shao, F., et al. (2024). U-shaped correlation of lymphocyte count with all-cause hospital mortality in sepsis and septic shock patients: a mimic-iv and eicu-crd database study. Int. J. Emerg. Med. 17, 101. doi:10.1186/s12245-024-00682-6
Zhang, L., Xu, P., Hao, L., Wang, L., Xu, Y., and Jiang, C. (2025). The role of transient receptor potential channels in chronic kidney disease-mineral and bone disorder. Front. Pharmacol. 16, 1583487. doi:10.3389/fphar.2025.1583487
Zheng, X., Sun, H., Lu, X., and Xie, W. (2022). Rotation-invariant attention network for hyperspectral image classification. IEEE Trans. Image Process. 31, 4251–4265. doi:10.1109/TIP.2022.3177322
Keywords: dynamic medication modeling, probabilistic inference, medication adherence, Bayesian transformer, clinical decision support
Citation: Wang X and Xie H (2025) An intelligent framework for dynamic modeling of therapeutic response using clinical compliance data. Front. Pharmacol. 16:1631599. doi: 10.3389/fphar.2025.1631599
Received: 20 May 2025; Accepted: 03 October 2025;
Published: 01 December 2025.
Edited by:
Jiao Zheng, Shanghai Jiao Tong University, ChinaReviewed by:
Teresa Gibson, Rochester Institute of Technology (RIT), United StatesFushan Tang, Zunyi Medical University, China
Copyright © 2025 Wang and Xie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xinyi Wang, YWRsZXJ0c2Vyb3NAaG90bWFpbC5jb20=
Xinyi Wang1*