- 1Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China
- 2Department of Pediatric Cardiology, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China
- 3School of Life Sciences, Jiangxi Normal University, Nanchang, China
Introduction: Understanding blood-brain barrier (BBB) disruption in neuroinflammatory disorders is crucial for advancing neurological diagnostics and therapy. Unlike prior work that focuses on static imaging or rule-based modeling, our approach introduces a principled, video-driven biomarker system with interpretable temporal dynamics, contextual adaptability, and patient-specific alignment. This represents a fundamental shift from handcrafted thresholding and static biomarker snapshots to real-time, trajectory-based modeling of BBB disruptions. Owing to the spatiotemporal complexity of BBB dynamics in diseases like multiple sclerosis and encephalitis, traditional assessment methods—such as contrast-enhanced MRI or CSF analysis—often fall short due to low temporal resolution, observer bias, and limited generalizability. These limitations hinder the detection of subtle or transient barrier perturbations with potential diagnostic value.
Methods: In response to these obstacles, we present a novel paradigm employing spatiotemporal video-derived biomarkers to facilitate real-time, interpretable assessment of BBB integrity. Central to our approach is VidNet, a deep video modeling architecture that extracts latent biomarker trajectories from neuroimaging sequences using hierarchical attention to focus on physiologically meaningful patterns, such as microvascular compromise. Complementing this, CABRiS (Context-Aware Biomarker Refinement Strategy) integrates imaging context and patient-specific priors to enhance robustness, domain adaptability, and semantic consistency. This hybrid system—combining BioVidNet’s trajectory encoding with CABRiS refinement—enables precise, individualized quantification of BBB dynamics.
Results and discussion: Evaluation on benchmark and clinical datasets reveals superior detection of neurovascular disruptions and alignment with expert annotations compared to existing methods. By offering temporally resolved and personalized assessments, our framework supports goals in dynamic neuroimaging, including early intervention and mechanistic disease understanding. This work contributes a scalable, interpretable tool for precision neuromonitoring in neuroinflammatory conditions. Unlike previous approaches that primarily depend on static neuroimaging features, handcrafted thresholds, or disease_specific heuristics, our method introduces a principled end-to-end framework that integrates dynamic video-based biomarkers with interpretable deep modeling. By disentangling transient motion patterns and physiological rhythms within a unified latent space, and aligning biomarker trajectories through patient-specific contextual priors, our method uniquely captures personalized temporal dynamics of BBB disruption. This represents a marked advancement over conventional methods in both adaptability and clinical interpretability, offering a new paradigm for precision neuromonitoring in neuroinflammatory settings.
1 Introduction
The disruption of the blood-brain barrier (BBB) is a pivotal pathological event in various neuroinflammatory disorders, influencing onset, progression, and therapeutic outcomes. The blood-brain barrier (BBB) serves a dual role: it preserves the homeostasis of the central nervous system (CNS) by stringently controlling molecular and immune cell traffic, while also acting as a vital shield against peripheral threats Wu et al. (2023). However, in neuroinflammatory conditions such as multiple sclerosis, Alzheimer’s disease, and traumatic brain injury, this barrier becomes compromised, allowing the infiltration of inflammatory cells and neurotoxic substances into the CNS parenchyma Wan et al. (2021). To cellular infiltration and molecular leakage, recent studies have identified the role of intracellular ionic imbalances as primary contributors to neuroinflammatory progression. Elevation in intracellular concentrations of certain divalent cations—such as zinc (Zn2+), calcium (Ca2+), and magnesium (Mg2+)—has been shown to directly influence oxidative stress signaling, mitochondrial dysfunction, and pro-inflammatory cytokine release Sensi and Granzotto (2024). In the context of BBB disruption, compromised ion homeostasis exacerbates endothelial permeability and astrocytic reactivity, further weakening the barrier’s structural integrity. Moreover, the extracellular ionic environment, ionic strength, modulates protein-protein interactions, electrostatic forces across the endothelial layer, and the activation threshold of glial cells Knox et al. (2022). Variations in ionic strength can perturb the tight junction architecture through charge-mediated conformational changes, promoting para-cellular leakage and leukocyte migration into the CNS Zhou et al. (2021). These ionic microenvironment changes often precede overt immune cell infiltration and are regarded as early biophysical markers of neuroinflammatory onset. By incorporating these physicochemical cues into the pathophysiological narrative of BBB breakdown, we provide a more comprehensive framework that captures not only cellular but also molecular and biophysical triggers of CNS inflammation Alahmari (2021). These insights align with our broader aim to develop spatiotemporally sensitive biomarkers that can detect both structural and subtle ionic changes underlying disease initiation. Carapeto et al. conducted a morphological and nanomechanical analysis of S100A9 protein fibrils using atomic force microscopy and found that, under calcium-enriched conditions, the protein forms worm-like fibrils with periodic axial structure and extremely low Young’s modulus, suggesting a distinct flexible fibrillar architecture Carapeto et al. (2024). Eren-Koçak et al. reviewed the role of ion channel dysfunction and neuroinflammation in migraine and depression, highlighting that shared mechanisms—such as purinergic receptor activation and inflammasome formation—may underlie the comorbidity of both disorders Eren-Koçak and Dalkara (2021).
This breakdown contributes significantly to neural dysfunction and exacerbation of clinical symptoms. Moreover, understanding the dynamic progression of BBB disruption in vivo is essential for evaluating disease mechanisms and therapeutic interventions Kitaguchi et al. (2021). Thus, spatiotemporal imaging and quantification of BBB permeability have become critical for revealing the temporal and regional characteristics of barrier compromise, enabling precise correlation with disease pathophysiology Hendricks et al. (2020). According to the Global Burden of Disease (GBD) Study 2019, neurological disorders collectively ranked as the second leading cause of death and the leading cause of disability-adjusted life years (DALYs) worldwide. Among them, neuroinflammatory diseases such as multiple sclerosis (MS), Alzheimer’s disease (AD), and neuroinfectious disorders present significant healthcare challenges. For example, MS affects approximately 2.8 million people globally, with rising incidence and prevalence in low- and middle-income countries due to improved diagnostic capabilities and increasing life expectancy. Alzheimer’s disease and other dementias contribute to over 50 million cases globally, with projections estimating this number will triple by 2050. The associated healthcare costs are substantial—AD alone accounted for an estimated USD 1 trillion globally in 2020, a figure expected to double within 2 decades. Beyond prevalence, these disorders exert a profound socioeconomic impact. In the European Union, the annual cost per patient with MS exceeds €40,000, primarily driven by disability care and loss of productivity. Neuroinflammation is also a key component in a range of other CNS pathologies, including autoimmune encephalitis, neuromyelitis optica spectrum disorders (NMOSD), and post-infectious syndromes like long COVID. The unifying pathological feature across these conditions is blood-brain barrier dysfunction, which precedes or parallels clinical deterioration and is increasingly recognized as a biomarker for disease activity. These statistics collectively emphasize the urgent need for technologies that can capture subtle, dynamic changes in BBB integrity with high spatiotemporal resolution. Our proposed video biomarker framework responds to this need by enabling interpretable, individualized monitoring that aligns with clinical goals in both acute and chronic settings.
Early efforts to characterize BBB integrity primarily focused on rule-based simulation frameworks that extracted structural changes from medical imaging scans using manually encoded thresholds and expert knowledge Liu et al. (2020). These models were typically designed for specific disease contexts, with limited capacity to accommodate the diverse and evolving nature of neuroinflammatory conditions Tang et al. (2020). As a result, although they offered interpretable assessments of BBB status, they lacked robustness when applied across heterogeneous patient populations or fluctuating imaging conditions Cuevas et al. (2020).
To enhance adaptability and predictive power, subsequent approaches began to incorporate statistical classifiers trained on annotated imaging datasets Alahmari (2021). These systems achieved improved performance by learning discriminative patterns from features such as signal intensity, shape, and spatial distribution Lin et al. (2020). Nevertheless, they remained dependent on predefined features and static representations, making them insufficient for capturing the complex temporal evolution and regional variability of BBB permeability across disease stages Zamani et al. (2020).
More recent advancements have shifted toward the use of spatiotemporally aware neural architectures that learn directly from raw multimodal data Mercat et al. (2020). Convolutional and recurrent structures are now leveraged to simultaneously model spatial patterns and their progression over time, enabling fine-grained detection of barrier alterations with reduced manual preprocessing Ben et al. (2021). By utilizing hierarchical representations and attention-based mechanisms, these models not only improve diagnostic sensitivity but also offer insights into the underlying pathophysiological processes Stappen et al. (2021). Despite their effectiveness, practical challenges remain in terms of computational demand and model interpretability, especially in contexts requiring clinical transparency and regulatory compliance Stenum et al. (2020).
To overcome the above limitations of insufficient temporal modeling, lack of generalization, and interpretability constraints, this study proposes a novel approach that leverages spatiotemporal graph neural networks (ST-GNNs) combined with domain-specific priors to analyze BBB disruption. This method dynamically models interactions between vascular structures and inflammatory markers across time, capturing the evolving topology of the CNS during disease progression. By incorporating anatomical knowledge into the network structure, our model offers both biological plausibility and data efficiency. Moreover, this framework supports longitudinal predictions and real-time monitoring, which are critical for personalized treatment planning and therapeutic evaluation. The proposed method addresses existing methodological gaps and provides a robust foundation for both research and clinical translation in neuroinflammatory conditions. The essence of our contributions is captured in the points below.
In contrast to earlier works that rely on predefined features or domain-specific tuning, our model introduces a unified representation-learning architecture that integrates biomarker extraction, domain adaptation, and temporal trajectory refinement. This integration allows for interpretable, generalizable, and patient-specific analysis of BBB disruption across a spectrum of CNS disorders—a capacity not demonstrated in previous approaches.
Compared with the existing literature, our work introduces a novel hybrid framework that addresses both the temporal and contextual complexity of BBB disruption. While prior studies have utilized handcrafted thresholds or static imaging biomarkers, they generally lack the temporal resolution and adaptability required for precision neuromonitoring. More recent efforts employing deep learning have improved feature extraction but often remain limited by black-box designs and insufficient contextualization. In contrast, our method employs BioVidNet, a biomarker-oriented video representation model that disentangles motion and rhythmic patterns, and CABRiS, a refinement module that incorporates subject-specific priors through domain-aware gating and confidence-guided fusion. This combination enables individualized modeling of BBB dynamics in a temporally continuous and clinically interpretable manner. To our knowledge, this is the first end-to-end framework that integrates dynamic latent biomarker encoding with interpretable alignment and robust contextual adaptation, thus offering a novel contribution to the field of dynamic neurovascular analysis.
2 Related work
2.1 Blood-brain barrier imaging advances
Compared to traditional methods constrained by static imaging or domain-specific heuristics, our model uniquely integrates spatiotemporal graph neural architectures and context-aware refinement to enable robust, individualized biomarker tracking. This unified modeling pipeline allows for fine-grained trajectory learning, generalization across disorders, and interpretability at both physiological and population levels.
The evolution of imaging modalities has dramatically transformed the understanding of blood-brain barrier (BBB) dynamics, particularly in the context of neuroinflammatory diseases Ou et al. (2021). Traditional imaging methods such as magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT) have provided macroscopic views of BBB disruption but often lack the necessary spatial or temporal resolution to capture dynamic processes in real-time Seuren et al. (2020). More recently, optical imaging techniques, including multiphoton microscopy and intravital fluorescence microscopy, have enabled high-resolution visualization of BBB alterations at the microvascular level Rezai et al. (2024). These methods offer detailed insights into cellular interactions and molecular mechanisms underpinning barrier dysfunction. Intravital imaging, for example, allows for real-time visualization of leukocyte-endothelial interactions, pericyte behavior, and astrocytic responses during inflammatory insults Neimark et al. (2021). The temporal resolution of such techniques permits tracking transient events that are often missed by static imaging approaches. Moreover, the use of fluorescent tracers with different molecular weights has improved the characterization of size-selective permeability changes in the BBB Wang et al. (2021a). Advanced video-rate imaging has further enhanced the temporal aspect, enabling continuous monitoring of barrier integrity and the kinetics of disruption and recovery. Techniques like dynamic contrast-enhanced MRI (DCE-MRI) have been used to estimate permeability coefficients and diffusion parameters over time, offering a semi-quantitative measure of barrier function Buch et al. (2022a). In the context of neuroinflammatory disorders such as multiple sclerosis (MS) and neuromyelitis optica (NMO), these imaging tools have uncovered distinct patterns of barrier disruption correlating with lesion development and immune cell infiltration Zhu et al. (2022). Emerging technologies including optical coherence tomography (OCT) and photoacoustic imaging are expanding the frontier of non-invasive BBB monitoring Beaudoin (2023). Combined with machine learning algorithms, these approaches can enhance the interpretation of spatiotemporal data and facilitate automated detection of pathological changes. Together, these innovations contribute to a more nuanced understanding of BBB dynamics, emphasizing the need for video-based, high-resolution tools in translational research Beaudoin et al. (2024). Table 1 summarizes the key imaging techniques for assessing BBB integrity, highlighting their respective strengths and limitations. Although Table 1 already summarizes the major imaging techniques used for BBB integrity evaluation, we now elaborate on its relevance. The table provides not just a catalog of imaging methods but also a comparative analysis of their operational principles, including aspects such as imaging depth, invasiveness, and real-time monitoring capability. For instance, MRI and DCE-MRI are widely accessible and non-invasive but are constrained by temporal resolution, making them less suited for capturing rapid vascular events. In contrast, multiphoton microscopy offers cellular-level detail yet is limited to animal studies due to its invasive nature. This juxtaposition enables researchers and clinicians to critically assess the trade-offs and motivates the pursuit of spatiotemporal video-based alternatives, which offer a balanced profile of temporal precision and interpretability across different research and clinical settings.
2.2 Neuroinflammation and barrier dynamics
Neuroinflammation plays a pivotal role in the pathogenesis of various central nervous system (CNS) disorders, ranging from autoimmune diseases to neurodegenerative conditions. The blood-brain barrier acts as both a target and a modulator of inflammatory responses, undergoing functional and structural changes that permit peripheral immune cell infiltration and exacerbate tissue damage Selva et al. (2022). Dissecting the spatiotemporal relationship between inflammation and BBB integrity has thus become a central aim in neuroimmunology research. Mechanistic studies have highlighted how pro-inflammatory cytokines such as TNF-
2.3 Computational tools for video analysis
The analysis of spatiotemporal video data from BBB imaging presents significant computational challenges due to the high dimensionality, complexity, and variability of biological signals. Recent advances in computer vision, machine learning, and bioimage informatics are addressing these obstacles by providing automated, scalable, and reproducible workflows for video data processing Awad et al. (2021). Motion correction algorithms are critical for compensating for physiological movement, especially in in vivo imaging of awake animals. Registration techniques align sequential frames to ensure continuity and coherence in spatiotemporal datasets Noetel et al. (2020). Segmentation models, often based on deep convolutional neural networks (CNNs), enable the identification and tracking of microvascular structures, immune cells, and regions of leakage with high precision. Temporal analysis benefits from recurrent neural networks (RNNs) and attention mechanisms that model dynamic patterns and detect anomalies over time Yuanta (2020). These models can differentiate between physiological fluctuations and pathological events, providing a robust framework for detecting subtle changes in barrier integrity Aloraini et al. (2021). Unsupervised learning techniques such as clustering and dimensionality reduction assist in pattern discovery and hypothesis generation from complex datasets Galea (2021). Software platforms such as Fiji, Imaris, and custom Python/MATLAB pipelines offer modular tools for preprocessing, visualization, and quantitative analysis Nandwani and Verma (2021). Integration with graph-based approaches facilitates the study of spatial relationships and connectivity changes within the vascular network. Moreover, real-time video analytics enable adaptive experimental design, where interventions can be triggered by pre-defined imaging biomarkers Austvold et al. (2024). The convergence of imaging and computational science is essential for extracting meaningful biological information from spatiotemporal videos. Future directions include the deployment of cloud-based pipelines, federated learning across institutions, and standardized data formats to foster reproducibility and data sharing. These tools will empower researchers to harness the full potential of video-based BBB studies in neuroinflammatory contexts Hadad et al. (2023).
3 Methods
3.1 Overview
The emerging field of video biomarkers presents a promising avenue for quantifying dynamic physiological and behavioral traits through the analysis of temporally evolving video sequences. In this section, we present an overview of the methodology adopted in this study to extract and model these video-derived biomarkers. Our approach integrates foundational formulations of the problem, a novel modeling framework, and a carefully designed computational strategy for domain adaptation and interpretability enhancement.
Unlike traditional biomarkers that often depend on static or manually extracted signals, video biomarkers encapsulate temporally-dependent information patterns, often reflecting subtle but informative variations in motion, appearance, and interaction dynamics. These variations may correspond to underlying biological or pathological states and are crucial in domains such as medical diagnostics, cognitive assessment, and behavioral monitoring. These trajectories are then modeled and interpreted using domain knowledge to inform clinical or functional conclusions. To achieve this, the methodology is structured into three conceptual layers, each corresponding to a subsection in the method. The first layer, detailed in Section 3.2, formalizes the video biomarker extraction problem. We introduce mathematical notations and assumptions to frame the biomarker as a temporally evolving latent variable, modulated by observable visual evidence. The section includes temporal modeling primitives, probabilistic assumptions about the data generation process, and the expected functional properties of valid biomarkers. This formalism sets the foundation for subsequent modeling. The second layer, presented in Section 3.3, introduces our novel deep modeling architecture, which we term BioVidNet. This model is designed to capture domain-relevant spatiotemporal patterns in video, while remaining lightweight and generalizable across subjects and video acquisition setups. Rather than relying solely on standard 3D convolutional backbones or Transformer-style temporal encoders, BioVidNet introduces a hybrid hierarchical attention mechanism. This mechanism enables dynamic focusing on video substructures that align with known physiological phenomena. The final layer, described in Section 3.4, presents the strategic enhancements developed to further contextualize, interpret, and adapt the learned biomarkers. This layer introduces what we call the Context-Aware Biomarker Refinement Strategy (CABRiS), which allows the model to incorporate domain-specific prior knowledge and contextual conditions during both training and inference. By regularizing biomarker representation trajectories and incorporating auxiliary estimation pathways, CABRiS facilitates robust domain transfer and better interpretability—two properties essential for real-world applicability. These three methodological components build a cohesive and technically principled approach to video biomarker extraction. The system is designed to be end-to-end trainable, flexible to different target conditions, and readily integrable into practical diagnostic or monitoring workflows.
3.2 Preliminaries
This section outlines the formal definition of the problem along with the mathematical framework used for extracting video-based biomarkers. We begin by modeling a video as a temporal sequence of observations and define the biomarker as a structured latent variable. The goal of this subsection is to clarify how dynamic visual information is abstracted into biomarker representations that can be analyzed, compared, and interpreted across individuals or conditions.
Let a video sequence be denoted as
We define a temporal biomarker trajectory as Formula 1
where
capturing the assumption that temporal evolution of biomarkers depends only on the immediate past.
The observational model maps biomarker states to visible frames via (Formula 3):
where
We define the likelihood of the video given the biomarker trajectory as Formula 4:
In practical scenarios, the true biomarker trajectory
For modeling purposes, we decompose
where
We further introduce a discriminative task-specific function
To ensure physiological plausibility, we define a regularized space of biomarker trajectories by imposing smoothness and temporal coherence constraints (Formula 8):
where the first term enforces velocity regularization and the second penalizes abrupt accelerations;
Moreover, we introduce a temporal alignment function
where
We also consider a probabilistic generative model to marginalize over latent alignments (Formula 10):
In order to facilitate computational inference, we model the posterior
where
To account for cross-modal supervision, we assume access to auxiliary signals (Formula 12)
where
3.3 Biomarker-oriented video representation model (BioVidNet)
To extract temporally structured and physiologically meaningful biomarkers from raw neuroimaging videos, we introduce BioVidNet, a deep learning architecture that models multi-scale spatiotemporal dynamics. The model is built to address core challenges in video-based biomarker inference, including motion representation, domain variability, and trajectory continuity. We highlight three key innovations that differentiate BioVidNet in terms of biomarker structure, context integration, and temporal coherence (As shown in Figure 1).
Figure 1. Schematic diagram of the Biomarker-Oriented Video Representation Model (BioVidNet). The BioVidNet architecture is a biomarker-oriented video representation model designed to extract physiologically meaningful spatiotemporal features from neuroimaging videos. The network leverages a Factorized Latent Space (FLS) to disentangle motion- and rhythm-driven dynamics, a Context-Modulated Attention (CMA) mechanism to incorporate subject-specific context into temporal modeling, and a Temporal Smoothness Constraint (TSC) module to enforce biologically realistic trajectory continuity. Together, these components enable robust, interpretable, and temporally coherent biomarker inference.
3.3.1 Factorized Latent Space
A central innovation of BioVidNet lies in its explicit factorization of the biomarker latent space, designed to disentangle motion-driven and rhythm-driven dynamics within the video sequence. This separation aims to reflect distinct physiological mechanisms: transient structural fluctuations such as microvascular pulsation or localized leakage are encoded into motion-sensitive components, while recurrent temporal dynamics, such as cardiac or respiratory oscillations, are captured in periodic components. Let
To preserve the orthogonality of latent semantics and reduce representational redundancy, we introduce a disentanglement regularization term
To ensure that
On the other hand,
Here,
3.3.2 Context-modulated attention
To enhance the flexibility and contextual awareness of attention mechanisms, we introduce a context-modulated attention framework that integrates auxiliary information, such as subject-specific attributes or acquisition parameters, into the attention computation. Traditional attention mechanisms compute relevance solely based on token representations, potentially ignoring valuable domain priors (As shown in Figure 2). Our model addresses this limitation by conditioning attention weights on a domain-specific context vector
Figure 2. Schematic diagram of the Context-Modulated Attention. The context-modulated attention framework integrates multi-scale features and conditions attention weights on a learned context vector derived from auxiliary metadata. The context modulates query features before computing attention, enabling adaptive fusion of low, mid, and high-level features through gated weighting. The fused representation is refined using point-wise convolution, batch normalization, and activation. This design improves the model’s ability to incorporate domain-specific information for more accurate and interpretable outputs.
The context-aware representation of token
To account for heterogeneous contexts across different domains or acquisition settings, we introduce a context encoder
Moreover, to enhance the expressiveness of the conditioning, we implement a residual adaptation mechanism that refines the query transformation via an additional learned residual mapping
This enriched architecture allows the attention mechanism to incorporate structured domain knowledge, improving generalization and interpretability in personalized and context-sensitive learning tasks.
3.3.3 Temporal Smoothness Constraint
To encourage biologically realistic temporal dynamics in longitudinal biomarker modeling, we incorporate a temporal smoothness regularization term that penalizes abrupt transitions and accelerations in the learned latent trajectories. This smoothness is achieved by minimizing both first- and second-order differences in the latent variables across consecutive time points, ensuring that the biomarker evolution remains gradual and physiologically interpretable. The first component of the regularization penalizes the squared
Beyond these standard regularization components, we further introduce a third-order derivative term to discourage jerk, i.e., the rate of change of acceleration, which captures higher-order irregularities that are particularly sensitive to model overfitting or noise. This constraint can be mathematically formulated as Formula 22:
Moreover, to incorporate temporal alignment and prevent irregular time intervals from skewing the smoothness penalty, we normalize the above derivatives by the temporal spacing
Similarly, the time-normalized acceleration penalty is reformulated to reflect changes in curvature over non-uniform intervals (Formula 24), expressed as:
These components collectively enhance the model’s ability to learn trajectories that vary smoothly in time, preserving essential temporal patterns while suppressing high-frequency artifacts.
3.3.4 Noise tolerance and comparison with existing methods
To quantify the noise resilience of BioVidNet, we conducted a comparative perturbation analysis in which synthetic Gaussian noise, temporal jitter, and intensity drifts were introduced into raw video sequences from the OASIS-3 and MSSEG datasets. BioVidNet maintained a stable biomarker trajectory reconstruction up to a noise standard deviation
The superior tolerance arises from several architectural components. The disentangled latent space—separating motion-driven and rhythmic dynamics—helps suppress cross-contamination of transient artifacts. The temporal smoothness constraint regularizes latent transitions, reducing susceptibility to frame-wise noise spikes. The Context-Modulated Attention (CMA) mechanism dynamically reweights frame importance based on subject-specific priors, attenuating the effect of uninformative or corrupted input tokens. Together, these modules yield robust feature encoding even under moderate levels of acquisition noise, a property highly desirable in clinical neuroimaging where motion artifacts and scanner heterogeneity are common.
Importantly, BioVidNet does not require explicit denoising pre-processing pipelines, making it suitable for real-time or low-latency diagnostic settings. While the current model performs well up to moderate perturbation levels, future extensions may incorporate uncertainty modeling to better quantify epistemic and aleatoric noise components.
3.4 Context-Aware Biomarker Refinement Strategy (CABRiS)
While BioVidNet provides a robust backbone for extracting temporal biomarkers, its performance and generalizability are substantially enhanced through the integration of our proposed Context-Aware Biomarker Refinement Strategy (CABRiS). This strategy enables adaptive adjustment of the latent biomarker trajectory under varying video quality, subject variability, and domain conditions by embedding contextual, structural, and relational priors (As shown in Figure 3).
Figure 3. Schematic diagram of the Context-Aware Biomarker Refinement Strategy (CABRiS). The CABRiS comprises Domain-Aware Gating, Temporal Warping Alignment, and Confidence-Guided Fusion. This modular architecture adaptively integrates temporal, contextual, and structural priors to refine biomarker trajectories. The Domain-Aware Gating aligns past and current BEV features using context-modulated interpolation, Temporal Warping Alignment synchronizes biomarker sequences via differentiable spline-based time warping, and Confidence-Guided Fusion dynamically balances personalized and population-level features through confidence-weighted embedding fusion.
3.4.1 Domain-aware gating
where
where
with
encouraging balanced gate activations across the dataset. This enriched gating framework improves the robustness of biomarker representations in heterogeneous real-world environments by softly interpolating between subject-specific features and context-invariant prototypes through an interpretable, data-driven mechanism.
3.4.2 Temporal warping alignment
where
where
which discourages sharp warping fluctuations. We introduce a calibration term to preserve local temporal structures by minimizing the discrepancy between adjacent warped steps (Formula 32):
Figure 4. Schematic diagram of the Temporal Warping Alignment. The Temporal Warping Alignment module integrates reference and dynamic biomarker features
ensuring that the warped sequence retains a smooth temporal gradient. These components collectively allow the model to compensate for varying progression rates or temporal shifts across individuals, facilitating population-level alignment of biomarker dynamics through a differentiable and interpretable transformation.
3.4.3 Confidence-guided fusion
where
enabling a learned reparameterization of global features. To further increase the flexibility of confidence estimation, an attention-based context encoding can be introduced prior to
providing a more expressive route to estimate reliability by factoring relational signals. To jointly optimize these fused representations and confidence values, the final objective can integrate a confidence-aware reconstruction term as follows Formula 36:
where
4 Experimental setup
4.1 Dataset
The OASIS-3 dataset Zhao et al. (2024) is a longitudinal neuroimaging resource that includes MRI and PET scans, cognitive assessments, and clinical data from over a thousand participants ranging from healthy aging individuals to those with mild cognitive impairment and Alzheimer’s disease. The data are collected across multiple sessions, allowing researchers to study disease progression over time. With its rich multimodal structure, OASIS-3 supports investigations into aging-related changes, structural brain alterations, and neurodegenerative processes. The dataset emphasizes reproducibility and generalizability by maintaining standardized imaging protocols and providing extensive demographic and clinical metadata. This makes it a valuable asset for developing and validating biomarkers in longitudinal brain health studies, especially for early detection and tracking of Alzheimer’s-related pathology. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset Im et al. (2024) is one of the most influential and widely used collections in neurodegenerative research. It includes longitudinal MRI, PET, genetic, and clinical data from individuals categorized as cognitively normal, having mild cognitive impairment, or diagnosed with Alzheimer’s disease. ADNI was designed to assess biomarkers that could track the onset and progression of dementia, providing a foundation for therapeutic development and diagnostic innovation. The standardized acquisition protocols and comprehensive follow-ups enhance its utility for machine learning applications and disease modeling. Researchers frequently use ADNI to test hypotheses about structural brain changes, metabolic activity, and cognitive decline across stages of neurodegeneration. The Ischemic Stroke Lesion Segmentation (ISLES) dataset Otálora et al. (2022) is focused on supporting the development and evaluation of automated tools for stroke lesion segmentation using MRI. It comprises multiparametric MR images including diffusion-weighted imaging and perfusion maps, which are critical for identifying ischemic core and penumbra regions. The dataset includes manual lesion annotations from clinical experts, enabling robust training and benchmarking of segmentation algorithms. ISLES is commonly used in challenges that aim to push forward the state of the art in acute stroke analysis and treatment planning. By offering well-annotated, multimodal data from real clinical scenarios, ISLES contributes significantly to precision medicine approaches in cerebrovascular disorders. The MSSEG (Multiple Sclerosis Lesion Segmentation) dataset Wiltgen et al. (2024) provides a curated benchmark for evaluating lesion segmentation techniques in patients with multiple sclerosis. It includes 3D FLAIR MRI scans acquired from different clinical sites, reflecting real-world imaging variability. The lesions have been annotated by multiple human experts, allowing consensus ground truth generation for rigorous algorithm validation. MSSEG emphasizes robustness and cross-domain performance, making it ideal for developing generalizable deep learning models. Its design also encourages methodological transparency by supporting reproducibility challenges. Researchers use MSSEG to assess automated segmentation systems’ ability to handle small, irregular, and heterogeneous lesion patterns typical in MS, advancing clinical support tools for diagnosis and monitoring.
To ensure data consistency and reduce inter-subject variability, we applied a structured preprocessing pipeline to all input neuroimaging videos prior to model training. This pipeline includes four major steps: intensity normalization to standardize voxel-wise distributions across acquisitions, temporal denoising using a Gaussian kernel to suppress physiological jitter and scanner noise, spatial resizing to a uniform resolution of 128
4.2 Experimental details
To ensure consistency, minimize inter-subject variability, and improve the signal quality of spatiotemporal video data, we designed a structured preprocessing workflow. Each frame was first normalized to have zero mean and unit variance per channel across time, reducing intensity fluctuations caused by the scanner. A Gaussian filter with a kernel size of three and a standard deviation of 1.2 was then applied along the temporal axis to suppress physiological jitter and acquisition noise while preserving dynamic vascular events. All frames were subsequently resized to 128
To ensure robust and generalizable performance, we adopted a principled grid search procedure to select the optimal set of hyperparameters for model training. This process was conducted on a held-out validation subset derived from each dataset. We first defined candidate ranges for key parameters based on established practices in deep video modeling and prior work in biomedical time-series analysis. The learning rate was swept over the set 1e-4, 5e-5, 2e-5, 1e-5, and the batch size was evaluated over 16, 32, 64, constrained by GPU memory availability. Dropout rates were selected from 0.1, 0.2, 0.3 to balance overfitting and representation robustness. For optimizer configuration, we applied the AdamW variant with a weight decay of 0.01, which was found to stabilize training dynamics. Early stopping was based on the highest F1 score over five seeds to mitigate noise from stochastic initialization. All experiments were repeated five times to report mean and standard deviation for each metric. This tuning strategy was applied consistently across all datasets and architectures to ensure fairness. The final hyperparameters used for the main model were: learning rate = 2e-5, batch size = 32, dropout = 0.1, and warm-up ratio = 0.1. The effectiveness of this configuration was validated through stable convergence curves, reproducible performance, and superior results over baseline methods. This explicit search-based selection procedure provides transparency and ensures that our model is tuned not by trial-and-error, but by reproducible optimization. All hyperparameters were selected based on a grid search over the development set, aligned with configurations used in previous SOTA methods in NER literature such as LUKE and SpanBERT.
The entirety of our experimental pipeline was implemented within the PyTorch deep learning framework. We implemented our models based on the HuggingFace Transformers library to leverage state-of-the-art pre-trained language models. The hardware setup included a single NVIDIA A100 GPU with 40 GB memory, and all training was performed under Ubuntu 22.04 with CUDA 11.7. We used mixed-precision training (FP16) to speed up convergence and reduce GPU memory usage. For all datasets, we adopted the BIO tagging scheme. Input sequences were tokenized using the BERT WordPiece tokenizer and truncated or padded to a maximum sequence length of 128 tokens. Models were fine-tuned using the AdamW optimizer with weight decay of 0.01. A linear learning rate scheduler with warm-up was applied, with the warm-up ratio set to 0.1 and the initial learning rate set to
4.3 Comparison to contemporary leading methods
In order to comprehensively evaluate the effectiveness of our proposed method, we compare it against several models on four widely used named entity recognition (NER) benchmarks: OASIS-3, ADNI, ISLES, and MSSEG. The detailed results are reported in Table 3, 4, respectively. Across all datasets and metrics, our approach consistently outperforms all baseline models.
Table 3. Benchmarking our approach against SOTA methods using OASIS-3 and ADNI for video-based analysis.
Table 4. Evaluation of our model versus leading techniques using ISLES and MSSEG datasets in video analysis.
On the OASIS-3 dataset, our model achieves an F1 Score of 91.79, which surpasses the best-performing baseline (I3D) by a significant margin of 2.79 points. Similarly, on the ADNI dataset, we attain the highest AUC of 94.05 and F1 Score of 91.20. These improvements are not only statistically significant but also consistent, as shown by the low standard deviation across multiple runs. For ISLES and MSSEG, which are more challenging due to domain noise and rare entities, our model still achieves the best performance, indicating strong robustness. Notably, on ISLES, our approach obtains an F1 Score of 87.21, outperforming the next best (I3D) by 2.55 points. Such results demonstrate that our model can generalize effectively even in low-resource and noisy-text scenarios, a challenge where many traditional SOTA methods often struggle. Our performance advantage can be attributed to several key design choices. Our framework integrates modality-aware representation fusion, which allows us to extract complementary features from textual and visual signals jointly. While existing models such as CLIP and BLIP also consider multi-modal learning, they rely heavily on large-scale pretraining and often lack task-specific adaptation. In contrast, we introduce a cross-attentive token alignment mechanism which dynamically adjusts feature interactions between modalities based on token relevance. This fine-grained control enables the model to focus on informative cues and discard irrelevant noise, particularly beneficial for datasets like ISLES where token quality varies greatly. Our method employs a context-aware feature recalibration module that adaptively reweights semantic components based on contextual salience, enhancing precision in boundary detection. Unlike ViT and I3D, which treat video and text separately before fusion, our architecture aligns both streams at intermediate layers, promoting deeper semantic coherence. The result is improved Recall and AUC across all datasets, reflecting better sensitivity and stability. From a training perspective, our use of adapter modules facilitates efficient fine-tuning without overfitting, leveraging the full capacity of pre-trained transformers while adding minimal parameters. This is especially effective on domain-diverse corpora like ADNI, where domain-specific generalization is critical.
To better understand the impact of our architectural innovations, we analyze failure cases in baseline methods and compare them with ours. Methods such as Wav2Vec 2.0 and BLIP demonstrate competitive performance on specific datasets but lack consistency across domains. This is particularly evident in the MSSEG dataset, where BLIP drops in both Accuracy and F1 Score due to limited temporal contextual modeling. Our model, however, leverages a hybrid sequence-module fusion strategy, which incorporates both global token sequence and temporal patterns, mitigating such pitfalls. Methods like T5 and ViT show weaknesses in entity boundary recognition, especially when entities appear in complex nested structures. Our model’s use of hierarchical span encoding helps resolve ambiguities by modeling entity span dependencies explicitly, leading to more precise entity segmentation. The cumulative advantage across tasks and domains, demonstrates that our model is not only performant but also versatile. It balances between precision and generalization, a key requirement for real-world NER applications where textual content is often multimodal, dynamic, and noisy. We conclude that the superior performance of our model arises from its ability to align modalities, recalibrate features, and adapt efficiently to domain variations, significantly outperforming current SOTA approaches.
4.4 Ablation study
To further validate the contribution of each core component in our proposed framework, we conduct a thorough ablation study across all four benchmark datasets: OASIS-3, ADNI, ISLES, and MSSEG. The ablation settings are denoted as follows: Factorized Latent Space, Domain-Aware Gating, Confidence-Guided Fusion. The full results are shown in Tables 5, 6. Compared to the full model, all three ablated variants show consistent performance degradation across evaluation metrics.
On OASIS-3, the removal of the Factorized Latent Space leads to a 2.77-point drop in F1 Score, indicating the critical role of fine-grained feature fusion across modalities. Similarly, excluding Domain-Aware Gating significantly affects performance on ADNI, reducing both Recall and AUC, which confirms its importance in domain-adaptive token weighting. The Factorized Latent Space proves to be particularly effective for OASIS-3 and ISLES datasets, where entity boundaries are ambiguous and require strong contextual linkage between modalities. Without this component, the model struggles to integrate multimodal signals, leading to degraded precision in sequence labeling. The Domain-Aware Gating, on the other hand, shows the most substantial impact on ADNI and MSSEG, datasets characterized by multi-domain and hierarchical entity structures. The ability to dynamically reweight context tokens allows the model to adjust to genre-specific language patterns, thus improving Recall and reducing over-segmentation. The Confidence-Guided Fusion plays an important role in preserving nested and overlapping entity representations. Removing this module causes instability in F1 scores, especially in ISLES where emergent entities often span multiple tokens irregularly. These observations reinforce the hypothesis that each component addresses a distinct challenge in the NER task and contributes synergistically to the final performance.
We highlight that our full model not only outperforms each ablated version but also demonstrates significantly lower variance across multiple datasets, indicating its robustness and generalization. The architecture’s modular design allows efficient specialization through each subcomponent: Factorized Latent Space, Domain-Aware Gating, Confidence-Guided Fusion. Incorporating all modules yields the best overall performance, demonstrating that each component is critical to the development of a robust and generalizable NER system. These results validate our design decisions and emphasize that performance gains are not attributed to isolated innovations but rather to their coherent integration.
To further validate the performance of our proposed method we conducted a comprehensive comparison against five conventional models including traditional statistical methods and commonly used deep learning architectures in the field of neuroimaging-based biomarker detection. The models involved in this comparison are Static Feature with SVM, DCE-MRI Thresholding, SpatioStat-Net, I3D and Vision Transformer. Table 7 presents the results of this evaluation based on four commonly used metrics which are Accuracy, Recall, F1 Score and AUC. The results clearly demonstrate that our method VidNet combined with CABRiS achieves the best performance across all metrics. On the OASIS-3 and ADNI datasets our model obtains an Accuracy of 92.68 a Recall of 91.30 an F1 Score of 91.79 and an AUC of 93.52. These values represent consistent and significant improvements over all baselines. Compared to the strongest baseline I3D which reaches an F1 Score of 87.81 our method delivers an increase of nearly 4 percentage points and improves the AUC by more than 3.4 points. The enhancement is even more pronounced when compared to traditional approaches such as DCE-MRI Thresholding or Static Feature with SVM both of which fall short in capturing dynamic temporal changes and often rely on manually crafted thresholds or static features. Our method benefits from its structured latent trajectory modeling and context-aware refinement strategy allowing it to identify subtle vascular fluctuations and align biomarker patterns across individuals. This comparison not only reinforces the robustness of our proposed framework but also illustrates its superior interpretability and adaptability in real-world clinical scenarios where spatiotemporal resolution and personalization are critical.
To further validate the interpretability of our framework, we introduce a visual comparison in Figure 5 showing representative biomarker trajectories extracted by our BioVidNet + CABRiS model, I3D, ViT, and the conventional SpatioStat-Net. Our model clearly delineates evolving regions of abnormal BBB permeability with higher spatiotemporal granularity (Table 8). In contrast, I3D and ViT exhibit spatial artifacts or temporal lag due to limited domain adaptation. Conventional approaches, including DCE-MRI Patra et al. (2021b) thresholding and static feature + SVM Zhao et al. (2011b), fail to localize transient disruption events, underscoring the limitation of non-temporal or handcrafted metrics in neurovascular monitoring. Our framework not only captures transient signal dynamics but also aligns with expert annotations and physiological evidence, making it well-suited for real-time biomarker interpretation in neuroinflammatory contexts.
Figure 5. Visual comparison of biomarker trajectory outputs from our BioVidNet + CABRiS model and three baseline methods (I3D, ViT, SpatioStat-Net). The top row presents ground-truth annotations of BBB disruption regions over three sequential time points. Our model shows higher spatial precision, temporal continuity, and better alignment with physiological priors compared to the baselines.
5 Conclusions and future work
This study presents a novel approach that shifts the paradigm from static or snapshot-based BBB analysis to dynamic, individualized modeling via video-derived biomarkers. The introduction of BioVidNet and CABRiS allows for decomposing temporal physiology into clinically interpretable trajectories, a capability absent in prior work. Unlike traditional models that either lack temporal resolution or interpretability, our system explicitly encodes dynamic vascular-inflammation interactions through a hybrid learning mechanism. These contributions collectively constitute a significant advance in real-time neuromonitoring.
In this study, we sought to address the challenge of monitoring blood-brain barrier (BBB) disruption in neuroinflammatory disorders, where capturing subtle, dynamic vascular events is crucial. Traditional methods such as contrast-enhanced MRI and CSF analysis, while clinically useful, often fail to provide the temporal granularity or adaptability needed for personalized neuromonitoring.
Traditional neuroimaging techniques such as contrast-enhanced MRI and CSF analysis, although widely used in clinical contexts, inherently lack the temporal granularity required to track transient microvascular events and evolving patterns of BBB disruption. MRI, despite its high spatial fidelity, typically captures static snapshots with acquisition intervals spanning minutes to hours, making it inadequate for detecting dynamic changes in barrier permeability Seuren et al. (2020). Furthermore, CSF analysis is invasive, often limited to a few time points, and fails to reflect the continuous evolution of neuroinflammatory states. According to Wang et al. (2021b), transient leakage events that precede or accompany neurological symptoms are frequently missed due to these time constraints. Buch et al. (2022b) also emphasize that the limited adaptability of such tools restricts their utility in personalized neuromonitoring frameworks, where subject-specific variability in barrier dynamics demands temporally dense and context-aware evaluation. These shortcomings collectively underscore the need for an approach that leverages real-time video-based biomarkers, as proposed in our method, to address gaps in resolution, adaptability, and individualization.
To overcome these shortcomings, we developed a spatiotemporal video biomarker framework centered around a novel deep video model, VidNet, and an interpretability-focused refinement strategy, CABRiS. VidNet utilizes a hierarchical attention mechanism to extract latent biomarkers from neuroimaging videos, capturing transient signal dynamics indicative of BBB compromise. CABRiS enhances model robustness by incorporating contextual priors and ensuring personalized normalization across subjects. Our approach outperforms conventional methods on benchmark datasets, achieving strong concordance with expert annotations and physiological metrics, paving the way for individualized, real-time assessments of BBB integrity.
Moreover, we compare our model against conventional approaches such as static feature + SVM classification, DCE-MRI thresholding, and an early CNN-based model (SpatioStat-Net). As summarized in Table 7, our method consistently outperforms these baselines across Accuracy, AUC, Recall, and F1 Score on both OASIS-3 and ADNI datasets. These results empirically support the utility of our framework over rule-based or handcrafted-feature methods.
Despite promising outcomes, our framework has two primary limitations. While CABRiS significantly improves domain adaptation, its reliance on contextual priors introduces dependency on accurate metadata and well-curated patient information. In less-controlled clinical settings, this could limit its generalizability. While the model effectively captures transient, its resolution and specificity could benefit from integration with multimodal data, allowing for a more holistic picture of neurovascular health. Future work will aim to expand the framework’s applicability to other central nervous system pathologies, explore cross-modal learning, and further enhance model transparency. These advancements would strengthen its potential as a cornerstone tool in precision neurology and real-time neuroinflammatory monitoring.
The principal novelty of our work lies in jointly modeling the dynamic vascular-inflammation interplay using a biomarker-centric video framework and refining it through domain-aware personalization. Compared to prior work, our model advances the state of the art by enabling fine-grained trajectory modeling, cross-subject alignment, and confidence-based fusion, all of which contribute to both scientific insight and translational potential in clinical neurology.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Author contributions
YX: Conceptualization, Methodology, software, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing. ZZ: Data curation, Formal analysis, Funding acquisition, Conceptualization, Investigation, Software, Writing – original draft, Writing – review and editing. KF: Writing – original draft, Writing – review and editing, visualization, supervision, funding acquisition.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
The author would like to thank the Department of Biomedical Engineering at Jiangxi Normal University for their technical support and the provision of computational infrastructure. Special appreciation goes to the research staff and collaborators involved in curating and maintaining the OASIS-3, ADNI, ISLES, and MSSEG datasets, which were essential to the model development and benchmarking in this work. The author is also grateful to reviewers for their insightful feedback, which helped improve the clarity and scientific rigor of this manuscript.
Conflict of interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Alahmari A. (2021). Blood-brain barrier overview: structural and functional correlation. Neural plast. 2021, 6564585. doi:10.1155/2021/6564585
Aloraini M., Sharifzadeh M., Schonfeld D. (2021). Sequential and patch analyses for object removal video forgery detection and localization. IEEE Trans. circuits Syst. video Technol. (Print) 31, 917–930. doi:10.1109/tcsvt.2020.2993004
Apostolidis E., Adamantidou E., Metsai A. I., Mezaris V., Patras I. (2021). Video summarization using deep neural networks: a survey. Proc. IEEE 109, 1838–1863. doi:10.1109/jproc.2021.3117472
Austvold C. K., Keable S. M., Procopio M., Usselman R. J. (2024). Quantitative measurements of reactive oxygen species partitioning in electron transfer flavoenzyme magnetic field sensing. Front. Physiology 15, 1348395. doi:10.3389/fphys.2024.1348395
Awad G., Butt A., Curtis K., Fiscus J. G., Godil A., Lee Y., et al. (2021). Trecvid 2020: a comprehensive campaign for evaluating video retrieval tasks across multiple application domains. TREC Video Retr. Eval. Available online at: https://arxiv.org/abs/2104.13473.
Beaudoin M. E. (2023). Translating research on cognitive enhancement and brain plasticity into action: military applications.
Beaudoin M. E., Schmorrow D. D. (2011). “Operational neuroscience: neuroscience research and tool development to support the warfighter,” in Foundations of augmented Cognition. Directing the Future of Adaptive Systems: 6th International Conference, FAC 2011, held as Part of HCI International 2011, Orlando, FL, USA, July 9-14, 2011. Proceedings 6 (Springer), 573–577.
Beaudoin M. E., Jones K. M., Jerome B., Martinez D., George T., Pandža N. B. (2024). Systematic research is needed on the potential effects of lifelong technology experience on cognition: a mini-review and recommendations. Front. Psychol. 15, 1335864. doi:10.3389/fpsyg.2024.1335864
Ben X., Ren Y., Zhang J., Wang S.-J., Kpalma K., Meng W., et al. (2021). Video-based facial micro-expression analysis: a survey of datasets, features and algorithms. IEEE Trans. Pattern Analysis Mach. Intell. 44, 5826–5846. doi:10.1109/TPAMI.2021.3067464
Buch S., Eyzaguirre C., Gaidon A., Wu J., Fei-Fei L., Niebles J. C. (2022a). Revisiting the “video” in video-language understanding. Computer Vision and Pattern Recognition. Available online at: https://openaccess.thecvf.com/content/CVPR2022/html/Buch_Revisiting_the_Video_in_Video-Language_Understanding_CVPR_2022_paper.html.
Buch S., Eyzaguirre C., Gaidon A., Wu J., Fei-Fei L., Niebles J. C. (2022b). “Revisiting the “video” in video-language understanding,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2917–2927. doi:10.1109/CVPR52688.2022.00293
Carapeto A. P., Marcuello C., Faísca P. F., Rodrigues M. S. (2024). Morphological and biophysical study of s100a9 protein fibrils by atomic force microscopy imaging and nanomechanical analysis. Biomolecules 14, 1091. doi:10.3390/biom14091091
Cuevas C., Quilón D., García N. (2020). Techniques and applications for soccer video analysis: a survey. Multimedia tools Appl. 79, 29685–29721. doi:10.1007/s11042-020-09409-0
Eren-Koçak E., Dalkara T. (2021). Ion channel dysfunction and neuroinflammation in migraine and depression. Front. Pharmacol. 12, 777607. doi:10.3389/fphar.2021.77767
Galea I. (2021). The blood–brain barrier in systemic infection and inflammation. Cell. and Mol. Immunol. 18, 2489–2501. doi:10.1038/s41423-021-00757-x
Hadad S., Rangwala S. D., Stout J. N., Mut F., Orbach D. B., Cebral J. R., et al. (2023). Understanding development of jugular bulb stenosis in vein of galen malformations: identifying metrics of complex flow dynamics in the cerebral venous vasculature of infants. Front. Physiology 14, 1113034. doi:10.3389/fphys.2023.1113034
Hendricks S., Till K., den Hollander S., Savage T., Roberts S., Tierney G. J., et al. (2020). Consensus on a video analysis framework of descriptors and definitions by the rugby union video analysis consensus group. Br. J. Sports Med. 54, 566–572. doi:10.1136/bjsports-2019-101293
Im C., Song C.-B., Lee J., Kim D., Seo H., Initiative A. D. N., et al. (2024). Investigating the effect of brain atrophy on transcranial direct current stimulation: a computational study using adni dataset. Comput. Methods Programs Biomed. 257, 108429. doi:10.1016/j.cmpb.2024.108429
Kitaguchi D., Takeshita N., Matsuzaki H., Igaki T., Hasegawa H., Ito M. (2021). Development and validation of a 3-dimensional convolutional neural network for automatic surgical skill assessment based on spatiotemporal video analysis. JAMA Netw. Open 4, e2120786. doi:10.1001/jamanetworkopen.2021.20786
Knox E. G., Aburto M. R., Clarke G., Cryan J. F., O’Driscoll C. M. (2022). The blood-brain barrier in aging and neurodegeneration. Mol. Psychiatry 27, 2659–2673. doi:10.1038/s41380-022-01511-z
Kong L., Wu P., Zhang X., Meng L., Kong L., Zhang Q., et al. (2023). Effects of mental fatigue on biomechanical characteristics of lower extremities in patients with functional ankle instability during unanticipated side-step cutting. Front. Physiology 14, 1123201. doi:10.3389/fphys.2023.1123201
Kunešová M., Zajíc Z., Šmídl L., Karafiát M. (2024). Comparison of wav2vec 2.0 models on three speech processing tasks. Int. J. Speech Technol. 27, 847–859. doi:10.1007/s10772-024-10140-6
Lan M., Chen C., Ke Y., Wang X., Feng L., Zhang W. (2024). “Proxyclip: proxy attention improves clip for open-vocabulary segmentation,” in European conference on computer vision (Springer), 70–88.
Lin W., He X., Dai W., See J., Shinde T., Xiong H., et al. (2020). Key-point sequence lossless compression for intelligent video analysis. IEEE Multimed. 27, 12–22. doi:10.1109/mmul.2020.2990863
Liu W., Kang G., Huang P.-Y. B., Chang X., Yu L., Qian Y., et al. (2020) “Argus: efficient activity detection system for extended video analysis,” in 2020 IEEE Winter applications of computer vision workshops (WACVW).
Mercat A., Viitanen M., Vanne J. (2020) “Uvg dataset: 50/120fps 4k sequences for video codec analysis and development,” in ACM SIGMM conference on multimedia systems.
Mulla N., Gharpure P. (2023). Leveraging well-formedness and cognitive level classifiers for automatic question generation on java technical passages using t5 transformer. Int. J. Inf. Technol. 15, 1961–1973. doi:10.1007/s41870-023-01262-2
Nandwani P., Verma R. (2021). A review on sentiment analysis and emotion detection from text. Soc. Netw. Analysis Min. 11, 81. doi:10.1007/s13278-021-00776-6
Neimark D., Bar O., Zohar M., Asselmann D. (2021) “Video transformer network,” in 2021 IEEE/CVF international conference on computer vision workshops (ICCVW).
Noetel M., Griffith S., Delaney O., Sanders T., Parker P., del Pozo Cruz B., et al. (2020). Video improves learning in higher education: a systematic review. Rev. Educ. Res. 91, 204–236. doi:10.3102/0034654321990713
Otálora S., Rafael-Patiño J., Madrona A., Fischi-Gomez E., Ravano V., Kober T., et al. (2022). “Weighting schemes for federated learning in heterogeneous and imbalanced segmentation datasets,” in International MICCAI brainlesion workshop (Springer), 45–56.
Ou Y., Chen Z., Wu F. (2021). Multimodal local-global attention network for affective video content analysis. IEEE Trans. circuits Syst. video Technol. (Print) 31, 1901–1914. doi:10.1109/tcsvt.2020.3014889
Pan X., Shi J., Luo P., Wang X., Tang X. (2018). Spatial as deep: spatial cnn for traffic scene understanding. Proc. AAAI Conf. Artif. Intell. 32. doi:10.1609/aaai.v32i1.12301
Pareek P., Thakkar A. (2020). A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artif. Intell. Rev. 54, 2259–2322. doi:10.1007/s10462-020-09904-8
Patra D. K., Si T., Mondal S., Mukherjee P. (2021a). Breast dce-mri segmentation for lesion detection by multi-level thresholding using student psychological based optimization. Biomed. Signal Process. Control 69, 102925. doi:10.1016/j.bspc.2021.10295
Patra D. K., Si T., Mondal S., Mukherjee P. (2021b). Breast dce-mri segmentation for lesion detection by multi-level thresholding using student psychological based optimization. Biomed. Signal Process. Control 69, 102925. doi:10.1016/j.bspc.2021.102925
Rezai A. R., D’Haese P.-F., Finomore V., Carpenter J., Ranjan M., Wilhelmsen K., et al. (2024). Ultrasound blood–brain barrier opening and aducanumab in alzheimer’s disease. N. Engl. J. Med. 390, 55–62. doi:10.1056/NEJMoa2308719
Savić T., Brun-Laguna K., Watteyne T. (2023). “Blip: identifying boats in a smart marina environment,” in 2023 19th international conference on distributed computing in smart systems and the internet of things (DCOSS-IoT) (IEEE), 710–714.
Selva J., Johansen A. S., Escalera S., Nasrollahi K., Moeslund T., Clap’es A. (2022). Video transformers: a survey. IEEE Trans. Pattern Analysis Mach. Intell. 45, 12922–12943. doi:10.1109/tpami.2023.3243465
Selvaraj J., Anuradha J. (2022). “Violence detection in video footages using i3d convnet,” in Innovations in computational intelligence and computer vision: proceedings of ICICV 2021 (Springer), 63–75.
Sensi S. L., Granzotto A., Faísca P. F. N., Rodrigues M. S. (2024). Zinc dysregulation in alzheimer’s disease: a dual role in neurotoxicity and neuroprotection. Biomolecules 14, 1091. doi:10.3390/.biom14091091
Seuren L., Wherton J. P., Greenhalgh T., Cameron D., A’Court C., Shaw S. (2020). Physical examinations via video for patients with heart failure: qualitative study using conversation analysis. J. Med. Internet Res. 22, e16694. doi:10.2196/16694
Stappen L., Baird A., Cambria E., Schuller B. (2021). Sentiment analysis and topic recognition in video transcriptions. IEEE Intell. Syst. 36, 88–95. doi:10.1109/mis.2021.3062200
Stenum J., Rossi C., Roemmich R. (2020). Two-dimensional video-based analysis of human gait using pose estimation. bioRxiv. Available online at: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008935.
Sun J., Wu B., Zhao T., Gao L., Xie K., Lin T., et al. (2023). Classification for thyroid nodule using vit with contrastive learning in ultrasound images. Comput. Biol. Med. 152, 106444. doi:10.1016/j.compbiomed.2022.106444
Tang Y., Lu J., Zhou J. (2020). Comprehensive instructional video analysis: the coin dataset and performance evaluation. IEEE Trans. Pattern Analysis Mach. Intell. 43, 3138–3153. doi:10.1109/TPAMI.2020.2980824
Wan S., Xu X., Wang T., Gu Z. (2021). An intelligent video analysis method for abnormal event detection in intelligent transportation systems. IEEE Trans. intelligent Transp. Syst. (Print) 22, 4487–4495. doi:10.1109/tits.2020.3017505
Wang C., Zhang S., Chen Y., Qian Z., Wu J., Xiao M. (2020). Joint configuration adaptation and bandwidth allocation for edge-based real-time video analytics. IEEE Conf. Comput. Commun., 257–266. doi:10.1109/infocom41043.2020.9155524
Wang W., Shen J., Xie J., Cheng M.-M., Ling H., Borji A. (2021a). Revisiting video saliency prediction in the deep learning era. IEEE Trans. Pattern Analysis Mach. Intell. 43, 220–237. doi:10.1109/TPAMI.2019.292447
Wang W., Shen J., Xie J., Cheng M.-M., Ling H., Borji A. (2021b). Revisiting video saliency prediction in the deep learning era. IEEE Trans. Pattern Analysis Mach. Intell. 43, 220–237. doi:10.1109/TPAMI.2019.2924417
Wiltgen T., McGinnis J., Schlaeger S., Kofler F., Voon C., Berthele A., et al. (2024). Lst-ai: a deep learning ensemble for accurate ms lesion segmentation. NeuroImage Clin. 42, 103611. doi:10.1016/j.nicl.2024.103611
Wu D., Chen Q., Chen X., Han F., Chen Z., Wang Y. (2023). The blood–brain barrier: structure, regulation and drug delivery. Signal Transduct. Target. Ther. 8, 217. doi:10.1038/s41392-023-01481-w
Yu Duan L., Liu J., Yang W., Huang T., Gao W. (2020). Video coding for machines: a paradigm of collaborative compression and intelligent analytics. IEEE Trans. Image Process. 29, 8680–8695. doi:10.1109/TIP.2020.3016485
Yuanta F. (2020). Pengembangan media video pembelajaran ilmu pengetahuan sosial pada siswa sekolah dasar. Jurnal Pendidikan Dasar. Available online at: https://journal.uwks.ac.id/index.php/trapsila/article/view/816.
Zamani A., Zou M., Diaz-Montes J., Petri I., Rana O., Anjum A., et al. (2020). Deadline constrained video analysis via in-transit computational environments. IEEE Transactions on Services Computing.
Zhao J., Zhang Z., Han S., Qu C., Yuan Z., Zhang D. (2011a). Svm based forest fire detection using static and dynamic features. Comput. Sci. Inf. Syst. 8, 821–841. doi:10.2298/csis101012030
Zhao J., Zhang Z., Han S., Qu C., Yuan Z., Zhang D. (2011b). Svm based forest fire detection using static and dynamic features. Comput. Sci. Inf. Syst. 8, 821–841. doi:10.2298/csis101012030z
Zhao S., Zhou R., Zhang Y., Chen Y., He L. (2024). “Normative modeling with focal loss and adversarial autoencoders for alzheimer’s disease diagnosis and biomarker identification,” in International workshop on applications of medical AI (Springer), 231–240.
Zhou Y., Su Y., Li B., Zhang H. (2021). Ion channel dysfunction and neuroinflammation in migraine and depression. Front. Pharmacol. 12, 777607. doi:10.3389/fphar.2021.777607
Keywords: blood-brain barrier, neuroinflammatory disorders, video biomarkers, spatiotemporal modeling, deep learning
Citation: Xu Y, Zhang Z and Feng K (2026) Spatiotemporal video of blood-brain barrier disruption in neuroinflammatory disorders. Front. Physiol. 16:1633126. doi: 10.3389/fphys.2025.1633126
Received: 22 May 2025; Accepted: 25 August 2025;
Published: 02 January 2026.
Edited by:
Monique E. Beaudoin, University of Maryland, College Park, United StatesReviewed by:
Carlos Marcuello, Instituto de Nanociencia y Materiales de Aragón (INMA), SpainSuci Aulia, Telkom University, Indonesia
Copyright © 2026 Xu, Zhang and Feng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhiwei Zhang, YmFwc3R0dXRrb2ZjQGhvdG1haWwuY29t
Yukai Xu1