- 1Department of Critical Care Medicine, The Affiliated Hospital of Jiangnan University, Wuxi, Jiangsu, China
- 2Jiangnan University, Wuxi, Jiangsu, China
- 3Department of Critical Care Medicine, Obstetrics and Gynecology, Hospital of Fudan University, Shanghai, China
- 4School of Public Health, Ningbo University, Ningbo, China
Introduction: To address these limitations, this study proposes a novel AI-driven solution for time series prediction in lung disease diagnosis and treatment optimization.
Methods: At the core of our framework lies PulmoNet, an anatomically-constrained, multi-scale neural architecture designed to learn structured, interpretable representations of lung-related pathologies. Unlike generic models, PulmoNet integrates bronchopulmonary anatomical priors and leverages spatial attention mechanisms to focus on critical parenchymal and vascular regions, which are often associated with early pathological changes. It also embeds hierarchical features from CT and X-ray modalities, capturing both macro-level anatomical landmarks and micro-level lesion textures. Furthermore, it constructs a latent inter-lobar graph to model spatial dependencies and anatomical adjacencies, enabling joint segmentation, classification, and feature attribution.
Results: This structured approach enhances both diagnostic performance and interpretability. Complementing this architecture, we introduce APIL (Adaptive Patho-Integrated Learning)—a two-stage, curriculum-based learning strategy that incorporates radiological priors, rule-based constraints, and multi-view consistency to improve model generalization and clinical alignment.
Discussion: APIL dynamically adjusts the learning complexity by introducing prior-informed pseudo-labels, anatomical masks, and contrastive consistency losses across views. It effectively combines weak supervision, domain adaptation, and uncertainty modeling, making it particularly adept at learning from sparse, noisy, or imbalanced datasets commonly found in clinical environments. Ultimately, this integrated framework offers a clinically meaningful, anatomically coherent, and data-efficient solution for next-generation pulmonary disease modeling.
1 Introduction
Time series prediction in lung disease diagnosis and treatment optimization has emerged as a vital area of research, driven by the increasing global burden of respiratory diseases such as chronic obstructive pulmonary disease (COPD), asthma, and pulmonary fibrosis. Early diagnosis and timely intervention significantly improve patient outcomes, yet traditional clinical approaches often struggle to effectively capture the complex and evolving nature of patient data (1). Physiological signals such as respiratory rate, oxygen saturation, and spirometry measurements not only vary over time, but treatment efficacy is also influenced by disease progression, patient-specific responses, and external factors (2). The growing demand for dynamic and personalized healthcare solutions has catalyzed interest in time series analysis to uncover temporal dependencies, predict disease trajectories, and guide optimal treatment strategies (3). Consequently, integrating advanced predictive modeling techniques into clinical decision-making systems is becoming essential for enhancing diagnostic accuracy and improving overall healthcare efficiency (4).
To address the limitations of static diagnostic rules and manually designed clinical pathways, early research in time series prediction for lung disease emphasized structured reasoning frameworks (5). These approaches typically utilized expert-curated ontologies and rule-based inference engines to model disease progression (6). For instance, Bayesian networks and decision trees were employed to encode medical knowledge and infer temporal dynamics in patient conditions (7). While structured models provided transparent reasoning aligned with clinical expertise, they struggled to scale and adapt to individual patient variations (8). Moreover, their reliance on predefined features and static disease models made them vulnerable to the noise and heterogeneity prevalent in real-world healthcare data. In response to these challenges, researchers began exploring more adaptive methods capable of discovering temporal patterns automatically and reducing dependence on manually crafted logic (9).
Building upon the need for greater flexibility, subsequent studies introduced statistical learning frameworks that leveraged temporal clinical data to predict disease outcomes (10). Techniques such as support vector machines, random forests, and autoregressive models emerged as widely used solutions for forecasting lung function decline and the risk of acute exacerbations (11). These models typically depended on extensive feature engineering, where meaningful temporal attributes were manually derived and structured into inputs for learning algorithms (12). Compared to earlier structured approaches, these statistical models exhibited better generalization to new patient populations and improved predictive performance. They often faced difficulties in modeling long-term dependencies and remained sensitive to irregular sampling and missing data, which are common challenges in clinical environments (13). As the volume and complexity of healthcare data increased, there was a growing need for modeling approaches capable of autonomously extracting intricate temporal representations (14).
Motivated by the desire to capture complex sequential dynamics without manual intervention, researchers increasingly turned to neural sequence models for clinical time series prediction (15). Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) and gated recurrent unit (GRU) architectures, became foundational tools for learning temporal dependencies from patient data. These were complemented by convolutional neural networks (CNNs) for detecting local temporal patterns, and more recently, transformer-based models pre-trained on large-scale medical datasets (16). Such models demonstrated the ability to learn rich representations directly from raw, variable-length sequences, leading to improved performance across tasks such as disease progression forecasting, hospitalization risk prediction, and personalized treatment recommendation (17). Despite their successes, challenges related to model interpretability, substantial data requirements, and sensitivity to training conditions persist, spurring ongoing research into strategies that blend learned representations with clinically grounded insights (18).
Based on the above limitations, we propose a novel hybrid framework that integrates pre-trained deep time series models with a medical knowledge-guided attention mechanism to improve both predictive accuracy and interpretability in lung disease diagnosis and treatment optimization. Our approach leverages the strengths of large-scale representation learning while aligning model decisions with clinically meaningful features. We incorporate domain-specific rules into the attention layer to guide the model's focus toward relevant physiological signals during prediction. This not only enhances transparency for clinicians but also improves robustness against missing or noisy data. Furthermore, our system supports dynamic treatment recommendations by simulating multiple intervention paths and forecasting patient outcomes under each scenario. Compared with existing approaches, our method demonstrates superior adaptability, better generalization across diseases, and improved clinical usability through interpretable outputs and interactive visualization tools.
The proposed method has several key advantages:
• We introduce a novel hybrid model combining pre-trained temporal encoders with knowledge-guided attention mechanisms to enhance interpretability and accuracy.
• Our framework supports multi-scenario, high-efficiency predictions adaptable to various lung diseases, improving clinical generalization and decision-making speed.
• Experimental results on real-world lung disease datasets show up to 15% improvement in early diagnosis accuracy and a 20% increase in treatment outcome prediction precision.
2 Related work
2.1 Medical time series forecasting
Medical time series forecasting has become a pivotal area of research in leveraging historical patient data for predictive modeling and clinical decision support (19). In the context of lung disease, continuous monitoring of physiological signals such as respiratory rate, oxygen saturation (SpO2), and forced expiratory volume (FEV1) provides a temporal sequence that reflects the progression of the condition (20). Machine learning and deep learning models have demonstrated efficacy in capturing the temporal dependencies in such data (21). These models have been applied for predicting acute exacerbations in chronic obstructive pulmonary disease (COPD) patients, onset of pneumonia, and ICU admission requirements in lung cancer patients. Attention mechanisms and Transformer architectures are increasingly being integrated to address issues such as vanishing gradients and to enhance model interpretability by identifying critical temporal segments (22). Moreover, multiscale and multiresolution modeling approaches have been introduced to deal with heterogeneous time series derived from wearable sensors, electronic health records (EHRs), and imaging-derived quantitative biomarkers. The fusion of multi-source data enhances the robustness of forecasting models and allows for real-time patient-specific monitoring. Techniques such as dynamic time warping (DTW), variational autoencoders (VAEs), and contrastive learning further enrich the feature representation space, allowing models to generalize better across patient populations with varying disease trajectories (23). Evaluations often focus on metrics such as mean absolute error (MAE), root mean squared error (RMSE), and time-to-event predictions, with interpretability being a growing concern. Explainable AI (XAI) techniques such as SHAP values and temporal saliency maps are used to provide clinicians with transparent insights into the decision-making process. The goal is not only high predictive accuracy but also clinical trustworthiness and ease of integration into existing healthcare infrastructures.
2.2 Lung disease diagnosis models
The diagnosis of lung diseases using time series data encompasses various modalities including spirometry, imaging, lab results, and real-time vital sign monitoring. Machine learning models have shown promise in identifying disease patterns that may not be evident through traditional diagnostic protocols (24). Convolutional Neural Networks (CNNs) are commonly applied to sequential imaging data, while recurrent models such as LSTM and GRU are used for sequential physiological signals. These methods have been tailored to diagnose conditions such as asthma, pulmonary fibrosis, COPD, and lung cancer with high sensitivity and specificity (25). Hybrid models that combine static and dynamic features are increasingly employed. For instance, combining demographic data with dynamic respiratory data enables stratified models that can account for baseline risk factors and acute temporal variations. Ensemble methods and model stacking are also explored to enhance diagnostic performance (26). A common approach involves training separate classifiers on different feature sets and then integrating the outputs via a meta-learner. Deep learning models trained on large-scale EHR datasets have revealed latent disease states and progression pathways. The use of autoencoders and sequence-to-sequence architectures allows for unsupervised feature learning and temporal pattern extraction (27). Probabilistic models such as Hidden Markov Models (HMMs) and Gaussian Processes are also utilized for their capacity to handle uncertainty and noise, which are prevalent in clinical settings. Robustness and generalization remain key challenges, as models trained on a specific population may not perform well on others due to demographic and clinical heterogeneity. Transfer learning and domain adaptation techniques are thus explored to enable cross-hospital and cross-cohort applicability (28). Moreover, regulatory compliance and ethical considerations, particularly regarding data privacy and fairness in algorithmic decision-making, are integral to the deployment of diagnostic models in practice.
2.3 Treatment optimization strategies
Optimizing treatment for lung diseases involves dynamically adapting therapeutic interventions based on patient-specific trajectories derived from time series data (29). Reinforcement learning (RL) and its variants, such as deep Q-networks (DQNs) and policy gradient methods, have been explored for learning optimal treatment policies (30). These approaches treat the patient-healthcare interaction as a Markov Decision Process (MDP), where the objective is to maximize long-term health outcomes such as lung function improvement, hospitalization reduction, or quality-adjusted life years (QALYs). Models are trained using retrospective data from EHRs, capturing sequences of clinical decisions and corresponding outcomes (31). Such models can propose personalized treatment adjustments including medication dosage, oxygen therapy, or scheduling of diagnostic tests (32). In the domain of lung cancer, treatment optimization models are extended to radiotherapy planning, chemotherapy scheduling, and immunotherapy management, where treatment sequences have significant temporal dependencies. Bayesian optimization and adaptive trial designs are also applied to lung disease management (33). These methods enable efficient exploration of treatment-response surfaces and facilitate real-time decision-making under uncertainty. Furthermore, causal inference techniques, including counterfactual analysis and instrumental variable approaches, are employed to assess the true effect of interventions from observational data. Patient adherence and side effect profiles are crucial elements influencing treatment efficacy (34). Models increasingly integrate behavioral data and patient-reported outcomes. Mobile health (mHealth) applications and wearable sensors provide continuous data streams, which are incorporated into feedback loops to refine treatment plans. The development of digital twins—virtual patient models—further enables simulation and stress-testing of treatment strategies before clinical implementation. Despite promising advances, real-world deployment requires rigorous validation through prospective trials, clinician-in-the-loop designs, and adherence to regulatory standards. Ethical considerations surrounding autonomy, informed consent, and potential biases in treatment recommendations are also critical to ensuring safe and equitable care delivery (25).
3 Method
3.1 Overview
In this section, we provide an overview of our proposed methodology designed to address the challenges associated with modeling, understanding, and learning from data related to lung diseases. Lung diseases present a diverse and complex spectrum of pathological and physiological variations, often involving subtle anatomical and functional patterns within medical imaging and clinical records. The heterogeneous nature of such data necessitates advanced strategies capable of extracting discriminative representations, capturing domain-specific knowledge, and generalizing across patients and disease subtypes.
We refer to our full framework as TSM (Time Series Model), which consists of two major components: PulmoNet, the anatomical feature learning module responsible for segmentation, classification, and attention-based representation, and APIL (Adaptive Patho-Integrated Learning), the training strategy incorporating pathology priors and consistency constraints. TSM therefore represents the integrated system described in this study.
In Section 3.2, we begin by formalizing the core problem. We define the mathematical structures underlying the lung disease datasets, such as-dimensional imaging spaces, discrete and continuous clinical variables, and multi-modal input forms. This section introduces the notation and theoretical assumptions used throughout the paper. We establish the problem as a structured prediction or latent representation learning task, where the objective is to identify robust mappings from raw patient data to clinically meaningful categories. We contextualize the problem within existing paradigms such as multi-task learning, sparse feature extraction, and domain adaptation. In Section 3.3, we propose a novel model architecture designed for the lung disease domain, which we name PulmoNet. PulmoNet is a domain-aware representation learning framework that integrates anatomical priors with multi-scale feature hierarchies. Unlike generic convolutional backbones, PulmoNet incorporates bronchopulmonary anatomical constraints, attention modules focusing on vascular and parenchymal regions, and an embedded latent graph to capture inter-lobar relationships. This model is designed to simultaneously perform classification, segmentation, and feature attribution, enabling a unified understanding of disease manifestation. We describe the mathematical formulation of each component, provide the rationale behind architectural choices, and show how each part contributes to learning an interpretable and effective representation for lung disease modeling. In Section 3.4, we present a novel strategy, termed Adaptive Patho-Integrated Learning (APIL), which addresses the domain-specific challenges of lung disease data such as inter-subject variability, imbalance between disease categories, and limited labeled annotations. APIL combines weak supervision with structured regularization, leveraging both population-level statistics and individual-level consistency constraints. It introduces a two-stage curriculum-based learning mechanism, where coarse global structure is first inferred and then refined through pathology-informed fine-tuning. Furthermore, APIL integrates external knowledge into the training objective, allowing the model to learn beyond purely empirical signals. This strategy ensures that PulmoNet not only performs well on standard metrics but also maintains consistency with known clinical understanding. Together, the subsections of this Method chapter aim to build a coherent and principled framework for the computational analysis of lung diseases. Each section introduces new mathematical concepts, algorithmic innovations, and domain-specific considerations that together form the foundation of our approach. Through this structure, we aim to bridge the gap between data-driven models and clinically relevant interpretations, contributing to the growing field of AI-driven healthcare analytics in respiratory medicine.
3.2 Preliminaries
In this subsection, we mathematically formulate the core problem of computational modeling for lung diseases. Let us denote by the space of input data derived from clinical and imaging modalities, and by the space of disease-relevant labels or latent descriptors. Our goal is to learn a function that captures meaningful representations and mappings from heterogeneous input sources to clinical outcomes, with a focus on interpretability, generalization, and compliance with medical knowledge.
We assume the dataset is sampled from an unknown joint distribution ℙXY over , where each xi represents a multi-dimensional input, and each yi represents either a class label or a continuous outcome. In the semi-supervised setting, some of the yi may be missing. We define several modeling components and notational constructs:
We define the input xi as a tuple:
where is a volumetric image, is a vector of clinical variables, and is an optional modality-specific embedding.
Each image Ii may contain internal anatomical structures such as lung lobes, airways, and vasculature. We define a segmentation mask function that decomposes the image into labeled anatomical components:
We consider two forms of target representations: (1) discrete classification labels yi∈{0, 1, …, C−1}, representing disease stages or types, and (2) continuous latent vectors capturing patient-level pathology embeddings. In this formulation, we also define a mapping such that:
where g is a linear or non-linear classifier.
Given anatomical knowledge, the lungs can be decomposed into K spatially disjoint yet structurally interrelated regions. Define a partition function:
where M(k) is a binary mask for region k, and ⊙ denotes pointwise multiplication.
To model dependencies between regions, we define a latent undirected graph , where each node corresponds to a lung region, and edges capture anatomical or pathological correlations. We define a potential matrix Θ ∈ ℝK×K such that:
where sim(·, ·) is a similarity function.
We define a hierarchical consistency constraint:
where αk are learned importance weights for each subregion. This enforces that the global embedding reflects local features.
To preserve class structure in latent space, we define a center loss:
where cyi is the centroid of class yi in latent space.
If we have multiple cohorts, we define domain-specific distributions ℙA and ℙB. To align them, we introduce a discrepancy measure:
To exploit structural priors, we define a mask-guided regularizer:
which enforces spatial smoothness within anatomical boundaries.
We define a kernelized alignment objective to correlate latent features with pathology scores:
where si is a continuous pathology severity score, and ψ is a learned regression head.
Collecting all components, the complete optimization problem becomes:
where is a standard classification or regression loss, and λi are hyperparameters controlling the influence of each regularization term.
3.3 PulmoNet
In this subsection, we present PulmoNet, a novel anatomically-constrained neural architecture tailored for multi-modal, multi-scale lung disease analysis. The following three architectural innovations distinguish PulmoNet from prior work (as shown in Figure 1).
Figure 1. Schematic diagram of PulmoNet architecture for multi-modal lung disease analysis. The framework integrates CLIP-based text and image encoding with anatomically constrained region-specific attention, graph-based regional reasoning, and multi-modal feature fusion to enable accurate and interpretable lung disease classification. Region masks derived from anatomical priors guide the attention encoding process, while a graph attention network captures inter-region dependencies. Fused multi-modal features are subsequently used for final disease prediction.
3.3.1 Region-specific attention encoding
To effectively incorporate anatomical knowledge into the representation learning process, PulmoNet introduces a region-specific attention encoding mechanism that explicitly conditions the feature extraction on predefined lung regions (as shown in Figure 2).
Figure 2. Schematic diagram of region-specific attention encoding. The framework extracts low-scale features from volumetric lung CT scans, performs anatomical partitioning with pitch and roll prediction, and utilizes gated attention to focus on relevant lung subregions. Multi-scale convolutional operations and attention mechanisms are applied before fusing features for final downstream reasoning. The design ensures anatomical interpretability and enhances spatially localized feature extraction for lung disease analysis.
Given a volumetric CT scan I∈ℝH×W×D, an anatomical segmentation function generates a set of binary masks , where each M(k)∈{0, 1}H×W×D corresponds to a distinct anatomical subregion, such as specific lobes or bronchopulmonary segments. The input scan is then decomposed into K regional subvolumes by element-wise multiplication:
ensuring that downstream computations focus on anatomically meaningful structures. Each region-specific subvolume I(k) is encoded via a shared 3D convolutional encoder Eimg, resulting in a feature tensor . To aggregate these features, we introduce a gated attention mechanism that adaptively weights the contribution of each region based on its latent representation. The attention weight α(k) for region k is computed using a global average pooled descriptor, followed by a two-layer non-linear projection with a gating unit:
where GAP(·) denotes global average pooling across spatial dimensions, is a learnable projection matrix, w∈ℝd is a gating vector, and σ(·) is the sigmoid activation to ensure outputs are in the range [0, 1]. This attention formulation allows the model to selectively emphasize the most relevant anatomical regions depending on the underlying pathology, while suppressing irrelevant or noisy activations. The attention-weighted regional feature ri is then computed as the convex combination of pooled regional features:
effectively encoding spatial structure in a discriminative, compact vector. Notably, the attention weights {α(k)} are conditioned solely on their local region features, enabling spatial disentanglement and facilitating interpretability. This design allows PulmoNet to localize functionally significant lung areas while maintaining differentiability throughout the pipeline. To improve robustness, we further apply dropout regularization over α(k) and enforce sparsity via an entropy-based auxiliary loss:
which penalizes uniform distributions and encourages sharper attention responses. By integrating anatomical priors with attention-guided regional encoding, this mechanism enhances both model accuracy and interpretability, particularly in tasks that require spatially localized reasoning such as lesion grading or subregion-level diagnosis. The entire region-attended representation is forwarded to downstream modules, either fused with modality features or passed to a graph module for topological reasoning:
serving as the basis for structured message passing across lung regions.
3.3.2 Graph reasoning across regions
To explicitly model spatial dependencies and functional interactions among anatomical subregions of the lung, PulmoNet constructs a region-level graph , where each node corresponds to a predefined lung region derived from segmentation priors, and edges in represent anatomical adjacency or functional correlation. The initial node feature hk is computed by applying global average pooling on the region-specific feature map , resulting in a vectorized descriptor:
where C is the number of channels. To propagate information among regions, we employ a graph attention network (GAT) that performs message passing across neighboring nodes. For each node vk, the aggregated feature is computed by attending over its local neighborhood :
where ρ(·) denotes a residual transformation with a non-linear activation such as GELU or ReLU, and is a learnable message projection matrix. The edge-specific attention weight η(hj, hk) measures the importance of node j's message to node k, and is computed via an additive attention mechanism:
where [·;·] denotes vector concatenation, are linear projections of source and target features, and is a shared attention vector. This attention mechanism is capable of dynamically modulating the strength of message passing based on inter-region compatibility. To stabilize learning and prevent overfitting, multi-head attention is used in practice, and the outputs are averaged or concatenated. After message passing, the updated node embeddings are aggregated for downstream tasks by concatenation or pooling. To regularize the topology of the learned graph representation, a smoothness-inducing loss is applied to encourage feature consistency among adjacent nodes:
which penalizes abrupt changes between neighboring node embeddings and promotes anatomical coherence. The refined region graph representation is formed by concatenating the transformed node embeddings:
serving as a global structural descriptor for high-level tasks such as severity estimation or multi-label disease classification. By embedding anatomical relations in a learnable, attention-driven graph topology, PulmoNet enhances both its reasoning capability and robustness to spatial variation in disease presentation.
3.3.3 Multi-modal fusion representation
To effectively integrate heterogeneous patient data from different clinical sources, PulmoNet constructs a unified latent representation that encodes 3D imaging, tabular clinical information, and modality-specific metadata. This multi-modal fusion strategy ensures that the model benefits from both spatially localized visual patterns and complementary non-imaging information, which are often crucial for disease severity estimation and subtype discrimination. Given an input sample xi = (Ii, ci, mi), where Ii is a volumetric CT scan, is a vector of clinical variables, and encodes modality metadata, we first extract features from each modality through dedicated encoders. The image pathway employs a 3D convolutional backbone Eimg(·) to transform Ii into a deep volumetric tensor:
where C is the number of feature channels, and (H′, W′, D′) are the reduced spatial dimensions. This encoder consists of residual blocks with 3D kernels, designed to capture spatial continuity in lung structures while preserving local intensity gradients relevant to pathological features. To complement this visual embedding, clinical data ci and modality vector mi are independently projected into compact vectors using multi-layer perceptrons (MLPs):
where d1 and d2 are fixed output dimensions. These MLPs contain batch normalization and dropout layers to improve generalization and mitigate overfitting, especially when clinical inputs are sparse or imbalanced. The imaging feature is then compressed via global average pooling (GAP) to obtain a compact descriptor that summarizes high-level 3D context:
All modality-specific vectors are concatenated to form a unified latent embedding:
which serves as the input to downstream prediction heads. This fused representation captures both spatial features and global patient context, enabling PulmoNet to make nuanced, context-aware predictions. Moreover, because each branch operates independently prior to fusion, the model remains robust to partial missing modalities. In deployment settings, this fusion strategy facilitates interpretability by enabling attribution of model behavior to specific modality inputs. Attention weights or saliency maps can be computed separately for each modality to assess its contribution to final predictions, further supporting clinical integration and trust.
3.4 Adaptive Patho-Integrated Learning (APIL)
In this subsection, we present Adaptive Patho-Integrated Learning (APIL), a targeted learning strategy that enhances PulmoNet by embedding pathological structure, guiding supervision through domain knowledge, and promoting robust generalization across patient subgroups. APIL is grounded in three core innovations detailed in Figure 3.
Figure 3. Schematic diagram of the Adaptive Patho-Integrated Learning (APIL) framework. APIL enhances PulmoNet by progressively learning features through curriculum-based pretraining, embedding knowledge-guided pathological constraints, and enforcing cross-view consistency. The architecture integrates multi-scale feature extraction, curriculum modules, knowledge-driven supervision, and consistency mechanisms across different feature resolutions, enabling robust, and clinically-aligned medical image analysis.
3.4.1 Curriculum-based feature progression
APIL introduces a two-stage curriculum learning paradigm designed to guide the model from fundamental visual perception toward complex pathology reasoning (as shown in Figure 4).
Figure 4. Schematic diagram of curriculum-based feature progression. The model progressively refines features through depthwise separable convolutions (DSConv), conventional convolutions, and curriculum-based modules. Feature fusion and cross-scale aggregation are performed via a Cross-Feature Self-Attention (CFSA) block, enhancing anatomical structure understanding, and enabling robust pathology reasoning.
In the first stage, the model is pretrained using auxiliary tasks to acquire structured and discriminative anatomical feature representations, which serve as a generalized foundation for downstream tasks. During this pretraining phase, APIL jointly optimizes a masked image reconstruction objective and a contrastive learning objective to ensure both the completeness and discriminability of the learned representations. The masked reconstruction loss encourages the model to infer global structural context by recovering occluded image regions, defined as:
where Mmask denotes a random binary mask simulating missing regions. Simultaneously, a contrastive loss is employed in the SimCLR framework to encourage instance-level discrimination by maximizing the similarity between different views of the same image while minimizing similarity to other samples:
where sim(·, ·) denotes cosine similarity, and τ is a temperature hyperparameter. The total auxiliary pretraining loss is formulated as a weighted combination of both objectives:
where λa and λs are balancing coefficients. Following this representation learning stage, the model transitions to a fine-tuning phase targeting downstream pathology classification. At this stage, the network possesses strong anatomical understanding, enabling it to more effectively capture pathological cues. A task-specific adapter layer maps the learned representations to the label space, while a supervised objective further optimizes classification accuracy. To enhance training stability and feature consistency, APIL integrates a momentum encoder that provides consistent targets for contrastive supervision. The momentum update rule is defined as:
where denotes the momentum encoder parameters at time step t, θ(t) represents the online network parameters, and m is the momentum coefficient. Through this staged approach, the model progressively transitions from low-level visual understanding to high-level semantic reasoning, achieving robust and discriminative representation learning tailored for medical image analysis.
3.4.2 Knowledge-guided pathology constraints
To enhance the clinical reliability and biological plausibility of model predictions, APIL incorporates knowledge-guided constraints grounded in radiological principles and empirical clinical rules. These constraints are softly integrated into the training objective to steer the model toward medically meaningful behaviors without requiring additional labels. One pivotal constraint enforces ordinal consistency in disease severity scores, particularly important in progressive conditions such as fibrosis or edema. Letting ŝi denote the model's predicted severity score for sample i and yi the corresponding ground truth ordinal label, we define a monotonicity constraint to ensure that predictions respect the natural order of clinical progression:
where δ is a positive margin that ensures a minimum ranking separation, and I is the indicator function selecting pairs where yi should outrank yj. This loss discourages pathological reversals, such as predicting lower severity for a case known to be more advanced. In parallel, to maintain anatomical coherence, especially in thoracic imaging, APIL incorporates a bilateral symmetry constraint. This leverages the approximate mirror symmetry between the left and right lungs to detect asymmetries indicative of disease. Let and denote the extracted feature representations for the left and right lung of image i, respectively. A learnable spatial transformation operator is used to align one side to the other, yielding a constraint:
which penalizes significant divergences in symmetrical regions and implicitly regularizes feature encoding to respect anatomical structure. In cases involving focal asymmetry, such as unilateral effusions or lobar consolidations, the model learns to weigh this constraint dynamically, enabling selective symmetry enforcement. To incorporate pathophysiological constraints, a co-occurrence prior is modeled, capturing the empirical dependencies between conditions. This is formalized via a soft label correlation matrix C computed from population-level statistics. For a predicted label distribution , the constraint becomes:
which encourages predicted distributions to conform to known disease relationships. Furthermore, APIL integrates anatomical zone attention to prioritize medically significant regions, especially in conditions that manifest in spatially localized patterns. For each predicted pathology heatmap Hi, a zone-aligned attention mask Z is applied to modulate the importance of different subregions:
thereby guiding the model to focus on clinically interpretable areas such as perihilar zones for edema or peripheral zones for COVID-related opacities. Together, these constraints collectively function as a soft inductive bias, allowing the model to learn robust, clinically aligned representations even in scenarios with sparse or noisy annotations.
3.4.3 Cross-view consistency enforcement
To enhance robustness under heterogeneous imaging protocols and acquisition variations—such as differences in contrast phase, scanner type, resolution, or even imaging modality—APIL introduces a cross-view consistency mechanism that regularizes the learned representation space across paired but semantically equivalent samples. In clinical practice, the same pathology may appear under multiple imaging configurations, and achieving invariance to such variations is critical for generalizable medical AI systems. To this end, let denote a set of semantically aligned image pairs (x(a), x(b)), where each x(a) and x(b) represents distinct views of the same anatomical region or pathology. The model employs a shared encoder ϕ(·) to produce latent embeddings from both inputs and minimizes the discrepancy between these embeddings through the following consistency loss:
which encourages the model to learn view-invariant representations. This mechanism serves as an implicit regularization technique, mitigating the impact of domain-specific noise and distribution shifts. Beyond pairwise alignment, APIL further introduces a contrastive extension to this consistency principle to enhance inter-pair separability. Let and be the latent embeddings of the i-th pair. A view-aware contrastive loss is constructed as:
where sim(·, ·) denotes cosine similarity and τ is a temperature scaling parameter. This formulation not only aligns positive pairs but also explicitly separates different patient views in the embedding space, preserving patient-specific traits. Furthermore, to stabilize the encoder during domain variation, APIL adopts a momentum encoder ϕm(·) alongside the main encoder ϕ(·), enabling temporal consistency in representation learning. The momentum encoder is updated using an exponential moving average:
where θ and θm are the parameters of the online and momentum encoders, respectively. During training, embeddings from ϕ(·) and ϕm(·) are jointly used in the consistency loss to provide stable and reliable cross-view targets. To explicitly address the spatial discrepancies introduced by view changes, a deformable alignment module is introduced. This module learns to align spatial structures before latent comparison, defined as:
where is a learnable transformation capturing geometric distortions between views. The combination of latent consistency, contrastive separation, temporal stabilization, and spatial alignment enables APIL to operate effectively across view-based domain shifts, ensuring that the learned features remain semantically coherent and diagnostically reliable even when confronted with unseen imaging protocols or heterogeneous patient populations. This cross-view consistency strategy plays a critical role in enhancing PulmoNets robustness and generalizability in real-world clinical deployment.
4 Experimental setup
4.1 Dataset
MIMIC-III Dataset (35) is a large, publicly available dataset comprising de-identified health data associated with over 40,000 critical care patients admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012. The dataset includes detailed information such as demographics, vital signs, laboratory tests, medications, diagnoses, procedures, and survival outcomes. Structured clinical data is complemented by free-text clinical notes, enabling comprehensive modeling of patient trajectories. The granularity and richness of MIMIC-III support a wide range of machine learning tasks, including mortality prediction, disease phenotyping, and temporal modeling. Its availability has made it a cornerstone resource for reproducible research in computational health informatics and critical care analytics. eICU Collaborative Research Dataset (36) is a multi-center critical care database provided by the eICU Research Institute, encompassing data from over 200,000 ICU stays across more than 200 hospitals in the United States. It includes high-resolution, time-stamped clinical variables such as vital signs, lab values, medication records, and treatment plans. Unlike single-center datasets, eICU captures institutional variability, offering a broader representation of clinical practice across diverse geographic and demographic populations. The dataset is ideal for studying generalizability of predictive models, evaluating outcomes across healthcare systems, and developing federated learning strategies for ICU analytics. High-Resolution ICU Dataset (37) contains second-by-second physiological time-series data collected from ICU patients using bedside monitors. It is especially valuable for tasks requiring fine-grained modeling of patient status, such as early warning score development, sepsis detection, and signal-level anomaly detection. The dataset bridges the gap between traditional clinical records and real-time monitoring, allowing researchers to explore dynamic patient states and temporal dependencies in critical care settings. Its use facilitates the advancement of interpretable, real-time decision support systems. COPDGene Study Dataset (38) is a large-scale, longitudinal dataset aimed at understanding the genetic and clinical basis of Chronic Obstructive Pulmonary Disease (COPD). It comprises imaging data (primarily chest CT scans), pulmonary function tests, and extensive phenotypic information from over 10,000 subjects, including both smokers with and without COPD. The dataset supports multi-modal analysis, integrating imaging biomarkers with clinical and genetic data. Its emphasis on disease heterogeneity, progression, and comorbidity makes it particularly suited for studying chronic respiratory diseases. Researchers utilize COPDGene for building prognostic models, performing phenotype discovery, and analyzing genotype-phenotype correlations in respiratory health.
The imaging annotations used in this study were derived from validated automated NLP tools applied to radiology reports in MIMIC-CXR and eICU datasets. These labels include findings such as consolidation, edema, and cardiomegaly. For COPDGene, we used spirometry-derived GOLD staging and radiologist-confirmed emphysema severity scores. Disease severity labels in MIMIC-III were computed based on ICU outcomes such as oxygen requirement, mechanical ventilation, and SOFA scores. No pose heatmaps or synthetic annotations were used. Anatomical segmentation masks used to guide region-specific attention were obtained from publicly available lung lobe atlases.
To facilitate understanding of the modeling process and dataset structure, we list the primary variables used in our study along with their types. Continuous variables include: respiratory rate, oxygen saturation (SpO2), partial pressure of oxygen (PaO2), forced expiratory volume (FEV1), heart rate, systolic and diastolic blood pressure, and blood pH. These are typically recorded at high temporal resolution in ICU datasets. Categorical variables include: gender, ethnicity, ICU admission type, ventilation status (binary), diagnosis codes (ICD), and survival outcome (binary). Imaging data from CT and chest X-ray are processed as volumetric tensors (continuous), while their associated anatomical masks for lobes and lesions are treated as categorical. In the COPDGene dataset, additional ordinal labels such as disease severity stages are used, which are treated either as categorical or continuous for regression tasks. Metadata such as modality type, scanner manufacturer, and contrast phase are categorical. These variables collectively form the input tuple (Ii, ci, mi) described in Section 3.2, where imaging (Ii) is volumetric and continuous, clinical vectors (ci) are mixed-type, and modality metadata (mi) is categorical. This structured typing enables the design of modality-specific encoders and helps in interpreting fusion strategies across heterogeneous clinical inputs.
4.2 Experimental details
To ensure a comprehensive evaluation, we compare our proposed model against a suite of representative time series forecasting methods. These include: LSTM, a classical deep sequence model known for its memory capabilities; Transformer, a self-attention based sequence model that handles long-term dependencies; Informer, optimized for long sequence forecasting via sparse attention; Autoformer, which decomposes temporal signals to model trend and seasonality explicitly; and Hybrid ARIMA-LSTM, a hybrid model that combines statistical and deep learning forecasting. These models serve as standard baselines in recent literature and offer diverse modeling mechanisms ranging from autoregressive structures to transformer-based encoders. Our model, TSM, incorporates temporal shift modeling, anatomical priors, and adaptive pathology-informed learning for enhanced clinical utility.
All experiments were implemented using the PyTorch framework and trained on NVIDIA A100 GPUs (40GB). For all datasets, including MIMIC-III, eICU Collaborative Research Dataset, High-Resolution ICU Dataset, and COPDGene Study, we followed consistent preprocessing and training protocols. Input imaging data (CT or X-ray) were resized to 384 × 288 for 3D models and normalized based on modality-specific intensity ranges. Non-imaging clinical variables were standardized using Z-score normalization. Multi-modal inputs—volumetric scans, tabular clinical data, and modality metadata—were jointly processed using our fusion architecture. For data splitting, we adopted an 80/10/10 (train/validation/test) split for large datasets and a 70/15/15 split for smaller datasets. All results are averaged across 5-fold cross-validation to ensure robustness. Validation sets were used for hyperparameter tuning and early stopping. We employed the Adam optimizer with an initial learning rate of 1 × 10−3, a weight decay of 1 × 10−5, and a learning rate scheduler that reduces the learning rate by a factor of 10 upon plateauing for 5 epochs. Batch size was set to 64. Training was conducted for 100 epochs on larger datasets and 50 epochs on smaller datasets. Standard data augmentation techniques were applied to imaging data, including random horizontal flipping, rotation (±30°), intensity jittering, and scaling (0.75 to 1.25). Anatomical segmentation masks were used during training to guide region-specific attention and graph construction. Evaluation metrics include Accuracy, Recall, F1 Score, and Area Under the ROC Curve (AUC). These metrics are widely accepted in clinical outcome prediction and classification tasks and ensure meaningful, interpretable comparisons across different models. All baseline methods were re-trained under identical conditions to ensure fairness. To further validate model generalizability, we conducted cross-dataset transfer evaluations and ablation studies isolating each module in our framework, such as PulmoNet, graph reasoning, and the APIL training strategy.
All datasets used in this study were partitioned into training, validation, and testing sets. For MIMIC-III and COPDGene, we followed an 80/10/10 split, while for smaller datasets such as High-Resolution ICU and eICU, a 70/15/15 split was employed. These partitions were stratified to preserve class distribution. To further improve robustness and reduce sampling bias, we adopted 5-fold cross-validation. For each fold, models were trained on 80% of the data, validated on 10%, and tested on the remaining 10%. The reported metrics represent the average performance across all folds. This protocol was applied consistently across all compared models to ensure fairness in evaluation.
4.3 Comparison with SOTA methods
We conduct a comprehensive comparison between our proposed TSM model and six representative SOTA time series models across four benchmark datasets: MIMIC-III Dataset, eICU Collaborative Research Dataset, High-Resolution ICU Dataset, and COPDGene Study Dataset. The results are reported in Tables 1, 2, where we evaluate the models using Accuracy, Recall, F1 Score, and AUC. On the MIMIC-III Dataset, our model achieves the highest performance across all metrics with an Accuracy of 89.71, Recall of 87.92, F1 Score of 88.34, and AUC of 90.67. These results significantly outperform all baseline models including Informer and Autoformer, which previously held top scores. Notably, TSM surpasses Informer by 2.25 points in Accuracy and 2.65 points in AUC, demonstrating its strong ability to capture long-range dependencies and temporal patterns in pose data. The superior performance is consistent across both structured datasets like MIMIC-III Dataset and in-the-wild datasets like eICU Collaborative Research Dataset, indicating that our model generalizes well to varied human poses and background conditions.
Table 1. Benchmarking our method against state-of-the-art techniques using the MIMIC-III and eICU collaborative research databases.
Table 2. Assessing the superiority of our method over state-of-the-art approaches on the high-resolution ICU and COPDGene study datasets.
On the High-Resolution ICU Dataset and COPDGene Study Dataset, our method continues to outperform the SOTA methods by a large margin. On High-Resolution ICU Dataset, TSM achieves 89.03 Accuracy and 90.11 AUC, compared to the next best Informer with 86.71 and 87.32 respectively. This improvement confirms the advantage of our design in modeling temporal continuity and structured joint relationships. On the COPDGene Study Dataset, our method sets new benchmarks with an Accuracy of 88.47 and F1 Score of 87.33, reflecting robustness under occlusion and irregular sports poses. The general superiority across both datasets can be attributed to the temporal-aware module integrated within TSM, which better encodes motion dynamics compared to recurrent or convolution-based alternatives. Transformer-based methods such as Informer and Autoformer perform better than traditional RNNs like LSTM and GRU, but still fall short in retaining finer pose variations and resolving ambiguous motion cues. Our attention-guided time series module explicitly models both local and global temporal transitions, which is crucial for accurate human pose forecasting under time-variant settings. These advantages are further magnified when dealing with pose series that exhibit nonlinear motions or abrupt changes, where models like TCN and Transformer tend to smooth out critical transitions.
The empirical gains of TSM can be traced back to three core contributions of our method as detailed in the methodology. Our adaptive temporal shift unit introduces selective channel shifting that promotes inter-frame information flow while preserving critical spatial context. This operation improves performance over static convolutions used in TCN. Our dual-stage attention mechanism—consisting of temporal and joint attention—allows the model to selectively emphasize key joints and frames based on motion significance, thus enhancing interpretability and resilience to noise. Compared with Transformer and Autoformer, which apply a uniform attention across the timeline, our dual-stage strategy yields stronger feature discrimination and minimizes overfitting on repetitive patterns. Our training regime incorporates hierarchical supervision and joint-aligned loss functions, which stabilize learning and accelerate convergence. Baseline models often suffer from either vanishing gradients (in RNNs) or insufficient inductive bias (in pure attention models), whereas TSM balances both representation richness and training stability. The consistent improvements across all datasets and evaluation metrics affirm that our method provides an effective and scalable solution for temporal pose analysis, combining the strengths of deep attention and temporal shift paradigms for robust time series modeling.
4.4 Ablation study
To investigate the individual contributions of each component in our proposed TSM framework, we conduct an ablation study on four benchmark datasets: MIMIC-III Dataset, eICU Collaborative Research Dataset, High-Resolution ICU Dataset, and COPDGene Study Dataset. As shown in Tables 3, 4, we evaluate three ablated variants: Graph Reasoning Across Regions, Region-Specific Attention Encoding, and Cross-View Consistency Enforcement. Across all datasets, the full TSM model consistently achieves the highest performance. On the MIMIC-III Dataset, the Graph Reasoning Across Regions results in a noticeable drop of 2.49 in Accuracy and 2.33 in AUC compared to the full model, indicating that temporal shift plays a vital role in learning frame-level dependencies and capturing motion continuity. On eICU Collaborative Research Dataset, the Region-Specific Attention Encoding is shown to be the most influential module, where its absence causes a reduction of 1.79 in F1 Score and nearly 1.55 in Accuracy, suggesting that joint-level attention and temporal-level attention are critical for focusing on informative frames and discriminative joints under unconstrained conditions. Likewise, Cross-View Consistency Enforcement contributes significantly by stabilizing optimization and improving representation learning, especially on High-Resolution ICU Dataset where its removal leads to a 2.59 drop in Accuracy and a 1.86 decrease in F1 Score. The COPDGene Study Dataset, known for its highly varied and dynamic sports poses, highlights the compound effects of all components, where the full model yields the highest metrics and the ablated versions suffer from up to 2.60 lower Recall or 2.13 lower AUC.
Table 3. Comprehensive ablation study of our approach on the MIMIC-III and eICU collaborative research datasets.
Table 4. Comprehensive ablation study of our approach on the high-resolution ICU and COPDGene study datasets.
These results reinforce the necessity of each design choice within our architecture. The temporal shift improves inter-frame dynamics, attention modules increase focus on semantically relevant joints and frames, and hierarchical supervision fosters multi-level consistency across predictions. Without any of these, the model underperforms significantly, proving that the full synergy of all modules in TSM is essential for optimal performance in time series-based human pose prediction.
To the existing baselines, we further implemented the hybrid ARIMA-LSTM model proposed by Bhattacharyya et al. (2021) to provide a broader comparative view. This model integrates a classical statistical forecasting method (ARIMA) with LSTM to capture both linear and nonlinear dynamics in temporal health data. Table 5 shows that while the hybrid model performs competitively, it falls short of our TSM framework in accuracy (86.04% vs. 89.71%), recall, and AUC. This gap reflects the strength of TSM in modeling structured temporal dependencies and spatial anatomical relationships, which are not addressed in univariate hybrid approaches. Nonetheless, we acknowledge that the hybrid model remains effective in simpler time series settings and could be beneficial when domain-specific structural priors are unavailable.
Table 5. Comparison with Bhattacharyya et al.'s hybrid model and other SOTA models on the MIMIC-III dataset.
Figure 5 presents the ROC curves of all evaluated models on the MIMIC-III dataset. It is evident that the proposed TSM model (brown curve) consistently outperforms baseline methods across all threshold levels, achieving the highest Area Under the Curve (AUC = 0.907). This indicates superior sensitivity-specificity trade-offs compared to Transformer (AUC = 0.860), Informer (AUC = 0.880), Autoformer (AUC = 0.873), Hybrid ARIMA-LSTM (AUC = 0.868), and LSTM (AUC = 0.862). Notably, the traditional models such as LSTM and Hybrid ARIMA-LSTM demonstrate relatively steep but early saturating curves, which suggests a tendency toward early classification bias. In contrast, the TSM model maintains a well-balanced curve shape that adheres closer to the ideal top-left corner, reflecting its robustness in distinguishing between positive and negative pulmonary diagnostic outcomes. These results provide further evidence of the clinical applicability of our approach, especially in scenarios demanding high sensitivity and specificity.
Figure 5. Receiver operating characteristic (ROC) curves comparing six models on the MIMIC-III dataset.
5 Discussion
In this study, we proposed a unified framework combining PulmoNet and Adaptive Patho-Integrated Learning (APIL) for pulmonary disease diagnosis and treatment optimization based on time series data. Our approach leverages anatomical priors, pathology-aware learning, and multi-modal fusion to enhance both performance and interpretability. The experimental results confirm that TSM consistently outperforms traditional and state-of-the-art time series models, including LSTM, Transformer, Informer, Autoformer, and the Hybrid ARIMA-LSTM model. One key advantage of our model lies in its structured incorporation of domain knowledge—lung anatomy and radiological patterns—into the learning pipeline. This allows the model to focus on clinically relevant subregions and improves its robustness in noisy, sparse, or imbalanced clinical data. The region-specific attention and graph-based reasoning offer interpretability and explainability, features often lacking in deep learning systems. The comparative analysis with Bhattacharyya et al.'s hybrid ARIMA-LSTM model further highlights the superiority of our method in high-dimensional multi-modal scenarios. However, we also acknowledge that hybrid statistical-deep models remain useful for low-resource environments, and could potentially be integrated as lightweight modules in future versions of our framework. Furthermore, the ROC curve analysis demonstrates that TSM achieves better sensitivity and specificity trade-offs compared to all other methods. This is essential for clinical translation, where false positives and false negatives both carry significant risk. Nonetheless, our model requires structured input formats and high-quality anatomical segmentation masks, which may limit its scalability to under-resourced healthcare systems. Further work is required to reduce such dependency and make the system more flexible for real-world deployment.
Although our model demonstrates strong quantitative improvements on retrospective datasets, we acknowledge that its clinical significance must be interpreted cautiously. These results have not yet been validated against physician decision-making or existing clinical scoring systems in prospective settings. Our current comparisons are relative to machine learning baselines, not to trained radiologists or intensivists. Future studies will incorporate clinician-in-the-loop assessments to evaluate usability, trustworthiness, and true impact on patient outcomes.
Our experiments include cross-dataset generalization studies, where the model trained on one dataset is evaluated on another, capturing institutional and protocol variation. Results show consistent performance with < 3% drop in AUC, highlighting domain robustness. Despite incorporating demographic features into the model inputs, we did not conduct subgroup-specific fairness analysis. We acknowledge this as an important limitation and plan to incorporate demographic stratification and fairness audits in future evaluations to ensure equitable performance across populations.
6 Conclusions and future work
In this study, we aimed to tackle the challenges inherent in diagnosing and treating lung diseases—conditions marked by complex anatomical and pathological variability—through a novel time series prediction framework. Existing diagnostic methods, while benefiting from modern imaging and clinical data collection, often fall short due to their reliance on shallow learning models that cannot generalize well across heterogeneous patient groups or sparse data settings. To address this, we developed PulmoNet, a multi-scale neural architecture explicitly constrained by pulmonary anatomy. PulmoNet distinguishes itself by embedding bronchopulmonary anatomical priors and leveraging spatial attention to highlight clinically relevant regions within the lungs, such as parenchymal and vascular zones. It also constructs a latent inter-lobar graph to capture spatial dependencies, enabling it to perform joint segmentation, classification, and feature attribution in a unified framework.
To further enhance generalization and clinical relevance, we introduced APIL (Adaptive Patho-Integrated Learning)—a curriculum-based training strategy that injects radiological rules, multi-view consistency, and supervision into the learning process. APIL improves performance in low-annotation scenarios through uncertainty modeling and domain adaptation. Our experiments across diverse, multi-institutional datasets confirmed the effectiveness of this dual-component approach, with significant gains in accuracy, lesion localization, and interpretability compared to state-of-the-art baselines. These results illustrate the power of anatomically and pathologically informed deep learning in advancing personalized pulmonary diagnostics. However, two key limitations remain. Despite APILs strength in dealing with sparse data, its reliance on handcrafted radiological priors might limit scalability when adapting to new lung diseases or emerging imaging modalities. Future work should explore automated prior discovery using unsupervised or semi-supervised techniques. While PulmoNet integrates anatomical structure explicitly, the graph-based modeling still depends on predefined inter-lobar relationships, which may not capture patient-specific anatomical anomalies. Incorporating patient-specific modeling through dynamic graph construction or self-supervised anatomical encoding may lead to further improvements. Going forward, extending this framework to longitudinal data and real-time clinical feedback loops could pave the way for adaptive, continuously learning systems in respiratory medicine.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.
Author contributions
JG: Data curation, Methodology, Supervision, Writing – original draft, Writing – review & editing. ML: Conceptualization, Formal analysis, Project administration, Writing – original draft, Writing – review & editing. PD: Validation, Investigation, Software, Writing – original draft, Writing – review & editing. LG: Funding acquisition, Resources, Visualization, Writing – original draft, Writing – review & editing. YZ: Visualization, Supervision, Funding acquisition, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Tian Y, Liu J, Wu S, Zheng Y, Han R, Bao Q, et al. Development and validation of a deep learning-enhanced prediction model for the likelihood of pulmonary embolism. Front Med. (2025) 12:1506363. doi: 10.3389/fmed.2025.1506363
2. Liu Z, Zuo B, Lin J, Sun Z, Hu H, Yin Y, et al. Breaking new ground: machine learning enhances survival forecasts in hypercapnic respiratory failure. Front Med. (2025) 12:1497651. doi: 10.3389/fmed.2025.1497651
3. Zhang J, Chen Y, Zhang A, Yang Y, Ma L, Meng H, et al. Explainable machine learning model and nomogram for predicting the efficacy of Traditional Chinese Medicine in treating Long COVID: a retrospective study. Front Med. (2025) 12:1529993. doi: 10.3389/fmed.2025.1529993
4. Wen X, Li W. Time series prediction based on LSTM-attention-LSTM model. IEEE Access. (2023) 11:48322–31. doi: 10.1109/ACCESS.2023.3276628
5. Ren L, Jia Z, Laili Y, Huang DW. Deep learning for time-series prediction in IIoT: progress, challenges, and prospects. IEEE Trans Neural Netw Learn Syst. (2023) 35:15072–91. doi: 10.1109/TNNLS.2023.3291371
6. Li Y, Wu K, Liu J. Self-paced ARIMA for robust time series prediction. Knowl.-Based Syst. (2023) 269:110489. doi: 10.1016/j.knosys.2023.110489
7. Yin L, Wang L, Li T, Lu S, Tian J, Yin Z, et al. U-Net-LSTM: time series-enhanced lake boundary prediction model. Land. (2023) 12:1859. doi: 10.3390/land12101859
8. Yu C, Wang F, Shao Z, Sun T, Wu L, Xu Y. DSformer: A double sampling transformer for multivariate time series long-term prediction. In: International Conference on Information and Knowledge Management. (2023). doi: 10.1145/3583780.3614851
9. Durairaj DM, Mohan BGK. A convolutional neural network based approach to financial time series prediction. Neural Comp. Applicat. (2022) 34:13319–37. doi: 10.1007/s00521-022-07143-2
10. Zheng W, Hu J. Multivariate time series prediction based on temporal change information learning method. IEEE Trans Neural Netw Learn Syst. (2022) 34:7034–48. doi: 10.1109/TNNLS.2021.3137178
11. Chandra R, Goyal S, Gupta R. Evaluation of deep learning models for multi-step ahead time series prediction. IEEE Access. (2021) 9:83105–23. doi: 10.1109/ACCESS.2021.3085085
12. Fan J, Zhang K, Yipan H, Zhu Y, Chen B. Parallel spatio-temporal attention-based TCN for multivariate time series prediction. Neural Comp Appl. (2021) 35:13109–18. doi: 10.1007/s00521-021-05958-z
13. Hou M, Xu C, Li Z, Liu Y, Liu W, Chen E, et al. Multi-granularity residual learning with confidence estimation for time series prediction. In: The Web Conference. (2022). doi: 10.1145/3485447.3512056
14. Lindemann B, Mller T, Vietz H, Jazdi N, Weyrich M. A survey on long short-term memory networks for time series prediction. In: Procedia CIRP. (2021). Available online at: https://www.sciencedirect.com/science/article/pii/S2212827121003796
15. Dudukcu HV, Taskiran M, Taskiran ZGC, Yldrm T. Temporal Convolutional Networks with RNN approach for chaotic time series prediction. Appl Soft Comp. (2022) 133:109945. doi: 10.1016/j.asoc.2022.109945
16. Amalou I, Mouhni N, Abdali A. Multivariate time series prediction by RNN architectures for energy consumption forecasting. Energy Reports. (2022) 8:1084–91. doi: 10.1016/j.egyr.2022.07.139
17. Xiao Y, Yin H, Zhang Y, Qi H, Zhang Y, Liu Z. A dualstage attentionbased ConvLSTM network for spatiotemporal correlation and multivariate time series prediction. Int J Intellig Syst. (2021). doi: 10.1002/int.22370
18. Wang J, Peng Z, Wang X, Li C, Wu J. Deep fuzzy cognitive maps for interpretable multivariate time series prediction. IEEE Trans Fuzzy Syst. (2021) 29:2647–60. doi: 10.1109/TFUZZ.2020.3005293
19. Moskola W, Abdou W, Dipanda A, Kolyang. Application of deep learning architectures for satellite image time series prediction: a review. Remote Sens. (2021) 13:4822. doi: 10.3390/rs13234822
20. Xu M, Han M, Chen CLP, Qiu T. Recurrent broad learning systems for time series prediction. IEEE Trans Cybernet. (2020) 50:1405–17. doi: 10.1109/TCYB.2018.2863020
21. Zheng W, Chen G. An accurate GRU-based power time-series prediction approach with selective state updating and stochastic optimization. IEEE Trans Cybernet. (2021) 52:13902–14. doi: 10.1109/TCYB.2021.3121312
22. Wang S, Wu H, Shi X, Hu T, Luo H, Ma L, et al. TimeMixer: decomposable multiscale mixing for time series forecasting. In: International Conference on Learning Representations. (2024). Available online at: https://arxiv.org/abs/2405.14616
23. Wang J, Jiang W, Li Z, Lu Y. A new multi-scale sliding window LSTM framework (MSSW-LSTM): a case study for GNSS time-series prediction. Remote Sens. (2021) 13:3328. doi: 10.3390/rs13163328
24. Altan A, Karasu S. Crude oil time series prediction model based on LSTM network with chaotic Henry gas solubility optimization. Energy. (2021) 242:122964. doi: 10.1016/j.energy.2021.122964
25. Wen J, Yang J, Jiang B, Song H, Wang H. Big data driven marine environment information forecasting: a time series prediction network. IEEE Trans Fuzzy Syst. (2021) 29:4–18. doi: 10.1109/TFUZZ.2020.3012393
26. Morid M, Sheng OR, Dunbar JA. Time series prediction using deep learning methods in healthcare. In: ACM Transactions on Management Information Systems. (2021). doi: 10.1145/3531326
27. Widiputra H, Mailangkay A, Gautama E. Multivariate CNN-LSTM model for multiple parallel financial time-series prediction. Complex. (2021). doi: 10.1155/2021/9903518
28. Yang M, Wang J. Adaptability of financial time series prediction based on BiLSTM. In: International Conference on Information Technology and Quantitative Management. (2021). Available online at: https://www.sciencedirect.com/science/article/pii/S1877050922000035
29. Ruan L, Bai Y, Li S, He S, Xiao L. Workload time series prediction in storage systems: a deep learning based approach. Cluster Comp. (2021) 26:25–35. doi: 10.1007/s10586-020-03214-y
30. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. In: AAAI Conference on Artificial Intelligence. (2020). Available online at: https://ojs.aaai.org/index.php/AAAI/article/view/17325
31. Angelopoulos AN, Cands E, Tibshirani R. Conformal PID control for time series prediction. In: Neural Information Processing Systems. (2023). Available online at: https://proceedings.neurips.cc/paper_files/paper/2023/hash/47f2fad8c1111d07f83c91be7870f8db-Abstract-Conference.html
32. Shen L, Kwok J. Non-autoregressive conditional diffusion models for time series prediction. In: International Conference on Machine Learning. (2023). Available online at: https://proceedings.mlr.press/v202/shen23d.html
33. Hou R, Yu Y, Yang J, Jiang J. Prostaglandin E2 receptor EP2: a novel target for high-risk neuroblastoma. Cancer Res. (2023) 83:4923. doi: 10.1158/1538-7445.AM2023-4923
34. Hou R, Yu Y, Jiang J. Prostaglandin E2 in neuroblastoma: targeting synthesis or signaling? Biomed Pharmacother. (2022) 156:113966. doi: 10.1016/j.biopha.2022.113966
35. Budrionis A, Miara M, Miara P, Wilk S, Bellika JG. Benchmarking PySyft federated learning framework on MIMIC-III dataset. IEEE Access. (2021) 9:116869–78. doi: 10.1109/ACCESS.2021.3105929
36. Zhang L, Deng T, Zeng G, Chen X, Wu D. The association of serum albumin with 28 day mortality in critically ill patients undergoing dialysis: a secondary analysis based on the eICU collaborative research database. Eur J Med Res. (2024) 29:530. doi: 10.1186/s40001-024-02127-5
37. Carra G, Güiza F, Depreitere B, Meyfroidt G. Prediction model for intracranial hypertension demonstrates robust performance during external validation on the CENTER-TBI dataset. Intens Care Med. (2021) 47:124–6. doi: 10.1007/s00134-020-06247-4
38. Saxena A. A bioinformatics approach for deciphering the pathogenic processes of COPD. In: 2021 5th International Conference on Information Systems and Computer Networks (ISCON). Mathura: IEEE (2021). p. 1–3.
39. Cahuantzi R, Chen X, Güttel S. A comparison of LSTM and GRU networks for learning symbolic sequences. In: Science and Information Conference. Cham: Springer (2023). p. 771–785.
40. Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, et al. A survey on vision transformer. IEEE Trans Pattern Anal Mach Intell. (2022) 45:87–110. doi: 10.1109/TPAMI.2022.3152247
41. Chen C, Chen X, Guo C, Hang P. Trajectory prediction for autonomous driving based on structural informer method. IEEE Trans Automa Sci Eng. (2023) 22:17452–63. doi: 10.1109/TASE.2023.3342978
Keywords: pulmonary disease prediction, anatomically-constrained deep learning, pathology-informed AI, clinical interpretability, multi-scale feature representation
Citation: Ge J, Liu M, Du P, Guo L and Zhang Y (2025) Time series prediction for lung disease diagnosis and treatment optimization. Front. Med. 12:1620462. doi: 10.3389/fmed.2025.1620462
Received: 29 April 2025; Accepted: 21 July 2025;
Published: 25 November 2025.
Edited by:
Ruida Hou, St. Jude Children's Research Hospital, United StatesReviewed by:
Arinjita Bhattacharyya, University of Louisville, United StatesBodhayan Prasad, University of Glasgow, United Kingdom
Copyright © 2025 Ge, Liu, Du, Guo and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Lichen Guo, YWNodWFsaW5raUBob3RtYWlsLmNvbQ==
Jiaqi Ge1