CORRECTION article
Front. Electron.
Sec. Power Electronics
Volume 6 - 2025 | doi: 10.3389/felec.2025.1693752
A Hybrid LSTM-Transformer Model for Accurate Remaining Useful Life Prediction of Lithium-Ion Batteries
Provisionally accepted- 1Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen, China
- 2Northeast Electric Power University, Jilin, China
- 3University of Macau, Taipa, Macao, SAR China
1. Introduction With the global transition toward cleaner energy and the rapid advancement of electrification technologies, lithium-ion batteries have emerged as essential energy storage components in electric vehicles, renewable energy storage systems, and portable electronic devices[1]. Owing to their high energy density, long cycle life, and low self-discharge rate, lithium-ion batteries have become increasingly important. However, their performance inevitably degrades over time due to repeated charge-discharge cycles, leading to capacity fade and a shortened remaining useful life (RUL)[2]. Accurate prediction of battery state of health (SOH) and RUL is crucial for optimizing battery management, extending service life, reducing maintenance costs, and ensuring system safety[3,4]. In particular, battery failure in electric vehicles or large-scale energy storage systems can result in significant safety hazards and economic losses, making the development of high-accuracy RUL prediction methods a pressing research focus. Precise RUL prediction plays a vital role in optimizing battery management systems. First, it provides a scientific basis for battery replacement and maintenance decisions, thereby lowering operational costs[5]. Second, it enables early identification of potential failures, thus enhancing system safety[6]. Moreover, accurate RUL estimates support battery recycling and second-life applications, contributing to sustainable resource utilization. Nevertheless, several challenges hinder effective RUL prediction. The degradation process is influenced by multiple factors such as temperature, charge/discharge rates, and usage scenarios, exhibiting high nonlinearity and complexity[7]. Real-world operational data often contain noise and missing values, increasing the difficulty of modeling[8]. 
Furthermore, long-term prediction requires models that can simultaneously capture short-term fluctuations and long-term trends, which is difficult for single-model architectures to achieve. Traditional RUL prediction approaches can be categorized into physics-based and data-driven methods. Physics-based models rely on the electrochemical mechanisms of batteries[9], using complex mathematical formulations to describe the degradation process. However, these methods require detailed knowledge of material properties and operating conditions, involve high computational complexity, and often lack generalizability across different battery types. In contrast, data-driven approaches have gained popularity by learning patterns directly from operational data[10]. With advances in sensor technology and data acquisition capabilities, these methods can effectively model battery behavior using features such as voltage, current, and temperature, showing improved adaptability and predictive accuracy. Data-driven RUL prediction techniques generally fall into three categories: statistical models[11], machine learning methods[12], and deep learning models[13]. Statistical approaches, such as Kalman filtering and particle filtering, model battery degradation probabilistically. For example, Nunes et al.[14] proposed an online RUL estimation method for second-life lithium-ion batteries based on an unscented Kalman filter and degradation curve modeling, validated on six different second-life battery datasets. Despite some success—achieving a worst-case mean absolute percentage error (MAPE) of 5.279% and an R² score of 0.726—statistical models often struggle with nonlinear or complex degradation behaviors. Machine learning approaches such as support vector machines, random forests, and XGBoost have demonstrated promising results in RUL prediction through hand-crafted features. 
Jafari et al.[15] introduced a hybrid RUL prediction method based on particle filtering and Kalman filtering, where XGBoost was used as the observation model due to its strong nonlinear fitting capabilities. Despite high predictive accuracy based on full-cycle test data, such methods are heavily dependent on the quality of feature engineering and may suffer from overfitting or inefficiency when applied to high-dimensional time-series data. The emergence of deep learning has opened new avenues for RUL prediction. Long Short-Term Memory (LSTM) networks, known for their capability in modeling temporal dependencies, have been widely adopted for battery degradation modeling. LSTM networks utilize gating mechanisms to effectively capture long-term dependencies, making them well-suited for modeling the nonlinear degradation process of batteries. Reza et al.[16] proposed an improved method combining LSTM with the Gravitational Search Algorithm (GSA), using data cleaning to remove noise, replacing anomalies with highly correlated data, and applying normalization. GSA was employed to optimize the LSTM hyperparameters to address key challenges in battery life prediction. However, LSTM models may still face limitations such as vanishing gradients and computational inefficiencies when dealing with long sequences. Recently, Transformer models have demonstrated excellent performance in natural language processing and time-series analysis tasks due to their strong capability in extracting global features[17,18]. The Transformer architecture processes sequences in parallel through multi-head attention mechanisms, effectively capturing long-range dependencies. Nevertheless, its application in battery RUL prediction remains relatively unexplored. 
Given the above challenges, accurate RUL prediction remains difficult due to the following key factors: (1) the degradation process is highly nonlinear and influenced by external factors such as temperature, cycling rate, and usage scenarios; (2) real-world data are often noisy and incomplete, complicating the modeling process; (3) long-term prediction tasks require models to simultaneously capture short-term variations and long-term trends, which is difficult for single models to handle effectively. To address these challenges, this study develops a hybrid deep learning model that captures both local temporal dependencies and global contextual features. As illustrated in Figure 1, we propose a novel LSTM-Transformer hybrid model based on the MIT battery dataset to enhance prediction accuracy and robustness. The LSTM module extracts local temporal dynamics from time-series inputs, while the Transformer module, with its attention mechanism, captures global feature dependencies. The integration of both allows the model to effectively represent complex degradation patterns. The main contributions of this work are as follows: (1) A novel hybrid deep learning architecture combining local temporal modeling and global attention mechanisms is proposed for lithium-ion battery RUL prediction. (2) Temperature-based features are engineered based on battery physical mechanisms to enhance input feature expressiveness. (3) The proposed model is validated on the MIT battery dataset, demonstrating superior prediction accuracy and robustness compared to existing methods, with strong potential for real-world applications.
The remainder of this paper is organized as follows: Section 2 introduces the data preprocessing and feature engineering methods; Section 3 details the architecture and training process of the LSTM-Transformer hybrid model; Section 4 presents experimental results and performance evaluation; Section 5 concludes the paper with a summary of findings and future work directions. Through this research, we aim to provide an efficient and accurate solution for lithium-ion battery RUL prediction, offering theoretical and technical support for optimized battery management systems.
Figure 1. Schematic diagram of the LSTM-Transformer hybrid model architecture.
2. Data Preprocessing and Feature Engineering
Accurate prediction of the remaining useful life of lithium-ion batteries critically depends on high-quality data preprocessing and feature engineering. To construct a time-series modeling-ready prediction dataset, this study systematically performs data cleaning, temperature feature extraction, normalization, target variable transformation, and sample construction based on a sliding window. By incrementally sliding a fixed-length window over the time-series data to extract local segments, representative features or sequential samples are generated. This approach facilitates the capture of local temporal dependencies and dynamic variations[19], thereby ensuring that the input data fed into the model possesses strong representativeness and consistency. The original dataset used in this work is the MIT Battery Dataset, which includes operational data from multiple batteries under various charging and discharging conditions. Key variables include voltage, current, and temperature. Let the raw feature matrix be denoted as $X \in \mathbb{R}^{n \times d}$, where $n$ represents the number of samples and $d$ indicates the number of feature dimensions. The target variable is denoted as $y \in \mathbb{R}^{n}$.
To explore the linear relationships between different features, we construct a Pearson correlation-based heatmap, as illustrated in Figure 2(a). In this heatmap, red indicates strong positive correlations, while blue denotes strong negative correlations. The heatmap reveals significant correlations among several voltage-related, capacity-related, and temperature-related features. Based on this analysis, redundant features are removed to improve the generalization ability of the model and to mitigate issues related to multicollinearity. (a) (b) Figure 2. (a) Feature correlation heatmap based on Pearson coefficients. (b) Feature space distribution of battery life using Principal Component Analysis. To comprehensively evaluate the structure of the feature space, Principal Component Analysis was applied to the high-dimensional feature data after redundancy removal, as shown in Figure 2(b). The first two principal components were visualized, with a color gradient representing the corresponding battery RUL values. This visualization enables the observation of degradation trends within the feature space. The results indicate that samples with different RUL levels exhibit clear clustering patterns in the two-dimensional PCA space, demonstrating the discriminative capability of the extracted features in characterizing battery degradation states. Temperature, as a critical factor influencing lithium-ion battery aging and performance degradation, plays a key role in modeling degradation behavior. Analysis reveals that the raw temperature features (T1-0 to T3-3) exhibit significant dynamic fluctuations during battery operation and are highly correlated with changes in the RUL curve. To this end, three types of temperature-derived features are designed: temperature mean, temperature range, and temperature fluctuation. The temperature mean reflects the overall thermal load level during battery operation. 
Elevated operating temperatures accelerate electrolyte decomposition, solid electrolyte interphase (SEI) layer growth, and structural degradation of electrode materials, which are key contributors to capacity fade and internal resistance increase. The temperature range measures the amplitude of temperature variation within each time window, indicating the degree of thermal stress fluctuation. Frequent and intense thermal stress cycles may induce mechanical fatigue or even cracking of electrode particles, exacerbating material degradation and performance decline. Temperature fluctuation, quantified by the standard deviation, characterizes the local instability of temperature over time, typically associated with abnormal conditions such as high-current charge/discharge events and cooling system failures. These factors are prone to cause localized hotspots and accelerate undesirable electrochemical side reactions, ultimately shortening battery life. For each temperature sensor group, the temperature mean is defined as follows:
$$T_i^{\mathrm{mean}} = \frac{1}{m} \sum_{j=0}^{m-1} T_{i,j} \tag{1}$$
In this context, $T_{i,j}$ denotes the $j$-th temperature channel within the $i$-th group, where $m = 4$ represents the number of channels in each group. The corresponding temperature difference range is defined as follows:
$$T_i^{\mathrm{range}} = \max_{0 \le j < m} T_{i,j} - \min_{0 \le j < m} T_{i,j} \tag{2}$$
The above two features respectively characterize the central tendency and extreme dispersion of each temperature group, which can reflect phenomena such as localized overheating or abnormal heat dissipation. In addition, to capture the overall fluctuation level of the thermal behavior, the standard deviation across all temperature channels is calculated and used as an indicator of temperature volatility:
$$\sigma_{\mathrm{temp}} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( T_k - \bar{T} \right)^2} \tag{3}$$
Here, $K = 12$ denotes the total number of temperature channels, $T_k$ represents the value of the $k$-th temperature channel, and $\bar{T}$ is the mean temperature across all channels.
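As a minimal sketch of Eqs. (1)-(3), the three temperature-derived features can be computed with NumPy as shown below. The function name `temperature_features`, the column ordering (T1-0 to T1-3, then T2-0 to T2-3, then T3-0 to T3-3), and the sample readings are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def temperature_features(temps, n_groups=3, m=4):
    """Compute temperature mean, range, and fluctuation per sample.

    temps: array of shape (n_samples, n_groups * m). Columns are assumed
    to be ordered group by group (T1-0..T1-3, T2-0..T2-3, T3-0..T3-3).
    """
    temps = np.asarray(temps, dtype=float)
    grouped = temps.reshape(temps.shape[0], n_groups, m)
    t_mean = grouped.mean(axis=2)                         # Eq. (1), one value per group
    t_range = grouped.max(axis=2) - grouped.min(axis=2)   # Eq. (2), one value per group
    t_std = temps.std(axis=1)                             # Eq. (3), population std over all K channels
    return t_mean, t_range, t_std

# Hypothetical readings for one sample (12 channels, degrees Celsius).
sample = np.array([[25.0, 26.0, 27.0, 28.0,
                    30.0, 30.0, 30.0, 30.0,
                    20.0, 22.0, 24.0, 26.0]])
t_mean, t_range, t_std = temperature_features(sample)
```

The population form of the standard deviation (division by $K$) is used here; if the original work divides by $K - 1$, passing `ddof=1` to `std` would give the corresponding value.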
These three temperature-derived features not only enhance the semantic expressiveness of the data but also provide physically consistent inputs aligned with the underlying battery aging mechanisms. To eliminate dimensional discrepancies among features and to improve training stability, Z-score normalization is applied to the feature matrix, ensuring that each feature has zero mean and unit variance before being fed into the model:
$$X_{\mathrm{scaled}} = \frac{X - \mu}{\sigma} \tag{4}$$
Here, $\mu$ and $\sigma$ denote the mean and standard deviation of each feature column, respectively, and $X_{\mathrm{scaled}}$ represents the normalized feature matrix. Placing all features on a common scale facilitates faster convergence of gradient descent during model training and enhances generalization performance. Meanwhile, the Yeo-Johnson transformation[20] is applied to the target variable for nonlinear processing. This parameterized transformation adjusts the data distribution to approximate a normal distribution, thereby enhancing the stability and accuracy of subsequent model training. The transformation is defined as follows:
$$y_{\mathrm{trans}} = \begin{cases} \dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & y \ge 0,\ \lambda \ne 0 \\ \ln(y + 1), & y \ge 0,\ \lambda = 0 \\ -\dfrac{(-y + 1)^{2 - \lambda} - 1}{2 - \lambda}, & y < 0,\ \lambda \ne 2 \\ -\ln(-y + 1), & y < 0,\ \lambda = 2 \end{cases} \tag{5}$$
Here, $\lambda$ denotes the transformation parameter, which is automatically estimated using the maximum likelihood method. The transformed variable $y_{\mathrm{trans}}$ exhibits a more symmetric distribution, which is beneficial for subsequent model convergence and stable error control. To accommodate the requirements of deep learning-based time series modeling, the dataset is reconstructed into a sliding window format. As illustrated in Figure 3, let the input window length be $w_{in} = 30$ and the prediction horizon be $w_{out} = 7$.
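The standardization of Eq. (4), the target transformation of Eq. (5), and the sliding-window construction with an input length of 30 and a horizon of 7 can be sketched together as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the data are synthetic, and the Yeo-Johnson parameter is fixed at 0.5 for demonstration, whereas the paper estimates it by maximum likelihood.

```python
import numpy as np

def zscore(X):
    """Column-wise standardization as in Eq. (4)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (X - mu) / sigma

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of Eq. (5) for a given (not estimated) lambda."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -(((-y[~pos] + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def make_windows(X, y, w_in=30, w_out=7):
    """Pair each w_in-step input window with the w_out targets that follow it."""
    n_samples = len(X) - w_in - w_out + 1
    xs = np.stack([X[t:t + w_in] for t in range(n_samples)])
    ys = np.stack([y[t + w_in:t + w_in + w_out] for t in range(n_samples)])
    return xs, ys

# Synthetic stand-in data: 100 time steps, 5 features, a decaying target.
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(100, 5))
y = np.linspace(1.0, 0.0, 100)

X_scaled = zscore(X)
y_trans = yeo_johnson(y, lam=0.5)   # lambda = 0.5 is an arbitrary illustrative value
xs, ys = make_windows(X_scaled, y_trans)
```

With 100 time steps, the sketch yields 100 - 30 - 7 + 1 = 64 samples, each pairing a 30-step input window with the 7 target values that immediately follow it.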
For the normalized feature matrix $X_{\mathrm{scaled}} \in \mathbb{R}^{n \times d}$ and the transformed target variable $y_{\mathrm{trans}} \in \mathbb{R}^{n}$, each training sample is constructed from the following subsequences:
$$x_t = X_{\mathrm{scaled}}[t : t + w_{in}] \in \mathbb{R}^{w_{in} \times d}, \qquad y_t = y_{\mathrm{trans}}[t + w_{in} : t + w_{in} + w_{out}] \in \mathbb{R}^{w_{out}} \tag{6}$$
The total number of samples that can be constructed from the dataset is given by:
$$N = n - w_{in} - w_{out} + 1 \tag{7}$$
The final dataset was split into training and testing sets at a ratio of 8:2, with the training set randomly shuffled to enhance sample diversity and training robustness. The data preprocessing pipeline significantly improved the semantic representation and structural compatibility of the input data. The design of temperature-derived features was closely aligned with the underlying battery physical mechanisms. Standardization and nonlinear transformation ensured numerical stability during model training, while the sliding window data construction effectively captured the dynamic evolution of battery degradation. Together, these steps laid a solid foundation for subsequent battery life prediction modeling based on the LSTM-Transformer architecture.
3. Model Architecture and Training Process
To achieve high-precision regression prediction of the remaining useful life of lithium-ion batteries, this study designs a deep neural network model that integrates Long Short-Term Memory networks with the Transformer architecture. The model leverages the strength of LSTM in capturing local temporal dynamics in time series data and the powerful capability of Transformer in modeling global dependencies, thereby enhancing the ability to characterize the evolving performance trends of batteries. Time series features are fed into two parallel subnetworks for separate encoding, followed by feature-level fusion to ultimately predict the battery life over multiple future time steps.
3.1 LSTM Model
In lithium-ion battery life prediction, operational features such as voltage, temperature, and current exhibit pronounced temporal correlations.
Single-step or short-range modeling approaches often fail to capture the complex evolutionary processes. The Long Short-Term Memory network employs gating mechanisms to propagate information along the temporal dimension, enabling the model to capture nonlinear dynamic changes with long-term dependencies. Figure 4 illustrates the neural network architecture of the LSTM.
Figure 4. Computational Process of the LSTM Neural Network
The fundamental computational process of the LSTM is as follows. The forget gate determines whether to retain the cell state from the previous time step $c_{t-1}$ at the current time step, and its formulation is given by:
$$f_t = \sigma\left( W_f [h_{t-1}, x_t] + b_f \right) \tag{8}$$
Here, $[h_{t-1}, x_t]$ represents the concatenation of the previous hidden state $h_{t-1}$ and the current input features $x_t$; $W_f$ and $b_f$ denote learnable parameters; $\sigma$ is the sigmoid activation function, whose output ranges from 0 to 1, indicating the retention proportion. The input gate controls how much of the current input information is written into the cell state, consisting of two steps:
$$i_t = \sigma\left( W_i [h_{t-1}, x_t] + b_i \right) \tag{9}$$
$$\tilde{c}_t = \tanh\left( W_c [h_{t-1}, x_t] + b_c \right) \tag{10}$$
Here, $i_t$ denotes the input gate weights, $\tilde{c}_t$ represents the candidate cell state, and the hyperbolic tangent function $\tanh$ ensures that the output range is within $[-1, 1]$, thereby enhancing the model's nonlinear fitting capability. The cell state is updated by combining the weights of the forget gate and the input gate to revise the memory from the previous time step:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{11}$$
In this equation, $\odot$ denotes element-wise multiplication, and the final current cell state $c_t$ is obtained, enabling long-range retention of critical historical information.
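The cell-state update of Eqs. (8)-(11) can be illustrated with a small NumPy sketch. It covers only the forget gate, input gate, candidate state, and cell-state update; the function name `lstm_cell_update`, the parameter dictionary layout, and the toy dimensions are assumptions for illustration rather than the configuration used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_update(x_t, h_prev, c_prev, params):
    """One cell-state update following Eqs. (8)-(11)."""
    hx = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ hx + params["b_f"])       # forget gate, Eq. (8)
    i_t = sigmoid(params["W_i"] @ hx + params["b_i"])       # input gate, Eq. (9)
    c_tilde = np.tanh(params["W_c"] @ hx + params["b_c"])   # candidate state, Eq. (10)
    c_t = f_t * c_prev + i_t * c_tilde                      # cell update, Eq. (11)
    return f_t, i_t, c_tilde, c_t

# Toy dimensions (hypothetical): 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
params = {name: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
          for name in ("W_f", "W_i", "W_c")}
params.update({name: np.zeros(n_hidden) for name in ("b_f", "b_i", "b_c")})

x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_hidden)
c_prev = np.ones(n_hidden)
f_t, i_t, c_tilde, c_t = lstm_cell_update(x_t, h_prev, c_prev, params)
```

Because the forget gate multiplies the previous cell state element-wise, values of `f_t` close to 1 carry the old memory forward almost unchanged, which is how the cell retains information over many time steps.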
The output gate determines the amount of information output as the current hidden state, expressed as follows:
$$o_t = \sigma\left( W_o [h_{t-1}, x_t] + b_o \right) \tag{12}$$
$$h_t = o_t \odot \tanh(c_t) \tag{13}$$
Here, $h_t$ represents the hidden state at the current time step, serving as the response to the current sequential input and facilitating information flow to subsequent layers. Through the aforementioned gating mechanisms, the LSTM can effectively learn the stage-wise patterns and long-term dependencies in time series data, enabling accurate modeling of the performance evolution process in prediction tasks.
3.2 Transformer Model
Although LSTM performs well in sequence modeling, it suffers from gradient decay and low training efficiency when handling long-term dependencies. To address these issues, the Transformer architecture is introduced, which establishes direct connections between any positions within the sequence through a multi-head self-attention mechanism, thereby enhancing the model's ability to capture global dynamics. The core computation in the Transformer is the scaled dot-product attention, defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V \tag{14}$$
Here, $Q$, $K$, and $V$ represent the Query, Key, and Value matrices, respectively, and $\sqrt{d_k}$ is a scaling factor used to prevent numerical instability. Attention weights are obtained via a softmax operation, enabling a weighted fusion of information across all time steps. Since the Transformer lacks an explicit sequential structure, positional encoding is introduced to preserve temporal order. The fixed sinusoidal positional encoding scheme is employed as follows:
$$PE(pos, 2i) = \sin\left( \frac{pos}{10000^{2i / d_{\mathrm{model}}}} \right), \qquad PE(pos, 2i + 1) = \cos\left( \frac{pos}{10000^{2i / d_{\mathrm{model}}}} \right) \tag{15}$$
Here, $pos$ denotes the position index of the current time step, $i$ represents the dimension index, and $d_{\mathrm{model}}$ is the embedding dimension.
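To make Eqs. (14) and (15) concrete, the following NumPy sketch implements single-head scaled dot-product attention and the fixed sinusoidal positional encoding. It is a simplified illustration, not the paper's implementation: it omits the learned projections and multiple heads of a full encoder layer, and the sequence length of 30 and embedding dimension of 64 are assumed values chosen to match the input window and the 64-dimensional feature vector mentioned in the text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention following Eq. (14)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilizes the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed positional encoding following Eq. (15); d_model is assumed even."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Assumed sizes: 30 time steps, 64-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model = 30, 64
pe = sinusoidal_positional_encoding(seq_len, d_model)
x = rng.normal(size=(seq_len, d_model)) + pe      # embeddings plus position information
out, weights = scaled_dot_product_attention(x, x, x)
```

Each row of `weights` sums to 1, so every output time step is a convex combination of the value vectors from all 30 positions, which is what gives the attention branch its direct access to the whole window.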
This encoding scheme enables the model to perceive the sequential order, thereby allowing it to capture temporal patterns such as periodicity and trends during modeling. In the proposed model, the Transformer consists of two stacked encoder layers, each comprising a multi-head attention sublayer and a feed-forward neural network. The output is a sequence-level global feature representation, which is subsequently aggregated via average pooling to obtain a fixed-length vector $h_{\mathrm{Trans}}$.
3.3 Feature Fusion and Prediction Output
Considering the complementary strengths of LSTM and Transformer in modeling different aspects of sequential data, a feature-level fusion strategy is employed. The hidden representations generated by each sub-network are concatenated to form a unified feature vector for downstream prediction. The fusion process is formulated as follows:
$$h_{\mathrm{fusion}} = \mathrm{Concat}(h_{\mathrm{LSTM}}, h_{\mathrm{Trans}}) \tag{16}$$
Here, $h_{\mathrm{LSTM}} \in \mathbb{R}^{64}$ denotes the final hidden state output from the LSTM branch, and $h_{\mathrm{Trans}} \in \mathbb{R}^{64}$ represents the global feature vector obtained by average pooling from the Transformer module. These two components are concatenated to form a 128-dimensional fused vector. This fused representation is then passed through fully connected layers to perform multi-step regression prediction, as defined by the following expression:
$$\hat{Y} = W_{out} h_{\mathrm{fusion}} + b_{out} \tag{17}$$
Here, $\hat{Y} \in \mathbb{R}^{7}$ denotes the predicted remaining useful life percentages for the next seven time steps, and $W_{out}$ and $b_{out}$ are the parameters of the linear projection. This structure enables the model to perform multi-step forecasting of long-term degradation trends, thereby supporting early warning and precise management of battery life. To achieve multi-step prediction of battery remaining useful life, a parallel forecasting strategy is adopted. Given a fixed-length historical input window, the model performs a single forward pass to directly output the target value sequence over the entire prediction horizon.
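The shape bookkeeping of the fusion step in Eqs. (16) and (17) can be sketched as follows. The branch outputs here are random stand-ins rather than real LSTM or Transformer activations, the function name `fuse_and_predict` is hypothetical, and a single linear projection is used as written in Eq. (17); any additional fully connected layers in the actual model are not reproduced.

```python
import numpy as np

def fuse_and_predict(h_lstm, trans_seq, W_out, b_out):
    """Average-pool the Transformer output, concatenate, and project.

    h_lstm:    (batch, 64) final hidden state of the LSTM branch
    trans_seq: (batch, seq_len, 64) sequence output of the Transformer branch
    """
    h_trans = trans_seq.mean(axis=1)                        # average pooling over time
    h_fusion = np.concatenate([h_lstm, h_trans], axis=1)    # Eq. (16), 128 dimensions
    y_hat = h_fusion @ W_out.T + b_out                      # Eq. (17), all 7 steps at once
    return h_fusion, y_hat

# Random stand-ins for the two branch outputs (illustrative only).
rng = np.random.default_rng(0)
batch, seq_len = 8, 30
h_lstm = rng.normal(size=(batch, 64))
trans_seq = rng.normal(size=(batch, seq_len, 64))
W_out = rng.normal(scale=0.1, size=(7, 128))
b_out = np.zeros(7)

h_fusion, y_hat = fuse_and_predict(h_lstm, trans_seq, W_out, b_out)
```

All seven horizon values come out of one matrix product, so no prediction is fed back as an input, in contrast to a recursive one-step-ahead scheme.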
This parallel forecasting approach avoids error accumulation during the prediction process and improves both prediction stability and computational efficiency.
3.4 Loss Function and Optimization Strategy
The prediction of the remaining useful life of lithium-ion batteries is essentially a regression task, where the target variable is a continuous percentage value. Therefore, the mean squared error (MSE) is adopted as the loss function for optimization. It is defined as follows:
$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y^{(i)} - \hat{Y}^{(i)} \right)^2 \tag{18}$$
The variable $\hat{Y}^{(i)}$ denotes the predicted RUL value of the $i$-th sample generated by the model, $Y^{(i)}$ represents the corresponding ground truth label, and $N$ is the total number of samples. As a widely used regression loss function, MSE penalizes the squared prediction error, so large deviations are weighted more heavily than small ones and the model is pushed to avoid them. During training, the Adam optimizer is employed for parameter updates. This optimization algorithm integrates the advantages of momentum and adaptive learning rate adjustment, offering fast convergence and flexible parameter tuning. The initial learning rate is set to $1 \times 10^{-3}$, and the total number of training epochs is set to 200. Furthermore, mini-batch gradient descent with a batch size of 32 is used to enhance both the training stability and computational efficiency.
4. Experimental Results and Performance Analysis
A comprehensive evaluation was conducted to assess the predictive performance of the proposed LSTM-Transformer hybrid model on the MIT battery dataset, demonstrating its effectiveness and superiority in the task of remaining useful life prediction for lithium-ion batteries. Model training and testing were performed on the publicly available MIT battery degradation dataset. After undergoing data preprocessing and temporal windowing, the dataset was restructured into time series samples with an input sequence length of 30 and an output prediction horizon of 7 steps.
The objective was to forecast the battery capacity degradation trend over the next 7 cycles. During training, the Adam optimizer was employed with an initial learning rate set to 0.001, and an Early Stopping mechanism was integrated to prevent overfitting. Architecturally, the LSTM layer captures local temporal dynamics, while the Transformer module exploits its global attention mechanism to model long-term dependencies. The synergistic integration of both enhances the model's capacity to capture the complex degradation behaviors of batteries. 4.1 Model Training Process As shown in Figure 5, the loss curves of both the training and testing sets over 200 epochs illustrate the model's convergence behavior. It can be observed that the loss values decrease steadily with increasing training epochs, particularly during the initial 50 epochs where a rapid drop is evident—indicating efficient convergence. At epoch 50, the training loss decreased to 0.0021 and the testing loss reached 0.0054. Although the training loss continued to decrease afterward, the testing loss plateaued and remained low, reflecting the model's strong ability to avoid overfitting. By the end of training at epoch 200, the training loss converged to 0.0008, and the testing loss stabilized around 0.0038, suggesting strong generalization on unseen data. These training dynamics and test loss results demonstrate the model's stability and robust convergence characteristics. Figure 5. Training and Testing Loss Curves over Epochs 4.2 Performance Evaluation Metrics for Model Prediction To comprehensively and objectively evaluate the performance of the proposed model in battery life prediction tasks, three commonly used regression evaluation metrics are employed: Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R²). 
These metrics assess the prediction performance from three perspectives: absolute error, relative percentage error, and the model's ability to explain variance in the data. Collectively, they offer a robust evaluation framework for assessing the accuracy, stability, and generalization capability of data-driven predictive models. These metrics are also widely adopted in current state-of-the-art regression-based forecasting studies. RMSE, which quantifies the standard deviation of the prediction errors between the predicted and actual values, is defined as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{19}$$
where $y_i$ denotes the actual value of the $i$-th sample, $\hat{y}_i$ represents the predicted value of the $i$-th sample, and $n$ is the total number of samples. A lower RMSE indicates smaller deviations between predicted and true values, reflecting higher model accuracy. MAPE, or Mean Absolute Percentage Error, measures the relative percentage deviation between predicted and actual values. It is defined as follows:
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \tag{20}$$
This metric intuitively reflects the percentage error of the predicted values relative to the true values, making it suitable for comparing prediction accuracy across different scales. The coefficient of determination R² is used to measure the goodness of fit of the model and is defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{21}$$
Here, $\bar{y}$ denotes the mean of all true values. For a reasonably fitted model, R² lies in $[0, 1]$, where values closer to 1 indicate a higher degree of model fit and greater explained variance.
4.3 Model Performance Analysis
The final evaluation results of the model's predictive performance are as follows: RMSE of 0.0085, MAPE of 0.0200, and an R² of 0.9902. These results demonstrate that the model exhibits excellent capability in fitting accuracy, error control, and capturing the variation trends of the target variable.
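The three metrics of Eqs. (19)-(21) can be computed with a few lines of NumPy, as in the sketch below. The function names and the four toy values are chosen purely for illustration and are unrelated to the results reported above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (19)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (20); y_true must be nonzero."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (21)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy values for illustration only.
y_true = np.array([1.0, 0.8, 0.5, 0.4])
y_pred = np.array([0.9, 0.8, 0.6, 0.4])

print(f"RMSE = {rmse(y_true, y_pred):.4f}")
print(f"MAPE = {mape(y_true, y_pred):.2f}%")
print(f"R2   = {r2_score(y_true, y_pred):.4f}")
```

Note that MAPE divides by the true value, so it becomes unstable as the target approaches zero, for example near end of life when the remaining-life percentage is small.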
To more intuitively illustrate the model's predictive effectiveness, Figure 6(a) presents a scatter plot comparing the predicted values with the true values. It can be observed that the majority of scatter points are densely clustered around the reference line, indicating a high consistency between the model's predictions and the actual battery life across different samples. This tightly concentrated scatter distribution not only validates the high R² value but also indirectly suggests that the model does not suffer from significant underfitting or overfitting, thereby demonstrating strong generalization ability. Figure 6(b) shows the comparison between the predicted and true trajectories on the test set. By forecasting future states over consecutive time steps, it is evident that the proposed LSTM-Transformer hybrid model can effectively fit the target trend, with predictions closely matching the real values and no notable deviations. This result indicates that the model possesses reliable short-term predictive capability, effectively adapting to the nonlinear gradual degradation characteristics of the battery state sequence, thus meeting the engineering requirements for remaining useful life prediction. (a) (b) Figure 6. (a) Scatter plot of predicted values versus true values (b) Comparison of predicted and true trajectories during the training process For further analysis of prediction errors, Figure 7(a) presents the frequency distribution histogram of the model's prediction residuals, aiming to reveal whether there exist systematic biases or outliers in the errors. As observed, the residuals roughly exhibit a symmetric bell-shaped distribution, with most errors concentrated near zero, indicating that the overall prediction errors are small and unbiased. 
This statistical characteristic of the error distribution suggests that the prediction deviations mainly arise from minor perturbations inherent in the data rather than from systematic errors caused by the model structure. Additionally, the absence of long tails or skewness in the residual distribution further confirms the stability and consistency of the model's predictions. Figure 7(b) depicts the temporal variation of prediction errors for all samples in the test set during the prediction process. Overall, the prediction errors fluctuate slightly without persistent systematic bias, demonstrating that no significant underfitting or overfitting occurred during training. (a) (b) Figure 7. (a) Histogram of model residual frequency distribution (b) Temporal variation of prediction errors To more clearly present the comparative effects of the evaluation metrics, Figure 8(a) visualizes the three core performance indicators RMSE, MAPE, and R² using a bar chart. It can be intuitively observed that all metrics fall within excellent ranges: RMSE approaches zero, MAPE is well below the commonly accepted 5% tolerance threshold for predictive models, and R² significantly exceeds the benchmark of 0.9 for strong model fit. This visualization not only facilitates a comprehensive and balanced demonstration of the model's performance but also enables straightforward comparison with traditional models, providing important references for subsequent optimization studies. (a) (b) Figure 8. (a) Bar chart of evaluation metrics for the LSTM-Transformer model (b) Comparison of RMSE performance among different models In summary, both quantitative metrics and visual analyses demonstrate that the proposed LSTM-Transformer hybrid model exhibits high accuracy, robustness, and interpretability in the battery remaining useful life prediction task. 
The model performs well not only on individual metrics but also in fitting overall degradation trends and controlling prediction errors, providing empirical support for multimodal fusion approaches to complex time-series forecasting problems. These outcomes also lay a foundation for the model's future application in practical engineering scenarios.

To further validate the performance advantages of the proposed LSTM-Transformer model in lithium-ion battery RUL prediction, several representative benchmark models were selected for comparative experiments, as shown in Figure 8(b). Their prediction accuracies on the same dataset, expressed as RMSE values, are summarized in Table 1. The AUKF_GASVR model, which combines adaptive unscented Kalman filtering (AUKF) with genetic-algorithm-optimized support vector regression (GA-SVR), and the deep learning-based MC-LSTM model achieved RMSEs of 0.0134 and 0.0168, respectively, so both still showed relatively large fitting errors. Models that introduce attention mechanisms and ensemble learning strategies performed better. The Bi-LSTM-AM model, which combines bidirectional sequence modeling with an attention mechanism, reduced the RMSE to 0.0106, and the FBA-XGBoost-LSTM model, which combines feature enhancement with deep network integration, reduced it further to 0.01003. Among all compared models, the proposed LSTM-Transformer model achieved the lowest RMSE, 0.0085, which is about 15% below that of the next-best model (0.01003). These results indicate that the LSTM-Transformer hybrid model effectively integrates the LSTM's strength in capturing local temporal dynamics with the Transformer's ability to extract global dependency features in time-series modeling.
This synergy enables more comprehensive learning of the complex mechanisms underlying battery life evolution, yielding higher accuracy and robustness and making the model an efficient solution for current RUL prediction tasks.

Table 1. Performance comparison of different models.

Model                   RMSE
AUKF_GASVR[21]          0.0134
MC-LSTM[22]             0.0168
Bi-LSTM-AM[23]          0.0106
FBA-XGBoost-LSTM[24]    0.01003
LSTM-Transformer        0.0085

5. Conclusion

With the widespread adoption of electrification and intelligent systems in transportation, energy storage, and industrial control, health management and remaining useful life prediction of lithium-ion batteries have become critical to system safety and operational efficiency. To address the difficulty of RUL prediction under the nonlinear and complex degradation mechanisms that arise during battery operation, this work constructs a hybrid deep learning model that integrates Long Short-Term Memory networks with the Transformer architecture, based on the publicly available MIT battery dataset. The model aims to improve prediction accuracy, stability, and generalization capability.

This study centers on the theme of "high-dimensional sequence modeling and deep fusion prediction." First, in data preprocessing and feature engineering, raw sensor data including battery temperature and multi-channel voltages were systematically processed. Feature dimensionality reduction, principal component analysis, and physics-informed design of temperature-derived features were carried out to construct interpretable input variables such as temperature mean, temperature difference range, and temperature fluctuation. In addition, unified data normalization techniques, including Z-score standardization and the Yeo-Johnson transformation, were applied to improve the model's ability to handle multi-scale and heterogeneous distributions.
Furthermore, to suit the sequence prediction task, a sliding window method was used to reconstruct the time series samples, embedding local temporal dynamics in each sample.

Second, at the model design level, this work proposes a fusion approach that combines LSTM and Transformer structures. The LSTM module uses gating mechanisms to capture both short-term fluctuations and long-term dependencies, which suits the continuous, staged dynamics of battery degradation. The Transformer module employs multi-head attention and positional encoding to model global dependencies across the entire input sequence, improving expressiveness on long sequences. By concatenating and fusing the feature vectors from both modules, a multimodal prediction model was constructed that captures local temporal dynamics and global structural variations simultaneously.

In terms of training optimization and performance evaluation, a training framework based on mean squared error loss and the Adam optimizer was established, achieving stable convergence after 200 iterations. On the test set, the proposed LSTM-Transformer model attained an RMSE of 0.0085, a MAPE of 0.0200, and an R² of 0.9902, outperforming conventional single deep learning models. Residual distribution analysis and visualization of the prediction results further confirmed that the model captures battery degradation trends, performs robustly, and shows no systematic bias, indicating substantial potential for engineering applications.

However, the proposed method still faces limitations in practical deployment, such as its reliance on high-quality sensor data and its limited transferability across different usage scenarios.
Future research will focus on enhancing the model's adaptability to multi-source data and exploring strategies that integrate online learning with few-shot learning to improve its practicality and robustness. In summary, the proposed LSTM-Transformer fusion prediction model exhibits high accuracy and stability in battery RUL forecasting. It provides effective technical support for the development of next-generation intelligent battery management systems, with promising prospects for practical engineering deployment and broader adoption.
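For readers who wish to reproduce the three evaluation metrics reported above, the following minimal Python sketch computes RMSE, MAPE, and R² from paired true and predicted values. It is an illustration only and not the authors' evaluation code: the function names and the toy values are hypothetical, and MAPE is returned as a fraction rather than a percentage, which is an assumption about how the reported value of 0.0200 is expressed.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction.

    Assumes no true value is zero (division by the true value).
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the true values are not all identical (SS_tot > 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy values, not taken from the MIT battery dataset.
    true_values = [1.0, 2.0, 4.0]
    predicted_values = [1.1, 1.9, 4.2]
    print("RMSE:", rmse(true_values, predicted_values))
    print("MAPE:", mape(true_values, predicted_values))
    print("R2:  ", r2(true_values, predicted_values))
```

RMSE is expressed in the units of the target variable, so it is only comparable across models evaluated on identically scaled targets, as in Table 1. MAPE and R² are scale-free, which is why they can be read against fixed reference levels such as 5% and 0.9.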
Keywords: lithium-ion battery, remaining useful life, LSTM, Transformer, time-series prediction
Received: 27 Aug 2025; Accepted: 13 Oct 2025.
Copyright: © 2025 Zhao, ZHANG, Wang, Feng, Cao and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Yanhui ZHANG, zhangyh@siat.ac.cn
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.