Evaluation of the State of Health of Lithium-Ion Battery Based on the Temporal Convolution Network

The state of health (SOH) of lithium-ion batteries is a key quantity tracked by the battery management system (BMS). An accurate SOH estimate allows the battery to be replaced in time and helps avoid accidents. To address the complex BMS management and high computational cost caused by too many inputs/attributes, this study uses feature engineering to mine the temperature change rate, the feature most strongly associated with capacity degradation, as the input of a temporal convolutional network (TCN), with SOH as the output. The resulting TCN model is verified on three lithium-ion batteries, B0005, B0007, and B0018; the mean absolute error (MAE) and root mean square error (RMSE) of the predicted SOH do not exceed 1.455% and 1.800%, respectively. To further express the uncertainty of the predicted SOH, this study adopts a sampling method to obtain the confidence interval of the SOH prediction results.


INTRODUCTION
The lithium-ion battery is favored for its high energy density, long service life, high stability, and moderate price. It has the highest degree of commercialization among battery technologies and is widely used in aerospace, electric vehicles, and smart grids. However, frequent fires and explosions have gradually drawn attention to the aging problem behind battery safety (Zhang and Lee, 2011; Li et al., 2021a). State of health (SOH) is an important indicator in the battery management system (BMS). With an accurate SOH estimate, the battery can be replaced at the right time or the charging strategy can be changed (Wang et al., 2022) to prolong battery life. Therefore, accurate estimation of SOH is particularly important for control strategy formulation, operation, and maintenance (Meng et al., 2020).
There are two main approaches to SOH evaluation: model-based and data-driven. The former includes the mechanism model, the equivalent circuit model, and the empirical model (Liu et al., 2017), which offer accuracy, simple modeling, and strong robustness, respectively, but suffer from modeling difficulty (Ma et al., 2019), poor dynamic characteristics, and insufficient accuracy, respectively. The data-driven approach, based on statistics and artificial intelligence algorithms (Pang et al., 2014), does not require an explicit battery model; its prediction accuracy depends on feature engineering and algorithm selection. Because of its strong adaptability, it is more widely used. At present, SOH is generally defined as follows (Zhang and Lee, 2011):
$$\mathrm{SOH} = \frac{C_\tau}{C_N} \times 100\%$$
where $C_N$ is the rated capacity and $C_\tau$ is the maximum capacity that can currently be charged/discharged. As a direct health factor, $C_\tau$ has to be obtained using the ampere-hour integral method, which is time-consuming and has low accuracy. Therefore, it is necessary to explore related parameters (indirect health factors) that indirectly reflect battery performance in order to evaluate SOH. Because electrochemical parameters are not easy to obtain, features have to be extracted from external physical quantities such as voltage, current, and temperature measured while the battery is working. On this basis, a machine learning algorithm is used to evaluate the SOH. Song et al. (2020) and Lu et al. (2020) extracted features from the capacity increment curve and studied battery SOH using Gaussian process regression and an artificial neural network, respectively. Zhang et al. (2021a) extracted the average voltage, voltage difference, current difference, and temperature difference and evaluated the SOH of lithium-ion batteries using XGBoost.
Li et al. (2021b) established an SOH evaluation model that takes charging capacity, charging time, average charging temperature, average charging voltage, discharge temperature, and average discharge voltage as the input and discharge capacity as the output of an LSTM model. Li et al. (2020) directly used the voltage data during charging as the input of a GRU model to evaluate SOH. Shen et al. (2021) and Orchard et al. (2015) took the voltage, current, and temperature in each cycle as inputs and used a CNN to establish the SOH prediction model. However, a large number of input attributes or input data greatly increases the complexity and computational cost of battery management (Olivares et al., 2013). In this study, through feature engineering, the temperature change rate, which has the highest correlation with health status, is mined as the input of the SOH evaluation model. With only a few features or samples, however, it is difficult to reflect the capacity regeneration phenomenon in the aging process of the lithium-ion battery, which affects the accuracy of SOH evaluation. Capacity regeneration (Widodo et al., 2011; Bai et al., 2018) refers to the increase in the maximum available capacity of the battery that occurs when there is a rest interval between charging and discharging cycles. A recurrent neural network can track the capacity decline trend by learning from the time series, but its predictions tend to lag behind the actual values. The temporal convolutional network (TCN) (Li et al., 2019) can also be applied to time series, does not suffer from this lag, and gives better predictions. In addition, most studies focus only on the accuracy of the proposed algorithm, whereas a BMS also needs an expression of the prediction uncertainty. Therefore, based on the TCN model, this study obtains the expression of uncertainty by sampling.

EXTRACTION OF INDIRECT HEALTH FACTORS
Indirect health factors are features that are strongly related to aging and can characterize the SOH of lithium-ion batteries. Many studies use the equal-voltage-drop discharge time (Gou et al., 2020), the equal-voltage-difference charging time, or the discharge voltage sample entropy (Zhang et al., 2021b). However, determining these health factors is complex and cumbersome, so it is necessary to select other factors that are simple and reliable. Figure 1 shows the discharge temperature curve and voltage curve of the B0005 lithium-ion battery in the NASA data set. It can be observed that the temperature curve and voltage curve change gently in the [1,000, 2,000] s range and that, as aging deepens, the slopes of these segments decrease and increase synchronously. Therefore, the temperature change rate and voltage change rate of these curve segments can be used as candidate indirect health factors. In addition, Figure 1 shows that the maximum discharge temperature rises synchronously as battery aging deepens. Because the initial temperature differs somewhat between cycles, the temperature range, rather than the maximum temperature itself, is used as a candidate indirect health factor.

Indirect Health Factor Time Series
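The voltage change rate, temperature change rate, and temperature range in the i-th discharge process take the following forms:
$$\mathrm{VR}_i = \frac{v_{2000} - v_{1000}}{2000 - 1000}, \qquad \mathrm{TR}_i = \frac{T_{2000} - T_{1000}}{2000 - 1000}, \qquad \Delta T_i = T_{\max} - T_{\min}$$
where $v_{2000}$ and $v_{1000}$ are the voltages at 2,000 s and 1,000 s in the i-th discharge process, respectively, $T_{2000}$ and $T_{1000}$ are the temperatures at 2,000 s and 1,000 s in the i-th discharge process, respectively, $T_{\max}$ and $T_{\min}$ are the maximum and minimum temperatures in the i-th discharge process, respectively, and i is the index of the discharge cycle in the whole aging process. Therefore, the time series of indirect health factors over n discharge cycles can be expressed as follows:
$$\mathrm{VR} = \{\mathrm{VR}_1, \ldots, \mathrm{VR}_n\}, \qquad \mathrm{TR} = \{\mathrm{TR}_1, \ldots, \mathrm{TR}_n\}, \qquad \Delta T = \{\Delta T_1, \ldots, \Delta T_n\}$$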
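As an illustration of how the three candidate factors could be computed from a single discharge record, the following sketch uses only the Python standard library. The function names, the linear interpolation at 1,000 s and 2,000 s, and the division by the 1,000 s interval are assumptions made for this example; the original text does not state how values at those instants are read from the sampled curves.

```python
from bisect import bisect_left


def interp(t_query, times, values):
    """Linearly interpolate `values` at `t_query`; `times` must be increasing."""
    if not times[0] <= t_query <= times[-1]:
        raise ValueError("t_query lies outside the recorded discharge")
    j = bisect_left(times, t_query)  # first index with times[j] >= t_query
    if times[j] == t_query:
        return values[j]
    t0, t1 = times[j - 1], times[j]
    weight = (t_query - t0) / (t1 - t0)
    return values[j - 1] + weight * (values[j] - values[j - 1])


def health_factors(times, voltage, temperature, t_start=1000.0, t_end=2000.0):
    """Return (voltage change rate, temperature change rate, temperature range)
    for one discharge record sampled at `times` (seconds)."""
    dt = t_end - t_start
    v_rate = (interp(t_end, times, voltage) - interp(t_start, times, voltage)) / dt
    t_rate = (interp(t_end, times, temperature) - interp(t_start, times, temperature)) / dt
    t_range = max(temperature) - min(temperature)
    return v_rate, t_rate, t_range


# Synthetic record (hypothetical numbers, not NASA data).
times = [0.0, 800.0, 1600.0, 2400.0, 3200.0]
voltage = [4.0 - 0.0002 * t for t in times]
temperature = [24.0 + 0.004 * t for t in times]
print(health_factors(times, voltage, temperature))
```

Applying `health_factors` to every discharge cycle in turn would yield the three candidate time series described above.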

Correlation Analysis
Zhang et al. (2019) used the gray correlation method to analyze the correlation between the candidate indirect health factor series and the capacity series, which is cumbersome. This study instead uses the Pearson correlation coefficient (Zhou et al., 2013).
Thus, the correlation between the candidate indirect health factor sequences and the capacity sequences during discharge can be obtained for the NASA data sets B0005, B0006, B0007, and B0018.
It can be seen from Table 1 that the temperature change rate has the highest correlation with capacity; the relationship is visualized in Figure 2. Therefore, the temperature change rate is finally selected as the indirect health factor.
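The Pearson correlation coefficient used in this selection step can be sketched as follows. This is a minimal standard-library version of the textbook formula, not the authors' code; the function name is chosen for illustration and the example numbers are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical factor and capacity series: the factor rises as capacity fades,
# so the coefficient is close to -1.
factor = [0.0030, 0.0032, 0.0035, 0.0039, 0.0044]
capacity = [1.86, 1.80, 1.72, 1.61, 1.48]
print(pearson(factor, capacity))
```

A coefficient whose absolute value is close to 1 indicates a strong linear relationship, which is the criterion applied when ranking the candidate factors.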

TCN ALGORITHM THEORY
Because traditional neural networks cannot capture the temporal dependencies of a time series, the recurrent neural network (RNN) was proposed. However, as the data scale increases, vanishing and exploding gradients may occur. To overcome this disadvantage, long short-term memory (LSTM), the gated recurrent unit (GRU), and other methods have been derived from the RNN. Although LSTM and GRU show better memory and accuracy than the RNN, the supposed advantage of "infinite memory" does not hold in practice. Li et al. (2019) point out that convolutional structures are superior to recurrent neural networks in tasks such as audio synthesis and machine translation. On this basis, the TCN framework was explored. This framework is superior to recurrent neural networks in memory ability and accuracy, which provides a novel idea and direction for solving time-series problems. The TCN comprises one-dimensional causal convolution, dilated causal convolution, and residual connections.

One-Dimensional Causal Convolution
Like two-dimensional convolution, the one-dimensional convolution in the TCN retains weight sharing. Given a time series $(X_t, Y_t)$, the series is transformed into the input and output of a supervised learning problem. When the input $X_t$ and output $Y_t$ each form a single channel, the output at any time t depends only on an input subsequence whose length equals the convolution kernel length and which ends at the current time; this is causal convolution, as shown in Figure 3. When the convolution kernel length is $k = 3$, then $y_t = x_{t-2} \cdot w_1 + x_{t-1} \cdot w_2 + x_t \cdot w_3$. In addition, the TCN uses a default stride of 1, that is, the input subsequence moves one step at a time.
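The causal convolution described above can be sketched in a few lines of plain Python. Left-padding the input with zeros so that the output has the same length as the input is an assumption of this example (it is a common choice, but the text does not specify the padding); the function name is illustrative.

```python
def causal_conv1d(x, w):
    """Single-channel causal convolution with stride 1.

    For kernel length k = len(w), the output is
    y[t] = x[t-k+1]*w[0] + ... + x[t]*w[k-1],
    with samples before the start of the series treated as zero (assumption).
    """
    k = len(w)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 10.0, 100.0]  # w_1, w_2, w_3 in the notation of the text
print(causal_conv1d(x, w))  # [100.0, 210.0, 321.0, 432.0]
```

Because each output uses only current and past inputs, changing a later sample never alters an earlier output, which is the property that prevents information from leaking from the future into the prediction.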

Dilated Causal Convolution
A deep network can be obtained by stacking one-dimensional causal convolution layers, but the receptive field then grows only at the expense of network depth, which makes long histories difficult to handle. To solve this problem, TCN proponents adopted the dilated causal convolution of the WaveNet model (Ding and Jia, 2019). The dilated causal convolution is defined as follows:
$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i}$$
where $f: \{0, \ldots, k-1\} \to \mathbb{R}$ is the convolution kernel, x is the input sequence, s is the current position, and d is the dilation factor (the base b is usually taken as 2). The dilation factor enlarges the receptive field w while keeping the number of network layers small. For the receptive field w, b is the base dilation factor, n − 1 is the number of network layers, and i is the number of network layers before the current layer.
Note that, unlike the end-of-life (EOL) standard (Sun et al., 2021) defined in IEEE 1188-1996, NASA defines EOL as 70% of the rated capacity, that is, 1.4 Ah. The capacity degradation curves for B0005, B0007, and B0018 are shown in Figure 5.
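To make the effect of dilation on the receptive field concrete, the sketch below implements a single-channel dilated causal convolution and a receptive-field helper in plain Python. The closed form in `receptive_field` assumes one convolution per layer and a dilation of b**i in layer i (i = 0, 1, ...); these are assumptions of this example, and a network whose residual blocks contain two convolutions each, as described in the next subsection, would have a larger receptive field. Function names are illustrative.

```python
def dilated_causal_conv1d(x, f, d):
    """F(s) = sum_i f[i] * x[s - d*i]; samples before the series start count as 0."""
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(f):
            idx = s - d * i
            if idx >= 0:
                acc += weight * x[idx]
        out.append(acc)
    return out


def receptive_field(k, b, n_layers):
    """Receptive field of n_layers stacked layers with kernel size k and
    dilation b**i in layer i (assumes one convolution per layer)."""
    return 1 + (k - 1) * sum(b ** i for i in range(n_layers))


# Push a unit impulse through three layers (k = 2, dilations 1, 2, 4).
signal = [1.0] + [0.0] * 11
for layer in range(3):
    signal = dilated_causal_conv1d(signal, [1.0, 1.0], 2 ** layer)

print(sum(1 for v in signal if v != 0.0))  # 8 outputs are influenced
print(receptive_field(k=2, b=2, n_layers=3))  # 8
```

With b = 2 the receptive field grows exponentially with depth, whereas without dilation (b = 1) the same three layers would cover only 4 time steps.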

Residual Connection
As described above, a suitable choice of convolution kernel size and base dilation factor allows a relatively small number of network layers to achieve a large receptive field. Even so, the network can still become very deep. Therefore, TCN proponents introduced a residual block structure similar to that of ResNet. As shown in Figure 4, the residual block includes a two-layer convolution network and a nonlinear mapping. To normalize the input of the hidden layer and counteract exploding gradients, weight normalization is applied to each convolution layer. However, the architecture described so far can only realize a complicated linear regression, so nonlinearity has to be introduced by adding an activation function. In addition, to prevent overfitting and vanishing gradients and to accelerate model training, dropout is applied as regularization after each convolution layer of each residual block. In the skip connection, the number of channels in the input layer and the output layer may differ, so the output and input cannot be added directly. To solve this problem, a 1 × 1 convolution can be introduced to ensure that the two tensors have the same shape.
The output of the residual block is $o = \mathrm{Activation}(x + F(x))$, where x is the input of the network and the function $F(x)$ is the residual mapping to be learned. The residual connection effectively alleviates vanishing and exploding gradients and avoids degradation of the model.

Simulation Environment
The experimental analysis model in this study is based on Python 3.6. The experiment was carried out on a Dell notebook. The relevant configuration is as follows: the system model is Inspiron 5488, the GPU is an NVIDIA GeForce MX250, the CUDA version is 10.1, and the PyTorch version is 1.7.1.

Data Sources
The data set was provided by the NASA Ames Prognostics Center of Excellence (Lin et al., 2021). On the accelerated life test platform developed by NASA, an 18650 lithium cobalt oxide battery with a rated capacity of 2 Ah was used to carry out reference charge and discharge tests according to different test configuration files. In the reference charge test, the battery was charged at constant current to the charge cut-off voltage and then charged at constant voltage until the current fell to the cut-off current. In the reference discharge test, the battery was discharged at constant current until the voltage fell below the discharge cut-off voltage. The data set files, in .mat format, include two parts: one is the voltage, current, temperature, and time recorded in the test; the other is the capacity estimated using the Coulomb counting method (Lee et al., 2020). The discharge test conditions are shown in Table 2.

Data Preprocessing
In this study, one "charge-discharge" pair is regarded as a cycle. It should be noted that the NASA data set contains some erroneous records. For example, in the 310th and 313th operations of B0005, the charging record is missing, resulting in repeated discharge records. Such repeated charging or discharging records are deleted (Zhao et al., 2021). After this cleaning, the numbers of cycles of B0005, B0006, B0007, and B0018 are 167, 167, 167, and 132, respectively. Then, the time series comprising the temperature change rate and SOH is transformed into the samples required for supervised learning using a sliding window (set to 8), and the samples are divided into the training set, validation set, and test set according to the prediction starting point. If the starting point is set to 90, the training set consists of the 73 samples generated from the data of the first 80 discharges, the validation set consists of the 10 samples immediately after the training set, and the test set consists of the remaining samples. In addition, the data are standardized with the Z-score method before model training.
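The windowing and splitting step can be sketched as follows. The sketch assumes that each sample pairs the 8 most recent temperature change rates with the SOH of the last cycle in the window, which reproduces the stated count of 73 samples from 80 discharges; the exact pairing and all function names are assumptions of this example, as is the choice to compute the Z-score statistics on the training portion only.

```python
from statistics import mean, pstdev


def make_windows(features, targets, window=8):
    """Pair each run of `window` consecutive features with the target of the
    last cycle in that run (assumed pairing)."""
    return [(features[end - window:end], targets[end - 1])
            for end in range(window, len(features) + 1)]


def split_by_start(samples, start, window=8, n_val=10):
    """Split samples into train / validation / test for a prediction starting
    point `start` (cycle number, 1-based)."""
    n_train = start - n_val - window + 1
    if n_train <= 0:
        raise ValueError("prediction starting point is too early")
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


def zscore(values, mu, sigma):
    """Standardize values with a given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


# Synthetic series with 132 cycles (hypothetical numbers, not NASA data).
rate = [0.003 + 0.00001 * i for i in range(132)]
soh = [100.0 - 0.25 * i for i in range(132)]

# Statistics from the first 80 discharges only (assumption), applied to all data.
mu, sigma = mean(rate[:80]), pstdev(rate[:80])
samples = make_windows(zscore(rate, mu, sigma), soh, window=8)
train, val, test = split_by_start(samples, start=90)
print(len(train), len(val), len(test))  # 73 10 42
```

Changing `start` shifts the boundary between the three sets, which is how different prediction starting points can be compared.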

Model Parameter Configuration
The model parameters are configured as shown in Table 3. In addition, early stopping is adopted to stop training in time and obtain a better model.
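Early stopping is typically implemented by monitoring the validation loss and halting once it has stopped improving. The following is a generic sketch of that idea; the class name, the `patience` and `min_delta` parameters, and their values are hypothetical and are not taken from Table 3.

```python
class EarlyStopping:
    """Signal a stop when the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs (hypothetical parameters)."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: training stops at the fourth epoch.
stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7], start=1):
    if stopper.step(loss):
        print("stop at epoch", epoch)
        break
```

In practice the model weights from the epoch with the best validation loss would be kept, so that the returned model is the one that generalized best on the validation set.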

Prediction and Evaluation Indicators
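To evaluate the SOH prediction, in addition to the mean absolute error (MAE) and root mean square error (RMSE), the absolute error (AE) between the predicted and real EOL is used as an evaluation index:
$$\mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| \hat{y}_m - y_m \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( \hat{y}_m - y_m \right)^2}, \qquad \mathrm{AE} = \left| \widehat{\mathrm{cycle}}_{\mathrm{EOL}} - \mathrm{cycle}_{\mathrm{EOL}} \right|$$
where $\hat{y}_m$ and $y_m$ are the predicted and true values of the m-th discharge, respectively, M is the number of predicted discharges, and $\widehat{\mathrm{cycle}}_{\mathrm{EOL}}$ and $\mathrm{cycle}_{\mathrm{EOL}}$ are the numbers of cycles from the prediction starting point until the predicted value or the true value, respectively, reaches EOL.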
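The three evaluation indices can be computed as in the sketch below. The 70% EOL threshold follows the NASA definition quoted earlier (1.4 Ah of a 2 Ah rated capacity); counting cycles from 1 at the prediction starting point and treating the first value at or below the threshold as EOL are assumptions of this example, as are the function names.

```python
from math import sqrt


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def cycles_to_eol(soh, threshold=70.0):
    """Number of cycles, counted from the prediction start, until SOH (in %)
    first reaches the EOL threshold; None if it never does."""
    for n, value in enumerate(soh, start=1):
        if value <= threshold:
            return n
    return None


def eol_absolute_error(pred, true, threshold=70.0):
    """Absolute difference between predicted and true cycles to EOL."""
    n_pred = cycles_to_eol(pred, threshold)
    n_true = cycles_to_eol(true, threshold)
    if n_pred is None or n_true is None:
        raise ValueError("EOL is not reached in one of the sequences")
    return abs(n_pred - n_true)


# Made-up SOH values in percent.
pred = [72.0, 71.0, 70.0, 69.0]
true = [73.0, 71.5, 70.5, 69.5]
print(mae(pred, true), rmse(pred, true), eol_absolute_error(pred, true))
```

MAE and RMSE summarize the error over the whole predicted segment, while AE isolates how well the moment of reaching EOL is predicted, which is why a model can do well on one and worse on the others.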

Prediction Results of Different Algorithms and Prediction Starting Points
To better illustrate the rationality of the selected health factor and algorithm, the TCN is further compared with the LSTM, GRU, and CNN-LSTM algorithms; the prediction results are shown in Table 4. Since B0018 exhibits prominent capacity regeneration and has fewer samples to be predicted than the other batteries, it gives a clearer visual display. Therefore, taking B0018 as an example, Figures 6 and 7 show the SOH prediction results under different algorithms and different prediction starting points, respectively. It can be seen from the table that the TCN performs significantly better than LSTM, GRU, and CNN-LSTM on B0007 and B0018. For B0005, when the prediction starting point is 90, the MAE and RMSE of the TCN are greater than those of the other algorithms, but its AE is smaller, indicating that the TCN predicts the SOH of the discharge processes before EOL more accurately and deviates more in the later stage, which results in the larger MAE and RMSE. Overall, therefore, the TCN performs better than the other comparison algorithms considered in this study.
Since the TCN algorithm clearly outperforms the other algorithms, only the TCN's SOH predictions for the other two batteries under different prediction starting points are shown in Supplementary Figures SA1, SA2.

Expression of Uncertainty
The prediction process described so far is a point estimate. However, the prediction of SOH should also include a confidence interval, that is, an expression of uncertainty. In this study, the confidence interval of SOH in each discharge process is obtained from 100 repeated simulations. B0018 is again taken as an example. When the prediction starting point is 90, it can be seen from Figure 8 that the real value falls well within the prediction confidence interval over cycles [91, 134]. (For the visual display of B0005 and B0007, please refer to Supplementary Figures SB1, SB2.) The average value of standard deviation in the confidence interval is 0.096% and 0.153%. Similarly, Table 5 gives the confidence interval and the mean and maximum values of the standard deviation for the other batteries and prediction starting points. The comparison shows that when the prediction starting point is set to 90, the reliable prediction interval is the largest, and the mean and maximum values of the standard deviation are also small. Therefore, the prediction starting point should be set to 90.
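One simple way to turn repeated simulations into a per-cycle confidence interval is to take the mean and standard deviation of the repeated predictions and form a normal-approximation interval. The sketch below does this with the standard library; the text does not state how the interval is constructed from the 100 runs, so the 1.96 multiplier (an approximate 95% interval) and the function name are assumptions of this example.

```python
from statistics import mean, stdev


def prediction_interval(runs, z=1.96):
    """Per-cycle mean, standard deviation and (lower, upper) interval from
    repeated predictions; `runs` is a list of equally long prediction lists."""
    if len(runs) < 2:
        raise ValueError("need at least two repeated simulations")
    summary = []
    for values in zip(*runs):  # all predictions made for the same cycle
        mu = mean(values)
        sigma = stdev(values)
        summary.append((mu, sigma, (mu - z * sigma, mu + z * sigma)))
    return summary


# Three made-up runs over two cycles (the study uses 100 runs).
runs = [[80.0, 70.0], [82.0, 70.0], [84.0, 70.0]]
for mu, sigma, (low, high) in prediction_interval(runs):
    print(mu, sigma, low, high)
```

The mean and maximum of the per-cycle standard deviations returned here correspond to the kind of summary statistics reported for each battery and starting point.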

CONCLUSION
Based on the NASA public data set, a novel convolutional neural network, the TCN, is used to evaluate the SOH of lithium-ion batteries, taking as input the temperature change rate, an indirect health factor mined by feature engineering. On this basis, the uncertainty of the SOH evaluation is expressed by sampling. The verification results show that the extracted health factor is simple and feasible and that the algorithm has high accuracy. Therefore, the method proposed in this study has high practical value.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
DZ: conceptualization and writing-review and editing; WZ: writing-original draft preparation and investigation; LW: supervision; XC: conceptualization and resources; XL: writing-review and editing and software; PW: supervision.