A Hybrid Data-Driven Method to Predict Battery Capacity of Medical Devices and Analyze Component Effects

Lithium-ion batteries are currently the most utilized power source in medical devices due to their long service life, high energy performance, and portability. The performance of battery-powered medical devices depends heavily on battery capacity, which is directly affected by the related battery component parameters. To widen the application of battery-powered medical devices, it is vital to effectively monitor battery capacity and analyze the effects of battery component parameters. This article derives a hybrid data-driven method to achieve accurate early predictions of battery capacity and reliable analysis of battery component effects. To be specific, a Gaussian process regression-based data-driven model is first developed to efficiently capture the underlying mapping between four component parameters and battery capacity. Then two effect analysis tools, the automatic relevance determination kernel-based weights and the tree-based local interpretable model-agnostic explanation, are equipped to quantify and analyze the global and local effects of these four component parameters, respectively. Illustrative results show that the designed hybrid data-driven method provides accurate battery capacity predictions with an R² of 0.97, while both global and local effects of the four component parameters are successfully quantified. Owing to its data-driven nature, the designed hybrid method can efficiently help users to monitor and predict battery capacity and to analyze and understand the effects of the component parameters of interest. This could further benefit battery-powered medical devices for higher-performance and longer-lifetime applications.


INTRODUCTION
Lithium-ion (Li-ion) batteries are one of the most popular power sources in medical devices owing to their advantages of long service life, high energy performance, and being portable (Li et al., 2021a;Liu et al., 2022a). The property of the battery such as capacity plays a vital role in affecting battery-powered medical devices' performance and will be influenced by related component parameters. In light of this, to improve battery operational performance and widen the applications of battery-powered medical devices, it is crucial to monitor/predict battery capacity and analyze/quantify the effects of corresponding component parameters simultaneously (Li et al., 2021b).
However, Li-ion batteries are a complicated power source involving numerous chemical, mechanical, and electrical dynamics during operation (Wu et al., 2019; Chen et al., 2021). To date, the widely used approaches to analyze and understand how component parameters affect battery performance, especially capacity, are mainly based on trial-and-error solutions, which usually lead to huge cost and time consumption (Li et al., 2016; Yang et al., 2021). Therefore, developing a proper approach that can not only predict the battery capacity of medical devices but also analyze the corresponding component parameters is a challenging but important task for widening the use of battery-powered medical devices.
With the rapid development of data science, artificial intelligence, and cloud platforms, data-driven approaches have become a promising and powerful tool in the field of Li-ion battery management (Lucu et al., 2018; Li et al., 2019; Liu et al., 2022b). Specifically, many data-driven approaches have been derived for estimating the internal states of batteries (Feng et al., 2020a; Feng et al., 2021; Shi et al., 2021; Tang et al., 2021), predicting battery aging trajectories (Hu et al., 2022a; Hu et al., 2022b; Liu et al., 2022c) and remaining useful life (Liu et al., 2020a; Ren et al., 2020; Hu et al., 2021), balancing battery cells (Feng et al., 2020b; Liu et al., 2020b), performing effective battery charging (Liu et al., 2017; Xie et al., 2020), and energy management (Wang et al., 2022; Xie et al., 2022; Zhang et al., 2022). In summary, well-designed data-driven approaches can achieve reliable management and improve battery operational performance. However, these approaches mainly concern battery macro-dynamics rather than micro-dynamics such as component parameters. Currently, only limited research has focused on analyzing battery component parameters through data-driven methods. For example, following the cross-industrial standard process, a neural network-based data-driven method is proposed in the study by Schnell et al. (2019) to analyze the dependency between battery component parameters. Using four parameters from the battery mixing and coating processes, a Gaussian process regression (GPR)-based data-driven method is developed in the study by Liu et al. (2021a) to predict battery electrode mass loading and analyze the effects of these parameters. To handle the imbalance issue during battery production, an RUBoost-based data-driven method is derived in the study by Liu et al. (2021b) to classify battery quality and analyze formulation components.
In real battery-powered medical device applications, it should be known that the component parameters will be crucial for determining and influencing battery property, especially for its capacity. To improve the performance of battery-powered medical devices, it therefore becomes necessary to develop an efficient data-driven method to predict battery capacity and analyze how related component parameters would influence the battery capacity.
Based upon the above discussion, to benefit battery-powered medical devices, a hybrid data-driven method is designed in this study to accurately predict battery capacity and analyze both global and local effects of component parameters of the corresponding battery. Several contributions are made as follows: 1) a GPR-based data-driven model is developed to perform accurate battery capacity prediction by using four battery component parameters as the input terms; 2) after equipping the automatic relevance determination (ARD)-based kernel structure, the global effects of these four battery component parameters are quantified and analyzed based on the ARD-based weights; and 3) after equipping tree-based local interpretable model-agnostic explanation (LIME), the local effects of these component parameters are quantified and analyzed for four randomly selected sample points. Due to the pure data-driven nature, the developed hybrid data-driven method can accurately predict battery capacity at the early prediction stage and analyze both global and local effects of corresponding component parameters, generating an effective way to well monitor battery capacity and understand the related component parameters, further benefitting the performance of battery-powered medical devices.
The rest of this article is organized as follows: Both battery key component parameters and the related capacity dataset are described in Section 2. The GPR-based data-driven model, two effect analysis tools including ARD-kernel-based weights and tree-based LIME, and the related performance indicators are described in Section 3. Section 4 then provides and discusses the results of both battery capacity predictions and component effect analyses. Finally, Section 5 summarizes this study.

KEY BATTERY COMPONENTS AND RELATED CAPACITY DATASET
As the main power source supplying energy to numerous medical devices, Li-ion batteries are usually composed of components including the positive electrode, negative electrode, and electrolyte (Ayerbe et al., 2021; Niri et al., 2021). These component parameters play an important role in affecting battery properties such as capacity, which in turn determine the performance of the related medical devices. Therefore, to ensure the effectiveness of battery-powered medical devices, battery capacity dynamics need to be carefully monitored, and how battery component parameters affect battery capacity must be well analyzed (Liu et al., 2022d).
The battery within a medical device generally involves several electrode component parameters, as shown in Figure 1. To be specific, the battery electrode usually contains active material components, electrode additive components, and polymeric binder components. For real medical device applications, LTO is a widely utilized active material as it has the merits of being nontoxic and adaptive to complicated conditions such as high temperatures and large currents. In addition, the electrode additive is another important component parameter for Li-ion batteries (Liu et al., 2022e). To increase the intrinsic electronic conductivities of battery electrodes, several conductive fillers such as carbon black and carbon nanofiber are generally needed within Li-ion batteries. Moreover, to further increase the batteries' mechanical cohesion, polymeric binders are also required. In many battery-powered medical device applications, three types of polymeric binders, namely polyvinylidene fluoride (PVDF), polyethylene co-ethyl acrylate co-maleic anhydride (TPE), and hydrogenated nitrile butadiene rubber (HNBR), are adopted as they present exceptional chemical stability and efficient binding properties. All these component parameters are crucial for determining battery electrode properties such as thickness and electronic conductivity, and therefore battery performance, especially capacity, which must be carefully monitored and analyzed in battery-powered medical device applications.
In light of this, to ensure the effectiveness of battery-powered medical devices, it is necessary to develop a suitable solution that can not only perform accurate battery capacity prediction but also analyze the effects of these key component parameters on battery capacity dynamics. In this study, a hybrid data-driven method that equips the GPR data-driven model with effect analysis tools is designed to predict battery capacity and quantify both global and local effects of component parameters. To ensure model training effectiveness, the well-proven data (Rynne et al., 2019) from Hawaii Natural Energy Institute Franco are utilized. More information regarding these data and the experiments used to generate this dataset can be found in Liu et al. (2021c). In this study, four basic battery component parameters are utilized: LTO-based active materials (LTO) with a formulation weight from 75% to 95%, C65-based carbon black (C65) with a formulation weight from 0% to 20%, carbon nanofiber (CNF) with a formulation weight from 0% to 10%, and binders with a formulation weight from 3% to 20%. To obtain the capacity data of the corresponding battery, the coulomb-counting approach with a C/25 current rate is utilized.

HYBRID DATA-DRIVEN METHOD
In this section, the Gaussian process regression (GPR)-based model is first introduced for battery capacity prognostics. Then the effect analysis tools including automatic relevance determination (ARD) kernel weight and tree-based local interpretable model-agnostic explanation (LIME) are derived to analyze the global and local component effects, respectively. Afterward, several performance indicators are given to evaluate the performance of battery capacity prediction via the developed data-driven model.

Gaussian Process Regression
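According to Bayesian theory, GPR is able to give a Gaussian process for non-parametric regression (Tagade et al., 2020; Liu et al., 2021c), whose probability distribution can be described by a mean function m(x) and a kernel function k(x, x′) as follows:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)$$

$$m(x) = \mathbb{E}\big[f(x)\big], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big]$$

Here, m(x) is generally set to zero to simplify computation. For a prediction, the output's prior distribution can be described by

$$y \sim \mathcal{N}\big(0,\, K(x, x)\big)$$

Assuming the training dataset x and testing dataset x′ present similar Gaussian distributions, the testing output y′ shows a joint prior distribution with the training output y (Williams and Rasmussen, 2006):

$$\begin{bmatrix} y \\ y' \end{bmatrix} \sim \mathcal{N}\left(0,\, \begin{bmatrix} K(x, x) & K(x, x') \\ K(x', x) & K(x', x') \end{bmatrix}\right)$$

According to this joint prior distribution, the predicted output y′ corresponding to inputs x′ can be calculated by computing the conditional distribution p(y′|x, y, x′) as

$$p(y' \mid x, y, x') = \mathcal{N}\big(\bar{y}',\, \mathrm{cov}(y')\big)$$

$$\bar{y}' = K(x', x)\, K(x, x)^{-1}\, y$$

$$\mathrm{cov}(y') = K(x', x') - K(x', x)\, K(x, x)^{-1}\, K(x, x')$$

where $\bar{y}'$ is the mean of the predicted values, while cov(y′) reflects their variance.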
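As an illustration of the conditional distribution described above, the following minimal sketch computes the posterior mean and variance of a zero-mean GPR in plain Python. It is not the authors' implementation: the toy one-dimensional data, the unit hyperparameters, the `jitter` term added to the diagonal for numerical stability, and the helper names `se_kernel`, `solve`, and `gpr_predict` are all assumptions made for this example.

```python
import math

def se_kernel(a, b, amplitude=1.0, length=1.0):
    """Squared exponential kernel between two input vectors (assumed unit hyperparameters)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return amplitude ** 2 * math.exp(-0.5 * sq_dist / length ** 2)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def gpr_predict(X, y, X_new, kernel=se_kernel, jitter=1e-8):
    """Posterior mean and variance at each row of X_new, assuming a zero prior mean."""
    n = len(X)
    # Training covariance K(x, x); the small jitter is an assumption for stability.
    K = [[kernel(X[i], X[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = [row[0] for row in solve(K, [[v] for v in y])]  # K(x, x)^-1 y
    means, variances = [], []
    for x_new in X_new:
        k_star = [kernel(x_new, xi) for xi in X]  # K(x', x)
        v = [row[0] for row in solve(K, [[k] for k in k_star])]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances

# Toy data (hypothetical, not taken from the battery dataset).
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 0.5]
mean, var = gpr_predict(X_train, y_train, [[1.0], [10.0]])
print(mean, var)
```

At a training input the posterior mean reproduces the observed output and the variance collapses toward zero, whereas far from the data the prediction falls back to the zero prior mean with the prior variance.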
It can be seen that kernel function k(x, x′) plays an important role in determining GPR's performance and must be carefully designed. In this study, to ensure the effectiveness of GPR in battery capacity prediction, three kernel functions are explored.
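The first one is a classical kernel function called the squared exponential (SE) kernel:

$$k_{SE}(x, x') = \sigma_{SE}^{2} \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma_{m}^{2}}\right)$$

where σ_SE and σ_m are its two hyperparameters, determining the amplitude and length of the SE kernel function, respectively. In general, the SE kernel function tends to produce very smooth distributions. To further enhance the performance of GPR in fitting nonlinear relations, another two classical kernel functions, the Matern5/2 kernel function k_M5/2(x, x′) and the quadratic kernel function k_Quadratic(x, x′), could be used. The Matern5/2 kernel takes the form

$$k_{M5/2}(x, x') = \sigma_{M5/2}^{2} \left(1 + \frac{\sqrt{5}\,\lVert x - x' \rVert}{\sigma_{m}} + \frac{5\,\lVert x - x' \rVert^{2}}{3\sigma_{m}^{2}}\right) \exp\left(-\frac{\sqrt{5}\,\lVert x - x' \rVert}{\sigma_{m}}\right)$$

where σ_M5/2 and σ_Q are the hyperparameters that determine the amplitudes of the Matern5/2 and quadratic kernel functions, respectively, and σ_m is the hyperparameter that determines their length.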
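To make the roles of the amplitude and length hyperparameters concrete, the sketch below evaluates the SE and Matern5/2 kernels in their standard textbook forms. The function names and the default hyperparameter values are assumptions for illustration; the quadratic kernel is omitted because its exact form is not spelled out here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two input vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_se(a, b, sigma_se=1.0, sigma_m=1.0):
    """Squared exponential kernel: amplitude sigma_se, length sigma_m."""
    r = euclidean(a, b)
    return sigma_se ** 2 * math.exp(-r ** 2 / (2.0 * sigma_m ** 2))

def k_matern52(a, b, sigma_m52=1.0, sigma_m=1.0):
    """Matern5/2 kernel: amplitude sigma_m52, length sigma_m."""
    s = math.sqrt(5.0) * euclidean(a, b) / sigma_m
    return sigma_m52 ** 2 * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)

# Both kernels equal the squared amplitude at zero distance and decay with distance.
for distance in (0.0, 0.5, 1.0, 2.0):
    print(distance, k_se([0.0], [distance]), k_matern52([0.0], [distance]))
```

Increasing `sigma_m` makes both kernels decay more slowly, so distant samples remain correlated; the amplitude only rescales the covariance.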

Effect Analysis Tools
After developing a GPR-based data-driven model for battery capacity prediction, to further quantify and analyze the effects of corresponding component parameters, two data-driven-based effect analysis tools including the ARD kernel weight and local interpretable model-agnostic explanations need to be adopted.

ARD Kernel Weight
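To equip the GPR model with the capability of quantifying the global effects of parameters of interest, the classical kernel functions including the SE kernel, Matern5/2 kernel, and quadratic kernel can be enhanced with ARD structures (Zhao et al., 2018). In an ARD structure, the single length hyperparameter σ_m shared by all inputs is replaced by an individual length hyperparameter σ_i for each input term, so that the scaled distance between two samples becomes

$$r_{ARD}(x, x') = \sqrt{\sum_{i} \frac{(x_{i} - x'_{i})^{2}}{\sigma_{i}^{2}}}$$

For example, the ARD-based SE and Matern5/2 kernels read

$$k_{SE}^{ARD}(x, x') = \sigma_{SE}^{2} \exp\left(-\frac{1}{2}\, r_{ARD}^{2}\right)$$

$$k_{M5/2}^{ARD}(x, x') = \sigma_{M5/2}^{2} \left(1 + \sqrt{5}\, r_{ARD} + \frac{5}{3}\, r_{ARD}^{2}\right) \exp\left(-\sqrt{5}\, r_{ARD}\right)$$

Compared with classical kernels, the ARD structure-based kernels thus have an individual hyperparameter for each input term. The value of each individual hyperparameter reflects how strongly the corresponding input term affects the prediction results (Zhao et al., 2018). In light of this, the global effects of component parameters on battery capacity can be quantified directly by using the ARD kernel-based GPR model.
Theoretically, a larger hyperparameter σ_i (i = LTO, binder, C65, CNF) would lead to a smaller ARD kernel weight, which indicates a lower global effect on the predicted battery capacity output.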
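The inverse relation between an ARD length hyperparameter and its weight can be sketched as follows. The exact weight definition is not given in the text, so this example assumes the weight of each input is proportional to 1/σ_i and normalized to sum to one; the length-scale values and the function name `ard_weights` are hypothetical.

```python
def ard_weights(length_scales):
    """Normalized importance weights, assuming weight_i is proportional to 1 / sigma_i."""
    inverse = {name: 1.0 / sigma for name, sigma in length_scales.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

# Hypothetical learned length hyperparameters (not values from this study).
length_scales = {"LTO": 0.5, "binder": 4.0, "C65": 40.0, "CNF": 80.0}
weights = ard_weights(length_scales)
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.3f}")
```

Under this assumed definition, the input with the shortest length scale receives the largest normalized weight, matching the qualitative rule stated above.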

Local Interpretable Model-Agnostic Explanation
After using the ARD kernel weight to quantify the global effects of component parameters, to further analyze the local effects of these parameters on specific samples, an effective data-driven tool named the local interpretable model-agnostic explanation (LIME) (Zafar and Khan, 2021) will be adopted. According to the developed GPR-based data-driven model, a detailed procedure to equip LIME for local effect analysis is illustrated in Table 1.
Algorithm 1. Detailed procedure to equip LIME for local effect analysis of component parameters.
In summary, based upon four key processes, LIME is able to quantify the local effect of a sample point as follows: 1) generating several new samples around the sample of interest S, as shown in step 3; 2) performing capacity prediction for these generated samples based on the developed GPR-based data-driven model, as illustrated in step 4; 3) constructing a local prediction model by using the generated samples and the predicted points from the GPR-based data-driven model, as shown in step 5; and 4) using the coefficients from this local prediction model to quantify the effects and importance of component parameters on the capacity prediction from the GPR-based data-driven model.
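The four processes above can be sketched in plain Python as shown below. This is only an illustration under stated assumptions: `black_box` is a hypothetical stand-in for the trained GPR model, the sample values are invented, and a proximity-weighted linear surrogate is swapped in for the decision-tree surrogate used in this study to keep the example short. The perturbation also ignores the constraint that formulation weights sum to 100%.

```python
import math
import random

def black_box(x):
    """Hypothetical stand-in for the trained GPR capacity model."""
    lto, c65, cnf, binder = x
    return 0.02 * lto ** 2 + 0.05 * c65 + 0.01 * cnf + 0.4 * binder

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def lime_explain(model, sample, n_samples=500, scale=1.0, width=2.0, seed=0):
    """Return (intercept, coefficients) of a local weighted linear surrogate."""
    rng = random.Random(seed)
    # 1) Generate new samples around the sample of interest.
    neighbours = [[v + rng.gauss(0.0, scale) for v in sample] for _ in range(n_samples)]
    # 2) Predict their outputs with the black-box model.
    preds = [model(z) for z in neighbours]
    # Weight each neighbour by its proximity to the sample of interest.
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(z, sample)) / width ** 2)
               for z in neighbours]
    # 3) Fit the local surrogate by weighted least squares on centred features.
    rows = [[1.0] + [a - b for a, b in zip(z, sample)] for z in neighbours]
    p = len(rows[0])
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * r[i] * y for w, r, y in zip(weights, rows, preds)) for i in range(p)]
    beta = solve_linear(A, b)
    # 4) The surrogate coefficients quantify the local effect of each input.
    return beta[0], beta[1:]

sample = [85.0, 5.0, 2.0, 8.0]  # hypothetical LTO, C65, CNF, binder weights in %
intercept, coefs = lime_explain(black_box, sample)
for name, c in zip(["LTO", "C65", "CNF", "binder"], coefs):
    print(f"{name}: {c:.3f}")
```

The surrogate intercept approximates the black-box prediction at the sample itself, which gives a simple check that the local model is faithful before its coefficients are interpreted.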

Data-Driven Structure
Based upon the data-driven tools mentioned earlier, to accurately predict battery capacity and analyze the effects of the battery component parameters of interest, a hybrid data-driven structure is developed by combining a GPR-based data-driven model, ARD kernel-based weights, and tree-based LIME, as illustrated in Figure 2.
To be specific, the input terms of this data-driven structure are four battery component parameters including LTO, binder, C65, and CNF, while the output term of this structure is the corresponding battery capacity. After the GPR-based data-driven model is trained by minimizing its negative log marginal likelihood, the ARD kernel-based weights can be used to reflect the global effects of these four component parameters on battery capacity. By equipping the tree-based LIME, the local effects for specific sample points can also be quantified and analyzed.

Model Performance Indicators
To directly reflect the battery capacity prediction performance of the GPR-based data-driven model, the following typical performance indicators (Šeruga et al., 2021) are used:

1) Mean absolute error (MAE): Supposing T is the total number of samples, Y_t stands for the real battery capacity value, and Ŷ_t is the predicted capacity value from the GPR-based data-driven model, MAE can be expressed as follows:

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| Y_{t} - \hat{Y}_{t} \right|$$

2) Root mean square error (RMSE): as another typical performance indicator, RMSE is calculated by

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( Y_{t} - \hat{Y}_{t} \right)^{2}}$$

3) R²: letting $\bar{Y}$ denote the average value of all real battery capacities, R² can be calculated by

$$R^{2} = 1 - \frac{\sum_{t=1}^{T} \left( Y_{t} - \hat{Y}_{t} \right)^{2}}{\sum_{t=1}^{T} \left( Y_{t} - \bar{Y} \right)^{2}}$$

For real battery capacity prediction, as the predicted values approach the actual values, MAE and RMSE approach 0, while R² approaches 1.
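The three indicators can be computed directly from paired lists of real and predicted capacities. The sketch below uses the standard definitions, with the mean in R² taken over the real values; the capacity numbers are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, using the mean of the real values."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up capacities in mAh, for illustration only.
real = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mae(real, predicted), rmse(real, predicted), r2(real, predicted))
```

MAE and RMSE carry the unit of the capacity, whereas R² is dimensionless, which is why RMSE values are quoted in mAh alongside a unitless R².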

RESULT AND DISCUSSION
To evaluate the capacity prediction performance of the GPR-based data-driven model, this section first presents and discusses the battery capacity prediction results obtained with three different ARD kernels. Then tests using the ARD kernel-based weights and the tree-based LIME are carried out to quantify and analyze the global effects and local effects of all four battery components, respectively.

Capacity Prediction via GPR
We first focus on the battery capacity predictions obtained by using four component parameters as inputs to the GPR-based data-driven model with three different ARD kernels. After performing six-fold cross-validation, the capacity prediction results of SE-based GPR, Matern5/2-based GPR, and quadratic-based GPR are illustrated in Figure 3, while their corresponding performance indicators are listed in Table 2. According to Figure 3, all three GPR-based data-driven models capture most of the battery capacity sample points, indicating the effective performance of ARD-based kernel functions. To be specific, SE-based GPR presents the worst prediction results, with an RMSE of 4.52 mAh and an R² of 0.91, which are 57.5% and 5.2% worse than those of Matern5/2-based GPR. In comparison, quadratic-based GPR presents the best results for battery capacity prediction, with an RMSE of 2.43 mAh and an R² of 0.97, which are 15.3% and 1.1% better than those of Matern5/2-based GPR. It can be concluded that the four battery component parameters (LTO, binder, C65, and CNF) and battery capacity present strong nonlinear relations, as the smooth SE kernel cannot capture their underlying mapping well. By using nonsmooth kernel functions such as the Matern5/2 and quadratic kernels, the capacity prediction performance can be improved.
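The six-fold cross-validation step can be sketched as below: the samples are split into six folds, and every sample is predicted by a model trained on the other five folds. The `fit` and `predict` callables are hypothetical placeholders for training and querying a GPR model; a trivial mean predictor and invented capacity values are used here only to keep the example runnable.

```python
import random

def k_fold_indices(n_samples, k=6, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

def cross_val_predict(X, y, fit, predict, k=6, seed=0):
    """Out-of-fold prediction for every sample."""
    out = [None] * len(y)
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = predict(model, [X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            out[i] = p
    return out

# Placeholder model: predicts the mean of the training targets.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, X_test):
    return [model for _ in X_test]

X = [[float(i)] for i in range(20)]   # hypothetical inputs
y = [100.0 + i for i in range(20)]    # hypothetical capacities in mAh
print(cross_val_predict(X, y, fit_mean, predict_mean))
```

Pooling the out-of-fold predictions in this way lets indicators such as RMSE and R² be computed once over all samples.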
To further explore these battery capacity prediction results, the prediction versus true plots for all GPR-based data-driven cases are illustrated in Figure 4. In theory, the farthest observation in a prediction versus true plot can pull the prediction line toward that sample. The more accurate a model is, the closer its observations should lie to the perfect prediction line. In this study, most observations lie close to the perfect prediction line when GPR-based data-driven models are used. However, several observations still lie away from the perfect prediction line. This is mainly caused by the overfitting issue of the data-driven model and can be improved when more corresponding data are available.

Component Effect Analyses
Next, after developing the GPR-based data-driven model to effectively predict battery capacity, the ARD-based kernel weight and the tree-based LIME are applied to analyze the global effects and local effects of the four component parameters, respectively.

Analysis of Global Effects
To quantify the global effects of component parameters for all observation samples, the hyperparameters of the quadratic-based GPR model are utilized, as this GPR model gives the best results for battery capacity prediction. For global effect analysis, after normalizing the weights of the ARD-based kernel, the importance of the four component parameters (LTO, binder, C65, and CNF) is quantified and plotted in Figure 5. The LTO term clearly provides the largest importance weight, with a value of 0.89, for the battery capacity prediction. The binder term gives the second largest importance weight, with a value of 0.09. In comparison, the importance weights of both C65 and CNF are very small, which means their global effects are negligible for this battery capacity prediction.

Frontiers in Energy Research | www.frontiersin.org July 2022 | Volume 10 | Article 928250

Analysis of Local Effects
Next, to further explore the local effects of component parameters for some specific sample points, the tree-based LIME is adopted. Here four sample points are randomly selected for further local effect exploration, with the detailed information shown in Table 3. These sample points are composed of different values of LTO, C65, CNF, and binder. To analyze the local effects of the four battery component parameters (LTO, C65, CNF, and binder) on the predicted battery capacity of these four case points, the parameter effects quantified by tree-based LIME are illustrated in Figure 6. Here the prediction model within LIME is a decision tree. The capacity prediction results of both the GPR-based data-driven model and the tree-based LIME are shown for all four case points to reflect their prediction difference. According to Figure 6, although there are a few differences between the quantified effects of these four sample points, their quantified local effects all present a similar trend. Specifically, the LTO term always gives the largest quantified effect, while the binder term gives the second largest effect. In comparison, the CNF term always gives the smallest quantified effect for all four sample points. The trend of local effects is similar to the trend of global effects, which indicates that our hybrid data-driven method effectively quantifies both global and local effects of these four component parameters. Moreover, the battery capacity values predicted by the tree-based model are all close to the values predicted by the GPR-based data-driven model, which indicates that the tree-based LIME approximates the GPR predictions well and that the related local effect analysis is reliable.

CONCLUSION
Li-ion batteries are the most popular power source and are widely utilized in medical devices to supply power and energy. As the performance of battery-powered medical devices is highly affected by battery capacity, this study focuses on the accurate monitoring/prediction of battery capacity and the explainable analysis of related component parameters. To achieve this, a hybrid data-driven method using the GPR-based data-driven model and effect analysis tools is developed. Illustrative results indicate that the designed GPR-based data-driven model is capable of accurately predicting battery capacity with an R² of 0.97 by using the quadratic-based kernel. Through equipping ARD kernel-based weights and tree-based LIME, both the global effects and local effects of four main component parameters including LTO, C65, CNF, and binder can be successfully quantified and analyzed. Specifically, LTO and binder rank first and second in importance for both global and local effects, while CNF gives the smallest contribution to battery capacity prediction. Owing to its data-driven nature, the designed hybrid data-driven method is capable of efficiently helping users to monitor/predict battery capacity at the early prediction stage and analyze/understand both global and local effects of the component parameters of interest. This could further benefit battery-powered medical devices for wider and longer-life applications.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
RF and QP contributed to the conception and design of the study. RF wrote the first draft of the manuscript. CL, HQ, LZ, and QP contributed to manuscript revision, read, and approved the submitted version.