Recent Advances in Screening Lithium Solid-State Electrolytes Through Machine Learning

Compared to liquid electrolytes, lithium solid-state electrolytes have received increased attention in the field of all-solid-state lithium ion batteries due to safety requirements and higher energy density. However, solid-state electrolytes face many challenges, including lower ionic conductivity, complex interfaces, and unstable physical or electrochemical properties. One of the most effective strategies for addressing these challenges is to find new types of lithium solid-state electrolytes with improved properties. Traditional trial-and-error methods require substantial resources and time to verify new solid-state electrolytes. Recently, new lithium solid-state electrolytes have been predicted through machine learning (ML), which has proved to be an efficient and reliable method for screening new functional materials. This paper reviews the lithium solid-state electrolytes that have been discovered based on ML algorithms. The selection and preprocessing of datasets in ML are discussed first, followed by a detailed description of the latest developments in screening lithium solid-state electrolytes through different ML algorithms. Lastly, the stability of candidate solid-state electrolytes and the challenges of discovering new lithium solid-state electrolytes through ML are highlighted.


INTRODUCTION
Air pollution has become an increasingly severe problem in recent years. Some countries aim to ban new gasoline- and diesel-powered vehicles by 2030, and governments have implemented policies that promote renewable energy. However, most renewable energy sources, such as wind and solar power, are intermittent and rely on rechargeable batteries for storage to enable daily usage. Rechargeable batteries, especially lithium ion batteries (LIBs) with an organic liquid electrolyte, are used in portable consumer electronics and electric vehicles owing to their high energy density and long life span. Even though the energy density of LIBs has improved tremendously over recent decades, it still needs to be further enhanced to meet consumer requirements. Moreover, safety is another vital issue: accidents caused by organic liquid electrolytes have impeded the application of these batteries. An efficient solution to this problem is to replace flammable organic electrolytes with solid-state electrolytes, which could significantly reduce the risk of leaks, evaporation, and decomposition and ensure higher safety. All-solid-state lithium ion batteries (ASSLIBs) are therefore being explored as next-generation LIBs. This research faces the challenge of creating stable solid-state electrolytes with high ionic conductivity (Borodin et al., 2015).
Ideal solid-state electrolytes should have the following merits: high ionic conductivity, a low migration energy barrier, a wide electrochemical window, strong electrochemical stability, and high mechanical rigidity to suppress dendrite growth on the anode. In the past few years, Li10GeP2S12 (Kamaya et al., 2011), Li7P3S11 (Yamane et al., 2007), and some solid polymer electrolytes have been widely investigated. However, the lithium solid-state electrolytes reported to date do not meet commercial requirements (Do et al., 2013; Sun et al., 2014; Zhang et al., 2014; Chai et al., 2017; Yang et al., 2017), and new solid-state electrolytes with better performance are required. Traditional search strategies rely on trial-and-error methods that require a lot of time to fabricate and verify new solid-state electrolytes, resulting in slow progress and low efficiency.
Since the launch of the Materials Genome Initiative in 2011, machine learning (ML) algorithms have been used successfully to screen new solid-state electrolytes (Holdren, 2011; Sendek et al., 2019; Hatakeyama-Sato et al., 2018; Xie and Grossman, 2018; Cubuk et al., 2019; Zhang et al., 2019). ML is a statistical method involving different algorithm structures that can learn the key information from available data and extract useful characteristics. ML algorithms compute much faster than density functional theory (DFT) (Li et al., 2017; Sendek et al., 2018). Conventional methods (e.g., first-principles DFT) may take over four weeks to make a prediction, while ML takes only 1 s (Saal et al., 2013). ML is usually applied to predict target properties, mapping a representation of the material to the targeted attributes. Figure 1A shows the typical workflow used in screening lithium solid-state electrolytes through ML algorithms. The screening process begins by selecting suitable descriptors and datasets, which are divided into three parts: training data, testing data, and validation data. Different ML algorithms, including supervised, semi-supervised, and unsupervised learning, are then used to calculate and predict potential candidates. The ionic conductivity of the candidates is further verified by DFT simulation, for example with the Vienna Ab-initio Simulation Package (VASP).
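The three-way division of the dataset described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dataset`, the 70/15/15 proportions, and the placeholder compound identifiers are hypothetical and are not taken from any of the reviewed studies.

```python
import random


def split_dataset(records, train_frac=0.7, test_frac=0.15, seed=0):
    """Shuffle records and split them into training, testing, and validation subsets."""
    if not 0 < train_frac + test_frac < 1:
        raise ValueError("train_frac + test_frac must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation


# Hypothetical placeholder identifiers, not real compounds.
candidates = [f"compound_{i}" for i in range(20)]
train, test, validation = split_dataset(candidates)
print(len(train), len(test), len(validation))
```

Shuffling before splitting avoids accidentally grouping chemically similar entries that happen to be adjacent in a database export; whether a random split is appropriate for a given materials dataset is a separate judgment call.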
Different ML algorithms have different advantages and can lead to varying results in screening lithium solid-state electrolytes. Generally, supervised learning (e.g., support vector machine, neural network, decision tree, regression method) needs a large amount of effectively labeled data to support the calculation and to yield accurate predictions. Unsupervised learning (e.g., clustering) can find structure in the data by itself without labels. Semi-supervised learning inherits a previously trained framework and is trained on many datasets, so training on new data involves little cost and the calculation speed increases. This review offers a systematic comparison and discussion of the different algorithms used to boost discovery and to further advance this new technology for screening solid-state electrolytes.
The review first illustrates how datasets are selected and prepared, then introduces and discusses in detail the latest developments in screening lithium solid-state electrolytes through different ML algorithms, including Neural Network (NN), Support Vector Machine (SVM), Regression, and Clustering. The review finally verifies the stability of some ML-predicted materials and highlights the limitations and current challenges faced by ML-based screening for solid-state materials.

Dataset and Preprocessing
The first priority in ML is choosing an appropriate dataset and then transforming it into a form that an algorithm can handle. The Materials Project is a database of materials and a core program of the Materials Genome Initiative. It contains information about compounds from the Inorganic Crystal Structure Database (ICSD). Figure 1B gives the lithium ionic conductivity distribution of all Li-containing compounds, obtained with pymatgen. The data can be accessed and analyzed through the Materials Application Programming Interface. The database includes information on the lattice structure, band structure, density of states, space group, energy, phase diagram, etc. Different ML algorithms may use diverse descriptors and various representations as input (Curtarolo et al., 2012; Jain et al., 2015; Kiran and Joseph, 2017). The more suitable the representation of the input data, the more accurately an algorithm can map it to the output data. For this reason, suitable input data should be easier to acquire than the output attributes that will be predicted, and lower dimensionality is preferable (Ghiringhelli et al., 2015).
To reduce simulation errors, data should be preprocessed deliberately. The preprocessing stage extracts features from the raw data and gives the subset unique characteristics, consisting of a continuous curve or discrete data. Feature extraction involves the transformation of the original data into essential characteristics. It transforms, maps, or changes the dimensions of the original data into new data that reflects the input information more clearly. A large number of descriptors encode structures and properties, including: Coulomb Matrix (Rupp et al., 2012); Fingerprints; Graph Theory (Bonchev and Rouvray, 1991); 3D Geometry; Voronoi Tessellations (Ward et al., 2017); Simplified Molecular Input Line Entry System (SMILES); representations based on radial distribution functions (Schütt et al., 2014); and property-labeled material fragments (Isayev et al., 2017).
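As an illustration of one descriptor from this list, the Coulomb matrix of Rupp et al. (2012) has diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j/|R_i − R_j|, where Z is the nuclear charge and R the atomic position. The sketch below is a minimal standard-library version; the function name and the toy Li-H geometry (including the 3.0 bohr separation) are illustrative assumptions rather than data from the reviewed work.

```python
import math


def coulomb_matrix(charges, positions):
    """Build the Coulomb matrix from nuclear charges and Cartesian positions (atomic units)."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: fitted self-interaction of atom i.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Coulomb repulsion between nuclei i and j.
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Toy two-atom example (Li, H); the separation is an illustrative value only.
charges = [3, 1]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
cm = coulomb_matrix(charges, positions)
for row in cm:
    print(row)
```

The matrix is symmetric and depends only on interatomic distances, so it is unchanged by translating or rotating the structure. It was introduced for molecules; periodic crystals generally need adapted variants.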

Neural Networks
Neural networks, loosely modeled on the human brain, consist of many layers and units. The weights connecting the units store the learned knowledge and together form the complete network. A network processes the input data layer by layer, converting the initial input representation into one that is more closely related to the output target. Learning is the process of adjusting the weights so that predictions on the training data become more accurate.
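The idea that learning means adjusting weights can be made concrete with the smallest possible case: a single sigmoid unit trained by gradient descent. This is a sketch for illustration only, not the architecture of any study reviewed here; the toy feature vectors, labels, learning rate, and epoch count are all assumed values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of one sigmoid unit for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Adjust the weights by stochastic gradient descent on the cross-entropy loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y  # loss gradient w.r.t. the pre-activation
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical two-feature descriptors; label 1 stands for "fast conductor".
samples = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7)]
labels = [0, 0, 1, 1]
weights, bias = train_neuron(samples, labels)
print(predict(weights, bias, (0.1, 0.2)), predict(weights, bias, (0.9, 0.7)))
```

A multilayer network repeats the same update for every layer, with the error signal propagated backwards from the output.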
Sendek et al. reported a neural network for predicting fast Li-ion conduction at room temperature (RT). They found that Li5B7S13, Li3InCl6, and Li2B2S5 had high ionic conductivity (Sendek et al., 2018). As shown in Figure 1C, 11 crystalline compounds were identified from 317 candidate materials by the ML-guided method. The ionic conductivity of Li5B7S13 was predicted to be 0.074 S/cm, six times higher than that of the best-known material (Ding et al., 2009; Zhao and Daemen, 2012; Chou and Hwang, 2014; Cubuk and Kaxiras, 2014; Xiao et al., 2015). This study built a convolutional neural network trained on the atomistic structure, and five descriptors were used as input data. The ML-based simulation revealed a 44-times improvement in the log-average of conductivity compared to random guesswork and manual computation. However, the input data contained little information about properties, the number of materials screened was small, and the training precision was low. Zeeshan et al. predicted that LiAuI4 and Ba38Na58Li26N were superionic conductors with ionic conductivities of 9.4 × 10⁻⁴ and 10⁻³ S/cm, respectively, using a crystal graph convolutional neural network (CGCNN) (Xie and Grossman, 2018; Zeeshan et al., 2018). A 100-network model was used to predict shear and bulk moduli as well as some mechanical properties. The accuracy of the screening results was further confirmed by measuring electronic conductivity and thermodynamic stability. Compared to the usual models, the CGCNN method was more general but required more data to train. This method significantly decreased the costs involved with first-principles calculations.

Support Vector Machine
A support vector machine classifies samples with different labels by finding a partition hyperplane. However, there are many partition hyperplanes that can separate the training samples. The boundary should classify the input samples robustly and have the strongest generalization ability on unseen examples. The Fujimura team employed the support vector regression (SVR) method to predict ionic conductivity (Fujimura et al., 2013). Figure 1D shows the predicted 72 compositions. This work indicated that Li4GeO4 had the highest ionic conductivity. They used the SVM method with a Gaussian kernel to predict the low-temperature conductivities of the compounds. The phase transition temperature (T_c), diffusivity at 1600 K (D_1600), the average volume of the disordered structures (V_dis), and experimental temperature T were regarded as independent variables, while the logarithm of ionic conductivity was regarded as the dependent variable. To confirm which descriptor combinations show the best performance, they undertook many experiments and included different numbers of variables to estimate the prediction error. The bootstrapping error was lowest when it chose a combination of these three: D_1600, T.
Cubuk et al. discovered that LiN5P3O, Li3Na4O3, LiPO3, LiMg3K2O4, LiNaMg3O5, Li2K3GaO4, Li5Na2O3, Li4NaGaO4, Li2MgO2, Li5K2O3, and Li5Na2NO2 satisfied all of the screening criteria, including stability, high Li conductivity, low cost and weight, and a large window of electrochemical stability. They used 30 elemental descriptors from 40 materials to train a linear SVM through leave-one-out cross-validation (Sendek et al., 2016). The prediction accuracy is low because the selected 40 materials are not from the Materials Project or ICSD. To resolve the trade-off between ML model accuracy and coverage of previously uncharacterized materials, they took the outputs of the structure model in Sendek's research (Sendek et al., 2018) as the labels to train the generic descriptors model, an approach known as transfer learning (Pan and Yang, 2010). Through this transfer learning, a new generic model was trained using three datasets, namely the 40-material dataset, Materials Project data, and 21 of the 12,716 lithium-containing materials, with accuracies of 87.5%, 92.0%, and 85.7%, respectively.
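Leave-one-out cross-validation, mentioned above, holds out each sample once, trains on the rest, and averages the hit rate, which makes it a natural choice when only a few dozen labeled materials are available. The sketch below shows the procedure with the standard library only. To stay self-contained it substitutes a nearest-centroid classifier for the linear SVM used in the cited work, and the feature vectors and labels are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the remainder, and report the hit rate."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = nearest_centroid_fit(X_train, y_train)
        hits += nearest_centroid_predict(model, X[i]) == y[i]
    return hits / len(X)


# Hypothetical descriptors; label 1 stands for "superionic", 0 for "poor conductor".
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(X, y)
print(accuracy)
```

Any classifier with a fit and predict step can be dropped into the same loop.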
Because these chemical systems are complex, it is still difficult to calculate total conductivity completely with existing computing capacity. Most conductors contain additives, such as plasticizers, which makes the calculations more complicated. To handle these complex interactions, Kan Hatakeyama-Sato's team used a polymer database to train a gradient boosting model, which shows 90% accuracy for the training data and 81% accuracy for the testing data (Hatakeyama-Sato et al., 2018). As shown in Figure 1E, the experimental data were similar to the predicted ionic conductivity. The analysis indicated that the electronegativity and polarity of the monomer units were the dominant variables in determining ionic conductivity.

Regression Method
The regression model studies the relationship between dependent variables (targets) and independent variables (predictors) and is used in predictive analysis, time series models, and finding causal relationships between variables. Kernel ridge regression and gradient boosting regression models were used to find the potential solid-state electrolytes LiOH, LiAuI4, LiBH4, Li2WS4, and Ba38Na58Li26N, etc. (Zeeshan et al., 2018), by training on the elastic tensor from the Materials Project database. An elastic tensor can be used to assess the interface stability between the Li anode and solid-state electrolytes. Sendek's team related the crystallographic directions to the elastic tensor. They took DFT-computed information on 482 electrolyte materials as input data and set up an 'rbf' kernel ridge model with α and γ equal to 0.01, and two gradient boosting models with different maximum depth, minimum samples per leaf, and minimum samples per split. This study predicted the elastic tensor of 548 materials with a cubic crystal structure. Although the training dataset was small, the regression model was useful. They found a relationship between material stiffness and other characteristics such as mass density, the ratio of bond ionicity, volume per atom, and sublattice electronegativity.
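For readers unfamiliar with kernel ridge regression, the following sketch shows what an 'rbf' model with α = γ = 0.01 computes: it solves (K + αI)c = y for the dual coefficients c, where K_ij = exp(−γ|x_i − x_j|²), and predicts with f(x) = Σ c_i·k(x, x_i). This is a standard-library illustration, not the implementation used in the cited study; the one-dimensional inputs and targets are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_kernel_ridge(X, y, alpha=0.01, gamma=0.01):
    """Return dual coefficients c that solve (K + alpha * I) c = y."""
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += alpha  # ridge regularization on the diagonal
    return solve_linear(K, y)


def predict_kernel_ridge(X_train, coef, x, gamma=0.01):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(c * rbf_kernel(x, xt, gamma) for c, xt in zip(coef, X_train))


# Hypothetical single-feature inputs and target values.
X_train = [[0.0], [10.0], [20.0], [30.0]]
y_train = [1.0, 2.0, 3.0, 4.0]
coef = fit_kernel_ridge(X_train, y_train)
fitted = [predict_kernel_ridge(X_train, coef, x) for x in X_train]
print(fitted)
```

A small α lets the model follow the training targets closely, while γ controls how quickly the influence of a training point decays with distance in descriptor space.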

Clustering Algorithm
It is difficult and time-consuming to calculate the migration barriers and ion diffusion of solid-state electrolytes, especially considering the huge material space. Unsupervised learning algorithms, which can discover the relationship between properties and X-ray diffraction (XRD) intensities, have emerged in screening lithium solid-state electrolytes. Zhang et al. found that Li8N2Se, Li6KBiO6, and Li5P2N5 (i.e., three new materials systems) had an ionic conductivity higher than 10⁻² S/cm (Zhang et al., 2019). Figure 1C shows the tree dendrogram generated using the agglomerative hierarchical clustering method. The highest conductivities appear in groups Ⅴ and Ⅵ. These new materials comprise new structures, chemistries, and compositions that are significantly different from existing chemistries. Using clustering methods to classify modified X-ray diffraction (mXRD) patterns can define every anion lattice and fully capture the anionic crystal structure information. The data contained 528 compounds, and three different models were generated to classify the results. The latter two models were set up to evaluate the robustness of the clustering algorithm. The unsupervised learning algorithm discovered a quantitative correlation between group and conductivity. The clustering model captured the physical dependence of fast solid-state Li-ion diffusion on the anion lattice and gathered excellent materials together. The unsupervised method solves the problem created by scarce datasets, unlike supervised learning models, which need labeled training data. In unsupervised learning, the labels of the training data are unknown. It relies on learning from non-labeled data to reveal the intrinsic properties and laws of the data, which form the basis of data analysis. The variances and errors of model parameters are rarely affected by the experimental method.
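Agglomerative hierarchical clustering, the method behind the dendrogram, starts with every sample in its own cluster and repeatedly merges the two closest clusters. The sketch below uses average linkage and Euclidean distance, which are assumptions made for illustration; the linkage and distance used in the cited study are not stated here, and the two-dimensional feature vectors are hypothetical stand-ins for mXRD-derived features.

```python
def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def average_linkage(points, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clustering(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge history,
    which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


# Hypothetical feature vectors for six materials.
points = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 5.0], [9.0, 0.0]]
clusters, merges = agglomerative_clustering(points, n_clusters=3)
print(clusters)
```

No conductivity labels enter the procedure; labels are only needed afterwards, to check whether known fast conductors fall into the same groups.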

DISCUSSION
The studies reviewed above used different ML algorithms to screen and discover typical solid-state electrolytes with high ionic conductivity. As shown in Figure 2D, most materials were concentrated in the range of 10⁻⁴ S/cm. In particular, Li5B7S13, Li8SeN2, KLi6BiO6, and Li5P2N5 exhibit a high ionic conductivity near 10⁻² S/cm. As a screening method for solid-state electrolytes, the NN structure was the most robust in terms of precision or recall. The neural units learn from unfamiliar input layer by layer, resulting in a high learning rate, which is why classification tasks often use the NN model. As for the clustering method, it can discover complex unseen patterns hidden in multi-dimensional data. If the clustering is sufficiently subdivided, the conductivities of materials within the same cluster will be very similar. SVR has higher prediction accuracy and stability when there is less labeled data. Compared to SVR, the LR algorithm has advantages in prediction efficiency, but its prediction accuracy fluctuates greatly. The candidates found by Gradient Boosting, Kernel Ridge Regression, and Crystal Graph Convolutional Neural Network were distributed at lower ionic conductivities. Thus, clustering and the NN method provide robust prediction and highly efficient screening for solid-state electrolytes. The ML methods are summarized in Table 2 in terms of applications, advantages, challenges, and typical references.
It is important to verify the stability of the predicted materials, which can be determined by their formation energy and energy above hull, E_hull (Ong et al., 2008; Jain et al., 2011). Theoretically, the formation energy should be negative and E_hull should be 0 eV for thermodynamically stable materials. First-principles calculations were used to obtain the formation energy and E_hull of the screening results from previous research via VASP. As shown in Figure 2E, the typical predicted materials have negative formation energy, and most of the samples exhibit a low E_hull, almost equal to zero, confirming that they are thermodynamically stable. It is worth mentioning that Li6Ho(BO3)3, with an ionic conductivity of up to 5.1 × 10⁻³ S/cm in (Sendek et al., 2018), shows both low formation energy and zero energy above the hull, and could be a promising material. In addition to the ionic conductivity, the electrochemical stability window (Wang et al., 2015) is another crucial feature in screening excellent solid-state electrolytes. Yizhou Zhu's team (Zhu et al., 2015) used first-principles calculations to analyze thermodynamics and found that solid-state electrolyte materials have a limited electrochemical window. Thus, when considering candidate solid-state electrolytes, formation energy, energy above the hull, and the electrochemical window should also be included.
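The thermodynamic criterion stated above (negative formation energy and an energy above hull of zero) amounts to a simple filter over the candidate list. The sketch below is illustrative only: the candidate names and energies are hypothetical, and the numerical tolerance `tol` is an assumed parameter introduced to absorb floating-point noise, not a threshold from the reviewed studies.

```python
def is_thermodynamically_stable(formation_energy, e_hull, tol=1e-6):
    """True when the formation energy is negative and E_hull is zero within tol (in eV)."""
    return formation_energy < 0.0 and abs(e_hull) <= tol


# Hypothetical candidates with made-up energies in eV.
candidates = [
    {"name": "candidate_A", "formation_energy": -2.10, "e_hull": 0.000},
    {"name": "candidate_B", "formation_energy": -1.35, "e_hull": 0.045},
    {"name": "candidate_C", "formation_energy": 0.20, "e_hull": 0.300},
]
stable = [
    c["name"]
    for c in candidates
    if is_thermodynamically_stable(c["formation_energy"], c["e_hull"])
]
print(stable)
```

In practice a small positive E_hull is often tolerated because metastable phases can be synthesized; loosening `tol` would model that choice.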
There are still many factors limiting the process of screening new solid-state electrolytes:
1) The training data is still not large enough to satisfy large model structures, which affects the accuracy of the screening results.
2) The constructed model is still too complicated and cannot adapt to the scarcity of material data, and the model should be as simple as possible.
3) Suitable descriptors have not yet been precisely selected, and there is still no quantitative connection between performance and the parameters of the material.
To tackle these challenges, greater efforts are needed from both materials scientists and algorithm scientists. On the one hand, materials scientists should fabricate and verify the lithium solid-state electrolytes found through ML, which will enlarge the dataset and provide better guidance for further optimization of the algorithms. On the other hand, more advanced or updated methodologies could be applied to screen lithium solid-state electrolytes according to data availability. For example, neural networks (e.g., Auto Encoders, Generative Adversarial Networks, etc.) were recently used to predict crystal and material properties (Ryan et al., 2018; Zheng et al., 2018), even in a scenario with little training data. Active learning (Gao et al., 2020) combines data labeling and model training to minimize labeling costs by prioritizing high-value data and classifying the materials based on a small amount of material data. These new algorithms have potential applications in screening for lithium solid-state electrolytes.

CONCLUSION
This review discusses the latest developments in the use of ML algorithms in screening for solid-state electrolyte materials. Some potential lithium solid-state electrolytes were predicted with high ionic conductivity and stability. We focused on demonstrating various ML algorithms, including clustering, support vector machine, neural networks, and regression, used in screening for solid-state electrolytes. In general, neural networks showed a significant advantage in screening. The challenges of using ML in screening for solid-state electrolytes include a lack of data and complicated algorithms. These could be resolved by experimentally verifying the predicted candidates and developing new ML algorithms. As a powerful simulation method, ML has accelerated the discovery of lithium solid-state electrolytes and could be developed to screen for other functional materials.