This article was submitted to Computational Materials Science, a section of the journal Frontiers in Materials
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
This paper proposes a novel neural network architecture and its ensembles to predict the critical superconductivity temperature of materials based on their chemical formula. The research describes the methods and processes of extracting data from the chemical formula and preparing the extracted data for neural network training using TensorFlow. Our approach uses recurrent neural networks with long short-term memory (LSTM) layers and neural networks based on one-dimensional convolution layers for data analysis. The proposed model is an ensemble of pre-trained neural network architectures that predicts the critical temperature of superconductors from their chemical formula. The architectures of the seven pre-trained neural networks are based on LSTM and convolution layers. The final ensemble uses six neural networks: one based on LSTM, four based on convolutional neural networks, and one embedded ensemble of convolutional neural networks. The LSTM and convolutional networks were trained for 300 epochs, and the ensembles of models for 20 epochs. All neural networks were trained in two stages, both with the Adam optimizer. In the first stage, training used the Mean Absolute Error (MAE) loss function with a learning rate of 0.001; in the second stage, the previously trained model was trained with the Mean Squared Error (MSE) loss function and a learning rate of 0.0001. The final ensemble was trained with a learning rate of 0.00001. The final ensemble model achieves an MAE of 4.068, an MSE of 67.272, and a coefficient of determination (R2) of 0.923. The final model can predict the critical temperature for a chemical formula with an accuracy of 4.068 K.
This paper presents work dealing with superconducting materials, i.e., materials that conduct current with zero resistance at temperatures equal to or below the critical temperature Tc (
The relevance of superconductors includes societal challenges related to health and wellbeing. Superconductors are used in medicine, mainly in devices for computed tomography (CT) and magnetic resonance imaging (MRI) systems, and in SQUID (Superconducting Quantum Interference Device) magnetometers (
Although the superconductivity effect can be used in many areas, the effect disappears when the temperature rises above the critical value (
In addition, superconductors are often used in electrical, research, and other systems. Superconductors are used in superconducting fault current limiters, SCFCLs, for electrical current limitation (
At present, there are two main directions in the field of superconductivity application: magnetic systems and electric machines. Much research is underway to find new superconductors with high critical temperatures (
In the research (
In the research (
In the research (
All these studies (
Therefore, this paper presents a novel technique for accurate prediction in these cases.
The aim of our research is thus to develop a model that takes the chemical formula of a material and predicts the critical superconductivity temperature for that material. Our research considers and describes an approach based on various neural network architectures and their combinations for chemical formula analysis. It considers neural networks whose structure is based on LSTM and convolution layers.
In the final neural network ensemble, six networks are used: one based on LSTM, four based on convolutional neural networks, and one embedded ensemble of convolutional neural networks. The LSTM and convolutional networks were trained for 300 epochs, and the ensembles of models for 20 epochs. All neural networks were trained in two stages, both with the Adam optimizer. In the first stage, training used the Mean Absolute Error (MAE) loss function with a learning rate of 0.001; in the second stage, the previously trained model was trained with the Mean Squared Error (MSE) loss function and a learning rate of 0.0001. The final ensemble was trained with a learning rate of 0.00001.
This article is organized as follows: after presenting the introduction in
This research used a dataset from the study (
The dataset used elements with atomic numbers up to 86, i.e., up to and including radon. For each element, 16 parameters were selected: atomic mass, number of neutrons, number of protons, period, atomic radius, electronegativity, first ionization energy, density, melting point, boiling point, number of shells, group, specific heat, is metal, is nonmetal, and is metalloid. The parameters "is metal", "is nonmetal", and "is metalloid" were represented in one-hot encoding format. These parameters were selected because they provide a precise description of each of the 86 elements considered. After selecting these parameters, the table of 86 elements was standardized over these parameters.
Machine learning algorithms lose accuracy when input values differ greatly in magnitude. The parameters of chemical elements can vary by many orders of magnitude (
Elements from the standardized table of chemical elements were substituted into the superconductor formulas from the dataset. The coefficients of the chemical elements in the formulas were not standardized. If the number of elements in a formula was less than 10, the formula was padded to 10 elements, with all parameters of the missing elements set to zero (
Two variants of arranging the chemical elements in the processed dataset were considered: sorting by their order of appearance in the chemical formula in the source dataset, and sorting by their number in the periodic table of elements. Sorting by number in the periodic table was chosen because the order in which chemical elements are written in formulas differs between fields of activity. With this sorting, the specifics of how the chemical formula is written do not affect the result of the neural network.
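As an illustration of the pre-processing described above, the following sketch encodes a formula as a fixed-length sequence, sorted by atomic number and zero-padded to 10 elements. The property values are hypothetical and truncated to 3 parameters for brevity; the real table holds 16 standardized parameters for each of the 86 elements.

```python
# Hypothetical atomic numbers and standardized parameters (illustration only).
ATOMIC_NUMBER = {"O": 8, "Cu": 29, "Y": 39, "Ba": 56}
ELEMENT_PARAMS = {
    "Y":  [0.41, -0.2, 0.9],
    "Ba": [0.77,  0.1, 1.2],
    "Cu": [0.12, -0.5, 0.3],
    "O":  [-1.3, -0.9, -1.1],
}

MAX_ELEMENTS = 10
N_PARAMS = 3  # 16 in the paper

def encode_formula(formula):
    """formula: list of (element, coefficient) pairs, e.g. YBa2Cu3O7."""
    # Sort by atomic number so the written order of the formula
    # does not affect the network input.
    ordered = sorted(formula, key=lambda ec: ATOMIC_NUMBER[ec[0]])
    # Each row: the coefficient followed by the element's parameters.
    rows = [[float(coef)] + ELEMENT_PARAMS[el] for el, coef in ordered]
    # Pad with all-zero "elements" up to the fixed length of 10.
    while len(rows) < MAX_ELEMENTS:
        rows.append([0.0] * (1 + N_PARAMS))
    return rows

x = encode_formula([("Y", 1), ("Ba", 2), ("Cu", 3), ("O", 7)])
```

With this encoding, YBa2Cu3O7 and an equivalent formula written in a different element order produce the same 10 x 4 input.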
In this subsection, the details about the neural networks used in this research are described.
LSTM neural networks, Long Short-Term Memory neural networks, are a special kind of recurrent neural networks, capable of learning long-term dependencies. They were introduced by Sepp Hochreiter and Jurgen Schmidhuber (
Structure of LSTM neuron (
LSTM predictions are always based on past network input experience. However, as the input sequence grows, the influence of data entered at the beginning of the sequence decreases compared to data entered later (
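The gate structure of an LSTM neuron can be sketched as follows. This is a minimal scalar illustration of the standard forget/input/output gate equations, not the paper's TensorFlow implementation; the weights are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input/state (the vector case is analogous).
    w maps gate name -> (input weight, recurrent weight, bias)."""
    def gate(name, act):
        wx, wh, b = w[name]
        return act(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)        # what to erase from the cell state
    i = gate("input", sigmoid)         # how much of the candidate to write
    g = gate("candidate", math.tanh)   # candidate values to write
    o = gate("output", sigmoid)        # what to expose as hidden state
    c = f * c_prev + i * g             # updated cell (long-term) state
    h = o * math.tanh(c)               # updated hidden (short-term) state
    return h, c

# Process a short sequence, carrying both states forward step by step.
w = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
```

The cell state `c` is what lets the unit carry information across many steps, which is the "long-term dependency" property discussed above.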
Deep convolutional networks provide state-of-the-art classification and regression results for many high-dimensional problems (
For different types of input data, there are different convolution variants defined by the parameters of the convolution kernel. For two-dimensional convolution, using image analysis as an example, the kernel is defined by the image resolution and the depth, i.e., the number of color channels. For one-dimensional convolution, the kernel parameters are the length of the input sequence and the depth, i.e., the number of values per sequence step.
A deep neural network consists of layers of neurons with set parameters. Within one layer, all neurons have the same convolution parameters. An example of a neuron of a one-dimensional convolutional neural network for sequence analysis of a chemical formula is shown in
One-dimensional convolution neuron.
One-dimensional convolution layer is used for analyzing two-dimensional data sequences. This type of layer creates and uses a convolution kernel that is convolved with the layer input over a single spatial or temporal dimension to produce a tensor of outputs. In the case of the formula analysis in
Neural networks are flexible and scalable algorithms that can adapt to the data used in training. However, they are trained with a stochastic learning algorithm and adapt to the specifics of the training data. So even neural networks with the same architecture, trained on the same dataset but starting from different initial weights, can find a different optimal set of weights each time they are trained, which, in turn, leads to different forecasts (
To improve learning outcomes and reduce prediction variance, an approach is used that trains multiple neural networks with different architectures on the same data and then combines them. This is called ensemble learning; it not only reduces the variance of forecasts but can also lead to forecasts that are better than those of any single model. An example of an ensemble of neural networks is shown in
Ensemble of neural networks (
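The idea of combining member predictions can be sketched as simple averaging; the member "models" below are hypothetical stand-ins for the pre-trained networks.

```python
# Minimal ensemble sketch: each "model" is a function from input to a
# prediction, and the ensemble output is the mean of the member outputs.

def ensemble_predict(models, x):
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Three hypothetical members disagreeing around a true value of 90 K:
members = [lambda x: 88.0, lambda x: 92.0, lambda x: 91.0]
tc = ensemble_predict(members, None)
```

Averaging cancels part of the member-specific error, which is why an ensemble can outperform any single model; the final ensemble in this paper additionally fine-tunes the combination with further training.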
In the training process, different methods are used to assess the quality of training and neural network performance. Prediction of the critical superconductivity temperature for a chemical formula is a regression problem. For regression problems, the following metrics are used: mean absolute error, mean squared error, root mean square error, and coefficient of determination.
MAE is a measure of the error between paired observations expressing the same phenomenon. A notable feature of MAE is its robustness to outliers in the data. MAE is calculated by the following equation:
The MSE is calculated as the mean squared difference between the predicted and actual values. The result is always positive regardless of the sign of the predicted and actual values, and the ideal value is 0.0. Squaring means that larger errors are penalized more heavily than smaller ones. MSE is calculated by the following equation:
RMSE is a measure of the difference between the values predicted by the model or estimator and the observed values. It is calculated as the square root of MSE, so large errors have a disproportionate impact on RMSE. RMSE is calculated by
The coefficient of determination, R2, is the proportion of the variance in the dependent variable that is predictable from the independent variable or variables. It estimates how well the observed outcomes are reproduced by the model, based on the proportion of total variation in outcomes explained by the model. The coefficient of determination is calculated by the following equation:
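The four metrics above can be written out directly; this is a straightforward implementation of their standard definitions (y are the observed values, p the predictions).

```python
import math

def mae(y, p):
    """Mean absolute error: (1/n) * sum |y_i - p_i|."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def mse(y, p):
    """Mean squared error: (1/n) * sum (y_i - p_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    """Root mean square error: sqrt(MSE)."""
    return math.sqrt(mse(y, p))

def r2(y, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Example: three temperatures in K and their predictions.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
```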
The processed dataset was shuffled and randomly divided into three parts: 80% training set, 10% test set, and 10% validation set. Each value within each subset was duplicated 5 times, after which the subset was shuffled. This was done so that training did not focus on the features of individual batches. Since formulas had different numbers of elements, they were padded to 10 elements with placeholder elements whose parameters and formula coefficient are all zero.
After pre-processing, the original set containing 21,263 formulas was divided into a training set of 17,010 unique formulas, a test set of 2,126 formulas, and a validation set of 2,127 formulas. Each formula in each subset was repeated 5 times, after which the subset was shuffled. As a result, the training set had 85,050 formulas, the test set had 10,630 formulas, and the validation set had 10,635 formulas.
After dividing the dataset into subsets, different neural network architectures were trained on them. Since the formulas are sequences of elements with predefined parameters, the focus was on architectures based on LSTM and one-dimensional convolution layers. The neural network input was a processed formula padded to 10 elements, each with 17 parameters: the coefficient of the element in the formula and the 16 parameters of the chemical element from the periodic table.
Models were trained in two stages, both with the Adam optimizer. In the first stage, training used the Mean Absolute Error (MAE) loss function with a learning rate of 0.001. In the second stage, the previously trained model was trained with the Mean Squared Error (MSE) loss function and a lower learning rate of 0.0001.
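The two-stage scheme, switching from MAE to MSE loss while lowering the learning rate, can be illustrated on a toy one-parameter model. The paper trains full networks with Adam; this sketch uses plain SGD on y = w * x, so all values here are hypothetical.

```python
import random

# Toy data: the true relation is y = 2 * x.
random.seed(0)
data = [(x, 2.0 * x) for x in [random.uniform(-1, 1) for _ in range(200)]]

def grad_mae(w, x, y):
    """d|w*x - y| / dw = sign(w*x - y) * x."""
    e = w * x - y
    return (1 if e > 0 else -1 if e < 0 else 0) * x

def grad_mse(w, x, y):
    """d(w*x - y)^2 / dw = 2 * (w*x - y) * x."""
    return 2.0 * (w * x - y) * x

def train(w, grad, lr, epochs):
    for _ in range(epochs):
        for x, y in data:
            w -= lr * grad(w, x, y)
    return w

w = 0.0
w = train(w, grad_mae, 0.001, 300)   # stage 1: MAE loss, learning rate 0.001
w = train(w, grad_mse, 0.0001, 300)  # stage 2: MSE loss, lower learning rate
```

The MAE stage moves quickly toward the solution but oscillates because its gradient has constant magnitude; the low-rate MSE stage then fine-tunes, mirroring the motivation for the two-stage schedule.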
Various architectures based on LSTM layers were considered during the research. The architectures and their names, which gave the most accurate results, are shown in
Architectures of recurrent neural networks.
Results of training of recurrent neural networks.
Name | Dropout | Activation | MAE (K) | MSE (K²) |
---|---|---|---|---|
R1 | 0 | None | 4.8908 | 92.1871 |
R1 | 0 | ReLU | 5.1268 | 100.8792 |
R1 | 0.05 | None | 4.8328 | 93.8598 |
R1 | 0.05 | ReLU | 5.0758 | 97.2897 |
R1 | 0.1 | None | 4.9218 | 94.8925 |
R1 | 0.1 | ReLU | 5.1089 | 96.2879 |
R2 | 0 | None | 4.9705 | 100.8791 |
R2 | 0 | ReLU | 5.1798 | 110.8971 |
R2 | 0.05 | None | 4.9518 | 99.8791 |
R2 | 0.05 | ReLU | 5.1209 | 107.9871 |
R2 | 0.1 | None | 5.0129 | 97.5872 |
R2 | 0.1 | ReLU | 5.1408 | 100.1879 |
R3 | 0 | None | 4.9287 | 96.7898 |
R3 | 0 | ReLU | 5.1287 | 110.8987 |
R3 | 0.05 | None | 4.9019 | 95.1982 |
R3 | 0.05 | ReLU | 5.0791 | 105.2847 |
R3 | 0.1 | None | 4.9271 | 95.0879 |
R3 | 0.1 | ReLU | 5.0971 | 101.2878 |
R4 | 0 | None | 4.9833 | 99.7972 |
R4 | 0 | ReLU | 5.0917 | 100.1972 |
R4 | 0.05 | None | 4.9613 | 96.4879 |
R4 | 0.05 | ReLU | 5.0796 | 104.1975 |
R4 | 0.1 | None | 4.9486 | 95.1479 |
R4 | 0.1 | ReLU | 5.0579 | 109.1871 |
According to these results, the best performance was obtained by the R1 neural network of three LSTM layers, without an activation function on the recurrent layers and with a dropout of 5% on each layer. The best results were obtained with 300 epochs and a batch size of 200.
Architectures of convolutional networks based on one-dimensional convolution layers were also considered during the research. L2 regularization with a coefficient of 0.0001 was applied to each layer to reduce overfitting. The architectures and their indexes, which gave the most accurate results, are shown in
Architectures of convolution neural networks.
Results of training of convolutional neural networks with different kernels.
Name | Kernel | MAE (K) | MSE (K²) |
---|---|---|---|
C1 | 2 | 4.7972 | 85.8867 |
C1 | 3 | 4.7287 | 85.4572 |
C1 | 4 | 4.7553 | 83.9879 |
C1 | 5 | 4.7843 | 89.4839 |
C2 | 2 | 4.8138 | 88.3587 |
C2 | 3 | 4.7628 | 89.7028 |
C2 | 4 | 4.8575 | 86.9126 |
C2 | 5 | 4.8052 | 85.8867 |
C3 | 2 | 4.8975 | 86.0501 |
C3 | 3 | 4.7559 | 85.7088 |
C3 | 4 | 4.8641 | 85.3812 |
C3 | 5 | 4.8384 | 88.4810 |
The best result for convolutional neural networks was obtained with the kernel size equal to three. The best results were obtained with the number of epochs equal to 300 and the size of the batch equal to 200.
For these architecture variants, cross-validation training was used. The source dataset of 21,263 values was divided 10 times into a 90% train subset and a 10% test subset. In addition, each value within each subset was duplicated 5 times, after which the subset was shuffled. The cross-validation train subset had 95,685 formulas and the test subset had 10,630 formulas. The best training results with cross-validation are presented in
Results of training of neural networks with cross validation.
Name | MAE (K) | MSE (K²) |
---|---|---|
R1 | 4.8009 | 79.1871 |
C1 | 4.6608 | 78.0079 |
C2 | 4.7025 | 76.4987 |
C3 | 4.6989 | 79.7912 |
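The repeated hold-out procedure described above can be sketched as follows; the sizes here are scaled down from the paper's 21,263 formulas, while the 90/10 split and the duplication factor of 5 follow the text.

```python
import random

def make_splits(items, n_splits=10, test_frac=0.1, dup=5, seed=1):
    """n_splits random (train, test) partitions of items, each value
    duplicated dup times and shuffled within its subset."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = items[:]
        rng.shuffle(shuffled)
        n_test = int(len(items) * test_frac)
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Duplicate each value, then shuffle, so batches do not
        # concentrate on the peculiarities of individual samples.
        train, test = train * dup, test * dup
        rng.shuffle(train)
        rng.shuffle(test)
        splits.append((train, test))
    return splits

splits = make_splits(list(range(100)))
```

With the paper's numbers, 19,137 unique training formulas duplicated 5 times give the 95,685-formula cross-validation train subset.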
Neural networks training process on cross-validated dataset.
The cross-validation variant of the dataset, which gave the best training results and the neural networks trained on this data, were saved and used to create an ensemble of neural networks.
Since the C2 and C3 neural networks have similar architectures, they were adapted to minimize MSE loss and combined into an ensemble. After training, the ensemble of C2 and C3 networks was inserted into the final ensemble. Model C1 was added in two variants: one adapted to minimize MAE loss and one adapted to minimize MSE loss. The final ensemble architecture built from pre-trained neural networks is shown in
Architecture of final ensemble of neural networks.
The final version of the ensemble was trained in two stages. In the first stage, training used the Mean Absolute Error (MAE) loss function with a learning rate of 0.00001. In the second stage, the previously trained model was trained with the Mean Squared Error (MSE) loss function and a lower learning rate of 0.0000001. The low learning rates are explained by the fact that the ensemble consists of already pre-trained models. The best results were obtained with 20 epochs and a batch size of 200. The ensemble training process for minimizing MAE loss is shown in
Ensemble training process.
The error of this ensemble after training is 4.068 for MAE loss and 67.272 for MSE loss. After MSE loss minimization, the coefficient of determination, R2, was calculated: R2 is 0.923. The root mean square error, RMSE, was also calculated after minimizing MSE loss: RMSE is 8.202. The final ensemble of neural networks has 1,330,247 trainable parameters.
In this research, the application of neural networks to the analysis of superconductor critical temperatures was considered. The analysis was based on the properties of chemical elements and their coefficients in the chemical formula. As a result, neural networks with convolutional and recurrent architectures were trained. The ensemble of neural networks was created as a combination of the best variants of the pre-trained architectures.
The accuracy metrics of the ensemble after training are shown in
Accuracy metrics of ensemble.
Neural network name | RMSE (K) | R2 | MAE (K) |
---|---|---|---|
Ensemble of R1, C1, C2, C3 networks | 8.202 | 0.923 | 4.068 |
R1 | 8.899 | 0.892 | 4.801 |
C1 | 8.832 | 0.897 | 4.661 |
C2 | 8.749 | 0.906 | 4.703 |
C3 | 8.932 | 0.890 | 4.699 |
The results of the prediction of this ensemble in comparison with previously developed algorithms are presented in
Comparison of different algorithms.
Algorithm | Author | Year | RMSE (K) | R2 | MAE (K) |
---|---|---|---|---|---|
Multiple regression | Kam Hamidieh | 2018 | 17.6 | 0.74 | — |
XGBoost | Kam Hamidieh | 2018 | 9.5 | 0.92 | — |
XGBoost | Abdulkadir Karacı | 2019 | 9.091 | 0.928 | — |
Hybrid neural network | Shaobo Li | 2020 | 9.141 | 0.899 | 5.023 |
Proposed model | Authors of this paper | 2020 | 8.202 | 0.923 | 4.068 |
Original value was 83.565. In the article (
The root mean square error of the developed ensemble is smaller than the minimum RMSE of the previous algorithms; the RMSE of the ensemble in this article decreased by 0.889 K.
The coefficient of determination is slightly lower than the best previous value: the R2 of the ensemble in this article decreased by 0.005. The mean absolute error of the developed ensemble is smaller than the minimum MAE of the previous algorithms; the MAE of the ensemble in this article decreased by 0.937 K.
Although the neural network ensemble is slightly inferior only in the coefficient of determination, the MAE and RMSE values decreased. Comparing the changes in MAE and RMSE with the change in the coefficient of determination allows the decrease in R2 to be considered insignificant relative to the improvement in accuracy and the decrease in MAE and RMSE.
The developed neural network ensemble, like the earlier algorithms presented in this article, is based on the same dataset. This dataset includes only the chemical formula of the superconductor and its critical temperature. However, many substances may change their internal structure depending on many factors. Adding atomic-structure information about the investigated material to the model could considerably improve the quality of models intended for the analysis of chemical compounds.
In this paper, an ensemble of neural networks was developed to predict the critical temperature of superconductors. The input to this neural network model is only the chemical formula of the material. Sorting the chemical elements in the formula by element number in the periodic table allowed the neural network to concentrate on the real parameters of the chemical elements rather than on the features of their representation in the chemical formula.
As the chemical formula represents a sequence of parameters, recurrent and convolutional neural network algorithms were used in this model. Their combined use enabled a highly accurate calculation of the target parameter, the critical temperature for a chemical formula. Our proposed neural-network-based method can be useful for searching for high-temperature superconductors.
Nowadays, superconductors are used in various fields. However, the necessity of providing low temperatures makes devices that use superconductors expensive and difficult to service. These devices are large relative to the small size of the superconductor itself, and considerable effort and material are spent simply maintaining temperatures low enough to sustain the superconductivity effect. Also, the low temperatures at which superconductivity manifests are dangerous for humans.
Given the increasing use of superconducting materials in devices that address societal challenges, the proposed research line could be applied in practice in the near future.
The raw data supporting the conclusions of this article will be made available by the authors without undue reservation.
Funding acquisition: BG-Z and AM-Z; investigation: DV, BG-Z, and AM-Z; methodology: BG-Z; project administration: BG-Z and AM-Z; resources: BG-Z; software: DV; supervision: AM-Z; writing—original draft: DV; writing—review and editing: BG-Z and AM-Z.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: