Studies of different kernel functions in nuclear mass predictions with kernel ridge regression

The kernel ridge regression (KRR) approach has been successfully applied in nuclear mass predictions. The kernel function plays an important role in the KRR approach. In this work, the performances of different kernel functions in nuclear mass predictions are carefully explored. The performances are illustrated by comparing the accuracies in describing experimentally known nuclei and the extrapolation abilities. It is found that the accuracies in describing experimentally known nuclei with most of the adopted kernels reach a similar level of around 195 keV, and that the Gaussian kernel performs slightly better than the others in the extrapolation validation over the whole range of extrapolation distances.


Introduction
Nuclear mass is important for both nuclear physics [1] and astrophysics [2,3]. During the past decades, great progress has been made in mass measurements of atomic nuclei, and about 2,500 nuclear masses have been measured to date [4]. Nevertheless, the masses of a large number of neutron-rich nuclei involved in the r-process remain unknown experimentally and cannot be measured even with the next-generation RIB facilities. Therefore, theoretical predictions for nuclear masses are imperative at present. Global mass models can be traced back to the von Weizsäcker mass formula based on the famous liquid drop model (LDM) [5]. Many efforts have been made in pursuing different possible extensions of the LDM, known as macroscopic-microscopic models, such as the finite-range droplet model (FRDM) [6] and the Weizsäcker-Skyrme (WS) model [7]. Microscopic mass models based on non-relativistic and relativistic density functional theories (DFTs) have also been developed [8][9][10][11][12][13][14][15][16][17]. The root-mean-square (rms) deviation between theoretical mass models and the available experimental data [4] ranges from about 3 MeV for the BW model [18] to about 300 keV for the WS models [7], which is still not sufficient for accurate studies of exotic nuclear structure and astrophysical nucleosynthesis [19,20]. Moreover, for neutron-rich nuclei far away from the experimentally known region, the differences among the predictions of different mass models can be as large as several tens of MeV [6][7][8][9][10][11][21][22][23].
The KRR approach was employed to improve nuclear mass predictions for the first time in Ref. [36]. It was shown that the extrapolation behavior of the KRR approach is quite different from that of other approaches, e.g., the RBF approach. The RBF approach worsens the mass descriptions for nuclei at large extrapolation distances, because the effects of its linear kernel remain large at such distances. In contrast, the KRR approach can automatically identify the limit of the extrapolation distance and avoid the risk of worsening the mass description at large extrapolation distances, thanks to the decay of the Gaussian kernel as the extrapolation distance increases. This reflects the importance of the kernel function in nuclear mass predictions with kernel-based machine-learning approaches.
There are many commonly used kernel functions in the KRR approach. Different kernels have different detailed features, which can affect the performance of the KRR approach in nuclear mass predictions. It is therefore necessary to study the effects of different kernel functions on the performance of the KRR approach in practical applications of nuclear mass prediction.
In this work, the performances of the KRR approach for nuclear mass predictions with different kernel functions, including the Gaussian, Laplacian, Matérn, Cauchy, Multiquadric, inverse Multiquadric, Logarithm, power, and inverse power kernels, are compared. The paper is organized as follows: In Section 2, the theoretical framework of the KRR approach is introduced. The numerical details are given in Section 3. In Section 4, the comparisons through the leave-one-out cross-validation and the extrapolation validation are presented. Finally, a summary is given in Section 5.

Theoretical framework
The KRR approach is a powerful machine-learning approach for non-linear regression and has been successfully applied in nuclear mass predictions [36]. In this method, the KRR function is written as

S(x_j) = Σ_{i=1}^{m} w_i K(x_j, x_i),    (1)

where x_i ≡ (N_i, Z_i) are the locations of nuclei on the nuclear chart, m is the number of training nuclei, w_i are weight parameters to be determined, and K(x_j, x_i) is the kernel function, which measures the correlation between nuclei. The weight parameters w_i are determined by minimizing the loss function

L(w) = Σ_{i=1}^{m} [S(x_i) − y(x_i)]² + λ‖w‖²,    (2)

where w = (w_1, . . . , w_m). The first term of Eq. (2) is the variance between the data y(x_i) and the KRR predictions S(x_i), and the second is a penalty term that penalizes large weights to reduce the risk of overfitting. The hyperparameter λ determines the strength of the penalty. Minimizing Eq. (2) yields

w = (K + λI)^{−1} y,    (3)

where K is the kernel matrix with elements K_ij = K(x_i, x_j), and I is the identity matrix. Nine kernel functions are adopted in the present study, i.e., the Gaussian kernel K(r) = exp(−r²/2σ²), the Laplacian kernel K(r) = exp(−r/σ), the Matérn kernel K(r) = (1 + √3r/σ) exp(−√3r/σ), the Cauchy kernel K(r) = 1/(1 + r²/σ), the Multiquadric (MQ) kernel K(r) = √(r² + σ²), the inverse Multiquadric kernel K(r) = 1/√(r² + σ²), the Logarithm kernel K(r) = ln(r^σ + 1), the power kernel K(r) = r^σ, and the inverse power kernel K(r) = 1/r^σ, where the Euclidean norm r is defined as the distance between two nuclei. The adjustable hyperparameter σ ≥ 0 in each kernel plays an important role in the performance of the corresponding kernel and should be carefully tuned according to the nuclear mass data.
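As a minimal illustration of Eqs. (1)-(3), a Gaussian-kernel KRR can be sketched as follows. This is a simplified sketch, not the authors' production code; the nucleus locations and target values used in any example are hypothetical.

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma):
    """Gaussian kernel K(r) = exp(-r^2 / 2 sigma^2), with r the
    Euclidean distance between two nuclei at (N, Z)."""
    r2 = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return np.exp(-r2 / (2.0 * sigma ** 2))

def krr_train(X, y, sigma, lam):
    """Solve w = (K + lambda I)^{-1} y, Eq. (3)."""
    m = len(X)
    K = np.array([[gaussian_kernel(X[i], X[j], sigma)
                   for j in range(m)] for i in range(m)])
    return np.linalg.solve(K + lam * np.eye(m), np.asarray(y, float))

def krr_predict(x, X, w, sigma):
    """Evaluate S(x) = sum_i w_i K(x, x_i), Eq. (1)."""
    return sum(wi * gaussian_kernel(x, xi, sigma) for wi, xi in zip(w, X))
```

With a vanishing penalty λ, the KRR function interpolates the training data exactly; the penalty term trades this exactness for robustness against overfitting.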

Numerical details
The KRR function (1) is trained to reconstruct the mass differences y(x) = M_exp(x) − M_th(x) between the experimental data and the theoretical mass model, as in Ref. [36]. The experimental masses M_exp are taken from AME2020 [4], where only nuclei with Z, N ≥ 8 and experimental errors σ_exp < 100 keV are considered; in total, 2,340 nuclei compose the entire data set. The theoretical masses M_th are taken from the WS4 mass table [7].
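As a concrete illustration, the training target is the residual between experimental and WS4 masses, and the final prediction adds the learned correction back to the WS4 value. The mass values below are hypothetical placeholders, not AME2020 or WS4 entries.

```python
import numpy as np

# Hypothetical masses (MeV) for three illustrative nuclei; the real inputs
# would be the AME2020 experimental masses and the WS4 theoretical masses.
M_exp = np.array([-49.0, -55.3, -60.6])
M_th = np.array([-48.6, -55.0, -61.1])

# The KRR network of Eq. (1) is trained on these residuals; the final
# mass prediction for a nucleus x is then M_th(x) + S(x).
y = M_exp - M_th
```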
One of the hyperparameters, the penalty strength λ, was carefully validated to be 0.3 in the KRR study of nuclear masses in Ref. [36]; this value is adopted in the present study.

Results and discussion
The main purpose of this work is to compare the performances of different kernel functions in the KRR approach for nuclear mass predictions. The performances are illustrated by comparing the accuracies of describing experimentally known nuclei and the extrapolation abilities, through the leave-one-out cross-validation and the extrapolation validation.

Leave-one-out cross-validation
The leave-one-out cross-validation is adopted to evaluate the accuracy of the KRR approach with each type of kernel function. In the leave-one-out cross-validation, for a given set of hyperparameters (σ, λ), the mass prediction for each of the 2,340 nuclei is obtained by a KRR network trained on all the other 2,339 nuclei. The rms deviation Δ_rms between the experimental and predicted masses of the 2,340 nuclei is calculated and regarded as a measure of the accuracy. The leave-one-out cross-validation has two main advantages. First, it avoids the randomness caused by random sampling in the validation-set method. Second, it matches the practical situation: when one wants to predict the mass of an unknown nucleus, the information of all the other nuclei with experimentally known masses would be used to build the model.
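The leave-one-out procedure described above can be sketched as a brute-force loop with a Gaussian kernel. This is an illustrative sketch; an efficient implementation would reuse matrix factorizations rather than re-solving the linear system 2,340 times.

```python
import numpy as np

def gaussian_K(X, sigma):
    """Pairwise Gaussian kernel matrix for nuclei located at (N, Z)."""
    X = np.asarray(X, dtype=float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def loo_rms(X, y, sigma, lam):
    """Leave-one-out rms deviation: each nucleus is predicted by a KRR
    network trained on all the other nuclei."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    errors = []
    for i in range(m):
        keep = [j for j in range(m) if j != i]
        K = gaussian_K([X[j] for j in keep], sigma)
        w = np.linalg.solve(K + lam * np.eye(m - 1), y[keep])
        # kernel values between the left-out nucleus and the training nuclei
        k_i = np.array([np.exp(-np.sum((np.asarray(X[i], float)
                                        - np.asarray(X[j], float)) ** 2)
                               / (2.0 * sigma ** 2)) for j in keep])
        errors.append(k_i @ w - y[i])
    return np.sqrt(np.mean(np.square(errors)))
```

Scanning `loo_rms` over a grid of σ values reproduces the kind of curves shown in Figure 1, whose minima identify the optimal hyperparameter for each kernel.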
In Figure 1, the Δ_rms between the KRR predictions with different kernels and the experimental data is shown as a function of the corresponding hyperparameter σ. The minima of the Δ_rms for each kernel, as shown in Figure 1, are listed in Table 1, together with the corresponding hyperparameters σ.
FIGURE 1
The Δ_rms between the KRR predictions with different kernels (A-I for the nine kernels) and the experimental data as functions of the corresponding hyperparameter σ.

As can be seen in both Figure 1 and Table 1, if the hyperparameter σ is adjusted to a proper value for each kernel, the KRR approach with most of the kernels can reduce the Δ_rms to a similar level of around 195 keV, except for the inverse power kernel, which reduces the Δ_rms only to 220 keV. Note that for the MQ kernel K(r) = √(r² + σ²), smaller σ gives smaller Δ_rms; this kernel indeed reduces to the linear kernel K(r) = r when σ = 0. It is also noted that the Δ_rms increases rapidly as the hyperparameter σ of the Logarithm kernel approaches small values. This is because the Logarithm kernel K(r) = ln(r^σ + 1) approaches the constant ln(2) and loses its predictive power when σ is small. It is found that the predictions for the even-odd (eo) and odd-even (oe) nuclei are more accurate than those for the even-even (ee) and odd-odd (oo) nuclei, which holds true for all the kernels. This is because the KRR prediction generates a smooth nuclear mass surface, which tends to average the predictions of all the nuclear masses. Generally speaking, the ee nuclei are the most bound and the oo nuclei the least bound, while the eo and oe nuclei lie in between; the smooth KRR prediction therefore describes the eo and oe nuclei better. If one wants to capture the odd-even effects well and further improve the nuclear mass predictions within the KRR framework, the adopted kernel function should be remodulated to include the odd-even effects [37]. Since the shell effects typically induce an energy change of about 10 MeV between a magic nucleus and its mid-shell isotopes, they can naturally be captured by the KRR approach at the achieved precision of around 195 keV.
The results from the leave-one-out cross-validation indicate that the KRR approach with different kernels can reach similar accuracies in interpolation or very short extrapolation, if proper values of the hyperparameters are adopted. Therefore, when predicting the masses of nuclei very close to the experimentally known region, the choice of kernel function hardly affects the prediction accuracy.

Extrapolation validation
In order to examine the extrapolation abilities of the KRR approaches with different kernels, the set of nuclei with known masses is redivided as shown in Figure 2. For each isotopic chain with Z ≥ 26, the eight most neutron-rich nuclei are removed from the training set and classified into eight test sets, corresponding to different extrapolation distances from the remaining training set in the neutron direction. This is similar to the division in Ref. [36], but for Z ≥ 26. The rms deviations Δ_rms between the experimental and predicted masses of the eight test sets are taken as a measure to compare the extrapolation abilities.

FIGURE 2
Nuclei in the training set (gray) and eight test sets (other colors) for examining the extrapolation power for the neutron-rich nuclei. The inset zooms in on the region from Z = 26 to 29.

FIGURE 3
Comparison of the extrapolation power of the KRR approach with different kernels (A-I for the nine kernels) for eight test sets with different extrapolation distances.

Figure 3 shows the Δ_rms of the eight test sets for the KRR approach with different kernels, adopting the corresponding hyperparameters listed in Table 1, in comparison with those of the WS4 mass model. First of all, for short extrapolation, i.e., extrapolation distances smaller than four, the KRR approach with all of the adopted kernels reduces the Δ_rms obtained by the WS4 mass model. For the test sets with extrapolation distances larger than four, the KRR approach with the MQ, Logarithm, and power kernels obviously worsens the WS4 predictions. This is because the corrections from the MQ, Logarithm, and power kernels increase with the Euclidean norm r. For the other six kernels, the corrections decrease with increasing r, which gives them the ability to reduce the risk of worsening the mass descriptions for nuclei at large extrapolation distances. Detailed discussions can be found in Ref. [36], where the Gaussian kernel is taken as an example.
The extrapolation performances of the six kernels whose corrections decrease with increasing Euclidean norm r are similar, except for the inverse power kernel. They improve the mass predictions for nuclei with extrapolation distances smaller than five and reduce the risk of worsening the mass predictions at large extrapolation distances. Among the adopted kernels, the Gaussian kernel performs slightly better than the others over the whole range of extrapolation distances. Therefore, the Gaussian kernel, which is commonly used in machine learning, can also be taken as a default choice in nuclear mass predictions.
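The qualitative difference between growing and decaying kernels can be illustrated numerically; the value σ = 2 below is an arbitrary choice for illustration only.

```python
import numpy as np

# Kernel values as functions of the distance r, illustrating why the MQ
# and power kernels keep growing while the Gaussian and inverse power
# kernels decay, suppressing corrections at large extrapolation distances.
sigma = 2.0
r = np.array([1.0, 5.0, 10.0, 20.0])

gaussian = np.exp(-r ** 2 / (2.0 * sigma ** 2))  # decays with r
inv_power = 1.0 / r ** sigma                     # decays with r
mq = np.sqrt(r ** 2 + sigma ** 2)                # grows with r
power = r ** sigma                               # grows with r

# A decaying kernel drives the KRR correction S(x) toward zero far from
# the training region, so the prediction falls back to the WS4 baseline;
# a growing kernel keeps applying large, uncontrolled corrections there.
```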

Summary
The performances of different kernel functions, i.e., the Gaussian, Laplacian, Matérn, Cauchy, Multiquadric, inverse Multiquadric, Logarithm, power, and inverse power kernels, in nuclear mass predictions with the KRR approach are compared, both in describing experimentally known nuclei and in extrapolating to neutron-rich nuclei. The comparison is performed through the leave-one-out cross-validation and the extrapolation validation. From the leave-one-out cross-validation, it is found that the KRR approach with most of the kernels can reduce the Δ_rms to a similar level of around 195 keV. From the extrapolation validation, it is found that the performances of the kernel functions strongly depend on their increasing or decreasing behavior with respect to the Euclidean norm r. For kernel functions that decrease with increasing r, the corresponding KRR predictions reduce the risk of worsening the mass predictions for nuclei at large extrapolation distances. Among the adopted kernels, the Gaussian kernel performs slightly better than the others in the extrapolation validation over the whole range of extrapolation distances. It is therefore suggested as the default choice in nuclear mass predictions.
In the present study, only the masses are considered as outputs to train the ML models, and thus the obtained models cannot predict other nuclear properties. However, predicting different nuclear properties at the same time can be achieved through multi-task learning. Multi-task learning (MTL) is a subfield of machine learning in which multiple related learning tasks are solved simultaneously by exploiting commonalities and differences across tasks. It has been successfully applied in nuclear physics, e.g., in the description of giant dipole resonance key parameters [58] and in the description of nuclear masses and separation energies [41]. It would be interesting to apply different kernels within the MTL framework in future works, so that the performances and reliabilities of different kernels can be evaluated on additional nuclear properties.

Author contributions
XW conceived the idea, performed the calculations, and wrote the manuscript.