A Lightweight Verification Method Based on Metamorphic Relation for Nuclear Power Software

The verification of nuclear design software commonly relies on direct comparison methods. Benchmark questions, classical programs, experimental data, and manual solutions serve as expected results, which are compared with program outputs to evaluate the reliability of the code and the accuracy of the numerical solution. Because nuclear power software numerically simulates complex physical processes, it involves many partial differential equations. It is usually challenging to construct analytical or accurate solutions, and it is expensive to develop benchmark questions and experimental data. Hence, few verification examples are available. With the direct comparison method, verification is complicated, costly, and inadequate. Entering the validation process without adequate verification adversely affects the effectiveness and efficiency of validation. Metamorphic testing is an indirect verification technology that combines the nature of the model with software verification. It evaluates the correctness of the code by examining whether the program satisfies the metamorphic relation. Because it requires neither manual solutions nor benchmark examples, it has broad application prospects in the field of nuclear power. A lightweight verification method based on metamorphic relation is proposed here. Metamorphic relations are identified from physical equations, numerical algorithms, and program specifications. They are then explicitly applied to system, integration, and unit tests to improve test adequacy. Because there is no need to develop verification examples, this method can detect code errors as early as possible at a low cost, improve test efficiency, prevent mistakes from remaining in subsequent stages, and reduce the overall cost of verification.


INTRODUCTION
The development of nuclear power software usually includes the stages of physical equation modeling, numerical method selection, and code programming. Verification evaluates whether the algorithm is suitable for the equations and whether the code accurately implements the algorithm. Verification is the prerequisite for validation. Without adequate verification, the effectiveness and efficiency of validation are substantially impaired.
Software verification usually uses direct comparison methods. Benchmark questions, classical programs, experimental data, and manual solutions serve as expected results, which are compared with program outputs to evaluate the reliability of the code and the accuracy of the numerical solution.
These verification examples are part of system-level information and can only be used for system and acceptance testing. When failures are detected at those testing levels, revealing and locating defects in functions and solvers is a great challenge. As a result, the cost is exceptionally high and may even lead to the collapse of the entire project. Because nuclear power software numerically simulates complex physical processes, it involves many partial differential equations. It is usually impossible to construct analytical or accurate solutions, and it is expensive to develop benchmark questions and experimental data. Hence, the small number of verification examples further aggravates the difficulty of nuclear software verification.
In the process of software verification, testers often implicitly check whether the code satisfies the specific characteristics of the physical equation, numerical solution method, and program specification. If these rules are violated, the code has defects and verification fails. Metamorphic testing (MT) is a rapid indirect verification method for qualitative evaluation. MT combines the evaluation of the model nature with software verification. Without manual solutions or benchmark questions, it assesses code reliability by examining whether the code satisfies the metamorphic relation (MR). It has broad application prospects in the nuclear field.
The main innovation points in this article include: 1) A lightweight verification method based on metamorphic relation has been developed. It employs MRs to rapidly evaluate code reliability at a low cost before the traditional methods estimate the solution accuracy at a high cost. The former is a supplement to the latter. 2) It makes the verification of nuclear power software more reasonable, reveals defects in the early stage of verification, and reduces the total cost of development. 3) The study of MRs helps to gain deep insight into the characteristics of equations and algorithms, improve the quality of code, and continuously increase the developer's confidence in the program. In other words, MRs are domain knowledge, and research on them helps to understand the system better and to reuse that knowledge.
Specifically, a group of metamorphic relations is identified from the characteristics of physical equations and numerical algorithms. Then, the metamorphic relations are explicitly used to evaluate whether the code keeps the specific rules of the equations and algorithms. Two types of code errors can be revealed quickly and efficiently. The first is that the code does not accurately implement the numerical algorithm, and the second is that the numerical method does not correctly solve the physical model.
To apply this method, the point-depletion code NUIT was used as the experimental object. Without verification examples, code failures were found by means of metamorphic relations. This method significantly alleviates the requirement for verification examples and improves verification efficiency and adequacy.

Direct Comparison Method
At present, nuclear power software verification usually adopts a direct comparison method, which verifies the correctness of the code by comparing the actual output with the expected result. The working principle is shown in Figure 1.
The expected results mainly come from typical benchmark questions, power plant operating data, and experimental bench data. For example, the software package NESTOR is verified by international benchmark questions, Qinshan Nuclear Power Plant Unit 1 and Unit 2 operating data, and Hualong No. 1 Unit bench data (Lu et al., 2018). Furthermore, the verification of PCM adopts benchmark questions, CPR1000/M310 power plant data, critical reactor test data, and similar software (Wang et al., 2018). Classical programs also serve as expected results, such as the ORIGEN program for burnup analysis (Hermann and Westfall 1998), APOLLO (Sanchez et al., 1988) and CASMO (Rhodes, Smith, and Lee 2006) for assembly calculation, MCNP (Brown et al., 2002) for radiation shielding, and RELAP (Andrs et al., 2012) for system analysis.

Oracle Problem
An oracle is a mechanism used to determine whether the execution result of the program under test is correct. When the expected result does not exist or is exceptionally expensive to construct, the oracle is hard to build; this is called the Oracle problem (Barr et al., 2015). Nuclear power software involves the numerical solution of many partial differential equations, for which it is usually difficult to construct analytical or accurate solutions. Furthermore, for fourth-generation reactors, such as high-temperature gas-cooled reactors, sodium-cooled fast reactors, molten salt reactors, and lead reactors, and for modern designs, e.g., high-fidelity, one-step, and multi-physics coupling methods, new-generation software has almost no comparable programs or benchmark questions. In addition, benchmark questions, power plant operating data, and experimental bench data are only applicable to specific reactor types due to differences in the neutron energy spectrum, geometric configuration, and core materials. For verification examples, the development cost is high, the cycle is long, and the quantity is small. Therefore, the Oracle problem of nuclear power software is particularly prominent.
With respect to traditional testing methods, i.e., the direct comparison method, this type of software is called a non-testable system (Patel and Hierons 2018). The Oracle problem makes nuclear power software testing insufficient. Hence, defects are hard to find, which affects the safety and economy of engineering design. For example, a sharp-jump problem was found in the classic burnup program ORIGEN when it calculates the decay chains of 239Pu and 233U (Isotalo and Aarnio 2011). If the half-life of some daughter nucleus meets a specific relationship with the burnup step length, the calculation error suddenly increases. Without adequate verification, such defects remain undetected. Generally, software verification includes four test levels: unit testing, integration testing, system testing, and acceptance testing. Each level requires its own expected results. However, benchmark questions are only applicable to acceptance testing, and expected results are seriously insufficient at the other test levels. Code bugs are hard to find early, which makes defects difficult to locate and expensive to debug and repair. The Oracle problem is essentially caused by the characteristics of nuclear power software. Developing more benchmark questions can only alleviate this problem, not solve it. Therefore, there is an urgent need to introduce new software verification technologies.

Metamorphic Testing
Most scientific computing software is untestable software (Kanewala and Bieman 2014). Software verification often implicitly checks whether the code satisfies the specific characteristics of the physical equations, numerical methods, and program specifications. If those characteristics are violated, the code contains errors and fails the test. Metamorphic testing is an indirect verification technology that combines the checking of the program's specific characteristics with software verification, without constructing verification examples. The correctness of the code is evaluated by examining whether the code satisfies the metamorphic relation (MR). Its working principle is shown in Figure 2.
MRs are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs (Chen et al., 2018; Chen and Tse 2021). For example, suppose a program P implements the sine function. It is hard to construct an oracle to determine whether P(x) is correct. However, by applying its periodicity, i.e., sin(x) = sin(x + 2π), an MR can be obtained as follows: if x2 = x1 + 2π, then P(x2) = P(x1). As a result, for a group of inputs that satisfy this input pattern, if the results of the two executions violate the output pattern, P does not agree with the MR. In other words, P conflicts with a basic property of sine, and thus P has a failure. MRs are essential properties that are meaningful for software verification, and codes should abide by them.
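The sine example can be sketched in a few lines of Python. This is a minimal illustration rather than part of the proposed method: `check_periodicity_mr`, `buggy_sin`, and the tolerance value are hypothetical names and choices introduced here, and the tolerance is needed only because floating-point arithmetic cannot reproduce P(x2) = P(x1) exactly.

```python
import math


def check_periodicity_mr(program, source_inputs, tol=1e-9):
    """Return the source inputs x1 for which P(x1 + 2*pi) differs from P(x1)."""
    violations = []
    for x1 in source_inputs:
        x2 = x1 + 2.0 * math.pi  # follow-up input built from the input pattern
        if abs(program(x2) - program(x1)) > tol:  # output pattern P(x2) = P(x1)
            violations.append(x1)
    return violations


def buggy_sin(x):
    """Hypothetical faulty P: truncated Taylor series without range reduction."""
    return x - x**3 / 6.0 + x**5 / 120.0


inputs = [0.1 * k for k in range(10)]
correct_violations = check_periodicity_mr(math.sin, inputs)
buggy_violations = check_periodicity_mr(buggy_sin, inputs)
print(len(correct_violations), len(buggy_violations))
```

No expected value of sin(x) is needed anywhere in the check; the faulty implementation is exposed purely because its two outputs disagree with each other.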
Metamorphic testing is one of the effective means to address the Oracle problem (Chen et al., 1998; Liu et al., 2014; Segura et al., 2018; Kanewala and Yueh Chen 2019). Studies have shown that MT has the advantages of reasonable cost and a stronger ability to expose errors (Hu et al., 2006). It is used for software verification, software validation, and software quality assurance (Segura and Zhou 2018). Furthermore, it appears to be the only technique applicable to all three areas of verification, namely testing, proving, and debugging (Chen and Tse 2021). MT has broad application prospects in the nuclear field.

LIGHTWEIGHT VERIFICATION METHOD BASED ON METAMORPHIC RELATION
To relieve the Oracle problem, this paper develops a lightweight verification method based on metamorphic relation. The MR hierarchical classification model identifies MRs from the specific properties of physical equations, numerical algorithms, and program specifications. These MRs are then applied to system testing, integration testing, and unit testing, respectively, to improve the adequacy of testing. Because there is no need to develop verification examples, this method can reveal code failures at the earliest opportunity. As a result, it improves verification efficiency at a lower cost. In addition, it is a necessary supplement to the traditional verification technology.
NUIT is a burnup calculation code independently developed by the Institute of Nuclear and New Energy

MR Identification Model
The MR is the key element of MT. According to the current research literature (Sun et al., 2019; Segura et al., 2016), there are several MR identification techniques, such as machine-learning-based, search-based, pattern-based, and data-mutation-based techniques, as well as the composition of existing MRs. We divide them into two categories, namely static analysis and dynamic discovery, according to whether the program under test is executed. The former does not execute the program and derives MRs by analyzing the properties of physical equations, numerical algorithms, and program specifications. The latter reveals MRs from inputs and outputs. Because these relations are fitted from data, their validity has not been proved theoretically, and they are therefore called likely relations. However, they can provide heuristic information for MR identification. On this basis, an abstract MR identification model has been constructed, as illustrated in Figure 3. This model has four types of MR, i.e., physical model, computational model, code model, and likely MR. In addition, each MR should be formally described with the template approach (Segura et al., 2017).

The Verification Processes
It is assumed that a group of MRs has been obtained. The lightweight verification method includes two core stages: MR identification and program evaluation. Specifically, the main activities are illustrated by the following examples, beginning with the analysis of the nature of the physical equation.
Example 1: In the case of the fission reaction, the density of 135Xe gradually increases until production and consumption reach a dynamic balance after about 2-3 days, and it does not change afterwards. According to this rule, we can identify a physical model MR. Specifically, suppose t is the burnup time, D(t) is the nuclide density of 135Xe, and T is the time threshold at which the reaction reaches balance. Before balance, t1, t2 < T: if t1 < t2, then D(t1) < D(t2). After balance, t1, t2 > T: if t1 < t2, then D(t1) ≤ D(t2). We construct two sets of test inputs. In one set the total time is less than the balance time, and in the other it is greater than the balance time. A failure is detected if the density of 135Xe violates the MR.
Example 2: One property of the numerical algorithm is that the nuclide density should change smoothly with the burnup step. The corresponding computational model MR is described as follows. Here, t1 and t2 are burnup times, D(t) is the nuclide density, and T is the error threshold. If t2 is the step next to t1, then |D(t1) − D(t2)| < T. A set of test inputs with continuously changing burnup step length is constructed, and a failure is found if the absolute deviation is greater than the threshold.
Example 3: After studying the program specification of the matrix exponential method, we find the rule that the result should not be affected by the ordering of the nuclides in the matrix. Hence, a code model MR is obtained. Assume that o is the sorting rule, D(o) is the nuclide density when the burnup matrix is sorted by rule o, and T is the error threshold. If o1 and o2 are different, then |D(o1) − D(o2)| < T. The burnup matrix is then ordered with three rules: ascending, descending, and random. A failure exists if the actual outputs change beyond the threshold.
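The three example relations can be expressed as small predicates. The following Python sketch is illustrative only: `toy_xe135_density` and `toy_solve` are hypothetical stand-ins for the program under test (they are not NUIT), and the decay chain, step sizes, and thresholds are assumptions chosen so that the example runs on its own.

```python
import math
import random


def toy_xe135_density(t_days, d_eq=1.0, rate=1.0):
    """Hypothetical stand-in for the code under test: build-up toward balance."""
    return d_eq * (1.0 - math.exp(-rate * t_days))


def mr_physical(density, times, balance_time):
    """Example 1: strictly increasing before balance, non-decreasing after it."""
    for t1, t2 in zip(times, times[1:]):
        d1, d2 = density(t1), density(t2)
        if t2 < balance_time and not d1 < d2:
            return False
        if t1 > balance_time and not d1 <= d2:
            return False
    return True


def mr_smoothness(density, times, threshold):
    """Example 2: densities at adjacent burnup steps differ by less than T."""
    return all(abs(density(t1) - density(t2)) < threshold
               for t1, t2 in zip(times, times[1:]))


def toy_solve(matrix, n0, t, steps=1000):
    """Hypothetical solver for dn/dt = A n using explicit Euler steps."""
    n = list(n0)
    dt = t / steps
    for _ in range(steps):
        n = [n[i] + dt * sum(matrix[i][j] * n[j] for j in range(len(n)))
             for i in range(len(n))]
    return n


def mr_ordering(solve, matrix, n0, t, order, threshold):
    """Example 3: reordering the nuclides must not change their densities."""
    reference = solve(matrix, n0, t)
    permuted_matrix = [[matrix[i][j] for j in order] for i in order]
    permuted_n0 = [n0[i] for i in order]
    permuted = solve(permuted_matrix, permuted_n0, t)
    # permuted[k] holds the density of the nuclide originally at index order[k]
    return all(abs(permuted[k] - reference[i]) < threshold
               for k, i in enumerate(order))


times = [0.25 * k for k in range(41)]  # 0 to 10 days in quarter-day steps
ok_physical = mr_physical(toy_xe135_density, times, balance_time=3.0)
ok_smooth = mr_smoothness(toy_xe135_density, times, threshold=0.25)

chain = [[-1.0, 0.0, 0.0], [1.0, -0.5, 0.0], [0.0, 0.5, 0.0]]  # assumed 3-nuclide chain
random.seed(0)
shuffled = [0, 1, 2]
random.shuffle(shuffled)
ok_order = all(mr_ordering(toy_solve, chain, [1.0, 0.0, 0.0], 2.0, order, 1e-12)
               for order in ([0, 1, 2], [2, 1, 0], shuffled))
print(ok_physical, ok_smooth, ok_order)
```

Each predicate compares outputs of the program with one another, so none of them needs a reference density value.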

Automation Execution Algorithm
Suppose that a set of MRs has been obtained. The automated execution algorithm is as follows: 1) Read a metamorphic relation. 2) Generate a set of test inputs according to the input pattern r, and drive the program under test to obtain the calculation outputs. 3) Evaluate whether those results comply with the output pattern R. If R is violated, the verification fails and the process ends. 4) Otherwise, check whether any metamorphic relation has not yet been applied. If none remains, terminate the process; otherwise, repeat activities 1)-3).
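The loop above can be sketched as follows. This is a minimal illustration under stated assumptions: the `MetamorphicRelation` class, the two sample relations, and the two toy programs are hypothetical and only show how an input pattern r and an output pattern R might be paired and executed.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class MetamorphicRelation:
    name: str
    generate_inputs: Callable[[], List[float]]  # input pattern r
    check_outputs: Callable[[Sequence[float], Sequence[float]], bool]  # output pattern R


def run_metamorphic_tests(program, relations) -> Optional[str]:
    """Return the name of the first violated relation, or None if all hold."""
    for relation in relations:  # activities 1) and 4): take the next unused MR
        inputs = relation.generate_inputs()  # activity 2): build test inputs
        outputs = [program(x) for x in inputs]  # activity 2): run the program
        if not relation.check_outputs(inputs, outputs):  # activity 3)
            return relation.name  # verification fails and the process ends
    return None


def non_decreasing(inputs, outputs):
    return all(a <= b for a, b in zip(outputs, outputs[1:]))


def bounded_by_one(inputs, outputs):
    return all(0.0 <= d <= 1.0 for d in outputs)


RELATIONS = [
    MetamorphicRelation("non_decreasing",
                        lambda: [0.5 * k for k in range(20)], non_decreasing),
    MetamorphicRelation("bounded",
                        lambda: [0.0, 1.0, 10.0, 100.0], bounded_by_one),
]


def good_program(t):
    """Hypothetical program under test: saturating build-up."""
    return 1.0 - math.exp(-t)


def faulty_program(t):
    """Hypothetical defect: the density drops after five time units."""
    return 1.0 - math.exp(-t) if t < 5.0 else 0.5


passed = run_metamorphic_tests(good_program, RELATIONS)
failed = run_metamorphic_tests(faulty_program, RELATIONS)
print(passed, failed)
```

Stopping at the first violation mirrors the description above; a variant that collects every violated relation would be an equally reasonable design when a full defect report is wanted.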

CASE STUDY
The burnup program describes the law of nuclide density changes over time. It is an essential part of the reactor's physical design. It plays a crucial role in calculating the breeding and consumption of fuel in the reactor and changes in reactivity. The density of a particular nuclide can be expressed by Eq. 1.
Here, n_i is the density of nuclide i, l_ij is the production rate of nuclide i from the decay of nuclide j, λ_j is the decay constant of nuclide j, φ is the space- and energy-averaged neutron flux, f_ik is the production rate of nuclide i from the fission of nuclide k, and σ_i is the average neutron absorption cross-section of nuclide i. The burnup equation can also be rewritten in matrix form, as shown in Eq. 2, where A is the coefficient matrix of the N-order nuclide depletion equation, and N is the number of nuclides.
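A form of Eq. 1 and Eq. 2 that is consistent with these symbol definitions is sketched below. It assumes the usual point-depletion formulation used by ORIGEN-type codes; the initial density vector $\mathbf{n}_0$ is notation introduced here for illustration.

```latex
\begin{equation}
\frac{\mathrm{d} n_i}{\mathrm{d} t}
  = \sum_{j} l_{ij} \lambda_j n_j
  + \phi \sum_{k} f_{ik} \sigma_k n_k
  - \left( \lambda_i + \phi \sigma_i \right) n_i ,
  \qquad i = 1, \dots, N
\end{equation}

\begin{equation}
\frac{\mathrm{d} \mathbf{n}}{\mathrm{d} t} = A \mathbf{n},
  \qquad \mathbf{n}(0) = \mathbf{n}_0
\end{equation}
```

Under this assumption, the formal solution of the matrix form over a step with constant A is $\mathbf{n}(t) = \exp(A t)\,\mathbf{n}_0$, which is the quantity that matrix exponential methods such as CRAM approximate.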

Experiment
Twenty-eight MRs have been identified from NUIT using the static analysis technique, of which eighteen are physical model MRs and the rest are computational model MRs (Li et al., 2020a; Li et al., 2020b; Li et al., 2021). Specifically, they are listed as follows.
The input parameters of NUIT mainly include initial fuel enrichment, mass, burnup step length, step unit, and the number of steps. The burnup calculation types include pure decay, constant flux, and constant power. The parameters constrained by the calculation type include neutron fluence rate and power. The solvers mainly include TTA and CRAM. The solver parameters include approximation order and truncation threshold. The output parameters mainly include nuclide density, radioactivity, neutron reaction rate, neutron absorption rate, decay heat, and other physical quantities.
According to the analysis of the physical model MRs, the adjustable input parameters include fuel enrichment, mass, total burnup time, neutron flux, and power. Since the neutron flux and power can be converted to each other and the obtained properties are equivalent, only power is considered here. The source test case of MT is the verification example from the user manual. The burnup database is the nuclide database of the high-temperature gas-cooled reactor (HTGR). The solver is CRAM. The initial values of the other parameters are as follows: the fuel enrichment is 8.5 percent, the mass is one ton, and the power is 20 MW. The total burnup time is 340 days, of which the step length of the first stage is 1 day, that of the second stage is 4 days, and that of the third stage is 12 days, with twenty steps in each stage. The nuclide density is selected as the output parameter. Figure 4 illustrates the trend relation between the density of some nuclides and the burnup level. Some relations can be expressed by a linear function y = ax + b, such as those of 135Cs and 235U; the coefficient a is greater than zero for the former and less than zero for the latter. Others can be described by a quadratic function y = ax^2 + bx + c, for example, those of 237Np and 135Xe; the parameter a is greater than zero for the former and less than zero for the latter. These observations can guide MR identification.
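One way such observations could be turned into heuristic hints for likely MRs is to classify a sampled output curve by the signs of its first and second differences. The sketch below is an assumption-laden illustration: `classify_trend` is a hypothetical helper, the samples are assumed to be equally spaced in burnup, and the data are synthetic rather than NUIT output.

```python
def classify_trend(values, tol=1e-12):
    """Classify equally spaced samples by direction and curvature."""
    first = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(first, first[1:])]

    if all(d > tol for d in first):
        direction = "increasing"
    elif all(d < -tol for d in first):
        direction = "decreasing"
    else:
        direction = "non-monotonic"

    if all(abs(d) <= tol for d in second):
        curvature = "linear"   # behaves like y = ax + b
    elif all(d >= -tol for d in second):
        curvature = "convex"   # behaves like a quadratic with a > 0
    elif all(d <= tol for d in second):
        curvature = "concave"  # behaves like a quadratic with a < 0
    else:
        curvature = "mixed"
    return direction, curvature


# Synthetic curves that mimic the four trend shapes described above.
linear_up = [2 * x + 1 for x in range(10)]
linear_down = [100 - 3 * x for x in range(10)]
quadratic_convex = [x * x for x in range(10)]
quadratic_concave = [-(x - 20) ** 2 for x in range(10)]

for curve in (linear_up, linear_down, quadratic_convex, quadratic_concave):
    print(classify_trend(curve))
```

A label such as ("decreasing", "linear") only suggests a candidate relation; as with any likely MR, it still has to be justified from the physical equation before it is used for verification.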
Assuming that the input pattern of the MR is an inequality, the single-factor approach is used to design test cases, i.e., only one parameter changes at a time. To describe the physical laws accurately, the number of samples is more than 20. Therefore, the design results are as follows: 1) the fuel enrichment ranges from 1 to 20 percent, increasing by 1 percent each time; 2) the fuel mass ranges from 500 to 10000 kg, increasing by 500 kg each time; 3) the power ranges from 20 to 210 MW, increasing or decreasing by 10 MW each time. To sum up, a total of 160 test cases are designed.
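The single-factor design can be sketched as a small generator. The dictionary keys and the function name below are hypothetical, and the base values are taken from the source test case described earlier (8.5 percent enrichment, one ton, 20 MW). The sketch covers only the three parameter sweeps enumerated above and does not attempt to reproduce the remaining cases of the full test suite, which are not specified in the text.

```python
BASE_CASE = {"enrichment_percent": 8.5, "mass_kg": 1000.0, "power_mw": 20.0}

SWEEPS = {
    "enrichment_percent": [float(v) for v in range(1, 21)],  # 1 to 20 percent
    "mass_kg": [float(v) for v in range(500, 10001, 500)],   # 500 to 10000 kg
    "power_mw": [float(v) for v in range(20, 211, 10)],      # 20 to 210 MW
}


def single_factor_cases(base, sweeps):
    """Vary one parameter at a time while the others keep their base values."""
    cases = []
    for name, values in sweeps.items():
        for value in values:
            case = dict(base)   # copy so that the base case is never modified
            case[name] = value
            cases.append(case)
    return cases


cases = single_factor_cases(BASE_CASE, SWEEPS)
for name, values in SWEEPS.items():
    print(name, len(values))
```

Because every generated case differs from the base case in at most one parameter, any violation of an inequality-type MR can be attributed to that parameter alone.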

Result
A total of forty-six defects were found, of which thirteen bugs were contributed by the lightweight method. After careful analysis, we divide the defects of NUIT into three categories. The first is that the code does not accurately implement the numerical algorithm. The second is that the numerical algorithm does not correctly solve the physical equation; this makes the applicable scope of the code narrower than that agreed in the requirements document. The last is that the parameters of the algorithm are set inappropriately for specific calculation conditions. The number of first-type errors is thirteen, and the number of second-type errors is three. Half of them are contributed by the lightweight method.
For example, 1) when short half-life nuclides, such as 134Cs, 242Cm, and 244Cm, are solved by the TTA method, it is necessary to shorten the burnup step length; otherwise, the deviation increases significantly. 2) Because it is time-consuming and produces significant deviation, the TTA method is not suitable for solving non-homogeneous burnup equations. 3) Matrix exponential algorithms, like CRAM, QRAM, LPAM, and others, are more stable and reliable under constant power than under constant neutron flux. 4) Similarly, their instantaneous results are better than their integral results.

DISCUSSION
The code model MR is applied in unit testing to evaluate whether the code correctly follows the program design specifications. Next, the computational model MR is employed in integration testing to assess whether the code accurately implements the numerical solution algorithm. Moreover, the physical model MR is used in system testing to ensure that the code correctly reflects the physical equations. Compared with the traditional verification model, this paper clearly defines the nature of the verification activity. It makes the implicit evaluation of the program's properties explicit. Furthermore, by taking advantage of MT, we can perform qualitative verification of nuclear power software without benchmarks at a low cost. Such compliance testing should be performed before any quantitative examination at every test level.
The lightweight verification method has the following advantages: 1) It assumes that the program accurately implements the numerical algorithm. 2) The properties of numerical algorithms and physical equations are high-order rules that are independent of the specific implementation of the code. The program should keep these high-order rules whether the programming language is Python or C/C++, and whether the mathematical library is Intel MKL, OpenBLAS, or Eigen. Therefore, lightweight verification has broader applicability and stronger reusability, which helps to improve the evaluation level of nuclear power software. It has important practical significance for shortening software certification time.
With advantage 1, the actual testing of NUIT took only 3 months, and no verification examples had to be developed to improve test coverage. Based on advantage 2, the physical MRs are applied to the different solvers of NUIT, such as TTA, CRAM, and others, which reduces the test time significantly. Moreover, MRs identified from NUIT can be used to verify other burnup calculation programs, such as KYLIN-2 developed by NPIC.
To sum up, the lightweight verification method based on MR alleviates the Oracle problem better than the traditional direct comparison method. At a lower cost, it increases test adequacy, reveals code bugs in the early stage of verification, and avoids leaving defects to subsequent testing levels. Because it reduces the cost of defect location and repair and improves the efficiency of research and development, it has broad application prospects in nuclear power software verification.
The main limitations of this method come from MR and source test cases. At present, MR identification technology mainly depends on manual analysis and inference, so data-driven MR mining technology is a promising research direction.