Application of Generalized (Hyper-) Dual Numbers in Equation of State Modeling

The calculation of derivatives is ubiquitous in science and engineering. In thermodynamics, in particular, state properties can be expressed as derivatives of thermodynamic potentials. The manual differentiation of complex models can be tedious and error-prone. In this work, we revisit dual and hyper-dual numbers for the calculation of exact derivatives and show generalizations to higher order derivatives and derivatives with respect to vector quantities. The methods described in this paper are accompanied by an open source Rust implementation with Python bindings. Applications of the generalized (hyper-) dual numbers are given in the context of equation of state modeling and the calculation of critical points.


INTRODUCTION
The calculation of derivatives is required in many branches of physics and engineering. In particular, derivatives are required in solvers for nonlinear equations or optimization problems. Accurate calculations of derivatives are even more important in cases where the derivatives are part of the model itself, rather than just appearing in algorithms that often do not suffer much from appropriate approximations to the Jacobians. The ideal solution would be to not approximate at all and implement all required derivatives of the model by hand. This approach guarantees accurate results and efficiency. However, for complicated models, the probability of errors in the implementation becomes high. The process can be aided by symbolic math toolboxes and automatic code generation. An entirely different approach that requires no additional implementation within the model, is the use of numerical derivatives like forward or central differences, and higher order methods. The simplicity of the approach comes at the price, that the result is inherently an approximation and the achievable precision depends on the chosen step size. For small step sizes, the calculation can become unstable due to machine precision. For large step sizes, the calculation is stable but less accurate. To find the optimal step size for an arbitrary problem can be challenging.
Exact results without differentiation by hand can be obtained using automatic differentiation (Griewank and Walther, 2008). In this approach, the calculations in the model are represented by a calculation tree, that contains all intermediate results and their derivatives. The individual steps in this tree consist of algebraic operations or elementary functions for which the derivatives are known. Depending on the order with which the derivatives are calculated, the methods are referred to as forward or backward accumulation. The method of using (hyper-) dual numbers (Fike and Alonso, 2011) and the generalizations introduced in this work can be understood as an automatic differentiation using operator overloading and forward accumulation. Automatic differentiation with dual numbers is similar to the wider known complex step differentiation (Martins et al., 2003). However, aside from the availability of complex numbers in many programming languages, there is no advantage whatsoever to use complex step differentiation above dual numbers: complex step differentiation can be seen as an improved numerical derivative that alleviates the problems at small step sizes, whereas dual number differentiation always results in exact derivatives within machine precision. Automatic differentiation using (hyper-) dual numbers has been used in various fields of engineering and science such as solving optimal control problems (Agamawi and Rao, 2020) or formulating equations of motion for rigid-and multi-body systems (Cohen and Shoham, 2015;Cohen and Shoham, 2017). Yet, the concept of automatic differentiation with dual numbers is rarely used in chemical engineering and thermodynamics in particular, with the exception of the work of Diewald et al. (2018) and Rehner et al. (2019). With this review of the mathematical background and the openly available implementation in Rust and Python, we aim to change that.
This manuscript is structured as follows: in section 2 the properties of dual and hyper-dual numbers are briefly revisited and a generalization to higher derivatives and vector valued gradients and Hessians is demonstrated. Section 3 outlines the implementation in Rust and Python that is available on GitHub. Section 4 shows applications for generalized (hyper-) dual numbers within thermodynamics and equation of state modeling. This is a particular interesting application for automatic differentiation as state properties can be identified as partial derivatives of thermodynamic potentials.

METHODS
This section aims to explain why dual and hyper-dual numbers are useful in the context of automatic differentiation by establishing an isomorphism between the number and derivative calculus. Isomorphism refers to the observation that all derivation rules have a counterpart in the arithmetic of the (hyper-) dual numbers. With this link established, it is shown how it can be reverted to obtain generalized (hyper-) dual numbers for the calculation of higher order and partial derivatives, gradients, Jacobians and Hessians.

Dual Numbers
Similar to complex numbers, dual numbers are formed by adjoining a new element ε with the property ε 2 0. A dual number x can thus be written in the form The product of x with a dual number y y 0 + y 1 ε is For the application in thermodynamic modeling and other parts of physics, the most interesting property, however, is the exact Taylor expansion All higher order terms cancel since they contain the factor ε 2 0. Therefore, the derivative of a (real) function f (x) can be calculated exactly by evaluating the function using dual numbers and taking the ε-component f 1 of the result, as (4) As opposed to numerical differentiation methods (forward, backward, or central differences), calculating the derivative with dual numbers does not introduce an error by truncating the Taylor expansion (the truncation is exact for dual numbers). Therefore the necessity to find an optimal step size is eliminated and we can simply choose x 1 1.
The multiplication, Eq. 2, of two dual numbers follows the same rules as the product rule in calculus and the Taylor expansion, Eq. 3, is analogous to the chain rule. With the chain rule, the product rule and the elementwise addition, the derivatives of arbitrarily complex functions are known. Therefore, dual numbers form an isomorphism to the first derivative, and the derivatives of any model, that is written using analytical functions can be calculated exactly using dual numbers. As an example, the division can be written as a product of the numerator with the divisor raised to the power of minus one: The result of this composition is the quotient rule in calculus. Additional functions like exponentials e x e x0 + x 1 e x0 ε or logarithms follow from Eq. 3.

Hyper-Dual Numbers
As an extension to dual numbers, hyper-dual numbers were introduced by Fike and Alonso (2011). Hyper-dual numbers contain two extra dimensions ε 1 and ε 2 with the property ε 2 1 ε 2 2 (ε 1 ε 2 ) 2 0. Thus a hyper dual number x can be written as For a function f (x, y), the exact Taylor expansion using hyper dual numbers is Therefore, hyper dual numbers can be used to calculate exact second partial derivatives f xy (x, y) by setting x 1 y 2 1 and all other derivative parts to zero. The first partial derivatives f x (x, y) and f y (x, y) are always calculated simultaneously. The product of two hyper dual numbers x and y xy x 0 y 0 + x 0 y 1 + x 1 y 0 ε 1 + x 0 y 2 + x 2 y 0 ε 2 + x 0 y 12 + x 1 y 2 + x 2 y 1 + x 12 y 0 ε 1 ε 2 (10) again corresponds to the product rule for partial derivatives and the chain rule can be written as To calculate derivatives using dual or hyper-dual numbers, Eqs. 3, 11 are used to implement the required elementary functions (exponentials, powers, trigonometric functions, etc.) and Eqs. 2, 10 are used to overload the multiplication operator. Then, by the virtue of Eqs. 4, 9, the derivatives of arbitrary functions are available by setting the first derivative part(s) of the input variable(s) to 1.

Generalizations
A detailed mathematical investigation of the properties of dual and hyper dual numbers is important in a general context. However, from a science and engineering point of view, the isomorphism between the number and properties of derivatives is all that matters. Dual numbers are usually presented as numbers with certain properties. Then, by identifying the isomorphism to the chain and product rules, the exact calculation of derivatives follows as an application. There is nothing stopping us from reversing the process: start with a certain application, like the calculation of third or higher order derivatives and generate a number that is isomorphic to the chain and product rules. In other words, we think about the number as data structures, that contain a function value and its corresponding derivatives simultaneously.

Higher Order Ordinary Derivatives
For ordinary derivatives of an arbitrary order all derivatives up to the highest order are required for intermediate steps.
Therefore, a generalized dual number of order K can be written as where the additional dimensions are referred to with the letter ] as opposed to ε to indicate, that they represent additional derivative orders instead of additional variables. The multiplication of two generalized dual numbers x and y can be written as The general expression of the chain rule requires the application of Faà di Bruno's formula where the inner sum sums over all tuples of non-negative integers (k 1 , . . ., k i ) with i m 1 mk m i. The general implementation is cumbersome and presumably slow. Therefore it is appropriate to implement these generalized dual numbers for a fixed value of K. For K 3, the product rule and the chain rule simplify accordingly. The i-th derivative (i ≤ K) of a real function f (x) can be identified as the ] i component of f evaluated using generalized dual numbers.

Gradients and Jacobians
With marginal changes to the formulation, a vector dual number can be defined that is able to calculate the gradient of a function with respect to an arbitrary amount of variables. To do so, the scalar dimension ε is replaced with a vector ε ε 1 . . . ε N ⊺ with N the number of variables. The vector dual number x can then be written as The product rule and the chain rule are modified accordingly. The most relevant use case is that for a function f (x) of N variables x x 1 . . . x N ⊺ , ε can be chosen as an array of N elements. Then the gradient ∇f can be determined from the dual part of f (x 0 + ε): For array valued functions F, this generalizes to the Jacobian J, as

Vectorized Second Partial Derivatives and Hessians
Vector hyper dual numbers can also be defined and allow the calculation of second partial derivatives with respect to an arbitrary number of variables. Both ε 1 and ε 2 are vectors and thus the vector hyper dual number x in general can be written as with the product rule and the chain rule To save computation time for the case of simple Hessians, only one of the first derivatives needs to be accounted for as they are identical. The vector hyper dual number can then be written as (26) instead. The Hessian H (and the gradient that comes for free) of a scalar function f(x) can then be calculated from

Higher Order Partial Derivatives
Higher order derivatives can be added by implementing new dual numbers with additional fields for the derivatives and extended sets of product and chain rules. However, by allowing the elements of any generalized dual number to be a generalized dual number themselves, higher order derivatives can be obtained without additional implementation work (Szirmay-Kalos, 2021). This becomes apparent, when a hyper dual number x is written as showing that it can simply be represented as a dual number for which the coefficients are themselves dual numbers. By avoiding redundancies in the calculation, a dedicated implementation for hyper dual numbers can be expected to be faster than the recursive version. However, the recursive implementation offers great flexibility when it comes to higher order derivatives for which a dedicated implementation becomes tedious. An example for which the usage of recursive dual numbers is useful is given in section 4.2.

IMPLEMENTATION
Our implementation is done in the Rust programming language, an open source language that was released in its first stable version in 2015 and has been increasingly adapted and used since then. It is an attractive choice for a numerical library because it is a strongly typed, compiled language that generates efficient machine code while offering high-level language constructs (e.g. pattern matching), elegant error handling, allows for generic programming and has automatic code creation features (macros). A very valuable feature of the Rust compiler is the so-called borrow checker that prevents operations that would lead to undefined behavior (null or dangling pointers) or data races at compile time.
Besides the compiler, Rust ships with Cargo, a package manager that resolves dependencies of external packages, compiles the code and its dependencies (by calling the compiler, rustc), runs tests and benchmarks and builds the documentation. The bindings to Python are written in pure Rust using PyO3, a Rust package that allows to build native Python extension modules.
Dual numbers as presented in this work are implemented in Rust as structured types (structs, similar to classes in other languages). Different from other object oriented languages, Rust provides inheritance and polymorphism in form of traits, i.e. a collection of methods that extend the data type's functionality. Overloading of operators is realized by implementing the respective traits e.g. the Add trait for addition, Sub for subtraction, and so on.
Dual numbers are generic over their inner data types and the number of variables. The generic data type allows to specify the precision, but also to define recursive dual numbers. The number of variables is specified using Rusts min_const_generics feature. Therefore, the size of the arrays is known at compile time and scalar dual numbers can be obtained as a special case of the generic vector dual numbers without any overhead. Our library defines a trait, DualNum, which is implemented for all dual numbers as well as floating point numbers (f64 and f32), and contains a number of useful traits for numerical operations (binary arithmetic operations, trigonometric functions, exponential and logarithm as well as spherical Bessel functions, comparison operators and so on). This trait can be used to write generic functions whose arguments and results can be any dual number types, and-since the compiler generates a version for each dual number type at compile time (a concept called monomorphization)-no checks at runtime are necessary. Since the DualNum trait is build upon the more generic Num trait (which is part of the Rust num package), dual numbers work seamlessly with other Rust packages that offer functions or structures parameterized over Num. For example, it is possible to write a generic function that uses the fast Fourier transform (using the RustFFT package) for which (partial) derivatives can be computed by simply calling the function with a dual number as argument instead of a floating point number.
Data types with generic parameters cannot be directly exposed to Python, since these parameters (and consequently the size of the data type) have to be known at compile time. Therefore, we offer several specific parameter combinations as Python classes, e.g. scalar dual and hyper dual numbers with 64 bit floating point numbers as inner data types. The Python data types work seamlessly with most of numpy's math functions so that existing code rarely has to be modified. The source code for the Rust package including the Python bindings is available on GitHub under the name num-dual. The code is also published on the Rust package repository crates. io and the python packages for Windows, Linux and macOS are available from the Python Package Index (PyPI).

APPLICATIONS
The usage of the generalized (hyper-) dual numbers in the last section is by no means restricted to a specific field in science or engineering. However, in this work we want to focus on Frontiers in Chemical Engineering | www.frontiersin.org October 2021 | Volume 3 | Article 758090 the application in equation of state modeling. Besides the documentation of the provided data types and functions, the github repository contains a comprehensive example which illustrates the methods and algorithms discussed below in form of a Jupyter notebook using the Peng-Robinson equation of state.

Calculation of Thermodynamic Properties
The calculation of state properties in thermodynamics is a particularly interesting application of generalized (hyper-) dual numbers because properties can be written as partial derivatives of a thermodynamic potential. In modern equations of state, this thermodynamic potential is usually the Helmholtz energy A with its characteristic variables temperature T, volume V, and number of particles of each species n. In this and the subsequent sections bold symbols indicate arrays over all components. With first order partial derivatives, the entropy S, the pressure p, and the chemical potentials μ can be obtained as Using dual numbers, the scalar properties can be calculated as With vector dual numbers, the chemical potential can be calculated in a single function evaluation, as With scalar dual numbers, the individual components have to be evaluated individually, as one single evaluation of the Helmholtz energy using the third order dual numbers introduced in section 2.3.1. By setting the ] 3 component of the Helmholtz energy A (T, V, n) is the second criticality condition. The critical temperature and either density or volume are iterated using a Newton scheme. The Jacobian can be approximated numerically, however, if recursive dual numbers and the calculation of eigenvalues and eigenvectors for dual numbers are available, it can be evaluated exactly. The temperature and volume are set to The entries of M (themselves dual numbers) are calculated by setting with the Kronecker delta δ ij and evaluating the Helmholtz energy The eigenvalue λ dual 1 and the corresponding eigenvector u dual are calculated as shown in Supplementary Material. Finally, the second criticality condition is determined by setting and evaluating A(T dual , V dual , n d3−dual ) 3 . The Jacobian is now available from the gradient parts of the two criticality conditions.
Here it was assumed that operators are available for all combinations of dual and recursive hyper-dual numbers. In practice this poses difficulties with respect to the implementations and is therefore likely not the case. Then, T dual and V dual can simply be lifted to the higher order dual number by setting all derivative parts to zero.

Cross Association
In equations of state that are based on thermodynamic perturbation theory by Wertheim (1984a), Wertheim (1984b), like the statistical associating fluid theory (SAFT) family (Chapman et al., 1989;Gross and Sadowski, 2001;Lafitte et al., 2013) or the cubic plus association (CPA) equation of state (Kontogeorgis et al., 2006), association contributions are used to model highly directional short range interactions like hydrogen bonds. The Helmholtz energy contribution for the cross association is given by with N Ai the number of association sites of kind A on molecule i and χ A i the corresponding fraction of non-bonded sites. What makes this contribution relevant in the context of generalized (hyper-) dual numbers is, that the fraction of non-bonded sites χ A i is determined by the implicit set of equations The exact expression for the association strength Δ A i B j differs between different equations of state, but the equations shown here are valid for all variants. To be able to automatically determine partial derivatives of the Helmholtz energy, the partial derivatives of the monomer fractions χ Ai are also required. With an appropriate iterative method, these derivatives can be obtained with hardly any additional implementations using generalized (hyper-) dual numbers.
The state of the art iteration method for the monomer fraction was once again presented by Michelsen (2006). Applied to the notation used above, the problem is reformulated as a minimization of the property with respect to the variables χ A i . The minimum of Q is obtained when the gradients vanish. The stationary points of Eq. 53 coincide with the solutions of Eq. 51. The solution is found by using the Newton iteration scheme with the modified HessianĤ with entrieŝ The modification of the Hessian ensures that it is negative definite. As shown in Supplementary Material, when Eq. 54 is iterated from some initial value using generalized (hyper-) dual numbers, the derivatives are calculated automatically. To speed up the computation, it is advisable to first solve the nonlinear equation for the real part. Then, as the equations for the derivatives are linear, the resulting χ Ai can be lifted to the relevant dual number and as many steps of Eq. 54 need to be applied as there are derivatives, i.e. one for dual numbers, two for hyper-dual numbers and three for third order dual numbers. With the derivatives of the monomer fractions in place, the Helmholtz energy contribution and its derivatives are available from Eq. 50.

DISCUSSION
Generalized (hyper-) dual numbers enable the calculation of exact derivatives of scalar and vector valued functions which is especially valuable when derivatives are part of the (physical) model. Derivatives no longer need to be implemented or approximated which leads to less error prone and faster development. Depending on the model, an implementation using dual numbers can be more costly to evaluate compared to hand-written derivatives. However, in the context of thermodynamic equations of state shown in this work, using the right type of dual number in combination with caching prior results, this disadvantage can be compensated.
Thermodynamic equations of state greatly benefit from generalized (hyper-) dual numbers because thermodynamic properties are computed from (partial) derivatives of a single function, the thermodynamic potential. We showed above that generalized (hyper-) dual numbers can be used to efficiently compute these properties even for complex models that contain implicit expressions like the association contribution in SAFT. In some cases, like the calculation of critical points, the algorithms themselves can be simplified using generalized (hyper-) dual numbers. Because only a single model equation has to be implemented (the Helmholtz energy), separation of model agnostic algorithms, like phase equilibria and stability analysis, and the model equation is simple and leads to more maintainable and extensible code. We believe that from a research point of view this enables faster and easier development of new models and algorithms, and consequently, it will enable faster adaption and transfer to industry.

AUTHOR CONTRIBUTIONS
PR and GB have contributed equally to design and implementation of the num-dual Rust and Python libraries as well as the scientific research and the documentation process.