ILP-based Resource Optimization Realized by Quantum Annealing for Optical Wide-area Communication Networks—A Framework for Solving Combinatorial Problems of a Real-world Application by Quantum Annealing

Resource allocation of wide-area internet networks is inherently a combinatorial optimization problem that, if solved quickly, could provide near real-time adaptive control of internet-protocol traffic, ensuring increased network efficiency and robustness while minimizing the energy requirements of power-hungry transceivers. In recent work we demonstrated proof of principle for how such a problem can be cast as a quadratic unconstrained binary optimization (QUBO) problem and embedded onto the D-Wave Advantage™ quantum annealer system. Our initial studies left open the possibility of improving D-Wave solutions via judicious choices of system run parameters. Here we report on our investigations for optimizing these system parameters, and on how we incorporate machine learning (ML) techniques to further improve the quality of solutions. In particular, we use the Hamming distance to investigate correlations between various system run parameters and solution vectors. We then apply a decision tree neural network (NN) to learn these correlations, with the goal of using the neural network to provide further guesses at solution vectors. We successfully implement this NN for a simple integer linear programming (ILP) example, demonstrating how the NN can fully map out the part of the solution space that was not captured by D-Wave. We find, however, that for the 3-node network problem the NN is not able to enhance the quality of the space of solutions.


INTRODUCTION
Quantum computing is a cutting-edge technology that has gained significant relevance during the last decades. Algorithms for searching and optimization are currently studied intensively on quantum computers, as they hold the potential for solving problems with nondeterministic polynomial-time (NP) complexity very efficiently. Nowadays, quantum computers have reached a scale that allows for the solution of non-trivial problems with real-world applications.
One example is the energy-aware resource allocation of wide-area networks Chiaraviglio et al. (2009). In this case, one can consider the resource allocation as an optimization problem and introduce it as a relevant application of quantum computing, and in particular quantum annealing (QA), as it inherently provides a certain failure tolerance with self-healing capability. Further, the problem has NP complexity and requires frequent solutions for just-in-time adaptation of the network. If solutions are generated quickly, say on the order of seconds, a revolution in network operation with increased network efficiency might be possible, since current solutions obtained from classical and/or heuristic algorithms require 15 minutes or more for time-to-solution, as shown in Tornatore et al. (2002) and Feller (2012). In previous studies, Witt et al. (2023), we demonstrated how this resource allocation problem can be formulated, based on an integer linear program (ILP) model, as a quadratic unconstrained binary optimization (QUBO) problem which can then be embedded onto a quantum annealer.
In our initial studies we used the D-Wave Advantage™ system (JUPSI) at the Forschungszentrum Jülich to perform the quantum annealing. As part of the solution process, the QUBO problem was embedded onto the annealer's qubits prior to performing the quantum annealing. This entailed mapping the problem onto a network of logical qubits, whereby each logical qubit consists of a constellation, or 'chain', of physical qubits. This mapping ensures the requisite 'connectivity' of the logical qubits as dictated by the QUBO problem. We discovered that the network optimization approach was greatly limited by this embedding process. For example, the optimization problem of a network with 3 nodes can be described as a QUBO with approximately 100 binary variables (logical qubits). Even though this 3-node problem is quite small, the required number of physical qubits was in the range of 500, already representing roughly 10% of the physical qubits available in the D-Wave Advantage™ system. With the embedding process that we employed at the time, a simulation of a 15-node problem, corresponding to a real-world network, would require a quantum annealer with approximately 50,000 physical qubits, which is an order of magnitude larger than current systems. We note that the embedding process is not unique.
Our initial studies also had limited scope in system run parameters, such as the annealing time and profile of the annealing process, the penalty factor of the QUBO matrix, and the chain strength between physical qubits constituting logical qubits. Our choice of run parameters was constrained mainly to system default values, with little exploration of the dependence of solution quality on these run parameters. There is therefore potential room for increasing the efficiency of the quantum annealing process (which would result in better quality and more feasible solutions) by a judicious choice of optimized run parameters.
In this paper we address some of these issues by studying the annealing process with the aim of optimizing the parameters of the quantum annealing procedure. We introduce solution quality metrics for evaluation purposes. Of particular import is the Hamming distance metric, which rates the distance between the ideal and obtained solution vectors in binary space. By using D-Wave solutions in conjunction with the Hamming distance to the optimal solution, we empirically determine correlations between various run parameters and the quality of solutions. These correlations guide us in determining optimized run parameters for the system in question, with the hope that the same optimized run parameters can be applied to similar, but larger, systems. Furthermore, we apply a decision tree neural network (NN) to learn these correlations, after which we use the NN to 'guess' improved solutions. This NN represents a machine learning (ML) approach, coupled with D-Wave generated solutions, that aims at providing better quality solutions, and represents an example of a hybrid classical (ML)/quantum (QA) procedure for solving the combinatorial optimization problem.
Our paper is organized as follows. In the following section we give a cursory description of ILPs in general, the method used to solve ILPs on a quantum annealer, and the two ILPs that we examined in our study. We then introduce in Sec. 2.4 the Hamming distance metric, and demonstrate how it is used to derive correlations between the quality of solutions and various system run parameters. Such correlations will be 'learned' by our decision tree NN, which we describe in detail in Sec. 2.5. In Sec. 3 we present our findings. We first concentrate on a simple ILP problem, demonstrating that our hybrid classical/quantum procedure does indeed result in new feasible solutions, while at the same time providing guidance on optimized run parameters. We then apply the formalism to the 3-node network problem mentioned above, where we see limited improvement in solutions, all of which unfortunately are nowhere near the optimal solution. In Sec. 4 we discuss our findings and recapitulate. We comment on possible future directions of investigation.

The Concept of Integer Linear Programs (ILPs)
The investigations performed in our work fall under the class of discrete optimization problems, meaning the variables x to be optimized take on only discrete values. Such problems can be cast succinctly as an integer linear program (ILP), where certain constraints, given as a set of linear (in-)equalities, have to be satisfied while minimizing a linear function. An ILP can be defined in its canonical form by the objective

x_0 = argmin_x c^⊤ x,    (1)

subject to the inequality constraints

A x + b ≤ 0.    (2)

The ILP's objective function can be seen as a loss function and is defined in (1) with a vector of cost terms c weighting the variable vector x. Matrix A and vector b parameterize the linear equations that represent the inequality constraints (2). They can be reshaped to equality constraints, A x + b + s = 0, by introducing slack variables s ∈ R, s_j ≥ 0. This is a typical step within the classical ILP-solving simplex algorithm, see Nash (2000). We use the convention that R denotes the real numbers, N the natural numbers including zero, and B the binary numbers.
Such problems are well known to be NP-hard in general. According to Adler et al. (2014) and Karp (1972), linear programs are a rare class of problems in NP that resist classification as either NP-complete or polynomial-time solvable. Lenstra (1983) argued that mixed-integer linear programs with a fixed number of variables are solvable in polynomial time. On the contrary, Nguyen and Pak (2017) present integer programs that are NP-complete even for a fixed number of variables. Their work further shows that some integer programs are solvable in polynomial time. We conclude that bounded integer linear programs are NP-complete and are solvable within polynomial time only in a few cases.
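To make the canonical form above concrete, a bounded ILP of this kind can be solved by exhaustive enumeration when the variable ranges are small. The following sketch uses a hypothetical cost vector, constraint matrix, and bounds (not those of our network problem), only to exercise the A x + b ≤ 0 convention:

```python
from itertools import product

def solve_ilp_brute_force(c, A, b, bounds):
    """Minimize c^T x subject to A x + b <= 0 (elementwise),
    with x_i an integer in range(bounds[i]). Returns (best_x, best_cost)."""
    best_x, best_cost = None, float("inf")
    for x in product(*(range(u) for u in bounds)):
        # check every inequality constraint A x + b <= 0
        if all(sum(A[j][i] * x[i] for i in range(len(x))) + b[j] <= 0
               for j in range(len(A))):
            cost = sum(ci * xi for ci, xi in zip(c, x))
            if cost < best_cost:
                best_x, best_cost = x, cost
    return best_x, best_cost

# hypothetical 2-variable example: minimize 2*x1 + 3*x2
# subject to -x1 - x2 + 2 <= 0 (i.e. x1 + x2 >= 2), x_i in {0,...,3}
x0, cost = solve_ilp_brute_force(c=[2, 3], A=[[-1, -1]], b=[2], bounds=[4, 4])
```

This exhaustive strategy is exactly the "brute force sampling" used for benchmarking later in the paper; it becomes impractical once the binary search space exceeds a few tens of bits.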

Solving Integer Linear Programs on Quantum Annealer
Quantum annealers are well suited for investigating ILP problems. However, an additional modification of the ILP is required prior to embedding the problem on the qubits. Here the constraints are included in the cost function (to be minimized) by introducing a penalty weight p. In so doing, the original ILP problem with constraints is recast into a quadratic form without constraints,

x_0 = argmin_{x,s} [ c^⊤ x + p (A x + b + s)^⊤ (A x + b + s) ].    (3)

Here A, b and c are the problem-specific parameters introduced in the previous section. It is useful to classify solution vectors x into two categories: feasible solutions, which fulfill the constraints, and infeasible solutions, which violate them. While a feasible solution to the ILP is not necessarily an optimal solution, an infeasible solution can in principle have a smaller objective value than the optimal feasible solution.
The problem is mapped to the quadratic unconstrained binary optimization (QUBO) form by defining a matrix Q that includes the penalty factor p and a constant C,

q_0 = argmin_q [ q^⊤ Q q + C ].    (4)

An inequality constraint can be expressed as an equation together with an additional minimization over a slack variable s incorporated into the bit vector q. The QUBO objective function minimizes the ILP objective function plus another objective function representing the constraints. The penalty term expresses the relative weight between the two (ILP objective and constraint) objective functions; see Chang et al. (2020) and Witt et al. (2023) for a more detailed description. Finding the solution q_0 that provides the absolute minimum of q^⊤ Q q is equivalent to solving the original ILP problem with solution vector x_0.
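The construction of Q can be sketched for the special case of binary variables and equality constraints A q + b = 0: expanding the penalty p‖A q + b‖² and using q_i² = q_i moves all linear terms onto the diagonal of Q. This is a simplified illustration of the mapping in Chang et al. (2020), with a hypothetical toy problem, not the exact matrices used in our runs:

```python
import numpy as np

def ilp_to_qubo(c, A, b, p):
    """Map  min c^T q  subject to  A q + b = 0  (q binary) into QUBO form
    E(q) = q^T Q q + C, folding the constraints in with penalty weight p.
    Uses q_i^2 = q_i to place all linear terms on the diagonal of Q."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    Q = p * (A.T @ A)                                      # quadratic penalty couplings
    Q[np.diag_indices_from(Q)] += c + 2.0 * p * (A.T @ b)  # linear terms on diagonal
    C = p * float(b @ b)                                   # constant offset
    return Q, C

def qubo_energy(Q, C, q):
    q = np.asarray(q, float)
    return float(q @ Q @ q) + C

# hypothetical toy problem: minimize q1 + 2*q2 subject to q1 + q2 = 1
Q, C = ilp_to_qubo(c=[1, 2], A=[[1, 1]], b=[-1], p=10)
```

For the feasible assignments the penalty vanishes and the QUBO energy reduces to the ILP cost, while constraint violations are shifted upwards by multiples of p.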
The D-Wave Advantage™ system is adapted to solving the Ising spin system, which represents an array of binary spins σ with interactions between spins given by a connectivity matrix J and an external magnetic field h. Our QUBO matrix can easily be rewritten using J and h without any loss of generality, with Q_0 = Q − diag{q̃}, q̃ = diag^{-1}{Q}, and g some constant. The problem is then well suited for the D-Wave machine. In Chang et al. (2020) and Witt et al. (2023), we demonstrated proof of principle that such a problem can be solved on a quantum annealer.
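The QUBO-to-Ising rewriting can be illustrated with the standard substitution q = (1 + σ)/2, which maps bits q ∈ {0, 1} to spins σ ∈ {−1, +1}. The sketch below is our own bookkeeping of the couplings J, fields h, and constant g, and may differ from the machine-side convention in how the constant is absorbed:

```python
import numpy as np

def qubo_to_ising(Q):
    """Rewrite E(q) = q^T Q q with q in {0,1}^n as the Ising form
    H(s) = s^T J s + h^T s + g with s in {-1,+1}^n via q = (1 + s)/2."""
    Q = np.asarray(Q, float)
    Qs = 0.5 * (Q + Q.T)                      # symmetrize the QUBO matrix
    J = Qs / 4.0
    np.fill_diagonal(J, 0.0)                  # s_i^2 = 1, so diagonal is constant
    h = Qs.sum(axis=1) / 2.0                  # linear (field) terms
    g = Qs.sum() / 4.0 + np.trace(Qs) / 4.0   # constant offset
    return J, h, g

def ising_energy(J, h, g, s):
    s = np.asarray(s, float)
    return float(s @ J @ s + h @ s) + g

# check the equivalence on a small QUBO matrix
Q = np.array([[1.0, 2.0], [0.0, -1.0]])
J, h, g = qubo_to_ising(Q)
```

By construction, the Ising energy at σ = 2q − 1 equals the QUBO energy at q for every bit assignment, so the two formulations share the same minimizer.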

Investigated Integer Linear Programs
In this work we have investigated multiple ILPs, two of which we define explicitly here. The details of the remaining ILPs we considered are described in the accompanying Supplementary Material. The first ILP optimizes the selection of two integer variables under some constraints. It provides a test case where all possible solutions can be studied by brute-force sampling, i.e., it provides a well-suited setup for benchmarking. The second ILP describes a realistic network resource optimization as studied in Witt et al. (2023). As possible solutions are represented by binary vectors with more than 60 variables, brute-force sampling is not applicable within a reasonable time in this case.

Trivial ILP
Based on expressions (1) to (4), we can define a particular ILP problem; a graphical interpretation of this ILP is depicted in Fig. 1. We can easily obtain the optimal solution vectors, and the optimal cost value c^⊤ x_0 = 6, from this graph. This problem is an explicit example of an ILP that can have more than one optimal solution, which in turn can cause misleading results in benchmarking experiments. In general, ILPs can have zero, one or more solutions. We restrict the values in x to x_i ∈ {0, 1, 2, 3}, ∀i ∈ {1, 2}, such that the ILP is uniquely solvable. For the binary representation of integer values, we use 2 bits for each variable in x and 3 bits for each element in s. Then q is the binary search vector to be optimized. According to Chang et al. (2020), this mapping can be described by an integer mapping matrix Z. In Tab. S1 of our Supplementary Material we enumerate the other trivial ILPs that we have investigated. These ILPs encompass a range of optimal solutions, parameters, and dimensions.
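The binary encoding described above (2 bits per integer variable, 3 bits per slack element, giving the 13-component search vector q, consistent with 2 × 2 + 3 × 3 = 13 for three slack variables) can be sketched as follows. The least-significant-bit-first ordering and the assumption of exactly three slack variables are ours:

```python
def decode(q, n_vars=2, var_bits=2, n_slack=3, slack_bits=3):
    """Decode the 13-bit search vector q into integer variables x and
    slacks s using a standard binary (powers-of-two) encoding."""
    assert len(q) == n_vars * var_bits + n_slack * slack_bits  # 13 here
    # integer variables: x_i = sum_k 2^k * q_{i*var_bits + k}
    x = [sum(q[i * var_bits + k] << k for k in range(var_bits))
         for i in range(n_vars)]
    off = n_vars * var_bits
    # slack variables, encoded with slack_bits bits each
    s = [sum(q[off + j * slack_bits + k] << k for k in range(slack_bits))
         for j in range(n_slack)]
    return x, s
```

With this encoding, each x_i automatically satisfies the bound x_i ∈ {0, 1, 2, 3}, which is one reason for restricting the variable ranges as above.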

Network Resource Allocation Problem
Optical wide-area networks consist of nodes that are linked by optical fiber systems in a meshed topology. Nodes v ∈ V are two-layered: they are equipped with electrical IP routers in the upper layer and optical cross connects (OXCs) in the lower layer. Traffic from connected networks that traverses the wide-area network is "handed over" at the IP layer. Signal transitions between layers inside the WAN are performed with optical bidirectional transceivers, which are configured for unidirectional use as required. Optical transceivers generate optical signals with a bandwidth of 50 GHz at various center frequencies. A finite number of signals can be combined in a dense wavelength division multiplexing (DWDM) scheme on a particular optical fiber link. These schemes are specified according to ITU-T (2020). Thus, usable frequency bands in the optical region, typically referred to as wavelengths, are uniquely defined. Optical cross connects enable wavelength-selective forwarding and redirection of optical signals between connected fibers. Fiber links are realized by a sequence of fiber spans and fiber amplifiers and provide a hardware-wise connection between nodes according to the network's topology. The maximal reach of optical signals depends on the signal configuration (specified by the modulation scheme, the forward error correction used, etc.) and the transceiver type itself. As an example, a tunable coherent transceiver achieves an optical reach of 1000 km at a rate of 100 GBit/s. Typically, optical transmission paths are organized as a sequence of transmission sections c, with at least one section needed to enable end-to-end data transfer. Transmission sections are abstract links in the lower layer that provide optically transparent transmission on multiple wavelengths. Their spanning distance is limited by the optical reach of the driving transceivers. Fig. 2 illustrates how transmission paths in wide-area networks can be realized.

Energy-aware traffic engineering can be seen as a major task for economic network operation. Network resources like transceivers and wavelengths on fiber links therefore have to be allocated to assign the required capacity to a transmission section c. Assuming that the network is operated as a single-rate system, i.e., all transceivers have the same signal rate, e.g., ξ = 100 GBit/s, the capacity of a transmission section c can be scaled if multiple transceivers, enumerated by w_c, are activated. Thus, the transmission section's capacity is w_c ξ. A unidirectional traffic demand d represents a connectivity request between two network nodes. We assume that a demand exists for all disjunct node pairs (u, v) with u ≠ v and u, v ∈ V. The network has to provide appropriate transmission paths, i.e., routes through the network topology along a sequence of transmission sections, to enable the transport of the demand's traffic with volume h_d. We prepare a network-specific collection T of possible transmission path realizations prior to the optimization, where the possible transmission path realizations per demand d are defined as t_d ∈ T_d ⊂ T, see Witt et al. (2023) Sec. II-C.

The economic resource allocation within WDM networks is a discrete and combinatorial optimization problem. Witt et al. (2023) devised an integer linear program (ILP) based on Enderle et al. (2020) to address the energy-aware resource allocation problem within wide-area networks and prepared the ILP according to the ILP-to-QUBO mapping formalism presented in Chang et al. (2020) and delineated in Sec. 2.2. They further studied the solvability of this ILP, prepared in QUBO form, on the D-Wave Advantage™, a state-of-the-art quantum annealer with over 5000 qubits. Since the current work focuses on improvement methods within the algorithmic part and not on the application itself, we refer to Witt et al. (2023) for a more in-depth explanation and interpretation of the ILP. In the following, we recapitulate the ILP briefly. Parameters and variables are given in Tab. 1. Traffic volumes h_d per demand d, which vary over time, are held constant during the optimization and would be updated frequently in a real scenario. The equality constraint (10) enforces that a demand is routed on exactly one transmission path. Constraint (11) combines the traffic flows per transmission section as selected in (10) and reserves the required capacity in terms of optical channels w_c. Constraint (12) activates installed transceivers to drive the transmission sections. Minimizing the number of optical channels w_c, as defined in objective (13), reduces the total number of active transceivers as well. This enables an energy-aware network operation.

Constraints:
Objective: The network under test is a fully-connected 3-node network, i.e., the topology has a triangular shape with two short edges of 300 km length and a long edge spanning a 424 km distance. Each network edge is realized by two fiber links to enable bidirectional transmission. Traffic demand values h d are taken

Correlations Between Solution Metrics and System Run Parameters
With the intent to minimize the objective function, D-Wave provides a distribution of solutions, not all of which are equally important or of equal quality. The setup of the ILP scenario (penalty term, float variable solution, integer sizes) and of the QUBO (sparsity-affecting transformations, embedding, chain strength) parameterize the problem. The annealing procedure (annealing schedule, spin transformations, thermalization/decorrelation pauses) can also have a significant influence on the obtained distribution of solutions. Studies like Willsch et al. (2022) show that a proper parameter selection in terms of annealing schedule and embedding variants can change the situation significantly. Furthermore, the effect of thermalization in the context of quantum annealing processes can have an impact, as was shown in Dickson et al. (2013). Ideally, the solutions to the problem should not be affected by the choices of these meta parameters. Still, we selected the range of parameters to be tested using the experience garnered from our previous study, see Witt et al. (2023).
However, as we show in later sections, different combinations of parameters significantly affect the likelihood of obtaining feasible solutions. Choices for such meta parameters can be highly correlated. For example, longer (slower) annealing profiles can provide higher probabilities for finding a feasible solution, yet at the cost of generating a smaller total number of solutions.
Our first studies, Witt et al. (2023), found that the probabilities for finding a minimal feasible solution for the three-node network problem were on the order of 10^−4 % and below. This makes it a non-trivial task to evaluate the quality of the distribution of solutions when having sample sizes of less than 10^6. To address this issue, we formulate statistical measures based on the distribution of samples to quantify the quality of our D-Wave setup. Since the optimal solution is, by definition, a feasible solution, we are interested in the rate at which feasible solutions are produced. We thus consider the feasibility ratio

r_feasible = N_feasible / N_samples,    (14)

which rates the success of finding N_feasible feasible solutions within a solution set of N_samples samples.
Another metric of choice for solutions in the binary search space that we use in our research is the Hamming distance

dist{x, y} = Σ_i |x_i − y_i|.

This metric gives the number of flipped bits between an ideal solution x, obtained by a classical ILP solver like CPLEX or GLPK, and a non-ideal solution y obtained by the quantum annealer. For binary solution vectors the Hamming distance is equivalent to the squared L_2-norm of the difference between x and y. This metric provides a sense of 'distance' between two solution vectors, essentially telling us how many 'bit flips' are required to bring one solution into another. Ultimately it allows us to perform a direct comparison between particular D-Wave solution vectors and a known desired solution vector.
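In code the metric is a one-liner, and the coincidence with the squared L_2-norm for binary vectors is easy to verify:

```python
def hamming(x, y):
    """Hamming distance: number of positions where binary vectors differ."""
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

def sq_l2(x, y):
    """Squared L2-norm of x - y; equals hamming(x, y) for 0/1 vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
```

For example, `hamming([1, 0, 1, 1], [0, 0, 1, 0])` counts the two flipped bits needed to turn one vector into the other.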
Finally, we can train a neural network (NN) on these correlations, with the goal that, once trained, we can use the NN to make further guesses at optimal solution vectors. We describe our NN in the following section.

Machine-learning Approach
We employ a decision tree (DT) neural network in our investigations. This NN is a type of supervised machine learning (ML) algorithm that is typically used for regression and classification analysis. It is a model that represents a series of decisions and their possible consequences in the form of a tree-like structure Breiman et al. (1984). Each node in the tree represents a decision, and each branch represents a possible outcome or path that can be taken based on that decision. In Fig. 3 we provide a graphical example of a decision tree and its mapping to a neural network.

Fig. 3. A graphical example of a decision tree network (left) and its mapping to a neural network (right).
A major advantage of decision tree NNs is their ease of use, understandability, and interpretability.This makes their implementation simple and their application efficient.Another advantage comes from their inherent robustness to data outliers.They can even handle missing values in the data.The data itself can be both categorical and numerical in nature.
However, a potential drawback of DTs is that they can easily overfit the data. This ultimately means that, though they may be sufficiently expressive to explain the training data, they fail when extrapolating to new, or unseen, data. Thus the NN is limited in its generalizability. This issue can to a certain degree be mitigated by pruning the tree or using other techniques to reduce the complexity of the model. In our studies we did not employ such mitigation techniques, and leave such potential studies for later investigations. We used the Scikit-learn python module Pedregosa et al. (2011) and its functionalities to implement our DT networks.
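A minimal sketch of this setup with Scikit-learn uses `DecisionTreeRegressor` in multi-output mode: the features are (energy, feasibility flag) and the targets are the solution vectors. The training data below are hypothetical placeholders standing in for D-Wave samples, not data from our runs:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: each row of X is (energy, feasible_flag)
# and each row of Y is the corresponding binary solution vector.
X = np.array([[1.0, 1], [2.0, 1], [13.0, 0], [7.0, 0]])
Y = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 1, 1]])

tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, Y)

# Query: "propose a solution with energy near 1.5 that is feasible".
guess = tree.predict([[1.5, 1]])[0]  # components may come out fractional
```

Because the regressor averages targets within a leaf, predicted components need not be exactly 0 or 1; this is the origin of the fractional components that we round and enumerate in Sec. 3.1.2.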

Construction of Sherrington-Kirkpatrick Graph
The Sherrington-Kirkpatrick (SK) graph encodes the coupling strengths and external fields of an Ising Hamiltonian. As mentioned in Thai et al. (2022), finding the weighted minimal cut in this graph is equivalent to finding the ground state of the Ising Hamiltonian. Further, the Hamiltonian's energy landscape can be explored by exploration of the SK graph's cut space.
The corresponding SK graph of the Ising Hamiltonian H(x) = h^⊤ x + x^⊤ J x with n variables x_i can be denoted as G^SK_H = (V, E, w) with node set V, undirected edges (i, j) ∈ E and their weights w. The first n nodes of V correspond to the variables x_i. A further node is added to V to capture the external fields h. Set E contains only edges with non-zero weights according to w_ij = J_ij + J_ji for 1 ≤ i, j ≤ n and w_{i,n+1} = h_i. Then, the weighted adjacency matrix J′ of graph G^SK_H, with J′_ij = J_ij + J_ji, J′_ji = 0 and J′_{i,n+1} = h_i, can be used together with y ∈ S^{n+1} to define the SK Hamiltonian. To apply a weighted minimal-cut approach on the SK graph for graph reduction, a cut is defined by a subset S ⊆ V, such that ⟨S, V\S⟩ contains the set of edges that need to be cut to separate S and V\S. With the capacity of the cut

c(S) = Σ_{(u,v) ∈ ⟨S, V\S⟩} w_uv,

a minimal cut is defined as the subset S that minimizes c(S).
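The SK-graph construction and the cut capacity c(S) can be sketched directly from the definitions above. This is our own illustration (with the extra field node placed at index n, i.e. last), not code from Thai et al. (2022):

```python
import numpy as np

def sk_graph_adjacency(J, h):
    """Build the weighted adjacency matrix J' of the SK graph:
    n variable nodes plus one extra node (index n) for the fields h,
    with w_ij = J_ij + J_ji kept in the upper triangle and w_{i,n} = h_i."""
    J, h = np.asarray(J, float), np.asarray(h, float)
    n = len(h)
    Jp = np.zeros((n + 1, n + 1))
    Jp[:n, :n] = np.triu(J + J.T, k=1)  # J'_ij = J_ij + J_ji, J'_ji = 0
    Jp[:n, n] = h                       # edges to the field node
    return Jp

def cut_capacity(Jp, S):
    """Capacity c(S): total weight of edges crossing the cut <S, V\\S>."""
    m = Jp.shape[0]
    return sum(Jp[i, j] for i in range(m) for j in range(m)
               if (i in S) != (j in S))
```

A minimal cut can then be found, for small graphs, by enumerating the subsets S and keeping the one with the smallest `cut_capacity`.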

Trivial ILP Problem
We now provide our findings for the simple ILP problem described in Sec. 2.3.1. Similar results for the other trivial ILPs we considered are found in the Supplementary Material. Note that this problem is sufficiently small that we can determine all possible feasible solutions via brute force, which includes the optimal solution. In this case this corresponds to a total number of N_feasible = 1536 feasible solutions. The whole solution space contains N_samples = 2^13 = 8192 possible vectors, as our binary search vector q has a dimension of 13, see Sec. 2.3.1. Thus, the feasibility ratio (14) for brute-force sampling is r_feasible = 18.75%.

Observations and Correlations
In Fig. 4 we show results for brute-force sampling (left side) and a run on the D-Wave Advantage™ (right side) using a penalty of p = 2 and default run parameters. In the D-Wave case, just 200 samples are taken, which is a relatively small portion (∼ 2.4%) of the complete solution space. We remark that the optimal solution can be found even though the sample set is small. Fig. 4 shows the distribution of solutions over energy (upper row) and Hamming distance (middle row) obtained by the mentioned sampling methods and classified by their feasibility, demarcated by feasible (green), infeasible (red), and all (blue) solutions. We observe that solutions obtained with D-Wave show low energies and Hamming distances of only up to 8. This indicates that the intended optimization takes place and that mostly solutions of good quality are found by the D-Wave quantum annealer. However, we still have to sort solutions by feasibility after sampling, as minimizing the energy cannot entirely sort out infeasible solutions. The lower row of Fig. 4 shows the correlations between the solutions' energies and their Hamming distances relative to the best feasible solution. Solutions with small Hamming distances tend to have smaller energy values, as indicated by the best-fitting curves. We identified that higher-energy solutions are correlated with increasing Hamming distances to the optimal solution, as the slopes of the fitting curves are non-zero. The energy range for solutions at the same Hamming distance is spread widely if the whole search space is considered. Feasible solutions are found only in the lower energy range. Within the D-Wave sample set, solutions with small Hamming distances are over-proportionally feasible solutions, which is indicated by the regression curves. In the brute-force sampling case, we can describe the distribution of solutions over the Hamming distance (Fig. 4 left, middle row) by the cumulative distribution function

F(d) = 2^{−N_q} Σ_{k=0}^{d} (N_q choose k),

with d representing the possible Hamming distances for binary search vectors q of length N_q. This relation can be used for benchmarking as it forms a fundamental boundary that depends only on the vector size.
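The cumulative distribution over Hamming distances follows directly from counting binary vectors: there are (N_q choose k) vectors at distance exactly k from any fixed reference vector. A short sketch:

```python
from math import comb

def hamming_cdf(d, n_q):
    """Fraction of all 2^n_q binary vectors lying within Hamming
    distance d of any fixed reference vector."""
    return sum(comb(n_q, k) for k in range(d + 1)) / 2 ** n_q
```

For N_q = 13 the distribution is symmetric about d = 6.5, so exactly half of the 8192 vectors lie within distance 6 of the reference; this is the kind of baseline against which a sampled Hamming-distance histogram can be benchmarked.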
In Fig. 5 (upper left panel) we show the dependence of the feasibility ratio (14) as a function of anneal time and penalty factor, as well as the average Hamming distance (upper right panel) as a function of the same parameters. There is seemingly little correlation between p and the anneal time as long as p ≳ 10. However, these results suggest that increasing p beyond p ≳ 100 is beneficial, since in this region solutions with lower Hamming distance are more likely. It is remarkable that all feasibility ratios shown in Fig. 5 are significantly larger than the theoretical value of 18.75% for the case that all possible solutions are considered.
We also encountered a number of individual feasible solution samples obtained in a single run, whereby the solution vectors fulfill the ILP's constraints and differ from each other in at least one component, but are not necessarily optimal solutions. It can happen that some of these individual feasible solutions share the same cost value. The parameter dependence of the number of individual feasible solutions is presented in the lower left panel of Fig. 5. We find that short annealing times generate more individual solutions, however at the expense of reducing the number of low-energy solutions. Thus the D-Wave quantum annealer finds more solutions with higher energies if shorter annealing profiles are applied. Unsurprisingly, these correlations suggest that optimizing towards longer anneal times will provide lower-energy solutions. Similar findings hold for the other trivial ILPs listed in the Supplementary Material; the relevant figures in this case are Figs. S1-S3.
We point out that we find no correlations between the parameters chain strength and annealing time, suggesting that further optimization of the chain strength parameter is not possible.

Improvements obtained by Machine Learning Approach
Within a sample set generated by the D-Wave Advantage™, we have 110 independent solutions for our trivial ILP problem when using p = 2 and an annealing time of 20 µs. This represents approximately 10% of the possible feasible solutions. To improve upon this, we train a NN on the correlations described above and then use the NN to generate more solutions.
In particular, we train our NN using the solution vector versus the energy and feasibility correlations obtained from the D-Wave data. With energy and feasibility as input, the decision tree regression predicts a new solution which has the corresponding input energy and feasibility. We note that our NN does not always provide new solutions whose output energy coincides with the input energy. This is readily seen in the left plot of Fig. 6, where the output energy E_out is plotted as a function of the input energy E_in. A one-to-one correspondence would give a straight line with a slope of unity, which is clearly not seen. However, the correlation between input and output energy as captured by our NN is still positive. We find that the slope of this correlation depends on the p value, whereby larger p values provide a slope closer to unity. Qualitatively similar behavior is found for the ILPs listed in the Supplementary Material, as can be seen in Fig. S5 of that document. We expect the decision tree to recognize the feasibility condition, but the predicted solutions of the NN are not always feasible. As mentioned above, the NN predicts solution vectors whose energy ranges have some correlation with the input energies. This feature provides, in principle, an advantage over brute-force sampling, since we can target solutions within a specific energy range using our NN, whereas such control is not possible via brute-force sampling. However, there is not a complete one-to-one correspondence between input and output energy, since approximately 20% of the predicted solution vectors have components that are not binary but contain fractional numbers. In these cases we round the fraction to zero if the fractional number is smaller than 0.2, and to one if it is larger than 0.8. Between 0.2 and 0.8, we enumerate all possible combinations of 0 and 1, generating new proposed solution vectors in these cases. We then perform another feasibility test on these NN solutions to filter out infeasible solutions. The energy distribution of feasible versus infeasible solutions after this treatment is shown in Fig. 6, right.
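The rounding and enumeration of fractional NN outputs described above can be sketched as follows, with the thresholds 0.2 and 0.8 as stated in the text:

```python
from itertools import product

def binarize(pred, lo=0.2, hi=0.8):
    """Round fractional NN outputs: components below lo become 0, above hi
    become 1; all 0/1 combinations are enumerated for the rest, yielding a
    list of candidate binary solution vectors."""
    fixed, free = [], []
    for i, v in enumerate(pred):
        if v < lo:
            fixed.append(0)
        elif v > hi:
            fixed.append(1)
        else:
            fixed.append(None)   # ambiguous component, enumerate later
            free.append(i)
    candidates = []
    for bits in product((0, 1), repeat=len(free)):
        cand = fixed[:]
        for i, b in zip(free, bits):
            cand[i] = b
        candidates.append(cand)
    return candidates
```

Each candidate then goes through the same feasibility test as the D-Wave samples, so only the ones satisfying the ILP constraints survive.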

Three-node Network
We now turn our attention to the 3-node problem, which represents the smallest non-trivial system of wide-area networks. Here we use CPLEX to obtain the optimal solution vector, against which we compare the D-Wave solution vectors. The distribution of D-Wave solutions as a function of the Hamming distance to the optimal solution is given in Fig. 7. Note that in this case the optimal solution is not captured by D-Wave. In fact, D-Wave cannot find any feasible solution within a set of 600 000+ samples. As a remark, the entire search space for this case contains 2^63 vectors.
When we investigate inter-parameter correlations, we find little to no correlation between the Hamming distance, chain strength, anneal time, and penalty factor p. This is demonstrated by the nearly flat dependence of the data in Fig. 8. This lack of correlation prevents us from obtaining optimized run parameters for this system, and unfortunately suggests that larger node problems will be just as difficult, if not more difficult, to optimize. These findings already hint at the difficulties we encounter when applying an NN to this system, as we describe in the following section. We nonetheless train a decision tree network on the energy and the Hamming distance to the optimal solution, exactly as described in Sec. 3.1.2.

Interpretation of Findings
The total number of feasible solutions of the trivial ILP problem is 1536. As previously mentioned, D-Wave finds a little less than 10% of these solutions, but with our NN we can ascertain the full distribution of the solution space (compare the lower right panel of Fig. 5 with that of Fig. 6, and see also Figs. S2 and S5 of our Supplementary Material). More concretely, we provide the exact number of additional feasible solutions found with our NN as a function of the input parameters annealing time and penalty factor p in Fig. 9 (see Fig. S4 for our other trivial ILPs). For our simple ILP problem, the decision tree after the round-off treatment provided 1426 new independent feasible solutions. The distribution of new solutions as a function of Hamming distance provided by our ML technique is given in Fig. 10. Combining our NN results with D-Wave's, all 1536 possible solutions were found. Thus our hybrid classical (ML)/quantum (D-Wave) method allowed us to fully map out the solution space. We note that our NN is not generalizable to all trivial ILPs, but is unique to each ILP. This is because the solution vector space generated by D-Wave is specific to each ILP, and so each NN is trained with this specific solution vector layout. Within our formalism, a 'master' NN for all trivial ILPs is not possible.
We now discuss our 3-node problem. Note that in this case, D-Wave could not find the optimal solution provided by CPLEX, despite the system parameter investigations mentioned in the previous section. It is not viable to assume that feasible solutions can be found by luck or random guessing: the probability of finding the optimal (minimal and feasible) solution by chance is 1/2^63 ∼ 10^−19 in our case. The fact that we could not find feasible solutions within a set of 600 000 samples indicates that feasible solutions are very rare. This was already observed in our previous study Witt et al. (2023). There, we were not able to find any feasible solutions for some of the test sets, and in other cases only around 0.2 to 11 per million samples. There are several possible explanations for why this is the case here. First of all, only a small portion of the entire solution set fulfills the ILP constraints. Further, the annealer minimizes the energy of the QUBO Hamiltonian; since the lowest energy state may be attained by various solution vectors, one can also find an energy-wise optimized vector that does not fulfill the ILP. Furthermore, hardware imperfections like noise or limited detection resolution can cause this undesirable behavior.
At this point, critical voices could rate the annealer as an expensive random sampler. But this is not the case, as we were able to show that trivial ILP problems are definitely solvable with D-Wave. In these cases, we explicitly used fewer samples than the size of the solution space to avoid oversampling; one could otherwise solve small problems by oversampling even if the sampler is neither a random sampler, where each solution is equally probable, nor a minimizing sampler like the quantum annealer. Thus, D-Wave performs better than a random sampler. Clearly the solvability is not the same for the network problem case, and this may be due to a) a more highly connected QUBO and longer chains of physical qubits representing a logical qubit, which make chain breaks in the quantum annealing hardware more likely, b) entries in the QUBO matrix spanning a wider range of values that may not be represented well enough in hardware, and of course c) other issues that are beyond our knowledge.
As part of an approach for improvements, we trained our NN for the 3-node problem with the distributions that we generated from our correlation studies, in a way comparable to the simple ILP problem. Once trained, however, the NN was unsuccessful in finding any new feasible solutions, let alone the optimal solution. We attribute this to the fact that our D-Wave data distribution of energies (which is used to train the NN) does not cover the energy region of the optimal solution. In fact, as shown in Fig. 11, the distribution of D-Wave solutions is far from the optimal solution. Our NN could therefore not generalize sufficiently to lower-energy solutions. Compounding the issue is the fact that the distribution of D-Wave solutions contained no feasible solutions, which in turn limited what the NN could 'learn'. Thus our hybrid classical (ML)/quantum (D-Wave) method failed to produce any new solutions for our 3-node problem. An obvious question to raise is whether another choice of NN is better suited for our 3-node problem. As we discussed in Sec. 2.5, one of the main advantages that motivated our choice of the decision tree NN is admittedly its ease of use, interpretability, and implementation. However, because of its potential lack of expressivity, one could argue that another choice of NN, e.g. convolutional or recurrent, might lead to better results. This may indeed be the case, and at the least warrants further research. We point out, however, that regardless of the NN architecture, our formalism requires that there exist correlations between hyper-parameters and the resulting D-Wave solution vectors. It is these correlations that are 'learned' by the NN. Since we found no such correlations in our 3-node problem, we suspect that any other type of NN would have similar difficulties to those encountered by our decision tree NN.

Outlook on Further Improvements
Still, there may be ways to improve the situation. Our studies to date have only varied the annealing profile. Instead, one may perform reverse annealing, where the annealing is run 'backwards' from a starting classical solution, allowing for exploration of the energy landscape around that classical solution. We are actively investigating this procedure. Reverse annealing may also be used to set initial states, as shown in Pelofske et al. (2023). Thus, expected solutions, or solutions close to an expected solution, can be set as the starting state for the annealing process. If the optimizer is applied frequently, a typical situation in network optimization, the last obtained solution can be used to initialize the next run, as new optimal network configurations might be close to the last configuration.
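As an illustration, a reverse-annealing run is specified by a piecewise-linear schedule of (time, s) pairs together with an initial classical state. The sketch below only constructs such a schedule; the parameter values are our own choices, and the commented sampler call follows the conventions of D-Wave's Ocean interface (anneal_schedule, initial_state, reinitialize_state), not code from this study:

```python
def reverse_anneal_schedule(s_target=0.45, hold=80.0, ramp=10.0):
    """Reverse-annealing schedule as (time_in_us, s) pairs: start fully
    annealed at s = 1, ramp down to s_target, hold there to explore the
    neighborhood of the initial classical state, then ramp back to s = 1."""
    return [
        (0.0, 1.0),
        (ramp, s_target),
        (ramp + hold, s_target),
        (2.0 * ramp + hold, 1.0),
    ]

schedule = reverse_anneal_schedule()
# Hypothetical usage with D-Wave's Ocean SDK:
# DWaveSampler().sample_qubo(Q, anneal_schedule=schedule,
#                            initial_state=best_known, reinitialize_state=True)
```

In the frequent-reoptimization scenario described above, `best_known` would simply be the solution of the previous run.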
Annealing parameters such as annealing schedules, as well as various embeddings of our problems, can be studied in more detail, as in the study of Willsch et al. (2022). The authors of Willsch et al. (2022) discovered an increase in the success rate for proper settings of the annealing schedule. In our case we observed a more or less constant success rate, especially for the 3-node network problem. Apart from that, it may be valuable to study our approach on a larger set of similar problems to get a more general perspective. Unfortunately, we had to restrict our study to a single problem instance, as feasible solutions for our problem are very rare and large sampling sets are required for the analyses. Besides, thermalization within the annealing process can be studied as well, see Dickson et al. (2013).
Furthermore, since the size of the problem that is embedded on the quantum annealer plays a crucial role for its solvability, methods for efficient embedding or problem reduction should be incorporated within future studies. We point out that the work of Thai et al. (2022) seems promising in reducing the demands on the number of physical qubits. There the authors introduced a fast Hamiltonian reduction algorithm (FastHare) that identifies non-separable groups of qubits, i.e. qubits that obtain the same value in optimal solutions, and performed a reduction by merging non-separable groups into single qubits. This can be done within a worst-case time complexity of O(αn^2) with a user-defined parameter α. The authors of Thai et al. (2022) showed in a benchmark that their algorithm is capable of saving 62% of physical qubits on average within a processing time of 0.3 seconds, outperforming roof duality, the reduction used within D-Wave's software development kit (SDK). We reviewed parts of their work. In particular, we mapped our trivial ILP problem to a so-called Sherrington-Kirkpatrick (SK) graph and evaluated all cut values within this graph; the results are shown in Fig. 12. Unfortunately, we were not able to fully implement and apply this sophisticated algorithm, as we struggled at the following point. The algorithm applies a min-cut algorithm on the SK graph to detect non-separable qubit groups. Originally, we thought that a standard min-cut algorithm could be applied here. Unfortunately, the min-cut algorithms available to us can only be applied to graphs with positive-weighted edges. But, due to the nature of QUBO, Ising, or SK Hamiltonians, the edges in an SK graph may have negative edge weights. This issue was not addressed in the work of Thai et al. (2022). Moreover, it remains unclear whether the fast Hamiltonian reduction (FastHare) algorithm can improve the solvability of our ILPs with D-Wave's annealer, as the authors used randomly generated graph structures in their evaluation, i.e., the graphs are weakly connected and as such well-suited for graph compression.
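Because the min-cut solvers available to us assume nonnegative weights, cut values of small signed SK graphs can at least be evaluated exhaustively. A minimal brute-force sketch (the three-node graph and its weights are invented for illustration):

```python
import itertools

def cut_value(edges, S):
    """c(S): total weight of the edges crossing the cut (S, V \\ S).
    Negative weights are allowed, unlike in standard min-cut solvers."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def min_cut_brute_force(nodes, edges):
    """MC(G_SK) = min over nontrivial subsets S of V of c(S)."""
    return min(
        cut_value(edges, set(S))
        for r in range(1, len(nodes))
        for S in itertools.combinations(nodes, r)
    )

# Toy signed SK-style graph: (u, v, weight), weights may be negative
nodes = [0, 1, 2]
edges = [(0, 1, 1.0), (1, 2, -2.0), (0, 2, 0.5)]
mc = min_cut_brute_force(nodes, edges)   # -> -1.5 (cut isolating node 2)
```

Exhaustive enumeration is of course only viable for small instances; it is the detection of non-separable groups on large signed graphs that remains the open point discussed above.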
Besides the ILP-to-QUBO mapping formalism described in Chang et al. (2020) and Witt et al. (2023), one could model the problem in a different way. One possibility is the introduction of constraint-specific penalty factors, which create new degrees of freedom usable for problem-specific optimization of the algorithm. This can be achieved by using a penalty vector p^T = [p_1, p_2, ..., p_m] and a corresponding penalty matrix P = Ip inside the formulations; the QUBO Hamiltonian, and thus the objective to be optimized, then takes a generalized form. This extends the ILP-to-QUBO mapping formalism. The required details can be found in Witt et al. (2023), Sec. III-D.
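A sketch of this generalized objective, assuming the ILP is written with constraints Ax = b and linear objective c^T x (the exact symbols used in Witt et al. (2023), Sec. III-D may differ):

```latex
% Constraint-specific penalty matrix P = diag(p_1, ..., p_m)
% replacing the single scalar penalty p:
\min_{x \in \{0,1\}^n} \; H(x)
  = c^{\top} x + (Ax - b)^{\top} P \, (Ax - b),
\qquad P = \mathrm{diag}(p_1, p_2, \dots, p_m).
```

For P = pI this reduces to the single-penalty formulation used in the rest of this work.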

Outcome
Our work can be summarized as follows. Our approach aims to solve ILPs by quantum annealing. We tried to find optimal annealing parameters and discovered only weak correlations between annealing parameters and success rates in the 3-node network case. Further, a decision tree ML approach was applied to increase the rate of feasible ILP solutions. We realized that further improvements are needed to overcome the remaining hurdles and discussed several possible remedies. Even though the results for the 3-node problem are not fully satisfying, we were able to show with less complicated ILP problems that the approach works in principle. Thus, we expect that the approach can be extended so that larger problem instances also become solvable.
Finally, fast ILP-solving methods can have a significant impact on systems that should be optimized in real time. As an example, a novel mode of real-time network operation in wide-area networks is studied in Witt (2024). There, similar ILPs are used to define a frequently applied network optimization.

Fig. 1. Graphical interpretation of the trivial ILP problem. The drawn constraint lines are slightly shifted for visualization purposes without altering the feasible region of integer values.

Fig. 2. a-c) Various ways of realizing transmission paths in wide-area networks with an optical DWDM layer and d) the architecture of an OXC, as introduced in Witt et al. (2023).
c(S) for S ⊆ V, with the minimal capacity MC(G_SK) = min_{S⊆V} c(S).

Fig. 4. (Left side) brute-force sampling to investigate the entire solution space. (Right side) D-Wave sampling with penalty p = 2 and a set of 200 samples. (Upper row) histogram of solutions sorted by energy values. As the solutions gathered by D-Wave's quantum annealer only have energy values in the lower range compared to the brute-force case, the x-axes are scaled differently. (Middle row) histogram of solutions over the Hamming distance with respect to the best feasible solution vector. (Lower row) scatter plot of solutions with reference to their energy values and the Hamming distance with respect to the best feasible solution vector. (Blue) solution set under investigation. (Red) infeasible solutions. (Green) feasible solutions.

Fig. 5. Feasibility rate (upper left), averaged Hamming distance (upper right), and number of individual solutions (lower left) as a function of anneal time (µs) and penalty factor p. (Lower right) energy distribution at p = 2 and an annealing time of 20 µs. Values are based on samples generated by D-Wave Advantage™ to solve our trivial ILP problem.

Fig. 6. Decision tree method to find more feasible solutions based on D-Wave data at p = 2 and an annealing time of 20 µs.

Fig. 7. Histogram of the Hamming distance of all solutions for the 3-node problem. There are no feasible solutions.

Fig. 8. Hamming distance in dependence of the penalty p and the annealing parameters chain strength and annealing time. The Hamming distance is obtained by comparison with the best solution obtained by CPLEX.

Fig. 9. The number of new independent feasible solutions found by the decision tree for the ILP problem.

Fig. 10. Histogram of the Hamming distance of the feasible solutions obtained by D-Wave (blue) and the new feasible solutions obtained by the decision tree method for the ILP problem.

Fig. 11. Comparison of the data distribution and the best solution. Solution vectors are converted to decimal numbers for plotting.
The results shown in Fig. 12 demonstrate that the cut values in the SK graph correspond to the energy values of QUBO or Ising problem solution vectors. As the Hamiltonian reduction is based on graph compression via minimal cuts, we expect that the algorithm proposed by Thai et al. (2022) can improve the situation, as a reduced Hamiltonian might be better solvable on the D-Wave quantum annealer.

Fig. 12. Evaluation of the trivial ILP problem represented as a Sherrington-Kirkpatrick (SK) graph. (Left) distribution of energies of the SK graph's Hamiltonian when all possible solutions are considered; (right) distribution of all cut values in the SK graph.

Figure S1: Average values of the Hamming distances between the best known solution and the solutions obtained by the D-Wave Advantage™ quantum annealer for various trivial ILP problems defined according to Table S1. Varied parameters are the QUBO-specific penalty p and the annealing time.

Figure S4: Number of new feasible solutions obtained by a neural network based on the decision tree method. Results are shown for various trivial ILP problems defined according to Table S1. Varied parameters are the QUBO-specific penalty p and the annealing time.

Table 1 .
List of parameters used in the ILP for network optimization.
Parameters: η_v ∈ N, the number of transceivers installed at node v; ρ_{c,t_d} ∈ B, indicating whether transmission section c is part of the demand-specific transmission path realization t_d; φ_{v,c} ∈ B, indicating whether transmission section c is connected to node v; h_d ∈ R, the traffic volume of demand d.
Variables: g_{t_d} ∈ B, the path selector, equal to 1 if a transmission path for demand d is realized by circuit configuration t_d ∈ T_d; ω_c ∈ N, the number of active transceivers driving a transmission section c.
The traffic volumes h_d are drawn from a normal distribution with mean 75 Gbit/s and standard deviation 20 Gbit/s. As they represent floating-point numbers, we discretize them with an accuracy of a = 1 (according to Witt et al. (2023), Sec. III-C), i.e. fractions are rounded to 'x.0' or 'x.5'. We set the number of installed transceivers per node to η_v = 15 and the maximal number of parallel optical signals per transmission path to ω_c,max = 3. The parameter ω_c,max influences the QUBO matrix sizes, as described in Witt et al. (2023), Sec. III-C. The parameters ρ_{c,t_d} and φ_{v,c} represent the connectivity described by the topology. They are predefined together with the transmission path realization sets T_d. The boolean selector variable g_{t_d}, indicating the selection of a predefined transmission path realization t_d for demand d, and the number of parallel optical signals per transmission path ω_c are determined during the optimization.