Mapping constrained optimization problems to quantum annealing with application to fault diagnosis

Current quantum annealing (QA) hardware suffers from practical limitations such as finite temperature, sparse connectivity, small qubit numbers, and control error. We propose new algorithms for mapping boolean constraint satisfaction problems (CSPs) onto QA hardware mitigating these limitations. In particular we develop a new embedding algorithm for mapping a CSP onto a hardware Ising model with a fixed sparse set of interactions, and propose two new decomposition algorithms for solving problems too large to map directly into hardware. The mapping technique is locally-structured, as hardware compatible Ising models are generated for each problem constraint, and variables appearing in different constraints are chained together using ferromagnetic couplings. In contrast, global embedding techniques generate a hardware independent Ising model for all the constraints, and then use a minor-embedding algorithm to generate a hardware compatible Ising model. We give an example of a class of CSPs for which the scaling performance of D-Wave's QA hardware using the local mapping technique is significantly better than global embedding. We validate the approach by applying D-Wave's hardware to circuit-based fault-diagnosis. For circuits that embed directly, we find that the hardware is typically able to find all solutions from a min-fault diagnosis set of size N using 1000N samples, using an annealing rate that is 25 times faster than a leading SAT-based sampling method. Further, we apply decomposition algorithms to find min-cardinality faults for circuits that are up to 5 times larger than can be solved directly on current hardware.


Introduction
In the search for ever faster computational substrates recent attention has turned to devices manifesting quantum effects.Since it has long been realized that computational speedups may be obtained through exploitation of quantum resources, the construction of devices realizing these speedups is an active research area.Currently, the largest scale computing devices using quantum resources are based on physical realizations of quantum annealing.Quantum annealing (QA) is an optimization heuristic sharing much in common with simulated annealing, but which utilizes quantum, rather than thermal, fluctuations to foster exploration through a search space [14,23,12].
QA hardware relies on an equivalence between a physical quantum model and a useful computational problem.The low energy physics of the D-Wave QA device [2,10,20,22] is well captured by a time-dependent Hamiltonian of the form H(t) = A(t)H 0 + B(t)H P where H 0 = i∈V (G) σ x i includes off-diagonal quantum effects, and where H P = i∈V (G) h i σ z i + (i,j)∈E(G) J i,j σ z i σ z j is used to encode a classical Ising optimization problem of the form min s E(s) ≡ min s i∈V (G) On the D-Wave device the connectivity between the binary variables s i ∈ {−1, +1} is described by a fixed sparse graph G = (V, E).The weights J ≡ {J i,j } (i,j)∈E(G) , and the linear biases h ≡ {h i } i∈V (G) are programmable by the user.The A(t) and B(t) functions have units of energy, and satisfy B(t = 0) = 0 and A(t = τ ) = 0 so that as time advances from t = 0 to t = τ the Hamiltonian H(t) is annealed to a purely classical form.Thus, the easily prepared ground state of H(0) = H 0 evolves to a low energy state of H(τ ) = H P , and measurements at time τ yield low energy states of the classical Ising objective Eq. ( 1).Theory has shown that if the time evolution is sufficiently slow, i.e. τ is sufficiently large, then with high probability the global minimizer of E(s) can be obtained.Physical constraints on current hardware platforms [6] impact this theoretical efficacy of QA. [3] has noted the following issues that are detrimental to performance: 1. Limited precision/control error on parameters h/J: problems are not represented exactly in hardware, but are subject to small, but noticeable, time-dependent and time-independent additive noise.
2. Limited range on h/J bounds the range of all parameters relative to thermal scales k b T : thus very low effective temperatures which are needed for optimization when there are many first excited states are unavailable.
3. Sparse connectivity in G: problems with variable iteractions not matching the structure of G cannot be solved directly.
4. Small numbers of total qubits |V (G)|: only problems of up to 1100 variables can currently be addressed.
[3] suggested approaches ameliorating these concerns.The core idea used to address concerns 1 to 3 is the construction of penalty representations of constraints with large (classical) energy gaps between feasible and infeasible configurations.The large energy gaps buffer against parameter error and maximize energy scales relative to the fixed device temperature.Sparse device connectivity was addressed using locally-structured embedding, which consists of placing constraints directly onto disjoint subgraphs of G and routing constraints together using chains of ferromagnetically-coupled qubits representing the same logical variable.This differs from the more common global approach in which constraints are modelled without regard for local hardware structure.We contrast the two approaches in §2.1, and provide some experimental evidence that the locally-structured approach is well-suited to current QA hardware.With locally-structured embedding, the number of qubits used, size of the energy gaps, and size of chains play an important role in determining D-Wave hardware performance.Here, we expand on the methods in [3] and offer several improvements.One way of reducing the required number of qubits, described in §2.2, is by clustering constraints, thereby reducing the number of literals in the CSP.To maximize energy gaps, we follow the methods in [3] but extend them to maxconstraint-satisfaction problems (MAX-CSP): given a set of constraints, find a variable assignment that minimizes the number of constraints that are unsatisfied.§2.3 describes two extensions: one that involves the explicit introduction of variables to indicate the reification of the constraints, and one that does not.Lastly, §2.4 describes how to reduce the size of the largest chains used by combining placement and routing into a single, iterative algorithm.Using linear programming, we can also find effective lower bounds on the size of the largest chains, which makes optimal routing faster.
To address the issue of a limited number of total qubits, [3] adapted two problem decomposition methods to the Ising context, namely dual decomposition (DD) and belief propagation (BP).However these algorithms suffer from issues of precision and a large number of iterations respectively.In §2.5 we give two alternatives.One is the well-studied Divide-and-Concur algorithm [18], which produces excellent experimental results.The other is a novel message passing algorithm based on distributed minimization of the Bethe free energy called Regional Generalized Belief Propagation; this offers some of the potential benefits of BP with far fewer calls to the QA hardware.
A salient feature of D-Wave QA device is the low cost of sampling low energy configurations of Eq. ( 1).After a constant overhead time to program h and J, additional i.i.d.samples can be obtained at an annealing rate of 20 µs/sample.Consequently, problems where a diversity of ground states are sought form an interesting application domain.As an application of our MAX-CSP modelling techniques, we focus on the fault diagnosis problem.In fault diagnosis each constraint is realized as a logical gate which defines the input/output pairs allowed by the gate.A circuit of gates then maps global inputs to global outputs.An error model is prescribed for each gate, and fault diagnosis seeks the identification of a minimum number of faulty gates consistent with observed global inputs and outputs and error model.A diversity of minimal cost solutions is valuable in pinpointing the origin of the faults.In §3, we test the ability of the D-Wave hardware to generate a range of minimal cost solutions and also use the hardware to test various decomposition algorithms on a standard benchmark set of fault diagnosis problems.

Approaches to Embedding
Modeling a constrained problem as a G-structured Ising objective requires reconciliation of the problem's structure with that of G. Two approaches may be taken to accommodate the connectivity required by G.In global embedding we model each constraint as an Ising model without regard for the connectivity of G, add all constraint models, and map the structure of the aggregate model onto G using the heuristic minor-embedding algorithm of [7].Previous examples of global embedding include [4,11,35,39,47,51].Alternatively, when the scopes of constraints are small, locallystructured embedding [3] models each constraint locally within a subgraph G ⊂ G, places the local subgraphs G within G, and then connects (routes) variables occurring in multiple local subgraphs.Figure 1 contrasts the two approaches.
The methods offer different trade-offs.The former method typically utilizes fewer qubits, and has shorter chains of connected qubits representing logical problem variables.The latter method is more scalable to large problems, usually requires less precision on parameters, and offers lower coupling strengths used to enforce chains.More precisely, assume an embedded Ising model is parameterized by [h, J] = [h, J P +αJ C ], where [h, J P ] describes the encoded constraints, J C enforces the couplings within chains, and α > 0 is a chain strength.In a satisfiable CSP that has been embedded with the locally-structured approach, the chains representing a solution to the CSP will be a ground state of [h, J] regardless of the choice of α. 1 In contrast, for some global embeddings, the chain strength required to enforce unbroken chains can grow with system size, increasing the precision with which the original problem must be represented and making the dynamics of quantum annealing more difficult [48].Whether the benefits of improved precision and lower chain strengths outweighs the drawbacks of using more qubits depends on the problem.Figure 2 gives an example of a problem class (random XOR-3-SAT problems) for which the overall performance and scaling of quantum annealing hardware is noticeably improved with locally-structured embedding.For the fault diagnosis problems studied here, the locally-structured approach also performs better, and we pursue improvements to the locally-structured algorithm of [3].[21], using global (blue) and locally-structured (red) modelling strategies.Each XOR-3-SAT instance is randomly generated subject to having a unique solution and a clause-to-variable ratio of 1.0.Global embeddings: each constraint s1s2s3 = 1 (with si = ±1) is encoded in the 4-qubit penalty model in Figure 1(C), which has energy gap g = 1.The Ising model representing the sum of these constraints is then embedded using the heuristic in [7].Local embeddings: each constraint is mapped directly to the K3,3structured Ising model in Figure 1(A), which has energy gap g = 2. Constraints are then embedded using the rip-up and replace algorithm in §2.4.2.Left: Scaling of hardware performance with no post-processing.ST99 is the number of samples needed to find the ground state with 99% probability, which, given a fraction p of all samples taken that are in the ground state, is given by the formula ST99 = log(1 − .99)/log(1 − p) [40].Bold lines indicate the median across 50 instances of each problem size.For points not shown, the hardware failed to find a ground state.Middle: total number of qubits required to embed the Ising model (median).Right: number of qubits in the largest chain in the embedding (median).

Preprocessing
For some CSPs, aggregating several small constraints into a single larger one prior to modelling may lead to more efficient hardware mappings and better hardware performance.The benefit stems from the fact that variables appearing in multiple constraints within a cluster need only be represented once (or perhaps not at all).As an example, consider the boolean-valued constraint y = XOR(x 1 , x 2 ).If XOR is represented using AND/OR/NOT gates, for example as {a 1 = AND(x 1 , ¬x 2 ), a 2 = AND(x 2 , ¬x 1 ), y = OR(a 1 , a 2 )}, at least 9 literals are needed.On the other hand, by clustering the three gates XOR can be represented by an Ising model directly using only 4 qubits (see Figure 1(c)).
Unfortunately as constraints become larger it becomes more difficult to find Ising models to represent them.This upper limit on the practical cluster size requires straightforward modifications to standard clustering methods like agglomerative clustering [45].When the CSP is derived from a combinational circuit, cone-based clustering [43,33] can also be adapted to accommodate bounds on the cluster size.Both agglomerative and cone clustering can be performed in polynomial time.

Mapping constraints to Ising models
Regardless of whether or not constraints are clustered, the next step in mapping to hardware is identifying an Ising model to represent each constraint.We assume a constraint on n binary variables is characterized by a subset of {0, 1} n which indicates valid variable assignments.Since quantum hardware uses spin variables with values in {−1, +1}, we identify 0 with -1 and 1 with whenever no chains are broken.
+1, and assume that the feasible set F is a subset of {−1, +1} n .Our goal is to find an Ising model that separates the feasible solutions F from the infeasible solutions {−1, +1} n \ F based on their energy values.In particular, the ground states of the Ising model must coincide with the feasible solutions F. Furthermore, to improve hardware performance, we seek Ising models for which the gap, that is, the smallest difference in energy between feasible and infeasible solutions, is largest.
Typically, due to both the complexity of the constraint and the sparsity of the hardware graph, the Ising model requires ancillary variables, which also may help to obtain larger gaps.We assume that the allowable interactions in the Ising model are given by an m-vertex subgraph G of the hardware graph G, where m ≥ n.The constraint variables are mapped to a subset of n vertices of G, while the rest of the vertices are associated with h = m − n ancillary variables.For simplicity, we write a spin configuration z ∈ {−1, +1} m as z = (s, a), meaning that the working variables take the values s, while the ancillary variables are set to a.
The Ising model we seek is given by variables θ = (θ 0 , , where θ i are the local fields h i , θ i,j are the couplings J i,j , and θ 0 represents a constant energy offset (unconstrained).To simplify the notation, for a configuration z we define φ(z) = (1, (z i ) i∈V (G) , (z i z j ) (i,j)∈E(G) ).Thus, the energy of z is given by E θ (z) = θ, φ(z) .The hardware imposes lower and upper bounds on θ, that we denote respectively by θ and θ (with To separate feasible and infeasible solutions we require that for some positive gap g: Thus, the problem of finding the Ising model with largest gap can be stated as follows θ ≤θ ≤ θ. Here, constraints (3) and ( 5) guarantee that all feasible solutions have minimum energy 0, while constraint (4) forces infeasible solutions to have energy at least g.This optimization problem is solved as a sequence of feasibility problems with fixed gaps g.Using the fact that the interaction graph G has low tree-width, we can significantly condense the formulation above.In this way the number of constraints may be reduced from exponential in m to exponential in the treewidth of G.The resulting model is solved with a Satisfiability Modulo Theories (SMT) solver (see [3] for more details).
The penalty-finding techniques above assume that a placement of variables within the Ising model is given.However different placements allow for different energy gaps, and it is not clear, even heuristically, what characteristics of a placement lead to larger gaps.For small constraints, canonical augmentation [32] can be used to generate all non-isomorphic placements.

Methods for MAX-CSP
Borrowing from fault diagnosis terminology, we consider constraints characterized by two disjoint subsets of feasible solutions: healthy states F 1 ⊆ {−1, 1} n , and faulty states As before, we require an Ising model that separates feasible from infeasible solutions, but preferring healthy to faulty states whenever possible.The particular case F 2 = {−1, 1} n \ F 1 corresponds to a MAX-CSP problem, in which a CSP is unsolvable but nonetheless we attempt to maximize the number of constraints satisfied by applying the same penalty to every constraint with a faulty configuration.
One way to model (F 1 , F 2 ) is through reification where a variable representing the truth of the constraint is introduced.This reified, or health, variable is +1 for healthy states and -1 for faulty states.That is, we define a feasible set and model F using the methods in §2.3.In this case, both solutions in F 1 and F 2 will be equally preferred.To break the tie to favor healthy states, the health variable can be added to the objective function with negative weight. 2 We call this the explicit fault model.
A second strategy is to modify the optimization problem of §2.3 so that all solutions in F 1 have energy 0, while all solutions in F 2 have energy e > 0 and infeasible solutions have energy at least g > e.In this case, we fix the intermediate energy e and the optimization problem becomes: θ ≤θ ≤ θ.
It is straightforward to adapt the SMT solution methods of [3] to this problem.We call this the implicit fault model.The implicit model generally requires fewer variables (i.e., qubits).However, care must be taken to ensure that g is large compared to e; otherwise, when adding penalties together, it may be difficult to differentiate several faulty constraints from a single infeasible constraint.In the explicit model, this issue can be avoided by choosing a sufficiently small weight for health variables in the objective function.

Locally-structured embedding
Given a method for generating penalties on subgraphs G the next steps of locally-structured embedding are the placement of G's within G, and the routing of chains of interactions between variables occurring in multiple constraints.[3] suggested adapting VLSI algorithms for placement [24,8,41] and routing [24,17] to accomplish these steps, and in this section, we describe two improvements to that work.Firstly, using routing models we find a tight lower bound on the size of the largest chain.This bound is combined with search heuristics to speed the discovery of good embeddings.Secondly, embedding algorithms that utilize placement and routing steps differ in a significant way from their classical VLSI counterparts, and a modification that performs simultaneous placement and routing improves results.

Chain length lower bounds and improved routing
The performance of D-Wave's hardware in solving an embedded Ising model depends heavily on the size of the chains of variables: shorter chains are more likely to yield better performance [48].In this section, we focus on routing which minimizes chain lengths.We assume that constraints have already been placed in the hardware (see [3] for placement methods).We show how to find tight lower bounds on the maximum chain size in an embedding, and provide an effective procedure to improve routing using these bounds.We first consider bounds for a single chain, which reduces to the well-studied Steiner tree problem.Let T ⊆ V (G) be a set of terminals; i.e. qubits in the hardware graph to which a variable has been assigned during placement.A Steiner tree is a connected subgraph of G that contains all the terminals.The Steiner tree problem consists of finding the smallest (fewest number of nodes) Steiner tree.The non-terminal vertices in a Steiner tree are called Steiner points.
There are several ways to model the Steiner tree problem as an mixed integer linear program (MILP).However the tightness of the linear program (LP) relaxation will have a significant impact on the time required to find a solution.Here we consider a formulation whose LP relaxation, known as the bidirected cut relaxation, has an integrality gap of at most 2 [37].First, we transform G into a directed graph by replacing each edge with two opposite arcs.For each v ∈ V (G) \ T , let x v be a binary variable indicating whether v is part of the Steiner tree.When variables x v are fixed, the Steiner tree is just a tree spanning T ∪ {v ∈ V (G) : x v = 1}.A tree can be modelled as a multi-commodity transshipment problem: pick any v 0 ∈ T as root, and find a path from v 0 to each of the other |T | − 1 terminals.Concretely, if we define flow variables f i a indicating that arc a is on the path from v 0 to terminal i, then an MILP formulation for the the Steiner tree problem is a→v Here, constraints (11) are the flow constraints, while the capacity constraints (12) allow flow to be routed only through Steiner points (i.e., v ∈ V (G) with x v = 1).The notation v → a (respectively a → v) refers to all arcs whose tail is v (respectively, whose head is v).
The LP relaxation of program (BCR) above produces very tight lower bounds for a range of Steiner tree problems [9].This MILP can be extended to the full routing problem using different flows for each Steiner tree to be found, with the additional demand that every variable can appear in at most one Steiner tree.
Having access to good bounds on the chain lengths allows for a simple improvement to the routing phase presented in [3].Assume we have a heuristic routing algorithm Route(G, T , M ) that takes as input a hardware graph G, a collection of terminal sets T = {T i } for each variable x i in the CSP, and a maximal allowable chain size M .Then any successful call to Route(G, T , M ) will provide an upper bound on the maximal chain length no worse than M , while any unsuccessful call provides a lower bound of M + 1.With these bounds we can perform a heuristic binary search for the optimal maximal chain length, and beginning with the good lower bound provided by the LP relaxation of (BCR) will significantly reduce the number of iterations in the search.

Combined place-and-route algorithms
The place-and-route model of embedding, while known to scale well, is often inefficient in maximizing the size of a problem embeddable in a fixed hardware graph.One reason is that in contrast with VLSI, the resources being negotiated by placement and routing are identical (namely, vertices of G).So, for example, an optimal placement might squeeze constraints as close together as possible, leaving no room for routing.For this reason we have developed a rip-up-and-replace algorithm which combines the placement and routing phases of embedding, using new routing information to update placements and vice versa.
During the course of the algorithm, vertices of G may be temporarily assigned to multiple variables, and each vertex is given an exponentially increasing penalty weight according to the number of times it has been used.More precisely, if each variable x i currently has chain S i ⊂ V (G), then the weight of vertex q ∈ V (G) is ω(q) = α |{i:q∈Si}| for some fixed α > 1.Each constraint C is given a placement (L C , v C ) consisting of a location L C ⊂ V (G) and an assignment of variables to vertices within the location, v C : V (C) → L C (where V (C) denotes the set of variables associated with constraint C).
The algorithm iteratively alternates between assigning constraints to locations, and routing variables between constraints (i.e.creating chains).Chains are created using a weighted Steiner tree approximation algorithm such as the MST algorithm [28] or Path Composition [17].Constraint locations are chosen based on a cost function, where the cost of (L, v) depends on the weight of vertices in L and the weight of routing to (L, v), which is approximated by weighted shortest-path distances to existing chains.The algorithm terminates when a valid embedding is found or no improvement can be made.Explicit details are given in Algorithm 1 below.
Alternatives to rip-up-and-replace, which iteratively updates the locations for constraints based on variable routing, such as simulated annealing or genetic algorithms could be used to update placements.For example, define a gene to consist of a preferred location for each constraint, and a priority order for constraints.Given a gene, constraints are placed in order of priority, in their preferred location if it is available or the nearest available location otherwise.During simulated annealing, genes are mutated by perturbing the preferred location for a constraint or transposing two elements in the priority order.These algorithms tend to take much longer than rip-up-andreplace, but eventually produce very good placements.

Decomposition algorithms
Owing to a limited number of qubits, it is often the case that a CSP or Ising model is too large to be mapped directly onto the hardware.[3] offered various decomposition techniques which use QA hardware to solve subproblems as a subroutine for solving larger ones.In this section, we describe two additional algorithms: divide-and-concur [18,49], specialized to our case of Ising model energy minimization, and a new algorithm inspired by regional generalized belief propagation [50].
For both algorithms, we partition the constraints of a MAX-CSP into regions R = {R 1 , R 2 , • • • } so that each subset of constraints can be mapped to a penalty model on the hardware using the methods of the previous section.For a region R ∈ R, the penalty model [h (R) , J (R) ] produces an Algorithm 1 Rip-up and replace heuristic for finding a placement of constraints and embedding of variables in a hardware graph.Require: Graph G, list of constraints C, list of potential placements (L, v) for each C ∈ C Ensure: Placement of each constraint C ∈ C on a location (L C , v C ) and a chain S i ⊂ V (G) for each variable x i such that all chains are disjoint, or "failure".
Optimize chain length of chains {S i } for terminals {T i } else Return "failure" function Trim(S i ,C) Ising energy function E R (z (R) ) whose ground states satisfy all the constraints in that region.Here z (R) is the subset of variables involved in the constraints of region R. Since embedding is slow in general, regions are fixed and embedded in hardware as a pre-processing step.
The key problem is that sampling a random ground state from each region produces inconsistent settings for variables involved in multiple regions' constraints.At a high level, messages passed between regions indicate beliefs about the best assignments for variables, and these are used to iteratively update the biases on h (R) in hopes of converging upon consistent variable assignments across regions.The two algorithms presented here implement this strategy in very different ways.

Divide and concur (DC)
Divide-and-concur [18,49] (DC) is a simple message passing algorithm that attempts to resolve discrepancies between instances of variables in different regions via averaging.In each region R, in addition to having an an Ising model energy function E R (z (R) ) representing its constraints, one introduces linear biases L R (z (R) ) on its variables, initially set to 0. Let z (R) i denote the instance of variable z i in region R.The two phases of each DC iteration are: e. satisfy all constraints and optimize over linear biases).
• Concur: average the instances of each variable: and update the linear biases by setting This basic algorithm tends to get stuck cycling between the same states; one mechanism to prevent this problem is to extend DC with difference map dynamics [49].DC has been shown to perform well on constraint satisfaction problems and constrained optimization problems, and compared to other decomposition algorithms, has relatively low precision requirements for quantum annealing hardware.That is, assuming each variable is contained in a small number of regions, the linear biases on the variables in the Ising model of each region (namely −z i ) are discretized.On the other hand, like most decomposition algorithms, DC is not guaranteed to find a correct answer or even converge.

Regional Generalized Belief Propagation (GBP)
[3] explored min-sum belief propagation as a decomposition method.Here we take a different approach: instead of minimizing the energy of an Ising model E(z) directly, we sample from its Boltzmann distribution p(z) = e −E(z)/T /Z.Presuming that we have successfully mapped our constraints to Ising models with large gaps ( §2.3), and that the temperature T is sufficiently small, we have confidence that sampling from the Boltzmann distribution provides good solutions to the original constrained optimization problem.The Boltzmann distribution is the unique minimum of the Helmholtz free energy Our algorithm decomposes A into regional free energies.The resultant algorithm is similar in spirit to the generalized belief propagation algorithm of [50] based on their region graph method.Sum-product belief propagation is related to critical points of the (non-convex) Bethe approximation, which for Ising energies reads where d i = |{j ∈ V : (i, j) ∈ E}|.The distribution p in the free energy is approximated by local beliefs (marginals) b i , b ij at each vertex and edge.To obtain consistent marginals, b i (z i ) = j b ij (z i , z j ) whenever (i, j) ∈ E, one introduces a constrained minimization problem, and it is the Lagrange multipliers associated to these constraints that relate to the fixed points of belief propagation.In particular, if belief propagation converges then we have produced an interior stationary point of the constrained Bethe approximate free energy [50].
In our case, having divided a MAX-CSP into regions R, we can formulate a regional analogue of the Bethe approximation, where now c i = |{R : i ∈ R}| is the number of regions whose Ising model includes variable z i .In exactly the same way as above, requiring consistent marginals induces a constrained minimization problem for this regional approximation.The critical points of this problem are fixed points for a form of belief propagation.Specifically, for each variable z i in a constraint of R, the messages passed between variable and region are For large regions, which involve a large number of variables, the first of these messages is intractable to compute.As in previous work [3], we use QA hardware to produce this message.In that work, the algorithm relied on minimizing the energy of the penalty model; here we harness the ability of the hardware to sample from the low energy configurations of the Ising model without relying on finding a ground state.Unfortunately, it is not as simple as sampling from the Ising model formed from the constraints in a given region.Even if the hardware were sampling from its Boltzmann distribution, this would minimize the free energy of just that region Unless the region R is isolated, this would not recover the desired belief b R as we have failed to account for energy contributions of variables involved in other regions' constraints.We instead add corrective biases to each region's penalty model and sample from the Boltzmann distribution of this energy function.We use the notation E (R) and V (R) for the Ising model graph associated to region R, and ∂R ⊂ V (R) for it's boundary: indices of variables that also appear in the penalty models of constraints in other regions.Only these variables gain corrective biases.
Algorithm 2 Generalized belief propagation (GBP) based on regional decomposition.
Require: A decomposition of a CSP into constraint regions R and penalty Ising models E R for each R ∈ R. Putative temperature T .Ensure: A critical point of the constrained regional Bethe approximation (13), or "failure".
For each R ∈ R and j ∈ ∂R, initialize F j→R (z j ) ∝ 1.
) by minimizing Eq. ( 14) using the corrected energy ẼR Compute the messages Algorithm 2 is a generalized belief propagation (GBP) that uses the Boltzmann distribution of each region's corrected penalty model to re-estimate their collective corrective biases.If this algorithm converges, then one obtains a critical point of the regional Bethe approximation (13) constrained to give consistent marginals z (R) \zi b R (z (R) ) = b i (z i ), [30].Like belief propagation, there is generally no guarantee of convergence and standard relaxation techniques, such as bounding messages away from 0 and 1, are required.
Beyond a proof of correctness, GBP offers a distinct computational advantage over our previous belief propagation algorithm from [3].For ease of reference, we include the relevant message formulation from that work: Note there are 2|∂R| Ising model energy minimizations to be performed in each region R.With current QA hardware, programming of h, J parameters is significantly slower than sampling many solutions, and thus the cost of 2|∂R| reprogrammings can be significant.In GBP however we use QA hardware not to estimate a ground state energy, but to approximate the distribution b R (z (R) ).This can be performed with a single programming call per region.Each message is formed from the marginals, z (R) \zi b R (z (R) ), which are estimated from the hardware sampled ensemble.
One weakness in this algorithm is the need to know the temperature T in order to produce the corrective biases V (R) i (z i ).[1] and [38] propose methods to estimate instance dependent effective temperature directly from samples.It seems likely these techniques can be applied to GBP, and will be incorporated into future work.

Application: fault diagnosis
We apply the methods of the previous sections to solve problems in fault diagnosis, a large research area supporting an annual workshop since 19893 .We focus on circuit hardware fault diagnosis, which has featured as the "synthetic track" in four recent international competitions [29,36].Our goal is to use fault diagnosis as an example of how to use the methods of this report, and we use these competitions as inspiration rather than adhere to their rules directly.The typical problem scenario is to inject a small number of faults into the circuit, using the specified fault modes for the targeted gates, and produce a number of input-output pairs.Now, given only these input-output pairs as data, one wishes to diagnose the faulty gates that lead to these observations.As typically there will be many valid diagnoses, the problem is to produce one involving the fewest number of gates (a "min-fault" diagnosis).
We restrict to the "strong" fault model, in which each gate is healthy, and behaves as intended, or fails in a specific way.(In the "weak" fault model only healthy behaviour is specified.)The strong fault model is generally considered more difficult that then weak model, but is no harder to describe using the Ising model techniques of §2.3.
Both the strong and weak fault model diagnosis problems are NP-hard.State-of-the-art performance for deterministic diagnosis is achieved by translating the problem into a SAT instance and using a SAT solver [33], but this approach has not been as thoroughly investigated in the strong fault model [44].Greedy stochastic search produces excellent results in the weak fault model, but is less successful in the strong fault model [13].
We study the effectiveness of the D-Wave hardware in two experiments.First, we examine the ability of the hardware to sample diverse solutions to a problem.We find, despite not sampling diagnoses uniformly, that almost all min-cardinality diagnoses can be produced by oversampling the hardware by a factor of 1000 (Table 2).Next, we use the hardware to produce a solution for a problem too large to be embedded.We test dual decomposition from [3] and divide-and-concur from §2.5 above, and solve several min-fault diagnosis problems that require multiple regions (Figure 4).

Problem generation
We test on the ISCAS '85 benchmarks [19] and 74X-Series combinatorial logic circuits.From publicly available .isc files4 , we remove fault modes for buffer or fan-out wires, leaving only fault models for gates.Additionally, in order to accommodate penalty modeling with a small number of variables, we split certain large gates into smaller ones; this can be done without changing the correct fault diagnoses.
Owing to the difficulty of generating good input-output pairs [36, §4.2], we take a simplified approach.For each circuit, we randomly generate 100 observations (input-output pairs) and select a subset of size 20 with as uniform a distribution of minimum fault cardinalities as possible.These cardinalities are verified using a the MAX-SAT solver EVA [34].
We perform cone-clustering ( §2.2) on each circuit using the "pessimistic" approach for strongfault models of [44], and generate Ising models to represent the constraints for each cone.When using explicit health variables, the resulting Ising models have energy gap at least 2, while with implicit health variables, the energy gap is 1 (using hardware-structured Ising models with We partition the cone-clusters into regions using the software package METIS [25], with the number of regions chosen so that each region is embeddable in a working D-Wave hardware graph with up to 1152 qubits.Finally we embed each region using the "rip-up-and-replace" algorithm of §2.4.2.It is important to note that for a given circuit, each of its regions need only be embedded once as different test observations may use the same embedding.Table 1 summarizes the circuits, partitions, and embedding statistics.

Generating diverse solutions
To test the D-Wave hardware's ability to generate diverse solutions, we consider the problem of finding all min-cardinality fault diagnoses for a given observation.This is computationally more difficult than finding a single diagnosis, but also more realistic from the perspective of applications.Again, state-of-the-art performance in the weak fault model is achieved using a SAT-solver [33].The hardware's natural ability to rapidly generate low-energy samples lends itself to applications in which a diverse set of optimal solutions are required.Unfortunately, samples taken from the D-Wave hardware do not conform to a Boltzmann distribution, owing to both noise and quantum mechanical effects.In contrast with greedy stochastic search [13], min-cardinality solutions will not generally be sampled with equal probability.In practice, Gibbs sampling [16] and other postprocessing techniques may be used to make a distribution of ground states more uniform.
We restricted to the 74X-Series circuits in Table 1, which can entirely embedded within the current hardware architecture.For each input-output pair for a circuit, we used SharpSAT [46] to enumerate the min-cardinality diagnosis set Ω ≤ and then drew 1000|Ω ≤ | samples from the QA hardware.Ising models were pre-processed with roof-duality [5] and arc-consistency [31], allowing certain variables to be fixed in polynomial time.Random spin-reversal transformations ("gauge transformations") were applied to mitigate the effects of intrinsic control error in the D-Wave hardware.Samples were post-processed using majority vote to repair broken chains, followed by greedy bit-flipping in the original constraint satisfaction space to descend to local minima.See [26] for more details on pre-and post-processing.
The results in Figure 3 show the expected number of samples needed to see all min-fault diagnoses at least once, together with the number of samples needed to see just a single min-fault diagnosis.Namely, if p i denotes the fraction of all samples taken that correspond to min-fault diagnosis i, then the expected number of samples required to find a single min-fault solution is 1/ i p i , and the expected number of samples required to find all min-fault solutions is (This is the coupon collecting problem with non-uniform probabilities [42,15].)Following [13], we also computed the expected fraction of all min-fault diagnoses found when taking N |Ω ≤ | samples, for N ∈ {10, 100, 1000}.These results are summarized in Table 2.

Solving large problems
We tested the performance of the D-Wave hardware in solving the fault diagnosis problem for circuits too large to be embedded.On the regions produced in §3.Performance in finding all min-fault diagnoses for the 74X benchmarks using a D-Wave 2X processor.Ω ≤ is the set of min-fault diagnoses for a given instance, and Mc(N ) is the expected percentage of all min-fault diagnoses found when N |Ω ≤ | samples are taken for each instance.Note that the annealing time to take 100 samples is 2 ms, roughly the same as the time to take 4 samples reported in Table 6 of [13].
(A) c432 c499 c880 c1355 c1908 0 dual decomposition (DD) from [3] and divide-and-concur (DC) from §2.5.To measure algorithm performance independent of quantum annealing, we also found the minima for regional Ising models exactly using low-treewidth variable elimination [27].Such an exact solver (SW) gives an upperbound on the performance of a decomposition algorithm.
In Figure 4(A), we show the number of successful min-fault diagnoses out of 20 instances for several of the ISCAS '85 benchmark circuits.Using the exact solver we attempted to solve each faultdiagnosis instance 100 times, and recorded the median number of successes across the 20 instances for each circuit.Using the D-Wave hardware, we attempted to solve each instance once.A summary of the D-Wave hardware performance on each regional optimization problem is given in Figure 4(B).Each problem was solved by drawing 20,000 samples across 20 spin reversal transformations, with pre-and post-processing as in §3.2.
Note that the overall performance of the decomposition algorithms using D-Wave's heuristic optimizer is similar to that using an exact solver, despite the fact that the D-Wave hardware does not solve every sub-problem to optimality.This suggests that QA hardware that provides only approximate solutions in the form of low-energy samples can still be used to solve large optimization problems, provided it can capture a sufficiently non-trivial portion of the original problem.

Conclusion
In this paper we have expanded on the approach given in [3] to solve large discrete optimization problems using quantum annealing hardware limited by issues of precision, connectivity and size.Applying these techniques with the D-Wave 2X device, we are able to solve non-trivial problems in model-based fault diagnosis.While the total running times of the decomposition algorithms are not currently competitive with the fastest classical techniques, both the speed and the performance of the algorithms improve dramatically with the size of the quantum hardware available.
Two of the most important directions for future research are: 1. Expanding penalty-modelling techniques to more qubits.As the available hardware grows larger, large energy gaps and other forms of error-correction will become more important to finding the grounds state in quantum annealing.In addition, a better understanding of the performance trade-off between larger energy gap and fewer qubits is needed.
2. Alternate strategies for decomposition algorithms.Since minor-embedding is itself a difficult discrete optimization problem, current decomposition algorithms are hampered by the need for fixed regions with pre-computed embeddings.More research is needed into circumventing the need for fixed regions, combining quantum annealing with the best classical constraint satisfaction methods, and making better use of the fast sampling capabilities of the available hardware.

Figure 1 :
Figure 1: Comparison of locally-structured (top) and global (bottom) embeddings of a CSP with constraints {XOR(x1, x2, x3), XOR(x1, x4, x5), N EQ(x3, x5)} in a D-Wave-structured hardware graph.(A) The penalty models for XOR and N EQ have an energy gap g = 2. (B) After locally-structured embedding with a chain strength of α = 1, the Ising model for the CSP has an energy gap of 2. Chain couplings are indicated with thick blue edges.(C) The given penalty model for XOR using only 4 qubits has energy gap g = 1.(D) The aggregate Ising model for the CSP.Variable ai is an auxiliary variable used to define the i-th constraint.(E) Global embedding of the aggregate model.The chain strength α = 2 was optimized experimentally, and the entire Ising model is scaled by a factor of 1/α to satisfy the range requirement −1 ≤ Jij ≤ 1.After scaling, the Ising model for the CSP has an energy gap of g = 0.5.The global embedding uses fewer qubits but requires more precision to specify.

Figure 2 :
Figure2: Comparison of the D-Wave quantum annealing hardware performance in solving XOR-3-SAT problems[21], using global (blue) and locally-structured (red) modelling strategies.Each XOR-3-SAT instance is randomly generated subject to having a unique solution and a clause-to-variable ratio of 1.0.Global embeddings: each constraint s1s2s3 = 1 (with si = ±1) is encoded in the 4-qubit penalty model in Figure1(C), which has energy gap g = 1.The Ising model representing the sum of these constraints is then embedded using the heuristic in[7].Local embeddings: each constraint is mapped directly to the K3,3structured Ising model in Figure1(A), which has energy gap g = 2. Constraints are then embedded using the rip-up and replace algorithm in §2.4.2.Left: Scaling of hardware performance with no post-processing.ST99 is the number of samples needed to find the ground state with 99% probability, which, given a fraction p of all samples taken that are in the ground state, is given by the formula ST99 = log(1 − .99)/log(1 − p)[40].Bold lines indicate the median across 50 instances of each problem size.For points not shown, the hardware failed to find a ground state.Middle: total number of qubits required to embed the Ising model (median).Right: number of qubits in the largest chain in the embedding (median).

Figure 3 :
Figure 3: Performance in finding all min-fault diagnoses for the 74X benchmarks using a D-Wave 2X processor.Missing ×'s indicate that not all solutions were found.

Figure 4 :
Figure 4: (A) Summary of D-Wave hardware success rate using divide-and-concur (DC) and dual decomposition (DD), compared to the same decomposition algorithm using an exact low-treewidth software solver (SW).(B) Percentage of hardware samples (HW) and regional solutions (Region) within ∆% of optimality across all instances tested.

Table 1 :
Statistics for the 74X Series and ISCAS '85 benchmarks as embedded on a D-Wave 2X processor, including number of regions |R| in the decomposition.Chain length refers to the maximum size of a chain within each region.