Applying the Hubbard-Stratonovich Transformation to Solve Scheduling Problems Under Inequality Constraints With Quantum Annealing

Quantum annealing is a global optimization algorithm that uses the quantum tunneling effect to speed-up the search for an optimal solution. Its current hardware implementation relies on D-Wave’s Quantum Processing Units, which are limited in terms of number of qubits and architecture while being restricted to solving quadratic unconstrained binary optimization (QUBO) problems. Consequently, previous applications of quantum annealing to real-life use cases have focused on problems that are either native QUBO or close to native QUBO. By contrast, in this paper we propose to tackle inequality constraints and non-quadratic terms. We demonstrate how to handle them with a realistic use case-a bus charging scheduling problem. First, we reformulate the original integer programming problem into a QUBO with the penalty method and directly solve it on a D-Wave machine. In a second approach, we dualize the problem by performing the Hubbard-Stratonovich transformation. The dual problem is solved indirectly by combining quantum annealing and adaptive classical gradient-descent optimizer. Whereas the penalty method is severely limited by the connectivity of the realistic device, we show experimentally that the indirect approach is able to solve problems of a larger size, offering thus a better scaling. Hence, the implementation of the Hubbard-Stratonovich transformation carried out in this paper on a scheduling use case suggests that this approach could be investigated further and applied to a variety of real-life integer programming problems under multiple constraints to lower the cost of mapping to QUBO, a key step towards the near-term practical application of quantum computing.


INTRODUCTION
Since being first proposed by Richard Feynman in 1981, quantum computing has been an intriguing idea for both physicists and computer scientists [1]. The core concept of a quantum computer is to manipulate a quantum system in a large Hilbert space in order to gain a significant advantage over classical computing to solve certain classes of hard problems. For instance, it has been theoretically proven that quantum algorithms bring an exponential speed-up over traditional methods on the problems of factorization [2] and quantum simulation [3,4], as well as a quadratic speed-up for unstructured search [5].
For many hard computational problems however, no approach, either classical or quantum, has been theoretically proven to outperform others. In practice, the investigation of quantum advantage is thus closely related to hardware development in order to obtain empirical evidence. Yet, controlling and manufacturing a quantum computer remains an unsolved challenge, although past decades have witnessed multiple breakthroughs. Despite the recently achieved significant milestones for quantum computing, with quantum supremacy tests being performed (Google's superconducting quantum computer Sycamore [6], photonic quantum computer JiuZhang [7]), it is still a fundamental research question to determine which problems and applications should be solved more efficiently by using quantum computing in place of classical computers; and to what extent. Given the current technological advancement, quantum annealing-a metaheuristic optimization algorithmachieves promising experimental performances in finding near-optimal solutions for some NP-hard optimization problems [8][9][10][11]. Quantum annealing requires to map an optimization problem to an Ising spin glass model with Hamiltonian H P , to which QPUs are restricted to by construction, and then solve the problem by sampling the low-energy states of the physical system. The system is typically initialized in the ground state of an easy-toimplement driver Hamiltonian H D , and then evolved progressively towards H P : where A(0) 1, B(0) 0 and B(1) 1, A(1) 0. With σ x , σ z the Pauli spin operators and J ij , h z i , h x i some scalar coefficients, the problem Hamiltonian with n qubits is written H p <i,j> J ij σ z i σ z j + n i 1 h z i σ z i while the driver Hamiltonian H D n i 1 h x i σ x i does not commute with H P , thus creating the quantum fluctuations. The annealing process, which is a dynamic process of the quantum system, is non-trivial and techniques such as non-linear scheduling, pausing and reverse-annealing can further improve the quality of the final system state [12,13].
The understanding of the underlying physical process and the speed-up that quantum annealing could provide over classical methods has been a heated topic of debate and interest. Indeed, the notion of quantum speed-up itself has been proven to be elusive, and varies from provable to limited speed-ups, depending on the question that is investigated [14][15][16]. It has been commonly believed that quantum annealing can exhibit speedup over simulated annealing due to the fact that it can escape local minima via quantum tunneling effect [17]. However, some research is still needed to clarify to what extent quantum tunneling can accelerate the optimization process. Besides, another issue is whether the quantum tunneling effect can be observed in the current physical device. Finally, it has been shown that the speed-up of quantum annealing could be highly dependent on the high dimensional energy landscape determined by the problem structure [18].
Despite these key challenges, the rapid progresses achieved in hardware development suggest that practical applications of quantum computing are within reach. Particularly, the latest generation of D-Wave's Quantum Processing Units (QPUs), designed to solve optimization problems with quantum annealing, reaches more than 5,000 qubits. As such, quantum annealing has already been applied to several real life problems with D-Wave's experimental device. The applications range from scheduling and planning problems [19][20][21], machine learning [22,23] and other domains such as molecular design [24,25], portfolio optimization [26,27] or robotic movement [28]. Notably, some experiments carried out on D-Wave's 2000qubit machine have shown that quantum annealing, when combined with classical solvers within an hybrid approach, can outperform some existing commercial solvers on industrial scale optimization problems, such as scheduling and traffic control [29,30].
In this work, we are interested in solving an industrial energy problem, which can be written as a constrained integer programming problem: the optimal charging scheduling of electrical (EV) buses. Through this application, we aim at providing insights on two of the main research questions quantum annealing still needs to address, namely mapping to QUBO problems and embedding in hardware [19]. Due to the complex nature of the EV-bus charging problem, not tailored a priori for quantum computing, mapping to QUBO and then to hardware is effectively a costly step. In order to go beyond the penalty method commonly used for such mappings [31], we propose to use the Hubbard-Stratonovich transformation to obtain an efficient, with least qubit requirement, mapping. The transformation was first applied to quantum annealing problems by Ohzeki in 2020 [32], and we explore more deeply its practical implementation and performances by studying a problem of increased complexity, with several inequality constraints.
In the rest of this paper, we solve the EV bus charging scheduling problem on the D-Wave 2000q Quantum Processing Unit by firstly reformulating it into a QUBO problem with the penalty method, in Section 2. Besides, we also discuss the limitations of the penalty method, specifically the issues of graph connectivity and dynamic range. Next, by employing a method based on the Hubbard-Stratonovich transformation and described in Section 3, we reduce some of the quadratic penalties in the QUBO problem into linear terms, breaking thus the limitation of connectivity and dynamic range. Finally, we present and discuss some numerical experiments in Section 4 and conclude in Section 5.

Problem Definition
We consider a fleet of N bus electrical buses and a set of N pile piles. Assuming that the electricity consumption of all buses is known during their respective operating hours, the objective of the EVbus charging scheduling problem is to find an optimal daily schedule for charging the buses by minimizing the cost of electricity while avoiding any breakdown. The EV-bus charging scheduling problem with minimization of price can thus be formulated as follows, with T (t 1 , t 2 , . . . , t k , . . . , t T ) being a discretization of time periods: where z ijk is a binary variable indicates whether bus i is charging at pile j for the time period t k while o ik 0 if bus i is in operation and o ik 1 otherwise, i.e., if bus i is available for charging at time t k . ω j is the power output of pile j, p k the cost of electricity at time t k . We consider a time period of 24 h, and need only to find the values z ijk during non-operating hours, i.e., during charging windows. This reduces the number of time steps to L time ≤ T. In the sequel, we denote by S il (respectively, D il ) the starting (respectively, ending) time of the l-th charging window of bus i. The constraints of the problem can be expressed as: ∀k : ∀i, l : ∀i, l : ∀i, l : Constraint (3) ensures that any bus can only be charged at most one pile at a given time, while (4) means that the total number of buses being charged at the same time should not be greater than the total number of piles N pile . (5) and (6) control the state-ofcharge (SOC) of each bus i between 100 and 30% at all times by adding constraints at each time a bus leaves the charging station, with c ik being the power consumption of bus i at time t k . Note here that the power ω j is also normalized to be consistent with SOC ∈ [0, 1]. Lastly (7) guarantees that a bus can only be charged continuously, and not intermittently, at most once during each charging window. Hence, the original EV-bus charging problem Eqs 2-7 is an integer programming problem with inequality constraints.

Reformulation of Constraints With the Penalty Method
Due to hardware constraints, quantum annealing can only be directly sampled from an Ising Hamiltonian. In this section, we reformulate thus the integer programming problem to a quadratic unconstrained binary optimization (QUBO) problem, which by definition does not include any constraint or non-binary variables [31,33]. For each equality constraint of the form C i (x) 0 in the optimization problem, the penalty method consists in augmenting the cost function with a term P 2 i (x) which penalizes the solutions that violate the constraint. For the inequality constraints, following [31], we first transform them into equality constraints with slack variables and then apply the penalty method. More specifically, constraints (3) and (7) are equivalent to with s 1,ik , s 5,il ∈ {0, 1}. This yields the corresponding penalty functions: It should be noted that Eq. 11 is of degree 4 in x, i.e, it is not a quadratic form. Therefore, a first limitation of the penalty method is that the already quadratic constraint (7) cannot be included in the augmented cost function. Next, handling constraints (4) to (6) requires to introduce a slack variable that is either an integer [constraint (4)] or a real number [constraints (5)- (6)]. We choose to encode such nonbinary variables s with binary encoding, i.e., where N is the number of bits and upper-indexed s (i) are used to denote the encoding binary variable of s. Other options, not explored in this article, for encoding the continuous variables include one-hot encoding and order encoding, as in [34]. Constraints (4)-(6) become where s 2,jk is a binary-encoded integer slack variable in [0, N pile ] and s 3,il , s 4,il are binary-encoded continuous slack variables in [0, 1]. Furthermore, it is easy to see that the penalties P 3 and P 4 can be written into a single one: with Finally, it can be observed that the barrier method in classical optimization is a more efficient way of handling inequality constraints [35], yet not applicable here due to the log function required by this method but not feasible in practice on the quantum annealer.

Quadratic Unconstrained Binary Optimization Formulation of the Original Problem
By employing the penalty method and encoding the continuous variables, the original constraints are reformulated into penalty functions P 1 . . . 5 . The effective QUBO Hamiltonian reads then: where H p i,j,k p k ω j x ijk is the Hamiltonian corresponding to the original cost function (2). It remains to determine the values of λ i such that the structure of the original problem can be kept unchanged in the reformulated QUBO problem. More specifically, let C denote the set of all viable solutions x, we need to ensure that: We proceed to an analysis of the different terms in Eq. 16, following the method in [31]. The detailed calculations are described in the Supplementary Material, and we find that in order to preserve the structure of the original problem, it is sufficient that the coefficients λ i satisfy the following inequalities: where min δp is the minimal price difference between any two given times during the day (including the difference between lowest price and 0), ε is a margin to control the state of charge within the boundaries [0.3 + ε, 1 −ε], typically set to ε 0.1. The non-quadratic penalty P 2 5 is omitted. These inequalities indicate thus how to appropriately set the values of the coefficients λ i for the penalty functions. Another practical issue is the ratio r between the largest and smallest terms in the QUBO objective function H QUBO . Indeed, a large ratio will require the hardware device to have high dynamic range and precision, which is challenging for the current technology advancement. More precisely, recalling that the original objective function is H* i,j,k p k ω j x ijk , we obtain the lower and upper bounds, respectively: where r max , r min give the range of the ratio r for which the structure of the problem stays the same. Hence, in order to guarantee that the structure of the problem is preserved, N, ε, L time should be such that: Furthermore, observe that the upper bound Ltime ε 2 of r min scales with the number of time steps, L time but is independent of N bus and N pile . Consequently, increasing the size of this problem solved by quantum annealing and penalty method would not represent a higher challenge in terms of dynamic range of the physical device. This is a satisfying result as the QUBO form of many hard problems requires scaling dynamic range [36]. On the contrary, increasing the number of time steps requires higher performances from the QPUs.

Limitations of the Quadratic Penalty Model
Although the quadratic penalty model maps the constrained optimization problem to an unconstrained one while preserving the relation between solutions, there are several practical limitations while implementing the quadratic penalty model on a real quantum annealer. Firstly, the Hamiltonian that can be realized onto the real physical device is an Ising Hamiltonian which contains at most two-body interactions, i.e., quadratic terms in the objective function. In the previous Section 2.3, we have shown that some penalty functions can contain higher-order terms that can not be written into the QUBO objective function, cf. quadratic constraint (7) and the corresponding penalty (11).
The second practical limitation of the penalty method relates to the connectivity of the hardware architecture. We refer to the quadratic penalty term of the first constraint P 2 1,ik (x) ( j x ijk − s 1,ik ) 2 as an example. To implement this constraint, we are requiring an all-to-all connectivity between the qubits representing x ijk for some fixed i, j and an ancillary qubit s 1,ik . It is beyond current technology to realize arbitrary connectivity between qubits on a physical device, as the interactions are often restricted between neighbors. Specifically, the D-Wave 2000q architecture is based on a Chimera graph, which has only limited connectivity; each qubit being coupled to at most 6 other qubits. The next generation of D-Wave devices (D-Wave Advantage) has an improved connectivity with the Pegasus graph of degree 15 [37], but was not available for testing as of writing time. Nevertheless, it is still very challenging to solve a highly connected problem on either of these architectures. Besides, as shown in Section 2.3, the coefficient λ is typically much larger than the terms in the original objective function, which could outrange the practical dynamic range of current device.
In practice, before the logical Ising Hamiltonian can be sampled, the problem has to be mapped to the physical hardware graph via minor embedding [38,39]. Minor embedding maps one logical qubit into a chain of physical qubits with negative neighboring interaction strength J chain_strength . This process creates significantly more physical qubits, for example the number of physical qubits needed to embed a N-clique onto the Chimera graph of D-Wave 2000q is O(N 2 ), i.e., only up to 64 fully connected qubits can be embedded onto over 2000 qubits. Besides, the interaction J chain_strength must be of similar magnitude or greater than the penalties to prevent breaking of chains, which would further address the problem of effective dynamic range. We refer to [38,40] [41,42]. In this case, the state would only evolve in a sub-space of the Hilbert space that satisfies the constraints. This approach is however not yet implementable on the current version of quantum annealer devices. In another work, Vyskočil et al. suggest to solve the problem of embedding constraints by using ancillary variables and mixed-integer linear programming to find the optimal combinatorial design of these qubits on the hardware graph [43,44]. Ajagekar et al. apply a decomposition of the problem into a constrained MILP problem and unconstrained QUBO subproblems that would be solved by quantum annealer [45]. However, these methods either could not be implemented with the current hardware or very hard to generalize to realistic optimization problems with several constraints.

BREAKING THE QUADRATIC PENALTY BY HUBBARD-STRATONOVICH TRANSFORMATION
In this work, we follow a method described by M. Ohzeki and based on the Hubbard-Stratonovich transformation to reduce the quadratic terms into linear terms [32]. The transformation is a well-known technique in statistical physics [46,47] and is defined via the integral identity with a real positive a. Consider the former Hamiltonian structure with additional penalty terms H(x) H p (x) + i λ i P 2 i (x), with P(x) being a linear expression of x and H p (x) the original cost Hamiltonian. The partition function of the system subjected to the predefined Hamiltonian H can be written as where β ∈ R is the inverse temperature of the system at equilibrium. By performing the Hubbard-Stratonovich transformation (22) on the quadratic penalty terms and applying a complex change of the integral variable y←i β/λ ], one can obtain the following expression of the partition function: The Notice that now, it is only necessary to determine the ground state of the effective Hamiltonian H(x, ]) in Eq. 25, which depends on P i instead of P 2 i . The method brings thus three significant improvements over the penalty method in Section 2.3. Firstly, the problem of connectivity mentioned in Section 2.4 is solved, and we can thus handle much larger problems with the same number of qubits. Secondly, the penalty P 5 (11) can be written into the objective function, with highest order term being quadratic. Finally, it can be observed from Eq. 25 that the penalty coefficients λ i no longer explicitly affect the penalty terms P i . Consequently, the penalty coefficients no longer causes problems in terms of the dynamic range of D-Wave QPUs. We also note that in our problem, solving the ground state of H(x, ]) remains a non-trivial problem, hence the necessity of the quantum sampler. However, it is also noted by Ohzeki that the relation between 〈P k (x)〉 x and ] could be non-monotonic for certain problems, causing difficulties for the gradient descent method to find the saddle point.
The algorithm is summarized as in Algorithm 1, where the braket notation 〈〉 x in Eq. 26 is the probabilistic summation of P(x), with the state of x sampled from the Hamiltonian H(x, ]) from Eq. 25 on the quantum annealing device or other optimization algorithm.

EXPERIMENTAL RESULTS
In this section, we generate various problems with different settings in L time , N bus , N pile based on realistic operational data of EV-buses. The continuous variables are discretized using N 8 bit for binary encoding. We then analyze the experimental results on the minor embedding, directly sampling the QUBO model from the D-Wave-2000q quantum annealer, and the iterative method based on Hubbard-Stratonovich transformation. The ground truth optimal solution is obtained using the commercial solver Cplex. We also use the classical counterpart simulated annealing [48] as a benchmark.

Minor Embedding
First of all, we generate several instances of the EV-bus scheduling problem with different values of L time , N bus , N pile based on real operational data. The problems are generated by randomly sampling from 15 typical bus operational schedules and randomly setting the number N pile and power ω j of charging piles in a reasonable range. We refer to Supplementary Figure S1 in the Supplementary Material for a visualization of these typical bus schedules.
To solve the problem via a quantum annealer, the logical problem (QUBO) graph needs first to be mapped to the hardware graph. In this case, the D-Wave-2000q has a Chimera architecture, which possesses a limited connectivity of degree 6. This implies that the number of physical qubits after the mapping is larger than the number of logical qubits in its original QUBO form, due to the minor embedding. We note that the minor embedding problem is NP-hard itself hence a heuristic algorithm is employed here [49]. Indeed, as analyzed in Section 2.4 and Section 3, the highly connected QUBO graph caused by the quadratic penalties increases significantly the number of qubits needed on the hardware to solve the EV-bus scheduling problem, even at a small scale.
The cost of embedding is illustrated in Figures 1A-D, where a problem with ∼ 150 qubits will cost ∼ 1,500 qubits on the hardware. We further observe a quadratic increase of physical qubits in Figure 1A when fixing N bus 1, N pile 1 and increasing L time from 24 to 120, and a linear increase in Figures 1B-D where L time is fixed. Consistently with the analysis in Section 2.3, we find thus that increasing the number of time steps L time is more challenging in terms of hardware requirement than increasing N bus , N pile . This result further emphasizes that the extra qubits needed are mostly due to the quadratic penalty terms in P 2 3,4 (x) ( D in j,k 1 ω j x ijk + . . . ) 2 , where the number of variables x in the penalty increases with L time .

Solving the Penalty Model
After minor embedding, the embedded problem is then sampled using the D-Wave quantum annealer as an efficient solver for QUBO problems. As discussed in Sections 2.3 and 2.4, the penalty coefficients λ i for the penalties P i need to be set appropriately to guarantee that the structure of the original problem is preserved. Additionally, the parameter J chain_strength is introduced during minor-embedding to ensure that the physical qubits can be correctly mapped back to the logical qubits after the sampling. Although theoretical bounds, i.e., minimum theoretical values to ensure that the penalized optimal solution is the original one, are provided for λ i in Section 2.3, in practice, the optimal values for P i can not be directly calculated. Instead, we explore the trade-off between penalty and chain strength by computing the time to 99%-success, as defined in [14], and chain break fraction for an instance of the problem N bus 1, N pile 1, L time 24 solved with different settings of λ 3,4 and J chain_strength . The coefficients λ 1,2 are constant to 10, since according to our previous results, these values are above their theoretical bounds and are of no significant effect on the efficiency of the solver.
As shown in Figure 2A, the time to 99%-success for this problem takes similar values along the lines J chain_strength ∼ λ 3,4 . However, this conclusion does not hold for other problems, as shown in Supplementary Figure S2 in the Supplementary Material, where the optimal λ 3,4 seems to be in the range [50,800] for problems of larger sizes. We also observe that for λ 3,4 ≫ J chain_strength , almost no solution could be found, which can be explained by an increased chain break fraction, as shown in Figure 2B. Equally for λ 3,4 ≪ J chain_strength : although the quality of mapping is guaranteed by an increased chain strength, yet the efficiency of sampling by D-Wave QPUs is reduced. We conclude that choosing the optimal values for λ 3,4 and J chain_strength is largely an empirical question that is problem dependent, but it is safer to set a penalty λ 3,4 close to the theoretical bounds and a chain strength J chain_strength of the same order of magnitude as λ 3,4 .

Solving With the Hubbard-Stratonovich Transformation
It is clear from the previous discussion that directly embedding and sampling the problem obtained with the penalty method suffers from several drawbacks and does not enable to solve largescale problems. An alternative method discussed in Section 3 is proposed to obtain a better scaling with the quantum annealer. Using the Hubbard-Stratonovich transformation, the QUBO Hamiltonian H QUBO (x) is transformed into an effective Hamiltonian H QUBO (x, ]) with an additional multiplier ]. Then the remaining problem is to find the saddle point of the effective Hamiltonian, which is the original optimal solution. This method is similar to the Lagrangian dual method in classic optimization, and can reduce the quadratic terms into linear ones in the penalized QUBO energy.
To demonstrate the effectiveness of this method, we solve a bus charging scheduling problem with N bus 3, N pile 2 and L time 48. The QUBO formulation of this problem with quadratic penalty constraints could not be directly embedded onto the 16 × 16  Chimera graph of the D-Wave 2000q sampler. For reference, to embed such a problem onto a 24 × 24 Chimera graph would cost 2,499 physical qubits and the maximum chain length is 31. On the other hand, the dual Hamiltonian as defined in Eq. 25 only requires 320 physical qubits to embed with a maximum chain length of 4. This example illustrates thus clearly that the limitation for connectivity is well mitigated after the Hubbard-Stratonovich transformation.
We investigate then whether the iterative approach can lead to the ground state of the original Hamiltonian, i.e., the optimal schedule with minimum cost and satisfying all constraints. As shown in Figures 3A,B, the optimal solution is obtained after 96 iterations of updating the multipliers ] based on the sampling of the dual Hamiltonian with the D-Wave 2000q annealer. In Figure 3A, we visualize the state-of-charge SOC of the optimal solutions found while sampling the dual Hamiltonian at different iterations. It can be seen that while updating the multipliers ] according to Algorithm 1, the state of charge gradually falls into the interval [0. 3,1], where the constraints are satisfied. Figure 3B displays the distribution of different samples returned by both the D-Wave 2000q annealer and simulated annealing. We note that the multipliers ] are updated according to the lowest energy samples returned by D-Wave's machine. We observe that the sampled lowest energy of the dual Hamiltonian does not decrease monotonically. This might be caused by multiple reasons. Firstly, the sampling, quantum or classical, is a heuristic process so that the actual ground states of the dual Hamiltonian are not necessarily found at each iteration, as shown in Figure 3B. Secondly, the energy landscape of the dual Hamiltonian might be non-convex, as also discussed by Ohzeki when this method is proposed [32]. Thirdly, the learning rate might be too large.
From an optimization perspective, the aforementioned three points imply that in practice, the task of solving for the multipliers ] is carried out in a complex landscape with noisy evaluations of the gradient. In this context, finding the optimal multipliers, hence the saddle point could be very challenging, especially with a large number of multipliers. To resolve this problem, instead of using a fixed learning rate η 0.1, we have employed an adaptive gradient method to update the multipliers ], with ADAM solver for stochastic optimization [50]. To further confirm the effectiveness of the adaptive gradient methods, we have solved the problem multiple times with the two different strategies, ADAM or fixed learning rate. It is observed in Figure 3C that the adaptive gradient approach for updating ] has a significant advantage over fixing the learning rate, requiring about 10 times less iterations to converge. This emphasizes thus that the practical constraint caused by the transformation, namely performing a classical gradient descent, requires careful tuning in order to reduce the additional computational overhead, which ADAM achieves. Yet, we note that it has been proven that ADAM as described in [50] could possibly diverge, even in the convex case [51]. Hence, the convergence of ADAM-type methods has been analyzed theoretically in several recent contributions.  [53]. The interested reader is also referred to [54][55][56][57] for more in-depth analyses of adaptive gradient algorithms. Consequently, although ADAM has shown to be an appropriate choice for solving the bus charging scheduling problem, further work could explore the impact of the adaptive gradient method on the convergence and computational cost of this iterative approach.

CONCLUSION
In this work, we have solved a realistic EV-bus charging scheduling problem with D-Wave's quantum annealer. Firstly, we have reformulated the original integer optimization problem with inequality constraints into a QUBO problem by the penalty method. Furthermore, theoretical analysis on the lower bound of the penalty and experimentation with minor embedding emphasize the fact that the key challenge for quantum annealing to solve the bus charging scheduling problem is two-fold: on one hand, the number of additional qubits needed for minor embedding and on the other hand, the penalty bounds to preserve the structure of the original problem. We analyze these problems with numerical experiments and find out that the former problem is caused by the quadratic penalty which makes it especially difficult to increase the number of time steps when modeling the problem. The latter challenge is linked to the case-dependent problem structure and the dynamic range of the physical device, which have a joint effect on the efficiency of quantum annealing.
To mitigate the above problems, we have employed an iterative approach based on the Hubbard-Stratonovich transformation proposed by Ohzeki [32]. This method reduces the quadratic penalty terms in the original QUBO Hamiltonian into linear terms in its dual Hamiltonian with the Hubbard-Stratonovich transformation. Then by iteratively solving the dual Hamiltonian to update the multipliers ] introduced by the transformation, the ground state of the original QUBO Hamiltonian, hence the optimal solution of the original problem, can be found. We demonstrated that it is possible to solve a larger-scale problem which can not be directly embedded onto the D-Wave 2000q′s Chimera graph with this approach. Besides, we observed that due to the noisy gradient evaluation caused by the heuristic sampling process and the complex energy landscape, it is significantly more efficient to update the multipliers in an adaptive manner, for instance with ADAM solver, instead of using a fixed learning rate.
As believed by many, quantum annealing can provide speedup for certain optimization problems with quantum tunneling effect. However in practice, the physical implementation for a quantum annealer contains only two-body and mostly neighbouring interactions between qubits (spins). This means that the ideal problems for a physical quantum annealer to solve in the near future are limited to problems which can be formulated as QUBO problems on a sparsely connected graph. While a large portion of the optimization problems can be eventually reformulated as QUBOs by encoding continuous variables, introducing penalties and ancillary qubits, it will generally induce an additional burden in terms of connectivity and dynamic range of the physical device. The Hubbard-Stratonovich transformation can be a possible way of alleviating this cost and solving larger-scale problems with an iterative procedure. Under the Hubbard-Stratonovich transformation, the complexity of solving the original problem is transformed into both solving the dual problem and the optimization of the multipliers, where the latter can be achieved using a classical computer. In this sense, this approach can be considered as a hybrid classical-quantum algorithm where the advantage of the quantum computing is better utilized. We believe that similar to the EV-bus charging scheduling problem considered in this work, many more realistic use-cases can be better solved by quantum annealing with the combination of Hubbard-Stratonovich transformation and adaptive gradient descent.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because Bus electrical consumption is a proprietary data shared by a third party entity. Requests to access the datasets should be directed to TN, tahar.nabil@edf.fr.