Quantum Image Denoising: A Framework via Boltzmann Machines, QUBO, and Quantum Annealing

We investigate a framework for binary image denoising via restricted Boltzmann machines (RBMs) that introduces a denoising objective in quadratic unconstrained binary optimization (QUBO) form and is well-suited for quantum annealing. The denoising objective is attained by balancing the distribution learned by a trained RBM with a penalty term for deviations from the noisy image. We derive the statistically optimal choice of the penalty parameter assuming the target distribution has been well-approximated, and further suggest an empirically supported modification to make the method robust to that idealized assumption. We also show under additional assumptions that the denoised images attained by our method are, in expectation, strictly closer to the noise-free images than the noisy images are. While we frame the model as an image denoising model, it can be applied to any binary data. As the QUBO formulation is well-suited for implementation on quantum annealers, we test the model on a D-Wave Advantage machine, and also test on data too large for current quantum annealers by approximating QUBO solutions through classical heuristics.


arXiv:2307.06542v3 [quant-ph] 18 Aug 2023

Introduction
Quantum annealing (QA) [15,7,2] is a promising technology for obtaining good solutions to difficult optimization problems, which uses quantum interactions with the aim of solving Ising or quadratic unconstrained binary optimization (QUBO) instances. Since Ising and QUBO instances are NP-hard, and many other combinatorial optimization problems can be reformulated as Ising or QUBO instances (see e.g. [9]), QA has the potential to become an extremely useful tool for optimization. As the capacities of commercially available quantum annealers continue to improve rapidly, it is of great interest to build models that are well-suited for this emerging technology. Furthermore, QA has promising machine learning applications surrounding Boltzmann Machines (BMs), as both QA and BMs are closely connected to the Boltzmann distribution. Boltzmann Machines are a type of generative artificial neural network that aim to learn the distribution of some training data set by fitting a Boltzmann distribution to the data, as described thoroughly in [10, §20]. On the other hand, QA aims to produce approximate minimum energy (maximum likelihood) solutions to a Boltzmann distribution by finding the ground state of the associated Hamiltonian that determines the distribution. Hence, maximum likelihood type problems on BMs are a natural candidate for applying QA in a machine learning framework. In this paper, we contribute to the goal of furthering useful applications of QA in machine learning by building an image denoising model particularly well-suited for implementation via QA.
The task of image denoising is a fundamental problem in image processing and machine learning. In any means of collecting images, there is always a chance of some pixels being afflicted by noise that we wish to remove; see e.g. [4] for a good overview. Accordingly, many classical and data-driven approaches to the image denoising problem have been studied in the literature [5,25,11,23,6]. This paper studies a quantum binary image denoising model using Restricted Boltzmann Machines (RBMs henceforth) [10, §20.2] that can take advantage of QA by formulating the denoising problem as a QUBO instance. Specifically, given a trained RBM, we introduce a penalty-based denoising scheme that admits a simple QUBO form, for which we derive the statistically optimal penalty parameter as well as a practically motivated robustness modification. The denoising step only needs to solve a QUBO admitting a bipartite graph representation, and so is well-suited for QA. As QA has also shown promise for training BMs [1,8], our full model lends itself well to denoising images using quantum annealers, and could thus play a role in their future applications, since QA can then be leveraged for both the training and denoising steps. The model also shows promise in the absence of QA, and the insights presented are not limited to the QA framework, as the QUBO formulation of the denoising problem and the statistical properties we prove may be of independent interest.
The paper is organized as follows. Section 2 gives a summary of background on quantum annealing and Boltzmann Machines. Section 3 describes our main contribution of the image denoising model for QAs, and Section 4 shows some practical results obtained.
Remark 1.1. We frame our work as a binary image denoising method, although the framework does not depend on the data being images, and can be applied to the denoising of any binary data. This is because the framework does not use any spatial relationships between the pixels, and instead treats the image as a flattened vector whose distribution is to be learned. Hence, the denoising scheme can be applied as-is to any other binary data setting.

Contributions and Organization
We provide a QUBO-based denoising method for binary images (applicable to general binary data) using restricted Boltzmann machines in Section 3. This is done by formulating the denoising objective in equation (3.1), which combines the energy function of the distribution learned by the RBM with a (parameterized) penalty term for deviations from a given noisy image. This objective turns out to have an equivalent QUBO formulation, which is shown in Claim 1. In Theorem 3.4, we derive the optimal choice for the penalty parameter under the assumption that the true images follow the distribution learned by the RBM, which also recovers the maximum a posteriori estimate per Corollary 3.5, though our model is more flexible, and this flexibility allows for useful practical modifications. Theorem 3.6 shows that the denoising method yields a result that is strictly closer (in expectation) to the true image than the noisy image is, under some additional assumptions. Given that these idealized assumptions will not be met in reality, we propose a robustness modification in Section 3.3 that improves performance empirically. In Section 4, as the method lends itself well to quantum annealing, we implement the method on a D-Wave Advantage 5000-qubit quantum annealer, demonstrating strong empirical performance. Since only small datasets can be tested on the D-Wave machine due to the relatively low number of qubits, we also test the method on a larger dataset, for which we use simulated annealing on a conventional computer in place of quantum annealing to find good solutions to the QUBO denoising objective. Though we highlight the method being well-suited for quantum annealers, we emphasize that it may be of independent interest to the machine learning and image processing communities at large.

Related Work
The closely related work of [17] uses a model similar to ours for the image reconstruction task, also solving QUBO formulations via quantum annealing. In the reconstruction task, some subset of pixels is unknown (or obscured or missing) and needs to be restored, whereas our work considers denoising, where it is unknown which pixels are noise-afflicted. The work of [11] derives a maximum a posteriori (MAP) estimator for the noise-free image as a denoising method in a particular model of binary images that is less general than ours, though we would recover their estimator under a particular choice of our penalty parameter if we were to apply our framework to their model (since we recover MAP in a more general setting). Further, RBMs and quantum annealing have been studied for the classification problem, for instance in [18] and [1]. Other research in the machine learning communities has also studied handling label noise, such as related work in [26], which studies the problem of training models in the presence of noisy labels, whereas our approach is entirely unsupervised (the data need not have any labels to begin with).

Background
Quantum annealers make use of quantum interactions with the primary goal of finding the ground state of a Hamiltonian by initializing and then evolving a system of coupled qubits over time [14]. In particular, we may view QA as implementing the Ising spin-glass model [22] evolving over time. As the QUBO model is equivalent to the Ising model [9], and QUBO instances can be efficiently transformed to Ising instances, QA is well suited to provide good solutions to QUBO problems. A QUBO cost function, or energy function, takes the form
$$f_Q(x) := \sum_{i,j} Q_{ij} x_i x_j, \tag{2.1}$$
where $x_i \in \{0, 1\}$ and $Q$ is a symmetric, real-valued matrix. We will occasionally refer to $Q_{ij}$ as the weight between $x_i$ and $x_j$. QUBO is well known to be NP-hard [3], and many combinatorial problems can be reformulated as QUBO instances. See [9,20] for a thorough presentation of QUBO formulations of various problems. A Boltzmann distribution using the above QUBO as its energy function takes the form
$$P^{model}_Q(x) = \frac{1}{z} e^{-f_Q(x)}, \tag{2.2}$$
where $z$ is a normalizing constant. Note that a parameter called inverse temperature has been fixed to unity and is not explicitly shown in the above expression. In this paper, we will focus on making use of Boltzmann Machines, a type of generative neural network that fits a Boltzmann distribution to the training data by making use of latent variables. Specifically, we consider Restricted Boltzmann Machines (RBMs), which have seen significant success and frequent use in deep probabilistic models [10]. RBMs consist of an input layer of visible nodes and a layer of latent, or hidden, nodes, each of which has zero intra-group weights. Let $v \in \{0, 1\}^v$ and $h \in \{0, 1\}^h$ denote the visible and hidden nodes, respectively. It will be convenient for us to write $x = (v, h) \in \{0, 1\}^{v+h}$ as their concatenation. The probability distribution represented by an RBM is then
$$P^{model}_Q(v, h) = \frac{1}{z} e^{-f_Q(x)}, \qquad x = (v, h), \tag{2.3}$$
with the restriction that $Q_{ij} = Q_{ji} = 0$ for $i \neq j$ if $i, j \in \{1, \dots, v\}$ or $i, j \in \{v + 1, \dots, v + h\}$. Hence, we have the simplified energy function
$$f_{W,b_v,b_h}(v, h) = b_v^T v + b_h^T h + v^T W h,$$
where $W$ is the $v \times h$ matrix consisting of the weights between the visible and hidden nodes (the two symmetric entries $Q_{i,v+j}$ and $Q_{v+j,i}$ combined), and $b_v$ and $b_h$ are vectors of the diagonal entries $Q_{ii}$, $i \in \{1, \dots, v\}$, corresponding to visible nodes, and $Q_{ii}$, $i \in \{v + 1, \dots, v + h\}$, corresponding to hidden nodes, respectively. We will write the Boltzmann distribution with this energy function as $P_{W,b_v,b_h}$, noting that this is also $P^{model}_Q$ for the appropriate $Q$.
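The correspondence between the RBM parameters and the QUBO matrix can be illustrated with a short Python sketch. The function names `rbm_qubo_matrix` and `qubo_energy` are hypothetical, and the sketch assumes that each visible-hidden weight is split evenly over the two symmetric entries of $Q$, so that $x^T Q x = b_v^T v + b_h^T h + v^T W h$; other conventions differ by a factor of two in the off-diagonal block.

```python
import numpy as np

def rbm_qubo_matrix(W, b_v, b_h):
    """Assemble a symmetric (v+h) x (v+h) QUBO matrix Q from RBM parameters.

    Assumed convention (not taken from the paper): each visible-hidden
    weight W[i, j] is split evenly over Q[i, v+j] and Q[v+j, i], so that
    x^T Q x = b_v.v + b_h.h + v^T W h for x = (v, h).
    """
    W = np.asarray(W, dtype=float)
    n_v, n_h = W.shape
    Q = np.zeros((n_v + n_h, n_v + n_h))
    Q[:n_v, n_v:] = W / 2.0        # visible-hidden block
    Q[n_v:, :n_v] = W.T / 2.0      # symmetric counterpart
    vis = np.arange(n_v)
    hid = n_v + np.arange(n_h)
    Q[vis, vis] = b_v              # visible biases on the diagonal
    Q[hid, hid] = b_h              # hidden biases on the diagonal
    return Q

def qubo_energy(Q, x):
    """Evaluate f_Q(x) = sum_{i,j} Q_ij x_i x_j for a binary vector x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)
```

The zero blocks on the visible-visible and hidden-hidden off-diagonals are what make the associated graph bipartite.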
It is well known that RBMs can universally approximate discrete distributions [10], making them a powerful model. They are also more easily trained than general Boltzmann Machines, usually through the contrastive divergence algorithm as described in [12], or variants thereof.
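Now we can calculate the gradient of the log-likelihood with respect to the model parameters $\theta = (W, b_v, b_h)$ as
$$\frac{\partial}{\partial \theta} \log P_\theta(v) = -\mathbb{E}_{h \sim P_\theta(h|v)}\left[\frac{\partial f_\theta(v, h)}{\partial \theta}\right] + \mathbb{E}_{(v', h) \sim P_\theta(v', h)}\left[\frac{\partial f_\theta(v', h)}{\partial \theta}\right],$$
where $f_\theta(v, h) = b_v^T v + b_h^T h + v^T W h$ denotes the RBM energy function and $P_\theta \propto e^{-f_\theta}$. The first term can be computed exactly and efficiently from the data, since the conditional $P_\theta(h|v)$ factorizes over the hidden units and admits the simple form $P(h_j = 1|v) = \mathrm{logistic}(-(b_h)_j - (v^T W)_j)$ under this sign convention; we refer the interested reader to [8] or [10] and will focus on the second term. Since it is intractable to compute (one would have to sum over all possibilities of $v$ and $h$), the most promising approach is to approximate it by sampling from $P_\theta(v, h)$. Classically, this is done via Gibbs sampling as described in [12]. However, recent research has also investigated using quantum annealers to sample from the relevant Boltzmann distribution, as suggested in [8], which would make QAs useful in the training process, since obtaining good Gibbs samples can be expensive. We note that together with our framework, QAs show promise to become useful for both the RBM training and the denoising process in the implementation of our method.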

Image Denoising as Quadratic Unconstrained Binary Optimization
This section is devoted to showing how one can naturally frame the image denoising problem as a QUBO instance over a learned Boltzmann Distribution fit to the data.

Denoising via QUBO
Let us assume we are given a trained Restricted Boltzmann Machine as described in Section 2. The model assigns to each vector $x \in \{0, 1\}^{v+h}$ the cost $f_Q(x)$ and corresponding likelihood $P^{model}_Q(x)$ defined in Eqs. (2.1) and (2.3), respectively. We will here make the assumption that $P^{model}_Q$ describes the distribution of our data. Hence, high-likelihood vectors in $P^{model}_Q$ correspond to low-cost vectors of $f_Q$. In particular, note that finding the maximum likelihood argument in (2.2) corresponds to finding a solution to the QUBO instance in (2.1). Now, supposing this model, our goal is to reconstruct an image that has been affected by noise. The visible portion of our vector will be considered to be a flattened image with $v$ pixels, black or white corresponding to 0 or 1, respectively, in the binary entries of the vector.

Noise Model
We now describe the noise assumptions under which we will conduct our analysis.
Definition 3.1. For $x \in \{0, 1\}^v$, we define $x$ afflicted by salt-and-pepper noise of level $\sigma$ as the random variable $\tilde{X}_{x,\sigma} := (x + \epsilon) \bmod 2$, where $\epsilon_i \sim \mathrm{Bern}(\sigma)$ independently.
In other words, a binary image afflicted by salt-and-pepper noise has each pixel independently flipped with probability $\sigma$. In particular, we are interested in $\tilde{X}_{X,\sigma}$, where $X \sim P^{model}_Q$, which is the compound random variable obtained by sampling $X$ from the learned distribution of the data and then afflicting it with salt-and-pepper noise. For notational simplicity, we will simply write $\tilde{X}$ when the intended subscripts are clear from context. Note that salt-and-pepper noise is a natural noise model for binary data, since the only way in which pixels (or data entries, for general binary data) can be changed is by flipping the 0-1 value.
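Definition 3.1 translates directly into code. The following is a minimal sketch using NumPy; the function name `salt_and_pepper` and the optional `rng` argument are illustrative choices rather than part of the original work.

```python
import numpy as np

def salt_and_pepper(x, sigma, rng=None):
    """Flip each entry of the binary array x independently with probability sigma."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.uint8)
    # eps_i ~ Bern(sigma), drawn independently for every pixel
    eps = (rng.random(x.shape) < sigma).astype(np.uint8)
    return (x + eps) % 2
```

For example, `salt_and_pepper(image.ravel(), 0.1)` would flip roughly 10% of the pixels of a flattened binary `image`.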
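Suppose we are given a realization $\tilde{x} \in \{0, 1\}^v$ of $\tilde{X}_{X,\sigma}$. The reconstruction process aims to retrieve the original $X$ using $\tilde{x}$ and the trained model through $Q$. The approach we will take begins from the intuition that $X$ is likely to be a high-likelihood image that is close to $\tilde{x}$. To enforce this "closeness" to $\tilde{x}$ while searching for higher-likelihood images in our model to remove noise, we add to the cost in (2.1) a penalty for deviations from $\tilde{x}$ to formulate the following natural denoising cost function:
$$f_{Q,\tilde{x},\rho}(x) := f_Q(x) + \rho \sum_{i=1}^{v} (x_i - \tilde{x}_i)^2 \tag{3.1}$$
for some $\rho > 0$ that determines the penalty level. The intuition is that the minimizer of this function for a well-chosen $\rho$ will change a restricted number of pixels to find an image that is similar to the noisy image, but has a lower cost, i.e. higher likelihood, under the model, in hopes of removing the noise. We show next that minimizing (3.1) corresponds to solving a QUBO instance.
Claim 1. The minimizers of (3.1) are exactly the minimizers of a QUBO instance:
$$\arg\min_{x} \Big( f_Q(x) + \rho \sum_{i=1}^{v} (x_i - \tilde{x}_i)^2 \Big) = \arg\min_{x} \Big( \sum_{i,j} Q_{ij} x_i x_j + \rho \sum_{i=1}^{v} (1 - 2\tilde{x}_i) x_i \Big). \tag{3.2}$$
Proof. We have
$$f_Q(x) + \rho \sum_{i=1}^{v} (x_i - \tilde{x}_i)^2 = \sum_{i,j} Q_{ij} x_i x_j + \rho \sum_{i=1}^{v} \big( x_i^2 - 2\tilde{x}_i x_i + \tilde{x}_i^2 \big) = \sum_{i,j} Q_{ij} x_i x_j + \rho \sum_{i=1}^{v} (1 - 2\tilde{x}_i) x_i + \rho \sum_{i=1}^{v} \tilde{x}_i^2,$$
noting that $x_i = x_i^2$ in the above derivation since the $x_i$ are in $\{0, 1\}$ here. Since the $\tilde{x}_i^2$ terms do not depend on $x$, the claim follows.
Hence, solving the QUBO on the right-hand side of equation (3.2) gives us the solution to (3.1). Claim 1 thus tells us that we simply need to modify the diagonal of the original matrix $Q$ of our model by adding $\rho \cdot \mathrm{diag}(1 - 2\tilde{x}_1, \dots, 1 - 2\tilde{x}_v)$ to the entries of the visible units and then solve the resulting QUBO to get the denoised image. We can then make use of quantum annealing to solve the resulting QUBO of (3.2), or use classical methods and heuristics like simulated annealing instead. We formally spell out the denoising procedure in algorithm QUBO Denoise.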

QUBO Denoise
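Input: A matrix $Q$, a noisy image $\tilde{x}$ sampled from the distribution of $\tilde{X}_{X,\sigma}$ with $X \sim P^{model}_Q$, and a penalty parameter $\rho > 0$.
Output: A denoised image $X^*_{\rho,\tilde{x},Q}$.
1. Set $\tilde{Q} := Q + \rho \cdot \mathrm{diag}(1 - 2\tilde{x}_1, \dots, 1 - 2\tilde{x}_v, 0, \dots, 0)$.
2. Find $x^* \in \arg\min_x x^T \tilde{Q} x$, by quantum annealing or a classical heuristic.
3. Return the first $v$ entries of $x^*$ as $X^*_{\rho,\tilde{x},Q}$.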
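As an illustration of the procedure, the sketch below builds the modified QUBO matrix and solves it by exhaustive enumeration, which is only feasible for very small instances and stands in for the annealer. The names `denoising_qubo`, `brute_force_minimizer`, and `qubo_denoise` are hypothetical; in practice the `solver` argument would be replaced by a call to a quantum annealer or a heuristic.

```python
import itertools
import numpy as np

def denoising_qubo(Q, x_noisy, rho):
    """Return a copy of Q with rho * (1 - 2 * x_noisy_i) added to the visible diagonal."""
    Q_tilde = np.array(Q, dtype=float)            # copy, leaves Q untouched
    x_noisy = np.asarray(x_noisy, dtype=float)    # avoid unsigned-integer wraparound
    idx = np.arange(x_noisy.size)                 # visible units come first
    Q_tilde[idx, idx] += rho * (1.0 - 2.0 * x_noisy)
    return Q_tilde

def brute_force_minimizer(Q):
    """Exhaustively minimize x^T Q x over binary x (tiny problems only)."""
    best_x, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x

def qubo_denoise(Q, x_noisy, rho, solver=brute_force_minimizer):
    """Solve the denoising QUBO and return the visible part of the minimizer."""
    x_star = solver(denoising_qubo(Q, x_noisy, rho))
    return x_star[: len(x_noisy)]
```

Passing the solver as an argument keeps the QUBO construction independent of how the minimization is carried out.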
For the remainder of the paper, $X^*_{\rho,\tilde{x},Q}$ will denote the denoised image obtained by applying QUBO Denoise with noisy image $\tilde{x}$, penalty parameter $\rho$, and the distribution-defining matrix $Q$.
Remark 3.2. Considering the entire process of sampling a noisy image and then denoising it, the measurability of $X^*_{\rho,\tilde{X}_{X,\sigma},Q}$ is inherited from the measurability of $\tilde{X}_{X,\sigma}$, which in turn inherits its measurability as a compound random variable of the measurable noise and original image $X \sim P^{model}_Q$.

Optimal Choice of penalty parameter ρ
The choice of the parameter $\rho$ for the proposed image denoising model is clearly crucial to its success, since different choices will result in different solutions. If $\rho$ is chosen to be too small, there is very little cost to flipping a pixel, and then many pixels may be flipped and the solution may no longer resemble the noisy image at all. If $\rho$ is too large, we may be too heavily penalizing flipping pixels, and thus may not be able to get rid of noise effectively. Hence, we now turn towards finding the optimal choice for $\rho$. We will evaluate the choice of $\rho$ via expected overlap:
Definition 3.3. The expected overlap between two distributions $P$ and $P'$ is defined by
$$d(P, P') := \mathbb{E}\Big[ \frac{1}{v} \sum_{i=1}^{v} \mathbb{1}\{X_i = X'_i\} \Big],$$
where $X \sim P$, $X' \sim P'$.
We will consider $X \sim P^{model}_Q$, and $X'$ as $X^*_{\rho,\tilde{X}_{X,\sigma},Q}$, the corresponding denoised image, and will also call $d(P, P')$ the expected overlap between $X$ and $X'$. To keep notation simple, for the remainder of this section allow us to write $\tilde{X}$ in place of $\tilde{X}_{X,\sigma}$, with $X$ and $\sigma$ being clear from context.
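Our main positive result concerning the choice of $\rho$ is summarized in the following theorem:
Theorem 3.4. Let $X \sim P^{model}_Q$, let $0 < \sigma < \frac{1}{2}$, and let $\tilde{X}$ be the noisy image. Then choosing $\rho = \log \frac{1-\sigma}{\sigma}$ to obtain $X^*_{\rho,\tilde{X},Q}$ is optimal with respect to maximizing the expected overlap between $X$ and $X^*_{\rho,\tilde{X},Q}$.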
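Proof. Let $X \sim P^{model}_Q$, and let $\tilde{X}$ be $X$ afflicted by salt-and-pepper noise of level $\sigma$. Then since $\tilde{X}_{X,\sigma}$ is obtained by flipping pixels with probability $\sigma$, we have the conditional probability
$$P(\tilde{X} = \tilde{x} \mid X = x) = \prod_{i=1}^{v} \sigma^{(x_i - \tilde{x}_i)^2} (1 - \sigma)^{1 - (x_i - \tilde{x}_i)^2} = \frac{\exp\big( -\beta_\sigma \sum_{i} (x_i - \tilde{x}_i)^2 \big)}{(1 + e^{-\beta_\sigma})^{v}},$$
where $\beta_\sigma := \log \frac{1-\sigma}{\sigma}$. In order to infer the original image $X$ from the noisy one $\tilde{X}$, we utilize the Bayes formula and calculate the conditional probability
$$P^{post}_{\beta_\sigma,Q}(X = x \mid \tilde{X} = \tilde{x}) = \frac{\exp\big( -\beta_\sigma \sum_{i} (x_i - \tilde{x}_i)^2 - \sum_{i,j} Q_{ij} x_i x_j \big)}{\sum_{x'} \exp\big( -\beta_\sigma \sum_{i} (x'_i - \tilde{x}_i)^2 - \sum_{i,j} Q_{ij} x'_i x'_j \big)}. \tag{3.4}$$
Note that $x$ includes pixels for hidden nodes, which is fine here. Our approach finds the state which is most likely under this distribution, which is realized by annealing for the above QUBO with the $\beta_\sigma$ term.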
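The overlap of two vectors $x^*$ and $x$ is given by the proportion of shared entries. We consider the average (over the noise) of solutions, $\bar{X}_{\rho,\tilde{x},Q}$, with
$$(\bar{X}_{\rho,\tilde{x},Q})_i = \theta\Big( \sum_{x} x_i \, P^{post}_{\rho,Q}(x \mid \tilde{x}) - \frac{1}{2} \Big), \tag{3.6}$$
where $\theta(x) = 1$ if $x > 0$, otherwise 0, noting that the right-hand side represents the inferred pixel value based on the expectation under the posterior. We have formally distinguished $P^{model}_Q(x)$ from $P^{post}_{\rho,Q}(x \mid \tilde{x})$, but both are Boltzmann distributions of the same form: the posterior is the model distribution with the modified matrix of Claim 1. Note that
$$2 (\bar{X}_{\rho,\tilde{x},Q})_i - 1 = \mathrm{sign}\Big( \sum_{x} (2 x_i - 1) \, P^{post}_{\rho,Q}(x \mid \tilde{x}) \Big),$$
where $\mathrm{sign}(x)$ is the sign of $x$. Let $\alpha_{\sigma,Q} := -\beta_\sigma \sum_{i} (x_i - \tilde{x}_i)^2 - \sum_{i,j} Q_{ij} x_i x_j$ for conciseness. In order to evaluate the statistical performance of our method with coefficient $\rho$ of the penalty term, we calculate the average of the overlap at pixel $i$ (in $\pm 1$ form, which is an increasing affine function of the expected overlap) as
$$M_i(\rho) := \sum_{x} \sum_{\tilde{x}} P^{model}_Q(x) \, P(\tilde{x} \mid x) \, (2 x_i - 1) \big( 2 (\bar{X}_{\rho,\tilde{x},Q})_i - 1 \big) = \frac{1}{z (1 + e^{-\beta_\sigma})^{v}} \sum_{\tilde{x}} \sum_{x} e^{\alpha_{\sigma,Q}} (2 x_i - 1) \, \mathrm{sign}\Big( \sum_{x'} (2 x'_i - 1) \, P^{post}_{\rho,Q}(x' \mid \tilde{x}) \Big).$$
The sum over $x$ on the right-hand side of the above equation satisfies
$$\sum_{x} e^{\alpha_{\sigma,Q}} (2 x_i - 1) \, \mathrm{sign}\Big( \sum_{x'} (2 x'_i - 1) \, P^{post}_{\rho,Q}(x' \mid \tilde{x}) \Big) \le \Big| \sum_{x} e^{\alpha_{\sigma,Q}} (2 x_i - 1) \Big| = \sum_{x} e^{\alpha_{\sigma,Q}} (2 x_i - 1) \, \mathrm{sign}\Big( \sum_{x'} (2 x'_i - 1) \, P^{post}_{\beta_\sigma,Q}(x' \mid \tilde{x}) \Big), \tag{3.9}$$
where the equality holds because $P^{post}_{\beta_\sigma,Q}(x \mid \tilde{x})$ is proportional to $e^{\alpha_{\sigma,Q}}$. Hence, the averaged overlap satisfies
$$M_i(\rho) \le M_i(\beta_\sigma).$$
This inequality means that the averaged overlap is maximized when $\rho = \beta_\sigma = \log \frac{1-\sigma}{\sigma}$.
This theorem is based on a known fact in statistical physics of information processing [21] and translates the fact into the setting of our problem. Notably, the optimal choice of $\rho$ does not depend on the distribution of the data, but only on the noise level, for which in many real-world cases one may have good estimates. The proof of the theorem also reveals the following corollary:
Corollary 3.5. Under the same assumptions as Theorem 3.4, setting $\rho := \log \frac{1-\sigma}{\sigma}$ makes $X^*_{\rho,\tilde{X},Q}$ the maximum a posteriori estimator for the original noise-free image $X$.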
The corollary follows from observing that the energy function in the numerator of the posterior distribution (3.4) is exactly (3.1) with $\rho := \log \frac{1-\sigma}{\sigma}$, noting that minimizing (3.1) is equivalent to maximizing (3.4). However, this framework allows for additional flexibility in choosing the $\rho$ parameter that is absent in standard MAP estimation. In fact, in Sections 3.3 and 4.1 we go on to demonstrate that in practice, choosing a larger $\rho$ may be beneficial for robustness of the method.
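The relationship stated in the corollary can be checked numerically on a small instance. The sketch below, with hypothetical function names, enumerates all binary vectors and verifies that the unnormalized log-posterior and the negated cost (3.1) with $\rho = \log \frac{1-\sigma}{\sigma}$ differ only by a constant, so that their optimizers coincide. It assumes the model convention $P^{model}_Q(x) \propto e^{-x^T Q x}$ and noise acting on the visible entries only.

```python
import itertools
import numpy as np

def log_noise_likelihood(x_vis, x_noisy, sigma):
    """log P(x_noisy | x) for independent pixel flips with probability sigma."""
    flips = np.sum(x_vis != x_noisy)
    return flips * np.log(sigma) + (x_noisy.size - flips) * np.log(1.0 - sigma)

def penalized_cost(Q, x, x_noisy, rho):
    """Denoising cost: x^T Q x plus rho times the number of changed visible pixels."""
    n_v = x_noisy.size
    return x @ Q @ x + rho * np.sum((x[:n_v] - x_noisy) ** 2)

def posterior_gap(Q, x_noisy, sigma):
    """Spread of (log-posterior + cost) over all x; zero means they differ by a constant."""
    x_noisy = np.asarray(x_noisy)
    rho = np.log((1.0 - sigma) / sigma)
    n_v = x_noisy.size
    values = []
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        x = np.array(bits, dtype=float)
        # log-posterior up to the x-independent normalizing constants
        log_post = log_noise_likelihood(x[:n_v], x_noisy, sigma) - x @ Q @ x
        values.append(log_post + penalized_cost(Q, x, x_noisy, rho))
    return max(values) - min(values)
```

A gap of (numerically) zero confirms that maximizing the posterior and minimizing the penalized cost select the same vector.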
Though Theorem 3.4 derives the optimal choice of $\rho$, it does not give any guarantees that the method will yield an improvement in expected overlap, even under its assumptions. Next, we prove a theorem to show that in the case of visible units being independent of one another, our image denoising method produces in expectation strict denoising improvements with respect to the expected overlap. For $c > 0$ and a model distribution $P^{model}_Q$ as in (2.2), let $I_c$ be the set of indices $i$ such that $|Q_{ii}| > c$. For diagonal $Q$, these indices correspond to components of $X$ that are either 0 or 1 with probability at least $\frac{1}{1 + e^{-c}}$, depending on whether $Q_{ii}$ is positive or negative, respectively.
Theorem 3.6. Suppose that $Q$ is diagonal, $X \sim P^{model}_Q$, and that $\tilde{X}$ is $X$ afflicted by salt-and-pepper noise of level $\sigma$. With $I_c$ as defined above for $c > 0$, setting $\rho \ge \log \frac{1-\sigma}{\sigma}$, and assuming that $I_\rho \neq \emptyset$, the expected overlap of the denoised image and the true image is strictly larger than the expected overlap of the noisy image and the true image, i.e.
$$d\big( X^*_{\rho,\tilde{X},Q}, X \big) > d\big( \tilde{X}, X \big).$$
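Proof. Let $I^0_\rho := \{ i : Q_{ii} > \rho \}$ and $I^1_\rho := \{ i : Q_{ii} < -\rho \}$, so that $I_\rho = I^0_\rho \cup I^1_\rho$. Intuitively, these are the indices which are likely to be zero or one, respectively. Further, letting $x^{\dagger i}$ denote the vector obtained by flipping entry $i$ of $x$, we have that $|f_Q(x) - f_Q(x^{\dagger i})| = |Q_{ii}| > \rho$ for $i \in I_\rho$. Hence, this reveals that $x^*$ solves (3.1) by setting $x^*_i = 1 \ \forall i \in I^1_\rho$, $x^*_i = 0 \ \forall i \in I^0_\rho$, and $x^*_i = \tilde{x}_i$ otherwise, since for $i \in I_\rho$ the value of $f_Q$ of (2.1) is reduced by more than $\rho$ whenever a flip is needed, so that the overall penalized objective (3.1) improves despite the $\rho$ penalty accrued by the pixel flips. Now, let us determine when $X^*_i = X_i$. The cases where this happens are: $i \in I^0_\rho$ and $X_i = 0$, $i \in I^1_\rho$ and $X_i = 1$, or $i \notin I_\rho$ and pixel $i$ was not flipped by the noise.
We know
$$P(X_i = 0) = \frac{1}{1 + e^{-Q_{ii}}} > \frac{1}{1 + e^{-\rho}} \ \text{ for } i \in I^0_\rho, \qquad P(X_i = 1) = \frac{1}{1 + e^{Q_{ii}}} > \frac{1}{1 + e^{-\rho}} \ \text{ for } i \in I^1_\rho, \qquad P(X^*_i = X_i) = 1 - \sigma \ \text{ for } i \notin I_\rho,$$
where $\sigma$ is the probability that the pixel was flipped by the noise. On the other hand, $P(\tilde{X}_i = X_i) = 1 - \sigma \ \forall i$. We characterize the claimed inequality as
$$\sum_{i=1}^{v} P(X^*_i = X_i) > \sum_{i=1}^{v} P(\tilde{X}_i = X_i) = v (1 - \sigma). \tag{3.12}$$
For the left-hand side, assuming $I_\rho \neq \emptyset$, we have
$$\sum_{i=1}^{v} P(X^*_i = X_i) > \frac{|I_\rho|}{1 + e^{-\rho}} + (v - |I_\rho|)(1 - \sigma),$$
so that (3.12) holds when $\frac{1}{1 + e^{-\rho}} \ge 1 - \sigma$, i.e. when $\rho \ge \log \frac{1-\sigma}{\sigma}$, and the theorem is proven.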
The assumption that the matrix $Q$ is diagonal is equivalent to the components of $X$ being independent, which is not realistic with real data. However, since in the RBM model the visible units are independent conditioned on the hidden units, we still consider this independent case to be informative for the denoising method. In fact, if the hidden states were fixed (or known, or recovered correctly), Theorem 3.6 would apply. We leave it as a tantalizing open question to generalize this result beyond the independent case. The assumption of nonemptiness of $I_\rho$ is a natural one for the denoising task; indeed, when $I_\rho$ is empty, no entries of $Q$ are large in magnitude, which is equivalent to the entries of $X$ being close to uniformly distributed. In that case, it should intuitively not be possible to guarantee that we can denoise an image well if it looks like noise to begin with.
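In the diagonal setting of Theorem 3.6, both expected overlaps can be written in closed form, which gives a quick way to see the improvement on an example. The sketch below uses a hypothetical function `diagonal_case_overlaps` and assumes $P(X_i = 1) \propto e^{-Q_{ii}}$, together with the pixel-wise minimizer described in the proof.

```python
import numpy as np

def diagonal_case_overlaps(q_diag, sigma, rho):
    """Exact expected overlaps (denoised vs. true, noisy vs. true) for diagonal Q."""
    q = np.asarray(q_diag, dtype=float)
    p_zero = 1.0 / (1.0 + np.exp(-q))   # P(X_i = 0) when P(X_i = 1) is proportional to exp(-q_i)
    # Pixel-wise minimizer: output 0 where q_i > rho, 1 where q_i < -rho,
    # and keep the noisy pixel otherwise (correct with probability 1 - sigma).
    p_match = np.where(q > rho, p_zero,
                       np.where(q < -rho, 1.0 - p_zero, 1.0 - sigma))
    return float(p_match.mean()), 1.0 - sigma
```

For instance, with diagonal entries `[3.0, -2.5, 0.1, -0.2]`, noise level 0.2, and $\rho = \log 4$, the first two pixels fall in $I_\rho$ and the denoised overlap exceeds the noisy overlap of 0.8.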

Robust Choice of ρ
The optimal choice of $\rho$ as derived in Theorem 3.4 relies on the assumption that the observed data comes from the learned distribution, or equivalently that the distribution generating our data has been perfectly learned by the RBM. However, in practice we will always only approximately learn the data distribution. Hence, we do not want to rely too heavily on the exact distribution we have learned when we denoise the images. One may hope to have a more robust method by only changing the value of a pixel when there is some confidence in the model that the pixel should be flipped. We may thus want to penalize flipping pixels slightly more than we should under the idealized setting of Theorem 3.4, which corresponds to choosing a larger $\rho$ value than $\log \frac{1-\sigma}{\sigma}$, or equivalently using a smaller value $\sigma' < \sigma$ when setting $\rho := \log \frac{1-\sigma'}{\sigma'}$. We opt for the latter as a means of intentionally biasing $\rho$ to make the approach more robust in application. Figures 2 and 3 in Section 4 show the effect of this proposed robustness modification, demonstrating that choosing a larger $\rho$ by intentionally using a smaller $\sigma$ indeed yields positive results. If the true noise level is $\sigma$, our experiments demonstrate that setting roughly $\rho := \log \frac{1 - 0.75\sigma}{0.75\sigma}$ has a positive effect on performance.
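The penalty choice, with or without the robustness modification, amounts to a one-line computation. A minimal sketch follows, where the function name `penalty_parameter` and the `bias` argument are illustrative.

```python
import math

def penalty_parameter(sigma, bias=1.0):
    """Return rho = log((1 - b*sigma) / (b*sigma)) for noise level sigma and bias factor b."""
    s = bias * sigma
    if not 0.0 < s < 0.5:
        raise ValueError("bias * sigma must lie in (0, 0.5) for a positive penalty")
    return math.log((1.0 - s) / s)
```

With $\sigma = 0.2$, `penalty_parameter(0.2)` gives $\log 4$, while `penalty_parameter(0.2, bias=0.75)` gives the larger value $\log(0.85 / 0.15)$, so pixel flips are penalized more heavily.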

Empirical Results
This section contains results from implementing the previously described method and comparing it against other denoising approaches. Datasets and code are available on the first author's GitHub page for the purpose of easy reproducibility.

Results with Quantum Annealing
In this subsection, we present empirical results obtained by implementing our model on a quantum annealer, D-Wave's Advantage_system4.1, which has 5000 qubits and enables embedding of a complete bipartite graph of size 172 × 172. Hence, we use 12 × 12 pixel images here so that the visible layer is of size 144. We test the method on two different datasets with very differently structured data. The first dataset is a 12 × 12 version of the well-known MNIST dataset [19], created by downsizing the original dataset with nearest-neighbor image downscaling and binarizing pixels. The second dataset we use is a 12 × 12 pixel Bars-and-Stripes (BAS) dataset, as has been used in closely related work [17,8], where an 8 × 8 version was used to accommodate the smaller 2000-qubit machine, D-Wave 2000Q, used there. Each image consists of binary pixels with either each row or each column sharing the same values, so that each image consists of either "bars" or "stripes".
Figure 1: Examples of the denoising process using our method showing the true, noisy, and denoised images across different noise levels.
For both datasets we train the RBM by using the classical Contrastive Divergence algorithm first presented in [12]. The number of hidden units was set to 50 and 64 for BAS and MNIST, respectively. For the BAS data, 4000 images were generated as training data, and 1000 as test data, while for MNIST, we simply used the full MNIST-provided training set of 60,000 images and test set of 10,000 images. Noisy images were generated by adding salt-and-pepper noise of level $\sigma$ to images from the test dataset. Given a noisy image, we are then able to embed the resulting denoising QUBO of (3.2) onto a D-Wave quantum annealer, Advantage_system4.1, and solve it there. A function of D-Wave's Ocean software, find_embedding, is utilized to find appropriate mappings from variables in a QUBO to physical qubits on D-Wave's Pegasus graph. A variable in a QUBO is often mapped to multiple physical qubits, called a chain, that are strongly connected to each other so as to behave like a single variable. A single mapping can be used for every noisy image of a dataset, since their QUBOs have the same graph structure.
We have prepared in advance 50 different mappings for each dataset and choose a mapping from the pool at random to embed the QUBO of each image. This random selection is done to avoid possible artificial effects on the denoising performance from using only a particular mapping. Parameters for embedding and annealing, i.e., chain strength and annealing time, are tuned to maximize the performance. In particular, we set the chain strength as the product of a coefficient $c_0$ and the maximum absolute value among the elements of each QUBO matrix, where we tune $c_0$. The adopted parameter values differ between MNIST and BAS but are the same across the whole range of $\sigma$. We set ($c_0$, annealing time) = (0.6, 50 µs) and (0.5, 40 µs) for BAS and MNIST, respectively. The number of annealing reads, num_reads, is 100 for each noisy image. We calculate the average of the solution of each pixel over the reads to approximate Eq. (3.6) and use it to evaluate the overlap, that is, the proportion of pixels in the denoised image that match the original image. We denoise 200 noisy images for each $\sigma$, which are randomly selected from the pool of test images for each $\sigma$. Note also that for each value of $\sigma$, the different methods compared use the same set of (randomly selected) noisy test images. Figures 2 and 3 report the denoising performance when $\rho := \log \frac{1 - b\sigma}{b\sigma}$ is set with a bias factor $b \in \{1.25, 1, 0.75, 0.5\}$, as discussed in Section 3.3. Based on the empirical performance, using a bias factor of around 0.75 seems to give improved performance compared to using a bias factor of 1 in both datasets. A bias factor of 0.5 seems to perform quite well across most noise regimes as well, with confidence regions largely overlapping those of the 0.75 setting, though in the low-noise setting for the BAS dataset we observe an adverse effect. The authors thus suggest a setting of 0.75 for the bias factor.
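The post-processing described above, namely averaging each pixel over the annealing reads, thresholding at one half as in Eq. (3.6), and measuring the overlap, can be sketched as follows. The function names are hypothetical, the sample array is assumed to hold one read per row with the visible units first, and no D-Wave API calls are involved.

```python
import numpy as np

def chain_strength(Q, c0):
    """Chain strength as c0 times the largest absolute entry of the QUBO matrix."""
    return c0 * float(np.max(np.abs(Q)))

def average_reads(samples, n_visible):
    """Average each visible pixel over the reads and threshold at 1/2."""
    samples = np.asarray(samples, dtype=float)
    pixel_means = samples[:, :n_visible].mean(axis=0)
    return (pixel_means > 0.5).astype(int)

def overlap(x, y):
    """Proportion of entries on which two binary vectors agree."""
    return float(np.mean(np.asarray(x) == np.asarray(y)))
```

For three reads `[[1, 0, 1, 1], [1, 1, 0, 0], [1, 0, 0, 1]]` with three visible units, the pixel means are 1, 1/3, and 1/3, so the averaged solution is `[1, 0, 0]`.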
Next, in Figures 4 and 5, we compare our method to other popular denoising methods for binary images on the 12 × 12 MNIST and bars-and-stripes datasets, respectively, across different noise levels. When comparing to other methods, a crucial factor is that we choose $\rho$ based on $\sigma$, but in practice $\sigma$ may be unknown. In light of this, we include two versions of our method in these comparisons. First, we use our method with $\rho := \log \frac{1-\sigma}{\sigma}$, using the true value of $\sigma$ without introducing the recommended bias factor. Second, we simulate the situation in which the true $\sigma$ is unknown, and instead we only have a guess for $\sigma$. To simulate having an approximate guess for $\sigma$, for each image afflicted by noise of level $\sigma$, we sample $\sigma'$ uniformly from an interval of size $\sigma/2$ centered at $\sigma$. We then set $\rho := \log \frac{1 - 0.75\sigma'}{0.75\sigma'}$, using a bias factor of 0.75 with this "guessed" value of $\sigma$. This is a significantly more realistic way of testing our method, since it gives an idea of how well the method may perform when the true noise level present in the noisy images is unknown and must be guessed. Our implementation here only assumes that the practitioner roughly knows the magnitude of the noise. For example, if the true noise is $\sigma = 0.2$, here we sample $\sigma'$ uniformly from [0.15, 0.25] to simulate the guess. We compare our method to Gibbs denoising with an RBM [25, Section 3.2], median filtering [13], Gaussian filtering [24, Chapter 5], and a graph-cut method [11] for denoising. For the Gibbs denoising, we use the same well-trained RBM as for our QUBO-based method, and the parameters of the method were carefully tuned for best performance: we use 20 Gibbs iterations and then construct the denoised image as the exponentially weighted average of the samples with decay factor 0.8. For the graph-cut method, the parameter setting $\beta = 0.5$ recommended in the reference is used. Overall, the QUBO-based method performs quite strongly. Across all noise regimes in the MNIST data, and in most noise regimes in the bars-and-stripes dataset, the method outperforms the others. In particular, for the MNIST data the 95% confidence region for the QUBO method entirely dominates the others. Indeed, we see the good performance that our analysis from Section 3 suggests, even when the true $\sigma$ is unknown and instead guessed. Using a guessed $\sigma$ and the robustness modification of Section 3.3 makes the method perform as well as (if not slightly better than) knowing the true $\sigma$ without the robustness modification. Only in the noise regime of $\sigma \ge 0.2$ in the BAS data does Gibbs denoising outperform our method. In Figure 1, we also provide examples of applying our denoising method to noisy images across different noise levels.

Testing on Larger Images
Though we see the straightforward implementability of our method on quantum annealers as a strong positive, a current drawback of using QAs is the limited data size that can be handled given their still small qubit capacities. Of course, we can still test our method on larger datasets by obtaining solutions to the denoising objective (3.1) by other means. In Figure 6, we implement our method on a binarized version of the popular MNIST dataset [19] by using simulated annealing [16] to find solutions to (3.1). We specifically choose to test on the full-size MNIST dataset since we could only use a downscaled version on the QA due to size limitations on the input data, so this experiment serves to test our method without this downscaling. All methods are implemented as described in Section 4.1, and again for our method we use a guessed $\sigma$ to simulate the unknown-$\sigma$ case and bias the guess for robustness.
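To make the classical alternative concrete, here is a minimal single-bit-flip simulated annealing sketch for a QUBO. It is not the implementation used for the experiments; the function name, the geometric temperature schedule, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulated_annealing_qubo(Q, x0, n_sweeps=200, t_start=2.0, t_end=0.05, rng=None):
    """Heuristically minimize x^T Q x over binary x, starting from x0.

    Returns the best state visited and its energy.
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = np.asarray(Q, dtype=float)
    S = Q + Q.T                        # combined off-diagonal couplings
    diag = np.diag(Q).copy()
    x = np.array(x0, dtype=float)
    energy = float(x @ Q @ x)
    best_x, best_e = x.copy(), energy
    for t in np.geomspace(t_start, t_end, n_sweeps):   # geometric cooling
        for i in rng.permutation(x.size):
            d = 1.0 - 2.0 * x[i]                       # +1 if bit i is 0, -1 if it is 1
            # Energy change from flipping bit i (uses x_i^2 = x_i for binary x).
            delta = d * (S[i] @ x - 2.0 * diag[i] * x[i] + diag[i])
            if delta <= 0 or rng.random() < np.exp(-delta / t):
                x[i] += d
                energy += delta
                if energy < best_e:
                    best_x, best_e = x.copy(), energy
    return best_x.astype(int), best_e
```

Starting the search from the noisy image (with hidden units initialized arbitrarily) is one natural choice for `x0`, since the denoised image is expected to lie close to it.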
Figure 6: Proportion of pixels in denoised images that were correctly denoised, for different denoising methods on the MNIST dataset, with 95% confidence intervals shaded.

Conclusion and Future Work
We investigated an image denoising framework via a penalty-based QUBO denoising objective that shows promise both theoretically through its statistical properties and practically through its empirical performance together with the proposed robustness modification. The method is well-suited for implementation on a quantum annealer, providing an important application of QAs within machine learning through the fundamental image denoising task. Good results are still obtained on larger datasets when the QUBO is only classically approximated by simulated annealing instead, revealing the approach to be promising even in the absence of QAs. As RBMs form a core building block of many deep generative models such as deep Boltzmann machines or deep belief networks [10], a natural next step is to attempt to incorporate this approach into these more complex models, though current hardware limitations on existing quantum annealers are restrictive. Further, since our method takes advantage of QAs for the denoising step, further research into making use of QAs for the training process of RBMs would yield a full image denoising model where both the model training and image denoising make use of QA.

Funding
PK was supported in part by g-RIPS Sendai, Cyberscience Center at Tohoku Univ., and NEC Japan, in early stages of the work. PK is grateful to the USRA Feynman Academy internship program, support from the NASA Academic Mission Services (contract NNA16BD14C), and funding from DARPA under DARPA-NASA agreement SAA2-403688.
In Figures 2 and 3, we first investigate the robust choice of $\rho$ as discussed in Section 3.3. This is done by using a biased value of $\sigma$ when setting $\rho = \log \frac{1-\sigma}{\sigma}$, instead setting $\rho := \log \frac{1 - b\sigma}{b\sigma}$ for some bias factor $b$. The denoising performance for $b \in \{1.25, 1, 0.75, 0.5\}$ is shown, with 95% confidence intervals obtained by bootstrapping. Note that using a bias factor $b = 1$ means using the true value of $\sigma$ for determining $\rho$.

Figure 2: Proportion of pixels in denoised MNIST images that matched the original image, for different denoising methods with 95% CI error bars.

Figure 3: Proportion of pixels in denoised BAS images that matched the original image, for different denoising methods with 95% CI error bars.

Figure 4: Proportion of pixels in denoised MNIST images that matched the original image, for different denoising methods with 95% CI error bars.

Figure 5: Proportion of pixels in denoised BAS images that matched the original image, for different denoising methods with 95% CI error bars.