<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">580607</article-id>
<article-id pub-id-type="doi">10.3389/frai.2020.580607</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Graph Neural Networks for Maximum Constraint Satisfaction</article-title>
<alt-title alt-title-type="left-running-head">T&#xf6;nshoff et al.</alt-title>
<alt-title alt-title-type="right-running-head">RUN-CSP</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>T&#xf6;nshoff</surname>
<given-names>Jan</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1022428/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ritzert</surname>
<given-names>Martin</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1101212/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wolf</surname>
<given-names>Hinrikus</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1033799/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Grohe</surname>
<given-names>Martin</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1024332/overview"/>
</contrib>
</contrib-group>
<aff>Chair of Computer Science 7 (Logic and Theory of Discrete Systems), Department of Computer Science, RWTH Aachen University, <addr-line>Aachen</addr-line>, <country>Germany</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/399203/overview">Sriraam Natarajan</ext-link>, The University of Texas at Dallas, United States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1009060/overview">Mayukh Das</ext-link>, Samsung (India), India</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1049172/overview">Ugur Kursuncu</ext-link>, University of South Carolina, United States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Jan T&#xf6;nshoff, <email>toenshoff@informatik.rwth-aachen.de</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Machine Learning and Artificial Intelligence, a section of the journal Frontiers in Artificial Intelligence</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>02</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>580607</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>07</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>10</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 T&#xf6;nshoff, Ritzert, Wolf and Grohe.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>T&#xf6;nshoff, Ritzert, Wolf and Grohe</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Many combinatorial optimization problems can be phrased in the language of constraint satisfaction problems. We introduce a graph neural network architecture for solving such optimization problems. The architecture is generic; it works for all binary constraint satisfaction problems. Training is unsupervised, and it is sufficient to train on relatively small instances; the resulting networks perform well on much larger instances (at least 10 times larger). We experimentally evaluate our approach for a variety of problems, including Maximum Cut and Maximum Independent Set. Despite being generic, we show that our approach matches or surpasses most greedy and semi-definite programming-based algorithms and sometimes even outperforms state-of-the-art heuristics for the specific problems.</p>
</abstract>
<kwd-group>
<kwd>graph neural networks</kwd>
<kwd>combinatorial optimization</kwd>
<kwd>unsupervised learning</kwd>
<kwd>constraint satisfaction problem</kwd>
<kwd>graph problems</kwd>
<kwd>constraint maximization</kwd>
</kwd-group>
<contract-num rid="cn001">GR 1492/16-1</contract-num>
<contract-sponsor id="cn001">Deutsche Forschungsgemeinschaft<named-content content-type="fundref-id">10.13039/501100001659</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Constraint satisfaction is a general framework for casting combinatorial search and optimization problems; many well-known NP-complete problems, for example, <italic>k</italic>-colorability, Boolean satisfiability and maximum cut can be modeled as constraint satisfaction problems (CSPs). Our focus is on the optimization version of constraint satisfaction, usually referred to as maximum constraint satisfaction (M<sc>ax</sc>-CSP), where the objective is to satisfy as many constraints of a given instance as possible. There is a long tradition of designing exact and heuristic algorithms for all kinds of CSPs. Our work should be seen in the context of a recently renewed interest in heuristics for NP-hard combinatorial problems based on neural networks, mostly GNNs (for example, <xref ref-type="bibr" rid="B27">Khalil et al., 2017</xref>; <xref ref-type="bibr" rid="B37">Selsam et al., 2019</xref>; <xref ref-type="bibr" rid="B30">Lemos et al., 2019</xref>; <xref ref-type="bibr" rid="B34">Prates et al., 2019</xref>).</p>
<p>We present a generic graph neural network (GNN) based architecture called RUN-CSP (Recurrent Unsupervised Neural Network for Constraint Satisfaction Problems) with the following key features:</p>
<p>
<bold>Unsupervised:</bold> Training is unsupervised and just requires a set of instances of the problem.</p>
<p>
<bold>Scalable:</bold> Networks trained on small instances achieve good results on much larger inputs.</p>
<p>
<bold>Generic:</bold> The architecture is generic and can learn to find approximate solutions for any binary M<sc>ax</sc>-CSP.</p>
<p>We remark that in principle, every CSP can be transformed into an equivalent binary CSP (see <xref ref-type="sec" rid="s2">Section 2</xref> for a discussion).</p>
<p>To solve M<sc>ax</sc>-CSPs, we train a GNN, which we view as a message passing protocol. The protocol is executed on a graph with nodes for all variables and edges for all constraints of the instance. After running the protocol for a fixed number of rounds, we extract probabilities for the possible values of each variable from its current state. All parameters determining the messages, the update of the internal states, and the readout function are learned. Since these parameters are shared over all variables, we can apply the model to instances of arbitrary size<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>. Our loss function rewards solutions with many satisfied constraints. Thus, our networks learn to satisfy the maximum number of constraints, which naturally places the focus on the optimization version M<sc>ax</sc>-CSP of the constraint satisfaction problem.</p>
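<p>The message-passing scheme described above can be sketched in a few lines. This is an illustrative simplification, not the authors' exact architecture: fixed random weights stand in for learned parameters, and a plain tanh update replaces the recurrent cell. It does show the two properties the text emphasizes: parameters are shared across all variables, so the same model runs on instances of arbitrary size, and a softmax readout extracts value probabilities from each variable's state.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

class RunCSPSketch:
    """Illustrative recurrent message passing for a binary CSP (not the exact RUN-CSP model)."""

    def __init__(self, hidden=16, domain=2):
        # Fixed random weights stand in for learned parameters; because they are
        # shared over all variables, the model applies to any instance size.
        self.W_msg = rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)
        self.W_upd = rng.normal(size=(hidden, 2 * hidden)) / np.sqrt(2 * hidden)
        self.W_out = rng.normal(size=(domain, hidden)) / np.sqrt(hidden)
        self.hidden = hidden

    def run(self, n_vars, edges, iters=20):
        h = rng.normal(size=(n_vars, self.hidden))  # internal state per variable
        for _ in range(iters):
            msgs = np.zeros_like(h)
            for u, v in edges:  # one message per constraint, in both directions
                msgs[u] += h[v] @ self.W_msg.T
                msgs[v] += h[u] @ self.W_msg.T
            # recurrent-style update from previous state and aggregated messages
            h = np.tanh(np.concatenate([h, msgs], axis=1) @ self.W_upd.T)
        logits = h @ self.W_out.T  # shared readout
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        return probs.argmax(axis=1), probs
```
<p>In the real architecture, all three weight matrices and the state update are trained end-to-end against the loss described below; here they merely fix the shapes of the computation.</p>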
<p>This focus on the optimization problem allows us to train without supervision, which is a major point of distinction between our work and recent neural approaches to Boolean satisfiability (<xref ref-type="bibr" rid="B37">Selsam et al., 2019</xref>) and the coloring problem (<xref ref-type="bibr" rid="B30">Lemos et al., 2019</xref>). Both approaches require supervised training and output a prediction for satisfiability or coloring number. Furthermore, our approach does not merely predict whether the input instance is satisfiable; it returns an (approximately optimal) variable assignment. This assignment is produced directly by a neural network, which distinguishes our end-to-end approach from methods that combine neural networks with conventional heuristics, such as <xref ref-type="bibr" rid="B27">Khalil et al., (2017)</xref> and <xref ref-type="bibr" rid="B33">Li et al., (2018)</xref>.</p>
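<p>A loss that rewards satisfied constraints can be made differentiable by working with soft assignments: for each binary constraint, take the probability that independently sampled variable values satisfy its relation, and penalize the negative log of that probability. The sketch below illustrates this idea; the exact RUN-CSP loss may differ in its details, and the function name is ours.</p>

```python
import numpy as np

def soft_violation_loss(probs, constraints):
    """Differentiable surrogate for the number of violated constraints.

    probs: array of shape (n_vars, d) holding a soft value distribution per
    variable. For each binary constraint (u, v, R), the probability that an
    independently sampled pair (a, b) lies in R is computed, and its negative
    log is accumulated. Illustrative sketch, not the authors' exact loss.
    """
    d = probs.shape[1]
    loss = 0.0
    for u, v, R in constraints:
        p_sat = sum(probs[u, a] * probs[v, b]
                    for a in range(d) for b in range(d) if (a, b) in R)
        loss += -np.log(p_sat + 1e-9)  # small epsilon avoids log(0)
    return loss
```
<p>Minimizing such a loss requires no labeled optima: a batch of instances suffices, which is exactly what makes the training unsupervised.</p>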
<p>We experimentally evaluate our approach on the following NP-hard problems: the maximum 2-satisfiability problem (M<sc>ax</sc>-2-SAT), which asks for an assignment maximizing the number of satisfied clauses for a given Boolean formula in 2-conjunctive normal form; the maximum cut problem (M<sc>ax</sc>-C<sc>ut</sc>), which asks for a partition of a graph into two parts such that the number of edges between the parts is maximal (see <xref ref-type="fig" rid="F1">Figure 1</xref>); the 3-colorability problem (3-COL), which asks for a 3-coloring of the vertices of a given graph such that the two endvertices of each edge have distinct colors. We also consider the maximum independent set problem (M<sc>ax</sc>-IS), which asks for an independent set of maximum cardinality in a given graph. Strictly speaking, M<sc>ax</sc>-IS is not a maximum constraint satisfaction problem, because its objective is not to maximize the number of satisfied constraints, but to satisfy all constraints while maximizing the number of variables with a certain value. We include this problem to demonstrate that our approach can easily be adapted to such related problems.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>A 2-coloring for a grid graph found by RUN-CSP in 40 iterations. Conflicting edges are shown in red.</p>
</caption>
<graphic xlink:href="frai-03-580607-g001.tif"/>
</fig>
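<p>To make one of these encodings concrete: Maximum Cut becomes a binary Max-CSP over the domain {0, 1} by turning every edge into a disequality constraint, so that a satisfied constraint corresponds to a cut edge. The helper names below are ours, chosen for illustration.</p>

```python
def max_cut_instance(edges):
    """Phrase Max-Cut as a binary Max-CSP over domain {0, 1}:
    each edge (u, v) becomes a 'not-equal' constraint, so an
    assignment satisfies the constraint iff the edge is cut."""
    R_neq = frozenset({(0, 1), (1, 0)})
    return [(u, v, R_neq) for (u, v) in edges]

def num_satisfied(assignment, constraints):
    # Max-CSP objective: count the satisfied constraints.
    return sum((assignment[u], assignment[v]) in R for (u, v, R) in constraints)

# Triangle graph: any bipartition cuts at most 2 of its 3 edges.
cons = max_cut_instance([(0, 1), (1, 2), (0, 2)])
best = max(num_satisfied({0: a, 1: b, 2: c}, cons)
           for a in (0, 1) for b in (0, 1) for c in (0, 1))
```
<p>Exhaustive search over the eight assignments confirms the optimum of 2 for the triangle; on larger graphs this is exactly the objective the network approximates.</p>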
<p>Our experiments show that our approach works well for all four problems and matches competitive baselines. Since our approach is generic for all M<sc>ax</sc>-CSPs, those baselines include other general approaches such as greedy algorithms and semi-definite programming (SDP). The latter is particularly relevant, because it is known (under certain complexity-theoretic assumptions) that SDP achieves optimal approximation ratios for all M<sc>ax</sc>-CSPs (<xref ref-type="bibr" rid="B35">Raghavendra, 2008</xref>). For M<sc>ax</sc>-2-SAT, our approach even manages to surpass a state-of-the-art heuristic. In general, our method is not competitive with the highly specialized state-of-the-art heuristics. However, we demonstrate that our approach clearly improves on the state-of-the-art for neural methods on small and medium-sized binary CSP instances, while still being completely generic. We remark that our approach does not give any guarantees, unlike some traditional solvers, which can certify that no better solution exists.</p>
<p>Almost all models are trained on quite small training sets consisting of small random instances. We evaluate those models on unstructured random instances as well as more structured benchmark instances. Instance sizes vary from small instances with 100 variables and 200 constraints to medium-sized instances with more than 1,000 variables and over 10,000 constraints. We observe that RUN-CSP generalizes well from small training instances to instances that are both smaller and much larger. The largest (benchmark) instance we evaluate on has approximately 120,000 constraints, though this instance required training on larger graphs. Computations with RUN-CSP are very fast in comparison to many heuristics and profit from modern hardware such as GPUs. For medium-sized instances with 10,000 constraints, inference takes less than 5&#xa0;s.</p>
<sec id="s1-1">
<title>1.1 Related Work</title>
<p>Traditional methods for solving CSPs include combinatorial constraint propagation algorithms, logic programming techniques and domain specific approaches, for an overview see <xref ref-type="bibr" rid="B5">Apt (2003)</xref>, <xref ref-type="bibr" rid="B17">Dechter (2003)</xref>. Our experimental baselines include a wide range of classical algorithms, mostly designed for specific problems. For M<sc>ax</sc>-2-SAT, we compare the performance to that of <italic>WalkSAT</italic> (<xref ref-type="bibr" rid="B36">Selman et al., 1993</xref>; <xref ref-type="bibr" rid="B26">Kautz, 2019</xref>), which is a popular stochastic local search heuristic for M<sc>ax</sc>-SAT. Furthermore, we use the state-of-the-art M<sc>ax</sc>-SAT solver <italic>Loandra</italic> (<xref ref-type="bibr" rid="B9">Berg et al., 2019</xref>), which combines linear search and core-guided algorithms. On the M<sc>ax</sc>-C<sc>ut</sc> problem, we compare our method to multiple implementations of a heuristic approach by <xref ref-type="bibr" rid="B21">Goemans and Williamson (1995)</xref>. This method is based on semi-definite programming (SDP) and is particularly popular since it has a proven approximation ratio of <inline-formula id="inf1">
<mml:math id="minf1">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#x2243;</mml:mo>
<mml:mn>0.878</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Other M<sc>ax</sc>-C<sc>ut</sc> baselines utilize extremal optimization (<xref ref-type="bibr" rid="B10">Boettcher and Percus, 2001</xref>) and local search (<xref ref-type="bibr" rid="B8">Benlic and Hao, 2013</xref>). For M<sc>ax</sc>-3-C<sc>ol</sc>, we measure the results against <italic>HybridEA</italic> (<xref ref-type="bibr" rid="B19">Galinier and Hao, 1999</xref>; <xref ref-type="bibr" rid="B32">Lewis et al., 2012</xref>; <xref ref-type="bibr" rid="B31">Lewis, 2015</xref>), which is an evolutionary algorithm with state-of-the-art performance. Furthermore, a simple greedy coloring heuristic (<xref ref-type="bibr" rid="B11">Br&#xe9;laz, 1979</xref>) is also used as a comparison. <italic>ReduMIS</italic> is a state-of-the-art M<sc>ax</sc>-IS solver that combines kernelization techniques and evolutionary algorithms. We use it as a M<sc>ax</sc>-IS baseline, together with a simple greedy algorithm.</p>
<p>Beyond these traditional approaches there have been several attempts to apply neural networks to NP-hard problems and more specifically CSPs. An early group of papers dates back to the 1980s and uses Hopfield Networks (<xref ref-type="bibr" rid="B25">Hopfield and Tank, 1985</xref>) to approximate TSP and other discrete problems. Hopfield and Tank use a single-layer neural network with sigmoid activations and apply gradient descent to obtain an approximate solution. The loss function operates on soft assignments and combines the length of the TSP tour with a term penalizing invalid tours; training is therefore unsupervised. This approach has been extended to <italic>k</italic>-colorability (<xref ref-type="bibr" rid="B15">Dahl, 1987</xref>; <xref ref-type="bibr" rid="B38">Takefuji and Lee, 1991</xref>; <xref ref-type="bibr" rid="B20">Gassen and Carothers, 1993</xref>; <xref ref-type="bibr" rid="B22">Harmanani et al., 2010</xref>) and other CSPs (<xref ref-type="bibr" rid="B2">Adorf and Johnston, 1990</xref>). The loss functions used in some of these approaches are similar to ours.</p>
<p>Newer approaches involve modern machine learning techniques and are usually based on GNNs. NeuroSAT (<xref ref-type="bibr" rid="B37">Selsam et al., 2019</xref>), a learned message passing network for predicting satisfiability, reignited the interest in solving NP-complete problems with neural networks. <xref ref-type="bibr" rid="B34">Prates et al., (2019)</xref> use GNNs to learn TSP, training on instances of the form <inline-formula id="inf2">
<mml:math id="minf2">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x2113;</mml:mi>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>&#x3b5;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> where <inline-formula id="inf3">
<mml:math id="minf3">
<mml:mi>&#x2113;</mml:mi>
</mml:math>
</inline-formula> is the length of an optimal tour on <italic>G</italic>. They achieved good results on graphs with up to 40 nodes. Using the same idea, <xref ref-type="bibr" rid="B30">Lemos et al., (2019)</xref> learned to predict <italic>k</italic>-colorability of graphs, scaling to larger graphs and higher chromatic numbers than those seen during training. <xref ref-type="bibr" rid="B43">Yao et al., (2019)</xref> evaluated the performance of unsupervised GNNs for the M<sc>ax</sc>-C<sc>ut</sc> problem. They adapted a GNN architecture by <xref ref-type="bibr" rid="B12">Chen et al., (2019)</xref> to M<sc>ax</sc>-C<sc>ut</sc> and trained two versions of their network, one through policy gradient descent and the other via a differentiable relaxation of the loss function; both versions achieved similar results. <xref ref-type="bibr" rid="B4">Amizadeh et al., (2019)</xref> proposed an unsupervised architecture for C<sc>ircuit</sc>-SAT, which predicts satisfying variable assignments for a given formula. <xref ref-type="bibr" rid="B27">Khalil et al., (2017)</xref> proposed an approach for combinatorial graph problems that combines reinforcement learning and greedy search. They iteratively construct solutions by greedily adding nodes according to estimated scores. The scores are computed by a neural network, which is trained through Q-Learning. They test their method on the MVC, M<sc>ax</sc>-C<sc>ut</sc>, and TSP problems, where they outperform traditional heuristics across several benchmark instances. For the &#x23;P-hard weighted model counting problem for DNF formulas, <xref ref-type="bibr" rid="B1">Abboud et al., (2019)</xref> applied a GNN-based message passing approach. Finally, <xref ref-type="bibr" rid="B33">Li et al., (2018)</xref> use a GNN to guide a tree search for M<sc>ax</sc>-IS.</p>
</sec>
</sec>
<sec id="s2">
<title>2 Constraint Satisfaction Problems</title>
<p>Formally, a <italic>CSP-instance</italic> is a triple <inline-formula id="inf4">
<mml:math id="minf4">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <italic>X</italic> is a set of variables, <italic>D</italic> is a domain, and <italic>C</italic> is a set of constraints of the form <inline-formula id="inf5">
<mml:math id="minf5">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for some <inline-formula id="inf6">
<mml:math id="minf6">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2286;</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. A <italic>constraint language</italic> is a finite set <inline-formula id="inf7">
<mml:math id="minf7">
<mml:mtext>&#x393;</mml:mtext>
</mml:math>
</inline-formula> of relations over some fixed domain <italic>D</italic>, and <italic>I</italic> is a <inline-formula id="inf8">
<mml:math id="minf8">
<mml:mo>&#x393;</mml:mo>
</mml:math>
</inline-formula>-instance if <inline-formula id="inf9">
<mml:math id="minf9">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mo>&#x393;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> for all constraints <inline-formula id="inf10">
<mml:math id="minf10">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. An <italic>assignment</italic> <inline-formula id="inf11">
<mml:math id="minf11">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> satisfies a constraint <inline-formula id="inf12">
<mml:math id="minf12">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> if <inline-formula id="inf13">
<mml:math id="minf13">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and it satisfies the instance <italic>I</italic> if it satisfies all constraints in <italic>C</italic>. <inline-formula id="inf14">
<mml:math id="minf14">
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x393;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the problem of deciding whether a given <inline-formula id="inf15">
<mml:math id="minf15">
<mml:mo>&#x393;</mml:mo>
</mml:math>
</inline-formula>-instance has a satisfying assignment and finding such an assignment if there is one. M<sc>ax</sc>
<inline-formula id="inf16">
<mml:math id="minf16">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x393;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the problem of finding an assignment that satisfies the maximum number of constraints.</p>
<p>For example, an instance of 3-COL has a variable <inline-formula id="inf17">
<mml:math id="minf17">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>v</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for each vertex <italic>v</italic> of the input graph, domain <inline-formula id="inf18">
<mml:math id="minf18">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2,3</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and a constraint <inline-formula id="inf19">
<mml:math id="minf19">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>3</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for each edge <inline-formula id="inf20">
<mml:math id="minf20">
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of the graph. Here, <inline-formula id="inf21">
<mml:math id="minf21">
<mml:mrow>
<mml:msubsup>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>3</mml:mn>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,2</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>2,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,3</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>3,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>2,3</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>3,2</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the inequality relation on <inline-formula id="inf22">
<mml:math id="minf22">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2,3</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Thus 3-COL is <inline-formula id="inf23">
<mml:math id="minf23">
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>3</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
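<p>The worked 3-COL example above translates directly into a small program. The sketch below builds the relation and the per-edge constraints exactly as defined in the text and checks whether an assignment satisfies the instance; the helper names are ours.</p>

```python
def three_col_instance(graph_edges):
    """Build the CSP({R_neq3}) instance from the text: one variable per
    vertex, domain {1, 2, 3}, and a disequality constraint per edge."""
    R_neq3 = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if a != b}
    return [(v, w, R_neq3) for (v, w) in graph_edges]

def satisfies(alpha, constraints):
    # An assignment satisfies the instance iff, for every constraint
    # (v, w, R), the tuple (alpha(v), alpha(w)) lies in R.
    return all((alpha[v], alpha[w]) in R for (v, w, R) in constraints)

# A 4-cycle is 2-colorable, hence 3-colorable.
cycle = three_col_instance([(0, 1), (1, 2), (2, 3), (3, 0)])
```
<p>Replacing <code>all</code> with a count of satisfied constraints turns the same checker into the objective of the corresponding Max-CSP.</p>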
<p>In this paper, we only consider <italic>binary CSPs</italic>, that is, CSPs whose constraint language only contains unary and binary relations. From a theoretical perspective, this is no real restriction, because it is well known that every CSP can be transformed into an &#x201c;equivalent&#x201d; binary CSP (see <xref ref-type="bibr" rid="B17">Dechter, 2003</xref>). Let us review the construction. Suppose we have a constraint language <inline-formula id="inf24">
<mml:math id="minf24">
<mml:mtext>&#x393;</mml:mtext>
</mml:math>
</inline-formula> of maximum arity <inline-formula id="inf25">
<mml:math id="minf25">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> over some domain <italic>D</italic>. We construct a binary constraint language <inline-formula id="inf26">
<mml:math id="minf26">
<mml:mrow>
<mml:mover accent="true">
<mml:mtext>&#x393;</mml:mtext>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> as follows. The domain <inline-formula id="inf27">
<mml:math id="minf27">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> of <inline-formula id="inf28">
<mml:math id="minf28">
<mml:mrow>
<mml:mover accent="true">
<mml:mtext>&#x393;</mml:mtext>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> consists of all elements of <italic>D</italic> as well as all pairs <inline-formula id="inf29">
<mml:math id="minf29">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> where <inline-formula id="inf30">
<mml:math id="minf30">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mtext>&#x393;</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf31">
<mml:math id="minf31">
<mml:mi mathvariant="bold-italic">a</mml:mi>
</mml:math>
</inline-formula> is a tuple occurring in <italic>R</italic>. For every <inline-formula id="inf32">
<mml:math id="minf32">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mtext>&#x393;</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>, we add a unary relation <inline-formula id="inf33">
<mml:math id="minf33">
<mml:mrow>
<mml:msub>
<mml:mi>Q</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> consisting of all pairs <inline-formula id="inf34">
<mml:math id="minf34">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> where <inline-formula id="inf35">
<mml:math id="minf35">
<mml:mrow>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Moreover, for <inline-formula id="inf36">
<mml:math id="minf36">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> we add a binary &#x201c;projection&#x201d; relation <inline-formula id="inf37">
<mml:math id="minf37">
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> consisting of all pairs <inline-formula id="inf38">
<mml:math id="minf38">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for <inline-formula id="inf39">
<mml:math id="minf39">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mtext>&#x393;</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>, say of arity <inline-formula id="inf40">
<mml:math id="minf40">
<mml:mrow>
<mml:mi>&#x2113;</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf41">
<mml:math id="minf41">
<mml:mrow>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Finally, for every instance <inline-formula id="inf42">
<mml:math id="minf42">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <inline-formula id="inf43">
<mml:math id="minf43">
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>&#x393;</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> we construct an instance <inline-formula id="inf44">
<mml:math id="minf44">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>C</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <inline-formula id="inf45">
<mml:math id="minf45">
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mtext>&#x393;</mml:mtext>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf46">
<mml:math id="minf46">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> consists of all variables in <italic>X</italic> and a new variable <inline-formula id="inf47">
<mml:math id="minf47">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for every constraint <inline-formula id="inf48">
<mml:math id="minf48">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf49">
<mml:math id="minf49">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>C</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> consists of a <italic>tuple constraint</italic> <inline-formula id="inf50">
<mml:math id="minf50">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>Q</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <italic>projection constraints</italic> <inline-formula id="inf51">
<mml:math id="minf51">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for all <inline-formula id="inf52">
<mml:math id="minf52">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>&#x2113;</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Here, the tuple constraints select for every constraint <inline-formula id="inf53">
<mml:math id="minf53">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>&#xaf;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> a tuple <inline-formula id="inf54">
<mml:math id="minf54">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the projection constraints ensure a consistent assignment to the original variables <inline-formula id="inf55">
<mml:math id="minf55">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>&#x2286;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Then the instances <italic>I</italic> and <inline-formula id="inf56">
<mml:math id="minf56">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> are equivalent in the sense that <italic>I</italic> is satisfiable if and only if <inline-formula id="inf57">
<mml:math id="minf57">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is satisfiable, and there is a one-to-one correspondence between their satisfying assignments.</p>
<p>However, the construction is not approximation preserving. For example, it is not the case that an assignment satisfying 90% of the constraints of <inline-formula id="inf58">
<mml:math id="minf58">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> yields an assignment satisfying 90% of the constraints of <italic>I</italic>. It is possible to fix this by adding weights to the constraints, making it more expensive to violate projection constraints. Moreover, and arguably more importantly in this context, it is not clear how well our method works on CSPs of higher arity when translated to binary CSPs using this construction. We leave a thorough experimental evaluation of CSPs with higher arities for future work.</p>
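To make the tuple/projection construction concrete, here is a minimal Python sketch of the binarization; it is our own illustration, not code from the paper, and the names (<monospace>binarize_instance</monospace>) and data layout (relations as sets of tuples, constraints as variable-tuple/relation-name pairs) are assumptions.

```python
def binarize_instance(X, D, C, Gamma):
    """Translate a CSP(Gamma) instance of arity <= k into an equivalent
    binary instance via tuple variables, tuple constraints Q_R, and
    projection constraints P_i (illustrative sketch).

    Gamma maps relation names to sets of tuples over D.
    C is a list of constraints (xs, R) with xs a tuple of variables.
    """
    # New domain: original values plus all pairs (a, R) of a tuple and its relation.
    D_hat = set(D) | {(a, R) for R, rel in Gamma.items() for a in rel}

    X_hat = list(X)
    C_hat = []
    for idx, (xs, R) in enumerate(C):
        y_c = f"y_{idx}"              # fresh tuple variable for this constraint
        X_hat.append(y_c)
        # Tuple constraint: y_c must take a value (a, R) with a in R.
        Q_R = {((a, R),) for a in Gamma[R]}
        C_hat.append(((y_c,), Q_R))
        # Projection constraints: component i of y_c's tuple must
        # agree with the value assigned to the i-th original variable.
        for i, x in enumerate(xs):
            P_i = {((a, R2), a[i]) for R2, rel in Gamma.items()
                   for a in rel if i < len(a)}
            C_hat.append(((y_c, x), P_i))
    return X_hat, D_hat, C_hat
```

One new variable and ℓ + 1 new constraints are created per original constraint of arity ℓ, matching the construction above.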
</sec>
<sec sec-type="materials|methods" id="s3">
<title>3 Method</title>
<sec id="s3-1">
<title>3.1 Architecture</title>
<p>We use a randomized recurrent GNN architecture to evaluate a given problem instance using message passing. For any binary constraint language <inline-formula id="inf59">
<mml:math id="minf59">
<mml:mo>&#x393;</mml:mo>
</mml:math>
</inline-formula> a RUN-CSP network can be trained to approximate M<sc>ax</sc>
<inline-formula id="inf60">
<mml:math id="minf60">
<mml:mrow>
<mml:mo>&#x2010;</mml:mo>
<mml:mtext>CSP</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x393;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Intuitively, our network can be viewed as a trainable communication protocol through which the variables of a given instance can negotiate a value assignment. With every variable <inline-formula id="inf61">
<mml:math id="minf61">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> we associate a short-term state <inline-formula id="inf62">
<mml:math id="minf62">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and a hidden (long-term) state <inline-formula id="inf63">
<mml:math id="minf63">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> which change throughout the message passing iterations <inline-formula id="inf64">
<mml:math id="minf64">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The short-term state vector <inline-formula id="inf65">
<mml:math id="minf65">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> for every variable <italic>x</italic> is initialized by sampling each value independently from a normal distribution with zero mean and unit variance. All hidden states <inline-formula id="inf66">
<mml:math id="minf66">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are initialized as zero vectors.</p>
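The initialization just described can be sketched in a few lines of numpy; this is an illustrative sketch under our own naming, not the authors' implementation.

```python
import numpy as np

def init_states(variables, k, rng=None):
    """Initialize RUN-CSP states: short-term states sampled i.i.d. from
    N(0, 1), hidden (long-term) states as zero vectors."""
    rng = np.random.default_rng(rng)
    s = {x: rng.standard_normal(k) for x in variables}  # s_x^(0) ~ N(0,1)^k
    h = {x: np.zeros(k) for x in variables}             # h_x^(0) = 0
    return s, h
```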
<p>Every message passing step uses the same weights, so we are free to choose the number <inline-formula id="inf67">
<mml:math id="minf67">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x2115;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of iterations for which RUN-CSP runs on a given problem instance. This number may or may not be identical to the number of iterations used for training. The state size <italic>k</italic> and the number of iterations used for training <inline-formula id="inf68">
<mml:math id="minf68">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and evaluation <inline-formula id="inf69">
<mml:math id="minf69">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are the main hyperparameters of our network.</p>
<p>Variables <italic>x</italic> and <italic>y</italic> that co-occur in a constraint <inline-formula id="inf70">
<mml:math id="minf70">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> can exchange messages. Each message depends on the states <inline-formula id="inf71">
<mml:math id="minf71">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the relation <italic>R</italic>, and the order of <italic>x</italic> and <italic>y</italic> in the constraint but not on the internal long-term states <inline-formula id="inf72">
<mml:math id="minf72">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. The dependence on <italic>R</italic> implies that we have independent message generation functions for every relation <italic>R</italic> in the constraint language <inline-formula id="inf73">
<mml:math id="minf73">
<mml:mo>&#x393;</mml:mo>
</mml:math>
</inline-formula>. The process of message passing and updating the internal states is repeated <inline-formula id="inf74">
<mml:math id="minf74">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> times. We use linear functions to compute the messages, as preliminary experiments showed that more complicated functions did not improve performance while being less stable and less efficient during training. Thus, the messaging function for every relation <italic>R</italic> is defined by a trainable weight matrix <inline-formula id="inf75">
<mml:math id="minf75">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> as<disp-formula id="e1">
<mml:math id="me1">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>The output of <inline-formula id="inf76">
<mml:math id="minf76">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> consists of two stacked <italic>k</italic>-dimensional vectors, which represent the messages to <italic>x</italic> and <italic>y</italic>, respectively. Note that the generated messages depend on the order of the variables in the constraint. This behavior is desirable for asymmetric relations. For symmetric relations we modify <inline-formula id="inf77">
<mml:math id="minf77">
<mml:mrow>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> to produce messages independently of the order of variables in <italic>c</italic>. In this case we use a smaller weight matrix <inline-formula id="inf78">
<mml:math id="minf78">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to generate both messages. Note that the two messages can still be different, but the content of each message depends only on the states of the endpoints.</p>
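A per-relation message function in the spirit of Eq. 1 might look as follows; this is a sketch with randomly initialized weights standing in for trained ones, and the exact parameter sharing used in the symmetric case is our assumption.

```python
import numpy as np

def make_message_fn(k, symmetric, rng=None):
    """Build a linear message function S_R for one relation R.

    Asymmetric relations use M_R in R^{2k x 2k}; symmetric relations
    share a smaller M_R in R^{k x 2k} applied to both argument orders,
    so each message depends only on the states of the two endpoints.
    """
    rng = np.random.default_rng(rng)
    rows = k if symmetric else 2 * k
    M_R = rng.standard_normal((rows, 2 * k)) * 0.1  # trainable in practice

    def S_R(s_x, s_y):
        if symmetric:
            # Swapping x and y swaps the two outputs, so the message each
            # endpoint receives is independent of the variable order in c.
            return (M_R @ np.concatenate([s_x, s_y]),
                    M_R @ np.concatenate([s_y, s_x]))
        out = M_R @ np.concatenate([s_x, s_y])
        return out[:k], out[k:]  # messages to x and y, respectively
    return S_R
```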
<p>The internal states <inline-formula id="inf79">
<mml:math id="minf79">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf80">
<mml:math id="minf80">
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are updated by an LSTM cell based on the mean of the received messages. For a variable <italic>x</italic> which received the messages <inline-formula id="inf81">
<mml:math id="minf81">
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>&#x2113;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, the new states are thus computed by<disp-formula id="e2">
<mml:math id="me2">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>LSTM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>&#x2113;</mml:mi>
</mml:mfrac>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>&#x2113;</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>For every variable <italic>x</italic> and iteration <inline-formula id="inf82">
<mml:math id="minf82">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the network produces a soft assignment <inline-formula id="inf83">
<mml:math id="minf83">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> from the state <inline-formula id="inf84">
<mml:math id="minf84">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. In our architecture we use <inline-formula id="inf85">
<mml:math id="minf85">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>softmax</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with <inline-formula id="inf86">
<mml:math id="minf86">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> trainable and <inline-formula id="inf87">
<mml:math id="minf87">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (domain size of the CSP). In <italic>&#x3c6;</italic>, the linear function reduces the dimensionality while the softmax function enforces stochasticity. The soft assignments <inline-formula id="inf88">
<mml:math id="minf88">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> can be interpreted as probabilities of a variable <italic>x</italic> receiving a certain value <inline-formula id="inf89">
<mml:math id="minf89">
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. If the domain <italic>D</italic> contains only two values, we compute a &#x201c;probability&#x201d; <inline-formula id="inf90">
<mml:math id="minf90">
<mml:mrow>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for each node with <inline-formula id="inf91">
<mml:math id="minf91">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. The soft assignment is then given by <inline-formula id="inf92">
<mml:math id="minf92">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. To obtain a hard variable assignment <inline-formula id="inf93">
<mml:math id="minf93">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, we assign the value with the highest estimated probability in <inline-formula id="inf94">
<mml:math id="minf94">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for each variable <inline-formula id="inf95">
<mml:math id="minf95">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. From the hard assignments <inline-formula id="inf96">
<mml:math id="minf96">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, we select the one with the most satisfied constraints as the final prediction of the network. This is not necessarily the last assignment <inline-formula id="inf97">
<mml:math id="minf97">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
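The extraction of soft assignments, hard assignments, and the final best-of-all-iterations prediction can be sketched as follows; the helper names and the representation of binary constraints as (x, y, R) triples are our own assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_assignment(W, s_x):
    """phi^(t)(x) = softmax(W s_x^(t)): a distribution over the d domain values."""
    return softmax(W @ s_x)

def hard_assignment(phi):
    """alpha^(t): assign each variable the value with the highest probability."""
    return {x: int(np.argmax(p)) for x, p in phi.items()}

def best_assignment(assignments, constraints, relations):
    """Among alpha^(1), ..., alpha^(t_max), return the one satisfying the
    most constraints -- not necessarily the last one."""
    def satisfied(alpha):
        return sum((alpha[x], alpha[y]) in relations[R]
                   for (x, y, R) in constraints)
    return max(assignments, key=satisfied)
```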
<p>
<bold>Input:</bold> Instance <inline-formula id="inf98">
<mml:math id="minf98">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf99">
<mml:math id="minf99">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x2115;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>
<bold>Output:</bold> <inline-formula id="inf100">
<mml:math id="minf100">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>0,1</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>
<bold>for</bold> <inline-formula id="inf101">
<mml:math id="minf101">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold></p>
<p>//<italic>random initialization</italic>
<disp-formula id="equ1">
<mml:math id="mequ1">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ2">
<mml:math id="mequ2">
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x211d;</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</disp-formula>
<bold>for</bold> <inline-formula id="inf102">
<mml:math id="minf102">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold></p>
<p>
<bold>for</bold> <inline-formula id="inf103">
<mml:math id="minf103">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold></p>
<p>//<italic>generate messages</italic>
<disp-formula id="equ3">
<mml:math id="mequ3">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<bold>for</bold> <inline-formula id="inf104">
<mml:math id="minf104">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold>.</p>
<p>//<italic>combine messages and update</italic>
<disp-formula id="equ4">
<mml:math id="mequ4">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mtext>deg</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:msubsup>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ5">
<mml:math id="mequ5">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>LSTM</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>h</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="equ6">
<mml:math id="mequ6">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>softmax</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<bold>Algorithm 1:</bold> Network Architecture.</p>
<p>
<bold>Algorithm 1</bold> specifies the architecture in pseudocode. <xref ref-type="fig" rid="F2">Figure 2</xref> illustrates the message passing graph for a M<sc>ax</sc>-2-SAT instance and the internal update procedure of RUN-CSP. Note that the network&#x2019;s output depends on the random initialization of the short-term states <inline-formula id="inf105">
<mml:math id="minf105">
<mml:mrow>
<mml:msubsup>
<mml:mi>s</mml:mi>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. These states form the basis for all messages sent during inference and thus determine the solution found by RUN-CSP. We can therefore boost performance by applying the network multiple times to the same input and choosing the best solution.</p>
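<p>To make the update procedure concrete, the following is a minimal NumPy sketch of one message-passing iteration. It is not the paper&#x2019;s implementation: the linear map <monospace>S[R]</monospace> and the <monospace>tanh</monospace> state update are simplified stand-ins for the trainable message function and the LSTM cell of Algorithm 1, and all names are illustrative.</p>

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_csp_iteration(s, constraints, S, W_out):
    """One simplified RUN-CSP message-passing step.

    s           : dict variable -> state vector of shape (k,)
    constraints : list of (x, y, R) tuples; S[R] is a (2k, 2k) linear
                  map standing in for the learned message function
    W_out       : (d, k) output matrix producing soft assignments
    """
    k = next(iter(s.values())).shape[0]
    msgs = {x: [] for x in s}
    # generate messages: each constraint sends one message to each endpoint
    for (x, y, R) in constraints:
        out = S[R] @ np.concatenate([s[x], s[y]])  # shape (2k,)
        msgs[x].append(out[:k])
        msgs[y].append(out[k:])
    # combine messages (mean over incident constraints) and update states;
    # the tanh update replaces the LSTM cell of the paper
    s_new, phi = {}, {}
    for x in s:
        r = np.mean(msgs[x], axis=0) if msgs[x] else np.zeros(k)
        s_new[x] = np.tanh(s[x] + r)
        phi[x] = softmax(W_out @ s_new[x])  # soft assignment over the domain
    return s_new, phi
```

<p>Iterating this function and reading off <monospace>phi</monospace> after each step mirrors the per-iteration outputs of Algorithm 1.</p>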
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>
<bold>(A)</bold> The graph corresponding to the M<sc>ax</sc>-2-SAT instance f &#x3d; (&#xac;x<sub>1</sub>
<inline-formula id="inf106">
<mml:math id="minf106">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2228;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2227;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2228;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2227;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2228;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The nodes for the variables are shown in green. The functions through which the variables iteratively exchange messages are shown in blue. <bold>(B)</bold> An illustration of the update mechanism of RUN-CSP. The trainable weights of this function are shared across all nodes, which allows RUN-CSP to process instances with arbitrary structure.</p>
</caption>
<graphic xlink:href="frai-03-580607-g002.tif"/>
</fig>
<p>We also evaluated more complex variants of this architecture with multi-layered message functions and multiple stacked recurrent cells. These modifications did not improve performance but increased the running time. Replacing the LSTM cells with GRU cells slightly decreased performance. We therefore use the simple LSTM-based architecture presented here.</p>
</sec>
<sec id="s3-2">
<title>3.2 Loss Function</title>
<p>In the following, we derive the loss function used for unsupervised training. Let <inline-formula id="inf107">
<mml:math id="minf107">
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be a CSP-instance. Assume without loss of generality that <inline-formula id="inf108">
<mml:math id="minf108">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for a positive integer <italic>d</italic>. Given <italic>I</italic>, in every iteration our network will produce a soft variable assignment <inline-formula id="inf109">
<mml:math id="minf109">
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mn>0,1</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mi>d</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf110">
<mml:math id="minf110">
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is stochastic for every <inline-formula id="inf111">
<mml:math id="minf111">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Instead of choosing the value with the maximum probability in <inline-formula id="inf112">
<mml:math id="minf112">
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, we could obtain a hard assignment <inline-formula id="inf113">
<mml:math id="minf113">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> by independently sampling a value for each <inline-formula id="inf114">
<mml:math id="minf114">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> from the distribution specified by <inline-formula id="inf115">
<mml:math id="minf115">
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. In this case, the probability that any given constraint <inline-formula id="inf116">
<mml:math id="minf116">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is satisfied by &#x3b1; can be expressed by<disp-formula id="e3">
<mml:math id="me3">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mtext>Pr</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>&#x3c6;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>&#x3c6;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf117">
<mml:math id="minf117">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the characteristic matrix of the relation <italic>R</italic> with <inline-formula id="inf118">
<mml:math id="minf118">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x21d4;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. We then aim to minimize the combined negative log-likelihood over all constraints:<disp-formula id="e4">
<mml:math id="me4">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mo>&#x2212;</mml:mo>
<mml:mtext>log</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>T</mml:mi>
</mml:msup>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>We combine the loss function <inline-formula id="inf119">
<mml:math id="minf119">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> throughout all iterations with a discount factor <inline-formula id="inf120">
<mml:math id="minf120">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to get our training objective:<disp-formula id="e5">
<mml:math id="me5">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c6;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:munderover>
<mml:msup>
<mml:mi>&#x3bb;</mml:mi>
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>This loss function allows us to train in an unsupervised fashion, since it does not depend on any ground-truth assignments. It also avoids reinforcement learning, which is computationally expensive. In general, computing optimal solutions for supervised training can easily become prohibitive; our approach avoids such computations entirely.</p>
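<p>Eqs. 3&#x2013;5 translate directly into code. The following NumPy sketch uses dictionary-based soft assignments and illustrative names; it computes the satisfaction probability of a single constraint, the per-iteration loss, and the discounted training objective.</p>

```python
import numpy as np

def constraint_prob(phi_x, phi_y, A_R):
    """Eq. 3: Pr[(alpha(x), alpha(y)) in R] = phi(x)^T A_R phi(y)."""
    return phi_x @ A_R @ phi_y

def loss_csp(phi, constraints, A):
    """Eq. 4: mean negative log-likelihood over all constraints."""
    return np.mean([-np.log(constraint_prob(phi[x], phi[y], A[R]))
                    for (x, y, R) in constraints])

def total_loss(phis, constraints, A, lam=0.95):
    """Eq. 5: discounted combination of the per-iteration losses,
    with phis[t - 1] the soft assignment after iteration t."""
    t_max = len(phis)
    return sum(lam ** (t_max - t) * loss_csp(phis[t - 1], constraints, A)
               for t in range(1, t_max + 1))
```

<p>For instance, a uniform soft assignment on a single inequality constraint satisfies it with probability 0.5, giving a per-constraint loss of log&#x2009;2.</p>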
<p>We remark that it is also possible to extend the framework to weighted M<sc>ax-</sc>CSPs where a real weight is associated with each constraint. To achieve this, we can replace the averages in the loss function and message collection steps by weighted averages. Negative constraint weights can be incorporated by swapping the relation with its complement. We demonstrate this in <xref ref-type="sec" rid="s4-2">Section 4.2</xref> where we evaluate RUN-CSP on the weighted M<sc>ax</sc>-C<sc>ut</sc> problem.</p>
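<p>The weighted extension can be sketched in the same style; this is an illustrative reading of the construction, not the paper&#x2019;s implementation. The plain average over constraints becomes a weighted average, and a negative weight is handled by swapping the relation&#x2019;s characteristic matrix for its complement and using the absolute weight.</p>

```python
import numpy as np

def weighted_loss_csp(phi, constraints, A, weights):
    """Weighted variant of the loss: a weighted average of per-constraint
    negative log-likelihoods. A constraint with negative weight w uses the
    complement relation (characteristic matrix 1 - A_R) and weight |w|."""
    num = den = 0.0
    for (x, y, R), w in zip(constraints, weights):
        A_R = A[R] if w >= 0 else 1.0 - A[R]
        num += abs(w) * -np.log(phi[x] @ A_R @ phi[y])
        den += abs(w)
    return num / den
```
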
</sec>
</sec>
<sec id="s4">
<title>4 Experiments</title>
<p>To validate our method empirically, we performed experiments on M<sc>ax</sc>-2-SAT, M<sc>ax</sc>-C<sc>ut</sc>, 3-COL, and M<sc>ax</sc>-IS. For all experiments, we used internal states of size <inline-formula id="inf121">
<mml:math id="minf121">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>128</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>; state sizes up to <inline-formula id="inf122">
<mml:math id="minf122">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1024</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> did not increase performance for the tested instances. We empirically chose to use <inline-formula id="inf123">
<mml:math id="minf123">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>30</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> iterations during training and, unless stated otherwise, <inline-formula id="inf124">
<mml:math id="minf124">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> for evaluation. Especially for larger instances it proved beneficial to use a relatively high <inline-formula id="inf125">
<mml:math id="minf125">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. In contrast, choosing <inline-formula id="inf126">
<mml:math id="minf126">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> too large during training (<inline-formula id="inf127">
<mml:math id="minf127">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>tr</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>50</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>) resulted in unstable training. During evaluation, we perform 64 parallel runs for each instance and keep the best result. Increasing this number further mainly increases the runtime and has little effect on solution quality. We trained most models on 4,000 instances split into 400 batches. Training is performed for 25 epochs using the Adam optimizer with default parameters and gradient clipping at a norm of 1.0. The decay over time in our loss function was set to <inline-formula id="inf128">
<mml:math id="minf128">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.95</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. We provide a more detailed overview of our implementation and training configuration in the <xref ref-type="sec" rid="s35">Supplementary Material</xref>.</p>
<p>We ran our experiments on machines with two Intel Xeon 8160 CPUs and one NVIDIA Tesla V100 GPU but observed very similar runtimes on consumer hardware. For an instance with 1,000 variables, evaluating 64 runs takes about 1.5&#xa0;s with 1,000 constraints, about 5&#xa0;s with 10,000 constraints, and about 8&#xa0;s with 20,000 constraints. Training a model takes less than 30&#xa0;min. Thus, the computational cost of RUN-CSP is relatively low.</p>
<sec id="s4-1">
<title>4.1 Maximum 2-Satisfiability</title>
<p>We view M<sc>ax</sc>-2-SAT as a binary CSP with domain <inline-formula id="inf129">
<mml:math id="minf129">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and a constraint language consisting of three relations <inline-formula id="inf130">
<mml:math id="minf130">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mn>00</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (for clauses with two negated literals), <inline-formula id="inf131">
<mml:math id="minf131">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (one negated literal) and <inline-formula id="inf132">
<mml:math id="minf132">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (no negated literals). For example, <inline-formula id="inf133">
<mml:math id="minf133">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mn>01</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,0</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the set of satisfying assignments for a clause (&#xac;<inline-formula id="inf134">
<mml:math id="minf134">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2228;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. For training a RUN-CSP model we used 4,000 random 2-CNF formulas with 100 variables each. The number of clauses was sampled uniformly between 100 and 600 for every formula and each clause was generated by sampling two distinct variables and then independently negating the literals with probability 0.5.</p>
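<p>The encoding and the random generation scheme above can be sketched as follows; the relation names and the convention of placing the negated literal first in mixed clauses are illustrative choices, not fixed by the paper.</p>

```python
import numpy as np

# characteristic matrices A_R with (A_R)[i, j] = 1 iff (i, j) in R
A = {
    "R11": np.array([[0, 1], [1, 1]]),  # (x or y): all but (0, 0)
    "R01": np.array([[1, 1], [0, 1]]),  # (not x or y): all but (1, 0)
    "R00": np.array([[1, 1], [1, 0]]),  # (not x or not y): all but (1, 1)
}

def random_max2sat(n_vars, n_clauses, rng):
    """Sample a random Max-2-SAT instance as (x, y, R) constraints:
    two distinct variables per clause, each literal negated w.p. 0.5."""
    constraints = []
    for _ in range(n_clauses):
        x, y = rng.choice(n_vars, size=2, replace=False)
        neg_x, neg_y = rng.random(2) < 0.5
        if neg_x and neg_y:
            R = "R00"
        elif neg_x or neg_y:
            if neg_y:            # put the negated literal first,
                x, y = y, x      # so that R01 = (not x or y) applies
            R = "R01"
        else:
            R = "R11"
        constraints.append((int(x), int(y), R))
    return constraints
```

<p>Each 2-clause excludes exactly one of the four assignments, so every characteristic matrix contains exactly three ones.</p>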
<sec id="s4-1-1">
<title>4.1.1 Random Instances</title>
<p>To evaluate RUN-CSP on M<sc>ax</sc>-2-SAT, we start with random instances and compare it against several problem-specific heuristics. All baselines can solve M<sc>ax</sc>-SAT for arbitrary arities, not only M<sc>ax</sc>-2-SAT, while RUN-CSP can solve a variety of binary M<sc>ax</sc>-CSPs. The state-of-the-art M<sc>ax</sc>-SAT solver <italic>Loandra</italic> (<xref ref-type="bibr" rid="B9">Berg et al., 2019</xref>) won the unweighted track for incomplete solvers in the Max-SAT Evaluation 2019 (<xref ref-type="bibr" rid="B7">Bacchus et al., 2019</xref>). We ran Loandra in its default configuration with a timeout of 20&#xa0;min on each formula. To put this into context, on the largest evaluation instance used here (9,600 constraints) RUN-CSP takes less than 7&#xa0;min on a single CPU core and about 5&#xa0;s using the GPU. <italic>WalkSAT</italic> (<xref ref-type="bibr" rid="B36">Selman et al., 1993</xref>; <xref ref-type="bibr" rid="B26">Kautz, 2019</xref>) is a stochastic local search algorithm for approximating M<sc>ax-</sc>S<sc>at</sc>. We allowed WalkSAT to perform 10 million flips on each formula using its &#x201c;noise&#x201d; strategy with parameters <inline-formula id="inf135">
<mml:math id="minf135">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf136">
<mml:math id="minf136">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>2000</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Its performance was boosted similarly to RUN-CSP by performing 64 runs and selecting the best result.</p>
<p>For evaluation we generated random formulas with 100, 400, 800, and 1,600 variables. The ratio between clauses and variables was varied in steps of 0.1 from 1 to 6. <xref ref-type="fig" rid="F3">Figure 3</xref> shows the average percentage of satisfied clauses in the solutions found by each method over 100 formulas for each size and density. The methods yield virtually identical results for formulas with fewer than two clauses per variable. For denser instances, RUN-CSP yields slightly worse results than both baselines on formulas with only 100 variables. However, RUN-CSP matches the results of Loandra for formulas with 400 variables and outperforms it on instances with 800 and 1,600 variables. The performance of WalkSAT degrades on these formulas and is significantly worse than that of RUN-CSP.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Percentage of satisfied clauses of random 2-CNF formulas for RUN-CSP, Loandra and WalkSAT. Each data point is the average of 100 formulas; the ratio of clauses per variable increases in steps of 0.1.</p>
</caption>
<graphic xlink:href="frai-03-580607-g003.tif"/>
</fig>
</sec>
<sec id="s4-1-2">
<title>4.1.2 Benchmark Instances</title>
<p>For more structured formulas, we use M<sc>ax</sc>-2-SAT benchmark instances from the unweighted track of the M<sc>ax-</sc>SAT Evaluation 2016 (<xref ref-type="bibr" rid="B6">Argelich, 2016</xref>) based on the Ising spin glass problem (<xref ref-type="bibr" rid="B16">De Simone et al., 1995</xref>; <xref ref-type="bibr" rid="B23">Heras et al., 2008</xref>). We used the same general setup as in the previous experiment but increased the timeout for Loandra to 60&#xa0;min. In particular, we used the same RUN-CSP model, trained entirely on random formulas. <xref ref-type="table" rid="T1">Table 1</xref> reports the number of unsatisfied constraints achieved by each method on the benchmark instances. All methods produced optimal results on the first and third instances. RUN-CSP slightly deviates from the optimum on the second instance. On the fourth instance, RUN-CSP found an optimal solution while both WalkSAT and Loandra did not. On the largest benchmark formula, RUN-CSP again produced the best result.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>M<sc>ax</sc>-2-SAT: Number of unsatisfied constraints for M<sc>ax</sc>-2-SAT benchmark instances derived from the Ising spin glass problem.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Instance</th>
<th align="center">
<inline-formula id="inf137">
<mml:math id="minf137">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf138">
<mml:math id="minf138">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">Opt</th>
<th align="center">RUN-CSP</th>
<th align="center">WalkSAT</th>
<th align="center">Loandra</th>
</tr>
</thead>
<tbody>
<tr>
<td>t3pm3</td>
<td>27</td>
<td>162</td>
<td>17</td>
<td>17</td>
<td>17</td>
<td>17</td>
</tr>
<tr>
<td>t4pm3</td>
<td>64</td>
<td>384</td>
<td>38</td>
<td>40</td>
<td>38</td>
<td>38</td>
</tr>
<tr>
<td>t5pm3</td>
<td>125</td>
<td>750</td>
<td>78</td>
<td>78</td>
<td>78</td>
<td>78</td>
</tr>
<tr>
<td>t6pm3</td>
<td>216</td>
<td>1,269</td>
<td>136</td>
<td>136</td>
<td>142</td>
<td>142</td>
</tr>
<tr>
<td>t7pm3</td>
<td>343</td>
<td>2,058</td>
<td>209</td>
<td>216</td>
<td>227</td>
<td>225</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Thus, RUN-CSP is competitive on both random and structured spin-glass-based M<sc>ax</sc>-2-SAT instances, and on the larger instances it even outperforms the conventional methods. Furthermore, training on random instances generalized well to the structured spin-glass instances.</p>
</sec>
</sec>
<sec id="s4-2">
<title>4.2 Max Cut</title>
<p>M<sc>ax</sc>-C<sc>ut</sc> is a classical Max-CSP with domain <inline-formula id="inf139">
<mml:math id="minf139">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and only one relation <inline-formula id="inf140">
<mml:math id="minf140">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,0</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> used in the constraints.</p>
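<p>Under this encoding, the generic satisfaction probability of Eq. 3 specializes to a per-edge cut probability. A small sketch with illustrative names, assuming variables are graph nodes and constraints are edges:</p>

```python
import numpy as np

A_neq = np.array([[0.0, 1.0], [1.0, 0.0]])  # characteristic matrix of R_neq

def cut_size(assignment, edges):
    """Cut size of a hard 0/1 assignment: number of crossing edges."""
    return sum(1 for (u, v) in edges if assignment[u] != assignment[v])

def expected_cut(phi, edges):
    """Expected cut size when sampling alpha ~ phi independently:
    edge (u, v) is cut with probability phi(u)^T A_neq phi(v)."""
    return sum(phi[u] @ A_neq @ phi[v] for (u, v) in edges)
```
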
<sec id="s4-2-1">
<title>4.2.1 Regular Graphs</title>
<p>In this section, we evaluate RUN-CSP&#x2019;s performance on M<sc>ax</sc>-C<sc>ut</sc>. <xref ref-type="bibr" rid="B43">Yao et al., (2019)</xref> proposed two unsupervised GNN architectures for M<sc>ax</sc>-C<sc>ut</sc>. One was trained through policy gradient descent on a non-differentiable loss function while the other used a differentiable relaxation of this loss. They evaluated their architectures on random regular graphs, where the asymptotic M<sc>ax</sc>-C<sc>ut</sc> optimum is known. As baselines for RUN-CSP, we use their results as well as those they report for Extremal Optimization (EO) (<xref ref-type="bibr" rid="B10">Boettcher and Percus, 2001</xref>) and a classical approach based on semi-definite programming (SDP) (<xref ref-type="bibr" rid="B21">Goemans and Williamson, 1995</xref>). To evaluate the sizes of graph cuts, <xref ref-type="bibr" rid="B43">Yao et al., (2019)</xref> introduced a relative performance measure called the <italic>P-value</italic>, given by <inline-formula id="inf141">
<mml:math id="minf141">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>z</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>z</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>/</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula> where <italic>z</italic> is the predicted cut size for a <italic>d</italic>-regular graph with <italic>n</italic> nodes. Based on results of <xref ref-type="bibr" rid="B18">Dembo et al., (2017)</xref>, they showed that the expected <italic>P</italic>-value of <italic>d</italic>-regular graphs approaches <inline-formula id="inf142">
<mml:math id="minf142">
<mml:mrow>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mtext>&#x2a;</mml:mtext>
</mml:msup>
<mml:mo>&#x2248;</mml:mo>
<mml:mn>0.7632</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> as <inline-formula id="inf143">
<mml:math id="minf143">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. <italic>P</italic>-values close to <inline-formula id="inf144">
<mml:math id="minf144">
<mml:mrow>
<mml:msup>
<mml:mi>P</mml:mi>
<mml:mtext>&#x2a;</mml:mtext>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> indicate a cut whose size is close to the expected optimum; larger values are better. While Yao et al. trained one instance of their GNN for each tested degree, we trained a single network model on 4,000 Erd&#x151;s&#x2013;R&#xe9;nyi graphs and applied it to all graphs. For training, each graph had a node count of <inline-formula id="inf145">
<mml:math id="minf145">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> and a uniformly sampled number of edges <inline-formula id="inf146">
<mml:math id="minf146">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>100,2000</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, the model was not trained specifically for regular graphs. <xref ref-type="table" rid="T2">Table 2</xref> reports the mean <italic>P</italic>-values across 1,000 random regular graphs with 500 nodes for different degrees. For every method other than RUN-CSP, we provide the values as reported by Yao et al. While RUN-CSP does not match the cut sizes produced by extremal optimization, it clearly outperforms both versions of the GNN as well as the classical SDP-based approach.</p>
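As a concrete illustration of this measure (the function below is our own sketch, not part of the RUN-CSP implementation), the P-value can be computed directly from a cut size:

```python
import math

def p_value(z: int, n: int, d: int) -> float:
    """P(z) = (z/n - d/4) / sqrt(d/4) for a cut of size z
    on a d-regular graph with n nodes (Yao et al., 2019)."""
    return (z / n - d / 4) / math.sqrt(d / 4)

# A cut of 684 edges on a 3-regular graph with 500 nodes:
print(round(p_value(684, 500, 3), 3))  # 0.714, close to P* ≈ 0.7632
```

Note that z counts edges, so z/n can exceed 1; for a 3-regular graph with 500 nodes there are 750 edges in total.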
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>M<sc>ax</sc>-C<sc>ut</sc>: <italic>P</italic>-values of graph cuts produced by RUN-CSP, Yao, SDP, and EO for regular graphs with 500 nodes and varying degrees. We report the mean across 1,000 random graphs for each degree.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>d</th>
<th align="center">RUN-CSP</th>
<th align="center">Yao Rel</th>
<th align="center">Yao Pol</th>
<th align="center">SDP</th>
<th align="center">EO</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>0.714</td>
<td>0.707</td>
<td>0.693</td>
<td>0.702</td>
<td>0.727</td>
</tr>
<tr>
<td>5</td>
<td>0.726</td>
<td>0.701</td>
<td>0.668</td>
<td>0.690</td>
<td>0.737</td>
</tr>
<tr>
<td>10</td>
<td>0.710</td>
<td>0.670</td>
<td>0.599</td>
<td>0.682</td>
<td>0.735</td>
</tr>
<tr>
<td>15</td>
<td>0.697</td>
<td>0.607</td>
<td>0.629</td>
<td>0.678</td>
<td>0.736</td>
</tr>
<tr>
<td>20</td>
<td>0.685</td>
<td>0.614</td>
<td>0.626</td>
<td>0.674</td>
<td>0.732</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-2-2">
<title>4.2.2 Benchmark Instances</title>
<p>We performed additional experiments on standard M<sc>ax</sc>-C<sc>ut</sc> benchmark instances. The Gset dataset (<xref ref-type="bibr" rid="B44">Ye, 2003</xref>) is a set of 71 weighted and unweighted graphs that are commonly used for testing M<sc>ax</sc>-C<sc>ut</sc> algorithms. The dataset contains three different types of random graphs: Erd&#x151;s&#x2013;R&#xe9;nyi graphs with uniform edge probability, graphs whose connectivity gradually decays from node 1 to <italic>n</italic>, and 4-regular toroidal graphs. We use two unweighted graphs of each type from this dataset. We reused the RUN-CSP model from the previous experiment but increased the number of iterations for evaluation to <inline-formula id="inf147">
<mml:math id="minf147">
<mml:mrow>
<mml:msubsup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mtext>max</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>ev</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>500</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Our first baseline by <xref ref-type="bibr" rid="B13">Choi and Ye (2000)</xref> uses an SDP solver based on dual scaling (DSDP) and a reduction based on the approach of <xref ref-type="bibr" rid="B21">Goemans and Williamson (1995)</xref>. Our second baseline, Breakout Local Search (BLS), combines local search with adaptive perturbation (<xref ref-type="bibr" rid="B8">Benlic and Hao, 2013</xref>). Its results are among the best known solutions for the Gset dataset. For DSDP and BLS we report the values as provided in the literature. <xref ref-type="table" rid="T3">Table 3</xref> reports the achieved cut sizes for RUN-CSP, DSDP, and BLS. On G14 and G15, which are random graphs with decaying node degree, the graph cuts produced by RUN-CSP are similar in size to those reported for DSDP. For the Erd&#x151;s&#x2013;R&#xe9;nyi graphs G22 and G55, RUN-CSP performs better than DSDP but worse than BLS. Lastly, on the toroidal graphs G49 and G50, all three methods achieved the best known cut size. This reaffirms the observation that our architecture works particularly well for regular graphs. Although RUN-CSP did not outperform the state-of-the-art heuristic in this experiment, it performed at least as well as the SDP-based approach DSDP.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>M<sc>ax</sc>-C<sc>ut</sc>: Achieved cut sizes on Gset instances for RUN-CSP, DSDP, and BLS.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Graph</th>
<th align="center">
<inline-formula id="inf148">
<mml:math id="minf148">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf149">
<mml:math id="minf149">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">RUN-CSP</th>
<th align="center">DSDP</th>
<th align="center">BLS</th>
</tr>
</thead>
<tbody>
<tr>
<td>G14</td>
<td>800</td>
<td>4,694</td>
<td>2,943</td>
<td>2,922</td>
<td>3,064</td>
</tr>
<tr>
<td>G15</td>
<td>800</td>
<td>4,661</td>
<td>2,928</td>
<td>2,938</td>
<td>3,050</td>
</tr>
<tr>
<td>G22</td>
<td>2,000</td>
<td>19,990</td>
<td>13,028</td>
<td>12,960</td>
<td>13,359</td>
</tr>
<tr>
<td>G49</td>
<td>3,000</td>
<td>6,000</td>
<td>6,000</td>
<td>6,000</td>
<td>6,000</td>
</tr>
<tr>
<td>G50</td>
<td>3,000</td>
<td>6,000</td>
<td>5,880</td>
<td>5,880</td>
<td>5,880</td>
</tr>
<tr>
<td>G55</td>
<td>5,000</td>
<td>12,468</td>
<td>10,116</td>
<td>9,960</td>
<td>10,294</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-2-3">
<title>4.2.3 Weighted Maximum Cut Problem</title>
<p>Additionally, we evaluate RUN-CSP on the weighted M<sc>ax</sc>-C<sc>ut</sc> problem, where every edge <inline-formula id="inf150">
<mml:math id="minf150">
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> has an associated weight <inline-formula id="inf151">
<mml:math id="minf151">
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The aim is to maximize the objective:<disp-formula id="equ7">
<mml:math id="mequ7">
<mml:mrow>
<mml:mtext>&#x398;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where the partition <inline-formula id="inf152">
<mml:math id="minf152">
<mml:mrow>
<mml:mi>S</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>V</italic> defines a cut. We can apply RUN-CSP to this problem by training a model for the constraint language <inline-formula id="inf153">
<mml:math id="minf153">
<mml:mrow>
<mml:mtext>&#x393;</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> over the domain <inline-formula id="inf154">
<mml:math id="minf154">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Here, <inline-formula id="inf155">
<mml:math id="minf155">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf156">
<mml:math id="minf156">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the equality and inequality relations, respectively. We model every positive edge as a constraint with <inline-formula id="inf157">
<mml:math id="minf157">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and every negative edge with <inline-formula id="inf158">
<mml:math id="minf158">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x3d;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. We trained a RUN-CSP network on 4,000 random Erd&#x151;s&#x2013;R&#xe9;nyi graphs with <inline-formula id="inf159">
<mml:math id="minf159">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> nodes and <inline-formula id="inf160">
<mml:math id="minf160">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>100,300</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> edges. The weights <inline-formula id="inf161">
<mml:math id="minf161">
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>e</mml:mi>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> were drawn uniformly for each edge.</p>
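The encoding described above can be sketched in a few lines of Python (a hedged illustration; names such as `edges_to_constraints` are our own, not from the RUN-CSP implementation):

```python
def edges_to_constraints(weighted_edges):
    """Map each +1 edge to an inequality constraint and each -1 edge
    to an equality constraint, as in the text above."""
    return [("NEQ" if w == 1 else "EQ", u, v) for u, v, w in weighted_edges]

def theta(weighted_edges, assignment):
    """Weighted cut objective: sum of w_e over edges whose endpoints
    receive different values under the 0/1 assignment."""
    return sum(w for u, v, w in weighted_edges if assignment[u] != assignment[v])

edges = [(0, 1, 1), (1, 2, -1), (0, 2, 1)]
phi = {0: 0, 1: 1, 2: 1}  # partition S = {0}, T = {1, 2}
print(edges_to_constraints(edges)[0])  # ('NEQ', 0, 1)
print(theta(edges, phi))               # 2: both positive edges are cut
```

The number of satisfied constraints equals &#x398; plus the (constant) number of negative edges, so maximizing constraint satisfaction over this language is equivalent to maximizing &#x398;.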
<p>We evaluate this model on 10 benchmark instances obtained from the Optsicom Project<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref>, namely the 10 smallest graphs of set 2. These instances are based on the Ising spin glass problem and are commonly used to evaluate heuristics empirically. All 10 graphs have <inline-formula id="inf162">
<mml:math id="minf162">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>125</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> nodes and <inline-formula id="inf163">
<mml:math id="minf163">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>375</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> edges. <xref ref-type="bibr" rid="B27">Khalil et al., (2017)</xref> utilized reinforcement learning to guide greedy search heuristics for combinatorial problems including weighted M<sc>ax</sc>-C<sc>ut</sc>. They evaluated their method on the same benchmark instances for weighted M<sc>ax</sc>-C<sc>ut</sc> and compared the performance to a classical greedy heuristic (<xref ref-type="bibr" rid="B28">Kleinberg and Tardos, 2006</xref>) and an SDP-based method (<xref ref-type="bibr" rid="B21">Goemans and Williamson, 1995</xref>). Furthermore, they approximated the optimal values by running CPLEX for 1&#xa0;h on every instance. We use their reported results and baselines for a comparison with RUN-CSP. Crucially, <xref ref-type="bibr" rid="B27">Khalil et al., (2017)</xref> trained their network on random variations of the benchmark instances, while RUN-CSP was trained on purely random data. <xref ref-type="table" rid="T4">Table 4</xref> provides the achieved cut sizes. On all but one benchmark instance, RUN-CSP yields the largest cuts, and on five out of 10 instances it even finds the optimal cut value. The classical approaches based on Greedy Search and SDP performed substantially worse than both neural methods.</p>
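Reading the approximation ratio as the mean per-instance ratio of the estimated optimum to the achieved cut value reproduces the reported numbers; this interpretation is our own, checked against the RUN-CSP and Greedy columns of the table:

```python
def approx_ratio(opts, cuts):
    """Mean over instances of opt / achieved; 1.0 is optimal and
    larger values are worse."""
    return sum(o / c for o, c in zip(opts, cuts)) / len(opts)

# CPLEX estimates and achieved cut sizes on the 10 Optsicom instances.
opts    = [110, 112, 106, 114, 112, 110, 112, 108, 110, 112]
run_csp = [110, 112, 106, 112, 112, 110, 110, 106, 108, 110]
greedy  = [ 80,  90,  86,  96,  94,  88,  88,  76,  88,  80]
print(round(approx_ratio(opts, run_csp), 2))  # 1.01
print(round(approx_ratio(opts, greedy), 2))   # 1.28
```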
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>M<sc>ax</sc>-C<sc>ut</sc>: Achieved cut sizes on Optsicom Benchmarks. The optimal values were estimated by <xref ref-type="bibr" rid="B27">Khalil et al., (2017)</xref> by running CPLEX for 1&#xa0;h on each instance.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Graphs</th>
<th align="center">Opt</th>
<th align="center">RUN-CSP</th>
<th align="center">Khalil et al.</th>
<th align="center">Greedy</th>
<th align="center">SDP</th>
</tr>
</thead>
<tbody>
<tr>
<td>G54100</td>
<td>110</td>
<td>110</td>
<td>108</td>
<td>80</td>
<td>54</td>
</tr>
<tr>
<td>G54200</td>
<td>112</td>
<td>112</td>
<td>108</td>
<td>90</td>
<td>58</td>
</tr>
<tr>
<td>G54300</td>
<td>106</td>
<td>106</td>
<td>104</td>
<td>86</td>
<td>60</td>
</tr>
<tr>
<td>G54400</td>
<td>114</td>
<td>112</td>
<td>108</td>
<td>96</td>
<td>56</td>
</tr>
<tr>
<td>G54500</td>
<td>112</td>
<td>112</td>
<td>112</td>
<td>94</td>
<td>56</td>
</tr>
<tr>
<td>G54600</td>
<td>110</td>
<td>110</td>
<td>110</td>
<td>88</td>
<td>66</td>
</tr>
<tr>
<td>G54700</td>
<td>112</td>
<td>110</td>
<td>108</td>
<td>88</td>
<td>60</td>
</tr>
<tr>
<td>G54800</td>
<td>108</td>
<td>106</td>
<td>108</td>
<td>76</td>
<td>54</td>
</tr>
<tr>
<td>G54900</td>
<td>110</td>
<td>108</td>
<td>108</td>
<td>88</td>
<td>68</td>
</tr>
<tr>
<td>G541000</td>
<td>112</td>
<td>110</td>
<td>108</td>
<td>80</td>
<td>54</td>
</tr>
<tr>
<td>Approx. Ratio</td>
<td>1.0</td>
<td>1.01</td>
<td>1.02</td>
<td>1.28</td>
<td>1.90</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4-3">
<title>4.3 Coloring</title>
<p>For coloring, we focus on the case of three colors, i.e., we consider CSPs over the domain <inline-formula id="inf164">
<mml:math id="minf164">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2,3</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with the inequality relation <inline-formula id="inf165">
<mml:math id="minf165">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mo>&#x2260;</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. In general, RUN-CSP aims to satisfy as many constraints as possible and therefore approximates M<sc>ax</sc>-3-C<sc>ol</sc>. Instead of evaluating on M<sc>ax</sc>-3-C<sc>ol</sc>, we evaluate on its practically more relevant decision variant 3-COL, which asks whether a given graph is 3-colorable. We turn RUN-CSP into a classifier by predicting that a given input graph is 3-colorable if and only if the network finds a conflict-free vertex coloring.</p>
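The resulting classifier can be sketched as follows (a minimal illustration; `run_csp_color` is a hypothetical stand-in for a trained RUN-CSP model returning its best found assignment):

```python
def is_conflict_free(edges, coloring):
    """A coloring over {1, 2, 3} certifies 3-colorability iff no edge
    is monochromatic."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def classify_3col(edges, run_csp_color):
    """Predict '3-colorable' iff the model's assignment is conflict-free;
    a positive answer therefore comes with a certificate."""
    return is_conflict_free(edges, run_csp_color(edges))

triangle = [(0, 1), (1, 2), (0, 2)]
print(is_conflict_free(triangle, {0: 1, 1: 2, 2: 3}))  # True
print(is_conflict_free(triangle, {0: 1, 1: 1, 2: 2}))  # False
```

Note the asymmetry: the classifier can never report &#x201c;3-colorable&#x201d; for a non-3-colorable graph, but it may fail to find a valid coloring for a 3-colorable one.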
<sec id="s4-3-1">
<title>4.3.1 Hard Instances</title>
<p>We evaluate RUN-CSP on so-called &#x201c;hard&#x201d; random instances, similar to those defined by <xref ref-type="bibr" rid="B30">Lemos et al., (2019)</xref>. These instances are a special subclass of Erd&#x151;s&#x2013;R&#xe9;nyi graphs where an additional edge can make the graph no longer 3-colorable. We describe our exact generation procedure in the <xref ref-type="sec" rid="s35">Supplementary Material</xref>. We trained five RUN-CSP models on 4,000 hard 3-colorable instances with 100 nodes each. In <xref ref-type="table" rid="T5">Table 5</xref> we present results for RUN-CSP, a greedy heuristic with the DSatur strategy (<xref ref-type="bibr" rid="B11">Br&#xe9;laz, 1979</xref>), and the state-of-the-art heuristic HybridEA (<xref ref-type="bibr" rid="B19">Galinier and Hao, 1999</xref>; <xref ref-type="bibr" rid="B32">Lewis et al., 2012</xref>; <xref ref-type="bibr" rid="B31">Lewis, 2015</xref>). HybridEA was allowed to make 500 million constraint checks on each graph. We observe that larger instances are harder for all tested methods, and there is a clear hierarchy among the three algorithms. The state-of-the-art heuristic HybridEA clearly performs best and finds solutions even for some of the largest graphs. RUN-CSP finds optimal colorings for a large fraction of graphs with up to 100 nodes and even a few correct colorings for graphs of size 200. The weakest algorithm is DSatur, which fails on most of the small 50-node graphs and degrades rapidly on larger instances.</p>
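A hedged sketch of how such threshold instances can be generated (our reconstruction of the general idea only; the exact procedure used in this paper is given in the Supplementary Material):

```python
import random
from itertools import combinations

def is_3colorable(n, edges):
    """Exact backtracking test for 3-colorability (exponential in the
    worst case; adequate for small instances)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = {}
    def extend(v):
        if v == n:
            return True
        for c in (1, 2, 3):
            if all(colors.get(u) != c for u in adj[v]):
                colors[v] = c
                if extend(v + 1):
                    return True
                del colors[v]
        return False
    return extend(0)

def hard_3col_instance(n, rng=random):
    """Add random edges while the graph stays 3-colorable; stop at the
    first edge that would destroy 3-colorability. The result is
    3-colorable, but one additional edge can make it non-3-colorable."""
    candidates = list(combinations(range(n), 2))
    rng.shuffle(candidates)
    edges = []
    for e in candidates:
        if is_3colorable(n, edges + [e]):
            edges.append(e)
        else:
            break
    return edges
```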
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>3-COL: Percentages of hard 3-colorable instances for which optimal 3-colorings were found by RUN-CSP, Greedy, and HybridEA. We evaluate on 1,000 instances for each size. We provide mean and standard deviation across five different RUN-CSP models.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Nodes</th>
<th align="center">RUN-CSP</th>
<th align="center">Greedy</th>
<th align="center">HybridEA</th>
</tr>
</thead>
<tbody>
<tr>
<td>50</td>
<td>98.4 &#xb1; 0.3</td>
<td>34.0</td>
<td align="center">100.0</td>
</tr>
<tr>
<td>100</td>
<td>62.5 &#xb1; 2.7</td>
<td>6.7</td>
<td align="center">100.0</td>
</tr>
<tr>
<td>150</td>
<td>15.5 &#xb1; 2.3</td>
<td>1.5</td>
<td align="center">98.7</td>
</tr>
<tr>
<td>200</td>
<td align="center">2.6 &#xb1; 0.4</td>
<td>0.5</td>
<td align="center">88.9</td>
</tr>
<tr>
<td>300</td>
<td align="center">0.1 &#xb1; 0.0</td>
<td>0.0</td>
<td align="center">39.9</td>
</tr>
<tr>
<td>400</td>
<td align="center">0.0 &#xb1; 0.0</td>
<td>0.0</td>
<td align="center">15.3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Choosing larger or more training graphs for RUN-CSP did not significantly improve its performance on larger hard graphs. We assume that a combination of increasing the state size, the complexity of the message generation functions, and the number and size of training instances would achieve better results, but at the cost of efficiency.</p>
<p>In <xref ref-type="table" rid="T5">Table 5</xref> we do not report results for GNN-GCP by <xref ref-type="bibr" rid="B30">Lemos et al., (2019)</xref> as the structure of the output is fundamentally different. While the three algorithms in <xref ref-type="table" rid="T5">Table 5</xref> output a coloring, GNN-GCP outputs a guess on the chromatic number without providing a proof that this is achievable. We trained instances of GNN-GCP on 32,000 pairs of hard graphs of size 40 to 60 (small) and 50 to 100 (medium). For testing, we restricted the model to only choose between the chromatic numbers 3 and 4; when allowing a wider range of possible values, the accuracy of GNN-GCP drops considerably. The network achieved a test accuracy of 75% on small instances (respectively, 65% when trained and evaluated on medium instances). The model generalizes fairly well, with the small model achieving 64% on the medium test set and the medium model achieving 74% on the small test set, almost matching the performance of the network trained on graphs of the respective size. On a set of test instances of hard graphs with 150 nodes, GNN-GCP achieved an accuracy of 52% (54% for the model trained on medium instances). Thus, the model performs significantly worse than RUN-CSP, which achieves 81% accuracy (GNN-GCP: 59%) on a test set of graphs of size 100 and 68% on graphs of size 150, where GNN-GCP achieves up to 54%. The numbers for RUN-CSP are larger than those reported in <xref ref-type="table" rid="T5">Table 5</xref> since the table only considers 3-colorable instances. Here, the accuracy is computed over 3-colorable instances as well as their non-3-colorable counterparts. By design, RUN-CSP achieves perfect classification on negative instances.</p>
<p>Overall, we see that despite being designed for maximization tasks, RUN-CSP outperforms greedy heuristics and neural baselines on the decision variant of 3-COL for hard random instances.</p>
</sec>
<sec id="s4-3-2">
<title>4.3.2 Structure Specific Performance</title>
<p>Using the coloring problem as an example, we evaluate generalization to other graph classes. We expect a network trained on instances of a particular structure to adapt to this class and outperform models trained on different graph classes. We briefly evaluate this hypothesis for four different classes of graphs.</p>
<p>
<bold>Erd&#x151;s&#x2013;R&#xe9;nyi Graphs:</bold> Graphs are generated by uniformly sampling <italic>m</italic> distinct edges between <italic>n</italic> nodes.</p>
<p>
<bold>Geometric Graphs:</bold> A graph is generated by first assigning random positions within a <inline-formula id="inf166">
<mml:math id="minf166">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> square to <italic>n</italic> distinct nodes. Then an edge is added for every pair of points with a distance less than <italic>r</italic>.</p>
<p>
<bold>Powerlaw-Cluster Graphs:</bold> This graph model was introduced by <xref ref-type="bibr" rid="B24">Holme and Kim (2002)</xref>. Each graph is generated by iteratively adding <italic>n</italic> nodes, each of which is connected to <italic>m</italic> existing nodes. After each edge is added, a triangle is closed with probability <italic>p</italic>, i.e., an additional edge is added between the new node and a random neighbor of the other endpoint of the edge.</p>
<p>
<bold>Regular Graphs:</bold> We consider random 5-regular graphs as an example for graphs with a very specific structure.</p>
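As an illustration of these generators, the geometric class can be produced in a few lines of pure Python (a sketch with our own parameter names; the paper's exact generation parameters are in the Supplementary Material):

```python
import math
import random

def random_geometric_graph(n, r, rng=random):
    """n nodes at uniform random positions in the unit square; an edge
    joins every pair at Euclidean distance less than r."""
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pos[i], pos[j]) < r]
    return pos, edges

pos, edges = random_geometric_graph(20, 0.3)
```

Because edges arise only between nearby points, geometric graphs are locally dense and structurally quite different from Erd&#x151;s&#x2013;R&#xe9;nyi graphs with the same edge count.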
<p>We trained five RUN-CSP models on 4,000 random instances of each type where each graph had between 50 and 100 nodes. We refer to these groups of models as <inline-formula id="inf167">
<mml:math id="minf167">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>ER</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf168">
<mml:math id="minf168">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Geo</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf169">
<mml:math id="minf169">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Pow</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf170">
<mml:math id="minf170">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Reg</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Five additional models <inline-formula id="inf171">
<mml:math id="minf171">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Mix</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> were trained on a mixed dataset with 1,000 random instances of each graph class. The exact parameters for generating the graphs can be found in the <xref ref-type="sec" rid="s35">Supplementary Material</xref>. Note that the parameters for each class were purposefully chosen such that most graphs are not 3-colorable. This allows us to evaluate the relative performance on the maximization task. <xref ref-type="table" rid="T6">Table 6</xref> contains the percentage of unsatisfied constraints achieved by each model on 1,000 fresh graphs of each class. We observe that all models perform well on the class of structures they were trained on, and <inline-formula id="inf172">
<mml:math id="minf172">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Reg</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> yields the worst performance on all other classes. Both <inline-formula id="inf173">
<mml:math id="minf173">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Geo</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf174">
<mml:math id="minf174">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Pow</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> outperform <inline-formula id="inf175">
<mml:math id="minf175">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>ER</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> on Erd&#x151;s&#x2013;R&#xe9;nyi graphs while <inline-formula id="inf176">
<mml:math id="minf176">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>ER</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> outperforms <inline-formula id="inf177">
<mml:math id="minf177">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Geo</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> on Powerlaw-Cluster graphs and <inline-formula id="inf178">
<mml:math id="minf178">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Pow</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> on geometric graphs. When averaging over all four classes, <inline-formula id="inf179">
<mml:math id="minf179">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Mix</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> produces the best results, despite not achieving the best results for any particular class. Additionally, we observe a very low variance in performance between the different models trained on the same dataset. Only the models trained on relatively narrow graph classes, namely regular graphs and to some extent also Powerlaw-Cluster graphs, exhibit a higher variance.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>M<sc>ax</sc>-3-C<sc>ol</sc>: Percentages of unsatisfied constraints for each graph class under the different RUN-CSP models. Values are averaged over 1,000 graphs and the standard deviation is computed with respect to the five RUN-CSP models.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Graphs</th>
<th align="center">
<inline-formula id="inf180">
<mml:math id="minf180">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>ER</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf181">
<mml:math id="minf181">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>%</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf182">
<mml:math id="minf182">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Geo</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf183">
<mml:math id="minf183">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>%</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf184">
<mml:math id="minf184">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Pow</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf185">
<mml:math id="minf185">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>%</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf186">
<mml:math id="minf186">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Reg</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf187">
<mml:math id="minf187">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>%</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf188">
<mml:math id="minf188">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>Mix</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf189">
<mml:math id="minf189">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>%</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
</tr>
</thead>
<tbody>
<tr>
<td>Erd&#x151;s&#x2013;R&#xe9;nyi</td>
<td align="center">4.75 &#xb1; 0.01</td>
<td align="center">4.73 &#xb1; 0.02</td>
<td align="center">4.72 &#xb1; 0.02</td>
<td align="center">6.69 &#xb1; 1.60</td>
<td align="center">4.73 &#xb1; 0.01</td>
</tr>
<tr>
<td>Geometric</td>
<td>10.33 &#xb1; 0.07</td>
<td align="center">10.16 &#xb1; 0.04</td>
<td align="center">11.39 &#xb1; 0.66</td>
<td align="center">18.99 &#xb1; 3.32</td>
<td valign="top" align="char" char="plusmn">10.18 &#xb1; 0.03</td>
</tr>
<tr>
<td>Pow. Cluster</td>
<td align="center">1.89 &#xb1; 0.00</td>
<td align="center">1.96 &#xb1; 0.01</td>
<td align="center">1.87 &#xb1; 0.00</td>
<td align="center">2.44 &#xb1; 0.67</td>
<td valign="top" align="char" char="plusmn">1.89 &#xb1; 0.00</td>
</tr>
<tr>
<td>Regular</td>
<td align="center">2.33 &#xb1; 0.01</td>
<td align="center">2.41 &#xb1; 0.03</td>
<td align="center">2.33 &#xb1; 0.02</td>
<td align="center">2.32 &#xb1; 0.00</td>
<td valign="top" align="char" char="plusmn">2.33 &#xb1; 0.00</td>
</tr>
<tr>
<td>Mean</td>
<td align="center">4.83 &#xb1; 0.02</td>
<td>4.82 &#xb1; 0.03</td>
<td align="center">5.08 &#xb1; 0.18</td>
<td align="center">7.61 &#xb1; 1.40</td>
<td valign="top" align="char" char="plusmn">4.78 &#xb1; 0.01</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Overall, this demonstrates that training on locally diverse graphs (e.g., geometric graphs or a mixture of graph classes) leads to good generalization to other graph classes. While all tested networks achieved competitive results on the structure that they were trained on, they were not always the best for that particular structure. Therefore, our original hypothesis appears to be overly simplistic, and restricting the training data to the structure of the evaluation instances is not necessarily optimal.</p>
</sec>
</sec>
<sec id="s4-4">
<title>4.4 Independent Set</title>
<p>Finally, we experimented with the maximum independent set problem M<sc>ax</sc>-IS. The independence condition can be modeled through a constraint language <inline-formula id="inf190">
<mml:math id="minf190">
<mml:mrow>
<mml:msub>
<mml:mtext>&#x393;</mml:mtext>
<mml:mrow>
<mml:mtext>IS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with one binary relation <inline-formula id="inf191">
<mml:math id="minf191">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mtext>IS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,0</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1,0</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Here, assigning the value 1 to a variable is interpreted as including the corresponding node in the independent set. M<sc>ax</sc>-IS is <italic>not</italic> simply M<sc>ax</sc>-CSP(<inline-formula id="inf192">
<mml:math id="minf192">
<mml:mrow>
<mml:msub>
<mml:mo>&#x393;</mml:mo>
<mml:mrow>
<mml:mtext>IS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>), since the empty set trivially satisfies all constraints. Instead, M<sc>ax</sc>-IS is the problem of finding an assignment that satisfies <inline-formula id="inf193">
<mml:math id="minf193">
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mtext>IS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> at all edges while maximizing an additional objective function that measures the size of the independent set. To model this in our framework, we extend the loss function to reward assignments with many variables set to 1. For a graph <inline-formula id="inf194">
<mml:math id="minf194">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and a soft assignment <inline-formula id="inf195">
<mml:math id="minf195">
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0,1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, we define<disp-formula id="e6">
<mml:math id="me6">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>MIS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3ba;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>size</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>size</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>&#x3c6;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c6;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(6)</label>
</disp-formula>Here, <inline-formula id="inf196">
<mml:math id="minf196">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the standard RUN-CSP loss for <inline-formula id="inf197">
<mml:math id="minf197">
<mml:mrow>
<mml:msub>
<mml:mtext>&#x393;</mml:mtext>
<mml:mrow>
<mml:mtext>IS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and &#x3ba; adjusts the relative importance of <inline-formula id="inf198">
<mml:math id="minf198">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>CSP</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf199">
<mml:math id="minf199">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>size</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Intuitively, smaller values for &#x3ba; decrease the importance of <inline-formula id="inf200">
<mml:math id="minf200">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>size</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, which rewards larger independent sets. A naive weighted sum of the two terms turned out to be unstable during training and yielded poor results, whereas the product in <xref ref-type="disp-formula" rid="e6">Eq. 6</xref> worked well. For training, <inline-formula id="inf201">
<mml:math id="minf201">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mrow>
<mml:mtext>MIS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is combined across iterations with a discount factor &#x3bb; as in the standard RUN-CSP architecture.</p>
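As a concrete illustration of Eq. 6, the following NumPy sketch computes the combined loss for a single graph and soft assignment. This is a hypothetical reimplementation, not the authors' code: the edge penalty &#x3c6;(u)&#xb7;&#x3c6;(v) is one illustrative differentiable choice, whereas the paper plugs in its standard RUN-CSP constraint loss L_CSP, which may differ in form.

```python
import numpy as np

def mis_loss(phi, edges, kappa=1.0):
    """Hypothetical NumPy sketch of the Max-IS loss in Eq. 6.

    phi   : array of shape (|V|,), soft assignment phi(v) in [0, 1]
    edges : array of shape (|E|, 2), node index pairs of the graph
    kappa : relative weight of the constraint term
    """
    # Constraint term: penalize edges whose endpoints both tend toward 1.
    # (Illustrative assumption; the paper uses its standard L_CSP here.)
    l_csp = np.mean(phi[edges[:, 0]] * phi[edges[:, 1]])
    # Size term L_size: average "excluded" mass 1 - phi(v); it shrinks
    # as more nodes are pushed into the independent set.
    l_size = np.mean(1.0 - phi)
    # Multiplicative combination from Eq. 6; a plain weighted sum of the
    # two terms was unstable during training.
    return (kappa + l_csp) * (1.0 + l_size)
```

Minimizing this product drives &#x3c6; toward large independent sets while keeping edge violations low; smaller &#x3ba; makes the constraint term relatively more important, matching the choice &#x3ba; = 0.1 used for the benchmark instances below.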
<sec id="s4-4-1">
<title>4.4.1 Random Instances</title>
<p>We start by evaluating the performance on random graphs. We trained a network on 4,000 random Erd&#x151;s&#x2013;R&#xe9;nyi graphs with 100 nodes and <inline-formula id="inf202">
<mml:math id="minf202">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>100,600</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> edges each and with <inline-formula id="inf203">
<mml:math id="minf203">
<mml:mrow>
<mml:mi>&#x3ba;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. For evaluation, we use random graphs with 100, 400, 800, and 1,600 nodes and a varying number of edges. For roughly <inline-formula id="inf204">
<mml:math id="minf204">
<mml:mrow>
<mml:mn>6</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> of all predictions, the predicted set contained induced edges (in most cases just a single edge), meaning the predicted sets were not independent. We corrected these predictions by removing one endpoint of each induced edge from the set and report only results after this correction. We compare RUN-CSP against two baselines: ReduMIS, a state-of-the-art M<sc>ax</sc>-IS solver (<xref ref-type="bibr" rid="B3">Akiba and Iwata, 2016</xref>; <xref ref-type="bibr" rid="B29">Lamm et al., 2017</xref>), and a greedy heuristic, which we implemented ourselves. The greedy procedure iteratively adds a node of minimum degree to the set and removes that node and its neighbors from the graph until the graph is empty. <xref ref-type="fig" rid="F4">Figure 4</xref> shows the achieved independent set sizes; each data point is the mean IS size across 100 random graphs. For graphs with 100 nodes, RUN-CSP achieves sizes similar to ReduMIS and clearly outperforms the greedy heuristic. On larger graphs, our network produces smaller sets than ReduMIS. However, RUN-CSP&#x2019;s performance remains close to the greedy baseline and, especially on denser graphs, outperforms it.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Independent set sizes on random graphs produced by RUN-CSP, ReduMIS and a greedy heuristic. The sizes are given as the percentage of nodes contained in the independent set. Every data point is the average for 100 graphs; the degree increases in steps of 0.2.</p>
</caption>
<graphic xlink:href="frai-03-580607-g004.tif"/>
</fig>
</sec>
<sec id="s4-4-2">
<title>4.4.2 Benchmark Instances</title>
<p>For more structured instances, we use a set of benchmark graphs from a collection of hard instances for combinatorial problems (<xref ref-type="bibr" rid="B40">Xu, 2005</xref>). The instances are divided into five sets with five graphs each. These graphs were generated through the RB model (<xref ref-type="bibr" rid="B42">Xu and Li, 2003</xref>; <xref ref-type="bibr" rid="B41">Xu et al., 2005</xref>), a model for generating hard CSP instances. A graph of the class frb<italic>c</italic>-<italic>k</italic> consists of <italic>c</italic> interconnected <italic>k</italic>-cliques, and its maximum independent set has a forced size of <italic>c</italic>. The previous model trained on Erd&#x151;s&#x2013;R&#xe9;nyi graphs did not perform well on these instances and produced sets with many induced edges. Thus, we trained a new network on 2,000 instances that we generated ourselves through the RB model. The exact generation procedure for this dataset is provided in the <xref ref-type="sec" rid="s35">Supplementary Material</xref>. We set <inline-formula id="inf205">
<mml:math id="minf205">
<mml:mrow>
<mml:mi>&#x3ba;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> to increase the importance of the independence condition. The predictions of the new model contained no induced edges on any benchmark instance. <xref ref-type="table" rid="T7">Table 7</xref> lists the achieved IS sizes. We observe that RUN-CSP yields results similar to those of the greedy heuristic. While our network does not match the state-of-the-art solver, it beats the greedy approach on large instances with over 100,000 edges.</p>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>M<sc>ax</sc>-IS: Achieved <sc>IS</sc> sizes for the benchmark graphs. We report the mean and standard deviation over the five graphs in each group.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Graphs</th>
<th align="center">
<inline-formula id="inf206">
<mml:math id="minf206">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">
<inline-formula id="inf207">
<mml:math id="minf207">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">RUN-CSP</th>
<th align="center">Greedy</th>
<th align="center">ReduMIS</th>
</tr>
</thead>
<tbody>
<tr>
<td>frb30&#x2013;15</td>
<td align="center">450</td>
<td align="center">18&#xa0;k</td>
<td align="center">25.8 &#xb1; 0.8</td>
<td align="center">24.6 &#xb1; 0.5</td>
<td align="center">30.0 &#xb1; 0.0</td>
</tr>
<tr>
<td>frb40&#x2013;19</td>
<td align="center">790</td>
<td align="center">41&#xa0;k</td>
<td align="center">33.6 &#xb1; 0.5</td>
<td align="center">33.0 &#xb1; 1.2</td>
<td align="center">39.4 &#xb1; 0.5</td>
</tr>
<tr>
<td>frb50&#x2013;23</td>
<td align="center">1,150</td>
<td align="center">80&#xa0;k</td>
<td align="center">42.2 &#xb1; 0.4</td>
<td align="center">42.2 &#xb1; 0.8</td>
<td>48.8 &#xb1; 0.4</td>
</tr>
<tr>
<td>frb59&#x2013;26</td>
<td align="center">1,478</td>
<td align="center">126&#xa0;k</td>
<td align="center">49.4 &#xb1; 0.5</td>
<td align="center">48.0 &#xb1; 0.7</td>
<td align="center">57.4 &#xb1; 0.9</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec sec-type="conclusions" id="s5">
<title>5 Conclusions</title>
<p>We have presented a universal approach for approximating M<sc>ax</sc>-CSPs with recurrent neural networks. Its key feature is the ability to train without supervision on any available data. Our experiments on the optimization problems M<sc>ax</sc>-2-SAT, M<sc>ax</sc>-C<sc>ut</sc>, 3-COL and M<sc>ax</sc>-IS show that RUN-CSP produces high-quality approximations for all four problems. Our network can compete with traditional approaches like greedy heuristics or semi-definite programming on random data as well as benchmark instances. For M<sc>ax</sc>-2-SAT, RUN-CSP was able to outperform a state-of-the-art M<sc>ax</sc>-SAT solver. Our approach also achieved better results than neural baselines, where available. RUN-CSP networks trained on small random instances generalize well to other instances of larger size and different structure. Our approach is very efficient: inference takes only a few seconds, even for larger instances with over 10,000 constraints. The runtime scales linearly in the number of constraints, and our approach can fully utilize modern hardware such as GPUs.</p>
<p>Overall, RUN-CSP seems like a promising approach for approximating Max-CSPs with neural networks. The strong results are somewhat surprising, considering that our networks consist of just one LSTM cell and a few linear functions. We believe that our observations point toward a great potential of machine learning in combinatorial optimization.</p>
<sec id="s5-1">
<title>Future Work</title>
<p>We plan to extend RUN-CSP to CSPs of arbitrary arity and to weighted CSPs. It will be interesting to see, for example, how it performs on 3-SAT and its maximization variant. Another possible future extension could combine RUN-CSP with traditional local search methods, similar to the approach by <xref ref-type="bibr" rid="B33">Li et al. (2018)</xref> for M<sc>ax</sc>-IS. The soft assignments can be used to guide a tree search, and the randomness can be exploited to generate a large pool of initial solutions for traditional refinement methods.</p>
</sec>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The code for RUN-CSP including the generated datasets and their generators can be found on github <ext-link ext-link-type="uri" xlink:href="https://github.com/toenshoff/RUN-CSP">https://github.com/toenshoff/RUN-CSP</ext-link>. The additional datasets can be downloaded at their sources as specified in the following: Spinglass <sc>2-CNF</sc> (<xref ref-type="bibr" rid="B23">Heras et al., 2008</xref>), <ext-link ext-link-type="uri" xlink:href="http://maxsat.ia.udl.cat/benchmarks/">http://maxsat.ia.udl.cat/benchmarks/</ext-link> (Unweighted Crafted Benchmarks); Gset (<xref ref-type="bibr" rid="B44">Ye, 2003</xref>), <ext-link ext-link-type="uri" xlink:href="https://www.cise.ufl.edu/research/sparse/matrices/Gset/">https://www.cise.ufl.edu/research/sparse/matrices/Gset/</ext-link>; M<sc>ax</sc>-IS Graphs (<xref ref-type="bibr" rid="B40">Xu, 2005</xref>), <ext-link ext-link-type="uri" xlink:href="http://sites.nlsde.buaa.edu.cn/kexu/benchmarks/graph-benchmarks.htm">http://sites.nlsde.buaa.edu.cn/kexu/benchmarks/graph-benchmarks.htm</ext-link>; Optsicom (<xref ref-type="bibr" rid="B14">Corber&#xe1;n et al., 2006</xref>), <ext-link ext-link-type="uri" xlink:href="http://grafo.etsii.urjc.es/optsicom/maxcut/">http://grafo.etsii.urjc.es/optsicom/maxcut/</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>Starting from an initial idea by JT, all authors contributed to the presented design and the writing of this manuscript. Most of the implementation was done by JT, with help and feedback from MR and HW. The work was supervised by MG.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work was supported by the German Research Foundation (DFG) under grants GR 1492/16-1 <italic>Quantitative Reasoning About Database Queries</italic> and GRK 2236 (UnRAVeL).</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>This work is part of Jan T&#xf6;nshoff&#x2019;s Master&#x2019;s Thesis and already appeared as a preprint on arXiv (<xref ref-type="bibr" rid="B39">Toenshoff et al., 2019</xref>).</p>
</ack>
<sec id="s35">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2020.580607/full&#x23;supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2020.580607/full&#x23;supplementary-material</ext-link>.</p>
<supplementary-material xlink:href="datasheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>Our Tensorflow implementation of RUN-CSP is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/toenshoff/RUN-CSP">https://github.com/toenshoff/RUN-CSP</ext-link>.</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://grafo.etsii.urjc.es/optsicom/maxcut/">http://grafo.etsii.urjc.es/optsicom/maxcut/</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abboud</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ceylan</surname>
<given-names>I. I.</given-names>
</name>
<name>
<surname>Lukasiewicz</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Learning to reason: leveraging neural networks for approximate DNF counting</article-title>. <comment>arXiv [Preprint]. Available at: arXiv:1904.02688</comment> (<comment>Accessed June 4, 2019</comment>).</citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Adorf</surname>
<given-names>H.-M.</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>M. D.</given-names>
</name>
</person-group> (<year>1990</year>). &#x201c;<article-title>A discrete stochastic neural network algorithm for constraint satisfaction problems</article-title>,&#x201d; in <conf-name>IJCNN international joint conference on neural networks</conf-name>, <conf-loc>San Diego, CA, USA</conf-loc>, <conf-date>June 17&#x2013;21, 1990</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>917</fpage>&#x2013;<lpage>924</lpage>. </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akiba</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Iwata</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Branch-and-reduce exponential/FPT algorithms in practice: a case study of vertex cover</article-title>. <source>Theor. Comput. Sci.</source> <volume>609</volume>, <fpage>211</fpage>&#x2013;<lpage>225</lpage>. <pub-id pub-id-type="doi">10.1016/j.tcs.2015.09.023</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amizadeh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Matusevych</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Weimer</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Learning to solve circuit-SAT: an unsupervised differentiable approach</article-title>,&#x201d; in <conf-name>International conference on learning representations</conf-name>, <conf-loc>New Orleans, Louisiana, USA</conf-loc>, <conf-date>May 6&#x2013;9, 2019 (Amherst)</conf-date>.</citation>
</ref>
<ref id="B5">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Apt</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2003</year>). <source>Principles of constraint programming</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Argelich</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). [Dataset] <article-title>Eleventh evaluation of max-SAT solvers (Max-SAT-2016)</article-title>.</citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="editor">
<name>
<surname>Bacchus</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>J&#xe4;rvisalo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Martins</surname>
<given-names>R.</given-names>
</name>
</person-group> (Editors) (<year>2019</year>). <source>MaxSAT evaluation 2019: solver and benchmark descriptions</source>. <publisher-loc>Helsinki</publisher-loc>: <publisher-name>University of Helsinki</publisher-name>, <fpage>49</fpage>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benlic</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>J.-K.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Breakout local search for the max-cut problem</article-title>. <source>Eng. Appl. Artif. Intell.</source> <volume>26</volume>, <fpage>1162</fpage>&#x2013;<lpage>1173</lpage>. <pub-id pub-id-type="doi">10.1016/j.engappai.2012.09.001</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Berg</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Demirovi&#x107;</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Stuckey</surname>
<given-names>P. J.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Core-boosted linear search for incomplete maxSAT</article-title>,&#x201d; in <conf-name>International conference on integration of constraint programming, artificial intelligence, and operations research</conf-name>, <conf-loc>Thessaloniki, Greece</conf-loc>, <conf-date>June 4&#x2013;7, 2019</conf-date> (<publisher-loc>Cham, Switzerland</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>39</fpage>&#x2013;<lpage>56</lpage>. </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boettcher</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Percus</surname>
<given-names>A. G.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Extremal optimization for graph partitioning</article-title>. <source>Phys. Rev. E</source> <volume>64</volume>, <fpage>026114</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.64.026114</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Br&#xe9;laz</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>1979</year>). <article-title>New methods to color the vertices of a graph</article-title>. <source>Commun. ACM</source> <volume>22</volume>, <fpage>251</fpage>&#x2013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1145/359094.359101</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Bruna</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Supervised community detection with line graph neural networks</article-title>,&#x201d; in <conf-name>International Conference on Learning Representations</conf-name>, <conf-loc>New Orleans, Louisiana, USA</conf-loc>, <conf-date>May 6&#x2013;9, 2019 (Amherst)</conf-date>. </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2000</year>). <source>Solving sparse semidefinite programs using the dual scaling algorithm with an iterative solver</source>. <publisher-loc>Iowa City, IA</publisher-loc>: <publisher-name>Department of Management Sciences, University of Iowa.</publisher-name>
</citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Corber&#xe1;n</surname>
<given-names>&#xc1;.</given-names>
</name>
<name>
<surname>Peir&#xf3;</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Campos</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Glover</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Mart&#xed;</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2006</year>). <source>Optsicom project</source>.</citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dahl</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>1987</year>). &#x201c;<article-title>Neural network algorithms for an NP-complete problem: map and graph coloring</article-title>,&#x201d; in <conf-name>Proceedings First International Conference Neural Networks III</conf-name>, (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>113</fpage>&#x2013;<lpage>120</lpage>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Simone</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Diehl</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>J&#xfc;nger</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mutzel</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Reinelt</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Rinaldi</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Exact ground states of Ising spin glasses: new experimental results with a branch-and-cut algorithm</article-title>. <source>J. Stat. Phys.</source> <volume>80</volume>, <fpage>487</fpage>&#x2013;<lpage>496</lpage>. <pub-id pub-id-type="doi">10.1007/bf02178370</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dechter</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2003</year>). <source>Constraint processing</source>. <publisher-loc>San Mateo, CA</publisher-loc>: <publisher-name>Morgan Kaufmann</publisher-name>, <fpage>344</fpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dembo</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Montanari</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sen</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Extremal cuts of sparse random graphs</article-title>. <source>Ann. Probab.</source> <volume>45</volume>, <fpage>1190</fpage>&#x2013;<lpage>1217</lpage>. <pub-id pub-id-type="doi">10.1214/15-aop1084</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galinier</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>J.-K.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Hybrid evolutionary algorithms for graph coloring</article-title>. <source>J. Combin. Optim.</source> <volume>3</volume>, <fpage>379</fpage>&#x2013;<lpage>397</lpage>. <pub-id pub-id-type="doi">10.1023/a:1009823419804</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gassen</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Carothers</surname>
<given-names>J. D.</given-names>
</name>
</person-group> (<year>1993</year>). &#x201c;<article-title>Graph color minimization using neural networks</article-title>,&#x201d; in <conf-name>Proceedings of 1993 international conference on neural networks</conf-name>, <conf-loc>October 25&#x2013;29, 1993</conf-loc> (<publisher-loc>Nagoya, Japan</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1541</fpage>&#x2013;<lpage>1544</lpage>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goemans</surname>
<given-names>M. X.</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>D. P.</given-names>
</name>
</person-group> (<year>1995</year>). <article-title>Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming</article-title>. <source>J. ACM</source> <volume>42</volume>, <fpage>1115</fpage>&#x2013;<lpage>1145</lpage>. <pub-id pub-id-type="doi">10.1145/227683.227684</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harmanani</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hannouche</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Khoury</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A neural networks algorithm for the minimum coloring problem using FPGAs</article-title>. <source>Int. J. Model. Simulat.</source> <volume>30</volume>, <fpage>506</fpage>&#x2013;<lpage>513</lpage>. <pub-id pub-id-type="doi">10.1080/02286203.2010.11442597</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heras</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Larrosa</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>De Givry</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schiex</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>2006 and 2007 max-SAT evaluations: contributed instances</article-title>. <source>J. Satisf. Boolean Model. Comput.</source> <volume>4</volume>, <fpage>239</fpage>&#x2013;<lpage>250</lpage>. <pub-id pub-id-type="doi">10.3233/sat190046</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holme</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>B. J.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>Growing scale-free networks with tunable clustering</article-title>. <source>Phys. Rev. E</source> <volume>65</volume>, <fpage>026107</fpage>. <pub-id pub-id-type="doi">10.1103/physreve.65.026107</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hopfield</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Tank</surname>
<given-names>D. W.</given-names>
</name>
</person-group> (<year>1985</year>). <article-title>&#x201c;Neural&#x201d; computation of decisions in optimization problems</article-title>. <source>Biol. Cybern.</source> <volume>52</volume>, <fpage>141</fpage>&#x2013;<lpage>152</lpage>. <pub-id pub-id-type="doi">10.1007/BF00339943</pub-id> <ext-link ext-link-type="uri" xlink:href="https://pubmed.ncbi.nlm.nih.gov/4027280/">PubMed Abstract</ext-link> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kautz</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2019</year>). [Dataset]. <article-title>Walksat home page</article-title> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khalil</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Dilkina</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Learning combinatorial optimization algorithms over graphs</article-title>,&#x201d; in <conf-name>Advances in neural information processing systems 30: annual conference on neural information processing systems 2017</conf-name>, <conf-loc>Long Beach, CA, USA</conf-loc>, <conf-date>December 4&#x2013;9, 2017</conf-date>. <fpage>6348</fpage>&#x2013;<lpage>6358</lpage>. <comment>Red Hook (NY): Curran Associates</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kleinberg</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Tardos</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2006</year>). <source>Algorithm design</source>. <publisher-name>Pearson Education India</publisher-name>, <fpage>864</fpage>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lamm</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Strash</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Werneck</surname>
<given-names>R. F.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Finding near-optimal independent sets at scale</article-title>. <source>J. Heuristics</source> <volume>23</volume>, <fpage>207</fpage>&#x2013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1007/s10732-017-9337-x</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lemos</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Prates</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Avelar</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lamb</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Graph coloring meets deep learning: effective graph neural network models for combinatorial problems</source>. <comment>arXiv [Preprint]. Available at: arXiv:1903.04598</comment> (<comment>Accessed March 11, 2019</comment>).</citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <source>A guide to graph coloring</source>. <publisher-loc>Basel</publisher-loc>: <publisher-name>Springer</publisher-name>, <volume>Vol. 7</volume>, <fpage>253</fpage>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mumford</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Gillard</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A wide-ranging computational comparison of high-performance graph coloring algorithms</article-title>. <source>Comput. Oper. Res.</source> <volume>39</volume>, <fpage>1933</fpage>&#x2013;<lpage>1950</lpage>. <pub-id pub-id-type="doi">10.1016/j.cor.2011.08.010</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Koltun</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Combinatorial optimization with graph convolutional networks and guided tree search</article-title>,&#x201d; in <conf-name>Advances in Neural Information Processing Systems</conf-name>, <fpage>539</fpage>&#x2013;<lpage>548</lpage>. </citation>
</ref>
<ref id="B34">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Prates</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Avelar</surname>
<given-names>P. H. C.</given-names>
</name>
<name>
<surname>Lemos</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lamb</surname>
<given-names>L. C.</given-names>
</name>
<name>
<surname>Vardi</surname>
<given-names>M. Y.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Learning to solve NP-complete problems: a graph neural network for decision TSP</article-title>,&#x201d; in <conf-name>Proceedings of the AAAI conference on artificial intelligence</conf-name>, <conf-loc>Honolulu, HI, USA</conf-loc>, <conf-date>January 27&#x2013;February 1, 2019</conf-date> (<publisher-loc>Palo Alto, California USA</publisher-loc>: <publisher-name>AAAI Press</publisher-name>), <fpage>4731</fpage>&#x2013;<lpage>4738</lpage>. </citation>
</ref>
<ref id="B35">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Raghavendra</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>Optimal algorithms and inapproximability results for every CSP?</article-title>&#x201d; in <conf-name>Proceedings of the 40th ACM symposium on theory of computing</conf-name>, <conf-loc>Victoria, BC, Canada</conf-loc>, <conf-date>May 17&#x2013;20, 2008</conf-date> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>245</fpage>&#x2013;<lpage>254</lpage>. </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selman</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kautz</surname>
<given-names>H. A.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>Local search strategies for satisfiability testing</article-title>. <source>Cliques, Coloring, and Satisfiability</source> <volume>26</volume>, <fpage>521</fpage>&#x2013;<lpage>532</lpage>. <pub-id pub-id-type="doi">10.1090/dimacs/026/25</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selsam</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lamm</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>B&#xfc;nz</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>de Moura</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Dill</surname>
<given-names>D. L.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Learning a SAT solver from single-bit supervision</article-title>,&#x201d; in <conf-name>International conference on learning representations</conf-name>, <conf-loc>New Orleans, LA</conf-loc>, <conf-date>Apr 30, 2019</conf-date>.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takefuji</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K. C.</given-names>
</name>
</person-group> (<year>1991</year>). <article-title>Artificial neural networks for four-coloring map problems and k-colorability problems</article-title>. <source>IEEE Trans. Circ. Syst.</source> <volume>38</volume>, <fpage>326</fpage>&#x2013;<lpage>333</lpage>. <pub-id pub-id-type="doi">10.1109/31.101328</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Toenshoff</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ritzert</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Grohe</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Graph neural networks for maximum constraint satisfaction</source>
<comment>arXiv [Preprint]. Available at: arXiv:1909.08387</comment>.</citation>
</ref>
<ref id="B40">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2005</year>). [Dataset] <article-title>BHOSLIB: benchmarks with hidden optimum solutions for graph problems (maximum clique, maximum independent set, minimum vertex cover and vertex coloring)</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://www.nlsde.buaa.edu.cn/%7Ekexu/benchmarks/graph-benchmarks.htm">http://www.nlsde.buaa.edu.cn/&#x223c;kexu/benchmarks/graph-benchmarks.htm</ext-link>
</comment> (<comment>Accessed April 20, 2014</comment>).</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Boussemart</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hemery</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lecoutre</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2005</year>). &#x201c;<article-title>A simple model to generate hard satisfiable instances</article-title>,&#x201d; in <conf-name>IJCAI-05, Proceedings of the nineteenth international joint Conference on artificial intelligence</conf-name>, <conf-loc>Edinburgh, Scotland, UK</conf-loc>, <conf-date>July 30&#x2013;August 5, 2005</conf-date>. <fpage>337</fpage>&#x2013;<lpage>342</lpage>. <comment>Denver: Professional Book Center</comment>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Many hard examples in exact phase transitions with application to generating hard satisfiable instances</article-title>. <comment>arXiv [Preprint]. Available at: arXiv:cs/0302001</comment> (<comment>Accessed November 11, 2003</comment>).</citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Bandeira</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Villar</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Experimental performance of graph neural networks on random instances of max-cut</source>. <comment>arXiv [Preprint]. Available at: arXiv:1908.05767</comment> (<comment>Accessed August 15, 2019</comment>).</citation>
</ref>
<ref id="B44">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ye</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2003</year>). [Dataset] <source>The Gset dataset</source>. </citation>
</ref>
</ref-list>
<sec id="s10">
<title>Acronyms</title>
<p>
<bold>RUN-CSP</bold> Recurrent Unsupervised Neural Network for Constraint Satisfaction Problems.</p>
<p>
<bold>3-COL</bold> 3-Coloring Problem.</p>
<p>
<bold>CSP</bold> Constraint Satisfaction Problem.</p>
<p>
<bold>GNN</bold> Graph Neural Network.</p>
<p>
<bold>MVC</bold> Minimum Vertex Cover Problem.</p>
<p>
<bold>M</bold>
<sc>
<bold>ax</bold>
</sc>
<bold>-C</bold>
<sc>
<bold>ut</bold>
</sc> Maximum Cut Problem.</p>
<p>
<bold>M</bold>
<sc>
<bold>ax</bold>
</sc>
<bold>-2-S</bold>
<sc>
<bold>at</bold>
</sc> Maximum Satisfiability Problem for Boolean formulas with two literals per clause.</p>
<p>
<bold>M</bold>
<sc>
<bold>ax</bold>
</sc>
<bold>-3-C</bold>
<sc>
<bold>ol</bold>
</sc> Maximum 3-Coloring Problem.</p>
<p>
<bold>M</bold>
<sc>
<bold>ax</bold>
</sc>
<bold>-IS</bold> Maximum Independent Set.</p>
<p>
<bold>TSP</bold> Traveling Salesperson Problem.</p>
</sec>
</back>
</article>
