<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">666174</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.666174</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Adaptive On-the-Fly Changes in Distributed Processing Pipelines</article-title>
<alt-title alt-title-type="left-running-head">Albers et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Adaptive Changes in Distributed Processing</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Albers</surname>
<given-names>Toon</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1228773/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lazovik</surname>
<given-names>Elena</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1385278/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hadadian Nejad Yousefi</surname>
<given-names>Mostafa</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1193675/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Lazovik</surname>
<given-names>Alexander</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/854881/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Monitoring &#x26; Control Services Department, TNO, <addr-line>Groningen</addr-line>, <country>Netherlands</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Distributed System Group, Faculty of Science and Engineering, Bernoulli Institute, University of Groningen, <addr-line>Groningen</addr-line>, <country>Netherlands</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/925374/overview">Donatella Firmani</ext-link>, Roma Tre University, Italy</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/553317/overview">Aastha Madaan</ext-link>, University of Southampton, United&#x20;Kingdom</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1365661/overview">Christian Pilato</ext-link>, Politecnico di Milano, Italy</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Alexander Lazovik, <email>a.lazovik@rug.nl</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>666174</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>08</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Albers, Lazovik, Hadadian Nejad Yousefi and Lazovik.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Albers, Lazovik, Hadadian Nejad Yousefi and Lazovik</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines where operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are set at design time, and any changes to their functionality require the applications to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. The introduction of variation points to distributed processing pipelines allows for on-the-fly updating of individual analysis steps. In this paper, we extend such basic variation point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We have enabled pipeline modeling through constraints. Based on these constraints, we not only ensure that configurations are compatible with type but also verify that expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from operation definitions available at planning time. It then automatically integrates these configurations into the running pipeline. We verify the system through the testing of a proof-of-concept implementation. The proof of concept also shows promising results when reconfiguration is performed frequently.</p>
</abstract>
<kwd-group>
<kwd>distributed computing</kwd>
<kwd>big data applications</kwd>
<kwd>on-the-fly updates</kwd>
<kwd>adaptive dynamic systems</kwd>
<kwd>industrial data management</kwd>
<kwd>dynamic software updating</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Industrial organizations are increasingly dependent on the digital components of their business. Industry 4.0 is based on further digitalization and, in particular, on the concepts of automation and data exchange to achieve efficiency and zero-downtime manufacturing. Organizations trying to keep pace with the new challenges are faced with processing large volumes of data (<xref ref-type="bibr" rid="B8">Che et&#x20;al., 2013</xref>). This processing is required for their core business and becomes part of their decision-making processes. Additionally, it provides a competitive advantage over companies not investing in digitalization. To achieve their goals, industrial organizations must often deal with various kinds of data. This leads to different requirements regarding how that data is processed. For example, some data, such as readings from physical sensor networks on factory equipment, may require real-time processing, while other data, such as customer or supplier analytics, can be processed in batches at set intervals (<xref ref-type="bibr" rid="B1">Assun&#xe7;&#xe3;o et&#x20;al., 2015</xref>).</p>
<p>Performing analysis on all available datasets on a single computer may not be fast enough or may be impossible due to the operational infrastructure requirements (such as storage space or memory). By distributing the processing over multiple computers, the hardware requirements per computer can be decreased, and total processing time can be lowered as the computers operate in parallel. While taking a mainframe approach (i.e.,&#x20;a single high-performance computer) may be possible, it is often not as cost-effective as a distributed approach (<xref ref-type="bibr" rid="B14">Franks, 2012</xref>). Such distributed processing and analysis are commonly done through <italic>distributed processing pipelines</italic>. Note that the term <italic>pipeline</italic> is sometimes also used to describe machine learning systems built on top of distributed data processing frameworks; machine-learning-specific aspects are not covered in this paper. We define a pipeline as generic distributed data processing performed through a sequence of steps, where each step performs a specific part of the data processing, and the output of that step is used as an input to one or multiple subsequent&#x20;steps.</p>
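<p>The definition above can be illustrated with a minimal, framework-agnostic sketch in which a pipeline is simply a chain of steps, each consuming the output of the previous one. All step names here are hypothetical and for illustration only; they are not part of any processing framework.</p>

```python
# Minimal sketch of a processing pipeline: each step consumes the
# output of the previous one. Step names are illustrative only.
def parse(records):
    return [r.strip().split(",") for r in records]

def to_numbers(rows):
    return [[float(x) for x in row] for row in rows]

def aggregate(rows):
    return sum(row[0] for row in rows)

def run_pipeline(data, steps):
    for step in steps:
        data = step(data)
    return data

result = run_pipeline(["1,2", "3,4"], [parse, to_numbers, aggregate])
# result is 4.0 (1.0 + 3.0)
```

<p>In a distributed setting, each such step would operate on partitions of the data spread across a cluster, but the chained structure is the same.</p>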
<p>Distributed data processing in Industry 4.0 has a number of open issues. One of these is the fixed nature of the distributed processing platforms currently available on the market: steps in a pipeline on one of these platforms cannot be changed once a calculation has been started. However, updating a running pipeline is needed in many cases, including the following:<list list-type="simple">
<list-item>
<p>&#x2022; After changes in the environment</p>
<list list-type="simple">
<list-item>
<p>e.g., external services become unavailable or a new data source provides data in a different format.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; After changes in business model and&#x20;goals</p>
<list list-type="simple">
<list-item>
<p>e.g., calculating different statistics based on the same data as a result of business industrial demands.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Upgrading of processing models or parameters</p>
<list list-type="simple">
<list-item>
<p>e.g., fixing errors in the pipeline code to provide new functionality, introducing more accurate algorithms, or tweaking and tuning algorithm parameters for better results.</p>
</list-item>
</list>
</list-item>
</list>
</p>
<p>Two approaches are generally used to update distributed processing pipelines<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>. The first requires stopping the running pipeline and then starting a new, updated version. This is not always appropriate or possible, such as for permanent monitoring and control systems that need to be operational 24/7 or for batch processing pipelines that are in the middle of a long-term computation. In these cases, we cannot always afford the resulting downtime or loss of progress. The second option is executing a new, updated version in parallel with the old version and taking over processing when the new pipeline is ready. If the processing resources required for a pipeline are significant, running a new pipeline in parallel is not always an option because of limited available infrastructure or the excessive extra costs it would&#x20;require.</p>
<p>In the case of stopping and restarting, the resulting downtime could delay the analysis of vital data or cause it to be missed entirely. In the case of parallel computations, significant progress could be lost, depending on how long ago the computation was started.</p>
<p>In a previous paper, we developed the framework <italic>spark-dynamic</italic> (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>), built on top of the popular distributed data processing platform Apache Spark (<xref ref-type="bibr" rid="B49">The Apache Software Foundation, 2015b</xref>), to enable updating the steps and algorithm parameters of running pipelines without restarting them. This process is called <italic>reconfiguration</italic>. In this work, we extend the functionality of <italic>spark-dynamic</italic> to automate parts of the reconfiguration process using Artificial Intelligence Planning techniques to guarantee the consistency of performed updates. The resulting system uses constraints of different types to model pipeline behavior. It is able to automatically generate and validate pipeline configurations based on the provided model and goals and can automatically integrate these configurations into a running pipeline, even when internal pipeline data types differ between versions.</p>
<p>The need for verified reconfiguration is twofold. First, the <italic>spark-dynamic</italic> library only provides a basic updating mechanism with checks for serialization success and type compatibility. This is not enough because, for example, changes to the internal functionality of one step could violate the expectations of some subsequent step, which could result in general inconsistencies or outright crashes of the whole process. Second, the verification process is a complex task, and by automating aspects of the reconfiguration process, we can drastically simplify it. In the future, this may allow industrial users without any development experience to make changes to a running pipeline when required, without coding effort. For example, an asset manager who wants insight into the state of equipment and trends in the aging of that equipment can start such a permanent analysis, check the results at any moment, and tweak the business or technical constraints when needed. What is important, though, is that when the update happens, the end user should have enough trust that it does not crash the whole system. Automated synthesis and validation of every update allow us to formally ensure that&#x20;trust.</p>
<p>Many distributed data processing frameworks operate on the principles of a Directed Acyclic Graph (DAG), distinguishing themselves through their focus on batched or streaming data processing [e.g., Apache Spark (<xref ref-type="bibr" rid="B49">The Apache Software Foundation, 2015b</xref>) mainly focuses on batched processing while supporting streaming workloads, whereas for Apache Flink (<xref ref-type="bibr" rid="B48">The Apache Software Foundation, 2015a</xref>), the opposite is true]. In this paper, we use Apache Spark as the distributed data processing framework due to its popularity. However, as many modern frameworks are built on similar principles, the mechanisms described in this paper are in major parts applicable to other distributed data processing frameworks.</p>
<p>A typical example of a distributed data processing pipeline for Industry 4.0 is the case of predictive maintenance with the goal of zero downtime. With predictive maintenance, factories and other industries can improve the efficiency of their systems and prolong their lifetime. Consider a goal of finding devices in a factory that degrade over time. With predictive maintenance, we could find degraded devices before they fail. Predictors could be the age of devices or the measured efficiency through some sensors. A schematic Spark data pipeline can be constructed as shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Example predictive maintenance pipeline.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g001.tif"/>
</fig>
<p>Later, new predictors could be developed such as one based on the failure rate of each type of device. Existing predictors could also be found ineffective and be removed from service.</p>
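<p>As a rough illustration of this predictive maintenance scenario, the sketch below scores each device with several predictors and flags devices whose average score exceeds a threshold. All field names, predictor formulas, and the threshold are hypothetical; the sketch only conveys why predictors are natural candidates for replacement over time.</p>

```python
# Illustrative sketch of the predictive-maintenance example: several
# predictors score each device, and devices above a threshold are
# flagged as degrading. Names, formulas, and threshold are hypothetical.
def age_predictor(device):
    return min(device["age_years"] / 10.0, 1.0)

def efficiency_predictor(device):
    return 1.0 - device["measured_efficiency"]

def degradation_score(device, predictors):
    return sum(p(device) for p in predictors) / len(predictors)

devices = [
    {"id": "pump-1", "age_years": 9, "measured_efficiency": 0.55},
    {"id": "pump-2", "age_years": 2, "measured_efficiency": 0.97},
]
predictors = [age_predictor, efficiency_predictor]
degraded = [d["id"] for d in devices
            if degradation_score(d, predictors) > 0.5]
# degraded contains "pump-1" only
```

<p>Adding a failure-rate predictor, or retiring an ineffective one, amounts to changing the <monospace>predictors</monospace> list, which in a running pipeline is exactly the kind of change that requires on-the-fly reconfiguration.</p>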
<p>In summary, the main contribution of this paper is a distributed data processing pipeline reconfiguration framework based on constraint-based AI planning. It ensures that the current industrial user goals are satisfied, takes into account the dependencies between related steps within the pipeline (thus ensuring its data-type and structural consistency), and automatically incorporates the new configuration. The feasibility of the approach is tested using Apache Spark as the target distributed processing framework. We further present a generic methodology for enabling adaptive on-the-fly changes to distributed data analysis applications for industrial organizations in the Industry 4.0 era, along with a proof-of-concept software library that demonstrates the updating guarantees offered to the industrial&#x20;user.</p>
<p>The rest of the paper is organized as follows. In <xref ref-type="sec" rid="s2">Section 2</xref>, we look at current research into runtime updating, pipeline synthesis, and consistency checking. Then, in <xref ref-type="sec" rid="s3">Section 3</xref>, we show an overview of our proposed system. This is followed by a closer look at the planner design in <xref ref-type="sec" rid="s4">Section 4</xref>. Next, we provide an evaluation of the system in <xref ref-type="sec" rid="s5">Section 5</xref>. Finally, we provide conclusions and discussion in <xref ref-type="sec" rid="s6">Section&#x20;6</xref>.</p>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>The problem of distributed software reconfiguration is not new. We therefore start with an overview of the state of the art in relevant techniques, beginning with dynamic updates not specific to distributed processing.</p>
<sec id="s2-1">
<title>2.1 Runtime Updating</title>
<p>A common term for runtime updating is Dynamic Software Updating (DSU) (<xref ref-type="bibr" rid="B21">Hicks and Nettles, 2005</xref>; <xref ref-type="bibr" rid="B41">Pina et&#x20;al., 2014</xref>). With DSU, running processes are updated by rewriting the running code and process memory. Compared to traditional software updating, the process does not need to be stopped, although it may be temporarily halted. Because of this, the state of the running process can be preserved. As a result, running sessions and connections can be kept active and no costly application boot is required.</p>
<p>Many early updating systems require a specialized program environment. Notably, <xref ref-type="bibr" rid="B10">Cook and Lee (1983)</xref> have described a system called DYMOS that encompasses nearly all aspects of a software system: a command interpreter, a source code manager, an editor, a compiler, and a runtime support system. By having control over all of these aspects, they have the ability to add and monitor synchronization systems allowing the updates to be performed seamlessly.</p>
<p>More recent models for DSU do not require an all-encompassing system. Instead, DURTS (<xref ref-type="bibr" rid="B34">Montgomery, 2004</xref>) requires only a custom linker and a module to load and synchronize replacement modules. The linked module is loaded into heap space from within the application, and a pointer-to-function variable is used to execute the function, where the pointer value is updated to point to newer versions. Similarly, <xref ref-type="bibr" rid="B21">Hicks and Nettles (2005)</xref> have described an approach using the C-like language <italic>Popcorn</italic>, compiling code patches into <italic>Typed Assembly Language</italic> that can be dynamically linked and integrated. Other work operates at the language level, such as that of <xref ref-type="bibr" rid="B36">Mugarza et&#x20;al. (2020)</xref>, implemented for the <italic>Ada</italic> programming language. Alternatively, <xref ref-type="bibr" rid="B2">Bagherzadeh et&#x20;al. (2020)</xref> present a language-independent approach based on model execution systems (<xref ref-type="bibr" rid="B22">Hojaji et&#x20;al., 2019</xref>).</p>
<p>Some other systems use and extend the functionality provided by the platform an application runs on. For example, <xref ref-type="bibr" rid="B26">Kim et&#x20;al. (2011)</xref> used the Java Virtual Machine (JVM) HotSwap capability to replace code, adding features such as bytecode rewriting to work around HotSwap limitations. One of the more complex systems is Rubah (<xref ref-type="bibr" rid="B41">Pina et&#x20;al., 2014</xref>). It uses a manual definition of update points, combined with bytecode rewriting. Its update process consists of three steps. The first step is <italic>quiescence</italic>, reaching a stable state where it is safe to perform updates. In the second step, the running state is transformed, going over the objects in heap memory and changing the fields and methods of existing objects to their new versions. Finally, the program threads are restarted at their equivalent locations in the new version of the application. Later, <xref ref-type="bibr" rid="B40">Pina et&#x20;al. (2019)</xref> improved system availability by warming up updates: they run the old and new versions together and perform the update once both versions converge. <xref ref-type="bibr" rid="B38">Neumann et&#x20;al. (2017)</xref> and <xref ref-type="bibr" rid="B19">Gu et&#x20;al. (2018)</xref> later introduced similar systems based on the same principles. <xref ref-type="bibr" rid="B47">&#x160;elajev and Gregersen (2017)</xref> presented a runtime state analysis system to detect runtime issues caused by updating Java applications.</p>
<p>Clearly, many different DSU systems and types of systems exist. In fact, as early as 1993, <xref ref-type="bibr" rid="B45">Segal and Frieder (1993)</xref> gave a summary of on-the-fly updating. Some solutions require a complete restructuring of code to support updates, and some even act as an entire operating system. <xref ref-type="bibr" rid="B46">Seifzadeh et&#x20;al. (2013)</xref> have provided a more recent overview of different dynamic software update frameworks and approaches. They also included a categorization of these updating frameworks and described the metrics by which the frameworks are compared. <xref ref-type="bibr" rid="B35">Mugarza et&#x20;al. (2018)</xref> also analyzed existing DSU techniques, focusing on safety and security.</p>
<p>Most approaches require complete control over the environment in which the software is executed. However, developing applications in distributed systems is much more complicated, because process control has been handed to distributed data processing platforms such as Apache Spark (<xref ref-type="bibr" rid="B52">Zaharia et&#x20;al., 2010</xref>), Flink (<xref ref-type="bibr" rid="B53">Carbone et&#x20;al., 2015</xref>), and Storm (<xref ref-type="bibr" rid="B50">Toshniwal et&#x20;al., 2014</xref>) instead of the user code. The platforms handle distribution, scheduling, and execution automatically, and users have only marginal influence in these areas. Implementing the approaches described above would require changes to the distributed processing platforms themselves. However, modifying existing distributed stream processing frameworks is undesirable, as they are meant to act as a general core whose functionality user applications simply consume. Solutions must therefore be found that work within the existing control paradigms of these platforms.</p>
</sec>
<sec id="s2-2">
<title>2.2 Updating Distributed Data Processing Pipelines</title>
<p>Updating the variables of a pipeline is a common way of adding flexibility to distributed pipelines. <xref ref-type="bibr" rid="B7">Boyce and Leger (2020)</xref> provided a solution on top of Apache Spark for changing variables at runtime by extending Broadcast variables. In Apache Flink, one can use the CoFlatMapFunction to combine the original data stream with a parameter stream and assign parameters to each data record, or use Apache Zookeeper (<xref ref-type="bibr" rid="B23">Hunt et&#x20;al., 2010</xref>) to store the configuration and let the Apache Storm application listen for updates. However, all these approaches have two significant limitations: 1) only parameters can be updated, and 2) the updates must be anticipated before the application is launched.</p>
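<p>The common pattern behind these parameter-update mechanisms can be sketched as follows: the processing code reads the current parameter value from a shared store on every invocation, so an external update takes effect without a restart. This is a simplified, single-process analogy of broadcast-variable or Zookeeper-backed configuration, not the actual Spark, Flink, or Storm API; all names are hypothetical.</p>

```python
import threading

# Sketch of the parameter-update pattern: processing code reads the
# latest parameter from a shared store on every record, so an external
# update takes effect without restarting the application.
class ParameterStore:
    def __init__(self, **params):
        self._lock = threading.Lock()
        self._params = dict(params)

    def get(self, name):
        with self._lock:
            return self._params[name]

    def update(self, name, value):
        with self._lock:
            self._params[name] = value

store = ParameterStore(threshold=10)

def process(record):
    # The parameter is looked up at invocation time,
    # not captured once at application launch.
    return record * store.get("threshold")

before = process(2)           # uses threshold 10
store.update("threshold", 5)  # simulated external update
after = process(2)            # uses threshold 5
```

<p>Note the second limitation mentioned above: the lookup of <monospace>threshold</monospace> must already be written into <monospace>process</monospace> before launch; only the value, not the processing logic, can change.</p>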
<p>Another approach is to use variation points as originally defined for Software Product Lines (SPLs) (<xref ref-type="bibr" rid="B42">Pohl et&#x20;al., 2005</xref>). SPL is a concept where reusable components are created for a domain that can then be composed in multiple ways to develop new products. However, the variation points still need to be defined before launching the software. Overcoming this issue, Dynamic Software Product Lines (DSPLs) (<xref ref-type="bibr" rid="B20">Hallsteinsen et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B12">Eichelberger, 2016</xref>) extend SPL to allow the composition of predefined components to be done at runtime.</p>
<p>To apply DSPLs to distributed data processing pipelines, we must first be able to model such pipelines. <xref ref-type="bibr" rid="B5">Berger et&#x20;al. (2014)</xref> and <xref ref-type="bibr" rid="B11">Dhungana et&#x20;al. (2014)</xref> have described approaches to model topological variability, by which they mean connecting components in a specific order and in interconnected hierarchies. These hierarchies then have to be respected during reconfiguration. <xref ref-type="bibr" rid="B44">Qin and Eichelberger (2016)</xref> have previously implemented DSPLs on top of Apache Storm, allowing runtime switching between alternatives, but requiring that they already be implemented at design&#x20;time.</p>
</sec>
<sec id="s2-3">
<title>2.3&#x20;Spark-Dynamic</title>
<p>In a previous work done by the authors (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>), we have investigated the feasibility of dynamically updating the processing pipeline of an Apache Spark application. Apache Spark is one of the most popular big data processing platforms. It is a unified engine providing various operations, including SQL, Machine Learning, Streaming, and Graph Processing. Spark is based on the concept of Resilient Distributed Datasets (RDDs). An RDD is a read-only, distributed collection partitioned into distinct sets distributed over a computing cluster. RDDs form a pipeline which is a DAG where performing an operation over one RDD results in a new RDD. Edges of the Spark DAG are standard operations, e.g., map and reduce, while inside each operation, there is a user-defined function. Internally, the Spark task scheduler uses <italic>scopes</italic> instead of RDDs, where a single operation is represented by one scope but may internally create multiple (temporary)&#x20;RDDs.</p>
<p>The <italic>spark-dynamic</italic> framework is an extension on top of Apache Spark that makes it possible to update both parameters and functions within pipeline steps at runtime. Variation points can be updated using a REST API, where functions are updated by providing new bytecode. Instead of the fixed alternatives of <xref ref-type="bibr" rid="B44">Qin and Eichelberger (2016)</xref>, new algorithms and new versions of algorithms can be used. The extension wraps each operation to pull the updated value for parameters and functions on every invocation. The wrapped methods are named dynamic(Operation), where Operation is the original method name. For example, dynamicMap is a wrapper for the map operation.</p>
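<p>The idea behind these wrappers can be sketched in a few lines: the wrapped operation re-fetches the current user function from its variation point on every invocation, so replacing the function takes effect immediately. This is a toy, single-process analogy of the dynamic(Operation) wrappers, not the <italic>spark-dynamic</italic> API; all names are illustrative.</p>

```python
# Sketch of a variation point: the wrapped map re-fetches the current
# user function on every invocation, so replacing it (e.g., via a
# REST call in spark-dynamic) takes effect on the next invocation.
class VariationPoint:
    def __init__(self, fn):
        self._fn = fn

    def replace(self, fn):
        self._fn = fn

    def current(self):
        return self._fn

def dynamic_map(data, point):
    # Pull the latest function at invocation time.
    return [point.current()(x) for x in data]

vp = VariationPoint(lambda x: x * 2)
first = dynamic_map([1, 2, 3], vp)   # doubling
vp.replace(lambda x: x + 10)         # on-the-fly update
second = dynamic_map([1, 2, 3], vp)  # new function applied
```
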
<p>Apart from updating parameters and functions, <italic>spark-dynamic</italic> can also change data sources on the fly. An intermediate Data Access Layer is introduced to intervene between the Spark processing pipeline and Spark Data Source Relation, which is responsible for preparing the RDD for the requested&#x20;data.</p>
<p>The performance of the prototype was also measured as part of the feasibility study, with promising results (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>). The solutions from this paper are applied on top of this earlier system.</p>
</sec>
<sec id="s2-4">
<title>2.4 Techniques for Building and Checking Pipelines</title>
<p>With research showing the feasibility of modeling and updating distributed processing pipelines, given a distributed computational pipeline with placeholders, we should also be able to automatically select a component for each placeholder so as to satisfy the goal of the pipeline. To ensure that a newly generated pipeline configuration is valid and, for a running pipeline, that parameter updates do not introduce any errors, we must apply some form of consistency checking. When developers want to introduce an update, they are able to change both objects and functions as long as their signatures stay the same. However, these signatures do not describe all details of these updates. For example, a function may take the same types and number of arguments and yet produce different results. Consider <italic>f</italic>(<italic>a</italic>: <italic>Double</italic>, <italic>b</italic>: <italic>Double</italic>) &#x2192; <italic>a</italic>&#x2217;<italic>b</italic> and <italic>g</italic>(<italic>a</italic>: <italic>Double</italic>, <italic>b</italic>: <italic>Double</italic>) &#x2192; <italic>a</italic>/<italic>b</italic>, which share the same signature but should be used differently. Since signature checking alone is not enough, we must investigate other methods of consistency checking. We have focused on the topics of model checking, constraint programming, and automated planning, since these techniques are relatively popular (<xref ref-type="bibr" rid="B17">Ghallab et&#x20;al., 2004</xref>) and extensible and do not require the use of complex features such as a cost function or probability calculations.</p>
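<p>The <italic>f</italic>/<italic>g</italic> example above can be made concrete: both functions have identical signatures, so any type-level check accepts either, and only a behavioral check distinguishes them. A small Python rendering of the same idea (using floats in place of Scala Doubles):</p>

```python
import inspect

# Identical signatures, different semantics: a type check accepts
# either function, but their results diverge on the same inputs.
def f(a: float, b: float) -> float:
    return a * b

def g(a: float, b: float) -> float:
    return a / b

same_signature = inspect.signature(f) == inspect.signature(g)  # True
same_behavior = f(6.0, 3.0) == g(6.0, 3.0)                     # False
```
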
<p>Model checking is a brute-force method of examining all possible states of a system (<xref ref-type="bibr" rid="B3">Baier and Katoen, 2008</xref>) to determine, given a program <italic>M</italic> and a specification <italic>h</italic>, whether the behavior of <italic>M</italic> meets the specification <italic>h</italic> (<xref ref-type="bibr" rid="B13">Emerson, 2008</xref>). The program is represented in specialized languages such as PROMELA or through analysis of source code such as C (<xref ref-type="bibr" rid="B33">Merz, 2001</xref>). The specification can, among other formalisms, be expressed through Linear Temporal Logic (LTL) and Computation Tree Logic (CTL).</p>
<p>We could use model checking to verify if a proposed pipeline configuration is valid. For the generation of configurations, however, we would need to brute-force the search space; i.e.,&#x20;we would need to iterate over every possible configuration until one is found that satisfies the property specification. This method of evaluation is not very efficient and therefore we would also have to investigate heuristics to speed up the process.</p>
<p>In constraint programming, the behavior of a system is specified through constraints, for example, by restricting the domains of individual variables or imposing constraints on groups of variables (<xref ref-type="bibr" rid="B6">Bockmayr and Hooker, 2005</xref>). This is done by having each subsequent constraint restrict the possible values in a constraint store. This way all possible combinations can be tested and a solution can be given if all&#x20;constraints can be met. Basic constraints exist such as <italic>v</italic>
<sub>1</sub>.gt(<italic>v</italic>
<sub>2</sub>) that defines an arithmetic constraint over two variables <italic>v</italic>
<sub>1</sub> and <italic>v</italic>
<sub>2</sub>, as well as global constraints such as <monospace>model</monospace>
<monospace>.allDifferent</monospace>(<italic>v</italic>
<sub>1</sub>,&#x2026;,<italic>v</italic>
<sub>
<italic>n</italic>
</sub>) (<xref ref-type="bibr" rid="B51">van Hoeve, 2001</xref>) that are defined over sets of variables.</p>
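<p>A toy, brute-force rendering of this idea is sketched below: variables have finite domains, and constraints such as <italic>v</italic><sub>1</sub> &#x3e; <italic>v</italic><sub>2</sub> and allDifferent prune the candidate assignments. This is an illustration of the concept only, not a real constraint solver with propagation.</p>

```python
from itertools import product

# Minimal brute-force constraint solver: enumerate all combinations
# of the variable domains and return the first assignment that
# satisfies every constraint (or None if none exists).
def solve(domains, constraints):
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            return assignment
    return None

domains = {"v1": [1, 2, 3], "v2": [1, 2, 3], "v3": [1, 2, 3]}
constraints = [
    lambda a: a["v1"] > a["v2"],                      # arithmetic: v1.gt(v2)
    lambda a: len({a["v1"], a["v2"], a["v3"]}) == 3,  # global: allDifferent
]
solution = solve(domains, constraints)
```

<p>A real solver avoids this exhaustive enumeration by propagating each constraint through the constraint store to shrink the domains, but the declarative style of the model is the same.</p>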
<p>Automated planning is a relatively broad subject, but classical planning is perhaps the most general. Classical planning is based on transition systems. States are connected through transitions, in which applying an action changes one state into a successor state (<xref ref-type="bibr" rid="B17">Ghallab et&#x20;al., 2004</xref>). An action typically has preconditions and effects. The preconditions are propositions required to hold in a state for the action to be applicable, and the effects are the propositions set on the resulting state. Note that, in the literature, an action is typically defined as a ground instance of a planning operator or action template (<xref ref-type="bibr" rid="B16">Ghallab et&#x20;al., 2016</xref>). These operators or templates can include parameters that define the targets to which the preconditions and effects must apply. For simplicity, we will only reason about grounded planning operators in this work, and as such we will use only the term action instead of planning operator. Planning aims to solve a planning problem, which often consists of the transition system, one or more initial states, and one or more goal states.</p>
<p>Different types of transition systems are used, one of which is State Transition Systems (STSs) (<xref ref-type="bibr" rid="B17">Ghallab et&#x20;al., 2004</xref>; <xref ref-type="bibr" rid="B16">Ghallab et&#x20;al., 2016</xref>). In this model, states can be changed not only by actions but also by events, which cannot be controlled. They are defined as &#x3a3; &#x3d; (<italic>S</italic>, <italic>A</italic>, <italic>E</italic>, <italic>&#x3b3;</italic>), with <italic>S</italic> being the set of states, <italic>A</italic> being the set of actions, <italic>E</italic> being the set of events, and the state transition function <italic>&#x3b3;</italic>: <italic>S</italic>&#x20;&#xd7; (<italic>A</italic>&#x20;&#x222a; <italic>E</italic>) &#x2192; 2<sup>
<italic>S</italic>
</sup> (<xref ref-type="bibr" rid="B37">Nau, 2007</xref>). Further, restricted STSs do not allow events; thus, &#x3a3; &#x3d; (<italic>S</italic>, <italic>A</italic>, <italic>&#x3b3;</italic>). In this case, <xref ref-type="bibr" rid="B16">Ghallab et&#x20;al. (2016)</xref> define the transition function as <italic>&#x3b3;</italic>: <italic>S</italic>&#x20;&#xd7; <italic>A</italic>&#x20;&#x2192; <italic>S</italic> since there is only one resulting state for a transition. A planning problem on such a system is defined as <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="script">P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>g</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, with <italic>s</italic>
<sub>0</sub> being the initial state and <italic>g</italic> being the set of goal states.</p>
<p>
<xref ref-type="bibr" rid="B17">Ghallab et&#x20;al. (2004)</xref> distinguish three representations of classical automated planning: set-theoretic representation, classical representation, and state-variable representation.</p>
<p>In the set-theoretic representation, each state is a set of propositions, and each action has propositions that are required to apply the action (preconditions), propositions that will be added to a new state when the action is applied (positive effects), and propositions that will be removed (negative effects). The classical representation is similar to the set-theoretic representation, except states are logical atoms that are either true or false, and actions change the truth values of these atoms. In the state-variable representation, each state contains the values for a set of variables, where different states contain different values for these variables and actions are partial functions that map between these tuples.</p>
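<p>A small sketch may make the state-variable representation concrete (the variables here are hypothetical): a state assigns a value to every variable, and an action is a partial function that maps one assignment to another:</p>

```python
def calibrate_output(state):
    """Hypothetical action as a partial function over states: it is only
    defined where its precondition on a state variable holds."""
    if state["sensor"] != "calibrated":  # precondition
        return None                      # action not applicable in this state
    new = dict(state)                    # effects update some variables;
    new["unit"] = "celsius"              # untouched variables keep their values
    return new

s0 = {"sensor": "calibrated", "unit": "raw"}
print(calibrate_output(s0))  # {'sensor': 'calibrated', 'unit': 'celsius'}
print(calibrate_output({"sensor": "drifting", "unit": "raw"}))  # None
```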
<p>These base techniques are often extended. For example, some authors describe the use of model checking to solve planning problems (<xref ref-type="bibr" rid="B9">Cimatti et&#x20;al., 1998</xref>; <xref ref-type="bibr" rid="B18">Giunchiglia and Traverso, 2000</xref>). Similarly, <xref ref-type="bibr" rid="B24">Kaldeli (2013)</xref> describes a planner built through constraint programming, based on earlier work by <xref ref-type="bibr" rid="B29">Lazovik et&#x20;al. (2005)</xref> and <xref ref-type="bibr" rid="B30">Lazovik et&#x20;al. (2006)</xref>. The planning definitions, such as the actions and goals, are first translated into constraints. Then, the constraints are evaluated using an off-the-shelf Constraint Satisfaction Problem (CSP) solver.</p>
<p>When comparing model checking and automated planning, both techniques have the same time complexity (EXPTIME or NEXPTIME for nondeterministic planning) (<xref ref-type="bibr" rid="B17">Ghallab et&#x20;al., 2004</xref>; <xref ref-type="bibr" rid="B32">Meier et&#x20;al., 2008</xref>). However, instead of the brute-force validation of configurations when using model checking, automated planning can use heuristics to reduce the search space, which could decrease the final planning time. Furthermore, automated planning techniques use proven concepts for the generation of plans which could serve as a basis on top of which we define our pipeline concepts, whereas for model checking, the basic configuration generation algorithms would still need to be designed.</p>
<p>Finally, we compare constraint programming and automated planning. The automated planning concepts are more abstract, separating the planning goal from the state and transition behaviors. Planning has a solid foundation with its transition systems, while constraint programming permits a lot of freedom in the behavior that can be built using constraints. Instead of choosing between these two techniques, we can combine them. We can use constraint programming as a basis to create an automated planner, by implementing the transition systems and other planning concepts using constraints. We then retain the flexibility of constraint programming in case we want to add features to the planner&#x20;later.</p>
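<p>The combination can be sketched by unrolling the transition system to a fixed horizon and treating each step as a constrained variable; the toy actions and the exhaustive search below are assumptions for illustration, standing in for a real CSP solver:</p>

```python
from itertools import product

# Hypothetical action set over a single numeric state variable.
ACTIONS = {"inc": lambda x: x + 1, "double": lambda x: x * 2}
HORIZON = 3

def plan_as_csp(start, goal):
    """Enumerate every length-HORIZON action sequence; the transition
    function acts as the constraint linking consecutive state variables,
    and the goal is a constraint on the final state variable."""
    for seq in product(ACTIONS, repeat=HORIZON):
        value = start
        for name in seq:
            value = ACTIONS[name](value)
        if value == goal:
            return list(seq)
    return None

print(plan_as_csp(1, 6))  # ['inc', 'inc', 'double']
```

<p>A CSP solver replaces the exhaustive loop with constraint propagation, which is exactly the flexibility we want to retain for later extensions of the planner.</p>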
</sec>
</sec>
<sec id="s3">
<title>3 General Overview</title>
<p>We designed a system to help the designer plan a pipeline while supporting runtime updates in distributed systems. The proposed system is depicted in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. There are three types of user activities: (I) submitting a new pipeline, (II) creating or updating modules, and (III) requesting that the pipeline be replanned at runtime. Our design is platform-independent and could be implemented on top of various distributed data processing platforms. The blue components form the basis of almost every such platform, and the green ones are the extra components that we propose, which can be seen as plugins to existing platforms.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Conceptual overview of the system.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g002.tif"/>
</fig>
<p>The user&#x2019;s workflow remains largely the same as before: developing the code for the designed pipeline, describing what each step does and how the steps are connected. In addition, we help the user decide what each step should do to reach the goal. The user can create a pipeline where the steps of the pipeline contain calls to variation points instead of directly containing user code. The code referenced from the variation points is not fixed and can be updated at runtime. The user can also annotate the pipeline with constraints, such as specifying initial conditions and goals. Finally, the user submits the pipeline code to the pipeline manager.</p>
<p>Apart from the pipeline, the user also submits code that can be executed from the variation points. This code is prepared as a <monospace>PlanningModule</monospace>, which contains the user code as well as metadata such as the function signature and user constraints. These constraints describe the functionality of the module in such a way that the planner can determine if the module is needed to fulfill a pipeline&#x2019;s&#x20;goal.</p>
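<p>To make the idea concrete, a <monospace>PlanningModule</monospace> could be represented as follows; the field names and constraint strings are illustrative assumptions, not the actual data model:</p>

```python
from dataclasses import dataclass, field

@dataclass
class PlanningModule:
    """Illustrative shape of a module submission: user code plus the
    metadata and constraints the planner needs (field names are assumed)."""
    name: str
    version: int
    code: str              # user code executed at a variation point
    signature: str         # function signature metadata
    preconditions: list = field(default_factory=list)
    effects: list = field(default_factory=list)

mod = PlanningModule(
    name="parse_json",
    version=2,
    code="def run(record): ...",
    signature="run(record: str) -> dict",
    preconditions=['format =:= "json"'],
    effects=['parsed := true'],
)
print(mod.name, mod.version)  # parse_json 2
```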
<p>After submission, the pipeline manager compiles the pipeline code into tasks and pipeline information. The tasks correspond directly to pipeline steps, which may have fixed functionality or contain a variation point. The pipeline manager then distributes the tasks among the cluster of workers for execution. The pipeline manager and workers can be mapped to any platform; for example, they can be a Spark Driver and Workers or a Hadoop JobTracker and TaskTrackers. Simultaneously, the pipeline manager submits a planning request to the planner. The request contains the pipeline information describing the steps, inputs, desired output, and variation points.</p>
<p>The planner then decides on the modules that need to be placed at each variation point in the pipeline to meet the goals. If there is a feasible assignment, the planner will send it to the coordinator.</p>
<p>At the same time, the workers start executing tasks. Each time a variation point is executed, the worker requests the coordinator to hand it the respective module to run. The coordinator then fetches the module from the repository according to the planner&#x2019;s assignment, or waits until a plan is available. The workers and the coordinator continue to run the pipeline collaboratively, with the worker executing the modules provided by the coordinator for each variation&#x20;point.</p>
<p>The module repository contains <monospace>PlanningModules</monospace> and their different versions. Note that the assignment contains the modules&#x2019; versions, and the coordinator will continue to use the assigned version unless a new assignment arrives.</p>
<p>After the pipeline has started, the user can also manually request the planner to update the assignments. The planner will do the same as before while also fetching the new and updated modules from the repository. If the planner finds a new feasible assignment, it will inform the coordinator about the updates; otherwise, the assignments will remain intact. Any updates to the assignments of variation points must go through the planner to ensure that the pipeline is consistent and the update will not break&#x20;it.</p>
</sec>
<sec id="s4">
<title>4 Planner Design for Pipeline Reconfiguration</title>
<p>The planning process described in this section performs two roles at once, both generating and validating configurations. If we represent the planning problem using constraints, any valid assignment of variables that satisfies all constraints can be regarded as a valid plan. Given an encoding of the planning problem as a CSP, the constraint solver would be able to perform the planning process by attempting each possible configuration of actions. We use classical planning techniques as they have been shown to be appropriate for restricted STSs. We use the state-variable representation for its expressiveness as well as the similarity of concepts with constraint programming.</p>
<sec id="s4-1">
<title>4.1 Core Planning Model</title>
<p>Before we describe how to transform our pipeline information into a planning problem, we must first formulate the problem of pipeline reconfiguration as a planning problem.</p>
<p>We modify the classical definitions presented by <xref ref-type="bibr" rid="B17">Ghallab et&#x20;al. (2004)</xref>: our planning problem does not contain a single initial state, our transition system can contain branches and joins, and our plans must match the topology of the Spark pipeline we are planning for. Thus, we define our planning problem as<disp-formula id="e1">
<mml:math id="m2">
<mml:mi mathvariant="script">P</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>With the transition system &#x3a3; &#x3d; (<italic>S</italic>, <italic>A</italic>, <italic>&#x3b3;</italic>) and with <inline-formula id="inf2">
<mml:math id="m3">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> to map the planning problem to our domain, we simplify the notation <italic>s</italic>
<sub>
<italic>m</italic>
</sub> &#x2208; <italic>&#x3b3;</italic>(<italic>s</italic>
<sub>
<italic>n</italic>
</sub>, <italic>a</italic>), that is, the application of action <italic>a</italic> onto <italic>s</italic>
<sub>
<italic>n</italic>
</sub> resulting in <italic>s</italic>
<sub>
<italic>m</italic>
</sub>, as <inline-formula id="inf3">
<mml:math id="m4">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. Each state in <italic>S</italic> is represented by state variables; i.e.,<disp-formula id="e2">
<mml:math id="m5">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>Here, <italic>Vars</italic> is the set of variables in our planning problem, <italic>Dom</italic>(<italic>v</italic>) is the domain of a specific variable <italic>v</italic> (i.e.,&#x20;all possible assignments to an instance of that variable), and <italic>val</italic>(<italic>s</italic>, <italic>v</italic>) represents the value or possible values of the state-variable representing variable <italic>v</italic> in state <italic>s</italic>. <italic>S</italic>
<sub>0</sub> &#x2286; <italic>S</italic> is the set of initial states and <italic>S</italic>
<sub>
<italic>g</italic>
</sub> &#x2286; <italic>S</italic> is the set of goal states. Note also that there are no states before the initial states, i.e.,&#x20;(<italic>&#x2200;s</italic>
<sub>0</sub> &#x2208; <italic>S</italic>
<sub>0</sub>)(<italic>&#x2200;s</italic>
<sub>
<italic>n</italic>
</sub> &#x2208; <italic>S</italic>)(<italic>&#x2200;a</italic> &#x2208; <italic>A</italic>): <italic>s</italic>
<sub>0</sub>&#x2209;<italic>&#x3b3;</italic>(<italic>s</italic>
<sub>
<italic>n</italic>
</sub>, <italic>a</italic>), and there are no states after the goal states, i.e.,&#x20;(<italic>&#x2200;s</italic>
<sub>
<italic>g</italic>
</sub> &#x2208; <italic>S</italic>
<sub>
<italic>g</italic>
</sub>)(<italic>&#x2200;a</italic> &#x2208; <italic>A</italic>): <italic>&#x3b3;</italic>(<italic>s</italic>
<sub>
<italic>g</italic>
</sub>, <italic>a</italic>) &#x3d; &#x2205;.</p>
<p>As in <xref ref-type="bibr" rid="B17">Ghallab et&#x20;al. (2004)</xref>, we define an action as<disp-formula id="e3">
<mml:math id="m6">
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>Supported preconditions and effects are variable equality (with either a constant or another variable), constraint conjunction, constraint disjunction, and the negation of these constraints [e.g., <monospace>&#x201c;(a :&#x3d; 1) &#x26;:&#x26; (b !:&#x3d; true)&#x201d;</monospace>]. As in (<xref ref-type="bibr" rid="B24">Kaldeli, 2013</xref>), each constraint is a propositional formula over a state variable or a combination of two constraints. To apply <inline-formula id="inf4">
<mml:math id="m7">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>, we must ensure that <italic>s</italic>
<sub>
<italic>n</italic>
</sub> satisfies all preconditions of the action and that the effects of the action can be applied onto <italic>s</italic>
<sub>
<italic>m</italic>
</sub>. We can encode this with a generalized &#x201c;<italic>state</italic> satisfies <italic>constraints</italic>&#x201d; relation, where <italic>s</italic>
<sub>
<italic>n</italic>
</sub> satisfies <italic>precond</italic>(<italic>a</italic>) and <italic>s</italic>
<sub>
<italic>m</italic>
</sub> satisfies <italic>effects</italic>(<italic>a</italic>). We define this relation, using <italic>Cstrs</italic> as the set of all constraints in the planning problem, as<disp-formula id="e4">
<mml:math id="m8">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m9">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mspace width="-26.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>e</mml:mi>
<mml:mi>q</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>m</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>n</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>q</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-1.0pt"/>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2228;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-3.0pt"/>
<mml:mo>&#xac;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2228;</mml:mo>
<mml:mo>&#xac;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-4.0pt"/>
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mspace width="-3.0pt"/>
<mml:mo>&#xac;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2227;</mml:mo>
<mml:mo>&#x00ac;</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>&#x2227;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mspace width="-5.0pt"/>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(5)</label>
</disp-formula>where a constraint is described by a source <italic>a</italic>, a target <italic>b</italic>, and an operation <italic>op</italic>, with <italic>op</italic> translated from a constraint as defined by <xref ref-type="table" rid="T1">Table&#x20;1</xref>.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Mapping between constraints and their representation in the planning problem as <italic>op</italic>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Op</th>
<th align="center">Constraint</th>
<th align="center">Constraint type</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="2" align="left">eq</td>
<td align="center">
<monospace>a &#x3d;:&#x3d; b</monospace>
</td>
<td align="left">Precondition</td>
</tr>
<tr>
<td align="center">
<monospace>a :&#x3d; b</monospace>
</td>
<td align="left">Effect</td>
</tr>
<tr>
<td rowspan="2" align="left">neq</td>
<td align="center">
<monospace>a !:&#x3d; b</monospace>
</td>
<td align="left">Precondition</td>
</tr>
<tr>
<td align="center">
<monospace>a !:&#x3d; b</monospace>
</td>
<td align="left">Effect</td>
</tr>
<tr>
<td align="left">and</td>
<td align="center">
<monospace>a &#x26;:&#x26; b</monospace>
</td>
<td align="left">Both</td>
</tr>
<tr>
<td align="left">or</td>
<td align="center">
<inline-formula id="inf5">
<mml:math id="m10">
<mml:mi mathvariant="monospace">a</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi mathvariant="monospace">b</mml:mi>
</mml:math>
</inline-formula>
</td>
<td align="left">Both</td>
</tr>
<tr>
<td align="left">nand</td>
<td align="center">
<monospace>(not) a &#x26;:&#x26; b</monospace>
</td>
<td align="left">Both</td>
</tr>
<tr>
<td align="left">nor</td>
<td align="center">
<inline-formula id="inf6">
<mml:math id="m11">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="monospace">n</mml:mi>
<mml:mi mathvariant="monospace">o</mml:mi>
<mml:mi mathvariant="monospace">t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi mathvariant="monospace">a</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi mathvariant="monospace">b</mml:mi>
</mml:math>
</inline-formula>
</td>
<td align="left">Both</td>
</tr>
</tbody>
</table>
</table-wrap>
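As an illustration of the mapping in <xref ref-type="table" rid="T1">Table&#x20;1</xref>, the following is a hypothetical Python sketch (not the authors' implementation; the dictionary, function name, and negation handling are our own assumptions) of translating a constraint such as <monospace>a &#x3d;:&#x3d; b</monospace> into an (<italic>a</italic>, <italic>op</italic>, <italic>b</italic>) triple:

```python
# Hypothetical sketch of the Table 1 mapping: each constraint symbol is
# translated into the planner's op, tagged with where it may appear.
CONSTRAINT_OPS = {
    "=:=": ("eq",  "precondition"),
    ":=":  ("eq",  "effect"),
    "!:=": ("neq", "both"),   # Table 1 lists neq for both preconditions and effects
    "&:&": ("and", "both"),
    "|:|": ("or",  "both"),
}

def translate(source, symbol, target, negated=False):
    """Translate a constraint like `a =:= b` into an (a, op, b) triple."""
    op, usage = CONSTRAINT_OPS[symbol]
    if negated:  # `(not) a &:& b` -> nand, `(not) a |:| b` -> nor
        op = {"and": "nand", "or": "nor"}[op]
    return (source, op, target), usage

print(translate("a", "&:&", "b", negated=True))  # (('a', 'nand', 'b'), 'both')
```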
<p>We also formulate the <italic>frame axiom</italic> similarly to <xref ref-type="bibr" rid="B4">Bart&#xe1;k et&#x20;al. (2010)</xref> and <xref ref-type="bibr" rid="B24">Kaldeli (2013)</xref>, which specifies that a state variable keeps its value in the state following an action unless that action modifies it:<disp-formula id="e6">
<mml:math id="m12">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2228;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>B</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(6)</label>
</disp-formula>
<disp-formula id="e7">
<mml:math id="m13">
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>B</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2261;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>
<disp-formula id="e8">
<mml:math id="m14">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable class="cases">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>a</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>b</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>b</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>a</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2009;</mml:mo>
<mml:mo>&#x2228;</mml:mo>
<mml:mo>&#x2009;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if&#x2009;</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>f</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(8)</label>
</disp-formula>With the base planning model fully described, we extend it below to support the planning of pipelines.</p>
</sec>
<sec id="s4-2">
<title>4.2 Mapping to the Distributed Pipeline</title>
<p>Since the plans we generate must be applied onto a fixed Spark pipeline topology, we cannot simply generate any plan that satisfies the goal constraints. Instead, we must relate our STS (<italic>State</italic> &#x2192; <italic>Action</italic> &#x2192; <italic>State</italic>) to the Spark DAG (<italic>Scope</italic> &#x2192; <italic>Operation</italic> &#x2192; <italic>Scope</italic>). To implement this mapping, we first construct a separate transition system to which the final plan must be isomorphic. That is, any relation in the STS should be represented in the DAG and vice versa. We define this isomorphic system as<disp-formula id="e9">
<mml:math id="m15">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
</p>
<p>We describe the parts of this definition in this section. The set <italic>R</italic> represents the RDD <italic>scopes</italic> in the pipeline, and <italic>T</italic> represents the Spark operations that transform one RDD scope into a new scope. The transition function <inline-formula id="inf7">
<mml:math id="m16">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>R</mml:mi>
</mml:math>
</inline-formula> describes how the transitions apply to the RDD scopes.</p>
<p>Besides adding constraints to actions, we also allow constraints on the pipeline topology itself, either specified manually by the user or inferred automatically from the topology, such as the datatype of the initial RDD. Preconditions added by the user are applied to the scope they are defined on, while effects are applied to the scope(s) following&#x20;it.</p>
<p>We connect these transition systems through the relation <italic>rdd</italic>: <italic>S</italic>&#x20;&#x2192; <italic>R</italic>, where<disp-formula id="e10">
<mml:math id="m17">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2227;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2227;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x03B3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2227;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;satisfies&#x2009;</mml:mtext>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x222a;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(10)</label>
</disp-formula>
</p>
<p>
<xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows an example mapping between planning transition system &#x3a3; and a Spark DAG <inline-formula id="inf8">
<mml:math id="m18">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, based on the scenario from <xref ref-type="sec" rid="s1">Section 1</xref>. In this example, <italic>&#x3b3;</italic> yields, for some states, multiple applicable actions (e.g., both <italic>a</italic><sub>1</sub> and <italic>a</italic><sub>2</sub> can be applied from the first state) or multiple possible result states for a single action (e.g., <italic>a</italic><sub>1</sub> from the first state leads to two next states when an <italic>or-constraint</italic> is used).</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Example mapping between a Spark DAG <inline-formula id="inf9">
<mml:math id="m19">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> <bold>(bottom)</bold> and STS &#x3a3; <bold>(top)</bold>.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g003.tif"/>
</fig>
<p>Since we only support planning for our variation points, we do not include in our planning problem any scopes that do not contain a variation point. However, we do add a scope at the end of the pipeline so that we can attach the goal constraints, even if the final action of the pipeline does not contain a variation&#x20;point.</p>
<p>We also add special handling for join operations. We show this in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> as a transition function in the form of <italic>&#x3b3;</italic>(<italic>s</italic>
<sub>
<italic>n</italic>
</sub>, <italic>i</italic>
<sub>
<italic>n</italic>
</sub>). These transitions are applied when there is a transition in the Spark DAG that the user has no control over. For example, when joining two RDDs, the user cannot provide their own function that will be applied during the join. Nevertheless, we want to encode these transitions in order to accurately represent the pipeline. We introduce these <italic>implicit transitions</italic> in <xref ref-type="sec" rid="s4-2-3">Section&#x20;4.2.3</xref>.</p>
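To make the join case concrete, the following is a minimal plain-Python sketch (not Spark code; the data is made up) of why a join is an implicit transition: its output shape is fixed by the framework rather than by any user-supplied function.

```python
# Two toy key/value datasets standing in for the RDDs being joined.
left = [("a", 1), ("b", 2)]
right = [("a", "x"), ("b", "y")]

# An inner join over keys always produces (key, (left_value, right_value));
# there is no hook where the user could substitute a different combine step,
# which is what makes the corresponding transition "implicit".
joined = sorted(
    (k, (lv, rv))
    for (k, lv) in left
    for (k2, rv) in right
    if k == k2
)
print(joined)  # [('a', (1, 'x')), ('b', (2, 'y'))]
```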
<p>We use the relation <italic>vp</italic>: <italic>T</italic>&#x20;&#x2192; <italic>VP</italic> to map our transitions to variation points. This allows us to take into account additional constraints relating to the topology, such as the input/output combinations from <xref ref-type="table" rid="T2">Table&#x20;2</xref>, which we will discuss in <xref ref-type="sec" rid="s4-2-1">Section 4.2.1</xref>. Additionally, we encode a constraint that ensures that if a single variation point is used in multiple transitions in a pipeline, the actions applied in those transitions are the same. This is done because one variation point can only hold a single <monospace>PlanningModule</monospace> and therefore can only be assigned a single action.<disp-formula id="e11">
<mml:math id="m20">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right"/>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right"/>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right"/>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mi>v</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(11)</label>
</disp-formula>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Operation shape categories, where T and U are placeholders for some type and (T, U) represents a tuple of types T and U.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="2" align="center">RDD</th>
<th colspan="2" align="center">Function</th>
<th rowspan="2" align="center">Example operation</th>
<th rowspan="2" align="center">Compatible categories</th>
</tr>
<tr>
<th align="center">Pre</th>
<th align="center">Post</th>
<th align="center">In</th>
<th align="center">Out</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4" align="center">T</td>
<td rowspan="4" align="center">U</td>
<td rowspan="4" align="center">T</td>
<td rowspan="4" align="center">U</td>
<td rowspan="4" align="center">map</td>
<td align="center">OneToOne</td>
</tr>
<tr>
<td align="center">OneToPair</td>
</tr>
<tr>
<td align="center">PairToOne</td>
</tr>
<tr>
<td align="center">PairToPair</td>
</tr>
<tr>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">Boolean</td>
<td rowspan="2" align="center">filter</td>
<td align="center">OneToOne<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
</tr>
<tr>
<td align="center">PairToOne<xref ref-type="table-fn" rid="Tfn1">
<sup>a</sup>
</xref>
</td>
</tr>
<tr>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">(T, T)</td>
<td rowspan="2" align="center">T</td>
<td rowspan="2" align="center">reduce</td>
<td align="center">PairToOne</td>
</tr>
<tr>
<td align="center">PairToPair<xref ref-type="table-fn" rid="Tfn2">
<sup>b</sup>
</xref>
</td>
</tr>
<tr>
<td rowspan="2" align="center">(T,U)</td>
<td rowspan="2" align="center">(T,U)</td>
<td rowspan="2" align="center">(U, U)</td>
<td rowspan="2" align="center">U</td>
<td rowspan="2" align="center">reduceByKey</td>
<td align="center">PairToOne</td>
</tr>
<tr>
<td align="center">PairToPair<xref ref-type="table-fn" rid="Tfn3">
<sup>c</sup>
</xref>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="Tfn1">
<label>a</label>
<p>If action output is of type <italic>Boolean</italic>.</p>
</fn>
<fn id="Tfn2">
<label>b</label>
<p>If T is also a&#x20;tuple.</p>
</fn>
<fn id="Tfn3">
<label>c</label>
<p>If U is also a&#x20;tuple.</p>
</fn>
</table-wrap-foot>
</table-wrap>
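The shape categories of <xref ref-type="table" rid="T2">Table&#x20;2</xref> can be illustrated with a small plain-Python sketch (the data and functions are our own; the point is only the function signatures that each operation shape admits):

```python
from functools import reduce

# A toy stand-in for an RDD of key/value pairs, i.e. the (T, U) rows of Table 2.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# map: a function T -> U may change the element type (OneToOne .. PairToPair).
mapped = [v * 10 for (_, v) in pairs]

# filter: the function must return Boolean; the element type is unchanged.
filtered = [(k, v) for (k, v) in pairs if v > 1]

# reduce: a function (T, T) -> T folds the dataset into a single value.
total = reduce(lambda x, y: x + y, mapped)

# reduceByKey: a function (U, U) -> U merges the values that share a key.
by_key = {}
for k, v in pairs:
    by_key[k] = by_key[k] + v if k in by_key else v

print(total)                   # 60
print(sorted(by_key.items()))  # [('a', 4), ('b', 2)]
```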
<p>Expanding our definition of the transition function <inline-formula id="inf10">
<mml:math id="m21">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> with the isomorphism requirement, we get<disp-formula id="e12">
<mml:math id="m22">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d4;</mml:mo>
<mml:mspace width="0.28em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;satisfies&#x2009;</mml:mtext>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;satisfies&#x2009;</mml:mtext>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mspace width="2em"/>
<mml:mspace width="2em"/>
<mml:mo>&#x2228;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>B</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;satisfies&#x2009;</mml:mtext>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mspace width="0.3333em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;satisfies&#x2009;</mml:mtext>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(12)</label>
</disp-formula>
</p>
<sec id="s4-2-1">
<title>4.2.1 Data and Operation Types</title>
<p>An important property of Apache Spark pipelines and of most distributed data processing pipelines in general is typed data. Each vertex in the DAG has a type, and operations can change the data type. We, therefore, add <italic>Type</italic> as a variable to each planning state, and actions can change this type from one state to the next. In order to properly represent the intricacies of a Spark pipeline, we treat the <italic>Type</italic> variable differently from other state variables. In the first place, this is because we can infer the appropriate type constraints from the provided actions and pipeline topology. A second reason is that Spark treats RDDs differently depending on whether they contain a single value or a tuple of two values (a value pair). For example, some operations (such as reduceByKey) require an RDD with a value pair to be applicable, while for others, it does not matter. Furthermore, some operations (such as map and reduce) can transform between these categories, while others require the input/output multiplicity to be the same (such as filter). In our planner, we must therefore distinguish between these input/output categories, labeled <monospace>OneToOne, OneToPair, PairToPair,</monospace> and <monospace>PairToOne</monospace>. We store this category in the variation point and then ensure only appropriate actions are selected, with &#x201c;categoryOf&#x201d; as a lookup for the category:<disp-formula id="e13">
<mml:math id="m23">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2227;</mml:mo>
<mml:mo>&#x2009;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x2009;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x2227;</mml:mo>
<mml:mtext>&#x2009;categoryOf&#x2009;</mml:mtext>
<mml:mi>v</mml:mi>
<mml:mi>p</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>&#x2009;categoryOf&#x2009;</mml:mtext>
<mml:mi>a</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(13)</label>
</disp-formula>
</p>
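The category check in Eq. 13 can be sketched as a simple filter over candidate actions. This is an illustrative Python sketch of the idea only, not the authors' Scala implementation; the action names and the <monospace>applicable_actions</monospace> helper are hypothetical.

```python
# Sketch (assumption: illustrative names, not the paper's code) of selecting
# only actions whose input/output category matches the category stored in a
# variation point, mirroring the "categoryOf" comparison in Eq. 13.
from dataclasses import dataclass

CATEGORIES = {"OneToOne", "OneToPair", "PairToPair", "PairToOne"}

@dataclass(frozen=True)
class Action:
    name: str
    category: str  # one of CATEGORIES

def applicable_actions(vp_category: str, actions: list) -> list:
    """Keep only actions whose category equals the variation point's category."""
    if vp_category not in CATEGORIES:
        raise ValueError(f"unknown category: {vp_category}")
    return [a for a in actions if a.category == vp_category]

acts = [
    Action("wordCount", "OneToPair"),
    Action("sumValues", "PairToOne"),
    Action("cleanLine", "OneToOne"),
]
print([a.name for a in applicable_actions("OneToOne", acts)])  # ['cleanLine']
```

Only actions whose category matches the variation point survive the filter; the planner then chooses among those via the CSP.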
<p>Apart from the RDD types, the applicability of an action also depends on the logic of the operation itself. For example, consider an RDD of type T. When a <monospace>plannedFilter</monospace> operation is applied, the filter function receives the same type T as input but returns a <monospace>Boolean</monospace>. However, Spark uses that result to filter the dataset and returns an RDD that still contains the same type T. The different combinations used by Spark are listed in <xref ref-type="table" rid="T2">Table&#x20;2</xref>.</p>
<p>To be able to handle all required type restrictions, we have split our <italic>Type</italic> variable into a <italic>TypeLeft</italic> and a <italic>TypeRight</italic> variable, representing the left and right elements of a tuple, respectively. For types that are not tuples, only the value for <italic>TypeLeft</italic> will be set, and <italic>TypeRight</italic> will be set to a null value. This allows us to specify constraints based on only one part of the data type, for example, with <italic>TypeLeft</italic> &#x3d; <italic>Boolean</italic> for plannedFilter.</p>
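The split into <italic>TypeLeft</italic> and <italic>TypeRight</italic> can be sketched as follows. This is a hypothetical helper for illustration (flat pair types written as "(Left, Right)"); the paper's own encoding lives in its Scala planner.

```python
# Sketch (assumption: simple "(Left, Right)" spelling for pair types; nested
# tuples are out of scope) of splitting an RDD element type into the TypeLeft
# and TypeRight state variables, with a null TypeRight for non-tuple types.

def split_type(type_name: str):
    """Return (TypeLeft, TypeRight); TypeRight is None for non-tuple types."""
    if type_name.startswith("(") and type_name.endswith(")") and "," in type_name:
        left, right = type_name[1:-1].split(",", 1)
        return left.strip(), right.strip()
    return type_name, None

print(split_type("(String, Int)"))  # ('String', 'Int')
print(split_type("Boolean"))        # ('Boolean', None)
```

A constraint such as <italic>TypeLeft</italic> = <italic>Boolean</italic> for plannedFilter can then be stated on one half of the pair without fixing the other.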
<p>To distinguish between the RDD type and Function type from <xref ref-type="table" rid="T2">Table&#x20;2</xref>, type information is encoded differently depending&#x20;on the&#x20;Spark operation used. For example, for <monospace>plannedReduceByKey</monospace> with a <monospace>PairToOne</monospace> module, we encode our constraints as<disp-formula id="e14">
<mml:math id="m24">
<mml:mtable class="aligned">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo>&#x2227;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>y</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>R</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:mo>&#x3d;</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(14)</label>
</disp-formula>with <italic>inLeft</italic>(<italic>a</italic>) and <italic>inRight</italic>(<italic>a</italic>) describing the input types for action <italic>a</italic> and <italic>outLeft</italic>(<italic>a</italic>) describing the output type for <italic>a</italic>. Since <italic>a</italic> represents a PairToOne module, <italic>outRight</italic>(<italic>a</italic>) would return &#x2205;.</p>
</sec>
<sec id="s4-2-2">
<title>4.2.2&#x20;No-Operation</title>
<p>Some pipeline topologies may contain more operations than required to fulfill the planning goal. We introduce the no-op action <italic>&#x3c4;</italic> (<xref ref-type="bibr" rid="B39">De Nicola and Vaandrager, 1990</xref>; <xref ref-type="bibr" rid="B17">Ghallab et&#x20;al., 2004</xref>) so that we can still assign an action to every variation point in the pipeline, while not performing unnecessary computations. We create both a <monospace>NoOpOneToOne</monospace> and a <monospace>NoOpPairToPair</monospace> action but do not include no-op actions for the <monospace>OneToPair</monospace> and <monospace>PairToOne</monospace> categories as performing any operation on them would not be idempotent.</p>
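The restriction of no-op actions to the multiplicity-preserving categories can be sketched as below. The helper name is our own, for illustration only.

```python
# Sketch (assumption: illustrative helper, not the authors' code): no-op
# actions exist only for the categories that preserve the data shape, since a
# OneToPair or PairToOne step necessarily changes the element multiplicity
# and therefore cannot be replaced by an identity step.

def noop_categories(categories):
    """Return the categories that admit a no-op (identity) action."""
    preserving = {"OneToOne", "PairToPair"}
    return [c for c in categories if c in preserving]

all_cats = ["OneToOne", "OneToPair", "PairToPair", "PairToOne"]
print(noop_categories(all_cats))  # ['OneToOne', 'PairToPair']
```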
</sec>
<sec id="s4-2-3">
<title>4.2.3 Join States</title>
<p>Finally, we must support pipelines that contain join operations such as the union of two datasets. To simplify our planner, we restrict our implementation to join operations that do not change the data type. We initially support the <monospace>union</monospace> and <monospace>intersection</monospace> operations.</p>
<p>To illustrate why we need to explicitly add support for join operations, consider the example illustrated in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. Here, two states with conflicting state variables are joined. For variable <italic>y</italic>, it is clear that, in <italic>s</italic>
<sub>4</sub>, the state variable <italic>y</italic>&#x20;&#x3d; 2. However, for variable <italic>x</italic>, it could be true that either <italic>x</italic>&#x20;&#x3d; 0 or <italic>x</italic>&#x20;&#x3d;&#x20;1.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Three views of an example transition system with a join state and conflicting state variables. Variable <italic>x</italic> in <italic>s</italic>
<sub>4</sub> could have multiple values.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g004.tif"/>
</fig>
<p>There are two strategies to resolve this conflict:<list list-type="simple">
<list-item>
<p>&#x2022; Require equality</p>
<list list-type="simple">
<list-item>
<p>If we add a constraint requiring both values to be equal [e.g., <italic>val</italic>(<italic>s</italic>
<sub>3</sub>, <italic>v</italic>) &#x3d; <italic>val</italic>(<italic>s</italic>
<sub>1</sub>, <italic>v</italic>) &#x3d; <italic>val</italic>(<italic>s</italic>
<sub>2</sub>, <italic>v</italic>)], there will be no uncertain values in <italic>s</italic>
<sub>4</sub>. The actions applied onto <italic>s</italic>
<sub>1</sub> or <italic>s</italic>
<sub>2</sub> will have to be changed so that all state variables end up with the same values, in order to generate a valid&#x20;plan.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Accept either</p>
<list list-type="simple">
<list-item>
<p>By accepting either value, we say <italic>val</italic>(<italic>s</italic>
<sub>3</sub>, <italic>v</italic>) &#x3d; <italic>val</italic>(<italic>s</italic>
<sub>1</sub>, <italic>v</italic>) &#x2228;&#x20;<italic>val</italic>(<italic>s</italic>
<sub>3</sub>, <italic>v</italic>) &#x3d; <italic>val</italic>(<italic>s</italic>
<sub>2</sub>, <italic>v</italic>). This introduces nondeterministic behavior in the steps following <italic>s</italic>
<sub>3</sub>. We do not apply this functionality to the <italic>Type</italic> variable, as the RDD resulting from the join must always be set to a single data&#x20;type.</p>
</list-item>
</list>
</list-item>
</list>
</p>
<p>The &#x201c;require equality&#x201d; strategy can be too rigid for certain pipelines, while the &#x201c;accept either&#x201d; strategy can greatly enlarge the search space of our planner since there are more possibilities to try. We, therefore, allow the user to set the resolution strategy for each join individually.</p>
<p>Since join operations do not allow custom code, we should not allow the selection of any action for these operations. We, therefore, modify the STS to include <italic>implicit transitions</italic> that reach the next state without an action being applied. We could redefine <italic>&#x3b3;</italic> to support this, but for simplicity, we instead define an implicit transition as a regular transition that always applies only the no-op action.</p>
<p>We also add <italic>intermediary states</italic> on which constraints of the transitions before the join states are encoded. If we directly encode the constraints onto the join state, both sets of constraints must always hold, and the &#x201c;accept either&#x201d; strategy would not be respected. Instead, the intermediary states allow us to evaluate the strategy over the intermediate variables of each intermediary state, represented as the white boxes in <xref ref-type="fig" rid="F4">Figure&#x20;4A</xref>.</p>
</sec>
</sec>
<sec id="s4-3">
<title>4.3 Planner Representation as CSP</title>
<p>Now that we have a formal description of our planning model, we can describe how we have implemented it. We have implemented our CSP-based planner using the Java-based Choco-solver library (<xref ref-type="bibr" rid="B43">Prud&#x2019;homme et&#x20;al., 2017</xref>), which has good interoperability with Scala.</p>
<p>We encode each &#x201c;state variable&#x201d; as a CSP variable [e.g., <italic>val</italic>(<italic>s</italic>
<sub>1</sub>, <italic>y</italic>) is encoded as variable <italic>y@</italic>1]. The domain of the variable is the set of possible assignments found in the planning problem. All values are converted to integers in the CSP. For the <italic>Type</italic> variables, we first convert a type name into a fully qualified name [e.g., Seq(String) becomes &#x201c;scala.collection.Seq(java.lang.String)&#x201d;], and for each unique name, we assign a unique integer in the domain. The actions applied on transitions are tracked through &#x201c;action variables&#x201d; (e.g., <italic>a</italic>1<italic>@</italic>1 &#x2192; 3), where a value of <italic>true</italic> indicates that the action (in this case <italic>a</italic>1) is applied in that transition (in this case 1 &#x2192; 3). <xref ref-type="fig" rid="F4">Figure&#x20;4C</xref> shows a CSP encoding of the join example discussed in <xref ref-type="sec" rid="s4-2-3">subsection&#x20;4.2.3</xref>.</p>
<p>Encoding the constraints from <xref ref-type="disp-formula" rid="e5">Eq. 5</xref> defined on the topology and on actions is also straightforward, since equality and inequality can be encoded as constraints {e.g., <italic>val</italic>(<italic>s</italic>, <italic>v</italic>) &#x3d; <italic>b</italic> becomes arithm[v@s, &#x201c;&#x3d;,&#x201d; encode(b)]}, with encode being the conversion of values described above. Conjunction and disjunction of constraints can also be encoded in the CSP [e.g., <italic>val</italic>(<italic>s</italic>, <italic>v</italic>
<sub>1</sub>) &#x2260; <italic>val</italic>(<italic>s</italic>, <italic>v</italic>
<sub>2</sub>) becomes arithm(v1@s, &#x201c;!&#x3d;&#x201d; v2@s)].</p>
<p>By creating transitions only between the scopes following the Spark DAG, we automatically fulfill part of the requirement that the plan be isomorphic to <inline-formula id="inf11">
<mml:math id="m25">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> such that <inline-formula id="inf12">
<mml:math id="m26">
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mo>&#x2192;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Subsequently, we add a constraint stating that exactly one of the action variables in each transition can be used. If we treat a false Boolean value as zero and a true Boolean value as one, we can add a constraint that sums each action variable in a transition and ensures the sum is equal to&#x20;one.</p>
<p>Finally, we add an optimization objective that maximizes the number of no-op actions, to ensure that we do not perform unnecessary processing (e.g., performing some function <italic>f</italic>, undoing it, and then redoing&#x20;it).</p>
<p>From the action variables defined on the transitions in the CSP, we can then extract the actions assigned on those transitions, which correspond to user code that should be assigned to variation points.</p>
</sec>
<sec id="s4-4">
<title>4.4 Planning Model Justification</title>
<p>Having defined our planning system, we must first establish that it provides correct results before it can be used in practice. This is based on the concepts of <italic>soundness</italic> and <italic>completeness</italic> (<xref ref-type="bibr" rid="B16">Ghallab et&#x20;al., 2016</xref>). A planning system is sound if, for any solution plan it returns, the plan is a solution for the planning problem. A system is complete if, given a solvable planning problem, the system will return at least one solution&#x20;plan.</p>
<p>Because we represent our planning problem as a CSP and use an existing constraint solver, we do not evaluate the planning problems ourselves. Nevertheless, we can guarantee that the planning process is <italic>complete</italic>; i.e.,&#x20;it will eventually stop and result in either a generated configuration or a failure. This is true because our planning problem is finite (the Spark DAG is of finite size, each state in &#x3a3; has a finite number of state variables, and each state variable has a finite domain) and as a result, the encoded CSP also contains a finite number of variables, each with a finite domain. Since the CSP solver works through piecemeal reduction of the variable domains, given a correct solving algorithm (<xref ref-type="bibr" rid="B28">Kondrak and van Beek, 1997</xref>), a solution will eventually be reached, or solving will fail if the CSP is unsatisfiable. The result is also <italic>sound</italic> since we encode all aspects of our planning problem as constraints as described in the previous section (i.e.,&#x20;every possible variable and its type and all possible actions), and the constraint solver will ensure every constraint on the CSP is met. Therefore, any solution to the CSP is a solution to the planning problem.</p>
<p>Soundness and completeness of the planning problem itself are based on both the construction of the transition system &#x3a3; as well as the representation of the distributed processing pipeline <inline-formula id="inf13">
<mml:math id="m27">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. This is again focused on the Spark DAG but can be modified for other distributed processing frameworks. Since our planning system is based on the state-variable representation as described by <xref ref-type="bibr" rid="B17">Ghallab et&#x20;al. (2004)</xref>, we know that this approach can yield correct results. Therefore, we will only discuss the correctness of the Spark DAG translation, i.e.,&#x20;<inline-formula id="inf14">
<mml:math id="m28">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi mathvariant="normal">&#x3a3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. We do this based on our definitions in <xref ref-type="sec" rid="s4-2">Section 4.2</xref>, of which the most important is the final definition of the transition function in <xref ref-type="disp-formula" rid="e12">Eq.&#x20;12</xref>.</p>
<p>Since Spark uses a (directed) acyclic graph, the transition system must also be acyclic. We represent the DAG as <inline-formula id="inf15">
<mml:math id="m29">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>R</mml:mi>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x2192;</mml:mo>
<mml:mi>R</mml:mi>
</mml:math>
</inline-formula>. Since <inline-formula id="inf16">
<mml:math id="m30">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is created from the Spark DAG, these properties (such as it being acyclic) also hold for <inline-formula id="inf17">
<mml:math id="m31">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. Next, recall the <italic>rdd</italic> relation between <inline-formula id="inf18">
<mml:math id="m32">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> and <italic>&#x3b3;</italic>, defined as <inline-formula id="inf19">
<mml:math id="m33">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x2203;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>&#x3b3;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mspace width="0.28em"/>
<mml:mo>&#x21d2;</mml:mo>
<mml:mspace width="0.28em"/>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> If there is a cycle between states in <italic>&#x3b3;</italic>, there must then also be a cycle between RDD scopes in <inline-formula id="inf20">
<mml:math id="m34">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>. This is not possible; therefore, <italic>&#x3b3;</italic> must also be acyclic.</p>
<p>Apart from relating <italic>&#x3b3;</italic> with <inline-formula id="inf21">
<mml:math id="m35">
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> as described above, the <italic>rdd</italic> relation also ensures that topology constraints that specify the behavior of the pipeline are met. This is achieved through the encoding of <italic>effects</italic>(<italic>rdd</italic>(<italic>s</italic>
<sub>
<italic>n</italic>
</sub>)) and <italic>precond</italic>(<italic>rdd</italic>(<italic>s</italic>
<sub>
<italic>m</italic>
</sub>)) as constraints that must be satisfied, given (<italic>&#x2203;t</italic> &#x2208; <italic>T</italic>) such that <inline-formula id="inf22">
<mml:math id="m36">
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3b3;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2243;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. If there is no transition <italic>t</italic>, that means there is no <italic>s</italic>
<sub>
<italic>n</italic>
</sub> &#x2208; <italic>&#x3b3;</italic> that results in <italic>s</italic>
<sub>
<italic>m</italic>
</sub>. In this case, <italic>s</italic>
<sub>
<italic>m</italic>
</sub> is an initial state and as such, it is already described by the initial conditions&#x20;<italic>S</italic>
<sub>0</sub>.</p>
<p>Further complications arise from the possibility of joining multiple RDDs, for which our solution is described in <xref ref-type="sec" rid="s4-2-3">Section 4.2.3</xref>. We support two strategies that reconcile the state variables between the two branches. For the &#x201c;require equality&#x201d; strategy, all state variables are related through an equality constraint. If a state variable in one branch differs from the other, this constraint is violated and the configuration is not provided as a solution. For the &#x201c;accept either&#x201d; strategy, the user explicitly states that such conflicting state variables are still acceptable in a solution to the planning problem. Nevertheless, the final Spark configuration is still limited in that an RDD can only be assigned a single type. As a result, a solution where multiple possible types are assigned to a single state could not be applied to the pipeline. As mentioned in the description of the &#x201c;accept either&#x201d; strategy, we slightly restrict the planning model by specifying that the <italic>Type</italic> variables must always be equal between branches regardless of the chosen strategy.</p>
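The two reconciliation strategies can be summarized as constraint predicates over the state variables of the joined branches. The following sketch (in Python for brevity; the function and variable names are ours, not part of the implementation) illustrates the difference:

```python
# Illustrative simplification (ours, not the implementation's API) of the two
# join reconciliation strategies; dictionaries stand in for the state-variable
# assignments of the two joined branches.

def require_equality(left, right):
    """'Require equality': every state variable must match across branches."""
    return all(left[var] == right[var] for var in left)

def accept_either(left, right):
    """'Accept either': conflicts are tolerated, but the Type variable must
    still be equal, since an RDD can only be assigned a single type."""
    return left["Type"] == right["Type"]

branch_a = {"Type": "Reading", "Unit": "Celsius"}
branch_b = {"Type": "Reading", "Unit": "Fahrenheit"}
print(require_equality(branch_a, branch_b))  # False: Unit differs
print(accept_either(branch_a, branch_b))     # True: Type still matches
```

Under &#x201c;require equality,&#x201d; a configuration with differing Unit values is rejected outright, while &#x201c;accept either&#x201d; tolerates it as long as the Type variables match.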
<p>Another complication of join operations is that they are the only operations supported by our planning system that do not accept a user function (in which an action could be applied). A configuration would therefore be invalid if an action is assigned to such an operation. Through the implicit transitions mentioned in <xref ref-type="sec" rid="s4-2-3">Section 4.2.3</xref>, we enforce a constraint that ensures a no-op action is applied in those transitions; therefore, no invalid action can be assigned.</p>
<p>Finally, through the <italic>vp</italic> relation, we ensure that a module belongs to the same category as the variation point in an operation, and that the module is compatible with the operation. If the user attempts to use a variation point in an operation of an incompatible type, the chosen module would have to be of that same incompatible type; since our action encoding prevents this, no plan can be generated.</p>
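As a rough illustration of how effects and preconditions act as constraints on transitions between states, consider the following sketch (in Python, with invented names; the actual encoding is part of the CSP model described in Section 4):

```python
# Illustrative simplification: a transition from state s_n via an action is
# valid only if the action's preconditions hold in s_n; the successor state
# s_m is obtained by applying the action's effects. All names are hypothetical.

def transition_ok(state, action):
    """Check the action's preconditions against the source state's variables."""
    return all(state.get(var) == val for var, val in action["precond"].items())

def apply_effects(state, action):
    """Compute the successor state by overwriting variables with the effects."""
    successor = dict(state)
    successor.update(action["effects"])
    return successor

s_n = {"Step": 1, "Type": "Raw"}
to_celsius = {"precond": {"Type": "Raw"},
              "effects": {"Step": 2, "Type": "Celsius"}}

s_m = apply_effects(s_n, to_celsius) if transition_ok(s_n, to_celsius) else None
print(s_m)  # {'Step': 2, 'Type': 'Celsius'}
```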
</sec>
</sec>
<sec id="s5">
<title>5 Evaluation</title>
<p>In this section, we evaluate our planning and runtime updating system with respect to its runtime performance. We measure the performance using three different experiments:<list list-type="simple">
<list-item>
<p>&#x2022; Plan generation&#x20;time</p>
<list list-type="simple">
<list-item>
<p>Determine how long it takes to generate a plan for a pipeline with either a varying number of steps, a varying number of possible actions per step, or varying numbers of&#x20;joins.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Runtime overhead</p>
<list list-type="simple">
<list-item>
<p>Determine the difference in performance between our solution and regular Spark, excluding the time it takes to perform the configuration planning process.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Restarting experiment</p>
<list list-type="simple">
<list-item>
<p>Quantify the performance of reconfiguring the pipeline by updating running scenarios versus having to restart the entire application.</p>
</list-item>
</list>
</list-item>
</list>
</p>
<p>For a fair and stable evaluation, each experiment was run a total of six times, each time executing in the above&#x20;order.</p>
<p>The experiments were run on a dedicated cluster consisting of one master and five slaves, running in Spark&#x2019;s standalone cluster mode. All six machines have the same specifications, listed in <xref ref-type="table" rid="T3">Table&#x20;3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Experiment cluster node specifications.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td align="left">Spark version</td>
<td align="center">2.1.0</td>
</tr>
<tr>
<td align="left">Java version</td>
<td align="left">jre-1.8.0-openjdk</td>
</tr>
<tr>
<td align="left">Operating system</td>
<td align="left">CentOS 6.8 64 bits</td>
</tr>
<tr>
<td align="left">Processor</td>
<td align="left">2.7&#xa0;GHz AMD hexacore</td>
</tr>
<tr>
<td align="left">Memory</td>
<td align="left">48&#xa0;GB, 1,333&#xa0;MHz DDR3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<inline-graphic xlink:href="fdata-04-666174-fx1.tif"/>
</p>
<sec id="s5-1">
<title>5.1 Plan Generation Time</title>
<p>The plan generation time experiments explore three different properties of a pipeline to determine their impact on the time it takes to generate a configuration.</p>
<p>The first property we examine is pipeline length. We measure how long it takes to generate a plan for a pipeline with <italic>n</italic> sequential operations, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5A</xref>. We first measure the planning time for only one operation, followed by the planning time for two operations, up until <italic>n</italic>&#x20;&#x3d; 16. For each step, only a single action is applicable. This is achieved by generating an action specifically for each step in the pipeline, as shown in <xref ref-type="list" rid="list1">Listing&#x20;1</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Plan generation for a varying number of <bold>(A)</bold> steps, <bold>(B)</bold> possible actions per step, and <bold>(C)</bold> joins. Circles and rectangles represent steps and actions, respectively.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g005.tif"/>
</fig>
<p>
<statement content-type="listing" id="list1">
<label>Listing 1</label>
<p>Overview of the generator for variable length pipelines.</p>
<p>The second property being examined is the number of alternatives per step, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>. We generate a pipeline with four steps, but instead of having exactly one action for each Spark operation, we generate multiple actions with the same preconditions and effects on the <italic>Step</italic> variable, as shown in <xref ref-type="list" rid="list1">Listing 1</xref> with ALTERNATIVES &#x3e;&#x20;1.</p>
<p>The last property under examination is the number of joins in a pipeline. We again allow only one possible module per step, but instead of applying a single operation to the initial RDD, we apply two separate map operations on the same RDD, followed by a union. The resulting pipeline topology is shown in <xref ref-type="fig" rid="F5">Figure&#x20;5C</xref>. We determine the time it takes to generate such pipelines for both join strategies mentioned in <xref ref-type="sec" rid="s4-2-3">Section 4.2.3</xref>: &#x201c;require equality&#x201d; and &#x201c;accept either.&#x201d; We otherwise generate the pipeline in the same way as <xref ref-type="list" rid="list1">Listing&#x20;1</xref>.</p>
<p>The results of all measurements in this experiment are shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>.</p>
<p>Let us first discuss the results for the varying number of operations and number of alternative actions per operation. For both measurements, the growth is exponential, although the plan generation time for the variable pipeline length grows considerably faster.</p>
<p>We attribute this higher growth rate to multiple factors. First, the domain for the <italic>Step</italic> variable increases, as the final goal value is increased. Next, the system also needs to test the applicability of more possible actions, since we generate one action for each step in the pipeline. For the steps themselves, we need to encode more constraints (described in <xref ref-type="sec" rid="s4">Section 4</xref>), and we also have more steps where we try to apply the no-op actions. The measurements for the varying number of alternatives are also affected by the increase in possible actions to be tested; however, this introduces far fewer constraints into the&#x20;CSP.</p>
<p>The results for the experiments using a variable number of joins show a higher growth rate when using the &#x201c;accept either&#x201d; strategy compared to the &#x201c;require equality&#x201d; strategy. The planning time when using the &#x201c;require equality&#x201d; strategy still grows faster than that of the varying pipeline size and varying number of alternatives, since for each join, two actions need to be applied, one for each branch. Using the &#x201c;accept either&#x201d; strategy results in even bigger planning time growth, since the system must accept either value as a result of one join. This uncertainty propagates through every step in the pipeline, meaning that we cannot reduce the search space as quickly as for the other experiments.</p>
</statement>
</p>
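The generation scheme described above can be sketched as follows; this is an illustrative Python rendering with invented names, not the actual generator shown in Listing 1:

```python
# One action is generated per step (precondition: the previous step is done;
# effect: advance the Step variable). With alternatives > 1, several actions
# with identical preconditions and effects are produced per step, matching
# the ALTERNATIVES > 1 setting of the second experiment. Names are ours.

def generate_pipeline(n_steps, alternatives=1):
    actions = []
    for step in range(1, n_steps + 1):
        for alt in range(alternatives):
            actions.append({
                "name": f"step{step}_alt{alt}",
                "precond": {"Step": step - 1},
                "effects": {"Step": step},
            })
    return actions

# A 3-step pipeline with a single applicable action per step:
print([a["name"] for a in generate_pipeline(3)])
# ['step1_alt0', 'step2_alt0', 'step3_alt0']
```

For the variable-length experiment, `n_steps` grows from 1 to 16 with `alternatives = 1`; for the alternatives experiment, `n_steps` stays fixed at 4 while `alternatives` grows.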
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Results of the plan generation time experiments. The lines represent the average value of each measured property, and dots represent individual measurements.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g006.tif"/>
</fig>
</sec>
<sec id="s5-2">
<title>5.2 Dynamic Versus Static</title>
<p>Next, we compare the adaptive framework with the <italic>spark-dynamic</italic> framework (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>) and with regular Spark implementations. These experiments are based on three implemented scenarios inspired by real projects, each built using commonly used operations and increasing in complexity.</p>
<sec id="s5-2-1">
<title>5.2.1 Scenarios</title>
<p>The simple scenario is from the Energy domain. Based on the temperature inside and outside a house, we calculate the power required to heat the house based on an ideal temperature. Accurately estimating the future power usage of a building may allow more efficient distribution of available power within Smart Buildings (<xref ref-type="bibr" rid="B15">Georgievski et&#x20;al., 2012</xref>) or Smart Power Grids (<xref ref-type="bibr" rid="B27">Kok, 2013</xref>), resulting in lower energy costs. A representation of this pipeline is shown in <xref ref-type="fig" rid="F7">Figure&#x20;7A</xref>.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Pipelines for three different scenarios: <bold>(A)</bold> simple, <bold>(B)</bold> middle, and <bold>(C)</bold> complex.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g007.tif"/>
</fig>
<p>The middle scenario is based on autonomous driving. In this scenario, vehicles drive along a highway that has been outfitted with sensors tracking the location of all vehicles on it, as well as the location of any accidents that happen on that highway. We focus our scenario not on autonomous driving itself, but on a small part of the information processing. When a vehicle approaches an accident, laws or safety requirements could dictate that a vehicle should keep a specific (minimum) distance to the vehicle in front of it. In this scenario, we calculate the speed needed to reach the required distance based on several factors: the location of the accident (from the starting point of the roadway), the speed and location of the current vehicle, and the speed and location of the vehicle ahead. We use this data to calculate the distance from the current vehicle to the vehicle ahead and the distance from the current vehicle to the accident. The pipeline used in this scenario is shown in <xref ref-type="fig" rid="F7">Figure&#x20;7B</xref>. This scenario is a bit more complex than the previous one because the pipeline contains a split and join as well as operations other than just plannedMap.</p>
<p>The final scenario is the most complex, containing multiple splits and joins as well as multiple types of operations. This scenario is related to the healthcare domain. In the scenario, patients are being monitored remotely based on their heart rate, blood pressure, how much they are moving, and whether they are in their bed. Several risk assessments relating to the health of the patients are made based on this data. For example, if the heart rate of the patient is high and they are not moving, something might be wrong. By creating a plannable pipeline for this monitoring, changes can be made without temporarily interrupting the monitoring process, an interruption that could cause dangerous or medically relevant situations to be missed. The pipeline for this scenario is shown in <xref ref-type="fig" rid="F7">Figure&#x20;7C</xref>.</p>
<p>Each of these scenarios is implemented in three different ways:<list list-type="simple">
<list-item>
<p>&#x2022; Static</p>
<list list-type="simple">
<list-item>
<p>This implementation is a regular Spark pipeline as it would be written without the system described in this&#x20;paper.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Dynamic</p>
<list list-type="simple">
<list-item>
<p>The dynamic implementation uses the variation points from <italic>spark-dynamic</italic> for each operation, with the variation points given preassigned functions.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>&#x2022; Planned</p>
<list list-type="simple">
<list-item>
<p>The planned version uses the planner and plannable variation points from this&#x20;paper.</p>
</list-item>
</list>
</list-item>
</list>
</p>
<p>The planned implementations of the simple and complex scenarios also include alternative PlannedModules that can fulfill the scenario goals. The middle scenario instead contains extra variation points that should be assigned no-op actions. This&#x20;way, all three scenarios are given a bigger search space during the planning process. For the static implementation of each scenario, only the base scenario is implemented, since updating it is not possible. The dynamic implementations also only contain the base scenario. This is because the Spark pipeline is still restricted to static RDD types with the <italic>spark-dynamic</italic> library. The implementations of the scenarios can be found in an external repository<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref>. This includes the code used to generate the input data. We have also added detailed DAG representations to the repository for each scenario that includes pipeline constraints and assigned PlannedModules.</p>
<p>
<inline-graphic xlink:href="fdata-04-666174-fx2.tif"/>
</p>
<p>
<inline-graphic xlink:href="fdata-04-666174-fx3.tif"/>
</p>
</sec>
<sec id="s5-2-2">
<title>5.2.2 Runtime Overhead</title>
<p>In the runtime overhead experiment, we run each implementation of every scenario and measure how long one iteration takes to complete. For the planning implementation, we do not replan the pipeline between iterations, as we are just interested in the actual runtime overhead. Within this experiment, all scenarios (simple, middle, and complex) were executed 240 times. Every 60 iterations, the application terminates so that any optimization to the bytecode done by the JVM during a run does not greatly affect the benchmark. This scheduling is shown in <xref ref-type="list" rid="list2">Listing 2</xref>. Since each experiment is repeated six times, each scenario is started four times per experiment, and each start runs the pipeline 60 times; in total, we have 6 &#xd7; 4&#x20;&#xd7; 60&#x20;&#x3d; 1,440 measured&#x20;runs.</p>
<p>
<statement content-type="listing" id="list2">
<label>Listing 2</label>
<p>Scheduling of the runtime overhead experiment.</p>
<p>Each iteration of the simple scenario is run with 4,400 input objects, the middle scenario is run with 1,551 &#xd7; 2 input objects from two source RDDs, and the complex scenario is run with 6,600 &#xd7; 4&#x20;&#x3d; 26,400 input objects from four source&#x20;RDDs.</p>
<p>
<xref ref-type="list" rid="list3">Listing 3</xref> shows how a single run of this experiment is performed. First, we randomly generate the data that will be used for that run of the experiment. Next, we run the pipeline once and store the generated plan since that is not a factor we want to test with this experiment. We then perform warmup cycles on the data to eliminate JVM startup interference, followed by timing the real benchmark cycles.</p>
</statement>
</p>
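The warmup-and-measurement procedure described for Listings 2 and 3 can be sketched as follows (an illustrative Python harness of our own; `run_pipeline` stands in for one pipeline iteration, and the cycle counts are illustrative):

```python
# Run unmeasured warmup cycles first so that runtime optimizations (in our
# case, JVM bytecode optimization) settle, then time the real benchmark
# cycles. This harness is a stand-in, not the experiment's actual code.
import time

def benchmark(run_pipeline, warmup=5, timed=60):
    for _ in range(warmup):
        run_pipeline()
    timings = []
    for _ in range(timed):
        start = time.perf_counter()
        run_pipeline()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)  # average seconds per iteration

avg = benchmark(lambda: sum(range(1000)))
print(avg > 0)  # True
```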
<p>
<statement content-type="listing" id="list3">
<label>Listing 3</label>
<p>Overview of the runtime overhead experiment&#x20;code.</p>
<p>The results of this experiment are shown in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref> and <xref ref-type="table" rid="T4">Table&#x20;4</xref>. First, the <italic>dynamic</italic> implementation of each scenario takes longer to run than the <italic>static</italic> Spark implementation. This matches the results of the earlier experiments done for <italic>spark-dynamic</italic> (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>). The reason for this is that we have added extra functionality on top of the existing static Spark&#x20;code.</p>
<p>Looking at the results for the simple scenario, when using the <italic>static</italic> implementation as a baseline, the <italic>dynamic</italic> implementation takes approximately 13<italic>%</italic> longer since we have to download the assigned contents of the variation points and process them. The running time of the <italic>planned</italic> implementation is approximately 46<italic>%</italic> longer than the baseline. This is because we do not only have the overhead of the <italic>dynamic</italic> implementation but also have extra logic to enable the dynamic typing, such as casting the input data. The loading of variation points is also slightly more complicated since the library must make sure that planning has been completed. Individually, these steps would not take much time but since this is repeated for every record of every operation in the pipeline, their effects become significant. The <italic>planned</italic> implementation also suffers (more than the other implementations) from irregular increases in the time per iteration, which could be the result of networking lag, thread scheduling, or garbage collection.</p>
<p>The results for the middle scenario are similar, with the <italic>dynamic</italic> implementation having an overhead of approximately 11<italic>%</italic> above baseline, and the <italic>planned</italic> implementation having an overhead of 27<italic>%</italic>. Since the average iteration time for this scenario is longer than the simple scenario, the irregular spikes mentioned above have a smaller effect on the averages.</p>
<p>The results for the complex scenario show overhead for the <italic>dynamic</italic> implementation of approximately 10<italic>%</italic> above baseline and overhead of approximately 31<italic>%</italic> over the <italic>static</italic> baseline for the <italic>planned</italic> implementation. These results show a new phenomenon, where the iterations appear to increase in running time every iteration until the Spark application is restarted. This is primarily the result of garbage collection performed by the JVM. Switching to a different garbage collection implementation or changing the memory size allocated to Spark executors changes the curves of the results.</p>
</statement>
</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Results of the runtime overhead experiment per scenario. Each scenario is given a color, with the lightest, the mild, and darkest shades of each color representing the <italic>static</italic>, the <italic>dynamic</italic>, and the <italic>planned</italic> implementations, respectively. The vertical lines indicate 60 pipeline iterations, after which the Spark application is restarted.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g008.tif"/>
</fig>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Averaged results of the runtime overhead experiment per scenario.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Impl.</th>
<th colspan="6" align="center">Scenario</th>
</tr>
<tr>
<th colspan="2" align="center">Simple (ms)</th>
<th colspan="2" align="center">Middle (ms)</th>
<th colspan="2" align="center">Complex (ms)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Static</td>
<td align="char" char=".">76.00</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">241.62</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">1311.58</td>
<td align="center">&#x2014;</td>
</tr>
<tr>
<td align="left">Dynamic</td>
<td align="char" char=".">86.08</td>
<td align="center">(&#x2b;13.3<italic>%</italic>)</td>
<td align="char" char=".">267.85</td>
<td align="center">(&#x2b;10.9<italic>%</italic>)</td>
<td align="char" char=".">1441.57</td>
<td align="center">(&#x2b;9.9<italic>%</italic>)</td>
</tr>
<tr>
<td align="left">Planned</td>
<td align="char" char=".">111.18</td>
<td align="center">(&#x2b;46.3<italic>%</italic>)</td>
<td align="char" char=".">306.85</td>
<td align="center">(&#x2b;27.0<italic>%</italic>)</td>
<td align="char" char=".">1711.74</td>
<td align="center">(&#x2b;30.5<italic>%</italic>)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s5-2-3">
<title>5.2.3 Restarting Experiment</title>
<p>In this experiment, we determine how much performance can be gained by using the planning system. We define the performance gain as the difference in the time it takes to process a set of data while reconfiguring at runtime compared to having to restart a static Spark application.</p>
<p>A basic overview of this experiment is shown in <xref ref-type="fig" rid="F9">Figure&#x20;9</xref>. Here, we split the dataset into sections and first process all eight sections of the dataset. For the <italic>static</italic> implementation, this means we only start the application once, and for the <italic>planned</italic> implementation, this means we only generate a new plan once. In the next test, we only process half of the dataset before a reconfiguration takes place: for the <italic>static</italic> implementation, Spark terminates after each section of the dataset and we restart it for the next dataset; for the <italic>dynamic</italic> case, we simply start the next iteration without making any changes; for the <italic>planned</italic> case, our system must generate a new plan. After the reconfiguration, we continue processing the other half. We continue subdividing the dataset for these tests until we have to reconfigure the application after every section of the dataset.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Overview of the restarting experiment, with the start and stop symbols representing the reconfiguration of the pipelines.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g009.tif"/>
</fig>
<p>In the actual experiment, we do not use just eight sections of the data as described above but instead use the scheme shown in <xref ref-type="list" rid="list4">Listing 4</xref>, e.g., one iteration with 80 copies of the database, followed by two iterations of 40 copies of the database, etcetera. During experimentation, the number of iterations/slices per dataset was increased until a stable trend was found at 20 iterations per dataset and further increased to 80 iterations to ensure that the trend remained stable. Since the <italic>static</italic> implementation cannot be updated, it is fully restarted for every iteration. <xref ref-type="list" rid="list5">Listing 5</xref> shows a rough overview of the applications used in the experiment.</p>
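The trade-off this experiment measures can be captured by a simple cost model (a back-of-the-envelope sketch of our own; the numbers below are invented, not measured values):

```python
# Processing a dataset split into `sections` chunks, where each chunk incurs
# a fixed reconfiguration cost: a full application restart for the static
# implementation, or a replanning step for the planned one. All values are
# hypothetical and in seconds.

def total_time(sections, per_section, reconfig_cost):
    return sections * (per_section + reconfig_cost)

static = total_time(80, 0.1, 7.0)    # restart before every section
planned = total_time(80, 0.1, 0.2)   # replan before every section
print(round(static, 1), round(planned, 1))  # 568.0 24.0
```

The model reproduces the qualitative shape of the results: static cost grows steeply with the number of reconfigurations, while planned cost grows only with the much smaller replanning time.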
<p>
<statement content-type="listing" id="list4">
<label>Listing 4</label>
<p>Scheduling of the restarting experiment.</p>
<p>
<inline-graphic xlink:href="fdata-04-666174-fx4.tif"/>
</p>
</statement>
</p>
<p>
<statement content-type="listing" id="list5">
<label>Listing 5</label>
<p>Overview of the restarting experiment&#x20;code.</p>
<p>
<inline-graphic xlink:href="fdata-04-666174-fx5.tif"/>
</p>
<p>The averaged results of these experiments are shown in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>. We see that if the <italic>static</italic> implementations have to be restarted 80 times, it takes a considerable amount of time for all scenarios. Restarting only 40&#x20;times takes around half as much time. The <italic>dynamic</italic> implementations take a roughly constant amount of time for each scenario, as there is barely any cost apart from the actual processing of the sections of the datasets. The <italic>planned</italic> implementations show a slight linear increase in processing time when the number of reconfigurations is increased, and this is most pronounced for the complex scenario. This increase is the result of the planning process, where a plan has to be generated and the variation points have to wait for the planning process to complete for each iteration. The <italic>planned</italic> implementations nevertheless perform much better in this experiment than the <italic>static</italic> implementations.</p>
<p>
<xref ref-type="table" rid="T5">Table&#x20;5</xref> shows the results of this experiment for the cases where the dataset is processed without restarting and where the dataset is divided into 80 sections. Similar to the previous experiment, when the dataset is processed without restarting, the <italic>static</italic> implementation of each scenario outperforms the other implementations. However, when the reconfiguration is done 80 times, the <italic>planned</italic> implementations process the entire dataset in around 90% less time than it takes for the <italic>static</italic> implementations to finish. The results when the dataset is processed without restarting only roughly match those of the runtime overhead experiment since that experiment does not include the planning and application startup time in its measurements and its RDDs contain much fewer records.</p>
<p>In this experiment, we assumed that the data is independent and can be split into chunks, and that all updates are anticipated. In many cases, however, the same algorithm must be applied to the whole dataset, so a restart loses progress, and updates are unforeseen. For example, suppose a pipeline with only two steps, <italic>&#x3b1;</italic> and <italic>&#x3b2;</italic>, is running, and a bug is found in <italic>&#x3b2;</italic> when <italic>&#x3b1;</italic> is almost finished. Applying a bug fix in the static implementation means losing <italic>&#x3b1;</italic>&#x2019;s computations, while with our system <italic>&#x3b2;</italic> can be updated separately. In the worst case, the planned implementation introduced 46% overhead, which means that even a single update after 46% of the pipeline runtime already benefits from the planned implementation; with multiple updates, the planned implementation clearly outperforms the static one. It is worth noting that this is only a proof-of-concept implementation of our system.</p>
</statement>
</p>
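The break-even argument above can be checked with a short calculation (our own arithmetic, normalizing the static baseline runtime to 1.0):

```python
# If a restart discards a fraction `progress_lost` of completed work, the
# static run costs the baseline plus the redone fraction; the planned run
# never loses work but carries its worst-case 46% overhead throughout.

def static_cost(progress_lost):
    return 1.0 + progress_lost

def planned_cost(overhead=0.46):
    return 1.0 * (1.0 + overhead)

print(static_cost(0.5) > planned_cost())  # True: bug found at 50% -> planned wins
print(static_cost(0.4) > planned_cost())  # False: below 46%, static is cheaper
```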
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Results of the restarting experiment per scenario. The zoomed subfigures show the breakdown for each scenario. Each scenario is given a color, with the lightest, the mild, and darkest shades of each color representing the <italic>static</italic>, the <italic>dynamic</italic>, and the <italic>planned</italic> implementations, respectively. <italic>Iterations to process full dataset</italic> indicates the number of sections in the dataset. With one iteration, the entire dataset is processed without restarting, while with 80 iterations, the pipeline is restarted 79&#x20;times to process all&#x20;data.</p>
</caption>
<graphic xlink:href="fdata-04-666174-g010.tif"/>
</fig>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>Results of the restarting experiment per scenario for 1 and 80 iterations.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">Iter.</th>
<th rowspan="2" align="center">Impl.</th>
<th colspan="6" align="center">Scenario</th>
</tr>
<tr>
<th colspan="2" align="center">Simple (s)</th>
<th colspan="2" align="center">Middle (s)</th>
<th colspan="2" align="center">Complex (s)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3" align="left">1</td>
<td align="left">Static</td>
<td align="char" char=".">9.6</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">16.8</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">71.6</td>
<td align="center">&#x2014;</td>
</tr>
<tr>
<td align="left">Dynamic</td>
<td align="char" char=".">10.4</td>
<td align="center">(&#x2b;8.4<italic>%</italic>)</td>
<td align="char" char=".">17.6</td>
<td align="center">(&#x2b;5.0<italic>%</italic>)</td>
<td align="char" char=".">78.5</td>
<td align="center">(&#x2b;9.7<italic>%</italic>)</td>
</tr>
<tr>
<td align="left">Planned</td>
<td align="char" char=".">12.1</td>
<td align="center">(&#x2b;25.8<italic>%</italic>)</td>
<td align="char" char=".">19.2</td>
<td align="center">(&#x2b;14.3<italic>%</italic>)</td>
<td align="char" char=".">86.2</td>
<td align="center">(&#x2b;20.3<italic>%</italic>)</td>
</tr>
<tr>
<td rowspan="3" align="left">80</td>
<td align="left">Static</td>
<td align="char" char=".">572.9</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">708.4</td>
<td align="center">&#x2014;</td>
<td align="char" char=".">854.6</td>
<td align="center">&#x2014;</td>
</tr>
<tr>
<td align="left">Dynamic</td>
<td align="char" char=".">18.5</td>
<td align="center">(&#x2212;96.8<italic>%</italic>)</td>
<td align="char" char=".">31.6</td>
<td align="center">(&#x2212;95.5<italic>%</italic>)</td>
<td align="char" char=".">71.1</td>
<td align="center">(&#x2212;91.7<italic>%</italic>)</td>
</tr>
<tr>
<td align="left">Planned</td>
<td align="char" char=".">26.8</td>
<td align="center">(&#x2212;95.3<italic>%</italic>)</td>
<td align="char" char=".">43.3</td>
<td align="center">(&#x2212;93.9<italic>%</italic>)</td>
<td align="char" char=".">114.0</td>
<td align="center">(&#x2212;86.7<italic>%</italic>)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s6">
<title>6 Conclusion and Discussion</title>
<p>In this work, we have introduced a system for adaptive on-the-fly changes in distributed data processing pipelines using constraint-based AI planning techniques. The feasibility of the approach is tested using Apache Spark as a target distributed processing framework. In this paper, we also present the generic methodology that enables adaptive on-the-fly changes of applications in distributed data analysis for industrial organizations in the Industry 4.0 era. While the proof-of-concept implementation is specific to Apache Spark, the methodology and planning model can be applied to any distributed data processing platform operating on the same principles (that is, through a sequence of operations forming a DAG that allows custom user code to be executed). Regarding the proof of concept itself, rapid development and modification of running pipelines could in some cases already benefit from the use of this system.</p>
<p>The results of our experiments show that planning time grows exponentially with the computational complexity of the planning problem (pipeline length, number of alternative actions, and number of joins). The results also show the overhead introduced by the additional functionality that enables dynamic typing, compared to the more restrictive <italic>spark-dynamic</italic> system (<xref ref-type="bibr" rid="B31">Lazovik et&#x20;al., 2017</xref>).</p>
<p>We also note that our evaluation was performed using comparatively small datasets, with between 4,400 and 264,000 entries per RDD in the runtime overhead experiment. As a result, the overhead introduced by both <italic>spark-dynamic</italic> and our planning system could be overemphasized compared to real-world usage of the system. However, this did allow us to repeat the experiments multiple&#x20;times.</p>
<p>We believe that the system described in this paper provides a solid foundation and starting point for automated DSU systems for distributed data processing frameworks; the general feasibility of the approach is demonstrated through our implemented scenarios and their evaluation.</p>
<p>Since this work is one of the first attempts at integrating adaptive reconfiguration and DSU into the field of distributed data processing, many directions for future research exist. Given the novelty of this research, we consider this not a weakness but an opportunity for further development of the field. Techniques such as distributed planning, optimized replanning (fewest changes compared to the previous plan), compile-time validation, and preplanning can improve the performance of the planning process, as well as pipeline development in general. Furthermore, allowing extended goals (such as achieve-and-maintain), partial knowledge, dynamic goals (updating topology constraints), and planning over multiple pipelines can increase the usefulness of the system. Another beneficial feature would be the ability to replan batch pipelines in which some steps have already been completed, so that only uncompleted steps are replanned. Finally, implementing reconciliation strategies for in-transit data (when a pipeline is updated while processing) and allowing dynamic pipeline topologies are important points that still need to be addressed. It is also important to investigate in which cases a dynamic updating framework as described in this paper should (or should not) be used, which will require using it in operational settings across different domains.</p>
</sec>
</body>
<back>
<sec id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This research has been partially sponsored by NWO C2D and TKI HTSM Ecida Project Grant No. 628011003, Evolutionary Changes for Distributed Analysis, and by the EU H2020 FIRST Project, Grant No. 734599, FIRST: vF Interoperation suppoRting buSiness innovaTion.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.</p>
</sec>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>See <ext-link ext-link-type="uri" xlink:href="https://spark.apache.org/docs/latest/streaming-programming-guide.html#upgrading-application-code">https://spark.apache.org/docs/latest/streaming-programming-guide.html&#x23;upgrading-application-code</ext-link>
</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>See <ext-link ext-link-type="uri" xlink:href="https://github.com/rug-ds-lab/planning-dynamic-spark-supplemental">https://github.com/rug-ds-lab/planning-dynamic-spark-supplemental</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Assun&#xe7;&#xe3;o</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Calheiros</surname>
<given-names>R. N.</given-names>
</name>
<name>
<surname>Bianchi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Netto</surname>
<given-names>M. A. S.</given-names>
</name>
<name>
<surname>Buyya</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Big Data Computing and Clouds: Trends and Future Directions</article-title>. <source>J.&#x20;Parallel Distributed Comput.</source> <volume>79-80</volume>, <fpage>3</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/J.JPDC.2014.08.003</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bagherzadeh</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kahani</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jahed</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Dingel</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Execution of Partial State Machine Models</article-title>. <source>IEEE Trans. Software Eng.</source>, <fpage>1</fpage>. <pub-id pub-id-type="doi">10.1109/tse.2020.3008850</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Baier</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Katoen</surname>
<given-names>J.-P.</given-names>
</name>
</person-group> (<year>2008</year>). <source>Principles of Model Checking</source>. <publisher-loc>Cambridge, Massachusetts</publisher-loc>: <publisher-name>MIT Press</publisher-name>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bart&#xe1;k</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Salido</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Constraint Satisfaction Techniques in Planning and Scheduling</article-title>. <source>J.&#x20;Intell. Manuf.</source> <volume>21</volume>, <fpage>5</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1007/s10845-008-0203-4</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>St&#x103;nciulescu</surname>
<given-names>&#x15e;.</given-names>
</name>
<name>
<surname>&#xd8;g&#xe5;rd</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Haugen</surname>
<given-names>&#xd8;.</given-names>
</name>
<name>
<surname>Larsen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>W&#x105;sowski</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>To Connect or Not to Connect: Experiences from modeling topological variability</article-title>. in <conf-name>Proceedings of the 18th International Software Product Line Conference - Volume </conf-name>(<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), SPLC &#x2019;14, <fpage>330</fpage>&#x2013;<lpage>339</lpage>. <pub-id pub-id-type="doi">10.1145/2648511.2648549</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bockmayr</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hooker</surname>
<given-names>J.&#x20;N.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Constraint Programming</article-title>. <source>Handbooks Operations Res. Manage. Sci.</source> <volume>12</volume>, <fpage>559</fpage>&#x2013;<lpage>600</lpage>. <pub-id pub-id-type="doi">10.1016/s0927-0507(05)12010-6</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Boyce</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Leger</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Stateful Streaming with Apache Spark: How to Update Decision Logic at Runtime. <italic>DATA&#x2b;AI Summit Europe</italic>
</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://databricks.com/session_eu20/stateful-streaming-with-apache-spark-how-to-update-decision-logic-at-runtime">https://databricks.com/session_eu20/stateful-streaming-with-apache-spark-how-to-update-decision-logic-at-runtime</ext-link>
</comment>. </citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carbone</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Ewen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Haridi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Katsifodimos</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Markl</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Tzoumas</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Apache Flink: Unified Stream and Batch Processing in a Single Engine</article-title>. <source>Data Engineering</source> <volume>36</volume>, <fpage>28</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1109/IC2EW.2016.56</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Che</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Safran</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>From Big Data to Big Data Mining: Challenges, Issues, and Opportunities</article-title>,&#x201d; in <source>Lecture Notes in Computer Science</source>. <source>LNCS</source> (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <volume>Vol. 7827</volume>, <fpage>1</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-40270-8_1</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Cimatti</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Roveri</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Traverso</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1998</year>). &#x201c;<article-title>Strong Planning in Non-deterministic Domains <italic>via</italic> Model Checking</article-title>,&#x201d; in <conf-name>Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems.</conf-name> (<publisher-loc>Pittsburgh, PA</publisher-loc>: <publisher-name>AAAI Press</publisher-name>), <fpage>36</fpage>&#x2013;<lpage>43</lpage>. </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cook</surname>
<given-names>R. P.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>1983</year>). <article-title>A Dynamic Modification System</article-title>. <source>ACM SIGPLAN Notices</source> <volume>18</volume>, <fpage>201</fpage>&#x2013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1145/1006142.1006188</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>De Nicola</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Vaandrager</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>1990</year>). &#x201c;<article-title>Action versus State Based Logics for Transition Systems</article-title>,&#x201d; in <source>Semantics of Systems of Concurrent Processes</source>. Ed. I. Guessarian (<publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>), <volume>Vol. 469</volume>, <fpage>407</fpage>&#x2013;<lpage>419</lpage>. <pub-id pub-id-type="doi">10.1007/3-540-53479-2_17</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dhungana</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Schreiner</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lehofer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vierhauser</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rabiser</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Gr&#xfc;nbacher</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Modeling Multiplicity and Hierarchy in Product Line Architectures</article-title>,&#x201d; in <conf-name>Proceedings of the WICSA 2014 Companion Volume</conf-name> (<publisher-loc>New York, New York, USA</publisher-loc>: <publisher-name>ACM Press</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1145/2578128.2578236</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Eichelberger</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>A Matter of the Mix: Integration of Compile and Runtime Variability</article-title>,&#x201d; in <conf-name>2016 IEEE 1st International Workshops on Foundations and Applications of Self&#x2a; Systems (FAS&#x2a;W)</conf-name> (<publisher-loc>Piscataway</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>12</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1109/FAS-W.2016.17</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Emerson</surname>
<given-names>E. A.</given-names>
</name>
</person-group> (<year>2008</year>). &#x201c;<article-title>The Beginning of Model Checking: A Personal Perspective</article-title>,&#x201d; in <source>Lecture Notes in Computer Science</source>. <source>LNCS</source>, <volume>Vol. 5000</volume>, <fpage>27</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-69850-0_2</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Franks</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2012</year>). <source>Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics</source>. <publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>John Wiley &#x26; Sons</publisher-name>. </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Georgievski</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Degeler</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Pagani</surname>
<given-names>G. A.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>T. A.</given-names>
</name>
<name>
<surname>Lazovik</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Aiello</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Optimizing Energy Costs for Offices Connected to the Smart Grid</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>3</volume>, <fpage>2273</fpage>&#x2013;<lpage>2285</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2012.2218666</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ghallab</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nau</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Traverso</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Automated Planning and Acting</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>. </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ghallab</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nau</surname>
<given-names>D. S.</given-names>
</name>
<name>
<surname>Traverso</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2004</year>). <source>Automated Planning: Theory and Practice</source>. <publisher-loc>Amsterdam, Netherlands</publisher-loc>: <publisher-name>Elsevier</publisher-name>. </citation>
</ref>
<ref id="B18">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Giunchiglia</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Traverso</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2000</year>). &#x201c;<article-title>Planning as Model Checking</article-title>,&#x201d; in <conf-name>Recent Advances in AI Planning</conf-name> (<publisher-loc>Berlin, Germany</publisher-loc>: <publisher-name>Springer Berlin Heidelberg</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1007/10720246_1</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Automating Object Transformations for Dynamic Software Updating <italic>via</italic> Online Execution Synthesis</article-title>,&#x201d; in <conf-name>32nd European Conference on Object-Oriented Programming (ECOOP 2018)</conf-name> (<publisher-loc>Wadern, Germany</publisher-loc>: <publisher-name>Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik</publisher-name>). </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hallsteinsen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hinchey</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schmid</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Dynamic Software Product Lines</article-title>. <source>Computer</source> <volume>41</volume>, <fpage>93</fpage>&#x2013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1109/MC.2008.123</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hicks</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nettles</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>Dynamic Software Updating</article-title>. <source>ACM Trans. Program Lang. Syst.</source> <volume>27</volume>, <fpage>1049</fpage>&#x2013;<lpage>1096</lpage>. <pub-id pub-id-type="doi">10.1145/1108970.1108971</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hojaji</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Mayerhofer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zamani</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hamou-Lhadj</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bousse</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Model Execution Tracing: a Systematic Mapping Study</article-title>. <source>Softw. Syst. Model.</source> <volume>18</volume>, <fpage>3461</fpage>&#x2013;<lpage>3485</lpage>. <pub-id pub-id-type="doi">10.1007/s10270-019-00724-1</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hunt</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Konar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Junqueira</surname>
<given-names>F. P.</given-names>
</name>
<name>
<surname>Reed</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Zookeeper: Wait-free Coordination for Internet-Scale Systems</article-title>,&#x201d; in <conf-name>Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference</conf-name>, (<publisher-loc>United States</publisher-loc>: <publisher-name>USENIX Association</publisher-name>), USENIXATC&#x2019;10, 11. </citation>
</ref>
<ref id="B24">
<citation citation-type="thesis">
<person-group person-group-type="author">
<name>
<surname>Kaldeli</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>Domain-Independent Planning for Services in Uncertain and Dynamic Environments</article-title>,&#x201d; (<publisher-loc>Groningen</publisher-loc>: <publisher-name>University of Groningen</publisher-name>). <comment>Ph.D. thesis</comment>. </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Katsifodimos</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Schelter</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Haridi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Katsifodimos</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Markl</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Tzoumas</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Apache Flink: Stream Analytics at Scale</article-title>. <source>Data Eng.</source> <volume>36</volume>, <fpage>28</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1109/IC2EW.2016.56</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>D. K.</given-names>
</name>
<name>
<surname>Tilevich</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ribbens</surname>
<given-names>C. J.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Dynamic Software Updates for Parallel High-Performance Applications</article-title>. <source>Concurrency Computat.: Pract. Exper.</source> <volume>23</volume>, <fpage>415</fpage>&#x2013;<lpage>434</lpage>. <pub-id pub-id-type="doi">10.1002/cpe.1663</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kok</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2013</year>). &#x201c;<article-title>The PowerMatcher: Smart Coordination for the Smart Electricity Grid</article-title>,&#x201d; (<publisher-loc>Amsterdam, Netherlands</publisher-loc>: <publisher-name>Vrije Universiteit</publisher-name>). <comment>Doctoral thesis</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kondrak</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>van Beek</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>A Theoretical Evaluation of Selected Backtracking Algorithms</article-title>. <source>Artif. Intelligence</source> <volume>89</volume>, <fpage>365</fpage>&#x2013;<lpage>387</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(96)00027-6</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lazovik</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Aiello</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gennari</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2005</year>). &#x201c;<article-title>Encoding Requests to Web Service Compositions as Constraints</article-title>,&#x201d; in <source>Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>. <source>LNCS</source> (<publisher-loc>Berlin, Germany</publisher-loc>: <publisher-name>Springer</publisher-name>), <volume>Vol. 3709</volume>, <fpage>782</fpage>&#x2013;<lpage>786</lpage>. <pub-id pub-id-type="doi">10.1007/11564751_64</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lazovik</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Aiello</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Papazoglou</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2006</year>). <article-title>Planning and Monitoring the Execution of Web Service Requests</article-title>. <source>Int. J.&#x20;Digit Libr.</source> <volume>6</volume>, <fpage>235</fpage>&#x2013;<lpage>246</lpage>. <pub-id pub-id-type="doi">10.1007/s00799-006-0002-5</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lazovik</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Medema</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Albers</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Langius</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Lazovik</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Runtime Modifications of Spark Data Processing Pipelines</article-title>,&#x201d; in <conf-name>2017 International Conference on Cloud and Autonomic Computing (ICCAC)</conf-name>, <conf-loc>Tucson, AZ, USA</conf-loc>, <conf-date>18-22 Sept. 2017</conf-date> (<publisher-name>IEEE</publisher-name>), <fpage>34</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1109/iccac.2017.11</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meier</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mundhenk</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Vollmer</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>The Complexity of Satisfiability for Fragments of CTL and CTL<sup>&#x22c6;</sup>
</article-title>. <source>Electron. Notes Theor. Comput. Sci.</source> <volume>223</volume>, <fpage>201</fpage>&#x2013;<lpage>213</lpage>. <pub-id pub-id-type="doi">10.1016/j.entcs.2008.12.040</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Merz</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2001</year>). &#x201c;<article-title>Model Checking: A Tutorial Overview</article-title>,&#x201d; in <source>Modeling and Verification of Parallel Processes</source>. <source>LNCS</source>, <volume>Vol. 2067</volume>, <fpage>3</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1007/3-540-45510-8_1</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Montgomery</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>A Model for Updating Real-Time Applications</article-title>. <source>Real-Time Syst.</source> <volume>27</volume>, <fpage>169</fpage>&#x2013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1023/B:TIME.0000027932.11280.3c</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mugarza</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Parra</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jacob</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Analysis of Existing Dynamic Software Updating Techniques for Safe and Secure Industrial Control Systems</article-title>. <source>Int. J.&#x20;SAFE</source> <volume>8</volume>, <fpage>121</fpage>&#x2013;<lpage>131</lpage>. <pub-id pub-id-type="doi">10.2495/safe-v8-n1-121-131</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mugarza</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Parra</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jacob</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Cetratus: A Framework for Zero Downtime Secure Software Updates in Safety&#x2010;critical Systems</article-title>. <source>Softw. Pract. Exper</source> <volume>50</volume>, <fpage>1399</fpage>&#x2013;<lpage>1424</lpage>. <pub-id pub-id-type="doi">10.1002/spe.2820</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nau</surname>
<given-names>D. S.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Current Trends in Automated Planning</article-title>. <source>AI Mag.</source> <volume>28</volume>, <fpage>43</fpage>. <pub-id pub-id-type="doi">10.1609/aimag.v28i4.2067</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Neumann</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Bach</surname>
<given-names>C. T.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Riedel</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Beigl</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Low-disruptive and Timely Dynamic Software Updating of Smart Grid Components</article-title>,&#x201d; in <conf-name>Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST. Vol. 203</conf-name> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>155</fpage>&#x2013;<lpage>171</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-61813-5_16</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Pina</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Andronidis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hicks</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cadar</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Mvedsua: Higher Availability Dynamic Software Updates <italic>via</italic> Multi-Version Execution</article-title>,&#x201d; in <conf-name>Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems</conf-name>, <conf-loc>Providence, RI</conf-loc> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>573</fpage>&#x2013;<lpage>585</lpage>. </citation>
</ref>
<ref id="B41">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Pina</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Veiga</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hicks</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2014</year>). &#x201c;<article-title>Rubah</article-title>,&#x201d; in <conf-name>Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages &#x26; Applications - OOPSLA &#x2019;14</conf-name>, <conf-loc>Portland, OR</conf-loc> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>103</fpage>&#x2013;<lpage>119</lpage>. <pub-id pub-id-type="doi">10.1145/2660193.2660220</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pohl</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>B&#xf6;ckle</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>van Der Linden</surname>
<given-names>F. J.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Software Product Line Engineering: Foundations, Principles and Techniques</source>. <publisher-loc>Berlin, Germany</publisher-loc>: <publisher-name>Springer Science &#x26; Business Media</publisher-name>. </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Prud&#x2019;homme</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Fages</surname>
<given-names>J.-G.</given-names>
</name>
<name>
<surname>Lorca</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Choco 4 Documentation. TASC - LS2N CNRS UMR 6241, COSLING S.A.S</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://choco-solver.org/">https://choco-solver.org/</ext-link>
</comment> </citation>
</ref>
<ref id="B44">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Qin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Eichelberger</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Impact-minimizing Runtime Switching of Distributed Stream Processing Algorithms</article-title>,&#x201d; in <conf-name>Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference</conf-name>, <conf-loc>Bordeaux, France</conf-loc>, <conf-date>March 15, 2016</conf-date> (<publisher-name>CEUR-WS.org</publisher-name>). </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segal</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Frieder</surname>
<given-names>O.</given-names>
</name>
</person-group> (<year>1993</year>). <article-title>On-the-fly Program Modification: Systems for Dynamic Updating</article-title>. <source>IEEE Softw.</source> <volume>10</volume>, <fpage>53</fpage>&#x2013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1109/52.199735</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seifzadeh</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Abolhassani</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Moshkenani</surname>
<given-names>M. S.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A Survey of Dynamic Software Updating</article-title>. <source>J.&#x20;Softw. Evol. Proc.</source> <volume>25</volume>, <fpage>535</fpage>&#x2013;<lpage>568</lpage>. <pub-id pub-id-type="doi">10.1002/smr.1556</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>&#x160;elajev</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Gregersen</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Using Runtime State Analysis to Decide Applicability of Dynamic Software Updates</article-title>,&#x201d; in <conf-name>Proceedings of the 12th International Conference on Software Technologies</conf-name> (<publisher-name>SciTePress</publisher-name>), <fpage>38</fpage>&#x2013;<lpage>49</lpage>. </citation>
</ref>
<ref id="B48">
<citation citation-type="book">
<comment>[Dataset]</comment> <collab>The Apache Software Foundation</collab> (<year>2015a</year>). <source>Apache Flink</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://flink.apache.org/">https://flink.apache.org/</ext-link>
</comment> </citation>
</ref>
<ref id="B49">
<citation citation-type="book">
<comment>[Dataset]</comment> <collab>The Apache Software Foundation</collab> (<year>2015b</year>). <source>Apache Spark</source>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://spark.apache.org/">https://spark.apache.org/</ext-link>
</comment> </citation>
</ref>
<ref id="B50">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Toshniwal</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Taneja</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Shukla</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ramasamy</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). &#x201c;<article-title>Storm@twitter</article-title>,&#x201d; in <conf-name>Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data</conf-name>, <conf-loc>Snowbird, UT</conf-loc> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>147</fpage>&#x2013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1145/2588555.2595641</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Hoeve</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>The Alldifferent Constraint: A Survey</article-title>. <source>arXiv preprint cs/0105015</source>, <fpage>1</fpage>&#x2013;<lpage>42</lpage>. </citation>
</ref>
<ref id="B52">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zaharia</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Chowdhury</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Franklin</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Shenker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Stoica</surname>
<given-names>I.</given-names>
</name>
</person-group> (<year>2010</year>). &#x201c;<article-title>Spark: Cluster Computing with Working Sets</article-title>,&#x201d; in <conf-name>HotCloud&#x2019;10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing</conf-name>, <conf-loc>Boston, MA</conf-loc> (<publisher-loc>United States</publisher-loc>: <publisher-name>USENIX Association</publisher-name>),&#x20;<fpage>10</fpage>. </citation>
</ref>
</ref-list>
</back>
</article>