A framework for implementing metaheuristic algorithms using intercellular communication

Metaheuristic procedures (MH) have been a trend driving Artificial Intelligence (AI) researchers for the past 50 years. A variety of tools and applications (not only in Computer Science) stem from these techniques. Also, MH frequently rely on evolution, a trademark process involved in cell colony growth. Generally, MH are used to approximate the solution to difficult problems but require a large amount of computational resources. Cell colonies harboring synthetic distributed circuits using intercell communication offer a direction for tackling this problem, as they process information in a massively parallel fashion. In this work, we propose a framework that maps MH elements to synthetic circuits in growing cell colonies. The framework relies on cell-cell communication mechanisms such as quorum sensing (QS) and bacterial conjugation. As a proof-of-concept, we also implemented the workflow associated to the framework, and tested the execution of two specific MH (Genetic Algorithms and Simulated Annealing) encoded as synthetic circuits on the gro simulator. Furthermore, we show an example of how our framework can be extended by implementing another kind of computational model: The Cellular Automaton. This work seeks to lay the foundations of mappings for implementing AI algorithms in a general manner using Synthetic Biology constructs in cell colonies.


3
This alternative paradigm for designing, implementing and executing MH is developed in the context of large-scale individual systems. The original population of solutions that take part in the execution is replaced by a set of individual entities, such as cells (in this work bacteria, specifically). Large-scale parallelism is a consequence of moving towards this new paradigm, but also the use of the procedure within a biological environment. This expands the scope of MH, establishing a wider array of possible implementations and problems to tackle. This association is logical, as these techniques work on a set of different elements (solutions) and apply changes on these elements to explore a search space and eventually reach a good solution in a reasonable amount of time according to specific constraints.
MH have been long studied and possess a defined structure 16 . One approach towards implementing AI using Synthetic Biology is shown in this article in the form of a framework that automates the mapping of MH elements to synthetic constructs. A proof of concept is implemented to show the automation of the process and generation of readily executable simulation files for the cell colony simulator gro 17,18 .

RESULTS
One key aspect for using MH is to be able to represent all elements necessary for the execution of the procedure. Mainly, this involves the solution pool used in the execution, a fitness function to evaluate different solutions, and operations that carry out the exploration of new solutions. The application and design of these elements in a context of Synthetic Biology is not straightforward, as often they are dependent on the problem to solve. However, in this work, we propose a general mapping scheme to relate each of the elements which participate in an MH to a functional synthetic construct and make the association easier. The whole set of constructs is then organized and 4 distributed over a pool of individuals (in this case, bacteria) to represent, and reproduce dynamics associated to the procedures. These constructs are designed from a general standpoint and seek to translate each of the involved components using transcriptional logic gates, intercellular communication mechanisms, and external elements such as environmental signals. It should be highlighted that the mapping presented here is a proposal and could be complemented and extended with other kinds of mechanisms, such as CRISPR 19,20 systems, external conditions such as temperature, nutrient consumption, or specific spatial conditions. Also, it should be stressed that our proposal heavily relies on intercellular communication, since it offers a higher computation power and also distributes it among the colony cells. The main intercell communication processes used were bacterial conjugation [21][22][23] and Quorum Sensing (QS) [24][25][26][27] .
The framework is composed by three parts: 1) A set of parameters that configure the execution of the instance of a MH. This set of parameters is always the same for the selected technique, despite having specific values to solve different problems. At this stage, the input parameters for the procedure are abstracted and generalized for multiple instances of the selected MH.
2) A mapping/translation language to relate specific elements of the MH technique to genetic circuits. This is the fundamental idea and value of the presented work, as it provides the blueprints for automating the design of MH in Synthetic Biology. How specific elements are ported to a genetic circuit will be discussed further in this section.

5
3) An interpreter that automates the translation of the specification of the algorithm into a skeleton of a gro source code, so the MH can readily be simulated and tested. The output design of this interpreter is generated based on 2) and configured based on 1).
A depiction of the framework elements and their relationship is shown in Figure 1.  as first, it eliminates the need for fully understanding all intricate mechanisms of the MH despite being able to use it, and second, automates its design thanks to the mapping that translates all of the elements into gene circuits (that are outputted in the form of a gro specification file, but also set a starting point in the design of gene circuit implementation related to the MH in the wet-lab).
We believe that each MH implemented for cell colonies following the proposed approach, and pursuing an optimization goal, represents a specialized form of Directed Evolution. It establishes further definition and control from an algorithmic standpoint, because the general algorithmic logic and evolution steps are explicitly specified. Furthermore, this continuous evolution is constantly being evaluated in MH by means of a fitness function. The variability for expressing and implementing this function within the context of our framework offers improved flexibility, expressiveness and specificity in the expected solutions with respect to the original definition of Directed Evolution.
Circuit design is done sequentially over the framework on the basis of fundamental part integration and the idea that all of the components of the MH procedure can be expressed in terms of these parts. Such parts merge into a more complex circuit that evaluates the inputs and outputs a function of these inputs. The circuits implement different elements of an MH, such as fitness function, solution representation, or evolution operators. These circuits will be presented and described after reviewing the basics on MH.

Metaheuristics (MH)
MH are probabilistic techniques that take inspiration on certain observed general phenomena. The dynamics of the observed phenomena are then simplified and expressed in procedures that use input parameters for configuring and following the execution sequence. MH are mostly used for 7 approximating solutions to difficult optimization problems. The simplification of the phenomenon and general approach of algorithm execution is a heuristic 28 . The heuristic is a function that seeks to guide the execution by estimating the reward that would be associated to carrying out a given step or strategy.
Inspiration for MH can originate from the most varied situations. Early instances of these techniques are Evolutionary Strategies 29 , Genetic Algorithms 30,31 and Genetic Programming 32 .
These procedures are all based on the phenomenon of natural selection. Evolving solutions, evaluating them and selecting for the best ones is the main heuristic driving these techniques. Since the evolution of the solution is guided by the heuristic, and the techniques are probabilistic, there is no certainty of convergence or eventually reaching the global optimum (and therefore are tagged as approximation procedures). However, the current best solution will tend towards an optimum (either local or global). Solutions are explored over a large landscape, called the search space.
Since exploring the whole search space is not possible in most cases, strategies to partially visit the search space and evaluate solutions -to find the best possible one -are implemented. In sum, the procedure will actually be capable of improving the solution more by further evolving it, and in the long run (possibly infinite), will find the best possible solution. This same scenario can be modelled for other inspiration sources such as metallurgy processes, ant colonies or bird flock movements to name a few.
In the next subsections, the two MH (Simulated Annealing and Simple Genetic Algorithms) that will specifically be implemented and tested through our framework will be described.

Simulated Annealing (SA)
SA 8-10 is a MH that uses a controlled annealing process as inspiration for searching for optimal solutions. The goal of this algorithm is to best approximate a global optimum of a function. The definition of SA relies mainly on a temperature cooling function: the real annealing process requires the input metals to first be heated, merged, and later slowly cooled down. The goal is to achieve larger stable crystallization. Hence, the size and stability of the crystals depends on how the metal mixture is cooled down.
Being mainly an optimization technique, SA seeks to improve a solution according to specific problem conditions. The procedure works iteratively: first, a random solution is chosen and stored as the best one that has been found. Then, at each step, a random solution in the neighborhood of the previously chosen one is selected and evaluated. This new solution replaces the best one found with a given probability, and dependent on the temperature function and the fitness value of each of the involved solutions. This means that even a worse quality solution could replace the current best solution with a certain probability. The temperature function represents the probability of accepting any solution as a better one while the search space is explored. This function decreases as the algorithm execution progresses, entailing a gradually more localized search. The spirit of exploring potentially worse quality solutions is to reach other better solutions through them, and to not stay trapped at a local optimum.
The procedure of SA is shown in Figure 2.

Simple genetic algorithm (SGA)
SGA 30,31 finds its origins in evolution: it is an iterative MH that evaluates and evolves a pool of solutions in search of the fittest one. It is strongly based on evolution defined by Charles Darwin 33 .
The key to assessing each solution lies in the definition of a fitness function, that returns a score based on the features of the individual being evaluated. Two kinds of operations are used for evolving: crossover and mutation. Each of these operations represents, respectively, local search

Synthetic circuit designs for MH simulation
Heuristics are embedded in MH through a fitness function that evaluates and guides the search for best solutions within a search space. As a base assumption in this context, we will link the individual solution to the information inside a single cell. The presence or absence of a set of proteins of interest will act as the specific solution instance. Therefore, depending on which proteins are present or absent, each cell represents an individual and independent solution.
Both evaluation and evolution dynamics will be implemented by taking advantage of cell capabilities. Bacterial conjugation will be used as the main backbone for evolution operations both in SA (solution mutation) and in SGA (crossover operation). Since solutions are represented as a set of proteins within a cell, perturbations of the set occur upon the arrival of a plasmid containing new proteins of interest into the cell. Fitness evaluation is organized in a synthetic circuit that senses the presence or absence of the proteins and performs a certain action (GFP expression in this case) when fitness is optimal.
A summary of the complete proposed mapping for both SA and SGA is depicted in Figure 4. if said evaluation is successful, triggers an action. In this case, it is GFP expression, but that action may be replaced by any other. B. The mutation operation for SGA acts on the expression of a specific protein in the design, changing the solution to evaluate. The mutation rate parameter for SGA maps directly to the mutation rate configured in the simulation. This operation accounts for global search in terms of the solution exploration. C. Crossover is a recombination operation that we mapped to bacterial conjugation. Part of a foreign solution is integrated to the current one. In our mapping, we individualized a single protein to be held by a unique plasmid, therefore mobilizing a single protein between solutions for recombination. Conjugation rate is the parameter that accounts for the SGA crossover rate parameter. D. SA is largely based on a temperature decrease function: we use environmental signals (such as aTc or IPTG) as its representation. The temperature is associated to the signal concentration at a given location. The decrease is achieved by the shoving mechanical effects of the cell colony: the center of the colony experiences a greater temperature that the outer sections.

13
Other evolution models

Cellular automaton (CA)
A CA 34-36 is a n-dimensional grid structure where each of its cells has a state. These states are Algorithm-related circuit construction process is automatically done in the next step.
These are all general parameters that are useful for specifying both SGA and SA. However, some specific parameters must also be collected in the case of each algorithm. For SGA, both mutation 15 and crossover ratios must be provided. Mutation is implemented as an arbitrary change in the state of a protein within a cell, while crossover is simulated as a bacterial conjugation event. In the case of SA, the basic additional parameter is the temperature decrease ratio. In terms of gro simulations, this ratio is translated to the diffusion factor of an environmental signal (such as aTc), since the temperature can be associated to the signal concentration.
For implementing CA, they key parameter is setting intercell signaling using appropriate diffusion and degradation ratios. These parameters configure the distance from the signaling cell to its furthest neighbor. The goal of this configuration process is to emulate the Moore neighborhood in 2D. Within these settings, rules are encoded based on the concentration of signal sensed by cells.
An example is shown in Figure 5, and a summary of the parameters involved in the framework (and for CA modeling) is compiled in Table 1.

Automated generation of gro skeleton simulation file
Finally, with the model in place and informed by the input parameters, a simulation file generator constructs a gro simulation skeleton file. The mapping to the gro file takes place using the abstractions present in the simulator such as proteins, plasmids, environmental signals, etc. This file can be directly run by the gro simulator, or it can be modified by the final user for more specific operation.

SGA examples were designed and implemented for simulation. In all examples, each solution is
represented as a set of proteins that can be either present or absent. Each of these proteins is under the control of a promoter as a single gene in an operon. In turn, each operon resides in a different conjugative plasmid.
Each solution is then evaluated through a fitness function. This function is encoded in an operon that checks for a subset of necessary proteins that should be present, a subset of detrimental proteins that should be absent, and a subset of proteins that have no effect on the fitness of the solution. The operons that implement this function are also encoded in a single plasmid. Cells that comply with the requirements of the fitness function are classified as optimal solutions. Optimal solutions are marked by expressing GFP, and all other bacteria are uncolored. This is done merely for simulation purposes, but GFP could be replaced with different processes such as cell death, growth rate configuration or intercell signaling, depending on the purpose of the evaluation.
Crossover operation is mapped to bacterial conjugation between cells. Conjugation rate is therefore associated to the crossover rate parameter of the original SGA. Mutation operation was modeled as promoter mutation leading to incorrect functioning of the circuit, and arbitrary change in protein expression. Selection is random, since arbitrary recombination occurs, and bacterial conjugation is a simulated as a stochastic process. The last important element of SA, the temperature value, was related to an aTc global signal.

SA execution examples
Therefore, this value is linked to the concentration of aTc at a given location. Temperature decrease is simulated mechanically, as cells are pushed outwards and experience a lower aTc concentration (equivalent to a lower temperature).

CA execution example
Finally, and as an extension to the proposed framework, we implemented a simulation in gro relying mainly on QS. This was necessary to map the idea of Moore neighborhood to a cell colony context (see Figure 5). To preserve a static neighborhood, we eliminated growth from the colony 20 simulation. The implemented CA simulation was an adaptation of Conway's Game of Life. The synthetic circuit to implement this logic is based on the idea of band detection 38,39 : overcrowding and under-crowding are conditions that induce grid cell death, while a mid-level crowding amount induces grid cell life. The equivalence between the original Game of Life model and the proposed simulation is that a grid cell is mapped to a single cell in a colony.
The color code for this simulation was to use RFP for live cells and uncolored cells for "dead" ones. Cell state is determined based on the concentration of AHL at the cell location: high and low concentrations induce the cell to the "death" state, while a mid-level concentration makes the cell glow red. There are also some green cells, which have the task of starting the system, since there is no initial amount of AHL in the environment. These cells are placed randomly in the colony and are controlled by an environmental signal (aTc) as a start switch.
A summary of the circuits implemented for the simulations is depicted in Figure 6. It should be noted that a CA does not use a fitness function, therefore does not include this part in the implementation. One of the modules initiates the system, since no AHL signals are present in the beginning (and represent the neighborhood signaling for a "live cell"). Once the system is started, the first module ceases its operation and the other module, an adaptation of the Basu et al. 38

DISCUSSION
We presented a new framework that proposes a mapping to associate MH to genetic circuits to be encoded in a growing cell colony. The idea behind this proposal is to bring the inspiration from evolution, used for EA (and more broadly MH), back to its origins -an evolving and growing cell colony -for assessing its viability as an implementation testbed. One of the advantages of this approach is that several intrinsic processes involved in the cell and necessary for evolution, such as growth, gene circuit operation, or intercell communication need not be artificially imposed on the model. However, it is their integration into the model, and the level of control which must be studied and adapted to be a suitable element within the mapping. One example of such features may be mutations that occur in the DNA sequence: although this is a process that can be directly linked to mutations in the definition of SGA, for instance, it is also practically impossible to guarantee a mutation rate (as a parameter for SGA). This leads to a couple of possible options.
First, a redefinition of MH within a different paradigm immersed in a biological context.
Adaptation of MH to a context in which some parameters and/or elements are not fixed, controllable or can be mapped. Second, a direct and artificial mapping of the MH, forcing relationships and mappings to maintain a strict link to their original definition. Since our testbed was a simulation platform, we took a hybrid approach, leaving some of the processes, such as colony growth, to be controlled by the simulator and to be interpreted within the MH execution.
This flexibility can be observed, for example, in the implementation of the original version of SA, where there is no mention of "growth" in its definition. Our implementation maps it to the evolution of the temperature function, and thanks to mechanical shoving of the cells outwards of the colony, acts as temperature decrease (aTc concentration is a candidate to represent the temperature measure). However, a strict link was maintained to the evolution of the solution itself and encoded in the cells as a boolean function based on plasmid and/or protein presence. This could have been modeled in a different manner and only have relied on intrinsic mutation. In sum, we propose a framework and one possible mapping for relating MH to synthetic circuits, however, other possible mappings are also valid. Concerning CA, QS is a key player in our implementation, driving the simulation of the model, as it relies heavily on the signal diffusion and degradation parameters. It is very difficult to faithfully reproduce an original Moore neighborhood using QS, since it is not guaranteed that the neighborhood includes a specific number of neighbors or that diffusion can be parametrized, in-vitro or in-vivo, to an extent that an immediate neighborhood can be detected with a very low concentration of AHL. However, we have shown that it is possible to simulate a CA using (simulated) cell colonies. Characterizing the power of CA in cell colonies and specifying the limits of the expressivity for these models becomes an important matter, as there are cases of CA that are Turing-complete models 40,41 .
We believe that the model proposed in this work can be directed towards solving problems involving a large number of variables and in which many solutions need to be evaluated. This is based on the fact that cell colonies have very large counts and large-scale parallelism in the solution evaluations is possible. We think that the number of solutions that is evaluated in parallel within this context is not something that can be achieved by a traditional computer. However, it is also true that even though the proposed model is extensible, a lack of well-characterized synthetic parts may pose a problem in terms of orthogonal intercell communication [42][43][44][45] and variety of elements to 24 construct large synthetic circuits. Future in-vivo implementations can be assisted by software to help select the proper parts for the design 46,47 . Typical SGA and SA algorithms running in a computer use a relatively "low" number of solutions with respect to our proposal (and our testbed is still low on cell counts). Cell colonies are capable of evaluating orders of magnitude more solutions. One constraint to which our work is subjected at this stage is that the current version of the gro simulator works with digital proteins. We are aware that this is a large limitation and that the power of the proposed framework would increase greatly by extending its definition to work with analog values of protein expression instead of digital values.
An existing and common related technique is directed evolution. We believe that including rules defined by a MH, evolution can be controlled further and more precisely. Also, it is our opinion that the process is made autonomous, since the selection machinery can be programmed and expressed in terms of synthetic circuits. Also, if multicellular distributed circuits with intercell communication are taken into account, complex computation and conditions can be described 48,49 .
Protein engineering 50 is an application to which this research could be applied in that the properties or functionalities of the protein are encoded as a fitness function, establishing the selection mechanism for desired proteins to be evolved.

Future work
Current research is being invested into relating different AI algorithms such as Neural Networks, Reinforced Learning (Q-Learning) and other MH, such as Ant Colony Optimization, to our framework. Also, a related direction would be to further study the proposed framework by broadening the tools used to implement the underlying synthetic circuits. An idea in this direction 25 would be to include bacteriophage infection as an intercell communication method 51 in the framework definition. The current renewed interest in AI, and its need of powerful computational resources, offers a huge opportunity for directing the potential of Synthetic Biology towards satisfying those needs and providing an alternative paradigm (and more natural, since inspiration for most MH actually comes from biology) for solving difficult problems. The goal of this ongoing and future research is to reach the definition of a global AI framework 52 . Characterizing CA computational power in cell colonies is another interesting and important topic to study. Another long-standing debt of our research group is the linkage of the gro simulator to accept SBOL 53,54 specifications as input. In the context of the work presented in this paper, the association of SBOL to Agent/Individual based Model (AbM/IbM) simulators such as gro can go further and entail an AI toolkit within SBOL for immediate implementation of such algorithms in cell colonies. A more distant direction is to use the framework to internally implement EA for Xenobots 55 using division of labor 56 to perfect and automate their functionality.

Materials and Methods
We