Edited by: Alex Pitti, Université de Cergy-Pontoise, France
Reviewed by: Frederic Alexandre, Inria Bordeaux - Sud-Ouest Research Centre, France; Benoît Girard, Centre National de la Recherche Scientifique (CNRS), France
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
We propose an architecture for the open-ended learning and control of embodied agents. The architecture learns action affordances and forward models based on intrinsic motivations and can later use the acquired knowledge to solve extrinsic tasks by decomposing them into sub-tasks, each solved with one-step planning. An affordance is here operationalized as the agent's estimate of the probability of success of an action performed on a given object. The focus of the work is on the overall architecture, while individual sensorimotor components are kept simple. A key element of the architecture is the use of “active vision” that serves two functions, namely to focus on single objects and to factorize visual information into the object appearance and object position. These processes serve both the acquisition and use of object-related affordances, and the decomposition of extrinsic goals (tasks) into multiple sub-goals (sub-tasks). The architecture makes novel contributions to three problems: (a) the learning of affordances based on intrinsic motivations; (b) the use of active vision to decompose complex extrinsic tasks; (c) the possible role of affordances within planning systems endowed with models of the world. The architecture is tested in a simulated stylized 2D scenario in which objects need to be moved or “manipulated” in order to accomplish new desired overall configurations of the objects (extrinsic goals). The results show the utility of using intrinsic motivations to support affordance learning; the utility of active vision to solve composite tasks; and the possible utility of affordances for solving utility-based planning problems.
This work proposes an architecture for the control and learning of embodied agents. The architecture has been developed within an open-ended learning context.
Example scenario used to test the open-ended learning architecture proposed here.
Facing the challenges posed by the scenario requires different functions. The functions used by the architecture proposed here are summarized in
Main functions (gray boxes) and their relations (arrows with labels) implemented by the architecture proposed here. The dashed and dotted frames contain the main functions used in the intrinsic phase and extrinsic phase, respectively. During the intrinsic phase
The proposed architecture has been developed within the area of developmental and autonomous robotics called
A central concept in open-ended learning is the one of
While various works use IMs as a means to directly guide the autonomous learning of skills (e.g., Schmidhuber,
Much research on open-ended learning has focused on the autonomous acquisition of knowledge during “intrinsically motivated” phases. On the other hand, only a few works (e.g., Schembri et al.,
In the extrinsic phase, we consider a test that requires solving a complex task formed by multiple sub-tasks each involving a specific object. The reason why we focus on these types of complex tasks is that: (a) they are involved in most sensorimotor non-navigation robotics scenarios requiring object-manipulation; (b) active perception (see below) can be extremely useful to tackle these scenarios. A possible strategy to solve complex tasks is based on
In order to study the relations between affordances (see below) and planning, we focus here on two types of planning strategies investigated in the literature on planning (Ghallab et al.,
As mentioned above, an important dimension of the focus of this work concerns
In the literature, the concept of affordance has occasionally been broadened to refer to various elements relevant for planning. For example, affordances have been associated with three functions linking three critical elements of behavior (Montesano et al.,
Here we contribute to the investigation of the possible functions of affordances for autonomous agents by assuming a restricted definition of them. This definition allows us to evaluate the utility of affordances within planning systems and also to contribute to clarifying the relation between the concept of affordance used in psychology and in robotics. Informally, the definition is this:
We illustrate the features of this definition and its differences and links with other definitions. (a) The definition is more specific than other definitions that often have vague features. (b) The states
We now introduce a pivotal feature of our system, the use of
A first contribution of this study is on how intrinsic motivations can support efficient learning of affordances, in particular when an attention mechanism focusing on objects is used. There are some previous studies linking affordance learning to intrinsic motivations and active learning (Ugur et al.,
A second contribution of this work is the study of how the introduction of the attention mechanism, extracting information about the single object and about the object appearance/location impacts (a) the affordance learning process and (b) the second extrinsic phase where planning is needed to accomplish an extrinsic complex goal. The first issue has only been indirectly studied in the literature on affordances where models often assume pre-processing mechanisms to extract information on specific objects (see also the “OAC – Object Action Compound” framework pivoting on object information; Krüger et al.,
A third contribution of this work concerns the relationship between affordances and planning. In particular we will face the problem of what could be the utility of affordances, defined as the probability estimate of action success, within a planning system that is endowed with refined components implementing
The rest of the paper is organized as follows. Section 2 illustrates the experimental setup, and the architecture and functioning of the system. Section 3 shows the results of the tests. Section 4 compares the system proposed here to other systems proposed in the literature. Finally, section 5 draws the conclusions and illustrates open problems that might be tackled in the future.
The experimental scenario (
Each test consists of two phases: the
The agent (section 2.3) is endowed with a simulated camera sensor that can look at different sub-portions of the working space, and is able to select and perform four actions on the object that is at the center of its camera. The actions can move the object to a new position or change its texture to a particular color (red/green/blue); different objects afford only a subset of these actions. As often assumed in the affordance literature, action execution is based on pre-programmed routines implementing the effect of the action that is selected and triggered.
Three versions of the system are implemented and compared: IGN, FIX and IMP. The three systems differ in the IM mechanisms they use to support affordance learning: FIX uses a mechanism taken from the literature (Ugur et al.,
The working space is formed by a 150 × 150 pixel square. The points of the working space are encoded as a 3D binary array where the first 2 dimensions encode the x-y pixel position, and the third dimension encodes the color (RGB). The color of the working space background is black. All objects are initially located on the vertexes of a 3 × 3 regular grid (white square in
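The pixel encoding just described can be sketched in a few lines (a minimal illustration, assuming NumPy; the constant and helper names are ours, not the paper's):

```python
import numpy as np

# Working space: 150x150 pixels, 3 binary color channels (RGB).
# The background is black (all zeros); an object activates the
# channels of its color at its pixels.
WORKSPACE_SIZE = 150  # hypothetical constant name

workspace = np.zeros((WORKSPACE_SIZE, WORKSPACE_SIZE, 3), dtype=np.uint8)

def draw_square(space, center_xy, side, rgb):
    """Activate the binary color channels of a square object."""
    x, y = center_xy
    h = side // 2
    space[y - h:y + h, x - h:x + h] = rgb  # rgb is a 0/1 triplet

# A red square (side 0.1 units = 15 pixels) placed on a grid vertex.
draw_square(workspace, (37, 37), 15, (1, 0, 0))
```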
Each object presents the following attributes: (a) center: x-y coordinates; (b) color: three values for red, green, and blue; (c) shape: either square, circle, or rectangle. Assuming the working space has a side measuring 1 unit, the circle has a diameter measuring 0.1 units, the square has a side measuring 0.1 units, and the rectangle has sides measuring 0.6 and 0.16 units.
As a consequence of the actions performed by the agent, the position and color of the objects can change. This defines the possible affordances of objects: “movable,” “greenable,” “redable,” and “bluable.” Each object has a specific subset of affordances, for example a blue circle is “movable” and “redable.” In some tests, affordances are stochastic in the sense that the related actions can produce an effect only with a certain probability.
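Operationally, the affordances of the scenario can be summarized as success probabilities keyed by object and action. A toy sketch (the names and the 0.5 threshold convention are illustrative; values follow the pattern that an object of a given color cannot be turned into that same color):

```python
# Estimated probability that an action succeeds on an object.
AFFORDANCES = {
    ("red_square", "move"):  1.0,
    ("red_square", "green"): 1.0,
    ("red_square", "red"):   0.0,  # no change -> no affordance
    ("red_square", "blue"):  1.0,
}

def is_afforded(obj, action, threshold=0.5):
    """An action is considered afforded if its estimated success
    probability reaches the threshold."""
    return AFFORDANCES.get((obj, action), 0.0) >= threshold
```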
The system controller consists of three different components (
System architecture: main components.
The perception component is responsible for the attention processes supporting visual exploration of both the environment and, in the extrinsic phase, the goal image. The perception component implements two attention processes: an inner attention process operating in parallel with an outer attention process. The outer attention scans the environment on the basis of two bottom-up processes, both affecting gaze (their effects sum): the first process is sensitive to the saliency of objects, and the second process is sensitive to the changes of the appearance of objects produced by actions. The inner attention scans the goal image on the basis of either the saliency of objects or by having the same focus as the outer attention process. All these attentional processes are activated when needed on the basis of intrinsic motivations or planning processes, as we now illustrate in more detail.
Attention actively guides an RGB visual sensor (a pan-tilt camera) returning an image centered on the current attention focus and sufficient to always cover the whole working space independently of the gaze pointing. The central part of such an image, called
The first type of bottom-up attention process, the saliency-based one, is driven by the most “salient” elements in the peripheral image, here the activation of pixels corresponding to objects. This process is implemented as follows. First, random noise (ϵ ∈ [−0.05, 0.05]) is added to each pixel of the peripheral image and the resulting image is smoothed with a Gaussian filter. Then the pixel with the maximum activation is used as the focus of attention (but if a change happens the second bottom-up attention mechanism also intervenes, see below). Thanks to the Gaussian smoothing, the focus falls around the center of the focused object, which thus becomes wholly covered by the focus image. The noise fosters exploration as it adds randomness to the saliency of objects, thus leading the system to explore the different objects.
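The saliency process just described (noise, Gaussian smoothing, argmax) can be sketched as follows, assuming NumPy and SciPy; the parameter values and function name are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def saliency_focus(periphery, noise=0.05, sigma=3.0):
    """Bottom-up saliency: add small uniform noise to each pixel,
    smooth with a Gaussian filter, and attend to the maximum.
    Returns the (row, col) of the attention focus."""
    salience = periphery + rng.uniform(-noise, noise, periphery.shape)
    salience = gaussian_filter(salience, sigma=sigma)
    return np.unravel_index(np.argmax(salience), salience.shape)

# A single bright "object" near (40, 60) on a dark periphery:
# the smoothed maximum falls near the object's center.
image = np.zeros((150, 150))
image[35:45, 55:65] = 1.0
focus = saliency_focus(image)
```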
Note that in the future more sophisticated approaches might be used to ensure that the focus-image involves only the focused object of interest (e.g., object-background segmentation approaches). Moreover, other mechanisms might be used to ensure a more efficient scan of the environment (e.g., inhibition of return might be used to avoid scanning the same location multiple times). These mechanisms are not considered here for simplicity and because the focus of this research is on the
The second bottom-up attention process, sensitive to changes, is dedicated to detecting the effects of actions. The process works as follows. First, the system focuses on the portion of space where a
An affordance is considered to be in place if: (a) an effect is detected; (b) the effect is the one related to the performed action (e.g., the object is displaced by the action “move,” or the object is made green by the action “change object color to green”). This information is used to train the affordance predictors. The focus image and object position after the action execution are used to train the effect predictors.
The system is equipped with four actions: move object, change object color to green, change object color to red, and change object color to blue. The move action can displace objects in the environment if they have the affordance for this effect. The move action is parametric: it affects the target object (the object under focus) on the basis of two parameters corresponding to the object's desired x-y location. During the intrinsic phase, the desired location is randomly generated within the working space (excluding positions that cause object overlapping). During the extrinsic phase, the target location corresponds to the location of the “sub-goal” that the system is currently attempting to accomplish (see section 2.4.3). The color-changing actions are non-parametric: they simply change the color of the target object into the desired one if the object has the affordance for the corresponding effect. A single color-change action parameterized with the color could have been used instead (this would have involved a discrete parameter vs. the continuous parameters of the move action); we chose non-parametric color actions so that the system's machinery is exercised with both parametric and non-parametric actions.
The predictor component is formed by 16 predictors (these are regressors), 4 for each of the 4 actions: (a)
The affordance predictors estimate the affordance probability
Each learning-progress predictor gets as input the focus image and returns, with a continuous linear output unit, the learning progress of the associated affordance predictor. The predictor is updated with a supervised learning rule where the target for learning is the difference in the output of the corresponding affordance predictor, computed before and after the action is performed (and before the affordance predictor is updated).
Each of the what-effect predictors gets as input the focus image and predicts, with sigmoidal output units, the focus image after the action performance. The predictor is updated with the same learning rule, with a target corresponding to the observed focus image after the action is performed.
Each of the where-effect predictors gets as input the initial (x, y) position of the target object and the desired (x, y) position of the object depending on the sub-goal, and predicts, with two linear units, the object (x, y) position after the action performance [x and y coordinates are each mapped to the range (0, 1)]. The predictor is updated with a supervised learning rule where the target for learning is the object position after the action is performed.
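The affordance predictors can be illustrated with a minimal logistic regressor updated by a delta rule toward the observed 1/0 action outcome. This is a sketch under our own simplifications, not the paper's exact networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AffordancePredictor:
    """Toy affordance predictor: a logistic regressor mapping a
    flattened focus image to the estimated probability that the
    associated action succeeds on the focused object."""
    def __init__(self, n_inputs, lr=0.01):
        self.w = np.zeros(n_inputs)
        self.b = 0.0
        self.lr = lr

    def predict(self, focus_image):
        return sigmoid(self.w @ focus_image + self.b)

    def update(self, focus_image, success):
        """Delta-rule update toward the observed outcome (1 or 0)."""
        error = float(success) - self.predict(focus_image)
        self.w += self.lr * error * focus_image
        self.b += self.lr * error
        return error

# Train on a toy "object code": this object affords the action,
# so the prediction converges toward 1.
x = np.array([1.0, 0.0, 1.0])
p = AffordancePredictor(3, lr=0.5)
for _ in range(200):
    p.update(x, success=True)
```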
In this section we first present the motivation signals (section 2.4.1) and the algorithm for learning affordances and forward models (section 2.4.2) used by the three versions of the system (IGN, FIX, and IMP) during the intrinsic phase. Then we describe the two algorithms of the attention-based goal planner (section 2.4.3) and the attention-based utility planner (section 2.4.4) used in the extrinsic phase.
In the intrinsic phase, the system autonomously explores the objects in the environment to learn affordances and train its predictors. The exploration process is driven by IMs related to the knowledge acquired by the affordance and learning-progress predictors. Depending on how the IMs are implemented, we have three versions of the system: FIX, IGN, and IMP.
The FIX system uses an IM mechanism for affordance learning like the one used by Ugur et al. (
In our case, as a measure of how interesting the current object is we consider the Shannon entropy of the estimated affordance probability p: H(p) = −p log(p) − (1 − p) log(1 − p),
where we considered
Following Ugur et al. (
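A sketch of this entropy-based interest measure (the fixed threshold value is illustrative, not the one used by FIX):

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of an estimated affordance
    probability p: maximal (1 bit) at p = 0.5 and zero when the
    outcome of the action is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def interesting_fix(p, threshold=0.2):
    """FIX-style rule: keep exploring the object while the entropy
    of its affordance estimate exceeds a fixed threshold."""
    return binary_entropy(p) > threshold
```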
The IGN system (IGN stands for “IGNorance”) is a first version of our system that is directed to overcome the limitations of the IM mechanism of FIX. The new mechanism uses a dynamic threshold
where
A limitation of the IM mechanisms of IGN is that it is not able to cope with stochastic environments where the success of an action is uncertain (e.g., when the agent tries to move an object, the object moves only with a certain probability). In this case, the affordance predictor will tend to converge toward the corresponding probability
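This convergence of a success-trained predictor to the underlying success probability can be checked with a toy simulation (illustrative learning rate and step count):

```python
import random

random.seed(0)

def train_estimate(p_success, steps=5000, lr=0.01):
    """With a stochastic affordance, an estimate trained toward the
    1/0 outcomes converges to the success probability itself, so
    prediction 'errors' persist even once learning is complete."""
    estimate = 0.5
    for _ in range(steps):
        outcome = 1.0 if random.random() < p_success else 0.0
        estimate += lr * (outcome - estimate)
    return estimate

estimate = train_estimate(0.7)  # hovers around 0.7, never reaches 0 or 1
```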
The IMP system (IMP stands for “IMProvement”) overcomes the limitations of IGN by suitably coping with stochastic environments. The motivation signal used by IMP is implemented as the absolute value of the learning-progress predictor output (let's call this
The mechanism of the leaky average threshold, used in IGN and IMP, allows the agent to indirectly compare the relative levels of how interesting different objects are, and to focus the exploration effort on the most interesting of them notwithstanding the fact that
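The leaky average threshold can be sketched as follows (the leak rates 0.3 and 0.1 correspond to the IGN and IMP settings reported in section 3; the trace below is illustrative):

```python
def leaky_average(avg, signal, leak=0.3):
    """Leaky average used as a dynamic motivation threshold:
    T <- (1 - leak) * T + leak * signal."""
    return (1.0 - leak) * avg + leak * signal

# The threshold tracks recent motivation signals: it rises while
# interesting objects are found (signal 1) and decays toward zero
# (signal 0) when the explored objects yield nothing new.
t = 0.0
for s in [1.0, 1.0, 1.0]:
    t = leaky_average(t, s)
high = t
for s in [0.0, 0.0, 0.0]:
    t = leaky_average(t, s)
```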
The intrinsic phase allows the system to autonomously explore the objects by looking at and acting upon them. Based on the observed consequences of actions, this allows the system to train the predictors, including those related to affordances. At each step of the phase, the system performs a number of operations as illustrated in
Intrinsic phase: one step of learning of affordances and forward models
1: (object_image, object_position) ← Scan(environment)
2: (action, motivation_signal) ← SelectActionWithHighestIM(action_list, predictors, object_image, object_position)
3: if motivation_signal > motivation_threshold then
4:     ExecuteAction(action, object_image, object_position)
5:     (new_object_image, new_object_position) ← ScanEffect(new_environment, environment)
6:     affordance ← Affordance(action, new_object_image, new_object_position, object_image, object_position)
7:     UpdateWeights(affordance_predictor, action, object_image, affordance)
8:     UpdateWeights(affordance_predictor, action, object_image, affordance, improve_predictor)    ⊳ Only IMP
9:     UpdateWeights(effect_predictors, action, object_image, object_position, new_object_image, new_object_position)
10:    motivation_threshold ← LeakyAverage(motivation_threshold, motivation_signal)    ⊳ Only IGN/IMP
11: else
12:    motivation_threshold ← LeakyAverage(motivation_threshold, 0)    ⊳ Only IGN/IMP
13: end if
The algorithm is based on the following operations and functions: (a)
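A compressed, toy version of one intrinsic-phase step can convey the interplay between motivation signal, dynamic threshold, and affordance learning. The "ignorance" signal below is our own illustrative stand-in (distance of the estimate from certainty); the paper's exact signals differ per system:

```python
import random

random.seed(1)

true_afford = {"move": 1.0, "green": 0.0}   # toy ground truth
estimates   = {"move": 0.5, "green": 0.5}   # affordance predictor outputs
threshold, leak, lr = 0.0, 0.3, 0.1

def ignorance(p):
    """Illustrative motivation: maximal when the estimate is at 0.5,
    zero when the action outcome is certain."""
    return 1.0 - abs(2.0 * p - 1.0)

for _ in range(300):
    # Select the action with the highest motivation on this object.
    action = max(estimates, key=lambda a: ignorance(estimates[a]))
    signal = ignorance(estimates[action])
    if signal > threshold:
        # Execute, observe the outcome, update the affordance
        # estimate and the leaky threshold with the signal.
        success = random.random() < true_afford[action]
        estimates[action] += lr * (float(success) - estimates[action])
        threshold = (1 - leak) * threshold + leak * signal
    else:
        # Nothing interesting: decay the threshold toward zero.
        threshold = (1 - leak) * threshold + leak * 0.0
```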
During the extrinsic phase, the system is tested for its capacity to accomplish an “overall goal” based on the knowledge acquired during the intrinsic phase. Such an overall goal is assigned to the agent through the presentation of a certain desirable spatial/color configuration of some objects in the environment. The agent stores the goal as an image (“goal image”). The configuration of the objects is then changed and the task of the agent is to act on the environment to arrange it according to the goal image.
Importantly, the agent scans the goal image through a second “inner” attention mechanism similar to the “outer” attention mechanism used to scan the external environment. This inner attention mechanism is important to parse the goal image into
The operations taking place in one step of this process are shown in detail in
Extrinsic phase: one step of goal-based planning
1: ⊳ Select non-achieved sub-goal
2: (sub_goal_image, sub_goal_position) ← Scan(goal)
3: focus_image ← ScanEnvironmentWithSameFocusAsSubGoal(environment)
4: sub_goal_active ← GoalNotAchievedCheck(sub_goal_image, focus_image)
5: if sub_goal_active then
6:     (object_image, object_position) ← Scan(environment)    ⊳ Select object
7:     (sub_goal_achievable, action) ← ActionPlanning(predictors, action_list, sub_goal_image, sub_goal_position, object_image, object_position)    ⊳ Plan action
8:     if sub_goal_achievable then
9:         ExecuteAction(action, object_image, object_position)    ⊳ Perform action
10:        focus_image ← ScanEnvironmentWithSameFocusAsSubGoal(environment)
11:        sub_goal_active ← GoalNotAchievedCheck(sub_goal_image, focus_image)
12:    end if
13: end if
Next, the agent checks if the sub-goal has not been accomplished yet. To this purpose, the function
If the sub-goal has not yet been accomplished, the system scans the environment to find a new object and then the function
Lastly, if a potentially successful action has been identified, it is executed and then the system checks again if the sub-goal has been accomplished. If so,
Importantly, the
In the case of
The utility-based planner works as shown in
where
Extrinsic phase: one step of utility-based planning
1: ⊳ Select non-achieved sub-goal
2: (sub_goal_image, sub_goal_position) ← Scan(goal)
3: focus_image ← ScanEnvironmentWithSameFocusAsSubGoal(environment)
4: sub_goal_active ← GoalNotAchievedCheck(sub_goal_image, focus_image)
5: if sub_goal_active then
6:     (object_image, object_position) ← Scan(environment)    ⊳ Select object
7:     (sub_goal_achievable, action) ← ActionPlanning(predictors, action_list, sub_goal_image, sub_goal_position, object_image, object_position)    ⊳ Plan action
8:     if sub_goal_achievable then
9:         object_utility ← ComputeUtility(object_affordance, sub_goal_value)
10:        ⊳ Computing the maximum possible utility
11:        if object_utility > potential_utility then
12:            potential_utility ← LeakyAverage(potential_utility, object_utility)    ⊳ Increase utility expectation
13:        end if
14:        ⊳ Acting if high utility is attainable
15:        if object_utility ≥ potential_utility then
16:            ExecuteAction(action, object_image, object_position)
17:        else
18:            potential_utility ← LeakyAverage(potential_utility, 0)    ⊳ Decrease utility expectation
19:            sub_goal_active ← FALSE
20:        end if
21:    end if
22: end if
When the
To test the performance of the systems, different tests were run with both deterministic and stochastic environments. Performance in the intrinsic phase was measured by evaluating the quality of the output of the predictors when receiving as input each one of the nine focused images corresponding to the nine possible objects (
In the deterministic environment two tests were run to test the goal-based planner. The first, called the
The five different scenarios (goal image and initial environment setup) used to test the systems in the extrinsic phase. The scenarios involve an increasing number of objects.
The second test, called the
Affordance predictor learning rates were α = 0.01 in the IGN and IMP systems and α = 0.002 in the FIX system. The learning rates of the learning-progress predictors were set to α = 0.005. The leaky average of the intrinsic motivation was updated with a leak rate ν = 0.3 in the IGN system and ν = 0.1 in the IMP system. The results of these tests are presented in the following sections.
In the base test, all the nine objects afford all the four actions with the exception of those not causing any change (
Base test: affordance probabilities for all objects and actions.
# | Object | Move | Green | Red | Blue
1 | Red square | 1.0 | 1.0 | 0.0 | 1.0
2 | Green square | 1.0 | 0.0 | 1.0 | 1.0
3 | Blue square | 1.0 | 1.0 | 1.0 | 0.0
4 | Red circle | 1.0 | 1.0 | 0.0 | 1.0
5 | Green circle | 1.0 | 0.0 | 1.0 | 1.0
6 | Blue circle | 1.0 | 1.0 | 1.0 | 0.0
7 | Red rectangle | 1.0 | 1.0 | 0.0 | 1.0
8 | Green rectangle | 1.0 | 0.0 | 1.0 | 1.0
9 | Blue rectangle | 1.0 | 1.0 | 1.0 | 0.0
Base test: affordance predictions (y-axis) after 6,000 learning steps for the four actions (four graphs) and nine objects (x-axis) averaged over 10 trials in the base test, for the IGN, FIX, and IMP systems. Mid-line of boxes shows median values, boxes show quartiles, and bars show the min-max range. The target values that the predictors had to estimate were 0 or 1.
These results offer a first validation of the idea that the IMP and IGN systems, which use a dynamic threshold to evaluate the interest of the current object in terms of its potential return of information, outperform the FIX system previously proposed in the literature. The reason is that the IMP and IGN systems can decide to explore or ignore an object based on the possibility of learning more from other objects, rather than in absolute terms as in FIX.
Base test: success of the extrinsic-learning process for the three systems IGN, FIX, and IMP.
System | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5
IGN | 1.0 | 1.0 | 1.0 | 1.0 | 0.9
FIX | 1.0 | 1.0 | 1.0 | 1.0 | 0.8
IMP | 1.0 | 1.0 | 1.0 | 1.0 | 0.9
Completion time in the IMP system showed an approximately quadratic dependency on the number of sub-goals (
Completion times (y-axis) for the IMP system in the different extrinsic-phase scenarios involving an increasing number of objects (x-axis). Data refers to 100 simulations (10 runs of the extrinsic-phase test for each of the 10 runs of the intrinsic-phase learning process). For each scenario, the mid-line of boxes shows median values, boxes show quartiles, and bars show the min-max range. The dashed line shows a quadratic fit:
Three late-object tests were run. The features of the tests are summarized in
The structure of the three late-object tests.
Test | Squares | Circles | Rectangles
1 | Start: Move 1, Color 1 | Start: Move 1, Color 1 | Late: Move 0, Color 1
2 | Start: Move 1, Color 1 | Start: Move 0, Color 0 | Late: Move 1, Color 1
3 | Start: Move 1, Color 1 | Late: Move 0, Color 0 | Start: Move 1, Color 1
“
In the first late-object test, all three systems successfully learned the affordances of all objects, including the non-movable rectangles introduced late (
First late-object test, IGN system. Affordance prediction for the four actions (4 graphs) and nine objects (lines in each graph) averaged over 10 trials. Red lines refer to red objects, green dashed lines to green objects, and blue dotted lines to blue objects. Markers on lines represent the shape of objects, where squares refer to square objects, circles to circular objects, and stars to rectangular objects. Note that an object of a color does not have the affordance to be turned to the same color (e.g., a red object cannot be turned red) as this involves no change.
First late-object test, FIX system. Red lines refer to red objects, green dashed lines to green objects, and blue dotted lines to blue objects; markers on lines represent the shape of objects, where squares refer to square objects, circles to circular objects, and stars to rectangular objects.
First late-object test, IMP system. Affordance prediction for the four actions (4 graphs) and nine objects (lines in each graph) averaged over 10 simulations. Red lines refer to red objects, green dashed lines to green objects, and blue dotted lines to blue objects; markers on lines represent the shape of objects, where squares refer to square objects, circles to circular objects, and stars to rectangular objects.
First late-object test: affordance predictions after learning. Plotted as in
Performance in the five extrinsic phase test scenarios (
First late-object test: success of the three systems IGN, FIX, and IMP in the five extrinsic scenarios.
System | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5
IGN | 1.0 | 0.4 | 0.4 | 0.4 | 0.1
FIX | 0.8 | 0.0 | 0.0 | 0.0 | 0.0
IMP | 1.0 | 0.9 | 0.9 | 0.9 | 0.6
In the second and third late-object tests, the three systems differed in their behaviors while learning the affordances during the intrinsic phase, but all presented a similar performance when tested in the extrinsic phase, so we report the data related to them in
Regarding the extrinsic-phase tests (
Second late-object test: success of the three systems IGN, FIX, and IMP in the five extrinsic scenarios.
System | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5
IGN | 1.0 | 1.0 | 1.0 | 0.0 | 0.0
FIX | 0.9 | 0.9 | 0.7 | 0.0 | 0.0
IMP | 1.0 | 0.9 | 0.9 | 0.0 | 0.0
The first and second late-object tests confirm that the IGN and IMP systems outperform the FIX system in learning affordances as they can decide to explore a certain object on the basis of a comparison between its expected information gain and the information gain expected on average from other objects.
In the third late-object test, none of the systems successfully learned to focus on, and predict accurately, the lack of affordances of the late circle objects (
During the extrinsic phase (
Third late-object test: success of the three systems IGN, FIX, and IMP in the five extrinsic scenarios.
System | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5
IGN | 0.8 | 0.8 | 0.7 | 0.0 | 0.0
FIX | 0.9 | 0.9 | 0.8 | 0.0 | 0.0
IMP | 1.0 | 0.9 | 0.8 | 0.0 | 0.0
The stochastic environment used stochastic affordances for some objects and actions whereas the other affordances were as in the deterministic environment (
Stochastic environment: affordance probabilities for all objects and actions.
# | Object | Move | Green | Red | Blue
1 | Red square | 0.6 | 0.7 | 0.0 | 0.8
2 | Green square | 1.0 | 0.0 | 1.0 | 0.8
3 | Blue square | 1.0 | 0.7 | 1.0 | 0.0
4 | Red circle | 0.6 | 0.7 | 0.0 | 0.8
5 | Green circle | 1.0 | 0.0 | 1.0 | 0.8
6 | Blue circle | 1.0 | 0.7 | 1.0 | 0.0
7 | Red rectangle | 0.6 | 0.7 | 0.0 | 0.8
8 | Green rectangle | 1.0 | 0.0 | 1.0 | 0.8
9 | Blue rectangle | 1.0 | 0.7 | 1.0 | 0.0
The intrinsic phase was run 10 times each for 10,000 steps. The plots show the average performance over 10 trials. The leaky average of the maximum utility estimation was updated with a leak rate ν = 0.1 in both the IGN and IMP systems. The learning rate of the affordance predictors was set to α = 0.001 and the learning rate of the learning-progress predictors was set to α = 0.0005.
After learning, all three systems showed a good capacity to predict the affordances, but the IMP system was more accurate than the IGN and FIX systems as it could better employ the available learning time to accumulate more knowledge (
Stochastic environment: affordance predictions after learning. Plotted as in
Both the FIX and IGN systems fail to learn accurate affordance probabilities as they get high motivation signals for exploring stochastic objects even when there is no more knowledge to be gained on them. Only the IMP system is able to focus on different objects depending on the actual learning progress they can furnish.
As the IMP system learned affordance probabilities better than IGN and FIX, we ran the extrinsic-phase tests illustrated in the next section using only such a system.
The goal-based planner and the utility-based planner were compared by running an extrinsic-phase test in a stochastic environment with the four objects indicated in
Utility planning test: Objects in the utility planning test and their corresponding values, probability of goal-accomplishing action success and expected utility.
Object | Value | P(action success) | Expected utility
Blue circle | 1 | 0.7 | 0.7
Green square | 1 | 0.8 | 0.8
Red rectangle | 2 | 0.7 | 1.4
Red square | 4 | 0.6 | 2.4
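The expected utilities in the table follow from multiplying each sub-goal's value by the estimated success probability of the goal-accomplishing action. A sketch (object names and dictionary layout are ours):

```python
def compute_utility(affordance_prob, sub_goal_value):
    """Expected utility of pursuing a sub-goal: the value of the
    sub-goal discounted by the estimated probability that the
    goal-accomplishing action succeeds."""
    return affordance_prob * sub_goal_value

# (success probability, value) pairs from the utility-planning test.
objects = {
    "blue_circle":   (0.7, 1),
    "green_square":  (0.8, 1),
    "red_rectangle": (0.7, 2),
    "red_square":    (0.6, 4),
}
utilities = {name: compute_utility(p, v) for name, (p, v) in objects.items()}
best = max(utilities, key=utilities.get)  # the sub-goal to pursue first
```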
The test was run 20 times for each of the 10 simulations of the intrinsic phase, using different action budgets available to the system (1 to 5 actions). A small action budget introduces the need to decide which actions to perform based on their expected utility.
The results show that the utility-based planner performed significantly better than the goal-based planner when it could rely on a small number of actions, and showed a statistical trend to do so for a higher number of actions (
Performance of the goal-based planner and the utility-based planner. Each bar is the average utility over 10 repetitions of the intrinsic-phase training and 20 runs of the extrinsic-phase test. Statistical significance was computed with a two-tailed t
The smaller difference in utility between the models for a higher number of actions is expected: if all goals can be accomplished, independently of their utility, the order of their accomplishment does not matter. However, real environments offer a very large number of alternative (sub-)goals relative to the actions that can be performed, a situation similar to the experimental condition in which the system has only 1 or 2 actions available; utility-based planning is therefore very important in such conditions.
Having illustrated the utility-planning experiment, it is now possible to show that IMP outperformed IGN and FIX not only in terms of the quality of learned affordances but also in terms of the quality of the learned forward models. To this purpose, we compared the performance of the utility-planner using affordances and forward models trained with either one of the IGN/FIX/IMP mechanisms for 4,000 executed actions, a time not sufficient to fully learn the forward models.
Stochastic environment: quality of forward models acquired. Statistical significance is based on a two-tailed t
The better performance of IMP could be due to either worse affordances or worse forward models of IGN and FIX. To ascertain this, we repeated the experiment using forward models trained for a time allowing convergence (10,000 executed actions) for all the three systems. In this case the three systems showed a similar performance (data not reported). This indicates that the better performance of IMP in the previous experiment was due to better forward models. A possible explanation of this is that there is a correlation between the difficulty of learning the predictors estimating the affordance-probabilities and the predictors implementing the what-effect forward models as they share the same input (object image). So the effective decisions of IMP on which experiences to focus on to learn affordances also benefit the learning of the forward models. On the other hand, even if affordances of IGN and FIX have a lower quality than IMP (section 3.2.1), this does not negatively affect their performance as such lower quality does not impair the utility-based ranking of the object-related sub-goals.
We have seen that a potential benefit of affordances for planning is the possibility of reducing the number of actions that must be checked during the generation of the forward trajectories. To validate this idea, we re-ran the previous extrinsic-phase test (section 3.2.2) without constraining the number of actions the system could perform. In particular, we compared two systems, the first checking all available actions and the second restricting the forward-model-based search to the actions having an affordance ≥ 0.5 (here this value excludes all non-afforded actions from the search). The results show that the use of affordances allows a significant reduction of the mean number of actions checked (
Average number of actions to be checked to accomplish each sub-goal in the case of affordance-based restricted and non-restricted planning search. Statistical significance is based on a double-tailed t-test (mid-lines of boxes show median values, boxes show quartiles, and bars show the min-max range of the intrinsic-phase successful repetitions). ***
Note that in realistic situations the number of actions that can be performed in a given condition is very high. Moreover, several actions can often be performed in sequence to accomplish a certain (sub-)goal, a situation not investigated here. In such cases, the affordance-based reduction of the branching factor due to actions is even more important.
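The affordance-based pruning described above can be sketched as follows (a hypothetical illustration with invented action names, not the paper's implementation): before running the forward model on each candidate action, the planner discards the actions whose estimated affordance on the attended object falls below the threshold, here 0.5 as in the experiment.

```python
AFFORDANCE_THRESHOLD = 0.5

def afforded_actions(affordance_estimates, threshold=AFFORDANCE_THRESHOLD):
    """Return only the actions worth checking with the forward model."""
    return [a for a, p in affordance_estimates.items() if p >= threshold]

# Estimated success probabilities of each action on the attended object.
estimates = {"push": 0.92, "pull": 0.10, "rotate": 0.65, "lift": 0.05}

to_check = afforded_actions(estimates)
# Only "push" and "rotate" survive the pruning, halving the branching factor.
assert sorted(to_check) == ["push", "rotate"]
```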
The architecture presented here integrates functionalities that have been investigated in isolation in other computational systems. In this section we review the systems most closely related to the one presented here and compare their main features.
Many works have focused on intrinsic motivations as a means of solving challenging extrinsic tasks where a long sequence of skills is required to solve the task or maximize a specific reward function (“sparse reward,” e.g., Santucci et al.,
Seepanomwan et al. (
Planning represents a central theme in artificial intelligence (Ghallab et al.,
The work presented here has multiple links with the autonomous/developmental robotics literature on affordances. In an early work, Stoytchev (
Regarding the link between affordance acquisition and open-ended learning, Ugur et al. (
The link between affordances and the possible relations between the elements of the object/action/effect triad was investigated by Montesano et al. (
The link between affordances and planning was investigated in Ugur et al. (
The affordance concept used here is analogous to that of “preconditions” used in STRIPS-based planning
Some of the relations between attention, affordances and intrinsic motivations were investigated in Nguyen et al. (
A last field of research related to this work involves active vision (Ballard,
We started to explore the factorization of a scene by an active vision system endowed with controllable restricted visual sensors in a camera-arm robot interacting with simple-shaped 2D objects such as those used here (Ognibene et al.,
This work has focused on a possible specific instance of the concept of affordance, intended as the probability of achieving a certain desired outcome associated with an action, by performing that action on a certain object. We investigated three issues related to this concept: (a) within an open-ended autonomous learning context, how can intrinsic motivations guide affordance learning in a system that moves the attention of a visual sensor over different objects; (b) how can such an attention process support the decomposition of complex goals (tasks), involving multiple objects, into separate sub-goals related to single objects; (c) what could be the added value of affordances in planning systems already having sophisticated forward models of the world. For each issue we presented possible advancements with respect to the state of the art (section 4) and showed their advantages in specific experiments (section 3). Several aspects of the system could, however, be improved in future work.
Regarding the first issue, we proposed a mechanism to use intrinsic motivations (system IGN) to improve previously proposed ways (Ugur et al.,
With respect to intrinsic motivations, in the case of deterministic scenarios where the system knows in advance that the affordance probability is either 0 or 1, the value of actions on the current object and the cost of alternatives were here estimated in terms of intrinsic motivations measuring the system's ignorance (system IGN). This is not possible in stochastic scenarios, where the affordance probability can take any value in (0, 1); we therefore proposed an intrinsic motivation tied to the improvement, rather than to the level, of the probability estimate (system IMP). This solution, building on previous works on intrinsic motivations (e.g., Schmidhuber,
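An improvement-based intrinsic motivation of this kind can be sketched as follows (a minimal illustration with invented names, not the paper's implementation, assuming an exponential-average estimator of the affordance probability): the intrinsic reward is the change of the probability estimate caused by the last experience, a common proxy for learning progress that remains meaningful when the true probability lies anywhere in (0, 1).

```python
class AffordanceEstimator:
    def __init__(self, learning_rate=0.1):
        self.p = 0.5            # current success-probability estimate
        self.lr = learning_rate

    def update(self, success):
        """Update the estimate and return the intrinsic reward."""
        old_p = self.p
        self.p += self.lr * (float(success) - self.p)  # exponential average
        return abs(self.p - old_p)  # improvement-based intrinsic reward

est = AffordanceEstimator()
r_early = est.update(True)   # early experiences move the estimate a lot...
for _ in range(200):
    est.update(True)
r_late = est.update(True)    # ...later ones barely change it: reward vanishes
assert r_early > r_late
```

Because the reward fades as the estimate converges, the agent is pushed away from already-known affordances and toward experiences that still improve its estimates.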
Another important aspect related to autonomous learning driven by intrinsic motivations is that here, for the sake of focusing the research, the current system learns affordances on the basis of pre-wired actions and goals (the expected outcomes of affordances). In a fully autonomous open-ended learning agent, such actions and goals should instead be learned autonomously. Much literature has focused on the autonomous learning of actions and, more recently, of goals (e.g., Kulkarni et al.,
Regarding the second issue, related to the advantage for planning of having an attention system focusing on objects, we showed how the parsing of the scene into objects allows the solution of non-trivial planning problems on the basis of relatively simple one-step planning mechanisms. This agrees with previous proposals, such as the “object action compound” framework (Krüger et al.,
Although the introduction of focused visual sensors (attention) facilitates the parsing of the scene into objects, it also makes decision making more difficult: the system has to look at different objects, and store information about them, to decide which object to act on, if any. We have seen that the information to store can involve, for example, either the expected information gain, as required by intrinsic motivations, or the utility of sub-goals, as required by the solution of a utility-based problem. Here we proposed a first solution to this problem that requires low computational resources: scanning objects in sequence, computing their expected utility, updating a variable that stores the maximum expected utility encountered thus far, and deciding whether to act on the current object depending on how its utility compares with that maximum. This mechanism proved effective in tests. However, other more efficient (but also computationally more expensive) mechanisms could be used, in particular ones based on a memory of the specific utility of the different scanned objects. This information could be indexed by the different positions that have been visually inspected in the scene, so that it is readily usable to guide top-down attention processes and actions on specific target objects (Ognibene and Baldassarre,
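The low-memory scanning mechanism can be sketched as follows (an illustrative toy example with invented object names, not the paper's implementation): objects are fixated in sequence and only the running maximum of the expected utilities is stored, so the memory cost is a single variable rather than a record per object.

```python
def choose_object(scan_sequence, utility_of):
    """Scan objects in sequence, keeping only the running maximum."""
    best_obj, best_u = None, float("-inf")
    for obj in scan_sequence:       # one fixation per object
        u = utility_of(obj)         # expected utility of acting on this object
        if u > best_u:              # update the single stored maximum
            best_obj, best_u = obj, u
    return best_obj                 # act on the best object encountered

objects = ["red_cube", "blue_ball", "green_disk"]
utilities = {"red_cube": 0.3, "blue_ball": 0.9, "green_disk": 0.6}
assert choose_object(objects, utilities.get) == "blue_ball"
```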
Regarding the third and last issue, related to the possible added value of affordances in planning systems, we showed that affordances as defined here can be useful in goal-based planning systems, as they allow a search focused on the actions usable in the current context. In section 4 we mentioned that this function is similar to that played by preconditions in STRIPS-based planning and by the “initiation set” in reinforcement-learning options. We have also seen that a second function affordances can play, in particular in utility-based planning problems, is weighting the importance of alternative goals based on the probability of accomplishing them. The definition used to this purpose,
Overall, we think that showing how attention can support a representation of information centered on objects rather than on whole states, and the implications of this for autonomous affordance learning and planning, is a very important issue to which this work contributed.
We conclude by discussing how the system might scale up to more complex scenarios. The overall architecture is expected to scale up well to more complex environments, but the implementation of its components should be enhanced to this purpose. For the sake of simplicity, here we developed the model components in a way that was sufficient to tackle a simple environment featuring a black background and non-overlapping objects. A realistic environment, with rich textures and several possibly overlapping objects, would produce cluttered images. To cope with this condition, the system should be endowed with object segmentation capabilities (Zhang et al.,
A final general feature of the system that should be addressed in future work is the fact that the information flows between the several components of the architecture are managed by a hard-coded central algorithm using time flags and, in some cases, symbolic representations. This feature is shared by most architectures of this type. An alternative approach would be to follow the design of real brains, where the information flows between components are continuous and distributed in nature. An example of this is given in Baldassarre et al. (
GB: overall idea of the system, specification of the model and tests, analysis of results, and writing-up. WL: specification of the model and tests, implementation of the system, tests, data analysis, analysis of results, and writing-up. GG: specification of the model and initial tests, and analysis of results. VS: specification of the model and tests, analysis of results, and writing-up.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Ilaria Maglione for support on the initial implementation of the system.
The Supplementary Material for this article can be found online at:
1This and more complex versions of the scenario involving autonomous robots have been developed within the EU funded project “GOAL-Robots – Goal-based Open-ended learning Autonomous Robots,”