A developmental model of initiating joint attention through constructing state space
1. Osaka University, Adaptive Machine System, Graduate School of Engineering, Japan
2. Osaka University, Systems Innovation, Graduate School of Engineering Science, Japan
I. INTRODUCTION
Inferring others' intentions is an important factor that can accelerate the development of joint attention (JA). In this paper, we aim to reveal the developmental mechanism of JA based on such an inference capability through a synthetic approach.
Human infants gradually come to follow the gaze direction of their caregivers from around six months of age [1]. By twelve months, they can accurately find the object that the caregiver's gaze and/or pointing refers to [2], [3]. These behaviors are called responding to JA (RJA). Meanwhile, they develop the ability to direct the caregiver's attention to an object by pointing, uttering, and so on [2]. These actions are called initiating behaviors of JA (IJA). However, the internal mechanism that enables the development of both RJA and IJA is not yet well understood.
On the other hand, some studies have proposed synthetic developmental models of JA in human infants, as advocated in cognitive developmental robotics [4]. They showed that a robot can acquire gaze-following simply by statistically mapping the caregiver's gaze to the locations where it finds something salient [5], [6], and that a robot can acquire gaze alternation for JA by discovering and reproducing the contingency of caregiver-infant interactions based on information theory [7]. However, these studies did not deal with the development of inferring the caregiver's internal states, nor did they explain the relationship between such inference and JA.
This work presents our working model for a learning mechanism of optimal actions for JA within a reinforcement learning (RL) scheme. It integrates methods for autonomous state space construction and hidden state estimation. Estimating the caregiver's hidden state, that is, whether or not she is interactive for JA, seems an important but formidable step for infants, since this state cannot be observed directly yet largely affects their future rewards. To overcome this, the proposed method has three steps. The first step is to divide the state space, taking into account the differences between the caregiver's hidden states, based on an extension of the method of Ishiguro et al. [8]. The second step is to learn a probabilistic model of state transitions based on a hidden Markov model (HMM). Through these two steps, the learner acquires the ability to estimate the hidden states from not only the current observation but also the history of the interactions. The final step is to learn the optimal action in each state by RL. We applied the proposed model in a computer simulation and tried to reproduce the developmental processes of both RJA and IJA observed in human infants.
II. LEARNING MODEL OF JA
Here, we assume a certain type of caregiver-infant interaction. The learner first observes the caregiver and the objects in the environment and estimates her attention and her hidden state. It then selects one of the possible actions: looking at the caregiver, looking at an object, or uttering. If JA between them is accomplished, the caregiver gives a reward. After experiencing such interactions, the learner constructs the state space and learns the state transition model and the optimal action in each state. By repeating this learning process, it gradually comes to understand her hidden states and acquires the behaviors of RJA and IJA.
In this work, we assume that the caregiver's hidden states are represented as three modes: an initiative mode, a responsive mode, and a non-interactive mode. These modes affect her policies for giving rewards and for state transitions. When she is in the initiative or the responsive mode, the learner can obtain rewards through JA. On the other hand, it cannot obtain rewards when she is in the non-interactive mode, even if it follows her gaze direction. However, the learner can make her switch from the initiative or non-interactive mode to the responsive one by particular behaviors: gaze alternation and speaking to her. Therefore, it should induce her responsive mode in order to obtain rewards when she is in the non-interactive mode.
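As a rough illustration (not part of the original model description; the mode names, the reward value, and the exact transition trigger below are assumptions for illustration only), the caregiver's modes and her reward and mode-transition policies could be sketched in Python as follows:

    # Hypothetical sketch of the caregiver described above.
    class Caregiver:
        MODES = ("initiative", "responsive", "non_interactive")

        def __init__(self, mode="non_interactive"):
            assert mode in self.MODES
            self.mode = mode

        def reward(self, joint_attention_achieved):
            # JA is rewarded only in the initiative or responsive mode.
            if joint_attention_achieved and self.mode != "non_interactive":
                return 1.0
            return 0.0

        def update_mode(self, learner_action):
            # Gaze alternation or speaking to the caregiver can induce the
            # responsive mode from the initiative or non-interactive mode.
            if (learner_action in ("gaze_alternation", "speak")
                    and self.mode in ("initiative", "non_interactive")):
                self.mode = "responsive"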
A. System for JA
The system has three modules: a state estimation module, an action selection module, and a memory module. The learner infers the caregiver's modes with the state estimation module, selects an action to obtain rewards with the action selection module, and stores the interaction data for learning in the memory module.
The state estimation module consists of two parts: a sensor classifier and a state transition model. The sensor classifier calculates the current state probabilities from the observable sensor data: the gaze directions of the caregiver and of the learner, and the locations of the two objects. Meanwhile, using the state transition model, the learner also calculates these probabilities from the preceding data: the state probabilities, the learner's action, and the caregiver's reward value at the last interaction. The learner estimates the current state by integrating the state probabilities obtained in these two different ways. Note that the way rewards are given depends on the caregiver's mode, the learner's action, and the current state. Therefore, the state transition model must be utilized to estimate her current mode. In this way, the learner identifies the current state according to her modes.
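A minimal sketch of how the two probability estimates might be integrated follows; the abstract does not specify the fusion rule, so the element-wise product with normalization below is an assumption in the spirit of a Bayesian filter, and the array layout of the transition model is likewise assumed:

    import numpy as np

    def estimate_state(sensor_probs, prev_belief, transition, action, reward_idx):
        # sensor_probs[s]: state probabilities from the sensor classifier
        # transition[action][reward_idx]: learned state transition matrix,
        #   conditioned on the learner's last action and a discretized index
        #   of the caregiver's last reward
        predicted = prev_belief @ transition[action][reward_idx]  # prior from history
        belief = sensor_probs * predicted                         # fuse with observation
        total = belief.sum()
        return belief / total if total > 0 else sensor_probs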
In the action selection module, an action is stochastically selected based on the identified state and the state-action values. A state-action value in RL represents the expected future reward of taking an action in a given state.
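For instance, a softmax (Boltzmann) rule over the learned state-action values would realize such stochastic selection; the abstract does not name the rule, so this choice and the temperature value are assumptions:

    import numpy as np

    def select_action(q_values, state, temperature=0.5):
        # q_values[state][action]: learned state-action values
        prefs = np.asarray(q_values[state], dtype=float) / temperature
        probs = np.exp(prefs - prefs.max())   # numerically stable softmax
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)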
B. Learning algorithm
The learner needs to recognize a variety of states in order to achieve JA depending on the caregiver's mode, which changes in response to the infant's behavior. In order to discriminate these modes, which are not directly observable at any given moment, the system has two stages: (1) state space construction for RL to find adequate features, based on an extension of the method of Ishiguro et al. [8], and (2) parameter estimation of the state transition model based on the HMM.
In the state space construction stage, the system first divides the continuous sensor space into two kinds of areas for each action, according to whether or not the learner receives a reward in that area. Moreover, a divided area may be split further if the histogram of reward values within it has multiple peaks caused by the different modes. As a result, the learner constructs a state space that is expected to reflect the differences among the modes.
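A simplified one-dimensional sketch of this division is given below; the actual method extends Ishiguro et al. [8] over the full sensor space, and the bin count and the rewarded/unrewarded test here are assumptions:

    import numpy as np

    def split_sensor_axis(samples, rewards, n_bins=10):
        # Divide one sensor dimension into rewarded / unrewarded areas.
        # The further split by multiple peaks in the within-area reward
        # histogram, described above, is omitted here for brevity.
        samples, rewards = np.asarray(samples), np.asarray(rewards)
        edges = np.linspace(samples.min(), samples.max(), n_bins + 1)
        areas = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_area = rewards[(samples >= lo) & (samples < hi)]
            rewarded = in_area.size > 0 and in_area.mean() > 0
            areas.append((lo, hi, rewarded))
        return areas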
In the parameter estimation stage of the state transition model, the system re-estimates the states from the past interactions. Based on the HMM, it calculates the state transition probabilities conditioned on the learner's previous action, the caregiver's previous reward, and the previous state. The learner consequently understands how the states and modes change depending on its own behavior.
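As a count-based stand-in for this estimation (the abstract does not detail the HMM re-estimation procedure over hidden states, so the data layout and the counting scheme below are assumptions), the conditional transition probabilities could be computed from the stored interactions as follows:

    import numpy as np
    from collections import defaultdict

    def estimate_transition(history, n_states):
        # history: (prev_state, action, reward_idx, next_state) tuples
        # reconstructed from the memory module (the tuple layout is assumed).
        counts = defaultdict(lambda: np.zeros(n_states))
        for s_prev, a, r, s_next in history:
            counts[(a, r, s_prev)][s_next] += 1
        return {key: c / c.sum() for key, c in counts.items()}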
Moreover, the learner needs to learn adequate behaviors to achieve JA in each of the caregiver's modes. In action learning, the system calculates the state-action values by Q-learning. The RL framework allows the learner to take near-future rewards into account.
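For reference, the standard tabular Q-learning update used at this stage is sketched below; the learning rate and discount factor are assumed values, not taken from the paper:

    import numpy as np

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
        # Q: 2-D array of state-action values, e.g. Q = np.zeros((n_states, n_actions))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        return Q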
III. SIMULATION OF DEVELOPMENT OF IJA
By analyzing the learner's action selection in the computer simulation, we studied how the learner's JA behaviors changed along with the development of hidden state estimation. As a result, the learner first learned gaze-following, and then learned initiating behaviors of JA after acquiring hidden state estimation. This was because the learner first discovered gaze-following, a behavior that is directly connected to rewards. After learning hidden state estimation, it found that, when the caregiver is in the non-interactive mode, it should induce her responsive mode by initiating behaviors of JA, which are only indirectly connected to rewards but are needed to obtain them.
IV. CONCLUSION
In this work, we proposed a method for acquiring optimal actions based on reinforcement learning that integrates methods for state space construction and hidden state estimation. The results of a computer simulation showed that the proposed method can learn not only gaze-following but also initiating behaviors of joint attention by estimating the caregiver's hidden states. These results suggest that the learning method may reproduce one of the developmental processes of joint attention in human infants that is affected by understanding others' intentions.
The caregiver model in this work was a specific setting. Therefore, future work will investigate what kinds of parameters in caregiver models are important factors for the acquisition of joint attention behaviors in human infants.
Acknowledgements
This study was partially supported by a Grant-in-Aid for Scientific Research (Research Project Number: 22220002) and a Grant-in-Aid for JSPS Fellows (Research Project Number: 21.57061).
References
[1] Butterworth and Jarrett. What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9:55-72, 1991.
[2] Carpenter et al. Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, 63, 1998.
[3] Amanda L. Woodward. Infants’ developing understanding of the link between looker and object. Developmental Science, 6:297-311, 2003.
[4] Asada et al. Cognitive developmental robotics: a survey. IEEE Transactions on Autonomous Mental Development, 1:12-34, 2009.
[5] Nagai et al. A constructive model for the development of joint attention. Connection Science, 15:211-229, 2003.
[6] Triesch et al. Gaze following: why (not) learn it? Developmental Science, 9:125-147, 2006.
[7] Sumioka et al. Development of Joint Attention Related Actions Based on Reproducing Interaction Contingency. Proc. of the 7th ICDL, 2007.
[8] Ishiguro et al. Robot Oriented State Space Construction. In Proc. of the IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, 3:1496-1501, 1996.
Keywords: developmental model, inferring others' intention, joint attention, synthetic approach
Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.
Presentation Type: Poster Presentation
Topic: Social development
Citation: Nakano T, Fujiki S, Yoshikawa Y and Asada M (2011). A developmental model of initiating joint attention through constructing state space. Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00006
Received: 23 Jun 2011; Published Online: 12 Jul 2011.
* Correspondence: Prof. Minoru Asada, Osaka University, Adaptive Machine System, Graduate School of Engineering, Suita, Osaka, 565-0871, Japan, asada@otri.osaka-u.ac.jp