Edited by: Chung Hyuk Park, George Washington University, United States
Reviewed by: Tony Belpaeme, University of Plymouth, United Kingdom; Sofia Serholt, Department of Applied Information Technology, University of Gothenburg, Sweden
This article was submitted to Human-Robot Interaction, a section of the journal Frontiers in Robotics and AI
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Prior research has demonstrated the importance of children's peers for their learning and development. In particular, peer interaction, especially with more advanced peers, can enhance preschool children's language growth. In this paper, we explore one factor that may modulate children's language learning with a peer-like social robot: rapport. We explore connections between preschool children's learning, rapport, and emulation of the robot's language during a storytelling intervention. We performed a long-term field study in a preschool with 17 children aged 4–6 years. Children played a storytelling game with a social robot for 8 sessions over two months. For some children, the robot matched the level of its stories to the children's language ability, acting as a slightly more advanced peer (
Children's early language development is linked to their academic and overall life success. Numerous studies in the United States, for example, have found that children who are not exposed to rich language learning opportunities as they grow up—such as vocabulary-building curricula, cognitively challenging preschool activities, greater numbers of novel words and total words heard—may be significantly impacted, showing language deficits, lower reading comprehension, and lower vocabulary ability (Huttenlocher et al.,
One way children's language learning can be supported is through peer interaction. Children's peer relationships provide opportunities for openness, exploration, and discovery. Research from the past several decades shows that children's peers, particularly more advanced peers, can enhance their overall preschool competency and language growth (Fuchs et al.,
This research is in line with various theories about how peer learning occurs, including Vygotsky's theory that a child's more advanced peers can help support or scaffold the child in acquiring and practicing skills that are otherwise beyond their skill level (Vygotsky,
Because children's peers can significantly and positively affect their language learning, numerous researchers in human-robot interaction have hypothesized that playing with a peer-like robot companion may lead to similar benefits. For example, some robots have been positioned as slightly advanced peers (e.g., Kanda et al.,
It is also very common for robots to be situated as teachers or tutors (e.g., Robins et al.,
Given this interest in using social robots to support children's language learning, we should examine more closely what modulates children's learning with peers, and by extension, mechanisms that robots can use to be more effective learning companions. That is: are children's peers approximately equal as sources for promoting language learning, or will children learn more effectively from some peers than from others? What features or behavior might help a social robot better enable children's language learning?
Some work has begun exploring these questions. For example, robots that use nonverbal social cues and nonverbal immediacy behaviors have led to increases in children's engagement, learning, and relationships during educational activities (e.g., Kanda et al.,
Another mechanism that may improve children's learning is rapport, as suggested by two recent studies of children's language learning during storytelling with social peer-like robots (Kory Westlund et al.,
Earlier work with adults and robots (Kidd and Breazeal,
Taken together, the research so far suggests that children's rapport with an interlocutor may affect their learning and language behavior. However, these studies were primarily one session; they did not examine children's learning or language behavior over time. As such, one open and important question was whether children would emulate the robot's language long-term, and if they did, whether this would be related to their vocabulary learning or their rapport with the robot. To explore this question, we performed new analyses on an existing dataset from an 8-session study in which children played a storytelling game with a peer-like social robot. The design and early results from this study were presented in (Kory,
We wanted to explore connections between children's learning, their rapport, and their emulation of a peer-like robot's language behavior. We asked whether children would be more likely to emulate language of a robot with whom they had more positive rapport, whether this was correlated with their learning, and furthermore, whether children's emulation or rapport were consistent over time.
We performed new analyses on an existing dataset that included stories from 14 children, who had played a storytelling game with a robot 1–2 times per week for 8 sessions (
The original study explored whether a peer-like social robot could facilitate preschool children's oral language development. In addition to being one of the first studies exploring the effectiveness of a long-term, storytelling intervention, this study examined whether personalizing the general language complexity of the robot's stories might increase children's learning of new words and use of more complex language in their own stories. The hypothesis was that presenting stories of an appropriate challenge for the child, slightly ahead of the child's general ability in the zone of proximal development, might promote learning (Vygotsky,
Two versions of each story told by the robot were created, a harder version and an easier version (for more detail regarding story creation, see Kory,
Seventeen children aged 4–6 years (10 female, 7 male) from two Boston-area preschools (9 from the first and 8 from the second) participated in the original study. Recruiting from two schools was necessary to obtain a sufficient sample for the study. There were three 4-year-olds, thirteen 5-year-olds, and one 6-year-old (
For the purposes of our analyses here, our data included 206 stories from 14 children (8 female, 6 male, two 4-year-olds, twelve 5-year-olds, age
Children's parents gave written informed consent prior to the start of the study, and all children assented to participate. The protocol was approved by the MIT Committee on the Use of Humans as Experimental Subjects.
We expected the following:
Each child participated in a pretest session and 8 sessions with a teleoperated robot, over 10 weeks (Kory,
These initial assessments were used to split children into two groups: higher language ability (above the mean) and lower language ability (below the mean). These categorizations were for this study only; “higher/lower language ability” did not mean children were necessarily above or below what might be expected for their age, only that they were divided into two groups for the purposes of the robot's language level personalization. Children were randomly assigned to the
Each of the 8 sessions with the robot was 10–15 min long (
As mentioned above, in the first half of the study (sessions 1–4), all children heard the same stories. In the second half of the study (sessions 5–8), children in the
A storytelling activity was used to promote language development because storytelling is a socially situated activity that combines play and narrative, which are two important aspects of children's learning and development (Nicolopoulou,
Children were interviewed about their perception of the robot and interaction after sessions 4 and 8. The questions were adapted in part from (Jipson and Gelman,
This study used the Dragonbot (Setapen,
The robot followed a script of speech, expressions, and movement. Speech was recorded by a human adult female. The pitch of the speech was shifted higher to sound more like a child.
A human operator used a custom control interface to send action and speech commands to the robot. The teleoperator attended to the child's speech and actions in order to trigger the robot's actions (e.g., playing back speech or showing a facial expression) at appropriate times. Including a human in the loop allowed the robot to appear autonomous while sidestepping technical barriers such as automatic speech recognition and natural language understanding. When the robot's actions depended on what the child said or did, such as during the introductory conversation or when asking the child if they wanted to tell a story, the teleoperator selected among a limited set of dialogue options. The robot's gaze was automatically directed to either look up at the child or down at the game, based on data collected during the pilot study regarding where children look during play.
The teleoperator followed several general rules. First, the teleoperator made the robot's behavior as socially contingent as possible, reacting to the child as closely as possible to how a human would in the same circumstance. When the child spoke, the robot would acknowledge through speech, verbal exclamations such as “Ooh!” and “Oh no!,” smiles, and short affirmative non-linguistic noises. These acknowledgments were primarily triggered during pauses in the child's speech. The same sounds or animations were not triggered twice in close succession, though the same sounds and animations were often used multiple times per session. Finally, the teleoperator made the robot's behavior as consistent as possible across participants, using the same set of sounds and animations with approximately the same frequency for all children. The same person operated the robot for all participants and had previously operated this robot in numerous earlier studies.
The storytelling game was inspired by the game developed by Ryokai et al. (
The game included eight story scenes (
The eight story scenes used for the storytelling game. Two stories were written for each scene, for a total of 16 stories.
The robot's stories were based on stories told by children during pilot testing of the game at the Boston Museum of Science (Kory,
Twenty-four target vocabulary words were selected from Andrew Biemiller's “Words Worth Teaching” lists (Biemiller,
Audio and video of the study sessions were recorded with a camera beside the robot (
The recorded audio was used to transcribe children's speech. Children's stories were extracted from the full transcripts. All children spoke during the conversations with the robot, and most told stories as well.
The data we analyzed in this paper included 206 stories from 14 children and full transcripts from 17 children (3 children did not tell stories). In these data, we examined children's use of key vocabulary words and key phrases used by the robot, children's emulation of the robot's stories during their own storytelling, and children's language style matching (LSM). LSM is a measure of overlap in function words and speaking style as opposed to content words. Our phrase matching metrics looked primarily at content words. Research has shown that the more “in sync” two people are, the more they will match function words in their speech; it may reflect rapport and relationship (Niederhoffer and Pennebaker,
One limitation of this methodology is that LSM is a purely linguistic measure of rapport. It would be useful in future work to examine additional ways of measuring children's rapport with the robot, to see whether children's word and phrase use was related to any non-linguistic signs of rapport or relationship as well.
Using automated software tools, we counted the number of times children used each of the target vocabulary words in each session and in their stories. This analysis was performed on the full transcripts of each session. Usage of the words may reflect expressive vocabulary ability, which is often a stronger indicator of knowledge of a word than the receptive knowledge tested with the vocabulary assessment (Bloom,
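The word counting step can be sketched as follows; the tokenization and the example target words are illustrative assumptions, since the specific software tools are not named above.

```python
from collections import Counter
import re

def count_target_words(transcript, target_words):
    """Count how many times each target vocabulary word appears in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in target_words}

# Hypothetical example; "gigantic" and "brave" stand in for the actual target words.
result = count_target_words("The gigantic turtle hid in a gigantic tree.",
                            ["gigantic", "brave"])
# → {"gigantic": 2, "brave": 0}
```
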
LSM analysis requires a minimum of 50 words per participant in the conversation, but works better with a greater number of words (Pennebaker et al.,
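As a sketch of how a pairwise LSM score is typically computed, assuming the standard category-level formula from Pennebaker and colleagues (one minus the absolute difference in each function-word category's usage rate, divided by the summed rates plus a small constant, averaged over categories); the short word lists below are illustrative stand-ins for the full dictionaries:

```python
def category_proportion(words, category_words):
    """Proportion of tokens that fall in one function-word category."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in category_words) / len(words)

def lsm_score(text_a, text_b, categories):
    """Mean per-category LSM: 1 - |p_a - p_b| / (p_a + p_b + 0.0001)."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    scores = []
    for category_words in categories.values():
        pa = category_proportion(words_a, category_words)
        pb = category_proportion(words_b, category_words)
        scores.append(1 - abs(pa - pb) / (pa + pb + 0.0001))
    return sum(scores) / len(scores)

# Illustrative (not the actual LIWC) category word lists:
CATEGORIES = {
    "articles": {"a", "an", "the"},
    "pronouns": {"i", "you", "he", "she", "it", "we", "they", "me", "him", "her"},
    "prepositions": {"in", "on", "at", "of", "to", "with", "for"},
    "negations": {"not", "no", "never"},
}
```

Identical texts score 1, and fully disjoint function-word use drives the score toward 0; the published measure averages over nine function-word categories with much larger word lists.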
We analyzed children's transcribed stories in five ways: length (in seconds), word count, unique word count, vocabulary word use, and emulation of the robot's phrases. We created an automatic tool to obtain phrase matching scores comparing each child story to each robot story that the child had heard prior to telling the story. For example, a story told by a child in session 2 was compared to the stories the robot told in session 1 as well as any stories the robot told before the child in session 2. The analysis was then threefold: (1) compare each child story to the robot story just prior to it; (2) compare each child story to other stories in the same scene; (3) compare each child story to all stories prior to it. The matching algorithm was as follows:
Remove stopwords (i.e., words with no significant information such as “the,” “uh,” and “an”).
Stem words, i.e., convert words to their base form (e.g., “running” becomes “run”).
Find all N-grams in each text, where an N-gram is a continuous sequence of N words from the text.
Remove duplicate N-grams from one text.
Count how many N-grams are the same in both texts.
Return that number as the match score.
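The steps above can be sketched in Python as follows. The stopword list, the crude suffix-stripping stemmer, and the choice to sum matches over several values of N are illustrative assumptions (the specific tools and N are not given here):

```python
STOPWORDS = {"the", "a", "an", "uh", "um", "and", "but", "so", "to"}

def stem(word):
    # Crude illustrative stemmer; a real pipeline would use e.g. Porter stemming.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    # Steps 1-2: remove stopwords, then stem the remaining words.
    return [stem(w) for w in text.lower().split() if w not in STOPWORDS]

def ngrams(tokens, n):
    # Steps 3-4: find all N-grams, with duplicates removed (a set).
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def match_score(child_text, robot_text, n_range=range(2, 6)):
    # Steps 5-6: count N-grams shared by both texts. Summing over several
    # values of N gives texts with more and longer shared phrases a higher score.
    a, b = preprocess(child_text), preprocess(robot_text)
    return sum(len(ngrams(a, n) & ngrams(b, n)) for n in n_range)
```

Deduplicating N-grams within each text (step 4) keeps a phrase repeated by one speaker from inflating the score, and a single long shared phrase contributes many overlapping N-grams at every N, so longer matches weigh more.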
This produced a score reflecting the number of exact matches—i.e., words used in the same order by both the child and robot. It also produced a higher match score for texts that have both more matching phrases and longer matching phrases. We also implemented an algorithm for counting similar matches that were close to each other, but not exactly the same. This algorithm followed the same steps listed above, where step 5 (counting matching N-grams) used a fuzzy string matching algorithm to determine if N-grams matched.
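A sketch of the similar-match variant, with Python's difflib `SequenceMatcher` standing in for the fuzzy string matching algorithm and 0.8 as an assumed similarity threshold (neither choice is specified here):

```python
from difflib import SequenceMatcher

def fuzzy_match_count(child_tokens, robot_tokens, n=3, threshold=0.8):
    """Count child N-grams that approximately match some robot N-gram."""
    child_ngrams = {" ".join(child_tokens[i:i + n])
                    for i in range(len(child_tokens) - n + 1)}
    robot_ngrams = {" ".join(robot_tokens[i:i + n])
                    for i in range(len(robot_tokens) - n + 1)}
    count = 0
    for c in child_ngrams:
        # An N-gram "matches" if its character-level similarity ratio to any
        # robot N-gram meets the threshold; exact matches have ratio 1.0.
        if any(SequenceMatcher(None, c, r).ratio() >= threshold
               for r in robot_ngrams):
            count += 1
    return count
```

The threshold trades precision for recall: lowering it would admit looser paraphrases like those in the example below, while 1.0 would reduce the metric to exact matching.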
For exact matches, we used
For example, one of the robot's stories included the sentences, “But Turtle still couldn't find Squirrel. Eventually, it got dark out and they all got sleepy. So Squirrel had to show his hiding place.” After stopword removal and stemming, this was converted to: “turtle still couldn't find squirrel eventually get dark out they all get sleepy squirrel show hiding place.” One child's story included the similar section, “But he still couldn't find Squirrel. Then he bumped into him and started playing. And it's getting late out. So Squirrel had not showed his hiding place,” which was converted to “he still couldn't find squirrel then he bump into him start play get late squirrel show hiding place.” This segment included several exactly matching phrases, e.g., “couldn't find squirrel,” as well as several similar matching phrases, e.g., (robot) “squirrel show hiding place” vs. (child) “late squirrel show hiding.”
First, we discuss children's vocabulary learning and information about the kinds of stories children told. Some of these results were previously reported in Kory (
Next, we present our new analyses regarding children's use of the target words and key phrases, emulation of the robot, LSM scores, and correlations among these measures. Because the new analyses were
As reported in Kory (
The majority of children reported liking the robot and the storytelling game.
Across all the children, children's scores on the vocabulary assessment increased from the pretest (mean words correct = 13.4 of 24,
Children's vocabulary scores increased over the study, but more so in the
Nine children told stories aloud every session. Five children told primarily silent stories, in which they spent time dragging characters on the tablet and sometimes murmuring to themselves, but not speaking aloud very often. Their stories often appeared short because only spoken words were counted. Several of these “silent tellers” began vocalizing their stories more by the final session, telling stories closer in length to the other children. Three children told no stories, though they did talk at other times.
The children who spoke aloud told 206 stories with a mean word count of 81.7 words (
Qualitatively, children covered a range of themes in their stories. We observed that children often borrowed elements from the robot's stories—such as character names and activities characters performed. For example, one of the robot's stories was about a boy named Micah, who played ball with his friends. One child continued using this name and theme (XX's indicate inaudible words in the transcript):
“One time there were three friends, XX, Micah and Isabella. Micah liked going on the swings. Isabella liked going on the slide. One time they made a new friend, Daisy. She liked ball. One time she hid behind a bush until nobody saw her. Then both of the kids that were playing, approached and hid. Then, Micah slid down the slide and saw her. She stepped out but landed on the top of the brick tower. So then, they both came down together. The end.”
Several children also retold versions of the robot's stories, without prompting (they were merely asked to tell a story and were not prompted with regards to content). For example, after the robot told a story about three animals that played hide-and-seek together, one child told the following story:
“Once upon a time there was a squirrel named, Squirrel, a turtle named Turtle and a rabbit named Rabbit. That particular day they played hide and seek. Squirrel hid in the mud. Turtle hid in the trees while Bunny counted. One, two, three, four. Found you! Found you, Turtle. My turn. XX behind a tree. Squirrel found Turtle. And then they played again and again. The end.”
These observations suggested that children were, in fact, emulating the robot's stories, which was confirmed quantitatively in our language emulation results below.
We performed a mixed analysis of variance with condition (between:
Children's mean use of the robot's key phrases and target vocabulary words by session.
We observed LSM scores ranging from 0.063 to 0.892, with a mean of 0.696 (
Children's mean LSM scores by condition for the first half vs. second half of the study.
As described earlier, phrase matching scores were computed against all previously heard stories, only stories from the same story scene, and only the story heard just prior to the child's. We used children's phrase matching scores as a measure of language emulation. We performed a mixed analysis of variance with condition (between:
We observed a trend for a main effect of time on the mean number of matching phrases used per story,
Children emulated the robot's phrases during their storytelling. Their emulation increased during the second half of the study in the
We observed a significant interaction of time with condition for the mean number of matching phrases used per story,
We observed a trend for an interaction of time with condition for the mean number of matching phrases used per story,
Children who emulated more of the robot's phrases during their storytelling also scored higher on the vocabulary posttest,
Children who had higher LSM scores during sessions 1–4 were more likely to emulate the robot's phrases during storytelling,
When looking at the mean of all children's scores for sessions 1–8, we observed that children who told longer stories also used more unique words (
Children who told longer stories also used more unique words and spent more time telling their stories.
We asked whether children would show greater vocabulary learning and language emulation when they showed greater rapport with a social robot with whom they played a storytelling game over time. We found some evidence supporting our hypotheses.
First, we observed that most children liked the robot, and their LSM scores reflected that liking, being reasonably high overall. We observed that children learned new vocabulary words, as evidenced by higher vocabulary posttest scores and use of the target words in their stories. This result reflects prior work in which children have learned and mirrored new vocabulary words with social robots during storytelling activities (e.g., Kory Westlund et al.,
In partial support of H1, we observed that children's LSM scores were positively related to their use of the robot's key words and phrases. However, contrary to our expectations, LSM scores were not significantly related to children's vocabulary test scores.
This may be for several reasons. First, because the sessions with the robot were fairly short (10–15 min) and because not all children told long stories, the amount of conversation between the robot and child was limited. As such, the amount of data used to compute the LSM scores was limited, and the LSM scores should be interpreted with a degree of caution. Second, children's LSM scores may not perfectly reflect rapport. Prior work linked higher LSM scores between two people to higher rapport and a deeper relationship (e.g., Pennebaker et al.,
In our analyses here, we did observe that children's LSM scores correlated positively with their emulation of the robot during storytelling, as expected (H2). This suggests that rapport is linked to emulation, which is in line with prior work showing that people will mirror a variety of different behaviors in others with whom they have high rapport (e.g., Tickle-Degnen and Rosenthal,
In addition, we saw that children's emulation of the robot's language was positively correlated with their vocabulary scores, supporting H3. Children who correctly identified more of the target words on the receptive vocabulary test were also more likely to expressively use the words in their stories. These results suggest that children's emulation was related to their learning—perhaps their rapport with the robot led to greater emulation, and greater emulation was indicative of greater word learning. This would be worth investigating in a systematic way in follow-up work.
We find partial support for H4: When examining children's behavior over time, we saw that children slightly increased their use of the robot's keywords and phrases from the first half of the study to the second half. However, children's overall emulation decreased over time, while their use of unique words increased. It may be that children were more creative over time when telling stories, making up their own that drew less on the robot's stories for inspiration. The storytelling activity was designed to facilitate language development, so both creatively using language as well as imitating the robot's language were beneficial outcomes. Story re-telling (i.e., intentionally imitating another's storytelling) has often been used as an educational activity for helping children learn stories and vocabulary (e.g., Isbell,
Children's LSM scores, on average, did not show a strong increase over time (there were differences by condition, as discussed further below). This could indicate little increase in rapport, or could mean that LSM is not sufficiently sensitive to capture children's changes in rapport over the study.
Children's LSM scores and phrase emulation during storytelling increased over time for children in the
However, in addition to the small sample size, the two conditions were not fully balanced. There were more children in the
Taken together, our results suggest that first, interacting with a more advanced peer-like social robot can be beneficial for children's language learning. This is in line with work examining children's language learning with human peers (Fuchs et al.,
Finally, this study highlights new opportunities we have for using social robots as interventions for early language development, specifically by leveraging this connection between rapport and learning.
This study had several limitations. First, as mentioned earlier, the sample size was fairly small and the conditions were unbalanced in number. As such, our analyses were underpowered. In addition, children's individual differences, such as learning ability or socio-economic status, were not controlled for. These factors may all influence children's learning and social interactions with the robot. Future work should attempt to recruit a more balanced, homogeneous sample and explore the stability of the results across individual differences.
The target vocabulary words presented in the robot's stories included some words that were known by numerous children at the start of the study (as reported above, children identified a mean of 13.4 of 24 words correctly at the pretest,
Another limitation of the dataset was the lack of additional assessments of relationship and rapport. We used children's LSM scores as a measure of rapport, since numerous prior studies have linked higher LSM scores between two people to higher rapport and a deeper relationship (e.g., Pennebaker et al.,
Finally, this study explored a one-on-one interaction with the robot. However, children often learn with others—friends, siblings, parents, and teachers. Future work should explore group interactions that include multiple children or children with parents, caregivers, and teachers. This could give us insight into how to integrate robots into real-world educational contexts, such as schools and homes.
Despite these limitations, we did see numerous correlations and differences that are suggestive of links between children's learning, rapport, and language emulation. While these results are exploratory and not definitive, they do provide evidence that this is an area that warrants further study.
The datasets generated for this study are available on request to the corresponding author.
JK-W and CB conceived and designed the study, and drafted, revised, and approved the paper. JK-W performed the data analysis.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.