Edited by: Cigdem Beyan, Istituto Italiano di Tecnologia, Italy
Reviewed by: Vicky Charisi, Joint Research Centre, European Commission, Belgium; Sofia Serholt, University of Gothenburg, Sweden; Takamasa Iio, University of Tsukuba, Japan
This article was submitted to Human-Robot Interaction, a section of the journal Frontiers in Robotics and AI
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
In positive human-human relationships, people frequently mirror or mimic each other's behavior. This mimicry, also called entrainment, is associated with rapport and smoother social interaction. Because rapport in learning scenarios has been shown to lead to improved learning outcomes, we examined whether enabling a social robotic learning companion to perform rapport-building behaviors could improve children's learning and engagement during a storytelling activity. We enabled the social robot to perform two specific rapport and relationship-building behaviors: speech entrainment and self-disclosure (shared personal information in the form of a backstory about the robot's poor speech and hearing abilities). We recruited 86 children aged 3–8 years to interact with the robot in a 2 × 2 between-subjects experimental study testing the effects of robot entrainment
Social robots have been designed as peers, tutors, and teachers to help children learn a variety of subjects (Belpaeme et al.,
Some prior work with adults provides evidence in support of this hypothesis (Kidd and Breazeal,
We have strong evidence that children's peer relationships provide bountiful opportunities for learning via observing peers, being in conflict with peers, and cooperating with peers (Piaget,
Two possible modulating factors are rapport and a positive relationship. Some recent work has linked rapport to improved learning outcomes in older children's human-human peer tutoring situations (Sinha and Cassell,
Many different social and relational factors can increase rapport, trust, and engagement with virtual agents and robots. For example, using appropriate social cues (Desteno et al.,
We chose to implement two rapport- and relationship-building behaviors in a social robot to explore their effects on young children's engagement and learning: speech entrainment and self-disclosure (shared personal information).
In positive human-human interpersonal interactions, people frequently mimic each other's behavior—posture, affect, speech patterns, gestures, facial expressions, and more—without conscious awareness or intent (Davis,
Speech entrainment involves matching vocal features of one's interlocutor, such as speaking rate, intensity, pitch, and prosody. This mimicry tends to happen unconsciously, and more often when rapport has been established—i.e., when one feels closer to or more positively about one's interlocutor (Porzel et al.,
Backstory is the story told by or about an agent, including personal history (e.g., origin, family, hobbies), capabilities, limitations, and any other personal information that might be disclosed. With young children in particular, we expect that sharing information about an agent in a story context could make the agent easier for children to understand.
Prior work has shown that the story told about a robot prior to interaction can change how people perceive the robot and interact with it. Telling participants that a robot is a machine vs. a human-like, animate agent (Stenzel et al.,
Backstory can also increase engagement with an agent. For example, in one study, giving a robot receptionist a scripted backstory during a long-term deployment increased engagement, since the story added interesting variation and history to the interactions people had with it (Gockley et al.,
Part of our goal in giving the robot a backstory was to promote a more positive relationship. Thus, we examined interventions for promoting the acceptance of peers and considered how they might inform the story told about the robot. Favazza and colleagues explored how to promote the acceptance of peers with disabilities in kindergarten classrooms, as well as how to measure that acceptance (Favazza and Odom,
There are ethical concerns regarding deception when giving robots stories that may elicit empathy, trust, or acceptance. In this study, the backstory we chose to use was fairly reflective of the actual limitations and capabilities of social robots. It pertained to the robot's difficulties with hearing and listening and was thus fairly realistic and not particularly deceptive, given general difficulties in social robotics with automatic speech recognition and natural language understanding. The remainder of the backstory discussed the robot's interest in storytelling and conversation, which was deceptive in that robots do not really have interests, but served to present the robot as a character with interests in these subjects in order to promote engagement in learning activities.
We wanted to explore whether a social robot that entrained its speech and behavior to individual children and provided an appropriate backstory about its abilities could increase children's rapport, positive relationship, acceptance, engagement, and learning with the robot during a single session.
The experiment included two between-subjects conditions: Robot entrainment (
We recruited 95 children aged 3–8 years (47 female, 48 male) from the general Boston area to participate in the study. We included this wide age range both to obtain a sufficient number of participants and because we were interested in whether older children (e.g., 6–8 years) and younger children (e.g., 3–5 years) might relate differently to the robot's relational behavior, since children may develop relationships differently as they grow older (Hartup et al.,
Nine children were removed from analysis because they did not complete the study.
We used random counterbalanced assignment to assign children to conditions. There were 20 in the
Demographic information about the participants by condition.
E-B | 5.40 (1.54) | 11 | 9 | 12 | 8 |
E-NB | 5.21 (1.34) | 7 | 9 | 9 | 7 |
NE-B | 5.44 (1.67) | 13 | 15 | 18 | 10 |
NE-NB | 5.27 (1.35) | 13 | 9 | 11 | 11 |
Children's parents gave written informed consent prior to the start of the study, and all children assented to participate. The protocol was approved by the MIT Committee on the Use of Humans as Experimental Subjects.
We expected that the robot's entrainment and backstory might affect both children's rapport and social behavior, as well as learning and retention, during a single session with the robot. Accordingly, we used a variety of measures to explore the effects of the robot's entrainment and backstory. We tentatively expected the following results:
Five different experimenters (three female adults and two male adults) ran the study in pairs in a quiet room in the lab. The study setup is shown in
For each child, the interaction with the robot lasted about 20 min, followed by 5–10 min for the posttests. The interaction script, full interaction procedure, and other study materials are available for download from figshare at:
The experimenter introduced the sleeping robot, Tega, to the child and explained that it liked looking at pictures and telling stories. If the child was in the Backstory condition, the experimenter also explained that Tega sometimes had trouble hearing: “Do you see Tega's ears? Tega's ears are hiding under all the fur, so sometimes Tega's ears don't work very well. Tega sometimes has a lot of trouble hearing. You should talk to Tega in a loud and clear voice so Tega can hear you. Try to be understanding if Tega needs to hear something again.” Then, in all conditions, the experimenter invited the child to help wake up the robot.
The robot interaction had five main sections: a brief introductory conversation (providing context for sharing the backstory, 2–3 min), a conversation about pictures (providing opportunities for speech entrainment and a helping/compliance request, 5–6 min), a sticker task (a sharing/compliance request, 1 min), a storytelling activity (providing opportunities to learn words and mirror the robot's speech, 10–12 min), and a brief closing conversation (1–2 min).
In the introductory conversation, the robot introduced itself, shared personal information about its favorite color and an activity it liked doing, and prompted the child for disclosure in return. Then, in the Backstory condition, the robot reinforced the backstory provided by the experimenter earlier, telling the child, “Sometimes I have trouble hearing and I can't always understand what people tell me. I try really hard, but sometimes I just don't hear things right. I need help and practice to get better!”
The picture conversation took approximately 5 min and was designed to provide many conversation turns for the child, and thus provide the robot with opportunities to entrain its speech to the child's. The experimenter placed photos one at a time in front of the robot and child (e.g., a collage of holidays or pictures from children's movies). For each picture, the robot introduced the picture content, expressed something it liked about the picture, asked the child a question, responded with generic listening responses (e.g., “Can you tell me more?,” “Oh, cool!,” “Keep going!”), shared another fact relevant to the picture, and asked another question. At two points during this activity, there were scripted moments where the robot had difficulty hearing (saying, e.g., “I didn't hear that, can you say it again?”), to reinforce its backstory. The experimenter explained that the robot and child had to do at least three pictures, but they could do one more if they wanted—this set up a later compliance/helping task after the third picture, in which the robot asked if the child would do a fourth picture with it to help it practice extra. If the child declined the fourth picture, the experimenter moved on.
The sticker task was used to see how likely the child was to agree to a request by the robot to share a favorite object. The child was allowed to pick out a sticker from a small selection. The robot stated that it wanted the child's sticker and asked for it. The child could spontaneously speak or give their sticker to the robot, or decline. If the child gave their sticker, the experimenter would conveniently find a duplicate sticker in their pocket to replace it, so that the child would not have to forgo their favorite sticker.
The storytelling activity was modeled after the story retelling task used in Kory Westlund et al. (
We embedded six target vocabulary words (all nouns) into the story. As in the prior study, we did not test children on their knowledge of these words prior to the storytelling activity because we did not want to prime children to pay attention to these words, since that could bias our results regarding whether or not children would learn or use the words after hearing them in the context of the robot's story. We used the six key nouns identified in the original story in Kory Westlund et al. (
After the robot told the story, the robot prompted children to retell the story. Children could use the tablet while retelling the story to go through the story pages, so they could see the pictures to help them remember the story. Twice during the retell, the robot had difficulty hearing (“What? Can you say that again?”), which reinforced the backstory. Children's retellings were used as a measure of their story recall, mirroring of the robot's speech, and expressive use of the vocabulary words.
As part of the closing conversation, we included a goodbye gift task. The experimenter brought out a tray with several objects on it: a small toy frog (because the frog was present in the robot's story), a small book (because the robot expressed great interest in stories), a sticker of the robot's favorite color (blue), and an orange sticker. The child could pick an object to give to the robot, and the experimenter followed up by asking why the child had picked that gift.
After the robot interaction, the experimenter administered a receptive vocabulary test of the six target words in the story. For each word, four pictures taken from the story's illustrations were shown to the child. The child was asked to point to the picture matching the target word. We examined both children's receptive knowledge of the words as well as children's expressive or productive abilities during the story retelling, since children who can recognize a word may or may not be able to produce it themselves.
This was followed by the Inclusion of Other in Self task, adapted for children as described in Kory Westlund et al. (
Then the experimenter asked several questions taken from the Social Acceptance Scale for Kindergarten Children (Favazza and Odom,
We used the Tega robot, a colorful, fluffy squash and stretch robot designed for interactions with young children (Kory Westlund et al.,
Speech was recorded by a human adult female and shifted to a higher pitch to sound more child-like. All robot speech was sent through the automated audio entrainment module and streamed to the robot. For the
We used a Google Nexus 9 8.9-inch tablet to display the story. Touchscreen tablets have effectively engaged children and social robots in shared tasks (Park et al.,
As in the prior study (Kory Westlund et al.,
Using teleoperation allowed the robot to appear autonomous while removing technical barriers, primarily natural language understanding, since the teleoperator could be in the loop to parse language. The teleoperator triggered the start of each action sequence (speech, physical motions, and gaze) and the storybook's page turns, and thus had to attend to timing in order to start sequences at the right moments. The timing of actions within sequences was automated and thus consistent across children. On several occasions—namely during the picture conversation task—the teleoperator also had to listen to the child's speech and choose the most appropriate of a small set of action sequence options to trigger.
The teleoperator performed one of two actions if the child asked an unexpected question or said something unusual. During the conversation portion of the interaction, the teleoperator could trigger one of the generic responses (e.g., “Mmhm!,” “Hm, I don't know!”) in reply. During the remainder of the interaction, the teleoperator had to continue in accordance with the interaction script, which essentially ignored unexpected behaviors. While this is not ideal from an interaction standpoint, it was necessary to ensure reasonably consistent behavior on the part of the robot across children.
In the
For speaking rate and pitch entrainment, the child's speech was automatically collected via the robot's microphone when it was the child's turn to speak in the conversation. Using automatic software scripts with Praat (audio analysis software), various features of the children's speech were extracted and used to modify the robot's recorded speech files. These modified audio files were then streamed to the robot for playback.
For speaking rate, the robot's speech was sped up or slowed down to match the child's speaking rate. Thus, if a child spoke slowly, the robot slowed down its speech as well. We included ceiling and floor values such that the robot's speech would only ever be sped up or slowed down by a maximum amount, ensuring that the speech stayed within a reasonable set of speeds. We used the Praat script for speaking rate detection from de Jong and Wempe (
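As a minimal sketch, the clamped rate adaptation described above could look like the following; the bound values here are hypothetical, since the paper does not report the exact ceiling and floor:

```python
def rate_factor(child_syll_per_sec, robot_syll_per_sec,
                min_factor=0.8, max_factor=1.25):
    """Compute a playback-rate multiplier for the robot's speech.

    Values below 1.0 slow the robot down to match a slow-speaking child;
    values above 1.0 speed it up. The factor is clamped so the speech
    always stays within a reasonable range of speeds.
    (The 0.8/1.25 bounds are illustrative assumptions.)
    """
    factor = child_syll_per_sec / robot_syll_per_sec
    return max(min_factor, min(max_factor, factor))
```

The clamped factor would then be applied to the robot's recorded audio (e.g., via a Praat duration manipulation) before streaming it to the robot.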
The mean pitch of the robot's speech was shifted up or down to match two features: (1) the child's age and (2) the child's current mean pitch. In general, people speak at a particular fundamental frequency, but there is variation within an individual (pitch sigma). Thus, we provided a table of mean fundamental frequencies for children of different ages based on the values computed in prior work (Weinberg and Zlatin,
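A sketch of this pitch-matching step follows, assuming the target pitch is a simple blend of an age-based table value and the child's measured mean pitch. The table values and the blending rule are illustrative stand-ins, not the published norms the authors cite:

```python
import math

# Hypothetical age -> mean fundamental frequency table (Hz).
# Illustrative values only; the study used norms from prior work.
AGE_MEAN_F0 = {3: 285.0, 4: 280.0, 5: 270.0, 6: 260.0, 7: 250.0, 8: 240.0}

def pitch_shift_semitones(robot_mean_f0, child_age, child_mean_f0=None):
    """Return the semitone shift that moves the robot's mean pitch
    toward the child's.

    Falls back to the age-based table when no live pitch estimate is
    available; otherwise averages the table value with the measured one
    (the averaging rule is an assumption for this sketch).
    """
    table_f0 = AGE_MEAN_F0[child_age]
    if child_mean_f0 is None:
        target_f0 = table_f0
    else:
        target_f0 = (table_f0 + child_mean_f0) / 2
    # 12 semitones per doubling of frequency
    return 12.0 * math.log2(target_f0 / robot_mean_f0)
```

A positive result shifts the robot's recorded speech upward, a negative result downward, before the audio is streamed to the robot.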
We also manually adapted the robot's volume and exuberance. During the introduction and the first picture in the picture task, the teleoperator observed the child's behavior and personality: were they shy, passive, reserved, or quiet (less exuberant/quieter children)? Or were they loud, extroverted, active, smiley, or expressive (more exuberant/louder children)? Based on this binary division, the teleoperator adjusted the robot's audio playback volume at two specific points during the interaction to be either slightly quieter (for less exuberant/quieter children) or slightly louder (for more exuberant/louder children). Furthermore, the teleoperator triggered different animations at six points during the interaction—bigger, more excited animations for more exuberant/louder children; quieter, slower animations for less exuberant/quieter children.
We recorded audio and video of each interaction session using a camera set up on a tripod behind the robot, facing the child. All audio was transcribed by human transcriptionists for later language analyses. Children's responses to the posttest assessments were recorded on paper and later transferred to a spreadsheet.
For the analysis of children's story retellings, we excluded the three 3-year-olds because one did not retell the story, and the other two needed extra prompting by the experimenter and were very brief in their responses. Of the remaining 83 children, one child's transcript could not be obtained due to missing audio data. Fifteen children did not retell the story (the number from each condition who did not retell the story was not significantly different). Thus, in total, we obtained story retell transcripts for 67 children (15
We analyzed children's transcribed story retells in terms of story length (word count), overall word usage, usage of target vocabulary words, and similarity of each child's story to the robot's original story. We created an automatic tool to obtain similarity scores for each child's story as compared to the robot's story, using a phrase and word matching algorithm. The algorithm proceeded as follows: First, take both stories (the original story and the child's story) and remove stopwords (i.e., words carrying no significant information, such as “the,” “uh,” and “an”). Second, stem words—i.e., convert each word to its base form (for example, “jumping” becomes “jump”). Third, find all N-grams in each story, where an N-gram is a contiguous sequence of N words. Fourth, remove duplicate N-grams from one of the texts. Fifth, count how many N-grams appear in both texts; this count is the similarity score. The score reflects the number of exactly matching phrases—i.e., words used in the same order by both the child and the robot—and is higher for texts with both more matching phrases and longer matching phrases. We also implemented a variant for counting near matches that are close but not exactly the same: it follows the same steps, except that the fifth step uses a fuzzy string matching algorithm to determine whether two N-grams match.
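The exact-match version of this algorithm can be sketched as follows; the stopword list, the toy suffix-stripping stemmer, and the maximum N-gram length are illustrative stand-ins for components the paper leaves unspecified (a real pipeline would use a standard stopword list and stemmer):

```python
import re

# Tiny illustrative stopword list (a real implementation would use a
# standard list, e.g., from NLTK).
STOPWORDS = {"the", "a", "an", "and", "then", "to", "was", "were",
             "uh", "um", "his", "her", "on", "of", "in", "be"}

def stem(word):
    """Toy suffix-stripping stemmer standing in for a real one."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem."""
    words = re.findall(r"[a-z']+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]

def ngrams(tokens, n):
    """All distinct contiguous N-grams (sets remove duplicates)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity_score(original, retelling, max_n=3):
    """Count exactly matching N-grams (N = 1..max_n) in both stories.

    Longer shared phrases contribute several overlapping N-grams, so the
    score is higher when matches are both more numerous and longer.
    (max_n=3 is an assumption; the paper's value is not shown here.)
    """
    a, b = preprocess(original), preprocess(retelling)
    return sum(len(ngrams(a, n) & ngrams(b, n)) for n in range(1, max_n + 1))
```

The fuzzy variant described above would replace the exact set intersection with an approximate string comparison between N-grams.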
When running the algorithm to match stories, we used
For example, the robot's story included the sentences, “The baby frog liked the boy and wanted to be his new pet. The boy and the dog were happy to have a new pet frog to take home.” After stopword removal and stemming, this was converted to: “baby frog like boy want be new pet boy dog happy new pet frog take home.” One child's story included the similar section, “Then he hopped on his hand and he wanted to be his pet. And then the dog and the boy was happy to have a new pet,” which was converted to: “hop hand want be pet dog boy happy new pet.” There were several exactly matching phrases, e.g., “
We obtained children's facial expressions from the recorded videos using Affdex, emotion measurement software from Affectiva, Inc., Boston, MA, USA (McDuff et al.,
We focused our analysis on the following affective states and facial expressions: joy, fear, sadness, surprise, concentration, disappointment, relaxation, engagement, valence, attention, laughter, and smiles. We included valence in addition to specific emotions such as joy because Affdex uses different sets of facial expressions to detect the likelihood that a face is showing each affective state. Thus, valence is not detected from, e.g., the emotions joy or sadness; instead, it is calculated from a set of facial expressions that is somewhat different from, though overlapping with, the set used to calculate other emotions. The expression “concentration” was labeled “contempt” by Affectiva, which has no label for concentration or thinking expressions. Affectiva uses brow furrows and smirks to classify contempt; prior work has found that brow furrowing and the lip movements present in smirks, such as mouth dimpling and lip tightening, are also associated with concentration (Oster,
We coded children's responses to the Social Acceptance Scale questions on a 3-point scale, with “
We coded whether children agreed to do the fourth picture and whether they gave the robot their sticker with “
Our results are divided below into two parts, each reflecting one of our hypothesis areas: (1)
For all learning-related analyses of variance, we included Age as a covariate because we expected that children's age would be related to their language ability and thus to their vocabulary scores and the complexity and/or length of their stories.
We performed 2 × 2 between-subjects analyses of variance with Entrainment (
A 2 × 2 between-subjects analysis of variance with Entrainment (
Children in the
Overall, we saw no correlation between children's recognition of words on the vocabulary test and their subsequent use of those words in their retells, r
In summary, given that children's scores on the vocabulary identification test were not significantly different by condition, these results suggest that the robot's entrainment and backstory did not impact children's initial encoding of the words, but did affect children's expressive use of the words in their retelling.
The robot's story was 435 words long, including the dialogic questions. The mean length of children's retells was 304 words (
We performed 2 × 2 between-subjects analyses of variance with Entrainment (
Children used a mean of 37.7 unique words (
The number of overlapping words children used by entrainment condition
Children's stories received mean scores of 41.3 (
We performed 2 × 2 between-subjects analyses of variance with Entrainment (
Children's responses to the question, “Would you like to be good friends with a robot who can't hear well?” and the question, “Would you like to be good friends with a handicapped or disabled kid?” by condition. *
Overall, children were highly attentive and engaged, and displayed surprise and other emotions during the story (see
Analysis of facial expressions during the interaction by condition.
Engagement | 30.8 (11.7) | 33.3 (13.3) | 30.5 (12.0) | 29.6 (11.2) | 30.5 (11.4) |
Attention | 68.9 (13.4) | 62.2 (21.1) | 67.8 (15.2) | 71.9 (5.56) | 72.0 (9.51) |
Valence | −0.738 (9.11) | 3.51 (8.81) | 5.75 (13.72) | −4.13 (5.20) | −2.72 (8.47) |
Joy | 7.13 (8.04) | 9.13 (8.81) | 12.1 (12.5) | 5.48 (5.02) | 5.61 (7.26) |
Smiles | 8.98 (8.82) | 10.9 (9.35) | 14.6 (13.4) | 7.16 (5.65) | 7.52 (8.31) |
Laughter | 0.13 (0.22) | 0.23 (0.31) | 0.28 (0.36) | 0.08 (0.09) | 0.07 (0.11) |
Relaxation | 3.53 (5.31) | 4.13 (5.38) | 6.63 (9.61) | 2.49 (2.42) | 3.06 (5.03) |
Surprise | 7.21 (6.96) | 8.47 (9.22) | 4.53 (4.63) | 7.40 (5.32) | 7.43 (7.84) |
Disappointment | 4.98 (3.98) | 2.58 (2.01) | 3.58 (3.03) | 6.58 (4.37) | 5.72 (4.05) |
Fear | 1.48 (2.06) | 1.00 (1.40) | 0.38 (0.66) | 1.87 (2.04) | 1.93 (2.72) |
Concentration | 2.92 (2.48) | 2.02 (1.79) | 2.11 (1.87) | 3.20 (2.45) | 3.72 (3.03) |
Sadness | 0.27 (0.46) | 0.22 (0.34) | 0.49 (0.54) | 0.32 (0.59) | 0.17 (0.24) |
We found a significant main effect of Entrainment on children's expressions of joy,
Children's overall negative affect varied by entrainment condition.
Children's overall positive affect varied by entrainment condition.
Next, we asked whether children's affect changed during the session. We split the affect data into the first half of the session and the second half of the session, using the data timestamps to determine the halfway point. We ran a 2 × 2 × 2 mixed ANOVA with time (within: first half vs. second half) × Entrainment (between:
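The half-split step can be sketched as follows, assuming the affect data arrive as (timestamp, value) samples; splitting at the temporal midpoint between the first and last sample is an assumption, since the paper does not state the exact rule:

```python
def split_halves(samples):
    """Split (timestamp, value) affect samples at the session midpoint
    and return the mean value of each half.

    The midpoint is taken halfway between the first and last timestamps
    (an assumption for this sketch).
    """
    times = [t for t, _ in samples]
    midpoint = (min(times) + max(times)) / 2
    first = [v for t, v in samples if t <= midpoint]
    second = [v for t, v in samples if t > midpoint]
    return sum(first) / len(first), sum(second) / len(second)
```

The per-half means for each child would then feed into the mixed ANOVA with time as the within-subjects factor.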
Like before, we found a significant main effect of Entrainment on disappointment,
Children's affect during the first half and the second half of the interaction varied by entrainment condition.
We found a significant main effect of time on joy,
We saw trends for interactions of Entrainment with time: concentration,
We also saw trends for interactions of time with Backstory for fear,
Children's affect during the first half and the second half of the interaction varied by backstory.
We performed a 2 × 2 × 5 mixed ANOVA with Entrainment (
Regarding the Picture Sorting Task, overall, Tega was placed at a mean position of 4.78 (
We performed a mixed ANOVA with Entrainment (between:
In the
The frog was placed significantly closer to the human adult than the robot arm and computer, and significantly farther from the human adult than the baby, but otherwise its position did not differ significantly from any other entities, except in the
In the
Regarding the distance of each entity relative to the Tega robot, we observed a significant main effect of Entity,
The cat was placed closer to Tega than most other entities. It was not placed significantly differently than the teddy bear in the
The computer was placed farther from Tega than all entities except the robot arm and, in the
Finally, we also observed trends for Tega to be placed farther from the frog, and also closer to the human adult than the frog was, in the
We observed no significant differences between conditions regarding whether children were more likely to agree to do the fourth picture with the robot, give the robot their sticker in the sticker task, or give the robot a bigger goodbye gift (in terms of how meaningful the robot might think it to be). About half the children in each condition chose to do the fourth picture; we did not see any effects of the number of picture conversations (i.e., the three required vs. the optional fourth one) on the results. If we looked at children's likelihood to perform all three activities (adding up the fourth picture, the sticker, and the goodbye gift, rather than any one individually), we saw a trend for children in the
We found that children who gave Tega a closer score on the IOS task were also more likely to use the target words in their stories,
In addition, children who placed Tega closer to the human in the Picture Sorting Task were also more likely to use phrases similar to the robot's,
We did not observe any significant correlations of children's vocabulary scores with their phrase mirroring or any of the relationship assessments.
We asked whether a social robot that entrained its speech and behavior to individual children and provided an appropriate backstory about its abilities could increase children's rapport, positive relationship, acceptance, engagement, and learning with the robot. Below, we discuss the main findings and then discuss the implications of these findings.
Children learned the target vocabulary words in the robot's story and were generally attentive and engaged with the robot regardless of the experimental condition. They showed a variety of emotional expressions throughout the interaction. Children remembered the robot's story as evidenced by their ability to retell the story and their identification of target words on the vocabulary test. These results are in line with the prior study using this story activity (Kory Westlund et al.,
We did see differences in children's learning by condition. Contrary to our hypotheses (H1), children in the
A second explanation pertains to the learning results we observed. There was a ceiling effect and little variance in children's responses: 43% of children correctly identified all six target words, and 41% correctly identified five. If many children were already familiar with the target words, the vocabulary test would not reflect their learning during the task with the robot, and any difference between conditions may not reflect in-task learning. Furthermore, given that children's receptive language abilities may precede their expressive abilities (Bloom,
When we examined children's mirroring of the robot's speech, we saw that children did mirror the robot (H2,
The lack of difference in phrase mirroring was counter to our hypotheses (H3). Perhaps children did not feel sufficiently more rapport with the entraining robot for this to affect their storytelling. Indeed, in all conditions, the robot was a friendly, expressive character, which children generally said they felt close to—as close as to pet or parent, though less close than to a best friend. The entrainment only affected the robot's speech and some animations (which were played primarily in accompaniment with speech). In particular, if a child was very shy and rarely spoke, then the robot had fewer opportunities to adapt and entrain to that child. Perhaps greater difference would be seen if the robot also entrained other behaviors, such as posture, gesture, or word use. Another explanation is that perhaps language mirroring is not as closely linked to rapport as we expected; there is limited research so far suggesting this link, and more is needed.
The robot's entrainment and backstory also affected children's displays of positive emotions during the interaction. All children were engaged, but children in the
Children in the
Related to this, we saw that children's attention increased over time in the
Regarding the decrease in attention in the
We observed that children showed greater acceptance of the robot when they had heard the robot's backstory, as we expected (H4;
As noted above, children generally felt as close to the robot as they did to a pet, favorite toy, or parent, though not quite so close as to their best friend (
In support of our hypotheses regarding the connection between children's feelings of closeness, rapport, and relationship with learning and mirroring the robot (H7), we observed that children who rated the robot as closer to themselves also used the target words more often and emulated the robot's story more (
Finally, we also observed a few age differences. The length of children's story retellings differed with respect to their age, but did not vary by condition (
Taken together, these results show that the robot's rapport and relationship-building behaviors do matter in interactions with young children. A robot that deliberately emulates a child's speech in a way similar to how people mirror each other can elicit more positive emotion and greater emulation of key words in a language learning activity. Children's feelings of closeness are related to their emulation of the robot's words in their stories.
Our results also mirror, to an extent, the results in the prior study that explored a robot's use of expressive vs. flat speech (Kory Westlund et al.,
However, in other work on language learning with social robots, the robot's social interactive capabilities have been found to influence children's relationships and social acceptance of the robot, but not their learning (e.g., Kanda et al.,
These studies, however, have generally included learning tasks that did not require a robot or much social behavior for learning to proceed. For example, the second language learning activities used by Vogt et al. (
This hypothesis is supported by Lubold and colleagues' recent work with middle school children and adults, in which a social robot with vocal entrainment contributed to increased learning on math tasks, though not to increases in self-reported rapport (Lubold et al.,
Our results also extend prior work showing that children learn through storytelling with peer-like robot companions in ways that are significantly different from how children learn and engage with other technologies. We are seeing a peer learning dynamic similar to that seen in child-child interactions. Children socially model and emulate the behavior of the robots, as they do with other children. For example, children are more emotionally expressive when the robot is more expressive (Spaulding et al.,
This study had several limitations. First, we did not control for children's individual differences, particularly with regard to learning ability, language ability, or socio-economic status, all of which may affect individual children's social interactions and learning with the robot. Furthermore, we did not obtain an equal number of children in each age group. Future work should examine a more homogeneous sample as well as explore the stability of results across individual differences and across ages as children grow older.
We also lacked complete story retelling and affect data for all children. Some children did not retell the story, and in a few cases we had issues with the audio quality of the recorded stories. Some children's faces were not recognized by the Affdex software, and a few videos were missing or insufficiently captured a full frontal view of the children's faces, which was necessary for affect recognition. As a result, the analyses reported are underpowered. Future work should take greater effort to obtain quality audio and video recordings for all children during the study.
As mentioned in Kory Westlund et al. (
The robot's automated entrainment was limited to its speaking rate and pitch, so if a child was very quiet or spoke rarely, the robot would not have been able to entrain to that child. Because volume and exuberance adjustments were teleoperated, those adaptations occurred for all children. Future work could explore ways of encouraging shy children to speak up, or explore other modalities for entrainment, such as posture, gesture, facial expressions, and word use.
It is also unclear how generalizable the results are to robots with different embodiments or morphologies. The Tega robot that we used appears much like a fluffy stuffed animal, and thus its morphology could be seen as more familiar to children than that of a robot such as the Aldebaran NAO, which is humanoid. Children may feel a different level of comfort or uncanniness with a humanoid robot than with the Tega robot.
Finally, this study explored only a single one-on-one interaction with the robot. As such, any overall effects could be related to the novelty of the robot. However, children had the same amount of exposure to the robot in all conditions, so novelty cannot explain the differences we observed between conditions regarding the effects of entrainment and backstory.
Because learning tends to happen over time, as does the development of relationships, future work should explore longitudinal interactions to help us better understand the relationship between learning and rapport. Furthermore, children are frequently accompanied by friends and siblings in educational contexts. We do not know how multiple encounters with the robot, or interacting in groups, might affect children's development of a relationship and rapport with the robot. Exploring group interactions that include multiple children, or children in concert with parents and teachers, could help us learn how to integrate robots into broader educational contexts and connect learning with peers to learning in school and at home.
In this work, we explored the impact of a robot's entrainment and backstory on children's engagement, rapport, relationship, and learning during a conversation and story activity. We found that the robot's rapport- and relationship-building behaviors affected children's emulation of the robot's words in their own stories, their displays of positive emotion, their acceptance of the robot, and their perception of the robot as a social agent. This study adds to a growing body of work suggesting that a robot's social design impacts children's behavior and learning. The robot's story, use of relationship behaviors, nonverbal immediacy and rapport behaviors, social contingency, and expressivity are all important factors in a robot's social design.
This study was carried out in accordance with the recommendations of the MIT Committee on the Use of Humans as Experimental Subjects. All child subjects' parents gave written informed consent and all child subjects gave verbal assent, in accordance with the Declaration of Helsinki. The protocol was approved by the MIT Committee on the Use of Humans as Experimental Subjects.
The study was conceived and designed by JK-W and CB. Data analysis was performed by JK-W. The paper was drafted, written, revised, and approved by JK-W and CB.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We would like to thank Paul Harris for his advice regarding the assessments, Kika Arias and Adam Gumbardo for help creating study materials and performing data collection, and Farida Virani, Branden Morioka, Anastasia Ostrowski, and David Cruz for additional help with data collection.
The Supplementary Material for this article can be found online at:
1The children who failed to complete the study were primarily younger children (one 3-year-old, five 4-year-olds, one 5-year-old, and two 6-year-olds). Most were very distracted during the session and did not want to play with the robot for the full duration of the session. One 4-year-old and the 3-year-old appeared scared of the robot and did not want to interact at all, even with parental prompting. One of the 6-year-olds had accidentally signed up for the study twice, and this was not noticed until after we began the session.