ORIGINAL RESEARCH article
Front. Robot. AI
Sec. Human-Robot Interaction
Volume 12 - 2025 | doi: 10.3389/frobt.2025.1547578
This article is part of the Research Topic: Personalized Robotics: Capturing Variability in Child–Robot Interactions in Education, Healthcare, and Daily Life
UpStory: the Uppsala Storytelling dataset
Provisionally accepted
- 1 Department of Information Technology, Uppsala University, Uppsala, Sweden
- 2 Department of Information and Computing Sciences, Faculty of Science, Utrecht University, Utrecht, Netherlands
- 3 Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in education due to their impact on learning outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning, access to annotated interaction datasets is highly valuable. However, no dataset on child-child interactions explicitly capturing rapport currently exists. Moreover, despite advances in the automatic analysis of human behavior, no previous work has addressed the prediction of rapport in child-child interactions in educational settings. We present UpStory, the Uppsala Storytelling dataset: a novel dataset of naturalistic dyadic interactions between primary-school-aged children, with an experimental manipulation of rapport. Pairs of children aged 8-10 participate in a task-oriented activity: designing a story together, while being allowed free movement within the play area. We promote balanced collection of different levels of rapport by using a within-subjects design: self-reported friendships are used to pair each child twice, either minimizing or maximizing pair separation in the friendship network. The dataset contains data for 35 pairs, totaling 3 h 40 min of audiovisual recordings. It includes two video sources and separate voice recordings per child. An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features. Finally, we confirm the informative power of the UpStory dataset by establishing baselines for the prediction of rapport. A simple approach achieves 68% test accuracy using data from one child, and 70% test accuracy aggregating data from a pair.
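To make the pairing idea concrete, the following is a minimal illustrative sketch, not the authors' procedure. It assumes that self-reported friendships are treated as an undirected graph, that "pair separation" means the shortest-path hop count between two children, and that pairs are chosen greedily; the abstract specifies none of these details. The function names `bfs_distances` and `pair_children` and the toy class of four children are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Return hop counts from `source` to every reachable child."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pair_children(graph, maximize=False):
    """Greedily pair children by separation in the friendship graph.

    Assumption: a greedy heuristic stands in for whatever matching
    procedure the study actually used.
    """
    unreachable = len(graph)  # larger than any finite hop count
    dist = {child: bfs_distances(graph, child) for child in graph}
    # Sort candidate pairs by separation (closest first, or farthest first).
    candidates = sorted(
        combinations(sorted(graph), 2),
        key=lambda pair: dist[pair[0]].get(pair[1], unreachable),
        reverse=maximize,
    )
    paired, pairs = set(), []
    for a, b in candidates:
        if a not in paired and b not in paired:
            pairs.append((a, b))
            paired.update((a, b))
    return pairs


# Hypothetical toy class: friendships form a chain A - B - C - D.
friendships = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
close_pairs = pair_children(friendships)
distant_pairs = pair_children(friendships, maximize=True)
```

Under these assumptions, each child in the toy class appears once in `close_pairs` and once in `distant_pairs`, mirroring the within-subjects design in which every child is paired twice. A greedy choice is not guaranteed to be globally optimal; an exact matching algorithm could be substituted.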
Keywords: child-child interaction, multimodal dataset, machine learning, rapport, social signals
Received: 18 Dec 2024; Accepted: 04 Jun 2025.
Copyright: © 2025 Fraile, Calvo-Barajas, Apeiron, Varni, Lindblad, Sladoje and Castellano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Marc Fraile, Department of Information Technology, Uppsala University, Uppsala, Sweden
Natalia Calvo-Barajas, Department of Information Technology, Uppsala University, Uppsala, Sweden
Ginevra Castellano, Department of Information Technology, Uppsala University, Uppsala, Sweden
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.