
ORIGINAL RESEARCH article

Front. Robot. AI

Sec. Human-Robot Interaction

Volume 12 - 2025 | doi: 10.3389/frobt.2025.1547578

This article is part of the Research Topic: Personalized Robotics: Capturing Variability in Child–Robot Interactions in Education, Healthcare, and Daily Life.

UpStory: the Uppsala Storytelling dataset

Provisionally accepted
  • 1Department of Information Technology, Uppsala University, Uppsala, Sweden
  • 2Department of Information and Computing Sciences, Faculty of Science, Utrecht University, Utrecht, Netherlands
  • 3Department of Information Engineering and Computer Science, University of Trento, Trento, Italy

The final, formatted version of the article will be published soon.

Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in education due to their impact on learning outcomes. Given the growing interest in automating the analysis of such phenomena through machine learning, access to annotated interaction datasets is highly valuable. However, no dataset on child-child interactions that explicitly captures rapport currently exists. Moreover, despite advances in the automatic analysis of human behavior, no previous work has addressed the prediction of rapport in child-child interactions in educational settings. We present UpStory, the Uppsala Storytelling dataset: a novel dataset of naturalistic dyadic interactions between primary-school-aged children, with an experimental manipulation of rapport. Pairs of children aged 8-10 participate in a task-oriented activity, designing a story together, while being allowed free movement within the play area. We promote balanced collection of different levels of rapport through a within-subjects design: self-reported friendships are used to pair each child twice, either minimizing or maximizing pair separation in the friendship network. The dataset contains data for 35 pairs, totaling 3 h 40 min of audiovisual recordings. It includes two video sources and separate voice recordings per child. An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features. Finally, we confirm the informative power of the UpStory dataset by establishing baselines for the prediction of rapport. A simple approach achieves 68% test accuracy using data from one child, and 70% test accuracy when aggregating data from a pair.
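
The abstract describes the within-subjects pairing design only at a high level. The sketch below illustrates one possible way such a pairing could be computed, assuming friendship nominations are represented as an undirected graph and pairs are formed by a greedy matching over shortest-path distances; the function names, the use of networkx, and the greedy strategy are illustrative assumptions and do not reflect the authors' actual procedure.

```python
# Illustrative sketch only: pairing children twice, once minimizing and once
# maximizing their separation in a friendship network built from self-reported
# nominations. The graph construction and greedy matching are assumptions.
import itertools
import networkx as nx

def build_friendship_graph(nominations):
    """nominations: dict mapping each child to the set of peers they named."""
    g = nx.Graph()
    g.add_nodes_from(nominations)
    for child, friends in nominations.items():
        for friend in friends:
            g.add_edge(child, friend)
    return g

def greedy_pairing(graph, maximize=False):
    """Greedily pair children by shortest-path distance in the friendship graph."""
    dist = dict(nx.all_pairs_shortest_path_length(graph))
    unpaired = set(graph.nodes)
    pairs = []
    # Rank all candidate pairs by separation; unreachable pairs count as maximally far.
    candidates = sorted(
        itertools.combinations(graph.nodes, 2),
        key=lambda ab: dist[ab[0]].get(ab[1], len(graph)),
        reverse=maximize,
    )
    for a, b in candidates:
        if a in unpaired and b in unpaired:
            pairs.append((a, b))
            unpaired -= {a, b}
    return pairs

# Toy example: each child is paired twice, once per condition.
noms = {"A": {"B"}, "B": {"A", "C"}, "C": {"D"}, "D": {"C"}}
g = build_friendship_graph(noms)
close_pairs = greedy_pairing(g, maximize=False)   # minimize separation (friends together)
distant_pairs = greedy_pairing(g, maximize=True)  # maximize separation (distant children together)
```

Under this toy setup, each child appears once in the "close" pairing and once in the "distant" pairing, which is the balance the within-subjects design aims for; the actual dataset's pairing criteria are described in the full article.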

Keywords: Child-child interaction, Multimodal dataset, machine learning, Rapport, Social signals

Received: 18 Dec 2024; Accepted: 04 Jun 2025.

Copyright: © 2025 Fraile, Calvo-Barajas, Apeiron, Varni, Lindblad, Sladoje and Castellano. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Marc Fraile, Department of Information Technology, Uppsala University, Uppsala, Sweden
Natalia Calvo-Barajas, Department of Information Technology, Uppsala University, Uppsala, Sweden
Ginevra Castellano, Department of Information Technology, Uppsala University, Uppsala, Sweden

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.