AUTHOR=Fraile Marc, Calvo-Barajas Natalia, Apeiron Anastasia Sophia, Varni Giovanna, Lindblad Joakim, Sladoje Nataša, Castellano Ginevra
TITLE=UpStory: the Uppsala Storytelling dataset
JOURNAL=Frontiers in Robotics and AI
VOLUME=Volume 12 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1547578
DOI=10.3389/frobt.2025.1547578
ISSN=2296-9144
ABSTRACT=Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in education due to their impact on learning outcomes. Given the growing interest in automating the analysis of such phenomena through machine learning, access to annotated interaction datasets is highly valuable. However, no dataset on child-child interactions explicitly capturing rapport currently exists. Moreover, despite advances in the automatic analysis of human behavior, no previous work has addressed the prediction of rapport in child-child interactions in educational settings. We present UpStory (the Uppsala Storytelling dataset), a novel dataset of naturalistic dyadic interactions between primary-school-aged children, with an experimental manipulation of rapport. Pairs of children aged 8–10 participate in a task-oriented activity, designing a story together while being free to move within the play area. We promote balanced collection of different levels of rapport by using a within-subjects design: self-reported friendships are used to pair each child twice, once minimizing and once maximizing pair separation in the friendship network. The dataset contains data for 35 pairs, totaling 3 h 40 min of audiovisual recordings. It includes two video sources and a separate voice recording for each child. An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features.
Finally, we confirm the informative power of the UpStory dataset by establishing baselines for the prediction of rapport. A simple approach achieves 68% test accuracy using data from one child, and 70% test accuracy when aggregating data from both children in a pair.