MINI REVIEW article

Front. Comput. Sci., 03 June 2025

Sec. Human-Media Interaction

Volume 7 - 2025 | https://doi.org/10.3389/fcomp.2025.1575667

This article is part of the Research Topic: Embodied Perspectives on Sound and Music AI

How can we make robot dance expressive and responsive? A survey of methods and future directions

Benedikte Wallace, Kyrre Glette and Alexander Szorkovszky

  • 1RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  • 2Department of Informatics, University of Oslo, Oslo, Norway
  • 3Department of Numerical Analysis and Scientific Computing, Simula Research Laboratory, Oslo, Norway

The development of robots that can dance like humans presents a complex challenge due to the disparate abilities involved and various aesthetic qualities that need to be achieved. This article reviews recent advances in robotics, artificial intelligence, and human-robot interaction toward enabling various aspects of realistic dance, and examines potential paths toward a fully embodied dancing agent. We begin by outlining the essential abilities required for a robot to perform human-like dance movements and the resulting aesthetic qualities, summarized under the terms expressiveness and responsiveness. Subsequently, we present a review of the current state-of-the-art in dance-related robot technology, highlighting notable achievements, limitations and trade-offs in existing systems. Our analysis covers various approaches, including traditional control systems, machine learning algorithms, and hybrid systems that aim to imbue robots with the capacity for responsive, expressive movement. Finally, we identify and discuss the critical gaps in current research and technology that need to be addressed for the full realization of realistic dancing robots. These include challenges in real-time motion planning, adaptive learning from human dancers, and morphology independence. By mapping out current methods and challenges, we aim to provide insights that may guide future innovations in creating more engaging, responsive, and expressive robotic systems.

1 Introduction

Getting robots to dance has been a goal since the dawn of the idea of robotics, an ultimate expression of human likeness. For roboticists, this can help facilitate trust and comfort with their creations, while for artists, both human-like and non-human-like motion are powerful expressive tools. Furthermore, the fact that dance incorporates so many abilities also makes robotic dance a prime example of “understanding by simulating” (Simon, 1996).

The specific motor-sensory abilities employed in human dance depend on the setting, but can include motor sequence learning, real-time imitation, spatial awareness, rhythmic entrainment, and fluid improvisation. In solo artistic settings, for example, emphasis is generally placed on precise, expressive, and elaborate motor sequences, possibly in tandem with music or video. Participatory dance, on the other hand, is relatively more dependent on real-time response and adaptation to musical cues, spatial constraints and other bodies. While many of these dance-related abilities have been implemented in robots, combining them requires a common framework for training and executing controllers.

A recent review by LaViers (2024) positions robotic dance as a field at the frontier of the study of movement, where imbuing robots with the ability to dance is a precursor to the general understanding of movement by machines. Indeed, major advances in robotic control are often demonstrated through dance performance. One such advance was bio-inspired robotics, where biological principles are used to overcome differences in morphology, to integrate multimodal information, and to move in tandem with others (Aucouturier et al., 2008). Advances in control techniques led to automated motion retargeting (Ramos et al., 2015), even demonstrated by dancers in real time (Hoffman et al., 2018). In industry, dance has been used as a marketing tool by the well-known robotics company Boston Dynamics. Their dance videos demonstrate their robots' control abilities, even resulting in the development of a Choreographer software development kit.1 Another recent review of artistic robot dance (Maguire-Rosier et al., 2024) categorizes works according to their primary focus, whether it is motion, responsiveness, or the hybridization of human and robot bodies. The authors point to how dance can help to free human-robot interaction from cultural expectations such as human-ness, machine-ness and efficiency.

Now in the era of generative AI, novel expressive movements of robots and avatars can be learned from training on data (Wallace et al., 2021; Osorio et al., 2024), with implementation in dancing robots likely to be around the corner. With the proliferation of sophisticated machine learning techniques, a survey of computational methods that can be applied and combined for general robotic dance ability is required. In this review, we focus on expressive movement and responsiveness as key complementary qualities of dance, and the sensorimotor abilities behind them. Unlike Maguire-Rosier et al. (2024), we consider a standard human-robot interaction context, where robots are distinct from any human performers. While morphology has a profound influence on affective qualities, we are most interested in control methods that are morphology-independent and hence as widely applicable as possible. For the purposes of this review, we define expressiveness and responsiveness as follows:

Expressiveness. We define expressiveness as the agent's ability to move in a manner that conveys a thought or feeling. Highly expressive robots should be able to signal agency and express a wide variety of emotional affect. This requires a high level of variability and flexibility in their movement patterns, and likely also in the complexity of these movements. The dimensions of expressiveness can be captured in various ways, such as in the 12 principles of animation (Yoshida et al., 2022) or Laban movement analysis (Burton et al., 2016).

Responsiveness. Current neuroscientific theory posits that music and dance help to train predictive networks in the brain (Koelsch et al., 2019), which may explain the pleasure associated with sensorimotor entrainment or “groove” (Witek, 2017). This points to real-time responsiveness as another fundamental aspect of human dance. We define responsiveness as movement initiated by external stimulus, or modified in response to changing stimulus. The appropriateness of the timing of response is generally important. Also included in this category is an awareness of context, such that not all stimuli are treated equally.

Movement that is both expressive and responsive requires combining several abilities of which we provide an overview in Section 2, focusing on key methods and recent developments. After surveying research performed on these abilities, Section 3 will discuss their relation to expressiveness and responsiveness, and identify shared computational methods and principles. Finally, we provide perspectives on directions for future research on dancing robotic systems.

2 Dance-related abilities in robots: state of the art

The key sensorimotor abilities involved in dance are actively researched within robotics, often with one ability as a sole focus, and not always applied specifically to dance. In the following sections, one for each ability that we have identified, we present a non-exhaustive overview of key papers and recent works in robotics and AI, focusing specifically on methods that can be transferred to dance.

2.1 Movement primitive learning

Movement, including dance, can often be divided into discrete building blocks such as stepping, reaching or kicking, which can be combined into more complex movements. In neuroscience and robotics these are often called “movement primitives” (Ijspeert et al., 2013; Ravichandar et al., 2020), while in human motion analysis they are sometimes termed “movemes” (Bregler, 1997). Such division of movement naturally leads to hierarchical control schemes, where a high-level controller selects and sequences actions, and low-level controllers execute them with the necessary proprioceptive feedback as input (Merel et al., 2019a).

A common method to learn and generate primitives in robots is to learn a trajectory in a low-dimensional space (such as a hand position) which is then transformed into a stable attractor of a dynamical system in this space. In doing so, the system makes use of motor synergies, such that groups of associated joints do not need to be controlled independently. Traditionally, these spaces have been specified manually (Ijspeert et al., 2001). More recent work uses unsupervised learning to find reusable low-dimensional latent spaces (Merel et al., 2019b).
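
To make the attractor idea concrete, the following is a minimal sketch in the spirit of dynamical movement primitives (Ijspeert et al., 2013) for a single low-dimensional coordinate; the basis-function parametrization and constants are illustrative, not those of any cited system.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, centers, widths, tau=1.0, dt=0.01, alpha=25.0, beta=6.25):
    """Roll out a 1-D dynamical movement primitive.

    A spring-damper system pulls the state y toward `goal`; a forcing term,
    built from radial basis functions weighted by `weights` (learned from a
    demonstration), shapes the transient into the desired trajectory.
    """
    y, dy = y0, 0.0
    x = 1.0                                                  # phase variable, decays from 1 to 0
    trajectory = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)           # basis function activations
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        ddy = alpha * (beta * (goal - y) - dy) + forcing     # attractor dynamics + learned forcing
        dy += ddy * dt / tau
        y += dy * dt / tau
        x += -2.0 * x * dt / tau                             # canonical system
        trajectory.append(y)
    return np.array(trajectory)

# Example: with zero weights the primitive reduces to a smooth reach toward the goal.
centers = np.linspace(0.0, 1.0, 10)
widths = np.ones(10) * 25.0
path = dmp_rollout(y0=0.0, goal=1.0, weights=np.zeros(10), centers=centers, widths=widths)
```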

Primitives can also be learned through automated segmentation of longer demonstrations, using techniques such as change-point detection (Konidaris et al., 2012). Recent work in 3D animation and robot simulation has used vector-quantized variational autoencoders to learn a range of primitives from human movement datasets, also incorporating latent spaces (Siyao et al., 2022; Luo et al., 2023).
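
As a toy illustration of the segmentation idea (a crude velocity-dip heuristic, not the change-point method of Konidaris et al., 2012, nor a vector-quantized model), a demonstration can be cut into candidate primitives wherever the joint speed falls to near zero:

```python
import numpy as np

def segment_by_velocity(poses, dt=1.0 / 60.0, threshold=0.05, min_len=30):
    """Split a pose sequence (frames x joints) into candidate primitives.

    Compute the mean joint speed per frame and cut the sequence wherever the
    speed dips below `threshold`, on the assumption that primitives are
    separated by brief pauses in the motion.
    """
    speed = np.linalg.norm(np.diff(poses, axis=0), axis=1) / dt
    pauses = np.where(speed < threshold)[0]
    cuts, last = [], 0
    for idx in pauses:
        if idx - last >= min_len:            # keep only segments of reasonable length
            cuts.append((last, idx))
            last = idx
    cuts.append((last, len(poses) - 1))
    return cuts

# Example on synthetic data: a sinusoidal joint trajectory is cut at its velocity minima.
t = np.linspace(0, 2 * np.pi, 240)
demo = np.column_stack([np.sin(t), np.zeros_like(t)])
print(segment_by_velocity(demo))
```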

For robot dance, movement primitives have been used to incorporate principles of specific dance styles such as neoclassical (Troughton et al., 2022) and tango (Kroma and Mazalek, 2021), albeit in a hand-crafted way. Modulating certain movement primitives in their speed or amplitude can imbue the motion with affective and expressive qualities (Thörn et al., 2020). Depending on how feedback is implemented, primitives can be made more or less responsive to external influences such as engagement (Javed and Park, 2022), tactile feedback (Kobayashi et al., 2022) or musical features (Boukheddimi et al., 2022).

2.2 Movement sequence generation

Over the past decade, there have been significant advances in sequence modeling tasks such as text generation, enabled by data-driven deep learning models like the Transformer (Vaswani et al., 2017) and variants of the recurrent neural network (RNN) such as the LSTM (Hochreiter and Schmidhuber, 1997). Movement data can also be represented as a sequence by segmenting dance recordings into sequences of individual poses and/or motion primitives. Using computers to generate dance steps has a history spanning decades, from the work of pioneers such as Merce Cunningham (Schiphorst, 2013) to data-driven models for creating and exploring sequences of choreography (Pettee et al., 2019; Li et al., 2021; Carlson et al., 2016). In animation, generative movement models have been used to create realistic, fluid movements for characters, allowing for lifelike motion in films, character control in video games (Alemi and Pasquier, 2017), and virtual avatars (Lee and Marsella, 2006). Recently, the automatic generation of movement for robot control using deep learning has also emerged (Osorio et al., 2024; Park et al., 2024).
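
As a schematic of the data-driven approach (layer sizes and pose dimensionality are hypothetical, and this does not reproduce any cited system), a recurrent model can be trained to predict the next pose in a sequence and then rolled out autoregressively to generate new movement:

```python
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    """Autoregressive next-pose predictor: given a window of joint-angle
    vectors, predict the following pose; rolling the model out frame by
    frame generates a continuous movement sequence."""

    def __init__(self, pose_dim=51, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, poses):                  # poses: (batch, time, pose_dim)
        out, _ = self.lstm(poses)
        return self.head(out)                  # predicted pose at each step

def generate(model, seed, steps=120):
    """Roll the model out autoregressively from a seed sequence."""
    poses = seed.clone()
    for _ in range(steps):
        with torch.no_grad():
            next_pose = model(poses)[:, -1:, :]
        poses = torch.cat([poses, next_pose], dim=1)
    return poses

model = PoseLSTM()
seed = torch.zeros(1, 30, 51)                  # 30 frames of a hypothetical 51-D pose vector
dance = generate(model, seed)
```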

2.3 Entrainment

When accompanied by music, dancers often need to align the timing of their motions with a beat. The technical term “entrainment” refers to the general ability for an oscillating system to adapt its frequency to that of an external stimulus. Mechanistic models that achieve this can be divided into two categories. The first uses adaptive feedback to adjust an intrinsic oscillation frequency (Mörtl et al., 2014), and has been used to find a natural walking frequency for a given robot weight (Buchli et al., 2006). The second method uses self-organized responses of dynamical systems such as coupled oscillators. This principle has been applied using gradient frequency networks (Large et al., 2015). In general, highly nonlinear neural-like oscillators, when connected in the right way, have a strong tendency to synchronize (Izhikevich, 2007). Neural circuits governing vertebrate locomotion, known as central pattern generators (CPGs) (Ijspeert, 2008), can also show entrainment to a wide range of rhythmic stimuli when their parameters and weights are evolved for this property (Szorkovszky et al., 2023a).
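
A minimal sketch of the first category, an adaptive-frequency Hopf oscillator in the spirit of the methods cited above, with constants chosen only for illustration:

```python
import numpy as np

def adaptive_hopf(signal, dt=0.01, mu=1.0, omega0=2.0, coupling=2.0, adapt=5.0):
    """Entrainment via frequency adaptation: a Hopf oscillator perturbed by an
    external signal, with its intrinsic frequency `omega` slowly adjusted so
    that the oscillator locks onto the stimulus frequency."""
    x, y, omega = 1.0, 0.0, omega0
    freqs = []
    for f in signal:
        r2 = x * x + y * y
        dx = (mu - r2) * x - omega * y + coupling * f
        dy = (mu - r2) * y + omega * x
        domega = -adapt * f * y / np.sqrt(r2 + 1e-9)   # frequency adaptation rule
        x, y, omega = x + dx * dt, y + dy * dt, omega + domega * dt
        freqs.append(omega)
    return np.array(freqs)

# Example: the oscillator starts at 2 rad/s and drifts toward a 4 rad/s "beat".
t = np.arange(0, 200, 0.01)
beat = np.cos(4.0 * t)
print(adaptive_hopf(beat)[-1])   # approaches the stimulus frequency, with small residual oscillation
```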

2.4 Imitation

In the context of dance, imitation is used when learning new movements from teacher demonstration. Imitation also occurs in the form of mirroring during dance improvisation. Imitation comprises two essential components—first, motion characteristics of a partner need to be recognized through the senses, then they need to be reproduced (or retargeted) in one's own body. Stoeva et al. (2024) reviews progress in robotic imitation. Mapping can occur on various levels of abstraction, from pure motion trajectories (e.g. rotation of a hand about an axis formed by its arm), to classifiable gestures or movement primitives (turning a door knob), to achievement of a goal (opening a door) (Lopes et al., 2010). For dance, lower level mapping is most relevant, as in this case motion is not necessarily connected to function.

2.4.1 Motion recognition

Various motion recognition systems provide low-level information about a partner's actions. Motion capture systems with reflective markers or inertial measurement units can provide this directly, although they require the partner to wear a specialized suit (Welch and Foxlin, 2002). Deep learning-based pose estimation systems, such as OpenPose (Cao et al., 2021), are an emerging way to estimate the same data from single camera feeds.
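
As a small example of how such keypoint data might be consumed downstream (the keypoint layout here is an assumption, not the actual OpenPose output format), 2D keypoints can be converted into joint angles for subsequent retargeting:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b formed by the segments b->a and b->c (in radians)."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.arccos(np.clip(cosang, -1.0, 1.0))

# Hypothetical keypoints (x, y) in normalized image coordinates: shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.40, 0.30), (0.45, 0.45), (0.60, 0.50)
print(np.degrees(joint_angle(shoulder, elbow, wrist)))   # elbow flexion angle
```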

2.4.2 Motion retargeting

For humanoid robots, an approximately one-to-one low-level mapping is feasible using inverse kinematics to solve for body positions that match the partner's as closely as possible, while dealing with constraints such as balance and stiffness. Various data-driven machine learning approaches also exist to solve the inverse kinematic problem (Xu et al., 2017). Furthermore, by incorporating prediction into the model, a robot can be made to anticipate the partner's motion and hence increase synchronicity (Dallard et al., 2023). The robot Alter3 has been equipped with both automated low-level imitation and a “dream mode,” in which its history is searched for a movement that is closest to matching the target's optical flow, for a more abstract higher level imitation (Masumori et al., 2021).
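
A minimal sketch of the low-level mapping step, using damped-least-squares inverse kinematics on a planar two-link arm as a toy stand-in for a humanoid limb; constraints such as balance and stiffness are omitted:

```python
import numpy as np

def two_link_fk(q, l1=0.4, l2=0.4):
    """Forward kinematics of a planar two-link arm: joint angles -> hand position."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def ik_step(q, target, damping=0.1, l1=0.4, l2=0.4):
    """One damped-least-squares update toward the partner's observed hand position."""
    J = np.array([
        [-l1 * np.sin(q[0]) - l2 * np.sin(q[0] + q[1]), -l2 * np.sin(q[0] + q[1])],
        [ l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),  l2 * np.cos(q[0] + q[1])],
    ])
    error = target - two_link_fk(q, l1, l2)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), error)
    return q + dq

# Track a target position observed from the partner (e.g. via pose estimation).
q = np.array([0.3, 0.5])
for _ in range(50):
    q = ik_step(q, target=np.array([0.5, 0.4]))
print(two_link_fk(q))   # converges close to the target
```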

For non-humanoid robots, although low level mapping of joints is not possible, analogous movement primitives can in principle be learned. In this case, the concept of affordance is useful (Lopes et al., 2010) yet underexplored. Laban movement analysis can also be used to describe the characteristics of the motion to be matched (Burton et al., 2016; Cui et al., 2019). Another solution is to specify situations where close position mapping is most important, such as when motion is slowest, and optimize for smooth transitions at other times (Rakita et al., 2017). In a dance context, rhythmic entrainment could also be used to find an analogous movement sequence to that of a partner with a different morphology (Michalowski et al., 2007; Szorkovszky et al., 2023b).

2.5 Stabilization

Balance and stability are important for dancers to execute complex movements in a controlled manner, allowing their movements to look effortless while also protecting themselves from injuries. Similarly, it is essential for legged robots to keep the center of mass supported at all times. Hence, continuous feedback control is ubiquitous for such robots in industrial settings. Typically, this relies on hand-designed motion plans so that the kinematic equations and hence the necessary feedbacks are known (Kuindersma et al., 2016). More recently, machine learning approaches such as reinforcement learning have been used to automatically find motion plans and corresponding feedbacks (Castillo et al., 2021). In both of these cases, robots employing feedback control can be seen as responsive, as the feedback can “respond” to perturbations quickly, much in the same way a dancer would when finding themselves in an unnatural position. However, since adjustments are directed toward a stable subset of positions, feedback control by itself limits expressiveness of motion.
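
As a toy illustration of the feedback principle, a PD controller stabilizing a linearized inverted pendulum (standing in for the robot's center-of-mass dynamics; the gains are arbitrary) continuously corrects deviations from an upright reference:

```python
import numpy as np

def simulate_balance(theta0=0.2, kp=60.0, kd=12.0, g=9.81, l=1.0, dt=0.002, steps=2000):
    """PD stabilization of a linearized inverted pendulum (angle from vertical).

    The feedback torque u = -kp*theta - kd*dtheta counteracts the destabilizing
    gravity term, modeling how continuous feedback keeps the center of mass
    over the support while a motion plan is executed."""
    theta, dtheta = theta0, 0.0
    for _ in range(steps):
        u = -kp * theta - kd * dtheta
        ddtheta = (g / l) * theta + u        # linearized dynamics plus control torque
        dtheta += ddtheta * dt
        theta += dtheta * dt
    return theta

print(abs(simulate_balance()) < 1e-3)        # the perturbed pendulum settles near upright
```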

2.6 Spatial awareness

Human-robot interaction in dance encompasses a variety of modes, each presenting distinct challenges and opportunities. A key distinction can be made between interactions that require real-time responsiveness and those that do not. For performances with minimal or no physical touch, such as those by Merritt Moore with a UR10e robot (Universal Robots, 2020) or Huang Yi with the industrial robot KUKA (Yi, 2017), the movements can be intricate and highly expressive. These performances often create the illusion of responsiveness, despite being pre-designed and static. In such cases, the lack of real-time interaction simplifies the computational demands but limits the dynamic exchange between human and robot. In contrast, partner dancing necessitates real-time responsiveness to a human partner's movements. Recent advances have incorporated tactile (Sun et al., 2024) and haptic sensors (Kobayashi et al., 2022) to enable robots to detect and respond to proxemic cues in partner dance scenarios. For human-robot partner dancing to succeed, the robot's movements must be both predictable and highly responsive to the human's actions. Responsiveness is not limited to partner dance; it can also be applied to other contexts, such as using human movement to control robot swarms (St-Onge et al., 2019).
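
As a minimal sketch of how spatial information might modulate motion (the zone radii are illustrative assumptions, not calibrated values from the cited works), the distance to a partner can scale the amplitude of the robot's movement:

```python
import numpy as np

def proxemic_modulation(partner_pos, robot_pos, base_amplitude=1.0,
                        personal=0.6, social=1.8):
    """Scale movement amplitude by the partner's distance: restrained inside the
    personal zone, full amplitude beyond the social zone, interpolated between.
    Zone radii are given in meters and are purely illustrative."""
    d = np.linalg.norm(np.asarray(partner_pos) - np.asarray(robot_pos))
    scale = np.clip((d - personal) / (social - personal), 0.2, 1.0)
    return base_amplitude * scale

print(proxemic_modulation([1.0, 0.0], [0.0, 0.0]))   # partner 1 m away -> reduced amplitude
```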

2.7 Improvisation

Improvisation, the ability to respond spontaneously without a predefined plan, is an important part of many creative practices. During training and the ideation stage of choreographic development, for example, improvisation often plays a central role. In these contexts, a robot's ability to perform unexpected and surprising movements may be favorable compared to more predictable movement patterns. In order to achieve more surprising and interesting interactions, previous work has leveraged the power of generative AI. AI-generated outputs, often characterized by their unconventionality or absurdity, have been explored across various creative domains, including text-based prompts for writing (Singh et al., 2022) and the ideation phase of dance composition (Wallace et al., 2024), and could play an important role in developing robot dancers that can improvise. In the project AI_am (Berman and James, 2015), a dancer interacts with an AI in real time, experiencing its “weird grace, punctuated by a glitchy flow.” This was achieved by sampling trajectories through a low-dimensional “pose map” in an interactive but partially randomized way. Similarly, Carlson et al. (2016) used evolutionary algorithms to mutate keyframes in choreography, producing poses that were unconventional and, at times, impossible for dancers to replicate directly. The unpredictable nature of these systems enriched the dialogue between the dancer and the AI, broadening the dancer's movement vocabulary.
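
In the same spirit as the partially randomized pose-map sampling described above (a generic sketch, not the AI_am system), a random walk through a learned latent space, decoded back into poses, drifts between familiar and unfamiliar movement:

```python
import numpy as np

def improvise(decoder, z_dim=8, steps=200, step_size=0.15, pull=0.02, rng=None):
    """Sample a partially randomized trajectory through a latent pose space.

    Each step adds Gaussian noise (exploration) plus a weak pull back toward
    the origin of the latent space (keeping poses near the training manifold),
    then decodes the latent point into a pose."""
    rng = rng or np.random.default_rng(0)
    z = np.zeros(z_dim)
    poses = []
    for _ in range(steps):
        z = z + step_size * rng.standard_normal(z_dim) - pull * z
        poses.append(decoder(z))
    return np.array(poses)

# Stand-in decoder: a fixed random linear map from latent space to a 20-D pose vector.
W = np.random.default_rng(1).standard_normal((20, 8))
sequence = improvise(lambda z: W @ z)
print(sequence.shape)   # (200, 20)
```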

3 Discussion

How do we imagine a future robot dancer? We might imagine a human dancer sharing a stage with a robot partner. Its body may not be human, but the way it moves and reacts to the human dancer is. When the dancer changes their use of space on the stage, the robot accommodates the dancer by moving closer or farther away. Without mirroring, the robot adjusts the speed and intensity of its movements to complement what the dancer is doing. Without words or designated signals, the robot is capable of interpreting the dancer's movements on-the-fly. The robot can determine the tempo, emotion, ethnographic roots and genre of the dance and any musical accompaniment, and use these facets to take part in shaping the story of the performance.

By combining the disparate abilities involved in dance under the rubric of expressiveness and responsiveness, the life-like behavior described above can be pursued in a systematic way. Quantitative measures that capture aspects of expressive movement can be derived based on Laban movement analysis (Knight and Simmons, 2014; Burton et al., 2016) or animation principles (Yoshida et al., 2022). Entropy measures can then capture the overall utilization of the affective spaces that span a range of qualities or a learned set of motion primitives (Xiao et al., 2024). Due to the subjective nature of expressiveness, these should be combined with surveys following performance and interaction studies. Metrics that capture responsiveness include Granger causality for low-dimensional data such as agent position (Chang et al., 2017) and Markov-based conditional independence for higher-dimensional data such as motion capture (Castri et al., 2024). It is likely that these methods would need to be complemented with measures of entrainment, such as mean asynchrony (Aschersleben, 2002), as synchronized repetitive motion will eliminate the temporal differences used to infer causality.
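
For instance, mean asynchrony can be computed directly from event times; the sketch below pairs each robot movement onset with the nearest beat onset, which is one of several possible conventions:

```python
import numpy as np

def mean_asynchrony(robot_onsets, beat_onsets):
    """Mean signed asynchrony (seconds): for each robot movement onset, find the
    nearest beat onset and average the signed differences. Negative values
    indicate anticipation, positive values indicate lag."""
    robot = np.asarray(robot_onsets)
    beats = np.asarray(beat_onsets)
    nearest = beats[np.argmin(np.abs(robot[:, None] - beats[None, :]), axis=1)]
    return float(np.mean(robot - nearest))

# Example: robot onsets that consistently trail a 120 BPM beat grid.
beats = np.arange(0.0, 10.0, 0.5)
robot = beats + 0.03 + 0.01 * np.random.default_rng(0).standard_normal(len(beats))
print(mean_asynchrony(robot, beats))   # roughly +0.03 s (lagging the beat)
```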

In Table 1, we summarize the abilities and methods reviewed in Section 2. For each ability, we have listed the most promising methods and whether they require online or offline training/optimization (if relevant). Offline training reduces the computational load at runtime, whereas online learning allows the robot to learn from its environment and interactions, avoiding “reality gaps” (Collins, 2022). This is therefore an important consideration when combining abilities. We have also specified whether each ability is expected to increase or decrease expressiveness and responsiveness, or whether this depends on context or method (labeled “either”).

Table 1. Desired abilities for dancing robots, relevant methods, and their impact on responsiveness and expressiveness.

Although expressiveness and responsiveness are not mutually exclusive, care needs to be taken to maintain a balance between them. For example, stabilizing feedback will make a robot more reactive to sensory input and less likely to produce extreme motions. Conversely, sequence generation on its own will plan an expressive series of movements that are not conditional on the environment.

A combination of abilities is clearly required to allow a high degree of both qualities. In order to make these compatible, it is necessary to identify shared computational methods and principles. Here, we outline some selected approaches that we believe could be relevant in an integrated design approach:

Latent spaces: these are low-dimensional representations that capture variation or structure in data, well suited to the high dimensional spatiotemporal data encountered in the study of movement. As such, they have been used for learning movement primitives (Merel et al., 2019b), imitation (Masumori et al., 2021) and improvisation (Berman and James, 2015).

Hierarchical control: this divides the control problem into low-level movement execution aided by proprioceptive feedback and high-level movement selection (Merel et al., 2019a). While this idea is implicit in movement primitives, it has also been fruitfully applied to entrainment (Mörtl et al., 2014). Although end-to-end control approaches with deep neural networks have made tremendous progress in recent years, they would be insufficient for the long-term expressive motion planning required in a dance context. A hierarchical approach would enable, for example, entrainment and sequence generation to be executed at the level of discrete motions, with expressivity and stabilization delegated to a continuous control scheme.

Online adaptivity: a fully improvisational and interactive dancing robot would need to monitor its performance at runtime, and adaptively manage tradeoffs between e.g. responsive and expressive behavior, or improvisation and precision. In this context, an architecture considering computational self-awareness could be a relevant framework in the design process (Lewis et al., 2015). Since it is difficult to optimize several abilities simultaneously, it would also be useful to include some modules pre-trained offline, leaving continuous learning for modules where this is most important.
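
To illustrate how these approaches might fit together (purely schematic; the module interfaces and stand-in functions are assumptions, not a proposed architecture), a hierarchical loop could select discrete primitives at a low rate while a continuous low-level controller runs at a higher rate, with an adaptive weight trading responsiveness against expressiveness:

```python
import numpy as np

def dance_loop(select_primitive, low_level_step, sense, steps=3000,
               high_level_period=100, dt=0.005):
    """Schematic hierarchical control loop for a dancing robot.

    Every `high_level_period` ticks a high-level policy selects the next
    movement primitive from sensory context; every tick a low-level controller
    tracks it, while an adaptive blend weight trades off the expressive
    (planned) against the responsive (feedback-driven) command."""
    primitive, blend = None, 0.5
    commands = []
    for t in range(steps):
        obs = sense(t * dt)                                   # music, partner, proprioception
        if t % high_level_period == 0:
            primitive = select_primitive(obs)                 # discrete, low-rate decision
        planned, reactive = low_level_step(primitive, obs)    # continuous, high-rate control
        blend = np.clip(blend + 0.5 * (obs["surprise"] - 0.5) * dt, 0.0, 1.0)
        commands.append((1.0 - blend) * planned + blend * reactive)
    return np.array(commands)

# Hypothetical stand-in modules, included only to make the skeleton executable.
rng = np.random.default_rng(0)
sense = lambda t: {"beat_phase": (t * 2.0) % 1.0, "surprise": rng.uniform()}
select_primitive = lambda obs: int(obs["beat_phase"] * 4)                     # pick 1 of 4 primitives
low_level_step = lambda p, obs: (np.sin(2 * np.pi * obs["beat_phase"] + p),   # planned command
                                 -0.5 * obs["beat_phase"])                    # reactive correction
print(dance_loop(select_primitive, low_level_step, sense).shape)
```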

To achieve the goal of a fully capable robotic dancer, we also identify some important challenges to be considered:

Morphology dependence: given a common framework for movement, can it be applied to a biped, quadruped, hexapod or other body plan? The field of evolutionary robotics has devised several approaches to adapt controllers to newly generated morphologies (Le Goff et al., 2022). Likewise, in reinforcement learning there have recently been significant advances in morphology-agnostic control architectures (Bohlinger et al., 2024; Doshi et al., 2024). However, these are generally applied to locomotion or manipulation. Additionally, since the robots we imagine will need to be trained primarily on datasets of human dance, and interact with human partners, finding a morphology-agnostic way to map gestures in tasks such as retargeting remains an important challenge. A promising future source of data in this case is from dancers exploring the affordances of non-anthropomorphic structures (Gemeinboeck and Saunders, 2017).

Latency: responsive motion, in particular, requires fast motors and rapid computation. Although hierarchical control allows for a repertoire of pre-computed low-level actions, there is very little room to spare when including communication and motor latency. For this reason, sensory feedback loops need to be made as short as possible. Castillo et al. (2021) provide one such architecture that achieves this, with deep reinforcement learning and low level kinematic modules operating asynchronously at different sampling rates. While low-cost platforms can be made rapidly responsive (Michalowski et al., 2007), expressiveness may be limited due to lack of computational power.

Context-aware motion planning: the responsiveness quality that we have considered so far is purely reactive—a robot that responds to some stimulus need only alter its current movement to be considered responsive. However, to smoothly deal with spatial constraints and aesthetic context, a competent human dancer would also adjust their plans, signaling of intention, and predictions of others' intentions (Kobayashi et al., 2022). The future robot dancer that we imagined would require a context-rich state variable to allow flexibility in where and how it moves ahead in time.

With these obstacles overcome, and with the outlined abilities combined under a common framework, it is possible that a robot can dance both expressively and responsively, untethered to human bodily constraints. It has already been found that AI-generated movement that would be unrealistic for humans can facilitate creativity in dancers (Wallace et al., 2024). In human-robot collaborative dance, we predict that by “closing the loop” with responsiveness, complex feedback can lead to more and more surprising movement, outside the bounds of any training data.

Dance is an art form that expands the boundaries of how we move and perceive motion, and robots are designed, above all, to move. As such, the emerging field of robotic dance has much to offer both pursuits (LaViers, 2024). We have reviewed several sensorimotor abilities that, if integrated in a single robotic system, can greatly expand both its expressiveness and responsiveness. Progressing toward this goal can expand the use of robots as creative tools, advance understanding of the human sensorimotor system, and stimulate future work in adaptive control and social robotics.

Author contributions

BW: Conceptualization, Writing – original draft, Writing – review & editing. KG: Supervision, Writing – original draft, Writing – review & editing. AS: Conceptualization, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was partially supported by the Research Council of Norway through its Centers of Excellence scheme, project number 262762.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

References

Alemi, O., and Pasquier, P. (2017). “Walknet: a neural-network-based interactive walking controller,” in International Conference on Intelligent Virtual Agents (Berlin: Springer), 15–24. doi: 10.1007/978-3-319-67401-8_2

Aschersleben, G. (2002). Temporal control of movements in sensorimotor synchronization. Brain Cogn. 48, 66–79. doi: 10.1006/brcg.2001.1304

Aucouturier, J.-J., Ikeuchi, K., Hirukawa, H., Nakaoka, S., Shiratori, T., Kudoh, S., et al. (2008). Cheek to chip: dancing robots and AI's future. IEEE Intell. Syst. 23, 74–84. doi: 10.1109/MIS.2008.22

Berman, A., and James, V. (2015). “Kinetic dialogues: enhancing creativity in dance,” in Proceedings of the 2nd International Workshop on Movement and Computing (New York, NY: ACM), 80–83. doi: 10.1145/2790994.2791018

Bohlinger, N., Czechmanowski, G., Krupka, M. P., Kicki, P., Walas, K., Peters, J., et al. (2024). “One policy to run them all: an end-to-end learning approach to multi-embodiment locomotion,” in 8th Annual Conference on Robot Learning (Munich).

Boukheddimi, M., Harnack, D., Kumar, S., Kumar, R., Vyas, S., Arriaga, O., et al. (2022). “Robot dance generation with music based trajectory optimization,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Kyoto: IEEE), 3069–3076. doi: 10.1109/IROS47612.2022.9981462

Bregler, C. (1997). “Learning and recognizing human dynamics in video sequences,” in Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR'97) (San Juan, PR: IEEE), 568–574. doi: 10.1109/CVPR.1997.609382

Buchli, J., Iida, F., and Ijspeert, A. J. (2006). “Finding resonance: adaptive frequency oscillators for dynamic legged locomotion,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (Beijing: IEEE), 3903–3909. doi: 10.1109/IROS.2006.281802

Burton, S. J., Samadani, A.-A., Gorbet, R., and Kulić, D. (2016). “Laban movement analysis and affective movement generation for robots and other near-living creatures,” in Dance Notations and Robot Motion, eds. J. P. Laumond, and N. Abe (Cham: Springer), 25–48. doi: 10.1007/978-3-319-25739-6_2

Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., and Sheikh, Y. (2021). Openpose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43, 172–186. doi: 10.1109/TPAMI.2019.2929257

Carlson, K., Pasquier, P., Tsang, H. H., Phillips, J., Schiphorst, T., Calvert, T., et al. (2016). “Cochoreo: a generative feature in idanceforms for creating novel keyframe animation for choreography,” in Proceedings of the Seventh International Conference on Computational Creativity (Paris), 380–387.

Castillo, G. A., Weng, B., Zhang, W., and Hereid, A. (2021). “Robust feedback motion policy design using reinforcement learning on a 3d digit bipedal robot,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Prague: IEEE), 5136–5143. doi: 10.1109/IROS51168.2021.9636467

Castri, L., Beraldo, G., Mghames, S., Hanheide, M., and Bellotto, N. (2024). “Experimental evaluation of ros-causal in real-world human-robot spatial interaction scenarios,” in 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN) (Pasadena, CA: IEEE), 1603–1609. doi: 10.1109/RO-MAN60168.2024.10731290

Chang, A., Livingstone, S. R., Bosnyak, D. J., and Trainor, L. J. (2017). Body sway reflects leadership in joint music performance. Proc. Natl. Acad. Sci. 114, E4134–E4141. doi: 10.1073/pnas.1617657114

Collins, J. T. (2022). Simulation to Reality and Back: A Robot's Guide to Crossing the Reality Gap (PhD thesis). Brisbane, QLD: Queensland University of Technology.

Cui, H., Maguire, C., and LaViers, A. (2019). Laban-inspired task-constrained variable motion generation on expressive aerial robots. Robotics 8:24. doi: 10.3390/robotics8020024

Dallard, A., Benallegue, M., Kanehiro, F., and Kheddar, A. (2023). Synchronized human-humanoid motion imitation. IEEE Robot. Autom. Lett. 8, 4155–4162. doi: 10.1109/LRA.2023.3280807

Doshi, R., Walke, H. R., Mees, O., Dasari, S., and Levine, S. (2024). “Scaling cross-embodied learning: one policy for manipulation, navigation, locomotion and aviation,” in 8th Annual Conference on Robot Learning (Munich).

Gemeinboeck, P., and Saunders, R. (2017). “Movement matters: how a robot becomes body,” in Proceedings of the 4th International Conference on Movement Computing (New York, NY: ACM), 1–8. doi: 10.1145/3077981.3078035

Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735

Hoffman, E. M., Tsagarakis, N. G., and Mouret, J.-B. (2018). “Robust real-time whole-body motion retargeting from human to humanoid,” in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids) (Beijing: IEEE), 425–432.

Ijspeert, A. J. (2008). Central pattern generators for locomotion control in animals and robots: a review. Neural Netw. 21, 642–653. doi: 10.1016/j.neunet.2008.03.014

Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P., and Schaal, S. (2013). Dynamical movement primitives: learning attractor models for motor behaviors. Neural Comput. 25, 328–373. doi: 10.1162/NECO_a_00393

Ijspeert, A. J., Nakanishi, J., and Schaal, S. (2001). “Trajectory formation for imitation with nonlinear dynamical systems,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the Next Millennium (Cat. No. 01CH37180), Volume 2 (Maui, HI: IEEE), 752–757. doi: 10.1109/IROS.2001.976259

Izhikevich, E. M. (2007). Dynamical Systems in Neuroscience. Cambridge, MA: MIT Press. doi: 10.7551/mitpress/2526.001.0001

Javed, H., and Park, C. H. (2022). Promoting social engagement with a multi-role dancing robot for in-home autism care. Front. Robot. AI 9:880691. doi: 10.3389/frobt.2022.880691

Knight, H., and Simmons, R. (2014). “Expressive motion with x, y and theta: Laban effort features for mobile robots,” in The 23rd IEEE International Symposium on Robot and Human Interactive Communication (Edinburgh: IEEE), 267–273. doi: 10.1109/ROMAN.2014.6926264

Kobayashi, T., Dean-Leon, E., Guadarrama-Olvera, J. R., Bergner, F., and Cheng, G. (2022). Whole-body multicontact haptic human-humanoid interaction based on leader-follower switching: a robot dance of the “box step”. Adv. Intell. Syst. 4:2100038. doi: 10.1002/aisy.202100038

Koelsch, S., Vuust, P., and Friston, K. (2019). Predictive processes and the peculiar case of music. Trends Cogn. Sci. 23, 63–77. doi: 10.1016/j.tics.2018.10.006

Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A. (2012). Robot learning from demonstration by constructing skill trees. Int. J. Robot. Res. 31, 360–375. doi: 10.1177/0278364911428653

Kroma, A., and Mazalek, A. (2021). “Interfacing & embodiment: ‘baby tango’ dancing robot attempts to communicate,” in Proceedings of the 13th Conference on Creativity and Cognition (New York, NY: ACM), 1–5. doi: 10.1145/3450741.3466633

Kuindersma, S., Deits, R., Fallon, M., Valenzuela, A., Dai, H., Permenter, F., et al. (2016). Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Auton. Robots 40, 429–455. doi: 10.1007/s10514-015-9479-3

Large, E. W., Herrera, J. A., and Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Front. Syst. Neurosci. 9:159. doi: 10.3389/fnsys.2015.00159

LaViers, A. (2024). Robots and dance: a promising young alchemy. Annu. Rev. Control Robot. Auton. Syst. 8, 323–350. doi: 10.1146/annurev-control-060923-100542

Le Goff, L. K., Buchanan, E., Hart, E., Eiben, A. E., Li, W., De Carlo, M., et al. (2022). Morpho evolution with learning using a controller archive as an inheritance mechanism. IEEE Trans. Cogn. Dev. Syst. 15, 507–517. doi: 10.1109/TCDS.2022.3148543

Lee, J., and Marsella, S. (2006). “Nonverbal behavior generator for embodied conversational agents,” in Proceedings of the 6th International Conference on Intelligent Virtual Agents, IVA'06 (Berlin: Springer-Verlag), 243–255. doi: 10.1007/11821830_20

Lewis, P. R., Chandra, A., Faniyi, F., Glette, K., Chen, T., Bahsoon, R., et al. (2015). Architectural aspects of self-aware and self-expressive computing systems: from psychology to engineering. Computer 48, 62–70. doi: 10.1109/MC.2015.235

Li, R., Yang, S., Ross, D. A., and Kanazawa, A. (2021). “AI choreographer: music conditioned 3d dance generation with aist++,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (Montreal, QC), 13401–13412. doi: 10.1109/ICCV48922.2021.01315

Lopes, M., Melo, F., Montesano, L., and Santos-Victor, J. (2010). Abstraction Levels for Robotic Imitation: Overview and Computational Approaches. Berlin: Springer Berlin Heidelberg, 313–355. doi: 10.1007/978-3-642-05181-4_14

Luo, J., Dong, P., Wu, J., Kumar, A., Geng, X., Levine, S., et al. (2023). “Action-quantized offline reinforcement learning for robotic skill learning,” in Conference on Robot Learning (PMLR), 1348–1361.

Maguire-Rosier, K., Abe, N., and Andreallo, F. (2024). What other movement is there?: rethinking human-robot interaction through the lens of dance performance. TDR 68, 87–103. doi: 10.1017/S1054204323000552

Masumori, A., Maruyama, N., and Ikegami, T. (2021). Personogenesis through imitating human behavior in a humanoid robot “alter3”. Front. Robot. AI 7:532375. doi: 10.3389/frobt.2020.532375

Merel, J., Botvinick, M., and Wayne, G. (2019a). Hierarchical motor control in mammals and machines. Nat. Commun. 10, 1–12. doi: 10.1038/s41467-019-13239-6

Merel, J., Hasenclever, L., Galashov, A., Ahuja, A., Pham, V., Wayne, G., et al. (2019b). “Neural probabilistic motor primitives for humanoid control,” in International Conference on Learning Representations (New Orleans, LA).

Michalowski, M. P., Sabanovic, S., and Kozima, H. (2007). “A dancing robot for rhythmic social interaction,” in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (New York, NY: ACM), 89–96. doi: 10.1145/1228716.1228729

Mörtl, A., Lorenz, T., and Hirche, S. (2014). Rhythm patterns interaction-synchronization behavior for human-robot joint action. PLoS ONE 9:e95195. doi: 10.1371/journal.pone.0095195

Osorio, P., Sagawa, R., Abe, N., and Venture, G. (2024). A generative model to embed human expressivity into robot motions. Sensors 24:569. doi: 10.3390/s24020569

Park, K. M., Cheon, J., and Yim, S. (2024). A minimally designed audio-animatronic robot. IEEE Trans. Robot. 40, 3181–3198. doi: 10.1109/TRO.2024.3410467

Pettee, M., Shimmin, C., Duhaime, D., and Vidrin, I. (2019). “Beyond imitation: generative and variational choreography via machine learning,” in 10th International Conference on Computational Creativity (Charlotte, NC).

Rakita, D., Mutlu, B., and Gleicher, M. (2017). “A motion retargeting method for effective mimicry-based teleoperation of robot arms,” in Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (New York, NY: ACM), 361–370. doi: 10.1145/2909824.3020254

Ramos, O. E., Mansard, N., Stasse, O., Benazeth, C., Hak, S., Saab, L., et al. (2015). Dancing humanoid robots: systematic use of osid to compute dynamically consistent movements following a motion capture pattern. IEEE Robot. Autom. Magaz. 22, 16–26. doi: 10.1109/MRA.2015.2415048

Ravichandar, H., Polydoros, A. S., Chernova, S., and Billard, A. (2020). Recent advances in robot learning from demonstration. Ann. Rev. Control Robot. Auton. Syst. 3, 297–330. doi: 10.1146/annurev-control-100819-063206

Schiphorst, T. (2013). “Merce cunningham: making dances with the computer,” in Merce Cunningham, ed. D. Vaughan (London: Routledge), 79–98.

Simon, H. A. (1996). The Sciences of the Artificial: Third Edition, Volume 1. Cambridge, MA: The MIT Press.

Singh, N., Bernal, G., Savchenko, D., and Glassman, E. L. (2022). “Where to hide a stolen elephant: leaps in creative writing with multimodal machine intelligence,” in ACM Transactions on Computer-Human Interaction (New York, NY). doi: 10.18653/v1/2022.in2writing-1.3

Siyao, L., Yu, W., Gu, T., Lin, C., Wang, Q., Qian, C., et al. (2022). “Bailando: 3d dance generation by actor-critic gpt with choreographic memory,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (New Orleans, LA: IEEE), 11050–11059. doi: 10.1109/CVPR52688.2022.01077

Stoeva, D., Kriegler, A., and Gelautz, M. (2024). Body movement mirroring and synchrony in human-robot interaction. ACM Trans. Hum.-Robot Interact. 13, 1–26. doi: 10.1145/3682074

St-Onge, D., Côté-Allard, U., Glette, K., Gosselin, B., and Beltrame, G. (2019). Engaging with robotic swarms: commands from expressive motion. ACM Trans. Hum.-Robot Interact. 8, 1–26. doi: 10.1145/3323213

Sun, Y., Xiao, C., Chen, L., Chen, L., Lu, H., Wang, Y., et al. (2024). Beyond end-effector: utilizing high-resolution tactile signals for physical human-robot interaction. IEEE Trans. Ind. Electron. 72, 5022–5031. doi: 10.1109/TIE.2024.3458180

Szorkovszky, A., Veenstra, F., and Glette, K. (2023a). Central pattern generators evolved for real-time adaptation to rhythmic stimuli. Bioinspir. Biomim. 18:046020. doi: 10.1088/1748-3190/ace017

Szorkovszky, A., Veenstra, F., and Glette, K. (2023b). From real-time adaptation to social learning in robot ecosystems. Front. Robot. AI 10:1232708. doi: 10.3389/frobt.2023.1232708

Thörn, O., Knudsen, P., and Saffiotti, A. (2020). “Human-robot artistic co-creation: a study in improvised robot dance,” in 2020 29th IEEE International conference on robot and human interactive communication (RO-MAN) (Naples: IEEE), 845–850.

Troughton, I. A., Baraka, K., Hindriks, K., and Bleeker, M. (2022). “Robotic improvisers: rule-based improvisation and emergent behaviour in HRI,” in 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI) (IEEE), 561–569. doi: 10.1109/HRI53351.2022.9889624

Universal Robots (2020). Dancing Through the Pandemic: How a Quantum Physicist Taught a Cobot to Dance. Available online at: https://www.universal-robots.com/blog/dancing-through-the-pandemic-how-a-quantum-physicist-taught-a-cobot-to-dance (accessed May 25, 2025).

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in Neural Information Processing Systems (Long Beach, CA), 5998–6008.

Wallace, B., Martin, C., Tørresen, J., and Nymoen, K. (2021). “Exploring the effect of sampling strategy on movement generation with generative neural networks,” in International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar), eds. J. Romero, T. Martins, and N. Rodríguez-Fernández (New York, NY: Springer), 344–359. doi: 10.1007/978-3-030-72914-1_23

Wallace, B., Nymoen, K., Torresen, J., and Martin, C. P. (2024). Breaking from realism: exploring the potential of glitch in ai-generated dance. Digit. Creat. 35, 125–142. doi: 10.1080/14626268.2024.2327006

Welch, G., and Foxlin, E. (2002). Motion tracking: no silver bullet, but a respectable arsenal. IEEE Comput. Graph. Appl. 22, 24–38. doi: 10.1109/MCG.2002.1046626

Witek, M. A. (2017). Filling in: syncopation, pleasure and distributed embodiment in groove. Music Anal. 36, 138–160. doi: 10.1111/musa.12082

Xiao, Y., Shu, K., Zhang, H., Yin, B., Cheang, W. S., Wang, H., et al. (2024). “Eggesture: entropy-guided vector quantized variational autoencoder for co-speech gesture generation,” in Proceedings of the 32nd ACM International Conference on Multimedia (New York, NY: ACM), 6113–6122. doi: 10.1145/3664647.3681392

Xu, W., Chen, J., Lau, H. Y., and Ren, H. (2017). Data-driven methods towards learning the highly nonlinear inverse kinematics of tendon-driven surgical manipulators. Int. J. Med. Robot. Comput. Assist. Surg. 13:e1774. doi: 10.1002/rcs.1774

Yi, H. (2017). A Human-Robot Dance Duet. Available online at: https://www.ted.com/talks/huang_yi_kuka_a_human_robot_dance_duet (accessed May 24, 2025).

Yoshida, N., Yonemura, S., Emoto, M., Kawai, K., Numaguchi, N., Nakazato, H., et al. (2022). Production of character animation in a home robot: a case study of lovot. Int. J. Soc. Robot. 14, 39–54. doi: 10.1007/s12369-021-00746-0

Keywords: dance, human-robot interaction, expressive movement, sensory feedback, generative AI, robot control

Citation: Wallace B, Glette K and Szorkovszky A (2025) How can we make robot dance expressive and responsive? A survey of methods and future directions. Front. Comput. Sci. 7:1575667. doi: 10.3389/fcomp.2025.1575667

Received: 12 February 2025; Accepted: 15 May 2025;
Published: 03 June 2025.

Edited by:

Anna Xambó, Queen Mary University of London, United Kingdom

Reviewed by:

Liu Yang, Sultan Idris University of Education, Malaysia

Copyright © 2025 Wallace, Glette and Szorkovszky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Benedikte Wallace, benediwa@ifi.uio.no