Of altered instrumental relations: a practice-led inquiry into agency through musical performance with neural audio synthesis and violin

Stefánsdóttir, Halla Steinunn; Magnusson, Thor

doi:10.3389/fcomp.2025.1578595

ORIGINAL RESEARCH article

Front. Comput. Sci., 21 October 2025

Sec. Human-Media Interaction

Volume 7 - 2025 | https://doi.org/10.3389/fcomp.2025.1578595

This article is part of the Research Topic Embodied Perspectives on Sound and Music AI


Halla Steinunn Stefánsdóttir*

Thor Magnusson

Intelligent Instruments Lab, School of Humanities, University of Iceland, Reykjavík, Iceland

Recent developments in artificial intelligence (AI) are rapidly generating new musical practices. While the use of generative AI in producing music is increasingly well known, intelligent algorithms are also being incorporated directly into musical instruments. Often based on small, personal artistic datasets, these systems augment computational agency in ways that alter the perception of the human performer and transform the performer–instrument relationship. Such developments raise questions about co-creativity, instrumental materiality, augmentation through code, and how musical expressivity and communication materialise in performance with AI. This article reports on research conducted through artistic experimentation and live performance. The project involved the design of an “intelligent violin” and proceeded in four phases: (1) curating datasets, (2) training a neural audio synthesis model, (3) working with the model in practice and live performance, and (4) analysing the artistic outcomes. Documentation and analysis of the artistic process provided the basis for identifying emergent creative and phenomenological relationships between performer and instrument. The findings reveal how algorithmic augmentation reshapes the agencies at play in performance and transforms both the affordances and the sociality of the creative encounter. The intelligent violin altered performer perception, shifting the dynamics of control, responsibility, and co-creativity. The research further documents how these processes affected musical expressivity and performer–instrument communication.

1 Introduction

Advancements in artificial intelligence (AI) are rapidly impacting musical practice, including composition, production, performance, and consumption. A new, augmented computational agency applied to the arts has resulted in a new machinic language (Maeda, 2019) and a new framework for thinking. This article addresses these altered technological and sociomaterial relations by focusing on the effects and affects of co-creative AI in music. Such innovations exemplify the contemporary shift in focus from the “composition of works” to the “invention of systems” (Magnusson, 2019), which redefines our notions of authorship and the ontology of musical works (Gioti et al., 2023). Importantly, these machinic systems also alter the perception of the human performer and result in different performer-instrument relations and performance practices (Fiebrink and Sonami, 2020; Magnusson, 2019; Vigliensoni and Fiebrink, 2025).

Musical work with AI has already raised diverse questions about co-creativity (Dahlstedt, 2021; McCormack and d’Inverno, 2012; Du Sautoy, 2019), prompted critical thinking regarding instrumental materiality and its augmentation through code, and addressed how musical expressivity and communication are mediated in performance (Caramiaux and Donnarumma, 2021; Paz and Knotts, 2022; Privato and Magnusson, 2024). However, there is still limited research on the phenomenological experience of so-called intelligent instruments. A focus on the phenomenological relationships of musical creativity has provided a valuable entry into music research (Schaeffer, 2017; De Souza, 2017). Such findings have revealed not only embodied perspectives but also the roles that technologies and environments play in phenomenological relationships (Stefánsdóttir and Östersjö, 2022; Stefánsdóttir and Franzson, 2025).

This article presents an investigation of the phenomenological experience of instruments mediated by AI through the artistic experimentation of one of the authors, musician, composer, and curator Halla Steinunn Stefánsdóttir. Designed as an extended encounter with agential instruments that involved the creation of an “intelligent violin,” the study aims to explore the new phenomenological and creative relationships that emerge when a musician rediscovers their instrument through algorithmic augmentation. The study also seeks to better understand the role of data curation in developing AI-augmented instruments. In this research we ask: (a) what is the role of data curation when prototyping AI-mediated musical instruments? (b) how does algorithmic augmentation transform a musician’s relationship with their instrument? and (c) what modes of listening and performance emerge through the interaction with AI-augmented instruments?

This practice-led research aligns with a broader call for contributions to artistic research that complement the predominantly technoscientific approach to research in music and AI (Gioti et al., 2023). AI music studies are emerging as an important field (Sturm et al., 2024) that is highly relevant at this point in our musical history, but its focus has primarily been on the automatic generation of music. The current article contributes insights into creative AI as part of real-time musical performance and its integration into physical musical instruments.

Documenting the current artistic research provides the opportunity to analyse all its stages, and in turn shift the focus towards the altered affordances and sociality of such work (Waters, 2021). The detailed documentation of the project development allowed us to address the field’s lack of attention to data curation, understood as materiality that is applied in the training of new neural audio synthesis models, which are subsequently applied to intelligent instruments. Because of the phenomenological approach taken, the article reports on the first author’s engagement with machine learning from the perspective of someone who is not a computer scientist. In doing so, the present study also acts as a direct response to recent calls for creative work with deep learning models to be more inclusive of users (Jourdan and Caramiaux, 2023).

2 Background

2.1 Exploring agency in co-creative AI

This project explores music-making with neural audio synthesis models as part of physical instrument functionality. This synthesis technique is becoming popular amongst instrument designers, most commonly through the RAVE technology (Caillon and Esling, 2021), in which large datasets of sounds are used to train a model that can then serve as a real-time sound engine.

Making intelligent instruments builds upon a lineage of interactive music technology systems (for further discussion, see Di Scipio, 1998; Impett, 2000; Lewis, 1999; Rowe, 1993). As Gioti et al. (2023) note, earlier systems share some interactive and generative similarities with recent machine learning systems. However, the use of neural networks introduces a mode of interaction in which systems are not pre-programmed from a top-down position but rather “trained” to behave as we want them to behave. Thus, rather than being built from symbolic algorithms, expert systems, evolutionary algorithms, or artificial life of various kinds, learning systems are trained on large sets of data, a process that often removes control from instrument designers and other musicians and opens up a new type of interaction. This observation is crucial to the present article, as this novel approach raises questions about the nature of these learning processes, the new types of agency that emerge, where responsibility lies, and the systems’ effects and expressive possibilities.

Currently, much of the discourse on music and AI centres on “prompting,” that is, mimetic processes aimed at producing musical works or artefacts (Dahlstedt, 2021). This focus is unsurprising given the corporate and technoscience-driven developments that have occurred since the advent of deep learning in 2012 (Caramiaux and Donnarumma, 2021; Magnusson, 2019), as well as the developments of the past few years involving services such as Udio and Suno. These developments have pushed for the incorporation of AI into all aspects of digital life, leaving little room for users to develop technological literacy (Caramiaux and Donnarumma, 2021). This partly explains why research into the co-creative nature of work with machine learning systems is still in its early stages and why there is much to explore regarding the alternative scenarios that surface when procedural and embodied learning engagement with creative AI becomes a more established practice. A new field of Music AI studies is emerging (Sturm et al., 2024). Of particular importance for the present article, however, is Magnusson’s (2019) observation that the invention of deep learning neural networks not only results in computers listening differently but also invites creators to consider which parameters to apply in the training process, ideally to shape the expressive models themselves.

In this work, we shift from an anthropocentric understanding of machines as either merely supporting tools or, indeed, creators in their own right to an understanding of technology as a co-creative agency in an emergent dialectic interplay. Such creative work can hardly be done without a certain degree of anthropomorphism, that is, projecting human attributes onto systems. This may, in turn, constrain the work, as noted by Henrik Frisk:

While this contributes to the sensation that the machine is intelligent in a human sense (which it is not), it also puts a limitation to what is possible from a system point of view. If it proves to be possible to create a machine system that can improvise interactively and creatively, why restrain the machine to behave like a human? (Frisk, 2020, p. 34).

Exploring the co-creative possibilities between humans, technologies, and environments is therefore bound to result not only in different phenomena but also in a different type of sociality (Waters, 2021). It would be limiting to remain within the realm of the anthropomorphic, as we are confronted here with another type of intelligence: a form of alien intelligence, that of neural network architectures combined with the physics of material instruments. This gives rise to a sociology of the human and the more-than-human, one of associations in which meaning and agency emerge from relational contexts (Latour, 2005). This sociology can be explored as an interdisciplinary undertaking through an artistic research lens, which brings us to the concept of the artist-researcher laboratory.

2.2 The artistic research laboratory

The model of the artistic and embodied laboratory (Östersjö, 2020; Stefánsdóttir, 2023) has gained momentum in recent years as artistic research has solidified its methodologies. This model shares similarities with scientific laboratory work, which also involves situated and embodied procedural knowing (Latour and Woolgar, 1986) conducted through experimental systems in which material or epistemic processes form the basis of inquiry (Rheinberger, 1997). However, artistic laboratories typically broaden the scope of scientific experimental systems and their epistemic practices (Magnusson et al., 2024). They exist in a liminal space between science and artworlds (Becker, 1982), in which the artistic laboratory is reinvented each time, relying on specific practices and contexts (Stefánsdóttir, 2023) and applying new bespoke methodologies relevant to the task.

The extended instrumental encounter (Armitage et al., 2022) presented in this paper unfolded within the framework of the Intelligent Instruments Lab, a research lab devoted to designing instruments embedded with creative AI. The lab provides resources, networks, and opportunities for transdisciplinary or interdisciplinary approaches to, for example, musical instruments, which Magnusson and other members of the laboratory frame as “boundary objects” (Leigh Star and Griesemer, 1989) through which participants may conduct musical, technical, and social experimentation (Magnusson et al., 2024). This protocol breaks with much of the HCI tradition of conducting experiments in controlled contexts with the aim of data reproducibility, an approach that creates an environment in which such reproducible findings are seen as meaningful (Caramiaux and Donnarumma, 2021). By distancing itself from the concept of technical experiments, the lab aims to further understand the distributed practices of musical HCI from the perspective of sociality (Waters, 2021), understood here as processes of mediation that include humans, technologies, and environments. As a result, the artistic HCI laboratory may yield activity in which what is seen as meaningful differs from what counts as meaningful in traditionally steered arts practices and in hardware and software engineering.

2.3 Doing phenomenological laboratory research

For this project, the first author worked in the Intelligent Instruments Lab to design a new instrument, the intelligent violin. This was done in collaboration with lab members Nicola Privato and Victor Shepardson. All interactions and design sessions were recorded and documented on video.

The project began with initial discussions with lab members, exploring the Organium (Magnusson et al., 2024), which is a technical library for improvisatory design thinking, and discussing potential ways to extend the violin. Following these conversations, we tested several pre-trained models and found a promising direction. This led to a structured approach: selecting data, training models, and conducting tests—all of which formed the first round of our discussions and analysis of results. The project then progressed to practice sessions and showcase events. All lab sessions between Stefánsdóttir and other laboratory members were recorded on video and later analysed. The process included reflections on experiences as the project progressed and discussions and reflections among lab members.

A constant challenge in music research is the lack of a one-to-one relationship between music and verbal discourse (Stefánsdóttir and Östersjö, 2022). The method of stimulated recall offers a solution: an approach, made possible by recording technologies, that has gained prominence in music research over the last decade (Bloom, 1953; Stefánsdóttir and Östersjö, 2022). Significantly, this method of “doing phenomenology” functions both as an analytical tool and as a means of artistic creation, in which intersubjective knowledge emerges through repeated engagement and phenomenological variation (Stefánsdóttir, 2023; Stefánsdóttir and Östersjö, 2022). This may, in turn, help us further understand the mediated processes, including their ethics.

Stimulated recall was made possible through audio and video documentation of testing, practicing, and showcasing moments. This method enabled engagement with the playback from multiple perspectives, shifting between first-, second-, and third-person insights. Think-aloud protocols were integrated into initial testing through turn-taking during breaks. This method, while aimed at better understanding how such a performance unfolds, also became a way of sharing interests and establishing trust in the lab. Although both approaches were conceived as research methods, they also functioned in this project as artistic approaches.

2.4 Core concepts

In order to be precise in our application of the concepts below, we will contextualise some of the key notions here. Since this work opens up new experiences of instrumental agency, it is important to understand agency in this experiment as the result of “situated actions” (Suchman, 1987, 2007) through which creatives seek to develop their practice, distributed through culture, environment, and technology. Similar to Suchman, we assert that further understanding agency necessitates paying attention to the material particularities and unruly contingencies in encounters (Suchman, 2007). Furthermore, agency cannot be seen as something that exists only in the object itself or in the performer, but rather as a relational phenomenon that emerges in the interaction.

This article presents a twofold approach to phenomenology. As a way of “doing phenomenology,” stimulated recall (Bloom, 1953; Stefánsdóttir and Östersjö, 2022) was employed as a method of analysis but also as an artistic method. In addition, we agree with Tim Ingold’s point about how sensing forms “through a nascent world” (Ingold, 2011, p. 73). To further unpack such situations, we turn to Ihde (1990) and Verbeek’s (2008a) post-phenomenological theorising, which analyses the role of technological mediation to further understand our relations to the world. Such a material reading of human relations is complemented by Malafouris and Koukouti’s (2022) analysis of skilled creative material engagement.

In line with the preceding theories, we do not see co-creativity as reserved for humans. Rather, entities and technologies are co-constitutive and part of our posthuman sociality (Latour, 2005; Waters, 2021). Similarly, by focusing on sociality, we push back against the historical tendency within technoscience to privilege the technical over social engagement (Suchman, 2007). Importantly, such processes of “otherwise” becoming, when seen as ethics-in-movement, may give rise to discomfort and expose vulnerability, but may also challenge values, norms, and habits (Garrett et al., 2023).

3 Experimenting with intelligent instruments

This section details and analyses Stefánsdóttir’s experimentation with the creation of an intelligent violin performance platform. The experiment is an example of thinking-through-prototyping, wherein creators engage in an ongoing dialogue with various materials, for instance by sketching ideas on paper or moulding clay, to develop their products through continuous creative exploration (Xambó Sedo, 2015). This is an exploratory approach driven by bottom-up ways of working, wherein “loose ends” or “failures” may feed into renewed iterative cycles (da Rocha et al., 2022). Through the prototyping process, musicians develop their practice through culture, environment, and technology (Suchman, 1987). The analysis therefore attends not only to the mediation of perception and environment (Merleau-Ponty, 2012; Elo, 2018) but also to Ihde’s post-phenomenological stance that, through technological engagement, the human becomes a “body in technology” (Ihde, 2001). When applied to prototyping, this awareness may lead humans to actively share autonomy with machine systems as they search for “different” outcomes (Andersen et al., 2019).

However, the outcomes of machine learning build on blackboxed processes, wherein the training of a model, or an interface, is not fully explainable or indeed understandable through direct, explicit logic. This leads to experimental approaches in which humans and machines interact through what researchers call “tinkering” or “probing” (Tahıroğlu et al., 2020). We explore the new models through use, and for that, we need new interfaces, such as Stacco (Privato et al., 2024). In music, this process is guided by the becoming-curator, or curatorial, an agency present and at play within artistic emergent processes. The curatorial is reinvented through each situation and is driven by contextual sensitivity while negotiating material and immaterial formations (Stefánsdóttir, 2023). Curating is involved in how we record, select, and prepare datasets, train the models, explore the models, tune instrument sensor data for the models, decide upon the aesthetics of the work, and so on. The agency exists, therefore, at the crossroads of interdisciplinarity. The results may be conceptualised as “cultural probes” (Gaver et al., 1999; Tahıroğlu et al., 2020). In such real-time experiments, as noted by Tahıroğlu et al., something is extended into the world, be it as an intelligent violin performance platform, a code library, or a showcasing event, which opens up an opportunity to assess its reception. Below, the account of the extended encounter with the new instrumental systems will be given in the first person by the first author of the article, in an autobiographical reporting style, followed by an analysis in the final section.

3.1 Methodology

The work went through four phases. The first was experimentation resulting in the curation of seven datasets. In the second, training sessions were run in RAVE (Realtime Audio Variational autoEncoder) and in Autocoder (a real-time spectrum-based variational autoencoder) (Franzson et al., 2022). In the third phase, the models were applied in performance with a baroque violin, a Max/MSP patch, and a newly invented neural audio synthesis looper. This was done both in laboratory settings and for an audience at the concert venue Mengi in Reykjavík, Iceland.¹ The complete process was documented from start to finish, and the fourth phase analysed the process and the findings that emerged in this new, unusual technosocial context.

3.2 Phase 1—curating datasets

The new technology of neural audio synthesis operates with large datasets of recorded audio. Models are sometimes trained on many hours of data of different kinds and styles, and at other times on smaller datasets, for example, from one performer or one type of sound. The instruments created for this experiment used different neural audio synthesis and neural spectral audio synthesis models, each trained on a different dataset of sounds. Dataset curation aimed to explore a variety of expressions while promoting a personalised approach, which allowed for an awareness of how and why the archives were created, an awareness essential to a contextualised AI-training practice.

The following datasets of various lengths (recordings typically around 45 min to 1 h long) were used to create respective models:

1. Stems (discrete parts of the tracks) of my baroque violin music from the album project strengur.

2. Guitar improvisations by Victor Shepardson.

3. Free saxophone improvisations by Franziska Schroeder.

4. Speech and singing by my late great-grandmother, Halla Lovísa Loftsdóttir (1886–1975), from Ísmús, an Icelandic online music and culture archive.

5. Multiple sound sources from performances by myself (baroque violin), Privato (Stacco), Shepardson (Jazzmaster guitar), Miguel Angel Crozzoli (augmented saxophone), Nguyễn Thanh Thủy (Vietnamese zither/đàn tranh), and Stefan Östersjö (Vietnamese lute/đàn tỳ bà).

6. Sounds foregrounding technological agency that are normally perceived as “glitches,” from field recordings (Stefánsdóttir) and from latent space (Privato, Shepardson).

7. My field recordings of wind.

Looking at the datasets, the first three exemplify an approach to generative timbral training, with models trained on either a personal body of work or other instrumentalists’ work. The fourth dataset, heritage archival recordings of Loftsdóttir’s singing and speech, was material that I had previously used in composition (Stefánsdóttir, 2023). The fifth dataset was crafted to compare live improvisational sociality with the sociality emerging from performing a model trained on recordings of the same musicians. The sixth and seventh datasets were developed to emphasise non-human agencies, specifically technologies and wind phenomena.

These datasets were used to train models that then served as extensions to the instrument. The size of each dataset was decided on the advice of Privato and Shepardson. Furthermore, with the aim of producing a diverse model, the focus was on selecting a broad spectrum of sounds within each category. The editing involved several strategies. Through consultation with Shepardson and Privato, I learned that adding silences between edits was necessary to teach the model to recognise such transitions. For some datasets, such as those by Nguyễn Thanh Thủy and Stefan Östersjö, initial editing had already been completed during album production. For my own album dataset (no. 1), I chose to use dry stems rather than signal-processed ones, to avoid introducing additional computational or sound-processing agencies into the model training. Similarly, I selected wind recordings made inside cairns and stone fences to prevent wind distortion that would highlight the agency of the recording equipment.

For the glitch dataset, I deliberately included sounds that are typically removed from compositions, preserving their raw form without noise removal or signal processing. For Shepardson and Privato, the approach became not only a way to retrieve material from their machine learning archives but also a way to venture into the liminal space of latent sound. For dataset four, in order to keep the focus on the timbre of my great-grandmother’s voice, I edited out the voices of the anthropologists and refrained from using a noise-removal plugin, as I wanted the sonic signature of the original recording device to be included in the training. Shepardson employed a similar strategy when he edited out any talking by Schroeder in her studio recording, restricting the material to performance only. When the material was handed over to Privato and Shepardson, they checked for amplitude peaks and reduced them if necessary. They also kept compression relatively low, as the aim was to achieve an accurate reconstruction of the sounds.
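The two technical editing steps described above, inserting silences between edits and reducing amplitude peaks without heavy compression, can be sketched in plain Python. This is a minimal sketch, assuming each edited clip is already loaded as a list of mono float samples in the range [-1, 1]; the gap length and the peak ceiling are hypothetical values, not the settings Privato and Shepardson used, and reading or writing audio files is left out.

```python
def peak(samples):
    """Absolute peak amplitude of a clip (0.0 for an empty clip)."""
    return max((abs(s) for s in samples), default=0.0)


def reduce_peaks(samples, ceiling=0.9):
    """Scale a clip down only if its peak exceeds `ceiling`.

    This is plain gain reduction rather than compression, so the relative
    dynamics inside the clip are preserved. The 0.9 ceiling is an assumption.
    """
    p = peak(samples)
    if p <= ceiling:
        return list(samples)
    gain = ceiling / p
    return [s * gain for s in samples]


def assemble_dataset(clips, sample_rate=44100, gap_seconds=1.0, ceiling=0.9):
    """Concatenate edited clips into one training file with silence between them.

    The silent gap marks the boundary between units; its length is hypothetical.
    """
    gap = [0.0] * int(round(sample_rate * gap_seconds))
    out = []
    for i, clip in enumerate(clips):
        if i > 0:
            out.extend(gap)  # silence separating this unit from the previous one
        out.extend(reduce_peaks(clip, ceiling))
    return out


# Toy example: two short "clips" at a 4 Hz "sample rate" with a 0.5 s gap.
dataset = assemble_dataset([[0.5, -1.2, 0.3], [0.1, 0.2]], sample_rate=4, gap_seconds=0.5)
```

In the toy example, only the first clip exceeds the ceiling and is scaled down; the second is copied unchanged, and two samples of silence separate them.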

As a result, the mindset of curating and caring for datasets for neural audio synthesis models differs fundamentally from that of my other music production. It revolves around introducing silences, beyond the pauses between tracks or musical phrases, to separate units for training. It also invites consideration of the diversity of sources, of allowing sounds to remain unprocessed, and of gathering sounds as they are rather than creating an archive of pitches or techniques. Further considerations include removing sounds that might interfere with the overall ideation of the dataset curation, how datasets are trained, the size of the dataset, how many epochs are performed in the training of the model, and how the model is eventually tested.

3.3 Phase 2—training models

We trained neural audio synthesis models on each of the datasets described above. The first six models were trained by Privato and Shepardson in RAVE, and the seventh was trained by Davíð Brynjar Franzson through neural spectral audio synthesis.

As noted by Shepardson and Privato, the RAVE training was done on a dedicated computer with a GPU, through the Intelligent Instruments Lab’s server. In addition, Shepardson and Privato employed pre-existing models for transfer learning, meaning that they started with an existing model and re-trained it with a new dataset. This was a faster approach, although it might have left some traces of the original model. In this experiment, such traces, to which the generative interface in which the model was applied added its own, did not become a focal point for performative engagement. RAVE’s open-source training software was already installed on the lab server; configurations were made using a text editor, and the data preprocessing and training were then run via Python scripts. Progress could be monitored in an application called TensorBoard, which displays diagnostic graphs and audio samples from the model.
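Why transfer learning is faster, and why it may leave traces, can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden toy, not RAVE: a two-parameter linear model stands in for a neural network, the hypothetical datasets `SOURCE` and `TARGET` stand in for an earlier and a new dataset, and plain gradient descent stands in for the actual optimiser. Starting from the weights learned on `SOURCE` (a warm start) is compared with starting from zero under the same short training budget on `TARGET`.

```python
# Toy illustration of warm-start (transfer) training versus training from scratch.
# A two-parameter linear model y = w * x + b stands in for a neural network (assumption).

XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
SOURCE = [(x, 2.0 * x + 1.0) for x in XS]  # hypothetical "earlier" dataset
TARGET = [(x, 2.0 * x + 0.5) for x in XS]  # hypothetical "new" dataset


def mse(w, b, data):
    """Mean squared error of the linear model on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr=0.1):
    """Plain full-batch gradient descent on the mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "Pre-trained" model: trained to convergence on the source dataset.
pretrained = train(0.0, 0.0, SOURCE, steps=200)

# Same short training budget on the new dataset, from two starting points.
warm = train(*pretrained, TARGET, steps=5)
cold = train(0.0, 0.0, TARGET, steps=5)

print(f"warm start:   loss={mse(*warm, TARGET):.4f}, b={warm[1]:.3f}")
print(f"from scratch: loss={mse(*cold, TARGET):.4f}, b={cold[1]:.3f}")
```

After five steps the warm start fits `TARGET` far better, yet its intercept still lies between the source value of 1.0 and the target value of 0.5: a small numeric analogue of the residual traces of an original model.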

Training the first phase of a RAVE model takes approximately 1–2 days. In the second phase of training, the model usually starts to sound more interesting and realistic. Achieving a high-quality model often requires training it more, and for longer. For this particular project, the average training time for RAVE models was 5 days. There is usually no clear point at which training should end; rather, the model changes more and more slowly as the second phase of training continues. It is important to note here that although the focus is on timbral training with an emphasis on an authentic representation of the original sounds, the process also results in a “semantic depletion” (Privato and Magnusson, 2024), leaving the archive behind as residual waste.
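Since there is no clear point at which to stop, one hypothetical aid, used alongside the diagnostic graphs and audio samples in TensorBoard, is a simple plateau check on a logged loss curve. The sketch below assumes a list of loss values evaluated at regular intervals; the window size and the relative-improvement threshold are illustrative assumptions, and this is neither a procedure described by the authors nor a feature of RAVE.

```python
def has_plateaued(losses, window=5, min_rel_improvement=0.01):
    """Heuristic check on a logged loss curve.

    Compares the mean of the last `window` values with the mean of the `window`
    values before them and returns True when the relative improvement falls
    below `min_rel_improvement`. Both defaults are illustrative assumptions.
    """
    if window <= 0 or len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    scale = abs(previous) or 1.0  # normalise by magnitude; avoid dividing by zero
    return (previous - recent) / scale < min_rel_improvement
```

A steadily falling curve such as `1 / (t + 1)` is not flagged, while a flat curve is. In practice, listening to the model's audio output remains the deciding factor.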

As it exits training, the model is not only listened to, as we would in a recording session; it also has to be activated through performance engagement. Listening here becomes performing. Initial tests aim to unearth what the training produced, often by whistling, shouting, or singing into the model through an interactive interface, before proceeding to an instrument, unless the model is deemed to require further training. To test the models, we primarily apply two methods: running the models with audio input, as in Shepardson’s Living Looper software (Shepardson, 2024), or using gestural control parameters to navigate the multidimensional latent space of the model, as in Privato and Lepri’s Stacco (Privato et al., 2024). The materiality of such sounding produces varying responses, which will be detailed in later sections. The key point here is that the aesthetic signature of search, caused by the blackboxed training, continues to reverberate throughout the process, affecting the agencies involved.
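The second testing method, navigating the latent space with gestural control parameters, amounts to mapping controller values onto latent coordinates that a decoder then turns into sound. The sketch below shows only that mapping step, under stated assumptions: controller values are normalised to [0, 1], and each control drives one latent dimension linearly across a symmetric range. It does not reproduce Stacco's actual mapping, and the decoder call is omitted. Assuming a standard normal prior, a span of ±2 per dimension covers roughly 95% of that prior's probability mass in each dimension.

```python
def controls_to_latent(controls, latent_dim, span=2.0):
    """Map normalised controller values in [0, 1] to a latent vector.

    Control i drives latent dimension i linearly from -span to +span.
    Values outside [0, 1] are clamped; dimensions without a control stay at 0.0,
    the mean of an assumed standard normal prior. Extra controls are ignored.
    """
    z = [0.0] * latent_dim
    for i, value in enumerate(controls[:latent_dim]):
        clamped = min(max(value, 0.0), 1.0)
        z[i] = (2.0 * clamped - 1.0) * span
    return z


# Example: three sensor readings driving the first three of eight latent dimensions.
z = controls_to_latent([0.0, 0.5, 1.0], latent_dim=8)
# z would then be passed to the model's decoder (omitted here).
```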

As explained by Franzson, the neural spectral audio synthesis model was trained using the Autocoder framework on a GPU-enabled cloud instance (via Google Colab). The Autocoder software uses a variational autoencoder to learn the spectral features of an input file, allowing it to generate spectral data across the spectral features present in the training sound. As this framework trains much faster than more elaborate ones (training took approximately 10 min), it was possible to experiment with the training hyperparameters and find a point at which the model still generalised while also providing highly spectrally accurate output.
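The variational autoencoder at the core of Autocoder can be summarised by two generic ingredients. First, an encoder outputs a mean and a log-variance for each latent dimension, from which a latent vector is sampled with the reparameterisation trick. Second, the loss balances reconstruction accuracy against a Kullback–Leibler (KL) term that pulls the latent distribution towards a standard normal prior. The sketch below shows these ingredients for a single spectral frame in plain Python. The squared-error reconstruction term and the `beta` weight are assumptions rather than Autocoder's documented loss. A weight of this kind is one common hyperparameter for trading spectral accuracy against generalisation.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))


def vae_loss(frame, reconstruction, mu, log_var, beta=1.0):
    """Squared-error reconstruction of one spectral frame plus a beta-weighted KL term.

    The squared-error term and `beta` are illustrative assumptions.
    """
    reconstruction_error = sum((a - b) ** 2 for a, b in zip(frame, reconstruction))
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = reparameterise([0.2, -0.1], [-1.0, -2.0], rng)  # a latent sample for one frame
```

A larger `beta` keeps the latent space closer to the prior and typically smoother to navigate, at the cost of less accurate reconstructions; a smaller one favours spectral accuracy.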

3.4 Phase 3—instrument design and iterative development

When we determined that the RAVE models were working properly, we conducted initial tests as an extension of the baroque violin through an interface that transformed the violin’s sound into that of a guitar (no. 2), saxophone (no. 3), or other sounds, leveraging the re-synthesis capabilities of RAVE. This did not feel artistically interesting or challenging to me as a composer and an experienced violinist who often works with electroacoustic music. To address this, I decided to use the Living Looper, a neural audio synthesis looper newly invented by lab member Shepardson. I could play into the looper, which then transformed the sounds over time by mixing recorded loops through a statistical ML prediction algorithm (Shepardson and Magnusson, 2023; Shepardson et al., 2025). The seventh model, trained with Autocoder, was run within a Max/MSP patch designed by Franzson et al. (2022).

The prototyping took place at the lab and in a home studio before being applied in a concert at Mengi, a venue in Reykjavík, Iceland. Rehearsals were recorded to study the performance from various angles, and the findings fed into ongoing work. In the physical setup, the violin was miked with a clip-on condenser microphone, and the signal was sent both to the system, as seen in Figures 1 and 2, and directly to the speaker.

Figure 1

Flowchart illustrating a process starting with “press,” leading to “play violin,” then “release” and finally “computer loops.” A downward arrow from play passes through “RAVE encoder” to “fit model.” A box labeled “sample model” passes through “RAVE decoder” to “computer loops.” A box labeled “other loops” also flows into “fit model” and “sample model.”

Figure 1. Creating a living loop. Time flows from left to right. The controls are red, the sounds are blue, and the living loop algorithm is green. While the footswitch is held down, inputs are used to fit an autoregressive living loop model. While the footswitch is released, the living loop model is sampled to produce new outputs.
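Figure 1 describes two modes. While the footswitch is held, encoded violin input fits an autoregressive model, and once it is released, that model is sampled to produce new material. The sketch below is a minimal, hypothetical illustration of this fit/sample cycle. It assumes a linear first-order autoregressive model over latent frames, conditioned on one other loop, as in the “other loops” arrow of the figure. The function names, the linear model, and the synthetic latent data are all assumptions. The Living Looper’s actual algorithm is documented in Shepardson and Magnusson (2023) and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT = 4  # hypothetical latent size


def fit_loop(own, other):
    """Footswitch held: fit own[t+1] ~ [own[t], other[t]] @ A by least squares."""
    X = np.hstack([own[:-1], other[:-1]])
    Y = own[1:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A


def sample_loop(A, z0, other, noise=0.05):
    """Footswitch released: iterate the fitted model, conditioned on the other loop."""
    out = [z0]
    for t in range(len(other) - 1):
        x = np.concatenate([out[-1], other[t]])
        out.append(x @ A + noise * rng.standard_normal(LATENT))
    return np.array(out)


# Synthetic stand-in for encoded violin input, generated by known coefficients.
T = 200
A_true = rng.normal(0.0, 0.2, (2 * LATENT, LATENT))
other = rng.standard_normal((T, LATENT))
own = np.zeros((T, LATENT))
own[0] = rng.standard_normal(LATENT)
for t in range(T - 1):
    own[t + 1] = np.concatenate([own[t], other[t]]) @ A_true

A_fit = fit_loop(own, other)
new_loop = sample_loop(A_fit, own[-1], other)
print(np.allclose(A_fit, A_true), new_loop.shape)
```

In this noise-free synthetic case, least squares recovers the generating coefficients. The noise added during sampling is what keeps the hypothetical generated loop from being a literal replay of the fitted material.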

Figure 2

Flowchart depicting an autocoder system. A violin symbolizes input to the autocoder encoder, which processes it into latent data.

Figure 2. The autocoder system.

In the case of the Max patch, the parameters were controlled from the computer. In the case of the Living Looper, the performance involved using a MIDI foot controller (Figure 3) to record the loops and react to their playback. The Living Looper’s four loops manifest through both audio and visual representations, as seen in Figure 4.

Figure 3

Diagram of the SoftStep 2 foot controller with ten numbered foot pads arranged in two rows. Each pad is cross-shaped, and they are labeled zero to nine.

Figure 3. The MIDI foot controller used in our prototype (manufacturer’s illustration). Eight pads were configured to send a message upon pressing and releasing. The four-way pad and continuous control features were not used.

Figure 4

Digital interface of the “Living Looper” v1.2.0b software, showing four colorful, abstract visualizations resembling tentacles. Each section has controls such as erase, record, mute, solo, and pan, with input and output settings displayed above.

Figure 4. A screenshot of the Living Looper graphical representation.

The Living Looper has a robust generative agency, allowing for only a limited degree of control or predictability. This expands the looper beyond instrument and co-creator, into a composition. While halting and observing its sound became an essential part of the performance, it also invited contemplation of whether to work with or against the looper. During my first encounter with the looper, Shepardson observed that I consistently played with all four loops active. This contrasted with his prior testing at the lab with guitarists. The difference highlighted how a musician’s habituation informs their engagement with other technologies (Stefánsdóttir, 2023). Thus, rather than treating it as a “different” loop station, I was engaging with the looper more as an entity, one with four built-in tracks of dynamic potential.

The Max patch for the resonance model marked a shift from my previous collaboration with Franzson: here, I had direct control. I could adjust the timing and frequency of the model’s responses and determine how much the AI responded to or ignored my input. The intelligent platform thus served as both an instrument and a co-creator, while maintaining connections to the aural score practices of our prior collaboration, which I detail in a later section. In what follows, I analyse the performance with each model.

3.4.1 Guitar and violin models

The work with the violin (no. 1) and guitar (no. 2) models allowed for a rather direct entry into improvisation. There were clear experiential differences between the two, which are readily explained by the different materialities of the original archives. Describing my first encounter with the violin model, I noted that it was not very “inviting,” an observation perhaps symptomatic of how I perceived the model as “matte”: it made the organic materialities of the gut strings and horsehair bow more present to me but failed to send my imagination flying.

The guitar model, however, produced a more interesting sonic rapport through its metallic and shimmering sounds. It became immediately apparent that plucking my violin strings elicited an interesting response from the model, and I gained insight into what evoked soft and loud responses. However, I refrained from approaching the performance through rigorous mapping; that is, I did not systematically go through all my techniques to map the model’s response. Such an entry into playing would have evoked the notion of “mapping-as-control.” Rather, my focus was to get to know the models through free improvisation, a process that also helped me become familiar with the Living Looper.

Such probing may be seen as working with and against an instrument’s resistances as they arise in a relational context of use (Ihde and Malafouris, 2019). However, since I am working with a hybrid here, the materialities at play are altered. As a result, although I may be working habitually with my violin, I also attend to the qualities of the system, which at times shifts my focus towards the quality of the model or the activity of the interface, and this in turn elicits a different response from me through my violin. It is a constant process of honing my sensitivity to the situation.

I may even be sent on a “rescue” mission, that is, be forced to stop an unwanted development. An explicit example is how the Living Looper can kick into a mechanistic “hissing” mode and even draw all the other loops into it. This occurs when the loop models collapse to the zero vector, at which point they essentially play the “average timbre” of the dataset according to the RAVE model. Although it catches one’s ear at first, it can become a tedious sound, necessitating a shift in agency towards that of a “repair.” The timing of such a hiss may feel right or wrong, depending on the context. At the showcasing event, when it occurred as I was letting a model sound out for the first time via the respective loops, I saw it as a disturbance, whereas later in the improvisation I relished it, or rather, barely noticed it until I watched the documentation. There, the hiss enabled me to enter a drone-like improvisation with the noise.
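The “collapse to the zero vector” can be illustrated with a toy model. In a VAE with a standard-normal prior, the zero vector sits at the centre of the latent distribution, which is consistent with its decoding to the “average timbre” described above. The sketch below assumes a hypothetical linear decoder whose bias term stands in for the dataset’s average spectrum, and a hypothetical contractive loop update that shrinks the latent at every step. After enough steps the latent reaches the zero vector, and the output settles on that average. Neither component is taken from RAVE or the Living Looper, and the average-spectrum behaviour holds here by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
LATENT, N_BINS = 4, 8  # hypothetical sizes

# Hypothetical decoder: a fixed bias (standing in for the dataset's average spectrum)
# plus a linear term driven by the latent vector.
mean_spectrum = rng.uniform(0.1, 1.0, N_BINS)
W = rng.normal(0.0, 0.2, (N_BINS, LATENT))


def decode(z):
    return mean_spectrum + W @ z


# Hypothetical contractive loop dynamics: each step shrinks the latent by 10%.
A = 0.9 * np.eye(LATENT)


def run_loop(z0, steps):
    z = z0
    for _ in range(steps):
        z = A @ z
    return z


z_end = run_loop(rng.standard_normal(LATENT), steps=300)
print(np.linalg.norm(z_end))  # effectively zero after 300 steps
print(np.allclose(decode(z_end), decode(np.zeros(LATENT))))
```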

This highlights the unpredictability involved in such a performance, which invites one to juggle multiple techniques and constantly calibrate one’s sensitivities. The foot controller, for its part, is an entry point for both activating and negotiating the performance; sometimes it becomes part of a rescue mission. When I no longer use it, that is a sign that I have arrived at a point of interest, where I can let the system do its thing while remaining responsive to it. At the same time, my violin gains new agency, and we embark on a sonic journey, sounding into liminal space, which in turn alters our co-creator and how we respond. In some cases, however, the response of the system compels me to abandon the violin, as the next example shows.

3.4.2 Saxophone model

During the initial engagement with the saxophone model (no. 3), which was trained on recordings with Franziska Schroeder, the model resisted revealing its saxophone-like properties. Suddenly, however, it responded to quick, gestural playing in the violin’s high register with a watery effect, a sound that Shepardson later traced to Schroeder improvising with water in her mouth at one point during the improvisation. Encountering such traces of the body and its fluids within the software affected me. It was a reminder of feminist teachings about situated knowledges, and of the idea that we should not try to transcend physical existence in our technological undertakings (Wajcman, 2004). I later listened to the entire recording with Schroeder, which had formed the basis of the training, and was able to identify connections between specific sounds in the model and her instrumental techniques.

When the model was transferred to the Living Looper, the violin signal sent the looper into mechanical hissing, despite our trying two differently trained versions of the model. This resembled the hissing described in the previous section, which occurs when the loop models collapse to the zero vector. Stopping the loop suspected of initiating this effect did not suffice, as the other loops seemed to be reconfigured into a similar hiss. To bypass this issue, I turned to shells and other objects, played either into the microphone or on the violin body, which successfully engaged the model. In the process, the foot controller was reconfigured: rather than mediating between me, the objects, the Living Looper, and the model, it took on the role of recording equipment. Pushing a button to record material activated a performance by the system, which in turn allowed me to listen to and judge the outcome. Pushing a button to stop the sound became not only a means of exploration, but also a way of saying “no” when the quality felt wrong.

The watery encounter described above had a lasting effect: I had to respond to its call or somehow acknowledge it in my performance. This resulted in the composition of an electroacoustic piece, juxtaposed with the performance of the system at the showcasing event. This staged the complexity of working with the model and, through it, established performability. The electroacoustic work grew out of an older archive of proximal sounds featuring female family members, friends, and colleagues. This archive was produced in response to how female experiences and voices have historically been undervalued and erased within Western Art Music (Eckhardt and De Graeve, 2017; Rodgers, 2010), as well as in public life (Lane, 2020).

Here, stimulated recall, applied to a recording with the saxophone model and to the composition sketches, became a way to adapt the pace and sonic density of the composition to the system’s improvisation. Furthermore, on stage, the juxtaposition of the electroacoustic composition and the Living Looper performance was enhanced by routing them to different speakers. Through such devising, the electroacoustic work also took on the form of virtual scenography, all the while supporting the search for meaningful sounds within the looper in order to honour the model. This not only highlighted the difference between traditional electroacoustic composition and this material engagement but also emphasised that the data was not data from nowhere.

3.4.3 Cultural archive model

Editing the heritage material (no. 4) now, as when I engaged with it as a composer a few years ago, became a lesson in the colonising effect of ethnographic techniques. Engaging with the recordings in the 21st century, I was compelled to ask how I could honour my great-grandmother through their use. This explained my discomfort when initial tests revealed an immediate challenge: a dark timbre that seemed to mock Loftsdóttir’s voice, a direct result of testing neural audio synthesis systems with this material.

As with the saxophone model, this was viewed as a challenge to be negotiated through continued prototyping. Shepardson joined as a collaborating musician. We shifted the model out of the looper for more subtle processing and reintroduced the original recordings into the performance to enhance Loftsdóttir’s agency. I refined the material to include three spoken testimonies about female creativity and one song, and also translated the material and provided a cultural context for Shepardson. The chosen piece, Ólafur Liljurós, is a repetitive vikivaki that complements the Living Looper’s mechanisms. We then went on to develop a performance structure for the recordings through iterative testing.

I performed using my own model to highlight Loftsdóttir’s absence and lack of agency. Shepardson facilitated the work by processing Loftsdóttir’s archive recordings through the voice model encoder with a RAVE latent-space offset. The latents went through two decoders: the voice and guitar models. He fed the decoder outputs back into the encoder input. During the performance, Shepardson controlled sample triggering, delay times, feedback levels, and the output mix between the voice sample, voice model, and guitar model. This allowed for dynamic control over voice intelligibility, guitar presence, and voice alteration while responding to rhythms and timbres. Loftsdóttir’s voice appeared from the fictive archive and disappeared again, only to become fragments in its reworking. I oscillated between listening to her and holding back so that she could be heard, and then moved again towards responding to the model and to Shepardson’s archival reworking.
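The routing described in this paragraph can be sketched as a per-block signal flow: an encoder, a latent offset, two decoders, feedback from the decoder outputs into the encoder input, and a mix of three sources. In the hypothetical Python sketch below, fixed random linear maps stand in for the trained RAVE voice encoder and the voice and guitar decoders. The decoder outputs are fed back with a one-block delay. The values of `offset`, `feedback`, and `mix` are placeholders rather than the settings used in the performance.

```python
import numpy as np

rng = np.random.default_rng(2)
FRAME, LATENT = 16, 4  # hypothetical block and latent sizes

# Hypothetical stand-ins for the trained networks: fixed random linear maps.
ENC_VOICE = rng.normal(0.0, 0.2, (LATENT, FRAME))
DEC_VOICE = rng.normal(0.0, 0.2, (FRAME, LATENT))
DEC_GUITAR = rng.normal(0.0, 0.2, (FRAME, LATENT))


def process_block(voice, prev_decoded, offset, feedback, mix):
    """Encode voice plus fed-back decoder output, offset the latent, decode twice, mix."""
    z = ENC_VOICE @ (voice + feedback * prev_decoded) + offset
    voice_model = DEC_VOICE @ z
    guitar_model = DEC_GUITAR @ z
    g_sample, g_voice, g_guitar = mix  # levels for voice sample, voice model, guitar model
    output = g_sample * voice + g_voice * voice_model + g_guitar * guitar_model
    return output, voice_model + guitar_model  # second value is fed back on the next block


offset = np.full(LATENT, 0.5)  # placeholder latent-space offset
archive = rng.standard_normal((10, FRAME))  # stand-in for blocks of archive voice audio
fed_back = np.zeros(FRAME)
outputs = []
for block in archive:
    out, fed_back = process_block(block, fed_back, offset, feedback=0.3, mix=(0.5, 0.3, 0.2))
    outputs.append(out)
print(len(outputs), outputs[0].shape)
```

Setting the mix to (1, 0, 0) returns the untouched voice sample, while raising the model levels moves the output towards the reworked voice and guitar, a rough analogue of the control over intelligibility described above.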

Shepardson and I exchanged models during the singing, with me controlling the guitar model, a demonstration of mutual trust. Here, I worked in part with a chord progression in response to Loftsdóttir’s singing. Shepardson’s setup involved three distinct models: the voice recording was separated into harmonic and percussive components, with the former sent to the voice model and the latter to the guitar model, while a synthetic string-like version of the voice recording, created using pitch tracking and comb filters, was combined with my violin model. Live violin input could feed into all models via a feedback delay. The performance was effective; as Loftsdóttir sang about a human’s encounter with the uncanny world of elves, our playing, or “making strange,” unfolded. At the end, in what resembled a coda, I mimicked a horse’s gallop, bringing the wounded Ólafur to die in his mother’s arms,² and then concluded by detuning my G-string and deactivating the loopers until no more could be played. In the performance, Shepardson triggered the voice samples, set the delay time and feedback, and mixed the levels of the four sounds. He also controlled the amount of live violin sound entering the models and the length of its delay. In this way, he could orchestrate the “accompaniment” to Loftsdóttir’s song and switch over to responding to my performance during breaks.
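The “feedback delay” through which live violin input reached the models can be illustrated with the textbook feedback delay line, y[n] = input_gain * x[n] + feedback * y[n - D], where D is the delay time in samples. The sketch below is a minimal, hypothetical version. The parameter names and the 48 kHz default sample rate are assumptions, and the source does not describe the patch Shepardson actually used. With a short D, the same structure acts as a feedback comb filter.

```python
import numpy as np


def feedback_delay(x, delay_s, feedback, sample_rate=48_000, input_gain=1.0):
    """Hypothetical feedback delay line: y[n] = input_gain * x[n] + feedback * y[n - d]."""
    d = int(round(delay_s * sample_rate))  # delay time converted to samples
    if d < 1:
        raise ValueError("delay must be at least one sample")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) so the echoes decay")
    y = np.zeros(len(x))
    for n in range(len(x)):
        echo = feedback * y[n - d] if n >= d else 0.0
        y[n] = input_gain * x[n] + echo
    return y


# Impulse response at a toy sample rate of 1 kHz with a 3 ms delay (d = 3 samples).
impulse = np.zeros(10)
impulse[0] = 1.0
print(feedback_delay(impulse, delay_s=0.003, feedback=0.5, sample_rate=1000))
# echoes appear at samples 3, 6 and 9 with gains 0.5, 0.25 and 0.125
```

In this formulation, the delay time, the feedback level, and the amount of live violin entering the line (`input_gain`) correspond to the three quantities the paragraph says Shepardson adjusted. At 48 kHz, for example, a 0.5 s delay corresponds to 24,000 samples.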

During both performances, my focus would shift and react to the varying events occurring within the network. At the same time, I felt that we were conjuring up something resembling a ceremony. It was a way of coping with colonising techniques, all the while honouring my great-grandmother. It was a way of prototyping a new network, all the while staying true to our feelings. The performance at Mengi became a way to diversify how archives might be shared with an audience, meaning that the audience members encountered the archive in an “otherwise” way.

3.4.4 Glitch and wind resonance models

The next two examples, models six and seven, focused on more-than-human intentionality. For the model that the team came to call the “glitch model” (no. 6), RAVE was trained on sounds revealing unexpected agencies at the interface, found in field recordings made by me or in latent space by Privato and Shepardson. This was prompted by thoughts around the “residual waste” produced by the training and its kinship to field recordings.

To further enhance more-than-human intentionality, I decided to play the field recordings used in model training back to the model via a portable speaker. Fittingly, the Living Looper responded by producing a glitch, most likely caused by friction between the microphone and the lo-fi hiss of the unedited track, magnified by the speaker. I had not rehearsed much with this model prior to the showcasing event; rather, I wanted things to happen in the moment. The performance was kept short, and at the outset, I approached it as “activations”³ of my co-creator through a series of rapidly alternating techniques. It was not until the end that I really started playing the violin with it, thereby scaling back my activations through the foot controller.

The wind resonance model (no. 7) calls for unpacking how work with an intelligent performance platform may grow out of longitudinal collaboration, diversifying how ergodynamics (Magnusson, 2019) are approached. The model was the result of a long-term collaborative relationship with composer Franzson. Through this relationship, artistic methods were used to enhance the musician’s contribution to the prototyping, thereby challenging historical protocols of Western Art Music. This led to processes that incorporated methods of shared listening and stimulated recall in relation to material and ethnographic engagement, extending beyond the usual ergodynamic training and design of musical HCI (for further information, see Stefánsdóttir, 2023; Stefánsdóttir and Franzson, 2025).

In our collaboration on the album project strengur, Franzson synthesised my field recordings of wind for use in my own compositions. A by-product of this rendering was a white noise-like component, which he later used as an aural score in compositions incorporating AI-generated “resonances” that I subsequently performed. This background explains why, when playing with the wind resonances, I was engaging in a process in which the ergodynamics bore a trait of complexity, symptomatic of a cooperative distribution of agency within prototyping processes. I have a sense of ownership of and familiarity with them, and I can easily embark on an improvisation that seeks inspiration in the feel, or atmosphere, of the wind. Here, I was able to start singing, something for which I had not found an entry with the looper. Through testing, I had determined that a 20-s delay felt comfortable in terms of pacing, and I activated my co-creator and responded to it. Midway through the showcasing performance, I wanted it to start “listening” to me differently, that is, to send the signal into other configurations, producing new atmospheric entities.

3.4.5 Multiple bodies of work model

The final model in this section was trained on multiple bodies of work (no. 5) and linked to a group of musicians from a concert that I had played a month earlier at Mengi. The event included another member of the Iceland-based chamber ensemble Nordic Affect, two musicians from the lab, and two musicians from the Swedish/Vietnamese ensemble The Six Tones. The model, which also included performances by Shepardson, served, as stated earlier, as an intervention to contrast the sociality of the improvising event with that of human-instrument improvisation.

From the outset, it was clear that the model, as trained and then applied in the Living Looper, favoured sounds and timbres from the datasets of lab members Privato, Shepardson, and Crozzoli over those of acoustic instruments such as the baroque violin, the Vietnamese zither (đàn tranh), or the Vietnamese lute (đàn tỳ bà). Initial testing became a search for techniques to elicit traces of these instruments, honouring everyone involved in the model. During the showcasing event, my improvisation unfolded in an unusual way: in the documentation, I could detect when I was playing a strophe aimed primarily at diversifying the sounds returned by the looper. It was a rupture in what I was doing, aimed at finding something within the model. Afterwards, I went back to developing material that I had been engaging with earlier in my own playing, as a form of co-creation. I also started playing slow glissandos, a way of remembering a beautiful improvisation between a harpsichord and the zither during the live event a month earlier. This revealed how the intelligent performance platform might invite citation-like effects, a type of sonification of memories related to the agencies found within the model.⁴

In the background, video documentation from a rehearsal with the musicians was projected. Expanding the media on stage became a way to emphasise the model’s origins, which, in performance with the system, enacted or conveyed the differences in sociality to the audience. The performance with this and other models was accompanied by commentary, giving the audience an entry into what working with creative AI might entail. Such an approach is common practice at the Intelligent Instruments Lab, where concerts are showcasing events aimed at making the latest innovations accessible to the general public outside academia. At the same time, the performance served as an important stepping stone for testing the prototypes.

3.5 Analysis

The aim of the Intelligent Violin project was to explore creative engagement with a new type of intelligent violin performance platform for experienced instrumentalists. This resulted in an experimental design that juxtaposed differing datasets and the resulting generative AI models to further understand the phenomenological and creative relationships that emerge through instrumental algorithmic augmentation. The project opened up new work practices for me as a technologically skilled performer of electronics, as I engaged in prototyping through dataset curation, training, and testing, processes that are carried out very differently when designing with generative AI.

While “data” is a ubiquitous term, and as Cascone reminds us, “all data can become fodder for sonic experimentation” (Cascone, 2000, New Music From New Tools section), the data in this project specifically refers to digital sound data of a small, non-computational nature. An exception was the “glitch” dataset, which resulted from exploration of latent space. The selection of data and pre-trained models was not about picking just “any” data. Rather, it was a situated activity, wherein the curation, understood as a form of planning and projection (Suchman, 2007), was driven by an awareness of how the materiality of field recordings, and of recordings made within institutional spaces, reflects the individual recordist’s approach, with their personal techniques and methods influencing the outcome. Similarly, I was not interested in “sonic butterfly catching” (Voegelin, 2014) or “helicopter research” approaches that colonise others. At the same time, I was aware that the data would help prototype a future instrument and co-creator. Selecting data was a multi-faceted role that demanded respect and careful consideration for the dataset, its creators, and its origins, acknowledging that it would take on “an alternative form of intelligence in an alternative form of materiality” (Caramiaux, 2023, p. 88), thereby opening up a future space for altered social encounters.

If viewed from the perspective of laboratory experimentation, the recordings and the dataset from latent space represent traces and archival inscriptions of epistemic processes (Rheinberger, 2023), which then become part of a new “epistemic thing,” that is, material or epistemic processes that are not yet fully understood or defined (Rheinberger, 1997). Working with pre-existing personal archives in this particular experiment was thus both a way of grappling with the ethics of data curation and an opportunity for the artist-researcher to further grasp its outcomes by contrasting them with familiar sociomaterial situations. In this way, the experiment shed light on both the possibilities and the challenges of such work, producing findings that fed into later creative iterations. The curation of data, as it continued in the editing phase, was not a matter of pre-given rules but emerged through the encounter, a phenomenological way to knowledge (Ingold, 2011), in which sounds were explored and actively weighed. The learning process, conducted through a method of reduction or, as defined by Husserl, a means of “leading back” (Husserl, in Christensen, 2012) to the way the world manifests to us, involved repeated listening to a phenomenon from various angles, as a form of phenomenological variation (Ihde, 1977), followed by cutting and splicing. Decisions were interlinked with projections about the model’s future outcomes as they were to be activated in performance. To edit in silences was to plan for the model to be able to understand silences in the interplay. To avoid the agency of the album’s signal processing on the instrument, I opted for dry violin stems. To honour glitchy agencies, I did not employ noise plug-ins.

The preceding examples align with how ethics-in-movement invites humans to cultivate a sensitivity towards our many human and more-than-human others, as well as towards the situation at hand (Garrett et al., 2023). The heritage model offered an explicit example. As the voices of anthropologists were cut out to keep the focus on Loftsdóttir, listening to the recordings also became a lesson in the techniques of anthropologists of the past,⁵ emphasising the question of how to treat the material ethically. The sentiment resurfaced directly when work with the trained model began. Editing is, then, a mediated process afforded by micro-sonic listening (Östersjö, 2020), driven by close monitoring over headphones, in which the production of sonic datasets and the agencies that surface through them give way to intersubjective relations (Stefánsdóttir and Östersjö, 2022) that will affect future interplay once the models are applied.

The violin recordings presented another interesting case. Although originally made while I was editing multiple works for an album, they entered the prototyping phase as representations of my violin and playing style. My intention of constructing an “instrument” led me to focus on the recordings as violin phenomena rather than as parts of larger compositions, perhaps reinforced by my awareness that the training would radically alter the material, reducing the music to timbre. This situation reminds us that artefacts have an enabling and constraining mediational potential, to the extent that it may become difficult to pinpoint where human intentions and material affordances begin and end (Malafouris, 2008).

When the models emerged from training, whistling or shouting into latent space bore a kinship to the type of listening found in nature that Truax (1984) termed “listening-through-search.” Developed as part of his theorising of environmental listening from a human perspective, it is an approach of attentive listening that focuses on detail, such as that done through echolocation. Listening-through-search into latent space produces a different response than whistling down a gorge or clapping in a performance venue. What echoes back is a transformed material of the timbral kind, which at the same time bears some uncanny relations to the original dataset. Meanwhile, the “agency” or distributed effect of synthesis quickly becomes apparent to the musician, as exemplified by the vastly different sonic signatures of the resonance and RAVE models, a testimony to the difference in system coding.

The RAVE models produced an uncanny feeling of familiarity that differed from engagement with other electronic instrument systems. I began to feel the grain of the voices (Barthes, 1977) of other people, their aesthetics, and their lives, but also those of technologies. As a result, the lively traces of other people and non-human entities became part of my violin playing, as configured with the system. This relates to Privato’s hauntography (Privato and Magnusson, 2024), a method he has been developing in his work at the lab and evident in the creation of the Stacco instrument, which resembles a Ouija board on which the performer explores the sonic spectres lurking within the model.

Summoning traces of recordings of the same instrument with my own violin was a strange encounter, an uncanny phenomenological variation of the instrument. Through this experience, and in line with Ihde’s (2009) post-phenomenological theorising, horsehair, wood, and gut strings “spoke” to me differently. At the same time, the absence of traces of agencies present in an original dataset may raise ethical concerns. In this project, such concerns arose when a bias in the system coding made low-amplitude instruments in model 5 and field recordings in model 6 harder to call forth within the model. The search for missing agencies was therefore a critical iteration aimed at enacting responsibility (Suchman, 2007) rather than mapping as “manipulation” or “control.”

Sociality in music can take radically different forms than in other social settings. An explicit example is how “rudeness” is not uncommon in musical improvisation, manifesting, as Frisk (2020) notes, through protocols of non-listening through which “ethical capacity is increased rather than hindered” (p. 38). When working through an intelligent performance platform, the augmentation of ethical capacity may follow such pre-established social norms, but it also amplifies another approach, namely, how ethics can necessitate lending technologies a new role. This was made evident in the choice to give the foot controller and looper the agency of a recording device during the saxophone performance. Further strategies to honour the model prompted me to abandon my violin and use shells, a paper clip, and worn-out gut strings. This brings to mind Don Ihde’s concept of “multistable variations,” or how a technology’s structure allows for different trajectories or developments (Ihde, 2007). In this case, it was a creative way to negotiate the ethics of performance. Such curatorial intentions formed “through” performing and, within the intelligent performance platform, operated on a complex scale owing to the liveness of algorithmic agency and the haunting effect.

As for the agency of the violin, it maintained its autonomy within the platform rather than being overtaken by the interfaces. This allowed for signal processing, which during the showcasing event involved familiar EQ’ing of the amplified violin signal. Other choices, such as adding reverb, related directly to the system’s projected sound, which in turn stood in relation to the architectural surface, a familiar procedure from playing with playback and other systems. At the same time, the violin’s signal was sent to the system. To further understand the mediation, we may look towards Verbeek’s expansion of Ihde’s hermeneutics into “composite intentionality,” which entails a double intentionality, or “one of technology towards ‘its’ world, and one of human beings towards the result of this technological intentionality” (Verbeek, 2008a, p. 393). This means that during the concert, two forms of human-technology relations were unfolding simultaneously:

performer → (violin-mic-speaker → world)

↑ ↓

performer → (violin-mic-foot controller-interface-model-speaker → world)

The lower compound of agencies enacted a different performance space, in which the instrument/co-creator changed with each dataset, sending the musician off on vastly different paths of search and learning. Such composite relations affected the agency of the violin. It oscillated between being an instrument for poetic expression and communication and being an input device, akin to echo-sounding equipment probing latent space. It could take on many modes. For example, when the foot controller was activated to fit an input to an autoregressive living loop model, this could be done in the spirit of allowing the model to eavesdrop on my expressive performance on the violin. Another example was how the sounds from a model, such as one trained on a circle of friends, inspired a performance aimed at uncovering the agencies within the model that evaded response. This was achieved through alteration of technique, by playing non-pitched percussive sounds to draw them out. Thus, the performative engagement was reinvented in each situation, with new techniques developed on the go. At the same time, I could listen in real time to the instrumental effect, which is entangled with my violin: an altered sentient relationship.⁶

To further unpack the dynamics of performance, we may look towards Malafouris and Koukouti’s (2022) analysis of a potter’s work with clay. They identify how such material engagement begins in a “conscious” mode and later moves into an “immersive” one (Malafouris and Koukouti, 2022). Applied to an intelligent performance platform, the conscious mode results in an analysis of the mediations in place and the agencies that surface, which in turn invites further interventions and testing. An explicit example is how it may prompt an instrumentalist to abandon their violin and employ objects and a portable speaker. Similarly, a psychologically pressing experience, such as the watery bodily sound of the saxophone model or the menacing sound of the heritage model, can result in the introduction of electroacoustic composition and networked performance as a way of working through the difficult ethics that arise through the alternative sociality. It is this altered orientation (Ahmed, 2006), as exemplified through hybrid relations, that explains why the later immersion differs from the immersion that arises in human-human improvisation. Here, what counts as “immersion” is the ability to tune into and balance hybrid relations through improvisation.

Thus, we may see how both conscious and immersive material engagement with an intelligent violin do not differ from engagement with humans or animals, in that sentience may result in what Candiotto (2025) described as “sparking doubt” (Peirce, 1986 in Candiotto, 2025), leading to strategies for refining one’s aims, but also in a “loving epistemology” (De Jaegher, 2019 in Candiotto, 2025), wherein one may embody a stance of “letting be.” The engagement may produce feelings of frustration, curiosity, and surprise, but also a placid feeling that things are just as they should be: a contentment over the beauty of the sounds arising from an old gut string as it traces a violin body, the responses they evoke in the saxophone model, and, again, how they correlate with an electroacoustic composition.⁷ This, like other situations of music making, transcends mere personal expression or communication. It is a form of reciprocal action, both acting and being acted upon, albeit in novel ways.

Furthermore, this work demonstrates how new technologies shift the notion of virtuosity, understood as mastery over an instrument (Stefánsdóttir, 2023). Instead of control and mastery, which traditionally entail pushing the limits of what is humanly possible on an instrument (Stefánsdóttir, 2023) and are tied to what is perceived as challenging musical material (Melbye, 2023), the musician is invited into an alternative sociality, where she can explore relational techniques through “hybrid intentionality” (Verbeek, 2008b) while trying to locate the interesting spaces within the co-player. Here, the models are not deterministic or closed but rather open to exploration and even surprise. This brings me to the penultimate stage of this project: the showcasing event. While this event marked the final phase of prototyping, its documentation proved essential to this analysis. The showcasing event existed in a liminal space between the worlds of science and art. By displaying the prototypes and subjecting them to subsequent analysis, this work challenges a common design practice noted by da Rocha et al. (2022): the tendency to edit out failures and retroactively modify process descriptions to highlight only successful outcomes.

After the performance, the event provided a space for gaining insight into some of the audience’s impressions: how they perceived the performance, the technologies, and the treatment of data.⁸ This analysis continues along similar lines, examining both the resonances and the frictions that arise through the prototyping and showcasing processes. Essentially, the experiment offered new ways of sensing through music and new insights into the prototyping of generative AI, while carving out a responsible approach to working with it. The following section contextualises this work further in relation to current developments.

4 Discussion

One of the aims of this project was to explore the role and potential of data curation in prototyping an intelligent performance platform, along with the resulting interplay. Another aim was to study the experience of playing with an agential instrument, one that has been extended with a neural audio synthesis model. By reporting on the first author’s experiences, this article responds to the call for practice-led research to complement the predominantly technoscientific approach to studies of music and AI. The project also addressed the need to make creative work with deep learning models more inclusive, and it studied the ethical and cultural aspects of making such models for subsequent sharing with the world. The following discussion examines the outcomes through the lenses of agency, ethics, and performance practice.

4.1 Agency

Through the extended experiment and an intermediary position within prototyping, the musician’s role evolved beyond that of a mere “user” (a term that reflects utilitarian design thinking) and also transcended the traditional assistant or advisory role typically given to performers in Western Art Music. Through this process, and in line with how laboratory experimentation reconfigures roles and practices, the interdisciplinary musician’s role shifted further towards composition as a mode of instrument design, while curatorial agency was augmented.

The data, or recording, is a materiality produced in a particular context, affording repeated listening and phenomenological variation and, in this project, an entry into prototyping alternative materialities and sociality. As a result, and through the experiment, each material dataset floats a new digital music-machine-ship,⁹ where sounds, vibrations, and frequencies emerge, prompting the prototyping musician to question through each phase what agency is at play. Engaging with the models as they were applied to the interfaces produced a sense of uncanniness but also altered imagination. This was traceable not only to the liveness of the system but also to the model’s hauntographic properties, which in turn allowed memory and imagination to take new forms through the mediation. The same process highlighted how the effect of distributed prototyping was not lost on the musician but rather represented yet another agency that required a response. To give an explicit example: when the datasets were diversified beyond Western mainstream music, the model exhibited an inherent bias in training, favouring digital sounds over low-frequency acoustic instruments and preferring sounds from its own latent space over those of other technologies. This suggests that prototyping needs to allow for retraining models to correct such biases.

Through the project, the musician and their familiar instrument entered a mediation in which the robust system, along with its generative material traces, contributed to producing a new reality. This led to a reconfiguration of the violin’s agency, which oscillated between a familiar violin and what was described as echo-sounding equipment. This was symptomatic of how the system invited probing of a black-boxed latent space. At the same time, the intelligent performance platform took on a hybrid shape, encompassing an instrument, a co-creator, and, at times, a composition.

The experiment also revealed how the work is subject to ongoing testing and curatorial formulation, wherein the musician’s response to their co-creator may oscillate along a scale from acceptance of the resulting hybrid intentionality to resistance against it. It is here that we can better grasp the asymmetry between this work and improvisation with humans: the curatorial activity during prototyping and testing turns the musician into a designer, sound producer, set designer, visual artist, composer, trickster and firefighter, historian, anthropologist, archivist, translator and healer, to name a few. This brings to mind Eldridge’s (2022) point that experimentation, albeit formulated in relation to work with a feedback cello, can give the musician an entry into “complexity literacy” during moments of building, initial encounters, and continued improvisatory engagement.

Literacy, expressed in this experiment through curatorial agency, enabled the musician to recognise new sensibilities and to create novel connections. Such findings push back against Kozel’s (2007) claim that “we can regard technologies not as tools, but as filters or membranes for our encounters with others” (p. 70). What the experiment revealed is that technologies are not membranes; rather, it is the meeting with such a technological “other,” and all its entangled agencies, situated within a wider context and within complex temporalities, that is at the heart of the matter.

This research aligns with previous findings in that it does not attribute human cognitive skills to such systems (Frisk, 2020). Rather, it contributes to the broader inquiry into what prototyping engagement and machinic asymmetry may afford. In line with recent studies into algorithmic agency (Gioti et al., 2023; Melbye, 2023), it led to a decentering of both the human subject and instrumental agency. This initial experiment also raised questions about alternative approaches to staging and designing such processes, a topic that we address in the section on performance practice. The next section transitions from examining emergent agency to considering the ethical implications of these encounters.

4.2 Ethics

The analysis revealed that to curate datasets is to hold oneself accountable for the techniques of gathering and to avoid the colonising tendencies associated with such acts. As a result, the selection of small datasets for this project was not about starting afresh with a blank slate; rather, it acknowledged that data is always sociomaterial and can be read as such (Whitelaw, 2004). This invites the practitioner to cultivate contextual sensitivity, manifested, for example, through personalised datasets and through the exploration of material beyond Western mainstream music, thereby promoting inclusive AI (Bryan-Kinns et al., 2024).

The editing process involved ongoing decisions about what to include and exclude; these decisions were made through dialectical engagement with the material and in relation to projections of the system’s future performativity. Being mindful of human and more-than-human others meant choosing not to use noise-removal plugins, thereby preserving glitches and the sonic signature of an old recording device. It also meant carefully cutting out unwanted material such as talking, wind noise on microphones, or ethnographers’ voices. This approach acknowledged the relationships, context, and power dynamics at play, which ultimately affected the prototyping outcomes.

Engaging with the model through an interface created a hybrid encounter, in which the algorithmic augmentation functioned as instrument, co-creator, and composition. As in other improvisational contexts, the musician must remain open and embrace a learning mindset, at times accepting the system’s differences and performance (“letting it be”) and at others choosing to resist, thereby reshaping its expressivity. However, the system’s expressiveness can sometimes lead to a breakdown in trust. The reasons are complex: in this experiment, they may stem from perceiving the Living Looper as too robust or compositional, from timbral training that distorts archived voices, or from watery mouth sounds that raise concerns about what kind of practice this digital music-machine-ship produces.

In his research on ethics in machine improvisation interaction, Frisk (2020) notes that such an experience concerns the “self” as much as the system’s ability to be a good “musician.” From this perspective, while many lab testing moments involved playful engagement, the core question remained: how can “I” participate in this? This question became more urgent when sensibilities became confused or performability broke down. In the experimental prototyping situation, these challenges were approached in two ways: as occasions to reflect on the ethics and mediations of specific situations, and as opportunities to engage with the question of how to restore performability. The answers emerged from the situation itself, sometimes requiring an immediate response during performance and at other times demanding thoughtful curation and further testing outside the performance context. This experiment included various adaptations: abandoning an interface to create a networked performance, replacing the violin with alternative objects, incorporating composition and visuals into the work, and turning off loopers during performance as they gravitated towards a zero vector.

The preceding examples suggest that it may be simplistic to claim that aesthetics shape ethics in music. As Frisk (2020) observes, machine improvisation experiments create a space that extends both technosocial and traditional music practices, revealing a complex interrelation between ethics and aesthetics. Design practice, or prototyping, invites us to take responsibility for the mediations that the digital music-machine-ship floats. In his theorisation of morality in design, Verbeek pointed out how such an “action-ethical” approach can revolve around “assessing technological mediations, focusing on the quality of the practices that are introduced by the mediating technologies, and their implications for the kind of life we are living” but also around experimentation forged to “assess mediations, and to try to help shape them” (p. 101). This article has presented such an approach, showing how musicians’ insights contributed to building accountability. This approach not only helped musicians cope with performance situations involving altered phenomenological orientations but also informed future prototyping initiatives, which leads us to the final section of the discussion: performance practice.

4.3 Performance practice

The search for performance practices involving intelligent performance platforms is still in its early stages. Because performance is an embodied and situated activity, we can assert that future performance practices will be defined by how they align with and differ from existing practices, in all their fragmentation. The experiment has revealed that the laboratory model is an effective way of working towards what a performance with algorithmic performance platforms can become. It allows participants to pursue their work in an interdisciplinary manner while allowing their practice to be challenged, explored, and developed, even through insights afforded by failure. This leads us to state that, although the pedagogy of intelligent instruments is yet to be formulated, we cannot see it as separate from the artistic research laboratory model.

The curatorial and compositional agencies become augmented through such work. Meanwhile, the augmentation through code creates a situation in which intersubjective relations are characterised by listening, albeit one which emphasises learning and search—a significant component of the resulting aesthetics. In this way, the algorithmically augmented system invites musicians to attend to the mediation involved, the altered reality, and the relations that arise from it, resulting in the decentring of the musician and alteration in co-creative agency.¹⁰ This effect, as has been shown, challenges traditional notions of virtuosity and highlights how musicians must excel at balancing hybrid or composite relations. The hauntographic effect is also a significant element, introducing different dimensions to nested cognition (Kiverstein and Rietveld, 2018) and the role memory plays in improvisation. As this work is still in the early stages, it will be interesting to follow what form memory and ethics take as practitioners carve out a longitudinal performance practice with these systems.

The altered orientations of practice invite continued discussion regarding attribution. As noted by Franzson et al. (2022), how should a performance that contains the aura of sounds by others be treated from the perspective of copyright? The first author’s work with Franzson addresses one practice-led side of this question, as it relies solely on a personalised approach. The decision to project video footage from rehearsals with some of the musicians who contributed to the sixth dataset represents another creative approach to attribution. This model, like the one from the work with Franzson, remains accessible only to dataset contributors due to ethical and copyright considerations. In the coming years, we will likely see a rich spectrum of approaches in which participants explore the gathering of, and engagement with, data within such an alternative intelligence, a development that is bound to make music research even more relevant across disciplines. In this respect, there is much to be learned from the revisions that have already taken place in field recording, site-responsive, and ethnographic practices.

Working with algorithmic agency has the potential to take music-making beyond what the historical protocols of music dictate. The strangeness and altered tunings create a different pathway for how a musical performance may unfold. In line with the previous point about data gathering and attribution, there is work to be done in further exploring the staging or design of such a prototyping process, which is bound to affect the outcome. Similarly, it is worth asking what “instrumentality” we are seeking. To respond to such a question, a musician is bound to experiment and shift even further towards compositional agency. What will the initial ideation be, and what will spark it? What digital music-machine-ships will it float? How far is the musician ready to be decentered? What other sociality will it reveal? This work requires integrated collaboration that transcends historical protocols that relegate musicians to the role of technology users or composers’ assistants. Here we have identified how methods such as turn-taking, shared listening, think-aloud protocols, and stimulated recall serve dual purposes, both as research tools and as artistic approaches. These methods are essential for understanding both the effects and affects of prototyping and music-making as they emerge through algorithmic augmentations and altered tunings.

5 Conclusion

This article introduced a new instrument design that uses neural audio synthesis with small datasets, detailing four stages of design: dataset curation, model training, interface design and performance, followed by analysis. Within such a complex new instrument design, musicians gain prototyping agency through careful dataset curation and editing for training: a significant role within a wider distributed web of mediations where the act of instrument design becomes a process of inventing systems as well as composing.

We have shown how a small data selection can take on a form that resists corporate helicopter research approaches to large data curation, notably through the participant’s cultivation of sensitivity towards the techniques of gathering and editing. Importantly, we have detailed how such an intention of responsibility is an ongoing process, as the musician enters a dialectical engagement with the transformed material through an intelligent performance platform. A musician will feel the care and ideation of the dataset curation in the workings of the model. Furthermore, we have observed that augmentation through code creates a situation in which the musician’s intersubjective relations are characterised by learning and search—significant components of the resulting aesthetics. These hybrid relations open the door to alternative approaches to performance. This leads us to conclude that datasets serve both as cultural probes and valuable tools for practice-based AI experimentation.

From our experience at the Intelligent Instruments Lab, this aligns with a certain posthuman sentiment among performers, who are interested in relinquishing full control of the instrument and seeking a more dialectical relationship with it. Our findings underline how such technoscientific developments need to be explored from an intermediary laboratory positioning so that we may better grasp what such alternative or alien sociality may become and what orientations it can inspire, all the while advancing responsible AI.

Author’s note

The saxophone and guitar models are available both as RAVE models and proprietary models of the Living Looper: https://huggingface.co/Intelligent-Instruments-Lab/rave-models.

Data availability statement

The datasets presented in this article are not readily available because they are of an artistic nature and are therefore owned by the researcher and her collaborators. An exception is the heritage data (dataset no. 4), which is available at the Ísmús open access online cultural archive. An excerpt from the Showcasing Event at Mengi is available at: https://www.researchcatalogue.net/shared/9ef80fddc52d142737aa2c711463e45a. Requests to access the datasets should be directed to hallasteinunn@hi.is.

Author contributions

HS: Writing – original draft, Writing – review & editing. TM: Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research is supported by the European Research Council (ERC) as part of the Intelligent Instruments project (INTENT), under the European Union’s Horizon 2020 research and innovation programme (Grant agreement no. 101001848).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^A link to an excerpt from the Showcasing Event at Mengi: https://www.researchcatalogue.net/shared/9ef80fddc52d142737aa2c711463e45a

2. ^As Loftsdóttir notes in the interview, this vikivaki portrays Ólafur’s conflict with elves as a form of hyperdulia.

3. ^During my doctoral studies, I intuitively started referring to my experimental design as “activations.” I saw these unfolding in two ways: I activated inherited ways of doing while simultaneously entering into altered material engagement, where the activity of the material would create alternative ways of existing (for further information, see Stefánsdóttir, 2023).

4. ^Music making, as other human undertakings, relies on nested cognition that may span “multiple scales of complexity, many of which reach far beyond what is taking place here and now” (Kiverstein and Rietveld, 2018, p. 157). In this way, memory has come to play a significant role in contemporary improvisational practice (Mayas, 2019). The citation-like performance is therefore not unique to work with intelligent performance platforms; however, as will be detailed in the Analysis (3.5) and Discussion (4) sections, it takes on a different form in this project due to the emergent hauntographic element and the ethics and care that it inspires.

5. ^It is not until late in one of the interviews that my great-grandmother sounds as if she is familiar with one of the interviewees, as is evident from the altered tone of her voice. This makes me wonder whether it is ever “her” that I hear in the interviews, revealing how ethnographic techniques come to reconfigure persons. She also sounds apologetic about her singing and says she hopes it is not being recorded. Today, the unedited material is accessible in the open access Ísmús archive, which may be accessed through this link: https://www.ismus.is.

6. ^The only opportunity to enter this with my violin is when it enters into relations with a particularly resonant architecture.

7. ^This aligns with my habituation, or how the tactile and grainy agency of my Hopf violin has shaped my intentionality. This is evident in how I approach other technologies, such as photography and audio and video field recording, through an interest in the tactile and microscopic (Stefánsdóttir, 2023; Stefánsdóttir and Östersjö, 2022).

8. ^It became apparent that the audience sensed that something different was going on here than in music sampling or electroacoustic music, and that it even afforded moments of poetic beauty. The guests also did not find the performance offensive, and, as the audience attested afterwards, the event made people aware of the Ísmús open access online cultural archive, which many of them had not previously known about. In this way, we can see how the prototypes had the potential to create new connections.

9. ^This is a modulation of Andersen et al.’s (2019) “digital craft-machine-ship,” a concept similar to assemblage or apparatus. This term challenges the typical anthropocentric view of machines and suggests “a new kind of digital craftsmanship, one in which we may craft with the digital and find ways to make the machines craft along with us” (p. 32).

10. ^This aligns with P.-P. Verbeek’s (2008a) recommendations for considering design as a means of combining agencies and taking responsibility for them.

References

Ahmed, S. (2006). Queer phenomenology: Orientations, objects, others. Durham: Duke University Press.