Affect-Driven Modelling of Robot Personality for Collaborative Human-Robot Interactions

Collaborative interactions require social robots to adapt to the dynamics of human affective behaviour. Yet, current approaches for affective behaviour generation in robots focus on instantaneous perception to generate a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for personality-driven behaviour generation in social robots. The framework consists of (i) a hybrid neural model for evaluating facial expressions and speech, forming intrinsic affective representations in the robot, (ii) an Affective Core, that employs self-organising neural models to embed robot personality traits like patience and emotional actuation, and (iii) a Reinforcement Learning model that uses the robot's affective appraisal to learn interaction behaviour. For evaluation, we conduct a user study (n = 31) where the NICO robot acts as a proposer in the Ultimatum Game. The effect of robot personality on its negotiation strategy is witnessed by participants, who rank a patient robot with high emotional actuation higher on persistence, while an inert and impatient robot higher on its generosity and altruistic behaviour.


I. INTRODUCTION
In collaborative Human-Robot Interaction (HRI) scenarios, where robots need to effectively negotiate with humans, it is particularly important for them to be sensitive to human affective behaviour [1]. Furthermore, instead of using static behaviour policies that fail to engage users over continued interactions [2], robots should understand the affective impact of their interactions, and, over time, evolve their behaviour.
Much of the current research in Affective Computing and Social Robotics focuses on instantaneous (frame-based or using very-short sequences) affect perception (see [3], [4] for an overview). Although this works well for short-term interactions, longer context-driven conversations require a robot to analyse and understand human behaviour over an entire interaction [5]. This is primarily because frame-based techniques rely on glimpses of heightened audiovisual stimuli to infer the affective state of the user [3], missing out subtle nuances in expressions. To engage the users, robots should form an evolving understanding of human behaviour [6] by modelling affective representations [5], [7] that track user behaviour over time and create a dynamic and robust model for its affective appraisal. Additionally, for naturalistic interactions, developing idiosyncratic behavioural tendencies can provide means for embedding specific personality traits in robots to influence their affective appraisal as well as their behaviour [8].
Humans are particularly known to possess such innate behavioural traits that affect their experiences. The affective core [9], [10] of an individual acts as a primitive pleasure model modulating agency and intrinsic motivation. It impacts an individual's behaviour, given their temperament, that is, an inherent inclination that shapes up an individual's behaviour [11], influencing their subjective appraisal of the surroundings [12] as well as their decision-making [13]. Modelling such an affective core for robots can provide the necessary modulation for personality-driven behaviour learning.
Collaborative HRI scenarios also require modelling naturalistic interaction dynamics between humans and robots. Hence, achieving adaptability, such that a robot shows an improved and evolving understanding of user behaviour, becomes a principal objective. Exhibiting such 'personal ontogeny' [14] can also hint at the robot intelligently interacting with users. This work proposes the robot's affective appraisal as the basis of learning adaptive interaction behaviours. Different from existing approaches that mimic the user's expressions [15], [16], here we propose forming evolving intrinsic affective responses in the robot towards a user's affective state, providing for personalised and adaptive interaction capabilities. These affective responses are modelled as the robot's affective memory [17], that is, the affective impact of past interactions with a user, as well as its mood [18], [19], representing its affective appraisal. This affective appraisal is modulated by specifically modelled intrinsic personality traits, or the affective core of the robot, to learn appropriate robot behaviour while negotiating resources with users in the Ultimatum Game [20]. The main contributions of this work can be summarised as follows:
1) A deep, hybrid neural model is trained for robust multimodal affective appraisal, evaluating the user's facial expressions and speech. These evaluations help form the affective memory and the intrinsic mood of the robot.
2) An Affective Core for the robot, modelled using recurrent self-organising neural networks, is proposed to enforce distinct personality dispositions on the mood of the robot. Two influences are explored, namely, 'time perception' as the impact of the duration of interaction, and the robot's 'social conditioning' or emotional actuation, that is, the intensity with which it experiences an interaction.
3) The robot's mood is then used to learn an optimal negotiating behaviour in the Ultimatum Game [20]. An actor-critic-based [21] Reinforcement Learning (RL) model is proposed that learns to negotiate resources with users based on their affective responses to the robot's offers.
The multi-modal appraisal allows the robot to comprehend evolving human affective behaviour. The emergence of different personality dispositions, as a result of the affective core of the robot, modulates its intrinsic mood, forming the basis for learning robot behaviour. The Ultimatum Game is explored as it underlines the expectations from robots in collaborative HRI scenarios, particularly concerning adaptability and naturalistic interactions.
II. RELATED WORK
The affective impact of one's interactions with others plays an important role in human cognition [22]. The core affect in an individual forms a neurophysiological state [10] resulting from the interplay between the valence of an experience and the emotional arousal it invokes. This influences how people perceive situations and regulates their responses. From early stages of development, human behaviour is seen to be governed by such an affective core [9] that develops, initially, as a procedural understanding of their surroundings, and later, into a more cognitive representation that influences human agency and behaviour. Self-regulatory aspects of personality acquired as a result of interactions are essential for cognitive development and act as anchors for perception and understanding. Such individualistic attributes of temperament, evolving into personality, can be seen as the "basis for dispositions and orientations towards others and the physical world and for shaping the person's adaptations to that world" [11].
Understanding the evolution of human affective behaviour enables us to emulate such characteristics in social robots. It allows robots to ground intrinsic models of affect to improve their interaction capabilities. This section presents a brief overview on multi-modal affective appraisal (Section II-A) and behaviour synthesis (Section II-B) in social robots discussing different existing frameworks that use affective appraisal as the basis for modelling robot behaviour in HRI scenarios.

A. Multi-modal Affective Appraisal in Social Agents
Humans interact with each other using different verbal and non-verbal cues such as facial expressions, gestures and speech. Although various outward signals [23] can be observed to model emotion perception in agents, here we discuss facial expressions and auditory signals as the predominantly used modes of perception, and how these can be combined for multimodal affect perception in agents.

1) Facial Expression Recognition (FER):
Evaluating facial expressions is one of the most straightforward and commonly used approaches for affect perception. Facial expressions can be categorised into several emotional categories [24] or represented on a dimensional scale [23], [25]. Traditionally, computational models have used hand-crafted features such as shape-based, spectral or histogram-based analysis, and other feature-based transformations for affect perception (see [3], [4], [26] for a detailed analysis). More recently, deep learning has enhanced the performance of FER models by reducing the dependency on the choice of features and instead learning these features directly from the data [27], [28]. Although these work well in clean and noise-free environments, spontaneous emotion recognition in less controlled settings is still a challenge [3]. Thus, the focus has now shifted towards developing techniques that are able to recognise facial expressions in real-world conditions [29], [30], robust to movements of the observed person, noisy environments and occlusions [31].

2) Speech Emotion Recognition (SER):
Affective responses can also be evaluated using speech, either by processing spoken words to extract the sentiment behind them or understanding speech intonations. While spoken words convey meaning, paralinguistic cues enhance a conversation by highlighting the affective motivations behind these spoken words [32]. Despite providing information about the context and intent [33] in an interaction, it is difficult to deduce the emotional state of the individual using only linguistic information [34]. Extracting spectral and prosodic representations can help better analyse affective undertones in speech. Different studies on SER (see [35] for an overview) make use of representations such as Mel-Frequency Cepstral Coefficients (MFCC) or features like pitch and energy to evaluate expressed emotions. More recently, (deep) learning is employed to extract relevant features directly from the raw audio signals [36], [37].
3) Combining Modalities: As certain emotions are better expressed using facial expressions (or body gestures) while others are elucidated in speech [38], considering behavioural cues across multiple modalities has been shown to improve the perception capabilities of agents [39]. Most of the current approaches [37], [40] combine different modalities to recognise the emotions expressed by an individual. This combination can be achieved either using weighted averaging or majority voting [39] over individual modalities [41], or using feature-based sensor fusion [37], [42] and deep learning [40].

4) Intrinsic Representation of Affect:
For long-term adaptation, it is important that robots not only recognise affect but also model continually evolving intrinsic affective representations [43]. Kirby et al. [5] explore slow-evolving affect models such as moods and attitudes that consider personal history and the environment to estimate an affective state for the robot. Barros et al. [19] propose the formation of an intrinsic mood that uses an affective memory [17] of an individual as an influence over the spontaneous perception. The WASABI model [44] represents the intrinsic state of the robot on a PAD-scale that adapts as the agent interacts with the user. In the SAIBA framework [45], the agent's intrinsic state is modelled using mark-up languages that model intent in the robot and use it to generate corresponding agent behaviour. The (DE)SIRE framework [46] represents this intrinsic affect as a vector in a 4-d space for the robot which is then mapped to corresponding expressions across different modalities. Schröder et al. [47] create four different virtual characters to measure user engagement, with their behaviour corresponding to a particular quadrant of the arousal-valence space. Each character thus tries to invoke the same responses in the users as its inclination. Although all these approaches are able to provide necessary biases for modelling intrinsic personalities in agents, they require careful initialisation, across n-dimensional vector-spaces, to result in the desired effect. It would be beneficial if these intrinsic representations could be learnt dynamically by the agent as a result of its interactions.

B. Behaviour Synthesis in Social Agents
Recent works on behaviour learning in social agents investigate the role of affect as a motivation to interact with their environment. Such strategies may include affective modulation on computation of the reward function where explicit feedback from the user is shown to speed up learning [48]. Alternatively, affective appraisal can be viewed as an inherent quality of the robot, motivating it to interact with its environment [8], [49]. Affect is modelled as an evaluation of physiological changes (changing battery level or motor temperatures) that occur in the robot, with their behaviour influenced by homeostatic drives that lead towards a stable internal state [50]. Other approaches examine different cues such as novelty and the relevance of an action to the task to appraise the robot's performance [51]. In case of value-based approaches [52], the state-space of the robot is mapped onto different affective states and the value of any state represents the affective experience of the robot in that state. Reward-based approaches, on the other hand, consider temporal changes in the reward or the reward itself as the basis of the robot experiencing different affective states [53].
Proposing a framework for modelling naturalistic interactions, this work attempts to move beyond expression recognition tasks towards grounding evolving affective representations in the robot that help estimate its role in an interaction. Adapting the robot's intrinsic state in response to changing human behaviour will allow for smoother transitions during interactions. This will help avoid the pitfalls of frame-based expression recognition techniques that facilitate only static robot behaviours. Furthermore, affective appraisal contributing, not just to the intrinsic state of the robot but also acting as a reward for appropriate behaviour, is expected to improve the robot's ability to comprehend user responses, providing an evaluation of its own performance.

III. THE PROPOSED FRAMEWORK
In this paper, we propose an affective framework for modelling robot personality and behaviour generation, consisting of four main components (see Fig. 1). Firstly, we present the multi-modal perception (Section III-A) that observes the facial expressions and speech intonations of the user to form intrinsic models of affect in the robot. Continuously tracking the user's affective state to model an affective memory [17] of the user elicits an intrinsic response towards the user in the form of the robot's mood. Secondly, the affective core is proposed for the robot (see Section III-B) that models personality dispositions on its affective appraisal. Section III-C describes how the different intrinsic and extrinsic evaluations, such as users' expressions, the affective memory and the affective core of the robot, impact its mood. This intrinsic mood represents the robot's state and forms the basis for the behaviour learning model. Section III-D describes the RL model that enables the robot to learn to effectively negotiate with users in the Ultimatum Game scenario.

A. Multi-Modal Affect Perception
The affect perception model, adapted from [18], [19], consists of three components, namely the Multi-Channel Convolutional Neural Network (MCCNN) [54] for multi-modal feature extraction and fusion, the Perception-Growing-When-Required (GWR) network for prototyping extracted features to improve the robustness of the model to changing lighting conditions and variance within an individual's expressions, and the affective memory [17] that evaluates how the affective state of the user evolves during an interaction (see Fig. 1).
1) The MCCNN Network: MCCNN [18], [54] consists of two separate channels for processing facial and auditory information and then combines the learnt features into a combined representation. Rather than using categorical labels for classification, the model is adapted to represent affect in the form of the valence and arousal dimensions (see Fig. 1).
The face channel takes a (64 × 64) greyscaled mean-face image from every 12 frames (considering a 500 milliseconds window) recorded at 25 FPS. It consists of 2 convolutional (conv) layers, each followed by (2 × 2) max-pooling. The first layer performs (9 × 9) convolutions while the second consists of (7 × 7) filters using shunting inhibition [55] to obtain filters robust to geometric distortions. The conv layers are followed by a fully-connected (FC) layer consisting of 512 units.
The audio channel uses Mel-spectrograms computed for every 500 milliseconds of the audio signal, re-sampled to 16 kHz and pre-emphasised. A frequency resolution of 1024 Hz is used, with a Hamming window of 10 ms, generating Mel-spectrograms consisting of 64 bins with 65 descriptors each. The audio channel consists of two conv layers with filter sizes of (9 × 10) and (7 × 7), each followed by (2 × 2) max-pooling. The conv layers are followed by an FC layer with 512 units.
The FC layers from both the face and audio channels are concatenated into a single dense representation consisting of 1024 units and connected to another FC layer consisting of 200 units. This enables the network to be trained to extract features that are able to predict arousal and valence values by combining the two modalities (see Section IV-A1).
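As a quick sanity check on the layer sizes described above, the following sketch propagates the input shapes through both channels. It assumes stride-1, unpadded ("valid") convolutions and non-overlapping pooling, which the text does not state, and it assumes the Mel-spectrogram is laid out as 64 bins by 65 descriptors; the helper names are illustrative only.

```python
def conv_out(size, kernel):
    """Output length of a stride-1, unpadded ('valid') convolution (assumed)."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output length of non-overlapping max-pooling."""
    return size // window


def channel_shape(input_hw, kernels):
    """Propagate a (height, width) shape through conv + (2 x 2) pooling stages."""
    h, w = input_hw
    for kh, kw in kernels:
        h, w = pool_out(conv_out(h, kh)), pool_out(conv_out(w, kw))
    return h, w


# Face channel: (64 x 64) input, (9 x 9) then (7 x 7) filters.
face_map = channel_shape((64, 64), [(9, 9), (7, 7)])
# Audio channel: 64 bins x 65 descriptors (assumed layout), (9 x 10) then (7 x 7).
audio_map = channel_shape((64, 65), [(9, 10), (7, 7)])
# Concatenating the two 512-unit FC layers gives the fused representation.
fused_units = 512 + 512
# Duration covered by 12 frames at 25 FPS, in milliseconds.
window_ms = 12 * 1000 / 25

print(face_map, audio_map, fused_units, window_ms)
```

Under these assumptions the (9 × 10) audio filter turns the non-square spectrogram into a square feature map, so both channels end with feature maps of the same spatial size before their FC layers, and 12 frames at 25 FPS span 480 ms, which is close to the 500 ms window.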
2) Perception-GWR: Even though the MCCNN model can predict (using a 2-unit linear activation-based MCCNN output layer) the affective state of the user in terms of the arousal and valence it encodes, variance within an individual's expressions may result in different outputs for the same affective state. Thus, to allow for a more robust approach, it is beneficial to adopt a developmental view on affect perception that can account for the variance with which users express the same affective state [54]. We achieve this by using a Growing-When-Required (GWR) network [56] that incrementally builds feature representations as the model receives different inputs, accounting for the variance in audio-visual stimuli (see [19] for a detailed analysis). The extracted (fused) feature representations from the 200 unit FC layer are passed to the Perception-GWR which learns feature prototypes, in an unsupervised manner, that represent the users' expressions over the two modalities (see Section IV-A2). Thus, rather than considering the output of the MCCNN classifier, we extract the learnt feature prototypes from the Perception-GWR by taking the two winner neurons closest to the input. These winner neurons are then classified using the MCCNN-output layer into the encoded arousal-valence values.
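The winner-neuron lookup described above can be sketched as a nearest-prototype search. This is a minimal illustration that assumes Euclidean distance and uses toy 2-d prototypes in place of the 200-d fused features; `two_best_matching_units` is a hypothetical helper name, not part of the published model.

```python
import math


def two_best_matching_units(prototypes, x):
    """Return the indices of the two prototypes closest to input x.

    Euclidean distance is an assumption made for this sketch.
    """
    distances = [math.dist(p, x) for p in prototypes]
    order = sorted(range(len(prototypes)), key=distances.__getitem__)
    return order[0], order[1]


# Toy 2-d prototypes standing in for learnt 200-d feature prototypes.
prototypes = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]
first, second = two_best_matching_units(prototypes, (0.1, 0.1))
print(first, second)
```

In the full model, the two selected prototypes rather than the raw fused features would then be passed to the MCCNN output layer to obtain arousal-valence values.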
3) Affective Memory: To model long-term interactions, the robot needs to account for past interactions with users, forming a memory model that grows and adapts with them. Such an affective memory [17] (see Fig. 1), developing as the robot interacts with the user, forms an expectation model for the robot that can reduce the impact of sudden changes in perception due to misclassifications or noise. As users interact with the robot, 2 Best Matching Units (BMUs) or winner neurons from the Perception-GWR model, that is, feature prototypes for each 500 milliseconds of audio-visual input, are used to train the robot's affective memory (see Section IV-A2). This memory is modelled using a Gamma-GWR network [57] (explained in detail in Section III-B) consisting of neurons with recurrent connections remembering past interactions.

B. Modelling the Affective Core of the Robot
The Affective Core in humans acts as an emotional disposition, not just contributing towards their affective appraisal, but also governing their behaviour [9]. Similarly, an affective core for a robot can be used as the basis for inherent personality traits that may influence its perception and behaviour. This work explores the influence of two intrinsic qualities, namely time perception and the social conditioning of the robot, forming the affective core of the robot (see Fig. 1). While time perception refers to how the robot is impacted by the duration of an interaction, social conditioning accounts for the acculturation or emotional actuation of the robot as a result of its repeated interactions with affective stimuli. These qualities, amongst others, are also found to have an influence on personality formation in infants [11] resulting from engagement with caregivers.
We propose the use of Recurrent Gamma-GWR models [57], equipped with a Gamma-context memory [58], for modelling the affective core of the robot. Yet, rather than focusing on the temporal evolution of an expression, for example, onset to offset for a facial expression [3], we focus on tracking the evolution of the overall affective behaviour over several time-steps. The encoded arousal-valence values obtained by classifying the feature prototypes resulting from the perception model are examined over the entire duration of the interaction. To account for such temporal dynamics, each neuron is equipped with a fixed number of context descriptors which increase the temporal resolution of the model.
The learning rule and activation functions for the GWR model [56] are modified to account for activation of the neurons from the previous K (number of Gamma filters) time-steps. The BMU or winner neuron b is computed as follows:

$$b = \arg\min_{i} \, (d_i) \tag{1}$$

where $d_i$ is the distance of the neuron i from the data-point. The activation takes into account both the distance between the input and the weights at the current time-step and the context activation over the last K Gamma filters:

$$d_i = \alpha_w \, \lVert x(t) - w_i \rVert^2 + \sum_{k=1}^{K} \alpha_k \, \lVert C_k(t) - c^i_k \rVert^2 \tag{2}$$

where $x(t)$ represents the current input (in this case, the 200-d combined dense representation for the Perception-GWR and the affective memory, and the encoded 2-d arousal-valence value for the affective core), $w_i$ represents the weight vector of the i-th neuron, $\alpha_w$ and $\alpha_k$ are constants weighting the contributions of the current input and the past activations, respectively, and $C = [c^i_1, c^i_2, \ldots, c^i_K]$ is the set of context vectors for the i-th neuron, with $k = 1, 2, \ldots, K$ being the Gamma filter order. Global context $C_k(t)$ is given as:

$$C_k(t) = \beta \cdot w_{b(t-1)} + (1 - \beta) \cdot c^{b(t-1)}_{k-1} \tag{3}$$

where β controls the influence of the previous activation on the current processing of input, $b(t-1)$ is the winner neuron from the previous time-step and $c_0$ [...]

Once the BMU is selected, the weight of the winning neuron and the context vectors are updated as follows:

$$\Delta w_i = \epsilon_i \cdot \eta_i \cdot (x(t) - w_i) \tag{4}$$

$$\Delta c^i_k = \epsilon_i \cdot \eta_i \cdot (C_k(t) - c^i_k) \tag{5}$$

where $\epsilon_i$ is the learning rate that modulates the updates and does not decay over time. The firing counter $\eta_i$, on the other hand, is used to modulate learning [56]. It is initialised to 1 ($\eta_0 = 1$) and decreased according to the following rule:

$$\Delta \eta_i = \tau_i \cdot \kappa \cdot (1 - \eta_i) - \tau_i \tag{6}$$

where constants κ and $\tau_i$ control decay curve behaviour.

1) Interaction Time Perception: Starting from the same initial affective state, the robot, given its inherent time perception, maintains its intrinsic state for the entire duration of the interaction with a user. To simulate the impact of time, a decay function (y = exp(−τ t)) is implemented that modulates the affective state of the robot at any given time.
For simulating patience, the decay is slow and gradual (τ = 0.01), allowing the robot to maintain its affective state for a longer duration, while in the case of impatient time perception, this decay is rapid (τ = 0.08). The decay function dynamics can be seen in Fig. 2a and Fig. 2b, respectively. The empirical choice of τ values ensures smooth decay curves over a minimum of 90 time-steps. These values can be adapted as per the desired impact of time perception in the robot.
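To make the difference between the two settings concrete, the sketch below evaluates the decay function y = exp(−τ t) for both τ values at a few time-steps. The constant and function names are illustrative; only the formula and the τ values come from the text.

```python
import math

PATIENT_TAU = 0.01    # slow, gradual decay
IMPATIENT_TAU = 0.08  # rapid decay


def decay(t, tau):
    """Decay factor y = exp(-tau * t) at time-step t."""
    return math.exp(-tau * t)


for t in (0, 30, 60, 90):
    print(t, round(decay(t, PATIENT_TAU), 4), round(decay(t, IMPATIENT_TAU), 4))

patient_at_90 = decay(90, PATIENT_TAU)
impatient_at_90 = decay(90, IMPATIENT_TAU)
```

After 90 time-steps the patient factor is still roughly 0.41, whereas the impatient factor has fallen below 0.001, so the impatient robot's intrinsic state has effectively vanished by then.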
Given a patient or impatient modulation, an affective core bias is modelled using the Gamma-GWR model. The initial state of the robot is set to a mean positively excited arousal, valence values of (0.5, 0.5) and then modulated over time using the decay function y. At each time-step, the Gamma-GWR model receives this modulated input state and forms intrinsic prototypes following the process described in Eq. 1−6. This models the decay dynamics of the robot's intrinsic state at different time-steps, forming a time perception bias that encodes patient (Fig. 2c) or impatient (Fig. 2d) behaviour.
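The training loop described above can be sketched as follows. This is a heavily simplified illustration, not the authors' implementation: it follows the commonly presented Gamma-GWR update rules (context-weighted BMU selection, weight and context updates scaled by a firing counter), omits neuron insertion and edge management, treats the zeroth context of a neuron as its weight vector, multiplies the initial state by the decay factor, and uses hypothetical parameter values throughout.

```python
import math


class Neuron:
    def __init__(self, weight, num_contexts):
        self.w = list(weight)
        self.c = [[0.0] * len(weight) for _ in range(num_contexts)]
        self.eta = 1.0  # firing counter, initialised to 1


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def global_context(prev_bmu, num_contexts, beta, dim):
    """Blend the previous winner's weight with its lower-order context."""
    if prev_bmu is None:
        return [[0.0] * dim for _ in range(num_contexts)]
    context = []
    for k in range(num_contexts):
        # Assumption: the zeroth context of a neuron is its weight vector.
        lower = prev_bmu.w if k == 0 else prev_bmu.c[k - 1]
        context.append([beta * w + (1 - beta) * l for w, l in zip(prev_bmu.w, lower)])
    return context


def step(neurons, x, prev_bmu, alpha_w=0.5, alphas=(0.3, 0.2),
         beta=0.7, eps=0.1, tau=0.3, kappa=1.05):
    """One Gamma-GWR-style step: select the BMU, then update it in place."""
    C = global_context(prev_bmu, len(alphas), beta, len(x))

    def activation_distance(n):
        past = sum(a * sq_dist(C[k], n.c[k]) for k, a in enumerate(alphas))
        return alpha_w * sq_dist(x, n.w) + past

    bmu = min(neurons, key=activation_distance)
    rate = eps * bmu.eta
    bmu.w = [w + rate * (xv - w) for w, xv in zip(bmu.w, x)]
    for k in range(len(alphas)):
        bmu.c[k] = [c + rate * (cv - c) for c, cv in zip(bmu.c[k], C[k])]
    bmu.eta += tau * kappa * (1 - bmu.eta) - tau
    return bmu


# Feed the decayed (arousal, valence) state of a patient robot for 90 steps.
neurons = [Neuron((0.5, 0.5), 2), Neuron((0.0, 0.0), 2)]
prev = None
for t in range(90):
    factor = math.exp(-0.01 * t)  # multiplicative modulation is an assumption
    prev = step(neurons, (0.5 * factor, 0.5 * factor), prev)
```

Swapping the decay constant for 0.08 would drive the inputs towards zero much sooner, which is the mechanism by which the two time-perception biases end up encoding different prototypes.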
2) Social Conditioning: Social conditioning of the robot can be used to formulate anchors for its affective appraisal. The robot, through continued and repeated interaction with affective stimuli, can get acculturated, developing qualities central to its personality. Such conditioning can be excitatory (high-arousal), amplifying the impact of perception, or inhibitory (low-arousal), diminishing it. To model such influences, the robot is shown videos (see Section IV-B for details) encoding different emotional intensities. To model an excitatory effect, videos encoding high arousal are used whereas, for the inhibitory core, low arousal videos are used.

C. Mood Formation for the Robot
The intrinsic mood forms the affective appraisal of a robot, estimating its intrinsic state during interactions. Given past experiences with a user, the robot monitors their affective behaviour to conclude their affective state. This input, modulated by the robot's affective core, elicits an emotional response in the robot in the form of its mood (see Fig. 1), acting as the motivation for subsequent interactions.
In this work, we model the robot's mood as a Gamma-GWR [57] (following Eq. 1−6) that evaluates the current behaviour of the user (affect perception), modulated by past experiences (affective memory) to form an intrinsic affective response in the robot towards the user. This is further influenced by the robot's affective core. All these inputs are processed asynchronously to allow for the evolution of the robot's mood even when they are sparsely available. This results in the robot forming an organic affective response towards the user rather than merely mimicking them. Different affective core biases in the robot result in the same stimulus being evaluated differently. For example, a patient robot with an excitatory conditioning will be able to retain its positive mood for longer, despite receiving a series of negative inputs. This is important as it can be used to integrate different personality traits in the robot, with different combinations of the affective core influences (see Table II) yielding significantly different mood estimates.

D. Learning Robot Behaviour
The affective appraisal of the robot, under specific affective core traits, can help generate different robot behaviours in collaborative HRI scenarios. In this work, we explore the Ultimatum Game [20] to embed different negotiating behaviours in the robot, given its affective core. We propose a Deep Deterministic Policy Gradients (DDPG)-based actor-critic model [21] that learns to interact with human participants, incorporating the robot's mood both in the state-value function and in the reward received by the robot. The proposed model aims to evaluate how the robot, given its personality traits, can learn to successfully negotiate resources with human participants. Furthermore, evaluation of the robot by the participants under different affective core conditions can highlight the contribution of the robot's personality towards its negotiation capabilities.
1) The Ultimatum Game: The traditional design for the Ultimatum Game [20] involves two participants namely, a proposer and a respondent, negotiating a split of resources (usually money). The proposer offers a split, based on which the respondent either accepts or rejects the offer. Only if the offer is accepted, resources are shared as per the agreed split.
We extend this design by incorporating a negotiation between the participants (see Fig. 4) and the Neuro-Inspired Companion (NICO) robot [59] acting as the proposer. NICO and each participant are given 100 points that can be exchanged for 20 bonbons, with every 5 points fetching them one bonbon. Bonbons are used as they give a visual motivation for the negotiation. As NICO makes offers to the respondent, if they accept the offer, the interaction culminates with both receiving the agreed split. In case of a rejection, NICO asks the participants for the reasons for their rejection and, based on their affective responses, appraises their affective state as an evaluation of the offer, eliciting a change in its own mood. This mood (or the change in it) is used to update its offer in a way that the participant may accept the new offer, without NICO losing a lot of points. The participant and NICO thus negotiate a split of the 100 points with NICO updating its offer upon each rejection. To ensure that negotiations come to a conclusion, the negotiation is aborted with no one getting any points after the participant rejects 20 consecutive offers.
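The rules of the extended game can be summarised in a short simulation. The sketch below is only an illustration of the protocol: the respondent and the offer-update rule are hypothetical stand-ins (a fixed acceptance threshold and a fixed 5-point increment), whereas in the study the respondent is a human participant and the update comes from the learnt model.

```python
POINTS_TOTAL = 100
POINTS_PER_BONBON = 5
MAX_REJECTIONS = 20


def bonbons(points):
    """Convert points to bonbons (one bonbon per 5 points)."""
    return points // POINTS_PER_BONBON


def negotiate(initial_offer, update_offer, respondent_accepts):
    """Run one negotiation; `offer` is the number of points offered to the participant."""
    offer = initial_offer
    rejections = 0
    while rejections < MAX_REJECTIONS:
        if respondent_accepts(offer):
            return {"robot": POINTS_TOTAL - offer, "participant": offer,
                    "rejections": rejections}
        rejections += 1
        offer = max(0, min(POINTS_TOTAL, update_offer(offer)))
    # Aborted after 20 consecutive rejections: nobody gets any points.
    return {"robot": 0, "participant": 0, "rejections": rejections}


# Hypothetical respondent and update rule, for illustration only.
deal = negotiate(20, lambda o: o + 5, lambda o: o >= 40)
aborted = negotiate(20, lambda o: o, lambda o: False)
print(deal, bonbons(deal["robot"]), bonbons(deal["participant"]))
```

With these stand-ins the participant accepts the fifth offer, leaving the robot with 60 points (12 bonbons) and the participant with 40 points (8 bonbons), while a respondent who never accepts triggers the abort rule.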
2) Learning to Negotiate: While negotiating with the participants, the intrinsic mood of the robot after each rejection, concatenated with the rejected offer value, is mapped to the state-space of the robot to generate actions in the form of increments or decrements on the previous offer. This results in a continuous, high-dimensional action-space making the use of traditional Q-learning approaches difficult as they become intractable in such high-dimensional spaces [21]. Also, it is desirable that these updates to the offer are not modelled as fixed increments or decrements to enable a more naturalistic negotiation between participants and NICO. Thus, this work employs a DDPG-based actor-critic model [21] to learn an optimal negotiating behaviour.
The model consists of two separate models (see Fig. 5) for the actor and the critic, respectively. The actor takes the robot's current mood (mean arousal-valence vector computed from all the neurons of the mood Gamma-GWR model (see Section III-C)) as well as the previously rejected offer as inputs and concatenates them into a single 4-tuple representing the state of the robot. This state is passed to the actor, predicting a real-valued update over the previous offer.
The critic network takes the current state of the robot as well as the actor-generated update value as inputs to evaluate the actor's performance, predicting a Q-value ∈ R for the state-action pair. This predicted Q-value and the reward received by the robot (see Section IV-C for details) are used to update both the critic and the actor [21].
The actor and critic are modelled as Multilayer Perceptron (MLP) networks (see Fig. 1). The actor network takes as input the current state of the robot (4−tuple). This input is connected to a FC layer consisting of 50 units which is further connected to an output neuron, predicting real-valued updates on the offer. For the critic, the state 4−tuple and the predicted update value are connected to individual FC layers of 50 units each. These FC layers are then concatenated and connected to another FC layer of 10 units combining the representations. Finally, a single output neuron predicts the Q-value ∈ R for the state-action pair.
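The layer layout described above can be sketched as a forward pass in plain Python. This is only an illustration of the network shapes: the ReLU hidden activations, the linear outputs, the weight initialisation, and the reading of the 4-tuple as (arousal, valence, robot share, participant share) are assumptions not stated in the text, and training (the DDPG updates) is omitted entirely.

```python
import random


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for a fully-connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer, relu=True):
    """Fully-connected layer; ReLU is an assumed hidden activation."""
    weights, biases = layer
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def mean_mood(neuron_values):
    """Mean (arousal, valence) over all neurons of the mood model."""
    n = len(neuron_values)
    return (sum(a for a, _ in neuron_values) / n,
            sum(v for _, v in neuron_values) / n)


class Actor:
    def __init__(self, rng):
        self.hidden = init_layer(rng, 4, 50)
        self.out = init_layer(rng, 50, 1)

    def __call__(self, state):
        return dense(dense(state, self.hidden), self.out, relu=False)[0]


class Critic:
    def __init__(self, rng):
        self.state_fc = init_layer(rng, 4, 50)
        self.action_fc = init_layer(rng, 1, 50)
        self.joint_fc = init_layer(rng, 100, 10)
        self.out = init_layer(rng, 10, 1)

    def __call__(self, state, action):
        joint = dense(state, self.state_fc) + dense([action], self.action_fc)
        return dense(dense(joint, self.joint_fc), self.out, relu=False)[0]


rng = random.Random(0)
actor, critic = Actor(rng), Critic(rng)
mood = mean_mood([(0.4, 0.2), (0.2, 0.0)])  # toy neuron values
state = list(mood) + [0.7, 0.3]             # assumed: offer as a normalised split
update = actor(state)
q_value = critic(state, update)
```

The critic keeps separate 50-unit branches for the state and the action before merging them, so the scalar action is not swamped by the 4-d state at the input.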

A. Multi-Modal Affect Perception
1) The MCCNN Network: Training the MCCNN model requires multi-modal datasets that provide good quality samples for both vision and speech modalities with continuous arousal-valence annotations. Most of the available multi-modal datasets rely on the visual information as the dominant modality deciding affective labels. This is seen in the Aff-Wild [29] and AFEW-VA [30] datasets where the audio samples are affected by background music or noise. On the other hand, datasets like RAVDESS [60] and SAVEE [61] provide clean audio and video samples but use categorical labelling.
Thus, we pre-train the face-channel of the MCCNN on the combined Aff-Wild and AFEW-VA datasets with normalised arousal-valence labels ∈ [−1, 1] for each frame. The face-channel is trained with a 60:20:20 (train, validation, test) data split, reaching competitive Concordance Correlation Coefficient (CCC) scores of 0.68 for arousal and 0.57 for valence (compared to baselines [29], [30]). The face-channel model is then applied to facial images from the RAVDESS and SAVEE datasets, generating arousal and valence labels. These labels are then used to train the combined MCCNN network using audio-visual information. This approach is inspired by Lakomkin et al. [62], who conclude that augmenting datasets using labels from one modality contributes positively towards improving the overall performance of the model.
For the Perception-GWR, the habituation threshold (see Table I for details) controls the frequency of weight updates while the insertion threshold controls when a new neuron needs to be added. This results in a total of 458 neurons which sufficiently represent the entire training set (≈ 20k data points). These neurons act as feature prototypes for the entire dataset, enabling a robust evaluation of the arousal-valence represented in the data samples. Fig. 1 shows the Perception-GWR with each neuron plotted according to the arousal and valence it encodes. The choice of the different thresholds is determined empirically, given the resultant GWR's ability to represent the training set.
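For reference, the CCC used above to score the face-channel can be computed as in the following minimal sketch. It uses the standard definition with population (biased) variance and covariance; the sample values are hypothetical and are not taken from any of the datasets.

```python
def concordance_cc(predictions, targets):
    """Concordance Correlation Coefficient between two equal-length sequences.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2)
    """
    n = len(predictions)
    if n == 0 or n != len(targets):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    cov = sum((p - mean_p) * (t - mean_t)
              for p, t in zip(predictions, targets)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


# Hypothetical arousal annotations and predictions in [-1, 1].
targets = [-0.5, 0.0, 0.5]
shifted = [0.0, 0.5, 1.0]  # perfectly correlated but offset by 0.5
ccc_shifted = concordance_cc(shifted, targets)
```

Unlike the Pearson correlation, the CCC penalises systematic offsets, so the shifted predictions in the example score below 1 even though they are perfectly correlated with the targets.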
The affective memory Gamma-GWR model consists of 10 context descriptors, implementing a temporal resolution of 10 time-steps. The model is trained following Eqs. 1-6. The chosen parameters (see Table I) allow the model to map, and remember, the affective context for at least one complete interaction (5-8 seconds). Fig. 1 shows the affective memory for the user with each neuron plotted according to the arousal and valence it encodes. The insertion and habituation thresholds control the update of existing neurons and add new neurons only when needed. A separate affective memory is created for each user interacting with the robot.

B. Mood Formation under Affective Core Influence
To evaluate the impact of the different affective core influences on the mood formation of the robot, 20 videos each from the KT Emotion Dataset [17] and the OMG-Emotion Dataset [63] are selected, as both datasets consist of clean audio-visual samples encoding different affective contexts. Each video is split into data-chunks representing 500 milliseconds of audio-visual information. The pre-trained face detector from the Dlib Python library is used for extracting faces, while the SciPy signal processing library is used to generate mel-spectrograms for each data-chunk. The data is input sequentially to the MCCNN and Perception-GWR for feature extraction and representation, while the different Gamma-GWR networks model the affective memory and affective core biases. The mood Gamma-GWR model is trained for 10 epochs, taking as input, for every 500 milliseconds of audio-visual input, 2 BMUs from the Perception-GWR encoded into the arousal-valence values they represent, the mean arousal-valence vector from the affective memory, 5 BMUs from the social conditioning Gamma-GWR and 2 BMUs from the time perception Gamma-GWR. Different combinations of affective core biases are explored to evaluate how these influence mood formation in the robot. A two-sided Mann-Whitney U test shows significant differences (see Table II) in the resultant mood under different affective cores, compared to when no affective core bias is used. The model is shown the same video sequences, changing only the affective core between repetitions. Keeping all other variables constant, any change in the resultant mood can be attributed to the affective core.
The model, for the same input stimuli, is seen to form different estimates of its intrinsic mood under different affective core biases (see Fig. 6 for arousal and Fig. 7 for valence distributions). The affective appraisal of the robot under the different affective core biases is compared to the No Core condition which considers only the agent's current perception and its affective memory for mood formation.
The arousal values show more deviation from the baseline due to the excitatory or inhibitory effect of the affective core. Since these biases predominantly affect the intensity of the robot's intrinsic mood, the corresponding plots for the valence show much less deviation. On the other hand, the patient or impatient biases impact both the valence and arousal.
This effect is validated by conducting a two-sided Mann-Whitney U test [64] on the resultant mood estimates (arousal-valence) under different affective core biases, with the alternative hypothesis that the resultant mood is different from the No Core condition. The robot's mood shows significantly different arousal and valence distributions (see Table II) for the same input. Even though the input videos are chosen to cover different affective contexts, the intrinsic mood of the agent is influenced by the respective affective core. For a social robot, this means that rather than mimicking the user, the robot, true to its intrinsic personality traits, can formulate distinct affective responses towards the user.
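To make the comparison concrete, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It uses the normal approximation with tie and continuity corrections, which is an assumption: the text does not state how the p-values were obtained, and an exact test may be preferable for small samples. The two samples are hypothetical arousal values.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided test; returns (U statistic of x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction for the variance of U.
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0.0:
        return u1, 1.0  # all observations identical
    # Continuity-corrected z-score; clipped at 0 so that p never exceeds 1.
    z = max((abs(u1 - n1 * n2 / 2.0) - 0.5) / sigma, 0.0)
    p_value = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
    return u1, p_value


# Hypothetical arousal estimates under two affective core conditions.
no_core = [0.01 * i for i in range(20)]
with_core = [0.5 + 0.01 * i for i in range(20)]
u_stat, p = mann_whitney_u(no_core, with_core)
```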

C. Pre-Training Robot Behaviour Model Off-line
The intrinsic mood of the robot is used as the motivation to learn different robot behaviours in the Ultimatum Game. As a negotiation might be short-lived and may not provide sufficient examples to train an RL model, we pre-train the model using a probability-based reward function. The acceptance of an offer is modelled in a stochastic manner (see Eq. 7) based on the fraction of the resources being offered to the participant.
The reward function models two competing goals for the robot: keeping a higher share of resources for itself and, at the same time, eliciting a positive response from the respondent. The two components of the reward are explained as:
• The offer reward provides intermediate positive rewards if the new offer increases the respondent's share while keeping the robot's share > 50%. These rewards smoothen the learning curve, guiding the robot to an optimal behaviour.
• The mood reward computes a (cosine) distance measure between the previous and the new mood state of the robot and rewards a positive change in the robot's mood. As the robot's mood reflects the respondent's affective state, it learns to evoke positive responses to its offer.
The DDPG model is pre-trained using data samples generated by processing 20 videos from the KT Emotion dataset through the MCCNN and Perception-GWR models and selecting the two BMUs. This is augmented by adding randomly generated arousal-valence vectors to cover the entire state-space. A total of 500 random samples are added, drawn from a standard normal distribution sliced to the range [−1, 1]. To match video dynamics, each added sample undergoes an interaction decay (forming a trajectory) simulating robot mood. This decay emulates affective responses from a respondent who witnesses consecutive unfair offers from the robot and rejects them.
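The two reward components described above can be sketched as follows. This is an illustration under stated assumptions rather than the reward used in the experiments: offers are expressed as the respondent's fraction of the points, a "positive change" in mood is taken to mean an increase in valence, and the bonus value, the equal weighting and the function names are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def offer_reward(prev_offer, new_offer, bonus=1.0):
    """Reward a raised offer while the robot keeps more than half.

    Offers are the respondent's share as a fraction in [0, 1] (assumption).
    """
    robot_share = 1.0 - new_offer
    if new_offer > prev_offer and robot_share > 0.5:
        return bonus
    return 0.0


def mood_reward(prev_mood, new_mood):
    """Reward a positive mood change, scaled by the cosine distance.

    Moods are (arousal, valence) pairs; treating higher valence as a
    'positive change' is an assumption of this sketch.
    """
    improved = new_mood[1] > prev_mood[1]
    return cosine_distance(prev_mood, new_mood) if improved else 0.0


def total_reward(prev_offer, new_offer, prev_mood, new_mood,
                 w_offer=0.5, w_mood=0.5):
    """Weighted sum of both components (weights are hypothetical)."""
    return (w_offer * offer_reward(prev_offer, new_offer)
            + w_mood * mood_reward(prev_mood, new_mood))


# Raising the offer from 20% to 30% while the mood turns positive.
example = total_reward(0.2, 0.3, (0.5, -0.5), (0.5, 0.5))
```

With these placeholder weights, the two goals pull against each other once the respondent's share approaches 50%: further increases can still improve the mood term but no longer earn the offer bonus.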
The model balances both the offer and mood rewards and converges to offering 40-60% of the points, yielding an optimal reward for the robot (see Fig. 8b). The average number of interactions reduces to ≈ 10 as the model learns to find an optimal offer that faces fewer rejections (see Fig. 8a).

D. User Study
To investigate whether the affective core of the robot results in different negotiation strategies in the Ultimatum Game, we conducted a user study that assessed how different participants evaluate NICO's behaviour (realised using the pre-trained RL model). Furthermore, quantitative performance factors like success-rate (acceptance of the robot's offer), mean accepted offer value, and the number of interactions are also evaluated.
The user study was conducted with 31 participants (20 male, 11 female) from 16 countries in the age group of 18-49. All participants, recruited amongst university students and employees, reported conversational proficiency in English (the language used to model interactions). The participants were briefed about the objectives of the experiment and the interaction procedure, and they provided informed consent for their participation. The consent form and the experiment protocol were approved by the Ethics Commission of the Department of Informatics, University of Hamburg.
The experiment set-up (see Fig. 4) consists of an artificially well-lit room to exclude effects of changing natural lighting conditions. The participants and NICO are positioned on opposite sides of a round table. Bonbons are placed on the table along with a microphone.
1) Experiment Conditions: The user study is conducted as a between-group study with two condition groups. Each group consists of two sub-conditions implementing the No Core condition as the baseline, along with one of the measured conditions. In the No Core condition, the robot is not embedded with any affective core and considers only the perception input for its intrinsic mood. The two condition groups are:
• Patient High-arousal: In this group, the measured condition involves the robot with patient time perception and excitatory social conditioning biases to influence its mood formation. A total of 16 participants were randomly assigned to this condition group.
• Impatient Low-arousal: In this group, the measured condition involves the robot embedded with an impatient time perception and inhibitory social conditioning bias that influence mood formation. The second condition group consisted of 15 randomly assigned participants.
2) Experiment Protocol: Once the participants are assigned to a condition group, they are introduced to the experiment set-up where NICO greets them with a short interaction informing them about the rules of the game. The Google Text-To-Speech Python library is used to generate NICO's voice. During this interaction, NICO asks the participants about their excitement towards participating in the experiment. With this, it builds a model of its affective memory and intrinsic mood as a starting point for both sub-conditions. After the introduction round, NICO starts negotiating with the participant, randomly loading the first sub-condition. The negotiations consist of two distinct phases:
• Offer Phase: NICO makes an offer to the participants which they can accept (saying 'Yes') or reject (saying 'No'). A rejection results in NICO asking them to explain their rejection while monitoring their affective responses as they describe their opinion about the offer.
• Update Phase: Observing the participants' responses, NICO models its mood as an evaluation. The mood represents the current state of the robot and is used to compute a new offer for the participants.
Each condition begins with a random unfair offer of at most 20 points to the respondent, to make sure that the participants are inclined to negotiate at least once with the robot. Negotiations continue until the participant either accepts the offer or rejects 20 consecutive times (empirically defined limit). The participants are told that the robot shall abort the negotiation if a stalemate is reached, blinding them from this limit to avoid any behavioural conditioning.
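The protocol above amounts to a simple loop over the Offer and Update phases, sketched below. The callbacks `respond` and `update_offer` are hypothetical stand-ins for the participant's response (together with the mood NICO forms from it) and for the pre-trained RL model. Expressing offers out of 100 points and using 1 as the lowest opening offer are assumptions of this sketch.

```python
import random

MAX_REJECTIONS = 20   # empirically defined limit from the protocol
MAX_FIRST_OFFER = 20  # the opening offer gives the respondent at most 20 points
TOTAL_POINTS = 100    # assumption: offers are expressed out of 100 points


def negotiate(respond, update_offer, rng=None):
    """Run one sub-condition of the negotiation.

    respond(offer) -> (accepted, mood): hypothetical participant model.
    update_offer(mood, offer) -> new offer: hypothetical stand-in for the RL model.
    Returns (accepted, final_offer, rejections).
    """
    rng = rng or random.Random()
    # Random unfair opening offer; the lower bound of 1 is an assumption.
    offer = rng.randint(1, MAX_FIRST_OFFER)
    rejections = 0
    while True:
        # Offer Phase: the participant accepts or rejects the current offer.
        accepted, mood = respond(offer)
        if accepted:
            return True, offer, rejections
        rejections += 1
        if rejections >= MAX_REJECTIONS:
            return False, offer, rejections  # stalemate: abort the negotiation
        # Update Phase: the mood after the rejection drives the next offer.
        offer = max(0, min(TOTAL_POINTS, update_offer(mood, offer)))


def toy_respond(offer):
    """Toy participant: accepts from 40 points upwards, with a neutral mood."""
    return offer >= 40, (0.0, 0.0)


def toy_update(mood, offer):
    """Toy policy: raise the offer by 5 points after every rejection."""
    return offer + 5


result = negotiate(toy_respond, toy_update, random.Random(1))
```

A participant model that never accepts ends the loop after exactly 20 rejections, which corresponds to the abort case in the protocol.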
After each sub-condition, the participants fill out a pseudonymised 3-part questionnaire about their experience with the robot to measure any reported difference in robot behaviour between the two sub-conditions. Finally, participants are debriefed and informed about the condition they were assigned to. In the absence of any monetary compensation, as a reward for participation, they are offered all the bonbons.
3) Quantitative Results: For a quantitative evaluation (see Table III) of the robot's performance under different conditions, several factors are examined. The success rate denotes the fraction of participants that accepted the robot's offer. The average number of interactions denotes the number of rejections, on average, before an offer was accepted, while the average accepted offer is the mean value of the offers that were accepted. The fraction of offers where the participants were offered 50% or more of the points by the robot is also reported.
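Differences between conditions on these factors are expressed as Hedges' G effect sizes. A minimal sketch of the standard computation is given below, using the pooled standard deviation and the common approximate small-sample correction 1 − 3/(4(n1 + n2) − 9). Whether this approximation or the exact correction factor was used in the study is not stated, and the interaction counts in the example are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(a, b):
    """Hedges' g between two independent samples (a minus b)."""
    n1, n2 = len(a), len(b)
    # Pooled variance from the unbiased sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    cohens_d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    # Approximate small-sample bias correction.
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return cohens_d * correction


# Hypothetical numbers of interactions in two conditions.
condition_a = [2, 4, 6]
condition_b = [1, 3, 5]
g = hedges_g(condition_a, condition_b)
```

The sign of the result indicates the direction of the difference, which is why an effect "in the other direction" appears with the sample order swapped.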
The Patient High-arousal condition, on average, took longer than the baseline condition to get the participant to accept an offer, with a large effect size (G = 0.77) measured using Hedges' G. The Impatient Low-arousal condition, however, needed fewer interactions than the baseline condition, with a medium effect size (G = 0.44) in the other direction. Comparing the two measured conditions directly thus shows a large effect size (G = 0.87) for the number of interactions. Furthermore, under the Impatient Low-arousal condition, the robot reached an offer ≥ 50% of the points for 80% of the participants, compared to 62% for the Patient High-arousal condition. Despite reaching a higher offer more often, the success rate and the mean accepted offer for the Impatient Low-arousal condition were lower than those for the Patient High-arousal condition. As participants received progressively more points in the Impatient Low-arousal condition, they frequently exhausted the 20 offers, expecting the robot to increase the offer further. This observation is supported by the mean offer value before aborting being higher for the Impatient Low-arousal condition, with a large effect size (G > 2.0) between the two measured conditions.
4) Qualitative Results: Since the participants' subjective evaluation of the robot's negotiation strategy influences their acceptance or rejection, quantitative factors provide only partial information about the robot's overall performance. Thus, participants' evaluations on the 3-part Likert-scale questionnaire, based on the GODSPEED [65], Mind Perception [66] and Asch's Personality Impression tests [67], are examined.
As participants evaluate each measured condition with respect to the baseline (No Core) condition, the two measured conditions can be compared directly only if the baseline sub-condition is evaluated the same way in the two groups. A two-sided Mann-Whitney U test shows no significant difference (p > 0.05) in any dimension between the two baselines in any questionnaire. This allows the two measured sub-conditions to be compared to each other directly.
(a) GODSPEED: The GODSPEED test [65] is used to measure participants' impression of the robot on anthropomorphism, animacy, likeability, perceived intelligence and perceived safety. A one-sided Mann-Whitney U test is conducted for all dimensions with the alternative hypothesis that the Impatient Low-arousal condition is rated higher than Patient High-arousal. The results show no significant differences (p > 0.05) in any dimension, despite some evidence for the robot being rated as more natural (U = 154.5, p = 0.07), humanlike (U = 158.0, p = 0.053) and conscious (U = 158.0, p = 0.061) under the Impatient Low-arousal condition.
(b) Mind Perception: The Mind Perception test [66] measures agency and experience for attributing a mind to an entity (in this case, NICO). The robot is evaluated on its ability to experience fear, exercise self-control, feel pleasure, remember the participant, feel hunger and act morally. Based on these factors, the robot's agency and experience under different conditions are assessed. A one-sided Mann-Whitney U test is conducted with the alternative hypothesis that the Impatient Low-arousal condition is rated higher on agency and experience, with no significant difference (p > 0.05) found between the two conditions.
(c) Asch's Formation of Impressions of Personality: Asch's study [67] measures the impact of independent behavioural traits on the overall impression of an individual. Here, participants evaluate NICO on 10 different parameters. Their impressions of the robot under the two measured conditions can be seen in Fig. 9. For all dimensions except wisdom and persistence, the Impatient Low-arousal condition is rated higher; for these two dimensions, the Patient High-arousal condition is rated higher. A one-sided Mann-Whitney U test conducted on all dimensions shows significant results (p < 0.05) in the generous (U = 67.0, p = 0.018) and altruistic (U = 74.0, p = 0.033) dimensions in favour of the Impatient Low-arousal condition (see Table IV), and in the persistence (U = 75.0, p = 0.034) dimension in favour of the Patient High-arousal condition. Despite some evidence supporting the alternative hypothesis for the good-natured dimension (U = 79.0, p = 0.052), no other conclusions can be drawn.
Fig. 9: Asch's Test results with mean and 95% CI for individual dimensions comparing the two measured conditions.

V. DISCUSSION
This work explores a robot's appraisal to ground evolving affective representations that not only consider the behaviour of the participant during an interaction (see Section III-A), but also understand its impact on the conversation (see Section III-C), learning how to respond to it (see Section III-D). This is guided by the personality traits of the robot (see Section III-B) which have a significant impact on its affective appraisal.
Quantifying the affective impact of the duration of an interaction is beneficial for a robot, particularly in collaborative HRI scenarios. A patient time perception can be helpful in dealing with negative or even aggressive situations as it allows the robot to maintain a positive outlook during the interaction. This can be beneficial for robots acting as companions for humans in different collaborative scenarios, such as being caretakers for the elderly and teachers for the young. Conversely, impatience results in a significantly lower intrinsic state of the robot, rapidly decaying its mood as the interaction progresses. This may enhance spontaneity in robot behaviour as it finds ways to resolve a negotiation quickly, to avoid negative intrinsic states.
Interactions with high intensity cause the robot to form excitatory (or high-arousal) tendencies that amplify its affective state. While interacting with the users, the robot is easily excitable, experiencing every situation in the extreme. An inhibitory (or low-arousal) conditioning, on the other hand, results in a subdued behaviour of the robot, diminishing the impact of affective interactions and adopting an inert approach towards its interaction with the users. Combining time perception and social conditioning allows for modelling specific personality dispositions in the robot, with the two influences either complementing each other (for example, Patient High-arousal and Impatient Low-arousal) or contrasting each other (for example, Patient Low-arousal and Impatient High-arousal). These conditions have a distinct impact on the affective appraisal of the robot (see Table II) as the resultant mood does not merely mimic the user's affective state but reflects the robot's intrinsic dispositions.
The robot's intrinsic mood, modulated by specific affective core dispositions as well as its history with a user, governs its negotiations in the Ultimatum Game. In our experiments, the Patient High-arousal robot is observed to stand its ground longer, driving a hard bargain with users, while the Impatient Low-arousal robot is more giving and generously offers more points. This is highlighted in the quantitative analysis (see Section IV-D3) of the robot as well as the subjective evaluations by the participants (see Section IV-D4).
During interactions, based on the robot's behaviour, the participants were observed adopting different negotiating strategies. While some approached the interaction in a more commanding role, strongly arguing with the robot to yield, others followed a fawning approach, trying to manipulate the robot by smiling more often and requesting more points. Both strategies, given the experiment condition and the expressiveness of the participants, worked to some extent, with the robot offering as high as 52% of the points. Furthermore, at the beginning of the interactions, some participants were more conscious and distant, but as the interaction progressed, they became more open and proactive in the interaction. This is seen in the reasoning they provided for their rejections, which moved from a cold and direct "I want more points" to a later, more expressive and layered "Come on, NICO. This isn't fair. You can do better". This suggests that as the interaction progressed, the robot was able to engage the users. It exhibited responsiveness towards the users' negotiating strategies, initially yielding to their demands for more points but, as the interaction progressed, adapting its negotiation strategy and encouraging the users to adapt as well.
Despite the participants noticing significant differences in its negotiating strategy (see Table IV), the general perception of the robot did not change under different conditions. This could be due to the fact that the only difference between conditions is in how the robot updates its offers. The interaction structure, what is said and the robot's facial expressions remain the same between conditions. This difference is perhaps too subtle to induce an overall change in perception towards the robot. It will be interesting to also modulate dialogues to reflect the robot's mood, adding phrases that reflect the affective core condition.

VI. CONCLUSION
In this work, we present a comprehensive framework for modelling personality-driven robot behaviour in collaborative HRI scenarios. Using a multi-modal affective appraisal model, it forms an evolving understanding of human behaviour, yielding intrinsic responses in the robot towards the user, that constitute its own affective state. This intrinsic state is used to learn negotiating behaviour in the Ultimatum Game. The affective core of the robot realises specific personality traits in the robot that influence its intrinsic state as well as its behaviour. This is beneficial for the robot to dynamically interact with users rather than following static pre-determined behaviour policies.
The results from the user study show that the participants were able to notice the effect of the affective core on factors such as generosity and persistence, which directly evaluated the robot's behaviour in the Ultimatum Game. The general impression of the robot, however, did not change significantly. Further experimentation is needed, involving longitudinal studies with more participants, to conclude any significant impact on the overall impression of the robot. Furthermore, in the user study, the affective core models are pre-trained and used after freezing the weights of the Gamma-GWR models. This was done to simplify the training and eliminate the effect of changing affective core biases on the performance of the robot. It will be interesting to let these models grow and adapt as the robot interacts with more users, allowing the robot to change its outlook on the users as it interacts with them.