The Psychophysiological Experience of Solving Moral Dilemmas Together: An Interdisciplinary Comparison Between Participants With and Without Depression

Dyads with a depressed and a non-depressed participant (N = 15) and dyads with two non-depressed participants (N = 15) discussed a moral dilemma, during which the participants’ gaze direction and skin conductance (SC) were measured. Partner gazing occurred most frequently when a speaker took a strong stance toward saving a person in the dilemma, although depressed participants looked at their co-participants less often than non-depressed participants did. The participants’ SC response rates were higher during responsive utterances expressing disagreement (vs. agreement) with co-participant ideas or suggesting that a person be sacrificed (vs. saved). We argue that a better understanding of the affective corollaries of human social interaction necessitates a balanced consideration of both contents of talk and behavioral patterns.


INTRODUCTION
Choosing the least of several evils is a common everyday challenge, which is likely to provoke anxiety and arousal in most individuals. It is therefore only to be expected that people tend to discuss their dilemmatic situations and the different choices that they entail with other people, instead of mulling over them in solitude. This may be assumed to be the case particularly in those situations where the decision-making is intertwined with complex moral considerations. It is therefore remarkable that, although there is a large body of studies on solving moral dilemmas as an individual phenomenon (see Christensen et al., 2014), there are only a few studies addressing moral problem solving in social interactional encounters (see, however, Lavelle et al., 2014). Furthermore, the experience of solving moral dilemmas may be distinct in depression, due to the increased threat arousal and pathological worry that have been associated with the condition (Starcevic, 1995), but not much is known about how this might show when solving these types of problems together with others. In addition, the field of empirical social interaction research has been divided in that researchers typically focus either on the content of talk (e.g., Bales, 1950; Levenson and Gottman, 1983; Luminet et al., 2000; Zech and Rimé, 2005; Smirnov et al., 2019) or on the patterns of the turn-by-turn unfolding of interaction that are independent of the specific contents of utterances (e.g., Schegloff, 2007; Sidnell, 2014; Arundale, 2020). Here, we argue that a better understanding of human social interaction necessitates a balanced consideration of both of these aspects.
Drawing on data from an experiment where participants with and without depression discussed a moral dilemma, which required them to make a decision to sacrifice a person in order to save others, we examine how both the contents of talk and the patterns of gaze and the turn-by-turn unfolding of conversational utterances are underpinned by the participants' physiological arousal responses during the conversation.

Solving Moral Dilemmas Together
Moral dilemmas have become a standard methodology for research on moral judgment (Christensen et al., 2014). Moral dilemmas are hypothetical short stories, which describe a situation in which conflicting moral reasons are relevant (e.g., Foot, 1967; also see Thomson, 1976). Traditional theories of moral development (see Kohlberg, 1969) have emphasized the role of controlled cognition in the maturation of moral judgment. In general, solutions to moral dilemmas have been clarified with reference to two different philosophical ethics: utilitarianism and deontology. Utilitarian judgments (Mill, 1998) focus on "the greater good" in the outcome and aim at maximizing benefits for the largest number of people. The deontological perspective (Kant, 1959), in contrast, highlights one's obligations and responsibilities towards other people (e.g., the imperative not to kill), which can trump utilitarian considerations (Greene et al., 2008).
The conflicting types of moral judgment have been compared and examined in their own right, with studies showing that utilitarian judgments are rational and unemotional (Lee and Gino, 2015) and require a high working memory capacity (Moore et al., 2008). Deontological judgments, in turn, have been suggested to indicate intuitive and emotional processes (Greene et al., 2008; for a review, see Christensen and Gomila, 2012), motivated by one's relationships and personal pursuits towards other people (cf. Scheffler, 2010). The decision-makers' judgments have been shown to be influenced by different parameters of the moral dilemma task, such as psychological and emotional distance, concreteness and visuality, as well as the existence of time pressure (Amit and Greene, 2012; Aguilar et al., 2013; Körner and Volk, 2014). Some researchers have also investigated and compared the task-related emotional arousal of the participants dealing with different types of dilemmas by using self-report measures (Christensen et al., 2014).
In all the diverse above-mentioned studies, the moral dilemmas are directed to one "decision-maker" (Kvalnes, 2019). However, if the moral dilemma task is solved in interaction with another participant, the situation becomes inherently much more complex. In order to address moral decision-making as a social interactional phenomenon with its specific behavioral and emotional processes, some researchers have used the so-called "balloon task" to stimulate conversation between participants in experimental settings (e.g., Purver et al., 2003; McCabe and Lavelle, 2012; Lavelle et al., 2013; Lavelle et al., 2014; Howes et al., 2016). In the balloon task, which we also use in the current study, participants are presented with a fictional scenario in which a hot-air balloon is losing altitude and is about to crash. The only way for any of the three passengers of the balloon to survive is that one of them jumps to a certain death. The three passengers are: a cancer scientist, a pregnant primary school teacher, and the husband of the teacher, who is also the pilot of the balloon. The balloon task has been deemed effective in generating debates between the participants in interaction (Purver et al., 2003, p. 6). In the previous studies using the balloon task, the investigation has, however, focused merely on the patterns of interactional behavior unrelated to the content of the moral dilemma task, such as the use of clarification questions (Purver et al., 2003) and the practices of participation and nonverbal communication (Lavelle et al., 2013; Lavelle et al., 2014). In this paper, in contrast, our focus will be on the core aspect of solving moral dilemmas in social interaction: making proposals with different contents and varying degrees of expressive strength, defending these proposals through different types of arguments, agreeing or disagreeing with the arguments of the co-participant, and, finally, negotiating a joint decision.
Each of the above-listed conversational actions, which constitute the activity of solving moral dilemmas together, can have significant affective corollaries. A proposal as an "initiating utterance" is a powerful conversational action, which entails not only a claim of the right to have a word to say in the matter at hand, but also a claim of the right to determine the content of the participants' local interactional agenda. Proposal speakers are sensitive to these implicit claims, orienting to a need to mitigate their proposals and their implicit claims of power in various ways (Stevanovic, 2013; Stevanovic, 2015). Furthermore, in the context of the balloon task, the mere content of the proposal (that is, the question of who should jump to death) can in itself be an arousing matter to say aloud in the presence of another person. Initiating utterances, in turn, make relevant "responsive utterances", which, in the case of proposals, may either agree or disagree with the arguments presented in the proposal. While there are "sociable arguments" (Schiffrin, 1984; Schiffrin, 1990) and specific conversational contexts, such as radio or television talk shows (Hutchby, 1996; Thornborrow, 2015), where controversies are highly expected, it may be generally assumed that speakers tend to avoid argument and disagreement. This is shown in the participants orienting to a need to mitigate the face-threatening implications of their differences of opinion (Goffman, 1955; Brown and Levinson, 1987) and to display an overall preference for agreement, for example, by producing their dispreferred responses with delay (Pomerantz, 1984). All this suggests that the production of disagreeing turns in response to proposals is something that the participants themselves perceive as interactionally problematic.
Finally, reaching a joint decision can also be an arousing interactional event, especially when the participants feel responsible for its content (Stevanovic et al., under review) and, presumably, also when the content of the decision in itself has affective salience, as is the case in the context of solving moral dilemmas.
One important resource that participants use to regulate the affective corollaries of their utterances is gaze. First, prior literature on the use of gaze in face-to-face interaction suggests that gaze can be used to regulate the interactional force of one's utterances. With reference to the notion of "mobilizing response", Stivers and Rossano (2010) suggested that, across various types of utterances, the speaker's gaze on the recipient increases the pressure on the recipient to respond to the utterance. The response-mobilizing function of gaze is in line with the psychological literature where gaze directed straight to the co-participant is perceived as an indication of dominance (Argyle and Dean, 1965; Hall et al., 2005). When such dominance co-occurs with an utterance that potentially implies a decision, it may be perceived as strengthening the display of the speaker's commitment to it. It is a different thing to say "Let's do X" when you look at your co-participants, compared to when you don't. Second, gaze withdrawal may be used to indicate and manage the relative delicacy of the content of the talk. While direct eye contact is common when the topic under discussion is "easy", that is, cognitively more straightforward and less personal (Argyle and Dean, 1965), people tend to direct their gaze away from the co-participant when discussing a difficult topic or feeling uncertain or ashamed (Burgoon et al., 1996; Bente et al., 1998). Avoidance of mutual gaze is also more frequent when the social situation is experienced as threatening or anxiety provoking (Ewbank et al., 2009; Schulze et al., 2013). All this suggests that the investigation of the behavioral and emotional processes associated with moral decision-making should include the examination of the participants' use of gaze as a key aspect of their interactional behavior.

Affective and Psychophysiological Underpinnings of Social Interaction
Many studies of interaction have considered the specific contents or topics of participants' talk as the main target of investigation. In his pioneering work, Bales (1950) developed systematic methods of group observation and measurement of interaction processes, launching a coding system that classified group behavior into task-oriented and relationship-oriented interactions. Analogous coding schemes have also been used in a range of psychophysiological and neurological studies of social interaction. In a seminal study, Levenson and Gottman (1983) investigated discussions between marital couples. The authors employed a combination of measurements, such as heart rate, skin conductance and movement, to construct a combined measure to assess psychophysiological synchronicity between the participants, showing that this measure was higher when the participants were discussing their marital problems and lower when they were discussing more neutral topics. In a similar vein, Smirnov et al. (2019) investigated the synchronization of brain activity across speakers and listeners during the telling of emotional or neutral autobiographical stories. Contents and topics of social interaction have also been investigated from the point of view of people's subjective needs to talk about specific, affectively salient issues (e.g., Luminet et al., 2000; Zech and Rimé, 2005).
In contrast, empirical social interaction studies utilizing conversation analysis have mainly focused on describing the patterns of the turn-by-turn sequential unfolding of naturally-occurring interaction, that is, the chaining of conversational actions (such as requests, proposals, invitations) and their responses (such as acceptances and rejections). Such "structures of social action" (Atkinson and Heritage, 1984) have been considered as independent of the specific contents of talk (Schegloff, 2007; Sidnell, 2014; Arundale, 2020). A central advantage of the approach lies in its capacity to reveal participants' own orientations (emic) to what is going on in the encounter (see e.g., Garfinkel and Harvey, 1970: 345; Schegloff, 1997), instead of being based on the researcher's a priori assumptions about the social world and interaction (etic).
In recent years, however, novel conversation-analytically informed research interests have emerged, which have also given rise to new types of theoretical and methodological challenges. On the one hand, contemporary measurement technologies such as motion capture (Edlund et al., 2013; Stevanovic et al., 2017) and eye-tracking (Dindar et al., 2017; Kendrick and Holler, 2017) have been seen as valuable tools to get detailed knowledge of participant behaviors. Using these technologies in a laboratory, however, involves a shift from naturally-occurring interactions toward more researcher-controlled realizations of the interactional encounters under investigation. Of course, knowledge about the basic structures of interaction may also be gained in these settings, but the task instructions and their potential influence on the results must be carefully considered (for a discussion on the "natural-contrived" continuum of producing social interaction data, see Speer, 2002).
In a similar vein, the rise of conversation-analytically informed interdisciplinary research endeavors concerning, for example, prereflective, unconscious, or involuntary phenomena such as body sway (Stevanovic et al., 2017) or psychophysiological reactions (Peräkylä et al., 2015; Stevanovic et al., 2019; Stevanovic et al., 2021; Voutilainen et al., 2014) has made it necessary to move beyond the mere case-by-case qualitative analysis of interactional sequences to approaches involving coding and quantification, which enable the making of generalizations across multiple instances of data. This is necessary to be able to deal with the relatively high level of "noise" that is an inevitable part of these types of data. Coding and quantification, however, involve a risk of an epistemological shift from the emic towards the etic (e.g., Markee, 2012). Conversation-analytically informed researchers nonetheless seek to incorporate participants' own orientations in the coding schemes as far as is possible (for a discussion on the topic, see Stivers, 2015), and their studies have contributed to an increasing understanding of several inherently emic concerns associated with the turn-by-turn sequences of interaction. For example, Peräkylä and colleagues (2015) found that affiliative story reception is associated with a decrease in the storyteller's arousal and an increase in the story recipient's arousal, as measured by the participants' skin conductance (SC) responses. In a similar vein, Stevanovic and colleagues (under review) used a series of food-decision-making tasks, observing that the relinquishing of one's initially established preferences was associated with higher SC response rates than either acceptances or rejections of the co-participants' proposals.
Indeed, building on, extending, and contributing to the initial goal of conversation analysis, which is to reveal participants' own orientations to interactional events and behaviors, these studies have shed light on the psychophysiologically underpinned experiential aspects of precisely these types of orientations.
Frontiers in Communication | www.frontiersin.org February 2021 | Volume 6 | Article 625968
In this study, we consider the psychophysiological underpinnings of moral-decision-making interaction. In our view, the consideration of this type of interaction necessitates consideration of the specific contents of the participants' utterances. Solving a moral dilemma together involves each participant drawing on their own moral judgment, defending and opposing views that come across as justified or objectionable. Proposals in this context are thus not "just" proposals to be treated independent of what has been suggested, but the moral implications of these proposals may be tightly bound to their specific contents (e.g., saving or sacrificing an individual from the crashing balloon). But this context also makes it relevant to consider the more generic patterns of interactional conduct. When co-occurring with affectively salient contents, such as the ones that characterize moral-decision-making interaction, instances of agreement and disagreement may reverberate in the participants' bodies even more than they would do in more neutral everyday settings. In this sense, our study draws on both of the two broad traditions of empirical social interaction research described above, seeking to build a bridge between them.

Depression and Social Interaction
The Diagnostic and Statistical Manual of Mental Disorders (DSM-5, American Psychiatric Association, 2013) associates depression with loss of pleasure, feelings of worthlessness, indecisiveness, and thoughts of death. It is therefore not a surprise that text analysis methods have shown that those with symptoms of depression use an excessive number of words conveying negative emotions (Tausczik and Pennebaker, 2010). A recent computerized big data text analysis conducted by Al-Mosaiwi and Johnstone (2018) examined the use of absolutist words, such as "always", "nothing" or "completely", and found that absolutist words were even better markers for mental health forums than negative emotion words. This was interpreted in relation to so-called "absolutist thinking", which has been suggested to underlie anxiety and depression (Beck, 1979; Burns, 1989; Williams and Garland, 2002). In addition, dichotomous thinking, cognitive rigidity, and problem-solving deficits have been repeatedly found to co-occur in suicidal individuals (see Ellis and Rutherford, 2008 for a review). From a more interactional perspective, studies on storytelling in therapeutic interactions and clinical interviews have identified specific depression-related language use, which highlights the feelings of helplessness, hopelessness, and low personal agency in the narratives of individuals with depression (Vanheule and Hauser, 2008; Angus and Greenberg, 2011; Ekberg and LeCouteur, 2015; Muntigl, 2016). It is an open question, however, whether the above-mentioned findings regarding expressions of negative affect and low personal agency characterize the conversational interactions of individuals with depression in non-clinical contexts, especially given that individuals with depressive symptoms can be very skillful in hiding their condition (Kirk et al., 2000).
There is much research on how gaze behavior is altered in depression. Results of eye-tracking studies reveal that, compared to non-depressed controls, individuals with depression spend more time viewing negative images (e.g., sad faces) and less time viewing positive or neutral images (Kellough et al., 2008; Sanchez et al., 2013; Isaac et al., 2014). Furthermore, research on clinical interviews has shown that patients with depression display less eye contact with mental health professionals than patients with other psychiatric conditions such as schizophrenia (Jones and Pansa, 1979) and non-depressed controls (Hinchliffe et al., 1970; Sobin and Sackeim, 1997; Fiquer et al., 2018). Interestingly, the avoidance of eye contact has been observed to emerge regardless of the severity of depression and to persist relatively long after treatment (Fiquer et al., 2018). In the context of social interaction, the gaze behavior of individuals with depression has been interpreted as withdrawal from social contacts and as avoidance of intimacy (Hinchliffe et al., 1970). Less is known, however, of whether and how the complexities of gaze behavior during the micro-phenomena of the turn-by-turn sequential unfolding of interaction might be modulated by depression. Conversation analytic research on conditions involving social interaction deficits, such as autism spectrum disorder, has suggested that the deviances attributable to the clinical condition may sometimes instantiate themselves particularly at very specific moments of the interactional sequences (Wiklund, 2012). What is not yet known, however, is whether something like this could also characterize gaze behavior in depression.
Given the overall anchoredness of social interaction in embodied, emotional and psychophysiological processes (see Peräkylä et al., 2015), there may exist idiosyncratic patterns of psychophysiological responses for participants with depression engaged in social interaction. In general, depression is associated with dysregulation in both parasympathetic and sympathetic branches of the autonomic nervous system (ANS) (e.g., Rottenberg, 2007; Rottenberg et al., 2007; Kemp et al., 2010; Beauchaine, 2015; Koenig et al., 2016; Sarchiapone et al., 2018; Brush et al., 2019). These idiosyncrasies include a flat or low SC profile (Vahey and Becerra, 2015), which seems to be a reliable feature of depression and a valid marker of suicidal risk (Sarchiapone et al., 2018), and is consistent with early theorizing considering behavioral and physiological underarousal as a prominent part of depressive symptomatology (Grossberg, 1972; Benning and Ait Oumeziane, 2017). Many laboratory studies have also associated depression with attenuated reactions to negative and positive cues, such as winning or losing money in mock gambling paradigms (Henriques and Davidson, 1990; Henriques and Davidson, 2000; Sloan et al., 2001) or watching sad or amusing films (Rottenberg et al., 2002). While some researchers have thus considered attenuated reactivity to positive and negative social cues as a hallmark of major depressive disorder (see Henriques and Davidson, 1991; Rottenberg, 2005), this conclusion has been challenged in studies on the reactivity to social cues outside the laboratory. In these studies, individuals with depression have, in contrast, been observed to display heightened sensitivity to both positive and negative social cues (Needles and Abramson, 1990; Allen et al., 2004; Gilbert, 2006; Steger and Kashdan, 2009).
It is therefore possible that, for example, the phenomena of increased threat arousal and pathological worry, which have been considered as a part of the etiology of depression (Starcevic, 1995), could show in heightened physiological arousal during conversational encounters. This contrasting hypothesis received support from our own earlier study (Stevanovic et al., under review), where we found that, during an affectively neutral conversational decision-making task, participants with depression exhibited generally higher SC response rates than their healthy comparisons. In this study, we consider whether this finding applies also to decision-making interactions with affectively more salient content.

Research Question and Hypotheses
In this study, we ask how the participants' interactional behavior during a dyadic moral dilemma task is reflected in their psychophysiological responses and gaze behavior. The more specific hypotheses, which we test empirically, are the following:
Hypothesis 1: We assume that transitions between activities necessitate heightened intersubjectivity (Stevanovic et al., 2017), which will be reflected in the participants' higher SC response rates during the beginning and end phases of the moral dilemma task, compared to the middle phase.
Hypothesis 2: During the middle phase, the content of talk and the patterns of gaze and the turn-by-turn unfolding of conversational utterances are reflected in the participants' psychophysiological responses. Here, we make the following, more specific predictions:
a) Given the previous association between physiological arousal and talk about affectively salient issues (Levenson and Gottman, 1983; Luminet et al., 2000; Zech and Rimé, 2005), we assume that utterances concerning a balloon passenger sacrificing him- or herself by jumping from the balloon are associated with higher SC response rates in the speaker than utterances concerning the saving of a balloon passenger.
b) Drawing on previous literature on the relationship between the delicacy of talk and gaze behavior (Argyle and Dean, 1965; Burgoon et al., 1996; Bente et al., 1998), we assume that strong "initiating utterances", that is, utterances that present a specific proposal in favor of sacrificing or saving a person, are associated with more gazing towards the co-participant than the weak ones.
c) Based on the idea of gaze having specific "response mobilizing" features (Stivers and Rossano, 2010), we assume that more gazing towards the co-participant is associated with faster recipient responses.
d) Drawing on the notion of preference (Pomerantz, 1984) and on the assumption that the production of dispreferred actions may thus be experienced as affectively salient, we assume that "responsive utterances" that express disagreement with what the co-participant has said before are produced with delay and associated with higher SC response rates than responsive utterances that represent agreement.
Hypothesis 3: The contents of talk, the patterns of gaze and the turn-by-turn unfolding of conversational utterances, and the SC response rates may be different for participants with and without depression. Here, we make the following, more specific predictions:
a) In accordance with earlier literature concerning depressed individuals' excessive use of negative emotion words and language conveying hopelessness and low personal agency (e.g., Tausczik and Pennebaker, 2010; Angus and Greenberg, 2011), we assume that participants with a depression diagnosis make fewer strong (vs. weak) and initiating (vs. responsive) utterances in general but have a higher proportion of sacrificing (vs. saving) utterances than their non-depressed comparisons.
b) Participants with depression exhibit less co-participant gazing than their non-depressed comparisons, these differences being possibly most prevalent at specific, critical moments of interaction (e.g., Wiklund, 2012).
c) SC response rates may be different for participants with and without depression. Given the mixed evidence so far, involving both the ideas of physiological underarousal (Grossberg, 1972; Benning and Ait Oumeziane, 2017) and increased threat arousal and worry (Starcevic, 1995; Stevanovic et al., under review) as parts of depressive symptomatology, we refrain from making predictions about the direction of the effect.

Ethics
Informed, written consent was given by all participants prior to the study, after they had been informed about the aims of the study and about their right to withdraw their consent at any time (see below). Institutional Review Board approval was obtained from the Ethics Committee of the Helsinki University Central Hospital [June 18, 2018].

Participants
We recruited participants (N = 15) who had been diagnosed with middle stage depression within the past 12 months, and, as a comparison group, participants (N = 45) who had not received a depression diagnosis within the past ten years. The participants (N = 60) had at least five years (3 years if under 25) of working life experience and at least a bachelor's degree or equivalent level of education. The participants were divided into two groups of pairs: 15 pairs, where one participant had a depression diagnosis ("case pair"), and 15 pairs, where neither participant had been diagnosed with depression within the past ten years ("comparison pair").
Participants were recruited through social media and the University of Helsinki mailing lists. Potential participants were asked for background information (age, education, work history, and earlier depression diagnosis) through a phone interview. Based on this information, the candidate was either excluded from the research or guided to the group of participants with a depression diagnosis or to the comparison group. The clinical status of the participants with a depression diagnosis was confirmed by a medical specialist in psychiatry and general practice, who met each participant privately and conducted a clinical interview and the necessary inquiry on symptoms using the Beck Depression Inventory (Beck et al., 1961) and the Montgomery-Åsberg Depression Rating Scale (Montgomery and Åsberg, 1979). The medical specialist also took care of arranging treatment for the participants when needed. Before the experiment, both participants were guided to fill out a set of questionnaires and the purpose of the research was clarified verbally and in writing. The participants were informed that our focus was on the structures of decision-making interaction and on the impact of mood on its dynamics. The clinical status of those participants with a depression diagnosis was not revealed to the interaction partner, because the information could have affected the dynamics of the subject of study and, furthermore, could have unnecessarily stigmatized these participants. At this point we also gave the participants the opportunity to ask questions about the research. The participants were informed about the researchers' obligation to maintain secrecy, the practices of anonymity and data management, the publication of the research results, and the voluntariness of participation in the research. The participants were also told that, even after the written consent, they could still withdraw their participation at any time without this affecting their position or treatment.
The participants were also told how to reverse their consent in practice.

Equipment
Skin conductance (SC), as well as blood volume pulse (BVP), were measured from both participants at a 128 Hz sampling rate with NeXus-10 (Mind Media, Netherlands) devices. SC was measured via two foam electrodes that were placed on the medial side of the left foot. The BVP sensor was attached to the second digit of the left foot. Binocular head-mounted Pupil Labs eye-trackers (Pupil Labs UG haftungsbeschränkt, Berlin, Germany) were used to record eye-movements from both participants at a 60 Hz sampling rate. The eye-trackers were simultaneously calibrated with 16 calibration markers that were presented one by one on an LG OLED55C7V 55" monitor. The open-source Pupil Capture software (v1.8 from: https://github.com/pupil-labs/pupil) was used to record and calibrate the eye tracker. In addition, Shimmer3 IMUs (Shimmer Sensing, Dublin, Ireland) were attached to the right wrist of each participant to record linear acceleration and angular velocity. The NeXus-10, Shimmer3 and Pupil recordings were synchronized via Unix timestamps with custom-made software (https://github.com/samtuhka/InteractionExperiment-Controller). Only skin conductance data and gaze data are analyzed in this paper.
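The timestamp-based synchronization of streams recorded at different sampling rates can be illustrated with a minimal sketch. It is not the logic of the controller software cited above, whose internals are not described here. It only assumes that every sample carries a Unix timestamp, and it pairs each 60 Hz gaze sample with the nearest 128 Hz SC sample. The function names, the start time, and the one-second synthetic streams are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the timestamp in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Choose the closer of the two neighbouring samples.
    before, after = sorted_times[i - 1], sorted_times[i]
    return i if after - t < t - before else i - 1


def align_streams(sc_times, gaze_times):
    """For each gaze timestamp, return the index of the nearest SC sample."""
    return [nearest_index(sc_times, t) for t in gaze_times]


# Hypothetical one-second recording starting at an arbitrary Unix time.
start = 1_530_000_000.0
sc_times = [start + k / 128 for k in range(128)]   # 128 Hz SC stream
gaze_times = [start + k / 60 for k in range(60)]   # 60 Hz gaze stream

matches = align_streams(sc_times, gaze_times)
```

With nearest-sample matching, the pairing error inside the recording is at most half an SC sampling interval (1/256 s); interpolating the SC signal at the gaze timestamps would be an alternative design.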

Experiment
One pair of participants was studied at a time. As described in the Introduction, the participants were presented with a moral dilemma where they were asked to imagine a fictional scenario where a hot-air balloon with three passengers is losing altitude and about to crash. The only way for any of the passengers to survive is for one of them to jump to a certain death. The three passengers are: a scientist whose research could bring about a revolutionary treatment for cancer, a pregnant primary school teacher, and the husband of the teacher, who is also the balloon pilot. The participants were asked to come up with an agreed-upon decision on which one of the passengers should jump from the balloon.
The instructions and description of the task were presented verbally by one of the experimenters. No further instructions were provided on whether, for example, the two remaining passengers could steer the balloon without the pilot, or how much of his research the scientist may have shared with his colleagues. No time limits or other constraints were placed on the participants.
The participants also completed two other tasks that are not reported in this paper (see Stevanovic et al., 2021; Stevanovic et al., under review). The order of these tasks was counterbalanced across dyads. The eye trackers were calibrated between each trial. The participants sat facing each other at an angle of about 120°. The angle was chosen so that the participants would not have to change position to calibrate the eye tracker.
At the beginning of each session, the participants were asked to fill in a set of questionnaires: (1) Locus of Control Scale (Rotter, 1966), (2) Self-Monitoring Scale (Snyder and Gandestad, 1986), (3) Empowerment Scale (Rogers et al., 2010), (4) Ten-Item Personality Inventory, TIPI (Gosling et al., 2003), as well as to answer questions about their perceptions and experiences of the task requirements, their interaction partner, and the dynamics of interaction.

Annotations
We used Praat (Boersma, 2001) to annotate the participants' interactional behavior during the moral dilemma task (see Supplementary Material for a more detailed description of the annotations). First, we broke each task down into three phases. The beginning phase starts when the experimenter stops giving the instructions to the participants and ends after 10 s, during which the participants usually give their first reactions to the task. The middle phase is where the participants negotiate about who should be sacrificed and who should be saved. In the end phase the participants give their final reactions to the task after making the decision. The end phase starts at the moment when one of the participants begins to pronounce their final decision, summarizing what has been tentatively agreed upon previously (e.g., "Let's take the pilot"), and ends when the participants stop discussing the task.
Second, during the middle phase of the task, we annotated the initiating utterances, where one of the participants presents a specific proposal concerning a person in the balloon. We coded the content of the utterances based on whether they promoted saving or sacrificing a target, and also the relative strength of the utterance, i.e., whether it was produced in an absolute manner (strong) or whether it was expressed as a question or with a condition (weak). Finally, again during the middle phase of the task, we coded the responsive utterances, where one of the participants reacts to a suggestion made previously (e.g., "Yes that's true, and we don't even know if the medicine works"). As with the initiating utterances, we first coded the content of the responsive utterances based on whether they promoted saving or sacrificing a target. Furthermore, we coded the interactional pattern based on whether the responsive utterance supported what the co-participant had said previously (agreement) or opposed it (disagreement). The annotation of responsive turns as agreeing or disagreeing was based on the participants' own orientations: we examined if and how the responsive turn was interactionally produced as (dis)agreeing with the prior turn. Responsive utterances that neither clearly supported nor opposed the prior turn were considered ambiguous and excluded from the analysis. Six dyads (i.e., 20% of the entire data set) were randomly chosen to be independently annotated by a second rater for validation. Cohen's kappa coefficient was chosen as the statistical measure of interrater reliability. The derived kappa coefficient of 0.77 suggests a substantial amount of agreement (Landis and Koch, 1977), but it should be noted that this does not account for missing cases (approximately 37% of all annotations) where one rater had no comparable annotation at the corresponding point.
Frontiers in Communication | www.frontiersin.org February 2021 | Volume 6 | Article 625968
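As a reminder of how Cohen's kappa is obtained from two raters' labels, here is a minimal sketch using the standard definition (observed agreement corrected for chance agreement). The labels are made up for illustration and are not the study's annotations. Only items that both raters annotated can enter such a calculation, which is why missing cases fall outside the coefficient.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Proportion of items on which the raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up labels for six responsive utterances.
rater_1 = ["agree", "agree", "disagree", "agree", "disagree", "agree"]
rater_2 = ["agree", "agree", "disagree", "disagree", "disagree", "agree"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy example the raters agree on five of six items, chance agreement is 0.5, and kappa is therefore two thirds.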
For the SC and gaze analysis, each annotated initiating and responsive utterance was regarded as a 4 s segment, beginning 2 s before the point of annotation and ending 2 s after it. This window was chosen because there is no clear singular point in time when the participants 'should' physiologically react, and because SC responses in particular can have a latency of several seconds from the onset of a stimulus (Dawson et al., 2017).
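The windowing described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, times are in seconds from the start of the recording, and the rate is expressed as responses per second, a unit the text does not specify.

```python
def segment_window(annotation_time, half_width=2.0):
    """Return (start, end) of the segment centred on the annotation point."""
    return annotation_time - half_width, annotation_time + half_width


def response_rate(response_times, start, end):
    """SC responses per second within the half-open interval [start, end)."""
    count = sum(start <= t < end for t in response_times)
    return count / (end - start)


# Made-up SC response onsets (s) and one annotation at 12.0 s.
responses = [10.5, 12.0, 13.9, 20.0]
start, end = segment_window(12.0)          # (10.0, 14.0)
rate = response_rate(responses, start, end)
```

Here three of the four made-up responses fall inside the 4 s window, giving 0.75 responses per second.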

Skin Conductance Responses
The SC signal was deconvoluted with the Richardson-Lucy algorithm (Richardson, 1972) in order to distinguish between overlapping SC responses (Bach et al., 2010; Benedek and Kaernbach, 2010). SC responses were identified (see Figure 1) from the deconvoluted signal through peak detection: all local maxima with a minimum prominence of 0.05 μS and a height of one standard deviation or higher above the mean level.
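The peak-detection criteria can be illustrated with a small pure-Python sketch. It is not the authors' pipeline: it skips the deconvolution step and operates on an already deconvoluted, made-up signal in μS. Two details are assumptions: the mean and standard deviation are taken over the whole signal (population SD), and prominence follows the common definition of the peak height above the higher of the two surrounding minima.

```python
from statistics import mean, pstdev


def prominence(signal, i):
    """Height of the peak at index i above the higher of its two surrounding minima."""
    peak = signal[i]
    left_min = peak
    j = i - 1
    # Walk left until a sample higher than the peak (or the signal start) is reached.
    while j >= 0 and signal[j] <= peak:
        left_min = min(left_min, signal[j])
        j -= 1
    right_min = peak
    j = i + 1
    # Walk right in the same way.
    while j < len(signal) and signal[j] <= peak:
        right_min = min(right_min, signal[j])
        j += 1
    return peak - max(left_min, right_min)


def detect_responses(signal, min_prominence=0.05, n_sd=1.0):
    """Indices of strict local maxima that pass the height and prominence criteria."""
    threshold = mean(signal) + n_sd * pstdev(signal)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] > signal[i + 1]  # flat plateaus are ignored
        if is_local_max and signal[i] >= threshold and prominence(signal, i) >= min_prominence:
            peaks.append(i)
    return peaks


# Made-up deconvoluted signal with two clear responses and some small ripples.
driver = [0.0, 0.01, 0.0, 0.02, 0.30, 0.02, 0.0, 0.03, 0.0, 0.25, 0.0, 0.0]
peak_indices = detect_responses(driver)
```

In this example the small ripples at indices 1 and 7 are local maxima but fall below the height threshold, so only the samples at indices 4 and 9 are counted as responses.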

Face Detection
We used the YOLOv3 (Redmon and Farhadi, 2019) object detection algorithm (open-source implementation from: https://github.com/sthanhng/yoloface) to detect faces in each video frame (videos were produced by the forward cameras of the eye trackers). Correspondingly, the gaze of each participant was determined using the 3D calibration and mapping mode of the Pupil Capture software. The gaze signal and the detected face locations were used to determine whether each participant was gazing at the other or not on each frame (i.e., whether their gaze was located within the detected face).
In respect to the annotated segments, we determined a gazing rate for each participant by dividing the number of frames where the participant was gazing at their partner by the total number of frames.
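The gazing rate defined above can be sketched as a per-frame point-in-rectangle test followed by a ratio. The data layout is an assumption made for illustration: gaze is an (x, y) point in scene-camera pixel coordinates, a detected face is a (left, top, right, bottom) box, and frames with no gaze sample or no detected face (represented as None) count as not gazing. How such frames were actually handled is not stated in the text.

```python
def gaze_in_box(gaze, box):
    """True if the gaze point lies inside (or on the edge of) the face box."""
    x, y = gaze
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def gazing_rate(gaze_points, face_boxes):
    """Fraction of frames in a segment where the gaze falls within the detected face."""
    if len(gaze_points) != len(face_boxes) or not gaze_points:
        raise ValueError("need one gaze entry and one face entry per frame")
    hits = sum(
        1
        for gaze, box in zip(gaze_points, face_boxes)
        if gaze is not None and box is not None and gaze_in_box(gaze, box)
    )
    return hits / len(gaze_points)


# Four made-up frames: hit, miss, no face detected, hit.
gaze = [(320, 240), (50, 60), (330, 250), (400, 300)]
faces = [(300, 200, 420, 340), (300, 200, 420, 340), None, (300, 200, 420, 340)]
rate = gazing_rate(gaze, faces)
```

With two hits out of four frames, the made-up segment has a gazing rate of 0.5.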

RESULTS
The results section is divided into seven subsections. The first subsection investigates Hypotheses 1 and 3c, specifically the SC rates during the three main phases (beginning, middle, end) of the conversation. The next three sections concern Hypotheses 2 and 3b, and describe variables influencing, or at least correlating with, the gazing patterns during initiating and responsive utterances and the amount of time between these utterances. The two following sections feature the SC rates during initiating and responsive utterances in a similar manner. The final subsection concerns the general contents and patterns of talk concerning Hypothesis 3 and the differences between participants with and without a depression diagnosis.
Unless otherwise specified, statistical analyses were conducted via generalized linear mixed models (GLMM) with Gaussian response and identity link to control for the non-independence of measures from individual dyads and participants. The p-values for GLMMs were estimated with the Satterthwaite approximation (Satterthwaite, 1941).

Skin Conductance Response Rates in Different Phases of Conversation
In the investigation of Hypotheses 1 and 3c, we examined the three annotated phases of the conversation to see if mean SC response rates differed between the phases or on the basis of whether the participants were diagnosed with depression or not. Unlike in the other analyses where the utterances are always regarded as 4 s segments, the phases were of varying length (see Supplementary Materials for details on how the lengths of the phases were determined). The statistical analysis was conducted via a GLMM. Depression diagnosis and phase (three levels: beginning, middle, end) were included as fixed effects. The dyad and participant were incorporated as nested random effects (random intercepts) to control for the non-independence of participants within a dyad and repeated measures from an individual participant. The model summary can be seen in Table 1.
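For readers who prefer a formula, a model of this kind could be written as below. The notation is ours and hypothetical: $y_{dpk}$ is the SC response rate of participant $p$ in dyad $d$ during phase $k$, the indicator variables are dummy codes, and the middle phase is assumed to be the reference level. With a Gaussian response and identity link, this is an ordinary linear mixed model.

```latex
\[
y_{dpk} = \beta_0
        + \beta_1\,\mathrm{Depression}_{dp}
        + \beta_2\,\mathrm{Beginning}_{k}
        + \beta_3\,\mathrm{End}_{k}
        + u_d + v_{dp} + \varepsilon_{dpk},
\]
\[
u_d \sim \mathcal{N}(0, \sigma_u^2), \qquad
v_{dp} \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{dpk} \sim \mathcal{N}(0, \sigma^2).
\]
```

Here $u_d$ is the random intercept of the dyad and $v_{dp}$ the random intercept of the participant nested within that dyad.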
In contrast to the middle phase, both the beginning phase (p = 0.02) and end phase (p < 0.001) had significantly higher SC response rates, as predicted by the hypothesis. The depression diagnosis of the participants, however, had no significant effect. The mean response rates in different phases are visualized in Figure 2.

Gaze Patterns During Initiating Utterances
To investigate Hypotheses 2a, 2b and 3b in terms of the participants' gaze behavior, we examined the gazing ratios of speakers of initiating utterances in respect to whether the utterance was strong or weak, whether they proposed to sacrifice or save one of the passengers, and whether the speaker was diagnosed with depression or not. As explained in the Methods section, each utterance was regarded as a 4 s segment for the purpose of the analysis.
In the GLMM analysis, diagnosis status of the speaker, content (sacrifice or save), and strength (weak or strong) of the utterance were chosen as fixed effects with the content*strength interaction term included. As previously, the dyad and participant were incorporated as nested random effects. The model summary can be seen in Table 2.
Depression diagnosis had a significant negative effect (p = 0.024) on gazing, indicating that depressed participants gazed at their co-participant less than the non-depressed participants, as predicted in the hypotheses. On the other hand, the main effects of content and strength were not significant. However, the content*strength interaction term had a significant positive effect (p = 0.043), indicating that strong initiating utterances had a more positive effect on gazing when the proposal was to save someone. The mean gazing ratios in respect to the depression diagnosis of the speaker, strength and content are visualized in Figure 3.

Gaze Patterns during Responsive Utterances
Similarly, to probe Hypotheses 2a, 2d and 3b, we investigated gazing ratios during responsive utterances in respect to the responder's depression diagnosis status, whether the responsive utterance aligned with what their co-participant had suggested previously, and whether the utterance was in favour of sacrificing or saving one of the passengers. In a comparable manner to how the initiating utterances were examined, the gaze behavior analysis was conducted via a GLMM with diagnosis status of the speaker, agreement (disagreeing or agreeing), and content (sacrifice or save) of the utterance as fixed effects. The model summary can be seen in Table 3. No significant effects were found.

Response Time Patterns
In respect to Hypothesis 2c, we investigated if there was a correlation between the time interval between initiating utterances and responses and how much the speaker of the initiating utterance had gazed at their co-participant. In addition, we examined if there might be an effect regarding response times in terms of how strong the initiating utterance was, the depression status of the participants, and whether the responsive utterance aligned with the original proposal. The summary of the constructed GLMM can be seen in Table 4.
Both the gazing ratio of the speaker of the initiating utterance (p = 0.02) and the agreement of the response (p = 0.002) had a significant negative effect on the response time. In other words, a higher amount of gaze (by the speaker of the initiating utterance) and agreeing responsive utterances were associated with faster responses.

Skin Conductance Response Rates During Initiating Utterances
Comparably to the gaze analysis, in investigation of Hypotheses 2a, 2b and 3c in respect to skin conductance, we examined the SC response rates of the participants during initiating utterances in terms of whether the utterance was strong or weak, whether the proposal was to sacrifice or save one of the passengers, and whether the speaker was diagnosed with depression or not. As before, each initiating utterance was regarded as a 4 s segment.
As in the gaze pattern analysis, diagnosis, strength, content and the strength*content interaction were included as fixed effects in the GLMM. Dyad and participant were chosen as nested random effects. The model summary can be seen in Table 5.
No statistically significant effects were found. However, it may be worth noting that the magnitudes of the estimated coefficients of strength and content are relatively large.

Skin Conductance Response Rates during Responsive Utterances
To probe Hypotheses 2a, 2d and 3c, SC response rates during responsive utterances were investigated in a comparable manner. As in the gaze analysis, diagnosis, agreement (disagreeing or agreeing), and content (sacrifice or save) of the utterance were chosen as fixed effects for the GLMM (see Table 6).
Both agreement (p = 0.034) and content (p = 0.023) had a significant negative effect, indicating that both disagreeing and sacrificing responsive utterances were associated with higher SC response rates compared to agreeing and saving utterances. This is in line with our hypotheses. The mean SC response rates in respect to agreement and content of the utterances are visualized in Figure 4.

General Contents and Patterns of Talk
In total, each participant made on average 2.3 (SD = 1.8) initiating utterances (see Table 7 for relative frequencies in respect to the different classifications). There was no significant difference between participants with or without depression (participants with depression made on average 2.2 initiating utterances vs. 2.33 by participants without depression; Wilcoxon signed rank test for the case pairs, p = 0.55). Nor was there a large difference between depressed and non-depressed participants in the proportion of initiating utterances that proposed sacrificing one of the passengers as opposed to saving them (70% among participants with depression vs. 60% among those without; Wilcoxon signed rank test for the case pairs, p = 0.24). There was no difference in terms of the strength of the utterances either (approximately 50% of the utterances were strong among participants both with and without depression; Wilcoxon signed rank test for the case pairs, p = 0.96).
No differences were found in the patterns of the responsive utterances (see Table 8 for relative frequencies) either in respect to participants' diagnostic status: approximately 70% of the responses were agreeing for both groups (Wilcoxon signed rank test for the case pairs, p = 0.39).
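The paired comparisons above use the Wilcoxon signed rank test. As a reminder of what the test computes, the following self-contained sketch ranks the absolute within-pair differences, sums the ranks of the positive differences, and derives an exact two-sided p-value by enumerating all sign assignments. The conventions (dropping zero differences, average ranks for ties, exact rather than approximate p-values) are assumptions for illustration and may differ from the software settings used in the study. The data are made up.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (statistic, exact two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns that are at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(ranks)


# Made-up counts of initiating utterances for five pairs.
with_depression = [3, 1, 4, 2, 5]
without_depression = [1, 2, 1, 2, 3]
stat, p_value = wilcoxon_signed_rank(with_depression, without_depression)
```

The full enumeration is practical only for small samples; with 15 pairs it covers 2^15 sign patterns, which is still quick.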
In terms of the final choice on who should jump, the balloon pilot was chosen by 16 dyads (case pairs: 8, control pairs: 8), the cancer scientist by 12 dyads (case pairs: 6, control pairs: 6) and the pregnant primary school teacher by two dyads (case pairs: 1, control pairs: 1). The distributions of choices among control and case pairs were thus identical.
In summary, in terms of the general contents and patterns of talk we observed none of the effects (in respect to the depression diagnosis of the participants) that were predicted by Hypothesis 3.

DISCUSSION
In this study, we have considered how the participants' interactional behavior during a dyadic moral dilemma task is reflected in their psychophysiological responses and gaze behavior. Here we will discuss the results in relation to our specific research hypotheses.
Our Hypothesis 1 was informed by the assumption that transitions between activities necessitate "heightened intersubjectivity", which can show in the participants' bodies as higher psychophysiological arousal during the beginning and end phases of the conversational task than during the middle phase of the task (see Stevanovic et al., 2017). Our data on the participants' SC response rates clearly support that conclusion. It is during the beginning and end phases of the conversational task that the participants need to pay particular attention to each other and determine how to start the decision-making activity and bring it to a close in a coordinated manner. In addition to reaching a common understanding of what the actual and binding decision ultimately is, the participants also need to manage their interaction then and there and know when it is appropriate to move on.
Next, we assumed that, during the middle phase of the moral dilemma task, the content of talk and the patterns of gaze and the turn-by-turn unfolding of conversational utterances are reflected in the participants' psychophysiological responses. With regard to the SC response rates (Hypothesis 2a), we hypothesized that utterances suggesting that a balloon passenger sacrifice him- or herself by jumping from the balloon would be associated with higher SC response rates in the speaker than utterances concerning the saving of a balloon passenger. Our results support this conclusion. While making a conversational contribution in itself always requires speakers to put something of themselves "out there" for others to judge, and thus to place themselves in a vulnerable position (see Goffman, 1959; Goffman, 1967), our results show that the specific contents of utterances play a significant role in how they are psychophysiologically underpinned. Our results suggest that, in the context of solving a moral dilemma, suggesting that a person (pregnant woman, scientist or balloon pilot) sacrifice themselves can be experienced as an interactionally more risky or threatening move than suggesting that someone should be saved. Notably, however, our results on the proportionately higher SC response rates in the sacrificing utterances differ depending on the status of the utterance as an initiating vs. a responsive one. The observed higher SC response rates were statistically significant only with regard to the responsive utterances, while the effect was smaller in the initiating utterances. At this point, we may only speculate why this might be the case. One possibility is that, in this particular context, responsive sacrificing utterances are one step closer to reaching the final moral decision (of who should jump from the balloon), which might show in an elevated SC response in the producer of the responsive utterance.
We hypothesized that strong initial utterances are associated with more gazing towards the co-participant than the weak ones (Hypothesis 2b). Our eye tracking results support this conclusion only partially: the initiating utterances exhibited an interaction effect between the relative strength of the verbal expression (e.g., "should we spare the woman?" vs. "the pregnant woman cannot possibly jump") and the content of the expression (saving vs. sacrificing a person), with partner gazing occurring most frequently during those utterances in which the speaker argued strongly for saving a person. In other words, the effect was not observable in strong utterances where a person was suggested to be sacrificed (e.g., "we don't need the pilot"). Prior literature has pointed to the function of gaze as a way to increase the strength of one's utterance, which gets support from our result of more partner gazing leading to faster co-participant responses (Hypothesis 2c). This is linked to what Stivers and Rossano (2010) have described as "mobilizing response": gaze increases the pressure on the recipient to respond to an utterance. This is also in line with the notions in psychological literature according to which gaze directed straight at the co-participant is perceived as more dominant than gaze withdrawal (Argyle and Dean, 1965; Hall et al., 2005). Studies have also shown that the content of the talk affects gazing behavior. In general, there is usually more direct eye contact when the topic under discussion is "easier" and cognitively more straightforward (Argyle and Dean, 1965). When discussing difficult topics, or when feeling uncertain or ashamed, people tend to direct their gaze away from their co-participant (Burgoon et al., 1996; Bente et al., 1998). This literature is also very much in line with our results, as strong utterances proposing the saving of a person were associated with more gazing towards the co-participant than sacrificing utterances, which can be seen as topically more delicate.
FIGURE 3 | Mean gazing ratios and 95% confidence intervals during initiating utterances both in respect to their strength (weak or strong) and content (sacrifice or save) and whether the speaker had a depression diagnosis (green bars) or not (red bars).
FIGURE 4 | Mean SC response rates and 95% confidence intervals during responsive utterances in respect to whether they were agreeing or disagreeing, and to save or sacrifice. Both saving and agreeing had a negative effect in respect to SC response rate.
With regard to the patterns of talk, we hypothesized that responsive utterances that express disagreement with the initial proposal are produced with a delay and associated with higher SC response rates in the participant (Hypothesis 2d). This hypothesis was supported by our data. The result can be explained with reference to the fundamental notion of "preference organization" most famously promoted in the field of conversation analysis (Pomerantz, 1984; Bilmes, 1988). Disagreement in decision-making is often expressed with a delay, which conveys that something problematic is going on, whereas acceptance is done straight away (Houtkoop, 1987; see also Davidson, 1984; Pomerantz, 1984). Furthermore, in this context of moral stance-taking, displaying agreement with the co-participant's stance can be described as an affiliative action (Stivers, 2008; Stivers et al., 2011; Lindström and Sorjonen, 2013), whereas disagreement conveys disaffiliation. Our results suggest that the problematic interactional experience associated with disagreement and disaffiliation may have a psychophysiological correlate, leading to increased arousal in the speaker. Even though the anxiety-provoking and stressful nature of these disaffiliative actions seems intuitively plausible, the finding is not self-evident. In their study of storytelling interaction, Peräkylä and colleagues (2015) found that it was the empathetic and affiliative displays of recipiency to a story that led to increased psychophysiological arousal, which the authors interpreted with reference to the notion of "emotional labor" (Hochschild, 1979). As an activity, however, storytelling can be considered to be fundamentally different from solving moral dilemmas together. In this context, we argue, agreement, in the sense of going along with the co-participant's proposal, is not a particularly taxing interactional task, whereas disagreeing with the co-participant's moral stance might require emotional work (e.g., cautious formulations of disagreement, vigilant monitoring of co-participant reactions) to limit the damage that the disagreement might cause to the solidarity and affiliation between the participants.
Finally, we suggested that the contents and patterns of talk (Hypothesis 3a), patterns of gaze behavior (Hypothesis 3b), and the SC response rates (Hypothesis 3c) may be different for participants with and without depression. As for the contents and patterns of talk, we found no differences between the participant groups. While we assumed that participants with a depression diagnosis would make fewer proposals, with a higher proportion of sacrificing (vs. saving) proposals, than participants without depression, our results did not lend support to such conclusions. As previous studies have shown (Kirk et al., 2000), participants diagnosed with depression can be highly skilled in concealing their condition, which may also have been the case in our sample. Concealing depressive symptoms can be motivated, for example, by a desire to maintain normality in front of other people (Draucker, 2005) and cultural patterns where emotional control, self-esteem, and invulnerability are central virtues (Emslie et al., 2006). We also hypothesized that participants with a depression diagnosis would differ in their psychophysiological responses to conversational phenomena, but this prediction was not supported by our data. Our results therefore cannot shed light on the mixed evidence so far, involving both the ideas of physiological underarousal (Grossberg, 1972; Benning and Ait Oumeziane, 2017) and increased threat arousal and worry (Starcevic, 1995; Stevanovic et al., under review) as parts of depressive symptomatology. However, we did find differences in the patterns of gaze behavior between the groups of depressed and non-depressed participants. In line with our hypothesis, the participants with depression gazed less towards their co-participants, this result being statistically significant specifically during the production of initiating utterances, which may be argued to be the most critical utterances in determining which direction the conversation will take. Hence, like all interactional resources, gaze behaviors have distinct consequences depending on their precise location within interactional sequences (Rossano, 2012). Interactional deficits should therefore be examined bearing in mind that it is specifically during those moments where partner gazing is most critical that a lack of gaze may have quite drastic interactional corollaries (Wiklund, 2012).
Our study has at least five key limitations, which we will discuss below. First, all the participants in our study were female. If our sample had included male dyads or cross-gender dyads, the results might have been different. For example, Tang and Schmeichel (2015) found that direct eye contact with a target face especially affected the behavior of men, who acted in a more dominant fashion when making decisions in a hypothetical ultimatum game. Second, the fact that the participants were strangers to each other may have generated different results from what dyadic interactions between everyday acquaintances, friends, or family members would have brought about. Third, the participant sample for this research consists of volunteers, which can lead to a self-selection bias. As the participants were socially courageous enough to decide to volunteer in a study where one is expected to be talking with a stranger, it is likely that those who find such situations particularly stressful did not take part in our study, which may have influenced the results concerning stress-related physiological responses. Fourth, it should be noted that, though skin conductance measures provide powerful tools for assessing the level of arousal in participants, they provide no direct information about the valence of that arousal. Finally, the methodology we have utilized to investigate the psychophysiological experience of individuals with depression is somewhat limited and it should be complemented by other methods, such as in-depth studies of their everyday living environment, to reach a more comprehensive picture of their interactional competences and experiences in solving moral dilemmas with other people. In addition, the interdisciplinarity of our approach is associated with a set of theoretical and methodological contradictions that call for comment.
As pointed out at the beginning of this paper, conversation analysis is essentially about investigating naturally-occurring interactions in order to identify the participants' own orientations to interactional patterns and events (see e.g., Garfinkel and Harvey, 1970: 345; Schegloff, 1997). Our study involved two different types of compromises in this regard. First, our investigation of the psychophysiological underpinnings of interaction was conducted in a laboratory environment, where the realization of the interaction was under the control of the researchers. Our study was, however, informed by previous case-by-case conversation analytic studies on joint decision-making, proposals, and agreements (e.g., Davidson, 1984; Pomerantz, 1984; Stevanovic, 2013; Stevanovic, 2015) and we sought to design our task instructions to maintain the essential natural dynamics of these particular interactional phenomena as far as possible. Second, the relatively high level of "noise" that is an inevitable part of psychophysiological signals made it necessary for us to resort to coding and quantification and thus to go beyond the case-by-case qualitative analysis of interactional sequences, where the focus is on how the meaning of each behavior is collaboratively negotiated throughout the sequence. While our coding of "strong" and "weak" proposal forms was thus essentially a matter of applying previous conversation-analytic findings to this dataset in an a priori fashion, the participants' own orientations were, in contrast, incorporated in our coding scheme for agreements and disagreements. From this point of view, we believe that our findings are not entirely foreign to the emic concerns that conversation analysts are generally interested in.
Earlier psychological studies investigating moral dilemmas have either focused on the individual and his/her moral judgment (e.g., Christensen et al., 2014) or, in contrast, studied the interactional patterns that the task generates without reference to the specific content of the conversational actions (e.g., Lavelle et al., 2014). This same tension can be found more generally in different social psychological domains, where the main focus is often on either the content of talk (e.g., Bales, 1950) or the patterns of the turn-by-turn unfolding of interaction (e.g., Schegloff, 2007). Our study has shown that both of these aspects of human interaction are highly relevant, as they resonate in the participants' physical bodies. Furthermore, our research can contribute to the field of studying human moral judgment. Studies utilizing hypothetical moral dilemmas have been criticized for having little predictive value for actual behavior (see Bostyn et al., 2018), as participants in real-life situations refer to their "commonsense morality" (Kahane, 2015) instead of following purely deontological or utilitarian rules. What the discussion has been lacking, however, is the recognition that moral decisions in the real world are rarely mulled over in solitude. Instead, people tend to discuss their dilemmatic situations and the different choices that they entail with other people, and when they do, our study suggests that the contents and patterns of their interactional contributions reverberate in their physical bodies. Thus, to increase understanding of how moral decisions, which may sometimes have profound consequences, come into being, we need a deeper understanding of the affective and psychophysiological processes that underlie social situations.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Helsinki University Central Hospital [18.06.2018]. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
EK is the main author of the paper and together with ST and MS was responsible for the analysis and interpretation of the results. ST was responsible for data collection and ES for the verification of the participants' depression diagnoses. Experiments were run by ST, KV and MJ. The writing of the first draft was accomplished by EK, MS, ST, TV and EW, and the revised version by EK, MS, ST and TV. MS was responsible for the conception and design of the study. All authors have approved the submitted manuscript and agreed to be personally accountable for their own contributions and for the integrity of the study.

FUNDING
This work was supported by the Academy of Finland (grant number 307630) and the University of Helsinki.