# EARLY MORAL COGNITION AND BEHAVIOR

EDITED BY: Kelsey Lucca, J. Kiley Hamlin and Jessica Sommerville

PUBLISHED IN: Frontiers in Psychology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-188-9 DOI 10.3389/978-2-88963-188-9


# EARLY MORAL COGNITION AND BEHAVIOR

Topic Editors:

Kelsey Lucca, Arizona State University and University of Washington, United States

J. Kiley Hamlin, University of British Columbia, Canada

Jessica Sommerville, University of Toronto and University of Washington, United States

Photo by Katie Emslie on Unsplash.

Citation: Lucca, K., Hamlin, J. K., Sommerville, J., eds. (2019). Early Moral Cognition and Behavior. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-188-9

# Table of Contents


Katherine McAuliffe, Michael Bogese, Linda W. Chang, Caitlin E. Andrews, Tanya Mayer, Aja Faranda, J. Kiley Hamlin and Laurie R. Santos

*Examining Infants' Individuation of Others by Sociomoral Disposition*

Hernando Taborda-Osorio, Ashley B. Lyons and Erik W. Cheries

# Editorial: Early Moral Cognition and Behavior

#### Kelsey Lucca<sup>1,2</sup>\*, J. Kiley Hamlin<sup>3</sup> and Jessica A. Sommerville<sup>1,4</sup>

<sup>1</sup> Department of Psychology, University of Washington, Seattle, WA, United States, <sup>2</sup> Psychology Department, Arizona State University, Tempe, AZ, United States, <sup>3</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada, <sup>4</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada

Keywords: moral cognition, social cognition, infancy, early childhood, moral development

#### **Editorial on the Research Topic**

#### **Early Moral Cognition and Behavior**

To date, research on moral cognition and behavior has focused primarily on children and adults—leaving open critical questions surrounding earlier developmental origins of morality. This special issue presents an integrative collection of pioneering research in early moral cognition and behavior that fills this gap. This work investigates a range of timely and important questions surrounding the extents of early moral cognition and behavior, demonstrating that human infants and young children have an unmatched flexibility in their thinking and acting in the moral domain: within the first several years of life, moral representations are quite robust, flexible, and complex in nature. This work also sheds light on sources of variability in moral cognition and behavior, such as interactions in the home environment, a previously understudied topic. And finally, this research provides novel insights into continuities and discontinuities in moral behavior and cognition across ages (i.e., 4 months to middle childhood), populations (i.e., children with autism, children from non-Western countries), and species (i.e., dogs). This research employs a range of methodological techniques, such as pupillometry, behavioral experiments, and large-scale survey studies that span diverse theoretical approaches, including computational modeling and constructivism. In sum, the papers in this issue stress four main themes: the extents and boundaries of early moral cognition, diverse populations and approaches, factors that moderate moral thinking and action, and new theoretical frameworks for understanding moral cognition. Here, we address each of these themes in turn, and highlight how these papers demonstrate that early moral cognition and behavior, starting in early infancy and extending into early childhood, is highly flexible, shaped in important ways by various contextual and experiential factors, and continuous across cultures and development.

The first set of papers tackles important questions regarding the extents and boundaries of early moral representations by probing infants' reasoning about the social world. Existing work has established that very young infants are sensitive to nice and mean actions: they prefer those who help over those who hinder. After their first birthday, infants demonstrate a similar sensitivity to fairness, preferring those who behave fairly to those who don't. However, research has yet to examine these two important dimensions of morality in tandem, leaving open critical questions about the similarity in timeline of these traits, and whether infants' judgements are simply temporary evaluations, or whether they view these traits as enduring and stable behavioral dispositions. Surian et al. demonstrate that by 14 months, infants expect individuals who have previously helped (as opposed to harmed) others to be fair in future interactions, demonstrating that infants link the domains of harm, help, and fairness, and may attribute moral "traits" to others. Previous research on fairness expectations in the first year of life has yielded mixed results: some research has found that young infants expect third parties to act fairly, whereas other research has not. In a series of four experiments, Dawkins et al. resolve these disparate findings by demonstrating the precise conditions under which early fairness expectations exist: 4- and 9-month-olds are sensitive to fairness, but only when distributions are small and markedly different from each other, highlighting that although fairness expectations emerge early in development, there are also important limits to these expectations. Taborda-Osorio et al. further probe the extents of infants' sociomoral representations by asking whether infants perceive sociomoral dispositions as deep, identity-determining features. Using an object individuation task, they find that infants interpret sociomoral actions (i.e., helping, hindering) as stable behavioral dispositions. Together, this research moves beyond past work by showing that infants' fairness understanding emerges earlier than previously thought, and is flexible and cohesive across domains, highlighting that infants' judgements about the moral behavior of others are not just fleeting evaluations, but a true understanding of the behavioral dispositions that underlie the actions of others.

#### Edited by:

Carmelo Mario Vicario, University of Messina, Italy

#### Reviewed by:

Chiara Lucifora, Università Degli Studi di Messina, Italy

\*Correspondence: Kelsey Lucca KLUCCA@ASU.EDU

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 12 August 2019 Accepted: 19 August 2019 Published: 20 September 2019

#### Citation:

Lucca K, Hamlin JK and Sommerville JA (2019) Editorial: Early Moral Cognition and Behavior. Front. Psychol. 10:2013. doi: 10.3389/fpsyg.2019.02013

Historically, the field of early moral cognition and behavior has been dominated by research with children from western, educated, industrialized, rich, and democratic (WEIRD) societies, raising important questions about universality of early moral cognition and sources of variability. The research in this collection alters the course of this narrative by working with understudied populations. In an experiment examining patterns of attention to prosocial events, Hepach and Herrmann find important continuities across cultures and ages: children from 3 to 9 years in both Germany and Zambia show similar pupillary responses to helping scenarios, and process social information similarly: they are better equipped to anticipate the solution to social (compared to non-social) problems. Chernyak et al. also investigate moral cognition and behavior in Zambian children, and similarly find important cross-cultural similarities: across cultures, rates of prosociality are scaled to the cost of the action. They also identify a range of cultural factors that contribute to individual differences in moral cognition, such as parental perception of inequality. The field, prior to this special issue, has also been limited in that conclusions typically rest on experiments conducted with neurotypical children. Dunfield et al. tackle this issue by studying children diagnosed with Autism Spectrum Disorder (ASD) and find that children with ASD engage in similar levels of helping and sharing as typically developing (TD) children. However, children with ASD are less inclined to engage in prosocial behaviors when the cost of acting is high—thereby emphasizing that social cognition and social motivation combined are critical features of prosocial behavior across diverse groups.

The articles presented in this collection also diversify the field by utilizing novel approaches. Using a large-scale online survey, Hammond and Brownell map the developmental trajectory of early helping behaviors and demonstrate that children's earliest helping behaviors are driven by social engagement, praise, and fun, and that these motivations differentiate and expand across development to also include more altruistic motives. McAuliffe et al. take a comparative approach and ask whether domestic dogs, similar to human infants, form social evaluations based on third party interactions. Unlike human infants, who prefer helpers over hinderers from a very early age, dogs do not show any preference. In this way, human infants have an unmatched flexibility in their early moral cognition.

The last set of empirical papers explore a range of factors that moderate morally-relevant behavior and cognition. Prior to this collection, little was known about the relative weighting of different factors in moral decision-making at different stages of development. The papers by Van de Vondervoort et al. and Fedra and Schmidt illustrate that intentionality plays a fundamental role in early social reasoning. Van de Vondervoort et al. demonstrate that young children privilege intentions over outcomes when making moral judgements about helping and hindering agents. Fedra and Schmidt show that children's reasoning about the moral behaviors of others goes beyond actions that are intrinsically helpful and harmful, and extends to verbal actions that reveal intentions to help or harm, such as factual statements and assertions. This work highlights that the ability to inspect and appraise the moral consequences of what people say, and reason about the underlying intentional structure of actions, is an important feature of mature moral reasoning present early in life.

The papers by Lee et al. and Misch et al. demonstrate that group membership is another key factor involved in moral decision making. Lee et al. illustrate that children treat both in-group and out-group members fairly, but will override fairness concerns in favor of group loyalty when resources are limited. Misch et al. examine how children navigate the tension between standing up for what's right and remaining loyal to a group: when the stakes for the group are low, after a minor transgression, children blow the whistle on both ingroup and outgroup transgressors; but when the stakes for the group are high, after a severe moral transgression, children are less likely to blow the whistle on an ingroup member.

Prior to this special issue, cohesive theoretical frameworks for explaining where prosocial tendencies come from and how they lead to prosocial actions were missing from the literature, making it difficult to interpret and make sense of empirical findings. The final set of papers, by Dahl and Killen and Bridgers and Gweon, offer novel theoretical perspectives on the origins of morality. In addition to providing a comprehensive and integrative definition of morality ("prescriptive norms concerning others' welfare, rights, fairness, and justice"), Dahl and Killen take a constructivist approach to interpreting the evidence on the developmental trajectory of morality, arguing that early morality is neither innate nor learned, but rather constructed through reciprocal interactions. Bridgers and Gweon also explore the question of why and how prosocial behaviors develop, with an eye toward explaining why certain behaviors tend to emerge earlier and with less prompting than others. They argue that deconstructing early prosocial behaviors into complex decision-making processes, and developing computational models that formalize these processes, can help elucidate the trajectory of moral development.

Together, this collection highlights that human infants and children demonstrate an unmatched flexibility in their thinking and acting in the moral domain. This collection also points to constraints on early moral cognition and behaviors, and helps elucidate the contexts in which these constraints exist, such as when group membership is at stake or when the processing load is too high. Though these papers make large strides in moving the field forward toward a more cohesive and stable representation of early morality, they also pose important questions and challenges for the field moving forward. For example, although much of the work presented here is suggestive of promising applications for fostering early moral concerns and behaviors, both the degree to which a "moral sense" is malleable and the long-term effects of early attempts at intervention remain unknown. Future work in this vein, coupled with the advancements presented in this collection, will help construct a more unified understanding of the origins and development of morality.

# AUTHOR CONTRIBUTIONS

KL drafted the editorial. JS and JH provided critical feedback. All authors contributed equally to editing this special issue.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Lucca, Hamlin and Sommerville. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Whistleblower's Dilemma in Young Children: When Loyalty Trumps Other Moral Concerns

Antonia Misch<sup>1</sup>\*<sup>†</sup>, Harriet Over<sup>2</sup> and Malinda Carpenter<sup>1†</sup>

*<sup>1</sup> Department of Developmental and Comparative Psychology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, <sup>2</sup> Department of Psychology, University of York, York, United Kingdom*

#### Edited by:

*Kelsey Lucca, University of Washington, United States*

#### Reviewed by:

*Xiao Pan Ding, National University of Singapore, Singapore*

*Ben Kenward, Oxford Brookes University, United Kingdom*

\*Correspondence: *Antonia Misch antonia.misch@psy.lmu.de*

#### †Present Address:

*Malinda Carpenter, School of Psychology and Neuroscience, University of St Andrews, St Andrews, United Kingdom*

*Antonia Misch, Department of Psychology, Ludwig Maximilian University, Munich, Germany*

#### Specialty section:

*This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology*

Received: *21 November 2017* Accepted: *15 February 2018* Published: *01 March 2018*

#### Citation:

*Misch A, Over H and Carpenter M (2018) The Whistleblower's Dilemma in Young Children: When Loyalty Trumps Other Moral Concerns. Front. Psychol. 9:250. doi: 10.3389/fpsyg.2018.00250*

When a group engages in immoral behavior, group members face the whistleblower's dilemma: the conflict between remaining loyal to the group and standing up for other moral concerns. This study examines the developmental origins of this dilemma by investigating 5-year-olds' whistleblowing on their in- vs. outgroup members' moral transgression. Children (*n* = 96) watched puppets representing their ingroup vs. outgroup members commit either a mild or a severe transgression. After the mild transgression, children tattled on both groups equally often. After the severe transgression, however, they were significantly less likely to blow the whistle on their ingroup than on the outgroup. These results suggest that children have a strong tendency to act on their moral concerns, but can adjust their behavior according to their group's need: When much is at stake for the ingroup (i.e., after a severe moral transgression), children's behavior is more likely to be guided by loyalty.

Keywords: intergroup cognition, group loyalty, morality, whistleblowing, social cognition

# INTRODUCTION

In recent years, high-profile cases of whistleblowing have garnered enormous public attention and caused controversy in politics and the international media. For example, the former CIA contractor Edward Snowden, who revealed top-secret information about surveillance programs run by the US National Security Agency, was reviled and lauded in equal measure for being a whistleblower. Whistleblowing is the disclosure of one's own group's transgressions with the intention of stopping the group's wrongdoing, which necessarily involves an act of disloyalty against the group (see Jubb, 1999). Whistleblowers thus experience a dilemma in which they have to decide whether to act on their feelings of group loyalty or on other moral principles (Waytz et al., 2013). According to Haidt and colleagues, loyalty is one of five moral foundations (Haidt and Joseph, 2007) and requires preferential treatment for members of one's own group. In contrast, other moral concerns, such as fairness and care, demand equal treatment for all (Haidt and Graham, 2007). Thus, loyalty can involve sacrificing other moral principles to protect the group, while whistleblowing involves privileging these other moral concerns over loyalty. The consequences of whistleblowing for both the group and the whistleblower can often be severe. The group may be punished externally, and the whistleblower may be punished by the group as a traitor, and maybe even excluded or banned.

Surprisingly, the conditions under which people decide whether to blow the whistle on their group have not been extensively investigated. Research with adults has examined the effects of factors such as the interests of the group and role responsibility (Trevino and Victor, 1992), or monetary incentives and legal protections (Oh and Teo, 2010). Other studies have focused on whistleblowing in interpersonal rather than group contexts (e.g., Gino and Bazerman, 2009; Bocchiaro et al., 2012; Waytz et al., 2013). Only a small amount of research has directly investigated the effects of morality and loyalty concerns on whistleblowing. A set of studies conducted by Waytz et al. (2013) suggests that participants' willingness to blow the whistle on another person is predicted by their endorsement of fairness over loyalty concerns. They also found that participants' willingness to blow the whistle decreases with closeness between the participant and the transgressor. It is not yet clear, however, what happens when loyalty and other moral concerns are directly pitted against each other in a group context. Furthermore, a common feature of previous research is that it has assessed participants' predictions of how they might act if faced with this dilemma. But evidence suggests that participants' predictions can diverge from their actual behavior. For example, in a study conducted by Bocchiaro et al. (2012), a large majority of participants predicted that they would blow the whistle on an unethical request, but only a small minority actually did so when put to the test, stressing the importance of investigating the whistleblowing dilemma in a behavioral set-up.

Developmental research has shown that both components of the whistleblower's dilemma, feelings of group loyalty and other moral concerns, are present early in childhood. At least by 5 years of age, children clearly value loyalty to the group: They favor loyal over disloyal group members (e.g., Abrams et al., 2003, 2008; Misch et al., 2014). They also show loyal behavior themselves, even when it is costly for them to do so (Misch et al., 2016). Young children are also sensitive to other basic moral principles. For example, from the age of 3 years, children actively intervene in moral transgressions in which a third party has been harmed (Rossano et al., 2011; Vaish et al., 2011), give more resources to an individual who behaved in a morally good way (Kenward and Dahl, 2011), and avoid helping people with harmful intentions (Vaish et al., 2010). They are also concerned with fairness, for example they prefer a fair to an unfair distributor in a third party context (e.g., Shaw et al., 2012).

However, to our knowledge, no study has directly investigated the conflict between loyalty and other moral considerations in an intergroup context in young children. The few studies that have investigated the related issue of the interplay between ingroup favoritism<sup>1</sup> and fairness in children have found mixed results. DeJesus et al. (2014) found that in a third-party context, at least from 6 years of age, children expect others to favor their ingroup, but evaluate fair distributions as nicer. However, when evaluating their own ingroup members' resource distributions between groups, Cooley and Killen (2015) found that 3.5- to 6-year-old children value fairness over group considerations, whereas Jordan et al. (2014) found that 6- and 8-year-old children tended to decide whether to punish unfair distributors based on group membership, and in doing so, sacrificed moral considerations that demand equal treatment for all (Rhodes and Chalik, 2013).

The studies that have come closest to investigating the conflict between loyalty and other moral concerns are studies on so-called "blue lies" (the opposite of whistleblowing), that is, lies that are told to protect someone else. Several studies have investigated children's evaluations of blue lies in story vignettes and found that with age, children evaluate blue lies to cover up the ingroup's transgression more positively (e.g., Sweet et al., 2010; Lau et al., 2013; Chiu Loke et al., 2014; Fu et al., 2016). To our knowledge, only one behavioral study has directly focused on children's blue lies by asking participants to report their own group's wrongdoing. Fu et al. (2008) tempted class groups of 7- to 11-year-old Chinese children to cheat in a competition by allocating more expert players to their team than were allowed. Afterwards, an uninvolved experimenter asked children in a confidential one-to-one situation whether their team really played by the rules. The majority of children confessed their team's transgression and thus acted according to their moral considerations rather than their feelings of loyalty. However, this study did not include an outgroup comparison, so it is not known whether children would have been even less likely to lie for an outgroup. It thus remains open how children would weigh moral and loyalty concerns when deciding what to do about an ingroup vs. an outgroup member's transgression.

A promising approach to study this conflict is to look at children's tattling behavior. The terms tattling and whistleblowing are often used interchangeably (see e.g., Waytz et al., 2013), but one important distinction will be made here: While tattling can be used rather generally and independently of group membership or affiliations (see e.g., Ingram and Bering, 2010), whistleblowing refers specifically to tattling about one's own organization or group (e.g., Jubb, 1999). For children, tattling is a frequent and natural way of dealing with others' transgressions and misbehavior. Young children do not perceive tattling as negative and thus frequently tattle on peers in school (Ingram and Bering, 2010), on their siblings (Den Bak and Ross, 1996), on puppets in experimental settings (Vaish et al., 2011; Schmidt et al., 2012), and even on adults' transgressions (Heyman et al., 2016).

To investigate the origins of the whistleblower's dilemma in young children, we thus studied children's tattling behavior. Children observed either ingroup or outgroup members commit a moral transgression. Afterwards, an uninvolved experimenter entered the room and gave children the chance to spontaneously tattle before asking more direct questions. We expected that children would be more likely to tattle on the outgroup's than on the ingroup's transgression, because previous developmental research has shown that young children are loyal to their groups (Misch et al., 2016) and research with adults has shown that the closeness of one's relationship to the transgressor is negatively correlated with the likelihood of blowing the whistle on him/her (Waytz et al., 2013). We chose to test 5-year-old children because this is the earliest age at which clear evidence exists that children both value loyalty to the group (Misch et al., 2014, 2016) and are concerned about moral transgressions (Blake and Harris, 2009; Rossano et al., 2011; Vaish et al., 2011).

<sup>1</sup>Note that loyalty to the group is more than simple ingroup preference, in that it entails a sense of commitment and often the willingness to sacrifice personal benefits for the sake of the group (see Brewer and Silver, 2000, p. 162).

Additionally, to investigate the conflict between loyalty and morality more deeply, we were also interested in the impact of the severity of the transgression. More specifically, we wished to examine whether and how loyalty and the severity of the moral violations would interact. Results from previous studies looking at children's evaluations of and reactions to different types of transgressions are mixed. One line of research has found that 4- to 7-year-old children endorse tattling on both major and minor transgressions equally, and only from around 8–9 years endorse tattling on major more than on minor transgressions (Lyon et al., 2010; Loke et al., 2011; Chiu Loke et al., 2014; Heyman et al., 2016). However, in these studies children were simply asked to evaluate or predict vignette story characters' tattling behavior. Behavioral studies that have investigated children's own behavior following different types of transgressions have found that by 3 years of age, children differentiate between severe moral transgressions and more minor conventional violations in that they protest more strongly when someone destroys the possession of another person compared to when someone plays a game incorrectly (Schmidt et al., 2012).

In the current study, the transgression was implemented in the form of a theft. Previous research has shown that from around 3 years of age, children understand the violation of property rights and protest against this (Rossano et al., 2011). At least by age 5, they understand the illegitimate nature of stealing (Blake and Harris, 2009). An advantage of using this type of transgression is that it allowed for a quantitative manipulation of severity: Children in the mild transgression condition observed two puppets take only a little bit of someone's possession (i.e., 1 out of 10 gemstones), while children in the severe transgression condition observed these puppets take nearly all of that resource (i.e., 9 out of 10 gemstones). For children for whom these two puppets were outgroup members (outgroup condition), we expected generally high levels of tattling in both transgression conditions (although they might tattle more in the severe transgression condition). Children should not feel any loyalty to the outgroup members, and therefore should act according to their moral considerations and, consequently, tattle. For children for whom the transgressors were ingroup members (ingroup condition), observing the mild vs. severe transgression should also elicit mild vs. severe moral considerations; however, in this case these considerations should conflict with loyalty considerations. We expected that children's feelings of group loyalty would make it more difficult for them to blow the whistle on their ingroup members. There were two different possible ways in which the severity of the transgression might influence their behavior.

The first possibility was that in the mild condition, compared to their feelings of loyalty, children's moral considerations should be relatively low, and consequently children might act according to their feelings of loyalty and keep quiet about their group's transgression. After the severe transgression, however, moral considerations should outweigh feelings of loyalty, and thus children might act on their moral considerations and blow the whistle on their group.

The social psychological literature with adults suggests a second possibility. According to a nonabandonment norm, group members should stick to their group in all circumstances (Zdaniuk and Levine, 2001), but especially in situations in which it is needed most (e.g., because the group is under threat; see Ellemers et al., 2002; Van Vugt and Hart, 2004). Indeed, some evidence supports the notion that threat to the group increases group cohesion or ingroup bias (Turner et al., 1984; Hunter et al., 2005), and that after undergoing negative experiences, group members feel more fused with each other (e.g., Jong et al., 2015) and show more pro-group behavior (Swann et al., 2014). If this is the case, then children should keep quiet after their own group's severe transgression, as otherwise the group members would have to face punishment or other negative consequences. After a mild transgression, in contrast, potential negative consequences should be relatively minor and not harm the ingroup much; therefore children could act according to their moral considerations and blow the whistle.

# METHODS

# Participants

Participants were 96 5-year-old children (48 girls and 48 boys; age range: 5 years, 27 days to 5 years, 9 months, 9 days; M = 5 years, 6 months). The number of participants (24 per condition) was specified in advance based on previous research (Misch et al., 2016). Twenty-two additional children were tested but excluded for failing one of the critical control questions that tested whether they understood the procedure (i.e., failing to correctly say which group they were in [1], failing to correctly say which group the transgressors were in [4], or failing to remember whether one vs. many gemstones were taken away [8]), or for experimenter error (5), not responding at all (1), leaving the room during the procedure (2), or naming one of the transgressors after herself (1).<sup>2</sup>

Children were recruited and tested in their daycare centers in a mid-sized city in Germany. The test session took approximately 20 min. No SES or ethnicity data were collected, but approximately 98% of the population from which the sample was drawn are native German. The study was developed and conducted in accordance with ethical guidelines and was approved by the institution's ethics committee (Max Planck Institute for Evolutionary Anthropology Child Subjects Committee).

# Materials

We used puppets as in- and out-group members because previous work has shown that children are willing to tattle on puppets' transgressions (e.g., Vaish et al., 2011). Children were tested by three female experimenters: a moderator (M) and two puppeteers (E1 and E2). Each puppeteer played one female and one male hand puppet (see **Figure 1**). The two puppets played by E1 were the transgressors. In the ingroup condition the child was allocated to the same group as the transgressors; in the outgroup condition the child was allocated to the other group.

<sup>2</sup>For 21 of these excluded children it was still possible to obtain a tattling score. Adding them to the main analysis did not change the results [full-null model comparison: χ<sup>2</sup>(3) = 8.04, p = 0.045; interaction between group membership and transgression type: Estimate = 2.84, SE = 1.14, χ<sup>2</sup>(1) = 7.08, p < 0.01].

A set of green and yellow scarves (two puppet-sized scarves and a child-sized scarf in each color; see **Figure 1**) were stored in a box with a lid. Ten fake red gemstones were used as spoils (see **Figure 2**). They were hidden in a small purse located on a box on the left side of the room (approximately 2 m away from the door).

There was a low cardboard barrier (30 cm in height) on the other side of the room. Thirty large marbles and a marble bag were used to keep children occupied and in place before and during the transgression, and a marble run was used for the preference test at the end.

# Design and Counterbalancing

Children were tested in a 2 × 2 between-subjects design. We manipulated the transgressors' group membership and the transgression type: Transgressors were either in the child's in- or out-group, and took either a little (only one out of 10 gemstones in the mild condition) or a lot (nine out of 10 gemstones in the severe condition).

Across children, we counterbalanced the color of the child's group (so that half of the children in each condition were in the yellow group, and the other half were in the green group) and the color of the transgressors' group (so that half of the time they were in the yellow group, and half of the time they were in the green group).
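The design and counterbalancing described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' allocation procedure: the function name `build_design`, the dictionary keys, and the alternating assignment of colors are assumptions made for the example. It relies on the 24 children per condition reported under Participants and shows that balancing the child's group color within each cell also balances the transgressors' group color.

```python
from collections import Counter
from itertools import product

GROUPS = ("ingroup", "outgroup")   # transgressors' group relative to the child
SEVERITIES = ("mild", "severe")    # 1 vs. 9 of the 10 gemstones taken
COLORS = ("yellow", "green")
N_PER_CELL = 24                    # children per condition (Participants section)


def build_design(n_per_cell=N_PER_CELL):
    """Return one dict per child slot, with group color balanced inside each cell.

    Hypothetical helper: alternating colors is just one simple way to reach
    the half/half split described in the text.
    """
    slots = []
    for group, severity in product(GROUPS, SEVERITIES):
        for i in range(n_per_cell):
            child_color = COLORS[i % 2]
            other_color = COLORS[(i + 1) % 2]
            # Ingroup transgressors wear the child's color, outgroup the other one.
            transgressor_color = child_color if group == "ingroup" else other_color
            slots.append({
                "group": group,
                "severity": severity,
                "child_color": child_color,
                "transgressor_color": transgressor_color,
            })
    return slots


design = build_design()
cell_sizes = Counter((s["group"], s["severity"]) for s in design)
child_colors = Counter((s["group"], s["severity"], s["child_color"]) for s in design)
transgressor_colors = Counter(
    (s["group"], s["severity"], s["transgressor_color"]) for s in design
)
print(len(design), dict(cell_sizes))
```

Because the transgressors' color is fully determined by the child's color and the group condition, no separate balancing step is needed for it in this sketch.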

# Procedure

Children were picked up from their classroom individually by all three experimenters. At the start of the procedure, there was a brief warm-up phase during which children became acquainted with the experimenters and the four puppets that would later be allocated to groups. First, the moderator (M) introduced the child to the puppets and then asked the puppets to introduce themselves. Following this, M asked the child and each of the puppets two questions to engage them in a brief conversation (e.g., about what they had had for breakfast, or which parent dropped them off at the daycare). This was done in order to make the child feel comfortable in the situation and to establish that the puppets should be treated as if they were real individuals around the same age as the child.

### Group Allocation

After the warm-up, M allocated the child and the four puppets to groups. She did this by saying, "Today, we need two different groups. We will have a yellow group and a green group. First of all, we need to know which group everyone belongs to." M then picked up the box and explained that in this box there were yellow and green scarves, and that she would now pull out one scarf for each of them, thereby finding out which group they belonged to. Then, one by one, she allocated each of the puppets and the child into groups by apparently randomly drawing yellow and green scarves out of the box and placing them on each individual's neck. Group allocation always started with one of the child's ingroup puppets, then proceeded to an outgroup puppet, then to the child, the other outgroup puppet, and finally the other ingroup puppet.

### Transgression

After the group allocation, M said that next they would need the marbles that were lying on the floor behind the low barrier in one corner of the room. She noticed that the marble bag was missing and asked the child to come with her to look for the bag outside of the room. This was an excuse so that E1 and E2 could leave the room unseen and wait in an adjacent room. When M and the child returned with the marble bag, M pretended to be surprised that the others were missing and asked the child to put all the marbles into the bag while she looked for the others outside. The task of putting the marbles into the bag was given to children so that they would be occupied with a simple activity on one side of the room, but would still be attentive enough to observe the transgression. While the child was busy picking up the marbles, the two puppets played by E1 entered the room. Depending on condition, they were either in the same group as the child (ingroup condition) or in the other group (outgroup condition). They recognized and greeted the child very briefly, before turning to each other and ignoring the child. The male puppet then said, "Look, there is the purse! Maybe there are gemstones in it again, and we could take some again!" To make sure that children understood that the puppets were not entitled to take the gemstones, the other puppet was skeptical and pointed out that the gemstones did not belong to them. In order to convey the idea that this was something this group did regularly, the first puppet said "But we are members of the yellow/green group, and the yellow/green group always does it like that!" The female puppet then replied, "Ok, then let's have a look. But let's be quick and quiet, so that no one will catch us!" They then opened the purse and admired the gemstones. Depending on the condition, they took either one (mild transgression) or nine of the ten gemstones (severe transgression). In the mild condition they said to each other, "Let's take only a little bit, only one gemstone. There are still many left, certainly no one will notice!" In the severe condition they said, "Let's take a lot of them, nearly all the gemstones. There is still one left, certainly no one will notice!" After they put the gemstone(s) into their purse, M called them from the outside, "[Transgressor puppets' names], where are you?" The puppets replied to M, "We are coming," and then said to themselves, "Let's leave quickly, so that no one will catch us!" Finally, before leaving the room, they asked the child to wait inside.

FIGURE 2 | The ten gemstones used as spoils.

### Tattling Opportunity

Then, M entered the room and gave the child the chance to tattle. In order to assess how quickly and spontaneously children tattled, she used a stepwise, ramping-up procedure with a 5-s pause in between each step to give children time to tattle. She started with very general comments (Step 0: "I'm back" and Step 1: "Is everything okay?"), and then gave some hints that something was amiss while looking at the bag (Step 2: "What is going on here?" and Step 3: "What did I miss?"). She then asked more directly about the bag (Step 4: "There is a bag. Someone must have forgotten it..." and Step 5: "The zipper is open. Maybe someone took something out?"). In the final step, she voiced direct suspicion of the puppets (Step 6: "I think I just saw [names of transgressor puppets] leave, maybe they took something?"). If the child did not respond at all during a given 5-s response period, or only said something unrelated (e.g., just talked about the marbles that they had picked up), M moved on to the next step. For children's statements to qualify as tattling, they had to make it clear that someone had taken something away (for step 6 it was sufficient if they confirmed M's suspicion by saying "yes"). If children only gave a hint of this, M further encouraged them by saying "Uh huh, tell me!" If children correctly described what had happened but failed to name the transgressor(s), M asked "And who?" Following that, children had another 5 s until, if needed, M moved on to the next step. To minimize social pressure on children, M looked only briefly at them and then continued to inspect the scene. Thus, children were free to remain silent.

### Post-Test Measures and Resolution

To explore the motivation underlying children's behavior, we asked them some post-test questions about their justification for and evaluation of the transgression, their judgment of the transgressors, their own accountability (only for tattlers), their loyalty, and their group preference. Because these questions were exploratory, we did not push children to answer if they did not respond. As a consequence the number of no answer responses was relatively high and the results should be taken with some caution. Furthermore, grouping children depending on their tattling behavior led to small and uneven sample sizes in the different cells. Thus, for most of the measures, statistical analysis was not appropriate; therefore we report these results in the Supplementary Material.

## **Memory questions**

After the first set of post-test questions (but before the loyalty question), in order to make sure that children had followed and understood the procedure, M asked children three memory questions: "Which group did the two who took the stones belong to?," "How many stones did they take, many or just a few?," and "Which group do you belong to?"

## **Resolution of the situation**

After M's loyalty question, before the preference test, the two transgressor puppets re-entered the room. They were clearly upset by their wrongdoing, confessed everything to M, and apologized. M explained that taking away others' belongings is not okay and made them promise never to do anything like this again. Then the other two puppets came back and everyone played together with a marble run. Finally, children were thanked and taken back to their classroom.

# Coding and Reliability

Our main interest was in whether children tattled on the puppets' transgression or not (saying, e.g., "They took the gemstones" or "I saw that they stole something"). Children's statements were coded as tattling if they made it clear that someone (e.g., "they," "the puppets," "the two," or using their names) had taken something away (e.g., "They took something," "They swiped the stones"). Only for Step 6 was it sufficient if they clearly confirmed M's suspicion by saying "yes".

In addition, for those children who tattled, we also investigated how quickly and spontaneously they tattled. For this analysis, children received a score between 0 and 7, corresponding to the step at which they tattled (e.g., they received 1 if they tattled at step 1, or 6 if they tattled at step 6). If they tattled before M's first hint, they received a 0, and if they did not tattle at all, they received a score of 7.
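The scoring rule above can be written out as a small Python sketch. The function names `tattling_score` and `tattled` are hypothetical, as is the convention of passing `None` for a child who never tattled; only the mapping itself (steps 0 to 6 map to scores 0 to 6, and no tattling maps to 7) comes from the text.

```python
from typing import Optional

LAST_STEP = 6        # Step 6: M names the transgressor puppets directly
NO_TATTLE_SCORE = 7  # assigned when the child did not tattle at all


def tattling_score(step_tattled: Optional[int]) -> int:
    """Map the step at which a child tattled (or None) to the 0-7 score."""
    if step_tattled is None:
        return NO_TATTLE_SCORE
    if not 0 <= step_tattled <= LAST_STEP:
        raise ValueError(f"step must be between 0 and {LAST_STEP}, got {step_tattled}")
    return step_tattled


def tattled(score: int) -> bool:
    """Binary tattling measure: any score below 7 counts as tattling."""
    return score < NO_TATTLE_SCORE


# Hypothetical coded steps for four children (illustration only, not study data).
coded_steps = [0, 3, None, 6]
scores = [tattling_score(step) for step in coded_steps]
speed_scores = [s for s in scores if tattled(s)]  # subset used for the speed analysis
print(scores, speed_scores)
```

Keeping the binary measure and the 0 to 6 speed measure as two views of one score mirrors how the two analyses draw on the same coding.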

The main coding was done by the first author. To assess inter-rater reliability, an independent coder who was unaware of the hypotheses of the study coded a random sample of 25% of children for both measures together from the videos. Reliability (Cohen's weighted kappa) was perfect with κ = 1.00.

# RESULTS

All statistical analyses were performed using R (R Core Team, 2014) version 3.2.0. Significance of the models was tested using both likelihood ratio tests (LRT), by comparing the fit of the full model with that of the respective reduced models, and the p-values provided by the final model.

A preliminary analysis revealed no effects of children's gender or color group on the main results regarding children's tattling (Generalized Linear Model, full-null model comparison, p > 0.25). Therefore, we collapsed across these variables and do not consider them further.

Our main interest was in how many children tattled about the puppets' transgression at any point during the test phase. Overall, across all four conditions, the majority of children tattled (82.3%), suggesting a general concern for harm. **Figure 3** depicts the percentage of children who tattled in each condition for each transgression type.

A GLM was run with group membership and transgression type as predictors, and the binomial measure of tattling (yes or no) as response variable. The full model differed significantly from the null model [χ<sup>2</sup>(3) = 8.14, p = 0.043] and revealed a significant interaction between group membership and transgression type [Estimate = 3.05, SE = 1.36, χ<sup>2</sup>(1) = 6.26, p = 0.012, Nagelkerke's R<sup>2</sup> = 0.11]. Post-hoc tests revealed that children in the ingroup condition were significantly less likely to tattle on a severe transgression than were children in the outgroup condition (Fisher's exact test, p = 0.023, risk ratio = 2.17); all other pairwise comparisons were non-significant (Fisher's exact tests, all p's > 0.16).

To investigate whether the conditions had an effect on how quickly children tattled, we ran a GLM with Poisson distribution only on children who had tattled at some point (n = 79), with children's tattling score (0–6) as the response variable. The full model did not differ from the null model (p = 0.21), indicating that the conditions had no significant effect on how quickly children tattled (see **Figure 4**).

# DISCUSSION

The aim of this study was to examine the whistleblower's dilemma: the conflict between feelings of loyalty and other moral concerns. This was done by looking at children's willingness to blow the whistle on their in- vs. out-group members' mild vs. severe transgression. An interesting pattern of results emerged: Rather than simply tattling more on outgroup members across the board, children showed a complex weighting of loyalty and moral considerations. After the mild transgression, children acted on their moral considerations: They tattled on both groups at equally high rates. After the severe transgression, however, they were significantly less likely to blow the whistle on their ingroup than on the outgroup, suggesting that children's feelings of loyalty to the group sometimes outweighed other moral considerations. Consistent with the idea of a nonabandonment norm, these results support the notion that group loyalty becomes most important when much is at stake for the group, that is, that one should show the strongest loyalty when the group is under threat (Ellemers et al., 2002; Van Vugt and Hart, 2004). These results therefore suggest that young children are already capable of flexibly weighing moral and loyalty considerations and, in some cases, are willing to sacrifice their moral principles for the sake of their group.

There are a number of possible motivations that could have been underlying children's loyal behavior. We aimed to investigate these with the questions we asked children following the tattling phase. Unfortunately, these findings were underpowered due to low response rates (see Supplementary Material). Still, some of our and others' findings can shed light on the possible underlying motivations. For example, it is possible that children generally perceive transgressions of their ingroup members as less severe than transgressions of their outgroup members and that this led them to blow the whistle less often in the ingroup condition. Previous research has shown that children are more forgiving and forgetful when it comes to the negative behavior of their ingroup members (Corenblum, 2003; Dunham et al., 2011). However, in the mild condition of the current study children were equally likely to blow the whistle on their in- and out-group members, suggesting that this factor alone cannot explain the observed results. Another possibility is that children might have wanted to protect their ingroup from the potential negative consequences (e.g., punishment) of their whistleblowing and/or avoid being punished themselves. Previous work has shown that children feel responsible for their group members' negative actions (Over et al., 2016), and consequently children's feelings of shame or embarrassment about their group members' transgression might have decreased their whistleblowing in the ingroup severe condition. Relatedly, some children might have been shocked about their group's transgression and thus too preoccupied to speak out about it. Future research should thus investigate the role of moral emotions such as guilt and embarrassment, and also the fear of negative consequences in the context of loyal behavior. Another potential reason for children's increased loyalty after the severe transgression is the fact that the group was now in possession of the stolen goods. Previous research has shown that children prefer wealthy over less wealthy groups (Horwitz et al., 2014) and show more loyalty to groups that are of high status (e.g., Nesdale and Flesser, 2001). However, children's justifications for why they wanted to stay in or leave their group suggest that this was not the reason for their choice in this situation: No child ever justified their choice to stay with or join the transgressors' group by mentioning the group's wealth, higher status, or the possession of the stones more generally, while the transgression was a common reason for joining or not leaving the non-transgressors' group.

In future research, it would be informative to investigate in which situations children are willing to override their moral concerns in order to remain loyal. It would be interesting to look at children's loyalty after even more different types of transgressions, including a wider range of severity and different kinds of moral violations. Also, if children were asked to choose between internal within-group protest (i.e., scolding and correcting ingroup members privately) and external tattling (i.e., telling someone outside the group more publicly), would their choice depend on their group membership?

In summary, our findings suggest that both loyalty and other moral considerations guide 5-year-old children's behavior. When moral concerns are relatively low, children act freely on them by tattling on the outgroup and even blowing the whistle on their own group. In contrast, when moral concerns increase, children's behavior is guided by their loyalty: They tattle freely on their outgroup, but are less likely to blow the whistle on their own group. Thus, already by 5 years of age, children consider both loyalty and other moral concerns together, and adapt their behavior flexibly. Even though they clearly understood the negative nature of the transgression, they were willing to sacrifice their personal moral concerns for the sake of their group. This is an interesting finding, given the fact that from very early on, children show a strong appreciation for key moral domains such as care and fairness (e.g., Hamlin et al., 2007; Hamlin and Wynn, 2011; Vaish et al., 2011), while robust preferences for minimal ingroups and clear loyal behavior do not appear much before the age of five (Dunham et al., 2011; Misch et al., 2016). Thus, right around the time that loyalty to the group first appears in ontogeny, it can already have a dark side, overriding other moral concerns. This can lead to rather undesirable behavior on the one hand, for example when it results in concealing moral transgressions of the ingroup. However, from the perspective of the group, it may be seen as desirable in that it helps ensure from early on that group members are trustworthy and protective of their group and thus that they can be counted on when most needed.

# AUTHOR CONTRIBUTIONS

All authors developed the study concept and design. Testing, data collection, and data analysis were performed by AM, who also drafted a first manuscript. MC and HO provided critical revisions. All authors approved the final version of the manuscript for submission.

# REFERENCES


# ACKNOWLEDGMENTS

We thank Melissa Cherouny, Luise Hornhoff, Claudia Salomo, Monique Horstmann, and Lara Wintzer for help with data collection and coding; Roger Mundry for statistical advice; and the day care centers and children for their friendly cooperation. This study was conducted as part of AM's dissertation (Misch, 2015).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00250/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Misch, Over and Carpenter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Do Infants Attribute Moral Traits? Fourteen-Month-Olds' Expectations of Fairness Are Affected by Agents' Antisocial Actions

#### Luca Surian<sup>1</sup> \*, Mika Ueno<sup>2</sup>, Shoji Itakura<sup>2</sup> and Marek Meristo<sup>2,3</sup>

*<sup>1</sup> Department of Psychology and Cognitive Sciences, University of Trento, Trento, Italy, <sup>2</sup> Department of Psychology, Kyoto University, Kyoto, Japan, <sup>3</sup> Department of Psychology, University of Gothenburg, Gothenburg, Sweden*

We investigated whether and how infants link the domains of harm, help and fairness. Fourteen-month-old infants were familiarized with a character that either helped or hindered another agent's attempts to reach the top of a hill. Then, in the test phase they saw the helper or the hinderer carrying out an equal or an unequal distribution toward two identical recipients. Infants who saw the helper performing an unequal distribution looked longer than those who saw the helper performing an equal distribution, whereas infants who saw the hinderer performing an unequal distribution looked as long as those who saw the hinderer performing an equal distribution. These results suggest that infants linked the hindering actions to a diminished propensity for distributive fairness. This provides support for theories that posit an early emerging ability to attribute moral traits to agents and to generate socio-moral evaluations of their actions.

# RESEARCH HIGHLIGHTS


#### Keywords: infants, fairness, distributive justice, moral development, moral judgment

#### Edited by:

*Kelsey Lucca, University of Washington, United States*

#### Reviewed by:

*Erik Cheries, University of Massachusetts Amherst, United States Yuyan Luo, University of Missouri, United States*

> \*Correspondence: *Luca Surian luca.surian@unitn.it*

#### Specialty section:

*This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology*

Received: *20 May 2018* Accepted: *17 August 2018* Published: *07 September 2018*

#### Citation:

*Surian L, Ueno M, Itakura S and Meristo M (2018) Do Infants Attribute Moral Traits? Fourteen-Month-Olds' Expectations of Fairness Are Affected by Agents' Antisocial Actions. Front. Psychol. 9:1649. doi: 10.3389/fpsyg.2018.01649*

The developmental origins of socio-moral evaluations may be seen well before the kindergarten age. At 15–24 months, infants prefer agents that distribute resources equally, rather than unequally (Geraci and Surian, 2011; Burns and Sommerville, 2014; Surian and Franchin, 2017a) and by 9–10 months they expect resources to be distributed equally (Meristo et al., 2016; Ziv and Sommerville, 2017). Infants look longer when they see an agent distributing the available resources unequally rather than equally to similar recipients (Sloane et al., 2012; Sommerville et al., 2013), and by 24 months their expectations are guided by agents' merit and group membership (Sloane et al., 2012; Surian and Franchin, 2017b; Bian et al., 2018) and are associated with their altruistic sharing of a preferred toy (Schmidt and Sommerville, 2011; Ziv and Sommerville, 2017; Sommerville, 2018). Infants' reactions to distributive events are not due to perceptual factors or expectations about agents' affiliative behavior (Meristo et al., 2016). They expect fair and unfair agents to be differently praised and admonished (DesChamps et al., 2016) or rewarded and punished (Meristo and Surian, 2013, 2014).

The early emergence of moral cognition is also revealed by studies on how infants react to agents that help or hinder others. Infants were presented with animated scenarios or stage shows with agents that helped or hindered others' attempts to reach the top of a hill (e.g., Kuhlmeier et al., 2003; Hamlin et al., 2007, 2013), or live puppet shows involving a character that helped or prevented another agent from opening a box, or that returned a ball to someone who dropped it, rather than running away with it (e.g., Hamlin and Wynn, 2011; Hamlin, 2013). Infants' preference for helpers has been found in many studies, using a variety of stimuli and procedures (for a meta-analysis, see Margoni and Surian, 2018). They also spontaneously help others, suggesting an understanding of and concern for others' goals, intentions and needs (e.g., Dunfield and Kuhlmeier, 2010).

While there is now a growing body of evidence revealing infants' ability to evaluate hindering, helping and distributive actions, we do not know yet if they are also able to represent any link between these domains of actions. In this study, infants first saw agents that helped or hindered another agent and then saw the helper or the hinderer performing either a fair or an unfair distribution. If infants take agents' helping or hindering behavior as a cue to their propensity to be fair or unfair, they should react differently to the distributive actions performed by the helper and the hinderer.

# METHODS

# Participants

Thirty-two healthy full-term infants from Japanese-speaking families participated (age range: 14 months 19 days to 15 months 25 days; M = 15 months 4 days; 16 female, 16 male). An additional 8 infants were tested but excluded from the sample because they were inattentive during the familiarization phase (n = 4), inattentive during the test phase (n = 3), or because of technical error (n = 1). Sample size was chosen to match most previous relevant studies. The infants recruited for the experiment were registered with the Kyoto University Infant Research Fellow Program. We contacted their parents, explained the outline and main purpose of the experiments to them, and obtained their written consent. The study was conducted according to the Code of Ethics and Conduct of the Japanese Psychological Association.

# Materials and Procedure

During the test session infants were seated on the parent's lap in a dimly lit and quiet booth 50–70 cm away from a 17-in. monitor used to display the stimuli. The caretakers were asked to turn their head away from the screen and not to communicate with the infants during the testing. Infants' looking behavior was recorded and analyzed using a Tobii T60 (Tobii Technology, Sweden) corneal reflection near infrared Eye Tracker. Each testing session began with a 5-point infant calibration procedure.

Sixteen infants were assigned to each of two conditions, the helper and hinderer conditions. In both conditions infants saw four familiarization events followed by one test event. In the familiarization phase all infants saw two "helper events" and two "hinderer events." In the helper familiarization events, infants first saw an agent, the "climber," entering the screen from the right side at the bottom of a hill (see **Figure 1**). The climber then climbed to the lower plateau, and rotated itself slightly for 2 s, then attempted twice to reach the upper plateau, each time falling back to the lower plateau. Then the helper entered the display from the lower right, moved up the incline and bumped the climber twice, each time pushing it farther up until the climber reached the upper plateau. The climber then remained stationary at the top of the hill, while the helper moved back to the bottom of the hill and left the screen. In the "hinderer familiarization event," the hinderer entered from the upper left, moved downward and bumped the climber twice, each time pushing it farther down. The climber then remained stationary, while the hinderer moved back on top of the hill and then exited.

The familiarization phase was followed by the test phase. In the helper condition infants saw the helper distributing two strawberries to two identical green star-shaped characters (see **Figure 2**). The test event started with the two stars present, one on the left side and one on the right side in the upper part of the screen. Then, the helper entered from the right or the left side carrying two red strawberries and gave them to the stars. Half the infants saw an equal distribution and the other half saw an unequal distribution. At the end the helper stayed in the middle of the screen.

The hinderer condition started with the same familiarization phase used in the helper condition, but in the test phase the distributor of the strawberries was the hinderer instead.

We fully counterbalanced across the participants: (1) identity of the helper/hinderer (circle vs. triangle), (2) order of familiarization events (Help-Hinder-Hinder-Help vs. Hinder-Help-Help-Hinder), (3) side of the delivery of the first strawberry in the test event (Left vs. Right), and (4) test event (Equal vs. Unequal distribution), resulting in 16 different sessions. Infants had to follow at least three familiarization events to be included in the final analyses. The dependent measure was the time the infant spent looking at the still picture at the end of the test movie, until he or she looked away for at least 2.5 consecutive s, after having looked for at least 2.5 s.
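The counterbalancing scheme described above can be checked by enumeration. The following Python sketch is illustrative only: the dictionary keys and level labels are assumed names, not taken from the study materials. It crosses the four two-level factors and counts the resulting session types.

```python
from itertools import product

# Four two-level factors as described in the text.
# Key names and level labels are illustrative assumptions.
factors = {
    "helper_identity": ["circle", "triangle"],
    "familiarization_order": [
        "Help-Hinder-Hinder-Help",
        "Hinder-Help-Help-Hinder",
    ],
    "first_strawberry_side": ["left", "right"],
    "test_event": ["equal", "unequal"],
}

# Full crossing: each combination of levels defines one session type.
sessions = [
    dict(zip(factors, levels)) for levels in product(*factors.values())
]

print(len(sessions))  # 2 * 2 * 2 * 2 = 16
```

A full crossing of this kind ensures that each level of every factor appears equally often with each level of the others.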

# RESULTS

Preliminary analyses assessed the effects of order of familiarization events (Help, Hinder, Hinder, Help vs. Hinder, Help, Help, Hinder) and identity of the helper and the hinderer (Square vs. Triangle), and found that they had no main effect on looking times at the test trials, nor was there a significant interaction between these factors and the type of test event (equal vs. unequal distribution).

Looking times in the final test event were analyzed in a 2 × 2 ANOVA with condition (helper or hinderer) and test event (equal or unequal) as between-subject factors. The analyses showed a main effect for test event, F(1, 31) = 7.08, p = 0.013, η<sup>2</sup> = 0.20, and a significant condition × test event interaction effect, F(1, 31) = 10.00, p = 0.003, η<sup>2</sup> = 0.28.

FIGURE 2 | Selected frames from the test events with equal or unequal distribution.

Planned contrasts revealed a significant difference in the helper condition, with longer looks at unequal test events (M = 26.60 s, SD = 9.94) compared to equal test events (M = 9.35 s, SD = 4.92), t(14) = 4.40, p = 0.001, η<sup>2</sup> = 0.58, but not in the hinderer condition (Equal: M = 17.92 s, SD = 10.76; Unequal: M = 16.04 s, SD = 5.27), t(14) = 0.45, p = 0.663, η<sup>2</sup> = 0.01.
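The effect sizes reported for the two contrasts are consistent with the standard conversion from a t statistic, η<sup>2</sup> = t<sup>2</sup> / (t<sup>2</sup> + df). The short Python sketch below applies that conversion to the reported values. It is a plausibility check under the assumption that this formula was used, and the function name is illustrative.

```python
def eta_squared_from_t(t: float, df: int) -> float:
    """Convert a t statistic to eta squared: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Reported contrasts: t(14) = 4.40 (helper) and t(14) = 0.45 (hinderer).
helper = eta_squared_from_t(4.40, 14)
hinderer = eta_squared_from_t(0.45, 14)

print(round(helper, 2), round(hinderer, 2))  # 0.58 0.01
```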

# DISCUSSION

Infants were first presented with agents that carried out either a helping or hindering action and then they saw the same agents performing a fair or an unfair distribution. We found that infants looked longer at the unfair compared to the fair distribution performed by the helper, but looked equally long at the equal and unequal distributions performed by the hinderer.

This suggests that in the helper condition infants generated and maintained the default expectations about agents' fairness that have been shown in several previous studies (e.g., Schmidt and Sommerville, 2011; Sloane et al., 2012; Meristo et al., 2016; Ziv and Sommerville, 2017), but they canceled such expectations in the hinderer condition. The fact that they looked equally long at the two types of distributions performed by the hinderer suggests that they did not generate an expectation opposite to the default expectation. At present, we do not know why this is the case. We suggest that infants may refrain from generating negative expectations about agents' future actions, and that this bias could be the root of a phenomenon recently discovered in the adult literature, namely the bias to represent agents as possessing morally virtuous selves (De Freitas et al., 2017). This also gives rise to an alternative explanation for the present results: suppose that infants expected, by default, that agents would act helpfully toward other agents. When they saw the helper, infants left their default expectations unchanged. By contrast, when they saw the hinderer, they may have tagged that agent as one that behaves inconsistently. This alternative account differs from the one we proposed at the beginning because it is not committed to inconsistency just in morally valenced behavior, but in behavior more generally.

The present results support the claim that infants may be able to attribute a goodness trait linking the domains of fairness and prosocial actions. An ERP study that employed the same stimuli used here suggests that this tendency is preserved in adults (Ishikawa et al., 2017).

How deep and stable is this representation? One possibility, the "early concept view," is that infants have already developed a rudimentary concept of good agent that includes features about agents' helping attitudes as well as their propensity to behave fairly (Uhlmann et al., 2015). An alternative possibility, the "simple mismatch view," claims that the present results were simply driven by the mismatch in the values attributed to the actions performed by the helping agent in the familiarization and test phases, with no role played by prior expectations about how a helping agent will or will not behave in a distributive context. The present findings are consistent with both of these interpretations. Note, however, that the simple mismatch view would predict significant results also in the hinderer condition. By contrast, the early concept view does not make such a prediction, since the features used to diagnose agents' goodness and badness are different. Also, in both accounts the underlying processes require an attribution of opposite values to helping and unfair actions, consistent with current proposals on infants' sociomoral competence (Baillargeon et al., 2015).

Further studies are needed to see whether the present results generalize to other instances of helping/hindering actions and whether infants' inferences can run from observing distributive behavior to expecting helping or hindering actions. It would also be interesting to test whether the same results can be found if infants, in the familiarization phase, do not see both a helper and a hinderer, but just one of these two agents. This would help to decide whether they need to see both types of characters in order to evaluate them and generate behavioral expectations. Another crucial goal for future studies would be to investigate the duration of memories about agents' pro- and anti-social tendencies.

The ability to rely on information about agents' hindering or helpful actions to generate expectations about their distributive fairness has, potentially, far-reaching implications. Most importantly, it suggests that infants display an early ability to attend to and evaluate actions in order to construct a stable socio-moral representation of agents. This ability may provide the initial basis for the acquisition of an explicit conception of moral goodness.

# ETHICS STATEMENT

The study was conducted according to the Code of Ethics and Conduct of The Japanese Psychological Association. The research project was approved by the Ethical Committee of the University of Trento. The parents of the infants who participated in the experiment gave their written consent.

# AUTHOR CONTRIBUTIONS

LS, MM, and SI designed the study; MM prepared the experimental materials; MM and MU carried out the data collection and the statistical analyses; LS and MM wrote the first draft of the manuscript; SI and MU provided revisions.

# ACKNOWLEDGMENTS

We would like to thank all the parents and babies that took part in this study. This research was supported by grants from Japanese Society for Promotion of Science (JSPS, 25245067 & 25240020) to SI and by a grant from the Italian Ministry of Education, University and Research (2009LNJ2AP).

# REFERENCES

Ziv, T., and Sommerville, J. A. (2017). Developmental differences in infants' fairness expectations from 6 to 15 months of age. Child Dev. 88, 1930–1951. doi: 10.1111/cdev.12674

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Surian, Ueno, Itakura and Meristo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Preschoolers Favor Their Ingroup When Resources Are Limited

Kristy Jia Jin Lee<sup>1</sup>, Gianluca Esposito<sup>1,2</sup> and Peipei Setoh<sup>1</sup>\*

<sup>1</sup> Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore, <sup>2</sup> Department of Psychology and Cognitive Science, University of Trento, Trento, Italy

The present study examined how 2- to 4-year-old preschoolers in Singapore (N = 202) balance fairness and ingroup loyalty in resource distribution. Specifically, we investigated whether children would enact fair distributions as defined by an equality rule, or show partiality toward their ingroup when distributing resources, and the conditions under which one distributive strategy may take precedence over the other. In Experiment 1, children distributed four different pairs of toys between two puppets. In the Group condition, one puppet was assigned to the same group as the child while the other puppet was assigned to a different group using colored stickers. In the No Group condition, no group assignments were made. Children's distributions were assessed for whether the toys were fairly (equally) distributed or unfairly (unequally) distributed in favor of either puppet. Experiment 2 was identical to the Group condition in Experiment 1, except that a third identical toy was introduced following the distribution of each toy pair. Distributions were separately assessed for whether the first two toys were fairly (equally) distributed or unfairly (unequally) distributed in favor of either puppet, and whether children distributed the third toy to the ingroup or outgroup puppet. Overall, the vast majority of children abided by an equality rule when resources were precisely enough to be shared between recipients, but distributed favorably to the ingroup member when there was limited resource availability. We found that fairness trumped ingroup loyalty except in resource distribution involving limited resources. Our results are consistent with findings from other resource distribution studies with preschoolers and similar studies measuring young infants' expectations of distributive behaviors in third-party observations. Taken together, there is evidence suggesting stability in the development of knowledge to behavior in the subdomains of fairness and ingroup loyalty.

Keywords: fairness, ingroup loyalty, resource distribution, moral cognition, early childhood

#### Edited by:

J. Kiley Hamlin, University of British Columbia, Canada

#### Reviewed by:

Nadia Chernyak, Boston College, United States

Peter R. Blake, Boston University, United States

#### \*Correspondence:

Peipei Setoh psetoh@ntu.edu.sg

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 18 May 2018 Accepted: 30 August 2018 Published: 19 September 2018

#### Citation:

Lee KJJ, Esposito G and Setoh P (2018) Preschoolers Favor Their Ingroup When Resources Are Limited. Front. Psychol. 9:1752. doi: 10.3389/fpsyg.2018.01752

# INTRODUCTION

Two fundamental motivations underlie children's decisions about resource distribution: fairness and ingroup loyalty. Fairness and ingroup loyalty represent central themes in human evolutionary history (Choi and Bowles, 2007). Fairness as a guiding principle has shaped many human communities, ranging from food-sharing practices in hunter-gatherer settlements to egalitarian sentiments in contemporary societies (Fehr et al., 2008). At the same time, ingroup loyalty is visible in many spheres of social life and encompasses biases such as favoring one's group member in economic decisions, and in its extreme manifestation, is reflected in prejudice and gross discrimination against people who do not share the same group identity as oneself (Everett et al., 2015).

Researchers have proposed a principle-based conception of moral reasoning built on innate, domain-specific moral knowledge (Haidt and Joseph, 2004; Premack, 2007; Baillargeon et al., 2014). According to this view, fairness and ingroup loyalty are core principles in human moral cognition. In the realm of developmental research, young children are thought to simultaneously weigh fairness considerations against social obligations toward their ingroup (Killen et al., 2006; Rutland et al., 2010). Extant literature suggests that infants as young as 2 years old possess a rudimentary understanding of moral principles that dictate fair and loyal behaviors. With relation to the fairness principle, a commentary by Sommerville (2018) highlighted cumulative evidence pointing to the early development of distributive fairness, in that infants expect fair resource distributions and evaluate agents according to the fairness of their distributions. With relation to the ingroup principle, infant studies have documented third-party expectations of ingroup support, such as the obligation to help and allocate limited resources to the ingroup (Jin and Baillargeon, 2017; Bian et al., 2018).

In the present study, we looked at how 2- to 4-year-old preschoolers balance concerns about fairness and ingroup loyalty in the context of resource distribution. Specifically, we investigated whether 2- to 4-year-old preschoolers would enact fair distributions as defined by equal distributions, or show partiality toward their ingroup when distributing resources, and the conditions under which one distributive strategy may take precedence over the other. Recent studies examined young infants' rank-ordering of fairness and ingroup loyalty in distribution scenarios where the two principles lead to opposing outcomes (e.g., Bian et al., 2018); however, these studies focus on expectations about distributive behaviors in third-party observations. Studies with older children, who can themselves participate in resource distribution, will help to shed light on the extent to which the same trends generalize from knowledge in early infancy to behavior later in development.

On the one hand, children demonstrate a strong preference for fairness. Preverbal infants expect others to act fairly (Schmidt and Sommerville, 2011; Sloane et al., 2012), and select fair distributors as social partners (Lucca et al., 2018). Sensitivities to fairness continue to strengthen over the course of development (Geraci and Surian, 2011; Sommerville et al., 2013; Deschamps et al., 2015; Ziv and Sommerville, 2016). By around 3 years of age, children react emotionally to unequal resource distributions (LoBue et al., 2011), identify egalitarian sharing as what one should do (Smith et al., 2013), allocate rewards based on an equality rule (Thomson and Jones, 2005), and negatively judge inequitable resource allocations (McCrink et al., 2010). An altruistic tendency to uphold fairness emerges at around 5 years of age, as children will protest unequal distributions of earnings from a joint effort, regardless of whether the affected individual is oneself or a third party (Rakoczy et al., 2016). Older children also enforce fairness at a cost to themselves, choosing to share their resources equally (Smith et al., 2013), sacrificing gains to punish selfish resource allocations (Jordan et al., 2014), discarding resources to avoid unequal distributions (Shaw and Olson, 2012), and endorsing resource allocations that are free of inequality, even when the inequality is beneficial to themselves (Fehr et al., 2008; Blake and McAuliffe, 2011; McAuliffe et al., 2013). Furthermore, in a study by McAuliffe and Dunham (2017), 6- to 10-year-olds proposed equal resource splits and rejected unequal offers in an ultimatum game, both when the other player was an ingroup member and when the other player was an outgroup member. Given that the enforcement of fairness norms in resource sharing was largely unaffected by group membership, this finding suggested that fairness may trump group loyalty in resource-related decisions.

On the other hand, children's fairness preferences are heavily modulated by group membership. Young children exhibit biases favoring those of the same race (Renno and Shutts, 2015; Qian et al., 2016), those of the same gender (Weller and Lagattuta, 2014), and those who speak the same language (Kinzler et al., 2007; Pun et al., 2018). In third-party distribution tasks, 3- to 6-year-old children were found to place ingroup loyalty before fairness by distributing resources more favorably to family or friends than to strangers (Olson and Spelke, 2008; Moore, 2009), and they similarly expect others to share more with friends than with disliked peers (Paulus and Moore, 2014). Shaw's (2013) partiality account of resource distribution further postulates that one may use resource sharing as a cue to infer the strength of the social relationship between distributor and recipients. In line with this account, 4- to 9-year-olds expected a distributor to be better friends (thus stronger ingroup status) with a recipient to whom the distributor had allocated a larger quantity of desirable items compared to another recipient who received a smaller allocation (Liberman and Shaw, 2017). Additionally, various studies highlight an interplay between group affiliation and fairness expectations. An aversion to behaviors that perpetuate inequality was greater when the victim belonged to the child's ingroup than when the victim was an outgroup member (Fehr et al., 2008; Elenbaas et al., 2016); also, a social preference for fair distributors was observed only when the fair distributor was from a racial ingroup and when the disadvantaged recipient was of an outgroup race (Burns and Sommerville, 2014).

Notably, there is evidence of cross-cultural variation in children's fairness concerns during resource distribution. A study by Blake et al. (2015) found that by the age of 9–10 years old, children in Western societies began to abide by stringent fairness criteria which led them to reject even resource inequity that was advantageous to themselves, but this developmental trend was not observed in non-Western societies, where children would only reject disadvantageous resource inequity. In a separate study, Ugandan children chose to distribute an uneven number of items unequally between two anonymous recipients, in contrast to children in the United States who would rather throw the odd item away to maintain equality, revealing yet another cross-cultural difference in fairness concerns (Paulus, 2015).

More interestingly, children's perception of fairness appears to differ across cultures. While 4- to 6-year-old preschoolers in China preferred equal distributions over distributions that showed a consideration of recipient need (Chai and He, 2017), children in the United States prioritized recipient need. Five- to six-year-old American preschoolers gave more resources to poorer recipients than to wealthy recipients who already had plentiful resources (Paulus, 2014; Elenbaas et al., 2016; Rizzo and Killen, 2016), suggesting that their concept of fairness encompassed rectifying existing inequalities by favoring the recipient with greater need for the resource. Similarly, while an equality preference dominated African children's distribution of spoils following a collaborative effort, children from Western societies distributed spoils from a collaborative effort unequally depending on the amount of contribution from each recipient (Schäfer et al., 2015), thereby indicating different levels of attention to merit in their notions of fairness.

The individualism-collectivism cultural distinction also contributes to the relative weight accorded to fairness versus ingroup loyalty. Fairness is classified as an "individualizing" principle that promotes the well-being of individual agents, while ingroup loyalty is classified as a "binding" principle that places collective group interests at the forefront, sometimes at the expense of those who exist outside a restricted social circle (Graham et al., 2009). In adult studies, people from Eastern countries were found to prioritize binding principles which support group interests over individualizing principles which cater to individual welfare, and rated transgressions related to ingroup loyalty as higher on moral relevance than people from Western countries (Graham et al., 2011; Kim et al., 2012).

While current research on resource distribution has primarily targeted children living in homogeneous populations, these findings do not accommodate the full range of experiences encountered by children living in more diverse populations. Singapore, the testing ground for the present study, is positioned at the cultural crossroads of the East and West, receiving strong influences from both individualistic values of fairness and collectivistic values of ingroup loyalty (Tan and Farley, 1987). The Singapore population is multi-ethnic, consisting of about 74.3% Chinese, 13.4% Malay, 9.0% Indian, and 3.2% other ethnicities (Singapore Department of Statistics, 2017), with most children raised as simultaneous bilinguals proficient in English and a mother tongue. As such, there are significant deviations in Singapore's sociocultural circumstances from her Asian counterparts, rendering generalizations based on an East–West dichotomy less likely to be germane to Singapore. For instance, the multicultural community in Singapore comprises diverse ethnic groups living in harmony, supported by social policies that enforce norms of equality, inclusivity, and intergroup camaraderie. Since these aspects of the social environment could influence the development of egalitarian and parochial motivations, the question of how children in Singapore navigate concerns about fairness and ingroup loyalty warrants investigation.

In the present study, 2- to 4-year-old preschoolers in Singapore participated in an intergroup resource distribution task. The choice of sampling 2- to 4-year-olds was motivated by their ability to provide behavioral data, even though they are younger than what has been studied in the majority of work on behavioral equality. Children in this age group in Singapore have started to attend preschool and are thus regularly exposed to the dynamics of peer interactions which foster an appreciation of fairness and ingroup loyalty; in addition, classroom play often involves sharing toys, hence these children are well-acquainted with the act of giving and receiving resources. Moreover, children from 3 years of age have been found to engage in behavioral sanctions of harm transgressions (e.g., Vaish et al., 2011), suggesting that they not only understand moral concepts but are capable of acting in ways which reflect such an understanding.

Minimal groups were assigned to children and two animal puppets using colored stickers, such that one puppet belonged to the same group as the child (ingroup), while the other puppet belonged to a different group (outgroup). This minimal group paradigm has been successfully employed in past studies: in a study that utilized shirt color as the basis of group categorization, 5-year-olds displayed ingroup favoritism on a range of behavioral measures including explicit and implicit attitudes, expectations of reciprocity, and encoding of positive information (Dunham et al., 2011).

Participants in Experiment 1 were tasked to distribute different pairs of toys between the two puppets on four test trials. Toy distribution was compared between a Group condition and a No Group condition in which no groups were assigned, and distributions were assessed for whether the toys were fairly (equally) distributed or unfairly (unequally) distributed in favor of either puppet. Experiment 2 was identical to the Group condition of Experiment 1, except that we introduced a third identical toy following the distribution of each toy pair. This two-part distribution task allowed us to determine whether children would show ingroup loyalty when given the option to distribute a single limited resource to either an ingroup or an outgroup member.

# EXPERIMENT 1

# Methods

#### Participants

Participants were 92 typically developing children (43 males; Mean age = 3.09 years, SD = 0.45, range = 2.33–4.25 years). Consent forms were distributed at local preschools and children whose parents gave consent participated in a short testing session. Of the participating children, 82.6% were Chinese, 7.6% were Malays, 7.6% were Indians, and 2.2% were of other ethnicities. The ethnic composition of the sample closely approximates the overall ethnic composition of the Singapore population. An additional 20 children were tested but excluded due to failure to distribute items between puppets on at least three out of four test trials (n = 19) and interference from classmates (n = 1). Refer to **Table 1** for age distribution of final and excluded samples. The experiment was conducted in accordance with ethical guidelines and was approved by the institutional ethics review board at Nanyang Technological University.

TABLE 1 | Number of children in final and excluded samples, by age, condition, and experiment.

<sup>#</sup>An additional three children were excluded for non-responsiveness on third-toy distribution. <sup>+</sup>An additional two children were excluded for non-responsiveness on third-toy distribution.

#### Design

The experiment was a between-subjects design with two conditions. Forty-six children were assigned to the Group condition (21 males; Mean age = 2.94 years, SD = 0.35, range = 2.42–4.25 years), and another 46 children were assigned to the No Group condition (22 males; Mean age = 3.24 years, SD = 0.50, range = 2.33–4.25 years).

#### Apparatus and Materials

A puppet stage was set up by mounting a rectangular wooden frame (105 cm wide × 75 cm high × 6.25 cm thick) upright on a table. The wooden frame was attached with strong adhesive Velcro to two weighted triangular blocks that held it securely in place. A black curtain covered the opening of the frame (95 cm × 65 cm). During the experiment, a puppeteer sat at the back of the puppet stage and was concealed behind the curtain. A camcorder was discreetly positioned to take video recordings for coding purposes.

Puppets were a tiger and two identical rhinoceroses made of furry fabric, each measuring about 25 cm × 12 cm × 8 cm. The puppets emerged from behind the black curtain at marked locations – the tiger puppet appeared alone in a central spot equidistant from both sides of the stage frame, while the rhinoceros puppets appeared together, about 35 cm apart from each other, on the left and right of the stage, respectively.

The unoccupied table space in front of the stage frame was used as a platform for placing toys during distribution trials. The participating child was seated on a chair in a central position approximately 15 cm away from the puppet stage, where they could easily reach and place toys in front of the puppets. Other materials included a gray hedgehog, a blue ball, a yellow rubber duck, a small red car, two toy corns, two blocks, two toy apples, two toy rabbits and big round stickers (red and blue; 11 cm in diameter). A schematic representation of the experimental set-up is available in **Figure 1**.

#### Procedure

Children were tested one at a time at their respective preschools. Both experimenter and puppeteer were Chinese Singaporeans who spoke fluent English and Mandarin Chinese. The experiment was conducted in English with 71 participants and in Mandarin Chinese with 21 participants, depending on the child's preferred language.

#### **Warm-up phase**

There were two warm-up trials for the child to practice give-and-take actions. First, the experimenter placed a toy hedgehog on the table and said, "Look at my toy! Do you want to see it? Here you go!" The child was then allowed to play with the toy for a few seconds before the experimenter requested, "Now can you give the toy back to me?" This process was repeated with a ball.

#### **Familiarization phase**

Next, there were two familiarization trials to familiarize the child with giving items to a puppet. A tiger puppet appeared in the center of the stage. The experimenter then introduced the puppet as Sam (if child was male) or Jessica (if child was female), took out a rubber duck, and said, "Look, I have a toy! I want to give it away! Can you help me give the toy away?" This process was repeated with a toy car.

#### **Test phase**

Following familiarization, children took part in four test trials. In the Group condition, the experimenter gave the child a red or blue sticker. Two rhinoceros puppets, introduced as Matt and Adam (if child was male), or Amy and Katie (if child was female), appeared on the left and right of the stage. One of the puppets had a red sticker affixed to its front, while the other puppet had a blue sticker. The experimenter then pointed to the puppet of the same sticker color as the child and exclaimed that the puppet's sticker was red or blue, "Just like yours!" To establish group membership, the experimenter also remarked that the child and puppet were both "on the red/blue team!" Children's sticker colors were randomly assigned; puppets' sticker colors and positions (on the left or right of child) were counterbalanced across children. The No Group condition followed the same procedure except that children were not assigned any sticker color and there was no mention of them being on the same team as either puppet.

Next, the experimenter took out two toy corns and said, "I have some corns! I want to give them away! Can you help me give the corns away?" Once the child had distributed the corns, the experimenter put them away and repeated the instructions with three other toy pairs (blocks, apples and rabbits; in fixed order). Children had to distribute all the toys to the puppets and were not allowed to keep any toy for themselves or to discard any toy.

#### Coding

Two independent observers coded children's distributions on each of the four test trials from video recordings. Disagreements between observers were rare, resulted from human error, and were resolved by having both observers watch the videos again. Inter-observer agreement on the final coding was 100%.

On each test trial, toy distribution was coded using the following coding scheme: a 1:1 split reflects a fair distribution as the two toys were divided equally between the puppets. In contrast, a 2:0 split reflects an unfair distribution as the child gave both toys to one of the puppets and none to the other. Non-valid responses include giving the toys back to the experimenter, placing the toys in between puppets, and inaction despite repeated prompts. Children who gave non-valid responses on three or more test trials were excluded from analyses (n = 19).
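The coding scheme and exclusion rule can be expressed compactly. The Python sketch below is a hypothetical illustration: the function names and the representation of a trial as a pair of toy counts are assumptions, since in the study the coding was done by human observers from video.

```python
from typing import List, Optional


def code_trial(to_left: int, to_right: int) -> Optional[str]:
    """Code one test trial from the number of toys given to each puppet.

    Returns "fair" for a 1:1 split, "unfair" for a 2:0 split in favor of
    either puppet, and None for any non-valid response (for example, toys
    returned to the experimenter or no action).
    """
    if (to_left, to_right) == (1, 1):
        return "fair"
    if (to_left, to_right) in ((2, 0), (0, 2)):
        return "unfair"
    return None


def is_excluded(codes: List[Optional[str]]) -> bool:
    """Apply the exclusion rule: non-valid responses on three or more trials."""
    return sum(code is None for code in codes) >= 3


# Example child: two fair splits, one unfair split, one non-valid response.
example = [code_trial(1, 1), code_trial(1, 1), code_trial(2, 0), code_trial(0, 0)]
print(example, is_excluded(example))
```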

# Results

All statistical analyses were conducted with R statistical software (version 3.4.1; R Core Team, 2017). Each child had four data points, entered using a binary response term (1 = fair, 2 = unfair) for whether the child had distributed toys equally or unequally between the puppets on each test trial. Generalized linear mixed models were run on the data using the glmer function in R package lme4 (Bolker et al., 2009), and child ID was fit as a random effect in all models to account for repeated measures. To test if the inclusion of predictors resulted in a significantly better model fit to the data, likelihood ratio tests (LRT) were used to compare the full model to a null model with only child ID entered as a random effect; and where predictors emerged significant, to compare the full model to a reduced model with significant predictors sequentially dropped from the full model.

TABLE 2 | Estimates and standard error of fixed effects in generalized linear mixed models predicting children's distribution outcomes in Experiment 1.


Reference levels for categorical variables were set at default: condition = Group, gender = female, distribution outcome = fair.

Preliminary analyses confirmed that counterbalanced variables and language used for testing did not predict distribution outcomes, hence these variables were not included in subsequent analyses. The final model comprised the following predictors of interest: age in months, condition (Group or No Group), gender (female or male), and a two-way interaction between age and condition. A generalized linear mixed model yielded no significant predictor of distribution outcomes. The full model did not perform better than a null model [LRT, χ²(4) = 1.17, p = 0.88]. There was no significant effect of age (B = −0.36, SE = 0.53, p = 0.49), condition (B = 1.76, SE = 31.70, p = 0.96), gender (B = 1.19, SE = 3.00, p = 0.69), nor any interaction between age and condition (B = −0.07, SE = 0.97, p = 0.94). See **Table 2** for model output.
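As a reading aid, the p-value of the likelihood ratio test can be recovered from the reported chi-square statistic and its degrees of freedom. The sketch below is in Python rather than the authors' R, and the function names are assumptions introduced here. It uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom (the case for all LRTs reported in this article).

```python
import math

def lrt_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic: twice the gain in log-likelihood of the full model."""
    return 2.0 * (loglik_full - loglik_null)

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.
    Closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Reported statistic for Experiment 1: chi-square(4) = 1.17
print(round(chi2_sf_even_df(1.17, 4), 2))  # 0.88
```

The degrees of freedom equal the number of fixed-effect terms added to the null model (here four: age, condition, gender, and the age-by-condition interaction).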

Further analyses were conducted to examine the specific distribution pattern within each condition. The proportion of test trials with fair distribution was calculated for each child by dividing the number of trials coded as fair by the total number of completed trials. All children except three, who completed only one or two trials, provided valid responses on all four test trials. We report the aggregate results for all children; excluding the children who did not complete all four trials would not change the results.

Two-tailed one-sample t-tests against chance (test value = 0.50) indicated that on average, children in the Group condition (M = 0.89, SD = 0.29) distributed fairly on a significantly greater proportion of trials than expected by chance, t(45) = 9.25, p < 0.001, d = 1.34, as did children in the No Group condition (M = 0.96, SD = 0.21), t(45) = 15.02, p < 0.001, d = 2.19. Results are depicted in **Figure 2**.
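The test statistic and effect size follow directly from the summary statistics: t = (M − 0.50) / (SD / √n) with n − 1 degrees of freedom, and Cohen's d = (M − 0.50) / SD. The Python sketch below recomputes them from the rounded values reported above; the function name is an assumption, and because M and SD are rounded to two decimals, the recomputed t differs slightly from the published value.

```python
import math

def one_sample_summary(mean, sd, n, mu0=0.5):
    """One-sample t statistic, Cohen's d, and degrees of freedom from summary statistics."""
    se = sd / math.sqrt(n)      # standard error of the mean
    t = (mean - mu0) / se       # distance from chance in standard-error units
    d = (mean - mu0) / sd       # distance from chance in standard-deviation units
    return t, d, n - 1

# Group condition, Experiment 1: M = 0.89, SD = 0.29, n = 46 (df = 45)
t, d, df = one_sample_summary(0.89, 0.29, 46)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # roughly t(45) = 9.1, d = 1.34 (reported: 9.25, 1.34)
```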

# Discussion


In Experiment 1, children who distributed resources in an intergroup context did not employ a different distributive strategy from children who distributed resources in the absence of a salient intergroup context. Almost all children, regardless of whether group membership had been assigned to themselves and the recipient puppets, showed a robust tendency to distribute two toys equally between the two recipients rather than favor either recipient through unequal distribution, and this tendency was consistent across multiple trials.

Because the items in the distribution task were perfectly divisible between recipients, it remains unclear how children would distribute resources between an ingroup and an outgroup member when there is a limited quantity of items that does not permit equal distribution. For example, in a study by Olson and Spelke (2008), children who were asked to distribute resources on behalf of a doll consistently enacted fair distributions when given precisely enough resources for all recipients but favored the doll's ingroup members under conditions of resource scarcity (e.g., two items, four recipients).

A second consideration is that perhaps children did not demonstrate partiality in their distributions because the minimal groups were not sufficiently distinct and thus the puppets were not truly perceived as ingroup or outgroup members. Experiment 2 addressed this consideration using a two-part distribution task, such that children were required to distribute a third toy following the first two toys. In distributing the third toy, children had to make a forced choice between benefiting the ingroup or outgroup member. If children showed consistent ingroup favoritism on third-toy distribution, it would be unlikely that the minimal group paradigm in Experiment 1 had failed to elicit clear group distinctions.

Experiment 2 was identical to the Group condition of Experiment 1, except that we introduced a third identical toy following the distribution of each toy pair. We also increased the sample size from 46 participants in the Group condition of Experiment 1 to 110 participants in Experiment 2. The rationale for increasing the sample size was that we intended for two-toy distribution in Experiment 2 to serve as a replication for the Group condition in Experiment 1, where we found close to 90% mean proportion of fair trials. We wanted to confirm those results with a larger sample that would provide greater power.

# EXPERIMENT 2

# Methods

#### Participants

One hundred and ten children (58 males; Mean age = 3.21 years, SD = 0.49, range = 2.42–4.25 years) were tested at local preschools after parental consent was obtained. Of the participating children, 85.5% were Chinese, 3.6% were Malay, 8.2% were Indian, and 2.7% were of other ethnicities. The ethnic composition of the sample closely approximates the overall ethnic composition of the Singapore population. Another 21 children were tested but excluded due to failure to distribute items between puppets on at least three out of four test trials (n = 19) and interference from classmates or teachers (n = 2). Refer to **Table 1** for age distribution of final and excluded samples. The experiment was conducted in accordance with ethical guidelines and was approved by the institutional ethics review board at Nanyang Technological University.

#### Apparatus and Materials

The same puppet stage and puppets as in Experiment 1 were used. Materials were identical, except that there were three instead of two of each toy (corns, blocks, apples, and rabbits).

#### Procedure

The experiment was conducted in English with 85 participants and in Mandarin Chinese with 25 participants, depending on the child's preferred language. The procedure was identical to the Group condition in Experiment 1, except that on each test trial, after the child had distributed the first two toys, the experimenter took out an identical third toy and said, "I found one more (corn/block/apple/rabbit)! I want to give this one away too! Can you help me give this one away?"

### Coding

Children's distributions of the first two toys on each of the four test trials were observed and coded from video recordings using the same coding scheme as in Experiment 1. In addition, each test trial was coded for whether the third toy was given to the ingroup or outgroup puppet. Non-valid responses on three or more trials resulted in exclusion from analyses (n = 19 for two-toy distribution; an additional n = 5 for third-toy distribution). As in Experiment 1, disagreements between observers were rare, resulted from human error, and were resolved by having both observers watch the videos again. Inter-observer agreement on the final coding was 100%.


# Results

Experiment 2 followed the same analyses as Experiment 1. In addition, on third-toy distribution, each child had four data points, entered using a binary response term (1 = ingroup, 2 = outgroup) for whether the child had distributed the third toy to the ingroup or outgroup puppet on each test trial.

Preliminary analyses confirmed that counterbalanced variables and language used for testing did not predict outcomes on either two-toy or third-toy distribution, hence these variables were not included in subsequent analyses. The final model comprised the following predictors of interest: age in months and gender (female or male).

#### Two-Toy Distribution

A generalized linear mixed model yielded no significant predictor of two-toy distribution outcomes. The full model performed no better than a null model [LRT, χ²(2) = 0.02, p = 0.99]. There was no significant effect of age [B = −0.02, SE = 0.13, p = 0.89] or gender [B = −0.002, SE = 1.48, p = 1.00]. See **Table 3** for model output.

To further examine the distribution pattern, proportion of test trials with fair two-toy distribution was calculated for each child by dividing the number of trials coded as fair, by the total number of completed trials. All children completed all four test trials. A two-tailed one-sample t-test against chance (test value = 0.50) revealed that on average, children distributed fairly on a significantly greater proportion of trials (M = 0.83, SD = 0.34) than expected by chance, t(109) = 10.32, p < 0.001, d = 0.98. Results are depicted in **Figure 3**.

#### Third-Toy Distribution

A generalized linear mixed model found no significant predictor of third-toy distribution outcomes. The full model performed no better than a null model [LRT, χ²(2) = 3.62, p = 0.16]. There was no significant effect of age [B = −0.02, SE = 0.03, p = 0.49] or gender [B = −0.71, SE = 0.41, p = 0.08]. See **Table 3** for model output.
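The p-values for individual coefficients can be approximated from the reported estimates and standard errors, assuming they are Wald z-tests (the default output for glmer models in lme4; the article does not state this explicitly). In the Python sketch below, the function name is an assumption, and because B and SE are rounded, the recomputed p-values can differ from the published ones in the second decimal.

```python
import math

def wald_test(b, se):
    """Wald z statistic and two-tailed p-value under a standard normal reference."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Gender effect on third-toy distribution: B = -0.71, SE = 0.41
z, p = wald_test(-0.71, 0.41)
print(f"z = {z:.2f}, p = {p:.2f}")  # close to the reported p = 0.08
```

Applied to the age effect (B = −0.02, SE = 0.03), the same calculation gives a p-value of about 0.50 rather than the reported 0.49, a gap consistent with rounding of the inputs.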

To further examine the distribution pattern, the proportion of test trials on which the third toy was distributed to the ingroup rather than the outgroup puppet was calculated for each child by dividing the number of trials coded as ingroup by the total number of completed trials. All children except three, who did not respond on one trial, completed all four test trials. We report the aggregate results for all children; excluding the children who did not complete all four trials would not change the results.

A two-tailed one-sample t-test against chance (test value = 0.50) indicated that on average, children distributed favorably to their ingroup on a significantly greater proportion of trials (M = 0.65, SD = 0.34) than expected by chance, t(104) = 4.62, p < 0.001, d = 0.45. See **Figure 3** for graphical depiction of results.

# Discussion

In Experiment 2, we found that children alternated between fairness and ingroup loyalty on a two-part distribution task: they tended to be fair by distributing the first two items equally between the two recipients but exhibited ingroup loyalty by distributing the third item preferentially to the ingroup recipient. Since group membership influenced to whom children distributed the third limited resource, it is unlikely that the same minimal groups had been ineffective in creating an intergroup setting in Experiment 1. Therefore, the results in Experiment 1, which were replicated by two-toy distribution in Experiment 2, truly reflected children's choice of fairness over ingroup loyalty when distributing an evenly divisible quantity of resources. This finding also highlights the role of resource availability as a contextual cue in guiding children's distributive decisions.

# GENERAL DISCUSSION

In the present study, we examined how preschoolers in Singapore weigh concerns about fairness and ingroup loyalty in an intergroup resource distribution task. Our main finding was that the vast majority of children abided by an equality rule when resources were precisely enough for two recipients but demonstrated ingroup loyalty when distribution involved a single, non-divisible resource. Overall, we found evidence that preschoolers in Singapore are predominantly fair when distributing resources and ingroup loyalty only becomes apparent under conditions of limited resource availability.

In Experiment 1, a comparison of distributive patterns between children in Group and No Group conditions suggested that group membership did not result in greater ingroup favoritism at the expense of fairness. Regardless of whether groups were assigned to the distributor and recipients, children's distributions were largely fair (equal). One possible explanation is that resources, or the lack thereof, signal reward or punishment, such that there is resistance against unequal outcomes in resource distribution where equal outcomes are a possibility, unless the recipient demonstrates a clear lack of deservingness through inadequacies in performance or culpable conduct. There is some support for this speculation (e.g., Kenward and Dahl, 2011; Baumard et al., 2012; Surian and Franchin, 2017). Another possible explanation is that the minimal group paradigm, which relies on novel and artificial groupings such as sticker colors, had failed to elicit group identification and related intergroup processes required for ingroup loyalty to be relevant. The latter possibility was dismissed by the results obtained in Experiment 2, which similarly used minimal groups – when children had to make a forced choice between an ingroup and an outgroup member as the recipient of a limited resource, they took the course of action that benefited the ingroup member. Because Experiment 2 effectively elicited expressions of ingroup loyalty using an identical minimal group paradigm, there is evidence that the lack of group effect on resource distribution in Experiment 1 could be attributed to a robust tendency to disregard group membership when there are clear opportunities for equality, and not to an unsuccessful group manipulation.

TABLE 3 | Estimates and standard error of fixed effects in generalized linear mixed models predicting children's distribution outcomes in Experiment 2.

Reference levels for categorical variables were set at default: gender = female, distribution outcome = fair (on two-toy distribution) or ingroup (on third-toy distribution).

Our findings are consistent with prior studies on children's expectations of resource distribution. A study by Bian et al. (2018) found that 1- to 2-year-olds expected an animal puppet to distribute items equally between two other animal puppets regardless of whether the recipient was of the same species as, or a different species from, the distributor. However, when there were just enough items for the ingroup, infants expected the distributor to exclude the different-species outgroup and give all items to the same-species ingroup. In another study, 5- to 10-year-olds expected human agents to distribute resources favorably to their own group when the groups were described to be competing over scarce resources (DeJesus et al., 2014). Based on these studies, children expect ingroup loyalty to override fairness in resource distribution involving limited resources.

Our findings are also consistent with prior studies on children's resource distribution. In a study by Olson and Spelke (2008), children were asked to distribute resources when there were sufficient resources for all recipients and when there were insufficient resources to go around. The study found that children distributed equally regardless of the social relationship with recipients, only favoring kin and friends over strangers when resources were not enough for everyone. Unlike in the current study, however, children were told to act as proxies for a doll, such that the social relationship with recipients and distributive decisions were both established in relation to the doll, and hence the results were conceptually more reflective of children's beliefs about the normative behaviors of others rather than their own distributive patterns. Similarly, another study found that children were more likely to favor race and gender ingroups when resources did not suffice for an equal distribution compared to when there were enough resources for every recipient (Renno and Shutts, 2015). In light of these findings, a likely explanation for the salience of ingroup loyalty under conditions of resource scarcity is offered by theories of intergroup conflict suggesting that the struggle to secure limited resources fuels competition between groups (realistic group conflict theory, Jackson, 1993). Ingroup loyalty is also thought to be linked to resource conflicts in our ancestral past as our predecessors worked in groups to obtain and protect valued resources from outgroup aggressors during a period of intergroup strife for survival (Benozio and Diesendruck, 2015).

While there is reason to expect that ingroup loyalty may be dominant in Singapore because of a collectivistic orientation, our findings on two-toy distribution suggest otherwise, echoing most of the work in Western samples where fairness trumps other types of concerns early in development (e.g., McAuliffe and Dunham, 2017). Nevertheless, it is clear from prior work (e.g., Misch et al., 2014, 2016), and from the results of third-toy distribution in Experiment 2, that children are concerned with ingroup loyalty; they simply do not manifest this concern in the context of third-party resource distribution, when resources are deemed to be sufficient for equal sharing and no additional contextual cues are provided save for group membership.

Although the present study defines fairness based on an equality rule (i.e., ensuring each recipient gets the same number of resources), this is a restrictive definition, because unequal distributions may sometimes be perceived as fair, such as when one recipient has a greater need for the resource, has worked harder to earn the resource, has rightfully won the resource from a competitive interaction, or has been assigned a greater amount of the resource through an impartial procedure. With age, children develop a nuanced perspective of what fairness entails, one that is not restricted to absolute equality but that appeals to principles related to need, merit, impartiality, norms and social justice (Schmidt et al., 2016). While findings from the current study coincide with findings from studies in other cultures, we might observe cultural differences when the definition of fairness is not constrained to a numerically equal distribution. For example, there is cross-cultural variation in the extent to which children consider work contributions and redistribution of wealth in their distributive decisions (Chai and He, 2017).

Across both experiments in the present study, recipients were identical except for group membership, which was established using superficial group markers (i.e., sticker colors). The lack of other meaningful social information about the recipients or about the nature of intergroup relations could have led children to rely more heavily on an equal distributive strategy when resources were evenly divisible. Future research should look at a wider range of distributive contexts in which ingroup loyalty may exert greater dominance over fairness. Some factors include: group dynamics (e.g., presence of intergroup conflict, relative group status), type of resource (e.g., value and function of resource), and recipient characteristics (e.g., prosociality or antisociality, work contributions). Additionally, natural group markers like speech accent or collaborative interactions could strengthen the influence of ingroup loyalty on children's distributive decisions, in comparison to the static presentation of ingroup and outgroup members in the present study.

A final limitation of our study relates to the use of two items to represent a state of sufficiency, in that there were sufficient resources to be distributed equally among recipients, while one item was taken to represent limited resource availability. Two items can, however, still be construed as being limited in quantity, as giving both items to one recipient leaves the other recipient with none while having four items or more would alleviate such a concern. Future studies can vary the number of distributable items to convey varying degrees of resource sufficiency and scarcity, which may in turn elicit nuanced portrayals of generosity and parochial behaviors.

In summary, preschoolers in Singapore relied largely on the fairness principle to guide distributive decisions involving an evenly divisible quantity of resources but showed ingroup loyalty when distributing a limited resource. Our results converge with findings from other resource distribution studies with preschoolers and similar studies measuring young infants' expectations of distributive behaviors in third-party observations. Taken together, there is evidence suggesting stability in the development of knowledge to behavior in the subdomains of fairness and ingroup loyalty.

# AUTHOR CONTRIBUTIONS

PS and KL developed the study concept and design. Data collection and data analysis were performed by KL, who also drafted a first manuscript. PS and GE provided critical revisions. All authors approved the final version of the manuscript for submission.

# FUNDING

This research was supported by Nanyang Technological University Start-up Grant (M4081490) and Singapore Ministry of Education Social Science Research Thematic Grant (MOE2016-SSRTG-017) to PS.

# ACKNOWLEDGMENTS

Many thanks to the children, parents, teachers, and preschools for participating in this research; to our partner NTUC My First Skool for their support; and to Renée Baillargeon for her comments on a previous draft of this manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Lee, Esposito and Setoh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# A Developmental Perspective on the Origins of Morality in Infancy and Early Childhood

Audun Dahl<sup>1</sup> \* and Melanie Killen<sup>2</sup>

<sup>1</sup> Department of Psychology, University of California, Santa Cruz, Santa Cruz, CA, United States, <sup>2</sup> University of Maryland, College Park, College Park, MD, United States

Key constituents of morality emerge during the first 4 years of life. Recent research with infants and toddlers holds promise for explaining the origins of human morality. This article takes a constructivist approach to the acquisition of morality, and makes three main proposals. First, research on moral development needs an explicit definition of morality. Definitions are crucial for scholarly communication and for settling empirical questions. Second, researchers would benefit from eschewing the dichotomy between innate and learned explanations of morality. Based on work in developmental biology, we propose that all developmental transitions involve both genetic and environmental factors. Third, attention is needed to developmental changes, alongside continuities, in the development of morality from infancy through childhood. Although infants and toddlers show behaviors that resemble the morally relevant behaviors of older children and adults, they do not judge acts as morally right or wrong until later in childhood. We illustrate these points by discussing the development of two phenomena central to morality: orientations toward helping others and developing concepts of social equality. We assert that a constructivist approach will help to bridge research on infants and toddlers with research on moral development later in childhood and into adulthood.

#### Edited by:

Jessica Sommerville, University of Washington, United States

#### Reviewed by:

Valerie Kuhlmeier, Queen's University, Canada; Tobias Krettenauer, Wilfrid Laurier University, Canada

\*Correspondence: Audun Dahl, dahl@ucsc.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 21 May 2018 Accepted: 28 August 2018 Published: 20 September 2018

#### Citation:

Dahl A and Killen M (2018) A Developmental Perspective on the Origins of Morality in Infancy and Early Childhood. Front. Psychol. 9:1736. doi: 10.3389/fpsyg.2018.01736

Keywords: morality, infancy, constructivism, social development, helping behavior, intergroup attitudes

# INTRODUCTION

Key constituents of morality emerge early in ontogeny: by their fourth birthday, most children express obligatory judgments based on moral concerns with others' welfare, rights, and fairness through spontaneous reactions and reasoning about perceived violations (Schmidt et al., 2012; Smetana et al., 2012; Dahl and Kim, 2014; Rizzo et al., 2016; for a review, see Killen and Smetana, 2015). How do newborns–seemingly unconcerned with moral issues–develop into preschoolers with moral capabilities that, in some ways, resemble those of adults?

Recent research on social cognitive abilities among infants and toddlers promises to shed light on how preschoolers come to reason about and judge moral issues. Most of the foundational work on cognitive developmental approaches to moral development focused on older children and adults (Piaget, 1932; Kohlberg, 1963, 1971; Turiel, 1983). In the last two decades, numerous researchers from social and moral developmental psychology (Killen and Smetana, 2015), as well as other areas in developmental psychology, have explored the presence of morally relevant concepts and behaviors in infants and toddlers (Bloom, 2013; Hamlin, 2013; Sommerville, 2015; Tomasello, 2016). Discussions about the origins of morality in infancy have often centered on whether some parts of morality are innate, or otherwise emerge independently of relevant experiences (Hamlin, 2013; Wynn and Bloom, 2014; Warneken, 2016). In these debates, key terms like "morality" and "innate" are often left undefined (Dahl, 2014).

In this article, we argue that explaining major transformations in early moral development requires a new lens, one that bridges the gap between infancy and childhood. This article makes three proposals for how to integrate research on very young children with research on moral development in later childhood. First, we propose that research on moral development needs explicit definitions of morality and other central concepts. Second, developmental acquisitions involve both genetic and environmental factors, and research on moral development would benefit from eschewing the dichotomy between innate and learned characteristics. Third, there are fundamental differences between the capabilities of infants and toddlers and the moral capabilities of older children. Within our framework, infants and toddlers demonstrate important precursors to morality, but lack core components of a developed morality. In elaborating on this third claim, we discuss age-related changes regarding young children's orientations toward helpful behaviors and toward generalizing moral obligations to members of different groups. These three issues (definitions, acquisition, and age-related change) are fundamental, but they clearly do not exhaust all major points of debate about a complex construct such as morality. We hope that addressing these concerns will help integrate research on how morality develops during the first years of life.

# RESEARCH ON EARLY DEVELOPMENT NEEDS A DEFINITION OF MORALITY

We propose that an investigation of early moral development requires a definition of morality and other key concepts. In our view, explicit definitions of key terms are crucial to the accumulation of knowledge (Dahl, 2014; Dahl and Killen, 2018). In contrast, some scholars have explicitly stated that morality does not need to be defined and that the inquiry of moral concepts necessitates asking participants what morality means to them, noting that the word "morality" is used in a variety of ways (Greene, 2007; Haidt and Graham, 2007; Wynn and Bloom, 2014). We argue that morality, perhaps even more than other concepts, requires definition and criteria. One problem with defining morality in terms of what people label as moral is that morality can become relativistic; whatever action or belief any one person, group, or culture deems to be "moral" is so (for discussions, see Kohlberg, 1969; Turiel, 2002, 2015a). Moreover, when researchers do not define morality, it is difficult to determine whether disagreements among scholars result from different uses of the word "moral" or from different empirical claims. Indeed, explicit definitions of phenomena for investigation reflect the core of scientific analysis and are crucial for empirical evaluation of scientific claims.

In our work, we have defined morality as prescriptive norms concerning others' welfare, rights, fairness, and justice (Killen and Rutland, 2011; Turiel, 2015a; Dahl and Killen, 2018). The research task is to determine when children's judgments reflect these criteria. This definition of morality stems from neo-Kantian philosophical accounts of morality (Turiel, 1983; Smetana et al., 2014). Within our framework, morality is not the only basis for evaluative judgments: children and adults also make judgments about conventional, religious, and personal safety considerations (see Killen and Smetana, 2015). The usefulness of defining morality in terms of others' welfare, rights, fairness, and justice is now supported by a large body of research showing that children and adults distinguish moral considerations from considerations about social conventions, and from matters of personal choice (Killen and Smetana, 2015). For instance, most children across different communities think that it would be wrong to harm others even when parents or teachers condone it. In contrast, most children view conventional issues, such as dress codes or forms of address, as alterable by authorities. We are not asserting that there is only one definition of morality; our main point is that an explicit definition of morality is crucial for avoiding major miscommunication, and promoting accumulation of knowledge, in research on early moral development.

# EARLY MORALITY IS CONSTRUCTED, AND IS NEITHER INNATE NOR LEARNED

While psychological research in the first half of the 20th century often framed one of the fundamental questions about psychological behavior as whether it was innate or learned, extensive research has subsequently undermined the dichotomy between innate and learned characteristics. In fact, all developmental transitions involve genetic, cellular, neural, behavioral, and environmental processes (Gottlieb, 1991; Spencer et al., 2009; Moore, 2015).

Children construct morality through reciprocal interactions with their environments (Dahl and Killen, 2018). The constructivist view does not seek to separate innate and learned elements of morality (Piaget, 1932). This view is also supported by evidence that children have an abundance of morally relevant experience from early in life, involving helping and being helped as well as harming and being harmed (Reddy et al., 2013; Dahl, 2015, 2016a,b; Hammond et al., 2017). Through these experiences, children come to critically evaluate norms from parents and others (Dahl and Kim, 2014; Dahl, 2016b; Dahl and Killen, 2018).

The constructivist viewpoint differs from contemporary nativist and learning views of moral development. In discussions of innate characteristics, it is often unclear how to determine whether some characteristic is "innate" (Dahl, 2014; Turiel and Dahl, in press). It is biologically implausible that any characteristic would develop irrespective of environmental processes. Some have proposed that we infer characteristics to be innate whenever the characteristics develop in the absence of relevant experience (Bloom, 2012; Hamlin, 2013). However, for morality, virtually any social interaction is a relevant experience. From birth, most infants interact with people who help and comfort them, for instance by feeding them or responding to their crying (Richards and Bernal, 1972; Tronick, 1989; Hammond et al., 2017). An infant who develops in the absence of morally relevant experiences would not develop at all.

Importantly, the constructivist view also differs from learning or socialization views of moral development. Socialization and learning views portray moral development as a process of complying with the norms and views of one's community (Kochanska and Aksan, 2006; Grusec et al., 2014), leading to a relativistic theory of morality. In contrast, the constructivist view proposes that children acquire generalizable obligations about the fair and equal treatment of others through an active process, one that involves abstracting, interpreting, and evaluating social experiences, sometimes agreeing with and sometimes challenging the norms held by one's community (Nucci, 2005). Children also construct other evaluative concepts through social experiences, for instance by learning about social conventions or religious norms adopted by their parents and other community members (Turiel, 1983; Killen and Smetana, 2015).

In proposing a constructivist approach, we seek to reorient research on early moral development. Rather than asking whether a given capability is innate or due to experiential factors, research can investigate how children construct morality through reciprocal interactions.

# STUDYING DEVELOPMENTAL CHANGE

Developmental research is the study of change. Yet, recent discussions of early moral development have often emphasized continuities concerning the presence of moral knowledge between infants and adults. Some researchers have proposed that infants make moral judgments, and possess altruistic motives, around the first birthday (Warneken and Tomasello, 2006; Bloom, 2013; Hamlin, 2013; Wynn and Bloom, 2014; Warneken, 2016). Contrasting with this emphasis on continuity, researchers have recommended greater attention to developmental change in moral development (Kagan, 2008; Killen et al., 2015; Dahl and Freda, 2017; Sommerville, 2018). These age-related changes include conceptual advancements, coordination of knowledge, and priority of certain moral principles over others. These gradual changes reflect new understandings about morality that were not present at younger ages. Here, we call for greater attention to developmental change in research on early moral development through discussions of helping behavior and research on children's judgments of group-based social inequalities.

# Developmental Changes in Orientations Toward Helping

How do judgments of helpful actions develop? We assert that helping behavior, alone, is not necessarily "moral" behavior but reflects a first step toward the acquisition of morality. In some contexts, individuals judge helping as morally good or even obligatory, such as when it involves protecting others from harm. In other situations, however, helping is viewed as undesirable and morally repugnant, such as helping someone cheat or steal (Miller et al., 1990; Kahn, 1992; Killen and Turiel, 1998; Turiel, 2015b). Thus, evaluations of helping behavior incorporate the goal of the action, and the basis for the motivation to help another person.

Early in life, children have experiences with helping and being helped by others. Most infants help others around the first birthday (Warneken and Tomasello, 2007; Sommerville et al., 2013; Dahl, 2015; Hammond et al., 2017). In one common laboratory paradigm, an adult accidentally drops a pen or a paperclip and unsuccessfully reaches for it. Infants commonly hand back the dropped object to the experimenter (Warneken and Tomasello, 2006; Warneken, 2013). In everyday life, 1-year-olds participate in a variety of chores, including putting toys away, laundry, self-care, and cleaning (Rheingold, 1982; Dahl, 2015; Hammond et al., 2017).

We propose that infants' earliest helping behaviors are based on a desire to participate in social interactions, and are not accompanied by moral judgments that helping is good or required (Dahl and Paulus, in press; Miller et al., 1990; Kahn, 1992; Killen and Turiel, 1998; Turiel, 2015b). First, infants are not very reliable helpers. Infants who help on one trial do not always help on another, and often opt to play instead of helping (Warneken and Tomasello, 2006; Waugh and Brownell, 2017). Infants' unreliable helping is striking because, in these studies, infants could help at minimal cost (Rheingold, 1982; Warneken et al., 2007). Second, when infants begin to help, they do not appear broadly concerned with others' welfare. While infants on average become more helpful early in the second year of life, they also use more interpersonal force in this period, sometimes hitting or kicking others for no apparent reason and without visible signs of anger or distress (Hay, 2005; Dahl, 2015, 2016a).

Finally, infants do not make categorical judgments based on moral concerns (Dahl, 2014; Dahl and Freda, 2017). Although infants and toddlers prefer to reach and look toward helpful puppets over hindering puppets, they also show such preferences based on non-moral characteristics such as food preferences (Hamlin et al., 2013; Wynn, 2016). Moreover, infants' preferences are relative, not qualitative: These studies show that infants prefer one puppet over another, but do not show that infants view some puppets as bad or wrong (Vaish et al., 2010; Dahl et al., 2013).

Infants' desire to participate in chores and other adult activities is an important developmental precursor to morality. Still, this desire does not constitute a moral concern. Orientations toward helping undergo transformations between infancy and later childhood (Dahl et al., 2018). By 3–4 years of age, children make categorical judgments about right and wrong based on concerns with welfare and rights (Nucci and Weber, 1995; Smetana et al., 1999; Schmidt et al., 2012; Dahl and Kim, 2014; Killen and Smetana, 2015; Josephs and Rakoczy, 2016). Hence, preschoolers have developed obligatory concepts and concerns regarding others' welfare and apply these in social situations. Past research indicates that children make judgments of right and wrong about helping by age 8, and likely before (Kahn, 1992; Nucci et al., 2017; Van de Vondervoort and Hamlin, 2017). More research is needed to explain the development of moral orientations toward helping, from a desire for participation to judgments based on concerns with welfare and rights (Dahl and Paulus, in press).

# Developmental Changes in Intergroup Attitudes and Moral Judgments

As children grow older, they also encounter acts that involve members of other groups. Over the past decade, research in developmental psychology has examined the origins of morality in concert with the emergence of social equality, or how young children apply their moral judgments to intergroup contexts (Schmidt et al., 2012; Hetherington et al., 2014; Weller and Lagattuta, 2014; Killen et al., 2015). Do young children distribute resources by giving more to their ingroup than to an outgroup when both groups are equally meritorious? Do moral judgments act as a positive force, enabling children to reject peers who promote stereotypic or prejudicial attitudes (Killen et al., in press; Mulvey, 2016; Rutland and Killen, 2017)? These are fundamental questions regarding how morality, defined as the fair and equal treatment of others, is applied in situations in which group identity is salient (Nesdale, 2004).

Group affiliation is necessary for human survival (Tomasello, 2016). At the same time, many forms of group loyalty are unfair, resulting in negative treatment toward others, particularly those perceived as members of outgroups. Children and adults in many cultures view group norms related to societal conventions as contextually bound and consensus-driven, whereas moral principles are viewed as generalizable and obligatory (Smetana et al., 2014), reflecting continuity in thinking about group norms. As early as 3–6 years of age, children view moral norms as obligatory, and view group loyalty as relative to the type of loyalty required, such as whether the loyalty is conventional (wearing the team colors) or moral (Liberman et al., 2018; Rizzo et al., 2018).

What changes with age is the recognition of the obligation and orientation to reject unfair group norms, which requires taking a number of contextual factors into account (Mulvey, 2016). A series of age-related shifts has been documented during early childhood in which children begin to actively challenge unfair group norms and view exclusion from groups based on stereotypic expectations of individuals as wrong (see Killen et al., in press). One finding that stands out is that, with age, knowledge about groups is related to children's increased ability to rectify inequalities (Elenbaas and Killen, 2016a). Further, an increase in psychological knowledge about others' intentions (such as theory of mind) enables children to reject exclusion as well as the denial of resource allocations based on stereotypic norms (Mulvey et al., 2016b; Rizzo and Killen, 2018).

Whereas 5 to 6-year-olds distribute resources equitably when faced with two characters, one who has lots of resources (e.g., wealthy) and one who does not (e.g., poor), 3 to 4-year-olds allocate equally (even though they recognize that equity would be legitimate if another child gave more to those who have less) (Rizzo and Killen, 2016). When asked about whether others would reduce inequalities, 5 to 6-year-olds, but not 3 to 4-year-olds, expect individuals to seek more for their ingroup if they are told that the group prefers their ingroup. Younger children do not take information about ingroup bias into account when asked what groups will do (Elenbaas and Killen, 2016b).

With increasing theory of mind abilities, 4 to 6-year-old children allocate resources based on merit in gender nonstereotypic contexts in contrast to children without theory of mind who fail to reward meritorious behavior when the activity does not conform to the gender stereotype (e.g., boys making dolls or girls making trucks) (Mulvey et al., 2016a; Rizzo and Killen, 2018). Moreover, children who pass false belief theory of mind are more likely than children who fail to expect others to challenge gender stereotypes about what toy to play with and were also more supportive of those challenges (Mulvey et al., 2016b). Further, with age (from 5–6 years to 10–11 years) knowledge about group inequalities based on race has been shown to be related to decisions to rectify inequalities when distributing resources, with younger children less aware and more likely to perpetuate the inequality than older children (Elenbaas and Killen, 2016b). Thus, the emergence of morality reflects age-related changes regarding incorporating information about group identity and group norms into moral decisions and judgments.

# CONCLUSION AND FUTURE DIRECTIONS

This article proposes a constructivist approach to early moral development. We made three main points. First, a definition of morality is key to studying morality: definitions guide empirical research questions and hypotheses. Second, transitions in early moral development involve genetic, environmental, and social-cognitive factors. Morality and its precursors cannot be split into some characteristics that are innate and others that are learned. Third, an account of the origins of morality requires investigations of the processes that lead to the acquisition of new forms of moral judgments, reasoning, and concerns. In the area of helping, research that connects early helping behavior with evaluative judgments about helping in childhood would be fruitful. To extend research on morality in intergroup contexts, documenting the factors that enable children to challenge inequalities and unfair treatment would be impactful. We believe that scholars would benefit from providing explicit definitions of key terms, abandoning the dichotomy between innate and learned characteristics, and considering developmental change in research on early morality and its precursors.

# AUTHOR CONTRIBUTIONS

Both authors contributed equally to the conceptualization and writing of this article.

# FUNDING

The preparation of this article was supported in part by a grant from the National Institute of Child Health and Human Development (R03HD087590) to AD and by a grant from the National Science Foundation (BCS#1728918) to MK.

# REFERENCES




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dahl and Killen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Happily Unhelpful: Infants' Everyday Helping and its Connections to Early Prosocial Development

Stuart I. Hammond<sup>1</sup> \* and Celia A. Brownell<sup>2</sup>

<sup>1</sup> School of Psychology, University of Ottawa, Ottawa, ON, Canada, <sup>2</sup> Department of Psychology, University of Pittsburgh, Pittsburgh, PA, United States

Young children's everyday helping in the home has received relatively little attention in research on prosocial behavior. Nevertheless, key features such as young children's cheerful participation in chores around the home, including in ways that make accomplishing these chores more difficult for parents, can reveal important facets of early prosocial development. The present study reports the results of an Internet (MTurk) survey of over 500 families with children aged 1–4 years about their children's prosocial tendencies, participation in nine common chores, whether children's helping attempts were helpful or not, and attributions about children's motives for helping. Consistent with much prior research, parents reported that children became more prosocial with age. The majority of parents reported children's participation in everyday helping is at times unhelpful. Parents attributed children's helping to a variety of motives and these too, changed with age. Fathers had somewhat different perceptions of children's everyday helping than mothers. Results are discussed in terms of how understanding everyday helping can contribute to ongoing debates in the literature about the roots of prosocial behavior.

Keywords: prosocial behavior, infants, unhelpful helping, altruism, helping

# INTRODUCTION

The fact that infants help others early in life, soon after the first birthday, if not earlier (e.g., Svetlova et al., 2010; Dahl, 2015; Hammond et al., 2017), may reveal something profound about human nature. Infants' and toddlers' efforts to help others, which exceed those of one of our closest relatives, the chimpanzee, may suggest that humans have evolved a "hypercooperativeness" (Warneken and Tomasello, 2006, p. 1302; see also Vaish and Tomasello, 2014; Warneken, 2015). Many researchers in the field share Warneken and Tomasello's (2006) view that humans are cooperative by nature. But important questions remain, such as whether prosociality is exclusively motivated by altruism (e.g., Hepach et al., 2012) or by other social motives (e.g., Carpendale et al., 2014; Pletti et al., 2017), and whether helping is unlearned, or if there is a role for socialization in prosocial development (e.g., Brownell and Early Social Development Research Lab, 2016; Warneken, 2016). Concerns about the role of evolution and development in children's prosocial behavior also motivated earlier work in prosocial development (see, e.g., Bridgeman, 1983).

#### Edited by:

Jessica Sommerville, University of Washington, United States

#### Reviewed by:

Fanli Jia, Seton Hall University, United States Rechele Brooks, University of Washington, United States

#### \*Correspondence:

Stuart I. Hammond shammond@uottawa.ca

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 29 May 2018 Accepted: 03 September 2018 Published: 21 September 2018

#### Citation:

Hammond SI and Brownell CA (2018) Happily Unhelpful: Infants' Everyday Helping and its Connections to Early Prosocial Development. Front. Psychol. 9:1770. doi: 10.3389/fpsyg.2018.01770

Seeking to elucidate these issues, in this paper we join a growing number of researchers in pointing out a mundane, but important, point: most human infants are not particularly good helpers. In experimental contexts, where infants are provided with opportunities to assist adults feigning distress, children often fail to help others (see Waugh and Brownell, 2017). In structured problem situations, infants often display wariness, turn to parents for security, or just continue to play. Infants' inconsistent helping can also be seen in the descriptives of nearly any experimental study of helping. For example, in Dunfield et al.'s (2011) study, no infants helped an experimenter who appeared to have hurt their knee, although some infants helped with other tasks. In Warneken and Tomasello's (2006) study, in over half of the helping tasks fewer than half of the children helped. Despite the predominance of both helping and non-helping, when the time comes to draw conclusions from these studies, most often the overarching conclusion is simply that "toddlers help," which they most certainly do – but why only sometimes? Looked at more closely, children's actual behavior makes the claim that young children are reliably altruistic problematic when they often seem to care more about themselves than others; likewise, their behavior challenges claims that early helping is unlearned when this putative evolved mechanism seems to be built on an unsteady and unreliable foundation.

Moreover, even when they do try to help others, their behavior is often not helpful. In their daily lives, infants and toddlers engage in "everyday helping" as they participate in the life of the home, helping clean up toys, water the garden, and doing other tasks. But infants are not little angels, and parents do not look to raising them as a time of relative ease when they can relax while their young children help around the home and reduce their own workload. This fact was briefly raised in a seminal study by Rheingold (1982), who found that although toddlers can and do try to help, parents often "avoid what they viewed as interference . . . [by doing] chores while the children were taking their naps" (p. 122). Hammond (2014) labeled this phenomenon "unhelpful helping," meaning that young children's "helpful" participation makes the task more difficult for a parent rather than less. Although this construct has the potential to provide unique insights into the nature and motives for early-appearing prosocial behavior, no study, to date, has specifically examined "unhelpful helping:" how frequent is it, does it predominate earlier in the development of prosocial behavior, how does it relate to prosocial tendencies more generally?

More positively, Rheingold (1982) also noted that children take part in activities with others with good humor and sprightly energy. In this vein, Forman (2007) remarked that parents must sometimes manage toddlers who adamantly and enthusiastically want to participate in a given household task, whether the parent wants them to or not. As others have argued, toddlers have a strong motive to "belong" (Baumeister and Leary, 1995; Barragan and Dweck, 2014). Participating in parents' activities fulfills that drive even without any helpful intentions. Ultimately, however, unhelpful toddlers become helpful, even caring, preschoolers. An important question is how children's efforts and intentions to help change with age, from participating and playing to contributing and caring.

# Present Study

Although young children's everyday helping in the home has received little attention in research on prosocial behavior, its features, such as the way children cheerfully participate in chores around the home, sometimes in ways that make these more difficult for their parents, may nevertheless reveal important characteristics about the structure of early prosocial development. In particular, systematically studying everyday helping in the early years can reveal how young children's participation in such activities changes with age; to what extent their participation takes the form of interference or obstruction rather than helping; what motivates young children's helping behavior in the home and how motives to help change with age. To examine these features of early helping we asked parents of 12 to 59 month old children to report their children's current participation in everyday household chores, whether such participation was ever unhelpful, and what they believed motivated their children's helpful and unhelpful participation.

# MATERIALS AND METHODS

# Participants

Participants were recruited to and participated in the study through Amazon MTurk, and were compensated 50 cents for participation in what was an approximately 10-min anonymous survey. Participants needed to be living in the United States, 18 years of age and older, and the parent of a child between 1 and 4 years of age. If participants had two or more children of eligible age, they were asked to fill out the survey for the youngest eligible child.

Although MTurk is rarely used in developmental psychology research with young children, one of its advantages is that it draws a more diverse sample (see Buhrmester et al., 2011). Indeed, unusually for a child development study, approximately half of the participants in the present study were fathers, the consequences of which we will discuss in more detail below.

#### Data Screening

Given that the data were collected nationally and anonymously, they were screened conservatively. Participants' responses were eliminated if there were apparent mistakes in their data that might indicate inattention or falsification (e.g., inconsistencies in responses to the number of children living in the home, and the number of siblings the child has), and responses from multiple respondents were eliminated if coming from an identical IP address.

### Final Sample

After screening, the final sample consisted of 528 participants (279 mothers; 249 fathers) reporting on their children (253 girls; 275 boys). Children's mean age was 35.17 months (SD = 12.19 months), with children ranging from 12 to 59 months of age. Participants overwhelmingly identified their child as belonging to one ethnicity (86%), with approximately 14 percent reporting two or more ethnicities. Eighty-two percent of participants had some Caucasian ethnicity (68% Caucasian only), with approximately 12 percent of children being identified as primarily or some Hispanic or Latino (5% Latino only), 11 percent as primarily or some African-American (7% Black only), 9 percent as primarily or some Asian (4% Asian only), and 2 percent as Native American or Pacific Islander (2% Native American or Pacific Islander only).

Approximately 10 percent of the children came from a household where the highest educational attainment was a graduate degree, 38 percent from households with a bachelor's degree, 14 percent from households with an associate degree, 24 percent from households with some college but no degree, 13 percent from households with a high school degree or equivalent, and about 1 percent from households with less than high school. Approximately 10 percent had household incomes higher than 100,000 US dollars, 36 percent had a household income of between 50,000 and 99,999 US dollars, 40 percent had a household income of between 25,000 and 49,999 US dollars, and approximately 14 percent had incomes below 25,000 US dollars.

# Procedures

Participants filled out an eligibility and consent form, then responded to a short set of questions on demographics, their child's prosocial tendencies, and their child's everyday helping in the home.

# Measures

#### Demographics

Participants were asked about their child's gender and age; family income, education, and ethnicity as noted above; presence of siblings and pets in the home; and the child's attendance at preschool or daycare.

#### Prosocial Tendencies

Participants filled out the prosocial subscale of the Goodman (1997) Strengths and Difficulties questionnaire, which comprised five questions about the child's tendency to help and comfort others, on a three-point Likert scale ranging from Not True to Completely True. Responses were scored and summed to form a composite prosocial tendency score that ranged from 5 (Not True for all questions) to 15 (Completely True for all questions). The composite score had a Cronbach alpha of 0.76.
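To make the scoring concrete, the following is a minimal sketch of how a summed composite and Cronbach's alpha could be computed for a respondents-by-items matrix. It assumes items are coded 1 (Not True) to 3 (Completely True); the function names and the example data are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def composite_scores(responses):
    """Sum each respondent's item scores (assumed coded 1-3) into one composite."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix of numeric scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to items x respondents
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance(composite_scores(responses))
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: three respondents, five items each (not study data).
example = [
    [1, 1, 1, 1, 1],
    [3, 3, 3, 3, 3],
    [2, 2, 2, 2, 2],
]
print(composite_scores(example))
print(cronbach_alpha(example))
```

Because the same variance estimator is used in the numerator and the denominator, the choice between population and sample variance does not change alpha.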

#### Everyday Helping

Participants were asked to fill out a series of questions about their children's help in the home.

#### Chores

Participants were asked about children's participation in nine common chores in the home (laundry; vacuuming/sweeping; dishes; cooking/food preparation; groceries/shopping; gardening; putting away toys/cleaning up own room; throwing away trash). These common chores were derived from (unpublished) survey data collected with Hammond and Carpendale (2015). Responses were given a score of 1 if parents indicated that the chore was done Always/Almost Always or Sometimes, and 0 if done Rarely or otherwise, for a composite score that could range between 0 and 9. The composite score had a Cronbach alpha of 0.78. Parents could also fill in an "Other" textbox to indicate other forms of helping.
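The dichotomized chore score described above can be sketched as follows. The two response labels that count as participation come from the text; the dictionary keys and the "Never" label are illustrative assumptions, since the survey's exact wording for the remaining options is not given here.

```python
# Labels scored 1, as described in the text.
PARTICIPATING = {"Always/Almost Always", "Sometimes"}


def chore_composite(answers):
    """Count the chores a parent reports the child doing at least sometimes.

    `answers` maps a chore name to the parent's response label; any label
    outside PARTICIPATING (e.g., "Rarely") contributes 0.
    """
    return sum(1 for answer in answers.values() if answer in PARTICIPATING)


# Hypothetical responses for four chores (not study data).
example = {
    "laundry": "Sometimes",
    "dishes": "Rarely",
    "gardening": "Always/Almost Always",
    "trash": "Never",  # assumed label, scored 0 like any non-participating answer
}
print(chore_composite(example))
```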

#### Unhelpful Helping

Participants were asked to respond to the question "When your child gets involved in the above activities, is this ever unhelpful to you (e.g., they mix dirty laundry and clean laundry)? How do you respond in these sorts of situations?" Parents' responses to these questions were scored with a 1 if they indicated the child was at times unhelpful (e.g., "Yes"; "I tell him what a good job he is doing, and when he's not looking, I redo it"), and a 0 if they indicated the child was never unhelpful (e.g., "No"; "No, he is not unhelpful. He puts all of his toys away in his toy box, helps pick up the floor, and puts garbage in the trash can").

#### Motives

Participants were asked "Why do you think your child wants to get involved in these sorts of activities?" and a series of six potential motives were listed: being asked ("Because I ask them to help"); being rewarded ("reward them when they help [e.g., sweets, allowance]"); being praised ("I praise them when they help"); fun ("They find these activities fun"); social affiliation ("They enjoy spending time with me"); and care ("They care about other people"). As with chores, these were drawn from a prior study (Hammond and Carpendale, 2015). Parents could check as many as applied, and responses were coded with a 1 if selected and a 0 if unselected. They were also afforded the option to fill out a text box with an "other" category for any other motives.

# RESULTS

# Demographics in Relation to Prosocial Behavior

In preliminary analyses, age was related to several variables of interest, and subsequent analyses are broken down by children's age in years. Parents' gender was also related to several prosocial variables as noted and discussed further below. In contrast, children's gender, and demographic variables such as household income and education, and reported ethnicity were unrelated to prosocial variables. For mean comparisons, non-parametric analyses (e.g., the Kruskal–Wallis test, an ANOVA analog) were used, as the number of participants by year of age was uneven.
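As an illustration of the non-parametric comparison named above, here is a minimal sketch of the tie-corrected Kruskal–Wallis H statistic in plain Python. It is not the authors' analysis script; the function names and data are hypothetical, and in practice a statistics package would also supply the p-value by comparing H to a chi-square distribution with (number of groups − 1) degrees of freedom.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups of scores.

    Undefined (division by zero) if every pooled value is identical.
    """
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Correct for ties: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


# Hypothetical scores for three age groups (not study data).
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```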

# Prosocial Tendencies

The mean for parent-reported prosocial tendencies increased with children's age, although the only significant difference was between Year 1 and all subsequent years (see **Table 1**; Kruskal–Wallis test, p < 0.001, with Dunn-Bonferroni post hoc, p < 0.001 for each post hoc comparison).

TABLE 1 | Descriptives of prosocial tendencies, participation in chores, and unhelpful helping by age of child in years and gender of parents (with significant differences by parent gender noted).

<sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01, <sup>∗∗∗</sup>p < 0.001.

# Everyday Helping

#### Chores

**Figure 1** depicts the most commonly reported chores across ages. The mean number of chores that children participate in increased with children's age (see **Table 1**; Kruskal–Wallis test, p < 0.001). Participation in chores at Year 1 differed from all subsequent years (Dunn-Bonferroni post hoc, p < 0.001 for each post hoc comparison), participation at Year 2 and Year 3 did not differ from each other (Dunn-Bonferroni post hoc comparison, n.s.), and participation at Year 2 and Year 3 both differed from Year 4 (Dunn-Bonferroni post hoc comparison, p < 0.001). **Table 2** displays some examples of other types of helping that parents provided in the fill-in section for "other helping."

Children's reported participation in chores also varied by parents' gender (see **Table 1**). Mothers reported greater participation in chores across ages than fathers (Mann–Whitney, p < 0.001), and, analyzed by year, this difference was significant at Year 2 (Mann–Whitney, p < 0.05), Year 3 (Mann–Whitney, p < 0.001), and Year 4 (Mann–Whitney, p < 0.01).

#### Unhelpful Helping

Across ages, the majority of parents reported their children engaged in unhelpful helping. See **Table 2** for examples of unhelpful helping. Parent-reported unhelpful helping did not increase by age (see **Table 1**). Across all ages, mothers reported more unhelpful helping than fathers (Mann–Whitney, p < 0.001); broken down by age, this parental gender difference was significant only at Year 3 (Mann–Whitney, p < 0.05).

#### Motives

Across ages, parents endorsed praise as the most likely motive for children's helping, followed by fun, social affiliation, being asked, caring for others, and being rewarded (see **Figure 2**). Overall, mothers were more likely to endorse praise (Mann–Whitney, p < 0.01) and fun (Mann–Whitney, p < 0.01) as motives for children's participation than fathers. Although most parents left the other text box for motives empty, a notable minority endorsed imitation, mimicry, and copying parents as a motive for helping (e.g., "He wants to be just like mommy"; "I really think it is because they like to be like us as much as possible").

#### **Motives by year**

Broken down by age, the most frequently endorsed motivation at Year 1 and Year 2 was praise, with a tie between being praised and social affiliation at Year 3. Social affiliation was the most frequently endorsed motive at Year 4 (see **Figure 3**).

#### TABLE 2 | Individual examples of helping and parental views on unhelpful helping.


#### **Motives by type by year**

Parents tended to endorse more motives as children grew older (Kruskal–Wallis test, p < 0.001; also, see **Figure 3**). Post hoc tests suggest that this effect reflects the difference between children at Year 1 and all other years (Dunn-Bonferroni post hoc, p < 0.05 for Year 1 to Year 2, p < 0.01 for Year 1 to Year 3, and p < 0.001 for Year 1 to Year 4); there were no differences between Years 2, 3, and 4. In Year 1, over half of parents endorsed three motives for children's helping (praise; fun; social affiliation). By Year 4, over half of parents endorsed five motives for children's helping (social affiliation; praise; fun; being asked; and caring for others).

Rates of parental endorsement of fun and praise as motives for helping did not differ by year. In terms of values that did differ significantly, parents were more likely to endorse being asked as a motive for children's helping at Year 3 than Year 1 (Kruskal–Wallis test, p < 0.05, with Dunn-Bonferroni post hoc, p < 0.01). Parents were more likely to endorse being rewarded at Years 3 and 4 than Year 1 (Kruskal–Wallis test, p < 0.001, with Dunn-Bonferroni post hoc, p < 0.001 for Year 1 and Year 3, and p < 0.01 for Year 1 and Year 4), and at Year 3 than Year 2 (Dunn-Bonferroni post hoc, p < 0.01). Parents were more likely to endorse social affiliation at Year 4 than Years 1 or 3 (Kruskal–Wallis test, p < 0.01, with Dunn-Bonferroni post hoc, p < 0.01 for Year 1 and Year 4, and p < 0.05 for Year 3 and Year 4). Parents were more likely to endorse caring at Years 2 through 4 than at Year 1 (Kruskal–Wallis test, p < 0.001, with Dunn-Bonferroni post hoc, p < 0.001 for each post hoc comparison).

# Relations Between Prosocial Tendencies, Everyday Helping, and Motives for Helping

**Table 3** depicts partial correlations (Spearman's rho) between prosocial tendencies, participation in chores, unhelpful helping, and parent-reported motives for helping, controlling for age in months. Parents' gender was not controlled in this analysis for two reasons: mothers had consistently higher values than fathers on all means, so correlations were relatively unchanged, and there were concerns with including a dichotomous variable as a control variable (though see Bay and Hakstian, 1972). Children's participation in chores was correlated with both their prosocial tendencies (0.30, p < 0.001) and unhelpful helping (0.22, p < 0.001), but prosocial tendencies and unhelpful helping were unrelated (n.s.).

TABLE 3 | Partial Spearman's rho correlations, controlling for age in months, between children's prosocial tendencies, participation in chores, unhelpful helping, and attributed motivations for helping.

<sup>∗∗</sup>p < 0.01; <sup>∗∗∗</sup>p < 0.001.

Praise, fun, social affiliation, and caring were all related to children's prosocial tendencies, with helping motivated by caring being the best predictor of prosocial tendencies (0.36, p < 0.001). All motives were related to participation in chores, except for reward, with everyday helping motivated by fun being the best predictor of participation in chores (0.32, p < 0.001). Praise, fun, and social affiliation were related to unhelpful helping, with helping motivated by praise and fun being the best predictors of unhelpful helping (both at 0.14, p < 0.01).
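For readers unfamiliar with partial rank correlations, the sketch below shows one common way to compute a first-order partial Spearman's rho: rank each variable, correlate the ranks, and then remove the shared association with the control variable. The data are invented and the function names are hypothetical; this is not the authors' analysis script, and a statistics package may handle ties or significance testing differently.

```python
import math


def ranks(values):
    """Average ranks: tied values share the mean of their rank positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y with the control variable z partialled out."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented scores for eight children (not the study's data).
age_months = [14, 20, 26, 33, 40, 47, 52, 58]
chores = [2, 3, 3, 5, 4, 6, 7, 6]
prosocial = [1, 3, 2, 4, 4, 5, 5, 7]

print(f"rho(chores, prosocial)       = {spearman(chores, prosocial):.2f}")
print(f"rho(chores, prosocial | age) = "
      f"{partial_spearman(chores, prosocial, age_months):.2f}")
```

Because both invented measures rise with age, the raw and age-controlled coefficients can differ noticeably, which is the reason for controlling for age in an analysis like the one in Table 3.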

# DISCUSSION

The present study reported on the findings of a survey of over 500 participants, examining parental reports of prosocial tendencies and everyday helping among children aged 1–4 years (12–59 months), including participation in common chores, tendency to be unhelpful during chores, and motives for everyday helping.

# Developmental Trends in Prosocial Behavior

As in prior studies, children generally become more helpful with age (e.g., Svetlova et al., 2010), showing greater prosocial tendencies and a greater proclivity toward everyday helping. Also supporting prior studies, children participate in some forms of help quite early in life, even by about 12 months of age (e.g., Dahl, 2015; Hammond et al., 2017).

Controlling for age, children with higher prosocial tendencies had a higher tendency to participate in chores. However, whereas caring as a motive for everyday helping best predicted children's prosocial tendencies, fun as a motive best predicted participation in chores. This offers partial support to Dunfield's (2014) proposition that different domains of prosocial behavior have unique developmental pathways. More importantly, it shows that motives for everyday helping become more complex, as well as more numerous, with development, with praise and fun as the foundation of early helping, and social affiliation and caring emerging to greater prominence later.

# Everyday Helping and Unhelpful Helping

Supporting Rheingold's (1982) finding that children's assistance can be a nuisance for parents, the majority of parents in the present study reported that their children's help in the home is at times unhelpful. This finding is somewhat different than that of Waugh and Brownell (2017), who studied the frequent non-helping behaviors that emerged in toddlers' interactions with unfamiliar adults experiencing difficulties in laboratory tasks. In the home, children are often explicitly attempting to participate (i.e., they are not warily backing away from their parents), even when the parent is not experiencing difficulty in completing the task (and if they are, it is often the child's participation that is making it more difficult). That is, "helping" in the home is often not about helping someone else with a difficulty, but about participating or collaborating together in the same activity.

Although unhelpful helping seems to be unrelated to children's general prosocial tendencies, it is related to participation in chores in the home. Parents who were more likely to report their children's motives for helping as fun, social affiliation, and praise were also more likely to report unhelpful helping. Parents in the present study noted diverse ways of managing unhelpful helping. One notable approach was to view the unhelpful behavior as part of positive development that would one day lead to genuinely helpful behavior, or to appreciate the behavior as pleasurable, enjoyable, or amusing. This finding supports the contention of Brownell et al. (2002) and Brownell and Early Social Development Research Lab (2016) that positive social interactions form an important context for prosocial development. These strategies may also support Rogoff and colleagues' (e.g., Rogoff, 2014) findings that European Americans treat their infants' efforts to help others in "mock" ways, in the sense that the efforts are seen as cute and pleasant, but not necessarily important contributions to the life of the family or community. In contrast, in many indigenous communities, children come to play serious and important roles in supporting community living.

# Motives for Helping

The present study found that parents endorsed both internal (care; fun; social affiliation) and external (being asked; being praised; being rewarded) motivators for children's help. Supporting the general view in the literature that internal motives are of greater interest and importance (e.g., Rheingold, 1982; Warneken and Tomasello, 2006; Hepach et al., 2012), being asked and particularly being rewarded were less predictive of prosocial tendencies and helping. Further supporting Rheingold (1982) and others (e.g., Carpendale et al., 2014), parents attribute multiple motives for their children's helping, predominantly praise, social affiliation, and fun. Fun and social affiliation seemed to be important motivators for everyday helping across ages, supporting Rheingold's (1982) view that children's help is often characterized by cheerful energy, or "alacrity."

However, contrary to a wholly internally motivated view of helping, parents tended to endorse praise as a motivator for helping (see also Dahl, 2015). Parents' ranking of praise as a primary motivator for helping, and particularly its presence at younger ages, is important to the extent that this would suggest that parents often use praise (and indeed, this emerges in parents' characterization of managing even unhelpful helping, e.g., "Yes it is usually unhelpful, but I praise him for helping anyway while correcting" – Mother of 23-month-old boy). The widespread use of praise even with the youngest children weakens support for at least one argument for early helping being unlearned, namely that "[i]nfants 18 months of age are too young to have received much verbal encouragement for helping from parents" (Warneken and Tomasello, 2006, p. 1302).

The findings reported here present somewhat mixed support for the view that caring, often seen as a form of altruism, is foundational to the early emergence of prosocial behavior. Overall, parents endorsed caring as a motivator for everyday helping less frequently than many other motives. However, care seems to emerge as a more important motivator of helping in older children, suggesting that this feature of prosocial behavior is later developing, building on earlier parental praise and having fun as motives. With age controlled, care does seem to predict children's prosocial tendencies and their helping in the home, although not unhelpful helping, showing that altruistic motives are important to prosocial behavior, consistent with the assertions of many theories.

# Fathers' Perceptions of Children's Prosocial Behavior

The present study had an unusually high number of fathers, approximately half the sample, likely because it was administered via MTurk. As participants in most studies of prosocial behavior are mothers, we had few expectations of how fathers' responses might differ from those of mothers. Fathers appear to view their children's everyday helping and unhelpful helping as occurring at lower rates than do mothers, although both mothers and fathers seem to view children's prosocial tendencies in about the same way. Fathers are also less likely to see fun and being praised as motives for everyday helping. Although the issue is impossible to resolve from the present data, one possibility is that mothers are spending more time with their young children, or do more chores in the home to begin with, and are more likely to notice children's helping, and unhelpful helping.

# Limitations

This is the first study of early helping to use MTurk to obtain information from a wide swath of parents about their young children's behavior. However, survey data have many well-known weaknesses, relying heavily on parents' perceptions of their children's helping, rather than direct observation. Nevertheless, we hope that the present study will encourage researchers to explore everyday and unhelpful helping in future observational and experimental studies.

Although the participants in the present study are more diverse than a typical developmental study, particularly in terms of parental gender, the participants are nevertheless largely from Caucasian majority ethnic backgrounds, and may respond differently to children's help than other parents (e.g., Rogoff, 2014). Given the unexpected findings about fathers' perceptions of prosocial behavior, more details about primary parenting, time spent with children, and the parents' involvement in the life of the home would be advisable in future research studies.

# Social Ecology and Future Directions

This study began by noting a consensus in the research literature on prosocial development, namely that human infants are hypercooperative compared to other species. The findings presented here add to recent research that explores what Dahl (2017) calls the ecology of development, and present evidence that challenges some evolutionary speculations in the field, such as natural altruism and unlearned helping. Children may begin helping with motives other than altruism, such as fun and social affiliation (see Rheingold, 1982; Pletti et al., 2017), and seem to assist others in ways that are not always very helpful (see also Waugh and Brownell, 2017). In noting these disruptive developmental realities, we wish to close by offering a further, if speculative, evolutionary hypothesis for the field to explore.

The present study found evidence, also suggested by other researchers (e.g., Rheingold, 1982; Carpendale et al., 2014; Brownell and Early Social Development Research Lab, 2016), that positive contexts are important to prosocial development. Children may be motivated to engage in everyday helping by a desire for fun, and their efforts are often unhelpful. Yet at an evolutionary level of analysis, play is an important evolved feature of mammalian behavior, and human infants seem to be the most playful of all (see Bruner, 1972; Schank et al., 2015). Play helps animals learn about and explore their environment and social others in ways that are, to the outside observer, often quite silly to behold. As we consider the role of positive, and even fun, contexts and the role of unhelpful helping in early prosocial development, perhaps we should also begin to conceptualize humans' evolved nature as "hyperplayful" as well as hypercooperative.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board, at the Research Conduct and Compliance Office, University of Pittsburgh. The protocol was approved by the University of Pittsburgh Institutional Review Board. All subjects gave written informed consent in accordance with the Declaration of Helsinki.


# AUTHOR CONTRIBUTIONS

SH was the primary author. CB contributed to research, conceptualization, editing, and analysis.

# FUNDING

This research was supported by the Social Sciences and Humanities Research Council of Canada.

# ACKNOWLEDGMENTS

We would like to thank all the parents and children who participated in this study, and the reviewers and editor for their helpful comments. Thank you to Anisa Yan for typo hunting – any remaining mistakes are those of the first author.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer RB and handling Editor declared their shared affiliation at the time of review.

Copyright © 2018 Hammond and Brownell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Means-Inference as a Source of Variability in Early Helping

#### Sophie Bridgers\* and Hyowon Gweon

*Department of Psychology, Stanford University, Stanford, CA, United States*

Humans, as compared to their primate relatives, readily act on behalf of others: we help, inform, share resources with, and provide emotional comfort for others. Although these prosocial behaviors emerge early in life, some types of prosocial behaviors seem to emerge earlier than others, and some tasks elicit more reliable helping than others. Here we discuss existing perspectives on the sources of variability in early prosocial behaviors with a particular focus on the variability within the domain of instrumental helping. We suggest that successful helping behavior not only requires an understanding of others' goals (goal-inference), but also the ability to figure out *how* to help (means-inference). We review recent work that highlights two key factors that support means-inference: causal reasoning and sensitivity to the expected costs and rewards of actions. Once we begin to look closely at the process of deciding how to help someone, even a seemingly simple helping behavior is, in fact, a consequence of a sophisticated decision-making process; it involves reasoning about others (e.g., goals, actions, and beliefs), about the causal structure of the physical world, and about one's own ability to provide effective help. A finer-grained understanding of the role of these inferences may help explain the developmental trajectory of prosocial behaviors in early childhood. We discuss the promise of computational models that formalize this decision process and how this approach can provide additional insights into why humans show unparalleled propensity and flexibility in their ability to help others.

Keywords: prosocial behavior, instrumental helping, decision-making, causal reasoning, cost-benefit-analysis, cognitive development

#### Edited by:

*Jessica Sommerville, University of Washington, United States*

#### Reviewed by:

*Stuart I. Hammond, University of Ottawa, Canada; Carolyn Palmquist, Amherst College, United States*

#### \*Correspondence:

*Sophie Bridgers sbridge@stanford.edu*

#### Specialty section:

*This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology*

Received: *26 June 2018* Accepted: *28 August 2018* Published: *26 September 2018*

#### Citation:

*Bridgers S and Gweon H (2018) Means-Inference as a Source of Variability in Early Helping. Front. Psychol. 9:1735. doi: 10.3389/fpsyg.2018.01735*

# 1. INTRODUCTION: VARIABILITY IN EARLY PROSOCIAL BEHAVIORS

Humans are not only social creatures, we are also prosocial. We often take actions that benefit others even at the expense of our own time, energy, and resources. The tendency to act on others' behalf emerges remarkably early in life; even preverbal infants readily help when others are struggling to achieve a goal (Warneken and Tomasello, 2006) or point to the locations of objects for which others are searching (Liszkowski et al., 2008). The fact that these behaviors emerge quite early in life has been taken as evidence for an intrinsic motivation to be helpful: We want to help others (Warneken and Tomasello, 2006; Tomasello, 2009).

Early prosocial behaviors have been categorized into different domains that vary in terms of "what" is being offered (Zahn-Waxler et al., 1992; Tomasello, 2009; Dunfield et al., 2011): Instrumental helping (physical, goal-directed action), informing (information), sharing (resources, such as food), and comforting (emotional expressions and gestures). Prior comparative and developmental work suggests that only some of these behaviors are shared with non-human primates while others may be uniquely human (Warneken, 1994; Warneken and Tomasello, 2009; Horner et al., 2011). Some work further suggests that even within humans, these behaviors may have rather independent developmental trajectories; helping and informing behaviors are observed at an earlier age than sharing or comforting (e.g., Liszkowski et al., 2008; Brownell et al., 2009; Svetlova et al., 2010), and children's tendency to act prosocially in one domain does not necessarily correlate with behaviors in other domains (Dunfield et al., 2011, 2013).

Such between-domain variability suggests that different prosocial behaviors may be subserved by different evolutionary roots and social-cognitive mechanisms (Tomasello, 2009). Researchers also appeal to different underlying motivational sources triggered by different cues (e.g., emotion contagion or empathic concern triggers comforting, while goal-alignment leads to instrumental helping; Paulus, 2014). While these accounts may differ in their level of explanation and the proposed origins of differences observed across domains of prosocial behaviors, they generally agree that this variability reflects deeper differences between domains (e.g., Tomasello, 2009; Brownell, 2012; Dunfield et al., 2013; Paulus, 2014): We want to help more in some domains than others.

Wanting to help, however, is not the same as actually helping. For our prosocial motivation to lead to an action, it is also critical to figure out how to help. Depending on what others want, what went wrong, and what we can do to help, we may choose to help others in different ways, or not help at all. In fact, there is substantial variability even within a prosocial domain, raising the possibility that there may be other factors beyond between-domain differences in motivational sources and social-cognitive reasoning that influence children's tendency to help. However, the variability within domains has received relatively little attention.

Here we suggest that the pattern of data across different tasks within a domain can provide important insights into the development of prosocial behavior. We begin by taking a closer look at the variability in early instrumental helping in particular, and explore the nature of the inferences required by different tasks (i.e., inferences about others' goals and the means by which to help). We conclude by discussing how goal- and means-inferences can help explain the variability in early prosociality not only within but also across domains.

# 2. VARIABILITY IN EARLY INSTRUMENTAL HELPING: THE ROLE OF GOAL-INFERENCE

A seminal study by Warneken and Tomasello (2006) provides compelling evidence for the early emergence of instrumental helping. Human infants (18-month-olds) and chimpanzees were placed in a range of scenarios where a human adult attempted but failed to achieve a goal: (1) out-of-reach tasks, (2) physical-obstacle tasks, (3) wrong-result tasks, and (4) wrong-means tasks (see **Figure 1A** for details). Both groups helped, though human infants did so more frequently and more reliably across different scenarios than chimpanzees. Based on these results, the authors argue that humans are naturally inclined to help, and that the motivation to help may have emerged sometime in evolutionary history before humans and chimpanzees diverged.

This study, however, also nicely demonstrates substantial within-domain variability in early instrumental helping behaviors. While often overlooked, these data are especially valuable because few studies have used such a wide range of tasks within the same domain (also see Warneken and Tomasello, 2007); most subsequent studies focused on comparing helping behaviors across domains and so used a subset of instrumental helping tasks that were shown to elicit high rates of helping. For our purposes, Warneken and Tomasello (2006) provides an ideal case-study for taking an in-depth look at the variability found in various instrumental helping tasks.

In this study, children most reliably helped in the out-of-reach tasks (over 60% in three of four tasks, see **Figure 1B**). They were more likely to pick up the out-of-reach object when the experimenter accidentally dropped and reached for it (experimental conditions) compared to when he intentionally threw it away and did not reach for it (control conditions), suggesting that they recognized the experimenter's goal and selectively provided help when he needed it. However, in other tasks children helped less frequently or not at all, or helped in both the experimental and control conditions. What might explain such variability?

One possibility, as the authors suggest, is that these tasks differ in how easy it is to infer the experimenter's goal from his behavior (Warneken and Tomasello, 2006). Yet, exactly how goal clarity might differ across tasks has not been explored in detail. Below we offer some speculation on the relationship between the difficulty of goal-inference and the rates of helping in this study.

The relatively lower rate of helping in some tasks can indeed be explained by goal ambiguity. In the Cabinet task (physical-obstacle), while the most plausible reason for why the experimenter is banging into the cabinet doors is that he wants to open them but his hands are full, he may have other reasons for doing so (e.g., trying to maneuver around the cabinet, or just doing it for fun). In contrast, the experimenter's goal may have been too clear in the Clips task (wrong-result), eliciting the target action in both the experimental and control conditions. Here the experimenter has clips lined up on a board, and either unsuccessfully attempts to place three more clips on the board (experimental) or intentionally places the three clips next to the board (control). The already lined-up clips with three clips that remain off of the board might have led children to believe even in the control condition that the experimenter wanted them to place these remaining clips on the board.

In other tasks, however, the goal seems clear, yet children do not help reliably. For example, in the Rake task (wrong-means), the experimenter reaches for blocks inside of a vertical box with a transparent side, presumably making his goal as explicit as in the out-of-reach tasks. In the Chair task (physical-obstacle), the experimenter tries to sit down on a chair (but cannot because a bottle is on the seat), again making his goal of sitting down quite obvious. Nevertheless, both tasks elicited little to no instances of the target helping behavior. The absence of help is most striking in one of the out-of-reach tasks (Cap) where the experimenter reaches for his cap/hat, just like in other out-of-reach tasks. If the goal was clear in these tasks, why didn't children help? In order to help someone, the helper should of course understand what goal needs to be fulfilled (goal-inference). However, it takes more for a prosocial motivation to manifest as observable behavior; the helper also has to figure out how to help, or the means by which they can provide help. Below we discuss the role of this means-inference in more detail.

FIGURE 1 | … difference between conditions was found for the Cap, Chair, Clips, and Rake tasks. Error bars represent SE; \**p* < 0.05. (Reproduced with permission from Warneken and Tomasello, 2006).

# 3. THE IMPORTANCE OF MEANS-INFERENCE: FIGURING OUT HOW TO HELP

Means-inference involves identifying the cause of the problem and the appropriate means to resolve the issue. Additionally, the helper needs to know whether this appropriate intervention is feasible and worthwhile to execute (i.e., costs and benefits of performing the action).

In the out-of-reach tasks in Warneken and Tomasello (2006), both goal- and means-inferences are relatively straightforward; the goal is clear, and the appropriate way to help is to simply reenact the experimenter's action (i.e., reach to retrieve the object), which is well within preverbal infants' behavioral repertoire (Meltzoff, 1995; Hamlin et al., 2008).

However, comparing the tasks that elicited little helping (Rake, Chair, and Cap) with other tasks in the same category reveals how such inferences can be more complex. In the wrong-means Rake task, children watched an experimenter use the rake to retrieve objects but never used it themselves, whereas in the other wrong-means (Flap) task, children had a chance to lift the flap on the box before the critical test. Similarly, in the physical-obstacle Chair task, children did not have a chance to remove an object from the chair, whereas in the other physical-obstacle (Cabinet) task they were given an opportunity to open the cabinet doors. Finally, in the Cap task (the only out-of-reach task that did not elicit help), the out-of-reach object was the experimenter's personal possession, which might have made children unsure of whether it was okay to pick it up (i.e., permissibility of target action). These comparisons, while speculative, suggest that children's prior experience with the exact means to help may have influenced their tendency to help. It is difficult to say if children in this study could not infer the means, were uncertain about their ability (or the permissibility) to perform the means, or both. Nonetheless, these examples highlight that children's ability to understand how to help may impact whether they ultimately help even when the helpee's goal is clear.

Means-inference is often the crux of what makes helping hard. People in need of help are aware of their own goals, and often communicate these goals to the helper (e.g., "can you help me fix my computer?"), eliminating the need for goal-inference. However, figuring out how to help is usually up to the helper; most often, people need help because they do not know how to remedy their problem. Thus, studying the role of means-inference in particular might be critical to understanding what supports the planning and production of a helpful action, what might prevent us from producing it, and what changes across development.

Means-inference has received less attention than goal-inference. Some studies discuss the difficulty of means-inference as a possible source of variability (e.g., Dunfield et al., 2011, 2013), but few directly investigate children's ability to infer the appropriate means to help while holding the goal and task constant. Below we discuss recent work that begins to shed light on young children's ability to figure out how to help.

# 3.1. Deciding How to Help by Identifying the Cause of Failure

One critical aspect of means-inference is a causal analysis of the situation: What is the source of the helpee's problem, and what can be done to address it? Depending on the cause of the helpee's failure, the helper may need to take different courses of action to resolve the problem. Prior work suggests that young children can make powerful and sophisticated causal inferences even from sparse evidence, aided by their understanding of others' knowledge, goals, and intentions (e.g., Gopnik et al., 2004; Kushnir et al., 2008; Shafto et al., 2012; Bridgers et al., 2016a; Sim et al., 2017). Remarkably, preverbal infants can infer the cause of their own failures based on the covariation information embedded in others' successes and failures (Gweon and Schulz, 2011). Yet, much remains unknown about how causal reasoning might inform how children help.

One recent study suggests that even toddlers readily recruit their causal knowledge to decide how to help (Bridgers et al., 2017). Two- and three-year-olds were introduced to three toys, each of which had a yellow button on one side that played music and a red inert button on the other side (but one of the toys was broken such that neither button played music). Then a naïve confederate pressed a button on one of the toys only to fail to play music and asked children for help. The only difference between the two conditions was whether the confederate tried the yellow button (suggesting that her toy was broken) or the red button (suggesting that she was trying the wrong side), but children responded in very different ways; they got her a different toy that worked in the first condition, but flipped the confederate's toy over to show her the correct (yellow) button in the second. The confederate's goal was very clear (she stated she wanted music), but knowing her goal was not enough: Knowledge of how the toys worked was critical to infer the source of the confederate's problem and select the appropriate means to help. These results suggest that even toddlers readily take advantage of what they have just learned minutes before to infer the correct means and provide effective help.
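The causal reasoning described above can be sketched as a small Bayesian inference over the cause of the confederate's failure. This is a hedged illustration, not the model used by Bridgers et al. (2017): it assumes the helper starts from a uniform prior that any of the three toys is the broken one, that yellow buttons on working toys always play music, and that the helper simply picks the means matching the most probable cause. All names (`PRIOR`, `p_no_music`, `choose_means`) are hypothetical.

```python
from fractions import Fraction

# Assumed prior: one of three toys is broken, and the helper does not know which.
PRIOR = {"toy_broken": Fraction(1, 3), "toy_works": Fraction(2, 3)}

# Assumed mapping from the inferred cause to the means of helping.
MEANS = {
    "toy_broken": "fetch a different toy",
    "toy_works": "show the yellow button",
}


def p_no_music(cause, button):
    """Likelihood of silence given the cause and the button that was pressed."""
    if cause == "toy_broken" or button == "red":
        return Fraction(1)  # broken toys and inert red buttons never play
    return Fraction(0)      # yellow button on a working toy always plays


def posterior_after_failure(button):
    """Posterior over causes after seeing a press on `button` produce no music."""
    unnormalized = {c: PRIOR[c] * p_no_music(c, button) for c in PRIOR}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


def choose_means(button):
    """Pick the means that addresses the most probable cause of failure."""
    post = posterior_after_failure(button)
    return MEANS[max(post, key=post.get)]


for pressed in ("yellow", "red"):
    post = posterior_after_failure(pressed)
    print(pressed, dict(post), "->", choose_means(pressed))
```

Under these assumptions a failed yellow press is only consistent with a broken toy, whereas a failed red press is uninformative and leaves the prior unchanged, so the same stated goal leads to two different means of helping.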

# 3.2. Deciding How to Help via Cost-Benefit Analyses of Actions

Another critical aspect of means-inference is determining the feasibility of the means. This involves understanding whether one has the resources and the competence to perform the necessary action, and is socially permitted to do so. Recent work suggests that young children's tendency to help is affected by the expected difficulty of their own actions: Toddlers are less likely to offer instrumental help when it involves carrying a heavy object versus a light object, although their tendency to perform these actions increases as their motor capacity develops (Sommerville et al., 2018).

Beyond considering the physical costs of helping from their own perspective, children also begin to proactively consider the consequences of their actions for the helpee. When it is clear that obeying a specific request for help would not fulfill the helpee's goal (e.g., the requested cup has a crack), children override the request and help via a different means (e.g., giving her an intact cup; Martin and Olson, 2013). Given a forced choice, preschool-aged children also understand that it is more desirable to offer help with a difficult task than an easy one (Bridgers et al., 2016b; Bennett-Pierre et al., 2018). Furthermore, children are sensitive to whether reciprocity is encouraged and are more likely to help others who have engaged with them in reciprocal play than in parallel play (Barragan and Dweck, 2014). Children also become increasingly aware of the cultural normativity and permissibility of their own and others' actions (Nucci and Turiel, 1978; Rakoczy et al., 2008; Legare and Harris, 2016). The idea that objective and subjective costs of actions influence children's tendency to act prosocially is consistent with the proposal that humans have an intuitive understanding of the costs and rewards of their own and others' goal-directed actions (Gergely and Csibra, 2003; Jara-Ettinger et al., 2016).

Together, these studies suggest that helping is more than figuring out others' goals. It also involves recruiting one's knowledge to infer the appropriate means to resolve others' problems and determining whether it is feasible (or worthwhile) to help. If children are uncertain about any of these inferences, they may not help; not because they do not have the motivation or the desire to do so, but because they may be unsure of whether help is really needed, what actions make sense, or whether they are able to offer appropriate help.
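One way to make this argument concrete is a toy expected-utility calculation in which goal uncertainty, means uncertainty, and cost each independently reduce the chance that help is observed. The sketch below is only an illustration under assumed numbers; the `Means` class, the utility formula, and all parameter values are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Means:
    name: str
    p_success: float  # helper's belief that this action resolves the problem
    cost: float       # effort or social risk of performing the action


def expected_utility(p_goal, reward, means):
    """Reward counts only if the goal was inferred correctly and the means works."""
    return p_goal * means.p_success * reward - means.cost


def decide(p_goal, reward, options):
    """Return the best means, or None when nothing beats not helping (utility 0).

    Assumes `options` is a non-empty list.
    """
    best = max(options, key=lambda m: expected_utility(p_goal, reward, m))
    return best if expected_utility(p_goal, reward, best) > 0 else None


light = Means("hand over a light object", p_success=0.9, cost=0.2)
heavy = Means("carry a heavy object", p_success=0.9, cost=1.5)

# Clear goal, cheap means: help is predicted.
print(decide(p_goal=0.9, reward=1.0, options=[light]))
# Clear goal, costly means: no help, despite the same motivation (reward).
print(decide(p_goal=0.9, reward=1.0, options=[heavy]))
# Ambiguous goal, cheap means: no help either.
print(decide(p_goal=0.1, reward=1.0, options=[light]))
```

In all three calls the reward term, which stands in for prosocial motivation, is held constant; only the inferences about goal, means, and cost change whether an observable helping action results.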

# 3.3. Means-Inference Can Give Rise to Different Forms of Prosocial Behaviors

The significance of figuring out how to help might extend beyond instrumental helping. For example, in Bridgers et al. (2017), one can provide instrumental help to address the immediate cause of the confederate's failure, such as giving her one of the working toys or pressing the button that works. However, addressing the ultimate cause of the confederate's failure (her ignorance about how the toys work) would involve informing or teaching. Indeed, children's help was often accompanied by communicative behaviors that resemble pedagogical demonstrations (e.g., eye-contact, pointing; Csibra and Gergely, 2009) and even verbal information (e.g., "That one [toy] is not working"; "This button has no music."), suggesting that some children were not only helping but also informing. Furthermore, if none of the confederate's toys played music, children might willingly share their own toy or even try to comfort the confederate to relieve her disappointment. The costs and feasibility of different means might also play a role; a child who wants to inform might resort to instrumental helping if her verbal proficiency is limited, or offer emotional comfort instead of giving away her favorite toy.

Our analysis illustrates how the lines between these different forms of helping may not be as distinct as previously proposed. Given the same goal, we can choose to provide instrumental help, information, resources, or comfort depending on the source of others' failure and what we can do to address it. Although there may be distinct perceptual, physiological, or cognitive mechanisms associated with different forms of prosocial behavior, their role might be to provide input to a more general decision-making process that generates the observed response.<sup>1</sup>

# 4. PROSOCIAL BEHAVIOR AS A DECISION-MAKING PROCESS

A motivated helper needs to figure out what help is needed (goal-inference), what action would fulfill that need, and whether or not they are able to execute that action (means-inference). From this perspective, the production and form of prosocial behaviors are more than responses to cues or triggers; they are the output of a sophisticated decision-making process about the most effective way to help. We suggest that this process involves understanding the latent causal structure of the situation by integrating one's knowledge about (1) others (their goals, knowledge, preferences, competence, resources, etc.), (2) the physical world (intuitive physics, causality, etc.), and (3) the self (one's own knowledge, preferences, competence, resources, etc.) (see **Figure 2**). As children still have much to learn about each of these domains, investigating how children draw and coordinate inferences across these domains may help us better understand why some forms of helping seem to emerge earlier than others.

<sup>1</sup>While some prosocial behaviors might be driven by immediate physiological responses to others' plight or danger, most laboratory studies are likely probing more deliberate decisions to help. Here, we constrain our focus to how we can better evaluate and compare these different experimental tasks.

This framework highlights why it is difficult to draw strong developmental claims from the rates of helping across different tasks and situations alone. In studies that compare rates of helping across domains, the out-of-reach tasks are commonly used to measure children's tendency to help instrumentally (e.g., Svetlova et al., 2010; Dunfield et al., 2011, 2013). In these studies, younger children tend to instrumentally help more frequently than they share or comfort, which has led some authors to conclude that sharing and comforting involve more sophisticated social-reasoning and higher personal costs than instrumental helping. As our analysis reveals, however, children are less likely to provide instrumental help when the helpee's goal is ambiguous, means are hard to identify, or there is uncertainty about the feasibility of the needed actions. Even a strong desire to help may not produce an observable behavior if the appropriate means are unclear or the costs are too high (Jara-Ettinger et al., 2016). Thus, before concluding that competence in sharing and comforting emerges later in development (e.g., Brownell et al., 2009; Svetlova et al., 2010), it is important to ask if the tasks we use to index these abilities involve goals and means that are more ambiguous than the tasks we use to measure instrumental helping. It remains an open question whether tasks that are better matched along these dimensions would produce less variability across domains. Consistent with this possibility, in sharing tasks where the goal and means are made more explicit (e.g., the experimenter holds out her hand), children are more likely to share (Dunfield et al., 2011).

Our account does, however, motivate some clear developmental hypotheses. Young children may struggle to infer the helpee's goal or figure out the means to help, or may lack the necessary competence or resources to help; thus with increased age and experience, the frequency of helping, as well as the diversity and sophistication of the means employed may increase. Furthermore, as children's reasoning about others' minds develops across early childhood, they may become better able to signal their helpful intent regardless of the effectiveness of their actions, and even begin to show adult-like sensitivity to how their help might be perceived by the helpee (e.g., seeing helping as patronizing, etc.).

A key challenge in drawing developmental conclusions from behavioral data is that the absence of a particular behavior does not entail the absence of underlying mental constructs (e.g., motivation to help, the ability to draw goal- and means-inferences, physical competence, etc.). Computational models can complement developmental methods because they are particularly useful in revealing how multiple decision factors interact and contribute to generating behavior (see Shafto et al., 2014; Jara-Ettinger et al., 2016, for recent computational work on social cognition). Characterizing prosocial behavior as the output of a decision-making process lends itself well to formalization (Tenenbaum et al., 2011; Berger, 2013). This approach would force researchers to express the inferences involved (and the knowledge that supports them) in precise, quantitative terms, and would generate graded predictions about how likely children are to help, what form this help might take, and how effective it is likely to be in a given situation.
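To make the idea of graded predictions concrete, the following is a purely illustrative toy sketch and not a model proposed in this article or in the cited work. It assumes, hypothetically, that the expected utility of helping is the reward of a successful outcome weighted by the helper's confidence in the goal-inference, the means-inference, and the feasibility of the action, minus a cost, and that this utility is mapped to a probability of helping by a logistic choice rule. The function name `p_help`, all parameter names, and all numerical values are assumptions chosen for illustration.

```python
import math


def p_help(p_goal, p_means, p_feasible, reward=1.0, cost=0.2, temperature=0.1):
    """Toy probability of helping (hypothetical model, for illustration only).

    p_goal:     confidence that the helpee's goal was inferred correctly
    p_means:    confidence that the chosen action would fulfill that goal
    p_feasible: confidence that the helper can execute the action
    """
    # Expected utility of helping; not helping is assumed to have utility 0.
    expected_utility = p_goal * p_means * p_feasible * reward - cost
    # Logistic choice rule: higher utility gives a higher chance of helping.
    return 1.0 / (1.0 + math.exp(-expected_utility / temperature))


# Same motivation (reward) and cost, but different clarity of the means.
clear_means = p_help(p_goal=0.95, p_means=0.9, p_feasible=0.9)
unclear_means = p_help(p_goal=0.95, p_means=0.4, p_feasible=0.9)
print(clear_means > unclear_means)
```

Under these assumptions, two situations with identical motivation can yield different rates of observed helping simply because the means are harder to identify in one of them, which is the kind of graded prediction a formal model would be expected to deliver.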

# 5. CONCLUDING THOUGHTS

In sum, while prior work has found compelling evidence for a remarkably early emergence of prosocial behavior, it also has found substantial within-domain variability across different kinds of instrumental helping tasks. Because the primary question in previous studies was whether preverbal infants can help at all, this variability has not received as much attention as variability across domains, and was largely attributed to children's developing ability to identify others' goals.

However, a closer look at different tasks raises the possibility that the ease of figuring out how to help may also modulate children's tendency to help. Studying early prosocial behavior as a decision-making process highlights the importance of both goal- and means-inferences, provides grounds for connecting developmental literature with studies of cooperative behaviors in adults (e.g., Rand and Nowak, 2013), and opens up avenues for computational research investigating how intuitive theories and inferential abilities allow prosocial motivations to manifest as observable, effective actions. Our analysis also highlights the importance of taking seriously the inferential demands of different tasks; by designing tasks that systematically vary in the complexity of the goal- and means-inferences involved, we can better characterize children's helping abilities both within and across domains.

Humans are not only motivated to help; we are also good at it. Studying how children become able helpers, knowledgeable teachers, effective cooperators, and empathic companions will allow us to better understand how across generations humans have accomplished so much and become the most powerful and flexible species on the planet.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

We thank M. H. Tessler for his helpful comments. We would also like to thank our funding sources: McDonnell Scholars Award (HG), NSF Graduate Research Fellowship (SB).

# REFERENCES


ignorant partners. Cognition 108, 732–739. doi: 10.1016/j.cognition.2008.06.013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Bridgers and Gweon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Preschoolers Understand the Moral Dimension of Factual Claims

Emmily Fedra<sup>1</sup>\* and Marco F. H. Schmidt<sup>1,2</sup>

<sup>1</sup> International Junior Research Group Developmental Origins of Human Normativity, Department of Psychology, LMU Munich, Munich, Germany, <sup>2</sup> Department of Developmental and Educational Psychology, University of Bremen, Bremen, Germany

Research on children's developing moral cognition has mostly focused on their evaluation of, and reasoning about, others' intrinsically harmful (non-)verbal actions (e.g., hitting, lying). But assertions may have morally relevant (intended or unintended) consequences, too. For instance, if someone wrongly claims that "This water is clean!," such an incorrect representation of reality may have harmful consequences to others. In two experiments, we investigated preschoolers' evaluation of others' morally relevant factual claims. In Experiment 1, children witnessed a puppet making incorrect assertions that would lead to harm or to no harm. In Experiment 2, incorrect assertions would always lead to harm, but the puppet either intended the harm to occur or not. Children evaluated the puppet's factual claim more negatively when they anticipated harmful versus harmless consequences (Experiment 1) and when the puppet's intention was bad versus good over and above harmful consequences (Experiment 2). These findings suggest that preschoolers' normative understanding is not limited to evaluating others' intrinsically harmful transgressions but also entails an appreciation of the morally relevant consequences of, and intentions underlying, others' factual claims.

#### Edited by:

Jessica Sommerville, University of Washington, United States

#### Reviewed by:

Melanie Killen, University of Maryland, United States Tilmann Habermas, Goethe-Universität Frankfurt am Main, Germany

#### \*Correspondence:

Emmily Fedra e.fedra@psy.lmu.de

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 21 May 2018 Accepted: 10 September 2018 Published: 28 September 2018

#### Citation:

Fedra E and Schmidt MFH (2018) Preschoolers Understand the Moral Dimension of Factual Claims. Front. Psychol. 9:1841. doi: 10.3389/fpsyg.2018.01841

Keywords: factual claims, normativity, norm psychology, social-cognitive development, assertive speech acts, moral cognition

People make assertions about the world every day. Many of these (e.g., "The sun is smaller than the earth.") are typically orthogonal to moral issues and can simply be accepted or rejected given observable reality or some piece of evidence. Others may be morally relevant because a speaker intends to (interpersonally) deceive an addressee (e.g., lying). But sometimes, even simple factual claims – which we keep distinct from the term "lying" – (e.g., "This water is clean!") may become morally relevant, in that they may have harmful consequences (e.g., influence others to act in harmful ways). What is more, speakers may even use factual claims that are easily refutable (e.g., simple generalizations, or claims like "This project was not a success!," "The Earth is flat!") not so much to deceive others, but rather as a means to bring about certain (harmful) consequences (e.g., instill conflict, uncertainty). That is, factual claims may have a moral dimension over and above questions of deceptive intent, truthfulness (i.e., whether the speaker believes the claim or not), or intrinsic harmfulness (e.g., insults). Perhaps especially in today's digital age, in which we face all kinds of assertions that may be associated with certain (intended or unintended) consequences, it seems vital to assess children's understanding of the moral relevance of simple factual claims. In the present study, we investigate preschoolers' understanding of the moral dimension of others' factual claims with a focus on harmful consequences on the one hand, and harmful intentions (regarding harmful consequences) on the other.

# CHILDREN'S EVALUATION OF OTHERS' MORAL TRANSGRESSIONS

Developmental research over the past couple of decades has accumulated much evidence that preschoolers and, to some extent, even very young children understand much about the moral dimension of others' actions (Turiel, 2006; Schmidt and Tomasello, 2012; Hamlin, 2013; Killen and Smetana, 2015; Rottman and Young, 2015; Sommerville and Enright, 2018). Most prominently, a bulk of interview studies based on social domain theory suggests that preschoolers reliably differentiate between moral norms (e.g., norms forbidding violent behavior, such as hitting) and conventional norms (e.g., norms prescribing appropriate clothing, such as not wearing pajamas to school), judging that – compared with conventional violations – moral transgressions are more severe, more deserving of punishment, more widely applicable and independent of authority demands (Turiel, 2006; Smetana et al., 2014; Killen and Smetana, 2015). Another line of research focused on children's disinterested enforcement of norms in social interactions and found that from around 3 years of age, children spontaneously protest and criticize agents who violate conventional norms, such as (agreed-upon) simple game rules (Rakoczy, 2008; Rakoczy et al., 2008; Schmidt et al., 2016), and agents who commit moral transgressions, such as violating others' rights or harming others (Rossano et al., 2011; Vaish et al., 2011; Schmidt et al., 2012, 2013).

And from around 3 to 5 years of age, children do not just reject and negatively evaluate harmful physical actions but also show some understanding of intrinsically harmful verbal actions that produce psychological harm (typically given the content of the speech act), such as name-calling or teasing (Helwig et al., 2001; Smetana et al., 2012; Ball et al., 2017), or "epistemic harm" (given the speaker's deceptive intent to instill a false belief in the listener), such as lying and deceiving (Peterson et al., 1983; Bussey, 1999; Lyon et al., 2013). Together, these studies using different methodologies equally suggest that at preschool age, children understand much about the moral dimension of intrinsically harmful non-verbal and verbal actions.

# CHILDREN'S EVALUATION OF OTHERS' ASSERTIONS

While there is much evidence that preschoolers understand the moral dimension of others' intrinsically harmful (non-)verbal actions (e.g., hitting, stealing, lying), there is, to our knowledge, no research on their understanding (in terms of normative evaluation) of the moral dimension of others' factual claims that become morally relevant not because of their deceptive motivation, but because of the harmful consequences – intended or not – they may entail. Past work has focused on whether children, or even infants, categorize others' speech acts as correct or incorrect or, at minimum, as statistically expectable or not. For instance, research has shown that even infants are sensitive to whether a speaker labels an object correctly (Koenig and Echols, 2003). And 2-year-olds spontaneously reject assertions that do not match reality (e.g., "Peter is eating the cake" when Peter instead is eating a carrot; Pea, 1982). Moreover, 3-year-olds understand that imperative speech acts should lead to a change of reality (e.g., a person should follow an imperative), whereas assertive speech acts should describe the present reality correctly (Rakoczy and Tomasello, 2009). At the same age, children can identify persons that say something correct or say something wrong and distinguish correct from incorrect statements (Koenig et al., 2004; Lyon et al., 2013).

# INVESTIGATING CHILDREN'S UNDERSTANDING OF THE MORAL DIMENSION OF FACTUAL CLAIMS

Some assertions, such as (malicious) lies, may be considered intrinsically harmful as they are morally relevant regardless of their consequences (Turiel, 1983; Lee, 2013). That is, even if a lie is not effective or does not produce major harm, we may find the mere act of lying, the deceptive intent, blameworthy. However, there is also a more extrinsic component of moral relevance to assertions, namely, the potentially harmful consequences they may entail. For instance, factual claims, such as "This water is clean!," may simply be false given observable reality. Thus, one may easily refute them. However, they may also bring about harmful consequences beyond questions of truthfulness or deception (e.g., someone might get sick by drinking dirty water). Thus, we can morally evaluate assertions for their consequences just like physical actions (Cushman, 2008). Moreover, we may have information about whether the speaker intends harmful consequences to occur or not. Importantly, the speaker may not even have deceptive intent or believe the claim to be false, but rather use the speech act to bring about harmful consequences. Thus, we may also morally evaluate assertions for the intentionality of their consequences.

Hence, here we are interested in two major questions concerning children's understanding of the moral dimension of factual claims: (i) how do children evaluate assertions that lead to harmful consequences? And (ii) does it matter for children's moral evaluation whether the harmful consequences were intended by the speaker or not? Evaluating morally relevant assertions is more complex than evaluating morally relevant actions. Regarding the former, the child can directly assess someone's action considering moral norms or principles (e.g., "Hitting is wrong!"). Regarding the latter, however, the child needs to infer that a factual claim (e.g., "This is an X!") – which, per se, could be considered amoral in that it merely corresponds to reality or not (Turri, 2017) – may lead to harmful consequences and that those consequences may be intended or not. Hence, the crux is to evaluate the assertion as good or bad not in light of its correspondence to reality, but regarding the moral relevance of its consequences and the intentionality of those consequences.

Ever since Piaget's (1932) seminal work, researchers have been interested in whether children put more weight on the consequences of an agent's morally relevant action or on the agent's mental states, such as intention, when evaluating the moral valence of an act. While Piaget was clear that children begin with outcome-based evaluations and only later consider others' intentions in their moral evaluation, more recent research produced heterogeneous results. Whereas some researchers suggest that even school-aged children tend to give more weight to outcomes than to intentions (Costanzo et al., 1973; Yuill, 1984; Zelazo et al., 1996; Helwig et al., 2001; Cushman et al., 2013; Gummerum and Chu, 2014), others found that when using simplified procedures (e.g., simpler vignettes) or controlling for confounding factors (e.g., the action of the well-intended and the ill-intended actors led to the same outcome), even 4- to 5-year-old (and in some work, even 3-year-old) children consider an agent's intention (Chandler et al., 1973; Nelson, 1980; Baird and Astington, 2004; Nobes et al., 2009, 2016; Vaish et al., 2010; Killen et al., 2011; Gvozdic et al., 2016). A recent study (Josephs et al., 2016) demonstrated that 4-year-old (and to some extent even 3-year-old) children take into account an agent's intentionality (freedom of choice) and protested more when a moral transgression occurred under free conditions than if it occurred under constrained ones. For conventional violations, however, children tended to put more weight on outcomes.

When evaluating others' morally relevant factual claims, children thus need to coordinate both consequences (e.g., harmful vs. harmless) and intentions (e.g., good vs. bad) regarding consequences. For intentions, in particular, children are required to use both their normativity and theory of mind skills (Perner et al., 1989; Killen and Smetana, 2008; Killen and Rizzo, 2014; Schmidt and Rakoczy, 2018). When it comes to explicitly evaluating others' morally relevant actions, children begin to consider the importance of intentions by around 4 to 5 years of age (Nelson, 1980; Nobes et al., 2016), which coincides with children's becoming competent at false belief tasks (Perner and Roessler, 2012). Recently, Killen and colleagues (2011) investigated 3.5- to 7.5-year-old children's understanding of intentions in a morally relevant context – morally relevant theory of mind (MoToM). In MoToM tasks, children receive vignettes in which a "transgressor" accidentally causes harm to another person (e.g., accidentally throws a bag with another person's cupcake away). Children who failed classical false belief tasks were more likely to attribute bad intentions to an accidental transgressor and to accept punishment of the accidental transgressor than children who passed the false belief task. Overall, children began to take into account the transgressor's intention between 3.5 and 5.5 years of age.

# THE PRESENT STUDY

In the current study, therefore, we are interested in speech acts that are in and of themselves amoral (i.e., they are simply correct or incorrect and not deceptive), but come with moral relevance, either in terms of anticipated consequences or in terms of the intentionality of those consequences. We sought to investigate in two experiments whether 4- to 5-year-old children understand the moral dimension of factual claims and evaluate and reason about such claims in terms of morally relevant consequences (Experiment 1) or the intentionality of morally relevant consequences (Experiment 2). Importantly, to investigate children's evaluation of assertions, and not of (nonverbal) actions, one needs to make sure that children only witness a speaker making an assertion, but not performing an action (which could be directly assessed without referring to the speaker's assertion). Moreover, to exclude the moral evaluation of epistemic harm (e.g., deceptive intent) and psychological harm (e.g., teasing), it is crucial to use assertions that can easily be rejected given observable reality, and that do not have a specific addressee (that might be deceived or insulted). In Experiment 1, therefore, a puppet made simple incorrect factual claims (e.g., "This is an X!," although it was a Y) and children were told that this incorrect claim would either lead to harm (another puppet would lose her property) or to no harm (a paper ball would be thrown away). In Experiment 2, incorrect claims would always lead to harm, but the puppet either intended the harmful consequences (bad intention) or not (good intention). We predicted that preschoolers would evaluate the incorrect factual claim more negatively (i) when it would lead to harm than when it would not cause any harm (Experiment 1), and (ii) when it was based on a bad intention than when its underlying intention was good (Experiment 2). 
Moreover, we predicted that children who differentiate correctly between the two types of incorrect factual claims in both experiments would be more likely to provide adequate justifications (referring to consequences in Experiment 1, and to intentions in Experiment 2) for their differential evaluation than children who did not differentiate between the two types of incorrect factual claims.

# EXPERIMENT 1

In Experiment 1, we sought to investigate how children evaluate others' morally relevant factual claims and how they justify their evaluations. We manipulated the consequences of the incorrect claim: it would either lead to harm or to no harm.

# Methods

#### Participants

Twenty-four preschoolers (51–69 months; M = 5 years, 0 months; 12 girls) participated in the study. Children came from mixed socio-economic backgrounds in a large German city and were recruited via urban daycare centers (in which testing took place). Parents provided written informed consent. One additional child was tested but excluded due to uncooperativeness.

#### Design

In a within-participants design, all children received a factual claim task with two conditions: a puppet made an incorrect claim that would lead either to harm (harm condition) or to no harm (no harm condition). The factual claim task was preceded by a warm-up session (playing with a ball) and a training phase which consisted of two instrumental warm-up tasks (one harm, one no harm condition). The order of conditions was counterbalanced between children.
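Counterbalancing the order of two conditions between children can be sketched as follows. This is a minimal illustration, assuming a simple alternating assignment; the text does not specify how orders were actually assigned, and the function name `assign_orders` is hypothetical.

```python
def assign_orders(n_children, conditions=("harm", "no harm")):
    """Alternate the two possible condition orders across children.

    Assumed scheme for illustration only: even-indexed children get the
    conditions in the given order, odd-indexed children in reverse order.
    """
    forward = list(conditions)
    backward = forward[::-1]
    return [forward if i % 2 == 0 else backward for i in range(n_children)]


orders = assign_orders(24)
harm_first = sum(order[0] == "harm" for order in orders)
print(harm_first, len(orders) - harm_first)
```

With an even sample size, such a scheme places each condition first for half of the children, so that order effects are balanced across the two conditions.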

#### Procedure

Two experimenters conducted the study, which lasted roughly 10 minutes: E1, the coordinator, and E2, who operated two puppets (an elephant named "Susi" and an owl named "Lore"). The child, E1, and E2 sat at a table. E1 sat to the child's left, and E2 on the child's right. The factual claim task was preceded by a training phase with two warm-up tasks to make sure children understood the consequences of an incorrect behavior that led either to harm or to no harm.

### **Training phase**

In the harm condition, E1 first showed the child and the puppets five stickers and put them in front of the owl ("Look Lore, these are your stickers. These are yours. Look [referring to the child] these are Lore's stickers and Lore really likes these stickers."). The owl confirmed this by saying, "Yes, I really like these stickers! And if my stickers are gone, I will be very sad!" and subsequently said goodbye and went to sleep. First, the experimenter performed an instrumental action that the child could reproduce (e.g., using a hammer to hit on wooden balls to send them through holes of a cuboid). After that she put a box on the table asking the child to pay attention ("And now pay attention to what Susi will do! But Susi must not do anything wrong! If Susi does something wrong, I will take away all of Lore's stickers and put them in this box and then Lore is very sad!"). In the no harm condition, there was only the elephant present and instead of stickers, a paper ball was the object of interest. The experimenter showed the child another instrumental action that the child could reproduce (e.g., putting a disc on a peg). Thereafter, the experimenter put a box on the table asking the child to pay attention ("And now pay attention to what Susi will do! But Susi must not do anything wrong! If Susi does something wrong, I will take this paper and put it in this box and then no one is sad!"). In the test phase of both the harm and the no harm conditions, the elephant made an instrumental mistake by failing to use a conventional means necessary to achieve an aim (e.g., failing to use the hammer). When the experimenter turned back she asked the child two control questions, "Did Susi do it right or wrong?" and "What will I do with these stickers/the paper?" Depending on the child's answer, the experimenter either confirmed the child's answer or she corrected him/her, and as announced, the experimenter put the stickers/paper in the box on the table. 
After answering the control questions, the child was asked to evaluate the elephant's action for its moral valence on a four-point Likert scale with smiley faces as anchors ("Susi did it wrong. Is this very bad [German: "schlecht"], a little bad, good or very good?") and was asked to justify his/her evaluation.

### **Factual claim task**

The important difference between the factual claim task and the warm-up tasks in the training phase was that instead of evaluating instrumental actions the child was asked to evaluate factual claims for their moral valence and the child did not see the announced consequences, but had to anticipate them. The setup was similar to the one in the training phase but differed in two ways: in the harm condition, the stickers were replaced by gems and in both conditions, objects were used instead of toys. In the introduction phase, the owl again declared that she likes her gems very much and would be very sad if her gems would be gone and subsequently went to sleep. Then, the experimenter put an object (e.g., a spoon) and a box on the table and asked the child to pay attention to what the elephant was going to say "And now pay attention to what Susi will say. But Susi must not say anything wrong! If Susi says something wrong, I will take away all of Lore's gems and put them in this box and then Lore is very sad." (harm condition), or "If Susi is saying something wrong, I will take this paper and put it in this box and then no one is sad!" (no harm condition). When the experimenter had turned around, the elephant thought aloud: "Well, when I am saying something wrong, [experimenter's name] will take away all of Lore's gems and put them in this box and then Lore is very sad." (harm condition), or "Well, when I am saying something wrong, [experimenter's name] will take this paper and put it in this box and then no one is sad!" (no harm condition).

In the test phase of both conditions, the elephant pointed to the object (e.g., spoon) and made an incorrect claim: "I say this is an X (e.g., cat)." The experimenter then turned back and corrected the elephant saying, "This is a Y, not an X!" The child was then asked to evaluate the elephant's speech act for its moral valence on a four-point Likert scale with smiley faces as anchors ("Susi said it wrong. Is this very bad, a little bad, good or very good?") and to justify his/her evaluation.

# Coding and Reliability

All sessions were transcribed and coded from videotape by a single observer. A second independent observer, blind to the hypotheses and conditions of the study, transcribed and coded a random sample of 25% of all sessions for reliability.

Children's answers to the control questions (dichotomous variable: correct or incorrect response to E1's questions), their evaluation on the Likert scale – from 1 (very good) to 4 (very bad) – and the justification of their evaluation were coded. Children's verbal responses were assigned to the following categories (the first and third categories were determined a priori; see also Nobes et al., 2009): (a) references to consequences (e.g., "Because now all gems are gone."; "Because now no one is sad."); (b) references to the elephant's actions and speech acts (e.g., "Because she did it wrong.," "Because it is not a cat."); (c) references to the elephant's intentions (e.g., "Because she [the elephant] wants to have the stickers."); (d) irrelevant justifications (e.g., "Because the gems are so beautiful."); or (e) no justifications (including "Don't know").

Interrater reliability was perfect: Cohen's κ = 1 (answers to control questions 1 and 2), κ = 1 (warm-up task evaluation), κ = 1 (warm-up task justification), κ = 1 (factual claim task evaluation), and κ = 1 (factual claim task justification).
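For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two coders with the agreement expected by chance from each coder's category frequencies. The sketch below is a generic illustration with made-up codes, not the authors' analysis script; the function name `cohens_kappa` and the example data are assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items (nominal categories)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # formally undefined, and returning 1.0 here is a convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical justification codes for six double-coded sessions.
coder_1 = ["consequence", "action", "consequence", "none", "intention", "action"]
coder_2 = ["consequence", "action", "consequence", "none", "intention", "action"]
print(cohens_kappa(coder_1, coder_2))
```

When the two coders agree on every item and more than one category is used, observed agreement is 1 and κ equals 1, which corresponds to the values reported above.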

# Statistical Analysis

Statistical analyses were run in R, version 3.4.2 (R Core Team, 2016). For the evaluation measure (of the action in the warm-up tasks and of the speech act in the factual claim task), we used nonparametric statistics (Wilcoxon Z-tests) instead of paired sample t-tests, because errors were not normally distributed. For nonparametric tests, we computed the generic effect size r.
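The text does not spell out the formula for the generic effect size r. A common convention for Wilcoxon tests, assumed here, is r = |Z| / √N, where N is the number of participants. The sketch below applies this assumed formula to the Z value reported for the factual claim task (Z = −2.360, N = 24); the function name `effect_size_r` is hypothetical, and the analyses themselves were run in R rather than Python.

```python
import math


def effect_size_r(z, n):
    """Generic effect size r = |Z| / sqrt(N) (assumed convention)."""
    if n <= 0:
        raise ValueError("n must be positive.")
    return abs(z) / math.sqrt(n)


# Z value reported for the factual claim task, with N = 24 children.
r_factual_claim = effect_size_r(-2.360, 24)
print(round(r_factual_claim, 2))
```

Under this assumption the result is approximately 0.48, in line with the effect size reported in the Results.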

# Results

#### Factual Claim Task

#### **Evaluation**

In the factual claim task, children evaluated the puppet's speech act significantly more negatively when the speech act would lead to harm (M = 3.29, SD = 0.75) than when it would lead to no harm (M = 2.54, SD = 1.06; Z = −2.360, p = 0.018, r = 0.481). **Figure 1** shows the mean score of children's evaluation of the puppet's speech act.

#### **Justifications**

Children also had the opportunity to justify their evaluation. **Table 1** shows the frequencies of children's justifications.

#### **Relation between evaluation and justifications**

For the purposes of analyses, children were categorized as "competent" (i.e., children who evaluated the puppet's speech act that would lead to harm more negatively than the speech act that would lead to no harm) and "other" (i.e., the rest of the sample). There were significant associations between children's justifications and their competence in evaluating the moral valence of the puppet's speech act both when it would lead to harm, χ²(2, N = 24) = 6.45, p = 0.011, V = 0.42, and to no harm, χ²(2, N = 24) = 4, p = 0.045, V = 0.31 (see **Table 2**), such that competent children were more likely to justify their evaluation referring to the consequences of the speech act (rather than using other types of justification) than other children.
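The categorization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ratings use the 1 (very good) to 4 (very bad) scale from the coding section, the function names are hypothetical, and the example ratings are invented rather than taken from the study.

```python
from collections import Counter


def classify_child(harm_rating: int, no_harm_rating: int) -> str:
    """Label a child 'competent' if the harm-related speech act was rated
    more negatively (higher on the 1-4 scale) than the no-harm one."""
    for rating in (harm_rating, no_harm_rating):
        if rating not in (1, 2, 3, 4):
            raise ValueError("ratings must be integers from 1 to 4")
    return "competent" if harm_rating > no_harm_rating else "other"


def tally(pairs):
    """Count how many children fall into each category."""
    return Counter(classify_child(h, n) for h, n in pairs)


if __name__ == "__main__":
    # Invented (harm, no-harm) rating pairs, for illustration only.
    example = [(4, 2), (3, 3), (2, 4), (4, 1)]
    print(tally(example))
```

Children who rate both speech acts equally, or who rate the no-harm speech act more negatively, fall into the "other" group, matching the definition in the text.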

#### Warm-Up Task

Children answered two control questions in the warm-up tasks to make sure they understood the consequences of a wrong action. In the harm condition, one child (4%), and in the no harm condition, two children (8%) gave incorrect answers to the first control question ("Did Lore do it right or wrong?," correct answer was "wrong"). In the harm condition, no child, and in the no harm condition, two children (8%) gave an incorrect answer to the second control question ("And what will I do with the stickers/paper?," correct answer was "You put them/it in the box.").

#### **Evaluation**

In the warm-up tasks, children evaluated the wrong behavior significantly more negatively when the action led to harm (M = 3.38, SD = 0.65) than when it led to no harm (M = 2.71, SD = 1; Z = −2.495, p = 0.011, r = 0.509).

#### **Justifications**

See **Table 1** for the frequencies of children's justifications.

#### **Relation between evaluation and justifications**

There was no significant association between children's justifications and their competence in evaluating the moral valence of the puppet's action that led to harm, χ²(2, N = 24) = 1.74, p = 0.587, V = 0.27 (see **Table 2**). However, there was a significant association between children's justifications and their competence in evaluating the moral valence of the puppet's action that led to no harm, χ²(2, N = 24) = 5.71, p = 0.016, V = 0.36, such that competent children were more likely to justify their evaluation referring to the consequences of the action (rather than using other types of justification) than the other children.

## Discussion

Children in this experiment evaluated the puppet's factual claim act more negatively when it would lead to harmful consequences than when it would lead to no harm. Moreover, those children who evaluated the puppet's assertions competently (i.e., evaluating the harm-related assertion as worse than the no harm-related assertion) were more likely to justify their evaluation referring to the consequences of the factual claim than to give irrelevant or no justification, whereas the other children (i.e., those who did not differentiate between the two types of factual claims or gave a more negative evaluation of the no harm-related assertion) were more likely to refer to the incorrect factual claim itself, to give an irrelevant answer or no justification. This suggests that preschoolers' normative understanding goes beyond evaluating others' intrinsically harmful (non-)verbal actions (e.g., hitting, lying), and also entails an appreciation of the moral consequences of others' assertive speech acts. However, this experiment leaves open the question of whether children appreciate morally relevant intentions underlying others' assertive speech acts when controlling for outcome. Thus, to assess this question, we conducted a second experiment in which consequences would always be harmful and either intended by a puppet (bad intention) or not (good intention).

# EXPERIMENT 2

In Experiment 2, in contrast to Experiment 1, incorrect factual claims would always lead to harm. However, the puppet either intended those harmful consequences or not. Findings from different studies suggest that when confronted with vignettes about different types of transgressions, children can differentiate between acts based on good and acts based on bad intentions from around 4 to 5 years of age (Núñez and Harris, 1998; Nobes et al., 2009, 2016). Furthermore, Killen et al. (2011) found that children began to take into account a transgressor's intention between 3.5 and 5.5 years, such that children who passed classical false belief tasks were more likely to attribute good intentions to an accidental transgressor and to decline punishment of the accidental transgressor than children who failed the false belief task. Importantly, we went beyond prior work: we did not investigate whether children consider intentions when evaluating intrinsically harmful non-verbal actions (e.g., physical harm, such as breaking cups or hurting another person accidentally or intentionally) or verbal actions (e.g., lying), but rather whether children consider whether a puppet intends harm to occur when evaluating her speech act. If they do, children should evaluate the well-intended puppet's incorrect factual claim more positively than the ill-intended puppet's incorrect factual claim.

# Methods

#### Participants

Twenty-four (48–71 months; M = 5 years, 0 months; 12 girls) preschoolers participated in the study. Children came from mixed socio-economic backgrounds from a large German city and were recruited via urban daycare centers and a museum (in which testing took place). Parents provided written informed consent. One additional child was tested but excluded due to language difficulties.

TABLE 1 | Frequencies (percentage) of justifications.

TABLE 2 | Association between evaluation and justification.

#### Design

In a within-participants design, all children received a factual claim task in which a puppet made an incorrect assertion that would always lead to harm. The task had two conditions which differed in that the puppet's intention was either good or bad (good-intention condition and bad-intention condition). The factual claim task was preceded by a warm-up session (playing with a ball) and warm-up tasks which consisted of two instrumental tasks. A forced choice task always came last. The order of condition was counterbalanced between children. The order of the puppets' appearance remained the same (elephant, dog, lion, and seal).

## Procedure

Two experimenters conducted the study, which lasted roughly 15 minutes: E1, the coordinator, and E2, who operated the victim (an owl puppet), the two actor puppets (an elephant and a dog) and the two speaker puppets (a lion and a seal). Each puppet was used in one trial only. The child, E1, and E2 sat at a table. E1 sat to the child's left, and E2 sat opposite the child (thus the child faced the puppets).

The factual claim task was preceded by a training phase, consisting of two warm-up tasks to make sure children understood the consequences of an incorrect behavior that was based on good or bad intentions.

### **Training phase**

E1 first showed the child and the two puppets (e.g., owl and elephant) five stickers and put them in front of the owl ("Look owl, these are your stickers. These are yours. Look [referring to the child] these are the owl's stickers and she really likes these stickers."). The owl confirmed this by saying "Yes, I really like these stickers! And if my stickers are gone, I am very sad!" and subsequently said goodbye and went to sleep. First, the experimenter performed an instrumental action that the child could reproduce (e.g., using a hammer to hit on wooden balls to send them through holes of a cuboid). After that she put a box in front of the elephant, and asked the child to pay attention ("And now pay attention to what the elephant will do! But he must not do anything wrong! If he does something wrong, I will take away all of the owl's stickers and put them in the elephant's box and then the owl is very sad!"). When the experimenter had turned around, the elephant repeated: "Well, if I do something wrong, [experimenter's name] will take away all of the owl's stickers and put them in my box, and then the owl is very sad." In the bad intention condition, he announced: "The owl should not keep the stickers. I want those stickers. That's why I want to do something wrong.," while announcing in the good intention condition: "The owl should keep the stickers. I do not want those stickers. That's why I want to do something right."

In the test phase, in both the good and the bad intention conditions, the elephant made an instrumental mistake, by failing to use a conventional means necessary to achieve an aim (e.g., failing to use the hammer). When the experimenter turned back, she asked the child "Did he do it right or wrong?" and "What will I do with these stickers?" Depending on the child's answer, the experimenter either confirmed the child's answer or she corrected him/her, and as announced, the experimenter put the stickers in the other puppet's box. After answering the control questions, the child had to evaluate the elephant's action for its moral valence on a Likert scale ("The elephant did it wrong. Is this mean [German "böse"], a little mean, good or very good of him?") and was asked to justify his/her evaluation. Note that we used the German word "böse" to allow children to focus on intentions and not only on the fact that harm occurred or even that the speech act was incorrect.

### **Factual claim task**

The important difference between the warm-up task and the factual claim task was that instead of evaluating an instrumental action the child had to evaluate factual claims for their moral valence, and the child did not see the announced consequences, but had to anticipate them. The setup was similar to the one in the warm-up task and differed only in two ways: the stickers were replaced by gems and in both conditions, objects were used instead of toys. In the introduction phase, the owl again declared that she likes her gems very much and would be very sad if her gems were gone and subsequently went to sleep. Then, the experimenter put a box in front of the speaker puppet (e.g., the lion) and an object (e.g., a spoon) on the table, and asked the child to pay attention to what the speaker puppet was going to say ("And now pay attention to what the lion will say. But he must not say anything wrong! If he says something wrong, I will take away all of the owl's gems and put them in the lion's box and then the owl is very sad."). When the experimenter had turned around, the speaker puppet repeated: "Well, when I am saying something wrong, [experimenter's name] will take away all of the owl's gems and put them in my box and then the owl is very sad." In the bad intention condition, the puppet announced: "The owl should not keep the gems. I want those gems. That's why I want to say something wrong.," while announcing in the good-intention condition: "The owl should keep the gems. I do not want those gems. That's why I want to say something right."

In the test phase of both conditions, the speaker puppet pointed at the object (e.g., spoon) and made an incorrect claim: "I say this is an X (e.g., cat)." The experimenter then turned back and corrected the lion ("This is a Y, not an X!"). The child was asked to evaluate the lion's claim for its moral valence on a Likert scale ("The lion said it wrong. Is this mean, a little mean, good or very good of him?") and to justify his/her evaluation.

After the evaluation trials, both speaker puppets (lion and seal) who took part in the factual claim task came back. The experimenter repeated the puppets' intentions: "The lion wanted to have the owl's gems and therefore wanted to say something wrong. And the seal did not want to have the owl's gems and therefore wanted to say something right. And then both said something wrong. But who of the two is mean?" The child had to choose one puppet and was asked to justify his/her choice.

### Coding and Reliability

All sessions were transcribed and coded from videotape by a single observer. A second independent observer, blind to the hypotheses and conditions of the study, transcribed and coded a random sample of 25% of all sessions for reliability.

Children's answers to the control questions (dichotomous variable: correct or incorrect response to E1's questions), their rating on the Likert scale – from 1 (very good) to 4 (mean) – and the justification of their rating were coded. Children's verbal responses were assigned to categories: (a) references to the puppet's intention (e.g., "Because he did it on purpose.," "Because he said he wants to say it right."); (b) references to the consequences (e.g., "Because now all gems are gone.," "Because then she [the owl] is sad."); (c) references to the puppet's action or claim (e.g., "Because he did it wrong.," "Because they are actually scissors."); (d) references to the ownership (e.g., "Because these are the owl's gems."); (e) irrelevant justifications (e.g., "Because he has sharp teeth."); or (f) no justifications (including "Don't know").

Interrater reliability was very good, Cohen's κ = 1 (answers to control questions 1 and 2), Cohen's κ = 1 (warm-up task evaluation), Cohen's κ = 1 (warm-up task justification), Cohen's κ = 1 (factual claim task evaluation), Cohen's κ = 1 (factual claim task justification), Cohen's κ = 1 (forced-choice task: "Who of the two is mean?"), Cohen's κ = 1 (forced-choice task justification).
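For readers unfamiliar with the statistic, Cohen's κ compares observed agreement between two coders with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e), so κ = 1 means the two coders agreed on every doubly coded response. The Python sketch below is a minimal illustration of that standard formula; the function name and the example codes are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / n**2
    if p_e == 1.0:
        # Both raters used one and the same category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Invented justification codes for six responses, for illustration only.
    coder_1 = ["intention", "consequence", "action", "intention", "none", "irrelevant"]
    coder_2 = list(coder_1)  # identical coding, i.e., perfect agreement
    print(cohens_kappa(coder_1, coder_2))
```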

#### Statistical Analysis

Statistical analyses were run in R, version 3.4.2 (R Core Team, 2016). Analyses were carried out as in Experiment 1.

# Results

#### Factual Claim Task

#### **Evaluation**

In the factual claim task, children evaluated the puppet's speech act significantly more negatively when the puppet's intention was bad (M = 3.58, SD = 0.78) than when it was good (M = 3.42, SD = 0.78; Z = −2.00, p = 0.046, r = −0.408). **Figure 2** shows the mean score of children's evaluation of the puppet's speech act.

#### **Justifications**

Children also had the opportunity to justify their evaluation. **Table 3** shows the frequencies of children's justifications.

#### **Relation between evaluation and justification**

For the purposes of analyses, children were categorized as "competent" (i.e., children who evaluated the puppet's speech act that was based on bad intentions more negatively than when it was based on good intention) and "other" (i.e., did not differentiate between the two conditions). As predicted, there were significant associations between children's justifications and their competence in evaluating the moral valence of the puppet's speech act (see **Table 4**): bad intentions, χ²(2, N = 24) = 14.40, p = 0.001, V = 0.775; good intentions, χ²(2, N = 24) = 11.88, p = 0.002, V = 0.703, such that children who evaluated the puppet's speech act competently were more likely to give justifications that referred to the puppet's intentions (rather than using other justification categories) than children who did not differentiate between the two puppets. These children were more likely to give justifications that referred to the consequences of the speech act, irrelevant justifications or no justifications.
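The effect size V reported alongside each χ² value appears to be Cramér's V, which can be recovered from the test statistic as V = √(χ² / (N · k)), where k is the smaller of (rows − 1) and (columns − 1). The Python sketch below applies that standard formula to the two values reported in this paragraph. It assumes a 2 × 3 contingency table (competent vs. other children by three justification groupings), which is inferred from the 2 degrees of freedom and is not stated in the text; the function name is hypothetical.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V computed from a chi-square statistic and table dimensions."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need a positive sample size and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Chi-square values as reported above; the 2 x 3 table shape is an assumption.
    for label, chi2 in [("bad intentions", 14.40), ("good intentions", 11.88)]:
        print(f"{label}: V = {cramers_v(chi2, 24, 2, 3):.3f}")
```

Under these assumptions the computed values agree with the reported V = 0.775 and V = 0.703 up to rounding in the last digit.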

#### Forced-Choice Task

After the evaluation phase, children were asked to identify the "mean" puppet. To test whether the proportion of children correctly choosing the puppet with bad intentions differed significantly from chance (0.50), we conducted a planned exact binomial test (two-tailed). Children reliably chose the puppet with bad intentions (88% of children, p < 0.001). Furthermore, children were asked to justify their choice. Of the children who correctly identified the ill-intended puppet as the "mean" (German: "böse") puppet, nine children (43%) referred to the puppet's bad intentions, three children (14%) to the wrong speech act, two children (10%) to the consequences in their justification, three children (14%) gave an irrelevant, and four children (19%) gave no justification. Of the children who incorrectly identified the well-intended puppet as the "mean" puppet, one child referred to the puppet's bad intentions (33%), one child (33%) to the puppet's good intentions in their justification, and one child (33%) gave an irrelevant justification.
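The exact binomial test can be reproduced with a short calculation. The Python sketch below assumes that the reported 88% corresponds to 21 of 24 children (the justification counts above sum to 21 correct and 3 incorrect choices) and uses the usual two-sided rule of summing the probabilities of all outcomes that are no more likely than the observed one. The function name is hypothetical, and the original test was run in R.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value: total probability of all outcomes
    that are at most as likely as the observed count k under chance level p."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties are not lost to rounding error.
    threshold = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= threshold))


if __name__ == "__main__":
    # Assumption: 21 of 24 children chose the ill-intended puppet.
    print(f"p = {binom_two_sided_p(21, 24):.6f}")
```

With these inputs the p-value falls well below 0.001, in line with the reported result.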

#### Warm-up Task

In the warm-up task, children answered two control questions to make sure they understood the consequences of a wrong action based on good or bad intentions. Eight children gave an incorrect answer to the first control question ("Did she do it right or wrong?," correct answer was "wrong"), all of them when the puppet had good intentions. When the puppet had bad intentions, one child gave an incorrect answer to the second control question ("And what will I do with the stickers?," correct answer was "You put them in the puppet's box.").

### **Evaluation**

In the training phase, children evaluated the puppet's action marginally more negatively when the puppet had bad intentions (M = 3.62, SD = 0.71) than when she had good ones (M = 3.42, SD = 0.78; Z = −1.67, p = 0.096, r = −0.340).

#### **Justifications**

See **Table 3** for the frequencies of children's justifications.

#### **Relation between evaluation and justification**

As predicted, there were significant associations between children's justifications and their competence in evaluating the moral valence of the puppet's action (see **Table 4**): bad intentions, χ²(2, N = 24) = 14.29, p = 0.001, V = 0.772; good intentions, χ²(2, N = 24) = 8.84, p = 0.012, V = 0.607, such that children who evaluated the puppet's action competently were more likely to give justifications that referred to the puppet's intentions (rather than using other justification categories) than children who did not differentiate between the two puppets or wrongly evaluated the puppet's action more negatively when it was based on good than on bad intentions. These children were more likely to give justifications that referred to the consequences of the action, irrelevant justifications or no justifications.

# Discussion

Children in this experiment evaluated the puppet's factual claim – which was always incorrect and would always lead to harm – more negatively when the puppet intended the harmful outcome (bad intention) than when the puppet did not intend the harmful outcome (good intention). Moreover, competent children (who evaluated the ill-intended speech act more negatively than the well-intended one) were more likely to give justifications that referred to the puppet's intentions than to the consequences of the assertive speech act, whereas the other children (i.e., who did not distinguish between the two speech acts) were more likely to give a justification that referred to the consequences of the speech act, or, for instance, to the wrong speech act itself than to the puppet's intention. Furthermore, children reliably chose the ill-intentioned puppet as being the "mean" puppet. These findings suggest that preschoolers' normative understanding of morally relevant assertions also entails an appreciation of the intentions underlying those speech acts.

TABLE 3 | Frequencies (percentage) of justifications.

TABLE 4 | Association between evaluation and justification.

# GENERAL DISCUSSION

Much developmental research on children's understanding of normativity and morality has focused on their evaluation of others' intrinsically harmful (non-)verbal actions, such as hitting, stealing, lying, or teasing. Verbal actions (e.g., assertions), however, may have a moral dimension beyond epistemic harm (e.g., lying) or psychological harm (e.g., teasing). For instance, if someone makes an incorrect factual claim (e.g., "This water is clean!" or "The Earth is flat!"), this may lead to harmful consequences to others. And the speaker may even want those harmful consequences to occur and therefore misuse the factual claim to reach an ill-intended goal. We investigated children's understanding of the moral dimension of factual claims. In two experiments, children witnessed a speaker making an incorrect assertion ("This is an X!"). In Experiment 1, we varied the speech act's consequences: it would either lead to harm (another puppet would lose her property) or to no harm (a paper ball would be thrown away). Children evaluated the incorrect factual claim that would lead to harm more negatively than the incorrect factual claim that would not lead to any harm. In Experiment 2, the incorrect assertion would always lead to harm (a puppet would lose her property). However, we varied whether the puppet's intention was good (harmful consequences were not intended) or bad (harmful consequences were intended). When the speaker was ill-intended, children evaluated her claim more negatively than when she was well-intentioned, although both claims would lead to harmful consequences. Importantly, in neither experiment did children witness morally relevant (non-verbal) actions in the factual claim task, such as throwing away someone's property. Rather, they witnessed and evaluated morally relevant factual claims that were related to upcoming consequences or prior intentions.

These findings go beyond previous work on children's evaluation of, and reasoning about, others' morally relevant (non-)verbal actions (e.g., hitting, stealing, lying, and teasing) in interview studies (Peterson et al., 1983; Turiel, 1983; Tisak and Turiel, 1988; Bussey, 1992; Smetana, 2006; Smetana et al., 2012) and children's spontaneous protest responses to norm transgressions in social interactions (Schmidt and Tomasello, 2012). In our study, children did not witness concrete harming non-verbal actions, psychological harm or epistemic harm, but rather factual claims (which, per se, need not be considered moral, but rather correct or incorrect given observable reality; Turri, 2017) with moral relevance. Our findings also go beyond prior work on preschoolers' evaluation of speech acts which did not involve a moral dimension, such as harm. For instance, 3-year-olds were found to criticize speakers who make incorrect factual claims (Rakoczy and Tomasello, 2009). In our experiments, however, claims were always incorrect, and children had to reason about the additional moral layer (consequences or intentionality of consequences) when evaluating the factual claims.

Moreover, in both experiments, competent children (i.e., in Experiment 1, children who evaluated the harm-related speech act more negatively than the no harm-related one, and in Experiment 2, children who evaluated the ill-intended speech act more negatively than the well-intended one, respectively) were more likely to use the appropriate justification type (consequences in Experiment 1, intentions in Experiment 2) rather than other justification categories than the other children (i.e., children who made the reverse evaluation or no difference between the puppets' speech acts). These interrelations bolster the claim that children did not merely evaluate the incorrect factual claim per se, but focused on consequences and intentions, respectively. However, they also suggest that while as a group, children were competent at evaluating the factual claims in moral terms, there are also substantial individual differences in children's competence for evaluation and justification that should be investigated in future work. We should also note that Experiment 2, in particular, was challenging regarding both the design [constant harm, incorrect speech act, (un)intended consequences] and the experimenter's question which referred to the incorrectness of the factual claim ("X said it wrong. Is this mean, a little mean, good or very good of him?"). This might have led some children to focus on whether the assertion matched reality or not (thus not on moral questions). Similarly, Nobes and colleagues (2016) found that the phrasing of the experimenter's question had a huge influence on children's moral evaluation, such as whether they focused on intention or outcome. Moreover, the fact that the anticipated outcome would always be harmful in Experiment 2 (actual harm did not occur in the test phase) might in part explain why children's evaluation in Experiment 2 was overall rather negative. 
Thus, future research could vary the intentionality of consequences while keeping anticipated consequences harmless.

The forced-choice test in Experiment 2 in which the clear majority of children correctly identified the puppet with ill intentions (and often referred to intentions in their reasoning) as the "mean" one supports the notion that preschoolers appreciate others' intentions as morally relevant and use them for making moral evaluations. Similarly, Killen et al. (2011) found that from around late preschool age, children consider others' intentions regarding morally relevant non-verbal actions in which an accidental transgressor caused harm. Given that Killen and colleagues found systematic associations between children's competence in false belief tasks and their moral evaluation of the non-verbal actions, one interesting question for future research is whether theory of mind skills and moral evaluation of verbal actions – assertions underlain by good or bad intentions – are related.

Together, the present findings suggest that preschoolers' normative understanding is not confined to evaluating others' intrinsically harmful (non-)verbal actions but also entails an appreciation of the moral dimension of factual claims that are typically merely true or false. And when children evaluate factual claims regarding their moral worth, they take into account consequences and intentions regarding consequences. The current work thus broadens the investigation of the ontogeny of normativity by integrating moral cognition with children's developing understanding of speech acts, such as factual claims. Developing the ability to scrutinize and evaluate factual claims with moral relevance is a crucial skill, perhaps even more so in our digital age in which children are confronted with assertions in virtual forums on a daily basis.

# ETHICS STATEMENT

This research was conducted in accordance with the Declaration of Helsinki, the Ethical Principles of the German Psychological Society (DGPs), and the American Psychological Association (APA) but was not individually reviewed by the ethics committee as this is not obligatory at LMU Munich. It involved no invasive or otherwise ethically problematic techniques and no deception. All parents provided written informed consent, and children gave oral consent to participate in accordance with the Declaration of Helsinki.

# AUTHOR CONTRIBUTIONS


MS supervised and provided funding for the project. MS developed the study concept. EF and MS designed the study. EF conducted the study and analyzed the data. Both authors interpreted the results. EF drafted the manuscript, and MS provided critical revisions. Both authors approved the final version of the manuscript for submission.

# REFERENCES


# FUNDING

MS was supported by a grant from the Elite Network of Bavaria, an initiative of the Bavarian State Ministry of Science and the Arts.

# ACKNOWLEDGMENTS

The authors would like to thank the research assistants and students of the International Junior Research Group Developmental Origins of Human Normativity for help in recruiting children, coding, and collecting data. They are grateful to all daycare centers, children, and parents for participating in our study.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Fedra and Schmidt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Preschoolers Focus on Others' Intentions When Forming Sociomoral Judgments

#### Julia W. Van de Vondervoort\* and J. Kiley Hamlin

Centre for Infant Cognition, Department of Psychology, The University of British Columbia, Vancouver, BC, Canada

Many studies suggest that preschoolers initially privilege outcome over intention in their moral judgments. The present findings reveal that, in contrast, even younger preschoolers can privilege intentions when evaluating characters who successfully or unsuccessfully help or hinder a third party in achieving its goal. Following a live-action puppet show originally created for infant populations, children made a forced-choice social judgment (which puppet was liked) and two forced-choice moral judgments (which puppet was nicer, which puppet should be punished), and were asked to explain their punishment allocations. In two experiments (N = 195), 3- and 4-year-olds evaluated characters with distinct intentions to help or to hinder who were associated with either positive or negative outcomes. Both ages judged characters with more positive intentions as nicer, and allocated punishment to characters with more negative intentions; neither of these tendencies depended on the outcomes the characters were associated with. Three-year-olds' responses were somewhat less consistent than were 4-year-olds', in that 3-year-olds' judgments were disrupted by ambiguous harmful intent. Notably, children's social judgments were less consistent than their moral judgments. In a third and final experiment (N = 100), children evaluated characters with the same intention but who were associated with different outcomes. Children showed inconsistent responding across age and outcome valence, but only 4-year-olds evaluating two characters with positive intentions reliably responded based on outcome. When providing informative responses in all three studies, children most frequently explained their punishment allocations by appealing to the puppet's (attempted) hindering action or failure to help. These findings raise questions as to what underlies different patterns of response across studies in the literature, and suggest that observing live interactions may facilitate young children's intention-based moral judgments.

Keywords: preschoolers, moral judgments, sociomoral judgments, helping, hindering, intention, outcome

#### Edited by:

Erika Nurmsoo, University of Kent, United Kingdom

#### Reviewed by:

Xiao Pan Ding, National University of Singapore, Singapore; Gordon Patrick Ingram, University of Los Andes, Colombia

\*Correspondence: Julia W. Van de Vondervoort julia.vandevondervoort@psych.ubc.ca

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 04 June 2018 Accepted: 10 September 2018 Published: 02 October 2018

#### Citation:

Van de Vondervoort JW and Hamlin JK (2018) Preschoolers Focus on Others' Intentions When Forming Sociomoral Judgments. Front. Psychol. 9:1851. doi: 10.3389/fpsyg.2018.01851

# INTRODUCTION

When considering whether an action is good or praiseworthy versus bad or blameworthy, adults are sensitive to both an agent's mental states (their intentions, beliefs, desires) and the outcomes they bring about. While in some cases adults do condemn those that unintentionally cause harm (e.g., Gino et al., 2008; Cushman et al., 2009; Cushman and Greene, 2012), adults typically privilege intentions over outcomes when making moral judgments (e.g., Malle, 1999; Mikhail, 2007; Young et al., 2007; Cushman, 2008). The ability to incorporate mental state information into moral judgments, rather than focus strictly on the outcomes of morally relevant actions, has long been considered a hallmark of moral maturity (Piaget, 1932/1965; Kohlberg, 1969).

Beginning with the work of Jean Piaget, researchers have explored when this feature of the mature moral sense becomes operational, and have documented a developmental transition whereby children's moral judgments initially focus on outcomes and only later shift to focusing on intentions. For example, Piaget found that younger children tended to judge a child who accidentally broke 15 cups as naughtier than a disobedient child who broke one cup, and it was not until age 8–10 that children focused on others' intentions by more positively evaluating the child who accidentally caused a large negative consequence (Piaget, 1932/1965).

Subsequent studies revealed that Piaget's methodology led him to underestimate the age at which children can use mental states to inform their moral judgments, suggestive that the centrality of intentions in moral judgments does not require many years of maturation, teaching, and/or relevant experiences to emerge. For example, young children incorporate intentions into their moral judgments when intentions are explicitly stated or otherwise made salient, when intentions are deconfounded from outcomes (e.g., consequences are held constant while intentions vary), and when a larger variety of test questions are used (e.g., asking about the agent rather than the acceptability of the act; e.g., Armsby, 1971; Buchanan and Thompson, 1973; Chandler et al., 1973; Costanzo et al., 1973; Farnill, 1974; Bearison and Isaacs, 1975; Berg-Cross, 1975; Karniol, 1978; Nelson-Le Gall, 1985; Cushman et al., 2013; Nobes et al., 2016). Under these circumstances, even 3-year-olds' judgments show sensitivity to intentions (Nelson, 1980; Yuill, 1984; Nobes et al., 2009). That said, a host of studies have repeatedly demonstrated that young children initially privilege outcome over intention when the two are in conflict, and increasingly consider intention as they age (e.g., Armsby, 1971; Costanzo et al., 1973; Imamoglu, 1975; Moran and O'Brien, 1983; Zelazo et al., 1996; Helwig et al., 2001; Baird and Astington, 2004; Killen et al., 2011; Margoni and Surian, 2017; see also Li and Tomasello, 2018).

Notably, children's ability to incorporate intentions into moral judgments has typically been tested using vignette-based tasks, in which experimenters narrate illustrated stories and then probe children's explicit judgments (but see Chandler et al., 1973; Farnill, 1974, and Li and Tomasello, 2018 for use of videotaped scenes). These judgments include both verbal and Likert scale ratings of action acceptability (e.g., "Is it okay for [her] to [perform that act]? How good is it for [her] to [perform that act]? Is it really, really good, or just a little good, or just okay?", Zelazo et al., 1996) and/or the moral worth of a character (e.g., "Is [he] a good boy or a bad boy?", Costanzo et al., 1973).

Clearly, the tasks described above require that children can process a story presented verbally as well as respond to explicit questioning. These requirements may exclude or underrepresent the abilities of children who are unwilling or unable to engage in explicit questioning. Thus, researchers have recently developed tasks tapping more implicit forms of evaluation. Rather than asking for responses to specific test questions, some researchers have explored early sensitivity to others' prosocial and antisocial intentions using age-appropriate behavioral tasks. Such studies provide further evidence that young children are sensitive to others' intentions. For example, in one study 3-year-olds were less likely to help an adult who attempted but failed to harm a third-party compared to a neutral adult (Vaish et al., 2010), while in another study 3- and 4-year-olds were more likely to spontaneously correct punishments imposed on others who accidentally rather than intentionally caused the same harmful outcome (Chernyak and Sobel, 2016). In a final example, 5- and 6-year-olds used informants' past intentions and outcomes when determining who to trust when searching for a prize (Liu et al., 2013).

Critically, more implicit forms of evaluation, in which neither the story presentations nor the response measures require verbal abilities, also allow for the study of preverbal children who are less than 3 years of age. To illustrate, in one study 5- and 9-month-old infants watched a live-action puppet show featuring a protagonist puppet who repeatedly tried but failed to open a box containing an attractive toy. In alternation, a helper puppet assisted the protagonist in opening the box so that he could access the toy, and a hinderer puppet slammed the box shut, preventing the protagonist from achieving his goal. When subsequently presented with a choice between the helper and hinderer, both 5- and 9-month-olds preferentially reached for the helper rather than the hinderer puppet, suggesting that infants differentially evaluated prosocial versus antisocial others (Hamlin and Wynn, 2011; for replications and related findings see Hamlin et al., 2007, 2010; Buon et al., 2014; Hamlin, 2015; Scola et al., 2015; Steckler et al., 2017; for failure to replicate see Salvadori et al., 2015; see also Scarf et al., 2012).

These implicit paradigms have also been utilized to explore infants' sensitivity to third-party scenarios in which intentions and outcomes conflict. In one such task, 5- and 8-month-olds watched puppet shows in which successful and unsuccessful helpers and hinderers intervened following a protagonist's repeated failure to open a box (Hamlin, 2013). The successful helper and hinderer achieved their respective goals to either assist or thwart the protagonist's goal (as in Hamlin and Wynn, 2011). Conversely, the failed helper and hinderer brought about an outcome that conflicted with their intention: the failed helper tried but failed to open the box, while the failed hinderer tried but failed to prevent the protagonist from opening the box. When presented with different combinations of successful and failed helpers and hinderers, 8-month-olds preferentially reached for puppets with helpful intentions, regardless of the outcome that occurred (i.e., successful helpers over failed hinderers, failed helpers over successful hinderers, and failed helpers over failed hinderers). In contrast, when presented with two puppets who had demonstrated the same intention (i.e., successful helper and failed helper, failed hinderer and successful hinderer), 8-month-olds showed no preference for either puppet, suggesting that they did not evaluate characters based on the outcomes they were associated with. Unlike 8-month-olds, 5-month-olds preferentially reached for successful helpers over successful hinderers, but showed no preferences when presented with any failed puppet (Hamlin, 2013). Thus, infants' sociomoral evaluations appear to privilege intentions over outcomes by 8 months of age, but not at 5 months (for related evidence with accidental help and harm see Hamlin et al., 2013; Woo et al., 2017).

In another task measuring infants' expectations about characters involved in failed attempts to help and harm, 12- and 16-month-olds watched a video featuring a protagonist unsuccessfully attempting to climb a steep hill. Two characters alternately intervened: a successful hinderer who pushed the protagonist down the hill, and either a successful helper or unsuccessful helper (Lee et al., 2015). Subsequently, looking times suggested that 16-month-olds expected the protagonist to approach the character who had intended to help, even if he failed to do so and the protagonist's outcome was negative. In contrast, 12-month-olds expected the protagonist to approach the successful helper rather than the hinderer, but only to approach the failed helper over the hinderer when outcome information was removed from the video (Lee et al., 2015). Together, these studies demonstrate that although a salient outcome may disrupt this sensitivity, infants are sensitive to others' intentions to help or hinder – even when intentions and outcomes conflict. Indeed, unlike much work with young children (e.g., Armsby, 1971; Costanzo et al., 1973; Moran and O'Brien, 1983; Yuill, 1984; Zelazo et al., 1996; Helwig et al., 2001; Baird and Astington, 2004; Killen et al., 2011; Margoni and Surian, 2017), to our knowledge no infant studies to date have provided evidence that infants' third-party social evaluations and expectations either rely solely on outcome or initially privilege outcome over intention.

What accounts for this apparent developmental discontinuity, whereby infants seem to privilege intentions but young children privilege outcomes? We reasoned that one possibility is that presentation of the social interactions via live puppet shows or videos, rather than via illustrated vignettes, might facilitate understanding, in that a fully acted-out scenario provides richer and more complete information than does a narrated short vignette (see Chandler et al., 1973; Farnill, 1974 for evidence that children are sensitive to the intentions of characters in videotaped scenes from age 6). If so, then presenting preschoolers with live puppet shows may facilitate relatively more mature moral reasoning – that is, positive evaluations of those with positive intentions and negative evaluations of those with negative intentions, irrespective of the eventual outcomes characters bring about.

The current studies explore whether young preschoolers' social and moral judgments privilege intentions, even when the agents' intentions conflict with the outcome of their actions. Scenarios were enacted via a live puppet show and based on shows previously utilized to explore infants' sociomoral evaluations of characters with varying intentions who are associated with varied outcomes (Hamlin, 2013). Children viewed events in which a protagonist unsuccessfully attempted to open a box to reach a toy inside (as in Hamlin and Wynn, 2011). Two additional puppets intervened: helpers demonstrated a positive intention to assist the protagonist, while hinderers demonstrated a negative intention to prevent the protagonist from achieving his goal. The helper and hinderer puppets were either successful in bringing about their objective, or failed to assist or thwart the protagonist's goal. Thus, across studies, the protagonist interacted with four puppets: (1) successful helpers who try to help the protagonist achieve his goal, resulting in a positive outcome for the protagonist, (2) successful hinderers who try to block the protagonist's goal, resulting in a negative outcome for the protagonist, (3) failed helpers who unsuccessfully try to help the protagonist achieve his goal, resulting in a negative outcome for the protagonist, and (4) failed hinderers who unsuccessfully try to block the protagonist's goal, resulting in a positive outcome for the protagonist.

Each child was presented with two distinct events (e.g., failed helping and successful hindering), and then was asked three test questions: (1) which of the two puppets they "like," (2) which was "nicer," and (3) which "should get in trouble." After children identified who should get in trouble, they were asked to explain this judgment. While these forced-choice questions do not allow conclusions regarding whether children (for example) think either puppet is "nice" (rather than "nicer"), these questions have been used to examine 3- to 5-year-olds' social and moral judgments following helping and hindering puppet shows in which intentions and outcomes were not in conflict (Van de Vondervoort and Hamlin, 2017), are consistent with the forced-choice nature of infants' evaluations in past work, and are consistent with questions previously used to explore young children's explicit moral judgments (e.g., Costanzo et al., 1973; Zelazo et al., 1996; Baird and Astington, 2004; Cushman et al., 2013). Following the first round of liking/niceness/punishment test questions, children answered comprehension questions regarding the puppets' actions and the outcome of each event and then answered the same test questions again. Comprehension questions ensured that children attended to both the failed/successful helper/hinderer's actions and the outcome for the protagonist.

Experiment 1 explored whether 3- and 4-year-olds utilize actors' mental states to inform their social and moral judgments when outcomes are equivalent. Children observed a live puppet show featuring a protagonist who failed to achieve his goal to open a box. In the "positive outcome" condition, a successful helper and failed hinderer intervened, resulting in a positive outcome for the protagonist (i.e., the box was opened and the toy was reached). In the "negative outcome" condition, a failed helper and successful hinderer intervened, always resulting in a negative outcome for the protagonist (i.e., the box was closed and the toy was not reached). Experiment 2 then examined whether 3- and 4-year-olds' judgments privilege actors' mental states or the outcome of their actions when these intentions and outcomes conflict. Children observed live puppet show events featuring the same protagonist; a failed helper intervened to bring about a negative outcome for the protagonist, while a failed hinderer was associated with a positive outcome. Finally, Experiment 3 investigated whether 3- and 4-year-olds' judgments were sensitive to outcomes when actors' mental states were equivalent. In one condition a successful helper and a failed helper intervened in the protagonist's struggle, while in the second condition a successful hinderer and a failed hinderer intervened; critically, both puppets in each condition had the same intention but brought about opposite outcomes.

Based on work showing that young children are sensitive to intentions when outcomes are equivalent across scenarios in vignette tasks [e.g., Cushman et al. (2013) found that children evaluated a character who attempted but failed to cause harm more negatively than a character who successfully brought about an intended positive outcome by age 4; see also Chernyak and Sobel, 2016] and that infants privilege agents' intentions following similar scenarios (Hamlin, 2013), we predicted that 3- and 4-year-olds in Experiment 1 would report liking the character with the positive intention, judge the character with the positive intention as nicer, and allocate punishment to the character with the negative intention, even though the characters were not distinguishable based on the valence of their associated outcomes. Further, based on work showing that 3-year-olds can have difficulty producing interpretable responses to open-ended questions (e.g., Kenward and Dahl, 2011; Van de Vondervoort and Hamlin, 2017), we predicted that in this and all further experiments, 4-year-olds would provide more informative verbal justifications than 3-year-olds. We also predicted that 4-year-olds would be more likely than 3-year-olds to reference sociomoral considerations as the reason for their punishment allocations, including references to the characters' successful or unsuccessful attempts to block the protagonist's goal. We did not predict that child's gender would influence responding, but did explore whether females and males responded similarly in this and all further experiments, as this is common in developmental work (e.g., Helwig et al., 2001; Nobes et al., 2009).

# EXPERIMENT 1

# Method

#### Participants

Children in all experiments were recruited through hospitals and preschools in Vancouver, British Columbia and tested in a university research center or the child's preschool. This and all other experiments were approved by the University of British Columbia's Behavioral Research Ethics Board. Twenty-four 3-year-olds (Mage = 3;6, range = 3;2–3;11, 13 girls) and 24 4-year-olds (Mage = 4;6, range = 4;0–4;11, 16 girls) participated in the positive outcome condition, while 26 3-year-olds (Mage = 3;6, range = 3;0–3;11, 15 girls) and 24 4-year-olds (Mage = 4;4, range = 4;0–4;10, 12 girls) participated in the negative outcome condition. Before data collection began we established a pre-set stopping rule of 24 children per age per condition; two extra 3-year-olds were run due to scheduling issues. An additional 26 3-year-olds were tested but replaced due to failure to complete an English language warm-up (2), procedure error (1), unwillingness to participate (1), and a color and/or side preference that resulted in pointing to the same puppet across all test questions in one or both rounds (22). An additional eight 4-year-olds were tested but replaced due to color/side preferences. The decision to remove children who displayed a color/side preference in one or both rounds of test questions was pre-set following a pilot study, as children who judged that the same puppet is "liked," "nicer" and "should get in trouble" appeared unmotivated and/or did not appear to understand the test questions. The **Supplementary Materials** provide key analyses including children with color/side preferences; results are essentially identical in all experiments and do not influence the interpretations reported here. While demographic information was not formally collected, most participants in all experiments came from middle-class families representative of the racial and ethnic demographics of Vancouver, British Columbia.

### Procedure

#### **Warm-up**

Children were shown a picture of a playground and asked to find the swing and slide, and to name the color of a toy and their favorite outside activity. Before data collection began it was decided that children would be replaced in the sample if they were unable/unwilling to locate the swing or slide via pointing; verbal responses were not required.

#### **Puppet show**

Children participated in either the positive outcome condition or the negative outcome condition. All children watched a live puppet show featuring a protagonist struggling to achieve his goal to open a box and reach an attractive toy; a second and third puppet then intervened (successful helper and failed hinderer or failed helper and successful hinderer; see **Figure 1**). Puppet events were based on previous infant studies (Hamlin, 2013; see also Hamlin and Wynn, 2011), with two notable differences (as in Van de Vondervoort and Hamlin, 2017). First, for infants, the puppet events were enacted at the end of a long table and a curtain was lowered between events to hide the puppets; experimenters were hidden behind a curtain at the back of the table. Events in the current experiments were enacted on the floor or a table directly in front of the child and with the experimenter visible. Second, a few non-valenced words were added to the events for narration. All narrations were produced in a high-pitched, positive voice to indicate that the puppet was speaking rather than the experimenter; speech was not modulated based on the valence of the puppets' intention or the eventual outcome.

Children watched four puppet events: two successful helper and two failed hinderer events in the positive outcome condition, or two failed helper events and two successful hinderer events in the negative outcome condition. At the start of each event, the successful/failed helper/hinderer puppets were seated on either side and back from a clear box containing a purple whale toy. The experimenter enacted the protagonist walking up to the box, looking through the side of the box while saying "Look, a toy!", and unsuccessfully attempting to open the box five times. During the third to fourth attempt, the protagonist said, "Too heavy!". On the fifth attempt, the successful/failed helper/hinderer intervened:

Successful helper. In successful helper events, the puppet ran forward, joined the protagonist's struggle, and aided in opening the box while saying, "Open!" The puppet then ran away and the protagonist laid facedown, grasping the toy inside the box while saying "Toy!"

FIGURE 1 | Visual depiction of the puppet show events. Written informed consent has been obtained from the depicted individual for the publication of these images.

Failed helper. In failed helper events, the puppet ran forward and joined the protagonist's attempts to open the box three more times; during the first attempt the puppet said "Open!" The puppet then ran away and the protagonist laid facedown beside the box while saying "No toy!"

Successful hinderer. In successful hinderer events, the puppet ran forward and jumped on the box, slamming it closed while saying "Close!" The puppet then ran away and the protagonist laid facedown beside the box while saying "No toy!"

Failed hinderer. In failed hinderer events, the puppet ran forward and jumped on the box, slamming it closed while saying "Close!" The protagonist then struggled to open the box while the puppet jumped on the box twice more<sup>1</sup> before running away. After another struggle, the protagonist successfully opened the box and laid facedown, grasping the toy while saying "Toy!"

The narration during each event was designed to highlight the intervening puppets' intention and the eventual outcome. Children were shown each event twice in a row, for a total of four events. Three puppets were used: a duck (protagonist) and two rabbits wearing a red and a green shirt (failed/successful helper/hinderer, identity counterbalanced). Additional counterbalanced variables were event order (red first, green first) and side of the puppets (red right, red left). For the question period puppets remained on the same side as during the show.
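The counterbalancing described above can be sketched in code. This is a minimal illustration only: the variable names are hypothetical, and the round-robin assignment in `assign_cell` is an assumption, since the text does not say how children were allocated to counterbalancing cells.

```python
from itertools import product

# Hypothetical labels for the three counterbalanced variables in the text.
PUPPET_IDENTITY = ("red", "green")           # shirt color of the more helpful rabbit
EVENT_ORDER = ("red first", "green first")   # which rabbit's event is shown first
PUPPET_SIDE = ("red right", "red left")      # where the red-shirted rabbit sits


def counterbalancing_cells():
    """Return every combination of identity, event order, and side."""
    return [
        {"identity": identity, "event_order": order, "side": side}
        for identity, order, side in product(PUPPET_IDENTITY, EVENT_ORDER, PUPPET_SIDE)
    ]


def assign_cell(participant_index, cells):
    """Assumed scheme: cycle through the cells so each is used about equally."""
    return cells[participant_index % len(cells)]


cells = counterbalancing_cells()
first_child = assign_cell(0, cells)
```

Crossing three two-level variables yields eight cells, so a group of 24 children would fill each cell three times under this assumed rotation.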

#### **Test questions**

Following the puppet events, children were presented with the successful/failed helper/hinderer puppets and asked (in counterbalanced order) which puppet they preferred (i.e., "Which one of these guys do you like the most?") and which puppet was nicer (i.e., "Which one of these guys was nicer?")<sup>2</sup>. To reduce response perseveration, children were asked to point to each puppet in between the liking and niceness questions (e.g., "Point to the guy with a red/green shirt. Right!"). Children were then asked which puppet deserved punishment and to explain this choice (i.e., "I think that one of these guys should get in trouble. Who should get in trouble? Why should he get in trouble?"). Children were prompted if they did not explain their punishment allocation (e.g., "What do you think?"). Children then answered comprehension questions and were asked the same test questions again. For each test question, children received a score of 1 if they responded in the direction of the hypothesis and 0 if not, resulting in a total of six scores (three test questions, two rounds of questioning) between 0 and 1 per child. One 4-year-old in the positive outcome condition responded that both puppets were liked in round one and one 3-year-old in the negative outcome condition responded, "I don't know" when asked which puppet should be punished in round two; these responses were scored as against hypothesis.

<sup>1</sup> In the failed hinderer events in Hamlin (2013) the puppet jumped on the box once; here we equated the number of failed attempts (three) across the failed helper and failed hinderer events. See Experiment 2 in the current paper for children's judgments of the failed hinderer following one versus three attempts to close the box.

<sup>2</sup> After children identified the nicer puppet, they were asked whether that puppet was a "little bit nice or a lot nice" (order counterbalanced). We initially planned to examine niceness judgments on a 3-point scale from "not nice" to "a lot nice," but because we did not train children on this scale prior to testing and because most children at each age responded that the selected puppet was "a lot" nice regardless of which puppet they indicated was nicer, this question is not considered further.
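The scoring rule for the test questions can be illustrated with a short sketch. The function and variable names below are hypothetical, and the example child is invented for illustration. The rule itself follows the text: a with-hypothesis response scores 1, and any other response (including "both" or "I don't know") scores 0.

```python
QUESTIONS = ("like", "nicer", "trouble")


def score_response(question, choice, positive_puppet, negative_puppet):
    """Return 1 for a with-hypothesis response, otherwise 0.

    With-hypothesis means choosing the puppet with the more positive
    intention for "like" and "nicer", and the puppet with the more negative
    intention for "trouble". Answers such as "both" or "I don't know" match
    neither puppet and therefore score 0.
    """
    expected = negative_puppet if question == "trouble" else positive_puppet
    return 1 if choice == expected else 0


def score_child(rounds, positive_puppet, negative_puppet):
    """Return the six 0/1 scores as one dict per round of questioning."""
    return [
        {q: score_response(q, answers[q], positive_puppet, negative_puppet)
         for q in QUESTIONS}
        for answers in rounds
    ]


# Invented example: the red-shirted rabbit had the positive intention.
rounds = [
    {"like": "both", "nicer": "red", "trouble": "green"},   # round one
    {"like": "red", "nicer": "red", "trouble": "green"},    # round two
]
scores = score_child(rounds, positive_puppet="red", negative_puppet="green")
```

Keeping the rounds separate preserves the six individual scores, which can later be compared across rounds or summed per question.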

#### **Comprehension questions**

Following the first round of test questions, children were shown each event type and asked one comprehension question about the intervening puppet's action (e.g., for successful puppets, "Did he open the box or close the box?" and for failed puppets, "Did he try to open the box or try to close the box?") and one comprehension question about the outcome for the protagonist (i.e., "Did the duck get the toy?"). If answered incorrectly, children were shown the event again and the comprehension question was repeated (e.g., "I don't think he opened the box. Let me show you that one again"). If a comprehension question was answered incorrectly twice, children were corrected (e.g., "He opened the box. This bunny opened the box."). Across experiments, 73% of children answered all four comprehension questions correctly the first time; only 3% of children required corrections before they answered test questions.

### Transcription and Coding

When permitted by caregivers and possible within the preschool, participation was audio- and video-recorded. A research assistant transcribed children's verbal explanations from these recordings. When recording was not permitted (53 of 295 children across experiments), explanations were transcribed during the study by the experimenter. Two additional research assistants who were not involved in data collection or transcription coded children's explanations according to the following categories:

#### **Uninformative responses**

Uninformative responses included those in which children provided no verbal response, unintelligible responses, or verbal responses that did not include a justification for the punishment allocation. These verbal responses included statements unrelated to the puppet events (e.g., "there's a big storm"), statements without a justification (e.g., "because in trouble"), and statements that the child was unsure (e.g., "I don't know").

#### **Informative responses**

Informative responses were related to the shows and included:

Protagonist's goal. References to the protagonist's goal to open the box and/or reach the toy inside the box (e.g., "the duck [protagonist] was trying to open it").

Relevant action. References to the puppet's attempted or completed helping or hindering action (e.g., "he was trying to close the box," "he closed the box," "because she didn't open it").

Irrelevant action. References to positively or negatively valenced actions not from the shows (e.g., "because he punched this one"). While inaccurate, these responses may reflect children's beliefs about actions that typically lead to punishment.

Relevant skill valence. References to the positive or negative nature of the puppet's ability to open or close the box (e.g., "not strong," "he wasn't doing it very good")<sup>3</sup>.

Relevant general valence. References to the positive or negative valence of the puppet or its actions that were related to the shows, but not related to the puppet's ability to open or close the box (e.g., "he [selected puppet] was mean," "this one [unselected puppet] was nicer").

Irrelevant valence. References to the positive or negative valence of the puppet or its actions that were not directly related to the shows (e.g., "he's mad").

Non-social considerations. Responses that did not include sociomoral content, such as physical descriptions of the puppet (e.g., "he's soft"), general disliking of the puppet (e.g., "I like the green shirt one"), descriptions of neutral acts (e.g., "he's playing"), and ambiguous statements (e.g., "he does a lot of things").

Each explanation was coded by two independent research assistants for the presence or absence of each response type; coders were blind to the referent (failed/successful helper/hinderer) of the explanation. Informative response types were not mutually exclusive. To avoid over-representing talkative children, whose explanations may have contained several types of informative responses, instances of each explanation type were represented as proportions and averaged across the two rounds. Reliability across the eight categories was strong (average Cohen's kappa = 0.812; see McHugh, 2012). Disagreements were resolved by discussion among the two coders and the first author.
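For readers unfamiliar with the reliability statistic, the following is a minimal sketch of Cohen's kappa for two coders' presence/absence codes on a single category. The codes shown are invented for illustration and are not the study's data; the function name is hypothetical.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same set of items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal rate, per label.
    labels = set(coder_a) | set(coder_b)
    expected = sum(
        (coder_a.count(label) / n) * (coder_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented presence (1) / absence (0) codes for one response category.
coder_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
coder_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(coder_a, coder_b)
```

Computing kappa separately for each category and then averaging would give a summary figure of the kind reported above, although the exact averaging procedure used is not described in the text.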

# Results

#### Test Questions

To explore whether responses differed before and after comprehension questions, we conducted a series of mixed-effect ANOVAs with round one scores and round two scores as within-subjects variables, and age (3, 4) and gender (female, male) as between-subjects factors. When compared to a Bonferroni-corrected alpha value of 0.017 (0.05/3), there were no main effects of round or interactions involving round of questioning within the positive outcome condition (all Fs < 6.039, ps > 0.017, $\eta_p^2$s < 0.122) or the negative outcome condition (all Fs < 3.437, ps > 0.069, $\eta_p^2$s < 0.069). Thus, children's scores were summed across rounds resulting in three scores between 0 and 2 per child (liking, niceness, trouble scores). See the **Supplementary Materials** for scores in each round for all experiments; in all experiments, results from the first round are similar to those reported here and do not influence the interpretations presented in the main text. The dataset generated and analyzed for these experiments can be found on the Open Science Framework<sup>4</sup>.

<sup>3</sup> A need for this category had not been identified when the positive intention condition of Experiment 3 was initially coded. The category was added to the coding scheme once it was clear that some children were using skill explanations and the data was entirely recoded without discussing any statements within that condition. Due to the order in which the data was coded, this category was not included in the coding scheme for Experiment 2A; inspection of the transcriptions by the first author revealed it was not necessary to recode as no children used skill explanations. The full coding scheme including skill explanations was utilized for all other experiments and conditions.

#### **Confirmatory analyses**

To determine at what age(s) liking, niceness, and trouble scores differed from chance, a series of one-sample t-tests compared scores at each age to a chance score of 1. Three-year-olds in the positive outcome condition did not distinguish between the puppets when reporting who they liked (p = 0.137), while 4-year-olds liked the successful helper (p = 0.015). Both ages judged the successful helper to be nicer (ps < 0.001) and allocated punishment to the failed hinderer (ps < 0.001). In the negative outcome condition, both ages liked the failed helper (3-year-olds: p = 0.047; 4-year-olds: p = 0.005), judged the failed helper as nicer (ps < 0.001), and allocated punishment to the successful hinderer (ps < 0.001; see **Figure 2** and **Table 1** for descriptive and test statistics).

#### **Exploratory analyses**

To examine whether age, gender, and/or question type influenced children's tendency to respond in the direction of the hypothesis, we conducted two mixed-effect ANOVAs with question type (liking, niceness, and trouble scores) as a within-subjects variable (repeated-measure), and age (3, 4) and gender (female, male) as between-subjects factors. In the positive outcome condition, there was only a main effect of question type (F[1.463,64.364] = 14.794, p < 0.001, $\eta_p^2$ = 0.252; all other Fs < 1.542, ps > 0.220, $\eta_p^2$s < 0.035). To explore the main effect of question type, a series of paired-samples t-tests using the Bonferroni-corrected alpha value of 0.017 (0.05/3) was used to compare scores on each question type across age. In the positive outcome condition, children were less likely to respond in the direction of the hypothesis when asked which puppet they liked (M = 1.333, SE = 0.113) compared to which puppet was nicer (M = 1.854, SE = 0.059; t[47] = 4.518, p < 0.001, d = 0.652) and which puppet should get in trouble (M = 1.792, SE = 0.079; t[47] = 4.276, p < 0.001, d = 0.617); there was no difference between niceness and trouble scores (t[47] = 1.000, p = 0.322, d = 0.144).
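As a worked check on the paired comparisons above, the reported effect sizes are consistent with computing Cohen's d for paired samples as d = t/√n. That formula is an assumption here, since the text does not state how d was obtained. The sketch below recomputes d from the reported t values with n = 48 (df = 47) and also shows the Bonferroni arithmetic; the function name is hypothetical.

```python
import math


def cohens_d_from_t(t, n):
    """Paired-samples effect size under the assumed formula d = t / sqrt(n)."""
    return t / math.sqrt(n)


ALPHA = 0.05
N_COMPARISONS = 3
bonferroni_alpha = ALPHA / N_COMPARISONS  # about 0.0167, reported as 0.017

# (t, reported d) pairs from the positive outcome condition, n = 48.
reported = [(4.518, 0.652), (4.276, 0.617), (1.000, 0.144)]
recomputed = [round(cohens_d_from_t(t, 48), 3) for t, _ in reported]
matches = [d == reported_d for d, (_, reported_d) in zip(recomputed, reported)]
```

All three recomputed values match the reported ones to three decimal places, which supports (but does not prove) the assumed formula.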

In the negative outcome condition, there was a main effect of question type (F[1.293,59.466] = 6.363, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.122) and an interaction between age and gender (F[1,46] = 4.483, p = 0.040, η<sub>p</sub><sup>2</sup> = 0.089; all other Fs < 0.846, ps > 0.362, η<sub>p</sub><sup>2</sup>s < 0.019). To explore the main effect of question type, a series of paired-samples t-tests using the Bonferroni corrected alpha value of 0.017 (0.05/3) was used to compare scores on each question type across age. Children were again less likely to respond in the direction of the hypothesis when asked which puppet they liked (M = 1.420, SE = 0.115) compared to which puppet was nicer (M = 1.740, SE = 0.080; t[49] = 2.947, p = 0.005, d = 0.417) and which puppet should get in trouble (M = 1.660, SE = 0.093; t[49] = 2.585, p = 0.013, d = 0.366); again there was no difference between niceness and trouble scores (t[49] = 1.661, p = 0.103, d = 0.235). To explore the interaction between age and gender, two independent-samples t-tests using the Bonferroni corrected alpha value of 0.025 (0.05/2) were used to compare overall scores (summing liking, niceness, and trouble scores, resulting in a score between 0 and 6 for each child) across the three question types. Among 3-year-olds, males were more likely to respond in the direction of the hypothesis (M = 5.546, SE = 0.207) than were females (M = 4.067, SE = 0.530; t[18.024] = 2.600, p = 0.018, d = 0.945); there was no difference between 4-year-old males' (M = 4.667, SE = 0.620) and females' scores (M = 5.250, SE = 0.392; t[22] = 0.796, p = 0.435, d = 0.339).

<sup>4</sup>https://osf.io/mgzq7/?view\_only=903f3a74292940ee92312a2edb3aa7be

Mean scores range between 0 and 2 with higher values indicating higher rates of with-hypothesis responding across two rounds of questioning.

#### Punishment Explanations

Children at both ages most frequently appealed to relevant (un)successful helping or hindering actions when explaining their punishment allocations: 44% of 3-year-olds and 65% of 4-year-olds in the positive outcome condition, and 52% of 3-year-olds and 59% of 4-year-olds in the negative outcome condition did so (see **Table 2**). In their statements, nearly all children referenced the puppet's attempted or completed hindering action (e.g., "he was closing it," "he closed the lid"), although one 4-year-old in the negative outcome condition referenced a helping action (i.e., "he's trying to open that" in round one, and "because he was opening the box" in round two when explaining why the failed helper should be punished).

To test whether younger children provide less interpretable explanations, two factorial ANOVAs examined the effect of age (3, 4) and gender (female, male) on the proportion of uninformative responses across rounds. While the proportion of uninformative responses was greater among 3-year-olds compared to 4-year-olds in the positive outcome condition (F[1,44] = 4.117, p = 0.049, η<sub>p</sub><sup>2</sup> = 0.086; all other Fs < 2.694, ps > 0.107, η<sub>p</sub><sup>2</sup>s < 0.059), there was no difference in the proportion of uninformative responses across age in the negative outcome condition (F[1,46] = 1.665, p = 0.203, η<sub>p</sub><sup>2</sup> = 0.035; all other Fs < 2.055, ps > 0.158, η<sub>p</sub><sup>2</sup>s < 0.044).
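Partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies this identity to the two age effects reported above; the function name is chosen for this example only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Age effects on the proportion of uninformative responses, as reported above.
positive_outcome = round(partial_eta_squared(4.117, 1, 44), 3)
negative_outcome = round(partial_eta_squared(1.665, 1, 46), 3)
print(positive_outcome, negative_outcome)
```

Both recomputed values agree with the reported effect sizes (0.086 and 0.035).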

Finally, to test whether 4-year-olds would be more likely than 3-year-olds to reference relevant sociomoral content when explaining their punishment allocations, we combined appeals to the protagonist's goal, relevant actions, relevant general valence, and relevant skill valence into a single "relevant responses" category. A factorial ANOVA examining the effect of age (3, 4) and gender (female, male) on the proportion of relevant responses revealed that both ages provided equally relevant responses in both the positive outcome condition (F[1,44] = 2.867, p = 0.097, η<sub>p</sub><sup>2</sup> = 0.061; all other Fs < 0.187, ps > 0.667, η<sub>p</sub><sup>2</sup>s < 0.005) and the negative outcome condition (F[1,46] = 0.180, p = 0.674, η<sub>p</sub><sup>2</sup> = 0.004; all other Fs < 0.551, ps > 0.461, η<sub>p</sub><sup>2</sup>s < 0.013).

Overall, Experiment 1 demonstrated that preschoolers distinguish between characters with opposing intentions when outcomes are uninformative. When presented with a successful helper and failed hinderer who both brought about a positive outcome, 3-year-olds showed no preference for either character while 4-year-olds preferred the successful helper. Both 3- and 4-year-olds judged the successful helper to be nicer and allocated punishment to the failed hinderer. When presented with a failed helper and successful hinderer who both brought about a negative outcome, both 3- and 4-year-olds preferred the failed helper, judged the failed helper as nicer, and allocated punishment to the successful hinderer. Across conditions, children at both ages were more likely to respond in the direction of the hypothesis with respect to the moral questions (niceness/punishment) than the social questions (liking), and most often referenced the character's attempted or completed hindering action when explaining which character should get in trouble.

Results from Experiment 1 are consistent with past work in which young children demonstrate sensitivity to others' intentions when intentions do not conflict with the outcomes brought about (Buchanan and Thompson, 1973; Costanzo et al., 1973; Farnill, 1974; Nelson, 1980). Experiment 2 sought to determine whether children still privilege intentions when they do conflict with outcomes. Children observed a puppet show featuring a protagonist who unsuccessfully attempted to open a box. A failed helper and failed hinderer intervened; both characters brought about outcomes that conflicted with their intention. Based on past work showing that older preschoolers can incorporate intention information into their vignette-based judgments [e.g., Cushman et al. (2013) found that children evaluate accidental harm more positively than attempted harm by age 5], younger preschoolers' sensitivity to intentions following puppet show events in Experiment 1, and past work showing that infants privilege agents' intentions following these puppet events (Hamlin, 2013), we predicted that both 3- and 4-year-olds would report liking the character with the positive intention, judge the character with the positive intention as nicer, and allocate punishment to the character with the negative intention.

# EXPERIMENT 2A

# Method

#### Participants

Twenty-four 3-year-olds (Mage = 3;5, range = 3;0–3;11, 13 girls) and 24 4-year-olds (Mage = 4;6, range = 4;0–4;11, 9 girls) were tested in a university research center or the child's preschool. An additional 15 3-year-olds were replaced due to unwillingness to participate (1), caregiver interference (1), failure to accept correction during comprehension questions (2), and color/side preferences (11). An additional three 4-year-olds were replaced due to unwillingness to participate (1) and color/side preferences (2).

#### Procedure

The warm-up task, test questions, comprehension questions (i.e., "Did he try to open the box or try to close the box? Did the duck get the toy?"), transcriptions and coding procedures were identical to Experiment 1.

#### **Puppet show**

Children watched a live puppet show featuring a protagonist struggling to open a box; a second and third puppet intervened (failed helper in two events, failed hinderer in two events). All details were identical to those in Experiment 1, except for the actions of the failed hinderer:

Failed hinderer. In failed hinderer events, the puppet ran forward and jumped on the box, slamming it closed while saying "Close!" The puppet then ran away. After another struggle, the protagonist successfully opened the box and lay facedown, grasping the toy while saying "Toy!" Note that unlike in the positive outcome condition of Experiment 1, the present failed hinderer puppet jumped on the box once rather than three times; this mirrors the failed hinderer events shown to infants in Hamlin (2013).

# Results

#### Test Questions

A series of mixed-effect ANOVAs explored whether responses differed before and after comprehension questions; this revealed no main effects of round or interactions involving round of questioning on liking, niceness, or trouble scores (all Fs < 5.686, ps > 0.020, η<sub>p</sub><sup>2</sup>s < 0.115). Children's scores were summed across rounds resulting in three scores between 0 and 2 per child (liking, niceness, trouble).
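The scoring scheme described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' analysis script: the child record is hypothetical, and the names `question_score`, `scores`, and `overall` are assumptions made for this example. Each response is coded `True` when it falls in the direction of the hypothesis.

```python
def question_score(round_one, round_two):
    """Number of rounds (0-2) in which the child responded with the hypothesis."""
    return int(round_one) + int(round_two)


# Hypothetical child: (round one, round two) responses for each question.
child = {
    "liking": (True, False),
    "niceness": (True, True),
    "trouble": (True, True),
}

scores = {question: question_score(*rounds) for question, rounds in child.items()}
overall = sum(scores.values())   # overall score across question types (0-6)
print(scores, overall)
```

Under this coding, a score of 1 on any question corresponds to chance-level responding, which is the value the one-sample t-tests compare against.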

#### **Confirmatory analyses**

A series of one-sample t-tests comparing liking, niceness, and trouble scores at each age to a chance score of one revealed that younger children did not distinguish between the puppets: 3-year-olds' liking (p = 0.357), niceness (p = 0.203), and trouble (p = 0.417) scores did not differ from chance. In contrast, 4-year-olds liked the failed helper (p = 0.002), judged the failed helper as nicer (p < 0.001), and allocated punishment to the failed hinderer (p < 0.001; see **Figure 3** and **Table 1**).

#### **Exploratory analyses**

A mixed-effect ANOVA was used to examine whether age, gender, and/or question type influenced children's tendency to respond in the direction of the hypothesis. This revealed only a main effect of age (F[1,44] = 7.214, p = 0.010, η<sub>p</sub><sup>2</sup> = 0.141; all other Fs < 2.449, ps > 0.114, η<sub>p</sub><sup>2</sup>s < 0.054), such that 4-year-olds' overall scores across the three question types (M = 5.125, SE = 0.363) were higher than 3-year-olds' (M = 3.500, SE = 0.421).

#### Punishment Explanations

The most frequent responses among 3-year-olds were uninformative (40%), while their most frequent informative responses were appeals to relevant (attempted) helping or hindering actions (29%); nearly all these appeals referenced a hindering action (e.g., "this one tried to close the box," "because he closed it"), although one 3-year-old referenced a helping action in the second round (i.e., "because he's trying to open" when explaining why the failed helper should be punished). The most frequent responses among 4-year-olds were appeals to relevant (attempted) hindering actions (42%; see **Table 2**). A factorial ANOVA examined the effect of age and gender on the proportion of uninformative responses across rounds. While there was no main effect of age or gender (all Fs < 2.396, ps > 0.128, η<sub>p</sub><sup>2</sup>s < 0.053), there was an interaction between these factors (F[1,44] = 6.649, p = 0.013, η<sub>p</sub><sup>2</sup> = 0.131), such that 3-year-old males (M = 0.546, SE = 0.142) provided more uninformative explanations than 4-year-old males [M = 0.067, SE = 0.067; t(14.377) = 3.047, p = 0.008, d = 1.373], while female 3-year-olds (M = 0.269, SE = 0.108) and 4-year-olds (M = 0.389, SE = 0.162) did not differ in their proportion of uninformative responses (t[20] = 0.642, p = 0.528, d = 0.292). Finally, a factorial ANOVA examining the effect of age and gender on the proportion of relevant sociomoral responses (protagonist's goal, relevant actions, relevant general valence, and relevant skill valence) revealed that 4-year-olds provided more relevant responses than 3-year-olds (F[1,44] = 4.594, p = 0.038, η<sub>p</sub><sup>2</sup> = 0.095; all other Fs < 1.695, ps > 0.199, η<sub>p</sub><sup>2</sup>s < 0.038).
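The fractional degrees of freedom in the comparison of 3- and 4-year-old males suggest an unequal-variance (Welch) t-test, although the text does not name the test. Under that assumption, and treating the reported SEs as standard errors of the group means, the t statistic can be approximated from the summary statistics alone, as in this minimal sketch.

```python
import math


def welch_t(mean_a, se_a, mean_b, se_b):
    """Welch t statistic from two group means and their standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Proportion of uninformative responses: 3-year-old vs. 4-year-old males.
t_males = welch_t(0.546, 0.142, 0.067, 0.067)
print(f"t = {t_males:.2f}")
```

The result is close to the reported t of 3.047; a small discrepancy is expected because the means and standard errors used here are rounded to three decimals.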

Overall, Experiment 2A reveals that 4-year-olds, but not 3-year-olds, privilege intentions when making social and moral judgments. When presented with a failed helper and failed hinderer, 4-year-olds preferred the failed helper, judged the failed helper to be nicer, and allocated punishment to the failed hinderer. Four-year-olds' explanations of their punishment allocations most frequently referenced the puppet's (attempted) hindering action. In contrast, 3-year-olds failed to distinguish between the puppets when asked which puppet was liked, nicer, and should be punished. Given their chance responding to test questions, it is unsurprising that 3-year-olds' explanations regarding punishment allocations were largely uninformative.

Given young children's documented struggle to privilege intentions when intentions and outcomes conflict (e.g., Costanzo et al., 1973; Cushman et al., 2013), one possibility is that 3-year-olds simply do not use intentions to inform their sociomoral judgments when individuals can instead be distinguished by outcomes. Although 3-year-olds did not reliably distinguish characters by either intention or outcome, it is possible that they are in a transitional stage. An alternative possibility is that 3-year-olds can privilege intentions, but that the puppet shows in Experiment 2A did not adequately convey this mental state information to them. Specifically, the failed hinderer demonstrated his intention to close the box only once before the protagonist successfully opened it (in contrast, the failed helper demonstrated its intent three times); this may have made the strength of the failed hinderer's negative intent somewhat ambiguous, rendering the distinction between the characters unclear.

Experiment 2B explored whether children privilege intentions when the failed hinderer's intentions were made more salient. Children observed a failed helper and failed hinderer intervene in the protagonist's struggle to open a box. As in Experiment 2A, the failed helper attempted to open the box three times. Unlike in Experiment 2A, the failed hinderer also demonstrated his negative intention three times, by repeatedly slamming the box closed. We predicted that both 3- and 4-year-olds would report liking the failed helper, judge the failed helper as nicer, and allocate punishment to the failed hinderer.

# EXPERIMENT 2B

# Method

#### Participants

Twenty-four 3-year-olds (Mage = 3;6, range = 3;0–3;11, 11 girls) and 25 4-year-olds (Mage = 4;6, range = 4;0–4;11, 12 girls) were tested in a university research center or the child's preschool. The pre-set sample size was 24 children per age per condition; one extra 4-year-old was run due to scheduling issues. An additional 13 3-year-olds were replaced due to procedure errors (3) and color/side preferences (10). An additional five 4-year-olds were replaced due to procedure error (1) and color/side preferences (4).

#### Procedure

The warm-up task, test questions, comprehension questions (i.e., "Did he try to open the box or try to close the box? Did the duck get the toy?"), transcriptions and coding procedures were identical to previous experiments.

#### **Puppet show task**

Children watched a live puppet show featuring a protagonist struggling to open a box; a second and third puppet intervened (failed helper in two events, failed hinderer in two events). All puppet show details were identical to Experiment 1.

# Results

#### Test Questions

A series of mixed-effect ANOVAs explored whether responses differed before and after comprehension questions; this revealed no main effect of round on niceness or trouble scores, and no interactions involving round for liking, niceness, or trouble scores (Bonferroni-corrected alpha value of 0.017 [0.05/3]; all Fs < 4.511, ps > 0.038, η<sub>p</sub><sup>2</sup>s < 0.092). However, liking scores were higher after comprehension questions (M = 0.694, SE = 0.067) versus beforehand (M = 0.531, SE = 0.072; F[1,45] = 7.420, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.142). Because round of questioning had no effect on liking scores in other experiments and consistently had no effect on niceness or trouble scores, children's scores were summed across rounds resulting in three scores between 0 and 2 per child (liking, niceness, trouble).
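The role of the Bonferroni correction in this decision can be made explicit with a short sketch. It uses only the alpha level and the two p-values printed above; the variable names are assumptions made for this example.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction for `n_tests` comparisons."""
    return alpha / n_tests


alpha_corrected = bonferroni_alpha(0.05, 3)   # three scores: liking, niceness, trouble

p_round_on_liking = 0.009      # reported main effect of round on liking scores
p_smallest_other = 0.038       # reported lower bound on all other ps

decisions = {
    "round_on_liking": p_round_on_liking < alpha_corrected,
    "smallest_other": p_smallest_other < alpha_corrected,
}
print(round(alpha_corrected, 3), decisions)
```

Only the effect of round on liking scores falls below the corrected threshold; a p-value just above 0.038 would pass an uncorrected 0.05 criterion but not the corrected one.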

#### **Confirmatory analyses**

A series of one-sample t-tests comparing liking, niceness, and trouble scores at each age to a chance score of one revealed that children did not prefer either the failed helper or hinderer: 3-year-olds' liking scores (p = 0.479) and 4-year-olds' liking scores (p = 0.073) did not differ from chance. However, both ages judged the failed helper to be nicer (ps < 0.001) and allocated punishment to the failed hinderer (ps < 0.001; see **Figure 3** and **Table 1**).

#### **Exploratory analyses**

A mixed-effect ANOVA examining whether age, gender, and/or question type influenced children's tendency to respond in the direction of the hypothesis revealed a main effect of question type (F[1.208,54.356] = 14.550, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.244; all other Fs < 3.821, ps > 0.056, η<sub>p</sub><sup>2</sup>s < 0.079). To explore this main effect, a series of paired-samples t-tests using the Bonferroni corrected alpha value of 0.017 (0.05/3) was used to compare scores on each question type across age. Children were less likely to respond in the direction of the hypothesis when asked which puppet they liked (M = 1.225, SE = 0.121) compared to which puppet was nicer (M = 1.735, SE = 0.076; t[48] = 3.900, p < 0.001, d = 0.557) and which puppet should get in trouble (M = 1.735, SE = 0.081; t[48] = 4.228, p < 0.001, d = 0.604); there was no difference between niceness and trouble scores (t[48] = 0.000, p = 1.000, d = 0.000).

#### Punishment Explanations

When asked to explain why the selected puppet should get in trouble, responses most frequently included appeals to relevant (attempted) helping or hindering actions: 39% of 3-year-olds and 79% of 4-year-olds (see **Table 2**). While these appeals typically referenced a hindering action (e.g., "because he tried to close the box"), one 4-year-old referenced the failed helper's action in both rounds (i.e., "because he said open" when explaining why the failed helper should be punished). A factorial ANOVA examined the effect of age and gender on the proportion of uninformative responses across rounds and found that the proportion of uninformative responses was greater among 3-year-olds compared to 4-year-olds (F[1,45] = 7.438, p = 0.009, η<sub>p</sub><sup>2</sup> = 0.142; all other Fs < 2.129, ps > 0.151, η<sub>p</sub><sup>2</sup>s < 0.046). Finally, a factorial ANOVA examining the effect of age and gender on the proportion of relevant sociomoral responses (protagonist's goal, relevant actions, relevant general valence, and relevant skill valence) revealed that 4-year-olds provided more relevant responses than 3-year-olds (F[1,45] = 11.502, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.204; all other Fs < 3.518, ps > 0.066, η<sub>p</sub><sup>2</sup>s < 0.073).

Overall, Experiment 2B demonstrates that 3-year-olds can privilege intentions over outcomes in their moral judgments when intentions are clarified (i.e., by having the failed hinderer demonstrate his intention to close the box three times rather than once). While 3-year-olds in Experiments 2A and 2B showed no preference for either the failed helper or failed hinderer, 3-year-olds in Experiment 2B judged the failed helper to be nicer and the failed hinderer to be more deserving of punishment. Four-year-olds in Experiment 2B also showed no preference for the failed helper or failed hinderer (cf. Experiment 2A), but judged the failed helper as nicer and allocated punishment to the failed hinderer. This pattern suggests that, as in previous experiments, both 3- and 4-year-olds' moral judgments (i.e., niceness, trouble) favor the failed helper more robustly than their social judgments (i.e., liking). The most frequent explanations for the allocation of punishment at both ages were references to the failed hinderer's hindering action.

Experiment 3 explored whether children utilize outcomes to make social and moral judgments when characters' intentions are the same. When intentions are completely uninformative, sociomoral judgments may favor individuals associated with positive versus negative outcomes, as these individuals may be associated with positive versus negative outcomes again in the future. Alternatively, judgments may favor individuals who are successful in bringing about their intended outcome, whatever it may be. Indeed, previous work has shown a relationship between judgments of competence and judgments of prosociality in children of this age (Stipek and Daniels, 1990; Brosseau-Liard and Birch, 2010; Landrum et al., 2016; but see Fusaro et al., 2011). That said, in a previous study infants tested with similar conditions did not distinguish characters who differed only on outcome (Hamlin, 2013).

In Experiment 3, children observed a puppet show featuring a protagonist unsuccessfully attempting to open a box. In the "positive intention" condition, a successful helper and failed helper intervened; both characters had a positive intention but the successful character brought about a positive outcome for the protagonist and the failed character brought about a negative outcome. In the "negative intention" condition, a successful hinderer and a failed hinderer intervened; both puppets had a negative intention but the successful character brought about a negative outcome for the protagonist and the failed character was associated with a positive outcome. Given previous work showing children's bias toward outcomes when making sociomoral judgments, we predicted that both 3- and 4-year-olds would prefer characters who caused or were associated with positive outcomes. Thus, in the positive intention condition we predicted that children would prefer the successful helper, judge the successful helper to be nicer, and allocate punishment to the failed helper, and in the negative intention condition we predicted children at both ages would prefer the failed hinderer, judge the failed hinderer as nicer, and allocate punishment to the successful hinderer.

# EXPERIMENT 3

# Method

#### Participants

Twenty-four 3-year-olds (Mage = 3;6, range = 3;0–3;10, 13 girls) and 26 4-year-olds (Mage = 4;5, range = 4;0–4;11, 14 girls) participated in the positive intention condition, while 23 3-year-olds (Mage = 3;6, range = 3;2–3;11, 13 girls) and 27 4-year-olds (Mage = 4;6, range = 4;0–4;11, 14 girls) participated in the negative intention condition. The pre-set sample size was 24 children per age per condition; five additional 4-year-olds were run due to scheduling issues, and one child was initially recruited and tested as a 3-year-old but was later found to have been 2 years old at the time of testing. An additional 32 3-year-olds were tested but replaced due to unwillingness to participate (5), failure to complete an English language warm-up (1), and color/side preferences (26). An additional 17 4-year-olds were tested but replaced due to unwillingness to participate (2), failure to complete an English language warm-up (2), refusal to accept corrections following comprehension questions (1), and color/side preferences (12).

#### Procedure

The warm-up task, test questions, comprehension questions (for successful puppets, "Did he open the box or close the box? Did the duck get the toy?" and for unsuccessful puppets, "Did he try to open the box or try to close the box? Did the duck get the toy?"), transcriptions and coding procedures were identical to previous experiments. One 4-year-old in the positive intention condition indicated that neither puppet should get in trouble in round two, while one 3-year-old in the negative intention condition indicated neither puppet was liked or nicer in both rounds and three 4-year-olds indicated neither puppet was nicer in one or both rounds. These responses were scored as against the hypothesis that children would respond based on outcome.

#### **Puppet show**

Children watched a live puppet show featuring a protagonist struggling to open a box; a second and third puppet intervened (two successful helper events and two failed helper events in the positive intention condition, two successful hinderer events and two failed hinderer events in the negative intention condition). All puppet show details were identical to those in Experiment 1.

# Results

#### Test Questions

A series of mixed-effect ANOVAs explored whether responses differed before and after comprehension questions; this revealed no main effect of round or interactions involving round on liking, niceness, or trouble scores within the positive intention condition (Bonferroni-corrected alpha value of 0.017 [0.05/3]; all Fs < 2.896, ps > 0.095, η<sub>p</sub><sup>2</sup>s < 0.060) or the negative intention condition (all Fs < 5.220, ps > 0.026, η<sub>p</sub><sup>2</sup>s < 0.103). Children's scores were again summed across rounds resulting in three scores between 0 and 2 per child (liking, niceness, trouble).

#### **Confirmatory analyses**

A series of one-sample t-tests comparing liking, niceness, and trouble scores at each age to a chance score of one revealed that 3-year-olds in the positive intention condition did not distinguish between the puppets when reporting who they liked (p = 1.000) or when allocating punishment (p = 0.357), though they did judge the successful helper to be nicer than the failed helper (p = 0.026). In contrast, 4-year-olds in the positive intention condition liked the successful helper (p = 0.005), judged the successful helper to be nicer (p = 0.005), and allocated punishment to the failed helper (p = 0.026). Children did not differentiate between the puppets for any test questions in the negative intention condition: 3- and 4-year-olds' liking (p<sub>3-year-olds</sub> = 0.604; p<sub>4-year-olds</sub> = 0.070), niceness (p<sub>3-year-olds</sub> = 0.575; p<sub>4-year-olds</sub> = 0.814), and trouble scores (p<sub>3-year-olds</sub> = 0.604; p<sub>4-year-olds</sub> = 0.814, d = 0.046) did not differ from chance (see **Figure 4** and **Table 1**).

#### **Exploratory analyses**

Two mixed-effect ANOVAs revealed no effect of age, gender, or question type on children's tendency to respond in the direction of the hypothesis in the positive or negative intention condition (all Fs < 2.967, ps > 0.071, η<sub>p</sub><sup>2</sup>s < 0.062).

#### Punishment Explanations

In the positive intention condition, 3-year-olds' explanations regarding punishment allocation were mostly uninformative (48%). Four-year-olds also provided many uninformative responses (40%), and although neither puppet intended to close the box, 4-year-olds' appeals to relevant actions most often referenced the puppet's failure to help (e.g., "because she didn't open it," "he didn't help the duck open the box"; 42%). In the negative intention condition, 3-year-olds' responses were largely uninformative (37%) or appeals to relevant hindering actions (36%); 4-year-olds most often appealed to relevant hindering actions (34%), but also provided a number of uninformative responses (26%; see **Table 2**). Two factorial ANOVAs found no effect of age or gender on the proportion of uninformative responses across rounds in either condition (all Fs < 1.359, ps > 0.249, η<sub>p</sub><sup>2</sup>s < 0.030) and two factorial ANOVAs found no effect of age or gender on the proportion of relevant responses (protagonist's goal, relevant actions, relevant general valence, and relevant skill valence) in either condition (all Fs < 2.761, ps > 0.102, η<sub>p</sub><sup>2</sup>s < 0.058).

Overall, Experiment 3 reveals that children's social and moral judgments are not uniformly based on outcomes when intentions are identical. When positively intentioned characters brought about distinct outcomes, 3-year-olds judged the successful helper as nicer, but did not prefer or allocate punishment to either the failed or successful helper. Given their chance responding when asked which puppet should get in trouble, it is unsurprising that 3-year-olds' explanations for this judgment were often uninformative. In contrast, 4-year-olds consistently utilized outcomes to inform their sociomoral judgments of positively intentioned characters (i.e., 4-year-olds liked the successful helper, judged the successful helper as nicer, and allocated punishment to the failed helper). Among 4-year-olds' informative responses, explanations for punishment allocations most often referenced the puppet as having failed to help the protagonist, although neither puppet intended to thwart the protagonist's goal.

In contrast to the positive-intention condition, when negatively intentioned characters brought about distinct outcomes, both age groups responded at chance levels when asked which puppet was liked, nicer, and should get in trouble. When asked to explain their allocation of punishment, 3-year-olds' responses were most frequently uninformative, while 4-year-olds most often appealed to the puppet's attempted or completed hindering action.

# GENERAL DISCUSSION

Experiments 1–3 provide evidence that 3- and 4-year-olds readily produce sociomoral judgments based on characters' intentions, rather than strictly on the outcomes these characters achieve. After observing live-action puppet shows in which characters' intentions are fully acted out and the consequences of their actions can be directly observed, preschoolers were asked to provide a social judgment (i.e., which of two puppets is liked) and moral judgments (i.e., which of two puppets was nicer and which should be punished). Children were also asked to verbally justify their allocations of punishment. When characters could only be distinguished based on their intentions because outcomes were uninformative (Experiment 1), both 3- and 4-year-olds' moral judgments revealed an intention focus, while children's social judgments were less consistent: 4-year-olds, but not 3-year-olds, liked the successful helper over the failed hinderer, and both ages preferred the failed helper over the successful hinderer. When intentions and outcomes conflicted in Experiment 2A, 4-year-olds' social and moral judgments showed an intention focus, while 3-year-olds did not distinguish between the puppets. When the failed hinderer's intention was further highlighted in Experiment 2B (i.e., the failed hinderer attempted to block the protagonist's goal three times instead of once, the same number of attempts as the failed helper), children's moral judgments showed a consistent focus on intention over outcome, though neither 3- nor 4-year-olds consistently preferred one character over the other. Finally, when characters had identical intentions but brought about opposing outcomes in Experiment 3, 4-year-olds' social and moral judgments showed an outcome focus when comparing two characters with positive intentions, while 3-year-olds judged the successful helper to be nicer than the failed helper and responded at chance when judging which character was preferred and which should receive punishment. Both ages responded at chance in all comparisons involving two negatively intentioned puppets.

Across all experiments, children's most frequent informative justifications for their punishment allocation were appeals to the character's hindering action. This was the case regardless of whether the action was successful or unsuccessful (e.g., children explained that a failed hinderer should get in trouble because he [tried to] block the protagonist's goal), and whether the character had intended to bring about a negative outcome (e.g., when comparing a failed and successful helper, children explained that the failed helper should get in trouble because he did not allow the protagonist's goal to be achieved). While we predicted that 3-year-olds would provide more uninformative responses than 4-year-olds (see Kenward and Dahl, 2011; Van de Vondervoort and Hamlin, 2017) and that 4-year-olds would be more likely than 3-year-olds to provide relevant sociomoral considerations in their explanations, these predictions were largely unsupported.

These results provide evidence that young children can privilege intention over outcome when making moral judgments. Contrary to evidence suggesting that a focus on intentions develops after the early preschool years (Piaget, 1932/1965; see also Armsby, 1971; Costanzo et al., 1973; Moran and O'Brien, 1983; Yuill, 1984; Zelazo et al., 1996; Helwig et al., 2001; Baird and Astington, 2004; Killen et al., 2011; Margoni and Surian, 2017), both 3- and 4-year-olds' forced-choice judgments regarding niceness and the allocation of punishment were based on which character displayed a positive versus negative intention to help or hinder a third party, regardless of the outcome achieved (except for 3-year-olds in Experiment 2A, in which the negatively intentioned character's intention may have been unclear). This was the case when the characters being evaluated had opposing intentions but brought about the same outcome, and when both characters' intentions and outcomes conflicted. Further, the consistency between children's responding in the current study and infants' responses to similar scenarios (Hamlin, 2013) suggests that sensitivity to others' intentions develops earlier than previously thought.

Surprisingly, when outcomes were the only way to distinguish between characters, children did not consistently show an outcome bias: although 4-year-olds liked a successful helper over a failed helper, judged the successful helper to be nicer than the failed helper, and allocated punishment to the failed helper, they did not distinguish between a successful and failed hinderer on any test questions. Three-year-olds fared even worse, and responded above chance in the outcome conditions only when asked whether the successful versus failed helper was nicer. These results are surprising, both given past work suggestive of an outcome bias in this age group and because children could have alternatively distinguished between the characters based on which character successfully brought about their intended outcome (see Stipek and Daniels, 1990; Brosseau-Liard and Birch, 2010; Landrum et al., 2016 for evidence that young children's judgments of competence and prosociality are related). However, aside from some positive evaluation of the successful versus failed helper, children showed no consistent evidence of either strategy.

One potential concern is that 3-year-olds' inability to distinguish between the failed and successful helpers when allocating punishment and both ages' consistent failure to distinguish between the failed and successful hinderers may be due to the moral judgment questions asked. For instance, it is potentially unclear how to respond when asked which of two characters with positive intentions should get in trouble, or which of two characters with negative intentions is nicer. This ambiguity may have resulted in the observed chance-level responses in Experiment 3. However, it is important to note that these questions are only unclear if children are evaluating the puppets in light of their intentions. That is, if children were evaluating characters in terms of the outcomes they brought about, there would have been a clear answer to which of the two positively intentioned puppets was nicer (i.e., the successful helper) and to which of the two negatively intentioned puppets should get in trouble (i.e., the successful hinderer). Thus, it seems clear that children were not uniformly utilizing an outcome bias to answer these moral judgment questions, even when this was the only way that they might have distinguished between the characters.

Another potential concern is that the chance-level responding among 3-year-olds allocating punishment in the positive intention condition and among both 3- and 4-year-olds in the negative outcome condition of Experiment 3 is due to differences in how the same intentions were displayed across characters. Specifically, in the positive intention condition, the successful helper enacted his positive intention once (at which point he successfully aided the protagonist in opening the box) while the failed helper enacted his positive intention three times (i.e., by repeatedly struggling with the protagonist to try to open the box). Positive evaluations of both the successful helper's positive outcome and the failed helper's repeated well-intentioned efforts may have resulted in chance-level responding among 3-year-olds; 4-year-olds' judgments favored the successful helper despite this concern. In the negative intention condition, the successful hinderer enacted his negative intention once before the protagonist's goal was thwarted while the failed hinderer enacted his negative intention three times (i.e., by repeatedly slamming the box shut before the protagonist was eventually able to achieve his goal). Negative evaluations of both the outcome of the successful hinderer's action and the failed hinderer's repeated negative-intentioned efforts may have resulted in chance-level responding among 3- and 4-year-olds when comparing these two characters. It is possible that equating the intention displays (i.e., the successful helper tries to open the box three times before being successful, the successful hinderer slams the box closed three times before the protagonist fails to achieve his goal) would result in a consistent outcome bias when the characters' intentions are equivalent. This possibility should be explored in future studies utilizing live-action puppet shows.

Regardless of children's judgments when intentions are equivalent, the current studies show that 3- and 4-year-olds can use intentions to form moral judgments when outcomes are equivalent (Experiment 1) and can privilege intentions over outcomes when the two conflict (Experiment 2B). What accounts for children's ability to privilege intentions following a live puppet show, compared to previous studies utilizing illustrated stories, in which young children initially fail to privilege intentions, especially when intentions and outcomes conflict (e.g., Killen et al., 2011; Cushman et al., 2013)? One possibility is that puppet shows allow characters' intentions to be fully acted out, making intentions more salient than when explained during a vignette (even when intentions are explicitly stated). Likewise, processing demands may be reduced when children can observe the events unfold, rather than needing to infer what happened between images illustrating a vignette-based task. Finally, the pragmatic demands of puppet show-based tasks versus vignette-based tasks may account for differences in young children's responding. For example, forced-choice comparisons between two puppets (e.g., asking which of two puppets is nicer) may allow children to distinguish between actors in a way that cannot be observed when children are instead asked to evaluate each character independently (e.g., asking whether each puppet is nice). Further, children's pragmatic reasoning about the experimenter's own intentions may lead them to focus on outcomes following vignettes if caregivers are more likely to use stories rather than pretend play to explain norms of behavior (e.g., stories depicting punishment for harms caused, regardless of the character's intentions; see Westra and Carruthers, 2017 for a discussion of how children's pragmatic reasoning may influence their performance on false-belief tasks). Future studies should probe these possibilities by directly comparing children's judgments following vignette and puppet show versions of the same scenarios.

While the current studies provide evidence that children's moral judgments are intention-based, 3- and 4-year-olds' social judgments less consistently showed an intention-bias. Specifically, exploratory analyses revealed an effect of question type in the positive and negative outcome condition of Experiment 1 and in Experiment 2B, such that children were less likely to respond in the direction of the hypothesis (i.e., that children would favor the character with positive intentions over the character with negative intentions) when asked which character they liked, as opposed to when making moral judgments about niceness and punishment. These analyses suggest that the puppet show events were more consistently viewed as morally relevant as opposed to socially relevant, and that idiosyncratic preferences (e.g., preferences based on the puppet's appearance) may have influenced children's social judgments more than their moral judgments. Children in the current studies were only asked to explain one judgment to prevent contamination between explanations regarding the allocation of punishment and explanations regarding social preferences. That said, future studies should explore whether children's social preferences are justified by appeals to the puppets' helpful intention or by appeals to other aspects of the puppets or the events within the puppet show.

Finally, there are several remaining open questions regarding the developmental trajectory of sensitivity to others' intentions. First, it is currently unknown whether infants' implicit preferences for characters with positive intentions over characters with negative intentions, regardless of outcome (Hamlin, 2013), are related to preschoolers' explicit sociomoral judgments following similar puppet show displays. While it is possible that infants' implicit preferences and young children's explicit judgments are distinct, it may be that sociomoral functioning in infancy is related to explicit moral development later in life. Relatedly, more work is needed to accurately characterize the use of intentions in moral judgments across the lifespan. This could be accomplished by utilizing the same stimuli to examine intention-based judgments in infants, preschoolers, older children, and adults. While the current studies adapted live puppet show stimuli previously shown to infants, practical concerns restricted our sample to the preschool years, rather than the broader age range necessary to make strong conclusions regarding the continuity of intention sensitivity across the lifespan. Lastly, it is also an open question whether an early focus on intentions is universal, and if so, how this develops into adult-like moral responses across a variety of cultures. Given variability in the extent to which adults from small-scale, non-Western societies incorporate intentionality in moral judgments (Barrett et al., 2016), it is possible that early moral judgments differ along important dimensions, or that infants and young children in both Western and non-Western societies share an early sensitivity to intentions that is refined according to their culture. Exploring the development of implicit evaluations and explicit judgments within and across diverse individuals over time would greatly contribute to our understanding of how intention and outcome information becomes integrated in mature moral judgments.

# ETHICS STATEMENT

These studies were developed and conducted in accordance with ethical guidelines and the protocols were approved by the University of British Columbia's Behavioral Research Ethics Board. Caregivers gave written informed consent in accordance with the Declaration of Helsinki.

# AUTHOR CONTRIBUTIONS

JV and JH developed the study hypothesis and design. Testing, data collection, and data analysis were performed by JV, who also drafted a first manuscript. JH provided critical revisions. JV and JH approved the final version of the manuscript for submission.

# FUNDING

This research was supported by the Social Sciences and Humanities Research Council of Canada, Grant #435-2014-2173.

# ACKNOWLEDGMENTS

We thank all the participating daycares and preschools that made this research possible.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01851/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Van de Vondervoort and Hamlin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Varieties of Young Children's Prosocial Behavior in Zambia: The Role of Cognitive Ability, Wealth, and Inequality Beliefs

Nadia Chernyak<sup>1,2</sup>\*, Teresa Harvey<sup>3</sup>, Amanda R. Tarullo<sup>3</sup>, Peter C. Rockers<sup>4</sup> and Peter R. Blake<sup>3</sup>

<sup>1</sup> Department of Psychology, Boston College, Newton, MA, United States, <sup>2</sup> Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States, <sup>3</sup> Department of Psychological and Brain Sciences, Boston University, Boston, MA, United States, <sup>4</sup> Department of Global Health, Boston University School of Public Health, Boston, MA, United States

By the 3rd year of life, young children engage in a variety of prosocial behaviors, including helping others attain their goals (instrumental helping), responding to others' emotional needs (comforting), and sharing resources (costly giving). Recent work suggests that these behaviors emerge early, during the first 2 years of life (Svetlova et al., 2010; Thompson and Newton, 2012; Dunfield and Kuhlmeier, 2013). To date, however, work investigating early varieties of prosocial behavior has largely focused on Western samples and has not assessed the impact of poverty and inequality. In this work, we investigate prosocial behavior in 3-year-olds in Zambia, a lower-middle income country with high wealth inequality. Experiments were integrated into a larger public health study along with both objective and subjective (parent) measures of wealth and inequality. Three-hundred-seventy-seven children (Mean age = 36.77 months; SD = 2.26 months) were presented with an instrumental helping task, comforting task, and two steps of a giving task – one with higher cost (children could give away their only resource) and one with lower cost (children had three resources to give). As predicted, rates of prosociality varied hierarchically by the cost of the action: instrumental helping was the most common followed by comforting, lower cost giving, and higher cost giving. All prosocial behaviors were significantly correlated with one another (with the exception of high cost giving), and with general cognitive ability. Objective family wealth did not predict any of the child's prosocial behaviors. However, subjective beliefs showed that mothers who believed that they had more than others in their village had children who were more likely to engage in instrumental helping, and mothers who believed that village inequality was a problem had children who were more likely to engage in low cost giving. 
Low cost giving was also more likely for children whose parents reported reading storybooks to them. This suggests that costly giving in the context of pretend play may relate to children's experience with using stories as representations of real life events. The results suggest both cultural differences and universalities in the development of prosociality and point to environmental factors that influence prosociality.

Keywords: Zambia, prosocial behavior, inequality, cross-cultural developmental psychology, preschoolers

#### Edited by:

Kelsey Lucca, University of Washington, United States

#### Reviewed by:

Stuart I. Hammond, University of Ottawa, Canada; Valerie Kuhlmeier, Queen's University, Canada

\*Correspondence: Nadia Chernyak, nadia.chernyak@uci.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 12 June 2018; Accepted: 25 October 2018; Published: 16 November 2018

#### Citation:

Chernyak N, Harvey T, Tarullo AR, Rockers PC and Blake PR (2018) Varieties of Young Children's Prosocial Behavior in Zambia: The Role of Cognitive Ability, Wealth, and Inequality Beliefs. Front. Psychol. 9:2209. doi: 10.3389/fpsyg.2018.02209

# INTRODUCTION

By the 3rd year of life, young children show a variety of prosocial behaviors, including helping others attain their goals (instrumental helping), responding to others' emotional needs (comforting), and sharing resources (costly giving). Recent work generally finds that instrumental helping and comforting behaviors, which are lower in cost to self (i.e., effort), develop earlier than costly giving. Giving is considered more costly because one must sacrifice material resources to benefit others. These findings suggest that the development of humans' prosocial behavior proceeds in accordance with how costly it is to oneself (Warneken and Tomasello, 2009; Sommerville et al., 2018). To date, however, work investigating early varieties of prosocial behavior has largely focused on Western and relatively wealthy samples. As such, this work leaves open important research questions that comprised the aim of the current work: (1) to what extent is the development of these three forms of prosocial behavior similar in diverse societies and (2) how do demographic variables, such as wealth, affect children's prosocial behavior within a society? A third question concerns the role that parents' subjective beliefs about their economic status play in shaping children's prosociality. We investigate these questions in a large-scale sample of Zambian children in order to test these predictions in a sample markedly different from the Western societies in which these questions have previously been studied, as well as to study the impact of local and global inequality on prosocial behavior.

Using both evolutionary and developmental evidence, a hierarchy of prosocial behaviors has been proposed based on the cost of different actions, with helping as relatively low cost and giving away resources as the highest cost (Warneken and Tomasello, 2009). The cost of an action even appears to moderate behaviors within the distinct subtypes of prosociality. For example, 18- and 30-month-olds are more likely to help others when they do not have to give up their own property to do so (Svetlova et al., 2010). For giving, preschoolers give away more of a resource that they value less compared to a resource they value more (Blake and Rand, 2010). Combined, these studies suggest a hierarchy of prosociality based on cost to the actor that is evident early in development.

While most studies on children's prosociality have been conducted with Western samples, the limited cross-cultural evidence supports the idea that prosocial behaviors comprise separate subtypes, and that cultural differences in rates of prosocial behavior emerge as the cost of the behavior increases. For example, experiments have found that low cost forms of helping are apparent by 18 months of age in rural communities in India, Peru and Brazil as well as in Western urban communities (Callaghan et al., 2011; Moritz et al., 2016), though notably, rates of prosocial behaviors differed among these samples (Callaghan et al., 2011). Low cost forms of giving (a choice of 1 for self, 0 for a peer vs. 1 for each) have also been found for children in hunter-gatherer, pastoralist and horticultural societies (House et al., 2013). By contrast, high cost forms of giving (a choice of 2 for self, 0 for a peer vs. 1 for each) varied across the same six societies (House et al., 2013). Children in seven diverse societies have also been found to engage in costly enforcement of equality when they face a disadvantage relative to a peer, but cultural variation in fairness enforcement appears when children face an advantage over a peer, a relatively higher cost (Blake et al., 2015a; Corbit et al., 2017). Cultural similarities have also been found in children's willingness to imitate low cost forms of giving, but differences emerge when the costs increase (Blake et al., 2016). Combined, these results suggest a universal willingness to engage in prosocial behavior in early childhood that varies across societies as the cost of the prosocial behavior increases. However, a complete test of the hierarchy of costs model for all three subtypes of prosociality has not been conducted outside of Western societies.

Adding new societies for cross-cultural comparisons of development remains an important goal of psychological research (Correa-Chavez and Rogoff, 2005; Nielsen et al., 2017), but variation within a society is also important for examining the effects of environmental variables on children's behavior (e.g., see Alcalá et al., 2014). For example, prior cultural work has observed that children's prosocial behavior is deeply affected by their abilities to observe, participate, and learn from the chores and responsibilities that affect the adults around them (Silva et al., 2010). In Southern Zambia, where this project took place, young children are traditionally expected to help (e.g., gather and carry firewood) from as soon as they can walk and are encouraged to share by adults (Colson, 1967). By 4–5 years of age, children are expected to begin household duties based on gender: boys begin herding livestock and girls help with childcare and planting. Despite these traditional values of work and helping, children in contemporary Zambian society also face varying degrees of resource inequality and malnutrition with high rates of stunted growth in the country as a whole (Rockers et al., 2018). These local and global influences may affect the rates of prosociality, either by increasing rates of instrumental helping relative to Western samples, or by decreasing rates of resource sharing, due to exposure to resource inequality and scarcity.

For prosocial behavior in particular, socioeconomic status (SES) has been proposed as an important within-culture predictor for both adults and children, though prior work has found opposing effects of SES on prosociality. For example, some studies have found that adults with higher SES are less prosocial (Frank, 1999; James and Sharpe, 2007; Piff et al., 2010), but a large scale cross-national analysis found that wealthier adults are more prosocial compared to low SES individuals (Korndörfer et al., 2015). The same inconsistency has been found in developmental samples as well: children of wealthy families have been found both to give more (Benenson et al., 2007; Safra et al., 2016) and less (Miller et al., 2015) compared to low SES children. Moreover, one cross-cultural study found that both the poorest children (street children in Recife, Brazil) and the wealthiest children (from private day care in the United States) gave the fewest resources to a recipient in a Dictator Game (Rochat et al., 2009).

One potential moderating factor that may explain these conflicting findings is income inequality, although limited work has examined these effects on children. Wealthy adults in areas of high inequality tend to be less prosocial compared to wealthy adults in low inequality regions (Côté et al., 2015). Reviews of research on wealth and inequality have also found that subjective perceptions of social status can impact a range of outcomes (Kraus et al., 2011). In addition, work on so-called relative deprivation suggests that perceptions of oneself as being low social status relative to others (subjective social status; SSS) can have negative effects on behavior, and cognitive functioning in particular (Heberle and Carter, 2015). Perceptions of SSS are likely to be formed by adults, whose beliefs may influence children's behavior. Although speculative, we examined this possibility for children's prosocial behavior in the current study. In particular, we investigated the possibility that subjective beliefs about inequality might predict children's sharing behavior, for either low or high cost giving. Because no work to our knowledge has investigated or found relationships between inequality and other forms of prosocial behavior, we did not expect inequality beliefs to predict instrumental helping or comforting.

In summary, research on children's prosociality has identified three primary forms that emerge before or by 3 years of age: instrumental helping (i.e., helping others achieve a goal; Warneken and Tomasello, 2006), comforting (i.e., sympathizing and offering help to those in distress; Svetlova et al., 2010; Dunfield, 2014), and costly resource sharing (Blake and Rand, 2010; Chernyak and Kushnir, 2013; Chernyak et al., 2017, 2018). Although these behaviors appear in a range of societies, all three forms have not been tested in a single non-Western society. Moreover, these behaviors may vary based on the degree of wealth and inequality experienced by a given family. In the current study, we conducted a test of the hierarchy of costs model of prosocial behavior in Zambia, a lower-middle income country with high inequality of wealth. Prosocial experiments and parent questionnaires were added to the second wave of data collection of a public health intervention. We thus obtained both objective and subjective measures of family wealth, parent beliefs about status and inequality, and measures of parenting practices.

## Research Context

As a country, Zambia is marked by both high poverty and high wealth inequality. According to the World Bank indicators database, approximately 60% of the population lives below the poverty line and income inequality is among the highest in the world (Gini coefficient = 0.57 in 2015; compared with United States Gini coefficient = 0.41 and Canada Gini coefficient = 0.34)<sup>1</sup>. In the rural areas where the study was conducted, the primary occupation is farming. Households often grow their own food on small plots of land and sell the excess at roadside markets. Families typically have limited childcare and education, and public health research has found high rates of childhood stunted growth (Rockers et al., 2018).
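To make the Gini coefficient reported above more concrete, the following minimal sketch computes it for a small list of individual incomes using the standard mean-absolute-difference definition. The function name `gini` and the toy income vectors are illustrative assumptions; they are not World Bank data and were not part of the present study.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference (illustrative helper)."""
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero income")
    # Sum |x_i - x_j| over all ordered pairs, then normalize by 2 * n * total.
    abs_diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return abs_diff_sum / (2 * n * total)


# Toy populations (assumed values, for illustration only).
equal = gini([10, 10, 10, 10])        # everyone has the same income -> 0.0
concentrated = gini([0, 0, 0, 40])    # one person holds everything -> 0.75
```

With a finite population of n people, the most unequal case yields (n - 1) / n rather than exactly 1, which is why the four-person example gives 0.75; the value approaches 1 as the population grows.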

The current study added measures to a health intervention targeting households with children between 6 and 12 months of age at baseline. The intervention occurred over 2 years with the treatment group attending bi-weekly parenting groups to learn about cognitive enrichment and play activities for their children, nutrition and self-care. The control group received no intervention. The original study was implemented as a cluster-randomized controlled trial in the Southern Province of Zambia, specifically the districts of Choma and Pemba. The clusters were 30 health zones and were randomized to treatment or control prior to enrollment. Villages within each zone were randomly selected and within those villages all eligible households were asked to participate. Participation was voluntary and all caregivers provided written informed consent prior to study initiation. The study was approved by the Institutional Review Board at Boston University (protocol number H-32726) and by the ethics board at ERES Converge in Zambia (protocol number 2013-Dec-010) prior to the enrollment of participants. At the end of the 2-year period, study participants were reconsented for the final wave of data collection, which included the measures added to the procedure for the current study.

At the beginning of the health intervention there were 268 mother-child dyads in the intervention group and 258 dyads in the control group. By the final wave of data collection there were 195 dyads in the intervention group (73%) and 182 dyads in the control group (71%). The total sample reported here thus consists of 377 mother-child pairs.
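The retention figures above can be checked with a few lines of arithmetic. The sketch below uses only the dyad counts reported in this paragraph; the variable names are illustrative.

```python
# Dyad counts as reported in the text.
baseline = {"intervention": 268, "control": 258}
final = {"intervention": 195, "control": 182}

# Percentage of baseline dyads still enrolled at the final wave, per group.
retention = {group: round(100 * final[group] / baseline[group]) for group in baseline}

# Total sample analyzed in the current study.
total_final = sum(final.values())

print(retention)    # {'intervention': 73, 'control': 71}
print(total_final)  # 377
```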

# MATERIALS AND METHODS

# Participants

Participants were 377 toddlers (190 males, 187 females; Mean age = 36.77 months; Range = 31.77–41.57 months) recruited through a larger public health study on maternal and child health outcomes. All mothers were provided with questionnaires about their own and their child's physical and emotional wellbeing, beliefs about parenting, and beliefs about inequality and interpersonal trust. Children were administered standardized measures of health, including the Bayley Scales of Infant and Toddler Development, Third Edition (BSID-III), and were measured for height, weight, and mid-upper arm circumference.

## Procedure

In addition to questionnaires and assessments aimed at investigating maternal and child health (not discussed or analyzed here), our group added the following structured tasks to the assessments which serve as the focus of this paper. Tasks were adapted from prior work aimed at studying prosocial behavior within this age group, and designed by the authors in consultation with local researchers who helped to make critical design modifications in order to make the tasks ecologically appropriate for the sampled population (e.g., using toys instead of stickers as the resource).

#### Instrumental Helping Task

In this task, adapted from Warneken and Tomasello (2006), the experimenter appeared to drop a bunch of sticks in front of the child seemingly by accident. The experimenter then expressed 4 cues in successive order to solicit the child's help. Each cue was followed by a 10-s pause in order to allow the child an opportunity to respond. During the first cue, the experimenter simply stared at the sticks and exclaimed "Oops." During subsequent cues, the experimenter alternated between looking at the sticks and the child and said "I dropped my sticks" (second cue), "I dropped my sticks, I need them back" (third cue), and "Can you help me get my sticks?" (fourth cue; with palms extended toward the sticks).

<sup>1</sup>Gini coefficient is a measure of income inequality within a region, with 0 representing perfect equality, and 1 representing a society in which one individual owns all of the income/wealth.

During this time, children were coded as to whether they helped the experimenter retrieve his or her items (coded "yes" if the child helped at any point during the 40-s period, and "no" if the child did not help even 10 s after the last cue was provided), as well as their latency to respond. For latency, children were given a score of 1–5 corresponding to which cue elicited the child's help (a score of 5 indicated children did not help after the 4th cue).
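The coding scheme described above can be sketched as a small function. This is an illustrative reconstruction, not the authors' coding script: it assumes that the response time is measured in seconds from the first cue, that the four 10-s response windows run back to back, and that the input format (`help_time_s`, with `None` for children who never helped) is a hypothetical convenience.

```python
CUE_WINDOW_S = 10  # pause after each cue, as described in the task
N_CUES = 4         # number of cues given by the experimenter


def code_helping(help_time_s):
    """Return (helped, latency_score) for one child.

    help_time_s: seconds from the first cue to the child's first helping act,
    or None if the child never helped (hypothetical input format).
    """
    if help_time_s is None or help_time_s >= CUE_WINDOW_S * N_CUES:
        # No help within the 40-s period: coded "no" with a latency score of 5.
        return ("no", N_CUES + 1)
    if help_time_s < 0:
        raise ValueError("help_time_s must be non-negative")
    # Score 1-4 identifies the cue whose response window contained the help.
    score = int(help_time_s // CUE_WINDOW_S) + 1
    return ("yes", score)


print(code_helping(3.2))   # ('yes', 1)
print(code_helping(25))    # ('yes', 3)
print(code_helping(None))  # ('no', 5)
```

Keeping the binary outcome and the ordinal latency score in one return value mirrors the two measures reported for this task.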

#### Comforting Task

A non-costly comforting task was adapted from Svetlova et al. (2010). The experimenter retrieved two toys, noted that those toys were his/her favorite ("These toys are my favorite, they make me happy") and placed one near the child and out of the experimenter's reach. The experimenter then proceeded to play with the second toy and pretended to accidentally break it (the toy was configured in such a way that it broke upon handling). The experimenter then expressed 4 cues in successive order to elicit the child's help. Each cue was followed by a 10-s pause in order to allow the child an opportunity to respond. Responding was defined as either providing the unbroken toy placed near the child to the experimenter or attempting to fix the broken toy. During the first cue, the experimenter simply stared at the broken toy and exclaimed "Oh no!" During subsequent cues, the experimenter alternated between looking at the toy and the child and said "I broke my toy!" (second cue), "I am sad, I want another toy" (third cue), and "Can you help me get my other toy?" (fourth cue; with palms extended toward the other toy).

During this time, children were coded as to whether they comforted the experimenter (coded "yes" if the child helped at any point during the 40-s period, and "no" if the child did not act even 10 s after the last cue was provided), as well as their latency to respond. For latency, children were given a score of 1–5 corresponding to which cue elicited the child's action (a score of 5 indicated the child did not help after the 4th cue).

#### High- and Low-Cost Resource Giving

In these last two tasks, adapted from Chernyak and Kushnir (2013), children were given the opportunity to give to a doll that was sad. These tasks were similar to the comforting task described above but added a personal cost to the action because children had to sacrifice an object that they could keep in order to comfort an agent (see Svetlova et al., 2010 for a similar approach). The adapted task employed a two-step design in which children were first introduced to a doll that was described as feeling "very sad." In the first step (high-cost giving), the child was then shown a resource (an attractive toy) and told that they could either keep it or give it to the doll to make the doll feel better. The child was then asked whether s/he would like to keep the toy for him/herself or whether s/he would like to give it to the doll to make the doll feel better and provided a box to place the resource into if they wished to give it to the doll. If the child did not respond with an answer, s/he was re-prompted two more times and then provided the resource if no response was given after the last re-prompt. Preliminary analyses revealed that this occurred for only a very small number of children (n = 7), who were excluded from any analyses or calculations involving this task. This task was defined as high-cost because the child had only one resource to either keep or give. It was administered first so that children did not enter it already holding the varying numbers of resources they could obtain in the next step.

During the second step (low-cost giving), children were shown a new doll, told that the new doll was also feeling upset, and then shown three toys that they could allocate however they wished. Children were shown two boxes – one for the doll, and one for the child – and prompted to split the three toys into the boxes. If children left any toys unallocated, they were re-prompted until each toy was assigned to either the child or the doll. The number of toys that children gave to each doll during each step was recorded. This task was defined as low-cost because the child had three resources and thus could give something without sacrificing everything. Moreover, this task was completed after the high-cost giving task (thus giving the child an opportunity to keep one item in his or her possession), thus lessening the cost demands on the child.

#### Objective Wealth

Regular income is rare in this region of Zambia, and health research there typically uses an assessment of household wealth instead. In the initial baseline survey, caregivers were asked whether the home had specific assets (a radio, TV, stove, bicycle, farm animals) and access to utilities such as electricity and running water. A composite measure was created based on these responses and standardized (z-scored) across the sample.
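One way such a composite could be built is sketched below. The text does not specify how the asset responses were combined, so this sketch assumes a simple count of assets and utilities per household, standardized with the population standard deviation. The function name and the dictionary keys are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def wealth_z_scores(households: List[Dict[str, bool]]) -> List[float]:
    """Count owned assets per household, then z-score across the sample.

    Assumption: the composite is an unweighted count; the study may have
    used a different weighting scheme.
    """
    counts = [sum(1 for owned in h.values() if owned) for h in households]
    mu = mean(counts)
    sd = pstdev(counts)  # population SD; a sample SD is an equally valid choice
    if sd == 0:
        raise ValueError("no variation in asset counts; z-scores are undefined")
    return [(c - mu) / sd for c in counts]


# Hypothetical survey responses for three households.
sample = [
    {"radio": True, "tv": False, "bicycle": False, "electricity": False},
    {"radio": True, "tv": True, "bicycle": False, "electricity": False},
    {"radio": True, "tv": True, "bicycle": True, "electricity": False},
]
z = wealth_z_scores(sample)
```

The same within-sample standardization applies to any raw score, which is why z-scores here describe a household's position relative to the study sample rather than to an external norm.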

#### Parent Beliefs About Inequality

We added beliefs about inequality in mothers' questionnaires in order to assess the impact of subjective perceptions of wealth and inequality. The first question (Village Inequality Belief) asked the mother "Which statement best characterizes your village?" with four response options: (1) Everyone has about the same; (2) Some people have a little more than others; (3) Some people have a lot more than others; and (4) A few people have much more than everyone else. The next two questions assessed subjective wealth status at both the village (question 2; Local Subjective Wealth) and country level (question 3; Global Subjective Wealth): "Thinking of your village/country, do you think that you have much more or much less than other people in your village/country." The five response options were: (1) a lot less; (2) a little less; (3) about the same; (4) a little more; and (5) a lot more. Finally, the last question asked the mother: "How much of a problem do you think wealth inequality is in your country?" (Inequality as a Problem Belief). Responses were on a five-point scale: (1) not a problem at all; (2) a small problem; (3) a moderate problem; (4) a big problem; and (5) a very big problem.

#### Child Cognitive Ability

Children's cognitive abilities were assessed as part of the larger health study using the Bayley Scales of Infant and Toddler Development, Third Edition (BSID-III). The cognitive sub-scale of the BSID-III includes a set of age-appropriate tasks that the child is asked to complete, focused on various cognitive skills including object relatedness, pattern recognition, and memory. Standardized scores for the BSID-III are based on norms from a United States sample, and should not be extended to other populations which likely have different normative trajectories of cognitive development (Cromwell et al., 2014). Therefore, children's raw scores on the cognitive sub-scale were established by summing the number of items successfully completed by each child; raw scores were then converted to z-scores by standardizing within the study population.

All assessments were administered in the local language that was most familiar to the family and administered in the family's home by a local researcher conversant in both English and the family's local dialect (if not English).

# RESULTS

Preliminary analyses showed no gender, age, child height, or treatment effects (effect of intervention), so data were collapsed across these variables. For ease of comparison, we analyze all behaviors categorically (whether children opted into the target behavior or not). However, all reported results remained consistent when looking at prosocial behavior on continuous metrics (i.e., latency to help, rather than whether the child helped). See **Supplementary Analyses** for details.

We first investigated the rates of prosocial behavior across the three tasks (**Figure 1**). Children displayed instrumental helping and comforting at very high rates (approximately 75–80%), similar to rates found in Western samples investigating this age range (Svetlova et al., 2010). In contrast with prior work using Western samples (Chernyak and Kushnir, 2013), however, rates of high- and low-cost resource giving were markedly lower, though we note that the age group sampled here was, on average, younger, and thus direct comparisons are not possible.

Importantly for the hierarchy of costs model, the relative rates of each of the behaviors were consistent with what is reported in Western samples: instrumental helping most common, followed by comforting, and followed by low-cost and then high-cost resource giving (see Svetlova et al., 2010).

Within-subjects McNemar's tests comparing the rates of each behavior to one another showed that the rate of each target behavior was significantly different from the others (all ps < 0.001, with the exception of the comparison between instrumental helping and comforting: p = 0.057).
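For readers unfamiliar with McNemar's test, the sketch below shows how a paired comparison of two binary behaviors can be computed. It uses the exact (binomial) form of the test, which is an assumption on our part: the text does not state whether the exact or the chi-square version was used. The function name `mcnemar_exact` is hypothetical.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(task_a: Sequence[bool], task_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired yes/no outcomes.

    Only discordant pairs (a child who did one behavior but not the other)
    carry information about a difference in rates.
    """
    if len(task_a) != len(task_b):
        raise ValueError("both tasks must be scored for the same children")
    b = sum(1 for x, y in zip(task_a, task_b) if x and not y)
    c = sum(1 for x, y in zip(task_a, task_b) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Binomial(n, 0.5) lower tail up to the smaller discordant count, doubled.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if 10 children helped but did not give, and no child showed the reverse pattern, the two-sided p-value would be 2/1024, regardless of how many children behaved the same way on both tasks.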

Spearman's rho correlations (see **Table 1**) showed that all behaviors, with the exception of high-cost resource giving, were significantly related to one another. In contrast, high-cost resource giving was only related to low-cost resource giving, suggesting a dissociation between lower-cost behaviors such as helping and comforting and higher-cost behaviors such as giving away one's only resource.
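As a reminder of what Spearman's rho measures, the following sketch computes it as a Pearson correlation on ranks, with tied values given their average rank. Ties matter here because binary or 1–5 scores produce many of them. The helper names are hypothetical, and the sketch returns only the coefficient, not its significance.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / sqrt(vx * vy)
```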

We next examined the questionnaire measures of wealth and inequality. **Table 2** shows descriptives and correlations among the key variables of interest: Objective Wealth (z-score) and the four questionnaire items (Village Inequality Belief, Local Subjective Wealth, Global Subjective Wealth, and Inequality as a Problem Belief). Generally, people reported that their village was characterized by inequality. Caregivers reported, on average, a 3.42 on a scale of 1–4, very close to the statement that some people have a lot more than everyone else. The majority of the sample also reported themselves as being poorer relative to others in their village and their country (one-sample t-tests comparing responses to the midpoint of 3, "about the same," ts < −20.0, ps < 0.001). Finally, people reported on average that inequality was at least a moderate to big problem (reporting an average of 3.45 on a scale of 1 to 5, where 1 indicated that inequality was not a problem and 5 indicated that it was a very big problem).
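The comparison to the scale midpoint can be illustrated with a short sketch of the one-sample t statistic. The responses below are invented for illustration and are not study data; the function name is hypothetical, and the p-value, which requires the t distribution, is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(responses: Sequence[float], midpoint: float = 3.0) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for H0: mean response equals midpoint."""
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses")
    sd = stdev(responses)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("t is undefined when all responses are identical")
    t = (mean(responses) - midpoint) / (sd / sqrt(n))
    return t, n - 1


# Invented subjective-wealth responses on the 1-5 scale.
t_stat, df = one_sample_t([1, 2, 1, 2])
```

A negative t indicates that the mean response falls below "about the same," which is the direction reported for both the village and country items.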

As shown in **Table 2**, objective wealth was correlated with both local and global subjective wealth, but not correlated with belief that inequality is a problem. Both local and global subjective wealth were very strongly correlated, and moderately negatively correlated with the belief that inequality is a problem. Thus, the richer people perceived themselves to be, the less likely they were to believe inequality was a problem.

#### TABLE 1 | Spearman's rho correlations for each of the target behaviors.

<sup>∗</sup>p < .05; ∗∗p < .01; ∗∗∗p < .001. Significant effects are displayed in bold.

For the final set of analyses, we examined whether objective wealth, subjective wealth and inequality beliefs predicted each of the target prosocial behaviors. For these analyses, we ran binary logistic regressions using each of the target behaviors as the dependent variable, and Objective Wealth (z-score), and answers to each of the four inequality/subjective wealth questions as predictors. We also initially checked for any effects of Age, Child's Sex, Intervention Group, Child Height (as a proxy for physical development), and General Cognitive Ability (taken from the cognitive subscale of the BSID-III; z-scored) and removed these if they were non-significant. Unless otherwise noted, these were not significant.
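To make the modeling step concrete, the sketch below fits a binary logistic regression by plain gradient ascent on the log-likelihood. It is a teaching sketch under stated assumptions: the data are invented, the function name `fit_logistic` and the optimizer settings are ours, and it returns only coefficients. The standard errors and p-values reported in this section would come from a statistics package, not from this code.

```python
import math
from typing import List, Sequence


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Return [intercept, b1, ..., bp] maximizing the logistic log-likelihood.

    Assumes predictors are roughly standardized and the outcome is not
    perfectly separable (otherwise the estimates do not converge).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            pred = 1.0 / (1.0 + math.exp(-z))  # predicted P(behavior = 1)
            err = yi - pred
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step along the average gradient of the log-likelihood.
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Invented data: one z-scored predictor and a binary behavior (1 = helped).
X = [[-2.0], [-1.0], [-1.0], [0.0], [0.0], [1.0], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
coefs = fit_logistic(X, y)
```

A positive coefficient B means that higher values of the predictor are associated with higher log-odds of opting into the behavior, which is how the B values reported for each task should be read.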

For Instrumental Helping, there was a significant effect of Cognitive Ability (B = 0.524, SE(B) = 0.132, p < 0.001), and a significant effect of Local Subjective Wealth (B = 0.458, SE(B) = 0.211, p = 0.030). Children with higher cognitive ability and children with mothers who indicated having more than others in their village were more likely to engage in instrumental helping. No other effects reached significance (all ps > 0.15). For Comforting, there was a significant effect of Cognitive Ability, B = 0.421, SE(B) = 0.122, p = 0.001, and no other significant effects (all ps > 0.15). Thus, children with higher cognitive ability were also more likely to engage in comforting behaviors (**Figure 2**).

#### TABLE 2 | Pearson's correlations for each of the inequality/subjective wealth items and objective wealth.

<sup>∗</sup> p < .05; ∗∗ p < .01; ∗∗∗ p < .001. Significant effects are displayed in bold.

For Low Cost Giving, there was a significant effect of Village Inequality Belief: children whose mothers believed the village was characterized by larger inequality were more likely to share at least one toy out of three with the doll, B = 0.331, SE(B) = 0.121, p = 0.006. There were no other significant effects (all ps > 0.25). For High Cost Giving, there was a significant effect of Cognitive Ability, B = 0.269, SE(B) = 0.123, p = 0.029, and no other significant effects (all ps > 0.15) (**Figure 2**).

Finally, given that high- and low-cost giving both took place in the context of pretend play (giving resources to a doll, as opposed to an experimenter), we explored the possibility that rates of giving were lower than what is observed in Western samples because children in Zambia were less familiarized with such pretend scenarios. Although we did not assess pretend play directly, the larger public health study did administer a question on whether the parent had ever read books or looked at picture books with the child. Indeed, less than half (40.6%) of mothers answered affirmatively to that question. Adding this question into the model predicting low-cost giving showed a significant relation between opting into low-cost giving and whether the parent read books to the child, B = 0.478, SE(B) = 0.219, p = 0.029, but not high-cost giving. Thus, one mechanism that influences children's prosocial behavior toward toys and puppets may be the extent to which they are exposed to pretend scenarios more generally.

# DISCUSSION

Despite a large body of recent research on the development of prosocial behaviors, no studies have examined all three components of prosociality outside of wealthy Western countries. We thus began this work by investigating the universality and cultural variability of two recent claims: first, that by the 3rd year of life, young children generally show high rates of instrumental helping and comforting with others (Warneken and Tomasello, 2006; Svetlova et al., 2010); and second, that prosocial behavior follows a hierarchy of costs model (Warneken and Tomasello, 2009), with helping appearing first, followed by comforting, and then followed by costly resource giving (Svetlova et al., 2010; Brownell, 2012). We found support for both of these claims in a sample that was vastly different from those studied in prior work, characterized by high inequality, and in children with relatively little exposure to modern schooling. Children tended to show high rates of prosociality at the same point in development laid out in prior work. We note that this first set of analyses was aimed at providing a description of the rates of prosociality within this society. Thus, unlike the prior work on which our study was based (Warneken and Tomasello, 2009; Svetlova et al., 2010; Chernyak and Kushnir, 2013), we did not include the control conditions used in that work.

Our results, combined with this prior work, suggest that children take a cost-based approach to prosocial behavior (Sommerville et al., 2018): rates of prosocial behavior varied hierarchically based on the cost of the behavior (with instrumental helping, which is generally considered the lowest cost appearing first; followed by comforting, which involved expending greater effort to soothe the experimenter; followed by giving up their own resources in order to alleviate the distress of others). The fact that prosociality appears governed by cost even in a very poor sample suggests that this may be a cognitive universal.

We also find correlations among the lower cost behaviors, though not the higher cost behaviors. We note that this finding is slightly divergent from prior work, which has found little to no correlations (Dunfield and Kuhlmeier, 2013; Dunfield, 2014) among varieties of prosocial behavior. One possibility is the difference in samples: it is possible that prosocial behavior is viewed as a unitary construct among rural Zambian children, but not among Western children that may be frequently exposed to some behaviors through schooling (e.g., sharing), others through household chores (e.g., instrumental helping) (Hammond and Carpendale, 2014), and yet others through mental state talk (e.g., comforting; Drummond et al., 2014). The tasks used here also may have been more similar in eliciting empathy than in prior work, given that the experimenter or the doll were presented as sad.

Child cognitive ability was related to nearly every form of prosocial behavior, suggesting that general cognitive development plays a role in children's abilities to display prosociality. This particular finding underscores the importance of going beyond studying age-related changes and studying the cognitive predictors and underpinnings of why those age-related changes occur. Recent work has taken an interest in understanding the cognitive underpinnings of prosocial behavior (Blake et al., 2015b; Cowell et al., 2017; Steinbeis and Over, 2017; Chernyak et al., 2018). Our study adds to this work by highlighting that general cognitive ability explains a substantial portion of the variance in prosocial behavior, underscoring that various domain-general abilities may serve as important prerequisites for our prosocial tendencies. We encourage future work to continue to focus on how cognition underlies our behavioral capacities in the prosocial domain.

We find that rates of opting into low- and high-cost giving were markedly lower than those observed in prior work using a similar paradigm (Chernyak and Kushnir, 2013; Chernyak et al., 2017). Prior work does find variability in costly resource sharing across cultures (House et al., 2013; Blake et al., 2016), though the majority of this work, to our knowledge, has investigated older age groups. Though we did not include a Western sample as a direct comparison, one possibility, of course, is that our sample was slightly younger than that observed in past work, and thus costly sharing was too difficult for this age group. Another, and more intriguing possibility, is that the global culture in which children were raised made the toys too high-value to give away for the sake of someone else's comfort. Thus, as noted above, children's behavior may have been governed by the cost of the action.

Another possibility is that because the high- and low-cost resource sharing task took place in the context of pretend play, only children who were familiarized with such symbolic play opted into resource giving toward a toy doll. Prior work does find marked cultural differences in symbolic play (Callaghan et al., 2011). Moreover, prior work shows relations between parents' use of emotion talk during book reading and children's prosocial behavior (Brownell et al., 2013). Pretend play, whether in the context of storybooks or symbolic objects, may thus provide one method through which children are exposed to and have opportunities to consider the emotional and physical needs of others. Our exploratory analyses showed a strong correlation between children's exposure to storybooks and their resource sharing, which held even when considering the effect of other confounding variables (i.e., social class and perceived access to resources). If this is the case, then our work points to the importance of considering ecological cultural validity (e.g., the extent to which toys and animal puppets, or resources in general, are valued by the sampled culture) in conducting cross-cultural investigations. Future work should investigate these possibilities more directly.

Our wealth and inequality questionnaires allowed us to explore the extent to which environmental variables shape our prosocial behavior, thus pointing to a source of individual differences in early prosociality. Prior work has found that social class is related to altruistic giving, although the direction of effects varies across studies. Our results find that family wealth did not relate to children's prosocial behavior, once we controlled for other correlates of objective wealth, namely, subjective wealth and beliefs that inequality was a problem. However, we note that our measures of household assets differ markedly from measures used in prior work, and also that given the high degree of poverty in the sample population, there was not a large range of objective wealth.

The extent to which mothers believed that inequality was a problem predicted children's rates of opting into low-cost resource sharing, pointing to the potential for parent-child transmission of prosocial behavior. Though this is a speculative possibility, one mechanism may be that mothers who discuss, emphasize, and elaborate on issues of inequality may also have children who are more willing to expend resources to comfort someone in distress. Such a possibility would be consistent with work generally finding that parent-child discussion of others' mental states predicts empathetic helping in toddlers (e.g., Drummond et al., 2014). Future work should include measures of parents' discussion of inequality in order to more directly study how parent-child conversation surrounding inequality shapes children's own beliefs, and subsequently, their prosocial behavior.

Finally, we found that subjective local wealth predicted instrumental helping, even when controlling for other class variables, suggesting that children of mothers who perceived themselves as having more than others around them were also more willing to help others attain their goals. One possibility is that these children were more familiarized with instrumental helping and reciprocal exchanges (Cortes Barragan and Dweck, 2014), since mothers may have felt more obligation to help others more generally.

In summary, this investigation points to the value of including understudied populations in developmental work, both as a way to validate existing theories on prosociality and as a way to explore potential individual differences more generally. In general, we replicate past work, and also join recent efforts in exploring the impact of culture on the diversity of prosocial behavior. In studying how previously documented effects do and don't hold across cultures, we hope to impart the need for broader, and more representative samples within developmental psychology.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of Boston University and the ethics board at ERES Converge in Zambia (protocol number 2013Dec010) with written informed consent from all participants or guardians (for child participants). All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Boston University and the ERES Converge in Zambia (protocol number 2013Dec010). Parental consent was obtained for child participants.

# AUTHOR CONTRIBUTIONS

All authors designed the studies and questionnaires. PR led data collection in Zambia. NC conducted data analyses and wrote the first draft of the manuscript. PB provided critical revisions to the manuscript. All authors provided revisions and suggestions to the manuscript.

# FUNDING

The data presented in this paper were collected as part of a larger public health study that was funded through a grant from the Policy Research Fund at the Department for International Development (DFID), United Kingdom (55204321).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02209/full#supplementary-material

# REFERENCES



Steinbeis, N., and Over, H. (2017). Enhancing behavioral control increases sharing in children. J. Exp. Child Psychol. 159, 310–318. doi: 10.1016/J.JECP.2017.02.001


Warneken, F., and Tomasello, M. (2009). Varieties of altruism in children and chimpanzees. Trends Cogn. Sci. 13, 397–402. doi: 10.1016/J.TICS.2009.06.008

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Chernyak, Harvey, Tarullo, Rockers and Blake. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Motivating Moral Behavior: Helping, Sharing, and Comforting in Young Children With Autism Spectrum Disorder

#### Kristen A. Dunfield<sup>1</sup> \*, Laura J. Best<sup>2</sup> , Elizabeth A. Kelley<sup>2</sup> and Valerie A. Kuhlmeier<sup>2</sup>

<sup>1</sup> Department of Psychology, Concordia University, Montreal, QC, Canada, <sup>2</sup> Department of Psychology, Queen's University, Kingston, ON, Canada

This exploratory study examined the role of social-cognitive development in the production of moral behavior. Specifically, we explored the propensity of children with Autism Spectrum Disorders (ASD) to engage in helping, sharing, and comforting acts, addressing two specific questions: (1) Compared to their typically developing (TD) peers, how do young children with ASD perform on three prosocial tasks that require the recognition of different kinds of need (instrumental, material, and emotional), and (2) are children with ASD adept at distinguishing situations in which an adult needs assistance from perceptually similar situations in which the need is absent? Children with ASD demonstrated low levels of helping and sharing but provided comfort at levels consistent with their TD peers. Children with ASD also tended to differentiate situations where a need was present from situations in which it was absent. Together, these results provided an initial demonstration that young children with ASD have the ability to take another's perspective and represent their internal need states. However, when the cost of engaging in prosocial behavior is high (e.g., helping and sharing), children with ASD may be less inclined to engage in the behavior, suggesting that both the capacity to recognize another's need and the motivation to act on behalf of another appear to play important roles in the production of prosocial behavior. Further, differential responding on the helping, sharing, and comforting tasks lends support to current proposals that the domain of moral behavior is comprised of a variety of distinct subtypes of prosocial behavior.

Keywords: prosocial behavior, autism spectrum disorder, moral development, social-cognitive development, helping, sharing, comforting

# INTRODUCTION

Humans are a hyper-social species. Other-regarding concerns permeate most human interactions and social structures. Indeed, the ability and willingness to act on behalf of others has important implications for well-being in contexts that range from children's successful entry into peer culture (Wentzel, 2014), to the functioning of society as a whole (Tomasello, 2009). Over the last decade, there has been considerable interest in identifying the developmental origins of our tendency to act in ways that benefit others, potentially at a cost to ourselves.

#### Edited by:

Jessica Sommerville, University of Washington, United States

#### Reviewed by:

Arber Tasimi, Stanford University, United States; Simpson W. L. Wong, Hong Kong Baptist University, Hong Kong

#### \*Correspondence:

Kristen A. Dunfield kristen.dunfield@concordia.ca; kristen.dunfield@crdh.concordia.ca

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 28 August 2018; Accepted: 07 January 2019; Published: 23 January 2019

#### Citation:

Dunfield KA, Best LJ, Kelley EA and Kuhlmeier VA (2019) Motivating Moral Behavior: Helping, Sharing, and Comforting in Young Children With Autism Spectrum Disorder. Front. Psychol. 10:25. doi: 10.3389/fpsyg.2019.00025


The term prosocial behavior typically refers to any action one individual engages in to benefit another (Hay, 1994). Though this definition hints at the diversity of actions that fit this characterization, it largely treats prosocial behavior as a unitary construct requiring a single developmental explanation. Importantly, this broad definition has led to mixed success determining when prosocial behaviors first emerge (Dunfield et al., 2011), the developmental trajectories that prosocial behaviors follow (Dunfield and Kuhlmeier, 2013), what neural and behavioral correlates support its production (Paulus et al., 2013; Steinbeis, 2018), and how individual differences affect its production (Beier et al., 2018; Schachner et al., 2018; see Eisenberg et al., 2015b for a recent, broad review). As such, there has been a move to clarify the varieties of ways people can act on behalf of others and identify the unique constraints imposed by each type of prosocial response. Importantly, as the field moves toward a more nuanced understanding of the factors that support the early development of prosocial behavior, there is still striking homogeneity in the participants studied (largely neurotypical participants from WEIRD cultures: e.g., Eisenberg et al., 2015b). The current research explores early prosocial behavior in a unique participant population, namely individuals diagnosed with Autism Spectrum Disorder (ASD).

Autism Spectrum Disorder is a neurobiological disorder characterized by impaired social behavior, communication, and language difficulties, in addition to restricted, repetitive behaviors and/or interests (American Psychiatric Association, 2013). Children with ASD show reduced attention to, and gain less reinforcement from, shared social attention and interactions (Dawson et al., 2004), which is thought to result in impairment in social cognition more broadly (Chevallier et al., 2012). Specifically, the ability to recognize and understand others' mental (theory of mind) and emotional (affect recognition) states appears delayed in children with ASD (e.g., Charman et al., 1998; Dyck et al., 2001). Thus, exploring the prosocial tendencies of young children with ASD presents a unique opportunity given documented deficits in social cognition and questions regarding the social motivation of affected individuals.

# Varieties of Prosocial Behavior

Much prosocial behavior, especially early in development, involves intervening when another individual is experiencing a problem. Effectively intervening on behalf of another requires the ability to take their perspective and notice that they are having trouble, the recognition of the cause of the problem, and the motivation to act to resolve the problem. If one fails to navigate any of these three challenges, an effective prosocial behavior is unlikely to be produced. With these constraints in mind, there are at least three varieties of negative states that individuals are likely to face and regularly resolve for others, namely, instrumental need, material desire, and emotional distress.

#### Helping

Instrumental need occurs when an individual is unable to complete goal directed behavior. Helping is the term we use to refer to other-oriented acts aimed at alleviating another's instrumental need. In the now classic out-of-reach helping paradigm (Warneken and Tomasello, 2006), toddlers observe an experimenter hanging clothes on a line. As the experimenter works through their chore, they drop a clothespin where they cannot reach it, giving the child an opportunity to help by retrieving the required item.

#### Sharing

Material desire occurs when an experimenter does not have access to a desired resource. Sharing refers to behaviors intended to alleviate material desire in another by relinquishing control of a good. Children's responses to material desire have been assessed in a variety of ways that range from naturalistic observations (e.g., Hay et al., 1999) to structured economic-style games (e.g., Fehr et al., 2008). Typically, children are placed in situations of either resource abundance or scarcity and often explicitly prompted to make a decision about how available resources should be distributed.

#### Comforting

Finally, emotional distress occurs when an individual is experiencing negative arousal and can be resolved through comforting. Comforting has been defined and examined in a number of ways that range from assessing children's concerned attention toward emotional displays (e.g., Spinrad and Stifter, 2006) to children's ability to approach and offer physical comfort to a distressed individual (e.g., Svetlova et al., 2010). In the current study, to facilitate comparisons across tasks, we will focus on overt responses to others' negative emotions, such as verbal (e.g., kind words) or physical (e.g., pats, hugs, or kisses) behaviors, instead of more ambiguous responses such as concerned attention, which could reflect either personal distress or other-oriented concern (Eisenberg et al., 1991).

# Prosocial Behavior in Typically Developing Children

Importantly, because responding to each of these distinct needs requires different initial assessments, and the underlying social cognitive abilities emerge at different ages, we should not necessarily predict consistency regarding when in development each of these behaviors will occur nor how individual differences will affect each variety of behavior. Previous research on children's social cognitive development suggests that within the first year of life infants can interpret goal-directed action (e.g., Woodward, 1998), differentiate between intentional and accidental outcomes (e.g., Behne et al., 2005), and shortly thereafter begin to correct unintended outcomes (e.g., Meltzoff, 1995), suggesting that around their first birthday infants have the representational capacity to recognize and respond to instrumental needs. Consistent with this proposal, helping has been observed as early as 14 months (Warneken and Tomasello, 2007) and is enacted robustly in a variety of circumstances by 18 months (Warneken and Tomasello, 2006).

Sharing, on the other hand, involves the recognition of and response to material desire. Previous research reveals that infants prefer equal distributions (and distributors; e.g., Geraci and Surian, 2011; Schmidt and Sommerville, 2011) and begin to offer others goods within the first year of life (Hay et al., 1999). However, true sharing, in which the good is given up entirely, does not emerge consistently until closer to the third birthday, and then typically only when others make their desire explicit (Brownell et al., 2013), the cost of sharing is low (e.g., Thompson et al., 1997; Moore, 2009), or the recipient is familiar (e.g., Hay, 1979; Paulus, 2016). Thus, the real challenge in addressing material desire may be the motivation to relinquish control of a desired good (e.g., Svetlova et al., 2010).

Finally, comforting involves the recognition of and response to a negative affective state. A major challenge associated with the production of comforting behavior is taking another's perspective and determining an appropriate response in an emotional domain. Relative to the other varieties of prosocial behavior, comforting has a longer history of theoretical consideration. In what is still a dominant perspective on the development of comforting, Hoffman (1982) proposed that comforting develops over four stages with increasing complexity ranging from simple emotional contagion in infancy to veridical empathic responses that emerge closer to the fourth birthday. Consistent with this proposal, the earliest instances of "comforting" typically involve measures of concerned attention as opposed to other-oriented actions (Spinrad and Stifter, 2006), and the ability to perceive emotional distress and respond to it is affected by the types of comfort one has experienced over the course of their early life (e.g., Johnson et al., 2013; Dunfield and Johnson, 2015; Gross et al., 2017; Beier et al., 2018).

By distinguishing between the three varieties of negative states and focusing on the initial assessment the child is forced to make, researchers have demonstrated unique ages of onset, with helping and sharing preceding comforting (Svetlova et al., 2010; Dunfield et al., 2011; Paulus et al., 2013), unique developmental trajectories and uncorrelated patterns of production (Dunfield and Kuhlmeier, 2013; Sommerville et al., 2013; Eisenberg et al., 2015a), and variability associated with individual differences (Beier et al., 2018; Knafo-Noam et al., 2018; Schachner et al., 2018). These findings are consistent with the idea that varieties of other-oriented behavior show distinct developmental trajectories due to the differential development of the underlying social cognitive abilities (e.g., Dunfield, 2014; Paulus, 2018). Importantly, because the production of prosocial behavior is thought to require the coordination of social understanding and motivation, critical insights can be gleaned from exploring these behaviors in atypical developmental samples.

# Prosocial Behavior in Children With ASD

Previous research reveals that some of the social cognitive abilities that underlie successful prosocial behavior are intact in young children with autism. For example, children with ASD appear to understand others' actions on objects (Aldridge et al., 2000; Carpenter et al., 2001), suggesting that they may be able to represent others' instrumental needs. In contrast, documented deficits in effortful control – particularly when inhibiting a prepotent response (Hill, 2004) – may make it difficult for children with ASD to relinquish control of a resource in order to alleviate another's material desire. Further, previous research has demonstrated that impairments in emotion recognition throughout the lifespan may make the production of an effective comforting response uniquely difficult (see Bons et al., 2013 for a review). Taken together, there is reason to believe that social cognitive deficits associated with ASD may differentially impact an individual's ability and willingness to act on behalf of another.

Few studies have examined the prosocial abilities of children with ASD. When assessed using the Strengths and Difficulties Questionnaire, parents and teachers reported reduced prosocial behavior in children and adolescents with ASD relative to typically developing (TD) participants (Iizuka et al., 2010; Jones and Frederickson, 2010; Russell et al., 2012). When assessed experimentally, 10- to 13-year-old children with ASD were found to engage in simple helping and sharing but at a lower rate than developmentally delayed control groups (Sigman et al., 1999; Travis et al., 2001). Importantly, these studies either examined a single prosocial behavior or combined helping and sharing into a single prosocial score, making it impossible to separately consider the two varieties of prosocial behavior, which may develop independently and, thus, may have occurred with different frequencies. Moreover, because these studies examined school-aged children and compared them to a developmentally delayed (DD) control group, their findings cannot speak to the emergence of prosocial behavior or to differences relative to TD participants.

In relation to comforting, Sigman et al. (1992) reported the behavioral responses of 3.5-year-old children with ASD to another's distress, with the highest rating being "intense affective involvement and/or comforting behavior." Only 10 and 6% of children with ASD were rated as comforting a parent and experimenter, respectively (Sigman et al., 1992). Further, when assessing the responses of 6- and 7-year-old children with ASD to another's emotional distress using a combination of questionnaires and online prosocial tasks, Deschamps et al. (2014) found that although children with ASD struggled with cognitive empathy (i.e., attributing a mental state to another) and social responsiveness relative to TD children, they showed similar performance on measures of affective empathy (i.e., experiencing an emotion congruent with another's experience) and computer-mediated prosocial behavior. Importantly, deficits in concern for others appear to develop early. Indeed, by 20 months of age, children at risk for developing ASD are already showing diminished concern for others relative to TD peers (Charman et al., 1997).

More recently, two experimental studies have examined the ability of young children with ASD to engage in a variety of prosocial behaviors. Paulus and Rosal-Grifoll (2017) compared helping and sharing in 3- to 6-year-old TD participants and participants with ASD. To assess helping, participants watched as an experimenter accidentally knocked a jar of pens onto the floor as they left the testing room. Sharing was assessed using a resource allocation paradigm in which participants were given 10 stickers to distribute between themselves and two hypothetical recipients (one rich and one poor). The authors report higher rates of helping in the ASD population than in the TD population. Further, while TD participants shared resources equally, ASD participants tended to give the majority of their resources away. Importantly, though this study demonstrates surprising prosocial abilities in young children with ASD, it is unclear to what extent participants were motivated by the recipient's need. Specifically, neither task included a control condition, leaving open the possibility that the participants were enacting a learned script (e.g., if things are on the floor, pick them up; see Warneken and Tomasello, 2006). Relatedly, because participants had to infer the correct response either in absence of the recipient (i.e., the helping task), or in absence of any non-verbal cues to need (i.e., sharing with cardboard cut outs), it is difficult to determine the extent to which participants are responding to the other's needs.

Most relevant to the current study, Liebal et al. (2008) presented both 2- to 5-year-old children with ASD and a DD control group the opportunity to help and to cooperate with an experimenter. The helping task replicated the design of Warneken and Tomasello (2006) and included an experimental (need present) and control (need absent) condition. Though children with ASD clearly recognized and responded to another's need (i.e., retrieving the object more frequently in the experimental as opposed to control condition), participants with ASD were less likely to help than their DD counterparts. Together, the extant literature suggests that children with ASD have the ability to recognize and respond to each of the three types of negative states. However, it is not clear when these abilities emerge or how frequently these behaviors are produced relative to each other, and importantly, relative to TD children.

# The Current Study

The current, exploratory study will contribute to our understanding of prosocial abilities in children with ASD by addressing two fundamental questions: (1) across the three varieties of prosocial behavior, are children with ASD adept at distinguishing situations in which an adult needs assistance from perceptually similar situations in which the need is absent?, and (2) compared to TD peers, how do children with ASD perform on tasks that require the recognition of the three different types of need (i.e., instrumental need, material desire, and emotional distress)? We recruited ASD participants with a non-verbal mental age of 3 years to facilitate meaningful comparison with Liebal et al. (2008) and because previous research suggests helping, sharing, and comforting are all within the behavioral repertoire of 3-year-old TD children when presented with similar tasks (Dunfield et al., 2011; Dunfield and Kuhlmeier, 2013). Because of the exploratory nature of this research, and the fact that children with ASD have both social cognitive and motivational deficits that may impede the production of prosocial behavior, the mechanism underlying any group differences in the production of prosocial behavior will be difficult to interpret. We hypothesize that, due to impaired social motivation, children with ASD will produce less prosocial behavior than their TD peers across all tasks. Moreover, because the different varieties of prosocial behavior impose different cognitive constraints, children with autism may show patterns of production that vary across tasks and differ from the developmental trajectories we have previously observed in TD children (e.g., with helping emerging first followed by sharing then comforting). Should participants with ASD show a unique pattern of production of helping, sharing, and comforting, it could highlight important avenues for new research into the ways in which social cognitive development affects the production of prosocial behaviors.

# MATERIALS AND METHODS

# Participants

Twenty-eight children participated in this study. Our sample consisted of 14 children with a diagnosis of ASD and 14 typically developing children (TD; see **Table 1** for details regarding the two samples). Participants in the ASD group had been formally diagnosed with ASD by a pediatrician and/or a psychologist based on the DSM-IV criteria. Diagnoses were confirmed in our lab using the Autism Diagnostic Observation Schedule (ADOS)<sup>1</sup>. All ASD participants met the criteria for ASD with eight participants meeting the more stringent cutoff for Autism. Participants in the TD group had no history of medical or developmental diagnoses, nor did they have a family history of ASD. Groups were matched on non-verbal mental age because the prosocial assessment did not necessarily require verbal output and previous research suggests verbal mental age may underestimate the abilities of children with ASD (Burack et al., 2002). Six additional children with ASD were excluded from the final sample due to their chronological and/or mental ages exceeding the testing age range (n = 4) or failure to confirm ASD diagnosis using the ADOS (n = 2). No TD participants were excluded. Participants were recruited from a small southeastern Ontario city and spoke English as their primary language.

# Measures

#### Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2002)

The ADOS is a standardized tool used to evaluate and diagnose children on the autism spectrum. All participants with ASD participated either in Module 1 or 2 of this test to confirm their existing diagnoses.


<sup>a</sup>Groups showed no statistical differences.

<sup>b</sup>ADOS score: Autism cut-off minimum score of 12 (Module 1 and 2), Autism Spectrum Disorder cut-off minimum score of 8 (Module 2), or 7 (Module 1).

<sup>1</sup>All children had an existing diagnosis of ASD upon participation in our study. Diagnoses were then confirmed using the Autism Diagnostic Observation Schedule (ADOS). Given that diagnostic stability of an ASD is questionable during the preschool years (Kleinman et al., 2008) and that the ADOS is not effective at differentiating between diagnoses on the spectrum (e.g., McConachie et al., 2005), we elected to include children with a diagnosis anywhere on the autism spectrum. According to ADOS scores, 8 children with ASD exceeded the cutoff for Autism, and the remaining 6 children met only the ASD cutoff score.

#### Mullen Scales of Early Learning (MSEL; Mullen, 2005)

The MSEL is a standardized test to evaluate the development (language, motor, visual abilities) of children from birth through age 69 months. Given that the tasks included in the present study were largely non-verbal, age-equivalent scores on the visual reception domain (an indicator of non-verbal IQ) were used to individually match participants from each group on mental age. That said, all children also completed the language subscales of the MSEL and there were no significant differences in language abilities across groups.

# Procedure and Design

Participation involved two visits for children with ASD and, in general, one visit for TD children, with each session lasting approximately 45 min. Four TD children required two visits due to their involvement in an additional study; these children engaged in the experimental task during the first visit and the MSEL during the second. The remaining TD children completed the experimental procedure, followed by the MSEL in a single visit. Nine of the children with ASD also participated in an additional study. Most children with ASD (n = 9) completed the experimental procedure and the MSEL in one visit and the ADOS in the other. On average, the timeline between children's first and second visits was 3 weeks (M = 21.88 days, SD = 26.92).

In order to explore the prosocial tendencies of children with ASD, participants engaged in a play-based experiment that assessed their responses to instrumental needs, material desires, and emotional distress. In addition, joint attention, intention understanding, and imitation were assessed but are not reported here. Interspersing prosocial trials within other tasks allowed us to present the prosocial tasks in a manner that appeared credible and somewhat natural. Caregivers who opted to accompany their child into the testing room were seated behind the participant and instructed not to influence or encourage their child's responses toward the experimenter; however, they were allowed to comfort their child if the child approached them in distress.

#### Prosocial Assessment

Replicating Dunfield et al. (2011), children were presented with two opportunities to engage in each of the three varieties of prosocial behavior (helping, sharing, and comforting). For each of the three negative states, children were presented with two varieties of trials. In the experimental condition, the experimenter demonstrated her negative state (e.g., outstretched arm) whereas in the control condition the experimenter engaged in a perceptually matched display that did not demonstrate need (e.g., placing the toy on the ground). By attempting to match the two displays as closely as possible we can ensure that any differences between the two conditions reflect specific responses to the observation of need. The control trial for a specific prosocial task never immediately followed or preceded the corresponding experimental trial. The order of presentation of experimental and control prosocial trials was counterbalanced in four orders such that for half of the participants, the experimental trial was seen before its respective control trial. In all experimental prosocial trials, the experimenter never verbally requested aid.

### Instrumental Need

Helping was elicited using an "out of reach" task that conceptually replicated the "clothespin" task from Warneken and Tomasello (2006). In this task, an experimenter (E1) picked up a small plastic toy and playfully walked it across the table. In the experimental condition, she dropped the toy over the edge of the table and said "oops!" while reaching for it. E1 reached with an outstretched arm and hand gesturing toward the toy for 5 s, and she then alternated her gaze between the toy and the child for 5 s until the participant provided a response or the trial ended (i.e., 10 total seconds had elapsed). In the control condition, E1 deliberately placed the toy on the floor and said "there!", folding her arms on the edge of the table. E1 held this pose with a neutral expression for 10 s or until the child provided a response. Trials ended when 10 s elapsed or the child engaged in helping behavior, which consisted of retrieving the toy and giving it to E1. Observed non-target behaviors included ignoring the toy, playing with the toy, or explicitly refusing to assist the experimenter.

### Material Desire

Sharing was elicited using an "unequal snack" task (Dunfield et al., 2011). Prior to sharing trials participants were told that they would be getting a snack. A second experimenter (E2) entered the testing room with two small plastic containers that contained either cheese flavored or graham crackers (based on the parent's prior selection). E2 always offered E1 her snack first, holding the container out so both the participant and E1 could see the contents. When E1 received her snack she showed the contents to the participant and remarked, "Look what I have." After observing E1's snack, the child was given their container. In experimental trials, E1 received no treats while the participant received four. E1 made a sad face and placed a hand outstretched, palm up in a requesting gesture. She gazed down at her hand for 5 s then alternated her gaze between her hand and the participant for 5 s. In the control condition, both E1 and the participant received two treats. E1 waited for the child to receive their treats before she began consuming hers while gazing at the child with a neutral expression. Trials ended when the participant shared by offering E1 one or more of their treats (i.e., they engaged in prosocial behavior), failed to share by consuming all their treats or by explicitly denying the experimenter (i.e., saying no or picking up the treats and creating distance), or when 10 s elapsed.

### Emotional Distress

Comforting was elicited using a "physical harm" task (Dunfield et al., 2011). In this task E1 banged her knee against the edge of a low table making a loud noise. In the experimental condition, E1 then sat down with a look of distress on her face and rubbed her knee while vocalizing pain "oh! my knee, I banged my knee!". For the first 5 s the experimenter looked down at her knee, and then for the next 25 s she alternated her gaze between her knee and the participant. In the control condition, the experimenter simply sat down and looked toward the participant for 30 s with a neutral expression on her face. Because previous research suggests that 10 s may not provide enough time for the participants to respond to emotional distress (see Discussion, Dunfield et al., 2011), we report two comforting assessments. First, to allow comparisons to other tasks, we report participants' responses following 10 s of emotional distress. Second, we report participants' responses to emotional distress over a 30 s period. Trials ended when the participant comforted the experimenter or 30 s elapsed.

To account for the diversity of appropriate comforting responses, in both the experimental and control condition, children were evaluated for their engagement in approach comforting behavior (e.g., walking over to the experimenter to see if she is alright, kissing the experimenter's knee) and non-approach comforting behavior (e.g., vocalizing concern for the experimenter from a distance, providing instructions for help, directing the caregiver's attention toward the experimenter to help her). Importantly, for instances of both approach and non-approach behavior, only behaviors aimed at alleviating the distress of the experimenter were considered 'comforting.' Vocalizations and approaches that did not serve to provide comfort were coded as non-target behavior. Other non-target behaviors included approaching the caregiver without drawing attention to E1 (e.g., self-soothing), ignoring or failing to respond to E1 at all, and actively refusing to assist her.

For each of the prosocial tasks, regardless of condition, participants received a score of 1 if they produced the target behavior and a score of 0 otherwise. Two ASD participants did not receive all six prosocial trials due to their disengagement from testing: one participant did not provide data for a comforting control trial and both sharing trials, and the other participant did not provide data for a helping control trial and two sharing trials. As a result, 13 ASD participants were included in the helping and comforting analyses and 12 ASD participants were included in the sharing analysis.

# Coding and Data Analysis

Each session was coded by a research assistant who was blind to the purpose and hypotheses of the study. A second blind coder coded a subset of the videos (10 videos, 35%) to measure inter-rater reliability. Inter-rater agreement ranged from strong to almost perfect across all prosocial tasks (Helping: Experimental κ = 1.00, Control κ = 1.00; Comforting: Experimental κ = 0.86, Control κ = 1.00; Sharing: Experimental κ = 1.00, Control κ = 1.00; McHugh, 2012).
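As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen's κ for two coders who each assign a binary score (1 = target behavior, 0 = otherwise) to the same trials. The function name `cohens_kappa` and the ten example scores are hypothetical and are not the study's coding data.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    n = len(coder_a)
    # Observed agreement: proportion of trials given the same code.
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions,
    # summed over categories.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(counts_a) | set(counts_b)
    p_exp = sum(counts_a[k] * counts_b[k] for k in categories) / n ** 2
    if p_exp == 1:  # both coders used one identical category throughout
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical codes for 10 double-coded videos (not the study's data).
primary = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
secondary = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(cohens_kappa(primary, secondary))
```

With a single disagreement in ten trials, κ falls noticeably below the raw 90% agreement because chance agreement is subtracted out.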

# RESULTS

Due to the predominance of male participants, we did not assess gender as an independent variable. Additionally, because of the small sample and four counterbalanced orders there was not sufficient power to determine if order affected participants' responses. Importantly, previous research employing the same tasks and design did not observe effects of gender or order, suggesting these variables are unlikely to have a meaningful effect in the current sample (see Dunfield et al., 2011).

# Instrumental Need

To assess children's responsiveness to an experimenter's instrumental need, we compared the frequency of helping across the experimental and control conditions. If participants are sensitive to the experimenter's need, they should help more frequently in the experimental than in the control condition. TD participants helped more in the experimental versus control condition (McNemar test, 1, N = 14, p = 0.02, **Figure 1A**), whereas the difference between conditions was not significant for participants with ASD (McNemar test, 1, N = 13, p = 0.25, **Figure 1B**). In the experimental condition, seven TD participants (50%) helped, in contrast to three participants with ASD (21.4%). Importantly, no participants from either group retrieved the toy in the control condition. Although the ASD participants did not help significantly more in the experimental condition than the control condition, the number of ASD participants offering help in the experimental condition was not significantly different from the number of TD participants who helped in the experimental condition [χ<sup>2</sup>(1, N = 28) = 2.49, p = 0.12; **Figure 2**].
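The within-group comparisons above can be sketched as an exact (binomial) McNemar test on the discordant pairs. Whether the authors used the exact or the chi-square variant is not stated, so the exact form is an assumption here, and the function name `mcnemar_exact` is illustrative. Because no child helped in the control condition, every helper is a discordant pair in the same direction: 7 versus 0 for the TD group and 3 versus 0 for the ASD group.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: participants who responded in the experimental trial only
    c: participants who responded in the control trial only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(mcnemar_exact(7, 0))  # TD helping: 0.015625
print(mcnemar_exact(3, 0))  # ASD helping: 0.25
```

Under these assumptions the sketch gives p ≈ 0.016 and p = 0.25, in line with the rounded values reported for the two groups. With only three discordant pairs, the smallest attainable two-sided p-value is 0.25, so the ASD comparison could not have reached significance at this sample size.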

# Material Desire

To assess children's response to material desire, we compared the frequency of sharing across the experimental and control conditions. Children who are sensitive to another's material desire are expected to share more in the experimental than in the control condition. TD participants shared more in the experimental condition than in the control condition (McNemar test, 1, N = 14, p = 0.001, **Figure 1A**). Eleven TD participants (78.6%) offered E1 treats in the experimental condition whereas no participants offered their treats in the control condition. In contrast, ASD participants did not share significantly more frequently in the experimental condition (4, 33.3%) than in the control condition (0, 0%; McNemar test, 1, N = 12, p = 0.12, **Figure 1B**). When comparing sharing rates in the experimental condition across the two groups, TD participants were significantly more likely to share than participants with ASD [χ<sup>2</sup>(1, N = 26) = 5.42, p = 0.02; **Figure 2**].
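The between-group comparison above can be sketched as a Pearson chi-square test on a 2 × 2 table of group by response. The sketch assumes no continuity correction, which is an assumption rather than something the text states, although it reproduces the reported statistic. The function name `chi_square_2x2` is illustrative; the counts (11 of 14 TD and 4 of 12 ASD participants sharing) come from the paragraph above.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Rows: TD, ASD. Columns: shared, did not share (experimental trial).
stat, p_value = chi_square_2x2(11, 3, 4, 8)
print(round(stat, 2), round(p_value, 2))  # 5.42 0.02
```

Applying the same function to the helping counts (7 of 14 versus 3 of 14) gives a statistic of about 2.49, matching the value reported for instrumental need.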

# Emotional Distress

Children's sensitivity to emotional distress was assessed by comparing comforting behavior in the presence (Experimental) or absence (Control) of distress cues. Children who are sensitively responding to another's distress are expected to comfort more in the experimental than control condition. Importantly, previous research suggests that children may take longer to respond to emotional cues relative to instrumental need or material desire. To that end, we are reporting two comforting analyses. First, in order to facilitate comparison with the other prosocial tasks, we will report comforting following a 10 s response window. Second, to ensure responses aren't underestimated due to a short response window, we will report comforting behavior following 30 s.

Within the first 10 s, participants in both groups offered little comfort and did not differentially comfort across the two conditions (TD: McNemar test, 1, N = 14, p = 0.63; ASD: McNemar test, 1, N = 13, p = 0.50). However, when assessed over the full 30 s, both groups comforted more in the experimental than control condition (TD: McNemar test, 1, N = 14, p = 0.03, **Figure 1A**; ASD: McNemar test, 1, N = 13, p = 0.02, **Figure 1B**). At least half of the participants in each group offered comfort in the experimental condition (TD: 7, 50%; ASD: 9, 64.3%; **Figure 1**), whereas only one ASD and one TD participant offered comfort in the control condition (ASD: 7.7%; TD: 7.1%). The two participants who comforted in the control condition also comforted in the experimental condition. The ASD participant comforted within the first 10 s; the TD participant did so after 10 s but before the response period ended. The two groups offered comfort at statistically comparable rates following the 30 s response period [χ<sup>2</sup>(1, N = 28) = 0.58, p = 0.45; **Figure 2**].

FIGURE 1 | Percent of Typically Developing participants (A) and participants with ASD (B) who responded to instrumental need, material desire, and emotional distress by condition. All participants were given up to 10 s to respond except where noted (i.e., Comfort 30 s).

# Relations Between Prosocial Tasks

The majority of TD participants produced two prosocial behaviors (8, 57.1%) whereas the majority of participants with ASD produced none (4, 33.3%) or one (4, 33.3%), though the distribution of number of prosocial behaviors produced did not differ across the two groups [χ<sup>2</sup>(3, N = 26) = 5.42, p = 0.14; **Figure 3**]. In the TD group, none of the prosocial tasks were associated (φ's −0.17 to 0.17, p's > 0.51). In contrast, in the ASD group helping and sharing were associated with each other (φ = 0.82, p = 0.005) but not with comforting (φ's < 0.24, p's > 0.48). Importantly, the relation between helping and sharing in the ASD group is more likely due to the infrequency with which either behavior was produced than an actual relation between the tasks.
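The association coefficients reported above (phi, for two binary variables) can be sketched as follows. The paper does not report the joint helping-by-sharing table, so the per-child scores below are one hypothetical arrangement consistent with the reported ASD marginals (3 helpers and 4 sharers among 12 children); the function name `phi_coefficient` is likewise illustrative.

```python
from math import sqrt

def phi_coefficient(x, y):
    """Phi coefficient between two equal-length lists of 0/1 scores."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both behaviors
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # first only
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # second only
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # neither
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")

# Hypothetical scores: every helper also shares, plus one extra sharer.
helping = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sharing = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(round(phi_coefficient(helping, sharing), 2))  # 0.82
```

In this arrangement eight of the twelve children produce neither behavior, and those shared zeros drive most of the coefficient, which illustrates why a large φ can arise from infrequent responding rather than from a substantive link between the tasks.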

# DISCUSSION

The goal of the present study was to explore the prosocial tendencies of young children with ASD by presenting tasks that involved recognizing and responding to three different varieties of need: instrumental, material, and emotional. Specifically, we examined the ability of children with ASD to distinguish situations in which an adult needs assistance (experimental condition) from perceptually similar situations in which needs are absent (control condition). We further explored prosocial motivation by comparing the frequency with which children with ASD responded to instrumental need, material desire, and emotional distress relative to mental-age-matched TD peers. We found that despite well-documented social cognitive impairments, young children with ASD were often willing and able to engage in appropriate prosocial behavior. Like their TD peers, children with ASD differentiated situations in which a need was present from situations in which a need was absent insomuch as prosocial behavior was observed only once over the course of 39 control trials.

Importantly, the picture of competence is somewhat more complicated. Though children with ASD almost never offered assistance when it was not required, they also rarely offered assistance when it was required. There was a trend toward offering more assistance in the experimental over control conditions in response to both instrumental need and material desire; however, the difference only reached statistical significance for emotional distress. Further, when comparing the frequency with which children with ASD and TD children produce prosocial behavior, we found similar rates of helping and comforting but reduced rates of sharing. Finally, when we examined the varieties of other-oriented behaviors produced, the majority of participants with ASD produced none or one prosocial behavior, whereas the majority of TD participants produced two. Interestingly, though none of the varieties of prosocial behaviors were associated in the TD participants, the low frequency of helping and sharing led to an apparent correlation in children with ASD. What can we make of this unique pattern of production of helping, sharing, and comforting in young children with ASD and how does it help us understand the role of social cognition and motivation in the production of early prosocial behaviors?

Below, we will review the findings for each of the three varieties of prosocial behavior in relation to past research and then interpret our current results in light of theoretical proposals regarding the nature of early prosociality. Throughout, given the exploratory nature of this research, we will highlight future research directions suggested by these results. In general, these findings support and expand existing research by demonstrating that despite marked social impairments, children with ASD do act prosocially in response to other's needs and can do so in situations that extend beyond helping and cooperating (Liebal et al., 2008; Paulus and Rosal-Grifoll, 2017). Moreover, when we compare the pattern of responses across the TD and ASD participants, our results support the proposal that different types of prosocial acts are best understood as unique behaviors that depend on distinct social cognitive skills and motivations rather than a homogenous family of actions, as the amount of the prosocial behavior displayed, and relations between tasks, varied depending upon the kind of need that the experimenter was displaying and the participant's group status.

# Helping

Consistent with past research, TD participants readily recognized and responded to an experimenter's instrumental needs, yet they did so at a surprisingly low frequency relative to typical helping rates in methodologically similar studies (Dunfield et al., 2011; Dunfield and Kuhlmeier, 2013). Specifically, though Dunfield et al. (2011) found a similar rate of helping (50%) in 24-month-olds assessed using the identical experimental paradigm, the vast majority of 2- to 4-year-olds offered help in a highly similar task that afforded multiple helping opportunities (Dunfield and Kuhlmeier, 2013). Importantly, children with ASD engaged in similarly low levels of instrumental helping, with fewer than a quarter of affected children retrieving the out-of-reach toy for the experimenter. When looking at the overall rate of helping, the frequency with which our ASD group responded to instrumental need is approximately half that seen in the most comparable published study (Liebal et al., 2008). However, the number of children who helped exclusively in experimental trials was comparable across studies (about 20% of children with ASD in both cases). Two key methodological differences may account for the lower levels of helping observed in our sample. First, we included only one experimental trial, affording participants fewer opportunities to demonstrate their helpful capabilities. The different frequency of help offered across the two variants of Dunfield and colleagues' past work suggests this as a viable interpretation. Further, Liebal et al. (2008) reported the proportion of trials (out of two) that participants responded to whereas we report the proportion of participants responding. The divergent pattern of results suggests an important avenue for examining individual differences in the tendency to produce even the simplest prosocial behaviors, especially in atypical populations (Schachner et al., 2018).

Second, the behavior displayed by the ASD group suggests that our choice of stimuli may have reduced their likelihood of helping. Specifically, the provision of instrumental help appeared to come at a cost to the child, as it involved a desirable toy. Fifty-five percent of the children with ASD who did not provide help retrieved the toy but played with it themselves rather than giving it to the experimenter. In contrast, the target objects used by Liebal et al. (2008) were likely less interesting to the child (e.g., clothespin, pen) and potentially easier to part with (see also Paulus and Rosal-Grifoll, 2017). Indeed, the correlation between helping and sharing that was uniquely observed in the ASD group lends further support to the idea that the current task imposed unintended challenges associated with inhibiting a prepotent response, namely relinquishing a desired resource. Relatedly, because the experimenter was playing with the toy as opposed to using it in a more unambiguously goal directed manner – such as Liebal et al.'s (2008) experimenter who was using the pins to hang clothes – the experimenter's need may have been less clear to the participants with ASD. This methodological limitation provides support for proposals regarding the multifaceted nature of prosocial motivation (e.g., Paulus, 2018).

# Sharing

Replicating past research, TD participants recognized another's material desire and readily shared their treats when presented with unequal distributions (Dunfield et al., 2011; Dunfield and Kuhlmeier, 2013). When presented with the identical paradigm, 58% of 24-month-olds offered to share their resources with a needy experimenter. Consistent with this observation, 78.6% of our TD participants shared in the experimental condition. To our knowledge, the present study is the first to experimentally evaluate (i.e., by including a control trial) preschoolers with ASD for their propensity to recognize a social partner's material need and then act to alleviate that need by sharing some of their own material resources.

In the present study, approximately one third of children with ASD shared with the experimenter when they had an abundance of treats but she had none. In contrast, no participants shared their snack when both parties had equal portions, suggesting at least some recognition of the experimenter's material need in the former. Yet, relative to TD children, children with ASD shared at a much lower rate. This pattern of results is in striking contrast to Paulus and Rosal-Grifoll's (2017) finding that participants with ASD tended to give most of their resources away. Interpreting this difference is difficult because it is impossible to tell whether the low sharing rates were due to difficulties in perspective taking or low motivation. To the extent that ASD participants in our sample recognized material desire, it remains unclear as to how the children determined that a need was present, given that they had multiple cues (i.e., an unequal distribution, an outstretched hand, and a negative facial expression) available. Indeed, explicit instructions to distribute the resources may help explain the high levels of generous behavior observed in previous research. Unfortunately, the nature of our design does not permit us to comment on the mechanism underlying the reduced rate of sharing in children with ASD, which may relate to obstacles in need-detection, a motivational component, or the capacity to produce sharing behavior.

Previous work with TD populations shows that even when children know that they should share, and expect others to share, they have difficulty enacting norms of fairness (Blake, 2018). Relatedly, behavioral control is importantly and uniquely associated with sharing over and above concerns about fairness in TD participants (Steinbeis and Over, 2017). As highlighted above, our results leave open the possibility that relatively lower rates of prosocial behavior in general in our ASD sample could be due to lower social motivation, or increased difficulty inhibiting prepotent responses. Future research can seek to better understand the intersecting influence of social cognition and motivation in the production of prosocial behavior.

# Comforting

Similar to past research, the TD participants readily recognized and appropriately responded to the experimenter's emotional distress (Dunfield et al., 2011; Dunfield and Kuhlmeier, 2013). Specifically, previous research, using a shortened response period (i.e., 10 s) found no comforting in 24-month-old TD children (Dunfield et al., 2011), but with age, and a longer response period, 3- to 4-year-old TD children readily recognized and responded to another's emotional distress (Dunfield and Kuhlmeier, 2013). Importantly, children with ASD did so as well. Indeed, a majority of children with ASD comforted the experimenter (65%), either verbally (42.90%) or physically (14.30%), and were as likely to do so as their TD peers. Despite the scarcity of comparative research on comforting in young children with ASD, and disparate findings within the field of empathy, the current results were unexpected given the literature that characterizes ASD children and adults as tending to be poor at identifying others' emotional states (e.g., Dyck et al., 2001).

A counterintuitive, yet plausible, explanation is that having a reduced empathic response (Corona et al., 1998) may have actually benefitted children with ASD in the current study. While we take caution in interpreting the affective responses of our sample given that our measurement was only of observable/audible acts of comfort, children with ASD did not appear to treat the comforting trial as an emotionally laden task. The comforting behavior exhibited by children with ASD was largely instrumentally oriented (e.g., kissing the experimenter on the knee, asking if the experimenter needed a Band-Aid, providing direction) and lacked signs of personal distress. Thus, consistent with Corona et al. (1998), children with ASD in the current study did not appear averse to the situation itself, affording the opportunity to act in an other-oriented manner. Moreover, the automaticity of their responses suggests that these children with ASD had good knowledge of, or scripts for, what to do when someone is hurt, which were activated once the need for comfort was detected. Importantly, this tendency to engage in approach-oriented comforting behavior differed from the TD children who were more reserved in their provision of comfort. Specifically, none of the TD participants directly approached the experimenter; instead they tended to engage their caregiver or offer verbal reassurance.

Taken together, the experience of witnessing the experimenter hurt herself and determining an appropriate course of action may have been less emotionally taxing to children with ASD than TD children (for more discussion of the emotional cost of comforting, see Hoffman, 1982, 2000). It is possible that TD children may have experienced more emotional contagion when faced with the experimenter's distress resulting in more personal distress and less other-oriented behavior (Eisenberg et al., 1991). This interpretation, if accurate, again draws upon the distinction between 'empathy' and 'comforting' in that the latter need not require one to share in the emotional experience of distress.

Comparing children with ASD's responses to the three varieties of negative states – instrumental need, material desire, and emotional distress – may suggest an important insight into the combined effects of social cognitive understanding and motivation on the production of prosocial behavior in early childhood. Children with ASD may have perceived the comforting task as relatively 'cost-free,' as the experimenter could be comforted without having to relinquish ownership of a desired good. Specifically, to enact effective helping or sharing in the current tasks the child was required to relinquish control of desired goods (i.e., an action figure and preferred treats respectively); in contrast, there were no material costs associated with offering the pained experimenter verbal or physical support. Documented deficits in inhibitory control in ASD populations (Hill, 2004) support the proposal that the "cost" of helping and sharing in the current task may be higher for participants with ASD than TD participants. Because both the behavioral and emotional costs of responding to emotional distress appear to be lower for ASD participants than TD participants, especially relative to instrumental needs and material desire, it is possible that comforting behaviors most clearly reflect the other-oriented tendencies of children with ASD.

Another possibility is that the parameters of the comforting task facilitated the detection of the experimenter's need. Multiple factors can add to the complexity of this task, including the complexity of the need (instrumental, material, emotional) and the number of cues that are at children's disposal (verbal, nonverbal, situational; e.g., Svetlova et al., 2010). The comforting paradigm used in the present study offered children a clear verbal cue as to what had happened to the experimenter ("Oh! My knee. I banged my knee!"). Though the required intervention was not verbalized, this verbal marker of distress may have increased the saliency of the fact that the experimenter was hurt. Best efforts were made to ensure that the three tasks were comparable, and verbalizations were made during the helping and sharing trials, but these did not directly say what was 'wrong,' leaving children to rely on the non-verbal cues offered by the experimenter instead. Given that children with ASD are challenged in their understanding of non-verbal cues (Dyck et al., 2001), they may have been disadvantaged at detecting the experimenter's need in the helping and sharing trials relative to the comforting trials. The success of children with ASD on the comforting task may then be associated with the increased saliency of the experimenter's need, which, in turn, aided in their extraction of pertinent information about the situation and potentially activated a repertoire of scripted comforting responses. Under this account, knowing how to comfort the experimenter may not have been achieved through a true comprehension of the complex emotional need associated with being hurt, but through previous scaffolding of what one should do when a person indicates that they are hurt. Alternatively, participants had a longer response window in the comforting task relative to the helping and sharing tasks, and the relative performance of the two groups across the three tasks may suggest that prosocial behaviors take longer to mount for children with ASD. Future research is necessary to further investigate these complementary interpretations.

# Future Directions and Limitations

While the present study advances our understanding of the prosocial tendencies of young children with ASD, and of the role of social cognition and motivation in the production of prosocial behavior, it represents only the beginning stages of a movement toward properly appreciating what forms of prosocial behavior children with ASD can and will engage in, what cues are necessary to detect a social partner's need, and what developmental pre-requisites are needed to enable engagement in effective other-oriented behavior. Most likely, the constellation of results obtained in the present study suggests that a myriad of factors are implicated in the propensity of children with and without ASD to act prosocially.

The present study is exploratory and contains some broad limitations that highlight areas that may be addressed in future research. First, knowing how to act on another's behalf does not necessarily equate to understanding that person's need. Instead, having a repertoire of learned behaviors, or scripts, to draw upon when prosocial behavior is warranted may be sufficient. Likewise, the absence of prosocial behavior in some of our participants must not be interpreted as a lack of understanding about what to do, as multiple factors that were not measured here may have inhibited children's ability to demonstrate and communicate their understanding (e.g., motivation, use of gestures, motor planning, etc.). Finally, as recently discussed, it is notoriously difficult to determine the motivation that underlies atypical social cognitive performance in young children with ASD (Jaswal and Akhtar, 2018). Indeed, the propensity for children with ASD to readily offer an experimenter support when the costs of doing so are low suggests that even young children with ASD are not disinterested in the welfare of others. Future research employing novel physiological designs (e.g., pupil dilation, Hepach et al., 2017) may shed additional light on this important, open question.

# CONCLUSION

The current study has provided an initial demonstration that young children with ASD are able to distinguish situations in which help is warranted from those in which it is not and appear to be able to tap into pertinent knowledge about a person's intentions and desires. However, when the cost of engaging in prosocial behavior is high, children with ASD may be less inclined to engage in the behavior. Both the capacity to recognize another's need and the drive to act on behalf of another appear to play important roles in the production of prosocial behavior. Other variables, including the saliency of the indicators, behavioral control, and learned behaviors, may also support prosocial performance in children with ASD and require further exploration. Future investigation is needed to systematically delineate the individual and environmental correlates of prosocial behavior in this population and how this population can inform our understanding of the pervasive tendency for Homo sapiens to act on behalf of others.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Queen's University General Research Ethics Board. Parents of the participants provided written informed consent; all subjects additionally provided verbal or non-verbal assent. The protocol was approved by the Queen's University General Research Ethics Board.

# AUTHOR CONTRIBUTIONS

KD, LB, EK, and VK designed the study. LB collected the data and conducted preliminary analyses. KD conducted the primary analyses. KD, LB, EK, and VK all contributed to the writing of the current manuscript.

# FUNDING

This research was supported in part by grants from the Developmental Disabilities Consulting Program at Queen's University (EK). Preparation of this manuscript was supported in part by a Concordia University Research Award (KD) and the Social Sciences and Humanities Research Council of Canada (VK). Please note that a portion of this manuscript overlaps with a previous version housed on a Queen's University server.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Dunfield, Best, Kelley and Kuhlmeier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Development of Prosocial Attention Across Two Cultures

Robert Hepach<sup>1,2</sup>\* and Esther Herrmann<sup>3,4</sup>

<sup>1</sup> Department of Research Methods in Early Child Development, Leipzig University, Leipzig, Germany, <sup>2</sup> Leipzig Research Center for Early Child Development, Leipzig University, Leipzig, Germany, <sup>3</sup> Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, <sup>4</sup> Minerva Research Group on the Origins of Human Self-Regulation, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

Despite the significance of prosocial attention for understanding variability in children's prosociality, little is known about its expression beyond infancy and outside the Western cultural context. In the current study we asked whether children's sensitivity to others' needs varies across ages and between a Western and a Non-Western cultural group. We carried out a cross-cultural and cross-sectional eye tracking study in Kenya (n = 128) and Germany (n = 83) with children between 3 and 9 years of age. Half the children were presented with videos depicting an instrumental helping situation in which one adult reached for an object while a second adult resolved or did not resolve the need. The other half watched perceptually controlled non-social control videos in which objects moved without any adults present. German children looked longer at the videos than Kenyan children, who in turn looked longer at the non-social compared to the social videos. At the same time, children in both cultures and across all age groups anticipated the relevant solution to the instrumental problem in the social but not in the non-social control condition. We did not find systematic changes in children's pupil dilation in response to seeing the problem occur or in response to the resolution of the situation. These findings suggest that children's anticipation of how others' needs are best resolved is a cross-cultural phenomenon that persists throughout childhood.

#### Edited by:

Kelsey Lucca, University of Washington, United States

#### Reviewed by:

Dorsa Amir, Boston College, United States; Tara Callaghan, St. Francis Xavier University, Canada

> \*Correspondence: Robert Hepach robert.hepach@uni-leipzig.de

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 04 June 2018 Accepted: 15 January 2019 Published: 05 February 2019

#### Citation:

Hepach R and Herrmann E (2019) The Development of Prosocial Attention Across Two Cultures. Front. Psychol. 10:138. doi: 10.3389/fpsyg.2019.00138

Keywords: children, eye tracking, cross-cultural research, pupil dilation, attention

# INTRODUCTION

Prosocial attention, the degree to which we attend to the needs of others, precedes prosocial behavior. Even before they are old enough to actively help others themselves, children have been shown to focus on how well others are helped in both sharing and instrumental helping contexts (Kuhlmeier et al., 2003; Geraci and Surian, 2011; Hamlin et al., 2011; Hepach et al., 2016; Köster et al., 2016b). Seeing individuals being helped (or not) provides a child crucial social information. They become familiar with various forms of need, e.g., instrumental needs, emotional needs, and material needs (Dunfield, 2014), and they learn the prosocial or antisocial nature of the agents they are observing, i.e., whom to approach because they helped others and whom to avoid because they did not help others (Vaish et al., 2010; Dahl et al., 2013; Van de Vondervoort and Hamlin, 2018). Children's prosocial attention is thus an important prerequisite for the maturation of their own prosocial behavior. Studying the mechanisms of prosocial attention, i.e., how children anticipate help and how their physiological arousal changes as a consequence of others needing help and being helped, can contribute to a better understanding of the individual differences observed in children's prosocial behavior.

Infants are prosocially attentive from as young as 6 months. They expect resources to be distributed equally (Geraci and Surian, 2011; Schmidt and Sommerville, 2011; Sloane et al., 2012), are surprised if others are blocked from achieving an instrumental goal, and expect agents to approach those who helped them over those who hindered them (Kuhlmeier et al., 2003; Hamlin et al., 2007; Köster et al., 2016b). Infants not only form expectations about how others treat one another, but also prefer agents who have helped others over those who have harmed others (Hamlin et al., 2007). When making these choices, infants take into account an agent's goal, avoiding those with harmful intentions even when they did not succeed in carrying out the harmful behavior (Hamlin, 2013). It has been suggested that sympathy in response to others' distress underlies these social evaluations (Kanakogi et al., 2013). Infants not only respond to how others are helped but also anticipate how others are best helped (Köster et al., 2016b), and toddlers look longer at the correct solution to an agent's instrumental problem (Hepach et al., 2016).

Prosocial behavior emerges in the second year of life across cultures (Callaghan et al., 2011). However, previous research shows that helping and sharing behaviors vary across culture and context (House et al., 2013; Blake et al., 2015; Paulus, 2015; Köster et al., 2016a). For instance, 1.5- to 2.5-year-old toddlers' helping behavior varied in Germany and Brazil depending on how mothers structured helping tasks (Köster et al., 2016a). In a comparison of German and Indian children, the observed variability in instrumental helping was tied to parental scaffolding (Giner Torréns and Kärtner, 2017). With regard to sharing behavior, children varied in whether they engaged in costly sharing depending on their age and culture (House et al., 2013). Children's prosociality undergoes major qualitative changes after 3 years of age, with prosocial behavior changing in frequency and type. This raises important questions concerning the underlying mechanism. Does children's attention to others' needs increase or decrease with age, or does it follow a u-shaped pattern similar to children's sharing behavior (House et al., 2013; see also Blake et al., 2015)? Given the variability in children's prosocial behavior across development and cultures, is there similar variability in prosocial attention (Kärtner, 2018)?

Children's prosocial attention is closely tied to their prosocial behavior. Neural signature responses at 14 months were related to instrumental helping at 18 months and comforting at 24 months of age (Paulus et al., 2013). In 12- to 14-month-old infants, the expressed degree of the Nc ERP component in response to seeing others being helped or hindered was related to whether or not they reached for the prosocial or the antisocial agent (Cowell and Decety, 2015; see also Cowell et al., 2018). In addition to activation of the central nervous system, changes in autonomic nervous system (ANS) activity predict whether and how much 1.5- to 5.5-year-old children will instrumentally help others (Hepach et al., 2017a) and how much they will share with others (Miller et al., 2015). More specifically, empathic concern but not personal distress predicts instrumental helping behavior in young children (Eisenberg, 2000). At 4 years of age, children's baseline ANS activity and reactive ANS patterns predict their altruistic sharing (see Miller, 2018, for a recent review). As children enter school age, sharing is related to behavioral control, which becomes increasingly relevant for the self-regulation of selfish desires in order to benefit others (Steinbeis, 2018). Studying prosocial attention can thus provide important insights into the mechanisms driving prosociality.

To date, the study of prosocial attention (as opposed to behavior) has focused on children at pre-weaning age, typically younger than 3 years, and almost exclusively on Western samples. Therefore, this study extends previous work by presenting German and Kenyan children aged 3 to 9 years with a standardized eye tracking paradigm. The Kikuyu are the largest ethnic group in Kenya, and E. H. has established a working relationship with local schools. Kikuyu children are thus familiar with Westerners and not hesitant to participate in behavioral studies. The Kikuyu are traditionally small-scale farmers who cultivate vegetables and practice animal husbandry for their subsistence. The immediate nuclear family is the basic economic unit and relatives support one another. Many children attend the local nursery school from about 4 years of age, and almost all children in a community go to school once they are 5 years old.

Half the children were presented with videos depicting an instrumental helping situation in which one adult reached for an object while a second adult either resolved the need or did not. The other half watched perceptually controlled non-social videos in which objects moved without agents present. In the non-social videos each object followed the same movement trajectory as in the social condition. Following previous work by Hepach et al. (2016), we collected data in the social and non-social contexts based on four dependent variables: overall attention to the video stimuli, children's looking time to the agent's need prior to the resolution of the situation (see also Köster et al., 2016b), the change in children's pupil dilation in response to seeing the need situation arise, and the change in children's pupil dilation upon seeing the situation being resolved. Such an assessment of children's prosocial attention reduces the possible impact of children's shyness in novel situations.

Pupillometry is an established measure of internal arousal in infancy and early childhood research (Laeng et al., 2012; Sirois and Brisson, 2014; Hepach and Westermann, 2016) similar to research in adults that shows greater pupil dilation in response to emotionally arousing images and sounds (e.g., Partala and Surakka, 2003; Bradley et al., 2008; Snowden et al., 2016). In the context of viewing others needing help, children's increase in pupil dilation relates to whether and how fast they are to subsequently help (Hepach et al., 2016, 2017a). Assessing children's pupil dilation in response to seeing others needing and being helped complements measures of children's anticipatory looking behavior before others are helped. Taken together, these measures provide a window into children's prosocial attention (Hepach et al., 2016; Köster et al., 2016b).

In the current study and based on prior work with 2-year-old German children, we predicted that children (1) look longer at the need in the social compared to the non-social condition (Hepach et al., 2016; Köster et al., 2016b), (2) show a greater increase in pupil dilation in the social compared to the non-social condition (Hepach et al., 2016), and (3) that children's internal arousal should decrease if the recipient's need was fulfilled but remain elevated if the need was not appropriately fulfilled (Hepach et al., 2016). In addition, we explored whether children's visual anticipation of the need resolution as well as children's changes in pupil dilation varied with age. Furthermore, we sought to apply pupillometry and anticipatory gaze tracking techniques within a cross-cultural research paradigm (German and Kenyan children), as for the most part these methods had only been used in studies with Western populations. We included cultural group as a fixed effect in each of the three analyses and did not have a priori predictions with regard to the direction of an effect of culture. Our statistical analyses of culture were thus exploratory.

# MATERIALS AND METHODS

# Participants

Children were recruited in Leipzig, Germany and in local schools near Nanyuki in Kenya. German children came from middle-class families and Kenyan children were all Kikuyu, who lived in small villages near the Kenyan town of Nanyuki (see **Figure 1**). The German sample included 83 children (41 boys) and the Kenyan sample included 128 children (70 girls) across 7 age groups. Two additional German children were tested but excluded because one child was not tested in the correct experimental condition and because for one child the system sampled at a lower than average rate. Ten additional Kenyan children were tested but excluded because calibration could not be performed (n = 9) or because a child was not tested in the correct experimental condition.

This study's design and procedure were carried out in accordance with ethical guidelines, and ethical approval for this study was provided by the Max Planck Institute for Evolutionary Anthropology Child Subjects Committee. All parents were informed about the study and written consent to participate was obtained for each child from parents in Germany and from the children's legal guardians (head teacher of the children's schools) in Kenya. Both consent procedures were approved by the same committee.

# Materials and Design

The videos were identical to those used in prior work (Hepach et al., 2016). In contrast to Hepach et al. (2016), we presented children with only one test trial and used only one type of neutral stimulus, i.e., the blue-colored set. We tested children in a full two-factorial between-subjects design. The independent factors were condition (social vs. non-social) and type of object returned (relevant vs. irrelevant object). Children were presented with videos of a (Western) adult male either stacking cans to build a tower or placing shoes onto a shelf. The adult was either observed by a (Western) adult female (social condition) or children watched videos of self-propelled items being stacked or placed without any adults present (non-social condition).

Within each condition, the order of events was identical and proceeded as follows: first, in the introductory video (1120 gaze samples ∼ 19 s) the adult stacked the items (non-social condition: the items were being stacked). Second, in the drop video (1720 gaze samples ∼ 29 s), just before the task of stacking all items was complete, one relevant and one irrelevant object dropped to the floor. In the social condition only, the adult reached ambiguously for the items (no adults were featured in the non-social condition). In the final resolution video (750 samples ∼ 13 s) the second adult got up and handed the adult either the relevant or the irrelevant object (in the non-social condition the respective object moved back on its own; see **Figure 2**). After each video, we presented the identical sequence of neutral stimuli on the computer screen throughout which pupil diameter was measured. These neutral videos were identical to those used in Hepach et al. (2016) and depicted computer animated bubbles on a blue background (see also Hepach et al., 2016, 2017a,b; Jessen et al., 2016). The total duration of the entire study for each participant was approximately 1 min and 40 s. Within each age group, we counterbalanced the type of context (social vs. non-social), the type of activity (stacking cans vs. placing shoes), and the position of the relevant object (left or right). Each child was presented with one video version. In sum, the trial presented to children in the present study was identical to a trial used in Hepach et al. (2016), with the one exception that only the blue neutral measurement sequence was used.

FIGURE 2 | The key scenes of the social condition (top) and the non-social control condition (bottom). The left panel depicts a frame from the introductory scene when the adult was stacking a tower (social condition) or a tower was being stacked (non-social condition). The center panel illustrates the key frame after the objects dropped to the floor. The regions of interest are marked here for illustration purposes. The right panel depicts the resolution of the situation after the second adult picked up an object (social condition) or after one object returned to the table. The individuals depicted here provided written consent for their images to be used in this figure.
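The counterbalancing scheme described above can be illustrated with a minimal sketch. It assumes that the three counterbalanced factors were fully crossed into eight video versions and that versions were handed out in rotation within an age group; the rotation rule and the function names are hypothetical and are not taken from the original procedure.

```python
from itertools import cycle, product

# Factors named in the text; the rotation-based assignment below is an assumption.
CONTEXTS = ("social", "non-social")
ACTIVITIES = ("stacking cans", "placing shoes")
RELEVANT_OBJECT_POSITIONS = ("left", "right")


def video_versions():
    """Return every combination of context, activity, and object position."""
    return list(product(CONTEXTS, ACTIVITIES, RELEVANT_OBJECT_POSITIONS))


def assign_versions(child_ids):
    """Hypothetical assignment: rotate through the versions within one age group."""
    return dict(zip(child_ids, cycle(video_versions())))


if __name__ == "__main__":
    for child, version in assign_versions(["c01", "c02", "c03"]).items():
        print(child, version)
```

With 16 children in an age group, this rotation would place two children in each of the eight cells; unequal group sizes would leave some cells with one child more than others.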

During the study, children sat in front of an SMI eye tracking unit (model Red-m) attached below the screen of a laptop (17 inch; resolution 1,600 × 900 pixels). The sampling frequency of the eye tracker was set to 60 Hz. Stimuli were presented with Experiment Center (Version 3.7). The data of each child were exported from BeGaze (Version 3.7) to a text file. The processing and statistical analyses were carried out using R (Team, 2015).

# Procedure

In both Germany and Kenya, children participated in the study in their respective schools. A female experimenter set up the laptop and eye tracker in a quiet room. She told children that she wanted to show them videos on a computer screen. Children were seated approximately 70 cm away from the laptop. For each child, we carried out a four-point standard calibration procedure. The experimenter remained seated next to the child during the experiment. Before children watched the actual study videos, we presented a short video clip of a star image moving to four specified points on the screen in order to later recalibrate the position of children's gaze (Frank et al., 2012). After children completed watching the videos, they were escorted by the experimenter back to their respective play group.

# DATA ANALYSIS

We only included samples that belonged to a fixation, defined within BeGaze with 100 px dispersion and 70 ms minimal duration. In addition, we averaged the left-eye and right-eye data separately for the X and Y coordinates. For those 135 participants who provided data on the calibration videos, the raw gaze data were additionally corrected using the procedure developed by Frank et al. (2012). The algorithm was adapted for R to correct participants' point of gaze post hoc. For the remaining 75 participants (11 from the German sample), we included the data from the standard calibration of the eye tracking system.

Changes in children's physiological arousal were assessed via changes in pupil dilation. The data were exported from the eye tracker and pre-processed in R (Version 3.4.1; Team, 2015) using the algorithms developed by Hepach et al. (2016). We measured children's pupil dilation during each of the three presentations of the neutral video sequence. Specifically, each neutral video elicited two pupillary light reflexes in brief succession. We calculated the average pupillary minimum of the pupillary light reflex for each neutral video presentation. Increases in internal arousal result in an inhibited pupillary light reflex, therefore leaving the pupils more dilated (Steinhauer et al., 2000; Henderson et al., 2014; Hepach et al., 2015). In the present study, we calculated the change from baseline (first presentation of the neutral sequence) to after the drop sequence (process measure, second presentation of the neutral sequence) and the change from baseline to after the resolution scene (resolution measure, third presentation of the neutral sequence). The processing of pupil diameter changes and the identification of the pupillary minimum were carried out in R and followed the steps reported in previous work (Hepach et al., 2012, 2016, 2017b).
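The two change scores described above can be illustrated with a minimal Python sketch. It assumes that the two pupillary-light-reflex minima have already been identified for each of the three neutral-sequence presentations; the function name, the dictionary keys, and the input layout are hypothetical, and the sketch is not the authors' R pipeline.

```python
from statistics import mean


def pupil_change_scores(minima_by_presentation):
    """Compute baseline-referenced pupil change scores for one child.

    minima_by_presentation: three sequences, in presentation order
    (baseline, after drop video, after resolution video), each holding
    the pupillary minima of the light reflexes in that neutral sequence.
    The units are whatever the eye tracker exported (an assumption here).
    """
    if len(minima_by_presentation) != 3:
        raise ValueError("expected minima for exactly three neutral presentations")

    # Average the minima within each neutral presentation.
    baseline, after_drop, after_resolution = (
        mean(minima) for minima in minima_by_presentation
    )

    return {
        "baseline": baseline,
        # Process measure: change from baseline to after the drop video.
        "process": after_drop - baseline,
        # Resolution measure: change from baseline to after the resolution video.
        "resolution": after_resolution - baseline,
    }
```

A positive score means that the pupillary minimum was larger (the light reflex was more inhibited) than at baseline, which is the pattern the text associates with increased arousal.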

The full data set including the text files exported from the eye tracking system, the processing scripts written in R, the data table which formed the basis of all the statistical analyses, and the R-script to execute those statistical analyses can be accessed at https://osf.io/wc3hr/.

# Looking Time: Initial Attention

To investigate children's overall interest in the video before the objects dropped to the floor, we determined the time each child spent looking at the introductory sequence. More specifically, we calculated the number of samples in which children looked at the respective video sequence (within the screen area of 1600 × 900 pixels) and divided this by the duration of the sequence (1120 samples or 18.7 s), thus arriving at a proportion score for each participant. We ran an analysis of covariance (ANCOVA) including the interaction of condition (social vs. non-social) with the exploratory variable group (German vs. Kenyan) as well as the interaction of condition and age (linear and quadratic effect, z-standardized) while controlling for gender and the type of game (cans vs. shoes). Visual inspection indicated that model residuals were evenly distributed. This analysis included all 211 subjects who provided data on the introductory clip (see also **Table 1**).
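The proportion score for initial attention can be sketched as follows. The screen size, the 60 Hz sampling rate, and the 1120-sample sequence length come from the text; the function names and the representation of missing samples as `None` are assumptions made for illustration only.

```python
SCREEN_WIDTH, SCREEN_HEIGHT = 1600, 900  # screen area in pixels (from the text)
SAMPLING_RATE_HZ = 60                    # eye tracker sampling frequency (from the text)
INTRO_SAMPLES = 1120                     # length of the introductory sequence in samples


def on_screen(x, y):
    """Return True if a gaze sample falls within the screen area."""
    return 0 <= x <= SCREEN_WIDTH and 0 <= y <= SCREEN_HEIGHT


def looking_proportion(gaze_samples, total_samples=INTRO_SAMPLES):
    """Proportion of the sequence during which gaze was on the screen.

    gaze_samples: iterable of (x, y) tuples, or None for a missing sample
    (hypothetical input format).
    """
    valid = sum(
        1 for sample in gaze_samples
        if sample is not None and on_screen(*sample)
    )
    return valid / total_samples


# Duration of the introductory sequence in seconds at 60 Hz.
intro_duration_s = INTRO_SAMPLES / SAMPLING_RATE_HZ
```

Dividing by the fixed sequence length rather than by the number of recorded samples means that missing data (for example, when a child looked away) lowers the score instead of being ignored.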

# Looking Time: Anticipatory Looking

We investigated children's looking to the dropped objects within the crucial time window (13 to 29 s) in response to watching the drop video. For each child, we determined the time (i.e., found gaze samples) spent looking at each region of interest (ROI) encompassing the respective object. The dimensions and size of each ROI were identical and were adapted from the dimensions reported in Hepach et al. (2016) to the screen resolution of the present study (ROI width and height: 163 pixels). We calculated the dependent variable as the proportion of time children looked at the relevant object (time relevant object/[time irrelevant object + time relevant object]). Children were included in this analysis only if they looked at either the relevant or the irrelevant object ROI (see **Figure 2** and **Table 1**). This analysis excluded children who looked at the screen but at neither ROI (see **Table 1** for details on the number of children included in this analysis). We then ran an ANCOVA including the interaction of condition with the exploratory variable group as well as the interaction of condition and age (linear and quadratic effect, z-standardized) while controlling for gender, the type of game, and children's initial visual attention (see analysis above). Plotting the distribution of residuals indicated a bi-modal distribution, given that a majority of subjects either never or without exception looked at the relevant object after both objects had dropped. As a consequence, we carried out additional pairwise comparisons using non-parametric Mann–Whitney U tests (with exact p-values). This analysis paralleled that of Hepach et al. (2016) and provided a test of our first research hypothesis (see **Table 1** for details on the number of participants included in the analysis). To compare our results more closely to those reported by Hepach et al. (2016), we carried out focal analyses comparing children's looking time to the relevant object between the social and non-social condition for the German sample only.
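The dependent variable for anticipatory looking can be sketched in Python as below. Only the ROI size (163 pixels) and the proportion formula come from the text; the ROI centre coordinates, the function names, and the assumption that the gaze samples have already been restricted to the 13 to 29 s window are hypothetical.

```python
ROI_SIZE = 163  # ROI width and height in pixels (from the text)


def in_roi(x, y, roi_center):
    """Return True if a gaze sample lies inside the square ROI."""
    center_x, center_y = roi_center
    half = ROI_SIZE / 2
    return abs(x - center_x) <= half and abs(y - center_y) <= half


def relevant_looking_proportion(gaze_samples, relevant_center, irrelevant_center):
    """time relevant / (time irrelevant + time relevant), counted in samples.

    gaze_samples: iterable of (x, y) tuples within the analysis window.
    Returns None when the child looked at neither ROI, which mirrors
    the exclusion rule described in the text.
    """
    samples = list(gaze_samples)
    relevant = sum(1 for x, y in samples if in_roi(x, y, relevant_center))
    irrelevant = sum(1 for x, y in samples if in_roi(x, y, irrelevant_center))
    total = relevant + irrelevant
    if total == 0:
        return None  # looked at the screen but at neither object
    return relevant / total
```

Because samples are recorded at a constant rate, a ratio of sample counts is equivalent to a ratio of looking times. A score of 0.5 indicates equal looking at both objects, which is the chance level that the results are later tested against.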

# Pupil Dilation: Process Analysis

The change in children's pupil dilation as a consequence of seeing the objects drop was assessed with an ANCOVA including the process change measure of pupil dilation as the dependent variable. The predictor variables were the interaction of condition and the exploratory variable group as well as the interaction of condition and age group (linear and quadratic function, z-standardized) while controlling for gender, and the type of game, children's initial visual attention (see analysis above), as well as children's baseline pupil diameter to account, indirectly, for different luminance levels across testing sessions (see **Table 1** for details on the number of participants included in the analysis). The model residuals were normally distributed. This analysis paralleled that of Hepach et al. (2016) and provided a test of our second research hypothesis. Similar to our analysis of children's anticipatory looking, we ran a focal analysis with the German sample to compare our results more directly to those obtained by Hepach et al. (2016).

# Pupil Dilation: Resolution Analysis

The change in children's pupil dilation in response to the resolution of the situation was assessed with an ANCOVA including the change from process to the resolution measure of pupil dilation as the dependent variable. The predictor variables were the interaction of condition, type of object returned (relevant or irrelevant), and children's process measure whilst controlling for the exploratory variable group, age (linear and quadratic function, z-standardized), gender, and type of game, and children's initial attention (see **Table 1** for details on the number of participants included in the analysis). The model residuals were normally distributed. This analysis paralleled that of Hepach et al. (2016) and provided a test of our third research hypothesis. To compare our results more closely to those reported by Hepach et al. (2016) we carried out a focal analysis for the German sample only.

# RESULTS

# Looking Time

# Initial Attention

The time children spent looking at the video varied at a statistically marginal level by cultural group and condition, F(1,201) = 3.38, p = 0.069, η<sup>2</sup><sub>p</sub> = 0.02. Children in the German sample looked for similar lengths of time at the social (M = 0.75, SD = 0.26) and the non-social videos (M = 0.76, SD = 0.22), t(81) = −0.25, p = 0.81, 95% CI [−0.12, 0.09]. On the other hand, children in the Kenyan sample looked longer at the non-social (M = 0.57, SD = 0.28) than at the social videos (M = 0.42, SD = 0.3), t(126) = 2.93, p = 0.004, 95% CI [0.05, 0.25] (see **Figure 3**). Overall, children from the German sample spent more time looking at the videos (M = 0.75, SD = 0.24) than children from the Kenyan sample (M = 0.49, SD = 0.3), F(1,201) = 46.37, p < 0.001, η<sup>2</sup><sub>p</sub> = 0.18, and children across groups looked longer at the non-social (M = 0.65, SD = 0.27) compared to the social videos (M = 0.55, SD = 0.32), F(1,201) = 6.72, p = 0.01, η<sup>2</sup><sub>p</sub> = 0.04. In addition, children overall looked longer at the situation in which cans (M = 0.64, SD = 0.28) as opposed to shoes (M = 0.56, SD = 0.32) were stacked, F(1,201) = 4.3, p = 0.04, η<sup>2</sup><sub>p</sub> = 0.02. There were no interactions of age (linear or quadratic) and condition (Fs < 0.7, η<sup>2</sup><sub>p</sub> < 0.01), and none of the remaining main effects reached statistical significance (Fs < 3, η<sup>2</sup><sub>p</sub> < 0.01).

### Anticipatory Looking

The time children spent looking at the relevant object prior to the situation being resolved varied with condition, F(1,133) = 7.07, p = 0.009, η<sup>2</sup><sub>p</sub> = 0.05 (see **Figure 4**). Children looked longer at the relevant object in the social (M = 0.68, SD = 0.43) compared to the non-social condition (M = 0.48, SD = 0.45), U(n[social] = 64, n[non-social] = 80) = 3149, p = 0.01, 95% CI [0, 0.31]. In addition, we found a marginally statistically significant main effect for game [F(1,133) = 3.44, p = 0.066, η<sup>2</sup><sub>p</sub> = 0.03], showing that children, across the social and non-social conditions, looked longer at the relevant object when shoes (M = 0.63, SD = 0.45) as opposed to cans (M = 0.5, SD = 0.45) were being stacked, U(n[can game] = 72, n[shoe game] = 72) = 3039, p = 0.056, 95% CI [−0.005, 0.05]. None of the other main or interaction effects were statistically significant, Fs < 2 and η<sup>2</sup><sub>p</sub> < 0.01. Our focal analyses of the German sample only yielded no statistically significant difference between children's anticipatory looking in the social (M = 0.69, SD = 0.46) compared to the non-social (M = 0.54, SD = 0.45) condition, U(n[social] = 30, n[non-social] = 38) = 688, p = 0.11, 95% CI [−0.02, 0.3]. At the same time, German children looked at the relevant object more than 50% of the time only in the social condition, T = 321, p = 0.043, and not in the non-social condition, T = 400, p = 0.66.

subsamples of participants. In case the same number of participants was used across all four analyses, a single number was entered in the respective cell.

across all age groups between the two cultural groups. Red color represents areas of greatest visual attention. The regions of interest are highlighted with red squares for the purpose of this illustration. The individuals depicted here provided written consent for their images to be used in this figure.

# Pupil Dilation

# Process Analysis

Children's pupil dilation in response to seeing the objects drop did not vary as a function of cultural group, F(1,142) = 0.67, p = 0.42, η<sup>2</sup><sub>p</sub> < 0.01, or condition, F(1,142) = 2.77, p = 0.098, η<sup>2</sup><sub>p</sub> = 0.02. The analysis did yield a statistically significant main effect of children's baseline pupil diameter: the larger children's pupils during the baseline measurement sequence, the smaller the change from baseline to after seeing the objects drop, β = −0.03, SE = 0.009, t = −3.04, p = 0.003. None of the interaction terms (Fs < 2, η<sup>2</sup><sub>p</sub> < 0.01) or remaining main effects (Fs < 3, η<sup>2</sup><sub>p</sub> < 0.02) were statistically significant. Our focal analyses for the German sample revealed that German children showed a greater increase in pupil dilation in the social (M = 0.04, SD = 0.06) compared to the non-social (M = 0.02, SD = 0.06) condition, F(1,67) = 4.12, p = 0.046, η<sup>2</sup><sub>p</sub> = 0.06. In addition, we found that, similar to our analyses of both samples, larger baseline pupil diameter was linked to a smaller change in pupil dilation, β = −0.04, SE = 0.01, t = −3.5, p < 0.001. None of the interaction terms (Fs < 1, η<sup>2</sup><sub>p</sub> < 0.01) or remaining main effects (Fs < 3, η<sup>2</sup><sub>p</sub> < 0.02) yielded statistically significant effects.

#### Resolution Analysis

Children's pupil dilation in response to seeing the situation being resolved yielded a statistically significant effect of their process measure of pupil dilation. The greater children's pupil dilation in response to seeing the objects drop, the smaller the change in pupil dilation after seeing one object return, β = −0.67, SE = 0.15, t = −4.62, p < 0.001, a pattern that is consistent with values regressing to the mean. In addition, children's pupil dilation remained increased in the situation showing cans compared to the situation showing shoes, F(1,109) = 3.97, p = 0.049, η<sup>2</sup><sub>p</sub> = 0.04. None of the remaining interaction terms (Fs < 2, η<sup>2</sup><sub>p</sub> < 0.02) or main effects (Fs < 3, η<sup>2</sup><sub>p</sub> < 0.03) yielded statistically significant effects. Our focal analyses for the German sample revealed a similar effect of children's process measure of pupil dilation, β = −0.81, SE = 0.2, t = −3.97, p < 0.001. None of the interaction terms (Fs < 2, η<sup>2</sup><sub>p</sub> < 0.03) or main effects (Fs < 2, η<sup>2</sup><sub>p</sub> < 0.02) yielded statistically significant effects.

# DISCUSSION

The current study is the first to compare children's prosocial attention across seven age groups from 3 to 9 years old and between a Western and a non-Western cultural group. The comparison of German and Kenyan children revealed that each group viewed the video stimuli differently. German children looked longer at the introductory sequence on the computer screen than Kenyan children. In addition, whereas German children attended equally to the social and non-social videos, Kenyan children spent more time looking at the non-social control videos than the social videos. These differences in initial overall attention may be explained by the difference in experience of watching computer-animated clips without any human actors present. We found this difference in initial attention for the two cultural groups across the seven age groups. Crucially, despite different overall looking time to the introductory sequence, Kenyan and German children correctly anticipated the adult's need. They fixated on the correct solution to the adult's need more in the social compared to the non-social control condition across all age groups. This replicates previous work which focused predominantly on Western children during the first 2 years of life (Hepach et al., 2016; Köster et al., 2016b). These findings suggest that children's anticipation of how others' needs are best resolved is a cross-cultural phenomenon that persists throughout childhood.

Previous work showed that changes in children's internal arousal, assessed via changes in pupil dilation, complemented findings from looking time analyses. Two-year-old children showed a greater increase in physiological arousal when seeing others in need in a social compared to a non-social condition, and pupil dilation remained elevated when the situation was not resolved appropriately (Hepach et al., 2016). In the present study, we did not find any systematic changes in children's internal arousal across all participants but merely partial support for our second research hypothesis. Only German children showed a weak effect, with more dilated pupils in response to seeing the objects drop in the social compared to the non-social control condition. These findings parallel those with 2-year-old German children (Hepach et al., 2016) but suggest that the previously found effect does not generalize across age and cultural groups. Furthermore, we did not find support for our third research hypothesis. Children in the current sample did not continue to show increased internal arousal when the adult was not helped, thus failing to replicate a previous finding with 2-year-old German children (Hepach et al., 2016).

This deserves a detailed discussion given that we used the identical stimuli from Hepach et al. (2016). It is possible that the previously found effect of pupil dilation is specific to 2-year-old German children. The present study cannot rule out this possibility given that we did not have access to 2-year-old Kenyan children and thus did not test this age group. In fact, the central aim of the present study was to sample children 3 years of age and older and to apply the previously developed paradigm within a non-Western cultural group. The lack of statistically significant effects with regard to our assessment of children's pupil dilation raises the question of whether the way in which we captured pupil diameter affected our results. The human pupil first and foremost responds to luminance changes, constricting to brighter stimuli and dilating within darker environments (Sirois and Brisson, 2014). We could not control the luminance levels during our experiment. In Kenya, the study room did not have electricity and the only light source was sunlight through the windows (see **Figure 1**). Given that this was the first eye tracking experiment run at the study site, we wanted children to be as comfortable as possible and decided not to alienate them from their familiar classroom environment by darkening the room. Additionally, piloting in Germany showed that the specific SMI eye-tracker model we used in the study does not track the eyes well in dark rooms. Some light was thus needed in the study room, which may in turn have interfered with the measurement. In Germany, we collected data in a comparable manner by not changing the luminance in the kindergarten room. The circumstances under which we collected pupil data were not ideal, which may have impaired our ability to detect psychologically induced changes in pupil dilation.

At the same time, it is important to point out that the methodological constraint of not controlling luminance was not systematically confounded with age, group, or condition because the measurement of pupil dilation was taken during the presentation of the neutral videos, which were identical for all children. Thus, while we failed to control room luminance, we did control screen luminance. Given that no prior work reported pupillometry findings in a cross-cultural study with human children, our study is the first to suggest that it is not enough to control for screen luminance during the measurement of pupil diameter and that control of room luminance is also required. We think that individual differences in room luminance across test sessions in our sample contributed to unaccounted-for random measurement error, thus failing to provide a strong test for rejecting the null hypothesis of our second and third research hypotheses. We thus regard the lack of systematic difference in pupil dilation in the current sample to be methodological in nature, not psychological. Given the methodological concerns outlined above, we would caution against a strong theoretical conclusion on the basis of the pupillometry findings (or lack thereof). In addition, while this study represents the largest cross-cultural sample of children in a prosocial attention eye-tracking task, our final sample size within each age group was small. In comparison, previous work included 64 children for one age group of 2-year-old children (Hepach et al., 2016). We cannot rule out that a critical sample size is needed to detect systematic changes in pupil dilation. A necessary next step for future studies would be to conduct a cross-cultural comparison with more subjects per age group, including 2-year-old children, in a laboratory setting where both stimulus luminance and room luminance can be controlled.

In addition to assessing changes in children's pupil dilation, we assessed their anticipatory looking behavior to test our first research hypothesis. The results of this analysis hold crucial theoretical value for the study of developing prosocial attention. We found that children across all seven age groups and both cultural groups looked longer at the solution that would correctly fulfill the adult's needs. We can rule out that this was merely a perceptual preference given that no such anticipation was found in our non-social control condition. Across development, children continue to anticipate how others might best be helped. One avenue for future research is to assess both children's prosocial attention as well as their prosocial behavior to understand the driving mechanism for these tendencies and to identify individual differences in children's prosociality. Such an approach would also provide an opportunity to investigate different forms of prosocial behavior/attention. Whereas previous work focused on the variability of prosocial behavior across age groups and cultures in the context of sharing material resources (House et al., 2013; Blake et al., 2015), the study of children's prosocial attention has focussed on instrumental need scenarios (Hepach et al., 2016; Köster et al., 2016b).

The current paradigm, with improved control of room luminance, lends itself well to cross-cultural investigations of prosocial attention. Disentangling socio-cognitive factors, i.e., children's prosocial attention, provides relevant information to better understand the emergence of prosociality (Callaghan and Corbit, 2018; Kärtner, 2018). Although instrumental helping behavior emerges early in ontogeny across cultures (Callaghan et al., 2011; Callaghan and Corbit, 2018), some other forms of prosociality such as sharing show variability across development and cultures. An interesting focus for future research is the relation between prosocial attention and prosocial behavior. It is possible that a range of socio-cultural factors, such as maternal structuring of children's instrumental helping behavior that has been found to differ between German, Brazilian, and Indian children, affect prosocial attention (Köster et al., 2016b; Giner Torréns and Kärtner, 2017; Kärtner, 2018). An alternative possibility is that children's sensitivity to others' needs is less affected by socio-cultural factors than by prosocial behavior itself. This could suggest that culture affects not so much whether we perceive others' needs but how we expect these needs to be fulfilled. In fact, cultures differ with respect to the norms that govern prosocial behavior, but these norms may have a different impact on children's prosocial attention (House et al., 2013). In one example, children's aversion to unequal distributions in a resource allocation task followed different cultural and ontogenetic trajectories depending on whether the child (advantageous inequity aversion) or a peer (disadvantageous inequity aversion) benefited from the unequal distribution of resources (Blake et al., 2015). It is possible that children respond to both forms of inequity in all cultures, such as by looking longer at the distribution or showing greater physiological arousal. But whether children intervene may depend more on developmental age and the norms of the culture they grow up in.

The current study is among the first to compare prosocial attention between a Western and a non-Western culture as well as across multiple age groups. At the same time, a number of additional methodological improvements are needed for future research. One explanation for why Kenyan children looked longer overall at the non-social stimuli than the social stimuli is that the social stimuli with Western adults were less interesting. It is possible that greater overall attention to the social stimuli would be achieved if Kenyan adults were depicted helping (or not helping) one another. Conversely, it would be interesting to present the German children with videos depicting Kenyan adults. For the purpose of the present study, it is important to emphasize that we found both Kenyan and German children to look longer at the relevant object in the social than the non-social condition despite differences in their overall initial attention to the videos. Children in both cultures similarly anticipated how the adult would be best helped. Together, our findings suggest that children's anticipation of others' needs is a phenomenon that is not confined to Western culture and persists throughout childhood. Future research will need to determine whether children's prosocial attention and behavior follow different developmental trajectories.

# AUTHOR CONTRIBUTIONS

RH and EH designed the study and wrote the paper. EH carried out data collection in Kenya and oversaw data collection in Germany. RH conducted the analyses.

# ACKNOWLEDGMENTS


We thank the District Education Offices of Laikipia Central and the local Kenyan and German schools for supporting our research. We also thank Nadin Bobovnikov for her help with data collection and Anna-Claire Schneider for her comments on an earlier version of the manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hepach and Herrmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Do Infants in the First Year of Life Expect Equal Resource Allocations?

Melody Buyukozer Dawkins<sup>1</sup> \*, Stephanie Sloane<sup>2</sup> and Renée Baillargeon<sup>1</sup> \*

<sup>1</sup> Department of Psychology, University of Illinois at Urbana-Champaign, IL, United States, <sup>2</sup> Department of Human Development and Family Studies, University of Illinois at Urbana-Champaign, IL, United States

Recent research has provided converging evidence, using multiple tasks, of sensitivity to fairness in the second year of life. In contrast, findings in the first year have been mixed, leaving it unclear whether young infants possess an expectation of fairness. The present research examined the possibility that young infants might expect windfall resources to be divided equally between similar recipients, but might demonstrate this expectation only under very simple conditions. In three violation-of-expectation experiments, 9-month-olds (N = 120) expected an experimenter to divide two cookies equally between two animated puppets (1:1), and they detected a violation when she divided them unfairly instead (2:0). The same positive result was obtained whether the experimenter gave the cookies one by one to the puppets (Experiments 1–2) or first separated them onto placemats and then gave each puppet a placemat (Experiment 3). However, a negative result was obtained when four (as opposed to two) cookies were allocated: Infants looked about equally whether they saw a fair (2:2) or an unfair (3:1) distribution (Experiment 3). A final experiment revealed that 4-month-olds (N = 40) also expected an experimenter to distribute two cookies equally between two animated puppets (Experiment 4). Together, these and various control results support two broad conclusions. First, sensitivity to fairness emerges very early in life, consistent with claims that an abstract expectation of fairness is part of the basic structure of human moral cognition. Second, this expectation can at first be observed only under simple conditions, and speculations are offered as to why this might be the case.

Keywords: infancy, social cognition, morality, fairness, equality, resource allocation, numerical cognition, first year

#### Edited by:

J. Kiley Hamlin, The University of British Columbia, Canada

#### Reviewed by:

Arianne E. Eason, Washington University in St. Louis, United States; Annette M. E. Henderson, The University of Auckland, New Zealand

#### \*Correspondence:

Melody Buyukozer Dawkins melodibuyukozer@gmail.com; Renée Baillargeon rbaillar@illinois.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 03 August 2018 Accepted: 14 January 2019 Published: 19 February 2019

#### Citation:

Buyukozer Dawkins M, Sloane S and Baillargeon R (2019) Do Infants in the First Year of Life Expect Equal Resource Allocations? Front. Psychol. 10:116. doi: 10.3389/fpsyg.2019.00116

# INTRODUCTION

Over the past decade, developmental researchers have begun to systematically explore the foundations of moral cognition in infancy (e.g., Hamlin, 2013b; Spelke et al., 2013; Thomsen and Carey, 2013; Tomasello and Vaish, 2013; Paulus, 2014; Baillargeon et al., 2015; Martin and Olson, 2015; Bloom and Wynn, 2016; Davidov et al., 2016; Warneken, 2016; Liberman et al., 2017; Sommerville and Enright, 2018). In particular, several investigations have sought to uncover the early precursors of adults' and older children's well-established concern for fairness (e.g., Dawes et al., 2007; Fehr et al., 2008; Olson and Spelke, 2008; Rochat et al., 2009; Ng et al., 2011; Baumard et al., 2012; Shaw and Olson, 2012; Smith et al., 2013; Hamann et al., 2014; McAuliffe et al., 2015, 2017; Renno and Shutts, 2015). In this report, we focus on infants' sensitivity to fairness in third-party situations where windfall resources are divided, either fairly or unfairly, between two similar recipients. In the next sections, we first summarize prior findings from relevant tasks. As will become clear, positive results have been obtained in the second year of life with a variety of tasks, providing converging evidence of sensitivity to fairness in older infants. In contrast, results in the first year have been mixed, leaving it unclear whether young infants possess an expectation of fairness. Next, we introduce the present experiments, which sought to reconcile the divergent findings that have been obtained with young infants and, in so doing, to ascertain at what age and under what conditions sensitivity to fairness can be observed in the first year of life.

We reasoned that such evidence would be important for at least two reasons: It would constrain theoretical accounts of the mechanisms by which an expectation of fairness first emerges in infancy, and it would help identify some of the factors that affect under what conditions this expectation is likely to be demonstrated.

# Findings With Older Infants

Evidence of sensitivity to fairness in the second year of life comes from at least three different tasks. In allocation-outcome tasks, a distributor divides resources either equally (equal event) or unequally (unequal event) between two similar recipients. The rationale is that if infants expect the distributor to act fairly, then they should look longer when this expectation is violated in the unequal event. To date, positive results have been obtained in four published reports with infants ages 15–19 months (Schmidt and Sommerville, 2011; Sloane et al., 2012; Enright et al., 2017; Bian et al., 2018; see also Tatone and Csibra, 2018). These reports varied along multiple dimensions, including whether the events were videotaped or live; whether the distributor and recipients were humans or puppets; whether infants saw a single distribution event followed by still-frame images depicting the equal and unequal outcomes or separate distribution events for the two outcomes; and whether the allocated resources comprised four items, with 2:2 and 3:1 outcomes (Schmidt and Sommerville, 2011; Enright et al., 2017), or two items, with 1:1 and 2:0 outcomes (Sloane et al., 2012; Bian et al., 2018). Positive results have also been obtained with infants ages 12–15 months under limited conditions (Ziv and Sommerville, 2017): When shown a videotaped event in which four items were distributed, followed by simultaneous still-frame images depicting equal (2:2) and unequal (3:1) outcomes, infants with one or more older siblings looked significantly longer at the unequal outcome. In contrast, infants without siblings tended to look equally at the two outcomes, as did 12-month-olds who were shown the two outcomes successively, rather than simultaneously (Sommerville et al., 2013; Tatone and Csibra, 2018).

In affiliative-preference tasks, one distributor divides resources equally between two recipients (fair-distributor event), and another distributor divides resources unequally between the same recipients (unfair-distributor event). Next, infants are encouraged to choose between the two distributors or to select one of two identical toys offered by the distributors. The rationale is that if infants expect a fair distribution, then they may prefer the fair over the unfair distributor, just as they prefer individuals who produce helpful as opposed to harmful actions (e.g., Hamlin et al., 2007, 2013a; Hamlin and Wynn, 2011). To date, positive results have been obtained with 16-month-olds using 2:0 violations (Geraci and Surian, 2011), with 15-month-olds using 3:1 violations (Burns and Sommerville, 2014), and with 13- and 17-month-olds using 5:1 violations (Lucca et al., 2018). In each report, infants were significantly more likely to prefer or endorse the fair over the unfair distributor.

In reward/punishment tasks, infants first see a fair and an unfair distributor divide resources between two recipients, and then the distributors are rewarded or punished for their actions. In one experiment (DesChamps et al., 2016), for example, 15-month-olds first saw videotaped events in which two women distributed four or six items; one woman did so fairly, and the other did so unfairly, resulting in 3:1 or 5:1 violations. Next, photos of the two women were presented simultaneously, accompanied by a series of seven statements spoken by a disembodied voice. In the reward condition, the statements conveyed praise (e.g., "She's a good girl!"); in the punishment condition, they conveyed admonishment (e.g., "She's a bad girl!"). Infants looked significantly longer at the unfair distributor in the reward condition, but looked equally at the two distributors in the punishment condition. One possible interpretation of these findings is that two separate tendencies contributed to infants' responses: a tendency to look longer at the distributor who did not match the statements spoken, and a tendency to look longer at the unfair distributor (perhaps due to a vigilance or negativity bias; e.g., Kinzler and Shutts, 2008; Vaish et al., 2008; Baltazar et al., 2012). In the reward condition, these two tendencies combined, leading infants to look longer at the unfair distributor; in the punishment condition, these two tendencies canceled each other, resulting in equal looking times at the two distributors.

# Findings With Younger Infants

Sensitivity to fairness in the first year of life has been examined using the same types of tasks as with older infants. Reports using allocation-outcome tasks have yielded mixed results. When tested with computer-animated events showing a two-item distribution, 10-month-olds looked significantly longer at the unequal (2:0) than at the equal (1:1) event (Meristo et al., 2016). However, when tested with videotaped events showing a four-item distribution, with the final still-frame images depicting the unequal (3:1) and equal (2:2) outcomes presented simultaneously, 9- and 6-month-olds tended to look equally at the two outcomes, and this was true whether or not they had older siblings (Ziv and Sommerville, 2017).

A report using an affiliative-preference task also yielded negative results. After watching computer-animated events in which a fair and an unfair distributor divided two items between two recipients, 16-month-olds significantly preferred the fair over the unfair distributor ("Which one do you want? Pick it up!"), but 10-month-olds chose randomly between them (Geraci and Surian, 2011).

Finally, reports using reward/punishment tasks have yielded inconsistent results. In one report (Meristo and Surian, 2013), 10-month-olds first saw computer-animated events in which a fair and an unfair distributor divided two items between two recipients; a bystander either observed these distributions (informed condition) or was prevented from doing so by a partial barrier (uninformed condition). Next, the bystander gave a reward (a strawberry) to either the fair or the unfair distributor.<sup>1</sup> Infants in the informed condition looked significantly longer when the bystander rewarded the unfair as opposed to the fair distributor, whereas infants in the uninformed condition looked equally at the two events, suggesting that they understood that the bystander lacked the necessary information to distinguish between the distributors. However, in additional experiments (Meristo and Surian, 2013, 2014), 10-month-olds also looked significantly longer when a newcomer who was absent during the distributors' actions, and therefore should have been uninformed, (a) rewarded the unfair as opposed to the fair distributor or (b) punished the unfair as opposed to the fair distributor (e.g., by taking away a strawberry).

# Two Hypotheses

The results reviewed in the preceding sections indicate that by the second year of life, infants expect a distributor to divide resources fairly between two similar recipients: They detect a violation when shown unequal distributions, they prefer fair over unfair distributors, and they selectively associate praise with fair distributors and admonishment with unfair distributors. In contrast, findings with infants in the first year of life were mixed, leaving it unclear at what age and under what conditions young infants first demonstrate an expectation of fairness. In particular, consider the divergent results from the allocation-outcome tasks of Ziv and Sommerville (2017) and Meristo et al. (2016). At least two hypotheses can be offered for these conflicting results; these hypotheses focus on different procedural variations between the two tasks and invoke different mechanisms to explain the emergence of fairness in infancy.

One (shift) hypothesis focuses on the different ages tested in the two tasks: Ziv and Sommerville (2017) obtained negative results with 6- and 9-month-olds, while Meristo et al. (2016) obtained positive results with 10-month-olds. According to this hypothesis, an important developmental shift takes place at about 10 months of age that leads to the acquisition of expectations about fairness. This shift occurs largely through socialization processes: As infants interact with others (e.g., parents, other caregivers, siblings) in their everyday social environments, they come to learn that resources are typically distributed equally between similar recipients (e.g., Sommerville et al., 2013; Bloom and Wynn, 2016; Ziv and Sommerville, 2017). From this perspective, it would make sense that even at 12–15 months of age, infants with older siblings were more likely to demonstrate sensitivity to fairness than were infants without siblings (Ziv and Sommerville, 2017). The presence of older siblings would result in more opportunities to learn about fairness and hence would "spur the developmental shift in infants' fairness expectations" (Ziv and Sommerville, 2017, p. 1044).

The other (continuity) hypothesis focuses on the different fairness violations used in the two tasks: Ziv and Sommerville (2017) obtained negative results with a 3:1 violation, while Meristo et al. (2016) obtained positive results with a 2:0 violation. According to this explanation, an abstract expectation of fairness emerges very early in life, as part of the basic structure of human moral cognition (e.g., Shweder et al., 1997; Dawes et al., 2007; Jackendoff, 2007; Premack, 2007; Rai and Fiske, 2011; Baumard et al., 2013; Graham et al., 2013; Baillargeon et al., 2015; Bian et al., 2018; Buyukozer Dawkins et al., in press). However, this expectation can at first be demonstrated only under limited conditions, which gradually broaden with experience. For example, it might be that young infants are initially able to process distributions of two items, but not distributions of four or more items; that they are initially able to detect qualitative violations, in which one recipient gets something and the other gets nothing (e.g., a 2:0 or a 4:0 violation), but not quantitative violations, in which both recipients get something but in differing amounts (e.g., a 3:1 or a 7:1 violation); or that they are initially able to detect quantitative violations when the numerical distance between the two amounts allocated is larger (e.g., a 7:1 violation), but not when it is smaller (e.g., a 3:1 violation). Regardless of which of these possibilities turns out to be correct (we return to them in the section "General Discussion"), the main thrust of the continuity hypothesis is that an expectation of fairness emerges very early in life, as part of the "first draft" of moral cognition (Graham et al., 2013).

Which of the two preceding hypotheses is more likely to be correct? Do infants acquire an expectation of fairness toward the end of the first year of life, as the shift hypothesis suggests, or is this expectation present beginning early in the first year but observable only under limited conditions, as the continuity hypothesis suggests? The present research sought to answer these questions.

# The Present Research

According to the continuity hypothesis, an expectation of fairness is present early in life but can initially be observed only under limited conditions. In particular, infants may initially be able to detect simple 2:0 violations, but not more challenging 3:1 violations. The present experiments tested three predictions from this hypothesis, using allocation-outcome tasks. A first prediction was that 9-month-olds would give evidence of sensitivity to fairness if presented with a 2:0 violation. Experiments 1 and 2 both tested this prediction, using slightly different procedures that made possible different control conditions. A second prediction, tested in Experiment 3, was that 9-month-olds would succeed in detecting a 2:0 but not a 3:1 violation. Finally, a third prediction was that infants younger than 9 months might also succeed in detecting a 2:0 violation. To evaluate this prediction, Experiment 4 tested 4-month-olds using a design similar to that of Experiment 1.

We reasoned that finding the predicted results in all four experiments (a) would confirm the positive findings of Meristo et al. (2016) with a 2:0 violation and extend them to younger infants, (b) would confirm the negative findings of Ziv and Sommerville (2017) with a 3:1 violation, and more generally (c) would provide evidence for the continuity hypothesis.

<sup>1</sup>Although Meristo and Surian (2013) described their task as a reward task, it could also be construed as an affiliative-preference task: Perhaps infants simply expected the bystander to prefer and approach the fair over the unfair distributor. In this view, the same results would have been obtained had the bystander simply approached each distributor, without giving them a strawberry (for evidence that young infants both form affiliative preferences and expect others to share these preferences, see Hamlin et al., 2007).

# EXPERIMENT 1

Experiment 1 examined whether 9-month-old infants would succeed in detecting a 2:0 fairness violation in an allocation-outcome task. Infants were assigned to an experimental or an inanimate-control condition and saw live events (adapted from Sloane et al., 2012) in which a female experimenter divided two cookies either fairly or unfairly between two puppets. Each infant sat on a parent's lap facing a large puppet-stage apparatus; at the start of each trial, a supervisor lifted a curtain at the front of the apparatus. In each condition, infants received one familiarization trial and one test trial, and each trial had an initial phase and a final phase.

The familiarization trial served to introduce the puppets. At the start of the trial in the experimental condition (**Figure 1A**), two identical penguin puppets (operated by a hidden assistant) protruded from openings in the back wall of the apparatus; a small placemat lay in front of each puppet. During the initial (12-s) phase of the trial, the penguins "danced" by tilting from side to side every second. During the final phase, the penguins paused upright, and infants watched this paused scene until the trial ended (for criteria, see the section "Procedure").

During the initial (26-s) phase of the test trial, the penguins danced until a female experimenter opened a curtained window in the right wall of the apparatus. The penguins turned toward her and watched as she brought in a plate with two identical cookies and placed it on the apparatus floor. The experimenter then announced, "I have cookies!," and the penguins responded excitedly, "Yay, yay!" in two distinct female voices (the hidden assistant and the supervisor spoke in unison). Next, the experimenter placed a cookie on the placemat in front of one penguin (counterbalanced across infants); she then placed the other cookie in front of either the same penguin (unequal event) or the other penguin (equal event). Finally, the experimenter left with her empty plate, closing the curtain at her window, and the penguins looked down at their placemats and paused. During the final phase of the trial, infants watched this paused scene until the trial ended. We reasoned that if 9-month-olds expected the experimenter to divide the cookies fairly between the two puppets and could detect the 2:0 violation in the unequal event, then they should look significantly longer if shown that event as opposed to the equal event.

The inanimate-control condition (**Figure 1B**) served to rule out low-level interpretations of positive results in the experimental condition, such as a baseline preference for asymmetrical displays or for displays involving two cookies placed side by side. In previous experiments, researchers have consistently found that infants hold no expectation about how a distributor will divide windfall resources between two inanimate entities, suggesting that they appropriately restrict their expectation of fairness to animate entities (e.g., Sloane et al., 2012; Meristo et al., 2016; Ziv and Sommerville, 2017). In line with these findings, infants saw events identical to those in the experimental condition except that the penguins were inanimate: They did not move or talk and simply faced forward. Because the penguins gave no evidence of self-propulsion or agency (e.g., Setoh et al., 2013), we predicted that infants would view them as inanimate penguin-shaped toys, would hold no expectations about how the experimenter would divide the cookies between them, and hence would look about equally at the equal and unequal events.

# Materials and Methods

#### Sample-Size Considerations

In a recent report, Jin and Baillargeon (2017) examined sociomoral reasoning in infants using the violation-of-expectation method, a 2 × 2 between-subject design, and live events, as we did in the present research. The average Condition × Event effect size (η<sub>p</sub><sup>2</sup>) in their experiments was 0.19. An a priori power analysis using G\*Power based on this value indicated that, with power set at 0.80 and alpha set at 0.05, the minimum number of participants required per cell (i.e., per combination of condition and event) was nine (Faul et al., 2007). In line with this analysis, our experiments used 10 participants per cell, for a total of 20 per condition and 40 per experiment.
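As an illustration of the input to such a power analysis, the sketch below converts the reported partial eta squared into Cohen's f, the effect-size metric that G\*Power takes for fixed-effects ANOVA, and computes the corresponding noncentrality parameter. It assumes the standard conversion f = sqrt(η<sub>p</sub><sup>2</sup> / (1 − η<sub>p</sub><sup>2</sup>)) and the convention λ = f² × N. The function names are hypothetical, and the sketch does not reproduce the power computation itself, which requires the noncentral F distribution.

```python
import math


def cohens_f_from_partial_eta_squared(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f (assumed standard conversion)."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def noncentrality(f: float, total_n: int) -> float:
    """Noncentrality parameter under the assumed convention lambda = f^2 * N."""
    return f ** 2 * total_n


# Average Condition x Event effect size taken from the text above.
f = cohens_f_from_partial_eta_squared(0.19)

# Four cells (2 x 2 design): 9 per cell is the reported minimum, 10 per cell was used.
lambda_minimum = noncentrality(f, 4 * 9)
lambda_used = noncentrality(f, 4 * 10)

print(f"Cohen's f = {f:.3f}")
print(f"lambda at N = 36: {lambda_minimum:.2f}; at N = 40: {lambda_used:.2f}")
```

Larger values of λ correspond to higher power for a fixed alpha and fixed degrees of freedom, which is why moving from 9 to 10 infants per cell gives a small margin above the computed minimum.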

Although this sample size is admittedly small and reflects the limitations of infant data collection in a small town, a number of considerations may help alleviate potential concerns. First, it should be noted that sample sizes of 8–12 infants per cell are common in violation-of-expectation tasks with between-subject designs, both in the area of sociomoral reasoning (e.g., Meristo and Surian, 2013; Jin and Baillargeon, 2017; Surian and Franchin, 2017; Bian et al., 2018; Surian et al., 2018; Wang and Henderson, 2018) and in other areas of infant cognition (e.g., Pitts et al., 2015; Kibbe and Leslie, 2016; Wellman et al., 2016; Scott, 2017; Stavans et al., 2018). Second, Experiments 2 and 3 provided conceptual replications of Experiment 1, again with 9-month-old infants. Third, following Experiment 3, we report an overall analysis of the pooled data from all three experiments (n = 120, with 30 infants per cell). Finally, following Experiment 4, which extended the results of Experiment 1 to 4-month-olds, we report a mini meta-analysis of the positive results from all four experiments. Thus, despite the relatively small number of infants per experiment, we believe that together, these multiple replications and overall analyses help provide a sound basis for our conclusions.

#### Participants

Participants were 40 healthy term 9-month-olds, 20 male (range = 8 months, 9 days to 10 months, 8 days, M = 9 months, 10 days). Another 10 infants were excluded, 7 because they looked for the maximum time allowed in the test trial,<sup>2</sup> 1 because the infant was inattentive and looked away for 75% of the test trial, and 2 because their test looking times were over 3 standard deviations from the condition mean (both were in the experimental condition and saw the equal event). Half of the infants were randomly assigned to the experimental condition, and half to the inanimate-control condition; within each condition, half of the infants saw the equal event, and half saw the unequal event.

<sup>2</sup>Across Experiments 1–3, 15 of the total 146 9-month-olds tested (10%) were excluded because they looked for the maximum amount allowed in the test trial (i.e., "reached ceiling"). The distribution of these 15 infants, in terms of the condition they were assigned to (and the test event they saw), was: Experiment 1, 4 experimental (1 unequal, 3 equal) and 3 inanimate-control (1 unequal, 2

The infants' names in all experiments were obtained from a university-maintained database of parents interested in participating in child-development research. Parents were offered either a small gift (e.g., a children's book) or reimbursement for their travel expenses but were not otherwise compensated for their participation. Each infant's parent gave written informed consent, and the protocol was approved by the Institutional Review Board at the University of Illinois at Urbana–Champaign.

#### Apparatus and Stimuli

The apparatus consisted of a brightly lit display booth (201 cm high × 102 cm wide × 58 cm deep) with a large opening (56 cm × 95 cm) in its front wall; between trials, the supervisor lowered a curtain in front of this opening. Inside the apparatus, the side walls were painted white, and the back wall and floor were covered with pastel adhesive paper.

The experimenter wore a green shirt, knelt at a window (51 cm × 38 cm) in the right wall of the apparatus, and slid a white curtain to open or close her window. Another curtain behind the experimenter hid the testing room. As the test trial unfolded, the experimenter looked naturally at the puppets and at the objects she acted on, but she never made eye contact with the infants.

The two puppets were identical penguins (about 22 cm × 12 cm × 9 cm at their largest points) made of black and white furry fabric; each penguin had a large head with an orange beak. The penguins protruded from openings (each 20 cm × 12.5 cm and filled with beige felt) located 20 cm apart in the back wall of the apparatus, 5 cm above the floor. In the experimental condition, an assistant sat behind the back wall and manipulated the penguins; in the inanimate-control condition, the penguins rested upright on hidden wooden posts. Centered beneath each penguin was a rectangular white placemat (0.5 cm × 20 cm × 13 cm). The cookies were plastic vanilla sandwich cookies (each about 1 cm × 3 cm × 7 cm), and they were introduced by the experimenter on a beige ceramic plate (2.5 cm × 20 cm in diameter).

To help the experimenter and the assistant adhere to the events' scripts, a metronome beat softly once per second. During each testing session, one camera captured an image of the events, and another camera captured an image of the infant. The two images were combined, projected onto a computer screen located behind the apparatus, and monitored by the supervisor to confirm that the events followed the prescribed scripts. Recorded sessions were also checked off-line for experimenter accuracy.

#### Procedure

Each infant sat on a parent's lap centered in front of the apparatus; parents were instructed to remain silent and to close their eyes during the test trial. Each infant's looking behavior was monitored by two hidden observers who watched the infant through peepholes in cloth-covered frames on either side of the apparatus; the observers could not see the events from their viewpoints, and they did not know which test event was presented to the infant.<sup>3</sup> Each observer held a game controller linked to a computer and pressed a button when the infant looked at the event. Looking times during the initial and final phases of each trial were computed separately, using the primary observer's responses. Interobserver agreement during the final phase of each trial was measured as the proportion of 100-ms intervals in which the observers agreed about whether or not the infant was looking at the event; agreement was calculated for all 40 infants and averaged 93% per trial per infant.
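The agreement measure described above can be sketched as follows. This is a minimal illustration, assuming that each observer's button presses have already been resampled into one boolean per 100-ms interval; the function name and the sample data are hypothetical and are not taken from the study.

```python
def interobserver_agreement(primary, secondary):
    """Proportion of intervals in which both observers coded the same state.

    Each argument is a sequence of booleans, one per 100-ms interval,
    where True means "infant is looking at the event".
    """
    if len(primary) != len(secondary):
        raise ValueError("both observers must cover the same intervals")
    if not primary:
        raise ValueError("at least one interval is required")
    matches = sum(a == b for a, b in zip(primary, secondary))
    return matches / len(primary)


# Hypothetical one-second excerpt (ten 100-ms intervals).
primary_codes = [True] * 8 + [False] * 2
secondary_codes = [True] * 7 + [False] * 3

agreement = interobserver_agreement(primary_codes, secondary_codes)
print(f"agreement = {agreement:.0%}")
```

Note that agreement counts intervals in which both observers coded "looking" as well as intervals in which both coded "not looking", in keeping with the definition given in the text.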

Infants were highly attentive during the initial phases of the familiarization and test trials; across conditions, they looked, on average, for 93% of each initial phase. The final phase of each trial ended when infants (a) looked away for 2 consecutive seconds after having looked for at least 5 (familiarization) or 8 (test) cumulative seconds or (b) looked for a maximum of 45 cumulative seconds. A slightly longer minimum look was used in the test trial to give infants the opportunity to compare and evaluate the two puppets' allocations before the trial could end.
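The trial-ending criteria for the final phase can be expressed as a small state machine. The sketch below is illustrative only: it assumes that looking is sampled once every 100 ms (borrowed from the agreement measure, not stated for the trial-ending software), and the function name, parameter names, and example sequences are hypothetical.

```python
def final_phase_end(samples, min_look_ms, max_look_ms=45_000,
                    look_away_ms=2_000, step_ms=100):
    """Apply the look-away and ceiling criteria to a stream of samples.

    samples: iterable of booleans, one per step (True = infant is looking).
    min_look_ms: minimum cumulative looking before a look-away can end the
        trial (5,000 for familiarization, 8,000 for test in the text above).
    Returns (cumulative looking time in seconds, reason the phase ended).
    """
    looked_ms = 0   # cumulative looking time
    away_ms = 0     # length of the current consecutive look-away
    for looking in samples:
        if looking:
            looked_ms += step_ms
            away_ms = 0
            if looked_ms >= max_look_ms:
                return looked_ms / 1000, "ceiling"
        else:
            away_ms += step_ms
            if away_ms >= look_away_ms and looked_ms >= min_look_ms:
                return looked_ms / 1000, "look-away"
    return looked_ms / 1000, "incomplete"


# Hypothetical test trial: 3 s looking, 3 s away (too early to end the trial),
# 6 s more looking, then a 2-s look-away that ends the trial.
test_trial = [True] * 30 + [False] * 30 + [True] * 60 + [False] * 20
test_result = final_phase_end(test_trial, min_look_ms=8_000)

# Hypothetical infant who never looks away and reaches the 45-s maximum.
ceiling_result = final_phase_end([True] * 500, min_look_ms=8_000)

print(test_result, ceiling_result)
```

Because any look resets the look-away counter, a look-away can end the trial only if it begins after the minimum cumulative looking time has been reached, which is one reading of the criterion stated above.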

Finally, preliminary analyses of the test data revealed no significant interaction of condition and event with infant's sex or with which puppet received the first cookie, both Fs(1,32) ≤ 1.41, ps ≥ 0.244; the data were therefore collapsed across the latter two factors in subsequent analyses.

# Results and Discussion

Looking times during the final phase of the familiarization trial were subjected to an analysis of variance (ANOVA) with condition (experimental or inanimate-control) as a between-subject factor. This effect was not significant, F(1,38) = 0.22, p > 0.250, suggesting that infants in the experimental (M = 18.34, SD = 12.99) and inanimate-control (M = 16.38, SD = 13.27) conditions tended to look equally at the puppets (for data from all experiments, see Dataset in **Supplementary Material**).

equal); Experiment 2, 3 experimental (equal) and 3 cover-control (1 unequal, 2 equal); and Experiment 3, 1 experimental (equal) and 1 control (unequal). Ceiling infants are typically eliminated on the assumption that they needed additional familiarization to process the events they were shown (for other reports with eliminated ceiling babies, see e.g., Scott et al., 2015; Baillargeon and DeJong, 2017; Jin et al., 2018; Margoni et al., 2018). In line with this assumption, the ceiling infants in Experiments 1–3 looked significantly longer during the familiarization trial (n = 15, M = 33.77, SD = 20.62) than did the infants included in the experiments (n = 120, M = 18.24, SD = 12.59), F(1,133) = 17.24, p < 0.0001. Of course, these ceiling infants might have performed better had they been provided with more or more varied familiarization trials, for a better introduction to the task. However, because increasing the number of familiarization trials also increases the risk of inadvertently inducing subtle novelty or familiarity preferences that can then affect test responses (e.g., Wang et al., 2004), researchers often err in the direction of using as few familiarization trials as possible, and we followed this practice here.

<sup>3</sup>At the end of the test trial in each experiment, the primary observer was asked to guess whether the infant had seen an unequal or an equal event during the trial. Across Experiments 1–4, the primary observer guessed correctly for 20/40, 20/38 (the observer failed to make a guess for two infants), 20/40, and 20/40 infants, respectively, all ps > 0.250 (cumulative binomial probability).
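The observer-guess check in this footnote can be illustrated with an exact binomial computation. The sketch below assumes an upper-tail probability with a chance rate of 0.5; the footnote says only "cumulative binomial probability", so the choice of tail is an assumption, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly from the pmf."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# (correct guesses, infants with a guess) for Experiments 1-4, from the footnote.
guesses = [(20, 40), (20, 38), (20, 40), (20, 40)]
tail_probs = [binomial_upper_tail(k, n) for k, n in guesses]

for (k, n), prob in zip(guesses, tail_probs):
    print(f"{k}/{n}: P(X >= {k}) = {prob:.3f}")
```

Under this assumption every probability exceeds 0.250, in line with the values reported in the footnote.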

Looking times during the final phase of the test trial (**Figure 2**) were subjected to an ANOVA with condition (experimental or inanimate-control) and event (unequal or equal) as between-subject factors. The only significant effect was the Condition × Event interaction, F(1,36) = 5.19, p = 0.029, η<sub>p</sub><sup>2</sup> = 0.13 (no such interaction was found in the familiarization trial, F(1,36) = 0.01, p > 0.250). Planned comparisons revealed that infants in the experimental condition looked significantly longer at the unequal (M = 23.04, SD = 8.11) than at the equal (M = 14.05, SD = 4.49) event, F(1,36) = 6.85, p = 0.013, Cohen's d = 1.37, whereas infants in the inanimate-control condition looked about equally at the unequal (M = 14.28, SD = 6.36) and equal (M = 16.35, SD = 10.46) events, F(1,36) = 0.36, p > 0.250, d = 0.24. Non-parametric Wilcoxon rank-sum tests confirmed the results of the experimental (Z = 2.61, p = 0.009) and inanimate-control (Z = −0.26, p > 0.250) conditions.
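For readers who wish to check these statistics against the reported cell summaries, the sketch below recomputes the interaction F, the two planned-comparison Fs, and Cohen's d from the means and standard deviations given above. It assumes 10 infants per cell, a pooled error term equal to the mean of the four cell variances (which yields the 36 error degrees of freedom), and a d based on the pooled standard deviation of the two cells being compared. These formulas are standard for a balanced 2 × 2 between-subject design, but they are an assumption about how the reported values were obtained, and all names are hypothetical.

```python
import math

N_PER_CELL = 10

# (mean, SD) of test-trial looking times, copied from the text above.
cells = {
    ("experimental", "unequal"): (23.04, 8.11),
    ("experimental", "equal"): (14.05, 4.49),
    ("inanimate-control", "unequal"): (14.28, 6.36),
    ("inanimate-control", "equal"): (16.35, 10.46),
}

# Pooled error variance for a balanced design: mean of the cell variances.
mse = sum(sd ** 2 for _, sd in cells.values()) / len(cells)
error_df = len(cells) * (N_PER_CELL - 1)


def event_difference(condition):
    """Unequal-minus-equal difference in mean looking time for one condition."""
    return cells[(condition, "unequal")][0] - cells[(condition, "equal")][0]


def planned_comparison_f(condition):
    """F(1, error_df) for unequal vs. equal within one condition."""
    return event_difference(condition) ** 2 / (mse * 2 / N_PER_CELL)


def cohens_d(condition):
    """Absolute standardized difference using the two cells' pooled SD."""
    sd_unequal = cells[(condition, "unequal")][1]
    sd_equal = cells[(condition, "equal")][1]
    pooled_sd = math.sqrt((sd_unequal ** 2 + sd_equal ** 2) / 2)
    return abs(event_difference(condition)) / pooled_sd


# Interaction contrast: difference between the two event differences.
contrast = event_difference("experimental") - event_difference("inanimate-control")
interaction_f = contrast ** 2 / (mse * 4 / N_PER_CELL)

print(f"error df = {error_df}, interaction F = {interaction_f:.2f}")
for condition in ("experimental", "inanimate-control"):
    print(condition, f"F = {planned_comparison_f(condition):.2f}",
          f"d = {cohens_d(condition):.2f}")
```

Small discrepancies in the last decimal place are to be expected, because the inputs are themselves rounded to two decimals.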

Infants expected the experimenter to divide the two cookies equally between the two animated penguins, but they held no particular expectation about how the experimenter would divide the cookies between the two inanimate penguins. Together, these results provided evidence that 9-month-old infants already expect a distributor to divide two items equally between two similar recipients, thus supporting the continuity hypothesis.

# EXPERIMENT 2

Experiment 2 had two goals: One was to confirm the positive result of the experimental condition in Experiment 1, and the other was to address a possible alternative interpretation of this result. Specifically, infants might have looked longer at the unequal event not because they expected a distributor to divide windfall resources equally between similar individuals, but because they expected similar individuals to have similar numbers of objects (e.g., one cookie each; Welder and Graham, 2001). To rule out this alternative interpretation, infants in Experiment 2 were assigned to a cover-experimental or a cover-control condition. In the cover-experimental condition, the experimenter first removed covers placed over the penguins' placemats and then proceeded to distribute the two cookies, as in Experiment 1. The cover-control condition (adapted from Sloane et al., 2012; see also Meristo et al., 2016) was identical except that the experimenter no longer brought in and distributed the two cookies: In each event, she simply removed the covers to reveal the cookies already resting on the penguins' placemats. If infants merely expected similar puppets to have similar numbers of items, then infants in both the cover-experimental and cover-control conditions should look significantly longer at the unequal than at the equal event. However, if infants expected the experimenter to act fairly when she distributed the cookies to the puppets, but held no particular expectation about her actions when she simply revealed the cookies, then infants in the cover-experimental condition should look significantly longer at the unequal than at the equal event, whereas infants in the cover-control condition should look about equally at the two events.

Infants in the cover-experimental condition (**Figure 3A**) first received the same familiarization trial as in the experimental condition of Experiment 1, with the animated penguins dancing from side to side. Infants then received one test trial. At the start of the initial (42-s) phase, opaque rectangular covers rested in front of the penguins, over their empty placemats; the penguins (who were clearly visible above the covers) danced until the experimenter opened her window. The penguins then watched as the experimenter grasped one of the covers, lifted it, removed it from the apparatus through her window, and then repeated these actions with the other cover. Next, the experimenter brought in the plate with the two cookies, and the events proceeded exactly as in Experiment 1. Infants saw either the equal or the unequal event; for each event, which cover was removed first and which penguin received the first cookie were counterbalanced across infants.

The cover-control condition (**Figure 3B**) was identical with the following exceptions. At the start of the initial (26-s) phase of the test trial, the cookies were already on the penguins' placemats, hidden under the covers. The experimenter removed the covers, one at a time, to reveal the cookies; in the unequal event, both cookies were in front of the same penguin; in the equal event, one cookie was in front of each penguin. The experimenter then left, and the penguins looked down at their placemats and paused. The experimenter did not speak in this condition, but the penguins did greet her ("Yay, yay!") when she arrived. Which cover was removed first and which penguin had both cookies (unequal event only) were counterbalanced across infants.

# Materials and Methods

#### Participants

Participants were 40 healthy term 9-month-olds, 18 male (range = 8 months, 1 day to 10 months, 8 days, M = 9 months, 2 days). Another 12 infants were excluded, 6 because they looked for the maximum time allowed in the test trial,<sup>2</sup> 4 because they were fussy (2), distracted (1), or subjected to parental interference (1), and 2 because their test looking times were over 3 standard deviations from the condition mean (one in each condition, and both saw the equal event). Half of the infants were randomly assigned to the cover-experimental condition, and half to the cover-control condition; within each condition, half of the infants saw the equal event, and half saw the unequal event.

#### Apparatus, Stimuli, and Procedure

The apparatus and stimuli were identical to those in Experiment 1, with the addition of two identical tan rectangular covers (each 10 cm × 22.5 cm × 15.5 cm, with a wooden knob at the top). The procedure was also identical to that in Experiment 1. Infants were highly attentive during the initial phases of the familiarization and test trials; across conditions, they looked, on average, for 97% of each initial phase. Interobserver agreement during the final phase of the test trial was calculated for all 40 infants and averaged 94% per trial per infant. Finally, preliminary analyses of the test data revealed no significant interaction of condition and event with infant's sex or with which cover was removed first, both Fs(1,32) ≤ 1.64, ps ≥ 0.209; the data were therefore collapsed across the latter two factors in subsequent analyses.

# Results and Discussion

Looking times during the final phase of the familiarization trial were analyzed by means of an ANOVA with condition (cover-experimental or cover-control) as a between-subject factor. This effect was not significant, F(1,38) = 0.30, p > 0.250, suggesting that infants in the cover-experimental (M = 20.52, SD = 12.70) and cover-control (M = 18.17, SD = 14.64) conditions tended to look equally at the puppets.

Looking times during the final phase of the test trial (**Figure 2**) were subjected to an ANOVA with condition (cover-experimental or cover-control) and test event (unequal or equal) as between-subject factors. The only significant effect was the Condition × Event interaction, F(1,36) = 6.40, p = 0.016, η<sub>p</sub><sup>2</sup> = 0.15 (no such interaction was found in the familiarization trial, F(1,36) = 0.34, p > 0.250). Planned comparisons revealed that infants in the cover-experimental condition looked significantly longer at the unequal (M = 22.61, SD = 8.66) than at the equal (M = 14.56, SD = 1.98) event, F(1,36) = 7.26, p = 0.011, d = 1.28, whereas infants in the cover-control condition looked about equally at the unequal (M = 14.13, SD = 6.44) and equal (M = 16.77, SD = 7.63) events, F(1,36) = 0.78, p > 0.250, d = 0.37. Wilcoxon rank-sum tests confirmed the results of the cover-experimental (Z = 2.04, p = 0.041) and cover-control (Z = −0.87, p > 0.250) conditions.

When the experimenter brought in and distributed the two cookies, infants expected her to do so fairly, and they detected a violation when she instead gave both cookies to the same puppet. However, when the experimenter simply lifted covers to reveal the cookies already resting on the puppets' placemats, infants held no particular expectation about how many cookies each puppet would have. Infants thus bring to bear considerations of fairness when resources are distributed between similar individuals, but not when resources already in individuals' possession are revealed.

# EXPERIMENT 3

According to the continuity hypothesis, the discrepancy between the positive findings of Meristo et al. (2016) and the negative findings of Ziv and Sommerville (2017) was not due to the fact that the former tested 10-month-olds and the latter 9-month-olds; rather, it was due to the fact that the former used a simple 2:0 violation and the latter a more challenging 3:1 violation. Experiments 1 and 2 provided initial evidence for this hypothesis by showing that 9-month-olds could indeed detect a 2:0 fairness violation. Building on these results, Experiment 3 sought to confirm that 9-month-olds would detect a 2:0 violation (two-item condition), but not a 3:1 violation (four-item condition).

A secondary goal of Experiment 3 was to address a possible alternative interpretation of the positive results of Experiments 1 and 2: Infants might have looked longer at the unequal event not because they expected equal distributions, but because they expected equal interactions. In the cover-experimental condition of Experiment 2, for example, the experimenter first removed the cover from each puppet's placemat – but then she gave both cookies to the same puppet, thereby excluding the disadvantaged puppet from these last interactions. Because infants and older children have been shown to be sensitive to exclusion cues (e.g., Tronick, 2007; Over and Carpenter, 2009; Abrams et al., 2011), these unequal interactions gave rise to the possibility that infants were showing sensitivity to exclusion, rather than to unfairness (e.g., DesChamps et al., 2016). Some evidence against this possibility came from a control condition by Meristo et al. (2016): Instead of distributing two strawberries either equally or unequally between two recipients, as in the experimental condition, the distributor performed the same actions without distributing strawberries (i.e., approached either each recipient in turn or the same recipient twice). Infants in this control condition looked about equally at the two events, suggesting that they held no expectation that the distributor would approach each recipient equally or include both recipients in its social exchanges.<sup>4</sup> This negative result makes it unlikely that infants in the experimental condition of Meristo et al. (2016), or in the experimental conditions of Experiments 1 and 2, looked longer at the unequal event because they detected a violation when the distributor appeared to ignore the disadvantaged recipient when distributing items. Nevertheless, to provide additional evidence against this exclusion interpretation, in Experiment 3 we used a different distribution procedure, adapted from Schmidt and Sommerville (2011), which equated the distributor's interactions with the two recipients, irrespective of whether distributions were equal or unequal. Specifically, rather than distributing each cookie one by one, the experimenter now divided the cookies between two placemats and then slid one placemat toward each puppet.
With this mode of distribution, differences between the two conditions, or between the unequal and equal events within each condition, could not be attributed to differences in how many times the experimenter interacted with each puppet.

Infants in both conditions first received the same familiarization trial as in the experimental condition of Experiment 1, with one exception: The placemats now rested back to back, 2.5 cm apart, at the front of the apparatus, centered between the two puppets in the back wall. Infants then received one test trial. At the start of the initial (33-s) phase in the two-item condition (**Figure 4A**), the penguins danced until the experimenter opened her window. As before, the experimenter brought in a plate with two cookies and announced, "I have cookies!," to which the penguins responded, "Yay, yay!" Next, the experimenter put one cookie on the back placemat and then one cookie on the front placemat (equal event), or she put both cookies, one at a time, on the back placemat (unequal event); the experimenter always started with the back placemat to make it easier for infants to see what was put on each placemat. The experimenter then paused briefly, to allow infants to compare the two placemats. Finally, the experimenter slid the back placemat toward one puppet and then the front placemat toward the other puppet. The experimenter then left, and the puppets looked down at their placemats and paused, as in Experiment 1. Each infant saw either the equal or the unequal event; in each event, which penguin received the back placemat was counterbalanced across infants. The four-item condition (**Figure 4B**) was identical with two exceptions. First, the experimenter brought in four cookies and either put two on each placemat (equal event) or put three on the back placemat and one on the front placemat (unequal event). Second, the initial phase of the test trial was extended from 33 to 39 s, as it took the experimenter slightly longer to divide four as opposed to two cookies.

Based on the positive results of Experiments 1 and 2, we predicted that the 9-month-olds in the two-item condition would again detect the fairness violation in the unequal event and hence would look significantly longer if shown that event as opposed to the equal event. In contrast, based on the negative results of Ziv and Sommerville (2017) with 9-month-olds, we predicted that infants in the four-item condition would be unable to detect the fairness violation they were shown and hence would tend to look equally at the unequal and equal events. Together, these results would provide strong evidence that young infants do possess an expectation of fairness but are initially very limited in the violations they can detect.

# Materials and Methods

#### Participants

Participants were 40 healthy term 9-month-olds, 17 male (range = 8 months, 1 day to 9 months, 29 days, M = 9 months, 1 day). Another 4 infants were excluded, 2 because they looked for the maximum time allowed in the test trial,<sup>2</sup> 1 because the infant was distracted, and 1 because the infant's test looking time was over 3 standard deviations from the condition mean (the infant was in the two-item condition and saw the equal event). Half of the infants were randomly assigned to the two-item condition, and half to the four-item condition; within each condition, half of the infants saw the equal event, and half saw the unequal event.

#### Apparatus, Stimuli, and Procedure

The apparatus and stimuli were identical to those in Experiment 1, with two exceptions: Four cookies were used in the four-item condition, and felt was attached to the undersides of the placemats so that they slid quietly on the apparatus floor. The procedure was also identical to that of Experiment 1. Infants were highly attentive during the initial phases of the familiarization and test trials; across conditions, they looked, on average, for 96% of each initial phase. Interobserver agreement during the final phase of each trial was calculated for all 40 infants and averaged 94% per trial per infant. Finally, preliminary analyses of the test data revealed no significant interaction of condition and event with infant's sex or with which penguin received the back placemat, both Fs(1,32) ≤ 1.61, ps ≥ 0.213; the data were therefore collapsed across these latter two factors in subsequent analyses.

<sup>4</sup> In their experimental condition, Meristo et al. (2016) hid the distributor's actions by placing a large occluder at the center of the scene; at the end of each event, this occluder was removed to reveal that each potential recipient had received one strawberry (equal event) or that one recipient had received both strawberries (unequal event). This condition might also be taken to provide evidence against the exclusion interpretation, because infants did not witness the interactions between the distributor and recipients. However, because infants could easily infer what interactions had taken place behind the occluder, these results do not conclusively rule out the exclusion interpretation.

# Results and Discussion

Looking times during the final phase of the familiarization trial were subjected to an ANOVA with condition (two- or four-item) as a between-subject factor. This effect was not significant, F(1,38) = 1.54, p = 0.222, suggesting that infants in the two-item (M = 15.80, SD = 8.58) and four-item (M = 20.22, SD = 13.40) conditions tended to look equally at the puppets.

Looking times during the final phase of the test trial (**Figure 2**) were subjected to an ANOVA with condition (two- or four-item) and test event (unequal or equal) as between-subject factors. The only significant effect was the Condition × Event interaction, F(1,36) = 4.59, p = 0.039, η<sub>p</sub><sup>2</sup> = 0.11 (no such interaction was found in the familiarization trial, F(1,36) = 1.55, p = 0.221). Planned comparisons revealed that infants in the two-item condition looked significantly longer at the unequal (M = 22.77, SD = 9.09) than at the equal (M = 15.37, SD = 4.79) event, F(1,36) = 4.83, p = 0.035, d = 1.01, whereas infants in the four-item condition looked about equally at the unequal (M = 17.76, SD = 6.76) and equal (M = 20.57, SD = 8.70) events, F(1,36) = 0.70, p > 0.250, d = 0.36. Wilcoxon rank-sum tests confirmed the results of the two-item (Z = 2.04, p = 0.041) and four-item (Z = −0.49, p > 0.250) conditions.
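The interaction and planned-comparison F ratios above can be approximately recovered from the four cell means and standard deviations alone. The sketch below assumes 10 infants per cell (40 infants split evenly across two conditions and two events) and treats each test as a single-degree-of-freedom contrast against the pooled error term; the helper names `mse_equal_n` and `contrast_f` are illustrative and not taken from the authors' analysis.

```python
def mse_equal_n(sds):
    """Error mean square for a between-subject design with equal cell sizes."""
    return sum(sd ** 2 for sd in sds) / len(sds)


def contrast_f(means, weights, mse, n_per_cell):
    """F(1, df_error) for a linear contrast among independent cell means."""
    estimate = sum(w * m for w, m in zip(weights, means))
    scale = sum(w ** 2 for w in weights) / n_per_cell
    return estimate ** 2 / (mse * scale)


# Cell order: two-item unequal, two-item equal, four-item unequal, four-item equal.
means = [22.77, 15.37, 17.76, 20.57]
sds = [9.09, 4.79, 6.76, 8.70]
n = 10  # assumed cell size: 40 infants over four cells

mse = mse_equal_n(sds)
f_two_item = contrast_f(means, [1, -1, 0, 0], mse, n)      # unequal vs. equal, two-item
f_four_item = contrast_f(means, [0, 0, 1, -1], mse, n)     # unequal vs. equal, four-item
f_interaction = contrast_f(means, [1, -1, -1, 1], mse, n)  # Condition x Event

print(round(f_two_item, 2), round(f_four_item, 2), round(f_interaction, 2))
```

Because the inputs are rounded to two decimals, the recomputed values can differ from the reported ones in the last digit; under the stated assumptions they land within 0.01 of F = 4.83, 0.70, and 4.59.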

Consistent with the positive findings of Experiments 1 and 2, infants in the two-item condition detected a fairness violation when one puppet received a placemat with two cookies and the other puppet received a placemat with no cookies. Moreover, consistent with the negative findings of Ziv and Sommerville (2017), infants in the four-item condition failed to detect a violation when one puppet received a placemat with three cookies while the other puppet received a placemat with one cookie. Because the experimenter's interactions with the puppets were identical in the two conditions (she simply slid one placemat toward each puppet), these diverging results most likely stemmed from the numbers of items involved in each violation: Infants were able to detect a 2:0 violation, but not a 3:1 violation. This last finding is particularly striking because the two placemats were initially positioned back-to-back at the front of the apparatus, making it easy for infants to determine via one-to-one correspondence that the back placemat had two more cookies than the front placemat. We return in the section "General Discussion" to possible reasons why infants still failed to detect this violation.

# OVERALL ANALYSES OF EXPERIMENTS 1–3

To test the robustness of our results, we conducted overall analyses of the test data from Experiments 1–3. In these analyses, we pooled the data from the experimental (Experiment 1), cover-experimental (Experiment 2), and two-item (Experiment 3) conditions into a combined-experimental condition (N = 60), and the data from the inanimate-control (Experiment 1), cover-control (Experiment 2), and four-item (Experiment 3) conditions into a combined-control condition (N = 60). As can be seen in **Figure 2**, infants in the first three conditions looked longer if shown the unequal as opposed to the equal event, suggesting that they detected a violation in the unequal event; in contrast, infants in the last three conditions tended to look equally at the two events, suggesting (at the very least) that they had no baseline preferences for asymmetrical displays or for displays depicting groups of two or three cookies.

Preliminary analyses of the test data in these combined-experimental and combined-control conditions revealed no significant interaction of condition and event with infants' sex or with which puppet the experimenter approached first (in Experiment 1, when giving a cookie; in Experiment 2, when removing a cover; and in Experiment 3, when giving a placemat), both Fs(1,112) ≤ 0.98, ps ≥ 0.250; the data were therefore collapsed across the latter two factors in the following analyses.

We first conducted an ANOVA with condition (combined-experimental or combined-control) and event (equal or unequal) as between-subject factors. This analysis yielded a significant main effect of event, F(1,116) = 4.63, p = 0.034, and a significant Condition × Event interaction, F(1,116) = 16.52, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.12.<sup>5</sup> As expected, planned comparisons revealed that infants in the combined-experimental condition looked significantly longer if shown the unequal (M = 22.81, SD = 8.33) as opposed to the equal (M = 14.66, SD = 3.86) event, F(1,116) = 19.33, p < 0.0001, d = 1.26, whereas infants in the combined-control condition looked about equally at the unequal (M = 15.39, SD = 6.52) and equal (M = 17.90, SD = 8.90) events, F(1,116) = 1.83, p = 0.179, d = −0.32. Wilcoxon rank-sum tests confirmed the results of the combined-experimental (Z = 3.96, p < 0.0001) and combined-control (Z = −0.91, p > 0.250) conditions.
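The partial eta squared values reported for the Condition × Event interactions follow directly from each F ratio and its degrees of freedom, using the standard identity (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the interactions reported so far; the function name and dictionary labels are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) for the Condition x Event interactions reported above.
interactions = {
    "Experiment 2": (6.40, 1, 36),
    "Experiment 3": (4.59, 1, 36),
    "Experiments 1-3 pooled": (16.52, 1, 116),
}

effect_sizes = {
    label: round(partial_eta_squared(*stats), 2)
    for label, stats in interactions.items()
}
print(effect_sizes)
```

The recomputed values match the reported 0.15, 0.11, and 0.12, which offers a quick consistency check when reading the analyses.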

Next, we focused on the combined-experimental condition only and examined the effects of two additional variables. The first was whether infants with and without older siblings differed in their ability to detect the violation in the unequal event. Recall that Ziv and Sommerville (2017) found that at 12–15 months, infants with siblings looked longer at a 3:1 than at a 2:2 outcome when both were displayed simultaneously, whereas infants without siblings looked about equally at the two outcomes. In the combined-experimental condition, 30 infants had one or more siblings (14 saw the unequal event and 16 saw the equal event), and 30 did not (the corresponding numbers were 16 and 14). Looking times were compared by means of an ANOVA with siblings (yes or no) and event (unequal or equal) as between-subject factors. The main effect of sibling was not significant, nor was the Sibling × Event interaction, both Fs(1,56) ≤ 1.20, ps > 0.250. The only significant effect was the main effect of event, F(1,56) = 22.76, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.29. Planned comparisons indicated that infants with siblings, F(1,56) = 15.49, p = 0.0002, d = 1.35, and infants without siblings, F(1,56) = 7.90, p = 0.007, d = 1.10, both looked significantly longer if shown the unequal as opposed to the equal event (with siblings: unequal, M = 22.54, SD = 9.01, equal, M = 13.18, SD = 3.90; without siblings: unequal, M = 23.04, SD = 7.97, equal, M = 16.36, SD = 3.14). Wilcoxon rank-sum tests confirmed the positive results obtained with the infants with siblings (Z = 3.01, p = 0.003) and without siblings (Z = 2.33, p = 0.020). Infants in the combined-experimental condition were thus able to detect the simple 2:0 violation they were shown whether they had older siblings or not.

<sup>5</sup>This interaction remained significant when the five outliers from Experiments 1–3 were included in the analyses (n = 125), F(1,121) = 10.42, p = 0.002, η<sub>p</sub><sup>2</sup> = 0.08.

The second variable we explored was age. Since infants in the combined-experimental condition varied in age from 8 to 10 months, we divided them via a median split into a younger, 8-month-old group (N = 30, range = 8 months, 1 day to 9 months, 3 days, M = 8 months, 17 days) and an older, 9-month-old group (N = 30, range = 9 months, 7 days to 10 months, 8 days, M = 9 months, 18 days). In the younger group, 18 infants saw the unequal event, and 12 infants saw the equal event; in the older group, these numbers were reversed. Our main goal here was to establish whether the younger half of our sample was as likely as the older half to detect the 2:0 violation they were shown. This analysis was identical to that above except that sibling was replaced by age (8 or 9 months) as a between-subject factor. The main effect of age was not significant, nor was the Age × Event interaction, both Fs(1,56) ≤ 0.10, ps > 0.250. Once again, only the main effect of event was significant, F(1,56) = 22.09, p < 0.0001, η<sub>p</sub><sup>2</sup> = 0.28. Planned comparisons indicated that 8-month-olds, F(1,56) = 12.62, p = 0.0008, d = 1.23, and 9-month-olds, F(1,56) = 9.58, p = 0.003, d = 1.35, both looked significantly longer if shown the unequal as opposed to the equal event (8 months: unequal, M = 22.98, SD = 9.36, equal, M = 14.24, SD = 3.76; 9 months: unequal, M = 22.55, SD = 6.87, equal, M = 14.94, SD = 4.00). Wilcoxon rank-sum tests confirmed the positive results obtained with the 8-month-olds (Z = 2.67, p = 0.008) and 9-month-olds (Z = 2.84, p = 0.005). Infants in the combined-experimental condition were thus able to detect the simple 2:0 violation they were shown whether they were 8 or 9 months of age.

# EXPERIMENT 4

As predicted by the continuity hypothesis, the 8- and 9-month-olds in Experiments 1–3 could detect a simple 2:0 fairness violation but not a more challenging 3:1 fairness violation. In Experiment 4, we began to explore whether infants younger than 8 months might also be able to detect a 2:0 violation. Four-month-olds were tested using a design similar to that of Experiment 1; half of the infants were assigned to the experimental condition (**Figure 5A**), and half to the inanimate-control condition (**Figure 5B**).

To make our events more appropriate for these very young subjects, we introduced three modifications. First, we used Elmo puppets, whose bright red color and large eyes seemed likely to capture the attention of 4-month-olds. Second, we gave infants two familiarization trials. The first served to introduce the puppets and was similar to that in Experiment 1; in the experimental condition, the puppets danced from side to side, and in the inanimate-control condition, they remained stationary. The second trial served to introduce the experimenter. During the (6-s) initial phase, she opened her window, deposited her plate of cookies on the apparatus floor, and then paused for the final phase of the trial (the puppets were absent in this trial). Third, during the final phase of the test trial in the experimental condition, the puppets moved slightly from side to side while bent over their placemats (pilot data suggested that the sudden change from moving to still Elmos seemed to be upsetting for some infants; this was not an issue in the inanimate-control condition because the Elmos were inanimate throughout the trials).

We reasoned that if 4-month-olds already possess an expectation of fairness and can detect simple 2:0 fairness violations, then infants in the experimental condition should look significantly longer if shown the unequal as opposed to the equal event, whereas infants in the inanimate-control condition should look about equally at the two events, as in Experiment 1.

# Materials and Methods

#### Participants

Participants were 40 healthy term 4-month-olds, 20 male (range = 3 months, 21 days to 5 months, 18 days, M = 4 months, 21 days). Another 10 infants were excluded, 6 because they looked for the maximum time allowed in the test trial (4 were in the experimental condition, 2 were in the inanimate-control condition, and all saw the equal event), 2 because they were distracted or inattentive, and 2 because their test looking times were over 3 standard deviations from the condition mean (both were in the experimental condition and saw the equal event). Half of the infants were randomly assigned to the experimental condition and half to the inanimate-control condition; within each condition, half of the infants saw the equal event, and half saw the unequal event.

#### Apparatus, Stimuli, and Procedure

The apparatus and stimuli were identical to those in Experiment 1 except that the penguin puppets were replaced by two identical Elmo puppets (about 25 cm × 25 cm × 10 cm at the largest points). Each puppet was made of red furry fabric and had a large head, large black and white eyes, and an orange nose. The procedure was similar to that in Experiment 1, with two exceptions. First, as noted earlier, infants received two familiarization trials, one to introduce the puppets and then one to introduce the experimenter. Second, a slightly different look-away criterion was used to end the final phase of each trial. Each trial now ended when the infant looked away for 1 cumulative second, as opposed to 2 cumulative seconds. This adjustment was necessary because infants tended to look more continuously at the events, either because of their very young age, because they found the Elmo puppets highly eye-catching, or both.

Infants were highly attentive during the initial phases of the familiarization and test trials; across conditions, they looked, on average, for 87% of each initial phase. Interobserver agreement during the final phase of each trial was calculated for all 40 infants and averaged 92% per trial per infant. Finally, preliminary analyses of the test data revealed no significant interaction of condition and event with infant's sex or with which puppet received the first cookie, both Fs(1,32) ≤ 1.55, ps ≥ 0.222; the data were therefore collapsed across these latter two factors in subsequent analyses.

# Results and Discussion

Looking times during the final phase of the first familiarization trial (which introduced the puppets) were subjected to an ANOVA with condition (experimental or inanimate-control) as a between-subject factor. This effect was not significant, F(1,38) = 0.35, p > 0.250, suggesting that infants in the experimental (M = 22.17, SD = 15.67) and inanimate-control (M = 25.14, SD = 16.32) conditions tended to look equally at the puppets. Looking times during the second familiarization trial (which introduced the experimenter and her tray of cookies) were analyzed in the same manner. The main effect of condition was now significant, F(1,38) = 5.83, p = 0.021, indicating that infants in the inanimate-control condition (M = 21.28, SD = 17.75) looked significantly longer than those in the experimental condition (M = 11.29, SD = 5.21). It could be that infants in the inanimate-control condition found this trial more interesting because it involved an animate individual (recall that they had seen only the inanimate puppets in the previous trial), or it could be that infants in the experimental condition found this trial less interesting because the animated puppets introduced in the first trial were now absent. Either way, this finding did not affect our interpretation of the test trial and is not discussed further.

Looking times during the final phase of the test trial (**Figure 2**) were subjected to an ANOVA with condition (experimental or inanimate-control) and test event (unequal or equal) as between-subject factors. The analysis yielded a significant main effect of event, F(1,36) = 6.25, p = 0.017, as well as a significant Condition × Event interaction, F(1,36) = 5.45, p = 0.025, η<sub>p</sub><sup>2</sup> = 0.13 (no such interaction was found in either the first or the second familiarization trial, both Fs(1,36) ≤ 0.16, ps ≥ 0.250). Planned comparisons revealed that infants in the experimental condition looked significantly longer at the unequal (M = 25.05, SD = 9.50) than at the equal (M = 12.02, SD = 4.19) event, F(1,36) = 11.69, p = 0.002, d = 1.77, whereas infants in the inanimate-control condition looked about equally at the unequal (M = 16.02, SD = 9.83) and equal (M = 15.57, SD = 9.28) events, F(1,36) = 0.01, p > 0.250, d = 0.04. Wilcoxon rank-sum tests confirmed the results of the experimental (Z = 3.14, p = 0.002) and inanimate-control (Z = 0.00, p > 0.250) conditions.

Next, we compared the test responses of the 4-month-olds in Experiment 4 to those of the 9-month-olds in Experiment 1, using an ANOVA similar to that above but with age as an added between-subject factor. The main effect of age was not significant, nor was the Age × Condition × Event interaction, both Fs(1,72) ≤ 0.04, ps ≥ 0.250, suggesting that the two age groups responded similarly to the test events they were shown. Because slightly different procedures were used at the two ages, however, these negative results should be interpreted with caution.

Next, we compared the test responses of 4-month-olds in the experimental condition (N = 20) who had (9) or did not have (11) an older sibling. The data were subjected to an ANOVA with sibling (yes or no) and event (unequal or equal) as between-subject factors. Neither the main effect of sibling nor the Sibling × Event interaction was significant, both Fs(1,16) ≤ 1.24, ps ≥ 0.250, suggesting that infants responded similarly whether or not they had an older sibling. Given the small numbers of participants involved, however, these results should again be interpreted with caution.

Like the 9-month-olds in Experiment 1, the 4-month-olds in Experiment 4 expected the experimenter to divide the two cookies equally between the two animated puppets, and this effect was eliminated when the puppets were inanimate. These results provide the first experimental demonstration that sensitivity to fairness can already be observed, at least under simple conditions, in the first half-year of life.

# MINI META-ANALYSIS OF EXPERIMENTS 1–4

Finally, we conducted a mini meta-analysis of the experimental data in our four experiments (i.e., the data from the experimental, cover-experimental, two-item, and experimental conditions in Experiments 1–4, respectively). There was no evidence of heterogeneity of effects across experiments (Cochran's Q tests, ps > 0.10), so a fixed-effects meta-analytic model was used. The meta-analytic estimates indicated that across experiments, infants looked significantly longer at the unequal than at the equal event, d+ = 1.34 [0.85, 1.82], z = 5.39, p < 0.001. The Rosenthal Fail-Safe tests suggested that 50 additional failed studies would be required to disprove this effect.
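For readers unfamiliar with fixed-effects pooling, the sketch below shows one common way to combine standardized mean differences: inverse-variance weighting with Cochran's Q as the heterogeneity statistic. It is an illustration under stated assumptions, not a reconstruction of the authors' computation. The variance formula is the usual large-sample approximation for Cohen's d, a cell size of 10 infants per event is assumed, and only the three experimental-condition effect sizes reported in this section (Experiments 2–4) are used, so the output is not expected to reproduce the reported d+.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample variance of Cohen's d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooling.

    Returns the pooled estimate, its standard error, the z statistic,
    and Cochran's Q (df = number of studies - 1).
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    return pooled, se, pooled / se, q


# Effect sizes for the experimental conditions of Experiments 2, 3, and 4.
# Experiment 1 is omitted because its d is not given in this section;
# n = 10 infants per event is an assumption based on the stated design.
effects = [1.28, 1.01, 1.77]
variances = [d_variance(d, 10, 10) for d in effects]

pooled, se, z, q = fixed_effect(effects, variances)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
print(f"pooled d = {pooled:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], z = {z:.2f}, Q = {q:.2f}")
```

A Q value that is small relative to its degrees of freedom is what licenses a fixed-effects rather than a random-effects model, which is the logic the paragraph above appeals to.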

# GENERAL DISCUSSION

The present experiments yielded five findings. First, at both 9 months (Experiments 1–3) and 4 months (Experiment 4), infants expected an experimenter to divide two cookies equally (1:1) between two similar animated puppets, and they detected a violation when she divided them unequally (2:0) instead. Second, infants demonstrated this expectation whether the experimenter gave the cookies one by one to the puppets (Experiments 1, 2, and 4) or first separated them onto two placemats and then gave each puppet a placemat (Experiment 3). Third, infants held no particular expectation about the experimenter's actions when the puppets were inanimate (Experiments 1 and 4) or when the experimenter did not distribute the cookies but simply lifted covers to reveal them (Experiment 2). Fourth, at both 9 months (Experiments 1–3) and 4 months (Experiment 4), infants with or without older siblings were equally likely to detect the violation in the 2:0 outcome. Finally, when the number of cookies distributed was increased from 2 to 4, 9-month-olds failed to detect the violation in the 3:1 outcome (Experiment 3). Together, these results confirm and extend prior findings that 10- to 19-month-olds detected a violation when shown a 2:0 outcome (Sloane et al., 2012; Meristo et al., 2016; Bian et al., 2018), that 12-month-olds failed to detect a violation when shown a 3:1 outcome (Sommerville et al., 2013; Tatone and Csibra, 2018), and that 9- and 6-month-olds failed to look preferentially at a 3:1 over a 2:2 outcome when both were presented simultaneously (Ziv and Sommerville, 2017).

The evidence reported here that 9- and 4-month-olds consistently detected a 2:0 violation provides strong support for the suggestion, from researchers across the social sciences, that the "first draft" (Graham et al., 2013) of human moral cognition includes an abstract expectation of fairness (e.g., Shweder et al., 1997; Dawes et al., 2007; Jackendoff, 2007; Premack, 2007; Rai and Fiske, 2011; Sloane et al., 2012; Baumard et al., 2013; Graham et al., 2013; Baillargeon et al., 2015; Meristo et al., 2016; Bian et al., 2018; Buyukozer Dawkins et al., in press). Such an expectation might have gradually evolved in our species in part because it represents a cost-effective strategy for reducing the likelihood of future negative interactions (e.g., Baumard et al., 2013; Cosmides and Tooby, 2013; Bian et al., 2018). By adhering to fairness, a distributor avoids having to work out in each and every resource-allocation situation that a recipient is likely to be resentful if offered, for no obvious reason, less than an equal share of a windfall resource. Over evolutionary time, a genuine expectation of fairness could have emerged that bypassed these mentalizing efforts, reduced errors, and ultimately benefited the distributor as well as the recipients. From this perspective, it would make sense that infants' concern for fairness would be highly abstract and would be brought to bear whenever they saw a distributor divide windfall resources between two similar recipients, be they two women, two speaking puppets, or two animated geometric figures with eyes (e.g., Schmidt and Sommerville, 2011; Sloane et al., 2012; Meristo et al., 2016).

At the same time, however, our findings and those of Sommerville and her colleagues (e.g., Sommerville et al., 2013; Ziv and Sommerville, 2017; see also Tatone and Csibra, 2018) make clear that there are sharp limits in young infants' ability to detect fairness violations. In particular, 9-month-olds are able to detect 2:0 violations, but not 3:1 violations, even when the experimenter's actions toward the recipients are identical (i.e., the experimenter slides a placemat toward each recipient). How can we explain these differential results? There are at least three possibilities.

First, it may be that young infants are able to process distributions that involve two items, but not distributions that involve four or more items, due to limitations in their information-processing capacity (e.g., Diamond, 2013). Thus, when there are two recipients and two items, infants can form an expectation, via simple one-to-one correspondence, about how many items each recipient will get (1:1), and they can compare this expectation to the observed distribution (1:1 or 2:0). When there are four or more items, however, this whole process becomes overwhelming, leading to equal looking times at equal and unequal distributions.

Second, it may be that young infants are able to detect qualitative violations, in which one recipient gets something and the other gets nothing (e.g., a 2:0 or a 4:0 violation), but not quantitative violations, in which both recipients get something but in differing amounts (e.g., a 3:1 or a 7:1 violation). For example, infants' representations of resource-allocation events could at first be very sparse: They might simply represent whether each recipient gets any items, rather than how many items each recipient gets. Such meager representations, when interpreted in light of infants' expectation of fairness, would enable them to detect qualitative violations (i.e., something vs. nothing), but not quantitative violations (something vs. something). As a point of comparison, the physical-reasoning literature presents many examples of event representations that are initially very sparse and become progressively richer as infants identify relevant features that help better predict outcomes (for reviews, see Baillargeon et al., 2009; Stavans et al., 2018).

Finally, it may be that young infants can detect quantitative violations, but only when the two amounts allocated are markedly different. In this view, infants would succeed when the numerical distance between the two amounts is larger (e.g., a 7:1 violation), but not when it is smaller (e.g., a 3:1 violation), most likely due to limitations in their numerical cognition. With experience, infants would come to more precisely represent the amounts allocated to the two recipients and hence would begin to detect a deviation from fairness even in a 3:1 violation.

Which (if any) of the preceding possibilities might be correct? Can prior findings on when infants first begin to detect 3:1 violations help us distinguish between them? It is not clear that this is the case. In particular, consider the finding that 12- to 15-month-olds with older siblings looked significantly longer at a 3:1 than at a 2:2 outcome when the two were presented simultaneously (Ziv and Sommerville, 2017). This finding could be taken to suggest that, due to greater opportunities to represent and compare allocations in everyday life, infants with older siblings (a) are better at processing distributions with more than two items, (b) are faster at learning to attend not only to whether recipients get something but also to how many items they get, and/or (c) are more adept at precisely representing and comparing how many items recipients get. Future research can bear on these issues by examining whether young infants would succeed in detecting more extreme quantitative violations, such as a 5:1 or 7:1 violation. If so, such results would tend to cast doubt on the first and second possibilities listed above and to support the third possibility instead. Such results would also dovetail well with recent findings that preschoolers sometimes perform poorly in first- and third-party fairness tasks due to cognitive limitations in their ability to encode and remember exact numerical information (e.g., Chernyak et al., 2016, 2019; Chernyak and Blake, 2017).
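To make the contrast between the three possibilities concrete, the following Python sketch encodes each one as a simple rule and lists which outcomes it predicts infants would detect. It is an illustration only and not part of the reported analyses: the function names, the item limit of two, the use of a ratio to express "markedly different" amounts, and the ratio threshold of 4 are all assumptions chosen so that each rule reproduces the observed pattern (success with 2:0, failure with 3:1).

```python
# Illustrative sketch: three candidate accounts expressed as detection rules.
# ITEM_LIMIT and RATIO_THRESHOLD are assumed values, not estimates from the data.

ITEM_LIMIT = 2         # assumed maximum number of items infants can process (account 1)
RATIO_THRESHOLD = 4.0  # assumed minimum larger:smaller ratio for detection (account 3)


def capacity_account(a, b):
    """Account 1: an unequal split is detected only if few items are involved."""
    return a != b and (a + b) <= ITEM_LIMIT


def qualitative_account(a, b):
    """Account 2: an unequal split is detected only if one recipient gets nothing."""
    return a != b and min(a, b) == 0


def distance_account(a, b):
    """Account 3: an unequal split is detected only if the amounts differ markedly."""
    if a == b:
        return False
    smaller, larger = sorted((a, b))
    if smaller == 0:
        # Something vs. nothing is treated here as maximally different (assumption).
        return True
    return larger / smaller >= RATIO_THRESHOLD


def predict(a, b):
    """Return each account's prediction (True = violation detected) for an a:b outcome."""
    return {
        "capacity": capacity_account(a, b),
        "qualitative": qualitative_account(a, b),
        "distance": distance_account(a, b),
    }


if __name__ == "__main__":
    for outcome in [(2, 0), (3, 1), (4, 0), (5, 1), (7, 1)]:
        print(outcome, predict(*outcome))
```

Under these assumptions, all three rules agree on 2:0 and 3:1, which is why the present data cannot separate them. They diverge only on untested outcomes: 5:1 and 7:1 separate the third rule from the first two, and 4:0 separates the first rule from the second.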

# Prior Findings With Young Infants

As noted above, the positive results obtained with 9- and 4-month-olds in the present allocation-outcome tasks confirm and extend those previously obtained with 10-month-olds (Meristo et al., 2016). The present results also fit well with the finding that 10-month-olds (a) looked significantly longer when an informed bystander rewarded an unfair as opposed to a fair distributor, but (b) looked about equally when an uninformed bystander (whose view was blocked during the distributors' actions) rewarded either distributor (Meristo and Surian, 2013). At the same time, however, our results and those just cited are inconsistent with a few other findings with young infants mentioned in the section "Introduction."

One such finding was that after watching a fair and an unfair distributor divide two items between two recipients, 10-month-olds did not show a preference for the fair distributor (Geraci and Surian, 2011). Given the extensive evidence that infants in the first year of life prefer individuals who act positively over individuals who act negatively (e.g., Hamlin et al., 2007; Hamlin and Wynn, 2011; Hamlin, 2013a), it is unlikely that infants failed to prefer the fair distributor because they were too young to show such affiliative preferences. Rather, it is more likely that details about the task made it too difficult for young infants to process. In particular, the task involved five kinds of animal characters. To start, a bear or a lion (the distributor) stood alone at the center of the computer monitor, near two allocation items. Next, a chicken (an observer) entered the scene, brought the items closer to the distributor, and then rested at the bottom of the monitor. Next, a donkey and a cow (the recipients) entered one at a time and took positions in the top two corners of the monitor. Finally, the distributor divided the two items between the recipients, either equally (e.g., the bear) or unequally (e.g., the lion). Given this fairly complex cast of characters, infants might simply have had difficulty remembering who played what role in the events, due to their limited information-processing capacity. Future research can examine whether young infants might be more likely to succeed if shown simpler events involving a fair distributor (e.g., a bear), an unfair distributor (e.g., a lion), and two similar recipients (e.g., two donkeys). Given the present results, we would predict that even young infants would prefer the fair over the unfair distributor.

The other inconsistent findings were that after watching a fair and an unfair distributor divide two items between two similar recipients, 10-month-olds looked significantly longer when a newcomer either rewarded or punished the unfair as opposed to the fair distributor (Meristo and Surian, 2013, 2014). One possible explanation for these results is that because infants could form no particular expectations about the newcomer's actions (recall that the newcomer was entirely absent during the distributor's actions), their responses were guided primarily by a vigilance or negativity bias (e.g., Kinzler and Shutts, 2008; Vaish et al., 2008; Baltazar et al., 2012; DesChamps et al., 2016). Specifically, infants looked longer whenever the newcomer approached the unfair distributor, as though they were interested in monitoring and learning more about that distributor.

# CONCLUSION

In four experiments using allocation-outcome tasks, 9- and 4-month-olds detected a violation when shown an unfair 2:0 outcome. In contrast, 9-month-olds failed to detect a violation when shown an unfair 3:1 outcome. Together, these results support claims that an abstract expectation of fairness is a part of the basic structure of human moral cognition, but they also point to sharp limitations in young infants' ability to detect deviations from fairness. The present results thus pave the way for future investigations of how numerical accuracy and other factors may contribute to the development of early expectations about fairness in infancy and beyond.

# AUTHOR CONTRIBUTIONS

MBD, SS, and RB designed the experiments. MBD and SS performed the experiments. MBD analyzed the data. MBD and RB wrote the manuscript, with edits by SS.

# FUNDING

This research was made possible through a grant (52034) from the John Templeton Foundation to RB and a Graduate Research Fellowship (1746047) from the National Science Foundation to MBD. The opinions expressed in this article are those of the authors and do not necessarily reflect the views of the John Templeton Foundation or the National Science Foundation.

# ACKNOWLEDGMENTS

We thank Lin Bian, Cindy Fisher, and Fransisca Ting for helpful comments and suggestions, the staff of the University of Illinois Infant Cognition Laboratory for their help with the data collection, and the parents and infants who participated in the research.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00116/full#supplementary-material

TABLE S1 | Participant information and looking-time data for Experiments 1 through 4.




**Conflict of Interest Statement:** The authors state that they have no affiliations or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Copyright © 2019 Buyukozer Dawkins, Sloane and Baillargeon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Do Dogs Prefer Helpers in an Infant-Based Social Evaluation Task?

Katherine McAuliffe<sup>1,2</sup>\*, Michael Bogese<sup>1,2</sup>, Linda W. Chang<sup>2,3</sup>, Caitlin E. Andrews<sup>2,4,5</sup>, Tanya Mayer<sup>2</sup>, Aja Faranda<sup>2</sup>, J. Kiley Hamlin<sup>6</sup> and Laurie R. Santos<sup>2</sup>

<sup>1</sup> Department of Psychology, Boston College, Chestnut Hill, MA, United States, <sup>2</sup> Department of Psychology, Yale University, New Haven, CT, United States, <sup>3</sup> Department of Psychology, Harvard University, Cambridge, MA, United States, <sup>4</sup> Department of Zoology, University of Cambridge, Cambridge, United Kingdom, <sup>5</sup> Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States, <sup>6</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada

Social evaluative abilities emerge in human infancy, highlighting their importance in shaping our species' early understanding of the social world. Remarkably, infants show social evaluation in relatively abstract contexts: for instance, preferring a wooden shape that helps another shape in a puppet show over a shape that hinders another character (Hamlin et al., 2007). Here we ask whether these abstract social evaluative abilities are shared with other species. Domestic dogs provide an ideal animal species in which to address this question because this species cooperates extensively with conspecifics and humans and may thus benefit from a more general ability to socially evaluate prospective partners. We tested dogs on a social evaluation puppet show task originally used with human infants. Subjects watched a helpful shape aid an agent in achieving its goal and a hinderer shape prevent an agent from achieving its goal. We examined (1) whether dogs showed a preference for the helpful or hinderer shape, (2) whether dogs exhibited longer exploration of the helpful or hinderer shape, and (3) whether dogs were more likely to engage with their handlers during the helper or hinderer events. In contrast to human infants, dogs showed no preference for either the helper or the hinderer, nor were they more likely to engage with their handlers during helper or hinderer events. Dogs did spend more time exploring the hindering shape, perhaps indicating that they were puzzled by the agent's unhelpful behavior. However, this preference was moderated by a preference for one of the two shapes, regardless of role. These findings suggest that, relative to infants, dogs show weak or absent social evaluative abilities when presented with abstract events and point to constraints on dogs' abilities to evaluate others' behavior.

Keywords: social evaluation, helper, hinderer, infancy, domestic dogs, cooperation

#### Edited by:

Norbert Zmyj, Technical University Dortmund, Germany

#### Reviewed by:

Claire Holvoet, Université de Rouen, France Patrizia Piotti, Eötvös Loránd University, Hungary

> \*Correspondence: Katherine McAuliffe katherine.mcauliffe.2@bc.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 05 June 2018 Accepted: 04 March 2019 Published: 29 March 2019

#### Citation:

McAuliffe K, Bogese M, Chang LW, Andrews CE, Mayer T, Faranda A, Hamlin JK and Santos LR (2019) Do Dogs Prefer Helpers in an Infant-Based Social Evaluation Task? Front. Psychol. 10:591. doi: 10.3389/fpsyg.2019.00591

# INTRODUCTION

Social evaluation is a core part of the human moral sense: humans tend to prefer helpful individuals and avoid harmful individuals, behaviors which undoubtedly contribute to our ability to work cooperatively in large groups (Hamlin, 2013). Remarkably, some research suggests that social evaluation may be present from infancy. In a first demonstration of early-emerging social evaluation, Hamlin et al. (2007) presented 6- and 10-month-old infants with a puppet show in which an agent (a wooden shape with googly eyes) attempted, but failed, to climb a hill. The agent was either helped or hindered by another shape. In a preference task, infants preferred the helpful shape. These findings were the first to suggest that social evaluative abilities may be present from very early in life, and have now been replicated and extended numerous times (for reviews see Holvoet et al., 2016; Margoni and Surian, 2018; but see also Salvadori et al., 2015 for a failure to observe preferences for prosocial over antisocial agents in 9-month-olds). Overall, this work has led some scholars to argue that capacities for social evaluation may be part of a system of "core" knowledge, which extends to other conceptual domains (Spelke, 2000; see also Hamlin, 2013).

The finding that social evaluation is deeply rooted in ontogeny raises the question of whether it might be similarly deeply rooted in phylogeny. Do other species show signatures of social evaluation or is this ability unique to our species? Research on animals suggests that indeed social evaluation may be important to sustaining productive cooperative<sup>1</sup> relationships outside of humans. Across a range of taxa, individuals from cooperative species evaluate others based on past behavior and use this evaluation to guide their own decisions (Russell et al., 2008; Subiaul et al., 2008; Herrmann et al., 2013; Abdai and Miklósi, 2016). For instance, reef-dwelling client fish watch cleaner fish (Labroides dimidiatus) cleaning other clients and choose to approach those who behaved cooperatively (Bshary and Grutter, 2006). Similarly, chimpanzees (Pan troglodytes) recruit collaborators who have behaved cooperatively with others in previous interactions (Melis et al., 2006). More recently, social evaluation in animals has been shown to cut across taxonomic lines. For instance, coral trout (Plectropomus leopardus), a fish species which hunts collaboratively with moray eels, can quickly learn to recruit effective eel collaborators (Vail et al., 2014). Social evaluation has also been shown in more abstract contexts: using an infant-inspired paradigm involving moving shapes, recent work has shown that bottlenose dolphins (Tursiops spp) expect agents to interact with helpers (Johnson et al., 2018) while bonobos (Pan paniscus) show a reliable preference for hinderers (Krupenye and Hare, 2018).

Building on this work showing that several animal species evaluate conspecifics and other cooperators, a series of recent studies has investigated whether animals socially evaluate humans (Anderson et al., 2012, 2013; Kawai et al., 2014). For example, tufted capuchin monkeys (Cebus apella) discriminate between good and bad human partners across two contexts: they preferentially accept food from a human who has previously helped another human (Anderson et al., 2012) and from a human who has previously shown reciprocity toward another human (Anderson et al., 2013). While these results indicate that certain primate species have the ability to socially evaluate humans, they are difficult to reconcile with the natural social ecology of nonhuman primates, as nonhuman primates would not typically benefit from choosing cooperative human partners in the wild.

Domestic dogs (Canis familiaris), on the other hand, are dependent on humans for a range of benefits and so present an ecologically valid model for studying animals' evaluations of human actors. Additionally, like humans, dogs show within- and between-species cooperation (Miklósi, 2007; Kaminski and Marshall-Pescini, 2014), leading to a range of contexts in which social evaluation may be beneficial. Finally, because pet dogs and human infants grow up in the same environment, attend to human social cues, and witness similar social stimuli in their environment, many theorists have argued that dogs are a particularly useful comparison for shedding light on potentially human-unique social traits more generally (Hare and Tomasello, 2005; Topál et al., 2014; Johnston et al., 2015).

Dogs cooperate with conspecifics and with humans across a range of real and experimental contexts (Miklósi, 2007; Bräuer et al., 2012, 2013; Ostojić and Clayton, 2013; Kaminski and Marshall-Pescini, 2014). Perhaps because of this, dogs attend to several aspects of their human partners' behavior that could indicate their cooperative tendency: for instance, dogs prefer humans who are friendly (Vas et al., 2005), informative (McMahon et al., 2010), reliable (Takaoka et al., 2014), cooperatively communicative (Petter et al., 2009; Pettersson et al., 2011), winners of playful games (Rooney and Bradshaw, 2006) and, at least in some contexts, familiar (Győri et al., 2010). These studies suggest that dogs pay attention to many features of humans, which likely serves them well in their cooperative relationships with human partners.

Building on these paradigms, a suite of recent studies has begun to probe dogs' social evaluative abilities (reviewed in Abdai and Miklósi, 2016), specifically asking whether, like humans, dogs show a preference for helpful over unhelpful individuals. These studies can be categorized broadly in one of two ways. First, some studies investigate first-party evaluation contexts in which the subject dog has direct experience with a helpful or unhelpful individual (direct evaluation). A second category of studies investigates third-party evaluation (also known as 'social eavesdropping')—contexts in which the subject dog indirectly observes interactions occurring between others (indirect evaluation).

Within the category of work on direct evaluation, several studies have examined whether dogs can distinguish between helpful and unhelpful humans based on their own interactions with each person. For instance, in Carballo et al. (2015), dogs learned over trials to prefer a "generous" experimenter who would share food with them over a "selfish" experimenter who would eat the food before they could access it. In support of the idea that experience with social interactions is needed to facilitate this discrimination, adult dogs but not puppies appear to show this effect (Carballo et al., 2017). However, an open possibility is that dogs' performance in these studies has an alternative explanation: namely, that dogs simply come to associate a certain individual (here, the generous individual) with food, and it is this association that drives dogs' preferences. One recent study accounted for this alternative food association interpretation in its design. In this study, Nitzschner et al. (2012) demonstrated that dogs preferred to associate with a "nice" human (someone who behaved affectionately toward them) rather than a "mean" experimenter who ignored them. Thus, although more evidence is surely needed, dogs appear to be able to form impressions of humans with whom they have directly interacted.

<sup>1</sup>For the purpose of this paper, we use the West et al. (2007) definition of cooperation: "a behavior which provides a benefit to another individual (recipient), and which is selected for because of its beneficial effect on the recipient."

Work on dogs' evaluation in indirect contexts has generated more mixed findings (see Abdai and Miklósi, 2016). Marshall-Pescini et al. (2011) examined whether dogs socially eavesdrop on others; specifically, they tested whether dogs pay attention to nice and mean individuals when they are not the direct recipients of nice or mean behavior. In their task, dogs watched a human beggar approach a generous human from whom they received a treat and approach a selfish human who deprived them of a treat. In a choice task, dogs showed a preference for the generous over the selfish human, providing evidence that dogs can socially evaluate others in indirect contexts. However, as with the possibility of a food confound in the work described above, further work building on this paradigm suggested that dogs' preference for a "nice" human was instead a preference for a location associated with food (Freidin et al., 2013; Nitzschner et al., 2014). In line with this second interpretation, other work has shown that dogs prefer "sharing" over "nonsharing" actors, even in relatively nonsocial tasks; for instance, in which the recipient of generous or selfish behavior is a box as opposed to a human (Kundey et al., 2011). Taken together, these studies hint that dogs track and use information about who has previously been associated with food sharing, even when they are uninvolved bystanders watching interactions between other agents or objects. However, rather than reflecting a preference for prosocial behaviors, these tendencies may reflect that dogs are simply savvy about how they will most readily obtain food. Indeed, a recent study by Piotti et al. (2017) explored whether dogs are sensitive to helpfulness in a paradigm that controlled for the possibility that dogs prefer those associated with food. 
The researchers introduced a condition in which a "nice" individual who spoke in a high-pitched voice and established eye contact with the dog was not associated with food (i.e., was not helpful in showing dogs how to access food) and compared this to one in which the "nice" individual was associated with food (i.e., was helpful in showing dogs how to access food). They additionally compared these conditions to two other conditions in which the experimenter ignored the dog (ignoring but helpful, and ignoring and not helpful). They found that dogs did not show a preference for the helpful individual (ignoring niceness), nor did they show a preference for the nice individual (ignoring helpfulness), providing further evidence that dogs' social evaluative abilities in indirect contexts are importantly limited.

To our knowledge, there is only one remaining case of putative evidence for dogs' social evaluative abilities in indirect contexts (Chijiiwa et al., 2015). In this study, dogs watched their owner ask one of two people for help accessing an object in a jar. In one condition, the helper assisted each dog's owner in opening the jar while a neutral agent did nothing. In another condition, the nonhelper refused to assist each dog's owner by turning away following their request while a neutral agent did nothing. Dogs were then given a choice to approach and receive food from either the helper vs. neutral person (in the first condition) or the nonhelper vs. neutral person (in the second condition). Dogs were presented with four trials, which meant that they received food from their chosen agent before trials two, three and four. Dogs showed no preference for the helpful over neutral agent. However, they avoided the nonhelper relative to the neutral agent. These findings suggested that dogs may be able to socially evaluate in indirect contexts, at least when their owner is the target of helpful or unhelpful behavior. Additionally, these results were suggestive of a negativity bias—preferential attention to negative information—a bias exhibited by human infants (Hamlin et al., 2010) and bonobos (Krupenye and Hare, 2018). However, these results must be interpreted with caution because (1) dogs did not show an aversion to the nonhelper on the first trial (see Abdai and Miklósi, 2016) suggesting that the pattern of reinforcement between trials may have influenced their behavior and (2) there were important asymmetries in how negative vs. neutral actions were performed which may also have affected their avoidance of unhelpful agents.

Thus, although results are mixed, there is some evidence that domestic dogs and human infants show similarities in their ability to track helpful and unhelpful individuals, consistent with the possibility that individuals in both species benefit from being able to quickly evaluate prospective social partners. However, based on work conducted to date, there is a key difference in the contexts in which social evaluation has been demonstrated in infants and dogs. Specifically, human infants have been shown to engage in social evaluation in relatively abstract contexts (infants interact with shapes rather than people), suggesting that social evaluation in infants may be generalizable. By contrast, work on social evaluation in dogs has to date focused only on whether dogs are able to evaluate good and bad humans. While human-evaluation tasks are clearly ecologically valid for dogs, they leave open the question of whether dogs share human infants' ability to extract relevant social information from more abstract contexts. Answering this question will shed light on the strength and flexibility of dogs' social evaluative abilities, providing hints about the importance of these abilities for domestic dogs.

Here we address this question by adapting the original human infant paradigm from Hamlin et al. (2007) for use with domestic dogs. Dogs watched a puppet show in which an agent (a red circle with googly eyes) attempted to climb a hill and was either assisted in climbing by a helper or prevented from doing so by a hinderer shape. Previous work suggests that dogs view moving shapes as social beings (Gergely et al., 2013, 2015, 2016), and thus we were hopeful that dogs would see these shapes as social beings in our task. After seeing this puppet show, dogs were then presented with a choice task in which they could approach the helper or hinderer shape. We predicted that, like infants, dogs would show a preference for helpers. Additionally, we examined whether dogs spent more time investigating the helper or hinderer. We reasoned that dogs may spend longer investigating helpers, if they did indeed show a preference for them. However, we also thought it possible that dogs would spend longer investigating hinderers, consistent with existing evidence that negative social information may be particularly salient to humans (Hamlin et al., 2010), bonobos (Krupenye and Hare, 2018) and possibly to dogs (Chijiiwa et al., 2015). Finally, we examined the number of times that dogs engaged with their handlers during presentations of the helping and hindering events; that is, whether they looked at their handlers or otherwise attempted to interact with them, for instance by looking back at, nuzzling, or putting their head on their handler's lap. Looking back at humans is particularly interesting, as several studies have used this behavior as an indicator that dogs are attempting to engage humans in helping them solve a problem (Miklósi et al., 2003). Here we expected dogs to differentiate between helping and hindering events, but did not have a strong prediction about the directionality of this effect.
If dogs showed a strong preference for helpers, they may find helping events more engaging and thus engage with their handlers more while watching helping. In contrast, dogs may find hindering events surprising or unsettling and may thus engage more in response to hindering. Our main aim in designing this study was to provide as close a replication to existing infant work as possible, allowing for a valid comparison of the social evaluative abilities of domestic dogs and infants. We also wished to contribute to the existing literature on dogs' social evaluative abilities, which is quite mixed (Abdai and Miklósi, 2016), by testing dogs' evaluation of helpers and hinderers in a non-food context, thereby removing a factor that has complicated interpretations from past designs.

# METHODS

# Subjects and Design

We tested 27 dogs (15 females; Mean age = 6.45 years, Standard deviation = 2.81, Range = 1.70–11.87) at the Canine Cognition Center at Yale University. Dogs were of varying breeds (see **Table S1** for breed information). Four additional dogs were tested but excluded due to failure to make a choice within the choice interval (3) and because their handler released them before the choice presentation had been completed (1). This study was conducted after piloting different versions of the puppet show to bring the method in line with infant protocols. Two of the dogs in our final sample participated in earlier versions of the study. These subjects were tested with different stimuli and the interval between sessions was nearly 2 years. Our goal was to test as many subjects as possible (with a maximum of 40) in the time that our main experimenter, who had been trained over several months, was available to run the puppet show. Our final sample of 27 is consistent with similar work on infants (Hamlin et al., 2007 tested 28 6- and 10-month-old infants in Experiment 1).

We employed a within-subject design. All dogs were presented with four events (two helping events and two hindering events). Events were presented in alternating order and starting event was roughly counterbalanced across subjects (16 dogs saw the helper event first).

# Set-Up

Dogs were tested in a small room (6.5 × 12.5 feet; **Figure 1**), accompanied by their guardians who handled them throughout the experimental session. Guardians sat in a chair with their back to the door and were instructed to position the dog roughly in the middle of their legs. To assist with positioning, a black rectangle was marked on the floor with black tape (**Figure 1**). A video camera placed behind the puppet show stage recorded the dog and the guardian. Additionally, a ceiling-mounted camera captured a bird's-eye view of experimental sessions as depicted in **Figure 1**. See **Figure S2** for a more detailed diagram of room measurements.

The puppet show stage was created using a small table, foamcore, duct tape, and a black shower curtain hung from a rod. The shapes were made of foamcore and wood so that they would make a noise when brought into contact (see **Figure S1** for photograph of shapes). Each shape was covered with duct tape for coloration. We chose to use blue and yellow colors for two reasons. First, these were the colors used in Hamlin et al. (2007). Second, previous research has shown that dogs can tell them apart (Neitz et al., 1989; Jacobs et al., 1993; see **Figure S4** for shapes from a dog's eye view; Pongrácz et al., 2017). Stripes were placed on the target agent in order to align it with the upward angle on the hill and to further differentiate the agent from the helpful and hindering shapes. The agent's eyes were glued to gaze toward the top of the hill to emphasize its goal of climbing upwards, known to be critical for infant social evaluation (Hamlin, 2015).

# Procedure

Our procedure was modeled after Hamlin et al. (2007). Dogs watched a puppet show depicting a "helper" shape helping an agent achieve its goal of climbing a hill and a "hinderer" shape preventing an agent from achieving its goal of climbing a hill (**Figure 1** and **Video S1**).

Before watching the puppet show, dogs were brought into the testing room and given a few moments to acclimatize to the room. However, they were prevented from exploring the puppet show stage and the area behind it. During this period, the experimenters explained the task to the dog's handler and asked them to keep their eyes closed for the duration of the puppet show. This was done so that handlers would not know which shape was the helper and which was the hinderer, thus preventing cueing during the choice period. Additionally, handlers were asked to try their best to keep dogs positioned roughly in the middle of their legs and oriented toward the show.

# The Puppet Show

To conduct the puppet show, an experimenter crouched under the table, behind the stage (**Figure 1**) and controlled the shapes using short wooden dowels (see **Figure S3** for detailed diagram and measurements of hill). When the dog was in position, the experimenter opened the curtains, at which point the dog saw the agent resting at the bottom of the hill. A squeak sound was made using a rubber squeaker and the agent was moved slightly (rotated to and fro) to attract the dog's attention. The dog then saw the agent attempting, but failing, to climb the hill. They saw two complete attempts and failures. On the third attempt, either the helper or hinderer appeared, depending on event type. In Helper events, the second shape (either a yellow triangle or a blue square) appeared at the bottom of the hill and pushed the red circle up the hill, allowing the agent to achieve its goal. In Hinderer events, the second shape appeared at the top of the hill and aggressively (with exaggerated, forceful motions) pushed the agent down the hill, preventing it from achieving its goal. At the end of the scene, the agent remained still for 10 s to allow the dog to look.

After this period, the curtain was closed and the next scene was initiated with another squeak and shape "wiggle" to orient the dog.

Informed consent was obtained from the depicted individuals for the publication of these images.

During the show, timing was controlled so that each attempted ascent by the red circle was approximately 2 s long, with a rapid 1-s descent over the same distance to simulate "falling" down the hill. The puppet would "rest" for 1 s before attempting to climb the hill again. The agent's climb speed was inversely related to its position on the hill: as the agent climbed, its ascent slowed. In each helping and hindering event, the second shape would push the red circle twice, moving slightly backwards after the first push to show effort. The agent would pause when not in contact with the helper or hinderer between interactions to emphasize their role in pushing or aiding the agent on the hill. After each event, the agent would pause at the base or top of the hill.

Dogs saw four events in total: two helping and two hindering events. The helper and hinderer shapes were roughly counterbalanced across subjects (blue square was helper for N = 16 subjects, yellow triangle was helper for N = 11 subjects). Shape was not perfectly counterbalanced because of exclusions and because our original counterbalancing was created for a maximum sample of N = 40 dogs.

# Choice Measure

During the two helping and two hindering events, the second experimenter was turned around at the back of the room and was blind to the identity of the helper and hinderer shapes. After the dog had seen the puppet show, the second experimenter approached the center of the room, called the dog's name to capture their attention and slowly (and simultaneously) placed the two shapes equidistant from the dog. The shapes were presented in clear plastic domes so that dogs could not mouth them. The domes were positioned exactly 27.25 inches from each other and each dome was placed 35 inches from the dog. To ensure consistency in placement across sessions, the dome positions were marked on the floor in black tape (see **Figure 1**; see **Figure S2** for detailed measurements of the choice area). After placement, the second experimenter backed away and gave the handler a cue to release the dog. If the dog did not look at both domes, the second experimenter tapped the domes to attract the dog's attention. By tapping on the shapes, either with equal force or with slightly more force on one or the other (the one the dog had not seen), we tried to make sure that the dog saw both shapes before moving on to the next part of the procedure. The second experimenter then called the dog's name to center their attention, backed away and gave the handler a cue to release the dog. Handlers were invited to open their eyes for the choice phase of the task. We counterbalanced whether the helper or hinderer was presented on the right or left.

Dogs had a period of 30 s to make a choice and to explore the two shapes. Choices were coded when the dog had one or both paws in or on the choice area, which was demarcated with black tape for ease of live and video coding. After 30 s, the session was terminated. The dog was then led out of the testing room and the handler was debriefed.

# Coding and Analysis

Choice data (specifically, whether the dog chose the helper or hinderer) were live coded by the second experimenter, who was blind to which shape was the helper or hinderer. In addition to a live coder, we recruited a second coder to watch and code video-recorded sessions. Sessions were coded from the birds-eye view videos that were captured by a ceiling-mounted camera. However, videos from the camera inside the room were referred to if visual access was occluded in the birds-eye view videos. Our video coder watched all sessions for (1) dog attention; (2) experimenter error; (3) handler error. Attention varied across events, but all included subjects saw at least one of the helping and one of the hindering events. Within events, we ensured that they oriented toward the stage either during the "squeak" or when the shapes first appeared. During the choice phase, all included dogs either oriented toward the second experimenter or had a chance to see both shapes before making a choice. Our video coder also recorded any instances of handler engagement by the dogs. Engagement was coded when dogs turned their heads to look at their handler or otherwise interacted with them (e.g., nuzzling). After watching videos to code for these variables, our video coder re-watched all videos and coded for shape exploration time. Dogs were considered to be exploring the shapes when they were in close proximity to one of the domed shapes. This criterion was met when the dog was directed toward and touching or close to touching (within a few inches of) the dome or base. Pawing was counted within the exploration time. Please see **Table S3** for a summary of our dependent measures of interest.

Following video coding, we recruited a third coder to serve as a reliability coder. Our reliability coder coded videos of the choice procedure, which included only the presentation of shapes and the dog's choice (i.e., she did not watch the demonstration phase and was thus blind to condition). Thus, for almost all sessions, we had four sources of data for dog choices: live coding from our "live coder," video coding and exploration coding from our "video coder" and reliability coding from our "reliability coder." In the only exception, the live coder failed to note the dog's choice on the live coding sheet, and thus we had only three sources of data for choice coding. In our sample of 27 dogs, there were only four cases of disagreement across these sources of data. In these cases, we relied on consensus across coding sources (e.g., if three sources reported a "square" choice and one reported a "triangle" choice, we recorded "square"). Please see **Table S4** for a summary of our choice data sources.
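The consensus rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure as implemented: the function name `consensus_choice` is hypothetical, missing reports are represented as `None`, and ties return `None` because the text does not state how a tie would have been resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_choice(reports: Sequence[Optional[str]]) -> Optional[str]:
    """Return the choice reported by the largest number of coding sources.

    Missing reports (None) are ignored. A tie returns None; this is an
    assumption of the sketch, since the text does not describe tie handling.
    """
    counts = Counter(r for r in reports if r is not None)
    if not counts:
        return None
    ranked = counts.most_common()  # sorted by count, largest first
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single majority value
    return ranked[0][0]


# Hypothetical session: three sources report "square", one reports "triangle".
print(consensus_choice(["square", "square", "triangle", "square"]))
```

A session with a missing live-coding entry would simply pass `None` for that source, so the remaining three sources decide the recorded choice.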

Analyses were conducted in R version 3.3.2 (R Core Team, 2016). We examined three dependent measures of interest. First, we examined whether dogs were more likely to choose the helper or hinderer using two-tailed binomial tests, which compared dogs' choices of the helper shape to a 50% probability of choosing the helper due to chance. Second, we tested whether dogs spent longer exploring the helper or hinderer shape. To test this, we first employed a two-tailed paired t-test. We tested for normality by examining a quantile-quantile plot, which displays the correlation between our sample distribution (differences) and the normal distribution. Because the majority of our points fell along the 45° reference line, we considered our data to meet the assumptions of a paired t-test. We then ran a linear mixed model, which allowed us to examine interactions of interest while controlling for repeated exploration measures within dog. Subject identity was fit as a random intercept in our mixed model. Third, we tested whether dogs showed more handler engagement during Helper or Hinderer events using generalized linear mixed models (GLMMs) with a logit link function, with the presence or absence of engagement (yes = 1, no = 0) as our dependent measure. Again, we included subject identity as a random intercept to control for repeated measures within subject. For both our linear and generalized linear models, we assessed the importance of predictors by including them in a full model and comparing the model with a predictor of interest to one without the term of interest. Model comparisons were conducted using Likelihood Ratio Tests (LRTs) using the command "drop1." Models were fit using package "lme4" (Bates et al., 2012, 2015).
Across all our analyses, we additionally explored whether dogs showed a consistent side bias (e.g., a preference for the object presented on the right) and whether they preferred one of the shapes over the other (i.e., a preference for the yellow triangle or the blue square).

# RESULTS

# Choice: Did Dogs Preferentially Approach Helpers?

Fifteen of our 27 dogs chose to approach the helper shape first, which did not differ from chance (**Figure 2**; two-tailed binomial test, p = 0.701). Dogs were no more likely to approach the square than the triangle (10 approached the square first, binomial test, p = 0.248). Choice data thus suggest that dogs did not show a preference for the helper shape, nor did they show a preference for the square or the triangle. However, we did see a significant preference for shapes presented on the dogs' left side: 20 of the 27 dogs approached the shape that was presented on the dog's left side (binomial test, p = 0.019).
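As a worked check of the counts reported above, the following Python sketch computes a two-tailed exact binomial test against a chance level of 0.5. It is not the authors' R code. The function name `binom_two_sided` is hypothetical, and the two-sided rule used here (summing the probability of every outcome that is no more likely than the observed one) is one common convention, assumed for this sketch.

```python
from math import comb


def binom_two_sided(successes: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes whose probability does not
    exceed that of the observed count (small tolerance for ties).
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Counts taken from the text: 27 dogs in total.
for label, k in [("helper first", 15), ("square first", 10), ("left side first", 20)]:
    print(f"{label}: {k}/27, p = {binom_two_sided(k, 27):.3f}")
```

Under these assumptions the three p-values round to 0.701, 0.248, and 0.019, matching the values reported in this section.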

# Exploration: Did Dogs Preferentially Explore Helpers?

To examine whether dogs spent more time exploring the helper or hinderer shape, we tallied the total amount of time dogs spent in proximity to one shape or the other and compared them with a two-tailed paired t-test. We found that dogs spent more time exploring the hinderer than the helper (t = −2.27, df = 26, p = 0.032). There was no difference in dogs' exploration of the shape placed on the right or the left (p = 0.4). However, dogs spent longer exploring the triangle than the square (t = −3.5, df = 26, p = 0.002).
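For readers less familiar with the test, the paired t statistic used here has the standard form below. The symbols are introduced for illustration only. In particular, defining each dog's difference score as helper time minus hinderer time is an assumption; it is consistent with the negative t value accompanying longer exploration of the hinderer, but the text does not state the direction of subtraction.

$$
d_i = x_i^{\text{helper}} - x_i^{\text{hinderer}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_d/\sqrt{n}}
$$

With n = 27 dogs, the statistic is referred to a t distribution with n − 1 = 26 degrees of freedom, which matches the df reported above.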

To understand whether dogs' preferential exploration of the hinderer could be explained by their preference for the triangle, we conducted a linear mixed model with exploration time fit as a function of role (helper or hinderer) and shape (was the helper shape the square or triangle) and the interaction between these two terms. We found that the interaction between role and shape was significant (LRT, χ<sup>2</sup><sub>1</sub> = 9.55, p = 0.002). As **Figure 3** shows, this interaction was due to the fact that dogs showed greater exploration of the hinderer in cases in which the triangle was the hinderer (i.e., in which the square was the helper). This same exploratory preference was not seen in cases in which the triangle was the helper, although we had fewer of these cases due to the sampling imbalance mentioned above. To test whether side influenced dogs' exploration, we reran our model with side (the side on which the helper was presented: left or right) entered as a control variable. Including this term did not change our results (LRT, χ<sup>2</sup><sub>1</sub> = 9.55, p = 0.002), nor did its inclusion improve model fit (LRT, χ<sup>2</sup><sub>1</sub> = 0.22, p = 0.64). Model output from all models can be found in **Table S2**.
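The likelihood ratio tests reported here compare a model with a term to one without it. As a minimal sketch, assuming the usual large-sample result that the test statistic (twice the difference in log-likelihoods) follows a chi-square distribution with 1 degree of freedom when one parameter is dropped, the p-values can be recovered from the reported statistics. The function names are hypothetical and this is not the output of "drop1".

```python
from math import erfc, sqrt


def lrt_statistic(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood ratio statistic: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df the variable is a squared standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(x / 2.0))


# Test statistics as reported in the Results section.
for stat in (9.55, 0.22, 0.43):
    print(f"chi-square(1) = {stat:.2f}, p = {chi2_sf_df1(stat):.3f}")
```

The resulting values round to the p-values reported for the interaction term (0.002) and for the two side terms (0.64 and 0.51).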

# Handler Engagement: Were Dogs More Likely to Engage Their Handlers During Helping Events?

Dogs often engaged with their handlers during the event presentations (**Figure 1**). However, they were no more likely to engage during the Helper or Hinderer events. Our GLMM showed that event type (helper vs. hinderer) was not a significant predictor of dogs' probability of handler engagement (p = 0.477). We additionally examined whether engagement became more common across time by including event number (1–4) as a predictor and examining the interaction between event number and event type (Helper vs. Hinderer). However, these effects were not significant predictors of dogs' engagement behavior (ps > 0.5). To test whether side influenced dogs' engagement, we reran our reduced model (event as a predictor) with side (the side on which the helper was presented: left or right) entered as a control variable. Including this term did not change our results (p = 0.5), nor did its inclusion improve model fit (LRT, χ<sup>2</sup><sub>1</sub> = 0.43, p = 0.51). Model output from all models can be found in **Table S2**.

# DISCUSSION

To our knowledge, our study is the first to adapt a well-established infant social evaluation paradigm (Hamlin et al., 2007) to test domestic dogs. Reasoning that the ability to evaluate helpfulness would be beneficial to domestic dogs due to their reliance on humans, we predicted that, like human infants, domestic dogs would show a preference for helpers over hinderers. However, across three measures we found no strong support for this prediction: dogs were no more likely to approach the helper shape than the hinderer shape, no more likely to engage handlers during the helpful events than during the hindering events, and no more likely to explore hindering individuals independently of the individuals' color and/or shape. On this last point: dogs in our study did show greater exploration of the hindering individual on our exploration measure, indicating that they may have found hindering behavior to be more puzzling and/or interesting than helping behavior, thus warranting extra investigation. However, this effect must be interpreted with caution, as dogs' exploration of hindering shapes was moderated by a preference for the triangle shape over the square shape.

Our choice measure was most analogous to the preference measure used in past infant work and suggested that dogs had no preference for the helper over the hinderer. This is intriguing given that past infant work has shown a strong and early-emerging preference for helpers in this paradigm: in the Hamlin et al. (2007) study, a large majority of 6- and 10-month-old infants (26 of 28 infants tested in Experiment 1) chose to interact with the helpful as opposed to the hindering shape, and follow-up replications (Hamlin, 2015) showed a similar rate of helper preferences (but see Scarf et al., 2012; Hamlin, 2015 for evidence that infants do not prefer helpers in certain circumstances, and Salvadori et al., 2015 for a replication failure in a different context). These data are interesting in light of other work which shows that dogs avoid unhelpful individuals in indirect contexts in a 'live action' paradigm, in which they witness human agents interacting (Chijiiwa et al., 2015). Specifically, dogs avoid people who refuse to help their owner. Taken together with our findings, these results suggest that dogs' social evaluative abilities may be restricted to more ecologically-valid contexts. By contrast, infants have a more general ability to extract relevant social information from a wider range of contexts.

While our choice data did not reveal a preference for helpers, they did reveal a significant side bias: dogs were more likely to approach the domed shape that had been placed on the left side of the room. We are not entirely sure why this happened. One possibility is that there was more equipment stored on the left side of the room (the camera was placed there), so the left side may have been more attractive because it contained more visual stimuli than the right side. A second possibility is that dogs preferred to approach the hinderer's side of the room (recall that the hinderer always emerged from the left side and disappeared into the left side of the hill). A third possibility is that the dogs were avoiding the side of the room where the second experimenter had been waiting with the choice objects. Because we could not counterbalance the side of the recording equipment or the side from which the second experimenter approached due to constraints of our room set-up, we cannot distinguish between these possible explanations for the observed side bias. However, these are merely speculative suggestions, as we had no a priori reason to think that one side of the room would be more attractive to dogs than the other.

Our handler engagement results suggest that dogs frequently engaged their handlers by socially referencing them or using other behaviors. However, they were no more likely to engage during the Helper or Hinderer events. While this is by no means a perfect measure of dogs' responses to these events, we thought handler engagement would provide insight into whether dogs were more interested in or unsettled by one event than the other. For instance, dogs look back at humans when confronted with an unsolvable task (Miklósi et al., 2003) and there is recent evidence that dogs show social referencing toward humans when they encounter a potentially scary object (Merola et al., 2012a,b). Despite this, we observed no differential handler engagement during helper and hinderer events in our task.

Relative to our choice and handler engagement measures, our exploration time measure yielded some intriguing differences in dogs' behavior toward the helper and hinderer shapes. We found that dogs spent longer investigating the hinderer than the helper during the 30 s exploratory period. Preferential exploration of the hinderer is consistent with the idea that dogs did, in fact, distinguish between the helper and hinderer and were perhaps driven to preferentially investigate it out of surprise at its behavior, keeping in mind that this preference was also moderated by the shape/color of the object. This result is in line with previous work showing that in a "live action" paradigm, dogs did not distinguish between a human who helped their owner (the helper) versus one who did nothing (neutral agent), but avoided a human who refused to help their owner relative to the neutral agent (Chijiiwa et al., 2015, but see Abdai and Miklósi, 2016). Taken together with our shape exploration finding, these results suggest that dogs may pay particular attention to unhelpful individuals, avoiding them in some contexts and exploring them in others (our paradigm). Indeed, dogs may show something akin to the negativity bias that has been seen in young infants (Hamlin et al., 2010; see Abdai and Miklósi, 2016 for a review).

While our finding that dogs showed preferential exploration of the hinderer is intriguing, we must be cautious in interpreting it richly for two reasons. First, it is possible that dogs were more likely to explore the hinderer due to activity differences that existed between the helper and hinderer events. One difference is that the hinderer may have contacted the red circle with slightly more force than did the helper in the experimenter's effort to convey hindering behavior. This may have even resulted in a slightly louder sound from the dog's perspective during the hindering events, which could have led to differences in how attention-grabbing the different scenes were. In addition to this possibility, there may have been other differences across the scenes which led to differential attention and thus to differential exploration (e.g., maybe dogs viewed hindering events as more playful than helping events). While these possibilities should certainly be considered, it is also important to note that we kept our events as close to the infant paradigm as possible, thereby making it worthwhile to discuss differences between our dog findings and the existing infant findings. A second reason why we must be cautious in interpreting our exploration result is that we additionally found that dogs spent longer investigating the triangle than the square. Our follow-up model suggested that dogs' exploration time was predicted by an interaction between shape role (helper versus hinderer) and shape (triangle versus square). Thus, what initially appeared to be preferential exploration of the hinderer is likely to be, at least in part, accounted for by a preference for the triangle. Because the triangle was always yellow, this could also be explained by a preference for yellow objects. This preference appeared to be particularly pronounced when the triangle was the hinderer, suggesting that dogs were especially drawn to the triangle when it was playing the hinderer role.
While these data are suggestive of a potentially interesting additive effect of dogs' interest in hinderers and triangles, it is difficult to make a strong case for this interpretation because we had a slight sampling bias toward the triangle playing the hindering role due to exclusions and our sample size being lower than our planned maximum target. Future work investigating dogs' abstract social evaluative abilities could also include guardian questionnaires which assess whether dogs have more triangle- or square-shaped toys at home (or yellow- or blue-colored toys) and could pre-test dogs for a baseline color and/or shape preference.

Our aim in designing this study was to adapt a method that has been successfully employed in work on social evaluation in young infants. While we believe that we achieved this aim, our close reliance on the infant method resulted in several possible limitations of our study. First, at a high level, our aim to standardize methodology meant that our paradigm was not particularly socially valid for dogs. Future work could adapt the puppet show to use objects and events that would be more familiar to dogs. Second, and in this same vein, it is possible that domestic dogs do not ascribe agency to wooden shapes with googly eyes in the same way that human infants do. While this is certainly a possibility, it is worth noting that previous work has shown that dogs do view moving objects as social interaction partners (Gergely et al., 2013, 2015, 2016). Nevertheless, a difference in agency ascription could explain the discrepancy between our results and past work that has used a live action paradigm (Chijiiwa et al., 2015). However, another possibility is that dogs understood the actions as helping and hindering but this understanding did not result in a preference since they were unaffected by the agents' actions. Had we tested dogs in a second-party context, one in which they were reliant on one of the agents for help, we might have seen a preference for the helper. Future work could explore this possibility. To further probe dogs' understanding of helping vs. hindering, future work could also test dogs in a looking-time paradigm (West and Young, 2002; Racca et al., 2009; Marshall-Pescini et al., 2014). A looking-time task would provide insight into whether dogs expect the recipient of help/harm to prefer one agent over the other (Kuhlmeier et al., 2003; Hamlin et al., 2007), even if they do not prefer the helpful over the hindering individual.

A final caveat we would like to note is that we have compared our results to those of Chijiiwa et al. (2015) throughout the discussion because, to our knowledge, their study represents the only remaining putative evidence for dogs' social evaluative abilities in indirect contexts. However, we want to emphasize again that the findings from this study must be interpreted with caution (Abdai and Miklósi, 2016). Additionally, it is important to note that other studies have investigated dogs' indirect evaluative abilities using live action paradigms, and have not found evidence that dogs prefer nice and/or helpful people (Nitzschner et al., 2014; Piotti et al., 2017). Thus, before claiming that dogs can more easily evaluate unhelpfulness in a live-action paradigm than in an abstract paradigm, it is important to understand how robust the effects of indirect social evaluation are in human evaluation tasks. We view this as an important next step for future work in this area.

In sum, our study is the first to adapt a well-established infant social evaluation paradigm for use with dogs. We were interested in exploring whether dogs, like infants, can extract relevant social information from relatively abstract events. However, across our three measures, dogs did not show behavior consistent with this ability. These findings add to the ongoing debate about dogs' social evaluative abilities based on direct and indirect experience and complement existing work suggesting that dogs avoid unhelpful humans in a third-party context. Broadly, these findings suggest that while dogs may attend to and use social information about human interaction partners in some contexts, these abilities may not generalize to more abstract contexts as they do in infants.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of Protocol # 2017-11448 which was approved by the Yale Institutional Animal Care and Use Committee. Owners of animal subjects gave written and informed consent before participation in the study.

# AUTHOR CONTRIBUTIONS

KM, MB, LC, CA, TM, AF, JKH, and LS contributed to the design of the study and refined the methodology. Analyses were conducted by KM and LC. The paper was written by KM, MB, LC, CA, JKH, and LS.

# FUNDING

CA would like to thank the Harvard College Research Program for funding her time at Yale. LS is grateful to the McDonnell Foundation for a grant supporting this project.

### ACKNOWLEDGMENTS

We are grateful to the following people for their help collecting and coding the data for this study: Alexandra Bailey, Cove Geary, Daniel Gil, Gorana Gonzalez, Sarah Kosterlitz, Serena Murphy and Miriam Ross. Additionally, we thank Angie Johnston for help at all stages of this project. Finally, we would very much like to thank the dogs and guardians who participated in this research.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00591/full#supplementary-material

Video 1 | Video clip of helping, hindering and choice events.

# REFERENCES

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. doi: 10.18637/jss.v067.i01


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 McAuliffe, Bogese, Chang, Andrews, Mayer, Faranda, Hamlin and Santos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Examining Infants' Individuation of Others by Sociomoral Disposition

#### Hernando Taborda-Osorio<sup>1</sup> \*, Ashley B. Lyons<sup>2</sup> and Erik W. Cheries<sup>2</sup>

<sup>1</sup> Department of Psychology, Pontificia Universidad Javeriana, Bogotá, Colombia, <sup>2</sup> Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, MA, United States

Early on, infants seem to represent the social actions of others from a moral perspective, evaluating others' dispositions as "mean" or "nice." The current research examined whether 11-month-old infants represent these sociomoral dispositions as deep and identity-determining properties using an object individuation task. Infants were shown two identical-looking characters emerging sequentially from behind a screen and engaging in two different sociomoral actions. Results from a looking-time paradigm show an interaction effect between the baseline and test trials, indicating that infants seem to represent two different characters involved in the event, disregarding their identical external appearance. This effect was mainly apparent when infants witnessed a negative event first in test trials. Experiments 2 and 3 control for alternative explanations. In Experiment 2, infants failed to individuate two characters when they were shown two identical-looking puppets. In Experiment 3, infants failed to represent two characters when social information was taken away from the show. We discuss the possibility that by the end of the first year of life infants might represent sociomoral dispositions as diagnostic of individual identity.

#### Edited by:

Kelsey Lucca, University of Washington, United States

#### Reviewed by:

Robert Hepach, Leipzig University, Germany Arber Tasimi, Stanford University, United States

### \*Correspondence:

Hernando Taborda-Osorio htaborda09@gmail.com

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 23 May 2018 Accepted: 14 May 2019 Published: 31 May 2019

#### Citation:

Taborda-Osorio H, Lyons AB and Cheries EW (2019) Examining Infants' Individuation of Others by Sociomoral Disposition. Front. Psychol. 10:1271. doi: 10.3389/fpsyg.2019.01271

Keywords: infants, cognitive development, sociomoral dispositions, individuation, social cognition

# INTRODUCTION

Moral judgment is a fundamental part of our daily social life. Our constant evaluation of others' behavior – categorizing others' actions as nice or mean, helpful or unhelpful – comprises a moral sense that is a continuous influence on the ways we choose to interact with others (Tomasello, 2016). Moreover, the propensity to automatically infer the social disposition of others appears to take root very early in development. By the end of their first year of life infants spontaneously represent the social actions of others as positive or negative (Premack and Premack, 1997), predict agents' social preferences based on their past sociomoral interactions (Kuhlmeier et al., 2003), and evaluate others' actions whereby they reject mean agents and choose to interact with nice ones (Hamlin et al., 2007). Such moral evaluations have been examined across a range of different scenarios and levels of difficulty (see Hamlin, 2013b for a review). For example, infants as young as 6 months of age who observe a puppet whose goal (such as reaching the top of a hill or opening a box) is assisted or thwarted by others, seemingly represent the agents involved in the interaction as possessing a nice (positive) or mean (negative) social disposition, respectively (e.g., Hamlin et al., 2007; Hamlin and Wynn, 2011; cf., Salvadori et al., 2015). Furthermore, infants' evaluations are dependent on the goals (Hamlin and Baron, 2014), intentions (Hamlin, 2013a), and knowledge the characters possess when interacting with one another (Hamlin et al., 2013b), suggesting that these abilities comprise the essential foundation for a later-developing system of moral judgment (Wynn, 2008). Such a core capacity for social evaluation may be the result of an evolutionary adaptation to deal with other people in cooperative contexts (Tomasello and Vaish, 2013).

As these previous studies show, infants are capable of distinguishing agents by the sociomoral dispositions they display. However, no prior research has investigated how central these moral dispositions are for representing the identity of people over time. The current study aims to investigate whether infants represent an agent's moral disposition as a deep and identity-determining property. Currently, it is an open question whether infants represent the sociomoral behaviors others engage in as fleeting actions that are subject to change from one moment to the next, or instead as relatively stable traits that constitute an important part of an agent's individual identity. In other words, are infants biased to represent helpful and unhelpful actions as arising from different types of individuals?

Previous research shows evidence of trait-based reasoning in older children and adults. For example, adults heavily weigh memories and personality traits when judging whether or not someone is the same person (Rips et al., 2006; Rips, 2011). Thus, psychological factors are more crucial for tracking people's identity than external features, such as their face or bodily features (Brook, 2014). Similarly, preschool-aged children hold the belief that moral traits such as "niceness" and "meanness" are stable over time (Liu et al., 2007; Diesendruck and Lindenbaum, 2009; Boseovski, 2010), treat them as inductively powerful features rather than as mere transient behavioral properties (Heyman and Gelman, 2000), and use the trait labels "mean" and "nice" to predict others' mental states (Heyman and Gelman, 1999). For instance, young children predict that people labeled as "mean" will have more negative motives than "nice" people (Heyman and Gelman, 1999). Taken together, evidence suggests that older children represent sociomoral behavior as reflecting stable psychological dispositions that are inherently part of an individual's identity and that help organize the social world in a more or less categorical manner. This may reflect, or perhaps help explain, a widespread practice in many cultures to tell children stories about well-defined good and evil characters (Bloom, 2013).

A powerful way to explore the developmental origins of this type of trait-based reasoning is by using a classic individuation task (e.g., Xu and Carey, 1996; Kingo and Krojgaard, 2011). In this experimental paradigm, infants are shown a situation, where 2 objects emerge sequentially from behind one or two screens separated by a gap. In the two screen condition infants as young as 4 months are able to use the differing spatiotemporal trajectories of the objects to represent that there must be two individuals in the event, a process called "individuation" (Spelke et al., 1995). By contrast, in order to successfully individuate two objects in the one screen condition, where the spatiotemporal properties of each object are ambiguous (i.e., both objects appear from behind the same screen), infants must rely upon their representations of other properties. Studies using this paradigm have determined that infants are capable of using featural information, such as an object's shape, size, and pattern, from very early on (Wilcox, 1999), and functional and language-related differences between objects by about 10–12 months of age (Xu and Carey, 1996; Xu and Baker, 2005; Futo et al., 2010). Most strikingly, this paradigm has revealed that not all perceptually salient property differences are treated equally. Infants will respond to an event portraying two very different looking objects as containing just a single individual if they share some deeper or more intrinsic property such as their category membership (Xu et al., 2004), ontological kind (Bonatti et al., 2002; Surian and Caldi, 2010), or physical "insides" (Taborda-Osorio and Cheries, 2018). 
For example, while infants who observed an object displaying self-propelled motion and agentive features (e.g., a worm) and another that looked like a typical inanimate object (e.g., a box) represented two individuals in the scene, infants failed to individuate two very different looking entities that were agents (e.g., a bee and a worm; Surian and Caldi, 2010).

Just as individuation tasks have been used to identify the diagnostic criteria that underlie infants' representations of objects, the current project uses this same strategy to determine whether infants represent an agent's sociomoral behavior as a relatively stable and identity-determining property. Do infants represent sociomoral behaviors as fleeting actions that are subject to change from one moment to the next, or instead as stable traits that constitute an important part of an agent's individual identity? We tested this by merging the classic object individuation task (Xu and Carey, 1996) with a recent demonstration of infants' sociomoral evaluation (Hamlin and Wynn, 2011). Specifically, we tested whether infants would use the type of sociomoral behavior they observe to individuate the number of agents that exist in an event. In all three experiments reported here, 11-month-old infants witness a puppet struggling to open a box. In Experiment 1 two identical looking characters emerged sequentially from behind a screen and engaged in two different sociomoral actions toward the puppet, helping or hindering its goal of opening the box. In Experiment 2 infants observed the same task except that the two identical-looking characters appeared at different times to engage in identical rather than different sociomoral behaviors (two helping or hindering actions). Finally, Experiment 3 examined whether infants' individuation judgments were primarily driven by characters engaging in two perceptually different actions that lacked any sociomoral content.

# EXPERIMENT 1

# Participants

Sixteen 11-month-old infants (8 female) participated in this experiment (M = 11 months, 13 days, SD = 5 days). This age group was selected based upon similar individuation studies using infants in the 10–12 month age range (e.g., Xu and Carey, 1996). All participants were healthy, full-term infants recruited from the Amherst, Massachusetts area. All study procedures were approved by the University of Massachusetts Institutional Review Board and written informed consent was obtained from each of the parents. Eight additional infants participated but were excluded from analysis because of fussiness (2), experimental error (4) and parental interference (2).

# Materials

Infants sat on their parent's lap facing a black stage measuring 118 cm wide × 75 cm high. The room was dimly lit and parents were instructed to remain silent throughout the experiment. Infants observed a transparent box (35 cm wide × 19 cm deep × 12 cm high) resting on the center of the stage with two different-colored cubes (5 cm × 5 cm) inside. At the right corner of the stage infants observed a blue screen (25 cm high × 36 cm wide) in a vertical position. There was a gap of 12 cm between the screen and the right frame of the stage and a gap of 17 cm between the screen and the box. Three different puppets were used in the experiment, all measuring 18 × 10 cm. A cow puppet served as the "Protagonist" who struggled to open the box. A pig puppet served as the "Opener" who emerged from behind the screen and helped the Protagonist to open the box by lifting the lid. Another identical pig puppet served as the "Closer" who hindered the Protagonist from opening the box by slamming the lid shut. A black curtain was lowered between trials to hide the stage. Two video cameras recorded events for later analysis, one focused on the infant's face and the other focused on the stage (see **Figure 1** for a depiction of the materials and stage display).

# Design and Procedure

Infants were shown 4 baseline trials, 2 familiarization trials, and 4 test trials in a typical violation-of-expectation design as described below (see **Table 1** for the complete set of variables).

#### Baseline Trials

In the Baseline Trials, the curtain was raised revealing an upright blue screen on the stage, then one of the experimenters drew the infant's attention to the stage using infant-directed speech ("Hi [baby's name], look here") before dropping the screen revealing either one or two identical pig puppets (see **Figure 1**). Infants' looking time was recorded and the trial finished when they either looked away for at least two consecutive seconds or after 60 s of cumulative looking. This procedure was repeated for a total of 4 baseline trials. The number of revealed objects was counterbalanced across participants (baseline trial block: 1, 2, 2, 1 or 2, 1, 1, 2).
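The trial-ending rule described above (a look-away of at least 2 consecutive seconds, or 60 s of cumulative looking) can be sketched in code. This is only an illustration under stated assumptions: the function name `trial_looking_time` and the representation of a trial as alternating `(is_looking, duration)` intervals are hypothetical and are not taken from the authors' coding procedure.

```python
def trial_looking_time(intervals, max_look=60.0, away_limit=2.0):
    """Return the cumulative looking time (s) at the moment the trial ends.

    intervals: sequence of (is_looking, duration_in_seconds) tuples, each
    describing one continuous look or one continuous look-away
    (a hypothetical coding format, assumed for this sketch).
    """
    total = 0.0
    for is_looking, duration in intervals:
        if is_looking:
            total += duration
            if total >= max_look:
                # Cap at the maximum cumulative looking time.
                return max_look
        elif duration >= away_limit:
            # A single look-away of at least `away_limit` seconds ends the trial.
            break
    return total


# Hypothetical coded trial: a 1 s glance away does not end the trial,
# but the later 2.5 s look-away does.
example = [(True, 4.0), (False, 1.0), (True, 3.5), (False, 2.5), (True, 10.0)]
print(trial_looking_time(example))
```

Short look-aways are ignored and do not count toward looking time, so only the two looks before the 2.5 s look-away contribute in this example.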

#### Familiarization Trials

The familiarization trials were modeled from the original box task used in previous demonstrations of infants' moral evaluation that elicited reliable reaching preferences (Hamlin and Wynn, 2011). The familiarization trials were included to expose infants to the events and to help facilitate encoding of the sequence of actions that would be seen in the subsequent test phase. At the start of the event the Protagonist puppet entered the stage from the left corner and moved to one side of the box, which was positioned in the center of the stage. The puppet leaned down to look inside the box three times and then attempted to open the box four times by pulling on the corner of the box's lid. On the first two attempts it pulled up, lifted the edge of the box a few inches, and dropped it back down. On the third and fourth attempts, it lifted the edge of the lid and lowered it while continuously holding onto the lid, as if the lid was too heavy for it to open. On the fifth attempt, a Pig puppet moved out from behind the opaque screen that was positioned on the right side of the stage, and moved forward next to the box. What happened next was determined by whether it was a Helping or Hindering trial.

FIGURE 1 | An outline of the experimental design depicting the (A) baseline trials, (B) familiarization/test events, and (C) test outcomes of Experiment 1.

#### TABLE 1 | A table depicting the counterbalance variables that were used across all three experiments.

During the Helping trial, the Pig puppet grasped the front right corner of the box, and both the Pig and Protagonist opened the box together. The Protagonist then reached into the box, retrieved one cube, and returned to its original location on the left side of the stage. The Pig closed the lid and returned back to its original position behind the opaque screen.

During the Hindering trial, the Pig puppet jumped on the corner of the box, slamming the lid closed. The Protagonist and Pig puppets then returned to their original locations (the left side of the stage and behind the opaque screen, respectively). Both Helping and Hindering trials lasted approximately 45 s. After the action on the stage had paused for 5 s the curtain was lowered and the trial ended. The order of these trials (Helping or Hindering trials first) was counterbalanced across participants.

#### Test Trials

Each test trial began by showing infants a full sequence of the same familiarization trial events (both a helping and hindering event) described above. In addition, test trial events included a second phase where, after each full helping/hindering sequence had ended, one of the experimenters drew the infant's attention to the opaque screen on the stage (e.g., "Hi [baby's name], look here") and then dropped the opaque screen to reveal either 1 or 2 identical pig puppets resting on the stage. The number of puppets revealed behind the screen (1 or 2) and the order of the preceding events (Helping first or Hindering first) were both counterbalanced for each participant in two trial blocks (1, 2; 2, 1 or 2, 1; 1, 2, and Helping, Hindering; Hindering, Helping or Hindering, Helping; Helping, Hindering). The duration of the infants' looking time was coded by two independent observers who were naive to the condition. The inter-observer agreement was high (r = 0.96).

# Results

Preliminary analyses found no main effects of sex, Outcome Order (1 object or 2 objects first) or Trial Order (Helping first or Hindering first); therefore, these variables were collapsed in subsequent analyses. Following previous individuation studies with a within-subject design (e.g., Xu et al., 2004; Kingo and Krojgaard, 2011) the index of object individuation in this task is the statistical interaction in looking time to the two different object outcomes (1 vs. 2 objects) between the baseline and test phases. As such, a 2 (Object Outcome: 1 or 2 objects) × 2 (Trial Type: baseline or test) repeated measures analysis of variance (ANOVA) was conducted. This analysis revealed a significant interaction between Object Outcome and Trial Type, F(1, 15) = 13.4, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.47, which resulted from longer looking times toward two object outcomes (M = 9.81 s, SD = 3.89 s) than one object outcomes (M = 7.56 s, SD = 2.93 s) in the Baseline Trials, and longer looking times toward one object outcomes (M = 10.5 s, SD = 4.54 s) than two object outcomes (M = 8.09 s, SD = 2.92 s) in the Test Trials (**Figure 2**). Planned comparison t-tests of one- versus two-object outcomes revealed a significant difference in the Baseline [t(15) = −3.2, p = 0.02, d = 0.8, two-tailed, Bonferroni corrected, 95% CI = −3.75, 0.75], but a non-significant difference in the Test Trials [t(15) = 1.85, p = 0.16, d = 0.62, two-tailed, Bonferroni corrected, 95% CI = −0.34, 5.2]. Additionally, the non-parametric analysis revealed that 12 out of 16 infants exhibited a larger preference for the one object outcome (p = 0.04, via a binomial test) in the test trials, while in the baseline trials only 3 infants had the same preference (p = 0.01, via a binomial test). The difference between both conditions was significant (p = 0.004, Fisher's exact test).
Overall, these results show that in the test trials infants overcame their baseline preference for looking longer toward the two-object outcome, which provides evidence that infants individuated two different agents behind the screen. However, the planned comparison for the test trials alone did not reach significance.
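The non-parametric figures reported above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the authors' analysis script: the function names are hypothetical, and the choice of a one-tailed binomial test against chance (0.5) is an inference from the reported p-values rather than something stated in the text.

```python
from math import comb


def binomial_tail(k, n, upper=True):
    """One-tailed probability, under chance (p = 0.5), of observing
    at least k (upper=True) or at most k (upper=False) infants out of n."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) for i in ks) / 2 ** n


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]:
    sum the probabilities of all tables that are no more likely than
    the observed one, with the margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Test trials: 12 of 16 infants preferred the one-object outcome.
print(round(binomial_tail(12, 16), 2))
# Baseline trials: only 3 of 16 infants showed that preference.
print(round(binomial_tail(3, 16, upper=False), 2))
# Test (12 vs. 4) against baseline (3 vs. 13).
print(round(fisher_exact_two_sided(12, 4, 3, 13), 3))
```

Under these assumptions the three printed values round to 0.04, 0.01, and 0.004, in line with the values reported for Experiment 1.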

In order to better understand why the difference in the test trials did not reach significance, we conducted a new set of analyses. Each participant witnessed two test pairs, counterbalancing Object Outcome and Trial Order; therefore, we conducted an ANOVA to detect possible differences across test pairs in infants' looking time. A 2 (Test Pair: Hinder block first or Hinder block second) × 2 (Object Outcome: 1 or 2 objects) × 2 (Trial Order: Helping first or Hindering first) mixed-design ANOVA revealed a significant interaction between Trial Order and Object Outcome, F(1, 14) = 9.4, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.4. This interaction was followed up with planned t-tests comparing the one- and two-object outcomes for Hindering first trials and for Helping first trials. This comparison showed a significant difference in the Hindering first condition, t(15) = 3.2, p = 0.01, d = 0.69, two-tailed, Bonferroni corrected, 95% CI = 1.8, 9.1 (M<sub>OneObject</sub> = 13.4 s, SD<sub>OneObject</sub> = 8.7; M<sub>TwoObjects</sub> = 7.9 s, SD<sub>TwoObjects</sub> = 4.2), but not in the Helping first condition, t(15) = −0.39, p > 0.5, d = 0.12, two-tailed, Bonferroni corrected, 95% CI = −4.1, 2.8 (M<sub>OneObject</sub> = 7.6 s, SD<sub>OneObject</sub> = 3.1; M<sub>TwoObjects</sub> = 8.2 s, SD<sub>TwoObjects</sub> = 5.1). These findings show that the difference between the one- and two-object outcomes appeared only in the test pair where infants witnessed the hinder action first, regardless of whether it was the first or the second block. The ANOVA also revealed significant interactions between Test Pair and Object Outcome, F(1, 14) = 4.95, p = 0.043, η<sub>p</sub><sup>2</sup> = 0.26, and between Test Pair and Trial Type, F(1, 14) = 6.6, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.32. However, planned comparisons did not show significant differences across simple effects in either case after correcting for multiple comparisons. No other interactions or main effects were significant.
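The partial eta squared values reported with each F statistic can be recovered from the F value and its degrees of freedom, using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The helper below is a small sketch of that identity; the function name is hypothetical and the snippet is offered only as a reader's consistency check, not as part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for Experiment 1.
reported = [
    ("Object Outcome x Trial Type", 13.4, 1, 15),
    ("Trial Order x Object Outcome", 9.4, 1, 14),
    ("Test Pair x Object Outcome", 4.95, 1, 14),
    ("Test Pair x Trial Type", 6.6, 1, 14),
]
for label, f_value, df1, df2 in reported:
    print(label, round(partial_eta_squared(f_value, df1, df2), 2))
```

Rounded to two decimals, these give 0.47, 0.40, 0.26, and 0.32, matching the effect sizes reported for the corresponding interactions.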

Additionally, we examined the nature of the reported result in this experiment further by including a Bayes factor analysis using a one-sample t-test on the baseline-test trial difference score. This resulted in a Bayes Factor that strongly favored the experimental hypothesis (Scaled-Information Bayes Factor = 26.4; Rouder et al., 2009).
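The baseline-test difference score used here is closely tied to the ANOVA interaction. In a 2 × 2 fully within-subject design, the interaction F(1, n − 1) equals the square of a one-sample t statistic computed on each infant's difference of differences. The sketch below illustrates that relationship; the looking times are made-up values for four hypothetical infants, and the function names are illustrative rather than taken from the authors' analysis.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_scores(base_one, base_two, test_one, test_two):
    """Per-infant difference of differences:
    (test one-object - test two-object) - (baseline one-object - baseline two-object)."""
    return [
        (t1 - t2) - (b1 - b2)
        for b1, b2, t1, t2 in zip(base_one, base_two, test_one, test_two)
    ]


def one_sample_t(scores):
    """One-sample t statistic against zero, with n - 1 degrees of freedom."""
    n = len(scores)
    return mean(scores) / (stdev(scores) / sqrt(n))


# Hypothetical looking times in seconds (NOT data from the study).
base_one, base_two = [7, 8, 6, 9], [10, 9, 9, 11]
test_one, test_two = [11, 10, 12, 9], [8, 9, 8, 8]

scores = interaction_scores(base_one, base_two, test_one, test_two)
t = one_sample_t(scores)
print(scores)    # positive scores: relatively longer looking at one object in test
print(t, t ** 2) # t ** 2 corresponds to the interaction F(1, n - 1)
```

A positive mean score indicates that the preference shifted toward the one-object outcome between baseline and test, which is the pattern taken as evidence of individuation.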

# Discussion

Overall, the results of this experiment suggest that infants' expectation about the number of individuals involved in the event was significantly affected by the different preceding actions they observed. By 11 months of age infants overcome their baseline preference for two objects, showing different patterns of looking time in baseline and test trials. This evidence of object individuation is striking since previous studies have demonstrated that infants at this age require the presence of contrasting physical properties (e.g., color or shape differences) in order to represent objects as separate individuals (e.g., Xu and Carey, 1996; Van de Walle et al., 2000). In contrast, the current study suggests that infants will infer the presence of two individuals if they observe two different sociomoral actions, despite both puppets involved in the helping-hindering interactions displaying the same surface properties. This pattern indicates that infants may have interpreted the different sociomoral actions as relatively stable behavioral dispositions that were diagnostic of there being two puppets involved in the event (e.g., one who helps and one who hinders).

Additionally, we found that infants' individuation response appeared to be strongest when the first social interaction they observed was negative. That is, infants had a stronger expectation of there being two separate individuals involved in the events when they first viewed a puppet hindering another's goal before seeing a helpful event. This result could be an instance of the so-called "negativity bias" previously reported in the moral development literature, where negative events are better remembered and weighted more heavily than positive events (e.g., Hamlin et al., 2010; Hamlin and Baron, 2014). This effect may arise because negative events are perceived as more diagnostic of an individual's underlying disposition, or because negative events are much more salient and have a deleterious effect on infants' memory of the relatively weaker positive event.

While one interpretation of the current results is that infants used the different sociomoral behaviors they observed as criteria for agent individuation, an alternative possibility is that infants merely represented two individuals behind the screen based on the number of actions they observed, regardless of how these actions differed. Indeed, previous studies have reported that 6-month-olds are able to individuate and enumerate actions from continuous motion (Wynn, 1996; Sharon and Wynn, 1998). For example, when infants witness a sequence of 2 identical actions (jumps) they dishabituate when observing 3 actions, even if both sequences have the same duration (Sharon and Wynn, 1998). Therefore, an alternative explanation for the pattern of results we observed is that infants count 2 actions based on the sequence of events within each trial (one helping and one hindering event) and expect a correspondence between the number of actions and the number of puppets behind the screen, resulting in longer looking times for the one-object than for the two-object outcome in the test trials. To test for this possibility a second experiment was run using the same box task but presenting 2 identical moral dispositions within each trial, two helping or two hindering actions. If infants in the first experiment individuated the puppets solely because of the number of actions or events, the pattern of results should be replicated in the second experiment.

# EXPERIMENT 2

# Participants

Sixteen 11-month-old infants (8 females) participated in this experiment (M = 11 months, 12 days, SD = 5 days). All infants were recruited from the Amherst, Massachusetts area, all study procedures were approved by the University of Massachusetts Institutional Review Board, and written informed consent was obtained from the parents. Four additional infants participated but were excluded from analysis because of fussiness (2) and experimental error (2).

# Materials, Design, and Procedure

The materials, design, and procedure for the second experiment were the same as those of Experiment 1, except that both social actions infants witnessed were identical in the pattern of motion and in the moral disposition they displayed (both helping actions or both hindering actions), which was counterbalanced across participants. In order to be consistent regarding the number of cubes that infants observed in the box across test trials and across experiments, the hindering event started off with only one cube inside the box, and the helping event started off with three cubes inside the box. The result of two helping actions and two hindering actions was always one cube inside the box. The inter-observer agreement of this experiment was high (r = 0.95).

# Results

Preliminary analyses found no main effects of sex, Outcome Order (1 object or 2 objects first) or Trial Order (Opening first or Closing first); therefore, these variables were collapsed in subsequent analyses. A 2 (Object Outcome: 1 or 2 objects) × 2 (Trial Type: baseline or test) repeated measures analysis of variance (ANOVA) yielded no significant main effect for Object Outcome, F(1, 15) = 2.08, p = 0.17, η<sub>p</sub><sup>2</sup> = 0.01, and Trial Type, F(1, 15) = 0.71, p = 0.41, η<sub>p</sub><sup>2</sup> = 0.04. This analysis did not reveal a significant interaction between Object Outcome and Trial Type, F(1, 15) = 0.105, p = 0.75, η<sub>p</sub><sup>2</sup> = 0.01. Infants spent the same time looking at the 1 and 2 object outcomes in both the baseline trials (M = 8.48, SD = 5.09; M = 9.1, SD = 3.72, for one object and two objects, respectively, t(15) = −0.69, p > 0.5, d = 0.17, two-tailed, Bonferroni corrected, 95% CI = −2.55, 1.3) and the test trials (M = 7.3, SD = 3.27; M = 8.4, SD = 3.44, for 1 object and 2 objects, respectively, t(15) = −1.1, p > 0.5, d = 0.28, two-tailed, Bonferroni corrected, 95% CI = −3.2, 1.0). A nonparametric analysis revealed that significantly more infants had a larger preference for the 2 objects outcome in test trials with 12 out of 16 infants showing this pattern (p = 0.04, via a binomial test), reversing the results of Experiment 1, while in the baseline trials 9 infants displayed a preference for the 2 objects outcome (p = 0.4, via a binomial test). The difference between both conditions was non-significant (p = 0.46, Fisher's exact test). No interaction effects were found between Object Outcome, Trial Order and Test Pair.

Since the overall preference during baseline trials was different compared to Experiment 1 (where infants exhibited a significant preference for 2 objects, overall) we examined these trials in more detail in a subsequent analysis. A 2 (Object Outcome: one or two objects) × 2 (Trial Pair: first or second) ANOVA of just Baseline Trials revealed a significant interaction, F(1, 15) = 4.55, p = 0.05, η<sub>p</sub><sup>2</sup> = 0.23, resulting from a larger difference between one object and two object outcomes in the second pair (M = 6.2, SD = 5.4 and M = 8.7, SD = 7.1, respectively, t(15) = −2.9, p = 0.02, d = 0.36, two-tailed, Bonferroni corrected) than in the first pair (M = 11.4, SD = 6.9 and M = 10.3, SD = 4.5, respectively, t(15) = 0.75, p > 0.5, d = 0.19, two-tailed, Bonferroni corrected). Since the baseline preference found during this second pair of trials was more similar to what was observed in Experiment 1 and might be a more analogous test, we compared the results of this pair to both pairs of Test Trials, which yielded no significant interactions, F(1, 15) = 0.95, p = 0.35, η<sub>p</sub><sup>2</sup> = 0.06, for the first pair and F(1, 15) = 0.88, p = 0.36, η<sub>p</sub><sup>2</sup> = 0.06, for the second pair.

Additionally, we examined the nature of the reported null result in this experiment further by including a Bayes factor analysis using a one-sample t-test on the baseline-test trial difference score. This resulted in a Bayes Factor that favored the null (Scaled-Information Bayes Factor = 2.85; Rouder et al., 2009).

Finally, we compared results across experiments using a 2 (Outcome: 1 or 2 objects) × 2 (Trial Type: baseline or test) × 2 (Experiment Type: Experiment 1 or Experiment 2) analysis of variance (ANOVA), which yielded a significant three-way interaction among Outcome, Trial Type and Experiment Type, F(1, 30) = 7.02, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.19. This interaction suggests that the pattern of infants' looking responses in Experiment 2 was significantly different from that of Experiment 1.

# Discussion

The results of Experiment 2 show that infants failed to individuate 2 agents behind the screen when the puppets they saw engaged in identical sociomoral actions. Infants who viewed two instances of an identical-looking puppet either help or hinder the protagonist's goal of opening the box did not look longer at outcomes of either 1 or 2 individuals present behind the screen during test trials. In other words, viewing two discrete action events did not lead infants to expect two puppets behind the screen. This pattern of results stands in contrast to those we observed in Experiment 1, where infants' looking preferences were significantly influenced by viewing two different sociomoral actions. Taken together these results suggest that infants' successful individuation in Experiment 1 was not a purely numerical response based on them counting the number of discrete events. Infants across both experiments observed two events in each trial, but what seemed to affect infants' expectations of the number of puppets involved in the event was whether they saw two different or two of the same sociomoral actions. These results are consistent with the interpretation that infants are biased to perceive a puppet that engages in a helpful sociomoral action as a different individual than one who engages in the opposite sociomoral event a moment later.

A second alternative explanation that may account for the successful individuation we observed in Experiment 1 is that infants' representations are driven by differences in action type, regardless of whether those actions are social or not. First, our helping and hindering actions differed not only by their sociomoral disposition, but also by the types of motion and perceptual patterns that constitute those actions. For example, our hindering actions were characterized by a puppet pushing the box's lid down, whereas in the helping actions the puppet lifted the lid up. Second, our helping and hindering actions also involved differences in the first-order goals that the characters demonstrate across the event. Namely, during a hindering action the puppet demonstrates the intention to close the box, which could result in the representation of that agent as a "closer," while during a helping action the intention of the puppet is to open the box, which could result in the representation of that agent as an "opener." Either of these alternatives, or both together, could be driving the individuation effect observed in Experiment 1 without requiring any sensitivity to sociomoral interaction per se among the different characters involved in the event. In other words, are different sociomoral actions treated the same as two different actions of any type, even those devoid of any social meaning? In order to address this question a third experiment was conducted to determine how infants would respond after observing two separate events showing a character opening a box and then an identical-looking character closing a box. Although the mechanics of these actions were perceptually identical to those in Experiments 1 and 2, they were rendered non-social in the current study by eliminating the protagonist from the event, thereby avoiding any interpretation of the events in terms of a social interaction or disposition. If infants' agent individuation is driven by differences in motion and first-order goals, then we should observe a pattern of results similar to those in Experiment 1.

# EXPERIMENT 3

# Participants

Sixteen 11-month-old infants (8 females) participated in this experiment (M = 11 months, 10 days, SD = 4 days). All infants were recruited from the Amherst, Massachusetts area, with approval from the University's Institutional Review Board, and written informed consent obtained from the parents. Three additional infants participated but were excluded from analysis because of fussiness (2) and experimental error (1).

# Materials, Design, and Procedure

The materials and design of the third experiment were the same as those of Experiment 1, except that in the Familiarization and Test Trials the Protagonist (the cow) and the cubes inside the box were removed from the show. The pattern of motion of both the Opening and the Closing actions was the same as the pattern of motion used in the Helping and Hindering events in the previous two experiments. During Opening trials, the Pig puppet jumped on the front right corner of the box, pulling up the lid completely backward. During Closing events, the Pig puppet grabbed the lid to close the box in a forward movement. A pause of about 6 s was used between the two actions. The inter-observer agreement of this experiment was high (r = 0.96).

# Results

Preliminary analyses found no main effects for sex, Outcome Order (1 object or 2 objects first) or Trial Order (Opening first or Closing first); therefore, these variables were collapsed in subsequent analyses. A 2 (Object Outcome: 1 or 2 objects) × 2 (Trial Type: baseline or test) repeated measures analysis of variance (ANOVA) yielded no significant main effect for Object Outcome, F(1, 15) = 0.02, p = 0.9, η<sub>p</sub><sup>2</sup> < 0.01, and Trial Type, F(1, 15) = 1.57, p = 0.23, η<sub>p</sub><sup>2</sup> = 0.09, nor any significant interaction between Object Outcome and Trial Type, F(1, 15) = 0.13, p = 0.72, η<sub>p</sub><sup>2</sup> < 0.01. Infants spent the same time looking at the 1 and 2 objects outcome in both the baseline trials (M = 10.01, SD = 5.52; M = 9.85, SD = 4.78, for one object and two objects, respectively, t(15) = 0.16, p > 0.5, d = 0.04, two-tailed, Bonferroni corrected, 95% CI = −3.2, 3.7) and the test trials (M = 11.36, SD = 4.97; M = 11.88, SD = 4.45, for 1 object and 2 objects, respectively, t(15) = −0.4, p > 0.5, d = 0.1, two-tailed, Bonferroni corrected, 95% CI = −3.3, 2.26). No interaction effects were found between Object Outcome, Trial Order and Test Pair. Nonparametric analysis revealed that only 7 out of 16 infants had a larger preference for the one object outcome in the test trials (p = 0.4, via a binomial test), while in the baseline trials 8 out of 16 infants preferred the one object outcome (p = 0.6, via a binomial test). The difference between both conditions was non-significant (p = 1, Fisher's exact test). Additionally, we examined the nature of the reported null result in this experiment further by including a Bayes factor analysis using a one-sample t-test on the baseline-test trial difference score. This resulted in a Bayes Factor that favored the null (Scaled-Information Bayes Factor = 2.82; Rouder et al., 2009).

Finally, a 2 (Object Outcome: 1 or 2 objects) × 2 (Trial Type: baseline or test) × 2 (Experiment Type: Experiment 1 or Experiment 3) analysis of variance (ANOVA) yielded a significant three-way interaction among Outcome, Trial Type and Experiment Type, F(1, 30) = 4.74, p = 0.037, η<sub>p</sub><sup>2</sup> = 0.14. This interaction suggests that the pattern of results in Experiment 3 is significantly different from that of Experiment 1.

# Discussion

The results of Experiment 3 reveal infants' failure to individuate two agents behind the screen after observing two different but non-social actions. Infants who viewed a puppet emerge from behind a screen to open a box and then an identical-looking puppet emerge to engage in the opposite action of closing the box looked equally long at 1 and 2 object outcomes, suggesting that they did not clearly represent how many agents were involved in the event. In other words, events involving two discrete and opposite actions are not sufficient for driving infants' individuation judgments. This lack of sensitivity is striking since these events involved the same exact actions and movements (opening and closing a box) as those observed in Experiment 1. This suggests that infants' individuation judgments are not merely based upon observing actions that are perceptually distinct from one another.

This pattern also suggests that infants are not inferring the number of individuals in the event by representing the number of first-order goals they have attributed to the agents. For instance, infants in Experiment 1 might have attributed to an agent the goal of "opening" the box in one moment and the agent's goal of "closing" the box in the next and used that as the basis of their individuation judgment. However, despite the first order goals of the agents being equated, infants exhibited significantly different patterns of looking in the current experiment compared to those in Experiment 1. Taken together these results suggest that infants' successful individuation in Experiment 1 was not purely a response based on them counting perceptually discrete events or goal states.

# GENERAL DISCUSSION

The current study utilized an individuation task to investigate whether 11-month-old infants use social dispositions to keep track of agents' individual identity. Experiment 1 found that when infants observe two different sociomoral actions, such as helping and hindering, their looking pattern is consistent with them having an expectation of two agents, despite the agents looking perceptually identical. By contrast, infants in Experiment 2, who observed two separate but identical sociomoral actions (either helping-helping or hindering-hindering), failed to individuate two agents, indicating that infants do not infer the number of agents involved in the event solely based on the number of discrete actions they had perceived. Likewise, in Experiment 3 infants failed to individuate two agents based on differences in motion or distinct first-order intentions (e.g., puppets who engage in opposite actions, closing and then opening a box) alone, despite these events being the same as those actions infants witnessed in Experiment 1. However, it is worth highlighting that we did not find a significant effect in the test phase of Experiment 1 across both Helping and Hindering trials. Significant differences in test trials were obtained only in the Hindering first condition. Although an interaction effect in Experiment 1 indicates a significant change in the pattern of infants' looking time, stronger evidence of an individuation effect of sociomoral dispositions should be collected in future studies. Ideally, this evidence should be collected by comparing experiments in which groups show similar baseline preferences. Nevertheless, together these three experiments support the possibility that by the end of the first year of life infants represent intentions with sociomoral content as diagnostic of individual identity. While other types of social interactions might be sufficiently salient to drive similar individuation judgments, the difference between helpful and harmful actions might be an especially meaningful distinction for establishing social preferences (e.g., see Hamlin, 2013b for a review) and for tracking identity early in life.

The suggestion that infants' sociomoral evaluations govern their judgments about identity in the current work may be useful in explaining prior demonstrations of sociomoral evaluation in infants (Hamlin et al., 2007). In these studies, 9-month-olds pick the character who previously displayed a prosocial action. One interpretation of this result in light of the current findings is that infants' choice of who to select or reject is informed by an underlying attribution they make about the agent's sociomoral identity. Infants' choice, even in a third-party context, may be supported by the belief that an agent's past behavior is indicative of how it normally behaves or how it might behave in the future. This idea has support from recent work showing that 14-month-old infants seemingly expect an agent who has acted in a helpful manner toward another (e.g., helping them climb a hill) to also distribute resources fairly in another context (Surian et al., 2018). Therefore, it seems that early on in development infants are able to reason about agents' sociomoral behaviors as stable and identity-determining dispositions. However, other authors (Liu et al., 2007) have claimed that the origins of trait-based reasoning may come from an understanding of labels as referring to kinds. Labeling, and specifically generic language, has been shown to promote essentialist beliefs in the social domain (Rhodes et al., 2012). Thus, it could be the case that the use of trait labels leads children to infer that sociomoral behaviors come from internal and stable dispositions. The current individuation study suggests, however, that at the onset of language acquisition infants already have a basic intuition connecting sociomoral behaviors to different individuals. Over development, and through labeling, children may gain a deeper understanding of sociomoral dispositions and engage in more sophisticated social reasoning. For instance, although preschoolers understand the stability of sociomoral traits over time (Diesendruck and Lindenbaum, 2009), not until 8–9 years of age are children able to make trait-consistent predictions based only on observed behavioral information (Rholes and Ruble, 1984). Additionally, the early ability to represent sociomoral behaviors as stable dispositions suggested in the current research is only one part of mature trait-based reasoning. Representing sociomoral dispositions as traits also implies making rich inductions about possible behaviors, emotions and attitudes in different scenarios, and thus it implies a wider sense of identity (Heyman and Gelman, 1999).

Although the current research suggests that, for infants, sociomoral dispositions are diagnostic of individual identity, it is less informative as to how precisely they represent the identity of sociomoral agents. For example, the representation of the identity of an animal is different from the identity of an artifact (Kelemen and Carey, 2007). Animals, as natural kinds, are represented as possessing an objective and intrinsic identity, whereas artifacts are represented as possessing a more contextual and graded identity (Estes, 2003; Rips, 2011). The current research cannot determine how strongly infants connect an agent's sociomoral behavior to their identity, as both natural and artifact kinds have been shown to support object individuation in infancy (Xu et al., 2004; Futo et al., 2010).

There are at least two possible interpretations of the current results. First, infants may represent an agent's sociomoral disposition as a stable trait that is indicative of its kind or category. In support of this view, a recent study demonstrated that 9-month-olds cannot form graded representations of prosocial and antisocial dispositions in the same agent (Steckler et al., 2017). Over time, children may become more flexible and admit graded representations of moral dispositions. A second possibility is that infants represent moral dispositions as a more relative and contextual trait from the start. This interpretation would be necessary for infants to exhibit more complex social inferences, as people engage in different types of social relationships with different people and in diverse situations. Indeed, previous research has demonstrated that infants are able to take into account contextual factors when reasoning about social behavior. For instance, infants prefer to interact with prosocial over antisocial agents, but they also prefer antisocial agents who harm dissimilar others (Hamlin et al., 2013a). Thus, they know that being "mean" or "nice" depends on the previous history of the characters involved. Either way, the current research suggests, first, that infants are able to use abstract psychological information to individuate different moral agents, and second, that they prioritize second-order over first-order intentions in doing so. However, more research is necessary to clarify how infants connect moral dispositions to agents' identity.

The issue of agents' identity is also related to the observed asymmetry between Hindering first trials and Helping first trials in Experiment 1. Infants seemed to have a stronger representation of two individuals when they witnessed the negative event first. This result is instructive since it suggests that the valence of the events witnessed had an effect on the infants' looking time, something that did not manifest in Experiment 3, where social information was removed. However, future research should clarify the reason for this effect. For instance, the timing of the Helping-Hindering sequence in each trial was unusually long compared to previous individuation experiments (45 s) and this might be particularly harmful in the Helping first trials where the less salient (and more expected) event was shown first. This raises the possibility of obtaining a stronger individuation effect in a future study by presenting a shorter Helping-Hindering sequence. If the difference still remains, then this effect may be telling of a deeper asymmetry in the infants' representation of sociomoral actions.

A related open question concerns the specificity of the underlying representations infants are using, both in the current experiments and in prior studies using a social evaluation paradigm. To our knowledge, no prior research has determined how specific or abstract infants' representations of such social interactions are. For example, infants may represent sociomoral dispositions that are very specific and conservatively bound to the context or action type in which they were observed (e.g., "the agent helped open the box"). Alternatively, infants may represent the same action in a deeper, more abstract way that refers to a general type of disposition (e.g., "the agent is a helper"). The latter possibility would be indicative of infants possessing a "kind" representation for sociomoral actions, where they represent a variety of sociomoral dispositions of the same valence as belonging to the same category. We are agnostic as to what type of representation may have driven the effects reported here. However, the object individuation paradigm could provide insights related to this distinction in the future by testing whether infants represent two different sociomoral actions of the same valence (e.g., helping an agent open a box and helping an agent climb a hill) as diagnostic of one or two agents. A failure in individuating two agents in this case compared to success in a task that involves two different events of different valences (e.g., helping an agent open a box and hindering an agent's goal of climbing a hill) might indicate infants' representation of such sociomoral interactions in a more kind-based manner.

Some statistical concerns still remain in this study. Significant differences in baseline looking times were obtained only in Experiment 1, while the test trials did not reach a significant difference between one- and two-object displays. Although the same group of infants was compared across baseline and test trials, thus controlling for individual differences, it would be worthwhile to replicate the results of Experiment 1 by using different procedures in the future.

Finally, this research might also help inform how the representation of agents differs from the representation of physical objects in the infants' mind. Unlike inert physical objects, agents' behavior is better explained by internal non-obvious properties, in such a way that infants suppose that animal-like agents are endowed with internal physical properties (Setoh et al., 2013). Similarly, social behavior is better explained by internal dispositions as they have more predictive power than first-order intentions. In a previous study (Taborda-Osorio and Cheries, 2018), infants were shown to individuate agents based on the perception of internal physical properties while in the current one infants individuate based on moral dispositions. Thus, it appears that infants may represent agents as possessing diverse causal powers, and they pick these properties as more identity-determining than external properties when pitted against each other. Future research might expand on these findings by examining whether sociomoral dispositions are attributed to an agent's internal properties and whether infants might also individuate based on other social behaviors (e.g., dominance) and social membership.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the University of Massachusetts Amherst Human Subjects IRB. The protocol was approved by the University of Massachusetts Amherst Human Subjects IRB. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

# AUTHOR CONTRIBUTIONS

HT-O, AL, and EC conceived and planned the experiments. HT-O and AL carried out the experiments. HT-O and EC contributed to the analysis of the results and to the writing of the manuscript.

# FUNDING

This work was supported by a Fulbright Scholarship Award to HT.

# ACKNOWLEDGMENTS

We thank the participating infants and their families, and Neil Berthier, Louise Antony, and Ronnie Janoff-Bulman for providing comments on previous drafts of this manuscript. We also thank Cleo Bergmann, Sara Klum, and Shannon Slater for their help in data collection.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Taborda-Osorio, Lyons and Cheries. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.