Inferring Unseen Causes: Developmental and Evolutionary Origins

Civelek, Zeynep; Call, Josep; Seed, Amanda M.

doi:10.3389/fpsyg.2020.00872

ORIGINAL RESEARCH article

Front. Psychol., 06 May 2020

Sec. Cognitive Science

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.00872

This article is part of the Research Topic: Causal Cognition in Humans and Machines

Zeynep Civelek^*

Josep Call

Amanda M. Seed

School of Psychology and Neuroscience, University of St Andrews, St Andrews, United Kingdom

Human adults can infer unseen causes because they represent the events around them in terms of their underlying causal mechanisms. It has been argued that young preschoolers can also make causal inferences from an early age, but whether or not non-human apes can go beyond associative learning when exploiting causality is controversial. However, much of the developmental research to date has focused on fully perceivable causal relations or has highlighted the existence of a causal relationship verbally, and both were found to scaffold young children’s abilities. We examined inferences about unseen causes in children and chimpanzees in the absence of linguistic cues. Children (N = 129, aged 3–6 years) and zoo-living chimpanzees (N = 11, aged 7–41 years) were presented with an event in which a reward was dropped through an opaque forked tube into one of two cups. An auditory cue signaled which of the cups contained the reward. In the causal condition, the cue followed the dropping event, making it plausible that the sound was caused by the reward falling into the cup; in the arbitrary condition, the cue preceded the dropping event, making the relation arbitrary. By 4 years of age, children performed better in the causal condition than in the arbitrary one, suggesting that they engaged in reasoning. A follow-up experiment ruled out a simpler associative learning explanation. Chimpanzees and 3-year-olds performed at chance in both conditions. These groups’ performance did not improve in a simplified version of the task involving shaken boxes; however, the use of causal language helped 3-year-olds. The failure of chimpanzees could reflect limitations in reasoning about unseen causes or a more general difficulty with auditory discrimination learning.

Introduction

In life and also in science, much of the evidence we get for causal relations is indirect. We can infer the existence and nature of a cause for an event despite not witnessing it directly: if it is hidden from our perspective, or if it is not perceivable by the senses. Our inferences can range from identifying the cause of a crashing sound coming from the kitchen (the wooden cutting board or the metal pot falling on the floor) to the causes of global warming (anthropogenic impact on the greenhouse effect). But how do we do this? Bullock et al. (1982) suggest that we use the principles of determinism, priority and mechanism: We assume that there is a causal structure to the world (i.e., that events typically have causes); that these structures are unidirectional (i.e., causes come before their effects) and that events are underpinned by a causal mechanism of some kind. Using these principles, and our prior knowledge with regards to specific relations, we can work our way from effects to detect likely causes. This is an extraordinary ability that frees us from relying on what can be directly perceived, allows us to make predictions about the future, and intervene to bring about desirable outcomes.

However, we can also learn regular covariations in spatiotemporal contiguity, which allow us to exploit a causal pattern even if we do not theorize about the generative mechanism (Shanks and Dickinson, 1987). If two events occur repeatedly in close spatiotemporal proximity, we form associative links between them. Later, when one of the cues occurs, the other can be predicted without any reference to the causal mechanism involved and, indeed, without any explicit awareness of the relationship at all (Reber, 1989). Conversely, we can learn a great deal about unseen causal relations without any direct experience: from others’ explicit testimony or implicit linguistic cues to causality (Harris and Koenig, 2006; Gelman, 2009). We may even learn about causal relations we may not have learnt otherwise (e.g., “The gravitational attraction of the moon causes tides”). These three alternative routes to exploiting causal relations in the world (association, theory-building and testimony) are not mutually exclusive: as adults we make use of all of them, and they interact in important ways.

What are the origins of these abilities in human development and over human evolution? There is good evidence that statistical or associative learning is present early in infancy (Aslin et al., 1998; Kirkham et al., 2002), and that this ability is shared with a great many other species. It is similarly uncontroversial that learning from testimony is a route available to children once they learn language, and unique to our species. However, when it comes to going beyond the data to reason about causal mechanisms there is more controversy both in developmental and comparative psychology (Penn and Povinelli, 2007; Bonawitz et al., 2010; Seed et al., 2011). Some researchers have suggested that humans have a natural tendency to explain the events they observe in terms of causal theories from very early in life (Bullock et al., 1982; Gopnik and Wellman, 2012). If this is the case, it is plausible that we share this ability with our closest primate relatives, and possibly other species (Seed et al., 2011; Völter and Call, 2017). Alternatively, others contend that causal thinking in early childhood might not be well-characterized by the notion of “theories all the way down” (Carey and Spelke, 1996). Instead children’s thinking about causation may only approximate scientific thinking later in development, due in part to input from others with the development of language. If this is the case, we may not expect to find causal reasoning in non-human primates. Penn and Povinelli (2007) have argued that there is no evidence non-human animals represent causality as such.

While tackling these questions empirically, one issue common to the comparative and developmental literature concerns distinguishing causal reasoning (based on representations of causal mechanism) from associative learning (making predictions in the absence of these representations), since events that are causally linked tend to co-occur. From a developmental perspective alone, a second issue concerns teasing apart the role of causal language and reasoning since children can use both to solve causal problems. We have two aims in this paper: (1) to further explore children’s inferences about unseen causes in the absence of linguistic cues to causality, and (2) to use the same paradigm to explore this ability in our closest relatives, chimpanzees.

There is substantial research suggesting that preschool children take unseen causal relations into account when explaining natural phenomena such as light (Bullock et al., 1982), wind (Shultz, 1982), electricity (Buchanan and Sobel, 2011), and contamination by germs (Legare et al., 2009). However, it is difficult to isolate the route to causal knowledge in cases that involve familiar events such as these. Children may have extensive prior experience with lights and blowing candles which may lead to forming associative links or may have been explicitly taught by adults about how “germs cause disease.” Indeed, younger preschoolers who supposedly did not have extensive experience with wires and electricity, failed to reason about these relations and made decisions based on covariation information instead (Buchanan and Sobel, 2011). They were only able to solve the problem when it involved more familiar batteries. Although it is possible that experience leads to extracting abstract causal information, it may also lead to learning arbitrary associations (e.g., when there are batteries inside, the toy works).

A way to address this issue has been to present preschoolers with novel and arbitrary causal structures. As adults and scientists, when the evidence we get does not fit with our prior knowledge or expectations, we infer unseen causes or confounding variables. In order to test if children reasoned in the same way, children were first trained on a novel causal structure (e.g., puppets moving in a certain way), and then saw evidence that was inconsistent with their training (Gopnik et al., 2004; Schulz and Sommerville, 2006; Schulz et al., 2008). When children were asked to make predictions about the cause of this inconsistent event, they were more likely to say that an unseen cause (i.e., “something else”) was responsible. Children also displayed an ability to imagine the effect of a hidden cause in a series of experiments by Siegel et al. (2014). They were able to select boxes to shake that would yield unambiguous data (e.g., if their task was to locate a hard object, they chose to pair it with a soft object rather than another hard object). However, in these studies the existence of a cause and the possibility that it might be unseen was provided in the framing of the task by the experimenter so the children did not have to infer it from the evidence alone. For instance, the experimenter asked “Why are the puppets moving together? Is it X, Y or something else?”

Overall, the evidence suggests that by 4 years of age children can successfully detect the presence of an unseen cause and make inferences about its nature; but the potential impact of others’ verbal testimony on their abilities has not been explored to date. Gelman (2009) argued that children are not “lone scientists”: they get much needed input from adults around them. Linguistic framing can help children to specify a causal relation by testifying that the covariations they see are indeed causal; and the use of the same wording can point to the commonalities between an observed action and an agent’s action (as in intervention studies: “The block makes it go. Can you make it go?”). Indeed, there is accumulating evidence that the use of causal framing can impact children’s propensity to make causal inferences from directly perceived and indirect evidence (Sobel and Sommerville, 2009; Bonawitz et al., 2010; Butler and Markman, 2012; Lane and Shafto, 2017).

One possibility is that verbal framing merely highlights the problem for children: making the task more sensitive to their theory construction ability by reducing peripheral demands such as the need to focus attention (Sobel and Sommerville, 2009). Another possibility is that without the verbal framing younger children are yet to develop some of the fundamental cognitive components needed to construct a causal explanation from evidence alone. The difficulty with using never-seen-before causal relationships is that some training or explanation is necessary for children to have the required background information to make inferences. While the nature of the instructions has been varied, they are rarely excluded. The verbal framing may simplify the task for older children; equally, it may make the test unsuitable for younger children such as 2–3-year-olds if they lack sufficient verbal ability to follow the instructions. We therefore designed a paradigm with minimal language requirements to explore this issue. We also intended to use this paradigm to make comparisons between children and non-human primates. This line of evidence could be very informative in establishing the degree to which human scientific thinking is grounded in skills we share with our closest relatives, or is rather a skill that requires cultural input over development to emerge, and verbal input to elicit in younger children.

Whether or not our closest relatives, chimpanzees, engage in causal reasoning is a controversial issue in comparative psychology. Some authors propose that causal reasoning is a uniquely human ability; and chimpanzees either learn associatively or they rely on generalizations based on the surface appearance of objects alone to solve problems (Penn and Povinelli, 2007; Penn et al., 2008; Bonawitz et al., 2010). Limitations in performance in some tasks designed to probe the causal reasoning abilities of great apes would seem to support this interpretation (Köhler, 1925; Limongelli et al., 1995; Povinelli, 2000; Call, 2007). In contrast to Penn and Povinelli (2007), Seed et al. (2011) proposed that non-human great apes can make use of causal information from events happening around them if the testing situation does not overload other cognitive resources. It could be shown that they did not rely solely on the available sensory information to learn associations. However, it has been a challenge to decisively distinguish associative learning from causal reasoning.

One of the most promising ways to resolve this issue has been to compare how non-human primates (and other animals, such as corvids and dogs) make inferences about the location of food in two contexts, either: (a) the evidence is caused by the food or (b) the evidence co-varies with the presence of food but the relation is arbitrary (reviewed in Seed and Mayer, 2017; Völter and Call, 2017). Great apes successfully used indirect evidence to locate food in a number of studies: in the form of auditory cues coming from shaken cups (Call, 2004), the visible effect of weight (Hanus and Call, 2008); and visible traces or trails (Völter and Call, 2014). In the critical comparison conditions, in which the relationship between a similar cue and the food location was arbitrary rather than causal, apes did not find the food (for example, if the experimenter played the recording of the rattling sound over the baited cup, Call, 2004). Taken together these studies imply that apes are capable of causal reasoning about unseen causes.

However, the comparability of the arbitrary conditions to the causal ones has been criticized. For example, Penn and Povinelli (2007) point out that the “recorded sound” control of the shaken cups study was not identical to the sound the shaken cup made. They further argued that the results could still be explained by associative learning if subjects had used the combination of shaking motion and rattling sound as a discriminative cue for locating food. Overall, the comparability of the experimental and control conditions in terms of different feedback (e.g., auditory) poses a challenge for distinguishing causal reasoning from associative learning.

The task presented in this study was designed to address some of the empirical challenges raised above by reducing verbal requirements and implementing robust controls for associative learning. In the “causal condition,” a ball containing a reward was dropped into a forked tube, and could be found in one of two cups at the bottom. After the ball was dropped, participants heard either a ding or a clack sound. After a few trials, subjects were expected to learn that when they heard a ding, the ball would be in one cup and when it was a clack, the ball would be in the other one. If subjects succeed in this condition, it might mean that they reasoned about the underlying causal structure (the ball hitting the different boxes caused different sounds) or that they simply associated the sound with the side (if ding, choose right). In order to distinguish between these two possibilities, in the “arbitrary condition” the order of events was reversed: participants first heard a ding or a clack sound, and then the ball was dropped into the forked tube. Although the sounds were still predictive of the location of the ball (if ding, choose right), the relationship was now arbitrary. Critically, the two conditions were equivalent from an associative learning perspective since the stimuli involved in both conditions were exactly the same and the only difference was the order of events. However, if participants reason about unseen causes, they are expected to do better in the “causal condition” where there is a plausible causal structure than in the “arbitrary condition.”

In previous studies, we have found such differences between causal and arbitrary conditions in children between the ages of 3 and 5, when dealing with directly perceivable events such as choosing an appropriate tool or an unobstructed path for extracting a reward (Mayer et al., 2014; Seed and Call, 2014). However, such performance differences are not apparent in older children, probably because 6-year-olds are capable of interpreting arbitrary cues as symbolic communication to solve a problem (DeLoache, 2004; Seed et al., 2011; Mayer et al., 2014). We therefore focused on 3–6-year-olds in this study. By 3 years of age children expect causes to precede their effects (Bullock and Gelman, 1979; Rankin and McCormack, 2013), so we predicted that by this age children should perform at above-chance levels in the causal condition if they reasoned causally, and that by 6 years they should be above chance in both conditions.

Experiment 1: Children

Methods

Participants

Three-to-six-year-old children (N = 129) were tested in different locations in Scotland. There were 65 children in the causal condition and 64 in the arbitrary condition. Age and sex were split roughly equally between the two conditions (Table 1). Twenty-three additional children were tested but excluded from the study due to experimenter or apparatus error (7), parental interference (3), discovery of the trick about the box (4) and refusal to complete the task (9). All the studies with children reported in this paper were ethically approved by the University of St Andrews Teaching and Research Ethics Committee, and informed consent was obtained from parents/guardians.

TABLE 1

Table 1. Age, sex, and mean/median performances of children in Experiments 1, 2, 4, and 5.

Materials

Transparent training box

The training apparatus was a forked chute made from clear acrylic (Figure 1). The single middle channel (30 × 6.5 × 5 cm) forked into two channels. Directly at the bottom of the channels there were two white acrylic boxes (2.5 cm apart). The channels were mounted on a white acrylic back panel (30 × 49 cm) and a base panel (25 × 30 cm) on which the apparatus stood. They contained pegs that were 7.5 cm apart from each other on both sides. The pegs were designed to slow down the fall of the ball and to make sounds so that subjects could easily follow the ball’s trajectory. A peg positioned right above the fork could be moved to either side from behind the back panel. It enabled the experimenter to control to which side the ball would fall on a given trial.

FIGURE 1

Figure 1. Transparent training box (A), opaque testing box (B), and the back of the opaque testing box (C) used in Experiments 1 and 3.

Opaque testing box

The testing apparatus had the same measurements as the training box but the channels were opaque. The boxes at the bottom of the channels were spray-painted, one yellow and one gray, using Plastikote stone-textured paint (Figure 1). In the testing apparatus, the back panel concealed two additional elements which, unbeknownst to the participant, controlled the falling of the ball through the apparatus and the production of the sound cues.

First, there was a middle singular channel (30 × 10 cm) into which the dropped ball would fall, hitting pegs along the way, and land noiselessly on a piece of foam. Below this channel was a shorter one (6.5 cm) in which a second ball was held and could be released onto a noise-making block (wooden or metal). This block could be exchanged by the experimenter depending on the trial. These two components were combined through the action of two small motors which controlled the rotation of small plastic supports that held the two balls in place. When the motors were switched on by a remote, the plastic supports would rotate, releasing the two balls according to a precise timing. The two buttons on the remote controlled the order in which the motors would activate. In the causal condition, the motor at the top would operate first and let the ball dropped into the apparatus by the experimenter, go down the channel hitting the pegs, and then the motor at the bottom would release the second ball to fall onto the metal/wooden piece positioned by the experimenter. The intended illusion was that the ball had fallen down the channel into one of the two boxes and made a distinct sound. In the arbitrary condition, the activation of the motors was reversed. The second support moved first to release the ball on the metal/wooden piece, and then the experimenter dropped the ball in time for the first support to rotate and let the ball fall down the channel with the pegs. The time interval between the activation of the two motors copied the actual time it would take the ball to fall in reality and was the same in both conditions. In the causal condition it appeared as a single event sequence. The electronic card that controlled the motors was concealed in a box behind the apparatus (Figure 1). 
The reason for creating the illusion rather than using a real event sequence was that: (1) no local sound cues were given to locate the ball; and (2) the order of the cues could be reversed in the arbitrary condition while keeping everything else about the stimuli exactly the same.

The balls were made of thermoplastic (1.60 cm in diameter) and contained a hole in the middle where the reward could be put.

Procedure

Training phase

The experiment started with the transparent training box. The experimenter introduced the task saying; “In this game, I will put a sticker in the ball and then I will drop the ball from here (the top opening). It will roll down to one of these boxes (points to the boxes at the bottom). If you find the ball, you will win the sticker. Ready?” The experimenter then dropped the ball and the child could watch the entire trajectory of the ball until it came to rest, hidden, in one of the boxes. The child then pointed to or opened the box that she/he thought the ball was in. Once the child made a choice, the other box was also opened to show the content. Transparent training ended after five consecutive successes or ten trials in total.

Test phase

After the training, the experimenter said “This game was too easy for you! Shall we make it more fun?” and brought out the opaque testing box. She then introduced the task to the children: “The game is the same. I will put a sticker in the ball and drop the ball from here. If you can find the ball, you will win the sticker. You cannot see inside the box anymore, but there is still a way to find the ball in the correct box! Do you want to try?” Before each trial, the experimenter prepared the apparatus behind a barrier by putting a ball with a sticker inside into one of the boxes at the bottom, placing another ball on the support attached to the motor just above the metal/wood piece and holding another in her hand for the child to see. The metal and wood pieces were interchanged between trials and the remote that controlled the events rested behind the apparatus.

In the causal condition, the experimenter pressed the causal-order button on the remote while dropping the ball. From the participant’s perspective, they would see the experimenter drop a ball into the apparatus, follow the trajectory of the fall due to the pegs inside the middle channel and then hear a metallic or wooden sound.

In the arbitrary condition, the experimenter pressed the arbitrary-order button on the remote. The participant would first hear a metallic/wooden sound and then see the experimenter drop the ball into the apparatus and follow the trajectory of the fall due to the pegs.

If the child found the ball, the experimenter said “Well done! You won a sticker!”, removed the other box to show that it was empty and prepared for the next trial. If the child did not find the ball, the experimenter said “Oh no! It was here (opening the other box). Let’s do it again!” In total, children got 20 testing trials, which lasted about 15 min. For a given participant the position of the yellow and gray boxes at the end of the channels stayed the same over the 20 trials (e.g., the yellow box on the left was associated with a ding, and the gray box on the right was associated with a clack), but between subjects the pairing of box color and sound was randomized. The ball was placed in each box 10 times in a random fashion but never in the same box more than twice in succession.
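The baiting constraint just described (10 trials per box, at most two in a row in the same box) can be illustrated with a minimal sketch. The article does not say how the sequences were actually generated; the rejection-sampling approach and the names `make_schedule` and `longest_run` are assumptions made purely for illustration.

```python
import random


def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def make_schedule(n_trials=20, max_run=2, rng=None):
    """Balanced left/right baiting order with no run longer than max_run.

    Hypothetical helper: reshuffles a balanced list until the run
    constraint is met (rejection sampling).
    """
    if rng is None:
        rng = random.Random()
    sides = ["left"] * (n_trials // 2) + ["right"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return list(sides)


schedule = make_schedule(rng=random.Random(1))
print(schedule)
```

Reshuffling until the constraint holds keeps every admissible sequence equally likely, at the cost of a few wasted shuffles.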

Open ended question

At the end of the task, the experimenter asked children; “How did you decide which box to choose?” If children did not reply, the experimenter elaborated “Sometimes the ball was in the gray one and sometimes in the yellow one. How did you know where the ball was?” Other than 15 missing explanations (first 10 participants were not asked because it was not initially planned in the study design and 5 other participants had to leave immediately after testing), all children responded to the question.

Scoring and Analysis

The first choice of the subjects was scored as their response in all of the experiments. All trials were scored live by the experimenter as correct or incorrect and were also videotaped. A second examiner coded 20% of the videos for reliability, Kappa = 0.97 (95% CI [0.95, 0.99]), p < 0.001. The mistakes that were found by the second coder were corrected and all the videos were recoded once again to check for other potential mistakes (none were found). The data for this study can be found in Supplementary Table S2.
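For readers unfamiliar with the reliability statistic reported above, Cohen's kappa corrects the raw proportion of agreement between two coders for the agreement expected by chance. The following is a minimal sketch, not the authors' analysis code; the function name and the example codes (1 = correct, 0 = incorrect) are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of trials on which the two coders agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Made-up example: live scoring vs. video coding of ten trials.
live = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
video = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(live, video), 3))
```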

We specified generalized linear mixed models (GLMM; Baayen, 2008) with binomial error structure and logit link function using the function glmer of the R package lme4 (Bates et al., 2015) for all of the analyses in this paper. In Experiment 1, our full model comprised condition (causal/arbitrary), age, and their interaction, as well as trial number and sex, as fixed effects. Subject ID and the side of the boxes were included as random effects. In order to keep type-1 error rates at the nominal level of 5%, we included random slopes of trial number within subject ID, but left out the correlation parameters between random intercepts and random slopes terms (Schielzeth and Forstmeier, 2009; Barr et al., 2013). We compared the full model to a null model which included only the random effects using a likelihood ratio test.
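Written out, the full model described above could take roughly the following form. This is a reconstruction from the verbal description rather than the authors' own notation: the symbols are assumptions, as is the coding of age as a four-level factor (three dummy variables), which is inferred from the degrees of freedom reported in the Results.

```latex
\[
\operatorname{logit} \Pr(y_{ijk} = 1)
  = \beta_0 + \beta_1 C_i
  + \boldsymbol{\beta}_2^{\top}\mathbf{A}_i
  + \boldsymbol{\beta}_3^{\top}(C_i \mathbf{A}_i)
  + \beta_4 T_{ij} + \beta_5 S_i
  + u_i + w_i T_{ij} + v_k
\]
\[
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
w_i \sim \mathcal{N}(0, \sigma_w^2), \quad
v_k \sim \mathcal{N}(0, \sigma_v^2), \quad
\operatorname{Cov}(u_i, w_i) = 0
\]
```

Here $y_{ijk}$ indicates a correct choice by child $i$ on trial $j$ with the reward on side $k$; $C_i$ is condition, $\mathbf{A}_i$ the age-group dummies, $T_{ij}$ trial number and $S_i$ sex. The terms $u_i$ and $v_k$ are the random intercepts for subject and side, and $w_i$ is the by-subject random slope for trial number, uncorrelated with $u_i$. The null model keeps only $\beta_0$ and the random terms.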

The model stability was assessed by excluding individual cases one at a time and comparing the estimates with those derived from a model with the full data set. The model was stable with regards to the fixed effects. We checked whether the variability was greater than expected (overdispersion) and found that it was not an issue with regards to the final model (dispersion parameter: 0.95). Finally, variance inflation factors (VIF) were calculated using the function vif of the R-package car and it did not indicate collinearity to be an issue.

The data were not normally distributed, so non-parametric Wilcoxon signed-rank tests were used to examine whether children’s performance was significantly different from chance level (p = 0.05) in different conditions and age groups. Children who chose one side 16 or more times were counted as side-biased according to a two-tailed binomial test (p = 0.004). Chi-square tests were used to explore the relationship between side bias, condition and age.
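To make the one-sample comparison against chance concrete, the sketch below computes the Wilcoxon signed-rank statistic T⁺ for a set of per-child scores. It assumes the common convention of discarding scores exactly at chance before ranking, which may be why the N of a signed-rank test can be smaller than the group size. The function name and example scores are hypothetical, and only the statistic is computed, not its p-value.

```python
def signed_rank_t_plus(correct_counts, n_trials=20):
    """Return (T+, N) for a signed-rank test of scores against chance.

    Scores are numbers of correct trials; chance is n_trials / 2.
    Differences are kept as integers (2 * count - n_trials) so that
    ties are detected exactly.
    """
    diffs = [2 * c - n_trials for c in correct_counts if 2 * c != n_trials]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the block of tied absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return t_plus, len(diffs)


# Made-up scores for eight children; two sit exactly at chance (10/20).
print(signed_rank_t_plus([12, 9, 10, 14, 11, 10, 13, 8]))  # (16.0, 6)
```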

Children’s responses to the open-ended questions were categorized into five types of explanations (N = 114) using the relevant categories from Legare et al. (2010). The first category, “No explanation,” consisted of children who could not or did not provide a verbal strategy (e.g., pointed to the boxes, said “yellow/gray one”). The second category was “Don’t know,” which consisted of children who said they did not know how to find the ball and were just guessing. The third category was “Non-causal strategies,” which referred to a solution based on a non-causal feature or pattern (e.g., the ball alternated right-left-right-left, “because of the colors”). The fourth category was “Causal explanations that were wrong” (e.g., “I followed the noises into the boxes,” “The box wiggled a bit when the ball fell into it”). The last category was “Referring to different sounds/materials,” which showed an understanding of the true causal structure (e.g., “They made two different sounds”). A second examiner categorized children’s answers into these five types of explanations. Agreement between the two coders was high, Kappa = 0.86 (95% CI [0.80, 0.93]), p < 0.001, and it rose to Kappa = 0.96 (95% CI [0.92, 0.99]), p < 0.001 after further discussion. The disagreements were due to some responses that could be categorized as either category one or two (e.g., “Don’t know” and points to the boxes). We decided to include them in the “No explanation” category as they were mostly pointing gestures. Only when children explicitly stated that they were just guessing did we include them in the “Don’t know” category. The relationship between verbal explanations, age and condition was explored using chi-square tests.

Results

Training

All children except for one 5-year-old and three 3-year-olds passed the transparent training within 5 consecutive trials. Two of these children needed 6 trials and the other two needed 8 trials to complete the transparent training.

Test

The full model comprising the interaction of age and condition, sex and trial number as fixed effects fit the data better than the null model, which lacked these fixed effects [χ²(9) = 30.91, p < 0.001]. There was a significant interaction between condition and age [χ²(3) = 8.71, p < 0.05] and a significant effect of trial number [χ²(1) = 6.99, p < 0.01]. There was no effect of sex [χ²(1) = 3.17, p = 0.075] (Supplementary Table S1).

Comparisons of children’s performance in different conditions across age groups showed that there was no significant difference between performance in the causal and arbitrary conditions for 3- and 6-year olds (Mann–Whitney U-Test for 3-year-olds: U = 93, N_causal = 16, N_arbitrary = 16, p = 0.348; 6-year-olds: U = 133, N_causal = 17, N_arbitrary = 16, p = 0.921). Three-year-olds performed at chance level in both causal (Median: 0.45, Wilcoxon signed-ranks test: T⁺ = 72, N = 15, p = 0.513) and arbitrary conditions (Median: 0.5, T⁺ = 40, N = 11, p = 0.562); 6-year-olds were above chance in both causal (Median: 0.6, T⁺ = 127, N = 17, p < 0.05) and arbitrary conditions (Median: 0.6, T⁺ = 92.5, N = 14, p < 0.01). Four-year-olds performed significantly better (U = 64.5, N_causal = 16, N_arbitrary = 16, p < 0.05) and above chance levels in causal condition (Median: 0.62, T⁺ = 98.5, N = 14, p < 0.01) as opposed to chance level performance in arbitrary condition (Median: 0.5, T⁺ = 39, N = 12, p = 1). Five-year-olds showed a similar trend for better performance compared to chance in the causal condition (Median: 0.55, T⁺ = 61, N = 12, p = 0.08) than in the arbitrary condition (Median: 0.48, T⁺ = 66, N = 15, p = 0.751); however this difference was not significant (U = 93, N_causal = 16, N_arbitrary = 16, p = 0.191). Figure 2 shows the average performance of each age group in causal and arbitrary conditions. An effect of learning as evidenced by the significant effect of trial number on performance was found. This was expected given that subjects had no way of solving the task in their first trial.

FIGURE 2

Figure 2. Performance of children in the causal and arbitrary conditions in Experiment 1 (N = 129, see Table 1 for age group information and means). Dotted line shows chance-level performance (proportion correct = 0.5), error bars represent SE.

There was no significant relationship between condition and side bias [χ²(1) = 0.73, p = 0.39]; however, there was a significant relationship between age and side bias [χ²(3) = 16.77, p < 0.001]. Three-year-olds were more likely to be side-biased than other age groups.

Open Ended Question

Table 2 summarizes the percentages of children’s responses to the question “How did you decide which box to choose?” in each age group across the two conditions. For a more robust chi-square analysis, the “no explanation” and “don’t know” categories were merged, as were the “non-causal strategies” and “wrong causal explanations” categories, resulting in three explanation categories in total: “no idea,” “wrong idea,” and “correct explanation.” According to the chi-square analysis, there was no significant relationship between 3-year-old children’s explanations and the condition they were in [χ²(1) = 0.36, p = 0.55]. In both conditions, a high percentage of 3-year-olds had “no idea” about how to find the ball, a minority gave a wrong explanation, and no children provided the correct explanation. There was a significant relationship between 4-year-olds’ explanations and condition [χ²(2) = 6.95, p < 0.05]. Although the majority of 4-year-olds were in the “no idea” category in both conditions, 33.3% provided the correct explanation in the causal condition whereas none did in the arbitrary condition. Interestingly, a higher percentage of children in the arbitrary condition gave a “wrong idea” explanation compared to those in the causal condition. The relationship between explanations and condition was marginally significant for 5-year-olds [χ²(2) = 5.73, p = 0.057]. “No idea” responses were comparable in both conditions; however, more 5-year-olds in the arbitrary condition than in the causal condition gave wrong explanations, and more children in the causal condition than in the arbitrary condition gave the “correct explanation.” There was a significant relationship between 6-year-olds’ explanations and the condition they were in [χ²(2) = 6.51, p < 0.05]. The pattern was similar to that of the 5-year-olds. More children referred to wrong explanations in the arbitrary condition than in the causal condition, and more 6-year-olds in the causal condition than in the arbitrary condition came up with an explanation based on different sounds.
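The chi-square tests on explanation categories can be sketched as follows, assuming a standard Pearson test of independence on a condition × explanation table without continuity correction (the correction is not specified in the text). Columns with no observations are dropped before testing, which is one way a 2 × 3 table yields 1 rather than 2 degrees of freedom, as for the 3-year-olds, none of whom gave the correct explanation. The function names and the counts in the example table are hypothetical; only the χ² values passed to the p-value helper come from the text.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    table: list of rows (conditions), each a list of counts per explanation
    category. Empty rows and columns are removed first, since their expected
    counts would be zero.
    """
    cols = [j for j in range(len(table[0])) if sum(row[j] for row in table) > 0]
    rows = [[row[j] for j in cols] for row in table if sum(row) > 0]

    total = sum(sum(row) for row in rows)
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(row[j] for row in rows) for j in range(len(cols))]

    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(cols) - 1)
    return statistic, df


def chi_square_p_value(statistic, df):
    """Upper-tail p-value, using the closed forms that exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


# Hypothetical counts: rows = causal, arbitrary;
# columns = "no idea", "wrong idea", "correct explanation".
hypothetical_table = [[10, 2, 0], [9, 3, 0]]
statistic, df = chi_square_independence(hypothetical_table)
print(round(statistic, 3), df)

# Reported statistics from the text, converted back to p-values.
print(round(chi_square_p_value(5.73, 2), 3))  # 5-year-olds
print(round(chi_square_p_value(0.36, 1), 2))  # 3-year-olds
```

The p-value helper reproduces the reported values for the 5-year-olds (p = 0.057) and 3-year-olds (p = 0.55) from their χ² statistics and degrees of freedom.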

Table 2. Percentage of children who gave the following explanations in response to the question “How did you know where the ball was?” in Experiment 1 (N = 114).

Finally, we explored whether children’s reports matched their performance. The performance of the 16 children who referred to different sounds/materials in the causal condition was compared with the performance of an age-matched group in the causal condition who gave other explanations. The model comprising the fixed effects of explanation (correct/incorrect), trial number and sex fit the data better than the null model without the fixed effects [χ²(3) = 29.47, p < 0.001] (Supplementary Table S2). There was a significant effect of explanation type on children’s performance [χ²(1) = 24.90, p < 0.001]. Children who gave the correct explanation performed better than their peers who gave incorrect explanations [Mean difference = 0.25, 95% CI [0.15, 0.36], t(15) = 5.24, p < 0.001]. Moreover, children who gave the correct explanation performed above chance levels [M = 0.77, 95% CI [0.69, 0.85], t(15) = 7.17, p < 0.001], whereas those who gave incorrect explanations were at chance [M = 0.52, 95% CI [0.47, 0.56], t(15) = 0.76, p = 0.46].
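As a rough consistency check, the t statistic for the correct-explanation group can be recovered from its reported mean and confidence interval. The sketch assumes the interval was computed as mean ± t_crit × SE from a t distribution with 15 degrees of freedom; the text does not state this. The function name is hypothetical, and 2.131 is the standard two-tailed 95% critical value for 15 degrees of freedom. Because the inputs are rounded to two decimals, the result only approximates the reported value.

```python
def t_from_ci(mean, ci_low, ci_high, null_value, t_crit):
    """Approximate a one-sample t statistic from a reported mean and 95% CI.

    Assumes the CI is symmetric: mean +/- t_crit * SE.
    """
    half_width = (ci_high - ci_low) / 2
    standard_error = half_width / t_crit
    return (mean - null_value) / standard_error


T_CRIT_DF15 = 2.131  # two-tailed 95% critical value of t with 15 df

# Correct-explanation group: M = 0.77, 95% CI [0.69, 0.85], tested against chance (0.5).
t_correct = t_from_ci(0.77, 0.69, 0.85, null_value=0.5, t_crit=T_CRIT_DF15)
print(round(t_correct, 2))  # close to the reported t(15) = 7.17
```

The same calculation is less stable for the incorrect-explanation group, where the mean lies so close to chance that rounding it to two decimals changes the recovered t noticeably.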

Discussion

When the sound cues were consistent with a causal structure, children used the discriminatory sound cue to locate the ball by 4 years of age, whereas 3-year-olds failed to do so. When the cues were not consistent with a causal structure, 4- and 5-year-olds did not use these same sounds to find the ball and performed worse than they did in the causal condition. This difference was significant for 4-year-olds but not for 5-year-olds. These results suggest that children went beyond the immediately available cues to imagine their likely unseen causes. The explanations children provided about how they found the ball matched the results of the main task. More children referred to different sounds/materials when there was a plausible causal structure than when the relation was arbitrary. In addition, the children who referred to different sounds outperformed their peers who gave other explanations for their choice.

However, one could argue that the temporal proximity between the distinct sound cue (metal/wood) and the outcome (choice of one box) was greater in the causal condition, where the order of events was “falling” (filler) sound, metal/wood sound, choice, than in the arbitrary condition, where the order was metal/wood sound, filler sound, choice. Since associations are more easily formed between temporally proximate events (Barnet et al., 1991; Miller and Barnet, 1993), and even brief delays have been shown to reduce causality judgments (Michotte, 1963; Shanks et al., 1989), this difference in timing could explain the better performance in the causal condition compared to the arbitrary condition. In Experiment 2 we tested this alternative explanation.

Six-year-olds performed equally well in both conditions. Their successful performance in the arbitrary condition might have resulted from the ability to treat arbitrary cues as symbols to solve a problem (DeLoache, 2004; Seed et al., 2011; Mayer et al., 2014). On the other hand, 3-year-olds did not pass either condition in this study; they were unable to provide a verbal explanation of how they found the ball and were more likely to be side-biased.

One possible reason for the failure of 3-year-olds is that, unlike older children, they cannot, or do not spontaneously, imagine unseen causes. However, other explanations are also possible, such as the need to remember the cues, which are auditory and therefore transitory, and to map them onto one of the two boxes, neither of which appears to be made of the materials evoked by the sounds. In Experiment 4 we simplified the task by using boxes that were visibly made of metal and wood, to examine whether this would make the task easier. In Experiment 3, we tested chimpanzees, and planned to titrate the level of difficulty based on our initial results with the task described above.

Experiment 2: Follow-Up With Arbitrary Sounds

In this experiment, we tested whether better performance in the causal condition as opposed to the arbitrary condition in Experiment 1 could be due to the temporal proximity of the sound cues and the outcome. Children were asked to locate a sticker in one of two boxes based on recorded sounds that followed either the causal order (filler, wood/metal) or the arbitrary order (wood/metal, filler) of Experiment 1. Would children perform better when the discriminatory cue was more proximate to the choice than when it was followed by a filler sound? If this were the case, it would raise concerns that the differences between the causal and arbitrary conditions in Experiment 1 could be due to temporal proximity rather than causal plausibility. However, if children had detected the causal structure in Experiment 1, we did not expect to find differences between conditions when all cues, regardless of order, were arbitrarily related to the outcome.