
ORIGINAL RESEARCH article

Front. Cognit., 02 January 2026

Sec. Reason and Decision-Making

Volume 4 - 2025 | https://doi.org/10.3389/fcogn.2025.1565294

This article is part of the Research Topic: Causal Cognition in Humans and Machines - Volume II

Cause-effect perception in an object place task

  • Cognitive Neuroinformatics, University of Bremen, Bremen, Germany

In this paper, we conducted an exploratory study in virtual reality to investigate whether people can discover causal relations in a realistic sensorimotor context and how such learning is represented at different processing levels (conscious-cognitive vs. sensorimotor). Additionally, we explored the relationship between human causal learning and state-of-the-art causal discovery algorithms. The task consisted of placing a glass on a surface. To enhance ecological validity, the setup included haptic rendering to simulate the glass's weight and contact force. The glass would break if the contact force exceeded its breakability threshold, determined by the causal structure weight → breakability ← color. Participants were asked to repeatedly transport and place glasses of varying weights and colors on a surface without breaking them. Therefore, to accomplish the task, participants had to discover the underlying causal structure. The trials were conducted over three separate sessions, each aimed at capturing a different behavior [(i) naive and causally unaware, (ii) exploratory, and (iii) consolidated and causally aware]. After each session, participants completed a questionnaire providing a measure of their conscious understanding of the task's causal structure. Sensorimotor representations were inferred by applying three causal-discovery algorithms (PC, FCI, FGES) to the recorded trial-by-trial variables, and conditional mutual information was used to quantify the strength of causal influence on the sensorimotor level. Results show that (i) participants identified the weight-breakability link (≈76% correct after the final session) and, to a lesser extent, the color-breakability link (≈43% correct), but they could not reliably infer causal direction. (ii) Sensorimotor analysis revealed a robust weight-force coupling that increased across sessions, whereas the color-force coupling was weak and noisy, yet mutual information suggested that learning was attempted.
(iii) Discovery algorithms recovered the underlying structure across the three sessions. Together, these findings indicate that humans can, to some extent, perceive the causal structure of the task and that conscious and sensorimotor representations are partially dissociated.

1 Introduction

Acting intelligently in everyday life is impossible without the appropriate perception and understanding of the causal structure of the physical environment (Is the food spoiled because of the sunlight or the heat of the room? What happens if I plug in the cable of the coffee machine? If my knife gets stuck while slicing bread, should I increase the pressure or the speed of my movement? Is the pavement wet because of the sprinkler or the rain?). The inference of causal relations provides the basis for reasonable decisions and a means to adapt to changes and contingencies in the environment and task conditions (Hattori and Oaksford, 2007; Rottman and Hastie, 2014). Causal understanding enables the inference of the effect of actions taken and provides the basis for controlling movements (Glymour, 2003; Shams and Beierholm, 2022). Cause-effect relationships tend to interact in a sparse/local way and are mostly invariant under domain shifts, thereby improving the stability of decision-making systems that use causal knowledge instead of statistical associations as a basis. Understanding the causal structure of the world is a key challenge for both humans and artificial systems, like robots or machine learning algorithms (Hellström, 2021; Scholkopf et al., 2021).

A crucial first step for a successful causality-based behavior is the identification of the causal structure of a system, also known as causal structure learning, or causal discovery (Glymour et al., 2019; Nogueira et al., 2022). For this identification, one must first disentangle spurious associations from actual cause-effect relationships. Suppose you clap your hands and, at the same moment, hear a dog bark outside. To figure out whether your clapping caused the dog to bark or if the two events simply happened at the same time by chance, you need to separate a real cause-effect link from a spurious association. Without further evidence, it would be mistaken to assume your clapping was responsible for the barking, rather than recognizing that the two events may have occurred independently. Furthermore, one has to take probabilistic dependencies into account, since only a small part of the relevant causal relations can be reduced to deterministic rules.

There are numerous algorithms that automate this identification; they are formally sound and provably converge toward the optimal solution. However, regardless of these desirable properties, there are still no convincing approaches to the autonomous handling of everyday tasks. Part of this deficit in robotic applications may originate from necessary assumptions that are often not strictly met in practice, e.g., that the representation of the causal system (the relevant variables, etc.) is provided in advance, or that the functional form of the causal mechanisms is known. Additionally, working with large sets of variables is particularly challenging due to the exponential growth of the combinatorial space.

In contrast to robots, though, humans possess full everyday competence in spite of facing the same problems. It is thus of interest to compare the properties of the causal processing of humans and of machine algorithms. In psychological research on causality the experimental setups are often relatively abstract, involving for example a hypothetical communication between aliens (Steyvers et al., 2003; Rottman and Hastie, 2014), the operation of machines like the “blicket detector” (Lucas and Griffiths, 2010; Goddu and Gopnik, 2024), the hypothetical influence of temperature and pressure on a rocket launch (Lagnado and Sloman, 2002), or the characteristics of fictitious types of stars (Rehder, 2017b).

In this work, we want to investigate how humans can learn causal relationships on a more basic level in the context of a realistic sensorimotor task. In particular, we want to investigate the following set of interrelated research questions:

• Can causal relations in a natural sensorimotor task of object transport be perceived and identified by human subjects?

• Does this causal learning take place on different levels of abstraction and processing?

• How can we measure the human causal representation in this context?

• How does the causal representation evolve over the course of an experiment?

• Can we compare the human learning in such a sensorimotor task with the operation of machine algorithms?

For this, we develop a suitable experimental setup in this work which (i) feels natural to our subjects, in contrast to the sometimes artificial setups used in other human causal learning experiments, (ii) is firmly embedded in a behaviorally meaningful sensorimotor task, as opposed to the abstract reasoning problems often used, (iii) allows us to implement a ground truth causal structure of our free choice, and (iv) provides us with measurements of human motor data which allow us to analyze the underlying causal sensorimotor representation.

2 Background and related work

2.1 Causal Bayesian networks and causal discovery

In this work, we use the formalism of causal Bayesian networks (Pearl, 2009b). A causal Bayesian network is a probabilistic graphical model that represents the cause-and-effect relationships among a set of variables using a directed acyclic graph (DAG), where nodes represent the variables and directed edges represent direct causal relations, visually depicted as arrows. The nodes can represent categorical, discrete, and/or continuous variables. An arrow between two variables thus also implies a statistical dependency between them.
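As a minimal concrete illustration, a small network can be encoded by its conditional probability tables, with the joint distribution factorizing along the DAG. The three-node structure and all numbers below are our own arbitrary choices, not taken from the study:

```python
# A three-node common-effect network, C1 -> E <- C2, with hypothetical CPTs.
# The joint distribution factorizes along the DAG:
#   P(C1, C2, E) = P(C1) * P(C2) * P(E | C1, C2)
p_c1 = {0: 0.5, 1: 0.5}                        # P(C1)
p_c2 = {0: 0.7, 1: 0.3}                        # P(C2)
p_e1 = {(c1, c2): 0.1 + 0.4 * c1 + 0.4 * c2    # P(E = 1 | C1, C2)
        for c1 in (0, 1) for c2 in (0, 1)}

def joint(c1, c2, e):
    """P(C1 = c1, C2 = c2, E = e) via the DAG factorization."""
    pe = p_e1[(c1, c2)]
    return p_c1[c1] * p_c2[c2] * (pe if e else 1.0 - pe)

# Sanity checks: the eight states sum to one, and the parents are marginally independent.
total = sum(joint(a, b, e) for a in (0, 1) for b in (0, 1) for e in (0, 1))
p_c1_and_c2 = sum(joint(1, 1, e) for e in (0, 1))
print(total, p_c1_and_c2)   # 1.0 and 0.15 (= 0.5 * 0.3), up to floating point
```

Everything not explicitly encoded by an arrow (here: any direct link between C1 and C2) is absent from the factorization, which is what makes the graph a compact carrier of (in)dependence statements.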

The notion of the strength of a causal effect refers to the degree to which changing one variable through manipulation influences another variable. A strong causal effect indicates that a manipulation yields substantial variation in the outcome, while a weak effect indicates minimal impact on the outcome. Using the do notation, the general causal effect is defined as P(Y = y|do(X = x)), which denotes the conditional probability of Y = y, given the intervention do(X = x) (Pearl et al., 2016, Sec. 3.2). An intervention describes the active manipulation of the given system, i.e., setting a variable X to the value x independently of its causes instead of selecting data in which X = x (and therefore dependent on its causes). Janzing et al. (2013) have reviewed other measures of causal strength that have been proposed in the literature, emphasizing that the applicability of a particular measure depends on assumptions of the functional dependency between variables and the structure of the DAG (we will return to this topic in Section 3.5, where we discuss the measure of causal strength used in our setup).
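The difference between conditioning and intervening can be made concrete with a small simulation (the confounded model and its parameters are our own illustrative choices): in a system Z → X, Z → Y, X → Y, the quantities P(Y | X = x) and P(Y | do(X = x)) come apart.

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy confounded model Z -> X, Z -> Y, X -> Y (hypothetical parameters)."""
    z = random.random() < 0.5
    x = do_x if do_x is not None else (random.random() < (0.8 if z else 0.2))
    y = random.random() < (0.2 + 0.3 * x + 0.4 * z)
    return x, y

N = 100_000
obs = [sample() for _ in range(N)]
# Conditioning: selecting trials with X = 1 also selects for the confounder Z
p_cond = sum(y for x, y in obs if x) / sum(1 for x, y in obs if x)
# Intervening: do(X = 1) severs the Z -> X link, so Z keeps its base rate
p_do = sum(y for _, y in (sample(do_x=True) for _ in range(N))) / N
print(round(p_cond, 2), round(p_do, 2))   # conditioning overestimates the causal effect
```

Analytically, P(Y = 1 | do(X = 1)) = 0.2 + 0.3 + 0.4·0.5 = 0.70 here, while conditioning yields ≈0.82, because observing X = 1 makes Z = 1 more likely.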

The quantification of causal effects relies on the availability of a DAG. When the DAG is not known, it can be identified from data using causal discovery algorithms (also known as structure learning algorithms) (for recent accounts on the subject, see the reviews by Glymour et al., 2019 and Nogueira et al., 2022). Causal discovery algorithms systematically analyze many possible causal structures in relation to the statistical properties of the data. The causal structure can be identified by using conditional independence tests or model scoring (Glymour et al., 2019). For example, in the case of three variables, the common-effect structure (e.g., C1 → E ← C2, the one we use in this paper) has a unique set of independence statements (Pearl, 2009c).
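The independence signature of the common-effect structure can be demonstrated with simulated data (a noiseless AND mechanism, chosen purely for illustration): the two causes are marginally independent but become dependent once we condition on the effect, the well-known "explaining away" pattern.

```python
import random

random.seed(1)

def corr(a, b):
    """Pearson correlation of two equal-length 0/1 sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

N = 50_000
c1 = [random.random() < 0.5 for _ in range(N)]
c2 = [random.random() < 0.5 for _ in range(N)]
e = [a and b for a, b in zip(c1, c2)]      # common effect: noiseless AND, for illustration

r_marginal = corr(c1, c2)                  # the causes are (near-)independent ...
idx = [i for i, v in enumerate(e) if not v]
r_given_e = corr([c1[i] for i in idx], [c2[i] for i in idx])  # ... until we condition on E
print(round(r_marginal, 3), round(r_given_e, 3))
```

It is exactly this asymmetry (independent parents, dependent once the child is fixed) that makes the v-structure identifiable from observational data alone.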

There are two well-established approaches to causal discovery: constraint-based and score-based algorithms. Constraint-based algorithms identify the causal structure by testing for probabilistic (in)dependencies between the variables (Glymour et al., 2019). Since this is done repeatedly, the process poses a multiple-testing problem, in contrast to score-based approaches (Spirtes, 2010). However, most score-based algorithms require stronger assumptions about the data-generating process than constraint-based algorithms (Huang et al., 2018). Score-based algorithms identify the causal structure by optimizing a score function such as the Bayesian Information Criterion (BIC), which approximates the posterior probability of the causal structure given the data (Malinsky and Danks, 2017; Glymour et al., 2019). A special class of score-based algorithms is Bayesian causal discovery (BCD, Heckerman et al., 2006). These algorithms provide a degree of belief for every hypothetically possible causal graph of the given setting in the form of the Bayesian posterior. An advantage of BCD is that prior knowledge can easily be incorporated. However, there are also risks associated with the choice of the prior (Weakliem, 1999). Compared to constraint-based approaches, the complexity of BCD is high, and finding the highest-scoring DAG model is generally NP-hard (Chickering, 1996). In particular, it is impossible to apply standard BCD to configurations with more than six variables (Karimi Mamaghan et al., 2024). To mitigate this, approximations using Monte-Carlo methods or Gaussian approximations can be used in BCD (Heckerman et al., 2006). Additionally, score-based methods such as the GES algorithm employ search heuristics or greedy schemes (Spirtes et al., 2001).
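A minimal sketch of score-based selection, assuming two binary variables and using BIC = log-likelihood − (k/2) ln n: with enough data, the score favors the structure with an edge over the empty structure whenever the variables are in fact dependent. The data-generating parameters below are hypothetical.

```python
import math
import random

random.seed(2)

# Simulated data in which X causally drives Y (hypothetical parameters)
N = 5_000
data = []
for _ in range(N):
    x = random.random() < 0.5
    y = random.random() < (0.9 if x else 0.1)
    data.append((x, y))

def log_lik_empty(data):
    """Log-likelihood under the edgeless model: X and Y independent."""
    n = len(data)
    px = sum(x for x, _ in data) / n
    py = sum(y for _, y in data) / n
    return sum(math.log(px if x else 1 - px) + math.log(py if y else 1 - py)
               for x, y in data)

def log_lik_edge(data):
    """Log-likelihood under the model X -> Y, with P(Y | X) fitted from counts."""
    n = len(data)
    px = sum(x for x, _ in data) / n
    n1 = sum(1 for x, _ in data if x)
    py1 = sum(y for x, y in data if x) / n1
    py0 = sum(y for x, y in data if not x) / (n - n1)
    ll = 0.0
    for x, y in data:
        ll += math.log(px if x else 1 - px)
        p = py1 if x else py0
        ll += math.log(p if y else 1 - p)
    return ll

def bic(ll, k, n):
    # BIC as penalized log-likelihood: reward fit, penalize free parameters
    return ll - 0.5 * k * math.log(n)

bic_empty = bic(log_lik_empty(data), k=2, n=N)   # parameters: P(X), P(Y)
bic_edge = bic(log_lik_edge(data), k=3, n=N)     # parameters: P(X), P(Y|X=0), P(Y|X=1)
print(bic_edge > bic_empty)                      # the dependent structure wins
```

Note that for two variables, X → Y and Y → X receive the same score (they are Markov-equivalent); the score only distinguishes edge vs. no edge, which is why orientation requires additional structure such as the v-configuration used in this paper.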

2.2 Human causal inference

It has been suggested that research on human causal learning and reasoning should make use of the formal description in terms of causal Bayesian networks (e.g., Waldmann and Martignon, 1998; Glymour, 2001, 2003; Gopnik et al., 2004; Tenenbaum et al., 2006). Meanwhile, a majority of studies of human causal inference make use of causal Bayesian networks as the normative framework, and a number of researchers applied it successfully as a computational framework to study how humans infer causality by analyzing patterns of correlation and the impacts of interventions (e.g., Lagnado and Sloman, 2002; Glymour, 2003; Steyvers et al., 2003; Gopnik and Tenenbaum, 2007; Meder et al., 2010; Rottman and Hastie, 2014).

2.2.1 Causal inference in cognitive reasoning

Most studies of human causal inference focus on the cognitive level of analysis. This aims to investigate human qualitative and quantitative causal judgments and provides a basis to investigate whether people follow the causal Markov assumption and whether people make use of the parameters of the causal structures (Glymour, 2003; Rottman and Hastie, 2014), as well as to investigate the effect of causal knowledge on categorization (i.e., process of inferring an object's category membership given its features) (Rehder, 2017a) and category-based induction (i.e., the process of inferring features given an object's category membership) (Rehder, 2017b).

A key problem in experiments of causal learning is the unpredictable influence of previous experience. It has been noted that, in experimental settings, it is difficult to assess or control for potential contributions to judgment or behavior coming from participants' prior real-world knowledge, experience, or beliefs (Rottman and Hastie, 2014; Bailey et al., 2024). A common approach in cognitive causal research thus consists of providing participants with a cover story describing a specific context of the variables involved and the inference tasks. Typically, unnaturalistic tasks [e.g., the blicket paradigm, where participants can intervene on objects that light up or play music (Lucas and Griffiths, 2010; Goddu and Gopnik, 2024)], unfamiliar contexts [e.g., a rocket launch (Lagnado and Sloman, 2002)], or artificial contexts are provided [e.g., fictitious types of stars (Rehder, 2017b) or telepathic aliens who transfer their thoughts through mind reading (Steyvers et al., 2003; Rottman and Hastie, 2014)].

Complete learning of a causal representation requires knowledge about its structure (the directed graph) and its parameters (functional relationships). Early causal research in psychology has concentrated on the latter, for example, by describing the perceived strength of a causal influence using the ΔP rule (Jenkins and Ward, 1965; Allan and Jenkins, 1980; Cheng and Novick, 1990), which is a measure of the strength of covariation between a (cause) variable and a subsequent (effect) variable, and the power PC theory (Cheng, 1997). The ΔP rule has been used as a normative measure to assess human judgments of causal association between variables (see Glymour, 2001, Chap. 4; Hattori and Oaksford, 2007; and references therein).
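In this notation, the ΔP rule compares the probability of the effect e in the presence vs. the absence of the candidate cause c:

```latex
\Delta P \;=\; P(e \mid c) \;-\; P(e \mid \neg c)
```

A positive ΔP indicates a generative covariation, a negative ΔP a preventive one, and ΔP ≈ 0 the absence of covariation between c and e.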

Regarding the structure of the causal graph, research has been focused on paradigmatic causal structures such as chains, common effects, and common causes (Rottman and Hastie, 2014; Rehder, 2017a); typically involving binary causes and effects (Glymour, 2001; Rottman and Hastie, 2014).

Judgments often align with causal direction, yet subjects show limited sensitivity to causal parameters and structure (Rottman and Hastie, 2014). When informed about causal strength, their predictions fall short of normative expectations. This insensitivity manifests as violations of the Markov condition, where conditionally independent variables are viewed as causally dependent (Rottman and Hastie, 2014; Rehder, 2017b). Proposed explanations include assumptions of hidden causal mechanisms and the influence of prior knowledge or beliefs that introduce unconsidered causal links (Rottman and Hastie, 2014; Rehder, 2017b).

Humans, despite their imperfections, are surprisingly quick at learning causal relations, faster than would be expected if they operated on full conditional probability distributions. According to Hattori and Oaksford (2007), this process begins with extracting covariation information from observations, which helps eliminate unrelated events as potential causes. They argue that after detecting covariation, an analytic process involving domain-specific causal knowledge and interventions follows, aiding in distinguishing real from spurious causes. Steyvers et al. (2003) demonstrated that interventions outperform observation-only methods for identifying causal structures in humans, while Lagnado and Sloman (2004) emphasize the importance of temporal cues in interpreting intervention results. The surprisingly fast learning, and the ability to incorporate the aforementioned influence of prior experience, have been used as arguments for considering Bayesian causal discovery for the modeling of human causal learning (Waldmann and Martignon, 1998; Griffiths and Tenenbaum, 2009), where domain-specific prior knowledge guides statistical inference (termed theory-based causal induction) (Tenenbaum et al., 2006; Griffiths and Tenenbaum, 2009).

2.2.2 Causal inference in perceptual reasoning

Bayes' rule and the Bayesian network formalism have been used as normative models to investigate human performance in various perceptual and sensorimotor tasks (Shams and Beierholm, 2010, 2022). In particular, Bayesian formalisms are used to investigate two problems in multisensory perception (Shams and Beierholm, 2022): determining the causal structure that generated the sensory cues among a set of mutually exclusive options (termed the Bayesian Causal Inference model) and estimating hidden variables given sensory cues (termed the integration problem).

The Bayesian Causal Inference model is a statistical framework that determines the more likely causal structure between two options (commonly a common cause vs. independent causes) (Shams and Beierholm, 2022). A key example is the research by Körding et al. (2007), which explored how subjects infer common or distinct causes for auditory and visual cues. A common cause (e.g., X1 ← S → X2) is inferred when auditory and visual signals are perceived as close, while distinct causes (e.g., S1 → X1 and S2 → X2) are identified when they are perceived as distant. These causal structures are viewed as competitive priors (Shams and Beierholm, 2022), where prior beliefs influence causal interpretation, and these priors can be shaped by experience or evolution (Shams and Beierholm, 2022). Experimental data suggest that the Bayesian Causal Inference model effectively captures human behavior across various tasks (see Shams and Beierholm, 2022). However, it is uncertain whether this inference happens consciously or whether the nervous system actively commits to a causal structure during processing (Shams and Beierholm, 2022). Moreover, the competitive prior structures are assumed to be fixed, making the inference a matter of choosing one structure from a limited set of options.
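The model's core computation can be sketched as follows. The noise widths, spatial prior, and structure prior below are our own hypothetical choices, and the marginalization over the hidden source position is done numerically rather than in closed form; the qualitative prediction is that close cues yield a high posterior for the common-cause structure and distant cues a low one.

```python
import math

def normpdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, lo=-60.0, hi=60.0, step=0.05):
    """Riemann sum over the hidden source position."""
    n = int((hi - lo) / step)
    return sum(f(lo + i * step) for i in range(n)) * step

def posterior_common(xa, xv, sa=2.0, sv=1.0, sp=15.0, prior=0.5):
    """P(common cause | auditory cue xa, visual cue xv). All parameters are
    hypothetical: sa/sv are sensory noise widths, sp the spatial-prior width."""
    # Common cause: a single source s generates both cues
    lc = integrate(lambda s: normpdf(xa, s, sa) * normpdf(xv, s, sv) * normpdf(s, 0.0, sp))
    # Independent causes: each cue has its own source
    la = integrate(lambda s: normpdf(xa, s, sa) * normpdf(s, 0.0, sp))
    lv = integrate(lambda s: normpdf(xv, s, sv) * normpdf(s, 0.0, sp))
    return prior * lc / (prior * lc + (1.0 - prior) * la * lv)

# Close cues favor a common cause; distant cues favor independent causes
print(round(posterior_common(0.0, 1.0), 2), round(posterior_common(0.0, 12.0), 4))
```

This inference selects between two fixed candidate structures, which is precisely the "limited set of options" point made above: no structure outside the two priors can ever be discovered.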

Regarding the general causality literature, the term causal inference relates to computations based on a known causal structure, as data alone cannot resolve causal questions without understanding the data-generating process (Pearl et al., 2016). Causal discovery, or causal structure learning, involves identifying the causal structure of a process from its statistical properties without assuming any specific structures a priori (Glymour et al., 2019; Nogueira et al., 2022). Thus, the Bayesian Causal Inference model of human perception, which uses competitive prior structures, differs from general causal discovery/learning, although the conceptual resemblance is acknowledged (see Section 1 in the Supplementary material). The term Bayesian probabilistic inference describes how humans assess belief over hypothetical causal structures based on data (Gopnik and Tenenbaum, 2007).

3 Methods

3.1 Motivation

In the introduction, we have outlined the main questions we aim to address through our investigation. In this section, we aim to outline the various aspects of our motivation and examine the detailed requirements that must be met to facilitate a thorough investigation. Here is an overview of the problems and requirements we will discuss in this and the following sections:

1. Observation-only identification. To avoid problems with the role of interventions, we seek an experimental setup with a causal structure that can be uniquely identified from observations only.

2. Different levels of causal representation. Is there one unique causal representation? Or do we have reason to believe that there exist different levels of causal representations, in particular for the handling of a sensorimotor task?

3. Perceived cognitive causal representation. We have to find means to determine which causal representation our subjects use on the conscious cognitive level.

4. Sensorimotor causal representation. Is it different from the causal representation on the perceptual level? How can we measure this representation?

5. Sensorimotor relevance. How can we ensure that the causal structure is relevant for the sensorimotor level?

6. Observability of the sensorimotor causal representation. The internal variables of representations can usually not be observed.

7. Problems of naturalness. Natural tasks pose a problem for causal experiments due to the possible influence of long-term experience.

8. Measure the learning effect. How can we measure how much subjects learn about the causal relations, both on the conscious and on the sensorimotor level? How can we deal with prior experience?

9. Comparison of human causal learning with machine algorithms. Do we need access to a full-fledged robot? How can we enable a fair comparison?

In the following, we provide detailed comments on how we considered the above-stated problems in designing the experiment and in the methods used for data analysis. For this, we developed an experimental setup in Virtual Reality (VR) in which human subjects perform the natural task of moving an object and placing it at a prescribed target location. In particular, the subjects had to move a glass as fast as possible while avoiding breaking it in the final contact with the supporting surface. Special care was taken to ensure the realism of the setup by providing high-quality haptic feedback to the subjects using a PHANToM haptic device.

Figure 1 shows the ground truth causal structure of the experimental setup. The upper left subgraph describes the causal dependencies between the glass properties (weight, color, and force threshold). Throughout the text, we will use the terms force threshold, breakability, and breakability threshold interchangeably. Whether these dependencies can be recognized by our subjects is the central question of our study. The lower part of the graph describes how the combined influence of the causes breakability and force-value determines whether the glass will remain intact (i.e., glass-ok = true) or will be broken (i.e., glass-ok = false) by the contact force against the target plate.

Figure 1. The ground truth causal structure of the experimental setup. Upper left sub-graph: causal dependencies between glass properties (directed edges from Glass-weight and Glass-color to Glass-Force-Threshold). Lower subgraph: causal determinants of possible glass destruction (directed edges from Glass-Force-Threshold and Force-value to Glass-ok).

In the following, we will describe our decisions and motivations in attempting to solve the aforementioned problems.

Problem 1: Unique identification of the causal structure by observations only. For most causal structures, this is not possible. However, in this experiment, the causal relations should be uniquely identifiable by observation alone, without the need to intervene in the system. This is because we neither wanted to give our subjects direct access to variables that encode properties of the VR environment, nor is it easy to implement an ideal intervention in the sense of Pearl's do operator on motion-related parameters of the hand motion, such as velocity or force.

Therefore, we used the only basic causal configuration for which a unique identification based on observations alone is possible, the common-effect configuration (cf., e.g., Spirtes et al., 2001). Such a V-like configuration is shown in the upper left part of Figure 1. In particular, we used a causal dependence of the breakability of the glass on two object features: the glass weight and the glass color (note that the lower part of the figure is also a V-configuration with a common effect).

How can our subjects learn the ground truth from observation? The first two variables, weight and color, are sensory observables for our subjects via the PHANToM force rendering mechanism and the VR screen rendering. The breakability can be indirectly observed through the binary outcome: whether the glass is broken or not.
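The resulting generative process can be sketched as follows. Only the graph structure (weight → threshold ← color, with threshold and contact force jointly determining glass-ok) mirrors the ground truth of Figure 1; the variable levels, coefficients, and noise are entirely our own illustration, not the parameters used in the experiment.

```python
import random

random.seed(3)

def trial(force):
    """One simulated placement trial (hypothetical parameters, ground-truth structure)."""
    weight = random.choice([0.2, 0.4, 0.6])              # glass weight (kg), hypothetical levels
    color = random.choice([0, 1])                        # two color categories
    # Common-effect structure: weight -> force threshold <- color
    threshold = 5.0 * weight + 1.0 * color + random.gauss(0.0, 0.1)
    # Second v-structure: threshold and contact force jointly determine the outcome
    glass_ok = force <= threshold
    return weight, color, glass_ok

N = 20_000
trials = [trial(force=2.0) for _ in range(N)]

def ok_rate(w0):
    sel = [ok for w, _, ok in trials if w == w0]
    return sum(sel) / len(sel)

ok_light, ok_heavy = ok_rate(0.2), ok_rate(0.6)
print(round(ok_light, 2), round(ok_heavy, 2))
```

In this toy parameterization, glasses with higher weight tolerate the fixed contact force more often, so the binary glass-ok outcome carries indirect information about the unobservable threshold, exactly the learning signal available to the subjects.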

Problem 2: Multiple levels of causal representation. Does there exist only one level on which a causal representation exists and can be learned? In general, there is overwhelming evidence that the learning and execution of a task involves different levels of representation.

The classical view of sensorimotor learning (e.g., Fitts and Posner, 1967) is that it occurs as a systematic and gradual transition between two representational levels. In the first phase, learning is determined by cognitive control and deliberate action. In this phase, and at this level, the learner can utilize verbal instructions and acquire declarative knowledge about the task and the appropriate strategies to cope with it. With ongoing practice (and in a continuous fashion), more and more of the control is transferred to the motor level. Finally, the complete motor control is performed by a fully automated procedural routine, without the need for cognitive control. Typically, in the fully automatized state, the actor is barely aware of his automatic (re-)actions. Tasks that initially require cognitive control can thus become automatic, freeing up cognitive resources for other tasks (cf. also the two-process theory of attention by Shiffrin and Schneider, 1977).

Regarding these levels of sensorimotor learning, there is an analogy to Kahneman's dual-systems theory (Kahneman, 2012). Key properties of Kahneman's system 1 are: fast, intuitive, automatic, and unconscious. System 2 properties are: slow, deliberate, analytical, and consciously aware. However, the dual-systems theory of Kahneman primarily addresses modes of thinking rather than levels of motor control, and the relationships between the two concepts have yet to be fully worked out in detail (nevertheless, something like automatized behavior in car driving is often used as an example of a Kahneman system 1 operation).

A dual-processing approach has also been discussed in sensorimotor learning in the form of a fast and a slow process (Smith et al., 2006), or as a distinction between explicit and implicit processes (Taylor et al., 2014; McDougle et al., 2015). The explicit processing operates with conscious awareness, requires deliberative effort, and can be influenced by verbal instructions and declarative knowledge. Implicit processing is closely linked to sensory input and motor output, operating on an unconscious level. It may thus be regarded as the sensorimotor system, as opposed to the consciously operating explicit learning and cognitive reasoning. Of special interest for the context of this paper is that these explicit and implicit processes, while often interacting with each other, can also contribute independently to overall sensorimotor performance (Sülzenbrück and Heuer, 2009; D'Amario et al., 2024).

Finally, there is also evidence from causality research itself that suggests the existence of different levels of causal representation (although this is usually not explicitly addressed).

To recognize this, let us consider two major research paradigms that are usually employed in causality research. On the one hand, subjects are presented with certain (often artificial and abstract) data, e.g., from a fictitious laboratory experiment or the communication patterns of aliens, and asked purposefully directed questions about the underlying causal relations. In the terms introduced above, these experiments can be assumed to address the conscious cognitive level of causal representation. On the other hand, there is a line of research investigating the causal relations that underlie certain multisensory effects [the aforementioned Bayesian Causal Inference model (Körding et al., 2007)]. It is a characteristic of this multisensory processing that it is fully automatized and therefore completely “shielded” from the cognitive level [“cognitively impenetrable” (Pylyshyn, 1999)]. For example, the position attribution in the ventriloquist effect occurs despite our cognitive knowledge that puppets do not speak, and in full awareness of the principles behind the trick. This is clearly an example of a causal representation at a level different from the conscious cognitive level.

In conclusion, there is substantial evidence for the existence of different processing levels, both with respect to higher cognition (Evans, 2008), and to sensorimotor learning (Smith et al., 2006; Taylor et al., 2014). To what extent these levels overlap, reduce to two major subsystems, or represent a system-wide duality being realized in multiple subsystems is currently unclear, and we do not want to take a specific stance on this matter. For the context of our investigation, the only important point is that we can distinguish at least two levels which are of relevance. On the one hand, the conscious, explicit level. This is the level on which our subjects consciously perceive and reason about the causal dependencies in our experimental setup. On the other hand, there is the sensorimotor level. This is the level at which the motor parameters for the hand movement are determined in dependence on the sensory input.

Taken together, we derive the following conclusion from the above evidence: It is possible that there exists only one unique causal representation, with the respective information being transferred during the course of learning from an explicit and conscious level to an unconscious sensorimotor level. It is equally possible, though, that the subsystems are to some degree independent of each other. And even if the sensorimotor level is not perfectly cognitively impenetrable (i.e., if some influence from the explicit conscious level exists), the two levels are sufficiently well separated that two different causal representations may develop and coexist on them.

This poses two questions. First, how can we measure these different causal representations (Problems 3 and 4)? Second, how can we ensure a proper and fair comparison of the representations (Problem 5)? We start with the measurement:

Problem 3: Experimental identification of the causal representation on the higher, cognitive level. This is a problem that can be solved relatively easily. We can simply ask our subjects about their representation. We have done this in the form of a questionnaire that participants were required to answer after each of the three experiment sessions. Furthermore, we let them draw causal graphs. Such an identification, achieved by asking subjects about their perceived causal relations, is a standard procedure in human causality research (Glymour, 2001; Rottman and Hastie, 2014). A detailed description of the questions can be found in Section 3.3.1.

Problem 4: Identification of the sensorimotor causal representation. This is a trickier issue, since the representation cannot be directly inferred by questioning our subjects or by performing a test after each experiment session. Of course, we have asked our subjects about the criteria they used to determine their motor strategy. However, by the very definition of the sensorimotor level, we cannot expect our subjects to have complete conscious access to the underlying causal representation. In particular, there is a high probability that they would confuse their sensorimotor representation with their conscious causal representation. Therefore, we have to look for an alternative, “neutral” measuring method. In this paper, we suggest using causal discovery methods for identifying the sensorimotor causal representation. To our knowledge, this is the first time a causal representation learned by humans has been analyzed using causal discovery algorithms (an important precondition for such an application, the availability of sufficient statistical data from our subjects, is discussed below).

Problem 5: Fair comparison of the representations. Now that we have methods for measuring the causal representations on the two different levels, we must turn to the second of the above-mentioned problems. Note that a fair comparison would not easily be possible if the learning of the two representations did not employ the same information about the causal relations in the experimental setup. How can we determine that both levels are using the same information (in our case, this should be the ground-truth causal v-structure shown in the upper left part of Figure 1)?

We designed our experiment in such a way that this is achieved by the specific structure of the sensorimotor task. Successful accomplishment of the task (being fast while breaking few glasses) is only possible if the underlying sensorimotor representation properly reflects the ground-truth causal structure (including its quantitative relations, i.e., the parameters of the corresponding structural equation model). This is shown in Figure 2.

Figure 2
Three causal diagrams illustrate relationships between variables depicted as nodes. First causal network shows “weight” and “color” pointing to “breakability”. Second diagram shows “perceived weight” and “perceived color” pointing to “estimated breakability”, which points to “force”. Third diagram shows “weight” and “color” pointing to “force”.

Figure 2. Causal graphs relevant for the sensorimotor causal representation. (Left) Ground truth causal structure of the experimental setup. (Center) Internal representation enforced to be used for sensorimotor control. (Right) The corresponding simplified proxy graph of the sensorimotor causal representation.

The left graph of the figure shows the relevant ground truth, and the center graph shows the corresponding causal graph of the sensorimotor representation, i.e., the underlying causal structure of the assumed sensorimotor control as enforced by the task. It is important to understand how the experimental design ensures that the causal representation and the sensorimotor task are aligned in such a way that the correct sensorimotor causal representation is required for an optimal solution of the sensorimotor task:

The perceived weight and perceived color are used to compute an internal estimate of the glass's breakability (note that this computation may be suboptimal or may underestimate the influence of one of the factors, possibly even up to the point that one or both factors are ignored). The estimated breakability is then used to compute a force that is just a small margin below the breakability threshold. This is enforced by the task, as only then is it possible to move the glass as fast as possible (high velocity means high force at the final placement) while still preventing the glass from breaking.

Problem 6: Observability of the variables of the sensorimotor representation. The application of causal discovery methods for measuring the sensorimotor causal representation would ideally require access to its variables. Since these are internal variables, possibly in the form of activities of dedicated neurons, we currently have no means of direct access. However, we can obtain at least approximate or indirect access. We solved this by finding a setup in which the variables of the internal causal representation are either sensory variables (which can be approximated by their corresponding input, which is observable) or have an observable proxy variable, as explained in more detail below.

This specific configuration allows us to apply some approximations: First, we can equate perceived weight and color with the true values. This implies assuming a sufficiently high signal-to-noise ratio, thereby enabling us to ignore sensory noise. Note that this is standard procedure in investigations of human causal representations where internal and external variables are routinely equated. Due to this, the two variables become observables, as we can obtain their respective values directly from the experimental software.

The second approximation is that we consider the force to be a close proxy to the estimated breakability. This is justified by the sensorimotor task, which requires the two to differ only by a small margin (see above). Since the force value is registered by the PHANToM, the effect variable is now also observable.

Taken together, we now have all three variables in an observable state such that we can use them as input to a causal discovery algorithm, which infers the underlying sensorimotor causal representation.

Problem 7: Naturalness. A further motivation for using these two features is that we can balance possible biases due to prior experience while ensuring ecological validity. We use two realistic object properties. One property, the causal relation between the weight of the glass and its breakability, is compatible with prior everyday experience. This is because the thickness of a glass is often an indicator of greater robustness, given other properties like form and geometry. At the same time, a greater thickness implies a higher weight of the glass. Thus, it is quite possible, though not necessarily true, that at least some of our subjects have prior experience with a causal relation between glass weight and glass breakability.

For the other property, the color of the glass, it is reasonable to assume that there is no a priori systematic relation to the breakability. That is because glasses of every form can be produced in every color. We thus assume that many, if not all, of our subjects will have no a priori inclination to associate the color of a glass with its breakability. By combining both types of relations, we expect to be able to examine how prior inclination influences the identification of causal structures.

Problem 8: Measuring the learning effect. This requires the solution of several sub-problems. Ideally, we would like to know how much prior experience subjects have regarding the causal relations in our experiment. On the other hand, we do not want our subjects to develop biased expectations regarding our experiment. For example, if we were to ask our subjects before the start of the experiment whether they expect the breakability of a glass to depend on its color, they could take this as a motive for paying far more attention to this specific relation than they normally would. Conversely, if they have not previously experienced such a causal relation, they could interpret our question as a manipulative means to drive them toward possibly unreasonable behavior. They would thus refrain from declaring such a relation, even if their experience suggested its presence in the experimental setup. Both types of bias would drive subjects away from the normal, everyday behavior that we want to maintain in our experiment.

On the other hand, if we refrain from asking questions beforehand, how do we obtain the baseline against which to measure the learning in our experiment? This baseline problem is further exacerbated by the requirements for measuring the sensorimotor representation. Since this measurement requires experimental data, its baseline can only be obtained by actually running the experiment, and not beforehand.

A further problem with learning is posed by the ultimate goal of the sensorimotor task: move as fast as possible without breaking glasses. This conflicts with learning, because the most information about the underlying causal relations is obtained by probing the breakability threshold, i.e., by balancing breakage against its avoidance rather than avoiding breakage entirely.

As a solution to these conflicting requirements, we designed a three-session experiment:

Session I: we deliberately abstained from any pre-experiment tests to avoid the aforementioned bias problems. The first session has two objectives. First, to familiarize subjects with the task and setup. Additionally, this establishes the baseline for later comparisons. In this session, subjects pursue the sensorimotor task but are unaware of the main goal of the study, which is to identify the underlying causal relations. This comes only afterwards, when subjects see the questionnaire for the first time.

We are aware that setting the baseline using the experimental data obtained during this first session and the subsequent application of the questionnaire is suboptimal, as it does not accurately represent the true initial states of the subjects. This is because a certain degree of learning and some conscious reasoning already takes place in this first session. However, given the above-discussed conflicting requirements, this design seems to offer the best compromise.

Session II: this session is intended to provide the maximum learning effect. The subjects are instructed to ignore the sensorimotor task and are encouraged to conduct a more risky strategy, leading to more broken glasses, in order to obtain the maximum information about the underlying causal relations. After this session, the questionnaire is applied for the second time.

Session III: in this session, we return to the original instructions for the sensorimotor task (as fast as possible without breaking the glass). Since subjects are now aware of one goal of the study, the recognition of causal relations, they are assumed to follow an integrated strategy: take into account the sensorimotor instructions and, at the same time, try to identify the causal relations. Note that these are in no sense counteracting demands. It is due to the very design of our experiment that the best results are obtained when the causal relations are identified and used for determining the motor parameters (in particular, the optimal placement force). It is in this session that we expect participants to reach the top level of causal sensorimotor learning. After this session, the questionnaire is applied for the third and last time.

Problem 9: Human-machine comparison. What would be a fair comparison of humans and machines with respect to the given experimental setup? Ideally, a robot equipped with grasping capabilities, sensors, sensorimotor learning algorithms, and causal discovery algorithms should be brought into our experimental setup to perform the same tasks as our subjects. This is, for various reasons, far beyond the scope of this paper.

What we can do here is to investigate how a causal representation can be learned by discovery algorithms, provided they have access to the same data as available to our human subjects. This is not the ideal comparison, because, for example, the force values in those data stem from our subjects, whereas our hypothetical robot would use its own strategy to adapt the motor control to the task. However, for a first approach, this approximation appears to be reasonable.

For comparison purposes, the machine algorithms can also be fed data not directly available to our subjects, such as the breakability of the glass (the force threshold), which our subjects must infer from the observable glass-ok state. Furthermore, we can use different state-of-the-art causal discovery algorithms, such as FGES, PC, and FCI, and compare each of them to human capabilities. Results are presented in Section 4.4.

3.2 Description levels, causal representations, and causal learning

In this paper and in the above-described experimental paradigm, causal structures, causal representations, and causal learning methods appear on various levels, in different roles, and with different shortcuts and approximations. We provide an overview here since a conceptually clear separation of these different aspects is essential for understanding.

Causal structures. For the description of the ground-truth causal relations, we make use of the framework of the causal graph. The causal structure of the most basic level of our experiment is given by the ground truth causal graph of the experimental setup, as shown in Figure 1. As pointed out, the main focus of this paper is on the v-structure (common effect structure) on the upper left of the figure (also shown on the left of Figure 2). This v-structure is relevant for the human representation of the causal relations between the glass properties (weight, color, and breakability). The complete overview of the causal structure of the experiment (including the subject) is shown in Figure 3. This extended causal structure results from the combination of the ground truth causal structure of the experimental setup with the (hypothetical) sensorimotor representation of the human. It corresponds to the view the experimenter has of the situation.

Figure 3
Two interconnected flowcharts, depicting the experimental setup and the sensorimotor representation indicated by dashed boxes grouping nodes. The diagram shows the causal flow in which weight and color influence both breakability and the perceived weight/color nodes. Perceived weight and -color point towards estimated breakability, which in turn influences the node force. Breakability and force together determine glass-ok. The experimental setup contains nodes weight, color and breakability. The sensorimotor representation includes nodes perceived weight, perceived color and estimated breakability.

Figure 3. Full causal graph with ground truth causal relations of experimental setup and of the (hypothetical) sensorimotor representation.

Causal representations. These are obtained by learning from data. Like the ground truth causal structures, we describe them in terms of causal graphs. We address two basic forms of causal representations in our investigation, human causal representations and “machine” representations. Human causal representations are internal to our subjects and are assumed to exist in two variants, on a conscious cognitive level and on the sensorimotor level. For the identification of the respective human representations, we use two methods, questionnaires and drawings for the conscious representation, and discovery algorithms for the sensorimotor representation. For the machine representation, no special identification is necessary; they are directly produced by the machine learning algorithms (causal discovery). In this paper, we investigate how far our subjects can learn the ground-truth causal structure. In the ideal limit, i.e., with perfect learning, the structure of the human causal graph will be identical to that of the ground truth. For some of our subjects and in the different experiment sessions, the learned graph will differ from the ground truth due to missing or incorrect causal relations and directions.

Causal learning. Corresponding to the representations are different types of learning. For humans, there are conscious levels and sensorimotor level learning, which cannot be directly observed. We hope to gain insight into the learning process by analyzing the development of causal representations across the three experimental sessions. For machine learning, we use discovery algorithms, FGES, PC, and FCI.

Please note that there are two cases of learning with a special status: (a) application of machine-learning algorithms to the identification of the human causal sensorimotor representation (for reasoning see problem 4). (b) Human-machine comparison with machine algorithms applied to human data (for reasoning see problem 9).

3.3 Experiment

The data for the analysis were obtained from an experiment conducted with 21 healthy human adults. Since there is no prior experience with either a sensorimotor task for causal structure learning or the use of sensorimotor data as input to causal discovery algorithms, the current experiment is mainly an exploratory study. Although we present statistical tests for some effects, the primary focus of our study is on identifying the basic properties of and relationships between causal learning at the cognitive and sensorimotor levels. In this experiment, participants were required to move a glass from one plate to another as quickly as possible without breaking it (this will be further explained in the following sections and in the Supplementary material).

The experiment is structured into three blocks, termed sessions, each consisting of a series of 80 trials, followed by a questionnaire and a subsequent break. The motivation for this design is described in detail in Section 3.1. In the first session, termed Raw, the participant is only told to place the glass. In this session, the participant is not aware that a questionnaire will be applied and therefore does not deliberately attend to any possible relationships. As a whole, the Raw session provides a baseline for comparison. In the second session, termed Train, the participant is allowed to experiment and explore possible strategies without any constraints or optimization targets, and is aware of the questionnaire. In the third session, termed Test, participants were instructed to place the glass as fast as possible without breaking it. Overall, the Raw session is intended to observe the participants' naive behavior, the Train session their exploratory behavior, and the Test session their consolidated behavior.

Each participant had an introductory, informal warm-up phase to get used to the setup. The design was verified in a pilot phase with three participants.

3.3.1 Questionnaire

After each session, the participant is asked to construct a causal graph consisting of the nodes glass color, glass weight, and glass breakability with qualitative (binary) relations. To introduce the concept of causality, the participant is primed with an interventional definition of causality, i.e., manipulations of X propagate to Y but not the other way around (Pearl, 2009a). To ease the construction, the participant may also answer binary questions regarding the existence of all possible relations. Based on the answers, the participant or (if necessary) the instructor then draws the causal graph as a graphical representation of the answers. If the participant cannot specify the directionality of influence (e.g., the binary responses are symmetric), the edge is considered undirected. Therefore, it is not possible to give contradictory answers. The questionnaire is included in the Supplementary material.
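To make the evaluation procedure concrete, the following sketch (our own illustration, not the authors' experiment code; all names are hypothetical) shows how symmetric binary answers collapse into an undirected edge while asymmetric answers yield a directed one:

```python
# Illustrative sketch of the questionnaire evaluation. Each ordered pair
# (X, Y) maps to the binary answer to "does manipulating X change Y?".

def answers_to_edges(answers):
    edges = []
    seen = set()
    for (x, y), forward in answers.items():
        if (y, x) in seen:
            continue  # the pair was already handled from the other direction
        seen.add((x, y))
        backward = answers.get((y, x), False)
        if forward and backward:
            edges.append((x, y, "undirected"))  # symmetric answers
        elif forward:
            edges.append((x, y, "directed"))    # x -> y
        elif backward:
            edges.append((y, x, "directed"))    # y -> x
        # neither direction answered "yes": no edge
    return edges

example = {
    ("glass weight", "breakability"): True,
    ("breakability", "glass weight"): True,   # symmetric -> undirected edge
    ("glass color", "breakability"): True,
    ("breakability", "glass color"): False,   # asymmetric -> directed edge
    ("glass weight", "glass color"): False,
    ("glass color", "glass weight"): False,   # no edge
}
print(answers_to_edges(example))
```

Because each edge is derived from the pair of directional answers taken together, a participant can only produce a directed, undirected, or absent edge, never a contradictory one.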

All other properties (e.g., the shape or 3D model of the glass in general) are held fixed. Varying the glass weight and color is the result of careful consideration of the participants' background knowledge. The relationship between an object's weight and its breakability can be observed in everyday life. For example, fragile materials benefit from increased thickness, thereby reducing overall breakability; this additional material consequently results in a higher weight. For the color, however, there is no such simple rule. Therefore, the link between the glass's breakability and its color can be seen as more artificial than the weight-breakability relationship. This contrast is intended to reduce interference between knowledge acquired during the experiment and prior knowledge. After the last session, participants are asked about their optimization strategy.

3.3.2 Hardware and software

To set up the VR simulation, we used the development edition of Worldviz Vizard 5.9, 64 Bit1 for Python 2.7.12. Furthermore, we used the PHANToM Premium 1.5 (High Force Model) with the PHANToM device driver Version 5.1.7 (Figure 4a). It is employed as a haptic device, functioning both as an input interface for the participants and as an output device to render the force computed by the VR. For further specifications, such as damping and stiffness, see Supplementary Table 1; they were chosen based on how natural they felt. Before each session, we recalibrated the haptic device with the PHANToM configuration utility for Windows, also version 5.1.7. In our code for the simulation, we utilized the developer version of the sensable3e-plugin from openHaptics.2

Figure 4
a. A phantom haptic device in a lab setting is being manually controlled by a person's hand. b. A 3D simulation showing a virtual hand holding a red wine glass on a grid pattern surface with two shaded squares.

Figure 4. (a) PHANToM haptic device used in the experiment. (b) VR-environment of the study with two lower plates and a bar in the upper half of the screen (not shown). The start plate appears blue, indicating that it needs to be touched next. In the upper left corner is a window that displays the top view, making it easier to locate the glass.

3.3.3 Setup

The participant is seated in a frame with a monitor pointing downwards and mounted right above the head. The haptic device is then placed in front and in reach of the test subject. To make the content of the monitor available, an angled mirror is placed directly in front of the subject. The mirror thus also hides the participant's hand. Note that the frame blocks the participant's peripheral view due to its construction.

3.3.4 Trial design and causal relations

The VR setting consists of two plates, an indicator bar, and a closed hand holding a glass of wine (see Figure 4b). As described earlier, the haptic device is used by the participant to control the glass. It renders the force applied to the glass by its weight and by contact with either of the two plates or the floor. A trial consists of placing the glass on the right plate and then moving it to the left plate, touching the bar at the top at some point in between.

The subject is instructed not to break the glass and to move it from the starting plate to the end plate as quickly as possible. The glass will break if the haptic device registers a force above a given threshold, i.e., the variable Glass-OK in the causal model becomes zero (see Figure 1). The force threshold (i.e., the breakability of the glass) is calculated by

force_threshold(color, weight) = (2.5 · weight + color_offset(color)) · 0.8241    (3.1)

with

color ~ U({red, green, blue}),    weight ~ U([0.2616, 1])    (3.2)

and color-specific offsets ordered blue > green > red. Therefore, on average, blue is the most stable color and red the least stable.
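For illustration, Equations 3.1 and 3.2 can be sketched as follows. The exact color offsets are not reported in the text; the values below are hypothetical placeholders that only respect the stated order blue > green > red:

```python
# Sketch of the trial parameters implied by Equations 3.1 and 3.2.
import random

COLOR_OFFSET = {"blue": 1.0, "green": 0.5, "red": 0.0}  # assumed values

def force_threshold(color, weight):
    # Equation 3.1: breakability threshold of the glass
    return (2.5 * weight + COLOR_OFFSET[color]) * 0.8241

def sample_trial(rng=random):
    # Equation 3.2: color uniform over three values, weight uniform on [0.2616, 1]
    color = rng.choice(["red", "green", "blue"])
    weight = rng.uniform(0.2616, 1.0)
    return color, weight, force_threshold(color, weight)
```

With these (assumed) offsets, a blue glass of a given weight tolerates the highest contact force, matching the statement that blue is on average the most stable color.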

The calculation of the breakability force threshold of the glass forms a collider in the causal graph. This form of interaction is deliberate, as collider/common effect structures can be identified based solely on correlational or observational data. Other forms of interaction (chain, confounder/common cause structure) leave the same trace in the data, so they are indistinguishable from one another. This is why they are in the same Markov Equivalence Class (MEC) while colliders have a different MEC (see e.g. Spirtes et al., 2001).
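The asymmetry between the collider and the other structures can be illustrated numerically: in a collider X → Z ← Y, the two causes are marginally independent but become dependent once the common effect is conditioned on. A minimal sketch of this signature (our own illustration, with standard-normal stand-ins for the variables):

```python
# In a collider X -> Z <- Y, X and Y are marginally independent but
# become dependent given Z. Chains and common-cause structures show the
# opposite pattern, which is why only the collider leaves a
# distinguishable trace in observational data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.1 * rng.normal(size=n)  # common effect (collider)

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing out `given`."""
    ra = a - np.polyval(np.polyfit(given, a, 1), given)
    rb = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x, y)[0, 1])  # near 0: x and y marginally independent
print(partial_corr(x, y, z))    # strongly negative: dependent given z
```

Constraint-based algorithms such as PC and FCI exploit exactly this pattern of (conditional) independencies to orient collider edges.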

A more detailed explanation of the trial design and the causal model can be found in the Supplementary material.

3.4 Data preparation and analysis

For each participant, there are three datasets, one for each session, with 80 samples each. Additionally, there is a dataset with 240 samples, composed of the concatenation of all three; it represents the cumulative experience of the participants over the whole experiment. Within each dataset, there are three continuous features and one categorical feature, making a total of four:

1. glass_weight: continuous.

2. glass_color: categorical.

3. force_value: continuous. The rendered contact force when the glass is placed on the plate.

4. force_threshold: continuous. The breakability of the glass.

On the datasets with 240 samples, the causal discovery algorithms PC, FCI, and FGES are applied. PC and FCI are both constraint-based algorithms that use conditional independence tests to identify the skeleton and all collider structures within the data. The FGES algorithm, on the other hand, is score-based. This class of algorithms repeatedly scores, compares, and evolves candidate graphs using a criterion such as the BIC score (see Spirtes et al., 2001; Kalainathan et al., 2022 for an overview).

For PC and FCI, the conditional Gaussian likelihood-ratio test (CG-LRT, Andrews et al., 2018) is used, and for FGES, the degenerate Gaussian BIC score (DG-BIC, Andrews et al., 2019). For the implementation of the algorithms, we used the CLI of the Tetrad toolbox for causal discovery3 (Ramsey et al., 2018). For further analyses of the results, we used the causal-learn Python package4 (Zheng et al., 2024). Each algorithm was run 60 times on bootstrapped data, and the resulting graphs were combined such that we keep the highest-frequency edge. For an overview of all hyperparameters used for Tetrad, see Supplementary Tables 2, 3.
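The bootstrap aggregation can be sketched generically as follows; `discover_edges` is a hypothetical placeholder for a call into Tetrad or causal-learn, and the edge representation is simplified to node pairs:

```python
# Sketch of edge-frequency aggregation over bootstrap resamples (our own
# illustration of the procedure described above, not the authors' code).
from collections import Counter
import random

def bootstrap_edges(data, discover_edges, runs=60, rng=random):
    """Run a discovery algorithm on resampled data and report, for each
    edge, the fraction of runs in which it appeared."""
    counts = Counter()
    n = len(data)
    for _ in range(runs):
        sample = [rng.choice(data) for _ in range(n)]  # resample with replacement
        for edge in discover_edges(sample):
            counts[edge] += 1
    return {edge: count / runs for edge, count in counts.items()}
```

Where runs disagree, e.g., on the orientation of an edge, the highest-frequency variant can then be kept, which makes the aggregated graph robust against run-to-run variability of the discovery algorithms.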

3.5 Quantification of causal strength on the sensorimotor level

Assuming a causal structure weight → force ← color, the strength of the effect of the glasses' weight and color on the contact force exerted by the participants can be measured using the conditional mutual information (Janzing et al., 2013). The conditional mutual information (Cover, 1991), defined as:

I(weight; force | color) = Σ_{weight, force, color} P(weight, force, color) log [P(weight, force | color) / (P(weight | color) P(force | color))],    (3.3)

quantifies the information that weight provides about force, given color. Similarly, the expression

I(color; force | weight) = Σ_{color, force, weight} P(color, force, weight) log [P(color, force | weight) / (P(color | weight) P(force | weight))],    (3.4)

quantifies the information that color provides about force, given weight. Conditional mutual information is recommended as a measure of causal strength for the structure weight → force ← color over typical measures such as the Average Causal Effect (ACE) and Analysis of Variance (ANOVA) (Janzing et al., 2013).5 This quantity applies to dependencies between variables of arbitrary domains (discrete and continuous), with linear and/or nonlinear interactions (Janzing et al., 2013). The conditional mutual information was computed in R (R Core Team, 2022; Version 4.5.1) using the infotheo package (Meyer, 2022; Version 1.2.0.1).
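As a sketch of how Equations 3.3 and 3.4 can be estimated from trial data, the following minimal estimator discretizes the continuous variables and uses the entropy identity I(X; Y | Z) = H(X,Z) + H(Y,Z) − H(Z) − H(X,Y,Z). This mirrors the discretization-based approach of the infotheo package only in spirit; it is our own illustration, and the bin count is an arbitrary choice:

```python
# Minimal plug-in estimator of conditional mutual information (in nats)
# via equal-width binning. Illustrative only, not the authors' pipeline.
import numpy as np

def cmi(x, y, z, bins=8):
    """Estimate I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    def disc(v):
        v = np.asarray(v, dtype=float)
        edges = np.linspace(v.min(), v.max(), bins + 1)
        return np.digitize(v, edges[1:-1])  # bin indices 0 .. bins-1

    def entropy(*cols):
        # joint entropy of the discretized columns
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    xd, yd, zd = disc(x), disc(y), disc(z)
    return entropy(xd, zd) + entropy(yd, zd) - entropy(zd) - entropy(xd, yd, zd)

# Demo on synthetic data: force driven by weight, not by color.
rng = np.random.default_rng(0)
w = rng.uniform(0.2616, 1.0, 5000)
c = rng.integers(0, 3, 5000)
f = 2.5 * w + 0.05 * rng.normal(size=5000)
print(cmi(w, f, c))  # clearly positive
print(cmi(c, f, w))  # close to zero
```

On such synthetic data, the estimator yields a large value for I(weight; force | color) and a value near zero for I(color; force | weight), which is the qualitative pattern this measure is meant to detect.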

4 Results

Whenever we apply causal discovery to the data (i.e., Sections 4.2, 4.4), the results obtained by the FCI algorithm are representative of the PC and FGES algorithms as well. Therefore, we only report the results based on FCI in the main document. The results of the other algorithms are presented in Section 4 of the Supplementary material.

4.1 Human causal discovery

Before the results of the questionnaire are displayed, it is important to note that none of the participants were able to report the causal direction (i.e., the orientation of the arrows) of the relations between variables. All participants reported that they refrained from distinguishing cause from effect because more data would be needed to do so. Therefore, only the undirected relation between variables can be analyzed further.

Figure 5 shows the aggregated results of the questionnaire. The thickness of the connection corresponds to the number of participants who identified a relation between the connected variables. The dashed line indicates an erroneously identified connection.

Figure 5
Diagram showing three different configurations of glass weight, glass color and force threshold, each fully connected by weighted, undirected edges. Panel a: glass weight is connected to glass color with weight 0.24 and to force threshold with 0.57. Glass color is connected to force threshold by 0.24. Panel b: Glass weight is connected to glass color (0.24) and force threshold (0.67) and glass color with force threshold (0.43). Panel c: Glass weight is connected to glass color (0.05) and force threshold (0.76) and glass color with force threshold (0.43).

Figure 5. Aggregated results of the questionnaire. The thickness of the lines corresponds to the fraction of how many participants identified the relation between the connected variables. This fraction is also shown on the given lines. The dashed connection indicates that this connection is erroneous. (a) Session raw. (b) Session train. (c) Session test.

Figure 6 shows the same information with respect to the learning effect from the initial state before the experiment (here assumed with “no causal information” = 0%) across the three experiment sessions.

Figure 6
Graphs comparing cognitive and sensorimotor learning. Learning level is on the y-axis and the sessions (Baseline, Raw, Train, Test) denote the x-axis. Cognitive learning shows three lines: weight-threshold, color-threshold, weight-color, whereas sensorimotor learning shows two lines, omitting weight-color. All lines start at learning level zero for session Baseline. weight-threshold is comparable in both learning scenarios. color-threshold is much higher in cognitive learning. All lines increase along the x-axis, except for color-threshold in sensorimotor learning.

Figure 6. Learning on the conscious cognitive level (left) and sensorimotor level (right) from the initial state of “no information” across the three experiment sessions (percentage correct responses for each edge). The dashed lines indicate the assumed change from baseline to the first observed response.

Considering the participants' assessments in their entirety, the structure identification becomes more accurate over the course of the experiment. The connection between the weight of the glass and its breakability was the most prominently identified one: 57% of participants suspected it in the first session, 67% in the second session, and 76% in the last session. The relationship between the glass color and its breakability seems to be harder to identify, as only 24% recognized it in the first session, increasing to 43% in the subsequent sessions.

After the first session, some participants noted that they expected the weight of the glass to be connected to its color. This suggests a strong bias toward establishing a causal relationship between the two properties. Accordingly, we observed that 24% of the participants drew a connection between the two variables in the questionnaires after the first two sessions. However, the percentage drops to less than 5% after the last session, when more empirical data for their decision had been gathered by the subjects.

4.2 Sensorimotor

In the post-session interviews, participants reported sensorimotor optimization strategies that were completely agnostic to the properties of the glass (e.g., optimizing muscle control and movement path, reducing the distance to move, etc.).

Indeed, the moved distance and speed generally increased over the duration of the experiment. The self-reported stimulus-agnostic optimization is surprising, because the average gap between the applied force and the breakability threshold decreases from the first to the last session. Additionally, the graph of the conscious cognitive causal representation (Figure 5) shows that the majority of subjects (76%) clearly recognized a causal dependency between the breakability of the glass and its weight, and 43% recognized this dependency for color as well. We thus examine the sensorimotor level in more detail to determine whether it is really agnostic to the causal dependencies.

We derived the causal graph of the sensorimotor representation by applying discovery algorithms to the recorded experimental data (the reasoning and the method are described in Section 3.1, Problem 6). The resulting causal graph (FCI algorithm) is shown in Figure 7, and the corresponding learning curves are shown in Figure 6. See Supplementary Figure 16 for the results of the PC and FGES algorithms.
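The discovery step can be illustrated with a minimal PC-style skeleton search over the three recorded variables, using partial correlations with a Fisher-z independence test. This is a simplified sketch under our own assumptions (synthetic data, names, and thresholds are illustrative), not the PC/FCI/FGES implementations actually used in the study.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def partial_corr(data, i, j, k):
    """First-order partial correlation of i and j given k."""
    rij, rik, rjk = (pearson(data[i], data[j]),
                     pearson(data[i], data[k]),
                     pearson(data[j], data[k]))
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))

def dependent(r, n, n_cond, z_crit=1.96):
    """Fisher-z test: reject independence when |z| exceeds the critical value."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - n_cond - 3)
    return abs(z) > z_crit

def skeleton(data):
    """PC-style skeleton over three variables: keep an edge only if the pair
    is dependent both marginally and conditionally on the third variable."""
    names = list(data)
    n = len(data[names[0]])
    edges = set()
    for i, j in combinations(names, 2):
        k = next(v for v in names if v not in (i, j))
        if dependent(pearson(data[i], data[j]), n, 0) and \
           dependent(partial_corr(data, i, j, k), n, 1):
            edges.add(frozenset((i, j)))
    return edges

# Synthetic stand-in for one participant's trials: force tracks weight, not color.
rnd = random.Random(0)
weight = [rnd.uniform(1.0, 3.0) for _ in range(240)]
color = [float(rnd.choice([0, 1, 2])) for _ in range(240)]
force = [0.8 * w + rnd.gauss(0.0, 0.2) for w in weight]
edges = skeleton({"weight": weight, "color": color, "force": force})
```

On such data the weight-force edge is reliably retained, while absent links such as color-force are usually removed, though at the 0.05 level occasional spurious detections remain possible, mirroring the caveat discussed below for the color-force link.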

Figure 7
Three panels each show network diagrams with nodes “glass weight”, “glass color” and “force value”. Undirected edges connect “glass weight” and “force value” with numeric values: 0.62, 0.71, and 0.81 respectively. “glass weight” and “glass color” connect with a value of 0.05 in the first and last panel. “glass color” and “force value” connect with 0.05 in the second panel.

Figure 7. Sensorimotor causal graphs identified by the FCI causal discovery algorithm (sessions from left to right: raw, train, test).

The results provide clear evidence that the sensorimotor level is not completely agnostic to the causal relations. There is a causal link (0.81) between the glass weight and the force value rendered during contact between the glass and the plate. This is even slightly higher than the corresponding value on the conscious cognitive level (0.76). Note that since the sensorimotor causal representation is derived from data that include the rendered force value that the subjects perceive during contact between the glass and the plate, this result also shows that subjects incorporate the causal dependency between weight and breakability_threshold into their sensorimotor strategy, and that they do so without conscious awareness.

The situation is different for the color-force link. Such a link could be found on the 0.05 level in a few discovery runs, but this could also be a spurious result. Since discovery algorithms operate on the safe side (preferring to miss a weak causal dependency rather than to report one erroneously as existing), we sought a more sensitive measure. For this, we directly examined the correlations between weight and force, as well as between color and force.

This closer analysis of the causal weight-force link is illustrated in Figures 8, 9 (top). If the sensorimotor system were to completely ignore the influence of the glasses' weight, the correlations should be zero. In contrast, almost all correlations are found to be greater than zero, as are the bounds of the confidence intervals calculated using Fisher's z-transformation. Furthermore, a two-sided one-sample t-test on the change in linear correlation from the first to the last session indicated a significant difference from zero at α = 0.05 [t(20) = 2.253, p = 0.035]. The mean change was positive (μ¯ = 0.073), indicating an increase in causal coupling. This implies that further learning of the weight-force relationship occurs on the sensorimotor level during the experiment. Note also that no negative correlation was observed (which would correspond to increased caution when placing a heavy glass). Additionally, the difference between the rendered force and the force threshold needed to break the glass is smaller in the last session than in the first.
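The Fisher z-transformed confidence intervals used above can be computed in a few lines; a minimal sketch, where the example correlation of 0.62 and the session length of 80 trials are illustrative placeholders rather than values from a specific participant.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation r from n samples:
    z = atanh(r) is roughly normal with standard error 1/sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = fisher_ci(0.62, 80)  # e.g., a weight-force correlation over one session
```

A whole-interval-above-zero check (`lo > 0`) then corresponds to the criterion applied to the intervals shown in Figure 9.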

Figure 8
Three scatter plots comparing exerted force by participants versus weight of glass. Each plot represents different correlation coefficients: 0.781, 0.369, and 0.073. Data points indicate broken and healthy glass, marked by color. A trend line is present in each plot.

Figure 8. Examples of different levels of linear correlation (shown in the upper left corner) between the glass weight and the force exerted by the participant on the glass (both in N). The data were recorded in the last session of the experiment and stem from three different participants. Each circle represents a unique trial, and the black line indicates a linear regression of the force on the weight.

Figure 9
Scatter plots display correlation coefficients for two relationships: weight-force and color-force across three sessions: Raw, Train, and Test. Each chart shows data points with error bars, comparing relationships. Weight-force is shown in a darker tone, while color-force appears lighter. The y-axis represents the correlation coefficient ranging from -1.0 to 1.0, while the x-axis lists the sessions.

Figure 9. Linear correlations for the causal weight-force (top) and color-force (bottom) relations. The confidence intervals are approximated using Fisher's z-transformation. The gray lines indicate zero correlation.

In a further step, we applied correlation analysis to the causal relation between color and force, which was unclear in the sensorimotor causal graph obtained by discovery analysis (Figure 7). Linear correlations are shown in Figure 9. They were dominated by noise, showing negative and positive correlations, which signals that the sensorimotor system had difficulties in identifying the correct linear ordering of the color-breakability mapping as defined in Section 3.3.4.

However, this does not exclude the possibility that some information is taken into account, e.g., about which color most or least enhances sturdiness, while the other two colors are unsystematically mapped to force. We therefore sought a measure that does not depend on linear relations or correct ordering. Such a measure is given by the mutual information. In this context, it is interesting to note that mutual information has been suggested as the preferable measure for quantifying the causal influence of one variable on another (Janzing et al., 2013).

The mutual information values for the color-force relation across subjects and sessions are shown in Figure 10. For the interpretation, recall that I(color; force|weight) = 0 indicates that color and force are conditionally independent given weight (i.e., color does not influence force when weight is known) (Cover, 1991; p. 35). Figure 10 shows three interesting facts: First, the observed I(color; force|weight) > 0 indicates that there is a causal influence of color on force, i.e., the sensorimotor system takes color into account when determining the motor parameters. Second, although the sensorimotor system does not find the right solution within the duration of the experiment, it learns and improves, as evidenced by the increase from the first (Raw) to the last session (Test). Finally, some low mutual information values appear in the second session (Train), indicating a low influence of color on force. Note that this is the session in which subjects were encouraged to test out different exploratory behaviors, irrespective of how many glasses would break. For example, if a subject decides to test what happens when applying the same force in the next 10 trials, irrespective of the glass properties and of how many glasses break, then the force becomes "decoupled" from the setup and the mutual information goes down, approaching the value I(color; force|weight) = 0 (i.e., approaching conditional independence).
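The quantity I(color; force | weight) can be estimated with a simple plug-in estimator on discretized trial data; a sketch in nats, where the discrete coding (binning of force and weight) is our own illustrative assumption, not necessarily the estimator used in the study.

```python
import math
from collections import Counter

def cmi(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete sequences
    (continuous variables such as force are assumed to be pre-binned).
    Uses I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    return sum((c / n) * math.log(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())
```

With this estimator, a force sequence determined by weight alone yields I(color; force | weight) = 0 (the "decoupled" case described above), whereas any color sensitivity in the force pushes the estimate above zero.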

Figure 10
Scatter plot showing data points for three sessions: Raw, Train, and Test arranged on the x-axis. The y-axis represents the conditional mutual information of color and force given weight in natural units. Each session has scattered data points with the respective average marked by a triangle (red for Raw, green for Train, blue for Test). A line connects the average triangles across sessions (negative slope from Raw to Train and positive from Train to Test).

Figure 10. Mutual information for the causal color-force relation.

4.3 Comparison of the causal relations on conscious cognitive level and on the sensorimotor level

To compare the cognitive and sensorimotor levels, we split the participants into two groups per session, depending on whether they identified the relation of interest, and compared the groups using their conditional mutual information values. In addition to the per-session split, we calculated each participant's change in conditional mutual information from the first to the last session.

The Alexander-Govern tests indicate no significant difference between participants who did and did not perceive the causal relation, in any scenario tested (Figure 11 shows the p-values).

Figure 11
Scatter plot showing p-values across four scenarios: Raw, Train, Test, and Learning. Two relation types are indicated: color-force and weight-force. P-values range from approximately 0.2 to 0.9.

Figure 11. P-values of Alexander-Govern tests performed with the null hypothesis of equal means. The conditional mutual information values of participants are analyzed, with participants separated into groups depending on whether they perceived the given causal relation (indicated by color). In "Learning", the change of the CMI value from session Raw to session Test is considered.

While the statistical tests did not yield significant results, the data contain subtle interactions worth exploring further. To complement the statistical tests, we employ alternative, more exploratory measures.

Figure 12 depicts the interplay of the cognitive- and sensorimotor-level. The plot shows the fraction of participants who judged a causal relationship between glass color and breakability (left) and glass weight and breakability (right), as a function of the number of top-ranked participants included. The ranking is based on the mutual information between stimulus features (color, weight) and the contact force of the glass on the surface, given the other stimulus feature. The x-axis shows the number of participants considered, starting with those exhibiting the strongest sensorimotor-stimulus association, and the y-axis indicates the proportion of these participants who identified the respective causal relationship based on the questionnaire. The color indicates the session.
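The curve construction behind Figure 12 amounts to sorting participants by their sensorimotor measure and accumulating questionnaire answers; a sketch with hypothetical per-participant CMI scores and perception flags (names and values are illustrative).

```python
def fraction_perceived_by_rank(cmi_scores, perceived):
    """Rank participants by descending conditional mutual information and,
    for each prefix of top-ranked participants, return the fraction who
    reported the causal relation in the questionnaire (perceived[i] in {0, 1})."""
    order = sorted(range(len(cmi_scores)),
                   key=lambda i: cmi_scores[i], reverse=True)
    fractions, hits = [], 0
    for rank, idx in enumerate(order, start=1):
        hits += perceived[idx]
        fractions.append(hits / rank)
    return fractions
```

For example, scores [0.9, 0.5, 0.1] with flags [1, 0, 1] yield the fractions 1.0, 0.5, and 2/3; a curve that stays high for small prefixes (as in the right panel of Figure 12, test session) indicates that strong sensorimotor coupling co-occurs with conscious recognition.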

Figure 12
Two line graphs compare the fraction of relation perceived across cognitive sessions (raw, train, test) for participants, sorted by I(c;f|w) and I(w;f|c) for left and right respectively. x-axes show the number of participants considered from 1 to 21; y-axes show the fraction of relation perceived from 0 to 1. In both graphs, the red line represents raw, green train, and blue represents test. The left graph shows raw decreasing sharply, while train and test are approximately constant. The right graph shows raw and train behaving similar, while test remains at one for the first eight participants and approximately at around 0.8 afterwards.

Figure 12. Fraction of participants that perceived each causal relation (color-breakability left, weight-breakability right) as a function of top-ranked participants considered for each session. The ranking is based on the mutual information of the applied force by the participant and the given stimulus, conditioned on the other stimulus. The x-axis shows the number of participants included.

The left plot indicates that the fraction of participants who perceive the causal relation between color and breakability and the mutual information between color and force are independent of each other, regardless of the session. However, in the right plot, the mutual information (specifically in the test session) is highest in participants who also identified the causal relation, indicating communication between the sensorimotor and the cognitive level.

Next, and focusing on the weight-force relation, we considered the participants with low conditional mutual information values and compared the gain from the first to the last session. Figure 13 shows that the gain is higher in participants who perceived the causal relation compared to non-perceiving ones. In the latter group, the mutual information is mostly stable.

Figure 13
Two line graphs compare participants that perceived the weight-breakability relation (green) in Raw to those that didn't (red). The x-axes show the sessions Raw and Test and the y-axis measures I(weight; force | color)[nat] from 0.7 to 1.3. The red lines show varied changes, while the green lines indicate mostly positive trends.

Figure 13. Change of sensorimotor mutual information from first to last session conditioned on the conscious cognitive recognition of the weight-breakability relation in the first session.

4.4 Comparison with machine algorithms

As described in Section 3.1, Problem 9, a real machine comparison using a robot in the experiment is not possible. Therefore, we apply the machine algorithms for causal discovery to the data from the experiment. We provide the same data as was presented to and generated by each of the subjects, organized into three sessions with 80 trials each.

We begin with an idealized test, in which we grant the algorithms access to the ground-truth variables (weight, color, and breakability threshold). Note that this is a somewhat unfair comparison for humans, as the latter variable is not directly observable to them and their perception of weight and color is not perfectly reliable. The main motivation for this test is (a) to provide a baseline on the convergence and robustness properties of the machine algorithms under ideal conditions and (b) to get an idea of what could potentially be learned under the conditions of our experiment.

The results of this test for FCI are shown in Figure 14. The causal graph is shown after learning from the data from the first session, from the combined data from the first and second sessions, and after learning from all three sessions. The results are obtained from data of single-subject runs and then averaged over subjects. See the Supplementary Figures 18, 19 for the results for PC and FGES algorithms.

Figure 14
Three network diagrams each depict relationships between nodes labeled “glass weight,” “glass color,” and “force threshold,” with varying strength values. Corresponding heatmaps are shown below each diagram, illustrating connection frequencies between nodes. Values include 0.33, 0.81, and 0.95 for “glass color” to “force threshold” connections in the respective diagrams.

Figure 14. Causal machine learning with discovery algorithms across the three sessions with an idealized variable set. Upper part: Causal connectivity. Lower part: Causal directions.

After the first session, the causal connection weight—breakability_threshold is learned perfectly, but the causal direction is unstable, as that can only be established in conjunction with a third variable to form a V-Structure. The connection color—breakability_threshold is recognized in 33% of the runs and forms this V-Structure to establish the causal direction. The rate of identification is higher than the 24% of the human subjects. The connection weight—color is correctly recognized as non-existing, in contrast to it being erroneously detected by 24% of the humans.
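The direction-finding step described here is the standard v-structure (collider) rule used by PC/FCI-type algorithms: for a path i - k - j with i and j non-adjacent, orient i → k ← j exactly when k is absent from the separating set of (i, j). A minimal sketch of that rule (not the authors' implementation):

```python
def orient_colliders(edges, sepsets):
    """V-structure rule: for each i - k - j with i, j non-adjacent,
    orient i -> k <- j iff k is not in sepset(i, j).
    edges: set of frozenset pairs; sepsets: frozenset pair -> set of nodes."""
    directed = set()
    nodes = {v for e in edges for v in e}
    for k in nodes:
        nbrs = sorted(v for v in nodes if frozenset((v, k)) in edges)
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                i, j = nbrs[a], nbrs[b]
                if frozenset((i, j)) not in edges \
                   and k not in sepsets.get(frozenset((i, j)), set()):
                    directed.update({(i, k), (j, k)})
    return directed
```

For the ground-truth skeleton weight - threshold - color, with weight and color separated by the empty set, the rule orients both edges toward the threshold, which is exactly why the weaker color link is needed before the direction of the weight link can be fixed.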

During the next two sessions, the learned causal graph stabilizes. After incorporating all three sessions, the learned causal graph represents the ground truth almost perfectly. Comparing the development and quality of the machine representation in quantitative detail with those learned by the participants (Figures 5, 7), the machine seems superior. However, this comparison is idealized for machine learning and unfair insofar as humans cannot directly observe the breakability threshold.

Therefore, we investigate in the next step what machine algorithms can learn if they can access only the variables observable to the human subjects. Instead of the non-observable breakability_threshold, we now make use of the two observable variables force and glass-OK.
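This observable-variable setting can be illustrated with a toy generator following the ground-truth structure weight → breakability ← color, where the threshold stays hidden and only force and the glass-OK outcome are recorded. All numeric parameters (threshold coefficients, ranges, noise model) are made-up assumptions, not the experiment's actual values.

```python
import random

def simulate_trial(rng, color_gain=0.3):
    """One trial under the assumed structure weight -> threshold <- color.
    Only weight, color, force, and the glass-OK outcome are observable;
    the breakability threshold itself stays hidden, as for human subjects."""
    weight = rng.uniform(1.0, 3.0)                 # observable stimulus
    color = rng.choice([0, 1, 2])                  # observable stimulus (coded)
    threshold = 2.0 * weight + color_gain * color  # hidden breakability threshold
    force = rng.uniform(1.0, 7.0)                  # stand-in for the applied force
    return {"weight": weight, "color": color,
            "force": force, "glass_ok": force < threshold}

rng = random.Random(1)
trials = [simulate_trial(rng) for _ in range(240)]  # 3 sessions x 80 trials
```

Feeding the four observable columns of such a table to a discovery algorithm reproduces the setting of Figure 15, in which the influence of weight and color on breakability has to be inferred indirectly through force and glass-OK.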

It is essential to understand the subtle but important difference from the application of discovery algorithms in Section 4.2. For the analysis of the human causal representation, we wanted to know what the subjects have actually learned; for the human-machine comparison, we want to know what could potentially be learned from the data. The analysis of the human sensorimotor causal representation was thus limited to the three variables whose causal relations are relevant for the human causal representation (see Figure 2). In particular, the target variable of the causal representation on the sensorimotor level is the force (as a proxy for the estimated breakability). The outcome, i.e., the Glass-OK status of the glass after application of the result of the sensorimotor computation, is not part of the causal sensorimotor representation. It could, of course, be used by the sensorimotor system in its learning, but we are not interested in causally describing the learning mechanism of the sensorimotor system; we want to describe the causal sensorimotor representation that results from the learning.

This differs for the question being considered now: what can potentially be learned from the data (in particular, by a machine algorithm)? For this, it is essential to include the Glass-OK status in the analysis.

The results of applying causal discovery (FCI) to the four variables are shown in Figure 15. Similar to the idealized set of variables, the algorithms are applied to the cumulative combined data from the three sessions of one participant, and the results are averaged over all subjects. See the Supplementary Figure 20 for results of the PC and FGES algorithms.

Figure 15
Network diagrams show relationships between variables “glass weight”, “force value”, “glass color” and “glass OK”. The nodes are connected by undirected, weighted edges. Left diagram shows edges weight-force (0.43), weight-OK (0.38), force-OK (0.52). Center diagram shows weight-force (0.67), weight-OK (0.52), force-OK (0.95), color-OK (0.14). Right shows weight-force (0.81), weight-OK (0.81), force-OK (0.95), color-OK (0.19).

Figure 15. Causal relations discovered by the FCI algorithm from the observable data available to humans in the experiment [datasets contain sessions raw (left), raw, train (center), raw, train and test (right)].

The glass_weight - glass_ok relation was recovered in 81% of runs based on the combined data of all sessions (right), whereas the glass_color - glass_ok relation was identified in 19% of runs. There was no error regarding the link between glass_color and glass_weight. Therefore, the performance of the algorithms is comparable to that of our participants for the influence of weight on breakability (76%) as well as for the independence of glass_weight and glass_color (fewer than 5% erroneous detections by the humans), and worse for the influence of color on breakability (43%, see Figure 5). Additionally, the direct link between force_value and the outcome of the experiment is recovered in almost all runs (95%). See Section 5.6 for details about the interpretation of (absent) relations, including force_value.

5 Discussion

5.1 Experimental paradigm

In our view, there are several novel aspects of this study: First, unlike many studies on causal structure learning that use abstract settings, we utilize a naturalistic setup based on sensorimotor object place tasks. Second, to the best of our knowledge, this is the first paradigm developed to measure causal structure learning at the sensorimotor level. Third, our setup enables a comparison between causal learning on conscious cognitive and sensorimotor levels, facilitated by our experimental design that ensures both levels identify the same ground-truth causal structure.

5.2 Conscious cognitive level

The results indicate that humans can identify the causal V-structure in our experimental setup through observational learning, achieving 76% weight-breakability, 43% color-breakability, and 95% weight-color independence. Previous studies in more abstract contexts suggested that observing covariation often fails to lead to accurate causal structure identification (Lagnado and Sloman, 2002; Steyvers et al., 2003; Lagnado and Sloman, 2004; Lagnado et al., 2007). Our findings present a more hopeful view of human competence in structure identification. It is plausible that the handling of causality and probability varies based on how problems are presented (Meder and Gigerenzer, 2014). While artificial setups are preferred to avoid prior experience biases, future research should explore more natural settings.

What is unexpected is that our subjects had substantial difficulties in recognizing the direction of causality in our experiment. We expected that by conventional reasoning one direction would be preferred (with inter-individual differences).6 The inability or reluctance to distinguish cause from effect has been reported on other occasions as well. A plausible explanation of such problems is associative reasoning, which is directionless by definition (Rottman and Hastie, 2014; Rehder and Hastie, 2001). Others have reported that humans may perform local reasoning schemes by only considering two variables at a time (Rottman and Hastie, 2014; Kruschke, 2006; Fernbach et al., 2011; Waldmann et al., 2008). In fact, the participants' verbalizations during the questionnaire suggest the utilization of this strategy. Furthermore, it is proposed that people sometimes may add imaginary nodes to such graphs (Rehder and Burnett, 2005; Rottman and Hastie, 2014). Therefore, an imagined hidden common cause could make potential causal relations seem like spurious correlations without direction. Removing a link for such a correlation would rely on the participant's understanding of the question.

Another interesting observation is that some participants noted a strong prior belief that weight and color will possibly be connected in our experimental setup (see also role of prior). With this in mind, it is not surprising that this connection also appears in the graphs identified by the subjects after the first session, as these biases have a large impact on the (perhaps erroneous) detection of relationships. A recognition of correlation in data is simpler than rejecting a hypothesized relationship (Waldmann, 1996; Waldmann and Hagmayer, 2001; Mercier, 2022; Klayman, 1995, see Lagnado et al., 2007).

We do not believe this reflects actual prior experience, as no transparent glasses were used, and the red, green, and blue glasses do not have an expected systematic relation to weight. Many types of colored glasses have varying weights. Subjects recognized color and weight, and maybe they initially expected a relation due to the experimental context. However, most were able to learn that no such relation exists. This unlearning is notable, as representations of causal links can be resistant to contradictory information (Barberia et al., 2019; Lagnado and Sloman, 2006; Yarritu and Matute, 2015).

5.3 Sensorimotor level

We conducted a novel investigation of causal structure learning at the sensorimotor level (see comments on problems 4 and 6 in sect. 3.1) through a specific task and meta-level analysis using machine causal discovery methods. The causal discovery analysis indicated that the sensorimotor system “recognized” weight as influencing breakability, while no causal link was found between color and breakability. Correlation analysis supported the weight-breakability relationship with positive correlations, though not as systematic as those at the cognitive level, and showed no evidence of a color-breakability relationship, with correlation values around zero.

However, does the sensorimotor system really completely ignore the color-breakability relation? Applying a mutual information measure showed that this is not the case. In case of complete ignorance of the relation between color and breakability, the force variable should be simply independent of color. However, non-zero mutual information was found in the first session. Additionally, since the information measures increased, the sensorimotor system was likely able to learn and optimize using the color-breakability relation during the experiment.

The system struggled to learn the correct quantitative relationship between color and breakability, as shown by its random correlation results. This difficulty arises from the complex mapping and the challenge of ordering color levels. Moreover, color has less influence on breakability than weight, based on our cautious interpretation of pre-test results. Future experiments emphasizing color could improve the understanding of this relationship. Simplifying color information from three to two levels might also help clarify the linear relationship with breakability.

Finally, it is worth noting that in sensorimotor research, a decrease in errors indicates learning. Using mutual information could offer insights into early sensorimotor learning phases and enhance causal discovery results.

5.4 Relations between levels

The current state of scientific opinion on this issue remains uncertain (see problem 2 in Section 3.1). Our data indicate that information transfer between cognitive and sensorimotor levels is largely decoupled. While exploratory measures suggest a weak relationship, we have found no evidence for the direction of information flow.

If both levels share the same causal representation, subjects recognizing the causal relation should show stronger coupling in the sensorimotor domain. This could occur on the same timescale (indicating that both levels operate in parallel) or with a slight delay (indicating that one level informs the other). However, our findings suggest unrelated causal representations in early learning stages, as there is minimal overlap between those recognizing the causal relation and those showing increased stimulus-motor output association.

In comparison, the cognitive level appears to identify relations more quickly and precisely than the sensorimotor level, which faces greater demands. While the cognitive level may only need to understand qualitative relations, the sensorimotor level must determine quantitative relationships for successful task completion, making it more challenging.

For further analysis, we need to differentiate the levels, their interrelations, and the information they represent. We focus on two basic levels: cognitive, measured by the questionnaire, and sensorimotor, assessed through sensorimotor data. While we have examined the causal representational properties of these levels, there is also an "intermediate" level related to self-reported behaviors and optimization strategies. Participants reported how they optimize their movements, which reflects their conscious perception of sensorimotor properties. This intermediate level can be viewed as a sub-level of the cognitive level.

This distinction of three levels now allows for the observation of an interesting pattern of dissociation. On the basic cognitive level, the relationship between the weight and the breakability of the glass can be clearly perceived and verbalized. This relationship also appears to be accessible at the basic sensorimotor level, where participants seem to improve over the course of the experiment in placing heavier glasses with less precision without breaking them (an important possibility is that this allows for exploiting the speed-accuracy tradeoff; Burdet and Milner, 1998). However, when probed at the intermediate level, no participant seemed to notice this pattern in their own behavior or to incorporate it into a conscious strategy.

Considering the color of the glass, the pattern changes. On the cognitive level, the connection between the color of the glass and its breakability is harder to identify than the relationship between weight and breakability. There's no evidence of motor behavior effects related to color, nor is there a conscious strategy that links perceived color to motor control. To put it bluntly, the basic levels “agree” on weight but “disagree” on color, while the intermediate level lacks understanding of how to translate conscious knowledge into effective motor actions.

This dissociation pattern may be linked to prior knowledge and beliefs. Research shows that humans are skilled at aligning structural knowledge with data (Waldmann, 1996; Waldmann and Hagmayer, 2001; Lagnado et al., 2007). The weight-breakability link appears more plausible than the color-breakability connection, as greater weight may suggest thicker materials that break less easily. This difference in plausibility was a key factor in the experiment's design, which aimed to minimize interference from prior knowledge. Additionally, the uncertainty surrounding the color-breakability link prevents its effective integration into the sensorimotor optimization loop without further empirical data.

5.5 Role of prior information

As expected, our results suggest a possible prior expectation about a link between glass weight and breakability; people may have experienced heavier glasses as being sturdier. Examining the color-breakability relation, the discovery-based analysis of the sensorimotor causal representation did not identify any relation between the two variables. This can be seen as evidence against a relevant prior for this relationship. Since the learning of the "novel" color-breakability relation was of special importance for us, we performed pre-tests to determine the degree of influence of color on breakability such that the causal learning process should be clearly observable. In particular, we wanted to avoid many of our subjects recognizing the role of color already in the first session (raw). With hindsight, we may have been too cautious, as half of our subjects were unable to recognize the influence of color throughout the entire experiment. Furthermore, at the sensorimotor level, color is an even greater problem, since the system not only has to "recognize" that color has an influence but also has to find an at least approximately correct quantitative mapping. However, since the sensorimotor level seems to recognize, in principle, that there is some influence of color (see Section 5.3), we would expect that setting the gain of color higher in our experimental setup would enable both the cognitive level and the sensorimotor level to learn this causal relation.

Lastly, there seems to be an erroneous prior for the weight-color relation. This might stem from the fact that color is sometimes used as an indicator of continuous variables, such as weight. However, as there are also examples against this relation, it seems more plausible that subjects simply assumed that the most apparent features are linked (see end of Section 5.2).

Due to the experimental design, it was not possible to gather information about the expectations of each participant. This also implies that we cannot infer how much learning happened during the first session. Interrogating participants at the start of the experiment would introduce a bias and alter the level of naivety that participants exhibit. Of course, this raises the question of whether learning success can be measured as a difference from zero or if a different level is more appropriate. Ultimately, we cannot answer this question without gathering more data about prior expectations.

5.6 Comparison with machine algorithms

The results from the idealized set of variables highlight two key points: First, the experimental data are sufficient for complete learning, despite the limitation of a small sample size of 3 × 80 trials. Second, the learning rate of the algorithms is comparable to that of human subjects, although the algorithms typically outperform humans in identification. However, when both are provided the same variable set, the algorithms' performance declines at low sample volumes, making them less stable in recovering causal relations, especially weak ones; they ultimately end at an identification rate similar to that of humans (weight-breakability) or even worse (color-breakability). The recovered direct link between force_value and the outcome of the experiment is important, as it indicates that the influence of the produced force on the state of the glass (broken/healthy) is recoverable. This indicates that our experimental paradigm works (see problems 4-6).

Next, we consider the causal parents of force_value in the graph. Even when the human sensorimotor representation fails to grasp the color-force relation (which many participants never learned), the machine algorithm may still detect a causal link between color and glass_ok; an ideal learner would thus regard the sensorimotor representation as imperfect. However, since the algorithms are applied across all sessions of each subject, potential learning effects must be taken into account: the factors affecting force_value likely vary qualitatively and quantitatively within the dataset, and we therefore refrain from interpreting these relations.

6 Conclusion

In this paper, we introduce a novel experimental paradigm to investigate human causal structure learning at both the cognitive and the sensorimotor level. To our knowledge, this is the first study to explore causal learning at the sensorimotor level using a naturalistic task whose successful completion requires learning the underlying causal structure. Our exploratory study reveals that: (a) causal structure learning is achievable in a sensorimotor context, (b) it occurs at both the cognitive and the sensorimotor level, and (c) while the cognitive level may influence the sensorimotor level, both can operate largely independently in processing causal structures.

Several aspects deserve closer investigation in future research. First, why is the unique identification of causal directions problematic? Although our subjects could have determined the correct causal direction through commonsense reasoning about the task and experimental setup, as well as by exploiting the specific structure of the common-effect causal model, they insisted until the very end of the experiment that they considered both causal directions equally possible. Second, it would be interesting to explore how far sensorimotor causal structure learning extends to other object properties, such as causal relations between shape and elasticity. Third, it would be desirable to investigate the influence of a substantially extended training period on causal learning, not only because sensorimotor learning in general is known to profit from longer training than was used in the current experiment, but also because longer training would provide more data for applying causal discovery to the sensorimotor data. Finally, an obvious extension of future research on sensorimotor causal structure learning is the design of tasks and experimental setups that allow for and require interventions in the Pearlian sense for the unique identification of the causal structure. This also includes conducting an experiment with more participants to validate the findings, particularly given the small effect sizes suggested by our analysis (e.g., conditional mutual information across sessions).

In summary, the present results reinforce our conviction that the sensorimotor domain offers a valuable testbed for examining how causal information affects our perception of the environment and our motor interactions with it.

Data availability statement

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

Ethics statement

The studies involving humans were approved by the University of Bremen Ethics Committee. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

NB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. CZ: Conceptualization, Funding acquisition, Methodology, Writing – original draft, Writing – review & editing. JM: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. KS: Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work has been supported by the German Research Foundation DFG, as part of Collaborative Research Center (Sonderforschungsbereich) 1320 Project-ID 329551904 “EASE - Everyday Activity Science and Engineering,” University of Bremen (http://www.ease-crc.org/). The research was conducted in subproject H01 “Sensorimotor and Causal Human Activity Models for Cognitive Architectures”.

Acknowledgments

Usage and optimization of the causal discovery algorithms were possible thanks to discussions with and support from Konrad Gadzicki (EASE subproject H03 "Discriminative and Generative Human Activity Models for Cognitive Architectures").

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcogn.2025.1565294/full#supplementary-material

Footnotes

1. ^Available for a fee at: https://www.worldviz.com/vizard-virtual-reality-software.

2. ^Available for a fee at: https://de.3dsystems.com/haptics-devices/openhaptics.

3. ^https://github.com/bd2kccd/causal-cmd

4. ^https://github.com/py-why/causal-learn

5. ^ACE, defined as P(Y = y|do(X = x1))−P(Y = y|do(X = x2)) for the interventions do(X = x1) and do(X = x2) (Pearl et al., 2016), only accounts for the linear aspect of an interaction between variables (i.e., shifts in the mean), and ANOVA is not appropriate for variables with nonlinear causal influences and when there are statistical dependencies between the variables (see Janzing et al., 2013 and references therein).

6. ^E.g., more weight indicates more material and therefore less breakability.

References

Allan, L. G., and Jenkins, H. M. (1980). The judgment of contingency and the nature of the response alternatives. Can. J. Psychol. 34, 1–11. doi: 10.1037/h0081013

Andrews, B., Ramsey, J., and Cooper, G. F. (2018). Scoring Bayesian networks of mixed variables. Int. J. Data Sci. Anal. 6, 3–18. doi: 10.1007/s41060-017-0085-7

Andrews, B., Ramsey, J., and Cooper, G. F. (2019). “Learning high-dimensional directed acyclic graphs with mixed data-types,” in The 2019 ACM SIGKDD Workshop on Causal Discovery (Anchorage: PMLR), 4–21.

Bailey, D. H., Jung, A. J., Beltz, A. M., Eronen, M. I., Gische, C., Hamaker, E. L., et al. (2024). Causal inference on human behaviour. Nat. Hum. Behav. 8, 1448–1459. doi: 10.1038/s41562-024-01939-z

Barberia, I., Vadillo, M. A., and Rodríguez-Ferreiro, J. (2019). Persistence of causal illusions after extensive training. Front. Psychol. 10:24. doi: 10.3389/fpsyg.2019.00024

Burdet, E., and Milner, T. E. (1998). Quantization of human motions and learning of accurate movements. Biol. Cybern., 78, 307–318. doi: 10.1007/s004220050435

Cheng, P. W. (1997). From covariation to causation: a causal power theory. Psychol. Rev. 104, 367–405. doi: 10.1037/0033-295X.104.2.367

Cheng, P. W., and Novick, L. R. (1990). A probabilistic contrast model of causal induction. J. Pers. Soc. Psychol. 58, 545–567. doi: 10.1037/0022-3514.58.4.545

Chickering, D. M. (1996). “Learning Bayesian networks is np-complete,” in Learning From Data: Artificial Intelligence and Statistics V (Cham: Springer), 121–130. doi: 10.1007/978-1-4612-2404-4_12

Cover, T. M. (1991). Elements of Information Theory. Wiley series in telecommunications. New York, NY: Wiley.

D'Amario, S., Ruttle, J. E., 't Hart, B. M., and Henriques, D. Y. P. (2024). Implicit adaptation is fast, robust and independent from explicit adaptation. bioRxiv [preprint]. doi: 10.1101/2024.04.10.588930

Evans, J. S. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annu. Rev. Psychol. 59, 255–278. doi: 10.1146/annurev.psych.59.103006.093629

Fernbach, P. M., Darlow, A., and Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. J. Exp. Psychol. Gen. 140:168. doi: 10.1037/a0022100

Fitts, P. M., and Posner, M. I. (1967). Human Performance. Oxford: Brooks/Cole.

Glymour, C. (2003). Learning, prediction and causal bayes nets. Trends Cogn. Sci. 7, 43–48. doi: 10.1016/S1364-6613(02)00009-8

Glymour, C., Zhang, K., and Spirtes, P. (2019). Review of causal discovery methods based on graphical models. Front. Genet. 10:524. doi: 10.3389/fgene.2019.00524

Glymour, C. N. (2001). The Mind's Arrows. Bradford Bks. Cambridge, MA: MIT Press. “A Bradford book.” OCLC-Licensed Vendor Bibliographic Record. doi: 10.7551/mitpress/4638.001.0001

Goddu, M. K., and Gopnik, A. (2024). The development of human causal learning and reasoning. Nat. Rev. Psychol. 3, 319–339. doi: 10.1038/s44159-024-00300-5

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., Danks, D., et al. (2004). A theory of causal learning in children: causal maps and bayes nets. Psychol. Rev. 111, 3–32. doi: 10.1037/0033-295X.111.1.3

Gopnik, A., and Tenenbaum, J. B. (2007). Bayesian networks, Bayesian learning and cognitive development. Dev. Sci. 10, 281–287. doi: 10.1111/j.1467-7687.2007.00584.x

Griffiths, T. L., and Tenenbaum, J. B. (2009). Theory-based causal induction. Psychol. Rev. 116, 661–716. doi: 10.1037/a0017201

Hattori, M., and Oaksford, M. (2007). Adaptive non-interventional heuristics for covariation detection in causal induction: model comparison and rational analysis. Cogn. Sci. 31, 765–814. doi: 10.1080/03640210701530755

Heckerman, D., Meek, C., and Cooper, G. (2006). “A Bayesian approach to causal discovery,” in Innovations in Machine Learning: Theory and Applications, eds. D. E. Holmes, and L. C. Jain (Cham: Springer), 1–28. doi: 10.1007/3-540-33486-6_1

Hellström, T. (2021). The relevance of causation in robotics: a review, categorization, and analysis. Paladyn J. Behav. Robot. 12, 238–255. doi: 10.1515/pjbr-2021-0017

Huang, B., Zhang, K., Lin, Y., Schölkopf, B., and Glymour, C. (2018). “Generalized score functions for causal discovery,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (New York, NY: ACM), 1551–1560. doi: 10.1145/3219819.3220104

Janzing, D., Balduzzi, D., Grosse-Wentrup, M., and Schölkopf, B. (2013). Quantifying causal influences. Ann. Statist. 41, 2324–2358. doi: 10.1214/13-AOS1145

Jenkins, H. M., and Ward, W. C. (1965). Judgment of contingency between responses and outcomes. Psychol. Monogr. Gen. Appl. 79, 1–17. doi: 10.1037/h0093874

Kahneman, D. (2012). Thinking, Fast and Slow. Penguin Psychology. London: Penguin Books.

Kalainathan, D., Goudet, O., Guyon, I., Lopez-Paz, D., and Sebag, M. (2022). Structural agnostic modeling: adversarial learning of causal graphs. J. Mach. Learn. Res. 23, 1–62. doi: 10.48550/arXiv.1803.04929

Karimi Mamaghan, A. M., Tigas, P., Johansson, K. H., Gal, Y., Annadani, Y., Bauer, S., et al. (2024). Challenges and considerations in the evaluation of Bayesian causal discovery. arXiv [preprint]. arXiv:2406.03209. doi: 10.48550/arXiv.2406.03209

Klayman, J. (1995). Varieties of confirmation bias. Psychol. Learn. Motiv. 32, 385–418. doi: 10.1016/S0079-7421(08)60315-1

Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., Shams, L., et al. (2007). Causal inference in multisensory perception. PLoS ONE 2:e943. doi: 10.1371/journal.pone.0000943

Kruschke, J. K. (2006). Locally Bayesian learning with applications to retrospective revaluation and highlighting. Psychol. Rev. 113:677. doi: 10.1037/0033-295X.113.4.677

Lagnado, D. A., and Sloman, S. (2002). “Learning causal structure,” in Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 24 (New York, NY).

Lagnado, D. A., and Sloman, S. (2004). The advantage of timely intervention. J. Exp. Psychol. Learn. Mem. Cogn. 30:856. doi: 10.1037/0278-7393.30.4.856

Lagnado, D. A., and Sloman, S. A. (2006). Time as a guide to cause. J. Exp. Psychol. Learn. Mem. Cogn. 32, 451–460. doi: 10.1037/0278-7393.32.3.451

Lagnado, D. A., Waldmann, M. R., Hagmayer, Y., and Sloman, S. A. (2007). “Beyond covariation,” in Causal Learning: Psychology, Philosophy, and Computation, eds. A. Gopnik, and L. Schulz (Oxford: Oxford Academic), 154–172. doi: 10.1093/acprof:oso/9780195176803.003.0011

Lucas, C. G., and Griffiths, T. L. (2010). Learning the form of causal relationships using hierarchical Bayesian models. Cogn. Sci. 34, 113–147. doi: 10.1111/j.1551-6709.2009.01058.x

Malinsky, D., and Danks, D. (2017). Causal discovery algorithms: a practical guide. Philos. Compass 13:12470. doi: 10.1111/phc3.12470

McDougle, S. D., Bond, K. M., and Taylor, J. A. (2015). Explicit and implicit processes constitute the fast and slow processes of sensorimotor learning. J. Neurosci. 35, 9568–9579. doi: 10.1523/JNEUROSCI.5061-14.2015

Meder, B., Gerstenberg, T., Hagmayer, Y., and Waldmann, M. R. (2010). Observing and intervening: rational and heuristic models of causal decision making. Open Psychol. J. 3, 119–135. doi: 10.2174/1874350101003010119

Meder, B., and Gigerenzer, G. (2014). “Statistical thinking: no one left behind,” in Probabilistic Thinking: Presenting plural Perspectives, eds. E. J. Chernoff, and B. Sriraman (Cham: Springer Netherlands), 127–148. doi: 10.1007/978-94-007-7155-0_8

Mercier, H. (2022). “Confirmation bias-myside bias,” in Cognitive Illusions, ed. R. F. Pohl (London: Routledge), 78–91. doi: 10.4324/9781003154730-7

Meyer, P. E. (2022). infotheo: Information-Theoretic Measures. R package version 1.2.0.1. doi: 10.32614/CRAN.package.infotheo

Nogueira, A. R., Pugnana, A., Ruggieri, S., Pedreschi, D., and Gama, J. (2022). Methods and tools for causal discovery and causal inference. WIREs Data Min. Knowl. Disc. 12:1449. doi: 10.1002/widm.1449

Pearl, J. (2009a). Causality, 2 Edn. Cambridge, MA: Cambridge University Press.

Pearl, J. (2009b). “Introduction to probabilities, graphs, and causal models,” in Causality (Cambridge: Cambridge University Press), 1–40. doi: 10.1017/CBO9780511803161.003

Pearl, J. (2009c). “A theory of inferred causation,” in Causality (Cambridge: Cambridge University Press), 41–64. doi: 10.1017/CBO9780511803161.004

Pearl, J., Glymour, M., and Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Chichester: Wiley.

Pylyshyn, Z. (1999). Is vision continuous with cognition?: the case for cognitive impenetrability of visual perception. Behav. Brain Sci. 22, 341–365. doi: 10.1017/S0140525X99002022

R Core Team (2022). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Ramsey, J. D., Zhang, K., Glymour, M., Romero, R. S., Huang, B., Ebert-Uphoff, I., et al. (2018). “Tetrad—a toolbox for causal discovery,” in 8th International Workshop on Climate Informatics (Boulder), 1–4.

Rehder, B. (2017a). “Concepts as causal models: categorization,” in The Oxford Handbook of Causal Reasoning, ed. M. R. Waldmann (Oxford: Oxford University Press), 347–376. doi: 10.1093/oxfordhb/9780199399550.013.39

Rehder, B. (2017b). “Concepts as causal models: induction,” in The Oxford Handbook of Causal Reasoning, ed. M. R. Waldmann (Oxford: Oxford University Press), 377–414. doi: 10.1093/oxfordhb/9780199399550.013.21

Rehder, B., and Burnett, R. C. (2005). Feature inference and the causal structure of categories. Cogn. Psychol. 50, 264–314. doi: 10.1016/j.cogpsych.2004.09.002

Rehder, B., and Hastie, R. (2001). Causal knowledge and categories: the effects of causal beliefs on categorization, induction, and similarity. J. Exp. Psychol. Gen. 130:323. doi: 10.1037/0096-3445.130.3.323

Rottman, B. M., and Hastie, R. (2014). Reasoning about causal relationships: inferences on causal networks. Psychol. Bull. 140:109. doi: 10.1037/a0031903

Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., et al. (2021). Toward causal representation learning. Proc. IEEE 109, 612–634. doi: 10.1109/JPROC.2021.3058954

Shams, L., and Beierholm, U. (2022). Bayesian causal inference: a unifying neuroscience theory. Neurosci. Biobehav. Rev. 137:104619. doi: 10.1016/j.neubiorev.2022.104619

Shams, L., and Beierholm, U. R. (2010). Causal inference in perception. Trends Cogn. Sci. 14, 425–432. doi: 10.1016/j.tics.2010.07.001

Shiffrin, R. M., and Schneider, W. (1977). Controlled and automatic human information processing: II. perceptual learning, automatic attending and a general theory. Psychol. Rev. 84, 127–190. doi: 10.1037/0033-295X.84.2.127

Smith, M. A., Ghazizadeh, A., and Shadmehr, R. (2006). Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biol. 4:e179. doi: 10.1371/journal.pbio.0040179

Spirtes, P. (2010). Introduction to causal inference. J. Mach. Learn. Res. 11, 1643–1662.

Spirtes, P., Glymour, C., and Scheines, R. (2001). Causation, Prediction, and Search. Cambridge, MA: MIT press. doi: 10.7551/mitpress/1754.001.0001

Steyvers, M., Tenenbaum, J. B., Wagenmakers, E.-J., and Blum, B. (2003). Inferring causal networks from observations and interventions. Cogn. Sci. 27, 453–489. doi: 10.1207/s15516709cog2703_6

Sülzenbrück, S., and Heuer, H. (2009). Functional independence of explicit and implicit motor adjustments. Conscious. Cogn. 18, 145–159. doi: 10.1016/j.concog.2008.12.001

Taylor, J. A., Krakauer, J. W., and Ivry, R. B. (2014). Explicit and implicit contributions to learning in a sensorimotor adaptation task. J. Neurosci. 34, 3023–3032. doi: 10.1523/JNEUROSCI.3619-13.2014

Tenenbaum, J. B., Griffiths, T. L., and Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn. Sci. 10, 309–318. doi: 10.1016/j.tics.2006.05.009

Waldmann, M. R. (1996). “Knowledge-based causal induction,” in Psychology of Learning and Motivation, Vol. 34 (Amsterdam: Elsevier), 47–88. doi: 10.1016/S0079-7421(08)60558-7

Waldmann, M. R., Cheng, P. W., Hagmayer, Y., and Blaisdell, A. P. (2008). “Causal learning in rats and humans: a minimal rational model. The probabilistic mind,” in Prospects for Bayesian Cognitive Science, eds. N. Chater, and M. Oaksford (Oxford: Oxford Academic), 453–484. doi: 10.1093/acprof:oso/9780199216093.003.0020

Waldmann, M. R., and Hagmayer, Y. (2001). Estimating causal strength: the role of structural knowledge and processing effort. Cognition 82, 27–58. doi: 10.1016/S0010-0277(01)00141-X

Waldmann, M. R., and Martignon, L. (1998). “A Bayesian network model of causal learning,” in 20th Annual Conference of the Cognitive-Science-Society (London: Routledge), 1102–1107. doi: 10.4324/9781315782416-198

Weakliem, D. L. (1999). A critique of the Bayesian information criterion for model selection. Sociol. Methods Res. 27, 359–397. doi: 10.1177/0049124199027003002

Yarritu, I., and Matute, H. (2015). Previous knowledge can induce an illusion of causality through actively biasing behavior. Front. Psychol. 6:389. doi: 10.3389/fpsyg.2015.00389

Zheng, Y., Huang, B., Chen, W., Ramsey, J., Gong, M., Cai, R., et al. (2024). Causal-learn: causal discovery in python. J. Mach. Learn. Res. 25, 1–8.

Keywords: virtual reality, haptic feedback, causal learning, causal discovery, causal representation, sensorimotor behavior, dual systems theory

Citation: Bahr N, Zetzsche C, Maldonado J and Schill K (2026) Cause-effect perception in an object place task. Front. Cognit. 4:1565294. doi: 10.3389/fcogn.2025.1565294

Received: 22 January 2025; Revised: 13 November 2025;
Accepted: 21 November 2025; Published: 02 January 2026.

Edited by:

Andrew Tolmie, University College London, United Kingdom

Reviewed by:

Mohammed Mostafizur Rahman, Harvard University, United States
Akash K. Rao, Manipal Academy of Higher Education, India

Copyright © 2026 Bahr, Zetzsche, Maldonado and Schill. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Nikolai Bahr, nibahr@uni-bremen.de
