Revisiting Wittgenstein’s puzzle: hierarchical encoding and comparison facilitate learning of probabilistic relational categories

Jung, Wookyoung; Hummel, John E.

doi:10.3389/fpsyg.2015.00110

ORIGINAL RESEARCH article

Front. Psychol., 10 February 2015

Sec. Cognition

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00110

This article is part of the Research Topic: Impact of Context on Category Learning

Relational Perception and Thinking Laboratory, Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA

Kittur et al. (2004, 2006) and Jung and Hummel (2011, 2014) showed that people have great difficulty learning relation-based categories with a probabilistic (i.e., family resemblance) structure, in which no single relation is shared by all members of a category. Yet acquisition of such categories is not strictly impossible: in all these studies, roughly half the participants eventually learned to criterion. What are these participants doing that the other half are not? We hypothesized that successful participants were those who divided the nominal categories into two or more sub-categories, each of which individually had a deterministic structure. We report three experiments testing this hypothesis: explicitly presenting participants with hierarchical (category and sub-category) structures facilitated the acquisition of otherwise probabilistic relational categories, but only when participants learned the subordinate-level (i.e., deterministic) categories prior to learning the nominal (i.e., probabilistic) categories and only when they were permitted to view multiple exemplars of the same category simultaneously. These findings suggest that one way to learn natural relational categories with a probabilistic structure [e.g., Wittgenstein’s (1953) category game, or even mother] is by learning deterministic subordinate-level concepts first and connecting them together under a common concept or label. They also add to the literature suggesting that comparison of multiple exemplars plays an instrumental role in relational learning.

Introduction

One of the most generally accepted assumptions in the literature on categorization and category learning is that categories and exemplars are mentally represented as lists of features and that the process of assigning exemplars to categories is based on comparing their features (for reviews, see Murphy, 2002; Kittur et al., 2006). As pointed out by Barsalou (1983), Gentner (1983), Murphy and Medin (1985) and others, one limitation of this view is that many concepts and categories are based, not on the literal features of their exemplars, but on relations: either relations among an exemplar’s features (e.g., arranged in one way, the parts of a folding bed form a bed, but arranged in another, they form a couch; Biederman, 1987; Hummel and Biederman, 1992) or relations between the exemplar and other objects in the world (e.g., the category conduit is defined by a relation between the conduit and the thing it carries; barrier is defined by the relation between the barrier, the thing to which it blocks access and the thing deprived of that access; even mother is defined by a relation between the mother and her child). Such concepts include both role-governed categories (Markman and Stilwell, 2001), such as friend, mother, conduit and key, which are defined by an object’s role relative to another object, and full-blown, multi-role schemas, such as transaction (see, e.g., Gentner, 2005; Gentner and Kurtz, 2005; see also Hummel and Holyoak, 2003). Relational categories may be more the rule than the exception: informal ratings by Asmuth and Gentner (2005) of the 100 highest-frequency nouns in the British National Corpus revealed that about half of these nouns refer to relational concepts. The distinction between relational and feature-based categories need not pose a problem for the study of category learning as long as relational and featural categories are learned in similar ways.
But if they are learned in different ways, then little or nothing we know about the acquisition of feature-based categories will necessarily apply to the case of relational concepts and categories.

Consider the well-known prototype effects in category learning (effects so robust they led Murphy, 2002, to quip that any experiment that fails to show them is suspect). One of the most basic of these effects is that participants are capable of learning categories with a family resemblance structure—that is, a structure in which every member of a category shares more features with the prototype of its own category than it does with the prototype of the contrasting category, but no single feature is shared by all members of the category. As noted by Kittur et al. (2004, 2006), this effect has always been demonstrated using categories defined by their exemplars’ features. These researchers wondered whether they could also demonstrate prototype effects in categories defined, not by the exemplars’ features, but by the relations among those features. In Kittur et al.’s (2004, 2006) experiments each exemplar was a two-part “object” consisting of an octagon and a square. In the prototype of one category, the octagon was larger than the square, darker than the square, above the square (in the picture plane) and in front of the square; in the prototype of the other category, the octagon was smaller, lighter, below, and behind the square. In a design typical of experiments demonstrating prototype effects, the categories had a family resemblance structure, such that each exemplar possessed three relations typical of its own prototype and one relation typical of the opposite prototype and no relation was shared by all members of either category.

Consistent with the hypothesis that relational categories are not learned in the same way as feature-based categories, Kittur et al. (2004, 2006; see also Jung and Hummel, 2014) found that people have great difficulty learning relational categories with a probabilistic (i.e., family resemblance) structure. Their findings are consistent with the hypothesis that people learn relational concepts by a process of intersection discovery. Numerous researchers have proposed that relational concepts are represented as schemas¹: relational structures that specify the properties of a concept or category exemplar and the relations among those properties and between the concept and other concepts (e.g., Gentner, 1983; Murphy and Medin, 1985; Holland et al., 1986; Keil, 1989; Barsalou, 1993; Ross and Spalding, 1994). In turn, it has been proposed (e.g., Gick and Holyoak, 1980, 1983; Hummel and Holyoak, 2003) that schemas are learned by a process of structural alignment (i.e., analogical mapping; see Hummel and Holyoak, 2003) followed by intersection discovery, in which a schema is learned from examples by keeping what the examples have in common and disregarding details on which they differ (see also Doumas et al., 2008). Alignment and intersection discovery are useful because they can reveal relational generalities that might otherwise remain implicit in the mental representation of the individual exemplars (see Doumas et al., 2008). However, intersection discovery fails catastrophically with probabilistic categories, in which the intersection is the empty set: by definition, the intersection is that which is common to all exemplars; in a probabilistic category structure, nothing is common to all exemplars. The findings of Kittur et al. (2004, 2006) are consistent with this account of their participants’ failure to learn their category structures.
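
The failure of intersection discovery on a family resemblance structure can be illustrated with a minimal sketch. The encoding used here (each exemplar as a set of relation-value pairs, the relation labels, and the function names `exemplar_classes` and `intersection_schema`) is an illustrative assumption and is not the representation used by the models cited above. The sketch simply intersects the relation values of all exemplars: for a probabilistic category nothing survives, whereas for a pair of exemplars that share two relations a two-relation schema remains.

```python
from functools import reduce

# Hypothetical relation labels; 1 means the first shape holds the "prototype A" value.
RELATIONS = ("larger", "darker", "above", "in_front")

def exemplar_classes(prototype):
    """Return the exemplar classes that each flip exactly one relation of the prototype."""
    classes = []
    for i in range(len(prototype)):
        flipped = list(prototype)
        flipped[i] = 1 - flipped[i]
        classes.append(tuple(flipped))
    return classes

def as_bindings(exemplar):
    """Represent an exemplar as a set of (relation, value) pairs."""
    return set(zip(RELATIONS, exemplar))

def intersection_schema(exemplars):
    """Keep only the relation values that every exemplar has in common."""
    return reduce(set.intersection, (as_bindings(e) for e in exemplars))

category_a = exemplar_classes((1, 1, 1, 1))
schema_probabilistic = intersection_schema(category_a)
schema_deterministic = intersection_schema([(1, 1, 0, 1), (1, 1, 1, 0)])

print(schema_probabilistic)          # nothing is shared by all four exemplar classes
print(sorted(schema_deterministic))  # two relations survive in the two-exemplar subset
```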

Jung and Hummel (2014) extended the Kittur et al. (2004, 2006) findings by exploring the conditions under which probabilistic relational categories can be rendered learnable. Our logic was as follows: if the intersection discovery account of how we learn relational categories is correct, then any task that encourages participants to discover a relation that remains invariant over members of a category (and which differs between categories) ought to make otherwise probabilistic relational categories learnable. In order to test this hypothesis we created categories with a logical structure identical to that of Kittur et al. (2004, 2006): every exemplar consisted of a circle and a square, one of which was larger than the other, one of which was darker, one of which was in front of the other, and one of which was above the other. In the prototype of category A, the circle was larger, darker, above and in front of the square; in B it was smaller, lighter, below, and behind. Each exemplar of either A or B shared three relations with its prototype, and no relation was constant across all members of a category. (In the Kittur et al. (2004, 2006) studies, the category structures were identical except that an octagon was used in place of the circle.) All we manipulated between participants was the instructions participants were given, and thus the task they were nominally performing.

In the “categorize” condition of Jung and Hummel (2014), Experiment 1, participants were instructed to categorize each exemplar as either an A or a B. In the “who’s winning?” condition, participants were instructed to determine whether the circle or the square was “winning.” Importantly, the tasks were otherwise completely isomorphic: any exemplar a participant in the “winning” condition would correctly classify as “the circle is winning” (by pressing the A key), a participant in the categorize condition would correctly classify as a member of category A (by pressing the A key); and any exemplar correctly classified as “the square is winning” (by pressing the B key) would correctly be categorized as a member of category B (by pressing the B key). We hypothesized that the “who’s winning” task—but not the categorize task—would encourage participants to discover a higher-order relation that remained invariant over members of a category (namely, which part, the circle or square, was in more “winning” roles of the four relations) and thus render the categories learnable. In several experiments, the results were exactly as predicted: participants given the “who’s winning” task learned to criterion much faster (and a much higher proportion of them reached criterion at all) than participants given the categorize task, even though the correct response to each exemplar was exactly the same across the tasks. This result is consistent with Kittur et al.’s (2004, 2006) interpretation of their findings in terms of participants invoking the psychological mechanisms responsible for schema induction (by intersection discovery) when faced with the task of learning a relational category structure. 
Specifically, the invariant participants appear to be learning in the “who’s winning” condition is something like, “The circle [or square] has more points, so it wins.” In the case of this invariant, it does not matter which relations give rise to the points; it only matters which shape has more of them. As a result, this learning procedure is robust to the variation in the individual relations giving rise to the “points.”
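
The "more points wins" invariant can be sketched as a simple counting rule. The binary coding (1 when the circle occupies the "winning" role of a relation) and the function names are assumptions made for illustration; the sketch is not a model of how participants actually compute the invariant. It shows only that a count over relations classifies every exemplar class correctly, regardless of which individual relation is the exception.

```python
def exemplar_classes(prototype):
    """Flip each relation of the prototype in turn to obtain the exemplar classes."""
    return [
        tuple(1 - v if i == j else v for j, v in enumerate(prototype))
        for i in range(len(prototype))
    ]

def who_is_winning(exemplar):
    """Return 'A' (circle wins) or 'B' (square wins) by counting winning roles."""
    circle_points = sum(exemplar)                  # 1 = circle holds the winning role
    square_points = len(exemplar) - circle_points  # remaining roles go to the square
    return "A" if circle_points > square_points else "B"

category_a = exemplar_classes((1, 1, 1, 1))  # three circle-winning relations each
category_b = exemplar_classes((0, 0, 0, 0))  # one circle-winning relation each

print([who_is_winning(e) for e in category_a])
print([who_is_winning(e) for e in category_b])
```

With three relations favoring one shape and one favoring the other, ties cannot occur in this structure, so the rule needs no tie-breaking case.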

Although participants in the “who’s winning” condition learned much faster and more reliably than those in the categorize condition, as noted previously, roughly half the participants even in the categorize condition eventually learned to correctly classify the exemplars. Our primary motivation in the current study was to investigate what makes the probabilistic relational categories learnable for those participants who do manage to learn them. On the strictest interpretation of the intersection discovery hypothesis, this ought to be impossible: the intersection is always the empty set, so the categories should never be learnable by anyone. How do those participants who learn the categories manage to do so?

Polysemy, Hierarchical Categories, and Probabilistic Relational Categories

One possibility, suggested by Lakoff (1987), is that putatively probabilistic relational categories may in fact be polysemous. Consider for example, the category mother. Mother is a relational category (since a person’s membership in the category is defined by her relationship to her child), and although it may, at first, seem to be deterministic, there are in fact different kinds of mothers: birth mothers and adoptive mothers; caring and neglectful mothers; loving and abusive mothers, etc. The result is that no single relation (either genetic, care-based or emotional) necessarily characterizes every kind of mother. That is, mother is polysemous: a single label that refers to similar but nonetheless different categories of relationships. This possibility suggests a solution to the problem of learning probabilistic relational categories: rather than learning that all the exemplars belong to a single (probabilistic) category, perhaps it is easier to learn multiple sub-categories (each of which is individually deterministic), which are polysemous, in the sense of sharing a single label or name. Accordingly, we reasoned that the participants in the Kittur et al. (2004, 2006) and Jung and Hummel studies who managed to learn to criterion may have done so by treating the categories they were learning as polysemous: perhaps they somehow discovered subordinate-level categories that were deterministic by virtue of one or two relations remaining invariant, and then learned to classify those sub-categories with a common label (as elaborated shortly).

The Current Experiments

The current experiments tested three hypotheses about factors that might help people to learn otherwise probabilistic relational concepts. Experiment 1 tested the hypothesis that learning putatively probabilistic relational categories (like mother) can be facilitated by rendering such categories polysemous, that is, by training participants to learn deterministic sub-categories (i.e., “subordinate-level” categorizations) concurrently with the probabilistic category labels (i.e., “basic-level” categorizations). This experiment also tested the hypothesis that comparison—specifically, having the opportunity to explicitly compare the exemplars of the subordinate-level categories—would facilitate subordinate-level category learning. Comparison is thought to play a central role in schema induction (e.g., Gick and Holyoak, 1980, 1983; Gentner, 1983; Hummel and Holyoak, 2003) and relational learning (e.g., Doumas et al., 2008), and numerous studies have demonstrated the facilitatory effect of comparison on the learning of relational concepts (e.g., Hammer et al., 2008, 2009; Namy and Clepper, 2010; Augier and Thibaut, 2013; Kok et al., 2013; Kurtz et al., 2013; Carvalho and Goldstone, 2014; Guo et al., 2014). The results of Experiment 1 demonstrated that subordinate-level category learning facilitated participants’ learning of our probabilistic relational category structures, but only when participants were also allowed to compare multiple exemplars of a category to one another on each trial.

Experiment 2 extended Experiment 1 by investigating the necessity of the concurrent subordinate- and basic-level learning. In this experiment, participants were trained to classify exemplars at the probabilistic basic level before the deterministic subordinate level, and learning did not improve relative to training on a basic-level-only baseline. Experiment 2 also investigated the effect of subordinate-level comparison without subordinate-level category learning. That is, it investigated whether giving learners the ability to compare two exemplars that would have belonged to the same subordinate-level category (and thus shared two invariant relations) during basic-level classification, but without explicit subordinate-level categorization, would improve basic-level learning relative to a one-exemplar baseline. It did not, suggesting that comparison, by itself, may not facilitate learning probabilistic relational categories.

Experiment 3 tested the hypothesis that presenting the (deterministic) prototype of each category alongside each (probabilistic) exemplar during training would facilitate learning. This manipulation is analogous to explicit instruction (e.g., in a classroom setting) that although the exemplars are probabilistic in the relations they possess, they nonetheless derive from a deterministic underlying category structure. The results suggest that this procedure, like the subordinate-before-basic procedure of Experiment 1, facilitated participants’ learning. This experiment also tested two additional variants of the comparison hypothesis tested in Experiment 1 and provided weak support for that hypothesis.

An additional purpose of the current experiments was to replicate the basic difficulty-of-probabilistic-relational-category learning effect with new stimulus materials. Kittur et al. (2004, 2006) used stimuli composed of octagons and squares, and Jung and Hummel (2014, Experiments 1–3) used stimuli composed of circles and squares. The current experiments used fictional “bugs” as stimuli (Figure 1). The purpose of using these new stimuli was simply to determine whether the same effects obtain with very different (arguably, more natural) stimulus materials. Like the stimuli in the previous experiments, the categories used in the current experiments were defined by the relations among their exemplars’ parts, and individual relations were probabilistically related to category membership across exemplars.

FIGURE 1. Examples of the stimuli used in all the experiments. The top two rows depict basic-level category Fea; the bottom two rows depict basic-level category Dav. The leftmost column shows two examples of each category prototype: Fea (1111) and Dav (0000). Columns 2–5 depict two examples each of the categories’ specific exemplar classes. Columns correspond to the exception relation defining that class. For example, exemplars in column 2 (0111 and 1000) differ from their respective prototypes in the relative width of the bugs’ heads and bodies (r1). Examples of a prototype or an exemplar class differ from one another in their metric properties (e.g., head width) but share categorical relations (e.g., whether the head is wider or narrower than the body). The figure shows two randomly selected examples of each prototype or exemplar class, out of an open-ended set of such examples (with each example differing from the others in its class in terms of its precise metric properties). See text for details.

Category Structures

The categories used in these experiments were fictional “bug species” defined by the relations between the bugs’ head, body, wings, and antennae. As shown in Figure 1 and Table 1, the prototype of the category (species) “Fea” [1, 1, 1, 1] had a head wider and darker than its body (relations r1 and r2; the first two 1’s in the vector), antennae longer than its head (r3) and wings longer than its body (r4). The prototypical Dav [0, 0, 0, 0] had the opposite relations, with its body wider and darker than its head (r1 and r2), antennae shorter than its head (r3) and wings shorter than its body (r4).

TABLE 1. The prototype and exemplar class for each species.

Any exemplar of Fea or Dav shared three relations with its own prototype and one with the prototype of the opposite category (Table 1). In other words, the formal category structures used were isomorphic to those used by Kittur et al. (2004, 2006) and Jung and Hummel (2014). All members of an exemplar class (where a class corresponds to one of the eight binary codes in Table 1) share exactly the same defining relations (e.g., all members of 0111 have heads that are narrower and darker than their bodies, antennae longer than their heads and wings longer than their bodies) but differ from one another in the exact numerical dimensions and darknesses of their heads, wings, antennae, and bodies. That is, although relationally identical to one another, members of an exemplar class are featurally different from one another. Stimuli were generated by the computer while the subject performed the experiment, randomly choosing the metric values of the bugs’ parts to be consistent with the defining relations. As such, it is unlikely that any given subject would see exactly the same bug more than once during the experiment.
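
One way such on-the-fly stimulus generation could work is sketched here. The numeric ranges, the minimum gap between compared values, the dictionary keys, and the function names are all hypothetical, since the text does not report the actual sampling procedure. The sketch draws random metric values for each pair of compared parts so that their ordering matches the class's binary code, and then recovers the code from the sampled bug as a check.

```python
import random

def sample_ordered_pair(first_is_greater, rng, low, high, min_gap):
    """Draw two values in [low, high] whose ordering matches the required relation."""
    smaller = rng.uniform(low, high - min_gap)
    larger = rng.uniform(smaller + min_gap, high)
    return (larger, smaller) if first_is_greater else (smaller, larger)

def generate_bug(code, rng):
    """Sample metric part values consistent with a four-relation code (r1, r2, r3, r4)."""
    r1, r2, r3, r4 = code
    # Ranges below are placeholders, not the values used in the experiments.
    head_width, body_width = sample_ordered_pair(r1 == 1, rng, 20, 100, 5)
    head_dark, body_dark = sample_ordered_pair(r2 == 1, rng, 0.1, 0.9, 0.1)
    antenna_len, head_len = sample_ordered_pair(r3 == 1, rng, 20, 100, 5)
    wing_len, body_len = sample_ordered_pair(r4 == 1, rng, 20, 100, 5)
    return {
        "head_width": head_width, "body_width": body_width,
        "head_darkness": head_dark, "body_darkness": body_dark,
        "antenna_length": antenna_len, "head_length": head_len,
        "wing_length": wing_len, "body_length": body_len,
    }

def relations_of(bug):
    """Recover the categorical relations (r1..r4) from a bug's metric values."""
    return (
        int(bug["head_width"] > bug["body_width"]),
        int(bug["head_darkness"] > bug["body_darkness"]),
        int(bug["antenna_length"] > bug["head_length"]),
        int(bug["wing_length"] > bug["body_length"]),
    )

rng = random.Random(0)
bug = generate_bug((0, 1, 1, 1), rng)
print(relations_of(bug))
```

Because the values are continuous, two bugs drawn from the same exemplar class will almost never be metrically identical even though their relational codes match.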

In Experiment 1, participants learned to classify the bugs at a subordinate level (Cim Fea [first two exemplar classes in Column 1 of Table 1], Kei Fea [last two exemplar classes in Column 1], Sko Dav [first two exemplar classes, Column 2] or Lif Dav [last two exemplar classes, Column 2]). In Experiment 2, participants learned to classify the bugs at both the basic level (Fea vs. Dav) and at the subordinate level. In Experiment 3, participants learned each exemplar class as its own unique subordinate-level category (Kei Fea, Bai Fea, Wou Fea, or Cim Fea for the Fea species; Haw Dav, Ang Dav, Sko Dav, or Lif Dav for the Dav species). The basic level categories were probabilistic, in the sense that each relation was diagnostic of category membership 75% of the time, but no single relation was fully diagnostic. However, each subordinate-level category had two fully deterministic relations. For example, in the two exemplars of Cim Fea, [1101] and [1110], relations r1 and r2 both deterministically take the value 1; and in the two exemplars of Sko Dav, [0010] and [0001], both take the value 0. As such, Fea and Dav are polysemous, with deterministic subordinate-level categories.
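
The 75% figure follows directly from the exemplar codes, as the short check here illustrates. The list names and the `diagnosticity` function are illustrative assumptions; the codes are the eight exemplar classes, each differing from its prototype on exactly one relation. For each relation, the sketch counts the proportion of exemplar classes whose value matches their own prototype.

```python
# Exemplar classes: each differs from its prototype (Fea 1111, Dav 0000) on one relation.
FEA = [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
DAV = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def diagnosticity(relation_index):
    """Proportion of exemplar classes whose value on one relation matches their prototype."""
    hits = sum(e[relation_index] == 1 for e in FEA)
    hits += sum(e[relation_index] == 0 for e in DAV)
    return hits / (len(FEA) + len(DAV))

per_relation = [diagnosticity(i) for i in range(4)]
print(per_relation)  # each relation predicts the category for 6 of the 8 classes
```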

Experiment 1

Experiment 1 investigated the hypothesis that learning the categories’ deterministic subordinate-level labels would facilitate participants’ learning of their polysemous (probabilistic) basic-level labels. It also investigated the necessity of explicit subordinate-level comparison for the learning of the subordinate-level categories.

Method

Participants

Forty-five undergraduates enrolled at the University of Illinois participated in Experiment 1 for course credit.

Materials

Stimuli were line drawings of fictional bugs as described above. Subspecies of each species were made by grouping pairs of exemplars according to shared relations: Kei Fea = [0, 1, 1, 1] and [1, 0, 1, 1], and Cim Fea = [1, 1, 0, 1] and [1, 1, 1, 0]; Sko Dav = [1, 0, 0, 0] and [0, 1, 0, 0], and Lif Dav = [0, 0, 1, 0] and [0, 0, 0, 1]. Eight trials per block were presented in the subordinate-level with comparison condition, and 16 trials per block were presented in the subordinate-level without comparison and basic baseline conditions. Each exemplar was presented in a random order once (subordinate-level with comparison condition) or twice (subordinate-level without comparison and basic baseline conditions) per block. There were only half as many trials per block in the subordinate-level with comparison condition as in the other two conditions because each trial of subordinate-level with comparison presented two versions of each exemplar at a time, whereas the other conditions presented only one exemplar per trial.
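
The shared relations that make each subspecies deterministic can be read off the codes listed in this section. The sketch here uses those groupings as given; the dictionary name and the `invariant_relations` helper are assumptions introduced for illustration. It reports, for any set of exemplars, which relations take the same value in every member.

```python
# Subspecies groupings as listed in the Materials section.
SUBSPECIES = {
    "Kei Fea": [(0, 1, 1, 1), (1, 0, 1, 1)],
    "Cim Fea": [(1, 1, 0, 1), (1, 1, 1, 0)],
    "Sko Dav": [(1, 0, 0, 0), (0, 1, 0, 0)],
    "Lif Dav": [(0, 0, 1, 0), (0, 0, 0, 1)],
}

def invariant_relations(exemplars):
    """Map relation name to its value for every relation constant across the exemplars."""
    return {
        f"r{i + 1}": values[0]
        for i, values in enumerate(zip(*exemplars))
        if len(set(values)) == 1
    }

for name, members in SUBSPECIES.items():
    print(name, invariant_relations(members))

# At the basic level, pooling both Fea subspecies leaves no invariant relation.
all_fea = SUBSPECIES["Kei Fea"] + SUBSPECIES["Cim Fea"]
print("Fea", invariant_relations(all_fea))
```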

Design

The experiment used a three-condition (subordinate-level with comparison vs. subordinate-level without comparison vs. basic baseline) between-subjects design.

Procedure

All conditions consisted of two or more blocks of training trials followed by two blocks of transfer trials. The training phase of the experiment differed across conditions, as described above. During this phase of the experiment, participants received accuracy feedback on each trial.

In the subordinate-level with comparison condition, each trial of the training phase simultaneously presented two exemplars belonging to the same subordinate-level category. Participants identified the stimuli at the subordinate level (i.e., as Cim Fea, Kei Fea, Sko Dav or Lif Dav) by clicking one of four boxes depicting the relevant subordinate- and basic-level names under the two bugs. This response was followed by accuracy feedback. See the Appendix for figures depicting the participants’ task in each condition of each experiment reported here.

In the subordinate-level without comparison condition (Figure A2 in the Appendix), each trial depicted a single stimulus (rather than a pair), but otherwise the procedure was identical to that in the subordinate-level with comparison condition. In the basic baseline condition, each trial depicted a single bug, which the participant classified at the basic level only (Figure A3 in the Appendix). In all three conditions, this training phase was followed by a transfer phase.

The training phase lasted 40 blocks (320 trials for the subordinate-level with comparison condition and 640 trials for the other conditions) or until the participant responded correctly on at least 87.5% (7/8 or 14/16) of the trials for two consecutive blocks, whichever came first. The transfer phase was the same across all conditions. All participants classified the bugs at the basic level only and received no accuracy feedback. Sixteen trials were presented per block, with each exemplar presented in a random order once per block. Each exemplar remained on the screen until the participant responded. At the end of the experiment participants were queried about strategies they used during the experiment.
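
The stopping rule for the training phase can be expressed as a short function. This is a sketch under stated assumptions: the function name, its argument names, and the representation of a session as a list of per-block correct counts are hypothetical, and participants who never meet the rule are assigned the maximum number of trials, in line with the conservative scoring used in the trials-to-criterion analysis.

```python
def trials_to_criterion(block_correct, trials_per_block,
                        max_blocks=40, threshold=0.875, run_length=2):
    """Return (trials, reached) under the two-consecutive-blocks criterion.

    block_correct: number of correct responses in each training block, in order.
    """
    consecutive = 0
    for block_number, correct in enumerate(block_correct[:max_blocks], start=1):
        if correct / trials_per_block >= threshold:
            consecutive += 1
        else:
            consecutive = 0  # the run must be unbroken
        if consecutive == run_length:
            return block_number * trials_per_block, True
    # Criterion never met: score as if training ran to the final block.
    return max_blocks * trials_per_block, False

# A learner in the paired-exemplar condition (8 trials per block).
print(trials_to_criterion([5, 7, 8], trials_per_block=8))
# A non-learner in a single-exemplar condition (16 trials per block).
print(trials_to_criterion([14, 10, 14, 13] * 10, trials_per_block=16))
```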

Results

Trials to criterion

Most of the participants (12 of 15) reached criterion in subordinate-level with comparison, whereas only 1 of 15 reached criterion in subordinate-level without comparison and none reached criterion in basic baseline. A chi-square test of independence showed that the proportion of participants reaching criterion differed reliably across conditions [χ² (2, N = 45) = 25.187, p < 0.001].

In addition to the chi-square test, in all three experiments we performed a more conservative test of our hypothesis (i.e., more favorable to the null hypothesis) by comparing trials to criterion across conditions (Figure 2). Rather than reducing each participant’s performance to a binary outcome (reached criterion vs. did not reach criterion), as in the chi-square test, differences in trials to criterion preserve metric differences between participants’ performance. We made this test even more conservative by treating those participants who failed to reach criterion as though they had reached criterion in the last block of learning. There was a reliable difference between subordinate-level with comparison (M = 182, SD = 108) and subordinate-level without comparison (M = 625, SD = 58) [t(28) = -14.014, p < 0.001]. The performance in basic baseline was the worst overall (M = 640, SD = 0).
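
As a rough consistency check, the reported t statistic can be approximated from the summary statistics alone. The sketch assumes an independent-samples t-test with pooled variance and 15 participants per condition; the function name is hypothetical, and because the means and SDs are rounded in the text, the result is expected to be close to, not identical with, the reported value.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Trials to criterion: with comparison vs. without comparison (rounded values from the text).
t_value, df = pooled_t(182, 108, 15, 625, 58, 15)
print(round(t_value, 2), df)
```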

FIGURE 2. Trials to criterion by study condition in Experiment 1. Error bars represent SEs. ***p < 0.001.

Study phase accuracy

First, we report accuracy of subordinate-level classification (Kei Fea, Cim Fea, Sko Dav or Lif Dav) in the subordinate-level with comparison and subordinate-level without comparison conditions and accuracy of basic-level classification (Fea or Dav) in the basic baseline condition. Participants in subordinate-level with comparison (M = 0.56, SD = 0.12) were more accurate than participants in subordinate-level without comparison (M = 0.43, SD = 0.12) [t(28) = 2.928, p < 0.01]. Participants in basic baseline were the most accurate (M = 0.62, SD = 0.09; Figure 3). However, chance performance in the two subordinate-level conditions was 0.25 whereas chance in the basic baseline condition was 0.5, so it is difficult to compare study phase accuracy directly across these conditions. If we correct for chance performance by subtracting chance performance in the condition from each participant’s mean accuracy, then mean corrected accuracy is 0.31 in the subordinate-level with comparison condition, 0.18 in subordinate-level without comparison and 0.12 in basic baseline. (Of course, this correction has no effect on the results of the t-tests.)
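
The chance correction is simple arithmetic on the condition means, shown here as a small worked example. The dictionary layout and variable names are illustrative; the means and chance levels are those reported in the text (four response options at the subordinate level, two at the basic level).

```python
# (mean accuracy, chance level) per study condition, as reported in the text.
STUDY_ACCURACY = {
    "subordinate-level with comparison": (0.56, 0.25),
    "subordinate-level without comparison": (0.43, 0.25),
    "basic baseline": (0.62, 0.50),
}

corrected = {
    condition: round(mean - chance, 2)
    for condition, (mean, chance) in STUDY_ACCURACY.items()
}

for condition, value in corrected.items():
    print(f"{condition}: {value:.2f}")
```

Because both subordinate-level conditions share the same chance level, subtracting it shifts both group means by the same constant and leaves their difference and variances unchanged, which is why the t-test between them is unaffected.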

FIGURE 3. Mean accuracy by study condition in Experiment 1. Error bars represent SEs. *p < 0.05.

Transfer phase accuracy

A one-way between-subjects ANOVA with three levels (subordinate-level with comparison vs. subordinate-level without comparison vs. basic baseline) revealed a main effect of condition [F(2,44) = 11.880, MSE = 0.149, p < 0.001; Figure 4]. Participants in the subordinate-level with comparison condition (M = 0.86, SD = 0.11) showed reliably more accurate performance during transfer than participants in the subordinate-level without comparison (M = 0.72, SD = 0.09; Tukey’s HSD, p < 0.01) and basic baseline conditions (M = 0.67, SD = 0.13; Tukey’s HSD, p < 0.001). There was no reliable difference between transfer in the subordinate-level without comparison and basic baseline conditions (Tukey’s HSD, p = 0.36).

FIGURE 4. Mean accuracy by transfer condition in Experiment 1. Error bars represent SEs. **p < 0.01, ***p < 0.001.

Discussion

Both in terms of trials to criterion during learning and in terms of accuracy of basic-level classification during transfer, training participants to classify stimuli at a deterministic subordinate level and allowing them to explicitly compare multiple exemplars of a subordinate-level category to one another (the subordinate-level with comparison condition) improved category learning relative to simply training the stimuli at the basic level only (the basic baseline condition) and relative to simply training the stimuli at the subordinate level without the opportunity to compare them (the subordinate-level without comparison condition). This finding suggests that, as hypothesized, rendering probabilistic relational categories polysemous (and thus deterministic at the subordinate level) makes them more learnable, but that this facilitatory effect of polysemy (at least in our data) depends on participants having the opportunity to compare members of the same subordinate-level category and thus observe which relations they have in common.

Experiment 2

If deterministic subordinate-level learning is to facilitate probabilistic basic-level learning, then it seems necessary for the subordinate-level learning to temporally precede (or at least proceed at the same time as) the basic-level learning (see also Anderson, 1991; Love et al., 2004; Mathy et al., 2013)². Accordingly, in the subordinate-level conditions of Experiment 1, participants viewed pairs of exemplars from the same subordinate-level category on each trial and learned to classify the stimuli at the subordinate level before being required to transfer learning to the basic level. Experiment 2 investigated the necessity of the subordinate-before-basic learning order used in Experiment 1. In the basic-level first with comparison condition of Experiment 2, participants were trained to classify exemplars at the probabilistic basic level before classifying them at the deterministic subordinate level. This experiment also investigated the effect of subordinate-level comparison without subordinate-level category learning: in the basic-level only with comparison condition of this experiment, participants viewed pairs of exemplars that would have belonged to the same subordinate-level category, but only learned to classify them at the basic level. The basic baseline condition of Experiment 2 was identical to that condition of Experiment 1: on each trial, the participant viewed only a single exemplar and classified it only at the basic level.