Perceptual estimation obeys Occam's razor

Gershman, Samuel  Joseph; Niv, Yael

doi:10.3389/fpsyg.2013.00623

ORIGINAL RESEARCH article

Front. Psychol., 23 September 2013

Sec. Cognition

Volume 4 - 2013 | https://doi.org/10.3389/fpsyg.2013.00623

Samuel J. Gershman ¹*
Yael Niv ²

1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology Cambridge, MA, USA
2. Department of Psychology and Princeton Neuroscience Institute, Princeton University Princeton, NJ, USA

Abstract

Theoretical models of unsupervised category learning postulate that humans “invent” categories to accommodate new patterns, but tend to group stimuli into a small number of categories. This “Occam's razor” principle is motivated by normative rules of statistical inference. If categories influence perception, then one should find effects of category invention on simple perceptual estimation. In a series of experiments, we tested this prediction by asking participants to estimate the number of colored circles on a computer screen, with the number of circles drawn from a color-specific distribution. When the distributions associated with each color overlapped substantially, participants' estimates were biased toward values intermediate between the two means, indicating that subjects ignored the color of the circles and grouped different-colored stimuli into one perceptual category. These data suggest that humans favor simpler explanations of sensory inputs. In contrast, when the distributions associated with each color overlapped minimally, the bias was reduced (i.e., the estimates for each color were closer to the true means), indicating that sensory evidence for more complex explanations can override the simplicity bias. We present a rational analysis of our task, showing how these qualitative patterns can arise from Bayesian computations.

Introduction

The fourteenth century English friar and theologian William of Occam advised philosophers “not to multiply entities beyond necessity” (Boehner, 1957). The contemporary interpretation of Occam's razor is that, all other things being equal, simpler explanations of data should be preferred to more complex explanations. This heuristic notion has found mathematical expression in Bayesian statistics (Jaynes, 2003) and algorithmic information theory (Li and Vitányi, 2008). It has since been applied to cognitive psychology as the “simplicity principle” (Chater and Vitányi, 2003; Feldman, 2003): the idea that humans seek simple explanations of their sensory input. Our focus in this paper is on unsupervised category learning, where evidence suggests that humans assign stimuli to a small set of categories, only inventing new categories when the stimulus statistics change radically (Anderson, 1991; Clapper and Bower, 1994; Pothos and Chater, 2002; Love et al., 2004; Sanborn et al., 2010).

If the categories people invent dictate how they “carve nature at its joints” (i.e., divide the environment into meaningful entities; see Gershman and Niv, 2010), then effects of Occam's razor should be discernible in perceptual estimation. Substantial evidence exists that categories shape perception (Huttenlocher et al., 1991, 2000; Goldstone, 1995; Hemmer and Steyvers, 2009). For example, Goldstone (1995) had participants judge the color of numbers and letters that varied in color along a red-violet gradient, and showed that stimuli belonging to the letter category (with typically red objects) were judged to be more red than identically colored stimuli belonging to the other category. As another example, syllables belonging to different phonetic categories are more easily discriminated than syllables with the same physical difference but belonging to the same category, the so-called perceptual magnet effect (Liberman et al., 1957). However, these studies assume a pre-defined category structure, whereas many real-world learning situations (particularly during development) require one to discover the underlying category structure from undifferentiated sensory data. In these situations, we expect that Occam's razor will influence the number of perceptual categories inferred from sensory data, and in turn govern participants' estimates of stimulus properties. The experiments reported in this paper were designed to test this hypothesis.

The stimuli in our experiments consisted of randomly scattered colored circles displayed on a computer screen (Figure 1), similar to stimuli used in studies of number perception (Izard and Dehaene, 2008). Each trial was characterized by one of two colors, and all circles were displayed in this color. The number of circles on each trial was drawn from a color-specific Gaussian distribution. The distributions differed in their means (Experiments 1 and 3) or variances (Experiment 2). Participants were asked to judge how many circles there were on the screen, but did not have enough time to count them explicitly.

Figure 1

If the distributions corresponding to the two colors overlap sufficiently, Occam's razor dictates that the stimuli should all be assigned to one category despite their obviously different colors, a prediction formalized in several models of categorization (Anderson, 1991; Sanborn et al., 2010). The consequence of merging the two perceptual categories is that estimates will be “regularized” toward the average of the two distributions. In contrast, reducing overlap between the distributions is expected to diminish this regularization, as it supports separate categories for each color. Each of the experiments reported below included a high overlap condition in which merging (and hence more regularization) was expected to occur, and a low overlap condition in which splitting (and less regularization) was expected to occur.

To make our theoretical account explicit and quantitative, we present a computational model of human performance in our task. In the spirit of the probabilistic motivation for Occam's razor described above, we derive our model from hypothesized probabilistic assumptions about the environment and suggest that participants perform approximately optimal inference. In other words, we undertake a “rational analysis” (Anderson, 1990). Our aim is to elucidate the computational constraints, rather than particular processing or implementational mechanisms, that govern perceptual estimation in our task. We compare the rational model to an exemplar model (Medin and Schaffer, 1978; Nosofsky, 1986, 1988; Kruschke, 1992) which represents each data point as a unique perceptual category, and thus lacks a simplicity bias. Through quantitative model comparison, we show that the rational model is able to better account for our data.

Experiment 1

In our first experiment, we manipulated categorical overlap by varying (within subject) the distance between the means of the two distributions in blocks. Each block included one distribution (mean 65, standard deviation 10) which was designated the “baseline,” and a second, “alternative” distribution that either had low overlap (mean 35, standard deviation 10) or high overlap (mean 55, standard deviation 10) with the baseline distribution (Figure 2, left). We refer to these conditions as Low mean alternative and High mean alternative, respectively. In each block each of the distributions (alternative and baseline) was associated with a unique color, and circles appeared in that color on those trials in which the number of circles was drawn from that distribution.
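To give a rough sense of how different the two overlap conditions are, the sketch below computes the overlapping coefficient (the shared area under two density curves) for the distribution pairs described above. This measure is our illustrative choice, not one used in the paper, and the calculation ignores the truncation and rounding applied to the actual stimuli. The function names are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def overlap_equal_sd(mean_a: float, mean_b: float, sd: float) -> float:
    """Overlapping coefficient of two Gaussians that share one SD.

    With equal SDs the densities cross at the midpoint of the means,
    so the shared area is 2 * Phi(-|mean_a - mean_b| / (2 * sd)).
    """
    return 2.0 * normal_cdf(-abs(mean_a - mean_b) / (2.0 * sd))

# Baseline (mean 65, SD 10) against each alternative distribution.
high_overlap = overlap_equal_sd(65, 55, 10)  # High mean alternative
low_overlap = overlap_equal_sd(65, 35, 10)   # Low mean alternative

print(f"High mean alternative overlap: {high_overlap:.3f}")
print(f"Low mean alternative overlap:  {low_overlap:.3f}")
```

Under these assumptions roughly 62% of the area is shared in the High mean alternative condition, compared with roughly 13% in the Low mean alternative condition.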

Figure 2

Our instructions to participants made no mention of color. However, we expected participants to use color as a cue for categorization. More precisely, we expected use of the color cue to depend on a combination of sensory evidence (i.e., the number of circles) and a simplicity bias toward fewer categories. On High mean alternative blocks, in which all trials had relatively similar numbers of circles, we expected participants to treat all trials as if they were one category, and effectively ignore color as a categorization cue. As a result, in these blocks we expected estimates of the number of circles to be affected by the statistics of both colors. In contrast, in Low mean alternative blocks, in which there was less overlap between the number of circles in trials of one color as compared to the other color, we expected participants to treat each color as a separate category. If participants indeed learned separate estimates for each color, their estimates would be closer to the true mean of each of the distributions. As such, across blocks we predicted that estimates on the baseline trials would be lower on average in the High mean alternative condition than in the Low mean alternative condition, due to the regularization induced by merging the color categories together in the High mean alternative condition but not in the Low mean alternative condition. Note that if participants ignored color and grouped all trials together on all blocks, we would expect the opposite: baseline estimates in the High mean alternative condition should be systematically higher than in the Low mean alternative condition. Alternatively, if participants always used color as a categorization cue, there should be no difference between estimates of baseline trials in the two conditions, since the baseline distribution is the same in both cases.
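The three competing predictions can be made concrete with a small calculation. The sketch below computes, for each hypothesis, the mean of the category that a baseline trial would be assigned to, assuming equal numbers of baseline and alternative trials (as in the design) and treating a merged category's mean as the simple average of the two distribution means. These values indicate only the direction in which baseline estimates would be pulled; they are not predicted estimates, and the hypothesis labels are our own.

```python
BASELINE_MEAN = 65.0
ALT_MEAN = {"high": 55.0, "low": 35.0}  # High / Low mean alternative

def category_mean(condition: str, merged: bool) -> float:
    """Mean of the category containing baseline trials.

    If the two colors are merged (equal trial counts assumed), the
    category mean is the average of the two distribution means.
    """
    if merged:
        return (BASELINE_MEAN + ALT_MEAN[condition]) / 2.0
    return BASELINE_MEAN

# Which conditions lead to merging under each (hypothetical) account.
hypotheses = {
    "always_merge": {"high": True, "low": True},
    "always_split": {"high": False, "low": False},
    "overlap_dependent": {"high": True, "low": False},
}

# Difference in the pull on baseline estimates: High minus Low.
predicted_diff = {
    name: category_mean("high", merge["high"]) - category_mean("low", merge["low"])
    for name, merge in hypotheses.items()
}

for name, diff in predicted_diff.items():
    print(f"{name}: High - Low = {diff:+.1f}")
```

Only the overlap-dependent account yields a negative High minus Low difference, which is the signature tested in the results.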

Materials and methods

Participants

Fourteen students participated in the experiment for course credit or monetary compensation ($10). All subjects gave informed consent and the study was approved by the Princeton University Institutional Review Board.

Procedure

Stimuli consisted of colored circles displayed in a random spatial configuration within a bounded section of the computer screen. On each trial, the participant was presented with a pattern of randomly scattered (occasionally overlapping) circles (Figure 1), where the number of circles was drawn from a Gaussian with a category-specific mean and variance. There were two trial types: “baseline” trials, in which the number of circles was drawn from a Gaussian with mean 65 and standard deviation 10, and “alternative” trials. In the “High mean alternative” block, the alternative trials were drawn from a Gaussian with mean 55 and standard deviation 10. In the “Low mean alternative” block, the alternative trials were drawn from a Gaussian with mean 35 and standard deviation 10. In all cases, the number of circles was truncated between 10 and 100, and rounded to the nearest integer. The two categories in each block were each associated with a different color of circles (randomly chosen).
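A minimal sketch of how the circle counts for one block might be generated is given below. The original experiment was implemented in Matlab; this Python version is only illustrative. It assumes that "truncated" means clipping out-of-range draws to the bounds (resampling would be another reading), and it uses the block structure of 10 baseline and 10 alternative trials in random order. All function and parameter names are hypothetical.

```python
import random

def sample_num_circles(mean, sd, rng, lo=10, hi=100):
    """Draw a circle count from a Gaussian, clip to [lo, hi], round.

    Clipping is an assumption; the text says only "truncated".
    """
    draw = rng.gauss(mean, sd)
    clipped = min(max(draw, lo), hi)
    return int(round(clipped))

def make_block(alt_mean, rng, n_per_type=10, baseline_mean=65.0, sd=10.0):
    """Build one block of randomly interleaved (trial_type, count) pairs."""
    trials = [("baseline", sample_num_circles(baseline_mean, sd, rng))
              for _ in range(n_per_type)]
    trials += [("alternative", sample_num_circles(alt_mean, sd, rng))
               for _ in range(n_per_type)]
    rng.shuffle(trials)  # random interleaving of the two trial types
    return trials

rng = random.Random(0)                      # fixed seed for reproducibility
block = make_block(alt_mean=55.0, rng=rng)  # a High mean alternative block
for trial_type, n_circles in block[:5]:
    print(trial_type, n_circles)
```

Passing `alt_mean=35.0` instead would produce a Low mean alternative block.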

The participant was given 5 s to enter a two-digit estimate of the number of circles on the screen using the keyboard; if no response was entered within this time limit, a message indicated that the response was too slow and the trial was subsequently not used in data analysis. The circles remained on the screen during the 5 s response interval. After entering a response, the participant received feedback indicating the correct number of circles. Each subject performed eight blocks of the High mean alternative condition and eight blocks of the Low mean alternative condition (randomly interleaved), with 20 trials in each block (10 baseline and 10 alternative, randomly interleaved). All experiments were implemented in Matlab (Version 7.9.0.529) using the Psychophysics toolbox (Brainard, 1997).

We used paired-sample, two-sided t-tests to compare conditions. Effect sizes were measured using Cohen's d. We excluded subjects whose average errors (in terms of distance from the true mean) on alternative trials were greater than two standard deviations from the mean across all three experiments. No subjects were excluded from Experiment 1.
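For readers unfamiliar with these statistics, the sketch below shows one way to compute a paired-sample t statistic and an effect size from per-subject condition means. The text does not say which variant of Cohen's d was used; the version here (mean difference divided by the standard deviation of the differences) is an assumption, although it satisfies d = t / sqrt(n), which agrees with the values reported for Experiment 1 (for example, 2.41 / sqrt(14) ≈ 0.64). The data are made up for illustration, and the p-value (from a t distribution with n - 1 degrees of freedom) is not computed.

```python
import math
from statistics import mean, stdev

def paired_t_and_d(x, y):
    """Paired-sample t statistic, Cohen's d, and degrees of freedom.

    x and y hold one value per subject, in the same subject order.
    d is defined here as mean(diff) / sd(diff), an assumed variant.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)              # sample SD (n - 1 denominator)
    t = m / (s / math.sqrt(n))
    d = m / s
    return t, d, n - 1

# Hypothetical per-subject mean estimates in two conditions.
cond_a = [64.0, 66.0, 63.0, 65.0]
cond_b = [62.0, 63.0, 62.0, 61.0]

t_stat, cohen_d, df = paired_t_and_d(cond_a, cond_b)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```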

Results and discussion

The average responses on baseline trials in each condition are shown in Figure 2 (right). Estimates of the number of circles on baseline trials in the High mean alternative condition (mean = 62.25) were significantly lower than in the Low mean alternative condition (mean = 63.86) [t₍₁₃₎ = 2.41, p < 0.05, d = 0.64]. Moreover, the estimates were significantly lower than the true average in the High mean condition [t₍₁₃₎ = 3.19, p < 0.05, d = 0.85] but not in the Low mean condition (p = 0.30). These results are consistent with the hypothesis that participants are more likely to assign the alternative and baseline distributions to the same category in the High mean alternative condition than in the Low mean alternative condition, due to greater overlap between the distributions in the former but not in the latter.

We also examined the estimates on alternative trials. The average number of circles reported by participants closely tracked the true average: 55.44 for the High mean alternative condition and 36.47 for the Low mean alternative condition. T-tests confirmed that average participant estimates were not significantly different from the true average (p = 0.51 for the High mean alternative condition and p = 0.07 for the Low mean alternative condition).

If participants indeed merged the baseline and alternative categories in the High mean alternative condition, one might argue that we should also have seen regularization effects on the alternative trials. While we saw no evidence for such regularization in the trial-averaged data, it may be the case that regularization effects operate over timescales that are shorter than a whole block. To test this hypothesis, we calculated the correlation between estimates on each baseline trial and the preceding alternative trial (note that, due to the randomized trial order, the preceding alternative trial might have been several trials back). We reasoned that if estimates are influenced by recently experienced trials, then this correlation should be positive. Importantly, this should only occur if both trials were assigned to the same merged category. Figure 3 (left) shows the results of this analysis: Fisher z-transformed correlations were significantly greater than 0 in the High mean alternative condition [t₍₁₃₎ = 3.14, p < 0.01, d = 0.84] but not in the Low mean alternative condition (p = 0.86). We also examined the influence of baseline trials on subsequent alternative trials (Figure 3, right): again, Fisher z-transformed correlations were significantly greater than 0 in the High mean alternative condition [t₍₁₃₎ = 4.47, p < 0.001, d = 1.19] but not in the Low mean alternative condition (p = 0.50). These results are consistent with the hypothesis that the High mean alternative condition promotes category merging while the Low mean alternative condition does not.
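The sequential analysis can be sketched as follows for a single subject: pair each baseline estimate with the estimate from the most recent alternative trial, compute the Pearson correlation across pairs, and apply the Fisher z-transform (the inverse hyperbolic tangent). The per-subject z values would then be tested against 0 across subjects. Details such as whether pairs may span block boundaries are not given in the text, so this is a sketch under assumptions; the helper names and the toy data are hypothetical.

```python
import math

def preceding_pairs(trial_types, estimates,
                    target="baseline", source="alternative"):
    """Pair each target-trial estimate with the latest source-trial estimate."""
    pairs = []
    last_source = None
    for kind, est in zip(trial_types, estimates):
        if kind == source:
            last_source = est
        elif kind == target and last_source is not None:
            pairs.append((last_source, est))
    return pairs

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy trial sequence for one subject (made-up estimates).
types = ["alternative", "baseline", "baseline", "alternative",
         "baseline", "alternative", "baseline"]
ests = [50, 60, 61, 58, 66, 54, 63]

pairs = preceding_pairs(types, ests)
r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
z = math.atanh(r)  # Fisher z-transform
print(f"{len(pairs)} pairs, r = {r:.3f}, z = {z:.3f}")
```

Swapping the `target` and `source` arguments gives the complementary analysis of baseline trials' influence on subsequent alternative trials.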

Figure 3

The correlation analyses reported above also rule out an alternative explanation of our findings in terms of contrast effects. According to this explanation (see Holland and Lockhead, 1968), contrast between the baseline and alternative categories is accentuated in the Low mean alternative condition, causing participants to produce higher estimates for baseline trials compared to estimates in the High mean alternative condition. Such a contrast explanation would predict negative correlations between estimates in the baseline and alternative trial types; yet we found no evidence for negative correlations.

Experiment 2

Our second experiment was identical to Experiment 1 in all respects except that we manipulated the variances of the distributions rather than their means, as illustrated in Figure 4 (left). This manipulation was again expected to affect the likelihood of splitting or merging perceptual categories. Specifically, the High variance condition resulted in greater overlap between the alternative and baseline distributions as compared to the Low variance condition, leading to the prediction that estimates of baseline trials in the High variance condition would be regularized downward more than in the Low variance condition.

Figure 4