Metacognitive Myopia in Hidden-Profile Tasks: The Failure to Control for Repetition Biases

Fiedler, Klaus; Hofferbert, Joscha; Wöllert, Franz

doi:10.3389/fpsyg.2018.00903

ORIGINAL RESEARCH article

Front. Psychol., 05 June 2018

Sec. Cognition

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.00903

This article is part of the Research Topic Judgment and Decision Making Under Uncertainty: Descriptive, Normative, and Prescriptive Perspectives.


Klaus Fiedler^*

Joscha Hofferbert

Franz Wöllert

Department of Psychology, Heidelberg University, Heidelberg, Germany

The failure to exploit collective wisdom is evident in the conspicuous difficulty of solving hidden-profile tasks. While previous accounts focus on group dynamics and motivational biases, the present research applies a metacognitive perspective to an ordinary learning approach. Assuming that evaluative learning is sensitive to the frequency with which targets are paired with positive versus negative attributes, selective repetition of targets’ assets and deficits will inevitably bias the resulting evaluations. Because selective repetition effects are ubiquitous, metacognitive monitoring and control functions are required to correct for repetition biases. However, three experiments show that metacognitive myopia prevents judges from correcting, even when they are explicitly warned to ignore selective repetition (Experiment 1), when same-speaker repetitions rule out social validation (Experiment 2), and when blatant debriefing enforces superficial corrections (Experiment 3). For a comprehensive understanding of collective judgments and decisions, it is essential to take metacognitive monitoring and control into account.

Introduction

Democratic societies rely on the belief that arduous tasks that exceed individual persons’ capacity can be managed collectively. Performance and motivation can be enhanced if the overall workload is divided. However, for many judgment and decision problems – such as health risk assessment or personnel selection – the need to coordinate and integrate collective efforts creates a serious difficulty. Information can vary in trustworthiness and validity, arguments may be redundant or in conflict, and individual opinions may rely on different sources and sample sizes. Still, in democratic societies, virtually all important decisions are made collectively.

Despite the trust in the superiority of collective knowledge and in the wisdom of crowds (Surowiecki, 2004; Mannes et al., 2014), several decades of empirical research have drawn a rather pessimistic picture. Collective brainstorming was shown to decrease productivity (Diehl and Stroebe, 1987), group discussion can cause polarization and over-statement (Brauer et al., 1995; McCauley, 1998), and others’ advice is not utilized appropriately (Yaniv et al., 2009).

Conspicuous Evidence From Hidden-Profile Tasks

Research on hidden-profile tasks illuminates this failure to exploit the potential advantage of collective wisdom (Stasser and Titus, 1985, 2003; Lu et al., 2012; Schulz-Hardt and Mojzisch, 2012). In this paradigm, part of the information about decision options (applicants, products) is shared by everybody, while other, unshared information is exclusively available to single individuals (see Table 1). Although Candidate A (six positive and three negative attributes) is clearly superior to Candidate B (three positive and six negative attributes), the subset of information available to each of the three individual judges J1, J2, and J3 favors B (three positive; two negative) over A (two positive; three negative). This is possible because A’s few deficits and B’s few assets are shared (dark gray), whereas A’s many assets and B’s many deficits are unshared, distributed such that each judge knows only two of them. As a consequence, all three judges will agree on a wrong decision. The only chance to uncover the hidden profile seems to be the collective exchange of all raw arguments about all candidates’ assets and deficits. However, a growing body of evidence shows that people rarely manage to transcend their individual perspective and to identify a hidden profile (Lu et al., 2012; Schulz-Hardt and Mojzisch, 2012). Several explanations that have been offered for this persistent deficit converge on emphasizing group-dynamic influences and social motives.


TABLE 1. Structure of a hidden-profile problem.
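The arithmetic behind the hidden profile in Table 1 can be checked in a few lines of Python. This is a minimal sketch: the table itself is not reproduced here, so the item lists below are hypothetical labels reconstructed from the counts given in the text (A’s three deficits and B’s three assets are shared; each judge additionally holds two of A’s assets and two of B’s deficits).

```python
from collections import Counter

# Hypothetical item labels reconstructing the counts described for Table 1:
# A has 6 assets (+) and 3 deficits (-); B has 3 assets and 6 deficits.
SHARED = [("A", "-")] * 3 + [("B", "+")] * 3  # known to every judge

# Unshared items: each judge holds 2 of A's assets and 2 of B's deficits.
UNSHARED = {
    judge: [("A", "+")] * 2 + [("B", "-")] * 2
    for judge in ("J1", "J2", "J3")
}


def net_scores(items):
    """Return assets minus deficits for each candidate."""
    counts = Counter(items)
    return {c: counts[(c, "+")] - counts[(c, "-")] for c in ("A", "B")}


# What each judge sees individually: shared items plus their own unshared ones.
individual = {j: net_scores(SHARED + own) for j, own in UNSHARED.items()}

# The pooled profile: shared items once, plus every judge's unshared items.
pooled_items = SHARED + [item for own in UNSHARED.values() for item in own]
pooled = net_scores(pooled_items)

print(individual)  # every judge: A = -1, B = +1, so B looks better
print(pooled)      # pooled: A = +3, B = -3, so A is actually superior
```

Only pooling the unshared items reverses the individual preference, which is why the exchange of raw arguments is the key to solving the task.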

The most prominent accounts focus on a shared-information bias. Shared arguments are more likely to be mentioned and repeated in group discussions than unshared arguments (Stasser et al., 1989; Larson et al., 1994; Mesmer-Magnus and DeChurch, 2009), for two reasons. First, shared arguments are known by more than one discussant and are therefore more likely than unshared arguments to be mentioned by at least one discussant (Larson et al., 1994; Larson and Harmon, 2007). Second, shared arguments are socially rewarding and serve to enhance one’s self-esteem (Wittenbaum et al., 1999). Complementing the shared-information bias is a bias to discuss (Dennis, 1996; Faulmüller et al., 2012) or to believe in the validity of preference-consistent arguments (Edwards and Smith, 1996; Greitemeyer and Schulz-Hardt, 2003; Faulmüller et al., 2010). Perceived validity should be enhanced when arguments are shared or consistent with one’s own preferences (Yaniv and Kleinberger, 2000; Volzhanin et al., 2015).

Other accounts have started to examine the cognitive basis of the shared-information bias. As shared arguments are introduced and repeated more frequently (Stasser, 1988; Stasser et al., 1989; Larson and Harmon, 2007), they have a natural memory advantage over unshared arguments. This advantage could interfere with the solution of hidden-profile tasks, which draw heavily on the utilization of less well-memorized unshared and preference-inconsistent items. Indeed, a number of classical studies testify to the extra persuasive impact of information repetition (Wilson and Miller, 1968; Chalmers, 1971; Cacioppo and Petty, 1979) and to the enhanced attractiveness and preference due to repeated exposure (Zajonc, 1968; Bornstein, 1989). Similar biases favoring repeated arguments can be found in a few hidden-profile studies (Van Swol et al., 2003; Schulz-Hardt et al., 2016).

However, despite the evidence on the advantage of shared or preference-consistent arguments, hidden-profile research has so far not considered an alternative explanation in terms of the simple and uncontested principle that all inductive learning increases with the number of trials. Without any group discussion or prior commitment to individual preferences, independent of motivational factors such as social utility or subjective validity of arguments, and even when every item is given the same attention in an unbiased process, evaluative learning should reflect the number of trials providing positive and negative evidence for different targets. For every stimulus item linking a target to a positive (negative) attribute, an increment (decrement) should be added to the evaluation of that target. This valence-updating process should be sensitive to repetitions, not only to novel stimuli, as evident from work on evaluative conditioning (Hofmann et al., 2010) and instance-based learning (Gonzalez and Dutt, 2011). Thus, an unbiased learning mechanism affords a sufficient explanation of the impact of repetition, independent of motivated biases like social sharing, preference consistency, or social validation (Boos et al., 2013).
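The valence-updating rule can be sketched as a simple tally learner. This is an illustration under stated assumptions, not the authors’ model: every trial is assumed to add a fixed increment (+1) or decrement (−1) to its target, and the trial stream below is hypothetical. Note that the learner has no way of telling a repetition from a new attribute.

```python
def learn_evaluations(trials, step=1.0):
    """Tally-style evaluative learning (a simplified, assumed sketch).

    Each trial is a (target, valence) pair with valence +1 or -1.
    Every trial adds `step` times its valence to that target's evaluation,
    whether the attribute is new or a repetition.
    """
    evaluation = {}
    for target, valence in trials:
        evaluation[target] = evaluation.get(target, 0.0) + step * valence
    return evaluation


# Hypothetical stream: X has 2 assets and 3 deficits, but its assets are
# presented three extra times; Y has 3 assets and 2 deficits, none repeated.
trials = (
    [("X", +1)] * 2 + [("X", +1)] * 3 + [("X", -1)] * 3
    + [("Y", +1)] * 3 + [("Y", -1)] * 2
)
print(learn_evaluations(trials))  # {'X': 2.0, 'Y': 1.0}
```

Although Y has the better attribute set, the repeated assets give X the higher learned evaluation.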

While such an unbiased, ordinary-learning account calls for the manipulation of repetition as an independent variable, almost all previous studies have treated repetition as a dependent variable, showing that shared information is likely to be repeated. Moreover, the two available publications by Van Swol et al. (2003) and by Schulz-Hardt et al. (2016) rely on restricted task set-ups (e.g., including only two choice alternatives rather than profiles over several targets; convenient protocol sheets reducing memory demands; repetition confounded with preference consistency). Theoretically, both studies focus on distinct cognitive illusions. Van Swol et al. (2003) interpret the obtained repetition bias in terms of a truth bias (Arkes et al., 1991; Boehm, 1994). A similar point is made by Weaver et al. (2007), who argue that the enhanced fluency of repeated arguments should produce a repetition bias, regardless of social validation. Schulz-Hardt et al. (2016) favor a projective variant of social validation, assuming that repetition leads people to infer that other people share repeated opinions.

Ordinary Learning and Metacognitive Myopia

The aim of the present article differs from all previous work on hidden profiles. Starting from the basic premise that learned evaluations are sensitive to the number of trials, we provide participants with unequal opportunities to learn positive and negative evaluations of four target persons. Impression judgments should reflect the number of trials conveying targets’ assets and deficits. Whether an argument is new or redundant, whether repeated arguments stem from the same or from independent sources, whether the learning experience is fluent or effortful, and whether it takes place in group discussions or individual encounters, a basic prediction holds: evaluative learning is a monotonically increasing function of the frequency of positive minus negative arguments.

To be sure, the amount of new information may be reduced when the stimulus series involves repeated, overlapping, or fully redundant arguments. Yet, merely repeating the same stimuli benefits learning. Although novel and surprising stimuli trigger stronger learning (Rescorla and Wagner, 1972; Sutton and Barto, 1981), a more fundamental rule says that all trials, whether novel or repetitive, benefit learning. Even plain repetitions foster rehearsal, elaborate encoding, and consolidation, and they decrease the chances that arguments will be lost, overlooked, or forgotten.¹ This basic assumption not only accounts for a variety of biases in judgment and decision making (Fiedler, 1996; Fiedler et al., 2002; Lightle et al., 2009); it also offers a new perspective on hidden profiles.
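The two claims in the preceding paragraph, that surprising trials produce the largest gains and that repetitions still add to learning, can both be seen in the standard Rescorla–Wagner update, ΔV = αβ(λ − V). The sketch below uses that textbook rule with arbitrary, hypothetical parameter values (αβ = 0.3, λ = 1); it is not a model fitted in the present experiments.

```python
def rescorla_wagner(n_trials, alpha_beta=0.3, lam=1.0, v0=0.0):
    """Associative strength V over repeated identical pairings.

    Standard Rescorla-Wagner update: V <- V + alpha_beta * (lam - V).
    Returns V after each trial. Parameter values are illustrative only.
    """
    v = v0
    history = []
    for _ in range(n_trials):
        v += alpha_beta * (lam - v)  # prediction error shrinks as V nears lam
        history.append(v)
    return history


values = rescorla_wagner(6)
# Gain contributed by each trial (starting from the default v0 = 0.0).
gains = [b - a for a, b in zip([0.0] + values, values)]
print([round(v, 3) for v in values])
print([round(g, 3) for g in gains])
```

The gains shrink from trial to trial, reflecting diminishing surprise, but every repetition still yields a positive increment, so repeated pairings keep pushing the learned value upward.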

For an experimental demonstration, it is necessary to strip the hidden-profile task of all influences other than repetition. Such a modified set-up appears in Table 2; it is the stimulus distribution used in the experiments below. The entire profile of all information about four candidates, A, B, C, and D, is available to all individual participants, indicating a clear-cut preference order D > C > B > A.² There is no group discussion, no motive to defend one’s predetermined individual preferences, and no distinction between shared and unshared information. However, the selective repetition of some of the arguments creates a conflict between the actual set sizes and the repetition frequencies of positive and negative attributes. Although B is clearly inferior to D, B’s fewer assets are repeated more often and B’s more numerous deficits are repeated less often than D’s assets and deficits, respectively, making it easier to learn assets and harder to learn deficits for B than for D. Judgments should thus exhibit a bias to favor B over D.


TABLE 2. Two stimulus distributions (Series 1 and Series 2) used to study repetition biases.

In the present set-up, finding the hidden profile of substantial information requires judges to ignore the repeated part of the superficially presented information, unlike the common task set-up in which the hidden profile includes additional (unshared) items. Thus, our design highlights that the concept of a hidden profile does not depend on the specific case involving unshared items.

Metacognitive Monitoring and Correction

Most collective learning is subject to selective repetition: because of unequal proportions of majority and minority groups and variation in the information revealed by the environment, some arguments are more likely to be presented and repeated than others. But should it really be impossible to overcome this problem?

Taking a metacognitive perspective suggests an answer and a possible remedy. Because unequal sample sizes and repetition rates are ubiquitous in the real world, homo sapiens should have evolved metacognitive devices to monitor and correct for the impact of repetition. In the hidden-profile paradigm, selective repetition ought to be detected and corrected for (e.g., B should be corrected downward and D upward). From such a metacognitive perspective, it is not sufficient to point out that ordinary learning is sensitive to repetition; it is also necessary to explain why repetition and unequal validity are not corrected for.
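The correction that metacognition would have to perform can be made concrete as a normative benchmark: count every presentation (what ordinary learning does) versus count each distinct attribute once (what a repetition-corrected judgment would require). This is a minimal sketch under an explicit assumption, namely that the judge could tell which presentations are paraphrases of the same attribute; the item ids and frequencies below are hypothetical, not the values of Table 2.

```python
from collections import Counter


def raw_and_corrected(presented):
    """Compare a frequency-based evaluation with a repetition-corrected one.

    `presented` is a list of (target, item_id, valence) tuples; repeated
    paraphrases of one attribute share the same hypothetical item_id.
    """
    raw, corrected = Counter(), Counter()
    for target, _item, valence in presented:
        raw[target] += valence  # every presentation counts
    for target, _item, valence in set(presented):
        corrected[target] += valence  # each distinct attribute counts once
    return dict(raw), dict(corrected)


# Hypothetical pattern: B has 2 assets (each shown 3 times) and 3 deficits;
# D has 4 assets and 2 deficits, each shown once.
presented = (
    [("B", "b_asset1", +1)] * 3 + [("B", "b_asset2", +1)] * 3
    + [("B", f"b_deficit{i}", -1) for i in (1, 2, 3)]
    + [("D", f"d_asset{i}", +1) for i in (1, 2, 3, 4)]
    + [("D", "d_deficit1", -1), ("D", "d_deficit2", -1)]
)
raw, corrected = raw_and_corrected(presented)
print("frequency-based:", raw)             # favors B (3 vs. 2)
print("repetition-corrected:", corrected)  # favors D (2 vs. -1)
```

The two tallies order B and D in opposite directions; metacognitive myopia amounts to judgments that track the first tally even when the second is what the task requires.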

The present approach relates an ordinary learning account to the intriguing notion of metacognitive myopia (Fiedler, 2000, 2012). Numerous findings demonstrate that sampling biases and repetition biases remain undetected and uncorrected at the metacognitive level (Fiedler et al., 2000, 2002, 2016; Unkelbach et al., 2007; Fiedler, 2012; Powell et al., 2017). For instance, Unkelbach et al. (2007) asked participants to assess how often 10 different shares were among the daily winners in a stock-market game. On some days, they watched two TV programs so that the winners were presented twice, creating a repetition bias in favor of these repeated daily winners. The chief determinant of the resulting evaluations and share preferences was the presentation frequency, regardless of whether presentations reflected new winning outcomes or mere repetitions. A strong and robust repetition bias persisted even when participants were explicitly warned not to be misled by mere repetitions.

Because of many similar findings in various paradigms (for a review, see Fiedler, 2012), we expected metacognitive myopia to extend to hidden profiles. Learned preferences should be markedly biased, due to the failure to correct for apparent repetitions. Even explicit debriefing and warnings to ignore repetitions should not eliminate the bias. This expectation is easy to understand theoretically. One cannot tell one’s cognitive system to stop learning from repetitions (cf. Koriat, 1997; Fiedler et al., 2016; Powell et al., 2017), just as one cannot instruct oneself to stop learning from repeated CS-US pairings in Pavlovian conditioning.

Previous work on hidden profiles never mentioned the need for metacognitive monitoring and control, although metacognitive constructs were considered. Thus, Schulz-Hardt et al. (2016) assumed that discussion partners’ repetitions will reinforce the subjective validity rather than triggering an attempt to correct for repetition bias. Similarly, Weaver et al.’s (2007) notion that fluency mediates the evaluation of repeated arguments is suggestive of naïve and uncritical influences of metacognitive cues. The notion of metacognitive myopia is fundamentally different. We argue that a comprehensive account must not only explain why repetition biases (and feelings of fluency or social validity, and countless other biases) arise in the first place. It must also explain why repetition biases go undetected and uncorrected at the metacognitive level.

Preview of Experiments and Predictions

For an empirical test of these considerations, we exposed individual participants to an audio-recorded protocol of verbal descriptions of positive and negative attributes of four target persons (A, B, C, and D). A cover story explained that the targets were applicants for a room in a shared flat and that the stimulus descriptions reflected the flat mates’ experiences with different subsets of applicants. To rule out group dynamics and social reward motives, participants were not engaged in group discussions but were individually exposed to a pooled (audiotaped) profile.

The four applicants varied in the effective number of positive versus negative attributes, such that the unequivocally correct preference order (D > C > B > A) should be apparent in a no-repetition baseline condition. However, by selectively repeating subsets of the targets’ positive and negative attributes (Table 2), the resulting presentation frequencies yielded a different ordering. In Experiment 1, this should cause a shift from the correct order D > C > B > A toward the repetition-based ordering B > D > A > C. We expected that judges would fail to correct for repetition spontaneously. Even an explicit warning not to be misled by repetitions, given in one of two conditions, should not undo the basic repetition effect on evaluative learning. Experiment 2 was devoted to another aspect of metacognitive myopia, namely, low sensitivity to variation in social validation. A repetition bias should be obtained regardless of whether repetitions came from the same source or from different flat mates (implying social validation).

In Experiment 3, the design was extended to include recall and recognition measures in addition to evaluative ratings, to substantiate the assumption that repetition fosters learning. To increase the reliability of the memory tests, the number of items was doubled, and four different patterns of target-item allocation were used to enhance external validity.

Moreover, Experiment 3 allowed for a more refined test of the meta-cognitive inability to correct one’s evaluative judgments. Instead of instructions not to learn from repetitions, which may be impossible, participants in one condition were informed that repetitions came from one flat mate who had vested interests in manipulating the decision. Such a cheater-detection prompt (Cosmides, 1989) entails an obvious demand to correct the final ratings of D relative to B. The vested-interest scenario should therefore motivate a local correction. However, the correction should not undo the impact of selective repetition on implicit learning, as evident in a persistent repetition bias in recall and recognition. Thus, despite the local correction of immediate ratings, the memory data may reveal that repetition biases have become an irreversible social reality.

Experiment 1

Methods

Participants and Design

Eighty-five participants (29 males and 56 females, mean age = 23.73, SD = 3.75) received either course credit or 3 euros. One participant who did not complete the major dependent measures was excluded. The remaining 84 were randomly assigned to two instruction groups (warning vs. no warning). Another group of 15 participants received the same stimulus tape, from which all repetitions had been removed, to check the premise that without repetitions the correct preference order (D > C > B > A) can be identified. Set sizes and numbers of positive and negative attributes per target (A, B, C, and D) varied within participants (Table 2).

In the absence of effect size estimates from similar research, the number of participants required to meet a power criterion was hard to determine in advance. Given the rather large effect sizes obtained in Experiment 1, the larger samples in Experiments 2 and 3 ensured overpowered tests, as evident from the results reported below.

Materials

In a pretest, 80 items describing positive (e.g., “He respects and pays heed to other people’s privacy,” “He always tries to preserve the harmony in the shared flat”) and negative attributes (e.g., “He is not very hospitable,” “He transfers a bad temper easily to his flat mates”) were rated by 26 judges for valence and importance for flat sharing. Two different stimulus series were constructed, such that the attributes of the four targets (cf. Table 2) were balanced for valence and importance. Only Series 1 was used in Experiment 1. Repetitions involved slightly altered but semantically invariant paraphrases of original items (e.g., “It is very hard to get him to help with the housework” repeated as “Getting him to help with the housework is very hard”). All items were spoken by three male volunteers; repetitions of the same item always came from different voices (flat mates). Because all information about each target was presented as a block in random order, repetitions were maximally detectable. Block order was counterbalanced.
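The presentation constraints just described (random order within each target block, a different voice for every presentation of the same item, counterbalanced block order) can be sketched as follows. Everything here is illustrative: the voice labels, item ids, and example blocks are hypothetical, and the cyclic Latin-square rotation is only one possible counterbalancing scheme; the text does not say which scheme was used.

```python
import random

# Hypothetical speaker labels; the study used three male volunteers.
VOICES = ("voice_1", "voice_2", "voice_3")


def rotated_orders(targets):
    """Cyclic Latin-square block orders (one assumed counterbalancing scheme)."""
    return [targets[i:] + targets[:i] for i in range(len(targets))]


def build_sequence(blocks, block_order, seed=0):
    """Return a list of (target, item_id, voice) presentations.

    `blocks` maps each target to its item ids; a repeated attribute simply
    appears several times. Items are shuffled within each target's block,
    and every presentation of the same item is assigned a different voice.
    """
    rng = random.Random(seed)
    sequence = []
    for target in block_order:
        items = list(blocks[target])
        if max(items.count(i) for i in items) > len(VOICES):
            raise ValueError("more repetitions than distinct voices")
        rng.shuffle(items)  # random order within the target block
        used = {}  # item_id -> voices already assigned to it
        for item in items:
            free = [v for v in VOICES if v not in used.get(item, set())]
            voice = rng.choice(free)
            used.setdefault(item, set()).add(voice)
            sequence.append((target, item, voice))
    return sequence


# Hypothetical blocks (not the actual Table 2 allocation): item ids repeat
# when an attribute is presented more than once.
example_blocks = {
    "A": ["a+1", "a-1", "a-1", "a-2"],
    "B": ["b+1", "b+1", "b+1", "b-1"],
    "C": ["c+1", "c+2", "c-1"],
    "D": ["d+1", "d+2", "d+2", "d-1"],
}
order = rotated_orders(list("ABCD"))[1]  # ['B', 'C', 'D', 'A']
for target, item, voice in build_sequence(example_blocks, order):
    print(target, item, voice)
```

Keeping each target’s information in one block is what makes repetitions maximally detectable: a paraphrase always appears close to its original.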

Procedure

The entire experiment was computer-administered. Participants were asked to imagine living in a flat with four people and looking for a new flat mate to replace one who had moved out. A casting would take place, during which applicants were interviewed by three flat mates. Not all of them were present when the applicants appeared, so the decision had to rely on a combined report of all flat mates’ experiences with subsets of applicants. One experimental group received an explicit warning not to be misled by repetition: “Some attributes of applicants may be stated repeatedly. Do not incorporate these repetitions in your evaluation.” This warning was not provided to the other group. Afterwards, participants rated the targets on five trait dimensions covering the meaning of the stimulus attributes (agreeable, communicative, appreciative, companionable, helpful; on graphical scales anchored “not at all” and “very much”). They also provided an overall evaluation of each candidate in response to the single item “How much would you like to share your flat with applicant X?” All ratings were provided on graphical sliding scales; ratings were linearly transformed to numerical scores from 0 to 100. The entire experiment lasted between 10 and 15 min. The materials and computer procedures can be found under the following link: https://drive.google.com/drive/folders/1atdnNdyKAcdVhbWI6X-YOgg1itGpCkIQ?usp=sharing

Results and Discussion

In accordance with the transparency norm, all empirical data are publicly available. To access them, click on “Hidden prof” on the following site: http://www.psychologie.uni-heidelberg.de/ae/crisp/studies/index.html

Average evaluation scores were computed across all five ratings. To make sure that, in the absence of repetitions, the stimulus attributes induced the intended ordering of targets (D > C > B > A), 30 participants provided baseline ratings of the targets in a questionnaire (using exactly the same rating scales and instructions as indicated above). Two subgroups evaluated targets described by the two different versions of the stimulus series. These baseline ratings were also used to estimate the internal consistency of the five-item evaluation, which amounts to α = 0.91 when based on ratings averaged across all 30 judges, and α = 0.76 when the five ratings were used to discriminate among all 120 individualized targets (30 judges × 4 targets). For convenience, we analyzed unweighted average ratings.³
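For readers who want to recompute the internal-consistency estimates from the public data, the standard Cronbach’s alpha formula is sketched below. The toy ratings are invented and are not the study’s data. Which rows enter the computation is an assumption: one reading of the text is four rows (targets, averaged across judges) for α = 0.91 and 120 rows (judge × target) for α = 0.76.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of cases, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two cases")
    columns = list(zip(*rows))
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


# Toy data (not the study's ratings): four targets x five trait ratings
# (agreeable, communicative, appreciative, companionable, helpful).
ratings = [
    [20, 25, 18, 22, 30],
    [35, 40, 30, 38, 33],
    [60, 55, 65, 58, 62],
    [70, 72, 68, 75, 66],
]
print(round(cronbach_alpha(ratings), 3))
```

Because the five traits covary strongly across targets in this toy example, alpha comes out close to 1; weaker covariation across individual judge-target rows lowers it.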

Baseline Impressions

Means and standard deviations of the baseline evaluation scores (without repetitions) are shown in Table 3 (top row). Evidently, the stimulus series induced more positive impressions of the two superior targets (D, C) than of the two inferior targets (B, A), although the two targets within each pair received similar ratings. While the four evaluation scores should ideally have produced a linear increase from A to D, the stepped line graphs in Figure 1 suggest that the baseline evaluations were mainly sensitive to the difference between the two superior (D, C) and the two inferior targets (A, B).


TABLE 3. Means and standard deviations (italics) of target evaluations obtained in Experiments 1 and 2, as a function of instruction conditions (extra warning vs. no warning to ignore repetitions) and two stimulus series.


FIGURE 1. Mean evaluative ratings (averaged across traits) of target persons A, B, C, D by experimental conditions (warning vs. no warning vs. no-repetition baseline) across experiments.

For a statistical test of the intended baseline ordering, we followed Rosenthal and Rosnow’s (1985) advice to test focused contrasts rather than run omnibus analyses of variance, calculating a contrast score that captures a linear increase in evaluative ratings from A to D. This contrast score was the sum of each participant’s mean evaluations of A, B, C, and D, weighted by the baseline contrast coefficients -1.5, -0.5, +0.5, +1.5, respectively. Testing this baseline contrast against zero amounts to testing the discriminability of actually existing target differences, independent of repetitions. This premise was indeed met: the mean contrast score was clearly positive, M = +26.79, SD = 25.16 [CI 12.86; 40.72], t(14) = 4.12, d = 2.20, p = 0.001.
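The contrast-score test can be sketched as follows. The coefficients come from the text. Two details are assumptions: the effect size uses d = 2t/√df, the conversion the reported d values for this test appear to follow (the text does not state its formula), and the critical t (2.1448 for df = 14) is passed in by hand to keep the sketch free of non-standard dependencies. Fed with the reported summary values, it reproduces the reported statistics.

```python
from math import sqrt
from statistics import mean, stdev

# Coefficients for targets A, B, C, D (from the text).
BASELINE_WEIGHTS = (-1.5, -0.5, +0.5, +1.5)


def contrast_scores(ratings, weights=BASELINE_WEIGHTS):
    """One contrast score per participant from mean ratings of (A, B, C, D)."""
    return [sum(w * r for w, r in zip(weights, row)) for row in ratings]


def summary_t_test(m, sd, n, t_crit):
    """One-sample t test of a mean contrast score against zero.

    t_crit is the two-tailed 95% critical t for df = n - 1, supplied by the
    caller (e.g., from a t table) to keep this sketch dependency-free.
    """
    se = sd / sqrt(n)
    t = m / se
    d = 2 * t / sqrt(n - 1)  # assumed conversion d = 2t / sqrt(df)
    return {"t": t, "df": n - 1, "d": d, "CI": (m - t_crit * se, m + t_crit * se)}


def scores_t_test(scores, t_crit):
    """Same test, starting from raw per-participant contrast scores."""
    return summary_t_test(mean(scores), stdev(scores), len(scores), t_crit)


# Summary values reported for the no-repetition baseline group (n = 15).
print(summary_t_test(26.79, 25.16, 15, t_crit=2.1448))
```

With raw data, `contrast_scores` would be applied to each participant’s four mean evaluations and the result passed to `scores_t_test`.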

Note also that the sigmoid deviation from a purely linear trend (i.e., the slightly enhanced increase from B to C) cannot account for the repetition bias predicted for the experimental conditions (i.e., B > D > A > C), which implies that C should decrease markedly relative to B. This is evident from a repetition contrast, defined as the sum of the A, B, C, and D evaluations weighted by the contrast coefficients -0.5, +1.5, -1.5, +0.5, respectively, corresponding to the B > D > A > C pattern that reflects a repetition bias. Indeed, in the baseline group this contrast score tended to be negative, M = -12.60, SD = 25.32 [-26.62; 1.42], t(14) = -1.93, d = 1.03, p = 0.074, indicating that, if anything, the baseline evaluations worked against the predicted repetition bias.

A useful feature of the present design is that the baseline and repetition contrasts are orthogonal; the inner (dot) product of -0.5, +1.5, -1.5, +0.5 and -1.5, -0.5, +0.5, +1.5 is exactly 0. This allows us to run independent tests of the impact of the effective number of positive and negative attributes (captured by the baseline contrast) and of the presentation frequencies (captured by the repetition contrast).
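The orthogonality claim, and the ordering implied by the repetition contrast, can be verified directly from the coefficients given in the text. The “linear” rating profile at the end is a hypothetical example, added to show that a pattern following only the baseline ordering leaves the repetition contrast at zero.

```python
# Contrast coefficients for targets A, B, C, D, as given in the text.
BASELINE = {"A": -1.5, "B": -0.5, "C": +0.5, "D": +1.5}
REPETITION = {"A": -0.5, "B": +1.5, "C": -1.5, "D": +0.5}


def dot(u, v):
    """Inner product of two coefficient vectors keyed by target."""
    return sum(u[k] * v[k] for k in u)


def contrast(weights, means):
    """Contrast value for a dict of mean ratings keyed by target."""
    return dot(weights, means)


print(dot(BASELINE, REPETITION))  # 0.0: the contrasts are orthogonal
print(sum(REPETITION.values()))   # 0.0: weights sum to zero
print(sorted(REPETITION, key=REPETITION.get, reverse=True))  # ['B', 'D', 'A', 'C']

# Hypothetical profile rising linearly from A to D: it loads only on the
# baseline contrast and leaves the repetition contrast at zero.
linear = {"A": 10, "B": 20, "C": 30, "D": 40}
print(contrast(BASELINE, linear), contrast(REPETITION, linear))  # 50.0 0.0
```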

Repetition Bias on Target Evaluations

Turning to the experimental groups, the same average evaluation scores and contrast scores were used to analyze evaluations after selective repetition. As evident from the numerical means in the upper part of Table 3 (summarized in Figure 1), the target evaluations reflect a mixture of both determinants, which is, however, clearly dominated by the repetition bias. Although the two superior targets C, D together received slightly higher evaluations than A, B, selective repetition caused a marked increase in the evaluation of A and B, along with a decrease in the evaluation of C and D, relative to the baseline. Explicit instructions to discount repetitions in the warning group did slightly decrease, but clearly not eliminate repetition biases (see Figure 1).

To disentangle the relative impact of the effective set size of different positive versus negative items and of the repetition bias, the baseline-contrast scores and the repetition-contrast scores were tested against zero. Across all 84 participants, the repetition contrast was strong and clearly above zero, M = +14.89, SD = 27.99 [+8.82; +20.96], t(83) = 4.874, d = 1.064, p < 0.001. The baseline contrast scores only slightly exceeded zero, M = +5.01, SD = 27.26 [-0.90; 10.93], t(83) = 1.684, d = 0.367, p = 0.096.

In the condition without an explicit warning to discount repeated items, only the repetition-contrast score was significantly positive, M = +13.72, SD = 30.44 [4.88; 22.56], t(47) = 3.12, d = 0.91, p = 0.003; the baseline-contrast score was not, M = +2.79, SD = 27.57 [-5.22; 10.80], t(47) = 0.70. This means that the repetition bias completely overrode the baseline evaluations.

An explicit warning to discount repetitions slightly increased the baseline-contrast score to a marginally significant level, M = +7.97, SD = 26.94 [-1.14; 17.08], t(35) = 1.78, d = 0.60, p = 0.084. However, the repetition-contrast score remained high and significant, despite the warning, M = +16.44, SD = 24.70 [8.08; 24.80], t(35) = 3.99, d = 1.35, p < 0.001. Indeed, the strength of repetition bias increased slightly after a warning (from 13.72 to 16.44). While this difference was far from being significant, t(82) = 0.44, p = 0.661, it highlights the ineffectiveness of the warning.
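The between-group comparison of repetition-contrast scores can be approximately reproduced from the reported summary statistics. Two assumptions are made here: the group sizes (48 without and 36 with a warning) are inferred from the reported degrees of freedom, t(47) and t(35), and the test is assumed to be a pooled-variance (Student) independent-samples t test, since the text does not name the test. Under these assumptions, the computation recovers the reported t(82) = 0.44.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples Student t test from summary statistics.

    Returns (t, df) for the difference m2 - m1 using the pooled variance.
    """
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (m2 - m1) / se, df


# Repetition-contrast summaries: no-warning group vs. warning group.
t, df = pooled_t(13.72, 30.44, 48, 16.44, 24.70, 36)
print(f"t({df}) = {t:.2f}")
```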

The single-item summary evaluation yielded an ordering similar to that of the overall evaluation score based on five trait ratings (M = 41.26, 51.48, 45.98, 49.62, SD = 23.78, 25.58, 24.72, 24.72, for A, B, C, and D, respectively). Due to the restricted reliability of this single-item measure, though, both the repetition contrast, M = +12.42, SD = 58.01 [-0.17; 25.01], t(83) = 1.96, d = 0.43, p = 0.053, and the baseline contrast, M = +9.78, SD = 51.37 [-1.37; 20.93], t(83) = 1.75, d = 0.38, p = 0.084, fell short of significance.

Altogether, these findings support the notion that even when all collective knowledge is shared, the resulting judgments are clearly biased. Mere repetitions of original items caused a marked bias in favor of A and B and against C and D, as portrayed in Figure 1. This finding fits a basic law of learning: as learning increases with repetitions, it is no wonder that the impact of repeated information on evaluations is enhanced. At the same time, it reflects metacognitive myopia, the inability to correct for selective repetition.

However, because repetitions in Experiment 1 always came from different speakers, they may have been understood as social validation. Although this cannot account for the failure of explicit discounting instructions, it may have facilitated the repetition bias. To rule out this possibility, we conducted a new experiment in which repetitions always came from the same speaker. If metacognition is sensitive to social validation, the repetition bias should disappear, or at least be reduced relative to the different-speaker condition in Experiment 1. Conversely, if clearly redundant same-person repetitions continue to exert a systematic bias, this would lend further support to metacognitive myopia.

Another limitation of Experiment 1 was the constant assignment of attributes to targets. In Experiment 2, we used two different stimulus tapes (Series 1 and 2) with different assignments.