When Success Is Not Enough: The Symptom Base-Rate Can Influence Judgments of Effectiveness of a Successful Treatment

Blanco, Fernando; Moreno-Fernández, María Manuela; Matute, Helena

doi:10.3389/fpsyg.2020.560273

ORIGINAL RESEARCH article

Front. Psychol., 23 October 2020

Sec. Cognition

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.560273

This article is part of the Research Topic Understanding and Overcoming Biases in Judgment and Decision-Making with Real-Life Consequences.


Fernando Blanco¹*

María Manuela Moreno-Fernández¹

Helena Matute²

¹Faculty of Psychology, University of Granada, Granada, Spain
²Faculty of Psychology and Education, University of Deusto, Bilbao, Spain

Patients’ beliefs about the effectiveness of their treatments are key to the success of any intervention. However, because these beliefs are usually formed by sequentially accumulating evidence about the covariation between treatment use and symptoms, it is not always easy to detect when a treatment is actually working. In Experiments 1 and 2, we presented participants with a contingency learning task in which a fictitious treatment was actually effective in reducing the symptoms of fictitious patients. However, the base-rate of the symptoms was manipulated so that, for half of the participants, the symptoms were very frequent before the treatment, whereas for the remaining participants they were observed less frequently. Although the treatment was equally effective in all cases according to the objective contingency between the treatment and healings, participants’ beliefs about the effectiveness of the treatment were influenced by the base-rate of the symptoms: those who observed frequent symptoms before the treatment tended to give lower judgments of effectiveness. Experiment 3 showed that participants were probably basing their judgments on an estimate of effectiveness relative to the symptom base-rate, rather than on contingency in absolute terms. Data, materials, and R scripts to reproduce the figures are publicly available at the Open Science Framework: https://osf.io/emzbj/.

Introduction

Many health-related decisions, such as whether to quit a treatment or to replace it with an alternative, depend on patients’ beliefs about their symptoms and diseases, and particularly about the effectiveness of their treatments. For instance, one of the main reasons for treatment drop-out is the belief that the treatment is producing little or no observable benefit (Leventhal et al., 1992; Dilla et al., 2009). Thus, understanding how patients’ beliefs form and evolve is critical to developing strategies aimed at improving trust in, and adherence to, prescribed treatments, and therefore at fostering well-being among patients and users.

Previous research in experimental psychology suggests that many health-related decisions, such as treatment adherence or therapeutic choices, can be better understood as the result of causal learning (Rottman et al., 2017). That is, users’ beliefs about the effectiveness of a treatment are causal in nature, e.g., “the treatment causes the symptom remission,” or “the treatment prevents me from falling ill.” Thus, it is possible to study patients’ beliefs about treatment effectiveness through causal learning experiments (see reviews in Matute et al., 2015; Matute et al., 2019). This approach offers a number of advantages. First, it allows us to study the formation of beliefs under highly controlled settings, using fictitious scenarios and computerized tasks. This would be impossible in real life, where researchers cannot manipulate parameters such as the frequency with which a treatment is used, its actual effectiveness, or the severity of symptoms. Studies in natural settings are therefore limited: it is often impossible to run an experiment that reveals the causal relationships between different factors and health beliefs, so most such research is restricted to uncontrolled, observational designs. The second advantage is that causal learning experiments allow us to study health beliefs in a safe context, without putting participants’ health at risk. Because this research normally involves treatments with no actual benefit, or even the induction of false beliefs of effectiveness, it would be unethical to conduct such studies with real health outcomes. Additionally, it is sometimes possible to recruit samples of real patients who deal with fictitious or imagined health outcomes in the context of a causal learning experiment (Meulders et al., 2018), which helps to alleviate concerns about ecological validity while retaining highly controlled procedures.

This line of research, which uses causal learning experiments to study health beliefs, has produced some promising advances. For example, it has made it possible to predict which conditions make patients and users more vulnerable to pseudomedicine and bogus health claims (Blanco et al., 2014; Blanco and Matute, 2020), to identify situations in which previously acquired beliefs interfere with learning about actual effectiveness (Yarritu et al., 2015), to investigate how health beliefs are affected by biases in Internet search (Moreno-Fernández and Matute, 2020), to explain why certain patients are hypersensitive to pain symptoms (Meulders et al., 2018), and to enhance placebo effects (Yeung et al., 2014). This knowledge could provide a valuable foundation for designing interventions aimed at debiasing dysfunctional beliefs in real-life settings (Lewandowsky et al., 2012; Macfarlane et al., 2020).

Exploring Health Beliefs in the Laboratory

Most causal learning experiments exploit a basic principle of causality: causes and effects (outcomes) correlate with each other, unless a third factor masks this relationship. Since causality cannot be directly observed (Hume, 1748), people rely on this simple principle and use a proxy measure, the contingency between the cause and the outcome, to estimate causality (Allan, 1980; Wasserman et al., 1996; Vadillo et al., 2005; Blanco et al., 2010). In a simple situation with only one binary cause and one binary outcome, the contingency can be computed by means of the Δp index (Allan, 1980): the probability of the outcome given that the cause occurred, P(O|C), minus the probability of the outcome given that the cause did not occur, P(O|∼C), that is, Δp = P(O|C) − P(O|∼C). Large absolute values of Δp correspond to situations in which the cause increases (Δp > 0) or decreases (Δp < 0) the probability of the outcome relative to its base-rate, P(O|∼C). The larger this difference is in absolute value, the stronger the association between cause and outcome, and therefore the higher the chances that there is a causal link. According to previous research, this is how probabilities could produce causal beliefs in many situations (Perales et al., 2016).
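The Δp rule can be made concrete with a short Python sketch. It assumes the conventional labelling of the four cells of a 2 × 2 contingency table (a: cause and outcome present; b: cause present, outcome absent; c: cause absent, outcome present; d: both absent). The function name `delta_p` and the example counts are illustrative only and are not taken from the authors' materials.

```python
from fractions import Fraction


def delta_p(a: int, b: int, c: int, d: int) -> Fraction:
    """Return the contingency index Delta-p = P(O|C) - P(O|~C).

    a: cause present, outcome present
    b: cause present, outcome absent
    c: cause absent, outcome present
    d: cause absent, outcome absent
    """
    p_o_given_c = Fraction(a, a + b)      # P(O|C)
    p_o_given_not_c = Fraction(c, c + d)  # P(O|~C), the outcome base-rate
    return p_o_given_c - p_o_given_not_c


# Hypothetical counts: outcome on 15 of 20 cause-present trials
# and on 5 of 20 cause-absent trials.
print(delta_p(15, 5, 5, 15))  # 1/2
```

Exact fractions are used so that values such as −0.3 are not blurred by floating-point rounding. A negative result, as produced by a treatment that reduces symptoms, indicates a preventive relationship.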

In the context of judging a treatment’s effectiveness, this reasoning amounts to comparing how often symptomatic episodes appear during the treatment, P(O|C), with how often they appear without the treatment, P(O|∼C). This comparison is straightforward in randomized controlled trials, in which two comparable groups of patients are recruited (i.e., experimental vs. control, or treatment vs. placebo). That is, clinicians often form their judgments about the effectiveness of a treatment after carefully comparing the two groups and verifying that symptom remission is more frequent in the treatment group than in the control group. However, although this reasoning applies well to clinicians and researchers, patients often lack the resources to base their decisions on such complete information. Instead, they must form their beliefs of effectiveness on the basis of a more limited comparison: how often symptoms were observed before the treatment started vs. how often they occur during the treatment, in the same patient (usually, themselves). Most causal learning experiments do not take this limitation into account, and instead provide participants with information about a series of different patients (Blanco et al., 2014; Matute et al., 2019). This is useful for investigating the formation of causal knowledge in general, but it is not realistic when applied to patients’ beliefs of effectiveness, as the procedure clearly departs from patients’ actual experience with their own treatments. In the current research, we propose a more natural setting for investigating the formation of beliefs of effectiveness, by presenting information about a single patient before and during a treatment (see a related approach in Blanco and Matute, 2020).

Previous experiments using causal learning paradigms suggest that people can often be accurate in their judgments of causality (Shanks and Dickinson, 1987; Wasserman, 1990; Blanco et al., 2010), being generally sensitive to the actual contingency presented in the experiments. However, researchers have also reported systematic deviations, or biases. In particular, when the probability of the desired outcome is high, judgments tend to be higher even in null contingency conditions (Alloy and Abramson, 1979; Buehner et al., 2003; Blanco et al., 2014, 2020; Chow et al., 2019), contributing to what has been called a “causal illusion”: the belief in a causal link that does not actually exist (Matute et al., 2015; Matute et al., 2019). The causal illusion shares some features with other phenomena, such as the classical illusory correlation effect (Chapman and Chapman, 1967, 1969) and pseudocontingencies (Kutzner et al., 2011; Fiedler et al., 2009).¹ Despite their different explanations and assumptions, all these phenomena highlight the importance of event probabilities, such as the probability of the cause and the probability of the outcome, when judging causal relationships.

Thus, the causal illusion (as well as the related biases) has been suggested to underlie many beliefs about treatment effectiveness, particularly those concerning pseudomedicines. These are treatments claimed to be effective despite the lack of scientific evidence that they perform better than placebo (Lilienfeld et al., 2014; Macfarlane et al., 2020). The rationale is that, when diseases have a high chance of spontaneous remission, people systematically overestimate the effectiveness of treatments, even of treatments that are completely unable to produce an effect. This could have serious consequences in real life, as patients may place undeserved trust in treatments that produce no actual benefit, and thus miss the opportunity to receive an effective therapy (Freckelton, 2012).

By contrast, little research has examined another possibility: that patients may also underestimate the effectiveness of treatments that actually work. As we will show, there are reasons to expect that causal learning can also produce this underestimation under some circumstances (see an example in Yarritu et al., 2015). For instance, because of the biasing effect of the outcome probability described above, a treatment might appear ineffective when used for a disease with frequent symptomatic episodes, compared with a mild disease with less frequent symptoms.

Overview of the Experiments

In the current research, we use a causal learning procedure to experimentally study how people form beliefs of effectiveness for a fictitious treatment. Specifically, we present a medicine that produces a moderate improvement in symptoms (i.e., a medicine with a moderate contingency with symptom remission), and compare its perceived effectiveness in two situations: a disease with a high probability of symptomatic episodes, and a disease with a low probability of symptomatic episodes. Since the medicine reduces the frequency of episodes equally in both scenarios, one would expect similar ratings of effectiveness. However, the probability of the outcome (in this case, the observation of symptom remissions) could bias the judgments, producing the impression that the medicine works better in the group in which symptoms had a lower base-rate. In contrast with most previous studies on causal learning, we present the information about treatment effectiveness in a more natural fashion, which implies (a) first describing how likely symptoms are before the treatment, and then how they respond to its introduction, and (b) that the information given through the series of trials concerns only one patient, observed over time. This presentation format aims to mirror the chronology and the limited generalizability of the observations that patients make in real life.

Ethics Statement

The procedure was reviewed and approved by the Ethical Review Board of the University of Deusto. Participants were informed before the experiment that they could quit the study at any moment by closing the browser window. No personal information (e.g., name, IP address, e-mail) was collected. We did not use cookies or other software to covertly obtain information from the participants. All measures, groups, and conditions are disclosed. Data, materials, and R scripts for the three experiments are publicly available at the Open Science Framework: https://osf.io/emzbj/.

Experiment 1

Experiment 1 uses a causal learning task to investigate whether the effectiveness of a medicine can be underestimated when the disease has a high base-rate of symptomatic episodes. We expected that a disease producing frequent symptomatic episodes would create the impression that the treatment is working less effectively than an equally effective treatment used for a disease with less frequent episodes.

Method

Participants

We initially planned a sample of 100 participants, which would allow us to detect effects of d ≥ 0.57 in the difference between two groups with 80% power. However, data from one participant were not recorded due to a technical error. Thus, 99 Internet users (45 male; age M = 31.38, SD = 9.88) participated anonymously through the Prolific Academic platform (Palan and Schitter, 2018), in exchange for money (£0.80 for about 10 min). The program randomly assigned 52 participants to the Infrequent group and 47 to the Frequent group.
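The minimum detectable effect size can be checked with a sensitivity analysis. The sketch below assumes 50 participants per group, a two-sided independent-samples t-test, and α = 0.05 (none of which is stated explicitly above), and uses SciPy's noncentral t distribution. The helper names `power_two_sample_t` and `minimum_detectable_d` are hypothetical; this is not the authors' power-analysis script.

```python
from scipy import optimize, stats


def power_two_sample_t(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided independent-samples t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * (n_per_group / 2) ** 0.5        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability of landing in either rejection region under the alternative.
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)


def minimum_detectable_d(n_per_group: int, power: float = 0.80,
                         alpha: float = 0.05) -> float:
    """Smallest d that reaches the requested power (sensitivity analysis)."""
    return optimize.brentq(
        lambda d: power_two_sample_t(d, n_per_group, alpha) - power, 0.01, 3.0
    )


# 100 planned participants, assumed to be split evenly into two groups.
print(round(minimum_detectable_d(50), 2))
```

Under these assumptions the result should be close to the d ≥ 0.57 reported above. The root finder is used because power increases monotonically with d, so a single crossing of the 80% target is guaranteed within the bracket.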

Procedure and Design

We adapted the standard trial-by-trial contingency learning task (Wasserman et al., 1990), which is extensively used to study human learning. The experiment was programmed in JavaScript to run online in a web browser. The instructions (available at the Open Science Framework, https://osf.io/emzbj/) asked participants to imagine that they were suffering from a fictitious disease called Hamkaoman Syndrome, which produces severe headaches, although this symptom appears only from time to time. Participants were told that the fictitious drug Batatrim was a potential treatment for this disease if taken on a daily basis, but that it might not work equally well for all people (i.e., “Perhaps it works in your case, but we don’t know until we try”). The goal of the task was to use the information provided to find out whether Batatrim works to stop the headaches.

The training then presented a series of 40 records sequentially. Each record corresponded to one day and displayed information about (a) whether the patient took Batatrim that day and, after a delay of 1 s, (b) whether the patient reported a headache (see Figures 1A,B). This information remained on the screen until the “Next” button was clicked, which advanced to the next trial after an inter-trial interval of 500 ms.

FIGURE 1

Figure 1. Screenshots showing the contingency learning task. (A) At the beginning of the trial, the information about the medicine (top part of the screen) is shown for 1 s. (B) Then, the information about the presence or absence of the symptoms is shown in the center of the screen (in this example, the patient did not report symptoms). Pressing the “Next entry” button leads to the next trial after a delay (ITI) of 500 ms during which the screen is cleared. (C) After the training session, an effectiveness judgment is collected on a −100 to +100 scale.

The training comprised two consecutive phases. During Phase 1, as the instructions indicated, participants observed the records corresponding to the time before the treatment had started (“In the first round of records, you will observe the diary entries corresponding to the time before you had any treatment, when you were just waiting for the doctor to give you Batatrim.”). That is, Phase 1 contained 20 medicine-absent trials, in which the patient did not take any drug and either reported a headache or not; this phase therefore conveyed the information needed to compute P(O|∼C). Then, in Phase 2, participants observed the 20 records corresponding to the time after the treatment had started (“You have already learned about the symptoms produced by the Hamkaoman Syndrome when no treatment is given. Now, your pills have arrived, and you will start taking Batatrim on a daily basis.”). Thus, only medicine-present trials were shown in Phase 2, which served to compute P(O|C). The order of the trials within each phase (outcome-present or outcome-absent) was randomly determined for each participant.
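The two-phase, single-patient structure can be sketched as a trial generator. This is a hypothetical Python reconstruction (the actual task was programmed in JavaScript); the function `make_trials`, the tuple layout, and the optional seed are assumptions made for illustration. Symptom counts are fixed per phase and only their order is shuffled, mirroring the random ordering within each phase described above.

```python
import random
from typing import List, Optional, Tuple

Trial = Tuple[int, bool, bool]  # (phase, medicine_taken, headache_reported)


def make_trials(symptoms_before: int, symptoms_during: int,
                n_per_phase: int = 20, seed: Optional[int] = None) -> List[Trial]:
    """Build a hypothetical trial list for one simulated patient.

    Phase 1: no medicine on any day (these trials inform P(O|~C)).
    Phase 2: medicine every day (these trials inform P(O|C)).
    The number of headache days per phase is fixed; only their order is
    shuffled, independently within each phase.
    """
    rng = random.Random(seed)
    phase1 = [True] * symptoms_before + [False] * (n_per_phase - symptoms_before)
    phase2 = [True] * symptoms_during + [False] * (n_per_phase - symptoms_during)
    rng.shuffle(phase1)
    rng.shuffle(phase2)
    return ([(1, False, headache) for headache in phase1]
            + [(2, True, headache) for headache in phase2])


# Frequent group: 14/20 headache days before treatment, 8/20 during treatment.
trials = make_trials(14, 8, seed=1)
```

Because Phase 1 always precedes Phase 2, the sketch preserves the chronology of a single patient waiting for, and then taking, the medicine; shuffling never mixes trials across phases.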

Table 1 summarizes the experimental design. In the Frequent group, the symptoms were initially very frequent: 14/20 trials in Phase 1 (before treatment), and 8/20 in Phase 2 (during treatment). By contrast, in the Infrequent group, the symptoms were reported less often: 8/20 trials in Phase 1, and 2/20 in Phase 2. However, the objective contingency between treatment and symptom occurrence was the same in both groups. In the Frequent group, the contingency is P(O|C) − P(O|∼C) = 0.4 − 0.7 = −0.3, and in the Infrequent group it yields the same value, P(O|C) − P(O|∼C) = 0.1 − 0.4 = −0.3. That is, according to the contingency rule for determining effectiveness (Δp), the two groups depicted an equally effective medicine (a reduction of 30 percentage points in symptom occurrence, in absolute terms), although they differed in the symptom base-rate.
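The arithmetic behind Table 1 can be reproduced in a few lines. Besides Δp, the sketch computes one possible relative index, the proportion of baseline symptoms removed by the treatment, −Δp / P(O|∼C), which coincides with the preventive causal power of Cheng's power PC theory. This relative index is an illustrative choice, not necessarily the measure examined in Experiment 3; the function name `effectiveness_indices` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def effectiveness_indices(p_before: Fraction,
                          p_during: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (Delta-p, relative reduction) for a preventive treatment.

    p_before: P(O|~C), symptom probability before treatment (Phase 1)
    p_during: P(O|C), symptom probability during treatment (Phase 2)
    """
    delta_p = p_during - p_before      # absolute contingency
    relative = -delta_p / p_before     # share of baseline symptoms removed
    return delta_p, relative


# Symptom counts out of 20 trials per phase, as in the design of Experiment 1.
for group, before, during in [("Frequent", 14, 8), ("Infrequent", 8, 2)]:
    dp, rel = effectiveness_indices(Fraction(before, 20), Fraction(during, 20))
    print(f"{group}: delta_p = {float(dp):+.2f}, relative reduction = {float(rel):.2f}")
```

Both groups share Δp = −0.30, but the treatment removes about 43% of baseline symptoms in the Frequent group (6 of 14 headache days) versus 75% in the Infrequent group (6 of 8), which shows how an equal absolute effect can look very different once the base-rate is taken into account.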

TABLE 1

Table 1. Design of Experiment 1.

After the sequence of 40 trials (20 in each phase), participants were asked several questions. First, we collected an effectiveness judgment (i.e., “How effective is Batatrim?”), which was our main dependent variable. The judgment was collected on a scale from −100 (“Batatrim clearly worsens your symptoms”) through 0 (“Batatrim does not have an effect on your symptoms”) to +100 (“Batatrim clearly improves your symptoms”). To help participants interpret the response scale, we included five small, evenly spaced pictures of faces ranging from −100 (sick face) to +100 (happy face). When participants hovered the mouse pointer over these pictures, a small box appeared with a verbal label, as shown in Figure 1C. No time constraints were imposed on answering these questions.

Second, we asked two conditional probability questions (in random order for each participant): a P(O|C) judgment (“Imagine a different person who suffers from the same syndrome. This person takes Batatrim on 100 consecutive days. Out of these 100 days in which the person takes Batatrim, on how many of them will the person report having headaches?”), and a P(O|∼C) judgment (“Imagine a different person who suffers from the same syndrome. This person does not take Batatrim on 100 consecutive days. Out of these 100 days in which the person does not take Batatrim, on how many of them will the person report having headaches?”). These two pieces of information, combined, serve to compute the contingency between treatment and symptoms, and hence are necessary to correctly assess effectiveness. Examining these two questions allowed us to detect whether participants correctly encoded the two probabilities.

Finally, we requested a judgment about the tendency to opt for an alternative treatment different from Batatrim (“If you had the chance, would you stick to your current treatment with Batatrim, or would you try a different treatment?”). This was answered on a scale with five options (“I’m sure I would stick to Batatrim” / “I would probably stick to Batatrim” / “I don’t know” / “I would probably try a different treatment” / “I’m sure I would try a different treatment”). We expected that participants who felt that the medicine was not working well would be more likely to stop taking it and try a different treatment.

Results and Discussion

The main results are those obtained from the effectiveness judgments, depicted in Figure 2. Although the medicine was equally effective in both groups according to the contingency information, effectiveness judgments were significantly higher in the Infrequent group (which featured a lower symptom rate before the medicine was taken) than in the Frequent group, t(97) = 4.96, p < 0.001, d = 0.998. This suggests that the effectiveness of a treatment may be underestimated for diseases with frequent symptomatic episodes, relative to diseases with less frequent symptoms.
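The reported effect sizes can be recovered from each t statistic and the group sizes. The sketch assumes that d is Cohen's d based on the pooled standard deviation, for which d = t·√(1/n₁ + 1/n₂); because the reported t values are rounded, the recovered values may differ in the third decimal. The function name `cohens_d_from_t` is hypothetical.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d (pooled SD) recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Group sizes in Experiment 1: 52 (Infrequent) and 47 (Frequent).
N_INFREQUENT, N_FREQUENT = 52, 47
for label, t in [("effectiveness", 4.96), ("alternative treatment", 4.22),
                 ("perceived contingency", 0.65)]:
    print(f"{label}: d = {cohens_d_from_t(t, N_INFREQUENT, N_FREQUENT):.3f}")
```

For the effectiveness judgments this gives d ≈ 0.998, matching the value above, and the same formula applies to the other Experiment 1 comparisons reported with t(97).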

FIGURE 2

Figure 2. Mean effectiveness judgments in Experiment 1. Higher positive values indicate a stronger belief that the medicine works to reduce the symptoms. Jittered data points are superimposed on the plot (to avoid overplotting, the placement of the data points of each condition along the x-axis is random). Error bars depict 95% confidence intervals for the mean.

Next, we examined the judgments measuring the tendency to switch to alternative treatments, whose descriptive statistics appear in Table 2. These judgments could range from 1 (“I’m sure I would stick to Batatrim”) to 5 (“I’m sure I would try a different treatment”). They were significantly higher in the Frequent group than in the Infrequent group, t(97) = 4.22, p < 0.001, d = 0.850. That is, participants who observed a disease with frequent symptomatic episodes not only tended to give lower estimates of the medicine’s effectiveness, but were also less willing to adhere to the treatment with Batatrim, despite the medicine being equally effective in the two groups.

TABLE 2

Table 2. Descriptive statistics for the alternative treatment judgments in the three experiments.

Finally, we analyzed the conditional probability judgments to gain insight into how participants learned the two pieces of information: the probability of symptoms when the medicine was taken, P(O|C), and the probability of symptoms when no medicine was taken, P(O|∼C). These judgments are depicted in Figure 3. We conducted a mixed 2 (Group) × 2 (Probability) ANOVA, which revealed a main effect of Group, F(1,97) = 117.0, p < 0.001, $η_{p}^{2} = 0.55$. Overall, probability judgments were greater in the Frequent group than in the Infrequent group, which is consistent with the actual symptom probabilities in each group. We also found a main effect of Probability, F(1,97) = 327.91, p < 0.001, $η_{p}^{2} = 0.77$, which simply reflects the fact that the symptoms became less frequent from Phase 1 to Phase 2 (i.e., the medicine was effective). Importantly, there was no interaction, F < 1. To better interpret these results (and those of subsequent experiments, with additional groups), we computed a “perceived contingency score” by subtracting the two conditional probability judgments following the Δp rule, i.e., P(O|C) − P(O|∼C). These scores can be interpreted as the amount of contingency that a participant perceived, based on their conditional probability ratings. The resulting scores showed no differences between groups, t(97) = 0.65, p = 0.51, d = 0.13, indicating that perceived contingency was the same in both base-rate groups: the conditional probability estimates differed between groups only in their absolute level. Taken together, the results suggest that participants captured the probabilities involved in the computation of contingency accurately, as the mean estimates were close to the actual values presented in the task. Therefore, the underestimation of effectiveness reported above cannot be explained as a failure to learn the conditional probabilities.
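Two computations in this paragraph can be sketched in Python: the perceived contingency score, which applies the Δp rule to a participant's two conditional probability judgments (kept here in the judgments' 0 to 100 units), and partial eta squared, which can be recovered from an F statistic as η²p = F·df₁ / (F·df₁ + df₂). The function names and the example participant are hypothetical.

```python
def perceived_contingency(judged_o_given_c: float, judged_o_given_not_c: float) -> float:
    """Delta-p rule applied to judgments on the 0-100 'out of 100 days' scale."""
    return judged_o_given_c - judged_o_given_not_c


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Hypothetical participant: 40 headache days out of 100 with Batatrim,
# 70 out of 100 without it, giving a score of -30.
print(perceived_contingency(40, 70))

# Reported main effects of Group and Probability, both with df = (1, 97).
print(round(partial_eta_squared(117.0, 1, 97), 2))
print(round(partial_eta_squared(327.91, 1, 97), 2))
```

The two recovered effect sizes round to the 0.55 and 0.77 reported above. A perceived contingency score of −30 would correspond exactly to the programmed Δp of −0.3 once rescaled to the probability metric.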

FIGURE 3

Figure 3. Mean conditional probability judgments in Experiment 1. Jittered data points are superimposed on the plot (to avoid overplotting, the placement of the data points of each condition along the x-axis is random). Error bars depict 95% confidence intervals for the mean.

Experiment 2

Experiment 1 showed that the base-rate of symptomatic episodes can bias judgments of treatment effectiveness: diseases with a higher probability of symptoms produced lower perceived effectiveness, even though the actual contingency was identical. This aligns with evidence obtained in other situations (e.g., null contingencies), and also with results from experiments conducted in related paradigms (e.g., pseudocontingencies, Kutzner et al., 2011).

Still, our results could be interpreted as showing that participants were simply ignoring the contingency information and guiding their judgments by the probability of symptoms alone. That is, a medicine that drives the probability of symptoms close to zero might be judged as effective even if the initial base-rate without treatment was also small, because people could simply ignore that initial base-rate. In fact, as mentioned above, there is ample empirical evidence that judgments of causality can be strongly biased by the probability of the outcome, at least in null contingency situations (Alloy and Abramson, 1979; Buehner et al., 2003; Blanco et al., 2014; Chow et al., 2019; Blanco and Matute, 2020).

Experiment 2 aims to replicate the findings of Experiment 1 while introducing two control groups in which the actual contingency between the treatment and symptom remission is zero: in these two control groups, the probability of the symptoms is the same before and after the treatment (i.e., the medicine does not work at all). These probabilities match those of the two experimental groups when taking the medicine, P(O|C), which are identical to those used in Experiment 1. That is, for half of the participants symptoms will be frequent, and for the other half they will be infrequent. Orthogonally, for half of the participants the medicine will work (reducing the symptom probability by 30 percentage points, in absolute terms), whereas for the other half it will not work at all. Thus, if participants judge the effectiveness of the treatment only by attending to the frequency of the symptoms and ignoring the contingency, the control groups should not differ from the experimental groups, revealing that participants are biased only by the base-rate of the effect. Conversely, if participants do take contingency into account, they should recognize that the control medicines are not effective.
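The logic of Experiment 2 can be illustrated by listing the four groups and contrasting two candidate cues: Δp, and the symptom probability during treatment alone. The group labels and the dictionary layout are hypothetical; the probabilities follow the description above (experimental groups as in Experiment 1, control groups with the same symptom probability before and during treatment, matched to the experimental P(O|C)).

```python
from fractions import Fraction

# (P(O|~C) before treatment, P(O|C) during treatment) for each group.
DESIGN = {
    ("Frequent", "effective"): (Fraction(7, 10), Fraction(4, 10)),
    ("Frequent", "control"): (Fraction(4, 10), Fraction(4, 10)),
    ("Infrequent", "effective"): (Fraction(4, 10), Fraction(1, 10)),
    ("Infrequent", "control"): (Fraction(1, 10), Fraction(1, 10)),
}

for (base_rate, medicine), (before, during) in DESIGN.items():
    delta_p = during - before
    # A contingency-based judge separates effective from control medicines;
    # a judge tracking only P(O|C) sees each effective/control pair as identical.
    print(f"{base_rate:<10} {medicine:<9} "
          f"delta_p = {float(delta_p):+.1f}  P(O|C) = {float(during):.1f}")
```

Within each base-rate level, the effective and control medicines share P(O|C) but differ in Δp (−0.3 vs. 0), which is what allows the design to tell the two accounts apart.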