A Decision-Theoretic Model of Behavior Change

Undesirable habitual or addictive behaviors are often difficult to change. The issue of “behavior change” has long been studied in various research fields. Several models for behavior change have converged to the hypothesis that attitudes, norms, and self-efficacy are important determinants of intentions and behavior. To improve the accuracy of behavior-change models, some researchers have tried to combine behavioral economics models with existing models for behavior change. However, these attempts have failed because the existing models [e.g., Theory of Planned Behavior (TPB)] are not consistent with Expected Utility Theory (EUT), which underlies various behavioral economics models. In the present paper, we clarify the corresponding components between existing models for behavior change and EUT, and propose a new model, the Decision-Theoretic Model of behavior change (DTM), which is a natural extension of ordinary EUT.


INTRODUCTION
It is often difficult for clinicians, trainers, or teachers to change people's undesirable habitual or addictive behaviors, such as overeating, excessive drinking, lack of exercise, and smoking. How can we help them change people's behavior for the better? The problem of "behavior change" has long been studied in various research fields such as psychology, pedagogy, nursing, public health, medicine, and health promotion (Fishbein and Ajzen, 2010). Several models for behavior change have converged to the hypothesis that attitudes, norms, and self-efficacy are important determinants of intentions and behavior (Sheeran et al., 2016). However, existing models for behavior change, such as "Social Cognitive Theory" and "Theory of Planned Behavior (TPB)" cannot sufficiently predict the occurrence probabilities of a considered behavior or its change through interventions (Sniehotta et al., 2014).
To improve the accuracy of predictive models for behavior change, some researchers have begun to combine behavioral economics models with existing models for behavior change (Roberto and Kawachi, 2015). Because behavioral economics models consider various behavioral biases that affect the occurrence of a target behavior and/or its change through interventions, this combination was expected to be useful. However, existing models of behavior change are not consistent with Expected Utility Theory (EUT), which underlies a variety of behavioral economics models (Kahneman and Tversky, 1979; Schoemaker, 1982), and the combination has therefore been challenging.
In the present paper, by clarifying the corresponding components between TPB and EUT, we propose a new model, the Decision-Theoretic Model of behavior change (DTM), which is consistent with EUT (Figure 1). Specifically, in DTM, we add the components of subjective norm and self-efficacy to the ordinary EUT.
In the following sections, we first explain the details of EUT; second, we explain the details of TPB and reinterpret TPB in a decision-theoretic way; third, we describe our new model as a natural extension of EUT; fourth, we discuss the superiority of DTM; and finally, we summarize our arguments and discuss future research directions.

EXPECTED UTILITY THEORY (EUT)
EUT is one of the most popular approaches for rational decision-making in a stochastic environment (von Neumann and Morgenstern, 1947). Suppose we are given the state set $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, the action set $A = \{a_1, a_2, \ldots, a_j, \ldots, a_J\}$, the subjective probability of a state $s_n$ given an action $a_j$, $P(s_n \mid a_j)$, and the subjective utility of a state $s_n$, $U_{self}(s_n)$. EUT states that the agent chooses an action $a_j$ so as to maximize the expected value of subjective utility:

$$E[U_{self} \mid a_j] = \sum_{n=1}^{N} P(s_n \mid a_j)\, U_{self}(s_n) \tag{1}$$

In the present paper, we consider a case wherein the action set has two complementary elements ($A = \{a_1$: performing the target behavior, $a_2$: not performing the target behavior$\}$) (Figure 1A). In many empirical studies, it is assumed that the agent's action-selection rule is based on a sigmoidal function, e.g., the logistic function (Luce, 1959; Sutton and Barto, 1998):

$$P(a_1) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 (E[U_{self} \mid a_1] - E[U_{self} \mid a_2]))\}} \tag{2}$$

where the inverse temperature $\beta_1$ denotes the randomness of action selection, and the constant term $\beta_0$ denotes decision bias. For example, consider the case with $S = \{s_1$: health, $s_2$: disease$\}$, $A = \{a_1$: exercising, $a_2$: not exercising$\}$, and an agent who holds the beliefs $P(s_1 \mid a_1) = 0.8$, $P(s_1 \mid a_2) = 0.2$, $U_{self}(s_1) = 1$, and $U_{self}(s_2) = 0$. Then, the expected utilities of the two actions are $E[U_{self} \mid a_1] = 0.8 \times 1 + 0.2 \times 0 = 0.8$ and $E[U_{self} \mid a_2] = 0.2 \times 1 + 0.8 \times 0 = 0.2$. When the agent's internal decision parameter $\beta_1 = 1$ and the constant term $\beta_0 = 0$, EUT predicts that $P(a_1) = 1/(1 + e^{-0.6}) \approx 0.65$ in this simple situation.
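The worked example above can be checked numerically. The following is a minimal sketch, assuming that the expected utility is the probability-weighted sum of state utilities and that the choice rule is a logistic function of the expected-utility difference; the function names are illustrative and not part of the model.

```python
import math

def expected_utility(p_state_given_action, utility):
    """Sum over states of P(s_n | a_j) * U_self(s_n)."""
    return sum(p * u for p, u in zip(p_state_given_action, utility))

def p_choose_a1(eu_a1, eu_a2, beta1=1.0, beta0=0.0):
    """Logistic choice rule applied to the expected-utility difference."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * (eu_a1 - eu_a2))))

# States: s1 = health, s2 = disease.
u_self = [1.0, 0.0]
p_s_given_a1 = [0.8, 0.2]  # a1 = exercising
p_s_given_a2 = [0.2, 0.8]  # a2 = not exercising

eu_a1 = expected_utility(p_s_given_a1, u_self)
eu_a2 = expected_utility(p_s_given_a2, u_self)
p_a1 = p_choose_a1(eu_a1, eu_a2)

print(round(eu_a1, 2), round(eu_a2, 2), round(p_a1, 2))  # 0.8 0.2 0.65
```

Raising `beta1` in this sketch pushes `p_a1` toward 1, and a non-zero `beta0` shifts the choice probability even when the two expected utilities are equal.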

THEORY OF PLANNED BEHAVIOR
TPB is a typical model for behavior change, in which the behavioral intention (BI) for the target behavior ($a_1$) is determined by three factors: attitude toward the behavior, subjective norm, and perceived self-efficacy (Figure 2). At first glance, perceived self-efficacy is different from "perceived behavioral control," which is the third factor of the original version of TPB, but these two concepts are treated as being the same in a newer version (Fishbein and Cappella, 2006). All behavior determinants are measured by questionnaire ratings for the target behavior. Table 1 shows the typical TPB questionnaire in the case that the target behavior is "Exercising for at least 20 min, three times per week for the next 3 months" (Fishbein and Ajzen, 2010).
Attitude toward the behavior is the agent's positive or negative evaluation of performing the target behavior $a_1$ (Ajzen, 1991; Fishbein and Ajzen, 2010), which is based on EUT in economics, or expectancy-value theory in psychology (Edwards, 1954; Ajzen, 1985). Attitude toward the behavior is determined by aggregating the products of behavioral beliefs and the evaluation of outcomes. Because a behavioral belief is the belief (subjective probability) that performing the target behavior ($a_1$) will lead to a particular outcome state ($s_n$) in the state set, we denote the behavioral belief as $P(s_n \mid a_1)$ (Ajzen, 1985). Because the evaluation of an outcome is the expectation of the agent's utility when the outcome is obtained, we denote it as $U_{self}(s_n)$ (Ajzen, 1985). Then, importantly, we can consider the attitude toward the behavior as the expected utility given $a_1$, $E[U_{self} \mid a_1] = \sum_{n=1}^{N} P(s_n \mid a_1)\, U_{self}(s_n)$ (Edwards, 1954; Ajzen, 1985; Fishbein and Ajzen, 2010). It is worth noting that both $E[U_{self} \mid a_1]$ and $E[U_{self} \mid a_2]$ are considered in EUT, but only $E[U_{self} \mid a_1]$ is considered in TPB.
Because the agent's behavior could not be explained well merely by attitude toward the behavior, TPB has added two other factors, subjective norm and perceived self-efficacy.
Subjective norm is the perceived social pressure to engage or not engage in a behavior (Fishbein and Ajzen, 2010). Subjective norm is determined by aggregating the products of normative beliefs and the motivation to comply with other individuals ($m_k$; $k = 1, 2, \ldots, K$). As a normative belief refers to the agent's belief about the degree to which a particular individual, $k$, thinks the agent should perform the target behavior $a_1$, we consider it as the agent's expectation of that individual's utility when the target behavior is performed, and denote it as $U_k(a_1)$. Then, we can consider the subjective norm as the weighted sum of other individuals' utilities, $U_{others}(a_1) = \sum_{k=1}^{K} m_k\, U_k(a_1)$ (Fishbein and Ajzen, 2010). It is worth noting that, in TPB, other individuals' utilities are a function of action, whereas the agent's utility in attitude toward the behavior is a function of state.
(Perceived) self-efficacy, originally proposed by Bandura (Bandura, 1977), is a personal judgement of "how well one can execute courses of action required to deal with prospective situations" (Bandura, 1982). Bandura emphasized it as a determinant of human behavior in addition to outcome expectations (Figure 3). As perceived self-efficacy for the target behavior $a_1$ is the belief about the probability of performing the behavior successfully when the agent intends to perform the target behavior ($i_1$), we denote it as $P(a_1 \mid i_1)$. It is worth noting here that the outcome expectation corresponds to the behavioral beliefs mentioned above, because it is defined as an agent's estimate that a given behavior will lead to certain outcomes.

FIGURE 1 | (A) EUT. EUT is one of the most popular approaches for rational decision-making in a stochastic environment. An action set ($A = \{a_1$: performing the target behavior, $a_2$: not performing the target behavior$\}$) and a state set ($S = \{s_1, s_2, \ldots\}$) are assumed. The agent holds the belief that each action causes any state with a certain probability in the corresponding action-state link ($P(s_n \mid a_j)$). When an action $a_j$ is given, the expected value of subjective utility ($E[U_{self} \mid a_j]$) is calculated. EUT states that the agent chooses $a_j$ so as to maximize $E[U_{self} \mid a_j]$. (B) EUT-like schema of TPB. Intention to perform the target behavior ($i_1$) is additionally assumed. In TPB, the three determinants of the behavioral intention are attitude toward the behavior, subjective norm, and perceived self-efficacy. The attitude toward the behavior depends on $P(s_n \mid a_1)$ and $U_{self}(s_n)$, subjective norm appears as $U_{others}(a_1)$, and perceived self-efficacy appears as $P(a_1 \mid i_1)$. (C) DTM. The intention set $I = \{i_1$: intention to perform the target behavior, $i_2$: intention not to perform the target behavior$\}$, as well as the action set and the state set, are assumed. The agent holds the belief that each intention causes both actions with certain probabilities in the corresponding intention-action links ($P(a_j \mid i_h)$), in the same way as each action causes the states with certain probabilities in the corresponding action-state links ($P(s_n \mid a_j)$). When $i_h$ is given, the expected value of subjective utility ($E[(U_{self} + wU_{others}) \mid i_h]$) is calculated, where $w$ denotes the weight of $U_{others}$ relative to $U_{self}$ in calculating subjective utility. DTM states that the agent chooses intention $i_h$ so as to maximize $E[(U_{self} + wU_{others}) \mid i_h]$.
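The weighted sum of these three determinants (attitude toward the behavior, subjective norm, and perceived self-efficacy) determines BI (Figure 2):

$$BI(i_1) = w_1\, E[U_{self} \mid a_1] + w_2\, U_{others}(a_1) + w_3\, P(a_1 \mid i_1) \tag{3}$$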
where $w_1$, $w_2$, and $w_3$ denote the weights of attitude toward the behavior, subjective norm, and perceived self-efficacy, respectively. This equation can be rewritten in a simplified form (Equation 3′), which allows us to compare it with DTM later [section Decision-Theoretic Model of Behavior Change (DTM)]. The second term of Equation 3 and the corresponding part of Equation 3′ about $U_{others}$ are equivalent, because $\sum_{n=1}^{N} P(s_n \mid a_1) = 1$. Here, we note that BI is not consistent with EUT, because subjective norm and perceived self-efficacy are simply added to $E[U_{self} \mid a_1]$. In other words, attempts to improve the model's accuracy by incorporating subjective norm and perceived self-efficacy in TPB are inconsistent with EUT, which underlies a variety of behavioral economics models (Kahneman and Tversky, 1979; Schoemaker, 1982). We tried to draw a schematic view of TPB that maintains consistency with EUT as much as possible (Figure 1B). In the EUT-like schema of TPB, the three determinants of behavioral intention can be identified. However, their summation does not mathematically provide the occurrence probability of the target behavior in the EUT-like schema.
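The simplification of the weighted sum rests on the normalization $\sum_{n=1}^{N} P(s_n \mid a_1) = 1$. As a sketch of that step, assuming the relative weight is defined as $w = w_2 / w_1$ (a definition introduced here for illustration, which requires $w_1 \neq 0$), the subjective-norm term can be moved inside the sum over states:

```latex
\begin{align*}
BI(i_1)
  &= w_1 \sum_{n=1}^{N} P(s_n \mid a_1)\, U_{self}(s_n)
     + w_2\, U_{others}(a_1)
     + w_3\, P(a_1 \mid i_1) \\
  &= w_1 \sum_{n=1}^{N} P(s_n \mid a_1)\, U_{self}(s_n)
     + w_2\, U_{others}(a_1) \sum_{n=1}^{N} P(s_n \mid a_1)
     + w_3\, P(a_1 \mid i_1) \\
  &= w_1 \sum_{n=1}^{N} P(s_n \mid a_1)
     \left\{ U_{self}(s_n) + w\, U_{others}(a_1) \right\}
     + w_3\, P(a_1 \mid i_1)
\end{align*}
```

In this form the braces hold a combined utility of the agent and of others, while perceived self-efficacy still enters as a separate additive term.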
When the target behavior is considered as a dichotomous variable ($\{a_1$: performing the target behavior, $a_2$: not performing the target behavior$\}$), logistic regression is commonly used to predict the agent's intention. This corresponds to the assumption that the agent's intention-selection rule is based on a sigmoidal function, e.g., the logistic function (Luce, 1959; Sutton and Barto, 1998), so that $P(i_1)$ is a logistic function of $BI(i_1)$ (Equation 4).
FIGURE 2 | TPB. TPB is a typical model for behavior change, in which the BI for the target behavior is determined by three factors: attitude toward the behavior, subjective norm, and perceived self-efficacy. Attitude toward the behavior ($E[U_{self} \mid a_1]$) is determined by aggregating the products of each behavioral belief strength ($P(s_n \mid a_1)$) and the evaluation of each outcome ($U_{self}(s_n)$) (violet). Subjective norm ($U_{others}$) is determined by aggregating the products of each normative belief ($U_k(a_1)$) and motivation to comply ($m_k$) (green). Perceived self-efficacy is the belief about the probability of performing the target behavior successfully when the agent intends to perform it ($P(a_1 \mid i_1)$) (orange). BI (blue) is determined by the weighted sum of attitude toward the behavior, subjective norm, and perceived self-efficacy. Occurrence of the behavior (red) is a function of BI and actual self-efficacy ($P_{actual}(a_1 \mid i_1)$) (gray). The target behavior "Exercising for at least 20 min, three times per week for the next 3 months" is considered in this example.

The occurrence probability of the target behavior ($P(a_1)$) is a function of $P(i_1)$ and actual (not perceived) self-efficacy. As actual self-efficacy for the target behavior $a_1$ should be the objective probability of performing the behavior successfully when the agent intends to perform $a_1$, we denote it as $P_{actual}(a_1 \mid i_1)$. However, in many cases, actual self-efficacy is difficult to measure through questionnaires. In such cases, perceived self-efficacy is used as a proxy for actual self-efficacy. Then, the estimated occurrence probability of the target behavior is:

$$P(a_1) = P_{actual}(a_1 \mid i_1)\, P(i_1) \approx P(a_1 \mid i_1)\, P(i_1) \tag{5}$$

Here, note that the TPB questionnaire (Table 1) does not include any questions regarding the belief about the probability of achieving the target behavior ($a_1$) when the agent intends not to perform the behavior ($i_2$).
Calculating $P(a_1)$ without considering $P(a_1 \mid i_2)$ ($\approx P_{actual}(a_1 \mid i_2)$) is allowed when $P(a_1 \mid i_2)$ is assumed to be zero, which enables us to calculate $P(a_1)$ from $P_{actual}(a_1 \mid i_1)$ and $P(i_1)$ alone (cf. Equation 8). Thus, what researchers would like to predict in behavior change studies is $P(a_1)$, whose calculation requires $P(i_1)$, which in turn is based on $BI(i_1)$. Therefore, typical TPB questionnaires contain questions about $P(s_n \mid a_1)$, $U_{self}(s_n)$, $U_k(a_1)$, $m_k$, and $P(a_1 \mid i_1)$, to predict $P(a_1)$ (Table 1).
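The TPB prediction chain described in this section (questionnaire ratings, then BI, then $P(i_1)$, then $P(a_1)$) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: all ratings and weights are hypothetical, the logistic intercept is assumed to be zero, and perceived self-efficacy stands in for actual self-efficacy.

```python
import math

def attitude(p_s_given_a1, u_self):
    """E[U_self | a1]: behavioral beliefs times outcome evaluations."""
    return sum(p * u for p, u in zip(p_s_given_a1, u_self))

def subjective_norm(normative_beliefs, motivation_to_comply):
    """U_others(a1): sum over referents of m_k * U_k(a1)."""
    return sum(m * u for m, u in zip(motivation_to_comply, normative_beliefs))

def behavioral_intention(att, norm, self_efficacy, w1, w2, w3):
    """Weighted sum of the three TPB determinants."""
    return w1 * att + w2 * norm + w3 * self_efficacy

def logistic(x, intercept=0.0):
    """Logistic link; the intercept value is an assumption of this sketch."""
    return 1.0 / (1.0 + math.exp(-(x + intercept)))

# Hypothetical questionnaire-derived values.
att = attitude([0.8, 0.2], [1.0, 0.0])
norm = subjective_norm([1.0, 0.5], [0.6, 0.4])
perceived_self_efficacy = 0.7  # P(a1 | i1)

bi = behavioral_intention(att, norm, perceived_self_efficacy,
                          w1=1.0, w2=0.5, w3=1.0)
p_i1 = logistic(bi)
p_a1 = perceived_self_efficacy * p_i1  # assumes P(a1 | i2) = 0

print(round(bi, 3), round(p_i1, 3), round(p_a1, 3))
```

Note that nothing in this chain uses $P(a_1 \mid i_2)$ or any quantity conditioned on $a_2$, which mirrors the content of the TPB questionnaire.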

DECISION-THEORETIC MODEL OF BEHAVIOR CHANGE (DTM)
As we mentioned in the Introduction, some researchers recently tried to combine behavioral economics models with existing models for behavior change (Roberto and Kawachi, 2015) to improve the accuracy of the prediction of behavior. However, the existing models of behavior change make this combination difficult, because they are not consistent with EUT.
Here, we propose a new model, DTM, which is consistent with EUT. In DTM, we add the components of subjective norm and self-efficacy to the ordinary EUT. To do so, we introduce an intention set ($I = \{i_1$: intention to perform the target behavior, $i_2$: intention not to perform the target behavior$\}$), in addition to the state set ($S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$) and the action set ($A = \{a_1$: performing the target behavior, $a_2$: not performing the target behavior$\}$), which were already included in EUT (Figure 1C).
The occurrence of $i_h$ ($h = 1, 2$) is determined by expected utility ($E[U_{total} \mid i_h]$) in DTM. $E[U_{total} \mid i_h]$ is an aggregation of the products of the subjective probability of a state $s_n$ given an intention $i_h$, $P(s_n \mid i_h) = \sum_{j=1}^{2} P(s_n \mid a_j)\, P(a_j \mid i_h)$, and the total utility of a state ($U_{total}(s_n)$). We assume that total utility is a summation of the agent's utility and others' utility ($U_{total} = U_{self} + wU_{others}$), both of which are functions of state and behavior, where $w$ denotes the weight of $U_{others}$ relative to $U_{self}$ in calculating subjective utility. Thus, expected utility $E[U_{total} \mid i_h]$ is:

$$E[U_{total} \mid i_h] = \sum_{j=1}^{2} P(a_j \mid i_h) \sum_{n=1}^{N} P(s_n \mid a_j) \left\{ U_{self}(a_j, s_n) + w\, U_{others}(a_j, s_n) \right\} \tag{6}$$

Note that, in TPB, other individuals' utilities in subjective norm are functions of action, whereas the agent's utility in attitude toward the behavior is a function of state. Here, in DTM, we define both the agent's and other individuals' utilities as functions of action and state.
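The nested sum over actions and states that defines $E[U_{total} \mid i_h]$ can be written out directly. The sketch below assumes two actions and two states; every number is hypothetical, including a small action cost of 0.1 for exercising and an approval value of 0.5 that others attach to exercising regardless of outcome.

```python
def expected_total_utility(p_a_given_i, p_s_given_a, u_self, u_others, w):
    """E[U_total | i_h] as a sum over actions j and states n.

    p_a_given_i[j]    : P(a_j | i_h)
    p_s_given_a[j][n] : P(s_n | a_j)
    u_self[j][n]      : U_self(a_j, s_n)
    u_others[j][n]    : U_others(a_j, s_n)
    w                 : weight of U_others relative to U_self
    """
    total = 0.0
    for j, p_a in enumerate(p_a_given_i):
        inner = sum(
            p_s * (u_self[j][n] + w * u_others[j][n])
            for n, p_s in enumerate(p_s_given_a[j])
        )
        total += p_a * inner
    return total

# Hypothetical beliefs and utilities (rows: a1, a2; columns: s1, s2).
p_s_given_a = [[0.8, 0.2], [0.2, 0.8]]
u_self = [[0.9, -0.1], [1.0, 0.0]]    # exercising carries a cost of 0.1
u_others = [[0.5, 0.5], [0.0, 0.0]]   # others value the action itself
w = 0.5

p_a_given_i1 = [0.7, 0.3]    # intends to exercise, succeeds 70% of the time
p_a_given_i2 = [0.05, 0.95]  # intends not to, exercises anyway 5% of the time

eu_i1 = expected_total_utility(p_a_given_i1, p_s_given_a, u_self, u_others, w)
eu_i2 = expected_total_utility(p_a_given_i2, p_s_given_a, u_self, u_others, w)

print(round(eu_i1, 4), round(eu_i2, 4))  # 0.725 0.2375
```

Because the utilities are indexed by both action and state, an action cost or an outcome-dependent reaction from others can be expressed by changing single entries of `u_self` or `u_others`.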
To compare with TPB, we rewrite Equation 6 as follows:

$$E[U_{total} \mid i_h] = P(a_1 \mid i_h) \sum_{n=1}^{N} P(s_n \mid a_1) \left\{ U_{self}(a_1, s_n) + w\, U_{others}(a_1, s_n) \right\} + P(a_2 \mid i_h) \sum_{n=1}^{N} P(s_n \mid a_2) \left\{ U_{self}(a_2, s_n) + w\, U_{others}(a_2, s_n) \right\} \tag{6'}$$

Equation 3′ of TPB and Equation 6′ of DTM are different in the following five ways (Figures 1B,C):
(1) $E[U_{total} \mid i_1]$ is a kind of expected utility; $E[U_{total} \mid i_1]$ in DTM is naturally extended from $E[U_{self} \mid a_1]$ in EUT by adding the components of subjective norm and perceived self-efficacy. In contrast, $BI(i_1)$ in TPB cannot be considered as expected utility.
(2) DTM considers not only the expected utility given $i_1$ ($E[U_{total} \mid i_1]$), but also the expected utility given $i_2$ ($E[U_{total} \mid i_2]$), whereas TPB considers behavioral intention only for $i_1$ ($BI(i_1)$). This difference is important when we consider $P(i_1)$ and $P(a_1)$ later in this section.
(3) $U_{self}(a_j, s_n)$ and $U_{others}(a_j, s_n)$ in DTM are more flexible functions than $U_{self}(s_n)$ and $U_{others}(a_1)$ in TPB. TPB cannot consider cases in which the agent's utility depends on his/her action cost, or other individuals' utilities depend on the consequences of the agent's action.
(4) $E[U_{total} \mid i_1]$ in DTM considers the utility of the case in which the agent intends to perform the target behavior ($i_1$), but fails to perform it and instead performs an alternative action ($a_2$). However, $BI(i_1)$ in TPB cannot take this into account.
(5) Perceived self-efficacy ($P(a_j \mid i_h)$) is multiplied by expected utility given an action in DTM but is added to expected utility given $a_1$ in TPB.

FIGURE 3 | Bandura's schema. Perceived self-efficacy as well as outcome expectation are considered as determinants of human behavior. Perceived self-efficacy ($P(a \mid i)$) is the belief about the probability of performing the behavior successfully when the agent intends to perform it. Outcome expectation ($P(s \mid a)$) is the belief about the probability of a particular outcome, given the agent's target behavior.
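We assume that the intention-selection rule is based on the sigmoidal function, as with EUT (Luce, 1959; Sutton and Barto, 1998):

$$P(i_1) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 (E[U_{total} \mid i_1] - E[U_{total} \mid i_2]))\}} \tag{7}$$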
The difference between Equation 4 (TPB) and Equation 7 (DTM) is that $E[U_{total} \mid i_2]$ is explicitly considered in Equation 7, but not in Equation 4. This difference is not important when $E[U_{total} \mid i_2]$ is stable across subjects or contexts, because it is absorbed into a constant term. If $E[U_{total} \mid i_2]$ varies across subjects or contexts, which is a plausible assumption, it significantly affects $P(i_1)$. The estimated occurrence probability of the target behavior is:

$$P(a_1) = P_{actual}(a_1 \mid i_1)\, P(i_1) + P_{actual}(a_1 \mid i_2)\, P(i_2) \approx P(a_1 \mid i_1)\, P(i_1) + P(a_1 \mid i_2)\, P(i_2) \tag{8}$$

The difference between Equation 5 (TPB) and Equation 8 (DTM) is that Equation 8 explicitly considers the case in which the agent performs the target behavior despite the absence of an intention to do so. This difference is unimportant only if $P_{actual}(a_1 \mid i_2)$ and/or $P(i_2)$ are zero, because Equation 5 (TPB) and Equation 8 (DTM) are the same in this case. Thus, the occurrence probability of the target behavior is predicted by using these equations (Equations 6-8) in DTM. Therefore, DTM needs some additional questions in its questionnaires (Table 2).
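The effect of the extra term for unintended performance can be seen numerically. The sketch below reads the DTM intention rule as a logistic function of $E[U_{total} \mid i_1] - E[U_{total} \mid i_2]$ with parameters `beta1` and `beta0` (an assumption about its exact parameterization), and compares the two-term occurrence probability with the one-term TPB version. All input values are hypothetical.

```python
import math

def p_intend(eu_i1, eu_i2, beta1=1.0, beta0=0.0):
    """P(i1): logistic rule on the expected-utility difference."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * (eu_i1 - eu_i2))))

def p_act_two_terms(p_a1_given_i1, p_a1_given_i2, p_i1):
    """P(a1) including performance without intention (DTM)."""
    return p_a1_given_i1 * p_i1 + p_a1_given_i2 * (1.0 - p_i1)

def p_act_one_term(p_a1_given_i1, p_i1):
    """P(a1) when P(a1 | i2) is assumed to be zero (TPB)."""
    return p_a1_given_i1 * p_i1

# Hypothetical expected utilities and self-efficacy beliefs.
p_i1 = p_intend(eu_i1=0.725, eu_i2=0.2375)
full = p_act_two_terms(0.7, 0.05, p_i1)
truncated = p_act_one_term(0.7, p_i1)

# The gap equals P(a1 | i2) * P(i2) and vanishes when either factor is zero.
gap = full - truncated
print(round(p_i1, 3), round(full, 3), round(truncated, 3), round(gap, 3))
```

Holding `eu_i1` fixed and varying `eu_i2` changes `p_i1`, which is the sensitivity that a model without an explicit $E[U_{total} \mid i_2]$ term can only capture through its constant.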
To summarize, DTM is a natural extension of EUT, which accounts for behavior change.

AN EXAMPLE SHOWING THE SUPERIORITY OF DTM
Here, we focus on the fifth difference between Equations 3′ and 6′ in section Decision-Theoretic Model of Behavior Change (DTM), to assert the superiority of DTM over TPB. Whereas perceived self-efficacy is multiplied by the weighted sum of attitude toward the behavior and subjective norm in DTM (Equation 6′), it is added to these factors in TPB (Equation 3′), as we noted above. Let us think about the case of opening a tight jar lid. For the sake of simplicity, let us assume that there is no other individual present. The target behavior ($a_1$) is "straining the wrist enough to open the jar lid." Here, $i_1$ is "intention to strain the wrist enough to open the jar lid," $s_1$ is "the lid was opened," and $s_2$ is "the lid was not opened." In TPB, BI is determined by the following factors: (1) attitude toward the behavior, which is governed by the value of the contents of the jar to oneself; (2) subjective norm, which can be ignored in this case, because the absence of any other individual is assumed; and (3) perceived self-efficacy, which is the belief about the probability of straining the wrist enough to open the jar lid when one intends to do it. The estimated weights for attitude toward the behavior and for perceived self-efficacy are assumed to be positive in this case. Now, let us assume that this person injured his/her spinal cord and became totally paralyzed. Then, perceived self-efficacy would change to 0, but the attitude toward the behavior (or the subjective norm) would not change. Because BI of TPB is determined by the weighted sum of the attitude toward the behavior, the perceived self-efficacy, and the subjective norm (ignored here), TPB would predict that the person will have the intention to strain the wrist enough to open the jar lid, regardless of his/her inability to move, in proportion to the value of the contents of the jar. This prediction is unrealistic, and thus presents a counterexample for TPB.
In contrast, DTM can properly predict that BI is consistently zero regardless of the value of the contents, because the weighted sum of attitude toward the behavior (and the subjective norm) is multiplied by perceived self-efficacy (= 0), showing the superiority of DTM.
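The jar example can be made concrete with a small numerical sketch. The assumptions here are illustrative and not taken from the paper: straining opens the lid with probability 0.9, an unopened jar has utility 0, both TPB weights equal 1, subjective norm is omitted because nobody else is present, and self-efficacy before the injury is 0.9.

```python
def tpb_intention(attitude, perceived_self_efficacy, w1=1.0, w3=1.0):
    """Additive TPB rule; subjective norm omitted (no other individual)."""
    return w1 * attitude + w3 * perceived_self_efficacy

def dtm_expected_utility(p_a1_given_i1, eu_a1, eu_a2=0.0):
    """E[U_total | i1]: self-efficacy multiplies the utility of acting."""
    return p_a1_given_i1 * eu_a1 + (1.0 - p_a1_given_i1) * eu_a2

P_OPEN_IF_STRAIN = 0.9  # hypothetical P(s1 | a1)
jar_values = [1.0, 10.0, 100.0]

tpb_after_injury = []
dtm_before_injury = []
dtm_after_injury = []
for value in jar_values:
    eu_a1 = P_OPEN_IF_STRAIN * value  # attitude = E[U_self | a1]
    tpb_after_injury.append(tpb_intention(eu_a1, perceived_self_efficacy=0.0))
    dtm_before_injury.append(dtm_expected_utility(0.9, eu_a1))
    dtm_after_injury.append(dtm_expected_utility(0.0, eu_a1))

print(tpb_after_injury)   # still grows with the value of the contents
print(dtm_before_injury)  # grows with the value while the agent can act
print(dtm_after_injury)   # zero for every value once self-efficacy is 0
```

Under these assumptions the additive rule keeps rewarding a more valuable jar after the injury, whereas the multiplicative rule removes the jar's value from the intention entirely.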

DISCUSSION
In the present paper, we show that TPB could be considered as an attempt to improve the EUT's accuracy of predicting behavior change, by incorporating subjective norm and self-efficacy. Indeed, TPB has achieved great success, because it is a relatively simple model, and its three factors are actually effective in promoting behavior change (Sheeran et al., 2016). Applying TPB has allowed investigators to identify important psychological factors to understand, predict, and change human social behavior (Van Lange et al., 2011). Moreover, behavior change interventions applying TPB were actually effective in two-thirds of studies (Hardeman et al., 2002), indicating that TPB is appropriate for clinical application.
However, TPB has a serious problem. Because subjective norm and perceived self-efficacy are simply added to the standard expected utility in TPB, it is not consistent with EUT, and thus, cannot be connected with behavioral economics models.
To overcome this problem, we propose a new behavior change model, DTM, which includes the components of subjective norm and self-efficacy as a natural extension of EUT.
As DTM is consistent with EUT, it can be easily extended in several ways. First, DTM can handle intertemporal choices by using temporally discounted utility. In particular, hyperbolic discounting, which is well-studied in behavioral economics, is important for behavior change because it can express procrastination (Story et al., 2014). Second, DTM can be easily extended to a Markov model by introducing a Markov decision process (MDP) framework. Markov models are useful when the situation is continuous over time, and important events may happen more than once (Sonnenberg and Beck, 1993; Sutton and Barto, 1998). Because most current neural models of the reward system are based on MDP, this extension enables us to combine behavior change models with pharmacological models of aberrant behavior such as addiction (Redish, 2004; Rangel et al., 2008). Third, we simply defined $U_{total}$ by the weighted sum of $U_{self}$ and $U_{others}$ in the present paper, but other ways of formulating $U_{total}$ are possible when considering various types of social preferences, such as inequality aversion, guilt aversion, and Rawlsian preferences (Fehr and Krajbich, 2014). Fourth, DTM could be applicable to studies about morality (Crockett, 2013). In DTM, we introduced a distinction between action and intention into the EUT, and this distinction is an important characteristic of moral judgement (Cushman, 2008). Utility in DTM is suitable to represent moral values, because it could be a function of not only action and outcome, but also intention [i.e., $U_{self}(i_h, a_j, s_n)$, $U_{others}(i_h, a_j, s_n)$].
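To illustrate why hyperbolic discounting can express procrastination, the sketch below uses the common textbook form $V = A / (1 + kD)$ for a reward of amount $A$ at delay $D$. This parameterization, the discount rate `k`, and the reward sizes are assumptions of the example and are not specified in the present paper.

```python
def hyperbolic_value(amount, delay, k=1.0):
    """Discounted value under the assumed form V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def prefers_larger_later(front_delay, k=1.0):
    """Compare a small reward at front_delay + 1 with a large one at front_delay + 5."""
    small_soon = hyperbolic_value(50.0, front_delay + 1.0, k)
    large_late = hyperbolic_value(100.0, front_delay + 5.0, k)
    return large_late > small_soon

# Seen from far away, the larger, later reward is preferred...
far_choice = prefers_larger_later(front_delay=10.0)
# ...but the preference reverses once the smaller reward is imminent.
near_choice = prefers_larger_later(front_delay=0.0)

print(far_choice, near_choice)  # True False
```

Replacing $U_{self}$ in DTM with a discounted utility of this kind would let the same expected-utility machinery produce such preference reversals; how the discount rate should be measured is left open here.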
We hope that DTM leads to a better combination of existing models of behavior change and behavioral economics models.