A surprising lack of consequences when constraining language

shifts regarding the group, or motivational shifts, according to Bayesian analyses. Nor did we detect negative effects of language constraint among people who saw themselves as opposed to censorship. Discussion: Although free speech and respectful language remain a multifaceted social debate, our findings show that university students are willing to follow even completely contrived language directives when describing social identity groups and to do so without substantial discomfort or backlash against those groups.


Introduction
Many hotly-debated social issues of the 21st century involve the use of labels for social identity groups. Some of these issues are moral questions: for instance, what labels should be used for specific social identity groups (e.g., "Native American," "Aboriginal," vs. "Indigenous"), and who is to decide this? Assuming normatively approved labels can be determined, however, many at least partially empirical questions are also raised by these debates. What messages or interventions can and should be used to encourage or pressure people into using these labels? Can the mass usage of alternative terms (e.g., "differently abled" vs. "disabled" or "handicapped") shift attitudes toward the group being labeled, or will revised terms provoke a "euphemism treadmill" by which old attitudes are simply transferred to the new terms (Greer, 1971; Pinker, 1994)? What compliance strategies are likely to stimulate resentment (e.g., provoking concerns about "language policing" and "political correctness"; e.g., Haidt, 2016) vs. being accepted without resistance? We think psychological science has given surprisingly little direct attention to these empirical issues, and in the present work we attempt to provide a "lightning in the bottle" demonstration of some relevant processes.
Choices about labels have consequences for members of threatened social identity groups. For instance, more negative attitudes toward gay people are activated in heterosexual people who are exposed to derogatory labels (i.e., slurs) vs. non-derogatory labels for gay people (Carnaghi and Maass, 2007). On the other hand, concerns are sometimes raised that efforts toward "political correctness" might result in backlash effects (i.e., negative attitudes toward the protected group, driven by reactance or avoidance), or the "chilling" of free speech (e.g., Strauts and Blanton, 2015; Haidt, 2016; Read, 2018). Concerns about political correctness tap into underlying political and moral concerns that obviously transcend the present investigation, which attempts to address some specific empirical questions. We tested whether soliciting compliance in avoiding certain group labels generates hostility or backlash effects among university students.

Multiculturalism and diversity initiatives
Social psychologists have developed increasingly sophisticated techniques to reduce negative attitudes held toward social identity groups (Allport, 1954; Hornsey and Hogg, 2000; Kawakami et al., 2007; Dovidio et al., 2008; Page-Gould et al., 2008; Johnson et al., 2018) or encourage multiculturalism more broadly (Rios and Wynn, 2016). A broad literature examines how diversity can be increased, and the benefits of multiculturalism (Crisp and Turner, 2011).
However, psychologists are beginning to also probe how diversity and multiculturalism initiatives can provoke resistance and backlash effects. Pushback toward prejudice-reduction and pro-diversity initiatives has often been observed (e.g., Vertovec and Wessendorf, 2010; Saad, 2020). Psychological interventions often provoke unwanted, unintended consequences (Wilson, 2011; Peters et al., 2014). Interventions designed to reduce gender bias may accidentally increase gender bias (Caleo and Heilman, 2019). People may become angry and show negative attitudinal shifts when pressured to engage in behaviors favorable toward minoritized groups (Plant and Devine, 2001). Interracial interactions, intended to improve intergroup attitudes through contact (e.g., Pettigrew and Tropp, 2006), may sometimes produce negative cognitive and emotional experiences toward the target or other prejudicial thinking (Shelton et al., 2005; Richeson and Shelton, 2007; Legault et al., 2011; Cooley et al., 2019). We think that attempts to control people's language could also prompt unintended negative reactions.

Why language control compliance may cause issues
An important social change relevant to academic/institutional settings and the broader public is the movement to have people comply with using specific terms for specific social identity groups (Marks, 2014; Indigenous Corporate Training Inc., 2016; National Assembly of State Arts Agencies, 2020; American Psychological Association, 2022). For instance, Canadians are asked to say "indigenous people," not "native Americans," "Indians," or "aboriginal people" (Indigenous Corporate Training Inc., 2016). Concerns about politically correct speech, such as appropriate group labels, have been raised both historically and more recently (for a review, see Henderson, 2003). However, researchers seldom consider the possible barriers involved in securing people's compliance in using (or avoiding) target group labels, and the little work in this area generally focuses on exposure to blatantly negative group labels (i.e., slurs; Carnaghi and Maass, 2007; Croom, 2011; Jeshion, 2013), rather than more innocuous terms.
Empirical evaluation of strategies to change language usage remains an unresolved social problem. One theory often invoked in the political correctness debate is psychological reactance (Brehm, 1966; Brehm and Brehm, 1981). According to reactance theory, "threat to or loss of a freedom motivates [an] individual to restore that freedom" (Brehm and Brehm, 1981, p. 4). Reactance can have clear relevance to anti-prejudice or language constraint interventions, which may threaten some people's feeling of freedom to think, speak, and act freely toward members of threatened social identity groups (also see Chen et al., 2015; Munger, 2017). Reactance to control attempts may manifest in a variety of ways, such as reactant people seeking to learn more about a banned topic (Worchel et al., 1975), and negative cognitions, affect, attitudes, or behavioral intentions toward the prescribed behaviors (Dillard and Shen, 2005). Freedom threats may even be conceptualized as threats to one's sense of self (Graupmann, 2018) or group identity (Kachanoff et al., 2022).
If securing people's compliance in using group labels produces reactance-related threats in those targets, we might also anticipate especially positive (negative) reactions from people who are relatively supportive (unsupportive) of censorship efforts that favor diversity/multiculturalism efforts. That is, language constraints will promote the preferences of people who support censorship as a social strategy to advance their social goals (e.g., Ashokkumar et al., 2020; Clark and Winegard, 2020; Costello et al., 2022). To that end, in the present work, we considered whether people who support censoring language in the name of progressive values might have more positive reactions to our language control instructions.

When language control compliance may cause issues
Rather than being an invariant psychological response to any request for language compliance, people might only dislike interventions when they are accompanied by specific arguments or justifications. In university communities, for example, language compliance requests will often be accompanied by some guiding rationale ("do not use the word X, because..."). Anti-prejudice interventions may work most effectively when they focus on moral issues raised by prejudicial attitudes or behaviors, consistent with the "that's wrong" approach advocated by Johnson et al. (2018). Experimentally examining distinct framing techniques may produce interesting insights. For instance, people cannot abstractly judge which speech-acts they will consider offensive when they are actually exposed to them (Almagro et al., 2021). Similarly, which framing factors will shape people's willingness to abstain from using language (arbitrarily designated as) "offensive" may not be intuitively obvious. We considered three types of framing consideration.

Positive vs. negative reasons
An example of a positive reason is that complying would showcase one's multicultural values. An example of a negative reason is that failure to comply could harm the target group's mental health because exposure to such language is harmful. In many domains, positive and negative information has asymmetrical effects (Baumeister et al., 2001; Rozin and Royzman, 2001; Fredrickson, 2013), so we wanted to consider the possibility of differential consequences.

Justification
Second, we varied the type of moral justification provided for language constraints, from consequentialist to deontological. Consequentialist morals appraise actions by considering what good or bad consequences arise from those actions (i.e., "speak this way because then something good will happen"), whereas deontological morals appraise actions' inherent qualities (i.e., "speak this way because it is inherently good to do so"). Consequentialist rationales often are used to persuade people to follow language directives (e.g., National Assembly of State Arts Agencies, 2020; American Psychological Association, 2022). However, consequentialist arguments can have downsides. For instance, people are less likely to be judged as moral when they justify actions through a consequentialist (vs. deontological) lens (Everett et al., 2018), and people should be more resistant to an intervention imposed by an immoral source. Therefore, deontological reasons may be more effective at securing compliance, or avoiding deleterious psychological consequences for compliers.

Arbitrary motivations
Finally, we considered that providing no justification at all might produce distinct effects from providing any justification. For instance, reactance concerns and persuasive backfire may be increased when task instructions suggest the experimenter's persuasive intent (Wicklund et al., 1970; Brauer et al., 2012), so ironically language constraint might operate most effectively without rhetorical justification. However, people are also more willing to comply when provided with even vacuous justifications from a requester (Langer et al., 1978), which might suggest that arbitrary requests may be especially resisted or disliked.

The present research
To investigate the above ideas, we exposed participants to a novel paradigm in which they were to avoid using a set of completely commonplace group labels when writing brief essays about those groups. We focused on university students because universities are often intellectual and legal battlefields for disputes about free speech, group labels, and prejudice (Byrne, 1990; O'Neil, 1997) and are often viewed descriptively or normatively as places where societal change may be initiated (Marullo and Edwards, 2000). Specifically, we examined the psychological consequences of prohibiting particular labels for specific social identity groups (i.e., language constraints). Our decision to prohibit (i.e., use a proscriptive injunction) rather than encourage (i.e., prescriptive injunction) specific language use is important because past work suggests that proscriptions generate more resistance and legitimacy concerns than do prescriptions (Pavey et al., 2022). Additionally, we chose to prohibit words presently in common usage (as opposed to words already considered inappropriate) because people tend to generate more resistance to novel restrictions to their freedom (which seem contestable) as opposed to established freedom restrictions (which seem uncontestable; Laurin et al., 2012). Thus, by maximizing situational factors that seem to generate resistance, the present work represents a strong test of the hypothesis that language guidelines generate negative responses. Most likely, our instructions will prompt immediate compliance; therefore, we tested both:
H1. A weak reactance hypothesis such that language constraints will increase reactance and decrease comfort.
H2. A strong reactance hypothesis that language constraints will produce more negative attitudes and behavioral intentions and decrease willingness to comply with subsequent language directives.
Still, we think that compliance directives may not have these effects for the reasons noted earlier. Because hypothesizing a null is counter to the null hypothesis significance testing (NHST) approach, we include Bayesian analyses to determine if we accumulated meaningful evidence for the null hypotheses (H0: Compliance instructions do not produce the effects denoted as H1/H2).
We also suspected that H1/H2 might depend on people's beliefs about diversity-related censorship activities, suggesting an interaction effect:
H3. Language constraints may lead to the negative consequences listed under H1 and H2 only for individuals low in pro-diversity censorship beliefs.
Additionally, we had more exploratory interests in the following questions:
Q1. Does positive vs. negative framing affect compliance rates or downstream consequences of compliance?
Q2. Does deontological vs. consequentialist framing affect compliance rates or downstream consequences of compliance?
Q3. Does framing in general (vs. providing only "arbitrary" or no specific justification) affect compliance rates or downstream consequences of compliance?

Method

Overview of the samples and integrated dataset
Our four experiments used very similar procedures and methods (see verbatim materials in SOM-1), integrating to N = 997 (see Table 1 for an overview of the samples). All samples were composed of Canadian university students, primarily white, primarily women (84% women, 15% men, 0.2% non-binary, remainder PNA; M age = 19.5, SD age = 4.6; 75% White, 12% East Asian, 3% Black, 7% other, 3% PNA), participating for course credit. In all cases, we attempted to get participants to comply with talking about a target social identity group while avoiding specific labels for that group (e.g., write an essay about Black targets while avoiding words like "Black" and "African-American"). The banned words were ones that would be commonly used in our university at the time of data collection. Given that some of our tests require large sample sizes (e.g., Lakens et al., 2018), we decided to aggregate our data into an integrative data analysis (IDA; Curran and Hussong, 2009). The social group targeted for language constraints varied by sample (i.e., obese people in Sample 1, White or Black people in Samples 2-4), as did the forms of language constraint condition that were employed. Data/syntax are open at https://osf.io/vpm8a/.

Procedure

Phase 1: compliance request
Participants were initially introduced to the experiment's tasks: writing a few essays about a particular social group and answering some questionnaires. Before completing the writing tasks (Phase 2), they were randomly assigned to one of six between-participant conditions. In the Control (No Constraint) condition, no special instructions were given at this point. In all remaining (Constraint) conditions, however, participants were warned that they should "not use certain group labels when discussing the group... absolutely must avoid any slur language in this writing." For Sample 1 these words were "fat," "obese," "overweight," and "heavy." For the White Target conditions of Samples 2-4, these words were "white," "Europeans," and "Caucasians." For the Black Target conditions of Samples 2-4, these words were "black," "African-American," "African," "colored," and "person/people of color." However, each Constraint condition differed in terms of how the constraint was justified. In the Arbitrary Constraint condition, no further justification was given. In the Negative/Consequentialist condition, we told participants that inappropriate language "makes people think that it is normal to dislike that group," that "groups exposed to such inappropriate language may feel socially isolated or rejected," and that "when individuals see this sort of inappropriate language they are more likely to experience anxious or depressive episodes." In the Negative/Deontological condition, we told participants that using "inappropriate group labels is simply the wrong thing to do," and that "it is problematic to be disrespectful, cruel, and indecent-it is intrinsically wrong." We characterized such language as "bigotry," and claimed that "bigotry and rejection of others are inherently bad things." The remaining conditions modified the previous two conditions by using a positive rather than negative framing. In the Positive/Consequentialist condition, we told participants that using appropriate language generates positive attitudes toward the target group, that groups exposed to appropriate language feel socially welcomed and accepted, and that using appropriate language leads the speaker to have more positive attitudes toward that group. Finally, in the Positive/Deontological condition we stated that using appropriate language is "simply the right thing to do," that "it is an opportunity to be respectful, kind, and decent-it is intrinsically right," and that using appropriate language is an example of following "diversity," stating that "diversity and acceptance of others are inherently good things."
We did not collect demographics for all samples, but samples were drawn from the same university and demographics should therefore be very consistent. Any demographic information was collected at the end of the study. Ethnicity questions had fixed options which in some cases contradicted compliance instructions (e.g., "White / European"), but note these were positioned immediately before debriefing.
According to pilot testing on undergraduates drawn from the same population as the primary samples, on -point scales, African-Americans were seen as "suffering discrimination" (M = ., SD = .), as were overweight people (M = ., SD = .). White people were not seen as "suffering discrimination" (M = ., SD = .).
As we show in SOM-, White vs. Black as a target group did not affect our results.

Phase 2: writing tasks
Regardless of experimental condition, participants viewed a cluster of images showing four members of the target group, and reported what they would usually call people in that group using a textbox (following any rules imposed by Constraint conditions). Participants then wrote two paragraphs, each about the specific group they had just labeled. Our two writing prompts read as follows: "In this box, please write down your thoughts concerning your own personal experiences interacting with this [Sample 2-4: ethnic] group" (Personal Interactions Task), and "In this box, please write down your thoughts concerning how you think this [Sample 2-4: ethnic] group is treated in modern Canada" (Cultural Context Task). For each task, participants spent about 5 min writing. Thus, participants were being pressured into complying repeatedly across an extensive writing period.

Phase 3: reaction and moderator measures
Measures were filled out in the following order: (1) reactance emotions, (2) willingness for future compliance, (3) task comfort, (4) attitudes and behavioral intentions toward the target group in counterbalanced order, (5) motivations to control prejudice and censorship attitudes in counterbalanced order. Not all Samples included all measures, so degrees of freedom vary somewhat across tests. We summarize the measures briefly below, but see SOM-4 for more extensive descriptions.

Future compliance
Participants rated how willing they would be to continue with the language directives in the future (Constraint conditions), or how willing they would be to follow language rules if we imposed these rules on them (Control condition), from 1 (very unlikely) to 7 (very likely; M = 4.95, SD = 1.68).

Task comfort
Participants rated "How comfortable did these previous language-related tasks make you feel?" from 1 (very uncomfortable) to 7 (very comfortable; M = 4.33, SD = 1.72).

Behavioral intentions
We averaged four items evaluating positive behavioral intentions toward the target group (playing sports, working on a project, having a conversation, playing a game; α = 0.91; rated from 1 = Very Unlikely to 7 = Very Likely; M = 5.75, SD = 1.22).
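As an illustration of how a four-item composite and its internal consistency (Cronbach's α) can be computed, the following is a minimal sketch in Python. It is not the authors' analysis script; the function name and the ratings below are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of per-participant scores."""
    k = len(item_scores)
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in item_scores)
    # Variance of each participant's total score across items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical 1-7 ratings from five participants on four intention items.
items = [
    [6, 5, 7, 4, 6],  # playing sports
    [6, 4, 7, 5, 6],  # working on a project
    [7, 5, 7, 4, 5],  # having a conversation
    [6, 5, 6, 4, 6],  # playing a game
]

# Composite score per participant: the mean of the four items.
composite = [sum(scores) / len(scores) for scores in zip(*items)]
print(composite)
print(cronbach_alpha(items))
```

Alpha approaches 1 when the items rise and fall together across participants, which is the property the reported α = 0.91 summarizes.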

Debriefing
We debriefed participants and clarified that our specific language directives were contrived for the sake of the experiment. Because data were collected online, we do not have interview data with participants. However, the null results for reactance emotions and task comfort suggest that the paradigm was not particularly distressing for participants. Furthermore, no adverse events were reported for any of the studies.

Results
The following results are from the IDA which aggregates data from the four samples.

Compliance
Starting with the Control (no language constraint) conditions, 55% of Sample 1 participants referred to the targets using one or more of the labels that in the other conditions would be banned. The most frequent choice was "overweight." In contrast, in Sample 1, 2-6% (by condition) of Constraint condition participants used banned words. Constraint condition participants (told to avoid specific labels) were far less likely to use the banned words than Control participants (who were not told to avoid specific labels), F(4, 247) = 32.10, p < 0.001, ηp² = 0.34. In Samples 2/3/4, respectively, 87%/88%/82% of Control (no language constraint) participants used labels that would be banned in the Constraint conditions. In short, we succeeded in picking words that people conventionally use for these groups. In Samples 2/3/4, respectively, 27-33%/11-29%/13-34% (ranging by condition) of Constraint condition participants used banned words. Thus, constraint conditions greatly reduced those words' usage; Sample 2: F(4, 352) = 28.42, p < 0.001, ηp² = 0.24; Sample 3: F(4, 187) = 21.19, p < 0.001, ηp² = 0.31; Sample 4: F(2, 192) = 59.11, p < 0.001, ηp² = 0.38. These results support our prediction that compliance instructions would produce at least immediate behavior change. These findings also could be seen as a successful check for the compliance manipulation.
Compliant participants referred to White targets as "fair," "bright, light, happy," "English," "humans," "non-POC people," and other workarounds; and to Black targets as "Racialized people," "African," or "minority." Most participants used workarounds to avoid using banned terms, including "people" or "plus size," and interestingly some called the group "diverse" (presumably because the pictures of obese individuals were racially diverse, and a mix of men/women). Many participants referred to "this ethnic group," "this ethnicity," "humans," and similar generic terms. We observed minimal reactance. A few Sample 1 participants used presumably facetious responses (e.g., "athletes"). A very small number of participants explicitly objected, for instance stating "white... I [sic] not offensive" in a White Target condition. In short, the Language Ban conditions did not make the task impossible for participants, although language use often became vague or awkward, and most participants simply substituted alternative words.
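The partial eta-squared values reported for the compliance tests can be cross-checked from the F statistics and their degrees of freedom, using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies that identity to the four reported tests; the function name is our own and is not part of the authors' syntax.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from a reported F statistic and its dfs."""
    scaled_f = f_value * df_effect
    return scaled_f / (scaled_f + df_error)

# F statistics and degrees of freedom as reported in the text.
reported_tests = {
    "Sample 1": (32.10, 4, 247),
    "Sample 2": (28.42, 4, 352),
    "Sample 3": (21.19, 4, 187),
    "Sample 4": (59.11, 2, 192),
}

for sample, (f_value, df_effect, df_error) in reported_tests.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{sample}: partial eta squared = {eta:.2f}")
```

Rounded to two decimals, the recomputed values agree with the effect sizes given in the text (0.34, 0.24, 0.31, and 0.38).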

Other effects (downstream consequences)

Effects of language constraint condition
For all remaining variables, we worked through a common set of analyses (full statistics reported in Tables 2 and 3). First, we wanted to examine if our language compliance conditions caused negative reactions in participants. One-way ANOVA tests determined if any of the specific language constraint conditions produced unique effects compared to the rest (or vs. the Control condition). As Table 2 reveals, the effects for most variables were not significant; the only exception was task comfort, which we discuss briefly below. In sum, we did not find support for either the weak (H1) or strong (H2) reactance hypotheses.

Task comfort
We found significant evidence that task comfort differed across conditions. Because we did not have specific predictions about which particular cells might differ other than the Constraint conditions presumably reducing task comfort vs. Control, we used Bonferroni corrections to control for multiple testing when examining post-hoc comparisons between cells. The Arbitrary (no justification) condition only significantly differed from positive/deontological (M diff = −0.84, SE = 0.26, p = 0.017). We are not inclined to interpret this effect any further, particularly because of the subsequent Bayesian analysis reported below.

Bayesian analysis
Despite the results reported above, finding null effects in NHST does not provide clear support for the null hypothesis (see Wagenmakers et al., 2011). Furthermore, we were unclear how seriously to take the single significant effect on task comfort. Accordingly, we checked if we had accrued meaningful support for a null hypothesis. Therefore, we performed a Bayesian analysis focused on determining the Bayes Factor associated with support for the null over the alternative hypothesis using the anovaBF function from the BayesFactor package (Morey and Rouder, 2022) in R (R Core Team, 2022). We used the package's prior probability defaults because these defaults were specifically developed to be "general, broadly applicable" (Rouder et al., 2012, p. 356). Conventional standards suggest that Bayes Factors between 1 and 3 provide "anecdotal" support for a hypothesis, 3-10 provide "moderate" support, 10-30 offer "strong" support, 30-100 "very strong," and >100 "extreme." In Table 2, all BFs are expressed as BF01, meaning that they express how many times more likely the data are under the null over the alternative hypothesis (i.e., larger numbers indicate greater support for the null hypothesis). In sum, the preponderance of evidence moderately to greatly favored the null hypothesis, H0. Additionally, the various framing conditions (i.e., Q1-Q3) did not make a substantive difference.
Bayesian analyses supported the null hypothesis, usually strongly. In total, one Bayes Factor was in each of the "moderate" and "strong" evidence ranges, three in the "very strong," and seven in the "extreme" range. These analyses increase our confidence that despite creating substantial compliance among participants (at least two-thirds of participants always complied when directed, as discussed previously), our compliance request did not make them uncomfortable or feel reactance, did not lead to attitude or behavior-intentional backlash effects against the group protected by the language compliance instructions, did not shift people's motivations to control prejudice to a more external basis, and did not alter people's beliefs about the (lack of) value in using censorship to protect vulnerable social identity groups.
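To make the verbal evidence categories concrete, the following Python sketch maps a Bayes Factor onto the labels listed above and converts between BF10 and BF01 (which are reciprocals of one another). It is only an illustrative helper, not part of the BayesFactor package; the function names are hypothetical, and treating each range as including its lower bound is our assumption, since the conventions do not specify how exact boundary values are handled.

```python
def bf01_from_bf10(bf10):
    """BF01 (null over alternative) is the reciprocal of BF10."""
    return 1.0 / bf10

def evidence_label(bayes_factor):
    """Verbal label for a Bayes Factor expressed as >= 1 in its favored direction."""
    if bayes_factor < 1:
        raise ValueError("Express the Bayes Factor in the favored direction (>= 1).")
    # Upper bounds of each category; lower bounds are inclusive by assumption.
    categories = [
        (3, "anecdotal"),
        (10, "moderate"),
        (30, "strong"),
        (100, "very strong"),
    ]
    for upper_bound, label in categories:
        if bayes_factor < upper_bound:
            return label
    return "extreme"

# Hypothetical values, for illustration only.
for bf01 in (2.0, 5.0, 20.0, 50.0, 150.0):
    print(bf01, evidence_label(bf01))
```

For example, a hypothetical BF10 of 0.25 corresponds to BF01 = 4, which would count as "moderate" support for the null under these conventions.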

Overall effects of constraining language
Perhaps by including so many conceptually diverse language constraint conditions, we overlooked a specific contrast of interest: whether imposing language constraint instructions at all had aversive effects on people compared to the control condition which imposed no such rules. In Supplementary material (SOM-3), we test this possibility by comparing the Control group against an aggregation of all the Compliance groups, examining this contrast with independent-samples t-tests, Bayesian t-tests, and equivalence tests (Lakens, 2017; Lakens et al., 2018). Briefly, all significance tests were non-significant, and Bayesian tests produced substantial evidence favoring the null for all variables. Furthermore, equivalence tests run against two benchmarks (a theory-derived effect size of d = 0.41 and the so-called "small" effect size of d = 0.20; Cohen, 1988; also see Richard et al., 2003) provided us with a statistically significant basis to say that if constraining language had effects on any of our diverse outcomes, these effects may be ignored as unimportantly small by two distinct standards.
Of course, the Bayesian result contradicts the NHST testing in the case of comfort because Bayesian testing supported the null and NHST supported the alternative hypothesis. Data that reach statistical significance in NHST while also supporting the null over the alternative hypothesis are not necessarily surprising; for example, see Wagenmakers et al. ( ). It is most easily explained by observing that the effect was very small and detected because of our large sample size.
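The logic of an equivalence test (two one-sided tests, TOST) against a standardized bound such as d = 0.20 can be sketched as follows. This is a simplified illustration rather than the procedure reported in SOM-3: it uses a large-sample normal approximation in place of the t distribution, and the function name, group sizes, means, and standard deviations are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def tost_p_value(mean1, sd1, n1, mean2, sd2, n2, d_bound):
    """TOST p-value for equivalence within +/- d_bound (in Cohen's d units).

    Uses a normal approximation, which is an assumption that is only
    reasonable for large samples.
    """
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    raw_bound = d_bound * pooled_sd           # bound on the raw mean difference
    std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    std_normal = NormalDist()
    # Test 1: reject "diff <= -raw_bound" when diff is well above the lower bound.
    p_lower = 1 - std_normal.cdf((diff + raw_bound) / std_error)
    # Test 2: reject "diff >= +raw_bound" when diff is well below the upper bound.
    p_upper = std_normal.cdf((diff - raw_bound) / std_error)
    # Equivalence is claimed only if both one-sided tests reject.
    return max(p_lower, p_upper)

# Hypothetical example: identical group means on a 1-7 comfort scale.
p_equiv = tost_p_value(4.3, 1.7, 500, 4.3, 1.7, 500, d_bound=0.20)
print(p_equiv < 0.05)
```

A small TOST p-value indicates that the observed difference is reliably inside the chosen bounds, which is the sense in which an effect can be declared "unimportantly small" relative to a benchmark.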

Moderation by censorship beliefs
Finally, we wondered if our usage of university students may have steered our results toward placidity. Universities including the one at which we collected data promote diversity, and students might therefore find even strongly-worded and unusual requests to avoid certain language choices acceptable and normal. We therefore tested the possibility that despite a general absence of concerning effects of language constraints, effects might at least emerge among the subset of participants least in favor of censorship in the name of social identity goals. We, therefore, tested if any effects of language constraint (vs. control) were moderated by censorship beliefs (H3). We analyzed this research question using standard OLS regression but also the Bayes Factor associated with each parameter (contrast-coded main effect of constraint, centered main effect of censorship beliefs, and their interaction) tested against an intercept-only model. All results are reported in Table 3. The BF is always reported as >1 to ease comparison, so we note whether the BF supports the null (i.e., was BF01) or the alternative hypothesis (i.e., was BF10).
For 10 of 11 interaction terms, we found a non-significant effect and moderate or greater support for the null over alternative hypothesis using Bayesian analysis. Thus, even participants who saw themselves as relatively unsupportive of censorship to benefit progressive goals were indifferent to our language constraint intervention. In sum, most variables did not support H3.
In the case of future compliance, however, we detected a significant interaction effect, also corroborated by "very strong" support for the alternative over null hypothesis in Bayesian testing. This interaction is also tracked in Figure 1. As the figure illustrates, the effect of experimental condition (constraint conditions in blue, control condition in red) shifted based on participants' pro-diversity censorship beliefs. Participants lower in pro-diversity censorship (left side of figure) anticipated less compliance after undergoing a constraint manipulation whereas participants higher in pro-diversity censorship (right side of figure) anticipated significantly greater compliance after undergoing a constraint manipulation. A Johnson-Neyman analysis indicated that Constraint manipulations (vs. Control) prompted significantly less anticipated compliance among the 14% of participants most anti-censorship, and significantly more anticipated compliance among the 42% of participants most favoring pro-diversity censorship.
Furthermore, we found that people more supportive of censorship in the name of social justice (i) were also more willing to comply with future language control instructions, (ii) had more positive opinions of the target groups, (iii) had more positive behavioral intentions toward the target groups, (iv) were more intrinsically motivated to control for their prejudices, and (v) had more identified-regulation motivation to control prejudice. A final effect suggested that censorship beliefs might be favorably related to (vi) avoidance of the target words, but the Bayesian analysis suggested this effect was so weak as to be more consistent with the null than the alternative hypothesis. These relationships are generally in line with our expectations, and help to establish that our novel measure of censorship attitudes showed sensible patterns of validity. That is, someone who has positive views of promoting diversity with authoritarian means would be likely to be personally okay with complying with new language rules, report favorable attitudes and behavioral intentions toward social groups identified as needing such protection, and have more positive and morally based desires to deal with prejudice.
One may wonder if censorship beliefs truly were a moderator rather than a consequence of the compliance manipulation. Because Hence and found a high (r = .) three-week test-retest reliability of their scale, we assumed these beliefs have substantial trait-like variance and would probably not change based on our manipulation. As we earlier showed (Table ), the manipulation indeed did not change these beliefs.
Frontiers in Social Psychology frontiersin.org

FIGURE 1
Willingness to follow future language constraints is influenced by language constraint condition and pro-diversity censorship support.

General discussion
In summary, our four experiments covered diverse variables, targets, analyses, and message types. However, the key finding is straightforward: university students willingly followed arbitrary and frustrating language directives simply because we told them to. Participants readily adopted our new language conventions even when we gave no rationale whatsoever. There were no negative emotional or attitudinal shifts, problematic motivational styles, or adverse consequences observed across nearly a thousand participants in various analyses (frequentist and Bayesian). People's beliefs about censorship had surprisingly little impact, except for those supporting censorship to promote diversity, who showed increased willingness to follow our future directives. Thus, participants complied with the act of abandoning frequently used words, effectively capturing the phenomenon of novel changes to group labels commonly seen in modern society.

Implications
On the one hand, the present results might be considered concerning. The vast majority of undergraduates obediently followed nonsensical instructions to avoid evaluatively neutral words without resistance. Avoiding terms like "white" or "Caucasian" because we arbitrarily banned them as "offensive" might be seen as problematic. Students' extreme malleability could be seen as a lack of critical discernment regarding reasonable vs. unreasonable language requests.
On the positive side, our data suggest real-world language constraints need not always lead to psychological issues among university students, even for those opposed to censorship. We intentionally created a strong situation to provoke backlash, including proscribing language, banning conventional words, and demanding compliance toward a group perceived as not needing support. Despite this setup, we observed minimal problematic reactions. Therefore, it's unlikely that real-world interventions, which offer alternatives, target disfavored language, and protect minoritized groups, would cause issues.
Obviously, a large literature on reactance and autonomy threats indicates that people are often resistant or overtly hostile to attempts made to change their speech, attitudes, and behaviors (Brehm, 1966; Worchel et al., 1975; Brehm and Brehm, 1981; Dillard and Shen, 2005; Chen et al., 2015; Munger, 2017; Graupmann, 2018). Therefore, it is worth asking which features of the present work led to conditions in which people placidly tolerated a novel (and arguably absurd) demand to change their language.
One consideration is that university students are very frequently exposed to novel compliance requests about appropriate language usage (Roberts, 2017; Macnamara, 2022; for example news stories, see Anderson, 2022; Price, 2023), and are sometimes even paid to confront inappropriate language on campus (Coughlan, 2020). Thus, compliance in this domain may be a well-formed habit for students. When people become accustomed to a social norm involving a behavior such as language change, they may become highly open to additional revisions in "approved terminology." In essence, having grown accustomed to one specific form of compliance, subsequent compliance requests may be highly successful (i.e., the foot-in-the-door technique; Freedman and Fraser, 1966; Burger, 1999; Pettigrew and Tropp, 2006), a phenomenon that our paradigm perhaps exploited. Past scholars have suggested that foot-in-the-door may work because consistent compliance is motivated by self-enhancement: one's past behaviors, being one's own, are perceived as good, making the present (similar) behavior also seem good (Cialdini and Goldstein, 2004). This may help to explain the lack of any negative psychological reactions from participants' (continued) compliance. However, since past compliance research seldom assesses psychological reactions directly, our work contributes to this area by measuring whether compliant participants felt any psychological resistance.
Another consideration is that people have strong desires not to be bigoted, and not to seem bigoted (e.g., against racial minorities; Devine et al., 2002; Legault et al., 2007). Compliance in our paradigm (i.e., avoiding words that we stated were offensive) might be perceived as consistent with either motivation. That is, whether a given participant was primarily motivated to merely avoid seeming bigoted, or whether they actually wished not to be bigoted, compliance was presumably a safer choice than resistance.

Limitations and future directions
Stakes and framing issues
Our paradigm can be considered low-stakes for participants in some respects. That is, they did not have to interact with other people while following language directives, and they were free to disregard the instructions outside of the experiment (which we made clear in the debriefing but would have been true regardless). We cannot entirely dismiss the possibility that they complied because they simply did not care very much about the task, but we do have a few counterpoints. First, the interaction of censorship beliefs by constraint on participants' intentions to follow future language directives suggests that the intervention was psychologically real enough that people's core values around speech and censorship polarized their responses in terms of behavioral intentions to comply later. Behavioral intentions, such as our willingness-to-comply measure, often predict behavior reasonably well (Webb and Sheeran, 2006), so we consider this finding to be noteworthy. Second, even if participants' compliance represented a superficial normative conformity to instructions (i.e., "I'll comply to not make waves"), normative influences often provoke changes in internal construal (Griffin and Buehler, 1993), so the high compliance rates might nonetheless be consequential.
One possibility is that our framing manipulations were ineffective because of their particular wording choices. The manipulations do have ecological validity in that they were directly modeled on real-world directives from companies (e.g., Indigenous Corporate Training Inc., 2016) and relevant organizations (e.g., media reference guides from GLAAD; e.g., GLAAD, 2024) that often use positive/negative and consequentialist/deontological framing and/or arbitrary justifications when arguing for appropriate speech. Thus, they validly represent the sort of messages commonly distributed to the public and capture the spirit of how such messages are circulated. Furthermore, what often matters in compliance paradigms is not the specific wording of a justification but the mere provision of "any" reason (Langer et al., 1978). Nonetheless, more carefully optimized versions of our framings might produce different results. One interesting difference between our stimuli and the source material is that these sources frequently give multiple reasons to follow language constraints. Future research could examine whether compliance or downstream consequences differ when multiple justifications are combined in the same intervention.
Might others resist more (or even less)?
Our sampling was narrowly focused on university students in a Western context. Future research might tackle this limitation by examining a few types of heterogeneity. First, past research suggests that cultures vary in the extent to which they cultivate a need to follow one's preferences. Savani et al. (2008) found that the association between preferences and choices was very pronounced among North Americans, but was attenuated among Indians. Assuming that people's conventional labels for groups can be considered a personal preference, examination of non-Western cultures that privilege personal preferences less could lead to higher compliance rates when securing agreement to proposed language changes (or, given our very high overall compliance, greater compliance to more strongly worded or invasive forms of the intervention).
Second, we examined beliefs about pro-diversity censorship because we wanted an individual difference moderator that was maximally likely to alter reactions to our language constraints. However, future work should draw samples that include a broader political spectrum including political conservatives, who often lodge objections against politically correct speech and might therefore react more negatively to language constraints (e.g., Fish, 1994; Wilson, 2020). Relatedly, our sample was primarily young adults, who more often are strongly politically left or progressive (Electoral Calculus, 2019). Thus, our effects might have been stronger than if we had sampled older adults, given that progressive people might be more sympathetic to the ostensible intentions of our language compliance paradigm. Indeed, Proulx et al. (2022) showed that "mandated diversity" (which could include language constraints) is one of the beliefs that distinguishes these two left-wing groups (i.e., traditional liberals and political progressives).
Third, our samples were also mostly women. Laboratory research sometimes finds higher compliance rates from women vs. men (e.g., Tom and Granié, 2011) across miscellaneous domains, but such differences are usually modest (Eagly, 1983; Grosch and Rau, 2016). So gender probably does not account for our findings or limit their generalizability. Still, replicating our studies with more representative proportions of older conservative men and especially in a non-university context would be expected to change our baseline compliance rates.

Collective autonomy threats
In our experiments, participants were isolated individuals exposed to language compliance requests. However, one can also imagine contexts in which a group of people might be exposed to a compliance request to use or avoid specific group labels. Interestingly, restricting groups' ability to communicate can result in decreased wellbeing for the people experiencing such constraints (Kachanoff et al., 2019, 2020, 2022). Therefore, shifting our research to a context of group discussion, in which the compliance request might be seen as an imposition on the group's autonomy, might generate collective autonomy threats and therefore stimulate more negative reactions. Researchers can thus continue to explore what determines whether people will comply with vs. resist language directives.

Conclusion
Public debates on free speech, language constraints, political correctness, etc., surpass the scope of a single research article. We aim to test claims from popular and academic sources on these matters. Our goal is not to take a stance on language constraints. The current findings represent an initial investigation into language control's downstream consequences. We hope these results spark more interest and provide evidence-based insights into heated social questions.
TABLE Samples used to construct the integrative data analysis.
TABLE Effects of specific constraint condition vs. control condition on all study outcomes.
TABLE Effects of (any) constraint condition vs. control on study outcomes, moderated by pro-diversity censorship beliefs.