Walking the Plank: An Experimental Paradigm to Investigate Safety Voice

Noort, Mark C.; Reader, Tom W.; Gillespie, Alex

doi:10.3389/fpsyg.2019.00668

METHODS article

Front. Psychol., 02 April 2019

Sec. Organizational Psychology

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.00668

Mark C. Noort*

Tom W. Reader

Alex Gillespie

Department of Psychological and Behavioural Science, London School of Economics and Political Science, London, United Kingdom

The investigation of people raising or withholding safety concerns, termed safety voice, has relied on report-based methodologies, with few experiments. Generalisable findings have been limited because: the behavioural nature of safety voice is rarely operationalised; the reliance on memory and recall has well-established biases; and determining causality requires experimentation. Across three studies, we introduce, evaluate and make available the first experimental paradigm for studying safety voice: the “Walking the plank” paradigm. This paradigm presents participants with an apparent hazard (walking across a weak wooden plank) to elicit safety voice behaviours, and it addresses the methodological shortfalls of report-based methodologies. Study 1 (n = 129) demonstrated that the paradigm can elicit observable safety voice behaviours in a safe, controlled and randomised laboratory environment. Study 2 (n = 69) indicated it is possible to elicit safety silence for a single hazard when safety concerns are assessed and alternative ways to address the hazard are absent. Study 3 (n = 75) revealed that manipulating risk perceptions results in changes to safety voice behaviours. We propose a distinction between two independent dimensions (concerned-unconcerned and voice-silence) which yields a 2 × 2 safety voice typology. Demonstrating the need for experimental investigations of safety voice, the results found a consistent mismatch between self-reported and observed safety voice. The discussion examines insights on conceptualising and operationalising safety voice behaviours in relationship to safety concerns, and suggests new areas for research: replicating empirical studies, understanding the behavioural nature of safety voice, clarifying the personal relevance of physical harm, and integrating safety voice with other harm-prevention behaviours. Our article adds to the conceptual strength of the safety voice literature and provides a methodology and typology for experimentally examining people raising safety concerns.

Introduction

The term safety voice describes the behaviour of raising, or withholding, safety concerns to prevent physical harm from hazardous situations (e.g., Tucker et al., 2008). Across organisational (e.g., healthcare, energy), family (e.g., transport, DIY), and leisure contexts (e.g., high risk sports), promoting the act of raising safety concerns can reduce people's exposure to hazards (e.g., medicine dispensation, dangerous driving, high-altitude climbing without proper gear), and prevent physical harm (Anicich et al., 2015; Manias, 2015). The absence of speaking-up, also termed safety silence (Barzallo Salazar et al., 2014), has been implicated in catastrophes such as the 1986 Challenger disaster (Moorhead et al., 1991) and the 2010 Deepwater Horizon oil spill (Reader and O'Connor, 2014), and is estimated to be involved in 25% of aviation accidents (Tarnow, 1999; Bienefeld and Grote, 2012).

Due to the difficulty of observing safety voice in safety-critical situations, academic safety voice publications tend to present data obtained through report-based data (e.g., surveys, focus groups, interviews, vignettes; Noort et al., submitted) in which individuals or their seniors report on behavioural responses to previously held or imagined safety concerns. Yet, it remains unclear whether data from reports is reflective, explanatory, and predictive of safety voice behaviours. Alternative approaches are required to study the conditions and ways through which people raise or withhold safety concerns, and to address this, we propose and test the first experimental paradigm for investigating safety voice. Through investigating the occurrence of safety voice behaviours in a laboratory setting, and the challenges in assessing these, we aim to establish a methodology for (i) observing the behavioural nature of safety voice; (ii) reducing the methodological reliance on memory and imagination; and (iii) advancing knowledge on the factors that predict safety voice.

Safety Voice: The Need for an Experimental Approach

The term “safety voice” is used as a broad label to, confusingly, encompass a behaviour and its counterpart: safety voice (i.e., raising safety concerns) and safety silence (Van Dyne et al., 2003; Okuyama et al., 2014; Manapragada and Bruk-lee, 2016; Morrow et al., 2016). “Safety voice” often relates to raising safety concerns, which is the act of speaking-up about safety issues, through informal or formal communication channels, to a variety of targets (e.g., management, co-workers, the public), with the intention to mitigate harm from a situation perceived to be dangerous (Tucker et al., 2008). Through doing this, people communicate safety issues with the aim of creating a shared perception of the risk and, ultimately, avoiding the danger (Okuyama et al., 2014). Safety silence, the “non-voicing” type of safety voice, is defined as the active withholding of safety concerns (e.g., Okuyama et al., 2014), and is thus different from the simple absence of speaking-up: this can follow from not having safety concerns (i.e., “unconcerned silence”).

The concept of safety voice emerged from the literature on employee voice and silence (Van Dyne et al., 2003; Morrison, 2011, 2014), and appears similar. Yet, voice behaviours (in the broadest sense) can be distinguished based on message content (e.g., Morrison, 2011; Liang et al., 2012), and the safety voice literature is characterised by a narrower concern (i.e., limited to prohibiting harm from safety issues), broader application (i.e., beyond organisational environments), more severe outcomes (e.g., fatalities), and has established different antecedents across levels of analysis (e.g., expected impact of harm, safety knowledge, workload, national culture; Noort et al., submitted). The message content of safety voice relates to the avoidance of harm based on perceived risks, and arguably types of harm may be distinguished: the prevention of physical (e.g., injuries, accidents), psychological (e.g., bullying, harassment), social (e.g., ostracism, unpleasant interactions) or ethical harm (e.g., loss of autonomy; Marshall, 1996). These issues are important to safety voice researchers and practitioners as they can contribute to unsafe outcomes (e.g., bullying can create a poor safety culture), yet physical harm may be easiest to operationalise (i.e., it is closer to a hazard, less ambiguous, easiest to manipulate), and other types of harm may occur beyond (potential) hazards.

Researching safety voice for academic or practice-based purposes is complex due to the elusive and sensitive nature of the phenomenon. Safety voice is a spontaneous response to hazards occurring in natural environments (e.g., wobbly stepladders, incorrect aircraft atmospheric pressure settings), and systematic behavioural observations can provide valuable insights into the dynamic social and physical context in which people raise safety concerns (Reiss, 1971; Mulhall, 2003; van Schagen and Sagberg, 2012; Rydenfält et al., 2015), real-time patterns of behaviour (e.g., attention; Waller and Kaplan, 2016; Lappi et al., 2017), demographic variations (Pérez-Tejera et al., 2018) or how people feel and act when they speak-up without having to rely on post-hoc reports (e.g., Mastrofski et al., 1998; Murphy and Dingwall, 2007), and may reveal stronger effects (Brodin et al., 2016). Yet, within natural environments, it is difficult to (i) observe short-lived and spontaneous behaviours that may not occur frequently (Mastrofski et al., 2010) in a resource efficient way (i.e., many resources are needed to capture brief moments of speaking-up/remaining silent; Reiss, 1971), (ii) record behaviours in a standardised way (e.g., across unsafe situations), (iii) assess the riskiness of a situation and whether people are withholding a safety concern (or did not understand the gravity of the situation), or (iv) ensure participants are not changing their natural behaviour (Nichols and Maner, 2008). Cockpit voice recordings are a notable exception to these limitations of naturalistic observations, but to date they have received limited empirical study in terms of safety voice (cf. U. Fischer and Orassanu, 2000).

To overcome the challenges of observing safety voice, practice-based investigations (e.g., inquiries, accident investigations; Rogers, 1986; Francis, 2013, 2015) and the vast majority of academic investigations into safety voice (i.e., a systematic review indicated 76% of academic publications; Noort et al., submitted) utilise methodologies that obtain data from participant reports on whether they or their supervisees raised or withheld safety concerns. For example, through participants providing statements during inquiries (e.g., Francis, 2015), stating their imagined response to a vignette scenario (Schwappach and Gehring, 2014c), recalling scenarios in which they held a safety concern and communicated this to others (e.g., Schwappach and Gehring, 2014d), or completing survey scales that elicit agreement with statements about imagined or generic scenarios (e.g., “I chose to remain silent when I have concerns about patient safety”; Delisle et al., 2016; Gkorezis et al., 2016). Applications of these methodologies for academic and non-academic purposes have enabled the identification of lay rationales for safety voice, contributing factors to major incidents, cross-sectional comparisons (e.g., across organisational departments), and testing of interventions to alter lay perceptions of the likelihood, with practitioners supplementing academic conclusions through providing better access to people involved in incidents, subject-matter experts, and faster publication of lessons learned (when conclusions are published).

However, there are limitations in the use of report-based methodologies to investigate safety voice. Reports have limited applicability for addressing situational factors (e.g., personal relevance of risk, group dynamics, previous history of raising safety concerns) and mechanisms (e.g., decision-making on risk) that can shape safety voice, and perhaps paradoxically, request people to speak up about whether they remained silent. Reports on safety voice are always at least one-step removed from the actual behaviour of raising or withholding safety concerns, are over-reliant on imagining or recalling behaviours, and cannot provide predictive insight into how safety voice relates to antecedents and outcomes. Accordingly, the validity of the research remains uncertain, and alternative methodologies focussing on actual behaviour are required to validate findings, and evidence interventions. The use of experiments in related domains (e.g., bystander intervention; P. Fischer et al., 2011) suggests these methods can provide a way to overcome the unique challenges of studying safety voice in hazardous situations. We thus propose that the shortfalls of safety voice methods (summarised in Table 1) can be overcome through the development of an experimental methodology that: (i) captures the behavioural nature of safety voice; (ii) avoids the reliance on memory and imagination; and (iii) explores the relationship to other variables as potential causes.

Table 1. Methodological shortfalls, needs, and experimental solutions for the investigation of safety voice.

The Behavioural Nature of Safety Voice

Research on safety voice has emerged due to recognition that, in high-risk situations, raising concerns is critical to avoiding accidents. Case study investigations have revealed acts of raising and withholding of safety concerns as critical determinants of harm in dangerous situations (e.g., Moorhead et al., 1991; Cocklin, 2004), and the phenomenon is highly behavioural. It typically involves an individual (e.g., an employee, patient, concerned stakeholder) having a concern about a safety issue, and then raising it with another party (e.g., supervisor, doctor, colleagues) in order to prevent harm, or holding back from raising the concern altogether (silence). Yet, and despite the recognised importance of raising safety concerns for avoiding accidents (and silence in allowing accidents; Moorhead et al., 1991; Tarnow, 1999; Francis, 2013; Reader and O'Connor, 2014), investigations into this phenomenon have frequently assumed that reports correspond to real-world behaviour, and are subject to the same mechanisms that drive safety voice (Del Boca and Noll, 2000).

This is problematic because of: (i) the often-observed gaps between reports and actual behaviour (e.g., Sheeran, 2002); (ii) the lack of behavioural data upon which to base findings and interventions (Weathington et al., 2010); and (iii) the low fidelity of actions and context (i.e., operationalisations do not correspond to the behaviour and risky environment; Stoffregen et al., 2003). Accordingly, it remains unclear to what extent safety voice behaviours differ from report-based data and should be observed directly, in a standardised way (i.e., reports may not acquire behavioural data; Shortfall 1), or conceptualised, operationalised, and measured as emerging from clear hazards that cause safety concerns (i.e., safety concerns have not been measured alongside safety voice behaviours; Shortfall 2). Establishing this is important for generating accurate baseline data on safety voice (e.g., the average rates of people that are concerned about a hazard and speak-up or remain silent), clarifying the relationship between presented hazards and the extent to which these cause concerns, and for generalising and predicting safety voice behaviours.

However, to date, 76% of the safety voice literature (Noort et al., submitted) has focussed on willingness to raise safety concerns in general (e.g., agreement to generic questionnaire items), post-intervention changes in safety voice, or the extent of safety voice in response to presented hazards without measuring safety concerns (yet for a safety concern item, see: Schwappach and Gehring, 2014c). For example, high-fidelity training simulations (e.g., Hanson, 2017) have specified safety voice as a trainable behaviour, whilst only measuring changes in safety voice in pre- and post-training questionnaires, and studies that have exposed participants to (perceived) hazards such as a senior person engaging in unsafe acts (e.g., Barzallo Salazar et al., 2014; Aubin and King, 2015) or medical emergencies (Reime et al., 2016) have assumed that such hazards should trigger safety concerns (yet do not measure this). Furthermore, where observational data on safety voice has been obtained, measurements have included safety voice into higher level codes (e.g., “team cooperation”; Hughes et al., 2014; Reime et al., 2016), focussed on a tendency to speak-up or remain silent without measuring safety concerns (Kolbe et al., 2012, 2014), assumed knowledge about hazards or their presentation elicited safety concerns (Barzallo Salazar et al., 2014), or presented multiple hazards at once (Hodges, 2018). To our knowledge, no studies have investigated the relationship between observed levels of safety voice and reported safety voice, or to measured safety concerns.

This is important, because safety voice is a highly contextualised behaviour: it is assumed to occur in response to the perception of safety being threatened within a particular context (e.g., cockpit, operating theatre, production line) that can be highly ambiguous (e.g., contrasting information, multiple hazards) and complex (e.g., March and Olsen, 1975). Without collecting data on perceptions of risk within a given context one cannot (i) compare across hazardous situations (e.g., threats to patient and aviation safety; Tamuz and Thomas, 2006), (ii) make assumptions about why someone may have remained silent (i.e., unconcerned silence vs. withholding of safety concerns due to fear of reprisals), or (iii) ascertain whether voice occurred due to concern or precaution (i.e., unconcerned voice). Whilst self-report studies can provide insight on general tendencies for safety voice, insights on how safety concerns elicit safety voice behaviours remain minimal (cf. Schwappach and Gehring, 2014c), and behavioural studies have not measured the risk perceptions of the participants being observed.
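The distinction drawn above between concerned and unconcerned voice and silence can be made concrete with a short classification sketch. This is only an illustration, not the authors' coding scheme: the `Observation` record, the `classify` and `tabulate` helpers, the cell labels "concerned voice" and "concerned silence", and the sample data are all hypothetical, and the sketch assumes that both the safety concern and the voice behaviour have already been coded as yes/no.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """One participant, coded on two binary dimensions (hypothetical record)."""
    concerned: bool  # participant held a safety concern about the hazard
    spoke_up: bool   # participant was observed raising the issue


def classify(obs: Observation) -> str:
    """Place one observation in a cell of the 2 x 2 grid."""
    if obs.concerned:
        return "concerned voice" if obs.spoke_up else "concerned silence"
    return "unconcerned voice" if obs.spoke_up else "unconcerned silence"


def tabulate(observations: Iterable[Observation]) -> Counter:
    """Count how many observations fall in each cell."""
    return Counter(classify(o) for o in observations)


# Made-up example data, for illustration only.
sample = [
    Observation(concerned=True, spoke_up=True),
    Observation(concerned=True, spoke_up=False),
    Observation(concerned=True, spoke_up=False),
    Observation(concerned=False, spoke_up=False),
    Observation(concerned=False, spoke_up=True),
]
counts = tabulate(sample)
for cell, n in sorted(counts.items()):
    print(f"{cell}: {n}")
```

Without the `concerned` field, the two silence cells (and the two voice cells) collapse into one, which is the ambiguity the paragraph above describes.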

To study similar phenomena in other fields, experimenters have designed standardised situations for eliciting participant behaviour: for example bystander interventions (for a meta-analysis see: P. Fischer et al., 2011) or defiance/resistance to authority (Milgram, 1963; Miller et al., 1995; Kaposi, 2017). Within the field of voice more generally, experiments have been used to investigate employee voice for volunteering non-safety related information (Morrison et al., 2015). What is common to these studies is that they create a high-fidelity illusion of an emerging problem that requires a behavioural response (e.g., helping a person falling victim to verbal abuse in a bystander scenario; P. Fischer et al., 2006) without endangering participants. Their benefit is that they allow for a behavioural phenomenon to be investigated in a highly controlled environment, with observations then being contextualised to specific scenarios.

To investigate safety voice, a similar approach would be beneficial, with participants engaging in standardised situations that create a safety concern that can be addressed through speaking-up. This is challenging because participants cannot be exposed to genuine physical harm and, to avoid observer effects and study naturalised behaviour (Nichols and Maner, 2008), participants should not be aware that their decisions on safety voice are being observed (Shortfall 3). These issues can only be addressed through designing scenarios that manipulate perceived levels of safety (i.e., hazards that elicit a concern, and a need to intervene), not actual risks, while measuring safety concerns and ensuring participants remain naïve to study goals through deception procedures (Weathington et al., 2010). In particular, designing plausible cover stories is important: in the absence of these, invalid data may emerge because participants (i) deduce the hazard is fabricated; or (ii) believe (correctly) that researchers would need to comply with ethical standards that would prevent the scenario.

In summary, an experimental paradigm is required to investigate safety voice in a controlled, standardised, and generalisable way. A key property of any such paradigm is that it elicits observable safety voice behaviours (i.e., both raising and withholding concerns) through manipulating perceived risk and ascertaining safety concerns (i.e., as opposed to exposure to real physical harm), with deception procedures ensuring that participants are naïve to study intentions, and thus their behaviour is natural.

The Reliance on Memory and Imagination

Insight on safety voice has largely been generated through recalled or imagined (in)action during hazardous instances. Whilst practice-based inquiries have investigated actual incidents (Rogers, 1986; Francis, 2013), typically, it is assumed that participants are accurate in remembering and generalising past behaviours (e.g., Schwappach and Gehring, 2014d), or can imagine how they would respond in a safety-related situation (Schwappach and Gehring, 2014c). These data are then used to explain the factors that influence safety voice (e.g., Nembhard et al., 2015), to describe its occurrence (Tucker et al., 2008), and predict future outcomes (Blanco et al., 2009). Yet, the validity of this approach is not self-evident, and correlations are often low (Reiss, 1971), with participants in report-based studies having been long-shown as unable, or unwilling, to provide accurate data (Bartlett, 1932; Podsakoff and Organ, 1986). Memories are influenced by a limited ability to recall situations: behaviour can be activated by causes outside of conscious awareness at the time of the behaviour such as scents, posters or semantic primes (Aarts et al., 2008; Custers and Aarts, 2010). Furthermore, distances in time, space, or person (e.g., CEOs reporting whether staff in remote locations raised safety concerns for a system introduced the previous year) can further erode data accuracy, and recalling and imagining behaviours is subject to subject-matter expertise and cognitive biases (e.g., availability heuristic; Schwarz et al., 1991). That is, participants may lack knowledge on what constitutes speaking-up, or be unwilling to accurately report safety silence: reports are constructed based on individual attitudes and perceived social norms regarding safety voice (Bartlett, 1932); individuals may experience dissonance between their ideal self-image as able to speak-up and admitting to safety silence (Baumeister, 1982); and social desirability biases half of survey and interview findings (van de Mortel, 2008). For example, desires to appear a good and ethical employee (or effective manager), may bias participants toward reporting speaking-up over safety silence, especially when harmful outcomes occurred in serious or obvious safety situations.

Moreover, recalling and imagining safety voice provides limited scope for exploring the dynamic context in which it occurs. Safety voice surveys, interviews, and vignettes typically aim to increase realism through recall of previously experienced hazardous scenarios or the presentation of scenarios validated by subject matter experts. However, these scenarios remain limited because (i) the hazard environments they present will usually differ in some way from reality (i.e., which hinders recall; Schwabe and Wolf, 2009); (ii) the dynamic of a situation is not present (e.g., task pressures on the participant); and (iii) there are no immediate consequences for participants, or safety, at the time of data collection. This means, as shown in other research paradigms (Milgram, 1974; Blass, 1999), a gap may exist between reports and behaviour, and addressing this is important for establishing the triggers of safety voice (e.g., hazard perception), and the contextual factors (e.g., interactions between people and situations) that determine voice: or, indeed, silence.

The above factors potentially erode the accuracy of safety voice data collected through report-based methods (Shortfall 4), which undermines the validity of conclusions assumed from data (Bagozzi et al., 1991), and more specifically, how safety voice is assumed to be operationalised in risky situations. Addressing this is important for establishing the triggers of safety voice, and the contextual factors (e.g., interactions between people and situations) that determine the behaviour. An experimental paradigm focussed on eliciting safety voice can address this limitation through facilitating observations of safety voice (e.g., at the time of data collection, or through video), ensuring these are reliably assessed (e.g., using inter-coder reliability for the extent to which an individual raised safety concerns), with participant post-hoc reports being matched to behavioural data. To achieve this, and undertake meaningful statistical analyses, safety voice experiments need to elicit both safety voice acts (i.e., raising a concern) and silence. Floor (i.e., near-complete silence) and ceiling effects (i.e., near-complete voice) can bias estimates of the behaviour (Shortfall 5), with information about change (e.g., through interventions) being lost at the extreme ends of the scale through data censoring (i.e., relevant data falling beyond the scale end-point; Cox and Oakes, 1984). Though statistical procedures are available (McBee, 2010), a successful experimental paradigm should produce sufficient statistical variance and a moderate degree of speaking-up and silence (i.e., a 50–50 split). Thus, an experimental approach enables direct observations of safety voice behaviours, and provides scope for statistical analyses that can evidence higher construct validity.
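As an illustration of the reliability and variance checks described above, the sketch below computes an agreement statistic for two coders and the observed rate of speaking-up. It assumes that each coder assigns a binary code per participant (1 = raised a concern, 0 = silent) and uses Cohen's kappa as one possible inter-coder reliability index; the text above does not name a specific statistic, and the helper names and data are hypothetical.

```python
from typing import Sequence


def cohens_kappa(coder_a: Sequence[int], coder_b: Sequence[int]) -> float:
    """Chance-corrected agreement between two coders on the same participants."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must rate the same, non-empty set of cases.")
    n = len(coder_a)
    # Proportion of cases on which the two coders gave the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal proportions.
    labels = set(coder_a) | set(coder_b)
    expected = sum(
        (list(coder_a).count(label) / n) * (list(coder_b).count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def voice_rate(codes: Sequence[int]) -> float:
    """Share of participants coded as speaking up (1 = voice, 0 = silence)."""
    return sum(codes) / len(codes)


# Made-up codes from two hypothetical coders for eight participants.
coder_a = [1, 1, 0, 0, 1, 0, 1, 1]
coder_b = [1, 1, 0, 1, 1, 0, 0, 1]

kappa = cohens_kappa(coder_a, coder_b)
rate = voice_rate(coder_a)
print(f"kappa = {kappa:.3f}, voice rate = {rate:.3f}")
```

A voice rate close to 0 or 1 would signal the floor or ceiling effects discussed above, whereas a rate near 0.5 leaves room to detect change in either direction.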

The Relationship With Other Variables

Report-based methodologies typically collect data on safety voice and other variables simultaneously (e.g., in the same survey), using populations that are not randomised. This limits interpretation of the factors that determine or follow safety voice and silence behaviours.

Investigations using reports provide limited insights into causal relationships between safety voice and antecedents and outcomes (Shortfall 6). Yet, to build interventions, safety voice measures need to establish and replicate causal relationships. Antecedents and outcomes have been linked with safety voice and silence, and evidence suggests that interventions can successfully alter reported levels of safety voice. For example, safety silence increases with perceived social risks (e.g., ramifications of speaking up; Bickhoff et al., 2016), differences in safety knowledge (e.g., Schwappach and Gehring, 2014b), hierarchical power relations (e.g., Seiden et al., 2006), and, conversely, training on why and how to speak up reduces silence (Johnson and Kimsey, 2012; Delisle et al., 2016; Kulig and Blanchard, 2016; Hanson, 2017). Yet, such observations tend to be correlational rather than causal in nature. Additionally, controlled manipulations of safety voice antecedents through vignettes (Schwappach and Gehring, 2014c; Anicich et al., 2015; Aubin and King, 2015) or interventions (Habyarimana and Jack, 2011; Hanson, 2017) are scarce and tend to rely on indirect data rather than behavioural observations.

Furthermore, reports on safety voice may be subject to structural confounds (i.e., variables that are not of interest but covary with independent variables and provide alternative explanations of results; Goodwin, 2008) that may emerge from contextual variables that are introduced through sampling (Shortfall 7; e.g., junior doctors needing longer to accrue subject-matter expertise in part of the included research contexts). To establish valid conclusions, measures need to minimise the influence of confounds and minimise alternative explanations of relationships between antecedents and safety voice and silence. Yet, report-based methodologies have sampled within similar populations (e.g., oncology departments, medical students; Schwappach and Gehring, 2014a; Delisle et al., 2016), and across different populations (e.g., healthcare, construction, retail; Manapragada and Bruk-lee, 2016), and both sampling practices can be problematic because unmeasured and uncontrolled characteristics of contexts (e.g., workload; Nembhard et al., 2015) can provide alternative explanations of patterns in safety voice. Addressing this is important, and a need exists to minimise the influence of unwanted contextual confounds through applying random sampling procedures.

Hence, a need remains to establish methodologies that can address the relationships between safety voice and other variables. The optimal way to achieve this is through safety voice experiments. These can manipulate antecedents (i.e., enabling causal conclusions), randomise participants (i.e., randomising confounds across the groups to eliminate structural influences), and limit participants' influence on hazard mitigation to a choice on whether to speak up. Critical to an experimental paradigm is that participants should not be able to mitigate physical harm through other means than speaking-up: a third outcome variable is created when alternative mitigations are possible (Shortfall 8). This means that, when participants have a safety concern, safety silence can be determined through absence of safety voice. The field experiment by Barzallo Salazar and colleagues (Barzallo Salazar et al., 2014) showed how surgeon communication style predicts medical students' tendency to speak up, yet it did not assess safety concerns and thus cannot distinguish concerned and unconcerned silence. Moreover, because relationships between psychological variables may not be reliable over time (Shortfall 9; Gergen, 1973), a need remains for available experimental protocols that enable the direct replication and falsification of findings (Earp and Trafimow, 2015) in laboratory settings.
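The randomisation step mentioned above can be sketched as a simple balanced allocation of participants to conditions. This is a minimal sketch rather than the procedure used in the studies: the `assign_conditions` helper, the participant identifiers, and the two condition labels are hypothetical, and a fixed seed is assumed only so that the allocation can be reproduced and audited.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


def assign_conditions(
    participant_ids: Iterable[Hashable],
    conditions: Sequence[str],
    seed: Optional[int] = None,
) -> Dict[Hashable, str]:
    """Shuffle participants, then deal them out to conditions in turn.

    Shuffling makes the allocation independent of sign-up order; dealing in
    turn keeps group sizes within one participant of each other.
    """
    if not conditions:
        raise ValueError("At least one condition is required.")
    rng = random.Random(seed)  # local generator, so global state is untouched
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


# Hypothetical example: ten participants, two illustrative conditions.
conditions = ["self walks on plank", "other walks on plank"]
allocation = assign_conditions(range(1, 11), conditions, seed=42)
group_sizes = Counter(allocation.values())
print(group_sizes)
```

Because allocation depends only on the shuffled order, participant characteristics that were not measured are spread across groups by chance rather than by any systematic rule.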

The Current Article

We propose the first experimental paradigm for investigating safety voice in laboratory environments, and establish and evaluate it across three studies in order to ensure the protocol meets the nine requirements reported in Table 1 that address the shortfalls of current safety voice methodologies. Through doing this, we aim to advance safety voice research by (i) enabling a behavioural approach, (ii) moving away from a reliance on memory and imagination, and (iii) supporting the investigation of causal relationships between safety voice and other variables, which can be used as a basis for intervention.

Below, we describe the “Walking the plank” paradigm that we have developed for investigating safety voice. We then report on the three studies used to refine and iterate the paradigm, alongside the observations about safety voice yielded from these studies.

The “Walking the Plank” Paradigm

Our proposed paradigm for assessing safety voice, the “Walking the plank” paradigm, introduces a decision-point for participants in which they are faced with a hazard (a plank with the potential to break when walked on), and need to decide to either raise their safety concern (and experience any consequences of safety voice) or remain silent and let the situation run its course (with potential harmful implications for victims of the hazard). The paradigm's title is a reference to the naval practice of coercing victims to walk off a plank, plunging into the open sea and certain doom. The parallel is in the fact that perpetrators felt absolved of responsibility because the victim ostensibly killed themselves (i.e., for onlookers, it was an act of safety silence rather than murder). Our Walking the plank paradigm is generic, and its realistic perceived consequences and randomisation of participants provide for a confound-free assessment of safety voice that enables generalisable conclusions. Before settling on a viable scenario, we considered and abandoned four hazardous scenarios for the experimental investigation of safety voice: crossing a busy road (i.e., the real risk was considerable), faking a terrorist threat (i.e., too politically sensitive; likely to upset participants), interacting with loose electric wiring (i.e., the hazard could be mitigated by the participant through alternative means than safety voice such as unplugging the equipment), and ordering participants to provide approval for future hazardous experiments (i.e., difficult to ascertain risk perceptions; no immediate consequences at time of data collection).

The final scenario involved a person walking across a plank with a perceived low weight limit in the context of an alleged creativity task (the cover story). We chose this hazard because we could manipulate the perception that the plank might break (by having a bendy plank and stating a weight limit) while using a plank that was actually safe. Furthermore, it enables experimental control of variables of interest (e.g., self or other walking on the plank), safety knowledge (i.e., provided information regarding the maximum load of the plank), a plausible cover story (i.e., participation in a creativity task to evaluate and test creative uses of wooden materials), evaluative mindsets (i.e., participants evaluated aspects of the task), standardisation of the hazard (i.e., consistent materials and research assistants), testing of risk perceptions and safety concerns (i.e., perceived maximum load of the plank and the person sitting/walking on it), a straightforward and resource efficient replication by others, and a systematic observation of the linguistic nature of safety voice (this is beyond the scope of the current article). In this article we show that this paradigm meets our nine criteria.

To test the scenario, we iterated it across three studies. Our goal was for the paradigm to meet the nine requirements (see Tables 1, 2) of an effective safety voice experiment. Demonstrating and reporting on this process is important for (i) enabling the effective application of the Walking the plank paradigm (e.g., it highlights potential challenges for future research), (ii) supporting open science (i.e., protocol histories enable more direct replication; it acknowledges safety voice experiments are challenging and that the final version emerged from addressing this) and (iii) supporting future research on safety voice (i.e., it illustrates how amendments to the paradigm can be made and evaluated).

Table 2. Protocol characteristics of Studies 1, 2, and 3.

Over the course of three studies (their characteristics are summarised in Table 2), we illustrate that the Walking the plank paradigm meets the requirements for safety voice and silence experiments. In brief, in study 1 we demonstrate that the paradigm can elicit safety voice behaviours in a safe, controlled and randomised laboratory environment. In study 2 we refine the protocol and demonstrate that it is possible to elicit safety silence. In study 3 we further refine the protocol to enable sufficient risk perceptions and explore the nature of safety voice behaviours.

Study 1

The aim of study 1 was to establish the protocol for the Walking the plank paradigm (initially “sitting on the plank”), and provide a first evaluation. Within the guise of a creativity task, participants experienced a perceived hazard designed to elicit safety voice behaviours (i.e., being asked to sit on a plank with a risk of breaking under heavy load). The goals of study 1 were to (i) test whether the paradigm could sufficiently elicit safety voice behaviours in response to potential physical harm from breaking the plank; (ii) present a perceived, not actual hazard; (iii) observe safety voice directly; (iv) apply participant randomisation and deception procedures; and (v) introduce the experimental manipulation of variables (i.e., minimising harm, hazard presentation, hazard awareness, deception, victim identity) for determining safety voice.

Method

Protocol

A 2 (safety: unsafe vs. control) × 2 (victim: participant vs. research assistant) design was employed. Participants were invited to a study about “creativity” and allocated to study conditions using double-blind and random procedures. The study consisted of three stages. First, participants completed a 5-min “creativity task” in which they had to design creative uses of a pinewood plank (L: 120 cm, W: 20 cm, H: 1.8 cm) and four blocks of wood. The instruction read: “In this room you find a plank and four pieces of wood. In the box below, write down how you could use a plank and four pieces of wood. Try to be creative and think of as many solutions as you can. You have 5 min.” Second, in an interaction with a research assistant, the participants were instructed to undertake each idea and rate its feasibility and creativity, but were informed that they would test the previous participant's ideas. These were in fact a standard set (seesaw, shelving, door, juggling, chair/bench, slide) that included a hazardous idea (i.e., “chair/bench”). Upon re-entering the room, the research assistant stated: “The next stage involves testing these ideas for two things: feasibility and creativity. However, your ideas will be tested by the next participant, and now the ideas of the previous participant are tested.” Finally, participants completed an electronic questionnaire (including manipulation checks for hazard awareness and naivety to study hypotheses, and unpresented exploratory variables), after which they received a full debrief.
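The allocation step of such a 2 × 2 design can be sketched as follows. The article states only that allocation was random and double-blind; the blocked scheme, the function name `allocate` and the seed are assumptions made for illustration, not the authors' procedure.

```python
import random
from itertools import product

# Factor levels taken from the design described above.
SAFETY = ("unsafe", "control")
VICTIM = ("participant", "research assistant")


def allocate(n_participants, seed=None):
    """Assign participants to the four cells using shuffled blocks of four.

    Blocked randomisation is an assumption of this sketch; it keeps the
    cell sizes within one participant of each other.
    """
    rng = random.Random(seed)
    cells = list(product(SAFETY, VICTIM))
    allocation = []
    while len(allocation) < n_participants:
        block = cells[:]      # one copy of every cell
        rng.shuffle(block)    # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]


if __name__ == "__main__":
    for pid, cell in enumerate(allocate(8, seed=1), start=1):
        print(pid, cell)
```

To preserve blinding in practice, the resulting list would need to be generated and held by someone who does not interact with participants; that detail is outside this sketch.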

To present the hazard and elicit a behavioural response, the instruction for the creativity task either included a note on the maximum load of the plank (i.e., “Please note: the plank can carry a maximum load of 45 kg/99 lbs/7.1 stone”; unsafe condition) or included no additional note (control condition). Furthermore, a broken version of the plank in the room reinforced this information. In reality, the plank was able to hold at least 125 kg. When testing the previous participant's creative ideas, the participant was prompted by the research assistant to place the plank across two chairs (their location marked discreetly on the floor) with a gap for a third chair between them. The research assistant then either made clear their intention to test the feasibility of the bench by sitting on it (e.g., “Okay, let me test this”) or requested the participant to sit (e.g., “Could you please demonstrate?”). The emphasis of the protocol was to observe any subsequent speaking-up or silence behaviour. The protocol concluded with the participant completing a questionnaire.
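For replications that adjust the stated weight limit, the three units in the note can be kept consistent with a small conversion helper. This is a minimal sketch; the function names are illustrative and not part of the published protocol.

```python
KG_PER_LB = 0.45359237   # exact definition of the international pound
LB_PER_STONE = 14        # one stone is 14 pounds


def kg_to_lb(kg):
    """Convert kilograms to pounds."""
    return kg / KG_PER_LB


def kg_to_stone(kg):
    """Convert kilograms to stone."""
    return kg_to_lb(kg) / LB_PER_STONE


if __name__ == "__main__":
    limit_kg = 45  # limit stated in the unsafe condition of study 1
    print(round(kg_to_lb(limit_kg)), round(kg_to_stone(limit_kg), 1))  # 99 7.1
```

The same helpers give 93 lbs and 6.6 stone for the 42 kg limit used in study 2.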

Ethical approval was obtained for all studies from LSE's research ethics committee (#000540), and informed consent was required from participants before commencing. To comply with data regulations, anonymous data storage to enable future research was included as a separate question.

Participants

A total of 129 participants (85 females, 98 students) were recruited from a pool including students and the general public. Participants varied in age (M = 26.57, SD = 7.56) and weight (M = 64.81 kg, SD = 14.41). On a 5-point Likert scale (with 1 = low), participants indicated they had little expertise on timber (M = 1.67, SD = 1.03) or whistleblowing legislation (M = 1.48, SD = 0.83), and safety voice did not correlate with demographic variables (i.e., student status, gender, age, socioeconomic status, class, education, expertise on timber/whistleblowing, nationality, language). One participant was dropped from analyses because the protocol was not followed.

Measures

Manipulation checks

Perceived risk was calculated from two items in the questionnaire that followed the scenario (i.e., the participant's own weight in kilograms minus their estimate of the plank's maximum load). This measure reflects that the plank's maximum load poses no safety issue unless a person sits on the plank. One participant's estimate of the plank's maximum load (i.e., 292 kg) was removed because Cook's distance identified the response as an outlier (i.e., for the effect of the safety condition on risk perception; Cook's D = 0.50). The questionnaire also asked whether participants noticed anything odd during the study.
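The perceived risk score is a simple difference, which can be sketched as below. The function name and the example values are hypothetical; only the subtraction itself comes from the description above.

```python
def perceived_risk(sitter_weight_kg, estimated_max_load_kg):
    """Perceived risk in kg: weight of the person sitting on the plank
    minus the participant's estimate of the plank's maximum load.

    Positive values mean the sitter is judged to be heavier than the
    plank can carry; negative values mean the plank is judged to hold.
    """
    return sitter_weight_kg - estimated_max_load_kg


if __name__ == "__main__":
    # Hypothetical participant: weighs 70 kg and estimates a 45 kg limit.
    print(perceived_risk(70, 45))  # 25
```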

Safety voice

A direct observational measure of safety voice was used. Safety voice (1) was coded if, before the chair/bench was tested, the participant questioned whether testing the bench was a good idea and/or suggested that alternative action might be more appropriate (e.g., “Did the instruction not state a maximum of 45 kg?”; “This would be feasible for a child, not for adults”). Otherwise the participant's behaviour was recorded as “no voice” (0). Through discussing examples, research assistants were trained to recognise whether statements were intended to prevent a situation in which someone sat on the plank and might break it. When research assistants were unsure how to code participants' statements, the first author made a final decision by watching video recordings.

Prohibitive employee voice

Three items from Liang et al. (2012) were adapted to the laboratory environment to explore overlap with safety voice (on a 5-point Likert scale, with 5 indicating strong agreement): “I pointed out problems when they appeared, even if that would hamper relationships with others”; “I advised others against undesirable behaviours that might hamper the task”; “I highlighted problems that might cause serious issues.”

Results

Manipulation Check

The paradigm's safety manipulation created a perception that sitting on the plank would break it (i.e., weight difference between person sitting and plank's maximum load ≥ 0 kg). The perceived maximum load of the plank was 13.96 kg lower in the unsafe condition (M = 48.84 kg, SE = 2.97), F(1, 127) = 4.39, p = 0.04, η² = 0.03, observed power = 0.55. The perceived risk for the unsafe condition (M = 19.60 kg, SE = 3.26) was non-zero, t(59) = 6.00, p < 0.001, higher than the control condition (M = −1.32 kg, SE = 5.97), F(1, 127) = 7.72, p = 0.006, η² = 0.06, observed power = 0.79, and led 81% of participants in the unsafe condition (95% CI: 71–91%) to think the plank would break, t(61) = 15.94, p < 0.001. Illustrating successful deception, no participant guessed the true nature of the study.
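As a rough consistency check on the reported test against zero, a one-sample t statistic is the mean minus the null value, divided by the standard error. The sketch below applies this to the rounded mean (19.60 kg) and standard error (3.26) reported for the unsafe condition; the function name is illustrative.

```python
def one_sample_t(mean, standard_error, null_value=0.0):
    """t statistic for testing a mean against a fixed null value."""
    return (mean - null_value) / standard_error


if __name__ == "__main__":
    # Rounded values reported for perceived risk in the unsafe condition.
    print(round(one_sample_t(19.60, 3.26), 2))  # 6.01
```

The result of about 6.01 is in line with the reported t = 6.00; the small difference is what one would expect from rounding of the inputs.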

Safety Voice

The safety manipulation successfully elicited safety voice. Whilst some participants raised safety concerns in the control condition (i.e., 20% spoke up; 95% CI: 10–29%), t(66) = 3.99, p < 0.001, participants were 2.76 times more likely to raise safety concerns against sitting on the bench when information regarding an unsafe maximum load was provided, Wald(1) = 6.12, p = 0.01. Yet, and despite the success of the manipulation to create risk perceptions for 81% of participants in the unsafe condition, a considerable proportion of participants in the unsafe condition did not raise a concern (60%; 95% CI: 48–73%), and this held when participants without a perceived risk were accounted for: 58% (95% CI: 44–72%) remained silent about their perceived risk (see Table 3). Furthermore, in the unsafe condition, 33% (95% CI: 2–65%) of participants raised a safety concern despite not perceiving a risk, t(11) = 2.35, p = 0.04, and perceiving risk was not related to safety voice, χ²(1) = 0.30, p = 0.58. However, whilst the safety manipulation caused differences in safety voice, no influence was found on prohibitive employee voice, Fs(1, 127) < 1.29, ps > 0.26, and no correlation existed with observed safety voice, |r|s < 0.10, ps > 0.25. This suggests that hazards differentiate safety voice but the relationship between risk perception and safety voice is not straightforward. A need thus exists for improved safety concern measures. Finally, the identity of the victim (i.e., participant vs. research assistant) did not influence safety voice, ns¹.
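The size of the manipulation effect can be approximated from the rounded percentages alone. The sketch below assumes that the reported 2.76 is an odds ratio from a logistic model (as the Wald statistic suggests) and uses 40% voice in the unsafe condition (100% minus the 60% who stayed silent) against 20% in the control condition; the function names are illustrative.

```python
def odds(p):
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0 < p < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_treatment, p_control):
    """Odds ratio of an outcome between two groups."""
    return odds(p_treatment) / odds(p_control)


if __name__ == "__main__":
    # Rounded proportions of safety voice: unsafe vs. control condition.
    print(round(odds_ratio(0.40, 0.20), 2))  # 2.67
```

The approximation of about 2.67 is close to, but not identical with, the reported 2.76, which was estimated from the participant-level data rather than from rounded percentages.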

Table 3. Safety voice behaviours for Study 1 (unsafe condition).

Discussion

Study 1 demonstrated that the paradigm enables (i) the reproduction of safety voice behaviours in response to a hazard (speaking-up only); (ii) the presentation of a perceived, not actual, hazard; (iii) the direct observation of safety voice; (iv) participant randomisation to minimise alternative explanations; and (v) experimental control over study variables (i.e., minimising harm, hazard presentation, hazard awareness, deception, victim identity). Furthermore, it suggested that the relationship between risk perceptions and safety voice is not straightforward: participants can remain silent when perceiving a risk, or speak up when not perceiving a risk.
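The four combinations of risk perception and behaviour described here can be tallied with a small cross-tabulation. This is a sketch only: the function names and the example records are hypothetical, and perceived risk is used as a stand-in for a safety concern, which study 1 suggests is an imperfect proxy.

```python
from collections import Counter


def classify(perceives_risk, voiced):
    """Return the (risk perception, behaviour) cell for one participant."""
    perception = "risk perceived" if perceives_risk else "no risk perceived"
    behaviour = "voice" if voiced else "silence"
    return perception, behaviour


def cross_tabulate(records):
    """Count participants per cell; records are (perceives_risk, voiced) pairs."""
    return Counter(classify(p, v) for p, v in records)


if __name__ == "__main__":
    # Hypothetical records for illustration only, not data from the study.
    demo = [(True, True), (True, False), (True, False), (False, True)]
    for cell, count in sorted(cross_tabulate(demo).items()):
        print(cell, count)
```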

However, study 1 did not fully satisfy five requirements for safety voice experiments. First, participants also raised safety concerns when demonstrating the seesaw and slide ideas; the protocol thus presented multiple hazards and potentially produced unmeasured spillover effects. Second, it was not clear whether the perception of risk made people concerned about the hazard: it is not self-evident that safety concerns emerge from participants' body weight, or that applying this weight to a plank with a low capacity always leads to concerns. To demonstrate safety silence (i.e., the withholding of safety concerns), experiments need measures that establish whether safety concerns are present. This is important because, third, whilst safety voice behaviours were observed, these emerged for people with and without perceptions of the plank potentially breaking, and in the absence of clear safety concern measures it is unclear whether a lack of voice meant safety concerns were withheld (i.e., participants might not have been concerned about harm despite a perceived likelihood of the plank breaking). Fourth, the proportion of safety voice acts was low and could be improved to prevent floor effects. Finally, when participants were the victim, they occasionally mitigated the hazard by keeping weight on their feet and thus not fully sitting on the plank (creating a third outcome variable).

Study 2

Study 2 aimed to address the issues raised in study 1 through amending the risk perception measures to enable the observation of safety silence (i.e., calculated based on the person sitting on the bench and triangulated with an item on having a safety concern); eradicating safety voice for multiple hazards; improving the manipulability of the perceived physical risk to elicit stronger responses (i.e., lowering the weight limit; using a bendy plank; creating sufficient variance in safety voice and silence); and minimising alternative ways to mitigate physical harm following from breaking the plank².

Methods

Protocol Refinements

The protocol in study 1 was followed, albeit with five adjustments. First, the observation of safety silence was enabled through an altered risk perception measure and a self-report safety voice questionnaire item, to obtain additional data and ascertain whether the scenario led to subjective safety concerns. Second, to increase the perceived risk of physical harm, the maximum load was lowered slightly to 42 kg (93 lbs, 6.6 stone) and the pinewood plank was replaced by a more bendy plywood plank of the same dimensions (still capable of withstanding at least 125 kg in reality). Third, to eliminate other perceived hazards from the protocol, three ideas (i.e., seesaw, door, slide) were replaced with two new ideas (i.e., mirror, piece of art). Fourth, to ensure that the hazard could not be mitigated through not fully sitting on the plank, the research assistant sat on the plank. Finally, based on a pilot study, only the unsafe scenario was included³.