Phish Derby: Shoring the Human Shield Through Gamified Phishing Attacks

Canham, Matthew; Posey, Clay; Constantino, Michael

doi:10.3389/feduc.2021.807277

ORIGINAL RESEARCH article

Front. Educ., 05 January 2022

Sec. Higher Education

Volume 6 - 2021 | https://doi.org/10.3389/feduc.2021.807277

This article is part of the Research Topic: The Human Factor in Cyber Security Education

Matthew Canham¹*†

Clay Posey²†

Michael Constantino³

¹Beyond Layer Seven, LLC, Oviedo, FL, United States
²Information Systems, Marriott School of Business, Brigham Young University, Provo, UT, United States
³Information Security Office, University of Central Florida, Orlando, FL, United States

To better understand employees’ reporting behaviors in relation to phishing emails, we gamified the phishing security awareness training process by creating and conducting a month-long “Phish Derby” competition at a large university in the U.S. The university’s Information Security Office challenged employees to prove they could detect phishing emails as part of the simulated phishing program currently in place. Employees volunteered to compete for prizes during this special event and were instructed to report suspicious emails as potential phishing attacks. Prior to the beginning of the competition, we collected demographics and data related to the concepts central to two theoretical foundations: the Big Five personality traits and goal orientation theory. We found several notable relationships between demographic variables and Phish Derby performance, which was operationalized from the number of phishing attacks reported and employee report speed. Several key findings emerged: past performance on simulated phishing campaigns positively predicted Phish Derby performance; older participants performed better than their younger colleagues, but more educated participants performed more poorly; and individuals who used a mix of PCs and Macs at work performed worse than those using a single platform. We also found that two of the Big Five personality dimensions, extraversion and agreeableness, were both associated with poorer performance in phishing detection and reporting. Likewise, individuals who were driven to perform well in the Phish Derby because they desired to learn from the experience (i.e., learning goal orientation) performed at a lower level than those driven by other goals. Interestingly, self-reported levels of computer skill and the perceived ability to detect phishing messages failed to exhibit a significant relationship with Phish Derby performance. We discuss these findings and describe how focusing on motivating the good in employee cyber behaviors is a necessary yet too often overlooked component in organizations whose cyber training cultures are rooted in employee click rates alone.

Introduction

Despite significant and increasing organizational spending on cybersecurity technologies and associated efforts, successful threats abound. For example, while organizational leaders are expected to spend more than $150 billion US on cyber and related technologies and services in 2021 (Gartner, 2021), threats related to remote work, cloud adoption, healthcare, and other domains continue to flourish (CheckPoint, 2021). Thus, cyber “solutions” are not always what they appear, and throwing technology at the cyber problem will create rather than solve problems (Schneier, 2015).

An important realization has been that organizational cybersecurity efforts depend largely on the employees who reside within organizational walls. These individuals are central to the effectiveness of organizational actions to protect sensitive assets, and research has shown that they can be detrimental (e.g., sabotage and computer abuse) (Straub and Nance, 1990; Willison and Warkentin, 2013) as well as beneficial (e.g., protective motivated behaviors, precaution taking) (Boss et al., 2009; Posey et al., 2013; Burns et al., 2019) to their employers. Employee actions thus range from accidental errors to malicious acts of sabotage on the negative side and forced compliance to security championing on the positive side.

A specific, significant context where employees continue to affect their organizations is how these individuals respond to phishing attempts that come through corporate email systems. Online phishing is a common attack vector used by external actors to penetrate organizational networks, steal employee credentials, and commit other forms of harm. In fact, more than 90% of malicious software is delivered by email, with personalized phishing attacks (i.e., spear phishing) being the entry gate (Purplesec, 2021). Because of this massive potential for injury, organizations have focused on how best to reduce the risk stemming from employees who encounter and fall victim to phishing attacks. These efforts rely largely on simulated phishing campaigns wherein employees encounter emails that mimic real phishing attacks, and the resulting failure metrics are used to examine progress within an employee base.

Notwithstanding the importance of assessing the number of employees who fall victim to these mock attacks, it is important to note how employees can also have positive reactions to phishing attacks—reactions that alert organizational representatives to the potential threat (Canham et al., 2021). It is unfortunate that many of these positive reactions are often overshadowed by the failures (i.e., successful mock attacks) despite serving as an important warning signal or beacon to the organization that something could be wrong. At a time when cybersecurity remains a top priority for leadership, but funding for the requisite resources is unable to keep pace with the ever-evolving threat landscape, it would serve organizations’ interests to also provide significant focus on the positive spectrum of employees’ cyber behavior.

To increase our understanding of this phenomenon, which we refer to as the “protective steward phenomenon,” we gamified a series of simulated phishing campaigns to see how such an alteration would influence employee cyber behaviors. Gamification refers to the “use of computer games and features of games for non-game purposes” (Fleming et al., 2020, p. 2). These campaigns, collectively called a “Phish Derby” competition, allowed employees to compete against one another in their efforts to detect and create an alert when encountering simulated phishing emails.

Given the evidence showing how the gamification of learning-based exercises can increase participants’ engagement and overall learning (Marín et al., 2018; Groening and Binnewies, 2019), we explored whether and how gamification could be used to foster positive employee reactions and experiences with a form of training (i.e., simulated phishing campaigns) seen by some workers as a source of decreased productivity and increased levels of boredom, anxiety, stress, embarrassment, and even ostracism (Conley, 2021; Emm, 2021; Ferrell, 2021). At the very least, gamification could increase user attentiveness during these activities, which could then translate to better performance during real attacks. In addition, not only was correct identification of phishing attempts important, but given the need for organizations to be able to respond to threats as quickly as possible, employee response times (i.e., the time difference between phish receipt and employee alert) were also tabulated. Therefore, our experiment with the Phish Derby and its associated results provide a more holistic view of positive employee behaviors regarding one of the most harmful attack vectors used against modern organizations: online phishing attacks.

Background on Phishing

Since online phishing and its variants, like business email compromise, continue to be successful attack vectors, especially during the COVID-19 pandemic when cyberattacks increased by 600% (Purplesec, 2021), it is no wonder that substantial scholarly attention has been given to phishing attack detection. Unfortunately, when compared to automated, technical-detection solutions, research on human-based detection efforts is more limited and focuses on how training techniques can be leveraged to enhance detection capabilities (Khonji et al., 2013; Zielinska et al., 2014; Wash and Cooper, 2018). Fortunately, research shows some promise in increasing human-detection capabilities via phishing training embedded directly into corporate email systems (Kumaraguru et al., 2007), but even then, employees might not even fully read or pay attention to the training (Caputo et al., 2013).

Complementing the research on human-detection capabilities, recent efforts have drawn attention to all potential employee behavioral responses to email phishing attacks (Canham et al., 2021). By analyzing the responses of more than 6,000 employees at a large U.S. university over the course of 20 phishing training campaigns and 19 months, this effort demonstrated that a small subset of users (6% of the total population of users) were responsible for repeated phishing training failures (i.e., “Repeat Clickers”) and a larger subset (33%) of users (“Protective Stewards”) were responsible for reporting these emails to the Information Security Office. Thus, more employees alert their organizations about potential attacks than succumb to phishing attacks. Unfortunately, this positive-oriented and more sizable employee subpopulation has received relatively limited attention when compared to its smaller and more detrimental counterpart—a concerning trend when so many information security offices are struggling to handle day-to-day operations with limited resources.

One potential way to continue to increase employees’ 1) ability to detect and 2) motivation to report phishing emails might be through the gamification of the mock phishing campaign experience. The addition of gaming elements to non-gaming situations in this and other cyber-related contexts has been explored (Francia et al., 2014; Gjertsen et al., 2017; Emm, 2021; Khando et al., 2021). For example, gamification has demonstrated promise in the education of normal users regarding password security (Scholefield and Shepherd, 2019), and gamified systems can increase motivation to comply with security policy and reduce mock phishing failures, significantly outperforming training provided via email (Silic and Lowry, 2020). Different variations of gamification capabilities have also been examined in the context of employees’ online self-disclosure (Dincelli and Chengalur-Smith, 2020) and corruption behaviors (Baxter et al., 2017). In addition, previous work on gamified systems has relied on both monetary and non-monetary rewards to incentivize participants (Ueyama et al., 2014; Lewis et al., 2016; Karac and Stabauer, 2017; Meixner et al., 2020). It is evident that gamification can be a useful tool in educating and motivating individuals in a variety of contexts.

Given this opportunity, we extended previous research efforts by exploring the factors surrounding employees who actively choose to alert their information security office when they suspect a rogue email in their inbox. In addition, we wanted to determine if employee response times could be incentivized through such gamification. Akin to the field of positive psychology (Seligman and Csikszentmihalyi, 2014), our goal here is to help motivate positive behaviors rather than correct negative actions and understand whether gamification is a fruitful avenue for this objective.

Possible Employee-Performance Factors

To better understand potential variance in employee performance during our Phish Derby, we relied on concepts found in two theoretical foundations. The first foundation is commonly referred to as the “Big Five” personality traits. These traits include extraversion, emotional stability, agreeableness, conscientiousness, and openness to experience (Norman, 1963; McCrae and Costa, 1987). Because so much has been written on these traits, we discuss them only briefly here.

Openness is a trait aligned with intellectual curiosity, creativity, and a preference for novelty. Individuals high in conscientiousness tend to be organized, self-disciplined, and have a need for achievement, whereas individuals high in extraversion tend to be socially outgoing, energetic, and seek stimulation. Agreeableness refers to those who tend to be cooperative, helpful, and well-tempered. Finally, individuals high in neuroticism (the inverse of emotional stability) tend to be prone to anxiety and stress, to experience unpleasant emotions easily, and to feel insecure.

The Big 5 has been examined as an influential factor in studies of information security previously (Pattinson et al., 2012; Uebelacker and Quiel, 2014; Halevi et al., 2015; Welk et al., 2015; Lawson et al., 2017; Sudzina and Pavlicek, 2017); however, how these traits influence phishing susceptibility is not always obvious. For example, people high in conscientiousness might be less susceptible to phishing attempts (Lawson et al., 2017), but they might also be leveraged to help an attack become more likely to succeed (Halevi et al., 2015). Regarding phishing vulnerability, research on individuals high in extraversion has shown mixed results. Two studies have shown increased susceptibility to phishing (Welk et al., 2015; Lawson et al., 2017), while another study (Pattinson et al., 2012) showed a better ability to detect phishing emails. Despite these differences, we believe that one or more of the Big 5 components could play an important role in understanding potential differences in our participants’ performance, especially given our unique context of gamification and the inclusion of relatively difficult-to-detect phishing emails in our Phish Derby.

Goal orientation theory (GOT) serves as our second theoretical foundation. This theory explains the reasons why individuals are driven to certain outcomes in achievement-focused tasks. Generally, individuals approach and engage in achievement tasks because they desire to 1) learn (i.e., learning), 2) prove their performance abilities (i.e., prove performance), and/or 3) avoid negative judgments and perceptions of inferiority (i.e., avoid performance) (Brett et al., 1999; Kaplan and Maehr, 2007).

GOT has been used in examining individuals in numerous achievement-focused scenarios. For example, goal orientation concepts have been linked to academic performance, even mediating the relationship between intrinsic motivation and performance (Cerasoli and Ford, 2014). Learning orientations have been linked to expatriates’ academic and social adjustment outcomes (Gong and Fan, 2006), and both learning- and performance-orientation goals have shown relationships with team adaptability when facing adversity (Porter et al., 2010). Finally, research has shown that trait-forms of goal orientation explain employee job performance above and beyond cognitive ability and even the personality variables mentioned above (Payne et al., 2007). Determining whether and how these goals drive Phish Derby performance, both in general and in comparison with the “Big Five” personality traits, should prove fruitful.

Gamified Approach

Gamification was achieved via our “Phish Derby” by having participants prove their ability to spot phishing attacks and earn points based upon the number of attacks they successfully reported, as well as how quickly those alerts were issued. To help increase the amount of variance in user responses, the research team utilized very difficult simulated phishing attacks. The KnowBe4 platform was used for the Phish Derby. Participants received monetary prizes (i.e., Amazon gift cards) at the end of the competition, and they also knew that the research team would debrief all who were interested in an online seminar. Potential participants were notified of the Phish Derby a week prior to its beginning through email communication. Participation in the Phish Derby was voluntary, and competitors were instructed that because this was a competition, the simulated phishing emails that they received would be more difficult than the regular training emails that they had received in the past. Information Security Office staff informed volunteers that performance during this Phish Derby would not negatively impact their training requirements (e.g., being required to complete additional training if they fell for a simulated phishing message sent as part of this Phish Derby).

A total of six simulated phishing email templates were utilized for the Phish Derby competition. These six were titled “LinkedIn–People Are Looking at your Profile,” “UPS Label Delivery,” “Test of the Notification System,” “Sarah Butler Sent You a Secure File,” “Knightro’s Halloween Costume,” and “COVID-19 Reported Cases in Your Area.” The “LinkedIn–People Are Looking at your Profile” template purported to notify the recipient that their profile had been viewed and included a hyperlink that falsely claimed to redirect to LinkedIn. The “UPS Label Delivery” template used the pretext of a UPS delivery notification with a hyperlink that appeared to redirect to UPS. The “Test of the Notification System” template claimed to be a notification test and requested that the receiver verify their contact information through a deceptive hyperlink. The “Sarah Butler Sent You a Secure File” template appeared to be a shared document from Sarah Butler, a fictitious university employee. The “Knightro’s Halloween Costume” template used the pretext of an invitation to enroll in a university costume contest. The final template, “COVID-19 Reported Cases in Your Area,” used the pretext of discovering reported COVID-19 cases in the area through a deceptive linked portal.

Developing an objective metric of email difficulty is a challenge that the NIST Phish Scale seeks to address. This difficulty scale considers two factors in operationalizing phishing email difficulty: first, the number of phishing “cues,” and second, the email premise alignment with user role (Steves et al., 2020). These factors were derived from previous empirical work demonstrating their central role in phishing email detection (Greene et al., 2018). Cues refer to inconsistencies within, or characteristics of, the message that may alert the target that the message might be a phishing attempt. Examples of cues include spelling and grammatical errors, technical indicators (e.g., a hyperlink mismatch), odd language, and the use of time pressure. Premise alignment refers to the degree to which the message aligns with the recipient’s job role and alludes to the user’s context in evaluating the message. Prior research demonstrates that the more closely the message premise aligns with the target’s job role (e.g., a past-due notice sent to the accounting department, or a resume sent to a human-resources department), the less likely people are to notice detection cues in the message (Greene et al., 2018). We applied the NIST Phish Scale of email difficulty to each of the six simulated phishing email templates that we employed in the Phish Derby, and the difficulty ratings for each template are summarized in Table 1.

TABLE 1. Email template phish scale difficulty with click and report rates.

Experimental Method

In early October 2020, participants completed an initial survey that covered demographic and model variables used in our analyses. A total of 116 individuals took part in the initial survey, but attention-check items indicated that only 101 individuals should remain in the study. These individuals then received six simulated phishing emails at their work email addresses throughout the remainder of October. Participants explicitly agreed not to use any means (technical or otherwise) that would prohibit a fair competition. Any evidence suggesting use of such methods would result in immediate disqualification from the competition. Participants were instructed to report emails as potential phishing attacks by using an embedded “Phish Alert” button as provided by KnowBe4 or by forwarding the email as an attachment to the Security Incident Response Team (SIRT). All interaction with phishing emails (e.g., email receipt, reporting) took place in the 8:00 am to 5:00 pm window (participants’ local time). The mean age of our sample was 44.4 years, with 40% identifying as female. Twenty-five percent of our sample was in administrative positions, and 10% was in an IT/IS role.

Our research team collected the number of phishing alerts/reports received from participants as well as the timeliness with which those alerts/reports were received. All participants began the competition with 10 “Derby Bucks.” For every simulated phishing email not reported within 4 h of receiving the email, 1 Derby Buck was subtracted from their total. If the competitor reported the email but only after falling victim to the phishing email, 0.75 Derby Bucks were subtracted from their total. This option was available to highlight the fact that while succumbing to a phishing attack is a negative event, it is still of benefit to the organization to report it as soon as possible. No Derby Bucks were removed when participants accurately alerted SIRT within 4 h of receiving the phishing email. At the end of the Phish Derby, competitors received the following rewards based on the total amount of Derby Bucks remaining in their possession:

Competitors with at least 6.00 Derby Bucks were awarded a $5.00 Amazon gift card.

Competitors with at least 7.50 Derby Bucks were awarded a $7.00 Amazon gift card.

Competitors with at least 8.50 Derby Bucks were awarded a $10.00 Amazon gift card.
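The Derby Bucks rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring code used in the study: the names (`EmailOutcome`, `derby_bucks`, `gift_card`) are hypothetical, and three details that the text leaves open are treated as assumptions, namely that the 4 h window is inclusive, that a report made after a click but outside the window is penalized like a missed email, and that each competitor receives only the highest reward tier reached.

```python
from dataclasses import dataclass
from typing import Optional

START_BUCKS = 10.0
REPORT_WINDOW_HOURS = 4.0             # assumption: the window is inclusive
MISSED_PENALTY = 1.0                  # not reported within the window
REPORTED_AFTER_CLICK_PENALTY = 0.75   # clicked first, then reported in time

# (minimum Derby Bucks, gift card value in USD), highest tier first.
# Assumption: tiers are not cumulative; only the highest one applies.
REWARD_TIERS = [(8.5, 10.0), (7.5, 7.0), (6.0, 5.0)]


@dataclass
class EmailOutcome:
    clicked: bool                      # fell for the simulated phish
    hours_to_report: Optional[float]   # None if the email was never reported


def penalty(outcome: EmailOutcome) -> float:
    """Derby Bucks deducted for one simulated phishing email."""
    reported_in_time = (
        outcome.hours_to_report is not None
        and outcome.hours_to_report <= REPORT_WINDOW_HOURS
    )
    if not reported_in_time:
        return MISSED_PENALTY
    if outcome.clicked:
        return REPORTED_AFTER_CLICK_PENALTY
    return 0.0


def derby_bucks(outcomes) -> float:
    """Derby Bucks remaining after all emails have been scored."""
    return START_BUCKS - sum(penalty(o) for o in outcomes)


def gift_card(bucks: float) -> float:
    """Gift card value in USD for a final Derby Bucks total (0.0 if no tier is met)."""
    for minimum, value in REWARD_TIERS:
        if bucks >= minimum:
            return value
    return 0.0


# Four timely reports, one report after a click, one email never reported.
outcomes = [EmailOutcome(False, 0.5)] * 4 + [
    EmailOutcome(True, 1.0),
    EmailOutcome(False, None),
]
bucks = derby_bucks(outcomes)
print(bucks, gift_card(bucks))
```

Under these assumptions, the competitor in the example finishes with 8.25 Derby Bucks and falls in the $7.00 tier.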

The dependent variable in our model was a performance score normalized for report timeliness against the average response time for each phishing campaign. This represents the importance of timeliness in reporting potential threats to SIRT. For example, if two participants correctly identified all six phishing campaigns, their initial performance score would equal 6.00, but because their average response times differed, the final performance scores would be adjusted relative to those response times. Instead of both participants receiving the same 6.00 performance score, the faster responder might receive a 5.93 and the slower one a 5.78. In short, the faster the response, the higher the score (assuming the same number of phishing campaigns was identified). This normalized score was not used in the assignment of Derby Bucks mentioned above due to university institutional review board (IRB) stipulations. Participants were not informed of their performance relative to other competitors either during or after the Phish Derby; however, they were informed of the overall Phish Derby detection and reporting performance after the competition had concluded.
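The exact normalization is not given in the text. As a purely hypothetical sketch of how such a timeliness adjustment could behave, the function below awards each reported campaign a credit of mean / (mean + t), where t is the participant's response time and mean is that campaign's average response time; unreported campaigns earn nothing. The formula and the name `normalized_score` are assumptions for illustration only, and the resulting values will not reproduce the 5.93 and 5.78 in the example above. The campaign means are the per-template average response times (in minutes) reported in the Results.

```python
# Per-template mean response times in minutes, as reported in the Results.
CAMPAIGN_MEANS = (97.0, 66.4, 121.3, 189.0, 37.8, 77.2)


def normalized_score(report_minutes, campaign_means=CAMPAIGN_MEANS):
    """Hypothetical timeliness-adjusted score (NOT the study's formula).

    report_minutes holds one entry per campaign: minutes from receipt to
    report, or None if the email was never reported.
    """
    total = 0.0
    for minutes, mean in zip(report_minutes, campaign_means):
        if minutes is None:
            continue  # an unreported campaign earns no credit
        # Credit is 1.0 for an instant report and 0.5 for a report at the mean.
        total += mean / (mean + minutes)
    return total


fast = normalized_score([5, 5, 5, 5, 5, 5])
slow = normalized_score([60, 60, 60, 60, 60, 60])
print(f"fast responder: {fast:.2f}, slow responder: {slow:.2f}")
```

Both hypothetical participants report all six emails, so any difference between their scores comes from response time alone, which is the property the normalized score is described as having.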

Measures

In addition to the demographic variables, we used previously published and validated scales to capture our constructs related to our two theoretical foundations. The Big 5 Personality dimensions were assessed using the IPIP–NEO–60 scale (Maples-Keller et al., 2019). This scale employs 60 items to infer an individual’s placement along each of the five dimensions on the Five-Factor Personality Scale. Learning (5 items), prove performance (4 items), and avoid performance (4 items)—concepts from GOT—were measured using the 13-item goal orientation scale (Brett and VandeWalle, 1999).
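Internal consistency for multi-item scales such as these is commonly summarized with Cronbach's alpha. The sketch below shows the standard computation on made-up responses to a five-item scale; the data and the function name `cronbach_alpha` are illustrative assumptions and are not drawn from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list of item responses per respondent, all for the same scale.
    """
    k = len(rows[0])                                     # number of items
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses from six respondents to a five-item scale.
responses = [
    [5, 5, 4, 5, 4],
    [3, 3, 3, 2, 3],
    [4, 4, 5, 4, 4],
    [2, 1, 2, 2, 1],
    [5, 4, 5, 5, 5],
    [3, 3, 2, 3, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Values of 0.70 or higher are conventionally read as adequate internal consistency.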

Results

We performed a hierarchical regression analysis where we focused on participants’ demographic variables first and then assessed components related to the Big Five personality traits and GOT. Given that we operationalized these components with previously validated measures, each exhibited adequate internal consistency metrics (α ≥ 0.70). While our sample size is relatively small (n = 101), our statistical power (1 − β > 0.99) did not prohibit us from discussing non-significant relationships. What is rather interesting is that such a relatively simple model explained a rather large amount of variance in participants’ performance (i.e., R² = 52.0%). The mode for phishing emails reported across all participants was 4, and the average response times (in minutes) for the email templates were 97.0, 66.4, 121.3, 189.0, 37.8, and 77.2 for templates 1–6, respectively. The mean normalized performance score (possible range 0–6) was 2.55 during the Phish Derby. Table 2 displays the correlations among our variables, and Table 3 displays our statistical results.
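To make the two-step procedure concrete, the following sketch fits a demographics-only model and then a model that adds a second block of predictors, and reports the change in R². It uses NumPy and synthetic data; the variable names, block sizes, and effect sizes are hypothetical and are not the study's data or model specification.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n = 101  # same size as the study's sample; the values themselves are synthetic

demographics = rng.normal(size=(n, 3))  # hypothetical step 1 block
traits = rng.normal(size=(n, 2))        # hypothetical step 2 block
y = 0.5 * demographics[:, 0] - 0.4 * traits[:, 0] + rng.normal(size=n)

# Step 1: demographics only. Step 2: demographics plus the added block.
r2_step1 = r_squared(demographics, y)
r2_step2 = r_squared(np.column_stack([demographics, traits]), y)
delta_r2 = r2_step2 - r2_step1

print(f"step 1 R^2 = {r2_step1:.3f}")
print(f"step 2 R^2 = {r2_step2:.3f}")
print(f"delta R^2  = {delta_r2:.3f}")
```

Because the step 1 model is nested in the step 2 model, R² cannot decrease between steps; the size of the increase indicates how much variance the added block explains beyond the demographics.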

TABLE 2. Interconstruct correlations.

TABLE 3. Results from hierarchical regression of normalized performance on demographics and predictors.

In addition to the large R² value, we see several notable relationships between demographic variables and Phish Derby performance. First, participants’ exposure to, and performance during, previous simulated phishing campaigns matter as demonstrated by the significance of the percentage of reports relative to phishing emails received by the employees before entering the Phish Derby. We also see that age becomes a significant variable in our analysis, and the positive beta indicates that older participants performed better in the Phish Derby than did their younger counterparts. On the other hand, the years of education had the opposite effect on performance; more years of education led to poorer performance. Finally, in our assessment of whether participants used PCs or Macs at work, or a mix of both, we found that individuals who use a mix performed worse than those using a single platform.

Outside the demographic variables, we see variables of significance within our two theoretical foundations. First, within the Big Five personality dimensions, two personality dimensions influenced performance: extraversion and agreeableness. Extraversion was negatively associated with phishing email reporting performance (β = −0.181, p = 0.066). This finding is interesting because previous research on those high on the extraversion dimension has been mixed. At least three studies have found increased susceptibility to phishing in those higher in extraversion (Welk et al., 2015; Lawson et al., 2017; Anawar et al., 2019), while another study (Pattinson et al., 2012) showed a better ability to detect phishing emails. Messages that utilize likability as a social influence principle have also been found to be more persuasive to people high in extraversion (Alkış and Temizel, 2015). This aspect of personality needs to be examined more because more extraverted individuals might make more attractive targets for criminals: they tend to have more social connections, which gives an attacker more contacts to target if their account is compromised. Agreeableness has been positively associated with self-reported cybersecurity behaviors in previous research (McCormac et al., 2017; Shappie et al., 2020); however, we found that higher agreeableness was associated with poorer phish reporting performance in the Phish Derby (β = −0.261, p = 0.016).

From a goal orientation perspective, we found that participants whose goal was overall learning performed significantly worse than those who identified their goal as performing well in the competition. In other words, those who are trying to better themselves at identifying phishing attacks performed at a lower level than those who cared little about overall learning as their main goal. Goals of “performance proving” and “performance to avoid disapproval” did not exhibit a significant relationship with overall performance.

Finally, participants’ intention to do well in the Phish Derby was close to becoming a significant component in the model but ultimately was not. In the case of the gamified Phish Derby, intentions were not significantly related to performance—perhaps a case of the “knowing-doing gap” (Workman et al., 2008).

Regarding performance on the various templates, most had relatively low click-rates, with the exception of the UPS Label Delivery email template (18% click-rate). This template also received the second-highest reporting rate (69%), suggesting that it was among the more interactive templates of the Phish Derby. These results are summarized in Tables 4 and 5. While the “Secure File Delivery,” “COVID-19 in Your Area,” and “Test of the Notification System” templates were also highly reported, the “Secure File Delivery” template received zero clicks. This may have been due to similar simulated phishing emails having been previously used in training. No significant interactions (in click-rates or report-rates) between the email template types and job role were observed. Overall, the click-rates were relatively low compared to previously observed campaigns (Canham et al., 2021). This might have been the result of a self-selected sample of participants who knew they were taking part in a contest.

TABLE 4. Click-rates by email template and job role.

TABLE 5. Report-rates by email template and job role.

Phish Derby Participant Comments

In addition to our quantitative assessment, we wanted to determine whether the participants viewed the Phish Derby experience as a success. Fortunately, the comments from participants during the debrief regarding the Phish Derby were overwhelmingly positive, and a sample of them includes the following direct quotes:

“I enjoyed taking part in the phishing derby–seriously a great idea!”

“It was kind of scary though... I would usually delete, but during this I felt like maybe I should report more.”

“After I got caught on the first one, I was much more alert for the rest”

“This was a great way to heighten awareness and learn about different kinds of things to watch for.”

“I thought I was already aware so kinda wanted to test myself.”

“More cautious now.”

“When can we do it again?”

“This was a perfect strategy: educational and fun!”

“It was a great learning experience. Thank you.”

“(I) would love to see this again thank you”

“I appreciate the Derby tests and will stay vigilant!”

Discussion

We aimed to assess whether the gamification of mock phishing exercises would be successful and whether key factors explaining participant performance would emerge. Further, our goal was to highlight employees’ positive, phish-reporting behaviors rather than focus on failure (i.e., click) rates. Our results suggest that gamification can be a useful, interesting, and perhaps even exciting approach to employ in mock phishing exercises—exercises that are usually thought to be intrusive or a waste of time by many employees. Moreover, we were able to determine considerable differences in Phish Derby performance, indicating that some employees are or could become star performers or champions for organizations’ security teams in the quest to quickly identify phishing attempts once they clear technical filters.

From a theoretical standpoint, it was interesting to discover that both extraversion and agreeableness exhibited negative relationships with Phish Derby performance. In phishing-susceptibility research, findings relative to the extraversion personality trait have been somewhat mixed, but at least three studies have found that extraverts exhibit increased susceptibility to phishing emails (Welk et al., 2015; Lawson et al., 2017; Anawar et al., 2019). Our findings suggest that extraverts perform more poorly on the positive-oriented behaviors of reporting as well. On the other hand, agreeableness has shown positive associations with self-reported cybersecurity behaviors in previous research (McCormac et al., 2017; Shappie et al., 2020), which is at odds with our results. We do not fully understand why such is the case from our Phish Derby exercise, but it is possible that individuals who desire to be helpful also do not like to have external pressures to do so. Perhaps the additional pressure of the research team tracking and rewarding response times backfired with participants high in agreeableness. This aspect of our findings and rationale deserves future attention.

An additional finding that surprised us was that the first of the three goal orientations (i.e., learning orientation) was negatively related to participants’ performance. In fact, learning-oriented individuals exhibited higher average response times, meaning that they were slower to report suspected phishing attacks to the organizational representatives. As with the participants high in agreeableness discussed above, learning-oriented individuals may be poorly served by quick-reaction contexts, because they likely wish to devote a reasonable amount of time to delving into an issue. At the same time, these individuals are likely the participants who attended and actively participated in the Phish Derby debrief to learn of overall response rates and to discuss the possible cues within each of the mock phishing templates. Unfortunately, we did not assess this possibility.

Another surprising finding was that more education was related to poorer performance in the Phish Derby. On the one hand, it could be that those with more education perceived themselves to be more capable than those with less education at identifying phishing threats, resulting in overconfidence. On the other hand, a potential explanation for this finding may be that the more highly educated participants also had a higher workload and/or received substantially more emails on a daily basis than the less educated participants. This finding deserves more investigation in future studies.

We also found that employees who alternate between PC and Mac systems in their daily job tasks performed worse than those using a single platform. Our rationale leans toward multi-platform users having an increased cognitive load due to the switching between platforms, icon sets, and other platform-dependent idiosyncrasies. If they are subconsciously expending mental energy on the effective utilization of the various platforms, they may not have the same level of energy available to recognize more difficult phishing threats. Of course, this assertion deserves more attention and could make for a very interesting experiment.

Several practical implications also emerged from our Phish Derby exercise. First, some CISOs might wonder whether exposing employees to simulated phishing campaigns works, and whether they should pay attention to individual metrics. In the case of gamified competitions, the answer is “yes.” Competitors’ previous reporting behaviors, measured as a percentage of the total number of phishing campaigns they had previously received (made available in the KnowBe4 platform), were significantly related to their overall performance (i.e., correct identification and timeliness of report) during the Phish Derby.
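As a rough illustration of the two individual metrics mentioned above, the following Python sketch computes a prior report rate (reports as a percentage of simulated phish received) and a mean report latency. It is a minimal sketch under stated assumptions, not the scoring procedure used in the study: the `SimulatedPhish` record, the function names, and the sample values are all hypothetical, and the code does not reflect the KnowBe4 platform's data format or API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulatedPhish:
    """One simulated phishing email sent to an employee (hypothetical record)."""
    minutes_to_report: Optional[float]  # None means the email was never reported


def prior_report_rate(history: Sequence[SimulatedPhish]) -> float:
    """Percentage of received simulated phish that the employee reported."""
    if not history:
        raise ValueError("employee has no prior simulated phishing history")
    reported = sum(1 for p in history if p.minutes_to_report is not None)
    return 100.0 * reported / len(history)


def mean_report_latency(history: Sequence[SimulatedPhish]) -> Optional[float]:
    """Average minutes from delivery to report, over reported emails only."""
    latencies = [p.minutes_to_report for p in history
                 if p.minutes_to_report is not None]
    return mean(latencies) if latencies else None


# Illustrative, made-up history: four simulated phish, three of them reported.
history = [SimulatedPhish(12.0), SimulatedPhish(None),
           SimulatedPhish(48.0), SimulatedPhish(30.0)]
rate = prior_report_rate(history)
latency = mean_report_latency(history)
print(f"prior report rate: {rate:.1f}%")
print(f"mean report latency: {latency:.1f} min")
```

Keeping the rate and the latency as separate quantities mirrors the distinction drawn above between correct identification and timeliness of report; how the two are combined into a single performance score is left open here.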

Second, anecdotal evidence suggests that some CISOs and other security administrators feel unable to gain and maintain older employees’ interest and attention regarding security matters. We did not find this to be the case during the Phish Derby. In fact, we found that older employees performed better than younger employees. Perhaps the friendly competition provided by the Phish Derby and its positive focus on protective reporting appealed more to older employees than would a negative focus on thwarting phishing failures. We believe that research in the positive psychology movement (Seligman and Csikszentmihalyi, 2014) could help explain why this is the case.

Third, the individuals who most desired to learn how to improve their future performance on reporting phishing emails were the ones who performed worse in the Phish Derby. Those individuals who tend to join competitions to show themselves and others how good they are, and those who try to do well in achievement tasks to avoid negative judgment, did not perform differently from those who do not hold these orientations. Thus, of the three goal orientations, only a true learning goal orientation was related to performance during the gamified phishing competition. This is not to say that these individuals should not be involved in gamified phishing exercises; rather, organizational leaders should aim to provide a meaningful and engaging debrief that is focused on the needs of these employees. These will be the individuals most interested in understanding the cues and contexts that made the phishing templates difficult to detect.

Finally, of the variables that failed to exhibit a significant relationship with Phish Derby performance, two of the most interesting were self-reported computer skills and perceived ability to detect phishing messages. These variables never approached statistical significance in our analyses; thus, they were not included in our findings table. Nevertheless, the lack of findings here indicates that individuals who believed they could identify phishing emails better than others failed to perform any differently from those who did not. The same holds for self-reported computer knowledge and skills.

Conclusion

We complemented the online phishing training exercises of employees at a large U.S. university with a month-long gamification experiment. The gamified experience focused on the positive reporting behaviors of participants rather than click rates alone. In addition, we assessed the speed with which participants reported the simulated phishing emails to the appropriate organizational representatives. As evidenced by the findings and the comments provided by Phish Derby participants during the debrief, we view the gamification effort as a success. Moreover, despite low enrollment, partially due to the lack of extensive marketing channels between the Information Security Office and employees, we believe this event provided the single most positive experience between the security office and the university’s employees. We encourage CISOs who are looking to improve employee participation in phishing exercises to strongly consider adding gamification elements to their efforts, and we implore other researchers to explore more fully gamification’s influence in increasing and sustaining individuals’ motivation to serve as stewards of security.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

The studies involving human participants were reviewed and approved by the Institutional Review Board, University of Central Florida. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

MCa developed experimental materials, conducted data analysis, and wrote portions of the introduction, method, results, discussion, and conclusion. CP developed experimental materials, conducted data analysis, and wrote portions of the introduction, method, results, discussion, and conclusion. MCo developed experimental materials, ran the study protocol, and wrote portions of the introduction, method, results, discussion, and conclusion.

Funding

Funding for this research was made possible through the University of Central Florida’s Office of Research. Funding was also provided by the National Institute of Standards and Technology (NIST) under Financial Assistance Award Number: 60NANB20D189. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of NIST or the U.S. Government.

Conflict of Interest

Author MCa was employed by the company Beyond Layer Seven, LLC.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank the Information Security Office at the University of Central Florida for engaging in a collaborative partnership with faculty. Without such efforts, this research would not have been possible.

References

Alkış, N., and Temizel, T. T. (2015). The Impact of Individual Differences on Influence Strategies. Personal. Individual Differences 87, 147–152.

Anawar, S., Kunasegaran, D. L., Mas’ud, M. Z., and Zakaria, N. A. (2019). Analysis of Phishing Susceptibility in a Workplace: a Big-Five Personality Perspectives. J. Eng. Sci. Technol. 14 (5), 2865–2882.

Baxter, R. J., Holderness, D. K., and Wood, D. A. (2017). The Effects of Gamification on Corporate Compliance Training: A Partial Replication and Field Study of True Office Anti-corruption Training Programs. J. Forensic Account. Res. 2 (1), A20–A30. doi:10.2308/jfar-51725