- Facultad de Ciencias Sociales y del Trabajo, Departamento de Psicología y Sociología, Universidad de Zaragoza, Zaragoza, Spain
Introduction: The assessment of dark personality using self-report questionnaires suffers limitations due to social desirability, bias, and response faking, particularly in organizational contexts. This research examines the psychometric properties of an extended version of VASSIP, a gamified assessment to briefly measure dark personality through an immersive situational judgment test (SJT).
Methods: A sample of 395 Spanish workers (47.4% female, Mjob experience = 10.83 years) participated in the study, completing questionnaires of the target variables and playing the gamified assessment.
Results: The hard-gamified extension of VASSIP has a unidimensional factor structure with 5 items. Validity was supported by direct associations with Honesty-Humility, moral disengagement, task performance, and counterproductive work behaviors (CWBs).
Discussion: Therefore, this measure appears to be valuable for briefly assessing dark personality, although its predictive capacity could be optimized. Its situational approach offers a more nuanced understanding of how individuals manifest dark personality in workplace scenarios.
1 Introduction
The prediction of job performance is a key issue in the study of work behavior. In that sense, personality is one of the main predictors of performance across most occupations and settings (Salgado and Moscoso, 2019). Specifically, meta-analytic reviews have consistently confirmed the validity of the ‘Big Five’ personality traits for predicting performance, highlighting conscientiousness and neuroticism as the best correlates (Judge et al., 2013). Recent research has turned its attention to dark personality, a cluster of subclinical, socially undesirable traits associated with various antisocial behaviors that can impact organizational outcomes (Szabó et al., 2023). By considering dark personality in addition to the Big Five, researchers are finding an increase in the explained variance of job performance (Fernández-del-Río et al., 2020). However, the frequent use of self-reported questionnaires in organizational settings limits the assessment of these traits. Respondents tend to misrepresent their responses to provide a better view of themselves (Wille et al., 2023), so we need alternative measures that prevent fake scores (Miller et al., 2019; Walker et al., 2022). One of these alternatives may be game-related assessments or GRAs (Ramos-Villagrasa and Fernández-del-Río, 2023). The study reported here analyses the performance of a GRA for the brief assessment of dark personality in organizational contexts.
1.1 Dark personality
Most research on dark personality in the work context has been based on the Dark Triad (Paulhus and Williams, 2002). The Dark Triad, focused on self-interest, describes a personality profile that shares the subclinical characteristics of three personality traits: Machiavellianism, psychopathy, and narcissism. Individuals with these traits tend to be insensitive, selfish, and malevolent in their interpersonal relationships. Recently, there has been a scholarly consensus to consider everyday sadism as part of this construct, renamed Dark Tetrad (Paulhus, 2014). Along these lines, several papers have recently advocated the existence of a latent (super) malevolence factor or core of the dark personality called the ‘D factor’, similar to the proposal of a General Personality Factor (e.g., van der Linden et al., 2010). Different dark traits (i.e., psychopathy, Machiavellianism, narcissism, sadism) would emerge from this factor as specific manifestations (Moshagen et al., 2020). Therefore, according to these authors, assessing dark personality by an omnibus score could contribute to screening for the shared malevolence that underlies different dark personality profiles. As an alternative to this dark personality approach, some authors argue that low scores on the Honesty-Humility (H-H) factor of the HEXACO personality model (Lee and Ashton, 2014) or low scores on the Big Five Agreeableness factor (Vize et al., 2021) offer a better explanation for the common core of malevolence considered in the D factor. Despite the high percentage of shared variance, Horsten et al. (2024, p. 400) argue that the lower pole of H-H and Agreeableness “do not represent D.”
1.2 Individual differences in personality and job performance
Job performance is considered the ultimate criterion in human resource management (Organ and Paine, 1999). Empirical evidence about the relationship between dark and light personality and job performance is far from conclusive (Moscoso and Salgado, 2004; Fernández-del-Río et al., 2020; Zettler and Solga, 2013). One possible explanation for the differences identified is the consideration of the multidimensional nature of performance. Thus, although job performance comprises behaviors of workers that contribute to organizational goals (Campbell and Wiernik, 2015), three main domains stand out (Sackett and Lievens, 2008): task performance, contextual performance, and counterproductive work behaviors. Task performance (TP) refers to behaviors that support the “technical core” of the organization, involving the execution and maintenance of processes, formally recognized as job requirements (Borman, 2006; Motowidlo et al., 2014). Contextual performance (CP; Smith et al., 1983) is behavior that contributes to organizational goals by collaborating socially and psychologically through initiative and cooperation (Koopmans et al., 2011). Counterproductive work behaviors (CWBs) are employees’ intentional actions that harm the organization and/or its members (Sackett and DeVore, 2001). These behaviors are divided into deviations directed at individuals—CWBI—or the organization—CWBO— (Bennett and Robinson, 2000).
Research on the relationship between TP and dark personality has shown mixed results. O’Boyle et al. (2012) conducted a meta-analysis that found small but significant effects of Machiavellianism (r = −0.06) and psychopathy (r = −0.08) on TP, while narcissism was not significant (r = −0.02). However, a recent primary study developed in Spain by Fernández-del-Río et al. (2020) found that narcissism (β = 0.23) and Machiavellianism (β = 0.10) positively predicted TP, whereas psychopathy (β = −0.14) and sadism (β = −0.11) were negative predictors. The inconsistency of the results suggests the need for further research in this field.
Evidence also suggests a negative relationship between dark personality and CP. Judge et al. (2006) found that narcissism affected this performance more negatively than TP. Becker and O'Hair (2007) found that Machiavellianism negatively affected civic behaviors, especially toward the organization. Fernández-del-Río et al. (2020) confirmed this relationship (β = −0.18) but found narcissism to be a positive predictor (β = 0.34), possibly due to the characteristics of the narcissistic pattern (i.e., high sense of self-importance, conceit, ostentation, etc.).
The scientific literature has paid increasing attention in recent years to the relationship between dark personality and CWBs, which are widespread in the workplace and pose a serious threat to both organizational performance and employee well-being (Duradoni et al., 2023). According to a meta-analysis by O’Boyle et al. (2012), the three components of the Dark Triad are positively, albeit weakly, related to this type of negative behavior. More recently, in the first study that considered sadism as a predictor of job performance, Fernández-del-Río et al. (2020) found that precisely this dark personality pattern best explained CWBs. However, we wonder whether some variables, such as moral disengagement, could influence the relationship between dark personality and undesirable behaviors in the workplace. Moral disengagement—a cognitive process by which people distance themselves from their internal moral standards to behave unethically (Navas et al., 2024)—may increase the explained variance of immoral behaviors in dark personalities in the workplace, as in other contexts (e.g., Egan et al., 2015). In any event, we need cumulative evidence of the relevance of moral disengagement in predicting workplace deviance (Ramos-Villagrasa et al., 2025).
1.3 Measuring dark personality in the workplace: traditional instruments and new alternatives
In the work setting, most research on dark personality has relied on self-report measures, which differ in length, reliability, and validity. Researchers such as Uppal (2022), Sekhar and Uppal (2024), and Ramos-Villagrasa et al. (2025) have tended to use general scales, such as the Dirty Dozen (DD; Jonason and Webster, 2010), the Short Dark Tetrad (SD4; Paulhus et al., 2021), and the Dark Core Scale (DCS; Moshagen et al., 2020), rather than work-specific ones. A notable exception is the Dark Tetrad at Work Scale (DTW; Thibault and Kelloway, 2020), whose use is increasing (e.g., Fernández-del-Río et al., 2020; Longpré and Turner, 2024). Although long questionnaires are preferable when there are no time constraints, extensive tests are not always necessary in the organizational domain, and brief scales are a significant time-saver (DeNisi and Murphy, 2017). There is a call for shorter personality measures (Rammstedt and Beierlein, 2014), driven in part by the recognition that alleged limitations, including lower reliability or criterion correlations, are frequently misunderstandings (Ziegler et al., 2014).
A common challenge in personality measurement within organizational contexts is that applicants tend to present themselves in a positive light (Birkeland et al., 2006). However, this concern is amplified when assessing dark personality, as the traits themselves are socially undesirable. Given that personality assessments are leveraged for critical personnel decisions (e.g., selection, promotion, development) in the workplace, minimizing such impression management is crucial, particularly when attempting to identify these subtle yet impactful characteristics. Thus, reliable and valid alternative measures are considered necessary in contexts with consequences for the person being assessed (Miller et al., 2019; Walker et al., 2022). GRAs could be an alternative technique that acts as a “shadow assessment system” to obtain more comprehensive and accurate information from candidates by reducing social desirability and response faking (Landers and Collmus, 2022; Melchers and Basch, 2022).
According to the main theoretical models on deception (Ellingson and McFarland, 2011; Levashina and Campion, 2006; Tett and Simonet, 2011), this behavior is determined by three interrelated factors: the capability, motivation, and opportunity to deceive. Capability refers to the individual’s cognitive ability to identify which responses will be perceived as most socially desirable; motivation depends on both personal characteristics (e.g., competitive orientation or personality traits) and situational factors (e.g., job importance or organizational attractiveness); and opportunity relates to the extent to which the assessment format allows responses to be adjusted to create a favorable impression (Ohlms et al., 2024). In this context, situational judgment tests (SJTs) represent a significant advance over traditional personality questionnaires. While self-reports are based on introspective judgments that are easily influenced by social desirability, SJTs place the individual in specific scenarios that require selecting or evaluating hypothetical behaviors, which reduces the transparency of the evaluation criteria and, therefore, the opportunity to simulate (Lievens and Motowidlo, 2016; Mussel et al., 2016). Furthermore, by involving the interpretation of contexts and the application of knowledge about which behaviors are effective, SJTs tend to be less vulnerable to falsification and offer greater ecological validity, as they approximate the way people make decisions in social or work situations (Olaru et al., 2019).
On this basis, gamified SJTs take this approach a step further by introducing interactive and dynamic environments that significantly modify the three components of cheating. First, the high cognitive load and complexity of the environment reduce the ability to identify which response will be most highly valued (Altomari et al., 2023). Second, playful immersion decreases the motivation to falsify, as participants focus on performance within the game rather than managing their self-image (Bhatia and Ryan, 2018). This reduced vulnerability to simulation is particularly relevant in the assessment of dark personality, as it is associated with greater motivation and the ability to manipulate self-presentation in selection contexts (Roulin and Krings, 2016). In these contexts, where social desirability is high, gamified SJTs offer a format in which strategic manipulation is more difficult and in which the decisions and behaviors observed more genuinely reflect the characteristics of dark personality. Finally, the opportunity is limited by the low transparency of the evaluation criteria, which are integrated into the narrative or dynamics of the game and are not evident (Woods et al., 2020). Taken together, these factors favor more spontaneous and authentic responses than those generated in classic personality questionnaires. Therefore, the integration of the fundamentals of falsification theory and situational assessment allows us to argue that gamified SJTs are a particularly relevant tool for the valid and ethical assessment of dark personality in organizational contexts.
Gamified SJTs belong to GRAs, a set of assessment methods that utilize games or gamified tools (Ramos-Villagrasa et al., 2022). They can be classified on a continuum according to their degree of playfulness (Ramos-Villagrasa and Naryniecki, 2025). Within this continuum, tests that incorporate game-like elements in their assessment are referred to as serious games and can, therefore, be considered psychological assessment methods in their own right, with adequate predictive validity (Harman and Brown, 2022; Hilliard et al., 2022). There are three types of serious games: (1) soft gamified assessments, which are similar to traditional assessments but include gamification elements like music, stories, or score points; (2) hard gamified assessments, where gamification is fundamental to the assessment design; and (3) game-based assessments, which are structured as actual games. Therefore, a gamified SJT is classified into one of these types depending on its degree of playfulness. Compared to traditional assessment methods, such as questionnaires, serious games enhance individuals’ reactions (Ellison et al., 2020), mitigate bias (Landers and Collmus, 2022), and reduce faking behavior (Melchers and Basch, 2022; Ohlms et al., 2025).
Research on GRAs for assessing personality is scarce and has focused on “bright” personality (Barends and Ohlms, 2025). The closest example to date is Building Docks, a hard gamified assessment designed to measure H-H (Barends et al., 2022). Although dark personality overlaps with H-H, the two appear to be functionally and nomologically distinct, and dark personality outperformed H-H in the prediction of aversive behaviors, such as CWBs (Horsten et al., 2021). Therefore, there is still room to evaluate dark personality through gamification. In this paper, we present an extension of VASSIP (Ramos-Villagrasa et al., 2024), a soft-gamified assessment that evaluates the Big Five model, expanding its scope to measure dark personality.
1.4 The present study
Addressing the need for alternative dark personality assessment measures and leveraging the advantages of GRAs, this research aims to evaluate the performance of a hard-gamified assessment designed for the brief appraisal of dark personality in organizational contexts. To achieve this, we will develop a dark personality measure, integrate it into an existing gamified assessment, and assess its functioning.
The development of the dark personality measure is based on situational judgment test (SJT)-type items (Lievens et al., 2008). SJTs, which present participants with “short domain-relevant situational descriptions and various response options to deal with the situations” (Herde et al., 2019, p. 66), have great potential for assessing personality (Lievens et al., 2021). The advantage of SJT-type items is that evaluees tend to rate them better, although they are often less reliable (internal consistency) than other scales due to the methodology used to design them (Kasten and Freund, 2016). Furthermore, McDaniel et al.’s (2007) meta-analysis found that SJT-type items had a mean observed correlation of 0.20 for predicting job performance and high face and content validity, so incorporating these types of items could be advantageous for the gamified assessment of dark personality in the work environment. This is especially relevant in contexts such as recruitment, where the use of gamified tests can improve validity, diversity, and candidate reactions (Van Iddekinge et al., 2023).
The process of gamifying the SJT items for inclusion in VASSIP was based on storyfication (Ohlms et al., 2024), which means incorporating them into the narrative so that the questions posed to the individuals being assessed become part of the story without substantially altering its development. The integration of the gamified SJT for measuring dark personality in the serious game VASSIP transforms it into a hard-gamified assessment (Ramos-Villagrasa and Naryniecki, 2025). This is because the described situations are embedded within the assessment’s narrative, such that the questions can only be answered by considering the gamified elements. An additional advantage is that this approach allows for the assessment of both the Big Five and dark personality using a single instrument.
To examine the functioning of the hard-gamified assessment, we will evaluate dimensionality, construct validity, and criterion validity. Regarding dimensionality, given the aim of creating a brief instrument, we intend to develop a unidimensional, general measure of dark personality for initial assessment. A more precise measure would require greater length, and we must first demonstrate that GRAs offer advantages in assessing this type of personality. For construct validity, we will test its convergence and divergence with a self-report measure of dark personality and measures of “bright” personality. Finally, we will examine criterion validity by analyzing whether the dark personality measure plays a role in the prediction of the different dimensions of job performance (i.e., TP, CP, and CWBs).
2 Method
2.1 Sample
To test the factor structure and construct validity of the VASSIP measure of dark personality, an a priori power analysis was conducted using G*Power. The significance level (α) was set at 0.05, and the desired statistical power (1 – β) was 0.80. Based on an expected effect size of 0.30, the power analysis indicated a required sample size of N = 347. Thus, 410 workers living in Spain and fluent in Spanish were recruited through the Prolific research platform. Prolific ensures a controlled online environment with strict participant verification and in-study quality checks, providing high-quality data. Participants were informed about the study’s purpose and their rights according to APA ethical standards before deciding to participate. After eliminating participants with more than 5% of missing responses and those who failed the attentional task, the final sample comprised 395 workers (47.4% female, Mage = 34.2 years, SDage = 10.4, Mjob experience = 10.83 years, SDjob experience = 9.5).
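As an illustration of how an a priori sample size of this kind can be derived, the sketch below uses the Fisher z approximation for a two-sided test of a correlation coefficient. This is an assumption: the text does not state which test family or effect-size metric was entered in G*Power, and G*Power's exact routines can differ from this approximation by a few cases. The function name required_n_correlation is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_correlation(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a correlation rho with a two-sided test.

    Fisher z approximation (an assumption, not the G*Power exact routine):
        N = ((z_{1 - alpha/2} + z_{power}) / atanh(|rho|))^2 + 3
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = ((z_alpha + z_power) / math.atanh(abs(rho))) ** 2 + 3
    return math.ceil(n)


# Illustrative effect sizes only; they are not taken from the study's data.
for rho in (0.10, 0.15, 0.30):
    print(f"rho = {rho:.2f} -> required N = {required_n_correlation(rho)}")
```

Under these assumptions the required N grows quickly as the expected correlation shrinks, so the figure obtained depends heavily on the effect-size metric and test chosen.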
2.2 Measures
2.2.1 Attention check
The questionnaire with the target variables included a question stating, “Please select Disagree as the answer to this question.” Fifteen participants failed the check and were removed from the study.
2.2.2 Sociodemographic
Participants were asked about their gender, age, level of education, job experience, and current job position.
2.2.3 Gamified assessment of personality
As previously described, the gamified personality measure is an extension of VASSIP (Ramos-Villagrasa et al., 2024), a soft-gamified assessment that evaluates “bright” personality using the Big Five model. The Big Five measure retains its original format, with items and response format identical to the short version of the BFI-2 (Soto and John, 2017). For dark personality, the authors of this paper developed a gamified SJT.
The gamified elements of the original version by Ramos-Villagrasa et al. (2024) are still present in this one. The first element, storyfication, consists of embedding the assessment into a science fiction story, in which the person being assessed is hired to work in a space base whose security is compromised. During the game, the person being tested must make decisions until reaching one of the three outcomes of the story (see Ohlms et al., 2024, for more information about storified assessments). The second element, immersion, is achieved through images and music that are not part of the assessment but make it easier for the evaluee to feel that they are part of a story and not part of an assessment process. Finally, VASSIP incorporates decision-making and some simple games that are not part of the assessment but help the participants perceive the experience more as a game than a personality assessment. As developing the measure of dark personality is among the objectives of the present study, the measure is described in detail in the following sections. Given the changes made to the original VASSIP scale, we consider the present version a hard-gamified assessment because the measurement is embedded in the game narrative (Ramos-Villagrasa and Naryniecki, 2025).
2.2.4 Honesty-Humility (H-H)
This dimension of the HEXACO model was measured with the Spanish version of the corresponding subscale of HEXACO-100 (Lee and Ashton, 2004; Spanish version by Romero et al., 2015). The internal consistency of this scale was ω = 0.76 (McDonald’s omega) and α = 0.74 (Cronbach’s alpha). A sample item is “If I want something from a person I dislike, I will act very nicely toward that person to get it.”
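For readers less familiar with the two reliability coefficients reported for each scale, the following minimal sketch shows how they are commonly computed. It assumes complete item-level data for Cronbach's alpha and, for McDonald's omega, standardized loadings from an already fitted one-factor model with uncorrelated errors. The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def mcdonald_omega(loadings):
    """Omega total from standardized one-factor loadings (uncorrelated errors assumed)."""
    common = sum(loadings) ** 2
    unique = sum(1 - l * l for l in loadings)
    return common / (common + unique)


# Toy data for illustration only.
toy_scores = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5], [1, 2, 2]]
toy_loadings = [0.7, 0.6, 0.8]
print(round(cronbach_alpha(toy_scores), 2), round(mcdonald_omega(toy_loadings), 2))
```

Alpha treats all items as equally related to the construct, whereas omega weights them by their loadings, which is one reason the two values can differ slightly for the same scale.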
2.2.5 Moral disengagement
We used the Spanish version of the 8-item Propensity to Moral Disengagement Scale (Moore et al., 2012; Spanish version by Navas et al., 2024). It is rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The internal consistency of this scale was ω = 0.70 (McDonald’s omega) and α = 0.67 (Cronbach’s alpha). A sample item is “It is okay to spread rumors to defend those you care about.”
2.2.6 Self-reported measure of dark personality
We applied the Spanish version of the Short Dark Tetrad (SD4; Paulhus et al., 2021) used by Ramos-Villagrasa et al. (2025). This scale comprises 28 items rated on a 5-point Likert-type scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The internal consistency of this scale was ω = 0.72 for Machiavellianism, 0.74 for narcissism, 0.77 for psychopathy, and 0.78 for sadism (McDonald’s omega); and α = 0.72 for Machiavellianism, 0.73 for narcissism, 0.76 for psychopathy, and 0.77 for sadism (Cronbach’s alpha). A sample item is “Watching a fistfight excites me.”
2.2.7 Big Five
As the present version of VASSIP is based on the original one developed by Ramos-Villagrasa et al. (2024), it includes a gamified version of the 30-item short form of the BFI-2 (BFI-2-S; Soto and John, 2017) based on storyfication, immersion, and the inclusion of game dynamics that are not part of the assessment, while the items and response format are identical to the original test. The authors of VASSIP reported similar results to those of the original scale, with a slightly higher mean score in Conscientiousness. Responses are rated on a 5-point Likert-type scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The internal consistency of this scale was ω = 0.82 for Negative Emotionality, 0.73 for Extraversion, 0.80 for Open-Mindedness, 0.69 for Agreeableness, and 0.81 for Conscientiousness (McDonald’s omega), and α = 0.82 for Negative Emotionality, 0.72 for Extraversion, 0.79 for Open-Mindedness, 0.68 for Agreeableness, and 0.81 for Conscientiousness (Cronbach’s alpha). A sample item is “Is full of energy.”
2.2.8 Task performance and contextual performance
We applied the Spanish version of the Individual Work Performance Questionnaire (IWPQ), developed by Koopmans (2015) and translated into Spanish by Ramos-Villagrasa et al. (2019). The IWPQ comprises two subscales: Task Performance (5 items, ω = 0.87 and α = 0.86) and Contextual Performance (13 items, ω = 0.86 and α = 0.86). Participants answered on a 5-point scale ranging from 0 (never) to 4 (often). A sample Task Performance item is “I managed to plan my work so that I finished it on time.” A sample Contextual Performance item is “I took on challenging tasks when they were available.”
2.2.9 Counterproductive work behaviors
We applied the Workplace Deviance Scale (WDS; Bennett and Robinson, 2000; Spanish version by Fernández-del-Río et al., 2021). The WDS comprises two subscales: Organizational Behaviors or CWBO (12 items, ω = 0.82 and α = 0.80) and Interpersonal Behaviors or CWBI (7 items, ω = 0.89 and α = 0.88). Participants answered on a 7-point scale ranging from 1 (never) to 7 (daily). A sample item is “Have taken property from work without permission.”
2.3 Procedure
Participants were randomly divided into two groups: (1) Condition 1 accounted for 52.7% of the sample, who first completed an online questionnaire related to the target variables and subsequently played the gamified assessment; and (2) Condition 2, which comprised the remaining 47.3% of the participants, who first played the gamified assessment and then completed the online questionnaire. This approach, which is a standard method for counterbalancing two tasks in empirical designs, ensures that any potential influence of task order is distributed across participants rather than confounding the results (Kirk, 2013).
This research was approved by the ethics committee of Aragón (CEICA, ref. PI24/123).
2.4 Analysis
2.4.1 Item generation and content validation
The formal procedure proposed by Haynes et al. (1995) was carried out to provide apparent validity to the content of the developed SJT. For this purpose, a common empirical methodology for developing SJTs, known as hit-rate analysis, was used (Moore and Benbasat, 1991).
Fifteen situational scenarios requiring decision-making were designed, each one with three response options. These options represented a low, medium, and high score in dark personality. After an internal review, scenarios were sent to four external experts (three academics and one HR professional) who assessed the value they would give to each response alternative (low, medium, or high). After analyzing the responses, the experts and researchers concluded that in the gamified assessment, a total of 12 items reached acceptable levels of congruence during the content validation procedure (Kendall’s W = 0.63 for the gradation of the intensity of the responses) in the SJT format according to Cicchetti (1994) guidelines.
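The concordance statistic used in the content validation can be illustrated with a short sketch. It assumes that each expert supplies a complete ranking of the same set of objects without ties (no tie correction is applied) and that the data are arranged as one list of ranks per expert. How the authors arranged the expert ratings is not described in the text, so the layout and the function name kendalls_w are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W (no tie correction).

    rankings: list of m raters, each a list of ranks 1..n for the same n objects.
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the objects' rank totals from their mean.
    """
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two raters and two objects")
    if any(len(r) != n for r in rankings):
        raise ValueError("all raters must rank the same number of objects")
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy example: three raters ordering three response options (invented data).
print(kendalls_w([[1, 2, 3], [1, 3, 2], [2, 1, 3]]))
```

W ranges from 0 (no agreement) to 1 (identical rankings across raters), which is why it is a convenient summary of how consistently the experts graded the intensity of the response options.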
2.4.2 Factor structure and validity of the gamified assessment
A confirmatory factor analysis (CFA) was conducted to assess the adequacy of the structure of the gamified SJT with the data provided by the participants. Diagonally Weighted Least Squares (DWLS) estimates and robust statistics were used to address non-normality of the data, and fit indices were evaluated as recommended by Hu and Bentler (1999). More specifically, the following criteria were considered for optimal fit: χ2/df < 2–3, CFI > 0.95, RMSEA < 0.06, SRMR < 0.05; and for acceptable or reasonable fit: χ2/df < 4, CFI > 0.90, RMSEA < 0.08, SRMR < 0.08 (Byrne, 2012).
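The cutoffs above can be expressed as a simple decision rule. The sketch below is illustrative only and makes two assumptions that the text does not confirm: that all four criteria must hold jointly for a model to reach a given level, and that the upper bound of the "2–3" range (i.e., 3) is used for the optimal χ2/df threshold. The function name classify_fit and the example values are hypothetical.

```python
def classify_fit(chi2: float, df: int, cfi: float, rmsea: float, srmr: float) -> str:
    """Label model fit using the thresholds listed in the text.

    Assumes every criterion must be met for a level to apply (an assumption).
    """
    if df <= 0:
        raise ValueError("df must be positive")
    ratio = chi2 / df
    if ratio < 3 and cfi > 0.95 and rmsea < 0.06 and srmr < 0.05:
        return "optimal"
    if ratio < 4 and cfi > 0.90 and rmsea < 0.08 and srmr < 0.08:
        return "acceptable"
    return "inadequate"


# Invented values for illustration, not results from the study.
print(classify_fit(chi2=20.0, df=14, cfi=0.99, rmsea=0.03, srmr=0.02))
print(classify_fit(chi2=50.0, df=14, cfi=0.92, rmsea=0.07, srmr=0.06))
```

In practice, fit indices are usually weighed together with theoretical considerations rather than applied mechanically, so a rule like this is best read as a screening aid.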
To establish the construct validity of the new hard-gamified extension of VASSIP, participants’ responses were used to perform descriptive analyses, mean comparisons, correlations, and linear and hierarchical regressions, with a set of well-established measures used as independent variables.
All analyses were performed using JAMOVI statistical software.
3 Results
3.1 Factor structure of the items that comprise the gamified assessment
The 12 items that constitute the gamified SJT were subjected to a CFA in which each item loads on a single first-order factor representing dark personality. The fit indices in the CFA for this model were not optimal, χ2 (80.8, 54) = 1.55, p < 0.05, CFI = 0.74, RMSEA = 0.03 [0.01, 0.05], SRMR = 0.08. Consequently, items were removed if their estimates did not contribute significantly to the model (p > 0.05; n = 5 items: i1, i2, i4, i9, and i12) or if they did not correlate significantly with the dimensional scores obtained on the self-report assessing dark personality (SD4; n = 2 items: i3 and i6). The final items can be seen in Table 1.
Table 1. Correlations among the 12 items of the VASSIP measure of dark personality and the four dimensions of SD4.
The reduction from 12 to 5 items was guided by well-established psychometric criteria (DeVellis, 2017). Items that did not load significantly on the factor or did not correlate meaningfully with the dimensional scores obtained from the reference self-report were excluded to improve construct validity and model fit. This approach also follows the principle of parsimony, aiming for a concise instrument that retains the core variance of the latent construct while minimizing redundancy (MacCallum et al., 1992). Subsequently, the five resulting items composing the gamified assessment of dark personality (i5, i7, i8, i10, and i11) were subjected to a CFA, showing optimal fit indices, χ2(17.3, 14) = 1.27, p > 0.05, CFI = 1.00, RMSEA = 0.00 [0.00, 0.02], SRMR = 0.02. The items are presented in the Supplementary material. The estimators of each item for the model are presented in Figure 1.
Figure 1. Confirmatory factor structure of the items comprising the gamified version of the VASSIP measure of dark personality.
3.2 Descriptive statistics
The descriptive statistics (M, SD, skewness, kurtosis) presented in Table 2 indicate that, except for H-H (Shapiro–Wilk W = 0.99, p = 0.11), the variables did not follow a normal distribution. Statistically significant differences between men and women were found in age, agreeableness, Machiavellianism, psychopathy, moral disengagement, and CWBI, albeit all with small effect sizes. In addition, negative emotionality (d = 0.24), dark personality assessed using the gamified assessment (d = 0.32), and sadism (d = 0.49) showed significant differences between men and women with a moderate effect size. Men displayed higher mean scores in dark personality, and women showed significantly higher mean scores in negative emotionality.
The assessment of dark personality through the gamified assessment revealed significant differences according to application condition, where Condition 2 (playing the gamified assessment first, answering the self-report questionnaire afterwards) obtained higher scores on dark personality measured through the gamified assessment (Mann–Whitney U = 15121, p = 0.003, d = 0.17).
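A comparison of this kind can be sketched as follows. The code computes the Mann–Whitney U statistic with midranks for ties and a rank-biserial effect size. This is an assumption about the effect-size metric: the text reports d, which may have been obtained differently, and statistical packages differ in which of the two U values they report. Function names and data are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


def rank_biserial(x, y):
    """Rank-biserial correlation: positive when x tends to exceed y."""
    u_x, u_y = mann_whitney_u(x, y)
    return (u_x - u_y) / (len(x) * len(y))


# Invented scores for two administration orders (illustration only).
game_first = [7, 8, 6, 9, 7]
survey_first = [6, 5, 7, 6, 5]
print(mann_whitney_u(game_first, survey_first), rank_biserial(game_first, survey_first))
```

Because a five-item scale with three response options produces many tied scores, handling ties through midranks matters more here than it would for a continuous measure.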
3.3 Correlations and regression analyses
Next, Spearman correlations were computed to determine whether the scores on the gamified dark personality assessment were significantly associated with the self-reported scores on Machiavellianism, narcissism, psychopathy, and sadism and with the dimensions of job performance. Spearman’s correlations (ρ) are reported together with their 95% confidence intervals obtained via bootstrapping (1,000 samples), providing a more reliable estimation of the associations. The associations between the variables are presented in Table 3. Dark personality assessed by the gamified assessment correlated positively and significantly with self-reported Machiavellianism (ρ = 0.35, p < 0.001), narcissism (ρ = 0.23, p < 0.001), psychopathy (ρ = 0.19, p < 0.001), and sadism (ρ = 0.40, p < 0.001). In addition, the gamified assessment of dark personality was negatively and significantly related to H-H (ρ = −0.32, p < 0.001) and to TP (ρ = −0.15, p < 0.05). The associations with CWBs presented a similar pattern in both assessments: in the self-reported assessment, the correlations for Machiavellianism were ρCWBO = 0.28, p < 0.001; ρCWBI = 0.21, p < 0.001; for narcissism, they were ρCWBO = 0.11, p < 0.05; ρCWBI = 0.12, p < 0.05; for psychopathy, they were ρCWBO = 0.36, p < 0.001; ρCWBI = 0.27, p < 0.001; and for sadism, they were ρCWBO = 0.37, p < 0.001; ρCWBI = 0.37, p < 0.001, while in the gamified assessment, the associations with dark personality were ρCWBO = 0.20, p < 0.001; and ρCWBI = 0.18, p < 0.01.
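The procedure of pairing Spearman's ρ with a bootstrapped confidence interval can be sketched in a few lines. The example assumes a percentile bootstrap over resampled cases; the text does not specify which bootstrap interval was used, so this choice, the function names, and the simulated data are all hypothetical.

```python
import random
from statistics import mean


def midranks(values):
    """Ranks starting at 1, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of midranks (NaN if constant)."""
    rx, ry = midranks(x), midranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for rho, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # drop NaN from degenerate resamples
            stats.append(rho)
    stats.sort()
    lower = stats[int((1 - level) / 2 * (len(stats) - 1))]
    upper = stats[int((1 + level) / 2 * (len(stats) - 1))]
    return lower, upper


# Simulated scores for illustration only.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.4 * v + rng.gauss(0, 1) for v in x]
print(round(spearman(x, y), 2), bootstrap_ci(x, y))
```

Reporting the interval alongside ρ conveys how precisely each association is estimated, which a p value alone does not.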
Likewise, the dark personality assessed with the new gamified version of VASSIP was positively and significantly related to the scores obtained in propensity to moral disengagement (ρ = 0.30, p < 0.001) and, in turn, the latter was positively and significantly linked to CWBs (ρCWBO = 0.35, p < 0.001; ρCWBI = 0.29, p < 0.001).
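The correlations above are reported with bootstrapped 95% confidence intervals. A minimal sketch of one common way to obtain such intervals, the percentile bootstrap, is given below. It assumes case resampling with replacement and 1000 resamples; the source does not state which bootstrap variant was used, and the function names and toy data are hypothetical.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rho, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r == r:  # drop NaN from degenerate resamples
            stats.append(r)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical paired scores with a strong monotone trend (not the study data).
x = list(range(40))
y = [i + 3 * ((i * 7) % 5 - 2) for i in range(40)]
rho = spearman(x, y)
lower, upper = bootstrap_ci(x, y)
print(round(rho, 3), round(lower, 3), round(upper, 3))
```

Resampling whole cases keeps each participant's two scores paired, which is what preserves the association being estimated.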
To further explore the association between the brief hard-gamified measure of dark personality (GRA) and job performance, we developed four hierarchical regression models in which dark personality was entered as a predictor and the dimensions of job performance (TP, CP, CWBO, and CWBI) served as criteria. As can be seen in Table 4, the percentages of variance explained by the VASSIP measure of dark personality were significant for TP (1%) and both types of CWBs (3.1% for CWBO and 2% for CWBI).
Table 4. Hierarchical regression models predicting different aspects of job performance from dark personality and Big Five traits.
Hierarchical regression analyses revealed that dark personality did not significantly predict job performance when Big Five traits were included in the models, with the exception of CP, where a lower dark personality score was positively associated with higher scores in this dimension of job performance.
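The logic of a hierarchical regression can be sketched as follows: fit a first model with one block of predictors, add a second block, and compare the explained variance (ΔR²). This is a minimal illustration using ordinary least squares on simulated data, not the authors' analysis. The order of entry, the variable names, and the simulated effect sizes are assumptions made for the example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS model with intercept; `predictors` is a list of columns."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(beta[k] * cols[k][i] for k in range(len(cols))) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def delta_r2(block1, block2, y):
    """Return (R^2 step 1, R^2 step 2, increment) for two nested models."""
    r2_first = r_squared(block1, y)
    r2_second = r_squared(block1 + block2, y)
    return r2_first, r2_second, r2_second - r2_first


# Simulated data (assumption): a dark score, one bright trait, and a CWB criterion.
rng = random.Random(0)
dark = [rng.gauss(0, 1) for _ in range(200)]
consc = [-0.4 * d + rng.gauss(0, 1) for d in dark]
cwb = [0.2 * d - 0.5 * c + rng.gauss(0, 1) for d, c in zip(dark, consc)]

r2_step1, r2_step2, delta = delta_r2([dark], [consc], cwb)
print(round(r2_step1, 3), round(r2_step2, 3), round(delta, 3))
```

Because the models are nested, R² cannot decrease when a block is added; what matters is whether the increment is large enough to be statistically and practically meaningful.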
4 Discussion
Video games have become an integral part of daily life, especially following the COVID-19 outbreak (Lewinson et al., 2024). Consequently, their influence is permeating other contexts, such as the assessment of individuals in organizational settings (Ramos-Villagrasa et al., 2022). At the same time, the interest in measuring dark personality is growing. Self-report assessments are limited by the risk of response manipulation (Wille et al., 2023), a difficulty that is especially relevant in the case of dark personality. Hence, in recent years, a notable effort has been made to propose alternative measures to self-reported questionnaires for personality assessment (Miller et al., 2019), such as GRAs (Ramos-Villagrasa and Naryniecki, 2025). The present study has proposed a hard gamified assessment to briefly measure dark personality in organizational contexts. The following sections discuss the study’s findings and their theoretical and practical implications.
First, it seems that the use of SJT items in serious games to measure dark personality is adequate. As Lievens and Motowidlo (2016) point out, SJTs allow a more accurate assessment of behaviors in specific situations by focusing on how individuals respond to work-relevant scenarios. Therefore, their situational approach can capture more accurately how candidates might display these characteristics in real work situations, providing added value in personnel selection and the study of dark personality in organizational settings. This is crucial in contexts where the concealment of dark personality traits may be particularly detrimental, such as in selection processes where impression management is more likely among individuals with these traits (e.g., Curtis et al., 2022).
Regarding content validity, the indices of the VASSIP extension are considered “adequate or good” because they reach Kendall’s W above 0.60 (Cicchetti, 1994). This result suggests that the scenarios selected for the situational judgment test are suitable for measuring dark personality. To capture the underlying nature of the dark personality, a CFA was conducted to assess whether the generated scenarios loaded on a single factor, representing the so-called D factor, a construct resulting from the shared malevolence of several related dark traits (Moshagen et al., 2020). Initial results with this hard gamified assessment did not show adequate fit indices for a unidimensional model of dark personality. However, a five-item solution with satisfactory fit indices was achieved after eliminating items with low contributions to the model. This result suggests the existence of a set of maladaptive traits that would form a brief and coherent measure of dark personality (Rauthmann and Kolar, 2012). Therefore, the new extended version of VASSIP presents a novel solution for exploring dark personality in organizations that rely on personality assessments for HR processes. It is also important to acknowledge the practical implications of employing an instrument based on a unidimensional factor structure. Although the unidimensional model offers a theoretically parsimonious approach to capturing the shared tendency of dark personality to maximize personal gain at the expense of others, it does not account for the unique variance of each trait in predicting specific behavioral outcomes (Book et al., 2016). In this regard, the unidimensional configuration can obscure the qualitative distinctions among the various dark traits (Vize et al., 2020). Consequently, this instrument can provide a parsimonious and useful tool for the initial assessment of dark personality in personnel selection. However, it should be supplemented with additional evaluations when making decisions about targeted interventions for specific behavioral risks.
Regarding convergent and concurrent validity, both were supported. Correlational analyses showed a moderate and significant association between the VASSIP measure of dark personality and the SD4 scale, indicating a moderate correspondence between the two dark personality assessment instruments. These findings suggest that, although the measures assess similar aspects of dark personality, they are not entirely equivalent (Rauthmann and Kolar, 2012). This pattern is consistent with previous research showing only modest convergence between situational judgment measures and traditional self-report inventories of personality (e.g., Arthur and Villado, 2008; Lievens and Motowidlo, 2016). Moreover, the results indicate that the SJT captures behavioral manifestations of dark personality, supporting its convergent validity. In addition, the SJT showed theoretically coherent relationships with external variables, such as lower honesty and integrity, higher moral disengagement, and greater tendencies toward counterproductive work behaviors, while its relationship with contextual performance was negligible. This configuration aligns with the dark core of personality framework (Moshagen et al., 2018), indicating that individuals displaying stronger dark tendencies in the SJT also exhibit patterns of behaviors consistent with this theoretical model. Although these predictive relationships are modest and should be interpreted with appropriate prudence, such magnitudes are typical and meaningful in personality-behavior research (Fernández-del-Río et al., 2020; Funder and Ozer, 2019). Therefore, even with small effects, the gamified SJT provides practically useful and ecologically valid insights that complement traditional self-report personality assessments.
Another interesting result is related to the regression analyses. When used as the sole predictor, the measure developed in this study predicts TP and CWBs. Although the variance explained by this model is small, it indicates that this solution is appropriate for screening the associations between dark personality and these criterion variables. However, when the remaining personality variables measured in VASSIP are included, the effect of the dark personality measure becomes negligible. There are two complementary explanations for this result: (1) the predictive power of the Big Five for performance is greater than that of dark personality; thus, when using a brief measure of the latter construct, its effect diminishes; (2) Ramos-Villagrasa et al. (2022) suggest that GRAs with measures that are more like conventional tests (such as the Big Five measure in this study) perform better than those that are more “game-like” (such as the dark personality measure).
The observed validity coefficients of the VASSIP gamified assessment are comparable to those typically reported for traditional dark personality measures. Specifically, the correlations with CWBs and TP (ρ = −0.15) fall within the range found in meta-analyses of self-reported dark traits (O’Boyle et al., 2012) and in previous primary studies (Fernández-del-Río et al., 2020). These results suggest that, despite its brief and interactive format, the gamified SJT captures the core variance of the dark personality with a similar predictive strength to traditional self-report scales. However, hierarchical regression analyses indicated modest practical predictive value: the gamified measure explained a small proportion of variance in job performance and CWBs (ΔR2 = 0.01–0.03), and its contribution diminished when the Big Five traits were included. This pattern aligns with prior evidence that the predictive power of dark traits is often limited after controlling for broader personality factors (Moscoso and Salgado, 2004). Nevertheless, the practical advantage of the VASSIP extension lies in its reduced susceptibility to faking and its capacity to elicit authentic behavioral responses within work-relevant contexts (Landers and Collmus, 2022; Melchers and Basch, 2022). Hence, while the predictive magnitude is modest, the measure offers an ecologically valid and ethically sound alternative to traditional questionnaires for assessing dark personality in organizational settings.
Additionally, hierarchical regression analyses have shown that the gamified version of dark personality is a positive predictor of CP when considering the VASSIP measure of bright personality. From our point of view, this result could be due to a statistical artifact, as multicollinearity seems the most plausible explanation, given that these variables are related to personality (Daoud, 2017; Shrestha, 2020). This is further supported by the fact that when dark personality is analyzed in isolation, the same prediction did not occur. Nevertheless, this finding could also align with recent research suggesting that certain dark personality profiles may promote behaviors beneficial to the organization, if they result in personal gain (Kızıloğlu et al., 2021).
Finally, it is worth noting that the assessment of dark personality through the serious game VASSIP revealed significant differences according to the application condition, where applying the GRA first resulted in higher scores on the dark personality assessed through it. This result suggests, like prior literature, that gamified tests may be more resistant to faking. However, this requires further support from research.
Taking all of the above into account, the main advantages of the new version of VASSIP are: (1) the integration, in the same GRA, of the assessment of bright personality traits through the Big Five with a measure of dark personality; (2) its brevity in terms of content (6 items to measure each of the Big Five traits and 5 items to measure dark personality) and application time (around 15 min); and (3) according to the results of its previous version (Ramos-Villagrasa et al., 2024), the favorable ratings the GRA receives from the people evaluated. All this leads us to conclude that VASSIP can be a useful instrument in contexts in which dark personality assessment is desirable but not considered essential, or in which there is insufficient time to carry out in-depth evaluations. However, a more exhaustive analysis of the different existing dark profiles would require the use of more detailed and comprehensive scales.
From a general perspective, the present study supports the idea that GRAs may be an interesting way to measure dark personality in the workplace. However, the modest results clearly suggest that more research is needed on how to develop better game-based assessments (Ramos-Villagrasa and Fernández-del-Río, 2023). In addition, the use of gamified assessments in personnel selection contexts presents both promising opportunities and important ethical considerations. From a practical standpoint, these game-related assessment tools can capture behavioral tendencies that complement self-report measures, providing a richer understanding of candidates’ traits and potential workplace behaviors. Importantly, their implementation should support diversity and fairness in selection processes. The literature emphasizes that a diverse workforce offers substantial organizational benefits, which begin by ensuring that assessment tools provide equal opportunities to all candidates, regardless of gender, age, nationality, or other characteristics (Langer et al., 2023). Early research on game-related assessment tools suggests minimal bias, particularly regarding gender and educational background, though further work is needed to explore other diversity dimensions such as age and culture (Brown et al., 2022; Melchers and Basch, 2022; Ramos-Villagrasa et al., 2024).
Ethical considerations are equally critical. Although gamified assessments are designed to be valid and engaging, their excessive or improper use can unintentionally stress or influence participants in counterproductive ways. Kim and Werbach (2016) suggest that excessive use of gamified assessments can potentially infringe upon autonomy or negatively impact candidates’ well-being. Transparency about the purpose of the assessment, clear instructions, voluntary participation, and accessibility for all candidates are key to mitigating such risks. Furthermore, cultural factors should be considered when applying gamified tools in multinational contexts to avoid misunderstandings or inadvertent disadvantages. Overall, when carefully implemented, gamified assessments are ethically responsible, practically useful, and supportive of organizational diversity goals, while offering an engaging alternative to traditional assessment methods.
4.1 Limitations and further research
Like any study, the present one has some limitations that should be acknowledged. First, this study relies solely on self-report measures collected concurrently, which makes it susceptible to common method variance. Future research is crucial to address this, not only by utilizing multiple measurement time points and assessing job performance through supervisor ratings, but also by more thoroughly exploring the impact of common method variance in GRAs more broadly. Specifically, when evaluating GRAs’ incremental validity over self-report measures, it is vital to investigate how assessment methodology interacts with outcome measurement. As Barends et al. (2022) demonstrated, a game-based assessment of H-H showed incremental validity only for behavioral tasks, not for self-reported outcomes. Therefore, subsequent studies should incorporate multi-source or objective outcome measures to provide a more robust evaluation of GRAs’ unique predictive contributions, moving beyond solely self-reported criteria. Second, our GRA is a hard-gamified assessment tool. Previous research suggests that soft-gamified assessment tools obtain better results in terms of validity, but we did not compare soft and hard versions (e.g., introducing the SD4 items in the GRA vs. the current version of VASSIP). However, we believe that soft gamified assessment is not an adequate option for measuring dark personality, as it is more challenging to develop a shadow assessment, and applicants’ reactions are less favorable. Third, our study focused on developing a brief measure of dark personality but did not verify whether this version is less prone to faking. This should be verified in further studies. Fourth, the study employs a cross-sectional design, which precludes the establishment of causal relationships between dark personality and the outcomes measured. While the current design allows for the assessment of associations and initial validation of the game-related assessment, longitudinal studies are required to examine causal links and changes over time. Fifth, the study lacks longitudinal predictive data. Consequently, while the current findings provide evidence of the VASSIP’s reliability and construct validity, its predictive utility for real-world outcomes or future behaviors remains to be established. Future research should incorporate longitudinal designs to assess the stability of measured traits and the instrument’s ability to predict relevant behavioral outcomes over time. Sixth, another relevant limitation is the capitalization on chance associated with the item selection used to construct VASSIP’s dark personality measure. This likely overestimates the strength of the relationships analyzed. Consequently, a replication study is required to more robustly ascertain the true relationship between VASSIP’s dark personality measure and the remaining variables.
Continuing with further research, applicant reactions to the present version of VASSIP should be gathered. In Likert-type scales, items directly associated with dark personality promote negative reactions, but we do not know if this is true in the SJT-gamified version. This could interest researchers and designers of measurement instruments of socially undesirable constructs. In this regard, although the present study focused on the psychometric evaluation of an extended version of the VASSIP for initially assessing dark personality in personnel selection contexts, future research should consider testing a multidimensional alternative model, such as a higher-order or bifactor structure. Such an approach would allow the examination of both the shared variance captured by the D factor and the unique contributions of individual dark personality traits, potentially enhancing predictive precision and providing a more nuanced understanding of behavioral outcomes in organizational settings. Additionally, we consider it necessary to compare different types of GRAs that measure personality to verify possible differences depending on the degree of playfulness.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Open Science Framework (OSF): https://osf.io/vdfgw/?view_only=dbb2afb1c85c460e906ed32297a648bc.
Ethics statement
The studies involving humans were approved by Comité de Ética de la Investigación de la Comunidad Autónoma de Aragón (CEICA), ref. PI24/123. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
MN-S: Formal analysis, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. EF-d-R: Formal analysis, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. PR-V: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Ministry of Science and Innovation, Government of Spain, under grant PID2021-122867NA-I00; and the Government of Aragon (Group S31_23R), Department of Innovation, Research and University and FEDER 2014–2020, Building Europe from Aragón.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2025.1686784/full#supplementary-material
References
Altomari, L., Altomari, N., and Iazzolino, G. (2023). Gamification and soft skills assessment in the development of a serious game: design and feasibility pilot study. JMIR Serious Games 11:e45436. doi: 10.2196/45436
Arthur, W. Jr., and Villado, A. J. (2008). The importance of distinguishing between constructs and methods when comparing predictors in personnel selection research and practice. J. Appl. Psychol. 93, 435–442. doi: 10.1037/0021-9010.93.2.435
Barends, A. J., De Vries, R. E., and Van Vugt, M. (2022). Construct and predictive validity of an assessment game to measure honesty-humility. Assessment 29, 630–650. doi: 10.1177/1073191120985612
Barends, A. J., and Ohlms, M. L. (2025). Game-related personality assessment. Curr. Opin. Psychol. 65:102095. doi: 10.1016/j.copsyc.2025.102095
Becker, J. A. H., and O'Hair, H. D. (2007). Machiavellians' motives in organizational citizenship behavior. J. Appl. Commun. Res. 35, 246–267. doi: 10.1080/00909880701434232
Bennett, R. J., and Robinson, S. L. (2000). Development of a measure of workplace deviance. J. Appl. Psychol. 85, 349–360. doi: 10.1037/0021-9010.85.3.349
Bhatia, S., and Ryan, A. M. (2018). “Hiring for the win: game-based assessment in employee selection” in The brave new world of eHRM 2.0. eds. J. H. Dulebohn and D. L. Stone (Charlotte: IAP Information Age Publishing), 81–110.
Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., and Smith, M. A. (2006). A meta-analytic investigation of job applicant faking on personality measures. Int. J. Sel. Assess. 14, 317–335. doi: 10.1111/j.1468-2389.2006.00354.x
Book, A., Visser, B. A., Blais, J., Hosker-Field, A., Methot-Jones, T., Gauthier, N. Y., et al. (2016). Unpacking more “evil”: what is at the core of the dark tetrad? Pers. Individ. Differ. 90, 269–272. doi: 10.1016/j.paid.2015.11.009
Borman, W. C. (2006). The concept of organizational citizenship. Curr. Dir. Psychol. Sci. 13, 238–241. doi: 10.1111/j.0963-7214.2004.00316.x
Brown, M. I., Speer, A. B., Tenbrink, A. P., and Chabris, C. F. (2022). Using game-like animations of geometric shapes to simulate social interactions: an evaluation of group score differences. Int. J. Sel. Assess. 30, 167–181. doi: 10.1111/ijsa.12375
Byrne, B. M. (2012). Structural equation modeling with Mplus: Basic concepts, applications, and programming. New York: Routledge/Taylor & Francis Group.
Campbell, J. P., and Wiernik, B. M. (2015). The modeling and assessment of work performance. Annu. Rev. Organ. Psychol. Organ. Behav. 2, 47–74. doi: 10.1146/annurev-orgpsych-032414-111427
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardised assessment instruments in psychology. Psychol. Assess. 6, 284–290. doi: 10.1037/1040-3590.6.4.284
Curtis, S. R., Carre, J. R., Mueller, S. M., and Jones, D. N. (2022). Hiding your dark side: anticipatory impression management of communal traits. Curr. Psychol. 42, 18720–18730. doi: 10.1007/s12144-022-03039-5
Daoud, J. I. (2017). Multicollinearity and regression analysis. J. Phys. Conf. Ser. 949, 1–6. doi: 10.1088/1742-6596/949/1/012009
DeNisi, A. S., and Murphy, K. R. (2017). Performance appraisal and performance management: 100 years of progress? J. Appl. Psychol. 102, 421–433. doi: 10.1037/apl0000085
DeVellis, R. F. (2017). Scale development: Theory and applications. 4th Edn. Thousand Oaks: Sage Publications.
Duradoni, M., Gursesli, M. C., Martucci, A., Gonzalez Ayarza, I. Y., Colombini, G., and Guazzini, A. (2023). Dark personality traits and counterproductive work behavior: a prisma systematic review. Psychol. Rep. 6:00332941231219921. doi: 10.1177/00332941231219921
Egan, V., Hughes, N., and Palmer, E. J. (2015). Moral disengagement, the dark triad, and unethical consumer attitudes. Pers. Individ. Differ. 76, 123–128. doi: 10.1016/j.paid.2014.11.054
Ellingson, J. E., and McFarland, L. A. (2011). Understanding faking behavior through the lens of motivation: an application of VIE theory. Hum. Perform. 24, 322–337. doi: 10.1080/08959285.2011.597477
Ellison, L. J., McClure Johnson, T., Tomczak, D., Siemsen, A., and Gonzalez, M. F. (2020). Game on! Exploring reactions to game-based selection assessments. J. Manage. Psychol. 35, 241–254. doi: 10.1108/JMP-09-2018-0414
Fernández del Río, E., Barrada, J. R., and Ramos-Villagrasa, P. J. (2021). Bad behaviors at work: Spanish adaptation of the workplace deviance scale. Curr. Psychol. 40, 1660–1671. doi: 10.1007/s12144-018-0087-1
Fernández-del-Río, E., Ramos-Villagrasa, P. J., and Barrada, J. R. (2020). Bad guys perform better? The incremental predictive validity of the dark tetrad over big five and honesty-humility. Pers. Individ. Differ. 154:109700. doi: 10.1016/j.paid.2019.109700
Funder, D. C., and Ozer, D. J. (2019). Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168. doi: 10.1177/2515245919847202
Harman, J. L., and Brown, K. D. (2022). Illustrating a narrative: a test of game elements in game-like personality assessment. Int. J. Sel. Assess. 30, 157–166. doi: 10.1111/ijsa.12374
Haynes, S. N., Richard, D. C. S., and Kubany, E. S. (1995). Content validity in psychological assessment: a functional approach to concepts and methods. Psychol. Assess. 7, 238–247. doi: 10.1037/1040-3590.7.3.238
Herde, C. N., Lievens, F., Solberg, E. G., Harbaugh, J. L., Strong, M. H., and Burkholder, G. J. (2019). Situational judgment tests as measures of 21st century skills: evidence across Europe and Latin America. J. Work Organ. Psychol. 35, 65–74. doi: 10.5093/jwop2019a8
Hilliard, A., Kazim, E., Bitsakis, T., and Leutner, F. (2022). Measuring personality through images: validating a forced-choice image-based assessment of the big five personality traits. J. Intelligence 10:12. doi: 10.3390/jintelligence10010012
Horsten, L. K., Moshagen, M., Zettler, I., and Hilbig, B. E. (2021). Theoretical and empirical dissociations between the dark factor of personality and low honesty-humility. J. Res. Pers. 95:104154. doi: 10.1016/j.jrp.2021.104154
Horsten, L. K., Thielmann, I., Moshagen, M., Zettler, I., Scholz, D., and Hilbig, B. E. (2024). Testing the equivalence of the aversive core of personality and a blend of agreeableness(−related) items. J. Pers. 92, 393–404. doi: 10.1111/jopy.12830
Hu, L.-t., and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equ. Model. 6, 1–55. doi: 10.1080/10705519909540118
Jonason, P. K., and Webster, G. D. (2010). The dirty dozen: a concise measure of the dark triad. Psychol. Assess. 22, 420–432. doi: 10.1037/a0019265
Judge, T. A., LePine, J. A., and Rich, B. L. (2006). Loving yourself abundantly: relationship of the narcissistic personality to self- and other perceptions of workplace deviance, leadership, and task and contextual performance. J. Appl. Psychol. 91, 762–776. doi: 10.1037/0021-9010.91.4.762
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., and Crawford, E. R. (2013). Hierarchical representations of the five-factor model of personality in predicting job performance: integrating three organising frameworks with two theoretical perspectives. J. Appl. Psychol. 98, 875–925. doi: 10.1037/a0033901
Kasten, N., and Freund, P. A. (2016). A meta-analytical multilevel reliability generalisation of situational judgment tests (SJTs). Eur. J. Psychol. Assess. 32, 230–240. doi: 10.1027/1015-5759/a000250
Kim, T. W., and Werbach, K. (2016). More than just a game: ethical issues in gamification. Ethics Inf. Technol. 18, 157–173. doi: 10.1007/s10676-016-9401-5
Kirk, R. E. (2013). Experimental design: Procedures for the behavioral sciences. 4th Edn. Thousand Oaks: Sage Publications.
Kızıloğlu, M., Dluhopolskyi, O., Koziuk, V., Vitvitskyi, S., and Kozlovskyi, S. (2021). Dark personality traits and job performance of employees: the mediating role of perfectionism, stress, and social media addiction. Probl. Perspect. Manage. 19, 533–544. doi: 10.21511/ppm.19(3).2021.43
Koopmans, L. (2015). Individual Work Performance Questionnaire instruction manual. Amsterdam, NL: TNO Innovation for Life – VU University Medical Center.
Koopmans, L., Bernaards, C. M., Hildebrandt, V. H., Schaufeli, W. B., De Vet Henrica, C. W., and Van Der Beek, A. J. (2011). Conceptual frameworks of individual work performance: a systematic review. J. Occup. Environ. Med. 53, 856–866. doi: 10.1097/JOM.0b013e318226a763
Landers, R. N., and Collmus, A. B. (2022). Gamifying a personality measure by converting it into a story: convergence, incremental prediction, faking, and reactions. Int. J. Sel. Assess. 30, 145–156. doi: 10.1111/ijsa.12373
Langer, M., Roulin, N., and Oostrom, J. (2023). Diversity and technology—challenges for the next decade in personnel selection. Int. J. Sel. Assess. 31, 355–360. doi: 10.1111/ijsa.12439
Lee, K., and Ashton, M. C. (2004). Psychometric properties of the HEXACO personality inventory. Multivar. Behav. Res. 39, 329–358. doi: 10.1207/s15327906mbr3902_8
Lee, K., and Ashton, M. C. (2014). The dark triad, the big five, and the HEXACO model. Pers. Individ. Differ. 67, 2–5. doi: 10.1016/j.paid.2014.01.048
Levashina, J., and Campion, M. A. (2006). A model of faking likelihood in the employment interview. Int. J. Sel. Assess. 14, 299–316. doi: 10.1111/j.1468-2389.2006.00353.x
Lewinson, R., Wardell, J. D., Katz, J., and Keough, M. T. (2024). Internalizing personality traits and coping motivations for gaming during the COVID-19 pandemic: a cross-lagged panel mediation analysis. Cyberpsychol. J. Psychosoc. Res. Cyberspace 18:5. doi: 10.5817/CP2024-3-5
Lievens, F., and Motowidlo, S. J. (2016). Situational judgment tests: from measures of situational judgment to measures of general domain knowledge. Ind. Organ. Psychol. 9, 3–22. doi: 10.1017/iop.2015.71
Lievens, F., Peeters, H., and Schollaert, E. (2008). Situational judgment tests: a review of recent research. Pers. Rev. 37, 426–441. doi: 10.1108/00483480810877598
Lievens, F., Schäpers, F., and Herde, C. N. (2021). “Situational judgment tests: from low-fidelity simulations to alternative measures of personality and the person-situation interplay” in Emerging approaches to measuring and modeling the person and situation. eds. D. Wood, P. Harms, S. Read, and A. Slaughter (San Diego: Elsevier), 285–311.
Longpré, N., and Turner, S. (2024). Dark tetrad at work: perceived severity of bullying, harassment, and workplace deviance. Int. J. Offender Ther. Comp. Criminol. doi: 10.1177/0306624X241236715
MacCallum, R. C., Roznowski, M., and Necowitz, L. B. (1992). Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychol. Bull. 111, 490–504. doi: 10.1037/0033-2909.111.3.490
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., and Grubb, W. L. III (2007). Situational judgment tests, response instructions, and validity: a meta-analysis. Pers. Psychol. 60, 63–91. doi: 10.1111/j.1744-6570.2007.00065.x
Melchers, K. G., and Basch, J. M. (2022). Fair play? Sex-, age-, and job-related correlates of performance in a computer-based simulation game. Int. J. Sel. Assess. 30, 48–61. doi: 10.1111/ijsa.12337
Miller, J. D., Vize, C., Crowe, M. L., and Lynam, D. R. (2019). A critical appraisal of the dark-triad literature and suggestions for moving forward. Curr. Dir. Psychol. Sci. 28, 353–360. doi: 10.1177/0963721419838233
Moore, G. C., and Benbasat, I. (1991). Development of an instrument to measure the perceptions of adopting an information technology innovation. Inform. Syst. Res. 2, 192–222. doi: 10.1287/isre.2.3.192
Moore, C., Detert, J. R., Treviño, L. K., Baker, V. L., and Mayer, D. M. (2012). Why employees do bad things: moral disengagement and unethical organisational behavior. Pers. Psychol. 65, 1–48. doi: 10.1111/j.1744-6570.2011.01237.x
Moscoso, S., and Salgado, J. F. (2004). Dark side personality styles as predictors of task, contextual, and job performance. Int. J. Sel. Assess. 12, 356–362. doi: 10.1111/j.0965-075X.2004.00290.x
Moshagen, M., Hilbig, B. E., and Zettler, I. (2018). The dark core of personality. Psychol. Rev. 125, 656–688. doi: 10.1037/rev0000111
Moshagen, M., Zettler, I., and Hilbig, B. E. (2020). Measuring the dark core of personality. Psychol. Assess. 32, 182–196. doi: 10.1037/pas0000778
Motowidlo, S. J., Borman, W. C., and Schmit, M. J. (2014). “A theory of individual differences in task and contextual performance” in Organizational citizenship behavior and contextual performance. (New York: Psychology Press), 71–83.
Mussel, P., Gatzka, T., and Hewig, J. (2016). Situational judgment tests as an alternative measure for personality assessment. Eur. J. Psychol. Assess. 34, 1–19. doi: 10.1027/1015-5759/a000346
Navas, M. P., Ramos-Villagrasa, P. J., Golpe, S., and Sobral, J. (2024). Propiedades psicométricas de la versión Española de la Escala de Propensión a la Desconexión moral (S-PMD) [Psychometric properties of the Spanish version of the moral disengagement scale]. Lisbon, Portugal: XIII Ibero-American Congress of Psychology.
O’Boyle, E. H., Forsyth, D. R., Banks, G. C., and McDaniel, M. A. (2012). A meta-analysis of the dark triad and work behavior: a social exchange perspective. J. Appl. Psychol. 97, 557–579. doi: 10.1037/a0025679
Ohlms, M. L., Melchers, K. G., Kanning, U. P., and Barends, A. J. (2025). Game on, faking off? Are game-based assessments less susceptible to faking than traditional assessments? J. Bus. Psychol. doi: 10.1007/s10869-025-10019-6
Ohlms, M. L., Melchers, K. G., and Lievens, F. (2024). It's just a game! Effects of fantasy in a storified test on applicant reactions. Appl. Psychol. 74:e12569. doi: 10.1111/apps.12569
Olaru, G., Burrus, J., MacCann, C., Zaromb, F. M., Wilhelm, O., and Roberts, R. D. (2019). Situational judgment tests as a method for measuring personality: development and validity evidence for a test of dependability. PLoS One 14:e0211884. doi: 10.1371/journal.pone.0211884
Organ, D. W., and Paine, J. B. (1999). “A new kind of performance for industrial and organizational psychology: recent contributions to the study of organizational citizenship behavior” in International review of industrial and organizational psychology. eds. C. L. Cooper and I. T. Robertson (New York: John Wiley & Sons Ltd), 14:337–368.
Paulhus, D. L. (2014). Toward a taxonomy of dark personalities. Curr. Dir. Psychol. Sci. 23, 421–426. doi: 10.1177/0963721414547737
Paulhus, D. L., Buckels, E. E., Trapnell, P. D., and Jones, D. N. (2021). Screening for dark personalities: the short dark tetrad (SD4). Eur. J. Psychol. Assess. 37, 208–222. doi: 10.1027/1015-5759/a000602
Paulhus, D. L., and Williams, K. M. (2002). The dark triad of personality: narcissism, Machiavellianism, and psychopathy. J. Res. Pers. 36, 556–563. doi: 10.1016/S0092-6566(02)00505-6
Rammstedt, B., and Beierlein, C. (2014). Can't we make it any shorter? The limits of personality assessment and way to overcome them. J. Individ. Differ. 35, 212–220. doi: 10.1027/1614-0001/a000141
Ramos-Villagrasa, P. J., Barrada, J. R., Fernández-del-Río, E., and Koopmans, L. (2019). Assessing job performance using brief self-report scales: the case of the individual work performance questionnaire. J. Work Organ. Psychol. 35, 195–205. doi: 10.5093/jwop2019a21
Ramos-Villagrasa, P. J., and Fernández-del-Río, E. (2023). Predictive validity, applicant reactions, and influence of personal characteristics of a gamefully designed assessment. J. Work Organ. Psychol. 39, 169–178. doi: 10.5093/jwop2023a18
Ramos-Villagrasa, P. J., Fernández-del-Río, E., and Castro, Á. (2022). Game-related assessments for personnel selection: a systematic review. Front. Psychol. 13:952002. doi: 10.3389/fpsyg.2022.952002
Ramos-Villagrasa, P. J., Fernández-del-Río, E., Hermoso, R., and Cebrián, J. (2024). Are serious games an alternative to traditional personality questionnaires? Initial analysis of a gamified assessment. PLoS One 19:e0302429. doi: 10.1371/journal.pone.0302429
Ramos-Villagrasa, P. J., Fernández-del-Río, E., Reig-Botella, A., and Clemente, M. (2025). The role of propensity to moral disengagement in the prediction of non-ethics outcomes at work. An. Psicol. 41, 55–62. doi: 10.6018/analesps.597811
Ramos-Villagrasa, P. J., and Naryniecki, T. (2025). “Game-related assessment (GRA)” in International encyclopedia of business management. ed. V. Ratten (Melbourne: Academic Press).
Rauthmann, J. F., and Kolar, G. P. (2012). How “dark” are the dark triad traits? Examining the perceived darkness of narcissism, Machiavellianism, and psychopathy. Pers. Individ. Differ. 53, 884–889. doi: 10.1016/j.paid.2012.06.020
Romero, E., Villar, P., and López-Romero, L. (2015). Assessing six factors in Spain: validation of the HEXACO-100 in relation to the five factor model and other conceptually relevant criteria. Pers. Individ. Differ. 76, 75–81. doi: 10.1016/j.paid.2014.11.056
Roulin, N., and Krings, F. (2016). When winning is everything: the relationship between competitive worldviews and job applicant faking. Appl. Psychol. 65, 643–670. doi: 10.1111/apps.12072
Sackett, P. R., and DeVore, C. J. (2001). “Counterproductive behaviours at work” in Handbook of industrial, work and organisational psychology. eds. N. Anderson, D. S. Ones, H. K. Sinangil, and C. Viswesvaran (London: Sage), 1:145–164.
Sackett, P. R., and Lievens, F. (2008). Personnel selection. Annu. Rev. Psychol. 59, 419–450. doi: 10.1146/annurev.psych.59.103006.093716
Salgado, J. F., and Moscoso, S. (2019). Meta-analysis of the validity of general mental ability for five performance criteria: Hunter and Hunter (1984) revisited. Front. Psychol. 10:2227. doi: 10.3389/fpsyg.2019.02227
Sekhar, S., and Uppal, N. (2024). When the dark one negotiates: sacrificing relations at the altar of money. Pers. Individ. Differ. 230:112790. doi: 10.1016/j.paid.2024.112790
Shrestha, N. (2020). Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 8, 39–42. doi: 10.12691/ajams-8-2-1
Smith, C. A., Organ, D. W., and Near, J. P. (1983). Organizational citizenship behaviour: its nature and antecedents. J. Appl. Psychol. 68, 653–663. doi: 10.1037/0021-9010.68.4.653
Soto, C. J., and John, O. P. (2017). Short and extra-short forms of the big five inventory–2: the BFI-2-S and BFI-2-XS. J. Res. Pers. 68, 69–81. doi: 10.1016/j.jrp.2017.02.004
Szabó, Z. P., Diller, S. J., Czibor, A., Restás, P., Jonas, E., and Frey, D. (2023). “One of these things is not like the others”: the associations between dark triad personality traits, work attitudes, and work-related motivation. Pers. Individ. Differ. 205:112098. doi: 10.1016/j.paid.2023.112098
Tett, R. P., and Simonet, D. V. (2011). Faking in personality assessment: a “multisaturation” perspective on faking as performance. Hum. Perform. 24, 302–321. doi: 10.1080/08959285.2011.597472
Thibault, T., and Kelloway, E. K. (2020). The dark tetrad at work. Hum. Perform. 33, 406–424. doi: 10.1080/08959285.2020.1802728
Uppal, N. (2022). Does it pay to be bad? An investigation of dark triad traits and job performance in India. Pers. Rev. 51, 699–714. doi: 10.1108/PR-07-2019-0391
van der Linden, D., te Nijenhuis, J., and Bakker, A. B. (2010). The general factor of personality: a meta-analysis of big five intercorrelations and a criterion-related validity study. J. Res. Pers. 44, 315–327. doi: 10.1016/j.jrp.2010.03.003
Van Iddekinge, C. H., Lievens, F., and Sackett, P. R. (2023). Personnel selection: a review of ways to maximise validity, diversity, and the applicant experience. Pers. Psychol. 76, 651–686. doi: 10.1111/peps.12578
Vize, C. E., Collison, K. L., Miller, J. D., and Lynam, D. R. (2020). The “core” of the dark triad: a test of competing hypotheses. Personal. Disord. Theory Res. Treat. 11, 91–99. doi: 10.1037/per0000386
Vize, C. E., Miller, J. D., and Lynam, D. R. (2021). Examining the conceptual and empirical distinctiveness of agreeableness and “dark” personality items. J. Pers. 89, 594–612. doi: 10.1111/jopy.12601
Walker, S. A., Double, K. S., Birney, D. P., and MacCann, C. (2022). How much can people fake on the dark triad? A meta-analysis and systematic review of instructed faking. Pers. Individ. Differ. 193:111622. doi: 10.1016/j.paid.2022.111622
Wille, B., Heyde, F., Vergauwe, J., and De Fruyt, F. (2023). Understanding dark side personality at work: distinguishing and reviewing nonlinear, interactive, differential, and reciprocal effects. Int. J. Sel. Assess. 31, 1–21. doi: 10.1111/ijsa.12407
Woods, S. A., Ahmed, S., Nikolaou, I., Costa, A. C., and Anderson, N. R. (2020). Personnel selection in the digital age: a review of validity and applicant reactions, and future research challenges. Eur. J. Work Organ. Psychol. 29, 64–77. doi: 10.1080/1359432X.2019.1681401
Zettler, I., and Solga, M. (2013). Not enough of a ‘dark’ trait? Linking Machiavellianism to job performance. Eur. J. Personal. 27, 545–554. doi: 10.1002/per.1912
Keywords: dark personality, gamification, game-related assessment, gamified assessment, job performance, counterproductive work behaviors
Citation: Navas MP, Fernández-del-Río E and Ramos-Villagrasa PJ (2026) Using a serious game for a brief assessment of dark personality in the workplace. Front. Psychol. 16:1686784. doi: 10.3389/fpsyg.2025.1686784
Edited by:
Ana Jiménez-Zarco, Open University of Catalonia, Spain
Reviewed by:
Gianpaolo Iazzolino, University of Calabria, Italy
Andy Ang, Southern Institute of Technology, New Zealand
İhsan Çağatay Ulus, Bartin University, Türkiye
Copyright © 2026 Navas, Fernández-del-Río and Ramos-Villagrasa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Pedro J. Ramos-Villagrasa, pjramos@unizar.es