Thinking Through the Box: Evaluating a 3D Game to Engage Penetrative Thinking

Spatial skills allow us to mentally imagine and manipulate objects and their spatial relations. These skills are crucial in both everyday and expert tasks. The present paper reports on an evaluation of a 3D game developed to train a specific spatial skill known as penetrative thinking, the ability to imagine cross-sections of 3D objects from their surface features. In the game, users change the location and orientation of a virtual plane to make cuts through 3D objects in a series of spatial puzzles. Users operate an interface to position the virtual plane until a “slice” at the location of the plane matches a target cross-section of a virtual object. Multiple spatial puzzles with different properties are completed throughout the game. In one version of the game, users completed the puzzles in an immersive virtual environment and operated a tangible interface to move the virtual plane. A second version of the game required users to view the puzzles in a virtual environment displayed on a computer screen, and to position the slicing plane with a keyboard and mouse. Participants (n = 45) completed a measure of penetrative thinking (Santa Barbara Solids Test) before and after completing one of three interventions: the game with the tangible interface (n = 15), the game with the keyboard interface (n = 15), or a series of (control) questions (n = 15). Although there were no significant pre-/post-intervention changes in penetrative thinking in any of the groups, participants' performance in the game correlated with scores on a standardized test of penetrative thinking. These results provide evidence that the game and the standardized test accessed similar spatial skills and, as a consequence, indicate that the 3D game has the potential to be a valid approach for training penetrative thinking skills.


INTRODUCTION
Spatial skills are central cognitive abilities necessary for both everyday experience and specialized expert activities. We use a diverse range of spatial skills to manipulate objects and orient our bodies in space in order to navigate and comprehend a three-dimensional physical world. These spatial skills include: mental rotation, which is the ability to construct a mental representation of an object and then draw conclusions about it after the object has undergone spatial transformations (Shepard and Metzler, 1971); perspective taking, which is the ability to mentally construct a viewpoint that differs from one's own perspective (Frick et al., 2014); and mental folding, which involves the abstract transformation of 2D patterns or materials into 3D objects and representations (Harris et al., 2013).
These and other spatial skills allow us to form mental representations of shapes and positions of objects in order to manipulate them mentally or physically. Careers in various science, technology, engineering, and mathematics (STEM) fields strongly leverage various spatial skills (Uttal and Cohen, 2012). For example, engineers who study the assembly of a complex machine in a 3D visualization rely on a spatial skill like mental rotation to estimate, predict, or judge relationships between entities in spatial contexts (Eliot and Smith, 1983).
Various spatial cognition assessment and training tools rely on 2D paper-pencil or keyboard-mouse interactions (e.g., the Perspective Taking/Spatial Orientation Test; see Kozhevnikov and Hegarty, 2001) and convey little information about the structural dimensions or surface features of an object. Because these tools operate with limited digital media features, they often do not incorporate the actions or embodied relationships between users and visual objects that have been shown to enhance learning in mathematics and physical systems (e.g., Goldin-Meadow and Beilock, 2010), and that better represent how a learner will use spatial skills in practice. Moreover, they lack the sort of dynamic 3D visualization contexts that users typically encounter in real-world applications.
To address this set of challenges, our research group designed and implemented several interactive virtual reality environments, driven by tangible interfaces, that provide rich, embodied experiences for users facing complex spatial challenges (Chang et al., 2017a, 2018, 2019). The present paper reports on a study designed to assess the efficacy of a novel game that was designed to focus on a specific spatial ability known as penetrative thinking, which is the ability to visualize and understand the internal structure of a 3D object based on its external form (Kali and Orion, 1996). Penetrative thinking is crucial to success in fields ranging from geology to medical science. For example, geologists require the capacity to envision cross-sections of the earth's strata, while medical practitioners must be able to determine the location of internal organs based on anatomical landmarks on the outside of a body prior to initiating surgical procedures.
The present research is based on the following premises: (1) the novel 3D game will engage the processes that support penetrative thinking; and, (2) systems that draw on the affordances of virtual reality and tangible and embodied interaction (VR-TEI) for engaging penetrative thinking skills may afford a greater capacity to improve this important spatial ability than systems that involve less immersive environments in which users operate a keyboard and mouse to move objects on a 2D computer screen. This paper reports the results of a study that evaluates two main questions: (1) does the novel 3D game engage penetrative thinking? and (2) does the 3D game played using the VR-TEI lead to greater improvements in spatial skills relative to the 3D game with the keyboard interaction?

Related Work
Accounting for spatial ability is an important aspect of the development of new interfaces for STEM learning. Numerous studies highlight a strong correlation between spatial ability and future STEM success. For example, longitudinal research conducted over the past 50 years suggests that spatial thinking skills are strongly related to students' entrance into, and success within, STEM fields. Lubinski and Benbow (2006) followed 5,000 students for 35 years to uncover what influences careers in math and sciences. A high degree of spatial ability was found to be indicative of the future pursuit of a scientific career. Similarly, Wai et al. (2009) analyzed 400,000 participants over 11 years and found that spatial ability plays a critical role in developing STEM expertise. These studies suggest that advancing methods for teaching and learning spatial ability is crucial for STEM development (Uttal and Cohen, 2012).
There is a large body of evidence that suggests physical interaction with certain types of objects supports the acquisition of spatial skills (Tran et al., 2017). Manipulating physical chemistry models, for example, leads to improvement on spatial items in a subsequent evaluation (Small and Morton, 1983). When designing for such spatial ability in digital media, the notion of "embodiment" is crucial. Embodiment is based on the principle that cognitive processes are deeply rooted in the body's interaction with the world (Wilson, 2002). Here, the entire body is integral in shaping perception and cognition.
Ideomotor theory posits a link between action, perception, and cognition, and provides a mechanistic explanation for how interactions occur based on the idea that perception and action share a common representational domain (Prinz, 1997; Hommel et al., 2001). This theory describes how our motor system is activated implicitly not only when actions are perceived or imagined (e.g., Jeannerod, 2001), but also during most, if not all, perceptual processing. In the context of virtual interactions and gaming, the theory can explain, for example, why we move our heads to dodge virtual bullets when playing shooter games, lean into a tight turn in a driving game, or get dizzy if we see somebody else spinning. As a result of the connections between perception, cognition and action, thinking about or imagining action can prime perceptual systems, and perceiving the actions of other people can also lead to cognitive activation. For example, performance on mental rotation tasks is better when people are able to rotate their hands than when hand movements are restricted (Wohlschläger, 2001). Such activation supports the likely development of spatial skills, which are central for movement and engagement. For the present purpose, the key principle is that perception, action, and cognition are intricately linked through the motor system during cognitive tasks, as suggested by ideomotor theory.
Embodiment has become a paramount concept in HCI research (Dourish, 2001), and tangible interfaces have proven to be particularly well-suited for interaction design focusing on embodiment (Baykal et al., 2018). Specific to the present research, the benefits of tangible and embodied interaction have been observed in studies focused on a wide range of topics, including the relationship between spatial ability, puzzle solving, and abstract mapping (Macaranas et al., 2012; Antle and Wang, 2013); gaming performance (Reinhardt and Hurtienne, 2018); virtual reality interaction (Bozgeyikli and Bozgeyikli, 2019); and interactive learning environments (Malinverni et al., 2016).
Virtual Reality provides a natural home for computing environments that wish to emphasize tangible and embodied interaction in spatial skill development. The benefits of full-body, immersive interaction have been shown for manipulation and design of 3D objects (Nakanishi, 2012), creative expression with complex models (Fröhlich et al., 2018), medical training (Sousa et al., 2017), and building understanding of abstract spatial and geometrical concepts (Oberdörfer et al., 2019). These benefits are enhanced in immersive environments that combine virtual elements with physical interaction, plausible contexts, and illusion of place (Slater, 2009; Chagué and Charbonnier, 2016). The combination of VR and TEI interfaces has the proven potential to affect users and activate spatial cognition in a controlled condition.

Tangibles for Augmenting Spatial Cognition
The game described herein is part of a research program called Tangibles for Augmenting Spatial Cognition (TASC), which aims to develop and study virtual game environments that employ tangible and embodied interaction in the context of solving spatial puzzles. The TASC team has examined various interactive methods for engaging distinct spatial skills, including mental rotation, perspective taking, and penetrative thinking (Mazalek et al., 2010; Chang et al., 2017a, 2019). In previous TASC research, head-tracking, hand-tracking, tactile feedback, and a tangible interface were integrated in a custom VR environment that required users to solve spatial puzzles. That system was designed to emphasize embodiment in the activation of a different spatial skill: perspective taking. Chang et al. (2017a) describe a previous experimental study that used a digital version of the paper-based Perspective Taking/Spatial Orientation Test (PTSOT) developed by Hegarty et al. (2008) to measure change in perspective taking ability across 3 conditions (VR-TEI, keyboard/mouse, control; n = 46). Analysis of the pre-/post-test change in performance revealed that only the VR-TEI group showed statistically significant improvements. Chang et al. (2017b) provide details of the custom-designed system used in this study, describing the TASC team's design motivations, game iterations, and implementation of the tangible controller. A pilot study (n = 6) and user study (n = 10) are also described, providing information about spatial strategies and gestures deployed by users. Participants described involving their bodies when solving spatial puzzles using this system.
The research reported in the present paper was conducted to extend previous TASC work by evaluating a newly developed game that has the potential to assess and train penetrative thinking skills. Consistent with previous work, the present study involved participants playing a newly developed game that could be completed using either a tangible or a keyboard interface, or a control condition without a game. The two goals of this study were to: (1) assess the ability of the newly designed game to activate the spatial skill of penetrative thinking; and (2) conduct an initial test of potential for this new game to train penetrative thinking.

Technical System and Game Design
The system was built in the Unity game engine for an Oculus Rift Development Kit 2 virtual reality headset. During the "Keep the Ball Rolling" game, participants were required to "slice" 3D objects to advance through multiple levels. During gameplay, they are presented with a desired slice, which consists of a single cross-section of a virtual object. Participants must adjust the location and orientation of a virtual slicing plane to cut the virtual object in such a way that the resulting cross-section matches the target slice. Users submit their selected cut by pressing a foot pedal. To successfully determine the corresponding slice, users must draw on information gathered from the volume, shape, and surface of the object to envision its internal structure, thus potentially engaging their penetrative thinking ability.
The goal of the game is to successfully slice and clear 3D objects so that a virtual ball can travel through an obstacle course (see Figure 3). When a player solves a spatial puzzle by matching the target cross-section, the remaining part of the shape turns into a ramp that allows the ball to move forward to the next level. Players have to solve a total of 12 consecutive puzzles of increasing complexity to complete the full game. The game's overall design goal was to support embodiment through both VR and tangible interaction design. Additional details about the system and game design can be found in (Chang et al., 2019).
In the VR-TEI version of the game, participants viewed an immersive 3D environment through an Oculus headset and used a tangible interface to control the position and orientation of the cutting plane. The tangible interface consists of a physical plane made from a thin wooden board attached to a custom rail track. The position and orientation of the board map directly to the position and orientation of the virtual cutting plane, allowing for vertical in-game movement (see Figure 2). The board's height is captured by an ultrasonic distance sensor, and its angle is measured by a potentiometer attached to a rotating shaft affixed to the board. Both sensors are connected to an Arduino microcontroller that feeds data to the Unity application over a serial connection. In the keyboard version of the game, participants viewed the same environment on a computer screen and controlled the position and orientation of the cutting plane with a keyboard and mouse.
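The sensor-to-plane mapping described above can be illustrated with a minimal sketch. The paper does not report the serial message format, the sensor ranges, or the Arduino model, so the `distance_cm,adc_value` line format, the calibration constants, and the names `PlanePose` and `parse_reading` are all assumptions made for illustration. The actual system runs inside Unity; Python is used here only to show the idea.

```python
from dataclasses import dataclass

# Hypothetical calibration constants; none of these values come from the paper.
MIN_HEIGHT_CM, MAX_HEIGHT_CM = 5.0, 45.0  # assumed travel of the board on the rail
ADC_MAX = 1023                            # assumes a 10-bit analog reading
MAX_TILT_DEG = 45.0                       # assumed tilt range to either side


@dataclass
class PlanePose:
    height: float    # normalized: 0.0 (bottom of rail) to 1.0 (top of rail)
    tilt_deg: float  # negative = tilted one way, positive = the other


def parse_reading(line: str) -> PlanePose:
    """Convert one assumed 'distance_cm,adc_value' serial line into a plane pose."""
    distance_text, adc_text = line.strip().split(",")
    distance_cm = float(distance_text)
    adc_value = int(adc_text)

    # Clamp the ultrasonic distance to the rail, then normalize to 0..1.
    clamped = min(max(distance_cm, MIN_HEIGHT_CM), MAX_HEIGHT_CM)
    height = (clamped - MIN_HEIGHT_CM) / (MAX_HEIGHT_CM - MIN_HEIGHT_CM)

    # Map the potentiometer reading so that mid-scale means a level board.
    adc_clamped = min(max(adc_value, 0), ADC_MAX)
    tilt_deg = (adc_clamped / ADC_MAX * 2.0 - 1.0) * MAX_TILT_DEG
    return PlanePose(height=height, tilt_deg=tilt_deg)
```

Under these assumed constants, a reading of `"25.0,1023"` would place the virtual plane halfway up the rail at full tilt. Clamping keeps out-of-range sensor noise from moving the virtual plane outside the puzzle volume.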

THE EXPERIMENT
The present study involved three groups of participants. Each group completed one of three interventions involving: (1) the "Keep the Ball Rolling" game using the tangible interface (VR-TEI); (2) the "Keep the Ball Rolling" game using a keyboard and standard monitor (KB); and (3) a control intervention that did not use any spatial interfaces but consisted of math and language problems (CT). Participants completed a standardized test of penetrative thinking, known as the Santa Barbara Solids Test (SBST), both before and after completing the assigned intervention (see Cohen and Hegarty, 2012, for more details on the SBST).
Recall that Research Question #1 of the study was to determine whether the novel slicing game engaged spatial thinking processes that are modeled on those assessed in the SBST (i.e., penetrative thinking skills). To address Research Question #1, performance in the slicing game, measured by the total number of attempted slices required to successfully complete the game, was correlated with performance on the SBST. This main question was addressed by conducting the correlation using the data from all participants who played the game. If the "Keep the Ball Rolling" game and the SBST engage similar processes, then performance on the two should be highly correlated. If the slicing game and the SBST engage different processes, then performance on the two tasks should not be correlated. In other words, players who perform well in the SBST should also perform well in the game. Following this main analysis, additional sub-analyses were conducted to determine if any relationship between performance on the game and the SBST emerged for each interface (VR-TEI vs. KB). The results of these sub-analyses should be interpreted with some caution due to the relatively low sample size (n = 15) for each correlation.
Research Question #2 of the study was to provide an initial assessment of the efficacy of learning protocols based on the experimental interventions we designed. To address this topic, each group's scores on the pre- and post-intervention SBST were compared. If the tangible-embodied interface protocol (VR-TEI) is more effective at influencing penetrative thinking skills than using a keyboard interface (KB) and/or the control intervention, then participants in that experimental condition should show a significant pre-/post-intervention increase in SBST scores compared with the keyboard and/or control group. If the interface does not enhance learning, or if penetrative thinking skills are not amenable to change, then scores on the SBST should be similar across all groups and may not change at all.

Participants
Exclusion criteria during recruitment were minimal, allowing for a sample representative of a wide range of individuals with different ages and educational backgrounds. We used email lists, social media, and flyers posted around local universities to advertise the study, restricting participation to adults over 18 years of age. Forty-five (45) participants took part in the study (M = 22, F = 23; mean age = 26; range: 18-68; SD = 9.5). Fifteen (15) participants were assigned to each condition group: VR-TEI group (M = 7, F = 8; mean age = 25); KB group (M = 8, F = 7; mean age = 30); and the CT group (M = 7, F = 8; mean age = 23). Participants were randomly assigned to one of the intervention groups with the one constraint of maintaining, as best as possible, gender balance across the groups. Based on demographic data acquired at the time of testing, most (n = 37) participants were undergraduate students or recent graduates from a STEM discipline. Participants were naïve to the purpose of the study and provided consent prior to its start. All individuals received $10 compensation. Participation lasted approximately 60 min. The procedures of this study were consistent with and approved by the Ryerson University Research Ethics Board.

General Procedures
Participants were randomly assigned to one of the three intervention conditions. Prior to commencement of the intervention, participants completed a digitized version of the SBST (see section Assessment of Penetrative Thinking Ability for details). After completing the pre-intervention SBST, participants in the VR-TEI and KB groups were given a short, untimed tutorial (which took approximately 2 min to complete) in order to familiarize themselves with the equipment and interaction gestures, and then completed the intervention. There were no time limits on completing the intervention; however, members of both groups took an average of 18 min to complete their respective intervention. The control group completed a non-spatial task that lasted 18 min. Each of these conditions is described in greater detail below. Immediately after the interventions, participants completed the post-intervention SBST and were then debriefed on the study.

Assessment of Penetrative Thinking Ability
Before and after each intervention, participants completed a digital version of the SBST, prepared using Google Forms, to assess their penetrative thinking ability. Upon completion of the post-intervention assessment, participants also completed a short background and demographic survey in which age, experience in STEM, level of education, and familiarity with virtual reality were recorded.
The SBST requires individuals to infer cross-sections of 3D objects of varying complexity. In the test, participants are presented with images of geometric solids that are intersected by a cutting plane. For each image, participants must select the 2D shape that best represents the cross-section that would be created if the specific plane sliced the object. Three additional shape choices are also available, presenting different cuts through the same 3D object.
The SBST consists of 30 questions that are categorized based on two types of cutting plane: orthogonal or oblique. An orthogonal plane is either vertical or horizontal to the cutting object (as seen, e.g., in Figure 1), whereas an oblique plane is at an angle. The test also includes three object categories: simple objects, like cubes and cones; joined objects, in which two simple solids are attached to one another; and embedded objects, in which simple solid objects are enmeshed with one another. A digital version of the SBST was created and administered to the participants (see Figure 1). The order of the objects in the pre-test was identical to that in the original SBST. The same objects were used in the post-test, but the order in which the objects were presented was randomized to avoid possible pattern memorization and familiarity. This randomized order for the post-test was consistent for all participants.
FIGURE 1 | A puzzle from the digital version of the SBST (Cohen and Hegarty, 2012).
Frontiers in Virtual Reality | www.frontiersin.org
The SBST was chosen because it is specifically designed to assess penetrative thinking ability. Cohen and Hegarty (2007), who designed this evaluation measure, describe an initial study (n = 59) meant to establish internal reliability and external validity of the test. A later study (n = 223) by Cohen and Hegarty (2012), administered online to undergraduate students enrolled in introductory science courses, found differences in performance across item complexity and participant sex, suggesting that items on the test are "differentially amenable to imagistic and analytic strategies, with males outperforming females on items that are less amenable to analytic strategies." Cohen and Bairaktarova (2018) describe a study (n = 141) conducted among first-year engineering students with low mental rotation ability. Their findings suggest that the SBST is an appropriate tool for characterizing spatial visualization challenges and strategies demonstrated by engineering students. Sanandaji et al. (2017) modified the SBST by presenting participants with 3D stimuli and biological shapes relative to the 3D geometric shapes found in the SBST. Their study (n = 40) suggested that overall performance improved when participants could see objects rotating in 3D, and that inferring cross-sections of biological shapes is more difficult than inferring those of pure geometric shapes.

VR-TEI Condition and "Keep the Ball Rolling" Gameplay
Embodied interaction design emphasizes the interwoven relationship between perception, reasoning, decision-making, and bodily action. Interaction designers working within this paradigm seek to leverage the full range of a user's gestures, movements, and actions to facilitate meaningful interaction with technical systems. Embodied interaction has been demonstrated to be effective for spatial reasoning-based educational games (Chiu et al., 2018). The broader TASC project emphasizes embodiment to enhance existing spatial training protocols. Our VR-TEI intervention for this system was created to improve penetrative thinking ability through embodiment in two ways: (1) the user wears an Oculus head-mounted display, which tracks head movement and immerses the user in a 3D virtual environment, thus engaging the user's vestibular system for orientation, in-game locomotion, and spatial reasoning; and (2) a physical interactive wooden plane controls the movement of the virtual slicing plane, thus matching tangible with virtual object behavior and engaging the user's somatosensory system, primarily proprioceptive and tactile reasoning. In the study, participants were provided with an introduction to the system and its operations. A short on-screen tutorial familiarized them with the physical interface and virtual environment. These initial interactions were not recorded. Once familiar with the system, participants began the game intervention, which consisted of 12 spatial puzzles. At each level, participants encountered a 3D object and the image of a target cross-section next to it. Their task was to use the board as a slicing tool to recreate the same "target-slice" through the object at each level.
Players controlled the cutting plane by moving it up, down, or tilting it side-to-side (Figure 2). Once participants were satisfied with the orientation of the slicing tool, they operated a foot pedal to signal a slicing action. The 3D object was cut, and the top portion of the object detached to show the slice.
Next to the 3D object, participants encountered two screens. One displayed the "desired answer" showing the targeted cross-section to match. The other showed the actual cross-section that had been created with their last slicing action (see Figure 3). These screens gave participants feedback on the plane orientation of their previous cut, and helped them assess what they should do to better match the desired cross-section. If the cut was not within an acceptable range of the target cross-section, then the 3D object would re-assemble and the participant would have to try again. A cut within the target range would solve the particular challenge. The top part of the 3D object would then disappear, and the bottom of the object would build a connection to bridge the gap. As players solved spatial puzzles, the level of difficulty increased. Levels 1-3 consisted of simple objects (e.g., an oval or triangular prism); levels 4, 5, and 7 consisted of joined objects (e.g., a house with a connected roof); levels 6, 8, and 9 consisted of embedded objects (e.g., a rod piercing through a box); and levels 10-12 presented organic shapes (e.g., a crab or a star inside a hat). During testing, no restrictions were placed on the number of slices an individual could make on each level. Objects in levels 1-9 were designed according to the difficulty guidelines used in the SBST (Cohen and Hegarty, 2012). Organic shapes used in levels 10-12 were based on work done by Sanandaji et al. (2017) describing how organic and biological shapes pose a higher challenge than pure geometric shapes when inferring cross-sections.
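The accept-or-retry logic described above can be sketched as follows. The paper does not state how the "acceptable range" is defined, so the separate height and tilt tolerances, their default values, and the function names `is_cut_accepted` and `count_attempts` are hypothetical and serve only to illustrate how the number of attempts per level could be tallied.

```python
def is_cut_accepted(cut_height, cut_tilt_deg, target_height, target_tilt_deg,
                    height_tol=0.05, tilt_tol_deg=5.0):
    """Return True when the submitted cut is within the assumed tolerances.

    Heights are normalized (0..1) and tilts are in degrees; both tolerance
    values are illustrative placeholders, not values reported in the paper.
    """
    return (abs(cut_height - target_height) <= height_tol
            and abs(cut_tilt_deg - target_tilt_deg) <= tilt_tol_deg)


def count_attempts(submitted_cuts, target):
    """Count cuts made on one level until the first accepted cut.

    submitted_cuts: iterable of (height, tilt_deg) pairs in submission order.
    target: (height, tilt_deg) pair for the target cross-section.
    Returns the attempt number of the first accepted cut, or None if the
    level was never solved.
    """
    target_height, target_tilt = target
    for attempt, (height, tilt) in enumerate(submitted_cuts, start=1):
        if is_cut_accepted(height, tilt, target_height, target_tilt):
            return attempt
    return None
```

Summing the per-level attempt counts over all 12 levels would give the total number of cuts that the study uses as its in-game performance measure.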

Keyboard Condition and "Keep the Ball Rolling" Gameplay
Participants in the keyboard condition completed the same game intervention as the VR-TEI group, but without the use of any tangible interface, Oculus headset, or the foot pedal. Instead, the keyboard group used a keyboard, a mouse, and a flat monitor to play the game. The cursor control keys or the arrow keys on the keyboard were used to move the virtual plank up, down, and side-to-side, while the space bar was used in place of the foot pedal to create a desired slice. The puzzle objects, their order, the target screens, and the accompanying animations remained the same as in the VR-TEI condition. However, because the Oculus was no longer utilized, participants moved the mouse around the screen to change the view in the virtual environment (i.e., to "look around"). Overall, substituting the tangible interface and the Oculus with a keyboard and mouse placed less emphasis on embodiment. The interaction conditions in this low-embodiment group were important for comparison, as they would resemble the kinds of input conditions more commonly found in classroom settings. Because this group also completed the game, the data from this group were pooled with the data from the VR-TEI group to address Research Question #1 of the study: to determine whether or not the game engaged similar cognitive processes as those assessed by the SBST. By comparing the presence and magnitude of any pre- and post-intervention changes in this group (and the control group) to those of the VR-TEI group, Research Question #2 was addressed.

Control Condition
A control condition was included in order to determine whether learning effects might emerge between pre- and post-intervention tests independent of any game intervention. For example, participants might improve in their SBST performance simply because they took the test twice. Participants in the CT group did the SBST pre- and post-tests, but did not engage with the "Keep the Ball Rolling" game. Instead, they completed a series of questions that required them to solve simple math problems (e.g., solve 2x + 1 = x + 2), retype words, and correct basic grammar in sentences. This task-irrelevant questionnaire was delivered on a Google Form in between the pre- and post-intervention SBST tests. The form was made long enough to take approximately as long to complete as the mean completion time of the other two groups. Although the form was mentally engaging, no part of the questionnaire contained any spatial or visual component.

Data Reduction and Descriptive Comparisons to Norms
Data recorded during each test consisted of: SBST results (pre- and post-intervention); VR-TEI and KB in-game performance (number, time, angle of each slicing event); demographic survey data (age, gender, 3D experience level); and notes about technical design. The original SBST consisted of 30 questions, but one was removed when we implemented the test in a digital version because it contained no correct answer (see Cohen and Hegarty, 2012, in which the authors describe the rationale for removing this question). As a consequence, the total number of possible correct answers was adjusted to 29. Each question in our digital SBST included two incorrect answers, one correct answer, and one "egocentric" foil. The egocentric distractor represents the shape a participant might envision if they failed to change their perspective relative to the cutting plane of the object (e.g., answer "B" in Figure 1). The developers of the SBST (Cohen and Hegarty, 2012) suggest incorporating egocentric foil answers and treating them as partially correct. For our study, egocentric distractors were considered errors and each question on the SBST was marked as either correct or incorrect. Each SBST result in our study was therefore recorded as a single score.
SBST responses were analyzed in multiple steps. Mean and standard deviation values of the pre-test score for all three conditions were compared with the values reported by Cohen and Hegarty (2012) to determine if we had a representative sample of participants and/or we conducted the test as intended. The mean and standard deviation of the proportion of correct answers on the pre-test (M = 0.66; SD = 0.25; n = 45) closely resemble the results of the Cohen and Hegarty (2012) study (M = 0.66; SD = 0.25; n = 223). The similarity in the average performance on the SBST between previous studies and our current study indicates that our sample is consistent with what is seen in the literature.
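The binary scoring rule described above (29 scored items, with egocentric foils counted as errors) could be implemented along the following lines. This is only a sketch: the function name `score_sbst`, the letter-coded responses, and the placeholder answer key in the usage note are illustrative assumptions and do not reproduce the real SBST key.

```python
def score_sbst(responses, answer_key):
    """Score an SBST attempt with strict correct/incorrect marking.

    Any response that differs from the keyed answer, including an
    egocentric foil, counts as an error. Returns a tuple of
    (number correct, proportion correct).
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    n_correct = sum(1 for given, keyed in zip(responses, answer_key)
                    if given == keyed)
    return n_correct, n_correct / len(answer_key)
```

For instance, with a made-up 29-item key of all "A" answers, a participant who selects "A" on 20 items and any other option on the remaining 9 would receive a score of 20 and a proportion correct of 20/29.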

Correlation Between SBST Score and Cut Attempts
Any potential for the game intervention to improve penetrative thinking (Research Question #2) is predicated on the notion that penetrative thinking abilities are engaged during the game intervention (Research Question #1). The first level of analysis, thus, was conducted to determine if there was a relationship between performance during the two SBST tests and the game intervention. In other words, did participants with high SBST scores also fare well in the "Keep the Ball Rolling" game? To test this prediction, each individual's total score on the SBST was correlated with the total number of cuts that they made to get through the game intervention. This analysis was based on the prediction that someone with a higher penetrative thinking ability (as reflected in a higher score on SBST) would require fewer attempts of cutting the objects to find the correct cut. In contrast, someone with a lower penetrative thinking ability (as reflected by a lower SBST score) would require more attempts. If the game and SBST engage similar cognitive abilities, then there should be a negative correlation between SBST score and the total number of cuts. The data from both the VR-TEI and the keyboard group were included in this initial overall analysis.
Consistent with the assumption that similar cognitive abilities were engaged in the SBST and the game, there were significant negative correlations between the number of total cuts made during the intervention game and the pre- (r = −0.539, p < 0.05) and post-test (r = −0.562, p < 0.05) SBST scores (see Figure 4). These correlations provide meaningful evidence that the game engages the penetrative thinking ability that is assessed by the SBST. As such, the correlations establish a foundation for our underlying claim that the game may be an appropriate means for potentially training this ability. That is, it appears that the novel game activates similar abilities as those assessed by the SBST.
To further determine if these same relationships exist in each of the VR-TEI and KB groups, additional correlations were conducted separately for each interface and each pre-/post-intervention test. The results of these additional analyses reveal significant correlations between the number of cuts in the KB interface and the scores in the SBST in both the pre- (r = −0.699, p < 0.05) and post-intervention tests (r = −0.606, p < 0.05), and between the number of cuts in the TEI condition and the post-intervention test (r = −0.539, p < 0.05). The correlation between the number of cuts and pre-intervention score on the SBST for the TEI group approached but did not cross conventional levels of statistical significance (r = −0.411, p < 0.13). Note that, although this correlation between the number of cuts and pre-intervention score on the SBST for the TEI group was not statistically significant, the effect size was in the medium range (|r| = 0.411) and, most importantly, the characteristics of the line of best fit for the correlation approximate those of the other correlations. Overall, the results of these correlations are consistent with the overall analysis and provide evidence that the game, regardless of the interface, activated processes that are similar to those that are assessed by the SBST.
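The correlation logic used in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis script: the score and cut lists are invented, the per-group sample size of n = 15 is taken from the group sizes reported for the study (assuming no exclusions), and 2.160 is the standard two-tailed critical value of t at α = 0.05 with 13 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Convert a correlation to its t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data: higher SBST scores paired with fewer cuts.
sbst_scores = [28, 25, 22, 19, 15, 10, 6]
total_cuts = [14, 18, 17, 25, 30, 41, 39]
r_example = pearson_r(sbst_scores, total_cuts)

# Reported per-group pre-test correlations, assuming n = 15 per group.
T_CRIT_DF13 = 2.160  # two-tailed critical t, alpha = 0.05, df = 13
t_kb_pre = t_from_r(-0.699, 15)
t_tei_pre = t_from_r(-0.411, 15)
print(f"example r = {r_example:.3f}")
print(f"KB pre:  t = {t_kb_pre:.2f}, significant = {abs(t_kb_pre) > T_CRIT_DF13}")
print(f"TEI pre: t = {t_tei_pre:.2f}, significant = {abs(t_tei_pre) > T_CRIT_DF13}")
```

Under these assumptions the KB pre-test correlation exceeds the critical value while the TEI pre-test correlation does not, which matches the pattern reported above.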
The next section reports the evaluation of this potential training effect. Before turning to the analysis of any potential pre-/post-intervention changes in scores on the SBST, we will report additional analyses of the number of cuts made in the game by participants using the different interfaces. Although there was no statistically significant difference in the mean number of cuts each group needed to complete the game, t  7), an assessment of the between-subjects variability (i.e., Levene's test of homogeneity of variance) in performance within each group revealed that participants operating the VR-TEI interface were more consistent in the number of cuts required to complete the game than the participants completing the game with the KB interface, F(1, 28) = 9.12, p < 0.05. These analyses should be interpreted with caution given that this is a between-group analysis and the difference may be accounted for by individual differences. Nonetheless, these analyses provide some evidence that the game was completed more efficiently and consistently with the VR-TEI interface than the KB interface. Any potential learning effects, and in particular larger learning effects in the VR-TEI group relative to the KB group, were assessed via subsequent analyses and are reported in the following section.
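For readers unfamiliar with Levene's test, the statistic can be sketched as a one-way ANOVA on absolute deviations from each group's center. The sketch below assumes the mean-centered form (the text does not state which centering was used) and uses invented cut counts; with k = 2 groups and N = 30 participants, the degrees of freedom would be (k − 1, N − k) = (1, 28), as reported above.

```python
def levene_w(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(v - group_mean) for v in g])

    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, dev_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, dev_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical cut counts: same mean, very different spread.
consistent_group = [20, 21, 22, 23, 24]
variable_group = [10, 16, 22, 28, 34]
w = levene_w(consistent_group, variable_group)
print(f"Levene W = {w:.2f} on ({2 - 1}, {10 - 2}) degrees of freedom")
```

A large W relative to the F distribution with (k − 1, N − k) degrees of freedom indicates unequal variances, even when, as in this made-up example, the group means are identical.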

Penetrative Thinking Improvement
To determine if any of the specific interventions influenced a participant's penetrative thinking ability, the scores on the SBST were submitted to a three (Group: VR-TEI, KB, CT) by two (Time: pre-intervention, post-intervention) mixed ANOVA with Group as a between-subjects factor and Time as a repeated-measures factor. Prior to considering the results of the ANOVA, an assessment of the homogeneity of variance was conducted separately for pre-test and post-test scores. The results of Levene's test revealed that the assumption of homogeneity of variance was met within each set of scores (ps > 0.12). The ANOVA revealed that there were no significant main effects of Group, F(2, 42) = 1.00, p = 0.38, ηp² = 0.046, or of Time, F(1, 42) = 0.04, p = 0.84, ηp² = 0.001, suggesting that there were no overall group differences in scores and no overall change in scores. Mean performance on the pre-test SBST was 19.38 out of 29 (range: 4-28; SD: 7.3) and mean performance on the post-test SBST was 19.47 out of 29 (range: 3-28; SD: 7.2). Furthermore, the Time by Group interaction was also not statistically significant, F(2, 42) = 0.423, p = 0.66, ηp² = 0.02 (see Figure 5). The mean improvement for each group was: 0.4 for the control group; −0.47 for the keyboard group; and 0.33 for the VR-TEI group. To specifically examine whether or not there were statistically significant and/or relevant and meaningful improvements to SBST scores as a result of any of the interventions, a series of paired-samples t-tests, Cohen's dz effect sizes, and Bayesian paired-samples t-tests (JASP Team, 2016) were conducted on the pre- and post-test scores for each group separately. The results of these analyses are presented in Table 1.
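The partial eta squared values reported with the ANOVA can be cross-checked from the F statistics and their degrees of freedom, using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The short sketch below is only a consistency check on the published numbers, not a re-analysis; the function and variable names are ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for the mixed ANOVA.
reported_effects = {
    "Group": (1.00, 2, 42),
    "Time": (0.04, 1, 42),
    "Time x Group": (0.423, 2, 42),
}

recovered = {
    name: partial_eta_squared(f, df1, df2)
    for name, (f, df1, df2) in reported_effects.items()
}
for name, value in recovered.items():
    print(f"{name}: partial eta squared ~ {value:.3f}")
```

The recovered values agree with the reported effect sizes to within the rounding of the published F statistics.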
In sum, these results do not provide evidence to support the hypothesis that the interventions improved performance, but rather indicate that the interventions did not lead to a statistically significant (ps > 0.4) or meaningful (dzs < 0.2) change in performance on the SBST. In fact, the results of the Bayesian analysis provide no support for the alternative hypothesis (BF10s ≤ 0.34), whereas there is moderate evidence that the null hypothesis is more likely than the alternative hypothesis (BF01s ≥ 3.0) (Jeffreys, 1962).
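The paired-samples effect size used above, Cohen's dz, is the mean of the pre-to-post difference scores divided by their standard deviation, and the paired t statistic is dz multiplied by the square root of the sample size. The sketch below illustrates this with invented scores for six hypothetical participants; it does not reproduce the values in Table 1.

```python
import math
import statistics


def paired_summary(pre, post):
    """Return (mean difference, Cohen's dz, paired t) for matched scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the difference scores
    d_z = mean_diff / sd_diff
    t_stat = d_z * math.sqrt(n)        # evaluated on n - 1 degrees of freedom
    return mean_diff, d_z, t_stat


# Hypothetical pre- and post-intervention SBST scores (out of 29).
pre_scores = [18, 22, 25, 12, 27, 20]
post_scores = [19, 21, 25, 13, 26, 21]
mean_diff, d_z, t_stat = paired_summary(pre_scores, post_scores)
print(f"mean change = {mean_diff:.2f}, dz = {d_z:.2f}, t = {t_stat:.2f}")
```

In this made-up example the scores barely move, so dz falls below the 0.2 threshold conventionally used for a small effect, mirroring the kind of negligible change described above.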

DISCUSSION
The overall purpose of the present research was to evaluate a tangible virtual reality-driven game designed to (1) engage and (2) improve penetrative thinking ability. The study revealed two main findings. First, the correlation analyses show a significant negative correlation between the number of cuts made in the "Keep the Ball Rolling" game and the pre- and post-intervention scores on the SBST. The correlation results provide support for the prediction that the game engages spatial skill processes and, as a proof of principle, presents a novel method of engagement for this important spatial ability. Second, although the game intervention may have engaged penetrative thinking skills, the results of the comparisons of the pre- and post-intervention scores did not provide any evidence that the interventions affected SBST scores and the penetrative thinking ability that is assessed by this test. That is, there was no statistically significant change in the pre- to post-intervention SBST scores, and the effect size of any change was negligible. In other words, there was no evidence that the game intervention led to a significant or meaningful performance increase. We had predicted a beneficial impact of the game and that there could be additional benefit for the VR-TEI condition relative to the KB condition. However, despite the fact that both game conditions engaged penetrative thinking, the overall results provide no evidence that the game (whether played with the KB or the VR-TEI interface) was effective in improving this ability based on current training conditions. Finally, the absence of an effect of Group or an interaction between Group and Time indicates that there were no differences between the VR-TEI and KB groups and the CT group that did not undergo the intervention. Overall, there is no evidence that the interventions affected penetrative thinking despite the correlational evidence suggesting that the game activates processes assessed by the SBST.
This study focused on how games such as "Keep the Ball Rolling" can be effective in engaging penetrative thinking and any immediate changes in penetrative thinking that may result from playing this game. It was not explicitly designed to test long-term learning effects given the limited time of the intervention. Even though potential learning effects were not registered, we remain optimistic that such games have the potential to deliver effective training over more substantial periods of time (perhaps over several days or weeks of training). The following discussion outlines the consequences of these results for potential system redesign opportunities to enhance embodiment effects and the training of penetrative thinking.

Transferability of Penetrative Thinking Skills
There are several possible reasons why no change in penetrative thinking skills was observed across any of the intervention groups. Given the results of the correlation analysis, we believe it is very likely that the game does, in fact, engage the targeted spatial cognition mechanisms. However, as each session included only 12 puzzles and was completed over a limited duration, it is possible that the intervention was not sufficiently powerful, long, or difficult to generate meaningful and statistically significant changes. Penetrative thinking might take longer to train than a comparable spatial skill such as mental rotation. Indeed, semester-long interventions have been shown to improve penetrative thinking in geosciences education (Gold et al., 2018; Hannula, 2019). Although the correlational analyses provide evidence that the game activates penetrative thinking, actual improvement might require more effort over time or more difficult challenges. Is it possible to improve penetrative thinking over such a short period? Penetrative thinking might be more task- or domain-specific than other spatial abilities, and it might take longer to improve because it is less frequently used or deployed only in specific contexts. If short-term improvement is possible, does it require a context that is familiar to the trainee? Perhaps virtual environments would be more effective if they were more closely related to the contexts that participants might later transfer skills to. For example, a geology student might fare better in a virtual environment replete with distinct strata and geological features that also resemble the objects featured in evaluation tools like the Geologic Block Cross-Sectioning Test.
A second issue related to designing for skill transfer has to do with the selection of objects used to model virtual assets in the virtual environment. Are geometric primitives (e.g., cubes and cylinders), as seen in the SBST examples we modeled some of our game levels on, more intuitive than the joined, embedded, or organic ones we used in the later levels? Sanandaji et al. (2017) report that cross-sections of biological shapes are more difficult to interpret than pure geometric shapes. It is likely easier to imagine a cross-section of a pyramid (see Figure 6) than the body of a crab or a donut-themed torus embedded in a cube (three objects from our game). In-game slicing data and post-intervention interviews indicated that some participants in our study may have attempted naïve pattern matching by "spamming" cuts in a kind of exploratory mode when encountering unfamiliar objects. It could be the case that environments seeking to improve penetrative thinking might need to take familiarity into account and be populated with less complex objects that more closely correspond with those found in evaluation tools like the SBST.
A third factor may have been pre-existing consistencies between the groups and pre-existing skill levels among participants. Ormand et al. (2014) found that already high-performing students appear to be adept at multiple spatial skills. Overall, participants in the present work were already fairly high scoring on the SBST. The average pre-test scores across the three groups were as follows: VR-TEI = 18.7, KB = 18.2, CT = 21.2. Scores this high may indicate less room for improvement (Cohen and Hegarty, 2012). This "ceiling effect" is a possible reason why we did not observe significant group differences and pre-/post-intervention changes in the scores. Future work will be aimed at the assessment of people with lower overall pre-intervention scores because these individuals may not be constrained by this ceiling effect. Further, in a practical sense, it is this group of individuals with lower penetrative thinking abilities who would benefit most from the intervention and, hence, would most likely be the target for such an intervention in the real world.

Potential for VR-TEI
The correlation analysis validates the potential of VR-based spatial games and tangible-embodied interaction systems for future work. First, the negative correlation between number of cuts and test scores indicates that those who performed poorly on the SBST were the same individuals who made a high number of cuts throughout an intervention. An increased number of cuts likely represents an exploratory approach taken by participants as a way to solve a puzzle. Such an exploratory strategy would be consistent with a participant not being able to determine the answer from the information provided but needing to "search" and use visual feedback to find the solution. Conversely, participants who made fewer cuts to find the solution are likely to be those who already had a general idea of how to slice the desired cross-section more efficiently. These results suggest that the game system taps into participants' capacity for penetrative thinking.
But if the game engages penetrative thinking ability, why does it not also strengthen it? Virtual environments ground object interaction to users' pre-existing spatial experience. At the same time, they also afford the capacity to experience space in ways that cannot be achieved in the physical world (Dünser et al., 2006; Wauck et al., 2017). Virtual objects in the "Keep the Ball Rolling" game were grounded to participants' pre-existing spatial experience, but the game never fully realized the second capacity: to help users experience space in ways they cannot achieve in the physical world. Participants were never asked to deviate from the perceptual context they would encounter in a paper- or screen-based test of penetrative thinking. In these tests, they are required to mentally orient themselves around an object in order to visualize it fully. In our game, we have effectively restricted them to the same condition they face when encountering such tests: to face an object head-on without any increase in complexity or extended activation of the initial perceptual characteristics. Users cannot experience virtual locomotion, perspective shifting, or scale differentiation in ways that VR interactions could afford. In effect, they are fixed to a single egocentric spatial arrangement that does not fully leverage other spatial skills like mental rotation and perspective taking.
We see possible overlap of multiple cognitive activations in VR experiences. This might have impacted the effect of the "Keep the Ball Rolling" game, but it also indicates opportunities to activate more than one spatial skill at a time, or at least in correspondence with each other. With this in mind, we suggest that there remains potential for VR-TEI games and systems to improve spatial abilities, but that this might require clearer methods for evaluating the efficacy of such systems to control for or distinguish between different spatial abilities. VR environments afford the controlled activation of single spatial abilities. For the purpose of evaluation, it might be necessary to determine whether separating a skill like penetrative thinking from related skills like perspective taking is possible without diminishing the inherent capacity of VR to present novel experiences that cannot be realized in physical interaction. This is an emerging research concern, and we anticipate moving our technical designs in this direction.

Future Work
Because of the potentially limited intervention (i.e., only 12 puzzles in a single hour-long session), it might be premature to suggest that the game (and VR-TEI systems in general) would be ineffective at improving penetrative thinking over a longer duration. Our previous work indicates preliminary support for VR-TEI systems being used to engage spatial skills (Chang et al., 2017a,b). To provide a more effective training interface, we propose a significantly longer and higher-impact intervention. The average run time for the VR-TEI intervention was approximately 18 min, and participants only solved 12 puzzles. This is likely not enough time to cognitively engage in penetrative thinking skills for the purpose of improvement. Future work will focus on increasing the number of puzzles, adding multiple levels to create a fully-fledged game experience, further refining the feedback mechanism in the virtual environment, and carrying out studies over longer periods of time.
We also propose to focus on more targeted recruiting in order to test a wider learning window for spatial abilities by evaluating other spatial abilities in our pre-intervention assessments. Recent applied, discipline-specific penetrative thinking studies (e.g., Atit et al., 2015; Hannula, 2019) use additional pre-study tools including the Geologic Block Cross-Sectioning Test (GBCT; Ormand et al., 2014) and the mental rotation test (MRT; Vandenberg and Kuse, 1978; Peters et al., 1995). This will help us evaluate whether it is possible to effectively distinguish penetrative thinking-related activities from ones that draw on complementary spatial skills in VR-TEI systems. Using additional spatial assessment tools in future recruitment will let us specifically target participants with lower pre-intervention spatial ability scores across various skills, including mental rotation. These individuals are the ones who, in a real-world context, stand to benefit most from interventions like the ones we describe.
Additional means to support stronger embodiment are possible. Previous instances of the TASC system that underlies "Keep the Ball Rolling" rendered hands visible in the VR environment as a way to enhance embodiment between the physical interaction and the virtual world. For future versions of the game, virtual hands could be re-incorporated to reflect real-world movement of the tangible plank and its influence in the virtual world. Additionally, 3D-printed objects that correspond with in-game virtual assets could be used as peripherals to control the movement of target objects in a game, thus providing enhanced opportunities for object-based spatial reference. A combination of interaction methods such as these, that are grounded in realistic physical gestures, with others that afford views and movements that can only be fully realized in a virtual environment (e.g., rapid scale shifts; perspective jumps), will be necessary to understand whether more grounded or abstract forms of embodied interaction support activation and enhancement of penetrative thinking. Overall, it is our goal to continue the development of new virtual environments driven by tangible interactions, with a long-term goal of demonstrating how VR-TEI systems can be more effectively designed for integration into actual STEM curricula and professional training methods in fields like medicine and engineering. These are fields where VR systems, in particular, frequently have mixed training results (Levinson et al., 2007; Wang et al., 2018) while manipulation of physical objects has been shown to improve spatial reasoning in learning contexts (Dadi et al., 2014; Wainman et al., 2018). The study reported in this paper is an initial step in this direction.

CONCLUSION
This paper presented findings from a study evaluating the capacity for an interactive game played in a virtual reality environment, driven by a tangible interface, to provide a rich, embodied experience for users facing complex spatial challenges. The findings from this study did not provide evidence to support the claim that the designed game can improve penetrative thinking ability over the short term, whether played with a tangible or a keyboard interface. The findings do, however, provide evidence suggesting that games such as these engage penetrative thinking, and may provide novel contexts for developing this important spatial ability. Our discussion highlights how these opportunities might be realized through the use of longer-term engagements; development of more game-like environments; the use of context-specific virtual assets; and greater attention to physical interface objects that support embodiment.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ryerson University Research Ethics Board. Participants provided their written informed consent prior to participating in this study.