Off-The-Shelf Stylus: Using XR Devices for Handwriting and Sketching on Physically Aligned Virtual Surfaces

This article introduces the Off-The-Shelf Stylus (OTSS), a framework for 2D interaction (in 3D) as well as for handwriting and sketching with digital pen, ink, and paper on physically aligned virtual surfaces in Virtual, Augmented, and Mixed Reality (VR, AR, MR: XR for short). OTSS supports self-made XR styluses based on consumer-grade six-degrees-of-freedom XR controllers and commercially available styluses. The framework provides separate modules for three basic but vital features: 1) The stylus module provides stylus construction and calibration features. 2) The surface module provides surface calibration and visual feedback features for virtual-physical 2D surface alignment using our so-called 3ViSuAl procedure, and surface interaction features. 3) The evaluation suite provides a comprehensive test bed combining technical measurements for precision, accuracy, and latency with extensive usability evaluations including handwriting and sketching tasks based on established visuomotor, graphomotor, and handwriting research. The framework’s development is accompanied by an extensive open source reference implementation targeting the Unity game engine using an Oculus Rift S headset and Oculus Touch controllers. The development compares three low-cost and low-tech options to equip controllers with a tip and includes a web browser-based surface providing support for interacting, handwriting, and sketching. The evaluation of the reference implementation based on the OTSS framework identified an average stylus precision of 0.98 mm (SD = 0.54 mm) and an average surface accuracy of 0.60 mm (SD = 0.32 mm) in a seated VR environment. The time for displaying the stylus movement as digital ink on the web browser surface in VR was 79.40 ms on average (SD = 23.26 ms), including the physical controller’s motion-to-photon latency visualized by its virtual representation (M = 42.57 ms, SD = 15.70 ms). 
The usability evaluation (N = 10) revealed a low task load, high usability, and high user experience. Participants successfully reproduced given shapes and created legible handwriting, indicating that the OTSS and its reference implementation are ready for everyday use. We provide source code access to our implementation, including stylus and surface calibration and surface interaction features, making it easy to reuse, extend, adapt, and/or replicate previous results (https://go.uniwue.de/hci-otss).


INTRODUCTION
Writing and sketching are important aspects of human communication. Symbolically representing language and ideas allows us to externalize information and share it across space and time. Consequently, several use cases in Virtual, Augmented, and Mixed Reality (VR, AR, MR, in short XR) are suited to embedding text and illustrations, e.g., in the areas of knowledge work and communication, social XR, or training and learning scenarios (Latoschik et al., 2019). We usually acquire the craft of penmanship during childhood. Today, however, writing by hand is increasingly marginalized in favor of typewriting. This preference starts with beginning writing instruction and carries through adults' work and leisure activities, with various implications related to the ease of writing with keyboards and the consistent output they produce (Mangen, 2018). In recent research, typewriting has been ported to XR by tracking finger movement, blending in camera views of a physical keyboard, and/or tracking the keyboard's location (Bovet et al., 2018; Grubert et al., 2018; Jiang et al., 2018; Richardson et al., 2020). The resulting text production speed is reported to be comparable to typewriting outside XR, even for non-touch typists. However, neuroscientific research shows that the motor process of writing longhand is linked to more brain activity than typewriting (Ose Askvik et al., 2020). van der Meer and van der Weel (2017) aptly summarized a similar finding in their title: "Only Three Fingers Write but the Whole Brain Works".
Experimental research in psychology suggests that, when taking notes, rapid text production with a keyboard (unexpectedly) leads typists to memorize shallow facts, even when they are explicitly counseled not to take notes verbatim (Mueller and Oppenheimer, 2014). In contrast, the slower speed of writing by hand acts as a "desirable difficulty" that encourages deeper understanding, because writers more urgently need to paraphrase and condense ideas. A further advantage of writing and sketching by hand is that it provides countless possibilities for spontaneous and creative expression in structured and free-form modes, whereas typewriting is restricted to a standardized, finite set of keys.
In the real world, passive haptic feedback is provided by the surface, e.g., the whiteboard or paper. This feedback is important both for task performance (Viciana-Abad et al., 2010) and for reducing arm fatigue (Speicher et al., 2018). Therefore, it is recommended to align the virtual surface with a physical counterpart, e.g., a table or a wall (Zielasko, 2020).
Dedicated pen-like input devices (so-called styluses) have recently been announced as pilot versions. Besides being expensive or unavailable, these devices, like many scientific stylus prototypes, usually require external hardware components such as additional six-degrees-of-freedom (6DoF) tracking systems, microcontrollers, or 3D printed parts, and often include custom-made input metaphors and interface implementations (Poupyrev et al., 1998; Fiorentino and Uva, 2005; Arora et al., 2017; Wu et al., 2017; Chen et al., 2019; Elsayed et al., 2020; Jackson, 2020; Romat et al., 2021). Other approaches try to overcome limited availability, high pricing, or a complex construction process by basing their XR styluses on consumer-grade XR devices (e.g., HTC Vive Controller or Oculus Touch controller) held upside down (Pham and Stuerzlinger, 2019; Wang et al., 2019; Bowers et al., 2020). Nevertheless, they still require 3D printed components and/or microcontrollers for tip attachments. To eliminate the requirements of additional tracking systems, 3D printing, and/or microcontrollers, we propose the Off-The-Shelf Stylus (OTSS) framework.
OTSS's contribution is twofold: It provides guidance on how to structure and modularize the required device-independent functionalities to realize 2D interaction (in 3D) as well as handwriting and sketching with digital pen, ink, and paper on virtual/physical 2D surfaces in XR. In addition, OTSS provides a comprehensive test bed combining technical measurements for precision, accuracy, and latency with extensive usability evaluations including handwriting and sketching tasks based on established visuomotor, graphomotor, and handwriting research. The OTSS development is motivated by the following questions (Q):
[Q1:] How to convert typical consumer-grade XR devices into flexible pen-oriented handwriting and sketching devices at low cost with minimal additional tracking tools and equipment?
[Q2:] How to align virtual 2D surfaces to arbitrary flat physical surfaces for handwriting and sketching input with XR styluses?
[Q3:] How to generate, store, and share written and sketched content in a familiar way across platforms?
[Q4:] How to evaluate handwriting and sketching in XR?
We built a reference implementation of the OTSS framework in the Unity game engine with an Oculus Rift S headset and Oculus Touch controllers. In addition, we performed technical measurements as well as a first usability evaluation.
We share our source code publicly to provide researchers and practitioners with a comprehensive and shared foundation making it easier to reuse, extend, adapt, and/or replicate previous results and to build styluses for their own XR applications (https://go.uniwue.de/hci-otss).

RELATED WORK
This section discusses previous efforts in analyzing handwriting and sketching, prototyping XR styluses, and their use on everyday physical surfaces. We summarize the relevance and influence of previous work on our OTSS framework and describe how we have applied it.

Handwriting
Writing is an essential ability during school and later in life (Selin, 2003). Before children start formal writing education, they acquire fundamental skills like scribbling or drawing, which are necessary for handwriting. Handwriting requires well-coordinated movements of the shoulders, arms, and hands. Gerth et al. (2016b) divide the skills needed for handwriting into graphomotor skills (movements necessary for writing), visuomotor skills (interaction of visual, visual-perceptual, and motor skills), and handwriting skills (accuracy of letter formation, uniformity of letter size, letter and word spacing, and alignment on lines of writing). At an early age, children often hold objects between palm and fingers (primitive grip) (Selin, 2003). In pencil grip development, children initially secure objects with the thumb against the index or other fingers (power grip) and later hold them in a pad-to-pad position to facilitate precise movements (precision grip) (Selin, 2003). The power grip offers stability but limits hand and wrist movements, and it requires more strength and movement of the whole arm. In contrast, the precision grip mainly requires finger and wrist movement. Writing tools are usually held in a dynamic tripod grip (Rosenbloom and Horton, 1971), which can also be classified as a precision grip (Selin, 2003). It is ideal for the fine-grained, intricate movements required for handwriting (Selin, 2003): the thumb, index, and middle finger perform fluent writing movements. In VR, Batmaz et al. (2020) showed that selection tasks were accomplished with fewer errors and increased user performance when holding a stylus in a precision grip rather than a power grip.

Handwriting and Sketching Performance Evaluation
In the past, handwriting skills have been assessed using standardized tests such as the Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI) (Beery and Beery, 2010), with pen and paper as well as digital tablet computers. To assess handwriting skills, Gerth et al. (2016a), Gerth et al. (2016b) suggest measuring handwriting dynamics and having experts score handwriting quality. Gerth et al. (2016a), Gerth et al. (2016b) assessed visuomotor abilities with two parts of the Beery-Buktenica Developmental Test of Visual-Motor Integration and its Supplemental Developmental Test of Motor Coordination (Beery and Beery, 2010). For the visual-motor integration test, participants had to copy geometric forms into empty sketching areas below the templates. In the motor coordination test, geometric forms were traced by connecting dots without crossing the double-lined borders. To measure graphomotor abilities, test subjects produced basic handwriting components in the form of continuous and repetitive patterns, such as loops, zigzag lines, and staircase patterns around given dots. Lastly, Gerth et al. (2016b) let participants write the phrase "Sonne und Wellen" ("Sun and waves" in German) as a handwriting assessment because it represents a simple, continuous handwriting movement. Handwriting quality for the visuomotor tasks was evaluated by scoring the accuracy of the sketching result compared to the template geometric form, according to the VMI manual (Beery and Beery, 2010). Gerth et al. (2016b) scored errors stemming from graphomotor and handwriting abilities based on the Minnesota Handwriting Assessment (MHA, Reisman, 1999). As quantitative handwriting measures, Gerth et al. (2016a) propose measuring the writing duration, writing velocity, in-air time, number of strokes, and number of inversions in velocity.
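The quantitative measures listed above can be computed from time-stamped stylus samples. The following Python sketch is a minimal illustration under our own assumptions, not the procedure of Gerth et al. (2016a) or part of the OTSS reference implementation: each sample is assumed to carry a timestamp in seconds, 2D surface coordinates in millimeters, and a hypothetical `pen_down` flag, and velocity inversions are counted as sign changes of the speed derivative within each stroke.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    t: float        # timestamp in seconds
    x: float        # surface coordinate in mm
    y: float        # surface coordinate in mm
    pen_down: bool  # hypothetical flag: True while the tip touches the surface


def split_strokes(samples):
    """Group consecutive pen-down samples into strokes."""
    strokes, current = [], []
    for s in samples:
        if s.pen_down:
            current.append(s)
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


def speeds(stroke):
    """Speed (mm/s) between consecutive samples of one stroke."""
    out = []
    for a, b in zip(stroke, stroke[1:]):
        dt = b.t - a.t
        if dt > 0:
            out.append(math.hypot(b.x - a.x, b.y - a.y) / dt)
    return out


def count_inversions(v):
    """Number of sign changes of the speed derivative (local speed extrema)."""
    diffs = [b - a for a, b in zip(v, v[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


def handwriting_measures(samples):
    """Writing duration, in-air time, stroke count, mean velocity, inversions."""
    strokes = split_strokes(samples)
    if not strokes:
        return None
    # From the first pen-down sample to the last pen-down sample.
    duration = strokes[-1][-1].t - strokes[0][0].t
    on_surface = sum(s[-1].t - s[0].t for s in strokes)
    length = sum(math.hypot(b.x - a.x, b.y - a.y)
                 for s in strokes for a, b in zip(s, s[1:]))
    return {
        "duration_s": duration,
        "in_air_time_s": duration - on_surface,  # gaps between strokes
        "num_strokes": len(strokes),
        "mean_velocity_mm_s": length / on_surface if on_surface > 0 else 0.0,
        "velocity_inversions": sum(count_inversions(speeds(s)) for s in strokes),
    }
```

A high in-air time and many strokes or inversions would then point to a movement with interruptions or corrections rather than one fluent stroke.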
In addition to the expert-based scoring scheme for visuomotor tasks, previous work presents objective measures to interpret the quality of drawn strokes. Arora et al. (2017) propose using the mean overall deviation, which is the average distance of a point (or of strokes, when the points are resampled and position-corrected) to the corresponding reference point of the template figure. Romat et al. (2021) also compare strokes using an objective distance metric. Wiese et al. (2010) introduced a category system to evaluate the quality of strokes by their line straightness, the matches of two lines, the degree of deviation, and the corrective movement at the end of the line.
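A minimal sketch of the mean overall deviation, assuming both the drawn stroke and the template are given as 2D polylines: both are resampled to the same number of points by arc length, optionally position-corrected by aligning their centroids (our assumption; Arora et al. (2017) may correct position differently), and the distances between corresponding points are averaged. The function names and the default of 64 points are hypothetical.

```python
import numpy as np


def resample(points, n):
    """Resample a polyline to n points equally spaced along its arc length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    pts = pts[np.concatenate([[True], seg > 0])]  # drop repeated samples
    seg = seg[seg > 0]
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each point
    targets = np.linspace(0.0, cum[-1], n)
    return np.column_stack(
        [np.interp(targets, cum, pts[:, d]) for d in range(pts.shape[1])]
    )


def mean_overall_deviation(stroke, reference, n=64, position_correct=True):
    """Average distance between corresponding resampled points."""
    a = resample(stroke, n)
    b = resample(reference, n)
    if position_correct:
        # Assumed correction: shift the stroke so both centroids coincide.
        a = a - a.mean(axis=0) + b.mean(axis=0)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```

Without position correction, a stroke drawn 1 mm above a straight template yields a deviation of 1 mm; with centroid alignment, a pure offset no longer contributes, so only shape errors remain.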

Virtual Reality Styluses
Many previous works have proposed solutions to integrate handwriting into XR, inspired by the familiar interaction of writing implement and surface, as found with brush on canvas, pen(cil) on paper, or finger on touchscreen. Recently, dedicated XR stylus devices have been announced, e.g., Logitech's VR Ink, Holo-Light's Stylus XR, or Wacom's VR Pen. The HCI community has also developed several input devices like the Flashpen (Romat et al., 2021), the OVR Stylus (Jackson, 2020), the DodecaPen (Wu et al., 2017), the Elastylus (Lyu et al., 2015), the SenStylus (Fiorentino and Uva, 2005), the VRSketchPen (Elsayed et al., 2020), and other not formally named devices (Viciana-Abad et al., 2010; Arora et al., 2017; Wacker et al., 2018; Chen et al., 2019; Pham and Stuerzlinger, 2019; Drey et al., 2020; Gesslein et al., 2020; Li et al., 2020; Bowers et al., 2021). XR styluses typically require additional tracking sensors, like a general-use motion capture system, finger tracking, custom-made tracking systems, or haptic devices. As an exception, the Logitech VR Ink stylus uses SteamVR tracking. In addition, researchers combined their tracking solutions with 2D tracking devices like graphics tablets to provide a solid surface for precise and accurate handwriting and sketching (Billinghurst et al., 1997; Poupyrev et al., 1998; Chen et al., 2019; Wang et al., 2019; Gesslein et al., 2020; Hsu et al., 2021). While self-made solutions can provide high tracking quality depending on the deployed tracking system, button interactions, and even haptic feedback (Bowers et al., 2020; Drey et al., 2020; Elsayed et al., 2020; Jackson, 2020; Bowers et al., 2021), they frequently require 3D printed components (Bowers et al., 2020, 2021; Jackson, 2020) or complex constructions (Fiorentino and Uva, 2005), which limits the availability of these devices.
Also, several of these devices require microcontrollers (Fiorentino and Uva, 2005; Bowers et al., 2020; Jackson, 2020; Romat et al., 2021) and thus programming skills. While this may seem simple for technically experienced users, it makes these devices unattainable and unusable for beginners and inexperienced users. As a more economical solution with easier construction, Pham and Stuerzlinger (2019), Wang et al. (2019), Bowers et al. (2020), and Bowers et al. (2021) explored consumer-grade XR devices held in a precision grip as a stylus, instead of a power grip. However, Pham and Stuerzlinger (2019) reported the HTC Vive Controller (ca. 200 g) as too heavy and the HTC Vive Tracker (ca. 90 g) attached to a thin bar as uncomfortable to use due to its uneven weight distribution. Bowers et al. (2020), Bowers et al. (2021) used an Oculus Touch controller, whose weight is distributed more evenly and which has a smaller circumference than the HTC Vive Controller.
However, their 3D printed controller attachment increased the length of their physical construction which resulted in an uneven weight distribution.

Stylus Calibration
Self-created XR styluses can be designed in a large variety. Each stylus construction has its own shape and is individually attached to a position on the XR controller based on user preferences. For this reason, the attached tip needs to be calibrated in order to determine its position in the virtual environment. Tuceryan et al. (1995) propose an equation to calculate the position of the hotspot relative to the tracked position of the pointer or controller. Fuhrmann et al. (2001) used this approach to calibrate head-mounted displays, shutter glasses, or projection screens. The exact location of the tip is unknown at the beginning. For the calibration, it is necessary to place the attached tip at a fixed position; Fuhrmann et al. (2001) propose a small pit drilled into a table. While the controller is rotated on a hemisphere, its position as reported by the tracking system is recorded multiple times. When fitting a sphere to the recorded points, the position of the tip is found at its center. The resulting accuracy of a tip attachment always depends on the tracking system's accuracy. Based on the equation of Tuceryan et al. (1995), Anthony Steed provides a publicly available reference implementation for use in the Unity game engine.
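The sphere-fitting step can be expressed as a linear least-squares problem. The sketch below uses the standard algebraic sphere fit with NumPy, a swapped-in formulation for illustration rather than the code of Fuhrmann et al. (2001) or Anthony Steed: because every recorded controller position lies on a sphere around the fixed tip, the fitted center gives the tip's world position and the radius its distance to the tracked controller point. Recovering the offset in the controller's own coordinate frame additionally requires the recorded orientations.

```python
import numpy as np


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius).

    |p - c|^2 = r^2 can be rewritten as 2 p.c + k = |p|^2 with
    k = r^2 - |c|^2, which is linear in the unknowns (c, k).
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p * p, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, k = sol[:3], sol[3]
    radius = float(np.sqrt(k + center @ center))
    return center, radius
```

At least four non-coplanar positions are needed; recording many positions spread over the hemisphere averages out tracking noise.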

Surface Tracking and Calibration
Virtual surfaces can be interacted with if they are placed in midair or aligned to physical surfaces. However, not being able to rest one's arm on a physical surface leads to fatigue (the so-called gorilla arm effect; Speicher et al., 2018; Batmaz et al., 2020) and is therefore detrimental to prolonged writing sessions. Researchers combined calibration techniques and tracking systems to spatially map flat virtual surfaces onto physical ones. Examples are passive and active markers for optical tracking systems (Poupyrev et al., 1998; Lindeman et al., 1999; Clergeaud and Guitton, 2017; Arora et al., 2018; Drey et al., 2020) or controllers affixed to physical objects. AR devices like the Microsoft HoloLens also provide spatial mapping features for physical planes, which were used in the past (Arora et al., 2018). While tracking dynamic surfaces allows users to move them and their virtual representation around, it requires dedicated physical props. Calibration techniques for static physical counterparts do not require these dedicated devices, but they are restricted in mobility. Wagner et al. (2018), Wagner et al. (2019) used the Oculus Touch controller to calibrate a predefined virtual desk to its physical counterpart: the controllers were placed in a fixed position on the physical desk when the application was started. Zielasko et al. (2019) calibrated the precise height and position of a fixed-size virtual desk/board to a physical desk/board using Leap Motion sensor data. The calibration procedure was executed for every participant due to tracking inaccuracies and drift. Because of readability concerns and tracking inaccuracies (up to a few millimeters), participants interacted with menu items using either one finger or their whole hand. In particular, physically aligned virtual surfaces rely on high tracking accuracy to provide passive haptic feedback in the right place. Xiao et al. (2018) placed virtual content on flat physical surfaces detected by a Microsoft HoloLens.
However, their system's accuracy (Euclidean distance of M = 5.4 mm, SD = 3.2 mm) and latency (180-200 ms) are not yet sufficient for precise writing input. Commercial solutions like FlyingShapes and Logitech VR Ink also include calibration techniques to physically align virtual surfaces. They use a simple and fast procedure: the positions of three points are captured to define a rectangle, and a virtual rectangle is aligned to it.
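The three-point procedure can be sketched as follows, under the assumption (ours, since the exact scheme of these commercial products is not documented here) that the first point marks a corner, the second lies along the adjacent edge, and the third lies anywhere on the opposite edge. The hypothetical helper builds an orthonormal frame via cross products; the direction of the resulting normal depends on the point order and on the handedness of the engine's coordinate system.

```python
import numpy as np


def rectangle_from_three_points(p0, p1, p2):
    """Hypothetical helper: derive a rectangle from three captured tip positions.

    p0: a corner, p1: a point along the first edge (defines width and x axis),
    p2: any point on the opposite edge (defines height and surface plane).
    Returns (center, x_axis, y_axis, normal, width, height).
    """
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    x_axis = p1 - p0
    width = float(np.linalg.norm(x_axis))
    x_axis /= width
    normal = np.cross(x_axis, p2 - p0)  # perpendicular to the captured plane
    normal /= np.linalg.norm(normal)
    y_axis = np.cross(normal, x_axis)   # in-plane, pointing toward p2
    height = float(np.dot(p2 - p0, y_axis))  # only the perpendicular part counts
    center = p0 + 0.5 * width * x_axis + 0.5 * height * y_axis
    return center, x_axis, y_axis, normal, width, height
```

Projecting the third point onto the in-plane axis means it does not have to be placed exactly above the first corner.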

Virtual Surfaces
Previous work showed handwriting and sketching surfaces in XR based on projecting 3D lines onto flat virtual surfaces or manipulating the pixel colors of the surface texture. While some surfaces implemented in previous work only visualize the user's drawing (Elmgren, 2017; Wang et al., 2019; Drey et al., 2020), others also integrate saving and loading features (Poupyrev et al., 1998; Clergeaud and Guitton, 2017; Arora et al., 2018; Chen et al., 2019; Jetter et al., 2020). Traditional 2D desktop applications can also be brought into XR environments. As initially proposed by Angus and Sowizral (1995b), this allows reusing familiar interactions like touch input (e.g., press and scroll) and facilitates switching between working in and outside XR. Current applications achieve this by capturing entire screens or windows and streaming them to a tethered headset (e.g., SteamVR or Oculus Dash). Applications that stream desktops wirelessly (e.g., Virtual Desktop, Bigscreen, and Immersed) facilitate the integration into standalone headsets, but also necessitate access to a dedicated computer. However, remote desktops as such are primarily designed to be controlled with keyboard and mouse and might not be as usable in XR (Zielasko et al., 2017). Hoppe et al. (2020) proposed adding magnified parts of a virtual desktop and triggering shortcut actions with virtual buttons. Still, such workarounds have to be adjusted manually. Integrating web browsers into XR applications, as done by Angus and Sowizral (1995a), Jetter et al. (2020), and Li et al. (2020), outsources handwriting and sketching features, data persistence, and collaborative work to web platforms. This enables cross-device use in XR and non-XR settings (e.g., desktop computers, tablets, and smartphones) and is especially useful for standalone XR devices without access to dedicated computers.
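Both projection-based and texture-based surfaces need to map the tracked 3D tip position onto 2D surface coordinates. A minimal sketch, assuming a rectangular surface described by its center, orthonormal axes, and size in meters, and a hypothetical contact threshold in millimeters (not a value from the OTSS evaluation): the tip is projected onto the surface plane, converted to pixel coordinates, and flagged as touching when its distance to the plane is below the threshold.

```python
import numpy as np


def tip_to_pixel(tip, center, x_axis, y_axis, normal, width, height,
                 resolution=(1920, 1080), contact_mm=2.0):
    """Project a 3D tip position (meters) onto a rectangular surface.

    Returns (pixel_x, pixel_y, touching), or None if the projected tip lies
    outside the rectangle. Assumes y_axis points toward the surface's top edge;
    contact_mm is a hypothetical threshold for treating the tip as touching.
    """
    d = np.asarray(tip, dtype=float) - np.asarray(center, dtype=float)
    u = float(np.dot(d, x_axis)) / width + 0.5   # 0..1 from left to right
    v = 0.5 - float(np.dot(d, y_axis)) / height  # 0..1 from top to bottom (image rows)
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None
    # abs(): a tip slightly "below" the surface (tracking error) also counts as contact.
    distance_mm = abs(float(np.dot(d, normal))) * 1000.0
    px = int(round(u * (resolution[0] - 1)))
    py = int(round(v * (resolution[1] - 1)))
    return px, py, distance_mm <= contact_mm
```

The resulting pixel coordinates and contact state could then drive either a texture-painting routine or touch events on a browser surface.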

Summary and Design Implications for the Off-The-Shelf Stylus
Handwriting and sketching are important abilities of human communication to externalize and share information. For this, people often use analog pencils and paper or digital tablet computers. Due to the limited availability of writing devices in XR, everyday users with consumer-grade hardware may not be able to use their existing hardware for this purpose, or at least have to deal with a complex setup including external tracking systems, microcontrollers, and/or 3D printing. To eliminate the need for an external tracking system, previous work showed that consumer-grade XR devices usually held in a power grip could also be held in a precision grip, which is more suitable for fine-grained, intricate movements. Nevertheless, these approaches still require microcontrollers and/or 3D printing to, e.g., recognize the intention of handwriting and sketching on physical surfaces. To overcome the requirements of additional tracking systems, microcontrollers, and 3D printing, we propose a low-cost and low-tech solution that everyday users with consumer-grade XR devices can build themselves. As users can design self-created XR styluses in a large variety, we also provide a tip attachment calibration technique based on previous work. In the real world, passive haptic feedback implicitly supports people during handwriting and sketching on physical surfaces (e.g., a table or a whiteboard). This natural feedback is an important factor for task performance and for reducing arm fatigue. As we do not integrate microcontrollers or 3D printing to recognize a physical surface, we provide a calibration technique (3ViSuAl) to physically align virtual surfaces, as well as interaction techniques to enable 2D interaction (in 3D) and handwriting and sketching with a digital pen, ink, and paper. Previous work integrated web browsers as virtual surfaces instead of texture-based virtual surfaces (e.g., whiteboards).
Web platforms provide a vast number of features for handwriting and sketching, data persistence, and collaborative work that can be reused in XR.
However, these approaches have not yet been explored in combination. We combine these concepts and features, which have previously been used only separately, to eliminate previous limitations, exploit their advantages, and provide a comprehensive open source XR framework for 2D interaction (in 3D) as well as handwriting and sketching on physically aligned virtual surfaces: the OTSS. We believe that such an open source implementation is highly relevant and significant to the field because it provides researchers and practitioners with a comprehensive and shared foundation, making it easy to reuse, extend, adapt, and/or replicate previous results without a tedious and potentially error-prone re-implementation based solely on the information given in the papers. In addition, we adopt previous approaches using visuomotor and graphomotor tasks accompanied by handwriting exercises to assess the users' handwriting and sketching performance. Furthermore, the stroke execution time, the number of strokes, and the in-air time can be used to determine to what extent the user draws a shape as one fast arm movement rather than as a movement with interruptions or corrections. We also integrate precision, accuracy, and latency measurements based on established research.

CONCEPT
We define the OTSS as a framework for 2D interaction (in 3D) as well as for handwriting and sketching with digital pen, ink, and paper in XR (see Figure 1). In Section 3.1, we describe the construction process to convert consumer-grade XR controllers into flexible, stylus-oriented writing and sketching devices (Stylus Construction) and offer a calibration technique for the tip attachment (Stylus Calibration). Section 3.2 describes our surface calibration procedure to calibrate a rectangle to flat physical surfaces and how to align arbitrary virtual surfaces like canvases, web browsers, keyboards, or touch pads to the rectangle (Section 3.2.1). We integrate a visual feedback system to guide the user during the calibration process (Section 3.2.2). For interactions with the virtual surface, we include a surface interaction component (Section 3.2.3). In Section 3.3, we present our evaluation suite, which includes technical measurements for the stylus and surface calibration techniques as well as a concept for a usability evaluation using established questionnaires. We evaluate handwriting and sketching performance using a standardized test that provides visual-motor integration tasks and motor coordination tasks (Beery and Beery, 2010).
Furthermore, we include graphomotor tasks and handwriting exercises from Gerth et al. (2016a). The availability of our reference implementation as open source allows various extensions for different XR controllers and alternative calibration techniques.

Stylus Module
In this section, we present our concept for constructing a stylus with consumer-grade XR controllers and a calibration technique for the self-made tip attachment. The calibration technique can also be used for XR styluses created by the HCI community. Our framework also supports commercial XR styluses that are already pre-calibrated. The result of this module is a (calibrated) XR stylus, which we use as the reference point for our surface calibration technique (Section 3.2.1), to generate digital ink on digital paper, and to interact with digital content like menus or web browser windows.

Stylus Construction
To use consumer-grade XR controllers as styluses for fine-motor movements on physical surfaces, they need to be extended with a stylus tip. In this way, we improve passive haptic feedback for precise stylus movements, as known from digital styluses on tables or analog pens on real surfaces. In this section, we evaluate different controller grip types, show three different tip attachments, and examine popular XR controllers in terms of weight, dimensions, and grip width. We consider the Oculus Touch Controller (Oculus Rift S/Quest) to be the most suitable XR device for conversion to an XR stylus for our use case.
Grip Types. Consumer-grade XR controllers such as the Oculus Touch and HTC Vive controller are usually held in a common posture, where the hand encloses the entire controller (Figure 2A). Batmaz et al. (2020) describe this gripping posture as a power grip. While the controller power grip is a common gripping posture for XR controllers, especially suitable for larger positional hand and arm movements, it can prevent the user from precise interactions as known from analog pens or digital styluses. XR controllers can also be held in more pencil-like postures like the primitive grip, power grip, or precision grip described by Selin (2003). This approach was already considered for the HTC Vive Controller and HTC Vive Tracker (Pham and Stuerzlinger, 2019; Wang et al., 2019), and for the Oculus Touch controller (Bowers et al., 2020). Compared to the regular controller grip (Figure 2A), a primitive grip (Figure 2B) like the palmar supinate grip (Erhardt, 1994) resembles a pen-like posture more closely. Nevertheless, this posture still requires a lot of positional wrist and arm movement and is not suitable for the fine-grained movements required for handwriting (Selin, 2003). More appropriate for handwriting and sketching, XR controllers can also be held in precision grip postures (Batmaz et al., 2020) such as the dynamic tripod grip for pencils (Wynn-Parry, 1966) (Figure 2C). The precision grip is also the intended posture for the Logitech VR Ink stylus, the Holo-Light Stylus XR, and the Wacom VR Pen. In accordance with previous work, we suggest holding the XR controller in a precision grip (Figure 2C) to enable precise stylus movements for 2D interaction (in 3D) as well as for handwriting and sketching on physically aligned virtual 2D surfaces.
Tip Attachments. We compared three tip attachments with different material, stiffness, and build complexity (Figure 3). Combined with an Oculus Touch controller held in a precision grip, these attachments allow a pen-like feeling and precise physical stylus movements on physical surfaces. In the end, we chose the pencil tip because of its rigidity and suitability for everyday surfaces. Our total setup cost per tip was under 10 euros for aluminum foil, an Apple Pencil tip, and one screw. The fiber tip (Figure 3A) is part of a commercial passive capacitive stylus. We taped a nut to the controller, then screwed in the fiber tip. We used self-adhesive aluminum foil to let the controller also operate as a passive conductor, allowing it to be used on touch screens. However, the tip's softness did not prove useful for a precise calibration. The pencil tip (Figure 3B) is part of the commercially available Apple Pencil. It is made of hard plastic and has an internal thread. We taped a small screw, typically used for computer cases and PC hard drives, onto the controller. Next, we screwed on the tip. The felt tip (Figure 3C) has an adhesive back and was attached directly to the controller. The construction of the felt tip requires the least amount of time and effort. While similar to a felt pen in material, it does not provide a clear tip point.
Weight, Dimensions and Shape. As the weight and dimensions of a pen are important factors for handwriting and sketching, we examined popular consumer-grade XR controllers regarding weight (with batteries included), dimensions, and grip width (Table 1). We also included the HTC Vive Tracker in the comparison because it has previously been integrated into an XR stylus. In addition to these measurements, we attached an Apple Pencil tip to the controllers to show the feasibility of our concept and for subjective verification of our technical evaluations during the construction process (Figure 4).
FIGURE 1 | The Off-The-Shelf Stylus (OTSS) framework: A framework for handwriting and sketching with consumer-grade XR controllers. The left-hand side depicts guidance on how to structure and modularize device-independent required functionalities to realize 2D interaction (in 3D) as well as handwriting and sketching with digital pen, ink, and paper in XR on virtual/physical surfaces. The right-hand side defines a comprehensive test bed combining tests for precision, accuracy, and latency with extensive usability evaluations including handwriting and sketching tasks based on established visuomotor and graphomotor research.
Frontiers in Virtual Reality | www.frontiersin.org June 2021 | Volume 2 | Article 684498 6
The properties were measured with a commercially available digital caliper and an electronic kitchen scale, and we validated our values against publicly available data on product pages. For our measurements, we define the longitudinal axis as the controller axis that most closely corresponds to the longitudinal axis of a pen. We define grip width as the controller's width at the location on the longitudinal axis where handling the controller in a pen-like posture is subjectively manageable and comfortable. Besides the HTC Vive Tracker, the Oculus Touch (Rift S/Quest) and the Pico Neo 2 controllers have the lowest weights and most compact dimensions. However, their weight is still double that of the commercial Logitech VR Ink stylus (68 g). In terms of grip width, most controllers are very similar, with the exception of the HTC Vive controller and the Vive Cosmos controller. Along the longitudinal axis (length), the two Oculus Touch controllers (Rift S/Quest and Rift CV1) are the shortest, which is also reflected in a better subjective weight distribution relative to the grip position of the fingers.
Our technical evaluation confirms important findings of previous work (Pham and Stuerzlinger, 2019) regarding the uneven weight distribution of the HTC Vive controller and of the HTC Vive Tracker fixed at the end of a thin rod. Since the Oculus Touch controllers (Rift S/Quest) have an advantage over HTC Vive controllers and other controllers in terms of weight and dimensions, as well as grip width and weight distribution, we consider the Oculus Touch to be the most suitable device for our first stylus conversion and decided to use it for our reference implementation. As an additional advantage, one button of the Oculus Touch controller is positioned on the side along its longitudinal axis, so this button can still be used when the controller is held like a pen. However, its integration should be planned with caution, since there is an increased probability of unintended releases.

Stylus Calibration
For the calibration of the self-made tip attachment, we use the calibration technique of Tuceryan et al. (1995). It estimates the offset of the attached tip relative to the tracked position of the controller's center point. This offset is estimated by sampling the controller pose while pivoting the controller around a fixed world point in four directions (Figure 5): up, right, down, and left. The order of the four directions is not fixed, and the procedure can be repeated to improve accuracy. Whenever the tip attachment is changed, the offset needs to be recalibrated. Such displacements might also arise from heavy impact or material wear. According to the Heisenberg Effect of spatial interaction (Bowman et al., 2001), button presses can influence the controller's position. Therefore, we recommend binding the button interaction that records a measurement point to the controller opposite the XR stylus. Our test runs also support the findings of Wolf et al. (2020), Bowman et al. (2001), and Pham and Stuerzlinger (2019) that using a button on the same controller to which the tip is attached can slightly shift its position. Following Fuhrmann et al. (2001), we also suggest drilling a small pit into a table to keep the tip in position while rotating the controller. Alternatively, to avoid damaging the table, we recommend using an additional non-slip pad, such as a mouse pad (Figure 5). We observed that continuous rotation of the stylus controller during the calibration process can lead to a positional shift of the attached tip. Therefore, we decided to sample measurement points at discrete positions (top, right, bottom, left) to increase stability and reduce inaccuracies of the physical tip.
We also refer to a publicly available implementation of this calibration procedure created by Anthony Steed⁹.
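Pivot calibration of this kind can be written as a linear least-squares problem. The following Python snippet is a minimal sketch under stated assumptions and is not the reference implementation, which extends Anthony Steed's Unity code. Each sample is assumed to be a 3 × 3 rotation matrix R_i and a position p_i of the controller, recorded while the tip rests on the same world point q. This gives R_i t + p_i = q for the unknown tip offset t in the controller frame. The helpers `rot_x`, `rot_y`, `solve`, and `pivot_calibration` are hypothetical names introduced for illustration.

```python
import math


def rot_x(a):
    """Rotation matrix about the x-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Rotation matrix about the y-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def solve(m, v):
    """Solve the square system m x = v with Gauss-Jordan elimination and partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def pivot_calibration(samples):
    """Estimate the tip offset t (controller frame) and the pivot point q (world frame).

    Each sample (R, p) satisfies R t + p = q, i.e. [R | -I] [t; q] = -p.
    """
    rows, rhs = [], []
    for rot, pos in samples:
        for k in range(3):
            rows.append(list(rot[k]) + [-1.0 if j == k else 0.0 for j in range(3)])
            rhs.append(-pos[k])
    # Least squares via the normal equations (A^T A) x = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(6)]
    x = solve(ata, atb)
    return x[:3], x[3:]


# Usage: simulate four discrete pivot poses (up, right, down, left).
true_tip = [0.0, 0.0, 0.08]   # hypothetical 8 cm tip offset
pivot = [0.10, 0.75, 0.30]    # hypothetical pivot point on the table
poses = [rot_x(0.5), rot_y(0.5), rot_x(-0.5), rot_y(-0.5)]
samples = [(R, [q - rt for q, rt in zip(pivot, mat_vec(R, true_tip))]) for R in poses]
tip, point = pivot_calibration(samples)
```

In this example, the four poses are tilted about two different axes, so the stacked system has full rank and both the tip offset and the pivot point are recovered. Sampling more points, for example eight or 12, simply adds rows to the same least-squares problem.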

Surface Module
This section describes our surface calibration procedure to physically align virtual 2D surfaces, named Visually Assisted 3-point Virtual Surface calibration and Alignment (3ViSuAl). Our surface calibration procedure (Section 3.2.1) is based on previous ideas of describing a rectangle by three sampled corner points. During the calibration procedure, we propose to support the user with visual feedback regarding common dimensions and aspect ratios (Section 3.2.2). We combine the previously calibrated stylus tip (Section 3.1.2) with our surface interaction feature (Section 3.2.3) to enable 2D interaction (in 3D) as well as handwriting and sketching.
For example, the virtual surface can be a digital paper that can be used to visualize digital ink (e.g., a whiteboard) or arbitrary digital content (e.g., menus, web browser, keyboards, or touch pads).

Surface Calibration
Our calibration technique (3ViSuAl) enables the user to align a virtual 2D surface to a flat physical surface at arbitrary dimensions and orientations (e.g., the alignment of a virtual web browser surface to a physical table, whiteboard, or wall).
We divide our surface calibration procedure into two steps: 1) calibrating a rectangle based on three sampled 3D points and 2) aligning a virtual surface to the position/rotation and dimensions of the rectangle. Figure 6 visualizes the calibration procedure. The three sampled points are combined into a world-fixed rectangle as follows: use the first point (Figure 6A) and the second point (Figure 6B) as the endpoints of the initial horizontal edge, whose length gives the width (Dimension) of the rectangle; fit a plane through all three points (Figure 6C); invert the plane's normal if it faces away from rather than towards the headset's camera; and project the third sample onto the initial horizontal line, where the distance between the sample and its projection gives the height (Dimension) of the rectangle. After the three calibration points are captured, the rectangle is created and a virtual surface can be aligned to it (Figure 6D).
Depending on the user's handedness, we propose to support both right-handed and left-handed calibration procedures and to allow the height (Dimension) of the rectangle to be defined upwards or downwards from the initial edge. As described in Section 3.1.2, we bind button actions to the opposing controller to reduce possible position inaccuracies. We propose to provide the following values for the alignment: position and rotation of the center point, positions of the corners (first, second, third, fourth), width and height of the rectangle, and indicators for the direction of the rectangle (left-handed/right-handed, downwards/upwards).
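The rectangle construction can be sketched in plain Python as follows. This is an illustration under assumptions, not the Unity implementation. Points are assumed to be (x, y, z) tuples in meters, and the headset position stands in for the camera position when the normal is oriented. The `side` field is a hypothetical stand-in for the upwards/downwards indicator, and handedness is not modeled. The sign of `side` also depends on the handedness of the coordinate frame (Unity uses a left-handed one). The names `rectangle_from_points` and `Rectangle` are illustrative.

```python
import math
from dataclasses import dataclass


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


@dataclass
class Rectangle:
    center: tuple
    corners: tuple   # p1, p2, p2 + offset, p1 + offset
    normal: tuple    # unit normal facing the headset
    width: float
    height: float
    side: int        # hypothetical indicator: side of the initial edge holding the third point


def rectangle_from_points(p1, p2, p3, headset):
    """Build a world-fixed rectangle from three sampled points (hypothetical helper)."""
    edge = sub(p2, p1)                       # initial horizontal edge
    width = math.sqrt(dot(edge, edge))
    x_axis = unit(edge)
    normal = unit(cross(edge, sub(p3, p1)))  # plane through all three points
    if dot(normal, sub(headset, p1)) < 0:    # make the plane face the headset
        normal = scale(normal, -1.0)
    # Project the third sample onto the line of the initial edge; the perpendicular
    # offset from that projection gives the height and the direction of the height.
    foot = add(p1, scale(x_axis, dot(sub(p3, p1), x_axis)))
    offset = sub(p3, foot)
    height = math.sqrt(dot(offset, offset))
    side = 1 if dot(offset, cross(normal, x_axis)) > 0 else -1
    corners = (p1, p2, add(p2, offset), add(p1, offset))
    center = add(p1, scale(add(edge, offset), 0.5))
    return Rectangle(center, corners, normal, width, height, side)


# Usage: a tablet-sized rectangle on a table 0.75 m high (hypothetical values).
rect = rectangle_from_points((0.0, 0.75, 0.4), (0.262, 0.75, 0.4),
                             (0.05, 0.75, 0.597), headset=(0.1, 1.2, 0.0))
```

The returned center, normal, and width/height correspond to the alignment values listed above. A game engine would convert the normal and edge direction into a rotation for the surface object.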

Visual Feedback
Our calibration procedure visually supports the user with common dimensions and aspect ratios. Common surface dimensions are especially important for aspect ratio-based content such as web platforms, pictures, or video material. Typical aspect ratios are, for example, 1:1, 3:2, 4:3, and 16:9. Unit-based dimensions (e.g., meters, centimeters, millimeters) are also useful to mimic the display sizes of consumer-grade desktop computers and laptops, smartphones, or tablets. Therefore, we propose to include visual feedback features that guide the user in creating the rectangle to achieve a desired surface size and/or aspect ratio. As indicated in Figure 6, the user is guided by visual feedback to match the dimensions of a tablet-sized surface (e.g., a 12.9-inch diagonal with an aspect ratio of 4:3 results in 26.2 cm × 19.7 cm). We also suggest using the current position of the stylus tip to continuously preview the surface as a white rectangle until all three points are captured and the virtual surface can be aligned.
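The tablet example can be checked with a short calculation. For a diagonal d and an aspect ratio w:h, the width is d·w/√(w² + h²) and the height is d·h/√(w² + h²). The sketch below assumes that the diagonal is given in inches and converts it to centimeters. `matches_target` and its tolerance are hypothetical stand-ins for the visual feedback check.

```python
import math

INCH_TO_CM = 2.54


def dimensions_from_diagonal(diagonal_inches, ratio_w, ratio_h):
    """Width and height in cm for a given diagonal (inches) and aspect ratio."""
    diagonal_cm = diagonal_inches * INCH_TO_CM
    factor = diagonal_cm / math.hypot(ratio_w, ratio_h)
    return ratio_w * factor, ratio_h * factor


def matches_target(width, height, target_w, target_h, tolerance):
    """Hypothetical feedback check: True once both sides are within the tolerance."""
    return abs(width - target_w) <= tolerance and abs(height - target_h) <= tolerance


width_cm, height_cm = dimensions_from_diagonal(12.9, 4, 3)
print(f"{width_cm:.1f} cm x {height_cm:.1f} cm")  # 26.2 cm x 19.7 cm
```

For 12.9 inches (32.766 cm) at 4:3, the factor 32.766/5 = 6.5532 cm yields 26.21 cm × 19.66 cm. These values round to the 26.2 cm × 19.7 cm stated above.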

Surface Interaction
This section describes the concept of interacting with a virtual surface (e.g., canvases or web browsers) using game engines. As a simple approach, we propose to measure the distance to the virtual surface by casting a ray in the forward direction of the calibrated tip. The corresponding pointer events are triggered when the tip reaches, leaves, or stays within a certain distance threshold. Besides surface interactions triggered by intersection, interaction can also be performed through distance pointing (e.g., with a laser pointer).
However, since we focus on handwriting and sketching on physical surfaces with direct contact to the surface, pointing from a distance does not fit our use case. Nevertheless, our surface interaction component also supports surface interaction by pointing.
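The distance measurement can be illustrated as a ray-plane intersection. In Unity, this would typically be done with `Physics.Raycast` against the surface's collider. The Python sketch below instead intersects the ray analytically with the surface plane. It assumes that the plane is given by a point and a normal and that the tip's forward `direction` is a unit vector, so the returned ray parameter is a metric distance. `ray_plane_distance` is a hypothetical helper.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_plane_distance(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Distance along `direction` from `origin` to the plane, or None if there is no hit ahead.

    `direction` is assumed to be a unit vector (the tip's forward axis), so the
    returned ray parameter equals the metric distance along the ray.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray runs parallel to the surface
    t = dot(sub(plane_point, origin), plane_normal) / denom
    return t if t >= 0.0 else None  # negative t: surface lies behind the tip


# Usage: tip 1 cm above a table at y = 0.75 m, pointing straight down.
distance = ray_plane_distance((0.0, 0.76, 0.0), (0.0, -1.0, 0.0),
                              (0.0, 0.75, 0.0), (0.0, 1.0, 0.0))
```

The resulting distance is the value that would be compared against the interaction threshold in every frame.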

Evaluation Suite
The evaluation suite provides technical measures regarding precision, accuracy, and latency, a concept for a usability evaluation, and a concept to measure sketching and handwriting performance.

Technical Evaluation
For the self-made tip attachment, we measure the precision of the calibration technique using the absolute Euclidean distance. We measure the accuracy of the calibrated virtual surface by the orthogonal distance. Further, we measure the latency of the virtual surface by frame counting using a high-speed camera. Stylus Precision: We define the positional distance error as the precision of the stylus calibration and compute it as the pairwise absolute Euclidean distance between all sampled tip positions. For sampling the points, we rotate the calibrated controller twice on a hemisphere and record measurement points continuously (Figure 5). Similar to Fuhrmann et al. (2001), we ensure that the tip stays in place by drilling a small hole into the physical surface. We propose to use a button on the opposing controller to trigger the recording of measurement points, as unintentional controller movement cannot be avoided during button interactions (Bowman et al., 2001). It should be noted that the controller's positional accuracy and latency depend on the overall performance characteristics of the tracking system. Therefore, we recommend consulting previous work regarding accuracy measurements of the respective controller (HTC Vive: Niehorster et al., 2017; Borrego et al., 2018; Oculus Rift S: Jost et al., 2019; Oculus Rift: Borrego et al., 2018) and latency (HTC Vive: Niehorster et al., 2017).
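The stylus precision metric can be computed as shown below. This sketch assumes that the sampled tip positions are (x, y, z) tuples in meters and that at least three samples exist, so the standard deviation is defined. `pairwise_precision` is a hypothetical helper that reports the mean and standard deviation of all pairwise distances, converted to millimeters by default.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pairwise_precision(tip_positions, scale=1000.0):
    """Mean and SD of all pairwise Euclidean distances between sampled tip positions.

    `scale` converts the input unit to the reported unit (default: meters to millimeters).
    """
    distances = [math.dist(a, b) * scale for a, b in combinations(tip_positions, 2)]
    return mean(distances), stdev(distances)


# Usage with three hypothetical samples (already in millimeters, hence scale=1.0).
m, sd = pairwise_precision([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], scale=1.0)
```

Because every pair is compared, the number of distances grows quadratically with the number of samples. This is unproblematic for the few hundred points recorded during two hemisphere rotations.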
Surface Accuracy: We propose to measure the absolute orthogonal Euclidean distance between the surface and the calibrated stylus tip to determine the positional distance error of the surface calibration technique. For this purpose, the tip of the calibrated attachment is placed sequentially at nine specified measuring points, and the average distance to the surface is calculated. The accuracy of the surface calibration depends on the accuracy of the tracking system as well as on the precision of the previously executed stylus calibration, since inaccuracies from these previous steps propagate into the surface accuracy.
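Similarly, the surface accuracy can be sketched as the absolute orthogonal distance from each tip position to the calibrated plane. The sketch makes three assumptions: one averaged tip position per measurement point (e.g., averaged over a short sampling period), a plane given by a point and a normal that need not be unit length, and meters as the input unit. `surface_accuracy` is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def surface_accuracy(tip_positions, plane_point, plane_normal, scale=1000.0):
    """Absolute orthogonal distances of tip positions to a plane (mean, SD, per point).

    `plane_normal` need not be unit length; `scale` converts meters to millimeters.
    """
    length = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / length for c in plane_normal]
    distances = [
        abs(sum((p - q) * k for p, q, k in zip(tip, plane_point, n))) * scale
        for tip in tip_positions
    ]
    return mean(distances), stdev(distances), distances


# Usage: two hypothetical measurement points 0.6 mm above and below a table plane.
m, sd, per_point = surface_accuracy([(0.0, 0.7506, 0.0), (0.1, 0.7494, 0.1)],
                                    plane_point=(0.0, 0.75, 0.0),
                                    plane_normal=(0.0, 2.0, 0.0))
```

Keeping the per-point distances makes it possible to spot systematic effects, such as the point furthest from the calibration points being the least accurate.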
Digital Ink Latency: Since digital ink latency influences the user experience in digital handwriting and sketching tasks, we propose to measure the time delay between the controller's movements and the appearance of the digital ink on the target surface. Since the controller's motion-to-photon latency also depends on the tracking system's performance characteristics, the tracking system's delay is implicitly included in this measurement. To compute the latency of the digital ink, the motion-to-photon latency of 1) the stylus movement and 2) the digital ink must be measured. The difference between the two measurements describes the software-side latency of the surface (i.e., the latency within the software application). We propose to calibrate the stylus (Section 3.1.2) and measure the motion-to-photon latency of the controller by moving the physical controller in zigzag lines and comparing the changes in direction of the physical controller with those of its virtual representation. For the motion-to-photon latency of the digital ink, we use the same zigzag lines to compare the physical controller movements with the digital ink appearing on a calibrated virtual surface (Section 3.2.1).
We measure the latency of the digital ink using the established technique of video-based measurement and frame counting as described in He et al. (2000). Following the experimental setup of Stauffert et al. (2018), we count the frames between the beginning of the movement of the real controller and the appearance of the digital ink.
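Frame counting converts directly into milliseconds. With a 240 Hz camera, one frame corresponds to 1000/240 ≈ 4.17 ms, which bounds the temporal resolution of the measurement. The sketch below assumes that each trial is recorded as a pair of frame indices (movement onset, first visible response). `frames_to_ms`, `summarize`, and `software_latency` are hypothetical helpers. The software-side latency is the difference between the ink and stylus means, as described above.

```python
from statistics import mean, stdev


def frames_to_ms(start_frame, end_frame, fps=240.0):
    """Latency between two frame indices of a video recorded at `fps` frames per second."""
    return (end_frame - start_frame) * 1000.0 / fps


def summarize(frame_pairs, fps=240.0):
    """Mean and SD latency (ms) over trials given as (onset_frame, response_frame) pairs."""
    latencies = [frames_to_ms(start, end, fps) for start, end in frame_pairs]
    return mean(latencies), stdev(latencies)


def software_latency(ink_ms, stylus_ms):
    """Software-side latency: digital-ink latency minus controller motion-to-photon latency."""
    return ink_ms - stylus_ms


# Usage: a hypothetical trial spanning 19 camera frames at 240 Hz is about 79.2 ms.
single_trial = frames_to_ms(100, 119)
```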

Usability Evaluation
Besides technical measurements, we propose to conduct user evaluations to measure task load, user experience, and usability. We recommend using questionnaires such as the User Experience Questionnaire (UEQ, Laugwitz et al., 2008) to measure perceived user experience, the Raw Task Load Index (RTLX, Hart, 2006), a simplified version of the NASA Task Load Index (Hart and Staveland, 1988), to capture perceived task load, the System Usability Scale (SUS, Bangor et al., 2008) for general usability, and the Questionnaire for the Subjective Consequences of Intuitive Use (QUESI, Naumann and Hurtienne, 2010) to measure intuitive use. The Simulator Sickness Questionnaire (SSQ, Kennedy et al., 1993) can be included as a control measure for overall well-being. We also provide rich-text fields for comments regarding advantages and disadvantages.
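Two of the recommended questionnaires have simple scoring rules that are easy to get wrong in analysis scripts. The sketch below implements the standard SUS scoring: odd items contribute response − 1, even items contribute 5 − response, and the sum is multiplied by 2.5. It also computes the RTLX as the unweighted mean of the six NASA-TLX subscales. The function names are illustrative, SUS responses are assumed to be on a 1 to 5 scale, and RTLX ratings are assumed to be on a 0 to 100 scale.

```python
from statistics import mean


def sus_score(responses):
    """System Usability Scale score (0 to 100) from ten item responses on a 1 to 5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    total = 0
    for index, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if index % 2 == 1 else (5 - response)
    return total * 2.5


def rtlx_score(subscales):
    """Raw TLX: unweighted mean of the six NASA-TLX subscale ratings (assumed 0 to 100)."""
    if len(subscales) != 6:
        raise ValueError("RTLX needs exactly six subscale ratings")
    return mean(subscales)


# Usage: a fully neutral SUS response set scores 50.
neutral = sus_score([3] * 10)
```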

Sketching and Handwriting Evaluation
As Gerth et al. (2016a, 2016b) have shown, standardized tests (e.g., Beery and Beery, 2010) that measure visuomotor, graphomotor, and handwriting skills can also be used on digital devices such as tablet computers. Combined with the proposed handwriting performance assessment in terms of handwriting quality and handwriting dynamics, these tests can form a powerful set of methods for evaluating handwriting and sketching devices and software tools in XR. We therefore assess the performance of visuomotor, graphomotor, and handwriting skills in XR using tasks from the Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI) and the VMI Supplemental Test of Motor Coordination (MC) (Beery and Beery, 2010; Gerth et al., 2016b). In addition, we suggest evaluating the handwriting and sketching results using the Beery VMI manual (Beery and Beery, 2010) and adding the handwriting process measurements from Gerth et al. (2016b). We also recommend using the mean total deviation (Arora et al., 2017; Wacker et al., 2018) to measure differences between the given geometric form and the user's sketch. To assess visuomotor abilities, similar to Gerth et al. (2016a), we use four items of the Beery VMI (Figure 7A) and the four corresponding items of the MC (Figure 7B). These tasks are very basic and can be extended with more complex geometric forms provided by the Beery VMI (Beery and Beery, 2010). For graphomotor abilities, we include four continuous and repetitive movement patterns from Gerth et al. (2016a), comprising unguided loop patterns (Figure 7D) and guided patterns (Figure 7C). For handwriting, we include a single-word task ("Hallo", "hello") and a phrase task ("Sonne und Wellen", "sun and waves") (Figure 7E), as simple continuous writing movements are appropriate for handwriting evaluation (Gerth et al., 2016a).
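The mean total deviation can be approximated as the average shortest distance from each sketched point to the target shape. This is a hedged sketch. It assumes that the target is given as a 2D polyline and that the sketch is a list of sampled 2D points in the surface plane. The exact definitions used by Arora et al. (2017) and Wacker et al. (2018) may differ, for example in resampling or symmetric matching. `mean_total_deviation` is a hypothetical helper.

```python
import math
from statistics import mean


def point_segment_distance(p, a, b):
    """Shortest distance from 2D point p to the segment a-b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0:
        return math.dist(p, a)  # degenerate segment
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * abx, a[1] + t * aby))


def mean_total_deviation(sketch_points, target_polyline):
    """Average shortest distance from each sketched point to the target polyline."""
    segments = list(zip(target_polyline, target_polyline[1:]))
    return mean(min(point_segment_distance(p, a, b) for a, b in segments)
                for p in sketch_points)


# Usage: a wobbly stroke compared with a horizontal target line (hypothetical units).
deviation = mean_total_deviation([(0, 1), (5, -1), (10, 2)], [(0, 0), (10, 0)])
```

Closed shapes such as the circle, rectangle, and triangle items can be passed as polylines whose last vertex repeats the first.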

REFERENCE IMPLEMENTATION
We chose the Oculus Rift S head-mounted display (HMD) with Oculus Touch controllers for our first VR-based reference implementation. This section shows the applicability of our concept and presents first insights into our reference implementation, which is publicly available as open source.

Stylus Module
For our stylus construction, we use an Oculus Touch controller held in a precision grip and an Apple Pencil tip affixed by a screw and aluminum foil (Figure 3B). We implemented our stylus calibration technique by extending Anthony Steed's publicly available Unity reference implementation of the calibration procedure presented by Tuceryan et al. (1995).
As button interactions can influence the controller's position ("Heisenberg Effect" by Bowman et al., 2001), we bind button presses to the opposing controller.

Surface Module
We implemented our proposed calibration technique with visual feedback features (3ViSuAl) to physically align virtual 2D surfaces. Based on participants' feedback from our first usability evaluation (Section 6), we extended the visual feedback system with text-based information that assists the user by indicating fixed dimensions (Unity units) and aspect ratios. We also provide dynamic visual feedback during the calibration process by displaying the rectangle's current dimensions (width and height) as text and as a white rectangle. To account for the Heisenberg Effect of spatial interaction (Bowman et al., 2001), we bind button interactions for the calibration process to the opposing controller. Figure 8 visualizes the calibration procedure. The user initiates the calibration procedure by pressing the X button on the left Oculus Touch controller to capture the first point (Figure 8A). The horizontal edge between the first captured point and the current controller position determines the width of the surface and is visualized by a white line (Figure 8B). The third point can be placed either above or below the drawn edge and thus defines the rotation and height (Dimension) of the surface (Figure 8C). After the third point is captured, the rectangle is calibrated, and the virtual surface is aligned with it using the position and rotation of the rectangle's center point and its dimensions (width/height) (Figure 8D).
For surface interactions, we implemented a custom input module based on Unity's Base Input Module. It is responsible for triggering events and sending them to game objects. This allows us to make use of Unity's pointer events. In each application frame, we measure the distance to the virtual surface by casting a ray from the calibrated stylus tip in the forward direction. When a certain threshold is reached, the corresponding pointer events (e.g., Click, Down, Up, and Move) are triggered.
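The per-frame logic of such an input module can be sketched as a small state machine that turns the measured tip-to-surface distance into pointer events. This is an illustration in Python, not Unity's Base Input Module API. The event names only mirror the pointer events listed above, and the threshold values (in meters) are hypothetical. The separate press and release thresholds (hysteresis) are an added design choice that keeps the pen from flickering between Down and Up when it hovers near the threshold.

```python
class TipPointerStateMachine:
    """Turns per-frame tip-to-surface distances into pointer events (hypothetical sketch)."""

    def __init__(self, press_threshold=0.002, release_threshold=0.004):
        if press_threshold > release_threshold:
            raise ValueError("press threshold must not exceed release threshold")
        self.press_threshold = press_threshold      # meters; hypothetical value
        self.release_threshold = release_threshold  # meters; hypothetical value
        self.is_down = False

    def update(self, distance):
        """Feed one frame's distance (None = no hit) and return the events to dispatch."""
        events = []
        if not self.is_down:
            if distance is not None and distance <= self.press_threshold:
                self.is_down = True
                events.append("Down")
        elif distance is None or distance > self.release_threshold:
            self.is_down = False
            events.extend(["Up", "Click"])  # simplification: Click on every release
        else:
            events.append("Move")
        return events


# Usage: the tip approaches, touches, drags, and lifts off again.
machine = TipPointerStateMachine()
frames = [machine.update(d) for d in (0.010, 0.001, 0.003, 0.005, 0.010)]
```

Emitting Click on every release is a simplification. Unity only raises a click when the pointer is released over the same object on which it was pressed.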
We provide two virtual 2D interaction surfaces. The first is a texture-based surface with rudimentary input functionality that implements Unity's Event System for pointer events. For our web browser-based surface, we use Vuplex 3D WebView¹⁰, a commercial Unity plugin based on the Chromium Embedded Framework, which also implements Unity's Event System for pointer events. The employed Chromium instance runs as a dedicated operating system process. We developed a virtual whiteboard with the JavaScript framework PaperJS¹¹, an open-source vector graphics scripting framework that runs on top of the HTML5 Canvas. It provides comprehensive functionalities for creating and manipulating vector graphics. The framework also supports algorithms to smooth, simplify, or flatten the drawing path and can be extended with multi-user features.

TECHNICAL EVALUATION
To evaluate our reference implementation's precision, accuracy, and latency, we conducted four technical evaluations, according to our proposed methods (Section 3.3.1). Two experts familiar with the system measured ten samples for each technical evaluation. We analyzed the data with Python 3 in Jupyter Notebooks (Kluyver et al., 2016).

Stylus Precision
We calibrated the tip attachment (Section 3.1.2) and calculated the pairwise absolute Euclidean distance between continuously sampled points as a measure of the stylus calibration precision error (Figure 9A). We measured similar calibration results for four (M = 0.98 mm, SD = 0.54 mm), eight (M = 0.92 mm, SD = 0.52 mm), and 12 (M = 0.98 mm, SD = 0.58 mm) calibration points during the stylus calibration procedure.

Surface Accuracy
We calibrated the surface with our proposed calibration technique 3ViSuAl (Section 3.2.1) and determined the surface's accuracy (Figure 9C). Based on the results of the stylus precision evaluation (Section 5.1), we calibrated the stylus with eight calibration points for the surface accuracy measurement, as we consider this a reasonable balance between stylus precision and calibration effort and because it represents the real-world scenario we recommend for everyday users. We used a virtual surface with nine designated measurement points to ensure uniformity of measurement. We placed the tip of the stylus on each measurement point and averaged its distance from the surface over a time period of around 1 s. Each expert repeated the calibration procedure and accuracy measurement 10 times. In our measurements, all observed points achieved an average distance of less than 1 mm (M = 0.60 mm, SD = 0.32 mm) from the virtual surface (Figure 9C). The least accurate was the measurement point furthest away from the three calibration points (Figure 9C, Point 7).

Digital Ink Latency
We measured the latency by frame counting between the physical movement of the controller and the digital ink appearing on the virtual surface (Figure 9B). As preparation, we calibrated the stylus and the surface. We set up an iPhone XR (240 Hz, 1920 × 1080 px resolution) to film the controller and an Asus ROG SWIFT PG43U monitor (144 Hz) mirroring the headset's left display. Since we used an external monitor to display the VR view, the measurement may contain an additional time delay compared to the actual display in the VR device. Nevertheless, we tried to minimize this delay by using the Oculus Mirror software. Our results are shown in Figure 9C. We drew zigzag lines and compared the physical tip's directional changes to the virtual representation (Stylus Movement) and to the digital ink (Digital Ink). On average, it took 42.57 ms (SD = 15.70 ms) until the movement of the physical tip was displayed by the visual representation (motion-to-photon latency of the controller). We measured an average latency of 79.40 ms (SD = 23.26 ms) between the physical controller movement and the appearing digital ink.

Discussion
Our stylus precision measurement (Section 5.1) revealed an average precision error of 0.92 mm (SD = 0.52 mm) for eight calibration points. As we calibrate the stylus tip relative to the Oculus Touch controller, it represents a fixed offset as long as the physical tip attachment is not moved by external force. This implies that the accuracy of the tracking system can only influence the precision of the stylus tip during the calibration process and not over time and distance during handwriting and sketching. Still, the accuracy of the Oculus Touch controller with tip attachment itself is determined by the accuracy of the tracking system. In contrast, our surface calibration procedure 3ViSuAl aligns a virtual surface to a physical prop and therefore registers it in real-world coordinates. Our technical evaluation (Section 5.2) showed an average positional distance error of 0.60 mm (SD = 0.32 mm). While the points for the surface calibration are captured using the position and rotation of the controller, the calibrated surface is aligned in real-world coordinates, and therefore the accuracy measurements of the HMD are relevant. Jost et al. (2019) evaluated the positional and rotational accuracy of the Oculus Rift S and Oculus Touch controllers compared to an industrial high-fidelity motion tracking system (Vicon Nexus), with an industrial robot arm repetitively moving the HMD and the controllers. They reported a translational accuracy of 1.66 mm (SD = 0.74 mm) for the Oculus Rift S and 4.36 mm (SD = 2.91 mm) for the Oculus Touch controller, with a rotational accuracy of 0.34° (SD = 0.38°) and 1.13° (SD = 1.23°), respectively. Their results indicate that over a longer time period or distance, a re-calibration of the virtual surface may be required. Nevertheless, no re-calibration of the virtual surface was performed during our technical measurements and usability evaluation, nor was it necessary.
We attribute this to the fact that the movements of the stylus on the calibrated surface were rather small compared to the movements during the evaluation of Jost et al. (2019). As an alternative off-the-shelf solution to the Oculus Rift S, previous work has used the HTC Vive (Pro) with HTC Vive controllers (Elmgren, 2017; Pham and Stuerzlinger, 2019; Wang et al., 2019). While in our experience the HTC Vive controller is less suitable as an XR stylus due to its shape and weight, it shows higher accuracy than the Oculus Touch controller (Niehorster et al., 2017; Spitzley and Karduna, 2019; Bauer et al., 2021). Since our OTSS framework also supports commercial XR styluses, these are also a potential solution as availability increases and prices become more consumer-friendly.
Regarding our system's latency (Section 5.3), we measured an average delay of 42.57 ms (SD = 15.70 ms) until the movement of the physical tip was displayed by the visual representation (Stylus Movement) and 79.40 ms (SD = 23.26 ms) between the physical controller movement and the appearing digital ink on the web browser surface (Digital Ink). As humans can reliably perceive latencies below 10 ms, the aim for handwriting and sketching (in XR) should be to minimize latency as much as possible. While our digital ink's latency is perceptible, the participants of the user study did not mention it explicitly. To reduce the overall latency of commercially available state-of-the-art styluses, touch prediction algorithms are applied, which, for example, reduce the latency of the Apple Pencil on an Apple iPad Pro to 42.9 ms (Yun et al., 2017). These findings are similar to evaluations by Helps and Helps (2016), who measured an average latency of 47.88 ms (SD = 9.04 ms) on an iPad Pro using the native application Apple Notes. For professional sketching applications (e.g., Autodesk Sketch), the average latency was even higher at 82.85 ms (SD = 10.1 ms). Non-native applications do not have access to these touch predictions, and web browsers in particular induce additional latency due to script execution and asynchronous frame rates (Yun et al., 2017). Consequently, commercial styluses also produce a noticeable delay of the digital ink in commercially available whiteboards implemented with HTML5 canvases. Compared to previous measurements (Helps and Helps, 2016; Yun et al., 2017), our results for the stylus movement (M = 42.57 ms) and digital ink (M = 79.40 ms) are similar. From these results, we derive that our system needs on average 36.83 ms (79.40 ms − 42.57 ms) to convert stylus movements into pointer events forwarded to the web browser. This delay is further increased by script execution and by the fact that Chromium runs in a separate operating system process.
As our stylus movement showed a lower latency than the visualization of digital ink on a web browser, two simple solutions could reduce the perceptible latency: 1) directly previewing the digital ink with a native game-engine line renderer and 2) implementing touch prediction techniques for the web browser surface.
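As an illustration of the second option, the simplest form of prediction extrapolates the most recent tip velocity. The sketch below is a constant-velocity predictor under assumed inputs: timestamped 2D surface positions with times in seconds. It is not the touch prediction used by Apple or by web browsers, and `predict_position` and the lookahead value are hypothetical. The predicted segment would be drawn as a provisional preview and replaced once the real samples arrive.

```python
def predict_position(samples, lookahead_s):
    """Extrapolate the tip position `lookahead_s` seconds ahead from the last two samples.

    `samples` is a list of (timestamp_s, (x, y)) tuples in chronological order.
    """
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    dt = t1 - t0
    if dt <= 0:
        return p1  # no usable velocity estimate; fall back to the latest sample
    velocity = tuple((b - a) / dt for a, b in zip(p0, p1))
    return tuple(b + v * lookahead_s for b, v in zip(p1, velocity))


# Usage: the tip moved 1 unit in 10 ms; predict 20 ms ahead (hypothetical lookahead).
predicted = predict_position([(0.0, (0.0, 0.0)), (0.01, (1.0, 0.0))], 0.02)
```

Constant-velocity extrapolation overshoots at sharp direction changes, such as the corners of the zigzag test lines. The lookahead is therefore usually kept shorter than the full measured latency.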

USABILITY EVALUATION
We conducted a first usability evaluation to demonstrate the feasibility and viability of our concept by measuring general usability aspects of the system, including specific qualities related to user experience, intuitive use, and task load. We demonstrate the calibration accuracy and reliability of our reference implementation by showing handwritten and sketched results generated by participants. As described in Section 3.2.2, we have since included visual feedback regarding dimensions and aspect ratios. These features were not part of the initial usability evaluation. In this evaluation, visual support was provided only by spherical dots that changed color from red to green when participants hit the target surface dimension.

Measures
Following the evaluation suite of our framework, we measured task load, user experience, and usability. Simulator sickness was measured with the SSQ. The UEQ was used to measure perceived user experience. We measured perceived task load with the RTLX, usability with the SUS, and subjective consequences of intuitive use with the QUESI. We also provided rich-text fields for comments regarding advantages and disadvantages of each of the three tasks. Figure 10 shows the experimental procedure. Participants gave informed consent before the start of the usability evaluation. We measured participants' interpupillary distance with a smartphone app (Eye Measure) and adjusted the headset accordingly.

Procedure
Participants were shown a video tutorial explaining the calibration techniques and the subsequent writing task. Afterwards, participants put on the VR headset. In the training phase, the experimenter orally instructed participants how to execute the task, following a written script. The participants then repeated the explained task twice on their own. Subsequently, the participants evaluated their experiences on a separate computer by completing the questionnaires described in the Measures section. The usability evaluation is divided into three tasks: (1: Stylus Calibration) calibrating the attached tip, (2: Surface Calibration) calibrating the surface, and (3: Surface Interaction) handwriting and sketching on the surface. Each task requires executing the preceding calibration steps. As visualized in Figure 7 and in the resulting user drawings in Figure 11, participants were asked to transfer the given shapes in the upper field to the lower field. They were also asked to write a text phrase three times on the provided lines. Due to the COVID-19 pandemic, we took extra hygienic measures in accordance with the institutional regulations valid at the time. Before the experiment, participants were asked about their current quarantine status, whether they had stayed abroad or had contact with a person proven to be sick in the previous 14 days, and whether they were currently suffering from fever, cough, or breathlessness. None of the participants met these exclusion criteria.

Participants
We recruited ten participants (three identifying as female, seven as male), eight of whom are right-handed and two left-handed. Their age ranged from 22 to 36 years (M = 27.00, SD = 4.37). All participants reported more than 20 h of VR usage and normal or corrected-to-normal vision.

Apparatus
Participants sat on a chair at a physical table in order to have a comfortable and natural writing position. We put a mouse pad on top of the physical table as a non-slip surface for the stylus calibration. We used a Microsoft Windows 10 based computer system consisting of an i7-9700K processor, 16 GB DDR4-RAM, and an NVIDIA GeForce RTX 2070 SUPER GPU.

FIGURE 10 | Experimental procedure.

Results
We aggregated data in Jupyter Notebooks (Kluyver et al., 2016) using Python 3. The results for the questionnaires UEQ, RTLX, SUS, QUESI, and SSQ are presented in Figure 12.

Simulator Sickness
Overall reported simulator sickness was low before and after the VR exposure (

User Experience
As stated in the Handbook for the User Experience Questionnaire¹², ratings between −0.8 and 0.8 represent a neutral evaluation, and ratings above 0.8 indicate a positive evaluation. Ratings lower than −2 or higher than +2 are extremely unlikely. For the stylus calibration task, Stimulation was rated as neutral, with some participants rating it as "boring" and "not interesting" and others as "valuable" and "motivating". All other ratings across all tasks were positive, with average values above 1.0.
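The interpretation rule quoted from the UEQ handbook can be expressed as a small helper. The sketch assumes that item values are already transformed to the −3 to +3 range, with reversed items recoded. It computes a scale mean before classifying it. `ueq_scale_mean` and `classify_ueq` are hypothetical names.

```python
from statistics import mean


def ueq_scale_mean(item_values):
    """Mean of the (already recoded, -3 to +3) item values belonging to one UEQ scale."""
    return mean(item_values)


def classify_ueq(scale_mean):
    """Interpret a UEQ scale mean using the -0.8/+0.8 thresholds from the handbook."""
    if scale_mean > 0.8:
        return "positive"
    if scale_mean < -0.8:
        return "negative"
    return "neutral"


# Usage: four hypothetical recoded items of one scale.
rating = classify_ueq(ueq_scale_mean([1, 2, 1, 2]))
```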

Task Load
Overall, task load was low across all experimental tasks. interaction task shows the best results for the QUESI subscales High Familiarity (M = 4.63, SE = 0.09) and Low Perceived Effort of Learning (M = 4.57, SE = 0.10). Overall, considering the ranges of the SUS and QUESI scales, the results are at a good to very good level.
User Sketching and Handwriting
Figure 11 shows users' sketching and handwriting created during the surface interaction task. The tasks are also visualized in Figure 7. Due to a connection error, we unfortunately lost the drawings of one participant, except for the handwriting task "Sonne und Wellen". We therefore report the drawings of nine participants. Participants correctly reproduced both the unguided visual-motor integration tasks (A) and the guided motor coordination tasks (B). All participants except two, who could not correctly reproduce the zigzag line, were able to correctly reproduce the graphomotor tasks (C) as well as the free-hand loop pattern (D). For the handwriting tasks (E), all participants created legible handwriting results by copying the given text phrases ("Hallo" and "Sonne und Wellen") three times onto the lines below.

User Feedback
Participants generally provided positive feedback on the two calibration tasks and the interaction task. Three participants explicitly mentioned the benefit of a VR stylus, but two also noted that using a consumer-grade VR controller as a VR stylus prevents its regular intended use. For the stylus calibration task, four participants mentioned that the calibration was intuitive and easy to use once they had learned it. Three participants positively highlighted the fast calibration process, while two participants found the calibration too time-consuming. Three participants indicated that they would like more visual feedback. One participant was unclear about the reason for the calibration and its repetitiveness. Regarding surface calibration, four participants liked the simplicity of the process. Regarding visual support, participants did not have a clear opinion: two were comfortable with the visual support, and two would like more. Two participants would like to see a reference to the real table height. Four participants particularly emphasized the naturalness, simplicity, and familiarity of the surface interaction task, including sketching and handwriting. However, four participants also noted that the VR stylus sometimes did not behave like a real pen in terms of accurately recognizing the start and end of handwriting and sketching.

Discussion
The evaluation of the individual questionnaires allows a first assessment of our work in terms of low task load, high user experience, and high usability. This is also supported by the positive user feedback given in the open questions. Participants were able to use the system easily and experienced high efficiency and control over the system. They reported an appealing user experience, low physical and mental demand, and low learning effort. The slightly higher task load of the stylus calibration compared to the surface calibration is noticeable. We suspect that using a second controller and performing different movements with the left and right hands increased the complexity of the calibration process. The open questions revealed that it was not clear to all participants why the stylus calibration requires more repetitive actions than the surface calibration. Participants successfully reproduced the given shapes and created legible handwriting, suggesting that our techniques are suitable for everyday use and are a good starting point for handwriting input in XR environments. Nevertheless, the produced handwriting may not be as good as handwriting with a real pen. The higher task load of the handwriting task compared to the calibration tasks may be explained by its longer duration.

LIMITATIONS AND FUTURE WORK
Our technical measurements revealed some limitations and directions for further research. We plan to extend our algorithm to improve the detection of intended interactions with the 2D surface by adding velocity and acceleration as additional evaluation parameters. This would allow us to take the user's motion into account in addition to the current position of the XR stylus. An optional (re-)calibration step could also refine the initial calibration of the surface regarding position and rotation. As the web browser integration increases the overall system latency, we will also investigate possible performance improvements. Since our selected standardized test (Beery VMI) also provides more complex shapes, we will evaluate a possible extension with additional tasks beyond the already included horizontal line, circle, rectangle, and triangle. With XR stylus input devices soon to be commercially available, OTSS can serve as the underlying framework to provide alignment methods for physical surfaces and to test different XR styluses while obtaining comparable results. Our first implementation is based on the Oculus Rift S VR device with Oculus Touch controllers. As more XR devices such as the Varjo XR-1 and XR-3 are released, we will provide new reference implementations. We also plan to evaluate self-made XR styluses against commercial devices such as the Logitech VR Ink. Since our framework already integrates a web browser and supports 2D interactions as well as handwriting and sketching, it implicitly supports web-based questionnaires in XR, which we will explore in future publications.
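The planned use of velocity in addition to position can be illustrated as a per-sample contact classifier. The following Python sketch is an assumption-laden illustration, not the OTSS algorithm: it takes signed tip-to-surface heights (for example, from a calibrated surface frame), estimates the velocity along the surface normal by backward differences, starts a stroke only when the tip is close and no longer approaching quickly, and ends it either beyond a hysteresis band or on a decisive lift-off. `ContactParams`, `detect_contact`, and all threshold values are hypothetical; acceleration, which the paragraph also mentions, is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ContactParams:
    """Hypothetical thresholds for illustration only (not taken from OTSS)."""
    down_mm: float = 1.0              # tip must be at most this far above the surface to start a stroke
    up_mm: float = 3.0                # a stroke always ends beyond this height (hysteresis band)
    max_approach_mm_s: float = 200.0  # do not start while the tip still approaches faster than this
    lift_mm_s: float = 50.0           # end early when the tip moves away faster than this


def detect_contact(
    heights_mm: Sequence[float],
    dt_s: float,
    params: Optional[ContactParams] = None,
) -> List[bool]:
    """Return a pen-down flag per sample.

    heights_mm: signed tip height above the surface (positive = above),
    sampled at a fixed interval dt_s.
    """
    p = params if params is not None else ContactParams()
    pen_down = False
    prev: Optional[float] = None
    flags: List[bool] = []
    for h in heights_mm:
        # Backward-difference velocity along the surface normal (positive = moving away).
        v = 0.0 if prev is None else (h - prev) / dt_s
        prev = h
        if not pen_down:
            # Start only when close to the surface and no longer approaching quickly.
            if h <= p.down_mm and -v <= p.max_approach_mm_s:
                pen_down = True
        elif h > p.up_mm or (v > p.lift_mm_s and h > p.down_mm):
            # End beyond the hysteresis band, or on a decisive lift-off.
            pen_down = False
        flags.append(pen_down)
    return flags


if __name__ == "__main__":
    # Synthetic 100 Hz trace: approach, touch, short stroke, lift-off.
    trace = [10, 5, 2, 0.5, 0.3, 0.4, 1.5, 2.5, 4.0, 6.0]
    print(detect_contact(trace, dt_s=0.01))
```

The hysteresis band (`down_mm` < `up_mm`) keeps tracking jitter near the surface from splitting a stroke, while the velocity terms target the imprecise stroke start and end that participants reported.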

CONCLUSION
This article introduces the Off-The-Shelf Stylus (OTSS) framework, which provides guidance on how to structure and modularize the device-independent functionalities required to realize 2D interaction (in 3D) as well as handwriting and sketching with digital pen, ink, and paper in XR on virtual/physical 2D surfaces. In comparison to related work, OTSS provides a comprehensive test bed combining tests for precision, accuracy, and latency with extensive usability evaluations, including handwriting and sketching tasks based on established handwriting research investigating visuomotor and graphomotor skills. Our approach simplifies and extends earlier work that uses custom-made hardware or self-made attachments to already tracked XR controllers as virtual styluses. Whereas previous work utilized 3D printing and/or microcontrollers, we tested three attachment prototypes as more accessible and cheaper alternatives, and show how to calibrate these tips based on previous work. We recommend using the rigid stylus attachment because it most closely resembles a regular pen and is fit for use on everyday surfaces [Q1]. The development of OTSS was accompanied and validated by an extensive reference implementation targeting the Unity game engine. It uses an Oculus Rift S headset and Oculus Touch controllers with the built-in tracking system, and an Apple Pencil tip. With our concept, we also show manufacturers an alternative use of their XR devices and propose adding a fixed thread to the bottom of each controller to simplify the attachment of accessories such as a stylus tip. For example, the HTC Vive Tracker already follows ISO 1222:2010 by providing a 1/4" threaded mount. The evaluation of our reference implementation revealed an average stylus precision of 0.98 mm (SD 0.54 mm).
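This section states that the stylus tips are calibrated "based on previous work" without detailing the method. A widely used technique for finding a tip offset on a tracked device is pivot calibration: the controller is rotated while its tip stays on one fixed point, and a least-squares fit recovers both the tip offset in controller coordinates and the pivot in world coordinates. The Python/NumPy sketch below shows that technique only; it is not necessarily the OTSS procedure (which, according to the Discussion, involves a second controller). The function names are hypothetical, and the per-pose residual is just one possible proxy for precision, not necessarily how the reported 0.98 mm was computed.

```python
import numpy as np


def rotation_from_axis_angle(axis, angle_rad):
    """Rodrigues' formula; used here only to generate synthetic test poses."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    k = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle_rad) * k + (1.0 - np.cos(angle_rad)) * (k @ k)


def pivot_calibration(rotations, translations):
    """Least-squares pivot calibration.

    Each pose i (controller-to-world rotation R_i, translation t_i) satisfies
    R_i @ tip + t_i = pivot, i.e. [R_i, -I] @ [tip; pivot] = -t_i.
    Returns the tip offset in controller coordinates, the pivot in world
    coordinates, and the per-pose residual distance.
    """
    n = len(rotations)
    a = np.zeros((3 * n, 6))
    b = np.zeros(3 * n)
    for i, (r, t) in enumerate(zip(rotations, translations)):
        a[3 * i:3 * i + 3, :3] = r
        a[3 * i:3 * i + 3, 3:] = -np.eye(3)
        b[3 * i:3 * i + 3] = -np.asarray(t, dtype=float)
    x, *_ = np.linalg.lstsq(a, b, rcond=None)
    residuals = np.linalg.norm((a @ x - b).reshape(n, 3), axis=1)
    return x[:3], x[3:], residuals


if __name__ == "__main__":
    # Hypothetical ground truth in millimetres.
    true_tip = np.array([0.0, 0.0, 120.0])
    true_pivot = np.array([300.0, 800.0, -200.0])
    poses = [([1, 0, 0], 0.0), ([1, 0, 0], 0.4), ([0, 1, 0], -0.5), ([1, 1, 0], 0.6), ([0, 1, 1], 0.3)]
    rots = [rotation_from_axis_angle(ax, ang) for ax, ang in poses]
    trans = [true_pivot - r @ true_tip for r in rots]
    tip, pivot, res = pivot_calibration(rots, trans)
    print(tip, pivot, res.max())
```

In practice, the rotations and translations would come from the tracked controller poses, and the poses must vary about more than one rotation axis; otherwise the linear system is rank-deficient and the tip offset is not uniquely determined.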
OTSS proposes a calibration technique, called Visually Assisted 3-point Virtual Surface calibration and Alignment (3ViSuAl), to align virtual 2D surfaces with arbitrary flat physical surfaces [Q2].
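The geometric core of a three-point surface calibration can be illustrated with basic vector algebra: three non-collinear points touched with the calibrated stylus tip define an origin, an in-plane x-axis, and a surface normal. From this frame, any stylus position maps to 2D surface coordinates plus a signed height above the surface. The Python/NumPy sketch below assumes that the first point is a surface corner and the second lies along one edge; it omits the visual assistance of 3ViSuAl, and `plane_from_three_points` and `to_surface` are hypothetical names. Point order determines the sign of the normal, and handedness conventions differ between engines (Unity uses a left-handed coordinate system), so a real implementation must follow its engine's conventions.

```python
import numpy as np


def plane_from_three_points(p0, p1, p2):
    """Build a surface frame from three touched points.

    Assumed convention (hypothetical): p0 is the surface origin (e.g. a corner),
    p1 lies along the surface's x-edge, and p2 is any further point on the surface.
    Returns (origin, x_axis, y_axis, normal) as NumPy arrays.
    """
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    normal = np.cross(p1 - p0, p2 - p0)
    length = np.linalg.norm(normal)
    if length < 1e-9:
        raise ValueError("calibration points are (nearly) collinear")
    normal = normal / length
    x_axis = (p1 - p0) / np.linalg.norm(p1 - p0)
    y_axis = np.cross(normal, x_axis)  # completes the orthonormal frame
    return p0, x_axis, y_axis, normal


def to_surface(point, frame):
    """Map a 3D point to (u, v) surface coordinates and signed height above the surface."""
    origin, x_axis, y_axis, normal = frame
    d = np.asarray(point, dtype=float) - origin
    return float(d @ x_axis), float(d @ y_axis), float(d @ normal)


if __name__ == "__main__":
    # Hypothetical points on a tilted board, in millimetres.
    frame = plane_from_three_points((10, 20, 5), (110, 20, 105), (10, 120, 5))
    print(to_surface((60, 70, 55), frame))
```

The signed height is the quantity a contact detector would threshold, and the mean absolute height of additional reference points touched after calibration is one plausible way to quantify surface accuracy.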
Our reference implementation of 3ViSuAl shows an average surface calibration accuracy of 0.60 mm (SD 0.32 mm). We recommend integrating a commercial state-of-the-art web browser plugin to generate, store, and share written and sketched content. This also enables cross-platform web applications in XR that consumers are already familiar with from everyday spaces such as homes and schools [Q3]. We measured an average motion-to-photon latency of 79.40 ms (SD 23.26 ms) until the stylus movement was displayed as digital ink on the web browser surface in VR. This includes the controller's own motion-to-photon latency of 42.57 ms (SD 15.70 ms), measured without the web component. The usability evaluation highlights the benefits of our solution in terms of low task load, high usability, and high user experience. Overall, participants reported high enjoyment and usability while experiencing low physical and mental demand and low learning effort. Participants calibrated the stylus and the surface themselves, successfully reproduced given shapes, and created legible handwriting in tasks based on previous visuomotor, graphomotor, and handwriting research [Q4].
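Assuming that the ink latency decomposes additively into the controller's motion-to-photon latency and a remaining share for the web browser path and its rendering, the difference of the two reported means gives a rough estimate of that share:

```latex
\Delta\bar{t}_{\text{web}} \approx \bar{t}_{\text{ink}} - \bar{t}_{\text{ctrl}}
  = 79.40\,\text{ms} - 42.57\,\text{ms} = 36.83\,\text{ms}
```

This estimate applies to the means only; the standard deviations (23.26 ms and 15.70 ms) cannot be subtracted in the same way, and attributing the entire difference to the web component is itself an assumption.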
We provide source code access to our reference implementation, including stylus and surface calibration and surface interaction features (https://go.uniwue.de/hci-otss). We believe that such an open source implementation is highly relevant and significant to the field because it provides researchers and practitioners with a comprehensive and shared foundation, making it easy to reuse, extend, adapt, and/or replicate previous results without a tedious and potentially error-prone re-implementation based solely on the information given in papers.
Based on our results, we propose three design implications: 1) self-made XR styluses are a promising solution for 2D interaction as well as handwriting and sketching in XR, 2) physically aligned virtual web browser surfaces provide passive haptic feedback and enable reuse of web platform features in XR, and 3) standardized tests combined with usability evaluations and objective measurements provide a comprehensive evaluation suite for measuring handwriting and sketching performance in XR.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FK and PK wrote the first draft of the paper. EG, KK and RS provided further first-hand writing. All other authors contributed to and reviewed the paper. The ideas of the paper were formulated through a series of meetings to which FK, PK, FN and ML contributed.

FUNDING
This work was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) in the project ViLeArn (reference number: 16DHB2111). This publication was supported by the Open Access Publication Fund of the University of Wuerzburg.