New Approach to Accelerated Image Annotation by Leveraging Virtual Reality and Cloud Computing

Three-dimensional imaging is at the core of medical imaging and is becoming a standard in biological research. As a result, there is an increasing need to visualize, analyze and interact with data in a natural three-dimensional context. By combining stereoscopy and motion tracking, commercial virtual reality (VR) headsets provide a solution to this critical visualization challenge by allowing users to view volumetric image stacks in a highly intuitive fashion. While optimizing the visualization and interaction process in VR remains an active topic, one of the most pressing issues is how to use VR for data annotation and analysis. Annotating data is often a required step for training machine learning algorithms. In biological research, for example, there is a need to annotate complex three-dimensional data efficiently, as newly acquired data may come in limited quantities. Similarly, medical data annotation is often time-consuming and requires expert knowledge to identify structures of interest correctly. Moreover, simultaneous data analysis and visualization in VR is computationally demanding. Here, we introduce a new procedure to visualize, interact with, annotate and analyze data by combining VR with cloud computing. VR is leveraged to provide natural interactions with volumetric representations of experimental imaging data. In parallel, cloud computing performs costly computations to accelerate the data annotation with minimal input required from the user. We demonstrate multiple proof-of-concept applications of our approach on volumetric fluorescence microscopy images of mouse neurons and on tumor or organ annotations in medical images.


MEDICAL IMAGES
Figure S1. Annotation in DIVA on an MRI of breast tumor (white arrow) from The Cancer Imaging Archive (TCIA), RIDER Breast MRI collection, subject RIDER-1627409910. a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S2. Annotation in DIVA on a CT-scan of lung tumor (white arrow) from the Medical Segmentation Decathlon (MSD) challenge: lung 003. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S3. Annotation in DIVA on a CT-scan of pancreas (white arrow) from the Medical Segmentation Decathlon (MSD) challenge: pancreas 001. a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S4. Annotation in DIVA on a CT-scan of pancreas (white arrow) from the Medical Segmentation Decathlon (MSD) challenge: pancreas 004. a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S5. Annotation in DIVA on a CT-scan of hepatic vessel (white arrow) from the Medical Segmentation Decathlon (MSD) challenge: hepaticvessel 001. a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S6. Annotation in DIVA on a CT-scan of hepatic vessel (white arrow) from the Medical Segmentation Decathlon (MSD) challenge: hepaticvessel 002. a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S7. Annotation in DIVA on two-photon microscopy images of mouse microglia (white arrow), courtesy of Kurt Sailor (Institut Pasteur). a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.

Figure S8. Annotation in DIVA on SEBI images of a mouse hippocampus neuron (white arrow), courtesy of Kurt Sailor (Institut Pasteur). a) Raw data visualized in 3D in DIVA and as a z-stack in the bottom right corner. b) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. c-d) Overlay of raw data in gray and output probabilities for the RFC and the strong learner, respectively. The colorscale for probabilities is indicated at the right.
a) Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively. Tags are indicated with a white arrow. b) Overlay of raw data in gray and output probabilities for the RFC. The colorscale for probabilities is indicated at the right.

FEATURE IMPORTANCE
We show here examples of feature importance evaluation for two medical image stacks and one microscopy stack.
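The supplement does not spell out how the importances are computed; the sketch below assumes a scikit-learn-style random forest and a hypothetical per-voxel feature set (raw intensity plus Gaussian-smoothed intensities), with impurity-based importances read off the trained model:

```python
# Minimal sketch: impurity-based feature importance from a random forest
# voxel classifier. The feature set here is hypothetical, not DIVA's exact one.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
volume = rng.random((16, 16, 16))
volume[4:12, 4:12, 4:12] += 1.0          # bright "structure of interest"

# Per-voxel features: raw intensity and two Gaussian-smoothed intensities.
features = np.stack([volume,
                     gaussian_filter(volume, 1.0),
                     gaussian_filter(volume, 2.0)], axis=-1).reshape(-1, 3)
labels = np.zeros(volume.shape, dtype=int)
labels[4:12, 4:12, 4:12] = 1
labels = labels.ravel()

rfc = RandomForestClassifier(n_estimators=50, random_state=0)
rfc.fit(features, labels)

# Importances are non-negative and sum to 1 across features.
for name, imp in zip(["raw", "gaussian sigma=1", "gaussian sigma=2"],
                     rfc.feature_importances_):
    print(f"{name}: {imp:.3f}")
```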

ANNOTATING IN 3D VS ANNOTATING IN 2D
We show in this section an example comparison between our approach to accelerated annotation and the ilastik interactive segmentation tool (Sommer et al., 2011). Figure S12 illustrates the benefit of VR and transfer function adjustment for visualizing complex 3D structures, here on an annotation task of hepatic vessels. In such examples, structures of interest are difficult to see in 2D sections. Volumetric reconstruction in 3D renders the structure efficiently, thus improving the tagging experience, and 3D motion accelerates the annotation by removing the need to scroll from one slice to another.

Figure S12. User experience comparison in the visualization of an example CT-scan of hepatic vessel (white arrow in panels a) and c)) from the Medical Segmentation Decathlon (MSD) challenge: hepaticvessel 002. a) and b) Screenshots of the ilastik interface with planar sections along the three natural axes. c) and d) Screenshots of the DIVA interface, after appropriate transfer function setting. Tags are shown in b) and d) in cyan for the structure of interest, here hepatic vessels, and in magenta for the background.
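As an illustration of the transfer function adjustment mentioned above, the sketch below applies a hypothetical piecewise-linear opacity transfer function that suppresses low-intensity background and reveals bright structures; DIVA's actual transfer function controls may differ:

```python
import numpy as np

def apply_transfer_function(volume, control_points):
    """Map normalized intensities to opacity via piecewise-linear
    (intensity, opacity) control points, a common transfer-function form."""
    xs, ys = zip(*control_points)
    return np.interp(volume, xs, ys)

vol = np.linspace(0.0, 1.0, 5)
# Hide low intensities (background), ramp up opacity on the bright structure.
alpha = apply_transfer_function(vol, [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)])
print(alpha)  # [0.  0.  0.  0.5 1. ]
```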
We show in Figure S13 a performance comparison between ilastik and Voxel Learning applied to medical example images. Our tests were conducted in a one-shot paradigm, where the user provides limited tags: one line inside the structure of interest and one line outside of it. The tagging procedure should only last a few seconds. When compared to expert segmentation, our technique yields higher Dice coefficients than ilastik on these medical examples. Importing tags made in VR into ilastik improves its performance, though it does not reach scores similar to Voxel Learning. Note that this comparison does not assess the quality of the learning itself, as we use approaches similar to ilastik's; rather, it shows the expected gain from VR annotation for one-shot segmentation of the data.

Figure S13. Performance comparison between Voxel Learning and ilastik on the 8 medical examples used throughout this article: two occurrences of breast tumor MRI, and CT-scans of lung tumor, hepatic vessel and pancreas. A few tags are performed in VR for Voxel Learning and in a 2D section in ilastik. The distribution of Dice coefficients over the different examples is shown for Voxel Learning annotation using VR tags, and for ilastik annotations using 2D and VR tags, respectively.
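The Dice coefficient used for this comparison can be computed as follows; a minimal sketch on synthetic binary masks, not the article's data:

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Two overlapping 4x4x4 cubes inside an 8x8x8 volume.
truth = np.zeros((8, 8, 8), bool); truth[2:6, 2:6, 2:6] = True
pred  = np.zeros((8, 8, 8), bool); pred[3:7, 3:7, 3:7] = True
print(f"{dice(pred, truth):.3f}")  # → 0.422 (2*27 / (64+64))
```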

A QUANTIFICATION OF EFFICIENCY IN LEARNING
In order to provide a quantitative assessment of the quality of the annotation performed within the software, we compared it to segmentation pipelines on examples of CT-scans of lung tumor. Results are shown in Figure S14. Note that such algorithms are usually trained on vast databases already annotated by experts, while our approach is based on one-shot learning and thus requires no previous knowledge. As these results may be corrected with additional tagging, our procedure can be iterated and its performance enhanced.

Table S1. Different measurements collected when performing data annotation with DIVA on various medical examples. For each example, the file name and the size in pixels are given. We also kept track of the tagging time in VR, the number of positively tagged voxels ("+", in cyan) and negatively tagged voxels ("-", in magenta), and the time to compute features for the training set. For each learner, we collected the training time, the inference time and the total time for the procedure. To assess the quality of annotation, we computed the Dice coefficient and the Root Mean Square Error (RMSE) between our results and an expert segmentation. All time measurements are expressed in seconds.

Table S2. Different measurements collected when performing data annotation with DIVA on various neuronal microscopy examples. For each example, the file name and the size in pixels are given. We also kept track of the tagging time in VR, the number of positively tagged voxels ("+", in cyan) and negatively tagged voxels ("-", in magenta), and the time to compute features for the training set. For each learner, we collected the training time, the inference time and the total time for the procedure. All time measurements are expressed in seconds.

Table S3. Frame rate of the DIVA software for each image, with a sampling resolution of 73 (a DIVA parameter ranging from 0 to 100).
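The per-example measurements logged in Tables S1 and S2 (training time, inference time, Dice, RMSE) can be reproduced in spirit with a short sketch; the data, features and classifier settings below are stand-ins, not DIVA's actual pipeline:

```python
# Sketch of the metrics collected per example: training time, inference time,
# and agreement (Dice, RMSE) between the output probabilities and an expert mask.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.random((500, 3))                  # stand-in tagged-voxel features
y_train = (X_train[:, 0] > 0.5).astype(int)     # stand-in tags
X_full = rng.random((4000, 3))                  # stand-in for the whole stack
expert = (X_full[:, 0] > 0.5).astype(float)     # stand-in expert segmentation

t0 = time.perf_counter()
rfc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
train_s = time.perf_counter() - t0

t0 = time.perf_counter()
proba = rfc.predict_proba(X_full)[:, 1]         # probability of "structure"
infer_s = time.perf_counter() - t0

rmse = np.sqrt(np.mean((proba - expert) ** 2))
dice_score = (2 * np.logical_and(proba > 0.5, expert > 0.5).sum()
              / ((proba > 0.5).sum() + (expert > 0.5).sum()))
print(f"train {train_s:.3f}s  infer {infer_s:.3f}s  "
      f"Dice {dice_score:.3f}  RMSE {rmse:.3f}")
```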
The Desktop Frame Rate corresponds to the frame rate when using the desktop interface (3D visualization of the volume on a 2D monitor). The VR Frame Rate corresponds to the frame rate when viewing and manipulating the volume in the virtual environment. These values are context-dependent and vary strongly with the user's actions in VR: if the user navigates through the volume with the VR headset or moves too quickly, the frame rate may drop.

VIDEOS
Video S1. Annotation in DIVA on confocal microscopy images of mouse olfactory bulb interneurons. The whole pipeline comprises successive steps: 1) tagging voxels in virtual reality, with an overlay of raw data and tagging data showing positive and negative tags in cyan and magenta, respectively; 2) model training (here, RFC) using the adapted DIVA interface; 3) inference on the whole stack, with an overlay of raw data and output probabilities for the RFC. The transfer function is finally set for appropriate visualization.
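The three steps of the video can be sketched on synthetic data, assuming a single intensity feature and line-shaped tags (a simplification of DIVA's actual feature set):

```python
# One-shot voxel annotation sketch, following the three steps in the caption:
# sparse positive/negative tags -> RFC training -> inference on the full stack.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
volume = rng.normal(0.2, 0.05, (12, 12, 12))
volume[3:9, 3:9, 3:9] += 0.6                  # synthetic bright structure

# Step 1: "tags" — one short line inside the structure, one outside.
pos = [(z, 6, 6) for z in range(3, 9)]        # positive (cyan) tags
neg = [(z, 0, 0) for z in range(3, 9)]        # negative (magenta) tags
X = np.array([volume[p] for p in pos + neg]).reshape(-1, 1)
y = np.array([1] * len(pos) + [0] * len(neg))

# Step 2: train the random forest classifier on the tagged voxels only.
rfc = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y)

# Step 3: inference — probability of "structure" for every voxel in the stack.
proba = rfc.predict_proba(volume.reshape(-1, 1))[:, 1].reshape(volume.shape)
print(proba[6, 6, 6], proba[0, 0, 0])         # high inside, low outside
```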
Video S2. Tagging step in DIVA on a CT-scan of lung tumor from the Medical Segmentation Decathlon (MSD) challenge: lung 001. The transfer function is first set in desktop mode to ensure proper visualization of the tumor. The tagging is then performed in VR, using the clipping plane tool to navigate inside the volume and grasp the contour of the structure of interest. Overlay of raw data in gray and tagging data, with positive and negative tags in cyan and magenta, respectively.