Radiotherapy is safely employed for treating a wide variety of cancers. The radiotherapy workflow includes precise positioning of the patient in the intended treatment position. While trained radiation therapists conduct patient positioning, consultation is occasionally required from other experts, including the radiation oncologist, dosimetrist, or medical physicist. In many circumstances, including rural clinics and developing countries, this expertise is not immediately available, so the patient positioning concerns of the treating therapists may go unaddressed. In this paper, we present a framework that enables remotely located experts to collaborate virtually and be present inside the 3D treatment room when necessary. A multi-3D camera framework was used for acquiring the 3D treatment space. A client–server framework enabled the acquired 3D treatment room to be visualized in real-time. The computational tasks that would normally occur on the client side were offloaded to the server side to allow hardware flexibility on the client side. On the server side, a client-specific real-time stereo rendering of the 3D treatment room was employed using a scalable multi-graphics processing unit (GPU) system. The rendered 3D images were then encoded using GPU-based H.264 encoding for streaming. Results showed that for a stereo image size of 1280 × 960 pixels, experts with high-speed gigabit Ethernet connectivity were able to visualize the treatment space at approximately 81 frames per second. For experts remotely located on a 100 Mbps network, the treatment space was visualized at 8–40 frames per second, depending on the network bandwidth. This work demonstrated the feasibility of remote real-time stereoscopic patient setup visualization, enabling expansion of high-quality radiation therapy into challenging environments.
Radiotherapy is safely employed for treating a wide variety of cancers. The radiotherapy workflow includes positioning the patient in the intended treatment position. Trained radiation therapists conduct this positioning, but consultation is occasionally required from other experts, including the radiation oncologist, dosimetrist, or medical physicist. In many circumstances, including rural clinics and developing countries, this expertise is not immediately available, so the concerns of the treating therapists may go unaddressed. By the year 2015, 15 million new cancer patients are expected in the world each year, of which 10 million will be in developing countries. Ensuring that those patients receive appropriate treatment is a major challenge (
Radiation therapy treatments continue to gain complexity, and modern linear accelerators are essentially robotically controlled, creating the need for more advanced in-room monitoring. Current monitoring is restricted to one or more 2D video cameras positioned in the room and watched by the radiation therapists. There is neither computer-based analysis nor automated monitoring of the video; the cameras are intended as straightforward viewing devices, necessary because the therapists cannot be in the room during treatment and radiation shielding requirements preclude the use of windows.
One of the challenges of modern radiation therapy is the distribution of the specific expertise each clinic requires to treat its patients safely. Medical physicists, for example, are often called to the treatment room to assist the radiation therapists in evaluating a treatment setup. In many clinics, especially rural clinics, there are not enough medical physicists to allow full-time access, leaving the therapists without that expertise. The same problem exists for treatment planning expertise. Recent advances in digital storage and in efficient, reliable communications have enabled improved 2D remote visualization, facilitating decentralized radiotherapy services by allowing remote quality assurance of treatment delivery (
Virtual reality-based visualizations greatly help in developing 3D collaborative environments for radiotherapy applications. Previous efforts have focused on developing visualization frameworks specifically for radiotherapy training (
Our focus in this paper is the real-time acquisition and visualization of the patient treatment setup for radiotherapy. While the expertise required in the treatment room can be provided over the telephone, this is much less effective than having the expert physically present in the room. We hypothesize that 3D visualization of the patient setup and intrafraction motion will enable experts to provide important consultative services to rural and developing-country clinics. In this paper, we present an approach for remotely visualizing the patient setup in 3D, allowing the expert to interact with the local team as though they were in the linear accelerator room with them. The key contribution of this paper is a real-time remote multi-3D camera-based imaging system that provides remote real-time 3D images of the patient positioning setup at greater than the 30 frames per second (FPS) required for effective visualization (
For this work, we selected the Kinect camera as the 3D camera for the proposed 3D monitoring system because it was cost-effective and readily available across the world. From a technical perspective, Kinect cameras provide both color and depth information, which enables the use of a wide range of computer vision and 3D information-processing algorithms for monitoring and visualization purposes. These cameras were sold as part of a computer gaming system, so they were not required to be quantitative. In this application, however, the images from multiple cameras needed to be stitched together to form a seamless and visually accurate representation of the room environment. If the cameras were not accurately calibrated, the surfaces from two cameras would not coincide where their image fields overlapped. This led to the development of a calibration procedure (Santhanam et al., in review).
Camera calibration was performed for each camera to determine the relationship between the raw depth information and the calibrated depth. Images from each camera were first corrected for camera-specific distortion. Image refinement steps then assembled the 3D information (2D color plus depth) from each camera into a single 3D context. Image stitching was performed by computing the transformation matrix that maps each camera's 3D information into the frame of a designated reference camera, as sketched below. The 3D stitched images represented the patient setup on the treatment couch along with the gantry and couch positions.
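As an illustration of this stitching step, the following minimal C++ sketch applies a precomputed per-camera rigid transform to map each camera's point cloud into the reference camera's coordinate frame. The Point3 type, Mat4 layout, and function names are illustrative placeholders, not the actual implementation.

```cpp
// A minimal sketch of the stitching step, assuming a precomputed 4x4 rigid
// transform (rotation + translation) per camera from the calibration
// procedure.
#include <array>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major 4x4 transform

// Map one point from a camera's frame into the reference camera's frame.
Point3 transformPoint(const Mat4& T, const Point3& p) {
    return { T[0][0] * p.x + T[0][1] * p.y + T[0][2] * p.z + T[0][3],
             T[1][0] * p.x + T[1][1] * p.y + T[1][2] * p.z + T[1][3],
             T[2][0] * p.x + T[2][1] * p.y + T[2][2] * p.z + T[2][3] };
}

// Stitch: concatenate every camera's cloud, expressed in the reference frame.
std::vector<Point3> stitch(const std::vector<std::vector<Point3>>& clouds,
                           const std::vector<Mat4>& toReference) {
    std::vector<Point3> merged;
    for (std::size_t cam = 0; cam < clouds.size(); ++cam)
        for (const Point3& p : clouds[cam])
            merged.push_back(transformPoint(toReference[cam], p));
    return merged;
}
```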
A client and server system was used to correct, stitch, and transport the images in real-time to the observer. The term client refers to the user interface associated with each remotely located expert. The term server refers to the software interface that controls the multi-3D Kinect camera system discussed in the “Multi-3D Camera Setup” section.
For our proposed visualization system, we offloaded the rendering task to the server to allow the 3D content to be visualized at 30 FPS, satisfying real-time requirements (
The server interface consisted of three pipelined procedural threads. The first procedural thread created the connection between the client and the server and maintained the frame rate requested by the client. The second procedural thread acquired the 3D treatment room space as discussed in the “Multi-3D Camera Setup” section, and the third procedural thread communicated with the client, sending the 2D stereoscopic projections of the treatment room space. A sketch of this pipeline follows.
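The sketch below shows one way such a three-thread pipeline could be structured around a shared frame queue; Frame, acquireFrame(), and sendToClient() are stand-ins for the actual camera capture and encode/transmit components, not the paper's implementation.

```cpp
// Minimal sketch of the three-thread server pipeline around a shared queue.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Frame { /* stitched 3D data for one time step */ };

std::queue<Frame> frameQueue;
std::mutex m;
std::condition_variable cv;
bool running = true;                      // guarded by m

Frame acquireFrame() { return Frame{}; }  // placeholder: multi-camera capture
void sendToClient(const Frame&) {}        // placeholder: render/encode/send

void acquisitionThread() {                // thread 2: capture treatment space
    for (;;) {
        Frame f = acquireFrame();
        std::lock_guard<std::mutex> lk(m);
        if (!running) return;
        frameQueue.push(f);
        cv.notify_one();
    }
}

void streamingThread() {                  // thread 3: deliver frames to client
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !frameQueue.empty() || !running; });
        if (frameQueue.empty()) return;   // shutting down
        Frame f = frameQueue.front();
        frameQueue.pop();
        lk.unlock();
        sendToClient(f);                  // paced to the client's frame rate
    }
}

int main() {                              // thread 1: connection + pacing
    std::thread t2(acquisitionThread), t3(streamingThread);
    std::this_thread::sleep_for(std::chrono::seconds(1));  // serve briefly
    { std::lock_guard<std::mutex> lk(m); running = false; }
    cv.notify_all();
    t2.join();
    t3.join();
}
```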
The steps involved in the render-to-texture feature are as follows. First, the 3D treatment room space, in the form of a vertex list, was transferred to one of the GPUs' memory. We then assigned a texture memory space as the location where the final rendered image would be placed. Next, we created an OpenGL pipeline that processed the 3D vertex list according to the per-eye and clipping boundary specifications provided by the client. Finally, using the GPU's vector functionality, we projected each 3D location in the vertex list through the OpenGL pipeline onto the pre-determined 2D texture. The 2D texture was then copied from the GPU back to the server's memory (
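The following is a hedged sketch of such a per-eye render-to-texture pass using an off-screen framebuffer object in OpenGL 3+. It assumes an existing OpenGL context and extension loader; drawTreatmentRoom() is a hypothetical placeholder for rendering the vertex list with the client's eye and clipping settings.

```cpp
// Sketch of one per-eye render-to-texture pass (OpenGL 3+). Assumes a valid
// OpenGL context and function loader (e.g., GLEW or glad) are already set up.
#include <cstddef>
#include <vector>
// #include <GL/glew.h>  // or another loader providing the GL 3+ entry points

std::vector<unsigned char> renderEyeToTexture(int width, int height) {
    GLuint fbo = 0, tex = 0;
    glGenFramebuffers(1, &fbo);                        // off-screen target
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);  // texture render target
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    glViewport(0, 0, width, height);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // drawTreatmentRoom();                // project the vertex list (placeholder)
    std::vector<unsigned char> pixels(
        static_cast<std::size_t>(width) * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE,
                 pixels.data());                       // copy GPU -> host
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteTextures(1, &tex);
    glDeleteFramebuffers(1, &fbo);
    return pixels;                                     // handed to the encoder
}
```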
The 2D images were losslessly encoded in the H.264 stream format, which reduced the size of the image data transmitted over the network. The encoded 2D images for each eye were then transmitted to the client interface, where they were decoded and visualized using Nvidia's 3D Vision interface.
The computational complexity of H.264 encoding would require a few seconds per frame if conducted on a traditional central processing unit (CPU)-based architecture and so would degrade the real-time nature of the remote visualization (
For the DCT kernel, each component (
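As a concrete reference for this transform stage, the sketch below gives a minimal CPU version of the 8×8 forward DCT that a GPU kernel of this kind parallelizes, typically one thread per output coefficient. This floating-point form is illustrative only; H.264 itself specifies an integer approximation of the DCT.

```cpp
// CPU reference for the 8x8 forward DCT (type-II, orthonormal scaling). A GPU
// kernel parallelizes this per block, e.g., one thread per (u, v) coefficient.
// Illustrative only: H.264 proper uses an integer transform approximation.
#include <cmath>

void dct8x8(const float in[8][8], float out[8][8]) {
    const float pi = 3.14159265358979f;
    for (int u = 0; u < 8; ++u) {
        for (int v = 0; v < 8; ++v) {
            float sum = 0.0f;
            for (int x = 0; x < 8; ++x)
                for (int y = 0; y < 8; ++y)
                    sum += in[x][y]
                         * std::cos((2 * x + 1) * u * pi / 16.0f)
                         * std::cos((2 * y + 1) * v * pi / 16.0f);
            const float au = (u == 0) ? std::sqrt(0.125f) : 0.5f;
            const float av = (v == 0) ? std::sqrt(0.125f) : 0.5f;
            out[u][v] = au * av * sum;   // quantization follows this step
        }
    }
}
```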
The last step of the H.264 compression was Huffman encoding, which exploited the many zeros in the high-frequency coefficients resulting from DCT and quantization (
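To illustrate why those zeros matter, the sketch below scans a quantized 8×8 block in low-to-high-frequency order and emits (zero-run, value) pairs; an actual H.264 coder entropy-codes such pairs with Huffman-style (e.g., CAVLC) tables. The anti-diagonal scan here is a simple stand-in for the exact zigzag pattern.

```cpp
// Run-length front end for the entropy coder: scan a quantized 8x8 block from
// low to high frequency so trailing zeros cluster, then emit (zero-run, value)
// pairs. Anti-diagonal order approximates the true zigzag pattern.
#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<int, int16_t>> runLengthScan(const int16_t block[8][8]) {
    std::vector<std::pair<int, int16_t>> pairs;
    int run = 0;
    for (int s = 0; s <= 14; ++s) {          // anti-diagonals: x + y = s
        for (int x = 0; x < 8; ++x) {
            const int y = s - x;
            if (y < 0 || y > 7) continue;
            const int16_t c = block[x][y];
            if (c == 0) {
                ++run;                       // extend the current zero run
            } else {
                pairs.emplace_back(run, c);  // (zeros before c, value of c)
                run = 0;
            }
        }
    }
    return pairs;  // trailing zeros are omitted (signaled as end-of-block)
}
```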
Visualization using data streams is limited by frame-to-frame jitter that occurs because of non-uniform network data transfer rates (
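One common remedy, sketched below under assumed parameters, is a client-side playout buffer that releases frames on a fixed display schedule: a few frames of constant latency are traded for smooth playback, and the buffer is drained when it grows too deep. The class name, TARGET_DEPTH value, and DecodedFrame type are illustrative assumptions, not the paper's mechanism.

```cpp
// Sketch of a client-side playout (jitter) buffer: frames queue as they
// arrive and are released on a fixed display tick (e.g., every 33 ms for
// 30 FPS), trading a small constant latency for smooth playback.
#include <cstddef>
#include <deque>

struct DecodedFrame { /* decoded stereo image pair */ };

class JitterBuffer {
    std::deque<DecodedFrame> buf;
    static constexpr std::size_t TARGET_DEPTH = 3;  // ~100 ms cushion @ 30 FPS
public:
    void onArrival(const DecodedFrame& f) { buf.push_back(f); }

    // Called once per display tick; returns false on underrun, in which case
    // the client simply redisplays the previous frame.
    bool nextFrame(DecodedFrame& out) {
        if (buf.empty()) return false;
        out = buf.front();
        buf.pop_front();
        if (buf.size() > 2 * TARGET_DEPTH)  // arrived in a burst: re-sync by
            buf.pop_front();                // dropping one stale frame
        return true;
    }
};
```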
System configuration.
Component | Specification
---|---
3D camera | Microsoft Kinect (6 cameras)
Server | Intel Core i7 3.6 GHz, 8 GB RAM
Server GPU | Nvidia GTX 680M (2)
Network interface | Ethernet
Client | Intel Core i7 3.6 GHz, 8 GB RAM
3D display | ViewSonic 120 Hz LED display
3D wearable accessory | Nvidia 3D Vision
Remote visualization characteristics using a gigabit Ethernet connection.
RGB image size | 1280 × 960 pixels | 640 × 480 pixels
---|---|---
Stereo H.264 frame size | 110 KB | 28.5 KB
Stereo H.264 encoding time | 14 ms | 4 ms
Stereo image generation time | 30 ms | 30 ms
Effective streaming bandwidth | 72 Mbps | 72 Mbps
Frames transferred over network | ~81 FPS | ~320 FPS
Remote visualization characteristics using a 100 Mbps connection with a frame rate of 30 FPS.
RGB image size | 1280 × 960 pixels | 640 × 480 pixels
---|---|---
Stereo H.264 frame size | 110 KB | 28.5 KB
Stereo H.264 encoding time | 14 ms | 4 ms
Effective streaming bandwidth | 8 Mbps | 8 Mbps
Frames transferred over network | ~8 FPS | ~40 FPS
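As a rough consistency check derived from the tabulated values: at ~81 FPS, 110 KB stereo frames correspond to roughly 81 × 110 KB × 8 bits ≈ 72 Mbps, and at ~8 FPS the same frames correspond to roughly 7–8 Mbps, matching the effective streaming bandwidths reported above; the 640 × 480 columns check out similarly (e.g., 320 × 28.5 KB × 8 bits ≈ 73 Mbps).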
A framework for remote 3D visualization is presented in this paper. A multi-3D camera framework is used for acquiring the 3D treatment space. A client–server framework enables the 3D treatment space to be visualized by remotely located experts in real-time. The visualization tasks on the client side are offloaded to the server side to enable flexibility on the client side. A scalable multi-GPU system that renders the 3D treatment space in stereo and in real-time is employed on the server side. The rendered 3D images are then encoded using GPU-based H.264 encoding for streaming. Results showed that experts within a clinical facility with high-speed gigabit Ethernet connectivity are able to visualize the treatment space at 1280 × 960 pixel resolution at approximately 81 frames per second. For remotely located experts, the treatment space visualization can be conducted at 40 FPS with a resolution of 640 × 480 pixels.
Two technical limitations were observed in our client–server setup. The network bandwidth did not form a bottleneck for experts located on the same high-speed network who visualized frame sizes of up to 1920 × 1080 pixels; the 3D treatment space acquisition, which occurred at a rate of 30 FPS, formed the bottleneck in this case. The other key tasks, such as 3D stereo rendering and H.264 encoding, occurred at a rate faster than the treatment space acquisition rate. However, for frame sizes greater than 2550 × 1940 pixels, the H.264 encoding took more time than the camera acquisition. Thus, for larger frame sizes, offloading client tasks to the server reduced the overall speedup. Future work will focus on improving the efficiency of the H.264 encoding algorithm for stereoscopic video sequences.
The second limitation was that the number of clients a server could support depended on the number of GPUs available to provide the required fast 3D rendering and encoding, because each GPU was dedicated to handling one set of client tasks based on the requested frame size. Future work will focus on using a GPU cluster coupled with load-balancing algorithms that enable efficient GPU usage.
Jitter in the 3D visualization occurred for clients that used available-bit-rate network connections. The jitter avoidance mechanism discussed in this paper removed such artifacts, but its effectiveness was limited by the network behavior. Increases in network delays and packet loss rates led to random decreases in the achievable frame rate.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.