Bringing Vision-Based Measurements into our Daily Life: A Grand Challenge for Computer Vision Systems

Scharcanski, Jacob

doi:10.3389/fict.2016.00003

PERSPECTIVE article

Front. ICT, 09 March 2016

Sec. Robot Vision and Artificial Perception

Volume 3 - 2016 | https://doi.org/10.3389/fict.2016.00003

Jacob Scharcanski

Programa de Pós-Graduação em Engenharia Elétrica, Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

Bringing computer vision into our daily life has challenged researchers in industry and academia over the past decades. However, the continuous development of cameras and computing systems has turned computer vision-based measurements into a viable option, allowing new solutions to known problems. In this context, computer vision is a generic tool that can be used to measure and monitor phenomena in a wide range of fields. The idea of using vision-based measurements is appealing, since these measurements can be fast to obtain and easy for humans to handle. On the other hand, these vision-based measurements need to be at least as precise and accurate as those obtained by more traditional measuring systems. This article discusses the perspectives and challenges of introducing vision-based measurements into our daily life.

1. Introduction

Bringing computer vision into our daily life has challenged researchers in industry (Google, 2015) and in academia (Shirmohammadi and Ferrero, 2014; Fernandez-Caballero, 2015) over the past decades. The continuous development of cameras and computing devices, which has improved the ability to capture, interpret, and understand images, has turned computer vision-based instruments into a viable option (Google, 2016), providing new solutions to known problems. In this context, computer vision becomes a generic tool that can be used to measure and monitor phenomena in a wide range of fields.

However, even simple measurements made on a daily basis, such as measuring the volume of a room or finding out whether a piece of furniture fits in it, involve complex computational procedures (e.g., using visual–inertial odometry to provide an accurate three-dimensional reference system for indoor scenes, or motion tracking to capture three-dimensional scene representations from sequences of two-dimensional images of uncontrolled environments) (Google, 2015). Despite the computational complexity involved, new computer vision developments can potentially help tackle such challenging tasks at manageable costs (Google, 2016). For example, computer vision systems could help improve the quality of images taken in the presence of reflecting or occluding elements, such as windows and fences (Xue et al., 2015); compensate for the viewpoint from which an image was captured in three-dimensional visual data processing (Satkin and Hebert, 2013); localize the user in the surrounding environment, so that indoor and outdoor scenes can be represented in referenced spaces (Bettadapura et al., 2015); or automatically capture the relationships among visual elements in real-world scenarios (Choi et al., 2015).

The idea of using vision-based measurements in daily life is appealing, since these measurements can be obtained quickly and easily by humans. On the other hand, these vision-based measurements need to be at least as precise and accurate as those produced by more traditional measuring systems. To avoid any confusion with colloquial terms, accuracy evaluates how close the measurement of a quantity is to its true value [Bureau International des Poids et Mesures (BIPM), 2008], and precision reflects the degree to which repeated measurements show the same results under the same conditions. For example, a vision-based measurement system that produces a systematic error could be precise (i.e., consistent across repeated measurements) and yet inaccurate in practice. An important challenge is to develop vision-based measurement systems that are both accurate and precise under the conditions found in practice.
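The distinction between accuracy and precision can be illustrated with a minimal sketch. It assumes that a reference ("true") value is available and uses two common summary statistics that are our choice for illustration: the mean error (bias) as an indicator of accuracy and the sample standard deviation as an indicator of precision. The function name and the readings are hypothetical.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Summarize repeated measurements of a single quantity.

    Returns (bias, spread):
      bias   = mean measurement minus the reference value
               (smaller magnitude means more accurate)
      spread = sample standard deviation of the measurements
               (smaller value means more precise)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical repeated readings (in meters) of a wall whose
# reference length is assumed to be 4.00 m.
readings = [4.21, 4.19, 4.20, 4.22, 4.18]
bias, spread = accuracy_and_precision(readings, true_value=4.00)
print(f"bias = {bias:.3f} m, spread = {spread:.3f} m")
```

In this made-up example the readings agree closely with one another (small spread) but are all offset by about 0.2 m from the reference value, which is the precise-but-inaccurate situation caused by a systematic error.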

2. Reliable Measurements via Computer Vision

Several technological developments rely on the acquisition and processing of images and videos to extract and enhance information (Scharcanski et al., 1993; Cavalcanti et al., 2010), in areas ranging from engineering (Scharcanski and Dodson, 1966, 2000; Scharcanski, 2005) and medicine (Wong et al., 2011) to biometric authentication (Behaine and Scharcanski, 2012; Verdoolaege et al., 2014). In particular, the growing processing capability and the improving camera quality of consumer electronic devices (Google, 2016), such as smartphones and laptops, make it possible to obtain a wide range of vision-based measurements in a variety of areas, such as monitoring food intake, assisting the visually impaired, biometric user authentication, and several other medical and engineering applications.

Monitoring and controlling nutritional intake in daily life can be challenging for chronically ill people and for healthy individuals as well (e.g., monitoring what we eat in restaurants). Computer vision systems can be trained to recognize the contents of meals, estimate the food portions, and predict their nutritional contents, such as calories (Myers et al., 2015). Furthermore, computer vision systems provide adequate tools to capture visual cues during food preparation and to help control nutritional intake (Malmaud et al., 2015).

Computer vision-based measurements have untapped potential for helping persons with blindness or visual impairments to interpret and explore the visual world (Manduchi and Coughlan, 2012). This type of assistive technology has several roles to play in our society. For example, the properties of the immediate surroundings could be captured by computer vision systems, helping users to measure distances, avoid obstacles, handle apertures, such as doors, and maintain a desired trajectory while walking. By measuring and identifying the position of a person within his/her environment, a route to a destination could be planned and tracked. Detecting printed materials could make textual information accessible to the visually impaired by using computer vision-based translators (e.g., assisting a blind person while shopping at a supermarket). Mobile computer vision devices are a promising way to deploy these assistive technologies and make them viable for real-life applications (Manduchi and Coughlan, 2012; Google, 2016).

The use of biometrics has increased, especially in modalities such as face recognition, fingerprint recognition, iris recognition, hand gesture recognition, and palmprint recognition. It is desirable for vision systems to perform biometric recognition covertly, based on contactless data acquisition, and to provide reliable recognition of human beings (Scharcanski et al., 2014). This is especially true for applications involving uncontrolled scenarios (e.g., airports, public areas, etc.). Therefore, the development of biometric recognition systems for unconstrained conditions is of current interest, and advances have recently been made in aspects such as dealing with varying illumination sources, variations in poses and distances, and blurred or low-quality data during image acquisition (Scharcanski et al., 2014). However, advances in other areas are also important to allow vision systems to operate autonomously and covertly and to perform reliable recognition of human beings, such as (a) less controlled/covert data acquisition frameworks, (b) segmentation of poor quality biometric data, (c) biometric data quality assessment, (d) contactless biometric recognition strategies, (e) biometric recognition robustness to data resolution, illumination, distance, pose, motion, and occlusions, and (f) multimodal biometrics and fusion at different levels.

Vision-based measurements relying on cameras available via consumer electronics devices, such as smartphones and laptops, can have a great impact in areas, such as personalized health and medicine. For example, assistive vision systems based on RGB-D cameras (e.g., Kinect) can improve the recovery times of physiotherapy patients by increasing their engagement in the rehabilitation sessions [e.g., through the integration with 3D serious games (Duarte et al., 2014)], while providing feedback about their progress in the rehabilitation program. Also, image-based measurements can play a decisive role in early diagnosis of melanoma, which is one of the most rapidly increasing cancers in the world, with an incidence of 76,690 in the United States in 2013 (Scharcanski and Celebi, 2014). Early diagnosis is very important, since melanoma can be cured with a simple excision if detected early. Recently, macroscopy (i.e., skin lesion imaging with standard cameras) has proved valuable in evaluating morphological structures in pigmented lesions (Cavalcanti et al., 2010, 2013; Wong et al., 2011; Cavalcanti and Scharcanski, 2013; Scharcanski and Celebi, 2014). Although computerized techniques are not intended to provide a definitive diagnosis, these imaging technologies are accessible and can be useful in early melanoma detection (especially for patients with multiple atypical nevi) and can serve as an adjunct to physicians.

Computer vision systems can help overcome common limitations of the human visual system in daily tasks (e.g., in repetitive tasks that are tiresome for humans, in tasks that require fast responses, or in tasks that require visual discrimination beyond the capacity of the human visual system). Therefore, vision-based measurements can provide an alternative to the measurement methods traditionally used in different areas, such as medicine, traffic monitoring, and industry. For example, infrared image data captured by non-mydriatic digital retinography systems are used to diagnose diabetic macular edema (DME) and to guide its laser treatment. However, sequences of infrared eye fundus images are inherently noisy and show frequent background illumination changes, which increases the risk of DME laser treatments for patients. Highly accurate vision-based measurements (with a precision on the order of micrometers) can reduce this risk, for example, by detecting the DME lesion spots automatically, by detecting important anatomical retinal structures (e.g., the optic disk or the macula), or by detecting even small retinal motions quickly enough to compensate for involuntary retinal motions of the patient before the main laser hits healthy regions of the retina, increasing the safety of retinal laser treatment systems for DME (Scharcanski et al., 2013; Welfer et al., 2013).

Detecting and counting vehicles in urban traffic videos can be a tedious task and may require fast responses from human operators. Vision systems can perform such measurements effortlessly, with accuracy and precision comparable to conventional traffic monitoring approaches (e.g., by detecting and counting vehicles at user-defined virtual loops in video sequences) (Barcellos et al., 2015). Partial and total occlusions can cause vehicle tracking and counting errors, posing a challenge to human controllers. Nevertheless, vision-based measurement systems can track multiple vehicles simultaneously, detect when a total occlusion of a vehicle starts and ends, and resume tracking after the vehicle reappears, minimizing such difficulties (Scharcanski et al., 2011). Furthermore, critical visual tasks, such as pedestrian detection, are increasingly important in vehicular technology (e.g., autonomous driving). The integration of deep learning with computer vision tools has untapped potential for approaching such visual tasks, where low miss rates and real-time performance are very important (Angelova et al., 2015).
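The idea of counting vehicles at a virtual loop can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the method of Barcellos et al. (2015): we assume that an upstream foreground detector already provides, for each frame, the fraction of loop pixels covered by moving objects, and we count a vehicle each time the loop changes from free to occupied. The function name, the thresholds, and the sample values are hypothetical.

```python
def count_vehicles(occupancy, on_threshold=0.5, off_threshold=0.2):
    """Count vehicles crossing a virtual loop.

    occupancy: per-frame fraction (0..1) of loop pixels flagged as
               foreground by some upstream detector (assumed given).
    A vehicle is counted when the loop switches from free to occupied.
    Two thresholds (hysteresis) keep a brief dip in occupancy from
    being counted as a second vehicle.
    """
    count = 0
    occupied = False
    for fraction in occupancy:
        if not occupied and fraction >= on_threshold:
            occupied = True
            count += 1
        elif occupied and fraction <= off_threshold:
            occupied = False
    return count

# Hypothetical occupancy values for 11 consecutive frames.
frames = [0.0, 0.1, 0.6, 0.9, 0.4, 0.7, 0.1, 0.0, 0.8, 0.3, 0.05]
vehicles = count_vehicles(frames)
print(vehicles)
```

A per-loop counter of this kind cannot separate two vehicles that occlude each other while crossing the loop, which is one reason why occlusion-aware tracking is needed in practice.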

Estimating material properties plays a central role in vision, robotics, and engineering, especially in tasks that require going beyond the limits of human visual perception (e.g., estimating material properties from videos by detecting material vibration modes that are visually imperceptible) (Davis et al., 2015). Currently, large amounts of visual data are acquired, providing important information about the materials produced by continuous manufacturing processes and about the manufacturing processes themselves. However, it is often difficult to measure objectively the similarity among such industrial stochastic images, or to discriminate between texture images of stochastic materials with distinct properties. The degree of discrimination between materials required by such industrial processes often goes beyond the limits of human visual perception. Vision-based measurements, in contrast, can be designed to analyze such stochastic processes continuously and effortlessly and to perform tasks that would be challenging for humans (e.g., real-time analysis of industrial stochastic textures in the non-woven textiles and paper industries) (Scharcanski and Dodson, 2000; Scharcanski, 2007).
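As a toy illustration of how a vibration that is too small to be seen can still be measured from video, the sketch below finds the dominant temporal frequency of a per-frame signal using a plain discrete Fourier transform. It is not the method of Davis et al. (2015); the per-frame signal (e.g., the mean intensity of a small image patch), the frame rate, and the synthetic 30 Hz vibration are all assumptions made for the example.

```python
import cmath
import math

def dominant_frequency(signal, frame_rate):
    """Return the frequency (Hz) of the strongest non-DC component.

    signal     : one scalar value per video frame (assumed given)
    frame_rate : frames per second of the video
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):         # positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * frame_rate / n

# Synthetic example: a tiny 30 Hz oscillation around gray level 128,
# sampled for one second at an assumed frame rate of 240 frames/s.
frame_rate = 240.0
signal = [128.0 + 0.01 * math.sin(2 * math.pi * 30.0 * t / frame_rate)
          for t in range(240)]
peak_hz = dominant_frequency(signal, frame_rate)
print(peak_hz)
```

The oscillation amplitude here is a hundredth of a gray level, far below what a viewer would notice, yet the spectral peak is still found; real videos would additionally require noise handling and a faster transform.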

3. Conclusion

Computer vision has the potential to make measurement systems easier for humans to use and to allow the development of new measurement tools in engineering, medicine, and other areas. Vision-based measurements can be made even in conditions that would be uncomfortable for humans, for example, in tasks that are repetitive, that require fast responses, or that require visual discrimination beyond the capacity of the human visual system. The main challenges in this area involve integrating imaging devices and other sensing devices (e.g., odometry devices) with computer vision tools, aiming to develop measurement systems that are accurate, precise, and easy to use in daily life.

Author Contributions

The author confirms being the sole contributor of this work and approved it for publication.

Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Angelova, A., Krizhevsky, A., and Vanhoucke, V. (2015). “Pedestrian detection with a large-field-of-view deep network,” in IEEE International Conference on Robotics and Automation (Proceedings of ICRA 2015), May 26–30. Seattle.