September 6, 2005

One of the most remarkable technology revolutions of the past decade is the rapid growth in digital imaging. Digital photography, videography and image manipulation, driven by widely available and inexpensive digital still and video cameras, are now commonplace. Such cameras are routinely incorporated into many real-world systems, ranging from traffic monitoring and industrial inspection to Bollywood-style entertainment and civilian and military surveillance.

Applies to: Everyone
USP: Recognition and processing of digital visual objects and its practical applications

In this second part on computer vision technologies, we’ll look at how these technologies are used today and what potential they hold for the future. Vision technology has entered many applications in the consumer, commercial, civilian and military arenas. However, the successes have come from unexpected directions rather than from where the experts predicted. For example, two decades ago, the main applications were expected to be robot vision, human-computer interaction and autonomous driving. The idea of a ‘seeing’ computer with human-like visual perception (and artificial intelligence), able to recognize people, objects and scenes as humans do, was popular. These goals are yet to be realized. Instead, real-life deployment has come in a variety of interesting and useful, but somewhat more modest, applications.

We will focus on four families of such applications: industrial machine vision, media and entertainment, surveillance, and traffic and driving. (We’ll omit medical imaging, since the field is vast enough to merit a separate article.)

Industrial machine vision is one of the oldest areas of practical application of computer vision technologies. Typical applications include the verification of components for shape and dimensional accuracy, fault detection, placement of parts within a larger assembly, and other applications related to automation in manufacturing. The technologies developed in this area tend to be customer or application specific, although they are derived from the rich mainstream research literature in computer vision. An excellent survey of computer vision companies working in these (and other) areas can be found at

Example of 3D scene reconstruction. (left) one frame of a video sequence; (right) ‘range’ image, which depicts the 3D layout of the scene. The distance of the objects 

Vision technologies have long been applied in media and entertainment. Video motion estimation and optical-flow computation have been successfully applied to video compression, frame-rate conversion of television content (for example, from the 30 frames/sec of an NTSC signal to the 25 frames/sec of a PAL signal) and slow-motion video, which is created by inserting additional frames that do not violate the visual motion in the scene. An exciting recent development in this area is the use of 3D vision technology in movies and television shows. For example, ‘match move’, a well-known technique routinely used in studios like ILM (Industrial Light and Magic), involves the synthetic placement of 3D objects and people in real scenes captured on film or video.
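To make the frame-rate conversion mentioned above concrete, here is a minimal sketch in Python of the timing side of the problem: for each output frame at 25 frames/sec, which two source frames at 30 frames/sec surround it, and how far between them it falls. The function names `conversion_plan` and `blend` are hypothetical, chosen only for illustration. The sketch also substitutes a plain linear cross-fade for the motion-compensated interpolation a real converter would use, where pixels are shifted along estimated motion vectors.

```python
import math
from fractions import Fraction


def conversion_plan(src_fps, dst_fps, n_out):
    """For each output frame, return (earlier source index, weight of the next source frame)."""
    # Number of source frames that elapse per output frame (6/5 for 30 -> 25).
    step = Fraction(src_fps, dst_fps)
    plan = []
    for k in range(n_out):
        pos = k * step          # exact position on the source timeline
        i = math.floor(pos)     # earlier of the two surrounding source frames
        w = pos - i             # fraction of the way towards frame i + 1
        plan.append((i, w))
    return plan


def blend(frame_a, frame_b, w):
    """Linear cross-fade between two frames given as flat lists of pixel values.

    Assumption: this stands in for motion-compensated interpolation,
    which would move pixels along motion vectors instead of fading.
    """
    return [(1 - w) * a + w * b for a, b in zip(frame_a, frame_b)]


if __name__ == "__main__":
    for k, (i, w) in enumerate(conversion_plan(30, 25, 6)):
        print(f"output frame {k}: between source frames {i} and {i + 1}, weight {w}")
```

Every fifth output frame lines up exactly with a source frame (weight 0); the frames in between have to be synthesized, which is where motion estimation earns its keep. Exact fractions are used so that the positions do not drift through floating-point rounding over long sequences.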

The goal is to create new views of the scene with the inserted objects, so that the objects appear to move as if they were part of the real scene. Computer vision technologies are used to model the 3D scene and recover the camera movement from the images and videos captured during production. Post-production tools are then used to insert the objects and create a new video sequence. A number of companies, such as Realviz in France and 2D3 in the UK, directly apply research technologies to the film industry.

Vision technologies are also reaching the home. A variety of photo-editing and manipulation products are beginning to offer capabilities based on vision technologies. These include image stitching to create panoramic photographs, removal of objects from photographs, and precise cutting of elements from one image for seamless pasting onto others. The accompanying picture illustrates the ‘smart erase’ object removal capability developed at Microsoft’s research lab in Cambridge, UK, and made available in Microsoft Digital Image 2006. However, progress in meeting consumers’ basic need to efficiently organize, index, search and summarize personal photo and (especially) video collections has been modest. While a few useful features exist, such as the summarizer in Microsoft’s Movie Maker, automated tools that organize collections according to their content, or ‘clean up’ collections by removing unwanted junk, are yet to be developed.

Vision-based autonomous driving research dates back to the early days of the field. It has natural military applications and so continues to receive considerable support and funding, though the levels have decreased from 15 or 20 years ago. A fully automated driving system based on vision remains a significant challenge. Research in this area is supported and driven by the US Defense Advanced Research Projects Agency (DARPA) (see http://www.darpa.mil/grandchallenge/), and it will be some time before this work appears in the commercial arena. However, more limited but extremely useful applications of vision technologies are already under serious consideration in the automotive industry. The Israeli company Mobileye (see http://www.mobileye.com/) has developed a number of applications, such as Adaptive Cruise Control, Lane Departure Warning, Lane Change Assist and Blind Spot Detection, to enhance safety while driving.

An area that has been quietly achieving success is vision for security and surveillance. Although recognition of people and faces under general conditions remains a challenge, access control using vision technologies is becoming a reality: person identification based on visual recognition of the iris is under evaluation in a number of places.

Example of ‘smart erase’ object removal technology. (left) original input image; (right) result of removing the parent using the smart-erase tool

But the more advanced applications have to do with airborne and wide-area video surveillance. For example, companies working with the US Government have developed a number of solutions for exploiting video taken from manned and unmanned aircraft, such as real-time stabilization of video from cameras that vibrate and shake heavily, georegistration of airborne video, and target detection and tracking.

The key lesson is not to be fixated on one set of ideas or a singular ‘Grand vision’ but to remain nimble and opportunistic and look for a wide variety of applications. Perhaps we will realize the vision of an autonomous ‘seeing’ system some day; in the meantime, however, we believe the field will solve a number of useful real-world problems and develop a significant market that will sustain research and innovation.

Dr P Anandan, MD, Microsoft Research India
