
Computer Vision: An Insight

PCQ Bureau

In the first part of this series, we describe the scientific foundations of research in this fascinating area. The next article in the series will assess technology trends and the major real-world applications. The scientific origins of computational vision research lie in psychology, biology, physics and mathematics. Just as the human mind has been a source of inspiration for computation itself, the desire to understand and model the visual perception of humans and other biological organisms drove some of the early research in computer vision. Since vision is a gateway to the human mind, it is natural to expect that understanding vision may help us understand the mind. And just as the camera is in many ways modeled on the eye, perhaps the architecture and software of computer vision systems could be modeled on the structures of the brain and the functional processes of the mind.

Direct Hit!

Applies to: Research

USP: Computer vision is an emerging area in technology research

Primary Link: www.vision1.com

Google keywords: computer vision

We might regard the 19th-century German scientist Hermann von Helmholtz, who studied what he termed 'physiological optics', as the father of the modern thinking that brought cognitivist ideas into the study of vision. His theory of trichromatic color vision informs both modern camera design and much of vision research. Another example is the work of the pioneering American psychologist J J Gibson, which led to the concept of 'optical flow', today the core representation of visual motion derived from video sequences. Similarly, the Laplacian of Gaussian, an early image representation commonly used in vision and image processing, is inspired by the 'center-surround' operators found in the human visual system.
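
To make the center-surround idea concrete, here is a minimal sketch of a sampled Laplacian-of-Gaussian kernel. It is an illustration under stated assumptions, not a reference implementation: the names `log_kernel` and `response_at`, the 9x9 size and sigma = 1.4 are hypothetical choices for this example. It uses the standard continuous LoG formula, whose center has the opposite sign to its surrounding ring, and subtracts the mean so that the truncated kernel ignores uniform brightness, much as a center-surround cell responds to local contrast rather than to overall light level.

```python
import math


def log_kernel(size, sigma):
    """Sample a size x size Laplacian-of-Gaussian kernel (size should be odd).

    Uses LoG(x, y) = -1/(pi*sigma^4) * (1 - r2) * exp(-r2), where
    r2 = (x^2 + y^2) / (2*sigma^2). With this sign convention the centre is
    negative and the surrounding ring is positive. The mean is subtracted so
    the truncated kernel sums to zero and ignores uniform brightness.
    """
    half = size // 2
    s2 = sigma * sigma
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = (x * x + y * y) / (2 * s2)
            row.append(-1.0 / (math.pi * s2 * s2) * (1 - r2) * math.exp(-r2))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def response_at(image, kernel, cy, cx):
    """Weighted sum of the image neighbourhood around (cy, cx) under the kernel."""
    half = len(kernel) // 2
    return sum(
        kernel[dy + half][dx + half] * image[cy + dy][cx + dx]
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    )


k = log_kernel(9, 1.4)                 # hypothetical size and sigma
flat = [[5.0] * 9 for _ in range(9)]   # uniform grey patch
spot = [[0.0] * 9 for _ in range(9)]   # dark patch with one bright centre pixel
spot[4][4] = 1.0

print(k[4][4] < 0, k[4][0] > 0)                # True True: centre vs surround
print(abs(response_at(flat, k, 4, 4)) < 1e-9)  # True: no response to flat input
print(response_at(spot, k, 4, 4) < 0)          # True: strong negative response to a spot
```

The sign of the response is only a convention: negating the kernel gives an "on-center" detector that responds positively to a bright spot.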


Visual Field

Our understanding of the functional anatomy of the visual processing areas of the human brain has advanced dramatically over time


While the study of human vision led to fruitful findings for computer vision, it soon became clear that understanding the visual system is no less complex than understanding the mind itself, a task that has fascinated philosophers and scientists since the dawn of human civilization. As digital storage of imagery became common during the eighties, there was also growing expectation and pressure on researchers to develop ideas with practical impact in the near term. Starting in the mid-eighties, a new trend emerged in computer vision: physics-based vision. (The origins of this view also date back to Helmholtz, but it did not come to dominate computer vision until the eighties.) This approach carefully analyzes the physics of image formation: how light from ambient sources reflects off and interacts with surfaces in the world, enters the lens system of the eye and is then captured by the cells of the retina. Equally relevant is the branch of mathematics called projective geometry, which describes what happens when 3-D shapes are projected onto 2-D planes. At a simplistic level, one can regard the goal of computer vision as 'inverting' this process, that is, recovering the surfaces and light sources from a set of images. This approach underlies many contemporary techniques that recover 3-D models of scenes from images.
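
The forward half of this process, projecting 3-D points onto a 2-D image plane, can be sketched with the textbook pinhole camera model. This is a minimal sketch under simplifying assumptions: an ideal pinhole with focal length `f` and principal point (`cx`, `cy`), no lens distortion, and points already expressed in the camera's own coordinate frame. The function name `project` and its parameters are illustrative, not taken from any particular library.

```python
def project(point, f=1.0, cx=0.0, cy=0.0):
    """Pinhole projection of a camera-frame 3-D point (X, Y, Z) to image coordinates (u, v).

    In homogeneous coordinates: [u', v', w'] = K [X, Y, Z], with
    K = [[f, 0, cx], [0, f, cy], [0, 0, 1]], followed by the perspective
    divide u = u'/w', v = v'/w'.
    """
    if point[2] <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    K = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]
    u_h, v_h, w_h = (sum(K[r][c] * point[c] for c in range(3)) for r in range(3))
    return (u_h / w_h, v_h / w_h)


print(project((1.0, 2.0, 4.0), f=2.0))   # (0.5, 1.0)
# The same 1-unit offset looks half as large when twice as far away:
print(project((1.0, 0.0, 2.0)), project((1.0, 0.0, 4.0)))   # (0.5, 0.0) (0.25, 0.0)
```

The division by depth in the last step is what makes distant objects appear smaller, and it is also the step that discards information when going from 3-D to 2-D.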

Although (as we will see in part two) physics- and geometry-based vision has seen considerable success in practical results and applications, especially in media and entertainment, complete reconstruction of a scene from images under arbitrary conditions remains an unrealized challenge. This is because too much information is lost in the image formation and projection processes, which gives rise to inherent ambiguities during reconstruction. More importantly, the reconstruction approach does not address object and scene recognition, which are in some sense inherently cognitive functions.
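
One of these ambiguities is easy to demonstrate. Under the same idealized pinhole assumptions as a textbook camera model (focal length `f`, principal point at the origin, no distortion), every scene point on the ray through an image point projects to that same image point, so a single image cannot fix the point's depth. The helper names `back_project` and `project` and the sample values below are hypothetical, chosen only for illustration.

```python
def back_project(u, v, depth, f=1.0, cx=0.0, cy=0.0):
    """3-D camera-frame point at depth Z on the viewing ray through image point (u, v)."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def project(point, f=1.0, cx=0.0, cy=0.0):
    """Pinhole projection (X, Y, Z) -> (u, v); inverse of back_project for a known depth."""
    X, Y, Z = point
    return (f * X / Z + cx, f * Y / Z + cy)


u, v, f = 0.25, -0.5, 2.0
# Three different scene points, at depths 1, 2 and 8, on the same viewing ray...
candidates = [back_project(u, v, z, f=f) for z in (1.0, 2.0, 8.0)]
# ...all land on exactly the same image point, so the image alone cannot tell them apart.
for p in candidates:
    print(p, "->", project(p, f=f))
```

Reconstruction methods resolve this ambiguity only by bringing in extra information, such as a second viewpoint or assumptions about the scene.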

More recently, another trend has emerged, which can be broadly described as 'statistical methods in computer vision'. This class of methods is driven by two important forces. First, advances in computing and storage have made it possible to work with large collections of data and to detect and exploit the trends in those data. Second, there have been recent advances in statistical inference and probabilistic reasoning (often Bayesian) methods in machine learning and pattern recognition. (While the interplay between these disciplines and vision is itself not new, only recently have their applications to general vision problems involving complex scenes and images begun to show success.)
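
At its core, Bayesian reasoning combines a prior belief with the likelihood of an observation to give a posterior belief. The sketch below applies Bayes' rule to a discrete set of hypotheses; it is a generic illustration, not any specific vision method from this article. The "face" and "background" labels, the detector, and all the probabilities are invented for the example, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses.

    prior:      {hypothesis: P(hypothesis)}
    likelihood: {hypothesis: P(observation | hypothesis)}
    returns     {hypothesis: P(hypothesis | observation)}
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())          # P(observation)
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers: faces are rare among image patches, and a detector
# fires on 90% of faces but also on 5% of background patches.
prior = {"face": 0.01, "background": 0.99}
likelihood = {"face": 0.90, "background": 0.05}
post = posterior(prior, likelihood)
print(round(post["face"], 3))   # 0.154
```

With these made-up numbers, a single detector firing raises the probability of "face" from 1% to only about 15%, which shows why the prior, estimated from large collections of data, matters as much as the detector itself.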



One can regard the field of computer vision as being about 40 years old. During this period, the scientific trends and motivations driving research in the field have shifted among the paradigms and forces outlined above. Today, the statistical approach is dominant, while physics- and geometry-based ideas remain active. Biologically motivated methods, however, are beginning to see a revival. This is due to recent advances in techniques and tools in neuroscience (for example, functional MRI, which maps activation in specific areas of the brain while the subject is engaged in a cognitive activity). In addition, our understanding of the functional anatomy of the visual processing areas of the human brain has advanced dramatically in the last two decades.

The challenge facing the field, and new researchers entering it, is how to avoid getting trapped in one of the narrow scientific trends and instead take a truly multidisciplinary approach to the field's fundamental issues. As is always the case in science, however, research is a long-term and social enterprise. Within the next 5-10 years, we may expect the kind of convergence that leads to a more comprehensive approach to the dual quest of understanding how vision works while developing practical, useful applications that impact our lives. We will focus on this aspect in the next part of this article.

Dr P Anandan, MD, Microsoft Research India
