Archive

Posts Tagged ‘Artificial intelligence’

Computer Vision Systems

September 4th, 2010

The organization of a computer vision system is highly application-dependent. Some systems are stand-alone applications that solve a specific measurement or detection problem, while others constitute a sub-system of a larger design which, for example, also contains sub-systems for control of mechanical actuators, planning, information databases, man-machine interfaces, etc. The specific implementation of a computer vision system also depends on whether its functionality is pre-specified or whether some part of it can be learned or modified during operation. There are, however, typical functions found in many computer vision systems; a minimal sketch of how they fit together follows the list below.

Image acquisition

Pre-processing

Feature extraction

Detection/segmentation

High-level processing
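
As an illustration, these stages can be strung together with a library such as OpenCV. The following Python code is a minimal sketch under stated assumptions, not a complete system: the input file name (part.png), the filter and threshold values, and the minimum region area are all invented for the example, and a real system would tune every stage to its application.

    # Minimal computer vision pipeline sketch (Python, OpenCV 4.x assumed).
    import cv2

    image = cv2.imread("part.png")                    # image acquisition
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # pre-processing: grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)       # pre-processing: noise reduction
    edges = cv2.Canny(blurred, 50, 150)               # feature extraction: edges

    # Detection/segmentation: group edge pixels into candidate regions.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    # High-level processing: keep only regions large enough to be of interest.
    candidates = [c for c in contours if cv2.contourArea(c) > 100.0]
    print(f"{len(candidates)} candidate regions found")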

Typical Tasks of Computer Vision

September 4th, 2010

Each of the application areas described above employs a range of computer vision tasks: more or less well-defined measurement or processing problems that can be solved using a variety of methods.

Recognition

The classical problem in computer vision, image processing, and machine vision is that of determining whether the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary situations. The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedra), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera.

Different varieties of the recognition problem are described in the literature:

Object recognition: one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene.

Identification: an individual instance of an object is recognized. Examples: identification of a specific person’s face or fingerprint, or identification of a specific vehicle.

Detection: the image data is scanned for a specific condition. Examples: detection of possible abnormal cells or tissues in medical images, or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used to find smaller regions of interesting image data, which can then be analyzed by more computationally demanding techniques to produce a correct interpretation; a sketch of one such simple method follows.
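
One simple and fast computation of this kind is template matching, sketched below with OpenCV; the image file names and the 0.8 detection threshold are invented for the example.

    # Detection by template matching: slide the template over the frame and
    # score each position by normalized cross-correlation.
    import cv2

    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)

    # A fixed threshold decides whether the target is present at all.
    if best_score > 0.8:
        print(f"target detected at {best_loc} (score {best_score:.2f})")
    else:
        print("target not detected")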

Several specialized tasks based on recognition exist, such as:

Content-based image retrieval: finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them).

Pose estimation: estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation.

Optical character recognition (OCR): identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g., ASCII).
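
In practice this is usually delegated to an OCR engine. A minimal sketch, assuming the pytesseract wrapper around the Tesseract engine is installed and a hypothetical scanned page page.png:

    # Extract editable text from a scanned page with Tesseract OCR.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("page.png"))
    print(text)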

Motion analysis

Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:

Egomotion: determining the 3D rigid motion (rotation and translation) of the camera from an image sequence produced by the camera.

Tracking: following the movements of a (usually) smaller set of interest points or objects (e.g., vehicles or humans) in the image sequence.

Optical flow: determining, for each point in the image, how that point is moving relative to the image plane, i.e., its apparent motion. This motion is a result both of how the corresponding 3D point is moving in the scene and of how the camera is moving relative to the scene.
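
A minimal sketch of dense optical flow, assuming OpenCV's implementation of the Farneback method; the two frame file names and the parameter values are invented for the example.

    # Dense optical flow: estimate the apparent motion of every pixel
    # between two consecutive frames.
    import cv2
    import numpy as np

    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # flow[y, x] holds the (dx, dy) apparent motion of the pixel at (x, y).
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)

    speed = np.linalg.norm(flow, axis=2)   # per-pixel motion magnitude
    print(f"mean apparent motion: {speed.mean():.2f} pixels/frame")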

Scene reconstruction

Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case, the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model.

Image restoration

The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest approaches to noise removal are various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of what the local image structures look like, a model which distinguishes them from the noise. By first analyzing the image data in terms of local image structures, such as lines or edges, and then steering the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.
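
A minimal sketch contrasting the two simple filter types named above, using OpenCV; the file name and kernel sizes are invented for the example.

    # Simple noise removal: low-pass (Gaussian) vs. median filtering.
    import cv2

    noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)

    # Low-pass filtering averages noise away but also softens edges.
    lowpass = cv2.GaussianBlur(noisy, (5, 5), 0)

    # Median filtering removes impulse ("salt and pepper") noise
    # while preserving edges better.
    median = cv2.medianBlur(noisy, 5)

    cv2.imwrite("lowpass.png", lowpass)
    cv2.imwrite("median.png", median)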

Computer Vision & Related Fields

September 4th, 2010

The fields most closely related to computer vision are image processing, image analysis and machine vision. There is a significant overlap in the range of techniques and applications that these cover. This implies that the basic techniques used and developed in these fields are more or less identical, something which can be interpreted as if there were only one field with different names. On the other hand, it appears to be necessary for research groups, scientific journals, conferences and companies to present or market themselves as belonging specifically to one of these fields and, hence, various characterizations that distinguish each of the fields from the others have been presented.

The following characterizations appear relevant but should not be taken as universally accepted:

Image processing and image analysis tend to focus on 2D images and how to transform one image into another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither requires assumptions about nor produces interpretations of the image content.

Computer vision tends to focus on the 3D scene projected onto one or several images, e.g., how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an image.

Machine vision tends to focus on applications, mainly in manufacturing, e.g., vision-based autonomous robots and systems for vision-based inspection or measurement. This implies that image sensor technologies and control theory are often integrated with the processing of image data to control a robot, and that real-time processing is emphasized by means of efficient implementations in hardware and software. It also implies that external conditions such as lighting can be, and often are, more controlled in machine vision than in general computer vision, which can enable the use of different algorithms.

There is also a field called imaging which primarily focuses on the process of producing images, but sometimes also deals with the processing and analysis of images. For example, medical imaging includes substantial work on the analysis of image data in medical applications.

Finally, pattern recognition is a field that uses various methods to extract information from signals in general, mainly based on statistical approaches. A significant part of this field is devoted to applying these methods to image data.

Computer Vision

September 4th, 2010

Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner.

As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems.

Computer vision is, in some ways, the inverse of computer graphics. While computer graphics produces image data from 3D models, computer vision often produces 3D models from image data. There is also a trend towards a combination of the two disciplines.

An intelligent computer system can go a long way toward reducing human labor. However, if such a system can be provided with a means of actually interacting with the physical world, its usefulness is greatly increased. Robotics gives AI the means to exhibit real-world intelligence by directly manipulating its environment. That is, robotics gives the artificial mind a body.

An essential component of robotics has to do with artificial sensory systems in general and artificial vision in particular. While it is true that robotic systems exist (including many successful industrial robots) that have no sensory equipment, or only very limited sensors, they tend to be very brittle systems. They need to have their work area perfectly lit, with no shadows or mess. They must have the parts needed in precisely the right position and orientation, and if they are moved to a new location, they may require hours of recalibration. If a system could be developed that could make sense of a visual scene, it would greatly enhance the potential of robotics applications. It is therefore not surprising that the study of artificial vision and robotics go hand in hand.

Statistical NLP

September 4th, 2010

Statistical natural-language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling, information theory, and linear algebra. The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.
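
A minimal sketch of the kind of corpus-derived Markov model this describes: a bigram model whose transition probabilities are estimated by relative frequency. The eight-word corpus is invented for the example; real systems train on large corpora.

    # Bigram (first-order Markov) model estimated from a toy corpus.
    from collections import Counter, defaultdict

    corpus = "the dog saw the man with the telescope".split()

    bigrams = defaultdict(Counter)
    for prev, word in zip(corpus, corpus[1:]):
        bigrams[prev][word] += 1

    def probability(prev, word):
        # P(word | prev), estimated by relative frequency in the corpus.
        total = sum(bigrams[prev].values())
        return bigrams[prev][word] / total if total else 0.0

    # Such scores let a parser prefer the more probable of two analyses.
    print(probability("the", "dog"))   # 1/3: "the" is followed by "dog" once in three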

Natural Language Processing

September 4th, 2010

Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. In theory, natural language processing is a very attractive method of human-computer interaction. Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it.

NLP has significant overlap with the field of computational linguistics, and is often considered a sub-field of artificial intelligence.

Scripts

September 4th, 2010

Scripts were developed in early AI work by Roger Schank, Robert P. Abelson and their research group as a method of representing procedural knowledge. They are very much like frames, except that the values filling the slots must be ordered.

The classic example of a script involves the typical sequence of events that occur when a person dines in a restaurant: finding a seat, reading the menu, ordering drinks from the wait staff, and so on. In script form, these would be decomposed into conceptual transitions, such as MTRANS and PTRANS, which refer to mental and physical transitions respectively.
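
A minimal sketch of how such an ordered script might be represented in code; the event list simply follows the restaurant example above, and the structure is illustrative rather than Schank and Abelson's actual notation.

    # A script as an ordered list of slots; unlike a frame, the order matters.
    from dataclasses import dataclass

    @dataclass
    class Event:
        transition: str    # "PTRANS" (physical) or "MTRANS" (mental)
        description: str

    RESTAURANT_SCRIPT = [
        Event("PTRANS", "customer enters and finds a seat"),
        Event("MTRANS", "customer reads the menu"),
        Event("MTRANS", "customer orders drinks from the wait staff"),
        Event("PTRANS", "wait staff brings the order"),
    ]

    # Because the slots are ordered, the system can infer, for instance,
    # that reading the menu happens before ordering.
    for step, event in enumerate(RESTAURANT_SCRIPT, start=1):
        print(step, event.transition, event.description)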

Schank, Abelson and their colleagues tackled some of the most difficult problems in artificial intelligence, such as story understanding, but ultimately their line of work ended without tangible success. This type of work received little attention after the 1980s, but it was very influential on later knowledge representation techniques, such as case-based reasoning.

Scripts can be inflexible. To deal with this inflexibility, smaller modules called memory organization packets (MOPs) can be combined in a way that is appropriate for the situation.

Semantic Nets

September 4th, 2010

[Figure: an example of a semantic network, via Wikipedia]

A semantic network or net is a graphic notation for representing knowledge in patterns of interconnected nodes and arcs. Computer implementations of semantic networks were first developed for artificial intelligence and machine translation, but earlier versions have long been used in philosophy, psychology, and linguistics.

What is common to all semantic networks is a declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge. Some versions are highly informal, but other versions are formally defined systems of logic. Following are six of the most common kinds of semantic networks; a minimal sketch of the first kind appears after the list.

  • A definitional network emphasizes the subtype or is-a relation between a concept type and a newly defined subtype. The resulting network, also called a generalization or subsumption hierarchy, supports the rule of inheritance for copying properties defined for a supertype to all of its subtypes. Since definitions are true by definition, the information in these networks is often assumed to be necessarily true.
  • Assertional networks are designed to assert propositions. Unlike definitional networks, the information in an assertional network is assumed contingently true, unless it is explicitly marked with a modal operator. Some assertional networks have been proposed as models of the conceptual structures underlying natural language semantics.
  • Implicational networks use implication as the primary relation for connecting nodes. They may be used to represent patterns of beliefs, causality, or inferences.
  • Executable networks include some mechanism, such as marker passing or attached procedures, which can perform inferences, pass messages, or search for patterns and associations.
  • Learning networks build or extend their representations by acquiring knowledge from examples. The new knowledge may change the old network by adding and deleting nodes and arcs or by modifying numerical values, called weights, associated with the nodes and arcs.
  • Hybrid networks combine two or more of the previous techniques, either in a single network or in separate, but closely interacting networks.
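
A minimal sketch of the first kind, a definitional network with is-a inheritance; the concept types and properties are invented for illustration.

    # Definitional network: subtypes inherit the properties of their supertypes.
    is_a = {"elephant": "mammal", "mammal": "animal"}   # subtype -> supertype

    properties = {
        "animal": {"alive": True},
        "mammal": {"warm_blooded": True},
        "elephant": {"color": "grey"},
    }

    def inherited_properties(concept):
        # Walk up the is-a hierarchy, copying supertype properties downward.
        result = {}
        while concept is not None:
            for key, value in properties.get(concept, {}).items():
                result.setdefault(key, value)   # subtype values take precedence
            concept = is_a.get(concept)
        return result

    print(inherited_properties("elephant"))
    # {'color': 'grey', 'warm_blooded': True, 'alive': True}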

Some of the networks have been explicitly designed to implement hypotheses about human cognitive mechanisms, while others have been designed primarily for computer efficiency. Sometimes, computational reasons may lead to the same conclusions as psychological evidence. The distinction between definitional and assertional networks, for example, has a close parallel to Tulving’s (1972) distinction between semantic memory and episodic memory.

Different knowledge representation techniques

September 4th, 2010

There are representation techniques such as frames, rules, tagging, and semantic networks which have originated from theories of human information processing. Since knowledge is used to achieve intelligent behavior, the fundamental goal of knowledge representation is to represent knowledge in a manner that facilitates inferencing (i.e., drawing conclusions) from that knowledge.

Some issues that arise in knowledge representation from an AI perspective are:

  • How do people represent knowledge?
  • What is the nature of knowledge?
  • Should a representation scheme deal with a particular domain or should it be general purpose?
  • How expressive is a representation scheme or formal language?
  • Should the scheme be declarative or procedural?

There has been very little top-down discussion of knowledge representation (KR) issues, and research in the area is a patchwork of long-standing threads. There are well-known problems such as “spreading activation” (a problem in navigating a network of nodes), “subsumption” (concerned with selective inheritance; e.g., an ATV can be thought of as a specialization of a car, but it inherits only particular characteristics) and “classification”: a tomato, for example, could be classified both as a fruit and as a vegetable.

In the field of artificial intelligence, problem solving can be simplified by an appropriate choice of knowledge representation. Representing knowledge in some ways makes certain problems easier to solve. For example, it is easier to divide numbers represented in Hindu-Arabic numerals than numbers represented as Roman numerals.
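
A minimal illustration of the point: to divide two numbers given as Roman numerals, the easiest route is to change representation first.

    # Division is hard in Roman numerals, trivial after a change of representation.
    ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

    def roman_to_int(numeral):
        # Subtractive pairs (IV, IX, ...) are handled by comparing neighbors.
        total = 0
        for i, ch in enumerate(numeral):
            value = ROMAN[ch]
            if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
                total -= value
            else:
                total += value
        return total

    print(roman_to_int("MCMXCVIII") // roman_to_int("XIV"))   # 1998 // 14 = 142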

History of Knowledge Representation

September 4th, 2010

In computer science, particularly artificial intelligence, a number of representations have been devised to structure information.

KR is most commonly used to refer to representations intended for processing by modern computers, and in particular to representations consisting of explicit objects (the class of all elephants, or Clyde, a certain individual) and of assertions or claims about them (‘Clyde is an elephant’, ‘all elephants are grey’). Representing knowledge in such explicit form enables computers to draw conclusions from knowledge already stored (‘Clyde is grey’).

Many KR methods were tried in the 1970s and early 1980s, such as heuristic question-answering, neural networks, theorem proving, and expert systems, with varying success. Medical diagnosis (e.g., Mycin) was a major application area, as were games such as chess.

In the 1980s, formal computer knowledge representation languages and systems arose. Major projects attempted to encode wide bodies of general knowledge; for example, the “Cyc” project (still ongoing) went through a large encyclopedia, encoding not the information itself but the information a reader would need in order to understand the encyclopedia: naive physics; notions of time, causality, and motivation; commonplace objects and classes of objects.

Through such work, the difficulty of KR came to be better appreciated. In computational linguistics, meanwhile, much larger databases of language information were being built, and these, along with great increases in computer speed and capacity, made deeper KR more feasible.

Several programming languages oriented to KR have been developed. Prolog, developed in 1972 but popularized much later, represents propositions and basic logic, and can derive conclusions from known premises. KL-ONE (1980s) is more specifically aimed at knowledge representation itself. In 1995, the Dublin Core metadata standard was conceived.

In the electronic document world, languages were being developed to represent the structure of documents, such as SGML (from which HTML descended) and later XML. These facilitated information retrieval and data mining efforts, which have in recent years begun to relate to knowledge representation.