
Semantic Web: A Web Beyond Keywords

Mastufa

05 Jul 2008 14:25 IST


The World Wide Web is entering its next phase, the Semantic Web, which brings in a new paradigm often called Web 3.0. The term Semantic Web was coined by Tim Berners-Lee, the man who invented the (first) World Wide Web. In a Semantic Web, machines can read and interpret web pages much as humans do. Today, we can link one Web page to another, but we can't link their data together. As a result, we browse through the links and then look for the right data within those pages. Even when you use a search engine, you enter keywords and get a set of links to websites where related information is available. Search engines don't answer your specific query; they return links, not the data itself. Social networking sites these days are trying to improve upon this with tagging. The Semantic Web goes beyond keywords and into natural language processing. So instead of typing in keywords, you can type in your complete question, and the Semantic Web will try to find the answer.


So, Semantic Web refers to the technology of precise vocabularies. Though natural language processing of this kind has been in development for years, it has only recently started to take off. Start-ups such as Powerset, TextDigger and Hakia are working on semantic search engines. A Semantic Web agent does not necessarily include artificial intelligence. Instead, it relies on structured sets of information and inference rules that allow it to understand the relationships between data sources. A computer may not understand information the way humans can, but it has enough information to make logical connections and take decisions accordingly. In the Semantic Web, the data itself becomes part of the Web, unlike the World Wide Web, which holds endless information in the form of documents, and it is processed irrespective of platform, application or domain. We can search for documents on the World Wide Web, but their interpretation is left to humans. The Semantic Web, on the other hand, is about data as well as documents, so that machines can process and even act on the data in practical ways. So in the non-semantic Web (Web 1.0 and Web 2.0), the word 'snake' is treated simply as the string 'snake'.

However, in the Semantic Web (part of Web 3.0), it would be treated as an entry in a classification: Sauropsida, Subclass: Diapsida, Infraclass: Lepidosauromorpha, Superorder: Lepidosauria, Order: Squamata.

Let's take another example. A semantic search engine can answer questions like 'Which Indian author won the Booker Prize in 1997?' It reasons from the fact that the Web knows the difference between the names of Indian Booker winners, the years in which they won and even the names of their books.
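The reasoning described above can be sketched in a few lines of Python. This is only an illustration, assuming a tiny hand-made set of facts; the identifiers (`wonAward`, `awardedIn`, `Booker1997` and so on) are hypothetical names chosen for this sketch and not part of any real vocabulary or search engine.

```python
# Facts stored as (subject, predicate, object) statements.
# All names here are illustrative assumptions, not a real dataset.
TRIPLES = {
    ("ArundhatiRoy", "nationality", "Indian"),
    ("ArundhatiRoy", "wonAward", "Booker1997"),
    ("Booker1997", "prize", "Booker Prize"),
    ("Booker1997", "awardedIn", 1997),
    ("Booker1997", "forBook", "The God of Small Things"),
    ("KiranDesai", "nationality", "Indian"),
    ("KiranDesai", "wonAward", "Booker2006"),
    ("Booker2006", "prize", "Booker Prize"),
    ("Booker2006", "awardedIn", 2006),
    ("Booker2006", "forBook", "The Inheritance of Loss"),
}


def subjects(predicate, obj):
    """All subjects that have the given predicate pointing at obj."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects reachable from subject through the given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def winners(prize, year, nationality):
    """Combine three separate facts to answer one question."""
    awards = subjects("prize", prize) & subjects("awardedIn", year)
    people = subjects("nationality", nationality)
    return sorted(p for p in people if objects(p, "wonAward") & awards)
```

Calling `winners("Booker Prize", 1997, "Indian")` returns `['ArundhatiRoy']`. No single statement contains the answer; it comes from joining the author, the award and the year, which keyword matching alone cannot do.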

If we search for the keywords "Semantic Web" in Google, it shows all sites containing information about it. However, in a Semantic Web search such as the one provided by Powerset, you get the definition of 'Semantic Web' along with relevant links.


So the emphasis in the Semantic Web shifts to the back end. The Semantic Web is therefore a Web of relations between resources that signify real-world objects such as people, places and events. It is an extension of the current Web. There is a rich set of links from the Semantic Web to HTML documents. These relations typically connect a concept in the Semantic Web with the pages that are most relevant to it.

Another significant aspect of the Semantic Web is that multiple sites may contribute data about a particular resource. Without requiring permission from any authority, relevant data from various sites can extend the cumulative knowledge on the Semantic Web. This distributed extensibility is one of its most important aspects.

Powerset gives you the direct result of a question, while a typical search engine like Google gives a list of sites which may not have sufficient information on the topic.


The World Wide Web is the biggest repository of information, and its content and range of knowledge keep growing. Its non-semantic nature may therefore become a problem: in the future, it could be extremely difficult to make sense of this content. A search engine might help you find content containing specific words or keywords, but that content may not be relevant to what you are looking for. What is lacking is that search is based on the words in a page and not on the meaning of the page's contents. The Semantic Web, on the other hand, tags all content on the Web and gives you results with relevant and precise information.

Semantic Websites

Semantic technology has started to take off with the recent launch of Websites such as Powerset, TextDigger and Hakia. Let's take the case of powerset.com. This site took up the challenging task of applying natural language processing to search. Powerset's first product, launched in May '08, is a search engine for Wikipedia. Powerset allows you to enter keywords, phrases or even questions directly. Instead of giving you a list of sites, Powerset in most cases answers questions directly. The difference between Powerset and a traditional search engine like Yahoo! or Google is that the latter don't take into account stopwords like after, by, the, etc. Powerset, being semantically capable, takes all such stopwords into account and gives you the most relevant results. A search for 'Noam Chomsky' on Powerset gives you a direct result: a concise bio on the left with details of Chomsky on the right, which you wouldn't get from a typical search engine like Yahoo!
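To see why stopwords matter, consider the following minimal Python sketch. It assumes a deliberately naive keyword extractor that simply discards the stopwords named above; it is not how Powerset, Yahoo! or Google actually process queries, and the sample queries are made up for illustration.

```python
# The stopwords named in the text; a real list would be much longer.
STOPWORDS = {"after", "by", "the"}


def keywords(query):
    """Lowercase the query, split on whitespace and drop stopwords."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


# Two queries with different meanings...
by_artist = keywords("paintings by Picasso")
after_artist = keywords("paintings after Picasso")

# ...become indistinguishable once the stopwords are thrown away.
same_keywords = by_artist == after_artist
```

Both queries reduce to `['paintings', 'picasso']`, so a purely keyword-based engine cannot tell "by Picasso" from "after Picasso". Keeping the small words preserves the relationship the user asked about.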

Technologies behind Semantic Web

The following components make up the technology behind the Semantic Web.


1. A global naming scheme with URIs: A URI (Uniform Resource Identifier) is simply a Web identifier, like the strings starting with http or ftp that we see on the World Wide Web. Anyone can create a URI. URIs form the base technology on top of which a Web is built. Anything that has a URI is considered to be on the Web. For instance, https://www.pcquest.com is a URI that identifies a resource (PCQuest's home page) and signifies that a representation of that resource (the home page's HTML code) can be reached through HTTP from a network host called www.pcquest.com. Every data object and every data schema/model in the Semantic Web must have a unique URI.

2. Resource Description Framework: Also known as RDF, this is a standard way of describing data. RDF is a specification, commonly written in XML syntax, for describing resources on the Web, intranets and extranets. RDF gives a reliable, consistent way to describe and query Internet resources, from text pages to audio files and video clips. It offers syntactic interoperability, and provides the base layer for building a Semantic Web. RDF defines a directed graph of relationships, in which each statement links a subject to an object through a property.

3. RDF Schema: This is a standard means to describe properties of data. RDF Schema is the semantic extension of RDF; it provides mechanisms for describing groups of related resources and the relationships between them. The class and property system of RDF Schema is akin to the type systems of object-oriented programming languages such as Java. Both RDF and RDF Schema are commonly written in XML and can use XML Schema datatypes. The existence of standards for describing data (RDF) and data attributes (RDF Schema) allows the development of tools that read and exploit data from multiple sources.


In 'Pandorabots' you can exchange information and ask questions. The bot uses AIML to come up with the most relevant answer.

4. Ontologies (that use OWL, the Web Ontology Language): Semantic interoperability is required before multiple applications can recognize data and treat it as information. Syntactic interoperability refers to the correct parsing of data; semantic interoperability requires mapping between terms, which needs content analysis. This content analysis in turn calls for formal and explicit specifications of domain models, which define the terms used and their relationships. Such formal domain models are sometimes called ontologies. Ontologies define data models in terms of classes, subclasses, and properties. The Web Ontology Language adds more vocabulary for defining properties and classes than RDF or RDF Schema. It can describe relations between classes, cardinality (for example, 'exactly two'), equality, richer typing of properties, and characteristics of properties (such as symmetry). OWL has three sublanguages; in order of decreasing expressiveness, they are OWL Full, OWL DL, and OWL Lite. Examples of ontologies include catalogs for online shopping sites like Amazon.com, domain-specific standard terminology like UNSPSC (a terminology used for products and services), and various taxonomies on the Web, like the 'My Yahoo' categories. The components of the OWL Web Ontology Language are Classes, Properties and Individuals.
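The first two items in the list above, URIs as names and RDF as a directed graph, can be sketched in Python as follows. This is a rough illustration only: the `http://example.org/terms/` namespace, the `publishedBy` and `locatedIn` properties and the statements themselves are assumptions made up for this sketch, while the Dublin Core `title` URI is a real, widely used property.

```python
from collections import defaultdict
from urllib.parse import urlsplit

PAGE = "https://www.pcquest.com"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"  # Dublin Core 'title'
EX = "http://example.org/terms/"  # hypothetical namespace for this sketch

# A URI splits into a scheme and a network host, as the text describes.
parts = urlsplit(PAGE)
scheme, host = parts.scheme, parts.netloc

# Each statement is a (subject, property, object) triple. Subjects and
# properties are named by URIs; objects are URIs or plain literal values.
triples = [
    (PAGE, DC_TITLE, "PCQuest"),
    (PAGE, EX + "publishedBy", EX + "CyberMedia"),
    (EX + "CyberMedia", EX + "locatedIn", EX + "India"),
]


def build_graph(statements):
    """Index triples as a directed graph: node -> [(property, target)]."""
    graph = defaultdict(list)
    for subject, prop, obj in statements:
        graph[subject].append((prop, obj))
    return graph


def reachable(graph, start):
    """Follow edges outward from start and collect every node reached."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _prop, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Here `scheme` is `'https'` and `host` is `'www.pcquest.com'`. Because every node and edge is named by a globally unique URI, statements published on different sites about the same URI can be merged into one graph simply by appending them to the list.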

Classes

Classes are the basic building blocks of an OWL ontology. Classes typically form a taxonomic hierarchy (a subclass-superclass hierarchy). OWL supports six main ways to define classes; the named class is the simplest. The others are intersection classes, union classes, complement classes, restrictions, and enumerated classes.


Properties

Properties fall into two main categories: Object properties, which relate individuals to other individuals, and Datatype properties, which connect individuals to datatype values such as integers, floats, and strings. OWL makes use of XML Schema for defining datatypes.

Individuals

Individuals are instances of classes. For example, you may describe an individual named John as an instance of the class Person, and use the property hasEmployer to relate John to the individual Cyber Media, signifying that John is an employee of Cyber Media.
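The three components can be mimicked with plain Python data structures. The sketch below is not OWL and uses no OWL library; it only illustrates the ideas, and the `Agent` and `Company` classes, the `age` value and the property name `hasEmployer` are assumptions added for the example.

```python
# Classes: a subclass -> superclass hierarchy (hypothetical beyond Person).
SUBCLASS_OF = {"Person": "Agent", "Company": "Agent"}

# Individuals: each one is declared as an instance of a class.
TYPE_OF = {"John": "Person", "CyberMedia": "Company"}

# Object properties relate individuals to other individuals.
OBJECT_PROPERTIES = {("John", "hasEmployer", "CyberMedia")}

# Datatype properties relate individuals to plain values (made-up value).
DATATYPE_PROPERTIES = {("John", "age", 30)}


def superclasses(cls):
    """Walk up the hierarchy and list every ancestor of cls."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def is_instance(individual, cls):
    """True if the individual belongs to cls directly or by inheritance."""
    direct = TYPE_OF.get(individual)
    return direct == cls or cls in superclasses(direct)


def employees_of(employer):
    """Everyone related to the employer through hasEmployer."""
    return sorted(
        s for s, p, o in OBJECT_PROPERTIES
        if p == "hasEmployer" and o == employer
    )
```

`is_instance("John", "Agent")` is true even though nobody stated it directly: it follows from John being a Person and Person being a subclass of Agent. This kind of derived fact is what an ontology lets software infer.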

ALICE, AIML and Chat Bot

AIML stands for Artificial Intelligence Markup Language, an XML dialect for creating natural language software agents. ALICE, short for Artificial Linguistic Internet Computer Entity, is a popular chat bot developed in the late 1990s by Dr. Richard Wallace. It was intended to let humans and computers converse. A bot (short for "robot") is a program that works as an agent for a user. On the Internet, the most ubiquitous bots are the programs, also called spiders or crawlers, that access Web sites and gather content for search engine indexes. One of the first and most famous chatterbots (prior to the Web) was Eliza, a program that pretended to be a psychotherapist and answered questions with other questions. ALICE uses AIML to respond to any question. We can chat with any AIML-based bot on any topic and ask questions about anything. All such chat bots are semantically capable. AIML describes a class of data objects called AIML objects. AIML objects are made up of units called topics and categories, which contain parsed or unparsed data. AIML supports two ways to interface with other languages: the <system> tag, which executes any program accessible as an operating system shell command and inserts the results in the reply, and the <javascript> tag, which allows arbitrary scripting inside the templates.
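To give a feel for how categories drive a reply, here is a minimal Python sketch that reads a small AIML-style document and answers from it. In AIML, each category pairs a pattern with a template. The two categories and their wording are invented for this example, and the matcher is a simplification: it tries categories in order and treats `*` as "one or more characters", whereas real AIML interpreters use a more elaborate matching scheme. The `<system>` and `<javascript>` tags are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET

# A made-up AIML fragment: each category maps a pattern to a template.
AIML_SOURCE = """
<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>WHAT IS *</pattern>
    <template>I do not know what that is yet.</template>
  </category>
</aiml>
"""


def load_categories(source):
    """Parse the XML and return a list of (pattern, template) pairs."""
    root = ET.fromstring(source)
    return [
        (cat.findtext("pattern").strip(), cat.findtext("template").strip())
        for cat in root.iter("category")
    ]


def normalize(text):
    """Uppercase the input and strip punctuation, as AIML patterns expect."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()


def respond(categories, text):
    """Return the template of the first category whose pattern matches."""
    sentence = normalize(text)
    for pattern, template in categories:
        # Escape the pattern, then turn the AIML '*' wildcard into '.+'.
        regex = "^" + re.escape(pattern).replace(r"\*", ".+") + "$"
        if re.match(regex, sentence):
            return template
    return None


categories = load_categories(AIML_SOURCE)
```

With these categories, `respond(categories, "What is RDF?")` returns the second template, `respond(categories, "Hello!")` returns `'Hi there!'`, and an input that matches nothing returns `None`.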

Future of Semantic Web

Implementation of OWL, RDF or the Semantic Web as a whole will be a continuing process. But whether the Semantic Web will benefit businesses and individuals is a question on which technology experts are divided. Considering the way WWW technologies proliferated, it's plausible that this new version of the Web will realize its capabilities one day. However, it might initially be restricted to intranet and extranet applications until security questions are adequately addressed. Keeping in mind the potential of the new Web, which is more about real data than anchored text or pages, it is hoped that the Semantic Web will advance human knowledge by allowing people to combine huge amounts of data in a dynamic and relevant way. One barrier to adoption is that large IT companies are waiting for the development community to reach consensus and settle on standards. Another issue doing the rounds is reliability: in a world where anyone can publish anything, how far can the data be trusted? There is also a concern that we will exercise our own intelligence less, becoming relatively unintelligent and dependent on a future Web that is destined to hand us the information relevant to our search. These are the issues to be addressed before a semantic culture takes off in the true sense.
