The World Wide Web is entering its next phase, the Semantic Web, bringing in a
new paradigm called Web 3.0. The term Semantic Web was coined by Tim
Berners-Lee, the man who invented the World Wide Web itself. In a Semantic Web,
machines can read and interpret web pages just like humans. Today, we can link a
Web page to another but we can't link their data together. As a result, we
browse through the links and then look for the right data within those links.
Even when you use a search engine, you enter keywords and get a set of links to
websites where related information is available. They don't give you the answer
to your specific query, i.e. they don't throw up the data, just the links.
Social networking sites these days are trying to improve upon this with the
system of tagging. The Semantic Web goes beyond keywords and into natural
language processing. So instead of typing in keywords, you can type in your
complete question, and the Semantic Web will try to find the answer.
So, the Semantic Web refers to the technology of precise vocabularies. Though
this kind of natural language processing has been in the works for years, it is
only recently that it has started to take off. Start-ups such as Powerset,
TextDigger and Hakia are working on semantic search engines. A Semantic Web
agent does not necessarily include artificial intelligence. Instead it relies on
structured sets of information and inference rules that allow it to understand
the relationship between data sources. A computer may not understand information
the way humans can, but it has enough information to create logical connections
and take decisions accordingly. In the Semantic Web, the data itself becomes a
part of the Web, unlike the World Wide Web, which holds endless information in
the form of documents, and is processed irrespective of platform, application
or domain. We can search for documents on the World Wide Web, but
their interpretation is left for the humans to do. On the other hand, Semantic
Web is all about data as well as documents on the Web so that machines can
process and even act on the data in practical ways. So while in the
non-semantic Web (Web 1.0 and Web 2.0), the word 'snake' is treated as nothing
more than the keyword 'snake', in the Semantic Web (part of Web 3.0) it would
be treated as structured data along the lines of <Class: Reptilia, Subclass:
Lepidosauria, Order: Squamata>.
Let's take another example. A semantic search engine can answer a question
like 'Which Indian author won the Booker Prize in 1997?' with 'Arundhati Roy'
directly, reasoning over data in which the Web knows the difference between
the names of Indian Booker winners, the respective years and even the names of
their books.
If we search for the keywords 'Semantic Web' in Google, it shows all sites
containing information about it. However, in a Semantic Web search such as the
one provided by Powerset, you get the definition of 'Semantic Web' along with
relevant links.
So the emphasis in the Semantic Web shifts to the back end. The Semantic Web,
therefore, is a Web of relations between resources signifying real-world
objects such as people, places and events. It is an extension of the current
Web. A rich set of links runs from the Semantic Web to HTML documents; these
relations typically connect a concept in the Semantic Web with the pages that
are most relevant to it.
Another significant aspect of the Semantic Web is that multiple sites may
contribute data about a particular resource. Without requiring any permission
from any authority, all relevant data from various sites can extend the
cumulative knowledge on the Semantic Web. This distributed extensibility is one
of the most important aspects of the Semantic Web.
Powerset gives you the direct result of a question, while a typical search
engine like Google gives a list of sites which may not have sufficient
information on the topic.
The World Wide Web is the biggest repository of information there is, and its
ever-growing content may create a problem precisely because of its non-semantic
nature. In the future, it will be extremely difficult to make sense of all this
content. A search engine can help you find content containing specific
keywords, but that content may not be relevant to what you are looking for:
search is based on the contents of pages, not on the semantic meaning of those
contents. The Semantic Web, on the other hand, tags all content on the Web and
gives you relevant and precise results.
Semantic Websites
Semantic technology has started to take off with the recent launch of some
Websites like Powerset, TextDigger and Hakia. Let's take the case of
Powerset.com. This site took up the challenging task of applying natural
language processing to search. Powerset's first product is a search engine for
Wikipedia, launched in May '08. Powerset allows you to enter keywords, phrases
or even questions directly. Instead of giving you a list of sites, Powerset in
most cases answers questions directly. The difference between Powerset and a
traditional search engine like Yahoo! or Google is that the latter don't take
stopwords like 'after', 'by' and 'the' into account. Powerset, being
semantically capable, takes all such stopwords into account and gives you the
most relevant results. A search for 'Noam Chomsky' on Powerset gives you the
direct result: a concise bio on the left side with details of Chomsky on the
right, something you wouldn't get in a typical search engine like Yahoo!
Technologies behind Semantic Web
The following components comprise the technology behind the Semantic Web.
1. A global naming scheme with URIs: URI (Uniform Resource Identifier)
is simply a Web identifier, like the strings starting with http or ftp that we
see on the World Wide Web. Anyone can create a URI. URI forms the base
technology on top of which to build a Web. Anything that has a URI is considered
to be on the Web. For instance, https://www.pcquest.com is a URI that identifies
a resource (PCQuest's home page) and signifies that a representation of that
resource (home page's HTML code) can be reached through HTTP from a network host
called www.pcquest.com. Every data object and every data schema/model in the
Semantic Web must have a unique URI.
2. Resource Description Framework: Also known as RDF, this is a
standard syntax for describing data. RDF is an XML-based specification to
describe resources on the Web, intranets and extranets. RDF gives a reliable,
consistent way to describe and query Internet resources, from text pages to
audio files and video clips. It offers syntactic interoperability, and provides
the base layer for building a Semantic Web. RDF defines a directed graph of
relationships.
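As a minimal sketch of what this looks like in practice, the following RDF/XML
fragment describes the PCQuest home page mentioned earlier, borrowing the
standard Dublin Core vocabulary for the 'title' and 'language' properties:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- rdf:about names the resource being described by its URI -->
  <rdf:Description rdf:about="https://www.pcquest.com">
    <dc:title>PCQuest Home Page</dc:title>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>

Each statement is a triple of subject (the page's URI), predicate (dc:title)
and object (the literal 'PCQuest Home Page'); together these triples form
exactly the directed graph of relationships described above.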
3. RDF Schema: This is a standard means to describe properties of
data. RDF Schema is the semantic extension of RDF, providing mechanisms to
describe groups of related resources and the relationships between them. The
class and property system of RDF Schema is akin to the type systems of
object-oriented programming languages such as Java. Both RDF Schema and RDF are
based on XML and XML Schema. The existence of standards for describing data (RDF)
and data attributes (RDF Schema) allows the development of a set of available
tools to read and exploit data from multiple sources.
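As a small sketch, with illustrative class and property names, an RDF Schema
fragment might read:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- Two related classes: every Magazine is also a Publication -->
  <rdfs:Class rdf:ID="Publication"/>
  <rdfs:Class rdf:ID="Magazine">
    <rdfs:subClassOf rdf:resource="#Publication"/>
  </rdfs:Class>
  <!-- A property whose subject is constrained to be a Magazine -->
  <rdf:Property rdf:ID="coverPrice">
    <rdfs:domain rdf:resource="#Magazine"/>
  </rdf:Property>
</rdf:RDF>

Any tool that understands RDF Schema can now infer that a resource typed as a
Magazine is also a Publication, without being told so explicitly.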
In Pandorabots you can exchange information and ask questions; the bot uses
AIML to come up with the most relevant answer.
4. Ontologies (that use OWL, the Web Ontology Language): Syntactic
interoperability is required before multiple applications can identify data
and treat it as information. Syntactic interoperability refers to the correct
parsing of data. Beyond that, terms must be mapped between applications, which
needs content analysis. This content analysis in turn calls for proper and
explicit specifications of domain models, which define the terms used and
their relationships. Such formal domain models are sometimes called
ontologies. Ontologies define data models in terms of
classes, subclasses, and properties. Web Ontology Language adds more vocabulary
to define properties and classes than RDF or RDF Schema. It can describe
relations between classes, cardinality (for example, 'exactly two'), equality,
richer typing of properties, and characteristics of properties (such as
symmetry). OWL has three sublanguages: in order of decreasing expressiveness,
they are OWL Full, OWL DL, and OWL Lite. Examples of ontologies include catalogs
for online shopping sites like Amazon.com, domain-specific standard terminology
like UNSPSC (a terminology used for products and services), or various
taxonomies on the Web, like the 'My Yahoo' categories. Components of OWL Web
Ontology Language are Classes, Properties and Individuals.
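For instance, the 'exactly two' cardinality mentioned above could be written
as a restriction roughly like the following fragment; the Duet class and
performer property are hypothetical, and the usual rdf, rdfs and owl
namespace declarations are assumed:

<!-- A Duet is defined as something with exactly two performers -->
<owl:Class rdf:ID="Duet">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#performer"/>
      <owl:cardinality
          rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
          >2</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>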
Classes
Classes are the basic building blocks of an OWL ontology. Classes
typically represent a taxonomic hierarchy (a subclass-superclass hierarchy).
OWL supports six main ways to define classes; named class is the simplest among
them. Other types include intersection classes, union classes, complement
classes, restrictions, and enumerated classes.
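As a sketch, a named class needs nothing more than a declaration, while a
union class combines other classes; the names below are illustrative, with the
usual namespace declarations again assumed:

<!-- The simplest case: a named class -->
<owl:Class rdf:ID="Reptile"/>
<!-- A union class: anything that is a Snake or a Lizard -->
<owl:Class rdf:ID="Squamate">
  <owl:unionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Snake"/>
    <owl:Class rdf:about="#Lizard"/>
  </owl:unionOf>
</owl:Class>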
Properties
Properties fall into two main categories: object properties, which relate
individuals to other individuals, and datatype properties, which connect
individuals to datatype values such as integers, floats and strings. OWL makes
use of XML Schema for defining datatypes.
Individuals
Individuals are instances of classes. You may describe, for example, an
individual named John as an instance of the class Person, and use the property
'employer' to relate John to the individual Cyber Media, signifying that John
is an employee of Cyber Media.
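Pulling classes, properties and individuals together, the John example above
might be sketched in OWL as follows; the example.org URIs and the age value
are placeholders, not part of any real ontology:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:ex="http://example.org/company#"
         xml:base="http://example.org/company">
  <!-- Classes -->
  <owl:Class rdf:about="#Person"/>
  <owl:Class rdf:about="#Company"/>
  <!-- An object property: relates individuals to individuals -->
  <owl:ObjectProperty rdf:about="#employer">
    <rdfs:domain rdf:resource="#Person"/>
    <rdfs:range rdf:resource="#Company"/>
  </owl:ObjectProperty>
  <!-- A datatype property: relates an individual to a typed value -->
  <owl:DatatypeProperty rdf:about="#age">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
  </owl:DatatypeProperty>
  <!-- Individuals: John is employed by Cyber Media -->
  <ex:Company rdf:about="#CyberMedia"/>
  <ex:Person rdf:about="#John">
    <ex:employer rdf:resource="#CyberMedia"/>
    <ex:age rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">30</ex:age>
  </ex:Person>
</rdf:RDF>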
ALICE, AIML and Chat Bot
AIML, or Artificial Intelligence Markup Language, is an XML dialect for
creating natural-language software agents. ALICE, a popular chat bot whose
name is short for Artificial Linguistic Internet Computer Entity, was
developed in the late 1990s by Dr. Richard Wallace and was intended to enable
natural conversation between humans and computers. A bot (short for "robot")
is a program that works as an agent for a
user. On the Internet, the most ubiquitous bots are the programs, also called
spiders or crawlers, that visit Web sites and fetch content for search engine
indexes. One of the first and most famous chatterbots (prior to the Web) was
Eliza, a program that pretended to be a psychotherapist and answered questions
with other questions. ALICE uses AIML to respond to any question. We can chat
with any AIML-based bot on any topic and ask questions about anything. All
such chat bots are semantically capable. AIML describes a class of data
objects called AIML objects, which are made up of units called topics and
categories containing parsed or unparsed data. AIML supports two ways to
interface with other languages: the <system> tag executes a command on the
host operating system and inserts the results in the reply, while the
<javascript> tag allows arbitrary scripting inside the templates.
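A minimal AIML sketch, with made-up patterns and replies for illustration,
shows how categories pair user input with a response:

<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1">
  <!-- A category pairs a user input pattern with a reply template -->
  <category>
    <pattern>WHAT IS THE SEMANTIC WEB</pattern>
    <template>An extension of the current Web in which data is given
    well-defined meaning.</template>
  </category>
  <!-- The * wildcard matches anything; <star/> echoes it in the reply -->
  <category>
    <pattern>WHO IS *</pattern>
    <template>I am not sure who <star/> is. Try asking me something
    else.</template>
  </category>
</aiml>

If a user types 'Who is Noam Chomsky', the bot matches the second category and
substitutes 'Noam Chomsky' for the star element in its reply.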
Future of Semantic Web
Implementation of OWL, RDF or the Semantic Web as a whole will be a continuing
process. But whether the Semantic Web will benefit businesses and individuals
is still debated among technology experts. Considering the way WWW
technologies proliferated, it is plausible that this new version of the Web
will realize its capabilities one day. However, it might initially be restricted
to intranet and extranet applications until security questions are addressed
adequately. Keeping in mind the potential of the new Web, which is more about
real data than anchored text or pages, the hope is that the Semantic Web will
lead to an evolution of human knowledge, allowing people to combine huge
amounts of data in a dynamic and relevant way. Standards remain a barrier to
adoption, though; large IT companies are still awaiting consensus within the
development community before settling on them. Another issue doing the rounds
is reliability: in a world where anyone can publish anything, how far can the
data be trusted?
There is also the worry that we may grow lax about exercising our own
intelligence, leaving ourselves at the disposal of a future Web that is
expected to hand us exactly the information we are looking for. These are the
issues that must be addressed before a semantic culture takes off in the true
sense.