In Search of Better 'Search'

PCQ Bureau

04 Jul 2009 06:50 IST

If you had to find twenty facts about, say, Iceland, and were given an
Internet connection, how would you go about it? Most probably, you would fire up
a well-known search engine like Google, enter Iceland as the keyword and hit the
Search button. This would bring up lots of websites related to Iceland. You
would then visit some of those displayed on the first results page to gather
all your facts.

Sure, this modern-day search technology works, but it has several limitations.
For one, the search results simply point you to websites where you 'might' find
the information you're looking for. Two, you have to judge the accuracy of the
facts thrown up yourself. Three, the process of finding the right information
is time consuming because you have to go through so many links. With the amount
of text on the Internet growing by leaps and bounds, and other kinds of
information like audio, video and images also growing, this method of searching
will soon lose its effectiveness. For instance, what if you have a photograph
and want to find similar ones on the Internet? Or what if you want to download
a particular song, but remember only its tune and not its lyrics? How do you
find it on the Internet? These are some of the capabilities being developed for
Internet search.

Wolfram|Alpha, the Computational engine

Computer scientist Stephen Wolfram, the creator of Mathematica (a multi-faceted
program released in 1988 to provide a uniform system for all forms of
algorithmic computation), has come up with yet another approach to search,
called a computational engine.

A computational engine gets you direct information in visual representations,
unlike regular search engines like Google, which simply return links to Web
pages.

Instead of searching the Web and returning links, the computational engine,
called Wolfram|Alpha, generates output by doing computations on its own
internal knowledge base. It brings you systematic factual knowledge: things
that are known and are in some way public. It deals only with facts, not
opinions. An interesting point is that the data in Wolfram|Alpha is derived by
computation, often based on multiple sources; the engine deploys formulas and
algorithms to compute answers for searchers. You can ask Wolfram|Alpha many
things. For example, you can ask about the molecular weight of cholesterol, the
location of a gene in the human genome, the number of people named John born in
a particular year, the life expectancy of 50-year-olds in a country, the
performance of Google stock, the height of Mt. Everest, etc.
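To see what 'computing an answer' rather than 'looking up a page' might mean, here is a minimal Python sketch for the cholesterol example. It is only an illustration and not how Wolfram|Alpha is implemented: the small table of atomic weights stands in for a curated knowledge base, the function name is made up for this example, and the parser handles only simple formulas without brackets.

```python
import re

# Standard atomic weights in g/mol (rounded). This tiny hand-made table is an
# assumption standing in for a real curated knowledge base.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Add up atomic weights for a simple formula such as 'C27H46O'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total

# Cholesterol has the formula C27H46O.
print(round(molecular_weight("C27H46O"), 2))  # prints 386.66
```

The answer is never stored anywhere; it is derived from the formula and the weights at the moment you ask, which is the idea the engine applies at a much larger scale.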

Components of Alpha

The main technologies behind this engine include:

Data curation: Wolfram|Alpha uses public and licensed proprietary data sources,
and the company uses automated processes and human judgment to prepare the
data.
Algorithms: Alpha must pick the right computational processes to present its
results. Inside Wolfram|Alpha are 5 million to 6 million lines of Mathematica
code that implement all those methods and models.
Linguistic analysis: Alpha has to understand what a person typed.
Presentation: Inside Alpha, there are tens of thousands of possible graphs to
choose from when displaying a result.

Wolfram|Alpha can work out complex math problems in algebra, matrices, calculus,
trigonometry, etc.

Picture Based Search

So, you can't remember who that person was in your wedding album or at the
conference? Not a problem: scan the picture, upload it to an image-based search
site and ask it to find the person for you. This is what next-generation search
engines are working on. Today, such a search is more likely to succeed if the
picture is of a celebrity like Katrina Kaif. A site called Picollator.com is
trying to do the same for any other picture you upload to it. The site uses
pattern recognition technologies to identify similar-looking images on the
Internet. But as it has to run complex algorithms on so many images available
on the Net, it is very slow as of today. The search is also not very accurate,
but it works. Google is also doing something similar with the web-based version
of Picasa, its image management portal. With this application, you can find
similar faces across your complete photo library and tag them with a name. It
cannot find every picture of a person, of course, but it recognizes the face of
a person very well in certain clothing and ambient conditions.
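How can a program decide that two pictures 'look similar'? One simple and widely known technique is an average hash: reduce each image to a small grid, mark every pixel as brighter or darker than the image's mean, and count how many marks differ between two images. The Python sketch below illustrates that idea on made-up 4x4 grayscale grids. It is an assumption chosen for illustration only; the article does not say which algorithms Picollator or Picasa use, and face recognition is considerably more involved.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)

def hamming_distance(hash_a, hash_b):
    """Count the positions at which the two hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

# Hypothetical 4x4 grayscale thumbnails (0 = black, 255 = white).
query = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# The same picture, only brighter, and a picture with a different pattern.
brighter_copy = [[min(255, value + 30) for value in row] for row in query]
unrelated = [[10, 200, 10, 200] for _ in range(4)]

library = {"brighter_copy": brighter_copy, "unrelated": unrelated}
query_hash = average_hash(query)

# Rank library images from most to least similar to the query.
ranked = sorted(
    library,
    key=lambda name: hamming_distance(query_hash, average_hash(library[name])),
)
print(ranked)  # prints ['brighter_copy', 'unrelated']
```

Because the hash depends only on which pixels are above the mean, a uniformly brighter copy produces the same hash as the original. Comparing a query against millions of images this way is also part of why such searches are slow.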

But don't think that such technology is only good for leisure. It makes great
business sense as well, and that's why the makers of VizSeek
(http://www.vizseek.com) came up with the idea of a search engine that can find
any tool from just a photograph or a doodle sketch of it. The site was devised
by engineers who kept in mind that, during repair work, remembering the name of
a tool or a part can sometimes be very difficult.

Search enhancements from Google

Google has rolled out a similar service in search, which makes it possible to
search and compare public data in an interactive graph. The new search features
include Google Squared, Google Options and a tool for Android.

Google Squared: This extracts information from the Web and displays it
in a table. For instance, if you type “fantasy television shows,” it may return
a table of shows with information like their release dates, directors, actors,
etc. Users can click on individual entries to check the source and, if a value
is incorrect, correct it through new searches. Finally, you can also save the
customized table for future reference.

Google Squared is different from Wolfram|Alpha, which, rather than searching the
Web for data, taps databases licensed by Wolfram Research. In Alpha, the
emphasis is on computing and visualizing a range of data on subjects like
astronomy, computer science and weather from its own sources. Google will be
opening Google Squared up to users later this month on Google Labs.

Google Options: After doing a search, you will see a new icon saying
'Show options.' For a search on 'Switches', for example, clicking on 'Show
options' offers you a range of choices on what sorts of results you want:
'videos,' 'forums,' 'reviews,' results sorted by time frame (past 24 hours,
past week, past year), or the most recently created pages or images. This
option is available now.

Google Options enables you to view your search results in terms of 'videos',
'forums' and 'reviews', and also by time frame, in the panel on the left side.

Sound Based Search

This is something very interesting for people who like music. Remember how
difficult and frustrating it becomes when you forget the name of a song that
you want to search for online? To top it all, sometimes you even forget the
lyrics. All you remember is the tune. But being a diehard music fan, you just
can't let go of the urge to listen to that song.

That's the type of customer Musipedia.com is trying to reach. On this website,
one can find music and purchase or download it. You can search by typing the
name of the song, by playing its melody on a virtual keyboard, by whistling the
melody into the computer's microphone, or even by tapping the rhythm on the
keyboard. The website recognizes the timing and notes of the song and searches
for the correct song accordingly. Then you can either play or purchase the
song. I am not very sure about other uses of such technology, but I whistled
some five songs to it and it was able to find only two of them for me. So
either my whistling is bad (which is quite possible) or this technology has a
long way to go before it's accepted by actual netizens.
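One well-known way to make a whistled or tapped melody searchable is to ignore the exact pitches and keep only the contour, that is, whether each note goes up, down or repeats. This encoding is known as the Parsons code. The Python sketch below is a rough illustration, not Musipedia's actual implementation: the two-song catalogue, the MIDI note numbers for the whistled query and the use of edit distance for ranking are all assumptions made for this example.

```python
def parsons_code(pitches):
    """Encode a melody as its contour: U (up), D (down) or R (repeat) per step."""
    steps = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("R")
    return "*" + "".join(steps)  # "*" stands for the first note

def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        row = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            row.append(min(previous_row[j] + 1,          # delete from a
                           row[j - 1] + 1,               # insert into a
                           previous_row[j - 1] + cost))  # substitute
        previous_row = row
    return previous_row[-1]

# Opening notes as MIDI numbers; a tiny hypothetical catalogue.
CATALOGUE = {
    "Twinkle, Twinkle, Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# A whistled query: in a different key and with one wrong note.
whistled = [62, 62, 69, 69, 71, 70, 69]
query = parsons_code(whistled)

best_match = min(
    CATALOGUE,
    key=lambda title: edit_distance(query, parsons_code(CATALOGUE[title])),
)
print(query, "->", best_match)
```

Because only the direction of each step is kept, whistling in the wrong key does not matter, and the edit distance tolerates a few wrong notes. It also hints at why results can be poor: many different songs share a similar contour over a short phrase.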

Plagiarism Search

This is something very useful for media companies like ours. Checking the
authenticity of a guest article, or checking whether an article is being used
by someone else on the Internet, was never easy before plagiarism search sites
came in. These websites use the APIs of Google or similar search engines. They
take each sentence of a web page, search for the same or similar sentences and
word sequences in other articles, and then give the article a plagiarism score.
They also highlight the copied or similar text in all the articles. One example
of such a site is http://copyscape.com.
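The scoring step can be sketched in a few lines of Python. The version below is a simplified illustration and makes several assumptions: it compares two texts that are already in hand instead of querying a search engine API, it breaks text into overlapping three-word sequences (often called shingles), and it defines the score as the share of the suspect text's sequences that also appear in the source. The article does not describe Copyscape's actual method.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def plagiarism_score(suspect, source, size=3):
    """Fraction of the suspect's word sequences that also occur in the source."""
    suspect_shingles = shingles(suspect, size)
    if not suspect_shingles:
        return 0.0  # text too short to judge
    shared = suspect_shingles & shingles(source, size)
    return len(shared) / len(suspect_shingles)

# Hypothetical texts for illustration.
source = "Semantic search relies on structured information and inference rules."
suspect = ("Our new engine relies on structured information and inference "
           "rules, the vendor said.")

print(f"{plagiarism_score(suspect, source):.0%} of the suspect text overlaps")
```

The shared sequences are also exactly the passages a site would highlight as copied. Matching whole word sequences instead of single words keeps common words like 'the' and 'and' from inflating the score.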

Another very intuitive use of such a service is hunting down phishing
websites. A bank can pass its site's content to copyscape.com or a similar
website to check whether someone is phishing its website. As a phishing site
must have the same text and a similar layout, it would be easily caught.

Musipedia's virtual keyboard lets you play the tune of the music you are
searching for.

Semantic Search

This is something that might change the complete search paradigm in the near
future. Semantic search promises to be practical, easier to use than
traditional search, faster and more accurate. Semantic search refers to the
technology of precise, vocabulary-based search. Though this kind of natural
language processing has been in progress for years, it was only recently that
it started to take off. Start-ups like Powerset, Textdigger and Hakia are
working on semantic search engines. A semantic web agent does not necessarily
include artificial intelligence. Instead, it relies on structured sets of
information and inference rules that allow it to understand the relationship
between data sources. A computer may not understand information the way humans
can, but it has enough information to create logical connections and take
decisions accordingly. In the semantic web, the data itself becomes a part of
the Web, unlike in the World Wide Web, which holds endless information in the
form of documents, and the data is processed irrespective of platform,
application or domain. We can search for documents on the World Wide Web, but
their interpretation is left to us. The semantic web, on the other hand, is
about data as well as documents on the Web, so that machines can process and
even act on the data in practical ways. So while the non-semantic web treats
the word 'snake' simply as the string 'snake', the semantic web would treat it
as an animal.
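The 'snake is an animal' example can be sketched as a handful of structured facts plus one inference rule. The Python below is a minimal illustration under stated assumptions: the facts are made up for this example, and the only rule is that 'is a' relationships chain together. Real semantic web systems express such facts and rules in standard formats instead of a Python dictionary.

```python
# Hypothetical "is a" facts: each term points to its broader category.
IS_A = {"cobra": "snake", "snake": "reptile", "reptile": "animal"}

def categories(term):
    """Follow 'is a' links step by step and return every broader category."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found

def is_a(term, category):
    """Inference rule: if X is a Y and Y is a Z, then X is a Z."""
    return category in categories(term)

print(categories("snake"))      # prints ['reptile', 'animal']
print(is_a("cobra", "animal"))  # prints True
```

Nobody stated that a cobra is an animal; the program derived it by chaining three facts. This is the sense in which an agent can make logical connections without understanding the way a human does.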

Let's take another example. A semantic search engine can answer questions
like 'Which Indian author won the Booker Prize in the year 1997?' It applies
reasoning based on the fact that the Web knows the difference between the names
of Indian Booker winners, the respective years and even the names of the books.
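A common way to hold such knowledge is as simple three-part statements (subject, predicate, object), which a program can then combine to answer a question. The Python sketch below shows the idea with a tiny hand-made sample of facts; the list, the predicate names and the helper functions are assumptions for illustration, not the data model of any particular search engine.

```python
# Facts as (subject, predicate, object) statements; a small hand-made sample.
FACTS = [
    ("Arundhati Roy", "nationality", "Indian"),
    ("Arundhati Roy", "won", "Booker Prize 1997"),
    ("Arundhati Roy", "wrote", "The God of Small Things"),
    ("Kiran Desai", "nationality", "Indian"),
    ("Kiran Desai", "won", "Booker Prize 2006"),
    ("Kiran Desai", "wrote", "The Inheritance of Loss"),
    ("Ian McEwan", "nationality", "British"),
    ("Ian McEwan", "won", "Booker Prize 1998"),
]

def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for s, p, o in FACTS if p == predicate and o == obj}

def indian_booker_winner(year):
    """Combine two conditions: Indian nationality and a Booker win that year."""
    return subjects("nationality", "Indian") & subjects("won", f"Booker Prize {year}")

print(indian_booker_winner(1997))  # prints {'Arundhati Roy'}
```

The question is answered by intersecting two sets of facts, not by finding a page that happens to contain all the keywords. A keyword engine given the same question has no way of knowing that 'Indian' describes the author and '1997' the prize.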

If we search for the keywords Semantic Web in Google, it shows all sites
containing information about it. However, in a Semantic Web search such as the
one provided by Powerset, you get the definition of 'Semantic Web' along with
relevant links.

So the emphasis in the Semantic Web shifts to the back end. There is a rich set
of links from the Semantic Web to HTML documents. These relations typically
connect a concept in the Semantic Web with the pages that are most relevant to
it.

The Backend for the Bots

We talked about semantic search, which does not necessarily include artificial
intelligence. Instead, it relies on structured sets of information and
inference rules that allow it to understand the relationship between data
sources. Now just imagine if we could also rely on artificial intelligence and
NLP, or Natural Language Processing. We could get a robot connected to the
Internet that can listen to one's voice and respond to questions in a very
friendly manner. And as it is connected to the Internet, no question would go
unanswered. So we could actually convert the Web into a brain for our robots.
This reminds me of VIKI (Virtual Interactive Kinetic Intelligence) from the
Hollywood blockbuster I, Robot. I just hope that in the real world it doesn't
go bad as VIKI did.

Today, in the real world, we don't have VIKI, but we do have ALICE, an AI chat
bot that works on AIML, the Artificial Intelligence Markup Language. Before I
go on, just read the following interaction of mine with Alicebot. You can visit
her at
http://alicebot.blogspot.com/
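AIML itself is an XML format in which each 'category' pairs an input pattern with a reply template. To give a feel for how a bot of this kind maps what you type to an answer, here is a toy Python sketch of the same idea. It is not ALICE's code or her actual rule set: the categories, the replies and the first-match ordering are all assumptions made for this example.

```python
import re

# Hypothetical categories in the spirit of AIML: an upper-case pattern with an
# optional "*" wildcard, and a reply where "{star}" echoes the wildcard text.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("WHAT IS *", "I would look up {star} for you."),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("*", "Tell me more."),  # catch-all, tried last
]

def normalise(text):
    """Upper-case the input and drop punctuation before matching."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()

def respond(text):
    """Return the reply for the first category whose pattern matches."""
    sentence = normalise(text)
    for pattern, template in CATEGORIES:
        # Turn the pattern into a regular expression; "*" matches any words.
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        match = re.match(regex, sentence)
        if match:
            star = match.group(1).lower() if match.groups() else ""
            return template.format(star=star)
    return "I have no answer for that."

print(respond("Hello!"))                    # prints Hi there!
print(respond("What is semantic search?"))  # prints I would look up semantic search for you.
```

A real AIML bot has tens of thousands of such categories and more careful rules for choosing between overlapping patterns, but the basic loop of normalise, match and fill in a template is the same.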

In my interaction with ALICE, I was able to talk to her in normal language and
ask her for some information. She was able to understand the correct meaning
and intent of my question and respond with the most appropriate answer. Just
imagine if you could have a similar interface for Google or Wikipedia. What
would the level of user interaction be? Couple it with voice recognition and
text-to-speech, and we could actually have a VIKI in place. Let me leave you
with these thoughts on the future of search. For any queries or questions on
this topic, visit us at
http://forums.pcquest.com

With help from Anindya Roy
