International Institute of Information Technology (IIIT) Hyderabad is preparing
to release what it calls the Indian Language Machine Translator. The project,
headed by Prof Rajeev Sangal, Director of the institute, is being carried out
in three in-house labs, each working on a different aspect of Natural Language
Processing (NLP).
The research at IIIT has addressed the text and voice aspects of NLP
separately. The application is in its finishing stages for ten Indian
languages, including Hindi, Punjabi, Marathi, Bengali, Urdu, Malayalam, Tamil,
Telugu and Kannada. The product is expected to be ready for commercial use
within a year and is targeted at two distinct areas: Pilgrimage & Tourism and
Health. Prof Sangal explains, “We decided on tailoring the application for
Pilgrimage and Tourism based on usage trends, the Health application comes with
a potential social impact. An ideal example would be a man from Punjab wanting
to take his family for a holiday in Kerala. He should be able to access an
online forum, post his query in Hindi or Punjabi, and a Keralite at the other
end will view this query in Malayalam, reply in Malayalam and our man would see
the answer in Punjabi.”
Though language translation has been attempted globally in research,
development and commercial deployment, it has yet to overcome the challenges
of dialects, grammatical variation and colloquialisms. The IIIT researchers
have concluded that Indian scripts are sophisticated but not complex, and are
incorporating Artificial Intelligence into language translation.
Prof Sangal, who is also an AI expert, embarked on a study of Panini's Vyakaran
(grammar) and correlated it with modern Indian literature before adding
elements of AI that enable the computer to learn, adapt and understand the
requirements of the user. The AI aspect is divided into two components:
rule-based processing and machine learning. In rule-based processing, an
electronic catalogue of words and phrases is fed to the computer, enabling it to
understand the typical usage of grammatical elements. As more words and phrases
are added to this catalogue, the computer forms a firmer understanding of the
rules by which a particular language operates. Machine learning, on the other
hand, supplies statistical data from which the computer learns to generalize
from examples. While typical translation software translates phrases from one
language to another, IIIT's research focuses on the dependency of each word on
its neighbor to build a pattern of usage for each language. This takes care of
dialects, localized modifications of the language and slang. In broad terms,
this real-time language translation application works on a three-step
methodology: analyze, transfer and generate.
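
To make the three steps concrete, here is a minimal sketch of an
analyze-transfer-generate pipeline. It is an illustration only and not the
IIIT system: the four-word romanized Hindi-to-Punjabi lexicon, the tag names
and every function name are assumptions made for this example, and a real
translator would draw on far richer grammar rules and statistical models.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (romanized Hindi -> romanized Punjabi).
# Each entry maps a source word to (part-of-speech tag, target word).
LEXICON: Dict[str, Tuple[str, str]] = {
    "main": ("PRON", "main"),
    "paani": ("NOUN", "paani"),
    "peeta": ("VERB", "peenda"),
    "hoon": ("AUX", "haan"),
}

Token = Tuple[str, str]  # (word, tag)


def analyze(sentence: str) -> List[Token]:
    """Step 1: split the source sentence and tag each word."""
    tokens: List[Token] = []
    for word in sentence.lower().split():
        tag = LEXICON[word][0] if word in LEXICON else "UNK"
        tokens.append((word, tag))
    return tokens


def transfer(tokens: List[Token]) -> List[Token]:
    """Step 2: swap each known source word for its target equivalent.

    Unknown words pass through unchanged.
    """
    out: List[Token] = []
    for word, tag in tokens:
        target = LEXICON[word][1] if word in LEXICON else word
        out.append((target, tag))
    return out


def generate(tokens: List[Token]) -> str:
    """Step 3: assemble the target-language sentence."""
    return " ".join(word for word, _ in tokens)


def translate(sentence: str) -> str:
    """Run the three steps in order."""
    return generate(transfer(analyze(sentence)))


print(translate("main paani peeta hoon"))  # prints: main paani peenda haan
```

Keeping the three stages as separate functions means the analysis of a source
language and the generation of a target language can each be written once and
recombined for different language pairs.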