
Indian Institute of Information Technology, Allahabad: Content Plagiarism Detection

PCQ Bureau

07 May 2008 05:49 IST


We all face a deluge of information around us: web pages, email, books, magazines, research papers and what not. Ever since the Internet evolved, sharing content across networks has been a breeze, and it is growing at 33% annually. While all this has helped in knowledge sharing, there has also been a fair share of spoils. It is only a matter of time before a PhD student gathers information on a subject and passes it off as original research to an unsuspecting professor. Software to detect plagiarism from websites or the intranet has been available for a long time. Most of these tools only compare strings of words to check for similarities and do not check for semantics before labeling the content as plagiarized. So, an offender could bypass them by using synonyms or changing the tenses of sentences.

This project was funded by the Ministry of Communications and IT to set up a Patent Referral Center at the institute, to detect and discard untenable patents. The software has an in-built dictionary that stores the pre-computed hashes of synonyms of all common words in English. It compares a submitted document with similar documents on the Internet and in any soft repository that may be assigned, and labels the document as plagiarized if it crosses a certain percentage of similarity.
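The synonym-hash idea can be illustrated with a minimal sketch. It is not the institute's implementation, which the article does not describe in detail. The synonym groups, the function names, the use of three-word shingles with Jaccard overlap, and the 50% threshold are all assumptions made for illustration. The sketch also handles only synonym substitution, not changes of tense.

```python
import hashlib
import re

# Hypothetical toy synonym groups; the real dictionary covers common English words.
SYNONYM_GROUPS = [
    {"big", "large", "huge"},
    {"fast", "quick", "rapid"},
    {"car", "automobile"},
]


def word_hash(word: str) -> str:
    """Short, stable hash of a single word."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()[:8]


# Pre-compute one shared hash per synonym group, so that every word in a
# group maps to the same code (assumed design, for illustration only).
SYNONYM_HASH = {}
for group in SYNONYM_GROUPS:
    group_code = word_hash(min(group))  # alphabetically first word represents the group
    for member in group:
        SYNONYM_HASH[member] = group_code


def fingerprint(text: str, n: int = 3) -> set:
    """Return the set of n-word shingles of text, built from synonym-normalized hashes."""
    words = re.findall(r"[a-z]+", text.lower())
    codes = [SYNONYM_HASH.get(w, word_hash(w)) for w in words]
    return {tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two fingerprints, between 0.0 and 1.0."""
    fa, fb = fingerprint(a, n), fingerprint(b, n)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def is_plagiarized(candidate: str, source: str, threshold: float = 0.5) -> bool:
    """Flag the candidate when similarity reaches the (assumed) threshold."""
    return similarity(candidate, source) >= threshold
```

With these toy groups, "The quick automobile is huge" and "The fast car is big" produce identical fingerprints, so a plain string comparison would miss the overlap while the hashed comparison flags it.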

Project Specs

Project Head: Prof R C Tripathi, Dean (R&D) & Head (IPRs)

Deployment Location: Allahabad

Team Size: 6

Tech Used: C++, PHP, MySQL and Yahoo Web API

Intended audience: Universities, research institutions, publishing houses, patent offices and legal professionals

Project status: The system has already been deployed at IIIT Allahabad to screen research papers for the last two International Conferences on 'Wireless Communications and Sensor Networks'

Implementation Partner: In-house
