
Indian Institute of Information Technology, Allahabad: Content Plagiarism Detection

PCQ Bureau

07 May 2008 05:49 IST


We all face a deluge of information: web pages, email, books, magazines,
research papers and what not. Ever since the Internet evolved, sharing content
across networks has been a breeze, and it's growing at a whopping 33% annually.
While all this has helped knowledge sharing, it has also had its share of
downsides. It's just a matter of time before a PhD student gathers info on his
subject and passes it off as original research to an unsuspecting professor.
Software to detect plagiarism from websites or an intranet has been available
for years. Most of these tools only compare strings of words to check for
similarities and do not consider semantics before labeling content as
plagiarized. So an offender could bypass them by using synonyms or changing the
tense of sentences.

This project was funded by the Ministry of Communications and IT to set up a
Patent Referral Center at the institute, to detect and discard untenable
patents. The software has an in-built dictionary that stores pre-computed
hashes of the synonyms of all common English words. It compares a document with
similar documents on the Internet and in any soft repository that may be
assigned, and labels it as plagiarized if the similarity crosses a certain
percentage.
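
To make the idea concrete, here is a minimal sketch of synonym-aware comparison. It is not the institute's actual code, which the article does not describe in detail. It assumes a tiny hand-made synonym table, word trigrams as the unit of comparison, Jaccard overlap as the similarity measure and a 50% threshold; all function names and values are hypothetical. It does not handle changes of tense, which would need an extra stemming or lemmatization step.

```python
import hashlib
import re

# Hypothetical synonym groups; a real dictionary would cover all common words.
SYNONYM_GROUPS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
    {"car", "automobile"},
]


def word_hash(word):
    """Short, stable hash of a single word."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()[:8]


# Pre-computed lookup: every word in a group maps to the same hash,
# so swapping one synonym for another leaves the hashed text unchanged.
SYNONYM_HASH = {}
for group in SYNONYM_GROUPS:
    group_hash = word_hash(min(group))  # alphabetically first word represents the group
    for member in group:
        SYNONYM_HASH[member] = group_hash


def canonical_tokens(text):
    """Lower-case the text, split it into words, replace each word by its hash."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SYNONYM_HASH.get(w, word_hash(w)) for w in words]


def shingles(tokens, n=3):
    """Set of overlapping n-word windows (assumed unit of comparison)."""
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(doc_a, doc_b, n=3):
    """Jaccard overlap of the two documents' shingle sets, from 0.0 to 1.0."""
    a = shingles(canonical_tokens(doc_a), n)
    b = shingles(canonical_tokens(doc_b), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_plagiarized(candidate, source, threshold=0.5):
    """Flag the candidate if similarity reaches the (assumed) threshold."""
    return similarity(candidate, source) >= threshold


original = "The quick car crossed the big bridge"
reworded = "The fast automobile crossed the large bridge"
unrelated = "Rainfall patterns vary across the northern plains"

print(is_plagiarized(reworded, original))
print(is_plagiarized(unrelated, original))
```

A plain string comparison would treat the reworded sentence as new text because three of its seven words differ. Here each word is mapped to its synonym group's hash before comparison, so the substitution has no effect on the result.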

Project Specs

Project Head:
Prof R C Tripathi,
Dean (R&D) & Head (IPRs)

Deployment Location:
Allahabad

Team Size: 6

Tech Used:
C++, PHP, MySQL and Yahoo Web API

Intended audience: Universities, research institutions, publishing houses,
patent offices and legal professionals

Project status: The system has already been deployed at IIIT Allahabad to
screen research papers for the last two International Conferences on 'Wireless
Communications and Sensor Networks'

Implementation Partner:
In-house
