Friday, November 21, 2008  
Filtering Focused Information

Continued from Page 1

Authorities and hubs

Even though the Web’s link structure can reveal notions of authority, we cannot simply apply text-based methods to collect many potentially relevant pages and then comb them for the most authoritative ones. For example, if we were to look for the Web’s main search engines, we would err badly if we searched only for "search engines". Although the set of pages containing this term is enormous, it doesn’t contain most of the natural authorities we would expect to find, such as Yahoo!, Excite, Infoseek and AltaVista. Similarly, we can’t expect Honda’s or Toyota’s home pages to contain the words "Japanese automobile manufacturers", nor Microsoft’s or Lotus’ home pages to contain the words "software companies".

This difficulty arises mainly because many links lack semantic content. That is, although most links represent the type of endorsement we seek (for example, a software engineer whose home site links to Microsoft and Lotus), others are created for reasons that have nothing to do with conferring authority. Some links exist purely for navigational purposes: "Click here to return to main menu". Others serve as paid advertisements: "The vacation of your dreams is only a click away".

The question then arises: how do we model the way in which authority is conferred on the Web? Clearly, when commercial or competitive interests are at stake, most organizations will perceive no benefit from linking directly to another one. For example, AltaVista, Excite, and Infoseek may all be authorities for the topic "search engines", but will be unlikely to endorse one another directly.

If, as in the above example, the major search engines don’t explicitly describe themselves as authorities, how can we determine that they are indeed the most authoritative pages on search engines? We can say that they are authorities because many relatively anonymous pages, clearly relevant to "search engines", link to AltaVista, Excite, and Infoseek. Such "anonymous" pages are hubs that link to a collection of prominent sites on a common topic.

Hub pages appear in a variety of forms, ranging from professionally assembled resource lists on commercial sites to lists of recommended links on individual home pages. These pages need not be prominent themselves, or even have any links pointing to them. Their distinguishing feature is that they confer authority on a focused topic. In this way, they actually form a symbiotic relationship with authorities. Thus, we can say that a good authority is a page pointed to by many good hubs, while a good hub is a page that points to many good authorities.
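The circular definition above can be turned into a simple iteration: start every page with the same score, then repeatedly recompute authority scores from hub scores and hub scores from authority scores. The following is only a minimal sketch under stated assumptions, not the exact procedure of any particular search engine. The function names, the dictionary input format, the fixed iteration count and the normalization step are all illustrative choices that do not come from the article.

```python
import math


def normalize(scores):
    """Scale scores so their squares sum to 1 (an assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hub_authority_scores(links, iterations=20):
    """links maps each page to the pages it points to (hypothetical format)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalize(auth)
        # A good hub points to many good authorities.
        hub = normalize({
            page: sum(auth[t] for t in links.get(page, ()))
            for page in pages
        })
    return hub, auth


# Toy graph: three anonymous resource lists linking to search engines.
links = {
    "list1": ["AltaVista", "Excite", "Infoseek"],
    "list2": ["AltaVista", "Excite"],
    "list3": ["AltaVista"],
}
hub, auth = hub_authority_scores(links)
print(sorted(auth, key=auth.get, reverse=True)[:3])
print(max(hub, key=hub.get))
```

In this toy graph none of the search engines links to another, yet AltaVista receives the highest authority score because every list points to it, and the list that points to the most authorities receives the highest hub score.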

This relationship between authorities and hubs is central to exploring link-based methods of search, automated compilation of high-quality Web resources, and discovery of cohesive Web communities.

HITS: Computing authority and hub scores

The Hyperlink-Induced Topic Search (HITS) algorithm by Jon Kleinberg, which is the backbone of Clever search, computes lists of authorities and hubs for Web search topics. Beginning with a search topic specified by one or more query terms, HITS creates a focused sample of several thousand Web pages likely to be rich in relevant authorities, and then estimates a hub weight and an authority weight for each page in that sample.
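This page does not spell out how the focused sample is assembled. One commonly described approach is to take the pages that match the query text as a starting set and then add the pages they link to and some of the pages that link to them. The sketch below assumes that approach. The function name, the `out_links` and `in_links` lookup tables and the `max_in_links` cap are hypothetical stand-ins for whatever a real crawler or index would provide.

```python
def build_focused_sample(root_pages, out_links, in_links, max_in_links=50):
    """Expand text-matched root pages into a link graph over a larger sample."""
    base = set(root_pages)
    for page in root_pages:
        # Pages the root page points to may be authorities.
        base.update(out_links.get(page, ()))
        # Pages pointing at the root page may be hubs; cap how many we take.
        base.update(list(in_links.get(page, ()))[:max_in_links])
    # Keep only links whose two endpoints are both in the sample.
    return {
        page: [t for t in out_links.get(page, ()) if t in base and t != page]
        for page in base
    }


# "list1" matches the query text; the search engines themselves do not.
root_pages = ["list1"]
out_links = {
    "list1": ["AltaVista", "Excite"],
    "fan": ["list1"],
    "elsewhere": ["unrelated"],
}
in_links = {"list1": ["fan"]}
sample = build_focused_sample(root_pages, out_links, in_links)
print(sorted(sample))
```

The point of the expansion is that AltaVista and Excite enter the sample through a link from a matching page, even though their own text would not match the query.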

Such a technique can uncover Web communities, defined by a specific interest, which even a human-assisted search engine like Yahoo! may overlook.

