
Build your Own News-puller 

PCQ Bureau

Continuing our discussion of Rich Site Summary from last month's issue (An Alternative to Mailing Lists, page 46, PCQuest October 2003), we will look at how you can use tools available on the Internet and Perl modules to generate RSS for your own website, and to aggregate content from other RSS feeds.


Internet tools



In the last issue, we saw how to hand-code an RSS file. You can also use RSS generators available on the Internet, such as those at www.blogstreet.com/rssecosystem.html, www.webreference.com/cgi-bin/perl/rssedit.pl and www.webdevtips.com/webdevtips/codegen/rss.shtml.

Perl modules



On these sites, you just fill in text boxes with the details you want in your RSS feed, and the site generates the XML-based RSS file for you. How do they do it? There are several ways, in various scripting languages such as Python, PHP and Perl. The popular Perl approach uses a dedicated module, XML::RSS, whose objects take care of both RSS generation and RSS aggregation. You can get more information about it at the project page, http://perl-rss.sourceforge.net/.
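To see both halves of XML::RSS in one place, here is a minimal sketch that builds a one-item feed in memory with as_string() and then reads it back with parse(), the in-memory counterpart of the parsefile() call used in the aggregator later on. The channel and item values (including the example.com URLs) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Generation: build a one-item RSS 1.0 feed in memory (illustrative values).
my $out = XML::RSS->new(version => '1.0');
$out->channel(
    title       => 'Demo channel',
    link        => 'http://www.example.com/',
    description => 'A feed built and read back by XML::RSS',
);
$out->add_item(title => 'Hello, RSS', link => 'http://www.example.com/hello');
my $xml = $out->as_string;

# Aggregation: parse the XML string into a fresh object and read its fields.
my $in = XML::RSS->new;
$in->parse($xml);
print "Channel: $in->{'channel'}{'title'}\n";
print "Item:    $_->{'title'} ($_->{'link'})\n" for @{ $in->{'items'} };
```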

So, to create your own RSS aggregator/generator, all you need are the relevant Perl modules, XML::Parser and XML::RSS. On a Linux box, you can get them from CPAN with:


# perl -MCPAN -e "install XML::Parser"
# perl -MCPAN -e "install XML::RSS"

On Windows, when we last checked, the latest ActiveState Perl (5.8.x, www.activestate.com) did not include XML::RSS, though it ships XML::Parser as a core module. You can still download XML::RSS from CPAN and install it with nmake. XML::RSS currently supports RSS versions 0.9, 0.91 and 1.0.
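If you're not sure the modules actually made it onto your system, a quick check before running anything else can help. This sketch uses only core Perl. module_version is our own hypothetical helper, not part of either module. It tries to load each module at run time and prints its $VERSION.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load a module at run time. Return its $VERSION (or 'unknown'
# if it defines none), or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::RSS -> XML/RSS.pm
    return unless eval { require $file; 1 };
    no strict 'refs';
    my $version = ${"${module}::VERSION"};
    return defined $version ? $version : 'unknown';
}

# Report on the two modules this article relies on.
for my $module (qw(XML::Parser XML::RSS)) {
    my $version = module_version($module);
    if (defined $version) {
        print "$module $version is installed\n";
    } else {
        print "$module is NOT installed\n";
    }
}
```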

RSS-generator code



Code that generates an RSS feed for a website always follows the same pattern. You'll find plenty of sample code on the SourceForge project page mentioned above. Here's one sample, which we modified to show how it works.


#!/usr/bin/perl -w

# Include the modules:
use strict;
use XML::RSS;

# Create an XML::RSS object that writes RSS 1.0:
my $rss = XML::RSS->new(version => '1.0');

# Then set up the RSS file's channel.
# The options shown below are not exhaustive.
$rss->channel(
    title       => 'All The News in the World',
    'link'      => 'http://www.allthenews.com/news.html/',
    description => 'Technology news etc.',
);

# Then add items using the add_item() method:
$rss->add_item(
    title       => 'American Award for Indian Lady Entrepreneur',
    'link'      => 'http://www.allthenews.com/news.html#1',
    description => 'Veena Gunduvalli, founder and chief strategic officer of Emagia, was awarded the "Trailblazer of the Year 2002" award by the Forum for Women Entrepreneurs (FWE).',
);





The module also provides the optional image() and textinput() methods. Use image() to associate an image with your RSS feed.

Use textinput() to describe a small form that is shown to the reader: its title, a description, the name of the text input box, and the link (the action URL) that the form is submitted to when the reader clicks the submit button.





$rss->image(
    title => 'All The News',
    url   => 'http://www.news.perl.org/all.gif',
);

$rss->textinput(
    title       => 'Search',
    description => 'Search All The News',
    name        => 'text',
    'link'      => 'http://www.allthenews.com/search.cgi',
);
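To see what a reader's side might do with the textinput data, here is a minimal sketch that turns the four fields into an HTML form. It assumes the form is submitted with GET and that the fields arrive in a hash shaped like the one passed to textinput() above. textinput_form is a hypothetical helper, and it does not HTML-escape the values, so use it only with feeds you trust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an HTML form from textinput data: the text box is named after
# 'name', and the form is submitted (assumed: via GET) to 'link'.
sub textinput_form {
    my ($ti) = @_;
    return sprintf(
        qq{<form method="get" action="%s">%s <input type="text" name="%s"> <input type="submit" value="%s"></form>\n},
        $ti->{'link'}, $ti->{'description'}, $ti->{'name'}, $ti->{'title'},
    );
}

print textinput_form({
    title       => 'Search',
    description => 'Search All The News',
    name        => 'text',
    link        => 'http://www.allthenews.com/search.cgi',
});
```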
Finally, save the file:

$rss->save("perl_gen.rss");


The # comments in the code explain what each part does. Assuming you save this program as perl_gen.pl (you can call it anything you like), run it as shown below. It generates a valid RSS 1.0 file and saves it as perl_gen.rss in the directory you run it from:

# perl perl_gen.pl
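On a real site, items usually come from a list (a database query, a directory of articles) rather than being typed into one add_item() call at a time. Here is a minimal sketch, assuming a hypothetical @stories array of hashes with title, link and description keys. Instead of writing a file, it prints the feed with as_string(), which is handy if you want to send the feed to standard output or a CGI response.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical story list; in practice this might come from a database.
my @stories = (
    { title => 'First story',  link => 'http://www.allthenews.com/news.html#1',
      description => 'Summary of the first story' },
    { title => 'Second story', link => 'http://www.allthenews.com/news.html#2',
      description => 'Summary of the second story' },
);

# Build an RSS 1.0 feed from a list of story hashes; return it as a string.
sub build_feed {
    my @items = @_;
    my $rss = XML::RSS->new(version => '1.0');
    $rss->channel(
        title       => 'All The News in the World',
        link        => 'http://www.allthenews.com/news.html',
        description => 'Technology news etc.',
    );
    # One add_item() call per story hash (title, link, description pairs).
    $rss->add_item(%$_) for @items;
    return $rss->as_string;
}

print build_feed(@stories);
```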

That covers generating RSS. You can also use this module to aggregate feeds, and that is just as simple.


RSS aggregator code



Try this code out.

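#!/usr/bin/perl -w

# What to include
use strict;
use XML::RSS;

# Store the name of an existing RSS file, given on the command line
my $file = shift or die "Usage: $0 file.rss\n";

# Define an instance of the XML::RSS object and parse the file into it
my $rss = XML::RSS->new;
$rss->parsefile($file);
print_html($rss);

# Subroutine: print the channel title and its items as an HTML page
sub print_html
{
    my $rss = shift;
    print <<HTML;
<html>
<head>
<title>$rss->{'channel'}->{'title'}</title>
</head>
<body>
<h2><a href="$rss->{'channel'}->{'link'}">$rss->{'channel'}->{'title'}</a></h2>
<ul>
HTML

    # Print the channel items
    foreach my $item (@{$rss->{'items'}})
    {
        next unless defined($item->{'title'}) &&
                    defined($item->{'link'});
        print "<li><a href=\"$item->{'link'}\">$item->{'title'}</a></li>\n";
    }
    print "</ul>\n</body>\n</html>\n";
}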
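The aggregator prints feed titles and links straight into HTML. If a title contains & or <, the generated page can break. Here is a minimal sketch of an escaping helper that uses only core Perl. html_escape and item_html are hypothetical names, and the example.com link is made up. In the aggregator, you would wrap $item->{'title'} and $item->{'link'} with html_escape() before printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape characters that are special in HTML so feed text is shown
# literally instead of being interpreted as markup.
sub html_escape {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Format one item the way the aggregator does, but with escaping.
sub item_html {
    my ($item) = @_;
    return sprintf qq{<li><a href="%s">%s</a></li>\n},
        html_escape($item->{'link'}), html_escape($item->{'title'});
}

print item_html({ title => 'Q&A: <b>tips</b>', link => 'http://example.com/?a=1&b=2' });
```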








When you run the above script, you must provide the RSS file as a command-line argument. Assuming you saved the script as perl_rss_agg.pl, you can store its output in an HTML file like this:

# perl perl_rss_agg.pl mysite.rss > mysite_agg.html
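A news-puller usually reads several feeds at once. The sketch below applies the same parsefile() approach to any number of RSS files named on the command line. It tags each headline with its channel's title and skips files that fail to parse instead of aborting. collect_items is a hypothetical helper, and the plain-text output format is our own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Parse each RSS file and return a list of
# [channel title, item title, item link] entries.
sub collect_items {
    my @files = @_;
    my @collected;
    for my $file (@files) {
        my $rss = XML::RSS->new;
        # parsefile() dies on a missing or malformed file; skip it instead.
        unless (eval { $rss->parsefile($file); 1 }) {
            warn "Skipping $file: $@";
            next;
        }
        my $source = $rss->{'channel'}{'title'} // '(untitled)';
        for my $item (@{ $rss->{'items'} }) {
            next unless defined $item->{'title'} && defined $item->{'link'};
            push @collected, [ $source, $item->{'title'}, $item->{'link'} ];
        }
    }
    return @collected;
}

print "Usage: $0 feed1.rss [feed2.rss ...]\n" unless @ARGV;
for my $entry (collect_items(@ARGV)) {
    my ($source, $title, $link) = @$entry;
    print "[$source] $title\n    $link\n";
}
```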

So there you have it: a neat way to generate and aggregate RSS feeds in good old Perl.

Shruti Pareek
