You've got a great idea for a new application, on the web or on the desktop,
and you're starting to think about the various features and functionality it
will provide. More than likely, a big part of your idea is original, but various
features you want to implement already exist out on the web as part of another
website. For instance, perhaps you want to create a new web application called
Develevent, a site where developers can find technology events around the world
and get all the information they need to plan their trips to those events. At
its core, this web application should provide the basic functionality of listing
details about a given event and searching for events. Additionally, for any
given event, you may also want to incorporate features that show the user hotels
that are available for the nights around the event dates, flights to the nearest
airport from the user's home airport, historical weather information for the
city of the event at the given time of year, and so on. These features will be
valuable to your users, but they are not your core business, and it is not truly
feasible for you to become a hotel availability aggregator or a historical
weather data storehouse.
Applies To: Web app
Luckily for you, the web is rich with companies and websites whose core
business it already is to provide such functionality. As users of the web, we
often see these sites as silos: to access their content, we have to visit the
site itself. However, with open APIs, RSS feeds, and services like Dapper and
Yahoo! Pipes, it is becoming easier for developers to incorporate content and
functionality from existing websites into their new creations while focusing on
the core functionality they are best suited to provide. With affiliate networks
such as Commission Junction, you can
even earn money when your users engage with and purchase from the sites whose
content and functionality you are incorporating into your own.
Let's take a closer look at the various ways to obtain content that you can
incorporate into your new application.
Look for APIs
If a website whose content you want to use has an API (Application Programming
Interface), this is your best bet. APIs give you structured access to a
website's content and allow you to programmatically replicate certain
functionality. For instance, Amazon provides many APIs, including one that allows
you to search for albums by a specific artist and filter and sort the results by
various criteria, such as number of ratings and release date. This allows you to
easily incorporate Amazon's content and functionality directly into your
application. For more about Amazon's web services, check out
http://tinyurl.com/2xyfhj.
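To make this concrete, here is a minimal sketch of calling a search API and
parsing its JSON response with Python's standard library. The endpoint
https://api.example.com/albums/search, its artist and sort parameters, and the
response shape are all hypothetical. Amazon's real web services use their own
request format and authentication, so consult their documentation. URL building
and parsing are kept separate from the network call so you can try them without
a live service.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: a real provider documents its own URL and parameters.
API_BASE = "https://api.example.com/albums/search"


def build_search_url(artist: str, sort: str = "release_date") -> str:
    """Encode the search criteria as query-string parameters."""
    query = urllib.parse.urlencode({"artist": artist, "sort": sort})
    return f"{API_BASE}?{query}"


def parse_albums(body: str) -> list[dict]:
    """Extract title/year pairs from a JSON body shaped like
    {"albums": [{"title": ..., "year": ...}, ...]} (an assumed format)."""
    data = json.loads(body)
    return [{"title": a["title"], "year": a["year"]} for a in data.get("albums", [])]


def search_albums(artist: str) -> list[dict]:
    """Fetch and parse results from the (hypothetical) live API."""
    with urllib.request.urlopen(build_search_url(artist), timeout=10) as resp:
        return parse_albums(resp.read().decode("utf-8"))
```

For example, build_search_url("Miles Davis") produces
https://api.example.com/albums/search?artist=Miles+Davis&sort=release_date,
and your application would then display whatever parse_albums returns.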
RSS/Atom feeds
If an API is not available, you may be able to use an RSS feed; many websites
offer RSS feeds of their content. The biggest drawback of RSS feeds (as opposed
to APIs) is that they usually provide a website's latest content in date order.
Often, a site offers no feed that accepts a parameter, such as a search term or
an argument indicating how to sort the
results. For example, a hotel site may provide an RSS feed of its best deals
this week, but you cannot tell it to give you hotel availability in a specific
city on a specific date. However, if a site does have an RSS feed with the
content you need, this is a good and easy solution.
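Because an RSS 2.0 feed is plain XML, Python's standard library is enough to
read one. The sketch below extracts each item's title, link and publication
date. The sample feed is made up for illustration. A production reader, or a
library such as feedparser, would also have to cope with malformed feeds and
other RSS variants.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text: str) -> list[dict]:
    """Return title, link and pubDate for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests items as <rss><channel><item>...</item></channel></rss>.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


# A hypothetical "best deals" feed like the hotel example above.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Hotel Deals</title>
    <item>
      <title>50% off in Lisbon</title>
      <link>https://hotels.example.com/deals/1</link>
      <pubDate>Mon, 02 Jun 2008 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""
```

Calling parse_rss_items(SAMPLE_FEED) returns a one-element list whose title is
"50% off in Lisbon". Note that the channel's own title is not mistaken for an
item.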
Expedia is a good example of a site that provides excellent RSS feeds with
parameterization. Check out its RSS feeds here: http://tinyurl.com/5wtjlb. For
an example of a feed for New York, see http://tinyurl.com/5cm4ph.
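A parameterized feed behaves much like a lightweight API: you put the user's
criteria into the feed URL and parse what comes back. The sketch below does this
for an Atom feed. The base URL https://travel.example.com/feeds/hotels and its
city and checkin parameters are hypothetical, because each site defines its own.
The Atom namespace is the standard one, and the parser needs it to find
elements.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Standard Atom namespace; ElementTree needs it to match Atom element names.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
# Hypothetical parameterized feed URL.
FEED_BASE = "https://travel.example.com/feeds/hotels"


def build_feed_url(city: str, checkin: str) -> str:
    """Turn the user's criteria into query parameters on the feed URL."""
    return FEED_BASE + "?" + urllib.parse.urlencode({"city": city, "checkin": checkin})


def parse_atom_entries(xml_text: str) -> list[dict]:
    """Return the title and link href of every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "href": link.get("href", "") if link is not None else "",
        })
    return entries


# Hypothetical response for a New York query.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hotels in New York</title>
  <entry>
    <title>Midtown Hotel</title>
    <link href="https://travel.example.com/hotels/42"/>
  </entry>
</feed>"""
```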
What about a site that doesn't have a good feed? Or doesn't have one at
all?
If you find yourself wanting to use content and functionality from a website
that doesn't provide an API or a feed, or whose web services don't meet your
needs, you still have some choices. First of all, many sites have APIs that
they expose only to their partners. Others make them available through
affiliate networks, so joining such a network can give you access.
Parsing & Storing/Crawling
You access an API by programmatically requesting a URL and downloading the
response. API responses are generally XML or JSON, and RSS and Atom feeds are
XML. This allows you, as a developer, to easily parse the
content and do what you want with it. Depending on your application, there are a
few ways in which you can access and work with this structured content.
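Because responses arrive as either JSON or XML, a small helper can choose the
parser from the HTTP Content-Type header. The sketch below assumes that servers
label their responses accurately, which is not always true in practice. The
function name parse_payload is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(body: str, content_type: str):
    """Parse an API or feed response according to its Content-Type header.

    Returns a dict/list for JSON and an Element for XML (RSS, Atom, or
    XML API responses). Raises ValueError for anything else.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    # application/rss+xml and application/atom+xml both end in "+xml".
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return ET.fromstring(body)
    raise ValueError(f"unsupported content type: {content_type}")
```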
One common method is to access and parse the content on the fly as the result
of a user action. For example, let's say you provide a feature that allows your
users to search for hotels in a specific city. When the user provides the city
and dates of interest, your code would access the APIs of one or more hotel
sites, passing the city and dates, to get availability, parse the returned
results, and display them to the user immediately. This approach is most useful
when you need very fresh data and when you cannot anticipate the values your
users are going to search for. The downsides of this method are that there is
latency associated with accessing the API (i.e., your user must wait) and that
your site generates a lot of traffic to the remote website's API. Also, if the
remote site goes down, your users will not have access to this feature.
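Here is a minimal sketch of the on-the-fly approach. It assumes each hotel site
is wrapped in a function that takes (city, checkin, checkout) and returns a list
of offers. Those functions are hypothetical stand-ins for real API clients. The
sources are queried in parallel under a time limit, so one slow or unavailable
site degrades the results instead of stalling the whole page.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

# A "source" takes (city, checkin, checkout) and returns a list of offers.
# Real sources would call a hotel site's API; these are placeholders.
Source = Callable[[str, str, str], list]


def search_hotels(sources: dict[str, Source], city: str, checkin: str,
                  checkout: str, timeout: float = 3.0) -> tuple[list, list[str]]:
    """Query every source concurrently; return (offers, names of failed sources).

    Offers come back in no particular order. A source that raises, or that
    does not finish within `timeout` seconds, is reported as failed.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, city, checkin, checkout): name
               for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout)
    # Don't block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    offers, failed = [], [futures[f] for f in not_done]
    for f in done:
        try:
            offers.extend(f.result())
        except Exception:
            failed.append(futures[f])
    return offers, sorted(failed)
```

The list of failed sources lets the page tell the user that some results may be
missing, rather than silently showing fewer hotels.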
To address the downsides of the on-the-fly method, as well as to allow
greater flexibility with how you filter, sort, and modify the returned data,
your code can access the content from an API 'offline.' This means that you have
some process that runs regularly (not as a result of some user's action) to
download the data from a website's API or feed and then store it for access
later. The most common places to store such data are a database (such as MySQL)
or a search engine (such as Lucene/Solr), but you could even store the XML files
directly on disk and access them locally.
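Here is a minimal sketch of the offline approach. It uses Python's built-in
sqlite3 module as a stand-in for a server database such as MySQL. The events
table schema is illustrative, and fetch is a hypothetical placeholder for
whatever downloads and parses the remote API or feed. A scheduler such as cron
would call refresh_events periodically. User requests only ever read the local
copy.

```python
import sqlite3
from typing import Callable, Iterable

# Each record is (event_id, title, city, start_date); the schema is illustrative.
Record = tuple[str, str, str, str]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS events (
                        id TEXT PRIMARY KEY,
                        title TEXT NOT NULL,
                        city TEXT NOT NULL,
                        start_date TEXT NOT NULL)""")


def refresh_events(conn: sqlite3.Connection,
                   fetch: Callable[[], Iterable[Record]]) -> int:
    """Download records via `fetch` and upsert them; return how many were stored.

    Intended to run on a schedule (e.g. from cron), not per user request.
    """
    rows = list(fetch())
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO events (id, title, city, start_date) "
            "VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def events_in_city(conn: sqlite3.Connection, city: str) -> list[tuple[str, str]]:
    """Answer a user query from the local copy, with no remote call or latency."""
    cur = conn.execute(
        "SELECT title, start_date FROM events WHERE city = ? ORDER BY start_date",
        (city,))
    return cur.fetchall()
```

Keying rows on the remote site's event ID means that re-running the job updates
existing events instead of duplicating them.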
Earning money
Regardless of whether you choose to build your website as a business or just a
personal project, it never hurts to earn a little revenue from your efforts.
While you're unlikely to generate a lot of revenue unless your application is
heavily used, it can still be valuable to integrate with 'affiliate programs' of
websites whose content you are using. Many websites that sell directly to
consumers participate in affiliate programs. These programs share a portion of
the revenue with application developers who send consumers to the site, whenever
those consumers make a purchase. So, for instance, if you incorporate a feature
into your
application that allows users to search for hotels and a user subsequently books
a hotel that you sent them to, then you could receive a commission on the sale.
Affiliate networks work by providing you with an 'affiliate ID', which you
then pass as a parameter in the URL when you send a user to the website selling
a product. The network tracks any sales from that user and associates the
revenue share with your affiliate account. Revenue share can range from pennies
to large
sums depending on the type of item and size of sale.
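The mechanics of tagging an outbound link can be sketched as adding one query
parameter while leaving the rest of the URL intact. The parameter name affid
used here is hypothetical. Each affiliate program or network specifies its own
link format, and some use redirect links instead, so follow your network's
instructions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_affiliate_id(url: str, affiliate_id: str, param: str = "affid") -> str:
    """Return `url` with the affiliate ID added as a query parameter.

    Existing query parameters are preserved; any existing value for `param`
    is replaced. The default parameter name is hypothetical.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, affiliate_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, add_affiliate_id("https://hotels.example.com/book?hotel=42",
"dev123") yields https://hotels.example.com/book?hotel=42&affid=dev123.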
Legal issues
Whenever you work with someone else's content, you should be aware of any legal
implications that your actions may have. Regardless of the method you use to
obtain the content, make sure you refer to the website's terms of use. Unless
explicitly stated otherwise in the terms of use, you will need the direct
permission of the website owner to use his or her content.
Conclusion
As a developer, you have the entire web at your disposal when building your
applications. By harnessing the content and functionality already available on
the web, you can focus on what you do best while still providing many rich
features. The websites whose content you use benefit from exposure to a wider
audience, and your users' experience is enriched.
Jon Aizen, co-founder and CTO, Dapper