January 2, 2013



It’s late at night and you’re bored. The television is devoid of entertainment, which is fairly typical. You’re in the mood for a movie anyway. The latest release has great reviews, but you’re still not sure it lives up to your high standards, so you call a friend who watched it recently. Once it passes that litmus test, you head online and purchase the movie. The movie is engaging and you have a wonderful time.



How is this relevant to your online experience? Online services like Amazon and Netflix make a living acting as that friend, ostensibly helping you out by recommending things to purchase along the way. When you purchase the movie, that purchase is stored and processed, and it feeds the recommendations served to you and to other customers. The better the recommendations, the more likely you are to follow them and buy the product (at least in theory). In any case, your overall online experience is enhanced and you’re pleased with their astute inferences.


This innocuous recommendation feature is in reality powered by sophisticated algorithms and data-crunching machines that reside in data centers such as Amazon’s. Companies spend a great deal of time refining these algorithms.


There are various ways one might implement such a feature. A company might look for users who are similar to you and recommend what they bought. Alternatively, it might identify items that are similar or correlated to each other and recommend those.


One popular algorithm to match similar items (very basic and naive) is outlined below:


for each item I1
    for each customer C who bought I1
        for each item I2 bought by customer C
            record that a customer bought both I1 and I2
    for each item I2
        calculate similarity(I1, I2)
return table


Basically, for each item, the other items that the same customers also bought are recorded in a table. This is done for all items, and the recorded purchases are then used to calculate a similarity rating between pairs of items. Each item can be treated as a vector of the customers who bought it, and a measure such as cosine similarity takes two of these item vectors (I1 and I2, for example) as input and produces a similarity rating. Billions of records are processed this way, and all of this heavy processing is done offline in data centers. When you click on an item, Amazon refers to these tables to determine which items to recommend to you. The lookup is relatively fast; building the tables is the slow part.
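To make the idea concrete, here is a minimal Python sketch of the table-building step and the lookup step. It is only an illustration under some assumptions of mine, not how Amazon actually implements it: purchases are held in a small in-memory dictionary, each item vector is binary (a customer either bought the item or did not), and the names `build_similarity_table` and `recommend` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_similarity_table(purchases):
    """Build an item-to-item similarity table (the slow, offline step).

    `purchases` maps a customer id to the set of item ids that customer
    bought. This input shape is an assumption made for the sketch.
    """
    # Invert the data: item -> set of customers who bought it.
    # This set plays the role of the item's (binary) vector.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = defaultdict(dict)
    for i1, customers in buyers.items():
        # Count how many customers bought both i1 and each other item i2.
        bought_both = defaultdict(int)
        for c in customers:
            for i2 in purchases[c]:
                if i2 != i1:
                    bought_both[i2] += 1
        # Cosine similarity for binary vectors:
        # (customers who bought both) / sqrt(buyers of i1 * buyers of i2)
        for i2, both in bought_both.items():
            table[i1][i2] = both / sqrt(len(customers) * len(buyers[i2]))
    return table


def recommend(table, item, n=3):
    """Look up the n most similar items (the fast, online step)."""
    ranked = sorted(table.get(item, {}).items(),
                    key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in ranked[:n]]


# Made-up sample data, purely for illustration.
purchases = {
    "alice": {"movie_a", "movie_b"},
    "bob": {"movie_a", "movie_b", "movie_c"},
    "carol": {"movie_a", "movie_c"},
    "dave": {"movie_b"},
}
table = build_similarity_table(purchases)
print(recommend(table, "movie_a"))  # ['movie_c', 'movie_b']
```

Notice that `recommend` only reads from the precomputed table, which is why the lookup stays fast even though building the table means looping over every item and every customer who bought it.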


It’s interesting how such a seemingly simple “customers who bought this also bought this” feature is backed by so much research and complexity. In a world where customer attention is king, every competitive advantage counts. So the next time you get a recommendation online, think of the lengths these companies go to in order to bring it to you. Don’t feel guilty, though: the service is not free, because you give them your information to work with too.


Dhruv Gairola is a Software Engineer at Bibliocommons Inc, Toronto. He enjoys building web and mobile applications in Android & iOS. His research interests include machine learning, scalability and multi-agent systems. Additionally, he is an avid drummer and a proficient Scrabble player! He blogs at http://dhruvgairola.blogspot.com
