Developers

REDIS: the Data Structure Server for your Cloud

PCQ Bureau

02 May 2011 10:25 IST

New Update

REDIS is an advanced key-value store that has a number of useful structures that directly help in writing several class of distributed computing applications. In this article, we will focus on REDIS, its architecture and the data model. In the subsequent issue, we will dig deep on how REDIS can be used from a .NET client.

Traditionally, REDIS is actually more than a data structure server and sometimes it is also referred to as the remote data structure server. It has a simplified and clean implementation of Hash Table, Doubly Link List, Un-Sorted and Sorted Sets. The core server itself is a Hash Table server where the keys are strings and values could be of any types from the above data structures. That is, in REDIS, you have a key where values can be a list or a sorted set or a Hash Table. And like the usual data structure operations, you have ample operations that are provided to manipulate the remote structure. Working with REDIS is very simple. If you have not installed REDIS, go forth and grab the source. The REDIS server includes a group of databases and it's default configuration supports up to 16 of them. By default when you connect to the REDIS server, you always connect to the database numbered '0'. To override and use a different database, a simple 'SELECT' command is used.

Applies to: Cloud Storage, Distributed Storage Facilities.

USP:Data Structures Server, Atomic Operations, Extremely Lightwieght, Easy to configure and horizantally scale applications to a considerable extent.

Primary Link: http://redis.io

In this example, I created a key with 'key1' and value as 'hello world'. To create the key, we use SET and to retrieve the value we use GET. According to REDIS, keys are always strings and they are binary safe. Binary safe is a class of string functions which doesn't decipher any special character in a string and hence in REDIS you can have keys of any type.White space, null, carriage return, etc can also be included in a key. Thus 'key1', 'key/crlf1', etc are allowed. When you try to enter a new key-value pair, REDIS server first looks for the key in the hash table; if at all it exists, the new value is overwritten else a new key is created in the hash table and corresponding value is set. Since, the server is a hash table implementation, the complexity of GET and SET is no more than O(1). Since the value stored is a string, therefore like your usual string operations as supported in Java, C#, you can manipulate by inserting, deleting, iterating and find the length of the stored value. However, unlike our programming language class libraries, REDIS supports a special set of functions called INCR, DECR, INCRBY, DECRBY. These functions operate when the stored value is an integer or a double. However, important to note is that all operations that you perform on the server are atomic and at a time only one thread acts on a key. This feature has the advantage of keep things simple but indicates that only a single client operating on a key can work at a time. Inherently, there is no support for concurrency and updates from multiple clients to the same key may overwrite the values stored earlier. In the simplest usage of a REDIS key-value pair, the last write wins and there is no way you can retrieve the previous values. To circumvent, however, there are a few primitives that are designed and we shall discuss them later.

Like a simple value of type string, a list can also be created. A list is a doubly link list implementation and you have the provision of entering values from both the right and the left end. Since, you can insert, delete (push/pop) from both sides, the complexity of doing so is O(1). The list at times can be also be used as Producer-Consumer Queue by virtue of REDIS block operations. That is deleting an item is allowed only when there exists a value. And like the traditional link list, there are operations allowed for insertion in between, etc. In fact, a particular feature of REDIS called Publish/Subscribe uses lists internally to store events generated from the publishers. We shall discuss Pub/Sub in a while.

Like a list, REDIS supports two different varieties of Sets -Sets and Sorted Sets. Sets are a collection of unordered collection of elements and the mathematical union, intersection and difference operations can be applied between sets. In Figure 3, I show you two sets and their corresponding operations. Adding and deleting an element in a set is O(1) whereas union, intersection requires at least in the order of N and more. The internal implementation of Set is a List with Hash Table. The list helps in doing Union and Intersection whereas the Hash Table helps in quick insertion and deletion. However, note that members in a set must be distinct and there are no duplicates allowed.

Unlike a set which is purely unordered, a sorted set is an ordered data structure. A sorted set is a skip list implementation and hence insertion and deletion requires a O(logN). The order of insertion and deletion depends on a score and on each insertion requires its value apart from the usual member to be inserted. Like an ordered data structure, for example a Binary Search Tree, you can increment, decrement, do range queries on a pair of scores or even retrieve the results in a reverse way. Sorted sets are very important additions to the REDIS data structure list and you will find it useful in several use cases.

If that is not all sufficient, REDIS support yet another data structure called Hash. Hash actually is a hash inside a Hash. This means that as a value of the core REDIS hash table, you can store a Hash Table itself. By virtue of this, you have a two dimensional hash table. I show you below a very simple set and get on a hash table.

Apart from these core data structures, there is one interesting primitive supported by Redis. It is known as Publish-Subscribe or in short Pub-Sub. Pub-Sub is a common event based pattern that is used to write de-coupled applications and is one of the core foundational topics in the design of message queqing systems. In Pub-Sub, there are one or more publishers which publish events to a known channel. This channel in Redis is a list. Attached with the channel is a number of subscribers. When an event or a message is published to a channel it is automatically delivered to all of the subscribers. Subscribers can browse for the list of available channels, subscribe them or unsubscribe from the already subscribed channels. In fact REDIS pub-sub is used in production environment at a number of sites.

As evident, REDIS follows a simple client server architecture and we know a single server is a major point of failure, REDIS includes mechanism where in you can define arbitrary number of slaves. Slaves can be arranged like a graph or chained one after another. REDIS uses non-blocking procedure for updating the slaves and if your application can tolerate certain level of inconsistency, you may direct read operations to the slave there by keeping the main server free for writes only.

By default, the entire set of REDIS key-value are stored in memory and with a little configuration change, you may also persist the data in the disk. This is in sharp contrast to MemCache, a very popular key-value store where in all data reside in memory. By backing or flushing your in-memory data to the disk, you diminish the risk of accidentally loosing data when the server crashes. As of today, a single database can hold 2 (32) keys and a value cannot be greater than 2 GB. This is a fairly large number for most use cases and if you face constraint in keeping all of them in memory, REDIS supports a mechanism called virtual memory where in only the keys are stored in the memory where as the values are kept at the disk. It is a demand based implementation and values are only put forward to the memory when required.

Let's get back now to one of the earlier problem statements that we encountered while in strings —concurrency of operations. By default and as said earlier, REDIS rewrites values and last update wins overwriting all other previous writes. However, there is a mechanism call watch which when used track the changes to a value corresponding to a key. If there is a change, the operation called say 'set' is not invoked.

This particular technique is called optimistic concurrency mode where in each client is provided the opportunity to update the value and only the first update wins — the rest are discarded. Watch is a common feature in REDIS and it is used with yet another feature of REDIS called transactions. Like a traditional ACID transaction, you can group a series of operations on a set of keys. They execute as a single atomic unit and if successful commit the entire series of operations.

However, if an operation fails in the midst, already successful operations are not rolled off. This is in sharp contrast to ACID (Atomicity, Consistency, Isolation and Durability) transactions of RDBMS's. So far so good and there are more interesting topics associated with REDIS. Pipelining, the data communication protocol and the new REDIS design are still left to be discussed. We will cover all this and more in the subsequent issue.

This article is authored by Hrishikesh Dewan. Hrishikesh works as a Consultant for Siemens and is author of a book on WCF published by TMH. You may connect him via facebook.com/hrishikesh.dewan.
The writer is also thankful to Pieter Noordhuis of REDIS for reviewing and providing valuable suggestions for improving the article.

Stay connected with us through our social media channels for the latest updates and news!