REDIS: the Data Structure Server for Your Cloud

PCQ Bureau

Hrishikesh Dewan, Consultant Siemens and Author of a book on WCF

In the last issue, we discussed several fundamentals of REDIS: the data model, virtual memory, replication and transactions. In this article, we continue with its other features. We will also go through some C# code that shows how easy it is to interact with REDIS, and discuss the new REDIS architecture.

First things first. REDIS follows a client-server storage model: we interact with the server by sending commands and receive replies in return. Every command issued to the server has a cost associated with it. The client accepts the command, converts it into its exact byte representation and sends it across as a TCP stream. The server processes the command and returns the result. In such a model, inserting a hundred key-value pairs traditionally costs a hundred round trips. Quite naturally, this is a costly affair. To circumvent this, REDIS offers multi-key variants of some commands, such as MSET and MGET (the counterparts of SET and GET), which accept multiple keys in one single command. When multiple keys are combined this way, a single byte array is created and sent to the REDIS server instead of many separate commands of the same type. The listing below shows you an example of such a command.
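
As a rough illustration of the saving, the Python sketch below builds the argument lists for a hundred individual SET commands and for one MSET command that carries the same data. The helper names are made up for this example, and nothing is sent to a server; only the MSET command itself comes from REDIS.

```python
def as_individual_sets(pairs):
    """One SET command per key, which means one round trip per key."""
    return [["SET", key, value] for key, value in pairs.items()]


def as_single_mset(pairs):
    """One MSET command carrying every key and value, so a single round trip."""
    command = ["MSET"]
    for key, value in pairs.items():
        command += [key, value]
    return command


pairs = {f"key{i}": f"value{i}" for i in range(100)}

print(len(as_individual_sets(pairs)))  # 100 separate commands
print(len(as_single_mset(pairs)))      # 201: the command name plus 200 arguments
```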

Applies To: Cloud Storage, Distributed Storage Facilities

USP: Data structure server, atomic operations, extremely lightweight, easy to configure, and scales applications horizontally to a considerable extent

Related articles: Part 1 of REDIS: http://bit.ly/l1gZHA



The example in Figure 1 is fine and works for a single command type. You can easily map it mentally to the 'params' keyword of C#, wherein you can pass as many parameters to a function as you like. But what if you need to issue multiple commands of different types in one go? For example, a single request that inserts values into a list, a sorted set and a hash table, or one that sets a key and then retrieves the value of yet another key. REDIS provides this facility through pipelining: the client sends several commands back to back without waiting for the individual replies, and then reads all the replies in order. Pipelining is not a new concept (remember Unix pipes!) but a very handy one in several practical scenarios.
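
The idea can be sketched in a few lines of Python. The `Pipeline` class below is a hypothetical helper, not part of any REDIS client library, and the key names are invented. For readability it writes each command in the simple space-separated inline form that the server also accepts; a real client would normally use the length-prefixed request format and would write the payload to a socket.

```python
class Pipeline:
    """Hypothetical helper that batches commands instead of sending them one by one."""

    def __init__(self):
        self.commands = []

    def add(self, *parts):
        # Queue the command rather than sending it straight away.
        self.commands.append(" ".join(str(p) for p in parts))
        return self

    def payload(self):
        # All queued commands leave the client as one buffer, i.e. one network write.
        return "".join(cmd + "\r\n" for cmd in self.commands).encode("utf-8")


pipe = Pipeline()
pipe.add("RPUSH", "mylist", "a")          # list
pipe.add("ZADD", "myzset", 1, "a")        # sorted set
pipe.add("HSET", "myhash", "field", "a")  # hash
pipe.add("SET", "key1", "v1")             # set one key ...
pipe.add("GET", "key2")                   # ... and read another

data = pipe.payload()
print(len(pipe.commands), "commands,", len(data), "bytes in a single write")
```

The server still answers every command individually; the client simply reads five replies, in the order in which the commands were queued.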

So far, in this issue and the last one, we have discussed several different commands and how they are processed. Let's spend some time understanding how these commands are packed and how results are returned, that is, the data communication protocol that REDIS follows. Like the REDIS server itself, the data protocol, now known as the 'REDIS Unified Protocol', is very simple. The number of arguments and the length of each argument are stated explicitly, and every element is terminated by a carriage return and line feed (crlf). In essence, a command is converted into the following general form.

* number of arguments | crlf | $ number of bytes in argument 1 | crlf | argument 1 | crlf | $ number of bytes in argument 2 | crlf | argument 2 | crlf | ....... | $ number of bytes in argument n | crlf | argument n | crlf

To make it simple, let me show you a command. We will start with the following command: set “mykey1” “helloworld”. When redis-cli receives this command, it converts it into the following form: *3 | crlf | $3 | crlf | set | crlf | $6 | crlf | mykey1 | crlf | $10 | crlf | helloworld | crlf. The command name counts as the first argument, which is why the argument count is 3, and “helloworld” is 10 bytes long.
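
The conversion is mechanical enough to sketch in a few lines of Python. The function name `encode_command` is chosen for this example only; the byte layout follows the general form given above.

```python
def encode_command(*args):
    """Encode a command as *argc CRLF, then $length CRLF argument CRLF per argument."""
    parts = [b"*" + str(len(args)).encode("ascii") + b"\r\n"]
    for arg in args:
        # Lengths are counted in bytes, not characters, so encode before measuring.
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$" + str(len(data)).encode("ascii") + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


print(encode_command("set", "mykey1", "helloworld"))
# b'*3\r\n$3\r\nset\r\n$6\r\nmykey1\r\n$10\r\nhelloworld\r\n'
```

Because every argument carries its own length, the value may contain spaces, line breaks or arbitrary binary data without any escaping.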

The protocol syntax is easy, and a parser for it can be written in any programming language. Let us now look at the reply format from the server. But before that, let us see the different types of values that can be returned. First, there are replies that carry just a status message; this is the case with most add, remove, insert and set commands. Second, a single value or multiple values may be returned. Third, an integer may be returned, as with commands such as INCR, DECR and STRLEN. Finally, there is of course the error. To accommodate all of these reply types, the protocol uses a leading symbol: “+” signifies a single-line reply, “:” an integer reply, “$” a bulk reply, “*” a multi-bulk reply and “-” an error. A single-line reply is generally a status message. A bulk reply carries a single value, for example the result of GET, and a multi-bulk reply carries multiple such values, for example the result of a range command such as LRANGE. The first byte of every reply is one of these characters, and every REDIS client must parse it first to ascertain the meaning of the bytes that follow. After this single character, the rest of the reply follows the same conventions as the request format. Simple, isn't it? If you are a programmer with good exposure to network programming, you should definitely try writing a REDIS client. And yes, I didn't tell you before: by default, the REDIS server listens on port 6379.
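
As a starting point for such a client, here is a minimal Python sketch of the reply-parsing step. The function name `read_reply` is an assumption of this example, and an in-memory `io.BytesIO` stream stands in for the socket so that the code runs without a server. A length of -1 marks a missing value in the protocol, which the sketch maps to `None`.

```python
import io


def read_reply(stream):
    """Read one reply from a binary stream, dispatching on its first byte."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ValueError("truncated reply")
    kind, rest = line[:1], line[1:-2]
    if kind == b"+":                    # single-line status reply
        return rest.decode("utf-8")
    if kind == b"-":                    # error reply
        raise RuntimeError(rest.decode("utf-8"))
    if kind == b":":                    # integer reply
        return int(rest)
    if kind == b"$":                    # bulk reply: a length, then that many bytes
        length = int(rest)
        if length == -1:
            return None                 # e.g. GET on a key that does not exist
        return stream.read(length + 2)[:-2]  # drop the trailing CRLF
    if kind == b"*":                    # multi-bulk reply: a count, then that many replies
        count = int(rest)
        if count == -1:
            return None
        return [read_reply(stream) for _ in range(count)]
    raise ValueError("unknown reply type: %r" % kind)


print(read_reply(io.BytesIO(b"+OK\r\n")))                      # OK
print(read_reply(io.BytesIO(b":1000\r\n")))                    # 1000
print(read_reply(io.BytesIO(b"$10\r\nhelloworld\r\n")))        # b'helloworld'
print(read_reply(io.BytesIO(b"*2\r\n$1\r\na\r\n$1\r\nb\r\n"))) # [b'a', b'b']
```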

To date, there are several clients written in several different programming languages: C, C++, Java, Haskell, Erlang, Perl, Python, Ruby, etc. To show you a bit of it in C#, I will draw upon ServiceStack's Redis library. ServiceStack's Redis client is a beautiful open source library that not only lets you use commands modelled on the REDIS CLI (command line interface) but also natively supports POCOs (Plain Old CLR Objects). Simply create a class and pass an instance of it to the client; it will take care of converting the object into a suitable form, encoding the correct protocol message and passing it on to the server. The two listings below show you how easy it is with the C# stack. Since REDIS stores everything as binary-safe strings, the C# stack first converts the POCO into a JSON object. This is all done internally, and you needn't bother with attributes, etc. There are plenty of examples in ServiceStack, including ones using pub-sub and transactions; do check them out.

using System;
using ServiceStack.Redis;

public class RedisStringDemo
{
    public void TestRedisStringOnly()
    {
        // The parameterless constructor connects to a server on localhost:6379.
        using (IRedisClient client = new RedisClient())
        {
            // Store a plain string under "key3".
            if (client.Set("key3", "This is a string"))
            {
                string key3Value = client.Get<string>("key3");
                Console.WriteLine("The value so retrieved is : " + key3Value);

                Console.WriteLine("deleting the value:");
                if (client.Remove("key3"))
                {
                    try
                    {
                        // The key is gone, so a null value is expected here.
                        string retValue = client.Get<string>("key3");
                        if (retValue != null)
                        {
                            Console.WriteLine("Retrieving the value once again: " + retValue);
                        }
                        else
                        {
                            Console.WriteLine("retValue is null");
                        }
                    }
                    catch (RedisException re)
                    {
                        Console.WriteLine("Message is : " + re.Message);
                    }
                }
                else
                {
                    Console.WriteLine("unable to delete the key");
                }
            }
        }
    }
}

You must be wondering whether REDIS has security features built in. The cloud landscape looks very promising, but my experience has shown that large sections of enterprise users still feel that storing data in the cloud is not safe. Although this is a lengthy topic to foray into, I shall discuss here the security features that REDIS supports. First, REDIS has no feature to encrypt your data while storing it; if you need that, it is your responsibility to dig into the code and write your own extension there. The most REDIS can do for you is password-protect access: a client must first authenticate itself with the REDIS AUTH command before its commands are accepted. The AUTH command requires a valid password, and the server processes further commands only after AUTH succeeds. Beyond that, there is no security mechanism built into the server. A user, once authenticated, can see all the keys stored as well as their values. So if you want to share a REDIS server among multiple different users and are skeptical about any one of them, think twice. This may seem like a disadvantage at first, but if you restrict the use of REDIS to specific applications in a trusted environment, skipping security is not a bad idea either.

This is all about the REDIS of today. Let me now tell you about the future of REDIS and the direction in which it is moving. First, the stand-alone REDIS is being redesigned as a distributed REDIS cluster. In its new incarnation, REDIS will have not one but a number of servers, each responsible for a distinct set of keys. This will all be transparent to the client. A client will simply send a command to create a key; depending on the key passed, the cluster will map it to the right server for storage and access. This is now a common technique to balance and distribute load across a set of servers, and the key to it lies in a technique called 'consistent hashing'. The idea behind consistent hashing was first proposed in a paper titled “”, wherein the mapping of keys to nodes is defined.

To put it simply, in consistent hashing you assign an identifier to each node and organize the nodes clockwise in a circular ring. Each node is connected to two adjacent nodes: one whose identifier is immediately smaller than its own and one whose identifier is immediately bigger. When you store an object, you apply a hash function to its key. The output of the hash function is also an identifier, and it either maps exactly to a node or falls in the range between two nodes. Your job is to identify that location and store the object in the node closest to the key. And if you replicate the object on the next three successive nodes, you no longer need to fear losing data if one of them goes down. These same techniques will also be used by REDIS-CLUSTER. In REDIS-CLUSTER, each node will maintain a map of keys and node identifiers, so if you pass on an object, it will immediately find out which node should store the data and pass the object on to it.

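
The ring described above can be sketched in Python as follows. This is a generic illustration of consistent hashing, not the REDIS-CLUSTER implementation: the node names, the use of MD5 as the hash function, the `replicas` parameter and the rule that a key belongs to the first node found clockwise from its position are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or a key to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        # Nodes sorted by position form the ring; the last node wraps around to the first.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def nodes_for(self, key, replicas=1):
        """Return the node owning the key, followed by its successors clockwise."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        count = min(replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("mykey1"))              # the owning node
print(ring.nodes_for("mykey1", replicas=3))  # the owner plus two successors
```

The appeal of the scheme is that removing a node moves only the keys that node owned; every other key keeps its place on the ring.
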
Apart from that, the cluster will have provisions for detecting faults by virtue of a gossip-based protocol in which nodes gossip about each other's presence and absence. The other important modification is to make better use of data structures by choosing vector-like structures for small to medium-sized lists and sets. The idea is to use contiguous memory locations for storing medium-sized lists, and to replace the set's hash table implementation with fixed-size values stored in contiguous memory. With the new list (termed ZipList in REDIS literature), there is no pointer overhead of the kind incurred when traversing a linked list. With the new set (termed IntSet in REDIS literature), the gain is in efficient use of memory, though because the elements are kept in sorted order and found by binary search, the complexity of finding an element is O(log N) rather than the O(1) of a hash table.
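
The trade-off behind the set can be illustrated with a small Python sketch. `SortedIntSet` is a made-up stand-in for the idea, not the actual IntSet code: it keeps 64-bit integers in one contiguous, sorted buffer and finds them by binary search.

```python
import bisect
from array import array


class SortedIntSet:
    """Hypothetical illustration: integers packed in one contiguous, sorted buffer."""

    def __init__(self):
        self.items = array("q")  # signed 64-bit integers, no per-element pointers

    def add(self, value):
        i = bisect.bisect_left(self.items, value)
        if i < len(self.items) and self.items[i] == value:
            return False             # already present
        self.items.insert(i, value)  # keeps the buffer sorted; shifts later elements
        return True

    def __contains__(self, value):
        # Binary search: O(log N) comparisons instead of a hash lookup.
        i = bisect.bisect_left(self.items, value)
        return i < len(self.items) and self.items[i] == value


s = SortedIntSet()
for v in (42, 7, 19, 7):
    s.add(v)

print(list(s.items))     # [7, 19, 42]
print(19 in s, 20 in s)  # True False
```

Insertion has to shift elements to keep the order, which is one reason such a layout suits small to medium-sized sets rather than very large ones.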

There is no doubt that REDIS is promising, and that is why it is already being used by a large number of people in several industrial products. I will name a few, but do visit the REDIS website for more such use cases. StackOverflow uses REDIS for caching; the Guardian newspaper uses it as a highly scalable data store; Ohm uses it as its main database; and, best of all, Wooga, a multiplayer gaming website, uses REDIS to store game-related data.

Now we are at the end of it, and the only words left to tell you are 'Happy REDIShing'. If you still haven't grabbed the binary, do so and give it a try; it is worth looking at!
