More Reliability with RAID

PCQ Bureau

An acronym for Redundant Array of Independent (or Inexpensive) Disks, RAID is a widely used, though old, technology that combines multiple hard drives into a single logical hard drive. The benefits of this range from increased reliability to faster data access, depending on the way RAID is implemented. RAID is mostly used in servers to increase their reliability and uptime.

Let’s look at some RAID-related concepts first. A physical drive array is a collection of physical hard drives. Physical arrays can be divided or grouped together to form one or more logical arrays, which in turn can be divided into the logical drives that the OS sees. The OS is oblivious to the presence of any physical or logical arrays, courtesy of the RAID controller, which manages how the data is stored and accessed across them. A RAID controller can be implemented in hardware or software. The former is more suitable for higher RAID levels that require more CPU power.

There are two kinds of hardware RAID controllers: internal and external. An internal RAID controller is an ordinary expansion card that fits into a slot such as a PCI slot. Some motherboards even have built-in RAID controllers. Depending on the RAID controller and the levels it supports, a certain amount of cache memory is also present. External RAID controllers come in their own enclosure along with the hard drives; high-end servers often use such a separate enclosure. An external RAID controller is usually more complex and has more memory than an internal one, because of the large number of hard drives and complex RAID levels it needs to work with. It generally uses a SCSI interface, which makes it easier to have large numbers of (hot-swappable) hard drives. Cost-wise, internal controllers are cheaper than external ones. Software RAID controllers, on the other hand, use processor time, so the more complex RAID levels will slow down a system considerably. Hardware RAID is the better option in such cases.

Techniques used in RAID

Central to RAID are the concepts of Mirroring, Parity and Striping. 

Mirroring

Mirroring involves having two copies of the same data on separate hard drives or disk arrays, which gives good fault tolerance and reliability. Since the data has to be written to both disks, writes take a toll on performance. Reads, however, get faster, as information can be read from both drives, thereby reducing wait states. Mirroring can also be performed on more than two drives; the more drives there are, the better the read performance. Mirroring offers maximum data security and ease of recovery, as two distinct copies of the data are maintained. But in real life, putting mirroring into practice can be an expensive affair, as twice as much storage space is needed. Parity is a cheaper option.

Parity

Parity requires less storage space than mirroring. Suppose you have N data elements. You use these N elements to create a parity element and end up with N+1 elements. If any one of these N+1 elements is lost, it can be recovered from the N that remain. The RAID controller creates this extra parity element by (generally) using the XOR operation. The method reduces the amount of additional storage space needed as compared to Mirroring, but it is not as fault tolerant. Parity also needs a fair amount of computing power, as the parity data has to be recomputed every time a write takes place and is needed again to rebuild data after a drive failure. This is why a hardware RAID controller is usually preferred here, as a software controller will tie up the CPU.
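
The XOR idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular controller implements it; the function name xor_parity and the sample data are made up for the example.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# N = 3 data elements of equal size (hypothetical sample data).
data = [b"RAID", b"disk", b"sets"]

# The extra parity element, giving N+1 elements in total.
parity = xor_parity(data)

# Simulate losing the second data element: XOR-ing the N surviving
# elements (two data blocks plus parity) yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)
print(rebuilt)  # b'disk'
```

The same function both creates the parity and rebuilds a lost element, because XOR-ing a value in twice cancels it out.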

Striping

Striping is aimed at increasing performance. It reduces the time taken to access your data by breaking a (large) file into multiple pieces and storing each piece on a separate hard disk, so that all the pieces can be accessed at the same time. Suppose you have a large file on a single hard drive. To read the file, you’ll have to wait till it is read from beginning to end. Now, if you break this file into multiple pieces and store them on separate hard disks, all of which can be accessed simultaneously, the total time taken is more or less equal to the time taken to read a single piece (similar to downloading a file in segments). Ideally, with N hard drives the file is transferred in 1/Nth the time it takes to transfer it from one hard drive. Clearly, the more hard drives there are, the greater the increase in performance. There are two levels of striping that can be used: byte level and block level. In byte-level striping, each byte of data is written onto a different hard disk. That is, if you have four hard disks, the first byte is written onto the first hard disk, the second onto the second and so on, with the fifth going onto the first hard disk again. Block-level striping involves breaking up data into blocks of a given size. These are then distributed the same way as in byte-level striping. The size of these blocks is called the stripe size.
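
The round-robin distribution described above can be sketched as follows. This is a simplified model that uses Python lists in place of disks; the names stripe and unstripe are illustrative only. Setting stripe_size to 1 gives byte-level striping, and any larger value gives block-level striping.

```python
def stripe(data, num_disks, stripe_size):
    """Distribute data round-robin across num_disks in blocks of stripe_size bytes."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_size):
        block_index = offset // stripe_size
        disks[block_index % num_disks].append(data[offset:offset + stripe_size])
    return disks

def unstripe(disks):
    """Reassemble the original data by reading blocks back in round-robin order."""
    pieces = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)

# Byte-level striping over four disks: the fifth byte lands on the first disk again.
byte_level = stripe(b"ABCDEFGHIJ", num_disks=4, stripe_size=1)
print(byte_level[0])  # [b'A', b'E', b'I']

# Block-level striping with a stripe size of 4 bytes.
block_level = stripe(b"ABCDEFGHIJ", num_disks=4, stripe_size=4)
print(block_level[0], block_level[1])  # [b'ABCD'] [b'EFGH']
```

In a real array the blocks on different disks would be read in parallel, which is where the speed-up comes from; the sketch only shows where each piece ends up.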

RAID levels

Based on the techniques mentioned above, there are six standard RAID levels that can be implemented.

  • RAID 0: Though technically not RAID, as it doesn’t involve any redundancy, it has come to be accepted as a RAID level. It is also called Striping, as it involves only striping of data. Its performance is very good, though fault tolerance is poor since there’s no redundant copy of the data. It is ideal for an application that needs high storage speeds but no redundancy, for example, temporary Photoshop files.

  • RAID 1: Involves storing identical copies of data on two different drives. It is an implementation of the Mirroring concept and has the latter’s pros and cons. It is ideal for use in small file servers.

  • RAID 2: Uses Hamming error correction code (ECC) and is intended for use with drives that don’t have built-in error detection. Data is split at the bit level and is written along with the Hamming code. When the data is read, the Hamming code can be used to detect and correct errors. Due to the complicated and expensive controller hardware needed and also due to the fact that most RAID implementations use SCSI drives (which have built-in error detection), this level is rarely used these days.

  • RAID 3: This level uses byte-level striping with dedicated parity. The data is striped across the array at byte level (which increases performance), with one dedicated drive holding the parity information for redundancy; having to update this single parity drive slows down writes. At least three hard disks are needed for such an arrangement (two for data, one for parity). Since parity needs to be recalculated after each write, a software implementation is not practical.

  • RAID 4: This level is similar to RAID 3. The only difference is that it uses block-level striping instead of byte-level striping. This gives you the advantage of changing the stripe size to suit application needs.

  • RAID 5: The most popular RAID implementation. It uses block-level striping (like RAID 4) and distributed parity. That is, instead of restricting parity information to one particular drive (like the earlier levels), it distributes it across all the drives. This removes the bottleneck of writing to just one parity drive. Fault tolerance is maintained by keeping each stripe’s parity on a different drive from the data blocks it protects. The actual recovery process is somewhat complicated due to the distributed nature of the parity. Since the parity information still needs to be calculated after each write, it slows down the whole process a little and again calls for a hardware controller. It is commonly used in database servers.
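
To make the idea of distributed parity concrete, here is a minimal sketch that prints which drive holds the parity block for each stripe. The rotation rule used (parity moves one drive to the left with every stripe) is an assumption chosen for illustration; real controllers use various rotation schemes. The function name raid5_layout is hypothetical.

```python
def raid5_layout(num_disks, num_stripes):
    """Return a grid of labels: rows are stripes, columns are drives.

    "Dn" marks the n-th data block and "Ps" marks the parity block for stripe s.
    Assumed rotation: the parity position shifts one drive left per stripe.
    """
    layout = []
    block = 0
    for s in range(num_stripes):
        parity_disk = (num_disks - 1 - s) % num_disks
        row = []
        for d in range(num_disks):
            if d == parity_disk:
                row.append(f"P{s}")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout

for row in raid5_layout(num_disks=4, num_stripes=4):
    print("  ".join(f"{cell:>3}" for cell in row))
```

With four drives and four stripes, every drive ends up holding exactly one parity block, so no single drive absorbs all the parity writes the way the dedicated parity drive does in RAID 3 and RAID 4.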

Since individual levels may not be suitable for everyone’s needs, a combination of RAID levels can also be used. The most popular combinations are RAID 0+1 and 1+0. These two are often thought to be the same, though there is a subtle difference. RAID 0+1 is striping and then mirroring. Let’s say you have eight drives. You split them into two arrays of four drives each and apply RAID 0 to each array individually. Then you apply RAID 1 and have one array act as a mirror of the other. Now, if one of the disks in an array fails, that entire striped array goes down; the other array keeps the system running, but without any fault tolerance.

RAID 1+0 applies mirroring first and striping later. That is, the eight drives are divided into four sets of two drives each. Each set now holds duplicate information. Striping is then applied across these mirrored sets. This technique has better fault tolerance, as it will work fine as long as at least one drive in every mirrored set is active. Theoretically, the system can keep working even with half your drives failing, provided each failed drive belongs to a different mirrored set. These two techniques are very popular as they are easy to implement and combine the benefits of RAID levels 0 and 1.
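
The difference between the two combinations can be checked with a small simulation of the eight-drive example. This is a sketch under simple assumptions: drives are numbered 0 to 7, RAID 0+1 mirrors the striped arrays 0-3 and 4-7, and RAID 1+0 stripes across the mirrored pairs (0,1), (2,3), (4,5) and (6,7). All function names are illustrative.

```python
from itertools import combinations

NUM_DRIVES = 8

def survives_raid01(failed):
    """RAID 0+1: data survives only if one striped array has no failed drive."""
    arrays = [range(0, 4), range(4, 8)]
    return any(all(d not in failed for d in array) for array in arrays)

def survives_raid10(failed):
    """RAID 1+0: data survives if every mirrored pair still has one live drive."""
    pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]
    return all(any(d not in failed for d in pair) for pair in pairs)

def survival_rate(check, num_failures):
    """Fraction of all possible sets of num_failures failed drives that the array survives."""
    cases = list(combinations(range(NUM_DRIVES), num_failures))
    return sum(check(set(case)) for case in cases) / len(cases)

# Half the drives failing, one from each mirrored pair.
print(survives_raid10({0, 2, 4, 6}))  # True
print(survives_raid01({0, 2, 4, 6}))  # False

# Share of all two-drive failures that each layout survives.
print(survival_rate(survives_raid01, 2))
print(survival_rate(survives_raid10, 2))
```

Of the 28 possible two-drive failures, RAID 0+1 survives 12 (both failures in the same striped array), while RAID 1+0 survives 24 (every case except the two failures hitting the same mirrored pair).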