RAID (Redundant Array of Independent Disks) systems are used to increase the performance and/or reliability of a system’s hard disk drives. In this article we will explain the basics of RAID0, RAID1 and RAID5, and the advantages of RAID6 over them.
There are two basic ideas behind RAID: data striping (a.k.a. RAID0), used to increase performance, and mirroring (a.k.a. RAID1), used to increase reliability.
On RAID1 the data stored on one hard drive is automatically copied to another. In a two-disk system, the data found on the second disk is an exact copy of the data stored on the first disk. If the first hard drive fails, you will still have your data, since the second disk holds an up-to-date copy of everything stored on the first. This process is also known as mirroring.
RAID0, on the other hand, aims to increase disk performance by dividing files among all available disks. For example, on a RAID0 system with two disks, a 100 KB file is split into two 50 KB chunks, each stored on a different disk. Because both disks write their chunks at the same time, the file is stored faster: each drive only has to store 50 KB instead of the full 100 KB.
The problem with RAID0 is its reliability: if one of the disks fails, all data is lost.
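The splitting described above can be sketched in a few lines of Python. This is only an illustration, not how any particular controller works: the function names `stripe` and `unstripe` and the 4-byte chunk size are assumptions made for the example, and real arrays typically distribute fixed-size chunks in round-robin order rather than cutting each file exactly in half.

```python
def stripe(data, num_disks, chunk_size=4):
    """Distribute data round-robin across num_disks in chunk_size-byte chunks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading the chunks back in round-robin order."""
    out = bytearray()
    longest = max(len(chunks) for chunks in disks)
    for row in range(longest):
        for chunks in disks:
            if row < len(chunks):
                out += chunks[row]
    return bytes(out)


data = bytes(range(100))      # stand-in for a 100-byte "file"
disks = stripe(data, num_disks=2)
restored = unstripe(disks)    # needs every disk; lose one and the file is gone
```

Each simulated disk ends up holding roughly half of the bytes, and `unstripe` needs the chunks from every disk to rebuild the file.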
Several RAID levels were created to increase the reliability of data striping, such as RAID3, which uses an extra hard disk drive to store parity and data correction information, and RAID5, which is similar to RAID3 but distributes the parity and data correction information across the disks in the system, thus not requiring a dedicated parity drive. Keep in mind that since RAID5 stores parity and data correction information on each “data” hard disk drive of the system, less space is left on those drives for data storage.
If there is a read error, the RAID system automatically starts a data recovery operation, using the parity and error correction information to restore the data being read.
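As a rough illustration of how parity can restore unreadable data, the sketch below uses byte-wise XOR, which is the usual basis of RAID5 parity. The block contents and the helper name `xor_blocks` are made up for the example, and details such as how parity is rotated between drives are left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per disk, plus the parity block computed from them.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate a read error on the second block: XOR the surviving blocks with
# the parity block to get the missing contents back.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

This works because XOR-ing a value twice cancels it out, so whatever remains after combining the survivors with the parity is exactly the missing block. It also shows the limitation discussed next: with a single parity block, only one missing block can be rebuilt at a time.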
[nextpage title=”Inside RAID5″]
The problem with RAID5 is that if one of the hard disk drives fails at the exact moment a data recovery operation is under way, the system fails and data is lost: the system can neither recover the data that triggered the recovery operation nor rebuild the contents of the defective drive.
While for small RAID systems the probability of such a scenario is very low, for big RAID systems it isn’t so remote. During IDF Spring 2005, Intel presented a study, pictured in Figure 1. It shows the probability of failure on a RAID5 system for three different configurations:
- Scenario 1: RAID5 system with five enterprise-class 30 GB hard disk drives (120 GB total). How often a data recovery operation is necessary: once every 23 years (orange on the chart). Probability of a system failure during the data recovery operation: 0.12% (i.e., one failure in every 834 data recovery operations, in green on the chart).
- Scenario 2: RAID5 system with five desktop-class 300 GB hard disk drives (1.2 TB total). How often a data recovery operation is necessary: once every 2.3 years (orange on the chart). Probability of a system failure during the data recovery operation: 11% (i.e., one failure in every nine data recovery operations, in green on the chart).
- Scenario 3: RAID5 system with 50 desktop-class 300 GB hard disk drives (15 TB total). How often a data recovery operation is necessary: once every 3 months (orange on the chart). Probability of a system failure during the data recovery operation: 70% (i.e., roughly one failure in every two data recovery operations, in green on the chart).
As you can see, the probability of data loss during a data recovery operation increases with both the number of hard disk drives in the system and the capacity of each hard disk drive.
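To get a feel for what these figures mean in practice, the two numbers given for each scenario can be combined into a rough estimate of how long a system runs, on average, before it loses data. The sketch below is a back-of-the-envelope calculation and not part of the Intel study: it assumes each recovery operation fails independently with the quoted probability, and the function name is hypothetical.

```python
def years_between_data_loss(recovery_interval_years, failure_probability):
    """Rough average time between data-loss events.

    Assumes (for illustration only) that every recovery operation fails
    independently with the same probability.
    """
    return recovery_interval_years / failure_probability


# (years between recovery operations, probability that a recovery fails)
scenarios = {
    "Scenario 1": (23.0, 0.0012),
    "Scenario 2": (2.3, 0.11),
    "Scenario 3": (0.25, 0.70),   # 3 months = 0.25 years
}

for name, (interval, probability) in scenarios.items():
    estimate = years_between_data_loss(interval, probability)
    print(f"{name}: about {estimate:.2f} years between data-loss events")
```

Under these assumptions the estimate falls from roughly 19,000 years in the first scenario to about 21 years in the second and a little over four months in the third.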
[nextpage title=”RAID6 Advantages”]
Instead of storing only one set of parity and error correction information, RAID6 systems store two, arranged in such a way that even if one of the hard disk drives fails during the data recovery process, the system remains operational, with no data loss.
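The article does not say how the two sets of information are calculated. One widely used approach, sketched below as an assumption rather than a description of any specific product, keeps a plain XOR parity block (usually called P) plus a second block (Q) in which each data block is first multiplied by a different constant in the Galois field GF(2^8). With two independent equations, two missing blocks can be solved for. All function names here are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for any nonzero a."""
    return gf_pow(a, 254)


def compute_pq(blocks):
    """P is the XOR of all blocks; Q is the XOR of (2^i * block i)."""
    length = len(blocks[0])
    p, q = bytearray(length), bytearray(length)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y from the surviving blocks, P and Q."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else b for i, b in enumerate(blocks)]
    p_part, q_part = compute_pq(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        pd = p[k] ^ p_part[k]   # equals Dx ^ Dy
        qd = q[k] ^ q_part[k]   # equals gx*Dx ^ gy*Dy
        dx[k] = gf_mul(gf_mul(gy, pd) ^ qd, denom_inv)
        dy[k] = pd ^ dx[k]
    return bytes(dx), bytes(dy)


data = [b"disk", b"zero", b"data", b"here"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]      # blocks 1 and 3 are lost
block1, block3 = recover_two(damaged, 1, 3, p, q)
```

In this sketch, losing two data blocks at once is still recoverable, which is the property that lets RAID6 survive a second drive failure while a recovery is already in progress.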
In Figure 2, you can see the probability of data loss during a data recovery procedure on RAID0 (in blue), RAID5 (in yellow, the probability of a hard disk drive failure, and in pink, the probability of a system failure) and RAID6 (in light blue).
The “y” axis shows the number of months until an unrecoverable failure, while the “x” axis shows the system capacity, in GB.
As you can see on the graph, RAID6 only matches the failure probability of RAID5 on systems starting at 23 TB. Below this point RAID6 presents a far lower failure probability than RAID5, making it the recommended choice for storage systems where reliability is the key requirement.