Softpanorama
May the source be with you, but remember the KISS principle ;-)
In 1987, Patterson, Gibson and Katz at the University of California Berkeley, published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" . This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array of disk drives which yields performance exceeding that of a Single Large Expensive Drive (SLED). Additionally, this array of drives appears to the computer as a single logical storage unit or drive.
For an array without redundancy, the Mean Time Between Failures (MTBF) of the array is roughly the MTBF of an individual drive divided by the number of drives in the array, assuming drives fail independently at a constant rate. As a result, the MTBF of a large array of drives can be too low for many applications. However, disk arrays can be made fault-tolerant by storing information redundantly in various ways.
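To make the arithmetic concrete, here is a minimal Python sketch of the MTBF estimate for an array without redundancy. It assumes drives fail independently at a constant rate, so failure rates simply add up. The function name `array_mtbf` and the 500,000-hour drive rating are hypothetical, chosen only for illustration.

```python
def array_mtbf(drive_mtbf_hours: float, n_drives: int) -> float:
    """Approximate MTBF of an array with no redundancy (e.g. RAID 0).

    Assumes independent drive failures at a constant rate, so the
    failure rates add up: 1 / MTBF_array = n_drives / MTBF_drive.
    """
    if n_drives < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / n_drives


DRIVE_MTBF = 500_000  # hypothetical drive rating in hours, for illustration only
for n in (1, 4, 10, 50):
    print(f"{n:>2} drive(s): array MTBF ~ {array_mtbf(DRIVE_MTBF, n):,.0f} hours")
```

With 50 such drives and no redundancy, the expected time to the first failure drops by a factor of 50, which is the motivation for the redundant RAID levels.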
Five types of array architectures, RAID-1 through RAID-5, were defined by the Berkeley paper, each providing disk fault-tolerance and each offering different trade-offs in features and performance. In addition to these five redundant array architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID-0 array.
The basic idea behind RAID is that the array of drives appears to the computer as a single logical storage unit or volume.
RAID is the dominant enterprise disk configuration in Solaris, chiefly for the performance and fault-tolerance benefits described above.
Besides the non-redundant RAID 0 array, six RAID levels are commonly distinguished, but only three configurations, usually called levels 0, 1 and 5, are widely used in practice. The Solaris Volume Manager (SVM) software uses logical volumes (sets of disk slices) to implement RAID 0, RAID 1, and RAID 5.
A hardware-based RAID system manages the RAID subsystem independently of the host and presents only a single disk per RAID array to the host, so the host doesn't have to be aware of the RAID subsystem(s).
Under Solaris, both SVM and Veritas Volume Manager offer RAID 0, RAID 1 and RAID 5. A special and fairly complex driver is needed to implement a software RAID solution. This is more error-prone and less compatible than hardware-based solutions, especially Fibre Channel-based ones, but it is cheaper.
Just like any other application, software-based arrays occupy host system memory, consume CPU cycles and are operating system dependent. Because they compete with concurrently running applications for host CPU cycles and memory, software-based arrays degrade overall server performance. Also, unlike hardware-based arrays, the performance of a software-based array depends directly on server CPU performance and load.
Except for the array functionality, hardware-based RAID schemes have very little in common with software-based implementations. Since the host CPU can execute user applications while the array adapter's processor simultaneously executes the array functions, the result is true hardware multi-tasking. Hardware arrays also do not occupy any host system memory, nor are they operating system dependent.
Hardware arrays are also highly fault tolerant. Since the array logic is implemented in hardware, array software is NOT required to boot. Some software arrays, however, will fail to boot if the boot drive in the array fails. For example, an array implemented in software can only function once the array software has been read from the disks and is memory-resident. What happens if the server can't load the array software because the disk that contains it has failed? Software-based implementations commonly require a separate boot drive, which may or may not be part of the array.
- RAID0 is striping, in which data is spread across multiple spindles (disks). Data written to the RAID0 is broken up into chunks of a specified size (interlace value) and spread across the disks:
+--------+ +--------+
| Chunk 1| |Chunk 2 |
+--------+ +--------+
| Chunk 3| |Chunk 4 |
+--------+ +--------+
| Chunk 5| |Chunk 6 |
+--------+ +--------+
  Disk A     Disk B
|                   |
+------Stripe-------+

In this case the term RAID is a misnomer, as no redundancy is gained. The address space is 'interlaced' across the disks, improving performance for I/Os bigger than the interlace/chunk size, as a single I/O will spread across multiple disks. In the example above, for an interlace size of 16k, the first 16k of data would reside on Disk A, the second 16k on Disk B, the third 16k on Disk A, and so on. The interlace size needs to be chosen when the logical device is created and can't be changed later without recreating the logical device from scratch.
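The interlacing described above can be sketched in a few lines of Python. This is a simplified model, not SVM code: `stripe_locate` and `disks_touched` are hypothetical helpers that assume equal-sized disks and a fixed interlace, and they map a logical byte offset to a disk (0 = Disk A, 1 = Disk B) the same way the 16k example does.

```python
def stripe_locate(offset: int, interlace: int, ndisks: int) -> tuple[int, int]:
    """Map a logical byte offset in a RAID 0 volume to (disk index, byte offset on that disk)."""
    chunk = offset // interlace      # which interlace-sized chunk of the volume
    disk = chunk % ndisks            # chunks are dealt out round-robin across the disks
    row = chunk // ndisks            # number of complete stripes before this chunk
    return disk, row * interlace + offset % interlace


def disks_touched(offset: int, length: int, interlace: int, ndisks: int) -> set[int]:
    """Disks that a single I/O of `length` bytes starting at `offset` lands on."""
    first = offset // interlace
    last = (offset + length - 1) // interlace
    return {chunk % ndisks for chunk in range(first, last + 1)}


KB = 1024
INTERLACE = 16 * KB  # 16k interlace, as in the example above
NDISKS = 2           # disk 0 = Disk A, disk 1 = Disk B

for off in (0, 16 * KB, 32 * KB):
    print(f"offset {off // KB:>2}k -> (disk, offset on disk) = {stripe_locate(off, INTERLACE, NDISKS)}")
print("4k I/O at 0 touches disks", disks_touched(0, 4 * KB, INTERLACE, NDISKS))
print("64k I/O at 0 touches disks", disks_touched(0, 64 * KB, INTERLACE, NDISKS))
```

With a 16k interlace, a 4k I/O at offset 0 touches only Disk A, while a 64k I/O is spread across both disks, which is where the performance gain for large I/Os comes from.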
RAID1: RAID1 is mirroring. Every byte of data written to the mirror is duplicated on both disks:
+--------+       +--------+
| Copy A |  <->  | Copy B |
+--------+       +--------+
  Disk A           Disk B
|                         |
+---------Mirror----------+

The advantages are that you can lose either disk and the data will still be accessible, and reads can be alternated between the two disks to improve performance. The drawbacks are that you've doubled your storage costs and incurred additional overhead by having to generate two writes to physical devices for every one that's done to the logical mirror device.
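A toy model of a two-way mirror shows both the benefit and the cost. This Python sketch is illustrative only: the `Mirror` class is hypothetical, and round-robin reads are just one possible read policy, assumed here for simplicity.

```python
class Mirror:
    """Toy two-way mirror: writes go to every healthy side, reads rotate among them.

    Round-robin reads are an assumption for illustration; real volume
    managers offer several read policies.
    """

    def __init__(self, nblocks: int):
        self.sides = [[None] * nblocks, [None] * nblocks]
        self.healthy = [True, True]
        self.physical_writes = 0
        self.last_side = None
        self._next = 0

    def write(self, block: int, data: bytes) -> None:
        for i, side in enumerate(self.sides):
            if self.healthy[i]:
                side[block] = data
                self.physical_writes += 1  # one logical write -> up to two physical writes

    def read(self, block: int) -> bytes:
        live = [i for i, ok in enumerate(self.healthy) if ok]
        if not live:
            raise OSError("both sides of the mirror have failed")
        i = live[self._next % len(live)]   # alternate between surviving sides
        self._next += 1
        self.last_side = i
        return self.sides[i][block]

    def fail(self, side: int) -> None:
        self.healthy[side] = False


m = Mirror(nblocks=8)
m.write(3, b"payroll")
print("physical writes after one logical write:", m.physical_writes)
sides_used = []
for _ in range(4):
    m.read(3)
    sides_used.append(m.last_side)
print("reads served by sides:", sides_used)
m.fail(0)
print("after losing side 0:", m.read(3))
```

The write counter makes the overhead visible (two physical writes per logical write), while the alternating reads and the read after `fail(0)` show the two advantages.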
Concatenation: One additional type of logical device that's involved when combining stripes and mirrors in SVM is a concatenation. There is no RAID nomenclature associated with a 'concat':
+--------+
| Disk A |
+--------+
| Disk B |
+--------+
| Disk C |
+--------+
  Concat

A concatenation aggregates several smaller physical devices into one large logical device. Unlike a stripe, the address space isn't interlaced across the underlying devices. This means there's no performance gain from using a concatenated device.
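For contrast with striping, here is a hedged sketch of concatenation address mapping. The helper `concat_locate` and the device sizes (in blocks) are hypothetical; it simply fills each device in turn, so consecutive blocks stay on the same device until it is full.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_locate(block: int, sizes: list[int]) -> tuple[int, int]:
    """Map a logical block of a concatenation to (device index, block on that device)."""
    ends = list(accumulate(sizes))           # logical block just past the end of each device
    if not 0 <= block < ends[-1]:
        raise ValueError("block is outside the concatenated device")
    dev = bisect_right(ends, block)          # first device whose end lies beyond `block`
    return dev, block - (ends[dev] - sizes[dev])


SIZES = [100, 200, 50]   # hypothetical sizes of Disk A, B and C, in blocks
for blk in (0, 99, 100, 150, 320):
    print(blk, "->", concat_locate(blk, SIZES))
```

Blocks 0-99 all map to Disk A, so a sequential I/O keeps hitting a single disk until it crosses into the next one, which is why a concat brings no striping-style speedup.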
Since RAID0 improves performance, and RAID1 provides redundancy, someone came up with the idea to combine them. Fast and reliable. Two great tastes that taste great together!
When combining these two types of 'logical' devices there's a choice to be made -- do you mirror two stripes, or do you stripe across multiple mirrors? There are pros and cons to each approach:
- RAID 0+1: In RAID 0+1, the stripes are created first, then are mirrored together. Logically, the resulting device looks like:
+------------------------------------+     +------------------------------------+
| +--------+ +--------+ +--------+   |     | +--------+ +--------+ +--------+   |
| | Chunk 1| |Chunk 2 | |Chunk 3 |   |     | | Chunk 1| |Chunk 2 | |Chunk 3 |   |
| +--------+ +--------+ +--------+   |     | +--------+ +--------+ +--------+   |
| | Chunk 4| |Chunk 5 | |Chunk 6 |   |     | | Chunk 4| |Chunk 5 | |Chunk 6 |   |
| +--------+ +--------+ +--------+   |<--->| +--------+ +--------+ +--------+   |
| | Chunk 7| |Chunk 8 | |Chunk 9 |   |     | | Chunk 7| |Chunk 8 | |Chunk 9 |   |
| +--------+ +--------+ +--------+   |     | +--------+ +--------+ +--------+   |
|   Disk A     Disk B     Disk C     |     |   Disk D     Disk E     Disk F     |
+------------------------------------+     +------------------------------------+
              Stripe 1                                   Stripe 2
                 |                                          |
                 +-------------------Mirror-----------------+

Advantage: Simple administrative model. Issue one command to create the first stripe, a second command to create the second stripe, and a third command to mirror them. Three commands and you're done, regardless of the number of disks in the configuration.
Disadvantage: An error on any one of the disks kills redundancy for all disks. For instance, a failure on Disk B above 'breaks' the Stripe 1 side of the mirror. As a result, should disk D, E, or F fail as well, the entire mirror becomes unusable.
RAID 1+0: In RAID 1+0 the opposite approach is taken. The disks are mirrored first, then the mirrors are combined together into a stripe:
+-----------------+   +-----------------+   +-----------------+
|     Chunk 1     |   |     Chunk 2     |   |     Chunk 3     |
+-----------------+   +-----------------+   +-----------------+
|     Chunk 4     |   |     Chunk 5     |   |     Chunk 6     |
+-----------------+   +-----------------+   +-----------------+
|     Chunk 7     |   |     Chunk 8     |   |     Chunk 9     |
+-----------------+   +-----------------+   +-----------------+
Disk A <---> Disk B   Disk C <---> Disk D   Disk E <---> Disk F
     Mirror 1              Mirror 2              Mirror 3
|                                                             |
+----------------------------Stripe---------------------------+

Advantage: A failure on one disk only impacts redundancy for the chunks of the stripe that are located on that disk. For instance, a failure on Disk B above only loses redundancy for every third chunk (1, 4, 7, etc.). Redundancy for the other stripe chunks is unaffected, so a second disk failure could be tolerated as long as the second failure wasn't on Disk A.
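The difference in resilience can be checked by brute force. The Python sketch below enumerates every possible two-disk failure in the six-disk RAID 0+1 and RAID 1+0 layouts drawn above (Disks A to F) and counts how many each layout survives. It assumes a failed disk is simply unavailable and ignores rebuild timing; the helper names are hypothetical.

```python
from itertools import combinations

DISKS = "ABCDEF"
STRIPES = [set("ABC"), set("DEF")]           # RAID 0+1: two 3-disk stripes, mirrored
MIRRORS = [set("AB"), set("CD"), set("EF")]  # RAID 1+0: three 2-disk mirrors, striped


def raid01_survives(failed: set[str]) -> bool:
    # Data survives while at least one whole stripe has no failed disk.
    return any(not (stripe & failed) for stripe in STRIPES)


def raid10_survives(failed: set[str]) -> bool:
    # Data survives while every mirror keeps at least one working disk.
    return all(mirror - failed for mirror in MIRRORS)


pairs = [set(p) for p in combinations(DISKS, 2)]
ok01 = sum(raid01_survives(p) for p in pairs)
ok10 = sum(raid10_survives(p) for p in pairs)
print(f"RAID 0+1 survives {ok01}/{len(pairs)} two-disk failures")
print(f"RAID 1+0 survives {ok10}/{len(pairs)} two-disk failures")
```

Run as written, it reports that RAID 0+1 survives 6 of the 15 possible two-disk failures (only those where both failed disks sit in the same stripe), while RAID 1+0 survives 12 of 15 (it only loses data when both disks of one mirror fail).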
Disadvantage: More complicated from an administrative standpoint. The administrator needs to issue one creation command per mirror, then a command to stripe across the mirrors. The six-disk example above would require four commands to create, while a twelve-disk configuration would require seven commands.
SVM specifics
So, does SVM do RAID 0+1 or RAID 1+0? The answer is, "Yes." So it gives you a choice between the two? The answer is "No."
Obviously further explanation is necessary...
In SVM, mirror devices cannot be created from "bare" disks. You are required to create the mirror on top of another type of SVM metadevice, known as a concat/stripe*. SVM combines concatenations and stripes into a single metadevice type, in which one or more stripes are concatenated together. When used to build a mirror these concat/stripe logical devices are known as submirrors. If you want to expand the size of a mirror device you can do so by concatenating additional stripe(s) onto the concat/stripe devices that are serving as submirrors.
So, in SVM, you are always required to set up a stripe (concat/stripe) in order to create a mirror. On the surface this makes it appear that SVM does RAID 0+1. However, once you understand a bit about the SVM mirror code, you'll find RAID 1+0 lurking under the covers.
SVM mirrors are logically divided up into regions. The state of each mirror region is recorded in state database replicas** stored on disk. By individually recording the state of each region in the mirror, SVM can be smart about how it performs a resync. Following a disk failure or an unusual event (e.g. a power failure occurs after the first side of a mirror has been written to but before the matching write to the second side can be accomplished), SVM can determine which regions are out-of-sync and only synchronize them, not the entire mirror. This is known as an optimized resync.
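The idea behind an optimized resync can be sketched as a dirty-region log. The Python model below is a hypothetical simplification, not SVM's implementation: it keeps the set of dirty regions in memory (standing in for the on-disk replicas), marks a region dirty before writing, clears it once both sides are written, and after a simulated crash copies only the dirty regions. Real implementations batch and defer these state updates and must handle concurrent writes, which this single-threaded sketch ignores.

```python
class RegionMirror:
    """Toy two-way mirror with dirty-region tracking (hypothetical model)."""

    def __init__(self, nblocks: int, region_size: int):
        self.region_size = region_size
        self.sides = [[b""] * nblocks, [b""] * nblocks]
        self.dirty: set[int] = set()   # stands in for region state kept in the replicas

    def region_of(self, block: int) -> int:
        return block // self.region_size

    def write(self, block: int, data: bytes, crash_after_first_side: bool = False) -> None:
        region = self.region_of(block)
        self.dirty.add(region)         # record "write in flight" before touching data
        self.sides[0][block] = data
        if crash_after_first_side:
            return                     # simulate power loss between the two writes
        self.sides[1][block] = data
        self.dirty.discard(region)     # both sides match again

    def optimized_resync(self) -> int:
        """Copy only dirty regions from side 0 to side 1; return blocks copied."""
        copied = 0
        nblocks = len(self.sides[0])
        for region in sorted(self.dirty):
            start = region * self.region_size
            for b in range(start, min(start + self.region_size, nblocks)):
                self.sides[1][b] = self.sides[0][b]
                copied += 1
        self.dirty.clear()
        return copied


m = RegionMirror(nblocks=1000, region_size=100)
m.write(5, b"a")                                  # completes on both sides
m.write(250, b"b", crash_after_first_side=True)   # power fails mid-write
print("dirty regions:", sorted(m.dirty))
print("blocks copied by resync:", m.optimized_resync(), "of", 1000)
```

Copying from side 0 is an arbitrary choice in this sketch: since the interrupted write was never acknowledged, either copy of the dirty region is acceptable as long as both sides end up identical.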
The optimized resync mechanisms allow SVM to gain the redundancy benefits of RAID 1+0 while keeping the administrative benefits of RAID 0+1. If one of the drives in a concat/stripe device fails, only those mirror regions that correspond to data stored on the failed drive will lose redundancy. The SVM mirror code understands the layout of the concat/stripe submirrors and can therefore determine which resync regions reside on which underlying devices. For all regions of the mirror not affected by the failure, SVM will continue to provide redundancy, so a second disk failure won't necessarily prove fatal.
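Here is a rough model of that reasoning for the simplest case, submirrors built as plain concatenations. It assumes both submirrors have identical layouts and that exactly one device fails on each side; `device_regions` and `mirror_survives` are hypothetical helpers, sizes are in blocks, and this models the argument in the text rather than SVM's actual code.

```python
def device_regions(sizes: list[int], device: int, region_size: int) -> set[int]:
    """Indexes of resync regions that hold blocks stored on `device` of a concatenated submirror."""
    start = sum(sizes[:device])     # first logical block on this device
    end = start + sizes[device]     # one past its last logical block
    return set(range(start // region_size, (end - 1) // region_size + 1))


def mirror_survives(sizes: list[int], region_size: int,
                    failed_on_side0: int, failed_on_side1: int) -> bool:
    """True if no region has lost both copies (both submirrors assumed to share `sizes`)."""
    lost0 = device_regions(sizes, failed_on_side0, region_size)
    lost1 = device_regions(sizes, failed_on_side1, region_size)
    return not (lost0 & lost1)


SIZES = [1000, 1000, 1000]   # hypothetical: three equal devices per submirror, in blocks
REGION = 100                 # hypothetical resync region size, in blocks
print("regions hit by device 1:", sorted(device_regions(SIZES, 1, REGION)))
print("dev 1 fails on side 0, dev 0 on side 1:", mirror_survives(SIZES, REGION, 1, 0))
print("dev 1 fails on both sides:", mirror_survives(SIZES, REGION, 1, 1))
```

In this model a region that straddles two devices counts as affected by a failure on either one, so the region-level check is conservative.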
So, in a nutshell, SVM provides a RAID 0+1 style administrative interface but effectively implements RAID 1+0 functionality. Administrators get the best of each type, the relatively simple administration of RAID 0+1 plus the greater resilience of RAID 1+0 in the case of multiple device failures.
* concat/stripe logical devices (metadevices)
The following example shows a concat/stripe metadevice that's serving as a submirror to a mirror metadevice. Note that the metadevice is a concatenation of three separate stripes:
- Stripe 0 is a 1-way stripe (so not really striped at all) on disk slice c1t11d0s0.
- Stripe 1 is a 1-way stripe on disk slice c1t12d0s0.
- Stripe 2 is a 2-way stripe with an interlace size of 32 blocks on disk slices c1t13d0s1 and c1t14d0s2.
d1: Submirror of d0
    State: Okay
    Size: 78003 blocks (38 MB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t11d0s0          0     No     Okay   Yes
    Stripe 1:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t12d0s0          0     No     Okay   Yes
    Stripe 2: (interlace: 32 blocks)
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t13d0s1          0     No     Okay   Yes
        c1t14d0s2          0     No     Okay   Yes

** State database replicas
SVM stores configuration and state information in a 'state database' in memory. Copies of this state database are stored on disk, where they are referred to as state database replicas. The primary purpose of the state database replicas is to provide non-volatile copies of the state database so that the SVM configuration is persistent across reboots. A secondary purpose of the replicas is to provide a 'scratch pad' to keep track of mirror region states.
RAID Recovery Comparison Chart and RAID Types

| RAID Level | Min. Number of Drives | Description | Strengths | Weaknesses |
| RAID 0 | 2 | Data striping without redundancy | Highest performance | No data protection; if one drive fails, all data is lost |
| RAID 1 | 2 | Disk mirroring | Very high performance; very high data protection; very minimal penalty on write performance | High redundancy cost overhead; because all data is duplicated, twice the storage capacity is required |
| RAID 2 | Not used in LAN | No practical use | Previously used for RAM error correction (known as Hamming code) and in disk drives before the use of embedded error correction | No practical use; same performance can be achieved by RAID 3 at lower cost |
| RAID 3 | 3 | Byte-level data striping with dedicated parity drive | Excellent performance for large, sequential data requests | Not well suited for transaction-oriented network applications; single parity drive does not support multiple, simultaneous read and write requests |
| RAID 4 | 3 (not widely used) | Block-level data striping with dedicated parity drive | Data striping supports multiple simultaneous read requests | Write requests suffer from the same single parity-drive bottleneck as RAID 3; RAID 5 offers equal data protection and better performance at the same cost |
| RAID 5 | 3 | Block-level data striping with distributed parity | Best cost/performance for transaction-oriented networks; very high performance, very high data protection; supports multiple simultaneous reads and writes; can also be optimized for large, sequential requests | Write performance is slower than RAID 0 or RAID 1 |
| RAID 0/1 | 4 | Combination of RAID 0 (data striping) and RAID 1 (mirroring) | Highest performance, highest data protection (can tolerate multiple drive failures) | High redundancy cost overhead; because all data is duplicated, twice the storage capacity is required; requires a minimum of four drives |
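The parity rows of the table (RAID 3, 4 and 5) all rely on XOR parity. The Python sketch below shows the principle: parity is the XOR of the data blocks in a stripe, and any one lost block can be rebuilt by XORing the survivors with the parity. The rotating `parity_disk` layout is one simple scheme assumed for illustration; real RAID 5 implementations use their own specific rotation orders.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks; used both to compute parity and to rebuild."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_no: int, ndisks: int) -> int:
    """Disk holding parity for a stripe, under one simple rotating layout (assumed).

    RAID 4 would instead always use disk ndisks - 1 (a dedicated parity drive).
    """
    return (ndisks - 1 - stripe_no) % ndisks


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one stripe's data blocks on 3 disks
parity = xor_blocks(data)                        # stored on the 4th disk of this stripe

# Pretend the disk holding data[1] failed: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches:", rebuilt == data[1])
print("parity disk for stripes 0-4:", [parity_disk(s, 4) for s in range(5)])
```

Because RAID 5 moves the parity block from disk to disk, parity updates for different stripes land on different drives, avoiding the single parity-drive bottleneck the table lists for RAID 3 and RAID 4.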
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: November 08, 2008