Softpanorama
May the source be with you, but remember the KISS principle ;-)
The Solaris Volume Manager (SVM) is a free component of Solaris 9 and Solaris 10; it was previously known as Solstice DiskSuite. SVM provides mechanisms to combine physical disk slices into logical volumes, which can be configured to provide mirroring and RAID 5. In its simplest form SVM uses traditional Solaris disk partitioning (up to eight partitions, or slices in Solaris terminology) to build virtual disks called volumes.
Any partition can be used to create volumes, but it is common practice to reserve slice s7 for the state database replicas. Database replicas are created on selected disks and hold the SVM configuration data. It is the administrator’s responsibility to create these state databases (using the metadb command) and distribute them across disks and controllers to avoid any single points of failure.
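As a rough pre-flight check of that advice, the following sketch counts how many distinct disks and controllers a proposed set of replica slices would span. It is a hypothetical helper, not an SVM command, and it assumes the usual cXtYdZsN (or cXdZsN) slice names.

```sh
#!/bin/sh
# Hypothetical helper, not part of SVM: report how many distinct
# controllers and disks a list of candidate replica slices spans.
# Assumes slice names of the form cXtYdZsN or cXdZsN.

# c1t2d0s7 -> c1 (strip everything from the first 't' or 'd')
controller_of() {
    printf '%s\n' "${1%%[td]*}"
}

# c1t2d0s7 -> c1t2d0 (strip the trailing slice suffix)
disk_of() {
    printf '%s\n' "${1%s*}"
}

# Count distinct lines on standard input
count_unique() {
    awk '!seen[$0]++ { n++ } END { print n + 0 }'
}

# check_replica_spread SLICE...
# Returns 1 if all slices sit on one disk or behind one controller,
# i.e. a single point of failure for the state database replicas.
check_replica_spread() {
    nctl=$(for s in "$@"; do controller_of "$s"; done | count_unique)
    ndisk=$(for s in "$@"; do disk_of "$s"; done | count_unique)
    echo "controllers=$nctl disks=$ndisk"
    if [ "$nctl" -lt 2 ] || [ "$ndisk" -lt 2 ]; then
        echo "warning: replicas share a single disk or controller" >&2
        return 1
    fi
}
```

For example, check_replica_spread c0t0d0s7 c1t0d0s7 prints "controllers=2 disks=2" and succeeds; the replicas could then be created with metadb -a -f -c 3 c0t0d0s7 c1t0d0s7 (three replicas per slice; -f is only needed when creating the initial state database).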
The version of SVM bundled with Solaris 9 has several new features: soft partitions, monitoring of active disks, and access via the Solaris Management Console (SMC).
The soft partitioning feature allows you to subdivide a disk into many small slices that are controlled and maintained by software (hence the term soft partitioning). Soft partitioning allows up to 8192 partitions on a single drive or volume, which provides much greater flexibility: with today’s 500 GB disks, customers often need to subdivide a disk into more than eight partitions. Solaris Volume Manager enables an administrator to create soft partitions either on top of individual physical disks or on existing RAID 1, RAID 5, or RAID 0 volumes.
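A minimal dry-run sketch of carving a component into equal soft partitions: the hypothetical function below only prints metainit commands of the form "metainit volume -p component size" and runs nothing. The volume numbers, count, and size are illustrative assumptions.

```sh
#!/bin/sh
# plan_soft_partitions FIRST COUNT SIZE COMPONENT
# Hypothetical dry-run helper: print (do not run) the metainit commands
# that would carve COUNT soft partitions of SIZE (e.g. 10g) out of
# COMPONENT (a slice such as c1t1d0s0, or a volume such as d10).
plan_soft_partitions() {
    first=$1 count=$2 size=$3 component=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "metainit d$((first + i)) -p $component $size"
        i=$((i + 1))
    done
}
```

For instance, plan_soft_partitions 100 3 10g d10 prints three commands creating d100, d101 and d102 on top of the volume d10; review the output before running it as root.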
All mirrored volumes in SVM automatically benefit from volume logging. Volume logging limits the amount of block copy activity necessary to keep the mirrored volumes in sync. Volume logging uses bitmaps held in the state databases to track changes to submirrors. Consequently, all mirrored volumes are protected against the need to perform a full-mirror resynchronization in the event of a system failure.
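Tips:
To speed up mirror resynchronization, increase the resync buffer size by adding the following line to /etc/system:
set md_mirror:md_resync_bufsz = 2048
For applications with continuous writes, consider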
Summary:
metadb -a -- creates state database replicas (you can also create them using the Solaris Volume Manager GUI). The syntax of the command is:
metadb -a [-f] [-c n] [-l nnnn] disk_slice
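metainit -- creates volumes (concatenations, stripes, mirrors, RAID 5 volumes and soft partitions). When the slice holds a mounted file system that cannot be unmounted, such as root (/), the force (-f) option must be used. The syntax of the metainit command for a concat/stripe is: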
metainit -f [concat_vol | stripe_vol] numstripes width component...
metastat -- displays the status of volumes; for example, to verify the status of the root (/) mirror d10 and its submirrors:
# /usr/sbin/metastat d10
metaroot -- sets up mirroring of the root (/) file system by editing the /etc/vfstab and /etc/system files. The syntax is:
metaroot device
metadetach -- Detach one submirror to make the root (/) mirror a one-way mirror.
# /usr/sbin/metadetach d10 d12
metaclear -- clears (deletes) the mirror and its submirrors. The -r option recursively deletes the specified metadevice together with the metadevices and hot spare pools associated with it. In the following example d12 had already been detached from d10, so it has to be cleared separately:
# metaclear -r d10
d10: Mirror is cleared
d11: Concat/Stripe is cleared
# metaclear d12
d12: Concat/Stripe is cleared
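Putting the commands in this summary together, a root mirror is usually built in a fixed order. The sketch below is a hypothetical dry-run helper that only prints that sequence, using the same d10/d11/d12 names as the examples; it assumes the second disk has already been partitioned like the first.

```sh
#!/bin/sh
# plan_root_mirror ROOT_SLICE MIRROR_SLICE REPLICA_SLICE...
# Hypothetical dry-run helper: print (do not run) the usual sequence for
# mirroring root (/) as d10 with submirrors d11 (current root) and d12.
plan_root_mirror() {
    root=$1 second=$2
    shift 2
    echo "metadb -a -f -c 3 $*"        # initial replicas need -f
    echo "metainit -f d11 1 1 $root"   # -f because / is mounted
    echo "metainit d12 1 1 $second"
    echo "metainit d10 -m d11"         # one-way mirror on the current root
    echo "metaroot d10"                # edits /etc/vfstab and /etc/system
    echo "# reboot here"
    echo "metattach d10 d12"           # attach the second side; starts a resync
}
```

For example: plan_root_mirror c0t0d0s0 c1t0d0s0 c0t0d0s7 c1t0d0s7. Attaching d12 only after the reboot means the new submirror is synchronized from the running root rather than assumed to be identical.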
[Jan 14, 2006] Using Solaris Volume Manager -- a good overview
The Solaris Volume Manager (SVM) is packaged with Solaris 9 and provides advanced hard disk management capabilities, including creating RAID volumes, soft partitions, hot spare pools, and transactional volumes. Using SVM can help increase your storage capacity, ensure data availability, improve disk I/O, and lower administrative overhead. SVM was formerly known as Solstice DiskSuite.
RAID0: RAID0 is striping, in which data is spread across multiple spindles (disks). Data written to the RAID0 is broken up into chunks of a specified size (interlace value) and spread across the disks:
    +--------+  +--------+
    | Chunk 1|  |Chunk 2 |
    +--------+  +--------+
    | Chunk 3|  |Chunk 4 |
    +--------+  +--------+
    | Chunk 5|  |Chunk 6 |
    +--------+  +--------+
      Disk A      Disk B
        |            |
        +---Stripe---+
In this case the term RAID is a misnomer, as no redundancy is gained. The address space is 'interlaced' across the disks, improving performance for I/Os larger than the interlace (chunk) size, since a single I/O is spread across multiple disks. In the example above, with an interlace size of 16k, the first 16k of data would reside on Disk A, the second 16k on Disk B, the third 16k on Disk A, and so on. The interlace size must be chosen when the logical device is created and cannot be changed later without recreating the logical device from scratch.
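To make the interlace arithmetic concrete, here is a minimal sketch (a hypothetical helper, not an SVM command) that maps a logical byte offset on an idealized stripe to the disk holding it and the offset on that disk. It assumes equal-size components and ignores any space reserved at the start of a component, for example by a state database replica.

```sh
#!/bin/sh
# stripe_locate OFFSET INTERLACE NDISKS
# Hypothetical helper: locate a logical offset on an idealized RAID0
# stripe. Disks are named A, B, C, ... in stripe order.
stripe_locate() {
    offset=$1 interlace=$2 ndisks=$3
    chunk=$((offset / interlace))    # which chunk the offset falls in
    disk=$((chunk % ndisks))         # chunks rotate round-robin across disks
    row=$((chunk / ndisks))          # full stripe rows before this chunk
    disk_offset=$((row * interlace + offset % interlace))
    letter=$(echo ABCDEFGHIJKLMNOPQRSTUVWXYZ | cut -c$((disk + 1)))
    echo "disk=$letter offset=$disk_offset"
}
```

With the 16k interlace and two disks from the example, stripe_locate 32768 16384 2 reports disk A at offset 16384: the third 16k chunk wraps back to Disk A, directly after its first chunk.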
RAID1: RAID1 is mirroring. Every byte of data written to the mirror is duplicated on both disks:
    +--------+       +--------+
    | Copy A |  <->  | Copy B |
    +--------+       +--------+
      Disk A           Disk B
        |                |
        +-----Mirror-----+
The advantages are that you can lose either disk and the data will still be accessible, and reads can be alternated between the two disks to improve performance. The drawbacks are that you've doubled your storage costs and incurred additional overhead by having to generate two writes to physical devices for every one that's done to the logical mirror device.
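The trade-off in that paragraph can be put into numbers. This hypothetical sketch computes the idealized per-disk load on a two-way mirror, assuming reads simply alternate between the two sides starting with Disk A, while every logical write goes to both disks.

```sh
#!/bin/sh
# mirror_io_load LOGICAL_READS LOGICAL_WRITES
# Hypothetical helper: idealized per-disk load on a two-way mirror with
# alternating (round-robin) reads; each write is issued to both sides.
mirror_io_load() {
    reads=$1 writes=$2
    a_reads=$(( (reads + 1) / 2 ))   # 1st, 3rd, 5th ... read goes to Disk A
    b_reads=$(( reads / 2 ))         # 2nd, 4th, 6th ... read goes to Disk B
    echo "A: reads=$a_reads writes=$writes"
    echo "B: reads=$b_reads writes=$writes"
    echo "physical_writes=$((2 * writes))"
}
```

For 5 reads and 3 writes, each disk serves only about half the reads but all 3 writes, so the mirror issues 6 physical writes in total.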
Concatenation: One additional type of logical device that's involved when combining stripes and mirrors in SVM is a concatenation. There is no RAID nomenclature associated with a 'concat':
    +--------+
    | Disk A |
    +--------+
    | Disk B |
    +--------+
    | Disk C |
    +--------+
      Concat
A concatenation aggregates several smaller physical devices into one large logical device. Unlike a stripe, the address space isn't interlaced across the underlying devices, so there's no performance gain from using a concatenated device.
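For contrast with the stripe, this hypothetical sketch maps an offset on a concatenation: devices are simply laid end to end, so consecutive offsets stay on one disk until it is full. Sizes and offsets can be in any consistent unit, for example blocks.

```sh
#!/bin/sh
# concat_locate OFFSET SIZE_A SIZE_B ...
# Hypothetical helper: find which device of a concatenation holds OFFSET
# and the offset within that device. Devices are named A, B, C, ...
concat_locate() {
    offset=$1; shift
    idx=0
    for size in "$@"; do
        if [ "$offset" -lt "$size" ]; then
            letter=$(echo ABCDEFGHIJKLMNOPQRSTUVWXYZ | cut -c$((idx + 1)))
            echo "disk=$letter offset=$offset"
            return 0
        fi
        offset=$((offset - size))    # skip past this whole device
        idx=$((idx + 1))
    done
    echo "offset beyond end of concatenation" >&2
    return 1
}
```

With three 1000-block devices, offsets 0 to 999 all land on Disk A, which is why a sequential I/O gains no parallelism from a concat.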
Since RAID0 improves performance, and RAID1 provides redundancy, someone came up with the idea to combine them. Fast and reliable. Two great tastes that taste great together!
When combining these two types of 'logical' devices there's a choice to be made -- do you mirror two stripes, or do you stripe across multiple mirrors? There are pros and cons to each approach:
RAID 0+1: In RAID 0+1, the stripes are created first, then are mirrored together. Logically, the resulting device looks like:
    +----------------------------------+     +----------------------------------+
    | +--------+ +--------+ +--------+ |     | +--------+ +--------+ +--------+ |
    | | Chunk 1| |Chunk 2 | |Chunk 3 | |     | | Chunk 1| |Chunk 2 | |Chunk 3 | |
    | +--------+ +--------+ +--------+ |     | +--------+ +--------+ +--------+ |
    | | Chunk 4| |Chunk 5 | |Chunk 6 | |<--->| | Chunk 4| |Chunk 5 | |Chunk 6 | |
    | +--------+ +--------+ +--------+ |     | +--------+ +--------+ +--------+ |
    | | Chunk 7| |Chunk 8 | |Chunk 9 | |     | | Chunk 7| |Chunk 8 | |Chunk 9 | |
    | +--------+ +--------+ +--------+ |     | +--------+ +--------+ +--------+ |
    |   Disk A     Disk B     Disk C   |     |   Disk D     Disk E     Disk F   |
    +----------------------------------+     +----------------------------------+
                 Stripe 1                                 Stripe 2
                     |                                        |
                     +------------------Mirror----------------+
Advantage: Simple administrative model. Issue one command to create the first stripe, a second command to create the second stripe, and a third command to mirror them. Three commands and you're done, regardless of the number of disks in the configuration.
Disadvantage: An error on any one of the disks kills redundancy for all disks. For instance, a failure on Disk B above 'breaks' the Stripe 1 side of the mirror. As a result, should disk D, E, or F fail as well, the entire mirror becomes unusable.
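In SVM terms, the "mirror two stripes" model might look like the following dry-run sketch. It is hypothetical: the d10/d11/d12 names, slices, and 16k interlace are assumptions. The usual SVM practice is to create a one-way mirror and then metattach the second submirror, which triggers a resync, so it comes to four commands rather than three.

```sh
#!/bin/sh
# plan_mirrored_stripes INTERLACE "SIDE1 SLICES" "SIDE2 SLICES"
# Hypothetical dry-run helper: print (do not run) the commands for two
# equal stripes d11 and d12 mirrored as d10.
plan_mirrored_stripes() {
    interlace=$1 side1=$2 side2=$3
    set -- $side1    # split the first slice list into words to count them
    echo "metainit d11 1 $# $side1 -i $interlace"
    set -- $side2
    echo "metainit d12 1 $# $side2 -i $interlace"
    echo "metainit d10 -m d11"    # one-way mirror first
    echo "metattach d10 d12"      # attach second side; starts a resync
}
```

Usage: plan_mirrored_stripes 16k "c1t1d0s0 c1t2d0s0 c1t3d0s0" "c2t1d0s0 c2t2d0s0 c2t3d0s0" corresponds to Disks A-C and D-F in the diagram, with each stripe on its own controller.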
RAID 1+0: In RAID 1+0 the opposite approach is taken. The disks are mirrored first, then the mirrors are combined together into a stripe:
    +-----------------+   +-----------------+   +-----------------+
    |     Chunk 1     |   |     Chunk 2     |   |     Chunk 3     |
    +-----------------+   +-----------------+   +-----------------+
    |     Chunk 4     |   |     Chunk 5     |   |     Chunk 6     |
    +-----------------+   +-----------------+   +-----------------+
    |     Chunk 7     |   |     Chunk 8     |   |     Chunk 9     |
    +-----------------+   +-----------------+   +-----------------+
     Disk A <-> Disk B     Disk C <-> Disk D     Disk E <-> Disk F
         Mirror 1              Mirror 2              Mirror 3
            |                                           |
            +-------------------Stripe------------------+
Advantage: A failure on one disk only impacts redundancy for the chunks of the stripe that are located on that disk. For instance, a failure on Disk B above only loses redundancy for every third chunk (1, 4, 7, etc.). Redundancy for the other stripe chunks is unaffected, so a second disk failure could be tolerated as long as the second failure wasn't on Disk A.
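The difference in resilience can be checked by brute force. This sketch enumerates every pair of failed disks among the six in the diagrams and counts how many pairs destroy the data under each layout. It models the generic textbook layouts drawn here, not SVM's own behaviour.

```sh
#!/bin/sh
# Count fatal two-disk failures in the two six-disk layouts.
# RAID 0+1: stripe 1 = A B C, stripe 2 = D E F; data is lost when the
#           two failures hit different stripes.
# RAID 1+0: mirrors A-B, C-D, E-F; data is lost only when both disks
#           of the same mirror fail.
disks="A B C D E F"

side_of() {    # stripe a disk belongs to (RAID 0+1)
    case $1 in
        A|B|C) echo 1 ;;
        *)     echo 2 ;;
    esac
}

pair_of() {    # mirror a disk belongs to (RAID 1+0)
    case $1 in
        A|B) echo 1 ;;
        C|D) echo 2 ;;
        *)   echo 3 ;;
    esac
}

count_fatal_pairs() {
    total=0 fatal01=0 fatal10=0
    i=0
    for x in $disks; do
        i=$((i + 1))
        j=0
        for y in $disks; do
            j=$((j + 1))
            [ "$j" -gt "$i" ] || continue    # visit each unordered pair once
            total=$((total + 1))
            if [ "$(side_of "$x")" != "$(side_of "$y")" ]; then
                fatal01=$((fatal01 + 1))
            fi
            if [ "$(pair_of "$x")" = "$(pair_of "$y")" ]; then
                fatal10=$((fatal10 + 1))
            fi
        done
    done
    echo "pairs=$total raid01_fatal=$fatal01 raid10_fatal=$fatal10"
}
```

Running count_fatal_pairs prints "pairs=15 raid01_fatal=9 raid10_fatal=3": of the 15 possible double failures, 9 are fatal to RAID 0+1 but only 3 to RAID 1+0.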
Disadvantage: More complicated from an administrative standpoint. The administrator needs to issue one creation command per mirror, then a command to stripe across the mirrors. The six-disk example above would require four commands to create, while a twelve-disk configuration would require seven commands.
SVM specifics
So, does SVM do RAID 0+1 or RAID 1+0? The answer is, "Yes." So it gives you a choice between the two? The answer is "No."
Obviously further explanation is necessary...
In SVM, mirror devices cannot be created from "bare" disks. You are required to create the mirror on top of another type of SVM metadevice, known as a concat/stripe*. SVM combines concatenations and stripes into a single metadevice type, in which one or more stripes are concatenated together. When used to build a mirror these concat/stripe logical devices are known as submirrors. If you want to expand the size of a mirror device you can do so by concatenating additional stripe(s) onto the concat/stripe devices that are serving as submirrors.
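As a sketch of that expansion path, the hypothetical dry-run helper below prints one metattach per submirror (concatenating a new slice onto it) followed by growfs to enlarge a mounted UFS file system on the mirror. The volume names, slices, and mount point are assumptions; the new slices should be the same size so both submirrors grow equally.

```sh
#!/bin/sh
# plan_grow_mirror MIRROR MOUNTPOINT SUBMIRROR SLICE [SUBMIRROR SLICE ...]
# Hypothetical dry-run helper: print (do not run) the commands that
# concatenate one slice onto each submirror, then grow the file system.
plan_grow_mirror() {
    mirror=$1 mnt=$2
    shift 2
    while [ "$#" -ge 2 ]; do
        echo "metattach $1 $2"    # add SLICE as a new stripe of SUBMIRROR
        shift 2
    done
    echo "growfs -M $mnt /dev/md/rdsk/$mirror"    # grow the mounted UFS
}
```

For example, plan_grow_mirror d10 /export d11 c1t5d0s0 d12 c2t5d0s0 prints two metattach commands and a final growfs for d10.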
So, in SVM, you are always required to set up a stripe (concat/stripe) in order to create a mirror. On the surface this makes it appear that SVM does RAID 0+1. However, once you understand a bit about the SVM mirror code, you'll find RAID 1+0 lurking under the covers.
SVM mirrors are logically divided up into regions. The state of each mirror region is recorded in state database replicas* stored on disk. By individually recording the state of each region in the mirror, SVM can be smart about how it performs a resync. Following a disk failure or an unusual event (e.g. a power failure occurs after the first side of a mirror has been written to but before the matching write to the second side can be accomplished), SVM can determine which regions are out-of-sync and only synchronize them, not the entire mirror. This is known as an optimized resync.
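The idea behind an optimized resync can be sketched in a few lines: given the offsets of writes that were in flight when the system went down, only the regions containing them need copying. This is an idealized model; the region size used here is a hypothetical parameter, and SVM's real region size and on-disk bitmap format are internal details not described in this text.

```sh
#!/bin/sh
# dirty_regions REGION_SIZE OFFSET...
# Hypothetical model of dirty-region tracking: print, once each and in
# ascending order, the region numbers touched by the given write offsets.
# Only these regions would need resynchronizing after a crash.
dirty_regions() {
    size=$1; shift
    for off in "$@"; do
        echo $((off / size))
    done | sort -n -u
}
```

With 1000-block regions, writes in flight at blocks 10, 20, 1500 and 5000 mark only regions 0, 1 and 5 as dirty, so a resync touches three regions instead of the whole mirror.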
The optimized resync mechanisms allow SVM to gain the redundancy benefits of RAID 1+0 while keeping the administrative benefits of RAID 0+1. If one of the drives in a concat/stripe device fails, only those mirror regions that correspond to data stored on the failed drive will lose redundancy. The SVM mirror code understands the layout of the concat/stripe submirrors and can therefore determine which resync regions reside on which underlying devices. For all regions of the mirror not affected by the failure, SVM will continue to provide redundancy, so a second disk failure won't necessarily prove fatal.
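For a submirror built as a simple concatenation (width-1 stripes), the regions that lose redundancy when one component fails can be computed from the component sizes. The sketch below is a hypothetical, idealized model: components are numbered from 0, the region size is an assumed parameter, and for a multi-disk stripe the failed disk's chunks would instead be scattered across many regions.

```sh
#!/bin/sh
# exposed_regions REGION_SIZE FAILED_INDEX SIZE...
# Hypothetical helper: print the range FIRST-LAST of mirror regions that
# overlap the failed component of a concatenated submirror. Sizes and
# REGION_SIZE use the same unit (e.g. blocks).
exposed_regions() {
    rsize=$1 failed=$2
    shift 2
    start=0 idx=0
    for size in "$@"; do
        if [ "$idx" -eq "$failed" ]; then
            end=$((start + size - 1))    # last block on the failed component
            echo "$((start / rsize))-$((end / rsize))"
            return 0
        fi
        start=$((start + size))
        idx=$((idx + 1))
    done
    echo "no such component" >&2
    return 1
}
```

With three 1000-block components and 500-block regions, losing the second component (index 1) exposes only regions 2-3; regions 0-1 and 4-5 stay redundant, so a later failure on the other submirror in those ranges would not be fatal.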
So, in a nutshell, SVM provides a RAID 0+1 style administrative interface but effectively implements RAID 1+0 functionality. Administrators get the best of each type, the relatively simple administration of RAID 0+1 plus the greater resilience of RAID 1+0 in the case of multiple device failures.
* concat/stripe logical devices (metadevices)
The following example shows a concat/stripe metadevice that's serving as a submirror to a mirror metadevice. Note that the metadevice is a concatenation of three separate stripes:
- Stripe 0 is a 1-way stripe (so not really striped at all) on disk slice c1t11d0s0.
- Stripe 1 is a 1-way stripe on disk slice c1t12d0s0.
- Stripe 2 is a 2-way stripe with an interlace size of 32 blocks on disk slices c1t13d0s1 and c1t14d0s2.
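The layout just listed corresponds to a single metainit invocation in which each stripe gives its width followed by its slices, with -i setting the interlace of a multi-slice stripe. The hypothetical dry-run helper below builds that command line from a list of stripes; it only prints the command.

```sh
#!/bin/sh
# plan_concat_stripe VOLUME INTERLACE "STRIPE0 SLICES" "STRIPE1 SLICES" ...
# Hypothetical dry-run helper: print (do not run) a metainit command for a
# concatenation of stripes. INTERLACE is only added to stripes wider than 1.
plan_concat_stripe() {
    vol=$1 interlace=$2
    shift 2
    cmd="metainit $vol $#"    # number of stripes in the concatenation
    for stripe in "$@"; do
        width=$(echo $stripe | wc -w | tr -d ' ')    # slices in this stripe
        cmd="$cmd $width $stripe"
        if [ "$width" -gt 1 ]; then
            cmd="$cmd -i $interlace"    # interlace matters only when striping
        fi
    done
    echo "$cmd"
}
```

For the example, plan_concat_stripe d1 32b c1t11d0s0 c1t12d0s0 "c1t13d0s1 c1t14d0s2" prints "metainit d1 3 1 c1t11d0s0 1 c1t12d0s0 2 c1t13d0s1 c1t14d0s2 -i 32b"; the result could then become the first submirror with metainit d0 -m d1.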
    d1: Submirror of d0
        State: Okay
        Size: 78003 blocks (38 MB)
        Stripe 0:
            Device       Start Block  Dbase  State  Reloc  Hot Spare
            c1t11d0s0           0     No     Okay   Yes
        Stripe 1:
            Device       Start Block  Dbase  State  Reloc  Hot Spare
            c1t12d0s0           0     No     Okay   Yes
        Stripe 2: (interlace: 32 blocks)
            Device       Start Block  Dbase  State  Reloc  Hot Spare
            c1t13d0s1           0     No     Okay   Yes
            c1t14d0s2           0     No     Okay   Yes
** State database replicas
SVM stores configuration and state information in a 'state database' in memory. Copies of this state database are stored on disk, where they are referred to as state database replicas. The primary purpose of the state database replicas is to provide non-volatile copies of the state database so that the SVM configuration is persistent across reboots. A secondary purpose of the replicas is to provide a 'scratch pad' to keep track of mirror region states.
Andre Molyneux's Weblog
Transitioning to Solaris™ Volume Manager - Sun Microsystems
Solaris Volume Manager Performance Best Practices
docs.sun.com Solaris 9 Installation Guide
docs.sun.com: Solaris Volume Manager Administration Guide
Sun Solaris Solaris Volume Manager - (Solaris 9)
Solaris Volume Manager Data Sheet - (Solaris 9)
Introduction
Installing
Command Line Tools
Starting the Volume Manager Console GUI - (Enhanced Storage)
Volume Manager Components
Creating Volumes - (Using Solaris 9 Volume Manager Commands)
State Database - (State Database Replicas)
Creating a Stripe - (RAID 0)
Creating a Concatenation - (RAID 0)
Creating Mirrors - (RAID 1)
Creating a RAID5 Volume - (RAID 5)
Creating a Hot Spare
Removing Volumes - (Using Solaris 9 Volume Manager Commands)
Removing a State Database Replica
Removing a Stripe - (RAID 0)
Removing a Concatenation - (RAID 0)
Removing a Mirror - (RAID 1)
Removing a RAID5 Volume - (RAID 5)
Removing a Hot Spare
Mirroring Disks with Solstice DiskSuite
Solaris Volume Manager - Soft Partitioning Explained
ITworld.com - Setting Up RAID Volumes with Solaris Volume Manager ...
Managing Disks: Solaris Volume Manager
Configuring Boot Disks With Solaris Volume Manager Software (October 2002)
-by Erik Vanden Meersch and Kristien Hens
This article is an update to the April 2002 Sun BluePrints OnLine article, Configuring Boot Disks With Solstice DiskSuite Software. This article focuses on the Solaris 9 Operating Environment, Solaris Volume Manager software, and VERITAS Volume Manager 3.2 software. It describes how to partition and mirror the system disk, and how to create and maintain a backup system disk. In addition, this article presents technical arguments for the choices made, and includes detailed runbooks.
Solaris Volume Manager Performance Best Practices (November 2003)
-by Glenn Fawcett
Compelling new features such as soft partitioning and automatic device relocation make the Solaris Volume Manager software a viable candidate for storage management needs. Solaris Volume Manager software features enhance storage management capabilities beyond what is handled by intelligent storage arrays with hardware RAID. Solaris Volume Manager software is now integrated with the Solaris Operating Environment (Solaris OE) and does not require additional license fees. This article provides specific Solaris Volume Manager tips for system, storage, and database administrators who want to get the most out of Solaris Volume Manager software in their data centers. This article targets an intermediate audience.
Configuring Boot Disks (December 2001)
-by John S. Howard and David Deeths
This article is the fourth chapter of the Sun BluePrints book titled Boot Disk Management: A Guide For The Solaris Operating Environment (ISBN 0-13-062153-6), which is available through www.sun.com/books, amazon.com, and Barnes & Noble bookstores. This chapter presents a reference configuration of the root disk and associated disks that emphasizes the value of configuring a system for high availability and high serviceability. This chapter explains the value of creating a system with both of these characteristics, and outlines the methods used to do so.
Wide Thin Disk Striping (October 2000)
-by Bob Larson
In this article, the technique of using stripes to distribute data and indexes over several disks is described. The article also recommends using wide-thin stripes to maximize operational flexibility while minimizing complexity.
DiskSuite 4.2.1 Reference Guide
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright in the original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: November 08, 2008