From owner-freebsd-doc@FreeBSD.ORG Tue Apr 8 14:40:03 2008 Return-Path: Delivered-To: freebsd-doc@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F296F106566C for ; Tue, 8 Apr 2008 14:40:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DDE108FC0C for ; Tue, 8 Apr 2008 14:40:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m38Ee2il007967 for ; Tue, 8 Apr 2008 14:40:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m38Ee2rv007966; Tue, 8 Apr 2008 14:40:02 GMT (envelope-from gnats) Date: Tue, 8 Apr 2008 14:40:02 GMT Message-Id: <200804081440.m38Ee2rv007966@freefall.freebsd.org> To: freebsd-doc@FreeBSD.org From: Federico Galvez-Durand Cc: Subject: Re: docs/122052: minor update on handbook section 20.7.1 X-BeenThere: freebsd-doc@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Federico Galvez-Durand List-Id: Documentation project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 14:40:03 -0000 The following reply was made to PR docs/122052; it has been noted by GNATS. From: Federico Galvez-Durand To: FreeBSD-gnats-submit@FreeBSD.org, freebsd-doc@FreeBSD.org Cc: Subject: Re: docs/122052: minor update on handbook section 20.7.1 Date: Tue, 8 Apr 2008 07:10:36 -0700 (PDT) --0-968980677-1207663836=:26150 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Content-Id: Content-Disposition: inline Well, now the minor update is not that minor. Find attached a patch file. 
This patch ->

deprecates:
  handbook/vinum-object-naming.html
  handbook/vinum-access-bottlenecks.html
  handbook/vinum/vinum-concat.png
  handbook/vinum/vinum-raid10-vol.png
  handbook/vinum/vinum-simple-vol.png
  handbook/vinum/vinum-striped.png
  handbook/vinum/vinum-mirrored-vol.png
  handbook/vinum/vinum-raid5-org.png
  handbook/vinum/vinum-striped-vol.png

creates:
  handbook/vinum-disk-performance-issues.html
  handbook.new/vinum/vinum-concat.png
  handbook.new/vinum/vinum-raid01.png
  handbook.new/vinum/vinum-raid10.png
  handbook.new/vinum/vinum-simple.png
  handbook.new/vinum/vinum-raid0.png
  handbook.new/vinum/vinum-raid1.png
  handbook.new/vinum/vinum-raid5.png

updates:
  all remaining handbook/vinum-*.html
  handbook/raid.html
  handbook/virtualization.html.

I think I cannot attach the new PNG files here. Please, advise how to submit them.

--0-968980677-1207663836=:26150
Content-Type: text/plain; name="patch01.txt"
Content-Description: 2837643882-patch01.txt
Content-Disposition: inline; filename="patch01.txt"

diff -r -u handbook.orig/docbook.css handbook/docbook.css
--- handbook.orig/docbook.css 2008-03-22 05:33:04.000000000 +0100
+++ handbook/docbook.css 2008-04-05 15:28:57.000000000 +0200
@@ -129,6 +129,26 @@
     color: #000000;
 }
 
+TABLE.CLASSTABLE {
+    border-collapse: collapse;
+    border-top: 2px solid gray;
+    border-bottom: 2px solid gray;
+}
+
+TABLE.CLASSTABLE TH {
+    border-top: 2px solid gray;
+    border-right: 1px solid gray;
+    border-left: 1px solid gray;
+    border-bottom: 2px solid gray;
+}
+
+TABLE.CLASSTABLE TD {
+    border-top: 1px solid gray;
+    border-right: 1px solid gray;
+    border-left: 1px solid gray;
+    border-bottom: 1px solid gray;
+}
+
 .FILENAME {
     color: #007a00;
 }
diff -r -u handbook.orig/vinum-config.html handbook/vinum-config.html
--- handbook.orig/vinum-config.html 2008-03-22 05:43:54.000000000 +0100
+++ handbook/vinum-config.html 2008-04-08 14:56:10.000000000 +0200
@@ -7,8 +7,8 @@
-
-
+
+
@@ -22,7 +22,7 @@
-Prev Chapter 20 The Vinum Volume Manager
-

20.8 Configuring -Vinum

+

20.6 Configuring Vinum

The GENERIC kernel does not contain Vinum. It is possible to build a special kernel which includes Vinum, but this is not recommended. The standard way to start Vinum is as a kernel module (kld). You do -not even need to use kldload(8) for Vinum: +not even need to use + +kldload(8) +for Vinum: when you start gvinum(8), it checks +href="http://www.FreeBSD.org/cgi/man.cgi?query=gvinum&sektion=8"> +gvinum(8), +it checks whether the module has been loaded, and if it is not, it loads it automatically.

-

20.8.1 Startup

+

20.6.1 Preparing a disk

+

Vinum needs a + +bsdlabel(8) +label on your disk. +

Assuming +/dev/ad1 +is the device in use and your Vinum Volume will use the whole disk, it is advisable to initialize the device with a single Slice, using + +fdisk(8). The following command creates +a single Slice +s1 +over the whole disk +/dev/ad1. +

+
 +# fdisk -vI ad1
 +
+ +

After creating +the disk Slice, it can be labeled: +

+
 +# bsdlabel -w ad1s1
 +
+ +

The bsdlabel utility cannot write an adequate label for Vinum automatically; you need to edit the standard label:

+
 +# bsdlabel -e ad1s1
 +
+

The editor will show you something similar to this:

+
 +# /dev/ad1s1:
 +8 partitions:
 +#        size   offset    fstype   [fsize bsize bps/cpg]
 +  a:  1048241       16    unused        0     0     0                    
 +  c:  1048257        0    unused        0     0         # "raw" part, don't edit
 +
+ +

You need to edit the partitions. Since this disk is not bootable (it could be, see +Section 20.7), you can rename partition +a +to partition +h +and replace +fstype +unused +with +vinum. +The fields fsize bsize bps/cpg have no meaning for +fstype vinum. +

+
 +# /dev/ad1s1:
 +8 partitions:
 +#        size   offset    fstype   [fsize bsize bps/cpg]
 +  c:  1048257        0    unused        0     0         # "raw" part, don't edit
 +  h:  1048241       16     vinum                    
 +
+
+ +
+

20.6.2 Configuration File

+

This file can be placed anywhere in your system. After executing the instructions in this file, + +gvinum(8) + will not use it anymore: everything is stored in a database. You should still keep this file in a safe place, because you may need it in case of a Volume crash. +

+

The following configuration creates a Volume named +Simple +containing a drive named +diskB +based on the device +/dev/ad1s1h. The +plex +organization is +concat +and the plex contains only one +subdisk (sd). +

+
 +drive diskB device /dev/ad1s1h
 +volume Simple 
 +	plex org concat
 +	sd drive diskB
 +
+
+ +
+

20.6.3 Creating a Volume

+ +

Once you have prepared your disk and created a configuration file, you can use + +gvinum(8) +to create a Volume. +

+
 +# gvinum create Simple
 +1 drive:
 +D diskB                 State: up	/dev/ad1s1h	A: 0/511 MB (0%)
 +
 +1 volume:
 +V Simple                State: up	Plexes:       1	Size:        511 MB
 +
 +1 plex:
 +P Simple.p0           C State: up	Subdisks:     1	Size:        511 MB
 +
 +1 subdisk:
 +S Simple.p0.s0          State: up	D: diskB        Size:        511 MB
 +
+ + +

At this point, a new entry has been created for your Volume:

+ +
 +# ls -l /dev/gvinum
 +crw-r-----  1 root  operator    0,  89 Mar 26 17:17 /dev/gvinum/Simple
 +
 +/dev/gvinum/plex:
 +total 0
 +crw-r-----  1 root  operator    0,  86 Mar 26 17:17 Simple.p0
 +
 +/dev/gvinum/sd:
 +total 0
 +crw-r-----  1 root  operator    0,  83 Mar 26 17:17 Simple.p0.s0
 +
+ +
+ +
+

20.6.4 Starting a Volume

+ +

After creating a Volume you need to allow the system access to the objects:

+
 +# gvinum start Simple
 +
+ +

The Starting process can be slow, depending on the size of the subdisk or subdisks contained in your plex. Run gvinum and use the +l +command to check whether the status of all your subdisks is already +"up". +

+

gvinum prints a message each time the Start process completes for a subdisk.

+
+ +
+

20.6.5 Creating a File System

+ +

After having created a Volume, you need to create a file system using + +newfs(8) +

-

Vinum stores configuration information on the disk slices in essentially the same form +

 +# newfs /dev/gvinum/Simple
 +
+

If no errors are reported, you should check the file system:

+
 +# fsck -t ufs /dev/gvinum/Simple
 +
+

If no errors are reported, you can mount the file system:

+
 +# mount /dev/gvinum/Simple /mnt
 +
+ +

At this point, if everything seems to be right, it is desirable to reboot your machine and perform the following test:

+
 +# fsck -t ufs /dev/gvinum/Simple
 +
+

If no errors are reported, you can mount the file system:

+
 +# mount /dev/gvinum/Simple /mnt
 +
+

If everything looks fine now, then you have succeeded in creating a Vinum Volume.

+ +
+ +
+

20.6.6 Mounting a Volume Automatically

+ +

In order to have your Volumes mounted automatically you need two things:

+
    +
  • +Set + geom_vinum_load="YES" +in +/boot/loader.conf. +
  • +
  • +Add an entry in + /etc/fstab +for your Volume (e.g. Simple). The mountpoint in this example is the directory + /space . See + +fstab(5) +and + +mount(8) +for details. +
  • +
     +#
     +# Device                Mountpoint  FStype  Options     Dump    Pass#
     +#
     +[...]
     +/dev/gvinum/Simple      /space      ufs     rw          2       2
     +
    + +
+

Your Volumes will be checked by + +fsck(8) +at boot time if you specify a non-zero value in the + Pass# +field; the + Dump +field is only consulted by dump(8).

+ +
+ +
+

20.6.7 Troubleshooting

+ +
+

20.6.7.1 Creating a File System

+

The process of Starting a Volume may take a long time; you must be sure this process has completed before creating a file system. At the time of writing, + +newfs(8) +will not complain if you try to create a file system while the Starting process is still in progress. Even running + +fsck(8) +on your new file system may tell you everything is OK. However, you will most probably not be able to use the Volume later on, after rebooting your machine. +

+ +

In case your Volume does not pass the checkup, you may try to repeat the process one more time:

+
 +# gvinum start Simple
 +# newfs /dev/gvinum/Simple
 +# fsck -t ufs /dev/gvinum/Simple
 +
+

If everything looks fine, then reboot your machine.

+
 +# shutdown -r now
 +
+

Then execute again:

+
 +# fsck -t ufs /dev/gvinum/Simple
 +# mount /dev/gvinum/Simple /mnt
 +
+

It should now work without problems.

+ +
+ +
+

20.6.8 Miscellaneous Notes

+ +

Vinum stores configuration information on disk slices in essentially the same form as in the configuration files. When reading from the configuration database, Vinum recognizes a number of keywords which are not allowed in the configuration files. For example, a disk configuration might contain the following text:

@@ -86,18 +343,11 @@ to identify drives correctly even if they have been assigned different UNIX® drive IDs.

-
-

20.8.1.1 Automatic -Startup

- -
-
-

Note: This information only relates to the historic Vinum implementation. Gvinum always features an automatic -startup once the kernel module is loaded.

-
+
+

20.6.9 Differences for FreeBSD 4.X

+

In order to start Vinum automatically when you boot the system, ensure that you have the following line in your /etc/rc.conf:

@@ -119,8 +369,7 @@ does not matter which drive is read. After a crash, however, Vinum must determine which drive was updated most recently and read the configuration from this drive. It then updates the configuration if necessary from progressively older drives.

-
-
+
diff -r -u handbook.orig/vinum-data-integrity.html handbook/vinum-data-integrity.html --- handbook.orig/vinum-data-integrity.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/vinum-data-integrity.html 2008-04-08 13:00:38.000000000 +0200 @@ -7,7 +7,7 @@ - + @@ -22,7 +22,7 @@ -Prev Chapter 20 The Vinum Volume Manager
-

20.4 Data -Integrity

+

20.4 Data Integrity

-

The final problem with current disks is that they are unreliable. Although disk drive -reliability has increased tremendously over the last few years, they are still the most -likely core component of a server to fail. When they do, the results can be catastrophic: -replacing a failed disk drive and restoring data to it can take days.

- -

The traditional way to approach this problem has been mirroring, keeping two copies of the data on different -physical hardware. Since the advent of the RAID -levels, this technique has also been called RAID level -1 or RAID-1. Any write to the volume writes -to both locations; a read can be satisfied from either, so if one drive fails, the data -is still available on the other drive.

+

Although disk drive reliability has increased tremendously over the last few years, they are still the most likely core component of a server to fail. When they do, the results can be catastrophic: replacing a failed disk drive and restoring data to it can take a long time.

-

Mirroring has two problems:

+

The traditional way to approach this problem has been mirroring, keeping two copies of the data on different physical hardware. Since the advent of the RAID levels, this technique has also been called RAID level 1 or RAID-1.

+ +

An alternative solution is using an +error-correcting code. +This strategy is implemented in the RAID levels 2, 3, 4, 5 and 6. Of these, RAID-5 is the most interesting; for each stripe a simple +parity check code is generated from the data blocks and stored as part of the stripe. +For arrays with a large number of disks, RAID-5 might not provide enough protection; in this case more complex error-correcting codes (e.g. Reed-Solomon) may provide better results. +

+

RAID levels can be nested to create other RAID configurations with improved resilience. Of these, RAID 0+1 and RAID 1+0 are explained here. Under certain conditions, these arrays can work in degraded mode with up to N/2 broken disks. However, as few as two broken disks can stop the array if they fail in particular positions. In both cases, having only one disk down is fully tolerated.

+

+Therefore, whenever you think about a failure in a RAID-0+1 or RAID-1+0 you are considering either the probability of having two disks failing at the same time or not having replaced the first broken disk before the second fails. On top of that, the second disk needs to fail in a very specific position inside the array. +

+

+In modern storage facilities, mission-critical arrays are implemented using hot-plug technology, allowing a broken disk to be replaced without having to stop the array. The probability of a second disk failing before the first broken disk has been replaced is straightforward to express mathematically. However, the inputs to that calculation must be estimated first, and they depend mainly on security policies and stock management practices beyond the scope of this discussion.

+

Therefore, a more interesting discussion about RAID-0+1 and RAID-1+0 reliability should be based on the Mean Time Between Failures (MTBF) of the devices in use and on other variables provided by the disk drive manufacturer and the storage facility administration. +

+

For the sake of simplicity, all disks (N) in a RAID are considered to have the same capacity (CAP) and R/W characteristics. This is not mandatory in all cases.

+

In the Figures, Data stored in a RAID is represented by (X;Y;Z). Data striped along an array of disks is represented by (X0,X1,X2...; Y0,Y1,Y2...; Z0,Z1,Z2...). +

+ +
+

20.4.1 RAID-1: Mirror

+

In a mirrored array, any write to the volume writes to both disks; a read can be satisfied from either disk, so if one fails, the data is still available on the other one.

+ +
+

+

Figure 20-3. RAID-1 Organization

+
  • -

    The price. It requires twice as much disk storage as a non-redundant solution.

    +

    The total storage capacity is CAP*N/2.

  • -

    The performance impact. Writes must be performed to both drives, so they take up twice -the bandwidth of a non-mirrored volume. Reads do not suffer from a performance penalty: -it even looks as if they are faster.

    +

    Write performance suffers because all data must be stored on both drives, so a write takes up twice the bandwidth of a non-mirrored volume. Reads do not suffer from a performance penalty. +

-

An alternative solution is parity, implemented in the RAID levels 2, 3, 4 and 5. Of these, RAID-5 is the most interesting. As implemented in Vinum, it is -a variant on a striped organization which dedicates one block of each stripe to parity of -the other blocks. As implemented by Vinum, a RAID-5 -plex is similar to a striped plex, except that it implements RAID-5 by including a parity block in each stripe. As required -by RAID-5, the location of this parity block changes -from one stripe to the next. The numbers in the data blocks indicate the relative block -numbers.

+
+ +
+

20.4.2 RAID-5

-

+

As implemented in Vinum, RAID-5 is a variant of the striped plex organization which dedicates one block of each stripe to the parity of the other blocks (Px,Py,Pz). +As required by RAID-5, the location of this parity block changes from one stripe to the next. The numbers in the data blocks indicate the relative block numbers (X0,X1,Px; Y0,Py,Y1; Pz,Z0,Z1;...).

-

Figure 20-3. RAID-5 Organization

+

+

Figure 20-4. RAID-5 Organization

+
+ +
    +
  • +The total capacity of the array is equal to (N-1)*CAP. +

  • +
  • +At least 3 disks are necessary. +

  • +
  • +Read access is similar to that of striped organizations but write access is significantly slower. In order to update (write) one striped block you need to read the other striped blocks and compute the parity block again before writing the new block and the new parity. This effect can be mitigated by using systems with a large R/W cache memory, so that the other blocks do not need to be read again in order to compute the new parity. +

  • + +
  • +If one drive fails, the array can continue to operate in degraded mode: a read from one of the remaining accessible drives continues normally, but a read from the failed drive is recalculated from the corresponding block on all the remaining drives. +

  • +
-

-
-
-

Compared to mirroring, RAID-5 has the advantage of -requiring significantly less storage space. Read access is similar to that of striped -organizations, but write access is significantly slower, approximately 25% of the read -performance. If one drive fails, the array can continue to operate in degraded mode: a -read from one of the remaining accessible drives continues normally, but a read from the -failed drive is recalculated from the corresponding block from all the remaining -drives.
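The parity behaviour described for RAID-5 can be illustrated with a small sketch. It is not taken from Vinum; the helper `xor_blocks` and the four-byte block contents are hypothetical, and the only assumption is the usual one that the parity block is the byte-wise XOR of the data blocks in the same stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks and one parity block,
# matching the (X0, X1, Px) notation used in the text.
x0 = b"\x12\x34\x56\x78"
x1 = b"\xab\xcd\xef\x01"
px = xor_blocks([x0, x1])

# Degraded read: the disk holding X1 has failed, so X1 is recalculated
# from the corresponding blocks on the remaining disks.
recovered_x1 = xor_blocks([x0, px])

# Write: after X0 changes, the other data block is read again and the
# parity block is recomputed before both are written back.
new_x0 = b"\xff\x00\xff\x00"
new_px = xor_blocks([new_x0, x1])
```

The degraded read needs every surviving block of the stripe, which is why reads from a failed drive are slower than reads from a healthy one.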

+ +
+

20.4.3 RAID-0+1

+ +

+In Vinum, a RAID-0+1 can be constructed straightforwardly by placing two striped plexes in the same volume, so that each plex mirrors the other. In this array, resilience is improved and, under certain conditions, more than one disk can fail without compromising the functionality. Performance is degraded when the array is forced to work without the full set of disks. +

+
+

+

Figure 20-5. RAID-0+1 Organization

+
    +
  • +The total storage capacity is CAP*N/2. +

  • +
  • +At least 4 disks are necessary. +

  • +
  • +This array will stop working when one disk fails in each of the mirrors (e.g. DiskB and DiskF) but it could work in degraded mode with N/2 disks down as long as they are all in the same mirror (e.g. DiskE, DiskF and DiskG). + +

  • +
+ +
+ +
+

20.4.4 RAID-1+0

+ +

+In Vinum, a RAID-1+0 can not be constructed by a simple manipulation of plexes. You need to construct the mirrors (e.g., m0, m1, m2...) first and then combine these mirrors into a striped plex. +In this array, resilience is improved and more than one disk can fail without compromising the functionality. Performance is degraded when the array is forced to work without the full set of disks. +

+ +
+

+

Figure 20-6. RAID-1+0 Organization

+
+ +
    +
  • +The total storage capacity is CAP*N/2. +

  • +
  • +At least 4 disks are necessary. +

  • +
  • +This array will stop working when two disks fail in the same mirror (e.g. DiskB and DiskC) but it could work in degraded mode with N/2 disks down as long as they are not in the same mirror (e.g. DiskB, DiskE and DiskF). + +

  • +
+
+ +
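The capacity formulas and failure conditions stated for the four organizations can be checked with a short sketch. The function names are hypothetical. The disk grouping is an assumption inferred from the examples in the text (six disks, DiskB to DiskG), because the figures themselves are not reproduced here.

```python
def usable_capacity(level, n, cap):
    """Usable capacity for n identical disks of size cap, per the text."""
    if level in ("raid1", "raid0+1", "raid1+0"):
        return cap * n // 2          # CAP*N/2
    if level == "raid5":
        return (n - 1) * cap         # (N-1)*CAP
    raise ValueError("unknown level: " + level)


def raid01_survives(stripe_sets, failed):
    """RAID-0+1 keeps working while at least one striped set is intact."""
    return any(not (set(s) & failed) for s in stripe_sets)


def raid10_survives(mirrors, failed):
    """RAID-1+0 keeps working while every mirror has a working disk."""
    return all(set(m) - failed for m in mirrors)


# Assumed grouping for RAID-0+1: two striped sets that mirror each other.
STRIPE_SETS = [("DiskB", "DiskC", "DiskD"), ("DiskE", "DiskF", "DiskG")]
# Assumed grouping for RAID-1+0: three mirrors combined in one stripe.
MIRRORS = [("DiskB", "DiskC"), ("DiskD", "DiskE"), ("DiskF", "DiskG")]

print(usable_capacity("raid5", 3, 512))                          # in MB
print(raid01_survives(STRIPE_SETS, {"DiskB", "DiskF"}))
print(raid01_survives(STRIPE_SETS, {"DiskE", "DiskF", "DiskG"}))
print(raid10_survives(MIRRORS, {"DiskB", "DiskC"}))
print(raid10_survives(MIRRORS, {"DiskB", "DiskE", "DiskF"}))
```

Under this grouping, both arrays tolerate three failed disks out of six in the favourable case and stop after two failures in the unfavourable one, as the lists above describe.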
-

20.6 Some -Examples

- -

Vinum maintains a configuration -database which describes the objects known to an individual system. Initially, -the user creates the configuration database from one or more configuration files with the -aid of the gvinum(8) utility -program. Vinum stores a copy of its configuration database on each disk slice (which -Vinum calls a device) under its -control. This database is updated on each state change, so that a restart accurately -restores the state of each Vinum object.

- +

20.8 Vinum Examples

+

+All Disks in the following examples are identical in capacity (512 MB) and R/W characteristics. However, the size reported by +gvinum(8) + is 511 MB. This is normal in practice, because the Disk is not exactly 536870912 bytes and some space (approx. 8 KB) is reserved by +bsdlabel(8). +The size used for the stripes is 256k in all examples. +

+

+For the sake of simplicity, only three stripes out of many are represented in the Figures. +
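The difference between the nominal 512 MB and the reported 511 MB can be reproduced from the label shown in the disk preparation section, where partition h is 1048241 sectors long. The sketch below rests on two assumptions that are not stated in the text: sectors are 512 bytes, and the reported size is rounded down to whole megabytes of 2^20 bytes.

```python
SECTOR_BYTES = 512            # assumed sector size
MB = 1024 * 1024              # assumed meaning of "MB" in the gvinum listing

partition_sectors = 1048241   # size of partition h in the edited label
size_bytes = partition_sectors * SECTOR_BYTES
size_mb = size_bytes // MB    # rounded down to whole megabytes

shortfall = 536870912 - size_bytes  # bytes missing from a full 512 MB

print(size_bytes, size_mb, shortfall)
```

Any shortfall, however small, is enough to push the rounded figure from 512 down to 511.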

-

20.6.1 The Configuration File

+

20.8.1 A Simple Volume

The configuration file describes individual Vinum objects. The definition of a simple volume might be:

 -    drive a device /dev/da3h
 -    volume myvol
 -      plex org concat
 -        sd length 512m drive a
 +# cat simple.conf
 +drive diskB device /dev/ad1s1h
 +volume Simple 
 +	plex org concat
 +	sd drive diskB
  

This file describes four Vinum objects:

@@ -67,79 +66,65 @@

The drive line describes a disk partition (drive) and its location relative to the underlying hardware. It is given the symbolic name a. This separation of the symbolic names +class="emphasis">diskB. This separation of the symbolic names from the device names allows disks to be moved from one location to another without confusion.

  • The volume line describes a -volume. The only required attribute is the name, in this case myvol.

    +Vinum Volume. The only required attribute is the name, in this case Simple.

  • -

    The plex line defines a plex. +

    The plex line defines a Vinum Plex. The only required parameter is the organization, in this case concat. No name is necessary: the system automatically generates a name from the volume name by adding the suffix .px, -where x is the number of the plex +class="EMPHASIS">.p${x}, +where ${x} is the number of the plex in the volume. Thus this plex will be called myvol.p0.

    +class="EMPHASIS">Simple.p0.

  • -

    The sd line describes a subdisk. +

    The sd line describes a Vinum subdisk. The minimum specifications are the name of a drive on which to store it, and the length of the subdisk. As with plexes, no name is necessary: the system automatically assigns names derived from the plex name by adding the suffix .sx, -where x is the number of the +class="EMPHASIS">.s${x}, +where ${x} is the number of the subdisk in the plex. Thus Vinum gives this subdisk the name myvol.p0.s0.

    +class="EMPHASIS">Simple.p0.s0.

  • -

    After processing this file, gvinum(8) produces the -following output:

    - -
     -      # gvinum -> create config1
     -      Configuration summary
     -      Drives:         1 (4 configured)
     -      Volumes:        1 (4 configured)
     -      Plexes:         1 (8 configured)
     -      Subdisks:       1 (16 configured)
     -     
     -    D a                     State: up       Device /dev/da3h        Avail: 2061/2573 MB (80%)
     -    
     -    V myvol                 State: up       Plexes:       1 Size:        512 MB
     -    
     -    P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
     -    
     -    S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
     -
    - -

    This output shows the brief listing format of gvinum(8). It is -represented graphically in Figure -20-4.

    +

    After processing this file, +gvinum(8) +produces the following output:

    -

    +
     +# gvinum create simple.conf
     +1 drive:
     +D diskB                 State: up	/dev/ad1s1h	A: 0/511 MB (0%)
     +
     +1 volume:
     +V Simple                State: up	Plexes:       1	Size:        511 MB
     +
     +1 plex:
     +P Simple.p0           C State: up	Subdisks:     1	Size:        511 MB
     +
     +1 subdisk:
     +S Simple.p0.s0          State: up	D: diskB        Size:        511 MB
     +
    -

    Figure 20-4. A Simple Vinum Volume

    -

    +

    +

    Figure 20-4. A Simple Vinum Volume

    -
    -

    This figure, and the ones which follow, represent a volume, which contains the plexes, which in turn contain the subdisks. In this trivial example, the volume contains one plex, and the plex contains one subdisk.

    @@ -147,181 +132,320 @@

    This particular volume has no specific advantage over a conventional disk partition. It contains a single plex, so it is not redundant. The plex contains a single subdisk, so there is no difference in storage allocation from a conventional disk partition. The -following sections illustrate various more interesting configuration methods.

    +following sections illustrate more interesting configuration methods.

    +
    -

    20.6.2 Increased Resilience: -Mirroring

    +

    20.8.2 RAID-1: Mirrored set

    -

    The resilience of a volume can be increased by mirroring. When laying out a mirrored -volume, it is important to ensure that the subdisks of each plex are on different drives, +

    The resilience of a volume can be increased by mirroring +(Section 20.4.1). +When laying out a mirrored volume, it is important to ensure that the subdisks of each plex are on different drives, so that a drive failure will not take down both plexes. The following configuration mirrors a volume:

     -   drive b device /dev/da4h
     -    volume mirror
     -      plex org concat
     -        sd length 512m drive a
     -      plex org concat
     -        sd length 512m drive b
     +# cat mirror.conf
     +drive diskB device /dev/ad1s1h
     +drive diskC device /dev/ad2s1h
     +volume Mirror
     +	plex org concat
     +	sd drive diskB
     +	plex org concat
     +	sd drive diskC
      
    -

    In this example, it was not necessary to specify a definition of drive a again, since Vinum keeps track of all -objects in its configuration database. After processing this definition, the +

    +After processing this definition, the configuration looks like:

     -   Drives:         2 (4 configured)
     -    Volumes:        2 (4 configured)
     -    Plexes:         3 (8 configured)
     -    Subdisks:       3 (16 configured)
     -    
     -    D a                     State: up       Device /dev/da3h        Avail: 1549/2573 MB (60%)
     -    D b                     State: up       Device /dev/da4h        Avail: 2061/2573 MB (80%)
     -
     -    V myvol                 State: up       Plexes:       1 Size:        512 MB
     -    V mirror                State: up       Plexes:       2 Size:        512 MB
     -  
     -    P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
     -    P mirror.p0           C State: up       Subdisks:     1 Size:        512 MB
     -    P mirror.p1           C State: initializing     Subdisks:     1 Size:        512 MB
     -  
     -    S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
     -    S mirror.p0.s0          State: up       PO:        0  B Size:        512 MB
     -    S mirror.p1.s0          State: empty    PO:        0  B Size:        512 MB
     +# gvinum create mirror.conf
     +2 drives:
     +D diskC                 State: up	/dev/ad2s1h	A: 0/511 MB (0%)
     +D diskB                 State: up	/dev/ad1s1h	A: 0/511 MB (0%)
     +
     +1 volume:
     +V Mirror                State: up	Plexes:       2	Size:        511 MB
     +
     +2 plexes:
     +P Mirror.p1           C State: up	Subdisks:     1	Size:        511 MB
     +P Mirror.p0           C State: up	Subdisks:     1	Size:        511 MB
     +
     +2 subdisks:
     +S Mirror.p1.s0          State: up	D: diskC        Size:        511 MB
     +S Mirror.p0.s0          State: up	D: diskB        Size:        511 MB
      
    -

    Figure 20-5 shows the structure -graphically.

    - -

    -
    -

    Figure 20-5. A Mirrored Vinum Volume

    - -

    +

    +

    Figure 20-5. A RAID-1 Vinum Volume

    -
    -
    -

    In this example, each plex contains the full 512 MB of address space. As in the -previous example, each plex contains only a single subdisk.

    -

    20.6.3 Optimizing Performance

    +

    20.8.3 RAID-0: Striped set

    -

    The mirrored volume in the previous example is more resistant to failure than an -unmirrored volume, but its performance is less: each write to the volume requires a write -to both drives, using up a greater proportion of the total disk bandwidth. Performance +

    The RAID-1 volume in the previous example is more resistant to failure than a +simple volume, but it has inferior Writing performance because each Write to the volume requires a Write +to both drives, using a greater percentage of the total disk bandwidth. Performance considerations demand a different approach: instead of mirroring, the data is striped -across as many disk drives as possible. The following configuration shows a volume with a -plex striped across four disk drives:

    +(Section 20.3.2) +across as many disk drives as possible. This configuration does not provide data protection against failure. +The following configuration shows a volume with a +plex striped across three disk drives:

    + +
     +# cat striped.conf
     +drive diskB device /dev/ad1s1h
     +drive diskC device /dev/ad2s1h
     +drive diskD device /dev/ad3s1h
     +volume Stripes
     +	plex org striped 256k
     +	sd drive diskB
     +	sd drive diskC
     +	sd drive diskD
     +
     -   drive c device /dev/da5h
     -    drive d device /dev/da6h
     -    volume stripe
     -    plex org striped 512k
     -      sd length 128m drive a
     -      sd length 128m drive b
     -      sd length 128m drive c
     -      sd length 128m drive d
     -
    - -

    As before, it is not necessary to define the drives which are already known to Vinum. -After processing this definition, the configuration looks like:

    - -
     -   Drives:         4 (4 configured)
     -    Volumes:        3 (4 configured)
     -    Plexes:         4 (8 configured)
     -    Subdisks:       7 (16 configured)
     -  
     -    D a                     State: up       Device /dev/da3h        Avail: 1421/2573 MB (55%)
     -    D b                     State: up       Device /dev/da4h        Avail: 1933/2573 MB (75%)
     -    D c                     State: up       Device /dev/da5h        Avail: 2445/2573 MB (95%)
     -    D d                     State: up       Device /dev/da6h        Avail: 2445/2573 MB (95%)
     -  
     -    V myvol                 State: up       Plexes:       1 Size:        512 MB
     -    V mirror                State: up       Plexes:       2 Size:        512 MB
     -    V striped               State: up       Plexes:       1 Size:        512 MB
     -  
     -    P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
     -    P mirror.p0           C State: up       Subdisks:     1 Size:        512 MB
     -    P mirror.p1           C State: initializing     Subdisks:     1 Size:        512 MB
     -    P striped.p1            State: up       Subdisks:     1 Size:        512 MB
     -  
     -    S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
     -    S mirror.p0.s0          State: up       PO:        0  B Size:        512 MB
     -    S mirror.p1.s0          State: empty    PO:        0  B Size:        512 MB
     -    S striped.p0.s0         State: up       PO:        0  B Size:        128 MB
     -    S striped.p0.s1         State: up       PO:      512 kB Size:        128 MB
     -    S striped.p0.s2         State: up       PO:     1024 kB Size:        128 MB
     -    S striped.p0.s3         State: up       PO:     1536 kB Size:        128 MB
     +#gvinum create striped.conf
     +3 drives:
     +D diskD                 State: up	/dev/ad3s1h	A: 0/511 MB (0%)
     +D diskC                 State: up	/dev/ad2s1h	A: 0/511 MB (0%)
     +D diskB                 State: up	/dev/ad1s1h	A: 0/511 MB (0%)
     +
     +1 volume:
     +V Stripes               State: up	Plexes:       1	Size:       1534 MB
     +
     +1 plex:
     +P Stripes.p0          S State: up	Subdisks:     3	Size:       1534 MB
     +
     +3 subdisks:
     +S Stripes.p0.s2         State: up	D: diskD        Size:        511 MB
     +S Stripes.p0.s1         State: up	D: diskC        Size:        511 MB
     +S Stripes.p0.s0         State: up	D: diskB        Size:        511 MB
      

    -

    Figure 20-6. A Striped Vinum Volume

    -

    +

    +

    Figure 20-6. A Striped Vinum Volume

    +
    + +
    + +
    +

    20.8.4 RAID-5: Striped set with distributed parity

    +

    An alternative to RAID-1 for resilience is a striped array with distributed parity; this configuration is known as RAID-5 +(Section 20.4.2). + The cost of this strategy is the space consumed by the parity data (the equivalent of one disk of the array) and slower write access, since parity must be updated on every write. At least three disks are required, and the array continues operating, in degraded mode, when one disk fails. +

    + +
     +#cat raid5.conf
     +drive diskB device /dev/ad1s1h
     +drive diskC device /dev/ad2s1h
     +drive diskD device /dev/ad3s1h
     +volume Raid5 
     +	plex org raid5 256k
     +	sd drive diskB
     +	sd drive diskC
     +	sd drive diskD
     +
    + + +
     +#gvinum create raid5.conf
     +3 drives:
     +D diskD                 State: up	/dev/ad3s1h	A: 0/511 MB (0%)
     +D diskC                 State: up	/dev/ad2s1h	A: 0/511 MB (0%)
     +D diskB                 State: up	/dev/ad1s1h	A: 0/511 MB (0%)
     +
     +1 volume:
     +V Raid5                   State: up	Plexes:       1	Size:       1023 MB
     +
     +1 plex:
     +P Raid5.p0             R5 State: up	Subdisks:     3	Size:       1023 MB
     +
     +3 subdisks:
     +S Raid5.p0.s2             State: up	D: diskD        Size:        511 MB
     +S Raid5.p0.s1             State: up	D: diskC        Size:        511 MB
     +S Raid5.p0.s0             State: up	D: diskB        Size:        511 MB
     +
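The space cost described above can be checked with a little arithmetic. The following is only a sketch in Python, not part of gvinum; the function names are made up for illustration, and the 511 MB subdisk size is the rounded figure from the listing.

```python
def striped_usable_mb(subdisk_mb, subdisks):
    """Usable size of a striped (RAID-0) plex: every subdisk holds data."""
    if subdisks < 2:
        raise ValueError("a striped plex needs at least two subdisks")
    return subdisks * subdisk_mb


def raid5_usable_mb(subdisk_mb, subdisks):
    """Usable size of a RAID-5 plex: one subdisk's worth of space holds parity."""
    if subdisks < 3:
        raise ValueError("a RAID-5 plex needs at least three subdisks")
    return (subdisks - 1) * subdisk_mb


if __name__ == "__main__":
    # Hypothetical helper names; sizes are the rounded values shown by gvinum.
    print("striped:", striped_usable_mb(511, 3), "MB")
    print("raid5:  ", raid5_usable_mb(511, 3), "MB")
```

With rounded inputs the results (1533 MB and 1022 MB) are 1 MB short of the 1534 MB and 1023 MB that gvinum reports, presumably because gvinum computes from exact sizes and rounds only for display.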
    + +
    +

    +

    Figure 20-7. A RAID-5 Vinum Volume

    -
    -
    -

    This volume is represented in Figure -20-6. The darkness of the stripes indicates the position within the plex address -space: the lightest stripes come first, the darkest last.

    -

    20.6.4 Resilience and -Performance

    +

    20.8.5 RAID 0+1

    -

    With sufficient hardware, it is +

    With sufficient hardware, it is possible to build volumes which show both increased resilience and increased performance compared to standard UNIX® partitions. A typical -configuration file might be:

    +configuration file for a RAID-0+1 +(Section 20.4.3) +might be:

    + +
     +#cat raid01.conf
     +drive diskB device /dev/da0s1h
     +drive diskC device /dev/da1s1h
     +drive diskD device /dev/da2s1h
     +drive diskE device /dev/da3s1h
     +drive diskF device /dev/da4s1h
     +drive diskG device /dev/da5s1h
     +volume RAID01
     +	plex org striped 256k
     +		sd drive diskB
     +		sd drive diskC
     +		sd drive diskD
     +	plex org striped 256k
     +		sd drive diskE
     +		sd drive diskF
     +		sd drive diskG
     +
     -   volume raid10
     -      plex org striped 512k
     -        sd length 102480k drive a
     -        sd length 102480k drive b
     -        sd length 102480k drive c
     -        sd length 102480k drive d
     -        sd length 102480k drive e
     -      plex org striped 512k
     -        sd length 102480k drive c
     -        sd length 102480k drive d
     -        sd length 102480k drive e
     -        sd length 102480k drive a
     -        sd length 102480k drive b
     +# gvinum create raid01.conf
     +6 drives:
     +D diskG                 State: up	/dev/da5s1h	A: 0/511 MB (0%)
     +D diskF                 State: up	/dev/da4s1h	A: 0/511 MB (0%)
     +D diskE                 State: up	/dev/da3s1h	A: 0/511 MB (0%)
     +D diskD                 State: up	/dev/da2s1h	A: 0/511 MB (0%)
     +D diskC                 State: up	/dev/da1s1h	A: 0/511 MB (0%)
     +D diskB                 State: up	/dev/da0s1h	A: 0/511 MB (0%)
     +
     +1 volume:
     +V RAID01               State: up	Plexes:       2	Size:       1535 MB
     +
     +2 plexes:
     +P RAID01.p1          S State: up	Subdisks:     3	Size:       1535 MB
     +P RAID01.p0          S State: up	Subdisks:     3	Size:       1535 MB
     +
     +6 subdisks:
     +S RAID01.p1.s2         State: up	D: diskG        Size:        511 MB
     +S RAID01.p1.s1         State: up	D: diskF        Size:        511 MB
     +S RAID01.p1.s0         State: up	D: diskE        Size:        511 MB
     +S RAID01.p0.s2         State: up	D: diskD        Size:        511 MB
     +S RAID01.p0.s1         State: up	D: diskC        Size:        511 MB
     +S RAID01.p0.s0         State: up	D: diskB        Size:        511 MB
      

    The subdisks of the second plex are offset by two drives from those of the first plex: this helps ensure that writes do not go to the same subdisks even if a transfer goes over two drives.

    -

    Figure 20-7 represents the -structure of this volume.

    - -

    +
    +

    +

    Figure 20-8. A RAID-0+1 Vinum Volume

    +
    -
    -

    Figure 20-7. A Mirrored, Striped Vinum Volume

    +
    -

    -
    -
    +
    +

    20.8.6 RAID 1+0

    + +

    With sufficient hardware, it is possible to build volumes which show both increased resilience and increased performance +compared to standard UNIX® partitions in more than one way. The RAID-1+0 configuration differs from RAID-0+1 in the order in which mirroring and striping are applied: the drives are first mirrored in pairs, and the mirrors are then striped. A typical configuration file for a RAID-1+0 +(Section 20.4.4) +might be:

    + +
     +#cat raid10_ph1.conf
     +drive diskB device /dev/da0s1h
     +drive diskC device /dev/da1s1h
     +drive diskD device /dev/da2s1h
     +drive diskE device /dev/da3s1h
     +drive diskF device /dev/da4s1h
     +drive diskG device /dev/da5s1h
     +volume m0
     +	plex org concat
     +		sd drive diskB
     +	plex org concat
     +		sd drive diskC
     +volume m1
     +	plex org concat
     +		sd drive diskD
     +	plex org concat
     +		sd drive diskE
     +volume m2
     +	plex org concat
     +		sd drive diskF
     +	plex org concat
     +		sd drive diskG
     +
     +#cat raid10_ph2.conf
     +drive dm0 device /dev/gvinum/m0
     +drive dm1 device /dev/gvinum/m1
     +drive dm2 device /dev/gvinum/m2
     +
     +volume RAID10
     +	plex org striped 256k
     +		sd drive dm0
     +		sd drive dm1
     +		sd drive dm2
     +
    + +
     +#gvinum create raid10_ph1.conf
     +#gvinum create raid10_ph2.conf
     +
    + +
     +# gvinum list
     +9 drives:
     +D dm2                   State: up	/dev/gvinum/sd/m2.p0.s0	A: 0/511 MB (0%)
     +D dm1                   State: up	/dev/gvinum/sd/m1.p0.s0	A: 0/511 MB (0%)
     +D dm0                   State: up	/dev/gvinum/sd/m0.p0.s0	A: 0/511 MB (0%)
     +D diskG                 State: up	/dev/da5s1h	A: 0/511 MB (0%)
     +D diskF                 State: up	/dev/da4s1h	A: 0/511 MB (0%)
     +D diskE                 State: up	/dev/da3s1h	A: 0/511 MB (0%)
     +D diskD                 State: up	/dev/da2s1h	A: 0/511 MB (0%)
     +D diskC                 State: up	/dev/da1s1h	A: 0/511 MB (0%)
     +D diskB                 State: up	/dev/da0s1h	A: 0/511 MB (0%)
     +
     +4 volumes:
     +V RAID10                State: up	Plexes:       1	Size:       1534 MB
     +V m2                    State: up	Plexes:       2	Size:        511 MB
     +V m1                    State: up	Plexes:       2	Size:        511 MB
     +V m0                    State: up	Plexes:       2	Size:        511 MB
     +
     +7 plexes:
     +P RAID10.p0           S State: up	Subdisks:     3	Size:       1534 MB
     +P m2.p1               C State: up	Subdisks:     1	Size:        511 MB
     +P m2.p0               C State: up	Subdisks:     1	Size:        511 MB
     +P m1.p1               C State: up	Subdisks:     1	Size:        511 MB
     +P m1.p0               C State: up	Subdisks:     1	Size:        511 MB
     +P m0.p1               C State: up	Subdisks:     1	Size:        511 MB
     +P m0.p0               C State: up	Subdisks:     1	Size:        511 MB
     +
     +9 subdisks:
     +S RAID10.p0.s2          State: up	D: dm2          Size:        511 MB
     +S RAID10.p0.s1          State: up	D: dm1          Size:        511 MB
     +S RAID10.p0.s0          State: up	D: dm0          Size:        511 MB
     +S m2.p1.s0              State: up	D: diskG        Size:        511 MB
     +S m2.p0.s0              State: up	D: diskF        Size:        511 MB
     +S m1.p1.s0              State: up	D: diskE        Size:        511 MB
     +S m1.p0.s0              State: up	D: diskD        Size:        511 MB
     +S m0.p1.s0              State: up	D: diskC        Size:        511 MB
     +S m0.p0.s0              State: up	D: diskB        Size:        511 MB
     +
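The practical difference between the two layouts shows up when more than one drive fails. The sketch below, in Python, enumerates drive failures for the six drives used in raid01.conf and raid10_ph1.conf. It rests on a simplified model that is an assumption of this example and not taken from gvinum: a striped plex is unusable as soon as any one of its drives fails, and a mirror survives while at least one of its drives works.

```python
from itertools import combinations

DISKS = ["diskB", "diskC", "diskD", "diskE", "diskF", "diskG"]

# RAID-0+1 as in raid01.conf: two striped plexes that mirror each other.
RAID01_PLEXES = [{"diskB", "diskC", "diskD"}, {"diskE", "diskF", "diskG"}]

# RAID-1+0 as in raid10_ph1.conf: three two-way mirrors, striped together.
RAID10_MIRRORS = [{"diskB", "diskC"}, {"diskD", "diskE"}, {"diskF", "diskG"}]


def raid01_survives(failed):
    """The volume survives while at least one striped plex is untouched."""
    return any(plex.isdisjoint(failed) for plex in RAID01_PLEXES)


def raid10_survives(failed):
    """The volume survives while no mirror has lost all of its drives."""
    return all(not mirror <= failed for mirror in RAID10_MIRRORS)


def survivable(survives, failures):
    """Return (surviving cases, total cases) for the given number of failed drives."""
    cases = [set(c) for c in combinations(DISKS, failures)]
    return sum(survives(c) for c in cases), len(cases)


if __name__ == "__main__":
    print("RAID-0+1, two failures:", survivable(raid01_survives, 2))  # (6, 15)
    print("RAID-1+0, two failures:", survivable(raid10_survives, 2))  # (12, 15)
```

Under this model both layouts survive any single drive failure, but RAID-1+0 tolerates 12 of the 15 possible two-drive failures while RAID-0+1 tolerates only 6.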
    + +
    +

    +

    Figure 20-9. A RAID-1+0 Volume

    +
    diff -r -u handbook.orig/vinum-intro.html handbook/vinum-intro.html --- handbook.orig/vinum-intro.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/vinum-intro.html 2008-04-08 14:23:40.000000000 +0200 @@ -3,12 +3,12 @@ -Disks Are Too Small +Introduction - + @@ -25,7 +25,7 @@ Prev Chapter 20 The Vinum Volume Manager -Next @@ -34,14 +34,24 @@
    -

    20.2 Disks Are Too -Small

    +

    20.2 Introduction

    + +

    +Ever since computers began to be used to store data, ways of keeping that data safe have been studied. +

    +

    +Different strategies have been developed; one of the most interesting is the Redundant Array of Inexpensive Disks (RAID). +The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987. They studied the possibility of making two or more drives appear as a single device to the host system and published the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" at the SIGMOD conference in June 1988. However, the idea of using redundant disk arrays was first patented by Norman Ken Ouchi at IBM. The patent, titled "System for recovering data stored in failed memory unit", was awarded in 1978 (U.S. patent 4,092,732). Its claims describe what would later be named RAID-5 with full stripe writes. The patent also acknowledges that disk mirroring or duplexing (RAID-1) and protection with dedicated parity (RAID-4) were prior art at the time it was filed. +

    +

    +Vinum is a volume manager: software that implements RAID-0, RAID-1 and RAID-5. Hardware RAID controllers are now very popular, and some of them perform significantly better than a comparable software RAID setup. Nevertheless, a software volume manager can provide more flexibility and can also be used in conjunction with a hardware controller. +

    +

    +Since FreeBSD 5.0, Vinum has been integrated into the GEOM framework +(Chapter 19), +which also provides an alternative way of implementing RAID-0 and RAID-1. +

    -

    Disks are getting bigger, but so are data storage requirements. Often you will find -you want a file system that is bigger than the disks you have available. Admittedly, this -problem is not as acute as it was ten years ago, but it still exists. Some systems have -solved this by creating an abstract device which stores its data on a number of -disks.

    diff -r -u handbook.orig/vinum-objects.html handbook/vinum-objects.html --- handbook.orig/vinum-objects.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/vinum-objects.html 2008-04-08 14:34:32.000000000 +0200 @@ -8,7 +8,7 @@ - + @@ -25,7 +25,7 @@ Prev Chapter 20 The Vinum Volume Manager -Next @@ -36,15 +36,14 @@

    20.5 Vinum Objects

    -

    In order to address these problems, Vinum implements a four-level hierarchy of -objects:

    +

    Vinum implements a four-level hierarchy of objects:

    • The most visible object is the virtual disk, called a volume. Volumes have essentially the same properties as a UNIX® disk drive, though there are some minor -differences. They have no size limitations.

      +differences. Their size is not limited by the size of an individual drive.

    • @@ -103,31 +102,34 @@

      20.5.3 Performance Issues

      -

      Vinum implements both concatenation and striping at the plex level:

      +

      Vinum implements Concatenation, Striping and RAID-5 at the plex level:

      • -

        A concatenated plex uses the +

        A Concatenated plex uses the address space of each subdisk in turn.

      • -

        A striped plex stripes the data +

        A Striped plex stripes the data across each subdisk. The subdisks must all have the same size, and there must be at least two subdisks in order to distinguish it from a concatenated plex.

      • +
      • +Like a striped plex, a RAID-5 plex stripes the data across each subdisk. The subdisks +must all have the same size, and there must be at least three subdisks, since with fewer subdisks mirroring would be more efficient. +
      -
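To make the difference between the first two organizations concrete, the following Python sketch maps an offset in the plex address space to a subdisk and an offset within that subdisk. It is a simplified model written for this illustration, not code taken from Vinum; the function names are hypothetical, and RAID-5 is left out because its parity placement is not described here.

```python
def locate_concat(offset, subdisk_sizes):
    """Concatenated plex: fill each subdisk in turn."""
    for index, size in enumerate(subdisk_sizes):
        if offset < size:
            return index, offset
        offset -= size
    raise ValueError("offset beyond the end of the plex")


def locate_striped(offset, stripe_size, subdisks):
    """Striped plex: consecutive stripes go to consecutive subdisks."""
    stripe_no, within = divmod(offset, stripe_size)
    row, subdisk = divmod(stripe_no, subdisks)
    return subdisk, row * stripe_size + within


if __name__ == "__main__":
    KB = 1024
    # Three subdisks, 256 kB stripes: the fourth stripe wraps back to subdisk 0.
    for off in (0, 256 * KB, 512 * KB, 768 * KB):
        print(off // KB, "kB ->", locate_striped(off, 256 * KB, 3))
```

In the striped case neighbouring stripes land on different subdisks, which is what spreads the load; in the concatenated case a subdisk is only reached once the ones before it are full.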

      20.5.4 Which Plex -Organization?

      +

      20.5.4 Which Plex Organization?

      -

      The version of Vinum supplied with FreeBSD 7.0 implements two kinds of plex:

      +

      The version of Vinum supplied with FreeBSD 7.0 implements three kinds of plex:

      • -

        Concatenated plexes are the most flexible: they can contain any number of subdisks, +

        Concatenated plexes are the most flexible: they can contain any number of subdisks, and the subdisks may be of different length. The plex may be extended by adding additional subdisks. They require less CPU time than striped plexes, though the difference in CPU overhead @@ -136,29 +138,30 @@

      • -

        The greatest advantage of striped (RAID-0) plexes +

        The greatest advantage of Striped (RAID-0) plexes is that they reduce hot spots: by choosing an optimum sized stripe (about 256 kB), you can even out the load on the component drives. The disadvantages of this approach are (fractionally) more complex code and restrictions on subdisks: they must be all the same size, and extending a plex by adding new subdisks is so complicated that Vinum currently does not implement it. Vinum imposes an additional, trivial restriction: a striped plex -must have at least two subdisks, since otherwise it is indistinguishable from a +must have at least two subdisks; otherwise it is indistinguishable from a concatenated plex.

      • +
      • +RAID-5 plexes are effectively an extension of striped plexes. Compared to striped +plexes, they offer the advantage of fault tolerance, but the disadvantages of higher +storage cost and significantly higher CPU overhead, particularly for writes. The code +is an order of magnitude more complex than for concatenated and striped plexes. Like +striped plexes, RAID-5 plexes must have equal-sized subdisks and cannot currently be +extended. Vinum enforces a minimum of three subdisks for a RAID-5 plex, since any +smaller number would not make sense. +
      -

      Table 20-1 summarizes the advantages -and disadvantages of each plex organization.

      -
      -

      Table 20-1. Vinum Plex Organizations

      +

      Table 20-1. Vinum Plex Organizations: advantages and disadvantages

      - -----+
      @@ -171,7 +174,7 @@ - + @@ -179,18 +182,76 @@ - + + + + + + + + + +
      Plex type
      concatenatedConcatenated 1 yes no
      stripedStriped 2 no yes High performance in combination with highly concurrent access
      RAID-5 3 no yes Highly reliable storage, efficient read access, data update has moderate performance
      +
      + +
      +

      20.5.5 Object Naming

      + +

      Vinum assigns default names to plexes and subdisks, although they +may be overridden. Overriding the default names is not recommended: experience with the +VERITAS volume manager, which allows arbitrary naming of objects, has shown that this +flexibility does not bring a significant advantage, and it can cause confusion.

      + +

      Names may contain any non-blank character, but it is recommended to restrict them to +letters, digits and the underscore characters. The names of volumes, plexes and subdisks +may be up to 64 characters long, and the names of drives may be up to 32 characters +long.

      + +

      Vinum objects are assigned device nodes in the hierarchy /dev/gvinum. All volumes get direct entries there too. +

      + +
        + +
      • +

        The directories /dev/gvinum/plex and /dev/gvinum/sd contain device nodes for each plex and for +each subdisk, respectively.

        +

        For each Volume created, there will be a /dev/gvinum/My-Volume-Name entry.

        +
      • +
      +
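As an illustration of the naming scheme, the Python sketch below lists the device nodes one would expect for a volume that keeps the default plex and subdisk names. The vol.pN and vol.pN.sM patterns are inferred from the gvinum listings in the examples; the function itself is hypothetical and does not query gvinum.

```python
def default_nodes(volume, plexes, subdisks_per_plex):
    """List the /dev/gvinum nodes expected for a volume with default names."""
    nodes = ["/dev/gvinum/" + volume]
    for p in range(plexes):
        plex = "%s.p%d" % (volume, p)
        nodes.append("/dev/gvinum/plex/" + plex)
        for s in range(subdisks_per_plex):
            nodes.append("/dev/gvinum/sd/%s.s%d" % (plex, s))
    return nodes


if __name__ == "__main__":
    # A mirrored volume such as m0: two plexes, one subdisk each.
    for node in default_nodes("m0", 2, 1):
        print(node)
```

For the volume m0 this yields /dev/gvinum/sd/m0.p0.s0 among others, the same node that the RAID-1+0 example uses as the device for drive dm0.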
      + +
      +

      20.5.6 Differences for FreeBSD 4.X

      + +

      Vinum objects are assigned device nodes in the hierarchy /dev/vinum. +

      +
        +
      • +

        The control devices /dev/vinum/control and /dev/vinum/controld, used by +gvinum(8) +and the Vinum daemon respectively.

        +
      • + +
      • +

        A directory /dev/vinum/drive with entries for each drive. +These entries are in fact symbolic links to the corresponding disk nodes.

        +
      • + + +
      +
      + diff -r -u handbook.orig/vinum-root.html handbook/vinum-root.html --- handbook.orig/vinum-root.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/vinum-root.html 2008-04-08 14:28:55.000000000 +0200 @@ -3,12 +3,12 @@ -Using Vinum for the Root Filesystem +Using Vinum for the Root File system - + @@ -25,7 +25,7 @@ Prev Chapter 20 The Vinum Volume Manager -Next @@ -34,110 +34,56 @@
      -

      20.9 Using Vinum for the Root -Filesystem

      +

      20.7 Using Vinum for the Root +File system

      -

      For a machine that has fully-mirrored filesystems using Vinum, it is desirable to also -mirror the root filesystem. Setting up such a configuration is less trivial than -mirroring an arbitrary filesystem because:

      +

      For a machine that has fully-mirrored file systems using Vinum, it is desirable to also +mirror the root file system. Setting up such a configuration is less trivial than +mirroring an arbitrary file system because:

      • -

        The root filesystem must be available very early during the boot process, so the Vinum +

        The root file system must be available very early during the boot process, so the Vinum infrastructure must already be available at this time.

      • -

        The volume containing the root filesystem also contains the system bootstrap and the +

        The volume containing the root file system also contains the system bootstrap and the kernel, which must be read using the host system's native utilities (e. g. the BIOS on PC-class machines) which often cannot be taught about the details of Vinum.

      In the following sections, the term “root volume” is generally used to -describe the Vinum volume that contains the root filesystem. It is probably a good idea +describe the Vinum volume that contains the root file system. It is probably a good idea to use the name "root" for this volume, but this is not technically required in any way. All command examples in the following sections assume this name though.

      -

      20.9.1 Starting up Vinum Early Enough -for the Root Filesystem

      +

      20.7.1 Starting up Vinum Early Enough +for the Root File system

      -

      There are several measures to take for this to happen:

      - -
        -
      • -

        Vinum must be available in the kernel at boot-time. Thus, the method to start Vinum -automatically described in Section -20.8.1.1 is not applicable to accomplish this task, and the start_vinum parameter must actually not be set when the following setup is being arranged. The -first option would be to compile Vinum statically into the kernel, so it is available all -the time, but this is usually not desirable. There is another option as well, to have /boot/loader (Section -12.3.3) load the vinum kernel module early, before starting the kernel. This can be -accomplished by putting the line:

        +

        Vinum must be available in the kernel at boot-time. +Add the following line to your /boot/loader.conf (Section +12.3.3) in order to load the Vinum kernel module early enough.

          geom_vinum_load="YES"
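Before rebooting it can be worth checking that the line is really in effect. The following Python sketch reads the text of a loader.conf file and reports whether geom_vinum_load is set to YES. The parsing is deliberately simple and is an assumption of this example (name="value" pairs, # comments, last assignment wins); it does not reproduce the loader's own parser, and the function names are hypothetical.

```python
def loader_variable(text, name):
    """Return the last value assigned to name in loader.conf text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip().strip('"')
    return value


def vinum_loaded_at_boot(text):
    """True if the text asks the loader to load the geom_vinum module."""
    value = loader_variable(text, "geom_vinum_load")
    return value is not None and value.upper() == "YES"


if __name__ == "__main__":
    with open("/boot/loader.conf") as conf:
        print(vinum_loaded_at_boot(conf.read()))
```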
          
        -

        into the file /boot/loader.conf.

        -
      • - -
      • -
        -
        -

        Note: For Gvinum, all -startup is done automatically once the kernel module has been loaded, so the procedure -described above is all that is needed. The following text documents the behaviour of the -historic Vinum system, for the sake of older setups.

        -
        -
        - -

        Vinum must be initialized early since it needs to supply the volume for the root -filesystem. By default, the Vinum kernel part is not looking for drives that might -contain Vinum volume information until the administrator (or one of the startup scripts) -issues a vinum start command.

        - -
        -
        -

        Note: The following paragraphs are outlining the steps needed for FreeBSD 5.X -and above. The setup required for FreeBSD 4.X differs, and is described below in Section 20.9.5.

        -
        -
        - -

        By placing the line:

        - -
         -vinum.autostart="YES"
         -
        - -

        into /boot/loader.conf, Vinum is instructed to automatically -scan all drives for Vinum information as part of the kernel startup.

        - -

        Note that it is not necessary to instruct the kernel where to look for the root -filesystem. /boot/loader looks up the name of the root device -in /etc/fstab, and passes this information on to the kernel. -When it comes to mount the root filesystem, the kernel figures out from the device name -provided which driver to ask to translate this into the internal device ID (major/minor -number).

        -
      • -
      -

      20.9.2 Making a Vinum-based Root +

      20.7.2 Making a Vinum-based Root Volume Accessible to the Bootstrap

      Since the current FreeBSD bootstrap is only 7.5 KB of code, and already has the burden -of reading files (like /boot/loader) from the UFS filesystem, +of reading files (like /boot/loader) from the UFS file system, it is sheer impossible to also teach it about internal Vinum structures so it could parse the Vinum configuration data, and figure out about the elements of a boot volume itself. Thus, some tricks are necessary to provide the bootstrap code with the illusion of a -standard "a" partition that contains the root filesystem.

      +standard "a" partition that contains the root file system.

      For this to be possible at all, the following requirements must be met for the root volume:

      @@ -153,9 +99,9 @@

    Note that it is desirable and possible that there are multiple plexes, each containing -one replica of the root filesystem. The bootstrap process will, however, only use one of +one replica of the root file system. The bootstrap process will, however, only use one of these replica for finding the bootstrap and all the files, until the kernel will -eventually mount the root filesystem itself. Each single subdisk within these plexes will +eventually mount the root file system itself. Each single subdisk within these plexes will then need its own "a" partition illusion, for the respective device to become bootable. It is not strictly needed that each of these faked "a" partitions is located at the same offset within its device, @@ -186,18 +132,18 @@

      # bsdlabel -e devname
     +class="REPLACEABLE">${devname}
      

    for each device that participates in the root volume. devname must be either the name of the disk (like ${devname} must be either the name of the disk (like da0) for disks without a slice (aka. fdisk) table, or the name of the slice (like ad0s1).

    If there is already an "a" partition on the device -(presumably, containing a pre-Vinum root filesystem), it should be renamed to something +(presumably, containing a pre-Vinum root file system), it should be renamed to something else, so it remains accessible (just in case), but will no longer be used by default to -bootstrap the system. Note that active partitions (like a root filesystem currently +bootstrap the system. Note that active partitions (like a root file system currently mounted) cannot be renamed, so this must be executed either when being booted from a “Fixit” medium, or in a two-step process, where (in a mirrored situation) the disk that has not been currently booted is being manipulated first.

    @@ -209,7 +155,7 @@ partition can be taken verbatim from the calculation above. The "fstype" should be 4.2BSD. The "fsize", "bsize", and "cpg" values should best be chosen to match the actual filesystem, +class="LITERAL">"cpg" values should best be chosen to match the actual file system, though they are fairly unimportant within this context.

    That way, a new "a" partition will be established that @@ -225,20 +171,20 @@

      # fsck -n /dev/devnamea
     +class="REPLACEABLE">${devname}a
      

    It should be remembered that all files containing control information must be relative -to the root filesystem in the Vinum volume which, when setting up a new Vinum root -volume, might not match the root filesystem that is currently active. So in particular, +to the root file system in the Vinum volume which, when setting up a new Vinum root +volume, might not match the root file system that is currently active. So in particular, the files /etc/fstab and /boot/loader.conf need to be taken care of.

    At next reboot, the bootstrap should figure out the appropriate control information -from the new Vinum-based root filesystem, and act accordingly. At the end of the kernel +from the new Vinum-based root file system, and act accordingly. At the end of the kernel initialization process, after all devices have been announced, the prominent notice that shows the success of this setup is a message like:

    @@ -248,7 +194,7 @@
    -

    20.9.3 Example of a Vinum-based Root +

    20.7.3 Example of a Vinum-based Root Setup

    After the Vinum root volume has been set up, the output of gvinum @@ -293,7 +239,7 @@ class="LITERAL">"offset" parameter is the sum of the offset within the Vinum partition "h", and the offset of this partition within the device (or slice). This is a typical setup that is necessary to avoid the problem -described in Section 20.9.4.3. It can also +described in Section 20.7.4.3. It can also be seen that the entire "a" partition is completely within the "h" partition containing all the Vinum data for this device.

    @@ -303,13 +249,13 @@

    -

    20.9.4 Troubleshooting

    +

    20.7.4 Troubleshooting

    If something goes wrong, a way is needed to recover from the situation. The following list contains few known pitfalls and solutions.

    -

    20.9.4.1 System Bootstrap Loads, but +

    20.7.4.1 System Bootstrap Loads, but System Does Not Boot

    If for any reason the system does not continue to boot, the bootstrap can be @@ -324,26 +270,26 @@

    When ready, the boot process can be continued with a boot -as. The options -as will request the kernel to ask for -the root filesystem to mount (-a), and make the boot process -stop in single-user mode (-s), where the root filesystem is +the root file system to mount (-a), and make the boot process +stop in single-user mode (-s), where the root file system is mounted read-only. That way, even if only one plex of a multi-plex volume has been mounted, no data inconsistency between plexes is being risked.

    -

    At the prompt asking for a root filesystem to mount, any device that contains a valid -root filesystem can be entered. If /etc/fstab had been set up +

    At the prompt asking for a root file system to mount, any device that contains a valid +root file system can be entered. If /etc/fstab had been set up correctly, the default should be something like ufs:/dev/gvinum/root. A typical alternate choice would be something like ufs:da0d which could be a hypothetical partition that -contains the pre-Vinum root filesystem. Care should be taken if one of the alias "a" partitions are entered here that are actually reference to the subdisks of the Vinum root device, because in a mirrored setup, this would only mount one -piece of a mirrored root device. If this filesystem is to be mounted read-write later on, +piece of a mirrored root device. If this file system is to be mounted read-write later on, it is necessary to remove the other plex(es) of the Vinum root volume since these plexes would otherwise carry inconsistent data.

    -

    20.9.4.2 Only Primary Bootstrap +

    20.7.4.2 Only Primary Bootstrap Loads

    If /boot/loader fails to load, but the primary bootstrap @@ -352,12 +298,12 @@ point, using the space key. This will make the bootstrap stop in stage two, see Section 12.3.2. An attempt can be made here to boot off an alternate partition, like the partition containing the -previous root filesystem that has been moved away from "a" +previous root file system that has been moved away from "a" above.

    -

    20.9.4.3 Nothing +

    20.7.4.3 Nothing Boots, the Bootstrap Panics

    This situation will happen if the bootstrap had been destroyed by the Vinum @@ -381,9 +327,32 @@

    -

    20.9.5 Differences for +

    20.7.5 Differences for FreeBSD 4.X

    +

    Vinum must be initialized early since it needs to supply the volume for the root +file system. By default, the Vinum kernel part is not looking for drives that might +contain Vinum volume information until the administrator (or one of the startup scripts) +issues a vinum start command.

    + +

    By placing the line:

    + +
     +vinum.autostart="YES"
     +
    + +

    into /boot/loader.conf, Vinum is instructed to automatically +scan all drives for Vinum information as part of the kernel startup.

    + +

    Note that it is not necessary to instruct the kernel where to look for the root +file system. /boot/loader looks up the name of the root device +in /etc/fstab, and passes this information on to the kernel. +When it comes to mount the root file system, the kernel figures out from the device name +provided which driver to ask to translate this into the internal device ID (major/minor +number).

    + + +

    Under FreeBSD 4.X, some internal functions required to make Vinum automatically scan all disks are missing, and the code that figures out the internal ID of the root device is not smart enough to handle a name like /dev/vinum/root @@ -402,7 +371,7 @@ listed, nor is it necessary to add each slice and/or partition explicitly, since Vinum will scan all slices and partitions of the named drives for valid Vinum headers.

    -

    Since the routines used to parse the name of the root filesystem, and derive the +

    Since the routines used to parse the name of the root file system, and derive the device ID (major/minor number) are only prepared to handle “classical” device names like /dev/ad0s1a, they cannot make any sense out of a root volume name like /dev/vinum/root. For that reason, Vinum @@ -422,7 +391,7 @@ name of the root device string being passed (that is, "vinum" in our case), it will use the pre-allocated device ID, instead of trying to figure out one itself. That way, during the usual automatic startup, it can continue to mount the Vinum -root volume for the root filesystem.

    +root volume for the root file system.

    However, when boot -a has been used to request manual entry of the name of the root device, it must be noted that this routine still cannot @@ -447,7 +416,7 @@ accesskey="P">Prev Home -Next @@ -455,7 +424,7 @@ Configuring Vinum Up -Virtualization +Vinum Examples

    diff -r -u handbook.orig/vinum-vinum.html handbook/vinum-vinum.html --- handbook.orig/vinum-vinum.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/vinum-vinum.html 2008-04-08 14:40:26.000000000 +0200 @@ -8,7 +8,7 @@ - + @@ -42,21 +42,20 @@
    20.1 Synopsis
    -
    20.2 Disks Are Too Small
    +
    20.2 Introduction
    -
    20.3 Access Bottlenecks
    +
    20.3 Disk Performance Issues
    20.4 Data Integrity
    20.5 Vinum Objects
    -
    20.6 Some Examples
    +
    20.6 Configuring Vinum
    -
    20.7 Object Naming
    +
    20.7 Using Vinum for the Root File System
    -
    20.8 Configuring Vinum
    +
    20.8 Vinum Examples
    -
    20.9 Using Vinum for the Root Filesystem
    @@ -86,7 +85,9 @@ users safeguard themselves against such issues is through the use of multiple, and sometimes redundant, disks. In addition to supporting various cards and controllers for hardware RAID systems, the base FreeBSD system includes the Vinum Volume Manager, a block -device driver that implements virtual disk drives. vinum(4) +that implements virtual disk drives. Vinum is a so-called Volume Manager, a virtual disk driver that addresses these three problems. Vinum provides more flexibility, performance, and reliability than @@ -100,12 +101,13 @@

    Note: Starting with FreeBSD 5, Vinum has been rewritten in order to fit into the GEOM architecture (Chapter 19), retaining the original ideas, -terminology, and on-disk metadata. This rewrite is called gvinum (for GEOM -vinum). The following text usually refers to gvinum(8) +(for GEOM vinum). The following text usually refers to Vinum as an abstract name, regardless of the implementation -variant. Any command invocations should now be done using the gvinum command, and the name of the kernel module has been changed +variant. Any command invocations should now be done using the +gvinum(8) +command, and the name of the kernel module has been changed from vinum.ko to geom_vinum.ko, and all device nodes reside under /dev/gvinum instead of /dev/vinum. As of FreeBSD 6, the old Vinum implementation is no @@ -132,7 +134,7 @@ UFS Journaling Through GEOM Up -Disks Are Too Small +Introduction

    --- /dev/null 2008-04-08 15:00:00.000000000 +0200 +++ handbook/vinum-disk-performance-issues.html 2008-04-08 15:09:49.000000000 +0200 @@ -0,0 +1,148 @@ + + + + +Disk Performance Issues + + + + + + + + + + + +
    +

    20.3 Disk Performance Issues

    + +

    Modern systems frequently need to access data in a highly concurrent manner. For +example, large FTP or HTTP servers can maintain thousands of concurrent sessions and have +multiple 100 Mbit/s connections to the outside world. +

    + +

    +The most critical parameter is the load that a transfer places on the subsystem, in other words the time +for which a transfer occupies a drive. +

    + +

    In any disk transfer, the drive must first position the heads, wait for the first +sector to pass under the read head, and then perform the transfer. These actions can be +considered to be atomic: it does not make any sense to interrupt them. +The data transfer time is negligible compared to the time taken for positioning the heads.
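    To put rough numbers on that claim, the sketch below splits the time for one transfer into positioning, rotational latency, and data transfer. All figures used in the example call (a 10 kB transfer, 3.5 ms average seek, 15,000 rpm, 70 MB/s) are illustrative assumptions and are not taken from this patch; the function name is hypothetical as well.

```python
def service_time_ms(size_bytes, seek_ms, rpm, rate_bytes_per_s):
    """Return (seek, rotational latency, transfer) times in milliseconds."""
    # On average the wanted sector is half a revolution away from the head.
    latency_ms = 0.5 * 60_000 / rpm
    # Time spent actually moving the data once the head is in place.
    transfer_ms = 1000 * size_bytes / rate_bytes_per_s
    return seek_ms, latency_ms, transfer_ms


# Assumed example figures, not measurements.
seek, latency, transfer = service_time_ms(10 * 1024, 3.5, 15_000, 70e6)
total = seek + latency + transfer
print(f"seek {seek:.2f} ms, latency {latency:.2f} ms, transfer {transfer:.3f} ms")
print(f"transfer share of total: {transfer / total:.1%}")
```

    Under these assumptions the transfer itself accounts for only a few percent of the time the drive is occupied, which is why the number of independent spindles matters more than the raw transfer rate.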

    + +

    The traditional and obvious solution to this bottleneck is “more +spindles”: rather than using one large disk, the system uses several smaller disks with the +same aggregate storage space. Each disk is capable of positioning and transferring +independently, so the effective throughput increases by a factor close to the number of +disks used.

    + +

    The exact throughput improvement is, of course, smaller than the number of disks +involved: although each drive is capable of transferring in parallel, there is no way to +ensure that the requests are evenly distributed across the drives. Inevitably the load on +one drive will be higher than on another.

    + +

    The evenness of the load on the disks is strongly dependent on the way the data is +shared across the drives. In the following discussion, it is convenient to think of the +disk storage as a large number of data sectors which are addressable by number, rather +like the pages in a book. +

    + +
    +

    20.3.1 Concatenation

    + +

    The most obvious method is to divide the virtual disk into +groups of consecutive sectors the size of the individual physical disks and store them in +this manner, rather like taking a large book and tearing it into smaller sections. This +method is called concatenation and +has the advantage that the disks are not required to have any specific size +relationships. It works well when the access to the virtual disk is spread evenly about +its address space. When access is concentrated on a smaller area, the improvement is less +marked. Figure 20-1 illustrates +the sequence in which storage units are allocated in a concatenated organization.
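    The mapping described above can be sketched as a lookup from a virtual sector number to a physical disk and an offset on that disk. This is only an illustration of the idea, not Vinum's implementation; the function name and the disk sizes in the example are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_map(sector, disk_sizes):
    """Map a virtual sector to (disk index, sector offset on that disk)."""
    # ends[i] is the first virtual sector that lies beyond disk i.
    ends = list(accumulate(disk_sizes))
    if not 0 <= sector < ends[-1]:
        raise ValueError("sector outside the virtual disk")
    # The first disk whose end lies past the requested sector holds it.
    disk = bisect_right(ends, sector)
    start = ends[disk] - disk_sizes[disk]
    return disk, sector - start


# Three disks of different (hypothetical) sizes, measured in sectors.
sizes = [100, 50, 200]
print(concat_map(99, sizes))
print(concat_map(100, sizes))
print(concat_map(349, sizes))
```

    Because consecutive virtual sectors stay on the same disk until that disk is exhausted, accesses concentrated in one region of the address space all land on a single drive.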

    + +

    + +
    +

    +

    Figure 20-1. Concatenated Organization

    +
    + +
    + +
    +

    20.3.2 Striping

    + +

    An alternative mapping is to divide the address space into smaller, equal-sized +components and store them sequentially on different devices. For example, the first 256 +sectors may be stored on the first disk, the next 256 sectors on the next disk and so on. +After filling the last disk, the process repeats until the disks are full. This mapping +is called striping or RAID-0. Striping requires somewhat +more effort to locate the data, and it can cause additional I/O load where a transfer is +spread over multiple disks, but it can also provide a more constant load across the +disks. Figure 20-2 illustrates +the sequence in which storage units are allocated in a striped organization.
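    The striped mapping can be sketched in the same way. The example assumes equal-sized disks and the 256-sector stripe from the text; the function name is hypothetical, and this illustrates the arithmetic rather than Vinum's actual code.

```python
def stripe_map(sector, n_disks, stripe_size=256):
    """Map a virtual sector to (disk index, sector offset on that disk)."""
    # Which stripe the sector falls in, and where inside that stripe.
    stripe, within = divmod(sector, stripe_size)
    # Stripes are dealt out to the disks in turn; a "row" is one full round.
    row, disk = divmod(stripe, n_disks)
    return disk, row * stripe_size + within


# With three disks, consecutive stripes rotate across the disks.
for s in (0, 256, 512, 768, 1000):
    print(s, stripe_map(s, 3))
```

    A transfer that crosses a stripe boundary touches two disks, which is the additional I/O load mentioned above, while accesses to neighbouring regions are spread over all drives.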

    + +

    + +
    +

    +

    Figure 20-2. Striped Organization

    +
    + +
    + + + +

    This, and other documents, can be downloaded from ftp://ftp.FreeBSD.org/pub/FreeBSD/doc/.

    + +

    For questions about FreeBSD, read the documentation before contacting <questions@FreeBSD.org>.
    +For questions about this documentation, e-mail <doc@FreeBSD.org>.

    + + + --- handbook.orig/virtualization.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/virtualization.html 2008-04-08 15:14:45.000000000 +0200 @@ -7,8 +7,8 @@ - + @@ -23,7 +23,7 @@ -Prev -Prev Home @@ -126,7 +126,7 @@ -Using Vinum for the Root Filesystem +Vinum Examples Up FreeBSD as a Guest OS --- handbook.orig/raid.html 2008-03-22 05:43:54.000000000 +0100 +++ handbook/raid.html 2008-04-08 15:43:16.000000000 +0200 @@ -93,8 +93,8 @@

    Next, consider how to attach them as part of the file system. You should research both -vinum(8) (vinum(4) (Chapter 20) and ccd(4). In this @@ -309,17 +309,18 @@

    18.4.1.2 The Vinum Volume Manager

    -

    The Vinum Volume Manager is a block device driver which implements virtual disk +

    The Vinum Volume Manager is a block device driver +vinum(4) +which implements virtual disk drives. It isolates disk hardware from the block device interface and maps data in ways which result in an increase in flexibility, performance and reliability compared to the -traditional slice view of disk storage. vinum(8) implements +traditional slice view of disk storage. Vinum implements the RAID-0, RAID-1 and RAID-5 models, both individually and in combination.

    -

    See Chapter 20 for more information about vinum(8).

    +

    See Chapter 20 for more information about the most recent Vinum implementation, gvinum(8), under the GEOM architecture +Chapter 19 +.

    --0-968980677-1207663836=:26150--