From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 11:07:04 2007
Date: Mon, 29 Oct 2007 11:07:02 GMT
From: FreeBSD bugmaster <owner-bugmaster@FreeBSD.org>
To: freebsd-geom@FreeBSD.org
Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org
Message-Id: <200710291107.l9TB72Dg090097@freefall.freebsd.org>
List-Id: GEOM-specific discussions and implementations

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/73177   geom       kldload geom_* causes panic due to memory exhaustion
o kern/76538   geom       [gbde] nfs-write on gbde partition stalls and continue
o kern/83464   geom       [geom] [patch] Unhandled malloc failures within libgeo
o kern/84556   geom       [geom] GBDE-encrypted swap causes panic at shutdown
o kern/87544   geom       [gbde] mmaping large files on a gbde filesystem deadlo
o kern/89102   geom       [geom_vfs] [panic] panic when forced unmount FS from u
o bin/90093    geom       fdisk(8) incapable of altering in-core geometry
o kern/90582   geom       [geom_mirror] [panic] Restore cause panic string (ffs_
o kern/98034   geom       [geom] dereference of NULL pointer in acd_geom_detach
o kern/104389  geom       [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/113419  geom       [geom] geom fox multipathing not failing back
o misc/113543  geom       [geom] [patch] geom(8) utilities don't work inside the
o kern/113957  geom       [gmirror] gmirror is intermittently reporting a degrad
o kern/115572  geom       [gbde] gbde partitions fail at 28bit/48bit LBA address

14 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/78131    geom       gbde "destroy" not working.
o kern/79251   geom       [2TB] newfs fails on 2.6TB gbde device
o kern/94632   geom       [geom] Kernel output resets input while GELI asks for
f kern/105390  geom       [geli] filesystem on a md backed by sparse file with s
o kern/107707  geom       [geom] [patch] add new class geom_xbox360 to slice up
p bin/110705   geom       gmirror control utility does not exit with correct exi
o kern/113837  geom       [geom] unable to access 1024 sector size storage
o kern/113885  geom       [geom] [patch] improved gmirror balance algorithm
o kern/114532  geom       GEOM_MIRROR shows up in kldstat even if compiled in th
o kern/115547  geom       [geom] [patch] for GEOM Eli to get password from stdin

10 problems total.
From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 16:08:37 2007
Date: Mon, 29 Oct 2007 13:59:44 -0200
From: Felipe Neuwald <felipe@neuwald.biz>
To: freebsd-geom@freebsd.org
Subject: Raid 0 + 1
Message-ID: <47260370.3060004@neuwald.biz>

Hi Folks,

I talked with my customer, and we decided to implement a 0 + 1 RAID,
with 4 disks of 250 GB each.
Here is how my RAID is working now:

[root@fileserver /]# gvinum list
4 drives:
D a                     State: up       /dev/ad4        A: 0/238474 MB (0%)
D b                     State: up       /dev/ad5        A: 0/238475 MB (0%)
D c                     State: up       /dev/ad6        A: 0/238475 MB (0%)
D d                     State: up       /dev/ad7        A: 0/238475 MB (0%)

1 volume:
V data                  State: up       Plexes:       1 Size:        931 GB

1 plex:
P data.p0             S State: up       Subdisks:     4 Size:        931 GB

4 subdisks:
S data.p0.s0            State: up       D: a            Size:        232 GB
S data.p0.s1            State: up       D: b            Size:        232 GB
S data.p0.s2            State: up       D: c            Size:        232 GB
S data.p0.s3            State: up       D: d            Size:        232 GB

Could someone give me an example of how to implement a 0 + 1 RAID?

Thank you very much,
Felipe Neuwald.
From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 18:14:25 2007
Date: Mon, 29 Oct 2007 16:05:37 -0200
From: Felipe Neuwald <felipe@neuwald.biz>
To: Kevin Thompson
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <472620F1.90807@neuwald.biz>
In-Reply-To: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu>

Hi Kevin,

A bootable 0+1 RAID isn't what I need, since I have a separate small disk
for the system files, but I'll try to use your documentation to build my
0+1 RAID.
I have a blog where I can post this documentation; if that's OK with you,
please take a look at ontheroadbrother.blogspot.com .

Cheers,
Felipe Neuwald

Kevin Thompson wrote:
>> Hi Folks,
>>
>> I talked with my customer, and we decided to implement a 0 + 1 RAID,
>> with 4 disks of 250 GB each.
>>
>> Here is how my RAID is working now:
>
> I have some rough instructions I wrote up about doing a bootable 0+1
> GEOM RAID, in wiki format. I don't have a public place to post them, so
> I've just copied them below. Feedback is welcome.
>
> [full instructions snipped; they appear in Kevin's original message below]

From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 20:28:31 2007
Date: Mon, 29 Oct 2007 13:23:15 -0400 (EDT)
From: "Kevin Thompson" <antiduh@csh.rit.edu>
To: "Felipe Neuwald"
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu>

> Hi Folks,
>
> I talked with my customer, and we decided to implement a 0 + 1 RAID,
> with 4 disks of 250 GB each.
>
> Here is how my RAID is working now:

I have some rough instructions I wrote up about doing a bootable 0+1 GEOM
RAID, in wiki format. I don't have a public place to post them, so I've
just copied them below. Feedback is welcome.

Once you get it set up, I would highly recommend that you experiment with
the setup before relying on it - make sure you know how to handle failed
disk replacements, etc. VMware/camcontrol/atacontrol is very handy in this
regard.

--Kevin Thompson


==Introduction==

This article describes one method to build a RAID 0+1 array with 4 logical
disks using the FreeBSD GEOM framework, and being able to boot off of such
an array.
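As a rough mental model of what we're building - a mirror of two 2-disk
stripes - here's a toy sketch of the block arithmetic. This is purely
illustrative (my own hand-rolled math, not how GEOM is implemented, and
the chunk size is made tiny on purpose):

```shell
# Toy model of the 0+1 layout: stripe st0 = disks 0+1, stripe st1 =
# disks 2+3, and the mirror keeps both stripes identical.
stripe_blocks=4   # blocks per stripe chunk (tiny, for illustration)

# Print the two disks that hold a copy of logical block $1 -
# one disk in each stripe half.
copies_for() {
    chunk=$(( $1 / stripe_blocks ))
    d=$(( chunk % 2 ))         # chunks alternate between stripe members
    echo "$d $(( d + 2 ))"     # same position in the mirrored stripe
}

copies_for 0    # -> "0 2"
copies_for 4    # -> "1 3": the next chunk lands on the other disk
```

Note that any single disk holds only every other chunk of the data - which
is exactly why a filesystem on a stripe looks corrupt when read from just
one member, as discussed below.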
Building a GEOM mirror and being able to directly boot off of it is a
relatively simple task. This is because when mirroring is done, all
FreeBSD slices maintain their disk-level structure - GEOM stores its
metadata at the end of the disk/partition/etc. Most importantly, in such a
configuration the MBR and the boot file system are untouched.

The latter is not the case in RAID 0, RAID 0+1, or RAID 1+0 setups. In
striping configurations, the boot filesystem is interleaved across the
various disks (usually two), and as such, the filesystem appears to be
corrupt if read from only one disk. Since the MBR program on the boot disk
is incapable of understanding this physical block layout, it is unable to
find the kernel in order to start FreeBSD. However, should the MBR program
be able to read the boot filesystem, it can then load the kernel and
related kernel modules, which would then allow the system to 'boot' from
the raidset.

Technically speaking, the only file system that has to remain untouched is
the /boot file system. root (/), /usr, /var, /tmp, et cetera may all then
be mounted from a raidset.

In this example, for simplicity's sake, I protect all of the typical root
file system, not just /boot. A more enterprising user may want to modify
their approach.

==Instructions==

Boot with a FreeBSD install disk, then start the fixit console. Next
you'll need to clean up the environment (the fixit console needs a little
updating):

  ln -s /dist/boot/kernel /boot/kernel
  ln -s /dist/lib /lib
  EDITOR=/mnt2/usr/bin/vi; export EDITOR
  PATH=$PATH':'/mnt2/sbin':'/mnt2/usr/bin':'/mnt2/usr/sbin
  export PATH

Now load the kernel modules for the GEOM classes we're going to be using:

  glabel load
  gstripe load
  gmirror load

Next, we're going to label each of the drives. The 'da0' drive name is
typically assigned by the BIOS on boot, by way of some sort of metric such
as chain location.
The geom label module writes a fixed label to each drive, so that no
matter where each drive goes, or if new devices are inserted and the
numbering is reordered, the raid set will always work the same:

  glabel label geom0 da0
  glabel label geom1 da1
  glabel label geom2 da2
  glabel label geom3 da3

Now install an MBR and basic partitioning on each drive. This will create
a single partition taking the entire drive, for each drive:

  fdisk -vBI /dev/label/geom0
  fdisk -vBI /dev/label/geom1
  fdisk -vBI /dev/label/geom2
  fdisk -vBI /dev/label/geom3

On the first disk slice of the first drive, install a simple disk label
and bootstrap code:

  bsdlabel -wB /dev/label/geom0s1

Now edit the generic label on that disk, setting it as you please. 'a' is
commonly root, 'b' the swap partition and 'd' the rest. Don't create any
partitions other than a, b and d (d will be used as the provider for a
future geom consumer):

  slave(/u9/antiduh) # bsdlabel -e /dev/label/geom0s1
  #    size offset fstype [fsize bsize bps/cpg]
  a:   500M     16 4.2BSD
  b:   500M      * swap
  c:                       # leave as is
  d:      *      * 4.2BSD

The '*' for size means use whatever is left, and the '*' for offset means
use the next logical offset.

Once you're finished labeling the first drive, write the label for the
drive out to a file, then use it to initialize the other three disks:

  bsdlabel /dev/label/geom0s1 > /file
  bsdlabel -R /dev/label/geom1s1 /file
  bsdlabel -R /dev/label/geom2s1 /file
  bsdlabel -R /dev/label/geom3s1 /file

Now, create a GEOM mirror out of the 'a' partition of each drive. This
will eventually be the root partition. The new device will be called
'''boot''', and will enumerate in FreeBSD as '''/dev/mirror/boot''':

  gmirror label -vh boot /dev/label/geom0s1a /dev/label/geom1s1a \
      /dev/label/geom2s1a /dev/label/geom3s1a

Now we're going to pull together the d partition on each drive, create
pairwise stripes, then mirror the new stripes together.
  gstripe label -vh -s 131072 st0 /dev/label/geom0s1d /dev/label/geom1s1d
  gstripe label -vh -s 131072 st1 /dev/label/geom2s1d /dev/label/geom3s1d

We should now have two new devices '''/dev/stripe/st0''' and
'''/dev/stripe/st1'''. Now mirror those two devices to create our final
device that will next be used for the rest of our filesystems:

  gmirror label -vh gm0 /dev/stripe/st0 /dev/stripe/st1

We now have our final device '''/dev/mirror/gm0''' that is going to serve
as the base for our regular filesystems.

Create a basic label on the new disk, then edit to taste. Note that we
don't fdisk the raw gm0 device - we create partitions directly on the raw
device:

  slave(/u9/antiduh) # bsdlabel -wB /dev/mirror/gm0
  slave(/u9/antiduh) # bsdlabel -e /dev/mirror/gm0
  #    size offset fstype [fsize bsize bps/cpg]
  a:      1     16 unused  # a bare '1', not '1M' - works around a bug
                           # in bsdlabel
  c:                       # leave as is
  e:  1000M      * 4.2BSD  # /var
  f:   500M      * 4.2BSD  # /tmp
  d:      *      * 4.2BSD  # /usr, which just gets the rest

Now create our filesystems on the associated partitions (the -U option
enables soft updates):

  newfs /dev/mirror/boot
  newfs -U /dev/mirror/gm0d
  newfs -U /dev/mirror/gm0e
  newfs -U /dev/mirror/gm0f

Now we finally get down to mounting the disks, installing the OS,
setting it up to boot, and finally booting.

Mount the root disk as /mnt:

  mount /dev/mirror/boot /mnt

Create mount points for our other file systems, then mount them:

  mkdir /mnt/usr
  mkdir /mnt/var
  mkdir /mnt/tmp

  mount /dev/mirror/gm0d /mnt/usr
  mount /dev/mirror/gm0e /mnt/var
  mount /dev/mirror/gm0f /mnt/tmp

Now we're going to do the install the fun way:

  DESTDIR=/mnt
  export DESTDIR
  cd /dist/6.2-RELEASE/base; ./install.sh
  cd ../ports; ./install.sh
  cd ../manpages; ./install.sh
  cd ../kernels; ./install.sh GENERIC
  mv /mnt/boot/GENERIC/* /mnt/boot/kernel

You now have a basic, blank, sorta bootable, unconfigured FreeBSD install
on the machine.
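Incidentally, the repetitive per-drive steps earlier (glabel, fdisk, and
the bsdlabel -R cloning) lend themselves to a small loop. A sketch that
only echoes the commands, so you can review them before piping to sh:

```shell
# Dry run: print the per-drive setup commands instead of typing them
# four times. Pipe the output to sh once you've eyeballed it.
for i in 0 1 2 3; do
    echo "glabel label geom$i da$i"
    echo "fdisk -vBI /dev/label/geom$i"
done
# The label-cloning step only applies to drives 1-3:
for i in 1 2 3; do
    echo "bsdlabel -R /dev/label/geom${i}s1 /file"
done
```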
Set up the boot loader to load the needed kernel modules, so that we can
mount root from the mirror:

  echo 'geom_label_load="YES"' >> /mnt/boot/loader.conf
  echo 'geom_stripe_load="YES"' >> /mnt/boot/loader.conf
  echo 'geom_mirror_load="YES"' >> /mnt/boot/loader.conf

Set a kernel option that uses more memory but makes the stripe layer
faster; otherwise stripes are unbearably slow:

  echo 'kern.geom.stripe.fast=1' >> /mnt/boot/loader.conf

Now set up our initial fstab file:

  echo '/dev/label/geom0s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom1s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom2s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom3s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/mirror/boot /    ufs rw 1 1' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0d /usr ufs rw 2 2' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0e /var ufs rw 2 2' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0f /tmp ufs rw 2 2' >> /mnt/etc/fstab

Reboot, and the machine should start the raid sets and mount from the raid
array automatically. Keep that install CD handy in case you messed up.

Log in, change the root password, set up rc.conf (hostname, interfaces,
ssh, linux binary compat...), start installing stuff, etc.

Enjoy.
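One extra safety check before that reboot: a typo'd device in fstab means
a trip back to the fixit CD. A quick cross-check sketch - it rebuilds a
scratch copy here for illustration, but on the real system you'd point it
at /mnt/etc/fstab:

```shell
# Sanity-check the new fstab: every device should be either one of the
# four label swap partitions or one of our mirror devices.
fstab=/tmp/fstab.check
{
    for i in 0 1 2 3; do echo "/dev/label/geom${i}s1b none swap sw 0 0"; done
    echo '/dev/mirror/boot / ufs rw 1 1'
    echo '/dev/mirror/gm0d /usr ufs rw 2 2'
    echo '/dev/mirror/gm0e /var ufs rw 2 2'
    echo '/dev/mirror/gm0f /tmp ufs rw 2 2'
} > "$fstab"

# Fail if any line names a device we never created:
! grep -qv -e '^/dev/label/geom[0-3]s1b ' -e '^/dev/mirror/' "$fstab"
grep -c ' swap ' "$fstab"    # -> 4
```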
==Block Diagram==

  da0 --label--> geom0
  da1 --label--> geom1
  da2 --label--> geom2
  da3 --label--> geom3

  label/geom0s1a --|--mirror--> /dev/mirror/boot
  label/geom1s1a --|
  label/geom2s1a --|
  label/geom3s1a --|

  label/geom0s1d --|--stripe--> /dev/stripe/st0 --|--mirror--> /dev/mirror/gm0
  label/geom1s1d --|                              |
                                                  |
  label/geom2s1d --|--stripe--> /dev/stripe/st1 --|
  label/geom3s1d --|

  mirror/boot --> /

  mirror/gm0d --> /usr
  mirror/gm0e --> /var
  mirror/gm0f --> /tmp

  label/geom0s1b --> swap
  label/geom1s1b --> swap
  label/geom2s1b --> swap
  label/geom3s1b --> swap

From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 21:06:35 2007
Date: Mon, 29 Oct 2007 22:06:29 +0100
From: Ulf Lilleengen <lulf@stud.ntnu.no>
To: Felipe Neuwald
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <20071029210629.GA26364@carrot.studby.ntnu.no>
In-Reply-To: <47260370.3060004@neuwald.biz>

On Mon, Oct 29, 2007 at 01:59:44PM -0200, Felipe Neuwald wrote:
> Hi Folks,
>
> I talked with my customer, and we decided to implement a 0 + 1 RAID,
> with 4 disks of 250 GB each.
>
> [gvinum list output snipped]
>
> Could someone give me an example of how to implement a 0 + 1 RAID?

Hello,

You need to create two plexes to do this, so something like:

  drive a device /dev/ad4
  drive b device /dev/ad5
  drive c device /dev/ad6
  drive d device /dev/ad7
  volume data
    plex org striped 493k  # or some other stripesize
      sd drive a
      sd drive b
    plex org striped 493k
      sd drive c
      sd drive d

should do the trick. Also note that you can use a combination of
gstripe(8) and gmirror(8) to achieve the same effect.
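One thing worth flagging to the customer: with two mirrored plexes, the
usable capacity is half of the current striped volume. A back-of-the-
envelope sketch (my own arithmetic, using the 232 GB subdisk sizes from
the gvinum output above):

```shell
# Capacity estimate for the proposed RAID 0+1: two striped plexes that
# mirror each other. Sizes in GB, from 'gvinum list'.
a=232; b=232; c=232; d=232

plex0=$(( a + b ))      # striped plex over drives a+b
plex1=$(( c + d ))      # striped plex over drives c+d

# Mirrored plexes hold identical copies, so the volume is as big as the
# smaller plex:
usable=$(( plex0 < plex1 ? plex0 : plex1 ))
echo "usable: ${usable} GB"    # -> "usable: 464 GB", vs ~931 GB striped now
```

In exchange for the halved capacity, the volume keeps running after any
single-drive failure, since the surviving plex still has a complete copy.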
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 01:38:38 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D99D216A469 for ; Tue, 30 Oct 2007 01:38:38 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id 9982613C4B3 for ; Tue, 30 Oct 2007 01:38:38 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id 74FAC34876; Mon, 29 Oct 2007 23:13:09 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Mon, 29 Oct 2007 23:13:09 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id AF1046241BA; Mon, 29 Oct 2007 23:13:23 +0100 (CET) Date: Mon, 29 Oct 2007 23:13:23 +0100 From: Ulf Lilleengen To: Felipe Neuwald Message-ID: <20071029221323.GA28014@stud.ntnu.no> References: <47260370.3060004@neuwald.biz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47260370.3060004@neuwald.biz> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. Cc: freebsd-geom@freebsd.org Subject: Re: Raid 0 + 1 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 01:38:38 -0000 On man, okt 29, 2007 at 01:59:44 -0200, Felipe Neuwald wrote: > Hi Folks, > > I talked with my customer, and we decided to implement a 0 + 1 RAID, > with 4 disks of 250Gb each. 
> > Here is how my RAID is working now: > > [root@fileserver /]# gvinum list > 4 drives: > D a State: up /dev/ad4 A: 0/238474 MB (0%) > D b State: up /dev/ad5 A: 0/238475 MB (0%) > D c State: up /dev/ad6 A: 0/238475 MB (0%) > D d State: up /dev/ad7 A: 0/238475 MB (0%) > > 1 volume: > V data State: up Plexes: 1 Size: 931 GB > > 1 plex: > P data.p0 S State: up Subdisks: 4 Size: 931 GB > > 4 subdisks: > S data.p0.s0 State: up D: a Size: 232 GB > S data.p0.s1 State: up D: b Size: 232 GB > S data.p0.s2 State: up D: c Size: 232 GB > S data.p0.s3 State: up D: d Size: 232 GB > > > Could someone give me on example of how I'll implement a 0 + 1 RAID? > Hello, You need to create two plexes to do this, so something like: drive a device /dev/ad4 drive b device /dev/ad5 drive c device /dev/ad6 drive d device /dev/ad7 volume data plex org striped 493k # or some other stripesize sd drive a sd drive b plex org striped 493k sd drive c sd drive d Should do the trick. Also note that you can use a combination of gstripe(8) and gmirror(8) to achieve the same effect. 
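[As a sketch of the gstripe(8)/gmirror(8) alternative mentioned above: these example commands are untested and the device names are simply taken from the drive listing in this thread. Note that mirroring the disks in pairs and then striping across the mirrors gives RAID 1+0, the inverse nesting of the two-striped-plexes gvinum setup.]

```
# Mirror the four disks in pairs (device names assumed from the listing above)
gmirror label -v gm0 /dev/ad4 /dev/ad5
gmirror label -v gm1 /dev/ad6 /dev/ad7
# Stripe across the two mirror providers
gstripe label -v st0 /dev/mirror/gm0 /dev/mirror/gm1
# Put a filesystem on the resulting device
newfs -U /dev/stripe/st0
```

Check the gmirror(8) and gstripe(8) man pages before running anything like this on disks that hold data.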
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 04:34:49 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C235F16A419 for ; Tue, 30 Oct 2007 04:34:49 +0000 (UTC) (envelope-from tom.hurst@clara.net) Received: from spork.qfe3.net (spork.qfe3.net [212.13.207.101]) by mx1.freebsd.org (Postfix) with ESMTP id 84D7313C4AA for ; Tue, 30 Oct 2007 04:34:49 +0000 (UTC) (envelope-from tom.hurst@clara.net) Received: from [81.104.144.87] (helo=voi.aagh.net) by spork.qfe3.net with esmtp (Exim 4.66 (FreeBSD)) (envelope-from ) id 1ImiQJ-000OUJ-31; Tue, 30 Oct 2007 04:09:51 +0000 Received: from freaky by voi.aagh.net with local (Exim 4.68 (FreeBSD)) (envelope-from ) id 1ImiQI-000KCr-W2; Tue, 30 Oct 2007 04:09:51 +0000 Date: Tue, 30 Oct 2007 04:09:50 +0000 From: Thomas Hurst To: Kevin Thompson Message-ID: <20071030040950.GA76585@voi.aagh.net> Mail-Followup-To: Kevin Thompson , Felipe Neuwald , freebsd-geom@freebsd.org References: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu> Organization: Not much. User-Agent: Mutt/1.5.16 (2007-06-09) Sender: Thomas Hurst Cc: Felipe Neuwald , freebsd-geom@freebsd.org Subject: Re: Raid 0 + 1 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 04:34:49 -0000 * Kevin Thompson (antiduh@csh.rit.edu) wrote: > Now we're going to pull together the d partion on each drive, create > pairwise stripes, then mirror the new stripes together. 
> gstripe label -vh -s 131072 st0 /dev/label/geom0s1d /dev/geom1s1d > gstripe label -vh -s 131072 st1 /dev/label/geom2s1d /dev/geom3s1d > > We should now have two new devices '''/dev/stripe/st0''' and > '''/dev/stripe/st1'''. Now mirror those two devices to create our final > device that will next be used for the rest of our filesystems: > gmirror label -vh gm0 /dev/stripe/st0 /dev/stripe/st1 Er, shouldn't you be doing this the other way around? Make two mirrors, then stripe across them. IO performance should be identical in the normal case, degrade less with a single disk failure (since only one disk drops out of the array instead of an entire pair), and it'll be more likely to survive a two disk failure. -- Thomas 'Freaky' Hurst http://hur.st/ From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 05:30:36 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C54C116A419; Tue, 30 Oct 2007 05:30:36 +0000 (UTC) (envelope-from outi@bytephobia.de) Received: from dd18312.kasserver.com (dd18312.kasserver.com [85.13.138.194]) by mx1.freebsd.org (Postfix) with ESMTP id 57A9B13C4A6; Tue, 30 Oct 2007 05:30:36 +0000 (UTC) (envelope-from outi@bytephobia.de) Received: from mobility.bytephobia.de (pD9E35322.dip.t-dialin.net [217.227.83.34]) by dd18312.kasserver.com (Postfix) with ESMTP id 674491936B05A; Tue, 30 Oct 2007 00:12:38 +0100 (CET) Date: Tue, 30 Oct 2007 00:12:28 +0100 From: Patrick Hurrelmann To: freebsd-geom@freebsd.org, freebsd-questions@freebsd.org Message-ID: <20071030001228.65816a87@mobility.bytephobia.de> Organization: private X-Mailer: Claws Mail 3.0.2 (GTK+ 2.10.14; i386-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Subject: Fw: Best way for a gmirrored gjournal? 
X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 05:30:36 -0000 Dear all, I'm forwarding this message to these lists, as current@ obviously was the wrong recipient. I kindly ask you for your ideas and proposals on my questions below. Regards, Patrick Begin forwarded message: Date: Mon, 22 Oct 2007 19:08:35 +0200 From: Patrick Hurrelmann To: freebsd-current@freebsd.org Subject: Best way for a gmirrored gjournal? Hi all, Currently I'm trying to install a new server and need some hints on how to best configure filesystems using gmirror and gjournal. The server in question is an amd64 with 512mb of ram and 2x 80gb sata hdds. So I was thinking of a mount-point layout like the following: ad0s1 / (1gb) swap (1gb) /var (8gb) /tmp (1gb) /home (4gb) /usr (13gb) /jails (39gb) ad0s2 10gb for journaling Which would leave a space of 10gb for journaling. I dug through the mailing-list archives and man-pages of gmirror and gjournal but all I ended up with are questions and doubts :) Now I wanted to create 2 mirrors (gm0s1 and gm0s2). Gmirror gm0s1 containing the slices ad0s1 and ad2s1, while gm0s2 should contain ad0s2 and ad2s2. I created 2 slices, as with the above shown partitioning I was running out of mount-points for this slice. Is such a layout reasonable? Or is it stupid to use a dedicated slice just for journaling, and better to skip e.g. the /tmp partition to leave space for a dedicated journaling partition on this slice? Btw. are 10gb enough for journaling of 6 partitions? Or do I need one dedicated partition for journaling each? If I skip using a separate partition for journaling data, gjournal keeps telling me that e.g. the root partition of 1gb is too small for journaling. Would it be safe to decrease the journal size although the man-page discourages it? What do you people out there suggest?
How do you handle systems with gmirror and gjournal combined? Or even use ZFS although ram is limited (as the machine will serve up several jails with e.g. postgres)? I'm really looking forward to suggestions from you. I intentionally directed this mail to current@ as I think that here are the most people around with experience on gjournal. But if I better should direct this mail to questions@ I'm happy to do so, too. Best regards, Patrick -- ==================================================================== Patrick Hurrelmann | "Programming today is a race between software Mannheim, Germany | engineers striving to build bigger and better | idiot-proof programs, and the Universe trying outi@bytephobia.de | to produce bigger and better idiots. So far, www.bytephobia.de | the Universe is winning." - Rich Cook /"\ \ / ASCII Ribbon Campaign X against HTML email & vCards / \ From owner-freebsd-geom@FreeBSD.ORG Wed Oct 31 19:28:11 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CD4716A421 for ; Wed, 31 Oct 2007 19:28:11 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.244]) by mx1.freebsd.org (Postfix) with ESMTP id 1532413C494 for ; Wed, 31 Oct 2007 19:28:10 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: by an-out-0708.google.com with SMTP id c24so39571ana for ; Wed, 31 Oct 2007 12:27:46 -0700 (PDT) Received: by 10.142.162.5 with SMTP id k5mr2081859wfe.1193843658279; Wed, 31 Oct 2007 08:14:18 -0700 (PDT) Received: by 10.142.155.19 with HTTP; Wed, 31 Oct 2007 08:14:18 -0700 (PDT) Message-ID: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> Date: Wed, 31 Oct 2007 12:14:18 -0300 From: "Marco Haddad" To: freebsd-geom@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit 
Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Oct 2007 19:28:11 -0000 Hello, I have been using gvinum to build raid5 volumes since vinum's retirement, with some success. Most difficulties are related to not yet implemented commands, but the hope for a more complete version keeps me going. I found in recent research that a lot of people say gvinum should not be trusted when it comes to raid5. I began to get worried. Am I alone using gvinum raid5? Did everyone abandon it? What about the development guys? Is there anyone still working on it? Will a complete gvinum ever be released? Hoping for good news, Marco Haddad From owner-freebsd-geom@FreeBSD.ORG Wed Oct 31 21:58:10 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DBDD16A421 for ; Wed, 31 Oct 2007 21:58:10 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id EAE2A13C49D for ; Wed, 31 Oct 2007 21:58:09 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id E97B63387D; Wed, 31 Oct 2007 22:57:40 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Wed, 31 Oct 2007 22:57:40 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 40A48624219; Wed, 31 Oct 2007 22:57:56 +0100 (CET) Date: Wed, 31 Oct 2007 22:57:56 +0100 From: Ulf Lilleengen To: Marco Haddad Message-ID: <20071031215756.GB1670@stud.ntnu.no>
References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. Cc: freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Oct 2007 21:58:10 -0000 On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: > Hello, > > I have been using gvinum to build raid5 volumes since vinum's retirement, with > some success. Most difficulties are related to not yet implemented commands, > but the hope for a more complete version keeps me going. > > I found in recent research that a lot of people say gvinum should not be > trusted when it comes to raid5. I began to get worried. Am I alone using > gvinum raid5? Did everyone abandon it? What about the development guys? Is > there anyone still working on it? Will a complete gvinum ever be released? > I'm working on it, and there are definitely people still using it. (I've received a number of private mails as well as those seen on this list.) IMO, gvinum can be trusted when it comes to raid5.
I hope to backport gvinum to both RELENG_7 and RELENG_6 when the time is right, but as I said, you'd have to be patient. -- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Thu Nov 1 09:32:18 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3041D16A46B; Thu, 1 Nov 2007 09:32:18 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from falcon.cybervisiontech.com (falcon.cybervisiontech.com [217.20.163.9]) by mx1.freebsd.org (Postfix) with ESMTP id DD91B13C4A6; Thu, 1 Nov 2007 09:32:17 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from localhost (localhost [127.0.0.1]) by falcon.cybervisiontech.com (Postfix) with ESMTP id 5F2A8744007; Thu, 1 Nov 2007 11:31:44 +0200 (EET) X-Virus-Scanned: Debian amavisd-new at falcon.cybervisiontech.com Received: from falcon.cybervisiontech.com ([127.0.0.1]) by localhost (falcon.cybervisiontech.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UzRX8orDLg2d; Thu, 1 Nov 2007 11:31:44 +0200 (EET) Received: from [10.2.1.87] (gateway.cybervisiontech.com.ua [88.81.251.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by falcon.cybervisiontech.com (Postfix) with ESMTP id 07A9C744005; Thu, 1 Nov 2007 11:31:43 +0200 (EET) Message-ID: <47299CFE.6010309@icyb.net.ua> Date: Thu, 01 Nov 2007 11:31:42 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.6 (X11/20070803) MIME-Version: 1.0 To: freebsd-geom@freebsd.org, Pawel Jakub Dawidek Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: gjournal for 6.X and fsck X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Nov 2007 09:32:18 -0000 It seems that gjournal patch for 6.X doesn't include the 
changes to make fsck aware of gjournal that were added to CURRENT code. Is it possible to produce a patch with such changes? I would appreciate even a patch as it was applied to current; I think it will be very easy to massage it for 6.X. It's just that, unlike svn (or many other source control systems), it is very hard to extract a "change set" to multiple files from CVS. Thank you! P.S. Just thought of another nice-to-have thing: it seems that the 6.X patch has the earlier approach with the .deleted directory, which was later improved. It would be nice to get a patch for this as well. And now I think that the two things are probably quite related. -- Andriy Gapon From owner-freebsd-geom@FreeBSD.ORG Thu Nov 1 09:59:32 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1ABD716A41A; Thu, 1 Nov 2007 09:59:32 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from falcon.cybervisiontech.com (falcon.cybervisiontech.com [217.20.163.9]) by mx1.freebsd.org (Postfix) with ESMTP id CA28D13C4A8; Thu, 1 Nov 2007 09:59:31 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from localhost (localhost [127.0.0.1]) by falcon.cybervisiontech.com (Postfix) with ESMTP id 6E59E744005; Thu, 1 Nov 2007 11:59:09 +0200 (EET) X-Virus-Scanned: Debian amavisd-new at falcon.cybervisiontech.com Received: from falcon.cybervisiontech.com ([127.0.0.1]) by localhost (falcon.cybervisiontech.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fc5kcZj0BemT; Thu, 1 Nov 2007 11:59:09 +0200 (EET) Received: from [10.2.1.87] (gateway.cybervisiontech.com.ua [88.81.251.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by falcon.cybervisiontech.com (Postfix) with ESMTP id 0A99043D47F; Thu, 1 Nov 2007 11:59:08 +0200 (EET) Message-ID: <4729A36C.50506@icyb.net.ua> Date: Thu, 01 Nov 2007 11:59:08 +0200 From: Andriy Gapon User-Agent:
Thunderbird 2.0.0.6 (X11/20070803) MIME-Version: 1.0 To: freebsd-geom@freebsd.org, Pawel Jakub Dawidek References: <47299CFE.6010309@icyb.net.ua> In-Reply-To: <47299CFE.6010309@icyb.net.ua> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: Re: gjournal for 6.X and fsck X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Nov 2007 09:59:32 -0000 on 01/11/2007 11:31 Andriy Gapon said the following: > It seems that gjournal patch for 6.X doesn't include the changes to make > fsck aware of gjounral that were added to CURRENT code. > Is it possible to produce a patch with such changes ? > > I would appreciate even a patch as it was applied to current, I think it > will be very easy to massage it for 6.X. It's just that unlike svn (or > many other source control systems) it is very hard to extract a "change > set" to multiple files from CVS. > > Thank you! > > P.S. Just thought up of another nice-to-have thing: it seems that the > 6.X patch has the earlier approach with .deleted directory, which was > later improved. It would be nice to get a patch for this as well. And > now I think that the two things are probably quite related. On another thought: maybe MFC gjournal to RELENG_6 in time for 6.3 ? 
I know, this might a bit too much asking :-) -- Andriy Gapon From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 09:04:29 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD26D16A417 for ; Fri, 2 Nov 2007 09:04:29 +0000 (UTC) (envelope-from joe@rootnode.com) Received: from mail.osoft.us (osoft.us [67.14.192.59]) by mx1.freebsd.org (Postfix) with ESMTP id 9F41F13C4A3 for ; Fri, 2 Nov 2007 09:04:29 +0000 (UTC) (envelope-from joe@rootnode.com) Received: from [10.0.2.105] (adsl-65-67-81-98.dsl.ltrkar.swbell.net [65.67.81.98]) by mail.osoft.us (Postfix) with ESMTP id 3F2D833C8A; Thu, 1 Nov 2007 22:21:01 -0600 (CST) Message-ID: <472AA59F.3020103@rootnode.com> Date: Thu, 01 Nov 2007 23:20:47 -0500 From: Joe Koberg User-Agent: Mail/News 1.5.0.8 (Windows/20061104) MIME-Version: 1.0 To: Ulf Lilleengen References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> In-Reply-To: <20071031215756.GB1670@stud.ntnu.no> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marco Haddad , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 09:04:29 -0000 Ulf Lilleengen wrote: > On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: > >> I found in recent researchs that a lot of people say gvinum should not be >> trusted, when it comes to raid5. I began to get worried. Am I alone using >> >> > I'm working on it, and there are definately people still using it. (I've > recieved a number of private mails as well as those seen on this list). IMO, > gvinum can be trusted when it comes to raid5. 
I've not experienced any > corruption-bugs or anything like that with it. > The source of the mistrust may be the fact that few software-only RAID-5 systems can guarantee write consistency across a multi-drive read-update-write cycle in the case of, e.g., power failure. There is no way for the software RAID to force the parallel writes to complete simultaneously on all drives, and from the time the first starts until the last is completed, the array is in an inconsistent (corrupted) state. Dedicated RAID hardware solves this with battery-backed RAM that maintains the array state in a very robust manner. Dedicated controllers also tend to be connected to "better" SCSI or SAS drives that properly report write completion via their command queuing protocol. ZFS tackles this problem by not writing data back in place, with inline checksums of all data and metadata (so that corruption is detectable), and by dynamically-sized "full stripe writes" for every write (no read-update-write cycle required). A solution for gvinum/UFS may be to set the stripe and filesystem block sizes the same, so that a partial stripe is never written and thus no read-update-write cycle occurs. However the use of in-place updates still has the possibility of corrupting data if the write completes on one drive in the array and not the other. The visibility of this "RAID-5 hole" may be very low if you have a well-behaved system (and drives) on a UPS. But since the corruption is silent, you can be stung far down the road if something "bad" does happen without notice. Especially with ATA drives with less robust writeback cache behavior in small-system environments (without backup power, maybe-flaky cabling, etc...). It is important to note that I am describing a universal problem with software RAID-5, and not any shortcoming of gvinum in particular. 
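[To make the read-update-write hazard described above concrete, here is a small hypothetical Python sketch. It is not code from gvinum or any real RAID implementation; it just XORs a parity block from two data blocks, simulates a power failure that lands after the data write but before the matching parity write, and shows that reconstruction from the now-stale parity is silently wrong.]

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A three-"disk" stripe: two data blocks plus their XOR parity.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Parity lets us rebuild either data block if its disk dies.
assert xor_blocks(parity, d1) == d0

# Update d0, but "lose power" before the matching parity write:
d0 = b"CCCC"                       # the data write completed
# parity = xor_blocks(d0, d1)      # ...this write never happened

# If disk 1 now fails, reconstruction from stale parity is silently wrong:
rebuilt_d1 = xor_blocks(parity, d0)
print(rebuilt_d1 == b"BBBB")       # False: the "RAID-5 hole"
```

The checksumming and full-stripe writes mentioned for ZFS close exactly this window, because stale parity is either detected or never produced in the first place.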
Joe Koberg joe at osoft dot us From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 11:13:38 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1BFE16A421 for ; Fri, 2 Nov 2007 11:13:38 +0000 (UTC) (envelope-from anjoel.s@gmail.com) Received: from ro-out-1112.google.com (ro-out-1112.google.com [72.14.202.183]) by mx1.freebsd.org (Postfix) with ESMTP id 6C69213C480 for ; Fri, 2 Nov 2007 11:13:37 +0000 (UTC) (envelope-from anjoel.s@gmail.com) Received: by ro-out-1112.google.com with SMTP id m6so83580roe for ; Fri, 02 Nov 2007 04:13:23 -0700 (PDT) Received: by 10.115.79.1 with SMTP id g1mr991947wal.1193945380880; Thu, 01 Nov 2007 12:29:40 -0700 (PDT) Received: by 10.114.160.7 with HTTP; Thu, 1 Nov 2007 12:29:40 -0700 (PDT) Message-ID: <3a72fe8f0711011229s1d23366ame17ff3f4ee1f65e0@mail.gmail.com> Date: Thu, 1 Nov 2007 17:29:40 -0200 From: "Anderson J. de Souza" To: freebsd-geom@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Bug on md + geli + jail !? X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 11:13:39 -0000 Hello people! I have a good system with jails on a partitioned, encrypted memory disk, initialized from the rc system, but if I do a jail stop [jailname], geli doesn't remove the partition mdX.elia, and I can't use this device again. I tried using rm on the device file, but then, when I restart geli on mdX, it doesn't create mdX.elia.
i try show it with "devfs rule apply path md6.elia hide" but then show 2 devices: like this: crw-r----- 1 root operator 0, 160 Nov 1 09:45 /dev/md6.elia crw-r----- 1 root operator 0, 160 Nov 1 09:45 /dev/md6.elia if I try mount the device the system show: mount: /dev/md6.elia: Device not configured My partition scheme: (BEFORE BUG) /dev/md6.elia on /jails/revproxy (ufs, local, read-only, acls) /dev/md6.elib on /jails/revproxy/var (ufs, local, soft-updates, acls) /dev/md6.elid on /jails/revproxy/tmp (ufs, local, nosuid, soft-updates, acls) /dev/md6.elie on /jails/revproxy/usr (ufs, local, soft-updates, acls) /dev/md6.elia on /jails/revproxy/data (ufs, local, nosuid, soft-updates, acls) TUNEFS AFTER BUG: free05-meta# tunefs -p /dev/md6.elia tunefs: /dev/md6.elia: could not open special device TUNEFS BEFORE BUG: # tunefs -p /dev/md6.elia tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elib tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elid tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: 
(-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elie tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) My Devices before stop jail: crw-r----- 1 root operator 0, 101 Nov 1 09:45 /dev/md6 crw-r----- 1 root operator 0, 122 Nov 1 09:45 /dev/md6.eli crw-r----- 1 root operator 0, 123 Nov 1 09:45 /dev/md6.elia crw-r----- 1 root operator 0, 124 Nov 1 09:45 /dev/md6.elib crw-r----- 1 root operator 0, 125 Nov 1 09:45 /dev/md6.elic crw-r----- 1 root operator 0, 126 Nov 1 09:45 /dev/md6.elid crw-r----- 1 root operator 0, 127 Nov 1 09:45 /dev/md6.elie MY BSDLABELS: # /dev/md6.eli: 8 partitions: # size offset fstype [fsize bsize bps/cpg] a: 262144 16 4.2BSD 2048 16384 16392 b: 2097152 262160 4.2BSD 2048 16384 28552 c: 6291455 0 unused 0 0 # "raw" part, don't edit d: 524288 2359312 4.2BSD 2048 16384 32776 e: 3407855 2883600 4.2BSD 2048 16384 28552 -- ___________________ Anderson J. 
de Souza - Networking and Security - [ - Professional Consulting - The best firewall - ] http://anjoel.s.googlepages.com - anjoel.s@gmail.com Phone: +55 (54) 9115.13.15 - Sip: 1-747-006-0374 From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 20:11:40 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3D9416A41A for ; Fri, 2 Nov 2007 20:11:40 +0000 (UTC) (envelope-from pgiessel@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.66]) by mx1.freebsd.org (Postfix) with ESMTP id 8893E13C48A for ; Fri, 2 Nov 2007 20:11:39 +0000 (UTC) (envelope-from pgiessel@mac.com) Received: from webmail012 (webmail012-s [10.13.128.12]) by smtpoutm.mac.com (Xserve/smtpout003/MantshX 4.0) with ESMTP id lA2JcaJp009034; Fri, 2 Nov 2007 12:38:36 -0700 (PDT) Date: Fri, 02 Nov 2007 12:38:36 -0700 From: Peter Giessel To: Joe Koberg Message-ID: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> in-reply-to: <472AA59F.3020103@rootnode.com> references: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Originating-IP: 69.178.5.90 Received: from [69.178.5.90] from webmail.mac.com with HTTP; Fri, 02 Nov 2007 12:38:36 -0700 Cc: Marco Haddad , Ulf Lilleengen , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 20:11:40 -0000 On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" wrote: >Ulf Lilleengen wrote: >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: >> >>> I found in recent researchs that a lot of people say 
gvinum should not be >>> trusted, when it comes to raid5. I began to get worried. Am I alone using >>> >>> >> I'm working on it, and there are definately people still using it. (I've >> recieved a number of private mails as well as those seen on this list). IMO, >> gvinum can be trusted when it comes to raid5. I've not experienced any >> corruption-bugs or anything like that with it. >> > >The source of the mistrust may be the fact that few software-only RAID-5 >systems can guarantee write consistency across a multi-drive >read-update-write cycle in the case of, e.g., power failure. That may be the true source, but my source of mistrust comes from a few drive failures and gvinum's inability to rebuild the replaced drive. Worked fine under vinum in tests, tried the same thing in gvinum (granted, this was under FreeBSD 5), and the array failed to rebuild. I can't be 100% sure it wasn't a flakey ATA controller and not gvinum's fault, and I no longer have access to the box to play with, but when I was playing with gvinum, replacing a failed drive usually resulted in panics. 
From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 01:33:19 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3E7E16A469 for ; Sat, 3 Nov 2007 01:33:19 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.228]) by mx1.freebsd.org (Postfix) with ESMTP id 957D713C48E for ; Sat, 3 Nov 2007 01:33:18 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: by nz-out-0506.google.com with SMTP id l8so727254nzf for ; Fri, 02 Nov 2007 18:32:55 -0700 (PDT) Received: by 10.142.14.20 with SMTP id 20mr721912wfn.1194053574296; Fri, 02 Nov 2007 18:32:54 -0700 (PDT) Received: by 10.142.135.15 with HTTP; Fri, 2 Nov 2007 18:32:54 -0700 (PDT) Message-ID: <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> Date: Fri, 2 Nov 2007 22:32:54 -0300 From: "Marco Haddad" To: "Peter Giessel" In-Reply-To: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> MIME-Version: 1.0 References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-geom@freebsd.org, Ulf Lilleengen , Joe Koberg Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 01:33:20 -0000 Hi, I must say that I had a strong faith on vinum too. I used it on a dozen servers to build raid5 volumes, specially when the hard drives were small and unreliable. 
So I had a few crashes, naturally, but replacing the failed disk was easy
and the rebuild worked every time.

I started using gvinum when I first hit a SCSI controller that vinum did
not support. As gvinum solved vinum's problem with that controller, it
immediately inherited the faith I had in vinum. I kept using gvinum after
that, until my faith was shaken by a hard disk crash: I could not get the
replacement drive added to the raid5 volume. After a lot of head bumping
against the wall, I came up with the following workaround procedure for
replacing a failed disk. I used this procedure just today to replace a SATA
hard disk that I suspect was the cause of an intermittent failure, with
such success that I began to think it isn't so bad... Anyway, I'll describe
a simple example in order to get your comments.

Suppose a simple system with three hard disks, ad0, ad1 and ad2. They were
fdisked and labeled identically. ad0s1a is / and ad0s1d, ad1s1d and ad2s1d
are of the same size and are used by gvinum as drives AD0, AD1 and AD2.
Each drive has only one slice and they are joined in a raid5 plex forming
the volume VOL. The gvinum create script would be the following:

  drive AD0 device /dev/ad0s1d
  drive AD1 device /dev/ad1s1d
  drive AD2 device /dev/ad2s1d
  volume VOL
    plex org raid5 128K
      sd drive AD0
      sd drive AD1
      sd drive AD2

Suppose ad1 crashes and gvinum marks it as down. With the command
"gvinum l" we would get something like this:

  3 drives:
  D AD0          State: up    /dev/ad0s1d ...
  D AD1          State: down  /dev/ad1s1d ...
  D AD2          State: up    /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up    ...

  1 plexes:
  P VOL.p0    R5 State: degraded ...

  3 subdisks:
  S VOL.p0.s0    State: up    D: AD0 ...
  S VOL.p0.s1    State: down  D: AD1 ...
  S VOL.p0.s2    State: up    D: AD2 ...

First thing I do: edit fstab and comment out the line mounting
/dev/gvinum/VOL wherever it was mounted. This is necessary because once the
volume is mounted gvinum refuses most commands, and umount doesn't do the
trick.
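For readers following along, a create script like the one above would be
written to a file and fed to gvinum(8). A minimal sketch — the file name
is my own invention, and newfs/mount of the resulting volume is the usual
follow-up, not part of Marco's text:

```sh
# Hypothetical file name; contents are the create script from the example.
cat > /root/vol.conf <<'EOF'
drive AD0 device /dev/ad0s1d
drive AD1 device /dev/ad1s1d
drive AD2 device /dev/ad2s1d
volume VOL
 plex org raid5 128K
  sd drive AD0
  sd drive AD1
  sd drive AD2
EOF
gvinum create /root/vol.conf   # build drives, volume, plex and subdisks
gvinum l                       # verify everything came up 'up'
newfs /dev/gvinum/VOL          # then newfs and mount as usual
```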
Then I shut down the system, replace the hard disk, and bring it up again.
At this point the first weird thing can be noted. With 'gvinum l' you would
get:

  2 drives:
  D AD0          State: up    /dev/ad0s1d ...
  D AD2          State: up    /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up    ...

  1 plexes:
  P VOL.p0    R5 State: up    ...

  3 subdisks:
  S VOL.p0.s0    State: up    D: AD0 ...
  S VOL.p0.s1    State: up    D: AD1 ...
  S VOL.p0.s2    State: up    D: AD2 ...

What? AD1 is gone, ok, but why is the subdisk VOL.p0.s1 up? And that makes
the plex up instead of degraded. The first time I saw it I got the shivers.

The next step is to fdisk and label the new disk just like the old one. The
new disk can be bigger but, I think, the partition ad1s1d must be the same
size as before. At this point it should be enough to use gvinum create with
a script file containing only the line:

  drive AD1 device /dev/ad1s1d

but gvinum would panic on that, and the system would lock up or core dump.
So something weird must be done: remove all the gvinum objects with
'gvinum rm'. Yes, just to make it clear, in this case the commands would
be:

  gvinum rm -r AD0
  gvinum rm -r AD2
  gvinum rm VOL
  gvinum rm VOL.p0
  gvinum rm VOL.p0.s1

Then we can use 'gvinum create' with the original script to recreate
everything. Now it is all up again, but it isn't quite right yet. The
subdisk VOL.p0.s1 must be marked as stale with:

  gvinum setstate -f stale VOL.p0.s1

This brings the plex back to degraded mode, and we can use:

  gvinum start VOL

to rebuild it. It may take about 1 hour per 100GB of volume space, so we'd
better grab some lunch... The progress can be seen at any time with:

  gvinum ls

After that, a 'fsck -t ufs /dev/gvinum/VOL' will probably catch some errors
left behind when the drive went down. Now we just need to uncomment that
line in fstab and reboot.

I think there's no easier way...
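Pulled together, the workaround above amounts to the following sequence.
This is only a sketch under the example's assumptions (ad0/ad1 identically
sized, vol.conf holding the original create script); the partitioning step
in particular will vary with your layout, and fdisk/bsdlabel invocations
should be checked against their man pages before use:

```sh
# 1. Comment out the /dev/gvinum/VOL line in /etc/fstab, shut down,
#    swap the failed disk.

# 2. Partition and label the new disk like the old one; one way is to
#    copy the label from a surviving, identically-sized disk:
fdisk -BI ad1                               # single slice covering the disk
bsdlabel ad0s1 | bsdlabel -R ad1s1 /dev/stdin

# 3. Tear down and recreate the gvinum objects from the original script:
gvinum rm -r AD0
gvinum rm -r AD2
gvinum rm VOL
gvinum rm VOL.p0
gvinum rm VOL.p0.s1
gvinum create vol.conf

# 4. Force the recreated subdisk stale, then rebuild:
gvinum setstate -f stale VOL.p0.s1
gvinum start VOL
gvinum ls                                   # watch rebuild progress

# 5. Afterwards, check the filesystem and restore fstab:
fsck -t ufs /dev/gvinum/VOL
```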
Regards,
Marco Haddad

On 11/2/07, Peter Giessel wrote:
>
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" <joe@rootnode.com>
> wrote:
> >Ulf Lilleengen wrote:
> >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent research that a lot of people say gvinum should not
> >>> be trusted, when it comes to raid5. I began to get worried. Am I alone
> >>> using
> >>>
> >>>
> >> I'm working on it, and there are definitely people still using it.
> >> (I've received a number of private mails as well as those seen on this
> >> list). IMO, gvinum can be trusted when it comes to raid5. I've not
> >> experienced any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
>
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
>
> Worked fine under vinum in tests, tried the same thing in gvinum
> (granted, this was under FreeBSD 5), and the array failed to rebuild.
>
> I can't be 100% sure it wasn't a flaky ATA controller and not gvinum's
> fault, and I no longer have access to the box to play with, but when I
> was playing with gvinum, replacing a failed drive usually resulted in
> panics.
> From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 02:57:09 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 771F216A418 for ; Sat, 3 Nov 2007 02:57:09 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id EF9C113C4B2 for ; Sat, 3 Nov 2007 02:57:08 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id D613433AD0; Sat, 3 Nov 2007 02:54:18 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Sat, 3 Nov 2007 02:54:18 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 3153162421A; Sat, 3 Nov 2007 02:54:35 +0100 (CET) Date: Sat, 3 Nov 2007 02:54:35 +0100 From: Ulf Lilleengen To: Marco Haddad Message-ID: <20071103015435.GB22755@stud.ntnu.no> References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. 
Cc: Joe Koberg , Peter Giessel , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 02:57:09 -0000

On fre, nov 02, 2007 at 10:32:54 -0300, Marco Haddad wrote:
> Hi,
>
> I must say that I had strong faith in vinum too. I used it on a dozen
> servers to build raid5 volumes, especially when the hard drives were
> small and unreliable. So I had a few crashes naturally, but replacing the
> failed disk was easy and rebuild worked all times.
> [...]
>
> Suppose ad1 crashes and gvinum marks it as down. With the command
> "gvinum l" we would get something like this:
>
> 3 drives:
> D AD0          State: up    /dev/ad0s1d ...
> D AD1          State: down  /dev/ad1s1d ...
> D AD2          State: up    /dev/ad2s1d ...
>
> 1 volumes:
> V VOL          State: up    ...
>
> 1 plexes:
> P VOL.p0    R5 State: degraded ...
>
> 3 subdisks:
> S VOL.p0.s0    State: up    D: AD0 ...
> S VOL.p0.s1    State: down  D: AD1 ...
> S VOL.p0.s2    State: up    D: AD2 ...
>
> First thing I do: edit fstab and comment out the line mounting
> /dev/gvinum/VOL wherever it was mounted. It is necessary because once
> mounted gvinum can not operate most commands, and umount doesn't do the
> trick. Then I shutdown the system and replace the hard disk and bring it
> up again.

True, this was a bit of a pain, also because the geom_vinum module can't
be kldunloaded. This has been fixed in the "experimental" version :)

>
> At this point the first weird thing can be noted. With 'gvinum l' you
> would get:
>
> 2 drives:
> D AD0          State: up    /dev/ad0s1d ...
> D AD2          State: up    /dev/ad2s1d ...
>
> 1 volumes:
> V VOL          State: up    ...
>
> 1 plexes:
> P VOL.p0    R5 State: up    ...
>
> 3 subdisks:
> S VOL.p0.s0    State: up    D: AD0 ...
> S VOL.p0.s1    State: up    D: AD1 ...
> S VOL.p0.s2    State: up    D: AD2 ...
>
> What?
> The AD1 is gone, ok, but why is the subdisk VOL.p0.s1 up? And it
> makes the plex up instead of degraded. The first time I saw it I got the
> shivers.

This is fixed in the new gvinum changes from Summer of Code. The current
version of gvinum is very bad at keeping its configuration data correct,
which causes many of these issues.

>
> Next step is to fdisk and label the new disk just like the old one. The
> new disk can be bigger but, I think, the partition ad1s1d must be the
> same size as before.

Yes, at least the same size or bigger.

> [...]
> to rebuild it. It may take about 1 hour per 100GB of volume space, so we
> better grab some lunch...
>
> The progress can be seen at any time with:
>
> gvinum ls
>
> After that, a 'fsck -t ufs /dev/gvinum/VOL' will probably catch some
> errors left behind when the drive came down.
>
> Now we just need to uncomment that line in fstab and reboot.
>
> I think there's no easier way...

Yes, there is. Replacing a drive in gvinum follows this procedure:

1. Create a config for the new drive and name the drive _differently_ from
   the old one.
2. Use the gvinum 'move' command to move the stale subdisk to the new
   drive.
3. Make sure that the subdisk now points to the new drive and that it's in
   the 'stale' state.
4. Start the plex (gvinum start).

The other issues you encountered have been fixed in my gvinum work this
summer. Also, in the new gvinum, replacing a drive and rebuilding a plex
can happen without unmounting your volume. Brave users can find patches
here:

http://people.freebsd.org/~lulf/patches/gvinum

All testing is very much appreciated.
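Under the assumptions of the earlier three-disk example, the four steps
above might look something like this. The drive name AD1NEW and the file
name newdrive.conf are my own inventions, and the exact 'move' syntax
should be checked against gvinum(8) on your system:

```sh
# 1. Config for the replacement drive, named differently from the old AD1:
echo 'drive AD1NEW device /dev/ad1s1d' > newdrive.conf
gvinum create newdrive.conf

# 2. Move the stale subdisk onto the new drive (-f forces the move,
#    since the subdisk's data is not preserved):
gvinum move -f AD1NEW VOL.p0.s1

# 3. Confirm the subdisk now sits on AD1NEW and is in the 'stale' state:
gvinum ls

# 4. Start the plex to rebuild the subdisk from parity:
gvinum start VOL.p0
```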
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 03:46:03 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58D0816A417 for ; Sat, 3 Nov 2007 03:46:03 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from fri.itea.ntnu.no (fri.itea.ntnu.no [129.241.7.60]) by mx1.freebsd.org (Postfix) with ESMTP id 1878B13C4B5 for ; Sat, 3 Nov 2007 03:46:02 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by fri.itea.ntnu.no (Postfix) with ESMTP id 2AB828C05; Sat, 3 Nov 2007 02:42:50 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by fri.itea.ntnu.no (Postfix) with ESMTP; Sat, 3 Nov 2007 02:42:49 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 79D0B62421A; Sat, 3 Nov 2007 02:43:06 +0100 (CET) Date: Sat, 3 Nov 2007 02:43:06 +0100 From: Ulf Lilleengen To: Peter Giessel Message-ID: <20071103014306.GA22755@stud.ntnu.no> References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. 
Cc: freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 03:46:03 -0000

On fre, nov 02, 2007 at 12:38:36 -0700, Peter Giessel wrote:
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" wrote:
> >Ulf Lilleengen wrote:
> >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent research that a lot of people say gvinum should not
> >>> be trusted, when it comes to raid5. I began to get worried. Am I
> >>> alone using
> >>>
> >>>
> >> I'm working on it, and there are definitely people still using it.
> >> (I've received a number of private mails as well as those seen on this
> >> list). IMO, gvinum can be trusted when it comes to raid5. I've not
> >> experienced any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
>
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
>
> Worked fine under vinum in tests, tried the same thing in gvinum
> (granted, this was under FreeBSD 5), and the array failed to rebuild.
>
> I can't be 100% sure it wasn't a flaky ATA controller and not gvinum's
> fault, and I no longer have access to the box to play with, but when I
> was playing with gvinum, replacing a failed drive usually resulted in
> panics.

Well, all I can say is that I've tested this many times with gvinum in
CURRENT/7.x/6.x as well as in my SoC work, and I made updates to the
manpage to give examples of how to do this as well. Also, as for the
software RAID-5 problems...
they are hard to "fix" since gvinum doesn't really know anything about the
consumers. However, it could be interesting to try out different
optimizations, like not reading parity when having a sufficiently large
request, or some sort of write cache until one can issue a large enough
request.

-- 
Ulf Lilleengen