From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 11:07:04 2007
Date: Mon, 29 Oct 2007 11:07:02 GMT
From: FreeBSD bugmaster <owner-bugmaster@FreeBSD.org>
To: freebsd-geom@FreeBSD.org
Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org
Message-Id: <200710291107.l9TB72Dg090097@freefall.freebsd.org>
List-Id: GEOM-specific discussions and implementations

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/73177   geom       kldload geom_* causes panic due to memory exhaustion
o kern/76538   geom       [gbde] nfs-write on gbde partition stalls and continue
o kern/83464   geom       [geom] [patch] Unhandled malloc failures within libgeo
o kern/84556   geom       [geom] GBDE-encrypted swap causes panic at shutdown
o kern/87544   geom       [gbde] mmaping large files on a gbde filesystem deadlo
o kern/89102   geom       [geom_vfs] [panic] panic when forced unmount FS from u
o bin/90093    geom       fdisk(8) incapable of altering in-core geometry
o kern/90582   geom       [geom_mirror] [panic] Restore cause panic string (ffs_
o kern/98034   geom       [geom] dereference of NULL pointer in acd_geom_detach
o kern/104389  geom       [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/113419  geom       [geom] geom fox multipathing not failing back
o misc/113543  geom       [geom] [patch] geom(8) utilities don't work inside the
o kern/113957  geom       [gmirror] gmirror is intermittently reporting a degrad
o kern/115572  geom       [gbde] gbde partitions fail at 28bit/48bit LBA address

14 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/78131    geom       gbde "destroy" not working.
o kern/79251   geom       [2TB] newfs fails on 2.6TB gbde device
o kern/94632   geom       [geom] Kernel output resets input while GELI asks for
f kern/105390  geom       [geli] filesystem on a md backed by sparse file with s
o kern/107707  geom       [geom] [patch] add new class geom_xbox360 to slice up
p bin/110705   geom       gmirror control utility does not exit with correct exi
o kern/113837  geom       [geom] unable to access 1024 sector size storage
o kern/113885  geom       [geom] [patch] improved gmirror balance algorithm
o kern/114532  geom       GEOM_MIRROR shows up in kldstat even if compiled in th
o kern/115547  geom       [geom] [patch] for GEOM Eli to get password from stdin

10 problems total.
From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 16:08:37 2007
Date: Mon, 29 Oct 2007 13:59:44 -0200
From: Felipe Neuwald <felipe@neuwald.biz>
To: freebsd-geom@freebsd.org
Subject: Raid 0 + 1
Message-ID: <47260370.3060004@neuwald.biz>

Hi Folks,

I talked with my customer, and we decided to implement a 0 + 1 RAID,
with 4 disks of 250 GB each.
Here is how my RAID is working now:

[root@fileserver /]# gvinum list
4 drives:
D a                     State: up       /dev/ad4        A: 0/238474 MB (0%)
D b                     State: up       /dev/ad5        A: 0/238475 MB (0%)
D c                     State: up       /dev/ad6        A: 0/238475 MB (0%)
D d                     State: up       /dev/ad7        A: 0/238475 MB (0%)

1 volume:
V data                  State: up       Plexes:       1 Size:        931 GB

1 plex:
P data.p0             S State: up       Subdisks:     4 Size:        931 GB

4 subdisks:
S data.p0.s0            State: up       D: a            Size:        232 GB
S data.p0.s1            State: up       D: b            Size:        232 GB
S data.p0.s2            State: up       D: c            Size:        232 GB
S data.p0.s3            State: up       D: d            Size:        232 GB

Could someone give me an example of how to implement a 0 + 1 RAID?

Thank you very much,
Felipe Neuwald.
From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 18:14:25 2007
Date: Mon, 29 Oct 2007 16:05:37 -0200
From: Felipe Neuwald <felipe@neuwald.biz>
To: Kevin Thompson
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <472620F1.90807@neuwald.biz>
In-Reply-To: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu>

Hi Kevin,

A bootable 0+1 RAID isn't what I need, since I have a separate small disk
for the system files, but I'll try to use your documentation to build my
0+1 RAID.
I have a blog where I can post this documentation; if that's OK with you,
please take a look at ontheroadbrother.blogspot.com .

Cheers,
Felipe Neuwald

Kevin Thompson wrote:
>> Hi Folks,
>>
>> I talked with my customer, and we decided to implement a 0 + 1 RAID,
>> with 4 disks of 250 GB each.
>>
>> Here is how my RAID is working now:
>
> I have some rough instructions I wrote up about doing a bootable 0+1
> GEOM RAID, in wiki format. I don't have a public place to post them, so
> I've just copied them below. Feedback is welcome.
>
> [full instructions snipped; they appear in Kevin's original message below]

From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 20:28:31 2007
Date: Mon, 29 Oct 2007 13:23:15 -0400 (EDT)
From: "Kevin Thompson" <antiduh@csh.rit.edu>
To: "Felipe Neuwald"
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu>

> Hi Folks,
>
> I talked with my customer, and we decided to implement a 0 + 1 RAID,
> with 4 disks of 250 GB each.
>
> Here is how my RAID is working now:

I have some rough instructions I wrote up about doing a bootable 0+1 GEOM
RAID, in wiki format. I don't have a public place to post them, so I've
just copied them below. Feedback is welcome.

Once you get it set up, I would highly recommend that you experiment with
the setup before relying on it - make sure you know how to handle failed
disk replacements, etc. VMware/camcontrol/atacontrol is very handy in this
regard.

--Kevin Thompson


==Introduction==

This article describes one method to build a RAID 0+1 array with 4 logical
disks using the FreeBSD GEOM framework, and being able to boot off of such
an array.
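As a rough mental model of what we're building - a mirror of two 2-disk
stripes - here's a toy sketch of the block arithmetic. This is purely
illustrative (my own hand-rolled math, not how GEOM is implemented, and
the chunk size is made tiny on purpose):

```shell
# Toy model of the 0+1 layout: stripe st0 = disks 0+1, stripe st1 =
# disks 2+3, and the mirror keeps both stripes identical.
stripe_blocks=4   # blocks per stripe chunk (tiny, for illustration)

# Print the two disks that hold a copy of logical block $1 -
# one disk in each stripe half.
copies_for() {
    chunk=$(( $1 / stripe_blocks ))
    d=$(( chunk % 2 ))         # chunks alternate between stripe members
    echo "$d $(( d + 2 ))"     # same position in the mirrored stripe
}

copies_for 0    # -> "0 2"
copies_for 4    # -> "1 3": the next chunk lands on the other disk
```

Note that any single disk holds only every other chunk of the data - which
is exactly why a filesystem on a stripe looks corrupt when read from just
one member, as discussed below.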
Building a GEOM mirror and being able to directly boot off of it is a
relatively simple task. This is because when mirroring is done, all
FreeBSD slices maintain their disk-level structure - GEOM stores its
metadata at the end of the disk/partition/etc. Most importantly, in such a
configuration the MBR and the boot file system are untouched.

The latter is not the case in RAID 0, RAID 0+1, or RAID 1+0 setups. In
striping configurations, the boot filesystem is interleaved across the
various disks (usually two), and as such, the filesystem appears to be
corrupt if read from only one disk. Since the MBR program on the boot disk
is incapable of understanding this physical block layout, it is unable to
find the kernel in order to start FreeBSD. However, should the MBR program
be able to read the boot filesystem, it can then load the kernel and
related kernel modules, which would then allow the system to 'boot' from
the raidset.

Technically speaking, the only file system that has to remain untouched is
the /boot file system. root (/), /usr, /var, /tmp, et cetera may all then
be mounted from a raidset.

In this example, for simplicity's sake, I protect all of the typical root
file system, not just /boot. A more enterprising user may want to modify
their approach.

==Instructions==

Boot with a FreeBSD install disk, then start the fixit console. Next
you'll need to clean up the environment (the fixit console needs a little
updating):

  ln -s /dist/boot/kernel /boot/kernel
  ln -s /dist/lib /lib
  EDITOR=/mnt2/usr/bin/vi; export EDITOR
  PATH=$PATH':'/mnt2/sbin':'/mnt2/usr/bin':'/mnt2/usr/sbin
  export PATH

Now load the kernel modules for the GEOM classes we're going to be using:

  glabel load
  gstripe load
  gmirror load

Next, we're going to label each of the drives. The 'da0' drive name is
typically assigned by the BIOS on boot, by way of some sort of metric such
as chain location.
The geom label module writes a fixed label to each drive, so that no
matter where each drive goes, or if new devices are inserted and the
numbering is reordered, the raid set will always work the same:

  glabel label geom0 da0
  glabel label geom1 da1
  glabel label geom2 da2
  glabel label geom3 da3

Now install an MBR and basic partitioning on each drive. This will create
a single partition taking the entire drive, for each drive:

  fdisk -vBI /dev/label/geom0
  fdisk -vBI /dev/label/geom1
  fdisk -vBI /dev/label/geom2
  fdisk -vBI /dev/label/geom3

On the first disk slice of the first drive, install a simple disk label
and bootstrap code:

  bsdlabel -wB /dev/label/geom0s1

Now edit the generic label on that disk, setting it as you please. 'a' is
commonly root, 'b' the swap partition and 'd' the rest. Don't create any
partitions other than a, b and d (d will be used as the provider for a
future geom consumer):

  slave(/u9/antiduh) # bsdlabel -e /dev/label/geom0s1
  #    size offset fstype [fsize bsize bps/cpg]
  a:   500M     16 4.2BSD
  b:   500M      * swap
  c:                       # leave as is
  d:      *      * 4.2BSD

The '*' for size means use whatever is left, and the '*' for offset means
use the next logical offset.

Once you're finished labeling the first drive, write the label for the
drive out to a file, then use it to initialize the other three disks:

  bsdlabel /dev/label/geom0s1 > /file
  bsdlabel -R /dev/label/geom1s1 /file
  bsdlabel -R /dev/label/geom2s1 /file
  bsdlabel -R /dev/label/geom3s1 /file

Now, create a GEOM mirror out of the 'a' partition of each drive. This
will eventually be the root partition. The new device will be called
'''boot''', and will enumerate in FreeBSD as '''/dev/mirror/boot''':

  gmirror label -vh boot /dev/label/geom0s1a /dev/label/geom1s1a \
      /dev/label/geom2s1a /dev/label/geom3s1a

Now we're going to pull together the d partition on each drive, create
pairwise stripes, then mirror the new stripes together.
  gstripe label -vh -s 131072 st0 /dev/label/geom0s1d /dev/label/geom1s1d
  gstripe label -vh -s 131072 st1 /dev/label/geom2s1d /dev/label/geom3s1d

We should now have two new devices '''/dev/stripe/st0''' and
'''/dev/stripe/st1'''. Now mirror those two devices to create our final
device that will next be used for the rest of our filesystems:

  gmirror label -vh gm0 /dev/stripe/st0 /dev/stripe/st1

We now have our final device '''/dev/mirror/gm0''' that is going to serve
as the base for our regular filesystems.

Create a basic label on the new disk, then edit to taste. Note that we
don't fdisk the raw gm0 device - we create partitions directly on the raw
device:

  slave(/u9/antiduh) # bsdlabel -wB /dev/mirror/gm0
  slave(/u9/antiduh) # bsdlabel -e /dev/mirror/gm0
  #    size offset fstype [fsize bsize bps/cpg]
  a:      1     16 unused  # a bare '1', not '1M' - works around a bug
                           # in bsdlabel
  c:                       # leave as is
  e:  1000M      * 4.2BSD  # /var
  f:   500M      * 4.2BSD  # /tmp
  d:      *      * 4.2BSD  # /usr, which just gets the rest

Now create our filesystems on the associated partitions (the -U option
enables soft updates):

  newfs /dev/mirror/boot
  newfs -U /dev/mirror/gm0d
  newfs -U /dev/mirror/gm0e
  newfs -U /dev/mirror/gm0f

Now we finally get down to mounting the disks, installing the OS,
setting it up to boot, and finally booting.

Mount the root disk as /mnt:

  mount /dev/mirror/boot /mnt

Create mount points for our other file systems, then mount them:

  mkdir /mnt/usr
  mkdir /mnt/var
  mkdir /mnt/tmp

  mount /dev/mirror/gm0d /mnt/usr
  mount /dev/mirror/gm0e /mnt/var
  mount /dev/mirror/gm0f /mnt/tmp

Now we're going to do the install the fun way:

  DESTDIR=/mnt
  export DESTDIR
  cd /dist/6.2-RELEASE/base; ./install.sh
  cd ../ports; ./install.sh
  cd ../manpages; ./install.sh
  cd ../kernels; ./install.sh GENERIC
  mv /mnt/boot/GENERIC/* /mnt/boot/kernel

You now have a basic, blank, sorta bootable, unconfigured FreeBSD install
on the machine.
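Incidentally, the repetitive per-drive steps earlier (glabel, fdisk, and
the bsdlabel -R cloning) lend themselves to a small loop. A sketch that
only echoes the commands, so you can review them before piping to sh:

```shell
# Dry run: print the per-drive setup commands instead of typing them
# four times. Pipe the output to sh once you've eyeballed it.
for i in 0 1 2 3; do
    echo "glabel label geom$i da$i"
    echo "fdisk -vBI /dev/label/geom$i"
done
# The label-cloning step only applies to drives 1-3:
for i in 1 2 3; do
    echo "bsdlabel -R /dev/label/geom${i}s1 /file"
done
```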
Set up the boot loader to load the needed kernel modules, so that we can
mount root from the mirror:

  echo 'geom_label_load="YES"' >> /mnt/boot/loader.conf
  echo 'geom_stripe_load="YES"' >> /mnt/boot/loader.conf
  echo 'geom_mirror_load="YES"' >> /mnt/boot/loader.conf

Set a kernel option that uses more memory but makes the stripe layer
faster; otherwise stripes are unbearably slow:

  echo 'kern.geom.stripe.fast=1' >> /mnt/boot/loader.conf

Now set up our initial fstab file:

  echo '/dev/label/geom0s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom1s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom2s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/label/geom3s1b none swap sw 0 0' >> /mnt/etc/fstab
  echo '/dev/mirror/boot /    ufs rw 1 1' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0d /usr ufs rw 2 2' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0e /var ufs rw 2 2' >> /mnt/etc/fstab
  echo '/dev/mirror/gm0f /tmp ufs rw 2 2' >> /mnt/etc/fstab

Reboot, and the machine should start the raid sets and mount from the raid
array automatically. Keep that install CD handy in case you messed up.

Log in, change the root password, set up rc.conf (hostname, interfaces,
ssh, linux binary compat...), start installing stuff, etc.

Enjoy.
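One extra safety check before that reboot: a typo'd device in fstab means
a trip back to the fixit CD. A quick cross-check sketch - it rebuilds a
scratch copy here for illustration, but on the real system you'd point it
at /mnt/etc/fstab:

```shell
# Sanity-check the new fstab: every device should be either one of the
# four label swap partitions or one of our mirror devices.
fstab=/tmp/fstab.check
{
    for i in 0 1 2 3; do echo "/dev/label/geom${i}s1b none swap sw 0 0"; done
    echo '/dev/mirror/boot / ufs rw 1 1'
    echo '/dev/mirror/gm0d /usr ufs rw 2 2'
    echo '/dev/mirror/gm0e /var ufs rw 2 2'
    echo '/dev/mirror/gm0f /tmp ufs rw 2 2'
} > "$fstab"

# Fail if any line names a device we never created:
! grep -qv -e '^/dev/label/geom[0-3]s1b ' -e '^/dev/mirror/' "$fstab"
grep -c ' swap ' "$fstab"    # -> 4
```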
==Block Diagram==

  da0 --label--> geom0
  da1 --label--> geom1
  da2 --label--> geom2
  da3 --label--> geom3

  label/geom0s1a --|--mirror--> /dev/mirror/boot
  label/geom1s1a --|
  label/geom2s1a --|
  label/geom3s1a --|

  label/geom0s1d --|--stripe--> /dev/stripe/st0 --|--mirror--> /dev/mirror/gm0
  label/geom1s1d --|                              |
                                                  |
  label/geom2s1d --|--stripe--> /dev/stripe/st1 --|
  label/geom3s1d --|

  mirror/boot --> /

  mirror/gm0d --> /usr
  mirror/gm0e --> /var
  mirror/gm0f --> /tmp

  label/geom0s1b --> swap
  label/geom1s1b --> swap
  label/geom2s1b --> swap
  label/geom3s1b --> swap

From owner-freebsd-geom@FreeBSD.ORG Mon Oct 29 21:06:35 2007
Date: Mon, 29 Oct 2007 22:06:29 +0100
From: Ulf Lilleengen <lulf@stud.ntnu.no>
To: Felipe Neuwald
Cc: freebsd-geom@freebsd.org
Subject: Re: Raid 0 + 1
Message-ID: <20071029210629.GA26364@carrot.studby.ntnu.no>
In-Reply-To: <47260370.3060004@neuwald.biz>

On Mon, Oct 29, 2007 at 01:59:44PM -0200, Felipe Neuwald wrote:
> Hi Folks,
>
> I talked with my customer, and we decided to implement a 0 + 1 RAID,
> with 4 disks of 250 GB each.
>
> [gvinum list output snipped]
>
> Could someone give me an example of how to implement a 0 + 1 RAID?

Hello,

You need to create two plexes to do this, so something like:

  drive a device /dev/ad4
  drive b device /dev/ad5
  drive c device /dev/ad6
  drive d device /dev/ad7
  volume data
    plex org striped 493k  # or some other stripesize
      sd drive a
      sd drive b
    plex org striped 493k
      sd drive c
      sd drive d

should do the trick. Also note that you can use a combination of
gstripe(8) and gmirror(8) to achieve the same effect.
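One thing worth flagging to the customer: with two mirrored plexes, the
usable capacity is half of the current striped volume. A back-of-the-
envelope sketch (my own arithmetic, using the 232 GB subdisk sizes from
the gvinum output above):

```shell
# Capacity estimate for the proposed RAID 0+1: two striped plexes that
# mirror each other. Sizes in GB, from 'gvinum list'.
a=232; b=232; c=232; d=232

plex0=$(( a + b ))      # striped plex over drives a+b
plex1=$(( c + d ))      # striped plex over drives c+d

# Mirrored plexes hold identical copies, so the volume is as big as the
# smaller plex:
usable=$(( plex0 < plex1 ? plex0 : plex1 ))
echo "usable: ${usable} GB"    # -> "usable: 464 GB", vs ~931 GB striped now
```

In exchange for the halved capacity, the volume keeps running after any
single-drive failure, since the surviving plex still has a complete copy.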
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 01:38:38 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D99D216A469 for ; Tue, 30 Oct 2007 01:38:38 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id 9982613C4B3 for ; Tue, 30 Oct 2007 01:38:38 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id 74FAC34876; Mon, 29 Oct 2007 23:13:09 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Mon, 29 Oct 2007 23:13:09 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id AF1046241BA; Mon, 29 Oct 2007 23:13:23 +0100 (CET) Date: Mon, 29 Oct 2007 23:13:23 +0100 From: Ulf Lilleengen To: Felipe Neuwald Message-ID: <20071029221323.GA28014@stud.ntnu.no> References: <47260370.3060004@neuwald.biz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47260370.3060004@neuwald.biz> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. Cc: freebsd-geom@freebsd.org Subject: Re: Raid 0 + 1 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 01:38:38 -0000 On man, okt 29, 2007 at 01:59:44 -0200, Felipe Neuwald wrote: > Hi Folks, > > I talked with my customer, and we decided to implement a 0 + 1 RAID, > with 4 disks of 250Gb each. 
> > Here is how my RAID is working now: > > [root@fileserver /]# gvinum list > 4 drives: > D a State: up /dev/ad4 A: 0/238474 MB (0%) > D b State: up /dev/ad5 A: 0/238475 MB (0%) > D c State: up /dev/ad6 A: 0/238475 MB (0%) > D d State: up /dev/ad7 A: 0/238475 MB (0%) > > 1 volume: > V data State: up Plexes: 1 Size: 931 GB > > 1 plex: > P data.p0 S State: up Subdisks: 4 Size: 931 GB > > 4 subdisks: > S data.p0.s0 State: up D: a Size: 232 GB > S data.p0.s1 State: up D: b Size: 232 GB > S data.p0.s2 State: up D: c Size: 232 GB > S data.p0.s3 State: up D: d Size: 232 GB > > > Could someone give me on example of how I'll implement a 0 + 1 RAID? > Hello, You need to create two plexes to do this, so something like: drive a device /dev/ad4 drive b device /dev/ad5 drive c device /dev/ad6 drive d device /dev/ad7 volume data plex org striped 493k # or some other stripesize sd drive a sd drive b plex org striped 493k sd drive c sd drive d Should do the trick. Also note that you can use a combination of gstripe(8) and gmirror(8) to achieve the same effect. 
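[As a sketch of the gstripe(8)/gmirror(8) alternative mentioned above: these example commands are untested and the device names are simply taken from the drive listing in this thread. Note that mirroring the disks in pairs and then striping across the mirrors gives RAID 1+0, the inverse nesting of the two-striped-plexes gvinum setup.]

```
# Mirror the four disks in pairs (device names assumed from the listing above)
gmirror label -v gm0 /dev/ad4 /dev/ad5
gmirror label -v gm1 /dev/ad6 /dev/ad7
# Stripe across the two mirror providers
gstripe label -v st0 /dev/mirror/gm0 /dev/mirror/gm1
# Put a filesystem on the resulting device
newfs -U /dev/stripe/st0
```

Check the gmirror(8) and gstripe(8) man pages before running anything like this on disks that hold data.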
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 04:34:49 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C235F16A419 for ; Tue, 30 Oct 2007 04:34:49 +0000 (UTC) (envelope-from tom.hurst@clara.net) Received: from spork.qfe3.net (spork.qfe3.net [212.13.207.101]) by mx1.freebsd.org (Postfix) with ESMTP id 84D7313C4AA for ; Tue, 30 Oct 2007 04:34:49 +0000 (UTC) (envelope-from tom.hurst@clara.net) Received: from [81.104.144.87] (helo=voi.aagh.net) by spork.qfe3.net with esmtp (Exim 4.66 (FreeBSD)) (envelope-from ) id 1ImiQJ-000OUJ-31; Tue, 30 Oct 2007 04:09:51 +0000 Received: from freaky by voi.aagh.net with local (Exim 4.68 (FreeBSD)) (envelope-from ) id 1ImiQI-000KCr-W2; Tue, 30 Oct 2007 04:09:51 +0000 Date: Tue, 30 Oct 2007 04:09:50 +0000 From: Thomas Hurst To: Kevin Thompson Message-ID: <20071030040950.GA76585@voi.aagh.net> Mail-Followup-To: Kevin Thompson , Felipe Neuwald , freebsd-geom@freebsd.org References: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4414.147.177.192.113.1193678595.squirrel@angst.csh.rit.edu> Organization: Not much. User-Agent: Mutt/1.5.16 (2007-06-09) Sender: Thomas Hurst Cc: Felipe Neuwald , freebsd-geom@freebsd.org Subject: Re: Raid 0 + 1 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 04:34:49 -0000 * Kevin Thompson (antiduh@csh.rit.edu) wrote: > Now we're going to pull together the d partion on each drive, create > pairwise stripes, then mirror the new stripes together. 
> gstripe label -vh -s 131072 st0 /dev/label/geom0s1d /dev/geom1s1d > gstripe label -vh -s 131072 st1 /dev/label/geom2s1d /dev/geom3s1d > > We should now have two new devices '''/dev/stripe/st0''' and > '''/dev/stripe/st1'''. Now mirror those two devices to create our final > device that will next be used for the rest of our filesystems: > gmirror label -vh gm0 /dev/stripe/st0 /dev/stripe/st1 Er, shouldn't you be doing this the other way around? Make two mirrors, then stripe across them. IO performance should be identical in the normal case, degrade less with a single disk failure (since only one disk drops out of the array instead of an entire pair), and it'll be more likely to survive a two disk failure. -- Thomas 'Freaky' Hurst http://hur.st/ From owner-freebsd-geom@FreeBSD.ORG Tue Oct 30 05:30:36 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C54C116A419; Tue, 30 Oct 2007 05:30:36 +0000 (UTC) (envelope-from outi@bytephobia.de) Received: from dd18312.kasserver.com (dd18312.kasserver.com [85.13.138.194]) by mx1.freebsd.org (Postfix) with ESMTP id 57A9B13C4A6; Tue, 30 Oct 2007 05:30:36 +0000 (UTC) (envelope-from outi@bytephobia.de) Received: from mobility.bytephobia.de (pD9E35322.dip.t-dialin.net [217.227.83.34]) by dd18312.kasserver.com (Postfix) with ESMTP id 674491936B05A; Tue, 30 Oct 2007 00:12:38 +0100 (CET) Date: Tue, 30 Oct 2007 00:12:28 +0100 From: Patrick Hurrelmann To: freebsd-geom@freebsd.org, freebsd-questions@freebsd.org Message-ID: <20071030001228.65816a87@mobility.bytephobia.de> Organization: private X-Mailer: Claws Mail 3.0.2 (GTK+ 2.10.14; i386-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Subject: Fw: Best way for a gmirrored gjournal? 
X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2007 05:30:36 -0000 Dear all, I'm forwarding this message to these lists, as current@ obviously was the wrong recipient. I kindly ask you for your ideas and proposals on my questions below. Regards, Patrick Begin forwarded message: Date: Mon, 22 Oct 2007 19:08:35 +0200 From: Patrick Hurrelmann To: freebsd-current@freebsd.org Subject: Best way for a gmirrored gjournal? Hi all, Currently I'm trying to install a new server and need some hints on how to best configure filesystems using gmirror and gjournal. The server in question is an amd64 with 512mb of ram and 2x 80gb sata hdds. So I was thinking of a mount-point layout like the following: ad0s1 / (1gb) swap (1gb) /var (8gb) /tmp (1gb) /home (4gb) /usr (13gb) /jails (39gb) ad0s2 10gb for journaling Which would leave a space of 10gb for journaling. I dug through the mailing-list archives and man-pages of gmirror and gjournal but all I ended up with are questions and doubts :) Now I wanted to create 2 mirrors (gm0s1 and gm0s2). Gmirror gm0s1 containing the slices ad0s1 and ad2s1, while gm0s2 should contain ad0s2 and ad2s2. I created 2 slices, as with the above shown partitioning I was running out of mount-points for this slice. Is such a layout reasonable? Or is it stupid to use a dedicated slice just for journaling, and better to skip e.g. the /tmp partition to leave space for a dedicated journaling partition on this slice? Btw. are 10gb enough for journaling of 6 partitions? Or do I need one dedicated partition for journaling each? If I skip using a separate partition for journaling data, gjournal keeps telling me that e.g. the root partition of 1gb is too small for journaling. Would it be safe to decrease the journal size although the man-page discourages it? What do you people out there suggest?
How do you handle systems with gmirror and gjournal combined? Or even use ZFS although ram is limited (as the machine will serve up several jails with e.g. postgres)? I'm really looking forward to suggestions from you. I intentionally directed this mail to current@ as I think that here are the most people around with experience on gjournal. But if I better should direct this mail to questions@ I'm happy to do so, too. Best regards, Patrick -- ==================================================================== Patrick Hurrelmann | "Programming today is a race between software Mannheim, Germany | engineers striving to build bigger and better | idiot-proof programs, and the Universe trying outi@bytephobia.de | to produce bigger and better idiots. So far, www.bytephobia.de | the Universe is winning." - Rich Cook /"\ \ / ASCII Ribbon Campaign X against HTML email & vCards / \ From owner-freebsd-geom@FreeBSD.ORG Wed Oct 31 19:28:11 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CD4716A421 for ; Wed, 31 Oct 2007 19:28:11 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.244]) by mx1.freebsd.org (Postfix) with ESMTP id 1532413C494 for ; Wed, 31 Oct 2007 19:28:10 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: by an-out-0708.google.com with SMTP id c24so39571ana for ; Wed, 31 Oct 2007 12:27:46 -0700 (PDT) Received: by 10.142.162.5 with SMTP id k5mr2081859wfe.1193843658279; Wed, 31 Oct 2007 08:14:18 -0700 (PDT) Received: by 10.142.155.19 with HTTP; Wed, 31 Oct 2007 08:14:18 -0700 (PDT) Message-ID: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> Date: Wed, 31 Oct 2007 12:14:18 -0300 From: "Marco Haddad" To: freebsd-geom@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit 
Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Oct 2007 19:28:11 -0000 Hello, I have been using gvinum to build raid5 volumes since vinum's retirement, with some success. Most difficulties are related to not yet implemented commands, but the hope for a more complete version keeps me going. I found in recent research that a lot of people say gvinum should not be trusted when it comes to raid5. I began to get worried. Am I alone using gvinum raid5? Did everyone abandon it? What about the development guys? Is there anyone still working on it? Will a complete gvinum ever be released? Hoping for good news, Marco Haddad From owner-freebsd-geom@FreeBSD.ORG Wed Oct 31 21:58:10 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DBDD16A421 for ; Wed, 31 Oct 2007 21:58:10 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id EAE2A13C49D for ; Wed, 31 Oct 2007 21:58:09 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id E97B63387D; Wed, 31 Oct 2007 22:57:40 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Wed, 31 Oct 2007 22:57:40 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 40A48624219; Wed, 31 Oct 2007 22:57:56 +0100 (CET) Date: Wed, 31 Oct 2007 22:57:56 +0100 From: Ulf Lilleengen To: Marco Haddad Message-ID: <20071031215756.GB1670@stud.ntnu.no>
References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. Cc: freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Oct 2007 21:58:10 -0000 On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: > Hello, > > I have been using gvinum to build raid5 volumes since vinum's retirement, with > some success. Most difficulties are related to not yet implemented commands, > but the hope for a more complete version keeps me going. > > I found in recent research that a lot of people say gvinum should not be > trusted when it comes to raid5. I began to get worried. Am I alone using > gvinum raid5? Did everyone abandon it? What about the development guys? Is > there anyone still working on it? Will a complete gvinum ever be released? > I'm working on it, and there are definitely people still using it. (I've received a number of private mails as well as those seen on this list.) IMO, gvinum can be trusted when it comes to raid5.
I hope to backport gvinum to both RELENG_7 and RELENG_6 when the time is right, but as I said, you'd have to be patient. -- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Thu Nov 1 09:32:18 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3041D16A46B; Thu, 1 Nov 2007 09:32:18 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from falcon.cybervisiontech.com (falcon.cybervisiontech.com [217.20.163.9]) by mx1.freebsd.org (Postfix) with ESMTP id DD91B13C4A6; Thu, 1 Nov 2007 09:32:17 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from localhost (localhost [127.0.0.1]) by falcon.cybervisiontech.com (Postfix) with ESMTP id 5F2A8744007; Thu, 1 Nov 2007 11:31:44 +0200 (EET) X-Virus-Scanned: Debian amavisd-new at falcon.cybervisiontech.com Received: from falcon.cybervisiontech.com ([127.0.0.1]) by localhost (falcon.cybervisiontech.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UzRX8orDLg2d; Thu, 1 Nov 2007 11:31:44 +0200 (EET) Received: from [10.2.1.87] (gateway.cybervisiontech.com.ua [88.81.251.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by falcon.cybervisiontech.com (Postfix) with ESMTP id 07A9C744005; Thu, 1 Nov 2007 11:31:43 +0200 (EET) Message-ID: <47299CFE.6010309@icyb.net.ua> Date: Thu, 01 Nov 2007 11:31:42 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.6 (X11/20070803) MIME-Version: 1.0 To: freebsd-geom@freebsd.org, Pawel Jakub Dawidek Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: gjournal for 6.X and fsck X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Nov 2007 09:32:18 -0000 It seems that gjournal patch for 6.X doesn't include the 
changes to make fsck aware of gjournal that were added to CURRENT code. Is it possible to produce a patch with such changes? I would appreciate even a patch as it was applied to current; I think it will be very easy to massage it for 6.X. It's just that, unlike svn (or many other source control systems), it is very hard to extract a "change set" to multiple files from CVS. Thank you! P.S. Just thought of another nice-to-have thing: it seems that the 6.X patch has the earlier approach with the .deleted directory, which was later improved. It would be nice to get a patch for this as well. And now I think that the two things are probably quite related. -- Andriy Gapon From owner-freebsd-geom@FreeBSD.ORG Thu Nov 1 09:59:32 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1ABD716A41A; Thu, 1 Nov 2007 09:59:32 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from falcon.cybervisiontech.com (falcon.cybervisiontech.com [217.20.163.9]) by mx1.freebsd.org (Postfix) with ESMTP id CA28D13C4A8; Thu, 1 Nov 2007 09:59:31 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from localhost (localhost [127.0.0.1]) by falcon.cybervisiontech.com (Postfix) with ESMTP id 6E59E744005; Thu, 1 Nov 2007 11:59:09 +0200 (EET) X-Virus-Scanned: Debian amavisd-new at falcon.cybervisiontech.com Received: from falcon.cybervisiontech.com ([127.0.0.1]) by localhost (falcon.cybervisiontech.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fc5kcZj0BemT; Thu, 1 Nov 2007 11:59:09 +0200 (EET) Received: from [10.2.1.87] (gateway.cybervisiontech.com.ua [88.81.251.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by falcon.cybervisiontech.com (Postfix) with ESMTP id 0A99043D47F; Thu, 1 Nov 2007 11:59:08 +0200 (EET) Message-ID: <4729A36C.50506@icyb.net.ua> Date: Thu, 01 Nov 2007 11:59:08 +0200 From: Andriy Gapon User-Agent:
Thunderbird 2.0.0.6 (X11/20070803) MIME-Version: 1.0 To: freebsd-geom@freebsd.org, Pawel Jakub Dawidek References: <47299CFE.6010309@icyb.net.ua> In-Reply-To: <47299CFE.6010309@icyb.net.ua> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: Re: gjournal for 6.X and fsck X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Nov 2007 09:59:32 -0000 on 01/11/2007 11:31 Andriy Gapon said the following: > It seems that gjournal patch for 6.X doesn't include the changes to make > fsck aware of gjounral that were added to CURRENT code. > Is it possible to produce a patch with such changes ? > > I would appreciate even a patch as it was applied to current, I think it > will be very easy to massage it for 6.X. It's just that unlike svn (or > many other source control systems) it is very hard to extract a "change > set" to multiple files from CVS. > > Thank you! > > P.S. Just thought up of another nice-to-have thing: it seems that the > 6.X patch has the earlier approach with .deleted directory, which was > later improved. It would be nice to get a patch for this as well. And > now I think that the two things are probably quite related. On another thought: maybe MFC gjournal to RELENG_6 in time for 6.3 ? 
I know, this might a bit too much asking :-) -- Andriy Gapon From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 09:04:29 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD26D16A417 for ; Fri, 2 Nov 2007 09:04:29 +0000 (UTC) (envelope-from joe@rootnode.com) Received: from mail.osoft.us (osoft.us [67.14.192.59]) by mx1.freebsd.org (Postfix) with ESMTP id 9F41F13C4A3 for ; Fri, 2 Nov 2007 09:04:29 +0000 (UTC) (envelope-from joe@rootnode.com) Received: from [10.0.2.105] (adsl-65-67-81-98.dsl.ltrkar.swbell.net [65.67.81.98]) by mail.osoft.us (Postfix) with ESMTP id 3F2D833C8A; Thu, 1 Nov 2007 22:21:01 -0600 (CST) Message-ID: <472AA59F.3020103@rootnode.com> Date: Thu, 01 Nov 2007 23:20:47 -0500 From: Joe Koberg User-Agent: Mail/News 1.5.0.8 (Windows/20061104) MIME-Version: 1.0 To: Ulf Lilleengen References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> In-Reply-To: <20071031215756.GB1670@stud.ntnu.no> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marco Haddad , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 09:04:29 -0000 Ulf Lilleengen wrote: > On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: > >> I found in recent researchs that a lot of people say gvinum should not be >> trusted, when it comes to raid5. I began to get worried. Am I alone using >> >> > I'm working on it, and there are definately people still using it. (I've > recieved a number of private mails as well as those seen on this list). IMO, > gvinum can be trusted when it comes to raid5. 
I've not experienced any > corruption-bugs or anything like that with it. > The source of the mistrust may be the fact that few software-only RAID-5 systems can guarantee write consistency across a multi-drive read-update-write cycle in the case of, e.g., power failure. There is no way for the software RAID to force the parallel writes to complete simultaneously on all drives, and from the time the first starts until the last is completed, the array is in an inconsistent (corrupted) state. Dedicated RAID hardware solves this with battery-backed RAM that maintains the array state in a very robust manner. Dedicated controllers also tend to be connected to "better" SCSI or SAS drives that properly report write completion via their command queuing protocol. ZFS tackles this problem by not writing data back in place, with inline checksums of all data and metadata (so that corruption is detectable), and by dynamically-sized "full stripe writes" for every write (no read-update-write cycle required). A solution for gvinum/UFS may be to set the stripe and filesystem block sizes the same, so that a partial stripe is never written and thus no read-update-write cycle occurs. However the use of in-place updates still has the possibility of corrupting data if the write completes on one drive in the array and not the other. The visibility of this "RAID-5 hole" may be very low if you have a well-behaved system (and drives) on a UPS. But since the corruption is silent, you can be stung far down the road if something "bad" does happen without notice. Especially with ATA drives with less robust writeback cache behavior in small-system environments (without backup power, maybe-flaky cabling, etc...). It is important to note that I am describing a universal problem with software RAID-5, and not any shortcoming of gvinum in particular. 
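[To make the read-update-write hazard described above concrete, here is a small hypothetical Python sketch. It is not code from gvinum or any real RAID implementation; it just XORs a parity block from two data blocks, simulates a power failure that lands after the data write but before the matching parity write, and shows that reconstruction from the now-stale parity is silently wrong.]

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A three-"disk" stripe: two data blocks plus their XOR parity.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Parity lets us rebuild either data block if its disk dies.
assert xor_blocks(parity, d1) == d0

# Update d0, but "lose power" before the matching parity write:
d0 = b"CCCC"                       # the data write completed
# parity = xor_blocks(d0, d1)      # ...this write never happened

# If disk 1 now fails, reconstruction from stale parity is silently wrong:
rebuilt_d1 = xor_blocks(parity, d0)
print(rebuilt_d1 == b"BBBB")       # False: the "RAID-5 hole"
```

The checksumming and full-stripe writes mentioned for ZFS close exactly this window, because stale parity is either detected or never produced in the first place.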
Joe Koberg joe at osoft dot us From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 11:13:38 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1BFE16A421 for ; Fri, 2 Nov 2007 11:13:38 +0000 (UTC) (envelope-from anjoel.s@gmail.com) Received: from ro-out-1112.google.com (ro-out-1112.google.com [72.14.202.183]) by mx1.freebsd.org (Postfix) with ESMTP id 6C69213C480 for ; Fri, 2 Nov 2007 11:13:37 +0000 (UTC) (envelope-from anjoel.s@gmail.com) Received: by ro-out-1112.google.com with SMTP id m6so83580roe for ; Fri, 02 Nov 2007 04:13:23 -0700 (PDT) Received: by 10.115.79.1 with SMTP id g1mr991947wal.1193945380880; Thu, 01 Nov 2007 12:29:40 -0700 (PDT) Received: by 10.114.160.7 with HTTP; Thu, 1 Nov 2007 12:29:40 -0700 (PDT) Message-ID: <3a72fe8f0711011229s1d23366ame17ff3f4ee1f65e0@mail.gmail.com> Date: Thu, 1 Nov 2007 17:29:40 -0200 From: "Anderson J. de Souza" To: freebsd-geom@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Bug on md + geli + jail !? X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 11:13:39 -0000 Hello people! I have a good system with jails on a partitioned, encrypted memory disk, initialized from the rc system, but if I do a jail stop [jailname], geli doesn't remove the partition mdX.elia, and I can't use this device again. I tried using rm on the device file, but then, when I restart geli on mdX, it doesn't create mdX.elia.
i try show it with "devfs rule apply path md6.elia hide" but then show 2 devices: like this: crw-r----- 1 root operator 0, 160 Nov 1 09:45 /dev/md6.elia crw-r----- 1 root operator 0, 160 Nov 1 09:45 /dev/md6.elia if I try mount the device the system show: mount: /dev/md6.elia: Device not configured My partition scheme: (BEFORE BUG) /dev/md6.elia on /jails/revproxy (ufs, local, read-only, acls) /dev/md6.elib on /jails/revproxy/var (ufs, local, soft-updates, acls) /dev/md6.elid on /jails/revproxy/tmp (ufs, local, nosuid, soft-updates, acls) /dev/md6.elie on /jails/revproxy/usr (ufs, local, soft-updates, acls) /dev/md6.elia on /jails/revproxy/data (ufs, local, nosuid, soft-updates, acls) TUNEFS AFTER BUG: free05-meta# tunefs -p /dev/md6.elia tunefs: /dev/md6.elia: could not open special device TUNEFS BEFORE BUG: # tunefs -p /dev/md6.elia tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elib tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elid tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: 
(-o) time tunefs: volume label: (-L) # tunefs -p /dev/md6.elie tunefs: ACLs: (-a) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L) My Devices before stop jail: crw-r----- 1 root operator 0, 101 Nov 1 09:45 /dev/md6 crw-r----- 1 root operator 0, 122 Nov 1 09:45 /dev/md6.eli crw-r----- 1 root operator 0, 123 Nov 1 09:45 /dev/md6.elia crw-r----- 1 root operator 0, 124 Nov 1 09:45 /dev/md6.elib crw-r----- 1 root operator 0, 125 Nov 1 09:45 /dev/md6.elic crw-r----- 1 root operator 0, 126 Nov 1 09:45 /dev/md6.elid crw-r----- 1 root operator 0, 127 Nov 1 09:45 /dev/md6.elie MY BSDLABELS: # /dev/md6.eli: 8 partitions: # size offset fstype [fsize bsize bps/cpg] a: 262144 16 4.2BSD 2048 16384 16392 b: 2097152 262160 4.2BSD 2048 16384 28552 c: 6291455 0 unused 0 0 # "raw" part, don't edit d: 524288 2359312 4.2BSD 2048 16384 32776 e: 3407855 2883600 4.2BSD 2048 16384 28552 -- ___________________ Anderson J. 
de Souza - Networking and Security - [ - Professional Consulting - The best firewall - ] http://anjoel.s.googlepages.com - anjoel.s@gmail.com Phone: +55 (54) 9115.13.15 - Sip: 1-747-006-0374 From owner-freebsd-geom@FreeBSD.ORG Fri Nov 2 20:11:40 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3D9416A41A for ; Fri, 2 Nov 2007 20:11:40 +0000 (UTC) (envelope-from pgiessel@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.66]) by mx1.freebsd.org (Postfix) with ESMTP id 8893E13C48A for ; Fri, 2 Nov 2007 20:11:39 +0000 (UTC) (envelope-from pgiessel@mac.com) Received: from webmail012 (webmail012-s [10.13.128.12]) by smtpoutm.mac.com (Xserve/smtpout003/MantshX 4.0) with ESMTP id lA2JcaJp009034; Fri, 2 Nov 2007 12:38:36 -0700 (PDT) Date: Fri, 02 Nov 2007 12:38:36 -0700 From: Peter Giessel To: Joe Koberg Message-ID: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> in-reply-to: <472AA59F.3020103@rootnode.com> references: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Originating-IP: 69.178.5.90 Received: from [69.178.5.90] from webmail.mac.com with HTTP; Fri, 02 Nov 2007 12:38:36 -0700 Cc: Marco Haddad , Ulf Lilleengen , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Nov 2007 20:11:40 -0000 On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" wrote: >Ulf Lilleengen wrote: >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote: >> >>> I found in recent researchs that a lot of people say 
gvinum should not be >>> trusted, when it comes to raid5. I began to get worried. Am I alone using >>> >>> >> I'm working on it, and there are definately people still using it. (I've >> recieved a number of private mails as well as those seen on this list). IMO, >> gvinum can be trusted when it comes to raid5. I've not experienced any >> corruption-bugs or anything like that with it. >> > >The source of the mistrust may be the fact that few software-only RAID-5 >systems can guarantee write consistency across a multi-drive >read-update-write cycle in the case of, e.g., power failure. That may be the true source, but my source of mistrust comes from a few drive failures and gvinum's inability to rebuild the replaced drive. Worked fine under vinum in tests, tried the same thing in gvinum (granted, this was under FreeBSD 5), and the array failed to rebuild. I can't be 100% sure it wasn't a flakey ATA controller and not gvinum's fault, and I no longer have access to the box to play with, but when I was playing with gvinum, replacing a failed drive usually resulted in panics. 
From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 01:33:19 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3E7E16A469 for ; Sat, 3 Nov 2007 01:33:19 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.228]) by mx1.freebsd.org (Postfix) with ESMTP id 957D713C48E for ; Sat, 3 Nov 2007 01:33:18 +0000 (UTC) (envelope-from freebsd-lists@ideo.com.br) Received: by nz-out-0506.google.com with SMTP id l8so727254nzf for ; Fri, 02 Nov 2007 18:32:55 -0700 (PDT) Received: by 10.142.14.20 with SMTP id 20mr721912wfn.1194053574296; Fri, 02 Nov 2007 18:32:54 -0700 (PDT) Received: by 10.142.135.15 with HTTP; Fri, 2 Nov 2007 18:32:54 -0700 (PDT) Message-ID: <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> Date: Fri, 2 Nov 2007 22:32:54 -0300 From: "Marco Haddad" To: "Peter Giessel" In-Reply-To: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> MIME-Version: 1.0 References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-geom@freebsd.org, Ulf Lilleengen , Joe Koberg Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 01:33:20 -0000 Hi, I must say that I had a strong faith on vinum too. I used it on a dozen servers to build raid5 volumes, specially when the hard drives were small and unreliable. 
So I had a few crashes, naturally, but replacing the failed disk was easy
and the rebuild worked every time.

I started using gvinum when I first hit a SCSI controller that vinum did
not support. As gvinum solved vinum's problem with that controller, it
immediately inherited the faith I had in vinum. I kept using gvinum after
that, until my faith was shaken by a hard disk crash: I could not get the
replacement drive added to the raid5 volume. After a lot of head bumping
against the wall, I came up with the following workaround procedure for
replacing a failed disk. I used this procedure just today to replace a SATA
hard disk that I suspect was the cause of an intermittent failure, with
such success that I began to think it isn't so bad... Anyway, I'll describe
a simple example in order to get your comments.

Suppose a simple system with three hard disks, ad0, ad1 and ad2. They were
fdisked and labeled identically. ad0s1a is / and ad0s1d, ad1s1d and ad2s1d
are of the same size and are used by gvinum as drives AD0, AD1 and AD2.
Each drive has only one slice and they are joined in a raid5 plex forming
the volume VOL. The gvinum create script would be the following:

  drive AD0 device /dev/ad0s1d
  drive AD1 device /dev/ad1s1d
  drive AD2 device /dev/ad2s1d
  volume VOL
    plex org raid5 128K
      sd drive AD0
      sd drive AD1
      sd drive AD2

Suppose ad1 crashes and gvinum marks it as down. With the command
"gvinum l" we would get something like this:

  3 drives:
  D AD0          State: up    /dev/ad0s1d ...
  D AD1          State: down  /dev/ad1s1d ...
  D AD2          State: up    /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up    ...

  1 plexes:
  P VOL.p0    R5 State: degraded ...

  3 subdisks:
  S VOL.p0.s0    State: up    D: AD0 ...
  S VOL.p0.s1    State: down  D: AD1 ...
  S VOL.p0.s2    State: up    D: AD2 ...

First thing I do: edit fstab and comment out the line mounting
/dev/gvinum/VOL wherever it was mounted. This is necessary because once the
volume is mounted gvinum refuses most commands, and umount doesn't do the
trick.
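For readers following along, a create script like the one above would be
written to a file and fed to gvinum(8). A minimal sketch — the file name
is my own invention, and newfs/mount of the resulting volume is the usual
follow-up, not part of Marco's text:

```sh
# Hypothetical file name; contents are the create script from the example.
cat > /root/vol.conf <<'EOF'
drive AD0 device /dev/ad0s1d
drive AD1 device /dev/ad1s1d
drive AD2 device /dev/ad2s1d
volume VOL
 plex org raid5 128K
  sd drive AD0
  sd drive AD1
  sd drive AD2
EOF
gvinum create /root/vol.conf   # build drives, volume, plex and subdisks
gvinum l                       # verify everything came up 'up'
newfs /dev/gvinum/VOL          # then newfs and mount as usual
```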
Then I shut down the system, replace the hard disk, and bring it up again.
At this point the first weird thing can be noted. With 'gvinum l' you would
get:

  2 drives:
  D AD0          State: up    /dev/ad0s1d ...
  D AD2          State: up    /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up    ...

  1 plexes:
  P VOL.p0    R5 State: up    ...

  3 subdisks:
  S VOL.p0.s0    State: up    D: AD0 ...
  S VOL.p0.s1    State: up    D: AD1 ...
  S VOL.p0.s2    State: up    D: AD2 ...

What? AD1 is gone, ok, but why is the subdisk VOL.p0.s1 up? And that makes
the plex up instead of degraded. The first time I saw it I got the shivers.

The next step is to fdisk and label the new disk just like the old one. The
new disk can be bigger but, I think, the partition ad1s1d must be the same
size as before. At this point it should be enough to use gvinum create with
a script file containing only the line:

  drive AD1 device /dev/ad1s1d

but gvinum would panic on that, and the system would lock up or core dump.
So something weird must be done: remove all the gvinum objects with
'gvinum rm'. Yes, just to make it clear, in this case the commands would
be:

  gvinum rm -r AD0
  gvinum rm -r AD2
  gvinum rm VOL
  gvinum rm VOL.p0
  gvinum rm VOL.p0.s1

Then we can use 'gvinum create' with the original script to recreate
everything. Now it is all up again, but it isn't quite right yet. The
subdisk VOL.p0.s1 must be marked as stale with:

  gvinum setstate -f stale VOL.p0.s1

This brings the plex back to degraded mode, and we can use:

  gvinum start VOL

to rebuild it. It may take about 1 hour per 100GB of volume space, so we'd
better grab some lunch... The progress can be seen at any time with:

  gvinum ls

After that, a 'fsck -t ufs /dev/gvinum/VOL' will probably catch some errors
left behind when the drive went down. Now we just need to uncomment that
line in fstab and reboot.

I think there's no easier way...
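Pulled together, the workaround above amounts to the following sequence.
This is only a sketch under the example's assumptions (ad0/ad1 identically
sized, vol.conf holding the original create script); the partitioning step
in particular will vary with your layout, and fdisk/bsdlabel invocations
should be checked against their man pages before use:

```sh
# 1. Comment out the /dev/gvinum/VOL line in /etc/fstab, shut down,
#    swap the failed disk.

# 2. Partition and label the new disk like the old one; one way is to
#    copy the label from a surviving, identically-sized disk:
fdisk -BI ad1                               # single slice covering the disk
bsdlabel ad0s1 | bsdlabel -R ad1s1 /dev/stdin

# 3. Tear down and recreate the gvinum objects from the original script:
gvinum rm -r AD0
gvinum rm -r AD2
gvinum rm VOL
gvinum rm VOL.p0
gvinum rm VOL.p0.s1
gvinum create vol.conf

# 4. Force the recreated subdisk stale, then rebuild:
gvinum setstate -f stale VOL.p0.s1
gvinum start VOL
gvinum ls                                   # watch rebuild progress

# 5. Afterwards, check the filesystem and restore fstab:
fsck -t ufs /dev/gvinum/VOL
```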
Regards,
Marco Haddad

On 11/2/07, Peter Giessel wrote:
>
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" <joe@rootnode.com>
> wrote:
> >Ulf Lilleengen wrote:
> >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent research that a lot of people say gvinum should not
> >>> be trusted, when it comes to raid5. I began to get worried. Am I alone
> >>> using
> >>>
> >>>
> >> I'm working on it, and there are definitely people still using it.
> >> (I've received a number of private mails as well as those seen on this
> >> list). IMO, gvinum can be trusted when it comes to raid5. I've not
> >> experienced any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
>
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
>
> Worked fine under vinum in tests, tried the same thing in gvinum
> (granted, this was under FreeBSD 5), and the array failed to rebuild.
>
> I can't be 100% sure it wasn't a flaky ATA controller and not gvinum's
> fault, and I no longer have access to the box to play with, but when I
> was playing with gvinum, replacing a failed drive usually resulted in
> panics.
> From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 02:57:09 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 771F216A418 for ; Sat, 3 Nov 2007 02:57:09 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from signal.itea.ntnu.no (signal.itea.ntnu.no [129.241.190.231]) by mx1.freebsd.org (Postfix) with ESMTP id EF9C113C4B2 for ; Sat, 3 Nov 2007 02:57:08 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by signal.itea.ntnu.no (Postfix) with ESMTP id D613433AD0; Sat, 3 Nov 2007 02:54:18 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by signal.itea.ntnu.no (Postfix) with ESMTP; Sat, 3 Nov 2007 02:54:18 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 3153162421A; Sat, 3 Nov 2007 02:54:35 +0100 (CET) Date: Sat, 3 Nov 2007 02:54:35 +0100 From: Ulf Lilleengen To: Marco Haddad Message-ID: <20071103015435.GB22755@stud.ntnu.no> References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8d4842b50711021832g7ad7cec9x48d2f114b1e41f5f@mail.gmail.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. 
Cc: Joe Koberg , Peter Giessel , freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 02:57:09 -0000

On fre, nov 02, 2007 at 10:32:54 -0300, Marco Haddad wrote:
> Hi,
>
> I must say that I had strong faith in vinum too. I used it on a dozen
> servers to build raid5 volumes, especially when the hard drives were
> small and unreliable. So I had a few crashes naturally, but replacing the
> failed disk was easy and rebuild worked all times.
> [...]
>
> Suppose ad1 crashes and gvinum marks it as down. With the command
> "gvinum l" we would get something like this:
>
> 3 drives:
> D AD0          State: up    /dev/ad0s1d ...
> D AD1          State: down  /dev/ad1s1d ...
> D AD2          State: up    /dev/ad2s1d ...
>
> 1 volumes:
> V VOL          State: up    ...
>
> 1 plexes:
> P VOL.p0    R5 State: degraded ...
>
> 3 subdisks:
> S VOL.p0.s0    State: up    D: AD0 ...
> S VOL.p0.s1    State: down  D: AD1 ...
> S VOL.p0.s2    State: up    D: AD2 ...
>
> First thing I do: edit fstab and comment out the line mounting
> /dev/gvinum/VOL wherever it was mounted. It is necessary because once
> mounted gvinum can not operate most commands, and umount doesn't do the
> trick. Then I shutdown the system and replace the hard disk and bring it
> up again.

True, this was a bit of a pain, also because the geom_vinum module can't
be kldunloaded. This has been fixed in the "experimental" version :)

>
> At this point the first weird thing can be noted. With 'gvinum l' you
> would get:
>
> 2 drives:
> D AD0          State: up    /dev/ad0s1d ...
> D AD2          State: up    /dev/ad2s1d ...
>
> 1 volumes:
> V VOL          State: up    ...
>
> 1 plexes:
> P VOL.p0    R5 State: up    ...
>
> 3 subdisks:
> S VOL.p0.s0    State: up    D: AD0 ...
> S VOL.p0.s1    State: up    D: AD1 ...
> S VOL.p0.s2    State: up    D: AD2 ...
>
> What?
> The AD1 is gone, ok, but why is the subdisk VOL.p0.s1 up? And it
> makes the plex up instead of degraded. The first time I saw it I got the
> shivers.

This is fixed in the new gvinum changes from Summer of Code. The current
version of gvinum is very bad at keeping its configuration data correct,
which causes many of these issues.

>
> Next step is to fdisk and label the new disk just like the old one. The
> new disk can be bigger but, I think, the partition ad1s1d must be the
> same size as before.

Yes, at least the same size or bigger.

> [...]
> to rebuild it. It may take about 1 hour per 100GB of volume space, so we
> better grab some lunch...
>
> The progress can be seen at any time with:
>
> gvinum ls
>
> After that, a 'fsck -t ufs /dev/gvinum/VOL' will probably catch some
> errors left behind when the drive came down.
>
> Now we just need to uncomment that line in fstab and reboot.
>
> I think there's no easier way...

Yes, there is. Replacing a drive in gvinum follows this procedure:

1. Create a config for the new drive and name the drive _differently_ from
   the old one.
2. Use the gvinum 'move' command to move the stale subdisk to the new
   drive.
3. Make sure that the subdisk now points to the new drive and that it's in
   the 'stale' state.
4. Start the plex (gvinum start).

The other issues you encountered have been fixed in my gvinum work this
summer. Also, in the new gvinum, replacing a drive and rebuilding a plex
can happen without unmounting your volume. Brave users can find patches
here:

http://people.freebsd.org/~lulf/patches/gvinum

All testing is very much appreciated.
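Under the assumptions of the earlier three-disk example, the four steps
above might look something like this. The drive name AD1NEW and the file
name newdrive.conf are my own inventions, and the exact 'move' syntax
should be checked against gvinum(8) on your system:

```sh
# 1. Config for the replacement drive, named differently from the old AD1:
echo 'drive AD1NEW device /dev/ad1s1d' > newdrive.conf
gvinum create newdrive.conf

# 2. Move the stale subdisk onto the new drive (-f forces the move,
#    since the subdisk's data is not preserved):
gvinum move -f AD1NEW VOL.p0.s1

# 3. Confirm the subdisk now sits on AD1NEW and is in the 'stale' state:
gvinum ls

# 4. Start the plex to rebuild the subdisk from parity:
gvinum start VOL.p0
```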
-- Ulf Lilleengen From owner-freebsd-geom@FreeBSD.ORG Sat Nov 3 03:46:03 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58D0816A417 for ; Sat, 3 Nov 2007 03:46:03 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from fri.itea.ntnu.no (fri.itea.ntnu.no [129.241.7.60]) by mx1.freebsd.org (Postfix) with ESMTP id 1878B13C4B5 for ; Sat, 3 Nov 2007 03:46:02 +0000 (UTC) (envelope-from lulf@stud.ntnu.no) Received: from localhost (localhost [127.0.0.1]) by fri.itea.ntnu.no (Postfix) with ESMTP id 2AB828C05; Sat, 3 Nov 2007 02:42:50 +0100 (CET) Received: from caracal.stud.ntnu.no (caracal.stud.ntnu.no [129.241.56.185]) by fri.itea.ntnu.no (Postfix) with ESMTP; Sat, 3 Nov 2007 02:42:49 +0100 (CET) Received: by caracal.stud.ntnu.no (Postfix, from userid 2312) id 79D0B62421A; Sat, 3 Nov 2007 02:43:06 +0100 (CET) Date: Sat, 3 Nov 2007 02:43:06 +0100 From: Ulf Lilleengen To: Peter Giessel Message-ID: <20071103014306.GA22755@stud.ntnu.no> References: <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no> <472AA59F.3020103@rootnode.com> <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0001DFFC-0115-1000-9A80-3F81219C1B16-Webmail-10013@mac.com> User-Agent: Mutt/1.5.9i X-Content-Scanned: with sophos and spamassassin at mailgw.ntnu.no. 
Cc: freebsd-geom@freebsd.org Subject: Re: gvinum and raid5 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Nov 2007 03:46:03 -0000

On fre, nov 02, 2007 at 12:38:36 -0700, Peter Giessel wrote:
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" wrote:
> >Ulf Lilleengen wrote:
> >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent research that a lot of people say gvinum should not
> >>> be trusted, when it comes to raid5. I began to get worried. Am I
> >>> alone using
> >>>
> >>>
> >> I'm working on it, and there are definitely people still using it.
> >> (I've received a number of private mails as well as those seen on this
> >> list). IMO, gvinum can be trusted when it comes to raid5. I've not
> >> experienced any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
>
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
>
> Worked fine under vinum in tests, tried the same thing in gvinum
> (granted, this was under FreeBSD 5), and the array failed to rebuild.
>
> I can't be 100% sure it wasn't a flaky ATA controller and not gvinum's
> fault, and I no longer have access to the box to play with, but when I
> was playing with gvinum, replacing a failed drive usually resulted in
> panics.

Well, all I can say is that I've tested this many times with gvinum in
CURRENT/7.x/6.x as well as in my SoC work, and I made updates to the
manpage to give examples of how to do this as well. Also, as for the
software RAID-5 problems...
they are hard to "fix" since gvinum doesn't really know anything about the
consumers. However, it could be interesting to try out different
optimizations, like not reading parity when having a sufficiently large
request, or some sort of write cache until one can issue a large enough
request.

-- 
Ulf Lilleengen