From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 06:02:15 2007 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C3EA16A475; Wed, 25 Jul 2007 06:02:15 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id ED2E213C45D; Wed, 25 Jul 2007 06:02:14 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l6P62E9c065895; Wed, 25 Jul 2007 06:02:14 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l6P62E6G065891; Wed, 25 Jul 2007 06:02:14 GMT (envelope-from remko) Date: Wed, 25 Jul 2007 06:02:14 GMT Message-Id: <200707250602.l6P62E6G065891@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: kern/114847: [ntfs] [patch] dirmask support for NTFS ala MSDOSFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 06:02:15 -0000 Synopsis: [ntfs] [patch] dirmask support for NTFS ala MSDOSFS Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Wed Jul 25 06:02:14 UTC 2007 Responsible-Changed-Why: I think the FS list is a better place for this PR. http://www.freebsd.org/cgi/query-pr.cgi?pr=114847 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 06:02:49 2007 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5400916A469; Wed, 25 Jul 2007 06:02:49 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 32AA113C45E; Wed, 25 Jul 2007 06:02:49 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l6P62n6P065983; Wed, 25 Jul 2007 06:02:49 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l6P62nJl065979; Wed, 25 Jul 2007 06:02:49 GMT (envelope-from remko) Date: Wed, 25 Jul 2007 06:02:49 GMT Message-Id: <200707250602.l6P62nJl065979@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: kern/114856: [ntfs] [patch] Bug in NTFS allows bogus file modes. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 06:02:49 -0000 Synopsis: [ntfs] [patch] Bug in NTFS allows bogus file modes. Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Wed Jul 25 06:02:48 UTC 2007 Responsible-Changed-Why: I think the FS list is a better place for this PR. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=114856 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 06:07:03 2007 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F38C816A417; Wed, 25 Jul 2007 06:07:02 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D227513C442; Wed, 25 Jul 2007 06:07:02 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l6P6727M066259; Wed, 25 Jul 2007 06:07:02 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l6P672K4066255; Wed, 25 Jul 2007 06:07:02 GMT (envelope-from remko) Date: Wed, 25 Jul 2007 06:07:02 GMT Message-Id: <200707250607.l6P672K4066255@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: kern/114676: [ufs] snapshot creation panics: snapacct_ufs2: bad block X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 06:07:03 -0000 Synopsis: [ufs] snapshot creation panics: snapacct_ufs2: bad block Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Wed Jul 25 06:07:01 UTC 2007 Responsible-Changed-Why: Seems more FS related, reassign. http://www.freebsd.org/cgi/query-pr.cgi?pr=114676 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 09:23:49 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DEB1116A4A1 for ; Wed, 25 Jul 2007 09:23:49 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from abbe.salford.ac.uk (abbe.salford.ac.uk [146.87.0.10]) by mx1.freebsd.org (Postfix) with SMTP id 587DF13C4B5 for ; Wed, 25 Jul 2007 09:23:49 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 65884 invoked by uid 98); 25 Jul 2007 10:23:46 +0100 Received: from 146.87.255.121 by abbe.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3762. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. 
Processed in 0.045947 secs); 25 Jul 2007 09:23:46 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by abbe.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Wed, 25 Jul 2007 10:23:46 +0100 Received: (qmail 58773 invoked by uid 1002); 25 Jul 2007 09:23:44 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 25 Jul 2007 09:23:44 -0000 Date: Wed, 25 Jul 2007 10:23:44 +0100 (BST) From: "Mark Powell" To: Pawel Jakub Dawidek In-Reply-To: <20070721065204.GA2044@garage.freebsd.pl> Message-ID: <20070725095723.T57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 09:23:50 -0000 On Sat, 21 Jul 2007, Pawel Jakub Dawidek wrote: Thanks for your reply. > Be sure to turn off debugging, ie. remove WITNESS, INVARIANTS and > INVARIANT_SUPPORT options from your kernel configuration. > Other than that, ZFS may just be more CPU hungry... I have. Makes little difference. Think the idea of using an Athlon XP for ZFS has turned out to be a bridge too far. The new 65nm Athlon 64 x2 are very cheap now. Time for an upgrade. You said that replacing one device with another is not a problem. Just to be clear on this as it's a key factor in me going with this solution. I hope this isn't too naive a question, but the answer will be here for others :) Suppose instead of gconcat I used gstripe on the 250+200 combinations: i.e. (slice 1 on all drives is reserved for ufs gmirror of /boot and block device swap) gs0 ad0s2 ad1s2 gs1 ad2s2 ad3s2 gs2 ad4s2 ad5s2 I use these gstripes and the single 400GB drive to construct the zpool: zpool create tank raidz /dev/mirror/gs0 /dev/mirror/gs1 /dev/mirror/gs2 ad6s2 If for example ad3 fails and thus gs1 fails, how is this replaced in the zpool? e.g. suppose I replace both ad2 and ad3 with a new 500GB drive as ad2. Is fixing this as simple as: zpool replace tank /dev/mirror/gs1 ad2s2 Many thanks. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. 
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 09:30:58 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B776116A417; Wed, 25 Jul 2007 09:30:58 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from itchy.rabson.org (unknown [IPv6:2001:618:400::50b1:e8f2]) by mx1.freebsd.org (Postfix) with ESMTP id 1E88513C46C; Wed, 25 Jul 2007 09:30:57 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [80.177.232.250] (herring.rabson.org [80.177.232.250]) by itchy.rabson.org (8.13.3/8.13.3) with ESMTP id l6P9UmpN005605; Wed, 25 Jul 2007 10:30:48 +0100 (BST) (envelope-from dfr@rabson.org) From: Doug Rabson To: Mark Powell In-Reply-To: <20070725095723.T57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> Content-Type: text/plain Date: Wed, 25 Jul 2007 10:30:48 +0100 Message-Id: <1185355848.3698.7.camel@herring.rabson.org> Mime-Version: 1.0 X-Mailer: Evolution 2.10.2 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.1.0 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on itchy.rabson.org X-Virus-Scanned: ClamAV 0.87.1/3762/Wed Jul 25 06:17:29 2007 on itchy.rabson.org X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 09:30:58 -0000 On Wed, 2007-07-25 at 10:23 +0100, Mark Powell wrote: > On Sat, 21 Jul 2007, Pawel Jakub Dawidek wrote: > > Thanks for your reply. > > > Be sure to turn off debugging, ie. remove WITNESS, INVARIANTS and > > INVARIANT_SUPPORT options from your kernel configuration. > > Other than that, ZFS may just be more CPU hungry... > > I have. Makes little difference. Think the idea of using an Athlon XP > for ZFS has turned out to be a bridge too far. The new 65nm Athlon 64 x2 > are very cheap now. Time for an upgrade. > You said that replacing one device with another is not a problem. Just > to be clear on this as it's a key factor in me going with this solution. I > hope this isn't too naive a question, but the answer will be here for > others :) > Suppose instead of gconcat I used gstripe on the 250+200 combinations: > > i.e. (slice 1 on all drives is reserved for ufs gmirror of /boot and > block device swap) > > gs0 ad0s2 ad1s2 > gs1 ad2s2 ad3s2 > gs2 ad4s2 ad5s2 > > I use these gstripes and the single 400GB drive to construct the zpool: > > zpool create tank raidz /dev/mirror/gs0 /dev/mirror/gs1 /dev/mirror/gs2 ad6s2 > > If for example ad3 fails and thus gs1 fails, how is this replaced in the > zpool? e.g. suppose I replace both ad2 and ad3 with a new 500GB drive as > ad2. Is fixing this as simple as: > > zpool replace tank /dev/mirror/gs1 ad2s2 > > Many thanks. I'm not really sure why you are using gmirror, gconcat or gstripe at all. Surely it would be easier to let ZFS manage the mirroring and concatentation. 
If you do that, ZFS can use its checksums to continually monitor the two sides of your mirrors for consistency and will be able to notice as early as possible when one of the drives goes flakey. For concats, ZFS will also spread redundant copies of metadata (and regular data if you use 'zfs set copies=') across the disks in the compat. If you have to replace one half of a mirror, ZFS has enough information to know exactly which blocks needs to be copied to the new drive which can make recovery much quicker. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 10:13:20 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D245416A418 for ; Wed, 25 Jul 2007 10:13:20 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from abbe.salford.ac.uk (abbe.salford.ac.uk [146.87.0.10]) by mx1.freebsd.org (Postfix) with SMTP id 438CF13C458 for ; Wed, 25 Jul 2007 10:13:20 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 3431 invoked by uid 98); 25 Jul 2007 11:13:19 +0100 Received: from 146.87.255.121 by abbe.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3762. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. Processed in 0.046831 secs); 25 Jul 2007 10:13:19 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by abbe.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Wed, 25 Jul 2007 11:13:19 +0100 Received: (qmail 59922 invoked by uid 1002); 25 Jul 2007 10:13:16 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 25 Jul 2007 10:13:16 -0000 Date: Wed, 25 Jul 2007 11:13:16 +0100 (BST) From: "Mark Powell" To: Doug Rabson In-Reply-To: <1185355848.3698.7.camel@herring.rabson.org> Message-ID: <20070725103746.N57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 10:13:20 -0000 On Wed, 25 Jul 2007, Doug Rabson wrote: > I'm not really sure why you are using gmirror, gconcat or gstripe at > all. Surely it would be easier to let ZFS manage the mirroring and > concatentation. If you do that, ZFS can use its checksums to continually > monitor the two sides of your mirrors for consistency and will be able > to notice as early as possible when one of the drives goes flakey. For > concats, ZFS will also spread redundant copies of metadata (and regular > data if you use 'zfs set copies=') across the disks in the compat. If > you have to replace one half of a mirror, ZFS has enough information to > know exactly which blocks needs to be copied to the new drive which can > make recovery much quicker. gmirror is only going to used for the ufs /boot parition and block device swap. (I'll ignore the smallish space used by that below.) I thought gstripe was a solution cos I mentioned in the original post that I have the following drives to play with; 1x400GB, 3x250GB, 3x200GB. 
If I make a straight zpool with all those drives I get a total usable 7x200GB raidz with only an effective 6x200GB=1200GB of usable storage. Also a 7 device raidz cries out for being a raidz2? That's a further 200GB of storage lost. My original plan was (because of the largest drive being a single 400GB) was to gconcat (now to gstripe) the smaller drives into 3 pairs of 250GB+200GB, making three new 450GB devices. This would make a zpool of 4 devices i.e. 1x400GB+3x450GB giving effective storage of 1200GB. Yes, it's the same as above (as long as raidz2 is not used there), but I was thinking about future expansion... The advantge this approach seems to give is that when drives fail each device (which is either a single drive or a gstripe pair) can be replaced with a modern larger drive (500GB or 750GB depending on what's economical at the time). Once that replacement has been performed only 4 times, the zpool will increase in size (actually it will increase straight away by 4x50GB total if the 400GB drive fails 1st). In addition, once a couple of drives in a pair have failed and are replaced by a single large drive, there will also be smaller 250GB or 200GB drives spare which can be further added to the zpool as a zfs mirror. The alternative of using a zpool of 7 individual drives means that I need to replace many more drives to actually see an increase in zpool size. Yes, there a large number of combinations here, but it seems that the zpool will increase in size sooner this way? I believe my reasoning is correct here? Let me know if your experience would suggest otherwise. Many thanks. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 11:17:26 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08D3416A421 for ; Wed, 25 Jul 2007 11:17:26 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from akis.salford.ac.uk (akis.salford.ac.uk [146.87.0.14]) by mx1.freebsd.org (Postfix) with SMTP id 6BA4213C4A6 for ; Wed, 25 Jul 2007 11:17:24 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 32712 invoked by uid 98); 25 Jul 2007 12:17:23 +0100 Received: from 146.87.255.121 by akis.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3762. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. 
Processed in 0.0415 secs); 25 Jul 2007 11:17:23 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by akis.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Wed, 25 Jul 2007 12:17:23 +0100 Received: (qmail 60504 invoked by uid 1002); 25 Jul 2007 11:17:21 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 25 Jul 2007 11:17:21 -0000 Date: Wed, 25 Jul 2007 12:17:21 +0100 (BST) From: "Mark Powell" To: Doug Rabson In-Reply-To: <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> Message-ID: <20070725120913.A57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk> <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 11:17:26 -0000 On Wed, 25 Jul 2007, Doug Rabson wrote: >> gmirror is only going to used for the ufs /boot parition and block device >> swap. (I'll ignore the smallish space used by that below.) > > Just to muddy the waters a little - I'm working on ZFS native boot code at > the moment. It probably won't ship with 7.0 but should be available shortly > after. Great work. That will be zfs mirror only right? >> I believe my reasoning is correct here? Let me know if your experience >> would suggest otherwise. > > Your reasoning sounds fine now that I have the bigger picture in my head. I > don't have a lot of experience here - for my ZFS testing, I just bought a > couple of cheap 300GB drives which I'm using as a simple mirror. From what I > have read, mirrors and raidz2 are roughly equivalent in 'mean time to data > loss' terms with raidz1 quite a bit less safe due to the extra vulnerability > window between a drive failure and replacement. So back to my original question :) If one drive in a gconcat gc1 (ad2s2+ad3s2), say ad3 fails, and the broken gconcat is completely replaced with a new 500GB drive ad2, is fixing that as simple as: zpool replace tank gc1 ad2 Many thanks. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. 
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 11:23:15 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D518916A418 for ; Wed, 25 Jul 2007 11:23:15 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from mail.qubesoft.com (gate.qubesoft.com [217.169.36.34]) by mx1.freebsd.org (Postfix) with ESMTP id 646C913C459 for ; Wed, 25 Jul 2007 11:23:15 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [10.201.19.245] (doug02.dyn.qubesoft.com [10.201.19.245]) by mail.qubesoft.com (8.13.3/8.13.3) with ESMTP id l6PB1u5e002918; Wed, 25 Jul 2007 12:02:00 +0100 (BST) (envelope-from dfr@rabson.org) In-Reply-To: <20070725103746.N57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk> Mime-Version: 1.0 (Apple Message framework v752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> Content-Transfer-Encoding: 7bit From: Doug Rabson Date: Wed, 25 Jul 2007 12:01:53 +0100 To: Mark Powell X-Mailer: Apple Mail (2.752.2) X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.0.4 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.qubesoft.com X-Virus-Scanned: ClamAV 0.86.2/3762/Wed Jul 25 06:17:29 2007 on mail.qubesoft.com X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 11:23:15 -0000 On 25 Jul 2007, at 11:13, Mark Powell wrote: > On Wed, 25 Jul 2007, Doug Rabson wrote: > >> I'm not really sure why you are using gmirror, gconcat or gstripe at >> all. Surely it would be easier to let ZFS manage the mirroring and >> concatentation. If you do that, ZFS can use its checksums to >> continually >> monitor the two sides of your mirrors for consistency and will be >> able >> to notice as early as possible when one of the drives goes flakey. >> For >> concats, ZFS will also spread redundant copies of metadata (and >> regular >> data if you use 'zfs set copies=') across the disks in the >> compat. If >> you have to replace one half of a mirror, ZFS has enough >> information to >> know exactly which blocks needs to be copied to the new drive >> which can >> make recovery much quicker. > > gmirror is only going to used for the ufs /boot parition and block > device swap. (I'll ignore the smallish space used by that below.) Just to muddy the waters a little - I'm working on ZFS native boot code at the moment. It probably won't ship with 7.0 but should be available shortly after. > I thought gstripe was a solution cos I mentioned in the original > post that I have the following drives to play with; 1x400GB, > 3x250GB, 3x200GB. > If I make a straight zpool with all those drives I get a total > usable 7x200GB raidz with only an effective 6x200GB=1200GB of > usable storage. Also a 7 device raidz cries out for being a raidz2? > That's a further 200GB of storage lost. 
> My original plan was (because of the largest drive being a single > 400GB) was to gconcat (now to gstripe) the smaller drives into 3 > pairs of 250GB+200GB, making three new 450GB devices. This would > make a zpool of 4 devices i.e. 1x400GB+3x450GB giving effective > storage of 1200GB. Yes, it's the same as above (as long as raidz2 > is not used there), but I was thinking about future expansion... > The advantge this approach seems to give is that when drives fail > each device (which is either a single drive or a gstripe pair) can > be replaced with a modern larger drive (500GB or 750GB depending on > what's economical at the time). > Once that replacement has been performed only 4 times, the zpool > will increase in size (actually it will increase straight away by > 4x50GB total if the 400GB drive fails 1st). > In addition, once a couple of drives in a pair have failed and > are replaced by a single large drive, there will also be smaller > 250GB or 200GB drives spare which can be further added to the zpool > as a zfs mirror. > The alternative of using a zpool of 7 individual drives means > that I need to replace many more drives to actually see an increase > in zpool size. > Yes, there a large number of combinations here, but it seems that > the zpool will increase in size sooner this way? > I believe my reasoning is correct here? Let me know if your > experience would suggest otherwise. > Many thanks. > Your reasoning sounds fine now that I have the bigger picture in my head. I don't have a lot of experience here - for my ZFS testing, I just bought a couple of cheap 300GB drives which I'm using as a simple mirror. From what I have read, mirrors and raidz2 are roughly equivalent in 'mean time to data loss' terms with raidz1 quite a bit less safe due to the extra vulnerability window between a drive failure and replacement. 
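A rough sketch of the layout under discussion, using the hypothetical device names from earlier in the thread (ad0-ad6, with slice 2 of each drive given to ZFS); the commands are ordinary gconcat(8)/zpool(8) usage, but the names and sizes are illustrative rather than taken from a real system:

    # pair each 250GB drive with a 200GB drive to make three ~450GB devices
    gconcat label -v gc0 /dev/ad0s2 /dev/ad1s2
    gconcat label -v gc1 /dev/ad2s2 /dev/ad3s2
    gconcat label -v gc2 /dev/ad4s2 /dev/ad5s2

    # 4-device raidz: the three concats plus the single 400GB drive
    zpool create tank raidz /dev/concat/gc0 /dev/concat/gc1 /dev/concat/gc2 /dev/ad6s2

    # usable space is roughly (members - 1) x smallest member:
    #   (4 - 1) x ~400GB = ~1200GB, matching the figure quoted above

One caveat if gstripe is used instead of gconcat for the pairs: a stripe of a 250GB and a 200GB slice only yields about 2 x 200GB, since striping is limited by the smallest member, whereas a concat preserves the full 450GB.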
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 12:53:57 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84E3B16A418 for ; Wed, 25 Jul 2007 12:53:57 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from mail.qubesoft.com (gate.qubesoft.com [217.169.36.34]) by mx1.freebsd.org (Postfix) with ESMTP id 19D3713C45B for ; Wed, 25 Jul 2007 12:53:56 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [10.201.19.245] (doug02.dyn.qubesoft.com [10.201.19.245]) by mail.qubesoft.com (8.13.3/8.13.3) with ESMTP id l6PCrnAC007158; Wed, 25 Jul 2007 13:53:53 +0100 (BST) (envelope-from dfr@rabson.org) In-Reply-To: <20070725120913.A57231@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk> <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> <20070725120913.A57231@rust.salford.ac.uk> Mime-Version: 1.0 (Apple Message framework v752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <6FF8729F-B449-4EFA-B3C6-8B9A9E6F6C4F@rabson.org> Content-Transfer-Encoding: 7bit From: Doug Rabson Date: Wed, 25 Jul 2007 13:53:46 +0100 To: Mark Powell X-Mailer: Apple Mail (2.752.2) X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.0.4 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.qubesoft.com X-Virus-Scanned: ClamAV 0.86.2/3762/Wed Jul 25 06:17:29 2007 on mail.qubesoft.com X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 12:53:57 -0000 On 25 Jul 2007, at 12:17, Mark Powell wrote: > On Wed, 25 Jul 2007, Doug Rabson wrote: > >>> gmirror is only going to used for the ufs /boot parition and >>> block device swap. (I'll ignore the smallish space used by that >>> below.) >> >> Just to muddy the waters a little - I'm working on ZFS native boot >> code at the moment. It probably won't ship with 7.0 but should be >> available shortly after. > > Great work. That will be zfs mirror only right? The code is close to being able to support collections of mirrors. No raidz or raidz2 for now though. > >>> I believe my reasoning is correct here? Let me know if your >>> experience would suggest otherwise. >> >> Your reasoning sounds fine now that I have the bigger picture in >> my head. I don't have a lot of experience here - for my ZFS >> testing, I just bought a couple of cheap 300GB drives which I'm >> using as a simple mirror. From what I have read, mirrors and >> raidz2 are roughly equivalent in 'mean time to data loss' terms >> with raidz1 quite a bit less safe due to the extra vulnerability >> window between a drive failure and replacement. > > So back to my original question :) > If one drive in a gconcat gc1 (ad2s2+ad3s2), say ad3 fails, and > the broken gconcat is completely replaced with a new 500GB drive > ad2, is fixing that as simple as: > > zpool replace tank gc1 ad2 That sounds right. 
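A minimal sketch of the replacement sequence just confirmed, using the same hypothetical names (concat gc1 built from ad2s2+ad3s2, the new 500GB drive appearing as ad2 and sliced the same way, with s2 going to the pool); the exact vdev name to pass is whatever zpool status reports for the failed member, so treat the paths as illustrative:

    # identify the faulted vdev
    zpool status tank

    # resilver onto the new drive: zpool replace <pool> <old vdev> <new device>
    zpool replace tank concat/gc1 ad2s2

    # watch resilvering progress and the final state
    zpool status tank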
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 16:22:23 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73F9716A41B for ; Wed, 25 Jul 2007 16:22:23 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from akis.salford.ac.uk (akis.salford.ac.uk [146.87.0.14]) by mx1.freebsd.org (Postfix) with SMTP id D8A4313C483 for ; Wed, 25 Jul 2007 16:22:22 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 4213 invoked by uid 98); 25 Jul 2007 17:22:21 +0100 Received: from 146.87.255.121 by akis.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3763. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. Processed in 0.070154 secs); 25 Jul 2007 16:22:21 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by akis.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Wed, 25 Jul 2007 17:22:20 +0100 Received: (qmail 62424 invoked by uid 1002); 25 Jul 2007 16:20:22 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 25 Jul 2007 16:20:22 -0000 Date: Wed, 25 Jul 2007 17:20:22 +0100 (BST) From: "Mark Powell" To: Doug Rabson In-Reply-To: <6FF8729F-B449-4EFA-B3C6-8B9A9E6F6C4F@rabson.org> Message-ID: <20070725171343.M61339@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk> <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> <20070725120913.A57231@rust.salford.ac.uk> <6FF8729F-B449-4EFA-B3C6-8B9A9E6F6C4F@rabson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 16:22:23 -0000 On Wed, 25 Jul 2007, Doug Rabson wrote: > On 25 Jul 2007, at 12:17, Mark Powell wrote: >> Great work. That will be zfs mirror only right? > > The code is close to being able to support collections of mirrors. No raidz > or raidz2 for now though. That's great news. So that would mean, if a raidz vdev was required on a system another pool would have to be created with only a mirror vdev in it, to have / on zfs too? Considering the work involved, is raidz / support really worth it? Of course, it's fantastic if you plan to tackle it, but I don't envy you the task :( >> So back to my original question :) >> If one drive in a gconcat gc1 (ad2s2+ad3s2), say ad3 fails, and the broken >> gconcat is completely replaced with a new 500GB drive ad2, is fixing that >> as simple as: >> >> zpool replace tank gc1 ad2 > > That sounds right. Thanks for the info. It's good to know how to fix an array before it's created :) Cheers. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. 
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 16:39:33 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55B7C16A421 for ; Wed, 25 Jul 2007 16:39:33 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from mail.qubesoft.com (gate.qubesoft.com [217.169.36.34]) by mx1.freebsd.org (Postfix) with ESMTP id CA3DD13C4A3 for ; Wed, 25 Jul 2007 16:39:32 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [10.201.19.245] (doug02.dyn.qubesoft.com [10.201.19.245]) by mail.qubesoft.com (8.13.3/8.13.3) with ESMTP id l6PGdMX1015863; Wed, 25 Jul 2007 17:39:31 +0100 (BST) (envelope-from dfr@rabson.org) In-Reply-To: <20070725171343.M61339@rust.salford.ac.uk> References: <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk> <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org> <20070725120913.A57231@rust.salford.ac.uk> <6FF8729F-B449-4EFA-B3C6-8B9A9E6F6C4F@rabson.org> <20070725171343.M61339@rust.salford.ac.uk> Mime-Version: 1.0 (Apple Message framework v752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <77814562-8B5E-4E3C-9018-59F7E8FBF8C8@rabson.org> Content-Transfer-Encoding: 7bit From: Doug Rabson Date: Wed, 25 Jul 2007 17:39:22 +0100 To: Mark Powell X-Mailer: Apple Mail (2.752.2) X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.0.4 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.qubesoft.com X-Virus-Scanned: ClamAV 0.86.2/3763/Wed Jul 25 16:37:41 2007 on mail.qubesoft.com X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 16:39:33 -0000 On 25 Jul 2007, at 17:20, Mark Powell wrote: > On Wed, 25 Jul 2007, Doug Rabson wrote: > >> On 25 Jul 2007, at 12:17, Mark Powell wrote: >>> Great work. That will be zfs mirror only right? >> >> The code is close to being able to support collections of mirrors. >> No raidz or raidz2 for now though. > > That's great news. > So that would mean, if a raidz vdev was required on a system > another pool would have to be created with only a mirror vdev in > it, to have / on zfs too? > Considering the work involved, is raidz / support really worth > it? Of course, it's fantastic if you plan to tackle it, but I don't > envy you the task :( In theory supporting raidz isn't that hard although the layout policy is undocumented. I've looked at the code and I could probably borrow some code from the 'real' zfs to figure out the layout and support non-degraded raidz and raidz2. Supported degraded configurations is more effort because of the extra code to re-generate the date from the parity. The biggest problem here is space. The wretched PC platform requires us to bootstrap the system starting from a single sector's worth of code (512 bytes). That code runs in stone-age 16bit mode and loads the second stage from a fixed disk location. 
To keep my sanity, I'm currently trying to limit the code size of the second stage to 16k. This second stage has to understand ZFS well enough to load the third stage /boot/loader code from the pool. I currently have exactly 171 bytes of free space in boot2. I could probably squeeze another 4k into the second stage bootstrap by re-writing boot1 again. I will probably have to do that to support collections of disks/mirrors anyway. Doing that will mean permanently giving up the idea of booting ZFS on systems that don't support LBA addressing. Tthis already disabled in my boot1 code but could be resurrected after some hair pulling - increasing the size of boot2 would make supporting legacy (>10hys old) BIOS machines impossible. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 16:58:54 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8EFF16A421 for ; Wed, 25 Jul 2007 16:58:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from phoenix.cs.uoguelph.ca (phoenix.cs.uoguelph.ca [131.104.94.216]) by mx1.freebsd.org (Postfix) with ESMTP id 881B313C457 for ; Wed, 25 Jul 2007 16:58:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.96.170]) by phoenix.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id l6PGwrcU014063 for ; Wed, 25 Jul 2007 12:58:53 -0400 Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id l6PH3Jw02212 for ; Wed, 25 Jul 2007 13:03:19 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 25 Jul 2007 13:03:19 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher To: freebsd-fs@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.57 on 131.104.94.216 Subject: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 16:58:54 -0000 I have been thinking about what to do on a client when an NFS server is unresponsive and thought I'd email to see what others thought? "intr mounts" - These don't work correctly and it is nearly impossible to make them work correctly. The problem is that, often, the process which has a termination signal posted against it is blocked waiting for some resource (vnode lock, buffer cache block,...) that another process that is waiting for an RPC reply from the unresponsive server, holds. Also, for NFSv4, a client can't just forget about an RPC that alters state on the server. If it does so, the RPC may have been performed on the server and the client's view of state might become inconsistent with the server's view. (As such, I feel this should be "deprecated or disabled". I don't like things that "sorta work", but I can understand why some might feel that it should remain for NFSv2,3.) "soft mounts" - These have the problem that system calls may terminate abnormally when all you have is a slow, heavily loaded server. As such, they might be ok for read-only mounts using NFSv2,3, but seem too dangerous for anything else. (Very few apps. expect an I/O system call to fail with ETIMEDOUT.) So, about all I can think to do is make "umount -f" work properly. 
Since it terminates all outstanding RPCs on the mount point (and gets rid of all state for NFSv4), this can be made to work well. (Mac OS X does this.) A problem with this is that it can only be done by someone with system priviledge. However, it seems to me that most systems are either personal (laptops or desktops) where the person has system priviledge OR systems running as servers in machine room environments. The latter usually have sysadmin monitoring and also tend to talk to NFS servers where connectivity seldom goes away. As such, needing system priviledge doesn't seem too serious an issue to me. Any other thoughts? rick From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 17:17:51 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B580716A41A for ; Wed, 25 Jul 2007 17:17:51 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from ns.trinitel.com (186.161.36.72.static.reverse.layeredtech.com [72.36.161.186]) by mx1.freebsd.org (Postfix) with ESMTP id 8449213C45D for ; Wed, 25 Jul 2007 17:17:48 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from proton.local (209-163-168-124.static.twtelecom.net [209.163.168.124]) (authenticated bits=0) by ns.trinitel.com (8.14.1/8.14.1) with ESMTP id l6PHHmxn033498; Wed, 25 Jul 2007 12:17:48 -0500 (CDT) (envelope-from anderson@freebsd.org) Message-ID: <46A785BC.8030602@freebsd.org> Date: Wed, 25 Jul 2007 12:17:48 -0500 From: Eric Anderson User-Agent: Thunderbird 2.0.0.5 (Macintosh/20070716) MIME-Version: 1.0 To: Rick Macklem References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham version=3.1.8 X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on ns.trinitel.com Cc: freebsd-fs@freebsd.org Subject: Re: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 17:17:51 -0000 Rick Macklem wrote: > I have been thinking about what to do on a client when an NFS server is > unresponsive and thought I'd email to see what others thought? > > "intr mounts" - These don't work correctly and it is nearly impossible > to make them work correctly. The problem is that, often, the > process which has a termination signal posted against it is blocked > waiting for some resource (vnode lock, buffer cache block,...) that > another process that is waiting for an RPC reply from the > unresponsive server, holds. Also, for NFSv4, a client can't just > forget about an RPC that alters state on the server. If it does > so, the RPC may have been performed on the server and the client's > view of state might become inconsistent with the server's view. > (As such, I feel this should be "deprecated or disabled". I don't > like things that "sorta work", but I can understand why some might > feel that it should remain for NFSv2,3.) > > "soft mounts" - These have the problem that system calls may terminate > abnormally when all you have is a slow, heavily loaded server. > As such, they might be ok for read-only mounts using NFSv2,3, > but seem too dangerous for anything else. (Very few apps. expect > an I/O system call to fail with ETIMEDOUT.) > > So, about all I can think to do is make "umount -f" work properly. 
Since > it terminates all outstanding RPCs on the mount point (and gets rid of all > state for NFSv4), this can be made to work well. (Mac OS X does this.) > A problem with this is that it can only be done by someone with system > priviledge. However, it seems to me that most systems are either > personal (laptops or desktops) where the person has system priviledge OR > systems running as servers in machine room environments. The latter > usually have sysadmin monitoring and also tend to talk to NFS servers where > connectivity seldom goes away. As such, needing system priviledge > doesn't seem too serious an issue to me. > > Any other thoughts? rick I agree with you 100%. In datacenters that I have run, umount -f should always work (in my opinion), and should be a superuser privilege. If I am root, and I say 'umount -f ...' - just do it. NFS servers do go away, and sometimes you *have* to umount -f. In linux, you can make that happen (mostly), but FreeBSD doesn't like it much. Anyone who runs with soft mounts, or intr mounts, should be prepared for inconsistent data, or broken apps when there are NFS issues. Typically I expect my hard mounts (non-interruptable) to stick, and applications to block, until the mount comes back. If I need to remove the mount though, I want to be able to do it with a umount -f command, and have it 'just work', since I know the consequences. Eric From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 17:37:26 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8182416A418 for ; Wed, 25 Jul 2007 17:37:26 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by mx1.freebsd.org (Postfix) with ESMTP id 5E5F313C465 for ; Wed, 25 Jul 2007 17:37:26 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (dumaguete.citi.umich.edu [141.211.133.51]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "Jim Rees", Issuer "CITI Production KCA" (verified OK)) by citi.umich.edu (Postfix) with ESMTP id 7AAA047C2; Wed, 25 Jul 2007 13:12:15 -0400 (EDT) Date: Wed, 25 Jul 2007 13:12:14 -0400 From: Jim Rees To: Rick Macklem Message-ID: <20070725171214.GC25749@citi.umich.edu> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: freebsd-fs@freebsd.org Subject: Re: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 17:37:26 -0000 Afs has the same problem, and solves it by marking a server "down" when it doesn't respond. The timeout is very long, like a minute or more. Normally this would permanently hang the client, but once the server is marked down, any subsequent operations fail immediately. The client checks periodically to see if the server has come back up. Failing this way is better than waiting forever, because waiting forever results in a reboot when the machine's owner runs out of patience. And by all means, do fix umount -f. 
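For readers wanting the concrete knobs discussed above, this is roughly how the choices look on a FreeBSD client; the server and mount point names are made up, and whether a forced unmount actually completes against a truly wedged server is exactly the gap Rick describes:

    # default hard mount: processes block until the server responds again
    mount -t nfs server:/export /mnt/data

    # 'intr' lets signals interrupt blocked operations; 'soft' lets RPCs time out
    # (both with the caveats spelled out earlier in the thread)
    mount -t nfs -o intr server:/export /mnt/data
    mount -t nfs -o soft server:/export /mnt/data

    # forced unmount of an unresponsive mount (root only)
    umount -f /mnt/data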
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 18:13:36 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D44E716A419; Wed, 25 Jul 2007 18:13:36 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from mail.bitblocks.com (mail.bitblocks.com [64.142.15.60]) by mx1.freebsd.org (Postfix) with ESMTP id B83E213C457; Wed, 25 Jul 2007 18:13:36 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1]) by mail.bitblocks.com (Postfix) with ESMTP id 9F47E5B3B; Wed, 25 Jul 2007 10:47:15 -0700 (PDT) To: Doug Rabson In-reply-to: Your message of "Wed, 25 Jul 2007 10:30:48 BST." <1185355848.3698.7.camel@herring.rabson.org> Date: Wed, 25 Jul 2007 10:47:15 -0700 From: Bakul Shah Message-Id: <20070725174715.9F47E5B3B@mail.bitblocks.com> Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek , Mark Powell Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 18:13:36 -0000 > If you do that, ZFS can use its checksums to continually > monitor the two sides of your mirrors for consistency and will be able > to notice as early as possible when one of the drives goes flakey. Does it really do this? As I understood it, only one of the disks in a mirror will be read for a given block. If the checksum fails, the same block from the other disk is read and checksummed. If all the disks in a mirror are read for every block, ZFS read performance would get somewhat worse instead of linear scaling up with more disks in a mirror. In order to monitor data on both disks one would need to periodically run "zpool scrub", no? But that is not *continuous* monitoring of the two sides. 
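Since the idle side of a mirror is only verified when its blocks happen to be read, the usual complement is a periodic scrub; a small sketch, assuming a pool called tank and an arbitrary cron schedule:

    # walk every allocated block, verify checksums, repair from the redundant copy
    zpool scrub tank

    # report progress and any checksum errors that were found and repaired
    zpool status -v tank

    # example root crontab entry: scrub early every Sunday morning
    # 0 3 * * 0 /sbin/zpool scrub tank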
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 18:21:10 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2FAEB16A418 for ; Wed, 25 Jul 2007 18:21:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from gigi.cs.uoguelph.ca (gigi.cs.uoguelph.ca [131.104.94.210]) by mx1.freebsd.org (Postfix) with ESMTP id C8FE713C474 for ; Wed, 25 Jul 2007 18:21:09 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.96.170]) by gigi.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id l6PIL67f025911; Wed, 25 Jul 2007 14:21:06 -0400 Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id l6PIPYQ14848; Wed, 25 Jul 2007 14:25:34 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 25 Jul 2007 14:25:34 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher To: Jim Rees In-Reply-To: <20070725171214.GC25749@citi.umich.edu> Message-ID: References: <20070725171214.GC25749@citi.umich.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.57 on 131.104.94.210 Cc: freebsd-fs@freebsd.org Subject: Re: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 18:21:10 -0000 On Wed, 25 Jul 2007, Jim Rees wrote: > Afs has the same problem, and solves it by marking a server "down" when it > doesn't respond. The timeout is very long, like a minute or more. Normally > this would permanently hang the client, but once the server is marked down, > any subsequent operations fail immediately. The client checks periodically > to see if the server has come back up. Failing this way is better than > waiting forever, because waiting forever results in a reboot when the > machine's owner runs out of patience. Linux has something called a "lazy" umount, which I think is similar to the above, except that it is invoked by a sysadmin instead of a timeout (and doesn't come back, just umounts when the RPCs finally happen). I didn't see much use in it, but I can see that setting a mount point "not working for now" might be useful. > > And by all means, do fix umount -f. 
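(For comparison, the Linux facility Rick refers to is the -l flag to umount: "umount -l /mnt/data" detaches the mount point immediately and finishes the cleanup once outstanding references drain, whereas the proposal here is to make FreeBSD's "umount -f" reliably terminate the outstanding RPCs instead.)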
> From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 18:37:17 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D235416A417 for ; Wed, 25 Jul 2007 18:37:17 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from ns.trinitel.com (186.161.36.72.static.reverse.layeredtech.com [72.36.161.186]) by mx1.freebsd.org (Postfix) with ESMTP id 1A21913C442 for ; Wed, 25 Jul 2007 18:37:17 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from proton.local (209-163-168-124.static.twtelecom.net [209.163.168.124]) (authenticated bits=0) by ns.trinitel.com (8.14.1/8.14.1) with ESMTP id l6PIbGta046864; Wed, 25 Jul 2007 13:37:16 -0500 (CDT) (envelope-from anderson@freebsd.org) Message-ID: <46A7985C.3010202@freebsd.org> Date: Wed, 25 Jul 2007 13:37:16 -0500 From: Eric Anderson User-Agent: Thunderbird 2.0.0.5 (Macintosh/20070716) MIME-Version: 1.0 To: Jim Rees References: <20070725171214.GC25749@citi.umich.edu> In-Reply-To: <20070725171214.GC25749@citi.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham version=3.1.8 X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on ns.trinitel.com Cc: freebsd-fs@freebsd.org Subject: Re: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 18:37:17 -0000 Jim Rees wrote: > Afs has the same problem, and solves it by marking a server "down" when it > doesn't respond. The timeout is very long, like a minute or more. Normally > this would permanently hang the client, but once the server is marked down, > any subsequent operations fail immediately. The client checks periodically > to see if the server has come back up. Failing this way is better than > waiting forever, because waiting forever results in a reboot when the > machine's owner runs out of patience. For 'fail immediately', what does that mean? It returns EIO? That might be sufficient, although I think 1min is pretty low for NFS. Of course, if it's settable, then that's good. 
:) Eric From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 18:41:26 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4576916A418 for ; Wed, 25 Jul 2007 18:41:26 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (citi.umich.edu [141.211.133.111]) by mx1.freebsd.org (Postfix) with ESMTP id 1F7B013C442 for ; Wed, 25 Jul 2007 18:41:18 +0000 (UTC) (envelope-from rees@citi.umich.edu) Received: from citi.umich.edu (dsl093-001-248.det1.dsl.speakeasy.net [66.93.1.248]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "Jim Rees", Issuer "CITI Production KCA" (verified OK)) by citi.umich.edu (Postfix) with ESMTP id ECA3A47E9; Wed, 25 Jul 2007 14:41:17 -0400 (EDT) Date: Wed, 25 Jul 2007 14:41:15 -0400 From: Jim Rees To: Eric Anderson Message-ID: <20070725184114.GA12728@citi.umich.edu> References: <20070725171214.GC25749@citi.umich.edu> <46A7985C.3010202@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <46A7985C.3010202@freebsd.org> Cc: freebsd-fs@freebsd.org Subject: Re: handling unresonsive NFS servers X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 18:41:26 -0000 Eric Anderson wrote: For 'fail immediately', what does that mean? It returns EIO? I don't know. Personally I like EHOSTDOWN. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 18:57:42 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB8DF16A41F; Wed, 25 Jul 2007 18:57:42 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from itchy.rabson.org (unknown [IPv6:2001:618:400::50b1:e8f2]) by mx1.freebsd.org (Postfix) with ESMTP id 2FDDA13C483; Wed, 25 Jul 2007 18:57:42 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [80.177.232.250] (herring.rabson.org [80.177.232.250]) by itchy.rabson.org (8.13.3/8.13.3) with ESMTP id l6PIva0C009137; Wed, 25 Jul 2007 19:57:38 +0100 (BST) (envelope-from dfr@rabson.org) From: Doug Rabson To: Bakul Shah In-Reply-To: <20070725174715.9F47E5B3B@mail.bitblocks.com> References: <20070725174715.9F47E5B3B@mail.bitblocks.com> Content-Type: text/plain Date: Wed, 25 Jul 2007 19:57:36 +0100 Message-Id: <1185389856.3698.11.camel@herring.rabson.org> Mime-Version: 1.0 X-Mailer: Evolution 2.10.2 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.1.0 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on itchy.rabson.org X-Virus-Scanned: ClamAV 0.87.1/3763/Wed Jul 25 16:37:41 2007 on itchy.rabson.org X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek , Mark Powell Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 18:57:42 -0000 On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote: > > If you do that, ZFS can use its checksums to continually > > monitor the two sides of your mirrors for consistency and will be able > > to notice as early as possible when one of the drives goes flakey. 
> > Does it really do this? As I understood it, only one of the > disks in a mirror will be read for a given block. If the > checksum fails, the same block from the other disk is read > and checksummed. If all the disks in a mirror are read for > every block, ZFS read performance would get somewhat worse > instead of linear scaling up with more disks in a mirror. In > order to monitor data on both disks one would need to > periodically run "zpool scrub", no? But that is not > *continuous* monitoring of the two sides. This is of course correct. I should have said "continuously checks the data which you are actually looking at on a regular basis". The consistency check is via the block checksum (not comparing the data from the two sides of the mirror). From owner-freebsd-fs@FreeBSD.ORG Wed Jul 25 19:46:38 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF84416A419 for ; Wed, 25 Jul 2007 19:46:38 +0000 (UTC) (envelope-from nowickis@tlen.pl) Received: from tur.go2.pl (tur.go2.pl [193.17.41.50]) by mx1.freebsd.org (Postfix) with ESMTP id 7F2BE13C4A3 for ; Wed, 25 Jul 2007 19:46:38 +0000 (UTC) (envelope-from nowickis@tlen.pl) Received: from rekin18.go2.pl (rekin18.go2.pl [193.17.41.40]) by tur.go2.pl (o2.pl Mailer 2.0.1) with ESMTP id F22B7230980 for ; Wed, 25 Jul 2007 21:16:29 +0200 (CEST) Received: from o2.pl (unknown [10.0.0.38]) by rekin18.go2.pl (Postfix) with SMTP id 0254253DD5 for ; Wed, 25 Jul 2007 21:16:28 +0200 (CEST) From: =?UTF-8?Q?nowickis?= To: freebsd-fs@freebsd.org Mime-Version: 1.0 Message-ID: <123cb29.3cae55fc.46a7a18c.933@o2.pl> Date: Wed, 25 Jul 2007 21:16:28 +0200 X-Originator: 89.78.226.21 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: UnionFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2007 19:46:38 -0000 Hi. I'm curious about your experience with unionfs. Have you tried it? Did you have any trouble while using it?
Sebastian From owner-freebsd-fs@FreeBSD.ORG Thu Jul 26 02:25:30 2007 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E35E316A419; Thu, 26 Jul 2007 02:25:30 +0000 (UTC) (envelope-from rodrigc@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BB63113C468; Thu, 26 Jul 2007 02:25:30 +0000 (UTC) (envelope-from rodrigc@FreeBSD.org) Received: from freefall.freebsd.org (rodrigc@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l6Q2PULe047966; Thu, 26 Jul 2007 02:25:30 GMT (envelope-from rodrigc@freefall.freebsd.org) Received: (from rodrigc@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l6Q2PU38047962; Thu, 26 Jul 2007 02:25:30 GMT (envelope-from rodrigc) Date: Thu, 26 Jul 2007 02:25:30 GMT Message-Id: <200707260225.l6Q2PU38047962@freefall.freebsd.org> To: rodrigc@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: rodrigc@FreeBSD.org Cc: Subject: Re: kern/112658: [smbfs] [patch] smbfs and caching problems (resolves bin/111004) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2007 02:25:31 -0000 Synopsis: [smbfs] [patch] smbfs and caching problems (resolves bin/111004) Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: rodrigc Responsible-Changed-When: Thu Jul 26 02:24:18 UTC 2007 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=112658 From owner-freebsd-fs@FreeBSD.ORG Thu Jul 26 06:59:28 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 088E916A41F for ; Thu, 26 Jul 2007 06:59:28 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from abbe.salford.ac.uk (abbe.salford.ac.uk [146.87.0.10]) by mx1.freebsd.org (Postfix) with SMTP id 7226413C46C for ; Thu, 26 Jul 2007 06:59:27 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 12054 invoked by uid 98); 26 Jul 2007 07:59:25 +0100 Received: from 146.87.255.121 by abbe.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3775. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. 
Processed in 0.106566 secs); 26 Jul 2007 06:59:25 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by abbe.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Thu, 26 Jul 2007 07:59:25 +0100 Received: (qmail 68238 invoked by uid 1002); 26 Jul 2007 06:59:23 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 26 Jul 2007 06:59:23 -0000 Date: Thu, 26 Jul 2007 07:59:23 +0100 (BST) From: "Mark Powell" To: Doug Rabson In-Reply-To: <1185389856.3698.11.camel@herring.rabson.org> Message-ID: <20070726075607.W68220@rust.salford.ac.uk> References: <20070725174715.9F47E5B3B@mail.bitblocks.com> <1185389856.3698.11.camel@herring.rabson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek , Mark Powell Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2007 06:59:28 -0000 On Wed, 25 Jul 2007, Doug Rabson wrote: > On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote: >> Does it really do this? As I understood it, only one of the >> disks in a mirror will be read for a given block. If the >> checksum fails, the same block from the other disk is read >> and checksummed. If all the disks in a mirror are read for >> every block, ZFS read performance would get somewhat worse >> instead of linear scaling up with more disks in a mirror. In >> order to monitor data on both disks one would need to >> periodically run "zpool scrub", no? But that is not >> *continuous* monitoring of the two sides. > > This is of course correct. I should have said "continuously checks the > data which you are actually looking at on a regular basis". The > consistency check is via the block checksum (not comparing the data from > the two sides of the mirror). According to this: http://www.opensolaris.org/jive/thread.jspa?threadID=23093&tstart=0 RAID-Z has to read every drive to be able to checksum a block. Isn't this the reason why RAID-Z random reads are so slow and also the reason the pre-fetcher exists to speed up sequential reads? Cheers. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Thu Jul 26 07:29:43 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2068F16A419; Thu, 26 Jul 2007 07:29:43 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from itchy.rabson.org (unknown [IPv6:2001:618:400::50b1:e8f2]) by mx1.freebsd.org (Postfix) with ESMTP id 8111913C45B; Thu, 26 Jul 2007 07:29:42 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [80.177.232.250] (herring.rabson.org [80.177.232.250]) by itchy.rabson.org (8.13.3/8.13.3) with ESMTP id l6Q7TXR4016034; Thu, 26 Jul 2007 08:29:33 +0100 (BST) (envelope-from dfr@rabson.org) From: Doug Rabson To: Mark Powell In-Reply-To: <20070726075607.W68220@rust.salford.ac.uk> References: <20070725174715.9F47E5B3B@mail.bitblocks.com> <1185389856.3698.11.camel@herring.rabson.org> <20070726075607.W68220@rust.salford.ac.uk> Content-Type: text/plain Date: Thu, 26 Jul 2007 08:29:33 +0100 Message-Id: <1185434973.3698.18.camel@herring.rabson.org> Mime-Version: 1.0 X-Mailer: Evolution 2.10.2 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED autolearn=failed version=3.1.0 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on itchy.rabson.org X-Virus-Scanned: ClamAV 0.87.1/3775/Thu Jul 26 06:56:02 2007 on itchy.rabson.org X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2007 07:29:43 -0000 On Thu, 2007-07-26 at 07:59 +0100, Mark Powell wrote: > On Wed, 25 Jul 2007, Doug Rabson wrote: > > On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote: > >> Does it really do this? As I understood it, only one of the > >> disks in a mirror will be read for a given block. If the > >> checksum fails, the same block from the other disk is read > >> and checksummed. If all the disks in a mirror are read for > >> every block, ZFS read performance would get somewhat worse > >> instead of linear scaling up with more disks in a mirror. In > >> order to monitor data on both disks one would need to > >> periodically run "zpool scrub", no? But that is not > >> *continuous* monitoring of the two sides. > > > > This is of course correct. I should have said "continuously checks the > > data which you are actually looking at on a regular basis". The > > consistency check is via the block checksum (not comparing the data from > > the two sides of the mirror). > > According to this: > > http://www.opensolaris.org/jive/thread.jspa?threadID=23093&tstart=0 > > RAID-Z has to read every drive to be able to checksum a block. > Isn't this the reason why RAID-Z random reads are so slow and also the > reason the pre-fetcher exists to speed up sequential reads? > Cheers. When it's reading, RAID-Z only has to read the blocks which contain data - the parity block is only read if either the vdev is in degraded mode after a drive failure or one (two for RAID-Z2) of the data block reads fails. For pools which contain a single RAID-Z or RAID-Z2 group, this is probably a performance issue. Larger pools containing multiple RAID-Z groups can spread the load to improve this.
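To put the two points from this thread in command form: a pool built from several smaller RAID-Z groups lets reads be serviced by the groups in parallel, and a periodic scrub is what actually verifies every copy and parity block rather than only the data a given read happens to touch. A minimal sketch, with a hypothetical pool name and disk names (not a sizing recommendation):

    # Eight disks arranged as two 4-disk (3+1) raidz groups in one pool,
    # rather than a single 8-disk group; reads spread across both groups.
    zpool create tank raidz ad0 ad1 ad2 ad3 raidz ad4 ad5 ad6 ad7

    # Walk the whole pool, verify all copies and parity against their
    # checksums, then check the per-device error counters.
    zpool scrub tank
    zpool status -v tank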
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 26 07:47:17 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0AA3D16A417 for ; Thu, 26 Jul 2007 07:47:17 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from abbe.salford.ac.uk (abbe.salford.ac.uk [146.87.0.10]) by mx1.freebsd.org (Postfix) with SMTP id 7462B13C468 for ; Thu, 26 Jul 2007 07:47:16 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 33792 invoked by uid 98); 26 Jul 2007 08:47:15 +0100 Received: from 146.87.255.121 by abbe.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3775. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. Processed in 0.064179 secs); 26 Jul 2007 07:47:15 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by abbe.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Thu, 26 Jul 2007 08:47:15 +0100 Received: (qmail 68571 invoked by uid 1002); 26 Jul 2007 07:47:13 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 26 Jul 2007 07:47:13 -0000 Date: Thu, 26 Jul 2007 08:47:13 +0100 (BST) From: "Mark Powell" To: Doug Rabson In-Reply-To: <1185434973.3698.18.camel@herring.rabson.org> Message-ID: <20070726083224.O68220@rust.salford.ac.uk> References: <20070725174715.9F47E5B3B@mail.bitblocks.com> <1185389856.3698.11.camel@herring.rabson.org> <20070726075607.W68220@rust.salford.ac.uk> <1185434973.3698.18.camel@herring.rabson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: ZfS & GEOM with many odd drive sizes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2007 07:47:17 -0000 On Thu, 26 Jul 2007, Doug Rabson wrote: > When its reading, RAID-Z only has to read the blocks which contain data > - the parity block is only read if either the vdev is in degraded mode > after a drive failure or one (two for RAID-Z2) of the data block reads > fails. Yes, but that article does not mention reading parity. What it's saying is that every block is striped across multiple drives. The checksum for that block thus applies to data which is on multiple drives. Therefore to checksum a block you have to read all the parts of the block from every drive except one in the RAIDz array: "This makes read performance of a RAID-Z pool be the same as that of a single disk, even if you only needed a small read from block D." > For pools which contain a single RAID-Z or RAID-Z2 group, this is > probably a performance issue. Larger pools containing multiple RAID-Z > groups can spread the load to improve this. This isn't something that's immediately obvious, coming from fixed stripe size raid5. Now it seems that the variable stripe size has a rather serious performance penalty. It seems that if you have 8 drives, it'd be much more prudent to make two RAIDz of 3+1 rather than one of 6+2. Cheers. -- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. 
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key From owner-freebsd-fs@FreeBSD.ORG Fri Jul 27 09:32:22 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3BCD16A417 for ; Fri, 27 Jul 2007 09:32:22 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: from abbe.salford.ac.uk (abbe.salford.ac.uk [146.87.0.10]) by mx1.freebsd.org (Postfix) with SMTP id 51EC113C458 for ; Fri, 27 Jul 2007 09:32:21 +0000 (UTC) (envelope-from M.S.Powell@salford.ac.uk) Received: (qmail 80895 invoked by uid 98); 27 Jul 2007 10:32:20 +0100 Received: from 146.87.255.121 by abbe.salford.ac.uk (envelope-from , uid 401) with qmail-scanner-2.01 (clamdscan: 0.90/3779. spamassassin: 3.1.8. Clear:RC:1(146.87.255.121):. Processed in 0.056505 secs); 27 Jul 2007 09:32:20 -0000 Received: from rust.salford.ac.uk (HELO rust.salford.ac.uk) (146.87.255.121) by abbe.salford.ac.uk (qpsmtpd/0.3x.614) with SMTP; Fri, 27 Jul 2007 10:32:20 +0100 Received: (qmail 78183 invoked by uid 1002); 27 Jul 2007 09:32:18 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 27 Jul 2007 09:32:18 -0000 Date: Fri, 27 Jul 2007 10:32:18 +0100 (BST) From: "Mark Powell" To: freebsd-fs@freebsd.org Message-ID: <20070727100039.V68220@rust.salford.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: Breaking raidz and zpool bug? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Jul 2007 09:32:22 -0000 Hi, I have a machine with two identical drives, ad[01]. Each has a 4GB slice 1 and the rest as slice 2. Slice 1 contains a gmirrored UFS /boot and swap. Slice 2 I foolishly raidz'ed and put / on there. All works well. I realised the error of making a raidz of only 2 drives and wanted to convert this setup to gmirror without any backup/restore or pulling of drives to force them to error. I assume the difficulty in doing this comes from deliberate safeguards to prevent data loss in normal usage? First I needed to break the raidz and stop it using one of the drives, so I could make the mirror on it. I thought I could just wipe ad1s2, but I am prevented from doing that because it's being used by ZFS, even with kern.geom.debugflags=16. I couldn't change the partition details using fdisk for ad1s2, as it doesn't allow that for partitions in use. So I blanked sector 0 on ad1 and rebooted. zpool status showed the raidz as degraded. I thought I'd then create a zpool mirror on ad1s2, but of course I can't, because it's still part of the raidz. I could find no way to remove ad1s2 from the raidz; zpool detach is only for hot spares. I tried to get around the system not letting me do anything with ad1s2 by creating an identical ad1s3 and then changing the slice type of ad1s2 to 1 (DOS FAT 16-bit). I rebooted, but the ZFS root would not mount. I booted into a test environment and zpool status told me the worst: no replicas could be found. At first I assumed I'd made a mess of something, but on reflection I was sure I'd not touched ad0. I changed the type of ad1s2 back to FreeBSD (165) and the ZFS root worked fine again, albeit in the degraded state. Surely it shouldn't be possible to break a raidz simply by changing the slice type? Is this a bug? And does anyone have ideas for what I was trying to do? Cheers.
-- Mark Powell - UNIX System Administrator - The University of Salford Information Services Division, Clifford Whitworth Building, Salford University, Manchester, M5 4WT, UK. Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key
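As for the raidz-to-gmirror conversion described above, one untested sketch of the general sequence, assuming the pool is named "tank" (the real pool name isn't given in the report): zpool offline releases the slice without the slice-type trick that left the pool unmountable, at the cost of the pool staying degraded until it is destroyed at the end.

    # 1. Take the second slice out of service; the raidz goes DEGRADED
    #    but keeps running on ad0s2.
    zpool offline tank ad1s2

    # 2. Build a one-member gmirror on the freed slice and put UFS on it.
    gmirror label -v gm1 /dev/ad1s2
    newfs /dev/mirror/gm1
    mount /dev/mirror/gm1 /mnt

    # 3. Copy the live root across with a file-level copy (dump/restore
    #    cannot read ZFS).  Illustrative only -- exclude /mnt itself,
    #    /dev and any other mounted filesystems from the copy:
    #      tar -C / -cf - . | tar -C /mnt -xpf -

    # 4. Point fstab and the loader at the gmirror, reboot onto it, then
    #    destroy the degraded pool and attach the first slice as the
    #    second mirror member.
    zpool destroy -f tank
    gmirror insert gm1 /dev/ad0s2

Both ZFS and gmirror keep metadata on the provider itself, so clearing the old ZFS labels on the freed slice (for example by dd'ing zeros over its first and last megabyte) before step 2 would be prudent; and note that none of this removes ad1s2 from the raidz's configuration, which is exactly the limitation the original message ran into.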