From owner-freebsd-fs@FreeBSD.ORG  Wed Jan 23 03:51:25 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 7FD5554F
 for <freebsd-fs@freebsd.org>; Wed, 23 Jan 2013 03:51:25 +0000 (UTC)
 (envelope-from freebsd@deman.com)
Received: from plato.corp.nas.com (plato.corp.nas.com [66.114.32.138])
 by mx1.freebsd.org (Postfix) with ESMTP id 3C8347E1
 for <freebsd-fs@freebsd.org>; Wed, 23 Jan 2013 03:51:24 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by plato.corp.nas.com (Postfix) with ESMTP id 55CC112E2056E;
 Tue, 22 Jan 2013 19:51:24 -0800 (PST)
X-Virus-Scanned: amavisd-new at corp.nas.com
Received: from plato.corp.nas.com ([127.0.0.1])
 by localhost (plato.corp.nas.com [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id zbtFqZ0Kw7qc; Tue, 22 Jan 2013 19:51:23 -0800 (PST)
Received: from [192.168.113.72]
 (75-151-97-138-washington.hfc.comcastbusiness.net [75.151.97.138])
 by plato.corp.nas.com (Postfix) with ESMTPSA id 2C17112E2055A;
 Tue, 22 Jan 2013 19:51:23 -0800 (PST)
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: RFC: Suggesting ZFS "best practices" in FreeBSD
From: Michael DeMan <freebsd@deman.com>
In-Reply-To: <50FF4D87.5080404@cse.yorku.ca>
Date: Tue, 22 Jan 2013 19:51:22 -0800
Message-Id: <8DC70418-7CF9-4839-BDC6-1A1AF5354307@deman.com>
References: <314B600D-E8E6-4300-B60F-33D5FA5A39CF@sarenet.es>
 <alpine.BSF.2.00.1301220759420.61512@wonkity.com>
 <B60D815C-3F2D-4FA7-B5CC-D04EC262853B@deman.com>
 <alpine.BSF.2.00.1301221907120.66546@wonkity.com>
 <50FF4D87.5080404@cse.yorku.ca>
To: Jason Keltz <jas@cse.yorku.ca>
X-Mailer: Apple Mail (2.1499)
Content-Type: text/plain;
	charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Jan 2013 03:51:25 -0000

Inline below...
On Jan 22, 2013, at 6:40 PM, Jason Keltz <jas@cse.yorku.ca> wrote:
<SNIP>
>>> #1.  Map the physical drive slots to how they show up in FBSD so
>>> if a disk is removed and the machine is rebooted, all the disks
>>> after the removed one do not have an 'off by one' error.  I.e. if
>>> you have ada0-ada14 and remove ada8, then reboot - normally FBSD
>>> skips the missing ada8 drive and the next drive (which used to be
>>> ada9) is now called ada8, and so on...
>>
>> How do you do that?  If I'm in that situation, I think I could find
>> the bad drive, or at least the good ones, with diskinfo and the
>> drive serial number.  One suggestion I saw somewhere was to use
>> disk serial numbers for label values.
> I think that was using /boot/device.hints.  Unfortunately it only
> works for some systems, and not for all..  and someone shared an
> experience with me where a kernel update caused the card probe order
> to change, the devices to change, and then it all broke...  It
> worked for one card, not for the other...  I gave up because I
> wanted consistency across different systems..
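One way to get stable device names regardless of probe order is to label each disk with its serial number via GEOM labels and then build the pool on the labels instead of the adaN names.  A minimal sketch - the device names and pool name here are placeholders, and it assumes a diskinfo new enough to have -s (older systems can read the serial from "camcontrol identify" output instead):

```shell
#!/bin/sh
# Label each disk with its serial number so the name survives
# adaN renumbering after a drive pull or a controller swap.
# WARNING: only label disks that are not yet in use.

for disk in ada0 ada1 ada2; do
    # diskinfo -s prints the drive's ident (serial number)
    serial=$(diskinfo -s /dev/${disk})
    glabel label "${serial}" /dev/${disk}
done

# Then build the pool on the stable names, e.g.:
#   zpool create tank raidz /dev/label/SER0 /dev/label/SER1 /dev/label/SER2
```

The pool members then show up as /dev/label/&lt;serial&gt;, which follows the physical disk no matter which controller port it lands on after a reboot.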

I am not sure, but possibly I hit that same issue about PCI probing
with our ZFS test machine - basically I vaguely recall asking to have
the SATA controllers' slots swapped without completely knowing why it
needed to be done, other than that it did.  It could have been from an
upgrade from FBSD 7.x -> 8.x -> 9.x, or it could just have been
because it's a test box - there were other things going on for a
while, and the cards had gotten put back in out of order after doing
some other work.

This is actually kind of an interesting problem overall - logical vs.
physical, and how to keep things mapped in a way that makes sense.
The Linux community has run into this and substantially changed (from
a basic end user's perspective) the way they deal with hardware MAC
addresses and ethernet cards between RHEL5 and RHEL6.  Ultimately
neither of their techniques works very well.  For the FreeBSD
community we should probably pick one strategy or another, standardize
on it warts and all, and document it.

>
> In my own opinion, the whole process of partitioning drives,
> labelling them, all kinds of tricks for dealing with 4k drives,
> manually configuring /boot/device.hints, etc. is something that we
> have to do, but honestly, I really believe there *has* to be a
> better way....

I agree.  At this point the only solution I can think of for using
ZFS on FreeBSD in production is to write scripts that do all of this -
all the goofy gpart + gnop + everything else.  How is anybody supposed
to replace a disk in an emergency if they first have to run a bunch of
cryptic command-line steps on it before they can even confidently put
it in as a replacement for the original?  And almost by definition,
having to do a bunch of manual command-line steps means you can never
be reliably confident.
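For reference, the "goofy gpart + gnop" dance for a 4k-sector replacement disk circa FreeBSD 9 boils down to something like the following sketch.  The disk, label, and pool names are placeholders, and note the gnop trick only matters when a vdev is first created (an existing vdev's ashift is fixed):

```shell
#!/bin/sh
# Prep a 4k-sector ("Advanced Format") replacement disk.
# disk/label/pool names are placeholders -- adjust before use.
# WARNING: destroys any existing data on $disk.
disk=ada8
label=disk52
pool=tank
olddev=gpt/disk52    # the failed member as zpool status shows it

gpart destroy -F ${disk} 2>/dev/null || true
gpart create -s gpt ${disk}
# 4k-align the partition and tag it with a human-readable GPT label
gpart add -t freebsd-zfs -a 4k -l ${label} ${disk}

# gnop trick: expose a 4096-byte-sector device so ZFS would pick
# ashift=12 if this disk were starting a new vdev
gnop create -S 4096 /dev/gpt/${label}
zpool replace ${pool} ${olddev} /dev/gpt/${label}.nop
```

This is exactly the sort of thing that belongs in a tested script rather than in an admin's 3 AM memory.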

> Years back when I was using a 3ware/AMCC RAID card (actually, I AM
> still using a few), none of this was an issue... every disk just
> appeared in order.. I didn't have to configure anything specially ..
> ordering never changed when I removed a drive, I didn't need to
> partition or do anything with the disks - just give it the raw
> disks, and it knew what to do...  If anything, I took my labeller
> and labelled the disk bays with a numeric label so when I got an
> error, I knew which disk to pull, but order never changed, and I
> always pulled the right drive... Now, I look at my pricey "new"
> system, see disks ordered by default in what seems like an almost
> "random" order... I dded each drive to figure out the exact
> ordering, and labelled the disks, but it just gets really
> annoying....


A lot of these things - like making sure a little extra space is left
spare on the drive when an array is first built, so that a new drive
with a slightly smaller capacity can still serve as a replacement -
have been hidden away from the end user by the RAID vendors.  In many
cases they have only done that in the last 10 years or so.  And a few
weeks ago I stumbled across a Sun ZFS user who had received
Sun-certified disks with the same issue - a few sectors too small...
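The reserve itself is just arithmetic at partition time - size the data partition a bit below the raw capacity instead of using the whole disk.  A small sketch (the 2 GiB reserve is an arbitrary choice on my part, not a ZFS requirement):

```shell
#!/bin/sh
# Reserve space at the end of the data partition so a replacement
# drive a few sectors smaller than the original still fits.
# All sizes are in 512-byte sectors.

reserve_sectors=$((2 * 1024 * 1024 * 1024 / 512))   # 2 GiB = 4194304 sectors

part_size() {
    # $1 = raw disk size in sectors (e.g. from "diskinfo ada8")
    echo $(( $1 - reserve_sectors ))
}

# Example: a nominal "3 TB" disk of 5860533168 sectors
part_size 5860533168    # prints 5856338864
# Then: gpart add -t freebsd-zfs -a 4k -s <that size> ada8
```

Done consistently at array-build time, this makes "the certified disk was a few sectors too small" a non-event.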


Overall you are describing exactly the kind of behavior I want, and I
think everybody needs, from a FreeBSD+ZFS system:

- Alarm sent out - drive #52 failed - wake up and deal with it.
- Go to the server (or call the data center), groggily look at the
labels on the front of the disk caddies, and physically pull drive
#52.
- Insert a new, similarly sized drive from inventory as the new #52.
- Verify the resilver is in progress.
- Confidently go back to bed knowing all is okay.
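With bay-number GPT labels in place, the middle steps could reduce to something like the following (pool and label names are assumptions, and this presumes the replacement disk has already been partitioned and given the same bay label):

```shell
# Identify the failed member and confirm which bay label it carries
zpool status -x tank

# After swapping the caddie, hand the new disk (same path, since it
# carries the same GPT label) back to the pool:
zpool replace tank gpt/bay52

# Confirm the resilver is underway before going back to bed
zpool status tank
```

That is the level of ceremony the RAID-card workflow offers today, and it seems achievable with ZFS once the labeling conventions are written down.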

The above scenario is just unworkable right now for most people (even
tech-savvy people) because of the lack of documentation - hence I am
glad to see some kind of 'best practices' document being put
together.
<SNIP>

- Mike