Subject: Re: RFC: Suggesting ZFS "best practices" in FreeBSD
From: Michael DeMan <>
Date: Tue, 22 Jan 2013 19:51:22 -0800
To: Jason Keltz <>
Cc: FreeBSD Filesystems <>

Inline below...
On Jan 22, 2013, at 6:40 PM, Jason Keltz <> wrote:
>>> #1.  Map the physical drive slots to how they show up in FBSD so if
>>> a disk is removed and the machine is rebooted all the disks after
>>> that removed one do not have an 'off by one error'.  i.e. if you
>>> have ada0-ada14 and remove ada8 then reboot - normally FBSD skips
>>> that missing ada8 drive and the next drive (that used to be ada9) is
>>> now called ada8 and so on...
>> How do you do that?  If I'm in that situation, I think I could find
>> the bad drive, or at least the good ones, with diskinfo and the drive
>> serial number.  One suggestion I saw somewhere was to use disk serial
>> numbers for label values.
> I think that was using /boot/device.hints.  Unfortunately it only
> works for some systems, and not for all..  and someone shared an
> experience with me where a kernel update caused the card probe order to
> change, the devices to change, and then it all broke...  It worked for
> one card, not for the other...  I gave up because I wanted consistency
> across different systems..

I am not sure, but possibly I hit that same PCI probing issue with our
ZFS test machine - I vaguely recall asking to have the SATA controllers
swapped between slots without completely knowing why it needed to be
done, other than that it did.  It could have been from an upgrade from
FBSD 7.x -> 8.x -> 9.x, or it could just have been because it's a test
box, other things were going on with it for a while, and the cards got
put back in out of order after some other work.

This is actually kind of an interesting problem overall - logical vs.
physical and how to keep things mapped in a way that makes sense.  The
Linux community has run into this and substantially (from a basic end
user perspective) changed the way they deal with hardware MAC addresses
and Ethernet cards between RHEL5 and RHEL6.  Ultimately neither of their
techniques works very well.  For the FreeBSD community we should
probably pick one strategy or the other, standardize on it, warts and
all, and have it documented?
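
As a rough illustration of the serial-number labelling idea quoted
above, here is a minimal Python sketch.  It assumes you keep a table of
which serial number sits in which physical bay; the serial numbers, bay
numbers, and the "bayNN" label scheme are all made up for the example,
and how the serial is read from each device (the thread mentions
diskinfo) is left out.

```python
# Hypothetical bay map: these serial numbers and bay numbers are invented.
BAY_BY_SERIAL = {
    "WD-AAA111": 7,
    "WD-BBB222": 8,
    "WD-CCC333": 9,
}


def label_for_bay(bay):
    """Return a label such as 'bay08' derived from the physical bay number."""
    return "bay%02d" % bay


def plan_labels(detected):
    """Map each detected device to a label based on its serial number.

    `detected` maps a device name (e.g. 'ada8') to the serial number read
    from that device.  Devices whose serial is not in the table are
    returned separately so nothing gets labelled by guesswork.
    """
    labels, unknown = {}, []
    for dev, serial in sorted(detected.items()):
        bay = BAY_BY_SERIAL.get(serial)
        if bay is None:
            unknown.append(dev)
        else:
            labels[dev] = label_for_bay(bay)
    return labels, unknown


if __name__ == "__main__":
    # The bay 8 drive was pulled and the box rebooted, so the drive in
    # bay 9 now shows up as ada8.  The label still follows the serial.
    print(plan_labels({"ada7": "WD-AAA111", "ada8": "WD-CCC333"}))
```

The point of the design is that the label follows the drive's serial,
not the probe order, so a renumbering after a reboot does not change
which label belongs to which bay.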

> In my own opinion, the whole process of partitioning drives, labelling
> them, all kinds of tricks for dealing with 4k drives, manually
> configuring /boot/device.hints, etc. is something that we have to do,
> but honestly, I really believe there *has* to be a better way....

I agree.  At this point the only solution I can think of to be able to
use ZFS on FreeBSD for production systems is to write scripts that do
all of this - all the goofy gpart + gnop + everything else.  How is
anybody supposed to replace a disk in an emergency when they have to
run a bunch of cryptic command line stuff on the disk before they can
even confidently put it in as a replacement for the original?  And if
it takes a bunch of manual command line stuff, then by definition you
cannot be reliably confident?
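
To make the "write scripts that do all of this" idea concrete, here is
a minimal Python sketch that only builds the command lines for swapping
in a replacement disk; it does not run anything.  The pool name, device
name, label, and partition size are hypothetical.  The gnop step is
left out on the assumption that it only matters when the pool is first
created, and the one-argument form of zpool replace assumes the failed
disk was known to the pool under the same gpt/ label.

```python
import shlex


def replacement_commands(pool, new_dev, label, size=None):
    """Return the commands (as argument lists) to prepare `new_dev` and
    swap it in for the failed disk that carried the same GPT label.

    Nothing is executed here; the caller decides what to do with them.
    """
    add = ["gpart", "add", "-t", "freebsd-zfs", "-a", "4k", "-l", label]
    if size is not None:
        # Optional fixed partition size, passed through verbatim.
        add += ["-s", size]
    add.append(new_dev)
    return [
        ["gpart", "create", "-s", "gpt", new_dev],
        add,
        # With a single device argument the new device defaults to the
        # old device's path, here the gpt/<label> name.
        ["zpool", "replace", pool, "gpt/" + label],
    ]


def as_shell(commands):
    """Render the argument lists as quoted shell lines for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical pool 'tank', new disk 'da52', bay label 'bay52'.
    for line in as_shell(replacement_commands("tank", "da52", "bay52")):
        print(line)
```

Printing the commands instead of running them lets whoever is on call
read exactly what will happen before anything touches the disk.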

> Years back when I was using a 3ware/AMCC RAID card (actually, I AM
> still using a few), none of this was an issue... every disk just
> appeared in order.. I didn't have to configure anything specially ..
> ordering never changed when I removed a drive, I didn't need to
> partition or do anything with the disks - just give it the raw disks,
> and it knew what to do...  If anything, I took my labeller and labelled
> the disk bays with a numeric label so when I got an error, I knew which
> disk to pull, but order never changed, and I always pulled the right
> drive... Now, I look at my pricey "new" system, see disks ordered by
> default in what seems like an almost "random" order... I dded each drive
> to figure out the exact ordering, and labelled the disks, but it just
> gets really annoying....

A lot of these things - like making sure a little extra space is spared
on each drive when an array is first built, so that a replacement drive
with slightly smaller capacity still fits - the RAID vendors have
hidden away from the end user.  In many cases they have only done that
in the last 10 years or so?  And a few weeks ago I stumbled across a
Sun ZFS user who had received Sun certified disks that had the same
issue - a few sectors too small...
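
The arithmetic behind sparing a little space can be sketched as
follows.  This is only an illustration: the 100 MiB reserve, the
512-byte sector size, and the 4096-byte alignment are assumptions
picked for the example, not recommended values.

```python
def partition_sectors(disk_bytes, reserve_bytes=100 * 1024 * 1024,
                      sector_bytes=512, align_bytes=4096):
    """Return a partition size in sectors that leaves `reserve_bytes`
    unused and is rounded down to a multiple of `align_bytes`.

    All default values are assumptions for illustration only.
    """
    usable = disk_bytes - reserve_bytes
    if usable <= 0:
        raise ValueError("reserve is larger than the disk")
    usable -= usable % align_bytes      # round down to the alignment
    return usable // sector_bytes


if __name__ == "__main__":
    # Hypothetical disk of 3,000,592,982,016 bytes.
    print(partition_sectors(3000592982016))
```

If every member of the array is partitioned to that size when the pool
is built, a replacement that comes up a few sectors short of the
original still has room for the same partition, as long as the
shortfall is smaller than the reserve.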

Overall you are describing exactly the kind of behavior I want, and I
think everybody needs from a FreeBSD+ZFS system.

- Alarm sent out - drive #52 failed - wake up and deal with it.
- Go to server (or call data center) - groggily look at labels on front
  of disk caddies - physically pull drive #52
- Insert new similarly sized drive from inventory as new #52.
- Verify resilver is in progress
- Confidently go back to bed knowing all is okay

The above scenario is just unworkable right now for most people (even
tech-savvy people) because of the lack of documentation - hence I am
glad to see some kind of 'best practices' document put together.

- Mike