From: Jeremy Chadwick
To: Warren Block
Cc: freebsd@deman.com, freebsd-stable@freebsd.org
Date: Fri, 25 Jan 2013 21:31:50 -0800
Subject: Re: RFC: Suggesting ZFS "best practices" in FreeBSD
Message-ID: <20130126053150.GA4398@icarus.home.lan>
References: <20130124174039.GA35811@icarus.home.lan> <20130126025929.GA2777@icarus.home.lan>

On Fri, Jan 25, 2013 at 08:48:51PM -0700, Warren Block wrote:
> On Fri, 25 Jan 2013, Jeremy Chadwick wrote:
>
> >On Fri, Jan 25, 2013 at 12:58:15PM -0700, Warren Block wrote:
> >>On Thu, 24 Jan 2013, Jeremy Chadwick wrote:
> >>
> >>>>>#1. Map the physical drive slots to how they show up in FBSD so if a
> >>>>>disk is removed and the machine is rebooted all the disks after that
> >>>>>removed one do not have an 'off by one error'. i.e. if you have
> >>>>>ada0-ada14 and remove ada8 then reboot - normally FBSD skips that
> >>>>>missing ada8 drive and the next drive (that used to be ada9) is now
> >>>>>called ada8 and so on...
> >>>>
> >>>>How do you do that? If I'm in that situation, I think I could find the
> >>>>bad drive, or at least the good ones, with diskinfo and the drive serial
> >>>>number. One suggestion I saw somewhere was to use disk serial numbers
> >>>>for label values.
> >>>
> >>>The term FreeBSD uses for this is called "wiring down" or "wired down",
> >>>and is documented in CAM(4). It's come up repeatedly over the years but
> >>>for whatever reason people overlook it or can't find it.
> >>
> >>I was aware of it, it just seems like there ought to be a better way
> >>to identify drives than by messing with the hardware configuration.
> >
> >I understand what you mean, but it's actually messing with a software
> >configuration (specifically CAM).
> >
> >It's a one-time change that solves the dilemma; it only has to be
> >adjusted if you change controller brands or models, which is a lot less
> >often than changing disks.
> >
> >>Something more elegant, less tied to changing the hardware
> >>configuration of the host. Assigning the drive serial number as a
> >>label, for example.
> >
> >Hmm... all this does is change the nature of the problem, no? You
> >still have the issue of "having to know some magical number" to
> >determine out what path name refers to what physical disk in your system.
> >Can you expand on how this would solve it?
>
> It's not so much a solution as in the right domain. The point, as I
> see it, is being able to identify individual disks uniquely.
> Forcing static devices names does that, sort of. But plug a
> different disk into the same port as an existing one, and that disk
> is now identified as the old one.

Identifying individual disks is a separate subject, as I see it, from
the original concern. Quoting that concern:

> >>>>>#1. Map the physical drive slots to how they show up in FBSD so if a
> >>>>>disk is removed and the machine is rebooted all the disks after that
> >>>>>removed one do not have an 'off by one error'. i.e. if you have
> >>>>>ada0-ada14 and remove ada8 then reboot - normally FBSD skips that
> >>>>>missing ada8 drive and the next drive (that used to be ada9) is now
> >>>>>called ada8 and so on...

How I interpret that: "when I have a drive bay that's not populated, or
a SATA port that has nothing on it, the /dev/adaX numbering changes or
shifts by N. That's frustrating!"

He's trying to ensure 1:1 static device numbering. The way to do that
in CAM is "wiring down".

There are lots of methods to avoid using the adaX/daX/etc. nomenclature
-- labels of course! -- but like I said, those just exchange one problem
for another.

I do wish there were some intelligent way in software to accomplish the
"wiring down" method without having to do loader.conf modifications; I
just don't know how software would be able to make those kinds of
decisions.

> Using a unique identifier already built into those drives helps.
> Serial numbers are unique, built into the drive, and even printed on
> the paper label. They can be queried through software and take no
> disk space. If a drive fails electronically to the point it can't
> be queried, that serial number can be identified from a current list
> of all the drive serial numbers in the array--it's the one not
> there.

How does that serial number correlate with anything physical though?
CAM's "wired down" method allows you to correlate a number (the device
number) with something physical.

If what you're saying is "we should have something *like* labels, but
instead not a label at all, instead use what's already there (WWN or
serial number or something generated off of serno+modelstring+etc.)"
then yeah, I'm hearing you on FM. :-)

> There are problems, they aren't like LEDs on each drive that could
> flash to identify it. Some enclosures don't make drive labels easy
> to see. Some of that can be addressed with labels. Er, sticky
> labels, on the outside of the drive or enclosure. And serial
> numbers are often inconveniently long.

On my Supermicro SC733T chassis, I ended up using a label maker to print
out labels that read "ada0", "ada1", etc. and placed them next to each
respective hot-swap bay.

Not a lot of people know about /dev/led/ (see ahci(4), search for
"LED"). SGPIO does what you want (see SFF-8485 per Seagate). It's
wonderful except when someone doesn't play nice -- you know what they
say about standards...

> >As for a unique number per disk, disks within the past ~5 years (SATA,
> >SAS, and some SCSI) all tend to have this: it's called a WWN:
> >
> >http://en.wikipedia.org/wiki/World_Wide_Name
> >
> >But older ATA disks (and by older I don't mean ancient, I mean even
> >semi-old) may not have this, which means you get to use something else.
> >UUIDs come to mind, but then the question becomes what do you base the
> >generation off of? Model string + serial number + firmware?
> >
> >There are also complexities depending on HBAs (disk controllers) as
> >well; I've seen references, at least on Solaris, of people having one
> >disk showing up twice across 2 separate controllers (i.e. only 1
> >physical disk in the machine, but showing up as both c8d0 and c9d0, both
> >with the same model string and serial number). I imagine some RAID
> >controllers would do this (when a drive isn't part of an array; it might
> >show up as both /dev/adaX and /dev/somedriverX). I know at some point I
> >saw this with FreeBSD too during an OS install, I just can't remember
> >what the names were that I saw.
>
> Surely that ought to be considered a bug. Any drive ID system is
> going to be vulnerable to certain

I think we just see the matter differently. Here's a sort of inverted
example using the famous Intel ICHxxR controllers:

You enable RAID mode in your PC BIOS, knowing FreeBSD has GEOM_RAID
support. You have 2 disks, and you want to RAID-0 them. You go into
the option ROM and assign disk 0 and disk 1 to a volume.

Now you boot FreeBSD to install the OS and are greeted with a choice of
3 disks to install it on: raid/r0, ada0, and ada1. Surprise!

On the opposite side, I've seen HBAs which, to accomplish JBOD
capability, require you to assign each disk as a RAID-0 array. 5 disks,
each RAID-0, resulting in FreeBSD showing 5 separate volumes. What
happens if you don't assign them as RAID-0 (i.e. leave them blank or
"------") depends on the controller and driver too (some show no drives
on the bus in this case). Surprise! Again! :-)

Most of this is just the nature of the beast with storage.
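
As an aside, the "one physical disk, two device names" case (the c8d0
and c9d0 example quoted above) is at least easy to spot once you group
device nodes by model string and serial number. A rough sketch in
Python, assuming the (name, model, serial) tuples have already been
collected by whatever means you prefer; the function name and the
sample inventory are made up purely for illustration:

```python
from collections import defaultdict


def find_duplicate_disks(devices):
    """Return {(model, serial): [names...]} for disks seen under >1 name.

    `devices` is an iterable of (device_name, model, serial) tuples.
    How those tuples are gathered is outside the scope of this sketch.
    """
    by_identity = defaultdict(list)
    for name, model, serial in devices:
        by_identity[(model, serial)].append(name)
    # Keep only identities that appear under more than one device name.
    return {
        identity: sorted(names)
        for identity, names in by_identity.items()
        if len(names) > 1
    }


# Hypothetical inventory: one physical disk visible via two controllers.
sample = [
    ("c8d0", "EXAMPLE-MODEL", "SN0001"),
    ("c9d0", "EXAMPLE-MODEL", "SN0001"),
    ("c10d0", "EXAMPLE-MODEL", "SN0002"),
]

print(find_duplicate_disks(sample))
```

Note this only helps when both names report the same model and serial.
It would not flag the raid/r0 vs. ada0/ada1 case, since the RAID volume
is a different object from its member disks.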
> >Linux has by-uuid and by-id (the latter is what you'd like), but there
> >are caveats to that too:
> >
> >https://wiki.archlinux.org/index.php/Persistent_block_device_naming
> >http://www.terabyteunlimited.com/kb/article.php?id=389
> >
> >So at the end of the day I prefer CAM's "wired down" method -- the
> >reason is that by modifying loader.conf I **know for sure** bay/cable X
> >maps to /dev/adaX, and it's a one-time deal until I decide to move from
> >my ICH9 controller to, say, an Areca.
>
> That illustrates one problem with making the configuration specific
> to host hardware as compared to drive specific.

You might have missed "by-id" on the first link -- it's based off some
kind of number (I can't figure out if it's truly the disk serial number
or the WWN).

> As far as "best practices", situations vary so much that I don't
> know if any drive ID method can be recommended. For a FreeBSD ZFS
> document, a useful sample configuration is going to be small enough
> that anything would work. A survey of the techniques in use at
> various data centers would be interesting.

I'd agree here wholeheartedly. And I would find such a survey very
interesting too.

My money would be on /dev/adaX or similar being used by the majority.
This certainly stems from a) lack of education about labels (people
often don't know they exist), b) too many choices (UFS labels, GEOM
labels, GPT labels, and I think I'm missing one), and c) confusion over
which label correlates with which utility.

For (c), example: gpart(8) talks about label support "for partitions
that support them", lists off the partitioning methods it supports, but
doesn't tell you which ones support labels. Those are **partition**
labels, by the way; don't confuse those with, say, UFS labels.

I'll give you one reason why /dev/adaX or similar conventions win out,
and I'll bring ZFS into the picture:

Say you have a raidz1 pool of ada1/2/3. ada2 fails (you can no longer
read ANY data off the drive).
You go out to the datacenter with a replacement disk, yank the old,
insert the new, and issue "zpool replace pool ada2". You leave. (And
if you use autoreplace=on you don't even need the last step.) Done.

Now let's say you use a "labelled" setup (vs. raw disks), so using GEOM
or GPT labels. You do the exact same thing, but instead of the last
step and leaving you have to go fiddling around with glabel/gpart,
naming things the same, "hoping" you have that documented somewhere.
You end up dumping data from another (working) disk in the pool, saying
"ahh right", and doing your best to mimic it -- all while hoping you
don't make typos since the datacenter is freezing cold, hoping you get
it right; get it wrong and you might have to start over from the
beginning ("oh god, gpart destroy and..."). And if your machine has no
network access at that time, firing up lynx/w3m to look online is out of
the question.

Now pretend for a moment we have something like /dev/wwn/xxx for WWN
support (or something similar for serial numbers -- doesn't matter).

Yank old, insert new, and issue "zpool replace pool... uhh, wait". You
then have to go fiddling around to find the WWN. Let's say you find it
quickly. "zpool replace pool /dev/wwn/old /dev/wwn/new" (note the
additional parameter). You leave.

KISS principle wins out for me, as someone who did co-located hosting
for over 17 years. But I'm certain there are outfits who heavily use
labelling for a lot of reasons (/dev naming consistency across multiple
hardware systems comes to mind; "I want /dev/label/snakes regardless if
I'm using an aac(4) controller or a siis(4) controller!").

Anyway, I've taken too much time writing this mail and have other things
to do tonight. :-)

-- 
| Jeremy Chadwick                                     jdc@koitsu.org |
| UNIX Systems Administrator                  http://jdc.koitsu.org/ |
| Mountain View, CA, US                                              |
| Making life hard for others since 1977.               PGP 4BD6C0CB |