From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 18:11:07 2013
Message-ID: <514B4D38.6090101@sneakertech.com>
Date: Thu, 21 Mar 2013 14:11:04 -0400
From: Quartz <quartz@sneakertech.com>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: Failed pool causes system to hang
References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan>
In-Reply-To: <20130321085304.GB16997@icarus.home.lan>

>> I'm not messing with partitions yet because I don't want to complicate
>> things. (I will eventually be going that route though, as the controller
>> tends to renumber drives in a first-come-first-served order that makes
>> some things difficult.)
>
> Solving this is easy, WITHOUT use of partitions or labels. There is a
> feature of CAM(4) called "wired down" or "wiring down", where you can in
> essence statically map a SATA port to a static device number regardless
> of whether a disk is inserted at the time the kernel boots.

My wording implied the wrong thing here: the dev ID mapping issue is *one*
of the reasons I'm going to go with partitions. Another is the "replacement
disk is one sector too small" issue, and gpt labels also give me the
ability to reference drives by an arbitrary string, which makes things
easier because I don't have to remember which dev ID corresponds to which
physical bay.

I probably want to know about this trick anyway though, it looks useful.
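For my own notes (and anyone searching the archives later), here's a rough
sketch of both approaches as I understand them. The port numbers, partition
size, and label names below are made-up examples, not from my actual box:

  # /boot/device.hints (or loader.conf): CAM wiring-down. Pin ahci
  # channel 0 to scbus0, and whatever disk shows up there to da0,
  # whether or not a drive is present at boot.
  hint.scbus.0.at="ahcich0"
  hint.da.0.at="scbus0"

  # GPT label approach: name each disk after its physical bay, and
  # undersize the partition slightly so a replacement drive that's a
  # few sectors smaller still fits.
  gpart create -s gpt da0
  gpart add -t freebsd-zfs -a 4k -s 1990G -l bay00 da0
  zpool create array raidz2 gpt/bay00 gpt/bay01 gpt/bay02 gpt/bay03

With the labels in place, the pool members show up as /dev/gpt/bay00 and
so on, no matter what order the controller enumerates them in.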
> I can help you with this, but I need to see a dmesg (everything from
> boot to the point mountroot gets done).

Can do, but I'll need to reinstall again first. Gimme a little while.

>> I'm experiencing fatal issues with pools hanging my machine, requiring
>> a hard reset.
>
> This, to me, means something very different than what was described in
> a subsequent follow-up:

Well, what I meant here is that when the pool fails, it takes the entire
machine down with it in short order. Having a machine become unresponsive
and require a panel-button hard reset (with subsequent fsck-ing and
possible corruption) counts as a fatal problem in my book. I don't accept
this type of behavior in *any* system, even a Windows desktop.

> S1. In your situation, when a ZFS pool loses enough vdevs or vdev
> members to cause permanent pool damage (as in completely 100%
> unrecoverable, such as losing 3 disks of a raidz2 pool), any I/O to the
> pool results in those applications hanging.

Sorta. Yes, the command I issued hangs, but so do a lot of other things
as well. I can't kill -9 any of them, or reboot, or anything.

> The system is still functional/usable (e.g. I/O to other pools and
> non-ZFS filesystems works fine),

Assuming I do those *first*. Once something touches the pool, all bets
are off. 'ps' and 'top' seem safe, but things like 'cd' are a gamble.
Admittedly though, I haven't spent any time testing exactly what does
and doesn't work, or whether there's a pattern to it.

> A1. This is because "failmode=wait" is set on the pool, which is the
> default property value. This is by design; there is no ZFS "timeout"
> for this sort of thing. "failmode=continue" is what you're looking for
> (keep reading).
>
> S2. If the pool uses "failmode=continue", there is no change in
> behaviour (i.e. EIO is still never returned).
>
> A2. That sounds like a bug then. I test your claim below, and you might
> be surprised at the findings.

As far as I'm aware, "wait" will hang all I/O, read or write, whereas
"continue" is supposed to hang only writes. My problem (as near as I can
tell) is that nothing informs processes or stops them from trying to
write to the pool, so "continue" effectively only delays the inevitable
by several seconds.

> S3. If the previously-yanked disks are reinserted, the issue remains.
>
> A3. What you're looking for is the "autoreplace" pool property.

No it's not. I *don't* want the pool trying to suck up a freshly
inserted drive without my explicit say-so. I only mentioned this because
some other thread I was reading implied that ZFS would come back to life
if it could talk to the drive again.

> And in the other window where dd is running, it immediately terminates
> with EIO:

IIRC I only tried popping a third disk during activity once... it was
during an scp from another machine, and it just paused. During all other
tests, I've waited to make sure everything settles down first.

> One thing to note (and it's important) above is that da2 is still
> considered "ONLINE". More on that in a moment.

Yeah, I noticed that in my testing.

> root@testbox:/root # zpool replace array da2
> cannot open 'da2': no such GEOM provider
> must be a full path or shorthand device name
>
> This would indicate a separate/different bug, probably in CAM or its
> related pieces.

I don't even get as far as this. Most of the time, once something has
caused the hang, not a lot works past that point. Assuming I followed
your example to the letter and typed 'ls' first, 'zpool replace' would
have just hung as well without printing anything.

> I'll end this Email with (hopefully) an educational statement: I hope
> my analysis shows you why very thorough, detailed output/etc. needs to
> be provided when reporting a problem, and not just some "general"
> description. This is why hard data/logs/etc. are necessary, and why
> every single step of the way needs to be provided, including physical
> tasks performed.

Oh I agree, but etiquette dictates I don't spam people with 5 KB of
unsolicited text including every possible detail about everything,
especially when I'm not even sure if it's the right mailing list.

> P.S. -- I started this Email at 23:15 PDT. It's now 01:52 PDT. To whom
> should I send a bill for time rendered? ;-)

Ha, I think I have you beat there :) I'll frequently spend hours writing
single emails.
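P.S. Since the property names came up several times above, for anyone who
wants to poke at this themselves: they're ordinary pool properties, so
checking and flipping them is a one-liner each ("array" is just the pool
name from your example; substitute your own):

  zpool get failmode array            # default is "wait": all I/O blocks
  zpool set failmode=continue array   # new writes should get EIO instead
  zpool get autoreplace array         # defaults to "off", which suits me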