Message-ID: <5169E3ED.1000900@sneakertech.com>
Date: Sat, 13 Apr 2013 19:02:05 -0400
From: Quartz
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
In-Reply-To: <20130413213630.GA6018@icarus.home.lan>

> This conflicts directly with your above statement.

Yeah, I realized that after looking things over again. See the email I
sent a few minutes ago. I'm going to do some testing to get a bead on
exactly what does and doesn't fail, and when. It's a slow process
though.

> If you have other proof that indicates otherwise (such as non-ZFS
> filesystems start also stalling/causing problems), please provide those
> details.

The main counter example I have of this is that I know I had it hang
when trying to cd to my home directory. I've also had it hang when
running random other commands I wouldn't expect to be zfs related, but I
don't remember exactly all the things I typed and how long after I
popped the drives it happened, so I'm going to try and figure out what
the pattern is.

> But as it stands, we don't even know what the "boot drive"
> consists of (filesystems, etc.) because you haven't provided any of that
> necessary information.

Yes I did, on Tuesday in fact, but I'll repeat it here again since you
obviously missed it. It's a single ufs disk that houses the entirety of
the system. Root, var, home, /etc, swap partition... all of it. It's not
any form of raid or dual boot or anything special, just a stock
single-disk default install with no custom config.

> What you've stated happens differs from the above,

Everything you wrote is the same thing I'm seeing, which is why I
considered you to have confirmed the base problem. The only things that
differ are that "zpool status" and other zfs related commands are not
guaranteed to work (or at least not indefinitely), and "all io on the
boot drive works for me guaranteed" (which you didn't list here but I'm
assuming is implied).

> The paragraph I was referring to:

[pedantic]
Ok, that's the last paragraph. The thing after it is a postscript, which
isn't counted.
[/pedantic]

> All I was able to reproduce was that I/O ***to the pool*** (once its
> broken) stalls.
> I'm still waiting on you to go through the same
> method/model I did here, providing all the data:

> you'll see that I go "step by step" looking at certain things,

> and this is why I
> keep asking for you to please go step-by-step in reproducing your issue,
> provide all output (including commands you issue), and all physical
> tasks performed, plus what the console shows each step of the way.

> So can you
> please go through the same procedure/methodology and do the same
> write-up I did, but with your system?

Look, I can't read your mind any more than you can read mine. I can do
"run command xyz -abc and tell me what FOO is set to", or "yank each
drive in order by /dev ID, wait one second and run zpool status". I
can't do "find all the important stuffs", and I'm not going to
automatically assume that just because someone ran some test or series
of commands that they silently expect me to do the same. If you wanted
me to run through those exact steps in that exact order and give you the
output, it would've been nice if you'd actually *said* that at some
point.

-

Although you seem to want to disagree with me, you did confirm the main
issue in that you're also seeing a zfs related hang. The point of
contention is my observation that zfs commands and non-zfs-related io
may also fail after some undefined period of time. It's not clear to me
how long you had your test system up and if you waited long enough to
see the same issues. Either way, I need to do more testing to try and
figure out if there's a pattern here.

______________________________________
it has a certain smooth-brained appeal