From: Quartz
Date: Sat, 13 Apr 2013 16:59:51 -0400
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <5169C747.8030806@sneakertech.com>
In-Reply-To: <20130413154130.GA877@icarus.home.lan>

> This is what happens when end-users start to try and "correlate" issues
> to one another's without actually taking the time to fully read the
> thread and follow along actively.

He was experiencing a system hang, which appeared to be related to zfs
and/or cam. I'm experiencing a system hang, which appears to be related
to zfs and/or cam. I am in fact following along with this thread.

> Your issue: "on my raidz2 pool, when I lose more than 2 disks, I/O to
> the pool stalls indefinitely,

Close, but not quite. Yes, I/O to the pool stalls, but I/O in general
also stalls. It appears the problem possibly doesn't start until there's
I/O traffic to the pool, though.

> but I can still use the system barring
> ZFS-related things;

No. I've responded to this misconception on your part more than once: I
*CANNOT* use the system in any reliable way; random commands fail. I've
had it hang trying to cd from one dir on the boot volume to another dir
on the boot volume. The only thing I can *reliably* do is log in. Past
that point all bets are off.

> I don't know how to get the system back into a
> usable state from this situation"

"...short of having to hard reset", yes.

> Else, all you've provided so far is a general explanation. You have
> still not provided concise step-by-step information like I've asked.

*WHAT* info? You have YET TO TELL ME WHAT THE CRAP YOU ACTUALLY NEED
from me. I've said many times I'm perfectly willing to give you logs or
run tests, but I'm not about to post a tarball of my entire drive and
the output of every possible command I could ever run. For all the
harping you do about "not enough info", you're just as bad yourself.

> I've gone so far as to give you an example of what to provide:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016814.html

The only thing you ask for there is a dmesg, which I subsequently
provided. Nowhere in that thread do you ask me to give you *anything*
else, besides your generic mantra of "more info". And yes, I did read it
again just now, three times over, to make sure. The closest you come is:

"This is why hard data/logs/etc. are necessary, and why every single
step of the way needs to be provided, including physical tasks
performed."

... but you still never told me WHICH logs or WHAT data you need.
I've already given you the steps I took re: removing drives, steps which
*you yourself* confirmed to express the problem.

> I will again point to the 2nd-to-last paragraph of my above referenced
> mail.

The "2nd-to-last paragraph" is:

"So in summary: there seem to be multiple issues shown above, but I can
confirm that failmode=continue **does** pass EIO to *running* processes
that are doing I/O. Subsequent I/O, however, is questionable at this
time."

Unless you're typing in a language other than English, that isn't asking
me jack shit.

> Once concise details are given and (highly preferable!) a step-by-step
> way to reproduce the issue 100% of the time

*YOU'VE ALREADY REPRODUCED THIS ON YOUR OWN MACHINE.* Seriously, wtf?

______________________________________
it has a certain smooth-brained appeal