Date:      Sat, 13 Apr 2013 19:02:05 -0400
From:      Quartz <quartz@sneakertech.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: A failed drive causes system to hang
Message-ID:  <5169E3ED.1000900@sneakertech.com>
In-Reply-To: <20130413213630.GA6018@icarus.home.lan>
References:  <mailman.11.1365681601.78138.freebsd-fs@freebsd.org> <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <516917CA.5040607@sneakertech.com> <20130413154130.GA877@icarus.home.lan> <5169C747.8030806@sneakertech.com> <20130413213630.GA6018@icarus.home.lan>

> This conflicts directly with your above statement.

Yeah, I realized that after looking things over again. See the email I 
sent a few minutes ago.

I'm going to do some testing to get a bead on exactly what does and 
doesn't fail, and when. It's a slow process though.


> If you have other proof that indicates otherwise (such as non-ZFS
> filesystems start also stalling/causing problems), please provide those
> details.

The main counterexample I have is that I know it hung when I tried
to cd to my home directory. It has also hung when I ran other random
commands I wouldn't expect to be ZFS-related, but I don't remember
exactly everything I typed or how long after pulling the drives it
happened, so I'm going to try to figure out what the pattern is.


> But as it stands, we don't even know what the "boot drive"
> consists of (filesystems, etc.) because you haven't provided any of that
> necessary information.

Yes, I did, on Tuesday in fact, but I'll repeat it here since you
obviously missed it. It's a single UFS disk that houses the entire
system: root, /var, /home, /etc, the swap partition... all of it. It's
not any form of RAID or dual boot or anything special, just a stock
single-disk default install with no custom configuration.


> What you've stated happens differs from the above,

Everything you wrote matches what I'm seeing, which is why I
considered you to have confirmed the underlying problem. The only
differences are that "zpool status" and other ZFS-related commands are
not guaranteed to work (or at least not indefinitely), and "all io on
the boot drive works for me guaranteed" (which you didn't list here,
but which I'm assuming is implied).


> The paragraph I was referring to:

[pedantic]
OK, that's the last paragraph. What follows it is a postscript, which
doesn't count.
[/pedantic]


> All I was able to reproduce was that I/O ***to the pool*** (once it's
> broken) stalls.  I'm still waiting on you to go through the same
> method/model I did here, providing all the data:

> you'll see that I go "step by step" looking at certain things,

> and this is why I
> keep asking for you to please go step-by-step in reproducing your issue,
> provide all output (including commands you issue), and all physical
> tasks performed, plus what the console shows each step of the way.

> So can you
> please go through the same procedure/methodology and do the same
> write-up I did, but with your system?

Look, I can't read your mind any more than you can read mine. I can
handle "run command xyz -abc and tell me what FOO is set to", or "yank
each drive in order by /dev ID, wait one second, and run zpool status".
I can't handle "find all the important stuffs", and I'm not going to
assume that just because someone ran some test or series of commands,
they silently expect me to do the same. If you wanted me to run through
those exact steps in that exact order and give you the output, it
would've been nice if you'd actually *said* so at some point.


-

Although you seem to want to disagree with me, you did confirm the main
issue: you're also seeing a ZFS-related hang. The point of contention is
my observation that ZFS commands and non-ZFS I/O may also fail after
some undefined period of time. It's not clear to me how long your test
system was up, or whether you waited long enough to see the same
issues. Either way, I need to do more testing to figure out whether
there's a pattern here.

______________________________________
it has a certain smooth-brained appeal


