From owner-freebsd-fs@FreeBSD.ORG  Tue Apr  9 11:52:03 2013
In-Reply-To: <5163F03B.9060700@sneakertech.com>
References: <2092374421.4491514.1365459764269.JavaMail.root@k-state.edu>
 <5163F03B.9060700@sneakertech.com>
Date: Tue, 9 Apr 2013 12:52:01 +0100
Message-ID: <CAFHbX1LO9OvbqyYYaob-7nQSA_dwQkMK7+vn9c4QrXQuKvTCFA@mail.gmail.com>
Subject: Re: ZFS: Failed pool causes system to hang
From: Tom Evans <tevans.uk@googlemail.com>
To: Quartz <quartz@sneakertech.com>
Content-Type: text/plain; charset=UTF-8
Cc: FreeBSD FS <freebsd-fs@freebsd.org>

On Tue, Apr 9, 2013 at 11:40 AM, Quartz <quartz@sneakertech.com> wrote:
>
>> So, you're not really waiting a long time....
>
>
> I still don't think you're 100% clear on what's happening in my case. I'm
> trying to explain that my problem is *prior* to the motherboard resetting,
> NOT after. If I hard-reset the machine with the front panel switch, it boots
> just fine every time.
>
> When my pool *FAILS* (i.e., is unrecoverable because I lost too many drives)
> it hangs effectively all I/O on the entire machine. I can't cd or ls
> directories, I can't run any zfs commands, and I can't issue a reboot or
> halt. This is a hang. The machine is completely useless in this state. There
> is no disk or cpu activity churning. There's no pool (anymore) to be trying
> to resilver or whatever anyway.
>
> I'm not going to wait 3+ hours for "shutdown -r now" to bring the machine
> down. Especially not when I already know that zfs won't let it.
>

I think what Lawrence is trying to explain is that a "hang" is not
necessarily a deadlock: leaving the system for an extended period may
bring it back. Your point is also valid: a hang that long is
equivalent to a deadlock for your purposes. Computers, even essential
dedicated servers, sometimes hang, which is why it is common to have
some way of power cycling them remotely. If your server is important,
you need some sort of remote access controller (RAC) for these
scenarios.

So, how do we find out where the hang is? Your ZFS pools and your root
disk probably share one thing in common (I've not seen a dmesg):
ATA/AHCI. If root is not also on ATA/AHCI, does losing the pool still
cause problems with root? Breaking into ddb while the machine is hung
might tell us something.
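
A rough sketch of the "is root still responsive?" test, assuming
Python is installed and was started before the pool failed. The
function name and the paths you pass it are placeholders, not anything
from your setup. Each path is listed in a child process with a
timeout, so a stuck syscall blocks the child and not the script:

```python
import subprocess
import sys


def responds(path, timeout=5.0):
    """Return True if listing `path` completes cleanly within `timeout` seconds.

    Returns False on a timeout and also on any error (e.g. missing path).
    """
    # Do the listing in a child process: if the filesystem is hung,
    # only the child gets stuck in the kernel.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Deliberately do not wait for the child after kill(): a process
        # stuck in disk wait may never exit.
        child.kill()
        return False


if __name__ == "__main__":
    # Paths come from the command line; "/" is only a default.
    for path in sys.argv[1:] or ["/"]:
        print(path, "ok" if responds(path) else "no response")
```

Run it with the root filesystem and the pool's mount point as
arguments once the pool has failed. Note that starting the child can
itself hang if the Python binary lives on the affected disks, which
would also be informative.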

Cheers

Tom