Date: Tue, 9 Apr 2013 12:52:01 +0100
Subject: Re: ZFS: Failed pool causes system to hang
From: Tom Evans
To: Quartz
Cc: FreeBSD FS

On Tue, Apr 9, 2013 at 11:40 AM, Quartz wrote:
>> So, you're not really waiting a long time....
>
> I still don't think you're 100% clear on what's happening in my case.
> I'm trying to explain that my problem is *prior* to the motherboard
> resetting, NOT after. If I hard-reset the machine with the front panel
> switch, it boots just fine every time.
>
> When my pool *FAILS* (i.e., is unrecoverable because I lost too many
> drives) it hangs effectively all I/O on the entire machine. I can't cd
> or ls directories, I can't run any zfs commands, and I can't issue a
> reboot or halt. This is a hang. The machine is completely useless in
> this state. There is no disk or CPU activity churning. There's no pool
> (anymore) to be trying to resilver or whatever anyway.
>
> I'm not going to wait 3+ hours for "shutdown -r now" to bring the
> machine down. Especially not when I already know that zfs won't let it.

I think what Lawrence is trying to explain is that a "hang" is not
necessarily a deadlock: leaving the system for an extended period may
bring it back. What you are saying is also valid, that a hang that long
is, for your purposes, equivalent to a deadlock.

Computers, even essential dedicated servers, sometimes hang, which is why
it is common to have some way of power cycling them remotely. If your
server is important, you need some sort of remote access controller (RAC)
for these scenarios.

So, how do we find out where the hang is? Your ZFS pools and your root
disk probably (I've not seen a dmesg) share one thing in common:
ATA/AHCI. If the root disk does not also use ATA/AHCI, does losing the
pool still cause problems with root? Perhaps breaking into ddb while the
machine is hung could tell us something.

Cheers

Tom