Date: Thu, 4 Jul 2013 07:55:48 -0700
From: Jeremy Chadwick
To: Travis Mikalson
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Report: ZFS deadlock in 9-STABLE
On Thu, Jul 04, 2013 at 10:30:17AM -0400, Travis Mikalson wrote:
> Andriy Gapon wrote:
> > on 04/07/2013 16:43 Travis Mikalson said the following:
> >> Yes, that helpful article is where I got the run-down on how best to
> >> report what was going on here. I still believe this is an actual
> >> deadlock bug and not a storage layer issue.
> >>
> >> I have not seen any indications of any problems with my storage layer.
> >> You'd think there would be some scary-looking complaint on the console
> >> during one of these deadlocks if it had suddenly lost the capability to
> >> communicate with most or all the disks, but I've deadlocked at least 10
> >> times now in 2013 and never anything of the sort. Thanks to IPMI, I have
> >> actually viewed the console each time it has happened.
> >
> > Well, I do consider GEOM, CAM, drivers to be parts of the storage layer.
> > In other words, everything below ZFS.
>
> Ah, I believe I understand. It's not necessarily a hardware issue (which
> is what I took away from the original verbiage), the deadlock may have
> occurred in other parts of the storage layer.
>
> FWIW, my simple UFS compact flash that I boot from also becomes
> inaccessible during these deadlocks. All UFS and ZFS storage goes dead
> simultaneously. If it were purely a ZFS issue, I suppose one might
> expect to still be able to read from their UFS filesystem.

I'd like to get output from all of these commands:

- dmesg (you can hide/XXX out the system name if you want, but please
  don't remove anything else, barring IP addresses/etc.)
- zpool get all
- zfs get all
- "gpart show -p" for every disk on the system
- "vmstat -i" when the system is livelocked (if possible; see below)
- The exact brand and model string of the mps(4) controllers you're using
- The exact firmware version and firmware type (often a 2-letter code)
  you're using on your mps(4) controllers (dmesg might show some of this,
  but possibly not all)
- Is powerd(8) running on this system at all?

Please put these in separate files and upload them to
http://tog.net/freebsd/ if you could.  (For the gpart output, you can
put the output from all the disks in a single file.)

I can see your ZFS disks are probably using those mps(4) controllers.
I also see you have an AHCI controller.  I know you can't move all your
disks to the AHCI controller because there aren't enough ports, and the
controller might not even support SAS disks (it depends; some newer,
higher-end Intel ones do), but:

A "CF drive locking up too" doesn't really tell us anything about the
CF drive, how it's hooked up, etc.  But I'd rather not even go into
that, because:

Advice: Hook a SATA disk up to your ahci(4) controller and just leave it
there.  No filesystem, just a raw disk sitting on a bus.  When the
livelock happens, in another window issue
"dd if=/dev/ada0 of=/dev/null bs=64k" (the disk might not be named ada0;
again, I need that dmesg) and after a second or two press Ctrl-T to see
if you get any output (Ctrl-T sends SIGINFO, and dd(1) responds by
printing its transfer statistics, so output should be immediate).  If
you do get output, it means GEOM and/or CAM are still functional in some
manner, and that puts more focus on the mps(4) side of things.  There
are still many possible explanations for what's going on, though.  Which
leads me to...

Question: If the system is livelocked, how are you running
"procstat -kk -a" in the first place?  Or does it "livelock" and then
release itself from the pain (eventually), only later to re-lock?
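If it helps, the outputs listed earlier in this message could be gathered
into separate files with a small sh(1) script along these lines.  This is
only a sketch: it assumes FreeBSD's base utilities (the kern.disks sysctl
to enumerate disks, gpart(8), pgrep(1)), the output directory name
"zfs-diag" is made up, and it does not cover the mps(4) model and
firmware details, which still have to be read from dmesg or the
controller itself.  Run it while the system is livelocked (if a shell
still responds) so the "vmstat -i" snapshot reflects that state.

```sh
#!/bin/sh
# Sketch: collect the requested diagnostics into one file per command.
# Assumes FreeBSD base utilities; the default directory name is hypothetical.
outdir="${1:-./zfs-diag}"
mkdir -p "$outdir" || exit 1

# run NAME CMD...: save stdout and stderr of CMD to $outdir/NAME.txt.
# A failing or missing command is noted in its file; the script keeps going.
run() {
    name=$1
    shift
    "$@" > "$outdir/$name.txt" 2>&1 ||
        echo "(command failed: $*)" >> "$outdir/$name.txt"
}

run dmesg dmesg
run zpool-get-all zpool get all
run zfs-get-all zfs get all
run vmstat-i vmstat -i

# kern.disks lists the disks the kernel knows about (FreeBSD-specific).
# All gpart output goes into a single file, one section per disk.
: > "$outdir/gpart.txt"
for disk in $(sysctl -n kern.disks 2>/dev/null); do
    echo "=== $disk ===" >> "$outdir/gpart.txt"
    gpart show -p "$disk" >> "$outdir/gpart.txt" 2>&1
done

# Record whether a process named exactly "powerd" is running.
if pgrep -x powerd > /dev/null 2>&1; then
    echo "powerd is running" > "$outdir/powerd.txt"
else
    echo "powerd is not running" > "$outdir/powerd.txt"
fi
```

Putting each command's output, including any error text, in its own file
means one failing command (e.g. zpool hanging on a dead pool is a
different story, but a plain error) doesn't prevent the rest from being
collected.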
A "livelock" usually implies the system is still alive in some way (for
example, pressing NumLock on the keyboard, hopefully a PS/2 one, still
toggles the LED; the kernel handles that, and I've used it for years to
tell whether a system is locked up), just that some layer relevant to
your problem (ZFS I/O) is misbehaving.  If it comes and goes, there may
be explanations for that, but the output from those commands would help
greatly.

Question: Why are you setting ZFS tunables in loader.conf and
sysctl.conf?  I'm not saying those are the issue; I'm just asking why
you're setting them at all.  Is there something you've run into in the
past that we need to know about?

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |