From: Adam McDougall
To: Kris Kennaway
Cc: freebsd-current@freebsd.org
Date: Mon, 5 Nov 2007 12:05:08 -0500
Subject: Re: ZFS Hangs
Message-ID: <20071105170508.GA4037@egr.msu.edu>
In-Reply-To: <472EE13E.9030908@FreeBSD.org>

On Mon, Nov 05, 2007 at 10:24:14AM +0100, Kris Kennaway wrote:

> Thomas Sparrevohn wrote:
>> On Sunday 04 November 2007 15:00:50 Kris Kennaway wrote:
>>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>>>
>> Oh my god - Overlooked that ;-) - funny that - Its a bit tricky as it not
>> possibly to dump a kernel when the swap is on ZFS - I did a test with all
>> debugging enabled and the problem did not show up - which makes it
>> somewhat nasty - I check if I can reproduce it with only DDB enabled
>
> You can still hook up a serial console, or at the very least take
> photographs of the screen with the relevant DDB information. Or add
> another disk and dump on that.
>
> Kris

I have some screenshots of ps in ddb from one of several zfs hangs I've had on one amd64 system:

http://www.egr.msu.edu/~mcdouga9/pics/zfs/

I didn't post every single screenful since I don't have a microsd reader handy, and emailing the pictures off my phone is painful. If I missed a screenshot of one or more particular processes that might have a telling state, let me know.

I also have a gzipped kernel + dump from a forced panic when it was in this state. If a developer is interested in it, please let me know so I can post it somewhere private, since the system is in NIS and likely has tables cached in memory.

It is running a kernel from Oct 17. I tried a kernel with WITNESS, INVARIANTS etc., but it did the same hang without any panic. I completed a zpool scrub this morning with no errors. Lately zfs seems to wedge up every single night when rsync from remote servers runs. This is the only amd64 system I have zfs on; the other two are i386, and the problems on those systems have only been kmem panics, which so far have been avoidable.

I can help by checking somewhat specific things and running prescribed tests, but right now I don't have time to tackle this problem on this system and learn how to debug it entirely on my own, starting with nothing more than a DDB guide from the handbook. It's not that I refuse to; I recognize it's difficult to join remote skill with local hands for something this technical.

Friday I replaced the motherboard/cpu just as a shot in the dark (since the system had some strange instability in the past), but this didn't help zfs (not surprised). When zfs was hung Saturday morning, I tried to reboot it, but reboot would not even get far enough to stop new ssh connections.