From: Adam McDougall
Date: Tue, 6 Nov 2007 09:34:13 -0500
To: freebsd-current@freebsd.org
Subject: Re: ZFS Hangs
Message-ID: <20071106143413.GC7920@egr.msu.edu>
In-Reply-To: <20071105170508.GA4037@egr.msu.edu>
List-Id: Discussions about the use of FreeBSD-current

On Mon, Nov 05, 2007 at 12:05:08PM -0500, Adam McDougall wrote:

On Mon, Nov 05, 2007 at 10:24:14AM +0100, Kris Kennaway wrote:

Thomas Sparrevohn wrote:
>> On Sunday 04 November 2007 15:00:50 Kris Kennaway wrote:
>>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>>>
>> Oh my god - Overlooked that ;-) - funny that - It's a bit tricky, as it is not
>> possible to dump a kernel when the swap is on ZFS - I did a test with all
>> debugging enabled and the problem did not show up - which makes it somewhat
>> nasty - I will check if I can reproduce it with only DDB enabled

You can still hook up a serial console, or at the very least take photographs of the screen with the relevant DDB information. Or add another disk and dump on that.

Kris

I have some screenshots of ps in DDB from one of several ZFS hangs I've had on one amd64 system: http://www.egr.msu.edu/~mcdouga9/pics/zfs/

I didn't post every single screenful, since I don't have a microSD reader handy and emailing the pictures off my phone is painful. If I missed a screenshot of one or more particular processes that might have a telling state, let me know.

I also have a gzipped kernel + dump from a forced panic when it was in this state. If a developer is interested in it, please let me know so I can post it somewhere private, since the system is in NIS and likely has tables cached in memory.

It is running a kernel from Oct 17. I tried a kernel with WITNESS, INVARIANTS etc., but it did the same hang without any panic. I completed a zpool scrub this morning with no errors. Lately ZFS seems to wedge up every single night when the rsyncs from remote servers run. This is the only amd64 system I have ZFS on; the other two are i386, and the problems on those systems have only been kmem panics, which so far have been avoidable.

I can help by checking somewhat specific things and running prescribed tests, but right now I don't have time to tackle this problem on this system and learn how to debug it entirely on my own, starting with nothing more than a DDB guide from the handbook. It's not that I refuse to; I recognize it's difficult to join remote skill with local hands for something this technical. Sorry if I seemed negative or unhelpful. I will try on my own if I have time, but I'm pretty busy lately.

On a hunch from other past emails, I tried turning off ZIL, and so far it has survived the night; rsync is still running. The only other changes I made were running the zpool scrub yesterday (no fixes were needed) and applying the patch to make more of the ZFS process states visible in top. I've rebooted several times (each time after ZFS hung), so uptime isn't an issue, but for every day rsync doesn't finish, the next day's rsync might have more updates because it missed a day.

Friday I replaced the motherboard/CPU just as a shot in the dark (since the system had some strange instability in the past), but this didn't help ZFS (not surprised). When ZFS was hung Saturday morning, I tried to reboot it, but reboot would not even get far enough to stop new ssh connections.

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"