From: Carl
Date: Mon, 23 Feb 2009 15:47:06 -0800
To: Robert Watson
Cc: freebsd-fs@freebsd.org
Subject: Re: UFS2 and/or sparse file bug causing copy process to land in 'D'' state?

Robert Watson wrote:
> It would be interesting to get kernel stack traces of the involved
> processes/threads; there are various ways to do this, such as using
> DDB.  If you have a kernel.symbols for the kernel, then you can run kgdb
> on kernel.symbols and /dev/mem to generate traces without interrupting
> operation (although if the system is in the throes of deadlocking, that
> may not be a concern or even possible).  You can also use procstat -kk
> to retrieve kernel stack traces, with a bit less information (such as no
> arguments) to help narrow things down more.
>
> Unfortunately, debugging this type of problem, as you've intuited, is
> best done with serial console access and a local box so that the
> debugging information can be extracted.  It would be interesting to know
> if you can force a crashdump on the box to get the information for
> post-mortem debugging.  This may be possible using "reboot -d" -- I've
> never used this, but have every reason to think it will work.

I have both a local and remote box. The problems I'm seeing are all
occurring on the local box because as yet I cannot afford to cause them
on a remote box. If you were to guess I've never used DDB or any other
kernel debugging, you'd be spot on.

I'm currently running the 7.0-RELEASE GENERIC kernel. I see a
/boot/kernel/kernel.symbols in the filesystem. The system is nominally
headless with a serial console, although I primarily use SSH.

Even if I knew what to do with them, actually collecting kernel dumps is
a hit or miss affair because of gmirror, but this particular problem
doesn't cause kernel core dumps on its own (thankfully, since gmirror
resyncs take a long time on terabyte drives).

So, if you were able to clearly spell out the stripped down steps I
should take in conjunction with my earlier truncate sequence and if it
doesn't require rebuilding the kernel, I might be able to accommodate.
Learning all about kernel debugging would be interesting but doesn't fit
in my schedule right now.

Anyone willing to attempt to reproduce this problem on their system?

Carl / K0802647