From: John Baldwin
To: Mark Felder
Cc: freebsd-hackers@freebsd.org, freebsd-questions@freebsd.org
Date: Thu, 31 May 2012 11:57:42 -0400
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash
Message-Id: <201205311157.42909.jhb@freebsd.org>
References: <201205311048.45813.jhb@freebsd.org>

On Thursday, May 31, 2012 11:11:11 am Mark Felder wrote:
> So when this hang happens, there never is a real panic. It just sits in a
> state which I describe as like being in a deadlock. How would I go about
> getting a crashdump if it never panics? Is it possible to do the dump over
> a network or something because I don't believe it can write through the
> controller at all.

You can break into ddb and run 'call doadump'.
It should use polled IO, so there is a slight chance of it working.

> Also, thank you for the KTR_SCHED tip. This is the type of info I was
> looking for. Unfortunately I've only ever seen this crash once on a kernel
> with debugging enabled. The machine which is currently prepared to do this
> work used to crash a few times a week and now it has 70 days uptime...
> however, it is an example of a machine with mpt0 and em0 sharing an IRQ so
> I might be able to trigger it using Dane's method.
>
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                         392          0
> irq6: fdc0                             9          0
> irq14: ata0                           34          0
> irq18: em0 mpt0               1189748491        218
> cpu0: timer                   2174263198        400
> Total                         3364012124        619
>
> I'm doing my best to get you guys the info you need, but this is one heck
> of a Heisenbug...

Thanks.

-- 
John Baldwin
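
The shared interrupt line in the quoted `vmstat -i` output (irq18, used by both em0 and mpt0) can also be picked out mechanically when checking several machines. The following is only a rough sketch, assuming the column layout shown in the quoted output; the function name `shared_irqs` and the `SAMPLE` constant are made up for illustration, and other FreeBSD releases may format device names differently.

```python
import re
from typing import Dict, List

# Sample copied from the `vmstat -i` output quoted in this message.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                         392          0
irq6: fdc0                             9          0
irq14: ata0                           34          0
irq18: em0 mpt0               1189748491        218
cpu0: timer                   2174263198        400
Total                         3364012124        619
"""

# Assumed layout: "irqN: dev [dev ...]  <total>  <rate>"
LINE_RE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output: str) -> Dict[str, List[str]]:
    """Return {irq: [devices]} for each IRQ line that names more than one device."""
    shared: Dict[str, List[str]] = {}
    for line in vmstat_output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip the header, the cpu timer line and the Total line
        irq = match.group(1)
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[irq] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(f"{irq} is shared by: {', '.join(devices)}")
```

On a live system the same function could be fed the text of `vmstat -i` instead of the pasted sample.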