From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 10 23:07:15 2013
Message-ID: <51DDE91E.4000400@unsane.co.uk>
Date: Thu, 11 Jul 2013 00:07:10 +0100
From: Vincent Hoffman
To: Kevin Day
Cc: hackers@freebsd.org, Jordan Hubbard
Subject: Re: Kernel dumps [was Re: possible changes from Panzura]
References: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com> <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>
In-Reply-To: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>
List-Id: Technical Discussions relating to FreeBSD

On 10/07/2013 23:09, Kevin Day wrote:
>>
>> Those sound useful.
>> Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>>
>> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!
>
> At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn’t as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:
>
> 1) We didn’t want to implement ARP, so you had to write the MAC address of the “dump server” to the kernel via sysctl before crashing.
> 2) We also didn’t want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn’t want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn’t too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to “disable normal processing, switch to polled fifos for input and output” until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel’s malloc was trashed, we could still dump.
>
> I’m not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I’m not sure how generically the kernel could support an emergency network mode that doesn’t require interrupts for every network card out there. Maybe that isn’t as important to you as it was to us.
>
> The whole exercise is much easier if you don’t use TFTP but a custom protocol that doesn’t require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether it’s working or not, it’s a lot less code to write.
>

There was some work on something similar at one point, not sure what came of it.
http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html

Vince
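
For illustration only, here is a rough sketch of the "blast away, never receive" idea from Kevin's last paragraph. It is not the code from his system or from the linked thread. The frame layout (sequence number, byte offset, payload length), the names `make_frames` and `reassemble`, and the 1024-byte chunk size are all assumptions made up for this example. The point is that if every packet says where its payload belongs, the crashing side never has to wait for an acknowledgement, and the dump server can cope with loss and reordering by itself. The actual UDP send and receive calls are left out so the sketch stays self-contained.

```python
import struct

# Hypothetical frame header (not from the thread): 32-bit sequence number,
# 64-bit byte offset into the dump image, 16-bit payload length.
HEADER = struct.Struct("!IQH")
CHUNK_SIZE = 1024  # assumed payload size, small enough for a typical MTU


def make_frames(image, chunk_size=CHUNK_SIZE):
    """Sender side: yield self-describing frames covering the whole image."""
    for seq, offset in enumerate(range(0, len(image), chunk_size)):
        payload = image[offset:offset + chunk_size]
        yield HEADER.pack(seq, offset, len(payload)) + payload


def reassemble(frames, total_size):
    """Receiver side: rebuild the image from frames arriving in any order.

    Returns (image, missing), where missing is a list of (start, end)
    byte ranges that never arrived.
    """
    image = bytearray(total_size)
    received = []  # (offset, length) of every accepted frame
    for frame in frames:
        if len(frame) < HEADER.size:
            continue  # runt frame, ignore it
        _seq, offset, length = HEADER.unpack_from(frame)
        payload = frame[HEADER.size:HEADER.size + length]
        if len(payload) != length or offset + length > total_size:
            continue  # truncated or out-of-range frame, ignore it
        image[offset:offset + length] = payload
        received.append((offset, length))

    # Walk the sorted extents to find the gaps left by lost packets.
    missing, cursor = [], 0
    for offset, length in sorted(received):
        if offset > cursor:
            missing.append((cursor, offset))
        cursor = max(cursor, offset + length)
    if cursor < total_size:
        missing.append((cursor, total_size))
    return bytes(image), missing
```

In this sketch the sender would simply hand each frame to the polled transmit path and move on. The receiver reports which byte ranges are missing, so a partial dump is still usable and the gaps are known.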