Date: Wed, 10 Jul 2013 17:09:10 -0500
From: Kevin Day <toasty@dragondata.com>
To: Jordan Hubbard <jkh@mail.turbofuzz.com>
Cc: hackers@freebsd.org
Subject: Re: Kernel dumps [was Re: possible changes from Panzura]
Message-ID: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>
In-Reply-To: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com>
References: <FDEEB55D-823B-4899-8EEC-7F5306D91F5B@elischer.org> <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com>
> Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

At a previous employer, we had a system where, on a panic, a totally separate stack capable of just IP/UDP/TFTP would save the core via TFTP to a server. This isn't as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:

1) We didn't want to implement ARP, so you had to write the MAC address of the "dump server" to the kernel via sysctl before crashing.
2) We also didn't want to deal with routing tables, so you had to manually specify which interface to blast packets out of, also via sysctl.
3) After a panic we didn't want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn't too big of a deal: only one network driver had to be hacked to support this. It was basically a flag that would "disable normal processing, switch to polled FIFOs for input and output" until reboot.
4) The whole system used only preallocated buffers and its own stack (carved out of memory at boot), so even if the kernel's malloc was trashed, we could still dump.

I'm not sure this would really scratch your itch, but I believe it took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I'm not sure how generically the kernel could support, across every network card out there, an emergency network mode that doesn't require interrupts. Maybe that isn't as important to you as it was to us.

The whole exercise is much easier if you don't use TFTP but a custom protocol that doesn't require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether that host is working or not, it's a lot less code to write.
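To make the send-only idea concrete, here is a minimal userland sketch in C, assuming a made-up wire format; it is not the code from that system and not a FreeBSD API. Every name in it (`struct netdump_cfg`, `netdump_build_frame`, `netdump_send_core`, `netdump_tx_fn`, `NETDUMP_MAGIC`, the 12-byte magic-plus-offset header, the 1024-byte chunk size) is hypothetical. It mirrors the caveats above: the destination MAC and addresses come from configuration (standing in for the sysctls, so no ARP or routing), frames are built in one preallocated static buffer (no malloc), and the transmit hook stands in for a polled, interrupt-free driver path. Each frame carries the byte offset of its chunk, so the receiver never has to reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN   14
#define IPV4_HDR_LEN  20
#define UDP_HDR_LEN   8
#define DUMP_HDR_LEN  12           /* hypothetical: 4-byte magic + 8-byte offset */
#define DUMP_CHUNK    1024         /* hypothetical payload size per frame */
#define NETDUMP_MAGIC 0x4e444d50u  /* "NDMP", hypothetical */

/* Values an admin would set ahead of time (e.g. via sysctl); hypothetical. */
struct netdump_cfg {
    uint8_t  dst_mac[6], src_mac[6]; /* dump server MAC: no ARP needed */
    uint32_t src_ip, dst_ip;         /* host byte order */
    uint16_t src_port, dst_port;
};

/* Hypothetical polled transmit hook: returns 0 once the NIC accepted the frame. */
typedef int (*netdump_tx_fn)(const uint8_t *frame, size_t len);

/* Preallocated at load time so a trashed allocator cannot stop the dump. */
static uint8_t frame_buf[ETH_HDR_LEN + IPV4_HDR_LEN + UDP_HDR_LEN +
                         DUMP_HDR_LEN + DUMP_CHUNK];

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)(v & 0xff); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }
static void put64(uint8_t *p, uint64_t v) { put32(p, (uint32_t)(v >> 32)); put32(p + 4, (uint32_t)v); }

/* RFC 1071 one's-complement checksum over big-endian 16-bit words. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build one Ethernet/IPv4/UDP frame in frame_buf; returns its length, or 0. */
size_t netdump_build_frame(const struct netdump_cfg *cfg, uint16_t ip_id,
                           uint64_t offset, const uint8_t *data, size_t len)
{
    if (len > DUMP_CHUNK)
        return 0;
    uint8_t *eth = frame_buf, *ip = eth + ETH_HDR_LEN;
    uint8_t *udp = ip + IPV4_HDR_LEN, *dh = udp + UDP_HDR_LEN;
    size_t udp_len = UDP_HDR_LEN + DUMP_HDR_LEN + len;
    size_t ip_len = IPV4_HDR_LEN + udp_len;

    memcpy(eth, cfg->dst_mac, 6);          /* preconfigured next hop */
    memcpy(eth + 6, cfg->src_mac, 6);
    put16(eth + 12, 0x0800);               /* EtherType: IPv4 */

    ip[0] = 0x45;                          /* version 4, 5-word header */
    ip[1] = 0;
    put16(ip + 2, (uint16_t)ip_len);
    put16(ip + 4, ip_id);
    put16(ip + 6, 0x4000);                 /* Don't Fragment */
    ip[8] = 64;                            /* TTL */
    ip[9] = 17;                            /* protocol: UDP */
    put16(ip + 10, 0);                     /* zero before checksumming */
    put32(ip + 12, cfg->src_ip);
    put32(ip + 16, cfg->dst_ip);
    put16(ip + 10, inet_cksum(ip, IPV4_HDR_LEN));

    put16(udp, cfg->src_port);
    put16(udp + 2, cfg->dst_port);
    put16(udp + 4, (uint16_t)udp_len);
    put16(udp + 6, 0);                     /* 0 = no UDP checksum (IPv4 only) */

    put32(dh, NETDUMP_MAGIC);
    put64(dh + 4, offset);                 /* where this chunk goes in the core */
    if (len)
        memcpy(dh + DUMP_HDR_LEN, data, len);

    size_t total = ETH_HDR_LEN + ip_len;
    assert(dh + DUMP_HDR_LEN + len == frame_buf + total);
    return total;
}

/* Fire-and-forget: never waits for or parses any reply. Returns bytes handed to tx. */
uint64_t netdump_send_core(const struct netdump_cfg *cfg, const uint8_t *core,
                           uint64_t core_len, netdump_tx_fn tx)
{
    uint64_t sent = 0;
    uint16_t id = 1;
    for (uint64_t off = 0; off < core_len; off += DUMP_CHUNK) {
        size_t n = (core_len - off < DUMP_CHUNK) ? (size_t)(core_len - off) : DUMP_CHUNK;
        size_t flen = netdump_build_frame(cfg, id++, off, core + off, n);
        if (flen == 0 || tx(frame_buf, flen) != 0)
            break;  /* local TX failure; there are no ACKs to retry against */
        sent += n;
    }
    return sent;
}
```

On the receiving side, a listener would check the magic and write each payload at its offset in a file; any dropped frame simply leaves a hole in the core. That is the price of never receiving, and it is what removes the TFTP ACK/retransmit state machine from the crashing kernel.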