From: Kevin Day
To: Jordan Hubbard
Cc: hackers@freebsd.org
Date: Wed, 10 Jul 2013 17:09:10 -0500
Subject: Re: Kernel dumps [was Re: possible changes from Panzura]
Message-Id: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>
In-Reply-To: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com>

> Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage.
> It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

At a previous employer, we had a system where, on a panic, a totally separate stack capable of just IP/UDP/TFTP would save the core via TFTP to a server. This isn't as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:

1) We didn't want to implement ARP, so you had to write the MAC address of the "dump server" to the kernel via sysctl before crashing.

2) We also didn't want to have to deal with routing tables, so you had to manually specify which interface to blast packets out of, also via sysctl.

3) After a panic we didn't want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn't too big of a deal: only one network driver had to be hacked to support this. Basically it was a flag that would "disable normal processing, switch to polled FIFOs for input and output" until reboot.

4) The whole system used only preallocated buffers and its own stack (carved out of memory at boot), so even if the kernel's malloc was trashed, we could still dump.

I'm not sure this would really scratch your itch, but I believe it took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I'm not sure how generically the kernel could support an emergency network mode that doesn't require interrupts for every network card out there. Maybe that isn't as important to you as it was to us.

The whole exercise is much easier if you don't use TFTP but a custom protocol that doesn't require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether it's working or not, it's a lot less code to write.
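Caveats 1, 2 and 4 together mean the dump path never has to look anything up: it just stamps pre-configured addresses into a buffer that already exists. A minimal userland C sketch of that idea follows. It assumes a hypothetical `struct dump_netconf` filled in ahead of time (in the setup described above that happened via sysctl; the actual sysctl names aren't given, so none are shown) and a single static frame buffer. `build_dump_frame` and the other names are made up for illustration. It builds an Ethernet/IPv4/UDP frame by hand, including the IPv4 header checksum; handing the frame to a polled driver is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical settings written before any crash (e.g. via sysctl). */
struct dump_netconf {
    uint8_t  dst_mac[6];  /* dump server's MAC: supplied up front, so no ARP */
    uint8_t  src_mac[6];  /* MAC of the one interface we transmit on: no routing */
    uint32_t src_ip;      /* IPv4 addresses in host byte order */
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

#define ETH_HDR_LEN 14
#define IP_HDR_LEN  20
#define UDP_HDR_LEN 8
#define FRAME_MAX   1514
static_assert(ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN < FRAME_MAX,
              "headers must leave room for payload");

/* Preallocated at boot: nothing on the dump path calls malloc. */
static uint8_t dump_frame[FRAME_MAX];

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Internet checksum (RFC 1071): ones' complement of the ones' complement sum. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill dump_frame; returns the frame length, or 0 if the payload won't fit.
 * Padding frames shorter than the Ethernet minimum is left to the driver/NIC. */
static size_t build_dump_frame(const struct dump_netconf *c, const void *payload, size_t len)
{
    if (len > FRAME_MAX - ETH_HDR_LEN - IP_HDR_LEN - UDP_HDR_LEN)
        return 0;
    uint8_t *eth = dump_frame, *ip = eth + ETH_HDR_LEN, *udp = ip + IP_HDR_LEN;

    memcpy(eth, c->dst_mac, 6);
    memcpy(eth + 6, c->src_mac, 6);
    put16(eth + 12, 0x0800);                  /* EtherType: IPv4 */

    memset(ip, 0, IP_HDR_LEN);                /* also zeroes the checksum field */
    ip[0] = 0x45;                             /* version 4, 5-word header */
    put16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + len));
    ip[8] = 64;                               /* TTL */
    ip[9] = 17;                               /* protocol: UDP */
    put32(ip + 12, c->src_ip);
    put32(ip + 16, c->dst_ip);
    put16(ip + 10, inet_cksum(ip, IP_HDR_LEN));

    put16(udp, c->src_port);
    put16(udp + 2, c->dst_port);
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + len));
    put16(udp + 6, 0);                        /* 0 = no UDP checksum (allowed over IPv4) */
    memcpy(udp + UDP_HDR_LEN, payload, len);
    return ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + len;
}
```

Leaving the UDP checksum at zero saves a pass over every payload byte on a machine that is already in trouble, at the cost of relying on the Ethernet CRC alone.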
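The send-only variant from the last paragraph mostly comes down to making every packet self-describing, so the receiver can reassemble the core without ever talking back. Here is a sketch under that assumption. The 16-byte header layout, the `DUMP_MAGIC` value and the `blast_dump`/`place_chunk` names are all hypothetical, and `xmit` stands in for whatever polled transmit routine the driver would provide. The loopback callback at the end exists only so the sketch can be exercised in userland.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC     0x44554d50u  /* hypothetical tag ("DUMP") marking our packets */
#define DUMP_CHUNK     1024         /* payload bytes per packet; assumed to fit one frame */
#define DUMP_HDR_LEN   16           /* magic(4) + offset(8) + length(2) + flags(2) */
#define DUMP_FLAG_LAST 0x1
static_assert(DUMP_CHUNK <= 0xffff, "length field is 16 bits");

typedef void (*xmit_fn)(const uint8_t *pkt, size_t len, void *ctx);

/* Big-endian field helpers, so the wire format doesn't depend on host layout. */
static void be_put(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) { p[i] = (uint8_t)v; v >>= 8; }
}
static uint64_t be_get(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) v = (v << 8) | p[i];
    return v;
}

/* Sender: fire and forget, never waits for (or reads) a reply. */
static size_t blast_dump(const uint8_t *mem, uint64_t size, xmit_fn xmit, void *ctx)
{
    static uint8_t pkt[DUMP_HDR_LEN + DUMP_CHUNK];  /* preallocated */
    size_t npkts = 0;
    uint64_t off = 0;
    do {
        uint64_t n = size - off < DUMP_CHUNK ? size - off : DUMP_CHUNK;
        be_put(pkt, DUMP_MAGIC, 4);
        be_put(pkt + 4, off, 8);                    /* where this chunk belongs */
        be_put(pkt + 12, n, 2);
        be_put(pkt + 14, off + n == size ? DUMP_FLAG_LAST : 0, 2);
        memcpy(pkt + DUMP_HDR_LEN, mem + off, n);
        xmit(pkt, DUMP_HDR_LEN + n, ctx);
        off += n;
        npkts++;
    } while (off < size);
    return npkts;
}

/* Receiver: drop the chunk in at its offset. Returns 1 for the last packet,
 * 0 for any other valid packet, -1 for garbage. Lost packets leave holes. */
static int place_chunk(const uint8_t *pkt, size_t len, uint8_t *image, uint64_t image_size)
{
    if (len < DUMP_HDR_LEN || be_get(pkt, 4) != DUMP_MAGIC)
        return -1;
    uint64_t off = be_get(pkt + 4, 8);
    uint64_t n = be_get(pkt + 12, 2);
    if (n != len - DUMP_HDR_LEN || off > image_size || n > image_size - off)
        return -1;
    memcpy(image + off, pkt + DUMP_HDR_LEN, n);
    return (int)(be_get(pkt + 14, 2) & DUMP_FLAG_LAST);
}

/* Userland loopback "transmit": hands each packet straight to the receiver. */
struct loopback { uint8_t *image; uint64_t size; int saw_last; };
static void loopback_xmit(const uint8_t *pkt, size_t len, void *ctx)
{
    struct loopback *lb = ctx;
    if (place_chunk(pkt, len, lb->image, lb->size) == 1)
        lb->saw_last = 1;
}
```

Because each packet carries its own offset, a dropped packet costs a hole in the image rather than a stalled sender. The flip side is that, with no acknowledgements, the receiver can tell that a transfer ended (from the last-packet flag) but not that it is complete.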
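For comparison, the TFTP route has the sender wait for an ACK after every 512-byte DATA block (the lock-step transfer in RFC 1350), which is exactly why the polled receive path in caveat 3 was needed. This is a rough sketch, assuming a hypothetical `struct polled_nic` whose `send`/`recv` hooks spin on the NIC instead of taking interrupts, with IP/UDP framing handled underneath them; `tftp_put` and the other names are illustrative. It simplifies the protocol: it ignores that a real TFTP server answers from a new UDP port (TID) the client must switch to, it retransmits on any unexpected packet, and it doesn't handle ERROR packets. The fake server exists only to exercise the code in userland.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TFTP_WRQ = 2, TFTP_DATA = 3, TFTP_ACK = 4 };  /* opcodes from RFC 1350 */
#define TFTP_BLOCK 512
static_assert(TFTP_BLOCK == 512, "RFC 1350 DATA blocks are 512 bytes");

/* Hypothetical polled-driver hooks: no interrupts, the caller just spins. */
struct polled_nic {
    void   (*send)(const uint8_t *pkt, size_t len, void *ctx);
    size_t (*recv)(uint8_t *buf, size_t cap, void *ctx);  /* 0 = nothing within spin budget */
    void   *ctx;
};

static uint8_t txbuf[4 + TFTP_BLOCK], rxbuf[4 + TFTP_BLOCK];  /* preallocated */

static int is_ack(size_t len, uint16_t blk)
{
    return len >= 4 && rxbuf[0] == 0 && rxbuf[1] == TFTP_ACK &&
           (uint16_t)(rxbuf[2] << 8 | rxbuf[3]) == blk;
}

/* Send txbuf and block until the ACK for blk shows up; any miss means resend. */
static int send_and_wait(struct polled_nic *nic, size_t len, uint16_t blk, int tries)
{
    while (tries-- > 0) {
        nic->send(txbuf, len, nic->ctx);
        if (is_ack(nic->recv(rxbuf, sizeof rxbuf, nic->ctx), blk))
            return 0;
    }
    return -1;
}

/* Write img to the server as `name`, one DATA block at a time. */
static int tftp_put(struct polled_nic *nic, const char *name, const uint8_t *img, size_t size)
{
    size_t nl = strlen(name);
    if (2 + nl + 1 + sizeof "octet" > sizeof txbuf)
        return -1;
    txbuf[0] = 0; txbuf[1] = TFTP_WRQ;                 /* WRQ: name NUL mode NUL */
    memcpy(txbuf + 2, name, nl + 1);
    memcpy(txbuf + 3 + nl, "octet", sizeof "octet");
    if (send_and_wait(nic, 3 + nl + sizeof "octet", 0, 5) != 0)  /* WRQ is ACKed as block 0 */
        return -1;

    uint16_t blk = 1;
    size_t off = 0;
    for (;;) {
        size_t n = size - off < TFTP_BLOCK ? size - off : TFTP_BLOCK;
        txbuf[0] = 0; txbuf[1] = TFTP_DATA;
        txbuf[2] = (uint8_t)(blk >> 8); txbuf[3] = (uint8_t)blk;
        memcpy(txbuf + 4, img + off, n);
        if (send_and_wait(nic, 4 + n, blk, 5) != 0)
            return -1;
        off += n;
        if (n < TFTP_BLOCK)   /* a short (possibly empty) block ends the transfer */
            return 0;
        blk++;                /* 16-bit block number */
    }
}

/* Userland stand-in for a server: ACKs the latest block, keeps new data. */
struct fake_server { uint8_t got[4096]; size_t got_len; uint16_t last_blk; int drop_next; };

static void fake_send(const uint8_t *pkt, size_t len, void *ctx)
{
    struct fake_server *s = ctx;
    if (pkt[1] == TFTP_WRQ) { s->last_blk = 0; return; }
    uint16_t blk = (uint16_t)(pkt[2] << 8 | pkt[3]);
    if (blk == (uint16_t)(s->last_blk + 1)) {     /* ignore retransmitted blocks */
        memcpy(s->got + s->got_len, pkt + 4, len - 4);
        s->got_len += len - 4;
    }
    s->last_blk = blk;
}

static size_t fake_recv(uint8_t *buf, size_t cap, void *ctx)
{
    struct fake_server *s = ctx;
    if (s->drop_next) { s->drop_next = 0; return 0; }  /* simulate a lost ACK */
    if (cap < 4) return 0;
    buf[0] = 0; buf[1] = TFTP_ACK;
    buf[2] = (uint8_t)(s->last_blk >> 8); buf[3] = (uint8_t)s->last_blk;
    return 4;
}
```

With 16-bit block numbers and 512-byte blocks, classic TFTP tops out around 32 MB unless the server supports block-number rollover or a larger block size (RFC 2348), which matters for a full core and is one more reason the send-only approach is less code.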