Date: Sat, 9 Dec 2006 21:57:57 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Andrew Pantyukhin <infofarmer@FreeBSD.org>
Cc: freebsd-current@freebsd.org, yal <yal@yal.hopto.org>
Subject: Re: CURRENT freezes on Laitude D520
Message-ID: <20061209214233.L2273@fledge.watson.org>
In-Reply-To: <cb5206420612091310r719f7b3en2d4fb35b23453ddf@mail.gmail.com>
References: <52944.192.168.1.110.1165679313.squirrel@yal.hopto.org>
    <20061209195519.B60055@mp2.macomnet.net>
    <20061209204924.N9926@fledge.watson.org>
    <cb5206420612091310r719f7b3en2d4fb35b23453ddf@mail.gmail.com>
On Sun, 10 Dec 2006, Andrew Pantyukhin wrote:

> On 12/9/06, Robert Watson <rwatson@freebsd.org> wrote:
>> While this may be useful in doing some initial debugging and working around
>> the problem, I'd really appreciate it if people didn't run with
>> debug.mpsafenet="0" for any extended period, as it masks bugs rather than
>> fixing them, and results in them not getting fixed. It also leads to a
>> significant performance hit, and we really don't want people running with
>> debugging features like this turned on by default; I'd rather they used the
>> cycles on INVARIANTS and WITNESS.
>
> Do we have to forget about IPSEC+IPv6?

I'm proposing removing the loader.conf/loader tunable, not the setting. Right now the debug_mpsafenet variable is set to 0 (Giant over the network stack) if either of the following two conditions holds:

(1) A component declaring NET_NEEDS_GIANT is compiled into the kernel.

(2) The administrator sets debug.mpsafenet to 0 in loader.conf or manually in the loader.

The suggestion is that we remove (2), but not (1): specifically, that we allow sections of the kernel to declare dependence on Giant over the network stack, but we no longer allow administrators to force Giant over the stack using debug.mpsafenet (in -CURRENT).

A suggestion I had from Kris was to cause the machine to beep for two minutes on boot if debug.mpsafenet=0 was set, in order to produce a deterrent without removing it for use in debugging.

Right now, setting debug.mpsafenet=0 has three effects:

(1) Place Giant over the network stack, creating a single lock that spans the entire stack, preventing parallelism, as well as acting as a "master" lock which implicitly prevents lock order-related deadlocks in the stack.

(2) Effectively disable preemption in the network stack, as ithreads and the netisr will be unable to start running until user threads exit the stack, regardless of priority.

(3) Effectively disable direct dispatch, as non-MPSAFE netisr handlers are always deferred rather than executing in the ithread context.

I suspect that many of the people setting debug.mpsafenet=0 and declaring the problem fixed are seeing the change due to (2) and (3), indirect rather than direct effects of (1). I would much rather people experimented with:

- Disabling direct dispatch (net.isr.direct=0)
- Disabling preemption (compiling out options PREEMPTION)
- Running with WITNESS, which reports lock order reversals

which get a bit more to the heart of most problems. debug.mpsafenet=0 really exists to support components which are not sufficiently locked to allow the stack to run MPSAFE, rather than as a means of disabling direct dispatch and preemption, which speak to different types of problems.

The main reason that I haven't removed the administrator tunable to date is that I suspect it will be quite helpful when KAME IPSEC locking happens. But since that appears not to have happened yet, debug.mpsafenet as an option is likely causing more harm than good: it is available as a stand-in sysctl that masks other problems, so people never get to the point of properly identifying the actual cause (device driver bugs, etc.).

Robert N M Watson
Computer Laboratory
University of Cambridge
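As a rough illustration of the mechanism discussed in this message, the following sketch models the two conditions that put Giant over the network stack, plus the conditional acquisition of Giant around stack work. It is a simplified, hypothetical model, not FreeBSD kernel code. The names `decide_mpsafenet`, `net_giant`, `GIANT` and the `honor_admin_tunable` switch are invented for illustration, and a `threading.Lock` stands in for Giant. The sketch follows the quoted text's convention that debug.mpsafenet=0 places Giant over the stack.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the kernel's Giant lock (illustration only).
GIANT = threading.Lock()


def decide_mpsafenet(component_needs_giant, admin_forces_giant, honor_admin_tunable=True):
    """Return a modeled debug.mpsafenet value: 1 = MPSAFE stack, 0 = Giant over the stack.

    Condition (1): a component declaring NET_NEEDS_GIANT is compiled in.
    Condition (2): the administrator forces Giant via the loader tunable.
    honor_admin_tunable=False (hypothetical switch) models the proposal to drop (2).
    """
    if component_needs_giant:
        return 0
    if honor_admin_tunable and admin_forces_giant:
        return 0
    return 1


@contextmanager
def net_giant(mpsafenet):
    """Hold Giant around network-stack work only when the stack is not MPSAFE."""
    if not mpsafenet:
        GIANT.acquire()
    try:
        yield
    finally:
        if not mpsafenet:
            GIANT.release()


if __name__ == "__main__":
    print(decide_mpsafenet(False, True))                             # → 0 (tunable alone forces Giant)
    print(decide_mpsafenet(False, True, honor_admin_tunable=False))  # → 1 (tunable ignored)
    print(decide_mpsafenet(True, False, honor_admin_tunable=False))  # → 0 (component still needs Giant)
    with net_giant(0):
        assert GIANT.locked()  # protocol processing would run under Giant here
```

Passing `honor_admin_tunable=False` models the proposal: the administrator can no longer force Giant over the stack, while a component declaring NET_NEEDS_GIANT still can.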