Date: Thu, 22 Feb 2007 15:54:24 -0600
From: Guy Helmer <ghelmer@palisadesys.com>
To: Rink Springer <rink@freebsd.org>
Cc: stable@freebsd.org, roel@qsp.nl
Subject: Re: Deadlock in state 'sysctl lock'
Message-ID: <45DE1110.8050208@palisadesys.com>
In-Reply-To: <20070220105902.GC39393@rink.nu>
References: <20070220105902.GC39393@rink.nu>

Rink Springer wrote:
> Hi people,
>
> At work, one of our SpamAssassin/ClamAV filtering machines just entered
> a deadlock state:
>
> FreeBSD/i386 (xxx.qsp.nl) (cuad0)
>
> login: root
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
> load: 0.00 cmd: login 683 [sysctl lock] 0.00u 0.00s 0% 148k
>
> After inspection, I believe the following code in
> kern/kern_sysctl.c:userland_sysctl() is the culprit:
>
>     SYSCTL_LOCK();
>
>     do {
>         req.oldidx = 0;
>         req.newidx = 0;
>         error = sysctl_root(0, name, namelen, &req);
>     } while (error == EAGAIN);
>
>     if (req.lock == REQ_WIRED && req.validlen > 0)
>         vsunlock(req.oldptr, req.validlen);
>
>     SYSCTL_UNLOCK();
>
> Clearly, should sysctl_root() always return EAGAIN, this will cause a
> serious deadlock condition. It appears this is possible.
>
> The only plausible reference to sysctl's returning EAGAIN seems to be in
> kern/kern_proc.c:sysctl_out_proc(). However, this code returns ESRCH
> if the process couldn't have been found in the first place, and since the
> complete handler function will be called by sysctl_root() every
> iteration, it will do a pfind() and return ESRCH if that failed, and
> not EAGAIN as it will later on in the code path.
>
> The machine is a 6.0-STABLE SMP machine of 30-Mar-2006. No debugging
> options are in the kernel as the machine has quite some load. The only
> console messages were a lot of 'calcru' messages.
>
> Any help is very much appreciated. For now, I'd like to propose a change
> to kern/kern_sysctl.c:userland_sysctl(), to ensure this will never keep
> looping on EAGAIN states (preferably, it should trigger a panic or at
> least a KASSERT should such a condition occur). I know this is a
> bandaid for a problem we don't really quite understand yet, but this may
> ease debugging later on (especially as it will help us understand where
> exactly it is going bad)
>
> Any comments? It looks to me this deadlock is quite rare (in fact, I've
> never seen it before), but I believe it is serious enough to be
> addressed, even with such a bandaid until the real solution is presented
> by someone who knows the sysctl internals better than I do.
>
>

Interesting. Twice I have had a 6.2 system stuck where sendmail was
holding the sysctl lock while another process was holding the proctree
and/or allproc lock, if I remember correctly.

Guy
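
For reference, the bounded retry proposed in the quoted message could look roughly like the userland sketch below. It is not a kernel patch: `fake_sysctl_root()`, `struct fake_handler`, `bounded_retry()` and the cap `MAX_EAGAIN_RETRIES` are all hypothetical names made up for illustration, and the quoted message does not suggest any particular retry limit. In the kernel, the give-up branch is where a KASSERT or panic would go.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical cap; the quoted proposal does not name a number. */
#define MAX_EAGAIN_RETRIES 100

/* Stand-in for a sysctl handler (assumption, not kernel code). */
struct fake_handler {
	int fail_count;	/* EAGAIN results still to be returned */
	int calls;	/* number of times the handler was invoked */
};

/* Returns EAGAIN until fail_count is used up, then 0. */
static int
fake_sysctl_root(struct fake_handler *h)
{
	h->calls++;
	if (h->fail_count > 0) {
		h->fail_count--;
		return (EAGAIN);
	}
	return (0);
}

/*
 * Same shape as the do/while loop in userland_sysctl(), but the loop
 * stops after MAX_EAGAIN_RETRIES EAGAIN results instead of spinning
 * forever while the lock is held.
 */
static int
bounded_retry(struct fake_handler *h)
{
	int error;
	int tries = 0;

	do {
		error = fake_sysctl_root(h);
	} while (error == EAGAIN && ++tries < MAX_EAGAIN_RETRIES);

	if (error == EAGAIN) {
		/* A kernel version would KASSERT or panic here. */
		fprintf(stderr, "giving up after %d EAGAIN results\n", tries);
	}
	return (error);
}
```

A handler that fails a few times and then succeeds still returns 0 to the caller; one that never stops returning EAGAIN now surfaces the error after a fixed number of attempts, which at least shows which handler is misbehaving.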