Date: Wed, 11 Mar 2015 21:34:07 -0700
From: Mark Johnston <markj@freebsd.org>
To: Nick Frampton <nick.frampton@akips.com>
Cc: kib@FreeBSD.org, freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150312043407.GA11120@raichu>
In-Reply-To: <5501108C.4080303@akips.com>
References: <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com> <1648097.s1OBMXVVbH@ralph.baldwin.cx> <5501108C.4080303@akips.com>
On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> On 12/03/15 00:38, John Baldwin wrote:
> > > > It sounds like this issue might be the one fixed in r272566: if the
> > > > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > > > sbuf error return value could bubble up and be treated as ERESTART,
> > > > resulting in a loop.
> > > >
> > > > This can be confirmed with something like
> > > >
> > > >  dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> > > >
> > > > If the output consists solely of __sysctl, this bug is likely the
> > > > culprit.
> > >
> > > Unfortunately, I accidentally killed fstat this morning before I could do any further debugging.
> > >
> > > I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > >
> > > I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> > > problem in a reasonable time frame, so it could be days or weeks before we see it happen again.
> >
> > The truss output is consistent with Mark's suggestion, so I would try
> > his suggested fix of 272566.
>
> I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely
> to be MFCed back to 10-stable?

I can't see any reason it shouldn't be, and there was an MFC reminder in
the commit log entry for that revision. I've cc'ed kib@, who might have a
reason.

>
> Our RC script forks off about 200 processes when starting our software, and I wrote a small script
> to repeatedly stop/start the software, which fairly reliably reproduces the issue about 1 in 10
> times. I've been running the script with the patched kernel for an hour now and I haven't seen the
> issue appear.
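A toy model may make the suspected failure mode in this thread easier to picture: an sbuf error return from the KERN_PROC_ALL handler being treated as ERESTART. The sketch assumes two facts about FreeBSD of this era. Inside the kernel, ERESTART is the pseudo-error -1 ("restart syscall"). Several sbuf(9) routines return -1 when the buffer overflows. It also assumes the handler forwards that -1 unchanged. All function names below are hypothetical. The "fixed" handler shows only the general shape of such a fix (return a real errno such as ENOMEM); it is not the contents of r272566. The restart cap exists only so the demo terminates.

```python
# Toy model (hypothetical names, not FreeBSD code) of an sbuf overflow
# result being mistaken for the kernel's ERESTART pseudo-error.
ERESTART = -1  # kernel-internal "restart the syscall" value
ENOMEM = 12    # ordinary errno meaning "buffer too small"


def buggy_handler(needed: int, bufsize: int) -> int:
    """Forwards an sbuf-style -1 overflow result unchanged (the assumed bug)."""
    if needed > bufsize:
        return -1  # overflow result, which happens to equal ERESTART
    return 0


def fixed_handler(needed: int, bufsize: int) -> int:
    """Translates the overflow into a real errno (assumed shape of a fix)."""
    if needed > bufsize:
        return ENOMEM
    return 0


def do_syscall(handler, needed: int, bufsize: int, max_restarts: int = 1000):
    """Re-runs the handler while it reports ERESTART, like a syscall layer.

    max_restarts exists only so this demo terminates. Returns (error, restarts).
    """
    restarts = 0
    while True:
        err = handler(needed, bufsize)
        if err == ERESTART and restarts < max_restarts:
            restarts += 1
            continue
        return err, restarts


def read_proc_table(handler, needed: int, bufsize: int):
    """Caller-side loop: grow the buffer on ENOMEM and retry (libkvm-like, assumed)."""
    while True:
        err, restarts = do_syscall(handler, needed, bufsize)
        if err == ENOMEM:
            bufsize *= 2  # e.g. more processes appeared; retry with more room
            continue
        return err, bufsize, restarts


print(read_proc_table(buggy_handler, needed=200, bufsize=100))  # (-1, 100, 1000)
print(read_proc_table(fixed_handler, needed=200, bufsize=100))  # (0, 200, 0)
```

In the buggy case the caller never sees an ENOMEM, so it never gets the chance to grow its buffer. Without the artificial cap, do_syscall would never return. That matches a process that truss shows doing nothing but __sysctl.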