Date: Wed, 11 Mar 2015 10:38:12 -0400
From: John Baldwin <jhb@freebsd.org>
To: Nick Frampton <nick.frampton@akips.com>
Cc: Mark Johnston <markj@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Message-ID: <1648097.s1OBMXVVbH@ralph.baldwin.cx>
In-Reply-To: <54FFBDE9.5060702@akips.com>
References: <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com>

On Wednesday, March 11, 2015 02:00:41 PM Nick Frampton wrote:
> On 11/03/15 07:59, Mark Johnston wrote:
> > On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> >> Often loops using libkvm are due to programs using libkvm are trying to read
> >> kernel data structures while they are changing. However, if you use sysctls
> >> to fetch this data instead, you should be able to get a stable snapshot of the
> >> system state without getting stuck in a possible loop. I believe for libkvm
> >> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and
> >> "/dev/null" for the core image.
> >
> In our code, we're invoking kvm_openfiles as you suggest:
> kd = kvm_openfiles (NULL, _PATH_DEVNULL, NULL, O_RDONLY, errbuf)
>
> > It sounds like this issue might be the one fixed in r272566: if the
> > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > sbuf error return value could bubble up and be treated as ERESTART,
> > resulting in a loop.
> >
> > This can be confirmed with something like
> >
> > dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> >
> > If the output consists solely of __sysctl, this bug is likely the
> > culprit.
>
> Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
>
> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
>
> I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> problem in a reasonable time frame so it could be days or weeks before we see it happen again.

The truss output is consistent with Mark's suggestion, so I would try his
suggested fix from r272566.

-- 
John Baldwin
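
To make the "insufficiently large buffer" case above concrete, here is a minimal userland sketch of a size-probe-then-read loop that gives up after a bounded number of attempts instead of spinning. It is not the kernel change in r272566 and not libkvm's code. Every name in it is hypothetical: fake_fetch() is a stand-in that only mimics a sysctl-style interface (report the size when the buffer is NULL, fail with ENOMEM when the buffer is too small), and fetch_snapshot(), the 10% slack, and the max_tries limit are illustrative choices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simulated kernel table (hypothetical; stands in for the process list). */
static size_t fake_table_size = 64;  /* current size in bytes */
static int fake_growth_left = 0;     /* how many more times the table grows */

/*
 * Hypothetical stand-in for a sysctl-style read.
 * buf == NULL: report the current size in *lenp, after which the table may
 * grow, modelling processes being created between the probe and the read.
 * buf != NULL: fail with ENOMEM if *lenp is too small, otherwise copy out.
 */
static int
fake_fetch(void *buf, size_t *lenp)
{
	if (buf == NULL) {
		*lenp = fake_table_size;
		if (fake_growth_left > 0) {
			fake_table_size += 16;
			fake_growth_left--;
		}
		return (0);
	}
	if (*lenp < fake_table_size) {
		errno = ENOMEM;
		return (-1);
	}
	memset(buf, 0xAB, fake_table_size);
	*lenp = fake_table_size;
	return (0);
}

/*
 * Probe the size, allocate with some slack, and retry on ENOMEM, but only
 * max_tries times. Returns a malloc'd buffer (caller frees) and stores its
 * used length in *out_len, or returns NULL with errno set.
 */
static void *
fetch_snapshot(size_t *out_len, int max_tries)
{
	assert(out_len != NULL);

	for (int i = 0; i < max_tries; i++) {
		size_t len = 0;
		void *buf;
		int saved;

		if (fake_fetch(NULL, &len) != 0)
			return (NULL);
		len += len / 10;	/* slack for growth since the probe */
		buf = malloc(len);
		if (buf == NULL)
			return (NULL);
		if (fake_fetch(buf, &len) == 0) {
			*out_len = len;
			return (buf);
		}
		saved = errno;		/* keep the error across free() */
		free(buf);
		if (saved != ENOMEM) {
			errno = saved;
			return (NULL);
		}
	}
	errno = EAGAIN;			/* table kept outgrowing the buffer */
	return (NULL);
}
```

A caller would invoke fetch_snapshot(&len, 8) and treat a NULL return with errno set to EAGAIN as "try again later". The bound means that a persistent error can only cost a fixed number of attempts rather than an endless run of __sysctl calls.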