Date: Tue, 10 Mar 2015 14:59:13 -0700
From: Mark Johnston <markj@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Nick Frampton <nick.frampton@akips.com>, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150310215913.GB52108@charmander.picturesperfect.net>
In-Reply-To: <4637620.LE11f9AQj7@ralph.baldwin.cx>
References: <54FE3803.2000307@akips.com> <4637620.LE11f9AQj7@ralph.baldwin.cx>

On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> > Hi,
> >
> > For the past several months, we have had an intermittent problem where a
> > process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> > stuck in an infinite loop and goes to 100% cpu. We have just observed
> > "fstat -m" do the same thing and suspect it may be the same problem.
> >
> > Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> > ufs root and zfs /home.
> >
> > Has anyone else experienced this? Is there anything we can do to investigate
> > the problem further?
>
> Often loops using libkvm are due to programs trying to read kernel data
> structures while they are changing. However, if you use sysctls
> to fetch this data instead, you should be able to get a stable snapshot of the
> system state without getting stuck in a possible loop. I believe for libkvm
> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and
> "/dev/null" for the core image. fstat -m should be doing that by default
> however, so if it is not that, can you ktrace fstat when it is spinning to see
> if it is spinning in userland or in the kernel? If you see no activity via
> ktrace, then it is spinning in one of the two places without making any system
> calls, etc. You can attach to it with gdb to pause it, then see where gdb
> thinks it is. If gdb hangs attaching to it, then it is stuck in the kernel.
>
> If gdb attaches to it ok, then it is spinning in userland. Unfortunately, for
> gdb to be useful, you really need debug symbols. We don't currently provide
> those for release binaries or binaries provided via freebsd-update (though
> that is being worked on for 11.0). If you build from source, then the
> simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
> rebuild your world without NO_CLEAN. If you are building from source and are
> able to reproduce with those binaries, then after attaching to the process
> with gdb, use 'bt' to see where it is hung and reply with that.
>
> If it is hanging in the kernel, then you will need to use the kernel debugger
> to see where it is hanging. The simplest way to do this is probably to force
> a crash via the debug.kdb.panic sysctl (set it to a non-zero value). You will
> then need to fire up kgdb on the crash dump after it reboots, switch to the
> fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.
It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
sbuf error return value could bubble up and be treated as ERESTART,
resulting in a loop.
This can be confirmed with something like:

  dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>

If the output consists solely of __sysctl, this bug is likely the
culprit.
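
If you would rather script the check than eyeball it, here is a minimal
sketch in Python. It assumes the dtrace output has been captured as text
and that each aggregation entry is printed as a "name count" pair on its
own line; any other lines (headers, probe-firing lines) are ignored. The
function names syscall_counts and only_sysctl are made up for this
illustration and are not part of any tool.

```python
def syscall_counts(dtrace_output):
    """Collect 'name count' pairs from captured dtrace aggregation output.

    Assumption: aggregation entries are exactly two whitespace-separated
    fields, the second being a non-negative integer. Lines of any other
    shape are skipped.
    """
    counts = {}
    for line in dtrace_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            name, count = fields[0], int(fields[1])
            counts[name] = counts.get(name, 0) + count
    return counts


def only_sysctl(dtrace_output):
    """True if at least one syscall was seen and every one was __sysctl."""
    counts = syscall_counts(dtrace_output)
    return bool(counts) and set(counts) == {"__sysctl"}
```

An empty capture returns False here on purpose: no system calls at all
points at a loop that never enters the kernel (or never leaves it), which
is a different case from the sysctl retry loop.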
-Mark
