Date: Tue, 10 Mar 2015 14:59:13 -0700
From: Mark Johnston <markj@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Nick Frampton <nick.frampton@akips.com>, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150310215913.GB52108@charmander.picturesperfect.net>
In-Reply-To: <4637620.LE11f9AQj7@ralph.baldwin.cx>
References: <54FE3803.2000307@akips.com> <4637620.LE11f9AQj7@ralph.baldwin.cx>
On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> > Hi,
> >
> > For the past several months, we have had an intermittent problem where a
> > process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> > stuck in an infinite loop and goes to 100% cpu. We have just observed
> > "fstat -m" do the same thing and suspect it may be the same problem.
> >
> > Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> > ufs root and zfs /home.
> >
> > Has anyone else experienced this? Is there anything we can do to investigate
> > the problem further?
>
> Often loops using libkvm are due to programs trying to read kernel data
> structures while they are changing. However, if you use sysctls to fetch
> this data instead, you should be able to get a stable snapshot of the
> system state without getting stuck in a possible loop. I believe for libkvm
> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and
> "/dev/null" for the core image. fstat -m should be doing that by default,
> however, so if it is not that, can you ktrace fstat when it is spinning to see
> if it is spinning in userland or in the kernel? If you see no activity via
> ktrace, then it is spinning in one of the two places without making any system
> calls, etc. You can attach to it with gdb to pause it, then see where gdb
> thinks it is. If gdb hangs attaching to it, then it is stuck in the kernel.
>
> If gdb attaches to it ok, then it is spinning in userland. Unfortunately, for
> gdb to be useful, you really need debug symbols. We don't currently provide
> those for release binaries or binaries provided via freebsd-update (though
> that is being worked on for 11.0). If you build from source, then the
> simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
> rebuild your world without NO_CLEAN.
> If you are building from source and are able to reproduce with those
> binaries, then after attaching to the process with gdb, use 'bt' to see
> where it is hung and reply with that.
>
> If it is hanging in the kernel, then you will need to use the kernel debugger
> to see where it is hanging. The simplest way to do this is probably to force
> a crash via the debug.kdb.panic sysctl (set it to a non-zero value). You will
> then need to fire up kgdb on the crash dump after it reboots, switch to the
> fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.

It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an sbuf
error return value could bubble up and be treated as ERESTART, resulting in
a loop. This can be confirmed with something like

  dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>

If the output consists solely of __sysctl, this bug is likely the culprit.

-Mark
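The failure mode described in the reply (an error code that gets mistaken for a restart request) can be modeled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel code from r272566: the names handler_buggy, handler_fixed, dispatch, fetch_with_growth, MODEL_ERESTART and NEEDED are all hypothetical, and the sentinel value -1 is this model's own choice. The buggy handler passes its raw overflow code straight back, which collides with the restart sentinel, so the dispatcher keeps re-entering it; the fixed handler reports ENOMEM, which lets a caller grow its buffer and try again. The grow-and-retry caller is an illustrative pattern, not libkvm's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "restart this call" sentinel, modeled on the kernel's
 * ERESTART; the value -1 is this sketch's choice. */
#define MODEL_ERESTART (-1)
/* Safety cap so the buggy path terminates in this model. */
#define MAX_ATTEMPTS 1000
/* Bytes the hypothetical process table needs. */
#define NEEDED 64

typedef int (*handler_fn)(char *buf, size_t len);

/* Buggy handler: hands back its raw overflow code (-1), which the
 * dispatcher cannot tell apart from a restart request. */
static int handler_buggy(char *buf, size_t len)
{
    if (len < NEEDED)
        return -1;
    memset(buf, 'p', NEEDED);
    return 0;
}

/* Fixed handler: translates the overflow into a real errno value. */
static int handler_fixed(char *buf, size_t len)
{
    if (len < NEEDED)
        return ENOMEM;
    memset(buf, 'p', NEEDED);
    return 0;
}

/* Re-enters the handler while it asks for a restart, much as a
 * restartable system call is re-executed. Stores the number of
 * entries in *attempts and returns the final error. */
static int dispatch(handler_fn h, char *buf, size_t len, int *attempts)
{
    int error;

    assert(attempts != NULL);
    *attempts = 0;
    do {
        ++*attempts;
        error = h(buf, len);
    } while (error == MODEL_ERESTART && *attempts < MAX_ATTEMPTS);
    return error;
}

/* Illustrative caller: start with a small buffer and double it on
 * ENOMEM. Returns the size that worked, or 0 on any other failure. */
static size_t fetch_with_growth(handler_fn h)
{
    static char storage[1024];
    size_t len = 8;
    int attempts;

    while (len <= sizeof(storage)) {
        int error = dispatch(h, storage, len, &attempts);
        if (error == 0)
            return len;
        if (error != ENOMEM)
            return 0;
        len *= 2;
    }
    return 0;
}
```

Called from a small test driver, fetch_with_growth(handler_fixed) doubles from 8 to 64 bytes and succeeds, while fetch_with_growth(handler_buggy) gives up only because MAX_ATTEMPTS bounds the restart loop after 1000 entries into the handler. Without that cap the buggy path would never return, which is the model's analogue of a process whose only activity is a stream of __sysctl entries.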