Date:      Tue, 10 Mar 2015 14:59:13 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Nick Frampton <nick.frampton@akips.com>, freebsd-stable@freebsd.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <20150310215913.GB52108@charmander.picturesperfect.net>
In-Reply-To: <4637620.LE11f9AQj7@ralph.baldwin.cx>
References:  <54FE3803.2000307@akips.com> <4637620.LE11f9AQj7@ralph.baldwin.cx>

On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> > Hi,
> > 
> > For the past several months, we have had an intermittent problem where a
> > process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> > stuck in an infinite loop and goes to 100% cpu. We have just observed
> > "fstat -m" do the same thing and suspect it may be the same problem.
> > 
> > Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> > ufs root and zfs /home.
> > 
> > Has anyone else experienced this? Is there anything we can do to investigate
> > the problem further?
> 
> Loops in libkvm consumers are often caused by the program trying to read 
> kernel data structures while they are changing.  However, if you use sysctls 
> to fetch this data instead, you should be able to get a stable snapshot of the 
> system state without getting stuck in a possible loop.  I believe for libkvm 
> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and 
> "/dev/null" for the core image.  fstat -m should be doing that by default 
> however, so if that is not the cause, can you ktrace fstat while it is spinning to see 
> if it is spinning in userland or in the kernel?  If you see no activity via 
> ktrace, then it is spinning in one of the two places without making any system 
> calls, etc.  You can attach to it with gdb to pause it, then see where gdb 
> thinks it is.  If gdb hangs attaching to it, then it is stuck in the kernel.  
> 
> If gdb attaches to it ok, then it is spinning in userland.  Unfortunately, for 
> gdb to be useful, you really need debug symbols.  We don't currently provide 
> those for release binaries or binaries provided via freebsd-update (though 
> that is being worked on for 11.0).  If you build from source, then the 
> simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and 
> rebuild your world without NO_CLEAN.  If you are building from source and are 
> able to reproduce with those binaries, then after attaching to the process 
> with gdb, use 'bt' to see where it is hung and reply with that.
> 
> If it is hanging in the kernel, then you will need to use the kernel debugger 
> to see where it is hanging.  The simplest way to do this is probably to force 
> a crash via the debug.kdb.panic sysctl (set it to a non-zero value).  You will 
> then need to fire up kgdb on the crash dump after it reboots, switch to the 
> fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.

It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
sbuf error return value could bubble up and be treated as ERESTART,
resulting in a loop.

This can be confirmed with something like:

  dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>

If the output consists solely of __sysctl, this bug is likely the
culprit.
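
The "solely __sysctl" check can also be done mechanically on saved output.
Below is a minimal Python sketch, assuming dtrace prints the aggregation
as two-column rows (syscall name, then count) and that any other lines
(the "matched N probes" banner, the CPU/ID header, the tick-3s firing)
do not have that shape. The function names are hypothetical, chosen only
for illustration.

```python
import re

# Assumed row layout for @[probefunc] = count(): "<name> <count>".
# Banner and probe-firing lines have a different shape and are skipped.
ROW = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s+(\d+)\s*$")


def syscall_counts(dtrace_output):
    """Return {syscall name: count} parsed from the aggregation rows."""
    counts = {}
    for line in dtrace_output.splitlines():
        m = ROW.match(line)
        if m:
            name, n = m.group(1), int(m.group(2))
            counts[name] = counts.get(name, 0) + n
    return counts


def only_sysctl(dtrace_output):
    """True if at least one syscall was seen and all of them are __sysctl."""
    counts = syscall_counts(dtrace_output)
    return bool(counts) and set(counts) == {"__sysctl"}
```

For example, redirect the dtrace output to a file and pass its contents
to only_sysctl(); a True result matches the symptom described above,
though it does not by itself prove that r272566 is the fix.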

-Mark


