Date: Tue, 10 Mar 2015 14:59:13 -0700
From: Mark Johnston
To: John Baldwin
Cc: Nick Frampton, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150310215913.GB52108@charmander.picturesperfect.net>
In-Reply-To: <4637620.LE11f9AQj7@ralph.baldwin.cx>
References: <54FE3803.2000307@akips.com> <4637620.LE11f9AQj7@ralph.baldwin.cx>

On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
> > Hi,
> >
> > For the past several months, we have had an intermittent problem where a
> > process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
> > stuck in an infinite loop and goes to 100% cpu. We have just observed
> > "fstat -m" do the same thing and suspect it may be the same problem.
> >
> > Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
> > ufs root and zfs /home.
> >
> > Has anyone else experienced this? Is there anything we can do to investigate
> > the problem further?
>
> Often loops using libkvm are due to programs trying to read kernel data
> structures while they are changing. However, if you use sysctls to fetch
> this data instead, you should be able to get a stable snapshot of the
> system state without getting stuck in a possible loop.
> I believe for libkvm to use sysctl instead of /dev/kmem you have to pass a
> NULL for the kernel and "/dev/null" for the core image. fstat -m should be
> doing that by default however, so if it is not that, can you ktrace fstat
> when it is spinning to see if it is spinning in userland or in the kernel?
> If you see no activity via ktrace, then it is spinning in one of the two
> places without making any system calls, etc. You can attach to it with gdb
> to pause it, then see where gdb thinks it is. If gdb hangs attaching to it,
> then it is stuck in the kernel.
>
> If gdb attaches to it ok, then it is spinning in userland. Unfortunately, for
> gdb to be useful, you really need debug symbols. We don't currently provide
> those for release binaries or binaries provided via freebsd-update (though
> that is being worked on for 11.0). If you build from source, then the
> simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
> rebuild your world without NO_CLEAN. If you are building from source and are
> able to reproduce with those binaries, then after attaching to the process
> with gdb, use 'bt' to see where it is hung and reply with that.
>
> If it is hanging in the kernel, then you will need to use the kernel debugger
> to see where it is hanging. The simplest way to do this is probably to force
> a crash via the debug.kdb.panic sysctl (set it to a non-zero value). You will
> then need to fire up kgdb on the crash dump after it reboots, switch to the
> fstat process via the 'proc <pid>' command and get a backtrace via 'bt'.

It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an sbuf
error return value could bubble up and be treated as ERESTART, resulting in
a loop.
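To make the failure mode above concrete, here is a toy model in Python. It is only a sketch and not the kernel code: every function name in it is hypothetical, and it assumes that the helper signals overflow with -1, which is the same value FreeBSD's kernel uses internally for ERESTART. The restart cap exists only so the demonstration terminates.

```python
# Toy model only: none of these functions exist in the FreeBSD kernel.
ERESTART = -1  # kernel-internal "restart the syscall" code on FreeBSD
ENOMEM = 12    # ordinary errno returned to userland


def fill_buffer(needed, capacity):
    """Hypothetical sbuf-style helper: 0 on success, -1 on overflow."""
    return -1 if needed > capacity else 0


def buggy_handler(needed, capacity):
    # The helper's -1 is passed up unchanged, so it looks like ERESTART.
    return fill_buffer(needed, capacity)


def fixed_handler(needed, capacity):
    # The helper's failure is translated into a real errno first.
    return ENOMEM if fill_buffer(needed, capacity) != 0 else 0


def run_syscall(handler, needed, capacity, max_restarts=1000):
    """Model of a return path that restarts the call on ERESTART.

    max_restarts is an artificial cap for this demo; without it the
    buggy case would never return.
    """
    restarts = 0
    while True:
        error = handler(needed, capacity)
        if error != ERESTART:
            return error, restarts
        restarts += 1
        if restarts >= max_restarts:
            return error, restarts


if __name__ == "__main__":
    print(run_syscall(buggy_handler, needed=100, capacity=10))
    print(run_syscall(fixed_handler, needed=100, capacity=10))
```

With a buffer that is too small, the buggy handler is restarted until the cap is hit, while the fixed handler fails once with ENOMEM and lets the caller retry with a larger buffer.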
Whether this is the cause can be confirmed with something like

    dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid>

If the output consists solely of __sysctl, this bug is likely the culprit.

-Mark
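If the check needs to be run on several machines, the "solely __sysctl" test can be scripted. The following Python sketch assumes the usual two-column aggregation output (name, then count) and skips every other line, such as the probe header. The function names and the sample text are made up for illustration and are not captured from a real system.

```python
def syscall_counts(dtrace_output):
    """Collect 'name count' aggregation rows; skip every other line."""
    counts = {}
    for line in dtrace_output.splitlines():
        fields = line.split()
        # Aggregation rows have exactly two fields, the second an integer.
        if len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def looks_like_sysctl_loop(dtrace_output):
    """True if at least one syscall was seen and all of them are __sysctl."""
    counts = syscall_counts(dtrace_output)
    return bool(counts) and set(counts) == {"__sysctl"}


# Made-up sample in the assumed output shape, for illustration only.
SAMPLE = """\
CPU     ID                    FUNCTION:NAME
  0  60139                         :tick-3s

  __sysctl                                                     418230
"""

if __name__ == "__main__":
    print(looks_like_sysctl_loop(SAMPLE))
```

Output that also lists other system calls, or lists none at all, makes the function return False, which points away from this particular bug.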