From: John Baldwin
To: Nick Frampton
Cc: Mark Johnston, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Date: Wed, 11 Mar 2015 10:38:12 -0400
Message-ID: <1648097.s1OBMXVVbH@ralph.baldwin.cx>
In-Reply-To: <54FFBDE9.5060702@akips.com>
References: <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com>

On Wednesday, March 11, 2015 02:00:41 PM Nick Frampton wrote:
> On 11/03/15 07:59, Mark Johnston wrote:
> > On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> >> Often loops using libkvm are due to
> >> programs trying to read kernel data structures while they are
> >> changing. However, if you use sysctls to fetch this data instead,
> >> you should be able to get a stable snapshot of the system state
> >> without getting stuck in a possible loop. I believe for libkvm
> >> to use sysctl instead of /dev/kmem you have to pass a NULL for the
> >> kernel and "/dev/null" for the core image.
>
> In our code, we're invoking kvm_openfiles as you suggest:
> kd = kvm_openfiles (NULL, _PATH_DEVNULL, NULL, O_RDONLY, errbuf)
>
> >
> > It sounds like this issue might be the one fixed in r272566: if the
> > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > sbuf error return value could bubble up and be treated as ERESTART,
> > resulting in a loop.
> >
> > This can be confirmed with something like
> >
> >     dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p
> >
> > If the output consists solely of __sysctl, this bug is likely the
> > culprit.
>
> Unfortunately, I accidentally killed fstat this morning before I could
> do any further debug.
>
> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
>
> I'll try compiling with debug symbols in case it happens again. I
> haven't been able to reproduce the problem in a reasonable time frame
> so it could be days or weeks before we see it happen again.

The truss output is consistent with Mark's suggestion, so I would try
the fix he suggested, r272566.

-- 
John Baldwin