From: John Baldwin
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org, Mark Johnston, Nick Frampton, kib@freebsd.org
Subject: Re: Suspected libkvm infinite loop
Date: Thu, 12 Mar 2015 10:39:10 -0400
Message-ID: <2374792.i316gF0qRo@ralph.baldwin.cx>
In-Reply-To: <20150312104023.GL2379@kib.kiev.ua>
References: <54FE3803.2000307@akips.com> <20150312043407.GA11120@raichu> <20150312104023.GL2379@kib.kiev.ua>

On Thursday, March 12, 2015 12:40:23 PM Konstantin Belousov wrote:
> On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote:
> > On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> > > On 12/03/15 00:38, John Baldwin wrote:
> > > >>> It sounds like this issue might be the one fixed in r272566: if the
> > > >>> KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > > >>> sbuf error return value could bubble up and be treated as ERESTART,
> > > >>> resulting in a loop.
> > > >>>
> > > >>> This can be confirmed with something like
> > > >>>
> > > >>>   dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p
> > > >>>
> > > >>> If the output consists solely of __sysctl, this bug is likely the
> > > >>> culprit.
> > > >>
> > > >> Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
> > > >>
> > > >> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > > >>
> > > >> I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> > > >> problem in a reasonable time frame so it could be days or weeks before we see it happen again.
> > > > The truss output is consistent with Mark's suggestion, so I would try
> > > > his suggested fix of 272566.
> > >
> > > I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely
> > > to be MFCed back to 10-stable?
> >
> > I can't see any reason it shouldn't be, and there was an MFC reminder in
> > the commit log entry for that revision. I've cc'ed kib@, who might have a
> > reason.
>
> The mentioned commit depends on r271976, in fact it depends on the series of
> commits, including r271486 and r271489.
>
> I did not merge r271976 with manual resolution of the conflicts, since it
> means that the work done for HEAD needs to be redone for stable/10 to
> ensure that all cases are covered. Later, when the mentioned series is
> merged, the work should be redone once more.
>
> And to note, r271489 is not trivially mergeable as well, just checked.

You could merge r272566 and just fix up the sbuf_bcat() in export_fd_to_sb()
in kern_descrip.c instead. I hadn't really considered fo_fill_kinfo to be
something that was mergeable to 10.

--
John Baldwin
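
For readers following the thread, here is a minimal user-space sketch of the failure mode Mark describes: an sbuf-style append fails with -1, that -1 is passed up as if it were an errno, and because ERESTART is -1 in the FreeBSD kernel the caller keeps restarting the operation. This is not the kernel code and not the r272566 change itself. The names sim_buf, sim_bcat, export_buggy, export_fixed and dispatch are hypothetical, the capped retry loop only stands in for syscall restart, and mapping the failure to ENOMEM is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in: mirrors the kernel convention that ERESTART is -1. */
#define SIM_ERESTART (-1)

/* Hypothetical fixed-size buffer playing the role of a too-small sbuf. */
struct sim_buf {
    char   data[16];
    size_t len;
};

/* Like sbuf_bcat(), reports failure as -1, which is not an errno value. */
static int sim_bcat(struct sim_buf *b, const void *p, size_t n)
{
    if (n > sizeof(b->data) - b->len)
        return -1;
    memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}

/* Buggy variant: hands the raw -1 to the caller as if it were an errno. */
static int export_buggy(struct sim_buf *b, const void *p, size_t n)
{
    return sim_bcat(b, p, n);
}

/* Fixed variant: translates the failure into a real errno (assumed ENOMEM). */
static int export_fixed(struct sim_buf *b, const void *p, size_t n)
{
    return sim_bcat(b, p, n) != 0 ? ENOMEM : 0;
}

typedef int (*export_fn)(struct sim_buf *, const void *, size_t);

/*
 * Hypothetical dispatcher: re-runs the handler for as long as it reports
 * SIM_ERESTART, capped at max_attempts so the sketch terminates.
 * Returns the number of attempts made; n must not exceed 64.
 */
static int dispatch(export_fn fn, size_t n, int max_attempts, int *error)
{
    static const char payload[64];
    int attempts = 0;

    do {
        struct sim_buf b = { .len = 0 };
        *error = fn(&b, payload, n);
        attempts++;
    } while (*error == SIM_ERESTART && attempts < max_attempts);
    return attempts;
}
```

Calling dispatch(export_buggy, 32, 1000, &err) uses every one of the 1000 attempts, which is the spin on __sysctl that truss showed, while dispatch(export_fixed, 32, 1000, &err) returns after a single attempt with an ordinary error the caller can act on.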