Date:      Thu, 12 Mar 2015 10:39:10 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org, Mark Johnston <markj@freebsd.org>, Nick Frampton <nick.frampton@akips.com>, kib@freebsd.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <2374792.i316gF0qRo@ralph.baldwin.cx>
In-Reply-To: <20150312104023.GL2379@kib.kiev.ua>
References:  <54FE3803.2000307@akips.com> <20150312043407.GA11120@raichu> <20150312104023.GL2379@kib.kiev.ua>

On Thursday, March 12, 2015 12:40:23 PM Konstantin Belousov wrote:
> On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote:
> > On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> > > On 12/03/15 00:38, John Baldwin wrote:
> > > > > > It sounds like this issue might be the one fixed in r272566: if the
> > > > > > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > > > > > sbuf error return value could bubble up and be treated as ERESTART,
> > > > > > resulting in a loop.
> > > > > >
> > > > > > This can be confirmed with something like
> > > > > >
> > > > > >     dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> > > > > >
> > > > > > If the output consists solely of __sysctl, this bug is likely the
> > > > > > culprit.
> > > > >
> > > > > Unfortunately, I accidentally killed fstat this morning before I could do any further debugging.
> > > > >
> > > > > I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > > > >
> > > > > I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> > > > > problem in a reasonable time frame, so it could be days or weeks before we see it happen again.
> > > > The truss output is consistent with Mark's suggestion, so I would try
> > > > his suggested fix, r272566.
> > > 
> > > I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely 
> > > to be MFCed back to 10-stable?
> > 
> > I can't see any reason it shouldn't be, and there was an MFC reminder in
> > the commit log entry for that revision. I've cc'ed kib@, who might have a
> > reason.
> 
> The mentioned commit depends on r271976; in fact, it depends on a series of
> commits, including r271486 and r271489.
> 
> I did not merge r271976 with manual resolution of the conflicts, since that
> would mean the work done for HEAD has to be redone for stable/10 to
> ensure that all cases are covered.  Later, when the mentioned series is
> merged, the work would have to be redone once more.
> 
> Also note that r271489 is not trivially mergeable either; I just checked.

You could merge r272566 and just fix up the sbuf_bcat() call in
export_fd_to_sb() in kern_descrip.c instead.  I hadn't really considered
fo_fill_kinfo to be something that was mergeable to 10.
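The failure mode described in the thread, and why the sbuf_bcat() return value matters, can be illustrated with a small user-space toy model. This is a hedged sketch, not the kernel code or the actual r272566 change. It relies on only two FreeBSD conventions: sbuf(9) append functions such as sbuf_bcat() return -1 on overflow rather than an errno, and the kernel defines ERESTART as (-1), which makes the syscall layer re-run the call. Every other name here (toy_sbuf, toy_sbuf_bcat, export_record_buggy, export_record_fixed, toy_syscall, TOY_MAX_RESTARTS) is hypothetical, and the restart cap exists only so the demo terminates.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the FreeBSD kernel convention that ERESTART is (-1). */
#define TOY_ERESTART		(-1)
/* Hypothetical cap so the demo stops; the real kernel has no such cap. */
#define TOY_MAX_RESTARTS	1000

/* Hypothetical fixed-size stand-in for a kernel sbuf. */
struct toy_sbuf {
	char	buf[16];
	size_t	len;
	int	error;		/* sticky errno, like what sbuf_error() reports */
};

/* Like sbuf_bcat(): returns 0 on success and -1 (not an errno) on overflow. */
static int
toy_sbuf_bcat(struct toy_sbuf *sb, const void *data, size_t n)
{
	if (sb->error != 0)
		return (-1);
	if (n > sizeof(sb->buf) - sb->len) {
		sb->error = ENOMEM;
		return (-1);
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	return (0);
}

/* Buggy handler: passes the -1 straight back as if it were an errno. */
static int
export_record_buggy(struct toy_sbuf *sb, const void *rec, size_t n)
{
	return (toy_sbuf_bcat(sb, rec, n));
}

/* Fixed handler: converts the overflow into a real errno value. */
static int
export_record_fixed(struct toy_sbuf *sb, const void *rec, size_t n)
{
	if (toy_sbuf_bcat(sb, rec, n) != 0)
		return (sb->error != 0 ? sb->error : ENOMEM);
	return (0);
}

/*
 * Toy syscall dispatcher: a TOY_ERESTART result re-runs the handler from
 * scratch with a fresh buffer, so a handler that always returns -1 keeps
 * restarting.  Stores the attempt count in *attempts, returns the last error.
 */
static int
toy_syscall(int (*handler)(struct toy_sbuf *, const void *, size_t),
    const void *rec, size_t n, int *attempts)
{
	int error;

	*attempts = 0;
	do {
		struct toy_sbuf sb = { .len = 0, .error = 0 };

		error = handler(&sb, rec, n);
		(*attempts)++;
	} while (error == TOY_ERESTART && *attempts < TOY_MAX_RESTARTS);
	return (error);
}
```

With a 64-byte record that cannot fit, toy_syscall(export_record_fixed, ...) returns ENOMEM after one attempt, while toy_syscall(export_record_buggy, ...) only stops at the artificial restart cap. Without a cap, that is an endless stream of restarted __sysctl calls, which would be consistent with the truss and dtrace observations earlier in the thread.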

-- 
John Baldwin


