Date:      Thu, 12 Mar 2015 14:05:32 +1000
From:      Nick Frampton <nick.frampton@akips.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Mark Johnston <markj@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <5501108C.4080303@akips.com>
In-Reply-To: <1648097.s1OBMXVVbH@ralph.baldwin.cx>
References:  <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com> <1648097.s1OBMXVVbH@ralph.baldwin.cx>

On 12/03/15 00:38, John Baldwin wrote:
>>> It sounds like this issue might be the one fixed in r272566: if the
>>> KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
>>> sbuf error return value could bubble up and be treated as ERESTART,
>>> resulting in a loop.
>>>
>>> This can be confirmed with something like
>>>
>>>     dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
>>>
>>> If the output consists solely of __sysctl, this bug is likely the
>>> culprit.
>>
>> Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
>>
>> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
>>
>> I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
>> problem in a reasonable time frame so it could be days or weeks before we see it happen again.
> The truss output is consistent with Mark's suggestion, so I would try
> his suggested fix, r272566.

I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely 
to be MFCed back to 10-stable?

Our RC script forks off about 200 processes when starting our software, and I wrote a small script 
to repeatedly stop/start the software, which fairly reliably reproduces the issue about 1 in 10 
times. I've been running the script with the patched kernel for an hour now and I haven't seen the 
issue appear.
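
A minimal sketch of such a stop/start loop follows. It is not the actual script used here: the service name "myapp", the settle delay, and the use of pgrep to spot a lingering fstat are all assumptions made for illustration, and the three commands are held in variables so they can be swapped out for a dry run.

```shell
#!/bin/sh
# Hypothetical stress loop: restart a service repeatedly and stop at the
# first cycle where a check command reports a possibly stuck process.
# "myapp", SETTLE_SECS and the pgrep-based check are illustrative only.

STOP_CMD="${STOP_CMD:-service myapp stop}"
START_CMD="${START_CMD:-service myapp start}"
# The check must exit 0 when a stuck process is suspected.
CHECK_CMD="${CHECK_CMD:-pgrep -x fstat}"
# Time to let startup finish before checking for leftovers.
SETTLE_SECS="${SETTLE_SECS:-30}"

# stress_cycles N
# Runs up to N stop/start cycles.
# Returns 0 if no hang was detected, 1 if the check fired,
# 2 if the stop or start command itself failed.
stress_cycles() {
    max="$1"
    i=1
    while [ "$i" -le "$max" ]; do
        $STOP_CMD || return 2
        $START_CMD || return 2
        sleep "$SETTLE_SECS"
        if $CHECK_CMD >/dev/null 2>&1; then
            echo "possible hang detected on cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max cycles without a detected hang"
    return 0
}
```

Calling "stress_cycles 100" after sourcing the file would run up to 100 cycles. A process that is still present after the settle delay is only a hint; truss -p or the dtrace one-liner quoted above would still be needed to confirm that it is spinning on __sysctl.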

Thanks for your help.

-Nick
-- 
Founder, CTO
www.akips.com


