Message-ID: <5501108C.4080303@akips.com>
Date: Thu, 12 Mar 2015 14:05:32 +1000
From: Nick Frampton
To: John Baldwin
Cc: Mark Johnston, freebsd-stable@freebsd.org
Subject: Re: Suspected libkvm infinite loop

On 12/03/15 00:38, John Baldwin wrote:
>>> It sounds like this issue might be the one fixed in r272566: if the
>>> KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
>>> sbuf error return value could bubble up and be treated as ERESTART,
>>> resulting in a loop.
>>>
>>> This can be confirmed with something like
>>>
>>>   dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p
>>>
>>> If the output consists solely of __sysctl, this bug is likely the
>>> culprit.
>>
>> Unfortunately, I accidentally killed fstat this morning before I could
>> do any further debug.
>>
>> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
>>
>> I'll try compiling with debug symbols in case it happens again. I
>> haven't been able to reproduce the problem in a reasonable time frame
>> so it could be days or weeks before we see it happen again.
>
> The truss output is consistent with Mark's suggestion, so I would try
> his suggested fix of 272566.

I patched the 10.1 kernel with r272566 and it appears to have fixed the
issue. Is this patch likely to be MFCed back to 10-stable?

Our RC script forks off about 200 processes when starting our software,
and I wrote a small script to repeatedly stop/start the software, which
fairly reliably reproduces the issue about 1 in 10 times. I've been
running the script with the patched kernel for an hour now and I haven't
seen the issue appear.

Thanks for your help.

-Nick

-- 
Founder, CTO
www.akips.com
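
For reference, a stop/start stress loop of the kind described above
might look roughly like the sketch below. This is not the script used
in this thread. The function name stress_cycles, the timeout value, and
the placeholder commands are all assumptions for illustration. It treats
a command that does not return within the timeout as a suspected hang.

```python
import subprocess
import sys


def stress_cycles(start_cmd, stop_cmd, cycles=10, timeout=60.0):
    """Run stop/start cycles and return the indices of suspect cycles.

    A cycle is recorded as suspect when either command exits non-zero
    or does not finish within `timeout` seconds (a possible hang).
    """
    suspect = []
    for i in range(cycles):
        for cmd in (stop_cmd, start_cmd):
            try:
                subprocess.run(
                    cmd,
                    check=True,
                    timeout=timeout,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                )
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                suspect.append(i)
                break  # skip the rest of this cycle, move to the next one
    return suspect


if __name__ == "__main__":
    # Placeholder commands so the sketch runs anywhere. On a real system
    # these would be the service's own stop and start commands
    # (hypothetical example: ["service", "myapp", "stop"]).
    noop = [sys.executable, "-c", "pass"]
    print("suspect cycles:", stress_cycles(noop, noop, cycles=5, timeout=30.0))
```

With a failure rate of about 1 in 10 per cycle, a run of this loop would
need a few dozen cycles before a clean result says much about the fix.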