Date: Wed, 11 Mar 2015 21:34:07 -0700
From: Mark Johnston
To: Nick Frampton
Cc: kib@FreeBSD.org, freebsd-stable@freebsd.org, John Baldwin
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150312043407.GA11120@raichu>
In-Reply-To: <5501108C.4080303@akips.com>

On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> On 12/03/15 00:38, John Baldwin wrote:
> > > > It sounds like this issue might be the one fixed in r272566: if the
> > > > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > > > sbuf error return value could bubble up and be treated as ERESTART,
> > > > resulting in a loop.
> > > >
> > > > This can be confirmed with something like
> > > >
> > > >   dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p
> > > >
> > > > If the output consists solely of __sysctl, this bug is likely the
> > > > culprit.
> > >
> > > Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
> > >
> > > I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > >
> > > I'll try compiling with debug symbols in case it happens again.
> > > I haven't been able to reproduce the
> > > problem in a reasonable time frame so it could be days or weeks before we see it happen again.
> >
> > The truss output is consistent with Mark's suggestion, so I would try
> > his suggested fix of 272566.
>
> I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely
> to be MFCed back to 10-stable?

I can't see any reason it shouldn't be, and there was an MFC reminder in
the commit log entry for that revision. I've cc'ed kib@, who might have
a reason.

> Our RC script forks off about 200 processes when starting our software, and I wrote a small script
> to repeatedly stop/start the software, which fairly reliably reproduces the issue about 1 in 10
> times. I've been running the script with the patched kernel for an hour now and I haven't seen the
> issue appear.
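
For anyone who wants to repeat this kind of soak test, here is a rough
sketch of a stop/start loop in Python. The actual script wasn't posted,
so everything here is an assumption: the restart commands are
placeholders, and using a probe command with a timeout (fstat would be
the obvious candidate, since that is what was spinning) is just one way
to notice a hang. The last helper is plain arithmetic: if the bug shows
up in roughly 1 of 10 cycles, it gives the chance of seeing no hang at
all in N cycles on a kernel that still has the bug.

```python
import subprocess


def run_ok(cmd, timeout):
    """Return True if cmd exits within `timeout` seconds, False if it hangs.

    On timeout, subprocess.run() kills the child before raising.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def stress(restart_cmds, probe_cmd, cycles, timeout=30.0):
    """Run the restart commands `cycles` times and count hung probes.

    restart_cmds: list of argv lists, e.g. a stop command then a start
                  command (hypothetical; substitute your own rc script).
    probe_cmd:    argv list for a command expected to finish quickly
                  (assumed here to be something like ["fstat"]).
    """
    hangs = 0
    for _ in range(cycles):
        for cmd in restart_cmds:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
        if not run_ok(probe_cmd, timeout):
            hangs += 1
    return hangs


def chance_of_clean_run(per_cycle_rate, cycles):
    """Probability of zero hangs in `cycles` independent cycles,
    assuming each cycle hangs with probability `per_cycle_rate`."""
    return (1.0 - per_cycle_rate) ** cycles
```

With a 1-in-10 rate, chance_of_clean_run(0.1, 44) comes out just under
1%, so a few dozen clean cycles on the patched kernel is already fairly
strong evidence, provided the cycles are independent.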