From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: freebsd-hackers@freebsd.org, Masoom Shaikh, Ivan Voras, Jeremy Chadwick, freebsd-questions@freebsd.org
Date: Mon, 29 Mar 2010 14:27:34 -0400
Subject: Re: random FreeBSD panics
Message-Id: <201003291427.34641.jhb@freebsd.org>
In-Reply-To: <20100329173038.GA4969@icarus.home.lan>

On Monday 29 March 2010 1:30:38 pm Jeremy Chadwick wrote:
> On Mon, Mar 29, 2010 at 05:01:02PM +0000, Masoom Shaikh wrote:
> > On Sun, Mar 28, 2010 at 5:38 PM, Ivan Voras wrote:
> > > On 28 March 2010 16:42, Masoom Shaikh wrote:
> > >
> > >> lets assume if this is h/w problem, then how can other OSes overcome
> > >> this ? is there a way to make FreeBSD ignore this as well, let it
> > >> result in reasonable performance penalty.
> > >
> > > Very probably, if only we could detect where the problem is.
> > > Try adding "options PRINTF_BUFR_SIZE=128" to the kernel
> >
> > this option is already there
>
> The key word in Ivan's phrase is "less mangled". Neither use of or
> increasing PRINTF_BUFR_SIZE solves the problem of interspersed console
> output. I've been ranting/raving about this problem for years now; it
> truly looks like a mutex lock issue (or lack of such lock), but I've
> been told numerous times that isn't the case.
>
> To developers: what incentives would help get this issue well-needed
> attention? This problem makes kernel debugging, panic analysis, and
> other console-oriented viewing basically impossible.

I was recently going to look at it. The somewhat drastic approach I was
going to take was to add a simple serializing lock around trap_fatal() and
a few other places that do similar block prints (e.g. mca_log()).

One of the issues with fixing this in printf itself is that you'd probably
want to serialize complete lines of text on a per-thread basis. You would
want to be able to accumulate this line of text across multiple calls to
printf (think of it as line-buffering a la stdio). However, some folks may
be nervous about printf not printing things immediately.

The other issue is that lots of code assumes it can call printf from
anywhere and everywhere.
Mostly this just means that if you add locking and line-buffering to
printf(9) you have to be very careful to make sure it works in odd places.
Probably a lot of this could be solved by deferring things like trap_fatal()
until panic() has already been called (which is bde's preferred solution I
think).

--
John Baldwin
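
To make the two ideas above concrete, here is a rough userland sketch in Python. It is not kernel code and not how printf(9) works: the class LineBufferedPrinter, its printf(), block() and flush() methods, and the fault_report() helper are all hypothetical names made up for illustration. It shows per-thread accumulation of a line across several printf calls, and a serializing lock held around a multi-line block print of the trap_fatal() kind. It deliberately ignores the hard part mentioned above, namely that the real printf can be called from contexts where taking a lock is not safe.

```python
import io
import threading


class LineBufferedPrinter:
    """Hypothetical sketch: buffer each thread's text until a newline,
    then write the complete line(s) to the sink under a single lock."""

    def __init__(self, sink):
        self._sink = sink                  # any object with a write() method
        self._lock = threading.RLock()     # re-entrant so block() can wrap printf()
        self._local = threading.local()    # holds each thread's partial line

    def printf(self, fmt, *args):
        text = fmt % args if args else fmt
        pending = getattr(self._local, "pending", "") + text
        # Everything up to the last newline is complete; the rest stays buffered.
        complete, newline, rest = pending.rpartition("\n")
        self._local.pending = rest
        if newline:
            with self._lock:
                self._sink.write(complete + "\n")

    def block(self):
        """Return the lock for use in a with-statement around a block print."""
        return self._lock

    def flush(self):
        """Force out the calling thread's partial line, if any."""
        pending = getattr(self._local, "pending", "")
        if pending:
            self._local.pending = ""
            with self._lock:
                self._sink.write(pending + "\n")


def worker(printer, cpu, rounds):
    # One logical line built from three separate printf calls.
    for i in range(rounds):
        printer.printf("cpu%d: ", cpu)
        printer.printf("fault %d ", i)
        printer.printf("done\n")


def fault_report(printer, cpu):
    # A multi-line report kept contiguous by holding the lock throughout.
    with printer.block():
        printer.printf("cpu%d: Fatal trap\n", cpu)
        printer.printf("cpu%d: fault address = ", cpu)
        printer.printf("0x%x\n", 0x10 * cpu)
        printer.printf("cpu%d: current process = %d\n", cpu, 100 + cpu)


def demo(threads=4, rounds=50):
    sink = io.StringIO()
    printer = LineBufferedPrinter(sink)
    pool = [threading.Thread(target=worker, args=(printer, n, rounds))
            for n in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sink.getvalue().splitlines()
```

Calling demo() returns the emitted lines; each one should be a whole "cpuN: fault I done" line even though every thread built it in three calls. The cost is the one already noted: text without a trailing newline sits in the per-thread buffer until a newline or an explicit flush() arrives, so nothing is printed "immediately".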