From owner-freebsd-current@FreeBSD.ORG  Sun Dec 10 03:11:42 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D016716A40F;
	Sun, 10 Dec 2006 03:11:42 +0000 (UTC)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw
	[140.138.150.166])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A4C5543C9F;
	Sun, 10 Dec 2006 03:10:34 +0000 (GMT)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000)
	id 97AE68C9A18; Sun, 10 Dec 2006 11:11:40 +0800 (CST)
Received: from localhost (localhost [127.0.0.1])
	by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id 72FE98C984F;
	Sun, 10 Dec 2006 11:11:40 +0800 (CST)
Date: Sun, 10 Dec 2006 11:11:40 +0800 (CST)
From: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
To: Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20061209214233.L2273@fledge.watson.org>
Message-ID: <0612101036232.41529@www.mmlab.cse.yzu.edu.tw>
References: <52944.192.168.1.110.1165679313.squirrel@yal.hopto.org> 
	<20061209195519.B60055@mp2.macomnet.net>
	<20061209204924.N9926@fledge.watson.org>
	<cb5206420612091310r719f7b3en2d4fb35b23453ddf@mail.gmail.com>
	<20061209214233.L2273@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Andrew Pantyukhin <infofarmer@FreeBSD.org>, freebsd-current@freebsd.org,
	yal <yal@yal.hopto.org>
Subject: Re: CURRENT freezes on Latitude D520
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 10 Dec 2006 03:11:42 -0000

On Sat, 9 Dec 2006, Robert Watson wrote:
[...]
> Right now, setting debug.mpsafenet=0 has three effects:
>
> (1) Place Giant over the network stack, creating a single lock that spans the
>    entire stack, preventing parallelism, as well as acting as a "master" lock
>    which implicitly prevents lock order-related deadlocks in the stack.
>
> (2) Effectively disable preemption in the network stack, as ithreads and the
>    netisr will be unable to start running until user threads exit the stack,
>    regardless of priority.
>
> (3) Effectively disable direct dispatch, as non-MPSAFE netisr handlers are
>    always deferred rather than executing in the ithread context.
>
> I suspect that many of the people setting debug.mpsafenet=0 and declaring the
> problem fixed are seeing the change due to (2) and (3), indirect rather than
> direct effects of (1).  I would much rather people experimented with:
>
> - Disabling direct dispatch (net.isr.direct=0)
>
> - Disabling preemption (compiling out options PREEMPTION)
>
> - Running with WITNESS, which reports lock order reversals.
>
> which get a bit more to the heart of most problems.  debug.mpsafenet=0 really
> exists for the purposes of supporting components which are not sufficiently 
> locked to allow the stack to run MPSAFE, rather than as a means of disabling 
> direct dispatch and preemption, which speak to different types of problems. 
> The main reason that I haven't removed the administrator tunable to date is 
> that I suspect it will be quite helpful when KAME IPSEC locking happens, but 
> since that appears not to have happened yet, debug.mpsafenet as an option is 
> likely causing more harm than good by being available as a stand-in sysctl 
> masking other problems, causing people to not get to the point of properly 
> identifying the actual cause (device driver bugs, etc).

   Can the aforementioned tricks (1/2/3) be applied to RELENG_6 as well?

   We are running RELENG_6 on our production server (postfix, squid,
pf firewall/NAT, FAST_IPSEC VPN, ...), which is a dual Athlon MP board
with three NICs (two fxp cards and one onboard xl, connected to three
different networks).

   I haven't tried WITNESS yet; however, I'm quite sure that net.isr.direct=0
is set and that there is no PREEMPTION in the current kernel.  The problem is
that, with debug.mpsafenet=1, we always run into a hard freeze without getting
any kdb> prompt on the console.
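
   For reference, a minimal sketch of how the settings discussed here would
typically be expressed, assuming the stock tunable and option names on
RELENG_6 and the default file locations.  It is illustrative only, not a
copy of the actual configuration on this machine:

```
# /boot/loader.conf: debug.mpsafenet is a boot-time tunable
debug.mpsafenet="1"      # 1 = MPSAFE stack (default), 0 = Giant over the stack

# /etc/sysctl.conf
net.isr.direct=0         # queue packets to the netisr, no direct dispatch

# Kernel config: leave out "options PREEMPTION"; the lines below are the
# debugging options one would add to try WITNESS (assumed, not yet in use here)
options KDB
options DDB
options WITNESS
options WITNESS_SKIPSPIN
```

   WITNESS adds noticeable overhead, which is part of the hesitation about
enabling it on a production box.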

   Whilst turning debug.mpsafenet off only masks the real problem, I'm still
wondering about if there is any less damaging way to track such problem
down in a _production_ environment.

-- 
Thanks,

Tai-hwa Liang