Message-ID: <4625D0DB.1080902@elischer.org>
Date: Wed, 18 Apr 2007 01:03:39 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 1.5.0.10 (Macintosh/20070221)
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>
References: <20070417153357.GA1335@seekingfire.com>	<20070417173005.O42234@fledge.watson.org>	<20070417181627.GA1225@seekingfire.com>	<20070417220339.E2913@fledge.watson.org>
	<20070418084345.H2913@fledge.watson.org>
In-Reply-To: <20070418084345.H2913@fledge.watson.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Tillman Hodgson <tillman@seekingfire.com>, current@freebsd.org
Subject: Re: Panic on boot with April 16 src (lengthy info attached)

Robert Watson wrote:
> 
> On Tue, 17 Apr 2007, Robert Watson wrote:
> 
>>> I originally put it in there to work around a LOR that I was 
>>> experiencing (based on you mentioning it in an email to current@ Sun 
>>> 18 Mar 2007 15:50). http://sources.zabbadoz.net/freebsd/lor/191.html 
>>> doesn't show any changes to that particular LOR, do you happen to 
>>> know if there's any ongoing work on this? I'm very willing to act as 
>>> a test system.
>>
>> I chatted with Andre about the panic earlier this afternoon, and it 
>> sounds like the fix is straightforward.  I would anticipate seeing it 
>> committed in the near future.
>>
>> I'll send out an e-mail explaining the above lock order reversal 
>> tomorrow morning.  I understand that several people have been looking 
>> at this, so perhaps one of those people will reply talking about it 
>> before then. :-)
> 
> The essential problem of this lock order reversal has to do with the 
> fact that higher network stack layers hold locks over lower network 
> stack layers.  For example, the lock for a TCP connection is held over 
> the operation to enqueue the TCP packet for transmission at a lower 
> layer.  This is necessary in order to maintain TCP transmission order 
> into the transmission queue between multiple threads operating on the 
> same TCP connection, as if the "transmit and enqueue" operation were 
> non-atomic with respect to the same TCP connection in another thread, 
> quite damaging reordering could take place.  We directly dispatch the 
> entire outbound network stack from that enqueue point, meaning that the 
> per-TCP connection lock is held over that processing path, including the 
> firewall.  As a result, PCB locks (TCP connection locks) precede the 
> firewall in the lock order.
> 
> Firewall locks are about protecting the rule state of the firewall from 
> corruption when firewall rules are updated, allowing readers to 
> interpret the rules using a static snapshot, and writers to avoid 
> mangling the rules via simultaneous non-atomic update.  As such, when 
> the firewall code is entered, the firewall lock is acquired, and held 
> until the packet has been completely processed.  Things get sticky deep 
> in the firewall code because our firewalls include credential-aware 
> rules, which essentially "peek up the stack" in order to decide what 
> user is associated with a packet before delivery to the connection is 
> done.  The firewall rule lock is held over this lookup and inspection of 
> TCP-layer state.  In the out-bound path, we pass down the TCP state 
> reference (PCB pointer) and guarantee the lock is already held. However, 
> in the in-bound direction, the firewall has to do the full lookup and 
> lock acquisition.  Which reverses the lock order, and can lead to 
> deadlocks.

I am working on fixing this for ipfw.
It involves moving ipfw to a lockless method of operation.
(More info will be on the ipfw list in a few days.)
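
For anyone following along, here is a rough sketch of the reversal
described above. It is a toy model only, loosely in the spirit of a
lock order checker such as WITNESS, and not kernel code. The class
LockOrderChecker, the functions outbound() and inbound_uid_rule(), and
the lock names "pcb" and "ipfw" are all made up for illustration.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order tracker (hypothetical, for illustration only)."""

    def __init__(self):
        # after[a] holds every lock that was acquired while a was held.
        self.after = defaultdict(set)
        # Locks currently held by the single simulated thread.
        self.held = []

    def _reaches(self, src, dst):
        # Depth-first search: is an order src -> ... -> dst already recorded?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, name):
        # Report a reversal if the new lock was previously seen *before*
        # a lock we are holding right now, then record the new order.
        reversals = []
        for h in self.held:
            if self._reaches(name, h):
                reversals.append((h, name))
            self.after[h].add(name)
        self.held.append(name)
        return reversals

    def release(self, name):
        self.held.remove(name)


def outbound(chk):
    # Outbound: the PCB lock is held across the whole output path,
    # so the firewall lock is taken second.
    found = chk.acquire("pcb")
    found += chk.acquire("ipfw")
    chk.release("ipfw")
    chk.release("pcb")
    return found


def inbound_uid_rule(chk):
    # Inbound with a credential-aware rule: the firewall lock is held
    # while the PCB is looked up and locked, which is the opposite order.
    found = chk.acquire("ipfw")
    found += chk.acquire("pcb")
    chk.release("pcb")
    chk.release("ipfw")
    return found
```

Running outbound() and then inbound_uid_rule() on the same checker
reports nothing for the first call and the pair ("ipfw", "pcb") for the
second, which is the reversal in question.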

> 
> debug.mpsafenet=0 places the Giant lock in front of all network stack 
> lock acquisition, which effectively serializes all of the above.  It 
> doesn't remove the lock order reversal, but it does eliminate 
> simultaneous lock acquisition, removing one of the necessary causes of 
> deadlock.  This trick of using a serializing "global" lock to prevent 
> lock order problems between "leaf" locks is not an uncommon technique, 
> but in this case it has a significant overhead (requiring 
> non-parallelism in network processing), and needs to be fixed.
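
A small user-space sketch of that serializing trick, assuming ordinary
threads and mutexes stand in for the kernel pieces. The names giant,
pcb_lock, fw_lock, run_path() and run_both() are invented here; the
only thing taken from the text above is the idea of one global lock in
front of two leaf locks that are taken in opposite orders.

```python
import threading

giant = threading.Lock()      # stand-in for Giant (hypothetical model)
pcb_lock = threading.Lock()   # stand-in for a per-connection PCB lock
fw_lock = threading.Lock()    # stand-in for the firewall rule lock


def run_path(first, second, log, name, rounds=1000):
    # Every pass takes the global lock before either leaf lock, so the
    # two leaf locks are never contended, whatever order they are taken in.
    for _ in range(rounds):
        with giant:
            with first:
                with second:
                    pass
    log.append(name)


def run_both():
    log = []
    # The two paths still take the leaf locks in opposite orders.
    out = threading.Thread(target=run_path,
                           args=(pcb_lock, fw_lock, log, "outbound"))
    inb = threading.Thread(target=run_path,
                           args=(fw_lock, pcb_lock, log, "inbound"))
    out.start()
    inb.start()
    out.join(timeout=10)
    inb.join(timeout=10)
    return sorted(log)
```

The reversal is still there, but both threads finish because only one
of them can be inside the leaf locks at a time. The cost is visible in
the same place: nothing inside the "with giant" block runs in parallel.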
> 
> The key is to guarantee that the acquisition of the firewall reference 
> will never be blocked waiting on a PCB lock -- i.e., that the firewall 
> "lock" isn't a lock so much as a reference count that will never have to 
> wait, removing the waiting requirement from the deadlock equation.  I 
> know that Julian Elischer has been looking at doing this, and others may 
> have also.  The model is essentially that you either starve writers to 
> the firewall data, or you create a read-only snapshot to be used by 
> readers in the event a writer arrives, allowing readers to pick up the 
> new rules if available, or the old rules if not, and never wait 
> indefinitely either way.

yep..
I have detailed plans afoot but not for pf.
I wouldn't know pf if it came up and kicked me in the shins so I'll be
leaving that to someone else.
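
To make the read-only snapshot model concrete, here is a minimal
sketch. It is not the planned ipfw design, just an illustration of
"readers never wait" under some loud assumptions: the class RuleSet and
its methods are invented, the rule strings are placeholders rather than
real ipfw syntax, and it leans on CPython rebinding an attribute as a
single reference swap. A kernel version would need atomic pointer
operations and a safe way to free the old snapshot, which the garbage
collector hides here.

```python
import threading


class RuleSet:
    """Copy-on-write rule list (hypothetical, for illustration only)."""

    def __init__(self, rules=()):
        # The current rules are an immutable tuple, replaced wholesale.
        self._current = tuple(rules)
        # Serializes writers against each other; readers never take it.
        self._write_lock = threading.Lock()

    def snapshot(self):
        # Readers grab whatever reference is current and never block.
        # They keep using that snapshot even if a writer swaps in a new one.
        return self._current

    def update(self, change):
        # Writers build a complete new copy, then publish it in one step.
        with self._write_lock:
            new_rules = tuple(change(list(self._current)))
            self._current = new_rules


rules = RuleSet(["rule-a"])
old = rules.snapshot()                    # a reader in mid-packet
rules.update(lambda r: r + ["rule-b"])    # a writer arrives
new = rules.snapshot()                    # a later reader
```

The reader holding old finishes its packet against the old rules, the
next reader picks up the new ones, and neither waits on a PCB lock or
on the writer.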

> 
> Robert N M Watson
> Computer Laboratory
> University of Cambridge