Date:      Wed, 18 Apr 2007 01:03:39 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        Tillman Hodgson <tillman@seekingfire.com>, current@freebsd.org
Subject:   Re: Panic on boot with April 16 src (lengthy info attached)
Message-ID:  <4625D0DB.1080902@elischer.org>
In-Reply-To: <20070418084345.H2913@fledge.watson.org>
References:  <20070417153357.GA1335@seekingfire.com>	<20070417173005.O42234@fledge.watson.org>	<20070417181627.GA1225@seekingfire.com>	<20070417220339.E2913@fledge.watson.org> <20070418084345.H2913@fledge.watson.org>

Robert Watson wrote:
> 
> On Tue, 17 Apr 2007, Robert Watson wrote:
> 
>>> I originally put it in there to work around a LOR that I was 
>>> experiencing (based on you mentioning it in an email to current@ Sun 
>>> 18 Mar 2007 15:50). http://sources.zabbadoz.net/freebsd/lor/191.html 
>>> doesn't show any changes to that particular LOR, do you happen to 
>>> know if there's any ongoing work on this? I'm very willing to act as 
>>> a test system.
>>
>> I chatted with Andre about the panic earlier this afternoon, and it 
>> sounds like the fix is straightforward.  I would anticipate seeing it 
>> committed in the near future.
>>
>> I'll send out an e-mail explaining the above lock order reversal 
>> tomorrow morning.  I understand that several people have been looking 
>> at this, so perhaps one of those people will reply talking about it 
>> before then. :-)
> 
> The essential problem of this lock order reversal has to do with the 
> fact that higher network stack layers hold locks over lower network 
> stack layers.  For example, the lock for a TCP connection is held over 
> the operation to enqueue the TCP packet for transmission at a lower 
> layer.  This is necessary in order to maintain TCP transmission order 
> into the transmission queue between multiple threads operating on the 
> same TCP connection, as if the "transmit and enqueue" operation were 
> non-atomic with respect to the same TCP connection in another thread, 
> quite damaging reordering could take place.  We directly dispatch the 
> entire outbound network stack from that enqueue point, meaning that the 
> per-TCP connection lock is held over that processing path, including the 
> firewall.  As a result, PCB locks (TCP connection locks) precede the 
> firewall in the lock order.
> 
> Firewall locks are about protecting the rule state of the firewall from 
> corruption when firewall rules are updated, allowing readers to 
> interpret the rules using a static snapshot, and writers to avoid 
> mangling the rules via simultaneous non-atomic update.  As such, when 
> the firewall code is entered, the firewall lock is acquired, and held 
> until the packet has been completely processed.  Things get sticky deep 
> in the firewall code because our firewalls include credential-aware 
> rules, which essentially "peek up the stack" in order to decide what 
> user is associated with a packet before delivery to the connection is 
> done.  The firewall rule lock is held over this lookup and inspection of 
> TCP-layer state.  In the out-bound path, we pass down the TCP state 
> reference (PCB pointer) and guarantee the lock is already held. However, 
> in the in-bound direction, the firewall has to do the full lookup and 
> lock acquisition itself, which reverses the lock order and can lead to 
> deadlocks.

I am working on fixing this for ipfw.
It involves moving ipfw to a lockless method of operation.
(more info will be in the ipfw list in a few days)
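
To make the reversal described above concrete, here is a rough sketch in
Python rather than kernel C. It is not the kernel's code: the checker class
and the lock names "tcp_pcb" and "firewall" are made up for illustration.
It only records the order in which each path takes its locks and reports
when a later path contradicts an earlier one.

```python
class LockOrderChecker:
    """Toy lock-order recorder (hypothetical, for illustration only)."""

    def __init__(self):
        # (first, second) means "first was held while second was acquired"
        self.order = set()
        self.reversals = []

    def run_path(self, locks_in_order):
        held = []
        for name in locks_in_order:
            for h in held:
                # Has some earlier path taken these two in the opposite order?
                if (name, h) in self.order:
                    self.reversals.append((h, name))
                self.order.add((h, name))
            held.append(name)


checker = LockOrderChecker()
checker.run_path(["tcp_pcb", "firewall"])   # outbound: PCB lock, then firewall
checker.run_path(["firewall", "tcp_pcb"])   # inbound: firewall, then PCB lookup
print(checker.reversals)
```

The second path is flagged because it takes the same two locks in the
opposite order from the first. Two threads running one path each at the
same time can each end up holding one lock while waiting for the other.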

> 
> debug.mpsafenet=0 places the Giant lock in front of all network stack 
> lock acquisition, which effectively serializes all of the above.  It 
> doesn't remove the lock order reversal, but it does eliminate 
> simultaneous lock acquisition, removing one of the necessary causes of 
> deadlock.  This trick of a serializing "global" lock in order to prevent 
> lock order problems between "leaf" locks is not an uncommon technique, but in 
> this case has a significant overhead (requiring non-parallelism in 
> network processing), and needs to be fixed.
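
A rough sketch of why the serializing lock helps, again in Python rather
than kernel C. The names giant, pcb_lock and fw_lock are stand-ins and not
real kernel symbols. The two paths still take the leaf locks in opposite
orders, but since every path takes the global lock first, only one of them
is ever inside at a time, so neither can end up waiting on the other.

```python
import threading

giant = threading.Lock()      # stand-in for a global serializing lock
pcb_lock = threading.Lock()   # hypothetical per-connection lock
fw_lock = threading.Lock()    # hypothetical firewall rule lock
done = []

def outbound():
    with giant:               # serializes the whole path
        with pcb_lock:
            with fw_lock:
                done.append("outbound")

def inbound():
    with giant:
        with fw_lock:         # opposite leaf order, harmless under giant
            with pcb_lock:
                done.append("inbound")

threads = [threading.Thread(target=f)
           for f in (outbound, inbound) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

The cost is visible in the sketch as well: with the global lock in front,
no two paths ever run in parallel.
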
> 
> The key is to guarantee that the acquisition of the firewall reference 
> will never be blocked waiting on a PCB lock -- i.e., that the firewall 
> "lock" isn't a lock so much as a reference count that will never have to 
> wait, removing the waiting requirement from the deadlock equation.  I 
> know that Julian Elischer has been looking at doing this, and others may 
> have also.  The model is essentially that you either starve writers to 
> the firewall data, or you create a read-only snapshot to be used by 
> readers in the event a writer arrives, allowing readers to pick up the 
> new rules if available, or the old rules if not, and never wait 
> indefinitely either way.

Yep.
I have detailed plans afoot, but not for pf.
I wouldn't know pf if it came up and kicked me in the shins, so I'll be
leaving that to someone else.
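
For readers following along, the read-only snapshot model described in the
quoted text can be sketched roughly as below. This is Python, not ipfw
code, and it is not the planned ipfw design. RuleTable, snapshot() and
add_rule() are hypothetical names, and the rule strings are placeholders.
Readers grab a reference to an immutable rule set and never wait. A writer
builds a new rule set on the side and publishes it with a single reference
swap.

```python
import threading

class RuleTable:
    """Toy copy-on-write rule table (hypothetical, for illustration only)."""

    def __init__(self, rules):
        self._rules = tuple(rules)            # immutable snapshot
        self._writer_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # A single reference read; readers never take the writer lock.
        return self._rules

    def add_rule(self, rule):
        with self._writer_lock:
            # Build the new rule set, then publish it by rebinding the name.
            self._rules = self._rules + (rule,)


table = RuleTable(["rule-1"])
old = table.snapshot()        # a reader that started before the update
table.add_rule("rule-2")
new = table.snapshot()        # a reader that started after the update
print(old, new)
```

A reader that picked up the old snapshot keeps using it unchanged, and
later readers see the new one. Python's garbage collector frees the old
tuple here. A kernel version would need its own way (a reference count,
for instance) to know when the old rules are no longer in use.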

> 
> Robert N M Watson
> Computer Laboratory
> University of Cambridge



