From owner-freebsd-stable@FreeBSD.ORG  Sun Mar 16 21:16:16 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 33CC2106564A
	for <freebsd-stable@freebsd.org>; Sun, 16 Mar 2008 21:16:16 +0000 (UTC)
	(envelope-from razor@dataxnet.ro)
Received: from mail.dataxnet.ro (datax28.mediasat.ro [80.96.28.28])
	by mx1.freebsd.org (Postfix) with SMTP id 3D2FD8FC1E
	for <freebsd-stable@freebsd.org>; Sun, 16 Mar 2008 21:16:14 +0000 (UTC)
	(envelope-from razor@dataxnet.ro)
Received: (qmail 72233 invoked by uid 1001); 16 Mar 2008 23:16:16 +0200
Date: Sun, 16 Mar 2008 23:16:16 +0200
From: Alex Popa <razor@dataxnet.ro>
To: Max Laier <max@love2party.net>
Message-ID: <20080316211616.GA67593@dataxnet.ro>
References: <20080314192359.GA4677@dataxnet.ro>
	<20080315203121.I42065@fledge.watson.org>
	<200803152217.02568.max@love2party.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200803152217.02568.max@love2party.net>
User-Agent: Mutt/1.4.2.2i
Cc: freebsd-stable@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject: Re: Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet

This is a combined reply to both of the previous mails; please bear with me.

On Sat, Mar 15, 2008 at 10:16:54PM +0100, Max Laier wrote:
> On Saturday 15 March 2008, Robert Watson wrote:
> > On Fri, 14 Mar 2008, Alex Popa wrote:
> > > [snip]
> > > The LOR messages from dmesg of 7.0-STABLE are as follows:
> > >
> > > lock order reversal:
> > > 1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @
> > > /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729 2nd
> > > 0xffffff00042ea0f0 radix node head (radix node head) @
> > > /usr/src/sys/net/route.c:147
> 
> I haven't seen this one before, can you obtain the trace for this, please?  
> You might need KDB & DDB for that - not sure.

I'll do my best (see below for my questions about getting a trace).

> > > lock order reversal: 
> > > 1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook
> > > read/write mutex) @ /usr/src/sys/net/pfil.c:73 2nd 0xffffffff80938c48
> > > tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:400
> 
> This one is most certainly harmless and can be ignored.  It is caused by 
> user/group rules, but a LOR with the read instance of a rwlock will not 
> lead to a deadlock.

I'm not using uid/gid/jail rules as far as I remember.  I'll add another
reply with pf.conf and the script I use to generate and reload the ipfw
rules (but I'll anonymize them).

> > Dear Alex,
> >
> > Thanks for this report, and sorry about the problem.  It could well be
> > that the lock order warning from WITNESS is related to the hang, and
> > might reflect a recursion-related bug in the pf policy routing code. 
> > I'm not sure to what extent you can tolerate further downtime, but it
> > would be useful to gather some more information about the hang itself
> > to try and confirm the involvement of lock order.  In particular, if
> > it's feasible, it would be very helpful if you could boot back to the
> > 7-STABLE kernel (keeping the 6.2-STABLE userspace should be fine, I
> 
> you'll need at least a new pfctl, because the ioctl interface to /dev/pf 
> has changed.

Switching between 6.2-RELEASE-p7 (not STABLE, because as I said 6.3
exhibited the lockups too) and 7-STABLE isn't much of a problem.
The upgrade path was "buy a new hard drive, set everything up and then
adapt the old config files"... actually we bought two hard drives, and I
set up one with amd64 and the other with i386.  I believe /etc and
/usr/local/etc are identical on the two (I adapted the configs from 6.2
to 7.0 once, then simply copied them from amd64 to i386).

So, the steps needed to switch are: back up the database on 6.2 (with
the IP/MAC mappings and a bit more), put in the 7.0 hard drive, boot
7.0, restore the DB and let it run.  Total downtime should be around
7 minutes at most.

> > think), and when the hang occurs, use the console debugger (ideally
> > hooked up to serial or firewire) to run the following debugging
> > commands:
> >
> >    show pcpu
> >    show allpcpu
> >    trace
> >    alltrace
> >    show alllocks
> >    show witness
> >    show lockedvnods
> >    show uma
> >    show malloc

This is where things get a bit tricky, and I need advice.

As I said, my observation is that the keyboard seems to stop working
when the lockup occurs; that is, pressing Num Lock won't toggle the
LED.  Thus I doubt that the good old Control-Alt-ESC would have the
desired effect (dropping me into the debugger).  However, I'm not that
familiar with the FreeBSD architecture, and wouldn't be surprised if the
LED toggling were handled elsewhere and the machine actually responded
to the keyboard interrupt and dropped me into ddb.  Also, given that
Num Lock stops responding (it works fine when the system's up), would a
serial console or firewire even be functional during the lockup?

Also, a bit of background:

Why I'm asking the above:  The current motherboard has a serial port
(and it works, we've used it), but no firewire port.  The other
motherboard we tried has firewire, but no serial.  As for the console
workstation, I can find a few machines with serial ports, but firewire
is harder to come by.  The null modem cable might be a problem too,
depending on the length needed.

Also, since the lockup isn't easily reproducible, I'll probably need to
spend some hours on location, and if I'm going to do that, I'd like some
assurance that the keyboard, serial console or firewire will actually
work.  Using firewire will also require me to switch motherboards, but
that can be done together with the hard drive swap, during the night.

After studying NOTES a bit, I was wondering if a combination of a
serial console (or just the plain console) with "options WITNESS_KDB"
would help get a "good enough" trace.  The upside is that both LORs
usually occur early (not much later than the login prompt, usually
earlier) rather than after 12-18 hours, so I can either force a panic
after each and get 2 core dumps, or run the suggested debug commands
(either as debug LOR1 / continue / debug LOR2, or debug LOR1 / reboot /
"continue" LOR1 / debug LOR2, whichever is more appropriate).

For the moment I have both hard drives (7.0-STABLE/amd64 and
7.0-RELEASE/i386) and the new motherboard (no serial, but with firewire)
set up as a working computer under my desk.  I can prepare for the
night-time switch and debugging session by compiling the kernel and/or
world and doing some preliminary testing here.  If I really need to test
the null modem console, I can put the hard drive in my own desktop and
test with another machine.

> > A shot-in-the-dark guess is that something about pf's interactions with
> > the protocol stack is involved here, but unfortunately I suspect we'll
> > need some more information to track it down.
> >
> > Also, could you confirm if you're using any credential-related firewall
> > rules with either ipfw or pf?  These would be uid/gid/jail matching
> > rules.

As I said above, I don't use any uid/gid/jail rules.  A mail with my
pf.conf and ipfw config will follow shortly after this one.

> > Robert N M Watson
> > Computer Laboratory
> > University of Cambridge
> >
[snip]
> 
> That's quite a complex setup.  It would really be interesting to get the 
> trace for the first LOR in order to figure out which code path we are 
> looking at.  I have a feeling that it might be the dummynet entry point, 
> but w/o the trace this is only speculation.

Working on it.

> -- 
> /"\  Best regards,                      | mlaier@freebsd.org
> \ /  Max Laier                          | ICQ #67774661
>  X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
> / \  ASCII Ribbon Campaign              | Against HTML Mail and News


I'd like suggestions / comments about the kernel config I'm thinking
about for debugging purposes:

- take my KERNEL (GENERIC plus IPFW, minus IPv6, SCTP and wireless), and add:

options		WITNESS
options		WITNESS_KDB	# only if debug-on-first-warn is wanted
options		WITNESS_SKIPSPIN
options		KDB
#options	KDB_TRACE	# not needed since I'll trace anyway?
options		DDB
#options	BREAK_TO_DEBUGGER	# would that work for my kind of lockup?
options		MSGBUF_SIZE=409600


Ideally I would like to hear that manual tracing and debugging from a
keyboard console would provide enough info.  I'll increase the kernel
message buffer to 400k as above, so I don't lose anything when I
continue and then run dmesg > log.txt.
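Once the buffer has been saved with dmesg > log.txt, the LOR reports in it could be pulled out mechanically.  The Python sketch below is only an illustration: it assumes each report has the same shape as the WITNESS messages quoted earlier in this thread (a "lock order reversal:" header followed by "1st" and "2nd" lines of the form "0x<address> <name> (<type>) @ <file>:<line>"), and the names parse_lors and LockSite are hypothetical, not part of any FreeBSD tool.

```python
import re
from typing import NamedTuple


class LockSite(NamedTuple):
    """One lock acquisition site from a WITNESS report (hypothetical helper type)."""
    address: str
    name: str
    lock_type: str
    file: str
    line: int


# Assumed shape of one site, as in the messages quoted in this thread:
#   1st 0x<address> <lock name> (<lock type>) @ <source file>:<line>
SITE_RE = re.compile(
    r"(1st|2nd)\s+(0x[0-9a-f]+)\s+(.+?)\s+\((.+?)\)\s+@\s+(\S+):(\d+)"
)


def parse_lors(text: str) -> list[tuple[LockSite, LockSite]]:
    """Return one (1st, 2nd) pair per 'lock order reversal:' block in text."""
    reports = []
    # Everything before the first header is ordinary dmesg output, so skip it.
    for chunk in text.split("lock order reversal:")[1:]:
        # Collapse whitespace so a site wrapped across lines still matches.
        flat = " ".join(chunk.split())
        sites = {}
        for m in SITE_RE.finditer(flat):
            # Keep only the first "1st" and the first "2nd" seen in each block.
            sites.setdefault(
                m.group(1),
                LockSite(m.group(2), m.group(3), m.group(4),
                         m.group(5), int(m.group(6))),
            )
        if "1st" in sites and "2nd" in sites:
            reports.append((sites["1st"], sites["2nd"]))
    return reports
```

For example, parse_lors(open("log.txt").read()) would yield one (1st, 2nd) pair per report; stack backtrace lines following a report are simply ignored.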

Just as easily, I can try forcing a panic at the LORs and keeping the
kernel dumps (optionally debugging in ddb as above first).  The
advantage is that the dumps might answer additional questions after the
fact.

Both the above options should be possible this week.

The serial console part may or may not happen this week, and I'm fairly
sure it will take another week before I find the time to spend 16+
hours on location waiting for a lockup (which might happen at a busy
time, leaving me very little time for all the debugging).

Tips / suggestions are most welcome!

Thanks for the help!
	Alex

-- 
 "Computer science is no more about computers
     than astronomy is about telescopes" -- E. W. Dijkstra