Date: Sun, 16 Mar 2008 23:16:16 +0200
From: Alex Popa
To: Max Laier
Cc: freebsd-stable@freebsd.org, Robert Watson
Subject: Re: Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet
Message-ID: <20080316211616.GA67593@dataxnet.ro>
In-Reply-To: <200803152217.02568.max@love2party.net>
References: <20080314192359.GA4677@dataxnet.ro> <20080315203121.I42065@fledge.watson.org> <200803152217.02568.max@love2party.net>

This is a mixed reply to both the previous mails, bear with me please.

On Sat, Mar 15, 2008 at 10:16:54PM +0100, Max Laier wrote:
> On Saturday 15 March 2008, Robert Watson wrote:
> > On Fri, 14 Mar 2008, Alex Popa wrote:
> > > [snip]
> > > The LOR messages from dmesg of 7.0-STABLE are as follows:
> > >
> > > lock order reversal:
> > >  1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @
> > >      /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729
> > >  2nd 0xffffff00042ea0f0 radix node head (radix node head) @
> > >      /usr/src/sys/net/route.c:147
>
> I haven't seen this one before, can you obtain the trace for this, please?
> You might need KDB & DDB for that - not sure.

I'll do my best (see below for my questions about getting a trace).

> > > lock order reversal:
> > >  1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook
> > >      read/write mutex) @ /usr/src/sys/net/pfil.c:73
> > >  2nd 0xffffffff80938c48 tcp (tcp) @
> > >      /usr/src/sys/netinet/tcp_input.c:400
>
> This one is most certainly harmless and can be ignored. It is caused by
> user/group rules, but a LOR with the read instance of a rwlock will not
> lead to a deadlock.

I'm not using uid/gid/jail rules as far as I remember. I'll add another
reply with pf.conf and the script I use to generate and reload the ipfw
rules (but I'll anonymize them).

> > Dear Alex,
> >
> > Thanks for this report, and sorry about the problem. It could well be
> > that the lock order warning from WITNESS is related to the hang, and
> > might reflect a recursion-related bug in the pf policy routing code.
> > I'm not sure to what extent you can tolerate further downtime, but it
> > would be useful to gather some more information about the hang itself
> > to try and confirm the involvement of lock order. In particular, if
> > it's feasible, it would be very helpful if you could boot back to the
> > 7-STABLE kernel (keeping the 6.2-STABLE userspace should be fine, I
>
> you'll need at least a new pfctl, because the ioctl interface to /dev/pf
> has changed.

Switching between 6.2-RELEASE-p7 (not STABLE, because as I said 6.3
exhibited the lockups too) and 7-STABLE isn't much of a problem. The
upgrade path was "buy a new hard drive, set up everything and then adapt
the old config files"... actually we bought 2 hard drives, and I set them
up, one with amd64 and the other with i386. I think /etc and
/usr/local/etc are identical on these 2 (I adapted the configs from 6.2
to 7.0, but I just copied them from amd64 to i386).

So, the actions needed to switch: back up the database on 6.2 (with IP/MAC
mappings and a bit more), put in the 7.0 hard drive, boot off 7.0,
restore the DB, let it run. Total downtime should be around 7 minutes
tops.

> > think), and when the hang occurs, use the console debugger (ideally
> > hooked up to serial or firewire) to run the following debugging
> > commands:
> >
> > show pcpu
> > show allpcpu
> > trace
> > alltrace
> > show alllocks
> > show witness
> > show lockedvnods
> > show uma
> > show malloc

This is where things get a bit tricky, and I need advice. As I said, my
observation is that the keyboard seems to stop working when the lockup
occurs, that is, pressing Num Lock won't toggle the state of the LED.
Thus I have some doubts that trying the good old Control-Alt-ESC would
have the desired effect (dropping me into the debugger). However, I'm
not that familiar with the FreeBSD architecture, and wouldn't be
surprised if the LED toggling were handled in another thread and the
machine actually responded to the keyboard interrupt and dropped me into
ddb.

Also, judging by the lack of NumLock activity (it works fine when the
system's up), would serial console or firewire be functional during the
lockup?

A bit of explanation on why I'm asking the above: the current motherboard
has a serial port (and it works, we've used it), but not a firewire port.
The other motherboard we tried has firewire, but no serial. For a console
workstation, I can get a few machines with serial ports, but ones with
firewire are not so easy to find.
The null modem cable might be a problem too, depending on length.

Also, since the lockup isn't easily reproducible, I'll probably need to
spend some hours on location, and if I'm going to do that, I'd like a
degree of hope that either keyboard, serial console or firewire will
work. Also, firewire will require me to switch motherboards, but that
can be done together with the hard drive swapping, during the night.

After a bit of studying NOTES, I was wondering if a combination of serial
console (or just plain console) with "options WITNESS_KDB" would help get
a "good enough" trace. The upside of this is that both LORs usually
occur early (not much later than the login prompt, usually earlier) as
opposed to after 12...18 hours, and I can either force a panic after each
and get 2 core dumps, or run the debug commands suggested (either as
debug LOR1 / continue / debug LOR2, or debug LOR1 / reboot / "continue"
LOR1 / debug LOR2 - whichever is more appropriate).

For the moment I have both hard drives (7.0-STABLE/amd64 and
7.0-RELEASE/i386) and the new motherboard (no serial, but with firewire)
as a working computer under my desk. I can prepare for the night-time
switch and debug by compiling kernel and/or world and doing some
preliminary testing here. If I really need to test the null modem
console, I can put the hdd in my own desktop and test with another
machine.

> > A shot-in-the-dark guess is that something about pf's interactions with
> > the protocol stack is involved here, but unfortunately I suspect we'll
> > need some more information to track it down.
> >
> > Also, could you confirm if you're using any credential-related firewall
> > rules with either ipfw or pf? These would be uid/gid/jail matching
> > rules.

As I said above, I don't use any uid/gid/jail rules. Mail with pf.conf
and ipfw config incoming shortly after this one.

> > Robert N M Watson
> > Computer Laboratory
> > University of Cambridge
> > [snip]
>
> That's quite a complex setup.
> It would really be interesting to get the
> trace for the first LOR in order to figure out which code path we are
> looking at. I have a feeling that it might be the dummynet entry point,
> but w/o the trace this is only speculation.

Working on it.

> --
> /"\  Best regards,                      | mlaier@freebsd.org
> \ /  Max Laier                          | ICQ #67774661
>  X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
> / \  ASCII Ribbon Campaign              | Against HTML Mail and News

I'd like suggestions / comments about the kernel config I'm thinking
about for debugging purposes:

- take my KERNEL (GENERIC + IPFW - IPv6 and SCTP and wireless), and add:

	options WITNESS
	options WITNESS_KDB       # only if debug-on-first-warn is wanted
	options WITNESS_SKIPSPIN
	options KDB
	#options KDB_TRACE        # not needed since I'll trace anyway?
	options DDB
	#options BREAK_TO_DEBUGGER # would that work for my kind of lockup?
	options MSGBUF_SIZE=409600

Ideally I would like to hear that the manual tracing and debugging with a
keyboard console would provide enough info. I'll increase the kernel
message buffer size to 400k as above, so I don't lose info when I
continue and run dmesg > log.txt.

Just as easily, I can try forcing a panic at the LORs and keeping the
kernel dumps (with optional debugging in ddb like above). The advantage
is that this might answer supplementary questions after the deed is done.

Both the above options should be possible this week. The serial console
part may or may not happen this week, and I'm quite positive it will take
another week before I find the time to spend 16+ hours on location,
waiting for a lockup (which might happen at a busy time, so I'll have
very little time to do all the debugging).

Tips / suggestions are most welcome!

Thanks for the help!
Alex

--
"Computer science is no more about computers than astronomy is about
telescopes" -- E. W. Dijkstra