From: Max Laier
Organization: FreeBSD
To: Alex Popa
Cc: freebsd-stable@freebsd.org, Robert Watson
Date: Sat, 15 Mar 2008 22:16:54 +0100
Subject: Re: Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet
Message-Id: <200803152217.02568.max@love2party.net>
In-Reply-To: <20080315203121.I42065@fledge.watson.org>
References: <20080314192359.GA4677@dataxnet.ro> <20080315203121.I42065@fledge.watson.org>

On Saturday 15 March 2008, Robert Watson wrote:
> On Fri, 14 Mar 2008, Alex Popa wrote:
> > World was cvsupped on March 6th, around 18:00 GMT.
> >
> > Built and installed kernel + world, with options WITNESS and
> > WITNESS_SKIPSPIN.
> >
> > Short background: 7.0-RELEASE had excellent performance on the
> > machine, but it would randomly lock up after some hours (usually over
> > 10 hours).  The lockups were hard, meaning nothing seemed to work
> > (NumLock didn't toggle the keyboard LED, no replies to ping, no disk
> > activity).  We changed the motherboard and RAM and had the same
> > behaviour.  6.2-REL is rock solid on this machine (had over 50 days
> > uptime), but upgrading to 6.3-REL made it lock up just like 7.0 (so
> > we put 6.2 back and accepted the lower performance for the time
> > being).
> >
> > The LOR messages from dmesg of 7.0-STABLE are as follows:
> >
> > lock order reversal:
> >  1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @
> > /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729
> >  2nd 0xffffff00042ea0f0 radix node head (radix node head) @
> > /usr/src/sys/net/route.c:147

I haven't seen this one before, can you obtain the trace for this, please?
You might need KDB & DDB for that - not sure.

> > lock order reversal:
> >  1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook
> > read/write mutex) @ /usr/src/sys/net/pfil.c:73
> >  2nd 0xffffffff80938c48 tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:400

This one is most certainly harmless and can be ignored.  It is caused by
user/group rules, but a LOR with the read instance of a rwlock will not
lead to a deadlock.

> Dear Alex,
>
> Thanks for this report, and sorry about the problem.
> It could well be that the lock order warning from WITNESS is related
> to the hang, and might reflect a recursion-related bug in the pf
> policy routing code.  I'm not sure to what extent you can tolerate
> further downtime, but it would be useful to gather some more
> information about the hang itself to try and confirm the involvement
> of lock order.  In particular, if it's feasible, it would be very
> helpful if you could boot back to the 7-STABLE kernel (keeping the
> 6.2-STABLE userspace should be fine, I

You'll need at least a new pfctl, because the ioctl interface to /dev/pf
has changed.

> think), and when the hang occurs, use the console debugger (ideally
> hooked up to serial or firewire) to run the following debugging
> commands:
>
> show pcpu
> show allpcpu
> trace
> alltrace
> show alllocks
> show witness
> show lockedvnods
> show uma
> show malloc
>
> A shot-in-the-dark guess is that something about pf's interactions with
> the protocol stack is involved here, but unfortunately I suspect we'll
> need some more information to track it down.
>
> Also, could you confirm if you're using any credential-related firewall
> rules with either ipfw or pf?  These would be uid/gid/jail matching
> rules.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
> > More details about the machine in the attached dmesg.  It's a SMP
> > with 4GB of RAM, 3 gigabit cards (em0, em1 and, depending on the
> > motherboard we used, either bge0 or msk0).  Only em0 is linked to a
> > gigabit port, the others are 100Mbits/s
> >
> > My setup has in-kernel IPFIREWALL, IPFIREWALL_VERBOSE,
> > IPFIREWALL_DEFAULT_TO_ACCEPT, DUMMYNET.  I have commented out INET6,
> > SCTP and the wireless interfaces.  WITNESS and WITNESS_SKIPSPIN were
> > only added in the hope of figuring out what locks it up, and they did
> > signal these 2 LORs.
> >
> > pf and pflog are loaded as modules (pf_enable and pflog_enable set to
> > yes in rc.conf).
> >
> > - The ipfw/dummynet side:
> >
> > I use net.link.ether.ipfw = 1 for MAC address checking, ipfw +
> > dummynet for traffic shaping (4 queues at 95Mbits/s for the 2
> > external interfaces in/out, and 4 more queues for traffic that goes
> > outside the AS group for which we have fast access).  Deciding which
> > queue traffic goes in depends on its source address and whether its
> > destination is in ipfw tables 1, 2 or none.  These tables are
> > synchronized from pf tables via a custom script in crontab, which
> > runs every 3 minutes.  The pf tables used as source for these are
> > controlled by OpenBGPD.
> >
> > - The pf side:
> >
> > Filtering is done here, as is policy routing.  Filtering also
> > contains redirecting to a transparent squid proxy of traffic destined
> > to port 80 but not bound for networks received via BGP and saved to
> > tables and .  Metro and special port 80 traffic goes
> > directly to the destination server.
> >
> > Traffic from net1 and net2 is routed via the "other" external
> > interface, which doesn't contain the default route... with the
> > exception of traffic to pf table (from BGP, same as table 2
> > in ipfw).  Traffic to is routed via fastroute in pf
> > (meaning using the default route).

That's quite a complex setup.  It would really be interesting to get the
trace for the first LOR in order to figure out which code path we are
looking at.  I have a feeling that it might be the dummynet entry point,
but w/o the trace this is only speculation.

> > Attached are full dmesg and the kernel config.
> >
> > I still have access to the hard drive with 7.0-STABLE on it, but not
> > the motherboard/CPU and the network cards... they are running off the
> > hard drive with 6.2 on it.

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
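
For readers unfamiliar with what a "lock order reversal" report means, the
idea can be sketched with a toy tracker in Python.  This is only an
illustration and makes several assumptions: the class LockOrderChecker and
its methods are hypothetical names, it checks direct lock pairs only, and
it is not how WITNESS is implemented in the kernel.  "Path A" below is
invented for the example; the thread does not say which code path takes
the radix node head lock before the pf task mutex.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order tracker; an illustration only, not the WITNESS algorithm."""

    def __init__(self):
        # before[a] holds every lock that has been acquired while a was held.
        self.before = defaultdict(set)
        self.held = []       # locks currently held, in acquisition order
        self.reversals = []  # (1st, 2nd) pairs that contradict an earlier order

    def acquire(self, name):
        for h in self.held:
            # An earlier path took `name` first and `h` second: reversal.
            if h in self.before[name]:
                self.reversals.append((h, name))
            self.before[h].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


checker = LockOrderChecker()

# Path A (hypothetical): radix node head first, then pf task mtx.
checker.acquire("radix node head")
checker.acquire("pf task mtx")
checker.release("pf task mtx")
checker.release("radix node head")

# Path B: the order from the first report (1st pf task mtx, 2nd radix node head).
checker.acquire("pf task mtx")
checker.acquire("radix node head")
checker.release("radix node head")
checker.release("pf task mtx")

print(checker.reversals)
```

The report is produced the first time the second ordering is observed, even
though nothing has deadlocked yet, which is why a trace of the reporting
code path is needed to judge whether a given reversal can actually hang the
machine.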