From: Nick Withers
To: Robert Watson
Cc: freebsd-stable@freebsd.org
Subject: Re: NICs locking up, "*tcp_sc_h"
Date: Sun, 15 Mar 2009 12:43:01 +1100
Message-Id: <1237081381.1581.2.camel@localhost>
References: <1236920519.1490.30.camel@localhost> <1237020646.1532.24.camel@localhost>
List-Id: Production branch of FreeBSD source code

On Sat, 2009-03-14 at 18:01 +0000, Robert Watson wrote:
> On Sat, 14 Mar 2009, Nick Withers wrote:
>
> > Right, here we go!
> ...
>
> Turns out that the problem is a lock cycle triggered by the syncache calling,
> indirectly, the firewall during output, and the firewall trying to look up the
> connection for the packet.  Thread one:
>
> > Tracing PID 31 tid 100030 td 0xffffff00012016e0
> > sched_switch() at sched_switch+0xdf
> > mi_switch() at mi_switch+0x18b
> > turnstile_wait() at turnstile_wait+0x1c4
> > _mtx_lock_sleep() at _mtx_lock_sleep+0x76
> > _mtx_lock_flags() at _mtx_lock_flags+0x95
> > syncache_lookup() at syncache_lookup+0xee
> > syncache_expand() at syncache_expand+0x38
> > tcp_input() at tcp_input+0x99b
> > ip_input() at ip_input+0xaf
> > ether_demux() at ether_demux+0x1b9
> > ether_input() at ether_input+0x1bb
> > fxp_intr() at fxp_intr+0x224
> > ithread_loop() at ithread_loop+0xe9
> > fork_exit() at fork_exit+0x112
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xfffffffe80174d30, rbp = 0 ---
>
> This thread holds TCP locks and is trying to acquire the syncache lock.
> Thread two:
>
> > sched_switch() at sched_switch+0xdf
> > mi_switch() at mi_switch+0x18b
> > turnstile_wait() at turnstile_wait+0x1c4
> > _rw_rlock() at _rw_rlock+0x9c
> > ipfw_chk() at ipfw_chk+0x3ac1
> > ipfw_check_out() at ipfw_check_out+0xb1
> > pfil_run_hooks() at pfil_run_hooks+0xac
> > ip_output() at ip_output+0x357
> > syncache_respond() at syncache_respond+0x2fd
> > syncache_timer() at syncache_timer+0x15a
> > softclock() at softclock+0x270
> > ithread_loop() at ithread_loop+0xe9
> > fork_exit() at fork_exit+0x112
> > fork_trampoline() at fork_trampoline+0xe
>
> This is the syncache timer holding syncache locks, calling IP output, and IPFW
> trying to acquire TCP locks.
>
> Am I right in thinking that you are using uid/gid/jail firewall rules?

You are indeed.

> They suffer from a fundamental architectural problem in that they require
> reaching "up" to a higher level of the stack at times when it's not always a
> good idea to do so.  In general we solve the problem by passing "down" the
> inpcb for a connection in the output path so that TCP doesn't have to look it
> up -- however, in the case of the syncache we actually don't have the inpcb
> easily in hand (or at least, we have it, but we can't just lock it because
> syncache locks are after TCP locks in the lock order...).  It transpires that
> what the firewall really wants is not the inpcb, but the credential, but those
> are interfaces we can't change right now.

Thanks for the explanation!

> I'll need to think a bit about a proper fix for this, but you'll find the
> problem likely goes away if you eliminate all uid/gid/jail rules from your
> firewall.  You could also tweak the syncache logic not to use a retransmit
> timer, which might slightly extend the time it takes for systems to connect to
> your host in the presence of packet loss, but would eliminate this
> transmission path entirely.  We'll need a real and more general fix, however,
> to commit, and I'll look and see what I can come up with.

Brilliant, thanks very much. I'll work without uid rules for the time
being, then.

Ta for your time and help on this!

> Robert N M Watson
> Computer Laboratory
> University of Cambridge
-- 
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446
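
For anyone following the thread, the lock cycle Robert describes can be
sketched as a small model. This is only an illustration in Python, not
kernel code: the lock labels "tcp" and "syncache", the thread descriptions,
and the helper names lock_order_edges() and find_cycle() are all made up
for the example. Each thread is recorded as "holds these locks, wants that
one", and a depth-first search looks for a cycle in the resulting
held -> wanted graph, roughly the kind of lock-order reversal check a
lock-order verifier performs.

```python
from collections import defaultdict

# Illustrative model only: "tcp" and "syncache" are simplified labels for the
# lock classes named in the thread, not real kernel symbols.

def lock_order_edges(threads):
    """Build held -> wanted edges from {name: (held_locks, wanted_lock)}."""
    edges = defaultdict(set)
    for held, wanted in threads.values():
        if wanted is None:          # thread is not blocked on any lock
            continue
        for lock in held:
            edges[lock].add(wanted)
    return edges

def find_cycle(edges):
    """Return a list of locks forming a cycle, or None if the order is sane."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)       # every lock starts WHITE (unvisited)
    path = []

    def visit(lock):
        colour[lock] = GREY
        path.append(lock)
        for nxt in sorted(edges.get(lock, ())):
            if colour[nxt] == GREY:             # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[lock] = BLACK
        return None

    for lock in sorted(edges):
        if colour[lock] == WHITE:
            found = visit(lock)
            if found:
                return found
    return None

# Thread one: input path, holds TCP locks, wants the syncache lock.
# Thread two: syncache timer, holds the syncache lock; with uid/gid/jail
# rules the firewall then wants TCP locks to look up the connection.
with_uid_rules = {
    "interrupt thread (tcp_input -> syncache_lookup)": ({"tcp"}, "syncache"),
    "softclock (syncache_timer -> ipfw_chk)": ({"syncache"}, "tcp"),
}

# Assumed effect of the workaround: without uid/gid/jail rules the firewall
# has no reason to reach "up" for TCP locks on the output path.
without_uid_rules = {
    "interrupt thread (tcp_input -> syncache_lookup)": ({"tcp"}, "syncache"),
    "softclock (syncache_timer -> ipfw_chk)": ({"syncache"}, None),
}

cycle = find_cycle(lock_order_edges(with_uid_rules))
no_cycle = find_cycle(lock_order_edges(without_uid_rules))
print("with uid rules:   ", cycle)
print("without uid rules:", no_cycle)
```

In this model the first configuration reports a cycle between the two lock
classes and the second does not, which matches the suggestion that dropping
the uid/gid/jail rules makes the problem go away. It says nothing about
what the eventual fix in the tree should look like.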