Date: Sat, 14 Mar 2009 18:01:32 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Nick Withers <nick@nickwithers.com>
In-Reply-To: <1237020646.1532.24.camel@localhost>
Message-ID: <alpine.BSF.2.00.0903141756100.27597@fledge.watson.org>
References: <1236920519.1490.30.camel@localhost>
	<alpine.BSF.2.00.0903130935290.61873@fledge.watson.org>
	<alpine.BSF.2.00.0903130949210.61873@fledge.watson.org>
	<1237020646.1532.24.camel@localhost>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Cc: freebsd-stable@freebsd.org
Subject: Re: NICs locking up, "*tcp_sc_h"


On Sat, 14 Mar 2009, Nick Withers wrote:

> Right, here we go!
...

The problem turns out to be a lock cycle: the syncache indirectly calls the 
firewall during output, and the firewall then tries to look up the connection 
for the packet.  Thread one:

> Tracing PID 31 tid 100030 td 0xffffff00012016e0
> sched_switch() at sched_switch+0xdf
> mi_switch() at mi_switch+0x18b
> turnstile_wait() at turnstile_wait+0x1c4
> _mtx_lock_sleep() at _mtx_lock_sleep+0x76
> _mtx_lock_flags() at _mtx_lock_flags+0x95
> syncache_lookup() at syncache_lookup+0xee
> syncache_expand() at syncache_expand+0x38
> tcp_input() at tcp_input+0x99b
> ip_input() at ip_input+0xaf
> ether_demux() at ether_demux+0x1b9
> ether_input() at ether_input+0x1bb
> fxp_intr() at fxp_intr+0x224
> ithread_loop() at ithread_loop+0xe9
> fork_exit() at fork_exit+0x112
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xfffffffe80174d30, rbp = 0 ---

This thread holds TCP locks and is trying to acquire the syncache lock. 
Thread two:

> sched_switch() at sched_switch+0xdf
> mi_switch() at mi_switch+0x18b
> turnstile_wait() at turnstile_wait+0x1c4
> _rw_rlock() at _rw_rlock+0x9c
> ipfw_chk() at ipfw_chk+0x3ac1
> ipfw_check_out() at ipfw_check_out+0xb1
> pfil_run_hooks() at pfil_run_hooks+0xac
> ip_output() at ip_output+0x357
> syncache_respond() at syncache_respond+0x2fd
> syncache_timer() at syncache_timer+0x15a
> softclock() at softclock+0x270
> ithread_loop() at ithread_loop+0xe9
> fork_exit() at fork_exit+0x112
> fork_trampoline() at fork_trampoline+0xe

This is the syncache timer holding syncache locks and calling IP output, with 
IPFW then trying to acquire TCP locks.  Each thread is waiting for a lock the 
other one holds, so neither can make progress.
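
To make the cycle concrete, here is a minimal sketch of the two traces as a 
wait-for graph.  The thread and lock names ("fxp_ithread", "softclock", "tcp", 
"syncache") are illustrative labels, not kernel identifiers, and the model 
treats every lock as exclusive, which is a simplification.

```python
def find_lock_cycle(holds, waits):
    """Return the threads forming a wait cycle, or None if there is none.

    holds: thread name -> list of lock names it currently holds
    waits: thread name -> the single lock name it is blocked on
    """
    # Map each lock to the thread that holds it (exclusive ownership assumed).
    owner = {lock: thread for thread, locks in holds.items() for lock in locks}
    for start in waits:
        path = [start]
        current = start
        while True:
            wanted = waits.get(current)   # lock the current thread is blocked on
            blocker = owner.get(wanted)   # thread holding that lock, if any
            if blocker is None:
                break                     # chain ends: no cycle from this start
            if blocker == start:
                return path               # chain leads back to where we started
            if blocker in path:
                break                     # a cycle that does not include start
            path.append(blocker)
            current = blocker
    return None


# Hypothetical labels standing in for the two threads in the traces.
holds = {"fxp_ithread": ["tcp"], "softclock": ["syncache"]}
waits = {"fxp_ithread": "syncache", "softclock": "tcp"}
cycle = find_lock_cycle(holds, waits)
print(cycle)
```

With these inputs the function reports both threads as part of one cycle, 
which is the situation the traces show.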

Am I right in thinking that you are using uid/gid/jail firewall rules?  They 
suffer from a fundamental architectural problem in that they require reaching 
"up" to a higher level of the stack at times when it's not always a good idea 
to do so.  In general we solve the problem by passing "down" the inpcb for a 
connection in the output path so that TCP doesn't have to look it up -- 
however, in the case of the syncache we actually don't have the inpcb easily 
in hand (or at least, we have it, but we can't just lock it because syncache 
locks are after TCP locks in the lock order...).  It transpires that what the 
firewall really wants is not the inpcb but the credential; those, however, are 
interfaces we can't change right now.
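
The difference between reaching "up" and passing "down" can be sketched 
roughly as follows.  This is a toy model, not the kernel interface: 
uid_rule_matches, lookup_uid and the packet dictionary are all hypothetical 
names invented for the illustration.

```python
lookups = []  # records every time the firewall has to reach "up"


def lookup_uid(packet):
    """Stand-in for the connection lookup; the real one needs TCP locks."""
    lookups.append(packet["conn"])
    return packet["owner_uid"]


def uid_rule_matches(rule_uid, packet, cred_uid=None):
    """Evaluate a uid rule, using a passed-down credential when available."""
    if cred_uid is None:
        # Nothing was passed down, so the firewall must look the owner up.
        cred_uid = lookup_uid(packet)
    return cred_uid == rule_uid


# Made-up packet for illustration only.
packet = {"conn": "10.0.0.1:80", "owner_uid": 1001}

# Caller already knows the credential and passes it down: no lookup happens.
passed_down = uid_rule_matches(1001, packet, cred_uid=1001)
lookups_after_pass_down = len(lookups)

# Caller has nothing to pass (the syncache case): the lookup is unavoidable.
looked_up = uid_rule_matches(1001, packet)
lookups_after_lookup = len(lookups)
```

Both calls give the same answer; only the second one goes through the lookup, 
which is the step that takes locks in the real stack.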

I'll need to think a bit about a proper fix for this, but you'll find the 
problem likely goes away if you eliminate all uid/gid/jail rules from your 
firewall.  You could also tweak the syncache logic not to use a retransmit 
timer, which might slightly extend the time it takes for systems to connect to 
your host in the presence of packet loss, but would eliminate this 
transmission path entirely.  We'll need a real and more general fix, however, 
to commit, and I'll look and see what I can come up with.
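
As a rough aid for the first workaround, a ruleset listing can be scanned for 
the affected options.  This sketch is a plain keyword search rather than a 
real ipfw rule parser, so it may also flag a rule that merely mentions one of 
the words in a comment; the sample rules are made up.

```python
import re

# Rule options that match on socket credentials.
CRED_OPTION = re.compile(r"\b(uid|gid|jail)\b")


def credential_rules(rule_lines):
    """Return the rule lines that mention a uid, gid, or jail option."""
    return [line for line in rule_lines if CRED_OPTION.search(line)]


# Made-up ruleset for illustration only.
sample_rules = [
    "00100 allow ip from any to any via lo0",
    "00200 allow tcp from me to any uid 1001",
    "00300 deny tcp from any to any jail 3",
    "65535 allow ip from any to any",
]
flagged = credential_rules(sample_rules)
for line in flagged:
    print(line)
```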

Robert N M Watson
Computer Laboratory
University of Cambridge