From owner-freebsd-current@FreeBSD.ORG  Fri Jul  3 21:28:50 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 646BA1065672
	for <freebsd-current@freebsd.org>; Fri,  3 Jul 2009 21:28:50 +0000 (UTC)
	(envelope-from lstewart@freebsd.org)
Received: from lauren.room52.net (lauren.room52.net [210.50.193.198])
	by mx1.freebsd.org (Postfix) with ESMTP id 0AE508FC18
	for <freebsd-current@freebsd.org>; Fri,  3 Jul 2009 21:28:49 +0000 (UTC)
	(envelope-from lstewart@freebsd.org)
Received: from lstewart-laptop.caia.swin.edu.au
	(host86-150-124-14.range86-150.btcentralplus.com [86.150.124.14])
	(authenticated bits=0)
	by lauren.room52.net (8.14.3/8.14.3) with ESMTP id n63LSLvC013137
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 4 Jul 2009 07:28:41 +1000 (EST)
	(envelope-from lstewart@freebsd.org)
Message-ID: <4A4E77EA.7030803@freebsd.org>
Date: Fri, 03 Jul 2009 22:28:10 +0100
From: Lawrence Stewart <lstewart@freebsd.org>
User-Agent: Thunderbird 2.0.0.22 (X11/20090626)
MIME-Version: 1.0
To: Kamigishi Rei <spambox@haruhiism.net>
References: <4A45ABB1.7040506@haruhiism.net>	<4A48CE02.5000200@freebsd.org>	<4A48D4A2.8010207@haruhiism.net>	<4A48DA31.5010900@freebsd.org>
	<4A4BE438.5060203@haruhiism.net>
In-Reply-To: <4A4BE438.5060203@haruhiism.net>
Content-Type: multipart/mixed; boundary="------------020601070500000604040401"
X-Spam-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_PBL,
	RDNS_DYNAMIC,SPF_SOFTFAIL autolearn=disabled version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on lauren.room52.net
Cc: silby@freebsd.org, freebsd-current@freebsd.org
Subject: Re: r194546 amd64: kernel panic in tcp_sack.c
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jul 2009 21:28:50 -0000

This is a multi-part message in MIME format.
--------------020601070500000604040401
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Kamigishi Rei wrote:
> Lawrence Stewart wrote:
>> Ok. I'm working on a patch to address a different TCP/SACK issue, but 
>> it may in fact be partially relevant to the cause of your panic... 
>> can't promise when I'll be able to take a close look at this but I'm 
>> aware of it now so that's a start. If you run into it again or find 
>> the trigger for the panic, please let me know.
> Reproduced. I don't know what was the trigger this time. The system runs 
> lighttpd, fastcgi python and php, mysql and postgresql.
> Also, in this core somehow everything looks quite similar (looks like 
> another page fault but with panic() called prior to getting into fatal 
> trap 12) to my two earlier dumps:
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008777.html
> http://lists.freebsd.org/pipermail/freebsd-current/2009-June/008781.html
> so maybe it's not really a problem with tcp_sack.c or netisr.c.
> 

With some poking around and a key insight provided by Robert Watson, I 
believe we've got this sorted.

The SACK code puts a global cap on the amount of memory that can be used 
for SACK accounting. The variable V_tcp_sack_globalholes tracks how many 
SACK holes are currently allocated across all active TCP connections. It 
gets incremented in tcp_sackhole_alloc() and decremented in 
tcp_sackhole_free() in netinet/tcp_sack.c.

It turns out that there is currently no lock synchronising access to the 
variable, and the incrementing/decrementing is not being done 
atomically. In Kamigishi's case, his server had a traffic profile 
consisting of a large number of clients simultaneously connecting over 
cruddy links which was giving the SACK accounting a real workout. The 
inevitable race would strike one or more times, leaving the count of 
holes not in tune with reality, and eventually when traffic died down 
the variable would decrement down below 0, triggering the panic. Note 
that this panic only occurs if INVARIANTS is compiled into the kernel so 
the issue has been around for some time but not noticed.

The attached patch makes use of the atomic(9) KPI to ensure 
incrementing/decrementing the variable is done atomically, which should 
fix the bug.

Reviews/testing would be good so that we can get this into 8.0.

Cheers,
Lawrence

--------------020601070500000604040401
Content-Type: text/plain;
 name="sack_globalholes.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="sack_globalholes.diff"

Index: sys/netinet/tcp_sack.c
===================================================================
--- sys/netinet/tcp_sack.c	(revision 195317)
+++ sys/netinet/tcp_sack.c	(working copy)
@@ -273,7 +273,7 @@
 	hole->rxmit = start;
 
 	tp->snd_numholes++;
-	V_tcp_sack_globalholes++;
+	atomic_add_int(&V_tcp_sack_globalholes, 1);
 
 	return hole;
 }
@@ -289,7 +289,7 @@
 	uma_zfree(V_sack_hole_zone, hole);
 
 	tp->snd_numholes--;
-	V_tcp_sack_globalholes--;
+	atomic_subtract_int(&V_tcp_sack_globalholes, 1);
 
 	KASSERT(tp->snd_numholes >= 0, ("tp->snd_numholes >= 0"));
 	KASSERT(V_tcp_sack_globalholes >= 0, ("tcp_sack_globalholes >= 0"));

--------------020601070500000604040401--