From owner-freebsd-net@FreeBSD.ORG  Sat Nov 10 04:13:16 2007
Date: Fri, 9 Nov 2007 21:46:21 -0600 (CST)
From: Mike Silbersack <silby@silby.com>
To: Matt Reimer <mattjreimer@gmail.com>
In-Reply-To: <f383264b0711091609n81875b6v444055960ab0fd96@mail.gmail.com>
Message-ID: <20071109213846.O46803@odysseus.silby.com>
References: <f383264b0711091609n81875b6v444055960ab0fd96@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: net@freebsd.org
Subject: Re: Should syncache.count ever be negative?


On Fri, 9 Nov 2007, Matt Reimer wrote:

> On an eight-core machine running RELENG_7 I'm seeing TCP stalls,
> sometimes lasting up to 60 seconds or so. While trying to track this
> down I noticed that net.inet.tcp.syncache.count is negative. Should it
> be possible for the count to go negative? Perhaps it indicates a race,
> or the counter is wrongly being decremented twice?

I just took a look at the code, and the count is indeed updated without
a lock, so it looks like you're hitting a race.  However, nothing appears
to check the count, so a bad value there should not be the cause of your
TCP stalls.
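
To illustrate the kind of race I mean, here is a minimal userland sketch.
It is not the syncache code: the struct, the function names, and the use
of pthreads and C11 <stdatomic.h> are all assumptions made for the
example.  Each thread does balanced add/drop pairs on a shared count.
When the update is a separate load and store, one thread can overwrite
another's update, so the total can drift away from zero, including below
it.  With an atomic read-modify-write it always ends at zero.

```c
/* Userland sketch of an unlocked counter; NOT the kernel's syncache code. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64

struct counter_demo {
	atomic_int count;	/* hypothetical stand-in for syncache.count */
	int iterations;		/* add/drop pairs per thread */
	bool safe;		/* true: atomic RMW; false: load, then store */
};

/*
 * Racy update: the load and the store are separate steps, so an update
 * made by another thread between them is overwritten (a lost update).
 */
static void
racy_add(atomic_int *c, int delta)
{
	int v = atomic_load_explicit(c, memory_order_relaxed);

	atomic_store_explicit(c, v + delta, memory_order_relaxed);
}

static void *
worker(void *arg)
{
	struct counter_demo *d = arg;

	for (int i = 0; i < d->iterations; i++) {
		if (d->safe) {
			/* Indivisible read-modify-write: no lost updates. */
			atomic_fetch_add(&d->count, 1);
			atomic_fetch_sub(&d->count, 1);
		} else {
			racy_add(&d->count, 1);
			racy_add(&d->count, -1);
		}
	}
	return NULL;
}

/*
 * Run nthreads workers and store the final count in *final_count.
 * Returns 0 on success, -1 if a thread could not be created.
 */
int
run_demo(int nthreads, int iterations, bool safe, int *final_count)
{
	pthread_t tids[MAX_THREADS];
	struct counter_demo d = { .iterations = iterations, .safe = safe };
	int started = 0, rc = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_init(&d.count, 0);

	for (; started < nthreads; started++) {
		if (pthread_create(&tids[started], NULL, worker, &d) != 0) {
			rc = -1;
			break;
		}
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	*final_count = atomic_load(&d.count);
	return rc;
}
```

Calling run_demo(8, 100000, false, &n) on a multi-core box may leave n
nonzero, and the value will vary from run to run; with safe set to true
it should always come back as 0.  Build with -pthread.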

Can you install netperf and run both the TCP_STREAM and UDP_STREAM tests 
just to make sure that your network card is working properly?  We've 
recently found that the fast interrupt handlers we use in some network 
drivers act strangely when sharing interrupts.  So that's the first thing
to test before we poke at the upper layers.

If those tests don't turn up a problem, can you post more details about
how you are stressing the system?

Thanks,

-Mike