Date: Fri, 9 Nov 2007 21:46:21 -0600 (CST)
From: Mike Silbersack
To: Matt Reimer
Cc: net@freebsd.org
Subject: Re: Should syncache.count ever be negative?
Message-ID: <20071109213846.O46803@odysseus.silby.com>

On Fri, 9 Nov 2007, Matt Reimer wrote:

> On a eight core machine running RELENG_7 I'm seeing TCP stalls,
> sometimes lasting up to 60 seconds or so. While trying to track this
> down I noticed that net.inet.tcp.syncache.count is negative. Should it
> be possible for the count to go negative? Perhaps it indicates a race,
> or the counter is wrongly being decremented twice?

I just took a look at the code, and you are correct that the count is not
locked; it looks like you're hitting the race. However, it doesn't look
like anything is checking the count, so that should not be the cause of
your TCP stalls.
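
To illustrate the kind of race I mean, here is a rough userland sketch, not
the kernel code. When several cores update a shared counter with a plain
++ or -- and no lock, an update can be lost, and if the lost update is an
increment the total can drift below zero. The version below uses C11
atomics so that balanced increments and decrements always net out to zero.
The names cache_count, worker, and run_balanced are made up for this
example; in the kernel you would presumably take a lock or use the
atomic(9) operations instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Hypothetical stand-in for a shared entry count such as the syncache's. */
static atomic_int cache_count;

struct worker_arg {
    int iterations;
};

/* Each worker adds one entry and removes it again, many times over. */
static void *worker(void *p)
{
    const struct worker_arg *arg = p;

    for (int i = 0; i < arg->iterations; i++) {
        atomic_fetch_add(&cache_count, 1);  /* entry added */
        atomic_fetch_sub(&cache_count, 1);  /* entry removed */
    }
    return NULL;
}

/*
 * Run nthreads workers concurrently and return the final count.
 * Because every update is atomic, the result is always 0. With a plain
 * int and ++/-- the result could be any small positive or negative value.
 */
int run_balanced(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    struct worker_arg arg = { iterations };
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    atomic_store(&cache_count, 0);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return atomic_load(&cache_count);
}
```

Calling run_balanced(8, 100000) mimics eight cores hammering the counter
and should return 0 every time (build with -pthread). Swapping atomic_int
for a plain int and the atomic calls for ++ and -- is one way to see the
drift for yourself.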

Can you install netperf and run both the TCP_STREAM and UDP_STREAM tests
just to make sure that your network card is working properly? We've
recently found that the fast interrupt handlers we use in some network
drivers act strangely when sharing interrupts. So, that's a first thing
to test before we poke at the upper layers.

If that doesn't help, can you post more details about how you are
stressing the system?

Thanks,

-Mike