From owner-freebsd-current@FreeBSD.ORG  Sun Mar  7 11:59:36 2010
Date: Sun, 7 Mar 2010 11:59:35 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: current@FreeBSD.org, stable@FreeBSD.org
Subject: net.inet.tcp.timer_race: does anyone have a non-zero value?

Dear all:

I'm embarking on some new network stack locking work, which requires me to
address a number of loose ends in the current model.  A few years ago, my
attention was drawn to a largely theoretical race which has existed in the
BSD code since its inception.  It is detected and handled in practice, but
relies on type stability of TCP connection data structures, which will need
to change in the future due to ongoing virtualization work.  I didn't fix it
at the time, but did add a counter so that we could see whether it was
happening in the field -- that counter, net.inet.tcp.timer_race, indicates
whether or not the stack has detected it happening (and then handled it).

This e-mail is to collect the results of that in-the-field survey.  Please
check the result of the following command:

  % sysctl net.inet.tcp.timer_race
  net.inet.tcp.timer_race: 0

If your system shows a non-zero value, please send me a *private e-mail*
with the output of that command, plus the output of "sysctl kern.smp" and
"uptime", and a brief description of the workload and network interface
configuration.  For example: it's a busy 8-core web server handling roughly
X connections/second, with three em network interfaces used to load balance
from an upstream source; IPsec is used for management purposes (but not bulk
traffic), and there's a local MySQL database.

I've already seen one non-zero report, but would be interested in knowing a
bit more about the kinds of situations where it's happening, so that I can
prioritize fixing it appropriately, and also reason about the frequency at
which it occurs so we can select a fix that avoids adding significant
overhead in the common case.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
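
P.S. As a convenience only (a sketch, not part of the request above -- you
will still want to add the workload and interface description by hand), the
requested output can be gathered and mailed in one go with something like:

  % ( sysctl net.inet.tcp.timer_race kern.smp ; uptime ) | \
        mail -s "net.inet.tcp.timer_race report" rwatson@FreeBSD.org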