From: Doug Hardie
Date: Tue, 9 Mar 2010 00:53:34 -0800
To: Robert Watson
Cc: stable@freebsd.org, current@freebsd.org
Subject: Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)
Message-Id: <80C9B3BA-C498-419B-BD5E-6C2111F24F64@lafn.org>
List-Id: Discussions about the use of FreeBSD-current

On 8 March 2010, at 12:33, Robert Watson wrote:

>
> On Mon, 8 Mar 2010, Doug Hardie wrote:
>
>> I run a number of 4 core systems with em interfaces. These are
>> production systems that are unmanned and located a long way from me.
>> Under unusual conditions it can take up to 6 hours to get there.
>> I have been waiting to switch to 8.0 because of the discussions on the
>> em device and now it sounds like I had better just skip 8.x and wait
>> for 9. 7.2 is working just fine.
>
> Not sure that any information in this survey thread should be relevant
> to that decision. This race has existed since before FreeBSD, having
> appeared in the original BSD network stack, and is just as present in
> FreeBSD 7.x as 8.x or 9.x. When I learned about the race during the
> early 7.x development cycle, I added a counter/statistic to measure how
> much it happened in practice, but was not able to exercise it in my
> testing, and so left the counter in to appear in 7.0 and later so that
> we could perform this survey as core counts/etc increase.
>
> The two likely outcomes were "it is never exercised" and "it is
> exercised but only very infrequently", neither really justifying the
> quite complex change to correct it given requirements at the time.
> Ongoing development work on the virtual network stack is what justifies
> correcting the bug at this point, moving from detecting and handling
> the race to preventing it from occurring as an invariant. The motivation
> here, BTW, is that we'd like to eliminate the type-stable storage
> requirement for connection state (which ensures that memory once used
> for a connection block is only ever used for connection blocks in the
> future), allowing memory to be fully freed when a virtual network stack
> is destroyed. Using type-stable storage helped address this bug, but
> was primarily present to reduce the overhead of monitoring using
> netstat(1). We'll now need to use a slightly more expensive solution
> (true reference counts) in that context, although in practice it will
> almost certainly be an unmeasurable cost.
>
> Which is to say that while there might be something in the em/altq/...
> thread to reasonably lead you to avoid 8.0, nothing in the TCP timer
> race thread should do so, since it affects 7.2 just as much as 8.0.
> Even if you do see a non-zero counter, that's not a matter for
> operational concern, just useful from the perspective of a network stack
> developer for understanding timing and behaviors in the stack. :-)

Thanks for the complete explanation. I don't believe the ALTQ issue
will affect me. I am not currently using it and do not expect to in the
near future. In addition, there was a posting that a fix for at least
part of that will be added in a week or so. Given all that, it appears
it's time to start the planning/testing process for 8.