Date: Tue, 9 Mar 2010 00:53:34 -0800
From: Doug Hardie <bc979@lafn.org>
To: Robert Watson <rwatson@freebsd.org>
Cc: stable@freebsd.org, current@freebsd.org
Subject: Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)
Message-ID: <80C9B3BA-C498-419B-BD5E-6C2111F24F64@lafn.org>
In-Reply-To: <alpine.BSF.2.00.1003082020560.96747@fledge.watson.org>
References: <alpine.BSF.2.00.1003071141050.9729@fledge.watson.org> <alpine.BSF.2.00.1003081450310.23881@fledge.watson.org> <FF1D92A1-89BD-457E-9A6C-089D20E4D175@lafn.org> <alpine.BSF.2.00.1003082020560.96747@fledge.watson.org>

On 8 March 2010, at 12:33, Robert Watson wrote:

> On Mon, 8 Mar 2010, Doug Hardie wrote:
>
>> I run a number of 4 core systems with em interfaces. These are production systems that are unmanned and located a long way from me. Under unusual conditions it can take up to 6 hours to get there. I have been waiting to switch to 8.0 because of the discussions on the em device and now it sounds like I had better just skip 8.x and wait for 9. 7.2 is working just fine.
>
> Not sure that any information in this survey thread should be relevant to that decision. This race has existed since before FreeBSD, having appeared in the original BSD network stack, and is just as present in FreeBSD 7.x as 8.x or 9.x. When I learned about the race during the early 7.x development cycle, I added a counter/statistic to measure how much it happened in practice, but was not able to exercise it in my testing, and so left the counter in to appear in 7.0 and later so that we could perform this survey as core counts/etc increase.
>
> The two likely outcomes were "it is never exercised" and "it is exercised but only very infrequently", neither really justifying the quite complex change to correct it given requirements at the time. On-going development work on the virtual network stack is what justifies correcting the bug at this point, moving from detecting and handling the race to preventing it from occurring as an invariant. The motivation here, BTW, is that we'd like to eliminate the type-stable storage requirement for connection state (which ensures that memory once used for a connection block is only ever used for connection blocks in the future), allowing memory to be fully freed when a virtual network stack is destroyed. Using type-stable storage helped address this bug, but was primarily present to reduce the overhead of monitoring using netstat(1). We'll now need to use a slightly more expensive solution (true reference counts) in that context, although in practice it will almost certainly be an unmeasurable cost.
>
> Which is to say that while there might be something in the em/altq/... thread to reasonably lead you to avoid 8.0, nothing in the TCP timer race thread should do so, since it affects 7.2 just as much as 8.0. Even if you do see a non-zero counter, that's not a matter for operational concern, just useful from the perspective of a network stack developer to understanding timing and behaviors in the stack. :-)

Thanks for the complete explanation. I don't believe the ALTQ issue will affect me. I am not currently using it and do not expect to in the near future. In addition, there was a posting that a fix for at least part of that will be added in a week or so. Given all that, it appears it's time to start the planning/testing process for 8.
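For readers unfamiliar with the trade-off Robert describes, here is a minimal sketch of the general "true reference counts" idea as an alternative to type-stable storage: every holder of a pointer to a connection block (for example, a monitoring pass like netstat(1) walking the connection list) takes a reference, and the memory is freed only when the last reference is dropped, so it can be returned to the allocator instead of staying reserved for connection blocks. This is an illustrative assumption, not FreeBSD's actual inpcb/tcpcb code; `struct conn`, `conn_alloc`, `conn_hold`, `conn_release`, and `reader_finish` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection block (NOT FreeBSD's struct inpcb/tcpcb). */
struct conn {
    atomic_int refs; /* one reference per holder: owner, readers, ... */
    bool closed;     /* set when the connection is torn down; real code
                        would protect this with a lock */
};

static int conn_frees; /* counts frees so the example can be checked */

/* Allocate a block holding the creator's single reference. */
struct conn *conn_alloc(void) {
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    atomic_init(&c->refs, 1);
    c->closed = false;
    return c;
}

/* Take an extra reference before using the pointer outside the owner. */
void conn_hold(struct conn *c) {
    atomic_fetch_add(&c->refs, 1);
}

/* Drop a reference; whoever drops the last one frees the memory.
   Returns true if this call freed the block. */
bool conn_release(struct conn *c) {
    int old = atomic_fetch_sub(&c->refs, 1);
    assert(old > 0);
    if (old == 1) {
        free(c);
        conn_frees++;
        return true;
    }
    return false;
}

/* A hypothetical reader that took a reference earlier. It may still
   inspect the block after the owner has closed it, because the memory
   cannot be freed or reused while the reference is held. Returns the
   observed closed state, then drops the reader's reference. */
bool reader_finish(struct conn *c) {
    bool was_closed = c->closed;
    conn_release(c);
    return was_closed;
}
```

The cost Robert mentions shows up here as the extra atomic increment and decrement per reader, which is what type-stable storage avoided by guaranteeing the memory would always remain a valid connection block.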