Date: Thu, 14 Oct 2004 15:55:55 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
To: Ian FREISLICH <if@hetzner.co.za>
Cc: Kris Kennaway <kris@obsecurity.org>
Subject: Re: network slowness/freez-up since update 10/11
Message-ID: <Pine.NEB.3.96L.1041014154818.84384a-100000@fledge.watson.org>
In-Reply-To: <E1CI5Cg-000GBI-00@hetzner.co.za>
On Thu, 14 Oct 2004, Ian FREISLICH wrote:

> Andrey Chernov wrote:
> > > You mean, until rwatson changed the default to debug.mpsafenet=1? :-)
> >
> > Your guess is precisely right! :-)
> >
> > (IMHO making such commit without testing major drivers such as if_de was
> > wrong step)
>
> I always thought the spin on debug.mpsafenet=1 with if_de was YMMV.
> There were many calls for the maintainers of the driver to fix it, but
> zero response IIRC. Maybe making it on by default was a little hasty,
> but anyone that follows -CURRENT like they should if they run it would
> have been aware of this and set debug.mpsafenet=0 in their loader.conf
> when they saw that commit.

(Kind comments on the handling of the mpsafenet work omitted in quote, but much appreciated.)

I was chatting with Max Laier this evening, and he mentioned that he was worried that the ALTQ changes might actually be the problem. He has created a small patch to back those changes out, as well as a change to tweak the behavior. You can find the patches here:

  http://people.freebsd.org/~mlaier/if_de.c.backout.diff
  http://people.freebsd.org/~mlaier/if_de.c.drvlen.diff

I looked at the queueing pieces yesterday but didn't see any obvious problems with them. Even so, I think it's worth trying each of these patches to see whether either has the desired effect. The problem appears to lie somewhere in the hand-off between the network stack and the driver, as that's the primary difference between the debug.mpsafenet={0,1} cases.

FYI, here are some things we've looked at so far:

- We thought there might be a race in the handling of IFF_OACTIVE and its use in if_handoff(), since if_de uses IFF_OACTIVE differently than most drivers do. However, removing the IFF_OACTIVE test in if_handoff() did not resolve the problem in John's configuration.

- We were concerned there was a race in the task queue handoff used to schedule the interface start routine asynchronously from the queue insert.
  We instrumented the task queue code with timing and didn't find anything abnormal (i.e., no waits long enough to explain the observed delays).

So it seems likely to be one of the following two sorts of things:

- A problem in the if_de driver, perhaps exposed by reduced Giant coverage of the rest of the stack, that causes it to improperly move data into and out of the interface queues, or to improperly monitor the queue for entries, resulting in delays.

- A race introduced by Giant removal, wherein the if_de driver behaves incorrectly if the interrupt handler finds a packet in the ifq before the tulip_start function has run for that packet.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
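As a rough illustration of the first hypothesis above (a driver that fails to notice entries left in its queue), here is a minimal single-threaded model, written in Python for brevity. It is not the if_de code and not a diagnosis of this bug: ModelIf, handoff(), start(), tx_intr() and restart_on_intr are hypothetical names, and the oactive flag only loosely stands in for IFF_OACTIVE. It shows how, if output is not restarted when the "ring full" flag is cleared, packets queued while the ring was full sit in the queue until the next enqueue, which would look like a stall from userland.

```python
from collections import deque


class ModelIf:
    """Hypothetical model of a NIC driver (not FreeBSD code): a stack-side
    send queue (ifq), a small hardware TX ring, and an OACTIVE-like flag."""

    def __init__(self, ring_size, restart_on_intr=True):
        self.ifq = deque()           # packets handed off by the stack
        self.ring = deque()          # packets currently owned by "hardware"
        self.ring_size = ring_size
        self.oactive = False         # loosely models IFF_OACTIVE: ring full
        self.restart_on_intr = restart_on_intr
        self.sent = []               # packets the "hardware" has finished

    def start(self):
        # Start routine: move packets from ifq into the ring until it is
        # full, then mark the interface as busy (oactive).
        while self.ifq and len(self.ring) < self.ring_size:
            self.ring.append(self.ifq.popleft())
        self.oactive = len(self.ring) >= self.ring_size

    def handoff(self, pkt):
        # if_handoff()-style path: enqueue, then call start() only if the
        # driver is not already marked busy.
        self.ifq.append(pkt)
        if not self.oactive:
            self.start()

    def tx_intr(self):
        # TX-complete "interrupt": reclaim all ring slots and clear the
        # busy flag. If restart_on_intr is False, nothing drains the
        # packets still waiting in ifq until the next handoff().
        while self.ring:
            self.sent.append(self.ring.popleft())
        self.oactive = False
        if self.restart_on_intr:
            self.start()


def run(restart_on_intr):
    """Hand off four packets to a two-slot ring, then take two interrupts."""
    nic = ModelIf(ring_size=2, restart_on_intr=restart_on_intr)
    for pkt in range(4):
        nic.handoff(pkt)
    nic.tx_intr()
    nic.tx_intr()
    return nic.sent, list(nic.ifq)


if __name__ == "__main__":
    print(run(True))    # ([0, 1, 2, 3], [])
    print(run(False))   # ([0, 1], [2, 3])
```

With restart_on_intr=True all four packets go out; with False, packets 2 and 3 stay queued until some later handoff() call happens to kick start(). The real driver runs these paths concurrently under locks (or Giant), which this model deliberately ignores, so it can only suggest the shape of a lost restart, not reproduce the race itself.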