Date: Wed, 6 Dec 2006 18:01:48 +1100 (EST) From: Bruce Evans <bde@zeta.org.au> To: Marius Strobl <marius@FreeBSD.org> Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org Subject: Re: cvs commit: src/sys/pci if_xl.c if_xlreg.h Message-ID: <20061206164242.A32496@delplex.bde.org> In-Reply-To: <200612060218.kB62IfVn046324@repoman.freebsd.org> References: <200612060218.kB62IfVn046324@repoman.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 6 Dec 2006, Marius Strobl wrote: > marius 2006-12-06 02:18:41 UTC > > FreeBSD src repository > > Modified files: > sys/pci if_xl.c if_xlreg.h > Log: > - Use the xl_stats_update() callout instead of if_slowtimo() for > driving xl_watchdog() in order to avoid races accessing if_timer. It's a shame to force all NIC drivers to manage the timeout for this. Most have a timeout for other purposes so I couldn't see how to save much code using a callback, but a callback would be cleaner. (To avoid the race, just move the decrement of the count to drivers.) > While at it relax the watchdog a bit by reloading it in xl_txeof()/ > xl_txeof_90xB() if there are still packets enqueued. Er, xl_txeof_90xB() was one of the 20-30% of txeof() routines that handled this correctly. The timeout shouldn't be touched in xx_txeof() except to clear it when all descriptors have been handled. Reloading it without even checking that at least one descriptor has been handled is almost as bad as clearing it without checking, since xx_txeof() may be called when there is nothing for it to do. This is most broken in the DEVICE_POLLING case. Then xx_txeof() will be called every few mS, much more often than the counter is decrememented, so unconditionally reloading it ensures that it never works. Reloading it only if some progress has been made is mainly a small pessimization. If the initial timeout is not long enough for all the descriptors, then something is wrong and it is probably better to let the watchdog try to fix it than to risk packets trickling out at a rate of 1 every (<timeout> - epsilon) seconds, especially since it takes more code to implement the trickling. However, the trickling case is very unlikely to occur and most or all xx_start() routines don't worry about it -- they just reload the full timeout and don't waste time calculating a timeout based on the number of new and old descriptors. xl_txeof_90xB() now reloads the timeout without checking anything except that there are still some unhandled descriptors. ISTR that this bug has been fixed in a few drivers, mainly ones that support DEVICE_POLLING. Grepping for "0 : 5" shows the bug in the following drivers: dc, pcn, sis, xl. All of these except pcn support DEVICE_POLLING. The bug may be spelled in other ways. Non-bugs handling the timer are spelled too differently too. xl_txeof() spells this too differently: - attached to it too closely (but applying to both xl txeof() routines) there is a comment that "A frame was downloaded". This may have been correct once upon a time, but now both routines are called without checking that a frame has been downloaded for polling and also from xl_start_locked(). - it has a banal comment "Clear the timeout (sic) timer." that is now further from echoing the code. The spelling differences are so large that it is unclear if it has the bug: - it clears the timeout correctly (only clear if no more descriptors). - reloading of the timeout is dependent on hardware stuff that I don't really understand. Apparently the device has a hardare register which we can poll to see whether the device is stalled. I think the hardware shouldn't be trusted if it has stalled, and anyway, the hardware stuff shouldn't be done here since it usually just wastes time (especially in the DEVICE_POLLING case). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061206164242.A32496>