From owner-cvs-all@FreeBSD.ORG Wed Dec 6 07:01:53 2006 Return-Path: X-Original-To: cvs-all@FreeBSD.org Delivered-To: cvs-all@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2B7FD16A403; Wed, 6 Dec 2006 07:01:53 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1-3.pacific.net.au [61.8.2.210]) by mx1.FreeBSD.org (Postfix) with ESMTP id C135743CA7; Wed, 6 Dec 2006 07:01:08 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout1.pacific.net.au (Postfix) with ESMTP id EF274328285; Wed, 6 Dec 2006 18:01:50 +1100 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (Postfix) with ESMTP id 5FC778C03; Wed, 6 Dec 2006 18:01:49 +1100 (EST) Date: Wed, 6 Dec 2006 18:01:48 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Marius Strobl In-Reply-To: <200612060218.kB62IfVn046324@repoman.freebsd.org> Message-ID: <20061206164242.A32496@delplex.bde.org> References: <200612060218.kB62IfVn046324@repoman.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org Subject: Re: cvs commit: src/sys/pci if_xl.c if_xlreg.h X-BeenThere: cvs-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the entire tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Dec 2006 07:01:53 -0000 On Wed, 6 Dec 2006, Marius Strobl wrote: > marius 2006-12-06 02:18:41 UTC > > FreeBSD src repository > > Modified files: > sys/pci if_xl.c if_xlreg.h > Log: > - Use the xl_stats_update() callout instead of if_slowtimo() for > driving xl_watchdog() in order to avoid races accessing if_timer. It's a shame to force all NIC drivers to manage the timeout for this. Most have a timeout for other purposes so I couldn't see how to save much code using a callback, but a callback would be cleaner. (To avoid the race, just move the decrement of the count to drivers.) > While at it relax the watchdog a bit by reloading it in xl_txeof()/ > xl_txeof_90xB() if there are still packets enqueued. Er, xl_txeof_90xB() was one of the 20-30% of txeof() routines that handled this correctly. The timeout shouldn't be touched in xx_txeof() except to clear it when all descriptors have been handled. Reloading it without even checking that at least one descriptor has been handled is almost as bad as clearing it without checking, since xx_txeof() may be called when there is nothing for it to do. This is most broken in the DEVICE_POLLING case. Then xx_txeof() will be called every few mS, much more often than the counter is decrememented, so unconditionally reloading it ensures that it never works. Reloading it only if some progress has been made is mainly a small pessimization. If the initial timeout is not long enough for all the descriptors, then something is wrong and it is probably better to let the watchdog try to fix it than to risk packets trickling out at a rate of 1 every ( - epsilon) seconds, especially since it takes more code to implement the trickling. However, the trickling case is very unlikely to occur and most or all xx_start() routines don't worry about it -- they just reload the full timeout and don't waste time calculating a timeout based on the number of new and old descriptors. xl_txeof_90xB() now reloads the timeout without checking anything except that there are still some unhandled descriptors. ISTR that this bug has been fixed in a few drivers, mainly ones that support DEVICE_POLLING. Grepping for "0 : 5" shows the bug in the following drivers: dc, pcn, sis, xl. All of these except pcn support DEVICE_POLLING. The bug may be spelled in other ways. Non-bugs handling the timer are spelled too differently too. xl_txeof() spells this too differently: - attached to it too closely (but applying to both xl txeof() routines) there is a comment that "A frame was downloaded". This may have been correct once upon a time, but now both routines are called without checking that a frame has been downloaded for polling and also from xl_start_locked(). - it has a banal comment "Clear the timeout (sic) timer." that is now further from echoing the code. The spelling differences are so large that it is unclear if it has the bug: - it clears the timeout correctly (only clear if no more descriptors). - reloading of the timeout is dependent on hardware stuff that I don't really understand. Apparently the device has a hardare register which we can poll to see whether the device is stalled. I think the hardware shouldn't be trusted if it has stalled, and anyway, the hardware stuff shouldn't be done here since it usually just wastes time (especially in the DEVICE_POLLING case). Bruce