Date: Thu, 24 Aug 2006 08:12:54 +0800
From: "Wilkinson, Alex" <alex.wilkinson@dsto.defence.gov.au>
To: freebsd-current@freebsd.org
Subject: Re: call for bge(4) testers
Message-ID: <20060824001254.GA75529@obelix.dsto.defence.gov.au>
In-Reply-To: <20060823100420.GG96644@cell.sick.ru>
References: <20060822042023.GC12848@cdnetworks.co.kr> <20060823093741.GF96644@FreeBSD.org> <20060823095504.GI17902@cdnetworks.co.kr> <20060823100420.GG96644@cell.sick.ru>

On Wed, Aug 23, 2006 at 02:04:20PM +0400, Gleb Smirnoff wrote:

>On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
>P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
>P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
>P> > P> After fixing em(4) watchdog bug, I looked over bge(4) and I think
>P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
>P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
>P> > P> The patch does fix false watchdog timeout error only.
>P> > P> Typical pheonoma for false watchdog timeout error are
>P> > P>  o polling(4) fix the issue
>P> > P>  o random watchdog error
>P> > P>
>P> > P> If my patch fix the issue you could see the following messages.
>P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
>P> >
>P> > I still think that this fix is incorrect. It is just a more gentle
>P> > recovery from a fake watchdog timeout.
>P>
>P> Its sole purpose is to reinitialize hardware for real watchdog
>P> timeouts. It's not fix for general watchdog timeouts. As I said other
>P> mails, the fake watchdog timeout(losing Tx interrupts) for hardwares
>P> with Tx interrupt moderation capability could be normal thing. So I
>P> just want to know bge(4) also has the same feature(bug).
>
>According to several emails about em(4) fake watchdog timeouts, the
>problem can be fixed by setting debug.mpsafenet=0. This makes me think
>that the problem isn't caused by TX interrupt moderation, but some race
>in the kernel. Really, if_slowtimo() doesn't acquire driver lock before
>checking and modifying the if_timer field.
>
>Afaik, NIC drivers that can do interrupt moderation should set a timer
>to a sane value, based on interrupt moderation settings, so that the
>watchdog won't be ever called fakely.

What is interrupt moderation ?

 -aW