From: "Wilkinson, Alex"
Date: Thu, 24 Aug 2006 08:12:54 +0800
To: freebsd-current@freebsd.org
Message-ID: <20060824001254.GA75529@obelix.dsto.defence.gov.au>
In-Reply-To: <20060823100420.GG96644@cell.sick.ru>
Subject: Re: call for bge(4) testers

On Wed, Aug 23, 2006 at 02:04:20PM +0400, Gleb Smirnoff wrote:

>On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
>P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
>P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
>P> > P> After fixing em(4) watchdog bug, I looked over bge(4) and I think
>P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
>P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
>P> > P> The patch does fix false watchdog timeout error only.
>P> > P> Typical phenomena for false watchdog timeout error are
>P> > P> o polling(4) fix the issue
>P> > P> o random watchdog error
>P> > P>
>P> > P> If my patch fix the issue you could see the following messages.
>P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
>P> >
>P> > I still think that this fix is incorrect. It is just a more gentle
>P> > recovery from a fake watchdog timeout.
>P>
>P> Its sole purpose is to reinitialize hardware for real watchdog
>P> timeouts. It's not fix for general watchdog timeouts. As I said other
>P> mails, the fake watchdog timeout(losing Tx interrupts) for hardwares
>P> with Tx interrupt moderation capability could be normal thing. So I
>P> just want to know bge(4) also has the same feature(bug).
>
>According to several emails about em(4) fake watchdog timeouts, the
>problem can be fixed by setting debug.mpsafenet=0. This makes me think
>that the problem isn't caused by TX interrupt moderation, but some race
>in the kernel. Really, if_slowtimo() doesn't acquire driver lock before
>checking and modifying the if_timer field.
>
>Afaik, NIC drivers that can do interrupt moderation should set a timer
>to a sane value, based on interrupt moderation settings, so that the
>watchdog won't be ever called fakely.

What is interrupt moderation?

 -aW
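
For readers following the thread, the race Gleb describes can be sketched
as below. This is only a Python model of one possible interleaving, not
FreeBSD code. It assumes that if_slowtimo() decrements if_timer as an
unlocked read-modify-write while the driver's Tx completion path clears
the same field. FakeIfnet, slowtimo_load, slowtimo_store and driver_txeof
are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeIfnet:
    """Hypothetical stand-in for interface state; not the real struct ifnet."""
    if_timer: int = 0        # ticks until the watchdog fires; 0 means disarmed
    watchdog_fired: int = 0  # how many times the watchdog handler ran


def slowtimo_load(ifp):
    """First half of the unlocked decrement: sample the current timer value."""
    return ifp.if_timer


def slowtimo_store(ifp, loaded):
    """Second half: write back the decremented value, fire on reaching zero."""
    if loaded == 0:
        return  # timer was disarmed when it was sampled
    ifp.if_timer = loaded - 1
    if ifp.if_timer == 0:
        ifp.watchdog_fired += 1


def driver_txeof(ifp):
    """Driver Tx completion path: all frames sent, so disarm the watchdog."""
    ifp.if_timer = 0


def racy_interleaving():
    ifp = FakeIfnet(if_timer=1)
    loaded = slowtimo_load(ifp)   # tick samples if_timer == 1
    driver_txeof(ifp)             # Tx completes in between and disarms the timer
    slowtimo_store(ifp, loaded)   # stale value is decremented to 0: fake timeout
    return ifp.watchdog_fired


def serialized_interleaving():
    ifp = FakeIfnet(if_timer=1)
    driver_txeof(ifp)             # completion finishes before the tick starts
    loaded = slowtimo_load(ifp)   # tick now sees a disarmed timer
    slowtimo_store(ifp, loaded)
    return ifp.watchdog_fired


if __name__ == "__main__":
    print("racy:", racy_interleaving())
    print("serialized:", serialized_interleaving())
```

In the model the watchdog fires only when the completion path lands between
the load and the store. Holding one lock around both halves of the decrement
and around driver_txeof() would rule that ordering out, which is the point of
the remark about the driver lock.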