Date: Wed, 23 Aug 2006 14:04:20 +0400
From: Gleb Smirnoff
To: Pyun YongHyeon
Cc: freebsd-current@FreeBSD.org
Subject: Re: call for bge(4) testers
Message-ID: <20060823100420.GG96644@cell.sick.ru>
In-Reply-To: <20060823095504.GI17902@cdnetworks.co.kr>
References: <20060822042023.GC12848@cdnetworks.co.kr> <20060823093741.GF96644@FreeBSD.org> <20060823095504.GI17902@cdnetworks.co.kr>

On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
P>  > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
P>  > P> After fixing em(4) watchdog bug, I looked over bge(4) and I think
P>  > P> bge(4) may suffer from the same issue. So if you have seen occasional
P>  > P> watchdog timeout errors on bge(4) please give the attached patch a try.
P>  > P> The patch does fix false watchdog timeout error only.
P>  > P> Typical phenomena for false watchdog timeout error are
P>  > P>  o polling(4) fixes the issue
P>  > P>  o random watchdog error
P>  > P>
P>  > P> If my patch fixes the issue you could see the following messages.
P>  > P> "missing Tx completion interrupt!" or "link lost -- resetting"
P>  >
P>  > I still think that this fix is incorrect. It is just a more gentle
P>  > recovery from a fake watchdog timeout.
P>
P> Its sole purpose is to reinitialize hardware for real watchdog
P> timeouts. It's not a fix for general watchdog timeouts. As I said in other
P> mails, the fake watchdog timeout (losing Tx interrupts) for hardware
P> with Tx interrupt moderation capability could be a normal thing. So I
P> just want to know whether bge(4) also has the same feature (bug).

According to several emails about em(4) fake watchdog timeouts, the
problem can be fixed by setting debug.mpsafenet=0. This makes me
think that the problem isn't caused by TX interrupt moderation, but by
some race in the kernel. Indeed, if_slowtimo() doesn't acquire the
driver lock before checking and modifying the if_timer field.

AFAIK, NIC drivers that can do interrupt moderation should set the
timer to a sane value, based on the interrupt moderation settings, so
that the watchdog is never called spuriously.

P>  > The more I think, the more I doubt that we really need the
P>  > watchdog infrastructure that comes from old days.
P>
P> Would you give another way to recover from Tx stuck condition without
P> using watchdog?

Maybe drivers should take care of that themselves, why not? At least
the callout routine will have access to the driver mutex, contrary to
if_slowtimo().

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
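
To illustrate the idea of a driver-owned watchdog, here is a minimal
user-space sketch, not bge(4) or em(4) code. It assumes a per-device
tick routine that runs with the driver mutex held (in the kernel this
role could be played by a callout(9) set up with callout_init_mtx(),
which is an assumption about how one might wire it, not something taken
from the patch). All names (sketch_softc, sketch_start, sketch_txeof,
sketch_tick, SKETCH_MARGIN_TICKS) are hypothetical, and a pthread mutex
stands in for the driver mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical safety margin added on top of the moderation delay. */
#define SKETCH_MARGIN_TICKS	5

/* Hypothetical per-device state; not the real bge(4) softc. */
struct sketch_softc {
	pthread_mutex_t	mtx;	/* stands in for the driver mutex */
	int		timer;	/* ticks until the watchdog fires, 0 = disarmed */
	int		resets;	/* how many times the watchdog fired */
};

static void
sketch_init(struct sketch_softc *sc)
{
	pthread_mutex_init(&sc->mtx, NULL);
	sc->timer = 0;
	sc->resets = 0;
}

/* Derive the timeout from the interrupt moderation delay (in ticks). */
static int
sketch_watchdog_ticks(int coalesce_ticks)
{
	assert(coalesce_ticks >= 0);
	return (coalesce_ticks + SKETCH_MARGIN_TICKS);
}

/* Transmit path: arm the watchdog under the driver mutex. */
static void
sketch_start(struct sketch_softc *sc, int coalesce_ticks)
{
	pthread_mutex_lock(&sc->mtx);
	sc->timer = sketch_watchdog_ticks(coalesce_ticks);
	pthread_mutex_unlock(&sc->mtx);
}

/* Tx completion path: disarm the watchdog under the same mutex. */
static void
sketch_txeof(struct sketch_softc *sc)
{
	pthread_mutex_lock(&sc->mtx);
	sc->timer = 0;
	pthread_mutex_unlock(&sc->mtx);
}

/*
 * Periodic tick owned by the driver.  Unlike if_slowtimo(), the check
 * and the decrement of the timer happen with the mutex held, so they
 * cannot race with sketch_start() or sketch_txeof().
 */
static void
sketch_tick(struct sketch_softc *sc)
{
	pthread_mutex_lock(&sc->mtx);
	if (sc->timer > 0 && --sc->timer == 0)
		sc->resets++;	/* a real driver would reinitialize here */
	pthread_mutex_unlock(&sc->mtx);
}
```

With a moderation delay of 3 ticks the watchdog is armed for 8 ticks,
so a completion that arrives within that window disarms it and no
reset is counted. Only a transmit that stays pending for the whole
window is treated as a real Tx stall.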