From: Pyun YongHyeon
To: Gleb Smirnoff
Cc: freebsd-current@FreeBSD.org
Date: Wed, 23 Aug 2006 19:51:18 +0900
Subject: Re: call for bge(4) testers
Message-ID: <20060823105118.GJ17902@cdnetworks.co.kr>
In-Reply-To: <20060823100420.GG96644@cell.sick.ru>
References: <20060822042023.GC12848@cdnetworks.co.kr>
 <20060823093741.GF96644@FreeBSD.org>
 <20060823095504.GI17902@cdnetworks.co.kr>
 <20060823100420.GG96644@cell.sick.ru>
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Wed, Aug 23, 2006 at 02:04:20PM +0400, Gleb Smirnoff wrote:
> On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
> P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
> P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
> P> > P> After fixing em(4) watchdog bug, I looked over bge(4) and I think
> P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
> P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
> P> > P> The patch fixes false watchdog timeout errors only.
> P> > P> Typical phenomena for false watchdog timeout errors are
> P> > P> o polling(4) fixes the issue
> P> > P> o random watchdog error
> P> > P>
> P> > P> If my patch fixes the issue you could see the following messages.
> P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
> P> >
> P> > I still think that this fix is incorrect. It is just a more gentle
> P> > recovery from a fake watchdog timeout.
> P>
> P> Its sole purpose is to reinitialize hardware for real watchdog
> P> timeouts. It's not a fix for general watchdog timeouts. As I said in other
> P> mails, the fake watchdog timeout (losing Tx interrupts) for hardware
> P> with Tx interrupt moderation capability could be a normal thing. So I
> P> just want to know whether bge(4) also has the same feature (bug).
>
> According to several emails about em(4) fake watchdog timeouts, the
> problem can be fixed by setting debug.mpsafenet=0. This makes me think
> that the problem isn't caused by TX interrupt moderation, but by some race
> in the kernel. Really, if_slowtimo() doesn't acquire the driver lock before
> checking and modifying the if_timer field.
>

Hmm... I didn't say the problem was caused by TX interrupt moderation.
I can't be sure, but I'm under the impression there are *two* different issues.
If you think the fake watchdog timeout fix is not an adequate one, please let
me know. I'll back out the change if you want.

> Afaik, NIC drivers that can do interrupt moderation should set a timer
> to a sane value, based on interrupt moderation settings, so that the
> watchdog won't ever be called falsely.
>

Yes. Normally it should. But I saw the issues on Marvell Yukon too.

> P> > The more I think, the more I doubt that we really need the
> P> > watchdog infrastructure that comes from old days.
> P>
> P> Could you suggest another way to recover from a stuck Tx condition
> P> without using the watchdog?
>
> Maybe drivers should take care of that themselves, why not? At least
> the callout routine will have access to the driver mutex, contrary to
> if_slowtimo().
>

-- 
Regards,
Pyun YongHyeon
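
To make the suggestion at the end of the thread concrete, here is a minimal userspace sketch of a driver-private watchdog whose countdown is only ever touched with the driver lock held, which is the property if_slowtimo() is said to lack above. It is not the bge(4) patch and not kernel code: every name (sketch_softc, sketch_start, sketch_txeof, sketch_tick) is hypothetical, the lock is modelled by a plain flag rather than a real mutex, and the 5 second timeout is an arbitrary value chosen for illustration.

```c
/*
 * Userspace sketch only. All names are hypothetical; nothing here is
 * taken from the bge(4) driver or from the patch under discussion.
 */
#include <assert.h>
#include <stdbool.h>

#define SKETCH_WD_SECONDS 5     /* arbitrary timeout chosen for the sketch */

struct sketch_softc {
    bool lock_held;             /* stands in for the driver mutex */
    int  wd_timer;              /* seconds left; 0 means disarmed */
    int  tx_pending;            /* frames handed to hardware, not yet completed */
    int  resets;                /* times the watchdog reinitialized the device */
};

/* A flag is enough to check lock discipline in a single-threaded sketch. */
static void sketch_lock(struct sketch_softc *sc)
{
    assert(!sc->lock_held);
    sc->lock_held = true;
}

static void sketch_unlock(struct sketch_softc *sc)
{
    assert(sc->lock_held);
    sc->lock_held = false;
}

/* Transmit path: queue one frame and (re)arm the watchdog under the lock. */
static void sketch_start(struct sketch_softc *sc)
{
    sketch_lock(sc);
    sc->tx_pending++;
    sc->wd_timer = SKETCH_WD_SECONDS;
    sketch_unlock(sc);
}

/* Tx completion path: everything completed, so disarm under the same lock. */
static void sketch_txeof(struct sketch_softc *sc)
{
    sketch_lock(sc);
    sc->tx_pending = 0;
    sc->wd_timer = 0;
    sketch_unlock(sc);
}

/*
 * Once-per-second tick, playing the role of a driver-owned callout handler.
 * The countdown is read and modified only with the lock held, so it cannot
 * race with sketch_start() or sketch_txeof(). Returns true if it timed out.
 */
static bool sketch_tick(struct sketch_softc *sc)
{
    bool fired = false;

    sketch_lock(sc);
    if (sc->wd_timer > 0 && --sc->wd_timer == 0) {
        sc->tx_pending = 0;     /* placeholder for real reinitialization */
        sc->resets++;
        fired = true;
    }
    sketch_unlock(sc);
    return fired;
}
```

In this model, calling sketch_start() followed by sketch_txeof() before the countdown expires leaves resets at zero no matter how many ticks follow, while a transmit that never completes triggers exactly one reset on the fifth tick. A real driver would replace the flag with its own mutex and the tick with a timer it schedules itself.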