From owner-freebsd-stable@FreeBSD.ORG Sat Oct 21 02:40:52 2006
Message-ID: <2a41acea0610201940g5718e12avf9bfa61bb38e777d@mail.gmail.com>
Date: Fri, 20 Oct 2006 19:40:50 -0700
From: "Jack Vogel"
To: "Bill Paul"
In-Reply-To: <20061020234636.1BD5216A40F@hub.freebsd.org>
References: <2a41acea0610201452v22f2bae9mcc0e71d2157d8bbb@mail.gmail.com> <20061020234636.1BD5216A40F@hub.freebsd.org>
Cc: freebsd-stable@freebsd.org, kris@obsecurity.org
Subject: Re: em network issues
List-Id: Production branch of FreeBSD source code

On 10/20/06, Bill Paul wrote:
>
> [...]
>
> > > Another thing that might be handy is improving the watchdog timeout
> > > message so that it dumps the state of the ICR and ICM registers (and
> > > maybe some other interesting driver and/or device state). The timeout
> > > implies no interrupts were delivered for a Long Time (tm). If the
> > > ICM register indicates interrupts have been masked, then that means
> > > em_intr_fast() was triggered by an interrupt and it scheduled work,
> > > but that work never executed. If that really is what happened, then
> > > I can understand the watchdog error occurring. If that's _not_ what
> > > happened, then something else is screwed up.
> >
> > Jesse Brandeburg just did an interesting hack for the Linux driver, I
> > was considering trying to code an equivalent thing up for us. We
> > have evidence that on some AMD based systems there are writebacks
> > that get lost, since the TX cleanup relies on the DD being set you
> > are hosed when this happens. What he did was make a cleanup
> > routine that ONLY uses the head and tail pointers and NOT the done
> > bit. Then, in the watchdog routine, if there is evidence of this problem
> > it will switch the cleanup function pointer to this alternate clean code.
>
> Oho, I didn't realize the 8254x had producer/consumer indexes like this.
> Hm. But the documentation for the Transmit Descriptor Head register
> says:
>
> "Reading the transmit descriptor head to determine which buffers
> have been used (and can be returned to the memory pool) is not reliable."
>
> There's a similar notation for the Receive Descriptor Head register.
>
> I wonder what's unreliable about it.
>
> > At least one user that was having a problem has reported this solved
> > it. It may be one of the issues hitting us as well.
>
> Switching from testing the descriptor completion bits to using the
> consumer indexes should be pretty straightforward. It's worth a shot
> at any rate.
>

I have not yet looked at Jesse's code to see if he does anything fancy,
but there is one other driver that I know of on our hardware (and no,
it's not for that so-called OS from Redmond) that has always done this,
so it must not be THAT unreliable. It just isn't using the full
capability of the hardware, but if it works.... :)

Jesse's code is supposed to be on our driver site on sourceforge. I just
have been too busy to go look for it, but it's public.

BTW, I got a Smartbits unit in my cubicle today, got software installed
and hardware almost there, not quite done yet. It sure can pump LOTS of
packets though :) Will report results as I get them.

Jack
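
For reference, here is a rough sketch of the idea discussed above: a normal TX cleanup that trusts the DD bit, an alternate one that reclaims up to the hardware head index, and a watchdog that swaps the cleanup function pointer when it sees signs of a lost writeback. This is NOT Jesse's Linux code and NOT the em(4) driver. The structure, the field names (hw_head stands in for a read of the Transmit Descriptor Head register) and the function names are all made up for illustration, and the detection test in the watchdog is an assumption. Keep in mind the documentation caveat quoted above that reading the head register this way is described as not reliable.

```c
#include <assert.h>
#include <stdint.h>

#define TXD_STAT_DD 0x01u    /* "descriptor done" status bit */

/* Hypothetical, simplified model of a TX ring; not the real em(4) softc. */
struct tx_desc {
	uint8_t status;          /* hardware writes DD back here */
};

struct tx_ring {
	struct tx_desc *desc;    /* descriptor array */
	uint32_t size;           /* number of descriptors in the ring */
	uint32_t next_to_clean;  /* oldest descriptor not yet reclaimed */
	uint32_t next_to_use;    /* software tail: next slot to fill */
	uint32_t hw_head;        /* stand-in for reading the head register */
	uint32_t (*clean)(struct tx_ring *);  /* active cleanup routine */
};

/* Normal path: reclaim only descriptors whose DD bit was written back. */
static uint32_t
tx_clean_dd(struct tx_ring *r)
{
	uint32_t n = 0;

	while (r->next_to_clean != r->next_to_use &&
	    (r->desc[r->next_to_clean].status & TXD_STAT_DD)) {
		r->desc[r->next_to_clean].status = 0;
		r->next_to_clean = (r->next_to_clean + 1) % r->size;
		n++;
	}
	return (n);
}

/* Alternate path: ignore DD and reclaim everything up to the head index. */
static uint32_t
tx_clean_head(struct tx_ring *r)
{
	uint32_t n = 0;

	assert(r->hw_head < r->size);
	while (r->next_to_clean != r->hw_head) {
		r->desc[r->next_to_clean].status = 0;
		r->next_to_clean = (r->next_to_clean + 1) % r->size;
		n++;
	}
	return (n);
}

/*
 * Watchdog: if the head has moved past the oldest pending descriptor
 * but that descriptor has no DD bit, assume the writeback was lost
 * (an assumed heuristic) and switch to the head-based routine.
 */
static uint32_t
tx_watchdog(struct tx_ring *r)
{
	if (r->next_to_clean != r->hw_head &&
	    !(r->desc[r->next_to_clean].status & TXD_STAT_DD))
		r->clean = tx_clean_head;
	return (r->clean(r));
}
```

The rest of the driver would always call r->clean(r), so the switch made in the watchdog stays in effect for later cleanups without any other code changes.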