From: Mike Tancsa
Date: Tue, 23 Nov 2010 09:44:51 -0500
To: Ivan Voras
Cc: freebsd-net@freebsd.org, Jack Vogel, freebsd-hardware@freebsd.org
Subject: Re: em driver, 82574L chip, and possibly ASPM
Message-ID: <4CEBD363.2070402@sentex.net>
References: <4CEBBB8F.70400@sentex.net>

On 11/23/2010 8:16 AM, Ivan Voras wrote:
> On 11/23/10 14:03, Mike Tancsa wrote:
>> On 11/23/2010 7:47 AM, Ivan Voras wrote:
>>> It looks like I'm unfortunate enough to have to deploy on a machine
>>> which has the 82574L Intel NIC chip on a Supermicro X8SIE-F board, which
>>> apparently has hardware issues, according to this thread:
>>>
>>> http://sourceforge.net/tracker/index.php?func=detail&aid=2908463&group_id=42302&atid=447449
>>>
>>
>> Interesting, this is the same nic that has been giving me grief! Mine is
>> on an Intel server board (S3420GPX). The symptoms are VERY similar to
>> what the LINUX user sees as well with RX errors and the traffic patterns.
>
> I've posted detailed info on this NIC in the thread "em card wedging" -
> can you compare it with yours?
>
> The whole thing looks very sensitive to BIOS settings. I've just toggled
> something that looked unrelated (don't remember what, I've been toggling
> BIOS settings all day) and the machine has been doing a flood-ping for
> 20 minutes without wedging (which doesn't mean it won't wedge as soon as
> I send this message, it did such things before).

I posted what's in the BIOS at http://www.tancsa.com/82574.html
Unfortunately, if I disable the highlighted BIOS option, I can no longer
netboot the box :( For my production box having the issues, this is not a
problem, but it makes testing on my lab box difficult. I am not sure
whether that option even really disables IPMI. Also, on this box, what is
labelled NIC1 and NIC2 is the opposite of what FreeBSD sees as em0 and em1.

So far I have tried

Driver from HEAD -- This seems to help a bit in that wedges are less frequent
disable MSIX - no difference, still hangs

It seems the nic will get one error and never recover. There will just
be a steady stream of them. On the other onboard nic (a different type
of em), the card will see the odd "no_buff" error, but it recovers like
all the other em nics, whereas on this problem nic the error counters
just keep going up and up.

Using the driver from HEAD, I can do an

ifconfig em1 down;sleep 1;ifconfig em1 up

and that fixes the problem

dev.em.1.mac_stats.missed_packets: 1292
dev.em.1.mac_stats.recv_no_buff: 31

whereas previous versions of the driver would panic the box doing that.

Looking at the driver from HEAD, there does seem to be some mention of ASPM.
Is this what the LINUX driver is doing too?

	/* PCI-Ex Control Registers */
	switch (hw->mac.type) {
	case e1000_82574:
	case e1000_82583:
		reg = E1000_READ_REG(hw, E1000_GCR);
		reg |= (1 << 22);
		E1000_WRITE_REG(hw, E1000_GCR, reg);

		/*
		 * Workaround for hardware errata.
		 * apply workaround for hardware errata documented in errata
		 * docs Fixes issue where some error prone or unreliable PCIe
		 * completions are occurring, particularly with ASPM enabled.
		 * Without fix, issue can cause tx timeouts.
		 */
		reg = E1000_READ_REG(hw, E1000_GCR2);
		reg |= 1;
		E1000_WRITE_REG(hw, E1000_GCR2, reg);
		break;
	default:
		break;
	}

	return;

	---Mike
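
P.S. For anyone who wants to poke at the bit-twiddling in the snippet above without hardware, here is a minimal sketch of the same read-modify-write sequence run against a mock register file. This is an illustration only: the mock_* names and the register indices are made up and are not part of the em/e1000 driver. Only the two bit positions (bit 22 in GCR, bit 0 in GCR2) come from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the two registers. These are indices into
 * a mock register file, NOT the real E1000_GCR / E1000_GCR2 offsets.
 */
enum mock_reg { MOCK_GCR = 0, MOCK_GCR2 = 1, MOCK_REG_COUNT = 2 };

struct mock_hw {
	uint32_t regs[MOCK_REG_COUNT];
};

/* Mock replacement for a register read (assumed helper, not driver API). */
static uint32_t
mock_read_reg(const struct mock_hw *hw, enum mock_reg r)
{
	assert(r < MOCK_REG_COUNT);
	return (hw->regs[r]);
}

/* Mock replacement for a register write (assumed helper, not driver API). */
static void
mock_write_reg(struct mock_hw *hw, enum mock_reg r, uint32_t val)
{
	assert(r < MOCK_REG_COUNT);
	hw->regs[r] = val;
}

/*
 * Same sequence as the quoted 82574/82583 case: set bit 22 in GCR,
 * then set bit 0 in GCR2, leaving all other bits untouched.
 */
static void
mock_apply_82574_bits(struct mock_hw *hw)
{
	uint32_t reg;

	reg = mock_read_reg(hw, MOCK_GCR);
	reg |= (UINT32_C(1) << 22);
	mock_write_reg(hw, MOCK_GCR, reg);

	reg = mock_read_reg(hw, MOCK_GCR2);
	reg |= 1;
	mock_write_reg(hw, MOCK_GCR2, reg);
}
```

Because both updates use OR, running the sequence twice leaves the registers in the same state, so reapplying it after an interface down/up should be harmless in this model.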