From: Jack Vogel
To: Lars Wilke
Cc: freebsd-net@freebsd.org
Date: Wed, 18 Apr 2012 13:17:31 -0700
Subject: Re: Watchdog timeout em driver 8.2-R

On Wed, Apr 18, 2012 at 7:01 AM, Lars Wilke wrote:

> Hi,
>
> i first posted the following to the -stable list but got
> no reply. Maybe someone here has some advice for me.
>
>
> Switch: HP ProCurve 2910al
>         The switch does passive LACP
>
> Motherboard: Supermicro X8DTN+-F
>
> NIC: Quad Port Card, i.e. em1:
> em1@pci0:6:0:1: class=0x020000 card=0x125e15d9 chip=0x105e8086 rev=0x06 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'HP NC360T PCIe DP Gigabit Server Adapter (n1e5132)'
>     class = network
>     subclass = ethernet
>     bar [10] = type Memory, range 32, base 0xfb9e0000, size 131072, enabled
>     bar [14] = type Memory, range 32, base 0xfb9c0000, size 131072, enabled
>     bar [18] = type I/O Port, range 32, base 0xcc00, size 32, enabled
>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 002590ffff0484d8
>
> I use CAT 6 cables and the switch and server are in the same cabinet.
>
> OS: FBSD is 8.2-Release
>
> rc.conf:
> ifconfig_em0="up"
> ifconfig_em1="up"
> ifconfig_em2="up"
> ifconfig_em3="up"
> cloned_interfaces="lagg0"
> ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 laggport em2 laggport em3"
> ipv4_addrs_lagg0="192.168.80.20/24"
>
>
> Hm, what sysctls might be interesting?
> I use:
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216
> net.inet.tcp.sendspace=65536
> net.inet.tcp.recvspace=131072
> kern.ipc.nmbclusters=230400
> kern.maxvnodes=250000
> kern.maxfiles=65536
> kern.maxfilesperproc=32768
> vfs.read_max=32
>
> loader.conf: does only contain stuff concerning zfs
>
> Except for swap the whole system uses zfs, swap is on a geom mirror.
>
> Once in a while I see these messages in /var/log/messages:
>
> Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting
> Apr 13 08:53:07 san02 kernel: em1: Queue(0) tdh = 232, hw tdt = 190
> Apr 13 08:53:07 san02 kernel: em1: TX(0) desc avail = 31,Next TX to Clean = 221
> Apr 13 08:53:07 san02 kernel: em1: Link is Down
> Apr 13 08:53:07 san02 kernel: em1: link state changed to DOWN
>
> Sometimes nothing for days, sometimes under high network load (NFSv3),
> sometimes multiple times a day. I see this message/behaviour always on
> the same two of the four interfaces (em1 and em3).
>
> Then the NIC does not have the ACTIVE flag anymore; an ifconfig em1 up
> solves the issue. But why does it lose the ACTIVE state, and why does the
> NIC reset itself in the first place?
>

Because a watchdog reset is just that, a reset, so it causes the hardware
to reinitialize. It should come back up. I do not know why it did not;
maybe the renegotiation with the switch fails for some reason?

One thought is to get the latest em driver and see if the behavior changes.
If that driver is the one distributed with 8.2, it's pretty old.

> On the switch I see that the port matching em1 on the server has left
> the trunk, so the missing ACTIVE flag is not lying 8-/
>
> Googling found many postings with the same problem, and one site suggested
> that this might be an ACPI problem, but nothing concrete, and the postings
> I found were mostly FBSD7 and older.
>
> Any pointers would be appreciated.
> Thank you
>
>    --lars

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
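
Since the resets are reported as irregular and limited to em1 and em3, it may help to tally them per interface before and after trying a newer driver. What follows is only a minimal sketch, not something from the thread: the function name count_watchdog_resets and the SAMPLE list are made up for illustration, and the pattern assumes the log lines look exactly like the "Watchdog timeout" line quoted above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the message quoted in this thread:
#   Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting
WATCHDOG_RE = re.compile(r"kernel: ([a-z]+\d+): Watchdog timeout")


def count_watchdog_resets(lines):
    """Return a Counter mapping interface name to number of watchdog resets."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample input: the first two lines are from the report above,
# the em3 line is invented to show a second interface being counted.
SAMPLE = [
    "Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting",
    "Apr 13 08:53:07 san02 kernel: em1: Link is Down",
    "Apr 14 02:10:44 san02 kernel: em3: Watchdog timeout -- resetting",
]

for ifname, resets in sorted(count_watchdog_resets(SAMPLE).items()):
    print(f"{ifname}: {resets} watchdog reset(s)")
```

To run it against a real log, pass an open file instead of SAMPLE, for example count_watchdog_resets(open("/var/log/messages", errors="replace")).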