From: Daniel Bond
To: Rudy
Cc: freebsd-stable@freebsd.org
Subject: Re: em0 watchdog timeouts
Date: Mon, 5 Oct 2009 16:19:01 +0200

Hi,

I've been struggling with watchdog timeouts on 7.1/7.2-RELEASE for the
past 6 months too. It looks related.

I've replaced the hardware 3 times (2 different IBM x3755 chassis, one
IBM x3650 chassis). I started with the onboard Broadcom NICs (bce-based,
PCI-X) until I had issues with "watchdog timeout". I replaced them with a
4-port PCI-X Intel NIC, which gave me the same problems. I was told that
the 4-port Intel NICs have an onboard bus controller that could cause
trouble, so I replaced it with a 2-port PCI-e Intel NIC, which Sepherosa
Ziehau told me was the best-performing gig-e NIC (rx/tx).

Still getting watchdog timeouts, I tried adjusting all sorts of sysctls
I found in mailing-list threads (disabling MSI/MSI-X interrupts,
adjusting rx/tx processing, etc.). I upgraded the BIOS and the firmware
on all kinds of components (disks, BMC, etc.) to the newest versions. I
also tried a different QLogic isp(4) FC controller (PCI-e).

No matter what I tried, I could not diagnose this problem or fix it. It
also happened rarely enough that it was not easy to debug. I would get a
series of "watchdog timeout -- resetting" messages until the NIC went
completely offline, at which point I'd reboot the machine from the
console. This happened about once every 1-10 days, usually between 11:00
and 13:00.

This machine has now been replaced with Linux, unfortunately, just to
avoid more customer complaints and downtime. The IBM x3755 running
FreeBSD 7.2 that was replaced is still online and can be made available
to any developers who would like to debug this further.

Like Stefan Krueger mentioned, this machine is also running as an NFS
server, with a mix of BSD and Linux clients, and it's getting hit pretty
hard by clients.

Hope we can iron this bug out in the future.

Best regards,

Daniel Bond.

On Oct 2, 2009, at 10:36 PM, Rudy wrote:

>
> Ah, I'll stop messing with them.
>
> I just set them all to 0 to see if that will help and noticed the card
> was leaving tx_int_delay=1.
>
> # sysctl dev.em.4.debug=1
> Oct 2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
> Oct 2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
>
> # sysctl dev.em.4
> dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
> dev.em.4.rx_int_delay: 0
> dev.em.4.tx_int_delay: 0
> dev.em.4.rx_abs_int_delay: 0
> dev.em.4.tx_abs_int_delay: 0
>
> Splitting traffic to different ports has brought down the watchdog
> events to once a day. ... essentially, I have a quad 30Mbps (not quad
> 1Gbps) card. heheh.
> Would turning off net.inet.ip.fastforwarding or any other setting help?
>
> Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I have
> a feeling that isn't related to the NIC at all, but I'm not sure what
> else to try.
>
> Rudy
>
> Jack Vogel wrote:
>> Watchdog resets the adapter. Messing with these values is of dubious
>> value anyway.
>>
>> Jack
>>
>> On Fri, Oct 2, 2009 at 11:36 AM, Rudy wrote:
>>
>>> I noticed something interesting.
>>>
>>> I set the rx_int_delay to 0:
>>> sysctl dev.em.5.rx_int_delay=0
>>>
>>> Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
>>> Oct 1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66
>>>
>>> After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
>>> now 32:
>>> Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66
>>>
>>> However, running sysctl dev.em.5 shows it as 0:
>>> dev.em.5.rx_int_delay: 0
>>> dev.em.5.tx_int_delay: 66
>>>
>>> Seems like the adapter and the kernel don't agree on the rx_int_delay
>>> value.
>>>
>>> Rudy
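
For anyone who wants to check for the same disagreement Rudy describes between the values the driver prints on dev.em.N.debug=1 and the values sysctl reports, here is a minimal sketch in Python. It only parses text in the two formats quoted in this thread; it does not talk to the driver. The function names (parse_debug_log, parse_sysctl, find_mismatches) are made up for illustration, and it assumes each kernel log entry is on a single unwrapped line, as it would be in /var/log/messages rather than in a quoted mail.

```python
import re

# Driver debug lines as quoted in the thread, e.g.
#   "... kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66"
LOG_RE = re.compile(r"\bem(\d+): (\w+) = (\d+),\s*(\w+) = (\d+)")

# sysctl output lines as quoted in the thread, e.g.
#   "dev.em.5.rx_int_delay: 0"
# Non-numeric entries such as dev.em.4.%desc are skipped.
SYSCTL_RE = re.compile(r"dev\.em\.(\d+)\.(\w+): (\d+)\s*$")


def parse_debug_log(text):
    """Return {(unit, name): value}; later log entries overwrite earlier ones."""
    values = {}
    for m in LOG_RE.finditer(text):
        unit = int(m.group(1))
        values[(unit, m.group(2))] = int(m.group(3))
        values[(unit, m.group(4))] = int(m.group(5))
    return values


def parse_sysctl(text):
    """Return {(unit, name): value} for numeric dev.em.N.* entries."""
    values = {}
    for line in text.splitlines():
        m = SYSCTL_RE.match(line.strip())
        if m:
            values[(int(m.group(1)), m.group(2))] = int(m.group(3))
    return values


def find_mismatches(log_text, sysctl_text):
    """Return {(unit, name): (logged_value, sysctl_value)} where the two differ."""
    logged = parse_debug_log(log_text)
    reported = parse_sysctl(sysctl_text)
    return {
        key: (logged[key], reported[key])
        for key in logged
        if key in reported and logged[key] != reported[key]
    }


if __name__ == "__main__":
    # Values copied from the em5 example quoted above.
    log_text = (
        "Oct 1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66\n"
        "Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66\n"
    )
    sysctl_text = "dev.em.5.rx_int_delay: 0\ndev.em.5.tx_int_delay: 66\n"
    print(find_mismatches(log_text, sysctl_text))
    # prints {(5, 'rx_int_delay'): (32, 0)}
```

Only names that appear in both inputs are compared, so a value that shows up in just one of them (tx_int_delay in the em5 example) is ignored rather than reported as a mismatch.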