From: "Kirk Davis"
To: "Jack Vogel"
Cc: freebsd-net@freebsd.org
Date: Mon, 22 Feb 2010 13:46:33 -0700
Subject: Re: Intel em0: watchdog timeout

From: Jack Vogel [mailto:jfvogel@gmail.com]

> Try `sysctl dev.em.0.stats=1` and em.2. You're right, though: it
> doesn't look like any system mbuf failures.

Does this need to be done in loader.conf? It doesn't seem to take from
the command line.

# sysctl dev.em.2.stats=1
dev.em.2.stats: -1 -> -1
# sysctl dev.em.2.stats
dev.em.2.stats: -1

> 7.2 seems to be a stable base OS and driver; 8 is better in some
> respects but has not been without its reported problems. I leave the
> choice to you.
>
> Without more data I am not sure what is causing the watchdog.

Yes, I am having trouble tracking it down. I upped the mbufs to 65536
just to see if it made any difference, but it is still happening.

############ SET NMBCLUSTERS TO 65536 ##########################
Feb 22 12:45:21 inet-gw kernel: em0: watchdog timeout -- resetting
Feb 22 12:45:21 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:25 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:28 inet-gw kernel: em0: link state changed to UP
Feb 22 12:45:29 inet-gw kernel: em0: link state changed to DOWN
Feb 22 12:45:31 inet-gw kernel: em0: link state changed to UP

# netstat -m
8183/6037/14220 mbufs in use (current/cache/total)
7160/3598/10758/65536 mbuf clusters in use (current/cache/total/max)
7160/3592 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
16365K/9121K/25487K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

I guess I will have to
build up the new server with 7.3 on it and see if the newer driver
makes any difference.

----
Kirk

> On Mon, Feb 22, 2010 at 10:55 AM, Kirk Davis wrote:
>
>> I have a backup server sitting here that I am going to load 7.3-RC1
>> onto and test with. It is exact duplicate hardware, so it should be a
>> good test of the upgraded driver. Does it make sense to go to 8.0?
>>
>> Here is the mbuf usage on this server. I'm not sure exactly how to
>> read this, but it seems to look OK.
>>
>> # netstat -m
>> 8181/5904/14085 mbufs in use (current/cache/total)
>> 7159/3471/10630/25600 mbuf clusters in use (current/cache/total/max)
>> 7159/3465 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>> 16363K/8834K/25197K bytes allocated to network (current/cache/total)
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/0/0 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> 0 calls to protocol drain routines
>>
>> ----
>> Kirk
>>
>> ________________________________
>> From: Jack Vogel [mailto:jfvogel@gmail.com]
>> Sent: Monday, February 22, 2010 11:43 AM
>> To: Kirk Davis
>> Cc: freebsd-net@freebsd.org
>> Subject: [SPAM:#] Re: Intel em0: watchdog timeout
>>
>> With the increased load you might be running out of mbufs more
>> easily; I would suggest you increase the mbuf pool.
>>
>> This is an old old driver now; you might consider going to something
>> a bit more recent.
>>
>> Jack
>>
>> On Mon, Feb 22, 2010 at 10:14 AM, Kirk Davis wrote:
>>
>>> Hi,
>>>
>>> I have a FreeBSD server running Quagga as a BGP router. It has a
>>> number of interfaces in it, both bce and em.
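Of the `netstat -m` listings quoted in this thread, two lines speak to the mbuf question: `mbuf clusters in use (current/cache/total/max)`, whose last two fields show how close the cluster pool is to its configured limit, and `requests for mbufs denied`, which counts allocation requests that failed. A minimal sketch, assuming the FreeBSD 7.x layout shown in the thread, that extracts just those two lines; `mbuf_check` is a hypothetical shell function, not a FreeBSD tool:

```sh
#!/bin/sh
# mbuf_check: hypothetical helper (not a FreeBSD utility). Reads `netstat -m`
# output on stdin and reports only the cluster-pool headroom and the
# denied-request counters. Assumes the FreeBSD 7.x line layout.
mbuf_check() {
    awk '
    # e.g. "7160/3598/10758/65536 mbuf clusters in use (current/cache/total/max)"
    / mbuf clusters in use / {
        split($1, c, "/")              # c[3] = total, c[4] = max
        if (c[4] > 0)
            printf "clusters: total %d of max %d (%.1f%%)\n", c[3], c[4], 100 * c[3] / c[4]
        else
            printf "clusters: total %d (no max reported)\n", c[3]
    }
    # e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
    /requests for mbufs denied/ {
        split($1, d, "/")
        if (d[1] + d[2] + d[3] > 0)
            print "WARNING: mbuf requests denied: " $1
        else
            print "no mbuf requests denied"
    }'
}
```

Usage: `netstat -m | mbuf_check`. Fed the listing from the 7.1 router before the pool was raised (max 25600), it reports a total of 10630 of 25600 clusters (41.5%) and no denied requests, which fits Jack's reading that the system was not failing mbuf allocations. The jumbo-cluster and packet-zone lines are deliberately ignored because they do not match either pattern.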
>>> The most heavily used interfaces are starting to give me watchdog
>>> timeout errors, just in the last week. We normally sustain about
>>> 300Mb/s on both of these interfaces, but in the last week this is now
>>> up to 380Mb/s.
>>>
>>> This is an Intel Pro/1000 PT dual-interface PCI-E card. There are two
>>> of them in the server. The server is a Dell 2950.
>>>
>>> Searching the mailing list and checking on Google has not turned up
>>> much. Since this is our main router, it is difficult to test with. I
>>> have seen one message that suggests setting hw.em.rxd=1024 and
>>> hw.em.txd=1024 in loader.conf, and another that suggested turning off
>>> [...], but none of this has made any difference.
>>>
>>> The odd thing is that this just started. This box has been up and
>>> running fine for a while. The only thing that has changed on our
>>> network is an increase in bandwidth.
>>>
>>> Any idea where I go from here to troubleshoot this?
>>>
>>> # uname -a
>>> FreeBSD inet-gw.epsb.ca 7.1-STABLE FreeBSD 7.1-STABLE #3: Mon Mar 23 16:08:53 MDT 2009
>>>     root@inet-gw-test.epsb.ca:/usr/obj/usr/src/sys/DELL2950  amd64
>>>
>>> # tail /var/log/messages
>>> Feb 19 12:26:04 inet-gw kernel: em0: watchdog timeout -- resetting
>>> Feb 19 12:26:04 inet-gw kernel: em0: link state changed to DOWN
>>> Feb 19 12:26:07 inet-gw kernel: em0: link state changed to UP
>>> Feb 19 12:26:08 inet-gw kernel: em0: link state changed to DOWN
>>> Feb 19 12:26:10 inet-gw kernel: em0: link state changed to UP
>>> Feb 19 14:44:20 inet-gw kernel: em0: watchdog timeout -- resetting
>>> Feb 19 14:44:20 inet-gw kernel: em0: link state changed to DOWN
>>> Feb 19 14:44:23 inet-gw kernel: em0: link state changed to UP
>>> Feb 19 15:05:03 inet-gw kernel: em2: watchdog timeout -- resetting
>>> Feb 19 15:05:03 inet-gw kernel: em2: link state changed to DOWN
>>> Feb 19 15:05:05 inet-gw kernel: em2: link state changed to UP
>>> Feb 19 15:07:39 inet-gw kernel: em2: watchdog timeout -- resetting
>>> Feb 19 15:07:39 inet-gw kernel: em2: link state changed to DOWN
>>> Feb 19 15:07:42 inet-gw kernel: em2: link state changed to UP
>>>
>>> # from /var/run/dmesg.boot
>>> em0: port 0xdce0-0xdcff mem 0xd5ee0000-0xd5efffff,0xd5ec0000-0xd5edffff irq 17 at device 0.0 on pci8
>>> em0: Using MSI interrupt
>>> em0: [FILTER]
>>> em0: Ethernet address: 00:15:17:a6:ae:94
>>> em2: port 0xcce0-0xccff mem 0xde3e0000-0xde3fffff,0xde3c0000-0xde3dffff irq 16 at device 0.0 on pci10
>>> em2: Using MSI interrupt
>>> em2: [FILTER]
>>> em2: Ethernet address: 00:15:17:a6:af:d6
>>>
>>> # pciconf -lv
>>> em0@pci0:8:0:0: class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>>>     vendor     = 'Intel Corporation'
>>>     device     = 'PRO/1000 PT'
>>>     class      = network
>>>     subclass   = ethernet
>>> em2@pci0:10:0:0: class=0x020000 card=0x135e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>>>     vendor     = 'Intel Corporation'
>>>     device     = 'PRO/1000 PT'
>>>     class      = network
>>>     subclass   = ethernet
>>>
>>> # netstat -bdhI em2 2
>>>             input          (em2)           output
>>>    packets  errs      bytes    packets  errs      bytes colls drops
>>>        65K     0        72M        51K     0       9.4M     0     0
>>>        69K     0        78M        52K     0       8.5M     0     0
>>>        76K     0        88M        55K     0        11M     0     0
>>>        74K     0        85M        54K     0        10M     0     0
>>>        78K     0        91M        56K     0       9.0M     0     0
>>>        75K     0        86M        54K     0       8.7M     0     0
>>>        74K     0        85M        54K     0       9.2M     0     0
>>>        75K     0        86M        56K     0        10M     0     0
>>>        78K     0        88M        55K     0        12M     0     0
>>>        78K     0        90M        58K     0        12M     0     0
>>>        76K     0        87M        54K     0        10M     0     0
>>>        79K     0        91M        56K     0        10M     0     0
>>>
>>> ----
>>> Kirk
>>>
>>> --------------------------------------------------------------------------------
>>> Kirk Davis
>>> Senior Network Analyst, ITS
>>> Edmonton Public Schools
>>> One Kingsway Ave.
>>> Edmonton, Alberta, Canada T5H 4G9
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
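The `netstat -bdhI em2 2` samples in the original report can be turned into line rates to check the 300 to 380Mb/s figures. A minimal sketch, assuming that each row holds per-interval deltas (here a 2-second interval), that column 3 is input bytes and column 6 output bytes as in the quoted header, and that the `-h` suffixes K/M/G are powers of 1024; `netstat_mbps` is a hypothetical shell function, not a FreeBSD tool:

```sh
#!/bin/sh
# netstat_mbps: hypothetical helper (not a FreeBSD utility). Reads rows of
# `netstat -bdhI <if> <interval>` on stdin and prints input/output Mb/s.
# Assumptions: each row holds per-interval deltas, column 3 is input bytes,
# column 6 is output bytes, and -h suffixes K/M/G are powers of 1024.
netstat_mbps() {
    awk -v iv="${1:-2}" '
    # Expand a humanized count such as "9.4M" into bytes.
    function tobytes(s,    n, u) {
        n = s + 0                      # leading numeric part, e.g. 9.4
        u = substr(s, length(s), 1)    # trailing unit letter, if any
        if (u == "K") n *= 1024
        else if (u == "M") n *= 1024 * 1024
        else if (u == "G") n *= 1024 * 1024 * 1024
        return n
    }
    # Data rows start with a digit; the two header rows are skipped.
    $1 ~ /^[0-9]/ {
        printf "in %.1f Mb/s  out %.1f Mb/s\n", tobytes($3) * 8 / iv / 1e6, tobytes($6) * 8 / iv / 1e6
    }'
}
```

Usage: `netstat -bdhI em2 2 | netstat_mbps 2`. Under these assumptions the busiest em2 samples in the report (91M input bytes per 2-second interval) work out to roughly 382 Mb/s, in line with the 380Mb/s figure given in the original message.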