From: Oliver Brandmueller
To: freebsd-stable@freebsd.org
Date: Wed, 27 Sep 2006 09:15:39 +0200
Subject: Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID: <20060927071538.GF22229@e-Gitt.NET>
In-Reply-To: <451A1375.5080202@gneto.com>

Hi,

On Wed, Sep 27, 2006 at 08:00:21AM +0200, Martin Nilsson wrote:
> I get tons of these:
> em0: watchdog timeout -- resetting
> em0: link state changed to DOWN
> em0: link state changed to UP
>
> mailbox# pciconf -lv
> em0@pci13:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PM'
>     class    = network
>     subclass = ethernet
> em1@pci14:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>     vendor   = 'Intel Corporation'
>     class    = network
>     subclass = ethernet
>
[...]
> I have only seen them on em0. Yesterday I tried sysutils/cpuburn on
> similar boxes that are netbooted with NFS mounted drives and everytime I
> loaded the two CPU cores the network went down.

I see the same. It is very pronounced on this one, a UP machine, where I
work around the problem by using polling:

FreeBSD nessie 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #3: Fri Sep 15 09:48:36 CEST 2006 root@nessie:/usr/obj/usr/src/sys/NESSIE i386

em0@pci2:1:0: class=0x020000 card=0x10198086 chip=0x10198086 rev=0x00 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82547EI Gigabit Ethernet Controller (LOM)'
    class    = network
    subclass = ethernet

irq18: em0 uhci2                    3319          0

Another machine, also UP, but with two interfaces. The problem is not as
apparent as on the first machine, but it's there. This machine is usually
not as loaded (CPU-wise) as the first machine. The problem is ONLY on
em1:

FreeBSD hudson 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #48: Thu Sep 14 10:19:46 CEST 2006 root@hudson:/usr/obj/usr/src/sys/NFS-32-FBSD6 i386

em0@pci1:1:0: class=0x020000 card=0x10758086 chip=0x10758086 rev=0x00 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82547EI Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
em1@pci3:2:0: class=0x020000 card=0x10768086 chip=0x10768086 rev=0x00 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82547EI Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

irq17: em1 ichsmb0             950121879        855
irq18: em0                      71437344         64

The problem appeared after the em updates that went into the kernel over
the last few weeks and was not observed before that. em is always loaded
as a module in my kernels. The problem seems to occur more often if the
machine's CPU is busy.

I have several SMP machines with the following em interfaces, which
DON'T show the problem, but they also have a different chipset on the em
interface. Most of the kernels were built between Sep 7 and Sep 19.

3 times this:

em0@pci4:5:0: class=0x020000 card=0x34248086 chip=0x10108086 rev=0x01 hdr=0x00
em1@pci4:5:1: class=0x020000 card=0x34248086 chip=0x10108086 rev=0x01 hdr=0x00

irq23: em0                     970303432        750

3 times this:

em0@pci4:5:0: class=0x020000 card=0x34258086 chip=0x100e8086 rev=0x02 hdr=0x00

irq23: em0                     292477376        435

So I can observe at least 3 interesting differences:

- the interface showing the problems shares the interrupt
- for me it happens on UP machines only
- the chips are different

What I can't do: move the interfaces between machines; these are
onboard interfaces.

What I could do: try to unload the USB driver or the ichsmb driver on
the machines where the interrupts are shared. The USB is not currently
used anyway (I have it enabled to be prepared to hook up a USB Mass
Storage device, which has never happened since the problem occurred).
The ichsmb is usually not queried either.

Any suggestions on how I could help?

- Olli

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany       D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|               I am the Internet. So help me God.                         |
| Commercial use of any of the addresses contained here is not permitted!  |
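
The shared-interrupt observation above can be checked mechanically across
many boxes. What follows is only a minimal sketch in Python, assuming
interrupt lines of the form quoted in this message (IRQ label, one or more
device names, total count, rate), which is the layout vmstat -i prints on
FreeBSD. The function name shared_irqs is hypothetical and not part of any
existing tool.

```python
import re
from typing import Dict, List

# Matches lines such as "irq18: em0 uhci2   3319   0":
# an IRQ label, one or more device names, then the total and rate counters.
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text: str) -> Dict[str, List[str]]:
    """Return {irq: [devices]} for every IRQ line that lists more than one device."""
    shared: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # skip headers, totals, and anything not in the assumed format
        irq, devices = match.group(1), match.group(2).split()
        if len(devices) > 1:
            shared[irq] = devices
    return shared


if __name__ == "__main__":
    # Interrupt lines from the second machine (hudson) in the message above.
    sample = """\
irq17: em1 ichsmb0             950121879        855
irq18: em0                      71437344         64
"""
    for irq, devices in shared_irqs(sample).items():
        print(irq, "is shared by", ", ".join(devices))
```

Feeding it the interrupt lines from each machine gives a quick list of the
interfaces that sit on a shared IRQ, which can then be compared with the
machines that show the watchdog timeouts.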