Date: Fri, 21 Jan 2011 01:23:57 +0100
From: Rick van Hattem <Rick.van.Hattem@fawo.nl>
To: freebsd-hardware@freebsd.org
Subject: Regular garbled data on console and failing igb0
Message-ID: <201101210124.02738.Rick.van.Hattem@fawo.nl>

Hi,

A couple of weeks ago I installed FreeBSD 8.2RC1 on a new machine (8.1 was having issues with the RAID card, and since 8.2 is nearly final I figured... why not). The machine has been running smoothly for a while, even while load-testing the hard drives and network for more than 24 hours.

Since everything was running smoothly I decided to move one of the production PostgreSQL databases to this machine. However... after a couple of hours I got the following error from the iDRAC:

  PCIE Fatal Err: Critical Event sensor, bus fatal error (Slot 3) was asserted

Followed by a lot of garbled text in the console (see the full log in the attachment) and immediately this message:

  Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
  Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
  Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
  Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
  Jan 20 21:09:25 sh4 kernel: NMI ISA N2M0I, I ESIASNA Mff2

Followed by this:

  Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
  Jan 20 21:09:38 sh4 kernel: igb0: Queue(0) tdh = 944, hw tdt = 945
  Jan 20 21:09:38 sh4 kernel: igb0: TX(0) desc avail = 1023,Next TX to Clean = 944
  Jan 20 21:09:38 sh4 kernel: igb0: link state changed to DOWN
  Jan 20 21:09:41 sh4 kernel: igb0: link state changed to UP

After which the lagg0 interface (which is using igb0 and igb1 as an LACP trunk) marks the igb0 interface as down. After a while the second interface got the same issue, which caused the lagg0 interface to become non-functional and the server unreachable.

This error looks quite a bit like the one talked about in this thread:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81462+0+/usr/local/www/db/text/2010/freebsd-net/20100801.freebsd-net

But the given solution there (disabling polling) won't help because I don't even have device polling enabled in the kernel.

For the record, the machine regularly shows small amounts of garbled text even outside of these network interface crashes as can be seen in the "garbled.log" file. The real crash starts at 21:09:24 according to the iDRAC log.

My kernel config is mainly stock, some modules disabled. DEVICE_POLLING is not enabled. The garbled text should be caused by the print buffer since I do have PRINTF_BUFR_SIZE=128 in the config.

Thanks in advance for any help.

~rick
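
Editorial sketch, not part of the original message: when correlating the iDRAC timestamp with the console log, it can help to pull only the igb watchdog and link-state lines out of a noisy or partly garbled log. The Python below is a minimal sketch that assumes the syslog line layout quoted in the message; the names `LINE_RE`, `igb_events`, and `SAMPLE` are hypothetical and were chosen for this illustration.

```python
import re

# Assumed layout, taken from the lines quoted in the message, e.g.:
#   Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<iface>igb\d+):\s+(?P<msg>.*)$"
)


def igb_events(lines):
    """Return (timestamp, interface, kind) for watchdog resets and link changes."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # garbled or unrelated line, skip it
        msg = m.group("msg")
        if msg.startswith("Watchdog timeout"):
            kind = "watchdog"
        elif msg.startswith("link state changed to "):
            kind = "link " + msg.rsplit(" ", 1)[1]
        else:
            continue  # other igb diagnostics (queue state etc.)
        events.append((m.group("stamp"), m.group("iface"), kind))
    return events


# Sample input copied from the log excerpt in the message.
SAMPLE = """\
Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
Jan 20 21:09:25 sh4 kernel: NMI ISA N2M0I, I ESIASNA Mff2
Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
Jan 20 21:09:38 sh4 kernel: igb0: Queue(0) tdh = 944, hw tdt = 945
Jan 20 21:09:38 sh4 kernel: igb0: link state changed to DOWN
Jan 20 21:09:41 sh4 kernel: igb0: link state changed to UP
""".splitlines()

if __name__ == "__main__":
    for stamp, iface, kind in igb_events(SAMPLE):
        print(stamp, iface, kind)
```

Lines that do not match the expected layout, including the interleaved NMI output, are skipped rather than repaired, so the result only shows events whose log lines arrived intact.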