Date:      Fri, 21 Jan 2011 01:23:57 +0100
From:      Rick van Hattem <Rick.van.Hattem@fawo.nl>
To:        freebsd-hardware@freebsd.org
Subject:   Regular garbled data on console and failing igb0
Message-ID:  <201101210124.02738.Rick.van.Hattem@fawo.nl>

Hi,

A couple of weeks ago I installed FreeBSD 8.2-RC1 on a new machine (8.1 was
having issues with the RAID card, and since 8.2 is nearly final I figured...
why not). The machine has been running smoothly for a while, even while
load-testing the hard drives and network for more than 24 hours.

Since everything was running smoothly, I decided to move one of the production
PostgreSQL databases to this machine. However, after a couple of hours I got
the following error from the iDRAC:
PCIE Fatal Err: Critical Event sensor, bus fatal error (Slot 3) was asserted

This was followed by a lot of garbled text on the console (see the full log in
the attachment) and, immediately after, these messages:
Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
Jan 20 21:09:25 sh4 kernel: NMI ISA 30, EISA ff
Jan 20 21:09:25 sh4 kernel: NMI ... going to debugger
Jan 20 21:09:25 sh4 kernel: NMI ISA N2M0I, I ESIASNA  Mff2

Followed by this:
Jan 20 21:09:38 sh4 kernel: igb0: Watchdog timeout -- resetting
Jan 20 21:09:38 sh4 kernel: igb0: Queue(0) tdh = 944, hw tdt = 945
Jan 20 21:09:38 sh4 kernel: igb0: TX(0) desc avail = 1023,Next TX to Clean = 944
Jan 20 21:09:38 sh4 kernel: igb0: link state changed to DOWN
Jan 20 21:09:41 sh4 kernel: igb0: link state changed to UP

After that, the lagg0 interface (which uses igb0 and igb1 as an LACP trunk)
marked igb0 as down. After a while the second interface hit the same issue,
which made lagg0 non-functional and the server unreachable.


This error looks quite a bit like the one discussed in this thread:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81462+0+/usr/local/www/db/text/2010/freebsd-net/20100801.freebsd-net
However, the solution given there (disabling polling) won't help, because I
don't even have device polling enabled in the kernel.


For the record, the machine regularly shows small amounts of garbled text even
outside of these network interface crashes, as can be seen in the "garbled.log"
file. According to the iDRAC log, the real crash starts at 21:09:24.

My kernel config is mostly stock, with some modules disabled.
DEVICE_POLLING is not enabled.
The garbled text should be caused by the print buffer, since I do have
PRINTF_BUFR_SIZE=128 in the config.
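For reference, the two settings mentioned above would look roughly like this
in the kernel configuration file. This is only a sketch of the relevant lines;
the rest of the config was not posted, and the comments are descriptive, not
copied from the actual file.

```
# Excerpt only: everything else assumed to match the stock GENERIC config.
options 	PRINTF_BUFR_SIZE=128	# buffer kernel printf output to limit interleaving
#options 	DEVICE_POLLING		# commented out: polling is not compiled in
```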


Thanks in advance for any help.

~rick
