Date:      Sun, 11 Nov 2001 02:56:13 +0100 (CET)
From:      "Hartmann, O." <ohartman@klima.physik.uni-mainz.de>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   Was: Re: FBSD4.4-STABLE SMP broken! 
Message-ID:  <20011111023209.V8104-100000@klima.physik.uni-mainz.de>
In-Reply-To: <20011110110055.A92378@xor.obsecurity.org>

On Sat, 10 Nov 2001, Kris Kennaway wrote:

Hello.

Well, Kris is right - and he's not right.

After more than 14 hours of testing and trying to figure out what's
going wrong, I have found the problem.

Today, I cvsupdated to the latest sources and built world on all machines.
Yesterday evening I got a new firmware image from LSI Logic for
the AMI MegaRAID Enterprise 1600 and Elite 1600. As I wrote, we use
two machines equipped with these RAID controllers. Both systems now
run firmware revision 160, BIOS 3.12 on the RAID controllers.

The broken machine is a TYAN Thunder 2500 based system (64-bit PCI slots),
the still working one is based on ASUS CUV4X-D.

The still working system, 4.4-STABLE, has 1GB memory, an Intel EtherExpress
100/Pro S Server Adapter, and a secondary Adaptec SCSI controller (as you can
see in the dmesg output below). This machine has nearly the same configuration
as the broken one.

The broken server locks up immediately, or within a few minutes, after coming
up in multiuser mode. In single-user mode there was no problem - up to
the time I tried to start the network!
This machine has a built-in Intel based NIC - and an additional Intel
EtherExpress 100/Pro S Server Adapter. The built-in adapter has not been used
since the other adapter was installed in the machine.

The additional server NIC is fxp0, the built-in one is fxp1.
Whenever I start networking on fxp0, either from single-user mode by
using /etc/netstart or by booting into multiuser mode, the system
locks up immediately or a few minutes after coming up.

I went back in the code base to date=2001.11.03.00.00.00, but that
had no effect.

It seems that there is a serious problem with the AMI MegaRAID Enterprise 1600
adapter and the Intel EtherExpress server NIC. With the old firmware, A159,
the second server NIC worked well. But I cannot say that this is a hardware
fault. At this moment, NIC fxp0 is down, but the kernel still knows about
this NIC. Whenever it is brought to state 'UP', the system locks up - and is
not reachable by console or network.

Yes, Kris is right if he suspects our hardware - but it seems to be a general
'sickness' of FreeBSD in conjunction with high-end chipsets like the
ServerWorks ServerSet III chipset. In the past there were lots of problems
reported here! I have no clue why this recommended 'high-end server mainboard'
has so many problems!

Next week I will do some more tests with other NICs and more NICs in the machine.
Maybe there is a FreeBSD problem with more than one NIC.
On the other hand, it seems more likely that there is something strange happening
with the RAID controller since its upgrade to firmware F160. But I'm a little
bit confused by the fact that there are no reports of trouble from those using
a configuration similar to mine.


After the total lockup of the machine I tried to do several repairs of
suspected parts of the system
:>On Sat, Nov 10, 2001 at 05:25:20PM +0100, Hartmann, O. wrote:
:>> I did a cvsupdate today, the target machine was running
:>> stable for the last four days.
:>>
:>> After the update and make world today, the system run a few minutes and then
:>> get stuck ... no keyboard input, nothing, no network response.
:>
:>With all the unreproducible crashes and hangs you seem to have, I've
:>gotta wonder whether there's something wrong with your hardware.
:>
:>Kris
:>

--
MfG
O. Hartmann

ohartman@klima.physik.uni-mainz.de
----------------------------------------------------------------
IT-Administration des Institutes fuer Physik der Atmosphaere (IPA)
----------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (Maschinenraum)
Tel: +496131/3924144
FAX: +496131/3923532

