Date:      Sun, 11 Nov 2001 03:09:50 +0100 (CET)
From:      "Hartmann, O." <ohartman@klima.physik.uni-mainz.de>
To:        Tor.Egge@cvsup.no.freebsd.org
Cc:        freebsd-stable@freebsd.org
Subject:   Re: FBSD4.4-STABLE SMP broken!
Message-ID:  <20011111025632.K8104-100000@klima.physik.uni-mainz.de>
In-Reply-To: <20011111003641B.tegge@cvsup.no.freebsd.org>

On Sun, 11 Nov 2001 Tor.Egge@cvsup.no.freebsd.org wrote:

Dear Tor Egge,

Well, it is definitely NOT the SMP kernel. The SMP kernel dies
sometimes after several minutes, sometimes immediately after going
into multiuser mode. I switched back to an earlier code base from
3rd November, but it has the same result.

We have two SMP servers equipped with AMI MegaRAID controllers;
the faulty machine has an Enterprise 1600, the working one
the smaller Elite 1600. The broken system is based on the
TYAN 2500 mainboard with the LSI Logic 53C896 SCSI chipset,
64-bit PCI slots and, as the suspect factor, two NICs.
Yesterday I upgraded the firmware of both RAID controllers
from AMI firmware A159 and BIOS 3.11 to firmware F160 and
BIOS 3.12. I think this is what caused the machine to lock up
completely. The faulty server has a built-in Intel 89C559 chip based
NIC __and__ an additional Intel Ether Express 100/Pro S Server
NIC. This second NIC was already in use before the firmware upgrade
was done.

The machine is now running, but it is only stable when using the
built-in NIC. Whenever the second NIC is configured and brought
to state 'UP', the system crashes in such a way that it is locked up.
I can still see the console output and everything, but the machine is
dead, totally dead.

Well, I feel bad at this moment. The only 'fact' I thought I had
for blaming the SMP kernel was that the SMP kernel ran for 1 minute
and the UP kernel for 3 minutes. But a UP kernel ran up to the point
where the Linux ABI was loaded, and then froze. The same happened
when booting into single user mode and then configuring fxp0
(the server NIC; fxp1 is the built-in NIC).

I didn't do any tests with other NICs or with changing the slot of
the second NIC. Counting from the AGP slot, the RAID is in slot 2 and
the NIC is in slot 3 (I left slot 1 free between AGP and RAID).

Maybe someone else has had the same experience after upgrading the
firmware ...

:>> I did a cvsupdate today, the target machine was running
:>> stable for the last four days.
:>>
:>> After the update and make world today, the system run a few minutes and then
:>> get stuck ... no keyboard input, nothing, no network response.
:>
:>Sleeping while holding a simplelock can be fatal on an SMP machine.
:>If the other CPU tries to obtain the simplelock then you get a hang.
:>If the same CPU tries to obtain the simplelock then you get a panic.
:>
:>One example is vinvalbuf() holding the vnode interlock while calling
:>vm_object_page_remove() (which might block), cf. PR 26224.
:>
:>For 4.4-STABLE, it might make sense to just define some simplelock
:>operations to nops also under SMP.  All relevant code is already
:>serialized by mp_lock and further lock pushdown is unlikely to occur
:>on that branch.  The only effect of these simplelock operations under
:>4.4-STABLE SMP is a hang or panic when the locking protocol is
:>violated.
:>
:>The enclosed patch will break 3rd party device drivers that use fast
:>interrupts and simple_lock() to serialize access to shared resources.
:>I don't know about any such device drivers.
:>
:>You'll need to add
:>
:>       options SIMPLELOCK_NULL
:>
:>to the relevant kernel config file and recompile the kernel for the
:>patch to have any effect.
:>
:>- Tor Egge
:>
:>
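
For readers of the archive, the hang/panic distinction described in
the quoted text can be sketched as a small userland model in C. This
is only an illustration under simplifying assumptions: it is not the
kernel's simplelock code and not Tor's patch, and the names
model_lock, model_lock_acquire, model_lock_release and
MODEL_SIMPLELOCK_NULL are hypothetical. Instead of really spinning or
panicking, the acquire function just reports what would happen.

```c
#include <assert.h>

/* Hypothetical userland model; NOT the FreeBSD kernel's simplelock code. */
enum lock_result { LOCK_ACQUIRED, LOCK_WOULD_HANG, LOCK_WOULD_PANIC };

struct model_lock {
    int held;   /* nonzero while some CPU holds the lock */
    int owner;  /* CPU id of the holder, meaningful only while held */
};

/* Report what an acquire attempt by `cpu` would lead to. */
static enum lock_result
model_lock_acquire(struct model_lock *lk, int cpu)
{
#ifdef MODEL_SIMPLELOCK_NULL
    /* Lock operations compiled to no-ops: nothing can hang or panic here. */
    (void)lk;
    (void)cpu;
    return LOCK_ACQUIRED;
#else
    if (!lk->held) {
        lk->held = 1;
        lk->owner = cpu;
        return LOCK_ACQUIRED;
    }
    /* Holder never released (e.g. it went to sleep holding the lock). */
    return (lk->owner == cpu) ? LOCK_WOULD_PANIC : LOCK_WOULD_HANG;
#endif
}

/* Release the lock; in the no-op variant this does nothing. */
static void
model_lock_release(struct model_lock *lk)
{
#ifdef MODEL_SIMPLELOCK_NULL
    (void)lk;
#else
    assert(lk->held);
    lk->held = 0;
#endif
}
```

In this model, if CPU 0 acquires the lock and then blocks without
releasing it, a later acquire from CPU 1 reports LOCK_WOULD_HANG and a
second acquire from CPU 0 reports LOCK_WOULD_PANIC. Compiling with
-DMODEL_SIMPLELOCK_NULL makes every acquire succeed, which mirrors the
idea of turning the operations into nops while other serialization
(mp_lock in the quoted text) protects the data.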

--
MfG
O. Hartmann

ohartman@klima.physik.uni-mainz.de
----------------------------------------------------------------
IT-Administration des Institutes fuer Physik der Atmosphaere (IPA)
----------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (Maschinenraum)
Tel: +496131/3924144
FAX: +496131/3923532


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message