Date: Sun, 11 Nov 2001 03:09:50 +0100 (CET)
From: "Hartmann, O." <ohartman@klima.physik.uni-mainz.de>
To: Tor.Egge@cvsup.no.freebsd.org
Cc: freebsd-stable@freebsd.org
Subject: Re: FBSD4.4-STABLE SMP broken!
Message-ID: <20011111025632.K8104-100000@klima.physik.uni-mainz.de>
In-Reply-To: <20011111003641B.tegge@cvsup.no.freebsd.org>
On Sun, 11 Nov 2001 Tor.Egge@cvsup.no.freebsd.org wrote:

Dear Tor Egge.

Well, it is definitely NOT the SMP kernel. The SMP kernel sometimes dies
after several minutes, or immediately after going into multiuser mode. I
switched back to an earlier code base from 3rd November, but it has the
same result.

We have two SMP servers equipped with AMI MegaRAID controllers; the faulty
machine has an Enterprise 1600, the working one the smaller Elite 1600. The
broken system is based on the TYAN 2500 mainboard with the LSI Logic 53C896
SCSI chipset, 64-bit PCI slots and, the faulty factor, two NICs.

Yesterday I upgraded the firmware of both RAID controllers from AMI
firmware A159 and BIOS 3.11 to firmware F160 and BIOS 3.12. I think this
caused the totally locked-up machine.

The faulty server has a built-in Intel 89C559 chip based NIC __and__ an
additional Intel Ether Express 100/Pro S Server NIC. This NIC was in use
before the firmware upgrade was done. The machine is now running, but it is
only stable when using the built-in NIC. Whenever the second NIC is
configured and brought to state 'UP', the system crashes in such a way that
it is locked up. I can see the console output and all the stuff, but it is
dead, totally dead.

Well, I feel bad at this moment. The only 'fact' I thought I had
established to accuse the SMP kernel was that the SMP kernel ran 1 minute
and the UP kernel 3 minutes. But a UP kernel ran up to the point where the
Linux ABI had been loaded, and then froze. The same happened when booting
into single user mode and then configuring fxp0 (the server NIC; fxp1 is
the built-in NIC).

I didn't do any tests with other NICs or with changing the slot of the
second NIC (counting from the AGP slot, the RAID is in slot 2 and the NIC
is in slot 3; I left slot 1 spare between AGP and RAID).

Maybe someone else has had the same experience after upgrading the
firmware ...

:>> I did a cvsupdate today, the target machine was running
:>> stable for the last four days.
:>>
:>> After the update and make world today, the system run a few minutes and then
:>> get stuck ... no keyboard input, nothing, no network response.
:>
:>Sleeping while holding a simplelock can be fatal on an SMP machine.
:>If the other CPU tries to obtain the simplelock then you get a hang.
:>If the same CPU tries to obtain the simplelock then you get a panic.
:>
:>One example is vinvalbuf() holding the vnode interlock while calling
:>vm_object_page_remove() (which might block), cf. PR 26224.
:>
:>For 4.4-STABLE, it might make sense to just define some simplelock
:>operations to nops also under SMP. All relevant code is already
:>serialized by mp_lock and further lock pushdown is unlikely to occur
:>on that branch. The only effect of these simplelock operations under
:>4.4-STABLE SMP is a hang or panic when the locking protocol is
:>violated.
:>
:>The enclosed patch will break 3rd party device drivers that use fast
:>interrupts and simple_lock() to serialize access to shared resources.
:>I don't know about any such device drivers.
:>
:>You'll need to add
:>
:>  options SIMPLELOCK_NULL
:>
:>to the relevant kernel config file and recompile the kernel for the
:>patch to have any effect.
:>
:>- Tor Egge
:>
:>

--
MfG
O. Hartmann

ohartman@klima.physik.uni-mainz.de
----------------------------------------------------------------
IT-Administration des Institutes fuer Physik der Atmosphaere (IPA)
----------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (Maschinenraum)
Tel: +496131/3924144
FAX: +496131/3923532

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message