Date: Tue, 28 Jul 2009 17:12:55 +0100
From: Anton Shterenlikht <mexas@bristol.ac.uk>
To: Marcel Moolenaar <xcllnt@mac.com>
Cc: freebsd-ia64@freebsd.org
Subject: Re: FreeBSD 8.0-BETA2/amd64 crashes on SMP under load
Message-ID: <20090728161255.GA38375@mech-cluster241.men.bris.ac.uk>
In-Reply-To: <E5F6DAD4-43C0-47DF-ACB1-324D3B15EC79@mac.com>
References: <4A6DB30B.20705@zedat.fu-berlin.de> <4A6DB9F1.7050404@haruhiism.net> <4A6E0620.6070200@mail.zedat.fu-berlin.de> <20090727210428.GA30253@mech-cluster241.men.bris.ac.uk> <20090728103545.GA22380@mech-cluster241.men.bris.ac.uk> <4A6F09BA.2020703@zedat.fu-berlin.de> <20090728144555.GD75439@mech-cluster241.men.bris.ac.uk> <E5F6DAD4-43C0-47DF-ACB1-324D3B15EC79@mac.com>
On Tue, Jul 28, 2009 at 08:34:52AM -0700, Marcel Moolenaar wrote:
>
> On Jul 28, 2009, at 7:45 AM, Anton Shterenlikht wrote:
>
> > On Tue, Jul 28, 2009 at 02:22:50PM +0000, O. Hartmann wrote:
> >> Anton Shterenlikht wrote:
> >>> On Mon, Jul 27, 2009 at 10:04:28PM +0100, Anton Shterenlikht wrote:
> >>>> On Mon, Jul 27, 2009 at 09:55:12PM +0200, O. Hartmann wrote:
> >>>>> Kamigishi Rei wrote:
> >>>>>> O. Hartmann wrote:
> >>>>>>> I have the problem of FreeBSD 8.0-BETA2/amd64 crashing under load on
> >>>>>>> all of our SMP boxes. Is there a known issue at the moment? If not, I
> >>>>>>> will prepare the kernel for witnessing and provide more information,
> >>>>>>> if you wish.
> >>>>>> A quick question: what is in the crash message, i.e. the backtrace?
> >>>>>> And what kind of crash is it - a panic() or a fatal trap?
> >>>>> On the 8-core server box, I sometimes see:
> >>>>>
> >>>>> Fatal trap 12: page fault while in kernel mode
> >>>>> fault code = supervisor read, page not present
> >>>> Not sure if it's related, but on ia64 SMP (2 CPUs) with 8.0-CURRENT, and
> >>>> later with 8.0-BETA1 (I haven't built BETA2 yet), I'm getting crashes
> >>>> under load every so often. E.g. buildworld -j8 is likely to crash the
> >>>> box. No messages, just a sudden freeze, no backtrace or panic,
> >>>> and then a reboot.
> >>>>
> >>>> If the load is less heavy, e.g. fewer processes and some idle time, the
> >>>> problem doesn't seem to appear.
> >>>>
> >>>> I'm happy to do any further testing, if suggested.
> >>>
> >>> My ia64 8.0-BETA1 SMP box died again on
> >>>
> >>> make -j8 buildworld
> >>>
> >>> with no panic or log entries.
> >>>
> >>> Is it possible that some kernel variable needs to
> >>> be increased? E.g. kern.maxproc, kern.maxfiles, etc.
> >>> Or perhaps I'm talking complete rubbish.
> >>>
> >>
> >> I suggest you try again with a UP kernel (a suggestion from a
> >> kernel noob, sorry). My SMP boxes work now with a UP kernel, but they
> >> are really slow even though they have modern Intel C2D/Penryn cores.
> >
> > I need SMP for OpenMP codes. It's a shame if SMP is buggy, but
> > I guess it all comes down to the small user base.
>
> I have no problems with SMP. If you don't have a panic, then
> you may have a hardware problem.

Yes, I thought of this myself. I ought to check the event logs
available from the MP on the rx2600, but those messages are so cryptic.

> Check for MCA records.

# mca
mca: no error records found
# sysctl hw.mca
hw.mca.last: 0
hw.mca.first: 0
hw.mca.count: 0

Faulty DIMMs, as you've suggested, would explain a lot of my problems.

Many thanks

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 928 8233
Fax: +44 (0)117 929 4423