Date:      Tue, 28 Jul 2009 09:48:39 -0700
From:      Marcel Moolenaar <xcllnt@mac.com>
To:        Anton Shterenlikht <mexas@bristol.ac.uk>
Cc:        freebsd-ia64@freebsd.org
Subject:   Re: FreeBSD 8.0-BETA2/amd64 crashes on SMP under load
Message-ID:  <0C915045-02B9-4E88-8FA8-B2405B8712A6@mac.com>
In-Reply-To: <20090728161255.GA38375@mech-cluster241.men.bris.ac.uk>
References:  <4A6DB30B.20705@zedat.fu-berlin.de> <4A6DB9F1.7050404@haruhiism.net> <4A6E0620.6070200@mail.zedat.fu-berlin.de> <20090727210428.GA30253@mech-cluster241.men.bris.ac.uk> <20090728103545.GA22380@mech-cluster241.men.bris.ac.uk> <4A6F09BA.2020703@zedat.fu-berlin.de> <20090728144555.GD75439@mech-cluster241.men.bris.ac.uk> <E5F6DAD4-43C0-47DF-ACB1-324D3B15EC79@mac.com> <20090728161255.GA38375@mech-cluster241.men.bris.ac.uk>

On Jul 28, 2009, at 9:12 AM, Anton Shterenlikht wrote:

> On Tue, Jul 28, 2009 at 08:34:52AM -0700, Marcel Moolenaar wrote:
>>
>> On Jul 28, 2009, at 7:45 AM, Anton Shterenlikht wrote:
>>
>>> On Tue, Jul 28, 2009 at 02:22:50PM +0000, O. Hartmann wrote:
>>>> Anton Shterenlikht wrote:
>>>>> On Mon, Jul 27, 2009 at 10:04:28PM +0100, Anton Shterenlikht wrote:
>>>>>> On Mon, Jul 27, 2009 at 09:55:12PM +0200, O. Hartmann wrote:
>>>>>>> Kamigishi Rei wrote:
>>>>>>>> O. Hartmann wrote:
>>>>>>>>> I have the problem of crashing FreeBSD 8.0-BETA2/amd64 under load on
>>>>>>>>> all of our SMP boxes. Is there an issue known at the moment? If not, I
>>>>>>>>> will prepare the kernel for witnessing and provide more information,
>>>>>>>>> if you wish.
>>>>>>>> A quick question: what is in the crash message, i.e. the backtrace?
>>>>>>>> And what kind of crash is it - a panic() or a fatal trap?
>>>>>>> On the 8-core server box, I sometimes see :
>>>>>>>
>>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>>> fault code              = supervisor read, page not present
>>>>>> Not sure if it's related, but on ia64 SMP (2 cpus) with 8.0-current and
>>>>>> later with 8.0-beta1 (I haven't built beta2 yet) I'm getting crashes
>>>>>> under load every so often. E.g. buildworld -j8 is likely to crash the
>>>>>> box. No messages, just a sudden freeze, no backtrace or panic,
>>>>>> and then reboot.
>>>>>>
>>>>>> If load is less heavy, e.g. fewer processes and some idle time, the
>>>>>> problem doesn't seem to appear.
>>>>>>
>>>>>> I'm happy to do any further testing, if suggested.
>>>>>
>>>>> my ia64 8.0-beta1 SMP box died again on
>>>>> make -j8 buildworld
>>>>> with no panic or log entries.
>>>>>
>>>>> Is it possible that some kernel variable needs to
>>>>> be increased? E.g. kern.maxproc, kern.maxfiles, etc.
>>>>> Or perhaps I'm talking complete rubbish..
>>>>>
>>>>
>>>> I suggest you try again with a UP kernel - a suggestion from a
>>>> kernel-noob, sorry. My SMP boxes work now with UP-kernel, but they are
>>>> really slowish although they have modern Intel C2D/Penryn cores.
>>>
>>> I need SMP for OpenMP codes. It's a shame if SMP is buggy, but
>>> I guess all is down to small user base..
>>
>> I have no problems with SMP. If you don't have a panic, then
>> you may have a hardware problem.
>
> yes.. I thought of this myself. I guess I ought to check
> the Event Logs available from MP on rx2600. But those messages
> are so cryptic..

The event logs are also mostly useless IMO. They fill
up the log buffer and then cause warnings during the
boot. I typically use the errdump command in EFI to
see if there's anything fishy.

>
>> Check for MCA records.
>
> # mca
> mca: no error records found
>
> # sysctl hw.mca
> hw.mca.last: 0
> hw.mca.first: 0
> hw.mca.count: 0

Hmmmm... I don't know what to make of this yet. I've
always had MCAs when there were spontaneous reboots.
The lack of MCAs can point to something other than
hardware...

-- 
Marcel Moolenaar
xcllnt@mac.com





