Date:      Fri, 17 Aug 2001 17:14:03 -0700 (PDT)
From:      John Baldwin <jhb@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        alpha@FreeBSD.org, "David O'Brien" <obrien@FreeBSD.org>, imp@FreeBSD.org
Subject:   Re: today's kernel + JHB's trap.c patch is *evil*
Message-ID:  <XFMail.010817171403.jhb@FreeBSD.org>
In-Reply-To: <XFMail.010817152540.jhb@FreeBSD.org>

On 17-Aug-01 John Baldwin wrote:
> 
> On 17-Aug-01 David O'Brien wrote:
>> On Thu, Aug 16, 2001 at 04:29:27PM -0700, John Baldwin wrote:
>>> > Mounting root from ufs:/dev/da0a
>>> > SMP: AP CPU #1 Launched!
>>> > 
>>> > fatal kernel trap:
>>> > 
>>> >     trap entry     = 0x2 (memory management fault)
>>> >     cpuid          = 0
>>> >     faulting va    = 0x0
>>> 
>>> NULL pointer deref.
>>> 
>>> >     type           = access violation
>>> >     cause          = load instructon
>>> >     pc             = 0xfffffc00003c3814
>>> 
>>> Do you have a debug kernel?  If so, can you do 'gdb -k kernel.debug' and
>>> then do 'l *0xfffffc00003c3814'?
>> 
>> 0xfffffc00003c3814 is in _mtx_unlock_sleep (../../../kern/kern_mutex.c:492).
>> 487     
>> 488             p1 = TAILQ_FIRST(&m->mtx_blocked);
>> 489             MPASS(p->p_magic == P_MAGIC);
>> 490             MPASS(p1->p_magic == P_MAGIC);
>> 491     
>> 492             TAILQ_REMOVE(&m->mtx_blocked, p1, p_procq);
>> 493     
>> 494             if (TAILQ_EMPTY(&m->mtx_blocked)) {
>> 495                     LIST_REMOVE(m, mtx_contested);
>> 496                     _release_lock_quick(m);
> 
> Umm, ok.  I'll have to try and reproduce this locally.  The mutex claims to
> be contested but has no processes in its list of blocked processes.

Ok, thinking about this some more, I'm guessing this might be a problem with
mtx_owned() not doing a memory barrier.  I want to look at the exact semantics
of an acquire load on ia64 (which is what atomic_load_acq() is based on), and
depending on that I will either change atomic_load_acq() to have another mb and
use that inside of mtx_owned() (and in a few other places), or I will add new
atomic_load() and atomic_store() functions that place their mb's roughly
opposite to how the acq and rel versions place them.  Until this gets done,
alpha SMP has the potential for being very shaky, so I would recommend just
running UP kernels for now.  I should probably go turn SMP off in GENERIC, and
I'm cc'ing Warner so he can add a note in UPDATING about this until it can be
fixed properly.
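
To make the ordering idea concrete, here is a minimal sketch of an ownership
check done with an acquire load.  It uses portable C11 <stdatomic.h> rather
than the kernel's atomic_load_acq(), and struct sketch_mtx and the sketch_*
functions are hypothetical names made up for illustration; this is not the
real struct mtx, not mtx_owned(), and not the eventual fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel mutex; NOT the real struct mtx. */
struct sketch_mtx {
	_Atomic uintptr_t lock;	/* 0 when unowned, else the owner's id */
	int blocked_count;	/* ordinary data protected by the lock */
};

#define SKETCH_UNOWNED	((uintptr_t)0)

/*
 * Take the lock with acquire semantics: loads and stores that follow
 * in program order cannot be reordered before a successful exchange.
 */
static bool
sketch_try_lock(struct sketch_mtx *m, uintptr_t self)
{
	uintptr_t expected = SKETCH_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(&m->lock, &expected,
	    self, memory_order_acquire, memory_order_relaxed));
}

/*
 * Ownership test using an acquire load instead of a plain read, so
 * later reads of the protected data are ordered after this load.
 */
static bool
sketch_owned(struct sketch_mtx *m, uintptr_t self)
{
	return (atomic_load_explicit(&m->lock, memory_order_acquire) == self);
}

/*
 * Drop the lock with release semantics: loads and stores that precede
 * this store in program order cannot be reordered after it.
 */
static void
sketch_unlock(struct sketch_mtx *m, uintptr_t self)
{
	assert(sketch_owned(m, self));
	atomic_store_explicit(&m->lock, SKETCH_UNOWNED, memory_order_release);
}
```

On a weakly ordered CPU the acquire and release orderings are what force the
compiler to emit the needed barriers; a plain read of the lock word carries no
such guarantee.  Whether that is really what is biting us here is still only a
guess.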
 
As to why the 4100 hasn't seen this, it may be that the slower CPUs are
preventing this race from happening, or that 21164s use a stricter memory
ordering than 21264s.  I've seen similar issues where my dual P3 600 would
have problems when my dual PPro 200 wouldn't.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
