Date: Fri, 17 Aug 2001 17:14:03 -0700 (PDT)
From: John Baldwin <jhb@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: alpha@FreeBSD.org, "David O'Brien" <obrien@FreeBSD.org>, imp@FreeBSD.org
Subject: Re: today's kernel + JHB's trap.c patch is *evil*
Message-ID: <XFMail.010817171403.jhb@FreeBSD.org>
In-Reply-To: <XFMail.010817152540.jhb@FreeBSD.org>
On 17-Aug-01 John Baldwin wrote:
>
> On 17-Aug-01 David O'Brien wrote:
>> On Thu, Aug 16, 2001 at 04:29:27PM -0700, John Baldwin wrote:
>>> > Mounting root from ufs:/dev/da0a
>>> > SMP: AP CPU #1 Launched!
>>> >
>>> > fatal kernel trap:
>>> >
>>> > trap entry = 0x2 (memory management fault)
>>> > cpuid = 0
>>> > faulting va = 0x0
>>>
>>> NULL pointer deref.
>>>
>>> > type = access violation
>>> > cause = load instructon
>>> > pc = 0xfffffc00003c3814
>>>
>>> Do you have a debug kernel? If so, can you do 'gdb -k kernel.debug' and
>>> then do 'l *0xfffffc00003c3814'?
>>
>> 0xfffffc00003c3814 is in _mtx_unlock_sleep
>> (../../../kern/kern_mutex.c:492).
>> 487
>> 488 p1 = TAILQ_FIRST(&m->mtx_blocked);
>> 489 MPASS(p->p_magic == P_MAGIC);
>> 490 MPASS(p1->p_magic == P_MAGIC);
>> 491
>> 492 TAILQ_REMOVE(&m->mtx_blocked, p1, p_procq);
>> 493
>> 494 if (TAILQ_EMPTY(&m->mtx_blocked)) {
>> 495 LIST_REMOVE(m, mtx_contested);
>> 496 _release_lock_quick(m);
>
> Umm, ok. I'll have to try and reproduce this locally. The mutex claims to
> be contested but has no processes in its list of blocked processes.

Ok, thinking about this some more, I'm guessing this might be a problem with
mtx_owned() not doing a memory barrier. I want to look at the exact semantics
of an acquire load on ia64 (which is what atomic_load_acq() is based on).
Depending on that, I will either change atomic_load_acq() to have another mb
and use that inside of mtx_owned() (and in a few other places), or I will add
new atomic_load() and atomic_store() functions that do the mb's sort of
opposite of how the acq and rel versions do them.

Until this gets done, alpha SMP has the potential for being very shaky, so I
would recommend just doing UP kernels for now. I should probably go turn SMP
off in GENERIC, and I'm cc'ing Warner so he can go add a note in UPDATING about
this until it can be more properly fixed.
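To make the barrier question above a bit more concrete, here is a minimal sketch of an ownership check done with an acquire load, paired with a release store on the side that marks the lock contested. It is written with C11 <stdatomic.h> as a stand-in for the kernel's atomic_load_acq() and friends, and it is not the actual kern_mutex.c code. Everything named sketch_* is hypothetical, the lock-word layout (owner id with the low bit used as a contested flag) is an assumption made for illustration, and blocked_count merely stands in for the mtx_blocked queue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex; NOT the real struct mtx. */
struct sketch_mtx {
    _Atomic uintptr_t lock;  /* owner id; low bit assumed to mean "contested" */
    int blocked_count;       /* stand-in for the mtx_blocked queue */
};

#define SKETCH_CONTESTED ((uintptr_t)1)

/* Acquire load: later reads cannot be reordered before this load. */
static inline uintptr_t
sketch_load_acq(_Atomic uintptr_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Release store: earlier writes are visible before this store is. */
static inline void
sketch_store_rel(_Atomic uintptr_t *p, uintptr_t v)
{
    atomic_store_explicit(p, v, memory_order_release);
}

/* Waiter side: update the queue first, then publish the contested flag. */
static inline void
sketch_mark_contested(struct sketch_mtx *m)
{
    uintptr_t v = atomic_load_explicit(&m->lock, memory_order_relaxed);

    m->blocked_count++;
    sketch_store_rel(&m->lock, v | SKETCH_CONTESTED);
}

/* Ownership check using an acquire load rather than a plain read. */
static inline bool
sketch_owned(struct sketch_mtx *m, uintptr_t self)
{
    return (sketch_load_acq(&m->lock) & ~SKETCH_CONTESTED) == self;
}

/* If this returns true, the acquire load orders the queue read after it. */
static inline bool
sketch_contested(struct sketch_mtx *m)
{
    return (sketch_load_acq(&m->lock) & SKETCH_CONTESTED) != 0;
}
```

The point of the pairing is that a CPU which observes the contested flag through the acquire load is also guaranteed to observe the queue update made before the release store. With a plain load on a weakly ordered machine there is no such guarantee, which would be consistent with a mutex that claims to be contested while its blocked list still looks empty. Whether that is what actually happens in this trap is still only a guess.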
As to why the 4100 hasn't seen this, it may be that the slower CPUs are
preventing this race from happening, or that 21164s use a stricter memory
ordering than 21264s. I've seen similar issues where my dual P3 600 would
have problems when my dual PPro 200 wouldn't.

--

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message