From: John Baldwin
To: John Baldwin
Cc: alpha@FreeBSD.org, "David O'Brien", imp@FreeBSD.org
Subject: Re: today's kernel + JHB's trap.c patch is *evil*
Date: Fri, 17 Aug 2001 17:14:03 -0700 (PDT)

On 17-Aug-01 John Baldwin wrote:
>
> On 17-Aug-01 David O'Brien wrote:
>> On Thu, Aug 16, 2001 at 04:29:27PM -0700, John Baldwin wrote:
>>> > Mounting root from ufs:/dev/da0a
>>> > SMP: AP CPU #1 Launched!
>>> >
>>> > fatal kernel trap:
>>> >
>>> >     trap entry  = 0x2 (memory management fault)
>>> >     cpuid       = 0
>>> >     faulting va = 0x0
>>>
>>> NULL pointer deref.
>>>
>>> >     type        = access violation
>>> >     cause       = load instructon
>>> >     pc          = 0xfffffc00003c3814
>>>
>>> Do you have a debug kernel?  If so, can you do 'gdb -k kernel.debug' and
>>> then do 'l *0xfffffc00003c3814'?
>>
>> 0xfffffc00003c3814 is in _mtx_unlock_sleep
>> (../../../kern/kern_mutex.c:492).
>> 487
>> 488             p1 = TAILQ_FIRST(&m->mtx_blocked);
>> 489             MPASS(p->p_magic == P_MAGIC);
>> 490             MPASS(p1->p_magic == P_MAGIC);
>> 491
>> 492             TAILQ_REMOVE(&m->mtx_blocked, p1, p_procq);
>> 493
>> 494             if (TAILQ_EMPTY(&m->mtx_blocked)) {
>> 495                     LIST_REMOVE(m, mtx_contested);
>> 496                     _release_lock_quick(m);
>
> Umm, ok.  I'll have to try and reproduce this locally.  The mutex claims to
> be contested but has no processes in its list of blocked processes.

Ok, thinking about this some more, I'm guessing this might be a problem with
mtx_owned() not doing a memory barrier.  I want to look at the exact semantics
of an acquire load on ia64 (which is what atomic_load_acq() is based on).
Depending on that, I will either change atomic_load_acq() to have another mb
and use that inside of mtx_owned() (and in a few other places), or I will add
new atomic_load() and atomic_store() functions that do their mb's roughly the
opposite way around from the acq and rel versions.

Until this gets done, alpha SMP has the potential for being very shaky, so I
would recommend just running UP kernels for now.  I should probably go turn
SMP off in GENERIC, and I'm cc'ing Warner so he can add a note in UPDATING
about this until it can be fixed properly.

As to why the 4100 hasn't seen this, it may be that the slower CPUs are
preventing this race from happening, or that 21164s use a stricter memory
ordering than 21264s.  I've seen similar issues where my dual P3 600 would
have problems when my dual PPro 200 wouldn't.

-- 

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
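
For reference, the idea of an ownership check built on an acquire load can be
sketched in portable C11.  This is only an illustration under stated
assumptions: it uses <stdatomic.h> rather than the kernel's atomic_load_acq()
family, and every toy_* name, the lock-word layout, and the contested bit are
hypothetical, not the actual kern_mutex.c code.  An acquire load only keeps
later memory accesses from moving ahead of the load; whether that alone is
enough for mtx_owned() is exactly the open question in the message above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy lock; NOT the FreeBSD struct mtx. */
#define TOY_UNOWNED   ((uintptr_t)0)
#define TOY_CONTESTED ((uintptr_t)1)      /* low bit: waiters are queued */
#define TOY_FLAGMASK  (~TOY_CONTESTED)    /* strips the flag, leaves owner id */

struct toy_mtx {
	_Atomic uintptr_t lock;   /* owner id, possibly ORed with TOY_CONTESTED */
};

static void
toy_init(struct toy_mtx *m)
{
	atomic_init(&m->lock, TOY_UNOWNED);
}

/* Try to take the lock.  Owner ids must have the low bit clear. */
static bool
toy_trylock(struct toy_mtx *m, uintptr_t self)
{
	uintptr_t expected = TOY_UNOWNED;

	/* Acquire on success: accesses after the lock stay after it. */
	return atomic_compare_exchange_strong_explicit(&m->lock, &expected,
	    self, memory_order_acquire, memory_order_relaxed);
}

/* A waiter would set this bit before going to sleep on the lock. */
static void
toy_mark_contested(struct toy_mtx *m)
{
	atomic_fetch_or_explicit(&m->lock, TOY_CONTESTED, memory_order_relaxed);
}

/* Ownership test: acquire load of the lock word, flag bit masked off. */
static bool
toy_owned(struct toy_mtx *m, uintptr_t self)
{
	uintptr_t v = atomic_load_explicit(&m->lock, memory_order_acquire);

	return (v & TOY_FLAGMASK) == self;
}

/*
 * Release the lock.  The release store keeps earlier accesses from moving
 * past it.  A real implementation would wake waiters if contested; this
 * sketch just clears the whole word.
 */
static void
toy_unlock(struct toy_mtx *m, uintptr_t self)
{
	assert(toy_owned(m, self));
	atomic_store_explicit(&m->lock, TOY_UNOWNED, memory_order_release);
}
```

A caller would run toy_init() once, then pair toy_trylock() with
toy_unlock(), using an aligned pointer value (low bit clear) as its owner id.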